J. Mol. Biol. (1984) 178, 941-948
LETTER TO THE EDITOR
Use of Transposon-promoted Deletions in DNA Sequence Analysis
The usefulness of the dideoxy method for DNA sequencing can be greatly extended by the use of transposon-generated deletions. These deletions have the unique property of extending from a fixed nucleotide at the transposon terminus to various sites outside it. A plasmid (pAA3.7) carrying Tn9, which allows positive selection of such deletions as galactose-resistant colonies of Escherichia coli, is described. A cloned gene can thus be subdivided into a series of overlapping sequences, all of which are fused to a common sequence at the transposon terminus. Restriction fragments carrying the segments fused by deletions are cloned in M13, and sequenced using a primer complementary to the Tn9 terminus. The complete nucleotide sequence of the gene is assembled from sequence overlaps found in deletions with end-points approximately 350 base-pairs apart. The method is rapid, requires minimal in vitro manipulation, and is free from the redundant information normally produced in shotgun sequencing.
The dideoxy chain termination method (Sanger et al., 1977) used in conjunction with cloning in M13 vectors (Sanger et al., 1980; Messing, 1983) is a rapid and simple method for the determination of nucleotide sequences in DNA. Because of the nature of the sequencing reaction and the resolution of sequencing gels, however, it suffers from the limitation that only a 300 to 400 base sequence can be read from any given clone (Messing, 1983). Even with improvements such as the use of thin acrylamide gels (Sanger & Coulson, 1978), buffer gradient gels and 35S label (Biggin et al., 1983), the length of the readable sequence can be increased by only 90 to 100 bases. Therefore, sequencing of large genomes such as viruses, organelle DNA and eukaryotic genes often poses a problem. Under such circumstances, it is usually necessary to do random "shotgun" cloning and sequencing (Sanger et al., 1980; Messing et al., 1981) of a number of short DNA fragments, and the order of these fragments is determined by sequencing other sets of overlapping fragments. As a result, sequence determination of large genomes such as bacteriophage λ (Sanger et al., 1982) can be difficult. Other strategies based on sequencing fragments progressively shortened by DNase I and restriction enzymes (Hong, 1982; Barnes et al., 1983) require extensive in vitro manipulations. Therefore, development of a simple genetic method for sequencing long segments of DNA by the dideoxy method is desirable.
The sequence of a DNA segment (say ABCDE) can be assembled from the sequences of several overlapping deletions, which start from a fixed site (X) at the left, generating segments of the type X.E, X.DE, X.CDE, etc. Such an approach was first used by Benzer (1961) to order mutational sites in the rII gene of bacteriophage T4 and should, in principle, be applicable to DNA sequence analysis.
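The assembly of a segment from reads of the type X.E, X.DE, X.CDE can be illustrated with a minimal Python sketch. It is not the procedure used in this letter: the 30-base segment, the 12-base read length and the end-points are hypothetical toy values, and the function names are invented for illustration. The sketch assumes that the reads are already ordered by deletion end-point (as restriction mapping would provide) and that adjacent reads overlap.

```python
def merge_by_overlap(left: str, right: str, min_overlap: int = 3) -> str:
    """Join two reads by their longest suffix/prefix overlap."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    raise ValueError("adjacent reads share no overlap; an intermediate deletion is needed")


def assemble(reads_in_order: list[str]) -> str:
    """Chain reads that are already ordered by deletion end-point."""
    contig = reads_in_order[0]
    for read in reads_in_order[1:]:
        contig = merge_by_overlap(contig, read)
    return contig


# Hypothetical 30-base segment standing in for ABCDE (not taken from the paper).
SEGMENT = "ATGGCGTACCTGAAGTCCGATTAGCCATGA"
READ_LENGTH = 12             # toy stand-in for the 300 to 400 readable bases
END_POINTS = [0, 8, 16, 24]  # toy stand-in for end-points ~350 base-pairs apart

# Each deletion fuses X to the segment at its end-point, so the read primed
# from X starts at that end-point and runs into the segment.
reads = [SEGMENT[p:p + READ_LENGTH] for p in END_POINTS]
assembled = assemble(reads)
print(assembled == SEGMENT)
```

Because the end-points are spaced more closely than the read length, every read overlaps its neighbour and the chain closes without gaps; a spacing larger than the read length would raise the error instead.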
© 1984 Academic Press Inc. (London) Ltd.
Although several in vitro techniques are available for introducing
FIG. 1. Scheme for the selection of overlapping deletions for DNA sequencing. A DNA fragment (of the hypothetical sequence ABCDE) is inserted at the BamHI (or SalI) site in the tet gene of the galS plasmid pAA3.7, preferably in both orientations. E. coli A4 cells carrying the recombinant plasmid are plated on MacConkey-galactose containing ampicillin and chloramphenicol to select galR mutants. Most of these are deletions (thick lines) which start at the right terminus of Tn9 (X) and, passing through the gal region, frequently enter the cloned fragment. Thus, each deletion fuses a different sequence (E, D or C, etc.) from the fragment to a fixed sequence (X) of Tn9. Several galR deletions having end-points spaced at ~350 base-pair intervals are identified by restriction analysis. The PstI-BamHI (or SalI) fragments bearing each deletion are cloned in M13mp8, and sequenced by the dideoxy method using a 16-base primer (A-G-T-A-A-G-T-T-G-G-C-A-G-C-A-T, wavy arrow) complementary to the terminal inverted repeats (X) of Tn9. The transposon Tn9 is shown as a box consisting of two IS elements (IS1-L and IS1-R) flanking the cat (chloramphenicol acetyltransferase) gene. Bam, Eco, Hin, Pst and Sal refer to restriction sites of the enzymes BamHI, EcoRI, HindIII, PstI and SalI. The map is not drawn to scale.
deletions in cloned genes (reviewed by Shortle et al., 1981), none is satisfactory for sequencing purposes. This letter describes a simple method for positive selection of overlapping deletions and their use in sequence analysis of cloned DNA. The method takes advantage of an unusual property of transposable elements to promote deletions of adjacent DNA from fixed sites at their termini (reviewed by Calos & Miller, 1980a).
It is convenient to describe the method (Fig. 1) first, and present the experimental evidence later. The DNA fragment to be sequenced is inserted at the BamHI (or SalI) site in the tet gene of the galS plasmid pAA3.7 (Fig. 2(b)). Cultures of Escherichia coli strain A4 (F- recA trpC A4(gal-chlD-pgl-attλ) sup° strA) harbouring the recombinant plasmid are spread on MacConkey-galactose plates containing ampicillin (100 mg/l) and chloramphenicol (12.5 mg/l). Almost all galR colonies, which appear after overnight incubation, carry deletions on the plasmid extending from the right terminus of Tn9 to various points in the gal genes or the cloned fragment. The deletions are further recognized by their trp- phenotype. Plasmids from several independent galR deletions are purified, and their end-points determined by agarose gel electrophoresis of PstI plus BamHI digests. In this manner, a series of overlapping deletions terminating within the
† Abbreviations used: galS or galR, sensitivity or resistance to galactose; kb, 10^3 bases; ampR, resistance to ampicillin; tetR or tetS, resistance or sensitivity to tetracycline; camR or camS, resistance or sensitivity to chloramphenicol.
FIG. 2. Structures of plasmids (a) pAA3Bcam and (b) pAA3.7. pAA3Bcam contains an insertion of a BamHI fragment carrying Tn9 (thick line) from λcam1 into pAA3B101 (Ahmed, 1984b). The wavy line indicates bacterial DNA carrying part of the gal operon. pAA3Bcam confers an ampR, trp+, galS, tetS and camR phenotype on E. coli strain A4. Letters A to F indicate PstI-generated fragments corresponding to the bands seen in the agarose gel in Fig. 3(a). galR deletions (shown as arcs inside) extend from the right end of Tn9 into the gal region, and frequently enter the EcoRI fragment containing yeast TRP1 and ARS1 sequences (Fig. 3(b)). pAA3.7 was isolated by transposing Tn9 on pAA3B101 between the amp and TRP1 genes. It confers an ampR, camR, trp+, galS and tetR phenotype. The tet gene provides BamHI, SalI and 2 other sites for cloning DNA fragments. Depending on the antibiotics used for selection, galR deletions start at the right end(s) of IS1-R, or IS1-L and IS1-R, remove the TRP1 and gal regions, and frequently enter the cloned fragment. Δ1 and Δ3 are 2 galRcamS deletions which fuse IS1-L to different gal sequences shown in Fig. 4. Bam, Bgl, Cla, Eco, Hin, Hpa, Pst, Pvu, Sal, Sph and Xma denote restriction sites for BamHI, BglII, ClaI, EcoRI, HindIII, HpaI, PstI, PvuI, SalI, SphI and XmaIII, respectively. (Hin) indicates a HindIII site altered by mutation. The map scales are in kb.
cloned fragment, and having end-points about 350 base-pairs apart, are selected. Likewise, deletions entering the fragment from the other end can be isolated from plasmids carrying the insert in the opposite orientation. The PstI-BamHI fragments harboring each deletion are directionally cloned into M13mp8 or mp10 (Messing, 1983) and sequenced by the dideoxy method (Sanger et al., 1980) using a 16-base primer complementary to the Tn9 terminus. Thus, sequences of 300 to 400 bases can be read from each deletion, and the complete sequence is assembled from overlaps occurring in adjacent deletions.
The following results demonstrate that the galR mutants which arise from galS plasmids carrying Tn9 are predominantly deletions of the required specificity. The construction of several galS plasmids has been reported (Ahmed, 1984a,b). The plasmid pAA3B101 contains a 4.33 kb EcoRI-HindIII fragment carrying amp and
tet genes from pBR322, an 8.94 kb HindIII-EcoRI fragment carrying a part (E'TK) of the gal operon and cos from a λgal phage, and a 1.45 kb EcoRI fragment carrying the TRP1 gene from yeast. The plasmid confers an ampR, tetR, trp+ and galS phenotype on strain A4, and produces galR mutants at a low (1.1 x 10^-8/cell per division) rate. These galR mutants arise by a variety of mutational events such as insertions, deletions and point mutations on the plasmid. Insertion of a BamHI fragment of λcam1 (Gottesman & Rosner, 1975) bearing Tn9 into this plasmid to generate pAA3Bcam (Fig. 2(a)), however, causes a >100-fold increase in the rate of galS → galR mutations. This drastic increase in the production of galR mutants has been shown to be a direct consequence of the presence of Tn9 on the plasmid (Ahmed, 1984b).
Since transposable elements are known to promote many kinds of DNA rearrangements (see Kleckner, 1981), it was important to identify the genetic change(s) responsible for the galR phenotype. Restriction analysis of plasmids purified from 50 galRcamR mutants indicated that 48 were deletions extending from the right terminus of Tn9 to various sites in, or beyond, the gal region as illustrated in Figure 2(a). Of these, 37 deletions terminated in the EcoRI fragment which contains the yeast TRP1 and ARS1 sequences (Tschumper & Carbon, 1980). A photograph of an agarose gel of PstI digests of such galR plasmids is shown in Figure 3(a), and the end-points of 24 deletions are shown in Figure 3(b). This set includes 20 deletions which entered the EcoRI fragment from the left end (isolated from pAA3Bcam), and four in which the deletions entered from the right end (isolated from a plasmid in which the EcoRI fragment was inverted). From these results, it is evident that the vast majority of galR mutants are Tn9-specific deletions, and that their variable end-points are essentially random.
The galR deletions described above can be used to divide any cloned fragment into a series of overlapping segments for sequencing. However, pAA3Bcam suffers from the limitation that, because of the presence of several undesirable restriction sites, many DNA fragments cannot be cloned on it. Therefore, the plasmid pAA3.7 (Fig. 2(b)) was isolated by transposing Tn9 on pAA3B101 (Ahmed, 1984b). In this plasmid, Tn9 is inserted between the amp and TRP1 genes at a sequence that corresponds to residues 4271 to 4279 of the pBR322 sequence (Sutcliffe, 1979). A4 strains carrying pAA3.7 (or its derivatives H or HB, shortened by the removal of the HpaI or the HpaI and BglII fragments) have the phenotype ampR, camR, trp+, galS and tetR. As described earlier, a BamHI or SalI (or with minor modification, SphI or XmaIII) fragment to be sequenced is inserted in the tet gene, and galR deletions are selected directly by plating on MacConkey-galactose in the presence of ampicillin and chloramphenicol, or ampicillin alone. In the latter case, true galRampR colonies can be identified only after re-streaking on the same plates (because the release of β-lactamase due to
FIG. 3. (a) Photograph of an agarose gel of PstI digests of the galS plasmid pAA3Bcam and several galR deletions derived from it. The galS plasmid produces 6 bands, designated A to F, which correspond to the PstI fragments shown in Fig. 2(a). The fragment sizes (in kb) are: B, 11.1; E, 4.3; D, 3.3; F, 1.9; C, 1.4; and A, 0.9. Digests of 9 galR deletions (302 to 306, top line) are arranged to demonstrate the gradual decrease in the sizes of fragments A and C caused by the progressive entry of deletions. The lane on the right contains a HindIII digest of λ. Digests were run in 1% (w/v) agarose at 28 V for 15.25 h. (b) End-points of 24 galR deletions in the 1453 base-pair EcoRI fragment of yeast harboring TRP1 and ARS1 sequences (Tschumper & Carbon, 1980). Deletions entering from the left end were isolated from pAA3Bcam, and those entering from the right were isolated from a similar plasmid in which the EcoRI fragment was inserted in the opposite orientation. Although 17 deletions of the latter class were examined, only 4 are shown. The map scale is in base-pairs.
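Choosing, from a map of end-points such as that in Fig. 3(b), a subset of deletions spaced no more than about 350 base-pairs apart can be sketched as a simple greedy selection. The sketch below is only an illustration under stated assumptions: the end-point values are hypothetical (they are not read from Fig. 3(b)), the function name is invented, and the 350 base-pair limit is taken from the spacing quoted in the text.

```python
def pick_spaced_endpoints(endpoints, max_gap=350):
    """Greedily pick the fewest end-points whose neighbours lie within max_gap."""
    pts = sorted(set(endpoints))
    if not pts:
        return []
    chosen = [pts[0]]
    i = 0
    while i < len(pts) - 1:
        # Advance to the farthest end-point still within max_gap of the last pick.
        j = i
        while j + 1 < len(pts) and pts[j + 1] - pts[i] <= max_gap:
            j += 1
        if j == i:
            raise ValueError(f"no deletion ends within {max_gap} bp after {pts[i]}")
        chosen.append(pts[j])
        i = j
    return chosen


# Hypothetical end-points (base-pairs from one end of a cloned fragment).
mapped = [40, 120, 310, 350, 520, 690, 700, 980, 1010, 1300]
selected = pick_spaced_endpoints(mapped)
print(selected)
```

A gap wider than the limit raises an error rather than being skipped silently, since such a gap would leave a stretch that no read covers and a further deletion would have to be isolated.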
Scale (residue numbers): 740, 760, 780, 800
(a) Tn9, IS1-L | cat: CACTAAATCAGTAAGTTGGCAGCATCACC~CGACGCACTTT~CGCCGAATAAA~ACCTGTG~CGGAAGATCA
(b) Δ3: CACTAAATCAGTAAGTTGGCAGCATCACCGGTACGGCTACCGTCTGCCAGCTCGCGCTGAACATAATCCAC
(c) Δ3, IS primer: [AGTAAGTTGGCAGCAT]..CCGGTACGGCTACCGTCTGCCAGCTCGCGCTGAACATAATCCAC
(d) Δ1: CACTAAATCAGTAAGTTGGCAGCATCACCATTACGTGGATGTAATCGCGTACGCAGTACCATCTTCGGTCG
(e) Δ1, IS primer: [AGTAAGTTGGCAGCAT]..CCATTACGTGGATGTAATCGCGTACGCAGTACCATCTTCGGTCG
FIG. 4. Nucleotide sequences of Tn9 and galR Δ1 and Δ3 near the right terminus of IS1-L to show the specificity of deletion formation. (a) Tn9 sequence at the IS1-L and cat junction (Alton & Vapnek, 1979), corrected at residue 765. (b) and (d) Δ3 and Δ1 sequences at IS1-L and gal junctions determined with the use of the 15-base "mp8" primer. (c) and (e) Sequences of the same junctions determined with the 16-base "IS" primer (shown in a box). The 2 bases immediately following the primer could not be visualized clearly in autoradiograms, and are shown as dots. The subsequent sequences (~100 bases) determined by the mp8 and IS primers were strictly identical. Sequence hyphens have been omitted for clarity.
lysis of galS cells allows the growth of many plasmid-free cells, which appear as galR colonies). Thus, galR deletions starting at the right end of IS1-R (or, if desired, IS1-L), which divide a cloned fragment into many overlapping segments, can be isolated in only one step.
Previous sequence studies on IS1- and Tn9-promoted deletions (Ohtsubo & Ohtsubo, 1978; Calos & Miller, 1980b) have established that these deletions result in fusing a fixed nucleotide sequence at the IS1 terminus to other sequences from the variable end-point (Fig. 1), without causing any other rearrangement. In order to confirm whether the galR deletions exhibit the same specificity, two deletions (Δ1 and Δ3) isolated from pAA3.7 were sequenced. Since both were camS and tetR, their fixed end-points were expected to be at the right end of IS1-L, and their variable end-points somewhere in the gal region. Their end-points, as determined by restriction analysis, are shown in Figure 2(b). The PstI-BamHI fragments containing these deletions were cloned in M13mp8, and sequenced using the 15-base (A-G-T-C-A-C-G-A-C-G-T-T-G-T-A) primer normally used for sequencing DNA cloned on this vector. The sequence data are summarized in Figure 4. The Tn9 sequence from the IS1-L and cat junction is shown in line (a), and the sequences of Δ3 and Δ1 from the IS1-L and gal junctions are shown in lines (b) and (d). These sequences confirm that both galR deletions arise precisely at the terminal nucleotide (position 768) of IS1, and fuse other sequences (presumably from the galT and E genes) to the IS1 terminus. Since the junctions created by these deletions are located 206 bases from the 3' terminus of the mp8 primer, only 75 to 100-base sequences of the gal region could be read from each of these deletions. To increase the readable sequence, a 16-base (A-G-T-A-A-G-T-T-G-G-C-A-G-C-A-T) primer corresponding to residues 20 to 5 of the IS1 sequence (Ohtsubo & Ohtsubo, 1978) was used on the same mp8 clones.
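The relation between the fixed IS1 terminus, the IS primer and the junction can be checked with a short Python sketch. It is an illustration only: the two strings are the opening bases of lines (b) and (d) of Fig. 4 as transcribed here, the function names are invented, and the 300-base read limit is an assumption taken from the lower end of the 300 to 400 base range quoted earlier. The common prefix of the two reads equals the fused IS1 sequence only because the two gal sequences happen to differ at their first base.

```python
def common_prefix_length(a: str, b: str) -> int:
    """Number of leading bases shared by two reads."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def readable_bases(read_limit: int, offset: int) -> int:
    """Bases left to read beyond a junction lying `offset` bases past the primer."""
    return max(0, read_limit - offset)


# Opening bases of Fig. 4 lines (b) and (d), as transcribed (assumption).
DELTA3 = "CACTAAATCAGTAAGTTGGCAGCATCACCGGTACGGCTACCGTCTGCC"
DELTA1 = "CACTAAATCAGTAAGTTGGCAGCATCACCATTACGTGGATGTAATCGC"
IS_PRIMER = "AGTAAGTTGGCAGCAT"  # 16-base IS primer, hyphens omitted

fixed_len = common_prefix_length(DELTA3, DELTA1)   # shared IS1-derived bases
primer_end = DELTA3.find(IS_PRIMER) + len(IS_PRIMER)
bases_to_junction = fixed_len - primer_end         # IS1 bases after the primer

ASSUMED_READ_LIMIT = 300  # assumption, not a measured value
MP8_OFFSET = 206          # junction distance from the mp8 primer (from the text)

print(bases_to_junction, DELTA3[primer_end:fixed_len])
print(readable_bases(ASSUMED_READ_LIMIT, MP8_OFFSET))
```

Under these assumptions the IS primer ends only a few bases before the junction, whereas the mp8 primer leaves fewer than 100 readable bases of gal sequence, in line with the 75 to 100 bases reported above.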
The gal sequences obtained using the IS primer are shown in lines (c) and (e). The sequences are identical to those obtained using the mp8 primer, and have the advantage that
about 250 to 300 additional bases can be read from each deletion. Since the end-points of Δ1 and Δ3 are about 770 base-pairs apart, no sequence overlaps were expected or found. Therefore, systematic use of overlapping deletions with end-points 300 to 400 base-pairs apart, in conjunction with the IS primer, should readily yield the sequence of any cloned gene.
The present method has several advantages over the shotgun method. Large numbers of overlapping deletions in cloned genes can be selected simply by plating; it should no longer be necessary to sequence many randomly cloned fragments and to order them by sequencing other overlapping fragments; no extensive manipulations of plasmid DNA are required; and the directional cloning of PstI-BamHI (or PstI-SalI) fragments, which contain these deletions, in M13 vectors is simpler than the cloning of small, blunt-ended fragments. It should be possible to collect sequence data containing systematic overlaps without the redundant information that is normally generated in shotgun sequencing. Finally, the work of Smith et al. (1979) and Wallace et al. (1981) shows that plasmid DNA can be used directly for dideoxy sequencing. If so, it should be possible to sequence long DNA segments directly on galR plasmids carrying overlapping deletions, thus making the use of M13 vectors altoget
ASAD AHMED
21 February 1984
REFERENCES
Ahmed, A. (1984a). Gene, 28, 37-43.
Ahmed, A. (1984b). J. Mol. Biol. 173, 523-529.
Alton, N. K. & Vapnek, D. (1979). Nature (London), 282, 864-869.
Barnes, W. M., Bevan, M. & Son, P. H. (1983). In Methods in Enzymology (Wu, R., Grossman, L. & Moldave, K., eds), vol. 101, part C, pp. 98-122, Academic Press, New York.
Benzer, S. (1961). Proc. Nat. Acad. Sci., U.S.A. 47, 403-415.
Biggin, M. D., Gibson, T. J. & Hong, G. F. (1983). Proc. Nat. Acad. Sci., U.S.A. 80, 3963-3965.
Calos, M. P. & Miller, J. H. (1980a). Cell, 20, 579-595.
Calos, M. P. & Miller, J. H. (1980b). Nature (London), 285, 38-41.
Gottesman, M. M. & Rosner, J. L. (1975). Proc. Nat. Acad. Sci., U.S.A. 72, 5041-5045.
Hong, G. F. (1982). J. Mol. Biol. 158, 539-549.
Kleckner, N. (1981). Annu. Rev. Genet. 15, 341-404.
Messing, J. (1983). In Methods in Enzymology (Wu, R., Grossman, L. & Moldave, K., eds), vol. 101, part C, pp. 20-78, Academic Press, New York.
Messing,
Sanger, F., Coulson, A. R., Barrell, B. G., Smith, A. J. H. & Roe, B. A. (1980). J. Mol. Biol. 143, 161-178.
Sanger, F., Coulson, A. R., Hong, G. F., Hill, D. F. & Petersen, G. B. (1982). J. Mol. Biol. 162, 729-773.
Shortle, D., DiMaio, D. & Nathans, D. (1981). Annu. Rev. Genet. 15, 265-294.
Smith, M., Leung, D. W., Gillam, S., Astell, C. R., Montgomery, D. L. & Hall, B. D. (1979). Cell, 16, 753-761.
Sutcliffe, J. G. (1979). Cold Spring Harbor Symp. Quant. Biol. 43, 77-90.
Tschumper, G. & Carbon, J. (1980). Gene, 10, 157-166.
Wallace, R. B., Johnson, M. J., Suggs, S. V., Miyoshi, K.-I., Bhatt, R. & Itakura, K. (1981). Gene, 16, 21-26.
Edited by S. Brenner