J. Mol. Biol. (1984) 177, 229-245
Insertion Element IS1 Encodes Two Structural Genes Required for its Transposition
YASUNORI MACHIDA†, CHIYOKO MACHIDA† AND EIICHI OHTSUBO‡
Department of Microbiology, School of Medicine, State University of New York at Stony Brook, Stony Brook, NY 11794, U.S.A.
(Received 1 December 1983)
The nucleotide sequence analysis of insertion element IS1 has shown that IS1 could have as many as six translational reading frames encoding possible proteins. In order to determine which reading frames are actual structural genes responsible for IS1-mediated recombination, we introduced base substitution mutations, including nonsense mutations, into all of the potential reading frames and examined the ability of these IS1 mutants to mediate cointegration between two plasmids. The results reveal that IS1 has two structural genes (termed insA and insB), which are required for plasmid cointegration mediated by IS1.
† Present address: Institute for Plant Virus Research, Tsukuba Science City, Yatabe, Ibaraki 305, Japan.
‡ Author to whom all correspondence should be addressed, at: Institute of Applied Microbiology, University of Tokyo, Bunkyo-ku, Tokyo 113, Japan.
0022-2836/84/220229-17 $03.00/0 © 1984 Academic Press Inc. (London) Ltd.

1. Introduction
DNA insertion element IS1 is the smallest active IS element known in bacteria (Hirsch et al., 1972; Fiandt et al., 1972). An IS1 element, which is present in the resistance plasmid R100 as a natural constituent, has been sequenced and found to be 768 base-pairs in length (Ohtsubo & Ohtsubo, 1978). Genetic analysis of this IS1 (which we call IS1R, below, to differentiate it from other IS1s) has shown that IS1R can mediate cointegration of two different plasmids, and that this cointegration ability of IS1R is abolished when deletion mutations are introduced within the IS1R sequence (E. Ohtsubo et al., 1980,1981; Machida et al., 1982b). Further genetic complementation analysis has shown that IS1R encodes proteins that are involved in the cointegration event (Machida et al., 1982b). Analysis of the nucleotide sequence of IS1R shows that there are six possible open reading frames encoding proteins larger than 7 × 10^3 Mr. Figure 1 shows the location of these coding regions.

FIG. 1. The initiation and termination codons found within IS1R as well as possible open reading frames. All 3 reading frames of both strands are shown. Long and short angled lines represent AUG and GUG codons, respectively; vertical lines indicate the presence of a UAA, UGA or UAG termination codon. Six open frames for possible proteins are indicated in position with numbers, representing the co-ordinates of the nucleotide sequence of IS1R (Ohtsubo & Ohtsubo, 1978). The frames labeled with insA and insB (thick lines) are commonly seen in the same position in the iso-IS1 sequences (H. Ohtsubo et al., 1981, and unpublished results). Note that possible initiation codons AUG and GUG seen within insA and insB in IS1R are not seen in one or the other iso-IS1. The other 4 frames are labeled C, D, E and F (open thick line). The frame designated B' is at the extended region from the insB frame. These 5 frames are not necessarily seen in all of the iso-IS1 sequences. Small thick open or filled arrows indicate the positions of termination codons in the B' frame and the D to F frames seen in IS1D and IS1F, respectively. insL and insR, shown by open boxes on the map of IS1R, are left and right terminal inverted repeats, respectively. The approximate positions of cutting sites for restriction enzymes PstI, BstEII and Tth111I are shown.

Recently, variants of IS1 (called iso-insertion sequences of IS1 or iso-IS1s), which have a few or numerous base substitutions compared to IS1R, have been identified (H. Ohtsubo et al., 1981; Johnsrud, 1979; Nyman et al., 1981). Analysis of possible reading frames in these iso-IS1 sequences has shown that two of the six reading frames are commonly seen in the same position. As shown in Figure 1, these frames are named insA and insB, and could encode proteins of 10 × 10^3 and 15 × 10^3 molecular weight, respectively (H. Ohtsubo et al., 1981). The base substitutions seen in the iso-IS1s are mostly silent and do not cause many changes in the amino acid sequence encoded by the insA and insB frames, while the base changes often cause amino acid changes or introduce termination codons in the other reading frames. For example, an iso-IS1 sequence named IS1D that was identified in the Shigella dysenteriae chromosome contained ten base-pair substitutions when compared to IS1R. Three of these substitutions have been found to generate termination codons in the reading frames, named E and F, of IS1R and also in the B' region preceding the insB coding frame of IS1R, at positions indicated by the open thick arrows in Figure 1. Another iso-IS1 named IS1F, which has been identified in the Shigella flexneri chromosome, contained 74 base substitutions compared to IS1R, two of which cause termination of the D frame of IS1R, as shown by the filled thick arrows in Figure 1 (Ohtsubo et al., unpublished results). Furthermore, another iso-IS1 named IS1( which was also identified in the S. dysenteriae chromosome, showed only 56% homology with the IS1R sequence, but retained the two reading frames corresponding to insA and insB (H. Ohtsubo et al., 1981).
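The reading-frame analysis described above (AUG or GUG initiation codons, UAA, UGA or UAG termination codons, all three frames of both strands, and a size cut-off of 7 × 10^3 Mr) can be illustrated with a minimal sketch. It is not the procedure used by the authors; the function names are hypothetical, and the average residue mass of 110 daltons used to turn the Mr cut-off into a codon count is an assumption.

```python
import math

START_CODONS = {"ATG", "GTG"}        # AUG and GUG, written as DNA
STOP_CODONS = {"TAA", "TGA", "TAG"}  # UAA, UGA and UAG, written as DNA
MEAN_RESIDUE_MASS = 110.0            # assumed average mass per amino acid (Da)
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def _orfs_one_strand(seq, min_codons):
    """Yield 0-based half-open (start, end) spans, stop codon included.

    Within each frame, an ORF runs from the first start codon after the
    previous stop codon to the next in-frame stop codon.
    """
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in STOP_CODONS:
                if start is not None and (i - start) // 3 >= min_codons:
                    yield start, i + 3
                start = None
            elif start is None and codon in START_CODONS:
                start = i


def six_frame_orfs(seq, min_mass=7000.0):
    """List (strand, first, last, codons) using 1-based top-strand co-ordinates."""
    seq = seq.upper()
    n = len(seq)
    min_codons = math.ceil(min_mass / MEAN_RESIDUE_MASS)
    found = []
    for start, end in _orfs_one_strand(seq, min_codons):
        found.append(("+", start + 1, end, (end - start) // 3 - 1))
    for start, end in _orfs_one_strand(reverse_complement(seq), min_codons):
        # Map a span on the reverse strand back to top-strand co-ordinates.
        found.append(("-", n - end + 1, n - start, (end - start) // 3 - 1))
    return found
```

Applied to a 768 base-pair sequence such as IS1R, a scan of this kind lists candidate frames only; as the text goes on to argue, whether a candidate is a real gene has to be decided by comparison with the iso-IS1 sequences and by mutation.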
In this paper, we will first describe the construction of hybrid IS1s, which are derived from IS1R by replacing its middle DNA segment with the corresponding segment from IS1D or IS1F. The resulting hybrid IS1s thus contain base substitutions, some of which generate termination codons in the possible coding regions other than insA and insB. We then show that these hybrids can mediate plasmid cointegration, confirming that the reading frames other than insA and insB do not encode any essential proteins responsible for cointegration mediated by IS1. This study is important, since although the existence of insA and insB was predicted from the analysis of iso-IS1 sequences, there is no evidence that base changes seen in the iso-IS1 sequences are functionally neutral, that is, allow these iso-IS1s to mediate plasmid cointegration. Secondly, we will describe the isolation of mutants of IS1R using a site-directed mutagenesis technique and show that mutants having nonsense mutations within insA and insB can no longer mediate cointegration of two plasmids harbored within a suppressor-negative strain. This should give direct evidence that the insA and insB coding frames encode essential proteins responsible for cointegration mediated by IS1. In the accompanying paper (Machida et al., 1984), we describe locations and properties of two transcriptional promoters within IS1R, one of which is for both insA and insB genes. The results presented both in this and the accompanying paper show that IS1 is uniquely different in various respects from other IS and Tn elements.
2. Materials and Methods
(a) Bacteria and plasmids
Bacteria used were Escherichia coli K12 strains JE5507 and C600 (Machida et al., 1982b) and XA-2 (Brenner & Beckwith, 1965). Plasmids pMZ71, pYM144, pYM103, pYM141 and ColE1 have been described (Machida et al., 1982b). Other plasmids designated pYM were prepared as described in this paper. Plasmids pH0401 (H. Ohtsubo et al., 1981) and pWD5 (unpublished) were derivatives of pBR322 and contained the sequence of IS1D and IS1F, respectively.
(b) Enzymes
The restriction endonuclease PstI and DNA polymerase I (Klenow) were purchased from Boehringer-Mannheim. The restriction endonucleases BstEII and Tth111I were purchased from New England Biolabs. The restriction endonucleases SstII, HaeIII, HinfI and BstNI, and phage T4 DNA ligase were purchased from Bethesda Research Laboratories. The reaction conditions used for these enzymes were as recommended by the supplier.
(c) Preparation of plasmid DNA
Covalently closed circular plasmid DNA molecules used for constructing new plasmid genomes were purified in a CsCl solution as described (Ohtsubo et al., 1978). Restriction fragments of different sizes in the digests of plasmid DNA were separated electrophoretically in a 0.7% (w/v) agarose gel or a 4 to 8% (w/v) polyacrylamide gel and were visualized by staining with ethidium bromide. We used the crude lysis method (Machida et al., 1982a) in order to isolate quickly a small amount of plasmid DNA in bacterial cells from a large number of clones and to examine their structures, which can be checked by the use of restriction endonucleases.
(d) Construction of pYM161 carrying an IS1R-IS1D hybrid from pMZ71
The PstI-BstEII fragment of IS1D was isolated from pH0401 DNA, which carried IS1D (H. Ohtsubo et al., 1981). This fragment was then ligated with the large fragment of pMZ71 DNA, which had been double-digested with PstI and BstEII (see structure of pMZ71 shown in Fig. 3). The ligated DNA sample was transformed into C600 essentially according to the method of Mandel & Higa (1970). The tetracycline-resistant transformants were selected on L-agar plates containing 6 µg tetracycline/ml. The plasmid isolated from one of the tetracycline-resistant transformants was named pYM161. The presence of the IS1D sequence in pYM161 was checked by cleavage analysis of the plasmid DNA using restriction enzymes, including HaeIII, which cleaves the PstI-BstEII fragment of IS1R 3 times but the PstI-BstEII fragment of IS1D twice (see Fig. 2).
(e) Construction of plasmids pYM164 and pYM166 carrying IS1R-IS1F hybrids from pYM144
To construct pYM164, the BstEII-Tth111I fragment of IS1F was isolated from pWD5 carrying IS1F. This fragment was then ligated with the larger fragment obtained from double digests of pYM144 DNA with BstEII and Tth111I (see Figs 2 and 3) and the ligated sample was transformed into C600 as described above. The plasmid isolated from one of the tetracycline-resistant transformants was named pYM164. To construct pYM166, the BstEII-Tth111I fragment of IS1F isolated above was partially digested with HinfI and the HinfI-Tth111I fragment corresponding to 569 to 712 was saved (see Fig. 2). We then prepared the BstEII-HinfI fragment of IS1R corresponding to 328 to 568 from IS1R in pYM144 DNA and saved this fragment also (see Fig. 2). These two fragments were then ligated with the larger fragment of the pYM144 DNA that had been digested with BstEII and Tth111I (Figs 2 and 3). The ligated sample DNA was transformed into C600 as described. The plasmid isolated from one of the tetracycline-resistant transformants was named pYM166. The presence of the IS1F sequence in pYM164 and pYM166 was checked by the cleavage analysis of plasmid DNAs using restriction enzymes, including BstNI, which cleaves IS1R once but not IS1F (see Fig. 2).
(f) Construction of pYM205 and pYM200 from pMZ71
A portion (2 µg) of the PstI digest of pMZ71 DNA was treated with 3 M-sodium bisulfite (pH 6.0) at 37°C for 16 h according to the method described by Shortle & Nathans (1978). The molecules were then recircularized by treatment with 2 units of T4 DNA ligase at 0°C for 16 h. DNA was digested with PstI again in order to remove the molecules still containing the intact PstI site, which was not mutagenized by sodium bisulfite. One-fourth (or 0.5 µg) of the DNA sample was transformed into C600. Plasmids in 5 out of 24 tetracycline-resistant transformants obtained were found to have lost the PstI cleavage site, as examined by the crude lysis method. Two plasmids so examined were named pYM200 and pYM205 and are reported in Results. Nucleotide sequencing analysis showed that no base substitution other than those described in Results was present within about 100 base-pairs around the mutated PstI site.
(g) Construction of pYM210 and pYM211 from pYM144
A sample (2 µg) of the 240 base-pair HaeIII fragment (corresponding to 319 to 558 of IS1R), which included the BstEII site (at 327 to 332), was first prepared (see Fig. 5). The fragment was then treated with 10 units of E. coli DNA polymerase I (Klenow) in the presence of TTP in a reaction mixture containing 60 mM-Tris·HCl (pH 8.0), 6 mM-MgCl2, 1 mM-dithiothreitol, 0.3 mM-TTP at 12°C for 1 h. As shown in Fig. 5, this treatment allows the generation of short single-stranded portions at both ends of the HaeIII fragment by utilization of the 3' to 5' exonuclease activity of DNA polymerase I (Lehman & Richardson, 1964). After extractions with phenol and ether, the sample was dialyzed against SSC (SSC is 0.15 M-NaCl, 0.015 M-sodium citrate) and then treated with sodium bisulfite as described (Shortle & Nathans, 1978). Sodium bisulfite should convert cytosine residues in the single-stranded portions of the polymerase-treated HaeIII fragment to uracil residues. The single-stranded regions of the fragments were then filled in to make blunt ends with E. coli polymerase I (Klenow), this time in the presence of 4 dNTPs (Machida et al., 1982b). This treatment should convert the original G·C base-pairs at ends of the HaeIII fragment to A·U base-pairs. The mutagenized fragment was then digested with BstEII and the larger fragment, which contained only one mutated end, was purified by electrophoresis in a 4% polyacrylamide gel. The mutagenized fragment was put back into the IS1R sequence in pYM144, as follows (see Fig. 5). pYM144 DNA was double-digested with BstEII and Tth111I and the larger fragment was saved. The smaller fragment in the digest was further digested with HaeIII and the resulting HaeIII-Tth111I fragment was also saved: 0.3 µg and 0.5 µg of these two fragments, respectively, were mixed with 0.2 µg of the mutagenized fragment (prepared as described above), and they were then treated with T4 DNA ligase. The ligated sample was transformed into C600. Plasmids in the 80 independent tetracycline-resistant transformants were examined by digestion with BstEII, Tth111I and HaeIII by the crude lysis method. Only 4 plasmids lost the original HaeIII cleavage site but retained the BstEII and Tth111I sites. Two of those plasmids were named pYM210 and pYM211 and are reported in Results.
(h) Analysis of frequency of plasmid cointegration
E. coli strains carrying ColE1 and pMZ71 (or pYM plasmid) were prepared by transformation of the two plasmids as described (Kretchmer et al., 1975; E. Ohtsubo et al., 1980; Machida et al., 1982b).
After selecting the tetracycline-resistant transformants on an L-agar plate containing 6 µg tetracycline/ml at 30°C, the presence of the 2 plasmids in each transformant was checked by the crude lysis method. Cells carrying cointegrates were selected at 42°C on an L-agar plate containing 10 to 15 µg tetracycline/ml (E. Ohtsubo et al., 1980,1981; Machida et al., 1982b). The frequency of formation of cointegrates was determined by a fluctuation test as described (Luria & Delbrück, 1943; E. Ohtsubo et al., 1980,1981; Machida et al., 1982b). In order to determine 2 types of cointegrates (namely, types A and B, as explained in Results), we collected a number of independent colonies (6 to 23 colonies), which were temperature-resistant and tetracycline-resistant as well, such that each colony was from an independent culture. We then examined cointegrate structure by size and cleavage analysis with a restriction enzyme PstI, BstEII, SstII or Tth111I using the crude lysis method (see Fig. 3 and also Machida et al., 1982b).
(i) DNA sequencing
The Maxam & Gilbert (1980) method was used for determining the nucleotide sequences of the mutated regions in IS1. The apparatus and conditions used in our experiments have been described (Ohtsubo & Ohtsubo, 1978; Rosen et al., 1980).
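The bisulfite steps in sections (f) and (g) rest on one rule: cytosine in a single-stranded stretch is converted to uracil, so that after filling in and replication a C on the exposed strand reads as T, and a G paired with an exposed C on the opposite strand reads as A. The sketch below applies this rule to a top-strand sequence. It is a simplified illustration that assumes complete conversion at every exposed position; the function names and the index-based interface are hypothetical, while the three example codon changes are the ones reported in Results.

```python
STOP_NAMES = {"TAG": "amber", "TAA": "ochre", "TGA": "opal"}


def bisulfite_product(top, exposed_top=(), exposed_bottom=()):
    """Return the top strand expected after deamination and fill-in.

    top            -- top-strand sequence, 5' to 3'
    exposed_top    -- 0-based indices where the top strand is single-stranded
                      (C is read as T afterwards)
    exposed_bottom -- 0-based top-strand indices where the bottom strand is
                      single-stranded (a bottom C pairs with a top G, so the
                      top strand ends up with A)
    """
    bases = list(top.upper())
    for i in exposed_top:
        if bases[i] == "C":
            bases[i] = "T"
    for i in exposed_bottom:
        if bases[i] == "G":
            bases[i] = "A"
    return "".join(bases)


def stop_name(codon):
    """Name a termination codon (amber, ochre or opal), or return None."""
    return STOP_NAMES.get(codon.upper().replace("U", "T"))


# PstI site CTGCAG at 176 to 181: C at 179 exposed on the top strand.
pym205_site = bisulfite_product("CTGCAG", exposed_top=[3])
# Same site: G at 178 and 181 paired with exposed C on the bottom strand.
pym200_site = bisulfite_product("CTGCAG", exposed_bottom=[2, 5])
# Trp codon at 556 to 558: G at 557 and 558 paired with exposed bottom C.
pym210_codon = bisulfite_product("TGG", exposed_bottom=[1, 2])
```

In this model `pym205_site[3:]` is the amber codon TAG and `pym210_codon` is the ochre codon TAA, which is the outcome the mutagenesis was designed to produce.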
3. Results
Plasmid pMZ71 is a derivative containing an IS1R insertion from a temperature-sensitive replication mutant of the tetracycline resistance plasmid pSC101 (E. Ohtsubo et al., 1981; Machida et al., 1982b; Hashimoto & Sekiguchi, 1976). Another plasmid, pYM144, is a derivative of pMZ71 and deletes one of the two Tth111I restriction endonuclease cutting sites outside IS1R without altering any other genetic characters of pMZ71 (Machida et al., 1982b). In this study, we used pMZ71 and pYM144 to assay the activity of IS1R utilizing the genetic characters of these plasmids, as will be described below in detail. Furthermore, pMZ71 and pYM144 each contains a single cleavage site for PstI and BstEII inside IS1R; also, pYM144 contains a single cleavage site for Tth111I within IS1R because of deletion of the second cleavage site located outside IS1R in pMZ71 (see Fig. 1; also Fig. 3). Therefore, these plasmids are also very useful for constructing plasmids containing IS1R-iso-IS1 hybrids, and for site-directed mutagenesis, as will be demonstrated below.
(a) Hybrid IS1s and their cointegration ability
(i) An IS1R-IS1D hybrid
The iso-IS1 element IS1D from the S. dysenteriae chromosome has been found to contain ten base substitutions when compared to IS1R (H. Ohtsubo et al., 1981). We constructed a plasmid, named pYM161, carrying an IS1R-IS1D hybrid, by replacing the PstI-BstEII fragment of IS1R in plasmid pMZ71 with the corresponding fragment of IS1D, as described in Materials and Methods. Figure 2(b), section a shows the nucleotide sequence of the fragment of IS1R as well as that of IS1D, showing that the IS1D fragment contained only six base substitutions compared to IS1R. As shown at the top of Figure 2 and also Figure 2(b), section a, the resulting hybrid IS1 encodes the E, F and B' reading frames, all of which contain termination codons. The hybrid IS1, however, encodes the insA reading frame without any termination codons but with two missense codons causing amino acid substitutions (see Fig. 2(b), section a).
(ii) IS1R-IS1F hybrids
IS1F from the S. flexneri chromosome contains 74 base substitutions compared to IS1R. We constructed two different IS1R-IS1F hybrids. Our primary reason for constructing these hybrids was analysis of the dispensability of the D reading frame, since some base substitutions in IS1F introduce termination codons into the D frame (see Fig. 1). The first plasmid, named pYM164, carrying an IS1R-IS1F hybrid was constructed by replacing the BstEII-Tth111I fragment of IS1R in plasmid pYM144 with the corresponding fragment of IS1F (see Materials and Methods). Figure 2(b), section b shows the nucleotide sequence of the BstEII-Tth111I fragment of IS1R as well as that of IS1F, showing that the IS1F fragment contains 37 base-pair substitutions out of 74 base substitutions seen in the entire IS1F sequence. As shown at the top of Figure 2 and also Figure 2(b), section b, the resulting hybrid IS1 encodes the D and E reading frames, both of which contain termination codons. The hybrid IS1 also encodes the insB reading frame. This frame does not contain any termination codons but there are eight missense codons causing amino acid substitutions, as shown in Figure 2(b), section b. Note, however, that the insB reading frame of IS1F could be longer than insB in IS1R by extending the amino-terminal end with six amino acids, because of generation of a new initiation codon GUG in IS1F upstream from the initiation codon AUG for InsB, as shown in Figure 2(b), section b.
We constructed plasmid pYM166 carrying a second IS1R-IS1F hybrid which contained a shorter IS1F sequence, namely the HinfI-Tth111I fragment corresponding to 569 to 712 for the IS1R sequence, as described in Materials and Methods. The IS1F sequence in the hybrid IS1 is only a part of the BstEII-Tth111I fragment of IS1F, as shown in Figure 2(b), section c (boxed region), because the hybrid IS1 does not include the region encoding the amino-terminal end of insB characteristic of IS1F. The resulting hybrid IS1 thus contains 13 base substitutions, 12 of which do not alter the amino acid sequence encoded by insB and only one of which causes an amino acid substitution in insB of IS1R; two of them introduce termination codons in the D reading frame (see Fig. 2(b), section c, and top of Fig. 2).

FIG. 2. (a) Structures of 3 hybrid IS1s that encode reading frames having missense and nonsense mutations. The sawtooth lines, labeled a, b and c, indicate the portions substituted by IS1D or IS1F, the nucleotide sequences of which are shown in (b). Positions of base substitutions that caused missense and nonsense mutations are shown by arrows and ×, respectively, on the possible reading frames. (b) Nucleotide sequences of portions of IS1R and of the corresponding portions of IS1D and IS1F. a, PstI-BstEII region of IS1R and IS1D; b, BstEII-Tth111I region of IS1R and IS1F; c (boxed region), HinfI-Tth111I region of IS1R and IS1F. Substituted base-pairs in iso-IS1 are only shown below the IS1R sequence. The amino acid sequences encoded by insA and insB seen in IS1R are shown above the nucleotide sequence. Altered amino acids due to base substitutions in iso-IS1s are shown above the amino acid sequences of InsA and InsB of IS1R. The base substitutions, which are seen in iso-IS1s and introduced termination codons in E, F, D and B' reading frames, are indicated with explanations. It should be noted that the IS1R sequence shown in section a contains an HaeIII site at 281 to 264, but this site is not in the IS1D sequence, while the IS1R sequence, shown in sections b and c, contains a BstNI site at 654 to 658, but this site is not in the IS1F sequence. Absence of these 2 cleavage sites from iso-IS1s was useful for identifying the presence of the iso-IS1 sequence in the hybrid IS1s as described in Materials and Methods. Sequence hyphens have been omitted for clarity.

(iii) Genetic characterization of hybrid IS1s
(1) Explanation of the plasmid cointegration system used for assaying the activity of IS1
We have demonstrated that plasmid pMZ71 or pYM144 can integrate into a second plasmid ColE1 to give large cointegrate plasmids. These cointegrates can be isolated readily from a population of cells harboring the two parental plasmids, pMZ71 (or pYM144) and ColE1, by simply selecting the tetracycline resistance character of pMZ71 (or pYM144) and temperature-resistant DNA replication system of ColE1 (E. Ohtsubo et al., 1980,1981; H. Ohtsubo et al., 1980; Machida et al., 1982b). As shown in Figure 3, cointegrates are found to be of two types; one (a) mediated by IS1 and the other (b) mediated by another insertion sequence IS102, which is a natural component of pSC101, and thus of pMZ71 and pYM144 (H. Ohtsubo et al., 1980; Bernardi & Bernardi, 1981; Machida et al., 1982b). Note that the cointegrates have certain characteristic structures, which consist of a duplication of the IS element that mediates the cointegration event in a direct orientation at the cointegration junctions. Thus, these two types of cointegrates can be distinguished by digesting cointegrate DNA with suitable restriction endonucleases such as PstI and BstEII, each of which cuts IS1 once, or SstII, which cuts IS102 once (see Fig. 3). Since these enzymes cleave the newly generated duplicated IS sequences in a cointegrate, the resulting additional fragment is the linear fragment of pMZ71 or pYM144 (see Fig. 3).
An advantage of this genetic system is that the frequency of formation of cointegrates can be determined by a Luria-Delbrück fluctuation test (E. Ohtsubo et al., 1980,1981; Machida et al., 1982b). We have shown that pMZ71 and pYM144 form cointegrates with ColE1 at a frequency shown in Table 1, lines 1 and 2, although the cointegrates formed are exclusively those mediated by IS1 (Machida et al., 1982b). However, the derivatives of pMZ71 and pYM144 having deletion mutations in the IS1 sequence have been found to form cointegrates mediated by the mutant IS1 at a greatly reduced frequency, 100 to 1000-fold. Table 1, lines 3 and 4, shows previous results from the analysis of two mutants, pYM103, which carries an IS1 mutant having a deletion in the insA coding region, and pYM141, which carries another IS1 mutant having a deletion in the insB coding region. These two mutants form cointegrates mediated by IS102 at a unique frequency, which is higher than that for the formation of cointegrates mediated by a mutant IS1 (Table 1, lines 3 and 4). The cointegration mediated by the mutant IS1 has been assumed to be due to complementation of the mutant IS1 by multiple copies of IS1 elements present in the E. coli chromosome (Machida et al., 1982b).

FIG. 3. Structures of the 2 types of cointegrates formed between pMZ71 and ColE1. The sequences of pMZ71 and ColE1 are shown by continuous and sawtooth lines, respectively. The sequences of IS1 and IS102 are represented by filled and open boxes, respectively. The cointegrate (a) or (b) contains the IS element that mediated cointegration in a direct orientation at junctions between the 2 parental plasmids. Cleavage sites for EcoRI, PstI, BstEII, Tth111I and SstII are indicated by arrows. As the integration site of pMZ71 into ColE1 is not always fixed on the ColE1 sequence but is unique for each cointegrate, the PstI and SstII sites on ColE1 in the cointegrates are shown by the broken arrows (Machida et al., 1982b). Note that cleavage of the cointegrate (a) with PstI or BstEII generates the full-sized linear pMZ71 fragment, because of a direct duplication of IS1, while cleavage of the cointegrate (b) with SstII generates the full-sized linear pMZ71 fragment, because of a direct duplication of IS102. Plasmid pYM144 is a derivative of pMZ71 and lacks the Tth111I site shown in parentheses outside IS1, but can also form the 2 types of cointegrates shown here. kb, 10^3 base-pairs; ori, origin of replication; Tc^r, tetracycline resistance gene.
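The restriction test for cointegrate type follows from simple geometry: an enzyme that cuts the mediating IS element once cuts a cointegrate at the same offset in both directly repeated copies, and the two cuts are separated by exactly one donor-plasmid length. The sketch below works through that arithmetic. It is only an illustration; the function names are hypothetical, and the donor and target lengths are placeholder values, not sizes taken from the paper. The 768 base-pair length of IS1 and a PstI site near position 180 are from the text.

```python
def circular_fragments(total_len, cuts):
    """Return the sorted fragment lengths of a circle cut at the given positions."""
    cuts = sorted({c % total_len for c in cuts})
    if not cuts:
        return [total_len]  # uncut circle
    sizes = []
    for k, c in enumerate(cuts):
        nxt = cuts[(k + 1) % len(cuts)]
        # A single cut linearizes the circle, giving one full-length fragment.
        sizes.append((nxt - c) % total_len or total_len)
    return sorted(sizes)


def cointegrate_fragments(donor_len, target_len, is_len, cut_offset, extra_cuts=()):
    """Digest a cointegrate laid out as IS copy, rest of donor, IS copy, target.

    donor_len includes one copy of the IS element; the cointegrate carries a
    second copy, so its length is donor_len + target_len + is_len.
    cut_offset is the position of the enzyme site within the IS element.
    extra_cuts are optional additional sites, in cointegrate co-ordinates.
    """
    total = donor_len + target_len + is_len
    cuts = [cut_offset, donor_len + cut_offset, *extra_cuts]
    return circular_fragments(total, cuts)


# Placeholder sizes (assumptions): 10000 bp donor, 6600 bp target.
fragments = cointegrate_fragments(donor_len=10000, target_len=6600,
                                  is_len=768, cut_offset=180)
```

With these placeholder values the digest contains one fragment of the full donor length. A cointegrate mediated by the other IS element would be cut by the same enzyme only once within donor sequence, so no donor-length fragment is released.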
(2) Cointegration ability of hybrid IS1s
We examined the cointegration ability of the IS1R-IS1D hybrid present in plasmid pYM161 using the genetic system described above. Table 1, line 5, shows that the hybrid IS1 that had termination codons in the E, F and B' reading frames is as efficient as IS1R in forming cointegrates with ColE1. This result suggests that the three reading frames E, F and B' are not required for the IS1-mediated cointegration. We then examined the cointegration ability of the two IS1R-IS1F hybrids. Table 1, line 6, shows that plasmid pYM164 carrying the hybrid IS1 that contained the BstEII-Tth111I fragment from IS1F (see Fig. 2) formed cointegrates at a low frequency. However, the relative cointegration ability of the hybrid IS1 versus IS102 is higher than that of the deletion mutant in pYM103 or pYM141 versus IS102 (Table 1, lines 3 and 4). This result suggests that the hybrid IS1 is able to form cointegrates but with low efficiency. This low efficiency could be due to the additional amino acids present in the InsB protein of IS1F as mentioned earlier, or due to base substitutions in IS1F that cause eight amino acid substitutions in the InsB protein. As shown in Table 1, line 7, plasmid pYM166 carrying the other IS1R-IS1F hybrid was found to form cointegrates mediated by the hybrid IS1 at a frequency as high as pYM144 carrying wild-type IS1R. The hybrid IS1 contained an IS1F sequence that introduced termination codons in the D reading frame of IS1R but not the insB reading frame. Therefore, the above result suggests that the D reading frame seen in IS1R is not required for plasmid cointegration.

TABLE 1
Cointegration frequency between ColE1 and pYM plasmid carrying a mutant IS1 in various E. coli strains
Frequency, frequency per division cycle (× 10^8). IS1 and IS102, number of examined cointegrates mediated by IS1 (mutant) and by IS102. [?] marks a digit that is illegible in the source.

Plasmid        Mutated coding region in IS1 (mutation)                    Frequency   IS1   IS102

In E. coli JE5507 (no suppressor)
(1) pMZ71†     Wild-type                                                  73          40    0
(2) pYM144†    Wild-type                                                  86          11    0
(3) pYM103†    insA, C and F (20 bp deletion)                             0.38        7     16
(4) pYM141†    insB and D (51 bp deletion)                                0.57        3     4
(5) pYM161     insA (missense), B' (amber), E (opal), F (ochre)           61          11    0
(6) pYM164     insB (missense), D (opal and ochre), E (opal and amber)    0.4[?]      4     2
(7) pYM166     D (opal and ochre)                                         46          9     0

(8) pYM205     insA (amber)                                               0.49        1     [?]
(9) pYM200     C (missense), F (amber)                                    60          12    0
(10) pYM210    insB (ochre), D (amber)                                    0.63        2     19
(11) pYM211    insB and D (1 bp deletion)                                 0.21        3     6

In E. coli C600 (supE, amber suppressor)
(12) pMZ71     Wild-type                                                  550         19    0
(13) pYM103    insA, C and F (20 bp deletion)                             0.49        2     4
(14) pYM205    insA (amber)                                               49          17    0

In E. coli XA-2 (supC, ochre and amber suppressor)
(15) pYM144    Wild-type                                                  900         19    0
(16) pYM141    insB and D (51 bp deletion)                                0.96        2     6
(17) pYM210    insB (ochre), D (amber)                                    2.3         10    10
(18) pYM211    insB and D (1 bp deletion)                                 2.2         2     16

bp, base-pairs.
† These data were published by Machida et al. (1982b).

(b) Isolation of IS1 mutants using a site-directed mutagenesis technique and genetic characterization of mutants obtained
(i) IS1 mutants including an insA amber mutant
Plasmid pMZ71 contains a single cleavage site for PstI in the middle of the insA reading frame where the C and F reading frames also overlap. After cleavage of pMZ71 DNA with PstI, we mutagenized the resulting single-stranded regions using sodium bisulfite in order to convert a cytosine residue to a uracil residue. By the procedures described in detail in Materials and Methods, we were able to obtain two different mutant plasmids from the mutagenized pMZ71 DNA. As summarized in Figure 4, nucleotide sequence analysis showed that one mutant, pYM205, had the substituted base-pair T·A instead of C·G at position 179 within the PstI recognition sequence of IS1. This resulted in conversion of a codon for glutamine (CAG) to the amber termination codon (UAG) in the insA reading frame. This mutation, however, did not cause any changes of the amino acid sequences encoded by the C and F reading frames (see Fig. 4). Another mutant, pYM200, contained two base-pair substitutions within the PstI recognition sequence: an A·T pair instead of the G·C pair at 178, and an A·T pair instead of the G·C pair at 181 (Fig. 4). The substitution at 181 is unexpected but is probably due to "breathing" of the end of double-stranded DNA. These two base substitutions caused no changes in the amino acid sequence encoded by insA (Fig. 4). However, the substitution at 178 resulted in generation of an amber termination codon within the F reading frame (Fig. 4). Furthermore, the two base substitutions caused two non-conservative amino acid changes (Tyr for Cys, and Asn for Ser) in the amino acid sequence encoded by the C reading frame (Fig. 4).
(ii) IS1 mutants including an insB ochre mutant
For technical reasons, we used pYM144 for the isolation of IS1 mutants, including an insB, by mutagenizing the Hue111 recognition sequence at 557 to 560 present in IS1 where the insB and D frames overlap (Fig. 4). As shown in Figure 5, we first isolated the HaeIII fragment corresponding to 319 to 558 of IS1 and mutagenized the Hue111 sites with sodium bisulfite after creating singlestranded regions at both ends of the fragment by the 3’+ 5’ exonuclease associated with E. coli DNA polymerase I. The mutagenized single-stranded 9
FIG. 4. IS1R mutants isolated by site-directed mutagenesis techniques. Critical parts of nucleotide sequences of wild-type IS1R around the PstI site at 176 to 181 and the HaeIII site at 557 to 560 within the IS1R sequence are shown. The amino acid sequences shown are those encoded by insA and insB of IS1R. Positions of the mutated base-pairs in each of the IS1R mutants carried by pYM plasmids are shown above or below the nucleotide sequence of wild-type IS1R. Nonsense or missense mutations generated by the base substitutions in several reading frames are indicated with explanations. At the top of this Figure, approximate positions of missense and nonsense mutations in the several reading frames are represented schematically. The nonsense and missense mutations are shown by × and small arrows on the reading frames, respectively. Sequence hyphens have been omitted for clarity.
regions were then filled in with E. coli DNA polymerase I. This treatment can convert the two G·C pairs at 557 and 558 within the HaeIII site to two A·T pairs, and thus the Trp codon (UGG) at 556 to 558 in the insB reading frame can be converted to an ochre termination codon (UAA). By the procedure described in detail in Materials and Methods, we obtained two mutants. As shown in Figure 4, nucleotide sequence analysis demonstrated that one mutant, pYM210, had two base substitutions that generated an ochre termination codon in the insB reading frame, as expected. Unexpectedly, however, pYM210 was found to contain an additional base substitution of a G·C pair at position 486, although this change did not alter the amino acid sequence encoded by the insB reading frame. Among the mutations seen in pYM210, the substitution at 557 introduced an amber termination codon in the D frame, as shown in Figure 4. A second mutant, pYM211, contained an A·T pair instead of the G·C pair at 557, but had unexpectedly lost the G·C pair at 558, as shown in Figure 4. This deletion caused frameshifts in both the insB and D frames.
(iii) Cointegration abilities of the IS1 mutants having nonsense mutations and their suppression
We examined the cointegration ability of the above IS1 mutants first in the absence of suppressor genes, using E. coli strain JE5507. Table 1, line 8, shows
[Figure 5 diagram, labels recoverable from the original: the HaeIII fragment of pYM144 is treated with sodium bisulfite, filled in with DNA polymerase I and 4 dNTPs, cleaved with BstEII, and ligated with the BstEII-Tth111I large fragment of pYM144, followed by transformation.]
FIG. 5. Mutagenesis at blunt ends generated by the HaeIII digestion. For explanations, see Materials and Methods. Sequence hyphens have been omitted for clarity.
that pYM205 carrying an amber mutation in the insA reading frame formed cointegrates with ColE1 at a greatly reduced frequency. The cointegrates formed were predominantly those mediated by IS102. This result indicates that the IS1 mutant carried by pYM205 had lost its cointegration ability because of the mutation in insA, and therefore suggests that the insA reading frame encodes a protein that is essential for plasmid cointegration. Another mutant of pMZ71, pYM200, which contained two base substitutions (causing an amber termination codon in the F reading frame and introducing two missense mutations in the C reading frame), showed no difference in frequency of cointegration mediated by the IS1 mutant (Table 1, line 9). This result confirms the observation described in the previous section that the F reading frame is not required for cointegration. As mentioned earlier, the two missense mutations caused non-conservative changes of amino acids (namely Tyr for Cys, and Asn for Ser) encoded by the C reading frame (Fig. 4). Therefore, the result given above also suggests that the C reading frame in IS1R is not required for cointegration. Table 1, line 10, shows that pYM210 carrying an ochre termination codon in the insB reading frame and an amber termination codon in the D reading frame
formed cointegrates at a greatly reduced frequency. The cointegrates formed were predominantly those mediated by IS102. This suggests that insB or the D frame, or both, is essential for plasmid cointegration. However, since the D reading frame was shown above not to be required for cointegration, the result indicates that the mutation in the insB reading frame is responsible for the poor cointegration ability of the mutant IS1. Therefore, the result suggests that the insB reading frame encodes a protein that is also essential for plasmid cointegration. A mutant, pYM211, derived from pYM144, contained a frameshift mutation in both the insB and D reading frames. Cointegration analysis showed that the mutation caused the same effects as the deletions in pYM103 and pYM141 (Table 1, line 11). By the reasoning given above, this could be due to the frameshift in the insB reading frame, which is essential for cointegration.
Secondly, we examined the cointegration ability of IS1 mutants in pYM205 (insA amber), pYM210 (insB ochre) and pYM211 (insB frameshift) in the presence of suppressor genes, using E. coli strains C600 (supE, suppressor of amber mutation) and XA-2 (supC, suppressor of ochre and amber mutations). As a control, the cointegration ability of wild-type IS1R and mutant IS1s having deletions was also examined in the same strains, since the frequency of cointegration mediated by IS1 differs depending upon the host strain used (E. Ohtsubo et al., 1981). As shown in Table 1, lines 12 to 14, pYM205 carrying insA amber formed cointegrates mediated by the IS1 mutant in C600 at a frequency higher than that for pYM103 carrying a deletion in insA, but lower than that for pMZ71 carrying wild-type IS1. This suggests that the amber mutation in insA was partially suppressed by supE.
Table 1, lines 15 to 18, shows that the frequency of cointegration of pYM210 carrying insB ochre remained low in XA-2, at almost the same level as that of pYM141 carrying a deletion mutation in insB. Note, however, that the ratio of cointegrates mediated by the IS1 mutant to cointegrates mediated by IS102 was higher than that seen in the analysis of pYM141 carrying a deletion within insB (compare lines 16 and 17). pYM211, like pYM141, also gave rise to cointegrates mediated by the IS1 mutant having the insB frameshift at a lower frequency than those mediated by IS102 (compare lines 17 and 18). These results indicate that the ochre mutation in insB was suppressed in XA-2, but apparently with a very low efficiency. The suppressor in XA-2 is known to suppress the ochre mutation, which in the present case was derived from a codon for tryptophan, and to insert a tyrosine residue at the ochre termination codon. It is therefore possible that the substitution of tyrosine for tryptophan in the InsB protein did not completely restore the activity of the InsB protein to mediate cointegration.
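The codon changes reported in this section can be checked with a short sketch. It assumes that the PstI recognition sequence CTGCAG at 176 to 181 is read in the insA frame as the two codons CTG and CAG (inferred from the statement that the change at 179 creates an amber codon), and that the insB tryptophan codon occupies 556 to 558. The helper names `substitute` and `read_codons` are hypothetical and used only for illustration.

```python
# Illustrative sketch; helper names and data layout are assumptions, not from the paper.

STOP_NAMES = {"TAG": "amber", "TAA": "ochre", "TGA": "opal"}

# Only the codons needed for this example (standard genetic code).
AMINO_ACIDS = {"CTG": "Leu", "CTA": "Leu", "CAG": "Gln", "CAA": "Gln", "TGG": "Trp"}


def substitute(seq, first_pos, changes):
    """Apply base substitutions given as {IS1 coordinate: new base}."""
    bases = list(seq)
    for pos, new_base in changes.items():
        bases[pos - first_pos] = new_base
    return "".join(bases)


def read_codons(seq):
    """Name each codon of seq as an amino acid or a termination codon."""
    names = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        names.append(STOP_NAMES.get(codon) or AMINO_ACIDS[codon])
    return names


pst_site = "CTGCAG"  # positions 176 to 181, assumed insA frame: CTG CAG
pym205 = substitute(pst_site, 176, {179: "T"})            # C.G -> T.A at 179
pym200 = substitute(pst_site, 176, {178: "A", 181: "A"})  # G.C -> A.T at 178 and 181

trp_codon = "TGG"  # positions 556 to 558 in the insB frame
pym210 = substitute(trp_codon, 556, {557: "A", 558: "A"})  # G.C -> A.T at 557 and 558

print("wild type insA:", read_codons(pst_site))
print("pYM205 insA:", read_codons(pym205))
print("pYM200 insA:", read_codons(pym200))
print("pYM210 insB:", read_codons(pym210))
```

Under these assumptions the sketch reproduces the reported outcomes for insA and insB only: an amber codon in pYM205, no amino acid change in pYM200, and an ochre codon in pYM210. The effects on the C, D and F frames are not modelled, because their phases are not given in this section.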
4. Discussion
In this paper, we have shown that two putative coding frames, insA and insB, in IS1R are two structural genes encoding proteins. The experimental results suggest
that possibly both of these genes are required for plasmid cointegration mediated by IS1R. Alternatively, the results can be interpreted to mean that insA is not actually required for cointegration but insB is absolutely required, because mutations in insA might have a polar effect on the expression of insB, located downstream from insA, resulting in a loss of cointegration ability of the IS1R mutants with mutations in insA. However, since no obvious ρ-dependent transcriptional termination sequences, which are primarily responsible for polarity (De Crombrugghe et al., 1973), are found immediately downstream from the PstI site that was used for mutagenizing the insA coding frame, it is reasonable to assume that both insA and insB are required for cointegration mediated by IS1R. We have also shown in this paper that the other possible reading frames seen in IS1R are not involved in the IS1-mediated cointegration event at all, even if they encode proteins. Examination of the nucleotide sequence has shown that insA could encode a protein consisting of 91 amino acids, while insB could encode a protein consisting of 125 amino acids (H. Ohtsubo et al., 1981). The InsA protein would contain more basic amino acids (8 arginine and 4 lysine residues) than acidic amino acids (2 aspartic acid residues), while the InsB protein would contain predominantly basic amino acids (13 arginine and 8 lysine residues) and some acidic amino acids (5 aspartic acid and 5 glutamic acid residues). Thus, both InsA and InsB proteins are rather basic in nature and seem quite likely to interact with DNA. We have shown that the terminal inverted repeats at the left and right ends of IS1R (named insL and insR) are absolutely required for the cointegration event mediated by IS1 (Machida et al., 1982b). Therefore, it is likely that the InsA and InsB proteins recognize and bind the terminal inverted repeats of IS1 in order for the cointegration event to proceed, probably in co-operation with some cellular proteins.
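The statement that both proteins are rather basic follows from simple arithmetic on the residue counts quoted above, sketched here. Only the residue types listed in the text are counted; histidine is ignored, and no glutamic acid count is given for InsA, so none is included. Both choices are assumptions of this sketch, not values from the paper.

```python
# Residue counts as quoted in the text; residue types not listed there are omitted.
residues = {
    "InsA": {"Arg": 8, "Lys": 4, "Asp": 2},
    "InsB": {"Arg": 13, "Lys": 8, "Asp": 5, "Glu": 5},
}

BASIC = {"Arg", "Lys"}
ACIDIC = {"Asp", "Glu"}


def basic_excess(counts):
    """Number of basic residues minus number of acidic residues."""
    basic = sum(n for aa, n in counts.items() if aa in BASIC)
    acidic = sum(n for aa, n in counts.items() if aa in ACIDIC)
    return basic - acidic


excess = {name: basic_excess(counts) for name, counts in residues.items()}
print(excess)
```

With the quoted counts, basic residues outnumber the listed acidic residues by 10 in InsA (12 against 2) and by 11 in InsB (21 against 10), which is the sense in which both proteins are described as basic.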
In the region preceding the initiation codon (GUG) of insA, there is a good Shine-Dalgarno sequence, GAGGTGcTC, while there is no recognizable Shine-Dalgarno sequence in the region preceding the initiation codon (AUG) of insB. This suggests that insB may be poorly expressed, so that only a small amount of the InsB protein is produced. This, in turn, may tend to limit the rate of translocation of IS1 within bacterial cells, as discussed elsewhere (H. Ohtsubo et al., 1981). We describe in the accompanying paper that the left terminal inverted repeat (insL) serves as a promoter for E. coli RNA polymerase to initiate RNA synthesis in vitro, but that there is no promoter in the middle region of the IS1R sequence. The promoter is located upstream from the insA coding frame, indicating that the insA and insB coding frames, which have the same orientation, are transcribed polycistronically. Interestingly, however, we also describe how the right terminal inverted repeat sequence serves as a promoter to initiate synthesis of RNA, possibly as a message for the D reading frame seen in IS1R. We discuss the role of this promoter in the accompanying paper. It has been reported that several other IS elements, such as IS10, IS50 and IS903, which are components of transposons Tn10(tet), Tn5(kan) and Tn903(kan), respectively, could encode proteins responsible for transposition of these elements, since deletions within the flanking IS elements reduced the frequency of transposition of the Tn elements (Kleckner et al., 1981; Rothstein et
al., 1981; Grindley & Joyce, 1981). Nucleotide sequence analysis of these IS elements has shown that they contain several possible reading frames. Unlike IS1, they all contain one particularly large open frame that covers most of each sequence. In addition, the nucleotide sequences of various other IS elements, such as IS2 (Ghosal et al., 1979), IS4 (Klaer et al., 1981), IS5 (Engler & van Bree, 1981) and IS102 (Bernardi & Bernardi, 1981), have shown that each IS element contains a large open reading frame as well as smaller reading frames that overlap the large one. In these elements, a protein that is probably encoded by the large reading frame seen in IS50 flanking Tn5, in IS4, or in IS5 has been identified (Rothstein et al., 1981; Trinks et al., 1981; Rack et al., 1982). It has been suggested that the large reading frame of the IS50 that flanks Tn5 is responsible for the transposition of Tn5 (Rothstein & Reznikoff, 1981). The genetic structure of IS1 as elucidated in this paper seems to be unique and different from that of IS50 and other elements.
We thank H. Ohtsubo, D. Davison and K. Armstrong for critical reading of the manuscript. This work was supported by grants to E.O. and H.O. from the National Institutes of Health, U.S.A.
REFERENCES
Bernardi, A. & Bernardi, F. (1981). Nucl. Acids Res. 9, 2905-2911.
Brenner, S. & Beckwith, J. R. (1965). J. Mol. Biol. 13, 629-637.
De Crombrugghe, B., Adhya, S., Gottesman, M. & Pastan, I. (1973). Nature, New Biol. 241, 260-264.
Engler, J. A. & van Bree, M. P. (1981). Gene, 14, 155-163.
Fiandt, M., Szybalski, W. & Malamy, M. H. (1972). Mol. Gen. Genet. 119, 223-231.
Ghosal, D., Sommer, H. & Saedler, H. (1979). Nucl. Acids Res. 6, 1111-1122.
Grindley, N. D. F. & Joyce, C. M. (1981). Cold Spring Harbor Symp. Quant. Biol. 45, 125-143.
Hashimoto, T. & Sekiguchi, M. (1976). J. Bacteriol. 127, 1561-1563.
Hirsch, H. J., Starlinger, P. & Brachet, P. (1972). Mol. Gen. Genet. 119, 191-206.
Johnsrud, L. (1979). Mol. Gen. Genet. 169, 213-218.
Klaer, R., Kuhn, S., Tilman, E., Fritz, H. J. & Starlinger, P. (1981). Mol. Gen. Genet. 181, 169-175.
Kleckner, N., Foster, T. J., Davis, M. A., Hanley-Wang, S., Helling, S. M., Lundblad, V. & Takeshita, K. (1981). Cold Spring Harbor Symp. Quant. Biol. 45, 225-238.
Kretchmer, P. J., Chang, A. C. & Cohen, S. N. (1975). J. Bacteriol. 124, 225-231.
Lehman, I. R. & Richardson, C. C. (1964). J. Biol. Chem. 239, 233-241.
Luria, S. E. & Delbrück, M. (1943). Genetics, 28, 491-511.
Machida, C., Machida, Y. & Ohtsubo, E. (1984). J. Mol. Biol. 177, 247-267.
Machida, Y., Machida, C. & Ohtsubo, E. (1982a). Cell, 30, 29-36.
Machida, Y., Machida, C., Ohtsubo, H. & Ohtsubo, E. (1982b). Proc. Nat. Acad. Sci., U.S.A. 79, 277-291.
Mandel, M. & Higa, A. (1970). J. Mol. Biol. 53, 159-162.
Maxam, A. & Gilbert, W. (1980). Methods Enzymol. 65, 499-560.
Nyman, K., Nakamura, K., Ohtsubo, H. & Ohtsubo, E. (1981). Nature (London), 289, 609-612.
Ohtsubo, E., Rosenbloom, M., Schrempf, H., Goebel, W. & Rosen, J. (1978). Mol. Gen. Genet. 159, 131-141.
Ohtsubo, E., Zenilman, M. & Ohtsubo, H. (1980). Proc. Nat. Acad. Sci., U.S.A. 77, 750-754.
Ohtsubo, E., Zenilman, M., Ohtsubo, H., McCormick, M., Machida, C. & Machida, Y. (1981). Cold Spring Harbor Symp. Quant. Biol. 45, 283-296.
Ohtsubo, H. & Ohtsubo, E. (1978). Proc. Nat. Acad. Sci., U.S.A. 75, 615-619.
Ohtsubo, H., Zenilman, M. & Ohtsubo, E. (1980). J. Bacteriol. 144, 131-140.
Ohtsubo, H., Nyman, K., Doroszkiewicz, W. & Ohtsubo, E. (1981). Nature (London), 292, 640-643.
Rack, B., Lusky, M. & Hable, M. (1982). Nature (London), 297, 124-128.
Rosen, J., Ryder, T., Inokuchi, H., Ohtsubo, H. & Ohtsubo, E. (1980). Mol. Gen. Genet. 179, 527-537.
Rothstein, S. J. & Reznikoff, W. S. (1981). Cell, 23, 191-200.
Rothstein, S. J., Jorgensen, R. A., Yin, J. C.-P., Yong-Di, Z., Johnson, R. C. & Reznikoff, W. S. (1981). Cold Spring Harbor Symp. Quant. Biol. 45, 99-113.
Shortle, D. & Nathans, D. (1978). Proc. Nat. Acad. Sci., U.S.A. 75, 2170-2174.
Trinks, K., Harberman, P., Beyreuther, K., Starlinger, P. & Ehring, R. (1981). Mol. Gen. Genet. 182, 183-188.
Edited by G. A. Gilbert