Gene, 17 (1982) 299-310
Elsevier Biomedical Press
Analysis of plasmid genome evolution based on nucleotide-sequence comparison of two related plasmids of Escherichia coli
(Antibiotic-resistance plasmids R1 and R100; synonymous base changes; gene conversion)
Thomas B. Ryder, Daniel B. Davison, Jonathan I. Rosen *, Eiichi Ohtsubo and Hisako Ohtsubo
Department of Microbiology, School of Medicine, State University of New York at Stony Brook, Stony Brook, NY 11794 (U.S.A.)
(Received September 23rd, 1981)
(Accepted January 26th, 1982)
SUMMARY
Plasmid Rsc13, a small derivative of the plasmid R1, contains a region necessary for replication as well as a complete copy (4957 bp) of the ampicillin resistance transposon, Tn3. We determined the nucleotide sequence of the replication region of Rsc13 to be 2937 bp and then compared this region (designated the 2.9-kb region) to the analogous region of pSM1, a small derivative of the plasmid R100 which has common ancestry with R1. Rsc13 and pSM1 were 96% homologous in this 2.9-kb region except for a discrete region of about 250 bp which showed only 44% homology. The sequence and distribution of nucleotide substitutions between Rsc13 and pSM1 supported a map of possible genes and sites which have previously been seen in the replication region of pSM1. Most of these genes and sites were shown to evolve by single base substitutions, but not by insertions or deletions. However, one of these genes, termed repA2 (11000 daltons), was found in that region of Rsc13 and pSM1 which showed only 44% homology. Analysis of the amino acid sequence and predicted conformation of the two RepA2 polypeptides, however, suggested that they were very similar. We proposed that the repA2 region of R1 and R100 was replaced by a substitution of a short DNA segment from another plasmid which was evolutionarily related to R1 and R100 but had more divergence. This event may have been mediated by a mechanism similar to that of gene conversion as described in eukaryotic systems.
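The homology percentages and the distinction between synonymous and amino-acid-changing substitutions used in this summary can be illustrated with a short sketch. The Python below is only an illustration under stated assumptions: the two sequences are already aligned, of equal length, in frame, and free of gaps. The function names and the toy sequences are hypothetical and are not taken from the Rsc13 or pSM1 data, and this is not the software used in the paper. The counts passed to `identity_from_counts` at the end are the 95 substitutions in 2788 bp quoted in RESULTS for the homologous region.

```python
from itertools import product

# Standard genetic code; codons are enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def identity_from_counts(n_differences: int, length: int) -> float:
    """Percent identity given a substitution count over an ungapped length."""
    if length <= 0 or not 0 <= n_differences <= length:
        raise ValueError("need length > 0 and 0 <= n_differences <= length")
    return 100.0 * (length - n_differences) / length


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of matching positions in two pre-aligned, ungapped sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(a != b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return identity_from_counts(differences, len(seq_a))


def classify_codon_changes(cds_a: str, cds_b: str) -> tuple[int, int]:
    """Count differing codons as (synonymous, amino-acid-changing)."""
    if len(cds_a) != len(cds_b) or len(cds_a) % 3 != 0:
        raise ValueError("coding sequences must be in frame and equal in length")
    synonymous = changing = 0
    for start in range(0, len(cds_a), 3):
        codon_a = cds_a[start:start + 3].upper()
        codon_b = cds_b[start:start + 3].upper()
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1   # same amino acid, e.g. a third-position change
        else:
            changing += 1     # different amino acid
    return synonymous, changing


# Hypothetical toy sequences (not from the paper).
toy_a = "ATGGCTAAAGAT"  # Met Ala Lys Asp
toy_b = "ATGGCCAGAGAT"  # Met Ala Arg Asp

print(f"toy identity: {percent_identity(toy_a, toy_b):.1f}%")
print("toy (synonymous, changing):", classify_codon_changes(toy_a, toy_b))
print(f"homologous region: {identity_from_counts(95, 2788):.1f}%")
```

Counting whole codons rather than individual bases keeps the sketch short; a codon with two changed bases is counted once, which is a simplification.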
INTRODUCTION
R1, R100 and R6 are resistance plasmids that have been isolated in different parts of the world. Electron microscope heteroduplex analysis of these DNAs has shown that their sequences are largely homologous, although each plasmid possesses a distinguishing complement of translocatable elements (IS and Tn) and there are several small nonhomologous regions (Sharp et al., 1973; Ohtsubo et al., 1978). These plasmids are mutually incompatible in the cell, belonging to the FII incompatibility group (Datta, 1974). Previous heteroduplex studies on small plasmids derived from R1 and R100 and cloning analysis showed that the region necessary for replication and incompatibility was restricted to a 2.5-kb region common to all of the plasmids (Ohtsubo et al., 1978; Kollek et al., 1978; Taylor and Cohen, 1979; Miki et al., 1980; Molin and Nordstrom, 1980). This region was found to contain a 0.27-kb segment which was one of the small nonhomologous regions seen between R1 and R100 (Ohtsubo et al., 1978). We have recently determined the nucleotide sequence of the replication region of pSM1, a small derivative of R100, and have described a number of possible coding frames (Rosen et al., 1980). We have also mapped the approximate location of the origin and terminus of replication as well as the location of three transcripts generated in vitro from the replication region of pSM1 (Rosen et al., 1979; 1981). In this paper, we report the complete nucleotide sequence of the small plasmid Rsc13, which was derived from R1 and contains the region necessary for replication and incompatibility (Goebel and Bonewald, 1975; Ohtsubo et al., 1978). Comparison of the nucleotide sequence of Rsc13 with that of pSM1 will reveal that there are two distinctive regions, one showing homology of 96%, and the other only 44%. Based on this result and analysis of coding regions and sites, we consider a type of plasmid genome evolution: the acquisition of a DNA segment by substitution, creating the observed region of low homology.
* Present address: Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305 (U.S.A.)
Abbreviations: bp, base pairs; EtBr, ethidium bromide; K, kilodaltons; kb, kilobase pairs.
© 1982 Elsevier Biomedical Press
MATERIALS AND METHODS
(a) Bacterial and plasmid strains
The Escherichia coli K-12 strain C600 (thr, leu, thi, lacY, tonA, supE) was used as a host for plasmids. Plasmids used were Rsc13 and pTR1. Rsc13 was derived from pKN102 which is a copy number mutant of R1 (Goebel and Bonewald, 1975). Further information on the structure of Rsc13 will be given in RESULTS. pTR1 is a small derivative of Rsc13 and will be described in RESULTS. The plasmid pSM1 (5.67 kb in length), referred to in the text, was derived from a copy number mutant of R100 (89.3 kb in length) (Mickel et al., 1977).
(b) Isolation of plasmid DNA
Covalently closed circular plasmid DNA was purified in CsCl-EtBr gradients as previously described (Ohtsubo et al., 1978).
(c) Enzymes
The restriction endonucleases, HaeIII, DdeI, HinfI, BstNI, AluI, HpaII, Sau3A, and HincII were purchased from Bethesda Research Laboratories, Inc. HhaI, BglII, HaeII, BamHI, SmaI, PstI, PvuII, and HphI were purchased from New England Biolabs Inc. Polynucleotide kinase (PL Biochemicals), bacterial alkaline phosphatase (Worthington Biochemicals) and T4 DNA ligase (Miles Inc.) were also used. Reaction conditions for all of these enzymes were as recommended by the supplier.
(d) Nucleotide sequence analysis
Restriction fragments were analyzed and purified by gel electrophoresis as previously described (Rosen et al., 1980). After treatment with bacterial alkaline phosphatase, the 5' ends of double-stranded DNA fragments to be sequenced were labeled using [γ-32P]ATP (Amersham) and polynucleotide kinase (PL Biochemicals). Strand separation or further cleavage with restriction endonucleases was employed to yield uniquely labeled DNA fragments (Maxam and Gilbert, 1980). DNA sequencing was performed by the Maxam and Gilbert method (1980). Sequencing ladders were resolved on 20 × 40 × 0.04 cm gels as described by Sanger and Coulson (1978).
(e) Computer analysis
The nucleotide sequence data were analyzed using the NIH SUMEX-AIM computer facility available through the MOLGEN project at Stanford University. We used the programs available to find dyad symmetries and restriction sites, and to translate all reading frames into amino acid sequence. Protein secondary structure predictions were performed using the algorithm of Chou and Fasman (1978) with a program written by Dr. K. Thompson of the Brookhaven National Laboratory Biology Department. Calculations were made for α-helix, β-pleated sheet, random coil, and β-turns, using several conformational parameters of Chou and Fasman (1978). In this paper, we will present the results of calculations for α-helix and β-pleated sheet.
(f) Construction of pTR1
5 μg of Rsc13 DNA was digested with BglII and BamHI, and was incubated in 100 μl of reaction mixture with T4 DNA ligase using conditions recommended by Miles Inc. The ligated sample was transformed into E. coli C600 according to Mandel and Higa (1970). pTR1 was isolated from one of the ampicillin-resistant transformants grown on L-plates containing 50 μg/ml of ampicillin.
RESULTS
(a) Determination of the nucleotide sequence of Rsc13
Plasmid Rsc13 (Fig. 1) carries the transposon Tn3, 4957 bp long (Ohtsubo et al., 1978; Heffron et al., 1979). The non-Tn3 portion of Rsc13, referred to in this paper as the 2.9-kb region, is a segment of the large plasmid R1 (86.3 kb in length) and contains the region necessary for replication and incompatibility. Fig. 1 shows the 2.9-kb region with the conventional coordinates of R1, namely at the 80.2-83.0 region, expressed in kb from a selected origin defined in the heteroduplex analysis (Ohtsubo et al., 1978). From Rsc13, we constructed a small derivative, named pTR1, which was very useful for determining the nucleotide sequence of most of the 2.9-kb region of Rsc13. pTR1 was constructed, as described in MATERIALS AND METHODS, by cloning a BglII-BamHI fragment of Rsc13 which is shown in Fig. 1. Fig. 2 shows the strategy used to determine the nucleotide sequence of pTR1 and of Rsc13. Fig. 3 summarizes the nucleotide sequence of the 2.9-kb region of Rsc13, composed of 2937 bp. The nucleotide sequence was arranged in Fig. 3 such that base position 1 appears at one of the two BglII sites and proceeds positively (rightward) and negatively (leftward) from this point. Kollek et al. (1980) and Stougaard et al. (1981) have recently reported nucleotide sequences of R1 corresponding to bases between +1702 and +2020 of Rsc13, and between -433 and +623 of Rsc13, respectively. The Kollek et al. sequence is almost identical to the Rsc13 sequence. There are several differences between the region -433 to -400 of the Rsc13 sequence and that of the Stougaard et al. sequence.
Fig. 1. Circular (top) and linear (bottom) representations of Rsc13. The locations of genes of Tn3 for β-lactamase (bla), repressor (tnpR) and transposase (tnpA), as well as the Tn3 terminal inverted repeats (IR-L and IR-R), are indicated on the circular map. The thin line indicates the 2.9-kb region, a portion of the plasmid necessary for replication and incompatibility. The sawtooth line in the 2.9-kb region shows the location of nonhomology between Rsc13 and pSM1 (see also Fig. 5). The numbers (80.2-82.5-83.0) are coordinates in kb of R1, the parent of Rsc13, relative to a reference point defined as 86.3/0.0 (Ohtsubo et al., 1978). The approximate location of the origin (Ori) and direction (arrow) of replication are as defined by Ohtsubo et al. (1977). pTR1, the small plasmid derivative of Rsc13 (see text), contains the 2.9-kb region, as indicated.
Fig. 2. Sequencing strategy for the 2.9-kb region of Rsc13. The restriction fragments used for sequencing the 2.9-kb region of Rsc13 were prepared from pTR1. Note that most of the fragments were sequenced at least twice. Whenever possible, both strands were sequenced. The numbers refer to the nucleotide coordinates used in Fig. 3. The sawtooth line shows the nonhomologous region of Rsc13 with pSM1.
(b) Comparison of the sequence of Rsc13 with that of pSM1, a derivative of R100
(1) Homologous and nonhomologous regions
Previous heteroduplex analysis showed that the 2.9-kb region of Rsc13 is homologous to pSM1 except for a 0.27-kb region of nonhomology indicated by the sawtooth line in Figs. 1 and 5. (Fig. 5 also shows the map of coding frames and various sites to be explained in the next section.) We compared the sequence of Rsc13 with the pSM1 sequence which has been previously described (Rosen et al., 1980). In Fig. 3, differences observed in the pSM1 sequence are shown below the Rsc13 sequence. Note that pSM1 base positions have been assigned coordinates different from those of Rsc13 (see legend to Fig. 3). The nonhomologous region mentioned above was identified between -125 and +125 of Rsc13 and was 249 bp, consistent with the value 0.27 ± 0.05 kb obtained from heteroduplex analysis (Ohtsubo et al., 1978). Although we call this region the nonhomologous region throughout this text, the nucleotide sequence comparison shows that there was about 44% homology between Rsc13 and pSM1 as seen in Fig. 3. The sequence throughout the rest of the 2.9-kb region was essentially homologous, with 95 base substitutions in 2788 bp between Rsc13 and pSM1. There were no net deletions and insertions in this homologous region (Fig. 3).
(2) Coding frames and their neighboring regions
Fig. 4 summarizes the location of initiation and termination codons in all reading frames in the
2.9-kb region of Rsc13. Five coding frames, encoding polypeptides larger than 5000 daltons, are indicated in Fig. 4 as repA1, repA2, repA3, repA4, and 5.3 K; these are the frames previously seen in the pSM1 sequence and assumed to be the most likely coding frames (Rosen et al., 1980; 1981). As shown in Fig. 5, repA1 was found in the region necessary for replication, upstream of the region containing the origin and terminus of replication; repA3 as well as 5.3 K were in the region known to express copy number control and incompatibility (Taylor and Cohen, 1979; Miki et al., 1980; Molin and Nordstrom, 1980; Molin et al., 1981). Note that the orientation of the translation reading frame for 5.3 K was opposite to the other coding frames (Fig. 5); repA4 straddled the replication origin/terminus region (Rosen et al., 1979; 1980). Among these coding frames, existence of repA1 and repA2 of pSM1 has been supported by genetic and biochemical evidence as discussed in Rosen et al. (1980).
[Fig. 3: nucleotide sequence of the 2.9-kb region of Rsc13, with the amino acid sequences of the coding frames above it and the pSM1 differences below it.]
Fig. 4. Location of possible initiation codons AUG (long diagonal lines) and GUG (short diagonal lines), and termination codons (vertical lines) in all reading frames of the Rsc13 2.9-kb region. The solid boxes, labeled repA1-4 and 5.3 K, show coding frames encoding polypeptides larger than 5000 daltons. These coding frames have also been found in the pSM1 sequence (see also Fig. 5). The other three possible coding frames (shaded boxes, labeled a-c) which could also encode polypeptides larger than 5000 daltons, were seen in the pSM1 sequence. These coding frames have been assumed to be less likely to encode polypeptides (Rosen et al., 1980). In fact, when they were compared with those in pSM1, there were several base changes in Rsc13 and all, or more than 90%, of these changes caused amino acid changes (data not shown).
Fig. 3 shows the amino acid sequences specified by the coding frames of Rsc13, repA1-4, above the nucleotide sequence. Different amino acids specified by the pSM1 sequence, due to differences in the nucleotide sequence of pSM1 from that of Rsc13, are also shown above the Rsc13 amino acid sequences. The repA1 coding frame was identified between +394 and +1248 of the Rsc13 nucleotide sequence and could encode a 285 amino acid polypeptide (33000 daltons). In this coding frame, there were 49 bp substitutions between Rsc13 and pSM1. These changes resulted in only eight substitutions in the possible amino acid sequence, as 39 of the base substitutions were in the third position of their respective codons and did not cause amino acid changes. The repA2 coding frame was found between -159 and +99 of Rsc13 and could encode an 86 amino acid polypeptide (11000 daltons). Of particular note is that the repA2 coding frame of Rsc13 and pSM1 began at the same sequence in the homologous region, extended into the nonhomologous region, and encoded polypeptides of similar size (Fig. 3). When the two amino acid sequences encoded by the nonhomologous region were compared, they shared identical amino acids at 38% of the codons. Furthermore, some of the amino acid differences were found to be conservative substitutions: for example, Arg exchanged with Lys, Thr with Ser, Ile with Leu, and Glu with Asp. In several places, amino acid sequences seem to have been displaced by only one codon. This conservation of amino acid sequences encoded by the nonhomologous region is very significant, since if the base changes in this region were distributed randomly, the expected amino acid homology would be only 18 ± 4.5% (± 1σ). Therefore, we assume that the two RepA2 polypeptides from Rsc13 and pSM1, though diverged, have a similar structure in vivo. To support this assumption, we examined the secondary structures of the two RepA2 polypeptides as predicted by the algorithm of Chou and Fasman (1978). As shown in Fig. 6, the probability curves predicting the positions of α-helix or β-pleated sheet between the two RepA2 polypeptides were remarkably similar.
The coding frames, repA3 and repA4, were found to be well conserved between Rsc13 and pSM1. repA3, seen between +196 and +378 of Rsc13 (Fig. 3), could code for a 61 amino acid polypeptide (7000 daltons). When compared to pSM1, there were two base changes, both causing amino acid changes. repA4, found between +1614 and +1997 of Rsc13 (Fig. 3), could code for a 128 amino acid polypeptide (14000 daltons). repA4 of Rsc13 differed from that of pSM1 by 12-bp changes, resulting in eight amino acid differences. The fifth coding frame, 5.3 K, which overlapped with the repA3 coding frame such that the polarity of the corresponding message would be opposite from the repA3 message, was found at the position from +280 to +130 (Fig. 3, though the amino acid sequence for the 5300 dalton polypeptide is not shown). When this was compared with pSM1, this coding frame contained two single base differences, resulting in one amino acid change.
In all of the coding frames analyzed above, the regions which precede these coding frames and may contain possible Shine-Dalgarno (1974) sequences and termination codons around the initiation codons, characteristic of true coding frames (Atkins, 1979), were found to be completely conserved between Rsc13 and pSM1. We have previously reported the positions of three RNA transcripts, named RNAI-III, on the common region between pTR1 and pSM1 (Rosen
et al., 1981), as schematically shown in Fig. 5. RNAI (91 bases in length) is transcribed leftward from the position +309 on the Rsc13 nucleotide map (see Fig. 3). A form of RNAI extended from its 3' end is assumed to be the mRNA coding for the 5300 dalton polypeptide (Rosen et al., 1981). When the promoter, consisting of -10 and -35 regions (Rosenberg and Court, 1979), was examined, the promoter for RNAI was identical between the Rsc13 and pSM1 sequence (Fig. 3). RNAII, a large message, is transcribed rightward from position +142 of Rsc13 (Fig. 3) and is assumed to be the mRNA for both repA3 and repA1. The -35 region (but not the -10 region) of the promoter for RNAII was, however, found to be located in the nonhomologous region between Rsc13 and pSM1 (Fig. 3). RNAIII (150 bases long) is transcribed rightward from the position +2062 of Rsc13 (Fig. 3). The promoter for this transcript, the role of which is unknown, was identical between the Rsc13 and pSM1 sequences.
(3) Region containing origin and terminus of replication
Rosen et al. (1979) have previously determined the nucleotide sequence of the region containing the origin and terminus of replication of pSM1, which has been mapped in the homologous region between pSM1 and Rsc13 by electron microscopy (Ohtsubo et al., 1977; 1978). When this region was compared with that of Rsc13, one and two standard deviations (σ) around the origin/terminus of pSM1 were found to correspond to bases between +1683 and +2029 (1σ) of Rsc13 (see the broken-line box in Fig. 3) and between +1519 and +2202 (2σ) of Rsc13. Note that 12 bp differences between Rsc13 and pSM1 occurred in the 2σ region, but none affected the extent of the dyad symmetries seen in this region (Rosen et al., 1979) as shown by the arrows beneath the nucleotide sequence in Fig. 3.
Fig. 3. Nucleotide sequence of the entire 2.9-kb region of Rsc13 and a few bases of the flanking Tn3 sequence, showing one strand. The coordinate system for the Rsc13 sequence is explained in the text. Sequence differences in pSM1 are shown below the Rsc13 sequence. The nonhomologous region of Rsc13 and pSM1 lies between positions -125 and +125. In this region, pSM1 is 7 bp larger than Rsc13, as represented by seven dots between +120 and +121. The amino acid sequences specified by the Rsc13 coding frames, repA1-4 (solid line boxes), are shown above the DNA sequence. Amino acid differences in pSM1 are shown above the Rsc13 amino acid sequence. The broken-line box in the region from +1683 to +2029 indicates the region containing the replication origin/terminus. The arrows underneath the DNA sequence show dyad symmetries. Note that the coordinate system of the nucleotide sequence of pSM1 is different from that of Rsc13, such that base position 1 of pSM1 appears at the position corresponding to -248 of Rsc13, where a BglII site is present, and proceeds positively (rightward) from this point (Rosen et al., 1980). Note also that we have corrected our previous nucleotide sequence of pSM1 by inserting a base, A, at the position corresponding to C84 of Rsc13. The additional base shortened the reading frame of RepA2 and increased the original pSM1 coordinates from that position by 1.
Fig. 5. A map of genes and sites in the replication regions of plasmids pSM1, Rsc13 and pTR1. pSM1 contains the sequence homologous to the Rsc13 2.9-kb region, except for the 0.27-kb nonhomologous region shown by the sawtooth line on the Rsc13 and pTR1 sequences. The solid and open thick arrows labeled repA2, repA3, repA1, repA4 and 5.3K correspond to polypeptide coding frames and show the direction of translation. The arrows labeled RNAI, II, III correspond to RNA transcripts, together with the pI, pII, pIII promoters and tI, tII and tIII terminators. Replication proceeds rightward from Ori. The regions required for incompatibility and copy number control (inc-cop) and replication are indicated at the bottom.
Fig. 6. Predicted conformational profile of RepA2 from Rsc13 (solid line) and pSM1 (broken line), plotted against residue number. ⟨Pα⟩ is the weighted average helical potential of the pentapeptide around the i th amino acid, i.e. the five amino acids from i - 2 to i + 2. ⟨Pβ⟩ is the weighted average β-sheet potential of the pentapeptide around the i th amino acid, i.e. the five amino acids from i - 2 to i + 2. Regions with ⟨Pα⟩ and ⟨Pβ⟩ above 1.0 have α-helix and β-sheet forming potential; values below 1.0 indicate helix or sheet breaking potential (Chou and Fasman, 1978).
DISCUSSION
(a) Homologous region: evolution of plasmid genomes by single base substitutions
As shown in the text, the homologous region contained no net deletions and insertions but many single base substitutions. Such substitutions are seen in genes coding for functional proteins such as those shown for the trpA genes of S. typhimurium LT2 and E. coli K-12 (Nichols and Yanofsky, 1979). In this case, the base substitutions are those that do not drastically change the amino acid sequence of the TrpA proteins. Most occur at the third position of codons, or at other positions resulting in synonymous amino acid substitutions. The repA1 coding frames of Rsc13 and pSM1 showed conservation of amino acid sequence in a fashion similar to the trpA genes, suggesting that they encode functional proteins and evolve by single base substitutions. The region containing hypothetical coding frames for RepA3 and the 5300 dalton polypeptides was well conserved except for two base changes which occurred in the overlapping coding frames. This region is known to be responsible for incompatibility and copy number control (Taylor and Cohen, 1979; Miki et al., 1980; Molin and Nordstrom, 1980; Molin et al., 1981). In fact, the two base changes in this region are due to newly acquired mutations, causing copy number increase in the parental plasmids of Rsc13 and pSM1 (Rosen et al., 1980). The absence of synonymous changes in this coding frame is probably significant and even assuming that RepA3 is made, the presence of this gene is probably insufficient to explain the high degree of conservation in this region. The presence of RNAI and its promoter and the 5' end region of RNAII in this region (Fig. 5) may place additional constraints on the mutability of this region.
The region where origin and terminus of replication were mapped by electron microscopy was well conserved such that none of 12 base changes in the 697-bp region affected the dyad symmetries predicted in this region (Rosen et al., 1979). The conservation may suggest that these dyad symmetries might be important and could be involved in the initiation and/or termination of DNA synthesis. The region of 384 bp containing the repA4
coding frame was well conserved except for 12 bp substitutions. The limited change in this region may be due, in part, to the fact that repA4 overlaps with the region responsible for the initiation and/or termination of DNA synthesis described above. Notable in the rest of the homologous region is the sequence between the end of repA1 and the origin/terminus region, between +1249 and +1509 (Fig. 3). This region shows the highest substitution mutation frequency with 26 differences in 260 bp. This may suggest that the region itself is not functionally important but the precise distance from the terminus of the repA1 coding frame to the origin region is important. Alternatively, it may be that small deletions and insertions are rare events and base substitutions are the major driving force for the evolution of genomes.
(b) Nonhomologous region: implication of a second type of evolution of plasmid genomes
The present nucleotide sequence study indicates that the nonhomologous region between Rsc13 and pSM1 has 44% base homology. We have shown that most of these regions of Rsc13 and pSM1 encode polypeptides (called RepA2) which may have the same function. Calculations which consider only the selectively neutral, synonymous base changes (Kimura, 1981) show that the extent of divergence in the nonhomologous part of the repA2 coding frame is minimally 10 times greater than the divergence in the repA1 coding frame. Thus, if repA1 and repA2 both diverged in situ from the common ancestor of R1 and R100, the mutation rate must have been consistently at least 10 times greater in the nonhomologous region. It is possible that this region represents a mutational hot spot, but it is not clear how this phenomenon would produce such abrupt junctions between homologous and nonhomologous sequences or why such a region overlaps a gene. Therefore, we assume that genes corresponding to repA2 have diverged in at least two distinct genomes, one of which was the common ancestor of R1 and R100 and that at some point after R1 and R100 diverged, part of the repA2 gene of R1 or R100 has been replaced with sequences from a more distantly related genome.
The actual molecular mechanism of this event may have been a simple recombination at small homologous regions of two diverged plasmids. Alternatively, the mechanism may have been similar to that of eukaryotic gene conversion which, as reviewed by Stahl (1979), may involve a nonreciprocal recombination replacing a DNA sequence with a similar but heterologous gene sequence.
It is interesting that the junctions between the homologous and nonhomologous regions present at -125 and +125 of Rsc13 were bounded by short inverted repeats, CTCA T TGAG and TAGCA TGCTA, respectively (Fig. 3). These inverted repeats may have been involved in or resulted from the recombination which gave rise to the nonhomologous region between Rsc13 and pSM1. We have described in RESULTS that the promoter pII for RNAII of Rsc13 and pSM1 lies at a homologous-nonhomologous junction, downstream from the repA2 gene. It is of interest to speculate that promoters may be involved in this type of recombination.
Several regions of nonhomology similar to the one retained in Rsc13 and pSM1 have been observed as small substitution loops in electron microscope heteroduplex studies on R1 and R100 (Sharp et al., 1973; Ohtsubo et al., 1978). Small substitution loops have also been seen in the heteroduplex molecules of other plasmids and bacteriophage (Davis and Hyman, 1971; Fiandt et al., 1971; Simon et al., 1971; Inselburg, 1973; Sharp et al., 1973; Kim and Davidson, 1974). If these structures are formed due to mismatched regions similar to the nonhomologous region seen between Rsc13 and pSM1, the substitution mechanism discussed above could be frequently involved in the evolution of genomes.
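The short inverted repeats reported at the junctions between the homologous and nonhomologous regions can be checked mechanically, since two arms form an inverted repeat when one is the reverse complement of the other. The Python below is a minimal sketch of that check; the function names are hypothetical, and the arm sequences are the ones quoted in the text for the junctions at -125 and +125 (the single T separating the first pair of arms is treated as a spacer and left out).

```python
# Translation table mapping each DNA base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_inverted_repeat(left_arm: str, right_arm: str) -> bool:
    """True when the right arm is the reverse complement of the left arm."""
    return reverse_complement(left_arm) == right_arm.upper()


# Arms as quoted in the text for the two junctions.
junction_arms = {
    "-125": ("CTCA", "TGAG"),
    "+125": ("TAGCA", "TGCTA"),
}

for position, (left, right) in junction_arms.items():
    print(position, left, right, is_inverted_repeat(left, right))
```

Both quoted pairs satisfy the check, which is consistent with their description as inverted repeats.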
ACKNOWLEDGEMENTS
We would like to thank Drs. Brutlag and L. Kedes for permission to use the SEQ system, and Dr. K. Thompson for providing the Chou and Fasman program and some computer time. We would also like to thank Linda S. Hollmann for typing the manuscript, Mary Ann Huntington for preparing the illustrations, and Jeff Demian for the photographs. This work was supported by United States Public Health Service grants to E.O. (GM22007) and H.O. (CM26779) and by partial support to T.R. under an NIH training grant (CA09176).
Communicated by S.R. Jaskunas.