Analysis of plasmid genome evolution based on nucleotide-sequence comparison of two related plasmids of Escherichia coli

Gene, 17 (1982) 299-310
Elsevier Biomedical Press

(Antibiotic-resistance plasmids R1 and R100; synonymous base changes; gene conversion)

Thomas B. Ryder, Daniel B. Davison, Jonathan I. Rosen *, Eiichi Ohtsubo and Hisako Ohtsubo

Department of Microbiology, School of Medicine, State University of New York at Stony Brook, Stony Brook, NY 11794 (U.S.A.)

(Received September 23rd, 1981)
(Accepted January 26th, 1982)

SUMMARY

Plasmid Rsc13, a small derivative of the plasmid R1, contains a region necessary for replication as well as a complete copy (4957 bp) of the ampicillin resistance transposon, Tn3. We determined the nucleotide sequence of the replication region of Rsc13 to be 2937 bp and then compared this region (designated the 2.9-kb region) to the analogous region of pSM1, a small derivative of the plasmid R100 which has common ancestry with R1. Rsc13 and pSM1 were 96% homologous in this 2.9-kb region except for a discrete region of about 250 bp which showed only 44% homology. The sequence and distribution of nucleotide substitutions between Rsc13 and pSM1 supported a map of possible genes and sites which have previously been seen in the replication region of pSM1. Most of these genes and sites were shown to evolve by single base substitutions, but not by insertions or deletions. However, one of these genes, termed repA2 (11000 daltons), was found in that region of Rsc13 and pSM1 which showed only 44% homology. Analysis of the amino acid sequence and predicted conformation of the two RepA2 polypeptides, however, suggested that they were very similar. We proposed that the repA2 region of R1 and R100 was replaced by a substitution of a short DNA segment from another plasmid which was evolutionarily related to R1 and R100 but had more divergence. This event may have been mediated by a mechanism similar to that of gene conversion as described in eukaryotic systems.
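The homology figures quoted above (96% and 44%) are percentages of matching bases between aligned sequences. The paper does not spell out the calculation, so the following Python sketch only illustrates one plain reading of it, assuming two pre-aligned, equal-length sequences with no gaps. The function name `percent_identity` and the example fragments are invented for illustration and are not taken from Rsc13 or pSM1.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which two sequences carry the same base."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Count position-by-position matches, ignoring letter case.
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return 100.0 * matches / len(seq_a)


# Hypothetical 10-base fragments that differ at a single position.
print(percent_identity("ATGGCTAAGC", "ATGGCAAAGC"))
```

Regions that differ in length, such as the nonhomologous segment, would first need an alignment step that this sketch deliberately leaves out.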

INTRODUCTION

R1, R100 and R6 are resistance plasmids that have been isolated in different parts of the world. Electron microscope heteroduplex analysis of these DNAs has shown that their sequences are largely homologous, although each plasmid possesses a distinguishing complement of translocatable elements (IS and Tn) and there are several small nonhomologous regions (Sharp et al., 1973; Ohtsubo et al., 1978). These plasmids are mutually incompatible in the cell, belonging to the FII incompatibility group (Datta, 1974). Previous heteroduplex studies on small plasmids derived from R1 and R100 and cloning analysis showed that the region necessary for replication and incompatibility was restricted to a 2.5-kb region common to all of the plasmids (Ohtsubo et al., 1978; Kollek et al., 1978; Taylor and Cohen, 1979; Miki et al., 1980; Molin and Nordstrom, 1980). This region was found to contain a 0.27-kb segment which was one of the small nonhomologous regions seen between R1 and R100 (Ohtsubo et al., 1978).

We have recently determined the nucleotide sequence of the replication region of pSM1, a small derivative of R100, and have described a number of possible coding frames (Rosen et al., 1980). We have also mapped the approximate location of the origin and terminus of replication as well as the location of three transcripts generated in vitro from the replication region of pSM1 (Rosen et al., 1979; 1981). In this paper, we report the complete nucleotide sequence of the small plasmid Rsc13, which was derived from R1 and contains the region necessary for replication and incompatibility (Goebel and Bonewald, 1975; Ohtsubo et al., 1978). Comparison of the nucleotide sequence of Rsc13 with that of pSM1 reveals that there are two distinctive regions, one showing homology of 96%, and the other only 44%. Based on this result and analysis of coding regions and sites, we consider a type of plasmid genome evolution: the acquisition of a DNA segment by substitution, creating the observed region of low homology.

* Present address: Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305 (U.S.A.)

Abbreviations: bp, base pairs; EtBr, ethidium bromide; K, kilodaltons; kb, kilobase pairs.

© 1982 Elsevier Biomedical Press

MATERIALS AND METHODS

(a) Bacterial and plasmid strains

The Escherichia coli K-12 strain C600 (thr, leu, thi, lacY, tonA, supE) was used as a host for plasmids. Plasmids used were Rsc13 and pTR1. Rsc13 was derived from pKN102 which is a copy number mutant of R1 (Goebel and Bonewald, 1975). Further information on the structure of Rsc13 will be given in RESULTS. pTR1 is a small derivative of Rsc13 and will be described in RESULTS. The plasmid pSM1 (5.67 kb in length), referred to in the text, was derived from a copy number mutant of R100 (89.3 kb in length) (Mickel et al., 1977).

(b) Isolation of plasmid DNA

Covalently closed circular plasmid DNA was purified in CsCl-EtBr gradients as previously described (Ohtsubo et al., 1978).

(c) Enzymes

The restriction endonucleases, HaeIII, DdeI, AluI, HpaII, HinfI, BstNI, Sau3A, and HincII were purchased from Bethesda Research Laboratories, Inc. HhaI, BglII, HaeII, BamHI, SmaI, PstI, PvuII, and HphI were purchased from New England Biolabs Inc. Polynucleotide kinase (PL Biochemicals), bacterial alkaline phosphatase (Worthington Biochemicals) and T4 DNA ligase (Miles Inc.) were also used. Reaction conditions for all of these enzymes were as recommended by the supplier.

(d) Nucleotide sequence analysis

Restriction fragments were analyzed and purified by gel electrophoresis as previously described (Rosen et al., 1980). After treatment with bacterial alkaline phosphatase, the 5' ends of double-stranded DNA fragments to be sequenced were labeled using [γ-32P]ATP (Amersham) and polynucleotide kinase (PL Biochemicals). Strand separation or further cleavage with restriction endonucleases was employed to yield uniquely labeled DNA fragments (Maxam and Gilbert, 1980). DNA sequencing was performed by the Maxam and Gilbert method (1980). Sequencing ladders were resolved on 20 × 40 × 0.04 cm gels as described by Sanger and Coulson (1978).

(e) Computer analysis

The nucleotide sequence data were analyzed using the NIH SUMEX-AIM computer facility available through the MOLGEN project at Stanford University. We used the programs available to find dyad symmetries and restriction sites, and to translate all reading frames into amino acid sequence. Protein secondary structure predictions were performed using the algorithm of Chou and Fasman (1978) with a program written by Dr. K. Thompson of the Brookhaven National Laboratory Biology Department. Calculations were made of α-helix, β-pleated sheet, random coil, and β-turns, using several conformational parameters of Chou and Fasman (1978). In this paper, we will present the results of calculations for α-helix and β-pleated sheet.

(f) Construction of pTR1

5 µg of Rsc13 DNA was digested with BglII and BamHI, and was incubated in 100 µl of reaction mixture with T4 DNA ligase using conditions recommended by Miles Inc. The ligated sample was transformed into E. coli C600 according to Mandel and Higa (1970). pTR1 was isolated from one of the ampicillin-resistant transformants grown on L-plates containing 50 µg/ml of ampicillin.

RESULTS

(a) Determination of the nucleotide sequence of Rsc13

Plasmid Rsc13 (Fig. 1) carries the transposon Tn3, 4957 bp long (Ohtsubo et al., 1978; Heffron et al., 1979). The non-Tn3 portion of Rsc13, referred to in this paper as the 2.9-kb region, is a segment of the large plasmid R1 (86.3 kb in length) and contains the region necessary for replication and incompatibility. Fig. 1 shows the 2.9-kb region with the conventional coordinates of R1, namely at the 80.2-83.0 region, expressed in kb from a selected origin defined in the heteroduplex analysis (Ohtsubo et al., 1978). From Rsc13, we constructed a small derivative, named pTR1, which was very useful for determining the nucleotide sequence of most of the 2.9-kb region of Rsc13. pTR1 was constructed, as described in MATERIALS AND METHODS, by cloning a BglII-BamHI fragment of Rsc13 which is shown in Fig. 1. Fig. 2 shows the strategy used to determine the nucleotide sequence of pTR1 and of Rsc13. Fig. 3 summarizes the nucleotide sequence of the 2.9-kb region of Rsc13, composed of 2937 bp. The nucleotide sequence was arranged in Fig. 3 such that base position 1 appears at one of the two BglII sites and proceeds positively (rightward) and negatively (leftward) from this point.

Kollek et al. (1980) and Stougaard et al. (1981) have recently reported nucleotide sequences of R1 corresponding to bases between +1702 and +2020 and between -433 and +623 of Rsc13, respectively. The Kollek et al. sequence is almost identical to the Rsc13 sequence. There are several differences between the region -433 to -400 of the Rsc13 sequence and that of the Stougaard et al. sequence.

(b) Comparison of the sequence of Rsc13 with that of pSM1, a derivative of R100

(1) Homologous and nonhomologous regions

Previous heteroduplex analysis showed that the 2.9-kb region of Rsc13 is homologous to pSM1 except for a 0.27-kb region of nonhomology indicated by the sawtooth line in Figs. 1 and 5. (Fig. 5 also shows the map of coding frames and various sites to be explained in the next section.) We compared the sequence of Rsc13 with the pSM1 sequence which has been previously described (Rosen et al., 1980). In Fig. 3, differences observed in the pSM1 sequence are shown below the Rsc13 sequence. Note that pSM1 base positions have been assigned coordinates different from those of Rsc13 (see legend to Fig. 3). The nonhomologous region mentioned above was identified between -125 and +125 of Rsc13 and was 249 bp, consistent with the value 0.27 ± 0.05 kb obtained from heteroduplex analysis (Ohtsubo et al., 1978). Although we call this region the nonhomologous region throughout this text, the nucleotide sequence comparison shows that there was about 44% homology between Rsc13 and pSM1 as seen in Fig. 3. The sequence throughout the rest of the 2.9-kb region was essentially homologous, with 95 base substitutions in 2788 bp between Rsc13 and pSM1. There were no net deletions and insertions in this homologous region (Fig. 3).

(2) Coding frames and their neighboring regions

Fig. 4 summarizes the location of initiation and termination codons in all reading frames in the 2.9-kb region of Rsc13.
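A scan of the kind summarized in Fig. 4 can be sketched in Python as follows. This is a minimal illustration, not the MOLGEN program the authors used: the function names, the 0-based coordinates, and the choice to report the run from the first ATG or GTG after a stop up to the next in-frame stop are all assumptions made here. The reverse complement is included so that the opposite strand can be scanned in the same way.

```python
STARTS = {"ATG", "GTG"}          # initiation codons considered in Fig. 4 (DNA form)
STOPS = {"TAA", "TAG", "TGA"}    # termination codons
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def open_frames(seq, min_codons):
    """Yield (frame, start, end) for start-to-stop runs of at least min_codons codons.

    Offsets are 0-based; end is exclusive and does not include the stop codon.
    """
    seq = seq.upper()
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon in STARTS:
                start = pos                      # first start codon after a stop
            elif codon in STOPS:
                if start is not None and (pos - start) // 3 >= min_codons:
                    yield frame, start, pos
                start = None                     # a stop closes any open run


# Hypothetical fragment with one short open frame in frame 2.
print(list(open_frames("CCATGAAATTTGGGTAACC", 3)))
```

A mass cutoff such as 5000 daltons would have to be converted into a minimum codon count by the caller; that conversion is left out because the source does not give one.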

Fig. 1. Circular (top) and linear (bottom) representations of Rsc13. The locations of genes of Tn3 for β-lactamase (bla), repressor (tnpR) and transposase (tnpA), as well as the Tn3 terminal inverted repeats (IR-L and IR-R) are indicated on the circular map. The thin line indicates the 2.9-kb region, a portion of the plasmid necessary for replication and incompatibility. The sawtooth line in the 2.9-kb region shows the location of nonhomology between Rsc13 and pSM1 (see also Fig. 5). The numbers (80.2-82.5-83.0) are coordinates in kb of R1, the parent of Rsc13, relative to a reference point defined as 86.3/0.0 (Ohtsubo et al., 1978). The approximate location of the origin (Ori) and direction (arrow) of replication are as defined by Ohtsubo et al. (1977). pTR1, the small plasmid derivative of Rsc13 (see text), contains a portion of Rsc13 as indicated.

Fig. 2. Sequencing strategy for the 2.9-kb region of Rsc13. The numbers refer to the nucleotide coordinates used in Fig. 3. The sawtooth line shows the nonhomologous region of Rsc13 with pSM1. The restriction fragments for sequencing the 2.9-kb region of Rsc13 were prepared from pTR1. Note that most of the fragments were sequenced at least twice. Whenever possible, both strands were sequenced.

Five coding frames encoding polypeptides larger than 5000 are indicated in Fig. 4 as repA1, repA2, repA3, repA4, and 5.3 K. These are the frames which have been previously seen in the pSM1 sequence and assumed to be the most likely coding frames (Rosen et al., 1980; 1981). As shown in Fig. 5, repA1 was found in the region necessary for replication, upstream of the region containing the origin and terminus of replication (Fig. 5); repA2 and repA3 as well as 5.3 K were in the region known to express copy number control and incompatibility (Taylor and Cohen, 1979; Miki et al., 1980; Molin and Nordstrom, 1980; Molin et al., 1981). Note that the orientation of the translation reading frame for 5.3 K was opposite to the other coding frames (Fig. 5); repA4 straddled the replication origin/terminus region (Rosen et al., 1979; 1980). Among these coding frames, existence of repA1 and repA2 of pSM1 has been supported by genetic and biochemical evidence as discussed in Rosen et al. (1980).

Fig. 3 shows the amino acid sequences specified by the coding frames of Rsc13, repA1-4, above the nucleotide sequence. Different amino acids specified by the pSM1 sequence, due to differences in the nucleotide sequence of pSM1 from that of Rsc13, are also shown above the Rsc13 amino acid sequences. The repA1 coding frame was identified between +394 and +1248 of the Rsc13 nucleotide sequence and could encode a 285 amino acid polypeptide (33000 daltons). In this coding frame, there were 49 bp substitutions between Rsc13 and pSM1. These changes resulted in only eight substitutions in the possible amino acid sequence, as 39 of the base substitutions were in the third position of their respective codons and did not cause amino acid changes.

The repA2 coding frame was found between -159 and +99 of Rsc13 and could encode an 86 amino acid polypeptide (11000 daltons). Of particular note is that the repA2 coding frame of Rsc13 and pSM1 began at the same sequence in the homologous region, extended into the nonhomologous region, and encoded polypeptides of similar size (Fig. 3). When the two amino acid sequences encoded by the nonhomologous region were compared, they shared identical amino acids at 38% of the codons. Furthermore, some of the amino acid differences were found to be conservative substitutions: for example, Arg exchanged with Lys, Thr with Ser, Ile with Leu, and Glu with Asp. In several places, homologous amino acid sequences seem to have been displaced by only one codon. This conservation of amino acid sequences encoded by the nonhomologous region is very significant, since if the base changes in this region were distributed randomly, the expected amino acid homology would be only 18 ± 4.5% (± 1 σ). Therefore, we assume that the two RepA2 polypeptides from Rsc13 and pSM1, though diverged, have a similar structure in vivo. To support this assumption, we examined the secondary structures of the two RepA2 polypeptides as predicted by the algorithm of Chou and Fasman (1978). As shown in Fig. 6, the probability curves predicting the positions of α-helix or β-pleated sheet between the two RepA2 polypeptides were remarkably similar.

The coding frames, repA3 and repA4, were found to be well conserved between Rsc13 and pSM1. repA3, seen between +196 and +378 of Rsc13 (Fig. 3), could code for a 61 amino acid polypeptide (7000 daltons). When compared to pSM1, there were two base changes, both causing amino acid changes. repA4, found between +1614 and +1997 of Rsc13 (Fig. 3), could code for a 128 amino acid polypeptide (14000 daltons). repA4 of Rsc13 differed from that of pSM1 by 12-bp changes, resulting in eight amino acid differences. The fifth coding frame, 5.3 K, which overlapped with the repA3 coding frame such that the polarity of the corresponding message would be opposite from the repA3 message, was found at the position from +280 to +130 (Fig. 3, though the amino acid sequence for the 5300 dalton polypeptide is not shown). When this was compared with pSM1, this coding frame contained two single base differences, resulting in one amino acid change.

In all of the coding frames analyzed above, the regions which precede these coding frames and may contain possible Shine-Dalgarno (1974) sequences and termination codons around the initiation codons, characteristic of true coding frames (Atkins, 1979), were found to be completely conserved between Rsc13 and pSM1.

We have previously reported the positions of three RNA transcripts, named RNAI-III, on the common region between pTR1 and pSM1 (Rosen et al., 1981), as schematically shown in Fig. 5.
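The counts reported for the coding frames (base substitutions by codon position, and how many of them change an amino acid) can be obtained by walking two aligned frames codon by codon. The Python sketch below shows one way to do this under stated assumptions: both frames are already aligned, start at the first base of a codon, and are translated with the standard genetic code. The function name and the 12-base example are hypothetical and are not taken from the Rsc13 or pSM1 sequences.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_substitutions(frame_a, frame_b):
    """Return ([diffs at codon position 1, 2, 3], number of codons whose amino acid differs)."""
    if len(frame_a) != len(frame_b) or len(frame_a) % 3:
        raise ValueError("frames must be aligned and a whole number of codons long")
    by_position = [0, 0, 0]
    changed_codons = 0
    for i in range(0, len(frame_a), 3):
        codon_a = frame_a[i:i + 3].upper()
        codon_b = frame_b[i:i + 3].upper()
        for k in range(3):
            if codon_a[k] != codon_b[k]:
                by_position[k] += 1
        if CODON_TABLE[codon_a] != CODON_TABLE[codon_b]:
            changed_codons += 1
    return by_position, changed_codons


# Hypothetical frames: Met-Lys-Leu-Asp versus Met-Lys-Leu-Glu.
print(classify_substitutions("ATGAAACTGGAT", "ATGAAGCTAGAA"))
```

In the example, all three differences fall in the third codon position and only the last one changes the encoded amino acid, which is the pattern the text describes for most substitutions in the homologous region.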

RNAI (91 bases in length) is transcribed leftward from the position +309 on the Rsc13 nucleotide map (see Fig. 3). A form of RNAI extended from its 3' end is assumed to be the mRNA coding for the 5300 dalton polypeptide (Rosen et al., 1981). When the promoter, consisting of -10 and -35 regions (Rosenberg and Court, 1979), was examined, the promoter for RNAI was identical between the Rsc13 and pSM1 sequence (Fig. 3). RNAII, a large message, is transcribed rightward from position +142 of Rsc13 (Fig. 3) and is assumed to be the mRNA for both repA3 and repA1. The -35 region (but not the -10 region) of the promoter for RNAII was, however, found to be located in the nonhomologous region between Rsc13 and pSM1 (Fig. 3). RNAIII (150 bases long) is transcribed rightward from the position +2062 of Rsc13 (Fig. 3). The promoter for this transcript, the role of which is unknown, was identical between the Rsc13 and pSM1 sequences.

(3) Region containing origin and terminus of replication

Rosen et al. (1979) have previously determined the nucleotide sequence of the region containing the origin and terminus of replication of pSM1, which has been mapped in the homologous region between pSM1 and Rsc13 by electron microscopy (Ohtsubo et al., 1977; 1978). When this region was compared with that of Rsc13, one and two standard deviations (σ) around the origin/terminus of pSM1 were found to correspond to bases between +1683 and +2029 (1σ) of Rsc13 (see the broken-line box in Fig. 3) and between +1519 and +2202 (2σ) of Rsc13. Note that 12 bp differences between Rsc13 and pSM1 occurred in the 2σ region, but none affected the extent of the dyad symmetries seen in this region (Rosen et al., 1979) as shown by the arrows beneath the nucleotide sequence in Fig. 3.

Fig. 3. Nucleotide sequence showing one strand of the entire 2.9-kb region of Rsc13 and a few bases of the flanking Tn3 sequence. The coordinate system of the nucleotide sequence of Rsc13 is explained in the text. Sequence differences in pSM1 are shown below the Rsc13 sequence. The nonhomologous region of Rsc13 and pSM1 lies between positions -125 and +125. In this region, pSM1 is 7 bp larger than Rsc13, as represented by seven dots between +120 and +121. The amino acid sequences specified by the reading frames, repA1-4 (solid line boxes), are shown above the DNA sequence. Amino acid differences in pSM1 are shown above the Rsc13 amino acid sequence. The broken-line box in the region from +1683 to +2029 indicates the region containing the replication origin/terminus. Facing arrows underneath the DNA sequence show dyad symmetries. Note that the coordinate system for the nucleotide sequence of pSM1 is different from that of Rsc13, such that base position 1 of pSM1 appears at the position corresponding to -248 of Rsc13, where a BglII site is present, and proceeds positively (rightward) from this point (Rosen et al., 1980). Note also that we have corrected our previous sequence of pSM1 by inserting a base, A, at the position corresponding to C84 of Rsc13. The additional base shortened the reading frame of RepA2 of pSM1 and increased the original pSM1 coordinates from that position by 1.

Fig. 4. Location of possible initiation codons, AUG (long diagonal lines) and GUG (short diagonal lines), and termination codons (vertical lines) in all reading frames of the Rsc13 2.9-kb region. The solid boxes, labeled repA1-4 and 5.3 K, show coding frames encoding polypeptides larger than 5000 daltons. These coding frames have also been found in the pSM1 sequence (see also Fig. 5). The other three possible coding frames (shaded boxes, labeled a-c) which could also encode polypeptides larger than 5000 daltons, were seen in the pSM1 sequence. These coding frames have been assumed to be less likely to encode polypeptides (Rosen et al., 1980). In fact, when they were compared with those in pSM1, there were several base changes in Rsc13 and all, or more than 90%, of these changes caused amino acid changes (data not shown).

Fig. 5. A map of genes and sites in the replication regions of plasmids pSM1, Rsc13 and pTR1. pSM1 contains the sequence homologous to the Rsc13 2.9-kb region, except for the 0.27-kb nonhomologous region shown by the sawtooth line on the Rsc13 and pTR1 sequences. The solid and open thick arrows labeled repA2, repA3, repA1, repA4 and 5.3K correspond to polypeptide coding frames and show the direction of translation. The arrows labeled RNAI, II, III correspond to RNA transcripts, together with the pI, pII, pIII promoters and tI, tII, and tIII terminators. Replication proceeds rightward from Ori. The regions required for incompatibility and copy number control (inc-cop) and replication are indicated at the bottom.

DISCUSSION

(a) Homologous region: evolution of plasmid genomes by single base substitutions

As shown in the text, the homologous region contained no net deletions and insertions but many single base substitutions. Such substitutions are seen in genes coding for functional proteins such as those shown for the trpA genes of S. typhimurium LT2 and E. coli K-12 (Nichols and Yanofsky, 1979). In this case, the base substitutions are those that do not drastically change the amino acid sequence of the TrpA proteins. Most occur at the third position of codons, or at other positions resulting in synonymous amino acid substitutions. The repA1 coding frames of Rsc13 and pSM1 showed conservation of amino acid sequence in a fashion similar to the trpA genes, suggesting that they encode functional proteins and evolve by single base substitutions.

The region containing hypothetical coding frames for RepA3 and the 5300 dalton polypeptides was well conserved except for two base changes which occurred in the overlapping coding frames. This region is known to be responsible for incompatibility and copy number control (Taylor and Cohen, 1979; Miki et al., 1980; Molin and Nordstrom, 1980; Molin et al., 1981). In fact, the two base changes in this region are due to newly acquired mutations, causing copy number increase in the parental plasmids of Rsc13 and pSM1 (Rosen et al., 1980). The absence of synonymous changes in this coding frame is probably significant and even assuming that RepA3 is made, the presence of this gene is probably insufficient to explain the high degree of conservation in this region. The presence of RNAI and its promoter and the 5' end region of RNAII in this region (Fig. 5) may place additional constraints on the mutability of this region.

Fig. 6. Predicted conformational profile of RepA2 from Rsc13 (solid line) and pSM1 (broken line), plotted against residue number. ⟨Pα⟩ is the weighted average helical potential of the pentapeptide around the i th amino acid, i.e. the five amino acids from i - 2 to i to i + 2. ⟨Pβ⟩ is the weighted average sheet potential of the pentapeptide around the i th amino acid, i.e. the five amino acids from i - 2 to i to i + 2. Regions with ⟨Pα⟩ and ⟨Pβ⟩ above 1.0 have α-helix and β-sheet forming potential; values below 1.0 indicate helix or sheet breaking potential (Chou and Fasman, 1978).
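The pentapeptide averages described in the legend to Fig. 6 are a sliding-window calculation over the residues i - 2 to i + 2. The Python sketch below illustrates only the windowing step. It uses a plain unweighted mean, which is a simplification of the weighted average named in the legend, and it expects the caller to supply the per-residue conformational parameters. The function names and the two-letter propensity table are hypothetical and are not the Chou and Fasman values.

```python
def window_average(values, half_width=2):
    """Unweighted mean over positions i - half_width .. i + half_width.

    Only positions where the full window fits are returned, so the result is
    2 * half_width entries shorter than the input.
    """
    width = 2 * half_width + 1
    return [sum(values[i:i + width]) / width
            for i in range(len(values) - width + 1)]


def propensity_profile(protein, propensity, half_width=2):
    """Look up a per-residue propensity and smooth it over a window.

    `propensity` maps one-letter residue codes to numbers supplied by the caller.
    """
    return window_average([propensity[res] for res in protein.upper()], half_width)


# Toy table with made-up values: "A" favours the structure, "G" disfavours it.
toy_table = {"A": 1.4, "G": 0.6}
print(propensity_profile("AAAGGGA", toy_table))
```

Values above 1.0 in the smoothed profile would then be read as structure-forming and values below 1.0 as structure-breaking, as in the legend.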

The region where origin and terminus of replication were mapped by electron microscopy was well conserved such that none of 12 base changes in the 697-bp region affected the dyad symmetries predicted in this region (Rosen et al., 1979). The conservation may suggest that these dyad symmetries might be important and could be involved in the initiation and/or termination of DNA synthesis. The region of 384 bp containing the repA4 coding frame was well conserved except for 12 bp substitutions. The limited change in this region may be due, in part, to the fact that repA4 overlaps with the region responsible for the initiation and/or termination of DNA synthesis described above.

Notable in the rest of the homologous region is the sequence between the end of repA1 and the origin/terminus region, between +1249 and +1509 (Fig. 3). This region shows the highest substitution mutation frequency with 26 differences in 260 bp. This may suggest that the region itself is not functionally important but the precise distance from the terminus of the repA1 coding frame to the origin region is important. Alternatively, it may be that small deletions and insertions are rare events and base substitutions are the major driving force for the evolution of genomes.

(b) Nonhomologous region: implication of a second type of evolution of plasmid genomes

The present nucleotide sequence study indicates that the nonhomologous region between Rsc13 and pSM1 has 44% base homology. We have shown that most of these regions of Rsc13 and pSM1 encode polypeptides (called RepA2) which may have the same function. Calculations which consider only the selectively neutral, synonymous base changes (Kimura, 1981) show that the extent of divergence in the nonhomologous part of the repA2 coding frame is minimally 10 times greater than the divergence in the repA1 coding frame. Thus, if repA1 and repA2 both diverged in situ from the common ancestor of R1 and R100, the mutation rate must have been consistently at least 10 times greater in the nonhomologous region. It is possible that this region represents a mutational hot spot, but it is not clear how this phenomenon would produce such abrupt junctions between homologous and nonhomologous sequences or why such a region overlaps a gene. Therefore, we assume that genes corresponding to repA2 have diverged in at least two distinct genomes, one of which was the common ancestor of R1 and R100 and that at some point after R1 and R100 diverged, part of the repA2 gene of R1 or R100 has been replaced with sequences from a more distantly related genome.

The actual molecular mechanism of this event may have been a simple recombination at small homologous regions of two diverged plasmids. Alternatively, the mechanism may have been similar to that of eukaryotic gene conversion which, as reviewed by Stahl (1979), may involve a nonreciprocal recombination replacing a DNA sequence with a similar but heterologous gene sequence.

It is interesting that the junctions between the homologous and nonhomologous regions present at -125 and +125 of Rsc13 were bounded by short inverted repeats, CTCA T TGAG and TAGCA TGCTA, respectively (Fig. 3). These inverted repeats may have been involved in or resulted from the recombination which gave rise to the nonhomologous region between Rsc13 and pSM1. We have described in RESULTS that the promoter pII for RNAII of Rsc13 and pSM1 lies at a homologous-nonhomologous junction, downstream from the repA2 gene. It is of interest to speculate that promoters may be involved in this type of recombination.

Several regions of nonhomology similar to the one retained in Rsc13 and pSM1 have been observed as small substitution loops in electron microscope heteroduplex studies on R1 and R100 (Sharp et al., 1973; Ohtsubo et al., 1978). Small substitution loops have also been seen in the heteroduplex molecules of other plasmids (Sharp et al., 1973; Inselburg, 1973) and bacteriophage (Davis and Hyman, 1971; Fiandt et al., 1971; Simon et al., 1971; Kim and Davidson, 1974). If these structures are formed due to mismatched regions similar to the nonhomologous region seen between Rsc13 and pSM1, the substitution mechanism discussed above could be frequently involved in the evolution of genomes.
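The short inverted repeats noted at the junctions of the nonhomologous region can be checked mechanically: two arms form an inverted repeat when one is the reverse complement of the other. The following Python sketch is an illustration only; the function names are invented here, and the arms are the ones quoted in the text, read as the bases on either side of the gap in each printed repeat.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_inverted_repeat(left_arm, right_arm):
    """True when the right arm is the reverse complement of the left arm."""
    return reverse_complement(left_arm) == right_arm.upper()


# Arms of the two repeats quoted in the text for the -125 and +125 junctions.
print(is_inverted_repeat("CTCA", "TGAG"))
print(is_inverted_repeat("TAGCA", "TGCTA"))
```

The same test, applied to every pair of nearby windows in a sequence, is the basis of a simple dyad-symmetry search; tolerance for mismatches and for the spacer between arms is omitted here.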

ACKNOWLEDGEMENTS

We would like to thank Drs. Brutlag and L. Kedes for permission to use the SEQ system, and Dr. K. Thompson for providing the Chou and Fasman program and some computer time. We would also like to thank Linda S. Hollmann for typing the manuscript, Mary Ann Huntington for preparing the illustrations, and Jeff Demian for the photographs. This work was supported by United States Public Health Service grants to E.O. (GM22007) and H.O. (CM26779) and by partial support to T.R. under an NIH training grant (CA09176).

Moldave, K. (Eds.), Methods in Enzymology, Vol. 54, Part II, Academic Press, New York, 1980, pp. 499-560.

Mickel, S., Ohtsubo, E. and Bauer, W.: Heteroduplex mapping of small plasmids derived from R-factor R12: In vivo recombination occurs at IS1 insertion

Miki, T., Easton,

A.M. and Rownd,

tion, incompatibility, NRl. Molin,

J. Bacteriol.

trol,

Atkins,

J.F.: Is UAA or UGA part of the recognition

ribosomal Chou,

initiation?

P.Y. and

Fasman,

G.D.:

structures

of proteins

Enzymol.

47 (1978) 45-148.

Datta,

N.:

Prediction

Adv.

R.W.

and

classification

of plasmids.

In

1974. American Society

D. (Ed.) Microbiologyand

Hyman,

DNA base sequence

R.W.:

A study

homology

M., Hradecn&

Electron

between

Z., Lozeron,

micrographic

homologies

mapping

in the DNAs

in Hershey,

coliphages

the

T7 and

H.A.

and Szybalski.

of deletions,

of coliphages

Laboratory,

insertions

lambda

tor Rldrd-19B2.

J. Bacterial.

F.. McCarthy,

DNA sequence

antibiotic

NY, 1971, plas-

resistance

fac-

H. and

of the transposon

Ohtsubo.

E.:

Tn3: Three genes

in the transposition

of Tn3. Cell 18

factor

in ColE2-E3

DNA:

heteroduplex

Kim. J.S. and Davidson, study of sequence

molecules.

Nature

New

N.: Electron

relations

Estimation

homologous

microscope

of T2, T4 and T6 phage DNAs.

sequences.

J.,

Nordstrom,

RI, and identification control.

J.

M.

and

of new copy

of a polypeptide

Mol. Gen.

C.: Nucleotide

Genet.

181

sequences

of trpA

and Escherichia cob: An evolu-

typhimurium

Proc. Natl. Acad.

E., Feingold, from

Ohtsubo,

J., Ohtsubo, replication

R factor

Rl2

E., Rosenbloom,

Sci. USA 76 (1979)

ation

of small

H., Mickel, S. and Bauer, of three small plasmids

in Escherichia

M., Schrempf,

Rosen, J.: Site-specific

Oertel,

W.

and

distances

Proc.

Goebel,

of the minimal

replication

(pKNlO2)

Natl.

between Acad.

Sci.

(“basic

of the antibiotic

W.:

fragment

replicon”)

resistance

Isolation required

and for au-

of a copy mutant

factor

RI. Mol. Gen.

Mol.

de-

coli. Plasmid

H., Goebel,

recombination

plasmids.

R.. Oertel. W. and Goebel,

resistance

factor

at RI.

1

W. and

involved in the gener-

Gen.

A.: Calcium-dependent

H. and Ohtsubo,

of the region

Genet.

159 (1978)

bacteriophage

J. Mol. Biol. 53 (1970) 159-162.

Maxam, A. and Gilbert, W. Sequencing end-labeled DNA with base-specific chemical cleavages, in Grossman, L. and

E.: The nucleotide

surrounding

factor

the replication

derivative.

se-

origin

Mol. Gen. Genet.

of 171

(1979) 287-293. Rosen,

J., Ryder,

T., Inokuchi,

H., Ohtsubo,

E.: Genes and sites involved of an RlOO plasmid

Rosen,

analysis.

J., Ryder,

Nature

in antibiotic

and incompati-

based

on nucleotide

179 (1980) 527-537.

H. and Ohtsubo,

in replication,

E.: Role of

incompatibility

resistance

and copy

plasmid

derivatives.

290 (1981) 794-797. M. and Court,

in the promotion Annu.

derivative

T., Ohtsubo,

control

H. and Ohtsubo,

in replication

Mol. Gen. Genet.

transcripts

D.: Regulatory

and termination

Rev. Genet.

sequences

of RNA

involved

transcription.

13 (1979) 319-353.

F. and Coulson,

A.R.: The use of thin acrylamide

for DNA sequencing.

FEBS Lett. 87 (1978) 107-I 10.

Sharp,

P.A., Cohen,

scope

S.N. and Davidson,

heteroduplex

plasmids

studies

16s

N.: Electron

of sequence

of E. co/i. I. Structure

gels

relations

of drug resistance

microamong

(R) and F

J. Mol. Biol. 75 (1973) 235-255.

Shine, J. and Dalgarno, rRNA:

ribosome

177 (1980) 413-419.

M. and Higa.

DNA infection.

W.: Site specific deletion

origin of the antibiotic

Mol. Gen. Genet.

J., Ohtsubo,

quence

factors.

162 (1978) 51-57.

the replication

Rosen,

Sanger,

tonomous

Mandel.

Ohtsubo,

Rosenberg,

of evolutionary

nucleotide

characterization

Kollek.

con-

replication.

and characterization

number

comparison.

number

heteroduplex

USA 78 (1981) 454-458.

Genet.

of

5244-5248.

RNA

57 (1974) 93-111.

R.,

switch-off

Light,

B.P. and Yanofsky,

of Salmonella

sequence

a single non-homologous

Biol. 241 (1973) 234-237.

Kollek,

Nichols,

bility

J.: Colicin

M.:

in copy

an RlOO resistance

(1979) 1153-1163.

Kimura.

of plasmid

involved

(1977) 8-18.

123 (1975) 658-665.

B., Ohtsubo,

analysis

and three sites involved

Virology

mutants

and Cold

R.: Class of small multicopy

from the mutant

region

Rl replica-

copy number

131-141.

W. and Bonewald,

Inselburg.

P.,

K.: Isolation

rived

and phi 80,

Lambda.

Cold Spring Harbor,

mids originating Heffron.

of replica-

of R plasmid

of plasmid

in replication, and

Stougaard,

W.:

pp. 329-354. Goebel,

K.: Control

involved

W.: Unidirectional

A.D. (Ed.) The Bacteriophage

Spring Harbor

S.,

tionary

in evolution:

T3. J. Mol. Biol. 62 (1971) 287-302. Fiandt,

Molin,

Cloning

functions

(1981) 123-130.

for Microbiology, Washington, DC, 1974, pp. 9-15. Davis,

2

141 (1980) 111-120.

Nordstrom,

of the secondary

from their amino acid sequence.

Epidemiology

Schlessinger,

signal for

Nucl. Acids Res. 7 (1979) 1035-1041.

Gene

141 (1980) 87-99.

incompatibility,

Bacterial.

R.H.:

and stability

S. and Nordstrom,

tion: Functions REFERENCES

sequences.

(1977) 193-210.

K.: The 3’ terminal

Complementarity

binding

sites.

Proc.

sequence

of nonsense Natl.

Acad.

of E. coli

triplets

and

Sci. USA

71

(1974) 1342-1346. Simon,

M.N., Davis, R.W. and Davidson,

of DNA molecules

of lambdoid

of their base sequence

relationships

phages:

N.: Heteroduplexes Physical

by electron

mapping

microscopy,

310

in Hershey,

A.D. (Ed.) The Bacteriophage

Spring Harbor

Laboratory,

Cold Spring

Lambda.

Harbor,

Cold

NY. 197 1.

pp. 313-328. Stahl.

F.W.:

Taylor,

Genetic

Phage and Fungi.

Recombination, Freeman,

Thinking

San Francisco,

About

It in

CA. 1979, pp.

133-159.

Stougaard,

the resistance

plasmtd

RI&d-19.

D.P. and Cohen,

S.N.: Structural

P., Molin,

S., Nordstrom,

sequence

K. and

of the replication

Hansen. control

segments

containing

compatibility

regions of a miniplasmid J. Bacterial.

F.G.:

region of

Gen.

Genet.

and functional

sis of cloned

number mutant of NRI.

The nucleotide

Mol.

1x1

(1981) 116-122.

Communicated

by S.R. Jaskunas.

the replication derived

analyand in-

from a copy

137 (1979) 922104.