Comparison of the complete sequence of the str operon in Salmonella typhimurium and Escherichia coli


Gene, 120 (1992) 93-98
© 1992 Elsevier Science Publishers B.V. All rights reserved. 0378-1119/92/$05.00

GENE 06679

(S7 protein; rpsG; elongation factor G; fusA; evolution rate; divergence; codon adaptation index; ribosomal protein)

Urban Johanson and Diarmaid Hughes

Department of Molecular Biology, Uppsala University, Biomedical Center, S-751 24 Uppsala, Sweden

Received by G. Bernardi: 9 April 1992; Revised/Accepted: 25 May/26 May 1992; Received at publishers: 26 June 1992

SUMMARY

The nucleotide (nt) sequences of the str operon in Escherichia coli K-12 and Salmonella typhimurium LT2 were completed and compared at the nt and amino acid (aa) level. The order of conservation at the nt and aa level is rpsL > tufA > rpsG > fusA. A striking difference is that the rpsG-encoded ribosomal protein, S7, in E. coli K-12 is 23 aa longer than in S. typhimurium. The very low (0.18) codon adaptation index of this part of the E. coli K-12 gene and the unusual stop codon (UGA) suggest that this is a relatively recent extension. A trend towards a higher G+C content in fusA (gene encoding elongation factor (EF)-G) and tufA (gene encoding EF-Tu) in S. typhimurium is noted. In fusA, nt substitutions at all three positions in a codon occur at a much higher frequency than expected from the number of nt substitutions in the gene, assuming they are random and independent events. An analysis of substitutions in this and other genes suggests that the triple substitutions in fusA, and some other genes, are the result of the sequential accumulation of individual mutations, probably driven by selection pressure for particular codons or aa.

INTRODUCTION

The str operon in prokaryotes consists of rpsL, rpsG, fusA and tufA (Douglas, 1991 and references therein) coding for the r-proteins S12 and S7, and the translation factors EF-G and EF-Tu, respectively. The operon is expressed from a promoter upstream from rpsL, via a polycistronic mRNA (Jaskunas et al., 1975). S7 acts as an autoregulator of rpsG and fusA translation, by binding to the mRNA sequence between rpsL and rpsG (Dean et al., 1981). The regulation of EF-Tu production is more complex, with a second chromosomal copy of the gene and two additional promoters for tufA inside fusA (Zengel and Lindahl, 1990). To facilitate the study of the regulation of tufA and the genotype of novel mutants selected in fusA in S. typhimurium, the unknown part of the str operon, rpsG and fusA, was sequenced in a wt strain and the whole operon was compared with the completed E. coli sequence.

Correspondence to: Dr. D. Hughes, Department of Molecular Biology, Uppsala University, Biomedical Center, Box 590, S-751 24 Uppsala, Sweden. Tel. (46-18)174203; Fax (46-18)557723.

Abbreviations: aa, amino acid(s); bp, base pair(s); CAI, codon adaptation index; EF, elongation factor; fusA, gene encoding EF-G; kb, kilobase(s) or 1000 bp; nt, nucleotide(s); PCR, polymerase chain reaction; r, ribosomal; rpsG, gene encoding S7; rpsL, gene encoding S12; S7, S12, r-proteins; SD, Shine-Dalgarno (sequence); str, operon encoding S12, S7, EF-G and EF-Tu; tufA, gene encoding EF-Tu; wt, wild type.

EXPERIMENTAL AND DISCUSSION

(a) Sequencing of rpsG and fusA
The 2.8-kb region between rpsL and tufA was amplified from chromosomal DNA of the strain S. typhimurium LT2, and sequenced. A small part of rpsG in E. coli was also sequenced using the K-12 strain MG1655 in order to make

Fig. 1. Nucleotide sequence of rpsG, the intergenic spacer, fusA and the fusA-tufA spacer in S. typhimurium. The first 11 nt do overlap with the rpsG region previously sequenced in S. typhimurium (Hughes and Buckingham, 1991); the same is true for the last spacer (Tuohy et al., 1990). The nt sequence of the operon in E. coli K-12 (Post and Nomura, 1980; Zengel et al., 1984; Yokota et al., 1980), nt 244-441 from this work, is shown above if it differs. The sequence is underlined where insertions or deletions occur. The deduced aa sequences are shown one row below the nt sequence for S. typhimurium, and two rows below for E. coli, but only where they differ. The nt sequence data reported in this paper are in the EMBL, GenBank and DDBJ nt sequence databases under the accession Nos. X64591 (S. typhimurium) and X64592 (E. coli).

Methods. The 2.8-kb region between rpsL and tufA was amplified from chromosomal DNA of the S. typhimurium strain LT2, using two pairs of primers in a symmetric PCR (30 cycles for 1 min at 94°C; 1 min at 66°C; 2 min at 72°C). In a second asymmetric PCR using the same conditions as above, but seeded with 5% of the first reaction, template was generated for the subsequent nt sequencing. The primers for sequencing and PCR were based on the E. coli sequence where the S. typhimurium sequence was unknown. Details of the primers used are available on request. E. coli K-12 strain MG1655 was also sequenced, from nt position 241 to 521. The asymmetric PCR products were purified through Centricon 100 filters prior to sequencing with the T7 sequencing kit from Pharmacia (Uppsala).

[Fig. 2: schematic map of the str operon (rpsL, rpsG, fusA, tufA and the intergenic spacers), with one row of values per region for: bp S^a, bp E^b, % DNA^c, % aa^d, CAI S^e, CAI E^f, KS^g and KA^h.]

Fig. 2. The data for the first part of the operon, including the rpsL-rpsG spacer, are from Hughes and Buckingham (1991) and for tufA essentially from Sharp (1991). The part of the rpsG-fusA spacer, which is a coding sequence in E. coli K-12 but not in S. typhimurium, is hatched. The figure is drawn to scale, except for fusA and tufA which are much longer, indicated by the broken bar.
^a bp in S. typhimurium; stop codons are included in coding region.
^b bp in E. coli; stop codons are included in coding region.
^c %, identity at nt level; insertions and deletions are treated equal to substitutions.
^d %, identity at aa level.
^e CAI (Sharp and Li, 1987a) in S. typhimurium.
^f CAI in E. coli.
^g KS (Li et al., 1985), the number of synonymous substitutions per synonymous site.
^h KA (Li et al., 1985), the number of nonsynonymous substitutions per nonsynonymous site.

the E. coli sequence complete. The deduced aa sequence of S7 from K-12 has one extra Arg9* (Fig. 1) compared to the sequence reported from protein sequencing (Reinbolt et al., 1978) otherwise it is in perfect agreement. The results of a comparative analysis of the str operons are summarized in Fig. 2.
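The nt identity values summarized in Fig. 2 treat insertions and deletions as equal to substitutions. A minimal Python sketch of that convention follows; the function name `percent_identity`, the `-` gap character and the short example strings are assumptions made for illustration and are not taken from the paper.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    A column counts as identical only if both residues match and neither
    is a gap, so insertions and deletions are scored like substitutions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1 for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    # Hypothetical six-column alignment with one single-nt deletion.
    print(percent_identity("ACGT-A", "ACGTTA"))
```

The alignment itself is assumed to exist already; this sketch only scores it.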

The most striking difference between the str operons of these two organisms is that the encoded S7 protein in E. coli K-12 is 23 aa longer than S7 in S. typhimurium. However, K-12 seems to be the exception because even E. coli B and all other species examined so far encode the short version of S7 (Reinbolt et al., 1978; Buttareli et al., 1989; Douglas, 1991; Wagar and Pang, 1992, and references therein). This extension of S7 in K-12 is probably quite recent, judging from the low CAI, 0.18 (Fig. 2), of this part of the gene, which is close to the value, 0.17, of a sequence of equiprobable sense codons (Sharp and Li, 1987a). The CAI for the whole rpsG in K-12 is only 0.53, which is certainly lower than the average 0.61 for 32 r-proteins examined but still higher than or the same as the six lowest in the set, which ranges from 0.42 to 0.81. The maintenance of the low CAI in these other six genes might in part be due to regulatory constraints but shows that values at 0.53 and below are tolerated in highly expressed genes. The stop codon UGA of rpsG in K-12 is also rather unusual for an r-protein (Post and Nomura, 1980) or for any gene with a high CAI (Sharp and Bulmer, 1988) and might reflect how unadapted this end of the gene is for optimal translation. An examination of the sequences for 32 r-proteins from E. coli K-12 shows that only four do not use UAA as the stop codon, rpsG included. In the homologous part of S7 in S. typhimurium there is only one aa change, a Glu for an Asp, which is considered as a conservative change (Grantham, 1974; Li et al., 1985).
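The CAI values quoted above follow the definition of Sharp and Li (1987a): the geometric mean of the relative adaptiveness (w) of the codons in a gene. The sketch below illustrates only the arithmetic. `TOY_W` is a hypothetical table with made-up w values, not the E. coli reference set, and the exclusion of Met, Trp and stop codons follows the usual convention rather than anything stated in this paper.

```python
import math

# Hypothetical relative-adaptiveness values (w); NOT the real E. coli table.
TOY_W = {
    "CTG": 1.0, "CTA": 0.25,   # Leu
    "AAA": 1.0, "AAG": 0.25,   # Lys
    "GAA": 1.0, "GAG": 0.25,   # Glu
}

# Codons conventionally left out of the CAI: Met, Trp and the stop codons.
EXCLUDED = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def codon_adaptation_index(cds: str, w_table: dict) -> float:
    """Geometric mean of w over the scored codons of a coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    scored = [c for c in codons if c not in EXCLUDED]
    if not scored:
        raise ValueError("no scorable codons in sequence")
    # Raises KeyError if a codon is missing from the (toy) table.
    log_sum = sum(math.log(w_table[c]) for c in scored)
    return math.exp(log_sum / len(scored))


if __name__ == "__main__":
    # One optimal and one rare Leu codon: geometric mean of 1.0 and 0.25.
    print(codon_adaptation_index("CTGCTA", TOY_W))
```

With a real reference table, a stretch of rarely used codons pulls the geometric mean down sharply, which is the sense in which the low value for the K-12 extension is read as a sign of poor adaptation.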

(c) Intergenic regions
The total length of the coding and intergenic regions of the operon is identical in the two species, insertions in one spacer being compensated by deletions in another. The only intergenic spacer in the operon suggested to be functionally important so far (excluding the short SD regions) is the rpsL-rpsG spacer where protein S7 is supposed to bind (Nomura et al., 1980). As can be seen in Fig. 2, this spacer is indeed highly conserved compared with the others in the same operon, implying that the same autoregulation occurs in S. typhimurium as has been shown in E. coli (Dean et al., 1981).

(d) EF-G
EF-G is the least conserved gene in the operon. However, there is only one aa change in a conserved region of EF-G (Kohno et al., 1986; Grinblat et al., 1989) and that is a moderately conservative replacement (Grantham, 1974; Li et al., 1985) of Thr493 by Ala in S. typhimurium, at the very end of a conserved region. This region may be involved in the interaction with the ribosome (Kohno et al., 1986). It is somewhat unexpected that fusA is the most divergent gene in the operon (Fig. 2), but it might be that the important domains of EF-G are small relative to its size. One of the striking features of EF-G is its large size, which may be necessary to make essential contacts with the two r-subunits at the same time. Two functional promoters for tufA, within the fusA gene, have been described (Zengel and Lindahl, 1990; Zengel et al., 1984). The sequence of the second of these promoters is changed in S. typhimurium. A T2466→C transition weakens the homology with the -35 consensus sequence and may indicate that the first promoter is the dominant one in S. typhimurium.

(e) The str operon
At the nt level, rpsL and tufA show the highest interspecies similarity, followed by rpsG (only comparing the homologous, coding part of the genes) and then fusA. This is partly a reflection of the similarity at the protein level, which follows the same order. The KS values (the number of synonymous substitutions per synonymous site), which perhaps better illustrate the divergence at the nt level independent of the aa sequence, also show the same order of divergence. A high KS indicates freedom at the nt level unperturbed by constraints on the aa sequence. The number of nonsynonymous substitutions per nonsynonymous site (KA) is clearly higher in fusA than in the other genes. The previously reported order of conservation in the operon, comparing the distantly related organisms Spirulina platensis, E. coli and Micrococcus luteus (Buttareli et al., 1989), puts EF-G before S7 as the more conserved protein. The order is not reversed by including the missing Arg in the E. coli sequence compared. There is in general a negative correlation between KS and CAI (Sharp and Li, 1987b; Sharp, 1991), but in the str operon fusA, with the highest KS value, also has the second highest CAI. This suggests that the exceptionally low KS of rpsL and rpsG are not wholly a result of the high CAI but could in part be explained in terms of a selective pressure on the nt sequence for a regulatory purpose. The stricter conservation of the genes at both ends of the operon could reflect constraints imposed by regulation of the messenger level.

Of the 196 nt substitutions in the operon, 86 replace an AT bp in E. coli with a GC bp in S. typhimurium. The reverse is observed in 66 cases. The changes responsible for this drift mainly occur in fusA and tufA, changing from 50.8% to 51.7% and from 53.2% to 54.0% G+C, respectively. However, this trend does not greatly change the overall nt composition of the operon. In E. coli the G+C content of the operon is 51.0% compared with 51.5% in S. typhimurium. A trend towards a higher G+C content in

TABLE I

The distribution of nt substitutions^a

Gene | % identity^b | KA^c | CAI(E)^d | Obs./Exp.^e: 0 | 1 | 2 | 3 | Number of codons^f | P(>=n)^g
(1) | (2) | (3) | (4) | (5) | | | | (6) | (7)
 | 98.2 | 0.001 | 0.78 | 1.0 | 0.9 | 2.8 | 0 | 0 | 1.0
rpsL | 98.1 | 0 | 0.66 | 1.0 | 1.0 | 0 | 0 | 0 | 1.0
tufA | 98.0 | 0.001 | 0.82 | 1.0 | | | 0 | 0 | 1.0
rpsG | 96.6 | 0.003 | 0.63 | 1.0 | | | 0 | 0 | 1.0
fusA | 94.3 | 0.015 | 0.74 | 1.0 | | | 31.8 | |
rpoB | 93.8 | 0.008 | 0.63 | 1.0 | | | 8.7 | |
ompA | 90.1 | 0.039 | 0.76 | | | | 14.9 | |
trpB | 84.4 | 0.020 | 0.41 | | | | | |
trpE | 80.2 | 0.069 | 0.36 | 1.0 | 1.0 | | | |
trpA | 75.3 | 0.083 | 0.34 | 1.0 | 1.1 | 0.8 | 1.5 | 6 | 2.1 x 10^-1
tar | 75.2 | 0.124 | 0.32 | 1.0 | 1.0 | 0.8 | 1.9 | 17 | 8.8 x 10^-?
pabB | 72.8 | 0.166 | 0.33 | 1.1 | 1.0 | 0.7 | 2.7 | 25 | 1.1 x 10^-5
sulA | 68.5 | 0.136 | 0.23 | 1.1 | 1.0 | 0.7 | 2.3 | 12 | 6.1 x 10^-?

^a A subset of the genes that were sequenced in E. coli and S. typhimurium are listed according to their identity at nt level (column 2). The nt sequences were retrieved from the EMBL databank. KA and CAI are essentially from Sharp (1991). The nt substitutions are assumed to occur randomly and independently in the calculations of the expected number of codons with 0, 1, 2 or 3 nt substitutions. The expected frequency for each type of codon (0, 1, 2 or 3 nt substitutions) is calculated as the product of the probability of occurrence of the nt substitutions and/or the nonsubstitutions making up that codon type, times the number of permutations (column 5). The expected frequency is multiplied by the number of codons in the reading frame in the gene to get the expected number of codons of each type. Because triple substitutions saturate the sites in a codon, the number of substitution events may be greater.
^b % identity at nt level, allowing gaps in the alignment only if they are a multiple of codons.
^c KA (Li et al., 1985), the number of nonsynonymous substitutions per nonsynonymous site.
^d Codon adaptation index in E. coli.
^e The observed number of codons in the reading frame with 0, 1, 2, or 3 nt substitutions relative to the expected number.
^f The observed number of codons in the reading frame with three substitutions.
^g The probability to find n or more codons with three nt substitutions in a given gene. The distribution is assumed to be binomial.

S. typhimurium has been observed before (Riley and Krawiec, 1987, and references therein).

(f) The distribution of nt substitutions
There are more codons with three nt substitutions in fusA than would be expected if each nt substitution was a random and independent event. In order to examine this in more detail, 13 genes with different KA and CAI values (Sharp, 1991) were compared. The results are listed in Table I. There are two possible ways in which the triple substitutions could have arisen (Sharp, 1991). One possibility is that there are mutational mechanisms which cause multiple simultaneous substitutions. Alternatively, they could be the result of substitutions occurring sequentially, with conservation of aa similarity and/or the CAI as the driving force in the selection.

The genes in Table I are listed in order of nt divergence and seem to fall into at least three groups according to how many triple substitutions they have relative to the number expected. The first group, consisting of the first four genes in Table I, lacks any codons with three substitutions. The next three genes, fusA, rpoB and ompA, form a second group in which the number of codons with triple substitutions exceeds the expected value by about an order of magnitude. The high number of triple substitutions in group two is statistically significant (Table I). The last six genes in Table I, with approximately twice as many triple substitutions as expected, make up the third group.

The lack of triple substitutions in the first group is consistent with their very low nt substitution frequencies and their high conservation at the aa level. The observed double substitutions in this group are unique events and thus not statistically significant.

If nt substitutions have a tendency to appear in clusters, because of the nature of the mutational mechanism, this should be most evident in the third group, where the divergence is greatest and the selection pressure against multiple changes should be less than in the two other groups. The underrepresentation of double substitutions in this group probably indicates that these genes are subject to some selection pressure, and this will account for some of the excess of triple substitutions in most of the genes in this group. However, their almost even spread in all frames indicates that their occurrence may indeed be at least partly due to a mutational mechanism causing simultaneous multiple nt substitutions.

In the second group, as in the first, there is a high CAI, but the aa sequences, although highly conserved, are more divergent than in the first group, as indicated by the higher KA values. The erratic frequency representation of both single and double substitutions in this group may partly be a consequence of strong selection pressure against any changes and partly be due to variation because of the small sample size. In this group the number of triple substitutions per codon relative to the expected is even higher than in group three. In addition, triple substitutions are more rarely out of frame than in frame. The triple substitutions in group two are also more conservative in terms of both the aa and the codon (Grantham, 1974; Sharp and Li, 1987a) than such substitutions in group three. These observations argue that the model of sequential substitutions has some relevance in fusA, rpoB and ompA. We propose that in this class of genes most of the nt substitutions will reduce the fitness of the gene and thus be subject to selection pressure for better codons, resulting in the selection of additional nt substitutions within a codon.
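The expectation behind Table I (random, independent nt substitutions, with a binomial test on the number of triply substituted codons) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' program: the function names are hypothetical, the per-site substitution probability is taken simply as the fraction of differing nt, and the example figures (705 codons, 5.7% divergence) are approximations chosen to resemble fusA.

```python
from math import comb


def expected_codon_classes(n_codons: int, p_sub: float) -> list:
    """Expected number of codons with 0, 1, 2 and 3 substituted positions.

    Each class probability is the product of the per-site probabilities
    times the number of permutations, C(3, k), scaled by the codon count.
    """
    return [
        n_codons * comb(3, k) * p_sub ** k * (1.0 - p_sub) ** (3 - k)
        for k in range(4)
    ]


def prob_at_least(n_obs: int, n_codons: int, p_triple: float) -> float:
    """Binomial probability of n_obs or more triply substituted codons.

    Computed as one minus the lower tail, which keeps the binomial
    coefficients small; very small tail probabilities lose precision.
    """
    lower = sum(
        comb(n_codons, i) * p_triple ** i * (1.0 - p_triple) ** (n_codons - i)
        for i in range(n_obs)
    )
    return 1.0 - lower


if __name__ == "__main__":
    n_codons, p_sub = 705, 0.057        # assumed, fusA-like illustration
    expected = expected_codon_classes(n_codons, p_sub)
    print("expected codons per class:", expected)
    # Chance of seeing 4 or more triple substitutions (4 is hypothetical).
    print(prob_at_least(4, n_codons, p_sub ** 3))
```

Under these assumed inputs the expected number of triply substituted codons is well below one, so even a handful of observed cases gives a large observed/expected ratio, which is the pattern described for the second group.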

ACKNOWLEDGEMENTS

This work was supported by grants from the Swedish Natural Science Research Council to D.H. and to C.G. Kurland, and from the Swedish Cancer Society to C.G. Kurland. We thank Farhad Abdulkarim, Otto Berg and Charles G. Kurland for helpful suggestions.

REFERENCES

Buttareli, F.R., Calogero, R.A., Tiboni, O., Gualerzi, C.O. and Pon, C.L.: Characterization of the str operon genes from Spirulina platensis and their evolutionary relationship to those of other prokaryotes. Mol. Gen. Genet. 217 (1989) 97-104.

Dean, D., Yates, J.L. and Nomura, M.: Identification of ribosomal protein S7 as a repressor of translation within the str operon of E. coli. Cell 24 (1981) 413-419.

Douglas, S.E.: Unusual organization of a ribosomal protein operon in the plastid genome of Cryptomonas Φ: evolutionary considerations. Curr. Genet. 19 (1991) 289-294.

Grantham, R.: Amino acid difference formula to help explain protein evolution. Science 185 (1974) 862-864.

Grinblat, Y., Brown, N.H. and Kafatos, F.C.: Isolation and characterization of the Drosophila translational elongation factor 2 gene. Nucleic Acids Res. 17 (1989) 7303-7314.

Hughes, D. and Buckingham, R.H.: The nucleotide sequence of rpsL and its flanking regions in Salmonella typhimurium. Gene 104 (1991) 123-124.

Jaskunas, S.R., Lindahl, L., Nomura, M. and Burgess, R.R.: Identification of two copies of the gene for the elongation factor EF-Tu in E. coli. Nature 257 (1975) 458-462.

Li, W.-H., Wu, C.-I. and Luo, C.-C.: A new method for estimating synonymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes. Mol. Biol. Evol. 2 (1985) 150-174.

Kohno, K., Uchida, T., Ohkubo, H., Nakanishi, S., Nakanishi, T., Fukui, T., Ohtsuka, E., Ikehara, M. and Okada, Y.: Amino acid sequence of mammalian elongation factor 2 deduced from the cDNA sequence: homology with GTP-binding proteins. Proc. Natl. Acad. Sci. USA 83 (1986) 4978-4982.

Nomura, M., Yates, J.L., Dean, D. and Post, L.E.: Feedback regulation of ribosomal protein gene expression in Escherichia coli: structural homology of ribosomal RNA and ribosomal protein mRNA. Proc. Natl. Acad. Sci. USA 77 (1980) 7084-7088.

Post, L.E. and Nomura, M.: DNA sequences from the str operon of Escherichia coli. J. Biol. Chem. 255 (1980) 4660-4666.

Reinbolt, J., Tritsch, D. and Wittmann-Liebold, B.: The primary structure of ribosomal protein S7 from E. coli strains K and B. FEBS Lett. 91 (1978) 297-301.

Riley, M. and Krawiec, S.: Genome organization. In: Neidhardt, F.C., Ingraham, J.L., Low, K.B., Magasanik, B., Schaechter, M. and Umbarger, H.E. (Eds.), Escherichia coli and Salmonella typhimurium: Cellular and Molecular Biology, Vol. 2. ASM Press, Washington, DC, 1987, pp. 967-981.

Sharp, P.M.: Determinants of DNA sequence divergence between Escherichia coli and Salmonella typhimurium: codon usage, map position, and concerted evolution. J. Mol. Evol. 33 (1991) 23-33.

Sharp, P.M. and Bulmer, M.: Selective differences among translation termination codons. Gene 63 (1988) 141-145.

Sharp, P.M. and Li, W.-H.: The codon adaptation index - a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 15 (1987a) 1281-1295.

Sharp, P.M. and Li, W.-H.: The rate of synonymous substitution in enterobacterial genes is inversely related to codon usage bias. Mol. Biol. Evol. 4 (1987b) 222-230.

Tuohy, T.M.F., Thompson, S., Gesteland, R.F., Hughes, D. and Atkins, J.F.: The role of EF-Tu and other translation components in determining translocation step size. Biochim. Biophys. Acta 1050 (1990) 274-278.

Wagar, E.A. and Pang, M.: The gene for the S7 ribosomal protein of Chlamydia trachomatis: characterization within the chlamydial str operon. Mol. Microbiol. 6 (1992) 327-335.

Yokota, T., Sugisaki, H., Takanami, M. and Kaziro, Y.: The nucleotide sequence of the cloned tufA gene of Escherichia coli. Gene 12 (1980) 25-31.

Zengel, J.M. and Lindahl, L.: Mapping of two promoters for elongation factor Tu within the structural gene for elongation factor G. Biochim. Biophys. Acta 1050 (1990) 317-322.

Zengel, J.M., Archer, R.H. and Lindahl, L.: The nucleotide sequence of the Escherichia coli fus gene, coding for the elongation factor G. Nucleic Acids Res. 12 (1984) 2181-2192.