Are horizontal transfers involved in the evolution of the Streptococcus thermophilus exopolysaccharide synthesis loci?

Gene 233 (1999) 151–161

www.elsevier.com/locate/gene

F. Bourgoin *, A. Pluvinet, B. Gintz, B. Decaris, G. Guédon

Laboratoire de Génétique et Microbiologie, UA INRA 952, Université Henri Poincaré Nancy I, Faculté des Sciences, BP 239, 54506 Vandœuvre-lès-Nancy, France

Received 4 February 1999; accepted 8 April 1999; Received by B. Dujon

Abstract

A 32.5 kb variable locus of the Streptococcus thermophilus CNRZ368 chromosome, the eps locus, contains 25 ORF and seven insertion sequences (IS). The putative products of 17 ORF are related to proteins involved in the synthesis of polysaccharides in various bacteria. The two distal regions and a small central region of the eps locus are constant and present in all or almost all of the S. thermophilus strains tested. The other regions are variable and present in only some S. thermophilus strains tested, particularly in the closely related strains CNRZ368 and A054. A 13.6 kb variable region of the eps locus of S. thermophilus CNRZ368 contains two ORF that are almost identical to epsL and orfY of the eps locus of Lactococcus lactis NIZOB40 and seven IS belonging to four different families, ISS1, IS981, IS1193 and IS1194. Five of these sequences were probably acquired by horizontal transfer from L. lactis (Bourgoin, F., et al., 1996. Gene 178, 15–23). Three probes of this 13.6 kb region hybridized with the DNA of several L. lactis strains tested. A specific probe for another sequence within the S. thermophilus eps locus, epsF, hybridized with the DNA of one of the L. lactis strains tested. Sequence comparisons also suggest that five ORF of the eps locus have a mosaic structure and probably result from recombinations between sequences that are 10 to 50% divergent. The chimeric structure of the eps locus suggests a very complex evolution. This evolution probably involves both the acquisition of the 13.6 kb region from L. lactis by horizontal transfer and exchanges within the S. thermophilus species. © 1999 Elsevier Science B.V. All rights reserved.

Keywords: Chimeric locus; Lactic acid bacteria; Lactococcus lactis; Mosaic gene; Polymorphism

Abbreviations: aa, amino-acid(s); bp, base pair(s); EPS, exopolysaccharide(s); IS, insertion sequence(s); kb, kilobase(s); ORF, open reading frame(s); RBS, ribosome binding site(s).

* Corresponding author. Tel.: +33-3-83-91-21-79; fax: +33-3-83-91-25-00. E-mail address: [email protected] (F. Bourgoin)

1. Introduction

Streptococcus thermophilus is a thermophilic lactic acid bacterium that is used in starter cultures for the production of cheese and yoghurt. Some strains of S. thermophilus have been selected for their exopolysaccharide (EPS) synthesis, which leads to an improvement in viscosity and texture of yoghurt (Cerning, 1990). An eps locus that contains 13 genes involved in EPS biosynthesis has been characterized on the chromosome of S. thermophilus Sfi6 (Stingele et al., 1996). The related cps locus, which includes at least five genes involved in EPS biosynthesis, has been identified on the chromosome of S. thermophilus NCFB2393 (Griffin et al., 1996). In addition, six genes from the eps locus of the chromosome of S. thermophilus MR-1C were partially sequenced (Low et al., 1998). In contrast, EPS production is generally associated with plasmids in Lactococcus lactis, another lactic acid bacterium. The eps locus of 14 genes located on plasmid pNZ4000 from L. lactis NIZOB40 has been sequenced (Van Kranenburg et al., 1997). Whereas some genes within the eps loci of S. thermophilus Sfi6 and L. lactis NIZOB40 encode distantly related proteins, the organization of the two loci shares only limited similarities.

Instability of EPS production and variability of polymer yields are well documented problems in the dairy industry (Cerning, 1990). In L. lactis, this genetic instability is thought to be due to loss of the involved plasmid. In S. thermophilus, four different insertion sequences (IS), IS1191, IS1194, IS981 and ISS1, have been previously identified (Guédon et al., 1995; Bourgoin et al., 1996, 1998). Copies of these elements

0378-1119/99/$ – see front matter © 1999 Elsevier Science B.V. All rights reserved. PII: S0378-1119(99)00144-4

F. Bourgoin et al. / Gene 233 (1999) 151–161

Table 1
Hybridization of various probes of the eps locus on DNA of S. thermophilus and L. lactis strains (a, b)

Size (kb) of the EcoRI fragments hybridizing with the different probes of the eps locus (localized in Fig. 1)

Probes: I120, SE, I102V, I101U, A400, I114 (c), I116, I115, N, X, F, E, D, C, B, I118, I117

Strains, S. thermophilus: CNRZ368, A054; CNRZ388; CNRZ308; CNRZ455; CNRZ702; CNRZ307; CNRZ7; CNRZ385; CNRZ445; IP6757; ATCC19258; CNRZ302; IP6756; CNRZ464 (d); CNRZ391

5.7

5.2

3.4

3.9

6.0

3.1

4.7

2.5

5.3

5.3

5.3

5.3

5.3

5.3

5.3

5.3

6.4

11.0 12.0 9.4 8.6 9.4 13.0 15.0 12.0 15.0 9.4 7.0 8.6 12.0 12.0

− 9.4 7.8 7.3 ND 11.8 18.0 9.4 18.0 7.8 6.6 − 9.4 9.4

− − − − − − − − − − − − − −

3.4 9.6 8.4 14.0 3.4 13.0 15.0 − − − − − − −

10.0 10.5 9.2 ND 9.2 + + 10.5 13.5 9.0 7.2 8.2 10.5 10.5

− − − − − − − − − − − − − −

− − − − − − − − − − − − − −

3.4 − − − − − − − − − − − − −

5.3 − − ND ND − − − − − − − − −

5.3 − − ND ND − − − − − − − 6.2 6.2

12.0

9.4





10.5









6.2

5.3 − − ND ND − − − − − − − 6.2 6.2 7.6 6.2 7.6

5.3 3.2 3.2 ND ND 12.0 13.0 13.0 3.2 4.2 5.5 − 6.2 6.2 7.6 6.2 7.6

5.3 3.2 3.2 ND ND 12.0 13.0 13.0 3.2 4.2 5.5 − 6.2 6.2 7.6 6.2 7.6

5.3 3.2 3.2 ND ND 12.0 13.0 13.0 3.2 4.2 5.5 2.8 6.2 6.2 7.6 ND

5.3 3.2 3.2 3.2 4.2 12.0 13.0 13.0 3.2 4.2 5.5 2.8 6.2 6.2 7.6 6.2 7.6

6.4 6.4 6.4 6.4 6.4 6.4 6.4 6.4 6.4 6.4 6.4 6.4 6.4 6.4

CNRZ404

5.3 − − ND ND − − − − − − − 6.2 6.2 7.6 6.2 7.6

L. lactise Lc NCFB2045 Lc NCFB2006 Lc NCFB968e Lc CNRZ116e Ll CNRZ144

− − − − −

− − − − −

+ + − − −

+ + + + −

+ + + − −

− − − − −

− − − − −

− − − − −

− − − − −

+ − − − −

+/− − − − −

− − − − −

− − − − −

− − − − −

+/− − − − −

− − − − −

− − − − −

6.4

a Some small differences in size of EcoRI fragments hybridizing with various probes could be explained by the use of different Southern filters.
b ND: not determined. +: presence of a hybridization signal, but the fragment size was not determined or indicated. +/−: presence of a weak hybridization signal. −: absence of a hybridization signal.
c Digested DNA of S. thermophilus strains was hybridized with the I114 probe, whereas that of L. lactis strains was hybridized with the I114.6 probe, a 950 bp HindIII/XbaI fragment of pNST114.
d S. thermophilus CNRZ464, NST2280 and NST11 showed identical hybridization patterns.
e Ll: Lactococcus lactis subsp. lactis; Lc: Lactococcus lactis subsp. cremoris; Ld: Lactococcus lactis subsp. lactis biovar diacetylactis. DNA of some L. lactis strains hybridized with the same probe: Lc NCFB968, Lc NCFB1186, Lc NCFB2770, Ld NCFB2015 and Ll NCFB712; Lc CNRZ116, Ll LMA115, Lc 1, Ll 1, Ll 2 and Ld 1. All the NCFB strains tested are ropy.

are associated with polymorphic regions of the S. thermophilus chromosome (Roussel et al., 1997). A generalized genomic instability that is unrelated to IS was also found in some strains of S. thermophilus (Pébay et al., 1993). Therefore, instability of EPS production in S. thermophilus could be explained by mobile genetic elements such as IS or by generalized genomic instability.

The near identity of the IS905 sequence of L. lactis (Dodd et al., 1994) and the IS1191 sequence of S. thermophilus, together with the distribution of these IS, suggested that IS905/IS1191 were horizontally transferred from S. thermophilus to L. lactis (Guédon et al., 1995, 1998). Similarly, horizontal transfers of IS981 and ISS1 from L. lactis to S. thermophilus have occurred recently, probably in co-cultures used for cheese manufacture (Guédon et al., 1995, 1998; Bourgoin et al., 1996). The three different ISS1 copies and two of the four different IS981 copies were found in the same 12 kb region, which was probably acquired from L. lactis by horizontal transfer (Bourgoin et al., 1996). This region has been previously located on the Bsm5 fragment of the S. thermophilus CNRZ368 chromosome (Roussel et al., 1997). Additionally, this 12 kb region was found to be included in the eps locus of S. thermophilus CNRZ368. The comparison of this locus with the sequenced loci of other S. thermophilus strains, L. lactis and other streptococci suggests a complex evolution involving horizontal transfers.

2. Materials and methods

2.1. Bacterial strains and culture media

S. thermophilus and L. lactis strains screened for the presence of the different probes of the eps locus are listed in Table 1. All strains were grown as previously described (Colmin et al., 1991). Escherichia coli Sure (Stratagene) and KW251 (Promega) were used as recipients of recombinant plasmids and λ phages, respectively. E. coli Sure was transformed by electroporation according to Dower et al. (1988).

2.2. Nucleic acid isolation and DNA cloning

Chromosomal DNA was extracted as previously described (Colmin et al., 1991). The construction of a genomic library of S. thermophilus CNRZ368 in bacteriophage λGEM11 (Promega) has been previously reported (Pébay et al., 1992). The DNA of the three overlapping bacteriophages λNST302 (Bourgoin et al., 1996), λNST303 and λNST304, isolated from the genomic library, was extracted according to Sambrook et al. (1989). The plasmids pNST102, pNST101, pNST49, pNST114, pNST119, pNST116, pNST115, pNST118 and pNST117 were obtained by cloning the nine EcoRI fragments of the overlapping bacteriophage DNA in pBCKS+ (Stratagene). Similarly, pNST120 and pNST120.1 were obtained by cloning SacI fragments of λNST302 DNA in pBCKS+. The resultant plasmids were extracted by the alkaline lysis method described by Hopwood et al. (1985) and purified on a CsCl gradient. The 45 overlapping subclones used for the sequencing of the eps locus were obtained by cloning different restriction fragments of these plasmids in pBCKS+. Plasmid DNA from these subclones was extracted by the method described by Mierendorf and Pfeffer (1987).

2.3. Probes and hybridizations

Southern transfer of digested DNA onto Nitran N Schleicher & Schuell membranes, hybridization and washings were performed as previously described (Larbi et al., 1990). High stringency washing was carried out at 60°C. The different probes used for the analysis of the eps locus are indicated in Fig. 1. All probes were purified by elution with the GeneClean procedure (Bio 101) and were digoxigenin-labelled by random priming using the Dig DNA Labelling and Detection Kit (Boehringer Mannheim).

2.4. DNA sequencing and sequence analysis

The dideoxy chain-termination sequencing method was performed on alkali-denatured plasmids according to USB instructions using the Sequenase 2.0 kit (USB), [35S]-labelled dATP (Amersham) and oligonucleotides supplied by Genosys. The gapped BLAST programs (Altschul et al., 1997) were used to detect similarities of DNA sequences or their potential products with DNA or protein sequences from EMBL, GenBank, PIR, and Swissprot databases. The Clustal V program (Higgins and Sharp, 1988) was used to align related sequences. These alignments were used to determine identity percentages. The DDBJ/EMBL/GenBank accession number of the 32 479 bp complete sequence of the eps locus of S. thermophilus CNRZ368 is Z98171.

2.5. Determination of the ropy feature

S. thermophilus strains were cultured at 42°C for 16 h in 10% skim milk supplemented with 1% yeast extract and 0.08 M phosphate buffer (pH 7). Each sample was homogenized by vortexing and the medium was analysed by visual inspection.

3. Results

3.1. ORF identification and organization of the eps locus

A 32.5 kb locus (Fig. 1) including the 12 kb region with a high IS content was cloned and sequenced. All ORF analysed were larger than 60 codons and exhibited ATG, GTG or TTG as a start codon 3–13 bp downstream from a putative ribosome binding site (RBS) sequence complementary to the 3′ end of the 16S rRNA of S. thermophilus (Ludwig et al., 1992) (Table 2). The eps locus contains seven complete or truncated IS (Guédon et al., 1995; Bourgoin et al., 1996, 1998; data not shown) and 25 ORF or pseudo-ORF (ORF interrupted by frameshifts or stop codons) (Table 2). Putative products encoded by 17 of these sequences are related to proteins involved in the polysaccharide synthesis of streptococci or other bacteria (Table 3). The genetic organization of the region is shown in Fig. 1. The G+C content of this region is 34%, which is lower than the typical 37.2–39.8% G+C content reported for the S. thermophilus genome (Farrow and Collins, 1984). The eps locus was divided into five different regions based on hybridization of different probes with DNA of various S. thermophilus or L. lactis strains (Table 1) and sequence comparison of this locus with eps/cps loci of S. thermophilus, other streptococci and L. lactis (see below).

3.2. The epsABCD region

The 3973 bp right region including epsABCD (Fig. 1) has a similar organization in other streptococci (Fig. 2) and shares more than 96% identity with the corresponding regions of the eps/cps loci of S. thermophilus Sfi6 (Stingele et al., 1996), MR-1C (Low et al., 1998) and NCFB2393 (Griffin et al., 1996) (Table 3; Fig. 2). However, the 5 nt 3′ end of cpsD and the downstream sequence found in S. thermophilus NCFB2393 are replaced by an unrelated sequence in S. thermophilus CNRZ368 and Sfi6. The proteins encoded by epsABCD also share 43–69% identity with the products of the first

Fig. 1. Map of the sequenced eps locus of S. thermophilus CNRZ368. Thick lines show the inserts of the recombinant λ clones. Most of their limits are not exactly known and are shown by dashed lines. EcoRI and SacI cloned fragments are indicated by thin lines. Putative promoters and terminators are indicated by flags and circles respectively. Putative promoters and terminators of IS are not indicated, except for a terminator or promoter found in ISS1 that could modify the transcription of adjacent sequences. Nucleotide sequences that are almost identical (greater than 95% identity) to previously reported sequences found in S. thermophilus and L. lactis strains are indicated by black and grey boxes respectively. Arrows show the location and orientation of the putative genes or pseudo-genes. ΔIS corresponds to a truncated IS. Pseudo-ORF that contain a potential frameshift or stop codon are indicated by asterisks. The probes used for Southern hybridization analyses are shown by white boxes.

Table 2
General features of the identified ORF or pseudo-ORF of the eps locus

| ORF (a) | Location (nt) (b) | Size (aa) | G+C (%) | Putative translation start (c) |
|---|---|---|---|---|
| epsA | 30 705–32 159 | 484 | 36.7 | AGGAGcaatttatATG |
| epsB | 29 973–30 704 | 243 | 37.7 | GGAGGaaaaataaGTG |
| epsC | 29 266–29 964 | 232 | 34.8 | AGGAGatattATG |
| epsD | 28 507–29 256 | 249 | 37.1 | AGGAGaagaaATG |
| epsE | 27 770–28 453 | 227 | 33.9 | GGAGGaaatgagATG |
| epsF (d, e) | 26 648–27 764 | 372 | 34.6 | GGAGcgtgttagtactgATG |
| epsN (d) | 25 105–26 013 | 302 | 39.5 | AGaacagatGTG |
| epsO | 24 523–24 795 | 90 | 42.2 | AGGcggtggcATG |
| epsP | 23 664–24 506 | 280 | 36.5 | AGGAataatcgATG |
| epsQ | 22 728–23 567 | 279 | 34.6 | GGAGGatatattacacATG |
| orf1 | 21 445–22 731 | 428 | 30.9 | AGGttggtgaATG |
| epsR | 20 296–21 438 | 380 | 35.2 | AGGAGatgatttggaATG |
| orf2 | 19 364–20 296 | 310 | 30.4 | GAGGatgaataATG |
| orf3 | 18 730–19 107 | 125 | 26.2 | AGGaacGTG |
| orf4 | 18 297–18 656 | 119 | 26.7 | AGGAacatatATG |
| epsS (d) | 16 825–18 254 | 477 | 35.7 | GAGGaatgtcgaTTG |
| epsT (d) | 16 000–16 919 | 307 | 29.3 | AGGtcgagtcttTTG |
| epsL | 14 952–15 842 | 297 | 35.0 | GGAGGaaaaaTTG |
| orfB | 14 025–14 927 | 300 | 32.0 | GGAGataccATG |
| orf14.9 | 11 356–11 925 | 189 | 34.6 | AGGAGttagaaacagTTG |
| epsU | 6605–8020 | 471 | 29.7 | GGAGGgttattaATG |
| epsV | 4744–5709 | 321 | 29.5 | AGGAGaattttATG |
| epsW | 1988–2179 | 63 | 30.2 | GGAGatttaataaATG |
| orf5 (d) | 615–1403 | 279 | 39.3 | AGGAGGtgctttATG |
| pgm (d) | 1–565 | >188 | 41.9 | AGGgtggtaaagtaATG |

a IS are not reported. ORF or pseudo-ORF encoding proteins that share significant similarities to known or putative proteins are indicated in bold.
b Positions corresponding to the Z98171 GenBank/EMBL accession number of the sequence of the eps locus of S. thermophilus CNRZ368.
c The putative start codons and ribosome binding sites are indicated in bold and in upper case letters respectively.
d Sequence analysis and comparison with other putative proteins showed one frameshift in epsF, epsS, orf5 and pgm, two in epsN and four in epsT. Stop codons were detected in position 545 in pgm and in position 1203 in orf5.
e A 1 nt deletion induces a frameshift in epsF of S. thermophilus Sfi6. This suggests that the size of this pseudo-ORF is 1118 nt instead of 960 nt as reported by Stingele et al. (1996).
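The relation between the Location and Size columns of Table 2 can be checked with a few lines of Python. This is only a sketch: it assumes that each reported interval is 1-based, inclusive, and runs from the start codon through the stop codon, which the table does not state explicitly. The function name `orf_length_aa` and the dictionary `TABLE2_ROWS` are illustrative names, not part of the original work.

```python
def orf_length_aa(start: int, end: int, includes_stop: bool = True) -> int:
    """Predicted protein length (aa) for an ORF at 1-based inclusive coordinates.

    Assumption: the interval covers the start codon through the stop codon,
    so one codon is subtracted when includes_stop is True.
    """
    length_nt = end - start + 1
    if length_nt % 3 != 0:
        # An interval that is not a whole number of codons cannot be one
        # uninterrupted reading frame (for example after a frameshift).
        raise ValueError(f"{length_nt} nt is not a whole number of codons")
    codons = length_nt // 3
    return codons - 1 if includes_stop else codons


# (start, end, reported size in aa) copied from Table 2.
TABLE2_ROWS = {
    "epsA": (30705, 32159, 484),
    "epsB": (29973, 30704, 243),
    "epsD": (28507, 29256, 249),
}

for name, (start, end, reported_aa) in TABLE2_ROWS.items():
    print(name, orf_length_aa(start, end), reported_aa)

# epsF is listed as a pseudo-ORF with one frameshift (footnote d);
# its interval is not a multiple of three, so the check raises.
try:
    orf_length_aa(26648, 27764)
except ValueError as err:
    print("epsF:", err)
```

Under this assumption the computed and reported sizes agree for the intact ORF shown here, whereas intervals of frameshifted pseudo-ORF such as epsF do not divide evenly into codons.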

EpsF(38)

EpsF(36) EpsG(33) EpsG(50) EpsF(28)

epsN

EpsF(26) EpsF(54)

EpsF(27)

EpsF(55) EpsG(31)

EpsU(50)

EpsW(50)

EpsF(55)

EpsF(28)

Cps19fA(51) Cps19fB(63) Cps19fC(51) Cps19fD(58) Cps19fE(46)e

Cps14A(52) Cps14B(61) Cps14C(51) Cps14D(58) Cps14E(46)e

Cps14L(35) Cps14J(36) Cps14I(29) Cps14L(50)

Spn19Fc

Spn14c

Cps23FU(38)

Cps23FA(51) Cps23FB(62) Cps23FC(51) Cps23FD(54) Cps23FE(45)e

Spn23Fc

Sagc

Cap1A(50) Cap1B(63) CpsA(69) Cap1C(51) CpsB(55) Cap1D(58) CpsC(60) CpsD(43)e

Spn1c

EpsK(51)

EpsK(41) EpsG(37)

epsL(97) orfY(99)

EpsC(29) EpsA(27) EpsB(40) EpsD(46)

LlaNIZOc

TrsD(32)

Yenc

TrsA(44)

RfbX(26) TrsA(27) TrsB(31)

RfbQ(31) RfbR(30) RfbQ(33) RfbR(29)

Sdyc

GumJ(23)

Xcac

unknown regulation unknown transport glycosyl transferase

glycosyl transferase polymerisation/export glycosyl transferase

rhamnosyl transferase

glycosyl transferase rhamnosyl transferase

glycosyl transferase

regulation unknown polymerisation/export polymerisation/export glycosyl transferase glycosyl transferase

Putative functiond

transport Pgm(37) phosphoglycerate mutase

Tpac

a The identity percentages determined on nucleotide sequences are written in italics, in bold and doubly underlined. The other percentages were determined on protein sequences encoded by the ORF or pseudo-ORF. Only the most similar published sequences are reported in this table, but other significant similarities were also found. The identity percentages shown in parentheses were determined on the areas of the sequences that share a reliable alignment.
b IS and ORF that do not share homologies are not reported.
c SthSfi6: Streptococcus thermophilus Sfi6 (U40830); SthMR: Streptococcus thermophilus MR-1C (AF053346 to AF053351); SthNC: Streptococcus thermophilus NCFB2393 (X94980); SthCN: Streptococcus thermophilus CNRZ368 (this work: Z98171); Spn14: Streptococcus pneumoniae serotype 14 (X85787); Spn19F: Streptococcus pneumoniae type 19F (U09239); Spn23F: Streptococcus pneumoniae type 23F (AF030373); Spn1: Streptococcus pneumoniae type 1 (Z83335); Sag: Streptococcus agalactiae COH1 (L09116); LlaNIZO: Lactococcus lactis NIZOB40 (U93364); Sdy: Shigella dysenteriae (L07293); Yen: Yersinia enterocolitica (Z47767); Xca: Xanthomonas campestris (U22511); Tpa: Treponema pallidum (U55214).
d Putative functions are predicted by sequence similarity.
e Similarities were only found with the C-terminal part of the proteins CpsE, Cps14E, Cps19fE, Cps23FE and CpsD.

epsW pgm

epsL orfB orf 14.9 orf 14.9(97) epsU epsV EpsI(39)

epsR epsS epsT

EpsP(30)

epsQ

EpsF(28)

EpsN (36) EpsO (28) EpsR(28) EpsT(55) EpsF(36)

SthCNc

EpsQ(30)

cpsA(94) cpsB(97) cpsC(93) cpsD(98) CpsE(44)e

SthNCc

epsP

epsO

cpsA(99) cpsB(99) cpsC(100) cpsD(99) epsE(100) epsF(99)

epsA(99) epsB(97) epsC(93) epsD(97) epsE(99) epsF(98) EpsG(27)

epsA epsB epsC epsD epsE epsF

SthMRc

SthSfi6c

ORFb

Table 3 Identity between the ORF or pseudo-ORF of the eps locus of S. thermophilus CNRZ368 or their putative products and genes or gene products of other bacteriaa

Fig. 2. Organization of the polysaccharide synthesis loci of streptococci. Only the cps locus of S. pneumoniae type 19F was indicated, but the cps loci of S. pneumoniae type 23F, serotype 14 and serotype 1 have the same organization of the cpsABCDE region. White boxes show nucleotide sequences that share more than 95% identity with sequences of the eps locus of S. thermophilus CNRZ368. Grey boxes correspond to genes encoding proteins that share 43–69% identity with proteins encoded in S. thermophilus CNRZ368 and 70–80% identity to each other. Hatched boxes represent sequences that encode related proteins (30–40% identity), but these proteins are not related to proteins encoded in S. thermophilus CNRZ368. Black boxes correspond to sequences encoding unrelated proteins. Partially sequenced genes are shown by dashed lines. Other sequences are represented by thin lines. Du indicates the duplicated sequence of S. thermophilus Sfi6 that includes the end of epsF.
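The identity percentages used in the legends of Table 3 and Fig. 2 are derived from pairwise alignments. A minimal sketch of such a calculation is given below. It assumes two sequences that have already been aligned (for example, rows taken from a Clustal alignment) and it ignores columns containing a gap in either sequence; the text does not specify how gaps were treated, so this is an assumption. The function name `percent_identity` and the toy sequences are illustrative only.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the gap-free columns of a pairwise alignment.

    Assumption: columns with a gap in either sequence are skipped, so the
    value reflects only the positions where both sequences are aligned.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Toy alignment (hypothetical sequences, not taken from the eps locus).
identity = percent_identity("ATG-CA", "ATGGCT")
print(f"identity: {identity:.1f}%  divergence: {100.0 - identity:.1f}%")
```

Divergence, as used later in the text for mosaic genes, is simply 100 minus the identity computed this way.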

genes of the cps loci that encode the capsular polysaccharide biosynthesis of Streptococcus pneumoniae type 19F (Morona et al., 1997), type 23F (Coffey et al., 1998), serotype 1 (Munoz et al., 1997) and serotype 14 (Kolkman et al., 1997) and Streptococcus agalactiae COH1 (Rubens et al., 1993) (Table 3). These epsABCD encoded proteins also share the same identity with the putative products of the first pseudo-genes of the cps locus of S. pneumoniae type 3 (Arrecubieta et al., 1995). Conversely, the putative products of epsBCD share only 27–46% identity with products of the eps locus of L. lactis NIZOB40 (Van Kranenburg et al., 1997) (Table 3). However, the gene organization of the eps/cps loci is completely different between L. lactis and streptococci. The probes corresponding to the right region of the eps locus, I117, I118 and the B probe, which is included in I118 (Fig. 1), hybridized with the DNA of all S. thermophilus strains tested (Table 1). The I117 probe detected the same 6.4 kb EcoRI fragment in all tested strains, whereas I118 and the B probe revealed fragments of various sizes in these strains. Probes C and D, which are located to the left of the B probe, hybridized with fragments of various sizes in all S. thermophilus strains tested except IP6756 (Table 1). The G+C content of this region is 36.6%, which is close to the G+C content of the S. thermophilus genome (Farrow and Collins, 1984).

3.3. The epsEF region

The 1885 bp region containing epsEF (Fig. 1) was found in only some S. thermophilus strains (Table 1)

and shares more than 98% identity with the corresponding parts of the eps loci of S. thermophilus Sfi6 (Stingele et al., 1996) and MR-1C (Low et al., 1998) (Table 3; Fig. 2). Sequence comparison of epsF of S. thermophilus CNRZ368 and Sfi6 revealed a 2 nt deletion in epsF of S. thermophilus CNRZ368 and a 1 nt deletion in epsF of S. thermophilus Sfi6. These deletions suggest that frameshifts lead to a 3′ truncated pseudo-gene in CNRZ368 and a 5′ truncated pseudo-gene in Sfi6. epsF putative products of CNRZ368 and Sfi6 are related to the EpsG putative protein of Sfi6. Comparison of these three putative products also suggests the presence of these frameshifts in epsF of S. thermophilus CNRZ368 and Sfi6 (data not shown). Further sequence comparison revealed a 206 bp duplication within the eps locus of S. thermophilus Sfi6: one copy includes the 153 bp 3′ end of epsF and the 53 bp located downstream from this ORF and the other, which has only 5 nt difference, begins 18 bp downstream from epsG and contains the 32 bp 5′ end of epsH (Fig. 2). The putative EpsE protein also shares 43–46% identity with EpsD of L. lactis NIZOB40, with the C-terminal part of CpsE of S. thermophilus NCFB2393 and different serotypes of S. pneumoniae and with the C-terminal part of CpsD of S. agalactiae COH1 (Table 3; Fig. 2). Probes E, F and X hybridized to a 5.3 kb EcoRI fragment in three S. thermophilus strains and to a 6.2 kb EcoRI fragment in five strains, indicating that only eight out of the 17 S. thermophilus strains tested contain epsEF (Table 1). Ribotyping (Pébay et al., 1993) and IS1191-specific probe hybridization (data not shown) showed that the five strains that exhibit a 6.2 kb EcoRI

fragment are closely related. The size of this fragment is similar to the 5968 bp fragment found by sequence analysis in S. thermophilus Sfi6. This fragment contains epsBCDEFG, suggesting that the organization of this region is similar in S. thermophilus Sfi6 and five of our tested strains. The F probe hybridized with the DNA of L. lactis NCFB2045, one of the 14 L. lactis strains tested. The G+C content of epsEF (Table 2) is slightly lower than the G+C content of epsABCD and that of the S. thermophilus genome (Farrow and Collins, 1984), but it is similar to the 34.1–35.5% G+C content of the L. lactis genome (Bridge and Sneath, 1983).

3.4. The epsN–epsT region

The 11.7 kb epsN–epsT region is absent in nearly all S. thermophilus strains tested (Table 1). The I115 probe included in this region (Fig. 1) hybridized with the DNA of only three out of the 19 S. thermophilus strains tested (Table 1). The I114 and I116 probes hybridized with the DNA of two closely related strains, CNRZ368 and A054 (Table 1). The epsN–epsT region contains 11 ORF or pseudo-ORF (Fig. 1). The four putative proteins encoded by orf1, orf2, orf3 and orf4 do not show significant similarities to sequences of the GenPept, PIR and Swissprot databases. The putative proteins encoded by the other ORF share significant similarities (23–55% identity) with previously reported proteins involved in EPS synthesis (Table 3). Four of these proteins, EpsN, EpsO, EpsR and EpsT, share significant similarities to EpsF and EpsG of S. thermophilus Sfi6 (Stingele et al., 1996). The putative products of epsP, epsQ and epsS do not share significant similarities with proteins encoded by genes of the eps/cps loci of streptococci, but they share significant similarities with proteins encoded in distantly related bacteria, such as Shigella dysenteriae and Xanthomonas campestris (Table 3). Some ORF or pseudo-ORF encoding related proteins are clustered. The putative products of the adjacent genes epsP and epsQ share 30% identity and exhibit significant similarities with rhamnosyl transferases of Sh. dysenteriae, RfbQ and RfbR (Table 3), two proteins also encoded by adjacent genes (Klena and Schnaitman, 1993). epsN and epsO are also clustered and encode potential proteins that are distantly related to EpsF of S. thermophilus Sfi6. The G+C contents of the ORF within this region are highly heterogeneous, from 26.2 to 42.2% (Table 2). The G+C contents of the clustered genes epsP (36.5%) and epsQ (34.6%) are similar to each other but are different from those of adjacent sequences. In the same way, the G+C contents of the clustered genes epsN (39.5%) and epsO (42.2%) are similar to each other but are different from those of adjacent sequences. This variability of the G+C content suggests that the epsN–epsT region may consist of small fragments with different origins.

3.5. The epsL–IS981SC region

Most of the sequences of the 13.6 kb epsL–IS981SC region are closely related to sequences from L. lactis. A 1740 bp sequence containing orfB and the 765 bp 3′ end of epsL (Fig. 1) shares 97.6% identity with the 3′ part of the eps locus of L. lactis NIZOB40 (Van Kranenburg et al., 1997). Conversely, the 126 nt 5′ end of epsL of S. thermophilus CNRZ368 shares only 54% identity with the same region of L. lactis NIZOB40. The epsL–IS981SC region contains three ISS1 copies and two IS981 copies that share more than 98% identity with homologous ISS1 and IS981 from L. lactis (Guédon et al., 1995; Bourgoin et al., 1996). This region also contains two other IS, IS1194 (Bourgoin et al., 1998) and IS1193A (Fig. 1). The potential products of two other ORF present in the epsL–IS981SC region, epsU and epsV (Fig. 1), share significant similarities with the products of genes identified in loci involved in EPS synthesis (Table 3). With the exception of the I102V probe, all probes isolated from this epsL–IS981SC region (Fig. 1) hybridized with DNA of various L. lactis strains (Table 1). This suggests that the entire epsL–IS981SC region was acquired by horizontal transfer from L. lactis.

The epsL–IS981SC region also contains a 646 bp sequence located to the left of ISS1SA (Fig. 1) that shares 97.5% identity with orf14.9, an ORF located at the 3′ end of the eps locus of S. thermophilus Sfi6, the orf14.9 potential promoter and the orf14.9 RBS (Stingele et al., 1996). The last 8 bp of orf14.9 and the downstream sequence are replaced by an unrelated sequence in S. thermophilus Sfi6. We also found that the 185 bp left part of ISS1SA, an α ISS1 element (Bourgoin et al., 1996), shares 97.8% identity with a previously unidentified α ISS1 sequence located at the 5′ end of orf14.9 of S. thermophilus Sfi6 (GenBank accession number U40830). orf14.9 and α ISS1 are divergent in S. thermophilus Sfi6, whereas they are convergent in S. thermophilus CNRZ368 (Fig. 2). The DNA of all S. thermophilus strains tested hybridized with a specific probe of orf14.9, A400 (Table 1). Another probe, I101U, revealed related sequences in nine out of the 19 S. thermophilus strains tested. The I102V and I114 probes revealed related sequences in only two closely related strains, CNRZ368 and A054 (Table 1). The ORF located in the epsL–IS981SC region have a variable G+C content, from 29.5 to 39.3% (Table 2). Some of them have a lower G+C content than those usually found in the S. thermophilus genome (37–40%; Farrow and Collins, 1984) or in the L. lactis genome (34–36%; Bridge and Sneath, 1983), suggesting another origin for these genes.
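The G+C comparisons made in this and the preceding sections amount to computing a percentage for each ORF and checking where it falls relative to a reference range. The sketch below illustrates that reasoning. The two genome ranges are the ones quoted in the paragraph above; the function names, the toy sequence and the choice to ignore ambiguous bases are assumptions made for illustration.

```python
def gc_content(seq: str) -> float:
    """G+C percentage of a nucleotide sequence, counting only A, C, G and T."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


def position_in_range(value: float, low: float, high: float) -> str:
    """Report whether a value lies below, within or above a closed range."""
    if value < low:
        return "below"
    if value > high:
        return "above"
    return "within"


# Genome-wide ranges quoted in the text (percent G+C).
S_THERMOPHILUS_RANGE = (37.0, 40.0)
L_LACTIS_RANGE = (34.0, 36.0)

# Toy sequence (hypothetical, for demonstration only).
print(gc_content("ATGCATGCAA"))

# Extremes reported for the ORF of the epsL–IS981SC region.
for gc in (29.5, 39.3):
    print(
        gc,
        position_in_range(gc, *S_THERMOPHILUS_RANGE),
        position_in_range(gc, *L_LACTIS_RANGE),
    )
```

A value that falls below both ranges, such as 29.5%, is the situation the text interprets as pointing to an origin other than S. thermophilus or L. lactis.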

3.6. The epsW–pgm region

The 2.2 kb epsW–pgm region located on the left part of the eps locus is present in all or almost all of the S. thermophilus strains tested (Table 1). The SE probe hybridized with the DNA of 17 out of the 19 S. thermophilus strains tested, whereas the I120 probe hybridized with the DNA of all tested strains (Table 1). The epsW–pgm region contains three ORF or pseudo-ORF: epsW, orf5 and pgm. The putative product encoded by epsW shares significant similarities with proteins involved in exopolysaccharide synthesis. The putative product of orf5 shares significant similarities with the putative products of various unknown ORF of E. coli (b0844 and b0822 with accession numbers AE000186 and AE000184 respectively) and Bacillus subtilis (yitU, Y09476). pgm encodes a putative protein that shares significant similarities to phosphoglycerate mutase of different organisms (Table 3).

3.7. Duplication of the eps locus in S. thermophilus CNRZ391 and CNRZ404

Probes isolated from the epsBCDEF region revealed the same two EcoRI fragments in S. thermophilus CNRZ391 (Table 1). The F probe also hybridized with two fragments on ScaI and Asp718 digests (data not shown). Similar results were obtained with EcoRI digests of S. thermophilus CNRZ404 DNA (Table 1). These results suggest that the epsBCDEF region, or a larger sequence of the eps locus, is duplicated in these strains.

4. Discussion

A 32.5 kb variable locus of the S. thermophilus CNRZ368 chromosome, the eps locus, could be involved in exopolysaccharide synthesis. On the basis of similarities with proteins involved in the EPS synthesis of various bacteria, putative functions could be assigned to 16 out of the 25 ORF identified in this region (Table 3). The eps locus was detected by hybridization in all S. thermophilus strains tested, although only three S. thermophilus strains, NST2280, CNRZ391 and CNRZ404, were found to be ropy strains in skim milk (data not shown). Undetectable EPS production or EPS production under other conditions cannot be excluded in strains found to be non-ropy, like S. thermophilus CNRZ368. The 18 right ORF of the eps locus of S. thermophilus CNRZ368 have the same orientation and could be part of a single operon as described in L. lactis NIZOB40 (Van Kranenburg et al., 1997); however, some putative promoters and terminators have been identified in the eps locus (Fig. 1). The right part of the eps locus seems to contain functional genes, epsABCDE. These genes

are also present in the eps/cps loci identified in two ropy strains of S. thermophilus, Sfi6 (Stingele et al., 1996) and MR-1C (Low et al., 1998). Several other genes of the eps locus of S. thermophilus CNRZ368 contain frameshifts or stop codons (epsF, epsN, epsS, epsT, orf5) and, therefore, are pseudo-genes. Sequence comparison of the epsF gene from S. thermophilus Sfi6 and CNRZ368 shows a 1 nt deletion in S. thermophilus Sfi6 and suggests that epsF of this ropy strain is also a pseudo-gene. Two other ORF of S. thermophilus CNRZ368, epsW and epsO, encode proteins that share significant similarities to only the N-terminal part of related proteins and could also be pseudo-genes. Several distantly related genes encoding similar functions are present, and sometimes clustered in the eps locus of S. thermophilus CNRZ368 (epsU and epsW; epsP and epsQ; epsN, epsO, epsR, epsT and epsF ) and S. thermophilus sfi6 (epsF and epsG). A detailed sequence comparison indicates that short fragments of epsA, epsB, epsC and epsF from the eps locus of S. thermophilus CNRZ368 exhibit high divergence (9.5–39.1%) with corresponding sequences of the eps/cps loci of S. thermophilus Sfi6 (Stingele et al., 1996) and NCFB2393 (Griffin et al., 1996). The remaining sequences of these genes exhibit only 0.7–4.2% divergence ( Fig. 3). Therefore, epsA, epsB, epsC and epsF that have highly variable divergence with related sequences are mosaic genes. ISS1SC, which is also located in the eps locus of S. thermophilus CNRZ368, is a mosaic of a ISS1 and b ISS1 elements that are 15% divergent (Bourgoin et al., 1996). These mosaic genes probably resulted from homeologous recombination (homologous recombination between 5–15% divergent sequences) or illegitimate recombination between distantly related sequences, suggesting that horizontal transfers of DNA have occurred and that the transferred sequences have replaced a part of the original sequences. The highly variable eps locus of S. 
thermophilus contains two small constant regions, epsAB and pgm, that flank a variable region. This variable region could also be divided into two variable regions flanking the small constant region that contains orf14.9. A high variability of loci involved in exopolysaccharide synthesis has been described in other bacteria, such as S. pneumoniae (Morona et al., 1997) and Staphylococcus aureus (Sau et al., 1997). The first genes of the cps locus of S. pneumoniae, cpsAB, and the homologous epsAB genes of the eps locus of S. thermophilus are present in all strains tested. This suggests that the constant region containing epsAB could come from a common ancestor of all these S. thermophilus strains. Similarly, the cpsCD genes of S. pneumoniae and the homologous epsCD genes of S. thermophilus are present in almost all strains tested. This suggests that epsCD could come from a common ancestor of S. thermophilus strains and that the sequence was deleted or replaced in some strains.

Fig. 3. Nucleotide sequence comparison of the eps/cps loci of S. thermophilus CNRZ368, Sfi6 and NCFB2393. The sequences of S. thermophilus MR-1C were not used in this analysis because only fragments of epsA, epsB, epsC, epsD, epsE and epsF have been sequenced by Low et al. (1998). Grey and black boxes correspond to sequences that show 9.5 to 15% divergence and 30 to 40% divergence respectively. These highly divergent regions are not taken into account in the divergence percentage of each gene, which is indicated in parentheses. Thin lines represent unrelated sequences, and partially sequenced genes are shown by dashed lines.

In S. pneumoniae, the region next to epsCD is highly variable and is adjacent to a constant region located outside the cps loci. By contrast, the next region of the eps locus of S. thermophilus could be divided into different regions. The epsEF region is present in only nine S. thermophilus strains (Table 1) and could have replaced the cpsE/cpsD genes, similar to those found in S. thermophilus NCFB2393 and other streptococci, in an ancestor common to S. thermophilus CNRZ368, Sfi6 and seven other strains. The epsN–epsT region is highly variable and is present in its entirety in only two closely related strains, CNRZ368 and A054, and partially present in S. thermophilus CNRZ388. This region has a highly heterogeneous G+C content, suggesting that it could be divided into smaller fragments acquired from various origins by horizontal transfers. This region is adjacent to a small constant region that contains orf14.9. Sequences closely related to this ORF are present in all S. thermophilus strains tested in this work (Table 1) and in S. thermophilus Sfi6 (Stingele et al., 1996). This suggests that orf14.9 could have been present in the common ancestor of all S. thermophilus strains. In S. thermophilus Sfi6, orf14.9 is located at the 3′ end of the sequenced eps locus, but several genes involved in EPS synthesis could be present upstream from this gene. Stingele et al. (1996) showed that the cloned eps locus of S. thermophilus Sfi6 could direct EPS synthesis and secretion in L. lactis MG1363, a non-EPS-producing heterologous host. However, they did not demonstrate that the EPS synthesized in the transformed strain is identical to that of S. thermophilus Sfi6. orf14.9 is adjacent to another highly variable region, the epsL–IS981SC region, which is present in only some S. thermophilus strains. The last region notably contains pgm, which was detected in all S. thermophilus strains tested and seems to be outside the eps locus.

Hybridization results
for S. thermophilus strains and sequence comparison of the eps/cps loci of S. thermophilus CNRZ368, Sfi6 and NCFB2393 suggest that the eps loci of S. thermophilus strains have undergone numerous rearrangements leading to chimeric loci. Some of the rearrangement points could be precisely located: between epsB and epsC, at 5 nt in the 3′ end of epsD, at 26 nt from the 3′ end of epsF, within the epsN–epsT region, at 8 nt in the 3′ end of orf14.9, between orf14.9 and epsU, between epsU and epsW, and in the epsW–pgm region. In the same way, a detailed sequence comparison of the eps/cps loci of S. thermophilus CNRZ368 and L. lactis NIZOB40 suggests another rearrangement point at 126 nt in the 5′ end of epsL. These rearrangements could have occurred by homeologous or illegitimate recombination between distantly related or unrelated sequences.

Seven insertion sequences have been identified in the eps locus of S. thermophilus CNRZ368 and could be involved in some of the rearrangements that occurred in the left part of the locus. IS elements or vestiges of these elements have also been found at the extremities of various cps/eps loci, for example IS1202 in S. pneumoniae type 19F (Morona et al., 1997), IS1167 in S. pneumoniae type 1 (Munoz et al., 1997) and IS982 in L. lactis NIZOB40 (Van Kranenburg et al., 1997).

Several pieces of evidence indicate that recent horizontal transfers between S. thermophilus and L. lactis could be involved in the chimeric structure and polymorphism of the cps/eps loci of both species. (i) A specific probe of epsF and most of the probes of the epsL–IS981SC region, including the specific probes of IS981, ISS1 and IS1194 (Guédon et al., 1995; Bourgoin et al., 1996, 1998), hybridized with closely related sequences of various L. lactis strains. (ii) The distribution of IS981
and ISS1 suggests that horizontal transfer of both IS occurred from L. lactis to S. thermophilus in co-cultures used in cheese manufacture (Guédon et al., 1995, 1998; Bourgoin et al., 1996). (iii) Various sequences of the epsL–IS981SC region, including orfB, epsL, three ISS1 copies and two IS981 copies, are nearly identical to sequences previously identified in L. lactis (this work; Guédon et al., 1995; Bourgoin et al., 1996). This suggests that epsF was exchanged between S. thermophilus and L. lactis and that the 13.6 kb epsL–IS981SC region was transferred from L. lactis to S. thermophilus. This region may have been integrated into the S. thermophilus chromosome by recombination at 126 nt in the 5′ end of epsL at one end and in IS981SC at the other end, since the duplication generally caused by transposition was not found (Bourgoin et al., 1996). However, sequences closely related to IS1193A (data not shown) and epsV of the epsL–IS981SC region have not been detected by hybridization in the L. lactis strains tested. IS1193A could have transposed after the acquisition of this region. Therefore, the evolution of the 13.6 kb epsL–IS981SC region could be more complex than a single transfer from L. lactis to S. thermophilus. A single copy of IS1194 was found in only a few strains of S. thermophilus and L. lactis (Bourgoin et al., 1998), suggesting that this IS was acquired earlier from another species of bacterium.

The G+C content of the eps locus of S. thermophilus CNRZ368 is highly heterogeneous (26.2 to 42.2%). In some genes, the G+C content is very different from that of the S. thermophilus genome (Farrow and Collins, 1984) and the L. lactis genome (Bridge and Sneath, 1983). This suggests that these genes could come from different origins, perhaps from other lactic acid bacteria used in co-cultures. In the same way, the G+C content of the epsG–epsM region of the eps locus of S. thermophilus Sfi6 is low (about 30%; Stingele et al., 1996), suggesting that this region, which is completely unrelated to that of S. thermophilus CNRZ368, could also have been acquired by horizontal transfer. The presence of a partially sequenced α ISS1 element at the end of the eps locus of S. thermophilus Sfi6 also suggests exchanges between S. thermophilus and L. lactis.

To our knowledge, previous studies of genes involved in polysaccharide synthesis have only suggested recent horizontal transfer within the same species. Three of these studies have focused on S. pneumoniae (Coffey et al., 1998), Vibrio cholerae (Bik et al., 1995) and Haemophilus influenzae (Kroll and Moxon, 1990). Our study strongly suggests that recent horizontal transfers have occurred between the two distantly related species of lactic acid bacteria, S. thermophilus and L. lactis, and also between other species or genera. These exchanges are probably involved in the variability of the eps locus and in the appearance of novel eps locus structures, and may lead to the acquisition of novel functions.
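
As an illustration only, the two sequence-level arguments used in this discussion (local divergence between aligned genes, as summarized in Fig. 3, and local G+C content) can be sketched in Python. This is a minimal sketch and not the procedure used in this work: the function names, the window size and the toy sequences are hypothetical, the sequences are assumed to be already aligned, and alignment columns containing a gap are simply skipped, which is an assumption rather than a stated method.

```python
from typing import List, Tuple


def gc_percent(seq: str) -> float:
    """G+C content (%) of a nucleotide sequence, counting only A, C, G and T."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("no A, C, G or T in sequence")
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


def percent_divergence(a: str, b: str) -> float:
    """Mismatches (%) between two aligned sequences of equal length.

    Assumption: columns with a gap ('-') in either sequence are ignored.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable columns")
    return 100.0 * sum(x != y for x, y in pairs) / len(pairs)


def windows(length: int, window: int, step: int) -> List[Tuple[int, int]]:
    """Start/end coordinates of the successive windows that fit in `length`."""
    return [(s, s + window) for s in range(0, length - window + 1, step)]


def windowed_gc(seq: str, window: int, step: int) -> List[Tuple[int, float]]:
    """G+C content (%) of each window, keyed by window start."""
    return [(s, gc_percent(seq[s:e]))
            for s, e in windows(len(seq), window, step)]


def windowed_divergence(a: str, b: str, window: int,
                        step: int) -> List[Tuple[int, float]]:
    """Divergence (%) of each alignment window, keyed by window start."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    return [(s, percent_divergence(a[s:e], b[s:e]))
            for s, e in windows(len(a), window, step)]


if __name__ == "__main__":
    # Hypothetical toy alignment: identical first half, divergent second half.
    ref = "ATGCATGCAT" + "GGCCGGCCGG"
    alt = "ATGCATGCAT" + "GGACGTCAGG"
    for start, div in windowed_divergence(ref, alt, window=10, step=10):
        print("divergence", start, round(div, 1))
    for start, gc in windowed_gc(ref, window=10, step=10):
        print("G+C", start, round(gc, 1))
```

In such a profile, a gene whose windows alternate between low and high divergence would be a candidate mosaic gene, and a locus whose windows differ widely in G+C content would be consistent with fragments of different origins. Real data would require a proper alignment and window sizes suited to the gene lengths.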

References

Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J., 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.
Arrecubieta, C., Garcia, E., Lopez, R., 1995. Sequence and transcriptional analysis of a DNA region involved in the production of capsular polysaccharide in Streptococcus pneumoniae type 3. Gene 167, 1–7.
Bik, E.M., Bunschoten, A.E., Gouw, R.D., Mooi, F.R., 1995. Genesis of the novel epidemic Vibrio cholerae O139 strain: evidence for horizontal transfer of genes involved in polysaccharide synthesis. EMBO J. 14, 209–216.
Bourgoin, F., Guédon, G., Gintz, B., Decaris, B., 1998. Characterization of a novel insertion sequence, IS1194, in Streptococcus thermophilus. Plasmid 40, 44–49.
Bourgoin, F., Guédon, G., Pébay, M., Roussel, Y., Panis, C., Decaris, B., 1996. Characterization of a mosaic ISS1 element and evidence for the recent horizontal transfer of two different types of ISS1 between Streptococcus thermophilus and Lactococcus lactis. Gene 178, 15–23.
Bridge, P.D., Sneath, P.H.A., 1983. Numerical taxonomy of Streptococcus. J. Gen. Microbiol. 129, 565–596.
Cerning, J., 1990. Exocellular polysaccharides produced by lactic acid bacteria. FEMS Microbiol. Rev. 87, 113–130.
Coffey, T.J., Enright, M.C., Daniels, M., Morona, J.K., Morona, R., Hryniewicz, W., Paton, J.C., Spratt, B.G., 1998. Recombinational exchanges at the capsular polysaccharide biosynthetic locus lead to frequent serotype changes among natural isolates of Streptococcus pneumoniae. Mol. Microbiol. 27, 73–83.
Colmin, C., Pébay, M., Simonet, J.M., Decaris, B., 1991. A species specific DNA probe obtained from Streptococcus salivarius subsp. thermophilus detects strain restriction polymorphism. FEMS Microbiol. Lett. 81, 123–128.
Dodd, H.M., Horn, N., Gasson, M.J., 1994. Characterization of IS905, a new multicopy insertion sequence identified in lactococci. J. Bacteriol. 176, 3393–3396.
Dower, W.J., Miller, J.F., Ragsdale, C.W., 1988. High efficiency transformation of Escherichia coli by high voltage electroporation. Nucleic Acids Res. 16, 6127–6145.
Farrow, J.A.E., Collins, M.D., 1984. DNA base composition, DNA–DNA homology and long chain fatty acid studies on Streptococcus thermophilus and Streptococcus salivarius. J. Gen. Microbiol. 130, 357–362.
Griffin, A.M., Morris, V.J., Gasson, M.J., 1996. The cpsABCDE genes involved in polysaccharide production in Streptococcus salivarius ssp. thermophilus NCFB2393. Gene 183, 23–27.
Guédon, G., Bourgoin, F., Decaris, B., 1998. Does gene horizontal transfer occur in lactic acid bacteria co-cultures? Lait 78, 53–58.
Guédon, G., Bourgoin, F., Pébay, M., Roussel, Y., Colmin, C., Simonet, J.M., Decaris, B., 1995. Characterization and distribution of two insertion sequences, IS1191 and iso-IS981, in Streptococcus thermophilus: does intergeneric transfer of ISs occur in lactic acid bacteria co-cultures? Mol. Microbiol. 16, 69–78.
Higgins, D.G., Sharp, P.M., 1988. Clustal: a package for performing multiple alignments on a microcomputer. Gene 73, 237–244.
Hopwood, D.A., Bibb, M.J., Chater, K.F., Kieser, T., Bruton, C.J., Kieser, H.M., Lydiate, D.J., Smith, C.P., Ward, J.M., Schrempf, H., 1985. Genetic Manipulation of Streptomyces: A Laboratory Manual. The John Innes Institute, Norwich.
Klena, J.D., Schnaitman, C.A., 1993. Function of the rfb gene cluster and the rfe gene in the synthesis of O-antigen by Shigella dysenteriae 1. Mol. Microbiol. 9, 393–402.
Kolkman, M.A.B., Wakarchuk, W., Nuijten, P.J.M., Van der Zeijst, B.A.M., 1997. Capsular polysaccharide synthesis in Streptococcus pneumoniae serotype 14: molecular analysis of the complete cps locus and identification of genes encoding glycosyltransferases required for biosynthesis of the tetrasaccharide subunit. Mol. Microbiol. 26, 197–208.
Kroll, J.S., Moxon, E.R., 1990. Capsulation in distantly related strains of Haemophilus influenzae type b: genetic drift and gene transfer at the capsulation locus. J. Bacteriol. 172, 1374–1379.
Larbi, D., Colmin, C., Rousselle, L., Decaris, B., Simonet, J.M., 1990. Genetic and biological characterization of nine Streptococcus thermophilus bacteriophages. Lait 70, 107–116.
Low, D., Ahlgren, J.A., Horne, D., McMahon, D.J., Oberg, C.J., Broadbent, J.R., 1998. Role of Streptococcus thermophilus MR-1C capsular exopolysaccharide in cheese moisture retention. Appl. Environ. Microbiol. 64, 2147–2151.
Ludwig, W., Kirchof, G., Klugbauer, N., Weizenegger, M., Betzl, D., Ehrmann, M., Hertel, C., Jilg, S., Tatzel, R., Zitzelsberger, H., Liebl, S., Hochberger, M., Shah, J., Lane, D., Wallnoef, P.R., 1992. Complete 23S ribosomal RNA sequences of Gram-positive bacteria with a low DNA G+C content. Syst. Appl. Microbiol. 15, 487–501.
Mierendorf, R.C., Pfeffer, D., 1987. Direct sequencing of denatured plasmid DNA. Methods Enzymol. 152, 556–562.
Morona, J.K., Morona, R., Paton, J.C., 1997. Characterization of the locus encoding the Streptococcus pneumoniae type 19F capsular polysaccharide biosynthetic pathway. Mol. Microbiol. 23, 751–763.
Munoz, R., Mollerach, M., Lopez, R., Garcia, E., 1997. Molecular organization of the genes required for the synthesis of type 1 capsular polysaccharide of Streptococcus pneumoniae: formation of binary encapsulated pneumococci and identification of cryptic dTDP-rhamnose biosynthesis genes. Mol. Microbiol. 25, 79–92.
Pébay, M., Colmin, C., Guédon, G., Simonet, J.M., Decaris, B., 1993. Chromosomal genetic instability in Streptococcus thermophilus. Lait 73, 181–190.
Pébay, M., Roussel, Y., Simonet, J.M., Decaris, B., 1992. High-frequency deletion involving closely spaced rRNA gene sets in Streptococcus thermophilus. FEMS Microbiol. Lett. 98, 51–56.
Roussel, Y., Bourgoin, F., Guédon, G., Pébay, M., Decaris, B., 1997. Analysis of the genetic polymorphism between three Streptococcus thermophilus strains by comparing their physical and genetic organization. Microbiology 143, 1335–1343.
Rubens, C.E., Heggen, L.M., Haft, R.F., Wessels, M.R., 1993. Identification of cpsD, a gene essential for type III capsule expression in group B streptococci. Mol. Microbiol. 8, 843–855.
Sambrook, J., Fritsch, E.F., Maniatis, T., 1989. Molecular Cloning: A Laboratory Manual, 2nd edition. Cold Spring Harbor Laboratory Press, New York.
Sau, S., Bhasin, N., Wann, E.R., Lee, J.C., Foster, T.J., Lee, C.Y., 1997. The Staphylococcus aureus allelic genetic loci for serotype 5 and 8 capsule expression contain the type-specific genes flanked by common genes. Microbiology 143, 2395–2405.
Stingele, F., Neeser, J.R., Mollet, B., 1996. Identification and characterization of the eps (exopolysaccharide) gene cluster from Streptococcus thermophilus Sfi6. J. Bacteriol. 178, 1680–1690.
Van Kranenburg, R., Marugg, J.D., Van Swam, I.I., Willem, N.J., De Vos, W.M., 1997. Molecular characterization of the plasmid-encoded eps gene cluster essential for exopolysaccharide biosynthesis in Lactococcus lactis. Mol. Microbiol. 24, 387–397.