Complete genome sequences and comparative genome analysis of Lactobacillus plantarum strain 5-2 isolated from fermented soybean

Chen-Jian Liu, Rui Wang, Fu-Ming Gong, Xiao-Feng Liu, Hua-Jun Zheng, Yi-Yong Luo, Xiao-Ran Li

PII: S0888-7543(15)30022-7
DOI: 10.1016/j.ygeno.2015.07.007
Reference: YGENO 8759

To appear in: Genomics

Received date: 28 January 2015
Revised date: 16 July 2015
Accepted date: 17 July 2015

Please cite this article as: Chen-Jian Liu, Rui Wang, Fu-Ming Gong, Xiao-Feng Liu, Hua-Jun Zheng, Yi-Yong Luo, Xiao-Ran Li, Complete genome sequences and comparative genome analysis of Lactobacillus plantarum strain 5–2 isolated from fermented soybean, Genomics (2015), doi: 10.1016/j.ygeno.2015.07.007

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT

Chen-Jian Liu (a, #), Rui Wang (b, #), Fu-Ming Gong (a), Xiao-Feng Liu (a), Hua-Jun Zheng (b), Yi-Yong Luo (a), Xiao-Ran Li (a, *)

(a) Faculty of Life Science and Technology, Kunming University of Science and Technology, Kunming, Yunnan 650500, China

(b) Shanghai-MOST Key Laboratory of Health and Disease Genomics, Chinese National Human Genome Center at Shanghai, Shanghai, China

(*) Corresponding author

Name: Xiao Ran Li

Mailing address: Faculty of Life Science and Technology, Kunming University of Science and Technology, Chenggong, Kunming 650500, Yunnan, China

Phone: 86-871-592-0759 Fax: 86-871-592-0759

E-mail: [email protected]

(#) The first two authors contributed equally to this work.

Abstract: Lactobacillus plantarum is an important probiotic and is mostly isolated from fermented foods. We sequenced the genome of L. plantarum strain 5-2, which was isolated from fermented soybean from Yunnan Province, China. The genome was determined to contain 3,114 genes. Fourteen complete insertion sequence (IS) elements were found in the 5-2 chromosome. There were 24 DNA replication proteins and 76 DNA repair proteins in the 5-2 genome. Consistent with the classification of L. plantarum as a facultative heterofermentative lactobacillus, the 5-2 genome encodes the key enzymes required for the EMP (Embden-Meyerhof-Parnas) and phosphoketolase (PK) pathways. Several components of the secretion machinery were found in the 5-2 genome, which was compared with the genomes of L. plantarum ST-III, JDM1 and WCFS1. Most of the specific proteins in the four genomes appeared to be related to their prophage elements.

Keywords: Genome; Bacteria; Lactobacillus plantarum

1. Introduction

Lactic acid bacteria (LAB) are widely distributed in nature and have a rich species diversity. LAB play a significant role in the production of fermented foods that are used as probiotics, and they can improve the nutritional value, flavor, preservation and added value of fermented foods. Furthermore, LAB possess unique physiological activities and nutritional functions. These bacteria produce antimicrobial agents such as acids, hydrogen peroxide and bacteriocins, and they have great potential as food preservatives [1, 2]. In addition, LAB can be added to various foods as probiotics to promote digestion and absorption, maintain the balance of the intestinal flora and enhance human immunity.

Lactobacillus plantarum is an important LAB that resides in a variety of environmental niches, such as all types of fermented foods and certain parts of the human body [3-5]. L. plantarum is commonly applied in industrial settings because of its versatility and metabolic capacity, and some strains are being increasingly marketed as starter cultures or probiotics [6]. Research conducted in recent years has led to the conclusion that L. plantarum can promote health in humans and animals [7, 8]. L. plantarum significantly reduces the concentration of cholesterol and fibrinogen [9], and it also reduces the risk of cardiovascular disease and thus prevents atherosclerosis in smokers [10].

L. plantarum has one of the largest genomes known among the LAB, and it is a very flexible and versatile species [11]. According to the information released by GOLD (http://www.genomesonline.org), studies of 43 genomes of L. plantarum are underway or have been completed, among which four L. plantarum genomes are publicly available [12-15]. The genome of L. plantarum WCFS1 was the first whole genome of L. plantarum to be sequenced [12]. The L. plantarum strain 5-2 that was sequenced in the present study was originally isolated from fermented soybeans in Yunnan Province, China. In our previous studies, L. plantarum strain 5-2 was found to possess several probiotic properties that are beneficial to its production and life. On the one hand, L. plantarum strain 5-2 has a broad capacity to inhibit the growth of pathogens; on the other hand, it has a high capacity to produce organic acids, especially a relatively high yield of phenyllactic and gamma-aminobutyric acids. Considering these advantages, we report here the complete genome sequence of L. plantarum strain 5-2 and a comparative genome analysis of strain 5-2 and other publicly available L. plantarum genomes.

2. Results and discussion

2.1 Genome features

The L. plantarum strain 5-2 (L. plantarum 5-2) genome contains a single circular chromosome of 3,237,652 bp with a GC content of 44.7% (Fig. 1). We identified 3,114 genes in the genome with an average length of 876 bp and a mean GC content of 45.7%; together they occupied 84.3% of the genome (Table 1). Among the 3,114 genes, 2,326 (74.7%) could be assigned to Clusters of Orthologous Groups (COG) families comprising 20 functional categories (Table S1). Biological functions were defined for 2,484 (79.8%) genes, and 587 genes were homologous to conserved proteins with unknown functions in other organisms. The remaining 43 genes encoded hypothetical proteins with no matches to any known proteins. In addition, 16 rRNA genes and 72 tRNA genes were identified in the genome.

In the L. plantarum 5-2 genome, 41 lipoproteins, 34 secreted proteins and 77 transmembrane proteins were identified, which indicated that 4.8% of the encoded proteins in this strain were associated with the extracellular environment. Accordingly, a lipoprotein signal peptidase gene (JM48_1567) and a prolipoprotein diacylglyceryl transferase gene (JM48_0594) were predicted; three signal peptidase I genes (JM48_2467, JM48_2468 and JM48_3112) were also found.

2.2 IS elements

Fourteen complete insertion sequence (IS) elements were detected in the genome (Table S2). Five of these elements were orthologs of the known elements ISP1, ISLpl1 and ISLsa1, which were identified using the IS Finder software; the other nine were classified into the IS4 family. Moreover, eight truncated transposases were also found in the 5-2 genome, bringing the total to 22 IS elements. ISLpl1 consists of 1,043 bp and encodes a 309-amino-acid transposase belonging to the IS30 family. Three complete copies (JM48_1003, JM48_1020 and JM48_3106) of ISLpl1 were present in the 5-2 genome. ISLsa1 is 1,035 bp and encodes a 306-amino-acid transposase that also belongs to the IS30 family. One complete ISLsa1 (JM48_1019) was identified in the genome. ISP1 is 1,433 bp and encodes a 420-amino-acid transposase that belongs to the ISL3 family. One complete ISP1 (JM48_0256) was found in the genome.
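The summary figures in Sections 2.1 and 2.2 can be cross-checked with simple arithmetic. The sketch below is illustrative only: it uses the counts quoted in the text, the helper name `percent` is hypothetical, and the coding density is approximate because the mean gene length (876 bp) is itself a rounded value.

```python
# Figures quoted in Sections 2.1 and 2.2 of the text.
GENOME_BP = 3_237_652      # chromosome length
N_GENES = 3_114            # predicted genes
MEAN_GENE_BP = 876         # average gene length (rounded in the text)


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Approximate share of the chromosome covered by genes.
coding_density = percent(N_GENES * MEAN_GENE_BP, GENOME_BP)

# Share of genes assigned to COG families and with defined functions.
cog_fraction = percent(2_326, N_GENES)
defined_fraction = percent(2_484, N_GENES)

# The three annotation classes should add up to the full gene set.
annotation_total = 2_484 + 587 + 43

# Complete IS elements by type, as listed in Section 2.2.
complete_is = {"ISLpl1": 3, "ISLsa1": 1, "ISP1": 1, "IS4 family": 9}

print(coding_density, cog_fraction, defined_fraction)
print(annotation_total == N_GENES, sum(complete_is.values()))
```

Under these inputs the derived percentages agree with the 84.3%, 74.7% and 79.8% reported in the text, and the per-type IS counts sum to the fourteen complete elements.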

2.3 DNA replication and DNA repair system

The oriC region is frequently located within the rnpA-rpmH-dnaA-dnaN-recF-gyrB gene cluster and is usually next to the dnaA gene [16]. In the 5-2 genome, the complete rnpA-rpmH-dnaA-dnaN-recF-gyrB gene cluster was found. The rnpA-rpmH genes (JM48_3126-JM48_3127) were adjacent to dnaA-dnaN (JM48_0001-JM48_0002), and recF-gyrB (JM48_0004-JM48_0005) was also next to dnaA-dnaN. No features of a terminus of replication were identified; however, a GC skew inversion was found at position 1,637,923, which could be considered the terminus of replication.

We found 24 DNA replication proteins in the 5-2 genome (Table S3, Sheet A). The central enzyme, the DNA polymerase III holoenzyme, was encoded by seven genes that separately encoded the subunits alpha (DnaE, JM48_1669), beta (DnaN, JM48_0002), delta (HolA, JM48_1858), delta' (HolB, JM48_0549), gamma/tau (DnaX, JM48_0543), epsilon (DnaQ, JM48_0645) and a Gram-positive-type alpha (PolC, JM48_1784), which has been known to endow the strain with 3' to 5' exonuclease activity [17]. In addition to DNA polymerase III, four genes were also involved in DNA elongation, including two RNaseH genes (JM48_1629 and JM48_2240), one DNA ligase gene (ligA, JM48_0922) and the DNA polymerase I gene (polA, JM48_1267). In addition to dnaA, seven genes were found to participate in DNA replication initiation, including two single-stranded DNA-binding protein genes (ssb, JM48_0008 and JM48_0951), two dnaB genes (JM48_0012 and JM48_1271), one dnaI gene (JM48_1272), one dnaG gene (JM48_1724) and one hupB gene (JM48_1652). Moreover, five DNA topoisomerase genes were also found, including two DNA gyrase genes (gyrB, JM48_0005 and gyrA, JM48_0006), two DNA topoisomerase IV genes (parC, JM48_1619 and parE, JM48_1620) and one DNA topoisomerase I gene (topA, JM48_1627). However, no gene was identified to encode a DNA replication termination factor.

We found 76 DNA repair proteins in the 5-2 genome (Table S3, Sheet B). Among these, 14 genes were involved in base excision repair, 11 genes were related to nucleotide excision repair, 20 genes participated in homologous recombination, five genes were associated with non-homologous end-joining and the ogt gene (JM48_2453) was important for direct repair.

2.4 Transcription and translation

One hundred and ninety-nine genes were predicted to be involved in transcription in the 5-2 genome (Table S4). Among these, seven genes encoded the DNA-dependent RNA polymerase subunits (i.e., alpha, beta, beta', delta, sigma, sigma-54 and omega). Transcription elongation and termination in the 5-2 genome were regulated by three Nus factors (NusA, NusB and NusG), two Gre factors (GreA and GreB) and one termination factor (Rho). Bacterial GreA and GreB are cleavage factors necessary for the natural progression of RNA polymerase; they promote transcription elongation by stimulating an endogenous, endonucleolytic transcript cleavage activity of the RNA polymerase [18]. NusA can induce transcription pausing or stimulate anti-termination together with NusB and NusG. The Rho factor interacts with the elongation factors NusA and NusG to regulate termination and antitermination of transcription, and it is required to suppress the toxic activity of foreign genes [19]. Ninety-four transcription regulators belonging to 21 families and 90 unclassified transcription regulators were found in the 5-2 genome. These included one heat-inducible transcription repressor hrcA (JM48_1767), one glutamine synthetase repressor glnR (JM48_1328), two LacI family repressors (JM48_0160 and JM48_2977) and one GntR family arabinose operon repressor araR (JM48_3040), which likely provided negative regulation [20].

A total of 103 genes were involved in translation in the 5-2 genome, including 54 ribosomal proteins, 31 tRNA synthetase genes, 16 translation factors and two RNA transport proteins (Table S5). The translation factors included one ribosome-binding factor (RbfA, JM48_1778), one SUA5 family translation factor (RimN, JM48_2077), three initiation factors (InfA, InfB and InfC), five elongation factors and six peptide chain release factors.

2.5 Transport

L. plantarum is a versatile and flexible organism and is able to grow on a wide variety of sugar sources. This phenotypic trait is reflected by the high number of genes encoding putative sugar transporters [12]. The 5-2 genome transport system consisted of 240 genes, which mainly constituted the phosphotransferase system (PTS) and the ATP-binding cassette (ABC) transporter system (Table S6).

Sixty-seven genes were related to the PTS. The genes ptsI (JM48_1067) and ptsH (JM48_1066) encoded PTS Enzyme I (EI) and the phosphocarrier protein HPr, respectively, which deliver phosphoryl groups from phosphoenolpyruvate to the Enzyme II (EII) complexes [21]. EII complexes may exist as distinct proteins or as a single multidomain protein, and each EII complex consists of one or two hydrophobic integral membrane domains (domains C and D) and two hydrophilic domains (domains A and B). In the 5-2 genome, there were ten complete PTS EII complexes that were predicted to be involved in the transport of carbon sources, including beta-glucosides, cellobiose, fructose, galactitol, glucose, mannitol, mannose, N-acetylgalactosamine, sorbitol and sucrose (Table S6). Various sugar transport systems are known to import more than one substrate, thereby expanding the carbon transport capacity [12].
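The domain rule for EII complexes described above can be written as a small completeness check. This is a minimal sketch, assuming that the annotated EII domains of each candidate system are available as a collection of domain letters; the function name and the example systems are hypothetical and are not taken from the annotation of strain 5-2.

```python
def is_complete_eii(domains):
    """Return True if a set of EII domains satisfies the rule in the text:
    both hydrophilic domains (A and B) are present together with at least
    one hydrophobic integral membrane domain (C and/or D)."""
    found = {d.upper() for d in domains}
    has_hydrophilic = {"A", "B"} <= found
    has_membrane = bool(found & {"C", "D"})
    return has_hydrophilic and has_membrane


# Hypothetical examples of annotated domain sets.
examples = {
    "single multidomain protein EIIABC": "ABC",
    "separate EIIA, EIIB, EIIC and EIID proteins": "ABCD",
    "orphan EIIC component": "C",
    "EIIA and EIIB without a membrane domain": "AB",
}

for name, domains in examples.items():
    print(f"{name}: complete = {is_complete_eii(domains)}")
```

Whether the domains sit on one polypeptide or on several does not matter for this check, which mirrors the statement that EII complexes may exist as distinct proteins or as a single multidomain protein.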

One hundred and seventy-eight genes encoded components of the ABC transporter system (Table S6). Of these transporters, 104 were importers and 63 were exporters. Many of the importers transport amino acids, peptides and inorganic ions, whereas the substrate specificity of most of the exporters was unknown, as described for L. plantarum WCFS1 [12]. The 5-2 chromosome encoded five transporters for the uptake of branched-chain amino acids, including an ABC transporter encoded by the livFGHKM genes (JM48_2561 to JM48_2565). It is noteworthy that the glutamine-specific ABC transporters display considerable redundancy: four complete systems were found in the 5-2 genome, as also reported for the L. plantarum WCFS1 genome [12]. This suggests that glutamine transport could be important in the regulation of nitrogen metabolism in L. plantarum 5-2 via its potential effect on the signaling role fulfilled by glutamine synthetase.

2.6 Carbohydrate metabolism

L. plantarum is grouped with the facultative heterofermentative lactobacilli, which ferment sugars via the EMP pathway or the phosphoketolase (PK) pathway, leading to homolactic and heterolactic fermentation profiles, respectively [12]. In the 5-2 genome, genes for both the intact EMP and PK pathways were found (Table S7), including one 6-phosphofructokinase 1 gene (pfkA, JM48_1668) and two phosphoketolase genes (JM48_2294 and JM48_3033), which encode the key enzymes of the respective pathways. Similarly to L. plantarum WCFS1, the 5-2 chromosome did not encode an intact citric acid (TCA) cycle, but several of the enzymes in this pathway appeared to be present, including one fumarate reductase flavoprotein subunit (JM48_0897), one class II fumarate hydratase (JM48_0896), one malate dehydrogenase (JM48_0883), one pyruvate carboxylase (JM48_1866), one dihydrolipoamide dehydrogenase (JM48_1876), one pyruvate dehydrogenase E1 component subunit alpha (JM48_1879), one pyruvate dehydrogenase E1 component subunit beta (JM48_1878), one dihydrolipoamide acetyltransferase (JM48_1877) and one phosphoenolpyruvate carboxykinase (ATP) (JM48_2915).

In L. plantarum, glucose is degraded via the EMP pathway to produce pyruvate, which is subsequently converted into approximately equimolar amounts of D- and L-lactate by the activities of stereospecific lactate dehydrogenase enzymes [22]. In the 5-2 genome, two L-lactate dehydrogenase genes (ldhL, JM48_0309 and JM48_0458) and one D-lactate dehydrogenase gene (ldhD, JM48_1795) were identified. In addition to these ldhL and ldhD genes, the chromosome encoded several other pyruvate-dissipating enzymes (JM48_2246, JM48_2271, JM48_1866 and JM48_1876-JM48_1879) that were predicted to catalyze the production of other metabolites, including formate, acetate, oxaloacetate and acetyl-CoA. In comparison to L. plantarum WCFS1, the pyruvate-dissipating potential in 5-2 was clearly reduced [12].

2.7 Secretion

2.7.1 Secretion and processing machinery

As observed for L. plantarum WCFS1, components of the secretion machinery were found in the L. plantarum 5-2 genome, including the signal-recognition particle proteins Ffh (JM48_1379) and FtsY (JM48_1377), the general chaperone Tig (trigger factor, JM48_1850), and the components SecA (JM48_0580), SecE (JM48_0527), SecG (JM48_0631), SecY (JM48_0845) and YajC (JM48_1992) in the major translocation pathway, but no SecDF (Table S8). There were also two YidC homologs (JM48_1304 and JM48_3124) that might play a role in the secretion pathway because YidC is involved in the insertion of hydrophobic sequences into the lipid bilayer after initial recognition by the SecAYEG translocase [23]. In addition, there were two peptidylprolyl isomerase genes, prsA (JM48_1214) and ppiB (JM48_1947), three signal peptidase I genes (JM48_2467, JM48_2468 and JM48_3112), one signal peptidase II gene (JM48_1567) for the cleavage of lipoprotein signal peptides and coupling to membrane lipids, and a single sortase gene (JM48_0436). Sortase is a transpeptidase that attaches surface proteins to the cell wall: it cleaves between the Thr and Gly of the LPxTG (Leu-Pro-any-Thr-Gly) motif and catalyzes the formation of an amide bond between the carboxyl group of threonine and the amino group of the cell wall peptidoglycan. No components of the twin-arginine translocation (TAT) pathway were found. These characteristics of the 5-2 genome were almost the same as those described for the L. plantarum WCFS1 genome [12], which may indicate the evolutionary conservation of the secretion and processing machinery in L. plantarum.

2.7.2 Extracellular proteins

One hundred and eighty-eight extracellular proteins were predicted in the 5-2 genome (Table S9).
Most of these extracellular proteins were predicted to be anchored to the cell by single N- or C-terminal transmembrane anchors (80 proteins), lipoprotein anchors (17 proteins), LPxTG-type anchors (25 proteins, including five CscD family proteins), or other cell wall-binding (repeated) domains, such as LysM domains (nine proteins) or a chitin-binding domain (one protein). Sixteen proteins, including seven CscB and three CscC family proteins, contained the WxL domain, which confers a cell surface localization function and might interact with peptidoglycan [24, 25]. In addition to the CscD, CscC and CscB family proteins mentioned above, another protein family was identified: CscA. The CscA, CscB, CscC and CscD proteins have been proposed to form cell-surface protein complexes and to play a role in carbon source acquisition, and their primary occurrence in plant-associated Gram-positive bacteria suggests a possible role in the degradation and utilization of plant oligo- or polysaccharides [24]. Of the 25 proteins with an LPxTG-type anchor identified in the 5-2 genome, 20 contained the LPQTxE motif, which was fewer than the number identified in the WCFS1 genome [12].

Most of the predicted extracellular enzymes were hydrolases (22 proteins), half of which have a known substrate specificity (beta-lactamase, endo-beta-N-acetylglucosaminidase, extracellular zinc metalloproteinase, gamma-D-glutamate-meso-diaminopimelate muropeptidases, glycoside hydrolase, lysins, serine protease HtrA and serine-type D-Ala-D-Ala carboxypeptidases); the remainder had an unknown specificity but possessed hydrolase catalytic residue consensus motifs. One bacterial group 2 Ig-like protein (JM48_1089) containing an SD-repeat with 228 residues was found. Extracellular proteins with a similar domain structure, including very long Ser-containing repeats, have also been found in other Gram-positive bacteria [12]. It has been suggested that glycosyltransferases can generate O-linked glycosylations on serines to produce structures similar to mucins, which may coat the surface of the bacterium or interact with host cell mucins [26]. In L. plantarum WCFS1, three adjacent tagE-like genes near the sdr gene, encoding putative poly(glycerol-phosphate) alpha-glucosyltransferases, were inferred to fulfill such a role. However, in the 5-2 genome, these genes were not adjacent.

In addition to the proteins described above, another large class of extracellular proteins belonged to the transport system (22 proteins, mainly the ABC transport system). These findings suggest that L. plantarum 5-2 is a versatile and flexible microbe that can sustain its growth in a variety of sugar environments.

2.8 Biosynthesis and degradation

Lactic acid bacteria generally inhabit protein-rich environments (including milk), and they are equipped with protein-degradation machinery that creates a selective advantage for growth under these conditions [12]. The uptake system (Opp) for peptides, which are the primary protein-degradation products, was found in the 5-2 genome, but another peptide uptake system (Dpp) was not, in contrast to the L. plantarum WCFS1 genome. It has been proposed that, once internalized by lactococci and lactobacilli, the peptides are degraded by a variety of peptidases. Nineteen genes encoding intracellular peptidases with different specificities were identified in the 5-2 genome (Table S10). Despite this elaborate protein degradation machinery, the L. plantarum 5-2 genome encoded complete pathways for the biosynthesis of 13 amino acids, with incomplete alanine, valine, leucine, isoleucine, proline, phenylalanine and tyrosine pathways, in contrast to the L. plantarum WCFS1 genome, which lacks the synthesis pathways for the three branched-chain amino acids valine, leucine and isoleucine [12]. However, the nonribosomal peptide synthesis gene cluster found in the L. plantarum WCFS1 genome was not identified in the 5-2 genome.

2.9 Regulation and signaling

Two hundred and fifty-two regulatory proteins constituted a large class in the 5-2 genome, representing approximately 8.1% of the total proteins, which is similar to the 8.5% described in L. plantarum WCFS1 [12]. This class included three sigma factor-encoding genes, rpoD (JM48_1723), rpoN (JM48_0623) and rpoE (JM48_0525), and at least 11 sensor-regulator pairs belonging to the two-component regulator family (Table S11). The high proportion of regulatory genes seemed to be conserved in L. plantarum; it was similar to that in Pseudomonas aeruginosa (8.4%) and Listeria monocytogenes (7.3%) and was proposed to be a reflection of the many different environmental conditions faced by all three bacteria [12].

Most of these sensor-regulators appeared paired and belonged to six complete two-component regulatory systems: VicK-VicR (cell wall metabolism), PhoR-PhoB (phosphate starvation response), NreB-NreC (dissimilatory nitrate/nitrite reduction), DesK-DesR (membrane lipid fluidity regulation), AgrC-AgrA (exoprotein synthesis) and YesM-YesN. The VicRK system has been shown to respond to and protect against oxidative stress in Streptococcus mutans [27]. The PhoR-PhoB system contains three proteins (PhoP, PhoR and PhoB) and has been recognized to play a key role in regulating stress responses and virulence in Escherichia coli [28]. Three PhoB, two PhoR and two PhoP proteins were found in the 5-2 genome, and the corresponding genes were not adjacent as in other systems but were scattered throughout the genome. The nreABC transcripts could be detected when the cells were grown aerobically or anaerobically with or without nitrate or nitrite. NreB and NreC form a classical two-component system in which NreB acts as an oxygen-sensing protein, while the function of NreA is unknown in Staphylococcus carnosus [29]. The DesK-DesR system is involved in the transcriptional regulation of the desaturase gene to maintain membrane lipid fluidity homeostasis during cold shock. However, the desaturase gene was not identified in the 5-2 genome, indicating some loss of function. Two AgrC-AgrA systems were found in the 5-2 genome, including one AgrD (JM48_3051). The Agr system is the major quorum-sensing system, in which agrD encodes the autoinducing peptide pheromone (AIP) that can activate the two-component AgrC-AgrA system to control the expression of many virulence factors and, primarily, regulate alterations in gene expression patterns when cells enter the post-exponential phase [30-32]. Orthologs of the AgrC-AgrA system are widely found in Gram-positive bacteria, indicating that they play important roles in cell physiology [33]. The function of the YesM-YesN two-component regulatory system was not clear. In addition to these six complete systems, one incomplete PleC-PleD (cell fate control) two-component regulatory system was also identified in the 5-2 genome. The PleC-PleD system contained six members and functioned as a two-component phosphorylation network, which is critically important for bacterial growth and physiology.

2.10 Adaptation to stress

As previously described for L. plantarum WCFS1, the 5-2 genome encoded a number of stress-tolerance proteins (79 genes; Table S12), including several proteases involved in the stress response [12]. The energy-dependent intracellular proteases ClpP (JM48_0622) and HslV (JM48_1624), which degrade aberrant and nonfunctional proteins, were found in the 5-2 genome, but Lon was not. Moreover, three small heat shock proteins from the HSP20 family, as well as the three highly homologous cold-shock proteins CspA (JM48_0027), CspC (JM48_0800) and CspP (JM48_0936), were also identified in the genome.

It has been proposed that lactic acid-producing bacteria must efficiently address the acidification of their local environment in addition to other common stress pathways [12]. The F0F1-ATPase is recognized as the major regulator of intracellular pH, and eight F0F1-ATPase proteins (JM48_2064 to JM48_2071) were found in the 5-2 genome. Moreover, eight sodium-proton (Na+/H+) antiporters, which play a central role in pH regulation and Na+ homeostasis [34], were identified in the 5-2 genome. In addition, two paralogous alkaline shock proteins (JM48_0746 and JM48_0747), which are expected to play a role in pH tolerance [35, 36], were also identified. Two ABC transporters (choSQ and opuABCD) for the uptake and biosynthesis of the osmoprotectants glycine/betaine/carnitine/choline were found in the 5-2 genome (JM48_0324-JM48_0325 and JM48_1352-JM48_1355). The genes involved in the oxidative stress response in L. plantarum WCFS1 were also found in the 5-2 genome, such as those encoding catalase, thiol peroxidase, glutathione peroxidase, halo peroxidase, three thioredoxins, three glutathione reductases, five NADH oxidases and two NADH peroxidases. Similarly to the L. plantarum WCFS1 genome, superoxide dismutase was absent in the 5-2 genome, and the system that compensates for this enzyme (intracellular accumulation of Mn2+ ions) was present [12, 37]. Twenty-three proteins for cation transport were identified in the 5-2 genome, including two MntA homologous proteins (JM48_1192 and JM48_1686) and three highly homologous natural resistance-associated macrophage protein (NRAMP)-like transporters, which were up-regulated in response to manganese starvation [12].

2.11 Phages

The 5-2 genome contained four prophage elements (three intact and one incomplete). One of the three intact prophage regions resembled Lactob_Sha1 (44.1 kb, region1), and two resembled Lactob_phig1e (42.3 kb and 45.6 kb, region2 and region3). The phig1e was considered to be the closest related phage for L. plantarum [38]. Moreover, the incomplete prophage region4 resembled the approximately 14.1-kb Sphing_PAU (Fig. 2A).
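The sizes quoted for the three intact prophage regions can be reproduced from the chromosomal coordinates reported for these regions later in this section. The sketch below is illustrative and rests on two assumptions that the text does not state: that the coordinates are 1-based and inclusive, and that the sizes were truncated (not rounded) to 0.1 kb. The function names are hypothetical.

```python
# Start and end coordinates (bp) of the intact prophage regions,
# as reported in this section of the text.
REGIONS = {
    "region1": (975_405, 1_019_568),
    "region2": (1_487_964, 1_530_329),
    "region3": (2_147_045, 2_192_663),
}


def region_length(start, end):
    """Length in bp, assuming 1-based inclusive coordinates."""
    return end - start + 1


def truncated_kb(length_bp):
    """Length in kb truncated to one decimal place (assumed convention)."""
    return (length_bp // 100) / 10


sizes_kb = {
    name: truncated_kb(region_length(start, end))
    for name, (start, end) in REGIONS.items()
}

for name, size in sizes_kb.items():
    print(f"{name}: {size} kb")
```

With truncation, the three values match the 44.1 kb, 42.3 kb and 45.6 kb given above; conventional rounding would instead give 44.2 kb and 42.4 kb for the first two regions, which is why truncation is assumed here.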

Integrases are useful markers for mobile DNA elements such as prophages, integrative plasmids and pathogenicity islands in bacterial genomes. Three integrases (JM48_0940, JM48_1436 and JM48_2145) were identified, one in each of the three intact prophage elements. All three complete prophage elements contained attL and attR sites, which were used to determine the extent of the prophage [39]. Consistent with findings reported for L. plantarum WCFS1 [39], the three intact prophage elements contained the entire packaging/head/tail gene clusters and lysis cassette, including DNA packaging genes (encoding small and large terminase and portal protein), head genes (encoding protease and major head protein) and tail genes. Prophage region1 extended from 975,405 bp to 1,019,568 bp and contained 57 CDS with a complete prophage element from JM48_0940 (phage integrase) to JM48_0990. Prophage region2 extended from 1,487,964 bp to 1,530,329 bp and contained 54 CDS with a complete prophage element from JM48_1436 (phage integrase) to JM48_1489. Furthermore, prophage region3 extended from 2,147,045 bp to 2,192,663 bp and contained 57 CDS with a complete prophage element from JM48_2093 to JM48_2145 (phage integrase). The former two regions both contained attL sequences upstream of the integrase genes, while the integrase gene in region3 was located downstream from the attR sequence. In addition, the attL and attR sequences in region3 were identical (Table S13 and Fig. 2B and C).

2.12 Horizontal gene transfer

Horizontal gene transfer between bacteria can occur via various mechanisms, including natural competence and bacteriophage infection [12]. According to sequence homology, most of the 5-2 genes were homologous to L. plantarum genes, and only 81 genes (2.6% of the total genes in the genome, Table S14) may have been acquired by horizontal gene transfer from other microbes, such as Bacillus, Clostridium, Enterococcus faecalis, Enterococcus faecium, Enterococcus italicus, Enterococcus malodoratus, Eubacterium rectale, Haemophilus paraphrohaemolyticus, L. acidipiscis, L. brevis, L. casei, L. coryniformis, L. farciminis, L. murinus, L. otakiensis, L. paracasei, L. pentosus, L. paraplantarum, Lactobacillus phage, L. rossiae, L. suebicus, L. vini, Leuconostoc kimchii, Leu. mesenteroides, Listeria monocytogenes, Melissococcus plutonius, Mogibacterium, Oenococcus oeni, Pediococcus pentosaceus, Peptostreptococcus anaerobius and Weissella ceti. The majority of the horizontally transferred genes were obtained from other species of Lactobacillus, especially L. pentosus and Lactobacillus phage.

Among these 81 genes, three transposase genes (JM48_1003, JM48_1020 and JM48_3106) were identified, all of which were derived from L. coryniformis subsp. coryniformis CECT 5711. Adjacent to the first two transposase genes (JM48_1003 and JM48_1020), two gene clusters were found (JM48_1007-JM48_1008 and JM48_1019-JM48_1022) that are involved in outer membrane cell envelope biogenesis and in DNA replication, recombination and repair. In addition, most of these genes (approximately 45 genes) seemed to be phage-related, including those that were horizontally transferred from L. pentosus KCA1, L. pentosus IG1 and Lactobacillus phage. These phage-related genes consistently appeared as clusters (such as JM48_0961-JM48_0963, JM48_0968-JM48_0970, JM48_0978-JM48_0979, JM48_1436-JM48_1441, JM48_1458-JM48_1460, JM48_2098-JM48_2099, JM48_2105-JM48_2108, JM48_2116-JM48_2118 and JM48_2134-JM48_2137).

2.13 Evolutionary Position

We identified 2,240 orthologous genes shared by L. plantarum 5-2 and the other six completely sequenced L. plantarum genomes. A phylogenetic tree based on 1,000 concatenated orthologous proteins (selected in tandem from the 2,240 orthologous proteins) demonstrated the close genetic distance of all seven genomes, with the closest relationship observed between L. plantarum 5-2 and L. plantarum ST-III (Fig. 3). The genome size of 5-2 was similar to that of ST-III, JDM1, WCFS1 and ZJ316, while it was 204 kb and 193 kb larger than that of L. plantarum P8 and L. plantarum 16, respectively. A pan-genome analysis of these seven genomes showed that 4,123 and 2,240 genes were present in the pan-genome and core-genome, respectively (Table 2). The L. plantarum 5-2 genome was 16.7 kb smaller than that of ST-III, 39.9 kb larger than that of JDM1, and 70.6 kb smaller than that of WCFS1. In addition, 2,568 orthologs were identified between 5-2 (2,719 proteins, 87.3% of the total proteins in the genome) and ST-III (2,598 proteins, 90.7% of the total proteins in the genome). There were 2,521 orthologs identified between the 5-2 (85.8% of the total proteins in the genome) and JDM1 (90% of the total proteins in the genome) genomes, and there were 2,562 orthologs between the 5-2 (87.1% of the total proteins in the genome) and WCFS1 (83.8% of the total proteins in the genome) genomes. In addition, 167 proteins (5.4%) were identified as specific to L. plantarum 5-2 (Table S15).
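The relationship between the pan-genome, the core-genome and strain-specific genes can be illustrated with a small set-based sketch. It assumes that the ortholog clustering (done with PGAP in this study) has already assigned every gene to a family; the function names, strain labels and family labels below are hypothetical toy values, not data from Table 2.

```python
def pan_and_core(genomes):
    """Return (pan, core) family sets for a non-empty dict {strain: families}."""
    family_sets = [set(families) for families in genomes.values()]
    pan = set().union(*family_sets)          # families found in at least one strain
    core = set.intersection(*family_sets)    # families found in every strain
    return pan, core


def strain_specific(genomes, strain):
    """Return families present in `strain` and absent from all other strains."""
    others = set().union(
        *(set(families) for name, families in genomes.items() if name != strain)
    )
    return set(genomes[strain]) - others


# Toy input: three strains and five invented gene families.
toy_genomes = {
    "strain_1": {"fam1", "fam2", "fam3"},
    "strain_2": {"fam1", "fam2", "fam4"},
    "strain_3": {"fam1", "fam5"},
}

pan, core = pan_and_core(toy_genomes)
print(len(pan), sorted(core), sorted(strain_specific(toy_genomes, "strain_1")))
```

In the toy input the pan-genome holds five families, the core-genome holds only fam1, and fam3 is specific to strain_1, which mirrors how the 4,123 pan-genome genes, 2,240 core genes and 167 strain-specific proteins relate to one another.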

2.14 Genome comparison of strain 5-2 to other strains

According to the evolutionary tree, L. plantarum 5-2 is most closely related to L. plantarum ST-III, JDM1 and WCFS1. L. plantarum ST-III was isolated from kimchi, while L. plantarum JDM1 is a Chinese commercial lactic acid bacterium; both are considered to be probiotics [13, 14]. The genome of L. plantarum WCFS1, which was isolated from human saliva, was first sequenced in 2001 and then re-sequenced and re-annotated in 2012 [12, 40].

MAUVE alignment and the pan-genome method were used for the comparative analysis of these four genomes. The MAUVE alignment of the four genomes allowed the identification of approximately 8-10 Locally Collinear Blocks (LCBs, filled with different colors) that were interspersed with strain-specific DNA stretches of various lengths (Fig. 4). According to the alignment and pan-genome results, there were 24 notable DNA fragments. No. 1 referred to the orthologous DNA fragments in the 5-2, ST-III and JDM1 genomes with no homology to the WCFS1 genome. No. 2 referred to the orthologous DNA fragments that were only identified in the 5-2 and WCFS1 genomes but were absent in the other two genomes. No. 3 indicated DNA inversions (blocks below the central line) that distinguish JDM1 from WCFS1 and are absent in the 5-2 and ST-III genomes. No. 4 to 24 (blank regions outlined by red boxes) referred to DNA fragments that were specific to individual genomes among the four.

Fragment No. 1, which was inverted in ST-III, was approximately 38.8 kb and included 51 genes (JM48_0940-JM48_0990) that were mainly elements of prophage region 1 in the 5-2 genome. Fragment No. 2 was approximately 35.6 kb and included 47 genes (JM48_1440-JM48_1487) that consisted mostly of prophage region 2 in the 5-2 genome. Fragment No. 3 was approximately 13.9 kb and included 20 genes (JDM1_0030-JDM1_0049) that were mainly elements of the intact prophage region 1 in the JDM1 genome but represented no more than half of the largest intact prophage region (69.3 kb) in the WCFS1 genome. However, the 13.9-kb fragment in the WCFS1 genome encoded the prophage P2b proteins, indicating that the prophage P2a proteins from the largest intact prophage region were absent in the JDM1 genome (Table S16).

No. 4 to No. 6 referred to specific DNA fragments in the 5-2 genome. Region No. 4 encoded approximately nine proteins (JM48_1187-JM48_1195), including a cadmium/manganese-transporting P-type ATPase, an enolase and two DNA mismatch repair proteins. Region No. 5 encoded approximately eight proteins (JM48_1388-JM48_1395), all of which were hypothetical proteins and were homologous to those in L. plantarum ZJ316. Region No. 6 was approximately 23 kb in length and encoded 28 proteins (JM48_2096-JM48_2123) that were located in prophage region 3 in the 5-2 genome and made up about half of that region.

In conclusion, most of the specific proteins in the four genomes seemed to be related to their prophage elements. Moreover, many specific genes in the WCFS1 genome appeared to be involved in sugar metabolism, which indicated that L. plantarum WCFS1 might possess an improved ability to adapt to the environment compared with the other three strains.

3. Materials and methods

3.1 Bacterial growth and DNA extraction

For L. plantarum 5-2 growth, we used a modified ATCC 1699 broth (0.8 g glucose, 20% pig serum, 100 units of penicillin, and 0.05% thallium acetate). A commercial tissue genomic DNA extraction kit (Axygen, Inc., USA) was used to purify the DNA.

3.2 High-density pyrosequencing and sequence assembly of the genome

Complete genomic sequencing was conducted using a Roche GS FLX system. A total of 461,088 reads totaling 112,218,073 bases (average read length: 487 bp) were obtained, resulting in 23.7-fold genome coverage. Assembly was performed using the GS de novo Assembler software (http://www.454.com/) and produced 91 contigs ranging from 500 bp to 446,820 bp (N50 contig size: 129,220 bp). The relationship of the contigs was determined using ContigScape and multiplex PCR. The gaps were then filled in by sequencing the PCR products using ABI 3730xl capillary sequencers. The Phred, Phrap and Consed software packages (http://www.genome.washington.edu) were used for final assembly and editing, and low-quality regions of the genome were resequenced. The final sequencing accuracy was 99.9991%.

3.3 Genome annotation

Putative CDS were identified using Glimmer 3.02 [41] and ZCURVE 1.02 [42], and peptides shorter than 30 amino acids were eliminated. Insertion sequences were detected using the IS Finder database (https://www-is.biotoul.fr) [47] with default parameters and manual selection. Transfer RNA genes were predicted with tRNAScan-SE. Functional annotation of the CDS was performed by searching an in-house non-redundant (NR) protein database using BLASTP [43] and the CDD databases (for Clusters of Orthologous Groups (COG) analysis) using RPS-BLAST. The metabolic pathways were constructed using the KEGG database. The subcellular localization of the proteins was predicted with the SignalP 4.1 Server (http://www.cbs.dtu.dk/services/SignalP) and TMHMM Server v. 2.0 (http://www.cbs.dtu.dk/services/TMHMM) [44, 45], and lipoproteins were identified with LipoP 1.0 [46]. Prophage sequences were identified, annotated and graphically displayed with PHAST (http://phast.wishartlab.com). Genome comparisons were performed using the Mauve algorithm. The genome atlas was drawn using GenomeViz 1.1 [48]. Genes that had undergone horizontal transfer were defined using BlastP against the non-redundant protein (NR) database. If the homologous protein of a gene originated from an organism other than Lactobacillus plantarum with an identity ≥80%, it was considered to be a horizontally transferred gene. The pan-genome analysis pipeline (PGAP) (http://pgap.sf.net) was used to identify the orthologs among the known L. plantarum genomes.

3.4 Phylogenetic tree construction

Orthologs of known L. plantarum genomes were obtained from the NCBI database. The phylogenetic position of the 5-2 genome within L. plantarum was determined based on 1,000 orthologous proteins. Concatenated protein sequences of the 1,000 orthologous L. plantarum proteins were first aligned using ClustalW, and the conserved alignment blocks were then extracted with the Gblocks program [49]. A maximum likelihood tree was built with PHYML [50] using the following parameters: 100 replications for bootstrap analysis, “JTT” for the substitution model, “estimated” for the proportion of invariable sites, “estimated” for the gamma distribution parameters, “4” for the number of substitution categories, “yes” to optimize tree topology, and “BIONJ” for starting tree(s).

3.5 Nucleotide sequence accession number

The complete genome sequence of L. plantarum strain 5-2 has been deposited in GenBank under accession number CP009236.
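As a reading aid for the assembly statistics in Section 3.2, the N50 contig size can be computed as sketched below. This is a generic illustration of the N50 definition rather than the assembler's own code; the function name and the example contig lengths are invented, because the lengths of the 91 contigs are not listed in the text.

```python
def n50(contig_lengths):
    """Return the N50: the length of the contig at which the running total,
    taken over contigs sorted from longest to shortest, first reaches half
    of the total assembly length."""
    ordered = sorted(contig_lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Invented contig lengths (bp), for illustration only.
example_lengths = [100, 200, 300, 400, 500]
print(n50(example_lengths))
```

For the example lengths the total is 1,500 bp, and the two longest contigs (500 and 400 bp) are the first to cover at least half of it, so the N50 is 400 bp.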

Acknowledgments

This study was funded by the National Natural Science Foundation of China (31160309, 31260397) and the Personnel Training Fund of Kunming University of Science and Technology (KKSY201226110).

References

[1] B. Aslim, Z. Yuksekdag, E. Sarikaya and Y. Beyatli, Determination of the bacteriocin-like substances produced by some lactic acid bacteria isolated from Turkish dairy products. LWT - Food Sci. Technol. 36 (2005) 691-694.
[2] L. Avonts, E. Van Uytven and L. De Vuyst, Cell growth and bacteriocin production of probiotic Lactobacillus strains in different media. Int. Dairy J. 14 (2004) 947-955.
[3] L. Moles, M. Gomez, H. Heilig, G. Bustos, S. Fuentes, W. de Vos, L. Fernandez, J.M. Rodriguez and E. Jimenez, Bacterial diversity in meconium of preterm neonates and evolution of their fecal microbiota during the first month of life. PLoS One 8 (2013) e66986.
[4] D. Wouters, S. Grosu-Tudor, M. Zamfir and L. De Vuyst, Bacterial community dynamics, lactic acid bacteria species diversity and metabolite kinetics of traditional Romanian vegetable fermentations. J. Sci. Food Agric. 93 (2013) 749-760.
[5] M. Gotteland, M. Jose Cires, C. Carvallo, N. Vega, M. Antonieta Ramirez, P. Morales, P. Rivas, F. Astudillo, P. Navarrete and C. Dubos, Probiotic screening and safety evaluation of Lactobacillus strains from plants, artisanal goat cheese, human stools, and breast milk. J. Med. Food 17 (2014) 487-495.
[6] P. Luxananil, R. Promchai, S. Wanasen, S. Kamdee, P. Thepkasikul, V. Plengvidhya, W. Visessanguan and R. Valyasevi, Monitoring Lactobacillus plantarum BCC 9546 starter culture during fermentation of Nham, a traditional Thai pork sausage. Int. J. Food Microbiol. 129 (2009) 312-315.
[7] E. Songisepp, P. Huett, M. Raetsep, E. Shkut, S. Koljalg, K. Truusalu, J. Stsepetova, I. Smidt, H. Kolk and M. Zagura, Safety of a probiotic cheese containing Lactobacillus plantarum Tensia according to a variety of health indices in different age groups. J. Dairy Sci. 95 (2012) 5495-5509.
[8] S. van Hemert, M. Meijerink, D. Molenaar, P.A. Bron, P. de Vos, M. Kleerebezem, J.M. Wells and M.L. Marco, Identification of Lactobacillus plantarum genes modulating the cytokine response of human peripheral blood mononuclear cells. BMC Microbiol. 10 (2010) DOI: 10.1186/1471-2180-10-293.
[9] H. Bukowska, J. Pieczul-Mroz, M. Jastrzebska, K. Chelstowski and M. Naruszewicz, Decrease in fibrinogen and LDL-cholesterol levels upon supplementation of diet with Lactobacillus plantarum in subjects with moderately elevated cholesterol. Atherosclerosis 137 (1998) 437-438.
[10] M. Naruszewicz, M. Johansson, D. Zapolska-Downar and H. Bukowska, Effect of Lactobacillus plantarum 299v on cardiovascular disease risk factors in smokers. Am. J. Clin. Nutr. 76 (2002) 1249-1255.
[11] B. Chevallier, J.-C. Hubert and B. Kammerer, Determination of chromosome size and number of rrn loci in Lactobacillus plantarum by pulsed-field gel electrophoresis. FEMS Microbiol. Lett. 120 (1994) 51-56.
[12] M. Kleerebezem, J. Boekhorst, R.v. Kranenburg, D. Molenaar, O.P. Kuipers, R. Leer, R. Tarchini, S.A. Peters, H.M. Sandbrink, M.W.E.J. Fiers, W. Stiekema, R.M.K. Lankhorst, P.A. Bron, S.M. Hoffer, M.N.N. Groot, R. Kerkhoven, M.d. Vries, B. Ursing, W.M.d. Vos and R.J. Siezen, Complete genome sequence of Lactobacillus plantarum WCFS1. Proc. Natl. Acad. Sci. USA 100 (2003) 1990-1995.
[13] Z.Y. Zhang, C. Liu, Y.Z. Zhu, Y. Zhong, Y.Q. Zhu, H.J. Zheng, G.P. Zhao, S.Y. Wang and X.K. Guo, Complete genome sequence of Lactobacillus plantarum JDM1. J. Bacteriol. 191 (2009) 5020-5021.
[14] Y. Wang, C. Chen, L. Ai, F. Zhou, Z. Zhou, L. Wang, H. Zhang, W. Chen and B. Guo, Complete genome sequence of the probiotic Lactobacillus plantarum ST-III. J. Bacteriol. 193 (2011) 313-314.
[15] L. Axelsson, I. Rud, K. Naterstad, H. Blom, B. Renckens, J. Boekhorst, M. Kleerebezem, S.v. Hijum and R.J. Siezen, Genome sequence of the naturally plasmid-free Lactobacillus plantarum strain NC8 (CCUG 61730). J. Bacteriol. 194 (2012) 2391-2392.
[16] P. Mackiewicz, J. Zakrzewska-Czerwinska, A. Zawilak, M.R. Dudek and S. Cebrat, Where does bacterial replication start? Rules for predicting the oriC region. Nucleic Acids Res. 32 (2004) 3781-3791.
[17] Z. Kelman and M. O'Donnell, DNA polymerase III holoenzyme: structure and function of a chromosomal replicating machine. Annu. Rev. Biochem. 64 (1995) 171-200.
[18] N. Opalka, M. Chlenov, P. Chacon, W.J. Rice, W. Wriggers and S.A. Darst, Structure and function of the transcription elongation factor GreB bound to bacterial RNA polymerase. Cell 114 (2003) 335-345.
[19] C.J. Cardinale, R.S. Washburn, V.R. Tadigotla, L.M. Brown, M.E. Gottesman and E. Nudler, Termination factor Rho and its cofactors NusA and NusG silence foreign DNA in E. coli. Science 320 (2008) 935-938.
[20] I. Sá-Nogueira and L.J. Mota, Negative regulation of L-arabinose metabolism in Bacillus subtilis: characterization of the araR (araC) gene. J. Bacteriol. 179 (1997) 1598-1608.
[21] P. Kotrba, M. Inui and H. Yukawa, Bacterial phosphotransferase system (PTS) in carbohydrate uptake and control of carbon metabolism. J. Biosci. Bioeng. 92 (2001) 502-517.
[22] T. Ferain, A.N. Schanck and J. Delcour, 13C nuclear magnetic resonance analysis of glucose and citrate end products in an ldhL-ldhD double-knockout strain of Lactobacillus plantarum. J. Bacteriol. 178 (1996) 7311-7315.
[23] P.A. Scotti, M.L. Urbanus, J. Brunner, J.W. de Gier, G. von Heijne, C. van der Does, A.J. Driessen, B. Oudega and J. Luirink, YidC, the Escherichia coli homologue of mitochondrial Oxa1p, is a component of the Sec translocase. EMBO J. 19 (2000) 542-549.
[24] R. Siezen, J. Boekhorst, L. Muscariello, D. Molenaar, B. Renckens and M. Kleerebezem, Lactobacillus plantarum gene clusters encoding putative cell-surface protein complexes for carbohydrate utilization are conserved in specific gram-positive bacteria. BMC Genomics 7 (2006) 126.
[25] S. Brinster, S. Furlan and P. Serror, C-terminal WxL domain mediates cell wall binding in Enterococcus faecalis and other gram-positive bacteria. J. Bacteriol. 189 (2007) 1244-1253.
[26] H. Tettelin, K.E. Nelson, I.T. Paulsen, J.A. Eisen, T.D. Read, S. Peterson, J. Heidelberg, R.T. DeBoy, D.H. Haft, R.J. Dodson, A.S. Durkin, M. Gwinn, J.F. Kolonay, W.C. Nelson, J.D. Peterson, L.A. Umayam, O. White, S.L. Salzberg, M.R. Lewis, D. Radune, E. Holtzapple, H. Khouri, A.M. Wolf, T.R. Utterback, C.L. Hansen, L.A. McDonald, T.V. Feldblyum, S. Angiuoli, T. Dickinson, E.K. Hickey, I.E. Holt, B.J. Loftus, F. Yang, H.O. Smith, J.C. Venter, B.A. Dougherty, D.A. Morrison, S.K. Hollingshead and C.M. Fraser, Complete genome sequence of a virulent isolate of Streptococcus pneumoniae. Science 293 (2001) 498-506.
[27] D.M. Deng, M.J. Liu, J.M. ten Cate and W. Crielaard, The VicRK system of Streptococcus mutans responds to oxidative stress. J. Dent. Res. 86 (2007) 606-610.
[28] S. Crépin, S.M. Chekabab, G. Le Bihan, N. Bertrand, C.M. Dozois and J. Harel, The Pho regulon and the pathogenesis of Escherichia coli. Vet. Microbiol. 153 (2011) 82-88.
[29] I. Fedtke, A. Kamps, B. Krismer and F. Götz, The nitrate reductase and nitrite reductase operons and the narT gene of Staphylococcus carnosus are positively controlled by the novel two-component system NreBC. J. Bacteriol. 184 (2002) 6624-6634.
[30] R.P. Novick, Autoinduction and signal transduction in the regulation of staphylococcal virulence. Mol. Microbiol. 48 (2003) 1429-1449.
[31] P. Recsei, Regulation of exoprotein gene expression in Staphylococcus aureus by agr. Mol. Gen. Genet. 202 (1986) 58-61.
[32] T. Xue, Y. You, D. Hong, H. Sun and B. Sun, The Staphylococcus aureus KdpDE two-component system couples extracellular K+ sensing and Agr signaling to infection programming. Infect. Immun. 79 (2011) 1098-5522.
[33] S. Bronner, H. Monteil and G. Prevost, Regulation of virulence determinants in Staphylococcus aureus: complexity and applications. FEMS Microbiol. Rev. 28 (2004) 183-200.
[34] S. D'Souza, A. Garcia-Cabado, F. Yu, K. Teter, G. Lukacs, K. Skorecki, H. Moore, J. Orlowski and S. Grinstein, The epithelial sodium-hydrogen antiporter Na+/H+ exchanger 3 accumulates and is functional in recycling endosomes. J. Biol. Chem. 273 (1998) 2035-2043.
[35] E. Padan, E. Bibi, M. Ito and T.A. Krulwich, Alkaline pH homeostasis in bacteria: New insights. Biochimica Et Biophysica Acta (BBA)-Biomembranes 1717 (2005) 67-88.
[36] S. Flahaut, A. Hartke, J.C. Giard and Y. Auffray, Alkaline stress response in Enterococcus faecalis: Adaptation, cross-protection, and changes in protein synthesis. Appl. Environ. Microbiol. 63 (1997) 812-814.
[37] F.S. Archibald and I. Fridovich, Manganese, superoxide dismutase, and oxygen tolerance in some lactic acid bacteria. J. Bacteriol. 145 (1981) 442-451.
[38] F. Desiere, R.D. Pridmore and H. Brüssow, Comparative genomics of phages and prophages in lactic acid bacteria. Virology 275 (2000) 294-300.
[39] M. Ventura, C. Canchaya, M. Kleerebezem, W.M. de Vos, R.J. Siezen and H. Brüssow, The prophage sequences of Lactobacillus plantarum strain WCFS1. Virology 316 (2003) 245-255.
[40] R. Siezen, C. Francke, B. Renckens, J. Boekhorst, M. Wels, M. Kleerebezem and S.A.F.T. van Hijum, Complete resequencing and reannotation of the Lactobacillus plantarum WCFS1 genome. J. Bacteriol. 194 (2012) 195-196.
[41] A.L. Delcher, D. Harmon, S. Kasif, O. White and S.L. Salzberg, Improved microbial gene identification with GLIMMER. Nucleic Acids Res. 27 (1999) 4636-4641.
[42] F.B. Guo, H.Y. Ou and C.T. Zhang, ZCURVE: a new system for recognizing protein-coding genes in bacterial and archaeal genomes. Nucleic Acids Res. 31 (2003) 1780-1789.
[43] S.F. Altschul, T.L. Madden, A.A. Schäffer, J. Zhang, Z. Zhang, W. Miller and D.J. Lipman, Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25 (1997) 3389-3402.
[44] T.N. Petersen, S. Brunak, G. von Heijne and H. Nielsen, SignalP 4.0: discriminating signal peptides from transmembrane regions. Nature Methods 8 (2011) 785-786.
[45] A. Krogh, B. Larsson, G. von Heijne and E.L.L. Sonnhammer, Predicting transmembrane protein topology with a hidden Markov model: Application to complete genomes. J. Mol. Biol. 305 (2001) 567-580.
[46] O. Rahman, S.P. Cummings, D.J. Harrington and I.C. Sutcliffe, Methods for the bioinformatic identification of bacterial lipoproteins encoded in the genomes of Gram-positive bacteria. World J. Microbiol. Biotechnol. 24 (2008) 2377-2382.
[47] P. Siguier, J. Pérochon, L. Lestrade, J. Mahillon and M. Chandler, ISfinder: the reference centre for bacterial insertion sequences. Nucleic Acids Res. 34 (2006) D32-D36.
[48] R. Ghai, T. Hain and T. Chakraborty, GenomeViz: visualizing microbial genomes. BMC Bioinformatics 5 (2004) 198.
[49] K. Tamura, D. Peterson, N. Peterson, G. Stecher, M. Nei and S. Kumar, MEGA5: Molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol. Biol. Evol. 28 (2011) 2731-2739.
[50] S. Guindon and O. Gascuel, A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52 (2003) 696-704.

Figure Legends

Fig. 1. Chromosome atlas of Lactobacillus plantarum strain 5-2. The 1st and 2nd circles illustrate predicted coding sequences on the plus and minus strands, respectively, and are colored according to different functional categories. The functional category represented by each color is listed in Table S18. The 3rd circle displays the pseudogenes (grey). The 4th circle represents tRNAs (blue) and ribosomal RNA genes (red). The 5th and 6th (innermost) circles represent the mean-centered G+C content of the genome (red, above mean; blue, below mean) and GC skew (G-C)/(G+C), respectively, which were calculated using a 1-kb window in steps of 500 bp.

Fig. 2. Prophage regions and predicted elements in the 5-2 genome. Part A represents the prophage regions predicted in the 5-2 genome, including three complete regions (regions 1, 2 and 3) and one incomplete region (region 4). Parts B and C show the three intact prophage elements, including the attL and attR sequences (marked), which were considered to indicate the extent of the prophage elements present.

Fig. 3. Phylogenetic tree of seven completely sequenced Lactobacillus plantarum genomes. The tree was constructed using PhyML algorithms and the maximum-likelihood method.

Fig. 4. Comparison of the genomic structure of Lactobacillus plantarum strains 5-2, ST-III, JDM1 and WCFS1. The graph represents an alignment of the collinear blocks identified by MAUVE that are conserved in the four closely related genomes: Lactobacillus plantarum strains 5-2, ST-III, JDM1 and WCFS1. Numbers from 1 to 24 refer to important fragments. Number 1 refers to orthologous DNA fragments in the 5-2, ST-III and JDM1 genomes with no homology to the WCFS1 genome. Number 2 refers to orthologous DNA fragments that were only identified in the 5-2 and WCFS1 genomes and were absent in the other two genomes. Number 3 denotes DNA inversions (blocks below the central line) that distinguish JDM1 from WCFS1 and are absent in the 5-2 and ST-III genomes. Numbers 4 to 24 (blank regions outlined by red boxes) refer to DNA fragments that are specific to individual genomes among the four.
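The GC skew track described in the Fig. 1 legend, (G-C)/(G+C) in a 1-kb window moved in steps of 500 bp, can be sketched as follows. This is only an illustration of the formula under simple assumptions (a linear sequence, no wrap-around at the chromosome ends, and a skew of 0 for windows without G or C); the function name and the synthetic sequence are hypothetical, and GenomeViz may handle these details differently.

```python
def gc_skew_windows(sequence, window=1000, step=500):
    """Return (window_start, skew) pairs, where skew = (G - C) / (G + C)."""
    sequence = sequence.upper()
    results = []
    for start in range(0, len(sequence) - window + 1, step):
        chunk = sequence[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # Assumption: report 0.0 when a window contains neither G nor C.
        skew = (g - c) / (g + c) if (g + c) else 0.0
        results.append((start, skew))
    return results


# Synthetic 2-kb sequence: 1 kb of G followed by 1 kb of C.
synthetic = "G" * 1000 + "C" * 1000
print(gc_skew_windows(synthetic))
```

For the synthetic sequence the three windows start at positions 0, 500 and 1000 and give skews of 1.0, 0.0 and -1.0, showing how the sign of the track changes as the G/C balance changes along the sequence.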

Fig. 1.

Fig. 2.

Fig. 3.

Fig. 4.

Table 1 Characteristics of the 5-2 genome.

Genome                                    5-2
Total size, bp                            3,237,652
Overall GC content, %                     44.7
GC content of CDS, %                      45.7
Number of CDS                             3,114
Average gene length (bp)                  876
CDS* as % of genome sequence              84.3
Number of genes with unknown function     628
Number of rRNA genes                      16
Number of tRNA genes                      72

Table 2 Pan-genome analysis of seven completely sequenced L. plantarum genomes.

Genome               Accession   Size, bp    Genes   GC, %
L.plantarum_5-2      CP009236    3,237,652   3,114   44.7
L.plantarum_ST_III   NC_014554   3,254,376   2,996   44.6
L.plantarum_JDM1     NC_012984   3,197,759   2,947   44.7
L.plantarum_WCFS1    NC_004567   3,308,273   3,057   44.5
L.plantarum_ZJ316    NC_020229   3,203,964   3,159   44.6
L.plantarum_P8       NC_021224   3,033,566   2,892   44.8
L.plantarum_16       NC_021514   3,044,678   2,778   44.7

PanGenome            4,123
CoreGenome           2,240
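The genome size differences quoted in Section 2.13 follow directly from the "Size, bp" column of Table 2. The short sketch below reproduces that arithmetic; the sizes are copied from the table, while the dictionary and function names are illustrative choices.

```python
# Genome sizes in bp, copied from the "Size, bp" column of Table 2.
SIZES_BP = {
    "5-2": 3_237_652,
    "ST-III": 3_254_376,
    "JDM1": 3_197_759,
    "WCFS1": 3_308_273,
    "ZJ316": 3_203_964,
    "P8": 3_033_566,
    "16": 3_044_678,
}


def size_difference_kb(reference, other, sizes=SIZES_BP):
    """Return size(reference) - size(other) in kb; positive means reference is larger."""
    return (sizes[reference] - sizes[other]) / 1000


for strain in ("ST-III", "JDM1", "WCFS1", "P8", "16"):
    print(strain, round(size_difference_kb("5-2", strain), 1))
```

A negative value means that the 5-2 genome is the smaller of the two, as is the case for ST-III and WCFS1.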


Highlights

Lactobacillus plantarum strain 5-2 was isolated from fermented soybeans.
Fourteen complete insertion sequence elements were found in the genome.
There were 24 DNA replication proteins and 76 DNA repair proteins in the 5-2 genome.
The genome encodes the key enzymes required for the EMP and phosphoketolase pathways.
An extracellular serine protease is present in 5-2, which differs from L. plantarum WCFS1.