Journal Pre-proof

Comparative genomics of Lactobacillus fermentum suggests a free-living lifestyle of this lactic acid bacterial species

Marko Verce, Luc De Vuyst, Stefan Weckx

PII: S0740-0020(20)30037-X
DOI: https://doi.org/10.1016/j.fm.2020.103448
Reference: YFMIC 103448
To appear in: Food Microbiology
Received Date: 15 July 2019
Revised Date: 12 December 2019
Accepted Date: 26 January 2020
Please cite this article as: Verce, M., De Vuyst, L., Weckx, S., Comparative genomics of Lactobacillus fermentum suggests a free-living lifestyle of this lactic acid bacterial species, Food Microbiology, https://doi.org/10.1016/j.fm.2020.103448. © 2020 Elsevier Ltd. All rights reserved.
- 28 Lactobacillus fermentum strains clustered into five clades.
- Unclear grouping of strains by isolation source indicated a free-living lifestyle.
- Many traits, including the usage of xylose and arabinose, were strain-dependent.
- Some traits of L. fermentum IMDO 130101 were relevant for sourdough production.
Comparative genomics of Lactobacillus fermentum suggests a free-living lifestyle of this lactic acid bacterial species

Marko Verce, Luc De Vuyst, Stefan Weckx*

Research Group of Industrial Microbiology and Food Biotechnology (IMDO), Faculty of Sciences and Bioengineering Sciences, Vrije Universiteit Brussel, Brussels, Belgium
Email:
Marko Verce: [email protected]
Luc De Vuyst: [email protected]
Stefan Weckx: [email protected]

*Correspondent footnote
Mailing address: Research Group of Industrial Microbiology and Food Biotechnology (IMDO), Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussels, Belgium.
Phone: +32 2 6293245
Fax: +32 2 6292720
E-mail: [email protected]
ABSTRACT

Lactobacillus fermentum is a lactic acid bacterium frequently isolated from mammal tissues, milk, and plant material fermentations, such as sourdough. A comparative genomics analysis of 28 L. fermentum strains enabled the investigation of the core and accessory genes of this species. The core protein phylogenomic tree of the strains examined, consisting of five clades, did not exhibit clear clustering of strains based on isolation source, suggesting a free-living lifestyle. Based on the presence/absence of orthogroups, the largest clade, containing most of the human-related strains, was separated from the rest. The extended core genome included genes necessary for the heterolactic fermentation. Many traits were found to be strain-dependent, for instance utilisation of xylose and arabinose. Compared to other strains, the genome of L. fermentum IMDO 130101, a candidate starter culture strain capable of dominating sourdough fermentations, contained unique genes related to the metabolism of starch degradation products, which could be advantageous for growth in sourdough matrices. This study explained the traits that were previously demonstrated for L. fermentum IMDO 130101 at the genetic level and provided future avenues of research regarding L. fermentum strains isolated from sourdough.
KEYWORDS

Genomics; comparative genomics; sourdough; Lactobacillus fermentum; carbohydrate metabolism
1. Introduction
Sourdough is a mixture of flour and water that is fermented by lactic acid bacteria (LAB) and yeasts, which acidifies the bread dough, provides it with leavening capacity, modifies its flavour, and retards the growth of spoilage microorganisms, thus prolonging the shelf-life of the end-products (Gobbetti et al., 2005; Rehman et al., 2006; De Vuyst et al., 2014, 2017). The most typical LAB species found in sourdough is the obligately heterofermentative Lactobacillus sanfranciscensis, whose small genome of 1.3 Mbp reflects its high adaptation to the sourdough environment (Vogel et al., 2011; De Vuyst et al., 2014; Van Kerrebroeck et al., 2017). Apart from L. sanfranciscensis, a wide variety of LAB species, such as Lactobacillus fermentum, Lactobacillus paralimentarius, Lactobacillus plantarum, and Lactobacillus reuteri, often prevails in different types of sourdough made from different flour types and using various approaches and technological conditions (De Vuyst et al., 2014, 2017).
Lactobacillus fermentum is a heterofermentative LAB species that occurs not only in food fermentation ecosystems, such as sourdough fermentation (De Vuyst et al., 2014, 2017), fermenting cocoa pulp-bean mass (De Vuyst and Weckx, 2016), and other plant material fermentations (Endo et al., 2008; Morita et al., 2008), but also in environments such as (breast) milk (Martín et al., 2003; Bao et al., 2010; Jiménez et al., 2010; Lehri et al., 2015; Sun et al., 2015), and as part of human-related microbiota, for example in the colon, urogenital tract, and oral cavity (Rogosa et al., 1953; Grover et al., 2013; Lee et al., 2017).
70
Lactobacillus fermentum IMDO 130101 was isolated from a rye sourdough backslopped in
71
the laboratory (Van der Meulen et al., 2007; Ravyts and De Vuyst, 2011). Laboratory
72
fermentations have revealed that this strain grows well in sourdough fermentations, ferments
73
maltose, converts fructose into mannitol, tolerates acidic conditions down to pH 3.0, and has
3
74
an active arginine deiminase (ADI) pathway (Vrancken et al., 2008, 2009a, 2009b), which
75
means it could be used as a starter culture.
In the present study, a comparative genomics approach was applied to investigate the core metabolic pathways of the L. fermentum species as well as the genetic basis of the attributes characterised for L. fermentum IMDO 130101. Furthermore, possible adaptations of L. fermentum IMDO 130101 to the sourdough environment were investigated through its genome sequence annotation to underline its importance as candidate sourdough starter culture strain.
2. Materials and methods

Unless stated otherwise, software tools were used with their default settings.

2.1. Genome sequences used

All 27 L. fermentum genomes available in the RefSeq Assembly database (O’Leary et al., 2016) at the time of the study were included in the analyses, as well as the complete L. fermentum IMDO 130101 genome (Table 1; Verce et al., 2018). For L. fermentum FTDC 8312, a strain that has been sequenced and annotated twice, the assembly with the highest quality status was used. The genome sequence of Lactobacillus gorillae KZ01 (RefSeq Assembly accession number GCF_001293735.1) was used as an outgroup in all analyses performed, since this species is a close relative of L. fermentum (Duar et al., 2017).
2.2. Comparative genomics

2.2.1. Orthogroup inference

Amino acid sequences encoded in all L. fermentum genomes examined, as well as in that of L. gorillae KZ01, were used for orthogroup (OG) inference. The comparison was performed with OrthoFinder 1.1.10 using DIAMOND as alignment tool (Emms and Kelly, 2015; Buchfink et al., 2015). The pangenome and strict core genome estimates were calculated as the number of OGs present in any strain, including singletons, and the number of OGs present in all strains examined, respectively. The calculations were made based on the results of OrthoFinder and for a maximum of 500 combinations of strains for each number of strains, from two to 28. The extended core genome was defined as the set of OGs that were present in at least 26 of the 28 strains examined, to prevent erroneous exclusion of OGs from the core genome due to the draft status of some of the genomes considered (Lapierre and Gogarten, 2009).
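The three genome-size definitions above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy input, and the choice to average the estimates over the sampled strain combinations are assumptions made for illustration, and the input is assumed to be a mapping from strain name to the set of OG identifiers found in that strain.

```python
import math
import random
from itertools import combinations


def sample_subsets(strains, k, max_combinations=500, seed=0):
    """Return up to max_combinations distinct subsets of k strains."""
    total = math.comb(len(strains), k)
    if total <= max_combinations:
        return [frozenset(c) for c in combinations(strains, k)]
    rng = random.Random(seed)
    chosen = set()
    while len(chosen) < max_combinations:
        chosen.add(frozenset(rng.sample(strains, k)))
    return list(chosen)


def pan_and_core(presence, subset):
    """Pangenome size (OGs in any strain) and strict core size (OGs in all strains)."""
    og_sets = [presence[s] for s in subset]
    pan = set().union(*og_sets)
    core = set.intersection(*og_sets)
    return len(pan), len(core)


def extended_core(presence, min_strains):
    """OGs present in at least min_strains strains (e.g. 26 of 28)."""
    counts = {}
    for ogs in presence.values():
        for og in ogs:
            counts[og] = counts.get(og, 0) + 1
    return {og for og, n in counts.items() if n >= min_strains}


def estimate_curve(presence, max_combinations=500, seed=0):
    """Mean pangenome and strict core sizes per number of strains (averaging is an assumption)."""
    strains = sorted(presence)
    curve = {}
    for k in range(2, len(strains) + 1):
        sizes = [pan_and_core(presence, subset)
                 for subset in sample_subsets(strains, k, max_combinations, seed)]
        mean_pan = sum(p for p, _ in sizes) / len(sizes)
        mean_core = sum(c for _, c in sizes) / len(sizes)
        curve[k] = (mean_pan, mean_core)
    return curve


if __name__ == "__main__":
    # Hypothetical toy data: three strains and five OGs.
    toy = {
        "A": {"OG1", "OG2", "OG3"},
        "B": {"OG1", "OG2", "OG4"},
        "C": {"OG1", "OG3", "OG5"},
    }
    print(estimate_curve(toy))
    print(sorted(extended_core(toy, min_strains=2)))
```

With real data, `presence` would be built from the OrthoFinder orthogroup table, and `extended_core(presence, 26)` would correspond to the threshold used in this study.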
2.2.2. Phylogenomic tree inference

Amino acid sequences from 584 single-copy core genes found with OrthoFinder, as described in Section 2.2.1., that were also present in L. gorillae KZ01 were aligned with MUSCLE (Edgar, 2004). The alignments were trimmed with trimAl (Capella-Gutierrez et al., 2009) without allowing gaps. The trimmed alignments were then concatenated using an in-house Python script. From these concatenated alignments, a rooted phylogenomic tree was created using FastTree (Price et al., 2010). FastTree estimates the reliability of each split in the tree by applying a Shimodaira-Hasegawa (SH) test with 1000 bootstrap replicates for the current topology and the two alternate topologies, resulting in SH-like local support values for each node of the tree. The splits with the local support values of ≥ 0.95 were considered to be strongly supported.
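The in-house concatenation script is not part of the text, so the following is only a sketch of how such a step might look. It assumes that each trimmed alignment is available as FASTA text, that the sequence headers have already been renamed to strain names, and that every alignment contains exactly one sequence per strain; the function names are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping the first header word to the sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def concatenate_alignments(alignments):
    """Join per-gene alignments (dicts of strain -> aligned sequence) into one supermatrix."""
    strains = sorted(alignments[0])
    parts = {strain: [] for strain in strains}
    for alignment in alignments:
        if set(alignment) != set(strains):
            raise ValueError("every alignment must contain the same strains")
        if len({len(seq) for seq in alignment.values()}) != 1:
            raise ValueError("sequences within an alignment must have equal length")
        for strain in strains:
            parts[strain].append(alignment[strain])
    return {strain: "".join(chunks) for strain, chunks in parts.items()}


if __name__ == "__main__":
    # Hypothetical two-gene, two-strain example.
    gene1 = read_fasta(">s1\nMKT\n>s2\nMRT\n")
    gene2 = read_fasta(">s2\nAG\n>s1\nAA\n")
    for strain, seq in concatenate_alignments([gene1, gene2]).items():
        print(f">{strain}\n{seq}")
```

Sorting the strain names keeps the gene order identical for every strain, which is what a tree-building tool needs from a concatenated alignment.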
2.2.3. Clustering based on presence/absence of orthogroups

All 28 L. fermentum strains and L. gorillae KZ01 were clustered based on the table of OG presence/absence obtained using OrthoFinder, as described in Section 2.2.1. R packages vegan (Oksanen et al., 2016), massageR (Stanstrup, 2017), and gplots (Warnes et al., 2016) were used to create a heatmap of OG presence/absence using Jaccard distances and Ward clustering (Ward.D2) (R Core Team, 2018).
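For readers unfamiliar with the distance measure, the Jaccard distance between two strains is one minus the number of shared OGs divided by the number of OGs present in either strain. The study used R packages for this step; the Python sketch below only illustrates the distance calculation on presence/absence data, uses hypothetical function names and toy data, and does not reproduce the Ward clustering or the heatmap.

```python
def jaccard_distance(ogs_a, ogs_b):
    """Jaccard distance between two sets of OG identifiers."""
    union = ogs_a | ogs_b
    if not union:
        return 0.0
    return 1.0 - len(ogs_a & ogs_b) / len(union)


def distance_matrix(presence):
    """Return sorted strain names and the full pairwise Jaccard distance matrix."""
    names = sorted(presence)
    matrix = [[jaccard_distance(presence[x], presence[y]) for y in names]
              for x in names]
    return names, matrix


if __name__ == "__main__":
    # Hypothetical toy data: three strains and four OGs.
    toy = {
        "s1": {"OG1", "OG2"},
        "s2": {"OG2", "OG3"},
        "s3": {"OG1", "OG2", "OG4"},
    }
    names, matrix = distance_matrix(toy)
    for name, row in zip(names, matrix):
        print(name, [round(value, 3) for value in row])
```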
2.2.4. Average nucleotide identity calculation

The average nucleotide identity (ANI) values of the 28 L. fermentum strains and L. gorillae KZ01 were calculated with the OrthoANIu tool (Yoon et al., 2017), relying on an improved OrthoANI algorithm. The phylogenomic tree and ANI values were visualised using the R packages ape and ggtree (R Core Team, 2018; Paradis et al., 2004; Yu et al., 2017).
2.3. In silico analysis of potential metabolic pathways

Based on the OG and phylogenomic tree inferences, the core metabolic pathways of the species L. fermentum were reconstructed. In addition, genes unique to L. fermentum IMDO 130101 were highlighted. As the function predictions for metabolic pathway reconstruction were based on homologies to known protein sequences in databases and literature information, the actual functions may differ from those deduced.
To confirm that the absence of glycerol dehydratase genes in the L. fermentum genomes was not due to inadequate annotation, the amino acid sequences of Lactobacillus reuteri JCM 1112T glycerol dehydratase subunits (GenBank accession numbers BAG26149.1-BAG26151.1) were used as query sequences in BLAST searches with the blastp algorithm in the National Center for Biotechnology Information (NCBI) non-redundant protein sequences (nr) database, while limiting the search to L. fermentum-related sequences. A similar confirmation was performed for the absence of L-serine O-acetyltransferase genes in L. fermentum genomes, using the amino acid sequence of a Lactobacillus casei L-serine O-acetyltransferase as query (GenBank accession number AEK48252.1).
For glucanotransferase sequence analysis, the amino acid sequences of glycoside hydrolase family 70 (GH70) proteins encoded by different strains of L. fermentum, the characterised 4,6-α-glucanotransferase sequences of L. reuteri 121, L. reuteri DSM 20016, and L. reuteri ML1 (NCBI Protein accession numbers AAU08014.2, ABQ83597.1, and AAU08003.2, respectively), as well as the L. reuteri 121 reuteransucrase sequence (AAU08015.1) and the L. reuteri 180 dextransucrase sequence (AAU08001.1) were aligned with MUSCLE, followed by trimming from the N-terminus to the amino acid residues WYRP using trimAl, to allow the alignment of catalytic cores solely (Kralj et al., 2011). These trimmed sequences were re-aligned with MUSCLE. Based on this alignment, an unrooted phylogenetic tree was created using FastTree and visualised with FigTree (http://tree.bio.ed.ac.uk/).
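The idea of cutting each sequence at a conserved motif can be sketched in a few lines of Python. This is an illustration only: the study performed the trimming on the alignment with trimAl, whereas the hypothetical helper below works on unaligned sequences, and it assumes that the WYRP residues themselves are retained at the start of the trimmed sequence.

```python
def trim_to_motif(sequence, motif="WYRP"):
    """Return the sequence from the first occurrence of motif onwards, or None if absent."""
    start = sequence.find(motif)
    if start == -1:
        return None
    return sequence[start:]


if __name__ == "__main__":
    # Hypothetical sequences, not real GH70 proteins.
    examples = {"with_motif": "MAAAWYRPKDG", "without_motif": "MAAAKDG"}
    for name, seq in examples.items():
        print(name, trim_to_motif(seq))
```

Returning `None` for sequences that lack the motif makes such cases explicit, so they can be inspected rather than silently aligned at full length.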
3. Results

3.1. Comparative genomics

The number of protein-encoding DNA sequences (CDSs) in the 28 L. fermentum genomes examined varied from 1,496 to 2,126. Per genome, the CDSs were grouped into between 1,439 and 1,962 OGs, singletons included. Among all L. fermentum strains, 2,995 OGs were found (Figure 1), among which were 441 singletons. The core genome of the species L. fermentum, established through this analysis, contained 630 OGs (Figure 1), whereas the extended core genome contained 1,231 OGs. The estimates of the strict core genome size and the pangenome size stabilised when including additional genomes, indicating that the estimates approached the actual strict core genome and pangenome sizes of the species (Figure 1).
The origin of the strains agreed with their position on the phylogenomic tree only to a low degree. Nevertheless, the strains clustered into five groups, which was supported by the ANI values calculated (Figure 2A). The biggest group, Clade 5, included 17 strains, nine of which originated from human hosts, four from raw milk, two from fermentations of plant-derived materials, and two from an unknown source. Clade 2, which also included L. fermentum IMDO 130101, was the second largest clade and consisted of four strains isolated from fermentations of plant-derived material and one strain from saliva.
Based on the presence/absence of OGs in the genomes compared, the strains clustered into two groups (Figure 2B). The first group consisted of all Clade 5 L. fermentum strains and the sole Clade 3 L. fermentum strain. The second group consisted of Clade 1, Clade 2, and Clade 4, as well as L. gorillae KZ01. Within the latter group, the strains were separated into clades comparable to those on the phylogenomic tree.
Regarding the clades, there were 17, 40, 21, 14, and 284 OGs exclusively present in the genomes of clades 1, 2, 3, 4, and 5, respectively. However, few of these OGs were present in all members of each clade (Table 2). Regarding their origins, there were 37, 24, and 124 OGs exclusively present in the genomes of the plant material-, milk-, and human host-related strains, respectively. However, none of those OGs appeared in all members of the respective niches.
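The distinction drawn here, between OGs found only within a group and OGs found only within a group and in all of its members, can be made concrete with a short Python sketch. The function name and the toy data are hypothetical, and the input is assumed to map each strain to its set of OG identifiers.

```python
def exclusive_orthogroups(presence, group):
    """Return OGs exclusive to group: (in any member, in all members)."""
    group = set(group)
    inside = [ogs for strain, ogs in presence.items() if strain in group]
    if not inside:
        raise ValueError("group contains no known strains")
    # OGs seen in any strain outside the group cannot be exclusive to it.
    outside = set().union(*(ogs for strain, ogs in presence.items()
                            if strain not in group))
    in_any_member = set().union(*inside) - outside
    in_all_members = set.intersection(*inside) - outside
    return in_any_member, in_all_members


if __name__ == "__main__":
    # Hypothetical toy data: s1 and s2 form a clade, s3 lies outside it.
    toy = {
        "s1": {"OG1", "OG2", "OG3"},
        "s2": {"OG1", "OG2", "OG4"},
        "s3": {"OG1", "OG5"},
    }
    in_any, in_all = exclusive_orthogroups(toy, ["s1", "s2"])
    print(sorted(in_any), sorted(in_all))
```

In the toy example, three OGs are exclusive to the clade but only one of them occurs in both members, mirroring the pattern reported for the clades and isolation sources.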
Only seven OGs, encoding six hypothetical proteins and a 50S ribosomal protein L33, were present in all strains except L. fermentum IMDO 130101. Conversely, there were 40 genes uniquely present in L. fermentum IMDO 130101. Of those, 23 encoded hypothetical proteins and three encoded transposases. The remaining 14 encoded a putative stress protein, a sugar (glycoside-pentoside-hexuronide) transporter, an MFS-type transporter, an amino acid transporter, a YihY family membrane protein, a putative 2-hydroxyacid dehydrogenase, an HAD family hydrolase, a modification methylase, an exopolyphosphatase with an interrupted gene, a trehalose-6-phosphate phosphorylase, and a transport system possibly related to starch degradation.
3.2. In silico analysis of potential metabolic pathways

3.2.1. Metabolism of carbohydrates and related metabolic pathways

As to the carbohydrate metabolism, the L. fermentum core genome included genes related to the three mannose-specific components of the phosphoenolpyruvate:sugar phosphotransferase system (PTS) and a glucokinase, whereas a fructokinase was present in the extended core genome (Figure 3). Twenty-one of the 28 strains considered, including the complete Clades 1, 2 and 3, had the genetic repertoire to reduce fructose into D-mannitol, as mannitol 2-dehydrogenase genes were found in their genomes. Sucrose could be imported as sucrose 6-phosphate via a PTS and hydrolysed into glucose 6-phosphate and D-fructose by a sucrose-(phosphate) hydrolase in 25 strains. Of those, 20 strains, including the complete Clade 2, also possessed both a fructokinase gene and a glucose-6-phosphate isomerase gene.

All genes necessary for heterolactic fermentation were found in the core genome, except for the pyruvate kinase gene, which was found in the extended core genome. One L-lactate dehydrogenase gene, at least two D-lactate dehydrogenase genes, and the pyruvate dehydrogenase genes pdhABCD were also found in the core genome, enabling the species to reduce pyruvate to L-lactic acid or D-lactic acid, or to convert it into acetyl-CoA through oxidative decarboxylation. Alternatively, in Clade 2 and in one strain of Clade 5, pyruvate could also be converted into acetyl phosphate under aerobic conditions in a FAD- and thiamine pyrophosphate (TPP)-dependent manner, due to a pyruvate oxidase gene. Acetyl phosphate could in turn be converted into acetate, coupled to substrate-level phosphorylation, by an acetate kinase, whose gene was present in all strains. Additionally, (S)-2-acetolactate and carbon dioxide could be formed from pyruvate, followed by (S)-2-acetolactate decarboxylation into acetoin. Diacetyl spontaneously formed from (S)-2-acetolactate could also be converted into acetoin and further into 2,3-butanediol. However, despite the presence of genes for the latter three conversions in the core genome, the genes necessary for (S)-2-acetolactate formation from pyruvate were not present in four strains.
Assuming its import, ribose utilisation through phosphorylation and isomerisation to ribulose 5-phosphate by a ribokinase and a ribose-5-phosphate isomerase, respectively, was encoded in the extended core genome, linking ribose metabolism with the heterolactic fermentation pathway. Compared to all other strains, an additional, unique, ribose 5-phosphate isomerase A gene was found for all strains in Clade 2. Similarly, the utilisation of gluconic acid through phosphorylation to gluconate 6-phosphate was also encoded in the extended core genome. In contrast, based on the OGs found in the genomes examined, xylose and arabinose could only be utilised in the heterolactic fermentation pathway by eleven and 18 strains, respectively. Xylose could be utilised by all members of Clade 2 and by six members of Clade 5, whereas all members of Clades 1 and 2, as well as ten of 17 members of Clade 5, could utilise arabinose. Similarly, lactose could be utilised by being split into D-galactose and D-glucose by 25 strains, due to the presence of a lactose permease gene and LacL/LacM β-galactosidase genes. An α-galactosidase gene was part of the extended core genome.
Genes necessary for glycerol utilisation, encoding glycerol kinase, glycerol-3-phosphate dehydrogenase and triosephosphate isomerase, were present in the core genome, whereas only 14 strains of Clade 5 and four strains of Clade 2 contained a glycerol transporter gene. A glycerol dehydrogenase gene was present in 20 of the 28 strains examined, of which 16 belonged to Clade 5. The same was the case for the 1,3-propanediol dehydrogenase gene, which was present in all Clade 5 strains, the sole Clade 3 strain, and two Clade 2 strains. However, a glycerol dehydratase gene was not present in any L. fermentum genome.
A putative β-glucanase-encoding gene was present in six strains, namely the three Clade 1 strains, two Clade 2 strains and one Clade 5 strain. The predicted products belonged to the glycoside hydrolase family 8, members of which cleave β-1,4 linkages of β-1,4 glucans, xylans, chitosans, and lichenans.
An α-glucosidase gene that would enable the cleaving of glucose subunits from maltodextrins was found in three Clade 2 strains, including L. fermentum IMDO 130101, and in one Clade 5 strain isolated from the human colon. All genes necessary for maltose utilisation were present in 24 of the 28 strains examined, including a maltose phosphorylase gene and two neighbouring genes encoding a transporter and a β-phosphoglucomutase. Additionally, 25 strains contained an oligo-1,6-glucosidase gene. Moreover, a cluster of genes (LF130101_1262-1265) potentially involved in maltodextrin import appeared in L. fermentum IMDO 130101 only. The 4.3-kbp nucleotide sequence containing the latter four genes was 99 % identical to sequences from Lactobacillus hokkaidonensis (plasmid pLOOC260-1), Lactobacillus brevis KB290, L. plantarum subsp. argentoratensis DSM 16365 (plasmid), L. brevis SRCM101106, L. brevis SRCM101174, L. brevis 100D8, L. brevis ATCC 367, L. plantarum TMW 1.1623 (plasmid), L. paracasei IIA (plasmid), and a plasmid of Pediococcus pentosaceus SRCM100194 (NCBI GenBank accession numbers AP014681.1, AP012167.1, CP032754.1, CP021674.1, CP021479.1, CP015338.1, CP000416.1, CP017383.1, CP014986.1, and CP021926.1, respectively). In all these cases, as well as in L. fermentum IMDO 130101, the cluster was flanked by a putative transposase gene, at least on one side. Furthermore, the L. fermentum IMDO 130101 genome was the only one containing a gene encoding a glycoside hydrolase family 65 protein, which was annotated as a trehalose-6-phosphate phosphorylase.
A 29-kb region was present in strains L. fermentum IMDO 130101 and L930BB only and contained genes encoding enzymes involved in the metabolism of carbohydrates and related compounds, such as a putative fructuronate reductase, a glucarate dehydratase, a gluconokinase, an aldose 1-epimerase, two uronate isomerases, and several glycoside hydrolases, namely a putative O-glycosyl hydrolase similar to proteins of glycoside hydrolase family 30, a β-glucuronidase, a putative polygalacturonase of glycoside hydrolase family 28, and an α-glucosidase of glycoside hydrolase family 31, as well as two sugar (glycoside-pentoside-hexuronide) transporters. An additional gene encoding a sugar (glycoside-pentoside-hexuronide) transporter was present in the L. fermentum IMDO 130101 genome only, although not in the same genomic region as the two other ones.
Based on the OGs found in the genomes, citrate could be metabolised by nine of the 28 strains examined, including four of the five strains isolated from milk (products), as the six genes necessary for citrate lyase activity were present in their genomes. Lactobacillus fermentum IMDO 130101 was not among them. A gene encoding a malic enzyme was present in 21 strains, enabling the conversion of malate or oxaloacetate into pyruvate. Alternatively, malate could be reversibly dehydrated to fumarate by 24 strains, as indicated by the presence of a gene encoding a fumarate hydratase, and fumarate could be further reduced to succinate by 23 of those strains, due to the presence of a fumarate reductase flavoprotein subunit gene. Four strains of Clade 5 were the only ones missing both of the former genes, whereas one additional Clade 5 strain was missing one of the two genes. Most strains, namely 25 of the 28, could also convert L-citrulline and L-aspartate into L-arginine and fumarate via L-argininosuccinate, at the cost of ATP, due to the presence of argininosuccinate synthase and argininosuccinate lyase genes.
3.2.2. Amino acid biosynthesis

The genetic basis for the biosynthesis of L-aspartate from oxaloacetate, the biosynthesis of L-asparagine, L-lysine, and L-threonine from L-aspartate, glycine from L-serine, L-glutamine from L-glutamate, and the conversion of L-glutamine to L-glutamate was found in the (extended) core genome, making these amino acids non-essential for this species. The core genome also included the genes involved in L-cysteine biosynthesis from L-alanine and vice versa. However, biosynthesis of L-cysteine from L-serine could not occur, due to the absence of an L-serine O-acetyltransferase-encoding gene, although several additional acetyltransferase-encoding genes were present. Additionally, apart from the phosphoserine phosphatase-encoding gene, the other two genes necessary for the biosynthesis of L-serine from 3-phospho-D-glycerate were present in the core genome. The genetic potential for the biosynthesis of L-methionine from L-aspartate and L-cysteine, L-proline from L-glutamate or L-ornithine, and L-arginine from L-glutamate and L-aspartate through the acetyl cycle was strain-dependent. The genetic potential for the biosynthesis of L-histidine was also strain-dependent, and it was only found in nine strains, all belonging to Clade 5. In contrast, the branched-chain amino acids L-valine, L-leucine and L-isoleucine, and the aromatic amino acids L-tryptophan, L-tyrosine and L-phenylalanine, were found to be essential amino acids for all 28 strains of L. fermentum.
Seven of the 28 strains, four belonging to Clade 5, two to Clade 1, and the sole Clade 3 member, could decarboxylate L-glutamate to form 4-aminobutyric acid (GABA), due to the presence of a glutamate decarboxylase gene. The genes for the complete ADI pathway, including the arginine-ornithine antiporter, were found in 25 strains, as three strains had frameshift mutations in up to two of the four genes. A gene coding for a putative arginase, enabling a direct conversion of L-arginine into L-ornithine with concomitant urea production, was present in 18 strains. However, this gene may also be encoding an enzyme with a related function, as so far arginase activity has not been demonstrated experimentally in lactobacilli.
3.2.3. Cofactors

The core genome included all four genes necessary for the biosynthesis of CoA from (R)-pantothenate as well as the pyridoxal kinase gene. Twenty of the 28 strains examined contained all genes needed for riboflavin biosynthesis, whereas one such gene was missing in the genomes of six strains, four of those belonging to Clade 5 and two to Clade 4. All six genes for the biosynthesis of NAD+ and NADP+ from nicotinic acid were present in 23 strains, whereas the complete set was not found in one Clade 4 strain and four Clade 5 strains. Except for the dihydroneopterin triphosphate pyrophosphohydrolase gene, all genes for the biosynthesis of folic acid from GTP and 4-aminobenzoate were present in 25 strains. The same was the case for the thiamine salvage pathway genes.
3.2.4. Additional sourdough-relevant traits

Seven strains, including L. fermentum IMDO 130101, possessed a gene encoding a GH70 protein. Based on a phylogenetic analysis, the amino acid sequences of their products formed two groups on an unrooted tree (Figure 4A). Also, all amino acid sequences contained the same conserved sites (Figure 4B). In the genomes of L. fermentum strains ATCC 14931, 39, 28-3-CHN, and NCC2970 of Clade 5, the genes were located in the same genomic region, as indicated by the neighbouring genes. The C-terminal region of the gene product in L. fermentum IMDO 130101 was 94 % identical and 97 % similar to the catalytic domain of a previously characterised 4,6-α-glucanotransferase 4,6-αGT-W from L. reuteri DSM 20016 (NCBI GenBank accession number ABQ83597.1). The identity and similarity dropped to 73 % and 79 %, respectively, when comparing the full amino acid sequences. In comparison, the identity and similarity between the full length amino acid sequences of the gene products of L. fermentum IMDO 130101 and NCC2970 were 52 % and 63 %, respectively. The sequences of two Clade 1 strains, L. fermentum 779 LFER and DSM 20055, most likely encoded a glucansucrase, as they contained a tryptophan residue whereas the sequences from the rest of the strains contained a tyrosine residue at the same site.
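As a reminder of what a percent identity value expresses, the Python sketch below computes it for two already aligned sequences. The text does not state which tool or which denominator was used for the values reported above, so the convention chosen here (identical residues divided by the number of alignment columns without a gap in either sequence) is an assumption, as is the function name. Percent similarity would additionally require a substitution matrix and is not shown.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == gap or residue_b == gap:
            continue  # skip gapped columns (assumed convention)
        compared += 1
        if residue_a == residue_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical five-column alignment: three matches in four ungapped columns.
    print(percent_identity("MKT-A", "MRTGA"))
```

Other conventions, such as dividing by the full alignment length or by the length of the shorter sequence, give different values for the same alignment.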
A phenolic acid decarboxylase gene was present in 13 strains. The amino acid sequence of the L. fermentum IMDO 130101 phenolic acid decarboxylase was highly similar to previously experimentally characterised proteins, such as phenolic acid decarboxylases from L. brevis RM84 (83 % identity, 89 % similarity; sequence obtained from Landete et al., 2010) and Bacillus subtilis 168 (Swiss-Prot accession number O07006; 71 % identity, 86 % similarity) and the p-coumaric acid decarboxylase from L. plantarum (UniProt accession number P94900; 80 % identity, 90 % similarity).
An oleate hydratase-encoding gene was present in all strains of Clade 2 and in the Clade 5 strain FTDC 8312 isolated from faeces. Expression of this gene could enable the strains to convert linoleic acid and oleic acid into hydroxylated derivatives.
4. Discussion

Lactobacillus fermentum is a lactic acid bacterium species that is frequently isolated from various sources, including plant material fermentations, such as sourdough. Since several genome sequences of L. fermentum isolates from different sources are available, comparative genomics could be used to better characterise the strain-dependent genetic potential, together with the core genome and pangenome of this species. The inability to reconstruct the heterofermentative pathway from the strict core genome alone indicated the need to consider an extended core genome, a concept introduced by Lapierre and Gogarten (2009). The actual core genome size of a species is expected to be larger than the apparent (strict) core genome size, possibly due to draft genomes not representing the full genome content of a strain or due to differences in assembly and annotation quality.
The phylogenomic analysis of 28 L. fermentum strains, including the sourdough isolate L. fermentum IMDO 130101, revealed that the strains could be grouped into five clades. The clustering of strains into clades did not clearly correspond to their isolation source. However, the largest clade (Clade 5) contained the majority of the strains that were isolated from human hosts and all strains isolated from raw milk. Lactobacillus fermentum IMDO 130101 was located in the clade with strains isolated almost exclusively from fermentations of plant-derived materials, namely Clade 2. The size differences between the clades may reflect a bias in research interests, whereby genomic research is mostly oriented towards isolates from the human microbiome. Strains of L. fermentum did not partition into the five clades depending on their isolation source in a way that would indicate niche specialisation, as is the case for L. reuteri and its host specialisations (Su et al., 2012). This is in line with a previous proposition that L. fermentum may be a species that is undergoing a reversion from a host-adapted lifestyle typical for the L. reuteri group to a free-living one (Duar et al., 2017). In this context, the acquisition of niche-specific genes from other members of the ecosystem would be advantageous for strains of a species to survive in different niches (e.g., putative maltodextrin import-related genes for starch-enriched environments). An alternative explanation for the lack of partitioning could be the inherent connectedness between plant material fermentations and the gastrointestinal tract of humans, as fermented plant products are part of the human diet. Whereas the clades still separated when clustered based on the presence/absence of OGs, there was a larger separation into two groups, namely Clades 5 and 3 versus Clades 1, 2, and 4. Moreover, L. gorillae did not cluster apart from the L. fermentum strains, indicating that the intra-species diversity in L. fermentum was as large as the inter-species diversity between L. fermentum and L. gorillae.
403
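The presence/absence clustering discussed above can be illustrated with a minimal Python sketch. It assumes a Jaccard distance between strains and a simple average-linkage agglomeration; this section does not state which distance or linkage was used in the study, so both are assumptions, and the strain and OG names are hypothetical toy data.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two sets of orthogroup (OG) identifiers."""
    union = a | b
    if not union:
        return 0.0  # two empty profiles are treated as identical
    return 1.0 - len(a & b) / len(union)


def average_linkage_clusters(profiles, n_clusters):
    """Greedy average-linkage clustering of strains into n_clusters groups."""
    clusters = [frozenset([name]) for name in sorted(profiles)]

    def mean_distance(c1, c2):
        distances = [jaccard_distance(profiles[x], profiles[y])
                     for x in c1 for y in c2]
        return sum(distances) / len(distances)

    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the smallest mean pairwise distance.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: mean_distance(*pair))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy data: strain name -> set of OGs present in its genome.
toy_profiles = {
    "strain_A": {"OG1", "OG2", "OG3"},
    "strain_B": {"OG1", "OG2", "OG4"},
    "strain_C": {"OG1", "OG5", "OG6"},
    "strain_D": {"OG1", "OG5", "OG7"},
}
groups = average_linkage_clusters(toy_profiles, n_clusters=2)
print(groups)
```

With real data, each profile would hold the OGs detected in one genome, and an outgroup such as L. gorillae would simply be one more entry in the dictionary.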
The L. fermentum (extended) core genome contained all genetic capabilities regarding carbohydrate metabolism and the products derived thereof that are mentioned in the emended species description of L. fermentum (Dellaglio et al., 2004). However, the current study showed that lactose utilisation was strain-dependent. In addition, the capabilities for the utilisation of xylose, arabinose, β-glucans, and α-glucosides, the oxidation of pyruvate to acetyl phosphate, and citrate metabolism were encoded in the accessory genome, and only in a few strains. The findings regarding xylose and arabinose fermentation were also in agreement with the species description of L. fermentum, as their fermentation is considered a variable trait of the species (Dellaglio et al., 2004). Lactobacillus fermentum IMDO 130101 had the genetic potential to metabolise xylose, arabinose, β-glucans, and α-glucosides, and to oxidise pyruvate to acetyl phosphate, but not to metabolise citrate.
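The distinction between strict core, extended core, and accessory genome used above can be sketched as follows. This is only an illustration: the cutoff for the extended core is not defined in this section, so the parameter `min_strains_extended` is a hypothetical threshold, and the strain and OG names are toy data.

```python
def classify_orthogroups(presence, min_strains_extended):
    """Label each OG as strict core, extended core, or accessory.

    presence: dict mapping strain name -> set of OG identifiers.
    min_strains_extended: hypothetical minimum number of strains an OG must
        occur in to count as extended core (an assumption of this sketch).
    """
    n_strains = len(presence)
    counts = {}
    for ogs in presence.values():
        for og in ogs:
            counts[og] = counts.get(og, 0) + 1

    labels = {}
    for og, count in counts.items():
        if count == n_strains:
            labels[og] = "strict core"
        elif count >= min_strains_extended:
            labels[og] = "extended core"
        else:
            labels[og] = "accessory"
    return labels


# Hypothetical toy data with five strains.
toy_presence = {
    "s1": {"OG_glycolysis", "OG_lactose", "OG_xylose"},
    "s2": {"OG_glycolysis", "OG_lactose"},
    "s3": {"OG_glycolysis", "OG_lactose"},
    "s4": {"OG_glycolysis", "OG_lactose"},
    "s5": {"OG_glycolysis"},
}
labels = classify_orthogroups(toy_presence, min_strains_extended=4)
print(labels)
```

An integer threshold is used instead of a fraction to keep the comparison exact; a strain-dependent trait such as the toy `OG_xylose` ends up in the accessory set.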
Sourdough isolates such as L. fermentum IMDO 130101 are well adapted to thrive in a sourdough environment, as has been shown previously through fermentation experiments (Vrancken et al., 2008, 2009a, 2009b). The main carbohydrates in flour are starch and starch degradation products, such as maltodextrins and maltose, followed by sucrose (Gänzle, 2014), though raffinose (an α-galactoside) and arabinoxylans are also present. The latter may be degraded to arabinose and xylose (Gänzle, 2014). Utilisation of raffinose and arabinose by L. fermentum IMDO 130101 was predicted as well, although their metabolism requires respective import systems. The presence of oligo-1,6-glucosidase- and α-glucosidase-encoding genes suggested cleavage of the glucose subunits from isomaltose and maltodextrins, which may possibly be imported into the cells as well. Putative maltodextrin import-related genes were unique to L. fermentum IMDO 130101 and could represent an advantage of this strain in the sourdough environment. Analysis of its genome during the current study showed the presence of genes involved in acid tolerance, transport and metabolism of plant-derived carbohydrates (in particular maltose), an ADI pathway, and the production of mannitol from fructose, which are all prevalent features in LAB species and strains indigenous to sourdough, confirming the genetic adaptation to such fermentation previously demonstrated phenotypically (Gänzle et al., 2007; Weckx et al., 2007). All these characteristics provide the genetic background for previous experimental findings regarding L. fermentum IMDO 130101 (Vrancken et al., 2008, 2009a, 2009b) and explain the common occurrence of L. fermentum strains in sourdough (De Vuyst et al., 2017; Van Kerrebroeck et al., 2017). The production of mannitol from fructose, which is used as an alternative external electron acceptor, is of special interest, as mannitol is generally used as a sweetener with a low glycemic index (Saha and Racine, 2011; Ortiz et al., 2013). However, using L. fermentum IMDO 130101 as a sourdough starter culture would mainly impact the baked goods’ flavour due to the production of acetate that is commonly associated with the reduction of fructose to mannitol in heterofermentative LAB (Hansen and Schieberle, 2005).
The finding that the branched-chain amino acids L-valine, L-leucine and L-isoleucine, and the aromatic amino acids L-tryptophan, L-tyrosine and L-phenylalanine, were essential amino acids for all 28 strains of L. fermentum was in agreement with a growth study of L. fermentum IFO 3956 (Kuratsu et al., 2010). Furthermore, the metabolism of L-arginine by the enzymes of the ADI pathway has been experimentally shown for L. fermentum IMDO 130101 (Vrancken et al., 2009a, 2009b). This could result in accumulation and eventually export of ornithine, a precursor of the bread crust flavour compound 2-acetyl-1-pyrroline (Hofmann and Schieberle, 1998; Hansen and Schieberle, 2005; Gänzle et al., 2007). Assuming there are no alternative pathways for amino acid biosynthesis, the conditions under which L. fermentum thrives have to provide enough free (essential) amino acids and peptides, which is for instance the case through endogenous flour protease activity in sourdough fermentations (Gänzle et al., 2008).
The absence of biosynthetic genes pointed to the dependence of L. fermentum on (R)-pantothenate, pyridoxal or pyridoxal 5-phosphate, nicotinate or β-nicotinate D-ribonucleotide, thiamine or TPP, 7,8-dihydropteroate or tetrahydrofolate (or at least 4-aminobenzoate), and biotin. However, the absence of D-pantothenate does not affect the growth of heterofermentative LAB species on pentoses (Zaunmüller et al., 2006). In contrast, growth on hexoses demands redox balancing through the production of ethanol from acetyl phosphate, which can be hindered upon limitation of coenzyme A, a D-pantothenate derivative, leading to the necessity of alternative external electron acceptors (Zaunmüller et al., 2006).
The presence of the GH70 OG indicated that some L. fermentum strains could produce glucans from sucrose (glucansucrase) or could act on starch/maltodextrin substrates to produce linear isomalto/malto-polysaccharides (4,6-α-glucanotransferase) or an α-glucan with alternating (α1->3)/(α1->4)-linkages and (α->3,4) branching points (4,3-α-glucanotransferase; Leemhuis et al., 2013, 2014; Gangoiti et al., 2017). Based on the tryptophan residue conserved in glucansucrases, the glycosyl hydrolases from the L. fermentum strains 779_LFER and DSM 20055 are most likely glucansucrases (Vujičić-Žagar et al., 2010; Ito et al., 2011). Except for the L. fermentum NCC2970 4,3-α-glucanotransferase, the remaining L. fermentum glucanotransferase sequences are likely 4,6-α-glucanotransferases, as they contain a QRKN motif that is conserved in these enzymes (Gangoiti et al., 2017). The in vivo role of these glucanotransferases is unknown, although they may play a role in scavenging and modifying oligosaccharides as substrates for the biosynthesis of larger saccharides, inaccessible for other microorganisms (Kralj et al., 2011). However, the 4,6-α-glucanotransferase activity in L. fermentum IMDO 130101, especially when grown in a sourdough environment, needs to be investigated, in particular to elucidate its contribution to texture formation in both doughs and breads, as well as the role of its products as prebiotic soluble dietary fibre.
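The sequence-based reasoning above (a conserved tryptophan pointing to a glucansucrase, a QRKN motif pointing to a 4,6-α-glucanotransferase) can be written as a small rule-based check. The sketch below is illustrative only: the function name, the caller-supplied `trp_position` index, the treatment of QRKN as a contiguous substring, and the toy sequences are all assumptions and do not reproduce the alignment used in the study.

```python
def classify_gh70(core_sequence, trp_position, motif="QRKN"):
    """Return a tentative GH70 activity label for one catalytic-core sequence.

    core_sequence: amino acid sequence of the catalytic core (one-letter code).
    trp_position: 0-based index (hypothetical, supplied by the caller from an
        alignment) of the site where glucansucrases carry a conserved Trp.
    motif: motif taken as indicative of 4,6-alpha-glucanotransferases; matching
        it as a contiguous substring is an assumption of this sketch.
    """
    has_trp = (0 <= trp_position < len(core_sequence)
               and core_sequence[trp_position] == "W")
    if has_trp:
        return "likely glucansucrase"
    if motif in core_sequence:
        return "likely 4,6-alpha-glucanotransferase"
    return "unresolved"


# Hypothetical toy sequences; they only mimic the logic, not real proteins.
toy_sequences = {
    "toy_sucrase": "WYRPAAWDE",
    "toy_gtase": "WYRPAAYDEQRKNL",
    "toy_other": "WYRPAAYDESEVL",
}
calls = {name: classify_gh70(seq, trp_position=6)
         for name, seq in toy_sequences.items()}
print(calls)
```

Sequences that match neither rule are left unresolved on purpose, since a different specificity (such as the 4,3-α-glucanotransferase of strain NCC2970) cannot be inferred from these two features alone.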
The phenolic acid decarboxylase gene present in almost half of the L. fermentum strains, including L. fermentum IMDO 130101, indicated the capability to decarboxylate at least p-coumaric acid, and possibly ferulic and other phenolic acids, as has been shown for L. brevis, L. plantarum, and Bacillus subtilis (Cavin et al., 1997, 1998; Landete et al., 2010; Rodríguez et al., 2010), leading to the production of volatile compounds like 4-vinylphenol and 4-vinylguaiacol. The concentrations of free phenolic acids in wheat and rye flour are low, yet their liberation contributes to the aroma of sourdough and its anti-oxidative effects (Boskov Hansen et al., 2002; Rodríguez et al., 2009, 2010; Shewry et al., 2010). In particular, the capability of microorganisms to convert phenolic acids raises their tolerance to these inhibitory compounds (Gänzle, 2014).
Lactobacilli are known to hydrate unsaturated fatty acids to hydroxy fatty acids (Gänzle, 2014; Chen et al., 2016). Breads made with sourdough fermented by antifungal fatty acid-producing Lactobacillus hammesii are free of molds for a longer time than control breads (Black et al., 2013). Further research could elucidate whether or not a similar protective effect could also be conferred on breads made with sourdough fermented by L. fermentum IMDO 130101.
5. CONCLUSIONS
Comparative genomics of 28 strains of the LAB species L. fermentum that have been isolated from various sources showed that these strains could be grouped into five clades, the largest of which contained most strains isolated from human hosts, whereas the second largest clade contained almost exclusively strains isolated from plant material-based fermentations. However, the clustering was not straightforward enough to indicate niche specialisation, suggesting a rather free-living lifestyle for this species. The relatively modest extended core genome encoded the genes necessary for the metabolism of several carbohydrates via heterolactic fermentation, showing these traits to be common to possibly all L. fermentum strains. Two of the strains had the genetic features to produce glucans from sucrose, whereas five others had the genetic features to restructure maltodextrins, which may contribute to the sourdough and bread texture. Besides these properties, L. fermentum IMDO 130101 possessed, among others, genes for the utilisation of maltose, arabinose, xylose, and other carbohydrates, as well as the production of acetoin, mannitol from fructose, and L-ornithine from L-arginine, which are all traits beneficial for sourdough fermentation. Additionally, genes encoding a putative starch degradation product import system were unique to this strain and may represent an advantage in a sourdough environment. In particular, these results explain at the genetic level the traits that have previously been demonstrated phenotypically for L. fermentum IMDO 130101. Further, they provide new avenues of research regarding L. fermentum strains isolated from sourdough.
ACKNOWLEDGEMENTS
The authors gratefully acknowledge Dr. ir. Sander Wuyts for helpful discussions regarding data analysis and visualisations.
FUNDING
Part of this research was financed by the Research Council of the Vrije Universiteit Brussel (SRP7 and IOF342 projects) and by the Research Foundation Flanders (FWO-Vlaanderen; project number 1510809N). MV is the recipient of a PhD fellowship of the Research Foundation Flanders (grant numbers 1119916N and 1111918N).
REFERENCES
Bao, Y., Zhang, Y., Zhang, Y., Liu, Y., Wang, S., Dong, X., Wang, Y., Zhang, H., 2010. Screening of potential probiotic properties of Lactobacillus fermentum isolated from traditional dairy products. Food Control. 21, 695-701.
Black, B.A., Zannini, E., Curtis, J.M., Gänzle, M.G., 2013. Antifungal hydroxy fatty acids produced during sourdough fermentation: microbial and enzymatic pathways, and antifungal activity in bread. Appl. Environ. Microbiol. 79, 1866-1873.
Boskov Hansen, H., Andreasen, M.F., Nielsen, M.M., Larsen, L.M., Bach Knudsen, K.E., Meyer, A.S., Christensen, L.P., Hansen, Å., 2002. Changes in dietary fibre, phenolic acids and activity of endogenous enzymes during rye bread-making. Eur. Food Res. Technol. 214, 33-42.
Buchfink, B., Xie, C., Huson, D.H., 2015. Fast and sensitive protein alignment using DIAMOND. Nat. Methods. 12, 59-60.
Capella-Gutierrez, S., Silla-Martinez, J.M., Gabaldon, T., 2009. TrimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 25, 1972-1973.
Cavin, J.-F., Barthelmebs, L., Diviès, C., 1997. Molecular characterization of an inducible p-coumaric acid decarboxylase from Lactobacillus plantarum: gene cloning, transcriptional analysis, overexpression in Escherichia coli, purification and characterization. Appl. Environ. Microbiol. 63, 1939-1944.
Cavin, J.-F., Dartois, V., Diviès, C., 1998. Gene cloning, transcriptional analysis, purification, and characterization of phenolic acid decarboxylase from Bacillus subtilis. Appl. Environ. Microbiol. 64, 1466-1471.
Chen, Y.Y., Liang, N.Y., Curtis, J.M., Gänzle, M.G., 2016. Characterization of linoleate 10-hydratase of Lactobacillus plantarum and novel antifungal metabolites. Frontiers Microbiol. 7, 1561.
De Vuyst, L., Van Kerrebroeck, S., Harth, H., Huys, G., Daniel, H.-M., Weckx, S., 2014. Microbial ecology of sourdough fermentations: diverse or uniform? Food Microbiol. 37, 11-29.
De Vuyst, L., Van Kerrebroeck, S., Leroy, F., 2017. Microbial ecology and process technology of sourdough fermentation. Adv. Appl. Microbiol. 100, 49-160.
De Vuyst, L., Weckx, S., 2016. The cocoa bean fermentation process: from ecosystem analysis to starter culture development. J. Appl. Microbiol. 121, 5-17.
Dellaglio, F., Torriani, S., Felis, G.E., 2004. Reclassification of Lactobacillus cellobiosus Rogosa et al. 1953 as a later synonym of Lactobacillus fermentum Beijerinck 1901. Int. J. Syst. Evol. Microbiol. 54, 809-812.
Duar, R.M., Lin, X.B., Zheng, J., Martino, M.E., Grenier, T., Pérez-Muñoz, M.E., Leulier, F., Gänzle, M., Walter, J., 2017. Lifestyles in transition: evolution and natural history of the genus Lactobacillus. FEMS Microbiol. Rev. 41, S27-S48.
Edgar, R.C., 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792-1797.
Emms, D.M., Kelly, S., 2015. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol. 16, 157.
Endo, A., Mizuno, H., Okada, S., 2008. Monitoring the bacterial community during fermentation of sunki, an unsalted, fermented vegetable traditional to the Kiso area of Japan. Lett. Appl. Microbiol. 47, 221-226.
Gangoiti, J., van Leeuwen, S., Gerwig, G.J., Duboux, S., Vafiadi, C., Pijning, T., Dijkhuizen, L., 2017. 4,3-α-Glucanotransferase, a novel reaction specificity in glycoside hydrolase family 70 and clan GH-H. Sci. Reports. 7, 39761.
Gänzle, M.G., Vermeulen, N., Vogel, R.F., 2007. Carbohydrate, peptide and lipid metabolism of lactic acid bacteria in sourdough. Food Microbiol. 24, 128-138.
Gänzle, M.G., Loponen, J., Gobbetti, M., 2008. Proteolysis in sourdough fermentations: mechanisms and potential for improved bread quality. Trends Food Sci. Technol. 19, 513-521.
Gänzle, M.G., 2014. Enzymatic and bacterial conversions during sourdough fermentation. Food Microbiol. 37, 2-10.
Gobbetti, M., De Angelis, M., Corsetti, A., Di Cagno, R., 2005. Biochemistry and physiology of sourdough lactic acid bacteria. Trends Food Sci. Technol. 16, 57-69.
Grover, S., Sharma, V.K., Mallapa, R.H., Batish, V.K., 2013. Draft genome sequence of Lactobacillus fermentum Lf1, an Indian isolate of human gut origin. Genome Announc. 1, e00883-13.
Hansen, A., Schieberle, P., 2005. Generation of aroma compounds during sourdough fermentation: applied and fundamental aspects. Trends Food Sci. Technol. 16, 85-94.
Hofmann, T., Schieberle, P., 1998. 2-Oxopropanal, hydroxy-2-propanone, and 1-pyrroline – important intermediates in the generation of the roast-smelling food flavor compounds 2-acetyl-1-pyrroline and 2-acetyltetrahydropyridine. J. Agric. Food Chem. 46, 2270-2277.
Huerta-Cepas, J., Forslund, K., Coelho, L.P., Szklarczyk, D., Jensen, L.J., von Mering, C., Bork, P., 2017. Fast genome-wide functional annotation through orthology assignment by eggNOG-Mapper. Mol. Biol. Evol. 34, 2115-2122.
Ito, K., Ito, S., Shimamura, T., Weyand, S., Kawarasaki, Y., Misaka, T., Abe, K., Kobayashi, T., Cameron, A.D., Iwata, S., 2011. Crystal structure of glucansucrase from the dental caries pathogen Streptococcus mutans. J. Mol. Biol. 2, 177-186.
Jiménez, E., Langa, S., Martín, V., Arroyo, R., Martín, R., Fernández, L., Rodríguez, J.M., 2010. Complete genome sequence of Lactobacillus fermentum CECT 5716, a probiotic strain isolated from human milk. J. Bacteriol. 192, 4800-4800.
Kralj, S., Grijpstra, P., van Leeuwen, S.S., Leemhuis, H., Dobruchowska, J.M., van der Kaaij, R.M., Malik, A., Oetari, A., Kamerling, J.P., Dijkhuizen, L., 2011. 4,6-α-Glucanotransferase, a novel enzyme that structurally and functionally provides an evolutionary link between glycoside hydrolase enzyme families 13 and 70. Appl. Environ. Microbiol. 77, 8154-8163.
Kuratsu, M., Hamano, Y., Dairi, T., 2010. Analysis of the Lactobacillus metabolic pathway. Appl. Environ. Microbiol. 76, 7299-7301.
Landete, J.M., Rodríguez, H., Curiel, J.A., de las Rivas, B., Mancheño, J.M., Muñoz, R., 2010. Gene cloning, expression, and characterization of phenolic acid decarboxylase from Lactobacillus brevis RM84. J. Ind. Microbiol. Biotechnol. 37, 617-624.
Lapierre, P., Gogarten, J.P., 2009. Estimating the size of the bacterial pan-genome. Trends in Genet. 25, 107-110.
Lee, S., You, H.J., Kwon, B., Ko, G., 2017. Complete genome sequence of the plasmid-bearing Lactobacillus fermentum strain SNUV175, a probiotic for women’s health isolated from the vagina of a healthy South Korean woman. Genome Announc. 5, e00045-17.
Leemhuis, H., Dijkman, W.P., Dobruchowska, J.M., Pijning, T., Grijpstra, P., Kralj, S.,
618
Kamerling, J.P., Dijkhuizen, L., 2013. 4,6-α-glucanotransferase activity occurs more
25
619
widespread in Lactobacillus strains and constitutes a separate GH70 subfamily. Appl.
620
Microbiol. Biotechnol. 97, 181-193.
621
Leemhuis, H., Dobruchowska, J.M., Ebbelaar, M., Faber, F., Buwalda, P.L., van der Maarel,
622
M.J., Kamerling, J.P., Dijkhuizen, L., 2014. Isomalto/malto-polysaccharide, a novel soluble
623
dietary fiber made via enzymatic conversion of starch. J. Agric. Food Chem. 62, 12034-
624
12044.
625
Lehri, B., Seddon, A.M., Karlyhev, A.V., 2015. Lactobacillus fermentum 3872 genome
626
sequencing reveals plasmid and chromosomal genes potentially involved in a probiotic
627
activity. FEMS Microbiol. Lett. 362, fnv068.
628
Martín, R., Langa, S., Reviriego, C., Jiménez, E., Marín, M.L., Xaus, J., Fernández, L.,
629
Rodríguez, J.M., 2003. Human milk is a source of lactic acid bacteria for the infant gut. J.
630
Pediatr. 143, 754-758.
631
Morita, H., Toh, H., Fukuda, S., Horikawa, H., Oshima, K., Suzuki, T., Murakami, M.,
632
Hisamatsu, S., Kato, Y., Takizawa, T., Fukuoka, H., Yoshimura, T., Itoh, K., O'Sullivan, D.J.,
633
McKay, L.L., Ohno, H., Kikuchi, J., Masaoka, T., Hattori, M., 2008. Comparative genome
634
analysis of Lactobacillus reuteri and Lactobacillus fermentum reveal a genomic island for
635
reuterin and cobalamin production. DNA Res. 15, 151-161.
636
O'Leary, N.A., Wright, M.W., Brister, J.R., Ciufo, S., Haddad, D., McVeigh, R., Rajput, B.,
637
Robbertse, B., Smith-White, B., Ako-Adjei, D., Astashyn, A., Badretdin, A., Bao, Y.,
638
Blinkova, O., Brover, V., Chetvernin, V., Choi, J., Cox, E., Ermolaeva, O., Farrell, C.M.,
639
Goldfarb, T., Gupta, T., Haft, D., Hatcher, E., Hlavina, W., Joardar, V.S., Kodali, V.K., Li, W.,
640
Maglott, D., Masterson, P., McGarvey, K.M., Murphy, M.R., O'Neill, K., Pujar, S., Rangwala,
641
S.H., Rausch, D., Riddick, L.D., Schoch, C., Shkeda, A., Storz, S.S., Sun, H., Thibaud-
642
Nissen, F., Tolstoy, I., Tully, R.E., Vatsan, A.R., Wallin, C., Webb, D., Wu, W., Landrum,
643
M.J., Kimchi, A., Tatusova, T., DiCuccio, M., Kitts, P., Murphy, T.D., Pruitt, K.D., 2016.
26
644
Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and
645
functional annotation. Nucleic Acids Res. 44, D733-D745.
646
Oksanen, J., Blanchet, F.G., Kindt, R., Legendre, P., Minchin, P.R., O’Hara, R.B., Simpson,
647
G.L., Solymos, P., Stevens, M.H.H., Wagner, H., 2016. vegan: community ecology package.
648
R package version 2.3.-5. https://CRAN.R-project.org/package=vegan.
649
Ortiz, M., Bleckwedel, J., Raya, R., Mozzi, F., 2013. Biotechnological and in situ production
650
of polyols by lactic acid bacteria. Appl. Microbiol. Biotechnol. 97, 4713-4726.
651
Paradis, E., Claude, J., Strimmer, K., 2004. APE: analyses of phylogenetics and evolution in
652
R language. Bioinformatics. 20, 289-290.
653
Price, M.N., Dehal, P.S., Arkin, A.P., 2010. FastTree 2 – Approximately maximum-likelihood
654
trees for large alignments. PLoS ONE. 5, e9490.
655
R Core Team, 2018. R: a language and environment for statistical computing. Vienna,
656
Austria. URL https://www.R-project.org/.
657
Ravyts, F., De Vuyst, L., 2011. Prevalence and impact of single-strain starter cultures of lactic
658
acid bacteria on metabolite formation in sourdough. Food Microbiol. 28, 1129-1139.
659
Rehman, S., Paterson, A., Piggott, J.R., 2006. Flavour in sourdough breads: a review. Trends
660
Food. Sci. Technol. 17, 557-566.
661
Rodríguez, H., Curiel, J.A., Landete, J.M., de las Rivas, B., López de Felipe, F., Gómez-
662
Cordovés, C., Mancheño, J.M., Muñoz, R., 2009. Food phenolics and lactic acid bacteria. Int
663
J. Food Microbiol. 132, 79-90.
664
Rodríguez, H., Angulo, I., de las Rivas, B., Campillo, N., Páez, J.A., Muñoz, R., Mancheño,
665
J.M., 2010. p-Coumaric acid decarboxylase from Lactobacillus plantarum: structural insights
666
into the active site and decarboxylation catalytic mechanism. Proteins. 78, 1662-1676.
27
667
Rogosa, M., Wiseman, R.F., Mitchell, J.A., Disraely, M.N., Beaman, A.J., 1953. Species
668
differentiation of oral lactobacilli from man including descriptions of Lactobacillus salivarius
669
nov. spec. and Lactobacillus cellobiosus nov. spec. J. Bacteriol. 65, 681-699.
670
Saha, B., Racine, F., 2011. Biotechnological production of mannitol and its applications.
671
Appl. Microbiol. Biotechnol. 89, 879-891.
672
Shewry, P.R., Piironen, V., Lampi, A.-M., Edelmann, M., Kariluoto, S., Nurmi, T., Fernandez-
673
Orozco, R., Andersson, A.A.M., Åman, P., Fraś, A., Boros, D., Gebruers, K., Dornez, E.,
674
Courtin, C.M., Delcour, J.A., Ravel, C., Charmet, G., Rakszegi, M., Bedo, Z., Ward, J.L.,
675
2010. Effects of genotype and environment on the content and composition of
676
phytochemicals and dietary fiber components in rye in the HEALTHGRAIN diversity screen.
677
J. Agric. Food Chem. 58, 9372-9383.
Stanstrup, J., 2017. Some code was provided by others on stackoverflow.com as indicated in the individual functions. massageR: massageR. R package version 0.7.2. https://github.com/stanstrup/massageR.
Su, M.S.W., Oh, P.L., Walter, J., Gänzle, M.G., 2012. Intestinal origin of sourdough Lactobacillus reuteri isolates as revealed by phylogenetic, genetic, and physiological analysis. Appl. Environ. Microbiol. 78, 6777-6780.
Sun, Z., Zhang, W., Bilige, M., Zhang, H., 2015. Complete genome sequence of the probiotic Lactobacillus fermentum F-6 isolated from raw milk. J. Biotech. 194, 110-111.
Van der Meulen, R., Scheirlinck, I., Van Schoor, A., Huys, G., Vancanneyt, M., Vandamme, P., De Vuyst, L., 2007. Population dynamics and metabolite target analysis of lactic acid bacteria during laboratory fermentations of wheat and spelt sourdoughs. Appl. Environ. Microbiol. 73, 4741-4750.
Van Kerrebroeck, S., Maes, D., De Vuyst, L., 2017. Sourdoughs as a function of their species diversity and process conditions, a meta-analysis. Trends Food Sci. Technol. 68, 152-159.
Verce, M., De Vuyst, L., Weckx, S., 2018. Complete and annotated genome sequence of the sourdough lactic acid bacterium Lactobacillus fermentum IMDO 130101. Genome Announc. 6, e00256-18.
Vogel, R., Pavlovic, M., Ehrmann, M.A., Wiezer, A., Liesegang, H., Offschanka, S., Voget, S., Angelov, A., Böcker, G., Liebl, W., 2011. Genomic analysis reveals Lactobacillus sanfranciscensis as stable element in traditional sourdoughs. Microb. Cell Fact. 10, S6.
Vrancken, G., Rimaux, T., De Vuyst, L., Leroy, F., 2008. Kinetic analysis of growth and sugar consumption by Lactobacillus fermentum IMDO 130101 reveals adaptation to the acidic sourdough ecosystem. Int. J. Food Microbiol. 128, 58-66.
Vrancken, G., Rimaux, T., Weckx, S., De Vuyst, L., Leroy, F., 2009a. Environmental pH determines citrulline and ornithine release through the arginine deiminase pathway in Lactobacillus fermentum IMDO 130101. Int. J. Food Microbiol. 135, 216-222.
Vrancken, G., Rimaux, T., Wouters, D., Leroy, F., De Vuyst, L., 2009b. The arginine deiminase pathway of Lactobacillus fermentum IMDO 130101 responds to growth under stress conditions of both temperature and salt. Food Microbiol. 26, 720-727.
Vujičić-Žagar, A., Pijning, T., Kralj, S., Lopez, C.A., Eeuwema, W., Dijkhuizen, L., Dijkstra, B.W., 2010. Crystal structure of a 117 kDa glucansucrase fragment provides an insight into evolution and product specificity of GH70 enzymes. Proc. Natl. Acad. Sci. USA. 50, 21406-21411.
Warnes, G.R., Bolker, B., Bonebakker, L., Gentleman, R., Liaw, W.H.A., Lumley, T., Maechler, M., Magnusson, A., Moeller, S., Schwartz, M., Venables, B., 2016. gplots: various R programming tools for plotting data. R package version 3.0.1. https://CRAN.R-project.org/package=gplots.
Weckx, S., Van der Meulen, R., Maes, D., Scheirlinck, I., Huys, G., Vandamme, P., De Vuyst, L., 2007. Lactic acid bacteria community dynamics and metabolite production of rye sourdough fermentations share characteristics of wheat and spelt sourdough fermentations. Food Microbiol. 27, 1000-1008.
Yoon, S.H., Ha, S.M., Lim, J.M., Kwon, S.J., Chun, J., 2017. A large-scale evaluation of algorithms to calculate average nucleotide identity. Antonie van Leeuwenhoek. 110, 1281-1286.
Yu, G., Smith, D., Zhu, H., Guan, Y., Lam, T.T., 2017. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol. Evol. 8, 28-36.
Zaunmüller, T., Eichert, M., Richter, H., Unden, G., 2006. Variations in the energy metabolism of biotechnologically relevant heterofermentative lactic acid bacteria during growth on sugars and organic acids. Appl. Microbiol. Biotechnol. 72, 421-429.
Tables

Table 1. Strains of the lactic acid bacterial species Lactobacillus fermentum, the genomes of which were used in the comparative genomics analysis of the current study. NA – not available.

| Strain | Assembly status | Source | Category | Accession number |
|---|---|---|---|---|
| IMDO 130101 | Complete | Sourdough | Plant | GCA_900205745.1 |
| IFO 3956 | Complete | Fermented plant material | Plant | GCF_000010145.1 |
| CECT 5716 | Complete | Human milk | Milk | GCF_000210515.1 |
| F-6 | Complete | Raw milk | Milk | GCF_000397165.1 |
| 3872 | Complete | Human milk | Milk | GCF_000466785.3 |
| NCC2970 | Complete | NA | NA | GCF_001742205.1 |
| SNUV175 | Complete | Human vagina | Human | GCF_001941785.1 |
| LAC FRN-92 | Complete | Oral cavity | Human | GCF_002192435.1 |
| MTCC 25067 | Complete | Fermented milk | Milk | GCF_002356135.1 |
| FTDC 8312 | Complete | Human faeces | Human | GCF_002119645.1 |
| 47-7 | Chromosome | Healthy infant | Human | GCF_001854105.1 |
| ATCC 14931 | Scaffold | Fermented beets | Plant | GCF_000159215.1 |
| 28-3-CHN | Scaffold | Urogenital tract | Human | GCF_000162395.1 |
| NB-22 | Scaffold | Human vagina | Human | GCF_000496435.1 |
| LfQi6 | Scaffold | Human milk | Milk | GCF_000966835.1 |
| Lf1 | Contig | Human faeces | Human | GCF_000472265.1 |
| 39 | Contig | NA | NA | GCF_001010185.1 |
| 90 TC-4 | Contig | NA | NA | GCF_001010245.1 |
| L930BB | Contig | Human colon | Human | GCF_001039735.1 |
| UCO-979C | Contig | Gastric biopsy | Human | GCF_001297905.1 |
| 222 | Contig | Cocoa bean fermentation | Plant | GCF_001368755.1 |
| SHI-2 | Contig | Saliva | Human | GCF_002591935.1 |
| S6 | Contig | Tchapalo | Plant | GCF_900163585.1 |
| S13 | Contig | Tchapalo | Plant | GCF_900163595.1 |
| 779_LFER | Scaffold | Human | Human | GCF_001077025.1 |
| DSM 20055 | Scaffold | Human saliva | Human | GCF_001436835.1 |
| RI-508 | Scaffold | Cocoa bean fermentation | Plant | GCF_001982185.1 |
| BFE 6620 | Scaffold | Gari | Plant | GCF_002204495.1 |
Table 2. Overview of the number of orthogroups (OGs) unique to a clade, along with the functions of those OGs that were present in all members of that particular clade.

| Clade | Number of OGs unique to the clade | Functions of OGs unique to the clade that were present in all strains within the clade |
|---|---|---|
| Clade 1 | 17 | Hypothetical protein; Transcriptional regulator |
| Clade 2 | 40 | Ribose-5-phosphate isomerase; Putative divalent cation transporter |
| Clade 3 | 21 | Hypothetical protein (14 OGs); Sensor histidine kinase; Response regulator; PAP2 family protein; FMN-binding protein; 2-Dehydropantoate 2-reductase; Flippase; Sucrose phosphorylase |
| Clade 4 | 14 | β-Galactosidase (2 OGs or pseudogene); Acetoin reductase; ABC transporter subunit (3 OGs); S-Ribosylhomocysteine lyase; Glutamate dehydrogenase; Malate dehydrogenase; Nicotinamide mononucleotide transporter; ADP-ribose pyrophosphatase; Citrate-sodium symporter; Isochorismatase |
| Clade 5 | 284 | Hypothetical protein; Transcriptional regulator |
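The two quantities reported in Table 2 (OGs found only within one clade, and the subset of those present in every member of that clade) can be derived from a presence/absence table as in the following minimal Python sketch. The function name, the strain names, the clade labels, and the OG identifiers are hypothetical and serve only to illustrate the set logic.

```python
def clade_unique_orthogroups(presence, clade_of):
    """Summarise clade-unique OGs.

    presence: dict mapping strain name -> set of OG identifiers.
    clade_of: dict mapping strain name -> clade label.
    Returns a dict: clade -> (number of unique OGs,
                              sorted unique OGs present in all clade members).
    """
    summary = {}
    for clade in sorted(set(clade_of.values())):
        members = [s for s in presence if clade_of[s] == clade]
        others = [s for s in presence if clade_of[s] != clade]
        inside = set().union(*(presence[s] for s in members))
        outside = set().union(*(presence[s] for s in others))
        unique = inside - outside  # OGs never seen outside this clade
        in_all_members = sorted(
            og for og in unique if all(og in presence[s] for s in members)
        )
        summary[clade] = (len(unique), in_all_members)
    return summary


# Hypothetical toy data: two clades with two strains each.
toy_presence = {
    "s1": {"OG1", "OG2", "OG3"},
    "s2": {"OG1", "OG2"},
    "s3": {"OG1", "OG4"},
    "s4": {"OG1", "OG4", "OG5"},
}
toy_clades = {"s1": "Clade A", "s2": "Clade A", "s3": "Clade B", "s4": "Clade B"}
summary = clade_unique_orthogroups(toy_presence, toy_clades)
print(summary)
```

Functional annotations such as those listed in the table would then be looked up for the OGs in the second element of each tuple.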
Legends to the figures
Fig. 1. The pangenome (in green) and strict core genome (in yellow) estimates expressed as the number of orthogroups in relation to the number of strains (above). For each number of strains, at most 500 combinations of strains were considered. Each point represents the pangenome or the strict core genome estimate of one combination and the line connects the arithmetic means of the pangenome or strict core genome estimates.
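The estimation procedure summarised in the legend of Fig. 1 can be sketched in Python as follows. The sketch assumes that combinations are enumerated exhaustively when there are at most `max_combos` of them and sampled at random otherwise; how the combinations were actually chosen is not stated here, so the sampling scheme, the function names, and the toy data are assumptions.

```python
import math
import random
from itertools import combinations
from statistics import mean


def sample_combinations(strains, k, max_combos, rng):
    """Return at most max_combos distinct k-strain combinations."""
    if math.comb(len(strains), k) <= max_combos:
        return [frozenset(c) for c in combinations(strains, k)]
    seen = set()
    while len(seen) < max_combos:
        seen.add(frozenset(rng.sample(strains, k)))
    return list(seen)


def pan_core_curves(presence, max_combos=500, seed=0):
    """Mean pangenome and strict core genome size for each number of strains."""
    rng = random.Random(seed)
    strains = sorted(presence)
    curves = {}
    for k in range(1, len(strains) + 1):
        pan_sizes, core_sizes = [], []
        for combo in sample_combinations(strains, k, max_combos, rng):
            og_sets = [set(presence[s]) for s in combo]
            pan_sizes.append(len(set().union(*og_sets)))  # OGs in any strain
            core_sizes.append(len(og_sets[0].intersection(*og_sets[1:])))  # OGs in all
        curves[k] = (mean(pan_sizes), mean(core_sizes))
    return curves


# Hypothetical toy data: strain name -> set of OGs.
toy_presence = {
    "A": {"OG1", "OG2", "OG3"},
    "B": {"OG1", "OG2", "OG4"},
    "C": {"OG1", "OG5"},
}
curves = pan_core_curves(toy_presence)
print(curves)
```

Each per-combination value corresponds to one point in the figure, and the means stored in `curves` correspond to the connecting lines.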
Fig. 2. (A) A rooted phylogenomic tree based on protein sequence alignment from 584 single-copy core genes common to all 28 Lactobacillus fermentum strains examined and the outgroup (Lactobacillus gorillae KZ01) and average nucleotide identities between the genomes. The strains are highlighted according to their isolation source - human host (red), milk (yellow), or plant material (green). (B) Orthogroup (OG) presence/absence clustering dendrogram and heatmap. On the phylogenomic tree, the nodes denoted with black dots had a local support value higher than 0.85; red dots denote nodes with a local support value lower than 0.40; the blue dot denotes the root of the L. fermentum tree. On the OG presence/absence heatmap, OGs are ordered on the x-axis from left to right according to the number of strains they appear in; dark blue denotes OG presence, light blue OG absence.
Fig. 3. Schematic representation of the core carbohydrate metabolism and arginine deiminase pathway of the lactic acid bacterial species Lactobacillus fermentum. Solid black arrows represent reactions catalysed by enzymes encoded in its extended core genome. Grey arrows represent reactions catalysed by enzymes in its accessory genome, shaded according to the number of strains that possess the corresponding gene. Dotted lines represent assumed transport. Dash-dotted lines represent abbreviations in the metabolic pathways. 1, β-galactosidase; 2, galactokinase; 3, UDP-glucose:α-D-galactose-1-phosphate uridylyltransferase; 4, UDP-glucose 4-epimerase; 5, UTP-glucose-1-phosphate uridylyltransferase; 6, phosphoglucomutase; 7, putative 1,4-α-glucosidase; 8, maltose phosphorylase; 9, β-phosphoglucomutase; 10, glucokinase; 11, sucrose-(phosphate) hydrolase; 12, mannitol 2-dehydrogenase; 13, fructokinase/hexokinase; 14, glucose-6-phosphate isomerase; 15, glucose-6-phosphate 1-dehydrogenase; 16, 6-phosphogluconolactonase; 17, phosphogluconate dehydrogenase; 18, ribulose-phosphate 3-epimerase; 19, xylulose-5-phosphate phosphoketolase; 20, acetate kinase; 21, phosphate acetyltransferase; 22, aldehyde/alcohol dehydrogenase; 23, alcohol dehydrogenase; 24, pyruvate dehydrogenase; 25, pyruvate oxidase; 26, acetolactate synthase; 27, α-acetolactate decarboxylase; 28, 2,3-butanediol dehydrogenase; 29, (spontaneous); 30, diacetyl reductase [(S)-acetoin-forming]; 31, L-lactate dehydrogenase; 32, D-lactate dehydrogenase; 33, gluconokinase; 34, ribokinase; 35, ribose-5-phosphate isomerase; 36, xylose isomerase; 37, xylulose kinase; 38, glycerol kinase; 39, glycerol-3-phosphate dehydrogenase; 40, triose phosphate isomerase; 41, malate dehydrogenase (oxaloacetate-decarboxylating); 42, fumarate hydratase; 43, arginine deiminase; 44, ornithine carbamoyltransferase; 45, carbamate kinase; 46, argininosuccinate synthase; 47, argininosuccinate lyase; and 48, putative arginase.
Fig. 4. (A) An unrooted tree based on the amino acid sequence comparison of catalytic cores of glycosyl hydrolase family 70 (GH70) proteins encoded by seven strains of Lactobacillus fermentum (indicated by their strain number only), of 4,6-α-glucanotransferases encoded by Lactobacillus reuteri strains 121, DSM 20015, and ML1, of a reuteransucrase encoded by L. reuteri strain 121, and of a dextransucrase encoded by L. reuteri strain 180. (B) Amino acid sequence alignment of the catalytic cores of the GH70 proteins considered. Amino acid sites are numbered according to the L. fermentum IMDO 130101 sequence in the alignment of catalytic cores (from the amino acids WYRP onwards). Black dots denote conserved sites. The blue dot denotes the location of a tryptophan residue conserved in glucansucrases.