Comparative genomics of Lactobacillus fermentum suggests a free-living lifestyle of this lactic acid bacterial species


Journal Pre-proof

Marko Verce, Luc De Vuyst, Stefan Weckx

PII: S0740-0020(20)30037-X
DOI: https://doi.org/10.1016/j.fm.2020.103448
Reference: YFMIC 103448
To appear in: Food Microbiology
Received Date: 15 July 2019
Revised Date: 12 December 2019
Accepted Date: 26 January 2020

Please cite this article as: Verce, M., De Vuyst, L., Weckx, S., Comparative genomics of Lactobacillus fermentum suggests a free-living lifestyle of this lactic acid bacterial species, Food Microbiology, https://doi.org/10.1016/j.fm.2020.103448. This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2020 Elsevier Ltd. All rights reserved.

- 28 Lactobacillus fermentum strains clustered into five clades.
- Unclear grouping of strains by isolation source indicated a free-living lifestyle.
- Many traits, including the usage of xylose and arabinose, were strain-dependent.
- Some traits of L. fermentum IMDO 130101 were relevant for sourdough production.

Marko Verce, Luc De Vuyst, Stefan Weckx*

Research Group of Industrial Microbiology and Food Biotechnology (IMDO), Faculty of Sciences and Bioengineering Sciences, Vrije Universiteit Brussel, Brussels, Belgium

Email:
Marko Verce: [email protected]
Luc De Vuyst: [email protected]
Stefan Weckx: [email protected]

*Correspondent footnote
Mailing address: Research Group of Industrial Microbiology and Food Biotechnology (IMDO), Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussels, Belgium.
Phone: +32 2 6293245
Fax: +32 2 6292720
E-mail: [email protected]

ABSTRACT

Lactobacillus fermentum is a lactic acid bacterium frequently isolated from mammal tissues, milk, and plant material fermentations, such as sourdough. A comparative genomics analysis of 28 L. fermentum strains enabled the investigation of the core and accessory genes of this species. The core protein phylogenomic tree of the strains examined, consisting of five clades, did not exhibit clear clustering of strains based on isolation source, suggesting a free-living lifestyle. Based on the presence/absence of orthogroups, the largest clade, containing most of the human-related strains, was separated from the rest. The extended core genome included genes necessary for the heterolactic fermentation. Many traits were found to be strain-dependent, for instance utilisation of xylose and arabinose. Compared to other strains, the genome of L. fermentum IMDO 130101, a candidate starter culture strain capable of dominating sourdough fermentations, contained unique genes related to the metabolism of starch degradation products, which could be advantageous for growth in sourdough matrices. This study explained the traits that were previously demonstrated for L. fermentum IMDO 130101 at the genetic level and provided future avenues of research regarding L. fermentum strains isolated from sourdough.

KEYWORDS

Genomics; comparative genomics; sourdough; Lactobacillus fermentum; carbohydrate metabolism

1. Introduction

Sourdough is a mixture of flour and water that is fermented by lactic acid bacteria (LAB) and yeasts, which acidifies the bread dough, provides it with leavening capacity, modifies its flavour, and retards the growth of spoilage microorganisms, thus prolonging the shelf-life of the end-products (Gobbetti et al., 2005; Rehman et al., 2006; De Vuyst et al., 2014, 2017). The most typical LAB species found in sourdough is the obligately heterofermentative Lactobacillus sanfranciscensis, whose small genome of 1.3 Mbp reflects its high adaptation to the sourdough environment (Vogel et al., 2011; De Vuyst et al., 2014; Van Kerrebroeck et al., 2017). Apart from L. sanfranciscensis, a wide variety of LAB species, such as Lactobacillus fermentum, Lactobacillus paralimentarius, Lactobacillus plantarum, and Lactobacillus reuteri, often prevail in different types of sourdoughs made from different flour types and using various approaches and technological conditions (De Vuyst et al., 2014, 2017).

Lactobacillus fermentum is a heterofermentative LAB species that occurs not only in food fermentation ecosystems, such as sourdough fermentation (De Vuyst et al., 2014, 2017), fermenting cocoa pulp-bean mass (De Vuyst and Weckx, 2016), and other plant material fermentations (Endo et al., 2008; Morita et al., 2008), but also in environments such as (breast) milk (Martín et al., 2003; Bao et al., 2010; Jiménez et al., 2010; Lehri et al., 2015; Sun et al., 2015), and as part of human-related microbiota, for example in the colon, urogenital tract, and oral cavity (Rogosa et al., 1953; Grover et al., 2013; Lee et al., 2017).

Lactobacillus fermentum IMDO 130101 was isolated from a rye sourdough backslopped in the laboratory (Van der Meulen et al., 2007; Ravyts and De Vuyst, 2011). Laboratory fermentations have revealed that this strain grows well in sourdough fermentations, ferments maltose, converts fructose into mannitol, tolerates acidic conditions down to pH 3.0, and has an active arginine deiminase (ADI) pathway (Vrancken et al., 2008, 2009a, 2009b), which makes it a candidate starter culture.

In the present study, a comparative genomics approach was applied to investigate the core metabolic pathways of the L. fermentum species as well as the genetic basis of the attributes characterised for L. fermentum IMDO 130101. Furthermore, possible adaptations of L. fermentum IMDO 130101 to the sourdough environment were investigated through its genome sequence annotation to underline its importance as candidate sourdough starter culture strain.

2. Materials and methods

Unless stated otherwise, software tools were used with their default settings.

2.1. Genome sequences used

All 27 L. fermentum genomes available in the RefSeq Assembly database (O’Leary et al., 2016) at the time of the study were included in the analyses, as well as the complete L. fermentum IMDO 130101 genome (Table 1; Verce et al., 2018). For L. fermentum FTDC 8312, a strain that has been sequenced and annotated twice, the assembly with the highest quality status was used. The genome sequence of Lactobacillus gorillae KZ01 (RefSeq Assembly accession number GCF_001293735.1) was used as an outgroup in all analyses performed, since this species is a close relative of L. fermentum (Duar et al., 2017).

2.2. Comparative genomics


2.2.1. Orthogroup inference

Amino acid sequences encoded in all L. fermentum genomes examined, as well as L. gorillae KZ01, were used for orthogroup (OG) inference. The comparison was performed with OrthoFinder 1.1.10 using DIAMOND as alignment tool (Emms and Kelly, 2015; Buchfink et al., 2015). The pangenome and strict core genome sizes were estimated as the number of OGs present in any strain, including singletons, and the number of OGs present in all strains examined, respectively. The calculations were based on the results of OrthoFinder and were made for a maximum of 500 combinations of strains for each number of strains, from two to 28. The extended core genome was defined as the set of OGs that were present in at least 26 of the 28 strains examined, to prevent erroneous exclusion of OGs from the core genome due to the draft status of some of the genomes considered (Lapierre and Gogarten, 2009).
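
The pangenome, strict core genome, and extended core genome estimates described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy OG identifiers, the random sampling of strain combinations, and the averaging over combinations are all assumptions made for illustration. Only the definitions (OGs in any strain, OGs in all strains, OGs in at least a given number of strains, at most 500 combinations per number of strains) come from the text.

```python
import random
from itertools import combinations
from math import comb


def pan_and_core(og_sets):
    """Return (pangenome size, strict core size) for a list of per-strain OG sets."""
    pan = set().union(*og_sets)          # OGs present in any strain
    core = set.intersection(*og_sets)    # OGs present in all strains
    return len(pan), len(core)


def rarefaction(presence, max_combinations=500, seed=0):
    """Mean pangenome and strict core sizes for each number of strains.

    presence: dict mapping strain name -> set of OG identifiers.
    Averaging over the combinations is an assumption of this sketch.
    """
    rng = random.Random(seed)
    strains = sorted(presence)
    curves = {}
    for n in range(2, len(strains) + 1):
        if comb(len(strains), n) <= max_combinations:
            subsets = list(combinations(strains, n))   # enumerate all subsets
        else:
            seen = set()                               # sample distinct subsets
            while len(seen) < max_combinations:
                seen.add(tuple(sorted(rng.sample(strains, n))))
            subsets = sorted(seen)
        sizes = [pan_and_core([presence[s] for s in subset]) for subset in subsets]
        curves[n] = (
            sum(p for p, _ in sizes) / len(sizes),
            sum(c for _, c in sizes) / len(sizes),
        )
    return curves


def extended_core(presence, min_strains):
    """OGs present in at least min_strains strains (26 of 28 in the study)."""
    counts = {}
    for ogs in presence.values():
        for og in ogs:
            counts[og] = counts.get(og, 0) + 1
    return {og for og, k in counts.items() if k >= min_strains}


# Toy data with hypothetical strain and OG names.
presence = {
    "A": {"og1", "og2", "og3"},
    "B": {"og1", "og2"},
    "C": {"og1", "og4"},
}
curves = rarefaction(presence)
relaxed_core = extended_core(presence, min_strains=2)
```

With real data, `presence` would be built from the OrthoFinder orthogroup table, and `extended_core(presence, 26)` would correspond to the threshold stated above.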

2.2.2. Phylogenomic tree inference

Amino acid sequences from 584 single-copy core genes found with OrthoFinder, as described in Section 2.2.1., that were also present in L. gorillae KZ01 were aligned with MUSCLE (Edgar, 2004). The alignments were trimmed with trimAl (Capella-Gutierrez et al., 2009) without allowing gaps. The trimmed alignments were then concatenated using an in-house Python script. From these concatenated alignments, a rooted phylogenomic tree was created using FastTree (Price et al., 2010). FastTree estimates the reliability of each split in the tree by applying a Shimodaira-Hasegawa (SH) test with 1,000 bootstrap replicates for the current topology and the two alternate topologies, resulting in SH-like local support values for each node of the tree. The splits with the local support values of ≥ 0.95 were considered to be strongly supported.
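
The in-house concatenation script is not part of the text, so the following is only a hypothetical sketch of how trimmed per-gene alignments could be joined strain by strain. It assumes that every FASTA header starts with a strain identifier shared across alignments (in practice a mapping from protein identifiers to strains would be needed) and that each alignment contains exactly one sequence per strain. The function names `read_fasta` and `concatenate` are illustrative.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict {identifier: sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]   # first word of the header
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def concatenate(alignments, strains):
    """Join per-gene alignments into one concatenated sequence per strain.

    alignments: list of dicts {strain: aligned sequence}, one dict per gene.
    strains: strain identifiers, all of which must occur in every alignment.
    """
    parts = {s: [] for s in strains}
    for aln in alignments:
        lengths = {len(aln[s]) for s in strains}   # KeyError if a strain is missing
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        for s in strains:
            parts[s].append(aln[s])
    return {s: "".join(chunks) for s, chunks in parts.items()}


# Toy alignments with hypothetical strain names.
gene1 = read_fasta(">s1\nMKV\n>s2\nMRV\n")
gene2 = read_fasta(">s1\nLT\n>s2\nLS\n")
supermatrix = concatenate([gene1, gene2], ["s1", "s2"])
```

The resulting dictionary can be written back to FASTA and passed to a tree inference tool such as FastTree.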

2.2.3. Clustering based on presence/absence of orthogroups

All 28 L. fermentum strains and L. gorillae KZ01 were clustered based on the table of OG presence/absence obtained using OrthoFinder, as described in Section 2.2.1. R packages vegan (Oksanen et al., 2016), massageR (Stanstrup, 2017), and gplots (Warnes et al., 2016) were used to create a heatmap of OG presence/absence using Jaccard distances and Ward clustering (ward.D2) (R Core Team, 2018).
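
The distance underlying this clustering can be sketched as follows. The study used R packages for this step, so the Python code below is only an illustration of the binary Jaccard distance on presence/absence data, with hypothetical strain and OG names. The subsequent Ward clustering and heatmap drawing are not reproduced here.

```python
def jaccard_distance(a, b):
    """Binary Jaccard distance: 1 - |A ∩ B| / |A ∪ B| for two sets of OGs."""
    union = a | b
    if not union:
        return 0.0   # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def distance_matrix(presence):
    """Pairwise Jaccard distances between strains.

    presence: dict mapping strain name -> set of OG identifiers.
    Returns the sorted strain names and a square matrix as a list of lists.
    """
    names = sorted(presence)
    matrix = [
        [jaccard_distance(presence[x], presence[y]) for y in names]
        for x in names
    ]
    return names, matrix


# Toy presence/absence data.
presence = {
    "A": {"og1", "og2", "og3"},
    "B": {"og1", "og2"},
    "C": {"og4"},
}
names, matrix = distance_matrix(presence)
```

Strains sharing most of their OGs get a distance near 0, whereas strains without any shared OG get a distance of 1.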

2.2.4. Average nucleotide identity calculation

The average nucleotide identity (ANI) values of the 28 L. fermentum strains and L. gorillae KZ01 were calculated with the OrthoANIu tool (Yoon et al., 2017), relying on an improved OrthoANI algorithm. The phylogenomic tree and ANI values were visualized using the R packages ape and ggtree (R Core Team, 2018; Paradis et al., 2004; Yu et al., 2017).

2.3. In silico analysis of potential metabolic pathways

Based on the OG and phylogenomic tree inferences, the core metabolic pathways of the species L. fermentum were reconstructed. In addition, genes unique to L. fermentum IMDO 130101 were highlighted. As the function predictions for metabolic pathway reconstruction were based on homologies to known protein sequences in databases and literature information, the actual functions may differ from those deduced.

To confirm that the absence of glycerol dehydratase genes in the L. fermentum genomes was not due to inadequate annotation, the amino acid sequences of Lactobacillus reuteri JCM 1112T glycerol dehydratase subunits (GenBank accession numbers BAG26149.1-BAG26151.1) were used as query sequences in BLAST searches with the blastp algorithm in the National Center for Biotechnology Information (NCBI) non-redundant protein sequences (nr) database, while limiting the search to L. fermentum-related sequences. A similar confirmation was performed for the absence of L-serine O-acetyltransferase genes in L. fermentum genomes, using the amino acid sequence of a Lactobacillus casei L-serine O-acetyltransferase as query (GenBank accession number AEK48252.1).

For glucanotransferase sequence analysis, the amino acid sequences of glycosyl hydrolase family 70 (GH70) proteins encoded by different strains of L. fermentum, the characterised 4,6-α-glucanotransferase sequences of L. reuteri 121, L. reuteri DSM 20016, and L. reuteri ML1 (NCBI Protein accession numbers AAU08014.2, ABQ83597.1, and AAU08003.2, respectively), as well as the L. reuteri 121 reuteransucrase sequence (AAU08015.1) and the L. reuteri 180 dextransucrase sequence (AAU08001.1) were aligned with MUSCLE, followed by trimming from the N-terminus to the amino acid residues WYRP using trimAl, to allow the alignment of catalytic cores solely (Kralj et al., 2011). These trimmed sequences were re-aligned with MUSCLE. Based on this alignment, an unrooted phylogenetic tree was created using FastTree and visualised with FigTree (http://tree.bio.ed.ac.uk/).
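
The idea of removing the N-terminal region up to the WYRP residues can be illustrated on unaligned sequences with a short Python sketch. This is not the trimAl-based procedure used in the study: the function name, the toy sequence, and the choice to keep the motif itself are assumptions made for illustration.

```python
def trim_to_motif(seq, motif="WYRP", keep_motif=True):
    """Remove everything N-terminal of the first occurrence of motif.

    Returns None when the motif is absent, so that such sequences can be
    inspected by hand instead of being trimmed silently.
    Whether the motif itself is retained is an assumption of this sketch.
    """
    idx = seq.find(motif)
    if idx == -1:
        return None
    return seq[idx:] if keep_motif else seq[idx + len(motif):]


# Toy sequence, not a real GH70 protein.
toy = "MKTAAWYRPKDLV"
trimmed = trim_to_motif(toy)
```

Sequences trimmed in this way would then be re-aligned, as described above.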

3. Results

3.1. Comparative genomics

The number of protein-encoding DNA sequences (CDSs) in the 28 L. fermentum genomes examined varied from 1,496 to 2,126. Per genome, the CDSs were grouped into between 1,439 and 1,962 OGs, singletons included. Among all L. fermentum strains, 2,995 OGs were found (Figure 1), among which were 441 singletons. The core genome of the species L. fermentum, established through this analysis, contained 630 OGs (Figure 1), whereas the extended core genome contained 1,231 OGs. The estimates of the strict core genome size and the pangenome size stabilised when including additional genomes, indicating that the estimates approached the actual strict core genome and pangenome sizes of the species (Figure 1).

The origin of the strains agreed with their position on the phylogenomic tree only to a low degree. Nevertheless, the strains clustered into five groups, which was supported by the ANI values calculated (Figure 2A). The biggest group, Clade 5, included 17 strains, nine of which originated from human hosts, four from raw milk, two from fermentations of plant-derived materials, and two from an unknown source. Clade 2, which also included L. fermentum IMDO 130101, was the second largest clade and consisted of four strains isolated from fermentations of plant-derived material and one strain from saliva.

Based on the presence/absence of OGs in the genomes compared, the strains clustered into two groups (Figure 2B). The first group consisted of all Clade 5 L. fermentum strains and the sole Clade 3 L. fermentum strain. The second group consisted of Clade 1, Clade 2, and Clade 4, as well as L. gorillae KZ01. Within the latter group, the strains were separated into clades comparable to those on the phylogenomic tree.

Regarding the clades, there were 17, 40, 21, 14, and 284 OGs exclusively present in the genomes of clades 1, 2, 3, 4, and 5, respectively. However, few of these OGs were present in all members of each clade (Table 2). Regarding their origins, there were 37, 24, and 124 OGs exclusively present in the genomes of the plant material-, milk-, and human host-related strains, respectively. However, none of those OGs appeared in all members of the respective niches.
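
The distinction made above, between OGs found only within a group of strains and OGs that are additionally shared by all members of that group, can be expressed with set operations. The sketch below is an illustration with hypothetical strain and OG names, not the authors' code.

```python
def exclusive_ogs(presence, group):
    """OGs present in at least one strain of group and in no strain outside it."""
    inside = set().union(*(presence[s] for s in group))
    outside = set().union(*(presence[s] for s in presence if s not in group))
    return inside - outside


def exclusive_and_shared(presence, group):
    """Group-exclusive OGs that also occur in every member of the (non-empty) group."""
    shared = set.intersection(*(presence[s] for s in group))
    return exclusive_ogs(presence, group) & shared


# Toy data: two strains of one clade and one strain of another clade.
presence = {
    "clade1_a": {"og1", "og2", "og5"},
    "clade1_b": {"og1", "og3", "og5"},
    "clade2_a": {"og1", "og4"},
}
clade1 = ["clade1_a", "clade1_b"]
only_in_clade1 = exclusive_ogs(presence, clade1)
in_all_of_clade1 = exclusive_and_shared(presence, clade1)
```

In the toy data, three OGs are exclusive to the first clade but only one of them occurs in both of its members, which mirrors the pattern reported for the clades and isolation sources.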

Apart from seven OGs, encoding six hypothetical proteins and a 50S ribosomal protein L33, there were no other OGs common to all strains except for L. fermentum IMDO 130101. Conversely, there were 40 genes uniquely present in L. fermentum IMDO 130101. Of those, 23 encoded hypothetical proteins and three encoded transposases. The remaining 14 encoded a putative stress protein, a sugar (glycoside-pentoside-hexuronide) transporter, an MFS-type transporter, an amino acid transporter, a YihY family membrane protein, a putative 2-hydroxyacid dehydrogenase, an HAD family hydrolase, a modification methylase, an exopolyphosphatase with an interrupted gene, a trehalose-6-phosphate phosphorylase, and a transport system possibly related to starch degradation.

3.2. In silico analysis of potential metabolic pathways


3.2.1. Metabolism of carbohydrates and related metabolic pathways

As to the carbohydrate metabolism, the L. fermentum core genome included genes related to the three mannose-specific components of the phosphoenolpyruvate:sugar phosphotransferase system (PTS) and a glucokinase, whereas a fructokinase was present in the extended core genome (Figure 3). Twenty-one of the 28 strains considered, including the complete Clades 1, 2 and 3, had the genetic repertoire to reduce fructose into D-mannitol, as mannitol 2-dehydrogenase genes were found in their genomes. Sucrose could be imported as sucrose 6-phosphate via a PTS and hydrolysed into glucose 6-phosphate and D-fructose by a sucrose-(phosphate) hydrolase in 25 strains. Of those, 20 strains, including the complete Clade 2, also possessed both a fructokinase gene and a glucose-6-phosphate isomerase gene.

All genes necessary for heterolactic fermentation were found in the core genome, except for the pyruvate kinase gene, which was found in the extended core genome. One L-lactate dehydrogenase gene, at least two D-lactate dehydrogenase genes, and the pyruvate dehydrogenase genes pdhABCD were also found in the core genome, enabling the species to reduce pyruvate to L-lactic acid or D-lactic acid, or to convert it into acetyl-CoA through oxidative decarboxylation. Alternatively, in Clade 2 and in one strain of Clade 5, pyruvate could also be converted into acetyl phosphate under aerobic conditions in a FAD- and thiamine pyrophosphate (TPP)-dependent manner, due to the presence of a pyruvate oxidase gene. Acetyl phosphate could in turn be converted into acetate, coupled to substrate-level phosphorylation, by an acetate kinase, whose gene was present in all strains. Additionally, (S)-2-acetolactate and carbon dioxide could be formed from pyruvate, followed by (S)-2-acetolactate decarboxylation into acetoin. Diacetyl spontaneously formed from (S)-2-acetolactate could also be converted into acetoin and further into 2,3-butanediol. However, despite the presence of genes for the latter three conversions in the core genome, the genes necessary for (S)-2-acetolactate formation from pyruvate were not present in four strains.

Assuming its import, ribose utilisation through phosphorylation and isomerisation to ribulose 5-phosphate by a ribokinase and a ribose-5-phosphate isomerase, respectively, was encoded in the extended core genome, linking ribose metabolism with the heterolactic fermentation pathway. Compared to all other strains, an additional, unique, ribose 5-phosphate isomerase A gene was found for all strains in Clade 2. Similarly, the utilisation of gluconic acid through phosphorylation to gluconate 6-phosphate was also encoded in the extended core genome. In contrast, based on the OGs found in the genomes examined, xylose and arabinose could only be utilised in the heterolactic fermentation pathway by eleven and 18 strains, respectively. Xylose could be utilised by all members of Clade 2 and by six members of Clade 5, whereas all members of Clades 1 and 2, as well as ten of 17 members of Clade 5, could utilise arabinose. Similarly, lactose could be utilised by being split into D-galactose and D-glucose by 25 strains, due to the presence of a lactose permease gene and LacL/LacM β-galactosidase genes. An α-galactosidase gene was part of the extended core genome.

Genes necessary for glycerol utilisation, encoding glycerol kinase, glycerol-3-phosphate dehydrogenase and triosephosphate isomerase, were present in the core genome, whereas only 14 strains of Clade 5 and four strains of Clade 2 contained a glycerol transporter gene. A glycerol dehydrogenase gene was present in 20 of the 28 strains examined, of which 16 belonged to Clade 5. The same was the case for the 1,3-propanediol dehydrogenase gene, which was present in all Clade 5 strains, the sole Clade 3 strain, and two Clade 2 strains. However, a glycerol dehydratase gene was not present in any L. fermentum genome.

A putative β-glucanase-encoding gene was present in six strains, namely the three Clade 1 strains, two Clade 2 strains and one Clade 5 strain. The predicted products belonged to the glycoside hydrolase family 8, members of which cleave β-1,4 linkages of β-1,4 glucans, xylans, chitosans, and lichenans.

An α-glucosidase gene that would enable the cleaving of glucose subunits from maltodextrins was found in three Clade 2 strains, including L. fermentum IMDO 130101, and in one Clade 5 strain isolated from the human colon. All genes necessary for maltose utilisation were present in 24 of the 28 strains examined, including a maltose phosphorylase gene and two neighbouring genes encoding a transporter and a β-phosphoglucomutase. Additionally, 25 strains contained an oligo-1,6-glucosidase gene. Moreover, a cluster of genes (LF130101_1262-1265) potentially involved in maltodextrin import appeared in L. fermentum IMDO 130101 only. The 4.3-kbp nucleotide sequence containing the latter four genes was 99 % identical to sequences from Lactobacillus hokkaidonensis (plasmid pLOOC260-1), Lactobacillus brevis KB290, L. plantarum subsp. argentoratensis DSM 16365 (plasmid), L. brevis SRCM101106, L. brevis SRCM101174, L. brevis 100D8, L. brevis ATCC 367, L. plantarum TMW 1.1623 (plasmid), L. paracasei IIA (plasmid), and a plasmid of Pediococcus pentosaceus SRCM100194 (NCBI GenBank accession numbers AP014681.1, AP012167.1, CP032754.1, CP021674.1, CP021479.1, CP015338.1, CP000416.1, CP017383.1, CP014986.1, and CP021926.1, respectively). In all these cases, as well as in L. fermentum IMDO 130101, the cluster was flanked by a putative transposase gene, at least on one side. Furthermore, the L. fermentum IMDO 130101 genome was the only one containing a gene encoding a glycoside hydrolase family 65 protein, which was annotated as a trehalose-6-phosphate phosphorylase.

A 29-kb region was present in strains L. fermentum IMDO 130101 and L930BB only and contained genes encoding enzymes involved in the metabolism of carbohydrates and related compounds, such as a putative fructuronate reductase, a glucarate dehydratase, a gluconokinase, an aldose 1-epimerase, two uronate isomerases, and several glycoside hydrolases, namely a putative O-glycosyl hydrolase similar to proteins of glycoside hydrolase family 30, a β-glucuronidase, a putative polygalacturonase of glycoside hydrolase family 28, and an α-glucosidase of glycoside hydrolase family 31, as well as two sugar (glycoside-pentoside-hexuronide) transporters. An additional gene encoding a sugar (glycoside-pentoside-hexuronide) transporter was present in the L. fermentum IMDO 130101 genome only, although not in the same genomic region as the two other ones.

Based on the OGs found in the genomes, citrate could be metabolised by nine of the 28 strains examined, including four of the five strains isolated from milk (products), as the six genes necessary for citrate lyase activity were present in their genomes. Lactobacillus fermentum IMDO 130101 was not among them. A gene encoding a malic enzyme was present in 21 strains, enabling the conversion of malate or oxaloacetate into pyruvate. Alternatively, malate could be reversibly dehydrated to fumarate by 24 strains, as indicated by the presence of a gene encoding a fumarate hydratase, and fumarate could be further reduced to succinate by 23 of those strains, due to the presence of a fumarate reductase flavoprotein subunit gene. Four strains of Clade 5 were the only ones missing both of the former genes, whereas one additional Clade 5 strain was missing one of the two genes. Most strains, namely 25 of the 28, could also convert L-citrulline and L-aspartate into L-arginine and fumarate via L-argininosuccinate, at the cost of ATP, due to the presence of argininosuccinate synthase and argininosuccinate lyase genes.

3.2.2. Amino acid biosynthesis

The genetic basis for the biosynthesis of L-aspartate from oxaloacetate, the biosynthesis of L-asparagine, L-lysine, and L-threonine from L-aspartate, glycine from L-serine, L-glutamine from L-glutamate, and the conversion of L-glutamine to L-glutamate was found in the (extended) core genome, making these amino acids non-essential for this species. The core genome also included the genes involved in L-cysteine biosynthesis from L-alanine and vice versa. However, biosynthesis of L-cysteine from L-homoserine could not occur, due to the absence of an L-serine O-acetyltransferase-encoding gene, although several additional acetyltransferase-encoding genes were present. Additionally, apart from the phosphoserine phosphatase-encoding gene, the other two genes necessary for the biosynthesis of L-serine from 3-phospho-D-glycerate were present in the core genome. The genetic potential for the biosynthesis of L-methionine from L-aspartate and L-cysteine, L-proline from L-glutamate or L-ornithine, and L-arginine from L-glutamate and L-aspartate through the acetyl cycle was strain-dependent. The genetic potential for the biosynthesis of L-histidine was also strain-dependent, and it was only found in nine strains, all belonging to Clade 5. In contrast, the branched-chain amino acids L-valine, L-leucine and L-isoleucine, and the aromatic amino acids L-tryptophan, L-tyrosine and L-phenylalanine, were found to be essential amino acids for all 28 strains of L. fermentum.

Seven of the 28 strains, four belonging to Clade 5, two to Clade 1, and the sole Clade 3 member, could decarboxylate L-glutamate to form 4-aminobutyric acid (GABA), due to the presence of a glutamate decarboxylase gene. The genes for the complete ADI pathway, including the arginine-ornithine antiporter, were found in 25 strains, as three strains had frameshift mutations in up to two of the four genes. A gene coding for a putative arginase, enabling a direct conversion of L-arginine into L-ornithine with concomitant urea production, was present in 18 strains. However, this gene may also be encoding an enzyme with a related function, as so far arginase activity has not been demonstrated experimentally in lactobacilli.

3.2.3. Cofactors

The core genome included all four genes necessary for the biosynthesis of CoA from (R)-pantothenate as well as the pyridoxal kinase gene. Twenty of the 28 strains examined contained all genes needed for riboflavin biosynthesis, and one such gene was missing in the genomes of six strains, four of those belonging to Clade 5 and two to Clade 4. All six genes for the biosynthesis of NAD+ and NADP+ from nicotinic acid were present in 23 strains, whereas the complete gene set was not found in one Clade 4 strain and four Clade 5 strains. Except for the dihydroneopterin triphosphate pyrophosphohydrolase, all genes for the biosynthesis of folic acid from GTP and 4-aminobenzoate were present in 25 strains. The same was the case for the thiamine salvage pathway genes.

3.2.4. Additional sourdough-relevant traits

Seven strains, including L. fermentum IMDO 130101, possessed a gene encoding a GH70 protein. Based on a phylogenetic analysis, the amino acid sequences of their products formed two groups on an unrooted tree (Figure 4A). Also, all amino acid sequences contained the same conserved sites (Figure 4B). In the genomes of L. fermentum strains ATCC 14931, 39, 28-3-CHN, and NCC2970 of Clade 5, the genes were located in the same genomic region, as indicated by the neighbouring genes. The C-terminal region of the gene product in L. fermentum IMDO 130101 was 94 % identical and 97 % similar to the catalytic domain of a previously characterised 4,6-α-glucanotransferase 4,6-αGT-W from L. reuteri DSM 20016 (NCBI GenBank accession number ABQ83597.1). The identity and similarity dropped to 73 % and 79 %, respectively, when comparing the full amino acid sequences. In comparison, the identity and similarity between the full length amino acid sequences of the gene products of L. fermentum IMDO 130101 and NCC2970 were 52 % and 63 %, respectively. The sequences of the two Clade 1 strains, L. fermentum 779 LFER and DSM 20055, most likely encoded a glucansucrase, as they contained a tryptophan residue whereas the sequences from the rest of the strains contained a tyrosine residue at the same site.

A phenolic acid decarboxylase gene was present in 13 strains. The amino acid sequence of the L. fermentum IMDO 130101 phenolic acid decarboxylase was highly similar to previously experimentally characterised proteins, such as phenolic acid decarboxylases from L. brevis RM84 (83 % identity, 89 % similarity; sequence obtained from Landete et al., 2010) and Bacillus subtilis 168 (Swiss-Prot accession number O07006; 71 % identity, 86 % similarity) and the p-coumaric acid decarboxylase from L. plantarum (UniProt accession number P94900; 80 % identity, 90 % similarity).

An oleate hydratase-encoding gene was present in all strains of Clade 2 and in the Clade 5 strain FTDC 8312 isolated from faeces. Expression of this gene could enable the strains to convert linoleic acid and oleic acid into hydroxylated derivatives.

4. Discussion

Lactobacillus fermentum is a lactic acid bacterium species that is frequently isolated from various sources, including plant material fermentations, such as sourdough. Since several genome sequences of L. fermentum isolates from different sources are available, comparative genomics could be used to better characterise the strain-dependent genetic potential, together with the core genome and pangenome of this species. The inability to reconstruct the heterofermentative pathway from the strict core genome alone indicated the need to consider an extended core genome, a concept introduced by Lapierre and Gogarten (2009). The actual core genome size of a species is expected to be larger than the apparent (strict) core genome size, possibly due to draft genomes not representing the full genome content of a strain or due to differences in assembly and annotation quality.

The phylogenomic analysis of 28 L. fermentum strains, including the sourdough isolate L.

381

fermentum IMDO 130101, revealed that the strains could be grouped into five clades. The

382

clustering of strains into clades did not clearly correspond to their isolation source. However,

383

the largest clade (Clade 5) contained the majority of the strains that were isolated from human

384

hosts and all strains isolated from raw milk. Lactobacillus fermentum IMDO 130101 was

385

located in the clade with strains isolated almost exclusively from fermentations of plant-

386

derived materials, namely Clade 2. The size differences between the clades may reflect a bias

387

in research interests, whereby genomic research is mostly oriented towards isolates from the

388

human microbiome. Strains of L. fermentum did not partition into the five clades depending

389

on their isolation source in a way that would indicate niche specialisation, as is the case for L.

390

reuteri and its host specialisations (Su et al., 2012). This is in line with a previous proposition

391

that L. fermentum may be a species that is undergoing a reversion from a host-adapted

392

lifestyle typical for the L. reuteri group to a free-living one (Duar et al., 2017). In this

393

context, the acquisition of niche-specific genes from other members of the ecosystem would

394

be advantageous for strains of a species to survive in different niches (e.g., putative

395

maltodextrin import-related genes for starch-enriched environments). An alternative

396

explanation for the lack of partitioning could be the inherent connectedness between plant

397

material fermentations and the gastrointestinal tract of humans, as fermented plant products

398

are part of the human diet. Whereas the clades still separated when clustered based on the

16

399

presence/absence of OG, there was a bigger separation into two groups, namely Clades 5 and

400

3 versus Clades 1, 2, and 4. Moreover, L. gorillae did not cluster apart from the L. fermentum

401

strains, indicating that the intra-species diversity in L. fermentum was as big as the inter-

402

species diversity between L. fermentum and L. gorillae.
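A comparison of strains by orthogroup presence/absence, as described above, can be sketched as follows. This is only an illustration in Python: the Jaccard distance is an assumed dissimilarity measure, the strain names and orthogroup sets are hypothetical, and the sketch does not reproduce the clustering procedure used in the study.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |intersection| / |union|; defined as 0.0 for two empty sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def pairwise_distances(
    og_by_strain: Dict[str, Set[str]]
) -> Dict[Tuple[str, str], float]:
    """Jaccard distance for every unordered pair of strains (names sorted)."""
    return {
        (s, t): jaccard_distance(og_by_strain[s], og_by_strain[t])
        for s, t in combinations(sorted(og_by_strain), 2)
    }


# Hypothetical toy data: two strains of one species and one outgroup genome.
toy = {
    "strain_1": {"OG1", "OG2", "OG3"},
    "strain_2": {"OG1", "OG2", "OG3", "OG4"},
    "outgroup": {"OG1", "OG2", "OG5"},
}
distances = pairwise_distances(toy)
```

A distance matrix of this kind could then be passed to any hierarchical clustering routine. An outgroup would be expected to cluster apart only if its distances to the other genomes clearly exceed the distances among them.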

The L. fermentum (extended) core genome contained all genetic capabilities regarding the carbohydrate metabolism and products derived thereof as mentioned in the emended species description of L. fermentum (Dellaglio et al., 2004). However, the current study showed that lactose utilisation was strain-dependent. In addition, the capability to utilise xylose, arabinose, β-glucans, and α-glucosides, to oxidise pyruvate to acetyl phosphate, and to metabolise citrate was encoded in the accessory genome, and only in a few strains. The findings regarding xylose and arabinose fermentation were also in agreement with the species description of L. fermentum, as their fermentation is considered a variable trait of the species (Dellaglio et al., 2004). L. fermentum IMDO 130101 had the genetic potential to metabolise xylose, arabinose, β-glucans, and α-glucosides, and to oxidise pyruvate to acetyl phosphate, but not to metabolise citrate.

Sourdough isolates such as L. fermentum IMDO 130101 are well suited to thrive in a sourdough environment, as has been shown through fermentation experiments previously (Vrancken et al., 2008, 2009a, 2009b). The main carbohydrates in flour are starch and starch degradation products, such as maltodextrins and maltose, followed by sucrose (Gänzle, 2014), though raffinose (an α-galactoside) and arabinoxylans are also present. The latter may be degraded to arabinose and xylose (Gänzle, 2014). Utilisation of raffinose and arabinose by L. fermentum IMDO 130101 was predicted as well, although their metabolism requires respective import systems. The presence of oligo-1,6-glucosidase- and α-glucosidase-encoding genes suggested cleavage of the glucose subunits from isomaltose and maltodextrins, which may be imported into the cells as well. Putative maltodextrin import-related genes were unique to L. fermentum IMDO 130101 and could represent an advantage for this strain in the sourdough environment. Analysis of its genome during the current study showed the presence of genes involved in acid tolerance, transport and metabolism of plant-derived carbohydrates (in particular maltose), an ADI pathway, and the production of mannitol from fructose, which are all prevalent features in LAB species and strains indigenous to sourdough, confirming the genetic adaptation to such fermentation previously demonstrated phenotypically (Gänzle et al., 2007; Weckx et al., 2007). All these characteristics provide the genetic background related to previous experimental findings regarding L. fermentum IMDO 130101 (Vrancken et al., 2008, 2009a, 2009b) and explain the common appearance of L. fermentum strains in sourdough (De Vuyst et al., 2017; Van Kerrebroeck et al., 2017). The production of mannitol out of fructose as an alternative external electron acceptor is of special interest, as mannitol is generally used as a sweetener with a low glycemic index (Saha and Racine, 2011; Ortiz et al., 2013). However, using L. fermentum IMDO 130101 as a sourdough starter culture would mainly impact the baked goods’ flavour due to the production of acetate that is commonly associated with the reduction of fructose to mannitol in heterofermentative LAB (Hansen and Schieberle, 2005).

The finding that the branched-chain amino acids L-valine, L-leucine and L-isoleucine, and the aromatic amino acids L-tryptophan, L-tyrosine and L-phenylalanine, were essential amino acids for all 28 strains of L. fermentum was in agreement with a growth study of L. fermentum IFO 3956 (Kuratsu et al., 2010). Furthermore, the metabolism of L-arginine by the enzymes of the ADI pathway has been experimentally shown for L. fermentum IMDO 130101 (Vrancken et al., 2009a, 2009b). This could result in accumulation and eventually export of ornithine, a precursor of the bread crust flavour compound 2-acetyl-1-pyrroline (Hofmann and Schieberle, 1998; Hansen and Schieberle, 2005; Gänzle et al., 2007). Assuming there are no alternative pathways for amino acid biosynthesis, the conditions under which L. fermentum thrives have to provide enough free (essential) amino acids and peptides, which is for instance the case through endogenous flour protease activity in sourdough fermentations (Gänzle et al., 2008).

The absence of biosynthetic genes pointed to the dependence of L. fermentum on (R)-pantothenate, pyridoxal or pyridoxal 5-phosphate, nicotinate or β-nicotinate D-ribonucleotide, thiamine or TPP, 7,8-dihydropteroate or tetrahydrofolate (or at least 4-aminobenzoate), and biotin. However, the absence of D-pantothenate does not affect the growth of heterofermentative LAB species on pentoses (Zaunmüller et al., 2006). In contrast, growth on hexoses demands redox balancing through the production of ethanol from acetyl phosphate, which can be hindered upon limitation of coenzyme A, a D-pantothenate derivative, leading to the necessity of alternative external electron acceptors (Zaunmüller et al., 2006).

The presence of the GH70 OG indicated that some L. fermentum strains could produce glucans from sucrose (glucansucrase) or could act on starch/maltodextrin substrates to produce linear isomalto/malto-polysaccharides (4,6-α-glucanotransferase) or an α-glucan with alternating (α1->3)/(α1->4)-linkages and (α->3,4) branching points (4,3-α-glucanotransferase; Leemhuis et al., 2013, 2014; Gangoiti et al., 2017). Based on the tryptophan residue conserved in glucansucrases, the glycosyl hydrolases from the L. fermentum strains 779_LFER and DSM 20055 are most likely glucansucrases (Vujičić-Žagar et al., 2010; Ito et al., 2011). Except for the L. fermentum NCC2970 4,3-α-glucanotransferase, the remaining L. fermentum glucanotransferase sequences are likely 4,6-α-glucanotransferases, as they contain a QRKN motif that is conserved in these enzymes (Gangoiti et al., 2017). The in vivo role of these glucanotransferases is unknown, although they may play a role in scavenging and modifying oligosaccharides as substrates for the biosynthesis of larger saccharides, inaccessible for other microorganisms (Kralj et al., 2011). However, the 4,6-α-glucanotransferase activity in L. fermentum IMDO 130101, especially when grown in a sourdough environment, needs to be investigated, in particular to elucidate its contribution to texture formation in both doughs and breads, as well as the role of its product as a prebiotic soluble dietary fibre.

The phenolic acid decarboxylase gene present in almost half of the L. fermentum strains, including L. fermentum IMDO 130101, indicated the capability to decarboxylate at least p-coumaric acid, and possibly ferulic and other phenolic acids, as has been shown for L. brevis, L. plantarum, and B. subtilis (Cavin et al., 1997, 1998; Landete et al., 2010; Rodríguez et al., 2010), leading to the production of volatile compounds like 4-vinylphenol and 4-vinylguaiacol. The concentrations of free phenolic acids in wheat and rye flour are low, yet their liberation contributes to the aroma of sourdough and its anti-oxidative effects (Boskov Hansen et al., 2002; Rodríguez et al., 2009, 2010; Shewry et al., 2010). In particular, the capability of microorganisms to convert phenolic acids raises their tolerance to these inhibitory compounds (Gänzle, 2014).

Lactobacilli are known to hydrate unsaturated fatty acids to hydroxy fatty acids (Gänzle, 2014; Chen et al., 2016). Breads made with sourdough fermented by antifungal fatty acid-producing Lactobacillus hammesii are free of molds for a longer time than control breads (Black et al., 2013). Further research could elucidate whether a similar protective effect could also be conferred on breads made with sourdough fermented by L. fermentum IMDO 130101.

5. CONCLUSIONS

Comparative genomics of 28 strains of the LAB species L. fermentum that have been isolated from various sources showed that these strains could be grouped into five clades, the largest of which contained most strains isolated from human hosts, whereas the second largest clade contained almost exclusively strains isolated from plant material-based fermentations. However, the clustering was not straightforward enough to indicate niche specialisation, suggesting a rather free-living lifestyle for this species. The relatively modest extended core genome encoded the genes necessary for the metabolism of several carbohydrates via heterolactic fermentation, showing these traits to be common to possibly all L. fermentum strains. Two of the strains had the genetic features to produce glucans from sucrose, whereas five others had the genetic features to restructure maltodextrins, which may contribute to the sourdough and bread texture. Besides these properties, L. fermentum IMDO 130101 possessed, among others, genes for the utilisation of maltose, arabinose, xylose, and other carbohydrates, as well as the production of acetoin, mannitol from fructose, and L-ornithine from L-arginine, which are all traits beneficial for sourdough fermentation. Additionally, genes encoding a putative starch degradation product import system were unique to this strain and may represent an advantage in a sourdough environment. In particular, these results explain at the genetic level the traits that have previously been demonstrated phenotypically for L. fermentum IMDO 130101. Further, they provide new avenues of research regarding L. fermentum strains isolated from sourdough.

ACKNOWLEDGEMENTS

The authors gratefully acknowledge Dr. ir. Sander Wuyts for helpful discussions regarding data analysis and visualisations.

FUNDING

Part of this research was financed by the Research Council of the Vrije Universiteit Brussel (SRP7 and IOF342 projects) and by the Research Foundation Flanders (FWO-Vlaanderen; project number 1510809N). MV is the recipient of a PhD fellowship of the Research Foundation Flanders (grant numbers 1119916N and 1111918N).

REFERENCES

Bao, Y., Zhang, Y., Zhang, Y., Liu, Y., Wang, S., Dong, X., Wang, Y., Zhang, H., 2010. Screening of potential probiotic properties of Lactobacillus fermentum isolated from traditional dairy products. Food Control. 21, 695-701.

Black, B.A., Zannini, E., Curtis, J.M., Gänzle, M.G., 2013. Antifungal hydroxy fatty acids produced during sourdough fermentation: microbial and enzymatic pathways, and antifungal activity in bread. Appl. Environ. Microbiol. 79, 1866-1873.

Boskov Hansen, H., Andreasen, M.F., Nielsen, M.M., Larsen, L.M., Bach Knudsen, K.E., Meyer, A.S., Christensen, L.P., Hansen, Å., 2002. Changes in dietary fibre, phenolic acids and activity of endogenous enzymes during rye bread-making. Eur. Food. Res. Technol. 214, 33-42.

Buchfink, B., Xie, C., Huson, D.H., 2015. Fast and sensitive protein alignment using DIAMOND. Nat. Methods. 12, 59-60.

Capella-Gutierrez, S., Silla-Martinez, J.M., Gabaldon, T., 2009. TrimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 25, 1972-1973.

Cavin, J.-F., Barthelmebs, L., Diviès, C., 1997. Molecular characterization of an inducible p-coumaric acid decarboxylase from Lactobacillus plantarum: gene cloning, transcriptional analysis, overexpression in Escherichia coli, purification and characterization. Appl. Environ. Microbiol. 63, 1939-1944.

Cavin, J.-F., Dartois, V., Diviès, C., 1998. Gene cloning, transcriptional analysis, purification, and characterization of phenolic acid decarboxylase from Bacillus subtilis. Appl. Environ. Microbiol. 64, 1466-1471.

Chen, Y.Y., Liang, N.Y., Curtis, J.M., Gänzle, M.G., 2016. Characterization of linoleate 10-hydratase of Lactobacillus plantarum and novel antifungal metabolites. Frontiers Microbiol. 7, 1561.

De Vuyst, L., Van Kerrebroeck, S., Harth, H., Huys, G., Daniel, H.-M., Weckx, S., 2014. Microbial ecology of sourdough fermentations: diverse or uniform? Food Microbiol. 37, 11-29.

De Vuyst, L., Van Kerrebroeck, S., Leroy, F., 2017. Microbial ecology and process technology of sourdough fermentation. Adv. Appl. Microbiol. 100, 49-160.

De Vuyst, L., Weckx, S., 2016. The cocoa bean fermentation process: from ecosystem analysis to starter culture development. J. Appl. Microbiol. 121, 5-17.

Dellaglio, F., Torriani, S., Felis, G.E., 2004. Reclassification of Lactobacillus cellobiosus Rogosa et al. 1953 as a later synonym of Lactobacillus fermentum Beijerinck 1901. Int. J. Syst. Evol. Microbiol. 54, 809-812.

Duar, R.M., Lin, X.B., Zheng, J., Martino, M.E., Grenier, T., Pérez-Muñoz, M.E., Leulier, F., Gänzle, M., Walter, J., 2017. Lifestyles in transition: evolution and natural history of the genus Lactobacillus. FEMS Microbiol. Rev. 41, S27-S48.

Edgar, R.C., 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792-1797.

Emms, D.M., Kelly, S., 2015. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol. 16, 157.

Endo, A., Mizuno, H., Okada, S., 2008. Monitoring the bacterial community during fermentation of sunki, an unsalted, fermented vegetable traditional to the Kiso area of Japan. Lett. Appl. Microbiol. 47, 221-226.

Gangoiti, J., van Leeuwen, S., Gerwig, G.J., Duboux, S., Vafiadi, C., Pijning, T., Dijkhuizen, L., 2017. 4,3-α-Glucanotransferase, a novel reaction specificity in glycoside hydrolase family 70 and clan GH-H. Sci. Reports. 7, 39761.

Gänzle, M.G., Vermeulen, N., Vogel, R.F., 2007. Carbohydrate, peptide and lipid metabolism of lactic acid bacteria in sourdough. Food Microbiol. 24, 128-138.

Gänzle, M.G., Loponen, J., Gobbetti, M., 2008. Proteolysis in sourdough fermentations: mechanisms and potential for improved bread quality. Trends Food Sci. Technol. 19, 513-521.

Gänzle, M.G., 2014. Enzymatic and bacterial conversions during sourdough fermentation. Food Microbiol. 37, 2-10.

Gobbetti, M., De Angelis, M., Corsetti, A., Di Cagno, R., 2005. Biochemistry and physiology of sourdough lactic acid bacteria. Trends Food Sci. Technol. 16, 57-69.

Grover, S., Sharma, V.K., Mallapa, R.H., Batish, V.K., 2013. Draft genome sequence of Lactobacillus fermentum Lf1, an Indian isolate of human gut origin. Genome Announc. 1, e00883-13.

Hansen, A., Schieberle, P., 2005. Generation of aroma compounds during sourdough fermentation: applied and fundamental aspects. Trends Food Sci. Technol. 16, 85-94.

Hofmann, T., Schieberle, P., 1998. 2-Oxopropanal, hydroxy-2-propanone, and 1-pyrroline – important intermediates in the generation of the roast-smelling food flavor compounds 2-acetyl-1-pyrroline and 2-acetyltetrahydropyridine. J. Agric. Food Chem. 46, 2270-2277.

Huerta-Cepas, J., Forslund, K., Coelho, L.P., Szklarczyk, D., Jensen, L.J., von Mering, C., Bork, P., 2017. Fast genome-wide functional annotation through orthology assignment by eggNOG-Mapper. Mol. Biol. Evol. 34, 2115-2122.

Ito, K., Ito, S., Shimamura, T., Weyand, S., Kawarasaki, Y., Misaka, T., Abe, K., Kobayashi, T., Cameron, A.D., Iwata, S., 2011. Crystal structure of glucansucrase from the dental caries pathogen Streptococcus mutans. J. Mol. Biol. 2, 177-186.

Jiménez, E., Langa, S., Martín, V., Arroyo, R., Martín, R., Fernández, L., Rodríguez, J.M., 2010. Complete genome sequence of Lactobacillus fermentum CECT 5716, a probiotic strain isolated from human milk. J. Bacteriol. 192, 4800-4800.

Kralj, S., Grijpstra, P., van Leeuwen, S.S., Leemhuis, H., Dobruchowska, J.M., van der Kaaij, R.M., Malik, A., Oetari, A., Kamerling, J.P., Dijkhuizen, L., 2011. 4,6-α-Glucanotransferase, a novel enzyme that structurally and functionally provides an evolutionary link between glycoside hydrolase enzyme families 13 and 70. Appl. Environ. Microbiol. 77, 8154-8163.

Kuratsu, M., Hamano, Y., Dairi, T., 2010. Analysis of the Lactobacillus metabolic pathway. Appl. Environ. Microbiol. 76, 7299-7301.

Landete, J.M., Rodríguez, H., Curiel, J.A., de las Rivas, B., Mancheño, J.M., Muñoz, R., 2010. Gene cloning, expression, and characterization of phenolic acid decarboxylase from Lactobacillus brevis RM84. J. Ind. Microbiol. Biotechnol. 37, 617-624.

Lapierre, P., Gogarten, J.P., 2009. Estimating the size of the bacterial pan-genome. Trends in Genet. 25, 107-110.

Lee, S., You, H.J., Kwon, B., Ko, G., 2017. Complete genome sequence of the plasmid-bearing Lactobacillus fermentum strain SNUV175, a probiotic for women’s health isolated from the vagina of a healthy South Korean woman. Genome Announc. 5, e00045-17.

Leemhuis, H., Dijkman, W.P., Dobruchowska, J.M., Pijning, T., Grijpstra, P., Kralj, S., Kamerling, J.P., Dijkhuizen, L., 2013. 4,6-α-glucanotransferase activity occurs more widespread in Lactobacillus strains and constitutes a separate GH70 subfamily. Appl. Microbiol. Biotechnol. 97, 181-193.

Leemhuis, H., Dobruchowska, J.M., Ebbelaar, M., Faber, F., Buwalda, P.L., van der Maarel, M.J., Kamerling, J.P., Dijkhuizen, L., 2014. Isomalto/malto-polysaccharide, a novel soluble dietary fiber made via enzymatic conversion of starch. J. Agric. Food Chem. 62, 12034-12044.

Lehri, B., Seddon, A.M., Karlyshev, A.V., 2015. Lactobacillus fermentum 3872 genome sequencing reveals plasmid and chromosomal genes potentially involved in a probiotic activity. FEMS Microbiol. Lett. 362, fnv068.

Martín, R., Langa, S., Reviriego, C., Jiménez, E., Marín, M.L., Xaus, J., Fernández, L., Rodríguez, J.M., 2003. Human milk is a source of lactic acid bacteria for the infant gut. J. Pediatr. 143, 754-758.

Morita, H., Toh, H., Fukuda, S., Horikawa, H., Oshima, K., Suzuki, T., Murakami, M., Hisamatsu, S., Kato, Y., Takizawa, T., Fukuoka, H., Yoshimura, T., Itoh, K., O'Sullivan, D.J., McKay, L.L., Ohno, H., Kikuchi, J., Masaoka, T., Hattori, M., 2008. Comparative genome analysis of Lactobacillus reuteri and Lactobacillus fermentum reveal a genomic island for reuterin and cobalamin production. DNA Res. 15, 151-161.

O'Leary, N.A., Wright, M.W., Brister, J.R., Ciufo, S., Haddad, D., McVeigh, R., Rajput, B., Robbertse, B., Smith-White, B., Ako-Adjei, D., Astashyn, A., Badretdin, A., Bao, Y., Blinkova, O., Brover, V., Chetvernin, V., Choi, J., Cox, E., Ermolaeva, O., Farrell, C.M., Goldfarb, T., Gupta, T., Haft, D., Hatcher, E., Hlavina, W., Joardar, V.S., Kodali, V.K., Li, W., Maglott, D., Masterson, P., McGarvey, K.M., Murphy, M.R., O'Neill, K., Pujar, S., Rangwala, S.H., Rausch, D., Riddick, L.D., Schoch, C., Shkeda, A., Storz, S.S., Sun, H., Thibaud-Nissen, F., Tolstoy, I., Tully, R.E., Vatsan, A.R., Wallin, C., Webb, D., Wu, W., Landrum, M.J., Kimchi, A., Tatusova, T., DiCuccio, M., Kitts, P., Murphy, T.D., Pruitt, K.D., 2016. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733-D745.

Oksanen, J., Blanchet, F.G., Kindt, R., Legendre, P., Minchin, P.R., O’Hara, R.B., Simpson, G.L., Solymos, P., Stevens, M.H.H., Wagner, H., 2016. vegan: community ecology package. R package version 2.3-5. https://CRAN.R-project.org/package=vegan.

Ortiz, M., Bleckwedel, J., Raya, R., Mozzi, F., 2013. Biotechnological and in situ production of polyols by lactic acid bacteria. Appl. Microbiol. Biotechnol. 97, 4713-4726.

Paradis, E., Claude, J., Strimmer, K., 2004. APE: analyses of phylogenetics and evolution in R language. Bioinformatics. 20, 289-290.

Price, M.N., Dehal, P.S., Arkin, A.P., 2010. FastTree 2 – Approximately maximum-likelihood trees for large alignments. PLoS ONE. 5, e9490.

R Core Team, 2018. R: a language and environment for statistical computing. Vienna, Austria. URL https://www.R-project.org/.

Ravyts, F., De Vuyst, L., 2011. Prevalence and impact of single-strain starter cultures of lactic acid bacteria on metabolite formation in sourdough. Food Microbiol. 28, 1129-1139.

Rehman, S., Paterson, A., Piggott, J.R., 2006. Flavour in sourdough breads: a review. Trends Food. Sci. Technol. 17, 557-566.

Rodríguez, H., Curiel, J.A., Landete, J.M., de las Rivas, B., López de Felipe, F., Gómez-Cordovés, C., Mancheño, J.M., Muñoz, R., 2009. Food phenolics and lactic acid bacteria. Int. J. Food Microbiol. 132, 79-90.

Rodríguez, H., Angulo, I., de las Rivas, B., Campillo, N., Páez, J.A., Muñoz, R., Mancheño, J.M., 2010. p-Coumaric acid decarboxylase from Lactobacillus plantarum: structural insights into the active site and decarboxylation catalytic mechanism. Proteins. 78, 1662-1676.

Rogosa, M., Wiseman, R.F., Mitchell, J.A., Disraely, M.N., Beaman, A.J., 1953. Species differentiation of oral lactobacilli from man including descriptions of Lactobacillus salivarius nov. spec. and Lactobacillus cellobiosus nov. spec. J. Bacteriol. 65, 681-699.

Saha, B., Racine, F., 2011. Biotechnological production of mannitol and its applications. Appl. Microbiol. Biotechnol. 89, 879-891.

Shewry, P.R., Piironen, V., Lampi, A.-M., Edelmann, M., Kariluoto, S., Nurmi, T., Fernandez-Orozco, R., Andersson, A.A.M., Åman, P., Fraś, A., Boros, D., Gebruers, K., Dornez, E., Courtin, C.M., Delcour, J.A., Ravel, C., Charmet, G., Rakszegi, M., Bedo, Z., Ward, J.L., 2010. Effects of genotype and environment on the content and composition of phytochemicals and dietary fiber components in rye in the HEALTHGRAIN diversity screen. J. Agric. Food Chem. 58, 9372-9383.

Stanstrup, J., 2017. Some code was provided by others on stackoverflow.com as indicated in the individual functions. massageR: massageR. R package version 0.7.2. https://github.com/stanstrup/massageR.

Su, M.S.W., Oh, P.L., Walter, J., Gänzle, M.G., 2012. Intestinal origin of sourdough Lactobacillus reuteri isolates as revealed by phylogenetic, genetic, and physiological analysis. Appl. Environ. Microbiol. 78, 6777-6780.

Sun, Z., Zhang, W., Bilige, M., Zhang, H., 2015. Complete genome sequence of the probiotic Lactobacillus fermentum F-6 isolated from raw milk. J. Biotech. 194, 110-111.

Van der Meulen, R., Scheirlinck, I., Van Schoor, A., Huys, G., Vancanneyt, M., Vandamme, P., De Vuyst, L., 2007. Population dynamics and metabolite target analysis of lactic acid bacteria during laboratory fermentations of wheat and spelt sourdoughs. Appl. Environ. Microbiol. 73, 4741-4750.

Van Kerrebroeck, S., Maes, D., De Vuyst, L., 2017. Sourdoughs as a function of their species diversity and process conditions, a meta-analysis. Trends Food Sci. Technol. 68, 152-159.

Verce, M., De Vuyst, L., Weckx, S., 2018. Complete and annotated genome sequence of the sourdough lactic acid bacterium Lactobacillus fermentum IMDO 130101. Genome Announc. 6, e00256-18.

Vogel, R., Pavlovic, M., Ehrmann, M.A., Wiezer, A., Liesegang, H., Offschanka, S., Voget, S., Angelov, A., Böcker, G., Liebl, W., 2011. Genomic analysis reveals Lactobacillus sanfranciscensis as stable element in traditional sourdoughs. Microb. Cell Fact. 10, S6.

Vrancken, G., Rimaux, T., De Vuyst, L., Leroy, F., 2008. Kinetic analysis of growth and sugar consumption by Lactobacillus fermentum IMDO 130101 reveals adaptation to the acidic sourdough ecosystem. Int. J. Food Microbiol. 128, 58-66.

Vrancken, G., Rimaux, T., Weckx, S., De Vuyst, L., Leroy, F., 2009a. Environmental pH determines citrulline and ornithine release through the arginine deiminase pathway in Lactobacillus fermentum IMDO 130101. Int. J. Food Microbiol. 135, 216-222.

Vrancken, G., Rimaux, T., Wouters, D., Leroy, F., De Vuyst, L., 2009b. The arginine deiminase pathway of Lactobacillus fermentum IMDO 130101 responds to growth under stress conditions of both temperature and salt. Food Microbiol. 26, 720-727.

Vujičić-Žagar, A., Pijning, T., Kralj, S., Lopez, C.A., Eeuwema, W., Dijkhuizen, L., Dijkstra, B.W., 2010. Crystal structure of a 117 kDa glucansucrase fragment provides an insight into evolution and product specificity of GH70 enzymes. Proc. Natl. Acad. Sci. USA. 50, 21406-21411.

Warnes, G.R., Bolker, B., Bonebakker, L., Gentleman, R., Liaw, W.H.A., Lumley, T., Maechler, M., Magnusson, A., Moeller, S., Schwartz, M., Venables, B., 2016. gplots: various R programming tools for plotting data. R package version 3.0.1. https://CRAN.R-project.org/package=gplots.

Weckx, S., Van der Meulen, R., Maes, D., Scheirlinck, I., Huys, G., Vandamme, P., De Vuyst, L., 2007. Lactic acid bacteria community dynamics and metabolite production of rye sourdough fermentations share characteristics of wheat and spelt sourdough fermentations. Food Microbiol. 27, 1000-1008.

Yoon, S.H., Ha, S.M., Lim, J.M., Kwon, S.J., Chun, J., 2017. A large-scale evaluation of algorithms to calculate average nucleotide identity. Antonie van Leeuwenhoek. 110, 1281-1286.

Yu, G., Smith, D., Zhu, H., Guan, Y., Lam, T.T., 2017. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol. Evol. 8, 28-36.

Zaunmüller, T., Eichert, M., Richter, H., Unden, G., 2006. Variations in the energy metabolism of biotechnologically relevant heterofermentative lactic acid bacteria during growth on sugars and organic acids. Appl. Microbiol. Biotechnol. 72, 421-429.

Tables

Table 1. Strains of the lactic acid bacterial species Lactobacillus fermentum, the genomes of which were used in the comparative genomics analysis of the current study. NA – not available.

Strain       Assembly status  Source                    Category  Accession number
IMDO 130101  Complete         Sourdough                 Plant     GCA_900205745.1
IFO 3956     Complete         Fermented plant material  Plant     GCF_000010145.1
CECT 5716    Complete         Human milk                Milk      GCF_000210515.1
F-6          Complete         Raw milk                  Milk      GCF_000397165.1
3872         Complete         Human milk                Milk      GCF_000466785.3
NCC2970      Complete         NA                        NA        GCF_001742205.1
SNUV175      Complete         Human vagina              Human     GCF_001941785.1
LAC FRN-92   Complete         Oral cavity               Human     GCF_002192435.1
MTCC 25067   Complete         Fermented milk            Milk      GCF_002356135.1
FTDC 8312    Complete         Human faeces              Human     GCF_002119645.1
47-7         Chromosome       Healthy infant            Human     GCF_001854105.1
ATCC 14931   Scaffold         Fermented beets           Plant     GCF_000159215.1
28-3-CHN     Scaffold         Urogenital tract          Human     GCF_000162395.1
NB-22        Scaffold         Human vagina              Human     GCF_000496435.1
LfQi6        Scaffold         Human milk                Milk      GCF_000966835.1
Lf1          Contig           Human faeces              Human     GCF_000472265.1
39           Contig           NA                        NA        GCF_001010185.1
90 TC-4      Contig           NA                        NA        GCF_001010245.1
L930BB       Contig           Human colon               Human     GCF_001039735.1
UCO-979C     Contig           Gastric biopsy            Human     GCF_001297905.1
222          Contig           Cocoa bean fermentation   Plant     GCF_001368755.1
SHI-2        Contig           Saliva                    Human     GCF_002591935.1
S6           Contig           Tchapalo                  Plant     GCF_900163585.1
S13          Contig           Tchapalo                  Plant     GCF_900163595.1
779_LFER     Scaffold         Human                     Human     GCF_001077025.1
DSM 20055    Scaffold         Human saliva              Human     GCF_001436835.1
RI-508       Scaffold         Cocoa bean fermentation   Plant     GCF_001982185.1
BFE 6620     Scaffold         Gari                      Plant     GCF_002204495.1

Table 2. Overview of the number of orthogroups (OGs) unique to a clade, along with the functions of those OGs that were present in all members of that particular clade.

Clade    Number of OGs unique to the clade  Functions of OGs unique to the clade that were present in all strains within the clade
Clade 1  17                                 Hypothetical protein; Transcriptional regulator
Clade 2  40                                 Ribose-5-phosphate isomerase; Putative divalent cation transporter
Clade 3  21                                 Hypothetical protein (14 OGs); Sensor histidine kinase; Response regulator; PAP2 family protein; FMN-binding protein; 2-Dehydropantoate 2-reductase; Flippase; Sucrose phosphorylase
Clade 4  14                                 β-Galactosidase (2 OGs or pseudogene); Acetoin reductase; ABC transporter subunit (3 OGs); S-Ribosylhomocysteine lyase; Glutamate dehydrogenase; Malate dehydrogenase; Nicotinamide mononucleotide transporter; ADP-ribose pyrophosphatase; Citrate:sodium symporter; Isochorismatase
Clade 5  284                                Hypothetical protein; Transcriptional regulator
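The two quantities reported per clade in Table 2 can be derived from an orthogroup presence/absence table as in the following Python sketch. The function name, the clade assignment dictionary, and the toy data are hypothetical and serve only to make the definitions concrete; this is not the procedure used in the study.

```python
from typing import Dict, Set, Tuple


def clade_unique_ogs(
    og_by_strain: Dict[str, Set[str]], clade_of: Dict[str, str]
) -> Dict[str, Tuple[Set[str], Set[str]]]:
    """For each clade return (unique, unique_in_all_members).

    unique                : OGs found in at least one member of the clade and
                            in no strain outside the clade
    unique_in_all_members : the subset of `unique` present in every member
    """
    result = {}
    for clade in {clade_of[s] for s in og_by_strain}:
        members = [s for s in og_by_strain if clade_of[s] == clade]
        others = [s for s in og_by_strain if clade_of[s] != clade]
        in_clade = set().union(*(og_by_strain[s] for s in members))
        outside = set().union(*(og_by_strain[s] for s in others))
        unique = in_clade - outside
        in_all = set.intersection(*(og_by_strain[s] for s in members))
        result[clade] = (unique, unique & in_all)
    return result


# Hypothetical toy data: two strains in clade "A", one strain in clade "B".
toy_ogs = {
    "a1": {"OG1", "OG2", "OG3"},
    "a2": {"OG1", "OG2"},
    "b1": {"OG1", "OG4"},
}
toy_clades = {"a1": "A", "a2": "A", "b1": "B"}
summary = clade_unique_ogs(toy_ogs, toy_clades)
```

In the toy data, clade "A" has two unique orthogroups but only one of them occurs in both members, which is why the number of unique OGs in a clade can be much larger than the number of functions listed for it.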

Legends to the figures

Fig. 1. The pangenome (in green) and strict core genome (in yellow) estimates expressed as the number of orthogroups in relation to the number of strains (above). For each number of strains, at most 500 combinations of strains were considered. Each point represents the pangenome or the strict core genome estimate of one combination and the line connects the arithmetic means of the pangenome or strict core genome estimates.
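The estimation described in the legend of Fig. 1 can be sketched in Python as follows. The sketch enumerates all strain combinations when there are at most `max_combinations` of them and otherwise draws distinct random combinations; this sampling strategy, the function name, and the toy data are assumptions made for illustration, as the legend only states that at most 500 combinations were considered.

```python
import random
from itertools import combinations
from math import comb
from statistics import mean
from typing import Dict, Set, Tuple


def accumulation_curve(
    og_by_strain: Dict[str, Set[str]], max_combinations: int = 500, seed: int = 0
) -> Dict[int, Tuple[float, float]]:
    """Map each number of strains k to (mean pangenome size, mean strict core size)."""
    rng = random.Random(seed)
    strains = sorted(og_by_strain)
    curve = {}
    for k in range(1, len(strains) + 1):
        if comb(len(strains), k) <= max_combinations:
            # Few enough combinations: use all of them.
            subsets = list(combinations(strains, k))
        else:
            # Too many combinations: sample distinct ones at random (assumption).
            seen = set()
            while len(seen) < max_combinations:
                seen.add(tuple(sorted(rng.sample(strains, k))))
            subsets = list(seen)
        pan_sizes, core_sizes = [], []
        for subset in subsets:
            sets = [og_by_strain[s] for s in subset]
            pan_sizes.append(len(set.union(*sets)))          # OGs in any strain
            core_sizes.append(len(set.intersection(*sets)))  # OGs in every strain
        curve[k] = (mean(pan_sizes), mean(core_sizes))
    return curve


# Hypothetical toy data with three strains.
toy = {
    "a": {"OG1", "OG2", "OG3"},
    "b": {"OG1", "OG2"},
    "c": {"OG1", "OG4"},
}
curve = accumulation_curve(toy)
```

As more strains are added, the mean pangenome size can only grow or stay equal, whereas the mean strict core size can only shrink or stay equal, which gives the two diverging curves described in the legend.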

747 748

Fig. 2. (A) A rooted phylogenomic tree based on protein sequence alignment from 584 single-copy core genes common to all 28 Lactobacillus fermentum strains examined and the outgroup (Lactobacillus gorillae KZ01) and average nucleotide identities between the genomes. The strains are highlighted according to their isolation source: human host (red), milk (yellow), or plant material (green). (B) Orthogroup (OG) presence/absence clustering dendrogram and heatmap. On the phylogenomic tree, the nodes denoted with black dots had a local support value higher than 0.85; red dots denote nodes with a local support value lower than 0.40; the blue dot denotes the root of the L. fermentum tree. On the OG presence/absence heatmap, OGs are ordered on the x-axis from left to right according to the number of strains they appear in; dark blue denotes OG presence, light blue OG absence.
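
The column ordering of the presence/absence heatmap in Fig. 2B can be sketched as below. The legend states only that OGs are ordered by the number of strains they appear in; the ascending direction, the alphabetical tie-break, and all names in the code are assumptions made for illustration. The clustering dendrogram itself is not covered here.

```python
def ordered_presence_matrix(presence):
    """Build a strains x OGs 0/1 matrix with OG columns sorted by
    the number of strains in which each OG occurs.

    presence: dict mapping strain name -> set of OG identifiers
    """
    strains = sorted(presence)
    all_ogs = set().union(*presence.values())
    # Number of strains that carry each OG.
    counts = {og: sum(og in presence[s] for s in strains) for og in all_ogs}
    # Assumption: ascending order, ties broken alphabetically.
    ogs = sorted(all_ogs, key=lambda og: (counts[og], og))
    matrix = [[1 if og in presence[s] else 0 for og in ogs] for s in strains]
    return strains, ogs, matrix


# Toy data (hypothetical strains and OG identifiers, not from the study).
toy_presence = {
    "A1": {"og1", "og2", "og3"},
    "A2": {"og1", "og2", "og4"},
    "B1": {"og1", "og5"},
}
strains, ogs, matrix = ordered_presence_matrix(toy_presence)
for strain, row in zip(strains, matrix):
    print(strain, row)
```

With this ordering, OGs shared by all strains end up at one edge of the heatmap and strain-specific OGs at the other.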


Fig. 3. Schematic representation of the core carbohydrate metabolism and arginine deiminase pathway of the lactic acid bacterial species Lactobacillus fermentum. Solid black arrows represent reactions catalysed by enzymes encoded in its extended core genome. Grey arrows represent reactions catalysed by enzymes in its accessory genome, shaded according to the number of strains that possess the corresponding gene. Dotted lines represent assumed transport. Dash-dotted lines represent abbreviations in the metabolic pathways. 1, β-galactosidase; 2, galactokinase; 3, UDP-glucose:α-D-galactose-1-phosphate uridylyltransferase; 4, UDP-glucose 4-epimerase; 5, UTP-glucose-1-phosphate uridylyltransferase; 6, phosphoglucomutase; 7, putative 1,4-α-glucosidase; 8, maltose phosphorylase; 9, β-phosphoglucomutase; 10, glucokinase; 11, sucrose-(phosphate) hydrolase; 12, mannitol 2-dehydrogenase; 13, fructokinase/hexokinase; 14, glucose-6-phosphate isomerase; 15, glucose-6-phosphate 1-dehydrogenase; 16, 6-phosphogluconolactonase; 17, phosphogluconate dehydrogenase; 18, ribulose-phosphate 3-epimerase; 19, xylulose-5-phosphate phosphoketolase; 20, acetate kinase; 21, phosphate acetyltransferase; 22, aldehyde/alcohol dehydrogenase; 23, alcohol dehydrogenase; 24, pyruvate dehydrogenase; 25, pyruvate oxidase; 26, acetolactate synthase; 27, α-acetolactate decarboxylase; 28, 2,3-butanediol dehydrogenase; 29, (spontaneous); 30, diacetyl reductase [(S)-acetoin-forming]; 31, L-lactate dehydrogenase; 32, D-lactate dehydrogenase; 33, gluconokinase; 34, ribokinase; 35, ribose-5-phosphate isomerase; 36, xylose isomerase; 37, xylulose kinase; 38, glycerol kinase; 39, glycerol-3-phosphate dehydrogenase; 40, triose phosphate isomerase; 41, malate dehydrogenase (oxaloacetate-decarboxylating); 42, fumarate hydratase; 43, arginine deiminase; 44, ornithine carbamoyltransferase; 45, carbamate kinase; 46, argininosuccinate synthase; 47, argininosuccinate lyase; and 48, putative arginase.


Fig. 4. (A) An unrooted tree based on the amino acid sequence comparison of catalytic cores of glycosyl hydrolase family 70 (GH70) proteins encoded by seven strains of Lactobacillus fermentum (indicated by their strain number only), of 4,6-α-glucanotransferases encoded by Lactobacillus reuteri strains 121, DSM 20015, and ML1, of a reuteransucrase encoded by L. reuteri strain 121, and of a dextransucrase encoded by L. reuteri strain 180. (B) Amino acid sequence alignment of the catalytic cores of the GH70 proteins considered. Amino acid sites are numbered according to the L. fermentum IMDO 130101 sequence in the alignment of catalytic cores (from the amino acids WYRP onwards). Black dots denote conserved sites. The blue dot denotes the location of a tryptophan residue conserved in glucansucrases.
