In Silico Prediction of Large-Scale Microbial Production Performance: Constraints for Getting Proper Data-Driven Models


CSBJ-00225; No of Pages 11 Computational and Structural Biotechnology Journal xxx (2018) xxx–xxx

Contents lists available at ScienceDirect

journal homepage: www.elsevier.com/locate/csbj


Julia Zieringer, Ralf Takors ⁎

Institute of Biochemical Engineering, University of Stuttgart, Germany

Article info

Article history: Received 26 March 2018. Received in revised form 11 June 2018. Accepted 12 June 2018. Available online xxxx.

Mini Review

Abstract

Industrial bioreactors range from 10,000 to 700,000 L and characteristically show different zones of substrate availability, dissolved gas concentration and pH, reflecting physical, technical and economic constraints of scale-up. Microbial producers circulate through these zones inside the bioreactor, thereby experiencing frequently changing micro-environmental conditions. The external stimuli induce responses in microbial metabolism and in transcriptional regulation programs. Both may deteriorate the microbial production performance in large scale compared to expectations deduced from ideal, well-mixed lab-scale conditions. Accordingly, predictive tools are needed to quantify large-scale impacts considering bioreactor heterogeneities. The review shows that the time is right to combine simulations of microbial kinetics with calculations of large-scale environmental conditions to predict the bioreactor performance. To this end, basic experimental procedures and computational tools are presented to derive proper microbial models and hydrodynamic conditions, and to link both for bioreactor modeling. Particular emphasis is laid on the identification of gene regulatory networks, as the implementation of such models will surely gain momentum in future studies. © 2018 Zieringer, Takors. Published by Elsevier B.V. on behalf of the Research Network of Computational and Structural Biotechnology. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Keywords: Gene regulatory networks; Scale-down devices; CFD; Compartment models; CFD-based compartment models

Contents

1. Introduction
2. Data-driven Approach
2.1. Experimental Set-Ups Mimicking Large-Scale Heterogeneities
2.2. Experimental Access to Metabolic and Transcriptional Responses
2.3. Experimental Access to Single Cell Analysis
3. Modeling Microbial Growth with Different Granularity
3.1. Identifying Structured and Non-Structured Microbial Models
3.2. Identifying Gene Regulatory Networks (GRNs)
4. Simulating the Cellular Environment with Embedded Growing Cells
4.1. Modeling of Hydrodynamics and Mass Transfer
4.2. Hydrodynamic Modeling Linked to GRN Models
5. Conclusion and Perspectives
Conflict of Interest
Uncited references
References

⁎ Corresponding author. E-mail address: [email protected] (R. Takors).

https://doi.org/10.1016/j.csbj.2018.06.002 2001-0370/© 2018 Zieringer, Takors. Published by Elsevier B.V. on behalf of the Research Network of Computational and Structural Biotechnology. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Please cite this article as: Zieringer J, Takors R, In Silico Prediction of Large-Scale Microbial Production Performance: Constraints for Getting Proper Data-Driven Models, Comput Struct Biotechnol J (2018), https://doi.org/10.1016/j.csbj.2018.06.002

1. Introduction

With the advent of metabolic engineering in the 1990s Bailey [1], the engineers' view on microbes changed. Process optimization no longer considered the extracellular environment (i.e. cultivation conditions) alone, but started to investigate intracellular mechanisms in addition Bailey [1]; Vallino and Stephanopoulos [2]. Since then, intracellular reaction rates have been quantified and models of regulatory processes have been derived, ultimately aiming at identifying targets for further strain and process improvement. Driven to some extent by the observation that cellular engineering always results in multiple and complex systemic responses Bailey [1], and further catalyzed by the avalanche of omics data that became accessible, systems biology and systems metabolic engineering emerged in 2000. In essence, holistic models have been developed that aim to provide as sound and comprehensive a cellular view as possible. The development clearly reflects the general engineering mindset of investigating the whole system by modularization, quantitative analysis, reassembling and studying the interaction of the networked modules. The earliest, simple examples may be given by the Monod growth model Jacob and Monod [3], followed by more sophisticated approaches like the lactose operon considering feedback regulation in Escherichia coli, finally leading to complex models comprising multiple levels of cellular regulation Kitano [4].

While such movements led to the birth of systems biology Westerhoff and Palsson [5] and systems metabolic engineering Lee et al. [6]; Park and Lee [7]; Becker et al. [8]; Wittmann and Lee [9], core engineering activities such as scale-up were a matter of steady development, too. Scale-up is the procedure of transferring lab-scale bioprocesses to production (large-scale) conditions, often covering 7 to 8 orders of magnitude of volume. Unfortunately, loss or even failure of large-scale performance may occur. Detailed know-how is necessary to prevent unwanted production losses. Accordingly, Oosterhuis and Kossen were the first to present a scale-up simulator (1983) for investigating the impact of oxygen gradients on Gluconobacter oxydans Oosterhuis et al. [10]. They further introduced bioreactor compartment models to achieve a coarse spatial resolution of local oxygen transfer rates and to identify micro-aerobic and anaerobic zones Oosterhuis and Kossen [11]. This line of thinking was followed by a series of similar studies Neubauer et al. [12]; Buchholz et al. [13]; Löffler et al. [14,15]; von Wulffen et al. [16] and reached a new level of complexity by linking simulations of hydrodynamics and mass transport with simple metabolic models of Saccharomyces cerevisiae and E. coli Bylund et al. [17]; Lapin et al. [18,19]; Wang et al. [20]; Haringa et al. [21]. Notably, cellular dynamics were modeled by focusing on metabolic dynamics only. This is remarkable as systems biology has already shown that holistic models are able to cover a far broader range of complexity. Scale-up engineers have already pointed out Delvigne et al. [22] that profound know-how is necessary to enable the best knowledge-based scale-up using in silico predictions.

This review addresses the current need for knowledge-based process scale-up by elucidating the putative contributions of modeling. The existing plethora of modeling approaches will be structured with respect to granularity and usefulness to (i) identify and (ii) model key regulatory phenomena and (iii) to link cellular models with predictions of large-scale hydrodynamics. It will be shown that the time is right to approach the challenging goal of in silico predicted large-scale performance of microbial producers.

2. Data-driven Approach

Developing gene regulatory models requires comprehensive data sets that are generated to answer the biological question of interest. This also holds true for elucidating complex metabolic and regulatory responses of producer cells that are exposed to industrial production conditions. One approach to collect representative data is to mimic large-scale conditions and to capture time series of regulatory dynamics as a basis for unraveling dynamic regulatory models. Such approaches usually require rapid sampling experiments that ‘freeze’ metabolic states monitored in scale-down experiments. Examples of experimental procedures are given in the following.

2.1. Experimental Set-Ups Mimicking Large-Scale Heterogeneities

In large-scale production processes micro-environmental inhomogeneities often occur. Insufficient mixing leads to severe axial and horizontal concentration gradients. Producer cells frequently cross these poorly mixed zones, which triggers metabolic and transcriptional responses Takors [23]. Because large-scale experimental data are rarely accessible, experimental scale-up simulators reflecting large-scale conditions are typically applied Delafosse et al. [24]. Pioneering studies were performed by Oosterhuis and Kossen [10] using a two-compartment system comprising two stirred tank reactors (STRs) to investigate the effect of different oxygen levels on the gluconic acid fermentation of Gluconobacter oxydans. Since then, variations of the two-compartment set-up have considered the combination of an STR and a plug flow reactor (PFR). Reviews have been given by Delvigne et al. [22] and Neubauer and Junne [26]. Fig. 1 depicts selected examples of STR-STR and STR-PFR applications. Experimental scale-up simulators do not merely consist of two compartments. Three-compartment approaches have been studied as well. Examples are the STR-STR-STR cascade of Buchholz et al. [13] and the PFR-STR-PFR set-up of Lemoine et al. [28]. Accordingly, more complex scale-up scenarios could be analyzed. Notably, two- and three-compartment scale-up simulators mirror the cellular responses to repeated, frequent stimuli. In contrast, investigations of single perturbations may be a proper tool for deriving distinct stimulus/response correlations, see Fig. 1 for examples. On this basis, explicit metabolic and transcriptional dynamics can be deduced that, when properly superimposed, result in the complex cellular response observed. However, signal transduction is highly networked in the cells, which may cause the cross-interference of multiple stimuli. The coincidence of multiple stimuli in large-scale fermentation is the rule rather than the exception Xu et al. [29]; Egli [30]. Accordingly, multiple stimulus/response studies are likely to gain importance in the future.

2.2. Experimental Access to Metabolic and Transcriptional Responses

Samples taken from the scale-up simulators need to be processed so that metabolic and transcriptional states are ‘frozen’ immediately. Metabolic inactivation and purification can be achieved via several approaches Oldiges et al. [31]; Teleki et al. [32]; Pfizenmaier et al. [33]; Matuszczyk et al. [34] and require individual optimization for the given problem. Blocking intracellular transcription is achieved by sampling into RNA protect kits Löffler et al. [14]. Correctly prepared samples can be treated further to identify metabolic compositions via metabolic profiling or fingerprinting techniques Fernie et al. [35]; Winder et al. [36]; Fiehn [37], protein contents via affinity tags Gygi et al. [38] or mass spectrometry Aebersold and Mann [39], and transcript levels, either applying microarrays or, preferably, next generation sequencing technologies analyzing mRNAs Nagalakshmi et al. [40]; Nookaew et al. [41]; Wang et al. [42]. To reduce the overall sequencing expenses, library preparation usually includes an rRNA depletion or poly-A enrichment step to remove non-coding rRNA. Various methods for RNA Seq analysis are available and have been reviewed recently by Conesa et al. [43]. Regarding modeling, time series of transcripts are particularly important, which requires methods of differential gene expression analysis. Fig. 2 provides an overview of a typical workflow making use of public R packages. Once time series of transcripts are available, modelers may be interested in unraveling gene clusters showing similar transcription dynamics and in integrating the data into dynamic models. Users may be guided by the evaluation reports of Rapaport et al. [49], Hecker et al. [50] and Banf and Rhee [51]. Currently, algorithms such as DESeq2 Love et al. [52] and maSigPro Conesa et al. [53] are often applied.

[Figure 1: matrix of scale-down set-ups. Labels recovered from the figure: panels STR-PFR and STR-STR (Feed, STR 1, STR 2, PFR); fluctuating conditions pH, oxygen, nitrogen, glucose; operation modes continuous, semi-continuous and discontinuous for each compartment, plus a reference; organisms G. oxydans, S. cerevisiae, E. coli, E. coli (lac+), C. glutamicum, C. glutamicum DM 1933, B. subtilis, B. subtilis AS3.]

Fig. 1. Matrix of STR-STR and STR-PFR applications with different fluctuating conditions and operation modes (blue dots). The E. coli strain is the standard strain W3110. Alternative approaches or different operation modes within the same publication are displayed as blue circles with a white filling. The experimental setups are arranged by the year of publication. Investigations with redundant application information are mentioned once according to the most recent paper. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Application examples are given by transcript time series and monitoring of metabolic changes reflecting the stimuli of glucose Löffler et al. [14,15]; Simen et al. [54], nitrogen Brown et al. [55], oxygen von Wulffen et al. [16]; Liu et al. [56] or temperature stress Caspeta et al. [57] of E. coli. Data like this, derived from transcriptome analysis as described in Fig. 2, are the basis of properly validated mathematical models. Transcript analysis even enabled the engineering of robust E. coli strains Michalowski et al. [58] by attenuating the level of the alarmone ppGpp, the inducer of the stringent response regulation program. The new host is able to maintain high glucose uptake rates even under non-growing or slow-growing conditions.

2.3. Experimental Access to Single Cell Analysis

It is well known that microbial populations in bioreactors are heterogeneous rather than homogeneous. A combination of stimuli such as local substrate availabilities, temperature and pH conditions may induce differences in cell cycle status, cell division, growth rates, etc. Müller and Davey [59]. Such subpopulations can be analyzed experimentally, for instance using fully automated real-time flow injection flow cytometry (FI-FCM) Broger et al. [60]; Brognaux et al. [61] or real-time imaging in combination with microfluidic cultivation devices Dusny et al. [62]; Taheri-Araghi and Jun [63]; Westerwalbesloh et al. [64]. Bennett and Hasty [65] reviewed several microfluidic devices which can be used to examine intracellular signaling pathways and the dynamics of gene regulation in bacteria, yeast and higher eukaryotes on a single cell basis. Often, on-line monitoring is combined with microfluidic studies to achieve full resolution of complex interactions. These technologies are expected to yield novel insights and allow the construction of mathematical models that more accurately describe the complex dynamics of gene regulation Bennett and Hasty [65]. Lemoine et al. [66] provided a review of tools for monitoring population heterogeneities on a single cell basis. Today, even single cell transcription analysis using novel sequencing technologies is becoming achievable Bossert et al. [67], which may further increase the quality of mathematical models.

[Figure 2: workflow diagram. Box labels recovered from the figure: Experimental design; Transcriptomic data (.fastq, .fasta); GEO, EBI Database; Reference genome; Mapping; Aligned reads (.bam, .sam); Counting; Read count table (.txt); Filtering, Normalizing; Normalized read count table (.txt); Gene Expression Analysis; Differentially expressed genes; Clustering; Gene Pattern (Up-, Down-, Co-Regulation); Time Series Analysis, Enrichment Analysis; Analysis Tools: DeSeq2, MaSigPro, Limma, EdgeR, ...; Analysis of results; BRENDA, Regulon DB, EcoCyc, SABIO-RK, iTAP, ...; feedback path "If new genes/alignments are discovered".]

Fig. 2. Workflow illustrating the general procedure when analyzing gene expression data. RNA Seq analysis creates FASTA and FASTQ as raw data formats compiled in gene expression repositories (GEO, EBI), followed by SAM or BAM files for aligned reads. The analysis steps are, in general: (1) Mapping the transcriptomic reads onto the reference genome, (2) Counting reads, (3) Filtering low read counts and normalizing counts, (4) Gene Expression Analysis, (5) Clustering, (6) Time Series Analysis and Enrichment Analysis. DESeq2, maSigPro, limma and edgeR are frequently applied R packages for analyzing transcriptomic data, both for differential gene expression and for gene pattern analysis. The resulting dynamic expressions and parameters are stored in databases like BRENDA Scheer et al. [44], Regulon DB Salgado et al. [45], EcoCyc Keseler et al. [46], SABIO-RK Wittig et al. [47] and iTAP Sundararaman et al. [48].
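Steps (3) and (4) of the workflow can be illustrated with a minimal Python sketch. It is not the procedure implemented in the R packages named in the caption. It assumes a plain counts-per-million scaling, a hypothetical minimum-count threshold and invented gene names and counts, purely to show what the read count table looks like after each step.

```python
import math

# Hypothetical read count table: gene -> counts at three sampling times.
raw_counts = {
    "geneA": [100, 200, 400],
    "geneB": [900, 800, 600],
    "geneC": [1, 0, 2],  # barely detected, expected to be filtered out
}

def filter_low_counts(counts, min_total=10):
    """Drop genes whose summed count over all samples is below min_total."""
    return {gene: row for gene, row in counts.items() if sum(row) >= min_total}

def counts_per_million(counts):
    """Scale every sample (column) to a library size of one million reads."""
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {gene: [1e6 * c / lib_sizes[j] for j, c in enumerate(row)]
            for gene, row in counts.items()}

def log2_fold_change(cpm, reference=0, pseudocount=1.0):
    """log2 ratio of each sample against the reference sample (first time point)."""
    return {gene: [math.log2((v + pseudocount) / (row[reference] + pseudocount))
                   for v in row]
            for gene, row in cpm.items()}

filtered = filter_low_counts(raw_counts)
cpm = counts_per_million(filtered)
lfc = log2_fold_change(cpm)

for gene, values in lfc.items():
    print(gene, [round(v, 2) for v in values])
```

The dedicated packages replace the simple scaling used here with their own normalization and statistical testing, so the sketch should only be read as an outline of the data flow.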

3. Modeling Microbial Growth with Different Granularity

Based on properly analyzed experimental data sets, mathematical models can be derived to simulate the microbial behaviour under different conditions with a varying level of detail. Following the well-known classification of Bailey [68], microbial models can be divided into non-structured/structured and non-segregated/segregated approaches. Non-structured/non-segregated approaches represent the simplest growth models, assuming average cells without subcellular detail. Such models are typically applied for bioprocess design. For the sake of simplicity, they are also applied in agent-based modeling for tracking individual cells. The consideration of subpopulations or individual cell properties leads to segregated approaches which, thanks to the improving availability of experimental data, are gaining more and more attention. Structured, non-segregated models are commonly used for implementing the subcellular details of metabolic and transcriptional regulation, compartmentation or signal transduction Nielsen et al. [69]; Tang et al. [70]. They are computationally intensive but represent a powerful tool for predicting detailed cellular responses to extracellular stimuli. The most detailed approaches are structured/segregated models Chassagnole et al. [71], which for example describe the whole glycolysis process with reactions for each enzyme, depending on enzyme affinities and turnover rates. These parameters are more difficult to identify but transferable to other conditions. However, models like this are limited in scale, due to the complexity of the cellular mechanisms and the single cell consideration, which results in a quadratic scaling problem.
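The simplest class named above, a non-structured, non-segregated model, can be sketched as Monod-type batch growth of an average cell. The following Python example is an illustration only: the parameter values (maximum growth rate, half-saturation constant, biomass yield), the initial concentrations and the explicit Euler integration are assumptions chosen for readability and are not taken from the reviewed studies.

```python
def monod_rate(s, mu_max, k_s):
    """Specific growth rate (1/h) at substrate concentration s (g/L)."""
    s = max(s, 0.0)  # guard against tiny negative values from integration
    return mu_max * s / (k_s + s)

def simulate_batch(x0, s0, mu_max=0.5, k_s=0.1, y_xs=0.5, dt=0.01, t_end=24.0):
    """Explicit Euler integration of dX/dt = mu*X and dS/dt = -mu*X/Y_XS.

    x0, s0: initial biomass and substrate concentrations (g/L).
    Returns a list of (time, biomass, substrate) tuples.
    """
    x, s = x0, s0
    trajectory = [(0.0, x, s)]
    n_steps = int(round(t_end / dt))
    for i in range(n_steps):
        growth = monod_rate(s, mu_max, k_s) * x * dt  # biomass formed in this step
        x += growth
        s -= growth / y_xs                            # substrate consumed for it
        trajectory.append(((i + 1) * dt, x, s))
    return trajectory

trajectory = simulate_batch(x0=0.1, s0=10.0)
t_final, x_final, s_final = trajectory[-1]
print(f"t = {t_final:.1f} h, biomass = {x_final:.2f} g/L, substrate = {s_final:.2e} g/L")
```

Because the whole population is represented by one biomass state and no intracellular variables, the model has only three parameters. A structured or segregated variant would add intracellular states or a distribution over cells, at the cost of more parameters to identify.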

3.1. Identifying Structured and Non-Structured Microbial Models

Non-segregated, structured models typically consist of a rigid network structure and a set of rate expressions including sensitive parameters. Knowledge of the network structure, the kinetic equations and the parameters is key to identifying a proper model. Often, such structures are determined following the bottom-up approach, i.e. the statistically sound identification of correlations between the structuring elements based on experimental data. The bottom-up concept can be applied to merge already existing small-scale models into large models Klipp et al. [72]; Guido et al. [73]; Brandon et al. [74]. Alternatively, top-down approaches aim for the identification of model parameters for a given structure. Accordingly, the top-down approach is a powerful tool for deciphering details of pathway interaction with the network, provided that the given structure is correct Chou and Voit [75]; Erickson et al. [76]. Table 1 depicts a comparison of the two approaches, including prediction goals and limitations. Statements hold true irrespective of whether model complexity is limited to metabolic interactions or whether superior regulation levels such as transcriptional or post-translational feedbacks are included.

Table 1. Comparison of bottom-up and top-down approach.

 | Bottom-up | Top-down
Design steps | From single molecule to pattern | From single elements to relations
Model size | Small-scale | Coarse-grained large-scale
Model complexity | Detailed | Global
Prediction goals | Detailed time-scale resolution | Global cellular dynamics
Limitations | Lack of kinetic parameters and in-depth knowledge | Neglect of single reaction steps

3.2. Identifying Gene Regulatory Networks (GRNs)

When cells are exposed to dynamic stimuli, such as the fluctuating micro-environmental changes in large-scale bioreactors, they show short- and long-term physiological responses. Whereas the former are dominated by metabolic interactions, the latter include strategies for microbial adaptation, usually comprising changes of transcriptome and proteome. However, recent findings Löffler et al. [14,15]; von Wulffen et al. [16]; Simen et al. [54] have shown that transcriptional response occurs massively, even during short-term, sub-minute periods. Accordingly, GRN models gain importance even for modeling short-term responses, which explicitly motivates their use. A gene regulatory network links transcription factors to their target genes, thereby creating a dynamic interaction map connecting external stimuli with internal transcriptional and even metabolic responses. Accordingly, GRN models may comprise the signal stimulus, its transduction to the receptor, the transcriptional response and downstream processes such as translation, post-translational modifications of protein and protein degradation. Altogether, these interactions form a very complex regulatory network. Roughly, the plenitude of GRNs may be divided into three representative approaches: continuous models (in this case based on ordinary differential equations) Chassagnole et al. [71]; Bolouri and Davidson [86]; Kremling et al. [87]; Lemuth et al. [88]; Hardiman et al. [89]; Machado et al. [90]; Khodayari and Maranas [91], Boolean models Thomas [92]; Kauffman et al. [93]; Davidich and Bornholdt [94]; Wang et al. [95] and probabilistic models Qian and Elson [96]; Turner et al. [97]; Chandrasekaran and Price [98]; Nieß et al. [99]. These and other methods, such as Petri nets, Bayesian networks or neural networks, have been extensively reviewed by Karlebach and Shamir [100] and Machado et al. [101].

Stochastic models start from the assumption that gene expression should be described by random events, e.g. caused by the shortage of mRNA molecules and transcription factors. Similarly, initiation and elongation factors are scarce and may cause stochastic translation processes. Accordingly, the continuum paradigm, i.e. the assumption of a sufficient, homogeneous availability of each model entity, may be questionable and could be checked using the chemical master equation. In case the number of molecules per cell is too low, stochastic models should be considered Barberis et al. [102]. Pragmatic guidelines have been published by Kremling et al. [103] and Turner et al. [97], identifying molecule numbers of about 100 per cell as the threshold value.

Alternatively, systems of ordinary differential equations (ODEs) can be applied, ignoring stochastic transcription events and assuming a cellular continuum instead. Cellular entities are simulated as continuous time courses. Such models require knowledge about gene regulatory mechanisms to select appropriate rate laws and to identify corresponding rate parameters for parameter estimation. Standardized formats simplify the development process and encourage the automatic construction of kinetic models. However, the prediction quality of the model may deteriorate for cellular states that are not reflected by the experimental data. Checking thermodynamic feasibility for estimated fluxes is strongly encouraged to prevent misleading findings Costa et al. [104].

If only a small amount of data is available, Boolean approaches may be useful to model regulatory networks. As a key feature, Boolean models consider on/off activation and the inhibition of transcription factors and genes. Accordingly, such models are helpful to predict on/off-like gene switching but fail to simulate distinct time series of transcriptional dynamics that occur after frequent stimulations in large-scale bioreactors. Moreover, Mochizuki [105] showed that the high prediction quality and easy handling of Boolean models are limited to small-scale models. Consequently, large-scale GRNs should preferably be composed of ODEs Karlebach and Shamir [100]. A short summary of the different classes of models with their specific advantages and disadvantages is given in Table 2.

Inferring gene regulatory networks from gene expression data remains a challenging task due to the large number of potential interactions, the relatively small number of available measurements and the intrinsic noise often caused by the biological variance which reflects the heterogeneity of the cell population. Despite success with automated model set-up and identification, manual curation of the inferred network interactions can become time-intensive and cumbersome due to the amount of data investigated Margolin et al. [106].
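The contrast between the continuous and the Boolean description can be made concrete with a toy example of a single gene activated by one transcription factor. The Python sketch below is purely illustrative: the Hill-type rate law, all parameter values, the on/off threshold and the pulse-shaped stimulus (standing in for a cell that repeatedly passes a feed zone) are assumptions and do not reproduce any of the cited models.

```python
def hill(tf, K=0.5, n=2):
    """Hill-type activation of transcription by the transcription factor level tf."""
    return tf**n / (K**n + tf**n)

def simulate_ode(stimulus, k_syn=1.0, k_deg=0.2, dt=0.1, steps_per_point=10):
    """Continuous model: dm/dt = k_syn * hill(tf) - k_deg * m (explicit Euler).

    Returns the transcript level m at the end of each stimulus interval.
    """
    m = 0.0
    trace = []
    for tf in stimulus:
        for _ in range(steps_per_point):
            m += dt * (k_syn * hill(tf) - k_deg * m)
        trace.append(m)
    return trace

def simulate_boolean(stimulus, threshold=0.5):
    """Boolean model: the gene is ON exactly when the factor exceeds the threshold."""
    return [tf > threshold for tf in stimulus]

# Hypothetical stimulus: the cell sees the activating condition for three intervals.
stimulus = [0, 0, 1, 1, 1, 0, 0, 0]

ode_trace = simulate_ode(stimulus)
boolean_trace = simulate_boolean(stimulus)

for tf, m, on in zip(stimulus, ode_trace, boolean_trace):
    print(f"stimulus={tf}  transcript={m:5.2f}  boolean={'ON' if on else 'OFF'}")
```

In this toy setting the Boolean trace switches off as soon as the stimulus ends, whereas the ODE trace keeps a decaying transcript level. This is the kind of time-series information that, according to the text, Boolean models cannot provide.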
To achieve high prediction quality, kinetic parameters need to be accurate and sensitive, i.e. parameter variance should be low and parameter sensitivity should be high to enable highly accurate model prediction with the smallest number of parameters necessary. Parameter values may be extracted from experimental data, taken from public databases or adopted from already existing dynamic models and kinetics. Parameters of the GRN model can be deduced from experimental data, usually applying least-squares error estimation. In essence, parameter estimation is an optimization problem which minimizes the weighted squared distance between simulation and experimental observation to obtain the least-squares parameter set. Such approaches can be combined with model discrimination methods Degenring et al. [107]. In case the parameter estimation problem does not have a unique solution, the solution space can be further constrained using thermodynamic laws or by expanding the experimental basis, taking into account other experimental conditions to challenge the applicability of the model Almquist et al. [108]. Parameter estimation methods have been reviewed by Lillacci and Khammash [109]. For example, small-scale regulatory networks of E. coli Chassagnole et al. [71]; Kremling et al. [87]; Hardiman et al. [110] and also large or genome-scale regulatory networks Chandrasekaran and Price [98]; Faria et al. [111]; Ma et al. [112]; Reed et al. [113] have already been published. In general, users should pay attention to the transferability of the models, because reference conditions may differ from the current case, which is likely to cause improper extrapolation of experimental findings. Once a model has been identified, its validity needs to be checked against new data sets that were not used for parameter identification.
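The estimation and validation steps described above can be sketched in a few lines of Python. Everything in this example is an assumption made for illustration: the one-parameter model (first-order transcript decay), the synthetic "measurements" with a fixed perturbation pattern, the constant measurement error used as weight, and the golden-section search that stands in for whatever optimizer a real study would use.

```python
import math

def model(t, k, m0=100.0):
    """Transcript level under first-order decay with rate constant k."""
    return m0 * math.exp(-k * t)

def weighted_sse(k, data):
    """Weighted squared distance between simulation and observation."""
    return sum(((y - model(t, k)) / sigma) ** 2 for t, y, sigma in data)

def golden_section_minimize(f, lo, hi, tol=1e-8):
    """Minimize a unimodal function f on the interval [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2

# Synthetic data: generated with K_TRUE and perturbed by +/- 1 % (hypothetical).
K_TRUE = 0.3
perturbation = [1.01, 0.99, 1.01, 0.99, 1.01, 0.99]
training = [(float(t), model(t, K_TRUE) * f, 2.0)
            for t, f in zip(range(6), perturbation)]
# Held-out observations that are not used for parameter identification.
validation = [(6.0, model(6.0, K_TRUE) * 1.01), (8.0, model(8.0, K_TRUE) * 0.99)]

k_hat = golden_section_minimize(lambda k: weighted_sse(k, training), 0.01, 2.0)
max_rel_err = max(abs(model(t, k_hat) - y) / y for t, y in validation)

print(f"estimated k = {k_hat:.4f}, largest validation error = {100 * max_rel_err:.1f} %")
```

A real GRN model has many coupled parameters, so a multivariate optimizer and an identifiability check would be needed. The split into training and held-out data, however, carries over unchanged.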
When models can successfully simulate such new data, it is a strong indication that the mechanistic principles and assumptions behind the model are sound. If a model fails to pass the validation step, the modeler needs to revise the previous steps of the modeling process. Recent examples of data-driven GRN models are given by Erickson et al. [76], Palsson and Nielson O'Brien et al. (2013); Liu et al. [115]; Thiele et al. [116]; Bordbar et al. [117] and others Chandrasekaran

E

t1:1 t1:2

5

Please cite this article as: Zieringer J, Takors R, In Silico Prediction of Large-Scale Microbial Production Performance: Constraints for Getting Proper Data-Driven Models, Comput Struct Biotechnol J (2018), https://doi.org/10.1016/j.csbj.2018.06.002

315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 Q3 380

6 t2:1 t2:2

J. Zieringer, R. Takors / Computational and Structural Biotechnology Journal xxx (2018) xxx–xxx

Table 2 Summary of advantages, disadvantages and application of the above mentioned methods.

t2:3

Model class

Application

Advantages

Disadvantages

t2:4 t2:5 t2:6 t2:7 t2:8 t2:9

Structured Non-structured Segregated Non-segregated Probabilistic Continuous

Cellular compartmentation Easy to build More representative and informative Easy to build, for a large number of cells Realistic behaviour Large-scale possible

t2:10

Boolean

Systems in transient state Steady-state systems Heterogeneous, individual cell systems Systems with average cell description Randomly distributed events in time Evenly distributed events in time (cellular continuum) Discrete dynamical system

Biological knowledge Only phenomenological cell description Difficult to handle mathematically Average cell description Only for low number of molecules in cell Detailed knowledge about mechanisms and biological parameters Fails to simulate distinct time series

4.1. Modeling of Hydrodynamics and Mass Transfer

406

To describe hydrodynamic turbulence, multiple suggestions have been published, often applying the modified Reynolds-averaged Navier-Stokes equation (RANS) for multiphase systems Ahlstedt and Lahtinen [124]. Other alternative approaches, such as Large Eddy Simulation (LES) and Direct Numerical Simulation (DNS), offer increased accuracy, but require immense computational capacity Hoekstra et al. [125]; Hartmann et al. [126] as displayed in Fig. 3. The Reynolds-averaged Navier-Stokes equations are time-averaged equations of motion for turbulent flows approximating different turbulent scales through fluctuating quantities, an idea first proposed by Reynolds Reynolds [127]. RANS models offer the most economic approach for simulating complex turbulent flows, because turbulences are considered with different levels of complexity. The most common RANS turbulence models are classified with respect to the number of additional transport equations that need to be solved along with the RANS flow equation. Besides, the often used two-equation models, such as the standard k-ϵ, k-ω or Renormalization group k-ϵ models, one-equation models (low-cost RANS models, e.g. the SpalartAllmaras approach) or even zero-equation models which estimate the turbulence viscosity via the mean velocity and the length scale using an empirical formula are available Fluent [128]. Details are given in the review of Rodi Rodi [129]. In addition to the simulation of turbulence, the proper modeling of interactions between different phases (e.g. aqueous media, air bubbles

409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429

C

T

Embedded

E

R R

407 408

with

O

401 402

C

399 400

N

398

U

396 397

Environment

F

405

394 395

Cellular

O

403 404

Large-scale bioreactor conditions need to be calculated, aiming at a spatial resolution of the mass, momentum and energy balances via numerical simulation. In particular, the Navier-Stokes equations (NSE) representing the conservation of momentum, the continuity equation representing the conservation of mass and the energy equation predicting the temperature in the fluid of a multiphase system have to be considered. The Navier-Stokes equations basically describe the motion of viscous fluid flows where the fluids are considered as a continuum rather than a number of colliding particles. Under the typical mixing conditions given, the occurrence of turbulent zones is likely. Turbulence is defined as a state consisting of structures such as eddies which affect molecular diffusion, heat transfer and the mixing behavior.

the

R O

393

388 389

P

4. Simulating Growing Cells

386 387

D

391 392

384 385

and solid cells) is a challenge. Table 3 provides an overview of three common approaches. Another way to predict hydrodynamics is the use of compartment models (CM) Vrábel et al. [130]. Characteristically, the reactor is divided into a subset of spatial parts, each assumed to be ideally mixed, see Fig. 4. Compartment models are much less computationally demanding than CFD simulations and moreover, allow easy implementation of complex reaction schemes. Fluxes between the compartments are often defined by considering global quantities which are not representative of the flow complexity. Moreover, incoming concentrations are instantly ideally mixed in the whole compartment and erratic changes occur, which are not observed in reality Delafosse et al. [24]. Recently, the combination of CFD and CM modeling has been presented to couple the accuracy of hydrodynamic CFD simulations with the simplicity and speed of compartmented modelings Delafosse et al. [24]; Bezzo et al. [131]; Guha et al. [132]; Le Moullec et al. [133]. As shown in Fig. 5, the approach can be applied for describing concentration gradients in industrial-scale bioreactors, calculating the intercompartmental fluxes from CFD velocity fields. Characteristically, turbulent liquid flows are computed by CFD first, followed by the implementation of net mean and turbulent flow rates in the compartment approach. The simplicity of the approach even allows complex genome-scale kinetic models to be used. Likewise for gene regulatory models, fluid flow simulations must be validated based on independent experimental data. However, experimental observation of large-scale hydrodynamics is often lacking, which limits the comparison with the predicted flow patterns.

E

390

and Price [98]; Lerman et al. [118]; Gonçalves et al. [119]; Ma et al. [120]; Klosik et al. [121]; Arrieta-Ortiz et al. [122]. For example, Klipp et al. Klipp et al. [72]; Klipp [123] describes bacterial growth transitions considering the proteome level, the complex interactions of the yeast cell cycle and the prediction of complex regulatory patterns following the mindset of optimized resource allocation in yeasts, respectively. The current developments of metabolism and gene expression (ME) models are in line with the pioneering approach of Chassagnole et al. Chassagnole et al. [71] who published a comprehensive dynamic model of the central metabolism in E. coli.

382 383

430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457

RANS

Model scope

381

Small amount of experimental data (On/Off conditions)

LES

Grid-based CFD Methods

103

DNS

104

105

106

107

108

Computational capacity Fig. 3. Different approaches of turbulence models, regarding the model scope and fields of application, as well as the corresponding computational capacity required. RANS: Reynolds-averaged Navier-Stokes equation, LES: Large Eddy Simulation, DNS: Direct Numerical Simulation.
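The compartment idea, a reactor divided into ideally mixed zones that exchange liquid, can be sketched in a few lines. The example below assumes a hypothetical 30,000 L reactor split into three stacked compartments with symmetric exchange flows, substrate feed into the top compartment, constant biomass, Monod-type uptake and a constant liquid volume. All parameter values and names are illustrative assumptions; in a CFD-based compartment model the exchange flows would instead be derived from computed velocity fields.

```python
def simulate_compartments(volumes, exchange, feed, biomass, q_max, k_s, dt, t_end):
    """Explicit Euler integration of substrate balances in ideally mixed compartments.

    volumes[i]      compartment volume (L)
    exchange[i][j]  symmetric exchange flow between compartments i and j (L/h)
    feed[i]         substrate feed into compartment i (g/h)
    Returns the substrate concentrations (g/L) at t_end.
    """
    n = len(volumes)
    c = [0.0] * n
    for _ in range(int(round(t_end / dt))):
        dcdt = []
        for i in range(n):
            # Net substrate transport into compartment i by liquid exchange (g/h)
            transport = sum(exchange[i][j] * (c[j] - c[i]) for j in range(n))
            # Monod-type volumetric uptake (g/(L h))
            uptake = q_max * c[i] / (k_s + c[i]) * biomass
            dcdt.append((transport + feed[i]) / volumes[i] - uptake)
        c = [ci + dt * di for ci, di in zip(c, dcdt)]
    return c

# Hypothetical set-up: three stacked compartments, feed enters at the top (index 0)
VOLUMES = [10_000.0, 10_000.0, 10_000.0]                 # L
Q = 200_000.0                                            # L/h, assumed exchange flow
EXCHANGE = [[0.0, Q, 0.0], [Q, 0.0, Q], [0.0, Q, 0.0]]
FEED = [150_000.0, 0.0, 0.0]                             # g/h
BIOMASS, Q_MAX, K_S = 20.0, 1.0, 0.05                    # g/L, g/(g h), g/L

conc = simulate_compartments(VOLUMES, EXCHANGE, FEED, BIOMASS, Q_MAX, K_S, dt=1e-4, t_end=1.0)
total_uptake = sum(v * Q_MAX * c / (K_S + c) * BIOMASS for v, c in zip(VOLUMES, conc))
print(conc, total_uptake)
```

Under these assumptions the substrate concentration falls from the feed compartment towards the bottom of the vessel, and at steady state the summed uptake balances the feed. Because each compartment only needs a local concentration, the Monod term could in principle be replaced by a more detailed kinetic model without changing the transport part.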

Table 3 Comparison of CM and CFD model approaches.

Criterion | CFD: Euler-Euler | CFD: Euler-Lagrange | Compartmentation
Prediction purpose | Reactor heterogeneity | Cell-environment interaction | Reactor design
Computational effort | High | High | Low
Level of detail | High | High | Low
Prediction accuracy of flow regime | High | High | Low
Single cell tracking | No | Yes | Yes
Integrable model size | Coarse-grained small-scale | Coarse-grained small-scale | Genome-scale
Amount of particles | High | Low (< 10%) | High

Fig. 4. Compartmentation of a PFR and STR. Each section is homogeneously mixed and represented by a partial differential equation that usually has an accumulation term, a source/sink term due to bacterial growth and death, and convection and diffusion terms (in/out) which describe the environmental conditions. Unlike the PFR, the STR needs finer discretization due to tangential mixing. [Figure panels: PFR with compartments X1, X2, ..., XN along the axial coordinate z; STR with compartments Xj,k. Terms shown in the balances: accumulation ∂Xi/∂t, diffusion DiL ∂²ci/∂z², convection (uL/εL) ∂ci/∂z, and growth/death (μi − μdi)Xi.]

4.2. Hydrodynamic Modeling Linked to GRN Models

The physiological state of microorganisms and its impact on growth and product formation result from a complex interaction between the cells and their environment. Large-scale studies have shown that homogeneous culture conditions are difficult to establish; nevertheless, process engineering and bioreactor design may aim to keep the impact of heterogeneity as low as possible Lara et al. [134]. So far, large-scale simulations have focused almost entirely on the integration of metabolic kinetics. They basically mirror the instantaneous cellular response to environmental changes Haringa et al. [21]; Kuschel et al. [143]; Haringa et al. [144]. However, cells react in a multi-response, multi-layer fashion that also comprises the onset and offset of transcriptional regulation programs. Such responses are triggered in poorly mixed zones and are propagated into well-mixed zones Löffler et al. [14]; Nieß et al. [99]. Initiation and execution may thus be spatially disconnected, which differs fundamentally from the metabolic responses studied so far. To investigate the consequences of environmental heterogeneities, proper modeling frameworks should link local variations with cellular and subcellular kinetics. The tool of choice is CFD simulation, which can link cellular activities with local environments Delvigne et al. [22]; Kelly [135]; Noorman [136]. In the Euler-Euler approach, the liquid phase and the microorganisms are each considered as a continuum, i.e. a continuous system that does not allow abrupt changes Schmalzriedt et al. [137]. However, microorganisms behave individually, so the continuum description is a strong simplification that cannot capture the individual responses of cells. Conventional Euler-Euler approaches for two-phase flow scenarios can be extended by Population Balance Equations (PBEs) with unstructured kinetic growth models Morchain et al. [138]; Heins et al. [139]; Bouguettoucha et al. [140]; Pateraki et al. [141]. PBEs are used to model population adaptation dynamics considering nutrient gradients inside large-scale bioreactors. In general, Euler-Euler approaches in combination with PBEs are suited to model particle (cell) swarms that follow flow patterns in the reactor Wang et al. [142].

However, an inherent limitation of PBEs is that incorporating a detailed kinetic network leads to massive computational effort because high-dimensional distribution functions need to be solved. Additionally, no information at the level of single particles, such as their lifelines and history, can be obtained with this approach. This limitation can be tackled with the Euler-Lagrange approach, which tracks the fate of each particle (cell) individually. The Lagrangian implementation requires detailed metabolic models of the cell, e.g. to describe the transport processes across the cellular membrane via substrate uptake rates and product excretion rates. For simplification, massless cells are often used, which are described by Monod-like black-box models. Such cells are assumed to travel along the flow fields, thereby experiencing substrate gradients. Notably, the cellular environment is typically 'frozen', i.e. fundamental cellular reactions are implemented in the Euler continuum and traveling cells only respond to the given hydrodynamic and concentration gradients Kuschel et al. [143]. Pioneering studies were performed by Lapin et al. [18] and have been elaborated further in many follow-up studies Haringa et al. [21]; Kuschel et al. [143]; Haringa et al. [144]; Westerwalbesloh et al. [145]. Such studies clearly show that cells are subject to repetitive and fast changes, which in turn create heterogeneity within the population. However, the computational effort for the spatial resolution of the conservation equations is high, requiring smart compositions of the computational grid and simplifying assumptions to solve the numerical problem. Recently, Chen et al. [146] used a rather simple CM approach to simulate a syngas fermentation of Clostridium ljungdahlii. They showed that multi-compartment approaches, even if not widely used yet, give good results regarding the interaction of rather complex cells with their environment. Thus, in situations where a simple model structure meets the requirements of the modeling purpose, non-essential details should be avoided since they will unnecessarily prolong the modeling process.

Fig. 5. CFD-based compartment model set-up steps. (1) Construction and meshing of the reactor, (2) CFD simulation of the turbulent flow (velocity components, dissipation rate, turbulent kinetic energy), (3) Definition of reactor compartmentation, (4) Definition of the mean flow rate Fij between cell i and cell j, Fij = Aij (Uk(i) + Uk(j))/2, where Uk(i) is the velocity component and Aij is the area of the face between the two cells, (5) Incorporation of the genome-scale model. Common simulation frameworks: Steps (1) and (2): ANSYS Fluent; Steps (3), (4) and (5): Matlab.
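The Euler-Lagrange idea described in Section 4.2, massless cells with Monod-like uptake that travel through a 'frozen' concentration field, can be sketched as follows. This is a strongly simplified illustration: the one-dimensional substrate profile, the random-walk transport that stands in for a CFD velocity field, and all parameter values and function names are assumptions made for this example only.

```python
import math
import random

def substrate_field(z, c_feed=0.5, decay_length=1.0):
    """Frozen substrate profile (g/L), highest at the feed position z = 0 (assumed shape)."""
    return c_feed * math.exp(-z / decay_length)

def monod_uptake(c, q_max=1.0, k_s=0.05):
    """Monod-type black-box uptake rate (g/(g h))."""
    return q_max * c / (k_s + c)

def track_cell(z0, height, dispersion, dt, n_steps, rng):
    """Follow one massless cell; return its lifeline as (time, position, uptake) tuples."""
    z, lifeline = z0, []
    for step in range(1, n_steps + 1):
        # Random-walk displacement, an assumed stand-in for transport by the flow field
        z += math.sqrt(2.0 * dispersion * dt) * rng.gauss(0.0, 1.0)
        # Reflect the cell at the bottom (z = 0) and top (z = height) of the vessel
        while z < 0.0 or z > height:
            z = -z if z < 0.0 else 2.0 * height - z
        lifeline.append((step * dt, z, monod_uptake(substrate_field(z))))
    return lifeline

rng = random.Random(1)   # fixed seed for reproducibility
HEIGHT = 5.0             # m, hypothetical liquid height
lifelines = [
    track_cell(rng.uniform(0.0, HEIGHT), HEIGHT, dispersion=0.05, dt=0.1, n_steps=600, rng=rng)
    for _ in range(20)
]
mean_uptake = [sum(q for _, _, q in life) / len(life) for life in lifelines]
print(min(mean_uptake), max(mean_uptake))
```

Each lifeline records the uptake rate one cell experiences over time, and the spread of the mean uptake rates across cells gives a first impression of population heterogeneity. The cells do not feed back on the concentration field here, which corresponds to the frozen-environment simplification mentioned in the text.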

5. Conclusion and Perspectives

Understanding cellular behavior under varying conditions requires the development of computational approaches that incorporate gene regulatory models as well as simulations of environmental perturbations, both relying on reliable experimental evidence. On the one hand, owing to efficient large-scale simulations and stimulus/response experiments, experimental findings have revealed a complex organization of regulatory responses in the cell and improved the understanding of several regulatory processes. To further expand this understanding, techniques with single-cell resolution are being developed. Although this development is at its very beginning, it has significant potential regarding reactor design and genetic engineering towards robust strains. On the other hand, numerical gene regulatory models based on ODE systems, or models at the single-molecule level with stochastic algorithms, in combination with hydrodynamic simulations have provided broad and detailed insight into the regulatory mechanisms of microorganisms inside large-scale bioreactors. However, due to a lack of large-scale experimental data, many regulation theories are still based to some extent on empirical observations. To date, hydrodynamic simulations as well as kinetic cellular models are available at different levels of complexity, which allows the computational effort to be matched to the task. It has also been shown that a combination of existing methods is often advantageous, such as CFD-based compartment models, which make it possible to combine genome-scale models with hydrodynamic simulations. Given the variety and good results of cellular and hydrodynamic modeling approaches and the availability of reliable experimental data allowing detailed insight into cellular mechanisms, the time is right to use and combine these methods to predict the large-scale performance of microbial producers. However, the above discussion has highlighted the need for knowledge-based process scale-up by elucidating the putative contributions of modeling. The contribution of numerical simulations also warrants further investigation with in vivo experiments that incorporate large-scale conditions and single-cell resolution. The development towards automated high-resolution processes and the detection of single-cell behavior is a promising trend. This review shows that the basis for predicting the large-scale performance of microbial producers in silico is given. As a result, robust strains, as well as reactor design parameters and optimized cultivation conditions for more efficient processes, can be developed.

Conflict of Interest

None declared.

Uncited references [25,27,77–85,114]

References

[1] Bailey J. Toward a science of metabolic engineering. Science 1991;252(5013):1668–75.
[2] Vallino JJ, Stephanopoulos G. Metabolic flux distributions in Corynebacterium glutamicum during growth and lysine overproduction. Biotechnol Bioeng 1993;41(6):633–46.
[3] Jacob F, Monod J. Genetic regulatory mechanisms in the synthesis of proteins. J Mol Biol 1961;3(3):318–56.
[4] Kitano H. Computational cellular dynamics: a network-physics integral. Nat Rev Mol Cell Biol 2006;7(3):163–4.
[5] Westerhoff HV, Palsson BO. The evolution of molecular biology into systems biology. Nat Biotechnol 2004;22(10):1249–52.
[6] Lee KH, Park JH, Kim TY, Kim HU, Lee SY. Systems metabolic engineering of Escherichia coli for l-threonine production. Mol Syst Biol 2007;3(1):149.
[7] Park JH, Lee SY. Towards systems metabolic engineering of microorganisms for amino acid production. Curr Opin Biotechnol 2008;19(5):454–60.
[8] Becker J, Zelder O, Häfner S, Schröder H, Wittmann C. From zero to hero. Design-based systems metabolic engineering of Corynebacterium glutamicum for l-lysine production. Metab Eng 2011;13(2):159–68.
[9] Wittmann C, Lee SY. Systems metabolic engineering. Springer Science & Business Media; 2012.
[10] Oosterhuis N, Groesbeek N, Olivier A, Kossen N. Scale-down aspects of the gluconic acid fermentation. Biotechnol Lett 1983;5(3):141–6.
[11] Oosterhuis N, Kossen N. Dissolved oxygen concentration profiles in a production-scale bioreactor. Biotechnol Bioeng 1984;26(5):546–50.
[12] Neubauer P, Häggström L, Enfors SO. Influence of substrate oscillations on acetate formation and growth yield in Escherichia coli glucose limited fed-batch cultivations. Biotechnol Bioeng 1995;47(2):139–46.
[13] Buchholz J, Graf M, Freund A, Busche T, Kalinowski J, Blombach B, et al. CO2/HCO3− perturbations of simulated large scale gradients in a scale-down device cause fast transcriptional responses in Corynebacterium glutamicum. Appl Microbiol Biotechnol 2014;98(20):8563–72.
[14] Löffler M, Simen JD, Jäger G, Schäferhoff K, Freund A, Takors R. Engineering E. coli for large-scale production–strategies considering ATP expenses and transcriptional responses. Metab Eng 2016;38:73–85.
[15] Löffler M, Simen JD, Müller J, Jäger G, Laghrami S, Schäferhoff K, et al. Switching between nitrogen and glucose limitation: unraveling transcriptional dynamics in Escherichia coli. J Biotechnol 2017;285:2–12.
[16] von Wulffen J, Ulmer A, Jäger G, Sawodny O, Feuer R. Rapid sampling of Escherichia coli after changing oxygen conditions reveals transcriptional dynamics. Gene 2017;8(3):90.
[17] Bylund F, Collet E, Enfors SO, Larsson G. Substrate gradient formation in the large-scale bioreactor lowers cell yield and increases by-product formation. Bioprocess Eng 1998;18(3):171–80.
[18] Lapin A, Müller D, Reuss M. Dynamic behavior of microbial populations in stirred bioreactors simulated with Euler-Lagrange methods: traveling along the lifelines of single cells. Ind Eng Chem Res 2004;43(16):4647–56.
[19] Lapin A, Schmid J, Reuss M. Modeling the dynamics of E. coli populations in the three-dimensional turbulent field of a stirred-tank bioreactor—a structured–segregated approach. Chem Eng Sci 2006;61(14):4783–97.
[20] Wang G, Tang W, Xia J, Chu J, Noorman H, Gulik WM. Integration of microbial kinetics and fluid dynamics toward model-driven scale-up of industrial bioprocesses. Eng Life Sci 2015;15(1):20–9.
[21] Haringa C, Tang W, Deshmukh AT, Xia J, Reuss M, Heijnen JJ, et al. Euler-Lagrange computational fluid dynamics for (bio) reactor scale down: an analysis of organism lifelines. Eng Life Sci 2016;16(7):652–63.
[22] Delvigne F, Takors R, Mudde R, Gulik W, Noorman H. Bioprocess scale-up/down as integrative enabling technology: from fluid mechanics to systems biology and beyond. Microb Biotechnol 2017;10(5):1267–74.
[23] Takors R. Scale-up of microbial processes: impacts, tools and open questions. J Biotechnol 2012;160(1):3–9.
[24] Delafosse A, Collignon ML, Calvo S, Delvigne F, Crine M, Thonart P, et al. CFD-based compartment model for description of mixing in bioreactors. Chem Eng Sci 2014;106:76–85.
[25] Delvigne F, Destain J, Thonart P. A methodology for the design of scale-down bioreactors by the use of mixing and circulation stochastic models. Biochem Eng J 2006;28(3):256–68.
[26] Neubauer P, Junne S. Scale-down simulators for metabolic analysis of large-scale bioprocesses. Curr Opin Biotechnol 2010;21(1):114–21.
[27] Papagianni M, Mattey M, Kristiansen B. Design of a tubular loop bioreactor for scale-up and scale-down of fermentation processes. Biotechnol Prog 2003;19(5):1498–504.
[28] Lemoine A, Maya Martínez-Iturralde N, Spann R, Neubauer P, Junne S. Response of Corynebacterium glutamicum exposed to oscillating cultivation conditions in a two- and a novel three-compartment scale-down bioreactor. Biotechnol Bioeng 2015;112(6):1220–31.
[29] Xu B, Jahic M, Blomsten G, Enfors SO. Glucose overflow metabolism and mixed-acid fermentation in aerobic large-scale fed-batch processes with Escherichia coli. Appl Microbiol Biotechnol 1999;51(5):564–71.
[30] Egli T. On multiple-nutrient-limited growth of microorganisms, with special reference to dual limitation by carbon and nitrogen substrates. Antonie Van Leeuwenhoek 1991;60(3–4):225–34.
[31] Oldiges M, Lütz S, Pflug S, Schroer K, Stein N, Wiendahl C. Metabolomics: current state and evolving methodologies and tools. Appl Microbiol Biotechnol 2007;76(3):495–511.
[32] Teleki A, Sánchez-Kopper A, Takors R. Alkaline conditions in hydrophilic interaction liquid chromatography for intracellular metabolite quantification using tandem mass spectrometry. Anal Biochem 2015;475:4–13.
[33] Pfizenmaier J, Junghans L, Teleki A, Takors R. Hyperosmotic stimulus study discloses benefits in ATP supply and reveals miRNA/mRNA targets to improve recombinant protein production of CHO cells. Biotechnol J 2016;11(8):1037–47.
[34] Matuszczyk JC, Teleki A, Pfizenmaier J, Takors R. Compartment-specific metabolomics for CHO reveals that ATP pools in mitochondria are much lower than in cytosol. Biotechnol J 2015;10(10):1639–50.
[35] Fernie AR, Trethewey RN, Krotzky AJ, Willmitzer L. Metabolite profiling: from diagnostics to systems biology. Nat Rev Mol Cell Biol 2004;5(9):763.
[36] Winder CL, Dunn WB, Schuler S, Broadhurst D, Jarvis R, Stephens GM, et al. Global metabolic profiling of Escherichia coli cultures: an evaluation of methods for quenching and extraction of intracellular metabolites. Anal Chem 2008;80(8):2939–48.
[37] Fiehn O. Metabolomics, the link between genotypes and phenotypes. Functional Genomics. Springer; 2002. p. 155–71.
[38] Gygi SP, Rist B, Gerber SA, Turecek F, Gelb MH, Aebersold R. Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat Biotechnol 1999;17(10):994.
[39] Aebersold R, Mann M. Mass spectrometry-based proteomics. Nature 2003;422(6928):198.
[40] Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, et al. The transcriptional landscape of the yeast genome defined by RNA sequencing. Science 2008;320(5881):1344–9.
[41] Nookaew I, Papini M, Pornputtapong N, Scalcinati G, Fagerberg L, Uhlén M, et al. A comprehensive comparison of RNA-seq-based transcriptome analysis from reads to differential gene expression and cross-comparison with microarrays: a case study in Saccharomyces cerevisiae. Nucleic Acids Res 2012;40(20):10084–97.
[42] Wang Z, Gerstein M, Snyder M. RNA-seq: a revolutionary tool for transcriptomics. Nat Rev Genet 2009;10(1):57–63.
[43] Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, et al. A survey of best practices for RNA-seq data analysis. Genome Biol 2016;17(1):13.
[44] Scheer M, Grote A, Chang A, Schomburg I, Munaretto C, Rother M, et al. BRENDA, the enzyme information system in 2011. Nucleic Acids Res 2010;39(Suppl_1):D670–6.
[45] Salgado H, Gama-Castro S, Martínez-Antonio A, Díaz-Peredo E, Sánchez-Solano F, Peralta-Gil M, et al. RegulonDB (version 4.0): transcriptional regulation, operon organization and growth conditions in Escherichia coli K-12. Nucleic Acids Res 2004;32(1):D303–6 suppl.
[46] Keseler IM, Mackie A, Peralta-Gil M, Santos-Zavaleta A, Gama-Castro S, Bonavides-Martínez C, et al. EcoCyc: fusing model organism databases with systems biology. Nucleic Acids Res 2013;41(D1):D605–12.


[79] Covert MW, Xiao N, Chen TJ, Karr JR. Integrating metabolic, transcriptional regulatory and signal transduction models in Escherichia coli. Bioinformatics 2008;24(18):2044–50.
[80] Buescher JM, Liebermeister W, Jules M, Uhr M, Muntel J, Botella E, et al. Global network reorganization during dynamic adaptations of Bacillus subtilis metabolism. Science 2012;335(6072):1099–103.
[81] Varma A, Palsson BO. Metabolic flux balancing: basic concepts, scientific and practical use. Nat Biotechnol 1994;12(10):994.
[82] Sauer U. Metabolic networks in motion: 13C-based flux analysis. Mol Syst Biol 2006;2(1):62.
[83] Feist AM, Palsson BØ. The growing scope of applications of genome-scale metabolic reconstructions using Escherichia coli. Nat Biotechnol 2008;26(6):659.
[84] Lima AP, Baixinho V, Machado D, Rocha I. A comparative analysis of dynamic models of the central carbon metabolism of Escherichia coli. IFAC-PapersOnLine 2016;49(26):270–6.
[85] Vasilakou E, Machado D, Theorell A, Rocha I, Nöh K, Oldiges M, et al. Current state and challenges for dynamic metabolic modeling. Curr Opin Microbiol 2016;33:97–104.
[86] Bolouri H, Davidson EH. Modeling transcriptional regulatory networks. BioEssays 2002;24(12):1118–29.
[87] Kremling A, Bettenbrock K, Gilles ED. Analysis of global control of Escherichia coli carbohydrate uptake. BMC Syst Biol 2007;1(1):42.
[88] Lemuth K, Hardiman T, Winter S, Pfeiffer D, Keller M, Lange S, et al. Global transcription and metabolic flux analysis of Escherichia coli in glucose-limited fed-batch cultivations. Appl Environ Microbiol 2008;74(22):7002–15.
[89] Hardiman T, Lemuth K, Siemann-Herzberg M, Reuss M. Dynamic modeling of the central metabolism of E. coli–linking metabolite and regulatory networks. Systems Biology and Biotechnology of Escherichia coli. Springer; 2009. p. 209–35.
[90] Machado D, Costa RS, Ferreira EC, Rocha I, Tidor B. Exploring the gap between dynamic and constraint-based models of metabolism. Metab Eng 2012;14(2):112–9.
[91] Khodayari A, Maranas CD. A genome-scale Escherichia coli kinetic metabolic model k-ecoli457 satisfying flux data for multiple mutant strains. Nat Commun 2016;7.
[92] Thomas R. Boolean formalization of genetic control circuits. J Theor Biol 1973;42(3):563–85.
[93] Kauffman S, Peterson C, Samuelsson B, Troein C. Random Boolean network models and the yeast transcriptional network. Proc Natl Acad Sci 2003;100(25):14796–14799.
[94] Davidich M, Bornholdt S. The transition from differential equations to Boolean networks: a case study in simplifying a regulatory network model. J Theor Biol 2008;255(3):269–77.
[95] Wang RS, Saadatpour A, Albert R. Boolean modeling in systems biology: an overview of methodology and applications. Phys Biol 2012;9(5):055001.
[96] Qian H, Elson EL. Single-molecule enzymology: stochastic Michaelis–Menten kinetics. Biophys Chem 2002;101:565–76.
[97] Turner TE, Schnell S, Burrage K. Stochastic approaches for modeling in vivo reactions. Comput Biol Chem 2004;28(3):165–78.
[98] Chandrasekaran S, Price ND. Probabilistic integrative modeling of genome-scale metabolic and regulatory networks in Escherichia coli and Mycobacterium tuberculosis. Proc Natl Acad Sci 2010;107(41):17845–17850.
[99] Nieß A, Löffler M, Simen JD, Takors R. Repetitive short-term stimuli imposed in poor mixing zones induce long-term adaptation of E. coli cultures in large-scale bioreactors: experimental evidence and mathematical model. Front Microbiol 2017;8:1195.
[100] Karlebach G, Shamir R. Modeling and analysis of gene regulatory networks. Nat Rev Mol Cell Biol 2008;9(10):770–80.
[101] Machado D, Costa RS, Rocha M, Ferreira EC, Tidor B, Rocha I. Modeling formalisms in systems biology. AMB Express 2011;1(1):45.
[102] Barberis M, Beck C, Amoussouvi A, Schreiber G, Diener C, Herrmann A, et al. A low number of SIC1 mRNA molecules ensures a low noise level in cell cycle progression of budding yeast. Mol BioSyst 2011;7(10):2804–12.
[103] Kremling A, Heermann R, Centler F, Jung K, Gilles E. Analysis of two-component signal transduction by mathematical modeling using the KdpD/KdpE system of Escherichia coli. Biosystems 2004;78(1):23–37.
[104] Costa RS, Machado D, Rocha I, Ferreira E. Critical perspective on the consequences of the limited availability of kinetic data in metabolic dynamic modeling. IET Syst Biol 2011;5(3):157–63.
[105] Mochizuki A. An analytical study of the number of steady states in gene regulatory networks. J Theor Biol 2005;236(3):291–310.
[106] Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, et al. ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 2006;7(1):S7.
[107] Degenring D, Froemel C, Dikta G, Takors R. Sensitivity analysis for the reduction of complex metabolism models. J Process Control 2004;14(7):729–45.
[108] Almquist J, Cvijovic M, Hatzimanikatis V, Nielsen J, Jirstrand M. Kinetic models in industrial biotechnology–improving cell factory performance. Metab Eng 2014;24:38–60.
[109] Lillacci G, Khammash M. Parameter estimation and model selection in computational biology. PLoS Comput Biol 2010;6(3):e1000696.
[110] Hardiman T, Lemuth K, Keller MA, Reuss M, Siemann-Herzberg M. Topology of the global regulatory network of carbon limitation in Escherichia coli. J Biotechnol 2007;132(4):359–74.
[111] Faria JP, Overbeek R, Xia F, Rocha M, Rocha I, Henry CS. Genome-scale bacterial transcriptional regulatory networks: reconstruction and integrated analysis with metabolic models. Brief Bioinform 2013;15(4):592–611.
[112] Ma S, Kemmeren P, Gresham D, Statnikov A. De-novo learning of genome-scale regulatory networks in S. cerevisiae. PLoS One 2014;9(9):e106479.




Please cite this article as: Zieringer J, Takors R, In Silico Prediction of Large-Scale Microbial Production Performance: Constraints for Getting Proper Data-Driven Models, Comput Struct Biotechnol J (2018), https://doi.org/10.1016/j.csbj.2018.06.002






