Root cause investigation of deviations in protein chromatography based on mechanistic models and artificial neural networks

Root cause investigation of deviations in protein chromatography based on mechanistic models and artificial neural networks

Accepted Manuscript Title: Root cause investigation of deviations in protein chromatography based on mechanistic models and artificial neural networks...

5MB Sizes 2 Downloads 18 Views

Accepted Manuscript Title: Root cause investigation of deviations in protein chromatography based on mechanistic models and artificial neural networks Author: Gang Wang Till Briskot Tobias Hahn Pascal Baumann Jurgen ¨ Hubbuch PII: DOI: Reference:

S0021-9673(17)31133-0 http://dx.doi.org/doi:10.1016/j.chroma.2017.07.089 CHROMA 358736

To appear in:

Journal of Chromatography A

Received date: Revised date: Accepted date:

24-3-2017 7-7-2017 28-7-2017

Please cite this article as: Gang Wang, Till Briskot, Tobias Hahn, Pascal Baumann, Jddoturgen Hubbuch, Root cause investigation of deviations in protein chromatography based on mechanistic models and artificial neural networks, (2017), http://dx.doi.org/10.1016/j.chroma.2017.07.089 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

3

Gang Wanga , Till Briskota , Tobias Hahnb , Pascal Baumanna , J¨ urgen Hubbucha,∗ a Karlsruhe

5 6

Institute of Technology (KIT), Institute of Process Engineering in Life Sciences, Section IV: Biomolecular Separation Engineering, Karlsruhe, Germany b GoSilico GmbH, Karlsruhe, Germany

cr

4

ip t

2

Root cause investigation of deviations in protein chromatography based on mechanistic models and artificial neural networks

1

Abstract

8

In protein chromatography, process variations, such as aging of column or process errors, can result

9

in deviations of the product and impurity levels. Consequently, the process performance described

10

by purity, yield, or production rate may decrease. Based on visual inspection of the UV signal,

11

it is hard to identify the source of the error and almost unfeasible to determine the quantity of

12

deviation. The problem becomes even more pronounced, if multiple root causes of the deviation are

13

interconnected and lead to an observable deviation. In the presented work, a novel method based

14

on the combination of mechanistic chromatography models and the artificial neural networks is

15

suggested to solve this problem. In a case study using a model protein mixture, the determination

16

of deviations in column capacity and elution gradient length was shown. Maximal errors of 1.5 % and

17

4.90 % for the prediction of deviation in column capacity and elution gradient length respectively

18

demonstrated the capability of this method for root cause investigation.

19

Keywords: Protein chromatography modeling; Root cause investigation; Artificial neural

20

networks; Ion-exchange chromatography

Ac ce p

te

d

M

an

us

7

∗ Corresponding

author, E-mail address: [email protected], Telephone: +49 721 608 47526

Page 1 of 27 Preprint submitted to Journal of Chromatography A

July 31, 2017

21

1. Introduction Mechanistic models have found extensive applications in protein chromatography as reported [1,

23

2, 3, 4]. It requires little experimental effort and provides mechanistic process understanding, which

24

satisfies the demands of the Quality by Design (QbD) approach promoted by the US Food and Drug

25

Administration (FDA) [5, 6, 7].

ip t

22

Mechanistic models for chromatography consist of partial differential equations describing the

27

convection, diffusion, and adsorption mechanisms in the chromatographic column. For ion-exchange

28

chromatography, a proven approach is given by the combination of the widely accepted general rate

29

model (GRM) covering the mass transfer [8], and the stoichiometric displacement isotherm model

30

(SDM) [9, 10] or its extension, the steric-mass action (SMA) isotherm model [11]. Although these

31

mechanistic models contain a huge amount of information, they has been almost exclusively applied

32

to model-based process development [12, 13, 14, 1, 3] and robustness analysis [15, 16, 17, 18, 19].

an

us

cr

26

Artificial intelligence and machine learning based on artificial neural networks (ANNs) have been

34

among the fastest growing research topics in the last decades. The multilateral applicability of ANN

35

has been demonstrated in numerous fields for solving many complex real-world problems [20, 21].

36

In chromatography, ANNs were thought to be an alternative to first-principle modeling. The back-

37

propagation ANN (BP-ANN) of simple architecture was employed to model retention times in

38

different chromatographic modes and formats [22, 23, 24, 25, 26]. In this context, however, it lacks

39

mechanistic understanding and requires a large amount of data for the learning process that is often

40

not available.

te

d

M

33

Combining both mechanistic and non-mechanistic modeling results in the so-called hybrid mod-

42

els or metamodels which enable entirely new opportunities [4, 27, 28, 29, 30]. ANNs can make

43

accessible hidden unused information contained in mechanistic models; whereas the mechanistic

44

model supports the learning process of ANNs with overwhelming data amounts generated by in

45

silico experimentation. To demonstrate the power of this combined method, root cause investi-

46

gation is considered. Here, an exact mechanistic model which can immediately identify the root

47

causes of the deviation does not exist. Also the conventional ANN approach would fail, because it

48

is unfeasible to artificially generate process variations of any magnitude in the laboratory to gather

49

enough information for its application.

Ac ce p

41

50

In this work, the combination of mechanistic models and ANNs is carried out to identify causes

51

for deviation in chromatograms. The presented case study pays special attention to the ionic

52

capacity for two reasons. First, the ionic capacity of a column can be influenced by day-to-day

53

operations, column aging, or even a column exchange - and its deviation can result in deviation of

54

the elution behavior of product and impurities, leading to reduced purity, yield, or production rate.

55

Secondly, a real-time analytical tool for determination of the ionic capacity is not available, and the

56

current standard method is the time-consuming acid-base titration. The additional parameter salt

Page 2 of 27 2

gradient length was chosen to complicate the case study, since its deviation has a very similar effect

58

on the chromatogram. Of course, the actual values of the salt gradient length can be measured

59

easily using a conductivity sensor. However, the measurement itself does not allow a quantitative

60

explanation of deviating chromatograms.

ip t

57

After the mechanistic model has been calibrated, it is employed for in silico experimentation

62

to generate information about the interrelationships between the chromatograms and the two pa-

63

rameters ionic capacity and salt gradient length. Based on this information, the ANN learns to

64

recognize chromatograms and to respond with the corresponding parameter values. To verify the

65

ANN’s generalization capability, 20 % of the in silico data are used for cross-validation. To demon-

66

strate the practical applicability of this method, experimental errors are imitated by manipulating

67

the parameters in the laboratory.

68

2. Theory

69

2.1. Transport-dispersive model

M

an

us

cr

61

As a basis for generating large data sets for ANN calibration, mechanistic models were employed

71

using the following theoretical background. To model the convection and diffusion inside the chro-

72

matography column of length L and adsorber beads of radius rp , the general rate model (GRM) is

73

used [8]:

∂c(x, t) ∂ 2 c(x, t) + Dax ∂x ∂x2 1 − εb 3 − kf ilm,i (ci (x, t) − cp,i (x, rp , t)) εb rp u(t) = − (ci (0, t) − cin,i (t)) Dax = −u(t)

Ac ce p

∂ci (x, t) ∂t

te

d

70

∂ci (0, t) ∂x ∂ci (L, t) ∂x

∂cp,i (x, r, t) ∂t

0    ∂cp,i (x,r,t) 1−ε 1 ∂  2  r D − εp p ∂qi (x,r,t) 2 p,i  r ∂r ∂r ∂t   k f ilm,i = εp Dp,i (ci (x, t) − cp,i (x, rp , t))     0

(1) (2)

=

(3) for r ∈ (0, rp ), for r = rp ,

(4)

for r = 0.

74

The mass transfer between the pore volume cp,i (x, r, t) and the interstitial volume of the mobile

75

phase ci (x, t) is described by Eq. (1). The peak-broadening effects in the interstitial volume are

76

lumped in the voidage of the bed εb , the axial dispersive coefficient Dax , and the film transfer

77

coefficient kf ilm,i . Eqs. (2) and (3) are the Danckwerts boundary conditions. Eq. (4) describes the

78

exchange between the stationary phase qi and the pore volume concentration cp,i depending on the

79

adsorber particle voidage εp and the component-specific pore diffusion coefficient Dp,i . Here, the

80

less complex linear driving force (LDF) model could also be chosen in spite of large molecules, if

81

the number of transfer units is high and the adsorption isotherm is linear.

Page 3 of 27 3

82

2.2. Stoichiometric displacement isotherm model In the linear region of the adsorption isotherm, the steric hindrance effect of proteins can be

84

neglected, resulting in equivalence of the steric mass action (SMA) [11] and stoichiometric dis-

85

placement isotherm model (SDM). Since the presented work only investigates the linear region, for

86

description of the interaction between protein and ligand on the IEX adsorber surface, SDM in the

87

kinetic formulation is employed [9, 10]: ∂qi (x, r, t) ∂t

=

cr

kkin,i

ip t

83

keq,i qsalt (x, r, t)νi cp,i (x, r, t) − csalt (x, r, t)νi qi (x, r, t)

(5)

The adsorption coefficient kads,i and the desorption coefficient kdes,i are replaced by the equilib-

89

rium coefficient keq,i = kads,i /kdes,i and the kinetic coefficient kkin,i = 1/kdes,i as proposed in [31].

90

The advantage of this formulation is that keq,i strongly affects the retention time, whereas kkin,i

91

has a slight influence on the peak height when the equilibrium is not reached instantly. νi is the

92

characteristic charge of the component i. csalt is the counter-ion concentration in the pore phase.

93

The salt concentration in the stationary phase qsalt is given as: qsalt (x, r, t)

M

an

us

88

=

Λ−

m X

νi qi (x, r, t)

(6)

i=1

d

95

with the ionic capacity of the stationary phase being Λ. Combined with the GRM Eqs. (1)-(4), an IEX chromatography process can be modeled using numerical methods.

97

2.3. Numerical solution

Ac ce p

96

te

94

98

The finite element method is employed for the discretization in space on a grid with equidistant

99

nodes. The fractional step θ-scheme is chosen for the discretization in time [32]. The solution

100

of the non-linear equation system is approximated by Picard iteration [33]. Adaptive simulated

101

annealing [34] and the Levenberg-Marquardt algorithm [35] are employed for parameter estimation.

102

The former is a heuristic algorithm that searches a defined space with random jumps to find the

103

global minimum. Thereafter, the results are refined by applying the latter algorithm which is

104

deterministic.

105

2.4. Artificial neural networks

106

In comparison to first-principles modeling, ANN modeling offers numerous advantages such

107

as the ability to detect possible correlations and unconsidered nonlinear relationships between

108

variables [36]. Its topological structure is shown in Fig. 1. Each node is connected to every node in

109

the next layer. The nodes in the input layer represent the input data and the ones in the output

110

layer return the output data. In the nodes of the hidden layer, the sum of products of biases and

Page 4 of 27 4

111

weights n, and the products of inputs and weights is transferred with a hyperbolic tangent sigmoid

112

activation function to an output signal between −1 and +1 as shown in Eq. (7). 2 1 + e−2n

=

(7)

ip t

f (n)

Here, the nonlinearity in the data can be covered. A linear activation function is employed in the

114

nodes of the output layer to transfer the sum of weighted outputs from the former layer and the

115

weighted biases to the final outputs. By mapping the input data to the correct output data using

116

backpropagation, the difference between the targeted and the actual output values is iteratively

117

minimized by updating the weights in the network [37].

118

3. Materials and methods

119

3.1. Instruments

an

us

cr

113

All chromatographic experiments were carried out using an Ettan liquid chromatography (LC)

121

system equipped with pump unit P-905, dynamic single chamber mixer M-925 (90 µl mixer vol-

122

ume), UV monitor UV-900 (3 mm optical path length), and conductivity cell pH/C-900 (all GE

123

Healthcare, Little Chalfont, Buckinghamshire, UK).

124

3.2. Software

d

M

120

The control software UNICORN 5.31 (GE Healthcare, Little Chalfont, Buckinghamshire, UK)

126

was used in combination with the LC system. The mechanistic model was simulated using the

127

protein chromatography simulation software ChromX (GoSilico, Karlsruhe, Germany). ChromX

128

was used for the numerical simulation of the system of partial differential equations, estimation of

Ac ce p

te

125

130

model parameters, as well as optimization and statistical analysis [38]. The ANN modeling was R conducted in Matlab R2016a (MathWorks, USA).

131

3.3. Adsorber, proteins, and chemicals

129

132

R The strong cation-exchange chromatography (CEX) adsorber medium, TOYOPEARL Giga-

134

Cap S-650M supplied by Tosoh Bioscience (Griesheim, Germany) was used as pre-packed 0.965 mL R Toyoscreen column with dimension 30 mm × 6.4 mm. The adsorber media were stored in 20 %

135

ethanol between the runs. The storage solution was removed by prolonged equilibration with water,

136

and flushed with low-salt and high-salt buffer before experimentation.

133

137

Three proteins of different mass transfer and adsorption behavior according to their different

138

sizes and different isoelectric points were used. A monoclonal antibody (mAb, pI 8.3-8.5, 144-

139

147 kDa) of the IgG isotype was provided by Lek (Ljubljana, Slovenia). Cytochrome c (bovine

140

heart, no. A4612, pI 10.4-10.8, 12.3 kDa) was purchased from Sigma-Aldrich (St. Louis, MO, USA).

141

Lysozyme (hen egg, no. HR7-110, pI 11.0, 14.6 kDa) was purchased from Hampton Research (Aliso

142

Viejo, CA, USA).

Page 5 of 27 5

The experiments were conducted below the isoelectric points of all three proteins. 50 mM

144

sodium phosphate buffer was used at pH 6.5 with additional 0 and 1 M NaCl. 0.5 M NaOH was

145

used for cleaning-in-place. All solutions were prepared with ultra pure water (arium pro UV,

146

Sartorius, G¨ ottingen, Germany), filtrated with a membrane cutoff of 0.22 µm and degassed by

147

sonication. Protein solution with 1 mg/ml of IgG, cytochrome c, and lysozyme each was prepared

148

in corresponding binding buffer and filtrated with a membrane cutoff of 0.22 µm prior to usage.

149

3.4. System characterization

cr

ip t

143

¨ The AKTA system and chromatography column were characterized via tracer pulse injection

151

at a constant flow rate of 0.5 ml/min just like all other chromatography experiments in this work.

152

For determination of the interstitial volume of the column, filtrated 10 g/L blue dextran 2000 kDa

153

(Sigma-Aldrich, St. Louis, MO, USA) solution was used as non-interacting, non-pore-penetrating

154

tracer. The system and total voidage of the column were determined using 25 µL 1 %(v/v) acetone

155

(Merck, Darmstadt, Germany) as a pore-penetrating, non-interacting tracer. For determination

156

of these parameters, the UV signal at 260 nm was recorded. All measurements were corrected

157

with respect to dead volumes. Acid-base titration following Huuk et al. was applied to determine

158

the ionic capacity Λ of GigaCap S-650M [39] in triplicate. The column was first equilibrated for

159

15 CV with 0.5 M HCl to exchange the counter-ion against protons. Subsequently, the columns

160

were washed with ultra-pure water for 20 CV to displace the HCl solution. Finally, the adsorber

161

was titrated with a 0.01 M NaOH solution. The total ionic capacity per adsorber skeleton volume

162

was calculated according to

164

and

Ac ce p

163

te

d

M

an

us

150

,

(8)

nN a+ = VN aOH · cN aOH

(9)

Λ=

nN a+ , Vc · (1 − εt )

(10)

165

with the amount of exchanged sodium ions nN a+ , column volume Vc , total porosity εt , and con-

166

centration of NaOH solution used cN aOH .

167

3.5. Bind-and-elute experiments

168

Protein solution with 1 mg/ml of IgG, cytochrome c, and lysozyme each was prepared in cor-

169

responding binding buffer and injected via a 100 µL loop. From low-salt and high-salt buffers,

170

different steps and gradients were mixed within the LC system. A subset of the chromatograms

171

was used to perform the parameter estimation with the inverse method [31]. The remaining runs

172

were used for model validation and the ANN-based root cause investigation.

Page 6 of 27 6

The linear gradient experiments were conducted by applying the varying gradient lengths 15 CV,

174

20 CV, and 25 CV. After a post-loading wash step of 1 CV equilibration buffer, elution was started

175

with 50 mM and an increasing salt gradient ranging from 50 mM to 400 mM NaCl. After protein

176

elution, the column was stripped over 3 CV at a NaCl concentration of 1.05 M and re-equilibrated

177

for 1 CV binding buffer.

ip t

173

The step elution consisted of three elution steps. After an equilibration step with 2 CV equili-

179

bration buffer and a wash step of 2 CV binding buffer, the salt concentration was raised to 100 mM

180

NaCl and held for 8 CV. Then, the concentration was further increased to 190 mM NaCl and kept

181

constant for 8 CV. Finally, the column was stripped for 3 CV at a concentration of 1.05 M NaCl

182

and re-equilibrated with equilibration buffer.

us

All experiments were carried out at a flow rate of 0.5 ml/min to ensure a constant residence

184

time.

185

3.6. Mechanistic model calibration and validation

an

183

cr

178

Two bind-and-elute chromatograms using linear gradient elution of 20 CV and 25 CV were used

187

for model calibration. Widely accepted correlations suggested in the literature were employed to

188

determine the upper and lower boundaries for subsequent parameter estimation. The penetration

189

theory [40] was chosen to calculate the upper boundary and the correlation suggested by Kataoka

190

and coworker [41] was employed to calculate the lower boundary of the film diffusion coefficient.

191

The correlation based on Guiochon [42], Tyn and Gusek [43], and Mackie and Meares [44] was used

192

to estimate the lower boundary, and the correlation based on Guiochon [42], Tyn and Gusek [43],

193

and Wackao and Smith [45] to estimate the upper boundary of the pore diffusion coefficient. In

194

parameter estimation, the genetic algorithm was employed at first. For fine tuning of the parameter

195

estimates, the Levenberg-Marquardt algorithm [35] was employed. It is important to note, that a

196

small discrepancy between simulated and experimental chromatograms does not necessarily indicate

197

an accurate parameter estimation. Thus, the application of parameter estimation need to be sub-

198

jected to rigorous validation. Here, an uninvolved step-wise gradient bind-and-elute chromatogram

199

was predicted and compared to wet-lab experiment for validation. The confidence intervals at

200

95 % level were calculated based on parameter sensitivities subsequently to assess the reliability of

201

parameter estimates.

202

3.7. In silico screening

Ac ce p

te

d

M

186

203

570 model parameter combinations of salt gradient length and ionic capacity were used to

204

perform in silico experimentation using ChromX. The ranges from 1.0 M to 1.3 M for the ionic

205

capacity and from 12 CV to 28 CV for the salt gradient length were chosen. The salt gradient

206

length and ionic capacity both affect the elution behavior in very similar ways. Visually inspected,

207

a deviation of the ionic capacity or salt gradient length influences the retention time and the peak

Page 7 of 27 7

height. A pronounced problem is given, when multiple root causes are potentially responsible for

209

one observed deviation. The ranges were chosen to cover different scenarios of deviated elution

210

behavior on CEX in a process-relevant parameter window. All parameters and the resulting 570

211

chromatograms were normalized to the minimum and maximum values observed. Since the in silico

212

experimentation generates ideal data, white Gaussian noise was added by a signal-to-noise ratio of

213

60. After that, all normalized UV signals smaller than 0.2 were cut off with the aim to reduce the

214

data amount. This resulted in cheaper computation and was proven to be time-saving for potential

215

real-time application of the ANN.

216

3.8. ANN model calibration and validation

us

cr

ip t

208

219

and the linear activation function purelin were implemented in the nodes of the hidden layer and

220

in the ones of the output layer, respectively. The function divideint using interleaved indices was

221

employed to divide the in silico data into training, validation, and test data sets by a ratio of

222

5:3:2. The network training function trainlm was applied to update weight values for inputs and

223

biases according to an adaptive learning rate and a gradient descent momentum. The learning rate

224

and the momentum were set to 0.05 and 0.9. A root-mean-square error (RMSE) of 1·10−5 and a

225

maximal iteration number of 1000 were set as stopping criteria.

226

3.9. ANN-based root cause investigation

Ac ce p

te

d

M

an

218

A simply constructed BP-ANN model with a hidden layer of nine neurons and an output layer R of two neurons was built in MATLAB . The hyperbolic tangent sigmoid activation function tansig

217

227

Besides the verification of the ANN’s generalization capability with cross-validation using 20 %

228

of the total in silico data sets, a case study with three runs with differently pronounced deviations

229

in their chromatograms was designed to show the practical applicability of the presented root cause

230

investigation method. The calibration experiment with a linear gradient elution of 25 CV and ionic

231

capacity of 1.172 M was chosen to be the set point. Three further bind-and-elute experiments using

232

parameter combinations of 15 CV and 1.172 M, 20 CV and 1.130 M, as well as 25 CV and 1.130 M

233

were carried out in the laboratory. The first experiment represented the deviation of the salt gradient

234

length only, the second the simultaneous deviation of the salt gradient length and ionic capacity,

235

and the third the deviation of the ionic capacity only. Furthermore, they represented the different

236

degrees of deviations in the chromatogram. The chromatograms were fitted with exponentially

237

modified Gaussian distribution [46] in the aim of signal smoothing and peak separation. After

238

normalization, all UV signals smaller than 0.2 were cut off according to the data preparation for

239

ANN model calibration and validation. The separated peaks were entered into the calibrated BP-

240

ANN model which returned the predictions of salt gradient length and ionic capacity for each

241

experiment. Subsequently, the predictions were compared to the settings and the percentage errors

242

were calculated to prove the predictive accuracy.

Page 8 of 27 8

243

4. Results and discussion

244

4.1. System characterization Without a column attached, the dead volume of 280 µl of the LC system was determined by

246

tracer injection. This dead volume was subtracted from all other data generated with the LC. Based

247

on the results of the tracer injections, axial dispersion, bed and particle voidage were calculated.

248

The ionic capacities were calculated based on the acid-base titration in triplicate. The ionic capacity

249

was reduced irreversibly from 1.172 M ± 0.003 M to 1.130 M ± 0.004 M by flushing the column with

250

2.0 M HCl at a flow rate of 0.2 ml/min for 24 h. The results are listed in Table 1.

251

4.2. Mechanistic model calibration and validation

us

cr

ip t

245

To estimate the model parameters kf ilm , Dpore , keq , and ν with the inverse method, two linear

253

gradient chromatography experiments were carried out with gradient lengths of 20 CV and 25

254

CV. First, the genetic algorithm was employed. Thereafter, a Levenberg-Marquardt algorithm was

255

used to find the final optimum. The estimated parameters are given in Table 2. The confidence

256

intervals at 95 % level confirm the reliability of the parameter estimates. Obviously, kkin was not

257

necessary for the description of the system considered, because the equilibria of all components were

258

reached instantly. Figs. 2 (a) and (b) show the experimental and simulated chromatograms used

259

for the model calibration. The weakly binding component mAb, moderately binding component

260

cytochrome c, and the strongly binding component lysozyme are eluted in the salt gradient in

261

successive order. To prove the prediction performance of the calibrated mechanistic model, an

262

additional experiment beyond the calibration space has been conducted as validation data. Fig. 2 (c)

263

shows very good agreement of experimental data and model prediction. Since the reliability of the

264

parameter estimates and the accuracy of the calibration are given, this mechanistic model is assessed

265

to be suitable for in silico data generation.

266

4.3. ANN model calibration and validation

Ac ce p

te

d

M

an

252

267

570 in silico experiments were conducted to screen the design space as shown in Fig. 3 (b). The

268

resulting chromatograms can be seen in Fig. 3 (a). The ranges from 1 M to 1.3 M and from 12 CV

269

to 28 CV were chosen to cover the designed deviations of ionic capacity and salt gradient length in

270

the case study. Thereafter, all normalized UV signals smaller than 0.2 were cut off with the aim

271

to reduce the data containing less relevant case-specific information. The resulting chromatograms

272

can be seen in Fig. 4. This resulted in cheaper computation and proved to be time-saving for

273

potential real-time application of the ANN. For ANN model calibration, backpropagation was used

274

to correlate the results of in silico screening with the corresponding parameter sets. 50 % of the

275

total data were used as training data set for the ANN’s adjustment according to the respective

276

errors. 30 % were used as validation data set to measure the generalization performance, and to

Page 9 of 27 9

stop training when generalization stopped improving. The remaining 20 % were excluded during

278

model calibration and used subsequently as test data for cross-validation. The final correlation

279

achieved for the salt gradient length is given in Figs. 5 (a) - (c) which represent the training data

280

set, validation data set, and test data set, respectively. Figs. 5 (d) - (f) show the correlation for the

281

ionic capacity.

ip t

277

A good agreement between the predictions by the ANN model and the parameter sets given in the

283

in silico screening was observed. The smallest deviations were found to be in the correlation results

284

of training data set (R2 = 0.9997 for salt gradient length, and R2 = 0.9992 for ionic capacity). The

285

deviations found in the correlation results of validation data set were in a statistically acceptable

286

range (R2 = 0.9926 for salt gradient length, and R2 = 0.9822 for ionic capacity). According to the

287

correlation results of the test data set (R2 = 0.9965 for salt gradient length, and R2 = 0.9904 for

288

ionic capacity), the ANN’s generalization was confirmed, verifying its usability for the root cause

289

investigation.

290

4.4. ANN-based root cause investigation

M

an

us

cr

282

Although the ANN’s generalization capability was verified with the cross-validation using 20 %

292

of the total in silico data sets, three runs with strong, moderate, and slight deviations in their

293

chromatograms were generated in the laboratory to prove the practical applicability of the presented

294

method for root cause investigation. The salt gradient length and ionic capacity were adjusted

295

separately and simultaneously to challenge the calibrated ANN model with the pronounced problem,

296

when multiple root causes were potentially responsible for one observed deviation. As can be seen

297

in Table 3, the casual relations in all three deviated runs were determined properly. Overall, the

298

deviations were found in a statistically acceptable range. Errors of approximately 0 % and 4.90 %

299

were given for the prediction of the deviated run 1. The deviated run 2 with both adjusted salt

300

gradient length and ionic capacity was predicted accurately with the small errors of 0.53 % and

301

0.55 %. Errors of 1.5 % and 3.35 % were given for the prediction of the deviated run 3. Compared

302

to the inverse modeling approach proposed by Brestrich and coworkers [47], the presented method

303

is advantageous with respect to the computational expense. Unlike the necessity of performing

304

parameter estimation repeatedly, the calibrated ANN model can be carried out with very little

305

computational expense.

Ac ce p

te

d

291

306

A correlation has been derived by Parente and Wetlaufer originally for determination of the

307

adsorption isotherm model parameters in linear region [48]. Its modified formulation delivered by

308

Shukla and coworkers [49] was employed with retention volumes of the solutes and solved for ionic

309

capacity and gradient length. To compare the performance of this method and the presented one,

310

the calculated results are shown in Tab. 4.

Page 10 of 27 10

311

5. Conclusion This study presents a novel method for the root cause investigation in protein chromatography

313

when an exact mathematical model which can immediately identify the root causes of the deviation

314

is not given and the amount of data available does not allow the direct application of an ANN

315

approach. The fundamental idea was to combine a mechanistic model and an ANN model to

316

benefit from their respective advantages. The learning material for the ANN model was generated

317

by conducting 570 in silico experiments using the calibrated mechanistic model. Based on these

318

data, the ANN model learned the correlations between the appearances of the chromatograms

319

and the parameters of interest. Since only few experiments were required for the calibration of

320

the mechanistic model, the high wet-lab experimental effort needed for the training of the ANN

321

model is circumvented. The generalization performance of the artificial intelligence of a very simple

322

construction was assessed by correlation results of cross-validation. The practical applicability

323

was verified by an experimental case study. Thereby, three experiments were conducted in the

324

laboratory to generate the deviating chromatograms by deliberately causing errors in the parameters

325

salt gradient length, ionic capacity, and both of them. These experiments represented the cases

326

of strong, moderate, and slight deviations in the chromatogram, respectively. This pronounced

327

problem, that multiple root causes could be potentially responsible for one observed deviation, was

328

solved successfully. The ionic capacity that had hitherto been a non-observable parameter during

329

operation, was notably accurate with percentage errors of 0 %, 0.53 %, and 1.50 %.

te

d

M

an

us

cr

ip t

312

With this method, one can identify the root causes of the deviation and determine their mag-

331

nitudes within milliseconds. Thus, it is well-suited for real-time process monitoring and control. It

332

allows an immediate correction of the chromatography process operating condition as a short-term

333

solution. In this way, the economic harm can be reduced drastically, especially when innovative

334

technologies providing high productivity e.g., continuous chromatography, are involved in the man-

335

ufacturing. For future work, the method should be extended by a decision-making strategy to be

336

able to correct the deviated operating conditions to the closest Pareto optimum.

337

6. Acknowledgment

Ac ce p

330

338

This project has received funding from the European Unions Horizon 2020 Research and In-

339

novation Programme under grant agreement No 635557. We kindly thank Lek Pharmaceuticals

340

d.d. (M engˇ es, Slovenia) for providing the mAb, and Johannes Winderl and Matthias R¨ udt for

341

the scientific conversations.

342

The authors declare no conflict of interest.

Page 11 of 27 11

343

References [1] P. Baumann, T. Hahn, J. Hubbuch, High-throughput micro-scale cultivations and chromatog-

345

raphy modeling: Powerful tools for integrated process development, Biotechnol. Bioeng. 112

346

(2015) 2123–2133.

ip t

344

[2] F. Steinebach, M. Angarita, D. J. Karst, M¨ uller-Sp¨ath, M. Morbidelli, Model based adaptive

348

control of a continuous capture process for monoclonal antibodies production, J. Chromatogr.

349

A 1444 (2016) 50 – 56.

cr

347

[3] C. L. Effio, T. Hahn, J. Seiler, S. A. Oelmeier, I. Asen, C. Silberer, L. Villain, J. Hubbuch,

351

Modeling and simulation of anion-exchange membrane chromatography for purification of Sf9

352

insect cell-derived virus-like particles, J. Chromatogr. A 1429 (2016) 142–154.

an

us

350

[4] S. M. Pirrung, L. A. van der Wielen, R. F. Van Beckhoven, E. J. van de Sandt, M. H. Eppink,

354

M. Ottens, Optimization of biopharmaceutical downstream processes supported by mechanistic

355

models and artificial neural networks, Biotechnol. Prog.doi:10.1002/btpr.2435.

356

URL http://dx.doi.org/10.1002/btpr.2435

M

353

[5] S. Chhatre, S. Farid, J. Coffman, P. Bird, A. Newcombe, N. Titchener-Hooker, How imple-

358

mentation of Quality by Design and advances in Biochemical Engineering are enabling efficient

359

bioprocess development and manufacture, J. Chem. Technol. Biot. 86 (2011) 1125–1129.

361

te

[6] A. Hanke, M. Ottens, Purifying biopharmaceuticals: knowledge-based chromatographic pro-

Ac ce p

360

d

357

cess development, Trends Biotechnol. 32 (2014) 210–220.

362

[7] International Conference on Harmonisation of Technical Requirements for Registration of

363

Pharmaceuticals for Human Use, ICH-Endorsed Guide for ICH Q8/Q9/Q10 Implementation,

364

365

366

367

368

369

370

371

372

373

http://www.ich.org/fileadmin/Public_Web_Site/ICH_Products/Guidelines/Quality/ Q8_9_10_QAs/PtC/Quality_IWG_PtCR2_6dec2011.pdf.

[8] H. Schmidt-Traub, M. Schulte, A. Seidel-Morgenstern (Eds.), Preparative Chromatography, 2nd Edition, Wiley, 2012.

[9] A. Velayudhan, C. Horvath, Preparative chromatography of proteins - analysis of the multivalent ion-exchange formalism, J. Chromatogr. 443 (1988) 13 – 29. [10] G. Xindu, F. E. Regnier, Retention model for proteins in reversed-phase liquid chromatography, J. Chromatogr. A 296 (1984) 15 – 30. [11] C. A. Brooks, S. M. Cramer, Steric mass-action ion exchange: Displacement profiles and induced salt gradients, Fluid Phase Equilibr. 38 (1992) 1969–1978.

Page 12 of 27 12

374

375

[12] H. Iyer, S. Tapper, P. Lester, B. Wolk, R. van Reis, Use of the steric mass action model in ion-exchange chromatographic process development, J. Chromatogr. A 832 (1999) 1–9. [13] J. M. Mollerup, T. B. Hansen, S. Kidal, L. Sejergaard, A. Staby, Development, modelling,

377

optimisation and scale-up of chromatographic purification of a therapeutic protein, Fluid Phase

378

Equilibr. 261 (2007) 133–139.

ip t

376

[14] T. C. Huuk, T. Hahn, A. Osberghaus, J. Hubbuch, Model-based integrated optimization and

380

evaluation of a multi-step ion exchange chromatography, Sep. Purif. Technol. 136 (2014) 207–

381

222.

384

385

us

interaction chromatography step, J. Chromatogr. A 1099 (2005) 157–166.

an

383

[15] N. Jakobsson, M. Degerman, B. Nilsson, Optimisation and robustness analysis of a hydrophobic

[16] N. Jakobsson, M. Degerman, E. Stenborg, B. Nilsson, Model based robustness analysis of an ion-exchange chromatography step, J. Chromatogr. A 1138 (2007) 109–119.

M

382

cr

379

[17] M. Degerman, K. Westerberg, B. Nilsson, Determining critical process parameters and process

387

robustness in preparative chromatography a model-based approach, Chem. Eng. Technol. 32

388

(2009) 903–911.

390

[18] M. Degerman, K. Westerberg, B. Nilsson, A model-based approach to determine the design

te

389

d

386

space of preparative chromatography, Chem. Eng. Technol. 32 (2009) 1195–1202. [19] K. Westerberg, E. B. Hansen, M. Degerman, T. B. Hansen, B. Nilsson, Model-based process

392

challenge of an industrial ion-exchange chromatography step, Chem. Eng. Technol. 35 (2012)

393

Ac ce p

391

183–190.

394

[20] A. Krogh, What are artificial neural networks?, Nat. Biotechnol. 26 (2008) 195–197.

395

[21] I. Basheer, M. Hajmeer, Artificial neural networks: fundamentals, computing, design, and

396

application, J. Microbiol. Meth. 43 (1) (2000) 3 – 31.

397

[22] R. Bergs, V. Sanz-Nebot, J. Barbosa, Modelling retention in liquid chromatography as a func-

398

tion of solvent composition and pH of the mobile phase, J. Chromatogr. A 869 (12) (2000) 27

399

– 39.

400

[23] J. E. Madden, N. Avdalovic, P. R. Haddad, J. Havel, Prediction of retention times for anions

401

in linear gradient elution ion chromatography with hydroxide eluents using artificial neural

402

networks, J. Chromatogr. A 910 (1) (2001) 173 – 179.

403

[24] V. K. Gupta, H. Khani, B. Ahmadi-Roudi, S. Mirakhorli, E. Fereyduni, S. Agarwal, Prediction

404

of capillary gas chromatographic retention times of fatty acid methyl esters in human blood

Page 13 of 27 13

405

using MLR, PLS and back-propagation artificial neural networks, Talanta 83 (3) (2011) 1014

406

– 1022. [25] A. Malenovi´c, B. Janˇci´c-Stojanovi´c, N. Kosti´c, D. Ivanovi´c, M. Medenica, Optimization of

408

artificial neural networks for modeling of atorvastatin and its impurities retention in micellar

409

liquid chromatography, Chromatographia 73 (9) (2011) 993–998.

ip t

407

[26] A. A. D‘Archivio, A. Incani, F. Ruggieri, Cross-column prediction of gas-chromatographic

411

retention of polychlorinated biphenyls by artificial neural networks, J. Chromatogr. A 1218

412

(2011) 8679 – 8690.

us

414

[27] D. Nagrath, B. W. Bequette, S. M. Cramer, Evolutionary Operation and Control of Chromatographic Processes, AlChE J. 49 (2003) 82–95.

an

413

cr

410

[28] D. Nagrath, A. Messac, B. W. Bequette, S. M. Cramer, A Hybrid Model Framework for the

416

Optimization of Preparative Chromatographic Processes, Biotechnol. Prog. 20 (2004) 162–178.

417

[29] G. G. Wang, S. Shan, Review of Metamodeling Techniques in Support of Engineering Design

421

422

423

424

425

426

427

428

d

420

[30] J. A. Caballero, I. E. Grossmann, An algorithm for the use of surrogate models in modular flowsheet optimization, AlChE J. 54 (2008) 2633–2650.

te

419

Optimization, J. Mech. Des. 129 (2006) 370–380.

[31] T. Hahn, P. Baumann, T. Huuk, V. Heuveline, J. Hubbuch, UV absorption-based inverse modeling of protein chromatography, Eng. Life Sci. 16 (2016) 99–106.

Ac ce p

418

M

415

[32] R. Glowinski, Finite element methods for incompressible viscous flow. In Numerical Methods for Fluids (Part 3), Elsevier, 2003.

[33] A. Quarteroni, A. Vallii, Numerical approximation of partial differential equations, Springer, 1994.

[34] K. Kaczmarski, D. Antos, Use of simulated annealing for optimization of chromatographic separations, Acta Chromatogr. 17 (2006) 20–45.

429

[35] F. Devernay, C/c++ minpack, http://devernay.free.fr/hacks/cminpack/ (2007).

430

[36] J. V. Tu, Advantages and disadvantages of using artificial neural networks versus logistic

431

regression for predicting medical outcomes, J. Clin. Epidemiol. 49 (11) (1996) 1225 – 1231.

432

[37] X. Du, Q. Yuan, J. Zhao, Y. Li, Comparison of general rate model with a new modelartificial

433

neural network model in describing chromatographic kinetics of solanesol adsorption in packed

434

column by macroporous resins, J. Chromatogr. A 1145 (12) (2007) 165 – 174.

Page 14 of 27 14

435

436

[38] T. Hahn, T. Huuk, V. Heuveline, J. Hubbuch, Simulating and Optimizing Preparative Protein Chromatography with ChromX, J. Chem. Educ. 92 (2015) 1497–1502. [39] T. C. Huuk, T. Briskot, T. Hahn, J. Hubbuch, A versatile noninvasive method for adsorber

438

quantification in batch and column chromatography based on the ionic capacity, Biotechnol.

439

Progr. 32 (2016) 666–677.

ip t

437

[40] R. B. Bird, W. E. Stewart, E. N. Lightfoot, Transport Phenomena, Wiley, 1962.

441

[41] T. Kataoka, H. Yoshida, K. Ueyama, Mass transfer in laminar region between liquid and

445

446

447

448

us

linear Chromatography, Elservier, 2006.

an

444

[42] G. Guiochon, A. Felinger, D. G. Shirazi, A. M. Katti, Fundamentals of Preparative and Non-

[43] M. T. Tyn, T. W. Gusek, Predictions of Diffusion Coefficients of Proteins, Biotechnol. Bioeng. 35 (1990) 327–338.

M

443

packing material surface in the packed bed, J. Chem. Eng. Jpn. 5 (1972) 132–136.

[44] J. S. Mackie, P. Meares, The Diffusion of Electrolytes in a Cation-Exchange Resin Membrane. I. Theoretical, Proceedings A 232 (1955) 498–509.

d

442

cr

440

[45] N. Wakao, J. M. Smith, Diffusion in catalyst pellets, Chem. Eng. Sci. 17 (1962) 825–834.

450

[46] E. Grushka, Characterization of exponentially modified Gaussian peaks in chromatography, Anal. Chem. 44 (11) (1972) 1733–1738.

Ac ce p

451

te

449

452

[47] N. Brestrich, T. Hahn, J. Hubbuch, Application of spectral deconvolution and inverse mechanis-

453

tic modelling as a tool for root cause investigation in protein chromatography, J. Chromatogr.

454

A 1437 (2016) 158 – 167.

455

[48] E. S. Parente, D. B. Wetlaufer, Relationship between isocratic and gradient retention times

456

in the high-performance ion-exchange chromatography of proteins. theory and experiment, J.

457

Chromatogr. A 355 (1986) 29 – 40.

458

[49] A. A. Shukla, S. S. Bae, J. A. Moore, K. A. Barnthouse, , S. M. Cramer, Synthesis and charac-

459

terization of high-affinity, low molecular weight displacers for cation-exchange chromatography,

460

Ind. Eng. Chem. Res. 37 (10) (1998) 4090–4098.

Page 15 of 27 15

ip t cr us an

Table 1: The voidages and axial dispersion are calculated from the retention volume and peak broadening of tracer

M

injections. The ionic capacity 1 is the initial state of the column, whereas the ionic capacity 2 is determined after

GigaCap S-650M εb

0.418

Particle voidage

εp

0.758

Total voidage

εt

0.829

Dax

0.120

Ionic Capacity 1 [M]

Λ

1.172 ± 0.003

Ionic Capacity 2 [M]

Λ

1.130 ± 0.004

Ac ce p

te

Bed voidage

d

acid treatment of the same column.

Axial dispersion [mm2 · s−1 ]

Page 16 of 27 16

ip t cr us an

Table 2: Parameters of the mass transfer model and kinetic isotherm formulation estimated from two bind-and-elute experiments using the inverse

Parameter

M

method.

mAb

Cytochrome c

Lysozyme

7.516·10−3 ± 2.474·10−3

1.899·10−2 ± 2.320·10−2

2.098·10−2 ± 1.871·10−2

Dpore [mm2 · s−1 ]

2.325·10−5 ± 0.315·10−5

6.823·10−5 ± 1.482·10−5

6.304·10−5 ± 0.836·10−5

4.388 ·10−10 ± 0.522·10−10

4.860·10−8 ± 0.227·10−8

2.982·10−5 ± 0.138·10−5

9.720 ± 0.038

10.353 ± 0.018

10.012 ± 0.033

te

ν [-]

Ac ce p

keq [-]

d

kf ilm [mm · s−1 ]

Page 17 of 27

ip t cr us an

Table 3: The ionic capacity and salt gradient length adjusted in the laboratory and predicted with the ANN model, and the percentage error of

M

prediction for three deviated runs.

Ionic capacity [M]

Salt gradient length [CV]

Predicted

Percentage error

Adjusted

Predicted

Percentage error

Deviated run 1

1.172

1.172

0%

15

15.735

4.90 %

Deviated run 2

1.130

1.136

0.53 %

20

20.109

0.55 %

Deviated run 3

1.130

1.147

1.50 %

25

24.128

3.35 %

Ac ce p

te

d

Adjusted

Page 18 of 27

ip t cr us an

Table 4: The ionic capacity and salt gradient length adjusted in the laboratory and calculated with the correlation delivered by Parente and Wet-

M

laufer [48], and the percentage error of calculation for three deviated runs.

Ionic capacity [M]

Salt gradient length [CV]

Calculated

Percentage error

Adjusted

Calculated

Percentage error

Deviated run 1

1.172

1.093

6.76 %

15

15.104

0.65 %

Deviated run 2

1.130

1.309

15.81 %

20

15.003

23.27 %

Deviated run 3

1.130

1.175

3.99 %

25

20.906

15.25 %

Ac ce p

te

d

Adjusted

Page 19 of 27

Figure Captions

462

Figure 1: Topological structure of a simple multi-layer BP-ANN.

463

Figure 2: Plots of UV signals over process run time for bind-and-elute experiments. Dashed lines display the UV signal

464

measured at column outlet and the adjusted salt gradients. Solid lines represent the simulated chromatograms. The

465

elution peaks of the early eluting component mAb, the moderate eluting component cytochrome c, and the late eluting

466

component lysozyme by applying linear salt gradients from 0.05 M to 1.05 M over 20 CV and 25 CV are shown in (a) and

467

(b). (c) represent a step-wise gradient elution from 0.05 M, over 0.1 M and 0.19 M, to 1.05 M.

469

Figure 3: The resulting chromatograms of the in silico screening are displayed in (a). The corresponding design space of the in silico screening is shown in (b): ionic capacity∈ [1, 1.3] M, and salt gradient length ∈ [12, 28] CV.

cr

468

ip t

461

Figure 4: The post-processed results of the in silico screening are shown.

471

Figure 5: Plots of screened parameters versus simulated parameters using ANN model. (a) - (c) represent the training data set,

472

us

470

validation data set, and test data set of the parameter ionic capacity. (d) - (f) represent the salt gradient length. Figure 6: Plots of UV signals over process run time for bind-and-elute experiments. The set point run displayed as solid blue

474

line is identical with Fig. 1 and shown here for comparability. The deviated run 1 as dashed purple line was generated by

475

adjusting the salt gradient length to 15 CV. The deviated run 2 as dashed green line was generated by adjusting both the

476

salt gradient length to 20 CV and the ionic capacity to 1.130 M. The deviated run 3 as dashed orange line was generated

477

by adjusting the ionic capacity to 1.130 M.

Ac ce p

te

d

M

an

473

Page 20 of 27

ip t cr us an M d te Ac ce p

Figure 1:

Page 21 of 27

ip t cr us an M d te Ac ce p

Figure 2:

Page 22 of 27

ip t cr us an M d te Ac ce p

Figure 3:

Page 23 of 27

ip t cr us an M d te Ac ce p

Figure 4:

Page 24 of 27

ip t cr us an M d te Ac ce p

Figure 5:

Page 25 of 27

ip t cr us an M d te Ac ce p

Figure 6:

Page 26 of 27

HIGHLIGHTS (3 to 5) Simulations of potential process deviations based on a mechanistic model were used to construct an artificial neural network model

-

Experimental errors in ionic capacity and salt gradient length were imitated simultaneously to prove the practical capability of this method

-

By this approach, the time-consuming root cause investigation itself could be reduced down to milliseconds

-

This approach could be a building block for real-time root cause investigation in chromatography processes

Ac ce p

te

d

M

an

us

cr

ip t

-

Page 27 of 27