Accepted Manuscript Title: Root cause investigation of deviations in protein chromatography based on mechanistic models and artificial neural networks Author: Gang Wang Till Briskot Tobias Hahn Pascal Baumann Jurgen ¨ Hubbuch PII: DOI: Reference:
S0021-9673(17)31133-0 http://dx.doi.org/doi:10.1016/j.chroma.2017.07.089 CHROMA 358736
To appear in:
Journal of Chromatography A
Received date: Revised date: Accepted date:
24-3-2017 7-7-2017 28-7-2017
Please cite this article as: Gang Wang, Till Briskot, Tobias Hahn, Pascal Baumann, Jddoturgen Hubbuch, Root cause investigation of deviations in protein chromatography based on mechanistic models and artificial neural networks, (2017), http://dx.doi.org/10.1016/j.chroma.2017.07.089 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
3
Gang Wanga , Till Briskota , Tobias Hahnb , Pascal Baumanna , J¨ urgen Hubbucha,∗ a Karlsruhe
5 6
Institute of Technology (KIT), Institute of Process Engineering in Life Sciences, Section IV: Biomolecular Separation Engineering, Karlsruhe, Germany b GoSilico GmbH, Karlsruhe, Germany
cr
4
ip t
2
Root cause investigation of deviations in protein chromatography based on mechanistic models and artificial neural networks
1
Abstract
8
In protein chromatography, process variations, such as aging of column or process errors, can result
9
in deviations of the product and impurity levels. Consequently, the process performance described
10
by purity, yield, or production rate may decrease. Based on visual inspection of the UV signal,
11
it is hard to identify the source of the error and almost unfeasible to determine the quantity of
12
deviation. The problem becomes even more pronounced, if multiple root causes of the deviation are
13
interconnected and lead to an observable deviation. In the presented work, a novel method based
14
on the combination of mechanistic chromatography models and the artificial neural networks is
15
suggested to solve this problem. In a case study using a model protein mixture, the determination
16
of deviations in column capacity and elution gradient length was shown. Maximal errors of 1.5 % and
17
4.90 % for the prediction of deviation in column capacity and elution gradient length respectively
18
demonstrated the capability of this method for root cause investigation.
19
Keywords: Protein chromatography modeling; Root cause investigation; Artificial neural
20
networks; Ion-exchange chromatography
Ac ce p
te
d
M
an
us
7
∗ Corresponding
author, E-mail address:
[email protected], Telephone: +49 721 608 47526
Page 1 of 27 Preprint submitted to Journal of Chromatography A
July 31, 2017
21
1. Introduction Mechanistic models have found extensive applications in protein chromatography as reported [1,
23
2, 3, 4]. It requires little experimental effort and provides mechanistic process understanding, which
24
satisfies the demands of the Quality by Design (QbD) approach promoted by the US Food and Drug
25
Administration (FDA) [5, 6, 7].
ip t
22
Mechanistic models for chromatography consist of partial differential equations describing the
27
convection, diffusion, and adsorption mechanisms in the chromatographic column. For ion-exchange
28
chromatography, a proven approach is given by the combination of the widely accepted general rate
29
model (GRM) covering the mass transfer [8], and the stoichiometric displacement isotherm model
30
(SDM) [9, 10] or its extension, the steric-mass action (SMA) isotherm model [11]. Although these
31
mechanistic models contain a huge amount of information, they has been almost exclusively applied
32
to model-based process development [12, 13, 14, 1, 3] and robustness analysis [15, 16, 17, 18, 19].
an
us
cr
26
Artificial intelligence and machine learning based on artificial neural networks (ANNs) have been
34
among the fastest growing research topics in the last decades. The multilateral applicability of ANN
35
has been demonstrated in numerous fields for solving many complex real-world problems [20, 21].
36
In chromatography, ANNs were thought to be an alternative to first-principle modeling. The back-
37
propagation ANN (BP-ANN) of simple architecture was employed to model retention times in
38
different chromatographic modes and formats [22, 23, 24, 25, 26]. In this context, however, it lacks
39
mechanistic understanding and requires a large amount of data for the learning process that is often
40
not available.
te
d
M
33
Combining both mechanistic and non-mechanistic modeling results in the so-called hybrid mod-
42
els or metamodels which enable entirely new opportunities [4, 27, 28, 29, 30]. ANNs can make
43
accessible hidden unused information contained in mechanistic models; whereas the mechanistic
44
model supports the learning process of ANNs with overwhelming data amounts generated by in
45
silico experimentation. To demonstrate the power of this combined method, root cause investi-
46
gation is considered. Here, an exact mechanistic model which can immediately identify the root
47
causes of the deviation does not exist. Also the conventional ANN approach would fail, because it
48
is unfeasible to artificially generate process variations of any magnitude in the laboratory to gather
49
enough information for its application.
Ac ce p
41
50
In this work, the combination of mechanistic models and ANNs is carried out to identify causes
51
for deviation in chromatograms. The presented case study pays special attention to the ionic
52
capacity for two reasons. First, the ionic capacity of a column can be influenced by day-to-day
53
operations, column aging, or even a column exchange - and its deviation can result in deviation of
54
the elution behavior of product and impurities, leading to reduced purity, yield, or production rate.
55
Secondly, a real-time analytical tool for determination of the ionic capacity is not available, and the
56
current standard method is the time-consuming acid-base titration. The additional parameter salt
Page 2 of 27 2
gradient length was chosen to complicate the case study, since its deviation has a very similar effect
58
on the chromatogram. Of course, the actual values of the salt gradient length can be measured
59
easily using a conductivity sensor. However, the measurement itself does not allow a quantitative
60
explanation of deviating chromatograms.
ip t
57
After the mechanistic model has been calibrated, it is employed for in silico experimentation
62
to generate information about the interrelationships between the chromatograms and the two pa-
63
rameters ionic capacity and salt gradient length. Based on this information, the ANN learns to
64
recognize chromatograms and to respond with the corresponding parameter values. To verify the
65
ANN’s generalization capability, 20 % of the in silico data are used for cross-validation. To demon-
66
strate the practical applicability of this method, experimental errors are imitated by manipulating
67
the parameters in the laboratory.
68
2. Theory
69
2.1. Transport-dispersive model
M
an
us
cr
61
As a basis for generating large data sets for ANN calibration, mechanistic models were employed
71
using the following theoretical background. To model the convection and diffusion inside the chro-
72
matography column of length L and adsorber beads of radius rp , the general rate model (GRM) is
73
used [8]:
∂c(x, t) ∂ 2 c(x, t) + Dax ∂x ∂x2 1 − εb 3 − kf ilm,i (ci (x, t) − cp,i (x, rp , t)) εb rp u(t) = − (ci (0, t) − cin,i (t)) Dax = −u(t)
Ac ce p
∂ci (x, t) ∂t
te
d
70
∂ci (0, t) ∂x ∂ci (L, t) ∂x
∂cp,i (x, r, t) ∂t
0 ∂cp,i (x,r,t) 1−ε 1 ∂ 2 r D − εp p ∂qi (x,r,t) 2 p,i r ∂r ∂r ∂t k f ilm,i = εp Dp,i (ci (x, t) − cp,i (x, rp , t)) 0
(1) (2)
=
(3) for r ∈ (0, rp ), for r = rp ,
(4)
for r = 0.
74
The mass transfer between the pore volume cp,i (x, r, t) and the interstitial volume of the mobile
75
phase ci (x, t) is described by Eq. (1). The peak-broadening effects in the interstitial volume are
76
lumped in the voidage of the bed εb , the axial dispersive coefficient Dax , and the film transfer
77
coefficient kf ilm,i . Eqs. (2) and (3) are the Danckwerts boundary conditions. Eq. (4) describes the
78
exchange between the stationary phase qi and the pore volume concentration cp,i depending on the
79
adsorber particle voidage εp and the component-specific pore diffusion coefficient Dp,i . Here, the
80
less complex linear driving force (LDF) model could also be chosen in spite of large molecules, if
81
the number of transfer units is high and the adsorption isotherm is linear.
Page 3 of 27 3
82
2.2. Stoichiometric displacement isotherm model In the linear region of the adsorption isotherm, the steric hindrance effect of proteins can be
84
neglected, resulting in equivalence of the steric mass action (SMA) [11] and stoichiometric dis-
85
placement isotherm model (SDM). Since the presented work only investigates the linear region, for
86
description of the interaction between protein and ligand on the IEX adsorber surface, SDM in the
87
kinetic formulation is employed [9, 10]: ∂qi (x, r, t) ∂t
=
cr
kkin,i
ip t
83
keq,i qsalt (x, r, t)νi cp,i (x, r, t) − csalt (x, r, t)νi qi (x, r, t)
(5)
The adsorption coefficient kads,i and the desorption coefficient kdes,i are replaced by the equilib-
89
rium coefficient keq,i = kads,i /kdes,i and the kinetic coefficient kkin,i = 1/kdes,i as proposed in [31].
90
The advantage of this formulation is that keq,i strongly affects the retention time, whereas kkin,i
91
has a slight influence on the peak height when the equilibrium is not reached instantly. νi is the
92
characteristic charge of the component i. csalt is the counter-ion concentration in the pore phase.
93
The salt concentration in the stationary phase qsalt is given as: qsalt (x, r, t)
M
an
us
88
=
Λ−
m X
νi qi (x, r, t)
(6)
i=1
d
95
with the ionic capacity of the stationary phase being Λ. Combined with the GRM Eqs. (1)-(4), an IEX chromatography process can be modeled using numerical methods.
97
2.3. Numerical solution
Ac ce p
96
te
94
98
The finite element method is employed for the discretization in space on a grid with equidistant
99
nodes. The fractional step θ-scheme is chosen for the discretization in time [32]. The solution
100
of the non-linear equation system is approximated by Picard iteration [33]. Adaptive simulated
101
annealing [34] and the Levenberg-Marquardt algorithm [35] are employed for parameter estimation.
102
The former is a heuristic algorithm that searches a defined space with random jumps to find the
103
global minimum. Thereafter, the results are refined by applying the latter algorithm which is
104
deterministic.
105
2.4. Artificial neural networks
106
In comparison to first-principles modeling, ANN modeling offers numerous advantages such
107
as the ability to detect possible correlations and unconsidered nonlinear relationships between
108
variables [36]. Its topological structure is shown in Fig. 1. Each node is connected to every node in
109
the next layer. The nodes in the input layer represent the input data and the ones in the output
110
layer return the output data. In the nodes of the hidden layer, the sum of products of biases and
Page 4 of 27 4
111
weights n, and the products of inputs and weights is transferred with a hyperbolic tangent sigmoid
112
activation function to an output signal between −1 and +1 as shown in Eq. (7). 2 1 + e−2n
=
(7)
ip t
f (n)
Here, the nonlinearity in the data can be covered. A linear activation function is employed in the
114
nodes of the output layer to transfer the sum of weighted outputs from the former layer and the
115
weighted biases to the final outputs. By mapping the input data to the correct output data using
116
backpropagation, the difference between the targeted and the actual output values is iteratively
117
minimized by updating the weights in the network [37].
118
3. Materials and methods
119
3.1. Instruments
an
us
cr
113
All chromatographic experiments were carried out using an Ettan liquid chromatography (LC)
121
system equipped with pump unit P-905, dynamic single chamber mixer M-925 (90 µl mixer vol-
122
ume), UV monitor UV-900 (3 mm optical path length), and conductivity cell pH/C-900 (all GE
123
Healthcare, Little Chalfont, Buckinghamshire, UK).
124
3.2. Software
d
M
120
The control software UNICORN 5.31 (GE Healthcare, Little Chalfont, Buckinghamshire, UK)
126
was used in combination with the LC system. The mechanistic model was simulated using the
127
protein chromatography simulation software ChromX (GoSilico, Karlsruhe, Germany). ChromX
128
was used for the numerical simulation of the system of partial differential equations, estimation of
Ac ce p
te
125
130
model parameters, as well as optimization and statistical analysis [38]. The ANN modeling was R conducted in Matlab R2016a (MathWorks, USA).
131
3.3. Adsorber, proteins, and chemicals
129
132
R The strong cation-exchange chromatography (CEX) adsorber medium, TOYOPEARL Giga-
134
Cap S-650M supplied by Tosoh Bioscience (Griesheim, Germany) was used as pre-packed 0.965 mL R Toyoscreen column with dimension 30 mm × 6.4 mm. The adsorber media were stored in 20 %
135
ethanol between the runs. The storage solution was removed by prolonged equilibration with water,
136
and flushed with low-salt and high-salt buffer before experimentation.
133
137
Three proteins of different mass transfer and adsorption behavior according to their different
138
sizes and different isoelectric points were used. A monoclonal antibody (mAb, pI 8.3-8.5, 144-
139
147 kDa) of the IgG isotype was provided by Lek (Ljubljana, Slovenia). Cytochrome c (bovine
140
heart, no. A4612, pI 10.4-10.8, 12.3 kDa) was purchased from Sigma-Aldrich (St. Louis, MO, USA).
141
Lysozyme (hen egg, no. HR7-110, pI 11.0, 14.6 kDa) was purchased from Hampton Research (Aliso
142
Viejo, CA, USA).
Page 5 of 27 5
The experiments were conducted below the isoelectric points of all three proteins. 50 mM
144
sodium phosphate buffer was used at pH 6.5 with additional 0 and 1 M NaCl. 0.5 M NaOH was
145
used for cleaning-in-place. All solutions were prepared with ultra pure water (arium pro UV,
146
Sartorius, G¨ ottingen, Germany), filtrated with a membrane cutoff of 0.22 µm and degassed by
147
sonication. Protein solution with 1 mg/ml of IgG, cytochrome c, and lysozyme each was prepared
148
in corresponding binding buffer and filtrated with a membrane cutoff of 0.22 µm prior to usage.
149
3.4. System characterization
cr
ip t
143
¨ The AKTA system and chromatography column were characterized via tracer pulse injection
151
at a constant flow rate of 0.5 ml/min just like all other chromatography experiments in this work.
152
For determination of the interstitial volume of the column, filtrated 10 g/L blue dextran 2000 kDa
153
(Sigma-Aldrich, St. Louis, MO, USA) solution was used as non-interacting, non-pore-penetrating
154
tracer. The system and total voidage of the column were determined using 25 µL 1 %(v/v) acetone
155
(Merck, Darmstadt, Germany) as a pore-penetrating, non-interacting tracer. For determination
156
of these parameters, the UV signal at 260 nm was recorded. All measurements were corrected
157
with respect to dead volumes. Acid-base titration following Huuk et al. was applied to determine
158
the ionic capacity Λ of GigaCap S-650M [39] in triplicate. The column was first equilibrated for
159
15 CV with 0.5 M HCl to exchange the counter-ion against protons. Subsequently, the columns
160
were washed with ultra-pure water for 20 CV to displace the HCl solution. Finally, the adsorber
161
was titrated with a 0.01 M NaOH solution. The total ionic capacity per adsorber skeleton volume
162
was calculated according to
164
and
Ac ce p
163
te
d
M
an
us
150
,
(8)
nN a+ = VN aOH · cN aOH
(9)
Λ=
nN a+ , Vc · (1 − εt )
(10)
165
with the amount of exchanged sodium ions nN a+ , column volume Vc , total porosity εt , and con-
166
centration of NaOH solution used cN aOH .
167
3.5. Bind-and-elute experiments
168
Protein solution with 1 mg/ml of IgG, cytochrome c, and lysozyme each was prepared in cor-
169
responding binding buffer and injected via a 100 µL loop. From low-salt and high-salt buffers,
170
different steps and gradients were mixed within the LC system. A subset of the chromatograms
171
was used to perform the parameter estimation with the inverse method [31]. The remaining runs
172
were used for model validation and the ANN-based root cause investigation.
Page 6 of 27 6
The linear gradient experiments were conducted by applying the varying gradient lengths 15 CV,
174
20 CV, and 25 CV. After a post-loading wash step of 1 CV equilibration buffer, elution was started
175
with 50 mM and an increasing salt gradient ranging from 50 mM to 400 mM NaCl. After protein
176
elution, the column was stripped over 3 CV at a NaCl concentration of 1.05 M and re-equilibrated
177
for 1 CV binding buffer.
ip t
173
The step elution consisted of three elution steps. After an equilibration step with 2 CV equili-
179
bration buffer and a wash step of 2 CV binding buffer, the salt concentration was raised to 100 mM
180
NaCl and held for 8 CV. Then, the concentration was further increased to 190 mM NaCl and kept
181
constant for 8 CV. Finally, the column was stripped for 3 CV at a concentration of 1.05 M NaCl
182
and re-equilibrated with equilibration buffer.
us
All experiments were carried out at a flow rate of 0.5 ml/min to ensure a constant residence
184
time.
185
3.6. Mechanistic model calibration and validation
an
183
cr
178
Two bind-and-elute chromatograms using linear gradient elution of 20 CV and 25 CV were used
187
for model calibration. Widely accepted correlations suggested in the literature were employed to
188
determine the upper and lower boundaries for subsequent parameter estimation. The penetration
189
theory [40] was chosen to calculate the upper boundary and the correlation suggested by Kataoka
190
and coworker [41] was employed to calculate the lower boundary of the film diffusion coefficient.
191
The correlation based on Guiochon [42], Tyn and Gusek [43], and Mackie and Meares [44] was used
192
to estimate the lower boundary, and the correlation based on Guiochon [42], Tyn and Gusek [43],
193
and Wackao and Smith [45] to estimate the upper boundary of the pore diffusion coefficient. In
194
parameter estimation, the genetic algorithm was employed at first. For fine tuning of the parameter
195
estimates, the Levenberg-Marquardt algorithm [35] was employed. It is important to note, that a
196
small discrepancy between simulated and experimental chromatograms does not necessarily indicate
197
an accurate parameter estimation. Thus, the application of parameter estimation need to be sub-
198
jected to rigorous validation. Here, an uninvolved step-wise gradient bind-and-elute chromatogram
199
was predicted and compared to wet-lab experiment for validation. The confidence intervals at
200
95 % level were calculated based on parameter sensitivities subsequently to assess the reliability of
201
parameter estimates.
202
3.7. In silico screening
Ac ce p
te
d
M
186
203
570 model parameter combinations of salt gradient length and ionic capacity were used to
204
perform in silico experimentation using ChromX. The ranges from 1.0 M to 1.3 M for the ionic
205
capacity and from 12 CV to 28 CV for the salt gradient length were chosen. The salt gradient
206
length and ionic capacity both affect the elution behavior in very similar ways. Visually inspected,
207
a deviation of the ionic capacity or salt gradient length influences the retention time and the peak
Page 7 of 27 7
height. A pronounced problem is given, when multiple root causes are potentially responsible for
209
one observed deviation. The ranges were chosen to cover different scenarios of deviated elution
210
behavior on CEX in a process-relevant parameter window. All parameters and the resulting 570
211
chromatograms were normalized to the minimum and maximum values observed. Since the in silico
212
experimentation generates ideal data, white Gaussian noise was added by a signal-to-noise ratio of
213
60. After that, all normalized UV signals smaller than 0.2 were cut off with the aim to reduce the
214
data amount. This resulted in cheaper computation and was proven to be time-saving for potential
215
real-time application of the ANN.
216
3.8. ANN model calibration and validation
us
cr
ip t
208
219
and the linear activation function purelin were implemented in the nodes of the hidden layer and
220
in the ones of the output layer, respectively. The function divideint using interleaved indices was
221
employed to divide the in silico data into training, validation, and test data sets by a ratio of
222
5:3:2. The network training function trainlm was applied to update weight values for inputs and
223
biases according to an adaptive learning rate and a gradient descent momentum. The learning rate
224
and the momentum were set to 0.05 and 0.9. A root-mean-square error (RMSE) of 1·10−5 and a
225
maximal iteration number of 1000 were set as stopping criteria.
226
3.9. ANN-based root cause investigation
Ac ce p
te
d
M
an
218
A simply constructed BP-ANN model with a hidden layer of nine neurons and an output layer R of two neurons was built in MATLAB . The hyperbolic tangent sigmoid activation function tansig
217
227
Besides the verification of the ANN’s generalization capability with cross-validation using 20 %
228
of the total in silico data sets, a case study with three runs with differently pronounced deviations
229
in their chromatograms was designed to show the practical applicability of the presented root cause
230
investigation method. The calibration experiment with a linear gradient elution of 25 CV and ionic
231
capacity of 1.172 M was chosen to be the set point. Three further bind-and-elute experiments using
232
parameter combinations of 15 CV and 1.172 M, 20 CV and 1.130 M, as well as 25 CV and 1.130 M
233
were carried out in the laboratory. The first experiment represented the deviation of the salt gradient
234
length only, the second the simultaneous deviation of the salt gradient length and ionic capacity,
235
and the third the deviation of the ionic capacity only. Furthermore, they represented the different
236
degrees of deviations in the chromatogram. The chromatograms were fitted with exponentially
237
modified Gaussian distribution [46] in the aim of signal smoothing and peak separation. After
238
normalization, all UV signals smaller than 0.2 were cut off according to the data preparation for
239
ANN model calibration and validation. The separated peaks were entered into the calibrated BP-
240
ANN model which returned the predictions of salt gradient length and ionic capacity for each
241
experiment. Subsequently, the predictions were compared to the settings and the percentage errors
242
were calculated to prove the predictive accuracy.
Page 8 of 27 8
243
4. Results and discussion
244
4.1. System characterization Without a column attached, the dead volume of 280 µl of the LC system was determined by
246
tracer injection. This dead volume was subtracted from all other data generated with the LC. Based
247
on the results of the tracer injections, axial dispersion, bed and particle voidage were calculated.
248
The ionic capacities were calculated based on the acid-base titration in triplicate. The ionic capacity
249
was reduced irreversibly from 1.172 M ± 0.003 M to 1.130 M ± 0.004 M by flushing the column with
250
2.0 M HCl at a flow rate of 0.2 ml/min for 24 h. The results are listed in Table 1.
251
4.2. Mechanistic model calibration and validation
us
cr
ip t
245
To estimate the model parameters kf ilm , Dpore , keq , and ν with the inverse method, two linear
253
gradient chromatography experiments were carried out with gradient lengths of 20 CV and 25
254
CV. First, the genetic algorithm was employed. Thereafter, a Levenberg-Marquardt algorithm was
255
used to find the final optimum. The estimated parameters are given in Table 2. The confidence
256
intervals at 95 % level confirm the reliability of the parameter estimates. Obviously, kkin was not
257
necessary for the description of the system considered, because the equilibria of all components were
258
reached instantly. Figs. 2 (a) and (b) show the experimental and simulated chromatograms used
259
for the model calibration. The weakly binding component mAb, moderately binding component
260
cytochrome c, and the strongly binding component lysozyme are eluted in the salt gradient in
261
successive order. To prove the prediction performance of the calibrated mechanistic model, an
262
additional experiment beyond the calibration space has been conducted as validation data. Fig. 2 (c)
263
shows very good agreement of experimental data and model prediction. Since the reliability of the
264
parameter estimates and the accuracy of the calibration are given, this mechanistic model is assessed
265
to be suitable for in silico data generation.
266
4.3. ANN model calibration and validation
Ac ce p
te
d
M
an
252
267
570 in silico experiments were conducted to screen the design space as shown in Fig. 3 (b). The
268
resulting chromatograms can be seen in Fig. 3 (a). The ranges from 1 M to 1.3 M and from 12 CV
269
to 28 CV were chosen to cover the designed deviations of ionic capacity and salt gradient length in
270
the case study. Thereafter, all normalized UV signals smaller than 0.2 were cut off with the aim
271
to reduce the data containing less relevant case-specific information. The resulting chromatograms
272
can be seen in Fig. 4. This resulted in cheaper computation and proved to be time-saving for
273
potential real-time application of the ANN. For ANN model calibration, backpropagation was used
274
to correlate the results of in silico screening with the corresponding parameter sets. 50 % of the
275
total data were used as training data set for the ANN’s adjustment according to the respective
276
errors. 30 % were used as validation data set to measure the generalization performance, and to
Page 9 of 27 9
stop training when generalization stopped improving. The remaining 20 % were excluded during
278
model calibration and used subsequently as test data for cross-validation. The final correlation
279
achieved for the salt gradient length is given in Figs. 5 (a) - (c) which represent the training data
280
set, validation data set, and test data set, respectively. Figs. 5 (d) - (f) show the correlation for the
281
ionic capacity.
ip t
277
A good agreement between the predictions by the ANN model and the parameter sets given in the
283
in silico screening was observed. The smallest deviations were found to be in the correlation results
284
of training data set (R2 = 0.9997 for salt gradient length, and R2 = 0.9992 for ionic capacity). The
285
deviations found in the correlation results of validation data set were in a statistically acceptable
286
range (R2 = 0.9926 for salt gradient length, and R2 = 0.9822 for ionic capacity). According to the
287
correlation results of the test data set (R2 = 0.9965 for salt gradient length, and R2 = 0.9904 for
288
ionic capacity), the ANN’s generalization was confirmed, verifying its usability for the root cause
289
investigation.
290
4.4. ANN-based root cause investigation
M
an
us
cr
282
Although the ANN’s generalization capability was verified with the cross-validation using 20 %
292
of the total in silico data sets, three runs with strong, moderate, and slight deviations in their
293
chromatograms were generated in the laboratory to prove the practical applicability of the presented
294
method for root cause investigation. The salt gradient length and ionic capacity were adjusted
295
separately and simultaneously to challenge the calibrated ANN model with the pronounced problem,
296
when multiple root causes were potentially responsible for one observed deviation. As can be seen
297
in Table 3, the casual relations in all three deviated runs were determined properly. Overall, the
298
deviations were found in a statistically acceptable range. Errors of approximately 0 % and 4.90 %
299
were given for the prediction of the deviated run 1. The deviated run 2 with both adjusted salt
300
gradient length and ionic capacity was predicted accurately with the small errors of 0.53 % and
301
0.55 %. Errors of 1.5 % and 3.35 % were given for the prediction of the deviated run 3. Compared
302
to the inverse modeling approach proposed by Brestrich and coworkers [47], the presented method
303
is advantageous with respect to the computational expense. Unlike the necessity of performing
304
parameter estimation repeatedly, the calibrated ANN model can be carried out with very little
305
computational expense.
Ac ce p
te
d
291
306
A correlation has been derived by Parente and Wetlaufer originally for determination of the
307
adsorption isotherm model parameters in linear region [48]. Its modified formulation delivered by
308
Shukla and coworkers [49] was employed with retention volumes of the solutes and solved for ionic
309
capacity and gradient length. To compare the performance of this method and the presented one,
310
the calculated results are shown in Tab. 4.
Page 10 of 27 10
311
5. Conclusion This study presents a novel method for the root cause investigation in protein chromatography
313
when an exact mathematical model which can immediately identify the root causes of the deviation
314
is not given and the amount of data available does not allow the direct application of an ANN
315
approach. The fundamental idea was to combine a mechanistic model and an ANN model to
316
benefit from their respective advantages. The learning material for the ANN model was generated
317
by conducting 570 in silico experiments using the calibrated mechanistic model. Based on these
318
data, the ANN model learned the correlations between the appearances of the chromatograms
319
and the parameters of interest. Since only few experiments were required for the calibration of
320
the mechanistic model, the high wet-lab experimental effort needed for the training of the ANN
321
model is circumvented. The generalization performance of the artificial intelligence of a very simple
322
construction was assessed by correlation results of cross-validation. The practical applicability
323
was verified by an experimental case study. Thereby, three experiments were conducted in the
324
laboratory to generate the deviating chromatograms by deliberately causing errors in the parameters
325
salt gradient length, ionic capacity, and both of them. These experiments represented the cases
326
of strong, moderate, and slight deviations in the chromatogram, respectively. This pronounced
327
problem, that multiple root causes could be potentially responsible for one observed deviation, was
328
solved successfully. The ionic capacity that had hitherto been a non-observable parameter during
329
operation, was notably accurate with percentage errors of 0 %, 0.53 %, and 1.50 %.
te
d
M
an
us
cr
ip t
312
With this method, one can identify the root causes of the deviation and determine their mag-
331
nitudes within milliseconds. Thus, it is well-suited for real-time process monitoring and control. It
332
allows an immediate correction of the chromatography process operating condition as a short-term
333
solution. In this way, the economic harm can be reduced drastically, especially when innovative
334
technologies providing high productivity e.g., continuous chromatography, are involved in the man-
335
ufacturing. For future work, the method should be extended by a decision-making strategy to be
336
able to correct the deviated operating conditions to the closest Pareto optimum.
337
6. Acknowledgment
Ac ce p
330
338
This project has received funding from the European Unions Horizon 2020 Research and In-
339
novation Programme under grant agreement No 635557. We kindly thank Lek Pharmaceuticals
340
d.d. (M engˇ es, Slovenia) for providing the mAb, and Johannes Winderl and Matthias R¨ udt for
341
the scientific conversations.
342
The authors declare no conflict of interest.
Page 11 of 27 11
343
References [1] P. Baumann, T. Hahn, J. Hubbuch, High-throughput micro-scale cultivations and chromatog-
345
raphy modeling: Powerful tools for integrated process development, Biotechnol. Bioeng. 112
346
(2015) 2123–2133.
ip t
344
[2] F. Steinebach, M. Angarita, D. J. Karst, M¨ uller-Sp¨ath, M. Morbidelli, Model based adaptive
348
control of a continuous capture process for monoclonal antibodies production, J. Chromatogr.
349
A 1444 (2016) 50 – 56.
cr
347
[3] C. L. Effio, T. Hahn, J. Seiler, S. A. Oelmeier, I. Asen, C. Silberer, L. Villain, J. Hubbuch,
351
Modeling and simulation of anion-exchange membrane chromatography for purification of Sf9
352
insect cell-derived virus-like particles, J. Chromatogr. A 1429 (2016) 142–154.
an
us
350
[4] S. M. Pirrung, L. A. van der Wielen, R. F. Van Beckhoven, E. J. van de Sandt, M. H. Eppink,
354
M. Ottens, Optimization of biopharmaceutical downstream processes supported by mechanistic
355
models and artificial neural networks, Biotechnol. Prog.doi:10.1002/btpr.2435.
356
URL http://dx.doi.org/10.1002/btpr.2435
M
353
[5] S. Chhatre, S. Farid, J. Coffman, P. Bird, A. Newcombe, N. Titchener-Hooker, How imple-
358
mentation of Quality by Design and advances in Biochemical Engineering are enabling efficient
359
bioprocess development and manufacture, J. Chem. Technol. Biot. 86 (2011) 1125–1129.
361
te
[6] A. Hanke, M. Ottens, Purifying biopharmaceuticals: knowledge-based chromatographic pro-
Ac ce p
360
d
357
cess development, Trends Biotechnol. 32 (2014) 210–220.
362
[7] International Conference on Harmonisation of Technical Requirements for Registration of
363
Pharmaceuticals for Human Use, ICH-Endorsed Guide for ICH Q8/Q9/Q10 Implementation,
364
365
366
367
368
369
370
371
372
373
http://www.ich.org/fileadmin/Public_Web_Site/ICH_Products/Guidelines/Quality/ Q8_9_10_QAs/PtC/Quality_IWG_PtCR2_6dec2011.pdf.
[8] H. Schmidt-Traub, M. Schulte, A. Seidel-Morgenstern (Eds.), Preparative Chromatography, 2nd Edition, Wiley, 2012.
[9] A. Velayudhan, C. Horvath, Preparative chromatography of proteins - analysis of the multivalent ion-exchange formalism, J. Chromatogr. 443 (1988) 13 – 29. [10] G. Xindu, F. E. Regnier, Retention model for proteins in reversed-phase liquid chromatography, J. Chromatogr. A 296 (1984) 15 – 30. [11] C. A. Brooks, S. M. Cramer, Steric mass-action ion exchange: Displacement profiles and induced salt gradients, Fluid Phase Equilibr. 38 (1992) 1969–1978.
Page 12 of 27 12
374
375
[12] H. Iyer, S. Tapper, P. Lester, B. Wolk, R. van Reis, Use of the steric mass action model in ion-exchange chromatographic process development, J. Chromatogr. A 832 (1999) 1–9. [13] J. M. Mollerup, T. B. Hansen, S. Kidal, L. Sejergaard, A. Staby, Development, modelling,
377
optimisation and scale-up of chromatographic purification of a therapeutic protein, Fluid Phase
378
Equilibr. 261 (2007) 133–139.
ip t
376
[14] T. C. Huuk, T. Hahn, A. Osberghaus, J. Hubbuch, Model-based integrated optimization and
380
evaluation of a multi-step ion exchange chromatography, Sep. Purif. Technol. 136 (2014) 207–
381
222.
384
385
us
interaction chromatography step, J. Chromatogr. A 1099 (2005) 157–166.
an
383
[15] N. Jakobsson, M. Degerman, B. Nilsson, Optimisation and robustness analysis of a hydrophobic
[16] N. Jakobsson, M. Degerman, E. Stenborg, B. Nilsson, Model based robustness analysis of an ion-exchange chromatography step, J. Chromatogr. A 1138 (2007) 109–119.
M
382
cr
379
[17] M. Degerman, K. Westerberg, B. Nilsson, Determining critical process parameters and process
387
robustness in preparative chromatography a model-based approach, Chem. Eng. Technol. 32
388
(2009) 903–911.
390
[18] M. Degerman, K. Westerberg, B. Nilsson, A model-based approach to determine the design
te
389
d
386
space of preparative chromatography, Chem. Eng. Technol. 32 (2009) 1195–1202. [19] K. Westerberg, E. B. Hansen, M. Degerman, T. B. Hansen, B. Nilsson, Model-based process
392
challenge of an industrial ion-exchange chromatography step, Chem. Eng. Technol. 35 (2012)
393
Ac ce p
391
183–190.
394
[20] A. Krogh, What are artificial neural networks?, Nat. Biotechnol. 26 (2008) 195–197.
395
[21] I. Basheer, M. Hajmeer, Artificial neural networks: fundamentals, computing, design, and
396
application, J. Microbiol. Meth. 43 (1) (2000) 3 – 31.
397
[22] R. Bergs, V. Sanz-Nebot, J. Barbosa, Modelling retention in liquid chromatography as a func-
398
tion of solvent composition and pH of the mobile phase, J. Chromatogr. A 869 (12) (2000) 27
399
– 39.
400
[23] J. E. Madden, N. Avdalovic, P. R. Haddad, J. Havel, Prediction of retention times for anions
401
in linear gradient elution ion chromatography with hydroxide eluents using artificial neural
402
networks, J. Chromatogr. A 910 (1) (2001) 173 – 179.
403
[24] V. K. Gupta, H. Khani, B. Ahmadi-Roudi, S. Mirakhorli, E. Fereyduni, S. Agarwal, Prediction
404
of capillary gas chromatographic retention times of fatty acid methyl esters in human blood
Page 13 of 27 13
405
using MLR, PLS and back-propagation artificial neural networks, Talanta 83 (3) (2011) 1014
406
– 1022. [25] A. Malenovi´c, B. Janˇci´c-Stojanovi´c, N. Kosti´c, D. Ivanovi´c, M. Medenica, Optimization of
408
artificial neural networks for modeling of atorvastatin and its impurities retention in micellar
409
liquid chromatography, Chromatographia 73 (9) (2011) 993–998.
ip t
407
[26] A. A. D‘Archivio, A. Incani, F. Ruggieri, Cross-column prediction of gas-chromatographic
411
retention of polychlorinated biphenyls by artificial neural networks, J. Chromatogr. A 1218
412
(2011) 8679 – 8690.
us
414
[27] D. Nagrath, B. W. Bequette, S. M. Cramer, Evolutionary Operation and Control of Chromatographic Processes, AlChE J. 49 (2003) 82–95.
an
413
cr
410
[28] D. Nagrath, A. Messac, B. W. Bequette, S. M. Cramer, A Hybrid Model Framework for the
416
Optimization of Preparative Chromatographic Processes, Biotechnol. Prog. 20 (2004) 162–178.
417
[29] G. G. Wang, S. Shan, Review of Metamodeling Techniques in Support of Engineering Design
421
422
423
424
425
426
427
428
d
420
[30] J. A. Caballero, I. E. Grossmann, An algorithm for the use of surrogate models in modular flowsheet optimization, AlChE J. 54 (2008) 2633–2650.
te
419
Optimization, J. Mech. Des. 129 (2006) 370–380.
[31] T. Hahn, P. Baumann, T. Huuk, V. Heuveline, J. Hubbuch, UV absorption-based inverse modeling of protein chromatography, Eng. Life Sci. 16 (2016) 99–106.
Ac ce p
418
M
415
[32] R. Glowinski, Finite element methods for incompressible viscous flow. In Numerical Methods for Fluids (Part 3), Elsevier, 2003.
[33] A. Quarteroni, A. Vallii, Numerical approximation of partial differential equations, Springer, 1994.
[34] K. Kaczmarski, D. Antos, Use of simulated annealing for optimization of chromatographic separations, Acta Chromatogr. 17 (2006) 20–45.
429
[35] F. Devernay, C/c++ minpack, http://devernay.free.fr/hacks/cminpack/ (2007).
430
[36] J. V. Tu, Advantages and disadvantages of using artificial neural networks versus logistic
431
regression for predicting medical outcomes, J. Clin. Epidemiol. 49 (11) (1996) 1225 – 1231.
432
[37] X. Du, Q. Yuan, J. Zhao, Y. Li, Comparison of general rate model with a new modelartificial
433
neural network model in describing chromatographic kinetics of solanesol adsorption in packed
434
column by macroporous resins, J. Chromatogr. A 1145 (12) (2007) 165 – 174.
Page 14 of 27 14
435
436
[38] T. Hahn, T. Huuk, V. Heuveline, J. Hubbuch, Simulating and Optimizing Preparative Protein Chromatography with ChromX, J. Chem. Educ. 92 (2015) 1497–1502. [39] T. C. Huuk, T. Briskot, T. Hahn, J. Hubbuch, A versatile noninvasive method for adsorber
438
quantification in batch and column chromatography based on the ionic capacity, Biotechnol.
439
Progr. 32 (2016) 666–677.
ip t
437
[40] R. B. Bird, W. E. Stewart, E. N. Lightfoot, Transport Phenomena, Wiley, 1962.
441
[41] T. Kataoka, H. Yoshida, K. Ueyama, Mass transfer in laminar region between liquid and
445
446
447
448
us
linear Chromatography, Elservier, 2006.
an
444
[42] G. Guiochon, A. Felinger, D. G. Shirazi, A. M. Katti, Fundamentals of Preparative and Non-
[43] M. T. Tyn, T. W. Gusek, Predictions of Diffusion Coefficients of Proteins, Biotechnol. Bioeng. 35 (1990) 327–338.
M
443
packing material surface in the packed bed, J. Chem. Eng. Jpn. 5 (1972) 132–136.
[44] J. S. Mackie, P. Meares, The Diffusion of Electrolytes in a Cation-Exchange Resin Membrane. I. Theoretical, Proceedings A 232 (1955) 498–509.
d
442
cr
440
[45] N. Wakao, J. M. Smith, Diffusion in catalyst pellets, Chem. Eng. Sci. 17 (1962) 825–834.
450
[46] E. Grushka, Characterization of exponentially modified Gaussian peaks in chromatography, Anal. Chem. 44 (11) (1972) 1733–1738.
Ac ce p
451
te
449
452
[47] N. Brestrich, T. Hahn, J. Hubbuch, Application of spectral deconvolution and inverse mechanis-
453
tic modelling as a tool for root cause investigation in protein chromatography, J. Chromatogr.
454
A 1437 (2016) 158 – 167.
455
[48] E. S. Parente, D. B. Wetlaufer, Relationship between isocratic and gradient retention times
456
in the high-performance ion-exchange chromatography of proteins. theory and experiment, J.
457
Chromatogr. A 355 (1986) 29 – 40.
458
[49] A. A. Shukla, S. S. Bae, J. A. Moore, K. A. Barnthouse, , S. M. Cramer, Synthesis and charac-
459
terization of high-affinity, low molecular weight displacers for cation-exchange chromatography,
460
Ind. Eng. Chem. Res. 37 (10) (1998) 4090–4098.
Page 15 of 27 15
ip t cr us an
Table 1: The voidages and axial dispersion are calculated from the retention volume and peak broadening of tracer
M
injections. The ionic capacity 1 is the initial state of the column, whereas the ionic capacity 2 is determined after
GigaCap S-650M εb
0.418
Particle voidage
εp
0.758
Total voidage
εt
0.829
Dax
0.120
Ionic Capacity 1 [M]
Λ
1.172 ± 0.003
Ionic Capacity 2 [M]
Λ
1.130 ± 0.004
Ac ce p
te
Bed voidage
d
acid treatment of the same column.
Axial dispersion [mm2 · s−1 ]
Page 16 of 27 16
ip t cr us an
Table 2: Parameters of the mass transfer model and kinetic isotherm formulation estimated from two bind-and-elute experiments using the inverse
Parameter
M
method.
mAb
Cytochrome c
Lysozyme
7.516·10−3 ± 2.474·10−3
1.899·10−2 ± 2.320·10−2
2.098·10−2 ± 1.871·10−2
Dpore [mm2 · s−1 ]
2.325·10−5 ± 0.315·10−5
6.823·10−5 ± 1.482·10−5
6.304·10−5 ± 0.836·10−5
4.388 ·10−10 ± 0.522·10−10
4.860·10−8 ± 0.227·10−8
2.982·10−5 ± 0.138·10−5
9.720 ± 0.038
10.353 ± 0.018
10.012 ± 0.033
te
ν [-]
Ac ce p
keq [-]
d
kf ilm [mm · s−1 ]
Page 17 of 27
ip t cr us an
Table 3: The ionic capacity and salt gradient length adjusted in the laboratory and predicted with the ANN model, and the percentage error of
M
prediction for three deviated runs.
Ionic capacity [M]
Salt gradient length [CV]
Predicted
Percentage error
Adjusted
Predicted
Percentage error
Deviated run 1
1.172
1.172
0%
15
15.735
4.90 %
Deviated run 2
1.130
1.136
0.53 %
20
20.109
0.55 %
Deviated run 3
1.130
1.147
1.50 %
25
24.128
3.35 %
Ac ce p
te
d
Adjusted
Page 18 of 27
ip t cr us an
Table 4: The ionic capacity and salt gradient length adjusted in the laboratory and calculated with the correlation delivered by Parente and Wet-
M
laufer [48], and the percentage error of calculation for three deviated runs.
Ionic capacity [M]
Salt gradient length [CV]
Calculated
Percentage error
Adjusted
Calculated
Percentage error
Deviated run 1
1.172
1.093
6.76 %
15
15.104
0.65 %
Deviated run 2
1.130
1.309
15.81 %
20
15.003
23.27 %
Deviated run 3
1.130
1.175
3.99 %
25
20.906
15.25 %
Ac ce p
te
d
Adjusted
Page 19 of 27
Figure Captions
462
Figure 1: Topological structure of a simple multi-layer BP-ANN.
463
Figure 2: Plots of UV signals over process run time for bind-and-elute experiments. Dashed lines display the UV signal
464
measured at column outlet and the adjusted salt gradients. Solid lines represent the simulated chromatograms. The
465
elution peaks of the early eluting component mAb, the moderate eluting component cytochrome c, and the late eluting
466
component lysozyme by applying linear salt gradients from 0.05 M to 1.05 M over 20 CV and 25 CV are shown in (a) and
467
(b). (c) represent a step-wise gradient elution from 0.05 M, over 0.1 M and 0.19 M, to 1.05 M.
469
Figure 3: The resulting chromatograms of the in silico screening are displayed in (a). The corresponding design space of the in silico screening is shown in (b): ionic capacity∈ [1, 1.3] M, and salt gradient length ∈ [12, 28] CV.
cr
468
ip t
461
Figure 4: The post-processed results of the in silico screening are shown.
471
Figure 5: Plots of screened parameters versus simulated parameters using ANN model. (a) - (c) represent the training data set,
472
us
470
validation data set, and test data set of the parameter ionic capacity. (d) - (f) represent the salt gradient length. Figure 6: Plots of UV signals over process run time for bind-and-elute experiments. The set point run displayed as solid blue
474
line is identical with Fig. 1 and shown here for comparability. The deviated run 1 as dashed purple line was generated by
475
adjusting the salt gradient length to 15 CV. The deviated run 2 as dashed green line was generated by adjusting both the
476
salt gradient length to 20 CV and the ionic capacity to 1.130 M. The deviated run 3 as dashed orange line was generated
477
by adjusting the ionic capacity to 1.130 M.
Ac ce p
te
d
M
an
473
Page 20 of 27
ip t cr us an M d te Ac ce p
Figure 1:
Page 21 of 27
ip t cr us an M d te Ac ce p
Figure 2:
Page 22 of 27
ip t cr us an M d te Ac ce p
Figure 3:
Page 23 of 27
ip t cr us an M d te Ac ce p
Figure 4:
Page 24 of 27
ip t cr us an M d te Ac ce p
Figure 5:
Page 25 of 27
ip t cr us an M d te Ac ce p
Figure 6:
Page 26 of 27
HIGHLIGHTS (3 to 5) Simulations of potential process deviations based on a mechanistic model were used to construct an artificial neural network model
-
Experimental errors in ionic capacity and salt gradient length were imitated simultaneously to prove the practical capability of this method
-
By this approach, the time-consuming root cause investigation itself could be reduced down to milliseconds
-
This approach could be a building block for real-time root cause investigation in chromatography processes
Ac ce p
te
d
M
an
us
cr
ip t
-
Page 27 of 27