Journal of Biotechnology 110 (2004) 295–311
Efficient MILP formulations for the optimal synthesis of chromatographic protein purification processes E. Vásquez-Alvarez a , J.M. Pinto a,b,∗ b
a Department of Chemical Engineering, University of São Paulo, São Paulo, SP 05508-900, Brazil Department of Chemical and Biological Sciences and Engineering, Polytechnic University, Brooklyn, NY 11201, USA
Received 22 April 2003; received in revised form 22 January 2004; accepted 6 February 2004
Abstract The use of recombinant proteins has increased greatly in recent years, as well as the techniques used for their purification. The selection of an efficient process to purify proteins is a major bottleneck found when trying to scale up results obtained in the laboratory to a large-scale industrial process. One of the main challenges in the synthesis of downstream purification stages in biotechnological processes is the appropriate selection and sequencing of chromatographic steps. The objective of this work is to develop mixed integer linear programming models for the synthesis of protein purification processes. Models for each chromatographic technique rely on physicochemical data of a protein mixture, which contains the desired product and provide information on its potential purification. Formulations that are based on convex hull representations are proposed to calculate the minimum number of steps from a set of chromatographic techniques that must achieve a specified purity level and alternatively to maximize purity for a given number of steps. The proposed models are tested in several examples with experimental data and present time reductions of up to three orders of magnitude when compared to big-M formulations. © 2004 Elsevier B.V. All rights reserved. Keywords: Process synthesis; Purification processes; Chromatographic steps; Mixed integer optimization; Convex hull representation
1. Introduction In biotechnological processes, downstream processing is composed of two major stages that are recovery and purification. Depending on the degree of complexity of the mixtures that result from bioreactions, several recovery and purification operations may be necessary to isolate the desired product. The most important operations include chromatographic tech∗ Corresponding author. Tel.: +1-718-260-3569; fax: +1-718-260-3125. E-mail address:
[email protected] (J.M. Pinto).
niques that are critical for therapeutical products such as vaccines and antibiotics that require a very high purity level (98.0–99.9%). One of the main challenges in the synthesis of downstream purification stages is the appropriate selection and sequencing of chromatographic steps (Larsson et al., 1997). Therefore, optimization methods as well as expert systems may be useful tools for the design and synthesis of protein purification processes. While there is a significant amount of research on the development of expert systems for the synthesis of purification processes, much less has been reported in terms of optimization-based approaches. Bryant and
0168-1656/$ – see front matter © 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.jbiotec.2004.02.009
296
E. V´asquez-Alvarez, J.M. Pinto / Journal of Biotechnology 110 (2004) 295–311
Nomenclature a AAE , CAE ACE , CCE Ai BGF , DGF CFi,p dp DFi,p i k k∗ Kdi,p mdp,k mp,1 mp,k moi,p MWp p Pa,p Qp S SPdp U wi Wi yi,k Yi,k Zk Greek letters αk σi ∆
protein property parameters of the dimensionless retention time correlation that corresponds to anion exchange chromatography parameters of the dimensionless retention time correlation that corresponds to cation exchange chromatography set of properties used in technique i parameters of the dimensionless retention time correlation that corresponds to gel filtration chromatography concentration factor of contaminant p after chromatographic step i (mass reduction of contaminant p after and before chromatographic step i) desired protein deviation factor for protein p in chromatographic step i chromatographic technique (i = 1, . . . , I) order in the sequence (k = 1, . . . , K or k*) final order in the sequence (k∗ ≤ K) retention time of protein p in chromatographic step i mass of the desired protein after going through chromatographic step in order k initial mass of protein p mass of contaminant p after chromatographic step in order k mass of protein p after chromatographic step i molecular weight of protein p protein (product + contaminants) (p = 1, . . . , P) value of property a for protein p charge of protein p objective function specified purity level of the desired protein dp upper bound on protein mass binary variable that denotes if chromatographic technique i is selected Boolean variable for selecting chromatographic technique i binary variable for chromatographic technique i in order k Boolean variable for chromatographic technique i in order k binary variable that indicates if order k corresponds to the last step in the sequence
slack variable relative to order k width of chromatographic peak in step i experimental parameter that denotes the fraction of contaminants that does not separate from the product
Rowe (1998) present a review on computational techniques based on artificial intelligence methods that extract information from chromatographic analysis data. Steffens et al. (2000) developed a hybrid technique for synthesizing downstream purification
processes. A heuristic screening procedure is proposed based on physical property information for reducing the size of the process superstructure, and in the sequence an implicit enumeration algorithm is applied.
E. V´asquez-Alvarez, J.M. Pinto / Journal of Biotechnology 110 (2004) 295–311
Mathematical programming approaches for process synthesis rely on the representation of algebraic equations with discrete variables. Moreover, these problems may be modeled with disjunctive constraints. Model representation through logics and disjunctions is in many cases a natural way of formulating the problem (Raman and Grossmann, 1991). Several authors exploited this field, such as Raman and Grossmann (1994) that developed a framework for formulating mixed-integer optimization models from a logical representation, through disjunctions and logical propositions. Raman and Grossmann (1994) also developed an algorithm to solve logic-based problems, in which the model constraints are partially treated as disjunctions and the remaining formulated as mixed-integer constraints. The key issue is to decide which constraints may be converted into mixedinteger form. The authors developed the w-MIP representability criterium from which disjunctions are converted into mixed-integer constraints. Balas (1985, 1998) showed that every mixed-integer programming model can be reformulated into a disjunctive program and vice-versa. The author identifies integer programs (mixed or pure) as well as other nonconvex programming problems that can be formulated as linear models with logical conditions. Vecchietti and Grossmann (1999, 2000) developed a hybrid representation for nonlinear continuous and discrete problems. The resulting models may involve disjunctions, binary variables and mixed-integer constraints. The solution is presented for several process synthesis examples that illustrate the benefits of the use of disjunctions in the problem formulation. Turkay and Grossmann (1998) propose the convex hull representation for systems of linear disjunctions as opposed to big-M constraints. These representations correspond to relaxations of disjunctive formulations. The big-M formulation is the simplest way to convert disjunctions into mixed-integer form that introduces a big parameter M, which renders redundancy if disjunctions are not selected. The main advantage of a convex hull representation is that it generates a small relaxation gap of a given disjunction. Grossmann and Lee (2000) describe the convex hull of a disjunction that involves nonlinear constraints. The objective of this work is to develop efficient mixed integer programming models for the synthesis of protein purification processes. Models based on
297
the convex hull representation of disjunctions are proposed to calculate the minimum number of steps from a set of candidate chromatographic steps that must achieve a specified purity level. These MILP models correspond to reformulations of the ones developed in Vásquez-Alvarez et al. (2001). Moreover, models are generated to select purification operations and their sequence that maximize purity of the desired protein. These are tested in several examples with experimental data (Lienqueo et al., 1999). This paper is organized as follows. Firstly, we describe models for the protein purification steps based on parameters derived from experimental data. Then, the MILP models that are based on the convex hull relaxation of disjunctions are proposed. Three examples are shown and the computational performance of the convex-hull models is compared to previous formulations based on big-M relaxations. Finally, a sensitivity analysis over the major model parameters that affect the chromatographic correlations is performed with the goal of verifying their impact on the optimal solutions of the synthesis problems.
2. Problem definition Given a mixture of proteins at different concentration levels and a desired product specification in terms of minimum purity, the problem is to synthesize a high-resolution purification process, which is usually carried out by chromatography, that isolates the desired protein from the contaminant ones. Selection of these purification operations is based on the efficiency of different chromatographic techniques that may be employed to separate the target protein from the contaminant ones. 2.1. Models for the chromatographic operations The separation techniques rely on mathematical correlations and their corresponding parameters that depend on physicochemical properties, on the distance between chromatographic peaks, on peak width values as well as on mathematical correlations that are determined experimentally for standard proteins (Lienqueo et al., 1996). The proposed model for each chromatographic step is based on approximations of the chromatograms by
298
E. V´asquez-Alvarez, J.M. Pinto / Journal of Biotechnology 110 (2004) 295–311
isosceles triangles and on physicochemical property data for the target protein as well as for the major contaminants in the mixture (Leser, 1996; Lienqueo et al., 1996, 1998, 1999; Lienqueo, 1999). Each chromatographic step is able to perform the separation of the protein mixture based on their physicochemical properties, which define the dimensionless retention times (Kdi,p ) as follows: Kdi,p = fi (Pa,p )
∀i, p, a ∈ Ai
(1)
parameter Kdi,p in (1) is a function of a characteristic property a that has a characteristic value (Pa,p ) for each protein p (molecular weight, hydrophobicity, charge etc.) that is the basis for a particular separation operation (Leser, 1996; Vásquez-Alvarez et al., 2001). The main parameter that determines the behavior of the proteins in ion exchange (anion exchange (AE) and cation exchange (CE)) chromatography is charge density (Lienqueo et al., 1996). For many proteins this parameter may be correlated by a nonlinear function of retention time. The relationship depends on the molecular weight of the protein as well as on the type of column to be used. For hydrophobic interaction (HI) chromatography, Lienqueo et al. (1996) relate the retention time to the concentration of ammonium sulphate that the protein elutes from the chromatographic column, and to the maximum concentration of ammonium sulphate in which the elution gradient is initiated (usually 1.5 M). In gel filtration (GF), retention time is determined from the logarithm of the molecu-
lar weight (in fact the parameter that depends linearly on the logarithm of the molecular weight is the distribution coefficient). The mathematical correlations for each of the chromatographic techniques used in the synthesis of the protein purification process can be seen in Table 1 and were obtained experimentally by Leser (1996) and Lienqueo et al. (1996, 1998). The deviation factor (DFi,p ) is defined as the absolute difference between the dimensionless retention time (Kdi,p ) of the product and that for each of the contaminants (see Fig. 1). Therefore, the deviation factor for each protein p in technique i is given as: DFi,p = |Kdi,dp − Kdi,p |
∀i, p
(2)
The concentration factor (CFi,p ) is represented by mathematical relationships that are determined from the approximation of two chromatographic peaks of equal size and shape, one that corresponds to the desired protein and the other to one of the contaminant proteins (Vásquez-Alvarez et al., 2001). In Fig. 1, the peaks have constant form and consider that the peak on the left refers to the desired protein and the one on the right to the contaminant. The intersection of the two triangles (shaded areas) represents the amount of contaminant p that remains in the mixture (with the product) after applying chromatographic technique i. Note that the cut relies on the assumption that there is no loss of the target protein. CFi,p = 1,
if 0 ≤ DFi,p <
σi 10
(3a)
Table 1 Expressions and parameters for the chromatographic steps Chromatographic technique
Dimensionless retention timesa (Kdi,p )
Anion exchange
KdAE,p =
7383 × |Qp /MWp | , 1 × 10−25 + 15844 × |Qp /MWp |
KdAE,p=0 , Cation exchange
KdCE,p =
5972 × |Qp /MWp | , 1 × 10−25 + 17065 × |Qp /MWp |
0.15
if Qp > 0
0.15
if Qp ≤ 0 [(NH4 )2 SO4 ]p , [(NH4 )2 SO4 ]max
Hydrophobic interaction
KdHI,p = 1 −
Gel filtration
KdGF,p = −0.4691 log MWp + 2.3902
MW, molecular weight in (Da).
if Qp < 0
if Qp ≥ 0
KdCE,p = 0,
a
Peak width (σ i )
[(NH4 )2 SO4 ]max = 1.5 M
0.22 0.46
E. V´asquez-Alvarez, J.M. Pinto / Journal of Biotechnology 110 (2004) 295–311
299
Fig. 1. Representation of chromatographic peaks.
CFi,p = (1 + 0.02)
σi2 − 2DF2i,p σi2 if 0 ≤
CFi,p = 2(1 + 0.02)
σi σi ≤ DFi,p < 10 2
(σi − DFi,p )2 σi2 if
CFi,p = 0.02,
graphic step i and are introduced in the synthesis models that will be described in Sections 3 and 4.
,
3. Models for optimal selection
,
σi ≤ DFi,p < σi 2
if DFi,p ≥ σi .
(3b)
(3c) (3d)
The mathematical relationships expressed in (3) represent graphical approximations of the chromatograms for two different proteins. The peak width (σ i ) denotes the approximation of the chromatographic peak by a triangle and DFi,p is observed as a difference between the times that correspond to the peak values. The concentration factors (CFi,p ) obtained from (3a) to (3d) represent the ratio between the mass of protein p after and before chromato-
The modeling of the optimal selection of purification steps of a protein mixture is shown in this section, for which the sequence is irrelevant. Consider disjunction (4) that represents the contaminant constraints in consecutive chromatographic steps: Wi moi,p = CFi,p moi−1,p ∀p ¬Wi ∨ ∀i (4) moi,p = moi−1,p ∀p The first term of disjunction (4) indicates selection of technique i represented with Boolean variable Wi and therefore enforces a reduction in the mass of contaminants. On the other hand, if i is not selected the
300
E. V´asquez-Alvarez, J.M. Pinto / Journal of Biotechnology 110 (2004) 295–311
mass of contaminant remains the same as indicated in the second term. Disjunction (4) is the basis for the development of convex hull models MC1 that are described in Sections 3.1 and 3.2. 3.1. Convex hull model for the minimization of the total number of steps (MC1a ) Model MC1a is composed by the objective function that minimizes the number of selected steps and by the contaminant, purity and domain constraints, as follows: Objective function: Min S = wi (5) i
Subject to: (a) contaminant constraints mo1,p = CF1,p mp,1 w1 + mp,1 (1 − w1 ),
∀p (6a)
moi,p = CFi,p mo1i−1,p + mo2i−1,p , ∀p, i = 2, . . . , I
(6b)
moi−1,p = mo1i−1,p + mo2i−1,p , ∀p, i = 2, . . . , I 0 ≤ mo1i−1,p ≤ Uwi ,
(6c)
∀p, i = 2, . . . , I (6d)
indicated by Eq. (6a). If the first technique is selected (w1 = 1), (6a) sets the resulting mass to mo1,p = CF1,p mp,1 , where mp,1 is the initial mass of protein p; otherwise it remains the same, that is mo1,p = mp,1 . In the following steps, constraints (6b)–(6e) hold. Each term of disjunction (4) is represented with subscripts 1 and 2. It can be noted from this formulation (i ≥ 2) that if step i is selected (wi = 1), the mass of protein p in this step is set to moi,p = CFi,p moi−1,p . This result is a consequence of (6b) and (6e) that set mo2i−1,p to zero and therefore moi−1,p = mo2i−1,p . On the other hand, if step i is not selected (wi = 0) then from (6d) mo1i−1,p = 0 and therefore moi−1,p = mo2i−1,p . Thus, from (6b) we have moi,p = moi−1,p that keeps the mass of protein in the mixture. Finally, constraint (7) imposes that the specified purity of the protein of interest must be achieved. In this constraint, subscript i denotes the last candidate of the chromatographic techniques. 3.2. Convex hull model for purity level maximization (MC1b ) Model MC1b that maximizes the purity level for a fixed number of steps and for which the sequence of operations is irrelevant is obtained by analogy with model MC1a ; that is, the same contaminant constraints are valid. This model, for a fixed number k∗ of chromatographic steps, has the following structure: Objective function: moI,p (9) min S = p=dp
0 ≤ mo2i−1,p ≤ U(1 − wi ),
∀p, i = 2, . . . , I (6e)
(b) purity constraints moI,dp ≥ SPdp moI,p
(7)
(c) domain constraints ∀i
moi,p , mo1i,p , mo2i,p ≥ 0,
(a) selection constraints wi = k ∗
(10)
i
(b) contaminant constraints (6) (c) domain constraints (8).
p
wi ∈ {0, 1},
Subject to:
(8a) ∀i, p.
(8b)
Constraints (6) result from the convex hull formulation of disjunction (4) that is shown in Appendix A. The mass of protein that remains after the first step is
Objective function (9) minimizes the overall mass of contaminants along the chromatographic steps, where I corresponds to the last candidate step. Constraint (10) indicates that exactly k∗ steps are selected among the candidates. Contaminant constraints (6) and (8) are the same as the ones from model MC1a and obviously play the same role.
E. V´asquez-Alvarez, J.M. Pinto / Journal of Biotechnology 110 (2004) 295–311
4. Convex hull models for selection and sequencing Convex hull models were also developed for the selection and sequencing of purification steps. These models rely on the following general disjunction:
Yi,k ∨ i=1 mp,k+1 = CFi,p mp,k αk ∨ mp,k+1 = 0 ∀p I
∀p
∀k = 2, . . . , k − 1
4.1. Convex hull model for minimizing the number of steps (MC2a ) The resulting model that minimizes the number of steps and determines the optimal selection and sequencing is as follows: Objective function: yi,k = k · Zk (12) Min S =
Zk = 1
(14e)
(c) contaminant constraints mp,2 = CFi,p yi,1 mp,1 ,
mp,k+1 =
i
mp,k =
i
CFi,p m1i,p,k + m2p,k , ∀p, k = 2, . . . , K − 1
yi,k ≤ 1,
(15b)
m1i,p,k + m2p,k , ∀p, k = 2, . . . , K − 1
m1i,p,k ≤ Uyi,k ,
(15c)
∀i, p, k = 2, . . . , K − 1 (15d)
m2p,k ≤ Uαk ,
∀p, k = 2, . . . , K − 1 (15e)
p
k
(13a)
i
(16)
(e) domain constraints
∀i
(13b)
k
(b) ordering constraints yi,k+1 ≤ yi,k ,
Zk ≥
(15a)
(d) purity constraints mdp,k+1 ≥ SPdp mp ,k+1 − U(1 − Zk ),
(a) assignment constraints yi,k + αk = 1, ∀k
i
∀p
i
k
Subject to:
(14d)
i
k
Disjunction (11) contains I + 1 elements for each protein p and order k, where the first I elements are represented in compact form. These terms in disjunction (11) model the selection of step i in order k, whereas the last term models the case of no selection of steps in order k.
k
(14c)
∀k, k > k
yi,k + Zk ≤ 1,
(11)
i
∀k, k ≤ k
yi,k ≥ Zk ,
i
301
k ≤K−1
yi,k ∈ {0, 1},
∀i, k
mp,k , Zk , m1i,p,k , m2p,k , αk ≥ 0,
(17a) ∀i, p, k.
(14a)
(17b)
(14b)
Objective function (12) minimizes the overall number of chromatographic techniques and selects a sequence for a given purity. This is the same as the objective function defined in Vásquez-Alvarez et al. (2001).
i
i
yi,k −
yi,k+1 ,
k ≤k−1
i
302
E. V´asquez-Alvarez, J.M. Pinto / Journal of Biotechnology 110 (2004) 295–311
Constraint (13a) indicates that at most one step i may be chosen in order k. The slack variable αk is activated if no steps are selected in order k. Constraint (13b) imposes that step i is selected at most once in the sequence and (14a) states that steps are assigned in increasing order. Constraints (14b)–(14e) define the last step of the sequence, denoted by Zk . Constraint set (15) relates protein mass in subsequent steps that is derived from disjunction (11). A group of disaggregated variables m1i,p,k and m2p,k is defined that relate to binary variables yi,k and to the continuous variable αk , respectively. Constraint (15a) indicates the mass of protein p after the first step. Moreover, mixed-integer constraints (15b)–(15e) correspond to the group that was reformulated from disjunction (11). If step i is selected in order k (yi,k = 1 and αk = 0) then (15b) defines the mass of all proteins for the chosen step i. Constraint (15c) imposes the mass of protein p for any choice in order k. Constraint (15d) indicates that if step i is not selected (yi,k = 0) then m1i,p,k = 0; on the other hand, (15d) forces m2p,k to zero if any step is selected (αk = 0). Constraint (16) enforces purity specification. 4.2. Convex hull model for maximizing product purity (MC2b )
(18)
Disjunction (18) models the choice of step i in order k. In this case, the no-choice option is not considered, since the number of steps is fixed to k∗ . Note that this number may be either defined from the minimization of MC1a , MC2a or arbitrarily set. Model MC2b is written as follows: Objective function min S = mp,k∗ +1 (19) p=dp
(a) assignment constraints yi,k = 1, k ≤ k∗
(20a)
i ∗
k
yi,k ≤ 1,
∀i
(20b)
k=1
(c) contaminant constraints mp,2 = CFi,p yi,1 mp,1 ,
∀p
(15a)
i
mp,k+1 =
i
mp,k =
i
CFi,p m1i,p,k , ∀p, k = 2, . . . , k∗
∀p, k = 2, . . . , k∗ (21b)
m1i,p,k ,
∀i, p, k = 2, . . . , k∗
m1i,p,k ≤ Uyi,k ,
(21a)
(21c)
(d) domain constraints
Model MC2b is proposed for selecting a sequence with the objective of maximizing the purity level of the desired protein. In this case, disjunction (11) may be reformulated by restricting the number of steps to k∗ and removing the last term. Therefore the model is characterized by the following disjunction: I yi,k ∨ , i=1 mp,k+1 = CFi,p mp,k ∀p ∀k = 2, . . . , k∗
Subject to:
yi,k ∈ {0, 1}, mp,k , m1i,p,k , ≥ 0,
∀i, k
(22a) ∀i, k, p.
(22b)
Objective function (19) minimizes the amount of contaminants in the mixture and it is the same one applied in the model for purity maximization of Vásquez-Alvarez et al. (2001). Eq. (20a) indicates that exactly one step i is selected in order k (yi,k = 1). Moreover, (20b) imposes the selection of step i at most once. The mass of contaminants after the first step is obtained with (15a) and (21a)–(21c), which result from the reformulation of disjunction (18). If step i is selected in order 1 (yi,1 = 1), the mass of proteins becomes defined in (15a). Constraint (21a) relates the mass of step i in orders k and k + 1, whereas (21b) and (21c) relate the disaggregated variables with the continuous variables and binary variables, respectively. Therefore, if step i is selected (yi,k = 1) then m1i,p,k = 0; otherwise m1i,p,k = 0 from (21c).
E. V´asquez-Alvarez, J.M. Pinto / Journal of Biotechnology 110 (2004) 295–311
5. Computational results The models are solved for three different examples of increasing size that are the same as those from Vásquez-Alvarez et al. (2001). Major issues concern comparison between big-M and convex hull formulations as well as sensitivity analysis of the models. 5.1. Example 1 In Example 1, taken from Lienqueo et al. (1998), we consider the purification of a mixture that contains four proteins with the same initial mass: serum from bovine albumin (p1), ovalbumin (p2), soybean trypsin inhibitor (p3) and thaumatin (p4). Their physicochemical properties as well as the initial protein concentration of the mixture are shown in Vásquez-Alvarez et al. (2001). The purity level required for p1 is 98%. Overall, there are twelve candidate techniques that are ion exchange at 4.0, 5.0, 6.0, 7.0, and 8.0 pH levels, gel filtration and hydrophobic interaction. The solutions obtained with the four models contain three steps that are shown in Fig. 2. It can be noted that the solutions proposed from models MC1a and MC2a attain a 98.9% final purity. Note, however, that these are among the possible solutions that satisfy purity requirements. Moreover, with models MC1b and MC2b a maximum purity level of 99.8% is reached. These models select simply a different step (anion exchange at 8.0 pH) that results in higher purity levels. By comparing the optimal results of model MC2a to the solution obtained from an expert system (ES) (Lienqueo et al., 1998) for 94.5% purity we obtain the same sequence and final purity, with a small discrepancy in purity for anion exchange at pH 7.0 and hydrophobic
303
interaction (Prot-Ex solution reports 63.7% whereas MC2a obtains 64.1%). This sequence that contains two purification steps was confirmed experimentally by Lienqueo et al. (1999). However, since the required purity for the MILP models is higher (98%), cation exchange at pH 8.0 is added to the sequence. The results for Example 1 with the big-M formulations are shown in Table 4 of Vasquez-Alvarez et al. (2001) and all selected the same chromatographic operations. These are exactly the same ones chosen in models MC1b and MC2b . As for models MC1a and MC2a , cation exchange steps were selected in place of anion exchange at pH 8.0. However, the solutions of these models also contain the minimum number of steps for the purity level specified. 5.2. Example 2 In Example 2, we consider the purification of -1,3-glucanase (8.3% concentration) that must be separated from eight contaminants. Physicochemical data for this example was provided from Lienqueo et al. (1999) and are shown in Table 2a. The purification requires 90% -1,3-glucanase and the steps must be synthesized from 22 candidate techniques (the ones from Example 1 + anion and cation exchange at 4.5, 5.5, 6.5, 7.5 and 8.5 pH levels). Results for all models are shown in Fig. 3a. Note that this mixture is very difficult to purify. Overall, four steps are required and a final maximum purity of 90.5% is attained for all models. The solution provided by an ES (Lienqueo et al., 1999) is limited to a final purity of 70%. The suggested sequence is hydrophobic interaction (32.7% purity) followed by anion exchange at 6.5 pH (70.3%); interestingly, these
Fig. 2. Optimal results for Example 1.
304
E. V´asquez-Alvarez, J.M. Pinto / Journal of Biotechnology 110 (2004) 295–311
Table 2 Physicochemical properties of the protein mixture in Example 2 (a) full set of data (b) combined data for the non-individualized contaminants Proteins
Cp,1 (g/l)
Qp = Charge × 10−25 (C/mol)
MWp (Da)
Ha [NH4 (SO2 )4 ]p
a1
a2
pH 4.0 a3
pH 4.5 a4
pH 5.0 a5
pH 5.5 a6
pH 6.0 a7
pH 6.5 a8
pH 7.0 a9
pH 7.5 a10
pH 8.0 a11
pH 8.5 a12
(a) Target Cont 1 Cont 2 Cont 3 Cont 4 Cont 5 Cont 6 Cont 7 Cont 8
0.62 0.42 0.25 0.25 0.09 0.09 2.74 2.74 0.25
31000 62500 40600 69600 40600 69600 41000 32900 35500
0.00 0.00 0.00 0.00 0.00 0.00 1.50 1.50 0.20
1.46 1.46 1.46 1.46 1.46 1.46 1.46 1.46 1.46
0.09 0.09 0.09 0.09 3.14 3.14 0.93 0.09 0.09
−0.62 −1.06 −0.55 −0.55 1.46 1.46 0.26 0.00 −0.55
−0.66 −0.98 −0.22 −0.22 0.28 0.28 −0.35 −1.70 −0.22
−1.02 −1.17 −0.22 −0.22 −0.47 −0.47 −0.87 −2.70 −0.22
−1.82 −1.71 −0.26 −0.26 −0.89 −0.89 −1.31 −2.90 −0.26
−2.33 −2.79 −0.73 −0.73 −1.06 −1.06 −1.65 −3.51 −0.73
−2.52 −3.52 −1.26 −1.26 −1.08 −1.08 −1.90 −3.51 −1.26
−2.52 −3.32 −1.82 −1.82 −1.04 −1.04 −2.04 −3.51 −1.82
−3.51 −3.32 −3.51 −3.51 −1.01 −1.01 −2.06 −3.51 −3.51
(b) Target Cont 1 Cont 238 Cont 45 Cont 6 Cont 7
0.62 0.42 0.75 0.18 2.74 2.74
31000 62500 48600 55100 41000 32900
0.00 0.00 0.20 0.00 1.50 1.50
1.46 1.46 1.46 1.46 1.46 1.46
0.09 0.09 0.09 3.14 0.93 0.09
−0.62 −1.06 −0.55 1.46 0.26 0.00
−0.66 −0.98 −0.22 0.28 −0.35 −1.70
−1.02 −1.17 −0.22 −0.47 −0.87 −2.70
−1.82 −1.71 −0.26 −0.89 −1.31 −2.90
−2.33 −2.79 −0.73 −1.06 −1.65 −3.51
−2.52 −3.52 −1.26 −1.08 −1.90 −3.51
−2.52 −3.32 −1.82 −1.04 −2.04 −3.51
−3.51 −3.32 −3.51 −1.01 −2.06 −3.51
a
H: hydrophobicity.
steps were obtained by all models. This sequence also was confirmed experimentally (Lienqueo et al., 1999). It can be observed from Table 2a that some of the data are not individualized in terms of the physical properties (this may frequently occur in practice). We combined some of the contaminants and analyzed the effect of reliable physicochemical property data on the model. In this case, contaminants 2, 3, and 8 as well as contaminants 4 and 5 were combined (Cont 238 and Cont 45). In the former case, the same titration data was present for the three contaminants and in the latter their only difference is the molecular weight. The alternative set of data shown in Table 2b rely on the overall concentration of the combined contaminants and on the weighted average of the molecular weight. Interestingly, four steps were also obtained in the optimization models from the data of Table 2b. Moreover, the same selection of steps was made, as shown in the Fig. 3b, with a slight change in the values for the purity levels. The purity ranges obtained as a function of the number of steps are presented in Fig. 4. The results for MC1a and MC2a represent the minimum number of
steps for purity levels as indicated in the columns. For instance, for a 69% specification, two steps are sufficient; however, for purity levels higher than 69.3% and lower than 83.3% then three steps are required. A similar interpretation is valid for MC1b and MC2b . In this case, the columns indicate the maximum purity level achieved for a given number of steps. Final purity levels coincide for all steps, except for the results obtained for six operations. In this case, MC1a and MC2a generated the same optimal objective function value but different solutions. Note that this may occur because many purification sequences may still provide the minimum number of steps under the same purity constraint. 5.3. Example 3 In this last example, the goal is to purify somatotropin to 98.0% from a mixture of 13 contaminants. The complete set of data is given in Lienqueo et al. (1996). The candidate techniques are the same ones from Example 2. The optimal solutions are shown in Fig. 5. Despite being a larger problem, only two steps are necessary to achieve the desired purity level. More-
E. V´asquez-Alvarez, J.M. Pinto / Journal of Biotechnology 110 (2004) 295–311
305
Fig. 3. Optimal results for Example 2. (a) Full set of data and (b) combined data for the non-individualized contaminants.
over, all models obtained the same solution in terms of final purity level and purification steps. In this particular example, it is not possible to compare the results obtained with the ES solution, since purification techniques other than high-resolution chromatography were employed (Lienqueo et al., 1996). However, the results obtained in this paper can be compared to the optimal results obtained in
Vasquez-Alvarez et al. (2001) with the big-M formulations, which are shown in Fig. 6 of that paper. The big-M models select two steps as the minimum number to achieve the 98% required purity; moreover, the models that maximize the purity levels choose cation exchange at pH 5.5 and hydrophobic interaction. Note that these are exactly the two techniques selected in the present paper for all models.
Fig. 4. Purity scale for Example 2.
306
E. V´asquez-Alvarez, J.M. Pinto / Journal of Biotechnology 110 (2004) 295–311
Fig. 5. Optimal results for Example 3.
5.4. Computational performance Table 3 presents statistical data for the three examples and the four MILP models presented in this paper. The software GAMS 2.50 (Brooke et al., 1998) was used to implement the models and to generate their solutions in a Pentium II 400 MHz platform. The proposed models were solved with OSL (IBM, 1991) to global optimality. The solver relies on a linear programming-based branch and bound method that enumerates the several solved sub-problems (nodes). Note that models MC1 present a smaller number of binary variables, since these are defined simply for the selection of steps. Results are also compared to the ones obtained from Vásquez-Alvarez et al. (2001) that are based on big-M representations (models M1 and M2 ). Note that there is an increase in the number of continuous variables, due to the disaggregated variables and a slight reduction in the number of constraints. Nevertheless, there is a significant reduction in the number of nodes enumerated and in CPU time with respect to the big-M models. Exceptions are Examples 1 and 3 for model MC2a , for which larger
Fig. 6. CPU times for M1 /MC1 in Example 2.
CPU times were observed. Nevertheless, a 96.2% reduction was observed for the same model in the case of Example 2 that is the largest among all. The results suggest that the linear programming problems that are solved at each node of the search tree are more time consuming in the convex hull than in the big-M formulation for model MC2a . This happens because the convex hull model presents a larger number of continuous variables. In the case of Example 2, that requires a more difficult separation sequence, the benefit of enumerating a smaller number of nodes overcomes the larger size of the LP problems. It is important to note that in model MC2b the optimal solution was obtained, in contrast with the corresponding big-M formulation (Vásquez-Alvarez et al., 2001). Fig. 6 (in log scale) presents CPU times as functions of the number of steps for models MC1 and M1 that correspond to convex hull and big-M formulations, respectively. There is a reduction of up to three orders of magnitude, as seen in Example 2. Another important aspect concerns the comparison of the results obtained from the MC1 models with respect to the MC2 models. The former simply select the sequence of operations, whereas the latter also include sequencing decisions. The current objective functions proposed in this paper that aim to minimize overall number of steps for a given purity specification and to maximize purity for a given number of chromatographic techniques are insensitive to the sequencing of operations. Consequently, for these objective functions it is advantageous to use models MC1 . Nevertheless, economical and performance objectives would depend on the feed to each of the purification steps and thus on the sequence of operations.
E. V´asquez-Alvarez, J.M. Pinto / Journal of Biotechnology 110 (2004) 295–311
307
Table 3 Summary of statistical data for models MC1 and MC2 Model
Example
Binary variables
Continuous variables
Constraints
Nodes enumerated
CPU time (s)
Change with respect to big-M model Cont. variables (%)
Constraints (%)
CPU time (%)
MC1a
1 2 3
12 22 22
137 577 897
182 767 1192
49 187 44
2.1 13.9 18.8
+179.6 +189.9 +190.3
−6.2 −3.4 −3.4
−56.5 −99.8 −73.1
MC1b
1 2 3
12 22 22
137 577 897
182 767 1192
90 87 9
2.1 9.2 6.6
+179.6 +189.9 +190.3
−6.2 −3.4 −3.4
−50.0 −99.8 −91.4
MC2a
1 2 3
168 288 288
565 2170 3375
808 2473 3728
48 1096 26
164.6 2603 2467.9
+1156 +2070 +2077
−30.7 −40.8 −41.6
+909 −96.2 +1016.0
MC2b
1 2 3
36 132 44
109 1045 337
137 1171 380
106 62457 0
3.1 4434.6 2.6
+738 +1800 +1062
−36.0 −42.1 −42.1
−46.3 −42.7 −49.0
5.5. Sensitivity analysis The purpose of such analysis is to determine the effect of the parameters on the optimal solutions of the convex hull based models (models MC1b and MC2b for purity maximization were adopted). The case studies are presented in Table 4 as well as the parameter ranges that are applied in the sensitivity analysis. It is important to note that these ranges correspond to the uncertainties verified in their derivation (Leser, 1996; Lienqueo et al., 1998, 1999; Lienqueo, 1999). Firstly, the analysis is performed over the values that affect the dimensionless retention time (Kdi,p ), whose correlations for the chromatographic techniques applied in the present work are shown in Table 1. Model MC1b and Example 2 were selected for this analysis. Following, the analysis is based on the parameter ∆ that quantifies the fraction of contaminant that remains with the protein of interest (as shown in Fig. 7) and results from the approximation of a chromatographic peak to an isosceles triangle, as established by Leser (1996). This parameter affects the concentration factors (CFi,p ) that are calculated from (3). Recall that CFi,p defines the degree of separation of contaminants from the mixture. The first parameter studied is AAE that has a base value of 7383 (see Case 1.1 in Table 4), and takes part in the mathematical correlation for anion exchange chromatography. For instance, if this parameter is in-
creased by 10% there is a 2.8% increment in final purity (see Fig. 8). A decrease in the same order of magnitude causes a similar reduction in final purity. A second analysis in Case 1.2 (Table 4) is performed for CAE that has a base value 15844 in anion exchange chromatography. The effect of such parameter is opposite to AAE , that is an increase in CAE causes a decrease in final purity and vice-versa. For instance, 10% reduction in CAE causes a 2.7% increment in final purity (Fig. 8). Case 2 corresponds to the analysis of parameters ACE (base value 5972) and CCE (base value 17065) that are included in the correlation of cation exchange chromatography. Perturbations on these parameters (up to ±10%) in Example 2 do not indicate any variation in final purity for six steps. Such techniques are not included in the purification process for purity maxi-
Fig. 7. Maximum purity variation vs. ∆ in Example 2.
308
E. V´asquez-Alvarez, J.M. Pinto / Journal of Biotechnology 110 (2004) 295–311
Table 4 Parameters used in the sensitivity analysis Case
Parameter
Example
Correlation
1.1
AAE
2
KdAE,p =
1.2
CAE
2
KdAE,p =
2.1
ACE
2
KdCE,p =
2.2
CCE
2
KdCE,p =
3
[(NH4 )2 SO4 ]max
2
KdHI,p = 1 −
4.1
BGF
2
KdGF,p = BGF . log MWp + 2.3902
4.2
DGF
2
KdGF,p = −0.4691. log MWp + DGF
AAE × Qp /MWp 1 × 10−25 + 15844 × Qp /MWp 7383 × Qp /MWp 1 × 10−25 + CAE × Qp /MWp ACE × Qp /MWp 1 × 10−25 + 17065 × Qp /MWp 5972 × Qp /MWp 1 × 10−25 + CCE × Qp /MWp [(NH4 )2 SO4 ]p [(NH4 )2 SO4 ]max
5.1
σ HI
2
CFi,p = (1.0 + ∆)
5.2
σ GF
2
CFi,p = 2.(1.0 + ∆)
6.1
∆
1
CFi,p = (1.0 + ∆)
6.2
∆
2
CFi,p = 2.(1.0 + ∆)
6.3
∆
3
CFi,p = ∆
σi2 − 2DF2i,p
σi − DFi,p σi2
σi2 − 2DF2i,p
Fig. 8. Maximum final purity vs. parameters in Example 2.
σi2
σi2 σi − DFi,p σi2
Value
Variation
7383
±10%
15844
±10%
5972
±10%
17065
±10%
1.5
±10%
−0.4691
±10%
2.3902
±10%
0.22
±20%
0.46
±20%
2 0.02
0.02–0.08
0.02
0.02–0.08
0.02
0.02–0.08
2
mization in model MC1b , even for more favorable parameter values. The parameter studied in Case 3 is the maximum concentration of ammonium sulphate (base value 1.5). This parameter is part of the mathematical correlation for hydrophobic interaction chromatography, as seen in Table 4. Changes in this parameter do not affect final purity in Example 2 for the case of six selected steps. In Case 4, the analysis concerns parameters BGF and DGF that take part in the correlation for gel filtration and whose base values are 0.4691 and 2.3902, respectively. Changes in these parameters showed no effect in the optimal solutions, since gel filtration chromatography was never selected. The assumption of constant peak width was verified in Case 5. We have applied the analysis for purity
E. V´asquez-Alvarez, J.M. Pinto / Journal of Biotechnology 110 (2004) 295–311
maximization to the same Example 2 for a fixed overall number of four and six chromatographic steps. Since hydrophobic interaction was selected in all optimal solutions, we verified the impact of changes within the 20% range on its base value (0.22, see Table 1). The results indicate that the optimal purity level value is maintained for four and six steps and all solutions present HI as a purification step. Sensitivity analysis was also performed for gel filtration, which is not selected from any of the models. For the case of six steps, there are no changes in the optimal solution (GF is not selected) for variations in the range −10 to +20%. If the peak width is decreased by 15%, then GF is selected and the purity level reaches 95.3%; moreover, the final purity is 96.6% for a 20% decrease in σ GF . However, in the case of four steps no effect of the gel filtration peak width was observed in the optimal solution. In Fig. 7, final purity is plotted against variations in the ∆ parameter (Case 6). It can be verified that the approximation of a chromatographic peak to a triangle is critical. For instance if six steps must be selected, final purity may vary from 94.8% (∆ = 0.02) to 92.6% (∆ = 0.08). It can be noted that changes of up to 25% may occur in the purity level of the desired protein. This may be extremely important in the specification of vaccines or antibiotics. Fig. 8 presents the variation of final purity level as a function of changes in the most significant parameters in Example 2 with six selected steps. These are AAE , CAE and ∆ and all affect positively final purity if they cause a decrease in CFi,p . This may be extremely important in the specification of vaccines or antibiotics.
309
significant variations resulted for the parameter that quantifies the peak approximations. Nonetheless, it was observed that the selection of the chromatographic techniques was unaffected by the typical variations that occur at the experimental level; however, parameter uncertainty may be critical for reaching purity specification of products such as vaccines and antibiotics. Therefore, it may be concluded that the present approach may be an efficient tool for synthesizing protein purification processes. Research under development concerns the consideration of product loss of the target protein, since this feature was not taken into account in the present formulation. Moreover, due to the results from the sensitivity analysis it can be seen that a more precise approximation of chromatograms (e.g. normal distribution function) could render better results. Therefore, research on the development of more accurate chromatographic models and their integration with the synthesis framework is required.
Acknowledgements The authors would like to acknowledge financial support received from PADCT/CNPq (62.0239/97 QEQ), from ANTORCHAS (A-13668/1–9) and VITAE (B-11487/10B006) (Cooperation Programs Argentina, Brasil, Chile).
Appendix A. On the representation of linear disjunctions
6. Conclusions This paper presented mathematical programming formulations based on disjunctive logic that were represented in mixed-integer linear form, with the objective of minimizing the number of chromatographic steps and to maximize the final purity. Models that rely on convex hull formulations are more efficient from the computational point of view and lead to reductions of up to three orders of magnitude in computational time when compared to previous big-M formulations (Vásquez-Alvarez et al., 2001). Sensitivity analysis was performed for the major model parameters for purity maximization. The most
The linear disjunction that represents problem decisions can be represented in general form as: Z ¬Z ∨ (A.1) y = βx y=x In (A.1), Z is a Boolean variable, whereas y and x are continuous variables. The first term indicates the selection of Z (true) whereas the second term denotes that Z is not selected (false). Parameter β relates y and x linearly for the case of Z selected. The first alternative is to represent the disjunction in mixed-integer form through big-M constraints, with
310
E. V´asquez-Alvarez, J.M. Pinto / Journal of Biotechnology 110 (2004) 295–311
the use of z binary variables. Thus, disjunction (A.1) can be written as: z1 + z 2 = 1
(A.2)
−(1 − z1 )Ml ≤ y − βx ≤ Mu (1 − z1 )
(A.3)
−(1 − z2 )Ml ≤ y − x ≤ Mu (1 − z2 )
(A.4)
z1 = {0, 1},
z2 = {0, 1}
Replacing (A.2) in (A.4) yields: −(z )Ml ≤ y − x ≤ Mu (z ) 1
1
(A.5)
In (A.3) and (A.5), the equations contained in each term of disjunction were formulated as inequalities where Ml1 , Mu1 , Ml2 , and Mu2 are big-M parameters. These are sufficiently large so to avoid the elimination of feasible solutions. In the Convex hull formulation, disaggregated variables y1 , y2 , x1 and x2 are defined according to the continuous variables y and x, respectively, where the superscripts 1 and 2 indicate the first and the second disjunctive terms, respectively. Therefore, the following equations hold: y = y1 + y2
(A.6)
x=x +x
(A.7)
1
2
Then, the following constraints that correspond to each of the terms of the disjunction are generated as follows: y1 = βx1
(A.8)
y 2 = x2
(A.9)
Similarly, binary variables are defined for each of the terms of the disjunction. These are logically equivalent to the Boolean variables, that is the binary variable is one if the Boolean variable is true and zero otherwise. The binary variables are related to the disaggregated variables as follows: 0 ≤ x1 ≤ Uz1
(A.10)
0 ≤ x2 ≤ Uz2
(A.11)
0 ≤ y1 ≤ Uz1
(A.12)
0 ≤ y2 ≤ Uz2
(A.13)
z1 + z2 = 1
(A.14)
Note that constraints (A.12) and (A.13) may be excluded because they result from (A.8)/(A.10) and (A.9)/(A.11), respectively. For instance, if Z is false then z1 = 0. From (A.10) we have x1 = 0 and finally (A.8) yields y1 = 0. The same is valid for Z true. Consequently, the convex hull representation can be written as: x = x1 + x2
(A.7)
y = βx1 + x2
(A.15)
0 ≤ x1 ≤ UZ1
(A.12)
0 ≤ x2 ≤ U(1 − Z1 )
Z1 = {0, 1}
(A.16)
Constraint (A.15) is obtained by replacing (A.8) and (A.9) in (A.6), and is expressed in terms of disaggregated variables. Constraint (A.16) results from replacing (A.14) into (A.11). An analogous development is applied to generate constraints (6) from disjunction (4). Likewise, constraint set (15) can be generated from disjunction (11); finally disjunction (18) is converted to (15a) and (21).
References Balas, E., 1985. Disjunctive programming and a hierarchy of relaxations for discrete optimization problems. SIAM J. Alg. Disc. Methods 6, 466–486. Balas, E., 1998. Disjunctive programming: properties of the convex hull of feasible points. Discrete Appl. Math. 89, 3–44. Brooke, A., Kendrick, D., Meeraus, A., Raman, R., 1998. GAMS: A User’s Guide (release 2.50). Gams Development Corporation, Washington, DC. Bryant, C.H., Rowe, R.C., 1998. Knowledge discovery in databases: application to chromatography. Trends Anal. Chem. 17, 18–24. Grossmann, I.E., Lee, S., 2000. New algorithms for nonlinear generalized disjunctive programming. Comput. Chem. Eng. 24 (9–10), 2125–2141. IBM, 1991. Optimization Subroutine Library (OSL): Guide and Reference (Release 2). Kingston, NY. Larsson, G., Jorgensen, S.B., Pons, M.N., Sonnleitner, B., Tijsterman, A., Tichener–Hooker, N., 1997. Biochemical engineering science. J. Biotechnol. 59, 3–9. Leser, E.W., 1996. Prot-Ex: an expert system for selecting the sequence of processes for the downstream purification of proteins. PhD Thesis, The University of Reading, Reading, UK. Lienqueo, M.E., 1999. Desarrollo de un sistema experto para la seleccion racional de procesos de purificacion de prote´ınas: optimizacion de criterios de seleccion de secuencias. PhD Thesis, University of Chile, Santiago, Chile.
E. V´asquez-Alvarez, J.M. Pinto / Journal of Biotechnology 110 (2004) 295–311 Lienqueo, M.E., Leser, E.W., Asenjo, J.A., 1996. An expert system for the selection and synthesis of multistep protein separation processes. Comput. Chem. Eng. 20 (Suppl.), S189–S194. Lienqueo, M.E., Salgado, J.C., Asenjo, J.A., 1998. Design of an expert system for selection of protein purification processes: comparison between different selection criteria. Computer Applications in Biotechnology. CAB7, Osaka, Japan, pp. 321–324. Lienqueo, M.E., Salgado, J.C., Asenjo, J.A., 1999. An expert system for selection of protein purification processes: experimental validation. J. Chem. Technol. Biotechnol. 74, 293–299. Raman, R., Grossmann, I.E., 1991. Relation between MILP modelling and logical inference for chemical process synthesis. Comput. Chem. Eng. 15 (2), 73–84. Raman, R., Grossmann, I.E., 1994. Modelling and computational techniques for logic based integer programming. Comput. Chem. Eng. 18 (7), 563–578.
311
Steffens, M.A., Fraga, E.S., Bogle, D.L., 2000. Synthesis of bioprocesses using physical properties data. Biotechnol. Bioeng. 68 (2), 218–230. Turkay, M., Grossmann, I.E., 1998. Tight mixed-integer optimization models for the solution of linear and nonlinear systems of disjunctive equations. Comput. Chem. Eng. 22 (9), 1229–1239. Vásquez-Alvarez, E., Lienqueo, M.E., Pinto, J.M., 2001. Optimal synthesis of protein purification processes. Biotechnol. Progr. 17, 685–696. Vecchietti, A., Grossmann, I.E., 1999. LOGMIP: a disjunctive 0–1 non-linear optimizer for process system models. Comput. Chem. Eng. 23, 555–565. Vecchietti, A., Grossmann, I.E., 2000. Modeling issues and implementation of language for disjunctive programming. Comput. Chem. Eng. 24 (9–10), 2143–2155.