Surrogate modeling of phase equilibrium calculations using adaptive sampling



Computers and Chemical Engineering 126 (2019) 204–217

Contents lists available at ScienceDirect

Computers and Chemical Engineering journal homepage: www.elsevier.com/locate/compchemeng

Surrogate modeling of phase equilibrium calculations using adaptive sampling Corina Nentwich∗, Sebastian Engell Department of Biochemical and Chemical Engineering, TU Dortmund University, Emil-Figge-Str. 70, Dortmund 44221, Germany


Article history: Received 1 December 2018; Revised 4 April 2019; Accepted 8 April 2019; Available online 11 April 2019.

Keywords: Surrogate models; Adaptive sampling; Phase equilibria; PC-SAFT

Abstract

Equation of state models such as the Perturbed-Chain Statistical Associating Fluid Theory (PC-SAFT) model are accurate and reliable prediction models for phase equilibria. Due to their iterative nature, however, they lead to long computation times and are therefore difficult to apply in chemical process optimization. To overcome this issue, surrogate modeling – replacing a complex model by a black-box model – can be used. A novel surrogate modeling strategy for phase equilibria is presented that combines the training of a classifier model with regression models for the phase composition using a mixed adaptive sampling method. We discuss in detail the selection of the parameters of the sampling algorithm and a suitable stop criterion for the example of the ternary liquid-liquid equilibrium system of n-decane, dimethylformamide and 1-dodecene. The sequential mixed adaptive sampling method is compared to a one-shot Latin hypercube sampling design. © 2019 Elsevier Ltd. All rights reserved.

1. Introduction

In computer-based process optimization, the reliability of the optimization result depends on the quality of the process model. In order to obtain an accurate representation of the process, models based on first principles are usually preferred. In the modeling of chemical processes, phase equilibria play an important role. For example, the solubility of a feed material in the reaction solution significantly influences the reaction rate, and the accurate computation of the composition of the vapor and liquid phases in equilibrium is fundamental to the modeling of distillation columns. For phase equilibrium calculations, activity coefficient models or equation of state models can be employed. Activity coefficient models require less computational effort, but are not applicable at elevated pressures, close to critical temperatures, or for multicomponent systems. For such systems, equation of state models should be preferred (Merchan and Wozny, 2016; Schäfer et al., 2014). For complex phase systems, advanced equation of state models such as the PC-SAFT model are suitable for accurate predictions over a broad range of operating conditions. The PC-SAFT model has been applied to a wide range of different systems (Kleiner et al., 2009; Kontogeorgis and Folas, 2010; Tumakaka et al., 2005). However, in order to solve phase equilibria using equation of state



Corresponding author. E-mail address: [email protected] (C. Nentwich).

https://doi.org/10.1016/j.compchemeng.2019.04.006 0098-1354/© 2019 Elsevier Ltd. All rights reserved.

models, the density root problem as well as the phase equilibrium conditions must be fulfilled, which requires the use of embedded calculations that lead to a significant computational effort. This makes these advanced thermodynamic models difficult to use for process optimization. In order to overcome this issue, the surrogate modeling methodology can be applied. Surrogate modeling is understood here as replacing a complex model by a simpler black-box model. One can distinguish between two different classes of surrogate modeling problems: classification problems with two or more discrete outputs and regression problems with one or more continuous outputs that are approximated. As the quality of the surrogate model depends on the choice of the training points, sampling is an important aspect of the surrogate modeling process. Since sampling involves computationally intense evaluations of the original function, the sampling objective is to sample as few points as possible with a maximum gain of information on the modeled phenomenon. There are mainly two approaches to sampling: sampling once (one-shot) or adaptive sampling. In many applications, one-shot space-filling designs, such as Latin hypercube sampling (LHS), Monte Carlo sampling or Halton sequences, are used to fit surrogate models. However, in many applications the modeled quantities exhibit complex responses such as discontinuities or strong curvature in specific regions of the input space. In space-filling designs, such complex structures are often not approximated well. For these cases, adaptive or sequential sampling designs can be used. In this context, the terms exploration – sampling in a space-filling manner – and exploitation – denser sampling in regions with complex behavior of the original function – are used. The trade-off between exploration and exploitation is an active field of research. Cozad et al. (2014) and Kleijnen and Van Beers (2004) focus on exploitation. The mixed adaptive sampling approach proposed in Eason and Cremaschi (2014), which was further improved by Jin et al. (2016), uses two equally weighted sampling objectives, while the recent work of Garud et al. (2017) proposes an approach of adaptive weighting of the two sampling objectives. Prior to sampling, the input and output variables of the surrogate modeling problem have to be defined. For a chemical system, the composition, temperature and pressure define the number of phases and their compositions. Modeling the phase composition of a mixture with a regression model in the range where the transition between a one-phase and a two-phase mixture occurs leads to a discontinuity in the output space. In general, discontinuities in the output space are hard to approximate accurately with surrogate models. Restricting the valid input range to a region which is biphasic in any case, as in other work (Nentwich and Engell, 2016), avoids discontinuities in the output space, but overconstrains the operating range, which is not desirable in process optimization. To cope with this problem, a novel surrogate modeling strategy for phase equilibria is investigated in this contribution. A classifier is trained on the obtained data in order to identify the biphasic region. In a second step, regression models are used to model the compositions of the two phases if the point is classified as being biphasic. This approach makes the use of equation of state models possible within chemical process simulation and optimization, since the computationally expensive frequent solution of the phase equilibrium is avoided.
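The two-stage strategy just described can be sketched as follows; the function and model container names here are hypothetical illustrations, not taken from the paper:

```python
# Minimal sketch of the classify-then-regress surrogate evaluation.
# `classifier` and the per-phase `regressors` are assumed to be
# pre-trained models (hypothetical interfaces).

def phase_equilibrium_surrogate(z, T, p, classifier, regressors):
    """Return the compositions of phases A and B for feed z at T, p."""
    x = (*z, T, p)                      # surrogate input vector
    if classifier.predict(x) == "biphasic":
        # regression models are only valid inside the miscibility gap
        return regressors["A"].predict(x), regressors["B"].predict(x)
    # monophasic: the composition remains unchanged
    return z, z
```

The key point is that the regression models are never queried outside the biphasic region, so the discontinuity at the phase boundary does not have to be approximated by a single continuous model.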
To obtain accurate classifier and regression models without calling the original thermodynamic model extensively for sampling, a mixed adaptive sampling scheme is applied. As a case study, the process of the hydroformylation of 1-dodecene performed in a thermomorphic solvent system is examined. The proposed surrogate modeling strategy is demonstrated for the modeling of the liquid-liquid equilibrium of the ternary system n-decane, dimethylformamide and 1-dodecene, which is calculated using the PC-SAFT equation of state model. For the considered case study, the mean computation time was 5.1 seconds per phase equilibrium calculation using PC-SAFT, which could be reduced to 0.003 seconds by applying the developed surrogate models on a standard computer (Windows 7, 3.6 GHz dual core Intel(R) i7, 18 GB RAM).

2. First principles modeling of phase equilibria

In this section, the phase equilibrium problem is defined and the solution procedure when applying equation of state models, e.g. PC-SAFT, is described. The results of this iterative procedure are used to provide data to train a surrogate model which computes the equilibrium by a simple function call.

2.1. The phase equilibrium conditions

The simplest approach to describe a phase equilibrium is a standard two-phase flash model in which the phase equilibrium problem can be divided into two subproblems. On the one hand, the component and mass balance conditions (Eqs. (1)–(2)) have to be fulfilled:

x_i^in = ψ·x_i^A + (1 − ψ)·x_i^B  (1)

0 = Σ_{i=1}^{NC} (x_i^A − x_i^B),  (2)

205

where x_i^j is the molar fraction of component i = 1, …, NC in phase j (feed, phase A or phase B) and ψ is the phase fraction of phase A. On the other hand, the thermodynamic equilibrium conditions (Eqs. (3)–(5)) have to hold:

T^A = T^B  (3)

p^A = p^B  (4)

μ_i^A = μ_i^B,  (5)

where T^j, p^j and μ_i^j are the temperature, pressure and chemical potential of component i in phase j. Inserting the definition of the chemical potential μ_i^j as given in Eq. (6), Eq. (5) leads to the isofugacity criterion in Eq. (7):



μ_i^j = μ_i^id(T, p^ref) + RT · ln(f_i^j / p^ref)  (6)

f_i^A = f_i^B,  (7)

where μ_i^id is the ideal chemical potential at temperature T and reference pressure p^ref, R is the universal gas constant and f_i^j is the fugacity of component i in phase j. In order to solve the isofugacity criterion (Eq. (7)), the fugacity of each component i has to be expressed for each phase. Since this is specific for the chosen thermodynamic model, it is shown in more detail for equation of state models in the next subsection.

2.2. Application of equation of state models

The fugacity f_i^j of a component i in phase j can be expressed in terms of its fugacity coefficient φ_i^j as shown in Eq. (8):

f_i^j = x_i^j · φ_i^j · p^j.  (8)

Since the following equations in this section are all phase-specific, the index for the phase j is not shown for convenience. The thermodynamic behavior of a system can be described by modeling the Helmholtz energy. The fugacity coefficient ϕ i can also be formulated in terms of the residual Helmholtz energy ares as shown in Eq. (9).



ln φ_i = a^res + (∂a^res/∂x_i) − Σ_{j=1}^{NC} x_j (∂a^res/∂x_j) + Z − 1 − ln Z,  (9)

where xi is the molar fraction of component i in the phase and Z is the compressibility factor of the phase. The compressibility factor Z can be calculated by solving the density root problem:



Z = 1 + ρ (∂a^res/∂ρ)  (10)

p = Z·k_B·T·ρ,  (11)

where ρ is the number density, k_B is the Boltzmann constant, p is the pressure and T is the temperature. Depending on the chemical system considered, a thermodynamic model for the Helmholtz energy a^res must be chosen. Here we use the PC-SAFT equation of state model, since it is capable of modeling not only temperature-dependent LLE, but also the solubility of supercritical components such as the synthesis gas in the reaction mixture of the considered process described in Section 5.

2.2.1. PC-SAFT

The Perturbed-Chain Statistical Associating Fluid Theory (PC-SAFT) equation of state model is one of the most detailed predictive models for phase equilibrium calculations. The model

206

C. Nentwich and S. Engell / Computers and Chemical Engineering 126 (2019) 204–217

Fig. 1. Scheme of the solution procedure when applying a flash model based on an equation of state model.

was developed by Gross and Sadowski (2001). In this approach, a molecule is modeled as a chain which is formed by spherical segments. The interaction of different chains is described by the repulsive behavior of a hard-chain reference system in the perturbation theory of Barker and Henderson (1967). Besides this repulsive hard-chain interaction (hc), the attractive interaction term of the dispersion (disp) is considered and a series of additional attractive interactions can be accounted for, e. g. association (assoc) and polar interactions (dipole). The residual Helmholtz energy (ares ) is described as the sum of the different interactive effects, as shown in Eq. (12):

a^res = a^hc + a^disp + a^assoc + a^dipole.  (12)

2.2.2. The solution procedure using PC-SAFT The solution of the phase equilibrium is done by an external model call of a standard two-phase pT-flash model implementation of Merchan and Wozny (2016), using the MATLAB toolbox AdiMat (Bischof et al., 2002) for the partial derivatives of ares in Eqs. (9)– (10). The equation system given by Eqs. (1), (2) and (7) can be reduced by introducing equilibrium factors κ ,

κ_i = x_i^A / x_i^B = φ_i^B / φ_i^A.  (13)

The resulting equation system is called the Rachford-Rice equation (Eq. (14)), which has to be solved for the phase fraction ψ of phase A (see also Eq. (2)):

0 = Σ_{i=1}^{NC} [ x_i^in (κ_i − 1) / (1 − ψ + ψ·κ_i) ].  (14)
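For fixed equilibrium factors κ_i, Eq. (14) is a monotone scalar equation in ψ and can be solved, for instance, by bisection. The following is a minimal numerical sketch, not the implementation used in the paper (which calls the flash model of Merchan and Wozny, 2016):

```python
def rachford_rice(z, kappa, tol=1e-12):
    """Solve Eq. (14) for the phase fraction psi of phase A by bisection.

    z     -- feed molar fractions x_i^in
    kappa -- equilibrium factors kappa_i = x_i^A / x_i^B
    """
    def g(psi):
        return sum(zi * (ki - 1.0) / (1.0 - psi + psi * ki)
                   for zi, ki in zip(z, kappa))

    lo, hi = 0.0, 1.0          # psi is a phase fraction
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # g is monotonically decreasing in psi
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

In the embedded flash calculation, κ_i itself depends on the compositions through the fugacity coefficients, so this inner solve is repeated within the successive-substitution loop described below.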

The iterative calculations that are necessary to simulate a process model using a flash model call to solve the phase equilibrium are summarized in Fig. 1. In order to solve the overall balance equations, the process solver chooses values for all process variables in the first iteration. Given the process conditions, the thermodynamic model is called. First, the density root problem (Eqs. (10)–(11)) is solved to obtain values for the compressibility factor Z and the number density ρ of each phase. Then it is possible to calculate the fugacity coefficients φ_i^j for each component i in phase j with the given composition x_i^j, the compressibility factor Z^j and number density ρ^j at the conditions T and p (see Eq. (9)). The equilibrium factors κ_i are calculated (Eq. (13)) and the Rachford-Rice equation (Eq. (14)) is solved for the phase fraction ψ. This value is used to update the compositions x_i^j using Eqs. (1) and (13). For this new composition, the density root problem is solved again. This is repeated until the values x_i^j do not change significantly anymore.

The new values of x_i^j are taken as new values for solving the overall balance equations in the next iteration of the process solver, which will have an effect on the values of all process variables in this iteration. If they change, the thermodynamic model is called again. The overall process model is solved as soon as the chosen values of x_i^j at the chosen process conditions are equal to the solution of the flash model call at these conditions. As this takes several iterations of the process solver and thus several calls of the thermodynamic model, this procedure is time-consuming even for process models using few phase equilibria calculated with an equation of state, and prohibitively slow in the optimization of complex process models with many phase equilibria. Therefore it is advantageous to replace this implicit phase equilibrium calculation by an explicit surrogate model.

2.3. The local model methodology

There are different possibilities of introducing explicit approximation models in order to reduce the computational effort that is needed to solve the phase equilibrium problem. The local model methodology was originally introduced to provide an approximation of κ_i (Eq. (13)). The first approaches used simplified concentration-independent expressions for κ_i (Boston and Britt, 1978; Leesley and Heyen, 1977). Since these models have been shown to be not very accurate, they were extended by, e.g., modifying the Margules model to introduce concentration dependencies (Chimowitz et al., 1983; 1984). Since these models are in general very limited with respect to their range of validity, e.g. because of being related to a reference state, the model parameters have to be updated as soon as the solution procedure has moved away from the current range of validity. In later work, the expression local model is also used as a term for linearized transformations or simplified partial derivatives of thermodynamic properties in general.

Several works estimate the optimal update method and frequency, e.g. (Macchietto et al., 1986; Støren and Hertzberg, 1994). For dynamic simulation and control studies, the application of such simple models is advantageous, since the states do not change very fast during the solution procedure. For process design and optimization, this is not feasible due to the fast change of states (Perregaard et al., 1992). Using approximations which are valid in a broader range in order to avoid parameter updates makes it necessary to employ more complex model structures. Also, the phase equilibrium still has to be solved during the solution of the balance equations of the process.


Our surrogate modeling approach is a promising alternative and is discussed in detail in the next section.

2.4. The surrogate modeling problem

First, a decision on the inputs and outputs of the surrogate model has to be made. The most intuitive method for representing phase equilibrium calculations is the approximation of the miscibility gap. Analogously to the last section, the number of outputs can be reduced by introducing the distribution coefficient K_i of each component (Eq. (15)) as the output instead of the molar fraction x_{i,j} of component i in each phase j:

K_i = ṅ_{i,A} / ṅ_{i,in} = (x_{i,A} · ṅ_A) / (x_{i,in} · ṅ_in) = (x_{i,A} / x_{i,in}) · (x_{i,B} − x_{i,in}) / (x_{i,B} − x_{i,A}),  (15)

where ṅ_{i,A} and ṅ_{i,in} are the molar flows of component i in phase A and in the feed of a continuous separation process, and ṅ_A and ṅ_in denote the total molar flows of phase A and of the feed. The distribution coefficient is only defined within the miscibility gap, which changes its shape with temperature. In order to model the shift of the biphasic region, a classifier is introduced. The distribution coefficient can then be approximated by a continuous regression model, and the regression model is used only if the classifier assigns the point to the biphasic region; otherwise the concentrations remain unchanged. Prior to the training of the models, sample points have to be selected and the original thermodynamic model must be evaluated for these points, which is time-consuming. Approaches to the selection of sample points are discussed in the next section.

3. Sampling

Especially for the case of expensive model evaluations, the goal of devising a sampling procedure is to sample as few points as possible but nonetheless obtain a model of good accuracy over the full range of inputs of interest. As the training locations have a strong effect on the accuracy of the surrogate model, finding the best sampling locations is an important aspect. A common approach is to equidistantly cover the complete input space, which is referred to as a space-filling or exploratory sampling design. This approach has the advantage that the original model can be evaluated in a one-shot manner, since no previous knowledge of the original function is needed. A brief overview of one-shot space-filling sampling designs is given in Section 3.1. The drawback of this sampling method is that the behavior of the original function is not taken into account in the selection. The consequence is that regions of complex structures such as discontinuities or strong curvature are often not covered by enough samples to capture the true behavior with the surrogate model.
An illustrative example is shown in Fig. 2. In this example, the 1-dimensional input x is mapped to a 1-dimensional output y. While for smaller values of x the chosen

Fig. 2. Illustrative example of a space-filling sampling design (cross: sample location).


sampling frequency might be sufficient, the behavior of the original function at larger values of x is not captured by the few samples in this region: the local minima and the local maximum are missed. The sampling objective of covering such regions with a denser sampling design is referred to as an exploitative sampling design. Since knowledge about the behavior of the original function in different regions of the input space is needed, these designs cannot be executed in a one-shot manner. They are implemented in sequential sampling approaches, which are reviewed in Section 3.2. Since both exploratory and exploitative sampling objectives are important to achieve a good performance of the surrogate model with few sampling points, they are combined in this work. Our algorithm is based on the approach of Eason and Cremaschi (2014), which is explained in more detail in Section 3.3.1. Modifications of the algorithm are explained in Section 3.3.2 and the stop criterion is discussed in Section 3.3.3.

3.1. One-shot designs

Since agglomerations of samples can occur in random sampling designs such as Monte Carlo methods, most one-shot sampling designs uniformly cover the input space. They are based on geometric patterns, such as dividing the input space into grids, or on the use of mathematical characteristics of sampling functions. The latter is applied in quasi-random sampling methods, which share the characteristic of a low discrepancy. The discrepancy of a sequence is connected to the uniform distribution of the sample locations, as it tends to zero when the number of samples tends to infinity in a fixed interval (Niederreiter, 1992). One example is the Halton sequence sampling method (Halton, 1960), which is commonly applied in surrogate modeling problems. Halton sequence sampling chooses the ith sample location in the input space, x_i, based on the Halton sequence:





x_i = ( Φ_{p_1}(i), Φ_{p_2}(i), …, Φ_{p_d}(i) ),  (16)

where Φ_{p_j} is the jth radical inverse function (see Eq. (17)) with the jth prime number p_j as base, and d is the dimension of the input space:

Φ_{p_j}(i) = Σ_{k=0}^{∞} a_k(i) · p_j^(−k−1),  (17)

where a_k(i) are the digit expansion coefficients of the integer i in base p_j:

i = Σ_{k=0}^{∞} a_k(i) · p_j^k.  (18)
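Eqs. (16)–(18) amount to mirroring the base-p digits of the integer i about the radix point, one prime base per input dimension. A minimal sketch:

```python
def radical_inverse(i, base):
    """Radical inverse Phi_p(i) of Eq. (17): mirror the base-p digits
    of the integer i about the radix point."""
    result, f = 0.0, 1.0 / base
    while i > 0:
        i, digit = divmod(i, base)
        result += digit * f
        f /= base
    return result

def halton(i, primes=(2, 3)):
    """ith Halton point (Eq. (16)); one prime base per dimension."""
    return tuple(radical_inverse(i, p) for p in primes)
```

For example, in base 2 the index 5 (binary 101) maps to 0.101 in binary, i.e. 0.625.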

This sampling design is deterministic and the number of samples can be chosen freely. Full-factorial or partial-factorial sampling designs address the space-filling objective by dividing the input space into discrete levels and locate samples at combinations of these different levels. This provides a good space coverage, but at the cost of having a huge sample set size, being the number of levels to the power of the number of input features in the case of a full factorial design. In Latin hypercube sampling (LHS), the input space is also divided into discrete levels. However, in contrast to classical factorial designs, the LHS design does not use all combinations of these levels. Instead, the intervals for each factor in the input space are defined as layers. In each layer only one sample is placed randomly. Fig. 3 shows two examples of LHS designs for a 2-dimensional input space. As a design as shown in Fig. 3b should be preferred over a design as shown in Fig. 3a, an optimization is performed, maximizing



Fig. 3. LHS design examples for a 2-dimensional input space.

the minimum distance between sample points. Thus, the space-filling objective is reached and the number of samples can be chosen freely. A review of one-shot sampling designs can be found in Simpson et al. (1997). To combine the exploratory sampling objective with an exploitative sampling objective, the sampling procedure has to be carried out in a sequential manner. Sequential sampling designs are discussed in the next section.
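The maximin-optimized LHS construction just described can be sketched as follows; the paper does not specify the optimizer, so a simple random-restart search is assumed here for illustration:

```python
import random
from itertools import combinations

def lhs(n, d, rng):
    """Latin hypercube: exactly one random sample per layer and dimension."""
    cols = []
    for _ in range(d):
        perm = list(range(n))
        rng.shuffle(perm)
        cols.append([(k + rng.random()) / n for k in perm])
    return list(zip(*cols))

def maximin_lhs(n, d, restarts=50):
    """Among several random LHS designs, keep the one that maximizes
    the minimum pairwise distance (a simple maximin heuristic)."""
    def min_dist(design):
        return min(sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
                   for p, q in combinations(design, 2))
    return max((lhs(n, d, random.Random(s)) for s in range(restarts)),
               key=min_dist)
```

Each returned design keeps the LHS property (one point per layer in every dimension) while spreading the points more evenly than a single random draw.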

3.2. Sequential designs There are applications in which it is reasonable to use purely space-filling sequential sampling designs (see Crombecq et al., 2011), but generally they are addressing the trade-off between exploration and exploitation. Starting from a small initial sampling design, new sample points are chosen from a set of candidates for which the true function evaluation has not yet been performed. Since the behavior of the original function is unknown prior to the function evaluations and the initial sampling design consists only of few points, complex regions have to be identified by an exploratory part of the sampling objective before they can be exploited by an exploitative part of the sampling objective. The nature of the sampling objective is in many works determined based upon characteristics of the chosen surrogate model. For example, the expected improvement function (Eq. (19)) that was introduced as a sampling objective by Jones et al. (1998) uses the ability of Kriging surrogates (see Section 4) to give an estimate on the standard deviation of the predicted value.







EI(x) = (y_min − ŷ(x)) · Φ((y_min − ŷ(x))/s(x)) + s(x) · φ((y_min − ŷ(x))/s(x)),  (19)

where Φ and φ denote the cumulative distribution function and the density function of the standard normal distribution,

and y_min is the current minimum value in the original function evaluations, ŷ(x) is the surrogate model prediction at sampling location x and s(x) is the standard error predicted by the Kriging surrogate model at x. There are also methods which are in principle applicable to different surrogate models. Crombecq et al. (2009) use a Voronoi tessellation of the input space in order to explore regions with few points and connect this with local linear models at each candidate point in order to meet the exploitative objective. Cozad et al. (2014) apply the approach of maximizing the error between surrogate model predictions and the true function value, which results in a high number of original function evaluations. In contrast, the mixed adaptive sequential sampling design that is further developed in this work avoids this by using a prediction variance estimator for the exploitative part of the sampling objective. In the following sections, the algorithm of Eason and Cremaschi (2014) and the modifications proposed in this work are discussed.

3.3. Mixed adaptive sequential design

The applied mixed adaptive sequential sampling approach is based on an algorithm (Eason and Cremaschi, 2014) which combines the space-filling objective with an estimation of the model prediction variance for choosing new points.

3.3.1. Principle

In each iteration, c new candidates are proposed without evaluating the original function. For each candidate j, j = 1, …, c, the minimum distance d_j to the n = 1, …, N points that are already present in the current design and the jackknife variance s²_j of the model predictions for each candidate are determined. The ith subset model, i = 1, …, NSS, is trained on NSS − 1 subsets, analogously to cross-validation methods; e.g., subset model 1 is trained on subsets 2 to NSS, leaving out the samples in subset 1. Candidates with the largest value of η_j (see Eq. (20)) are selected, the corresponding outputs are calculated and the procedure is repeated until a stopping criterion is met.
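The selection principle just described (Eqs. (20)–(24)) can be sketched as follows; the data containers are hypothetical, and the standard equal-subset-size jackknife variance is used:

```python
import math

def jackknife_variance(y_full, y_subset):
    """Jackknife variance (Eqs. (22)-(24)) from the overall prediction
    y_full and the NSS leave-one-subset-out predictions y_subset."""
    nss = len(y_subset)
    pseudo = [nss * y_full - (nss - 1) * yi for yi in y_subset]       # Eq. (24)
    mean = sum(pseudo) / nss                                          # Eq. (23)
    return sum((p - mean) ** 2 for p in pseudo) / (nss * (nss - 1))   # Eq. (22)

def select_candidates(candidates, design, y_full, y_subset, n_add):
    """Rank candidates by eta_j (Eq. (20)) and return the n_add best."""
    # minimum Euclidean distance to the current design, Eq. (21)
    d = [min(math.dist(c, x) for x in design) for c in candidates]
    s2 = [jackknife_variance(y_full[j], y_subset[j])
          for j in range(len(candidates))]
    dmax, smax = max(d), max(s2)
    eta = [d[j] / dmax + s2[j] / smax for j in range(len(candidates))]
    order = sorted(range(len(candidates)), key=lambda j: -eta[j])
    return [candidates[j] for j in order[:n_add]]
```

The distance term drives exploration, the jackknife variance term drives exploitation, and the normalization by the respective maxima puts both on a comparable scale.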

η_j = d_j / max_j d_j + s²_j / max_j s²_j,  (20)

where d_j is the minimum Euclidean distance (Eq. (21)) between candidate j and the points of the current design and s²_j is the jackknife variance (Eq. (22)), calculated from the weighted average (Eq. (23)) of the jackknife pseudo values ỹ_ji of each candidate j predicted by subset model i (Eq. (24)):

d_j = min_{n∈{1,…,N}} ||x_j − x_n||₂  (21)

s²_j = (1/(NSS(NSS − 1))) · Σ_{i=1}^{NSS} (ỹ_ji − ȳ̃_j)²  (22)

ȳ̃_j = (1/NSS) · Σ_{i=1}^{NSS} ỹ_ji  (23)

ỹ_ji = NSS · ŷ^(0) − (NSS − 1) · ŷ^(−i),  (24)

where yˆ(0 ) is the prediction for candidate j of the overall model, trained on all i = 1, . . . , NSS subsets and yˆ(−i ) is the prediction for candidate j of subset model i. The candidates with the highest value of ηj are selected and added to the sampling design. For these selected sampling locations the original function is evaluated and the sampling design is updated. The number of added points is determined by the selection factor SF (Eq. (25)).

c_add = SF · N,  (25)

where c_add is the number of selected points and N is the number of candidates. So the number of additional samples increases in each round. Modifications of this sampling method are explained in the next section.

3.3.2. Extension of the algorithm

In the application at hand, the classifier is trained on all samples in the current design, while the regression models are trained only on the samples in the biphasic region to avoid discontinuities in the output space. In consequence, the subsets are not of equal size. In order to account for this in the jackknife variance calculation, Eqs. (22)–(24) are changed to the modified jackknife variance calculation:

s²_j = Σ_{i=1}^{NSS} (1/(N(N − N_i))) · (ỹ_ji − ȳ̃_j)²  (26)

ȳ̃_j = Σ_{i=1}^{NSS} [ (N − N_i) / Σ_{I=1}^{NSS} (N − N_I) ] · ỹ_ji  (27)

ỹ_ji = N · ŷ^(0) − (N − N_i) · ŷ^(−i),  (28)

where N is the total number of samples in the current design and Ni is the number of samples in subset i. Thus the subset model predictions of subset models that are based on more samples are accounted with a higher weight for the modified jackknife variance calculation. Since both the classifier and the regression models should be considered in the exploitative criterion, the variance term for the decision criterion ηj in Eq. (20) is chosen according to Eqs. (29)– (30):

s²_j / max_j s²_j = (1/2) · s²_class,j / max_j s²_class,j + (1/2) · s²_appr,j / max_j s²_appr,j  (29)

s²_appr,j / max_j s²_appr,j = (1/NC) · Σ_{i=1}^{NC} s²_appr,i,j / max_j s²_appr,i,j,  (30)

where s²_class,j is the modified jackknife variance of the classifier predictions of candidate j and s²_appr,i,j is the modified jackknife variance of the regression model predictions of candidate j for component i = 1, …, NC.

3.3.3. Stop criterion

In any sequential sampling approach a criterion is needed to terminate the sampling procedure. Eason and Cremaschi (2014) propose the slope ratio α as:

α = | (Error slope from iteration i − 1 to i) / max(Error slope in iterations ≤ i) |.  (31)

As the number of iterations increases, the slope ratio declines. A lower threshold value ε for α is set to terminate the algorithm. Since two different types of models are involved in this work, a classifier and regression models, the total slope ratio α_tot is defined analogously to Eqs. (29)–(30) as a weighted sum of the slope ratios of the different models (see Eqs. (32)–(33)):

α_tot = (1/2) · α_class / max(α_class) + (1/2) · α_appr / max(α_appr)  (32)

α_appr / max(α_appr) = (1/NC) · Σ_{i=1}^{NC} α_appr,i / max(α_appr,i)  (33)

The considered error metrics that are applied to α_class and α_appr,i are discussed in Section 4.3. During the course of the mixed adaptive sampling method, the classification and regression models are trained several times. A brief introduction into the surrogate models that are used in this work is given in the next section.

4. Surrogate models

Classifiers and regression models are commonly used to solve different problems in the field of chemical engineering. Examples of classification problems in this field are fault detection and diagnosis (Chiang et al., 2004; Onel et al., 2018) and drug design (Byvatov et al., 2003). The application of surrogate models for regression problems to reduce the computational effort is gaining increasing popularity (see Cremaschi, 2015). Henao and Maravelias (2011) use neural networks to represent process units in a superstructure optimization, Caballero and Grossmann (2008) use Kriging models to model the relationship of different variables in a distillation column simulation and Keßler et al. (2019) propose a Kriging-based approach for the deterministic global optimization of processes including a distillation column. Cozad et al. (2014) use the model building tool ALAMO and combine different basis functions to model a solid sorbent adsorber in the optimization of a carbon capture process. Beykal et al. (2018) use the optimization framework ARGONAUT (AlgoRithms for Global Optimization of coNstrAined grey-box compUTational problems) to apply different types of surrogate models as grey-box constraints for computationally demanding derivative-free global optimization problems. Olofsson et al. (2018) use Gaussian processes in the Bayesian multiobjective optimization of a biological tissue engineering application. Recent reviews of surrogate modeling applications in modeling, feasibility analysis and optimization are given in Bhosekar and Ierapetritou (2018) and McBride and Sundmacher (2019).

In this work, Support Vector Machine (SVM) models are chosen for the classification problem and ordinary Kriging models are chosen for the regression problem. The choice of the models and their structure was made in a pre-study which showed that models of this structure are suitable for the case study. Generally, the applied sampling method does not exploit any surrogate-dependent characteristic and can thus be applied with any other type of surrogate model.

4.1. Classification using SVM

The basic idea of Support Vector Machines (SVM) is to design a hyperplane function y(x) as given in Eq. (34) that separates the training data into two groups:

y(x) = Σ_{i=1}^{S} a_i · y_t,i · k(x, s_i) + b,  (34)

209

where ai are model parameters, k(x, si ) is the kernel function value at input x with respect to the support vector input si with i = 1, . . . , S being the index of the S support vectors and yt,i being the corresponding output to the support vector input si and b is a constant model parameter. The kernel function can be chosen arbitrarily. In this work, a polynomial of 3rd order has been selected in the prestudy (see Eq. (35)).

k(x, s_i) = \left( 1 + x^T s_i \right)^3   (35)
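As a brief illustration, Eqs. (34)–(35) can be evaluated directly once the support vectors and coefficients are known. The values below are hand-picked for illustration, not a fitted model:

```python
import numpy as np

def poly_kernel(x, s):
    # 3rd-order polynomial kernel, Eq. (35): k(x, s) = (1 + x^T s)^3
    return (1.0 + x @ s) ** 3

def svm_decision(x, a, y_t, S, b):
    # hyperplane function, Eq. (34): y(x) = sum_i a_i * y_t,i * k(x, s_i) + b
    return sum(a_i * y_i * poly_kernel(x, s_i)
               for a_i, y_i, s_i in zip(a, y_t, S)) + b

# hand-picked illustrative values, not a fitted model
S = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]  # support vectors s_i
a, y_t, b = [0.5, 0.5], [-1.0, 1.0], 0.1          # weights, labels, offset
label = np.sign(svm_decision(np.array([1.0, 1.0]), a, y_t, S, b))  # class in {-1, +1}
```

The predicted class is the sign of the hyperplane function value.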

The distance between the hyperplane and the nearest training data points (margin) is maximized by choosing the support vectors and estimating the model parameters ai and b by solving a reformulation of the optimization problem in Eq. (36) (Bishop, 2006)

\max_{w,b} \left\{ \frac{1}{\|w\|} \min_n \left[ y_n \left( w^T \phi(s_n) + b \right) \right] \right\},   (36)

where w is a function of a_i given in Eq. (37) and φ is the kernel-function-dependent decomposition function, see Eq. (38):

w = \sum_{i=1}^{S} a_i y_{t,i} \phi(s_i),   (37)

k(x, x') = \phi(x)^T \phi(x').   (38)

By this optimization, the maximum margin for a given number of support vectors is achieved. For estimating the model parameters, the MATLAB Statistics and Machine Learning toolbox is used (MATLAB, 2017).

4.2. Regression using Kriging models

Kriging is an interpolation approach that originates from the field of geostatistics and was introduced by Krige (1951) and


Matheron (1963). The predictor function y(x) consists of a regression and a correlation term, as shown in Eq. (39):

y(x) = f(x)^T \beta + r(x)^T \gamma,   (39)

where f(x) is the vector of regression basis functions and r(x) is the correlation function vector of input x; β and γ are model parameters. The correlation function vector is defined as in Eq. (40):

r(x) = \left[ R(\theta, s_1, x), \ldots, R(\theta, s_m, x) \right]^T,   (40)

where R is the correlation function which is chosen to be Gaussian in this work (see Eq. (41)), θ ∈ Rn is a parameter vector and si are the m input locations used for the design of the model.

R(\theta, a, b) = \prod_{k=1}^{n} \exp\left( -\theta_k \, |a_k - b_k|^2 \right)   (41)
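The Gaussian correlation and the ordinary Kriging predictor can be sketched on a toy 1-D data set as follows. The training data and θ below are illustrative placeholders, not values from the paper; β and γ are computed from the generalized least squares expressions of Eqs. (42)–(43):

```python
import numpy as np

def corr(theta, a, b):
    # Gaussian correlation, Eq. (41): prod_k exp(-theta_k * |a_k - b_k|^2)
    return np.exp(-np.sum(theta * (a - b) ** 2))

# toy 1-D training data (illustrative only, not from the paper)
S = np.array([[0.0], [0.5], [1.0]])   # training input locations s_i
y_t = np.array([0.0, 1.0, 0.5])       # training outputs
theta = np.array([10.0])

m = len(S)
R_t = np.array([[corr(theta, S[i], S[j]) for j in range(m)] for i in range(m)])
F_t = np.ones((m, 1))                 # ordinary Kriging: f(x) = 1

# generalized least squares estimates, Eqs. (42)-(43)
Rinv = np.linalg.inv(R_t)
beta = np.linalg.solve(F_t.T @ Rinv @ F_t, F_t.T @ Rinv @ y_t)
gamma = Rinv @ (y_t - F_t @ beta)

def predict(x):
    # Kriging predictor, Eq. (39): y(x) = f(x)^T beta + r(x)^T gamma
    r = np.array([corr(theta, s, x) for s in S])
    return float(beta[0] + r @ gamma)
```

Because Kriging is an interpolation method, the predictor reproduces the training outputs at the training inputs exactly (up to numerical precision).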

The model parameter β in Eq. (39) is the generalized least squares solution regarding the training set (see Eq. (42)), γ is defined as in Eq. (43).



\beta = \left( F_t^T R_t^{-1} F_t \right)^{-1} F_t^T R_t^{-1} y_t,   (42)

\gamma = R_t^{-1} \left( y_t - F_t \beta \right),   (43)

where F_t is the matrix of basis function values, R_t the correlation matrix and y_t the vector of training outputs. The index t indicates that these quantities are evaluated at the i = 1, ..., m training input locations s_i and are thus constant in the prediction function. There are different variants of the Kriging approach, depending on the form of the regression term. In simple Kriging, the regression term is a known constant β (f(x) = 1), whereas in ordinary Kriging, the regression term is a constant parameter β (f(x) = 1) estimated by generalized least squares. In universal Kriging, independent basis functions f(x) are used. For estimating the model parameters, the MATLAB toolbox DACE is used (Lophaven et al., 2002). Kriging models provide a direct estimation of the prediction variance. As the jackknife variance has been shown to be a superior method in comparison (Kleijnen and Van Beers, 2004) and to ensure that the proposed method is applicable to any surrogate model type, the Kriging prediction variance is not applied in this work. In order to have an indication of the quality of the surrogate models, suitable validation measures have to be applied. The applied methods are explained in the following.

4.3. Surrogate model validation

To compare the quality of different models, performance criteria are evaluated for a test set. As a quality criterion for the classification models, the percentage of misclassified samples of the test set MisC, as defined in Eq. (44), is used:

MisC = \frac{100\%}{2 N_{test}} \sum_{j=1}^{N_{test}} \left| \hat{y}_{class}(x_j) - y_{class}(x_j) \right|,   (44)

where N_test is the test set size, \hat{y}_{class}(x_j) is the classifier model prediction for test sample input x_j and y_{class}(x_j) is the actual class of test sample x_j from the original model calculation. For the regression models, the Mean Absolute Error MAE (as defined in Eq. (45)) with respect to the predictions of the test set is used as the error criterion:

MAE_i = \frac{1}{N_{test}} \sum_{j=1}^{N_{test}} \left| \hat{y}_{i,appr}(x_j) - y_{i,appr}(x_j) \right|,   (45)

where \hat{y}_{i,appr}(x_j) is the surrogate model prediction for test sample input x_j and y_{i,appr}(x_j) is the actual output value for component i from the original model. Using these error metrics for testing the model prediction quality on a large test set of 5000 original function evaluations, the model quality can be evaluated in this work. For calculating the stop criterion, a measure of the model quality has to be defined. For the calculation of the modified jackknife variance (Eq. (28)), subset models \hat{y}^{(-k)} are trained on all subset samples except for subset k. The predictions of each subset model \hat{y}^{(-k)} for the subset k can then be used for calculating a cross-validation error. In this case, the percentage of misclassified samples CVE(MisC) and the mean absolute cross-validation error of component i, CVE(MAE)_i, are applied:

CVE(MisC) = \frac{100\%}{2 N_{train}} \sum_{j=1}^{N_{train}} CVE(MisC)\left( x_j^{(k)} \right),   (46)

CVE(MisC)\left( x_j^{(k)} \right) = \left| \hat{y}_{class}^{(-k)}\left( x_j^{(k)} \right) - y_{class}\left( x_j^{(k)} \right) \right|,   (47)

CVE(MAE)_i = \frac{1}{N_{train}} \sum_{j=1}^{N_{train}} CVE(MAE)_i\left( x_j^{(k)} \right),   (48)

CVE(MAE)_i\left( x_j^{(k)} \right) = \left| \hat{y}_{i,appr}^{(-k)}\left( x_j^{(k)} \right) - y_{i,appr}\left( x_j^{(k)} \right) \right|,   (49)

where \hat{y}_{class}^{(-k)}(x_j^{(k)}) and \hat{y}_{i,appr}^{(-k)}(x_j^{(k)}) are the predictions of the subset models trained on all subsets but subset k for sample input x_j belonging to subset k and for component i; y_{class}(x_j^{(k)}) and y_{i,appr}(x_j^{(k)}) are the actual output values from the original model for input x_j belonging to subset k and for component i. In the following, the case study to which the mixed adaptive sampling method is applied for training the SVM classifier and Kriging models is described.

5. Case study

As a case study, the process of the hydroformylation of 1-dodecene to the main product n-tridecanal has been chosen (Kiedorf et al., 2014). This process has been developed up to the technical realization in two miniplants in the collaborative research center/transregio 63 "Integrated chemical processes in liquid multiphase systems" InPROMPT. Two different strategies of tunable solvent systems have been pursued. The reaction has been performed in a microemulsion process by employing surfactants in the miniplant at TU Berlin (Illner et al., 2018; Müller et al., 2017). The process considered here is performed in a ThermoMorphic solvent System (TMS). The process has been modeled in detail (Hentschel et al., 2014; Kaiser et al., 2016; Kiedorf et al., 2014; McBride et al., 2016; McBride and Sundmacher, 2015) and was realized and iteratively optimized online in a miniplant (Dreimann et al., 2017; Hernández and Engell, 2016; Zagajewski et al., 2016). It was also investigated in several optimization studies, see e.g. (Hentschel et al., 2015; 2014; Keßler et al., 2017; Steimel and Engell, 2016). A TMS is composed of a polar (P in Fig. 4) and a non-polar solvent (N in Fig. 4), in this case DMF and n-decane. This leads to a temperature-dependent miscibility gap with a middle-polar component (M in Fig. 4), in this case 1-dodecene and tridecanal. The reaction is performed at an elevated temperature, so that the reaction mixture forms one homogeneous liquid reaction phase, as shown in Fig. 4a. After leaving the reactor, the mixture is cooled down. Due to the temperature-dependent miscibility gap, the mixture decomposes into two liquid phases, as shown in Fig. 4b. The catalyst system for this process


Fig. 4. TMS process scheme (Brunsch, 2013).

Table 2
Considered ranges for sampling of the surrogate model input variables temperature T, feed fractions of n-decane x_{n-decane,in}, DMF x_{DMF,in} and 1-dodecene x_{1-dodecene,in} for the LLE calculations at 1 bar.

Input variable      | Range
T [K]               | 298.15 – 343.15
x_{n-decane,in}     | 0 – 1
x_{DMF,in}          | 0 – 1
x_{1-dodecene,in}   | 0 – 1
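For reference, the one-shot Latin hypercube design used as the baseline comparison in Section 6.2 can be sketched as follows. This is a minimal, unoptimized LHS in the unit hypercube; scaling to the physical ranges of Table 2 and any composition constraints are omitted here:

```python
import random

def latin_hypercube(n, d, seed=0):
    # one point per stratum in each of the d dimensions,
    # strata randomly paired across dimensions
    rng = random.Random(seed)
    cols = []
    for _ in range(d):
        strata = [(i + rng.random()) / n for i in range(n)]
        rng.shuffle(strata)
        cols.append(strata)
    return [tuple(col[i] for col in cols) for i in range(n)]

# e.g. 5 samples over the 4 inputs of Table 2, scaled to [0, 1]
design = latin_hypercube(5, 4)
```

Each one-dimensional projection of the design contains exactly one point per stratum, which is the defining property of a Latin hypercube.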

Fig. 5. TMS process flow sheet (Nentwich and Engell, 2016).

(Rh(acac)(CO)2/Biphephos) is polar, therefore the polar phase is recycled to the reactor, while the non-polar phase, which is rich in product, is further purified in the downstream process. The translation of this process into a process flow sheet is depicted in Fig. 5. In the considered process, the reactor can be operated at temperatures between 358 K and 378 K and pressures between 10 bar and 30 bar (Hernandez et al., 2018), while the heat exchanger and decanter are operated at a pressure of 1 bar and temperatures between 298.15 K and 343.15 K. The accurate description of the solubility of the supercritical reactants hydrogen and carbon monoxide in the reaction mixture as well as of the temperature-dependent LLE is crucial for predicting the performance of the process. During the investigation of this process, thermodynamic models accurately describing the behavior of all phase equilibria have been developed using PC-SAFT (Schäfer et al., 2012; 2014; Vogelpohl et al., 2013; 2014). The goal of our work is to perform process optimization based upon these models. In this paper we only discuss the replacement of the PC-SAFT LLE model of the decanter by surrogate models. The ternary LLE of the solvents n-decane and DMF with the feed 1-dodecene is considered here. The pure component PC-SAFT parameters are shown in Table 1. The binary interaction of the components is described by the binary interaction parameters:

k_{ij} = -0.000315 \, T/\mathrm{K} + 0.1159   (50)

for DMF/n-decane,

k_{ij} = -0.000311 \, T/\mathrm{K} + 0.1128   (51)

for DMF/1-dodecene, and k_{ij} = 0 for n-decane/1-dodecene, taken from Schäfer et al. (2012). The considered ranges for sampling of the input variables for surrogate modeling are shown in Table 2. The results of the application of the mixed adaptive sampling method for this task are discussed in the next sections.

6. Results

In this section, the mixed adaptive sampling algorithm is applied to the case study of the ternary LLE of 1-dodecene, n-decane and DMF. The choice of the sampling parameters, which are the number of subsets NSS (Section 6.1.1) and the selection factor SF (Section 6.1.2), is discussed first. The obtained models are compared with models that are based on a conventional LHS design of the same size in Section 6.2 to see the benefits of the sequential sampling approach. In order to analyze the performance of the models and to be able to compare different models, one large test set with 5000 sample points that have been evaluated by PC-SAFT serves as reference for the error measures MisC, the fraction of misclassified samples (see Eq. (44)), and MAE_i, the mean absolute error (see Eq. (45)). In Sections 6.1 and 6.2, the results are discussed for mixed adaptive sampling runs that are evaluated up to a high number of iterations. The choice of the stop criterion is discussed in Section 6.3.

6.1. Decision on sampling parameters

In order to decide on a suitable parametrization of the algorithm, several mixed adaptive sampling runs were performed. The runs for each combination of the algorithm parameters SF and NSS were each performed 5 times. First, the impact of the number of subsets NSS on the model performance is analyzed.

6.1.1. Number of subsets

The number of subsets NSS is connected to the exploitative objective of the mixed adaptive sampling algorithm. To find a criterion that reflects the impact of NSS on the selection of candidates, the share of the variance criterion in the total criterion value η (see Eq. (20)) for the selected new samples c_add in each iteration, pRCV, is analyzed:

pRCV = \frac{s_j^2}{\eta_j \cdot \max_j s_j^2},   (52)

where the index j in this case refers only to selected candidates c_add in each iteration of the algorithm. Plotting this measure in a box plot over the number of iterations for a low number of subsets (in this case NSS = 2) in Fig. 6 reveals that the influence of the variance criterion on the decision is changing over the course of the algorithm.
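A minimal sketch of how Eq. (52) can be evaluated for the selected candidates of one iteration; the variances s_j^2 and total criterion values η_j below are placeholders, not values from the study:

```python
def p_rcv(s2_selected, eta_selected):
    # Eq. (52): pRCV_j = s_j^2 / (eta_j * max_j s_j^2),
    # where j runs over the selected candidates c_add of one iteration
    s2_max = max(s2_selected)
    return [s2 / (eta * s2_max) for s2, eta in zip(s2_selected, eta_selected)]

# placeholder values for three selected candidates
shares = p_rcv([0.04, 0.01, 0.02], [1.0, 0.5, 0.8])
```

A value near 1 means the candidate was selected almost entirely because of its prediction variance (exploitation); a value near 0 means the space-filling part of the criterion dominated.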

Table 1
Pure component PC-SAFT parameters of the considered components.

Component    | Reference                  | M (g/mol) | m (−)  | σ (Å)  | ε/k (K) | μ (D) | T range (K)
n-decane     | Gross and Sadowski (2002)  | 142.285   | 4.6627 | 3.8384 | 243.87  | –     | 243–617
DMF          | Schäfer et al. (2012)      | 73.095    | 2.3660 | 3.6359 | 312.99  | 4.12  | 300–630
1-dodecene   | Schäfer et al. (2012)      | 168.320   | 5.0091 | 3.9413 | 254.86  | 1.70  | 310–630


Fig. 6. Boxplot of pRCV over the iterations of the sampling algorithm for NSS = 2 and SF = 0.4 with 25%- and 75%-percentiles (box), median (line in box), minimum and maximum (bars) and outliers (markers), based on all selected points in 5 runs.

Fig. 7. Boxplot of pRCV over the iterations of the sampling algorithm for NSS = 18 and SF = 0.4 with 25%- and 75%-percentiles (box), median (line in box), minimum and maximum (bars) and outliers (markers), based on all selected points in 5 runs.

The median is at 45–64% in the first 4 iterations and decreases to 0.1% in the following iterations. While for a low number of iterations, the 25%- and 75%-percentiles have a broad spread of about 40%, they narrow to a range of ΔpRCV = 3% in the last iteration. There are several outliers indicated: 6 out of the 380 selected samples in the 7th iteration, 20 out of the 530 selected samples in the 8th iteration and 423 out of the 2040 selected samples in the last iteration. A similar behavior can be observed for a large number of subsets (in this case NSS = 18) as shown in Fig. 7. The decrease of the median pRCV over the course of the iterations is initially faster in this case than for NSS = 2. After the 4th iteration, the median is already at 23% and decreases to 0.1% in the last iteration. Also, a narrowing of the 25%- and 75%-percentiles can be seen. In the last iterations there are outliers indicated in the box plot, 154 out of the 2040 selected samples in the last iteration. This trend of a decreasing pRCV over the course of the algorithm shows that the impact of the exploitative sampling objective on the selection of points is stronger in the first iterations of the mixed adaptive sampling algorithm, while in later iterations only the space-filling objective is pursued by adding new sample points to the sample set. Choosing a higher NSS strengthens this effect. The impact of NSS on the performance of the surrogate models

Fig. 8. MisC over the iterations of the sampling algorithm for SF = 0.4 and NSS = 2 and NSS = 18 showing the mean(markers), minimum and maximum(bars) of each 5 runs.

Fig. 9. MAEi for component i over the iterations of the sampling algorithm for SF = 0.4 and NSS = 2 and NSS = 18 showing the mean(markers), minimum and maximum(bars) of each 5 runs.

is discussed in the following. For analyzing the influence of the number of subsets NSS on the surrogate model performance, we distinguish between the classifier model performance and the performance of the Kriging models. In Fig. 8, the MisC with respect to the test set is shown for NSS = 2 and NSS = 18. Differences in the performance for the runs with NSS = 2 and NSS = 18 can be seen in the first 5 iterations. In the following iterations, the difference is minor. Regarding the spread of the performance over the 5 runs, the spread is broader for the runs with the NSS = 2 configuration. At a high number of iterations, this difference decreases. For the same runs of the sampling algorithm, the results of the MAE_i for the three different components are shown in Fig. 9. Considering the mean of the results of all 5 runs, the difference between the NSS = 2 and NSS = 18 runs can mainly be seen for the component DMF. In iterations 2 to 4 of the mixed adaptive sampling algorithm, the performance of the predictions of K_DMF is superior for NSS = 18. At a higher number of iterations, the differences are minor. Similarly to the MisC, the spread of the MAE_i results in each iteration is broader for the runs with NSS = 2. As we want to design a reliable method with as small a variance in the performance as possible, the value of NSS = 18 is chosen for further analysis. The number of subsets NSS is equal to the number of models that have to be trained for each subset (see Eq. (24)), so


Fig. 10. MisC over the classifier sample size of the sampling algorithm for NSS = 18 and SF = 0.2 and SF = 0.6 showing the mean(markers), minimum and maximum(bars) of each 5 runs.


Fig. 11. MAEi for each component i over the Kriging sample size of the sampling algorithm for NSS = 18 and SF = 0.2 and SF = 0.6 showing the mean(markers), minimum and maximum(bars) of each 5 runs.

the computational effort of the method will increase significantly if a surrogate model is applied that requires a computationally expensive training method, which is not the case for the models used here. In such cases, a lower value could be preferred. In the next section, the choice of the selection factor SF is discussed.

6.1.2. Selection factor

In this section, the number of subsets is kept constant at NSS = 18 and different selection factors SF are considered. Again, the performance of the classifier model and of the Kriging models is discussed separately. Fig. 10 shows the MisC-value of the classifier models for the mixed adaptive sampling with NSS = 18 and 5 runs each with SF = 0.2 and SF = 0.6, respectively. The error decreases with the sample size in both cases, but the performance is different. The MisC value is lower for runs with a lower selection factor of SF = 0.2 compared to the case of SF = 0.6. The mean MisC at a sample size of 266 is 2.3% for SF = 0.2, while being 3.4% at a sample size of 261 for SF = 0.6. In the last iterations, the MisC-difference between the two cases is minor; the mixed adaptive approach obtains a classifier model with a mean test set error of MisC = 1.04% at 1648 sample points with SF = 0.2, and MisC = 1.02% at 1712 sample points for SF = 0.6. Thus it can be concluded that a lower selection factor leads to better performing classifier models at low sample sizes. The behavior of the Kriging models with respect to the selection factor is different. Fig. 11 shows the mean MAE-values of the Kriging models for the mixed adaptive sampling with NSS = 18 and 5 runs with SF = 0.2 and SF = 0.6. In contrast to the classifier model, the Kriging models generated at lower sample sizes generally perform better in the case of a higher selection factor.
At a lower sample size of 104 sample points, the MAEi for component DMF is 0.012 for the SF = 0.2 runs, while being at a lower value of 0.007 at a similar training sample size of 103 sample points for the SF = 0.6 case. At higher sample sizes of more than about 240 sample points, the difference is minor. The results are based on the same mixed adaptive sampling runs as those shown in Fig. 10, but the training data sample size is reduced since only the biphasic samples are used to train the Kriging models. The results show that a compromise between classifier and Kriging model performance has to be found. As a compromise, the medium selection factor SF = 0.4 is considered in the further investigation. The performance of this parametrization of the

Fig. 12. MisC over the classifier sample size of the sampling algorithm for NSS = 18 and SF = 0.4 and same-sized LHS design-based models, showing the mean(markers), minimum and maximum(bars) of each 5 runs.

mixed adaptive sampling algorithm at a chosen number of subsets NSS = 18 and SF = 0.4 is compared with the conventional one-shot space-filling design LHS, which is discussed in the next section.
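The two test-set error measures used for the comparisons in this section, Eqs. (44)–(45), can be sketched as follows; the coding of the class labels as ±1 is our assumption, consistent with the factor 2 in the denominator of Eq. (44):

```python
def misc(y_pred, y_true):
    # Eq. (44): with class labels coded as +1/-1, |yhat - y| is 0 or 2,
    # so dividing the sum by 2*N_test gives the misclassified fraction in %
    return 100.0 * sum(abs(p - t) for p, t in zip(y_pred, y_true)) / (2 * len(y_true))

def mae(y_pred, y_true):
    # Eq. (45): mean absolute error of the regression predictions
    return sum(abs(p - t) for p, t in zip(y_pred, y_true)) / len(y_true)

print(misc([1, -1, 1, 1], [1, 1, 1, 1]))  # one of four samples misclassified -> 25.0
```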

6.2. Comparison with LHS design-based models

To compare the results of the mixed adaptive sampling algorithm with LHS, for the sample size in each iteration of the sequential sampling algorithm, 5 LHS designs of the same size were set up. The performance plots discussed are based on classifier models and Kriging models which were trained on these LHS designs and on the models trained by 5 runs of the mixed adaptive sampling algorithm with SF = 0.4 and NSS = 18. First considering the mean classifier model performance, the models trained with the mixed adaptive sampling algorithm outperform the LHS design-based models at any sample size, as can be seen in Fig. 12. Especially at medium sample sizes, e.g. 521 sample points, the difference in the mean performance is significant, with MisC = 1.1% more misclassified samples for the LHS design-based classifiers compared to the models trained by the sequential approach. At high sample sizes, the difference is smaller. For training the classifier models, the mixed adaptive approach thus leads to superior performance compared to the purely space-filling design.


Fig. 13. MAEi for the Ki of the three components over the Kriging sample size of the sequential sampling algorithm for NSS = 18 and SF = 0.4 showing the mean(markers), minimum and maximum(bars) of each 5 runs of the models trained in the mixed adaptive sampling algorithm and the models based on LHS designs.

Fig. 14. MAEi of xi of the three components in the non-polar and polar phase over the Kriging sample size of the sequential sampling algorithm for NSS = 18 and SF = 0.4 showing the mean(markers), minimum and maximum(bars) of each 5 runs of the models trained in the mixed adaptive sampling algorithm and the models based on LHS designs.

For the Kriging models, the performance difference is not as pronounced. The comparison is shown in separate diagrams for each component of the mixture in Fig. 13. It can be seen that, also for the Kriging models, the performance using adaptive sampling is better in the medium sample size range around 500 samples. While in general the performance differences for Ki -predictions are small for the components n-decane and 1-dodecene, the difference in the component DMF is larger. The fraction of biphasic samples is lower in the sampling designs generated by the mixed adaptive sampling compared to the same-sized LHS designs. Although the training sample size in the last iteration is lower, the performance of the Kriging models trained within the mixed adaptive sampling design is similar to the LHS design-based models. This shows that by combining the exploratory and the exploitative sampling objective, the performance of the Kriging models is superior to LHS design-based models at a same number of biphasic samples, e. g. the MAEn−decane is at 0.0011 for a sample size of 519 in the sequential approach, while being at 0.0020 at a similar sample size of 490 in the LHS design. At the same sample sizes, the MAEi for DMF is 0.0017 in the sequential and 0.0041 in the LHS design-based approach, while for 1-dodecene it is 0.0010 and 0.0019. The errors with respect to the

corresponding molar fractions x_i of each component i in the non-polar and the polar phase are shown in Fig. 14. In general, the results for the composition of the polar phase are similar for models based on both sampling approaches. For predicting the non-polar phase composition, the models obtained from the sequential approach are superior for all three components. In the last iteration of the mixed adaptive sampling algorithm, the MAE_i of x_{n-decane} in the non-polar phase is 0.0010, 0.0015 for x_DMF and 0.0006 for x_{1-dodecene}. For the LHS design-based models, the MAE_i are 0.0013 for x_{n-decane}, 0.0019 for x_DMF and 0.0006 for x_{1-dodecene}, although the models are based on a bigger sample size of 974, in contrast to 724 in the last iteration of the mixed adaptive sampling algorithm. As the composition of the non-polar phase determines the design of the downstream purification process (see Fig. 5), a reliable surrogate model has to be obtained. But as the quality of the surrogate model is not known prior to training, the number of samples required to obtain surrogate models with a sufficient performance is not known when using a one-shot design. A comparison of the computation time of the mixed adaptive sampling algorithm with a LHS design of the same size is shown in Table 3.

Table 3
Mean computation time and standard deviation of the 5 runs each when applying the mixed adaptive sampling algorithm and LHS design for a given total sample size, performed on a standard computer (Windows 7, 3.6 GHz dual core Intel(R) i7, 18 GB RAM).

Sample size | Mean adaptive [min] | Mean LHS [min] | Std adaptive [min] | Std LHS [min]
35          | 3.83                | 3.47           | 0.54               | 0.75
49          | 5.22                | 4.76           | 0.52               | 0.59
69          | 7.32                | 6.97           | 0.65               | 1.34
97          | 9.91                | 11.14          | 1.01               | 1.10
136         | 14.27               | 15.24          | 0.93               | 1.94
190         | 19.98               | 20.74          | 1.21               | 1.82
266         | 29.25               | 29.05          | 1.12               | 2.29
372         | 40.20               | 39.22          | 1.87               | 3.28
521         | 56.25               | 56.44          | 0.53               | 2.06
729         | 79.10               | 81.69          | 2.44               | 14.67
1021        | 111.45              | 107.37         | 4.74               | 8.70
1429        | 156.68              | 148.79         | 5.50               | 5.53


Overall, the differences are not significant. As an advantage of applying the sequential approach, the information on the current samples in each iteration can be exploited to define a suitable stop criterion for the algorithm. This is discussed in the next section.

6.3. Stop criterion

In the previous sections, the error measures MisC and MAE_i were calculated with respect to a large test set to ensure comparability of the model performance. For the practical use of the sequential sampling method, an error estimate based on the available sample points at the current iteration of the sequential sampling approach should be applied. As discussed in Section 4.3, the modified jackknife variance calculation requires the generation of NSS subset models which are each trained on NSS − 1 subsets (see Section 3.3.2) and can thus be used for NSS-fold cross-validation to calculate CVE(MAE)_i and CVE(MisC) (see Eqs. (46)–(49)). As Eason and Cremaschi (2014) apply the slope ratio criterion α_tot ≤ ε (see Section 3.3.3), where ε is a lower threshold, this stop criterion is discussed first. Since the value of ε has to be predefined, a suitable value has to be found. For their case studies, Eason and Cremaschi (2014) propose ε to be equal to 0.03, but it has to be noted that in their work another error measure, the summed squared error SSE, was used. To find an adequate value for the given case, α_tot is first calculated based on the test set error measures MAE_i and MisC according to Eq. (32). The resulting α_tot for the 5 runs with the configuration SF = 0.4 and NSS = 18 is given in Fig. 15. The red line indicates a threshold of ε = 0.03. It can be seen that the threshold value of 0.03 is not applicable in this case, although the MAE_i and MisC are not changing significantly anymore (see also Figs. 12 and 13). Only runs 2 and 3 would have stopped, at iterations 8 and 11 respectively. Running the sequential sampling algorithm longer until this value of ε is achieved can be unproductive.
The resulting surrogate model may be more accurate, but the errors in the last iteration are in the order of magnitude of MAEi ≈ 0.001 or lower (see Fig. 14). Measuring a liquid composition with such an accuracy is hardly possible. As a consequence, one could choose a higher value for ε to stop at an earlier iteration, but the user has no indication of the quality of the resulting models when choosing this value and as can be seen in Fig. 15, this leads to the different mixed adaptive sampling runs being stopped at different iterations of the algorithm.
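The absolute-accuracy stop criterion described in Section 6.3 can be sketched as follows. The MAE tolerance of 0.01 is the value used in the paper; the MisC tolerance is a placeholder of our choosing (the reported run stopped at CVE(MisC) = 0.018):

```python
def should_stop(cve_mae, cve_misc, tol_mae=0.01, tol_misc=0.02):
    # stop once every per-component CVE(MAE)_i and the CVE(MisC)
    # fall below absolute accuracy thresholds (tol_misc is a placeholder)
    return all(e <= tol_mae for e in cve_mae.values()) and cve_misc <= tol_misc

# cross-validation errors as reported for run 1 at iteration 11 (Section 6.3)
print(should_stop({"n-decane": 0.005, "DMF": 0.009, "1-dodecene": 0.004}, 0.018))  # -> True
```

In contrast to the slope-ratio criterion, the thresholds here map directly to the accuracy of the resulting models, so the user can align them with, e.g., the accuracy of composition measurements.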

Fig. 15. α tot based on the test set error measures MAEi and MisC over the iterations of 5 runs, the red line indicates a threshold value of ε =0.03. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 16. CVE(MAE)i (left axis) and CVE(MisC) (right axis) of the xi of the three components in the non-polar and polar phase over the iterations of run 1, the blue line indicates a threshold value of ε = 0.01. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 17. LLE phase diagram of the PC-SAFT predictions (lines) and surrogate model predictions (markers) at 300 K (blue) and 340 K (red). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

A more practical solution to this problem is to define an absolute value for the accuracy that should be achieved. As an example, CVE(MAE)_i and CVE(MisC) for the sample set at the different iterations of run 1 are shown in Fig. 16. Defining the required accuracy to be CVE(MAE)_i = 0.01, as indicated by the blue line, the algorithm would be stopped in this case at iteration 11 with CVE(MAE)_{n-decane} = 0.005 in the non-polar phase and 0.003 in the polar phase, CVE(MAE)_DMF = 0.009 in the non-polar phase and 0.005 in the polar phase, CVE(MAE)_{1-dodecene} = 0.004 in the non-polar phase and 0.002 in the polar phase, and CVE(MisC) = 0.018. By this, an estimate of the model accuracy is available and can be used to adapt the proposed method to case-study-dependent requirements. Predictions of the phase diagram that were calculated using the resulting surrogate models are compared to the original PC-SAFT model predictions in Fig. 17. The Kriging models show a good prediction accuracy for the phase diagram at both temperatures. The classifier classified all samples correctly in this test case. The applied surrogate modeling strategy has been shown to be suitable to replace expensive thermodynamic model calls. The application of the mixed adaptive sampling scheme leads to better performing surrogate models compared to a conventional purely


7. Conclusions

As equation of state models such as PC-SAFT are often not directly applicable in process optimization due to the computational expense of their iterative computations, this work aims at replacing the expensive thermodynamic model calls by explicit evaluations of surrogate models. In order to combine explorative and exploitative sampling objectives when choosing sample locations, the mixed adaptive sampling approach of Eason and Cremaschi (2014) has been extended into a novel surrogate modeling strategy for phase equilibrium calculations which is particularly suitable for LLE models. Modeling the composition of the different phases of the system leads to discontinuities in the output space, as the mixture disintegrates at certain operating conditions. To obtain models that are valid over the full operating range of interest, the change in the number of phases also has to be modeled. This is done by the application of a classifier model. For the prediction of the composition of the different phases, suitable regression models are then applied. The goal of our work is to train surrogate models that provide a very high prediction accuracy over the full operating range of relevant process conditions, so that the models can be used in process optimization without retraining; this is why nonlinear classification and regression models are used. This is in contrast to using local approximations that may have to be retrained during the course of the optimization. Of course, the range of the variables still has to be monitored during the optimization so that the surrogate models are not employed beyond their domain of validity, where they would have to be retrained.

In this work, this approach has been applied to the operating range of interest of the ternary liquid-liquid equilibrium of n-decane, dimethylformamide and 1-dodecene, using an SVM classifier and ordinary Kriging models as the regression method, as part of the goal of enabling efficient process optimization with a complex thermodynamic model such as PC-SAFT for all phase equilibria in the hydroformylation of 1-dodecene. The choice of the parameters of the mixed adaptive sampling algorithm was discussed in detail. A large number of subsets NSS improves the quality of the surrogate models that are trained with the adaptive sampling approach. For the selection factor SF, a compromise between classifier and regression model performance had to be made: the classifier performs best when SF is small, while for the regression models the opposite is true. The models trained using the mixed adaptive sampling algorithm with the configuration NSS = 18 and SF = 0.4 were compared to models based on conventional one-shot Latin hypercube designs, revealing that the mixed adaptive approach yields better performing classification and regression models at a similar computational expense. The algorithm should be stopped at an accuracy that is satisfactory for the purpose, e.g. considering the accuracy of the measurements. As a stop criterion, the slope ratio turned out not to be useful in this case; the definition of threshold values for cross-validation errors should be preferred. Further improvements to the algorithm could be achieved by adjusting the weighting of the exploratory objective, the classifier and the regression models in the overall sampling objective (see Eq. (20) and Eqs. (29)–(30)).

The next step will be the use of the surrogate models in process optimization. The classifier model can be implemented as a constraint that defines the valid operating range within the biphasic region of the separator (see Fig. 5), while the regression models predict the phase composition. The mixed adaptive sampling approach thus outperforms the one-shot space-filling design. The sequential nature of the algorithm gives the user more flexibility in the construction of surrogate models of a predefined accuracy.

Acknowledgment
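The way the classifier model gates the regression models can be sketched as follows. This is a minimal illustration of the structure only, not the authors' implementation: the paper uses an SVM classifier and ordinary Kriging models trained on PC-SAFT data, whereas the stand-ins below (`predict_phases`, `predict_composition`, a nearest-neighbour rule and inverse-distance weighting over a tiny hypothetical training set `TRAIN`) are invented for the example.

```python
import math

# Hypothetical training data: (operating point (T, x_feed),
# number of liquid phases, phase composition or None).
TRAIN = [
    ((300.0, 0.20), 2, (0.15, 0.85)),
    ((340.0, 0.50), 1, None),
    ((310.0, 0.35), 2, (0.25, 0.75)),
]

def predict_phases(x):
    """Classifier surrogate stand-in: predicted number of liquid phases
    at operating point x (1-nearest-neighbour rule)."""
    nearest = min(TRAIN, key=lambda s: math.dist(s[0], x))
    return nearest[1]

def predict_composition(x):
    """Regression surrogate stand-in: inverse-distance-weighted
    interpolation of the composition over the two-phase samples."""
    two_phase = [s for s in TRAIN if s[1] == 2]
    w = [1.0 / (math.dist(s[0], x) + 1e-12) for s in two_phase]
    total = sum(w)
    return tuple(
        sum(wi * s[2][k] for wi, s in zip(w, two_phase)) / total
        for k in range(2)
    )

def surrogate_equilibrium(x):
    """The classifier gates the regression models: the predicted
    compositions are only meaningful inside the biphasic region."""
    if predict_phases(x) == 2:
        return predict_composition(x)
    return None  # single phase: no phase split to predict

print(surrogate_equilibrium((305.0, 0.25)))  # inside the biphasic region
print(surrogate_equilibrium((338.0, 0.49)))  # single-phase region: None
```

In an optimization, the same gating appears as a constraint: the classifier defines the feasible (biphasic) region, and the regression outputs are only evaluated where that constraint holds.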

This work is part of the Collaborative Research Center/Transregio 63 "Integrated Chemical Processes in Liquid Multiphase Systems" (subproject D1). Financial support by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) is gratefully acknowledged (TRR 63). The authors also thank Roderich Wallrath, Clemens Lindscheid, Maximilian Cegla, Radoslav Paulen, Marina Rantanen Modéer, Simon Wenzel, Shreya Bhatia and Anoj-Winston Gladius for their support.

References

Barker, J.A., Henderson, D., 1967. Perturbation theory and equation of state for fluids. II. A successful theory of liquids. J. Chem. Phys. 47, 4714–4721. doi:10.1063/1.1701689.
Beykal, B., Boukouvala, F., Floudas, C.A., Pistikopoulos, E.N., 2018. Optimal design of energy systems using constrained grey-box multi-objective optimization. Comput. Chem. Eng. 116, 488–502. doi:10.1016/j.compchemeng.2018.02.017.
Bhosekar, A., Ierapetritou, M., 2018. Advances in surrogate based modeling, feasibility analysis, and optimization: a review. Comput. Chem. Eng. 108, 250–267. doi:10.1016/j.compchemeng.2017.09.017.
Bischof, C., Bucker, H., Lang, B., Rasch, A., Vehreschild, A., 2002. Combining source transformation and operator overloading techniques to compute derivatives for MATLAB programs. In: Proceedings. Second IEEE International Workshop on Source Code Analysis and Manipulation. IEEE, Montreal, Quebec, Canada, pp. 65–72. doi:10.1109/scam.2002.1134106.
Bishop, C.M., 2006. Pattern Recognition and Machine Learning, 1st ed. Springer, New York.
Boston, J., Britt, H., 1978. A radically different formulation and solution of the single-stage flash problem. Comput. Chem. Eng. 2, 109–122. doi:10.1016/0098-1354(78)80015-5.
Brunsch, Y., 2013. Temperaturgesteuertes Katalysatorrecycling für die homogen katalysierte Hydroformylierung langkettiger Alkene. Dissertation, TU Dortmund University, Dortmund, Germany. Verlag Dr. Hut. doi:10.17877/de290r-10964.
Byvatov, E., Fechner, U., Sadowski, J., Schneider, G., 2003. Comparison of support vector machine and artificial neural network systems for drug/nondrug classification. J. Chem. Inf. Comput. Sci. 43, 1882–1889. doi:10.1021/ci0341161.
Caballero, J.A., Grossmann, I.E., 2008. An algorithm for the use of surrogate models in modular flowsheet optimization. AIChE J. 54, 2633–2650. doi:10.1002/aic.11579.
Chiang, L.H., Kotanchek, M.E., Kordon, A.K., 2004. Fault diagnosis based on Fisher discriminant analysis and support vector machines. Comput. Chem. Eng. 28 (8), 1389–1401. doi:10.1016/j.compchemeng.2003.10.002.
Chimowitz, E.H., Anderson, T.F., Macchietto, S., Stutzman, L.F., 1983. Local models for representing phase equilibria in multicomponent, nonideal vapor-liquid and liquid-liquid systems. 1. Thermodynamic approximation functions. Ind. Eng. Chem. Process Des. Dev. 22, 217–225. doi:10.1021/i200021a009.
Chimowitz, E.H., Macchietto, S., Anderson, T.F., Stutzman, L.F., 1984. Local models for representing phase equilibria in multicomponent, non-ideal vapor-liquid and liquid-liquid systems. 2. Application to process design. Ind. Eng. Chem. Process Des. Dev. 23, 609–618. doi:10.1021/i200026a034.
Cozad, A., Sahinidis, N.V., Miller, D.C., 2014. Learning surrogate models for simulation-based optimization. AIChE J. 60, 2211–2227. doi:10.1002/aic.14418.
Cremaschi, S., 2015. A perspective on process synthesis: challenges and prospects. Comput. Chem. Eng. 81, 130–137. doi:10.1016/j.compchemeng.2015.05.007.
Crombecq, K., De Tommasi, L., Gorissen, D., Dhaene, T., 2009. A novel sequential design strategy for global surrogate modeling. In: Proceedings of the 2009 Winter Simulation Conference (WSC). IEEE, pp. 731–742. doi:10.1109/wsc.2009.5429687.
Crombecq, K., Laermans, E., Dhaene, T., 2011. Efficient space-filling and non-collapsing sequential design strategies for simulation-based modeling. Eur. J. Oper. Res. 214, 683–696. doi:10.1016/j.ejor.2011.05.032.
Dreimann, J.M., Hoffmann, F., Skiborowski, M., Behr, A., Vorholt, A.J., 2017. Merging thermomorphic solvent systems and organic solvent nanofiltration for hybrid catalyst recovery in a hydroformylation process. Ind. Eng. Chem. Res. 56, 1354–1359. doi:10.1021/acs.iecr.6b04249.
Eason, J., Cremaschi, S., 2014. Adaptive sequential sampling for surrogate model generation with artificial neural networks. Comput. Chem. Eng. 68, 220–232. doi:10.1016/j.compchemeng.2014.05.021.
Garud, S.S., Karimi, I., Kraft, M., 2017. Smart sampling algorithm for surrogate model development. Comput. Chem. Eng. 96, 103–114. doi:10.1016/j.compchemeng.2016.10.006.
Gross, J., Sadowski, G., 2001. Perturbed-chain SAFT: an equation of state based on a perturbation theory for chain molecules. Ind. Eng. Chem. Res. 40, 1244–1260. doi:10.1021/ie0003887.

Gross, J., Sadowski, G., 2002. Application of the perturbed-chain SAFT equation of state to associating systems. Ind. Eng. Chem. Res. 41, 5510–5515. doi:10.1021/ie010954d.
Halton, J.H., 1960. On the efficiency of certain quasi-random sequences of points in evaluating multi-dimensional integrals. Numer. Math. 2 (1), 84–90. doi:10.1007/bf01386213.
Henao, C.A., Maravelias, C.T., 2011. Surrogate-based superstructure optimization framework. AIChE J. 57 (5), 1216–1232. doi:10.1002/aic.12341.
Hentschel, B., Kiedorf, G., Gerlach, M., Hamel, C., Seidel-Morgenstern, A., Freund, H., Sundmacher, K., 2015. Model-based identification and experimental validation of the optimal reaction route for the hydroformylation of 1-dodecene. Ind. Eng. Chem. Res. 54, 1755–1765. doi:10.1021/ie504388t.
Hentschel, B., Peschel, A., Freund, H., 2014. Simultaneous design of the optimal reaction and process concept for multiphase systems. Chem. Eng. Sci. 115, 69–87. doi:10.1016/j.ces.2013.09.046.
Hernandez, R., Dreimann, J., Vorholt, A., Behr, A., Engell, S., 2018. Iterative real-time optimization scheme for optimal operation of chemical processes under uncertainty: proof of concept in a miniplant. Ind. Eng. Chem. Res. 57 (26), 8750–8770. doi:10.1021/acs.iecr.8b00615.
Hernández, R., Engell, S., 2016. Modelling and iterative real-time optimization of a homogeneously catalyzed hydroformylation process. Comput. Aided Chem. Eng. 38, 1–6. doi:10.1016/b978-0-444-63428-3.50005-9.
Illner, M., Schmidt, M., Pogrzeba, T., Urban, C., Esche, E., Schomäcker, R., Repke, J.-U., 2018. Palladium-catalyzed methoxycarbonylation of 1-dodecene in a two-phase system: the path toward a continuous process. Ind. Eng. Chem. Res. 57, 8884–8894. doi:10.1021/acs.iecr.8b01537.
Jin, Y., Li, J., Du, W., Qian, F., 2016. Adaptive sampling for surrogate modelling with artificial neural network and its application in an industrial cracking furnace. Can. J. Chem. Eng. 94, 262–272. doi:10.1002/cjce.22384.
Jones, D.R., Schonlau, M., Welch, W.J., 1998. Efficient global optimization of expensive black-box functions. J. Global Optim. 13, 455–492. doi:10.1023/a:1008306431147.
Kaiser, N.M., Flassig, R.J., Sundmacher, K., 2016. Probabilistic reactor design in the framework of elementary process functions. Comput. Chem. Eng. 94, 45–59. doi:10.1016/j.compchemeng.2016.06.008.
Keßler, T., Kunde, C., McBride, K., Mertens, N., Michaels, D., Sundmacher, K., Kienle, A., 2019. Global optimization of distillation columns using explicit and implicit surrogate models. Chem. Eng. Sci. 197, 235–245. doi:10.1016/j.ces.2018.12.002.
Keßler, T., Mertens, N., Kunde, C., Nentwich, C., Michaels, D., Engell, S., Kienle, A., 2017. Efficient global optimization of a novel hydroformylation process. Comput. Aided Chem. Eng. 40, 2113–2118. doi:10.1016/b978-0-444-63965-3.50354-8.
Kiedorf, G., Hoang, D.M., Müller, A., Jörke, A., Markert, J., Arellano-Garcia, H., Seidel-Morgenstern, A., Hamel, C., 2014. Kinetics of 1-dodecene hydroformylation in a thermomorphic solvent system using a rhodium-biphephos catalyst. Chem. Eng. Sci. 115, 31–48. doi:10.1016/j.ces.2013.06.027.
Kleijnen, J.P., Van Beers, W.C., 2004. Application-driven sequential designs for simulation experiments: Kriging metamodelling. J. Oper. Res. Soc. 55, 876–883. doi:10.1057/palgrave.jors.2601747.
Kleiner, M., Tumakaka, F., Sadowski, G., 2009. Thermodynamic modeling of complex systems. In: Lu, X., Hu, Y. (Eds.), Structure and Bonding. Springer, Berlin, Heidelberg, pp. 75–108.
Kontogeorgis, G.M., Folas, G.K., 2010. Thermodynamic Models for Industrial Application: From Classical and Advanced Mixing Rules to Association Theories, 1st ed. Wiley, Chichester, UK.
Krige, D.G., 1951. A statistical approach to some basic mine valuation problems on the Witwatersrand. Master's thesis, University of the Witwatersrand, South Africa.
Leesley, M., Heyen, G., 1977. The dynamic approximation method of handling vapor-liquid equilibrium data in computer calculations for chemical processes. Comput. Chem. Eng. 1, 103–108. doi:10.1016/0098-1354(77)80015-X.
Lophaven, S.N., Nielsen, H.B., Søndergaard, J., 2002. DACE - A Matlab Kriging Toolbox, Version 2.0.
Macchietto, S., Chimowitz, E.H., Anderson, T.F., Stutzman, L.F., 1986. Local models for representing phase equilibria in multicomponent nonideal vapor-liquid and liquid-liquid systems. 3. Parameter estimation and update. Ind. Eng. Chem. Process Des. Dev. 25, 674–682. doi:10.1021/i200034a013.


Matheron, G., 1963. Principles of geostatistics. Econ. Geol. 58 (8), 1246–1266. doi:10.2113/gsecongeo.58.8.1246.
MATLAB and Statistics and Machine Learning Toolbox Release, 2017. The MathWorks, Inc., Natick, Massachusetts, United States.
McBride, K., Gaide, T., Vorholt, A., Behr, A., Sundmacher, K., 2016. Thermomorphic solvent selection for homogeneous catalyst recovery based on COSMO-RS. Chem. Eng. Process. 99, 97–106. doi:10.1016/j.cep.2015.07.004.
McBride, K., Sundmacher, K., 2015. Data driven conceptual process design for the hydroformylation of 1-dodecene in a thermomorphic solvent system. Ind. Eng. Chem. Res. 54, 6761–6771. doi:10.1021/acs.iecr.5b00795.
McBride, K., Sundmacher, K., 2019. Overview of surrogate modeling in chemical process engineering. Chem. Ing. Tech. 91, 228–239. doi:10.1002/cite.201800091.
Merchan, V.A., Wozny, G., 2016. Comparative evaluation of rigorous thermodynamic models for the description of the hydroformylation of 1-dodecene in a thermomorphic solvent system. Ind. Eng. Chem. Res. 55, 293–310. doi:10.1021/acs.iecr.5b03328.
Müller, D., Illner, M., Esche, E., Pogrzeba, T., Schmidt, M., Schomäcker, R., Biegler, L.T., Wozny, G., Repke, J.-U., 2017. Dynamic real-time optimization under uncertainty of a hydroformylation mini-plant. Comput. Chem. Eng. 106, 836–848. doi:10.1016/j.compchemeng.2017.01.041.
Nentwich, C., Engell, S., 2016. Application of surrogate models for the optimization and design of chemical processes. In: 2016 International Joint Conference on Neural Networks (IJCNN). IEEE, Vancouver, BC, Canada, pp. 1291–1296. doi:10.1109/ijcnn.2016.7727346.
Niederreiter, H., 1992. Random Number Generation and Quasi-Monte Carlo Methods. Soc. Ind. Appl. Math. doi:10.1137/1.9781611970081.
Olofsson, S., Mehrian, M., Calandra, R., Geris, L., Deisenroth, M., Misener, R., 2018. Bayesian multi-objective optimisation with mixed analytical and black-box functions: application to tissue engineering. IEEE Trans. Biomed. Eng. 1–12. doi:10.1109/tbme.2018.2855404.
Onel, M., Kieslich, C.A., Guzman, Y.A., Floudas, C.A., Pistikopoulos, E.N., 2018. Big data approach to batch process monitoring: simultaneous fault detection and diagnosis using nonlinear support vector machine-based feature selection. Comput. Chem. Eng. 115, 46–53. doi:10.1016/j.compchemeng.2018.10.016.
Perregaard, J., Pedersen, B.S., Gani, R., 1992. Steady state and dynamic simulation of complex chemical processes. Trans. Inst. Chem. Eng. 70 (A), 99–109.
Schäfer, E., Brunsch, Y., Sadowski, G., Behr, A., 2012. Hydroformylation of 1-dodecene in the thermomorphic solvent system dimethylformamide/decane. Phase behavior-reaction performance-catalyst recycling. Ind. Eng. Chem. Res. 51, 10296–10306. doi:10.1021/ie300484q.
Schäfer, E., Sadowski, G., Enders, S., 2014. Calculation of complex phase equilibria of DMF/alkane systems using the PCP-SAFT equation of state. Chem. Eng. Sci. 115, 49–57. doi:10.1016/j.ces.2013.04.053.
Simpson, T.W., Peplinski, J.D., Koch, P.N., Allen, J.K., 1997. On the use of statistics in design and the implications for deterministic computer experiments. In: Proceedings of DETC'97 1997 ASME Design Engineering Technical Conferences. Sacramento, California, pp. 1–14.
Steimel, J., Engell, S., 2016. Optimization-based support for process design under uncertainty: a case study. AIChE J. 62, 3404–3419. doi:10.1002/aic.15400.
Støren, S., Hertzberg, T., 1994. Local thermodynamic models applied in dynamic process simulation: a simplified approach. Trans. Inst. Chem. Eng. 72 (A3), 395–401.
Tumakaka, F., Gross, J., Sadowski, G., 2005. Thermodynamic modeling of complex systems using PC-SAFT. Fluid Phase Equilib. 228–229, 89–98. doi:10.1016/j.fluid.2004.09.037.
Vogelpohl, C., Brandenbusch, C., Sadowski, G., 2013. High-pressure gas solubility in multicomponent solvent systems for hydroformylation. Part I: carbon monoxide solubility. J. Supercrit. Fluids 81, 23–32. doi:10.1016/j.supflu.2013.04.006.
Vogelpohl, C., Brandenbusch, C., Sadowski, G., 2014. High-pressure gas solubility in multicomponent solvent systems for hydroformylation. Part II: syngas solubility. J. Supercrit. Fluids 88, 74–84. doi:10.1016/j.supflu.2014.01.017.
Zagajewski, M., Dreimann, J., Thönes, M., Behr, A., 2016. Rhodium catalyzed hydroformylation of 1-dodecene using an advanced solvent system: towards highly efficient catalyst recycling. Chem. Eng. Process. 99, 115–123. doi:10.1016/j.cep.2015.06.014.