A stochastic programming approach for the Bayesian experimental design of nonlinear systems


José M. Laínez-Aguirre a,∗, Linas Mockus b, Gintaras V. Reklaitis b

a Department of Industrial and Systems Engineering, University at Buffalo, Amherst, NY 14260, United States
b School of Chemical Engineering, Purdue University, 480 Stadium Mall Drive, West Lafayette, IN 47907, United States

Article history: Received 8 February 2014; Received in revised form 9 June 2014; Accepted 11 June 2014; Available online 6 July 2014

Keywords: Design of experiments; Stochastic programming; NLP; MINLP

Abstract

Several approaches for the Bayesian design of experiments have been proposed in the literature (e.g., D-optimal, E-optimal, A-optimal designs). Most of these approaches assume that the available prior knowledge is represented by a normal probability distribution. In addition, most nonlinear design approaches involve assuming normality of the posterior distribution and approximating its variance using the expected Fisher information matrix. In order to relax these assumptions, we address and generalize the problem by using a stochastic programming formulation. Specifically, the optimal Bayesian experimental design is mathematically posed as a three-stage stochastic program, which is then discretized using a scenario based approach. Given the prior probability distribution, a Smolyak rule (sparse grids) is used for the selection of scenarios. Two retrospective case studies related to population pharmacokinetics are presented. The benefits and limitations of the proposed approach are demonstrated by comparing the numerical results to those obtained by implementing a more exhaustive experimentation and the D-optimal design.

1. Introduction

Experimental design is a discipline that has relevance in many different fields. It can be applied as a means of minimizing the number of experiments required to improve the operating conditions of supercritical fluid extractions (Sharif et al., 2014), to prescribe the optimal dosing in a pharmacokinetic study (Laínez et al., 2011), or to accurately characterize a crystallization process (Chung et al., 2000). The optimal Bayesian design for linear models is a well defined problem. As demonstrated in Chaloner and Verdinelli (1995), maximizing the expected Kullback–Leibler divergence between the posterior and the prior distributions leads to the so-called D-optimal design, which consists of maximizing the determinant of the Fisher information matrix (FIM). One convenient characteristic of the experimental design of linear systems is that the design is independent of the data to be collected during the experimentation as well as independent of the parameters characterizing the system's model. This is not the case when designing experiments for nonlinear systems. The treatment of nonlinear optimal design requires the explicit consideration of model parameters


and experimental outcomes. However, the investigator must make the decision on the optimal design without complete knowledge of these two components. This inherent uncertainty makes the problem suitable for a stochastic formulation. To deal with nonlinear designs, various approximation strategies have been proposed. One typical approach is the so-called local optimality construction (Chernoff, 1953; Chaloner and Verdinelli, 1995), which determines the FIM using sensitivity analysis or the Hessian of the logarithm of the posterior density, usually evaluated at the maximum a posteriori (MAP) estimates. The nonlinear D-optimal design is then formulated similarly to the linear case by maximizing the determinant of the FIM. Even though the FIM may be defined for any probability distribution, D-optimality requires the normality assumption for the posterior distribution and uses the FIM to approximate its corresponding variance matrix. The normality of the posterior distribution can only be demonstrated for linear systems when using normal priors (Bishop, 2006). Nonlinearities in the model of the system and/or non-normal priors may lead to significant deviations from these assumptions. The aforementioned approximations have been the mainstay of applications in the Process Systems Engineering (PSE) arena. For example, Sedrati et al. (1999) develop a sequential experimental design strategy for the estimation of kinetic parameters of a batch reactor using the local D-optimal criterion.
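For concreteness, the local criteria mentioned here and below reduce to simple matrix functionals of the FIM. The following sketch, with a purely illustrative information matrix, computes the three quantities; it is not taken from the paper.

```python
import numpy as np

fim = np.array([[4.0, 1.0],
                [1.0, 2.0]])            # illustrative Fisher information matrix

d_opt = np.linalg.det(fim)              # D-optimality: maximize det(FIM)
e_opt = np.linalg.eigvalsh(fim).min()   # E-optimality: maximize the smallest eigenvalue
a_opt = np.trace(np.linalg.inv(fim))    # A-optimality: minimize trace(FIM^-1)
print(d_opt, e_opt, a_opt)
```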


Notation

Sets
B  set of collocation points that are in the common border of a pair of elements
c  collocation points
e  finite elements
i  sub-scenario associated with a given set of model parameters θi
j  sub-scenario associated with the experimental data
k  level of accuracy of quadrature rules
m  experiment
o  initial and boundary conditions
Io  domain of boundary condition o

Parameters
D  number of control variables in the experimental design
θ  vector of unknown model parameters
ωi, ω̃j  weights corresponding to the quadrature rules employed to select sub-scenarios i and j, respectively
Ω  a sufficiently large number
ue^lo, ue^up  lower and upper bounds limiting the values of the control variables corresponding to element e, respectively
ũec  vector of control values associated with element e and collocation point c
y  vector/matrix of experimental data

Variables
Aei  multivariate polynomial associated with the approximation in element e which is evaluated in the set of model parameters θi
L(y|θ, u)  likelihood of the data y given a specific set of parameters and design
p(θ)  prior distribution for the unknown parameters
p(θ|y, u)  posterior distribution provided the experimental data y and design u
p(θ, y|u)  probability that θ and y jointly occur given the design u
p(y|u)  marginal distribution of experimental data y given the design u
λe,i,c  unknown coefficient associated with collocation point c in element e and set of parameters θi
x  vector of state variables
ŷ  predicted value for the experimental observations
χeci  vector of state values corresponding to finite element e, collocation point c and set of parameters θi
u, ûm  vector of control variables (for experiment m)
U  utility/objective function/performance of the experimental design
zem  binary variable indicating if experiment m falls into element e

Functions
bo  vector of initial and boundary conditions
ε  experimental error
f  vector function relating the state variables x to the predictions
G  matrix/vector of differential equations
h  vector of algebraic equations
φcd  component of the polynomial associated with collocation point c and control variable (dimension) d
N  a normal distribution
V  variance–covariance matrix associated with the experimental error ε

An optimal experimental design to estimate the parameters of dynamic gas–liquid transfer models is reported in Navarro-Laboulais et al. (2008). Again, they addressed the nonlinear design problem by means of an approximation based on the local D-optimal and E-optimal criteria. Subsequently, Rasch et al. (2009) introduced a software package for optimal experimental design called EFCOSS. The approaches that are implemented include the local D-optimal, E-optimal and A-optimal criteria. The E-optimality maximizes the minimum eigenvalue of the Fisher information matrix, while the A-optimality minimizes the trace of the inverse of the Fisher information matrix. Recently, the D-optimal design was utilized to determine the position of sensors so as to improve the parameter estimation of nonlinear distributed systems in Alaña and Theodoropoulos (2012). With regard to applications of nonlinear design to pharmacokinetics, the field to which both case studies presented in this work relate, Gagnon and Leonov (2005) present the use of local D-optimality to determine the blood sampling scheme by incorporating cost constraints into the formulation. The relationship between drug concentration and time is described using population nonlinear mixed-effects regression models. More recently, the local D-optimality approach was applied to improve parameter estimation of pharmacokinetic/pharmacodynamic models represented by a set of differential-algebraic equations (Galvanin et al., 2013). The authors consider additional constraints to reduce the design region according to the experimenter's experience. The development of other types of approximations for the nonlinear optimal design problem has been considered to be deserving of more research (Chaloner and Verdinelli, 1995; Fedorov, 2010). Approaching the nonlinear design problem using criteria such as the local D-optimality can oversimplify the treatment of the variability associated with the unknown model parameters and the experimental outcomes, which may lead to suboptimal solutions. Furthermore, the normality assumption for the prior and posterior distributions may not always be appropriate. Here, we propose to formulate the optimal design as a stochastic programming problem. Instead of using normality assumptions to simplify the problem, we propose employing a numerical approximation based on the discretization of the rigorous formulation. Such a discretization is achieved using (i) a scenario based approach, and (ii) orthogonal collocation on finite elements if the system's model is represented by a set of differential-algebraic equations (DAE). One challenge inherent to the proposed approach is the curse of dimensionality, which arises from the need for multivariate integration. We propose to employ the Smolyak rule to mitigate this difficulty. To the best of our knowledge, the rigorous formulation of the experimental design for nonlinear systems and its solution through numerical approximation is an area that has not been fully explored to date.

2. Preliminaries

This work is concerned with model-based experimental design, where the relationship between (i) those aspects that are under the control of the investigator and (ii) the experimental outcomes is represented by means of a mathematical model. The experimental design consists of determining the optimal value of the control variables (u). These variables can be of very diverse nature


and may include the duration of the experiment, number of samples, location in space and/or time, and selection of blocking factors, among other aspects depending on the problem at hand. Next, we briefly discuss a mathematical model general enough to represent a wide variety of such problems. The fundamentals of Bayesian inference are also outlined. These will provide the building blocks of the solution approach proposed in this work.

2.1. A general representation of the predictive model

In general, the mathematical model used to represent the system under investigation can be described by a set of DAEs of the form:

$$\hat{y} = f(u, \theta, x)$$
$$\frac{\partial x}{\partial u} = G(x, u, \theta)$$
$$h(x, u, \theta) = 0 \tag{1}$$
$$b_o(u)\big|_{u \in I_o} = x(u) \quad \forall o$$

where ŷ is the vector of predictions, while f is a vector function relating the state variables x to the predicted values. θ represents the vector of uncertain model parameters and u is the vector of control variables. These control variables are the ones we are interested in optimizing. The matrix of differential equations and the vector of algebraic equations that define the predictive model are G and h, respectively. The vectors of initial and boundary conditions are given by bo and their corresponding domain is represented by Io.

2.2. Bayesian inference

According to Bayes' theorem (Eq. (2)), the posterior belief about the model parameters involves considering the data obtained from experiments (y) and the prior belief represented by the prior distribution p(θ). The probability that the experimental outcomes (y) are explained by the model (Eq. (1)) and a specific set of values for the parameters (θ) given the experimental design u is captured by the likelihood function L(y|θ, u).

$$p(\theta \mid y, u) = \frac{L(y \mid \theta, u)\, p(\theta)}{p(y \mid u)} \tag{2}$$

Assuming an additive error, the relationship between the experimental observation (y) and its associated predicted value is expressed in Eq. (3). Some transformations of predictions and observations would be required to account for a log-normal/multiplicative error.

$$y = \hat{y} + \varepsilon \tag{3}$$

Let us also assume that the experimental error ε is a zero-mean Gaussian process with covariance matrix V (Kennedy and O'Hagan, 2001). The elements of V can be derived using a covariance function such as the squared exponential, γ-exponential and the Matérn functions (Rasmussen and Williams, 2006). Then, the likelihood function can be calculated by evaluating the normal distribution given in Eq. (4).

$$L(y \mid \theta, u) = N(y \mid \hat{y}, V) \tag{4}$$

It will be useful for our further development to bear in mind that one can obtain the probability that θ and y jointly occur through the definition of conditional probabilities, Eq. (5).

$$p(\theta, y \mid u) = p(y \mid u)\, p(\theta \mid y, u) \tag{5}$$

Additionally, the marginal distribution of the collected data can be computed as given in Eq. (6) by the law of total probability.

$$p(y \mid u) = \int_{\theta} p(\theta, y \mid u)\, d\theta \tag{6}$$
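As a concrete illustration of Eqs. (2)–(6), the sketch below evaluates the Gaussian likelihood and the marginal of the data for a prior supported on a few discrete points; the toy model, parameter values, and weights are hypothetical and only serve to show the mechanics.

```python
import numpy as np
from scipy.stats import multivariate_normal

f = lambda u, theta: theta * u          # hypothetical predictive model y_hat = f(u, theta)
thetas = np.array([0.8, 1.0, 1.2])      # discrete prior support (illustrative)
prior = np.array([0.25, 0.50, 0.25])    # prior probabilities p(theta_i)

u = np.array([1.0, 2.0])                # candidate design: two observations
V = 0.1 * np.eye(2)                     # error covariance of Eq. (3)
y = np.array([0.9, 2.1])                # a hypothetical experimental outcome

# Eq. (4): L(y | theta_i, u) = N(y | y_hat_i, V)
lik = np.array([multivariate_normal.pdf(y, mean=f(u, th), cov=V) for th in thetas])
p_y = prior @ lik                       # Eq. (6): p(y|u) = sum_i p(theta_i) L(y|theta_i, u)
posterior = prior * lik / p_y           # Eq. (2): p(theta_i | y, u)
print(p_y, posterior)
```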

3. The rigorous stochastic formulation

In this section, the nonlinear Bayesian experimental design is posed as a three-stage stochastic formulation. This section has been divided into two parts. The first subsection provides the details about the objective function of the experimental design problem, while the second develops the stochastic program.

3.1. The objective function

The objective of the experimental design is to improve the statistical inference about the system under study by adequately selecting the values of the control variables (Chaloner and Verdinelli, 1995). We are focused on parameter estimation; therefore, improving statistical inference is understood herein as minimizing the variability of the posterior distribution p(θ|y, u) of the unknown model parameters θ (Blau et al., 2008). In a Bayesian framework this is equivalent to selecting the experimental design such that the expected Kullback–Leibler divergence between the prior and posterior distributions is maximized (Lindley, 1956), as shown in Fig. 1. Since the experiment to be designed has not been carried out, its outcomes are unknown. As a result, the expectation must be computed over these unknown outcomes y. The utility or objective function (U) of the Bayesian experimental design can then be formulated as given in Eq. (7).

$$U(u) = \int_{\theta} \int_{y} p(\theta, y \mid u) \ln\left[\frac{p(\theta \mid y, u)}{p(\theta)}\right] dy\, d\theta \tag{7}$$

This equation can be mathematically rearranged using conditional probabilities as shown in Eq. (8). Notice that since the prior distribution p(θ) does not depend on the design (u), the second term of the right hand side of Eq. (8) is constant and thus can be eliminated from the objective function.

$$U(u) = \int_{\theta} \int_{y} p(y \mid u)\, p(\theta \mid y, u) \ln\{p(\theta \mid y, u)\}\, dy\, d\theta - \int_{\theta} p(\theta) \ln p(\theta)\, d\theta \tag{8}$$

Fig. 1. Graphical representation of a prior (multimodal distribution in red) and a posterior (unimodal distribution in blue) distributions whose Kullback–Leibler divergence is maximized. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)
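Before discretization, the first term of Eq. (8) can be checked numerically for simple models. The sketch below is a plain Monte Carlo estimate under stated assumptions (a discrete prior support and Gaussian error); it is meant only to make the expectation concrete, not to replace the quadrature scheme developed in Section 4.

```python
import numpy as np
rng = np.random.default_rng(0)

def gauss_pdf(y, mean, V):
    d = y - mean
    k = len(y)
    return np.exp(-0.5 * d @ np.linalg.solve(V, d)) / np.sqrt((2 * np.pi) ** k * np.linalg.det(V))

def expected_utility(u, f, thetas, weights, V, n_draws=2000):
    """Monte Carlo estimate of E_{theta,y}[ln p(theta | y, u)], the first
    term of Eq. (8), for a prior supported on a finite set of points."""
    L = np.linalg.cholesky(V)
    total = 0.0
    for _ in range(n_draws):
        i = rng.choice(len(thetas), p=weights)                  # theta ~ p(theta)
        y = f(u, thetas[i]) + L @ rng.standard_normal(len(u))   # y ~ L(y|theta, u), Eqs. (3)-(4)
        lik = np.array([gauss_pdf(y, f(u, th), V) for th in thetas])
        post = weights * lik / np.dot(weights, lik)             # Bayes' rule, Eq. (2)
        total += np.log(post[i])
    return total / n_draws

# Illustrative comparison of two candidate designs for a toy linear model:
f = lambda u, th: th * u
thetas, weights, V = np.array([0.8, 1.0, 1.2]), np.array([0.25, 0.5, 0.25]), 0.05 * np.eye(2)
print(expected_utility(np.array([1.0, 2.0]), f, thetas, weights, V))
print(expected_utility(np.array([0.1, 0.2]), f, thetas, weights, V))  # less informative design
```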


Table 1. Number of unknown parameters (collocation points) required to construct an approximation with a level of accuracy k = 4.

| Dimensions (d) | Tensor product, Gaussian | Smolyak rule, Gaussian | Smolyak rule, Kronrod–Patterson |
|---|---|---|---|
| 1 | 4 | 4 | 7 |
| 5 | 1024 | 286 | 151 |
| 10 | 1,048,576 | 1771 | 1201 |
| 20 | 1,099,511,627,776 | 12,341 | 10,001 |

3.2. The three-stage stochastic program

Stochastic programming explicitly takes into account uncertainties in the problem formulation. Its goal is to find a solution that is feasible for all possible scenarios and which optimizes the expectation of a performance indicator. In models of this type, the decision maker takes some actions in the first stage, after which an event occurs and affects the outcome of those first-stage decisions. A recourse decision can then be made in the second stage that compensates for the effects that might have been experienced as a result of the first-stage decisions. In our development, the first stage decisions are comprised of the control variables u, which represent those decisions related to the experimental design. These decisions are taken without any information about the two random parameters arising in this problem, namely: the outcomes collected during experimentation (y) and the model parameters (θ). Once information about the realization of y is "revealed", the second stage variables are defined. The variables associated with the second stage include p(y|u), which represents the probability that a specific set of outcomes (collected data) y is obtained given the experimental design u (Eq. (6)). At this point it is important to emphasize that the model parameters (θ) remain uncertain even after collecting the results from the experiments. The investigator does not obtain their values from experimentation; instead, their posterior distribution can be inferred using the Bayesian framework. Nonetheless, the equations associated with θ constitute the third stage of the proposed stochastic program. The innermost optimization problem Q[3](u, y, θ) is associated with the third stage. Eqs. (14)–(17) represent the set of DAEs describing the predictive model for the system under study as discussed in Section 2.1. In this formulation, Bayes' theorem is expressed by Eqs. (18) and (19) using conditional probabilities (see Section 2.2). The likelihood function is stated in Eq. (20). Eq. (13) is the objective function of the innermost problem, which maximizes the contribution to the entropy associated with the realization of a given combination of experimental outcomes y and model parameters θ. The optimization problem related to the second stage Q[2](u, y) is formulated for every realization of the experimental outcomes y as a function of the experimental design u. Eq. (12) states the law of total probability in order to compute p(y|u) (see Section 2.2). Note that the objective function of the second stage problem is the entropy of θ for a given experimental outcome y weighted by the probability that such an outcome occurs. The optimization problem related to the first stage just involves constraints specifying the feasible range of the experimental design variables u (Eq. (10)), while the objective function is the expected entropy of θ, which corresponds to the first term of the right hand side of Eq. (8). The stochastic model developed in this section is a rigorous representation of the optimal design problem. However, it cannot be solved analytically. In the next section, we propose a discretization approach to tackle this optimization problem.

4. Numerical approximation: a discretized approach

Instead of simplifying the optimal experimental design by using normal approximations to compute the expected Kullback–Leibler divergence as described in Section 1, a discretized approximation to the optimization problem represented by Eqs. (9)–(20) can be assembled. This approximation translates the optimization problem into a mixed integer nonlinear program (MINLP). To do so, a discretization (i) of the control variables u and (ii) of the integrals involving the two stochastic sets of parameters, θ and y, is employed. The former is accomplished by using orthogonal collocation on finite elements and the latter by using a scenario based approach.

4.1. Orthogonal collocation approach

This well-known approach transforms a set of DAEs into a set of algebraic equations. This is achieved by subdividing the domain of the state variables (x) (see Section 2.1) into a set of finite elements. The state profile within each element is described using a polynomial interpolation whose nodes are referred to as the collocation points. Next, the methodology employed to construct such a polynomial interpolation is briefly described.

4.1.1. Univariate interpolation

A univariate polynomial with accuracy level k that interpolates nk nodes can be represented as follows:

$$p_k(u) = \sum_{c=1}^{n_k} \lambda_c\, \varphi_c(u) \tag{21}$$

where λc is the set of unknown coefficients associated with collocation point c, whereas φc(u) is the set of characterizing functions of


the polynomial approximation pk. In our development, the simplest choice to define φc is utilized, as given by Eq. (22).

$$\varphi_c(u) = u^{c-1} \quad \forall c \tag{22}$$
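A minimal univariate sketch of Eqs. (21) and (22), assuming Gauss–Legendre collocation points on [−1, 1] and an arbitrary smooth function as the target; this only illustrates how the coefficients λc are obtained.

```python
import numpy as np

k = 4                                                  # accuracy level -> n_k = 4 nodes
nodes, _ = np.polynomial.legendre.leggauss(k)          # Gauss-Legendre collocation points
x = np.exp                                             # smooth function to interpolate (illustrative)

# Solve for lambda_c in p_k(u) = sum_c lambda_c * u^(c-1), Eqs. (21)-(22)
vandermonde = np.vander(nodes, k, increasing=True)     # columns are phi_c(u) = u^(c-1)
lam = np.linalg.solve(vandermonde, x(nodes))

u = 0.3
print(np.polynomial.polynomial.polyval(u, lam), x(u))  # interpolant vs. true value
```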

By selecting the collocation points c using a Gaussian quadrature rule, an approximation pk with accuracy level k can exactly describe polynomials of total order 2k − 1. Polynomials to approximate a smooth function x in one dimension are easy to construct and their accuracy depends on the rule chosen to select the collocation points c; however, the optimal design problem may involve multivariate interpolation. We address that issue in the next section.

4.1.2. Multivariate interpolation using sparse grids

The following discussion is based on Heiss and Winschel (2008) and Barthelmann et al. (2000) and is applicable irrespective of the particular quadrature rule used to select the collocation points (e.g., Legendre, Kronrod–Patterson, Chebyshev rules). However, it is noteworthy that slight modifications are required if a functional form different from Eq. (22) is used for φc (e.g., Lagrange polynomials). When dealing with the approximation of a function in high dimensional spaces, a common difficulty in using polynomial interpolation is that it generally suffers from the curse of dimensionality. This occurs particularly when the multivariate polynomial is constructed using a tensor product of univariate polynomials as shown in Eq. (23). Note that we have added to the formulation the index d to account for the different dimensions (e.g., kd and n_{kd} indicate the accuracy level in the d dimension and its corresponding number of collocation points, respectively). In D dimensions (d = {1, . . ., D}), the tensor product built using the same accuracy level in every dimension requires the estimation of n_k^D parameters. In our case, D denotes the number of control variables involved in the experimental design.

$$p^{(D)}_{k_1,k_2,\ldots,k_D} = \bigotimes_{d=1}^{D} p_{k_d}(u_d) = \sum_{c_1=1}^{n_{k_1}} \sum_{c_2=1}^{n_{k_2}} \cdots \sum_{c_D=1}^{n_{k_D}} \lambda_{c_1,\ldots,c_D} \prod_{d=1}^{D} \varphi_{c_d}(u_d) \tag{23}$$

We propose using a different approach for extending univariate polynomials to multiple dimensions, which is known as the sparse grids or Smolyak algorithm. The interpolating polynomial (A) resulting from the application of the Smolyak rule in D dimensions and for a level of accuracy k is determined as shown in Eqs. (24) and (25). Such a polynomial is the sum of tensor products with different combinations of accuracy levels.

$$A_{D,k} = \sum_{q=k-D}^{k-1} (-1)^{k-1-q} \binom{D-1}{k-1-q} \sum_{l \in N_q^D} p^{(D)}_{l_1,l_2,\ldots,l_D} \tag{24}$$

$$N_q^D = \begin{cases} \left\{ l \in \mathbb{N}^D : \sum_{d=1}^{D} l_d = D + q \right\} & \text{if } q \geq 0 \\ \emptyset & \text{if } q < 0 \end{cases} \tag{25}$$

It can be proved that in contrast to the tensor product (Eq. (23)) the number of collocation points required by the Smolyak algorithm does not increase exponentially, rather it increases polynomially (Bungartz and Griebel, 2004). Furthermore, the number required can be decreased by using nested rules such as the Kronrod–Patterson sequences (Heiss and Winschel, 2008). To demonstrate the reduction in computational cost that can be achieved by using sparse grids, a comparison of four different multidimensional constructions is presented in Table 1.
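The following sketch implements the combination rule of Eqs. (24) and (25) as a sparse-grid quadrature for expectations under a standard normal prior. It assumes non-nested probabilists' Gauss–Hermite univariate rules with the linear growth n_l = l; the paper instead relies on the nested Kronrod–Patterson sequences of Heiss and Winschel (2008), so node counts here will not match Table 1 exactly.

```python
import itertools
import math
import numpy as np

def compositions(total, parts):
    """All tuples of `parts` positive integers summing to `total` (the set N_q^D)."""
    if parts == 1:
        yield (total,)
        return
    for first in range(1, total - parts + 2):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def smolyak_nodes_weights(D, k):
    """Sparse-grid quadrature against N(0, I_D), following Eqs. (24)-(25):
    a signed sum of small tensor-product Gauss-Hermite rules."""
    grid = {}
    for q in range(max(k - D, 0), k):
        coeff = (-1) ** (k - 1 - q) * math.comb(D - 1, k - 1 - q)
        for l in compositions(D + q, D):            # accuracy multi-indices, |l| = D + q
            rules = [np.polynomial.hermite_e.hermegauss(ld) for ld in l]
            for pt in itertools.product(*(range(ld) for ld in l)):
                node = tuple(rules[d][0][pt[d]] for d in range(D))
                w = coeff * np.prod([rules[d][1][pt[d]] for d in range(D)])
                grid[node] = grid.get(node, 0.0) + w
    nodes = np.array(list(grid.keys()))
    weights = np.array(list(grid.values())) / (2 * np.pi) ** (D / 2)  # normalize to a probability rule
    return nodes, weights

nodes, weights = smolyak_nodes_weights(D=2, k=3)
print(len(nodes), weights.sum())        # weights of the combined rule sum to ~1
print(weights @ (nodes[:, 0] ** 2))     # E[theta_1^2] = 1 is integrated exactly
```

Running it for D = 2 and k = 3 confirms that the combined weights sum to one and that second moments of the standard normal are recovered exactly, while using far fewer nodes than the full tensor grid.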

4.2. Scenario selection

Scenario based approaches build an approximation to the problem through the discretization of the integrals involved in the computation of expectations. The solution quality of scenario based approaches depends on appropriately selecting the scenarios to be included in the approximation. Note that numerical integration transforms the integrals included in Eqs. (9), (11), and (12) into a weighted sum over a few significant evaluation nodes (i.e., scenarios). To perform this discretization, quadrature rules are a suitable, well-known alternative for univariate integration (Davis and Rabinowitz, 2007), while the Smolyak algorithm can also be used as an efficient methodology to address high dimensional numerical integration (Heiss and Winschel, 2008). We demonstrated in a previous work (Laínez-Aguirre and Reklaitis, 2013) that accurate approximations for stochastic programming can be obtained by employing the Smolyak algorithm. Therefore, the use of quadrature rules based on the Smolyak rule is suggested. The scenarios are selected to correspond to the evaluation nodes (i.e., collocation points) proposed by the Smolyak rule. The integral equations are then approximated as a weighted sum over the respective evaluation nodes. For the optimal experimental design problem, each scenario constitutes a different combination of a set of unknown model parameters (θ) and potential outcomes (y), which will be designated hereafter by the indexes i and j, respectively.

4.3. Discretized problem formulation

The discretized version of the optimal Bayesian design problem is detailed next. Notice that the predictive model described by Eqs. (14)–(17) is solved for each sub-scenario i, i.e., every set of parameters θi selected for the discretization. As a result, in addition to postulating a different polynomial approximation for every element e, the discretized stochastic formulation requires a polynomial for every 'sub-scenario' i associated with a set of model parameters θi, as defined in Eq. (26). Note that φc is the same in every combination of element and scenario and that the accuracy level k is a parameter to be chosen a priori.

$$A_{ei} = \sum_{q=k-D}^{k-1} (-1)^{k-1-q} \binom{D-1}{k-1-q} \sum_{l \in N_q^D} \sum_{c_1=1}^{n_{l_1}} \cdots \sum_{c_D=1}^{n_{l_D}} \lambda_{e,i,c} \prod_{d=1}^{D} \varphi_{c_d}(u_d) \quad \forall e, i \tag{26}$$

Using orthogonal collocation and the sparse grids approach, the predictive model can then be formulated in discretized form as follows:

$$\chi_{eci} = A_{ei}(\tilde{u}_{ec}) \quad \forall e, c \tag{27}$$

$$\frac{\partial A_{ei}}{\partial u}(\tilde{u}_{ec}) = G(\chi_{eci}, \tilde{u}_{ec}, \theta_i) \quad \forall e, c, i \tag{28}$$

$$A_{e'i}(\tilde{u}_{ec}) = \chi_{eci} \quad \forall (e', e, c) \in B,\; i \tag{29}$$

$$h(\chi_{eci}, \tilde{u}_{ec}, \theta_i) = 0 \quad \forall e, c, i \tag{30}$$

$$A_{ei}(\tilde{u}_{ec}) = b_o(\tilde{u}_{ec}) \quad \forall o,\; (e, c) \in I_o,\; i \tag{31}$$

$$\sum_e z_{em} = 1 \quad \forall m \tag{32}$$

$$-\Omega(1 - z_{em}) + u_e^{lo} \leq \hat{u}_m \leq \Omega(1 - z_{em}) + u_e^{up} \quad \forall e, m \tag{33}$$

$$\hat{y}_{im} = f\left(\sum_e z_{em} A_{ei}(\hat{u}_m)\right) \quad \forall i, m \tag{34}$$

Eq. (27) expresses the polynomial interpolation for the state variables (x) for each element e evaluated at collocation point c. Eq. (28) is the discretized form of the differential equations. Eq. (29) imposes state profile continuity between elements. Here, the set B represents those collocation points c that are on the border between elements e and e′. The algebraic constraints of the model are evaluated only at the collocation points as expressed by Eq. (30), while Eq. (31) imposes the initial and boundary conditions represented by the vector function bo. The set of collocation points c within element e that lies in the domain of bo is indicated by Io. Eq. (32) constrains each experiment m to belong to only one element e. Note that the cardinality of set m is the number of experiments being optimized. Binary variable zem takes the value of 1 if experiment m lies in element e, and 0 otherwise. Eq. (33) forces the values of ûm to fall into the domain of the element e to which the experiment belongs. Ω is a sufficiently large number which forces the bounds on ûm to be inactive if zem is zero. Eq. (34) computes the predictions corresponding to experiment m and set of model parameters θi. For further details the interested reader is directed to the work of Cuthrell and Biegler (1987) and Bialecki and Fairweather (2001).

$$p(y_{jm} \mid \hat{u}_m) = \sum_i \omega_i\, p(\theta_i)\, L(y_{jm} \mid \theta_i, \hat{u}_m) \quad \forall j \tag{35}$$

$$p(\theta_i, y_{jm} \mid \hat{u}_m) = p(\theta_i)\, L(y_{jm} \mid \theta_i, \hat{u}_m) \quad \forall i, j \tag{36}$$

$$p(\theta_i, y_{jm} \mid \hat{u}_m) = p(y_{jm} \mid \hat{u}_m)\, p(\theta_i \mid y_{jm}, \hat{u}_m) \quad \forall i, j \tag{37}$$

$$L(y_{jm} \mid \theta_i, \hat{u}_m) = N(y_{jm} \mid \hat{y}_{im}, V) \quad \forall i, j \tag{38}$$

Eqs. (35)–(38) are the discretized form of those expressions related to Bayes' theorem, namely, Eqs. (12), (18), (19), and (20), respectively. The associated integrals have been discretized as a weighted summation over the scenarios (i.e., the evaluation nodes corresponding to the selected Smolyak rule).

$$U = \sum_i \sum_j \omega_i\, \tilde{\omega}_j\, p(\theta_i, y_{jm} \mid \hat{u}_m) \ln\{p(\theta_i \mid y_{jm}, \hat{u}_m)\} \tag{39}$$

The expected Kullback–Leibler divergence is discretized in Eq. (39). Here, ωi and ω̃j denote the weights associated with the quadrature rules used to discretize the model parameters θi and experimental outcomes yjm, respectively. Combining these constructions, an approximation to the optimization problem represented by Eqs. (9)–(17) can be mathematically posed as follows:

$$\max_{\hat{u}_m} U \quad \text{subject to Eqs. (26)–(39)}$$
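Once the scenario nodes and weights are fixed, Eqs. (35)–(39) reduce to array operations. The sketch below evaluates them for a given design; the function and argument names are placeholders for illustration, with a diagonal V as in the case studies.

```python
import numpy as np

def discretized_utility(u_hat, f, theta_nodes, w_theta, p_theta, y_nodes, w_y, V_diag):
    """Eqs. (35)-(39): utility of design u_hat under the scenario discretization.
    theta_nodes: (I, P) parameter scenarios with weights w_theta and prior density p_theta.
    y_nodes: (J, M) outcome scenarios with weights w_y. V_diag: error variances (M,)."""
    y_hat = np.array([f(u_hat, th) for th in theta_nodes])      # (I, M) predictions, Eq. (34)
    resid = y_nodes[None, :, :] - y_hat[:, None, :]             # (I, J, M)
    # Eq. (38): Gaussian likelihood L(y_j | theta_i, u) with diagonal V
    lik = np.exp(-0.5 * np.sum(resid**2 / V_diag, axis=2)) \
          / np.sqrt(np.prod(2 * np.pi * V_diag))
    joint = p_theta[:, None] * lik                              # Eq. (36): p(theta_i, y_j | u)
    p_y = (w_theta[:, None] * joint).sum(axis=0)                # Eq. (35): p(y_j | u)
    post = joint / p_y[None, :]                                 # Eq. (37): p(theta_i | y_j, u)
    # Eq. (39): U = sum_ij w_i w~_j p(theta_i, y_j | u) ln p(theta_i | y_j, u)
    return np.sum(w_theta[:, None] * w_y[None, :] * joint * np.log(post))
```

In a full implementation this evaluation would sit inside the NLP/MINLP solved over û_m; here it only makes explicit how each discretized equation maps to an array operation.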

5. Computational studies

In this section two case studies of increasing complexity are presented to demonstrate the capabilities and limitations of the proposed approach. The problems are drawn from the field of pharmacokinetics. The purpose of these case studies is to characterize the pharmacokinetic profile of an individual by taking blood draws after administration of the drug under study. Each blood draw requires an analysis to determine drug and/or metabolite concentrations. The experimental design problem consists of determining the best timing for the blood draws given a prior probability distribution. The prior probabilities used in the case studies are derived from data collected in previous clinical studies and are constructed as a mixture (i.e., weighted summation) of Gaussian distributions by applying the procedure reported in Laínez et al. (2011).

Fig. 2. Graphical representation of the different regions defining the precision and recall metrics.

Note that if the model postulated to describe the system under analysis is a plausible alternative, we can easily use a discretized version of the prior to propagate the parameter variability and construct a discretized distribution for the experimental outcomes. For both case studies, we assume an additive experimental error which follows a zero-mean Gaussian process with a diagonal covariance matrix V (see Eq. (20)). For the multivariate integration (i.e., scenario selection), sparse grids derived from univariate Gaussian quadrature rules are used (Heiss and Winschel, 2014). A key aspect in optimizing nonlinear problems is to provide an initial feasible solution to the solver. In this regard, the heuristic procedure reported in Laínez et al. (2011) is employed to generate plausible initial solutions. This procedure is based on ranking the modes in the predicted concentration distributions at potential sampling times. First, a pool of potential sampling points is selected. Using the predictive model (Eq. (1)) and the prior distribution of its parameters (p(θ)), the distributions of the predicted concentrations are built for the different potential sampling time points and subsequently ranked according to their modes. All computations have been carried out using GAMS (Rosenthal, 2011) on an Intel i7 2.66 GHz computer with 8 GB of RAM. To demonstrate the performance of the proposed optimal design procedure, we perform a retrospective analysis in which 'exhaustive' designs are compared against the recommended optimal designs. The proposed comparison uses the 95% confidence region of the corresponding predicted profile. The profile under study depends on the specific case. To quantify the performance of the optimal designs, the precision (P) and the recall/sensitivity (R) are evaluated (Fawcett, 2006). Fig. 2 shows a generic graphical representation of the prior distribution and two posterior distributions: one obtained by means of an exhaustive design and one by means of the optimal design. Let us assume that the exhaustive design provides the best available estimate, the baseline, since it is constructed with all available information from experimental observations. Then, four regions are defined: (i) the true positive (TP) region, which both posterior distributions consider as part of the predicted estimation; (ii) the true negative (TN) region, which both posterior distributions do not consider as part of the predicted estimation; (iii) the false positive (FP) region, which is considered as part of the predicted estimation by the optimal design but not by the exhaustive design; and (iv) the false negative (FN) region, which is considered as part of the predicted estimation by the exhaustive design but not by the optimal design. The metrics are calculated from these regions as follows:

$$P = \frac{TP}{TP + FP} \qquad R = \frac{TP}{TP + FN} \tag{40}$$
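The arithmetic of Eq. (40) is immediate once the four region areas have been measured; a minimal helper, with hypothetical areas:

```python
def precision_recall(tp: float, fp: float, fn: float) -> tuple[float, float]:
    """Eq. (40): precision and recall computed from region areas."""
    return tp / (tp + fp), tp / (tp + fn)

p, r = precision_recall(tp=78.0, fp=22.0, fn=0.7)   # illustrative areas
print(f"P = {p:.3f}, R = {r:.3f}")
```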


Table 2. Optimal sampling design for the gabapentin case study for different numbers of samples (M) and accuracy levels.

| θ: accuracy / scenarios | y: accuracy / scenarios | M | Equations | Variables | CPU s | u1 [h] | u2 [h] | u3 [h] | U [nats] |
|---|---|---|---|---|---|---|---|---|---|
| 2 / 209 | 1 / 19 | 1 | 8190 | 8191 | 4.09 | 2.14 | – | – | −2.55 |
| 2 / 209 | 1 / 19 | 2 | 12,389 | 12,391 | 2.31 | 1.65 | 5.95 | – | −1.59 |
| 2 / 209 | 1 / 19 | 3 | 16,588 | 16,591 | 3.09 | 1.42 | 3.63 | 8.32 | −1.58 |
| 3 / 969 | 1 / 19 | 1 | 37,830 | 37,831 | 97.74 | 2.15 | – | – | −2.55 |
| 3 / 969 | 1 / 19 | 2 | 57,229 | 57,231 | 70.48 | 1.64 | 5.94 | – | −1.59 |
| 3 / 969 | 1 / 19 | 3 | 76,628 | 76,631 | 171.40 | 1.38 | 3.62 | 8.55 | −1.58 |
| 2 / 209 | 2 / 209 | 1 | 87,990 | 87,991 | 53.80 | 2.13 | – | – | −2.55 |
| 2 / 209 | 2 / 209 | 2 | 132,089 | 132,091 | 33.56 | 1.65 | 5.98 | – | −1.59 |
| 2 / 209 | 2 / 209 | 3 | 176,188 | 176,191 | 49.51 | 1.38 | 3.63 | 8.57 | −1.58 |

Table 3. Precision (P) and recall (R) resulting from the proposed optimal and the D-optimal design of two blood draws after administration of a single dose of gabapentin.

| Patient | Proposed P | Proposed R | D-optimality P | D-optimality R | Δ P | Δ R |
|---|---|---|---|---|---|---|
| G01 | 98.25 | 99.58 | 49.54 | 96.29 | 48.71 | 3.28 |
| G02 | 77.35 | 99.88 | 37.35 | 97.98 | 40.00 | 1.90 |
| G03 | 55.32 | 99.96 | 53.98 | 99.61 | 1.34 | 0.35 |
| G04 | 74.58 | 97.25 | 69.98 | 97.23 | 4.60 | 0.02 |
| G05 | 85.45 | 100.00 | 90.84 | 99.93 | −5.38 | 0.07 |
| G06 | 98.80 | 98.27 | 49.02 | 96.26 | 49.77 | 2.01 |
| G07 | 66.19 | 100.00 | 66.00 | 99.92 | 0.19 | 0.08 |
| G08 | 65.89 | 99.47 | 82.74 | 99.64 | −16.85 | −0.17 |
| G09 | 99.97 | 97.44 | 24.36 | 98.94 | 75.61 | −1.50 |
| G10 | 72.66 | 100.00 | 57.11 | 99.51 | 15.55 | 0.49 |
| G11 | 99.39 | 99.95 | 51.30 | 95.35 | 48.09 | 4.60 |
| G12 | 98.50 | 97.73 | 96.59 | 98.07 | 1.91 | −0.33 |
| G13 | 57.24 | 100.00 | 49.20 | 100.00 | 8.04 | 0.00 |
| G14 | 54.29 | 93.88 | 80.00 | 97.03 | −25.71 | −3.15 |
| G15 | 72.87 | 99.96 | 73.59 | 99.38 | −0.73 | 0.58 |
| G16 | 88.66 | 100.00 | 45.10 | 96.24 | 43.56 | 3.76 |
| G17 | 75.50 | 100.00 | 47.35 | 98.75 | 28.16 | 1.25 |
| G18 | 79.82 | 99.26 | 34.30 | 94.85 | 45.51 | 4.41 |
| G19 | 60.92 | 100.00 | 69.20 | 99.56 | −8.28 | 0.44 |
| Average | 77.98 | 99.09 | 59.35 | 98.13 | 18.64 | 0.95 |

The precision (P) of the optimal sampling indicates the probability that a prediction within its 95% confidence region is part of the baseline, while the recall (R) indicates the probability that a prediction that is part of the baseline is included within the 95% confidence region associated with the optimal sampling. The performance metrics are evaluated for the proposed approach as well as for the D-optimal design. The results are then contrasted and discussed. To implement the D-optimal design, the R package PFIM (Retout et al., 2007) has been utilized. The Fedorov–Wynn algorithm (Fedorov, 1972) is selected for the D-optimization, which evaluates the determinant of the FIM within a finite number of potential experimental designs.

5.1. Gabapentin pharmacokinetics

Gabapentin is an anticonvulsant that is prescribed to treat epilepsy and other neuropathic disorders. The data used to build the prior distribution were obtained from 19 patients enrolled in the clinical study carried out by Urban et al. (2008). The model selected for predicting plasma concentrations is a single dose one compartment model with first order absorption. The integrated form of this model is as follows:

$$\hat{C}_b(t) = \text{Dose} \cdot \frac{F}{V} \cdot \frac{k_a}{k_a - k_e} \left[\exp(-k_e (t - t_o)) - \exp(-k_a (t - t_o))\right]$$

where F [%] is the bioavailability (i.e., the fraction of the dose that reaches the blood stream), V [mL] is the volume of distribution, Dose [mg] is the administered dose, ke [h−1] is the elimination rate, ka [h−1] is the absorption rate, to [h] is the lag time, and t [h] is the time after administration. The control variable, as aforementioned, is t. The dose amount is fixed at a value of 400 mg. Notice that the predicted concentration Ĉb(t) is equal to zero for all t < to. The unknown parameters (θ) are (F/V) [mL−1], ke, ka, and to. In this case there is no need for the orthogonal collocation approximation since the model is expressed in closed algebraic form. The marginal prior distributions for each of the unknown parameters are shown in Fig. 3. Fig. 4 depicts the joint density plot of two unknown parameters: the absorption rate (ka) and the apparent volume (V/F). These figures exhibit a prior distribution that does not follow a normal distribution as commonly assumed. The prior is a multimodal distribution which is modeled using a mixture composed of 19 Gaussian distributions. The parameters defining each component of the prior distribution are provided in the Supplemental Material. It is noteworthy that, given the multimodality of the prior distribution, an appropriate approximation would not be obtained by using a single sparse grid for all of its components. Consequently, we construct the approximation employing a sparse grid for every component. Table 2 lists the results of applying the proposed approach to this problem and the corresponding model size. It presents the results for (i) different numbers of experiments, given by M, the cardinality of set m (Eq. (32)), and (ii) different accuracy levels. With regard to the accuracy levels, as accuracy increases it is evident that the results converge to the same optimal utility (U) value.
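For reference, a direct transcription of the closed-form model under the stated conventions (concentration zero before the lag time); the parameter values below are placeholders rather than estimates from the study's prior.

```python
import numpy as np

def gabapentin_conc(t, F_over_V, ke, ka, to, dose=400.0):
    """One-compartment, first-order absorption model for plasma concentration.
    t and to in hours; F_over_V in 1/mL; dose in mg."""
    t = np.asarray(t, dtype=float)
    c = dose * F_over_V * ka / (ka - ke) * (
        np.exp(-ke * (t - to)) - np.exp(-ka * (t - to)))
    return np.where(t < to, 0.0, c)                 # C_b(t) = 0 for t < to

# Illustrative parameter set (not from the paper's prior):
print(gabapentin_conc([1.65, 5.98], F_over_V=2.5e-5, ke=0.12, ka=0.9, to=0.4))
```

Evaluating the function at candidate sampling times for each prior draw yields the predictions ŷ that enter the likelihood of Eq. (38).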


Fig. 3. Marginal distributions for the unknown parameters included in the prior distribution of the gabapentin pharmacokinetic model.


Fig. 4. Prior pairwise density plot for gabapentin unknown parameters ln(F/V) and ln(ka ).

To determine the number of points to use in the optimal design, one should bear in mind that an extra experiment should only be added if it provides a significant gain in information. Hereafter, we use the last case presented in Table 2 for this exercise (accuracy level value of two for both unknown sets of parameters θ and y). The optimal values of the utility function U, which has been simplified to Eq. (8), are −2.55, −1.59, and −1.58 nats for an optimal design of one, two and three experiments, respectively. One can see that there is a significant gain in executing two experiments versus one; however, there is only a marginal gain in adding a third experiment compared to using two. As a result, two experiments at 1.65 h and 5.98 h after drug administration are recommended to characterize the gabapentin pharmacokinetic profile of an individual. The sizes of the nonlinear programs (NLP) generated are also reported in Table 2. They were solved using CONOPT3 (Drud, 2011). The CPU time required to solve the resulting models is listed in Table 2 as well. To demonstrate the performance of the proposed solution, we present a retrospective analysis using the data collected by Urban et al. (2008) for 19 patients (G01–G19). Given the retrospective nature of this exercise, the suggested optimal timing of the

Fig. 5. Comparison of the predicted 95% confidence region for the concentration–time profile of gabapentin after the administration of a single dose of 400 mg: (a) G06; (b) G11. The baseline is shown as the shaded region (in gray), while the proposed and the D-optimal design predictions are shown as the regions within continuous and dashed lines (in blue), respectively. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)

experiments does not coincide with those included in the Urban et al. study. However, samples after 1.5 and 6.0 h are part of their study and these sampling times are close enough to the recommended optimal design. As a matter of fact, a utility function value of −1.60 nats results from using this quasi-optimal design (i.e., an optimality gap of just 0.63%). We then compare the complete serial sampling of 14 blood draws, which is used as baseline, against the quasi-optimal design of two blood draws. The comparison is carried out based on the 95% confidence region of the predicted concentration–time profile after the administration of a single dose, as previously discussed. In the second and third columns of Table 3 the quantitative results of this comparison are listed for the 19 patients. Note that the precision ranges from 54.29% for patient G14 to 99.97% for patient G09, whereas the recall ranges from 93.88% for patient G14 to 100% for seven patients. This indicates that the 95% confidence region obtained from the proposed approach includes most of the baseline predictions; however, in some cases close to 50% of the enclosed predictions do not belong to the studied patient (e.g., G13 and G14). Using the proposed design the precision and


recall are on average 78.0% and 99.1%, respectively. The 'cost' of following the optimal sampling design is the average loss of 22.0% and 0.9% in precision and recall, respectively. However, this design also permits a reduction in the number of blood samples that must be drawn from a patient and then analyzed from 14 to 2, a saving of 86%. The D-optimal design for two blood draws has also been computed using the Fedorov–Wynn algorithm. In this case, potential designs are built using the actual sampling times that were collected in the clinical study. The determinant of the FIM is evaluated for 66 different potential designs. The D-optimal design is to draw blood samples 2 h and 12 h after the drug administration. Table 3 also presents the performance of the D-optimal design in the fourth and fifth columns. The precision ranges from 24.36% for patient G09 to 96.59% for patient G12, while the recall ranges from 94.85% for patient G18 to 100% for only one patient, G13. On average, a precision and recall of 59.35% and 98.13% is obtained by using the D-optimal design. The last two columns of Table 3 show the differences in performance between the proposed approach and the D-optimal design for every patient. It is noteworthy that there is a significant advantage in the design performance by using the proposed approach. The stochastic programming formulation outperforms the D-optimality by yielding an increase of 31% in precision. Fig. 5 shows a graphical comparison for two patients, G06 and G11.

5.2. Cyclophosphamide pharmacokinetics

This case study deals with the pharmacokinetics of cyclophosphamide (CY), which is a therapeutic agent used for treating some forms of cancer. CY is activated in the body to hydroxycyclophosphamide (HCY), which is then metabolized to carboxyethylphosphoramide mustard (CEPM) (McDonald et al., 2003). CY has a complex metabolism whose most relevant pathways can be represented by a four compartment model. We employ the data published by Salinger et al. (2006) to build the prior distribution and for the retrospective performance evaluation. This data was collected by administering an initial intravenous infusion of 45 mg CY per kg body weight (Dose1) during approximately 1 h to each individual. The pharmacokinetic model is described by the following system of nonlinear differential equations:

$$\frac{d\hat{A}_1(t)}{dt} = k_o - \frac{CL_{non}}{V_{cy}}\,\hat{A}_1(t) - \frac{CL_{ind}}{V_{cy}}\,\hat{A}_1(t)\hat{A}_2(t)$$

$$\frac{d\hat{A}_2(t)}{dt} = k_{enz}\left(1 + \frac{(E_{max}/V_{cy})\,\hat{A}_1(t)}{EC_{50} + \hat{A}_1(t)/V_{cy}}\right) - k_{enz}\hat{A}_2(t)$$

$$\frac{d\hat{A}_3(t)}{dt} = \frac{CL_{ind}}{V_{cy}}\,\hat{A}_1(t)\hat{A}_2(t) - k_{hcy}\hat{A}_3(t) - k_{34}\hat{A}_3(t)$$

$$\frac{d\hat{A}_4(t)}{dt} = k_{34}\hat{A}_3(t) - k_{cepm}\hat{A}_4(t)$$

$$k_o = \begin{cases} \dfrac{\text{Dose}_1}{1\,\text{h}}, & \text{if } 0\,\text{h} \leq t \leq 1\,\text{h} \\ 0, & \text{otherwise} \end{cases}$$

$$\hat{A}_1(0) = 0; \quad \hat{A}_2(0) = 1; \quad \hat{A}_3(0) = 0; \quad \hat{A}_4(0) = 0$$

where Â1(t) [µmol/kg], Â3(t) [µmol/kg], and Â4(t) [µmol/kg] are the amounts of CY, HCY, and CEPM, respectively, and Â2(t) [adim.] is the amount of an autoinducible enzyme. ko represents the rate of infusion. CLnon [L/(h kg)] and CLind [L/(h kg)] are the noninducible and inducible clearances of CY, respectively; whereas kenz [h−1], khcy [h−1] and kcepm [h−1] are the elimination rate constants of the enzyme, HCY and CEPM, respectively. k34 [h−1] is the conversion rate from HCY to CEPM. These six parameters constitute the unknowns (θ).
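A numerical sketch of this system using scipy's initial value solver; the rate constants below are arbitrary placeholders (the paper's prior is given in its Supplementary Material), and Dose1 is expressed in µmol/kg.

```python
import numpy as np
from scipy.integrate import solve_ivp

V_CY, EMAX, EC50 = 0.694, 5.0, 0.6      # fixed constants from the text

def cy_rhs(t, A, dose1, CLnon, CLind, kenz, khcy, kcepm, k34):
    A1, A2, A3, A4 = A
    ko = dose1 / 1.0 if 0.0 <= t <= 1.0 else 0.0    # 1 h infusion
    dA1 = ko - (CLnon / V_CY) * A1 - (CLind / V_CY) * A1 * A2
    dA2 = kenz * (1 + (EMAX / V_CY) * A1 / (EC50 + A1 / V_CY)) - kenz * A2
    dA3 = (CLind / V_CY) * A1 * A2 - khcy * A3 - k34 * A3
    dA4 = k34 * A3 - kcepm * A4
    return [dA1, dA2, dA3, dA4]

theta = dict(CLnon=2.0, CLind=2.5, kenz=0.1, khcy=5.0, kcepm=0.2, k34=1.0)  # placeholders
sol = solve_ivp(cy_rhs, (0.0, 16.0), [0.0, 1.0, 0.0, 0.0],
                args=(170.0, *theta.values()), max_step=0.05, rtol=1e-6)
cy_conc = sol.y[0] / V_CY               # plasma CY concentration
```

The small max_step keeps the discontinuous infusion term resolved; in the paper's formulation the same system is instead discretized by orthogonal collocation (Eqs. (27)–(31)).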

Parameters Vcy [L/kg], Emax [adim.], and EC50 [µmol/L] are assumed constant at 0.694 L/kg, 5, and 0.6 µmol/L, respectively. Emax and EC50 are enzyme synthesis parameters. Vcy is the apparent volume of distribution of CY. Plasma concentrations are predicted by dividing the amounts by their respective volumes of distribution. The HCY and CEPM volumes (Vhcy, Vcepm) are assumed equal to 0.013 L/kg. In the Supplementary Material, the information about the prior distribution used in this case study is provided. The marginal prior distributions for each of the six unknown parameters are shown in Fig. 6. Fig. 7 depicts the joint density of two unknown parameters: CLnon and khcy. These figures exhibit prior distributions that are of single mode, but clearly do not follow a normal distribution as commonly assumed. Table 4 presents the results of applying the proposed approach to determine the optimal sampling times. It shows the results for (i) different numbers of experiments (M), and (ii) different accuracy levels. The sizes of the resulting MINLPs are also listed in this table. They were optimized using the solver SBB (ARKI Consulting & Development, 2011). Additional computational resources and/or a decomposition strategy are required to run the optimization for higher accuracy levels. To choose the experimental design, we employ the results corresponding to an accuracy level of two for both uncertain parameter sets. In order to determine the number of experiments, we again check the objective function gain that is achieved by adding a new experiment. The values of the objective function are −363.4 × 10−4, −6.19 × 10−4, and −0.16 × 10−4 nats for one, two and three experiments, respectively. By comparison to one experiment, the utility function is improved by 98.30% by using a second experiment, and an extra 1.65% is gained by executing a third experiment. Consequently, two experiments are recommended at 2.0 and 6.7 h after the end of the CY infusion. It is noteworthy that even though the resulting optimal experiments are different for the tested accuracy levels, the value of the utility function converges. For instance, a value of −6.17 × 10−4 nats is accomplished by sampling at 2.0 and 6.0 h as suggested by the solution corresponding to an accuracy level value of one. Moreover, a value of −0.17 × 10−4, virtually the same as that resulting from an accuracy level equal to two and the use of three experiments, is obtained by sampling at 2.0, 4.0 and 10.0 h. The focus of convergence is on the objective function, since the possibility of local optimality exists. However, given an appropriate initial solution the optimization procedure is capable of suggesting an improved solution, which indeed is of interest for practical purposes. To demonstrate the performance of the proposed design, we again present a retrospective analysis using the data published by Salinger et al. (2006) for 20 individuals (C01–C20). The same difficulty as in the gabapentin case is faced with this problem: Salinger et al.'s published data do not include the recommended optimal sampling points. The closest combination of sampling times is 2.0 and 8.0 h, which yields a utility function value of −6.21 × 10−4 nats, i.e., an optimality gap of 0.45%. We compare the serial sampling of five or six blood draws, which is used as baseline, against the quasi-optimal design of two blood draws (2.0 and 8.0 h after infusion).
The comparison is carried out based on the 95% confidence region for the area under the concentration–time curve (AUC) of the two relevant metabolites, HCY and CEPM. In Table 5 the results of this comparison are quantified for the entire trial involving the 20 patients. The precision ranges from 43.59% for individual C06 to 99.74% for individual C19, while the recall ranges from 96.42% for individual C06 to 99.97% for individual C02. Consequently, the suggested design provides on average precision and recall values of 87.59% and 99.23%, respectively. However, it also allows a reduction in the number of samples from 5 or 6 to 2 while still acquiring very similar information. For comparison purposes, the D-optimal design has been computed by using the Fedorov–Wynn algorithm.


Fig. 6. Marginal distributions for the unknown parameters included in the prior distribution of the CY pharmacokinetic model.

Fig. 7. Prior pairwise plots for the unknown parameters CLnon and khcy of the cyclophosphamide pharmacokinetic model.

Table 4. Optimal sampling design for the CY case study for different numbers of samples (M) and accuracy levels.

| θ: accuracy / scenarios | y: accuracy / scenarios | M | Equations | Cont. Var. | Bin. Var. | CPU s | u1 [h] | u2 [h] | u3 [h] | U [nats] |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 / 20 | 1 / 20 | 1 | 7542 | 7062 | 6 | 1.76 | 3.93 | – | – | −3.62 × 10−2 |
| 1 / 20 | 1 / 20 | 2 | 8423 | 7943 | 12 | 1.95 | 2.00 | 6.00 | – | −6.17 × 10−4 |
| 1 / 20 | 1 / 20 | 3 | 9304 | 8824 | 18 | 2.20 | 2.00 | 4.01 | 10.00 | −1.66 × 10−5 |
| 2 / 260 | 2 / 260 | 1 | 285,222 | 278,982 | 6 | 373.31 | 3.82 | – | – | −3.63 × 10−2 |
| 2 / 260 | 2 / 260 | 2 | 421,463 | 415,223 | 12 | 264.49 | 2.00 | 6.68 | – | −6.19 × 10−4 |
| 2 / 260 | 2 / 260 | 3 | 557,704 | 551,464 | 18 | 160.13 | 2.00 | 6.00 |  | −1.62 × 10−5 |
| 3 / 1460 | 3 / 1460 | 1 | 6,842,262 | 6,807,222 | 18 | >3600 | 17.14 | – | – |  |
designs are built based on the sampling times that were used in the clinical study. The determinant of the FIM is evaluated for 21 different potential designs. The resulting D-optimal design is to collect blood samples 2 h and 16 h after starting the infusion. Table 5 also lists the performance of the D-optimal design. Its precision ranges from 41.30% for patient C05 to 97.57% for patient C19, while the recall ranges from 98.40% for patient C18 to 100% for six patients. On average, a precision and recall value equal to 88.80% and 99.47% is obtained by using the D-optimal design. The last two columns of

CPU s Bin. Var.

>3600

Sampling times [h]

U [nats]

17.14



Table 5 show the differences in performance between the proposed approach and the D-optimal design for every patient. It is noteworthy that even for this case, in which the prior distribution is not multimodal, an improvement in the design performance can be achieved by using the proposed approach, thus demonstrating the impact of nonlinearities on the underlying assumptions of the D-optimal design. The stochastic programming formulation is still able to provide an increase of 8.40% in precision in comparison with the D-optimal design, while decreasing the recall metric by a very


Table 5. Precision and recall resulting from the proposed optimal and the D-optimal design for the CY case study.

| Patient | Proposed P | Proposed R | D-optimality P | D-optimality R | Δ P | Δ R |
|---|---|---|---|---|---|---|
| C01 | 82.11 | 99.89 | 84.28 | 100.00 | −2.18 | −0.11 |
| C02 | 86.29 | 99.97 | 85.89 | 99.66 | 0.40 | 0.31 |
| C03 | 96.29 | 99.50 | 95.62 | 99.65 | 0.67 | −0.15 |
| C04 | 98.90 | 99.23 | 69.43 | 99.32 | 29.46 | −0.09 |
| C05 | 98.73 | 97.98 | 41.30 | 100.00 | 57.43 | −2.02 |
| C06 | 43.59 | 96.42 | 87.14 | 99.00 | −43.55 | −2.58 |
| C07 | 88.42 | 98.53 | 92.29 | 99.60 | −3.87 | −1.07 |
| C08 | 89.07 | 99.61 | 89.82 | 99.85 | −0.74 | −0.24 |
| C09 | 77.77 | 99.79 | 68.96 | 99.31 | 8.82 | 0.48 |
| C10 | 93.50 | 99.62 | 69.71 | 100.00 | 23.79 | −0.38 |
| C11 | 90.71 | 99.61 | 92.15 | 99.17 | −1.44 | 0.45 |
| C12 | 93.34 | 99.89 | 68.54 | 100.00 | 24.80 | −0.11 |
| C13 | 95.15 | 99.59 | 84.42 | 98.99 | 10.73 | 0.60 |
| C14 | 90.74 | 98.90 | 86.15 | 98.90 | 4.59 | 0.00 |
| C15 | 86.27 | 99.89 | 77.52 | 99.69 | 8.75 | 0.20 |
| C16 | 83.00 | 99.38 | 84.74 | 99.69 | −1.74 | −0.32 |
| C17 | 77.83 | 99.36 | 73.05 | 98.98 | 4.78 | 0.38 |
| C18 | 93.08 | 99.01 | 86.33 | 98.40 | 6.74 | 0.62 |
| C19 | 99.74 | 98.59 | 97.57 | 99.22 | 2.17 | −0.63 |
| C20 | 87.21 | 99.81 | 81.06 | 100.00 | 6.15 | −0.19 |
| Average | 87.59 | 99.23 | 80.80 | 99.47 | 6.79 | −0.24 |

Fig. 8. Comparison of the 95% confidence region of the AUC [µmol h/kg] of HCY and CEPM for both optimal designs. The baseline is shown as a black line, while the suggested optimal sampling is represented as a blue (lighter) line. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)


small percentage (0.24%). Fig. 8 shows a graphical comparison for three patients, C04, C05 and C06, for both approaches.

6. Concluding remarks

This work proposes a stochastic programming approach to address the optimal Bayesian experimental design of nonlinear systems. Instead of using a Gaussian approximation to this problem, we suggest the implementation of an approximation which is based on using sparse grids for multivariate integration and the orthogonal collocation approach. This framework accommodates in a straightforward fashion the two sources of uncertainty arising in this problem: (i) the unknown parameters of the predictive model, and (ii) the experimental outcomes. Moreover, the formulation proposed in this work has general relevance beyond the domain of the reported case studies. It can deal with very general models which may be comprised of a system of differential-algebraic equations. However, one of the limitations of the proposed approach is the size of the NLP/MINLP models that are generated. We have shown that useful results can be obtained using low accuracy levels. For a model with six unknown correlated parameters and a set of four differential equations, the problem could be solved within reasonable time (≈160 CPU s). Evidently, to deal with problems demanding higher accuracy or involving a significantly greater number of unknown parameters, a decomposition strategy may be required (see the last row in Table 4). In this work we have solved the problem to local optimality. However, to be rigorous, a global optimizer should be used. We have taken advantage of the heuristic procedure reported in Laínez et al. (2011) to generate plausible initial solutions. This is a recommended strategy, which can select initial designs based on deriving predictive distributions for the metrics of interest by using the available prior distribution. With respect to the performance of the proposed approach, we have demonstrated that the suggested optimal designs have the potential of considerably reducing the cost of the experimentation, while providing virtually the same information as an exhaustive design. Additionally, the proposed approach has been compared to the D-optimal design. The results from this comparison, particularly the cyclophosphamide case study, emphasize the fact that it is not only non-normal priors which may induce significant deviations from the underlying normality assumptions of the D-optimal design, but also the nonlinearities in the model employed to describe the system under investigation. The proposed approach is capable of capturing the implications of non-normal posteriors in the computation of the optimal design. This is achieved by utilizing a more appropriate representation of the uncertainty associated with the two unknown components of this problem: the model parameters and the collected data. It should be noted that although the case studies discussed in this work focus on selecting the time of sampling, other experimental design variables (u) can be treated using the formulation presented in Section 3.2. The predictive model given in Eqs. (14)–(17) represents a general system of differential-algebraic equations and thus covers a wide range of potential applications, including systems of partial differential equations (PDEs). This generalization is possible through further application of orthogonal collocation to approximate the spatial dimensions of the PDEs along with the use of the Smolyak rule.

Acknowledgements

We gratefully acknowledge support from the United States National Science Foundation (Grant NSF-CBET-0941302), and the


collaboration with colleagues Gary Blau and Seza Orçun in this area of research. We also would like to thank the University of California, San Francisco for providing the data used in the gabapentin case study.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.compchemeng.2014.06.006.

References

Alaña JE, Theodoropoulos C. Optimal spatial sampling scheme for parameter estimation of nonlinear distributed parameter systems. Comput Chem Eng 2012;45:38–49.
ARKI Consulting & Development. SBB. Bagsvaerd, Denmark: ARKI Consulting & Development A/S; 2011.
Barthelmann V, Novak E, Ritter K. High dimensional polynomial interpolation on sparse grids. Adv Comput Math 2000;12:273–88.
Bialecki B, Fairweather G. Orthogonal spline collocation methods for partial differential equations. J Comput Appl Math 2001;128(1–2):55–82.
Bishop CM. Pattern recognition and machine learning. New York, NY: Springer Science; 2006.
Blau G, Lasinski M, Orcun S, Hsu S-H, Caruthers J, Delgass N, et al. High fidelity mathematical model building with experimental data: a Bayesian approach. Comput Chem Eng 2008;32(4–5):971–89.
Bungartz H-J, Griebel M. Sparse grids. Acta Numer 2004;13:147–269.
Chaloner K, Verdinelli I. Bayesian experimental design: a review. Stat Sci 1995;10(3):273–304.
Chernoff H. Locally optimal designs for estimating parameters. Ann Math Stat 1953;24(4):586–602.
Chung SH, Ma DL, Braatz RD. Optimal model-based experimental design in batch crystallization. Chemom Intell Lab Syst 2000;50(1):83–90.
Cuthrell JE, Biegler LT. On the optimization of differential-algebraic process systems. AIChE J 1987;33(8):1257–70.
Davis PJ, Rabinowitz P. Methods of numerical integration. 2nd ed. Mineola, NY: Dover Publications; 2007.
Drud A. CONOPT. Bagsvaerd, Denmark: ARKI Consulting and Development A/S; 2011.
Fawcett T. An introduction to ROC analysis. Pattern Recognit Lett 2006;27(8):861–74.
Fedorov V. Optimal experimental design. Wiley Interdiscip Rev Comput Stat 2010;2(5):581–9.
Fedorov VV. Theory of optimal experiments. Academic Press; 1972.
Gagnon R, Leonov S. Optimal population designs for PK models with serial sampling. J Biopharm Stat 2005;15:143–63.
Galvanin F, Ballan CC, Barolo M, Bezzo F. A general model-based design of experiments approach to achieve practical identifiability of pharmacokinetic and pharmacodynamic models. J Pharmacokinet Pharmacodyn 2013;40:451–67.
Heiss F, Winschel V. Likelihood approximation by numerical integration on sparse grids. J Econom 2008;144(1):62–80.
Heiss F, Winschel V. Quadrature on sparse grids: code to generate and readily evaluated nodes and weights; 2014, January. http://www.sparse-grids.de/
Kennedy M, O'Hagan A. Bayesian calibration of computer models. J R Stat Soc B 2001;63:425–64.
Laínez JM, Blau G, Mockus L, Orçun S, Reklaitis GV. Pharmacokinetic based design of individualized dosage regimens using a Bayesian approach. Ind Eng Chem Res 2011;50(9):5114–30.
Laínez-Aguirre J, Reklaitis G. A stochastic optimization approach for the design of individualized dosage regimens. AIChE J 2013;59(9):3296–307.
Lindley DV. On a measure of the information provided by an experiment. Ann Math Stat 1956;27:986–1005.
McDonald GB, Slattery JT, Bouvier ME, Ren S, Batchelder AL, Kalhorn TF, et al. Cyclophosphamide metabolism, liver toxicity, and mortality following hematopoietic stem cell transplantation. Blood 2003;101(5):2043–8.
Navarro-Laboulais J, Cardona S, Torregrosa J, Abad A, López F. Practical identifiability analysis in dynamic gas–liquid reactors: optimal experimental design for mass-transfer parameters determination. Comput Chem Eng 2008;32(10):2382–94.
Rasch A, Bücker HM, Bardow A. Software supporting optimal experimental design: a case study of binary diffusion using EFCOSS. Comput Chem Eng 2009;33(4):838–49.
Rasmussen CE, Williams CKI. Gaussian processes for machine learning. Cambridge, MA: MIT Press; 2006.
Retout S, Comets E, Samson A, Mentré F. Design in nonlinear mixed effects models: optimization using the Fedorov–Wynn algorithm and power of the Wald test for binary covariates. Stat Med 2007;26(28):5162–79.
Rosenthal RE. GAMS – a user's guide. Washington, DC: GAMS Development Corporation; 2011.
Salinger DH, McCune JS, Ren AG, Shen DD, Slattery JT, Phillips B, et al. Real-time dose adjustment of cyclophosphamide in a preparative regimen for hematopoietic cell transplant: a Bayesian pharmacokinetic approach. Clin Cancer Res 2006;12(16):4888–98.
Sedrati Y, Cabassud M, Lann ML, Casamatta G. Sequential experimental design strategy for kinetic parameters estimation. Comput Chem Eng 1999;23(Suppl.):S427–30.
Sharif K, Rahman M, Azmir J, Mohamed A, Jahurul M, Sahena F, et al. Experimental design of supercritical fluid extraction: a review. J Food Eng 2014;124:105–16.
Urban T, Brown C, Castro R, Shah N, Mercer R, Huang Y, et al. Effects of genetic variation in the novel organic cation transporter, OCTN1, on the renal clearance of gabapentin. Clin Pharmacol Ther 2008;83(3):416–21.