Estimation and identification in batch processes with particle filters

Zhonggai Zhao a,c,∗, Aditya Tulsyan b, Biao Huang c, Fei Liu a

a Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), Jiangnan University, Wuxi 214122, China
b Department of Chemical Engineering, Massachusetts Institute of Technology, USA
c Department of Chemical and Materials Engineering, University of Alberta, Edmonton, AB T6G 2G6, Canada

Journal of Process Control 81 (2019) 1–14. https://doi.org/10.1016/j.jprocont.2019.05.019

∗ Corresponding author at: Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), Jiangnan University, Wuxi 214122, China. E-mail address: [email protected] (Z. Zhao).

Article history: Received 13 August 2017; Received in revised form 2 May 2019; Accepted 30 May 2019.

Keywords: Estimation; Identification; Batch process; Real-time; Bayesian; Two-dimensional; Dual particle filters

Abstract

This paper deals with the problem of real-time estimation and identification in batch processes represented by stochastic nonlinear state-space models (SSMs). To address this, the two problems – estimation and identification – are formulated under a common Bayesian filtering framework. In contrast to a continuous process, a batch process is typically operated for a relatively short duration, which limits the number of measurements available in a campaign to effectively solve the filtering problem. To overcome the limitations of existing solutions, we propose a two-dimensional dual particle filtering algorithm to solve the nonlinear filtering problem. The proposed algorithm performs filtering both along the time and the batch directions, which allows it to combine measurements from multiple short campaigns for improved estimation and identification results. The efficacy of the proposed method is demonstrated on two simulation examples. © 2019 Elsevier Ltd. All rights reserved.

1. Introduction

Batch processes are ubiquitous in the process industries and have found applications in the production of specialty chemicals, polymers, and pharmaceuticals [47,48]. Compared to a continuous process, a batch process is typically used in situations where: (a) the production volumes are low; (b) isolation is required for reasons of sterility or safety; or (c) the materials involved are difficult to handle. With the recent rising trend of building small, flexible, and modular plants close to the markets of consumption, there has been a renewed interest in batch processing [1]. Batch processing typically requires operation over a range of conditions, which leads to nonstationary process operation and a highly nonlinear process model representation [49–51]. Owing to their unique operations, real-time estimation and identification of batch process models are critical to ensure that the performance of advanced model-based control systems (e.g., model predictive control) stays optimal [52,53]. For a batch process represented by a state-space model (SSM) – a widely used class of time-series models – the estimation and identification problems are equivalent to the state and parameter estimation problems, respectively. While the two problems have conventionally been viewed as two separate problems,


it is well known that the performance of a state estimation algorithm is contingent on the quality of the identified model. The conventional strategy for state estimation under unknown parameters is a two-step process: first identify the process parameters in an offline setting, and then use the identified model for real-time state estimation. In this sequential approach, the identification step is typically the most difficult. Existing developments in both Bayesian and likelihood-based methods for identification of nonlinear SSMs are briefly reviewed here; a detailed exposition of identification methods can be found in [2,3].

Maximum likelihood (ML) constitutes an important class of methods for identification of nonlinear SSMs. An ML method maximizes the probability, or likelihood, of the sampled data over the parameter space. A gradient method is typically used to identify the optimal model. While a gradient method is relatively simple to implement for a linear SSM with Gaussian noise [4], or for a finite SSM [5], it is generally nontrivial to use for nonlinear SSMs, because no analytical solutions for the likelihood or its gradients exist in that case [6]. The problem is further exacerbated because computing the likelihood or the gradients requires knowledge of the hidden states. In [6,7] the authors propose a simulation-based method to approximate the gradient-based ML method for nonlinear SSMs under hidden states; however, the method is known to scale poorly to high-dimensional problems [2,8]. Expectation maximization (EM) is an alternative ML-based method for identification of nonlinear SSMs [9,10]. The key theoretical advantage of using an ML-based estimator, such as a gradient [6,7] or the EM algorithm


[11,12], is that these methods yield asymptotically efficient estimates; however, in situations where the likelihood function is non-convex in the model parameters, the use of numerical methods yields locally optimal (or biased) estimates [8,13] or requires careful tuning of the algorithm parameters [2]. Furthermore, the EM method is not amenable to real-time estimation and identification [3].

Bayesian methods are a popular class of methods for real-time estimation and identification in nonlinear SSMs [3,14,15]. Under the Bayesian framework, a customary approach is to assume the model parameters to be random, as opposed to deterministic (as considered under the ML framework) [54]. The first Bayesian approach augments the random parameters with the hidden states to form an extended state vector [14]. This formulates the estimation and identification problems simultaneously under a single filtering framework. Once set up, any filtering method, such as extended Kalman filters, unscented Kalman filters, or particle filters, can be employed to solve the filtering problem [3]. Alternatively, another strategy is to employ dual particle filters [16,17], where one particle filter is used for state estimation and another for parameter estimation. In dual particle filters, estimation and identification are achieved through iterations between the two filters.

While the aforementioned ML and Bayesian methods were originally developed for estimation and identification in continuous processes, not much work has been done to extend their use to batch processes. This is because batch processes have complex operations, which invalidate any direct application of continuous methods to batch processes. For example, compared to continuous processes that typically run on time scales lasting several months or years, batch processes operate iteratively in batches, with each campaign lasting a few days or weeks. Of course, one can choose to ignore this distinctive feature of a batch operation and continue using continuous methods for batch processes – as in [15], where the author used particle filters for estimation and identification in batch processes by treating campaigns as individual continuous processes. Though plausible, from an estimation perspective the performance of continuous methods is limited by the short duration of a batch campaign, as the measurements sampled within a campaign are few. Recall that most of the existing Bayesian and ML methods are asymptotic methods that rely on a continuous stream of measurements. In other words, the continuous methods are guaranteed to asymptotically converge to the true parameters only if the process is run long enough (akin to letting the number of measurements within a campaign go to infinity). Since a batch campaign is typically run for a short duration, it is difficult for continuous methods to converge within the campaign time. Furthermore, the nonlinear, non-Gaussian, and nonstationary operations of a batch process further limit the quality of the estimates obtained with continuous methods.

This paper deals with the problem of real-time estimation and identification in batch processes represented by nonlinear SSMs. Despite previous attempts to solve this problem, including [18,19], existing methods fail to fully utilize the information contained in batch data.
For example, in [15], the parameters and states are estimated simultaneously by dual particle filters; however, the estimation is limited to within a batch campaign, and the continuity of the algorithm is guaranteed only along the time dimension. In [19], the authors perform estimation along both the time and batch directions, but identification is performed only along the batch direction. In this paper, we propose a two-dimensional (2D) dual particle filter that allows for real-time estimation and identification of batch processes. To the best of the authors' knowledge, the proposed method is the first algorithm that fully exploits the dynamics of a batch process across both the time and batch directions to deliver high-quality state and parameter estimates.

The proposed method maintains continuity in estimation by carefully linking the terminal state of a batch campaign to the next campaign. This allows the proposed 2D dual particle filter to smoothly transition from one campaign to the next. Furthermore, since the identification and estimation are performed in real-time, the proposed method can quickly track changes in time-varying parameters and ensure that the quality of the state estimates remains optimal across campaigns.

The remainder of this paper is organized as follows. The problem description is given in Section 2. Recursive state estimation is discussed in Section 3. Recursive state and parameter estimation is discussed in Section 4. Section 5 proposes the two-dimensional dual particle filtering method for estimation and identification in batch processes. Section 6 demonstrates the efficacy of the proposed method on a simple numerical example and a beer fermentation process. Conclusions are presented in Section 7.

2. Problem description

Consider a batch process campaign represented by the following stochastic nonlinear state-space model (SSM):

Model 1. Batch process model

X_{i,0} ∼ p(x_{i,0}),  (1a)
X_{i,t} = f(X_{i,t−1}, U_{i,t−1}, θ) + W^x_{i,t−1},  (1b)
Y_{i,t} = h(X_{i,t}, U_{i,t}, θ) + V^y_{i,t},  (1c)

where i ∈ N denotes the batch or campaign index; X_{i,t} ∈ R^{n_x} describes a discrete-time, first-order Markov state process characterized by an initial density X_{i,0} ∼ p(x_{i,0}) and a transition density defined by the state transition function f: R^{n_x} × R^{n_u} × R^{n_θ} → R^{n_x}; the control variables and model parameters are denoted by U_{i,t} ∈ R^{n_u} and θ ∈ R^{n_θ}, respectively. While the process states {X_{i,t}}_{i,t∈N} are assumed to be hidden or latent, they are indirectly observed through the measurement process, denoted by Y_{i,t} ∈ R^{n_y}. The measurement process {Y_{i,t}}_{i,t∈N} is assumed to be conditionally independent given {X_{i,t}, U_{i,t}}_{i,t∈N}, and is characterized by a conditional marginal density described by h: R^{n_x} × R^{n_u} × R^{n_θ} → R^{n_y}. Finally, W^x_{i,t} ∈ R^{n_x} and V^y_{i,t} ∈ R^{n_y} are the state and measurement noise sequences, respectively. Model 1 assumes that a campaign i ∈ N runs for t = 0, 1, ..., T_i, where T_i ∈ N is the final time of campaign i before the process transitions to campaign i + 1, where it runs for t = 0, 1, ..., T_{i+1}. Model 1 represents typical batch behavior with multiple campaign runs. Next we make several assumptions on Model 1.

Assumption 1. {W^x_{i,t}}_{i,t∈N} ∈ R^{n_x} and {V^y_{i,t}}_{i,t∈N} ∈ R^{n_y} are independent sequences of independent random variables distributed according to W^x_{i,t} ∼ p(w^x_{i,t}) and V^y_{i,t} ∼ p(v^y_{i,t}), respectively, and defined independent of X_{i,0} ∼ p(x_{i,0}).

Assumption 2. The densities X_{i,0} ∼ p(x_{i,0}), V^y_{i,t} ∼ p(v^y_{i,t}) and W^x_{i,t} ∼ p(w^x_{i,t}) for all i, t ∈ N are known in their distribution (e.g., Gaussian, uniform) and are parametrized by a finite number of moments (e.g., mean, variance).

Note that Model 1 represents a general class of stochastic nonlinear SSMs for batch processes with multiple campaigns. In fact, several batch unit operations, such as fed-batch bioreactors, can be represented using Model 1. Compared to a continuous process that evolves over time, observe that Model 1 evolves over the space of batch runs – indexed by i ∈ N – and time t ∈ N. In fact, by setting i = 1, Model 1 can be used to represent a continuous process. The main problem addressed in this paper is outlined next.

Problem 1. Given a sequence of input and output measurements, denoted by {U_{i,1:t} = u_{i,1:t}}_{i,t∈N} and {Y_{i,1:t} = y_{i,1:t}}_{i,t∈N}, respectively, sampled from a batch process represented by Model 1 under Assumptions 1 and 2, compute a real-time estimate of the states {X_{i,t}}_{i,t∈N} and the model parameters θ.

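To make the setup concrete, the following minimal sketch simulates input-free data from a Model 1-type process over several campaigns. The scalar functions f and h and all noise levels below are hypothetical stand-ins, not taken from the paper; they are reused by the later code sketches.

    import numpy as np

    def f(x, theta):
        # Hypothetical scalar state transition (stand-in for the paper's f).
        return theta * x / (1.0 + 0.1 * x**2)

    def h(x, theta):
        # Hypothetical scalar measurement function (stand-in for the paper's h).
        return x**2 / 20.0

    def simulate(J=5, T=50, theta=0.9, q=0.1, r=0.05, seed=0):
        """Simulate J campaigns of length T from a Model 1-type SSM, eqs. (1a)-(1c)."""
        rng = np.random.default_rng(seed)
        X = np.zeros((J, T + 1))
        Y = np.zeros((J, T + 1))
        for i in range(J):
            X[i, 0] = rng.normal(1.0, 0.1)                            # (1a)
            for t in range(1, T + 1):
                X[i, t] = f(X[i, t - 1], theta) + rng.normal(0.0, q)  # (1b)
                Y[i, t] = h(X[i, t], theta) + rng.normal(0.0, r)      # (1c)
        return X, Y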

Problem 1 is referred to as the real-time state and parameter estimation problem. As discussed in Section 1, Problem 1 is an active area of research not only for batch processes but also for continuous processes, because computing a closed-form solution to Problem 1 is nontrivial. In the next sections, we discuss several plausible formulations of the state and parameter estimation problem in Problem 1, and discuss some of the approximate solutions, including the proposed solution.

3. Recursive state estimation

To reduce the burden of notation, we dispense with the input signal {U_{i,t}}_{i,t∈N} in the rest of the paper; however, note that all the derivations and results presented in this paper hold with {U_{i,t}}_{i,t∈N} included. Assuming, for now, that θ ∈ R^{n_θ} is exactly known, Problem 1 reduces to a batch state estimation problem. Under the Bayesian framework, a batch state estimation problem can be cast as a filtering problem [20]. The idea of Bayesian filtering is to recursively construct a state posterior probability density function (PDF), denoted by p(x_{i,t}|y_{i,1:t}, θ) for all i, t ∈ N. Physically, a posterior density is a probabilistic representation of the available statistical information on {X_{i,t}}_{i,t∈N} conditioned on the measurement sequence {Y_{i,1:t}}_{i,t∈N}. From the Markov property of Model 1, and from Bayes' theorem, p(x_{i,t}|y_{i,1:t}, θ) can be alternatively written as follows:

p(x_{i,t}|y_{i,1:t}, θ) = p(y_{i,t}|x_{i,t}, θ) p(x_{i,t}|y_{i,1:t−1}, θ) / p(y_{i,t}|y_{i,1:t−1}, θ),  (2)

where

p(y_{i,t}|y_{i,1:t−1}, θ) = ∫_{R^{n_x}} p(y_{i,t}|x_{i,t}, θ) p(dx_{i,t}|y_{i,1:t−1}, θ)  (3)

is a constant, p(dx_{i,t}|y_{i,1:t−1}, θ) ≡ p(x_{i,t}|y_{i,1:t−1}, θ) dx_{i,t} denotes the prior state distribution function, and p(x_{i,t}|y_{i,1:t−1}, θ) is the prior state density function, which can be computed as follows:

p(x_{i,t}|y_{i,1:t−1}, θ) = ∫_{R^{n_x}} p(x_{i,t}|x_{i,t−1}, θ) p(dx_{i,t−1}|y_{i,1:t−1}, θ),  (4)

where p(dx_{i,t−1}|y_{i,1:t−1}, θ) ≡ p(x_{i,t−1}|y_{i,1:t−1}, θ) dx_{i,t−1} is the state posterior distribution function at t − 1. Ignoring the normalizing constant in (2) yields

p(x_{i,t}|y_{i,1:t}, θ) ∝ p(y_{i,t}|x_{i,t}, θ) p(x_{i,t}|y_{i,1:t−1}, θ).  (5)

The recurrence relation between the prediction and update steps in (4) and (5), respectively, provides a complete Bayesian solution to the state estimation problem for Model 1. To extract a point estimate of {X_{i,t}}_{i,t∈N} from p(x_{i,t}|y_{i,1:t}, θ), a decision-theoretic approach is to minimize the risk function

R_M(x_{i,t}) = E[‖x_{i,t} − x̂_{i,t|t}‖²₂ | y_{i,1:t}, θ],  (6)

where R_M is a mean-square error (MSE) risk, x̂_{i,t|t} is a point estimate for {X_{i,t}}_{i,t∈N}, ‖·‖₂ is the 2-norm operator, and E[·] is the expectation operator with respect to the state posterior density function. Minimizing R_M(x_{i,t}) yields the minimum mean-square error (MMSE) estimate, given by [21]

x̂_{i,t|t} ≡ E[x_{i,t}|y_{i,1:t}, θ] = ∫_{R^{n_x}} x_{i,t} p(dx_{i,t}|y_{i,1:t}, θ),  (7)

where p(dx_{i,t}|y_{i,1:t}, θ) is the posterior distribution function.

For a linear SSM with Gaussian state and measurement noise sequences, the state posterior density in (2) is Gaussian and can be exactly represented by the Kalman filter (KF) using a finite number of moments (e.g., mean, variance); whereas for the model choice in Model 1, even when θ is known, the posterior density in (5) is generally non-Gaussian and, at least in theory, an infinite number of moments is required for its exact representation [22]. Thus, with finite computing capabilities, an optimal state filter for Model 1 is not realizable [23]. In the last few decades, several approximate nonlinear state filters based on statistical and analytical approximations of the optimal nonlinear filter have been developed for state estimation in nonlinear SSMs [24–26]. Most of these nonlinear filters can be classified as either Kalman-based filters or sequential Monte Carlo (SMC)-based filters. Both the Kalman- and SMC-based filters are tractable in finite computational time and can be used for state estimation in general or specific types of nonlinear SSMs. A detailed exposition of nonlinear filtering methods, related approximations, and analysis is not included here, but can be found in [27,28] and the references cited therein.
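For reference, in the linear-Gaussian special case X_{i,t} = A X_{i,t−1} + W^x_{i,t−1}, Y_{i,t} = C X_{i,t} + V^y_{i,t} with W^x ∼ N(0, Q) and V^y ∼ N(0, R) (stated here only as the closed-form benchmark; this case is not developed further in the paper), the recursion (4)-(5) reduces to the standard KF moment updates:

x̂_{i,t|t−1} = A x̂_{i,t−1|t−1},   P_{i,t|t−1} = A P_{i,t−1|t−1} Aᵀ + Q,
K_{i,t} = P_{i,t|t−1} Cᵀ (C P_{i,t|t−1} Cᵀ + R)⁻¹,
x̂_{i,t|t} = x̂_{i,t|t−1} + K_{i,t} (y_{i,t} − C x̂_{i,t|t−1}),   P_{i,t|t} = (I − K_{i,t} C) P_{i,t|t−1}.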

3.1. Particle filtering for state estimation

SMC-based filtering methods, popularly referred to as particle filters, are an important class of filtering methods for state estimation in nonlinear SSMs. Popular particle filtering algorithms include the sampling importance resampling (SIR) filter, the auxiliary SIR (ASIR) filter, and the Rao-Blackwellized particle filter (RBPF). The SIR method is the simplest of all particle filtering algorithms. The theory of the SIR filter is well established in the literature, and therefore details are skipped here; readers interested in this topic are referred to the tutorial [29].

The essential idea behind any SMC-based method is to approximate a given density function using a set of random particles, or samples, generated from it. For batch state estimation applications, the target density of interest is the posterior density in (5). Note that generating particles from the posterior density in (5) is nontrivial for Model 1, because this posterior density is generally complex and non-Gaussian. To address this, an SIR filter employs an alternative sampling density from which it is relatively easy to generate samples. Once particles from the sampling density are available, importance weights are calculated, which measure the likelihood that the sampled particles came from the target density. Formally, as discussed in [29], we first need to identify the target and sampling densities in the filtering problem. An SIR filter identifies them as follows:

p(x_{i,t}|y_{i,1:t}, θ) ∝ p(y_{i,t}|x_{i,t}, θ) p(x_{i,t}|y_{i,1:t−1}, θ),  (8)

where the left-hand side of (8) is the target density and the prior p(x_{i,t}|y_{i,1:t−1}, θ) serves as the sampling density. If {x^{(k)}_{i,t|t−1}}_{k=1}^N denotes a set of N independent and identically distributed (i.i.d.) random particles from the sampling density p(x_{i,t}|y_{i,1:t−1}, θ), then the importance weights for the particles can be calculated as follows:

w^{(k)}_{i,t|t−1} ∝ p(y_{i,t}|x^{(k)}_{i,t|t−1}, θ),  for k = 1, ..., N,  (9)

where {w^{(k)}_{i,t|t−1}}_{k=1}^N is a set of unnormalized importance weights, which can be normalized and written as follows:

w̃^{(k)}_{i,t|t−1} = p(y_{i,t}|x^{(k)}_{i,t|t−1}, θ) / Σ_{k=1}^N p(y_{i,t}|x^{(k)}_{i,t|t−1}, θ),  for k = 1, ..., N,  (10)

where {w̃^{(k)}_{i,t|t−1}}_{k=1}^N is a set of normalized weights. According to (10), a particle's weight depends on the likelihood function p(y_{i,t}|x_{i,t}, θ). Intuitively, this makes sense, since the likelihood establishes how well a given state explains the measurement. Therefore, if a certain particle explains the measurement better, then the probability is higher that the particle was in fact sampled from the target density.
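As a small numerical sketch of (9)-(10), assume for illustration that the measurement noise is Gaussian, V^y_{i,t} ∼ N(0, r²), so that the likelihood is a normal density (an illustrative assumption; h is the hypothetical measurement function from the earlier simulation sketch):

    import numpy as np
    from scipy.stats import norm

    def importance_weights(y_t, x_pred, theta, r=0.05):
        """Normalized SIR importance weights, eqs. (9)-(10)."""
        w = norm.pdf(y_t, loc=h(x_pred, theta), scale=r)  # unnormalized, eq. (9)
        return w / w.sum()                                # normalized, eq. (10)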

Observe that the prerequisite for computing the importance weights in (10) is complete access to the particle set {x^{(k)}_{i,t|t−1}}_{k=1}^N. This set can be obtained as follows. Assuming that particles from the target density p(x_{i,t−1}|y_{i,1:t−1}, θ) at the previous time are available, its empirical distribution can be approximately written as follows:

p̃(dx_{i,t−1}|y_{i,1:t−1}, θ) = (1/N) Σ_{k=1}^N δ_{x^{(k)}_{i,t−1|t−1}}(dx_{i,t−1}),  (11)

where p̃(dx_{i,t−1}|y_{i,1:t−1}, θ) is an N-particle approximation of p(dx_{i,t−1}|y_{i,1:t−1}, θ) and δ_y(·) is a Dirac delta measure at y ∈ R. Now substituting (11) into (4) and using the integral property of the Dirac delta measure, we get the following:

p̃(x_{i,t}|y_{i,1:t−1}, θ) = ∫_{R^{n_x}} p(x_{i,t}|x_{i,t−1}, θ) (1/N) Σ_{k=1}^N δ_{x^{(k)}_{i,t−1|t−1}}(dx_{i,t−1})  (12a)
= (1/N) Σ_{k=1}^N p(x_{i,t}|x^{(k)}_{i,t−1|t−1}, θ),  (12b)

where p̃(x_{i,t}|y_{i,1:t−1}, θ) is a Monte Carlo (MC) approximation of p(x_{i,t}|y_{i,1:t−1}, θ). Observe that in (12b) the sampling density is given as a mixture of N transition densities. Since each of the N densities is uniformly weighted, passing the particle set {x^{(k)}_{i,t−1|t−1}}_{k=1}^N through the transition density generates an i.i.d. sample set {x^{(k)}_{i,t|t−1}}_{k=1}^N that is distributed according to the sampling density p(x_{i,t}|y_{i,1:t−1}, θ). Once the particle set {x^{(k)}_{i,t|t−1}}_{k=1}^N is made available, the empirical distribution of the sampling density can be written as follows:

p(x_{i,t}|y_{i,1:t−1}, θ) dx_{i,t} ≈ (1/N) Σ_{k=1}^N δ_{x^{(k)}_{i,t|t−1}}(dx_{i,t}).  (13)

Finally, particles from the target density are obtained by resampling the particle system {x^{(k)}_{i,t|t−1}, w̃^{(k)}_{i,t|t−1}}_{k=1}^N according to the probability relation

Pr(X^{(j)}_{i,t|t} = x^{(k)}_{i,t|t−1}) = w̃^{(k)}_{i,t|t−1},  for j = 1, ..., N,  (14)

where {X^{(j)}_{i,t|t}}_{j=1}^N is a set of resampled particles, with the corresponding particle weights reset to

w̃^{(k)}_{i,t|t} = 1/N.  (15)

The new particle system {x^{(k)}_{i,t|t}, w̃^{(k)}_{i,t|t} = N⁻¹}_{k=1}^N represents a set of random samples approximately distributed according to p(x_{i,t}|y_{i,1:t}, θ), with its empirical distribution represented as

p(x_{i,t}|y_{i,1:t}, θ) dx_{i,t} ≈ Σ_{k=1}^N w̃^{(k)}_{i,t|t} δ_{x^{(k)}_{i,t|t}}(dx_{i,t}).  (16)

Given the particle approximation in (16), the state estimate in (7) is calculated by substituting (16) into (7), such that

x̂_{i,t|t} ≈ ∫_{R^{n_x}} x_{i,t} Σ_{k=1}^N w̃^{(k)}_{i,t|t} δ_{x^{(k)}_{i,t|t}}(dx_{i,t}) = Σ_{k=1}^N w̃^{(k)}_{i,t|t} x^{(k)}_{i,t|t}.  (17)

Observe that the resampling in (14) takes a set of i.i.d. particles {x^{(k)}_{i,t|t−1}}_{k=1}^N and delivers a dependent particle set {x^{(k)}_{i,t|t}}_{k=1}^N. As a result, as shown in [30], the rate of convergence of (17) to (7) decreases as the correlation in {x^{(k)}_{i,t|t}}_{k=1}^N increases. To speed up the convergence of the state estimate to the true posterior mean, (7) can alternatively be computed as follows:

x̂_{i,t|t} = ∫_{R^{n_x}} x_{i,t} p(x_{i,t}|y_{i,1:t}, θ) dx_{i,t}  (18a)
= ∫_{R^{n_x}} x_{i,t} [p(y_{i,t}|x_{i,t}, θ) / p(y_{i,t}|y_{i,1:t−1}, θ)] p(x_{i,t}|y_{i,1:t−1}, θ) dx_{i,t}  (18b)
≈ Σ_{k=1}^N x^{(k)}_{i,t|t−1} p(y_{i,t}|x^{(k)}_{i,t|t−1}, θ) / Σ_{j=1}^N p(y_{i,t}|x^{(j)}_{i,t|t−1}, θ)  (18c)
= Σ_{k=1}^N w̃^{(k)}_{i,t|t−1} x^{(k)}_{i,t|t−1}.  (18d)

Note that from (18a) to (18b) we use the relation in (2), and from (18b) to (18c) the empirical distribution in (13). Further, since (18d) involves an independent particle set, instead of a dependent resampled particle set, the state estimate in (18d) is generally more accurate than (17). Finally, the procedure to recursively compute a particle approximation of the posterior density in (16) is referred to as particle filtering, and its application to batch state estimation is outlined in Algorithm 1.
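Before the full algorithm, a minimal multinomial implementation of the resampling rule (14) may be useful (one of several standard schemes; systematic or stratified resampling are common lower-variance alternatives):

    import numpy as np

    def resample(particles, weights, rng=np.random.default_rng()):
        """Multinomial resampling, eq. (14): draw N indices with Pr = weights."""
        idx = rng.choice(len(particles), size=len(particles), p=weights)
        return particles[idx]   # weights are then reset to 1/N, eq. (15)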

Algorithm 1. Particle filter
1: for i = 1 to J do
2:   Initialization: generate {x^{(k)}_{i,0|0}}_{k=1}^N ∼ p(x_{i,0}) distributed according to the initial state density p(x_{i,0}).
3:   for t = 1 to T_i do
4:     Predict: predict {x^{(k)}_{i,t|t−1}}_{k=1}^N according to x^{(k)}_{i,t|t−1} ∼ p(x_{i,t}|x^{(k)}_{i,t−1|t−1}, θ), for k = 1, ..., N.
5:     Update: compute importance weights according to w̃^{(k)}_{i,t|t−1} = p(y_{i,t}|x^{(k)}_{i,t|t−1}, θ) / Σ_{k=1}^N p(y_{i,t}|x^{(k)}_{i,t|t−1}, θ), for k = 1, ..., N.
6:     State estimation: compute the state estimate as x̂_{i,t|t} = Σ_{k=1}^N w̃^{(k)}_{i,t|t−1} x^{(k)}_{i,t|t−1}.
7:     Resample: resample {x^{(k)}_{i,t|t−1}, w̃^{(k)}_{i,t|t−1}}_{k=1}^N as per Pr(X^{(j)}_{i,t|t} = x^{(k)}_{i,t|t−1}) = w̃^{(k)}_{i,t|t−1}, for j = 1, ..., N. Define the resampled particles as {x^{(j)}_{i,t|t}}_{j=1}^N.
8:   end for
9: end for
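The following compact Python sketch renders Algorithm 1 for a single campaign, reusing the hypothetical f, h, and Gaussian-noise assumptions from the earlier snippets (an illustrative rendering, not the authors' code):

    import numpy as np
    from scipy.stats import norm

    def particle_filter(y, theta, N=500, q=0.1, r=0.05, seed=0):
        """SIR particle filter (Algorithm 1) for one campaign with data y[1:]."""
        rng = np.random.default_rng(seed)
        x = rng.normal(1.0, 0.1, size=N)       # Step 2: particles from p(x_{i,0})
        x_hat = np.zeros(len(y))
        for t in range(1, len(y)):
            x = f(x, theta) + rng.normal(0.0, q, size=N)  # Step 4: predict
            w = norm.pdf(y[t], loc=h(x, theta), scale=r)  # Step 5: weights, eq. (10)
            w /= w.sum()
            x_hat[t] = np.sum(w * x)                      # Step 6: estimate, eq. (18d)
            x = x[rng.choice(N, size=N, p=w)]             # Step 7: resample, eq. (14)
        return x_hat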

4. Recursive state and parameter estimation

In Section 3, we reviewed the state estimation problem formulation for Model 1 under the assumption that θ is precisely known; however, in practice, θ is rarely known a priori and needs to be estimated along with the process states. This is referred to as the state and parameter estimation problem. In this section, we discuss an extension of the particle filtering method outlined in Section 3.1 for state and parameter estimation in Model 1.

The central idea of state and parameter estimation is certainly not new. A customary Bayesian approach involves selecting a prior distribution for the model parameters, followed by augmenting it with the states to form an extended state vector [14]. Theoretically, this casts the state and parameter estimation problem into a unified filtering framework; however, due to the lack of ergodicity and exponential forgetting of the joint state-parameter filter, coupled with successive resampling steps, employing this approach with any standard SMC algorithm often results in parameter sample degeneracy [31,32]. In other words, the SMC approximation of the marginalized parameter posterior distribution is represented by a single Dirac delta function. Moreover, this approach is also known to cause error accumulation in the successive Monte Carlo (MC) approximations of the density of interest, which in terms of the L_p norm grows exponentially or polynomially in time [2]. A pragmatic approach to reducing parameter sample degeneracy and error accumulation in successive MC approximations is to introduce diversity into the parameter samples. This is done by adding artificial dynamics to the parameters [3,14,33], such that

θ_{i,t} = θ_{i,t−1} + W^θ_{i,t−1},  (19)

where {W^θ_{i,t}}_{i,t∈N} ∈ R^{n_θ} is a sequence of independent Gaussian random variables defined independent of {W^x_{i,t}}_{i,t∈N}, {V^y_{i,t}}_{i,t∈N}, and {X_{i,0}}_{i∈N} in Model 1. The dynamic formulation in (19) avoids the parameter degeneracy problem discussed earlier and further allows for estimation of time-varying parameters. In fact, (19) together with Model 1 can be written as follows:

Model 2. Model for estimation

X_{i,0} ∼ p(x_{i,0}),  θ_{i,0} ∼ p(θ_{i,0}),  (20a)
X_{i,t} = f(X_{i,t−1}, θ_{i,t−1}) + W^x_{i,t−1},  (20b)
θ_{i,t} = θ_{i,t−1} + W^θ_{i,t−1},  (20c)
Y_{i,t} = h(X_{i,t}, θ_{i,t}) + V^y_{i,t},  (20d)

where {θ_{i,t}}_{i,t∈N} is a first-order Markov process with initial density θ_{i,0} ∼ p(θ_{i,0}) and a transition density described by the random walk model in (20c). In Model 2, the noise {W^θ_{i,t}}_{i,t∈N} ∈ R^{n_θ} is a tuning parameter which governs the rate at which the parameters span R^{n_θ}. An optimal approach to design the distribution of θ_{i,0} is discussed in [34], and an optimal approach to tune the distribution of {W^θ_{i,t}} ∈ R^{n_θ} is given in [3]. Next we discuss the simultaneous and dual filtering approaches to state and parameter estimation in Model 2.

4.1. Simultaneous particle filtering

A simultaneous approach views the state and parameter estimation problem in Model 2 as an extended filtering problem. First an extended state process, comprising the unknown states and parameters and denoted by Z_{i,t} = [X_{i,t}, θ_{i,t}]ᵀ for all i, t ∈ N, is defined; then {Z_{i,t}}_{i,t∈N} is estimated in real-time given {Y_{i,1:t}}_{i,t∈N}. Under the Bayesian framework, {Z_{i,t}}_{i,t∈N} can be estimated by recursively computing the posterior density p(z_{i,t}|y_{i,1:t}). As in the case of state filtering, there is no closed-form solution for p(z_{i,t}|y_{i,1:t}). Several SMC methods, including Algorithm 1, can be used to recursively approximate p(z_{i,t}|y_{i,1:t}). For example, with an SIR filter, we simply replace {X_{i,t}}_{i,t∈N} in Algorithm 1 with the extended state process {Z_{i,t}}_{i,t∈N}. In fact, the simultaneous approach to state and parameter estimation in Model 2 has been implemented with several on-line Bayesian estimators, including the SIR filter [3], and with other advanced SMC methods such as the auxiliary SIR filter (ASIR) [35] and the Rao-Blackwellised particle filter (RBPF) [36].

Despite the simplicity of the simultaneous state and parameter estimation method, it is known to suffer from several limitations. For example, the authors in [17] showed that the simultaneous approach is unable to efficiently track the parameters in complex systems, such as in the run-of-mine ore mill application. Further, using sensitivity analysis, it was also shown that the simultaneous approach yields poor parameter estimates for systems where the Jacobian matrix of the measurements with respect to the states is ill-conditioned [17]. For example, since the state and parameter estimation steps are independent of each other, the parameter estimation step completely disregards any information amassed from the state estimation step, and instead computes posterior densities based on the measurement sequence alone.

4.2. Dual particle filtering

Algorithm 2. Dual particle filter
1: for i = 1 to J do
2:   Initialization: generate {x^{(k)}_{i,0|0}}_{k=1}^N ∼ p(x_{i,0}) and {θ^{(k)}_{i,0|0}}_{k=1}^N ∼ p(θ_{i,0}). Find initial state and parameter estimates as x̂_{i,0|0} = (1/N) Σ_{k=1}^N x^{(k)}_{i,0|0} and θ̂_{i,0|0} = (1/N) Σ_{k=1}^N θ^{(k)}_{i,0|0}.
3:   for t = 1 to T_i do
4:     State predict: predict {x^{(k)}_{i,t|t−1}}_{k=1}^N according to x^{(k)}_{i,t|t−1} ∼ p(x_{i,t}|x^{(k)}_{i,t−1|t−1}, θ̂_{i,t−1|t−1}), for k = 1, ..., N.
5:     State update: compute importance weights according to w̃^{(k)}_{i,t|t−1} = p(y_{i,t}|x^{(k)}_{i,t|t−1}, θ̂_{i,t−1|t−1}) / Σ_{k=1}^N p(y_{i,t}|x^{(k)}_{i,t|t−1}, θ̂_{i,t−1|t−1}), for k = 1, ..., N.
6:     State estimation: compute the state estimate as x̂_{i,t|t} = Σ_{k=1}^N w̃^{(k)}_{i,t|t−1} x^{(k)}_{i,t|t−1}.
7:     Parameter predict: predict {θ^{(k)}_{i,t|t−1}}_{k=1}^N according to θ^{(k)}_{i,t|t−1} ∼ p(θ_{i,t}|θ^{(k)}_{i,t−1|t−1}, x̂_{i,t−1|t−1}), for k = 1, ..., N.
8:     Parameter update: compute importance weights according to ṽ^{(k)}_{i,t|t−1} = p(x̂_{i,t|t}|x^{(k)}_{i,t|t−1}, θ^{(k)}_{i,t|t−1}) / Σ_{k=1}^N p(x̂_{i,t|t}|x^{(k)}_{i,t|t−1}, θ^{(k)}_{i,t|t−1}), for k = 1, ..., N.
9:     Parameter estimation: compute the parameter estimate as θ̂_{i,t|t} = Σ_{k=1}^N θ^{(k)}_{i,t|t−1} ṽ^{(k)}_{i,t|t−1}.
10:    State resample: resample {x^{(k)}_{i,t|t−1}, w̃^{(k)}_{i,t|t−1}}_{k=1}^N as per Pr(X^{(j)}_{i,t|t} = x^{(k)}_{i,t|t−1}) = w̃^{(k)}_{i,t|t−1}, for j = 1, ..., N. Define the resampled particles as {x^{(j)}_{i,t|t}}_{j=1}^N.
11:    Parameter resample: resample {θ^{(k)}_{i,t|t−1}, ṽ^{(k)}_{i,t|t−1}}_{k=1}^N as per Pr(θ^{(j)}_{i,t|t} = θ^{(k)}_{i,t|t−1}) = ṽ^{(k)}_{i,t|t−1}, for j = 1, ..., N. Define the resampled particles as {θ^{(j)}_{i,t|t}}_{j=1}^N.
12:   end for
13: end for
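The per-sampling-time loop of Algorithm 2 admits a compact sketch; it reuses the hypothetical scalar f, h, and Gaussian noises from the earlier snippets, and evaluates the Step 8 weight through the state transition density (an illustrative reading of eq. (22)):

    import numpy as np
    from scipy.stats import norm

    def dual_pf_step(y_t, x_part, th_part, th_hat, q=0.1, r=0.05, s=0.01,
                     rng=np.random.default_rng()):
        """One pass through Steps 4-11 of Algorithm 2 (scalar sketch)."""
        N = len(x_part)
        # Steps 4-6: state predict/update/estimate at the current best parameter.
        x_pred = f(x_part, th_hat) + rng.normal(0.0, q, size=N)
        w = norm.pdf(y_t, loc=h(x_pred, th_hat), scale=r)
        w /= w.sum()
        x_hat = np.sum(w * x_pred)
        # Steps 7-9: random-walk parameter prediction (20c); weight each parameter
        # particle by how well it explains the new state estimate, eqs. (22)-(23).
        th_pred = th_part + rng.normal(0.0, s, size=N)
        v = norm.pdf(x_hat, loc=f(x_part, th_pred), scale=q)
        v /= v.sum()
        th_hat_new = np.sum(v * th_pred)
        # Steps 10-11: resample both particle systems, eq. (24).
        x_new = x_pred[rng.choice(N, size=N, p=w)]
        th_new = th_pred[rng.choice(N, size=N, p=v)]
        return x_new, th_new, x_hat, th_hat_new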

To overcome the limitations of the simultaneous approach, it is important to estimate the model parameters using both the available measurements and the inferred state information. This is the basic idea behind a dual particle filter for state and parameter estimation. A dual particle filter is typically implemented as a two-step filter: the first step involves state estimation, and the second step involves parameter estimation. Formally, this is done as follows (see Algorithm 2 for a formal outline of the dual particle filter). Assuming that the parameter estimates, denoted by θ̂_{i,t−1|t−1} ∈ R^{n_θ}, are available from the ith campaign, the states {X_{i,t}}_{i,t∈N} at time t ∈ N can be predicted and updated by replacing θ in Steps 4 and 5 of Algorithm 1 with its current best estimate θ̂_{i,t−1|t−1} (see Steps 4 and 5 in Algorithm 2 for illustration). The state estimates are then computed by executing Step 6 in Algorithm 2. Observe that the current best estimates of the parameters are used to predict, update, and compute the state estimates at the current sampling time (Steps 4 through 6 in Algorithm 2). This completes the state estimation step.

Fig. 1. An illustration showing the working principle of the proposed dual particle filtering method for estimating the states and parameters in Model 2. In the illustration, PF1 is the particle filter for state estimation and PF2 is the particle filter for parameter estimation. Observe that while PF1 estimates the current states using the incoming measurements in conjunction with the best available parameter estimates, PF2 uses the current states to update the parameter estimates. The dual particle filtering method iterates between PF1 and PF2 at each sampling time. The algorithm for a dual particle filter is outlined in Algorithm 2.

The next step is to perform similar prediction and update steps for the model parameters. This is shown in Steps 7 and 8 of Algorithm 2. Assuming a particle set {θ^{(k)}_{i,t−1|t−1}}_{k=1}^N ∼ p(θ_{i,t−1}|y_{i,1:t−1}), distributed according to the parameter posterior density, is available, a new particle set {θ^{(k)}_{i,t|t−1}}_{k=1}^N can be predicted using the relation

θ^{(k)}_{i,t|t−1} ∼ p(θ_{i,t}|θ^{(k)}_{i,t−1|t−1}, x̂_{i,t−1|t−1}),  (21)

where x̂_{i,t−1|t−1} ∈ R^{n_x} is the vector of state estimates. In (21), the parameter transition density is defined by the random walk model in (20c). Using (21), a particle set {θ^{(k)}_{i,t|t−1}}_{k=1}^N, representing a predicted parameter particle set, is generated. This constitutes the prediction step for the parameters in the dual particle filter (see Step 7 in Algorithm 2 for illustration). Now, given the predicted particle set {θ^{(k)}_{i,t|t−1}}_{k=1}^N, the importance weights for the parameters are calculated using

ṽ^{(k)}_{i,t|t−1} = p(x̂_{i,t|t}|x^{(k)}_{i,t|t−1}, θ^{(k)}_{i,t|t−1}) / Σ_{k=1}^N p(x̂_{i,t|t}|x^{(k)}_{i,t|t−1}, θ^{(k)}_{i,t|t−1}),  (22)

where ṽ^{(k)}_{i,t|t−1} ∈ R is the importance weight for θ^{(k)}_{i,t|t−1} ∈ R^{n_θ} for all k = 1, ..., N. In (22), the importance weight is calculated using the state transition function (see Step 8 in Algorithm 2). Physically, particles in the set {θ^{(k)}_{i,t|t−1}}_{k=1}^N that are more likely to have yielded the current best state estimate x̂_{i,t|t} are assigned higher importance weights, while particles that are less likely to have generated x̂_{i,t|t} are assigned smaller weights. Observe that (22) explicitly takes into account the state information from the state estimation step. Now, given a particle system {θ^{(k)}_{i,t|t−1}, ṽ^{(k)}_{i,t|t−1}}_{k=1}^N, the parameter estimates are computed using the relation (see Step 9 in Algorithm 2)

θ̂_{i,t|t} = Σ_{k=1}^N θ^{(k)}_{i,t|t−1} ṽ^{(k)}_{i,t|t−1},  (23)

where θ̂_{i,t|t} ∈ R^{n_θ} is the parameter estimate at time t ∈ N. Particles in {x^{(k)}_{i,t|t−1}, w̃^{(k)}_{i,t|t−1}}_{k=1}^N and {θ^{(k)}_{i,t|t−1}, ṽ^{(k)}_{i,t|t−1}}_{k=1}^N with negligible importance weights are removed to avoid the degeneracy problem. This is done by implementing a resampling strategy for the states and parameters. For example, using the importance weights in (22), the particle set {θ^{(k)}_{i,t|t−1}}_{k=1}^N is resampled with replacement according to the probability relation (see Steps 10 and 11 in Algorithm 2)

Pr(θ^{(j)}_{i,t|t} = θ^{(k)}_{i,t|t−1}) = ṽ^{(k)}_{i,t|t−1},  for j = 1, ..., N,  (24)

where {θ^{(j)}_{i,t|t}}_{j=1}^N are the resampled particles. For illustration purposes, the dual particle filter for state and parameter estimation in Model 2 is graphically presented in Fig. 1.

4.3. Challenges

Note that the state estimation and the state and parameter estimation strategies discussed in Sections 3 and 4, respectively, do not allude to, or incorporate, any specific features of a batch operation, such as raw-material variability or batch-to-batch variability. The strategies discussed in Sections 3 and 4 are generic and can also be used for estimation in a continuous process. This is because Algorithm 2 assumes that the batch campaigns are independent and identical. Under this restrictive assumption, measurements from successive campaigns, even when made available, are deemed redundant for improving the batch-over-batch performance of the dual filter. In fact, observe that in Algorithm 2 the batch index i ∈ N is simply a dummy variable, and can be dispensed with, as it has limited influence on the quality of the state and parameter estimates computed in successive batches. From an estimation perspective, the strategies in Sections 3 and 4 do not differentiate a batch process from a continuous process. To implement such strategies, any conventional state and parameter estimation method designed for a continuous process can also be used for estimation in a batch process. For instance, the simultaneous estimation algorithm proposed in [15], originally developed for a continuous process, can also be used for estimation in a batch process with identical campaigns. Similarly, Algorithm 2, designed here for a batch process, can also be used for estimation in a continuous process by setting i = 1 and allowing the campaign to run long enough (possibly infinitely long) for effective estimation of the process states and parameters.

Algorithm 2 can still be used if a batch process exhibits identical dynamics across multiple campaigns, such that there is only one unique campaign; in practice, however, this is rarely true. This is primarily due to the variability in raw material, which inherently makes each campaign unique. Even if it were possible to run identical campaigns, relatively short campaign durations would severely limit the accuracy of the estimates from Algorithm 2. This problem is further exacerbated for batch processes where the use of advanced real-time sensing technology is still in its infancy, and most of the measurements are made available through offline analysis of sample assays collected at large and irregular sampling intervals. For example, in the pharmaceutical industry, key process variables, such as titer concentrations and other product quality attributes, are rarely measured in real-time. To address the aforementioned limitations of Algorithm 2, we propose a two-dimensional (2D) dual particle filter for state and parameter estimation in batch processes. The proposed 2D dual filter is discussed next.

5. Two-dimensional (2D) dual particle filters

As discussed in Section 4.3, Algorithm 2 for state and parameter estimation in batch processes, although valid, does not yield satisfactory results if the campaigns have different operating conditions or if the campaigns are short. To incorporate these distinctive features of a batch process, it is imperative to modify the dual particle filter in Algorithm 2. In this section, we propose a two-dimensional (2D) dual particle filter for state and parameter estimation in batch processes.

Before we discuss the 2D dual particle filter, it is important to understand the structure of typical batch data. Fig. 2 shows a generic multidimensional data structure for a batch process, where I ∈ N, J ∈ N and T ∈ N represent the number of campaigns, the number of measured variables, and the batch duration, respectively. The different batch runs are arranged along the vertical axis, the measurement variables across the horizontal axis, and their time evolution occupies the third dimension. Each horizontal slice through this three-dimensional array is a matrix of dimension (J × T) containing the trajectories of all the variables from a single batch. Similarly, each vertical slice is a matrix of dimension (I × J) representing the values of all the variables for all the batches at a common time interval. For convenience, it is assumed that all the campaigns have the same batch duration and contain the same number of measured variables. This is not a restrictive assumption, as many methods, including dynamic time warping, are available to temporally align the batches to have the same length [37]. Further, observe that in Fig. 2 the time and campaign index are the two independent variables. Thus a variable is measured not only along the time direction but also across campaigns. This is clear from the indices i, t ∈ N in Model 2, representing the campaign index and the sampling time. As discussed in Section 4.3, the dual particle filter in Algorithm 2 is ineffective because it does not combine the measurements from across different campaigns. Therefore, from an estimation perspective, Algorithm 2 is ineffective if the measurements within a campaign are few.

Fig. 2. A schematic of a multidimensional data structure for a batch process. Here I ∈ N, J ∈ N and T ∈ N represent the number of campaigns, the number of variables and the batch duration, respectively. It is assumed that all the campaigns have the same batch duration and the same number of variables being measured.

5.1. Batch transition model

To combine the measurements across campaigns, it is first important to define the dynamics of a batch process across multiple campaigns. Observe that Model 2 describes the dynamics only along the time dimension, i.e., Model 2 is a one-dimensional (1D) representation of a batch process, with the continuity along the batch index broken. In fact, this lack of continuity along the batch index forces Algorithm 2 to stop at the end of a campaign and then restart for the next campaign. Every time Algorithm 2 restarts for a new campaign, the information amassed from the previous campaign is lost. Observe that Algorithm 2 only uses the measurements available within a campaign to estimate the states and parameters. In fact, if the transition dynamics to the new campaign are defined, then Algorithm 2 can continue to run uninterrupted from one campaign to another without having to stop or restart at the end of a campaign. Thus the dynamics describing the transition from the terminal state of a campaign to the initial conditions of the next campaign need to be appropriately defined. For the parameters in Model 2, a straightforward approach to define the transition dynamics across campaigns is as follows:

θ_{i+1,0} = θ_{i,T} + W^θ_{i,T},  (25)

for all i ∈ N. In (25), the initial condition for the parameters at campaign i + 1 is defined as the parameter at the terminal time of campaign i corrupted by some random noise W^θ_{i,T} ∈ R^{n_θ}. Physically, equation (25) ensures that the parameters are not significantly perturbed from their last location (obtained at the end of the previous campaign).

Similar to (25), a batch transition relationship also needs to be defined for the process states in Model 2. Unfortunately, for the states, the terminal state of a campaign is rarely related to the initial condition of the subsequent campaign. This is not surprising, since the campaigns are often run independently at different initial conditions. Despite the lack of any apparent batch transition for the states, it is still possible to link the states across campaigns. Note that many campaigns share the same raw material, with small variations between them. This fact allows us to model the initial states across campaigns using the following model:

X_{i+1,0} = X_{i,0} + W^x_{i,0},  (26)

for all i ∈ N. In (26), the states X_{i+1,0} ∈ R^{n_x} and X_{i,0} ∈ R^{n_x} are the initial conditions for campaigns i + 1 and i, respectively, and W^x_{i,0} ∈ R^{n_x} is a measure of the raw-material variability across campaigns. If the initial conditions for the campaigns are precisely known, then the covariance of W^x_{i,0} ∈ R^{n_x} can be set to a zero matrix – indicating zero raw-material variability across campaigns. While the state transition in (26) does not link the process states at the end of a campaign to the initial conditions of the subsequent campaign, as (25) does for the parameters, equation (26) still provides a basic inter-batch relationship for the states. In summary, using (25) and (26) with Model 2, a 2D batch model can now be defined as follows:

Model 3. Model for estimation

X_{i,0} ∼ p(x_{i,0}),  θ_{i,0} ∼ p(θ_{i,0}),  (27a)
X_{i,t} = f(X_{i,t−1}, θ_{i,t−1}) + W^x_{i,t−1},  (27b)
X_{i+1,0} = X_{i,0} + W^x_{i,0},  (27c)
θ_{i,t} = θ_{i,t−1} + W^θ_{i,t−1},  (27d)
θ_{i+1,0} = θ_{i,T} + W^θ_{i,T},  (27e)
Y_{i,t} = h(X_{i,t}, θ_{i,t}) + V^y_{i,t}.  (27f)

With the transition dynamics across campaigns defined in Model 3, the dual particle filtering method in Algorithm 2 can be modified for state and parameter estimation in Model 3. We refer to this proposed filter as a 2D dual particle filter, as filtering is performed both along the time and the batch index dimensions. The state and parameter estimation steps of the 2D dual particle filter are similar to those of the dual particle filter in Algorithm 2, with some modifications.

The parameter estimation step of the proposed 2D dual particle filter is discussed first. Unlike in Model 2, where the objective is to compute p(θ_{i,t}|y_{i,1:t}), with Model 3 we compute p(θ_{i,t}|y_{1:i,1:t}), where y_{1:i,1:t} ≡ [y_{1,1}, ..., y_{1,T}, ..., y_{i−1,1}, ..., y_{i−1,T}, y_{i,1}, ..., y_{i,t}] denotes all the measurements available until time t ∈ N in campaign i ∈ N. Observe that while in Model 2 the lack of continuity across campaigns forces the parameter estimation step in Algorithm 2 to use only measurements from the current campaign, with Model 3 the measurements from all previous campaigns can be straightforwardly combined to estimate the model parameters. Therefore, once Algorithm 2 completes the parameter estimation for the ith campaign at the terminal time, the parameters for the subsequent campaign can be predicted and generated from (25) using the following relation:

θ^{(k)}_{i+1,0} ∼ p(θ_{i+1,0}|θ^{(k)}_{i,T|T}, x̂_{i,T|T}),  for k = 1, ..., N,  (28)

where {θ^{(k)}_{i+1,0}}_{k=1}^N and {θ^{(k)}_{i,T|T}}_{k=1}^N are random particles from p(θ_{i+1,0}|y_{1:i,1:T}) and p(θ_{i,T}|y_{1:i,1:T}), respectively, and x̂_{i,T|T} ∈ R^{n_x} are the state estimates computed from p(x_{i,T}|y_{i,1:T}). Given the particle set {θ^{(k)}_{i+1,0}}_{k=1}^N ∼ p(θ_{i+1,0}) as particles for the (i + 1)th campaign at t = 0, the corresponding parameter estimate is

θ̂_{i+1,0} = (1/N) Σ_{k=1}^N θ^{(k)}_{i+1,0},  (29)

where θ̂_{i+1,0} is the parameter estimate at the start of the (i + 1)th campaign. It should be noted that adding a small independent, zero-mean normal increment to the parameters makes them time-varying, and new parameter values can be generated at each time step, so the dual particle filtering method may be employed. However, for parameters that are actually fixed, treating them as time-varying may result in an over-dispersed approximation to the parameter posterior distribution. The kernel smoothing algorithm has been proposed to correct this through shrinkage of the kernel locations [15,35]. According to this algorithm, the jth kernel location is

m^{(j)}_{i,t} = a θ^{(j)}_{i,t} + (1 − a) θ̄_{i,t},  (30)

where a = √(1 − h²) and θ̄_{i,t} is the mean of the parameter particles. For simplicity, this kernel smoothing algorithm is not introduced in detail here. Finally, (28) and (29) provide a continuous approach to perform parameter estimation across campaigns in the 2D dual particle filter without having to restart at the start of each campaign.

Unlike the parameter estimation, for the state estimation step the measurements across campaigns cannot be explicitly combined, as Model 3 lacks any state continuity across the campaigns. Therefore, as in Algorithm 2, in the proposed 2D dual particle filter the states are estimated from p(x_{i,t}|y_{i,1:t}), ignoring measurements from all other campaigns. Once the ith campaign is complete, the initial condition for the states in the next campaign is set to X_{i+1,0} ∼ p(x_{i+1,0}), where from (26) the initial density function p(x_{i+1,0}) is defined as follows:

p(x_{i+1,0}) = p(x_{i,0} + w^x_{i,0}),  (31)

where (X_{i,0} + W^x_{i,0}) ∼ p(x_{i,0} + w^x_{i,0}) is the distribution of the random variable X_{i,0} + W^x_{i,0}. For example, if X_{i,0} ∼ N(0, Σ_{i,0}) and W^x_{i,0} ∼ N(0, P^x_{i,0}) are two independent Gaussian random variables (see Assumption 1), then it is straightforward to show that X_{i+1,0} ∈ R^{n_x} in (26) is also a Gaussian random variable, such that X_{i+1,0} ∼ N(0, Σ_{i,0} + P^x_{i,0}). Even though (31) provides an approach to calculate the density of the initial states across campaigns, there is a major issue with (31): the variance of X_{i+1,0} ∼ p(x_{i+1,0}) increases with each campaign. This is because X_{i+1,0} in (31) is a martingale process whose variance is a monotonically increasing function of the batch index [38]. In terms of state estimation, it is well known that the performance of a filter (measured in terms of the mean-square error) deteriorates as the variance of the initial condition increases, because it takes much longer for a filter to forget its initial conditions if the uncertainty on them is large [39,40]. An approach to resolve the issue of poor state estimation in Model 3 is to make use of the measurements from a campaign to accurately predict the initial condition of that campaign. A systematic approach to accurately estimate the initial condition of a campaign is discussed next.
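Before turning to the initial conditions, a sketch of the campaign-to-campaign parameter hand-off in (25) and (28)-(30) may be helpful: at the end of campaign i, the final parameter particles are shrunk toward their mean, eq. (30), and then jittered to seed campaign i + 1. The scalar code and the kernel width h below are illustrative assumptions in the spirit of [15,35], not the authors' exact implementation:

    import numpy as np

    def carry_over_parameters(th_final, h=0.1, rng=np.random.default_rng()):
        """Seed campaign i+1 parameter particles from campaign i, eqs. (25), (28)-(30)."""
        a = np.sqrt(1.0 - h**2)                           # shrinkage factor, eq. (30)
        m = a * th_final + (1.0 - a) * th_final.mean()    # shrunk kernel locations
        # Jitter the shrunk locations, eqs. (25)/(28); the kernel width h*std
        # restores the spread removed by the shrinkage.
        th_init = m + h * th_final.std() * rng.normal(size=th_final.size)
        return th_init, th_init.mean()                    # particles and estimate, eq. (29)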

5.2. Estimating initial conditions In Algorithm 2, the states are estimated as the mean of the filtering density p(xi,t |yi,1:t ), which for t = 0 is same as the prior density p(xi,0 ) (since the measurements at t = 0 are unavailable). In other words, at t = 0 the measurements from the ith campaign are not being used to estimate the initial conditions for the ith campaign. Note that this issue is not specific to Algorithm 2, but is inherent to the filtering framework itself, wherein a state filtering density is conditioned on measurements available only up until the current time. To allow for an accurate estimation of the initial condition of a campaign, we propose to compute a smoothing density, p(xi,0 |yi,1:T ) for all i ∈ N, where p(xi,0 |yi,1:T ) encapsulates all statistical information on the initial condition of a campaign i ∈ N conditioned on the measurements from the campaign. Physically, once a campaign i ∈ N is complete, the data sampled or collected from the campaign can be used to refine the estimates for the initial conditions. However, as in the case of a filtering density, computing a closed-form solution to the smoothing density p(xi,0 |yi,1:T ) is a nontrivial problem for batch processes represented by Model 3. In this paper, we use a forward-backward particle smoothing algorithm to approximate p(xi,0 |yi,1:T ) for all i ∈ N [41–43]. Unlike particle filters that recursively approximate the filtering density in the forward temporal direction, a forward-backward particle smoother moves both forward and backwards in the temporal direction to approximate a smoothing density. Thus to approximate the smoothing density p(xi,0 |yi,1:T ), we first need to approximate p(xi,t |yi,1:T ) for all t ∈ N. The basic idea of particle smoothing, as applicable to a batch process is briefly discussed here. For a detailed discussions the reader is referred to [41–43], and the references contained therein. The idea of a particle smoother is similar to the particle filter discussed in Section 3.1. In a particle smoother, we seek to approximate the smoothing distribution function p(dxi,t |yi,1:T ) as follows

p˜ (dxi,t |yi,1:T ) =

N 

(k)

˜ i,t|T ı w

(k)

(k) i,t|T

x

k=1 (k)

˜ i,t|T } where t{xi,t|T , w

N k=1

(dxi,t ),

(32)

∼p(xi,t |yi,1:T ) is a set of N smoothed particles

and their corresponding particle weights. Using the Law of Total

Z. Zhao et al. / Journal of Process Control 81 (2019) 1–14

Probability and Bayes’ Theorem, the smoothing density function p(xi,t |yi,1:T ) can be written as follows



p(xi,t |yi,1:T ) =

p(xi,t , xi,t+1 |yi,1:T )dxi,t+1 ,

(33a)

Rnx

 =

p(xi,t |xi,t+1 , yi,1:T )p(xi,t+1 |yi,1:T )dxi,t+1 ,

(33b)

p(xi,t+1 |xi,t )p(xi,t |yi,1:t ) p(xi,t+1 , yi,1:T )dxi,t+1 , p(xi,t+1 |yi,1:t )

(33c)

R nx

 =

R nx



p(xi,t+1 |xi,t ) p(xi,t+1 , yi,1:T )dxi,t+1 . p(xi,t+1 |yi,1:t )

= p(xi,t |yi,1:t ) Rnx

(33d)

The equation (33d) is expressed as a product of a filtering density p(xi,t |yi,1:t ) and an integral with respect to the states. Although not shown here for brevity, the equations (33a) through (33d) are a function of the model parameters and are evaluated at the current best parameter estimate ˆ i,t−1|t−1 ∈ Rn . Similar to (12b), observe that in (33d), the density function p(xi,t+1 |yi,1:t ) can be approximated, such that p˜ (xi,t+1 |yi,1:t ) =

N 

(k)

(k)

˜ i,t|t p(xi,t+1 |xi,t|t ), w

(34)

k=1

where p˜ (xi,t+1 |yi,1:t ) is an N-particle approximation of the density (k) ˜ i,t|t function p(xi,t+1 |yi,1:t ) and w

= 1/N is the resampled importance (k)

weight corresponding to the resampled particle xi,t|t for all k = 1, 2, . . ., N. Now note the smoothing density, p(xi,t |yi,1:T ) and the filtering density p(xi,t |yi,1:t ) are the same for t = T. Therefore, the smooth(k)

(k)

˜ i,t|T } ing particle system {xi,t|T , w (k)

(k)

˜ i,t|t } system {xi,t|t , w

N k=1

N k=1

is same as the filtering particle

. Working backwards in time t ∈ N, if we

xˆ i,0|T ∈ Rnx is the mean of the smoothing density p(xi,0 |yi,1:T ). Algorithm 3 gives a pseudo-code for implementing the forward-backward particle smoothing algorithm for batch processes. It is important to highlight that using Algorithm 3, the initial condition for a campaign can only be estimated once the campaign is complete. This is because all the measurements within a campaign are used to construct a smoothing density for the initial condition. Now since Algorithm 3 estimates the initial condition of a campaign based on a smoothing density, the estimates are better than that obtained using a filtering density; however, observe that in order to run a dual particle filter in Algorithm 2, we still need to provide the initial condition for the campaign before the campaign starts. Now from (26), it is clear that the initial conditions for the campaigns are related to one another through a random noise term that describes the variability in the raw-material. Note that since there is no physical dynamics for the initial states across campaigns, we can also write (26) as follows x Xi,0 = Xi+1,0 − Wi,0 ,

I≡ R nx

p(xi,t+1 |xi,t ) p(xi,t+1 , yi,1:T )dxi,t+1 , p(xi,t+1 |yi,1:t )

(35a)

(k) (k) N  ˜ i,t+1|T p(xi,t+1|T |xi,t ) w ≈ , N (l) (k) (l) l=1

k=1

It is important to highlight that using Algorithm 3, the initial condition for a campaign can only be estimated once the campaign is complete. This is because all the measurements within a campaign are used to construct a smoothing density for the initial condition. Since Algorithm 3 estimates the initial condition of a campaign from a smoothing density, the estimates are better than those obtained using a filtering density; however, observe that in order to run the dual particle filter in Algorithm 2, we still need to provide the initial condition for a campaign before the campaign starts. Now from (26), it is clear that the initial conditions for the campaigns are related to one another through a random noise term that describes the variability in the raw material. Note that since there are no physical dynamics for the initial states across campaigns, we can also write (26) as follows

$$X_{i,0} = X_{i+1,0} - W^{x}_{i,0}, \tag{39}$$

where (39) describes the initial conditions for the ith campaign as a function of the (i + 1)th campaign. Now substituting an estimate of X_{i−1,0} from Algorithm 3 into (39), we get

$$Y^{x}_{i,0} = X_{i,0} - W^{x}_{i-1,0}, \tag{40}$$

where Y^x_{i,0} ≡ x̂_{i−1,0|T} is the estimate of the initial condition for the (i − 1)th campaign computed using Algorithm 3. Now together, (26) and (40) can be written as follows:

$$X_{i+1,0} = X_{i,0} + W^{x}_{i,0}, \tag{41a}$$

$$Y^{x}_{i,0} = X_{i,0} - W^{x}_{i-1,0}. \tag{41b}$$

Observe that (41a) and (41b) represent a state-space model for the initial conditions of the campaigns across the batch index. Here {X_{i,0}}_{i∈N} is defined as a first-order Markov process, characterized by an initial density X_{i,0} ∼ p(x_{i,0}) and a transition density p(x_{i,0}|x_{i−1,0}), which is essentially the density of the state noise. The states {X_{i,0}}_{i∈N} are assumed to be hidden but indirectly observed through another process {Y^x_{i,0}}_{i∈N}. Here the measurement Y^x_{i,0} ∈ R^{n_y} is assumed to be conditionally independent given the state process X_{i,0} ∈ R^{n_x}. In fact, (41a) and (41b) represent a stochastic linear state-space model.

Now since {X_{i,0}}_{i∈N} is a latent variable, it can be estimated by constructing a posterior density p(x_{i,0}|y^x_{1:i,0}), where y^x_{1:i,0} = [y^x_{1,0}, y^x_{2,0}, ..., y^x_{i,0}] for all i ∈ N. Observe that since (41a) and (41b) represent a linear state-space model, if the noise sequence {W^x_{i,0}}_{i∈N} is Gaussian then the Kalman filter yields a closed-form solution to the posterior density p(x_{i,0}|y^x_{1:i,0}). Since the basis of the Kalman filter is well established, we will not discuss it here; for details, the reader is referred to [22]. The Kalman filter gives the mean and covariance of the posterior density function p(x_{i,0}|y^x_{1:i,0}).
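Because (41a) and (41b) define a linear-Gaussian random walk across the batch index, the Kalman recursion for the campaign initial conditions reduces to a few lines per state dimension. The following is a minimal sketch, assuming a scalar initial state and taking the measurement-noise variance equal to the process-noise variance q, since both noises in (41a) and (41b) come from the same sequence {W^x_{i,0}}; the variable names are illustrative.

```python
import numpy as np

def initial_condition_kf(y, q, m0, p0):
    """Kalman filter for the random-walk model (41a)-(41b):
        X_{i+1,0} = X_{i,0} + W_i,   Y_i = X_{i,0} - W_{i-1},  W ~ N(0, q).

    y      : sequence of smoothed initial-condition estimates (Algorithm 3)
    q      : variance of the batch-to-batch noise W
    m0, p0 : prior mean and variance of the initial condition
    Returns the filtered means of X_{i,0} for i = 1, ..., len(y).
    """
    m, p = m0, p0
    means = []
    for yi in y:
        # measurement update (observation-noise variance assumed equal to q)
        k = p / (p + q)                 # Kalman gain
        m = m + k * (yi - m)
        p = (1.0 - k) * p
        means.append(m)
        # time update across the batch index (random walk)
        p = p + q
    return np.array(means)

# Example: y would be the per-campaign estimates x_hat_{i,0|T}.
# means = initial_condition_kf([1.9, 2.05, 1.98], q=0.1**2, m0=1.6, p0=0.1**2)
```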

Letting x̂_{i,0|0} ∈ R^{n_x} be the mean of p(x_{i,0}|y^x_{1:i,0}), we define the initial condition for the states of the ith campaign as x̂_{i,0|0}. Similarly, given p(x_{i,0}|y^x_{1:i,0}), we also generate {X^{(k)}_{i,0|0}}_{k=1}^{N} ∼ p(x_{i,0}|y^x_{1:i,0}), where the X^{(k)}_{i,0|0} are distributed according to p(x_{i,0}|y^x_{1:i,0}). To ensure the continuity of the proposed 2D dual particle filter across the batch dimension, we redefine {X^{(k)}_{i,0|0}}_{k=1}^{N} ∼ p(x_{i,0}) as this set of random particles distributed according to the initial state density function for the ith campaign. Finally, a schematic of the proposed 2D dual particle filter is shown in Fig. 3, and the corresponding pseudo-code is given in Algorithm 4.

Fig. 3. An illustration showing the working principle of the proposed 2D dual particle filter for estimating the states and parameters in Model 3. The box or enclosure around the blocks represents a campaign operation. Here PF1 is the particle filter for state estimation and PF2 is the particle filter for parameter estimation. The Kalman filter, denoted by KF, is used to estimate the initial conditions for a batch operation. At the end of a campaign, all the information flows over to the following campaign, where it is further processed and improved. The pseudo-code for the proposed 2D dual particle filter is provided in Algorithm 4.

Algorithm 4. 2D dual particle filter

1: Initialization: generate {x^{(k)}_{i,0|0}}_{k=1}^{N} ∼ p(x_{i,0}) and {θ^{(k)}_{i,0|0}}_{k=1}^{N} ∼ p(θ_{i,0}). Find the initial state and parameter estimates as x̂_{i,0|0} = (1/N) Σ_{k=1}^{N} x^{(k)}_{i,0|0} and θ̂_{i,0|0} = (1/N) Σ_{k=1}^{N} θ^{(k)}_{i,0|0}.
2: for i = 1 to J do
3:   for t = 1 to T do
4:     State Predict: predict {x^{(k)}_{i,t|t−1}}_{k=1}^{N} according to x^{(k)}_{i,t|t−1} ∼ p(x_{i,t}|x^{(k)}_{i,t−1|t−1}, θ̂_{i,t−1|t−1}), for k = 1, ..., N.
5:     State Update: compute importance weights according to w̃^{(k)}_{i,t|t−1} = p(y_{i,t}|x^{(k)}_{i,t|t−1}, θ̂_{i,t−1|t−1}) / Σ_{k=1}^{N} p(y_{i,t}|x^{(k)}_{i,t|t−1}, θ̂_{i,t−1|t−1}), for k = 1, ..., N.
6:     State Estimation: compute the state estimate as x̂_{i,t|t} = Σ_{k=1}^{N} x^{(k)}_{i,t|t−1} w̃^{(k)}_{i,t|t−1}.
7:     Parameter Predict: predict {θ^{(k)}_{i,t|t−1}}_{k=1}^{N} according to θ^{(k)}_{i,t|t−1} ∼ p(θ_{i,t}|θ^{(k)}_{i,t−1|t−1}, x̂_{i,t−1|t−1}), for k = 1, ..., N.
8:     Parameter Update: compute importance weights according to ṽ^{(k)}_{i,t|t−1} = p(x̂_{i,t|t}|x^{(k)}_{i,t|t−1}, θ^{(k)}_{i,t|t−1}) / Σ_{k=1}^{N} p(x̂_{i,t|t}|x^{(k)}_{i,t|t−1}, θ^{(k)}_{i,t|t−1}), for k = 1, ..., N.
9:     Parameter Estimation: compute the parameter estimate as θ̂_{i,t|t} = Σ_{k=1}^{N} θ^{(k)}_{i,t|t−1} ṽ^{(k)}_{i,t|t−1}.
10:    State Resample: resample {x^{(k)}_{i,t|t−1}, w̃^{(k)}_{i,t|t−1}}_{k=1}^{N} as per Pr(X^{(j)}_{i,t|t} = x^{(k)}_{i,t|t−1}) = w̃^{(k)}_{i,t|t−1}, for j = 1, ..., N. Define the resampled particles as {x^{(j)}_{i,t|t}}_{j=1}^{N}.
11:    Parameter Resample: resample {θ^{(k)}_{i,t|t−1}, ṽ^{(k)}_{i,t|t−1}}_{k=1}^{N} as per Pr(Θ^{(j)}_{i,t|t} = θ^{(k)}_{i,t|t−1}) = ṽ^{(k)}_{i,t|t−1}, for j = 1, ..., N. Define the resampled particles as {θ^{(j)}_{i,t|t}}_{j=1}^{N}.
12:  end for
13:  State Initial Condition: implement Algorithm 3 and compute x̂_{i,0|T}. Set y^x_{i+1,0} ← x̂_{i,0|T}.
14:  Implement the Kalman filter on (41a) and (41b). Estimate x̂_{i+1,0} and generate {x^{(k)}_{i+1,0}}_{k=1}^{N} ∼ p(x_{i+1,0}|y^x_{1:i+1,0}).
15:  Parameter Initial Condition: predict {θ^{(k)}_{i+1,0}}_{k=1}^{N} according to θ^{(k)}_{i+1,0} ∼ p(θ_{i+1,0}|θ^{(k)}_{i,T|T}, x̂_{i,T|T}), for k = 1, ..., N, and estimate θ̂_{i+1,0}.
16: end for
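To make the inner loop of Algorithm 4 (steps 4 through 11) concrete, the following sketch runs one time step of the dual particle filter for a generic scalar model with Gaussian noises. The function handles f and h, the noise scales sig_w, sig_th and sig_v, and all variable names are illustrative assumptions rather than quantities fixed by Algorithm 4; in particular, the parameter weights below evaluate the transition density from the time t − 1 particles, which is one common dual-filter choice and may differ in detail from the conditioning used above.

```python
import numpy as np

rng = np.random.default_rng(0)

def dual_pf_step(xp, thp, y, f, h, sig_w, sig_th, sig_v):
    """One pass through steps 4-11 of Algorithm 4 for a scalar model.

    xp, thp : (N,) resampled state and parameter particles from time t-1
    y       : measurement y_{i,t}
    f, h    : transition map f(x, theta) and measurement map h(x)
    """
    N = xp.size
    th_hat = thp.mean()                       # current parameter estimate
    # Steps 4-6: state predict, update, and estimate
    x_pred = f(xp, th_hat) + sig_w * rng.standard_normal(N)
    w = np.exp(-0.5 * ((y - h(x_pred)) / sig_v) ** 2)
    w /= w.sum()
    x_hat = np.sum(w * x_pred)
    # Steps 7-9: parameter predict, update against x_hat, and estimate
    th_pred = thp + sig_th * rng.standard_normal(N)
    v = np.exp(-0.5 * ((x_hat - f(xp, th_pred)) / sig_w) ** 2)
    v /= v.sum()
    th_hat = np.sum(v * th_pred)
    # Steps 10-11: multinomial resampling of both particle systems
    xp = rng.choice(x_pred, size=N, p=w)
    thp = rng.choice(th_pred, size=N, p=v)
    return x_hat, th_hat, xp, thp

# For the benchmark model of Section 6.1:
# f = lambda x, th: x / (1.0 + th);  h = lambda x: x
```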

6. Simulation results

In this section, the efficacy of the dual particle filter and the 2D dual particle filter outlined in Algorithms 2 and 4, respectively, for real-time estimation and identification in batch processes is demonstrated on two example problems: a benchmark problem and a penicillin fermentation problem.


Fig. 4. (a) Comparing the state estimates (solid-black curve) obtained with Algorithm 2 against the true states (broken-black curve). (b) Comparing the parameter estimates (solid-black curve) obtained with Algorithm 2 against the true parameters (broken-black line). The results presented here are for Batch 1 with T = 10 samples.

6.1. Example 1: a benchmark system

Consider a batch process represented by the following time-invariant, stochastic, nonlinear SSM

$$X_{i,t} = \frac{1}{1+\theta} X_{i,t-1} + W_{i,t-1}, \tag{42a}$$

$$Y_{i,t} = X_{i,t} + V^{y}_{i,t}, \tag{42b}$$
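As a quick illustration of the data-generating process, the model (42a) and (42b) can be simulated in a few lines of Python; the noise levels and true values used here are those stated in the following paragraph, while the script itself is only a sketch of the simulation set-up.

```python
import numpy as np

rng = np.random.default_rng(1)

# Settings from the text: theta* = 0.01, X*_{i,0} = 2, J = 10, T = 10
theta_true, x0_true, J, T = 0.01, 2.0, 10, 10

def simulate_campaign(x0, theta, T):
    """Simulate one campaign of the benchmark model (42a)-(42b)."""
    x, xs, ys = x0, [], []
    for _ in range(T):
        x = x / (1.0 + theta) + 0.01 * rng.standard_normal()   # (42a)
        xs.append(x)
        ys.append(x + 0.001 * rng.standard_normal())           # (42b)
    return np.array(xs), np.array(ys)

batches = [simulate_campaign(x0_true, theta_true, T) for _ in range(J)]
```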

where θ ∈ R is the unknown model parameter and i, t ∈ N are the batch and sampling indices, respectively. It is assumed that the initial state is batch-invariant with X*_{i,0} = 2 for all i ∈ N and that the true parameter is both time- and batch-invariant with θ* = 0.01. Here the asterisk notation represents the true value of a variable. It is further assumed that {W_{i,t}}_{i,t∈N} and {V^y_{i,t}}_{i,t∈N} in (42a) and (42b), respectively, are independent sequences of Gaussian random variables, such that W_{i,t} ∼ N(w_{i,t}|0, 0.01²) and V^y_{i,t} ∼ N(v^y_{i,t}|0, 0.001²). For the batch process represented by (42a) and (42b), the objective is to perform real-time state and parameter estimation using Algorithms 2 and 4. For simulation purposes, we assume that there are 10 campaign runs, each lasting 10 days with a one-day sampling interval, such that J = 10 and T = 10.

In this section, we compare and evaluate the performances of the dual particle filter and the 2D dual particle filter proposed in Algorithms 2 and 4, respectively. To implement Algorithm 2 for state and parameter estimation, we first set up (42a) and (42b) according to Model 2, and define

$$X_{i,0} \sim \mathcal{N}(x_{i,0}\mid 1.6, 0.1^2), \tag{43a}$$

$$\theta_{i,0} \sim \mathcal{N}(\theta_{i,0}\mid 0.2, 0.01^2). \tag{43b}$$

Further, the model parameters are assumed to follow the dynamics

$$\theta_{i,t} = \theta_{i,t-1} + W^{\theta}_{i,t-1}, \tag{44}$$

where {W^θ_{i,t}}_{i,t∈N} ∈ R^{n_θ} is a sequence of independent Gaussian random variables, such that W^θ_{i,t} ∼ N(w^θ_{i,t}|0, 0.1²). Finally, we set N = 100 in Algorithm 2.


Fig. 5. (a) Comparing the state estimates (solid-black curve) obtained with Algorithm 2 against the true states (broken-black curve). (b) Comparing the parameter estimates (solid-black curve) obtained with Algorithm 2 against the true parameters (broken-black line). The results presented here are for Batch 1 with T = 200 samples.

Fig. 4(a) and (b) give the real-time state and parameter estimates for the first campaign, as calculated using Algorithm 2. From Fig. 4(a) and (b) it is clear that the performance of the estimation step is dependent on the identification step. Observe that the performance of the state and parameter estimates improves as the batch progresses; however, even before the estimates fully converge to their true values, the campaign is over. It is instructive to highlight that while the results in Fig. 4(a) and (b) are shown for the first campaign, similar results are obtained for the other campaigns: the estimates fail to converge to their true values before the end of the campaign. This is because, as discussed in Section 4.2, Algorithm 2 treats campaigns independently. Of course, if a campaign were allowed to run for much longer, as in a continuous process, and assuming the process is time-invariant, Algorithm 2 would converge to the true parameter estimates before the end of the campaign (see Fig. 5 for an illustration); however, in practice, such long campaign operations are rare for batch processes. In fact, from a batch estimation perspective, data from a single campaign is rarely sufficient for real-time estimation and identification.

Next we use Algorithm 4 for real-time state and parameter estimation in (42a) and (42b). As discussed in Section 5, we first need to set up (42a) and (42b) according to Model 3. For a fair comparison between Algorithms 2 and 4, we assume that the prior densities for the states and parameters, and the parameter transition density functions in Model 3 follow the same distributions as defined earlier for Model 2. Now, as required by Algorithm 4, we link the initial conditions for different campaigns through the following equation

$$X_{i+1,0} = X_{i,0} + W^{x}_{i,0}, \tag{45}$$

where {W^x_{i,0}}_{i∈N} ∈ R^{n_x} is a sequence of independent Gaussian random variables, such that W^x_{i,0} ∼ N(w^x_{i,0}|0, 0.1²).

Fig. 6(a) and (b) give the real-time estimation and identification errors with Algorithm 4 along the campaign duration, for three different campaigns (i = 1, 2 and 10). Here the estimation and identification errors are defined as follows

$$\Delta x_{i,t} \equiv \hat{x}_{i,t} - x^{*}_{i,t}, \tag{46a}$$

$$\Delta \theta_{i,t} \equiv \hat{\theta}_{i,t} - \theta^{*}_{i,t}, \tag{46b}$$

where x*_{i,t} and θ*_{i,t} are the true states and the true parameters, respectively, and Δx_{i,t} and Δθ_{i,t} are the estimation and identification errors for the ith campaign at time t, respectively.

As with Algorithm 2, from Fig. 6(a) and (b) it is evident that Algorithm 4 is successful in reducing the estimation and identification errors along the campaign duration; however, the estimates fail to converge to their true values. This is again due to the insufficient measurements available in any single campaign. Note, however, that unlike Algorithm 2, since Algorithm 4 is able to continue improving the estimates along the batch direction by making use of measurements from multiple campaigns, the performance of Algorithm 4 improves in subsequent campaigns. For example, Fig. 6(a) and (b) show how the estimation and identification errors monotonically reduce both along the time and batch directions with Algorithm 4. In fact, observe that by the end of Batch 10, Algorithm 4 successfully converges to the true states and the true parameters. Finally, Fig. 7 shows the estimated initial conditions for the campaigns as computed by Algorithm 4. Recall that since the campaigns are assumed to have the same initial condition, the performance of Algorithm 4 continues to improve along the batch direction, and quickly converges to the true initial condition. In contrast, the initial condition estimated using Algorithm 2 is always inferior (compared to Algorithm 4), as evident from Fig. 4.

Fig. 6. (a) State estimation error (or estimation error) along the batch duration for batch numbers 1 (dotted-black curve), 2 (dashed-black curve) and 10 (solid-black curve) obtained with Algorithm 4. (b) Parameter estimation error (or identification error) along the batch duration for batch numbers 1 (dotted-black curve), 2 (dashed-black curve) and 10 (solid-black curve) obtained with Algorithm 4.

Fig. 7. Estimation of the initial condition for the state (solid-black curve) along the batch direction as obtained with Algorithm 4. The true initial condition is represented by the broken-black line.

6.2. Penicillin fermentation process

The penicillin fermentation process is typically operated as a batch process, wherein biomass, substrate and water are added to the fermenter to produce penicillin, carbon dioxide and water [45]. There are different fermentation mechanisms for producing different qualities, but for the most part the central idea of penicillin fermentation is the same: the biomass grows by consuming substrate to produce penicillin. The concentrations of biomass, substrate and penicillin are critical for the fermentation process. In this section, we consider the problem of real-time estimation and identification of the penicillin fermentation process. The penicillin fermentation process is commonly modeled using the following nonlinear state-space model [46]

$$\frac{dC_X}{dt} = \mu C_X - \frac{C_X}{V}\frac{dV}{dt}, \tag{47a}$$

$$\frac{dC_P}{dt} = \mu_{pp} C_X - K C_P - \frac{C_P}{V}\frac{dV}{dt}, \tag{47b}$$

$$\frac{dC_S}{dt} = -\frac{\mu}{Y_{x/s}} C_X - \frac{\mu_{pp}}{Y_{p/s}} C_X - m_x C_X + F - \frac{C_S}{V}\frac{dV}{dt}, \tag{47c}$$

$$\frac{dD_o}{dt} = -\frac{\mu}{Y_{x/o}} C_X - \frac{\mu_{pp}}{Y_{p/o}} C_X - m_o C_X + K_{la}\big(D_o^{max} - D_o\big) - \frac{D_o}{V}\frac{dV}{dt}, \tag{47d}$$

$$\frac{dV}{dt} = F - F_{loss}, \tag{47e}$$

$$\frac{dCO_2}{dt} = \alpha_1 \frac{dC_X}{dt} + \alpha_2 C_X + \alpha_3, \tag{47f}$$

where X = [C_S, C_X, C_P]^T are the process states, representing the concentrations of the substrate, biomass and penicillin, respectively, and Y = [D_o, V, CO_2] are the accurate online measurements representing the dissolved oxygen, the volume and the CO_2 evolution, respectively. The substrate-to-biomass conversion rate μ is expressed as

$$\mu = \mu_{max} \frac{C_S}{K_c C_X + C_S} \frac{D_o}{K_{ox} C_X + D_o}. \tag{48}$$

The specific penicillin production rate μ_pp is given by

$$\mu_{pp} = \mu_p \frac{C_S}{K_p + C_S + C_S^2/K_I} \frac{D_o^p}{K_{op} C_X + D_o^p}, \tag{49}$$

where μ_max, μ_p, K_p, K_I, p, K_op, Y_{x/s}, Y_{p/s}, m_x, Y_{x/o}, Y_{p/o}, m_o, D_o^{max} and K_la are constant parameters. The feed rate into the fermenter and the freezing and boiling temperatures are denoted by F, T_0 and T_v, respectively. The evaporative loss F_loss is modeled as

$$F_{loss} = V \lambda \left( e^{5(T - T_0)/(T_v - T_0)} - 1 \right), \tag{50}$$

where λ is the evaporative rate. It is assumed that the model parameter θ = μ_p is unknown and must be estimated. A sampling interval of 0.01 h is used to obtain a discretized model through a discrete-time transformation of the state-space model. The values of the remaining known parameters can be found in [45].
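For readers who wish to reproduce the set-up, the continuous-time model (47a) through (47d) can be discretized with a simple forward-Euler step at the stated 0.01 h sampling interval. The parameter values below are illustrative placeholders only (the full calibrated set is tabulated in [45]); the sketch shows just the mechanics of the discrete-time transformation.

```python
import numpy as np

DT = 0.01  # sampling interval [h], as stated in the text

# Illustrative (assumed) parameter values; the calibrated set is in [45].
mu_max, Kc, Kox = 0.092, 0.0001, 0.04
mu_p, Kp, KI, p_exp, Kop = 0.005, 0.0002, 0.10, 3.0, 0.0005
Yxs, Yps, Yxo, Ypo = 0.45, 0.90, 0.04, 0.20
mx, mo, K, Kla, Do_max, F = 0.014, 0.467, 0.04, 100.0, 1.16, 0.05

def euler_step(CS, CX, CP, Do, V, dVdt):
    """One forward-Euler step of (47a)-(47d) at the 0.01 h sample time."""
    mu = mu_max * CS / (Kc * CX + CS) * Do / (Kox * CX + Do)           # (48)
    mupp = (mu_p * CS / (Kp + CS + CS**2 / KI)
            * Do**p_exp / (Kop * CX + Do**p_exp))                     # (49)
    dCX = mu * CX - CX / V * dVdt                                     # (47a)
    dCP = mupp * CX - K * CP - CP / V * dVdt                          # (47b)
    dCS = (-mu / Yxs * CX - mupp / Yps * CX - mx * CX
           + F - CS / V * dVdt)                                       # (47c)
    dDo = (-mu / Yxo * CX - mupp / Ypo * CX - mo * CX
           + Kla * (Do_max - Do) - Do / V * dVdt)                     # (47d)
    return CS + DT * dCS, CX + DT * dCX, CP + DT * dCP, Do + DT * dDo
```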





It is further assumed that W^{x,1}_t, W^{x,2}_t and W^{x,3}_t are sequences of independently distributed Gaussian random variables, such that

$$\begin{bmatrix} W^{x,1}_t \\ W^{x,2}_t \\ W^{x,3}_t \end{bmatrix} \sim \mathcal{N}\left(\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 10^{-2} & 0 & 0 \\ 0 & 10^{-4} & 0 \\ 0 & 0 & 10^{-2} \end{bmatrix}\right), \tag{51}$$

for all sampling instants in all batches. The initial concentrations of the species are given as X*_0 = [15, 0.1, 0]^T for all batches, and the true model parameter is given by θ* = 0.005. Similarly, V^{y,1}_t, V^{y,2}_t and V^{y,3}_t are sequences of independently distributed Gaussian measurement noises, such that

$$\begin{bmatrix} V^{y,1}_t \\ V^{y,2}_t \\ V^{y,3}_t \end{bmatrix} \sim \mathcal{N}\left(\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 10^{-1} & 0 & 0 \\ 0 & 2\times 10^{-1} & 0 \\ 0 & 0 & 10^{-1} \end{bmatrix}\right), \tag{52}$$

for all sampling instants in all batches.

Fig. 8. (a) State estimation error (or estimation error) for the product along the batch duration for batch numbers 1 (dotted-black curve), 10 (dashed-black curve) and 20 (solid-black curve) obtained with Algorithm 4. (b) State estimation error (or estimation error) for the substrate along the batch duration for batch numbers 1 (dotted-black curve), 10 (dashed-black curve) and 20 (solid-black curve) obtained with Algorithm 4.

Fig. 9. Identification of θ along the batch duration for batch numbers 1 (dotted-black curve), 10 (dashed-black curve) and 20 (solid-black curve) obtained with Algorithm 4.

For simulation purposes, it is assumed that there are J = 20 campaigns, each operating for 20 h, such that 2000 measurements are collected in each campaign. Finally, for the penicillin fermentation process represented by (47a) through (47c), with the sensor measurements given by (47d) through (47f), we perform real-time estimation and identification using Algorithm 4. To avoid an over-dispersed approximation of the parameter posterior, the kernel smoothing algorithm is employed with shrinkage factor h = 0.95. A poor initial state guess is taken as [15.5, 0, 0.2]^T, with Gaussian noise distributed as N([0, 0, 0]^T, diag(5 × 10^{−1}, 10^{−2}, 10^{−1})). N = 50 particles each are used in the Monte Carlo simulations of the states and the parameter.
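The kernel smoothing step referred to above follows the shrinkage construction of Liu and West [35]: each parameter particle is pulled toward the current particle mean before fresh kernel noise is added, which prevents artificial inflation of the parameter spread. A minimal sketch is given below, assuming scalar parameter particles and interpreting the stated h = 0.95 as the shrinkage factor; the variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(2)

def kernel_smooth(theta, weights, h=0.95):
    """Liu-West style kernel smoothing for the parameter particles.

    Each particle is shrunk toward the weighted mean by the factor h and
    then perturbed with Gaussian kernel noise of variance (1 - h**2) times
    the current particle variance, so that the mean and variance of the
    parameter cloud are preserved while over-dispersion is avoided.
    """
    m = np.sum(weights * theta)                  # weighted mean
    var = np.sum(weights * (theta - m) ** 2)     # weighted variance
    shrunk = h * theta + (1.0 - h) * m           # shrinkage step
    return shrunk + np.sqrt((1.0 - h**2) * var) * rng.standard_normal(theta.size)
```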

The estimation results obtained with Algorithm 4 are shown in Fig. 8, where Batch 1 indicates that only the data of a single campaign have been used for estimation. Fig. 8(a) and (b) show the error trajectories in estimating the concentrations of the penicillin product (X^{(3)}_t) and the substrate (X^{(1)}_t), respectively. The results are presented for three different batches, namely 1, 10 and 20. It is evident from Fig. 8(a) and (b) that the estimation performance of Algorithm 4 improves not only along the time direction, but also along the batch direction.

As discussed earlier, the estimation performance of Algorithm 4 is directly correlated with its identification performance. Fig. 9 gives the identification performance of Algorithm 4 in estimating θ. As with the state estimation, the identification performance of Algorithm 4 improves both along the time and batch directions.

Finally, Fig. 10 shows how the initial concentration of the substrate in the fermenter converges to the true initial concentration with every new campaign run. Observe that the initial condition for the substrate keeps converging to its true value with the increasing number of campaigns. Similar results are also obtained for the initial concentrations of the biomass and product. All in all, Figs. 8 through 10 clearly demonstrate the efficacy of Algorithm 4 for real-time estimation and identification of batch processes.

Fig. 10. Estimation of the initial condition for the substrate concentration (solid-black curve) along the batch direction as obtained with Algorithm 4. The true initial condition is represented by the broken-black line.

7. Conclusions

In this paper, we discussed the problem of estimation and identification in batch processes. A two-dimensional (2D) dual particle filtering algorithm was proposed to allow for real-time estimation and identification in batch processes represented by nonlinear SSMs. To the best of the authors' knowledge, the proposed method is the first algorithm that fully exploits the dynamics of a batch process both along the time and batch directions to deliver high-quality state and parameter estimates. The proposed method maintains estimation continuity by carefully linking the terminal state and parameter information of one batch campaign to the next. Finally, the efficacy of the 2D dual particle filter was demonstrated on two simulation examples.



Acknowledgements

The authors would like to acknowledge the support of the national first-class discipline program of Light Industry Technology and Engineering (LITE2018-25) as well as the National Natural Science Foundation of China (NSFC 61573169, 61833007).

References

[1] B. Srinivasan, S. Palanki, D. Bonvin, Dynamic optimization of batch processes: I. Characterization of the nominal solution, Comput. Chem. Eng. 27 (1) (2003) 1–26.
[2] N. Kantas, A. Doucet, S.S. Singh, J. Maciejowski, An overview of sequential Monte Carlo methods for parameter estimation in general state-space models, in: Proceedings of the 15th IFAC Symposium on System Identification, Saint-Malo, France, 2009.
[3] A. Tulsyan, B. Huang, R.B. Gopaluni, J.F. Forbes, On simultaneous state and parameter estimation in non-linear state-space models, J. Process Control 23 (4) (2012) 516–526.
[4] S. Koopman, N. Shepard, Exact score for time series models in state-space form, Biometrika 79 (1992) 823–826.
[5] T.C. Lystig, J.P. Hughes, Exact computation of the observed information matrix for hidden Markov models, J. Comput. Graph. Stat. 11 (2002) 678–689.
[6] G. Poyiadjis, A. Doucet, S.S. Singh, Particle approximations of the score and observed information matrix in state-space models with application to parameter estimation, Biometrika 98 (1) (2011) 65–80.
[7] G. Poyiadjis, A. Doucet, S.S. Singh, Maximum likelihood parameter estimation in general state-space models using particle methods, in: Proceedings of the American Statistical Association, Minneapolis, USA, 2005.
[8] C. Andrieu, A. Doucet, S.S. Singh, V.B. Tadic, Particle methods for change detection, system identification, and control, Proc. IEEE 92 (3) (2004) 423–438.
[9] A.P. Dempster, N.M. Laird, D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc. B 39 (1977) 1–38.
[10] L. Chen, A. Tulsyan, B. Huang, F. Liu, Multiple model approach to nonlinear system identification with an uncertain scheduling variable using EM algorithm, J. Process Control 23 (10) (2013) 1480–1496.
[11] P. Del Moral, A. Doucet, S.S. Singh, Forward smoothing using sequential Monte Carlo, Technical Report CUED/F-INFENG/TR 638, Department of Engineering, Cambridge University, UK, 2009.
[12] S. Yildirim, S.S. Singh, A. Doucet, An online Expectation-Maximization algorithm for change point models, J. Comput. Graph. Stat. (2012).
[13] O. Cappé, Online sequential Monte Carlo EM algorithm, in: Proceedings of the 15th IEEE Workshop on Statistical Signal Processing, Cardiff, UK, 2009, pp. 37–40.
[14] G. Kitagawa, Self organizing state-space models, J. Am. Stat. Assoc. 93 (1998) 1203–1215.
[15] T. Chen, J. Morris, E. Martin, Particle filters for state and parameter estimation in batch processes, J. Process Control 15 (2005) 665–673.
[16] H. Khodadadi, H. Jazayeri-Rad, Applying a dual extended Kalman filter for the nonlinear state and parameter estimations of a continuous stirred tank reactor, Comput. Chem. Eng. 35 (11) (2011) 2426–2436.
[17] L.E. Olivier, B. Huang, I.K. Craig, Dual particle filters for state and parameter estimation with application to a run-of-mine ore mill, J. Process Control 22 (4) (2012) 710–717.
[18] A. Gupta, R.D. Gudi, Integrating iterative learning estimation with optimal control for batch productivity enhancement, IFAC-PapersOnLine 48 (8) (2015) 843–848.
[19] Z. Zhao, B. Huang, F. Liu, Parameter estimation for batch processes with measurements of large sampling intervals, IFAC-PapersOnLine 48 (8) (2015) 799–804.
[20] Z. Zhao, B. Huang, F. Liu, Bayesian method for state estimation of batch process with missing data, Comput. Chem. Eng. 53 (2013) 14–24.
[21] H.L. Van Trees, Detection, Estimation and Modulation Theory. Part I, Wiley, New York, 1968.
[22] B. Ristic, S. Arulampalam, N. Gordon, A tutorial on particle filters, in: Beyond the Kalman Filter: Particle Filters for Tracking Applications, Artech House, Boston, MA, 2004.
[23] A. Tulsyan, B. Huang, R.B. Gopaluni, J.F. Forbes, Performance assessment, diagnosis, and optimal selection of nonlinear state filters, J. Process Control 24 (2) (2014) 460–478.
[24] H.W. Sorenson, On the development of practical non-linear filters, Inf. Sci. 7 (C) (1974) 253–270.
[25] P.S. Maybeck, Stochastic Models, Estimation and Control, Academic Press, New York, 1982.
[26] A. Tulsyan, Y. Tsai, R.B. Gopaluni, R.D. Braatz, State of charge estimation in lithium-ion batteries: a particle filter approach, J. Power Sources 331 (2016) 208–223.
[27] D. Crisan, B. Rozovskii, The Oxford Handbook of Non-linear Filtering, Oxford University Press, Oxford, 2011.
[28] A. Tulsyan, B. Huang, R.B. Gopaluni, J.F. Forbes, Bayesian identification of non-linear state-space models: Part II: error analysis, in: Proceedings of the 10th International Symposium on Dynamics and Control of Process Systems, Singapore, 2013.
[29] A. Tulsyan, R.B. Gopaluni, S.R. Khare, Particle filtering without tears: a primer for beginners, Comput. Chem. Eng. 95 (12) (2016) 130–145.
[30] B. Ninness, Strong laws of large numbers under weak assumptions with application, IEEE Trans. Autom. Control 45 (11) (2000) 2117–2122.
[31] C. Andrieu, A. Doucet, V.B. Tadic, Online parameter estimation in general state-space models, in: Proceedings of the 44th IEEE Conference on Decision and Control and European Control Conference, Seville, Spain, 2005.
[32] A. Doucet, V.B. Tadic, Parameter estimation in general state-space models using particle methods, Ann. Inst. Stat. Math. 55 (2) (2003) 409–422.
[33] T. Higuchi, Self organizing time series model, in: Sequential Monte Carlo Methods in Practice, Springer-Verlag, New York, 2001.
[34] A. Tulsyan, J.F. Forbes, B. Huang, Designing priors for robust Bayesian optimal experimental design, J. Process Control 22 (2) (2012) 450–462.
[35] J. Liu, M. West, Combined parameter and state estimation in simulation-based filtering, in: Sequential Monte Carlo Methods in Practice, Springer-Verlag, New York, 2001.
[36] F. Gustafsson, T. Schön, Particle filters for system identification of state-space models linear in either parameters or states, in: Proceedings of the 13th IFAC Symposium on System Identification, Saint-Malo, France, 2003.
[37] A. Kassidas, J.F. MacGregor, P.A. Taylor, Synchronization of batch trajectories using dynamic time warping, AIChE J. 44 (4) (1998) 864–875.
[38] J. Yeh, Martingales and Stochastic Analysis, World Scientific, 1995.
[39] Q. Xia, M. Rao, Y. Ying, X. Shen, Adaptive fading Kalman filter with an application, Automatica 30 (8) (1994) 1333–1338.
[40] C.I. Byrnes, B.N. Datta, C.F. Martin, Systems and Control in the Twenty-first Century, Springer Science & Business Media, 2013, p. 22.
[41] T.B. Schön, A. Wills, B. Ninness, System identification of nonlinear state-space models, Automatica 47 (2011) 39–49.
[42] R. Douc, A. Garivier, E. Moulines, J. Olsson, Sequential Monte Carlo smoothing for general state-space hidden Markov models, Ann. Appl. Prob. 21 (6) (2011) 2109–2145.
[43] G. Kitagawa, Monte Carlo filter and smoother for non-Gaussian nonlinear state-space models, J. Comput. Graph. Stat. 5 (1) (1996) 1–25.
[45] G. Birol, C. Undey, A. Cinar, A modular simulation package for fed-batch fermentation: penicillin production, Comput. Chem. Eng. 26 (2002) 1553–1565.
[46] Z. Zhao, B. Huang, F. Liu, Constrained particle filtering methods for state estimation of nonlinear process, AIChE J. 60 (6) (2014) 2072–2082.
[47] A. Tulsyan, C. Garvin, C. Undey, Industrial batch process monitoring with limited data, J. Process Control 77 (2019) 114–133.
[48] A. Tulsyan, C. Garvin, C. Ündey, Advances in industrial biopharmaceutical batch process monitoring: machine-learning methods for small data problems, Biotechnol. Bioeng. 115 (8) (2018) 1915–1924.
[49] A. Tulsyan, C. Garvin, C. Undey, Machine-learning for biopharmaceutical batch process monitoring with limited data, IFAC-PapersOnLine 51 (18) (2018) 126–131.
[50] A. Tulsyan, F. Alrowaie, B. Gopaluni, Design and assessment of delay timer alarm systems for nonlinear chemical processes, AIChE J. 64 (1) (2018) 77–90.
[51] A. Tulsyan, P.I. Barton, Reachability-based fault detection method for uncertain chemical flow reactors, IFAC-PapersOnLine 49 (7) (2016) 1–6.
[52] S. Spielberg, A. Tulsyan, N.P. Lawrence, P.D. Loewen, R.B. Gopaluni, Towards self-driving processes: a deep reinforcement learning approach to control, AIChE J. (2019), http://dx.doi.org/10.1002/aic.16689.
[53] S.S.P. Kumar, A. Tulsyan, B. Gopaluni, P. Loewen, A deep learning architecture for predictive control, IFAC-PapersOnLine 51 (18) (2018) 512–517.
[54] A. Tulsyan, S. Khare, B. Huang, B. Gopaluni, F. Forbes, A switching strategy for adaptive state estimation, Signal Processing 143 (2018) 371–380.
Ündey, Advances in industrial biopharmaceutical batch process monitoring: MachineÜlearning methods for small data problems, Biotechnology and Bioengineering 115 (8) (2018) 1915–1924. [49] A.C. Tulsyan Garvin, C. Undey, Machine-learning for biopharmaceutical batch process monitoring with limited data, IFAC-PapersOnLine 51 (18) (2018) 126–131. [50] A. Tulsyan, F. Alrowaie, B. Gopaluni, Design and assessment of delay timer alarm systems for nonlinear chemical processes, AIChE Journal 64 (1) (2018) 77–90. [51] A. Tulsyan, P.I. Barton, Reachability-based fault detection method for uncertain chemical flow reactors, IFAC-PapersOnLine 49 (7) (2016) 1–6. [52] S. Spielberg, A. Tulsyan, N.P. Lawrence, P.D. Loewen, R.B. Gopaluni, Towards Self-Driving Processes: A Deep Reinforcement Learning Approach to Control, AIChE J. (2019). [53] S.S.P. Kumar, A. Tulsyan, B. Gopaluni, P. Loewen, A Deep Learning Architecture for Predictive Control, IFAC-PapersOnLine 51 (18) (2018) 512–517, http://dx. doi.org/10.1002/aic.16689. [54] A. Tulsyan, S. Khare, B. Huang, B. Gopaluni, F. Forbes, A switching strategy for adaptive state estimation, Signal Processing 143 (2018) 371–380.