Computers and Chemical Engineering 130 (2019) 106564
Contents lists available at ScienceDirect
Computers and Chemical Engineering journal homepage: www.elsevier.com/locate/compchemeng
Optimal fusion of industrial data streams with different granularities Tiago J. Rato, Marco S. Reis∗ CIEPQPF, Department of Chemical Engineering, University of Coimbra, Rua Sílvio Lima, 3030-790, Coimbra, Portugal
a r t i c l e
i n f o
Article history: Received 15 March 2019 Revised 25 August 2019 Accepted 28 August 2019 Available online 29 August 2019 Keywords: Data fusion Multiresolution data Rauch-Tung-Striebel smoother Multiresolution soft sensor Big data
a b s t r a c t In the Industry 4.0 era, it is a common situation that measurements are available with different resolutions (or granularities). Multiresolution (or multi-granularity) data can arise, for instance, from aggregation rules applied over measurements collected in batches during non-overlapping periods of time, or in composite sampling schemes (e.g., using industrial auto-sampler devices). Current solutions for optimal data fusion, such as Kalman filters (KF) and Moving Horizon Estimation (MHE) approaches, assume that all variables are measured at the same resolution, even if their acquisition rates are different. Therefore, they are unable to account for the multiresolution structure of data. In this article we introduce a multiresolution framework for the optimal fusion of multiple signals at distinct resolutions. This approach is compared against the benchmark alternative using simulated case studies, where the superior signal processing capability of the novel methodology is demonstrated. © 2019 Published by Elsevier Ltd.
1. Introduction The Chemical Process Industry (CPI) generates data with an increasing variety of structures that are not properly handled by current data analytics platforms. The existing platforms and methodologies strongly rely on the availability of data tables at a singleresolution and, most often, at a single acquisition rate. Analyzing modern process databases, one can easily verify that this assumption is frequently not met, meaning that these databases present, in general, a multiresolution data structure. This situation tends to be found with increasing incidence, as Industry 4.0 takes its course, and more data collectors are hooked along the entire value chain. In short terms, resolution relates to the degree of aggregation of a given quantity (variable), over the domain of the relevant independent variables, i.e., the time or space coordinates. In this context, some process variables collect instantaneous snapshots from the process (i.e., they are aggregated or averaged over a very short time period), while others are aggregated over several minutes, hours, or even days. Therefore, each recorded value does not respect solely to their reporting time instant (as happens in the multirate case (Izadi et al., 2005; Lu et al., 2004)), but rather to the time window over which data was acquired and accumulated. Due to this defining characteristic, multiresolution data require the use of dedicated methodologies. However, to the best of our present knowledge, the number of studies addressing the modeling and processing of multiresolution structures is very low, even rare. To
∗
Corresponding author. E-mail address:
[email protected] (M.S. Reis).
https://doi.org/10.1016/j.compchemeng.2019.106564 0098-1354/© 2019 Published by Elsevier Ltd.
mitigate this gap and effectively handle multiresolution data structures, a novel modeling framework was proposed by Rato and Reis (2017). This modeling framework is the enabler that sets the ground for the development of other Systems Engineering tasks for multiresolution data structures. In this work, we address the development of an optimal multiresolution estimation framework for industrial data that extends the well-known and established Kalman filter to this more general scenario. The celebrated Kalman filter (KF) (Kalman, 1960) is a singleresolution optimal estimation approach, with a Bayesian construction, currently available in several versions resulting from multiple extensions designed to address specific systems and data characteristics, including multirate (Wu and Luo, 2010; Roshany-Yamchi et al., 2013; Li et al., 2008) and multiscale (Stephanopoulos et al., 20 08; Stephanopoulos et al., 20 08; Basseville et al., 1992; Chou et al., 1994; Willsky, 2002) contexts. However, as discussed in the following paragraphs, none of these proposals addresses the actual multiresolution structure prevailing in industrial databases. The same applies to Moving Horizon Estimation formulations, targeting essentially the same fundamental problem as the Kalman filter (Haseltine and Rawlings, 2005). Multirate approaches concern the case where variables are available at different time instants, but still at the same resolution. Therefore, only the dynamic dependencies are addressed in these models, not accounting for any type of inter-resolution relationships. In fact, they mostly operate by employing multiple Kalman filters, depending on the amount of information available. Examples include the proposal of Roshany-Yamchi et al. (2013), where the missing variables are given zero weights during the KF state estimation. Another example is the approach of Wu and Luo
2
T.J. Rato and M.S. Reis / Computers and Chemical Engineering 130 (2019) 106564
Fig. 1. Noise free signal for the response variables of the distillation case study in Section 6: (a) signals with multirate structure; (b) signals with a multiresolution structure.
(2010), which is based on a fast rate KF and, when a slow rate observation is acquired, the procedure switches to another KF. Multirate approaches also assume that, whenever an estimate is made (either at a fast or slow sampling rate – not to be confounded with high/low resolution) it holds for that particular time instant. This is a reasonable assumption for multirate data because all current data do concern to the same time instant (see Fig. 1(a)). However, for multiresolution data this assumption does not hold and would implicitly cause the multirate KF to consider that the low resolution signal is a good representation of the high resolution signal, when clearly this is not the case (see Fig. 1(b)). Therefore, multirate approaches are, by design, limited to underperform when applied to multiresolution data streams. For further discussion on the differences between multirate and multiresolution data sets we refer the reader to Ref. (Rato and Reis, 2017). On the other hand, multiscale approaches (Stephanopoulos et al., 2008; Stephanopoulos et al., 2008; Basseville et al., 1992; Chou et al., 1994; Willsky, 2002) (sometimes imprecisely referred as multiresolution (Reis, 2019)) aim to address process dynamics over distinct time-frequency scales. The rational underlying these approaches is that process phenomena are distributed across several scales. Thus, to better isolate the different dynamics, wavelet
transforms are employed to properly decompose the process data and then identify the relevant models (Bakshi, 1999; Reis, 2009; Rendall and Reis, 2014). To merge the information over the scales, these methodologies (Stephanopoulos et al., 2008; Stephanopoulos et al., 2008; Basseville et al., 1992; Chou et al., 1994; Willsky, 2002) consider the existence of a coarse-to-fine scale dependency between them, which is obtained from the state-space model available at the finest scale (the original scale). However, these approaches require all initial states at the finest scale to be known and available at the same resolution, being for this reason singleresolution, multiscale approaches. Following the above discussion, the development of a filtering approach for multiresolution data is of paramount importance, not only for its optimal estimation capabilities, but also for enabling the fusion of multiple data streams with different resolutions in a single predictive tool. An additional advantage that we have found in our research, is the ability to recover high resolution information even when only low resolution observations of the response are available. This can be achieved without requiring any additional data manipulation, such as lifting techniques often employed in multirate state-space approaches (Lin et al., 2009; Li et al., 2001; Dongguang et al., 2003).
T.J. Rato and M.S. Reis / Computers and Chemical Engineering 130 (2019) 106564
The rest of this paper is organized as follows. In the next section, the multiresolution partial least squares (MR-PLS) modeling framework used in this study is briefly reviewed. Afterwards, the benchmark Kalman filter is introduced. This is followed by the description of the proposed multiresolution Kalman filtering approach (MR-KF) in Section 4. In Section 5, the proposed methodology is tested and compared in simulated case studies. In Section 6, a realistic simulation of an industrial process is used to consolidate the advantages of the proposed methodology. This case study also illustrates the application of MR-KF to signal reconstruction. The results obtained are discussed in Section 7 and the main conclusions of this work are summarized in Section 8. 2. Background: multiresolution modeling Before discussing the modeling methodology, the nomenclature used will be first defined. Multiresolution data arises from the use of aggregation rules that merge multiple (high resolution) samples into a single (low resolution) observation, after which the former are discarded (see Fig. 2). This contrasts with the more wellknown scenario of multirate data where variables are not aggregated over time. Furthermore, variables may be recorded at distinct resolutions (or granularities). To reflect this multiresolution (q ) structure, each variable is here represented as xt , where q is the resolution level and t is the time index at the highest resolution. Additionally, the time window over which samples are collected and aggregated is defined as the variables’ time support. Without loss of generality, it is assumed that the resolution levels follow a dyadic structure as considered in wavelet-based frameworks. If such dyadic pattern is not encountered in a particular application, interpolation (Stephanopoulos et al., 2008) or multiresolution infilling of missing observations (Reis and Saraiva, 2006) can be used to accommodate such pattern. Based on this assumption, the time support of the aggregation rule at resolution q has a length of 2q time instants. In other words, the recorded value at resolution q is the aggregated value within a window of 2q time instants to the past (see Fig. 2). The highest resolution (finest granularity) correspond to the zero level (i.e., q = 0), while the lowest resolution (coarser granularity) in the data is represented by r, thus 0 ≤ q ≤ r. Note that the aggregation rules are applied over non-overlapping windows and therefore the variables’ time support also defines the
3
frequency at which observations become available. As the lowest resolution (r) corresponds to the longest time support, the data is here divided into non-overlapping windows with a length of 2r time instants. These windows are tracked through a window index k and the set of time indices t within window k is given by ωk = {(k − 1 )2r + 1, (k − 1 )2r + 2, . . . , k2r }. Following these definitions, the changes in the time index t, as a result of movements on the multiresolution structure, can be univocally tracked down by the multiresolution indices: q, r and k. In the multiresolution modeling methodology discussed in this section, it is further assumed that the response variable is available at the lowest resolution since it is the variable more likely to be measured less frequently and also more prone to aggregation. This is the pattern found in the overwhelming majority of industrial situations. To obtain adequate multiresolution modeling frameworks (and multiresolution soft sensors), it is important to note that the structure of multiresolution data induces a natural dependency between high resolution (HR) and low resolution (LR) observations. This structure is a consequence of the acquisition systems, which often only record averages of several measurements collected over a specific time window (the variable’s time support). The same aggregation process is encountered when composite samples are collected over an extended time window. This underlying dependency is exemplified in Fig. 3 where it is observed that the low resolution response y(2) is directly dependent of a measured but unrecorded high resolution signal y˜(0 ) , which in turn has a static dependency with the predictor variable x(0 ) . To model these relationships, the Multiresolution Partial Least Squares (MR-PLS) presented by Rato and Reis (2017) starts by considering that the LR-responseobservation at time t (yt(r ) ), can be expressed as a weighted sum of unrecorded HR-response-observations (y˜t(0 ) ) within its time support. That is, (0 ) (0 ) yt(r ) = w0 y˜t(0) + w1 y˜t−1 + · · · + w2r −1 y˜t− (2r −1 )
(1)
where wi are weighting constants. Afterwards, by assuming a static dependency at the highest resolution level, the unknown HRresponse (y˜t(0 ) ) can be estimated by,
y˜t(0) = bxt(0)
Fig. 2. Relationship between the recorded values (gray) of variables aggregated at different resolutions, time index (t) and window index (k).
(2)
4
T.J. Rato and M.S. Reis / Computers and Chemical Engineering 130 (2019) 106564
Fig. 3. Relationship between the HR- and LR-signals of the response and the HRpredictors. The observed variables are represented in gray.
where b is a vector of regression coefficients. Finally, by combining Eqs. (1) and (2), the LR-response predictions are given by, (r )
yˆt
=
r 2 −1
(0 )
wi y˜t−i =
i=0
r 2 −1
(0 )
wi bxt−i = b
i=0
r 2 −1
(0 ) wi xt−i = but(r )
(3)
i=0
where ut represents the sum of weighted predictor variables over the time support of y( r ) : r 2 −1
(0 ) wi xt−i
(4)
i=0
From Eq. (3), it is verified that the two step prediction process of the LR-response can also be made directly without the unavailable HR-response. Furthermore, Eq. (4) shows that the predictor variables (x’s) may be at different resolutions as long as a low resolution version of them can be computed (u( r ) ). Based on this, a LR version of MR-PLS (defined as the LR-model) can be fit using only the LR-response-observations. Nevertheless, since the regression coefficients are relative to the HR-response, an equivalent HR version of MR-PLS (defined as the HR-model) is obtained by simple mathematical manipulation (see Eq. (5)). Note however that the HR-response can only be accurately recovered if the predictor variables (x’s) are all at the highest resolution. If any of those variables is at a lower resolution (i.e., for x( q ) with q > 0), their HR-predictor-signal needs to be estimated before applying the HRmodel. To address part of this issue, one possibility is to fix the last known LR-predictor-observation to infill the missing HR-predictorobservations until the next one is available (zeroth-order hold). However, this only replicates the LR-predictor-signal into the missing observations, while the desirable approach is to recover the HR-predictor-signal at each instant. For this reason, the application of the HR-model may still underperform in some scenarios. The above derivation was made for static systems. A similar deduction can also be made for dynamic relationships, leading to the following general model:
yˆt(r ) = y˜t(0) =
⇔
2r −1 i=0
L
yˆt(r ) =
ut(r ) =
j=0
(0 ) wi y˜t−i
(0 ) b j xt− j
L j=0
2r −1 i=0
wi = ( 1 − λ ) , i
(r )
ut(r ) =
tion models the multiresolution structure between the HR- and LRresponse, while the bottom equation describes the dependence of the HR-response upon the HR-predictors through a dynamic relationship that involves several time-shifted versions of them (the, (0 ) xt− , j = 0,…,L). Likewise, Eq. (6) is the LR representation of MRj PLS (i.e., the LR-model), where the HR-predictors are changed to the resolution of the LR-response (bottom equation) before applying the prediction model (top equation). To fit the combined HR- and LR-models, we suggest running a grid search optimization step to select the best weights following a minimum mean squared error criterion (Rato and Reis, 2017). For each set of weights, the regression coefficients are obtained by PLS (Helland, 1988; Höskuldsson, 1988; Martens and Naes, 1989; Wold et al., 2001; Geladi and Kowalski, 1986; Jackson, 1991). Even though the weights can be determined by optimization, for standard multiresolution data sets they will have an equal value in order to simulate an averaging procedure. Nevertheless, the approach followed in MR-PLS is flexible and contemplates other weighting schemes. For instance, it can be assumed that the variables importance decreases in time, according to an exponential decay (as in an EWMA filter), i.e.,
i = 0, . . . , 2r − 1
with 0 ≤ λ < 1 as the forgetting factor. With this class of multiresolution models (more information in Ref. (Rato and Reis, 2017)) it is possible to develop optimal estimation schemes, namely filtering approaches, which is the main goal of this work. 3. Benchmark method: single-resolution Kalman filter (SR-KF) The Kalman filter is based on a state-space representation of the system, such as,
θt = Aθt− + Bxt + ht yt = Cθt + vt
b j ut(r ) (0 ) wi xt−i
(5)
(6)
In Eqs. (5) and (6), y˜t(0 ) is the unknown HR-response, wi are weights relating the high and low resolution signals of the response, bj are regression coefficients and L is the number of past observations with a dynamic effect on y˜t(0 ) . Eq. (5) corresponds to the HR version of MR-PLS (i.e., the HR-model), where the top equa-
(8)
where θ t is a (d × 1) vector of states, xt is a (m × 1) vector of inputs and yt is a (a × 1) vector of observations. A (d × d) and B (d × m) represent the system dynamic matrices and C (a × d) is the observation matrix. ht and vt are additive white noise vectors on the states and observations, satisfying:
E (ht ) = E (vt ) = 0
E vt vtT = R T
E ht htT = Q
(9)
E vt ht = 0 Under this prior model specification, the standard SR-KF (see Fig. 4) starts with a prediction update that estimates the value of the states and their associate covariance matrix:
θˆt |t − = Aθˆt −|t − + Bxt Pt |t − = APt −|t − AT + Q
⇔
(7)
(10)
The next stage involves the update of the current estimate of the states (θˆt |t − , the prediction for time t based on the information available at the previous time instant t –) using the current observations (yt ) by considering the discrepancy between the observed and predicted values (yt − Cθˆt |t − ) and the associated covariance matrices. This measurement update stage is carried out through the following relationships,
θˆt |t = θˆt |t − + Kt yt − Cθˆt |t − −1 T T Kt = Pt |t − C CPt |t − C + R Pt |t = (Id − Kt C )Pt |t −
(11)
T.J. Rato and M.S. Reis / Computers and Chemical Engineering 130 (2019) 106564
5
Fig. 4. Operations of the standard single-resolution Kalman filter. M: measurement update; P: prediction update; E: corrected estimates. To facilitate the analysis of the figure, the sequential numbering 1, 2, etc., represents the order by which each operation is made.
where Kt is the (d × a) innovation gain matrix and Pt |t − is the (d × d) covariance matrix of the estimation errors (see Eq. (10)). After this processing, the filtered estimate for the observations are obtained as,
y˜ t = Cθˆt |t
(12)
These steps are then repeated for each new time instant. To initialize the algorithm, it is required to have initial estimates of the states (θˆ ) and of the covariance matrix of their estimation er0|0
ror (P0|0 ). Similarly, the covariance of the noise vectors (R and Q) should also be properly estimated or set. Another aspect that must be considered during the implementation of the SR-KF is the type of model used to estimate the states. In typical applications, this corresponds to a singleresolution model (SR-model) built under the assumption that all variables are available at the same resolution, even when sampled at different rates. In this context, the SR-model is typically fit after downsampling the predictor variables to the time instants where the LR-response is available. At most, the otherwise discarded inter-samples are included in the SR-model as time-shifted variables. However, this approach only accounts for dynamic dependencies and does not properly handle multiresolution structures (Rato and Reis, 2017). Therefore, the SR-model (and by extension, the SR-KF) does not effectively integrate any information about the multiresolution structure. Another disadvantage of the SR-KF is that it can only be applied when a LR-response-observation becomes available at the end of its time support (i.e., for t = k2r ). For this reason, the SR-KF only improves the estimates for the LRresponse. The SR-KF also implicitly assumes that both the HR- and LR-response have the same physical meaning. The best estimates for the HR-response that can be attained with the SR-KF correspond to the LR-estimates. The same fundamental problem is encountered in the multirate Kalman filter. 4. Proposed method: optimal fusion of multiresolution data streams The proposed framework for multiresolution data fusion is composed by three main components: (i) A bank of standard Kalman filters that fuse data available at a given resolution level; (ii) A high-to-low sweep that transfers information from higher to lower resolution levels; (iii) A low-to-high smoothing step that propagates the improved estimates from lower to higher resolution levels. Stages (ii) and (iii) constitute the fundamental components for addressing the multiresolution structure of data.
Without loss of generality, it is considered that the response variable is to be analyzed in two resolution levels (defined as HR and LR). This is the scenario most often encountered in practice. For instance, a set of HR-response-observations obtained from a fast, but uncertain analytic technique, while LR-responseobservations come from laboratory analysis conducted over composite samples. Furthermore, it is also considered that, in some scenarios, the HR-response may not be even available. In such case, the goal of MR-KF is not only to filter the LR-response (which is available), but also to reconstruct the signal of the underlying HRresponse. Regarding the predictor variables, it is assumed that they can be available at any resolution level. This implies that observations for predictor variables at lower resolutions (i.e., q > 0) may not be available at all time instants. Recovering the high resolution signal of the predictors goes beyond the scope of this article. As an approximation, these cases were treated by filling in the missing observations by holding the last observation constant (zero-order hold). In summary, the following process measurements are treated as inputs to the MR-KF: • Observations for the predictor variables, x( q ) , at multiple resolutions, where 0 ≤ q ≤ r; • Observations for the response variable at low resolution, y( r ) ; • Observations for the response variable at high resolution, y(0) . Observations for the response variable at high resolution, y(0) , may not be available. In this case, for the sake of implementing the MR-KF, its values can be estimated using a high resolution inferential model (HR-model; see below). In addition to the aforementioned data, the proposed framework also makes use of the predictions provided by the MR-PLS methodology introduced in Section 2. This model is trained on the predictor variables (x( q ) ) and LR-response-variables (y( r ) ) and returns: • LR-model (b, w), see Eq. (6), where a prediction-LR-estimate of the response is obtained; • HR-model (b), see Eq. (5), where a prediction-HR-estimate of the response is obtained; • Weights (w) relating the HR- and LR-response, see Eq. (5), where an inter-resolution-LR-estimate of the response is obtained. To achieve multiresolution integration, the proposed MR-KF is applied by imbedding the HR- and LR-models in an equivalent state-space representation defined by, (0 ) θt(0) = xtT bPLS + ht(0)
yt(0) = θt(0) + vt(0)
(13)
6
T.J. Rato and M.S. Reis / Computers and Chemical Engineering 130 (2019) 106564
(r ) + hk(r2)r θk(2r )r = xTk2r bPLS
(14)
yk(r2)r = θk(2r )r + vk(r2)r
As stated before, the multiresolution structure induces an additional relationship between the HR- and LR-response. This relationship is here modeled by,
)T θk(2r )r = W(r ) θ k(02)r + sk(r2)r , θ k(02)r = θ((k0−1 )2r +1
where xtT is a (m × 1) vector of predictor variables and xk2r = xT(k−1 )2r +2 · · · xTk2r ]T is a ((m2r ) × 1) extended [xT(k−1 )2r +1 vector that concatenates all predictor variables within the LR-time (0 ) support. Likewise, bPLS is a (m × 1) vector of coefficients representing the HR-model (its elements are the b coefficients in Eq. (5) and (r ) bPLS is a ((m2r ) × 1) vector of coefficients representing the LRmodel (its elements are the respective product of the w and b coefficients in Eq. (6)). In this framework, the states θt(0 ) and θk(2r )r correspond to true values of the HR- and LR-response.
1.1. Prediction update (0 ) θˆt(|0t −) = xtT bPLS
Pt(|0t)− = Q(0)
1.2. Measurements update
θˆt(|0t ) = θˆt(|0t −) + Kt(0) yt(0) − θˆt(|0t −) −1 Kt(0) = Pt(|0t)− Pt(|0t)− + R(0)
Pt(|0t) = I − Kt(0) Pt(|0t)−
2. Low resolution Kalman Filter 2.1. High-to-low estimate (0 )
⎡
Pk(02)r |k2r
) θˆ((k0−1 )2r +1|(k−1 )2r +1
) θˆ((k0−1 )2r +2|(k−1 )2r +2
···
θˆk(20r)|k2r
T
) P((0k−1 )2r +1|(k−1 )2r +1
0
0
···
0
0
) P((0k−1 )2r +2|(k−1 )2r +2
0
···
0
0
0
) P((0k−1 )2r +3|(k−1 )2r +3
. . . 0
. . . 0
⎢ ⎢ ⎢ ⎢ =⎢ ⎢ ⎢ ⎢ ⎣
(0 )
···
) Pk(r2)(r |res = W(r ) Pk(02)r |k2r W(r ) + S(r ) k2r − T
2.2. Prediction update pred ) (r ) T θˆk(2r )( r |k2r − = xk2r bPLS
) = Q (r ) Pk(r2)(r |kpred 2r −
θˆk(2r )r |(kf2usr −) = Pk(r2)r |k2r − Pk(r2)r(|kf 2usr −) =
) Pk(r2)(r |kpred 2r −
) Pk(r2)(r |kpred 2r −
−1
−1
−1 pred ) ) res ) ˆ (r )( θˆk(2r )( Pk(r2)(r |res θ r |k2r − + r r r k2 − k2 |k2 − −1 −1
) + Pk(r2)(r |res k2r −
2.4. Measurements update
θˆk(2r )r |k2r = θˆk(2r )r |(kf2usr −) + Kk(r2)r yk(r2)r − θˆk(2r )r |(kf2usr −) −1 Kk(r2)r = Pk(r2)r(|kf 2usr −) Pk(r2)r(|kf 2usr −) + R(r )
Pk(r2)r |k2r = I1 − Kk(r2)r Pk(r2)r(|kf 2usr −) 3. Downward smoothing θ˜ (r )r r = θˆ (r )r r
k2 |k2 k2 |k2 (0 ) (0 ) ) θ˜k2r |k2r = θˆk2r |k2r + Lk(02)r θ˜k(2r )r |k2r − θˆk(2r )r |(kres 2r − −1
) Lk(02)r = Pk(02)r |k2r W(r )T Pk(r2)r(|res k2r −
4. Corrected estimates (0 )
(0 ) y˜ k2r |k2r = θ˜k2r |k2r , y˜k(r2)r |k2r = θ˜k(2r )r |k2r
. . . ..
res ) (r ) θˆ θˆk(2r )( r |k2r − = W k2r |k2r
2.3. Bayesian fusion of LR-estimates
θk(20r)T
T
where W(r) is a (1 × m) vector of weighting constants (w) and sk(r2)r represents the error when moving from high-to-low resolutions. This error is assumed to be white noise with zero mean and covariance S( r ) . Also, note that the HR-states refer to a single time instant, t, while the LR-states have longer time supports covering time windows with 2r observations, and thus the need for concatenating the HR-states. Furthermore, the LR-response-observations
1. High resolution Kalman Filter For t from (k – 1) 2r + 1 to k2r ,
θˆk2r |k2r =
···
(15)
Table 1 Pseudo-code for MR-KF.
)T θ((k0−1 )2r +2
.
0
0 Pk(02)r |k2r
⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦
T.J. Rato and M.S. Reis / Computers and Chemical Engineering 130 (2019) 106564
7
Fig. 5. Overall scheme of the proposed multiresolution Kalman filter: (a) high resolution KF; (b) Bayesian fusion at low resolution KF; (c) downward smoothing of high resolution estimates.
are only available at the end of such time window, t = k2r (k represents the window index). With this model structure available, the MR-KF is applied according to the procedure described in the following paragraphs. The pseudo-code for this approach is presented in Table 1 and is graphically represented in Fig. 5. Furthermore, the naming scheme in Fig. 6 is used to uniquely identify the response at each step of the MR-KF, as it is successively processed. The first step of MR-KF involves the filtering of each HRresponse-observation (yt(0 ) ; or their estimates, in case they are not collected) with a HR-Kalman filter (Stage 1 of Table 1; Fig. 5(a)). This operation corresponds to a standard KF using the HR-model in the prediction update and returns a filtered-HR-estimate (θˆt(|0t ) ) at each time instant t. Furthermore, since the HR-response is related to the LR-response (recall that
the LR-response is an aggregated version of the HR-response), the filtered-HR-estimates within the time support of the LRresponse (i.e., for t ∈ ωk ) are temporarily retained for further processing. When a LR-response-observation (yk(r2)r ) is obtained at the end of window k (i.e., for t = k2r ), the MR-KF proceeds to a series of data fusion operations at the LR-level (Fig. 5(b)). Firstly, the MR-KF uses the filtered-HR-states within the LR-time support (i.e., θˆt(|0t ) for t ∈ ωk ) to make an inter-resolution-LR-estimate res ) of the LR-state (θˆk(2r )( r |k2r − ) by application of Eq. (15) (Step 2.1
of Table 1). At this point the inter-resolution-LR-error covarires ) ance matrix (Pk(r2)( r |k2r − ) is also updated based on the HR-errors,
which are assumed to be independent in time (the case where they are time dependent is described in Appendix A). There-
8
T.J. Rato and M.S. Reis / Computers and Chemical Engineering 130 (2019) 106564
Fig. 6. Relation between the HR- and LR-response variables at each step of the multiresolution Kalman filter.
fore, the covariance of the concatenated filtered-HR-errors is given by,
⎡
) P((0k−1 )2r +1|(k−1 )2r +1
Pk(02)r |k2r
⎢ ⎢ ⎢ ⎢ =⎢ ⎢ ⎢ ⎣
0
0
0
) P((0k−1 )2r +2|(k−1 )2r +2
···
0
···
0 .. . 0
0 .. . 0
0 .. .
) P((0k−1 )2r +3|(k−1 )2r +3
.. ···
T
) Pk(r2)r(|res = W(r ) Pk(02)r |k2r W(r ) + S(r ) k2r −
(17)
ance matrix of this estimate is the prediction-LR-error covariance, (r )( pred ) res ) ˆ (r )( pred ) Pk2r |k2r − . Since two LR-states are available (θˆk(2r )( r |k2r − and θk2r |k2r − ),
they are merged by a Bayesian fusion module, which generates a (r )( f us ) fused-LR-estimate (θˆk2r |k2r − ). This is done by weighting both esti(r )( pred )
mates by their associated error covariance matrices (Pk2r |k2r − and
res ) Pk(r2)( r |k2r − ; Step 2.2. of Table 1; Fig. 5(b)), following the approach of
Pk(r2)r(|kf 2usr −) =
) Pk(r2)(r |kpred 2r −
) Pk(r2)(r |kpred 2r −
−1
−1
−1 pred ) ) res ) θˆk(2r )( Pk(r2)(r |res θˆk(2r )( r | k2r − + r | k2r − k2r − −1 −1
.
0
Secondly, the LR-model (Eq. (14)) is applied to obtain a prediction update at LR-level, retuning a prediction-LR-estimate for (r )( pred ) the LR-state, θˆk2r |k2r − , (Step 2.4 of Table 1; Fig. 5(b)). The covari-
θˆk(2r )r |(kf2usr −) = Pk(r2)r |k2r −
Table 1; Fig. 5(b)).
0
and the updated inter-resolution-LR-error covariance is computed by; see Eq. (15),
Chou et al. (1994),
filtered-LR-estimate of the response (yˆk(r2)r |k2r ; Steps 2.3. and 2.4. of
) + Pk(r2)(r |res k2r −
(18) Afterwards, this fused-LR-estimate is sent to a LR-Kalman filter where it is merged with the LR-response-observation, leading to a
0 Pk(02)r |k2r
⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦
(16)
The final (optional) step of the proposed MR-KF consists of a downward smoothing stage (Stage 3 of Table 1; Fig. 5(c)) that smooths the filtered-HR-states based on the filtered-LR-states by use of the Rauch-Tung-Striebel (RTS) algorithm (Rauch et al., 1965). In this case, the initial condition for the RTS smoother are the LRstates computed from all available information (θ˜k(2r )r |k2r = θˆk(2r )r |k2r ). This stage is implemented by application of the following relationships,
(0 ) (0 ) ) θ˜k2r |k2r = θˆk2r |k2r + Lk(02)r θ˜k(2r )r |k2r − θˆk(2r )r |(kres 2r − −1 ) Lk(02)r = Pk(02)r |k2r W(r )T Pk(r2)r(|res k2r −
(19)
where the smoothing gain matrix (Lk(02)r ) is determined from the ex˜ k(02)r |k2r ), such that, tended smoothed error covariance matrix (P
˜ k(02)r |k2r = P(0)r r + L(0)r (P(r )r r − P(r )r(resr) )L(0)r T P k2 |k2 k2 k2 |k2 k2 |k2 − k2
(20)
Following these operations, the outputs of MR-KF are the filtered-HR-estimate at each time instant t, along with the smoothed-HR-estimates and filtered-LR-estimate at each time window k (Stage 4 of Table 1).
T.J. Rato and M.S. Reis / Computers and Chemical Engineering 130 (2019) 106564
5. Case study 1: exploratory analysis
To introduce dynamics in the process, the latent variables are computed based on a first order autoregressive model:
The design of the MR-KF is not trivial, and this is a consequence of the need to handle, in a consistent way, the different resolutions available in data. However, it is important that such data characteristics be treated in a rigorous and coherent way, as demonstrated in the example presented in this section. To demonstrate the real added-value of the proposed multiresolution Kalman filter, three simulated scenarios addressing different scopes were considered. The resort to a simulation-based approach allows direct control of the process behavior and the mechanisms for generating multiresolution data, potentiating a more transparent assessment of the results (given that the underlying noise free signal is known) as well as to conduct meaningful and rigorous comparisons. In the first scenario (Section 5.1), the HR-response-observations are not directly available, being their values estimated by a soft sensor. Furthermore, the predictor variables are all available at the highest resolution. On the other hand, in the second scenario (Section 5.2) the case where the predictor variables are available at different resolutions is considered. Finally, in the third scenario (Section 5.3) we illustrate the case where noisy HR-responseobservations are indeed available, simulating the case where a less reliable but faster measurement source can be used to infer the response (e.g., a spectrometer sensor implemented online, instead of a chromatographic one). The performance comparison is based on the mean squared error (MSE, Eq. (21)) using the noise free signal as reference, obtained for 100 independent realizations of the simulated processes. In all cases, 10 0 0 observations were used to train the SR-, HR- and LR-models and the respective SR- and MR-KFs, being the error matrices (Q, R and S) preliminarily tuned for optimizing the filter performance. Afterwards, another 10 0 0 observations were generated for testing the different data fusion approaches. In each replicate, the MSE is computed for both HR-estimates and LR-estimates.
MSE =
n 2 1 yi − yˆi n
9
(21)
i=1
5.1. Scenario 1: all predictor variables available at the same high resolution
ti, j = ϕ j ti−1, j + εi, j
(24)
where, ti, j is the element of the i-th row and j-th column of T, ϕ j is the autoregressive coefficient for each latent variable and ɛi, j is a random white noise variable with zero mean and variance (1 − ϕ 2j ) (leading to latent variables with unit variance); they represent the source of structural process variation. The autoregressive coefficients of the three latent variables were set to 0.9, 0.5 and 0.3, respectively. By application of the aforementioned steps, one obtains the “true” values of the process and response variables, without the effect of measurement noise. Therefore, on top of the noise free data, a second layer of measurement variation is added through the residual E and F matrices (additive noise). Furthermore, the data acquisition systems are also responsible for the appearance of multiresolution characteristics. In this case, the observed response variable Y is recorded less frequently and corresponds to averages taken over windows with a time support of 4 (r = 2). Therefore, Y is only available every 4th time instant and the observed value is an average of the HR-signals in these time windows. The predictor variables X may, or may not, undergo a similar data acquisition (averaging) treatment. In this scenario it is considered that all predictor variables are at the same (highest) resolution (q = 0). Thus, it constitutes the best-case scenario. The case where the predictors are at different resolutions is addressed in Section 5.2. To assess the Kalman filters’ performance and robustness, the data sets were subject to different error magnitudes by changing the standard deviation of F to 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, and 1.00, while the standard deviation of E was kept as 0.10. For all cases, the HR-MSE and LR-MSE were computed for 100 random latent variables systems. For the sake of reference, in Fig. 7 the MSE of the SR- and LRmodels (before applying the KFs) are represented. This is accompanied by the MSE of the LR-response-observations (which in this case simply corresponds to the amount of noise added to the LRresponse). Note that these quantities measure the quality of the inputs used by the studied KFs and consequently they also affect the KFs performance.
The present scenario makes use of a dynamic latent variable model structure in order to simulate the typical data structures originated in real processes operating under normal operation conditions (Burnham et al., 1999; Burnham et al., 2001; Kourti, 2005; Kourti and MacGregor, 1995; Kresta et al., 1991; MacGregor and Kourti, 1995). A similar simulation approach was followed, for instance, in Refs. Shi and MacGregor (20 0 0), Ferrer et al. (2008), Reis et al. (2015), Reis and Saraiva (2005). In this context, data is simulated according to the following latent variable model formulation,
X = TPT + E
(22)
Y = TBGT + F
(23)
where T is a (n × p) matrix of latent variables relating X and Y; P is a (m × p) X-loading matrix; G is a (a × p) Y-loading matrix; B is a (p × p) regression coefficient matrix; E (n × m) and F (n × a) are residual matrices. In these simulations, the process dimensions were set to nine process variables (m = 9), one quality variable (a = 1) and three latent variables (p = 3). The process parameters P and G are randomly generated matrices with orthogonal columns, while the elements of B are randomly generated using the standard normal distribution.
Fig. 7. Evolution of the median MSE of the models when the predictors are at the highest resolution and for increasing noise in the LR-response-observations. The error bars correspond to the interquartile range.
10
T.J. Rato and M.S. Reis / Computers and Chemical Engineering 130 (2019) 106564
Fig. 8. Evolution of the median MSE of the Kalman filters when the predictors are at the highest resolution and for increasing noise in the LR-response-observations. The error bars correspond to the interquartile range. (a) MSE on the LR-response; (b) MSE on the HR-response.
From Fig. 8(a) it is verified that the filtered-LR-estimates returned by the KFs degrade with the increase of noise in the LRresponse-observations. This is an expected result since the noise level ultimately determines the quality of the models’ estimates as well as the uncertainty of the LR-response-observations (see Fig. 7). Furthermore, it is clearly visible that the better results of the MR-KF are linked to the LR-model. In turn, the superior performance of the LR-model is related to its ability to model the multiresolution dependencies and thus be more parsimonious with the actual structure of the data. As for the HR-response, the best the standard SR-KF can do is to assume that it is equal to the filtered-LR-estimates. However, since HR-signals relate to instant measurements and LR-signals relate to averages over non-overlapping time windows (more specifically, non-overlapping averages of the HR-signal), this assumption leads to poor estimation performances at the HR-level. On the other hand, the proposed MR-KF acknowledges the difference between HR- and LR-signals and makes use of their relationship to improve them both. To achieve this, the MR-KF uses the HR-model (and HR-response-observations, if available) to make an initial estimate of the underlying HR-response (i.e., the filtered-HR-estimates). This information is then used to obtain an inter-resolution-LR-estimate, which is subsequently fused with the prediction-LR-estimate and the actual LR-response-observation. This leads to an improved filtered-LR-estimate that can be used to smooth the filtered-HRestimates. As a result, the usually unknown HR-responses can be reconstructed with greater accuracy as represented in Fig. 8(b). 5.2. Scenario 2: predictor variables available with different resolutions To extend the results to a full multiresolution scenario, the predictor variables were set to different resolutions by divided them into three resolution levels: (i) variables 1–3 are set to have the highest resolution (q = 0); (ii) variables 4–6 are computed as averages with time support 2 (q = 1); (iii) variables 7–9 are computed as averages with time support 4 (q = 2). Note that since P and G are randomly generated, the order of the variables has no consequence. The presence of predictors with different resolutions raises an additional challenge on the implementation of the HR-models since they require the HR-signal of all predictors. Therefore, to infill the missing observations of LR-predictors, their last known value
Fig. 9. Evolution of the median MSE of the models when the predictors are at different resolutions and for increasing noise in the LR-response-observations. The error bars correspond to the interquartile range.
was kept constant under the assumption that the process does not change very rapidly. Note however that this constitutes a crude approximation since LR-predictor-observations were used to infill missing HR-predictor-observations. For this scenario, the MSE for increasing noise in the LRresponse-observations are presented in Figs. 9 and 10. Again, Fig. 9 represents the quality of the models used by the KFs, while Fig. 10 addresses the performance of the KFs. The first observation from these results is that the SR-models have a better performance than in the previous scenario. This happens because in the current scenario some of the predictors are at the same resolution as the response and thus closer to its structure. However, by using the LR-models, better estimates are obtained. Because the uncertainty in the LR-response-observations is always greater than the uncertainty of the models (see Fig. 9), the KFs will tend to give little weight to the LR-response-observations. Therefore, the KFs’ performances are mostly driven by the models they use. For this reason, the best approach at the LR-level is the MR-KF (see Fig. 10(a)). Furthermore, at the HR-level, the MRKF tends to have lower HR-MSE than the SR-KF (see Fig. 10(b)).
T.J. Rato and M.S. Reis / Computers and Chemical Engineering 130 (2019) 106564
11
Fig. 10. Evolution of the median MSE when the predictors are at different resolutions and for increasing noise in the LR-response-observations. The error bars correspond to the interquartile range. (a) MSE on the LR-response; (b) MSE on the HR-response.
Fig. 11. Evolution of the median MSE when noisy HR-response-observations are available and for increasing noise in the LR-response-observations. The error bars correspond to the interquartile range. (a) MSE on the LR-response; (b) MSE on the HR-response.
This happens because MR-KF is capable of reconstructing the HR-response, while with SR-KF it can only be assumed that the HR-response equals the LR-response. 5.3. Scenario 3: fusion of multiple response signals While the MR-KF can be applied without HR-responseobservations, its application scope also covers the scenario where the response variable is observed at both HR- and LR-levels (the case where the response is available at more resolution levels is discussed in Section 7). To exemplify this situation, we revisit the latent variables process of Section 5.1 (with all predictors at the same resolution), but now considering that noisy HR-responseobservations are also available. This happens in practice when a faster sensing technology is used on-line (e.g. a spectroscopic device) together with the more accurate and labor intensive solution available in the quality laboratory (e.g., a chromatographic column). In this scenario, simulations are carried out as follows. The noise free HR-response-observations are firstly generated by application of Eq. (23). Afterwards, the noise free LR-responseobservations are built by averaging the HR-response-observations
over a time support of 4 time instants. Finally, white noise is added to both signals. For the HR-response-observations, the standard deviation of the noise was set to 0.30, while for the LRresponse-observations, the standard deviation of the noise was set to 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, and 1.00. In all cases (regardless of the noise magnitude) the HR- and LR-models are trained solely on the LR-observations, following the MR-PLS approach of Section 2. This is done under the assumption that LR-response-observations are more reliable and thus more likely to provide better models. Otherwise, a different modeling approach can be used to fit a HR-model. Having this data structure, the MR-KF is applied to fuse the noisy HR-response-observations with the prediction-HR-estimates obtained with the HR-model. Afterwards, the filtered-HR-estimates are converted into inter-resolution-LR-estimates and fused with the prediction-LR-estimates and LR-response-observations in order to produce a set of filtered-LR-estimates. Finally, the filteredHR-estimates are smoothed using the filtered-LR-estimates. The smoothed-HR-estimates and filtered-LR-estimates were then compared against the respective noise free signals. The resulting MSEs are summarized in Fig. 11 through their median and interquartile range. For comparison purposes, the MSEs obtained with the HR-
12
T.J. Rato and M.S. Reis / Computers and Chemical Engineering 130 (2019) 106564
and LR-models (without applying the MR-KF) and the HR- and LRobservations inputted to the MR-KF are also represented in Fig. 11. These results show that even though the HR-models were trained using only the LR-response-observations, they still have lower errors than the actual HR-response-observations (see Fig. 11(b)). Furthermore, it is verified that the MR-KF can give better estimates of the HR-response due to the natural filtering capabilities of the KF. In turn, the better estimates at the HR-level also lead to improvements in the filtered-LR-estimates (see Fig. 11(a)). Similar performances were observed for different noise levels in the HR-observations. Based on these results, it is verified that the MR-KF can make an efficient use of multiple sources of information.
Table 3 Paired t-test to the MSE of the modeling and filtering methodologies at low resolution. p-value and test statistic are presented for each pair of A – B.
6. Case study 2: simulation of an industrial process
The values in bold represent cases where A is statistically better than B.
The second case study concerns the use of the multiresolution Kalman filter in a realistic scenario based on simulations of a distillation column described by Skogestad and Postlethwaite (2005) (available at http://www.nt.ntnu.no/users/skoge/distillation/ nonideal_skouras/ternary; accessed on August 29, 2016). In these simulations, the distillation column separates a ternary mixture of methanol, ethanol and 1-propanol. The column is composed by 41 stages, including a reboiler (stage 1) and a total condenser (stage 41). The feed is located at stage 21. The inputs of the system are the reflux (L) and boilup (V) flow rates, with disturbances in the feed flow (F) and in the feed composition (xA , xB and xC ). These disturbances are generated according to the autoregressive models in Table 2. Finally, the distillate flow (D) and bottom flow (B) are used to control the condenser holdup (MD ) and reboiler holdup (MB ), respectively. After inputting the disturbances, the simulator returns the noise free compositions and temperatures in all trays and for all time instants (i.e., all variables are originally available at high resolution). In the interest of this study, the molar fraction of the lighter component at the top of the column is treated as the response variable. Furthermore, it is considered to be measured through a laboratory analysis on a composite sample taken by an auto-sampler over a time period of approximately one hour (i.e., the auto-sampler collects samples over one hour, mixes them and injects the resulting composite sample in the analytical device, such as chromatographic column or a spectroscopic sensor). In other words, the observed distillate composition is a LR-variable with a time support of approximately 1 h (this is simulated by averaging the respective distillate noise free HR-signal). In Fig. 12(a) an example of the generated signals for the HR- and LR-response is provided. Please note the difference between the two signals due to the aggregation taking place in the LR-response. As for the temperature readings, they are all kept at their HR-level. All observations are subject to a signal-to-noise ratio (SNR) of 20 dB, being the SNR defined as: SNR = 10 log(var(signal )/var(noise ) ). To assess the performance of the methodologies under study, 100 replicates of training and test data sets were generated. Each data set was composed by 46,080 observations, corresponding to
Table 2 Configuration of the disturbances over time. ω(t;ϕ ) represents an autoregressive time series with zero mean, unit variance and autoregressive coefficient ϕ . Disturbance
Equation
F
1 + 0.033 × ω (t; 0.9)
xA
xA =
xB
xB =
xC
xC =
zA zA +zB +zC zB zA +zB +zC zC zA +zB +zC
,
[kmol/min]
zA (t ) = 0.4 + 0.033 × ω (t; 0.95 )
,
zB (t ) = 0.4 + 0.033 × ω (t; 0.95)
,
zC (t ) = 0.2 + 0.033 × ω (t; 0.95)
B A LR-Obs. LR-Model SR-Model SR-KF MR-KF
LR-Obs.
LR-Model
SR-Model
SR-KF
MR-KF
– – <0.001 −90.425 <0.001 −60.942 <0.001 −65.841 <0.001 −90.747
<0.001 90.425 – – <0.001 28.168 <0.001 26.690 0.665 0.435
<0.001 60.942 <0.001 −28.168 – – <0.001 −3.656 <0.001 −28.819
<0.001 65.841 <0.001 −26.690 <0.001 3.656 – – <0.001 −27.763
<0.001 90.747 0.665 −0.435 <0.001 28.819 <0.001 27.763 – –
12.8 h of operation. In spite of this large number of observations, since the LR-composition is measured every 64 min, only 720 observations of the LR-composition are available. All models were fit and optimized based on the training data using the LR-composition as response and the HR-temperatures as predictors. Following this operation, SR-, LR- and HR-models were obtained. Afterwards, the respective KF were applied to fuse multiple sources of information. Finally, the accuracy of each methodology was evaluated on the test data sets through the MSE (see Eq. (21)) using the noise free signals as reference. The SR-models were fit by downsampling the data to the time instants where the LR-composition was observed. To avoid losing information on the predictor variables, the inter-sample HRtemperatures were included in the SR-models as time shifted variables. Furthermore, the optimal number of time shifts (or lags) were selected by minimizing the cross-validation RMSE. For this scenario, it was observed that within the 100 replicates, the lowest number of lags was set to 59. This result is in-line with the underlying structure of the LR-composition since it relates to a composite sample taken over 64 time instants. As the predictors are all available at the HR-level, the SR-model can be used for online prediction. However, the SR-model can only return LR-estimates of the distillate composition. That is, at each time instant, the SR-model estimates the average distillate composition within the last 64 min. Although these estimates were close the true average composition, they were considerably different from the instantaneous HR-composition (see Fig. 12(b)). Following these models, SR-KFs were built to integrate the SR-model predictions with the LR-composition observations. From Fig. 13(a) and Table 3 it is verified that the SR-KFs were able to fuse both sources of information and improve the LR-estimates. However, since this approach is unable to reconstruct the HR-response, the SR-KFs had poor estimation capabilities at the HR-level. As mentioned before, this happens because with SR-KF one can only assume that the HR-signal is equal to the LR-signal. Since the LRcomposition covers a considerably long time support, such estimates are clearly inadequate (see Figs. 12(b) and 14). Conversely, when the HR- and LR-models were applied, accurate estimates for the distillate composition at both LR- and HR-levels were obtained (see Fig. 12(c)). This situation results from the ability of MR-PLS to explicitly account for the multiresolution structure of the data and thus encode that while the response is observed at LR, there is a HR-signal behind it. Although the undelaying aggregation rule was known to be an average operator, the MR-PLS methodology was allowed to select the weights of the aggregation rule by tuning the forgetting factor defined in Eq. (7). In 90% of the cases, the forgetting factor was lower than 1.07 × 10−3 . Thus, even in this extreme case, the HR-temperatures observations within the time support of the LR-
T.J. Rato and M.S. Reis / Computers and Chemical Engineering 130 (2019) 106564
13
Fig. 12. Comparison between the noise free signals and the estimates made by each methodology: (a) noise free signals at the HR- and LR-levels; (b) estimates of SR-model and SR-KF; (c) estimates of HR-, LR-models and MR-KF.
composition received an approximately equal weight (for this case, the weights are between 0.93 and 1), which is in accordance with the true nature of data. The results obtained for the LR-model show that it already has considerable smoothing proprieties since the prediction-LRestimates are always better than the LR-response-observations. For this reason, the MR-KF gave a higher importance to the predictionLR-estimates, which subsequently causes the two approaches to have a similar performance at the LR-level (see Table 3). However, since the MR-KF can smooth the filtered-HR-estimates based on the filtered-LR-estimates, the MR-KF was able to surpass all other methodologies at the HR-level, including the HR-model (see Table 4). In this regard it is noted that while the smoothed version of the HR-response can only be retrieved after a LR-responseobservation become available, the HR-model already gives acceptable estimates. In fact, in Fig. 12(c) the prediction-HR-estimates
Table 4 Paired t-test to the MSE of the modeling and filtering methodologies at high resolution. p-value and test statistic are presented for each pair of A – B. B A HR-Model SR-KF MR-KF
HR-Model
SR-KF
MR-KF
– – <0.001 294.960 0.020 −2.368
<0.001 −294.960 – – <0.001 −295.303
0.020 2.368 <0.001 295.303 – –
The values in bold represent cases where A is statistically better than B.
and the smoothed-HR-estimates are almost identical. Furthermore, it is noticeable that the estimates provided by the HR-model are
14
T.J. Rato and M.S. Reis / Computers and Chemical Engineering 130 (2019) 106564
Fig. 13. Boxplot comparing the MSE at LR-level of the inputs and outputs of each KF: (a) SR-KF; (b) MR-KF. For reference, the maximum MSE (excluding outliers) obtained by the MR-KF is represented in an horizontal line.
Fig. 14. Boxplot comparing the MSE at HR-level of each KF. For reference, the maximum MSE (excluding outliers) obtained by the MR-KF is represented in an horizontal line.
considerably closer to the true signal than any of the other models. This happens because the SR- and LR-models estimate the average composition within the last 64 min, while the HR-model estimates the instantaneous composition. Therefore, the perditionHR-estimates allow for a better control of the processes (either for automatic control or statistical process control). Likewise, the smoothed-HR-estimates can improve subsequent process optimization tasks since they can now relay on a local measure of the process state rather an overall metric. 7. Discussion Systems for data acquisition, storage and retrieval, are routinely used in industrial processes, constituting a rich source of information about their past and present operating conditions. However, even in the same process unit, not all variables are collected at the same resolution. To accommodate for the presence of observations at different resolutions, a MR-PLS scheme was proposed by Rato and Reis (2017) that is able to model multiresolution data sets
(see Section 2). This modeling approach sets the base for higher level data analysis tasks, such as the optimal fusion and estimation of process data, leading to the development of the multiresolution Kalman filter presented in this work. The standard Kalman filter aims at providing an estimate of the current state variables’ by optimally combining model’s predictions and observed values under a Bayesian framework. However, this standard formulation was designed for single-resolution data structures. Furthermore, if the response variable is observed at two (or more) resolution levels, then a SR-KF would be needed for each resolution level under analysis. As these filters typically do not interact, there is no integration of information across the resolution levels. Alternatively, an extended KF can be built by concatenating the HR- and LR-response into an artificial vector of observations. While this approach allows for a simultaneous estimation of both signals, it can still only be applied after all observations become available. Multirate implementations of the Kalman filter could also be considered as possible solutions for handling multiresolution data, if properly modified for incorporating the aggregation mechanism. This can be achieved by defining new artificial variables resulting from the application of Finite Impulse Filters (FIR) over moving windows. The state vector would be expanded and given by the concatenation of high- and low-resolution variables collected at different rates, where the acquisition rate of each variable is directly related to its resolution. However, unless the observability conditions for such expanded state vector are met, this approach implies a delay on the filter’s estimates as the multirate KF can only be applied once the low-resolution variables are acquired. This is a restriction that the proposed MR-KF approach does not present: the multiresolution KF can be applied online, to fuse the information that is available at each time instant. The MR-KF also maintains full consistency in the way each quantity is estimated by fusing measurements at different resolutions. This allows taking full advantage of all the information sources available. For instance, in multirate approaches the artificial generation of variables at the LR-response-estimates obtained at slow rate level of the multirate KF cannot be fed back to the fast rate level of the multirate KF because the filter is expecting only HR-estimates. In summary, in contrast to current approaches, the proposed MR-KF effectively fuses predictor variables at different resolutions, models the high-to-low dependency of data and improves both low and high resolution estimates of the response variable.
T.J. Rato and M.S. Reis / Computers and Chemical Engineering 130 (2019) 106564
For these reasons, the proposed multiresolution methodology is necessarily more complex when compared to the current singleresolution approaches. However, the results presented in this work clearly show the advantages of adopting this more rigorous and consistent methodology. Another important advantage of the proposed MR-KF is the ability to reconstruct unknown HR-signals, estimating finer resolution information when it is not available. To exemplify this scenario, the MR-KF was used to recover the HR-composition of a key component of a simulated distillation column. The results confirm that the MR-KF is the best candidate for HR-signal reconstruction based only on LR-response-observations. However, accurate reconstructions of the HR-response can only be made if the predictor variables are all available at the HR-level. If this is not the case, the HR-signal of LR-predictors must be estimated prior to apply the HR-model. In this work, we opted to infill these missing HRobservations by holding constant the last known value of the LRpredictors. However, since HR- and LR-signals are averages over distinct time supports, this is a crude approximation. Alternatively, missing data imputation techniques (Walczak and Massart, 2001; Walczak and Massart, 2001; Nelson et al., 1996) can also be used. Finally, we note that the proposed methodology can be easily generalized to account for cases where the response is available at more than two resolutions. In summary, this can be achieved through a pyramidal scheme that starts by applying a high-tolow sweep over consecutive pairs of resolution levels (Fig. 4(b),
⎡
0) Pt(−2 r +1|t −2r +1
⎢ P (0) ⎢ t −2r +1|t −2r +1 ⎢ ⎢ 0) Pt(|0t) = ⎢ 2 Pt(−2 r +1|t −2r +1 ⎢ .. ⎢ ⎣ . r 0) 2 −1 Pt(−2 r +1|t −2r +1
0) Pt(−2 r +1|t −2r +1 0) Pt(−2 r +2|t −2r +2
0) 2 Pt(−2 r +1|t −2r +1 0) Pt(−2 r +2|t −2r +2
0) Pt(−2 r +2|t −2r +2
0) Pt(−2 r +3|t −2r +3
.. .
2r −2
(0 )
Pt −2r +2|t −2r +2
ição de Redes Temáticas) and co-financed by the European Union’s FEDER. Appendix A. Covariance matrix for the high resolution states As the LR-states are computed from the HR-states within the LR-time support, the covariance matrix of the HR-errors also needs to be computed over such time window. The resulting extended covariance matrix comprises the stationary HR-error dependency (matrix diagonal), as well as their cross-correlation (matrix offdiagonal). The stationary component at each time instant is obtained during the HR-measurement update step and thus easily available. On the other hand, the error’s cross-correlation depends on the process characteristics. Therefore, in order to estimate the error’s cross-correlation, it is assumed that the HR-errors follow a first order autoregressive model. Based on this assumption, each cross-covariance matrix can be computed using the autocovariance relationship given by Lütkepohl (2005):
(t ) = h (t − l )
replacing the high and low resolution inputs by the respective pair of resolution levels). This operation is then sequentially repeated until the lowest resolution level is attained. Afterwards, a series of low-to-high smoothing steps (Fig. 4(c)) are applied over consecutive resolution levels until the highest resolution level is attained. 8. Conclusions All available optimal estimation methodologies developed hitherto, such as Kalman filters and Moving Horizon Estimation approaches, are unable to handle multiresolution data in a consistent and rigorous way. To address their limitations, this article presents, for the first time, the development and application of a suitable Kalman filtering approach for multiresolution data fusion. The proposed methodology was tested, validated and compared against the current single-resolution Kalman filter in a series of simulated case studies covering distinct scenarios of available data. The results show that the proposed MR-KF is substantially better than the single-resolution Kalman filter, leading to estimation improvements in both HR- and LR-levels. Future work should focus on alternative ways for handling missing HR-predictor-observation in order to improve the online estimation of the HR-response. Acknowledgments The authors acknowledge financial support through project 016658 (references PTDC/QEQ-EPS/1323/2014, POCI-01–0145FEDER-016658) financed by Project 3599-PPCDT (Promover a Produção Científica e Desenvolvimento Tecnológico e a Constitu-
(25)
where t represents the time instant, l is the time lag and is the autoregressive coefficient matrix describing the error’s time series. This relationship is firstly applied to the oldest time instant and used to fill the elements of the same row. The columns are filled in the same manner. The same procedure is then applied to the second oldest time instant and so on until the off-diagonal is full. By doing so, the following extended HR-error matrix is obtained,
··· ···
.. ···
15
.
(0 )
Pt −1|t −1
⎤ r 0) 2 −1 Pt(−2 r +1|t −2r +1 r 0) ⎥ 2 −2 Pt(−2 r +2|t −2r +2 ⎥ ⎥ .. ⎥ . ⎥ ⎥ ⎥ (0 ) ⎦ Pt −1|t −1
(26)
Pt(|0t)
For processes with moderate dynamics (as the ones studied in Section 5) the error’s cross-correlation can be ignored without a major impact on performance. Therefore, and for exposure simplicity, it is assumed that = 0. References Bakshi, B.R., 1999. Multiscale analysis and modeling using wavelets. J. Chemom. 13, 415–434. Basseville, M., et al., 1992. Modeling and estimation of multiresolution stochastic processes. IEEE Trans. Inf. Theory 38 (2), 766–784. Burnham, A.J., MacGregor, J.F., Viveros, R., 1999. Latent variable multivariate regression modeling. Chemometr. Intell. Lab. Syst. 48, 167–180. Burnham, A.J., MacGregor, J.F., Viveros, R., 2001. Interpretation of regression coefficients under a latent variable regression model. J. Chemom. 15, 265–284. Chou, K., Willsky, A.S., Nikoukhah, R., 1994. Multiscale systems, Kalman filters, and Riccati equations. IEEE Trans. Automat. Contr. 39 (3), 479–492. Dongguang, L., et al., 2003. Application of dual-rate modeling to CCR octane quality inferential control. IEEE Trans. Control Syst. Technol. 11 (1), 43–51. Ferrer, A., et al., 2008. PLS: a versatile tool for industrial process improvement and optimization. Appl. Stoch. Models. Bus. Ind. 24 (6), 551–567. Geladi, P., Kowalski, B.R., 1986. Partial least-squares regression: a tutorial. Anal. Chim. Acta 185, 1–17. Haseltine, E.L., Rawlings, J.B., 2005. Critical evaluation of extended Kalman filtering and moving-horizon estimation. Ind. Eng. Chem. Res. 44 (8), 2451–2460. Helland, I.S., 1988. On the structure of partial least squares regression. Commun. Statist.-Simula. 17 (2), 581. Höskuldsson, A., 1988. PLS regression methods. J. Chemom. 2, 211–228. Izadi, I., Zhao, Q., Chen, T., 2005. An optimal scheme for fast rate fault detection based on multirate sampled data. J. Process Control 15 (3), 307–319. Jackson, J.E., 1991. A User’s Guide to Principal Components. John Wiley & Sons, Inc, New York. Kalman, R.E., 1960. A new approach to linear filtering and prediction problems. Trans. ASME – J. Basic Eng. (82 (Series D)) 35–45. Kourti, T., 2005. Application of latent variable methods to process control and multivariate statistical process control in industry. Int. J. Adapt. Control Signal Process 19, 213–246.
16
T.J. Rato and M.S. Reis / Computers and Chemical Engineering 130 (2019) 106564
Kourti, T., MacGregor, J.F., 1995. Process analysis, monitoring and diagnosis, using multivariate projection methods. Chemom. Intell. Lab. Syst. 28, 3–21. Kresta, J.V., MacGregor, J.F., Marlin, T.E., 1991. Multivariate statistical monitoring of process operating performance. Can. J. Chem. Eng. 69, 35–47. Li, D., Shah, S.L., Chen, T., 2001. Identification of fast-rate models from multirate data. Int. J. Control 74 (7), 680–689. Li, W., Shah, S.L., Xiao, D., 2008. Kalman filters in non-uniformly sampled multirate systems: for FDI and beyond. Automatica 44 (1), 199–208. Lin, B., et al., 2009. Data-Driven soft sensor design with multiple-rate sampled data: a comparative study. Ind. Eng. Chem. Res. 48 (11), 5379–5387. Lu, N., et al., 2004. Multirate dynamic inferential modeling for multivariable processes. Chem. Eng. Sci. 59 (4), 855–864. Lütkepohl, H., 2005. New Introduction to Multiple Time Series Analysis. Springer-Verlag, Berlin. MacGregor, J.F., Kourti, T., 1995. Statistical process control of multivariate processes. Control Eng. Pract. 3 (3), 403–414. Martens, H., Naes, T., 1989. Multivariate Calibration. Wiley, Chichester. Nelson, P.R.C., Taylor, P.A., MacGregor, J.F., 1996. Missing data methods in PCA and PLS: score calculations with incomplete observations. Chemom. Intell. Lab. Syst. 35, 45–65. Rato, T.J., Reis, M.S., 2017. Multiresolution soft sensors (MR-SS): a new class of model structures for handling multiresolution data. Ind. Eng. Chem. Res. 56 (13), 3640–3654. Rauch, H.E., Striebel, C.T., Tung, F., 1965. Maximum likelihood estimates of linear dynamic systems. AIAA J. 3 (8), 1445–1450. Reis, M.S., 2009. A multiscale empirical modeling framework for system identification. J. Process Control 19 (9), 1546–1557. Reis, M.S., et al., 2015. Challenges in the specification and integration of measurement uncertainty in the development of data-driven models for the chemical processing industry. Ind. Eng. Chem. Res. 54, 9159–9177. Reis, M.S., 2019. Multiscale and multi-granularity process analytics: a review. Processes 61 (17), 1–21.
Reis, M.S., Saraiva, P.M., 2005. Integration of data uncertainty in linear regression and process optimization. AIChE J. 51 (11), 3007–3019. Reis, M.S., Saraiva, P.M., 2006. Generalized multiresolution decomposition frameworks for the analysis of industrial data with uncertainty and missing values. Ind. Eng. Chem. Res. 45 (18), 6330–6338. Rendall, R.R., Reis, M.S., 2014. A comparison study of single-scale and multiscale approaches for data-driven and model-based online denoising. Qual. Reliab. Eng. Int. 30 (7), 935–950. Roshany-Yamchi, S., et al., 2013. Kalman filter-based distributed predictive control of large-scale multi-rate systems: application to power networks. IEEE Trans. Control Syst. Technol. 21 (1), 27–39. Shi, R., MacGregor, J.F., 20 0 0. Modeling of dynamic systems using latent variable and subspace methods. J. Chemom 14, 423–439. Skogestad, S., Postlethwaite, I., 2005. Multivariable Feedback Control: Analysis and Design, 2nd ed. Wiley, Chichester. Stephanopoulos, G., Karsligil, O., Dyer, M., 2008a. Multiscale theory for linear dynamic processes: part 2. Multiscale model predictive control (MS-MPC). Comput. Chem. Eng. 32 (4–5), 885–912. Stephanopoulos, G., Karsligil, O., Dyer, M.S., 2008b. Multiscale theory for linear dynamic processes: part 1. foundations. Comput. Chem. Eng. 32 (4–5), 857–884. Walczak, B., Massart, D.L., 2001a. Dealing with missing data: part I. Chemom. Intell. Lab. Syst. 58 (1), 15–27. Walczak, B., Massart, D.L., 2001b. Dealing with missing data: part II. Chemom. Intell. Lab. Syst. 58 (1), 29–42. Willsky, A.S., 2002. Multiresolution Markov models for signal and image processing. Proc. IEEE 90 (8), 1396–1458. Wold, S., Sjöström, M., Eriksson, L., 2001. PLS-Regression: a basic tool of chemometrics. Chemom. Intell. Lab. Syst. 58, 109–130. Wu, Y., Luo, X., 2010. A novel calibration approach of soft sensor based on multirate data fusion technology. J. Process Control 20 (10), 1252–1260.