Computers and Chemical Engineering 75 (2015) 120–134
Correntropy based data reconciliation and gross error detection and identification for nonlinear dynamic processes

Zhengjiang Zhang a,b, Junghui Chen b,*

a College of Physics and Electronic Information Engineering, Wenzhou University, Wenzhou 325035, People's Republic of China
b Department of Chemical Engineering, Chung Yuan Christian University, Chung Li District, Taoyuan, Taiwan 32023, Republic of China
Article history: Received 22 September 2014; Received in revised form 2 December 2014; Accepted 9 January 2015; Available online 16 January 2015

Keywords: Chemical processes; Correntropy; Data reconciliation; Instrumentation; Optimization; Systems engineering

Abstract

Measurement information in dynamic chemical processes is subject to corruption. Although nonlinear dynamic data reconciliation (NDDR) utilizes enhanced simultaneous optimization and solution techniques associated with a finite calculation horizon, it is still affected by different types of gross errors. In this paper, two data processing algorithms, correntropy based NDDR (CNDDR) and gross error detection and identification (GEDI), are developed to improve the quality of the data measurements. CNDDR's reconciliation and estimation are accurate in spite of the presence of gross errors. In addition to CNDDR, GEDI with a hypothesis testing and a distance-time step criterion identifies types of gross errors in dynamic systems. Through a case study of the free radical polymerization of styrene in a complex nonlinear dynamic chemical process, CNDDR greatly decreases the influence of the gross errors on the reconciled results and GEDI successfully classifies the types of gross errors of the measured data. © 2015 Elsevier Ltd. All rights reserved.
1. Introduction

Accurate process data are important for evaluating process performance and for justifying operations that require large capital expenditures. Also, process control and optimization schemes rely on accurate process data monitoring for trustworthy assessments. However, process data are often inaccurate or inconsistent with the mass balances, energy balances, and other constraints of the process systems. The inaccuracy in the process data may come from measurement information corrupted by random measurement errors and systematic errors. Random measurement errors are small perturbations from the true values. Systematic errors, the so-called gross errors, can however be quite large. The primary concerns are the gross errors usually caused by malfunctioning instruments, measurement device biases or process deficiencies. The presence of random errors decreases the precision of measurement information, while gross errors introduce inaccurate information. As improvement of the raw data set increases process performance and maintenance efficiency, data reconciliation (DR), which rectifies the errors in the raw data, is very important. It uses the redundancies in the measurements to improve the accuracy and
precision of measurement information and to reduce the influence of measurement errors. Kuehn and Davidson (1961) were the first to address DR. They focused on the DR problem in steady-state chemical engineering processes. Their proposed method was the solution to an optimization problem: it minimized a weighted least-squares objective function of the errors between the measured and the estimated values of the process variables under static material and energy balance constraints. Since then, several researchers have developed many other approaches. Romagnoli and Stephanopoulos (1981) proposed a systematic strategy for locating the source of gross errors and rectifying them in a chemical process. Their strategy can efficiently reduce the size of the DR problem and conforms to the general practice of variable monitoring in a chemical plant. Several researchers also proposed different strategies to enhance the solution of the DR problem (Serth and Heenan, 1986; Narasimhan and Mah, 1987; Tong and Crowe, 1995; Rollins et al., 1996; Arora and Biegler, 2001; Martinez Prata et al., 2010; Zhang et al., 2010; Chen et al., 2013). In the study of dynamic data reconciliation (DDR), the Kalman filter (KF) has been effectively used to smooth measurement data (Sage and Melsa, 1971). KF estimates possess the desirable statistical property of being unbiased, and the KF can also attain the minimum variance under the assumption of a Gaussian distribution. For dynamic nonlinear systems, Stanley and Mah (1977) tackled the DDR problem using the extended Kalman filter (EKF) (Narasimhan and Jordache, 2000). Their research showed that the reliability of EKF-based
approaches often decreases as the nonlinear complexities and modeling uncertainties of the system increase; large errors and divergence of the filter might occur (Romanenko and Castro, 2004; Romanenko et al., 2004). Therefore, a model should be properly selected in order to reduce complexity. Furthermore, when the state and/or measurement equations are highly nonlinear and the posterior distribution of the states is non-Gaussian, KF- or EKF-based DDR yields unsatisfactory reconciled and estimated results in a number of applications (Chen et al., 2005, 2008). The particle filtering (PF) technique, which serves as a general filter for nonlinear and non-Gaussian state-space systems, was recently applied to DDR problems (Chen et al., 2008). However, it was restricted to the use of process state-space models, and it was not able to deal with inequality constraints, such as lower and upper bounds on the states (Bai et al., 2007; Nicholson et al., 2014). In the study of nonlinear dynamic processes, Leibman et al. (1992) and later Ramamurthi et al. (1993) formulated the nonlinear dynamic data reconciliation (NDDR) problem and proposed solution strategies that neglect the random noise disturbances in the state transition equations. The NDDR formulation includes the manipulated input variables as part of the objective function; it is more general than the model used in filtering, whose manipulated inputs are assumed to be known exactly (Narasimhan and Jordache, 2000). This formulation can deal with inequality constraints, and it was widely used by many researchers (Chen and Romagnoli, 1998; Kong et al., 2000; Martinez Prata et al., 2010). However, the NDDR problem was still formulated with a weighted least-squares objective function, the sum of squared measurement errors at each time step, minimized subject to the process dynamic model. It is very sensitive to large measurement errors and leads to unsatisfactory reconciliation and estimation in the presence of gross errors.

Gross errors are random or deterministic errors with no relation to the true values. In the original DR studies, it was assumed that the noise affecting the variables was randomly distributed with zero mean. However, in practice, gross errors may occur. The presence of gross errors will affect the results of DR if the large errors are not sufficiently eliminated or corrected. As a result of smearing, both the reconciled measurements and the estimates of states may become distorted. Gross error detection and identification (GEDI) is therefore generally considered a crucial technique within the DR framework. In order to avoid corrupted adjustments, the GEDI problem has received considerable attention in the past few decades, and a number of strategies have been developed. The classical hypothesis testing strategies were the first methods used for GEDI, including the global test (Almasy and Sztano, 1975), the nodal test (NT) (Mah et al., 1976) and the measurement test (MT) (Mah and Tamhane, 1982). Serth and Heenan (1986) proposed several tests, including the iterative measurement test (IMT) and the modified IMT, which were more efficient than MT and NT in terms of performance. Other methods, such as generalized likelihood ratio methods (Narasimhan and Mah, 1987), maximum power test methods (Crowe, 1992), and principal component test methods (Tong and Crowe, 1995), were also developed for GEDI.
A general survey of gross error detection with data reconciliation approaches was given by Özyurt and Pike (2004). However, most of the above strategies were developed to solve DR problems in steady-state chemical processes. Methods that identify gross errors in dynamic systems after DR were also developed, because the process model error is an important contributing factor in the estimation of the measurement bias and the process state variables. McBrayer and Edgar (1995) used the NDDR formulation to derive the resulting difference between the measured and the reconciled values, and they developed a method for bias detection in nonlinear dynamic processes. Bagajewicz and Jiang (1997) proposed a new statistical method to detect bias in linear dynamic systems.
Chen and Romagnoli (1998) used the moving horizon concept and cluster analysis techniques to successfully distinguish outliers from normal measurements in dynamic chemical processes. Bai et al. (2007) developed an algorithm that deals simultaneously with bias correction and DR in dynamic processes. Xu and Rong (2010) proposed a new framework for DR and measurement bias identification in generalized linear dynamic systems. Gonzalez et al. (2011) proposed a Bayesian approach to determine the inconsistency of sensors; they used modified principal components for factor analysis to determine the initial value, and then estimated the sensor variance and gross errors by means of Bayesian estimation. In 2012, they developed an online algorithm to detect and estimate gross errors from measurement data under mass and energy balance constraints (Gonzalez et al., 2012). Applying filtering techniques, Singhal and Seborg (2000) proposed a probabilistic formulation that combined the EKF and the expectation-maximization (EM) algorithm in measurement reconciliation; the new EKF-EM method removed outliers and reduced noise effects. Later, Chen et al. (2008) used the PF technique for the NDDR problem and employed a mixture model comprising two Gaussian distributions to address the effect of outliers; their outlier detection was more efficient than the EKF-EM method in terms of performance. The strategies mentioned above for GEDI problems only deal with outlier or bias detection without considering different types of gross errors, even when mixed types of gross errors are present.

GEDI is also treated as a sensor fault detection and isolation problem in the area of fault detection and diagnosis (FDD). Many different FDD approaches have been developed to detect and isolate sensor faults. Those approaches are mainly classified into two categories: model-based approaches and knowledge-based or data-driven approaches. The data-driven approaches include the traditional multivariate statistical methods (such as principal component analysis and partial least-squares methods) and many other improved data-based methods (such as independent component analysis, Gaussian mixture models, neural networks, support vector machines, and support vector data description) (Ge et al., 2013). Those FDD methods for sensor fault detection and isolation generally train models from data rather than relying on accurate prior models, which are often not available in practice. Recently developed techniques for data-based learning models include closed-loop identification techniques (Wei et al., 2010), neural networks (Samy et al., 2011; Sadough Vanini et al., 2014), expert systems (Silva et al., 2012), fuzzy logic (Zhang et al., 2013), and adaptive estimation (Zhang, 2011). Those FDD methods can optimally exploit information on sensor faults whose corresponding data are stored in the historical database of the plant. Among model-based approaches, the multiple-model (MM) approaches are more flexible and powerful. The term "MM" covers a wide range of approaches whose common goal is to propose an architecture (or hierarchy) for a bank of estimators or filters for the isolation and identification of faults.
MM FDD schemes have been implemented with the Kalman filter (Wei et al., 2010; Pourbabaee et al., 2013) for linear dynamic systems, and with extended Kalman filters (An and Sepehri, 2005) and particle filters (Alrowaie et al., 2012) for nonlinear dynamic systems; those filters are used as state estimators. Model-based approaches are by nature more powerful and popular if a perfect analytical model can be created and utilized. Many FDD approaches that detect and isolate sensor faults focus on permanent sensor bias faults (Pourbabaee et al., 2013; Zhang, 2011; Samy et al., 2011) or sensor saturation (Zhang et al., 2013). However, strategies for detecting and isolating mixed types of gross errors in sensor faults are rarely considered. In fact, only a handful of researchers have addressed GEDI strategies for mixed types of gross errors in dynamic chemical processes. Abu-El-Zeet et al. (2002) proposed a novel technique
for the detection and identification of both biases and outliers in dynamic process systems. This technique was applied sequentially to all the measurement variables, and the reconciliation procedure was run repeatedly until no gross error was found in the measurement variables. Miao et al. (2011) developed a method based on support vector regression to achieve simultaneous DR and GEDI, used for estimating the joint bias and leak problem; the method solves a mixed integer nonlinear programming problem. Silva et al. (2012) presented an expert system that uses a combination of object-oriented modeling, rules, and semantic networks to deal with different types of gross errors, which are the most common sensor faults, such as biases, drifts, scaling, and dropouts. However, the above methods tend to require a heavy computational load, especially when the process system is large-scale and nonlinear.

The complexity of NDDR problems increases when the measured data are corrupted. Moreover, the best control system performance can only be achieved using accurate measurements. As a result, NDDR and GEDI have become crucial tools for data quality improvement in integrated control and management systems. In our previous work (Chen et al., 2013), the correntropy estimator was used in the steady-state DR problem to reduce the effect of gross errors and to yield less biased estimates. As correntropy measures both the uncertainty and the dispersion, it can be used as an optimality criterion in estimation problems. However, that paper only focused on the detection of one type of gross error, outliers, in a linear process system at steady state. Since the temporal redundancy information cannot be used to properly solve the steady-state DR problem, a new method is needed to detect and identify mixed types of gross errors.

In this paper, the correntropy estimator is extended to NDDR problems, and a novel method is proposed to deal with NDDR and GEDI simultaneously in nonlinear dynamic systems with different types of gross errors occurring at the same time. In NDDR, the new method minimizes a correntropy function of the error between the measured and the estimated values of the process variables under nonlinear dynamic constraints. Correntropy is well suited to measuring both the uncertainty and the dispersion. In particular, if the variables have gross errors, the data with outliers will show a large deviation from the trend exhibited by the majority of the observations, which means that the reconciled values will be smeared. Based on the correntropy approach, data associated with larger errors are ignored, and the effects of the gross errors on data reconciliation can be minimized. The proposed correntropy based NDDR (CNDDR) can decrease the influence of large measurement errors and provide more accurate reconciliation and estimation in the presence of gross errors. Then, without iterative trials, a new GEDI scheme is developed. It contains an MT, a distance-time step criterion and variance checking for detecting and identifying different types of gross errors, such as outliers, biases and drifts.

The rest of the paper is organized as follows. In the next section, the traditional NDDR problem formulation is described, and different types of gross errors and their influence are discussed. In Section 3, the CNDDR algorithm, which is robust to gross errors, is formulated. The GEDI strategies for gross error detection and identification are proposed in Section 4. In Section 5, the effectiveness of CNDDR and GEDI is demonstrated through the free radical polymerization of styrene, a complex nonlinear dynamic process system. Finally, conclusions are drawn in Section 6.
2. Traditional NDDR

The general formulation for the NDDR problem was first introduced by Leibman et al. (1992) and became widely used by researchers. It can be expressed as

$$\min_{\hat{z}} J(\hat{z}, z)$$

subject to

$$f\left(\frac{d\hat{z}}{dt}, \hat{z}\right) = 0, \qquad h(\hat{z}) = 0, \qquad g(\hat{z}) \le 0 \tag{1}$$
where J is the objective function, f is the set of differential model constraints, h is the set of algebraic equality constraints, and g represents the set of algebraic inequality constraints. The dynamic constraints f in Eq. (1) are usually the process differential equations to be satisfied. This means that $\hat{z}$ is adjusted until the difference between the integration of the process differential equations and the measurements z over the data window is minimized in the mean-square sense. Thus, in most applications, weighted least squares is used as the objective function,

$$J = \sum_{k=0}^{K} (\hat{z}_k - z_k)^{T} V^{-1} (\hat{z}_k - z_k)$$

where V is the weighting matrix of the process measurements.
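For concreteness, a minimal sketch of this weighted least-squares objective is given below (Python/NumPy; the array layout and function name are our own illustrative choices, not from the paper):

```python
import numpy as np

def wls_objective(z_hat, z, V):
    """Weighted least-squares NDDR objective J.

    z_hat, z : arrays of shape (K+1, N) -- reconciled and measured values
    V        : (N, N) weighting matrix of the process measurements
    """
    V_inv = np.linalg.inv(V)
    residuals = z_hat - z                # e_k = z_hat_k - z_k, one row per time step
    # sum over k of e_k^T V^{-1} e_k
    return float(np.einsum('ki,ij,kj->', residuals, V_inv, residuals))
```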
$z_k = [z_{1,k}, z_{2,k}, \ldots, z_{N,k}]^{T}$ and $\hat{z}_k = [\hat{z}_{1,k}, \hat{z}_{2,k}, \ldots, \hat{z}_{N,k}]^{T}$, $k = 1, \ldots, K$, are the vectors of the measured and the reconciled values of the process variables at time step k, and N is the number of measured variables. Note that the measurements can include both the measured state variables and the measured input variables. If some state variables in the process system are unmeasured, the above problem can also be formulated using some transformations (Narasimhan and Jordache, 2000). Discretization is an important strategy adopted to facilitate the solution of the general NDDR problem: the dynamic constraints f need to be discretized in order to solve the NLP problem defined by Eq. (1). The DR problem in nonlinear processes presents difficulties not encountered in linear processes. First, it is generally impossible to obtain a compact analytical representation of the solution to the DR problem. Second, it is mathematically difficult to handle random noise if the differential equations or measurement equations are nonlinear functions of the noise. Because of the nonlinearity of the equations, neither the state variables nor the measurements would follow a Gaussian distribution even if the random noise is assumed to be normally distributed. A least-squares formulation, however, has still been used to derive the estimates. To overcome this problem, a novel robust estimator using correntropy is proposed in Section 3; it is an information theoretic alternative to the traditional mean square error criterion for NDDR problems.

The techniques represented by Eq. (1) assume that only random errors are present in the data. This assumption is not valid given that gross errors may occur in non-random events. Even though gross errors are less common, if not removed, they affect the accuracy of reconciliation. Therefore, in the NDDR problem, the proposed method should not only identify the presence of gross errors but also correct the measurements. When gross errors are not present in the measurements, any observation $z_{n,k}$ can be written as

$$z_{n,k} = \tilde{z}_{n,k} + \varepsilon_{n,k} \tag{2}$$

where $\tilde{z}_{n,k}$ is the true value of the nth measured variable and $\varepsilon_{n,k}$ is the inherent variability due entirely to the random measurement error, randomly distributed with zero mean. When gross errors are present in the measurements, the true value and the observation are more generally related by

$$z_{n,k} = \tilde{z}_{n,k} + \varepsilon_{n,k} + \delta_{n,k} \tag{3}$$
If $\delta_{n,k}$ is non-zero, the observation contains gross errors in addition to the inherent random variation. The implication is, therefore, that the expectation of $z_{n,k}$ differs from the true value of the process variable $\tilde{z}_{n,k}$. Gross errors may arise, for instance, from malfunctioning instruments, incorrect calibration of measurement devices, corrosion in sensors, faulty analog-digital conversion or process deficiencies. They can be classified into three representative types, outliers, systematic biases, and drifts (Dunia et al., 1996; Narasimhan and Jordache, 2000), illustrated graphically in Fig. 1. Outliers can be caused by a number of different sources, such as power supply fluctuations, network transmission and signal conversion faults; the corresponding measurements have little or no relation to the true values and result in occasional spikes in the time window (Fig. 1(a)). Systematic biases usually occur when measurement devices yield consistently erroneous values caused by incorrect calibration or malfunction; as shown in Fig. 1(b), they may occur suddenly at a particular time and thereafter remain at a constant magnitude. Drifts may be caused by the wear or fouling of sensors and can occur gradually over a period of time; the magnitude of the gross error increases slowly over a relatively long time period (Fig. 1(c)).

Fig. 1. Different types of gross errors.

For a data reconciliation problem without gross errors, the weighted least squares method is the best way to estimate the values of the process variables. But data contaminated with gross errors invalidate the assumption that the observations follow Gaussian distributions. When the objective function in Eq. (1) is a weighted least squares criterion, the method yields biased estimates, and the reconciled results are not reliable estimates of the true state of the process. It is important to reduce the influence of those gross errors.

3. CNDDR

The weighted least squares based NDDR (WLS-NDDR) method is the best method of estimating the values of the process variables if the measurements contain no gross errors and the measurement errors are normally distributed with zero mean. However, gross errors are generally present in measurement data. If there are gross errors in the measurements, the data will show a large deviation from the trend exhibited by the majority of the observations; the realization of the measurement errors will follow a non-Gaussian distribution with a non-zero mean and a variance significantly larger than unity. The WLS-NDDR method will then give biased estimates, and more robust estimators are needed. The correntropy estimator, which measures both the uncertainty and the dispersion, is robust to gross errors. By properly specifying the kernel width of the correntropy function, data associated with errors exceeding a specified magnitude are ignored, and the effects of the gross errors on data reconciliation can be minimized. Thus, correntropy can be used as an optimality criterion in estimation problems. In our past work, its performance in steady-state data reconciliation outperformed other robust estimators, such as the quasi-weighted least squares estimator, the fair function, etc. (Chen et al., 2013). In this section, the correntropy estimator is extended to the formulation of NDDR problems for nonlinear dynamic data reconciliation.

Correntropy $V_{\sigma}(W, V)$ is defined as the mean of the Gaussian function of the difference between two arbitrary scalar random variables, W and V (Liu et al., 2007). It is given by

$$V_{\sigma}(W, V) = E[k_{\sigma}(W, V)] = \int k_{\sigma}(w, v)\, dP_{WV}(w, v) \tag{4}$$

where $P_{WV}(w, v)$ is the joint distribution function of (W, V) and $k_{\sigma}(w, v)$ is a shift-invariant Mercer kernel. In the most general formulation, this kernel is selected as the Gaussian kernel function with kernel width $\sigma$, defined as

$$k_{\sigma}(w, v) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{\|w - v\|^2}{2\sigma^2} \right) \tag{5}$$

The arbitrary scalar random variables W and V can be either independent or correlated with each other. If the joint distribution function $P_{WV}(w, v)$ is unknown, a finite number of samples $\{(w_k, v_k)\}_{k=1}^{K}$ drawn from the joint probability density distribution is used, and the sample correntropy estimator $\hat{V}(W, V)$ can be estimated by

$$\hat{V}(W, V) = \frac{1}{K} \sum_{k=1}^{K} k_{\sigma}(w_k, v_k) \tag{6}$$

Intuitively, correntropy is closely related to the similarity between the arbitrary scalar random variables W and V: if W is similar to V, the correntropy value of W and V is large. This characteristic can be applied to the NDDR problem.
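A minimal sketch of the kernel of Eq. (5) and the sample estimator of Eq. (6) (Python/NumPy; function names are ours, not the paper's):

```python
import numpy as np

def gaussian_kernel(w, v, sigma):
    """Gaussian (Mercer) kernel with width sigma, Eq. (5)."""
    return np.exp(-(w - v) ** 2 / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)

def sample_correntropy(w, v, sigma):
    """Sample correntropy of Eq. (6): the mean kernel value over K samples.
    Similar series give a large value; a gross error pulls it down."""
    w, v = np.asarray(w, float), np.asarray(v, float)
    return float(np.mean(gaussian_kernel(w, v, sigma)))
```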
Correntropy can be utilized as a goodness of fit to describe how well the data $\hat{z}_k$ are reconciled for the process measured data $z_k$ containing random and gross errors. When some measured data contain gross errors, the corresponding value of correntropy is very small. Thus, if correntropy is used as the objective function of NDDR, the influence of the measured data with gross errors will be decreased. In this work, to overcome the disadvantage of the conventional NDDR methods, a correntropy-based objective is proposed to conduct the data reconciliation in nonlinear dynamic systems. It is defined as follows:

$$\max_{\hat{z}} J_1 = \frac{1}{NK} \sum_{n=1}^{N} \sum_{k=0}^{K} k_{\sigma_n}(\hat{z}_{n,k}, z_{n,k}) \tag{7}$$

where N is the number of measured variables, $\sigma_n$ is the kernel width of the nth measured variable, and $k_{\sigma_n}(\hat{z}_{n,k}, z_{n,k}) = \frac{1}{\sqrt{2\pi}\,\sigma_n} \exp\left( -\frac{(\hat{z}_{n,k} - z_{n,k})^2}{2\sigma_n^2} \right)$. Note that if any measurement error $(\hat{z}_{n,k} - z_{n,k})$ is large, it contributes very little to the value of the correntropy estimator (Chen et al., 2013). This means that the correntropy estimator is a robust cost function for measured data with gross errors because it avoids amplifying the effect of gross errors. The maximum value of Eq. (7) is 1. Thus, the maximization problem can be recast as a minimization problem using one minus the objective function of Eq. (7):

$$\min J_2 = \min \sum_{n=1}^{N} \sum_{k=0}^{K} \frac{1}{\sqrt{2\pi}\,\sigma_n} \left[ 1 - \exp\left( -\frac{(\hat{z}_{n,k} - z_{n,k})^2}{2\sigma_n^2} \right) \right] \tag{8}$$

To minimize the objective function, the derivative of Eq. (8) with respect to $\hat{z}_{n,k}$ is obtained and set to zero,

$$\sum_{n=1}^{N} \sum_{k=0}^{K} w(e_{n,k}) \frac{de_{n,k}}{d\hat{z}_{n,k}}\, e_{n,k} = 0 \tag{9}$$

where $e_{n,k} = (\hat{z}_{n,k} - z_{n,k})/\sigma_n$, and $w(e_{n,k}) = \exp\left( -\frac{e_{n,k}^2}{2} \right)$ can be regarded as the weighting terms. Thus, the same set of equations listed in Eq. (9) can also be obtained by minimizing the objective function (Liu et al., 2007)

$$\min J_3 = \min \sum_{n=1}^{N} \sum_{k=0}^{K} \frac{1}{\sqrt{2\pi}\,\sigma_n}\, w(e_{n,k})\, e_{n,k}^2 \tag{10}$$

Therefore, under the framework of Eq. (1), the CNDDR problem can be formulated as

$$\min_{\hat{z}} \sum_{n=1}^{N} \sum_{k=0}^{K} \frac{1}{\sqrt{2\pi}\,\sigma_n}\, w(e_{n,k})\, e_{n,k}^2$$

subject to

$$f\left(\frac{d\hat{z}}{dt}, \hat{z}\right) = 0, \qquad h(\hat{z}) = 0, \qquad g(\hat{z}) \le 0 \tag{11}$$

The influences of $e_{n,k}$ on the objective functions of the WLS-NDDR formulation and the CNDDR formulation are compared in Fig. 2. When there are only small random errors in the measurements, the effect of the measurement errors on the two objective functions is almost the same. However, the effect of large measurement errors (i.e. gross errors) is quite different. The influence function of WLS-NDDR is proportional to the measurement error, which means that a gross error has a great influence on both the objective function of WLS-NDDR and the reconciliation of the other measured variables. In contrast, the effect of large measurement errors on the objective function of CNDDR is decreased significantly. When $e_{n,k}$ is equal to or greater than $2\sqrt{2}$, the influence function of CNDDR is equal to or less than zero. In other words, the corresponding measurement errors contribute very little to the value of the correntropy estimator. Therefore, CNDDR is robust to large measurement errors.

Fig. 2. Influences of the measurement errors on the objective functions of WLS-NDDR and CNDDR (CNDDR with σ = 0.75).

When the kernel width $\sigma_n$ is properly specified, the measurement data points associated with gross errors are ignored, and their effects on the estimates can be minimized. The kernel width plays an important role in the smoothing process. If it is close to zero, the Gaussian kernel is almost the same as the Dirac delta function; in this situation, the maximum correntropy estimation is identical to the maximum a posteriori estimation. If the kernel width tends to infinity, the maximum correntropy estimation is equivalent to the minimum mean squared error estimation (Chen and Principe, 2012). In this work, the kernel width is simply computed as

$$2\sqrt{2}\,\sigma_n = \text{the order of magnitude of the } n\text{th measurement} \tag{12}$$

Any measurement error equal to or greater than $2\sqrt{2}\,\sigma_n$ contributes very little to the value of the correntropy estimator. In other words, any measurement error whose value is equal to or greater than the order of magnitude of the measurement contributes very little to the objective function of CNDDR.
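The objective of Eq. (8) and the kernel-width rule of Eq. (12) can be sketched as follows. This is only the unconstrained objective; the constraints f, h, g of Eq. (11), which a solver such as SQP would handle, are omitted, and the array layout and names are our own:

```python
import numpy as np

def cnddr_objective(z_hat, z, sigma):
    """Correntropy-based objective J2 of Eq. (8), to be minimized.

    z_hat, z : (K+1, N) reconciled and measured values
    sigma    : (N,) kernel widths, one per measured variable
    """
    e = (z_hat - z) / sigma                          # scaled residuals e_{n,k}
    coef = 1.0 / (np.sqrt(2.0 * np.pi) * sigma)
    return float(np.sum(coef * (1.0 - np.exp(-e ** 2 / 2.0))))

def kernel_width(order_of_magnitude):
    """Kernel width rule of Eq. (12): 2*sqrt(2)*sigma_n equals the
    order of magnitude of the n-th measurement."""
    return order_of_magnitude / (2.0 * np.sqrt(2.0))
```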
4. Gross error detection and identification

In most NDDR methods, gross-error-free measurements are assumed and statistical models are known in advance, but this only happens in an ideal situation. Three types of measurement gross errors, including outliers, biases and drifts, are usually encountered in operating plants. The gross error models for these types are different.

(1) Outlier: It usually has little or no relation to the actual measured values, resulting in occasional spikes in the measured data. If an outlier is present in the nth measurement at time step $k_o$, the gross error model for the outlier can be described by

$$z_{n,k} = \tilde{z}_{n,k} + \varepsilon_{n,k}, \quad k \ne k_o; \qquad z_{n,k} = \tilde{z}_{n,k} + \varepsilon_{n,k} + o_{n,k_o}, \quad k = k_o \tag{13}$$
where $o_{n,k_o}$ is the magnitude of the outlier in the nth measurement at time step $k_o$.

(2) Bias: A systematic bias usually occurs when a measurement device yields consistently erroneous values. In the presence of a bias in the nth measurement, the gross error model becomes

$$z_{n,k} = \tilde{z}_{n,k} + \varepsilon_{n,k} + b_n \tag{14}$$

where $b_n$ is the magnitude of the bias in the nth measurement.

(3) Drift: It is more difficult to calibrate a measurement device with drifts. The behavior of measurement errors with drifts is more complex than that of the other two types of gross errors. The gross error model of the drift can be generally written as

$$z_{n,k} = \tilde{z}_{n,k} + \varepsilon_{n,k} + d_n(z_k, k) \tag{15}$$

where $d_n(z_k, k)$ is a function describing the change of the measurement errors with drifts. It may be a linear, nonlinear or periodic function and is related to the operating condition, the external environment or the time step.
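For illustration, the three gross error models of Eqs. (13)-(15) can be applied to a simulated true signal as in the sketch below; all names and magnitudes are illustrative choices of ours, not the paper's case-study settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gross_errors(z_true, noise_std, outlier_idx=(), outlier_mag=0.0,
                     bias=0.0, bias_start=None, drift=None):
    """Corrupt a true 1-D signal with random noise plus the three gross
    error types of Eqs. (13)-(15)."""
    z = z_true + rng.normal(0.0, noise_std, size=z_true.shape)   # Eq. (2)
    for k in outlier_idx:                                        # Eq. (13): isolated spikes
        z[k] += outlier_mag
    if bias_start is not None:                                   # Eq. (14): constant offset
        z[bias_start:] += bias
    if drift is not None:                                        # Eq. (15): time-dependent d_n(k)
        z += np.array([drift(k) for k in range(len(z))])
    return z
```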
A realistic way is to identify such systematic or gross errors. In Section 4.1, an MT is applied in a simple attempt to answer the question of whether gross errors are present in the data. In fact, sometimes there is not only one single type of gross error; combinations of gross errors may be present in the measurement data. However, the identification of the gross errors in the NDDR problem has still not been completely developed. If gross errors are present, two strategies, the hypothesis testing and the distance with time-point criterion, are developed in Section 4.2 to find the number of gross errors, their types (outliers, biases and drifts), and their locations.

4.1. Gross error detection for nonlinear dynamic systems using measurement tests

The residual of the reconciled values is important to the detection of gross errors in measurement data. The gross error detection scheme is derived from a testing procedure based on the residuals. When measurements contain gross errors, the measurement residual between the reconciled value from CNDDR and the actual value from the measured data, $(\hat{z}_{n,k} - z_{n,k})$, will no longer be randomly distributed with zero mean. Based on the residual, a distinction can be made between random errors and gross errors. The residuals are normally distributed when the uncontrollable factors vary at random. In typical processes, small residuals are usual and gross errors are unusual. Contrary to random errors, gross errors occur when one or more isolated factors cause the displacement of the measurement; the expected value of the residual is consequently not equal to zero. The question now is how to decide whether the mean is significantly different from zero. Based on statistical hypothesis testing, a test statistic can be defined as

$$P\left( t_{n,k} = \frac{|r_{n,k}|}{\sqrt{V_{nn}}} \ge Z_{\alpha/2} \right) = \alpha \tag{16}$$
where $r_{n,k} = \hat{z}_{n,k} - z_{n,k}$ is the residual data point of the measured variable n at the kth time step, $V_{nn}$ is the corresponding diagonal element of V, and $Z_{\alpha/2}$ is the upper $100\alpha/2$ percentage point of the standard normal distribution. In the presence of gross errors, the test can be applied to the inconsistency of the measurements at a pre-assigned probability; a value of $(1 - \alpha)$ for the allowable error probability is acceptable in many cases. If $t_{n,k}$ identifies a measurement with gross errors, it is necessary to identify the type of the gross error. The instrument can then be repaired to correct the corresponding measurement, and the gross error can be deleted from the database.
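A sketch of this measurement test for a single measured variable (Python/SciPy; names are ours):

```python
import numpy as np
from scipy import stats

def measurement_test(r, V_nn, alpha=0.05):
    """Measurement test of Eq. (16): flag residuals whose standardized
    magnitude exceeds the upper alpha/2 point of the standard normal.

    r    : (K+1,) residuals r_{n,k} = z_hat - z for one measured variable
    V_nn : variance of that measurement (diagonal element of V)
    """
    t = np.abs(r) / np.sqrt(V_nn)
    z_crit = stats.norm.ppf(1.0 - alpha / 2.0)   # Z_{alpha/2}
    return t >= z_crit                           # boolean mask of suspect points
```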
4.2. Gross error identification for nonlinear dynamic systems

Once the existence of gross errors is ascertained, the type of gross error should be distinguished. Because outliers, unlike the deterministic behaviors (biases and drifts), are occasional, a distance criterion can be used to determine which measurement data are outliers. Chen and Romagnoli (1998) made use of cluster analysis techniques to successfully distinguish outliers from the other data. However, when the outliers have the same values or are close to each other at different time steps, this method has difficulty differentiating them. In order to eliminate such similar outliers, Abu-El-Zeet et al. (2002) improved the method by expressing the distance in terms of the absolute value of the difference between the measurement data and the mean of all measurement data. Those methods can detect which measurement data points contain an outlier but cannot identify which measured variables of those data points contain the outlier.

(1) Outlier identification

To overcome the above problems, the time point in the list of the measurement variables should be included in the analysis. A distance criterion containing the vector of measurement residuals and the corresponding time points is used for outlier detection. For the nth measured variable with gross errors at time point $k_g$, the minimum distance between one measurement residual point $(k_g, r_{n,k_g})$ and all the other measurement residual data points $(k_g', r_{n,k_g'})$, $k_g' \ne k_g$, is calculated as

$$\mathrm{DIST}_{n,k_g} = \min_{\text{all } k_g' \ne k_g} \left[ (k_g - k_g')^2 + \left( \frac{r_{n,k_g} - r_{n,k_g'}}{\sigma_n} \right)^2 \right]^{1/2} \tag{17}$$
The outliers are occasional spikes in the moving time window. Unlike biases and drifts, which are continuous data points in succession, they are usually present as isolated points. Therefore, the distance calculation is used to differentiate outliers from the other two types of gross errors. Since measurement data points with biases or drifts usually appear in clusters, their calculated values of DIST are significantly lower than those of measurement data points with outliers. Because outliers do not occur in succession, with the inclusion of $k_g$ in Eq. (17), the minimum distance of an outlier is significantly larger than that of data points with other gross errors. At the time point with the outlier, the neighboring time point will be selected in DIST. The statistical criterion for DIST can be estimated from the second term on the right-hand side of Eq. (17); this term, dealing with the measurement characteristics, can be used to monitor outliers. When no outlier is present, all the DIST points should have an essentially random pattern. With F-testing procedures, the nonrandom measurements can be used to detect the outlier condition. In this work, an outlier is a data point whose minimum distance is greater than the minimum distance based on the 95% confidence interval. The outliers identified by this outlier detection method are used to estimate the minimum magnitude of the error arising from an outlier.
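A sketch of the distance criterion of Eq. (17), evaluated for every flagged point against all the others (names are ours; the F-test on the resulting DIST values is not shown):

```python
import numpy as np

def min_distances(k_steps, r, sigma_n):
    """Minimum distance of Eq. (17) for each flagged residual point.

    k_steps : (M,) time steps k_g of the flagged points
    r       : (M,) residuals r_{n,k_g} at those time steps
    sigma_n : scale (kernel width) of the n-th measurement
    Isolated spikes (outliers) yield markedly larger DIST values
    than clustered biases or drifts.
    """
    k = np.asarray(k_steps, dtype=float)
    r = np.asarray(r, dtype=float)
    dk = k[:, None] - k[None, :]                  # (k_g - k_g')
    dr = (r[:, None] - r[None, :]) / sigma_n      # scaled residual difference
    d = np.sqrt(dk ** 2 + dr ** 2)
    np.fill_diagonal(d, np.inf)                   # exclude k_g' = k_g
    return d.min(axis=1)
```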
(2) Bias and drift identification

If the gross error is not an outlier, the next identification task is to determine whether the measurement data points with gross errors are classified as biases or drifts. For bias detection and identification, the previous work (McBrayer and Edgar, 1995; Abu-El-Zeet et al., 2002) was done sequentially for all the measurement variables, and the reconciliation procedure was run repeatedly until no bias was found in the measurement variables. In this situation, the NDDR problem formulation is very sensitive to the measurements with gross errors: a measurement variable with bias will affect the adjustments of the other measurement variables. However, the proposed CNDDR does not require detecting and identifying the bias sequentially.

Fig. 3. Bias detection and identification based on variance calculation in the moving window.
It is robust to the gross error because the influence of the measurement variable with biases is not smeared over the adjustments of the other measurement variables. Because systematic biases usually occur when measurement devices consistently yield erroneous values, the variance of measurement data points with biases is much smaller than that of measurement data points with drifts. To classify these two types of gross errors, a moving window, shown in Fig. 3, is proposed to calculate the variance of the data points with gross errors. In each moving window, the reconciled residuals are used to calculate the standard deviation $\sigma_n$ and the mean value $\bar{r}_n$. Assume the length of the moving window is H; the variance of the H residual data points in the moving window is then

$$\sigma_n = \left[ \frac{1}{H-1} \sum_{k_g=k}^{k+H-1} (r_{n,k_g} - \bar{r}_n)^2 \right]^{1/2} \tag{18}$$

where

$$\bar{r}_n = \frac{1}{H} \sum_{k_g=k}^{k+H-1} r_{n,k_g} \tag{19}$$

As the method is based on the assumption of isolated residuals without drifts, the residuals with biases in consecutive samples show a random behavior. The data points in the moving window are identified as biases if the variance $\sigma_n$ in the moving window falls below a defined threshold $\sigma_{th}$; otherwise, they are classified as drifts:

$$\text{if } \sigma_n < \sigma_{th},\ z_n \text{ is a gross error with a bias; else } z_n \text{ is a gross error with a drift} \tag{20}$$

Once the nth measured variable with bias is detected and identified, the magnitude of the bias can be estimated as

$$\hat{b}_n = -\frac{1}{K_{gb}} \sum_{k_g=1}^{K_{gb}} r_{n,k_g} \tag{21}$$

where $K_{gb}$ is the total number of residual data points identified as a bias.
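A sketch of the moving-window classification of Eqs. (18)-(21). As a simplification of ours, the per-window threshold decision of Eq. (20) is condensed into one check on the largest window deviation:

```python
import numpy as np

def classify_bias_or_drift(r_g, H, sigma_th):
    """Classify flagged non-outlier residuals as bias or drift.

    r_g      : (M,) residuals of the flagged data points, in time order
    H        : moving-window length (shrunk if fewer points are available)
    sigma_th : variance threshold sigma_th of Eq. (20)
    """
    H = min(H, len(r_g))
    sigmas = []
    for k in range(len(r_g) - H + 1):
        window = r_g[k:k + H]
        sigmas.append(np.std(window, ddof=1))   # Eq. (18), mean from Eq. (19)
    is_bias = np.max(sigmas) < sigma_th         # Eq. (20)
    b_hat = -np.mean(r_g) if is_bias else None  # Eq. (21): bias magnitude
    return ('bias', b_hat) if is_bias else ('drift', None)
```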
Fig. 4. Flowsheet of the CNDDR and GEDI algorithms.
Fig. 5. Comparisons of the reconciled results of the TNDDR and the CNDDR in the first case study (measured, TNDDR, CNDDR and true values for the concentration of initiator, concentration of monomer, reactor temperature, zero moment, first moment, second moment, average molecular weight, and total concentration of live polymers).
The new approach to NDDR and GEDI is constructed as follows: CNDDR is used to reconcile the process measurements without considering whether gross errors exist, because CNDDR is insensitive to gross errors. The advantage is that interaction between the data reconciliation algorithm and the gross error detection algorithm is not necessary. The whole design strategy, shown in Fig. 4, is listed below; a sketch of the resulting loop follows the steps.
Step 1: Use the developed models which describe the process system. Reconcile the actual measurement data with the proposed CNDDR method.
Step 2: Use Eq. (16) for gross error detection to discriminate between random errors and gross errors in the operating data. If there are no significant errors, go to Step 1; otherwise, measurements with gross errors are identified: go to Step 3 for gross error identification.
Step 3: Calculate the minimum distance in Eq. (17). Use F-testing procedures to detect whether occasional spikes occur. If so, the gross errors are outliers: go to Step 1; otherwise, the gross errors come from biases or drifts: go to Step 4.
Step 4: Calculate the measurement variances (Eq. (18)) based on the gross error data in the moving window. If the variances in the moving window fall below the defined threshold, biases occur; otherwise, the gross errors come from drifts. Go to Step 1.
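A hypothetical driver tying Steps 2-4 together, reusing the earlier sketches (measurement_test, min_distances, classify_bias_or_drift). The CNDDR solve of Step 1 is assumed to have already produced z_hat, and a 95th-percentile cut on DIST stands in for the F-test:

```python
import numpy as np

def gedi_pass(z_meas, z_hat, V, sigma, sigma_th, H, alpha=0.05):
    """One GEDI pass over all measured variables; returns a label per
    variable that contains gross errors."""
    labels = {}
    for n in range(z_meas.shape[1]):                     # each measured variable
        r = z_hat[:, n] - z_meas[:, n]
        flagged = measurement_test(r, V[n, n], alpha)    # Step 2: MT, Eq. (16)
        if not flagged.any():
            continue
        k_g = np.flatnonzero(flagged)
        dist = min_distances(k_g, r[flagged], sigma[n])  # Step 3: Eq. (17)
        outlier = dist > np.percentile(dist, 95)         # stand-in for the F-test
        rest = r[flagged][~outlier]
        if rest.size == 0:
            labels[n] = 'outliers'
        else:                                            # Step 4: Eqs. (18)-(20)
            labels[n] = classify_bias_or_drift(rest, H, sigma_th)[0]
    return labels
```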
Table 1
Information of the measured variables for free radical polymerization of styrene.

Measured variable                                        Initial value   Standard deviation of noise   Final value at steady state
Concentration of initiator ($\tilde{C}_I$)               0.95            0.005                         1
Concentration of monomer ($\tilde{C}_M$)                 0.95            0.005                         1
Reactor temperature ($\tilde{T}$)                        0.95            0.005                         1
Zero moment ($\tilde{\lambda}_0$)                        0.95            0.005                         1
First moment ($\tilde{\lambda}_1$)                       0.95            0.005                         1
Second moment ($\tilde{\lambda}_2$)                      0.95            0.005                         1
Average molecular weight ($\tilde{M}_w$)                 0.95            0.005                         1
Total concentration of live polymers ($\tilde{C}_P$)     26.89           0.2                           90.0528
5. Simulation case study

5.1. Process description

In order to illustrate the validity of the proposed CNDDR and GEDI strategy, a complex nonlinear dynamic process system, the free radical polymerization of styrene, is simulated. The outputs of this process, such as the molecular weights, form a distribution. Estimates of the distribution outputs are crucial as they significantly influence the product quality and the process efficiency. The mathematical models of Schmidt and Ray (1981) are used to describe the dynamics of polymerization in a jacketed continuous stirred tank reactor. The material balance equations for the zero, the first and the second moments of the concentration of polymers with different molecular weights are included. The reaction mechanism of free radical polymerization of styrene starts with the decomposition reaction of an initiator, followed by the initiation reaction, the propagation and the termination reactions. For the simulation runs, the number of chain lengths is set to 8000.
Fig. 6. Distribution of absolute errors between the estimated values and the true values of the concentration of polymer for (a) TNDDR and (b) CNDDR in the first case study.

Fig. 7. Results of GEDI for (a) the total concentration of live polymers, (b) the zero moment, and (c) the first moment in the first case study.
Fig. 8. Comparisons of reconciled results between the TNDDR and the CNDDR in the second case study.
The dynamic discretized equations describing the process model are:

$$\tilde{C}_{I,k} = f_1(\tilde{C}_{I,k-1}, \tilde{T}_{k-1})$$
$$\tilde{C}_{M,k} = f_2(\tilde{C}_{M,k-1}, \tilde{T}_{k-1}, \tilde{C}_{I,k-1})$$
$$\tilde{T}_k = f_3(\tilde{C}_{M,k-1}, \tilde{T}_{k-1}, \tilde{C}_{I,k-1})$$
$$\tilde{\lambda}_{0,k} = f_4(\tilde{\lambda}_{0,k-1}, \tilde{C}_{I,k-1}, \tilde{T}_{k-1})$$
$$\tilde{\lambda}_{1,k} = f_5(\tilde{\lambda}_{1,k-1}, \tilde{C}_{I,k-1}, \tilde{T}_{k-1})$$
$$\tilde{\lambda}_{2,k} = f_6(\tilde{\lambda}_{2,k-1}, \tilde{C}_{I,k-1}, \tilde{T}_{k-1})$$
$$\tilde{M}_{w,k} = f_7(\tilde{M}_{w,k-1}, \tilde{C}_{I,k-1}, \tilde{T}_{k-1})$$
$$g(\tilde{C}_{P,k}, \tilde{C}_{I,k}, \tilde{T}_k, k_{t,0}) = 0 \tag{22}$$

$$\tilde{C}_{M_n,k} = f_{7+n}(\tilde{C}_{M_n,k-1}, \tilde{C}_{I,k-1}, \tilde{T}_{k-1}), \quad n = 1, 2, \ldots, 8000 \tag{23}$$

In Eqs. (22) and (23), $\tilde{C}_{I,k}$, $\tilde{C}_{M,k}$ and $\tilde{C}_{M_n,k}$ are the concentrations of the initiator, the monomer, and the polymer at time point k, respectively. $\tilde{C}_{P,k}$ is the total concentration of live polymers at time point k. $\tilde{T}_k$ and $\tilde{M}_{w,k}$ are the reactor temperature and the average molecular weight at time point k, respectively. $\tilde{\lambda}_0$ represents the zero moment, which is the total concentration of all the dead polymers; $\tilde{\lambda}_1$ and $\tilde{\lambda}_2$ are the first and the second moments; and $\tilde{C}_{M_n}$ is the concentration of polymer, which forms a distribution. Note that the state variables $\tilde{C}_I$, $\tilde{C}_M$, $\tilde{T}$, $\tilde{\lambda}_0$, $\tilde{\lambda}_1$, $\tilde{\lambda}_2$, $\tilde{M}_w$, $\tilde{C}_P$ and $\tilde{C}_{M_n,k}$, $n \ge 2$, vary greatly in magnitude, and they are redefined using dimensionless variables before the simulation of the complex nonlinear dynamic polymerization process. The total number of equations in the process model is 8008. It is assumed that the state variables $\tilde{C}_I$, $\tilde{C}_M$, $\tilde{T}$, $\tilde{\lambda}_0$, $\tilde{\lambda}_1$, $\tilde{\lambda}_2$, $\tilde{M}_w$ and $\tilde{C}_P$ are measured variables.
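As a structural sketch only, the discretized model of Eqs. (22)-(23) can be propagated as below; f_list stands for the model maps, whose kinetic expressions (Schmidt and Ray, 1981) are not reproduced here, and the algebraic relation g(·) = 0 fixing $\tilde{C}_P$ is omitted:

```python
import numpy as np

def simulate(f_list, x0, K):
    """Propagate discretized states x_k[i] = f_i(x_{k-1}), as in
    Eqs. (22)-(23). f_list and x0 are placeholders; len(f_list) must
    equal len(x0)."""
    X = np.empty((K + 1, len(x0)))
    X[0] = x0
    for k in range(1, K + 1):
        X[k] = [f(X[k - 1]) for f in f_list]   # state-by-state update
    return X
```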
Fig. 9. Distribution of absolute errors between the estimated values and the true values of the concentration of polymer for (a) TNDDR and (b) CNDDR in the second case study.
The concentration of polymer $\tilde{C}_{M_n}$ is an unmeasured variable, so it is estimated from Eq. (23) after CNDDR. In reality, polymer concentrations cannot be frequently measured in a polymerization system; the concentration data considered here are simulated for illustration purposes only. The simulation environment for this example is MATLAB on a CPU with a main frequency of 3.2 GHz and 4 GB of memory. The optimization solver is SQP. The main information of the measured variables for free radical polymerization of styrene is listed in Table 1. In order to appreciate the benefits of the proposed CNDDR, the traditional NDDR (TNDDR) is included for comparison. Three separate case studies with different combinations of gross errors are conducted. In the first case study, the gross errors, outliers and biases, occur simultaneously in the observations; this case is set up to observe the influence of the gross errors on the results of the two NDDR methods. The second case study examines the influence of mixed gross errors, outliers and drifts. The last case study observes the detectability of the two NDDR methods for measured data in the presence of outliers, biases and drifts.
Fig. 10. Results of GEDI for (a) the total concentration of live polymers, (b) the zero moment, and (c) the first moment in the second case study.
5.2. Case study 1: NDDR with outliers and biases

In the first case study, gross errors with both outliers and biases are introduced into the simulation simultaneously. The outliers are present in two measured variables: the total concentration of live polymers and the zero moment. Six outliers, with locations selected randomly, are added to the measured values of the two variables; the magnitudes of the outliers are between 2 and 4 percent of the measured values. The biases are introduced in the zero moment and the first moment; they start at time step 75 and thereafter remain at a constant magnitude of 1.5 for 126 time steps. The comparison of the TNDDR and the proposed CNDDR results is shown in Fig. 5. It is clear that the TNDDR with the least squares formulation performs worse than the CNDDR with the correntropy formulation.
Fig. 11. Comparisons of reconciled results between the TNDDR and the CNDDR in the third case study.
The reconciled values of the zero moment and the first moment severely deviate from the actual values using TNDDR. Also, using TNDDR, the reconciled results of the other measured variables (i.e. the concentration of initiator and the concentration of monomer) are greatly affected by the biased measurements (Fig. 5). Since the least squares formulation is very sensitive to gross errors, the reconciled values are inaccurate in the presence of outliers and biases. On the contrary, in CNDDR, any large measurement error contributes very little to the value of the correntropy estimator. Thus, CNDDR avoids amplifying the effect of gross errors and is robust to them. Fig. 6 shows the distribution of absolute errors between the estimated values and the actual values of the concentration of polymer; the absolute errors using CNDDR are much smaller than those using TNDDR. After the data are reconciled with CNDDR, GEDI is applied, and the results are shown in Fig. 7. In Fig. 7(a), most of the outliers in the total concentration of live polymers are detected correctly. However, under the influence of the biases, two outliers cannot be detected. Fig. 7(b) shows that outliers and a bias are simultaneously present in the zero moment; the proposed GEDI still correctly identifies both. The magnitude of the bias estimated by Eq. (21) is 1.5009, which is very close to the actual value. Fig. 7(c) shows the bias detection results of the first moment; the estimated magnitude of the bias is 1.5013. These results are also correct and accurate.

5.3. Case study 2: NDDR with outliers and drifts

In this case study, outliers combined with drifts are introduced into the simulation process simultaneously. The presence of outliers is the same as in the first case study; that is, six outliers are assumed in the two measured variables. Nonlinearly changing drifts are introduced in the zero moment and the first moment. The drifts start at time step 100 and increase over 50 time steps; the trend of the drifts is parabolic.
Fig. 12. Distribution of absolute errors between the estimated values and the true values of the concentration of polymer for (a) TNDDR and (b) CNDDR in the third case study.
Fig. 8 shows the comparison of the TNDDR and the proposed CNDDR results. The reconciled results by CNDDR are much more accurate than those by TNDDR: the accuracy of the TNDDR reconciliation is greatly affected by the outliers and drifts, whereas CNDDR is robust to both. Fig. 9 shows that the absolute errors between the estimated values and the actual values of the concentration of polymer by CNDDR are much lower than those by TNDDR; the estimated distribution of the concentration of polymer is more accurate using CNDDR. Fig. 10 shows the results of gross error detection using GEDI: both outliers and drifts are correctly detected and identified. Comparing the data reconciliation results of this case study with those of the first case study, the reconciled results by TNDDR in the first case study are worse than those in this case study, and for CNDDR the gross error detection results in this case study are more accurate than those in the first case study. Evidently, outliers combined with biases affect the reconciled values more severely.

5.4. Case study 3: NDDR with outliers, drifts and biases

In this case study, mixed gross errors containing outliers, drifts and biases are introduced into the simulation process. Two outliers are present in the total concentration of live polymers.
Fig. 13. Results of GEDI for (a) the total concentration of live polymers, (b) the zero moment, and (c) the first moment in the third case study.
Both drifts and biases are introduced in the zero moment and the first moment. The drifts start at time step 25, increase linearly at a rate of 0.03 degrees per sample time until time step 74, thereafter remain at a constant magnitude of 1.5 for 50 time steps, and finally decrease linearly over 50 time steps. The comparison of the TNDDR and the proposed CNDDR results is shown in Fig. 11. Since TNDDR is sensitive to gross errors, its reconciled results are much worse than those by CNDDR. However, the reconciled results by TNDDR are more accurate in this case study than in the first case study; the main reason is that the biases here last for a shorter time. Fig. 12 also shows that the distribution of the concentration of polymer estimated using CNDDR is more accurate. Because fewer outliers are present in this case study, the distribution of the concentration of polymer is estimated more accurately by TNDDR
than in either the first or the second case study. Fig. 13 shows the results of gross error detection using the proposed GEDI. Although mixed gross errors occur, including outliers, drifts and biases, the proposed GEDI strategies still correctly detect and identify each type of gross error. The estimated magnitudes of the biases for the two measured variables, the zero moment and the first moment, are 1.4990 and 1.5000, respectively, both of which are very close to the actual values. The results are encouraging. Data reconciliation using CNDDR is robust to the mixed gross errors and reconciles the measured variables more accurately, and GEDI successfully identifies the types of gross errors. The GEDI algorithm can be implemented as an agent compatible with work in conjunction with a smart supervisory system.

6. Conclusions

Previous research implemented NDDR and GEDI with iterative procedures, which exhibit the coupling effect of NDDR and GEDI. Algorithms with such trial procedures cannot be regarded as very robust: they become unstable with a large increase in the computational effort. A new approach to NDDR and GEDI is proposed and tested; it allows NDDR and GEDI to be solved sequentially without iterations. To sum up, the features of the CNDDR and GEDI algorithms for nonlinear dynamic data reconciliation and gross error detection and identification are as follows:

• The correntropy estimator is extended and used in nonlinear dynamic data reconciliation problems. The CNDDR formulation addresses a problem totally different from the linear steady-state data reconciliation problem considered in our previous work (Chen et al., 2013). Because the new NDDR formulation is based on correntropy rather than on the least squares estimator, the influence of gross errors on the reconciled results is decreased significantly more than with the TNDDR formulation. Thus, the measured data can be accurately reconciled regardless of gross errors, whether the systems are in transient or steady-state conditions.
• Using the temporal redundancy information, GEDI strategies for detecting and identifying mixed types of gross errors in nonlinear dynamic chemical processes are proposed. They can simultaneously detect and identify different types of gross errors, including outliers, biases and drifts, instead of the single type (outliers) in the linear steady-state process discussed in our previous work (Chen et al., 2013).
• Compared with other GEDI strategies for mixed types of gross errors in dynamic chemical processes, the proposed strategies do not need to detect the gross errors sequentially for all the measurement variables, and the reconciliation procedure need not run repeatedly. They thus reduce the computational load and time.

With a complex nonlinear dynamic process system, the free radical polymerization of styrene, NDDR simulations with gross errors are carried out. The outcome of the different tests shows that the proposed CNDDR and GEDI approaches are efficient in tackling the nonlinear dynamic problem. The mixed gross errors are correctly detected and identified when the measurement data points contain outliers, biases and drifts. Although the proposed CNDDR and GEDI algorithms have several good features, they still have some limitations.
With a complex nonlinear dynamic process system, the free radical polymerization of styrene, NDDR simulations with gross errors are carried out. The outcome of the different tests shows that the proposed CNDDR and GEDI approaches are efficient in tackling the nonlinear dynamic problem: the mixed gross errors are correctly detected and identified when the measurement data contain outliers, biases, and drifts. Although the proposed CNDDR and GEDI algorithms have several good features, they still have some limitations. First, several parameters affect the performance of the data reconciliation, including the size of the historical window, the use of scaling, and the initial guess of the optimal solution. The selection of these parameters should be studied to determine its effect on the efficiency of the proposed CNDDR and GEDI algorithms.
Moreover, most high-fidelity process models include undetermined parameters that have to be estimated from process measurement data. Thus, data reconciliation and parameter estimation in nonlinear dynamic processes with gross errors should be considered jointly. In this work, CNDDR is based on an accurate prior model, which is often not available in practice. Combining CNDDR with a fault detection and diagnosis (FDD) framework may be a new way to deal with both modeling uncertainty and measurement data uncertainty. This is worth exploring in the next phase of research.

Acknowledgment

The authors gratefully acknowledge the National Science Council, R.O.C. (NSC 102-2811-E-033-001), the National Natural Science Foundation of China (Nos. 61374167 and 51207112), and the Natural Science Foundation of Zhejiang Province (No. LQ14F030006) for financial support.

References

Abu-El-Zeet ZH, Becerra VM, Roberts PD. Combined bias and outlier identification in dynamic data reconciliation. Comput Chem Eng 2002;26(6):921–35.
Almasy GA, Sztano T. Checking and correction of measurements on the basis of linear system model. Probl Control Inf Theory 1975;4(1):57–69.
Alrowaie F, Gopaluni RB, Kwok KE. Fault detection and isolation in stochastic non-linear state-space models using particle filters. Control Eng Pract 2012;20(10):1016–32.
An L, Sepehri N. Hydraulic actuator leakage fault detection using extended Kalman filter. Int J Fluid Power 2005;6(1):41–51.
Arora N, Biegler LT. Redescending estimators for data reconciliation and parameter estimation. Comput Chem Eng 2001;25(11–12):1585–99.
Bagajewicz MJ, Jiang Q. Integral approach to plant linear dynamic reconciliation. AIChE J 1997;43(10):2546–58.
Bai S, McLean DD, Thibault J. Simultaneous measurement bias correction and dynamic data reconciliation. Can J Chem Eng 2007;85(1):111–7.
Chen B, Principe JC. Maximum correntropy estimation is a smoothed MAP estimation. IEEE Signal Process Lett 2012;19(8):491–4.
Chen J, Peng Y, Co Munoz J. Correntropy estimator for data reconciliation. Chem Eng Sci 2013;104:1019–27.
Chen J, Romagnoli JA. A strategy for simultaneous dynamic data reconciliation and outlier detection. Comput Chem Eng 1998;22(4):559–62.
Chen T, Morris J, Martin E. Particle filters for state and parameter estimation in batch processes. J Process Control 2005;15(6):665–73.
Chen T, Morris J, Martin E. Dynamic data rectification using particle filters. Comput Chem Eng 2008;32(3):451–62.
Crowe CM. The maximum-power test for gross errors in the original constraints in data reconciliation. Can J Chem Eng 1992;70(5):1030–6.
Dunia R, Qin SJ, Edgar TF, McAvoy TJ. Identification of faulty sensors using principal component analysis. AIChE J 1996;42(10):2797–812.
Ge Z, Song Z, Gao F. Review of recent research on data-based process monitoring. Ind Eng Chem Res 2013;52(10):3543–62.
Gonzalez R, Huang B, Xu F, Espejo A. Estimation of instrument variance and bias using Bayesian methods. Ind Eng Chem Res 2011;50(10):6229–39.
Gonzalez R, Huang B, Xu F, Espejo A. Dynamic Bayesian approach to gross error detection and compensation with application toward an oil sands process. Chem Eng Sci 2012;67(1):44–56.
Kong M, Chen B, Li B. An integral approach to dynamic data rectification. Comput Chem Eng 2000;24(2):749–53.
Kuehn DR, Davidson H. Computer control II. Mathematics of control. Chem Eng Prog 1961;57(6):44–7.
Leibman MJ, Edgar TF, Lasdon LS. Efficient data reconciliation and estimation for dynamic processes using nonlinear programming techniques. Comput Chem Eng 1992;16(10):963–86.
Liu W, Pokharel PP, Príncipe JC. Correntropy: properties and applications in non-Gaussian signal processing. IEEE Trans Signal Process 2007;55(11):5286–98.
Mah RS, Stanley GM, Downing DM. Reconciliation and rectification of process flow and inventory data. Ind Eng Chem Process Des Dev 1976;15(1):175–83.
Mah RSH, Tamhane AC. Detection of gross errors in process data. AIChE J 1982;28(5):828–30.
McBrayer KF, Edgar TF. Bias detection and estimation in dynamic data reconciliation. J Process Control 1995;5(4):285–9.
Miao Y, Su H, Wang W, Chu J. Simultaneous data reconciliation and joint bias and leak estimation based on support vector regression. Comput Chem Eng 2011;35(10):2141–51.
Martinez Prata D, Schwaab M, Luis Lima E, Carlos Pinto J. Simultaneous robust data reconciliation and gross error detection through particle swarm optimization for an industrial polypropylene reactor. Chem Eng Sci 2010;65(17):4943–54.
Narasimhan S, Jordache C. Data reconciliation and gross error detection: an intelligent use of process data. Houston: Gulf Professional Publishing; 2000.
Narasimhan S, Mah RSH. Generalized likelihood ratio method for gross error identification. AIChE J 1987;33(9):1514–21.
Nicholson B, López-Negrete R, Biegler LT. On-line state estimation of nonlinear dynamic systems with gross errors. Comput Chem Eng 2014;70:149–59.
Özyurt DB, Pike RW. Theory and practice of simultaneous data reconciliation and gross error detection for chemical processes. Comput Chem Eng 2004;28(3):381–402.
Pourbabaee B, Meskin N, Khorasani K. Multiple-model based sensor fault diagnosis using hybrid Kalman filter approach for nonlinear gas turbine engines. In: American control conference (ACC); 2013. p. 4717–23.
Ramamurthi Y, Sistu PB, Bequette BW. Control-relevant dynamic data reconciliation and parameter estimation. Comput Chem Eng 1993;17(1):41–59.
Romanenko A, Castro JA. The unscented filter as an alternative to the EKF for nonlinear state estimation: a simulation case study. Comput Chem Eng 2004;28(3):347–55.
Romanenko A, Santos LO, Afonso PA. Unscented Kalman filtering of a simulated pH system. Ind Eng Chem Res 2004;43(23):7531–8.
Romagnoli JA, Stephanopoulos G. Rectification of process measurement data in the presence of gross errors. Chem Eng Sci 1981;36(11):1849–63.
Rollins DK, Cheng Y, Devanathan S. Intelligent selection of hypothesis tests to enhance gross error identification. Comput Chem Eng 1996;20(5):517–30.
Sadough Vanini ZN, Khorasani K, Meskin N. Fault detection and isolation of a dual spool gas turbine engine using dynamic neural networks and multiple model approach. Inf Sci 2014;259:234–51.
Sage AP, Melsa JL. Estimation theory with applications to communications and control. New York: McGraw-Hill; 1971.
Samy I, Postlethwaite I, Gu DW. Survey and application of sensor fault detection and isolation schemes. Control Eng Pract 2011;19(7):658–74.
Schmidt AD, Ray WH. The dynamic behavior of continuous polymerization reactors—I: Isothermal solution polymerization in a CSTR. Chem Eng Sci 1981;36(8):1401–10.
Serth RW, Heenan WA. Gross error detection and data reconciliation in steam-metering systems. AIChE J 1986;32(5):733–42.
Silva JCD, Saxena A, Balaban E, Goebel K. A knowledge-based system approach for sensor fault modeling: detection and mitigation. Expert Syst Appl 2012;39(12):10977–89.
Singhal A, Seborg DE. Dynamic data rectification using the expectation maximization algorithm. AIChE J 2000;46(8):1556–65.
Stanley GM, Mah RSH. Estimation of flows and temperatures in process networks. AIChE J 1977;23(5):642–50.
Tong H, Crowe CM. Detection of gross errors in data reconciliation by principal component analysis. AIChE J 1995;41(7):1712–22.
Wei X, Verhaegen M, van Engelen T. Sensor fault detection and isolation for wind turbines based on subspace identification and Kalman filter techniques. Int J Adapt Control Signal Process 2010;24(8):687–707.
Xu H, Rong G. A new framework for data reconciliation and measurement bias identification in generalized linear dynamic systems. AIChE J 2010;56(7):1787–800.
Zhang D, Wang QG, Yu L, Song H. Fuzzy-model-based fault detection for a class of nonlinear systems with networked measurements. IEEE Trans Instrum Meas 2013;62(12):3148–59.
Zhang X. Sensor bias fault detection and isolation in a class of nonlinear uncertain systems using adaptive estimation. IEEE Trans Autom Control 2011;56(5):1220–6.
Zhang Z, Shao Z, Chen X, Wang K, Qian J. Quasi-weighted least squares estimator for data reconciliation. Comput Chem Eng 2010;34(2):154–62.