Computers and Chemical Engineering 25 (2001) 1585–1599 www.elsevier.com/locate/compchemeng
Redescending estimators for data reconciliation and parameter estimation

Nikhil Arora, Lorenz T. Biegler*

Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, PA 15213-3890, USA

Received 30 October 2000; received in revised form 27 June 2001; accepted 29 June 2001

* Corresponding author. Tel.: +1-412-268-2232; fax: +1-412-268-7139. E-mail address: [email protected] (L.T. Biegler).
Abstract

Gross error detection is crucial for data reconciliation and parameter estimation, as gross errors can severely bias the estimates and the reconciled data. Robust estimators significantly reduce the effect of gross errors (or outliers) and yield less biased estimates. An important class of robust estimators are maximum likelihood estimators, or M-estimators. These are commonly of two types, Huber estimators and Hampel estimators. The former significantly reduces the effect of large outliers whereas the latter nullifies their effect. In particular, these two estimators can be evaluated through the use of an influence function, which quantifies the effect of an observation on the estimated statistic. Here, the influence function must be bounded and finite for an estimator to be robust. For Hampel estimators the influence function becomes zero for large outliers, nullifying their effect. On the other hand, Huber estimators do not reject large outliers; their influence function is merely bounded. As a result, we consider the three-part redescending estimator of Hampel and compare its performance with a Huber estimator, the Fair function. A major advantage of redescending estimators is that outliers can be identified without performing any exploratory data analysis on the residuals of regression: the outliers are simply the rejected observations. In this study, the redescending estimators are also tuned to the particular observed system data through an iterative procedure based on the Akaike information criterion (AIC). This approach is not easily afforded by Huber estimators, and it can have a significant impact on the estimation. The resulting approach is incorporated within an efficient non-linear programming algorithm. Finally, all of these features are demonstrated on a number of process and literature examples for data reconciliation. © 2001 Elsevier Science Ltd. All rights reserved.

Keywords: Redescending estimator; Data reconciliation; Parameter estimation
1. Introduction

Data reconciliation and parameter estimation are important components of model fitting, validation, and real-time optimization in the chemical industries. In its most general form, data reconciliation is a minimization of measurement errors subject to satisfying the constraints of the process model. Parameter estimation is the step after data reconciliation in which the reconciled values of the process variables are used to set values for the model parameters (Marlin & Hrymak, 1997; Perkins, 1998). This inefficient two-step approach has led to the development of simultaneous strategies for data reconciliation and parameter estimation (DRPE; Tjoa, 1991). The most commonly used formulation of a simultaneous DRPE problem is:

min (least squares error in measurements)
s.t. model constraints and bounds    (1.1)

Formulation (1.1) is based on the assumption that measurements have normally distributed random errors, in which case least squares is the maximum likelihood estimator. The problem is compounded when gross errors or biases are present in the data, as these can lead to incorrect estimates and severely bias the reconciliation of the other measurements. Gross errors can arise for a plethora of reasons, such as broken gauges, process leaks, improper use of measuring devices, and random sources, such as an operator preparing the process log
(Albuquerque & Biegler, 1996). A number of methods have been devised to identify and negate the effects of gross errors. These range from sequential or combinatorial methods to simultaneous data reconciliation and gross error detection. Crowe (1994) provides a good summary of the evolution of various methods for gross error detection.
1.1. Simultaneous data reconciliation and parameter estimation

The general form of a simultaneous DRPE problem is:

$$\min_{x,u,p} \; F(x^M, x) \tag{1.2}$$
$$\text{s.t.} \quad h(x, u, p) = 0, \quad g(x, u, p) \le 0,$$
$$x^L \le x \le x^U, \quad p^L \le p \le p^U, \quad u^L \le u \le u^U,$$

where F is some objective function dependent upon the difference between the measurement of a variable and its value, for all measured variables; x^M is the set of measurement data for the corresponding variables x; p is the set of parameters; u is the set of unmeasured variables; h is the set of equality constraints; and g is the set of inequality constraints. In this case, if there are multiple measurements of a variable, its reconciled value would lie somewhere in between its successive measurements. This leads to a smaller variance of the reconciled variables and also reduces their sensitivity to gross error detection tests. If we assume all measurements change with each data set, the problem is an errors-in-variables-measured (EVM) problem, formulated as:

$$\min_{x_i,u_i,p} \; \sum_{i=1}^{m} F_i(x_i^M, x_i) \tag{1.3}$$
$$\text{s.t.} \quad h(x_i, u_i, p) = 0, \quad g(x_i, u_i, p) \le 0,$$
$$x^L \le x_i \le x^U, \quad p^L \le p \le p^U, \quad u^L \le u_i \le u^U, \quad \forall i,$$
where the subscript i refers to the ith measurement set and the rest of the symbols have the same meaning as in Eq. (1.2). Finally, if the model comes from a discretized differential algebraic equation (DAE) system, then time is generally incorporated into the constraints and the objective function. The constraints discretized successively in time may not be decoupled in this case as in Eqs. (1.2) and (1.3). The problem in Eq. (1.3) is generally much larger than the problem in Eq. (1.2), and this can be a significant issue if the problem in Eq. (1.2) is itself large.
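As a concrete, if toy, instance of formulation (1.2), the following Python sketch reconciles a set of measured flows subject to a linear balance constraint h(x) = Ax = 0, using a weighted least squares objective and SciPy's SLSQP solver. The data and constraint matrix are invented for illustration; the robust objectives of Section 4 could be substituted for the least squares term.

```python
import numpy as np
from scipy.optimize import minimize

# Toy balance around one node: x1 + x2 - x3 = 0 (all flows measured)
A = np.array([[1.0, 1.0, -1.0]])
x_meas = np.array([10.3, 4.9, 15.6])   # hypothetical measurements
sigma = 0.025 * x_meas                  # assumed 2.5% standard deviations

def objective(x):
    # Weighted least squares error in the measurements, cf. Eq. (1.1)
    return 0.5 * np.sum(((x - x_meas) / sigma) ** 2)

res = minimize(objective, x_meas, method="SLSQP",
               constraints=[{"type": "eq", "fun": lambda x: A @ x}],
               bounds=[(0.0, None)] * 3)
print(res.x)  # reconciled flows that satisfy the node balance exactly
```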
If there are biases or outliers in the data, the least squares objective function would be severely biased, leading to incorrect reconciliation and estimation. The common procedure is to identify the measurements that suffer from gross errors and to account for them. Yamamura, Nakajima and Matsuyama (1988) applied the Akaike information criterion (AIC) to identify biased measurements in a least squares framework for gross error detection. Due to the combinatorial nature of the problem, they suggested a branch and bound method for its solution. This combinatorial algorithm for data reconciliation can be automated by mixed integer programming techniques (see e.g. Soderstrom, Himmelblau & Edgar, 2000), but this is computationally expensive as it requires a discrete decision for each measurement. The problems become even more difficult when the model constraints, the objective function, or both are non-linear.

Albuquerque and Biegler (1996) used a robust M-estimator, the Fair function, to reduce the effects of gross errors. They used boxplots to identify outliers from the residuals of regression, and they also reviewed the advantages of using estimators based on robust statistics. Here a key advantage for outlier detection is the elimination of the combinatorial procedure. Estimators derived from robust statistics can be used as objective functions in simultaneous DRPE problems. These estimators put less weight on large residuals corresponding to outliers. This results in less biased parameter estimates and reconciled values. Statistical tests can then be applied to identify outliers (Hoaglin, Mosteller & Tukey, 1983). Commonly used robust estimators include the M-estimators of Huber (1981), Rey (1988), Hampel (1974) and Hoaglin et al. (1983).

Before data reconciliation is performed, it is beneficial to identify the unobservable variables and parameters, and the non-redundant measurements. Gross errors in non-redundant measurements can lead to unobservable values and non-unique solutions, rendering the estimates and fitted values useless. Stanley and Mah (1981) and later Crowe (1989) proposed observability and redundancy tests for steady state data reconciliation. Albuquerque and Biegler (1996) extended these to dynamic systems. To simplify and speed up the calculations they applied a sparse LU decomposition rather than a QR factorization; we use the same approach in this paper.

The presence of many different methods for DRPE raises the question of whether there is a common statistical framework in which both combinatorial and robust methods can be interpreted. In an attempt to provide an answer, we consider the AIC as a general framework for DRPE. In the following sections we
describe the AIC and its applications to combinatorial and robust approaches to gross error detection. In Section 3, we describe mixed integer programming approaches. In Section 4, we describe properties of robust estimators and introduce a Huber estimator, the Fair function, and a Hampel estimator, the three part redescending estimator. We also suggest methods of outlier detection. With these concepts, we provide a simple algorithm for tuning the redescending estimator based on the AIC. In Section 5, we solve a DRPE problem with linear model constraints, compare the performance of the robust estimators with mixed integer approaches, and show the similarity between the two approaches. We then solve two non-linear examples and also compare the performance of various estimators. Finally, we conclude the paper in Section 6.
2. DRPE with the Akaike information criterion

Data reconciliation and gross error detection can be addressed as a model discrimination and parameter estimation problem, where multiple models correspond to the partitioning of random and gross errors. If more than one of these models can be fitted to the data under consideration, it becomes necessary to identify which model to use. To this end, one is interested in obtaining the most likely model and its parameters. Since maximum likelihood estimators are asymptotically efficient under certain conditions (Akaike, 1974), the likelihood function is a very sensitive criterion of deviation of model parameters from their true values. The AIC is an estimate of the Kullback–Leibler mean information for the distance between the true model and the model under consideration. It is given by:

AIC = −2 log(maximum likelihood) + 2 (number of independently adjusted parameters within the model)    (2.1)

and can be re-written as:

$$\mathrm{AIC} = E(S) = 2 \sum_{i=1}^{N} \left[ -\log l(\epsilon(i, p), i, p) \right] + 2 \dim(p), \tag{2.2}$$

where E is the expectation, ε is the measurement error obtained after reconciliation, i is the observation index, l is the likelihood function, and dim(p) is the number of independently adjusted model parameters. For data reconciliation, we consider the total number of parameters to be given by:

$$\dim(p) = \dim(p_0) + n_{out}, \tag{2.3}$$

where dim(p0) is the number of model parameters and nout is the number of outliers. Here variables with outlying measurements are treated as parameters because their reconciled values are adjusted only from measurements having no gross errors. For this paper, we consider the likelihood function to be the least squares function formed after removing the outliers. We shall observe later that the AIC also offers a novel method to tune redescending estimators for efficient data reconciliation.

3. Mixed integer approaches for data reconciliation

Yamamura et al. (1988) used the AIC for DRPE on a linear system. They divided the set of measurement sensors (J) into faulty (F) and non-faulty (J−F) sets. For the faulty sensors they estimated the biases in the following format:

$$\min_{r_j, v_j, F} \; \frac{1}{2} \sum_{j \in J-F} r_j^2 + \frac{1}{2} \sum_{j \in F} (r_j - v_j)^2 + |F| \tag{3.1}$$
$$\text{s.t.} \quad \sum_{j \in J} a_{ij} r_j = b_i, \quad i \in I.$$

Here r_j are the studentized residuals, v_j are the biases scaled by the instrument standard deviations, and I is the set of equations resulting from elimination of the measured variables. To systematically select F, they devised a branch-and-bound strategy: a set of biased sensors was selected and Eq. (3.1) was solved, which constituted the branching operation. For bounding the objective functions, they divided the power set of J into two non-intersecting subsets constituting faulty and non-faulty instruments. By solving Eq. (3.1) for these subsets they successively improved the lower bound, which is the objective function of the above subproblems.

The above combinatorial algorithm can easily be translated into a mixed integer non-linear program (MINLP) with binary variables identifying the faulty sensors. Using a linear model of the measured variables, we state the MINLP as:

$$\min_{x_i, v_i, y_i} \; \sum_{i=1}^{n} \left( \frac{x_i^M - x_i}{\sigma_i} - \frac{v_i}{\sigma_i} \right)^2 + 2 \sum_{i=1}^{n} y_i \tag{3.2}$$
$$\text{s.t.} \quad Ax = 0,$$
$$|v_i| \le U_i y_i, \qquad |v_i| \ge L_i y_i,$$
$$x_i \ge 0, \qquad y_i \in \{0, 1\},$$

where n is the number of measured variables, x_i^M is the measurement of the ith variable, x_i is the reconciled value of the ith variable, σ_i is the standard deviation of the ith variable, y_i is a binary variable denoting the existence of a bias in the ith variable, A is the constraint matrix, v_i is the magnitude of the bias in the ith variable, and L_i and U_i are the lower and upper bounds on the bias in the ith variable. If we use an MINLP solver such as DICOPT to solve Eq. (3.2), we cannot always guarantee that all the intermediate mixed integer linear programs (MILPs) will be feasible and bounded. To ensure boundedness of the MILPs we add bound constraints of the form:
$$x_i \le X_i. \tag{3.3}$$
Also, due to the presence of the absolute value operator in the bound constraints of Eq. (3.2), we have to reformulate the problem in order to remove the non-differentiability caused by this operator:

$$\min_{x_i, v_i, y_i, z_i} \; \sum_{i=1}^{n} \left( \frac{x_i^M - x_i}{\sigma_i} - \frac{v_i}{\sigma_i} \right)^2 + 2 \sum_{i=1}^{n} y_i \tag{3.4}$$
$$\text{s.t.} \quad Ax = 0,$$
$$v_i \le U_i y_i, \qquad -v_i \le U_i y_i,$$
$$v_i - z_i U_i - z_i L_i + L_i y_i \le 0,$$
$$-v_i + z_i U_i + z_i L_i + L_i y_i \le L_i + U_i,$$
$$z_i \le y_i, \qquad 0 \le x_i \le X_i,$$
$$y_i, z_i \in \{0, 1\}.$$

Here, z_i is a binary variable for the sign of the bias value v_i for the ith variable. Recently Soderstrom et al. (2000) devised an MILP approach to minimize an objective function similar to the AIC. The advantage of an MILP is that it eliminates the non-linear programming subproblems associated with the MINLP algorithm. The quadratic term in Eq. (3.4) is replaced by an l1 norm and a penalty is added:

$$\min_{x_i, v_i, y_i, z_i} \; \sum_{i=1}^{n} \left| \frac{x_i^M - x_i}{\sigma_i} - \frac{v_i}{\sigma_i} \right| + \sum_{i=1}^{n} w_i y_i \tag{3.5}$$
$$\text{s.t.} \quad Ax = 0,$$
$$v_i \le U_i y_i, \qquad -v_i \le U_i y_i,$$
$$v_i - z_i U_i - z_i L_i + L_i y_i \le 0,$$
$$-v_i + z_i U_i + z_i L_i + L_i y_i \le L_i + U_i,$$
$$z_i \le y_i, \qquad 0 \le x_i \le X_i,$$
$$y_i, z_i \in \{0, 1\},$$

with w_i being 'weight' functions that penalize the identification of too many biases. The non-differentiability caused by the l1 norm is removed by rewriting the argument of the absolute value as the difference of two non-negative numbers:

$$x_i^M - (x_i + v_i) = r_i - q_i, \tag{3.6}$$
$$q_i, r_i \ge 0. \tag{3.7}$$

Problem (3.5) contains a more robust objective function but does not directly minimize the AIC. Also, the choice of the weight functions may be arbitrary, and the AIC is a more complete measure of model fitting as it also includes the maximum likelihood on good data. In the next section, we consider an alternative to solving the MINLP by using robust statistics and also relate this approach back to the AIC.

Both the MILP and the MINLP are suitable for problems with only a few variables. For large problems such as EVM problems, the combinatorial overhead is too great to justify their use, especially on-line. Also, if there are non-linear constraints in the problems, the computational overhead of the MINLP can be large. In what follows, we shall observe that not only do robust estimators perform well, they also significantly reduce the computational overhead associated with mixed integer approaches. They are also suitable for large EVM problems because they allow efficient non-linear programming algorithms to be applied.
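Since both the combinatorial and the robust approaches will be judged on the AIC, Eqs. (2.2)–(2.3) can be made concrete with a short sketch. The Python function below (our own illustration; the function name and Gaussian log-likelihood form are assumptions, not code from the paper) evaluates the AIC for a candidate reconciliation, treating the likelihood as the least squares function on the inlying residuals as described in Section 2.

```python
import numpy as np

def aic_for_reconciliation(residuals, sigma, outlier_mask, dim_p0):
    """Evaluate Eqs. (2.2)-(2.3): AIC = 2*(negative log-likelihood on
    the inliers) + 2*(model parameters + number of outliers).

    residuals    : array of reconciled residuals (x_meas - x_recon)
    sigma        : array of measurement standard deviations
    outlier_mask : boolean array, True where a measurement is flagged
    dim_p0       : number of model parameters
    """
    inliers = ~outlier_mask
    r = residuals[inliers] / sigma[inliers]
    # Gaussian negative log-likelihood on the inlying residuals
    # (constant terms dropped, since only AIC differences matter)
    nll = 0.5 * np.sum(r ** 2)
    n_out = int(np.sum(outlier_mask))
    return 2.0 * nll + 2.0 * (dim_p0 + n_out)
```

Flagging one more measurement as an outlier removes its residual from the likelihood term but adds a penalty of 2, mirroring the trade-off that the branch-and-bound and MINLP formulations of Eqs. (3.1) and (3.2) resolve combinatorially.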
4. Robust statistics and estimators based on robust statistics

In most DRPE problems data are assumed to follow a certain distribution, which often determines the objective functions used. Data are mostly assumed to follow the normal distribution, in which case the associated likelihood function is the least squares estimator. It is often difficult to determine the distribution of data corrupted with outliers, so the use of estimators derived from a fixed probability distribution is not justified. In such cases, robust estimators may be used. Robust estimators are largely distribution independent and produce unbiased results in the presence of data derived from an ideal distribution (say, normal), but are insensitive to deviations from ideality (Albuquerque & Biegler, 1996). These estimators put less weight on outlying measurements, and this protects the other measurements from being corrupted.

Let (x1, …, xn) be derived from a distribution f(x), T be the estimator of parameter p, and a(x) be an approximate model of the distribution of x. The unbiased estimate of p is p̂ = T[f(x)] and its approximate estimate is p̃ = T[a(x)]. Let the distributions of these estimates of p be Y(p̂, f) and Y(p̃, a). The estimator T[·] is robust iff

$$\forall \mu, \; \exists \lambda : \; d(f, a) < \mu \;\Rightarrow\; d[Y(\hat{p}, f), Y(\tilde{p}, a)] < \lambda, \tag{4.1}$$

where d(·) is a distance function (Albuquerque & Biegler, 1996). This means that a finite difference between the true and assumed p.d.f. leads to a bounded difference in the parameter estimates.

The influence function of an estimator is the effect of an observation on the estimates obtained. If the residuals ε of an estimation process are drawn from a probability distribution f(ε) and if T[f(ε)] is the unbiased estimate of a parameter p, then the influence function of a residual ε0 is given by:

$$\mathrm{IF}(\epsilon_0) = \lim_{t \to 0} \frac{T[(1-t)f + t\,\delta(\epsilon - \epsilon_0)] - T[f]}{t}. \tag{4.2}$$
Here, δ(ε − ε0) is the delta function centered about ε0. The influence functions of robust estimators are bounded. In this paper we compare the least squares estimator, the Fair function (Rey, 1988) and the three-part redescending estimator of Hampel, which are all M-estimators.
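As a quick numerical illustration of Eq. (4.2) (our own example, not from the paper), the influence function can be approximated by contaminating an empirical sample with a point mass at ε0 and differencing the estimates. For the sample mean the influence grows without bound in ε0; for the median it stays bounded, which is the behavior Eq. (4.1) formalizes.

```python
import numpy as np

def empirical_influence(estimator, sample, eps0, t=1e-3):
    """Finite-contamination approximation of Eq. (4.2):
    IF(eps0) ~= (T[(1-t) f + t delta(eps - eps0)] - T[f]) / t."""
    n = len(sample)
    n_add = max(1, int(round(t * n / (1.0 - t))))  # point-mass weight ~ t
    contaminated = np.concatenate([sample, np.full(n_add, eps0)])
    return (estimator(contaminated) - estimator(sample)) / t

rng = np.random.default_rng(0)
sample = rng.normal(0.0, 1.0, size=100_000)
for eps0 in (1.0, 10.0, 100.0):
    print(eps0,
          empirical_influence(np.mean, sample, eps0),    # grows with eps0
          empirical_influence(np.median, sample, eps0))  # stays bounded
```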
4.1. M-estimators: properties and examples

We now discuss the form and properties of M-estimators for the objective function F in Eqs. (1.2) and (1.3). If l(x_i | p) is the likelihood function of an observation x_i dependent upon parameters p, then the overall likelihood function L of the errors in N observations is given by:

$$L = \prod_{i=1}^{N} l(x_i \mid p). \tag{4.3}$$

The M-estimator associated with this likelihood function is its negative logarithm, i.e.

$$z^M = \sum_{i=1}^{N} z_i = -\log(L) = -\sum_{i=1}^{N} \log l(x_i \mid p) = F \;\;\text{or}\;\; \sum_{i=1}^{N} F_i, \tag{4.4}$$

depending on whether the problem is in the standard form or the EVM form, respectively. Here z^M is the overall M-estimator and z_i is the estimator associated with the ith observation. Two particular M-estimators include the Fair function, a Huber (1981) estimator, and the redescending estimator proposed by Hampel (1974). These M-estimators are defined as follows.

Least squares estimator:

$$\rho_i^{L} = \frac{1}{2} \epsilon_i^2, \tag{4.5}$$

Fair function:

$$\rho_i^{F} = C^2 \left[ \frac{|\epsilon_i|}{C} - \log\left(1 + \frac{|\epsilon_i|}{C}\right) \right], \tag{4.6}$$

Three-part redescending estimator of Hampel:

$$\rho_i^{H} = \begin{cases} \dfrac{1}{2} \epsilon_i^2, & 0 \le |\epsilon_i| \le a \\[4pt] a|\epsilon_i| - \dfrac{a^2}{2}, & a < |\epsilon_i| \le b \\[4pt] ab - \dfrac{a^2}{2} + (c - b)\dfrac{a}{2} \left[ 1 - \left( \dfrac{c - |\epsilon_i|}{c - b} \right)^2 \right], & b < |\epsilon_i| \le c \\[4pt] ab - \dfrac{a^2}{2} + (c - b)\dfrac{a}{2}, & |\epsilon_i| > c \end{cases} \tag{4.7}$$

where ε_i is the ith residual of regression, and the tuning constants satisfy

$$c \ge b + 2a. \tag{4.8}$$

Fig. 1. Plots of the three M-estimators considered.
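The three ρ functions of Eqs. (4.5)–(4.7) are straightforward to implement. The sketch below (our own illustration, with hypothetical function names) can serve as the per-residual objective terms F_i in Eqs. (1.2) and (1.3). The Hampel branches match at the breakpoints a, b and c, so ρ is continuous, and it is constant beyond c.

```python
import numpy as np

def rho_least_squares(eps):
    return 0.5 * eps**2                                   # Eq. (4.5)

def rho_fair(eps, C=1.0):
    r = np.abs(eps) / C
    return C**2 * (r - np.log1p(r))                       # Eq. (4.6)

def rho_hampel(eps, a, b, c):
    """Three-part redescending estimator, Eq. (4.7); needs c >= b + 2a."""
    r = np.abs(eps)
    quad = 0.5 * r**2                                     # |eps| <= a
    lin = a * r - 0.5 * a**2                              # a < |eps| <= b
    desc = (a * b - 0.5 * a**2
            + (c - b) * (a / 2) * (1 - ((c - r) / (c - b))**2))  # b < |eps| <= c
    const = a * b - 0.5 * a**2 + (c - b) * (a / 2)        # |eps| > c
    return np.where(r <= a, quad,
           np.where(r <= b, lin,
           np.where(r <= c, desc, const)))
```

In a smoothed NLP implementation the absolute values and branch points would be replaced by a smoothing function, cf. the Chen and Mangasarian (1995) smoothing used in Section 5.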
Here C is the tuning constant for the Fair function; a, b, and c are the tuning constants for the redescending estimator; and least squares is the only non-robust estimator among the three. In Fig. 1, we depict plots of the above M-estimators. Observe that the Fair function increases only linearly for large residuals. This gives it some robustness compared with the least squares estimator because large residuals are given lower influence. The redescending estimator becomes constant for large observations; thus large residuals have zero influence, which makes the redescending estimator very robust (i.e. IF → 0 as ε0 → ∞). Both the Fair function and the redescending estimator are approximately least squares for small residuals. This gives these estimators high efficiency for data derived from a Gaussian distribution.

Fig. 2 depicts the probability distribution of the redescending estimator for a = 1, b = 2, and c = 4. Notice that the central part of the distribution is derived from a normal distribution, which is explained by the estimator being a least squares function on [−a, a]. The probability falls rapidly on [−c, −b] and [b, c] until it becomes constant on (−∞, −c) and (c, ∞). The long tail is responsible for the robustness.

Fig. 2. Probability plot of the three-part redescending estimator.

Notice that both the least squares estimator and the Fair function are convex functions whereas the redescending estimator is nonconvex. In many cases, one would need to initialize the redescending estimator by performing DRPE with least squares or the Fair function and, depending on the quality of the local solutions, one may have to resort to global optimization.

The redescending estimator has the capability of being tuned to the system under investigation by setting/obtaining the best possible values for a, b and c. This would enable the 'best' possible fit to the data and also detect the outliers. However, it is not
clear how to test the goodness of fit without prior knowledge of the process, i.e. the past data or some kind of expected performance. In tuning the estimator, it is deemed necessary to estimate two of the three parameters. The third parameter can be assigned a workable lower/upper bound by Eq. (4.8). In the remainder of this section, we apply the AIC to tune this robust M-estimator.
4.2. Tuning the robust estimators

The Fair function contains a single tuning constant C that is related to the asymptotic efficiency of the estimator (Rey, 1988) and allows the user to balance insensitivity to outliers against this efficiency (Albuquerque, 1996). On the other hand, we will tune the constants for the redescending estimator by minimizing the AIC associated with applications of this estimator. From Eq. (4.7) we see that the constant a is related to the amount of contamination in the data, ε. One could also tune c, consider some relation between b and c, and obtain a from Eq. (4.8). Here the relationship of b to a or c depends upon the desired size of the redescending region (see Fig. 1). Setting b = c would make the estimator lose its resistance to rounding and grouping errors. For samples drawn largely from a Gaussian distribution, we want very few data items to lie outside [−b, b]; in this case it would be acceptable to have large values of b, with b close to c. However, if samples are drawn from distributions with tails heavier than Gaussian, it is advisable to keep b small (Hoaglin et al., 1983). All these decisions require some preprocessing of data, which may be substantial. Here we present a simple tuning strategy which requires no
preprocessing and is dependent upon the slope of the redescending region. We identify the best estimator by a two-step procedure. First, various estimators resulting from different combinations of a, b, and c are considered so that the minimum of the resulting AIC can be bracketed. Bounds for the location of the minimum are then selected and a golden section search (Edgar & Himmelblau, 1988; Fletcher, 1987) is performed over the value of c to obtain the best estimator. The complete procedure is:

1. Select a wide range of redescending estimators by setting a, b, and c, taking care to satisfy Eq. (4.8).
2. Perform data reconciliation with each set of constants and identify outliers by Eq. (4.10).
3. Calculate the AIC and store its values and/or plot them.
4. We now know the approximate location of the minimum. Select upper and lower bounds on the values of c within which the minimum seems to be located.
5. Perform a golden section search for the precise value of c. In all iterations of the golden section search, use

$$b = \frac{c}{2} \quad \text{and} \quad a = \frac{(c - b)}{2}. \tag{4.9}$$

6. The estimator eventually selected by the golden section search is the estimator best suited to the data.

The tuning process outlined above is not optimal because search methods for selecting values of b given c or a could differ from Eq. (4.9). In fact, this selection could be extended to a multidimensional search in a, b and c. However, because of the relatively minor influence of a and b on the objective function, we believe that Eq. (4.9) leads to a reasonably good tuning; a sketch of the search loop follows.
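The following minimal Python sketch of steps 4–5 is our own illustration: `reconcile_and_aic` is a hypothetical callback that solves the DRPE problem for given (a, b, c), flags outliers via Eq. (4.10), and returns the AIC of Eq. (2.2); the golden section loop itself follows the standard unimodal-bracket recipe.

```python
import math

def tune_c(reconcile_and_aic, c_lo, c_hi, tol=1e-2):
    """Golden section search over c, with b = c/2 and a = (c - b)/2
    as in Eq. (4.9), so c = b + 2a holds with equality (Eq. (4.8))."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0  # 1/golden ratio ~ 0.618

    def aic_of(c):
        b = c / 2.0
        a = (c - b) / 2.0
        return reconcile_and_aic(a, b, c)

    c1 = c_hi - invphi * (c_hi - c_lo)
    c2 = c_lo + invphi * (c_hi - c_lo)
    f1, f2 = aic_of(c1), aic_of(c2)
    while c_hi - c_lo > tol:
        if f1 < f2:                      # minimum lies in [c_lo, c2]
            c_hi, c2, f2 = c2, c1, f1
            c1 = c_hi - invphi * (c_hi - c_lo)
            f1 = aic_of(c1)
        else:                            # minimum lies in [c1, c_hi]
            c_lo, c1, f1 = c1, c2, f2
            c2 = c_lo + invphi * (c_hi - c_lo)
            f2 = aic_of(c2)
    return 0.5 * (c_lo + c_hi)
```

Each iteration reuses one interior AIC evaluation, so only one new DRPE solve is needed per iteration, which matters since every evaluation is itself an NLP.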
4.3. Outlier detection

We obtain the residuals of regression after DRPE. An analysis of these residuals helps identify outlying measurements, and different methods of outlier detection exist for different estimators. For the redescending estimator, since we have an explicit cutoff point, c, for outliers, an observation is deemed to have an outlier if its corresponding residual satisfies

$$|\epsilon| \ge c. \tag{4.10}$$

No such cutoff points are available for the Fair function or least squares. We thus have to resort to exploratory data analysis (EDA; Hoaglin et al., 1983) to identify outliers (Albuquerque & Biegler, 1996). The EDA method we have used is the boxplot. In order to construct boxplots, data are first sorted and their fourth spreads are calculated as:

$$d_F = F_U - F_L, \tag{4.11}$$

where F_U and F_L correspond to the upper and lower fourths, respectively. Now, an observation x is deemed an outlier if

$$x \le F_L - \frac{3}{2} d_F \tag{4.12}$$

or

$$x \ge F_U + \frac{3}{2} d_F. \tag{4.13}$$
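A small sketch of the boxplot rule of Eqs. (4.11)–(4.13) follows (our own illustration; here the fourths are approximated by the 25th and 75th percentiles):

```python
import numpy as np

def boxplot_outliers(residuals):
    """Flag residuals outside [F_L - 1.5*d_F, F_U + 1.5*d_F],
    per Eqs. (4.11)-(4.13)."""
    f_lower, f_upper = np.percentile(residuals, [25, 75])
    d_f = f_upper - f_lower                    # fourth spread, Eq. (4.11)
    lo = f_lower - 1.5 * d_f                   # Eq. (4.12)
    hi = f_upper + 1.5 * d_f                   # Eq. (4.13)
    return (residuals <= lo) | (residuals >= hi)
```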
5. Examples

We shall now motivate the benefits of using redescending estimators for DRPE with three examples taken from the literature. The first example is a linearly constrained stream metering problem through which we motivate the benefits of robust estimators, with emphasis on redescending estimators. We also compare the performance of the mixed integer approaches on this problem. In the second and third examples, we tune the redescending estimator based on the AIC. The last two examples are non-linear EVM problems. All examples have been solved on a Pentium III machine with dual 800 MHz CPUs or a Pentium III machine with a single 667 MHz CPU, both running Linux.
5.1. Linear example: stream metering

This problem was first presented by Serth and Heenan (1986) and subsequently used by Narasimhan and Mah (1987), Rollins, Cheng and Devanathan (1996) and Soderstrom et al. (2000). The problem is a stream metering process with 28 streams flowing in and out of 11 nodes, as depicted in Fig. 3. The flowrates of the streams are measured at the nodes. The problem has 11 equality constraints and 28 variables, all of which are measured, and all of the measurements are redundant. Each of the above authors has generated data for the problem differently, so their results are not entirely comparable. We have used the approach of Soderstrom et al. to generate data and compare our results with theirs.
Fig. 3. Stream metering problem: example 1.
Table 1
Data reconciliation on the stream metering problem with the Fair function

| Horizon length | # Biased | OP^ff_s1 | AVT1^ff_s1 | OP^ff_s2 | AVT1^ff_s2 |
|---|---|---|---|---|---|
| 5 | 3 | 0.624 | 2.736 | 0.640 | 3.907 |
| 5 | 5 | 0.544 | 2.873 | 0.551 | 3.149 |
| 5 | 7 | 0.515 | 2.789 | 0.448 | 3.461 |
| 10 | 3 | 0.665 | 1.945 | 0.627 | 3.460 |
| 10 | 5 | 0.583 | 2.111 | 0.539 | 2.816 |
| 10 | 7 | 0.507 | 2.554 | 0.480 | 2.772 |
| 20 | 3 | 0.664 | 1.666 | 0.629 | 3.427 |
| 20 | 5 | 0.587 | 1.912 | 0.537 | 2.725 |
| 20 | 7 | 0.470 | 2.670 | 0.504 | 2.482 |
| 40 | 3 | 0.649 | 1.499 | 0.633 | 3.381 |
| 40 | 5 | 0.589 | 1.887 | 0.536 | 2.683 |
| 40 | 7 | 0.471 | 2.588 | 0.504 | 2.440 |
In order to solve the problem, a horizon of data is generated by sampling process data at various time intervals and organizing them into a data matrix as (Jang, Joseph & Mukai, 1986; Liebman, Edgar & Lasdon, 1992; Robertson, Lee & Rawlings, 1996):

$$X^M = \begin{pmatrix} x_{11}^M & x_{12}^M & \cdots & x_{1H}^M \\ x_{21}^M & x_{22}^M & \cdots & x_{2H}^M \\ \vdots & & \ddots & \vdots \\ x_{n1}^M & x_{n2}^M & \cdots & x_{nH}^M \end{pmatrix} \tag{5.1}$$

where x_ij^M is the ith measured variable in the jth time period, n is the number of measured variables, and H is the size of the horizon. (5.2)

The size of a horizon is fixed in order to accommodate data in a real time system. Every new measurement leads to the 'dropping' of the corresponding oldest measurement, and new horizons are constructed this way. This is akin to dropping the first column of the X^M matrix and appending a new column. The DRPE problem is now written as:

$$\min_{x_i} \; \sum_{i=1}^{n} \sum_{j=1}^{H} z^M (x_i - x_{ij}^M) \tag{5.3}$$
$$\text{s.t.} \quad Ax = 0, \quad x \ge 0,$$

where z^M is an M-estimator: the Fair function or the redescending estimator.

A Monte Carlo study is performed by generating data, adding noise and biases to the true values of the variables. Standard deviations, σ, of the observations are taken to be 2.5% of the true values of the variables. Normal noise from N(0, σ) is added to the true values for any set of moving windows. Biases numbering 3, 5 or 7, with their locations selected randomly, are then added to the above. The magnitudes of the biases are taken to be between 5σ and 25σ. One hundred moving windows, each of length H, are chosen for a particular configuration, and 100 such configurations are generated with the locations of the gross errors differing across successive configurations. The performance of the estimators is measured by the observed power (OP) and the average number of type-1 errors (AVT1), defined as:

OP = (# of biased variables correctly identified) / (# of biased variables simulated),    (5.4)

AVT1 = (# of unbiased variables wrongly identified as biased) / (# of simulation trials).    (5.5)
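The metrics of Eqs. (5.4)–(5.5) reduce to simple set arithmetic. The following sketch (our own illustration) computes them for one batch of simulation trials, where each trial supplies the set of variables with simulated biases and the set flagged by the estimator.

```python
def op_and_avt1(trials):
    """trials: list of (true_biased, flagged) pairs of sets of
    variable indices. Returns (OP, AVT1) per Eqs. (5.4)-(5.5)."""
    correct = sum(len(true & flag) for true, flag in trials)
    simulated = sum(len(true) for true, _ in trials)
    false_alarms = sum(len(flag - true) for true, flag in trials)
    op = correct / simulated                 # Eq. (5.4)
    avt1 = false_alarms / len(trials)        # Eq. (5.5)
    return op, avt1

# e.g. one trial where biases in {2, 7, 13} were simulated and the
# estimator flagged {2, 7, 21}:
print(op_and_avt1([({2, 7, 13}, {2, 7, 21})]))  # (0.666..., 1.0)
```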
Data for the problem are generated using an application developed in MATLAB. The data reconciliation problem is solved using MINOS5 as the NLP solver, invoked through GAMS. The redescending estimator, being a non-convex function, can lead to the presence of local optima. To obtain a good initialization, we first solve the problem with the convex Fair function as the objective function. The results of this problem are then used as starting points for the redescending estimator. This way, the starting points are feasible and, hopefully, lead to the 'best' local optimum. Also, to handle the non-differentiable terms in Eq. (4.7) we apply an interior point smoothing function based on Chen and Mangasarian (1995) and Gopal and Biegler (1999).

Outlier detection is now addressed in two ways. For data reconciliation with the Fair function, we have used boxplots of the residuals as the method for outlier detection (Eqs. (4.12) and (4.13)). The Fair function tuned at a relative asymptotic efficiency of 70% was used in all the trials because of the added robustness. We have used four different redescending estimators in order to identify a range of the best redescending estimators for the problem. Results are reported in Tables 1 and 2.
The superscripts ff and red indicate the estimator used: ff = Fair function and red = redescending estimator. The Fair function performs worse than most cases of the redescending estimator (Tables 1 and 2). This also supports using the results of DRPE with the Fair function as starting points for DRPE with redescending estimators. From Table 2, we observe that the OP stabilizes very quickly across the different horizon sizes; it even shows a slight decrease as the horizon size is increased. Type-1 errors significantly decrease as the size of the horizon is increased. Type-1 errors also decrease as the redescending estimators are made less strict. This behavior is to be expected, as a less strict redescending estimator can tolerate larger residuals without flagging them as outliers. In all cases, we obtain a larger OP when redescending estimators are used, except when they are the least restrictive. More significant than OP are the type-1 errors, whose reduction with increasing horizon size indicates increasing accuracy of the robust estimators.
Table 2
Data reconciliation on the stream metering problem using redescending estimators

| Horizon | # Biased | a, b, c | OP^red | AVT1^red | OP (Soderstrom et al., 2000) | AVT1 (Soderstrom et al., 2000) |
|---|---|---|---|---|---|---|
| 5 | 3 | 0.5, 1, 2 | 0.673 | 5.820 | 0.67 | 0.23 |
| | | 1, 2, 4 | 0.626 | 2.065 | | |
| | | 2, 4, 8 | 0.571 | 0.305 | | |
| | | 4, 8, 16 | 0.455 | 0.069 | | |
| 5 | 5 | 0.5, 1, 2 | 0.610 | 6.005 | 0.60 | 0.51 |
| | | 1, 2, 4 | 0.552 | 2.462 | | |
| | | 2, 4, 8 | 0.498 | 0.624 | | |
| | | 4, 8, 16 | 0.392 | 0.161 | | |
| 5 | 7 | 0.5, 1, 2 | 0.616 | 6.538 | 0.55 | 0.84 |
| | | 1, 2, 4 | 0.546 | 3.295 | | |
| | | 2, 4, 8 | 0.476 | 1.407 | | |
| | | 4, 8, 16 | 0.376 | 0.609 | | |
| 10 | 3 | 0.5, 1, 2 | 0.667 | 5.844 | 0.69 | 0.10 |
| | | 1, 2, 4 | 0.624 | 1.901 | | |
| | | 2, 4, 8 | 0.573 | 0.280 | | |
| | | 4, 8, 16 | 0.454 | 0.065 | | |
| 10 | 5 | 0.5, 1, 2 | 0.606 | 6.034 | 0.60 | 0.56 |
| | | 1, 2, 4 | 0.544 | 2.353 | | |
| | | 2, 4, 8 | 0.499 | 0.586 | | |
| | | 4, 8, 16 | 0.391 | 0.154 | | |
| 10 | 7 | 0.5, 1, 2 | 0.613 | 6.526 | 0.59 | 0.97 |
| | | 1, 2, 4 | 0.542 | 3.125 | | |
| | | 2, 4, 8 | 0.475 | 1.357 | | |
| | | 4, 8, 16 | 0.374 | 0.611 | | |
| 20 | 3 | 0.5, 1, 2 | 0.669 | 5.761 | 0.74 | 0.12 |
| | | 1, 2, 4 | 0.622 | 1.851 | | |
| | | 2, 4, 8 | 0.572 | 0.268 | | |
| | | 4, 8, 16 | 0.452 | 0.074 | | |
| 20 | 5 | 0.5, 1, 2 | 0.599 | 5.911 | 0.63 | 0.62 |
| | | 1, 2, 4 | 0.540 | 2.240 | | |
| | | 2, 4, 8 | 0.499 | 0.565 | | |
| | | 4, 8, 16 | 0.392 | 0.155 | | |
| 20 | 7 | 0.5, 1, 2 | 0.602 | 6.452 | 0.62 | 0.74 |
| | | 1, 2, 4 | 0.532 | 3.007 | | |
| | | 2, 4, 8 | 0.470 | 1.364 | | |
| | | 4, 8, 16 | 0.375 | 0.602 | | |
| 40 | 3 | 0.5, 1, 2 | 0.666 | 5.653 | 0.75 | 0.17 |
| | | 1, 2, 4 | 0.624 | 1.741 | | |
| | | 2, 4, 8 | 0.577 | 0.249 | | |
| | | 4, 8, 16 | 0.456 | 0.061 | | |
| 40 | 5 | 0.5, 1, 2 | 0.589 | 5.794 | 0.71 | 0.59 |
| | | 1, 2, 4 | 0.538 | 2.177 | | |
| | | 2, 4, 8 | 0.510 | 1.669 | | |
| | | 4, 8, 16 | 0.388 | 0.163 | | |
| 40 | 7 | 0.5, 1, 2 | 0.597 | 6.297 | 0.59 | 1.26 |
| | | 1, 2, 4 | 0.528 | 2.857 | | |
| | | 2, 4, 8 | 0.472 | 1.311 | | |
| | | 4, 8, 16 | 0.380 | 0.570 | | |
Table 3
Results of data reconciliation with MINLP and MILP on five horizons and three biases

| Solution method | w_i | OP | AVT1 |
|---|---|---|---|
| MINLP | – | 0.885 | 0.071 |
| MILP | 1.0 | 0.685 | 2.265 |
| MILP | 2×H | 0.704 | 0.081 |
| MILP | 100.0 | 0.112 | 0.000 |
Table 4
Performance of the redescending estimator, MINLP, and MILP compared on five horizons and three biases

| Solution method | AIC | CPU time per horizon (s) |
|---|---|---|
| Redescending estimator (a=0.5, b=1, c=2) | 3.491 | 1.114 |
| Redescending estimator (a=1, b=2, c=4) | 14.112 | 1.075 |
| Redescending estimator (a=2, b=4, c=8) | 40.299 | 1.057 |
| Redescending estimator (a=4, b=8, c=16) | 96.399 | 1.019 |
| MINLP | 1.067 | 42.989 |
| MILP (w_i = 1) | 1.211 | 4.369 |
| MILP (w_i = 2×H) | 1.117 | 0.592 |
| MILP (w_i = 100.00) | 17.148 | 0.386 |
5.1.1. Results from MINLP and MILP

Finally, we compare the performance of the MINLP and the MILP with the redescending estimator and the Fair function. Referring to Eq. (3.2), we have set U_i to 30σ_i and L_i to 3σ_i, so that these are fairly generic bounds. We have tested the efficacy of Eq. (3.4) on a problem with a horizon size of five, the number of biases equal to three, the number of moving windows per data set equal to 100, and 100 overall data sets. The problem has been solved using DICOPT invoked in GAMS, calling CPLEX as the MIP solver. Results are summarized in Table 3. The problem formulated in this way is the true maximum likelihood estimator of the AIC.

We then solved the problem following Soderstrom et al. Here one can set the weight functions to any value and obtain different results. In Table 3 we give results for some values of the weight functions. Observe that we obtain much better results (higher OP and lower AVT1) than Soderstrom et al. when we set w_i = 2×H, where H is the horizon length. We obtain similar results when we do not weight the binary variables (w_i = 1), and very poor performance when we set the weights to an arbitrarily large value of 100, although this drives the AVT1 to 0. The weights could be optimized as we have done for the redescending estimator. However, each time we would then have to solve a potentially expensive problem (MILP vs. NLP), which is not a very attractive proposition.

In Table 4 we also present values of the AIC for the above four redescending estimators and the MINLP. The AIC values have been scaled by the total number of measurements. Results are reported for the problem with three biases and a horizon size of five. As expected, the redescending estimator giving a higher OP also has a lower AIC, and the AIC increases as OP decreases for the redescending estimator. The AIC of the MINLP is smaller than the AIC of all redescending estimators. This is explained by a better overall fit produced by the MINLP due to a discrete decision on each measurement. Note also that OP and AVT1, which are commonly used to assess the performance of an estimator, are consistent but not as complete as the AIC, which maximizes the overall probability of the estimator for the DRPE problem with gross errors. This is because the AIC considers the reconciled residual values as well; that is, an estimator having high OP may produce poor reconciliation in uncorrupted measurements. This can be seen clearly in Table 4.

From Table 4 we see that the minimum value of the AIC is provided by the MINLP approach, and the MILP results approach this for w_i = 1 and w_i = 2×H. In comparison, the AIC for the NLP approach with the redescending function is not as good as for the MINLP or MILP approaches, although it is still quite reasonable. Moreover, these results can be further improved by tuning the constants using the AIC as a guide. On the other hand, the MINLP problem is very expensive for this example; it requires about 40 times more computation than the redescending approach. In fact, the expense of the MINLP approach allowed us to consider only 15 horizons out of the 100 that were formulated. With respect to the MILP approach, the NLP approach requires about the same computational expense, although this depends heavily on the weight factor in the MILP formulation. However, the NLP approach can be extended directly to non-linear models with similar performance, whereas extending the MILP approach to non-linear models requires an MINLP formulation, which is likely to be much more expensive. The solution of non-linear examples is considered in the next section, along with the need to obtain the redescending estimator best suited to the system, using an AIC tuning procedure. Table 4 shows that the AIC is strongly influenced by the tuning parameters in the redescending estimator. This motivates the need to search for the redescending estimator best suited to the process; this is the focus of the next two non-linear examples.
5.2. Non-linear examples

5.2.1. Pai and Fisher (1988)

This problem has been taken from Pai and Fisher (1988) and has also been used by Tjoa (1991) to study the contaminated normal estimator. We have analyzed this problem by solving it both in the standard formulation (Eq. (1.2)) and the EVM formulation (Eq. (1.3)). In this example there are six equality constraints, five measured variables (x1–x5), all of whose measurements are redundant, and eight variables in all, the unmeasured variables being observable. Variables x1–x5 are measured 100 times and each measurement set has 20 gross errors in the form of stuck sensors: the first 20 measurements of x1 have gross errors, followed by the next 20 measurements of x2, and so on, making a total of 100 gross errors. In addition, normal noise is added with a S.D. of 0.1. The exact values of the variables are:

$$x_{exact} = [4.5124, 5.5819, 1.9260, 1.4560, 4.8545]^T, \quad u_{exact} = [11.070, 0.61467, 2.0504]^T. \tag{5.6}$$
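The corrupted data set for this example can be reproduced with a few lines of Python. This is our own sketch of the stated recipe (100 measurement sets, N(0, 0.1) noise, and each block of 20 consecutive sets having one stuck sensor); the paper does not state the value at which a sensor sticks, so `STUCK_VALUE` below is purely illustrative.

```python
import numpy as np

x_exact = np.array([4.5124, 5.5819, 1.9260, 1.4560, 4.8545])
STUCK_VALUE = 0.0   # assumed reading of a stuck sensor (illustrative only)

rng = np.random.default_rng(1)
data = x_exact + rng.normal(0.0, 0.1, size=(100, 5))  # normal noise, S.D. 0.1
for var in range(5):                                   # x1..x5 in turn
    rows = slice(20 * var, 20 * (var + 1))             # 20 sets per variable
    data[rows, var] = STUCK_VALUE                      # stuck-sensor gross errors
```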
First consider the data reconciliation problem solved in the standard form, henceforth referred to as case 1. Results are summarized in Table 5. All estimators produce reconciled values that are very close to the true values of the variables. This is because of the large redundancy introduced by multiple measurements of the same variable. Neither the least squares estimator nor the Fair functions were able to flag outliers; this is verified by boxplots (Eqs. (4.11)–(4.13)). The robustness offered by the Fair function is offset by the large redundancy in the measurements. On the other hand, consider the reconciliation performed with the redescending estimators (M1–M8). From Table 5 we observe that none of the estimators M1–M4 is able to detect outliers. Outlier detection starts from M5, where we also observe a sharp decrease in the value of the AIC. M7 and M8 flag more than half of the measurements as corrupted; this is due to the increase in type-1 errors with increasing robustness of the estimator (Davies & Gather, 1993). The AIC also increases from M7 to M8. With this knowledge we tuned the redescending estimator. The resulting estimator, the reconciled values due to it, and its AIC are given in Table 5. We have divided the AIC by the number of measurements, i.e. 500, in order to scale its values. As mentioned before, the tuned estimator lies close to M6 and provides the best estimation. It, however, admits more type-1 errors than M6, but these do not deteriorate the reconciliation significantly.

We then solved the problem in the EVM form (Eq. (1.3)), henceforth referred to as case 2. The redescending estimators were the same as those used in the standard form. In this study, however, we count the number of outliers identified correctly and the number of observations incorrectly flagged as outliers. The least squares estimator is again unable to detect any outliers. The Fair function is more sensitive in this case and, as we make it more robust, it detects more outliers, as seen in Table 6. However, the maximum power of outlier detection is only 31%. The redescending estimators start detecting outliers from M4, but many of the detections are type-1 errors, which progressively increase from M4 to M8, as shown in Table 6. The overall power is less than 75%, and this indicates that the quality of estimation is not very good. The values of the AIC are, however, lower than for case 1 (Table 6). We then tuned the estimator, which led to the identification of an estimator close to M6 (Table 6). The tuned estimator performs somewhat better than M6 as it suffers from fewer type-1 errors and achieves higher power. We can conclude that redescending estimators are more sensitive than the Fair function in detecting outliers, but this
Table 5
Data reconciliation of example 2 in standard form — case 1

| Estimator | x1 | x2 | x3 | x4 | x5 | u1 | u2 | u3 |
|---|---|---|---|---|---|---|---|---|
| True values | 4.5124 | 5.5819 | 1.9260 | 1.4560 | 4.8545 | 11.070 | 0.6146 | 2.0504 |
| LSE | 4.5503 | 5.5602 | 1.9208 | 1.4755 | 4.8442 | 11.1361 | 0.6152 | 2.0497 |
| Fair function (eff. = 95%) | 4.5444 | 5.5633 | 1.9216 | 1.4713 | 4.8468 | 11.1264 | 0.6151 | 2.0502 |
| Fair function (eff. = 80%) | 4.5448 | 5.5626 | 1.9216 | 1.4690 | 4.8489 | 11.1279 | 0.6152 | 2.0510 |
| Fair function (eff. = 70%) | 4.5474 | 5.5608 | 1.9212 | 1.4687 | 4.8497 | 11.1331 | 0.6152 | 2.0510 |
| Redescending estimator (a, b, c, nout, AIC): | | | | | | | | |
| M1 (10, 20, 40, 0, 4.353) | 4.5605 | 5.5543 | 1.9194 | 1.4801 | 4.8420 | 11.1541 | 0.6154 | 2.0497 |
| M2 (5, 10, 20, 0, 4.353) | 4.5605 | 5.5543 | 1.9194 | 1.4801 | 4.8420 | 11.1541 | 0.6154 | 2.0497 |
| M3 (4, 8, 16, 0, 4.353) | 4.5622 | 5.5534 | 1.9192 | 1.4812 | 4.8413 | 11.1569 | 0.6154 | 2.0496 |
| M4 (2, 4, 8, 0, 4.360) | 4.5477 | 5.5612 | 1.9212 | 1.4720 | 4.8468 | 11.1325 | 0.6152 | 2.0505 |
| M5 (1, 2, 4, 80, 1.768) | 4.5279 | 5.5727 | 1.9239 | 1.4627 | 4.8514 | 11.0977 | 0.6149 | 2.0505 |
| M6 (0.5, 1, 2, 119, 1.083) | 4.5313 | 5.5708 | 1.9234 | 1.4644 | 4.8505 | 11.1036 | 0.6149 | 2.0504 |
| M7 (0.2, 0.4, 0.8, 287, 1.227) | 4.5661 | 5.5476 | 1.9189 | 1.4643 | 4.8571 | 11.1702 | 0.6155 | 2.0561 |
| M8 (0.1, 0.2, 0.4, 378, 1.525) | 4.5702 | 5.5452 | 1.9183 | 1.4661 | 4.8562 | 11.1775 | 0.6156 | 2.0562 |
| Tuned (0.369, 0.739, 1.478, 158, 1.034) | 4.5449 | 5.5609 | 1.9217 | 1.4609 | 4.8562 | 11.1311 | 0.6152 | 2.0539 |
Table 6
Results of estimation of example 2 in the EVM form — case 2 (the x1–x5 columns give the number of outliers correctly identified per variable)

| Estimator | nout | x1 | x2 | x3 | x4 | x5 |
|---|---|---|---|---|---|---|
| Fair function (eff. = 95%) | 1 | 0 | 0 | 0 | 0 | 0 |
| Fair function (eff. = 80%) | 8 | 0 | 1 | 0 | 0 | 4 |
| Fair function (eff. = 70%) | 48 | 0 | 9 | 17 | 1 | 4 |
| Redescending estimator (a, b, c, AIC): | | | | | | |
| M1 (10, 20, 40, 2.747) | 0 | 0 | 0 | 0 | 0 | 0 |
| M2 (5, 10, 20, 2.747) | 0 | 0 | 0 | 0 | 0 | 0 |
| M3 (4, 8, 16, 2.759) | 0 | 0 | 0 | 0 | 0 | 0 |
| M4 (2, 4, 8, 2.859) | 2 | 0 | 0 | 0 | 0 | 0 |
| M5 (1, 2, 4, 1.323) | 94 | 0 | 10 | 20 | 9 | 7 |
| M6 (0.5, 1, 2, 0.980) | 182 | 4 | 9 | 20 | 6 | 7 |
| M7 (0.2, 0.4, 0.8, 1.276) | 316 | 10 | 11 | 20 | 9 | 20 |
| M8 (0.1, 0.2, 0.4, 1.410) | 352 | 18 | 19 | 20 | 13 | 5 |
| Tuned (0.697, 1.393, 2.787, 0.882) | 129 | 17 | 10 | 20 | 10 | 1 |
Fig. 4. Connected tanks.
is often at the price of type-1 errors. Even when there are repeated measurements of variables, as in case 1, redescending estimators are superior to the Fair function due to their ability to discard outliers. We now examine the final example, which reinforces these conclusions.
5.2.2. Connected tanks

This problem concerns two CSTRs connected by a valve, with liquid flowing into the first tank, from the first tank into the second tank, and out of the second tank. The three flows, F0, F1, and F2, and the heights L1 and L2 are measured periodically. The areas of the tanks, A1 and A2, are the unknown parameters. The process is depicted in Fig. 4 and the system is described by an index-2 DAE system (Albuquerque & Biegler, 1996). Measurement data are simulated as was done by Albuquerque and Biegler (1996). All data have normal noise with variance 0.01 added. In addition, the sensors for F2 and L1 are assumed to be stuck at the second time instant of measurement. The DAE system is discretized by an implicit Euler scheme, and measurements are taken at each interval of discretization. Thus this problem is an EVM problem with coupling of successive constraints for each set of measurements.
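To illustrate how the discretized DAE enters Eq. (1.3) as equality constraints h(·), the sketch below writes the implicit Euler residuals for the level balances of the two tanks. This is our own simplified illustration: it uses only the holdup balances A1 dL1/dt = F0 − F1 and A2 dL2/dt = F1 − F2, and it omits the algebraic valve relations that make the full system index-2.

```python
import numpy as np

def euler_residuals(z, z_prev, p, dt):
    """Implicit Euler residuals h(z_k, z_{k-1}, p) = 0 at one time step.
    z = (L1, L2, F0, F1, F2) at step k; p = (1/A1, 1/A2)."""
    L1, L2, F0, F1, F2 = z
    L1p, L2p = z_prev[0], z_prev[1]
    invA1, invA2 = p
    return np.array([
        L1 - L1p - dt * invA1 * (F0 - F1),   # tank 1 holdup balance
        L2 - L2p - dt * invA2 * (F1 - F2),   # tank 2 holdup balance
    ])
```

Stacking these residuals over all time steps couples successive measurement sets, which is why the constraints of this EVM problem cannot be decoupled as in Eq. (1.3).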
All measurements except F0 are redundant and the parameters 1/A1 and 1/A2 are observable. The problem is solved with the least squares estimator, Fair functions at 95, 80, and 70% asymptotic efficiency, and the redescending estimator. All cases have been solved using the reduced Hessian successive quadratic programming (rSQP) approach of Biegler, Nocedal and Schmid (1995). Results for the estimation of 1/A1 and 1/A2 are summarized in Table 7. Observe that the least squares estimator performs very poorly: the estimates of the two parameters hit their lower and upper bounds, respectively. This behavior is to be expected given the lack of robustness of the least squares estimator. The Fair functions perform somewhat better, but the estimate of 1/A1 still hits the lower bound. Again, the estimate of 1/A2 improves as the Fair function is made more robust. In contrast, the redescending estimators perform much better. From M2 to M6, the estimates of 1/A1 and 1/A2 are very close to their true values. The AIC values (Table 7) have again been scaled as in the previous example, and they indicate that the best estimator lies close to M6. Upon tuning the estimator, we find this to be indeed the case: the estimates of 1/A1 and 1/A2 are almost exactly their true values.

In Figs. 5 and 6, we have plotted the fitted values of L1 and F2 when estimation is performed by the least squares estimator, the Fair function (efficiency = 70%), and the tuned redescending estimator. In addition, we have also plotted the original noisy data and the stuck measurements. The poor performance of least squares is clearly brought out: the fitted values due to it are grossly underestimated. The Fair function underestimates L1 but fits F2 very closely; this is the effect of 1/A1 being at the lower bound. In both cases, the redescending estimator ignores the noise and fits L1 and F2 at their true values.

Table 7
DRPE on connected tanks

| Trials | 1/A1 | 1/A2 |
|---|---|---|
| True values | 0.5 | 0.5 |
| Least squares | 0.25 (a) | 0.55 (a) |
| Fair function (eff. = 95%) | 0.25 (a) | 0.320 |
| Fair function (eff. = 80%) | 0.25 (a) | 0.388 |
| Fair function (eff. = 70%) | 0.25 (a) | 0.439 |
| Redescending estimators (a, b, c, nout, AIC): | | |
| M1 (a=8, b=16, c=32, 48, 24.855) | 0.308 | 0.391 |
| M2 (a=4, b=8, c=16, 49, 36.647) | 0.454 | 0.485 |
| M3 (a=2, b=4, c=8, 95, 1.409) | 0.484 | 0.498 |
| M4 (a=1, b=2, c=4, 95, 1.078) | 0.485 | 0.499 |
| M5 (a=0.5, b=1, c=2, 103, 1.048) | 0.498 | 0.499 |
| M6 (a=0.1, b=0.2, c=0.4, 163, 1.303) | 0.492 | 0.495 |
| Tuned (a=0.362, b=0.724, c=1.449, 105, 0.974) | 0.502 | 0.499 |

(a) Value at its bound.

Fig. 5. Fitted values of L1.

Fig. 6. Fitted values of F2.

6. Conclusions

A simultaneous data reconciliation and parameter estimation strategy using redescending estimators has been presented. Redescending estimators are found to be very robust, as demonstrated by their superior performance with respect to a Huber estimator, the Fair function, on a variety of examples. In addition, outlier detection with the redescending estimators has been shown to be very straightforward. We have also provided a derivation of DRPE based on the AIC and compared and contrasted the broad strategies that can be derived from it. The MINLP approach is a direct minimizer of the AIC, and this is supported by the earlier work of Yamamura et al. (1988) and the related work of Soderstrom et al. (2000). On the other hand, robust statistics and the redescending function accomplish a similar minimization of the AIC while requiring only the solution of an NLP, hence reducing the computational load of the MINLP/MILP approaches and alleviating the problems associated with them. An innovative two-step tuning strategy for the redescending estimator has been suggested that is based upon minimizing the AIC; this has been found to work very well on the examples considered.

Future work will deal with the development of more efficient and reliable NLP algorithms that take advantage of the structure of DRPE problems with M-estimators. An important issue here is reliable algorithm behavior even in the presence of non-redundant variables. These topics shall be addressed in a future paper.
Acknowledgements Funding from the Elkem Foundation is gratefully acknowledged for this work. The authors are also grateful to Dr Guillermo Sentoni for very helpful suggestions on redescending functions over the course of this work.
References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, AC-19(6), 716.
Albuquerque, J. S. (1996). Parameter estimation and data reconciliation for dynamic systems. Ph.D. thesis, Carnegie Mellon University.
Albuquerque, J. S., & Biegler, L. T. (1996). Data reconciliation and gross-error detection for dynamic systems. American Institute of Chemical Engineers Journal, 42(10), 2841.
Biegler, L. T., Nocedal, J., & Schmid, C. (1995). A reduced Hessian method for large-scale constrained optimization. SIAM Journal on Optimization, 5(2), 314.
Chen, C., & Mangasarian, O. L. (1995). A class of smoothing functions for nonlinear and mixed complementarity problems. Technical report, Computer Sciences Department, University of Wisconsin.
Crowe, C. M. (1989). Observability and redundancy of process data for steady state reconciliation. Chemical Engineering Science, 44(12), 2909.
Crowe, C. M. (1994). Data reconciliation — progress and challenges. In Proceedings of PSE'94.
Davies, L., & Gather, U. (1993). The identification of multiple outliers. Journal of the American Statistical Association, 88(423), 782.
Edgar, T. F., & Himmelblau, D. M. (1988). Optimization of chemical processes. New York: McGraw-Hill.
Fletcher, R. (1987). Practical methods of optimization (2nd ed.). New York: Wiley.
Gopal, V., & Biegler, L. T. (1999). Smoothing methods for complementarity problems in process engineering. American Institute of Chemical Engineers Journal, 45(7), 1535.
Hampel, F. R. (1974). The influence curve and its role in robust estimation. Journal of the American Statistical Association, 69(346), 383.
Hoaglin, D. C., Mosteller, F., & Tukey, J. W. (1983). Understanding robust and exploratory data analysis. New York: Wiley.
Huber, P. J. (1981). Robust statistics. New York: Wiley.
Jang, S. S., Joseph, B., & Mukai, H. (1986). Comparison of two approaches to on-line parameter and state estimation of nonlinear systems. Industrial & Engineering Chemistry Process Design and Development, 25, 809.
Liebman, M. J., Edgar, T. F., & Lasdon, L. S. (1992). Efficient data reconciliation and estimation for dynamic processes using nonlinear programming techniques. Computers and Chemical Engineering, 16(10/11), 963.
Marlin, T. E., & Hrymak, A. N. (1997). Real-time operations optimization of continuous processes. In Fifth International Conference on Chemical Process Control (CACHE/AIChE).
Narasimhan, S., & Mah, R. S. H. (1987). Generalized likelihood ratio method for gross error identification. American Institute of Chemical Engineers Journal, 33(9), 1514.
Pai, C. C. D., & Fisher, G. D. (1988). Application of Broyden's method to reconciliation of nonlinearly constrained data. American Institute of Chemical Engineers Journal, 34(5), 873.
Perkins, J. D. (1998). Plant wide optimization — opportunities and challenges. In FOCAPO (CACHE/AIChE).
Rey, W. J. J. (1988). Introduction to robust and quasi-robust statistical methods. Berlin/New York: Springer.
Robertson, D. G., Lee, J. H., & Rawlings, J. B. (1996). A moving horizon-based approach for least-squares estimation. American Institute of Chemical Engineers Journal, 42(8), 2209.
Rollins, D. K., Cheng, Y., & Devanathan, S. (1996). Intelligent selection of hypothesis tests to enhance gross error identification. Computers and Chemical Engineering, 20(5), 517.
Serth, R. W., & Heenan, W. A. (1986). Gross error detection and data reconciliation in stream-metering systems. American Institute of Chemical Engineers Journal, 32(5), 733.
Soderstrom, T. A., Himmelblau, D. M., & Edgar, T. F. (2000). A mixed integer optimization approach for simultaneous data reconciliation and identification of measurement bias. In ADCHEM 2000.
Stanley, G. M., & Mah, R. S. H. (1981). Observability and redundancy in process data estimation. Chemical Engineering Science, 36, 259.
Tjoa, I. B. F. (1991). Simultaneous solution and optimization strategies for data analysis. Ph.D. thesis, Carnegie Mellon University.
Yamamura, K., Nakajima, M., & Matsuyama, H. (1988). Detection of gross errors in process data using mass and energy balances. International Chemical Engineering, 28(1), 91.