ISA Transactions 47 (2008) 157–170 www.elsevier.com/locate/isatrans
Hardware/firmware implementation of a soft sensor using an improved version of a fuzzy identification algorithm Claudio Garcia ∗ , C´assio de Carvalho Berni, Carlos Eduardo Neri de Oliveira Automation and Control Laboratory, Department of Telecommunications and Control Engineering, Polytechnic School of the University of S˜ao Paulo, Av. Prof. Luciano Gualberto, trav. 3 – n. 380 – CEP: 05508-900-S˜ao Paulo, SP, Brazil Received 10 December 2006; accepted 17 September 2007 Available online 5 November 2007
Abstract This paper presents the design and implementation of an embedded soft sensor, i.e., a generic and autonomous hardware module, which can be applied to many complex plants, wherein a certain variable cannot be directly measured. It is implemented based on a fuzzy identification algorithm called “Limited Rules”, employed to model continuous nonlinear processes. The fuzzy model has a Takagi–Sugeno–Kang structure and the premise parameters are defined based on the Fuzzy C-Means (FCM) clustering algorithm. The firmware contains the soft sensor and it runs online, estimating the target variable from other available variables. Tests have been performed using a simulated pH neutralization plant. The results of the embedded soft sensor have been considered satisfactory. A complete embedded inferential control system is also presented, including a soft sensor and a PID controller. c 2007, ISA. Published by Elsevier Ltd. All rights reserved.
Keywords: Soft sensor; System identification; Nonlinear modeling; Fuzzy models; Embedded sensor
1. Introduction In the process industry, there are some types of plants that have important variables which cannot be measured by conventional (hard) sensors. The reasons that make a hard sensor not applicable are: unavailability in the market, compelling the use of laboratory analysis; high expense; unacceptable precision; much slow response for control applications and so on. An example in wastewater treatment process is BOD (Biochemical Oxygen Demand), which is obtained only by laboratory tests. The use of a soft sensor is proposed for those cases where the use of a hard one is not feasible. A soft sensor is a computational model that estimates a target variable from discrete-time data collected from other variables in the plant. This model is normally developed using system identification techniques, because if one uses phenomenological models, the process equations could be so ∗ Corresponding author. Tel.: +55 11 30915648; fax: +55 11 30915718.
E-mail addresses:
[email protected] (C. Garcia),
[email protected] (C. de Carvalho Berni),
[email protected] (C.E.N. de Oliveira).
complex that an estimation would take too many times the sampling period. In this paper, there are two innovative aspects: (a) The first one is to apply an identification algorithm that is based on an evolution of the work presented by Wang and Langari [1]. As it is most common, neural networks are normally applied to develop soft sensors. In this paper, fuzzy logic and clustering algorithms are used, generating a discrete-time fuzzy model, the structure of which is based on Takagi–Sugeno–Kang (TSK) proposal [2,3], defined by linear segments modeling; and (b) The second innovative aspect is to develop a soft sensor in an autonomous and generic hardware instead of running it in a microcomputer, as usual. Nowadays, there are some companies that provide soft sensors, such as: Pavilion, Emerson, Aspen, Matrikon, Honeywell and Yokogawa, among others. In some cases, these implementations are Windows-based solutions in a PC microcomputer. The algorithms were tested using a simulated pH neutralization plant. This process occurs in a CSTR unit (Continuous Stirred Tank Reactor) and it has several scientific and industrial applications. Despite the availability of good hard
c 2007, ISA. Published by Elsevier Ltd. All rights reserved. 0019-0578/$ - see front matter doi:10.1016/j.isatra.2007.09.004
158
C. Garcia et al. / ISA Transactions 47 (2008) 157–170
sensors to measure pH in real time, this process was chosen due to its high nonlinearity. The structure of this paper is as follows: Section 2 presents an overview of the fuzzy identification algorithms employed here (original and improved versions of Wang–Langari method); Section 3 describes the training and execution steps of the “Limited Rules” method; Section 4 briefly presents the pH neutralization process; Section 5 presents a comparison between the Wang–Langari and Limited Rules methods; Section 6 presents an overview of software and hardware of the embedded soft sensor; Section 7 presents the model implemented in the embedded soft sensor; Section 8 presents the tests performed with the software sensor and, finally, Section 9 presents the conclusions. 2. The fuzzy identification algorithms The generic format of each rule in the TSK structure is shown next: Ri : if (u 1 (k) is µi1 (k) and u 2 (k) is µi2 (k) and . . . and u p (k) is µi p (k)) then yi (k) = bi0 + bi1 · u 1 (k) + bi2 · u 2 (k) + · · · + bi p · u p (k)
However, due to its mathematical complexity, the resulting model takes at least 10 times the estimated time of a model generated using FCM, without a great increase in the model accuracy. In [1], FCM is employed. Next, an unidimensional version of the FCM is presented, as used by Wang and Langari [1], in which the input variables are treated independently. The FCM algorithm aims at minimizing the objective function: Jj =
n c X X (µi j (k))m · |u j (k) − vi j |
where: vi j : cluster center i relative to input variable j m: determines the “fuzziness” of the clusters, normally it is assumed that m = 2. Eq. (3) calculates the membership grade of an input variable to a cluster: µi j (k) =
(1)
where: u j (k): input variable j acquired at instant k, 1 ≤ j ≤ p and 1≤k≤n µi j (k): membership grade of the input variable j, acquired in k, to cluster i; 1 ≤ i ≤ c and µi j (k) ∈ [0, 1] p: number of input variables n: number of collected points c: number of clusters. Next, the method to get the premise and the consequent parameters of the TSK fuzzy rules are described, following what is proposed in Wang and Langari [1]. 2.1. Premise generation The antecedents of a fuzzy model may be obtained in three different ways. The first one is subjective, because it utilizes an expert’s previous knowledge. The second one is based on statistical models. And the third way uses clustering as a way of determining the data clusters. In this work, clustering was used, as the aim was to use an automatic way to generate models, with no expert intervention. Clustering is the first step to generate a fuzzy inference nucleus using data collected from the plant. After clustering, the membership functions are defined, and consequent parameters are calculated. The clustering may be performed by any known method, such as Fuzzy C-means (FCM), Gustafson–Kessel [4] etc. The FCM algorithm searches for regions in the input space where there are clusters of points which determine their centers. In this algorithm, the pertinence of a point to a cluster decreases with distance and is independent of the direction. In this case, the clusters are called spherical. Gustafson–Kessel [4] method finds ellipse-shaped clusters choosing a different axis and a different grade of importance for each axis of each cluster.
(2)
i=1 k=1
1 c P
(3)
(di (k)/dl (k))2/(m−1)
l=1
where di (k) = |u j (k) − vi j | and dl (k) = |u j (k) − vl j |; 1 ≤ i ≤ c, 1 ≤ j ≤ p, 1 ≤ k ≤ n. If di (k) = 0 for any i = s, then µs j (k) = 1 and ∀i 6= s, µi j (k) = 0. The membership grades µi j (k) must obey two restrictions: c X
µi j (k) = 1
j = 1 . . . p; k = 1 . . . n
(4a)
i=1
0<
n X
µi j (k) < n
i = 1 . . . c; j = 1 . . . p.
(4b)
k=1
The first restriction aims at avoiding the trivial solution of function (2), that is, it avoids the solution where µi j (k) = 0, for i = 1 . . . c; j = 1 . . . p and k = 1 . . . n. The second restriction hinders a cluster from being totally empty or from containing all the points with membership grade 1. The coordinates of cluster centers vi j are calculated using the equation: n P
vi j =
(µi j (k))m · u j (k)
k=1 n P
i = 1 . . . c; j = 1 . . . p. (µi j
(5)
(k))m
k=1
Cluster centers that minimize Eq. (2) are iteratively calculated using Eqs. (3) and (5), which are initialized either with a center matrix V = [vi j ], i = 1 . . . c; j = 1 . . . p; or using a fuzzy partitioning matrix U = [µi j (k)], i = 1 . . . c; j = 1 . . . p; k = 1 . . . n. They are randomly generated, according to restrictions (4a) and (4b). According to a given tolerance ξ , the algorithm stop criterion can be defined as: l l−1 k < ξ , (a) max(|µli jk − µl−1 i jk |) < ξ or kU − U if matrix U is used as initialization; or
159
C. Garcia et al. / ISA Transactions 47 (2008) 157–170 l l−1 k < ξ , (b) max(|vil j − vil−1 j |) < ξ or kV − V if matrix V is used as initialization, where index l indicates the calculated value of lth iteration. In Eq. (5), a center for each cluster i of each input variable j is calculated. This was the clustering form employed in the work by Wang and Langari [1], in which the input variables are treated independently. The data of each variable are grouped and clustered individually, generating c clusters for each variable and consequently, c centers of clusters per variable. It yields the following set of fuzzy rules:
R1 : if (u 1 (k) is µ11 (k) and u 2 (k) is µ12 (k) and . . . and u p (k) is µ1 p (k)) then y1 (k) = b10 + b11 · u 1 (k) + b12 · u 2 (k) + · · · + b1 p · u p (k) R2 : if (u 1 (k) is µ21 (k) and u 2 (k) is µ12 (k) and . . . and u p (k) is µ1 p (k)) then y2 (k) = b20 + b21 · u 1 (k) . + b22 · u 2 (k) + · · · + b2 p · u p (k).. Rr : if (u 1 (k) is µc1 and u 2 (k) is µc2 and . . . and u p (k) is µcp ) then yr (k) = br 0 + br 1 · u 1 (k) + br 2 · u 2 (k) + · · · + br p · u p (k) where r = c p corresponds to the number of fuzzy rules. For small p and c, this clustering form is possible. But the amount of rules r = c p becomes computationally infeasible when p and c are increased, taking the system to the curse of dimensionality condition. Another inconvenience consists of not taking into account the input–output function, i.e., to distribute the clusters uniformly, across the operation area. Generally, that provokes a computational waste of time and increases the amount of errors in the global output. In that case, it is common for the computer memory to run out during the training. To solve this problem, two of the authors [5] proposed an improvement to the Wang and Langari method [1], employing vector clustering, calculating the cluster centers not more for each input variable, but for the whole set, including the output variable. For this, the multidimensional version of FCM is used. In that method, the collected data (input and output) are grouped as points in a p + 1 space dimension and then they are clustered. This clustering form, where all the regression space is used, is known as “Product Space Clustering” [4]. Each new cluster that is found is associated with a rule and its consequent function. This way, the number of clusters c directly defines the number of rules r and the rules assume the format presented in (1). Due to this reduction in the number of rules, this new method was entitled “Limited Rules”. In the previous method, as the rule set is generated from all the input data combinations, this makes common the creation of rules to model space regions of low importance or a lack of information. This fact brings about an unnecessary computational effort, not only on model generation, but also on output estimation. In the proposed method, the use of all the regression space in the clustering presents better results because, as the output variable affects the clustering, it is possible to identify better
Fig. 1. Example of clustering performed just with the input variable and with input and output variables.
the nonlinear regions, assigning to them a greater amount of clusters. An example of this fact can be seen in Fig. 1. It shows a function y(u) and the projected cluster centers found by conventional clustering of input variables (A) and by clustering of all regression space (B). It is possible to notice that in the second case the generated clusters concentrate at the region of greater output variations, that is, where 0 < u < 0.2. The disadvantage of this method is the loss of the linguistic character of fuzzy inferences. Each cluster can receive a label in the method proposed by Wang and Langari. This allows the association of clusters and linguistic terms such as “high”, “medium” and “low”. In principle, there is no way to do such an association in the proposed method. Nevertheless, for the development of the soft sensor, this does not seem to be a great loss. Next, the clustering algorithm for a p + 1 space dimension is presented. In the procedure developed in this paper, clustering is performed by grouping the input and output data. This way, vector x(k) has, besides the input variables, a last value referring to the output. J=
c X n X
(µi (k))m · kX(k) − vi k2
(6)
i=1 k=1
where: X(k) = [u 1 (k) u 2 (k) . . . u p (k) y(k)]: vector of input and output variables in time k vi = [vi1 vi2 . . . vi p vi, p+1 ]: cluster center i µi (k): membership grade of the input vector in instant k to cluster i, µi (k) ∈ [0, 1]. The membership grade of an input vector to a cluster is calculated by: µi (k) =
1 c P
(7)
(di (k)/d j (k))2/(m−1)
j=1
where di (k) = kX(k) − vi k (Euclidean norm); 1 ≤ i ≤ c, 1 ≤ k ≤ n.
160
C. Garcia et al. / ISA Transactions 47 (2008) 157–170
The membership grades µi (k) must obey the same restrictions mentioned in (4a) and (4b). The coordinates of the centers vi are calculated by: n P
vi =
(µi (k))m · X(k)
k=1
.
n P
(8)
(µi (k))m
k=1
The initialization, as well as the stop criterion of the algorithm, is similar to the forms previously proposed. During the execution of the model as a soft sensor there will be no access to the measured output, whose value must be estimated, so it is enough to store the projection of the cluster center to the input space. This is made by discarding the cluster center coordinate relative to the input variable, that is: vi0 = [vi1 vi2 . . . vi p ]. 2.2. Consequent generation For the Wang and Langari method, the activation level of each rule is calculated by the Boolean operation AND of the membership grades that compose the rule. In fact, this is made by simply multiplying these membership grades. Further information can be found in [1]. As seen in (1), the consequents of the TSK model assume the following form: yi (k) = bi0 + bi1 · u 1 (k) + bi2 · u 2 (k) + · · · + bi p · u p (k). (9) As in the “Limited Rules” method each rule is associated to only one cluster, the activation level of a rule, given an input, becomes exactly the membership grade of that input to the corresponding cluster, that is: gi (k) = µi (k)
(10)
where the membership grade is calculated by Eq. (7). The output of the fuzzy model is: yˆ (k) =
r X
yˆ (k) =
(11) gi (k) · (bi0 + bi1 · u 1 (k) + · · · + bi p · u p (k)).
In the Takagi–Sugeno model, it is necessary to estimate the rule consequent parameters bi j . The purpose is to minimize the sum of the quadratic errors between the model and the real process, that is, the expectation is to minimize:
J =
n X (y(k) − yˆ (k))2 k=1 n X k=1
y(k) −
r X
gi (k) · (bi0 + bi1 · u 1 (k)
i=1
!2 + · · · + bi p · u p (k))
.
Those matrixes were mounted in such a way that each line of ΦB represents the output of the fuzzy model for a vector of collected points. In this way, the determination of the consequent parameters becomes a linear system. One wishes to determine the parameter vector B so that ΦB = Y, in such a way that such a solution approaches better the given output points. If n = p, Φ will be a square matrix and then, if there is a solution, B = Φ −1 Y. Nevertheless, when one deals with system identification, n 6= p or, more specifically, n p. Then the objective function J (B) to be minimized by the minimum-square method can be rewritten as: J (B) = kY − ΦBk2 .
(13)
So, vector B could be calculated by B = (Φ T Φ)−1 Φ T Y. A direct solution of this equation is neither simple nor fast. Even using good computational programs (Matlab), the inversion of matrixes can generate several errors. It is common for this operation not to be accomplished due to the matrix singularity for the accuracy of the used software. As proposed in [1], it is convenient to use a recursive algorithm composed of the following equations: fkl = (Clk−1 )T · Φ Tk γkl = [1 + (fkl )T · fkl ]1/2
σ lk
(12)
=
βkl
=
Blk−1
· Clk−1
(14) · fkl "
Blk
i=1
J=
T Y = y(1) y(2) . . . y(n) T B = b10 . . . b1 p b20 . . . b2 p . . . br 0 . . . br p 1 g1 . . . gr1 g11 u 11 . . . gr1 u 11 . . . g11 u 1p . . . gr1 u 1p 2 g1 . . . gr2 g12 u 21 . . . gr2 u 21 . . . g12 u 2p . . . gr2 u 2p . Φ= .. .. .. .. . . . . n n n n n n n n n n g1 . . . gr g1 u 1 . . . gr u 1 . . . g1 u p . . . gr u p
αkl = γkl + 1 1 βkl = l l γk · αk
gi (k) · yi (k)
i=1 r X
This problem is solved by transforming it into a linear system, whose numerical solution is described next.
Clk =
+ σkl
αkl
#
· · (yk − Φ k · Blk−1 ) γkl Clk−1 − σ lk · (fkl )T
where Φ k is the kth line of Φ. The process is initialized by a square identity matrix C, of order p. The initial vector B is such that bi = 1, 1 ≤ i ≤ p. And, for each iteration l, Eq. (14) are calculated for k = 1, 2, . . . , n. For the beginning of a new iteration, the last values of C and B are used, that is, C0l = Cnl−1 and B0l = Bnl−1 . The stop criterion for a tolerance ξ is defined as kBln − l−1 Bn k/ p < ξ or max(|bil − bil−1 |) < ξ for Bln and 1 ≤ i ≤ p. For a new input vector, it is necessary to calculate the new distances and memberships of the vector to the clusters using Eq. (7). This way, the activation levels of each rule are
C. Garcia et al. / ISA Transactions 47 (2008) 157–170
161
determined by (10) and, based on Eqs. (9) and (11), the new output yˆ (k) is estimated. As in the “Limited Rules”, there are just c rules against c p of the original Wang and Langari method, which means that the amount of calculations involved in the consequent parameter estimation is much lower. 3. Implementation of the “Limited Rules” method 3.1. Offline model training The offline model training starts with the acquisition of n input/output points from the process, generating the following matrix: u 1 (1) u 2 (1) . . . u p (1) y(1) u 1 (2) u 2 (2) . . . u p (2) y(2) X= . .. .. .. . .. .. . . . . u 1 (n)
u 2 (n)
...
u p (n)
y(n)
Notice that u j (k) are the inputs to the fuzzy system, but in fact they may represent any chosen present or past value of process inputs or outputs variables, except for y(k). After data acquisition, the training algorithm is applied, which basically consists of the following steps: • Definition of the number of clusters (rules) c. If no prior knowledge is available, which would suggest the number of clusters, an automated procedure can be applied. Ref. [4] presents two approaches to help with this step: Validity Measures and Compatible Cluster Merging; • Clustering process by FCM (Fuzzy C-Means) algorithm. In the Wang and Langari method, the resultant matrix of cluster centers has dimension c X p and each element is a scalar, whereas in the Limited Rules method, the resultant vector of cluster centers has dimension c X 1 and each element is a vector with dimension p + 1. Next, the form of the matrix of cluster centers generated by the Limited Rules method is presented: V = [v1 v2 . . . vc ]T . • Determination of the fuzzy partition matrix U, calculating the membership grades by Eqs. (15)–(17). Zi (k) = X(k) − Vi
i = 1...c
di2 (k) = Zi (k)T · Zi (k) i = 1 . . . c µi (k) =
1 c P
(di (k)/d j
i = 1 . . . c.
(16) (17)
(k))2
... ... .. .
...
• Estimation of the consequent parameters vector B is done by the recursive least-squares method shown in (14). • Then, the global output calculation is made, based on the partial outputs (9) and on the activation levels as shown in (11). • Last, the obtained model must be validated by some performance criterion, as discussed in Section 5. 3.2. Online operation When the soft sensor is operating online, the following steps are followed to calculate the output at each sampling time: • For a new input vector at time instant k, the online output calculation begins with the determination, for each rule or cluster, of the difference vector between the input vector and the respective cluster center, according to Eq. (15). • After that, the Euclidean squared norms of the difference vectors mentioned must be calculated (16). • Then the membership grades (17) must be calculated and, consequently, the activation levels for each rule (10) (c = l calculations). • Finalizing the online algorithm, the determination of the c = l partial outputs (9) and the global output y(k) is made, as shown in (11). 4. The pH neutralization process
(15)
j=1
It results in: µ1 (1) µ1 (2) µ2 (1) µ2 (2) U= . .. .. . µc (1) µc (2)
Fig. 2. P & I diagram of the pH neutralization process.
µ1 (n) µ2 (n) .. . . µc (n)
• Determination of the activation level for each rule. It is given by the membership grade, as shown in Eq. (10).
A simulated pH neutralization plant [6,7] was used to evaluate the fuzzy soft sensor, whose benchmark is implemented in Matlab/Simulink environment. It simulates a CSTR unit — Continuous Stirred Tank Reactor. This plant has three input flows. The first is the influent acid flow q1 (HNO3 ). The second is a buffer flow q2 (NaHCO3 ), which aims at absorbing/diminishing pH variations. The third is the base flow q3 (mixture of NaOH and NaHCO3 ). So to simplify, the influent acid and the buffer flows are kept constant and so is the influent concentration. The output flow q4 is defined as the sum of the input flows, in order to avoid changes at the tank level. Fig. 2 shows the process diagram. In order to get points in a large range of pH values (3–12), and not only around the neutral point (pH = 7), a PID controller
162
C. Garcia et al. / ISA Transactions 47 (2008) 157–170
Table 1 Maximum feasible models based on Wang–Langari’s algorithm at the computer used Number of input variables
3
4
5
6
7
8
Maximum possible number of clusters per variable Number of rules to the maximum number of clusters Number of rules to the maximum + 1 clusters
a
4 256 625
3 243 1024
2 64 729
2 128 2187
2 256 6561
125b –
a This value was not reached, the maximum value used was 5 clusters per input variable. b Number of rules for 5 clusters per input variable.
Fig. 3. Example of data employed in the model training.
was used. To obtain the training data, positive and negative slope ramps were applied to the setpoint of the controller. Additionally, white noise was added to the controller output, in order to generate informative closed loop experiments, since the addition of such a kind of signal in the controller output generates a persistently exciting signal [8]. This enabled data to contain several base flow and pH combinations. 5. Comparison between the Wang–Langari and Limited Rules methods 5.1. Obtaining data from the simulated pH neutralization plant To generate the training data, two ramps were applied to the setpoint of the pH controller, one rising and the other descending, both inside the possible pH range, each taking half the simulated testing time of the process. Two tests were conducted with two different noise seeds, thus generating two models (models 1 and 2). Fig. 3 shows a chart of 1000 collected points with sampling interval Ts = 1/6 s. In this chart, there are just the data generated by the ascending ramp. For the validation data, another pair of ramps was obtained, using a different noise seed from the previous ones. 5.2. Wang–Langari model The number of regressors for pH was fixed to two for those tests, that is, pH(k − 1) and pH(k − 2), where k refers
to the current instant of simulation. This decision was taken based on tests performed with different numbers of regressors, but it may be aided by previous knowledge of the process. These regressors work as input variables in the consequent part of each rule. It must be stressed that, to operate as a soft sensor, the output regressors cannot be measured, since they are not available. This way, the output regressors must be the previous estimated outputs. This form of validation is called “free simulation” or “infinite steps ahead”. In these tests, the number of base flow regressors (q3 (k − 1), q3 (k − 2), . . .) was changed to find the best fit. However, during the experiments, there were problems with the memory of the workstation that was not enough to deal with the amount of data generated. The computer used was an Ultra60 work station, Sun Microsystems 360 MHz, 256 MB of RAM, 690 MB of swap memory, SunOS 5.7, OpenWindows 3.6.1, Matlab 6.1 R12. The number of rules in the fuzzy model grows exponentially as the number of input variables increases, making it computationally infeasible to create models with a large number of input variables. The amount of collected data to be used as training points is a less important limiting parameter concerning the use of memory than the number of rules. The maximum values used in performing tests are shown in Table 1. It can be easily observed in Table 1 that, to be feasible, the models could not exceed around 256 rules. It was not possible to use nine input variables, because the limit of workstation memory is reached for two clusters per variable. There is no sense in using just one cluster per variable because this produces one single rule and creates a simple linear model. For the evaluation of the fuzzy models, besides the changes in the above parameters, the effects of the changes in the number of data collected for training were also analyzed. The used FCM tolerance was 1E-6 and for the estimation of the parameters the tolerance was 1E-3. Table 2 shows the measures of errors for each model created with different input configurations. For this sort of models, 2 or 3 base flow regressor configurations presented the best results. The training times needed to generate the models were in between some minutes to hours. As an example, to the model with six base flow regressors and 2000 training points, the total training time was near 29 100 s, approximately eight hours. 5.3. Limited Rules model The FCM tolerance was fixed to 1E-6. The tolerance for the consequent parameter estimation in the first tests was set to 1E4. The collected training data were identical to those used in
163
C. Garcia et al. / ISA Transactions 47 (2008) 157–170 Table 2 Best error results for different number of base flow regressors, 2000 validation points Base flow regressors
Situation of the best result
1 2 3 4 5 6
2 clust./var. 250 points of training 2 clust./var. 1000 points of training 2 clust./var. 2000 points of training 2 clust./var. 2000 points of training 2 clust./var. 2000 points of training 2 clust./var. 2000 points of training
Squared error sum (pH) Model 1
Model 2
Maximum of the error module (pH) Model 1 Model 2
6489 1003 798 1092 1847 3176
1020 697 1089 4656 4074 2787
5.06 2.33 2.28 2.32 2.73 4.74
1.97 1.74 2.20 5.37 6.06 3.72
Mean of the error modules (pH) Model 1 Model 2 1.25 0.49 0.49 0.55 0.76 0.97
0.55 0.41 0.42 1.11 1.04 0.93
Table 3 Error results for the best model obtained taken to 4 base flow regressors by the Limited Rules method Condition
Squared error sum (pH) Model 1 Model 2
Maximum of the error module (pH) Model 1 Model 2
Mean of the error module (pH) Model 1 Model 2
20 clusters and 2000 training points
85
0.86
0.14
82
Wang–Langari method tests and the extracted results were the same. As a first try, 4 base flow regressors were taken. The cluster or rule numbers were changed between 10 and 60 at steps of 10. The validation results only for the best model are shown in Table 3. This table shows a significant improvement in the results as compared to the models created by the Wang–Langari method (Table 2). While the best Wang–Langari model (2 base flow regressors) has a squared error sum between 700 and 1000, the best Limited Rules model has the same measure around 85. Although these models were taken with tolerance 1E-4, the Limited Rules method has shown better results even with tolerance 1E-3. Fig. 4 illustrates the quality of the models obtained by both methods. 4 base flow regressors were taken, 2 pH regressors, 2000 training points and 1E-4 as tolerance for both methods. For the model created by the Wang–Langari method, 2 clusters per variable were used (the only possibility due to the memory), and for the model created by the Limited Rules method, 28 clusters were used. The influence of the number of rules was the opposite of the intuitive idea that the greater the number of rules, the better the model. This can be seen in Fig. 5. The bar graph shows a comparison between the sum of the squared error sum obtained in both tests to a different number of rules. It can be clearly observed in Fig. 5 that there is a great improvement at 20 rules or clusters. At first sight, this result does not seem to be coherent, but it is explained in the next paragraph. In general, the data set used at the model generation has information about the process, but it also has random noise from measurements or from nonmeasured input variables that affect the process. If the model has a small number of rules, it is probable that the model would not properly model the process, which implies a model not adjusted to the desired application. On the other hand, a great number of rules will model the process noise, affecting the model precision. The influence of the number of rules seems to be contrary to the intuitive idea
0.98
0.14
Fig. 4. Comparison between the validation results of the models obtained by both methods.
Fig. 5. Comparison between models generated with different number of rules.
that a larger number of rules imply a better model. This effect is equivalent to the overfitting found when ARX (AutoRegressive structure with eXogenous input) or ARMAX (AutoRegressive Moving Average structure with eXogenous input) linear system parametric models are employed with an exceeding number of regressors or to the overtraining problem in neural networks.
164
C. Garcia et al. / ISA Transactions 47 (2008) 157–170
Fig. 6. Comparison between models generated with different number of flow regressors.
For the pH neutralization process, it was experimentally found that the best number of clusters is 28, when using around 1000 training points, 4 base flow regressors and 2 pH regressors. In [4], two ways to estimate the best number of clusters are presented. The method of “Validity Measures” confirmed 28 as the best number of clusters. Other tests were performed with a more specific number of clusters. The number of base flow regressors was also changed to compare these results to those obtained using the Wang–Langari method. Based on mean error, the best models were selected to different number of base flow regressors. The results of these models are shown in Fig. 6. It can be concluded that the best models have 3 and 4 base flow regressors. The models having 1, 5 or 6 base flow regressors do not look reliable because the error values have great differences for tests with the same conditions, but different noise seeds. Although only the best models were selected, almost all of them created with 1E-4 as tolerance presented good results. This shows that even if the best structure is not chosen, it is possible to create a reasonable model of the process. 5.4. Comparison between the mean time estimation for the best models From the best models, it was possible to compare the efficiency of the generated models related to the estimation time. The training time is not so relevant in creating a virtual sensor, because it is feasible to run a training program for many days. On the other hand, the processing time or the model estimation time is a feature of great relevance. It is not feasible to use a soft sensor demanding too much processing time, because the estimations must be performed in real time. Thus,
Table 4 Comparison between mean time estimation for the best models obtained by each method Conditions
Mean time estimation (ms)
Wang–Langari, 2 regressors for base flow, 2 clusters/variable Limited Rules, 3 regressors for base flow, 20 clusters
6.3 4.7
Table 5 Results of changing the number of pH regressors Condition
Squared error sum (pH)
Maximum error module (pH)
Mean error module (pH)
3 pH regressors 2 pH regressors 1 pH regressor No pH regressor
943 1151 1471 4764
2.45 2.66 3.03 5.51
0.46 0.52 0.64 1.19
the estimation time information is closely connected to the virtual sensor viability. Table 4 shows a comparison between the best models obtained by each method using the mean time of estimation in the workstation. The fuzzy models generated by both methods presented small estimation times, thus allowing their implementation as virtual sensors to be used in inferential control. 5.5. Evaluation of the number of estimated pH regressors Only for the Limited Rules method the number of regressors was changed. 12 base flow regressors were used as an attempt to provide information about the system state. The model was generated with 2000 training points, 40 clusters and 1E-3 as tolerance. The results are shown in Table 5.
C. Garcia et al. / ISA Transactions 47 (2008) 157–170
It can be checked that there is a significant model quality improvement when the number of output regressors is changed from none to one. However, for each additional output regressor, the error reduction is smaller. 6. Overview of the embedded soft sensor The Limited Rules algorithm was implemented in the firmware of the autonomous hardware. 6.1. Hardware The hardware platform utilized is based on Texas Instruments TMS320F206 Digital Signal Processor (DSP) [9]. The choice of this platform was made based on the following reasons: • Adequate computational capacity, once the model has some parts based on difference equations; and • Easy RS-485 serial network interface. The developed prototype, called VS01A, presents the following hardware features: • TMS320F206 digital signal processor; • 4 analog input channels, with interface to signals of 0–10 V, 0–5 V, 1–5 V, 0–20 mA and 4–20 mA; • Parallel A/D converter — 12 bits; • 4 analog output channels, with interface to signals of 0–10 V, 0–5 V and 1–5 V; • Parallel D/A converter — 12 bits. 6.2. Software Before being used as an online sensor, the soft sensor must be trained. This operation was performed offline in Matlab environment, where the fuzzy model was generated based on an input/output data batch collected from the simulated pH neutralization plant. When the trained model was available, a software was developed to export it from Matlab to its autonomous hardware. Finally, when the fuzzy model was installed in the hardware, it was able to operate as a virtual sensor. A fundamental characteristic of the autonomous soft sensor is that the model is recorded in firmware, i.e., to use a new kind of soft sensor, it is only necessary to do the offline training in Matlab and store the parameters in the firmware, correcting the calculations and input/output ranges. The hardware is generic. The trained model was passed to the DSP by transferring the cluster centers and the consequent parameters into the program memory. The language used for the DSP programming was Assembly. As the used DSP is a fixed-point processor, the first software version was entirely developed on this standard, with parameters and variables normalized to the 1.15 (Q15) fractional format. However, even using scaling techniques, the calculation of the activation levels of the rules has a too large dynamic range. This fact demanded the implementation of this part of the model with floating point, by software routines. The
165
results were very satisfactory, and so the final version of the model was generated, totally developed with floating point. The time spent to run the model was about 13 ms, whereas the pH neutralization process benchmark has a sampling time of 167 ms. The following steps can describe the pH soft sensor execution algorithm: • Wait for timer interruption occurrence, with sampling time identical to the process (167 ms); • Make an A/D conversion — sampling of q3 (k); • For the current X vector (composed of four q3 regressors and two estimated pH regressors), it must be calculated, for all the clusters, the Z vector, which represents the distance between X and the cluster centers V, using Eq. (15); • Calculate the quadratic norms of the distances using Eq. (16); • For each rule, calculate the corresponding membership grades and activation levels by Eqs. (17) and (10); • Calculate the partial outputs using Eq. (9); • Calculate the global output: y(k) = pH(k) by Eq. (11); • For the case of isolated sensor operation (without controller embedded), write the pH new value to the D/A converter; • For the case of sensor + controller, run the PID algorithm with “anti reset wind-up” and write the control output to the D/A converter; and • Update the regressors and go out from timer interruption. 7. Model implemented in the embedded soft sensor The following parameters of the model were chosen: • Number of clusters (rules): 20 • Number of base flow q3 regressors: 4 • Number of estimated pH regressors: 2 The ith consequent is given by: c i (k) = bi0 + bi1 · q3 (k − 1) + bi2 · q3 (k − 2) pH + bi3 · q3 (k − 3) + bi4 · q3 (k − 4) c − 1) + bi6 · pH(k c − 2). + bi5 · pH(k Observe in this equation that real pH measurements are no c i (k). Only previous estimations longer used when estimating pH are used, as desired in the implementation of a soft sensor. The training results in a matrix with cluster centers (antecedents) and a matrix with the consequent parameters. Below there is an example of a cluster center and a linear sub model: • Center of cluster number 1: q3 (k − 1) = 0.4048 q3 (k − 2) = 0.4039 q3 (k − 3) = 0.4036 q3 (k − 4) = 0.4047 pH(k − 1) = 6.3189 pH(k − 2) = 6.3198.
166
C. Garcia et al. / ISA Transactions 47 (2008) 157–170
Fig. 7. Open loop model validation.
• Consequent parameters for rule number 1: b10 = 0.3313 b11 = 0.0491 b12 = 0.0275 b13 = −0.0172 b14 = −0.0164 b15 = 0.4054 b16 = 0.5383. 8. Tests of the software sensor Two groups of tests were made. The first one consisted of tests performed with the soft sensor operating in Matlab environment. The second group was conducted with the soft sensor installed in the autonomous hardware. The reference pH neutralization plant was always simulated in Matlab PC environment. 8.1. Soft sensor in Matlab environment 8.1.1. Open loop test The obtained model was initially tested in an open loop, for a nominal base flow q3 with the addition of white noise. In this case, the expected pH was 7.0. The result is shown in Fig. 7. This test was performed with the aim of showing that there is no drift along time in the soft sensor output, that is, even with noise in the input, the output remains stable. It can be noted in Fig. 7 that the soft sensor output is stable, nevertheless with a small offset, since the mean value of the output is around 6.88 and not 7.0 as expected. One possible reason for that offset could be a low concentration of training points around pH = 7.0.
8.1.2. Closed loop test with pH measured directly from the process output Next, the model was validated comparing the real pH curve with the estimated one. A ramped setpoint (Fig. 8) was used in a closed control loop with pH measured directly from the set process output. The controller employed was a PID. The idea behind this test was to check the behavior of the sensor in a large range of pH values (3–10), but still without the responsibility of sending its signal to the controller. For this test a high noise level was imposed in the base flow q3 , in order to check if the soft sensor was able to deal with this situation. As can be noticed in Fig. 8, the behavior of the soft sensor was satisfactory along all the tested range. The performance criterion used was the ISE (Integrated Squared Error) of the estimated pH in relation to the “real” pH of the process. For the test of Fig. 8, the ISE was 22.5 pH2 · s, integrated over 690 s of simulation. 8.1.3. Closed loop test with feedback from soft sensor In this test, a sinusoidal setpoint was used, which led the process to pH values between 4 and 10, but with the soft sensor sending its signal to the PID controller. In this case, the amount of noise in q3 was reduced, when compared to the previous test. Fig. 9 shows that the operation of the soft sensor as the feedback element was satisfactory. For this case, the ISE was 507.6 pH2 · s, integrated over 690 s of simulation. 8.2. Soft sensor in autonomous hardware To check the embedded hardware soft sensor performance, the experiment illustrated by Fig. 10 was built. In that case, as the soft sensor is placed out of the Matlab environment, installed in its autonomous hardware, it is necessary to interface the process and the controller that still remain in the Matlab environment with the external soft sensor. To do this, a data acquisition board was used, model CAD12/36, manufactured
C. Garcia et al. / ISA Transactions 47 (2008) 157–170
167
Fig. 8. Validation in a large range of pH values.
Fig. 9. Utilization of the soft sensor for closed loop control.
by Lynx Technology, S˜ao Paulo, Brazil. This acquisition board has 12 bits A/D and D/A converters, compatible with the resolution used in the soft sensor prototype. A software interface was developed to allow the communication between Matlab and the data acquisition board. 8.2.1. Open loop test Fig. 11 illustrates the result of an open loop simulation. This test was performed to check the behavior of the soft sensor, now running in its own hardware and to see the effects of the A/D and D/A conversions. This test is quite similar to the one of item 8.1.1, but now with the base flow signal going to the external soft sensor and the pH signal coming back to the Matlab environment, including all the possible noise present in the communication of those two signals.
As can be noticed in Fig. 11, the result is similar to the one of Fig. 7, but the signal variance is a little bit larger, showing the effects of the communication noise. 8.2.2. Closed loop test with feedback from pH measured directly from the process Fig. 12 shows a ramp setpoint validation. The ISE performance criterion for the ramp simulation in Fig. 12 was 41.5 pH2 · s. This ISE increase in relation to the PC soft sensor (=22.5 pH2 · s) is explained by the error sources described below: • Quantization 12–16 bits on A/D conversion; • Tolerance and “drift” of electronic components, including voltage reference circuit to A/D and D/A; and
168
C. Garcia et al. / ISA Transactions 47 (2008) 157–170
• Prototype board, without printed circuit layout and without adequate instrumentation techniques. The A/D converted value is multiplied by 8, due to Q15 representation. Therefore, on full scale (+32 767), there is an error of 7 A/D units (4095 × 8 = 32 760) for the VS01A and the same value for the CAD12/36 board. So, for the 4020 points of simulation, only by quantization, a maximum ISE = 137.4 pH2 · s (14 × (10 V/4095 = 2.44 mV) × 4020) could happen.
Fig. 10. Layout for embedded soft sensor tests.
8.2.3. Closed loop test with feedback from embedded soft sensor For the closed loop test operating with the VS01A, Fig. 13 presents the performed experiment. For a sinusoidal setpoint value, the obtained result is shown in Fig. 14.
Fig. 11. VS01A open loop operation.
Fig. 12. VS01A Validation for ramped reference value.
C. Garcia et al. / ISA Transactions 47 (2008) 157–170
169
Fig. 13. Experiment for closed loop operation with the VS01A.
The performance criterion ISE was 394.4 pH2 ·s, quite below the corresponding ISE of the PC soft sensor (=507.6 pH2 · s). This fact cannot be interpreted, under any circumstance, as an advantage of the VS01A in relation to the PC soft sensor. Probably, what happened was a “contribution” of VS01A error sources to an approximation of the estimated pH to its “real” value, especially around pH = 4.0 (see Figs. 9 and 14). 8.2.4. Closed loop test with embedded PID controller and feedback from soft sensor Fig. 15 shows the VS01A operating with an embedded PID controller and soft sensor, creating an embedded inferential control system. The results for this integrated system were similar to those obtained with the PID in Matlab, as seen in item 8.2.3. 9. Conclusions The development of virtual sensors for variables that cannot be directly measured by hard sensors represents an advance in process control. Barriers caused by the current technology limitations, such as the inexistence of sensors, high costs, high delays on measures or low precision, can be solved in this way. The use of fuzzy models has been shown to be a possible way to derive virtual sensors. From the information collected from a specific process, it is possible to obtain a fuzzy model
Fig. 15. Soft sensor + PID hardware integration.
in a black-box approach. Moreover, the Limited Rules method has been shown to be versatile enough to allow the generation of models from much more complex processes and with many more input variables than those allowed by the original Wang–Langari method. A difficulty of discrete-time fuzzy system modeling using fuzzy theory is the determination of the fuzzy model structure, i.e., the determination of the number of regressors from each variable. Many times, a small number of input variable regressors generate a low precision model, while an excessive number of regressors demand a much higher computational effort. It was verified that the accuracy of the generated fuzzy models is much dependent on the data collected from the process. The lack of information in these data generates an imprecise fuzzy model which will diverge if used as a virtual sensor. Thus, if possible, it is important to collect data for different system excitations to provide information about different states of the system. The number of input variables and the training data implies that the best number of clusters should be defined prior to the creation of the model. A small number of clusters limit
Fig. 14. Performance of the closed loop system operating with the VS01A.
170
C. Garcia et al. / ISA Transactions 47 (2008) 157–170
the model capacity to identify the process dynamics, while an excessive number of clusters will possibly model measurement errors, deteriorating the estimates’ precision. It was also observed that the computer capacity of generating, or not, a more complex model by using the proposed methods is strongly related to the number of rules in the fuzzy model. With respect to the implementation of the soft sensor in an autonomous hardware, the values of ISE obtained with the VS01A were compatible with those from the original Matlab model, providing an indication of the effectiveness of the embedded system. This prototype was successfully tested in a simulated plant. After more tests in other simulated plants and then in real plants, it could be possible to employ it in industrial applications, as a sensor, or as a more complete solution: sensor plus controller, for which the development of a network connection would be necessary. The implementation of spread used protocols like Fieldbus, Profibus, DeviceNet etc. would allow easy connection of the proposed system with almost all industrial plant control systems, and thus easy integration of setpoint and tuning adjustments as well as visualization of the controlled variable, to the already existing supervisory systems. The versatile characteristic of the hardware can be noticed, which is a single solution for all processes. It is only necessary for the input and output signals to have compatible levels (voltage range, current limits and others) with the VS01A. Despite having used fuzzy identification techniques for model generation in this paper, it is quite possible for the VS01A to use other techniques, such as Artificial Neural Networks, once the firmware is reprogrammable. References [1] Wang L, Langari R. Complex systems modeling via fuzzy logic. IEEE
[2]
[3] [4] [5] [6] [7] [8] [9]
Transactions on Systems, Man and Cybernetics — Part B: Cybernetics 1996;26(1):100–6. Takagi T, Sugeno M. Fuzzy identification of systems and its applications to modeling and control. IEEE Transactions on Systems, Man and Cybernetics 1985;SMC-15(1):116–32. Sugeno M, Kang GT. Structure identification of fuzzy model. Fuzzy Sets and Systems 1988;28(1):15–33. Hellendoorn H, Driankov D. Fuzzy model identification — Selected approaches. Springer; 1997. Oliveira CEN, Garcia C. Application of fuzzy logic to develop a soft sensor. In: XV Brazilian congress of automatica. 2004 [in Portuguese]. Hall RC. Development of a multivariable pH experiment. Master thesis. Santa Barbara: University of California; 1987. Girardot DM. Control of pH based on reaction invariants. Master thesis. Santa Barbara: University of California; 1989. Ljung L. System Identification — Theory for the User. 2nd ed. Prentice Hall; 1999. Texas Instruments. TMS320F206 Datasheet, Literature number: SPRS050A. 1998.
Claudio Garcia is an electronic engineer at the Polytechnic School of the University of S˜ao Paulo - EPUSP, Brazil, 1979. He obtained M.Sc. and Ph.D. degrees both in Electronic Engineering from EPUSP in 1987 and 1992, respectively. He has experience in the areas of process modeling, identification and control. He worked for Foxboro, Taylor, the Brazilian Navy and nowadays he is a professor at EPUSP. His research interests include industrial process control and system modeling and identification. He has published a book in Brazil called “Modeling and simulation of electromechanical systems and industrial processes” and many papers in international congresses and journals. C´assio de Carvalho Berni was born in S˜ao Paulo, Brazil, in 1976. He received the B.S. in electrical engineering from the Maua Institute of Technology, in 2000, and the M.S. degree in systems engineering from the Polytechnic School of the University of S˜ao Paulo, in 2004. He is currently working as a design engineer of electronic systems, especially involving digital signal processors and microcontroller embedded equipments. Carlos Eduardo Neri de Oliveira is an electrical engineer at the Polytechnic School of the University of S˜ao Paulo (USP), currently mastering in Systems Engineering at the same university. Since 2005, he has been working as Automation and Control Engineer at Chemtech (Brazil), developing projects for different industries.