A comparative study of nonlinear MISO process modelling techniques: application to a chemical reactor

Okba Taouali*, Saidi Nabiha** and Hassani Messaoud***

Unité de Recherche d'Automatique, Traitement de Signal et Image (ATSI), Ecole Nationale d'Ingénieurs de Monastir, Rue Ibn El Jazzar, 5019 Monastir, Tunisia

* [email protected]
** [email protected]
*** [email protected]
Abstract: This paper proposes the design and a comparative study of two nonlinear Multiple-Input Single-Output (MISO) models. The first, called the Volterra model, is built using Volterra series; the second, called the RKHS model, relies on Statistical Learning Theory (SLT), which operates on a Reproducing Kernel Hilbert Space (RKHS). The complexity of both models is examined in the SISO case as well as in the MISO one. The performances of both models are evaluated first through Monte Carlo numerical simulations and then on a chemical process known as the Continuous Stirred Tank Reactor (CSTR). In both validation experiments the results were successful.
1. INTRODUCTION

Volterra models have several important properties that make them very useful for the modelling and analysis of nonlinear systems (Dodd and Harrison, 2002): they are linear with respect to their parameters, i.e. the kernel coefficients; each homogeneous term can be viewed as a multidimensional convolution; Volterra models with finite memory are BIBO (Bounded Input Bounded Output) stable; and they allow a large class of nonlinear systems to be modelled. Indeed, it has been shown that causal time-invariant nonlinear systems with fading memory can be described to a finite degree of precision by truncated Volterra models (Boyd and Chua, 1985). However, this appeal is offset by the huge increase of the parameter number with the hardness of the nonlinearity. As an alternative to this modelling strategy, the last few years have seen the birth of a new modelling technique developed on a particular Hilbert space whose kernel is reproducing. This space, known as a Reproducing Kernel Hilbert Space (RKHS), is used together with statistical learning theory to provide an RKHS model as a linear combination of kernel functions. The RKHS model is independent of the degree and the memory that constrain models developed on Volterra series and cause the exponential increase of their parameter number. Instead, its parameter number depends only on the number of observations and may be much smaller than that of Volterra series models, especially for highly nonlinear systems (Kibangou, 2005).

In this paper we compare the Volterra model and the RKHS model in the MISO case. In Section 2 we recall the Volterra model in the SISO and MISO cases and give a brief review of the Statistical Learning Theory (SLT), which uses the RKHS to yield MISO models. Section 3 is devoted to the comparative study of these two models: for each model some setting parameters are tuned, and performances such as the parameter number, the Normalised Mean Square Error (NMSE) and the computing time are evaluated. The validation of the models is carried out through Monte Carlo simulations in which the influence of an additive noise is assessed for both models. The two models are then tested on a physical process, the Continuous Stirred Tank Reactor, and the results are successful.
2. NONLINEAR SYSTEM MODELLING

2.1 Volterra model

2.1.1 SISO Volterra model

The output of the discrete time-invariant Volterra model of order P (Boyd and Chua, 1985) is given by:

y(k) = h_0 + \sum_{i=1}^{P} \sum_{m_1=0}^{M-1} \cdots \sum_{m_i=0}^{M-1} h_i(m_1, m_2, \ldots, m_i) \prod_{j=1}^{i} u(k - m_j)   (1)

with P the nonlinearity degree, M the system memory, h_i(m_1, m_2, ..., m_i) the i-th Volterra kernel and h_0 the system steady-state characteristic. The Volterra model can be seen as a natural extension of the linear impulse response to nonlinear systems. Although it is nonlinear with respect to its inputs, this model is linear with respect to its parameters, which allows identification techniques developed for the linear case to be applied. The parameter number of the Volterra model is given by:

n_p = \sum_{i=1}^{P} M^i   (2)

To reduce the parameter number, the triangular form of the Volterra model is generally used:

y(n) = \sum_{i=1}^{P} \sum_{m_1=0}^{M-1} \cdots \sum_{m_i=m_{i-1}}^{M-1} h_i(m_1, \ldots, m_i) \prod_{j=1}^{i} u(n - m_j)   (3)

and the relevant parameter number of such a model is:

n_p = \sum_{i=1}^{P} \frac{(M-1+i)!}{(M-1)!\, i!}   (4)

2.1.2 MISO Volterra model

For a multiple-input single-output process, the output of the triangular form of the Volterra model is:

y(k) = \sum_{i=1}^{P} \sum_{j_1=1}^{n} \cdots \sum_{j_i=1}^{n} \sum_{m_1=0}^{M-1} \cdots \sum_{m_i=m_{i-1}}^{M-1} h_{j_1, \ldots, j_i}(m_1, \ldots, m_i) \prod_{e=1}^{i} u_{j_e}(k - m_e)   (5)

where u(k) = [u_1(k)  u_2(k)  \cdots  u_n(k)]^T and y(k) are the process input vector and output respectively. The parameter number is:

n_p = \sum_{i=1}^{P} n^i \frac{(M-1+i)!}{(M-1)!\, i!}   (6)
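To make the parameter counts of eqs. (2), (4) and (6) concrete, here is a minimal Python sketch (ours, not part of the paper) that counts the triangular-form parameters and builds the regressor matrix of eq. (3), so that the model output is linear in the kernel coefficients. Names such as volterra_regressors are illustrative.

```python
import math
from itertools import combinations_with_replacement

import numpy as np

def n_params_triangular(P, M, n=1):
    """Parameter number of the triangular Volterra model, eqs. (4) and (6)."""
    return sum(n**i * math.factorial(M - 1 + i)
               // (math.factorial(M - 1) * math.factorial(i))
               for i in range(1, P + 1))

def volterra_regressors(u, P, M):
    """Triangular Volterra regressor matrix for a SISO input sequence u.

    Row k collects the products prod_j u[k - m_j] over the ordered lag
    tuples 0 <= m_1 <= ... <= m_i <= M-1, i = 1..P, so the output of
    eq. (3) is linear in the kernel coefficients: y = Phi @ h.
    """
    rows = []
    for k in range(M - 1, len(u)):          # wait until M samples exist
        row = []
        for i in range(1, P + 1):
            for lags in combinations_with_replacement(range(M), i):
                row.append(np.prod([u[k - m] for m in lags]))
        rows.append(row)
    return np.asarray(rows)

# Example: P = 2, M = 2 gives 5 SISO parameters, matching eq. (4);
# with n = 2 inputs, eq. (6) gives the 16 parameters quoted in Section 3.
u = np.random.randn(200)
Phi = volterra_regressors(u, P=2, M=2)
assert Phi.shape[1] == n_params_triangular(2, 2)   # 2 + 3 = 5
assert n_params_triangular(2, 2, n=2) == 16
```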
2.2 RKHS model

2.2.1 Statistical Learning Theory (SLT)

Statistical Learning Theory (Vapnik, 1995, 1998) aims to model an input/output process from a set of observations D = {(x_1, y_1), ..., (x_N, y_N)} by selecting, in a predefined set of functions H, the function f_0 that best fits the relation between the process inputs x_i and outputs y_i. The optimal function is the one that minimizes the following functional risk:

R(f) = \int_{X \times Y} V(y, f(x))\, P(x, y)\, dx\, dy   (7)

where (X, Y) is a random vector of distribution P, of which the (x_i, y_i) are independent observations, and V(y, f(x)) is a cost function that measures the deviation between the process output y and its approximation f(x). Since the distribution P(x, y) is unknown, the risk R(f) cannot be evaluated. To overcome this, we minimize instead the empirical risk R_emp(f):

R_{emp}(f) = \frac{1}{N} \sum_{i=1}^{N} V(y_i, f(x_i))   (8)

where N is the number of observations and V(y_i, f(x_i)) is a loss function. However, the direct minimization of R_emp(f) in H is not the best estimate of the minimizer of the risk R(f). Indeed, the minimization of the empirical risk often leads to overfitting the selected function to the data, and the generalization of the model to observations other than those in D may not be guaranteed. To solve this problem, Vapnik (1998) proposed the structural risk minimization (SRM) principle, which consists in penalizing the empirical risk by a function estimating the complexity of the selected model. This leads to the minimization of the criterion:

\min_{f \in H} D(f) = \frac{1}{N} \sum_{i=1}^{N} V(y_i, f(x_i)) + \lambda \| f \|_H^2   (9)

where the parameter λ is a regularization term that adjusts the trade-off between the minimization of the empirical risk and the generalization ability. The bigger λ is, the more regular the solution will be.
2.2.2 RKHS and representer theorem

The minimization of the criterion (9) over an arbitrary function space H, possibly of infinite dimension, is not an easy problem. However, this task may be accomplished easily when this space is a Reproducing Kernel Hilbert Space (RKHS).

Let us assume that the random variable X takes its values in the space E ⊂ ℝ^d, and consider a function K : E² → ℝ, called a kernel, such that:

a. K is symmetric;
b. K is positive definite, i.e., for any integer n, any sequence of elements (x_i)_{i=1,...,n} ∈ E and any set of coefficients (c_i)_{i=1,...,n}, we have:

\sum_{i=1}^{n} \sum_{j=1}^{n} c_i c_j K(x_i, x_j) \geq 0   (10)

In this case there exist (Aronszajn, 1950) a Hilbert space H and a function Γ : E → H such that, for any x, x' in E,

K(x, x') = \langle \Gamma(x), \Gamma(x') \rangle_H   (11)

H is called the Reproducing Kernel Hilbert Space (RKHS) of kernel K. This space has some specific properties, in particular:

a. \forall x \in E and f \in H, \langle K(x, \cdot), f \rangle_H = f(x)   (12)

Thanks to this property (a), also called the kernel trick, the space H becomes implicit and is accessible simply by means of its kernel.

b. The representer theorem (Wahba, 2000) shows that the solution of the optimization problem (9) in this space can be written as:

f_{opt} = \sum_{i=1}^{N} a_i K(x_i, \cdot)   (13)

In this case, the optimization problem (9) is equivalent to a quadratic optimization problem in the N reals (a_i). In particular, the squared norm of the solution function f in the space H is given by:

\| f \|_H^2 = \sum_{i=1}^{N} \sum_{j=1}^{N} a_i a_j K(x_i, x_j)   (14)

The algorithms used to estimate the parameters a_i in (13) are called learning machines, such as the support vector machine (SVM) and the regularization network (RN), for which the cost function is:

V(y_i, f(x_i)) = (y_i - f(x_i))^2   (15)

The optimal function is then given by (13), where the coefficients {a_i} are such that:

a_i = \sum_{j=1}^{N} \left[ (G + \lambda I)^{-1} \right]_{ij} y_j   (16)

where G is the Gram matrix associated with the kernel K, G_{ij} = K(x_i, x_j), and Y is the output vector. Different types of kernels can be considered:

RBF:        k(x, x') = \exp\left( -\| x - x' \|^2 / (2 p_1^2) \right)
ERBF:       k(x, x') = \exp\left( -\| x - x' \| / (2 p_1^2) \right)
Polynomial: k(x, x') = (1 + \langle x, x' \rangle)^{p_1}

where p_1 is a given parameter.
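As an illustration of eqs. (13) and (16), here is a minimal Python sketch (ours) of a regularization network with the RBF kernel; the helper names are hypothetical, and the kernel-width convention follows the RBF definition above.

```python
import numpy as np

def rbf_kernel(X, Xp, p1):
    """RBF kernel k(x, x') = exp(-||x - x'||^2 / (2 p1^2)) between row sets."""
    d2 = ((X[:, None, :] - Xp[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * p1 ** 2))

def fit_rn(X, y, p1, lam):
    """Regularization network: solve (G + lambda*I) a = y, eq. (16)."""
    G = rbf_kernel(X, X, p1)                 # Gram matrix G_ij = K(x_i, x_j)
    return np.linalg.solve(G + lam * np.eye(len(y)), y)

def predict_rn(X_train, a, X_new, p1):
    """Evaluate f_opt(x) = sum_i a_i K(x_i, x), eq. (13)."""
    return rbf_kernel(X_new, X_train, p1) @ a

# Toy usage on a scalar nonlinear map.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (100, 1))
y = np.sin(3 * X[:, 0]) + 0.05 * rng.standard_normal(100)
a = fit_rn(X, y, p1=0.3, lam=1e-6)
y_hat = predict_rn(X, a, X, p1=0.3)
```

Note that the number of parameters a_i equals the number of learning observations N, independently of the hardness of the nonlinearity, which is the complexity property discussed in Section 3.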
2.2.3 RKHS MISO model

In the case of a MISO model (Taouali et al., 2008) the output can be written as:

y(k) = \varphi(u_1, \ldots, u_p, k) + e(k)   (17)

where φ is a nonlinear function, p is the number of inputs and e(k) is an additive noise. The input vector can be defined as:

x = [u_1(k), \ldots, u_1(M_1 + k - 1), \ldots, u_p(k), \ldots, u_p(M_p + k - 1)]^T, \quad k = 1, \ldots, N - M_p + 1   (18)

where N is the number of observations and M_p is the memory of the p-th input.
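A short sketch (ours) of how the regression vectors of eq. (18) can be stacked from p input sequences with memories M_q; the helper name miso_regressors is illustrative.

```python
import numpy as np

def miso_regressors(inputs, memories):
    """Stack the regression vectors of eq. (18).

    inputs:   list of the p input sequences u_q, q = 1..p
    memories: list of the p memories M_q
    Each row is x = [u_1(k),...,u_1(k+M_1-1), ..., u_p(k),...,u_p(k+M_p-1)].
    """
    N = min(len(u) for u in inputs)
    K = N - max(memories) + 1                # k = 1, ..., N - M_p + 1
    rows = [np.concatenate([u[k:k + M] for u, M in zip(inputs, memories)])
            for k in range(K)]
    return np.asarray(rows)

# Two inputs with memories 2 and 3 give 98 regression vectors of dimension 5.
u1, u2 = np.random.randn(100), np.random.randn(100)
X = miso_regressors([u1, u2], [2, 3])
assert X.shape == (98, 5)
```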
3. COMPARATIVE STUDY OF BOTH MODELS

In this section we compare the Volterra and RKHS models on a MISO process.

3.1 Numerical example

Consider the system described by the following relation:

y(k) = 0.3\, u_1(k-1) + 0.2\, u_2^3(k-1) + e(k)   (19)

where u_1 and u_2 are two Gaussian inputs and e is a white noise. The reported results are obtained after 20 Monte Carlo trials, each containing 100 runs. The model performance is evaluated using the Normalized Mean Square Error (NMSE) between the system output and the model output:

NMSE = \frac{\sum_{k=1}^{N} \left( y(k) - \tilde{y}(k) \right)^2}{\sum_{k=1}^{N} y(k)^2}   (20)

with N the number of observations, y(k) the output of the system and ỹ(k) the model output.

In Figure 1 we plot the process output and the output of the RKHS model, using 100 samples as the learning set to estimate the parameters given by the representer theorem. The kernel used is polynomial with p_1 = 3. For model validation we use 120 samples other than those used for identification. The model output fits the process output with an NMSE of 0.48 %.

Fig.1: Validation of the RKHS model (polynomial kernel); identification phase (100 samples) and validation phase (120 samples).

In Figure 2 we plot the Volterra model output and the process output for a nonlinearity degree P = 2 and a memory M = 2. We notice the agreement between both outputs, with an NMSE of 0.51 %.

Fig.2: Validation of the Volterra model (P = 2, M = 2).
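The experiment of this section can be reproduced in outline as follows. This Python sketch (ours) simulates eq. (19), fits the polynomial-kernel regularization network of eqs. (13) and (16), and evaluates the NMSE of eq. (20). The seed, noise level and regularization value are assumptions, so the numbers will differ from Table 1.

```python
import numpy as np

def poly_kernel(X, Xp, p1=3):
    """Polynomial kernel k(x, x') = (1 + <x, x'>)^p1."""
    return (1.0 + X @ Xp.T) ** p1

rng = np.random.default_rng(1)
N = 221
u1, u2 = rng.standard_normal(N), rng.standard_normal(N)
e = 0.01 * rng.standard_normal(N)
# System of eq. (19): y(k) = 0.3 u1(k-1) + 0.2 u2^3(k-1) + e(k)
y = 0.3 * u1[:-1] + 0.2 * u2[:-1] ** 3 + e[1:]
X = np.column_stack([u1[:-1], u2[:-1]])

# 100 learning samples and 120 validation samples, as in Section 3.1.
Xl, yl, Xv, yv = X[:100], y[:100], X[100:], y[100:]
G = poly_kernel(Xl, Xl)
a = np.linalg.solve(G + 1e-8 * np.eye(100), yl)    # eq. (16)
y_hat = poly_kernel(Xv, Xl) @ a                    # eq. (13)

def nmse(y_true, y_model):
    """Normalized Mean Square Error of eq. (20), in percent."""
    return 100 * np.sum((y_true - y_model) ** 2) / np.sum(y_true ** 2)

print(f"validation NMSE = {nmse(yv, y_hat):.2f} %")
```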
3.1.1 Model comparison

Table 1 (given after Figures 3 and 4) summarises the performances of both models: the parameter number (np), the Normalized Mean Square Error (NMSE) and the computing time (CT). For the Volterra model, three sets of structure parameters (nonlinearity degree P and memory M) are considered. Even though the NMSE and the computing times are comparable for both models, the parameter number is smaller for the Volterra model on this example. However, the complexity of the Volterra model increases with the memory M and the nonlinearity degree P, which grows when a hard nonlinearity is considered, whereas the complexity of the RKHS model depends only on the number of observations used in the learning step. The RKHS model is therefore more suitable for modelling hard nonlinearities.
3.1.2 Noise effect

To assess the influence of an additive noise on the identification quality, we plot in Figures 3 and 4 the evolution of the NMSE for both models at different values of the Signal-to-Noise Ratio (SNR):
SNR = \frac{\sum_{k=1}^{N} \left( y(k) - \bar{y} \right)^2}{\sum_{k=1}^{N} \left( e(k) - \bar{e} \right)^2}   (21)

with ȳ and ē the mean values of the output and the noise respectively.
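To generate noisy outputs at a prescribed SNR, per eq. (21), the noise can be scaled as in this sketch (ours); scale_noise_to_snr is an illustrative helper, not part of the paper.

```python
import numpy as np

def scale_noise_to_snr(y, e, snr_target):
    """Scale noise e so that eq. (21) gives SNR = snr_target for output y."""
    snr_raw = np.sum((y - y.mean()) ** 2) / np.sum((e - e.mean()) ** 2)
    return e * np.sqrt(snr_raw / snr_target)

# Usage: corrupt a clean output with noise at SNR = 10 before identification.
rng = np.random.default_rng(2)
y_clean = np.sin(np.linspace(0, 6, 200))
e = scale_noise_to_snr(y_clean, rng.standard_normal(200), snr_target=10.0)
y_noisy = y_clean + e
```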
Fig.3: Noise effect on the MISO Volterra model; NMSE for SNR = 5, 7, 10, 20, 30, 50 and for the noise-free case.

Fig.4: Noise effect on the MISO RKHS model; NMSE for SNR = 5, 7, 10, 20, 30, 50 and for the noise-free case.

It is noted that the error decreases as the SNR increases.

Table 1: Performances of both models

Model            Tuning parameters             np    NMSE (%)   CT (s)
RKHS model       polynomial kernel (p1 = 3)    100   0.48       0.78
Volterra model   P = 2 and M = 1               6     3          0.046
Volterra model   P = 2 and M = 2               16    0.51       0.072
Volterra model   P = 3 and M = 2               48    0.64       0.82
3.2 Chemical reactor modelling

3.2.1 Process description

To test the effectiveness of the RKHS and Volterra models, we apply them to a Continuous Stirred Tank Reactor (CSTR), a nonlinear system used to carry out chemical reactions (Demuth et al., 2007). A diagram of the reactor is given in Figure 5.

Fig. 5: Chemical reactor diagram (w1: feed of reactant 1; Cb1: concentration of reactant 1; w2: feed of reactant 2; Cb2: concentration of reactant 2; h: mixture height; w0: product feed; Cb: product concentration).

The physical equations describing the process are:

\frac{dh(t)}{dt} = w_1(t) + w_2(t) - 0.2 \sqrt{h(t)}

\frac{dC_b(t)}{dt} = \left( C_{b1} - C_b(t) \right) \frac{w_1(t)}{h(t)} + \left( C_{b2} - C_b(t) \right) \frac{w_2(t)}{h(t)} - \frac{k_1 C_b(t)}{\left( 1 + k_2 C_b(t) \right)^2}

where h(t) is the height of the mixture in the reactor and w_1 (resp. w_2) is the feed of reactant 1 (resp. reactant 2) with concentration C_{b1} (resp. C_{b2}). The product feed is w_0 and its concentration is C_b. k_1 and k_2 are the reactant consumption rates. The temperature in the reactor is assumed constant and equal to the ambient temperature. We are interested in modelling the subsystem presented in Figure 6, where k_1, k_2, w_2 and C_{b2} are assumed constant, so that it fits a MISO process whose inputs are the feed w_1 and the concentration C_{b1} of reactant 1 and whose output is the product concentration C_b.

Fig. 6: Considered subsystem (inputs: w1, feed of reactant 1, and Cb1, concentration of reactant 1; output: Cb, product concentration).

For the simulations we used the CSTR model of the reactor provided with Simulink of Matlab.
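For readers without the Simulink model, the two balance equations above can be integrated directly. The following Python sketch (ours) uses a simple Euler scheme; the constants w2, Cb2, k1, k2 and the operating point are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def simulate_cstr(w1, cb1, dt=0.1, h0=10.0, cb0=20.0,
                  w2=0.1, cb2=0.1, k1=1.0, k2=1.0):
    """Euler integration of the two CSTR balance equations above.

    w1, cb1 : arrays holding the two MISO inputs (feed and concentration
              of reactant 1); the constants are illustrative assumptions.
    Returns the product concentration trajectory Cb(k).
    """
    h, cb = h0, cb0
    out = np.empty(len(w1))
    for k in range(len(w1)):
        dh = w1[k] + w2 - 0.2 * np.sqrt(h)
        dcb = ((cb1[k] - cb) * w1[k] / h + (cb2 - cb) * w2 / h
               - k1 * cb / (1.0 + k2 * cb) ** 2)
        h, cb = h + dt * dh, cb + dt * dcb
        out[k] = cb
    return out

# Excite the subsystem with slowly varying inputs around an operating point,
# then pair (w1, cb1) with cb to build the learning and validation sets.
t = np.arange(0, 100, 0.1)
w1 = 0.1 + 0.02 * np.sin(0.05 * t)
cb1 = 24.9 + 0.5 * np.sin(0.03 * t)
cb = simulate_cstr(w1, cb1)
```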
3.2.2 MISO RKHS model

We used the support vector machine (SVM) with the RBF kernel. The optimal parameters p_1 and λ of this learning machine are obtained by a cross-validation technique: p_1 = 400, λ = 10^-8. In the learning phase we use a training set of 100 input/output observations, and in the validation phase we use 300 new observations to determine the performance of the RKHS model. In Figure 7 we plot the RKHS model output and the process output in the validation phase; we notice the agreement between both outputs, with an NMSE of 7.848·10^-3 %.

Fig.7: Validation of the RKHS model.

3.2.3 MISO Volterra model

This process can be modelled by a MISO Volterra model with nonlinearity degree P = 2 and memory M = 2; the input number is n = 2 and the parameter number is 16. In Figure 8 we plot the validation of the model output; the resulting NMSE is 0.0485 %.

Fig. 8: Validation of the MISO Volterra model (P = 2, M = 2).

In Table 2 we report the parameter numbers of the MISO Volterra model and the MISO RKHS model. We conclude that the MISO Volterra model is more efficient here because it has fewer parameters for a comparable NMSE.

Table 2: Performances of both models

Model            Model output    np    NMSE (%)
RKHS model       concentration   100   7.848·10^-3
Volterra model   concentration   16    0.0485
4. CONCLUSION

This paper has dealt with the study and comparison of two nonlinear MISO system modelling techniques: the Volterra model and the RKHS model. It has been shown that the complexity of the Volterra model depends on the kind and hardness of the process nonlinearity, whereas that of the RKHS model depends only on the number of observations. Monte Carlo simulations were carried out to evaluate the performances of both models and the influence of an additive noise on the identification quality. Both models were then applied to the modelling of a chemical reactor, and the results are successful.

REFERENCES

Aronszajn, N. (1950). Theory of reproducing kernels. Transactions of the American Mathematical Society, Vol. 68, pp. 337-404.
Boyd, S. and Chua, L.O. (1985). Fading memory and the problem of approximating nonlinear operators with Volterra series. IEEE Transactions on Circuits and Systems, Vol. CAS-32, No. 11, pp. 1150-1171.
Demuth, H., Beale, M. and Hagan, M. (2007). Neural Network Toolbox 5, User's Guide. The MathWorks.
Dodd, T.J. and Harrison, R.F. (2002). A new solution to Volterra series estimation. Proceedings of the 15th IFAC World Congress on Automatic Control, Barcelona, Spain.
Kibangou, A. (2005). Modèle de Volterra à complexité réduite : estimation paramétrique et application à l'égalisation des canaux de communication. PhD thesis, Université de Nice Sophia Antipolis, France.
Taouali, O., Ilyes, A. and Hassani, M. (2008). Identification des systèmes MISO non linéaires modélisés dans les espaces de Hilbert à noyau reproduisant (RKHS). Cinquième Conférence Internationale d'Electrotechnique et d'Automatique, 02-04 May 2008, Hammamet, Tunisia.
Vapnik, V.N. (1995). The Nature of Statistical Learning Theory. Springer, New York.
Vapnik, V.N. (1998). Statistical Learning Theory. Wiley, New York.
Wahba, G. (2000). An introduction to model building with reproducing kernel Hilbert spaces. Technical Report No. 1020, Department of Statistics, University of Wisconsin.