Nonlinear Analysis: Hybrid Systems 2 (2008) 735–749 www.elsevier.com/locate/nahs
Online clustering of switching models based on a subspace framework K.M. Pekpe a,1 , S. Lecœuche b,∗ a LAGIS - UMR CNRS 8146, Universite des Sciences et Technologies de Lille U.S.T.L., Cite Scientifique 59655 Villeneuve d’Ascq Cedex, France b LAGIS - UMR CNRS 8146, Ecole des Mines de Douai, Département Informatique et Automatique, 59508 Douai Cedex, France
Abstract This paper deals with the modelling of switching systems and focuses on the characterization of the local functioning modes using an online clustering approach. The system considered is represented as a weighted sum of local linear models, where each model may have its own structure. This implies that the parameters and the order of the switching system may change when the system switches. Moreover, the local models may contain unknown constant terms. The method presented consists of two steps. First, an online estimation method of the Markov parameters matrix of the local linear models is established. Secondly, the labelling of these parameters is done using a dynamical decision space worked out with learning techniques, each local model being represented by a cluster. The paper ends with an example and a discussion illustrating the method's performance. © 2007 Elsevier Ltd. All rights reserved.
Keywords: Identification; Markov parameters estimation; Machine learning; Dynamical classification; Switching systems
1. Introduction Switching systems are a particular class of hybrid systems which can be considered as a weighted sum of linear local models with normalized weights. In this type of modelling, contrary to more general hybrid systems, only one model is active at each time and thus the weights are binary. In fact, a great number of complex industrial applications which include both discrete-event and continuous phenomena can be modelled by switching systems, such models having interesting approximation properties. The identification of such systems is a recent and increasingly active research area. A large majority of the contributions published in the hybrid identification literature deals with the subclass of piecewise affine models, and several methods have been proposed. The challenge of the identification of hybrid systems can be summarized as the need to distinguish the current local mode and at the same time to estimate the model parameters. Hence, each hybrid identification technique is a combination of clustering, regression, and optimization. A clustering method based on hyperplane determination is proposed in [6], which consists of the estimation of two hyperplanes continuously joined together. Inspired by pattern recognition ∗ Corresponding author. Tel.: +33 0 327712445; fax: +33 0 327712980.
E-mail addresses:
[email protected] (K.M. Pekpe),
[email protected] (S. Lecœuche). 1 Tel.: +33 0 320434565; fax: +33 0 320337189.
1751-570X/$ - see front matter © 2007 Elsevier Ltd. All rights reserved. doi:10.1016/j.nahs.2007.11.006
techniques, [10] proposes to partition the regressor space into regions on which a linear local model is valid and then to estimate each model's parameters by standard least squares regression. The method proposed in [4] uses a mixed integer programming technique which is NP-hard in the worst case and is unfortunately applicable in practice only when the number of data is very small. An extension of this method has been presented in [21], where the solution is of lower complexity thanks to the introduction of a change detection algorithm. This algorithm requires the use of a sliding window where only one switch is allowed; this implies an assumption on the minimal duration between two switches. In the above methods, the order of the system cannot change. [24] proposes an identification of PWARX hybrid models with unknown and different orders. This work is based on an algebraic approach in which homogeneous polynomials are used to realize a segmentation of the regression space into regions which correspond to discrete states. This approach is exact only in a noise-free context (see also [14]). It is also possible to cluster (in a suboptimal way) the generated data by sorting them using Bayesian inference, but again, the system order and the number of modes have to be known a priori [13]. A common characteristic of these previous methods is that they are iterative and difficult to apply online. Most of them rely on data partitioning, either to compute the regions of the regression space and then the related parameters [13,10,21], or to estimate first the parameters so as to derive the switching times [24]. A majority of works deal with the identification of ARX models such as switched ARX models and piecewise affine models. Contrary to input–output models, state-space representations are more appropriate for system analysis in the case of multivariable models.
Some authors [18,5] have recently addressed the problem of subspace identification of switching MIMO systems in an off-line context. In [22], the identification problem of piecewise linear systems under a state-space model is undertaken by using the assumptions that the number of modes and the switching times are known. The state representation of the local models is determined by subspace methods. To obtain a complete representation of the piecewise linear system, transformations between the linear local models are used to characterize the models in the same basis. In the context of possible structure changes, [19] uses a change detection technique to estimate the switches and a subspace method to estimate the Markov parameters (state representation) of the local models. In this case, as the state is generally unknown, the partitioning of the regression space becomes harder. The detection of the switching times seems to be the alternative for dealing with switching MIMO systems using subspace methods, but the subspace approach requires a minimum residence time for each mode. In fact, switching systems may also be regarded as time-varying systems. In this case, the data used off-line for the identification of such systems may cover only some modes of the system and not all of them. The lack of adaptation abilities is probably the weak point of off-line identification methods. This is mainly due to the fact that online mode clustering does not seem to have received great attention in the hybrid systems community. To the best of our knowledge, only [23,11,3] deal with the identification of switching systems in an online context. The papers proposed by R. Vidal are clever extensions of his algebraic geometric approach that enable its application without knowledge of the number of modes and their order, but are limited to SISO systems. L.
Bako's paper presents a recursive technique based on the state-space model and subspace tracking algorithms and demonstrates the possibility of identifying online MIMO linear piecewise systems in a recursive way using a detection approach. Recently, [16] presented an approach based on the combination of a recursive identification technique and a classifier set for nonstationary environments. However, this approach is limited to non-stationary systems with a fixed structure. From this concise overview (see [20,17] for a more thorough one), we note that few works have been dedicated to the online identification of hybrid systems and that none has addressed the context of state-space modelling without constraints on the order and on the number of models. In this paper, we propose an online method for switching system modelling based on state-space models. The proposed technique is complementary to the existing ones in the sense that it can be applied to a new range of identification problems. This method is designed to be recursively applied to MIMO hybrid systems with minimal a priori knowledge. The only required assumptions are: each model has to be stable, and the period between two switches should be greater than a lower bound. The number of local linear models is unknown and may change with time. The order may change for each local model and constants of the local linear models may occur.2 In fact, the knowledge of the system is obtained using online estimation and unsupervised continuous learning. The approach developed here is similar to the one proposed in [16]: one stage for the estimation and one stage for the classification. 2 Constants could correspond to set points and/or constant faults on sensors or actuators.
The paper's structure is as follows. After presenting the overall formulation of the problem in the second section, the third and fourth sections are dedicated to the presentation of the online estimation method and of the online clustering technique respectively. The last section is dedicated to the experimental evaluation of the method. Its first subsection illustrates an application of this method to a free-surface hydraulic system. Its second subsection presents a discussion of the hyperparameters based on a simulated example of a MIMO switching system.

2. Problem formulation

The output of the switching system is represented as a weighted sum of the outputs of h local models:

$$y_k = \sum_{s=1}^{h} p_{s,k}\, y_{s,k} \quad \text{with } p_{s,k} \in \{0,1\} \ \text{and} \ \sum_{s=1}^{h} p_{s,k} = 1, \ \forall k \quad (1)$$
where the scalar $k$ represents the time index, the vector $y_k \in \mathbb{R}^{\ell}$ the output of the system, the vector $y_{s,k} \in \mathbb{R}^{\ell}$ the output of local model $s$ and the scalar $p_{s,k}$ the weight associated with $y_{s,k}$. At each time $k$, the weights sum to 1. Each local model is represented by the linear state-space model of order $n_s$:

$$\begin{aligned} x_{s,k+1} &= A_s x_{s,k} + B_s(\mu_k - u_s) + v_{s,k} \\ y_{s,k} &= C_s x_{s,k} + D_s(\mu_k - u_s) + y_s + w_{s,k} \end{aligned} \quad (2)$$
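The switching behaviour of (1)–(2) can be sketched in a few lines. The following minimal simulation uses two illustrative stable local models; all numerical values are our own choices, not taken from the paper, and inactive model states are simply frozen (one possible convention).

```python
import numpy as np

# Minimal simulation of the switching model (1)-(2). The two local models and
# the schedule are illustrative; inactive model states are frozen here.
rng = np.random.default_rng(0)

models = [
    # (A, B, C, D) for two assumed stable SISO local models
    (np.array([[0.5]]), np.array([[1.0]]), np.array([[1.0]]), np.array([[0.0]])),
    (np.array([[-0.3]]), np.array([[0.8]]), np.array([[1.2]]), np.array([[0.1]])),
]

def simulate(schedule, u, noise_std=0.0):
    """y_k = sum_s p_{s,k} y_{s,k} with binary weights: schedule[k] is the active mode."""
    states = [np.zeros((A.shape[0], 1)) for A, _, _, _ in models]
    y = np.zeros((len(schedule), 1))
    for k, s in enumerate(schedule):
        A, B, C, D = models[s]
        uk = u[k].reshape(-1, 1)
        y[k] = (C @ states[s] + D @ uk).ravel() + noise_std * rng.standard_normal()
        states[s] = A @ states[s] + B @ uk
    return y

T = 200
u = rng.standard_normal((T, 1))
schedule = [0] * 100 + [1] * 100   # one switch; dwell time well above the estimation window
y = simulate(schedule, u)
```

The binary weights of (1) mean only the active model's output is observed; whether inactive states keep evolving or are frozen, as here, is a modelling choice the paper leaves open.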
where $\mu_k \in \mathbb{R}^m$ is the input vector and $(u_s, y_s)$ are unknown constants of local model $s$. The deviation of the input vector for the local model $s$ is denoted by $u_{s,k} \in \mathbb{R}^m$ ($u_{s,k} = \mu_k - u_s$). For simplicity's sake, the vector $u_{s,k}$ is denoted $u_k$ in what follows. The process noise $v_{s,k} \in \mathbb{R}^{n_s}$ and the measurement noise $w_{s,k} \in \mathbb{R}^{\ell}$ of the sth local model are zero-mean white noises which are uncorrelated with the inputs $\mu_k \in \mathbb{R}^m$. The vector $x_{s,k} \in \mathbb{R}^{n_s}$ represents the state vector of model $s$. The local models are assumed to be stable and to remain active during a minimal time η [12], this parameter being called the dwell time. From available measurements of the inputs $\mu_k$ and outputs $y_k$, our aim is to characterize the current mode using an online classification of the Markov parameter vectors. The final result will be to estimate the membership weights $p_{s,k}$ at each time $k$. This task is done in two stages described hereafter:
- Online local model Markov parameters estimation. First, the Markov parameters matrix of the local models is estimated online. This black-box estimation uses the state-space representation, which is particularly adapted to MIMO processes and does not require any canonical parameterization. Moreover, the use of the Markov parameters suits the estimation of systems with variable structure or order. The method presented is based on the FIR modelling of the local models. It is shown that the regressors of an FIR model lie in a hyperplane whose orthogonal matrix is the Markov parameters matrix. The Markov parameters of each local model are estimated by a least squares method from data taken on a sliding window.
- Determination of the current functioning mode. This classification is based on a dynamical decision space obtained by online learning techniques. The novelty of the proposed approach consists in exploiting a specific clustering technique, making possible the online modelling of the functioning modes.
These modes are modelled by a dynamical neural network technique [15]. When a new observation (a new estimate of the Markov parameters matrix) is sent to this algorithm, the decision space is updated according to the information brought by this observation and the current functioning mode is determined (the closest linear local model). The characterization of the current functioning mode is given in terms of the membership degree of the identified parameter vector to the updated classes. The classes represent the current linear local models. These two stages are presented in detail in the two following sections.

3. Local models parameters estimation

The online estimation of the local model's Markov parameters matrix is discussed in this section. First, it is established that the regressors of one local model belong to the same hyperplane, and that an orthogonal matrix of this hyperplane is the Markov parameters matrix. Then, in order to characterize the modes, the online estimation of the Markov parameters matrix by the least squares method is proposed.
3.1. Local model hyperplane equation

The aim of this section is to perform the online estimation of the Markov parameters of the local models. To reach this goal, it is first established that all the regressors built from the data generated by the same local model belong to the same hyperplane. This is proved from the expression of the local model's output in terms of the state and of the inputs multiplied by the Markov parameters. When this expression is developed, the state influence is removed because it is weighted by a high power of the local state matrix $A_s$, which is assumed to be strictly stable. This implies that each local model output equals the inputs multiplied by the Markov parameters, as in the approximation by an FIR model. This equality can be rewritten as the orthogonality between the regression vector (which contains the local model output and the inputs) and the Markov parameters matrix. The sth local model being active, its output can be expressed (see relation (2)) as:

$$y_{s,k} = C_s A_s^{i-2} B_s u_{k-i+1} + \cdots + C_s B_s u_{k-1} + D_s u_k + C_s A_s^{i-2} B_s u_s + \cdots + C_s B_s u_s + D_s u_s + C_s A_s^{i-2} v_{s,k-i+1} + \cdots + C_s v_{s,k-1} + w_{s,k} + C_s A_s^{i-1} x_{s,k-i+1} + y_s. \quad (3)$$

As the local model is assumed to be stable, the term $C_s A_s^{i-1} x_{s,k-i+1}$ can be neglected for high values of $i$. This term is considered negligible if:

$$\left\| C_s A_s^{i-1} x_{s,k-i+1} \right\| \le \sqrt{\operatorname{variance}(w_{s,k})}. \quad (4)$$
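Condition (4) suggests a simple practical recipe for choosing the FIR horizon $i$: increase it until the weight $\|C_s A_s^{i-1}\|$ of the unknown initial state falls below the noise level. A small sketch, with illustrative $A$, $C$ and noise level (not values from the paper):

```python
import numpy as np

# Choosing the FIR horizon i from condition (4): increase i until the weight
# ||C A^{i-1}|| of the unknown initial state drops below the noise level.
# A and C are illustrative (spectral radius 0.7), not taken from the paper.
A = np.array([[0.6, 0.1], [0.0, 0.7]])
C = np.array([[1.0, 0.5]])
noise_std = 1e-3

i = 1
while np.linalg.norm(C @ np.linalg.matrix_power(A, i - 1)) > noise_std:
    i += 1
# i is now the smallest horizon with ||C A^{i-1}|| <= noise_std
print(i)
```

The closer the spectral radius of $A_s$ is to 1, the larger the required horizon, which in turn lengthens the minimal dwell time needed by the estimation stage.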
Thus, (3) becomes:

$$y_{s,k} = H_{s,i}\,\bar u_k + H_{s,i}\,\bar u_s + y_s + \beta_{s,k} \quad (5)$$

where the Markov parameters matrices $H_{s,i} \in \mathbb{R}^{\ell \times mi}$ and $H^v_{s,i} \in \mathbb{R}^{\ell \times n_s(i-1)}$, the perturbation $\beta_{s,k}$ (due to the noise and to the approximation of the local model by an FIR model), the centred input vector $\bar u_k \in \mathbb{R}^{mi}$, the constant vector $\bar u_s \in \mathbb{R}^{mi}$ and the process noise vector $\bar v_{s,k-1}$ are defined as:

$$H_{s,i} = \begin{bmatrix} C_s A_s^{i-2} B_s & \cdots & C_s B_s & D_s \end{bmatrix}, \qquad H^v_{s,i} = \begin{bmatrix} C_s A_s^{i-2} & \cdots & C_s A_s & C_s \end{bmatrix}, \quad (6)$$

$$\beta_{s,k} = C_s A_s^{i-1} x_{s,k-i+1} + H^v_{s,i}\,\bar v_{s,k-1} + w_{s,k}, \quad (7)$$

$$\bar u_k = \begin{bmatrix} u_{k-i+1} \\ \vdots \\ u_{k-1} \\ u_k \end{bmatrix}, \qquad \bar u_s = \begin{bmatrix} u_s \\ \vdots \\ u_s \end{bmatrix}, \qquad \bar v_{s,k-1} = \begin{bmatrix} v_{s,k-i+1} \\ \vdots \\ v_{s,k-2} \\ v_{s,k-1} \end{bmatrix}. \quad (8)$$

Indeed, (5) gives the FIR approximation of the local model and can be rewritten as:

$$M_s z_{s,k} = H_{s,i}\,\bar u_s + y_s + \beta_{s,k} \quad (9)$$

where $H_{s,i}\bar u_s + y_s$ is a constant vector, $z_{s,k}$ is the regression vector of the local model and $M_s$ is a matrix orthogonal to the regressors:

$$z_{s,k} = \begin{bmatrix} u_{k-i+1}^T & \cdots & u_{k-1}^T & u_k^T & y_{s,k}^T \end{bmatrix}^T \in \mathbb{R}^{mi+\ell}, \quad (10)$$

$$M_s = \begin{bmatrix} H_{s,i} & -I_\ell \end{bmatrix} \in \mathbb{R}^{\ell \times (mi+\ell)}. \quad (11)$$

In the deterministic case, and if the FIR approximation (4) is verified, all the regressors $z_{s,k}$ lie in the hyperplane:

$$M_s z_{s,k} = H_{s,i}\,\bar u_s + y_s. \quad (12)$$
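The orthogonality property behind (9)–(12) can be checked numerically: in the noise-free, zero-constant case, every regressor of one local model is annihilated, up to the neglected transient term, by $M_s = [H_{s,i}\ \ -I_\ell]$. A sketch with illustrative SISO matrices (assumed values, not from the paper):

```python
import numpy as np

# Checking (9)-(12): with zero constants and no noise, each regressor z_k of a
# local model is (almost) annihilated by M_s = [H_{s,i}  -I]. The model
# matrices and the horizon i are illustrative values.
A = np.array([[0.5, 0.2], [0.0, 0.4]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, -1.0]])
D = np.array([[0.2]])
i = 20                                    # FIR horizon, large enough for (4)

# Markov parameters matrix H_{s,i} = [C A^{i-2} B ... C B  D], cf. (6)
H = np.hstack([C @ np.linalg.matrix_power(A, j) @ B
               for j in range(i - 2, -1, -1)] + [D])

rng = np.random.default_rng(1)
T = 200
u = rng.standard_normal(T)
x = np.zeros((2, 1))
y = np.zeros(T)
for k in range(T):
    y[k] = (C @ x).item() + (D * u[k]).item()
    x = A @ x + B * u[k]

M = np.hstack([H, -np.eye(1)])            # M_s = [H_{s,i}  -I_l], cf. (11)
residuals = [abs((M @ np.concatenate([u[k - i + 1:k + 1], y[k:k + 1]])).item())
             for k in range(i - 1, T)]
print(max(residuals))                     # small: only the neglected transient remains
```

With noise or nonzero constants the residual no longer vanishes, which is precisely why the oblique projection of Section 3.2 is needed.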
But this relation changes in the presence of noise, which makes the modelling problem more difficult. Furthermore, the constants $(u_s, y_s)$, $s = 1, \ldots, h$, are unknown.

3.2. Online estimation of the local models' Markov parameters

The local model hyperplane equation, established in the previous paragraph, contains an unknown vector $H_{s,i}\bar u_s + y_s$. Therefore, to achieve the estimation of the Markov parameters matrix $H_{s,i}$, an oblique projection is used.
From the hyperplane equation (9), the matrix $M_s$ orthogonal to the regressors is estimated. This estimation can be carried out if there are $mi$ independent regressors $z_{s,k}$. These regressors are independent if the inputs are persistently exciting of order $mi$. Consider the regression matrix $\bar z_{s,k}$:

$$\bar z_{s,k} = \begin{bmatrix} z_{s,k-\rho+1} & \cdots & z_{s,k-1} & z_{s,k} \end{bmatrix} \quad \text{or} \quad \bar z_{s,k} = \begin{bmatrix} U_k \\ \bar y_{s,k} \end{bmatrix} \quad (13)$$

with

$$\bar y_{s,k} = \begin{bmatrix} y_{s,k-\rho+1} & \cdots & y_{s,k-1} & y_{s,k} \end{bmatrix} \in \mathbb{R}^{\ell \times \rho} \quad (14)$$

$$U_k = \begin{bmatrix} u_{k-\rho-i+2} & \cdots & u_{k-i} & u_{k-i+1} \\ u_{k-\rho-i+3} & \cdots & u_{k-i+1} & u_{k-i+2} \\ \vdots & & \vdots & \vdots \\ u_{k-\rho+1} & \cdots & u_{k-1} & u_k \end{bmatrix} \in \mathbb{R}^{mi \times \rho} \quad (15)$$

and $\rho$ is an integer equal to or greater than the number ($mi$) of rows of the matrix $U_k$. The unique matrix orthogonal to this regression matrix (and defined by Eq. (11)) is determined. To estimate the matrix $M_s$, it is enough to determine the Markov parameters matrix $H_{s,i}$ (see (11)). Here only the input data $\mu_k$ ($\mu_k = u_k + u_s$) are available since the constants are unknown. The following proposition gives the least squares estimation of this matrix from the inputs $\mu_k$ and the outputs $y_k$.

Proposition 3.1. Under the following conditions:
- the matrix $M_k \Phi^\perp$ is full row rank,
- the local model $s$ is stable,
the least-squares estimation of the Markov parameters matrix is given by:

$$\bar y_{s,k}\, \Pi = H_{s,i} + B_{s,k}\, \Pi \quad (16)$$

where $\Phi$ is a row vector whose elements all equal one, $\Phi^\perp$ is its orthogonal matrix (footnote 3), $B_{s,k}$ is the perturbation matrix, $\Pi$ is the oblique projection matrix (footnote 4), and $M_k$ is the input Hankel matrix. These matrices are defined as:

$$\Phi = \begin{bmatrix} 1 & \cdots & 1 & 1 \end{bmatrix} \in \mathbb{R}^{\rho} \quad (17)$$

$$B_{s,k} = \begin{bmatrix} \beta_{s,k-\rho+1} & \cdots & \beta_{s,k-1} & \beta_{s,k} \end{bmatrix} \in \mathbb{R}^{\ell \times \rho} \quad (18)$$

$$M_k = \begin{bmatrix} \mu_{k-\rho-i+2} & \cdots & \mu_{k-i} & \mu_{k-i+1} \\ \mu_{k-\rho-i+3} & \cdots & \mu_{k-i+1} & \mu_{k-i+2} \\ \vdots & & \vdots & \vdots \\ \mu_{k-\rho+1} & \cdots & \mu_{k-1} & \mu_k \end{bmatrix} \in \mathbb{R}^{mi \times \rho} \quad (19)$$

$$\Pi = \Phi^\perp (M_k \Phi^\perp)^{\dagger}. \quad (20)$$
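Proposition 3.1 can be sketched numerically: $\Phi^\perp = I - \mathbf{1}\mathbf{1}^T/\rho$ is the centering projector, so the unknown constants $(u_s, y_s)$ are projected out before the least-squares step. The model matrices, constants and window sizes below are illustrative choices, not the paper's:

```python
import numpy as np

# Sketch of the estimator of Proposition 3.1 on a noise-free example with
# unknown constants: Phi_perp = I - 11^T/rho centres the data, so (u_s, y_s)
# are projected out before the least-squares step. All values are illustrative.
A = np.array([[0.5, 0.2], [0.0, 0.4]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, -1.0]])
D = np.array([[0.2]])
i, rho = 20, 60                            # FIR horizon and window length, rho > m*i
u_const, y_const = 0.7, -0.3               # unknown constants (u_s, y_s)

H_true = np.hstack([C @ np.linalg.matrix_power(A, j) @ B
                    for j in range(i - 2, -1, -1)] + [D])

rng = np.random.default_rng(2)
T = rho + i + 10
mu = rng.standard_normal(T) + u_const      # measured input mu_k = u_k + u_s
x = np.zeros((2, 1))
y = np.zeros(T)
for k in range(T):
    uk = mu[k] - u_const                   # deviation input actually driving the model
    y[k] = (C @ x).item() + (D * uk).item() + y_const
    x = A @ x + B * uk

k = T - 1
# Input Hankel matrix M_k (mi x rho), cf. (19): entry (r, c) = mu_{k-rho+1+c-i+1+r}
Mk = np.array([[mu[k - rho + 1 + c - i + 1 + r] for c in range(rho)]
               for r in range(i)])
ybar = y[k - rho + 1:k + 1].reshape(1, -1)

Phi_perp = np.eye(rho) - np.ones((rho, rho)) / rho   # projector onto ker(Phi)
H_hat = ybar @ Phi_perp @ np.linalg.pinv(Mk @ Phi_perp)   # ybar @ Pi, cf. (16), (20)
print(np.max(np.abs(H_hat - H_true)))      # small although (u_s, y_s) were unknown
```

Note that only the measured $\mu_k$ enter the Hankel matrix; the constants never need to be identified, which is the point of the oblique projection.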
Proof. By stacking Eq. (5) on a time window of length $\rho$, we have:

$$\bar y_{s,k} = H_{s,i} U_k + H_{s,i} U_s + \bar y_s + B_{s,k} \quad (21)$$

3 The orthogonal matrix to the matrix $Q$ is defined by $Q^\perp = I - Q^T (Q Q^T)^{-1} Q$.
4 $Q^{\dagger}$ denotes the Moore–Penrose pseudo-inverse of the matrix $Q$.
where $U_s \in \mathbb{R}^{mi \times \rho}$ and $\bar y_s \in \mathbb{R}^{\ell \times \rho}$ are defined as:

$$U_s = \begin{bmatrix} u_s & \cdots & u_s \\ \vdots & & \vdots \\ u_s & \cdots & u_s \end{bmatrix}, \qquad \bar y_s = \begin{bmatrix} y_s & \cdots & y_s \end{bmatrix}, \quad (22)$$

the rows of the matrices $\bar y_s$ and $U_s$ span the same basis, which is the vector $\Phi$. Then the kernel of the row vector $\Phi$ is also the kernel of these matrices. Therefore, the projection of the above equation onto the kernel of $\Phi$ yields:

$$\bar y_{s,k} \Phi^\perp = H_{s,i} U_k \Phi^\perp + B_{s,k} \Phi^\perp. \quad (23)$$

Since

$$M_k = U_k + U_s \quad (24)$$

and

$$M_k \Phi^\perp = U_k \Phi^\perp, \quad \text{because } U_s \Phi^\perp = 0, \quad (25)$$

an estimation of the Markov parameters matrix $H_{s,i}$ is:

$$\bar y_{s,k} \Phi^\perp (M_k \Phi^\perp)^{\dagger} = H_{s,i} + B_{s,k} \Phi^\perp (M_k \Phi^\perp)^{\dagger}. \quad (26)$$
As the noises ($v_{s,k}$ and $w_{s,k}$) are zero-mean processes, the mathematical expectation of the matrix $\bar y_{s,k} \Pi$ is:

$$E[\bar y_{s,k} \Pi] = E[H_{s,i}] + E[B_{s,k} \Pi] = H_{s,i} + E[B_{s,k}]\Pi$$

whereas (see Eq. (7)):

$$E[B_{s,k}] = C_s A_s^{i-1} \begin{bmatrix} x_{s,k-\rho-i+2} & \cdots & x_{s,k-i} & x_{s,k-i+1} \end{bmatrix}$$

which can be neglected if the integer $i$ is large enough, or considered as a deterministic perturbation if the term $C_s A_s^{i-1} x_{s,k-i+1}$ is not negligible. To conclude this section, as the output $y_{s,k}$ is unknown, it is necessary to state specific assumptions in order to use the actual system output $y_k$ instead of $y_{s,k}$. If the consecutive regressors

$$z_k = \begin{bmatrix} u_{k-i+1}^T & \cdots & u_{k-1}^T & u_k^T & y_k^T \end{bmatrix}^T, \quad k = 1, \ldots, \rho, \ \rho > mi \quad (27)$$

are generated by the same local model $s$ (i.e. $y_k = y_{s,k}$), then the matrices $\bar y_{s,k}$ (resp. $\bar z_{s,k}$) and $\bar y_k$ (resp. $\bar z_k$) are equal. In this case, the local output matrix (for local model $s$) $\bar y_{s,k}$ can be replaced in the proposition by the global output matrix

$$\bar y_k = \begin{bmatrix} y_{k-\rho+1} & \cdots & y_{k-1} & y_k \end{bmatrix} \in \mathbb{R}^{\ell \times \rho}. \quad (28)$$
If the matrix $\bar y_k$ is built with data coming from two local models, then the system is in a transient mode. To guarantee accurate estimates, we enforce the condition that the minimum dwell time be greater than the size of the data window used for the estimation of $H_{s,i}$. In this case, the estimation of the Markov parameters can be achieved using non-mixed data, i.e. the data $\bar y_k$ and $M_k$ are attached to the same local model. In the next section, the clustering method used to determine the class of the estimated matrix is given.

4. Clustering method

The mode modelling tool treats (as they arrive) the data extracted from the identified Markov parameters matrices. The new information is incorporated continuously in order to redefine the models of the classes that characterize the functioning modes (also named local models) and thus to model continuously the decision space used to label the
Table 1
Classifier adaptation rules

Case 1. If $\mu_j^t < \mu_{min}\ \forall j \in \{1 \ldots J\}$: prototype and class creation; $\Psi_{new} \in \Omega_{new}$.
Case 2. If $\exists \Psi_j \in \Omega_i$ with $\mu_{min} < \mu_j^t < \mu_{max}$: prototype creation; $\Psi_{new} \in \Omega_i$.
Case 3. If $\exists \Psi_j \in \Omega_i$ with $\mu_{max} < \mu_j^t$: prototype adaptation; $\Psi_j^{update} = \Psi_j^{old}(O_t)$ (see 4.1).
Case 4. If $\exists \Psi_l \in \Omega_p$ and $\exists \Psi_m \in \Omega_q$ with $\mu_{min} < \mu_l^t < \mu_{max}$ and $\mu_{min} < \mu_m^t < \mu_{max}$: ambiguity, $O_t \in \chi_{amb}$; possible fusion of $\Omega_p$ with $\Omega_q$ and of $\Psi_l$ with $\Psi_m$.
Input Output (IO) data. The various situations related to the non-stationary environment require a continuous learning process and the setting of specific rules for the adaptation of the decision space. In the area of machine learning, some techniques exist with architectures exploiting incremental learning [7,9]. Most of these algorithms present some disadvantages related to a coarse class modelling and/or limited adaptation capacities in a non-stationary environment. In order to fill these gaps, two neural algorithms for the dynamic classification of evolving data have been previously developed [15,1]. These algorithms use a multi-prototype approach, making it possible to accurately model the structure of complex classes. In this paper, the AUDyC network based on a Gaussian modelling is used. Each local mode corresponds to a label of a complex class $\Omega$ which can be defined by an assembly of Gaussian prototypes $\Psi$. For an application to switching system modelling, the local modes are generally represented by mono-prototype Gaussian classes. The activation function of the prototype $\Psi_j$ determines the membership degree $\mu_j^t$ of the observation $O_t$ to this prototype, and then to the class (or mode). In order to obtain a fine class representation, this degree is based on the Mahalanobis distance:

$$\mu_j^t = \exp\left( -\frac{1}{2} \left( O_t - \bar O_j \right)^T \Sigma_j^{-1} \left( O_t - \bar O_j \right) \right) \quad (29)$$

where $\bar O_j$ and $\Sigma_j$ are respectively the centre and the covariance matrix of the prototype. The advantage of this classifier is that the prototype parameters are continuously adapted. The use of the membership function allows the implementation of learning rules which have specifically been designed for this dynamical classifier. With the first acquisition $O_1$, the network is initialized: this involves the creation of the first prototype $\Psi_1$ constituting the first class $\Omega_1$ (first local mode). The prototype is parameterized by its centre $\bar O_1 = O_1$ and an initial covariance matrix $\Sigma_{ini}$ selected beforehand.
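The membership degree (29) and the threshold comparisons of Table 1 can be sketched as a simple dispatch. The rule set below is a simplified stand-in for the AUDyC logic, and the thresholds and prototypes are illustrative values:

```python
import numpy as np

# Simplified sketch of the Table 1 decision rules: the Gaussian membership
# degree (29) of a new observation is compared with mu_min / mu_max.
# Thresholds and prototypes are illustrative, not the AUDyC implementation.
mu_min, mu_max = 0.1, 0.3

def membership(O_t, center, cov):
    """Membership degree (29), based on the Mahalanobis distance."""
    d = O_t - center
    return float(np.exp(-0.5 * d @ np.linalg.solve(cov, d)))

def decide(O_t, prototypes):
    """prototypes: list of (center, cov). Returns the Table 1 action."""
    mus = [membership(O_t, c, S) for c, S in prototypes]
    close = [j for j, m in enumerate(mus) if m > mu_min]
    if not close:
        return "create_class"                  # case 1: distance rejection
    if len(close) >= 2 and all(mus[j] < mu_max for j in close):
        return "ambiguity"                     # case 4: possible fusion
    j = int(np.argmax(mus))
    return "adapt" if mus[j] > mu_max else "create_prototype"  # cases 3 / 2

protos = [(np.zeros(2), np.eye(2)), (np.array([4.0, 0.0]), np.eye(2))]
print(decide(np.array([0.1, 0.0]), protos))    # close to the first prototype
print(decide(np.array([10.0, 10.0]), protos))  # far from everything
```

In the full AUDyC network, creations are additionally confirmed or discarded by the evaluation phase described in Section 4.3.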
According to new acquisitions, various situations can arise by comparison of the membership degree with two fixed thresholds $\mu_{min}$ and $\mu_{max}$ (resp. limits of prototype and class membership). Each case leads to a specific procedure (see Table 1). Then, the AUDyC learning process is established in three principal phases.

4.1. First phase: Classification

The classification stage corresponds to the creation and adaptation of prototypes and classes. For cases 1 and 2 of Table 1, the observation $O_t$ is not close to any existing prototype. These cases are similar to a distance rejection, which is used to detect novelty in the multiclass environment. If the observation is not sufficiently close to any class (case 1), it leads to the creation of a new prototype and a new class corresponding to a new system mode. In case 2, a new prototype is created and allocated to the nearest class in order to contribute to a better definition of the mode model. In situation 3, the observation is close enough to a prototype to take part in its definition. The functioning mode adaptation is then carried out by using the following recursive equations:

$$\bar O_j^{update} = \bar O_j^{old} + \frac{1}{N_p} \left( O_t - O_{t-N_p+1} \right) \quad (30)$$

$$\Sigma_j^{update} = \Sigma_j^{old} + \frac{1}{N_p (N_p - 1)}\, \Delta O \begin{bmatrix} N_p - 1 & 1 \\ 1 & -(N_p + 1) \end{bmatrix} \Delta O^T \quad (31)$$

with $\Delta O = \begin{bmatrix} O_t - \bar O_j^{old} & O_{t-N_p+1} - \bar O_j^{old} \end{bmatrix}$ and $N_p$ the prototype size. The prototype is adapted by introducing the newest observation and removing the oldest one. A prototype is then defined by the set of the $N_p$ most recent observations close to its centre.
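The recursive updates (30)–(31) can be verified against the batch statistics of the shifted window: replacing the oldest observation by the newest one must reproduce the sample mean and covariance recomputed from the $N_p$ most recent observations. A sketch with illustrative sizes:

```python
import numpy as np

# Checking the recursive prototype updates (30)-(31) against batch statistics
# of the shifted window. Window size and dimension are illustrative.
rng = np.random.default_rng(3)
Np, d = 50, 3
data = rng.standard_normal((Np + 1, d))

window = data[:Np]
mean = window.mean(axis=0)
cov = np.cov(window, rowvar=False)          # sample covariance (N_p - 1 divisor)

O_new, O_old = data[Np], data[0]
mean_upd = mean + (O_new - O_old) / Np                      # (30)
dO = np.column_stack([O_new - mean, O_old - mean])          # Delta O
Mix = np.array([[Np - 1.0, 1.0], [1.0, -(Np + 1.0)]]) / (Np * (Np - 1))
cov_upd = cov + dO @ Mix @ dO.T                             # (31)

batch_mean = data[1:].mean(axis=0)
batch_cov = np.cov(data[1:], rowvar=False)
print(np.allclose(mean_upd, batch_mean), np.allclose(cov_upd, batch_cov))
```

The update costs $O(d^2)$ per observation instead of the $O(N_p d^2)$ batch recomputation, which is what makes the continuous adaptation of the prototypes practical online.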
4.2. Second phase: Fusion

Case 4 of Table 1 depicts the ambiguity rejection, when an observation is sufficiently close to two or several prototypes (e.g. l, m) to contribute to their structure. The fusion procedure consists in evaluating an acceptance criterion based on the Kullback–Leibler distance [25]. When this criterion is higher than a threshold, the different classes (e.g. p, q) merge into a unique new mode. To be less noise-sensitive, the fusion procedure is only activated when the cardinality of the set $\chi_{amb}$ is equal to $N_{amb}$ [2].

4.3. Third phase: Evaluation

The evaluation phase is essential for eliminating prototypes and classes possibly created under the influence of noise or of poor information. To detect non-representative modes, this phase is based on the cardinality of the modes. The mode's cardinality is compared to the threshold $NC_{min}$. Each mode with a cardinality lower than $NC_{min}$ is eliminated. Conversely, a mode is created when its cardinality reaches $NC_{min}$. In the same way, $NP_{min}$ defines the threshold for a prototype creation and then for the novelty detection. When a switch occurs (jump of observations), a new prototype will be upheld as soon as the number of stabilized observations is equal to $NP_{min}$. A mode creation is indeed a confirmation of the novelty detection since $NC_{min}$ is fixed slightly greater than $NP_{min}$. For more details on the AUDyC network, the reader can consult [15,2], which give theoretical and practical analyses of the AUDyC.

5. Application and experiments

This last section concerns the experimental validation of the proposed approach. In the next paragraph, the principle of the method is illustrated through an application example dedicated to data allocation in a free-surface hydraulic system. From this application, a discussion about the tuning of the most significant hyperparameters concludes this section. This discussion is supported by results stemming from the simulation of a MIMO system.

5.1.
Application to a free-surface hydraulic system

This application consists in data allocation on a simulated model of the first reach of the Neste Canal located in Gascogne (South of France). After a short presentation of the models of this system, the principle of the proposed method is given and experimental results are shown. The multimodelling proposed in [8] leads to three discretized local models defined by the following transfer functions ($F_s$, $s = 1, 2, 3$):

$$F_1(z) = \frac{0.002405 z^3 + 0.07898 z^2 + 0.07604 z + 0.003702}{z^4 - 1.051 z^3 + 0.212 z^2}$$
$$F_2(z) = \frac{0.09798 z + 0.09136}{z^2 - 0.8107 z}$$
$$F_3(z) = \frac{0.1155 z + 0.1062}{z - 0.7783} \quad (32)$$
These different modes correspond to the different dynamics due to various combinations of input flow rate, input level and water demand.
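Each local transfer function in (32) can be simulated directly as a difference equation, which is convenient for regenerating an IO sequence of this kind; the small simulation routine below is ours, not the paper's:

```python
import numpy as np

# Sketch: a discrete transfer function b(z)/a(z) (coefficients in descending
# powers of z, a monic) simulated as the difference equation
# y_k = -sum_j a_j y_{k-j} + sum_j b_j u_{k-j}, after padding b to len(a).
def simulate_tf(b, a, u):
    n = len(a) - 1
    b = np.concatenate([np.zeros(n + 1 - len(b)), b])   # align with z^0 .. z^-n
    y = np.zeros_like(u)
    for k in range(len(u)):
        y[k] = sum(b[j] * u[k - j] for j in range(n + 1) if k - j >= 0)
        y[k] -= sum(a[j] * y[k - j] for j in range(1, n + 1) if k - j >= 0)
    return y

# F3 from (32); its steady-state gain is F3(1) = (0.1155 + 0.1062)/(1 - 0.7783)
u = np.ones(400)
y = simulate_tf(np.array([0.1155, 0.1062]), np.array([1.0, -0.7783]), u)
print(y[-1], (0.1155 + 0.1062) / (1 - 0.7783))
```

The unit steady-state gain of $F_3$ is consistent with the physical reading of the reach models: a sustained change of input flow eventually propagates fully downstream.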
Fig. 1. Data and classes location.
5.1.1. Online mode determination For this application, the IO sequence consists of successive data stemming from the different modes. During this sequence, the changes of the system dynamics are characterized as follows: mode 1 over the intervals [1, 479], [670, 809] and [980, 1299], mode 2 over the intervals [480, 669] and [810, 879] and mode 3 over the interval [1300, 1500].5 At the beginning, only the input $\mu_k$ and the output $y_k$ are known. From these data, the estimation of the Markov parameters matrix is achieved by Proposition 3.1. The size of the sliding window ρ equals 30 and the integer $i$ is fixed at 25. The values of the identified matrix $\hat H_{s,i}$ constitute the observation vector $O_t$ sent to the clustering tool. In fact, this vector consists of all the monitored parameters. According to the application's complexity, its dimension could be reduced (Markov parameters selection or reduction) or increased by adding, for example, complementary physical information. In this paper, in order to present tight and comprehensible results, a space reduction is applied in order to obtain a 3D representation. From the $O_t$ information, the dynamical classifier updates the decision space (functioning mode models) and determines the current functioning mode. For the whole of this application, the AUDyC parameters are fixed as follows: $\Sigma_{ini} = 4$, $\mu_{min} = 0.1$, $\mu_{max} = 0.3$, $NC_{min} = 30$, $N_P = 500$ and $N_{amb} = 15$. As the three first parameters are closely dependent, only the initial covariance matrix is tuned by the user. This parameter has to be set according to the data distributions. The only way to estimate it is to use an initial dataset which constitutes the primary knowledge and to apply a method based on the computation of the inter-distances between the data and the classes. Concerning the thresholds, previous studies have shown that they are independent of the distribution parameters.
Then, under the assumption of a Gaussian approximation, we fix the lower threshold (resp. the upper threshold) so that it approximately characterizes a distance equal to twice the variance (resp. 1.5 times the variance).6 For more information about the choice of the parameters, the reader can refer to [2]. The details about the influence of the initial covariance matrix are presented through the simulations of Section 5.2. Brief comments on the choice of $NC_{min}$ and $N_P$, which are also estimated through the initial dataset distribution, are given in the last paragraph. Fig. 1 illustrates the final representation space of the $O_t$ raw data and gives the final class locations. Although these figures are static, the data are classified online and the decision space is built and updated in a recursive way: the data are treated (and classified) sample by sample and new modes are created as time goes on. At the beginning, no a priori knowledge is required. As this classifier is dynamical, from the first similar data it initializes the decision space by creating the first class, corresponding to the first local model. The next modes are created when novelty is detected according to Table 1. More precisely, when a switch occurs, the data generated after the switch are not recognized as a known class. After a delay corresponding to a stabilization of these data (when
5 The sampling period is 89 s. 6 For the normal distribution, the half width at half maximum equals the variance times 1.178.
Fig. 2. Membership degrees.
Fig. 3. Classification of the Markov parameters matrices.
the number of stable observations exceeds the $NC_{min}$ threshold) in a particular area of the decision space, a new class that characterizes the second functioning mode is created. All the new classes are created in this way (cf. Table 1). Remark 5.1. About the discernibility of the different modes, it is important to note that the proposed clustering approach is only able to create non-overlapping clusters. Hence, a mode discernibility condition can be defined via the distance (in the sense of Kullback–Leibler) between two modes. From a theoretical point of view, this distance should be greater than twice the variance of the least squares estimates. At each time, the current functioning mode is determined by using the membership degree of the observation (membership ratio rule according to a membership threshold denoted by θ). Fig. 2 gives the evolution of the membership degree of the observations correlated with the actual mode labels. From this figure, it is possible to deduce the instants of mode detection. Mode 1 is created at sample 130 (the first 100 samples are not used) and recognized again at samples 677 and 1005. Mode 2 is created at sample 546 and recognized at sample 841. Mode 3 is created at sample 1364. In Fig. 3, where the membership threshold equals 0.25, it can be noticed that the observations located between modes are not classified and the class creation is effective after an extra delay corresponding to $NC_{min}$. When the mode is already known, the decision is made more quickly (e.g. at sample 840, on the way from mode 1 to mode 2). In this case, the delay of recognition is mainly due to the convergence of the Markov parameters matrix estimation and to the noise influence. Fig. 3, where the membership threshold equals ε (ε ≪ 1), illustrates the behaviour of the method for a small threshold. In this case, the membership is forced to come from known classes. This confirms the proximity between mode 2 and mode 3.
The evolution from mode 1 to mode 3 leads through an intermediate mode, which is mode 2. One can notice, even if this is not presented in the figures, that the classes' parameters are continuously adapted. This allows an accurate definition of the decision space and makes possible the modelling of non-stationary local linear
Fig. 4. Classified data and final decision space.
models [16]. Of course, from this point, expert knowledge should be introduced in order to decide whether this evolution is "normal" (filling, dredging) or whether a progressive failure is appearing in the process (a bank leak, for example).

5.2. Discussions on the MIMO simulation

This last section presents a brief discussion about the tuning of the hyperparameters. In the previous example, several parameters have been set. For the estimation stage, the parameter ρ is the most influential. For the classification stage, most of the parameters are fixed whatever the application; however, three of them (Σini, NCmin and NP) have to be tuned according to the application. The influence of these significant parameters is highlighted through simulation experiments. The system used for the simulations is described hereafter.

5.2.1. MIMO system description

The considered switching system is a combination of three local linear models. A state-space representation is used and the matrices (As, Bs, Cs) are specific to each local model (with Ds = 0 for all modes). The matrices of the local models are defined below:

A1 = [ 0.4  0.1  0   ]     B1 = [  1.5  0.9 ]     C1 = [  0.8  1.1  2   ]
     [ 0.8  0.4  0   ]          [  1   −1   ]          [ −1.3  0.7  1.7 ]     (33)
     [ 0    0    0.8 ]          [ −1.5  2.3 ]

A2 = [ 0.7  −0.9 ]     B2 = [ 0.8  1.1 ]     C2 = [ 0.4  0.6 ]
     [ 0.5   0.1 ]          [ 1   −1   ]          [ 1.5  0.7 ]

A3 = [ 0.3  0.2   0    ]     B3 = B1,     C3 = C1.
     [ 0.8  0.2   0    ]
     [ 0    0    −0.75 ]

The constants are fixed to zero. The changes of the system's dynamics are summarized as follows: s = 1 over the intervals [1, 199], [400, 499] and [700, 799]; s = 2 over the intervals [200, 299], [500, 599] and [800, 950]; and s = 3 over the intervals [300, 399] and [600, 699]. The inputs uk are Pseudo-Random Binary Sequences (PRBS) with variable amplitudes. The signal-to-noise ratio of the outputs with respect to the measurement noises is 20 dB. The process noises vk are Gaussian white noises with covariance var(vk) ≈ 7 × 10^−4 I3; the covariance matrix of the inputs is var(uk) ≈ 7 × 10^−2 I2.
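A simulation of this kind of switching state-space system can be sketched as below. This is an illustration only: the matrices are taken as transcribed above, how the state is carried across a switch between local models of different orders (truncation/zero-padding here) is an assumption of the sketch, and the input amplitudes and noise levels are illustrative rather than calibrated to the 20 dB SNR of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

A = {1: np.array([[0.4, 0.1, 0.0], [0.8, 0.4, 0.0], [0.0, 0.0, 0.8]]),
     2: np.array([[0.7, -0.9], [0.5, 0.1]]),
     3: np.array([[0.3, 0.2, 0.0], [0.8, 0.2, 0.0], [0.0, 0.0, -0.75]])}
B = {1: np.array([[1.5, 0.9], [1.0, -1.0], [-1.5, 2.3]]),
     2: np.array([[0.8, 1.1], [1.0, -1.0]])}
C = {1: np.array([[0.8, 1.1, 2.0], [-1.3, 0.7, 1.7]]),
     2: np.array([[0.4, 0.6], [1.5, 0.7]])}
B[3], C[3] = B[1], C[1]   # B3 = B1, C3 = C1

def mode(k):
    """Switching schedule s(k) from the intervals given in the text (1-indexed)."""
    for lo, hi, s in [(1, 199, 1), (200, 299, 2), (300, 399, 3), (400, 499, 1),
                      (500, 599, 2), (600, 699, 3), (700, 799, 1), (800, 950, 2)]:
        if lo <= k <= hi:
            return s
    return 2

N = 950
# PRBS-like inputs with variable amplitudes (amplitude range is illustrative)
u = np.sign(rng.standard_normal((N, 2))) * rng.uniform(0.5, 1.5, size=(N, 2))
x = np.zeros(3)
Y = np.zeros((N, 2))
for k in range(1, N + 1):
    s = mode(k)
    n = A[s].shape[0]
    # carrying the state across an order change is an assumption of this sketch
    xs = x[:n] if len(x) >= n else np.pad(x, (0, n - len(x)))
    Y[k - 1] = C[s] @ xs + rng.normal(0.0, 1e-2, 2)          # measurement noise
    x = A[s] @ xs + B[s] @ u[k - 1] + rng.normal(0.0, np.sqrt(7e-4), n)
```

All three A matrices have their eigenvalues strictly inside the unit circle, so each local model is stable and the simulated outputs remain bounded through the switches.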
During the simulation, 900 input–output data pairs have been processed by the system. According to the system's description, the parameters are fixed, except when specified otherwise, as follows: i = 10, ρ = 30, Σini = 4, µmin = 0.1, µmax = 0.3, NCmin = 30, NP = 500 and Namb = 15. The final results are shown in Fig. 4. The circles represent non-classified data (these mainly correspond to observations located on the transitions between two modes, which are badly estimated due to the use of mixed IO data belonging to the two modes). Fig. 5
Fig. 5. MIMO modes determination.
Table 2
ρ influence

         ρ = 25   ρ = 30   ρ = 35   ρ = 40
σO       0.0421   0.0252   0.0241   0.0323
Vol−1    0.0167   0.0211   0.0316   0.0526

Table 3
SNR influence

         15 dB    20 dB    25 dB    30 dB
σO       0.0469   0.0414   0.0192   0.0121
Vol−1    0.0129   0.0203   0.0366   0.0521
shows the memberships defined according to the same thresholds as mentioned before (the first 100 observations have been removed). The reader can notice the same features as in the previous application.

5.2.2. Discussions on the parameters of the estimation tool

In this paragraph, the influence of the parameter ρ (the size of the sliding window) and of the Signal-to-Noise Ratio (SNR) affecting the IO data on the final results is characterized. The parameter i is equal to 10 and is kept fixed for the whole study. Sixteen parameter combinations have been tested through a Monte Carlo simulation (only 15 trials). Table 5 gives, for these different combinations, the mean centre of each mode (three reduced components: O1, O2, O3), the variance (σO) of the centres and the mean volume of the modes' covariance (Vol−1). These values are extracted directly from the mode parameters Oj and Σj estimated by the dynamical classifier. The main characteristics are given in Table 2 and Table 3; in each cell, the value corresponds to the mean value over the different trials. The first comment concerns the evolution of the mean volume. As ρ grows, the volume of the modes becomes smaller. With a small sliding window, the estimates are more sensitive to the noise influence and the covariance matrix of the modes is therefore larger. Conversely, a larger window behaves like a smoothing filter on the IO data, and the estimates' variance is reduced. The influence of the parameter ρ on the variance of the mode centres is more difficult to characterize. In fact, the variance of the mode location is only weakly affected by the value of ρ. This could confirm that this parameter has no influence on the bias of the estimates. Another influence of ρ concerns the instants of new mode detection. As expected, the detection delay as well as the transition duration between two modes are proportional to ρ.
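The role of the sliding window can be illustrated with a least-squares estimate of the Markov parameters over the last ρ input–output pairs. This is a generic sketch of such an estimator, not the paper's exact algorithm: the function name is hypothetical, and it assumes a stable, strictly proper local model (no constant term) whose impulse response is negligible after i steps.

```python
import numpy as np

def markov_ls(U, Y, i=10, rho=30, k=None):
    """Least-squares estimate of the first i Markov parameter blocks
    from the rho IO pairs ending at sample k (0-indexed).

    U: (N, m) inputs, Y: (N, p) outputs.  Returns H of shape (p, i*m),
    H = [H_0, H_1, ..., H_{i-1}], under the model y_k ~ sum_j H_j u_{k-j}.
    """
    if k is None:
        k = len(Y) - 1
    # stacked regressors over the sliding window
    Phi = np.array([np.concatenate([U[t - j] for j in range(i)])
                    for t in range(k - rho + 1, k + 1)])        # (rho, i*m)
    Yw = Y[k - rho + 1: k + 1]                                  # (rho, p)
    H, *_ = np.linalg.lstsq(Phi, Yw, rcond=None)
    return H.T
```

When a switch occurs inside the window, the rows of Phi mix data from two local models and the estimate is biased until the window has fully slid past the switch, which is why the transition duration scales with ρ.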
This is due to the fact that when a switch occurs, the sliding window mixes IO data stemming from two modes. As an example,
Table 4
Influence of the parameters on the classes' position and volume

                   NCmin = 10                        NCmin = 30                        NCmin = 50
           O1       O2       O3      Vol−1    O1       O2       O3      Vol−1    O1       O2       O3      Vol−1
Σini = 2
  M1     7.1187  −0.3819  −1.2281  0.3101   7.0411  −0.3356  −1.2351  0.1167   7.0138  −0.3518  −1.2453  0.0647
  M2    −2.9058   4.8465  −0.3064  0.4049  −2.91     4.826   −0.2976  0.1349  −3.0228   4.6498  −0.2881  0.0418
  M3    −6.5805  −6.9441  −0.0244  0.512
  E1    −5.8985   0.1493   1.6026  0.02
Σini = 4
  M1     7.0168  −0.4653  −1.2615  0.0347   7.0059  −0.4826  −1.2562  0.0063   7.0067  −0.5054  −1.2343  0.0017
  M2    −2.9466   4.5777  −0.3139  0.0317  −2.7337   4.5393   0.7314  0.004   −2.7782   4.5083   0.7536  0.0012
  M3    −6.5564  −6.9108  −0.0342  0.025   −6.5556  −6.939   −0.0215  0.0031  −6.5556  −6.939   −0.0215  0.0011
  E2    −0.8315   4.1762   4.8416  0.00049
  E3    −5.8994   0.5426   1.5801  0.00065
Σini = 8
  M1     7.0067  −0.5054  −1.2343  0.001    7.1148  −0.5957  −1.2088  0.00013  7.2491  −1.9002   0.0412  2.9E−05
  M2    −2.3855   4.3219   0.8286  0.0006  −2.8382   4.2049   1.0376  0.00067 −3.1897   4.6988  −0.9289  2.6E−05
  M3    −6.4018  −5.1297   0.3784  0.00047 −6.4172  −5.6717   0.1899  7.1E−05 −6.046   −7.0763   0.0802  1.1E−05
when ρ is fixed to 25, the transition from mode 1 to mode 2 extends from samples 268 to 269 and the transition from mode 2 to mode 3 from samples 358 to 360. When ρ is fixed to 45, the transition from mode 1 to mode 2 extends from samples 273 to 293 and the transition from mode 2 to mode 3 from samples 369 to 373. To conclude, the influence of the integer i is similar to the influence of ρ since ρ > mi: high values of i imply high values of ρ, and the estimation is therefore improved by high values of i in the same way as it is by high values of ρ. Regarding the influence of noise, it can be noticed that both the mode location variance and the mode volume decrease as the SNR gets better.

5.2.3. Discussions on the parameters of the clustering tool

This section is dedicated to the analysis of the influence of the parameters NCmin and Σini, which are the most influential on the quality of the results. The other significant parameter NP allows specifying the evolution dynamics of the prototypes, and therefore has no influence when the local modes are stationary. Table 4 shows the results (interpreted in terms of mode location and volume) according to various values of Σini and NCmin. The first influence of these parameters is obviously on the number of created modes. The key idea is that a new mode is created when a sufficient number of observations (equal to NCmin) are close together in a bounded region (defined by Σini). When Σini is chosen too small and NCmin too high, there is a risk of neglecting a mode. At the opposite end, if NCmin is too small, more modes could be created. According to the value of Σini, these could be non-optimised modes close to adequate modes (see E1 in Table 4) and/or transient modes between adequate modes (see E2 and E3). To set NCmin correctly, it is only necessary to have a coarse idea of the minimum duration of a mode.
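The quality indicators reported in Tables 2–4 can be sketched as follows. This is one plausible reading, not the paper's exact definitions: σO is taken here as the mean per-component variance of the mode-centre estimates across Monte Carlo trials, and Vol−1 as the inverse of a determinant-based volume of the mode covariance Σj; both function names are hypothetical.

```python
import numpy as np

def center_variance(centers):
    """Variance of the mode-centre estimates over Monte Carlo trials.

    centers: (n_trials, dim) array of the centre O_j estimated at each trial.
    Returns the mean per-component variance (one possible reading of sigma_O).
    """
    return float(np.mean(np.var(np.asarray(centers), axis=0)))

def inverse_volume(cov):
    """Inverse 'volume' of a mode, taken here as 1 / sqrt(det(Sigma_j)).

    A tighter mode (smaller covariance) gives a larger value, matching the
    trend in Table 2 where Vol-1 grows with the window size rho.
    """
    return float(1.0 / np.sqrt(np.linalg.det(np.asarray(cov))))
```

Under these definitions, a better SNR shrinks both the scatter of the centres (smaller σO) and the mode covariances (larger Vol−1), which is the trend shown in Table 3.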
As the proposed approach is dedicated to switching systems for which an assumption on the dwell time has been made, the dwell time could provide enough information to fix the value of NCmin. Σini influences only the initial volume of the created modes. To set Σini correctly, it is necessary to quantify the noise level affecting the IO data in order to choose a value greater than the noise variance. Pre-dimensioning data could be used to initialize this value (as described in Section 5.1.1). In fact, the covariance matrix will be adapted and the classifier will converge to the accurate matrices. In this simulation, the number of data is limited (fewer than 150 per mode) and NP is fixed at a high value; the influence of Σini on the "final" decision space can therefore still be noticed.

7 Extreme values of the MC simulation.
8 The 3D projection being different for each simulation, the results of Table 4 are given in a different basis from those of Table 5.

6. Conclusion

A new approach has been presented for the online determination of the functioning modes of a switching system. This approach is based on a recursive estimation tool coupled with a dynamic classification algorithm. The method is
Table 5
Parameters study. For each combination of ρ ∈ {25, 30, 35, 40} and SNR (RSB) ∈ {15, 20, 25, 30} dB, the table reports, for modes 1 to 3, the mode centre components (O1, O2, O3), the centre variance σO and the inverse volume Vol−1.
dedicated to switching MIMO systems, where each local linear model could have its own structure (parameters and order). The interest of this approach is also to take into account switches and evolutions of modes without a priori knowledge of the possible constants, the models' structure, or the number of modes. This advantage is due to the estimation of the Markov parameters and to the use of a dynamical classifier.

References

[1] H. Amadou-Boubacar, S. Lecœuche, A new kernel-based algorithm for online clustering, in: 7th International Conference on Artificial Neural Networks, Warsaw, Poland, 2005, pp. 350–358.
[2] H. Amadou-Boubacar, S. Lecœuche, S. Maouche, AUDyC neural network using a new Gaussian densities merge mechanism, in: 7th International Conference on Adaptive and Natural Computing Algorithms, Coimbra, Portugal, 2005, pp. 155–158.
[3] L. Bako, G. Mercère, S. Lecœuche, Online structured identification of switching systems with possibly varying orders, in: European Control Conference, Kos, Greece, 2007.
[4] A. Bemporad, J. Roll, L. Ljung, Identification of hybrid systems via mixed-integer programming, in: 40th IEEE Conference on Decision and Control, Orlando, FL, 2001, pp. 786–792.
[5] J. Borges, V. Verdult, M. Verhaegen, M. Ayala Botto, A switching detection method based on projected subspace classification, in: 44th IEEE Conference on Decision and Control and European Control Conference ECC 2005, 2005.
[6] L. Breiman, Hinging hyperplanes for regression, classification, and function approximation, IEEE Transactions on Information Theory 39 (3) (1993) 999–1013.
[7] D. Deng, N. Kasabov, On-line pattern analysis by evolving self-organizing maps, Neurocomputing 51 (2003) 87–103.
[8] E. Duviella, P. Chiron, P. Charbonnaud, Hybrid control accommodation for water-asset management of hydraulic systems subjected to large operating conditions, in: 1st Workshop on Applications of Large Scale Industrial Systems, Helsinki, Finland, 2006.
[9] T. Eltoft, R. deFigueiredo, A new neural network for cluster-detection-and-labeling, IEEE Transactions on Neural Networks 9 (1998) 1021–1035.
[10] G. Ferrari-Trecate, M. Muselli, D. Liberati, M. Morari, A clustering technique for the identification of piecewise affine systems, Automatica 39 (2) (2003) 205–217.
[11] Y. Hashambhoy, R. Vidal, Recursive identification of switched ARX models with unknown number of models and unknown orders, in: 44th IEEE Conference on Decision and Control, Seville, Spain, 2005, pp. 6115–6121.
[12] J.P. Hespanha, A.S. Morse, Stability of switched systems with average dwell-time, in: 38th IEEE Conference on Decision and Control, Phoenix, USA, 1999, pp. 2655–2660.
[13] A.Lj. Juloski, S. Weiland, W.P.M.H. Heemels, A Bayesian approach to identification of hybrid systems, IEEE Transactions on Automatic Control 50 (10) (2005) 1520–1533.
[14] A.Lj. Juloski, W.P.M.H. Heemels, G. Ferrari-Trecate, R. Vidal, S. Paoletti, J.H.G. Niessen, Comparison of four procedures for the identification of hybrid systems, in: Hybrid Systems: Computation and Control (HSCC), 2005, pp. 354–369.
[15] S. Lecœuche, C. Lurette, Auto-adaptive and dynamical clustering neural network, in: 13th International Conference on Artificial Neural Networks, Istanbul, Turkey, 2003, pp. 350–358.
[16] S. Lecœuche, G. Mercère, H. Amadou-Boubacar, Modelling of non-stationary systems based on a dynamical decision space, in: 14th IFAC Symposium on System Identification, Newcastle, Australia, 2006.
[17] S. Paoletti, A. Juloski, G. Ferrari-Trecate, R. Vidal, Identification of hybrid systems: A tutorial, European Journal of Control 13 (2–3) (2007) 242–260.
[18] K.M. Pekpe, K. Gasso, G. Mourot, J. Ragot, Subspace identification of switching model, in: 13th IFAC Symposium on System Identification, Rotterdam, The Netherlands, 2003.
[19] K.M. Pekpe, K. Gasso, G. Mourot, J. Ragot, Identification of switching systems using change detection technique in the subspace framework, in: 43rd IEEE Conference on Decision and Control, Atlantis, Bahamas, 2004.
[20] J. Roll, Local and piecewise affine approaches to system identification, Ph.D. Thesis, Department of Electrical Engineering, Linköping University, Linköping, Sweden, 2003.
[21] J. Roll, A. Bemporad, L. Ljung, Identification of piecewise affine systems via mixed-integer programming, Automatica 40 (2004) 37–50.
[22] V. Verdult, M. Verhaegen, Subspace identification of piecewise linear systems, in: 43rd IEEE Conference on Decision and Control, vol. 4, Atlantis, Bahamas, 2004, pp. 3838–3843.
[23] R. Vidal, B.D.O. Anderson, Recursive identification of switched ARX hybrid models: Exponential convergence and persistence of excitation, in: 43rd IEEE Conference on Decision and Control, vol. 1, Atlantis, Bahamas, 2004, pp. 32–37.
[24] R. Vidal, Identification of PWARX hybrid models with unknown and possibly different orders, in: American Control Conference, Boston, MA, USA, 2004.
[25] S.K. Zhou, R. Chellappa, Kullback–Leibler distance between two Gaussian densities in reproducing kernel Hilbert space, in: International Symposium on Information Theory, Chicago, USA, 2004.