Local Model Networks for Nonlinear System Identification


Copyright © IFAC System Identification, Kitakyushu, Fukuoka, Japan, 1997

LOCAL MODEL NETWORKS FOR NONLINEAR SYSTEM IDENTIFICATION

M.D. Brown, G. Lightbody and G.W. Irwin

Advanced Control Engineering Research Group, The Queen's University of Belfast, UK

Abstract: Neural networks have generated considerable interest as nonlinear modelling tools, but their fundamental limitation is the associated non-transparent structure. An alternative approach is to use a set of local models to describe the nonlinear system behaviour. The paper describes the general Local Model network and compares this with the special case of an RBF neural network. A new hybrid optimization algorithm is proposed for training the Local Model Network, which uses a combination of linear and nonlinear techniques. The excellent modelling ability of the Local Model Network is illustrated by tests on both a simulation model and pilot plant data.

Keywords: Nonlinear Identification; Nonlinear Models; pH Control

1. INTRODUCTION

The Local Model Network represents a feedforward neural structure in which the global model is constructed from a set of locally valid system submodels (Johansen and Foss, 1993b). The constituent submodels could themselves be neural networks such as the MLP or RBF.

Non-linear systems can be modelled using various techniques, such as Volterra and Wiener functional series, but these have had limited success in industrial applications (Unbehauen, 1996). Recently, neural networks have generated considerable interest as an alternative modelling tool, and successful applications have been reported (Miller et al., 1990; Hunt et al., 1992). However, since most industrial processes operate under feedback control within a small region about any given operating point, the training data excite only a small part of the network's input space, and the on-line training of neural networks must therefore be approached with care.

The paper describes the general Local Model structure and develops a new hybrid optimization algorithm for training the Local Model Network. The modelling potential of the Local Model Network is then compared with that of a Radial Basis Function (RBF) network, using a nonlinear simulation and pilot plant data.

An alternative approach is to use neuron support functions that are active only over a small, local region of the network's input space. Networks of this type include the gaussian RBF (radial basis function) network (Parks and Sandberg, 1991), the CMAC (cerebellar model articulation controller) (Parks and Militzer, 1992) and B-spline (basis spline) networks (O'Reilly et al., 1996). Another major advantage of these local structures is that training involves linear optimization (unlike the MLP) and can be performed on-line (An et al., 1994). However, the major drawback of such networks is that a prohibitively large number of processing units may be needed to represent the nonlinear dynamics accurately over the full operating region.

2. LOCAL MODEL NETWORKS

The representational ability of radial basis function neural networks can be generalized to form the local model network structure. In this case the basis functions weight general functions of the inputs, rather than simple scalar weights. The network output can be described as (Johansen and Foss, 1993b):

$$ y = F(\psi, \phi) = \sum_{i=1}^{M} f_i(\psi)\,\rho_i(\phi) \tag{1} $$

where the $M$ local models $f_i(\psi)$ are linear or nonlinear functions of the measurement vector $\psi$, each multiplied by a basis function $\rho_i(\phi)$ of the current operating region vector $\phi$. In the Local Model network, the basis functions $\rho_i(\phi)$ are commonly chosen to be normalized gaussian functions.
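A minimal Python sketch of the forward pass of equation (1) may help make the structure concrete. It assumes local linear models $f_i(\psi) = \theta_i^{T}\psi$ and an assumed normalized gaussian form with one scalar width per local model, $\rho_i(\phi) = g_i / \sum_j g_j$ with $g_i = \exp(-\lVert \phi - c_i \rVert^2 / \sigma_i^2)$; the exact scaling used by the authors may differ, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalized_gaussians(phi: Vector, centres: Sequence[Vector],
                         widths: Sequence[float]) -> List[float]:
    """Validity functions rho_i(phi); ASSUMED form with one scalar width per model."""
    # Exponents -||phi - c_i||^2 / sigma_i^2 for each local model.
    expo = [-sum((p - c) ** 2 for p, c in zip(phi, ci)) / s ** 2
            for ci, s in zip(centres, widths)]
    top = max(expo)  # shift so the largest term is exp(0) = 1 (avoids 0/0)
    g = [math.exp(e - top) for e in expo]
    total = sum(g)
    return [gi / total for gi in g]  # the rho_i sum to one by construction


def lmn_output(psi: Vector, phi: Vector, thetas: Sequence[Vector],
               centres: Sequence[Vector], widths: Sequence[float]) -> float:
    """Equation (1) with local linear models f_i(psi) = theta_i . psi."""
    rho = normalized_gaussians(phi, centres, widths)
    f = [sum(t * x for t, x in zip(theta, psi)) for theta in thetas]
    return sum(fi * ri for fi, ri in zip(f, rho))


# Two local models of psi = [x1, x2], scheduled on a scalar operating variable.
centres, widths = [[0.0], [10.0]], [1.0, 1.0]
thetas = [[1.0, 0.0], [0.0, 1.0]]
print(lmn_output([2.0, 3.0], [5.0], thetas, centres, widths))  # midway: 2.5
```

Shifting the exponents by their maximum leaves every $\rho_i$ unchanged, but keeps the normalization well defined when $\phi$ lies far from all the centres.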

Recent research has suggested that such a fine partitioning of the input space, with a separate neural basis function active in each small region, is not necessary. Rather, it is sufficient for the partitioning procedure simply to split the input space into the expected operating regions of the plant.


$$ \nabla J(\alpha) = 2\sum_{i=1}^{N} \Big(\sum_{j=1}^{M} f_j\,\frac{\partial \rho_j(\alpha)}{\partial \alpha}\Big)_i^{T} \Big[\Big(\sum_{j=1}^{M} f_j\,\rho_j(\alpha)\Big)_i - y_i\Big] \tag{4} $$

The underlying assumption for the local model network strategy is that the systems to be modelled undergo significant changes in operating conditions. The complexity of the overall model can be reduced by incorporating simpler models in each operating region. For example, local state-space and ARMAX models can be formed and then interpolated to give global non-linear state-space and NARMAX (non-linear ARMAX) models (Johansen and Foss, 1993a).

where $i = 1, \dots, N$, with $N$ the number of training vectors; $k = 1, \dots, K$, with $K$ the dimension of the operating regime; and $m = 1, \dots, M$, with $M$ the number of local models. These equations are evaluated over the $N$ training patterns and combined to form the batch gradient in equation (4). Having determined the cost function and its gradient, any suitable gradient descent algorithm should succeed in reducing the cost function to at least a local minimum. This study employs a second-order (quasi-Newton) technique, which typically converges much faster than simple gradient descent or error backpropagation. The proposed method uses the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm (McKeown et al., 1990) to build up an approximation to the inverse Hessian, and hence the search direction. A one-dimensional line search along this direction is then carried out using Brent's method (Press et al., 1992).
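The quasi-Newton step can be illustrated with the textbook BFGS update of the inverse-Hessian approximation $H$. This is a generic sketch rather than the authors' code: the Brent line search along $d = -H\,\nabla J$ is omitted, and the helper names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def bfgs_inverse_update(h: Matrix, s: Sequence[float], y: Sequence[float]) -> Matrix:
    """BFGS update H+ = (I - r s y^T) H (I - r y s^T) + r s s^T, with r = 1/(y^T s).

    s is the step alpha_{k+1} - alpha_k in the nonlinear parameters and
    y is the corresponding change in the gradient of J.
    """
    n = len(s)
    ys = sum(a * b for a, b in zip(y, s))
    if ys <= 1e-12:  # curvature condition fails: keep the previous approximation
        return [row[:] for row in h]
    r = 1.0 / ys
    left = [[(1.0 if i == j else 0.0) - r * s[i] * y[j] for j in range(n)]
            for i in range(n)]
    right = [[left[j][i] for j in range(n)] for i in range(n)]  # = I - r y s^T
    core = matmul(matmul(left, h), right)
    return [[core[i][j] + r * s[i] * s[j] for j in range(n)] for i in range(n)]


def search_direction(h: Matrix, grad: Sequence[float]) -> List[float]:
    """Quasi-Newton descent direction d = -H grad (the line search is not shown)."""
    return [-sum(hij * gj for hij, gj in zip(row, grad)) for row in h]
```

After each update $H_{+}$ satisfies the secant condition $H_{+}y = s$, which is what lets the accumulated updates approximate the true inverse Hessian without ever forming second derivatives.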

3.1 Optimization algorithm

Referring to equation (1), if we define $\alpha = [c_m, \sigma_m]$ as the centres and widths of the normalized gaussian interpolation functions, then a hybrid cost function can be defined as:

3.2 Optimization of a dynamic model


Consider a SISO plant where the orders of the output and input sequences, n and m respectively, are known or can be determined. Minimization of the cost function given by equation (3) is then a static optimization of the series-parallel model (one step into the future), given by the LMN output as:

where $N$ is the number of training vectors. A hybrid optimization strategy for minimizing equation (3), similar to that of Webb and Lowe (1988), can be derived by combining a nonlinear technique that finds the centres and widths of the normalized gaussian interpolation functions with a subsequent linear optimization of the local linear models at each iteration. Structuring the overall optimization in this way has a number of advantages in training the network. Firstly, the number of independent parameters in the network is reduced, since the local model parameters are determined by $\alpha$. Secondly, the network is always in a state where the error is at a global minimum in the space of the local models, since these are obtained by linear optimization. The hybrid optimization strategy proposed in this paper uses a second-order descent method to determine the nonlinear parameters, with singular value decomposition used at each iteration to determine the linear models.
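The linear half of the hybrid scheme can be sketched as follows. For fixed $\alpha$ the LMN output is linear in the local model parameters, so each product $\rho_j(\phi_i)\,\psi_i$ forms one block of a regression row and all local models are fitted in a single least-squares problem, returning the cost $J(\alpha)$ of equation (3). This sketch assumes normalized gaussians with one scalar width per local model, and it solves the normal equations by Gaussian elimination as a simple, dependency-free stand-in for the singular value decomposition used in the paper (SVD is numerically more robust). All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def normalized_gaussians(phi: Vector, centres: Sequence[Vector],
                         widths: Sequence[float]) -> List[float]:
    """ASSUMED validity functions: one scalar width per local model."""
    expo = [-sum((p - c) ** 2 for p, c in zip(phi, ci)) / s ** 2
            for ci, s in zip(centres, widths)]
    top = max(expo)
    g = [math.exp(e - top) for e in expo]
    total = sum(g)
    return [gi / total for gi in g]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a square system a x = b."""
    n = len(a)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-14:
            raise ValueError("normal equations are singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_local_models(psis: Sequence[Vector], phis: Sequence[Vector], ys: Sequence[float],
                     centres: Sequence[Vector], widths: Sequence[float]
                     ) -> Tuple[List[List[float]], float]:
    """Best local linear models for fixed alpha, plus the cost J(alpha) of eq. (3)."""
    # Regression row for pattern i: [rho_1*psi_i, rho_2*psi_i, ..., rho_M*psi_i].
    rows = []
    for psi, phi in zip(psis, phis):
        rho = normalized_gaussians(phi, centres, widths)
        rows.append([r * x for r in rho for x in psi])
    p = len(rows[0])
    xtx = [[sum(row[a] * row[b] for row in rows) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * y for row, y in zip(rows, ys)) for a in range(p)]
    theta = solve_linear(xtx, xty)
    d = len(psis[0])
    thetas = [theta[j * d:(j + 1) * d] for j in range(len(centres))]
    cost = sum((sum(w * v for w, v in zip(row, theta)) - y) ** 2
               for row, y in zip(rows, ys))
    return thetas, cost
```

An outer optimizer over $\alpha$ would call `fit_local_models` once per cost evaluation, so it only ever sees the reduced, nonlinear part of the problem.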


3. HYBRID LEARNING SCHEME

The gradient with respect to $\alpha$ of the cost function defined by (3) is given as:

Partial differentiation with respect to $\alpha$ of the $i$th local model network output (with normalized gaussian activation functions and local linear models) yields closed-form expressions for these derivatives.
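For an assumed gaussian form $g_m = \exp(-\lVert\phi - c_m\rVert^2/\sigma_m^2)$ with one width per model, differentiating the normalized output gives $\partial \hat{y}/\partial c_m = \rho_m (f_m - \hat{y})\,2(\phi - c_m)/\sigma_m^2$ and $\partial \hat{y}/\partial \sigma_m = \rho_m (f_m - \hat{y})\,2\lVert\phi - c_m\rVert^2/\sigma_m^3$. These are derived here for illustration and need not coincide with the expressions in the paper. The sketch below accumulates them into the batch gradient of equation (4), holding the local models $f_j$ fixed; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def normalized_gaussians(phi: Vector, centres: Sequence[Vector],
                         widths: Sequence[float]) -> List[float]:
    """ASSUMED validity functions: one scalar width per local model."""
    expo = [-sum((p - c) ** 2 for p, c in zip(phi, ci)) / s ** 2
            for ci, s in zip(centres, widths)]
    top = max(expo)
    g = [math.exp(e - top) for e in expo]
    total = sum(g)
    return [gi / total for gi in g]


def output_derivatives(psi: Vector, phi: Vector, thetas: Sequence[Vector],
                       centres: Sequence[Vector], widths: Sequence[float]
                       ) -> Tuple[float, List[List[float]], List[float]]:
    """Return y_hat and its partial derivatives w.r.t. every centre and width."""
    rho = normalized_gaussians(phi, centres, widths)
    f = [sum(t * x for t, x in zip(theta, psi)) for theta in thetas]
    y_hat = sum(r * fi for r, fi in zip(rho, f))
    d_centres, d_widths = [], []
    for m, (c, s) in enumerate(zip(centres, widths)):
        common = rho[m] * (f[m] - y_hat)  # rho_m (f_m - y_hat)
        diff = [p - cm for p, cm in zip(phi, c)]
        d_centres.append([common * 2.0 * dk / s ** 2 for dk in diff])
        d_widths.append(common * 2.0 * sum(dk * dk for dk in diff) / s ** 3)
    return y_hat, d_centres, d_widths


def batch_gradient(psis, phis, ys, thetas, centres, widths):
    """Equation (4): sum over patterns of 2 * error * d(y_hat)/d(alpha)."""
    g_c = [[0.0] * len(c) for c in centres]
    g_s = [0.0] * len(widths)
    for psi, phi, y in zip(psis, phis, ys):
        y_hat, d_c, d_s = output_derivatives(psi, phi, thetas, centres, widths)
        err = y_hat - y
        for m in range(len(centres)):
            for k in range(len(centres[m])):
                g_c[m][k] += 2.0 * err * d_c[m][k]
            g_s[m] += 2.0 * err * d_s[m]
    return g_c, g_s
```

A central finite-difference comparison of the cost is a cheap way to confirm such hand-derived gradients before handing them to the BFGS routine.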

The identification of local operating regimes for an unknown plant is difficult. A priori knowledge of the plant can be used at this stage. When little knowledge of the actual regimes exists, however, it may be beneficial to use unsupervised learning methods, such as k-means clustering and nearest neighbours, to give an initial estimate of the normalized gaussian activation regions. These clustering methods are valid in this case, since many plants tend to operate in distinct regions. Despite these difficulties, successful applications of the Local Model technique have been reported in the biotechnology and chemical engineering industries (Johansen and Foss, 1995; Johansen et al., 1995).
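A minimal sketch of such an initialization, assuming Lloyd's k-means for the centres and a one-nearest-neighbour heuristic for the widths (each width is the distance to the closest other centre, scaled by a hypothetical `overlap` factor). The paper does not specify these details, so the choices here are illustrative only; at least two clusters are needed for the width heuristic.

```python
import math
from typing import List, Sequence


def kmeans(points: Sequence[Sequence[float]], k: int, iters: int = 50) -> List[List[float]]:
    """Lloyd's k-means; initial centres are data points spread evenly through the list."""
    centres = [list(points[i * len(points) // k]) for i in range(k)]
    for _ in range(iters):
        groups: List[List[Sequence[float]]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centres[j]))
            groups[nearest].append(p)
        # New centre = cluster mean; an empty cluster keeps its old centre.
        new = [[sum(col) / len(g) for col in zip(*g)] if g else centres[j]
               for j, g in enumerate(groups)]
        if new == centres:  # assignments have stopped changing
            break
        centres = new
    return centres


def nearest_neighbour_widths(centres: Sequence[Sequence[float]],
                             overlap: float = 1.0) -> List[float]:
    """ASSUMED heuristic: width = overlap * distance to the nearest other centre."""
    return [overlap * min(math.dist(c, o) for j, o in enumerate(centres) if j != i)
            for i, c in enumerate(centres)]
```

For example, operating data clustered around pH 2, 7 and 12 would give three centres near those values and widths of about 5 with `overlap=1.0`; the resulting $\alpha$ is then only a starting point for the hybrid optimization.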

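$$ J(\alpha) = \sum_{i=1}^{N} \Big[\Big(\sum_{j=1}^{M} f_j\,\rho_j(\alpha)\Big)_i - y_i\Big]^{T} \Big[\Big(\sum_{j=1}^{M} f_j\,\rho_j(\alpha)\Big)_i - y_i\Big] \tag{3} $$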

$$ \begin{aligned} y(k+1) &= F(\psi, \phi), \qquad \phi \subset \psi,\\ \psi &= [y(k), y(k-1), \dots, y(k-n), u(k), u(k-1), \dots, u(k-m)]^{T} \end{aligned} \tag{6} $$

where y(k) and u(k) are the plant output and input, respectively, at time k. Following usual system identification practice, the LMN is trained on one data set (the training set) and tested on another (the test set). As training progresses, the LMN will first learn the dynamic behaviour and then begin to fit the noise. This is termed overtraining or training interference, and it can be alleviated by cross-validation using the test set (Pollard et al., 1992). Training should be stopped at the minimum value of the test set error, since at this point the network has its maximum predictive capability. The predictive capability can also be checked more thoroughly by forming the test set error on the parallel model output, rather than the one-step-ahead error used for training, i.e.:



$$ \begin{aligned} \hat{y}(k+1) &= F(\hat{\psi}, \hat{\phi}), \qquad \hat{\phi} \subset \hat{\psi},\\ \hat{\psi} &= [\hat{y}(k), \hat{y}(k-1), \dots, \hat{y}(k-n), u(k), u(k-1), \dots, u(k-m)]^{T} \end{aligned} \tag{7} $$

In addition to requiring considerably more parameters, the final RBF structure is not a transparent model, and it cannot easily incorporate a priori plant information. In contrast, the Local Model Network does incorporate prior knowledge (e.g. operating regime parameters, initial operating regions and number of local models) and can be described by the transparent nonlinear ARX structure defined by equation (9).
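The cross-validation rule given earlier in this section (stop training at the minimum test-set error) can be sketched generically. `step` and `test_error` are hypothetical callables standing in for one optimizer iteration and for the test-set error, for example the parallel model error of equation (7). `patience`, the number of non-improving iterations tolerated before stopping, is an added assumption: the minimum can only be recognised once the error has started to rise.

```python
from typing import Callable, Tuple, TypeVar

P = TypeVar("P")


def train_with_early_stopping(params: P, step: Callable[[P], P],
                              test_error: Callable[[P], float],
                              max_iters: int = 1000,
                              patience: int = 10) -> Tuple[P, float, int]:
    """Return the parameters with the lowest test error seen during training."""
    best, best_err, best_iter = params, test_error(params), 0
    waited = 0
    for it in range(1, max_iters + 1):
        params = step(params)  # step must return a new object, not mutate in place
        err = test_error(params)
        if err < best_err:
            best, best_err, best_iter, waited = params, err, it, 0
        else:
            waited += 1
            if waited >= patience:  # no improvement: assume overtraining has begun
                break
    return best, best_err, best_iter
```

Keeping a snapshot of the best parameters, rather than stopping at the first rise in test error, guards against small fluctuations in the test error curve.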

This strategy, however, can be a lengthy process. The ultimate goal is, of course, to obtain a good parallel model. However, the static optimization strategy only minimizes the one-step-ahead error. From a dynamic modelling point of view this can emphasize high-frequency components, resulting in overfitting and noise learning. Hence, the obvious solution is to minimize the parallel model error directly. A numerical approximation to the parallel cost function gradient, computed by forward finite differences, is used in this study (McKeown et al., 1990).
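A sketch of the parallel cost and its forward-difference gradient follows, assuming a generic `model(params, psi)` callable (a hypothetical interface) that returns the one-step prediction. The simulation feeds past predictions back into the regressor, as in equation (7), and each gradient evaluation costs one extra simulation per parameter, which is the price paid for avoiding the recursive dynamic gradient.

```python
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float], Sequence[float]], float]


def simulate_parallel(model: Model, params: Sequence[float], y_init: Sequence[float],
                      u: Sequence[float], n: int, m: int) -> List[float]:
    """Free-run (parallel) model: past *predictions* are fed back, as in eq. (7).

    y_init holds at least max(n, m) + 1 measured outputs to start the recursion;
    the result has one value per input sample.
    """
    y_hat = list(y_init)
    for k in range(len(y_init) - 1, len(u) - 1):
        psi = [y_hat[k - i] for i in range(n + 1)] + [u[k - i] for i in range(m + 1)]
        y_hat.append(model(params, psi))
    return y_hat


def parallel_cost(model: Model, params: Sequence[float], y: Sequence[float],
                  u: Sequence[float], n: int, m: int) -> float:
    """Sum of squared parallel-model errors after the start-up samples."""
    start = max(n, m) + 1
    y_hat = simulate_parallel(model, params, y[:start], u, n, m)
    return sum((a - b) ** 2 for a, b in zip(y_hat[start:], y[start:]))


def forward_difference_gradient(cost: Callable[[List[float]], float],
                                params: Sequence[float], h: float = 1e-6) -> List[float]:
    """dJ/dp_i ~ (J(p + h e_i) - J(p)) / h, one extra cost evaluation per parameter."""
    base = cost(list(params))
    grad = []
    for i in range(len(params)):
        shifted = list(params)
        shifted[i] += h
        grad.append((cost(shifted) - base) / h)
    return grad
```

In the LMN setting `params` would be the centres and widths $\alpha$, and `model` would refit or reuse the local models before predicting; both details are left open here.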

The hybrid learning scheme was used to fit a Local Model network to a nonlinear model of a pH neutralization plant. This plant is a highly nonlinear system in which acid, buffer and base streams are mixed in a tank and the effluent pH is measured (Nahas et al., 1992). Separate training and test sets were generated using variable step changes in base flow rate. Five local linear SISO second-order dynamic models were fitted using the training data, i.e. n = m = 2 in equation (7), to give the model of equation (8).

With local ARX models, the LMN output can be written in nonlinear ARX form:

$$ y(k+1) = B_0 u(k) + B_1 u(k-1) + \dots + B_m u(k-m) + A_1 y(k) + A_2 y(k-1) + \dots + A_{n+1} y(k-n) $$

where $a_{ij}$ and $b_{ij}$ are the coefficients of the $j$th local ARX model, and

$$ B_i = \sum_{j=1}^{M} \rho_j(\phi)\, b_{ij}, \qquad A_i = \sum_{j=1}^{M} \rho_j(\phi)\, a_{ij} \tag{9} $$
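Because the $\rho_j(\phi)$ are fixed at a given time step, blending the outputs of the local ARX models and blending their coefficients as in equation (9) give the same prediction; this is what makes the scheduled ARX form transparent. A small numerical check, assuming local ARX models without offset terms (function names hypothetical):

```python
from typing import List, Sequence, Tuple


def scheduled_arx_coefficients(rho: Sequence[float],
                               a_local: Sequence[Sequence[float]],
                               b_local: Sequence[Sequence[float]]
                               ) -> Tuple[List[float], List[float]]:
    """Equation (9): A_i = sum_j rho_j a_ij and B_i = sum_j rho_j b_ij.

    Zero-based lists: A[0] multiplies y(k) (the paper's A_1), B[0] multiplies u(k).
    """
    n_a, n_b = len(a_local[0]), len(b_local[0])
    A = [sum(r * a[i] for r, a in zip(rho, a_local)) for i in range(n_a)]
    B = [sum(r * b[i] for r, b in zip(rho, b_local)) for i in range(n_b)]
    return A, B


def arx_predict(A: Sequence[float], B: Sequence[float],
                y_past: Sequence[float], u_past: Sequence[float]) -> float:
    """y(k+1) = sum_i A[i] y(k-i) + sum_i B[i] u(k-i); no offset term assumed."""
    return (sum(a * yv for a, yv in zip(A, y_past))
            + sum(b * uv for b, uv in zip(B, u_past)))


rho = [0.2, 0.8]                          # validity values at the current phi
a_local = [[0.5, -0.1], [0.9, 0.05]]      # a_ij for local models j = 1, 2
b_local = [[1.0, 0.3], [0.2, 0.4]]        # b_ij for local models j = 1, 2
y_past, u_past = [1.0, 2.0], [0.5, -1.0]  # [y(k), y(k-1)], [u(k), u(k-1)]

A, B = scheduled_arx_coefficients(rho, a_local, b_local)
blended = sum(r * arx_predict(a, b, y_past, u_past)
              for r, a, b in zip(rho, a_local, b_local))
print(abs(arx_predict(A, B, y_past, u_past) - blended) < 1e-12)  # True
```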

5. CONCLUSIONS

Local model networks offer a parsimonious framework for nonlinear system identification. The constituent local models may be linear or nonlinear, and a priori system information can easily be incorporated within the local model structure.

The proposed hybrid optimization method has distinct advantages over unsupervised learning techniques and complex gradient-based schemes. The method uses a combination of unsupervised learning, Hessian-based optimization, singular value decomposition and cross-validation to obtain an optimal choice of the interpolation function parameters. The network is always at a global minimum in the space of the local linear models, since these are determined by a linear optimization technique.

This simple formulation of the LMN output means that the resultant nonlinear model can easily be used in a number of conventional control strategies, such as Internal Model Control and Generalized Minimum Variance.

4. MODELLING A pH NEUTRALIZATION PLANT

$$ \begin{aligned} \mathrm{pH}(k+1) &= \sum_{i=1}^{M} f_i(\psi)\,\rho_i(\phi),\\ \psi &= [\mathrm{pH}(k), \mathrm{pH}(k-1), q_3(k-d), q_3(k-1-d)]^{T},\\ \phi &= [\mathrm{pH}(k), q_3(k-1-d)]^{T} \end{aligned} \tag{8} $$

where pH(k) is the process output, q3(k) is the process input (base flow rate) at time k, and d is the system delay. The Local Model Network therefore has 35 parameters. The same data was also used to form an RBF network with sixteen neurons, giving a total of 96 parameters for the complete description.
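The quoted totals are consistent with the following breakdown, which is an inference rather than something stated in the paper: each of the five local models has four ARX coefficients (no offset), a two-dimensional centre in $\phi$ and one width, while each RBF neuron has a centre in the four-dimensional $\psi$ space, one width and one output weight. The helper names are hypothetical.

```python
def lmn_parameter_count(n_models: int, dim_psi: int, dim_phi: int) -> int:
    # ASSUMED per local model: dim_psi ARX coefficients, dim_phi centre coords, 1 width
    return n_models * (dim_psi + dim_phi + 1)


def rbf_parameter_count(n_neurons: int, dim_input: int) -> int:
    # ASSUMED per neuron: dim_input centre coords, 1 width, 1 output weight
    return n_neurons * (dim_input + 1 + 1)


print(lmn_parameter_count(5, 4, 2))  # 35
print(rbf_parameter_count(16, 4))    # 96
```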

Also proposed is a dynamic optimization technique, in which the multi-step-ahead (parallel) model prediction error is reduced directly. A finite-difference approximation is used to calculate the cost function gradient in this case, thus obviating the need to calculate the highly complex recursive dynamic gradient.

For this particular plant, the static process gain between base flow rate and pH (i.e. the slope of the titration curve) varies considerably as the pH changes. This curve was therefore used to initialize the nonlinear activation regions. Figure 1 shows the resultant parallel model outputs over a simulation test set after optimization for the RBF network, the LM network scheduled on pH, and the LM network scheduled on pH and base flow. Excellent predictions are obtained for all the models.

Test results taken from a highly nonlinear pH neutralization process show the excellent modelling capability of the LMN and RBF networks. Further, if the local models are of linear ARX type, the Local Model Network forms a transparent nonlinear ARX model, which can be incorporated directly within well-known control structures. This is in contrast to the RBF network, which requires substantially more parameters to perform the same mapping and cannot be analyzed in the same straightforward manner.

The RBF and LMN networks were also used to form a parallel model from data of the actual pH pilot plant (Nahas et al., 1992). This data contained nonlinearities and measurement noise not found in the simulation. Figure 2 shows the performance of both the RBF and LMN over the same test set. Again, excellent predictions are obtained for both models. Also shown is the RBF output when it has been trained with the usual unsupervised methods of k-means clustering and nearest neighbours. The results show that the new hybrid training method is significantly better than the unsupervised methods usually associated with radial basis techniques.

ACKNOWLEDGMENT

This work was funded by the Engineering and Physical Sciences Research Council under grant GR/K69476.

Although the RBF network has a modelling capability similar to that of the more general LMN, it requires considerably more parameters.


REFERENCES

An, P.E., Brown, M., Harris, C.J., Lawrence, A.J. & Moore, C.G. (1994) Associative memory neural networks: adaptive modelling theory, software implementations and graphical user interface, Engng. Applic. Artif. Intell., 7, no. 1, pp 1-21

Astrom, K.J. & Wittenmark, B. (1973) On Self Tuning Regulators, Automatica, 9, pp 185-199

Economou, C.G., Morari, M. & Palsson, B. (1986) Internal Model Control. 5. Extension to nonlinear systems, Ind. Eng. Chem. Proc. Des. Dev., 25, pp 403-411

Garcia, C.E. & Morari, M. (1982) Internal Model Control. 1. A Unifying Review and Some New Results, Ind. Eng. Chem. Process Des. Dev., 21, pp 308-323

Hunt, K.J. & Sbarbaro, D. (1991) Neural Networks for Nonlinear Internal Model Control, IEE Proc. D, 138, no. 5, pp 431-438

Hunt, K.J., Sbarbaro, D., Zbikowski, R. & Gawthrop, P.J. (1992) Neural Networks for Control Systems - a Survey, Automatica, 28, no. 6, pp 1083-1112

Johansen, T.A. & Foss, B.A. (1993a) State-space modelling using operating regime decomposition and local models, Automatic Control World Congress, 4, ch 251, pp 39-42

Johansen, T.A. & Foss, B.A. (1993b) Constructing NARMAX models using ARMAX models, Int. J. Control, 58, no. 5, pp 1125-1153

Johansen, T.A. & Foss, B.A. (1995) Identification of nonlinear system structure and parameters using regime decomposition, Automatica, 31, no. 2, pp 321-326

Johansen, T.A., Foss, B.A. & Sorensen, A.Y. (1995) Nonlinear predictive control using local models - applied to a batch fermentation process, Control Eng. Practice, 3, no. 3, pp 389-396

Lightbody, G. & Irwin, G.W. (1995) A Novel Neural Internal Model Control Structure, IEEE American Control Conf., ACC '95, 1, pp 350-354

McKeown, J.J., Meegan, D. & Sprevak, D. (1990) An Introduction to Unconstrained Optimisation, Adam Hilger, Bristol

Miller, W.T., Sutton, R.S. & Werbos, P.J. (1990) Neural Networks for Control, MIT Press, Cambridge, MA, USA

Morari, M. & Zafiriou, E. (1989) Robust Process Control, Prentice-Hall

Nahas, E.P., Henson, M.A. & Seborg, D.E. (1992) Nonlinear internal model control strategy for neural network models, Computers chem. Engng., 16, no. 12, pp 1039-1057

O'Reilly, P., Irwin, G.W., Lightbody, K. & McCormick, J. (1996) Online neural inferential prediction of viscosity, Proc. IFAC World Congress, San Francisco, USA

Parks, J. & Sandberg, I.W. (1991) Universal approximation using radial basis functions, Neural Computation, 13, pp 246-257

Parks, P.C. & Militzer, J. (1992) A Comparison of Five Algorithms for the Training of CMAC Memories for Learning Control Systems, Automatica, 28, no. 5, pp 1027-1035

Pollard, J.F., Broussard, M.R., Garrison, D.B. & San, K.Y. (1992) Process identification using neural networks, Computers chem. Engng., 16, no. 4, pp 253-270

Press, W.H., Teukolsky, S.A., Vetterling, W.T. & Flannery, B.P. (1992) Numerical Recipes in C: the art of scientific computing, 2nd ed., Cambridge University Press

Unbehauen, H. (1996) Modelling of Nonlinear Systems, Proc. Euraco Workshop on 'Control of Nonlinear Systems: Theory and Applications', pp 201-218, Portugal

Webb, A. & Lowe, D. (1988) A Hybrid optimisation strategy for adaptive feedforward layered networks, Royal Signals and Radar Establishment Memorandum 4193, HMSO
