Engng Applic. Artif. Intell. Vol. 7, No. 1, pp. 1-21, 1994
Pergamon 0952-1976(93)E0007-U
Copyright © 1994 Elsevier Science Ltd. Printed in Great Britain. All rights reserved. 0952-1976/94 $6.00 + 0.00
Contributed Paper
Associative Memory Neural Networks: Adaptive Modelling Theory, Software Implementations and Graphical User Interface

P. E. AN, Southampton University
M. BROWN, Southampton University
C. J. HARRIS, Southampton University
A. J. LAWRENCE, Southampton University
C. G. MOORE, Southampton University
(Received March 1993; in revised form September 1993)

This paper describes in a unified mathematical framework a class of associative memory neural networks (AMNs) that have very fast learning rates, local generalisation, parallel implementation, and guaranteed convergence to the minimum mean squared error, making them appropriate for applications such as intelligent control and on-line modelling of nonlinear dynamical processes. The class of AMN considered includes the Albus CMAC, the B-spline neural network and classes of fuzzy logic networks. Appropriate instantaneous learning rules are derived and applied to a benchmark nonlinear time series prediction problem. For practical implementation, a network software library and graphical user interface (GUI) are introduced for these networks. The data structure is modular, allowing a natural implementation on a parallel machine. The GUI provides a front end for high-level procedures, allowing the networks to be designed, trained and analysed within a common environment with a minimum of user effort. The software library is readily integrable into industrial packages such as MATLAB.

Keywords: Associative memory, neural networks, fuzzy logic, CMAC, B-spline, non-linear modelling, parallel implementation, graphical user interface.
Correspondence should be sent to: Professor C. J. Harris, Advanced Systems Research Group, Department of Aeronautics and Astronautics, Southampton University, Southampton SO9 5NH, U.K.

1. INTRODUCTION

Any intelligent module must be able to modify its behaviour in response to its interaction with the current environment, and to associate its current experiences with similar events that have happened in the past. This means that an intelligent module must be able to adapt, and to do so in a local manner. Within the context of Intelligent Control (IC), an intelligent controller must be able to modify its strategy according to its current performance, and this modification will affect the output of the controller for similar inputs. Two methodologies which have seen very rapid growth in recent years are Neural Networks (NN) and Fuzzy Networks (FN).2 Despite having first been developed in the 1960s, it is only in recent times that the computing power has been available for cost-effective implementations. Both algorithms place few restrictions on the type of mapping that is to be learnt, as they were developed to cope with highly non-linear, time-
varying, ill-defined plants.3

Fig. 1. Schematic associative memory network.

However, despite a growing number of successful simulations and applications, the behaviour of many of these algorithms is poorly understood. This is especially true for fuzzy modelling and control where, to the authors' knowledge, the decision surfaces produced are poorly understood and few convergence and stability conditions exist. This paper will describe the CMAC and the B-spline NN within a common structure, and it will be shown that, under certain conditions, both the continuous and discrete versions of FNs can be described within the same framework. Instantaneous learning laws will then be derived for the NNs, and it can easily be shown that, in the limit, they minimise a Mean Squared output Error (MSE) cost function. These learning laws can also be applied to the FNs, and under the same technical conditions it can be shown that they also minimise the MSE. Due to the small number of parameters which are adapted at each stage (because of the local generalisation which occurs in these networks), the adaptation rules are easy to implement and the networks are temporally stable. Temporal stability means that updating the network's response for a particular input minimally affects the output of the network for an input far away. Finally, all three networks will be used to model a set of two-input, single-output noisy training data obtained from a nonlinear time series which converges to a stable limit cycle. The performance of these networks is assessed using a variety of test criteria: computing the MSE on both the training and test data; examining the dynamics of the model; inspecting the smoothness of the model's output; performing an autocorrelation test; and comparing time histories. Although the three networks described in this paper are very similar, each has certain strong features, and both the test problem and the performance indicators have been specially chosen to highlight these attributes. For practical implementation of these associative memory neural networks a modular
network software library and graphical user interface (GUI) is described. In the second half of the paper the modular decomposition provides a natural implementation of the networks on a parallel machine with minimal effort; equally, they can be readily integrated into a MATLAB environment for process modelling and control.

2. ASSOCIATIVE MEMORY NETWORKS
Associative Memory Neural Networks (AMNs) such as the CMAC and B-splines have been shown to have many desirable characteristics for on-line nonlinear modelling and control. They have a 'perceptron-like' structure, which means that these networks can be decomposed into a fixed nonlinear mapping and an adaptive linear mapping. The initial nonlinear topology-conserving map transforms the input into a higher-dimensional space in which the desired function is approximately linear. These networks also adopt the principle of local generalisation, which means that for a particular input, only a small region of the network will be involved in calculating the output. This map conserves the topology of the input space in that similar inputs will map to similar regions of the network, while dissimilar inputs will map to completely different regions (Fig. 1). The networks are linear with respect to their set of adaptable parameters (weights) and thus simple instantaneous learning laws can be used, for which convergence can be established, subject to well-understood restrictions. The space in which linear optimisation is performed is sparse and so many of the transformed input vectors will be mutually orthogonal, ensuring that the initial rate of convergence is very fast. Specifically, an input vector x ∈ Rⁿ is first mapped to a vector a which lies in a higher-dimensional space ⊂ [0, 1]ᵖ. This mapping is such that only a small number, ρ, of the variables a_i have a non-zero output, and generally the relationship n < ρ ≪ p holds.
In addition the transformed inputs, a_i, are normalised so that they sum to 1. The fact that only ρ elements of the vector a are non-zero means that the output of the network is formed from a linear combination of ρ weights, and so the output of the network is stored locally across these weights. This mapping also determines the type of space in which linear optimisation is performed. If ρ is chosen too large then the generalisation region of the network will be large and learning will not be local. However, if ρ is too small, the network will be unable to generalise (interpolate) between neighbouring training examples and so the overall approximation ability of the network will be poor. The output of the network is given by
y(t) = aᵀ(x(t)) w(t-1),

and a weight contributes to the output of the network only if the input lies in the receptive field (or support) of its corresponding basis function (association cell). The x to a mapping is constructed by specifying an n-dimensional lattice. The lattice is defined by specifying a set of knots,4 and the position of these knots determines the resolution of the network in that part of the input space. Generally the knot density is greater in regions where the desired function changes its value rapidly, and sparse in those areas of the input space where the desired function is approximately constant. The placement of these knots influences the learning abilities of the network and the computational cost, as well as the approximation capabilities. This lattice is bounded and so these networks implicitly assume that the input space is bounded. ρ overlays are then constructed, each of which is just large enough to cover the lattice. Each overlay consists of the union of adjacent, non-overlapping receptive fields, which ensures that the input will lie in one and only one receptive field on each overlay, and so the input will always lie in ρ receptive fields in the network (Fig. 2). The basis functions depend on their receptive fields, which in turn depend on the generalisation parameter ρ and the shape of the lattice. This decomposition of the algorithm into a set of overlays provides a natural way of implementing these networks on transputers, with an overlay (or set of overlays) being assigned to each transputer. This provides not only a way of decomposing the computational cost, but also a means by which the memory can be divided into local memory and global memory, with little interaction between the neighbouring processors. Finally, the output of each basis function must be specified, and this determines the modelling capabilities of the network. The basis functions may be binary (as in the original CMAC), they may be piecewise polynomials (B-splines or dilated B-splines), or they may calculate the distance between the input and an association cell centre and then pass this through a
univariate nonlinear function (similar to radial basis functions). The CMAC has found widespread practical applications in the field of robotics6-8 and process control. The use of the CMAC in a collision avoidance system for autonomous vehicles is currently being researched. Some B-spline applications in control are reported in Refs 11 and 12.

2.1. The CMAC
The CMAC specification requires a generalisation parameter ρ to be given. This determines the number of basis functions mapped to for any input, as well as fixing the receptive field of each basis function to be a hypercube of volume ρⁿ defined on the lattice. The ρ overlays, consisting of hypercubes of volume ρⁿ, are then displaced relative to each other so that each knot uniquely specifies one side of the hypercubes on one and only one overlay. This ensures that the principle of uniform projection is preserved: when an input moves one cell on the lattice parallel to any one of the input axes, it will share ρ-1 weights in common with the previous input. Therefore, when two inputs are more than ρ-1 cells apart (with respect to the lattice and the norm ‖·‖∞), the two weight sets have no members in common. Improved overlay displacement algorithms have been proposed in Ref. 13, where the overlays are distributed across the lattice and not purely along the main diagonal. Originally the basis functions had a binary output (1 if the input lies in the receptive field and 0 otherwise), which resulted in the output of the CMAC being piecewise constant, and recently the interpolation capabilities of this nonlinear model have been established.14

Fig. 2. AMN overlays defined on a 2-D lattice.

To obtain a continuous output from the CMAC, several research groups have proposed using higher-order basis functions which are defined on the same
receptive fields. These functions have a maximum value at the centre of the receptive field which reduces to zero as the input moves closer to the edge of the receptive field. These functions are typically dilated B-splines,15 superspheres or trigonometric functions. However, using these higher-order basis functions means that the field strength ‖a‖₁ is no longer constant across the input space, and so the output calculation may have to be normalised if a smooth output is desired. A major feature of this algorithm is that the number of association cells mapped to, ρ, is not dependent on n. This means that the cost of calculating the set of addresses of the non-zero basis functions is only linearly dependent on n.
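To make this address calculation concrete, the following is a minimal sketch (not the library implementation described later in this paper) of how the ρ active receptive fields can be located for a CMAC that uses the standard diagonal overlay displacement; the constants, function names and the assumption that the input has already been quantised onto the lattice are all illustrative.

/* Minimal sketch of CMAC addressing with the standard diagonal overlay
   displacement d_j = (j, j, ..., j); all names and constants are illustrative
   and the input is assumed to be already quantised onto the lattice. */
#include <stdio.h>

#define N   2     /* input dimension          */
#define RHO 5     /* generalisation parameter */

/* index (per axis) of the single active hypercube on overlay j */
static void active_cell(const int q[N], int j, int cell[N])
{
    for (int i = 0; i < N; ++i)
        cell[i] = (q[i] + RHO - 1 - j) / RHO;   /* integer division */
}

int main(void)
{
    int q[N] = {7, 3};                          /* quantised input  */
    for (int j = 0; j < RHO; ++j) {
        int cell[N];
        active_cell(q, j, cell);
        printf("overlay %d -> cell (%d, %d)\n", j, cell[0], cell[1]);
    }
    return 0;
}

Moving the input one cell along any axis changes the active cell on exactly one overlay, so adjacent inputs share ρ-1 weights, which is the uniform projection principle described above.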
2.2. B-splines

B-splines were originally proposed for use in graphical applications and are formed from a linear sum of piecewise polynomial basis functions of order k.4,12,16 A simple and stable recurrence relationship exists for evaluating the output of the univariate basis functions, and the multivariate output is formed from the multiplication of every possible combination of the univariate basis functions. The recurrence relationship for evaluating the jth univariate B-spline of order k, N_k,j(x), is given by
N_k,j(x) = ((x - λ_{j-k}) / (λ_{j-1} - λ_{j-k})) N_{k-1,j-1}(x) + ((λ_j - x) / (λ_j - λ_{j-k+1})) N_{k-1,j}(x)

N_1,j(x) = 1 if x ∈ I_{j-1}, 0 otherwise,

where I_j is the jth interval [λ_j, λ_{j+1}), with the last interval being closed at both ends, and λ_j is the jth knot. These basis functions are normalised: for every input the sum over all the basis functions is 1. Also, each univariate basis function is k intervals wide (with respect to the lattice) and so each input will map to ρ = kⁿ multivariate basis functions. Note that this is exponentially dependent on n. Each overlay consists of receptive fields which have a volume of kⁿ, and so basis functions are defined at every possible position on the lattice, unlike the CMAC, which provides a sparse coding of the input lattice. It is possible to have B-splines of different orders defined on the different univariate input axes in order to reflect the type of information contained in that input. For instance, the output may be approximated by a piecewise quadratic function (k = 3) with respect to one of the inputs and by a piecewise linear function (k = 2) with respect to a different input variable, and this case is illustrated later in this paper. Dilated B-splines can also offer an increased flexibility for modelling by decoupling the relationship between the
order of the basis function and the width of its support. Instead of insisting that the width of the support must equal the order of the B-splines, it is assumed that the width of the support is an integer multiple of the order of the B-splines (Fig. 3).

Fig. 3. B-splines (top) and dilated B-splines (bottom) of order 2.

The basis functions produced resemble the triangular membership sets commonly used in fuzzy logic.
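The recurrence relationship above translates directly into code. The following is a minimal sketch under the stated knot convention (I_j = [λ_j, λ_{j+1}), simple distinct knots, no special handling of the closed final interval); the function name and example knot vector are illustrative.

/* Minimal sketch of the univariate B-spline recurrence given above, assuming
   simple distinct knots lambda[0..m] and I_j = [lambda[j], lambda[j+1]);
   the closed final interval is not treated specially here. */
#include <stdio.h>

/* order-k B-spline N_{k,j} evaluated at x for the knot vector lambda */
static double bspline(int k, int j, double x, const double *lambda)
{
    if (k == 1)                                   /* N_{1,j}: indicator of I_{j-1} */
        return (x >= lambda[j - 1] && x < lambda[j]) ? 1.0 : 0.0;

    double left  = (x - lambda[j - k]) / (lambda[j - 1] - lambda[j - k]);
    double right = (lambda[j] - x)     / (lambda[j] - lambda[j - k + 1]);
    return left  * bspline(k - 1, j - 1, x, lambda)
         + right * bspline(k - 1, j,     x, lambda);
}

int main(void)
{
    const double lambda[] = {0.0, 1.0, 2.0, 3.0, 4.0, 5.0};  /* uniform knots */
    /* order-2 (triangular) basis function: rises to 1 at lambda[1], back to 0 */
    for (double x = 0.0; x <= 2.0; x += 0.5)
        printf("N_2,2(%.1f) = %.2f\n", x, bspline(2, 2, x, lambda));
    return 0;
}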
2.3. Fuzzy networks

Fuzzy logic was proposed as a methodology for handling linguistic uncertainty. Although fuzzy systems are often discussed in terms of continuous fuzzy sets, many implementations use discrete fuzzy sets. Most continuous fuzzy implementations use a very small number of rules, often with equal rule confidences, constructed from a small set of qualifiers (Positive Large, Almost Zero, etc.); in this implementation the network mapping is encoded precisely. A discrete implementation can encode any number of rules by spatially sampling the network mapping; in this implementation the qualifiers do not necessarily belong to a fixed set (or language) and only an approximation to the mapping is encoded. Both implementations are strongly linked to CMAC and B-spline networks under certain technical conditions.
2.3.1. Continuous fuzzy networks

In the continuous fuzzy network representation, knowledge is stored in terms of the confidence, ∈ [0, 1], in a fuzzy production rule being true: IF (x₁ is Positive Small AND x₂ is Almost Zero) THEN (y is Negative Small). Both the set of these fuzzy rules and their corresponding rule confidences define the fuzzy plant (controller) model. The linguistic variables, such as x₁ is Positive Small, are modelled using continuous fuzzy sets which
associate with each value of x₁ a number ∈ [0, 1] which denotes the degree of membership of x₁ in the fuzzy set Positive Small. Fuzzy logic operators are then used to implement the fuzzy AND, OR and IMPLICATION (Fig. 4).

Fig. 4. Fuzzy production rule set.

It has been shown17 that if (dilated) B-splines are used to define the fuzzy input sets, symmetrical B-splines (k ≥ 2) are used to implement the fuzzy output sets, algebraic operators (*, +) are used to generalise the logical operators, and a centre of area defuzzification strategy is employed, then the output of the fuzzy network is linearly dependent on the fuzzy input sets and is given by
y(t) = aᵀ(t) w(t-1),

where a_i(t) is the output of the ith multivariate fuzzy input set, w_i(t-1) is the weight associated with a_i, given by w_i(t-1) = c_iᵀ(t-1) y^c, y_j^c is the centre of the jth fuzzy output set and c_ij(t-1) is the confidence in the fuzzy production rule being true, relating the ith fuzzy input set to the jth fuzzy output set. This means that the fuzzy networks and the B-spline networks are output equivalent, i.e. for each weight there exists a rule confidence vector such that the output of the two networks is the same. For FNs the knowledge is stored in terms of a rule confidence vector (rather than a weight), which means that the network can be inverted, as the input and output variables are treated in a similar manner. This means that a fuzzy model can be built of a plant and then inverted in order to realise a controller.18
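As a concrete illustration of this output equivalence, the following minimal sketch defuzzifies each rule confidence vector into a weight, w_i = c_iᵀ y^c, and then forms the network output y = aᵀw; the set sizes, confidence values and memberships are illustrative, and the input memberships and each confidence vector are assumed to be normalised.

/* Minimal sketch of the output equivalence described above: each weight is the
   defuzzified value of its rule confidence vector, w_i = c_i^T y^c, and the
   network output is y = a^T w.  All values below are illustrative, and the
   memberships a_i and each confidence vector c_i are assumed normalised. */
#include <stdio.h>

#define P_SETS 3   /* multivariate fuzzy input sets */
#define Q_SETS 3   /* fuzzy output sets             */

int main(void)
{
    double yc[Q_SETS] = {-1.0, 0.0, 1.0};     /* output set centres y^c_j */
    double c[P_SETS][Q_SETS] = {              /* rule confidences c_ij    */
        {0.0, 0.2, 0.8},
        {0.1, 0.8, 0.1},
        {0.7, 0.3, 0.0}
    };
    double a[P_SETS] = {0.25, 0.50, 0.25};    /* fuzzy input memberships  */

    double y = 0.0;
    for (int i = 0; i < P_SETS; ++i) {
        double w_i = 0.0;
        for (int j = 0; j < Q_SETS; ++j)
            w_i += c[i][j] * yc[j];           /* w_i = c_i^T y^c          */
        y += a[i] * w_i;                      /* y = a^T w                */
    }
    printf("network output y = %.3f\n", y);
    return 0;
}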
2.3.2. Discrete fuzzy networks

In a discrete fuzzy network, knowledge is stored in terms of the confidence, ∈ [0, 1], that a particular point on the universe of all possible inputs is linked to a particular point on the universe of all outputs. These confidence values are the elements, r_ij, of the finite discrete fuzzy relational matrix R, which link input point i with output point j. The discretisation interval between points can be constant (linear discretisation) or a function of the input or output value (non-linear discretisation). The confidence values encode rules of the form IF (x₁′ AND x₂′) THEN (y′),
where x₁′, x₂′ and y′ are fuzzy numbers defined over discrete universes, i.e. if there are N points, x_i, on the universe of all values of x, then the fuzzy set x′ is given by the membership values μ(x_i), i = 1, ..., N. In this implementation the fuzzy sets are not necessarily members of a fixed language of qualifiers (such as Positive Small and Almost Zero in the previous implementation); for example, x′ could represent approximately 4.56 by centring a set shape (triangular, B-spline etc.) on the point 4.56. The width of a set compared to the discretisation interval is the number of non-zero elements of the discrete set at any one time, say s, and the equivalent generalisation parameter is ρ = sⁿ (for an n-dimensional input). When a discrete representation is used to implement a small number of rules, the network output is an approximation of the equivalent continuous case. Under certain strict conditions it is possible for the two implementations to be output equivalent, and in such cases the relational matrix elements r_ij are equivalent to the rule confidences c_ij defined in the previous subsection. In general these two quantities are not equivalent and hence different notation is used here. For a single rule, the elements of R are given by the intersection of the input and output sets (note that in general the input is a multivariable set). For multiple rules, R is given by the union of the relations for the individual rules. Once again, if algebraic operators (*, +) are used to implement the logical operators and centre of area defuzzification is used, the output of the network is given by
y(t) = aᵀ(t) w(t-1),

where a_i(t) is the membership value at the ith discrete point on the multivariate fuzzy input set and w_i(t-1) is the weight associated with a_i, which is computed from the relational matrix R. In this case the equivalent weight vector is given by w_i(t-1) = r_iᵀ(t-1) y^c, where y_j^c is the centre of the jth discrete point on the output universe and r_i(t-1) is a column of the relational matrix whose members are r_ij(t-1). The above description assumes that the fuzzy input sets are normalised, that is Σ_i a_i = 1. This description of both continuous and discrete
fuzzy logic is particularly suitable for self-organising fuzzy models (controllers), as the model is linear in terms of its adjustable parameters (weights or rule confidences). The basic model is a piecewise polynomial, which is dense in the space of continuous nonlinear functions, and the compact support of the fuzzy input sets means that for each input only a small number of rules will contribute to forming the output. However, this number of rules is exponentially dependent on the size of the input space. A possible solution to this problem is to devise a fuzzy CMAC, where the output would be treated as an extra input and a rule confidence would be stored for each basis function instead of a weight. This means that the number of rules that are active for any input would not be dependent on the size of the input space, but on the user-defined generalisation parameter ρ. The rule confidence vector (for each multivariate fuzzy input set) would no longer be normalised and the basic model would be a normalised piecewise polynomial. A design compromise is being made, trading output smoothness for computational efficiency (especially in higher-dimensional spaces).

2.4. Comparison

Both the CMAC and the B-spline networks are very similar, especially when higher-order basis functions are used in the CMAC. B-splines have a greater flexibility: they allow basis functions of different widths and shapes to be defined on the different inputs, and they are normalised, eliminating a network output normalisation procedure. However, many of the functions which the networks are trying to learn are redundant (especially in medium/high-dimensional spaces) and so the B-spline network, in these cases, is overly complex. The CMAC network provides a sparse coding of the input lattice which ensures that, in general, a less-complex model is formed (ρ < kⁿ) and so the CMAC stores information more efficiently than the B-spline network. However, when higher-order basis functions are used, the outputs of the basis functions are not normalised, which gives a less smooth output. Normalising the outputs (with respect to the sum over all the outputs of the basis functions) generally gives a smoother output. These networks also have their fuzzy equivalents, continuous and discrete, where knowledge is stored in terms of the confidence in a fuzzy production rule being true. There also exists (under certain circumstances) an invertible mapping between a weight and its corresponding rule confidence vector, and so the outputs of these networks are equivalent. This means that in a forward-chaining mode the networks are equivalent, although it is possible to invert the fuzzy model (backward chaining) if desired.

3. LEAST MEAN SQUARE LEARNING

AMNs use an instantaneous measure of how well the network is performing in order to update the weight vector. AMNs are particularly suited to this type of learning rule, both because the output is linearly dependent on the weight vector and because the transformed input vector is sparse, which means that only those weights which contributed to the output will be modified. Generally the performance measure is simply taken to be the MSE, whose instantaneous estimate at time t is simply

½ e_y²(t),    (1)

where e_y(t) is defined as ŷ(t) - y(t) and ŷ(t) is the desired output of the network at time t. The (unbiased) instantaneous estimate of the gradient is given by

-e_y(t) a(t).    (2)

An instantaneous gradient descent law of this form can then be employed to adapt the weight vector:

Δw(t-1) = δ e_y(t) a(t),    (3)
where Δw(t-1) is defined as w(t) - w(t-1) and δ is the learning rate. This learning rule is called the Least Mean Square (LMS) rule because, in the limit, the weight vector will tend towards its 'optimal' value (subject to its existing, and a sufficiently exciting bounded input signal). Using this learning rule, the a posteriori output error ê_y(t) (the output error after adaptation) is related to the a priori output error by
ê_y(t) = (1 - δ‖a(t)‖₂²) e_y(t),    (4)
which is a function of ‖a(t)‖₂². Hence the storage ability of the network depends on the size of the transformed input vector (note that for AMNs the size of the transformed input is bounded below by some quantity greater than zero and above by 1). Stability of the learning rule is assured if 0 < δ < min_t (2/‖a(t)‖₂²), which guarantees that the absolute value of the a posteriori output error is less than the corresponding a priori output error. Alternatively, an error-correction rule called the Normalised Least Mean Square (NLMS) rule may be derived. This rule updates the weight vector so that

Δw(t-1) = c(t) a(t),    ŷ(t) = aᵀ(t) w(t).    (5)

Solving for the scalar c(t) and including the learning rate, δ, gives

Δw(t-1) = δ e_y(t) a(t) / ‖a(t)‖₂².    (6)
Rather than minimising the MSE of the training set, this rule minimises a biased cost function, which is given by

E[ e_y²(t) / ‖a(t)‖₂² ],

as Eqn (6) is simply a (negatively) scaled instantaneous estimate of the derivative of this cost function. Using this modified learning rule, the reduction in the output error is not dependent on the size of the transformed input vector,

ê_y(t) = (1 - δ) e_y(t),    (7)

and stable learning is assured if 0 < δ < 2. These two learning rules have their stochastic approximation counterparts, which let the learning rate tend to zero as time increases. This filters out measurement noise and modelling mismatch, and is discussed in more detail in Ref. 20. This is necessary when there exists modelling error, which causes the weights to converge to a minimal capture zone, rather than to a unique value, for a fixed learning rate. The simple form of the learning rules given in Eqns (3) and (6) should be emphasised: the recommended weight update is simply a scalar multiplied by the transformed input vector and so is very cheap to implement. It could be argued that, because only linear optimisation is being performed, it might be appropriate to use Recursive Least Squares to update the weight vector. However, this requires a non-singular solution matrix, and even if the desired function is completely known (specified within every cell on the lattice) the solution matrix may be singular.14 Also, the algorithm has a computational cost of O(p²), whereas for the previous two algorithms it is O(ρ) (ρ ≪ p). The above multi-input single-output case is easily extended to the multi-output case by simply considering each output as coming from a separate network defined on the same parameters.
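The following is a minimal sketch of the LMS and NLMS updates of Eqns (3) and (6), assuming the transformed input a(t) has already been computed and is held sparsely as (index, value) pairs for the ρ active basis functions; the function names and storage layout are illustrative rather than those of the library described later.

/* Minimal sketch of the LMS/NLMS updates of Eqns (3) and (6); the transformed
   input is assumed to be held sparsely as (index, value) pairs for the rho
   active basis functions, and all names here are illustrative. */
#include <stddef.h>

/* y(t) = a^T(x(t)) w(t-1), using only the rho non-zero elements of a */
static double amn_output(const double *w, const size_t *idx,
                         const double *a, size_t rho)
{
    double y = 0.0;
    for (size_t i = 0; i < rho; ++i)
        y += a[i] * w[idx[i]];
    return y;
}

/* one training step: LMS if normalise == 0, NLMS otherwise */
static void amn_update(double *w, const size_t *idx, const double *a,
                       size_t rho, double desired, double delta, int normalise)
{
    double e = desired - amn_output(w, idx, a, rho);  /* a priori error e_y(t) */
    double step = delta * e;
    if (normalise) {                                  /* Eqn (6)               */
        double norm2 = 0.0;
        for (size_t i = 0; i < rho; ++i)
            norm2 += a[i] * a[i];                     /* ||a(t)||_2^2, bounded
                                                         away from zero for AMNs */
        step /= norm2;
    }
    for (size_t i = 0; i < rho; ++i)
        w[idx[i]] += step * a[i];                     /* only active weights move */
}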
3.1. Fuzzy learning rules

In the previous section several learning rules were proposed to train a set of linear parameters. The FNs are also linearly dependent on a weight vector and so these learning rules could be used to train this set of weights, with each weight being stored as a rule confidence vector (the membership of the weight in the fuzzy output sets). For symmetric fuzzy output sets, this mapping between a weight and its rule confidence vector is invertible, and when the fuzzy rules are defuzzified the original weight is obtained. Hence the fuzzy networks are also learning equivalent to the B-spline networks, meaning that the outputs are the same after adaptation.20 This method for training the rule confidences is indirect, and it may be preferable to derive a more direct training rule. Evaluating the derivative of Eqn (1) with respect to the ith rule confidence vector gives a learning rule of the form

Δc_i(t-1) = δ_i(t) e_y(t) a_i(x(t)) y^c.    (8)

Taking the dot product with y^c gives

Δw_i(t-1) = Δc_iᵀ(t-1) y^c = δ_i(t) e_y(t) a_i(x(t)) ‖y^c‖₂²,

which is ‖y^c‖₂² times greater than the corresponding weight training rule, and this factor would have to be incorporated into the learning rate. The learning rule is also undesirable in that only the output error is used to train the rule confidences, because the true rule confidences could be wrong and yet still generate the correct weight. This is because the new set of basis functions {a_i y_j^c}_{i,j} are no longer linearly independent and so there exists (in general) an infinite number of global minima. Consider instead the modified learning rule for training the rule confidence vectors:

Δc_i(t-1) = δ_i(t) (ĉ(t) - Σ_j a_j(x(t)) c_j(t-1)) a_i(x(t)),    (9)

where ĉ(t) is the desired rule confidence vector at time t, which is given by the membership of the desired model output in the fuzzy output sets. Taking the dot product of Eqn (9) with y^c gives

Δw_i(t-1) = Δc_iᵀ(t-1) y^c = δ_i(t) e_y(t) a_i(x(t)),

due to the invertible relationship which exists between a weight and the corresponding rule confidence vector. Learning rule (9) is based on the error in the rule confidence vector and so a unique minimum exists in the rule confidence space, and the defuzzified a posteriori output is equivalent to training the weight using the standard LMS rule. Hence FNs trained using (9) are also learning equivalent to the B-spline networks. These rule confidence learning laws are instantaneous gradient-descent rules, and similar error-correction laws (as in the previous section) can easily be derived.
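A minimal sketch of the rule-confidence update of Eqn (9) is given below, assuming dense storage of the confidences, a fuzzified desired output ĉ(t), and illustrative array sizes and names.

/* Minimal sketch of the rule-confidence update of Eqn (9); storage is dense,
   and the array sizes and names are illustrative. */
#define P 4    /* multivariate fuzzy input sets */
#define Q 3    /* fuzzy output sets             */

static void confidence_update(double c[P][Q],        /* rule confidences c_ij      */
                              const double a[P],     /* fuzzy input memberships    */
                              const double c_hat[Q], /* fuzzified desired output   */
                              double delta)          /* learning rate              */
{
    for (int j = 0; j < Q; ++j) {
        double response = 0.0;
        for (int i = 0; i < P; ++i)
            response += a[i] * c[i][j];               /* sum_i a_i(x) c_ij(t-1)    */
        double err = c_hat[j] - response;             /* error in confidence space */
        for (int i = 0; i < P; ++i)
            c[i][j] += delta * err * a[i];            /* Eqn (9)                   */
    }
}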
4. COMPARISON: TIME SERIES PREDICTION

In this section, the three networks which have been described (CMAC, B-splines, discrete fuzzy network) are used to model a noisy data set which was generated from a (known) simulated nonlinear time series. Firstly a description is given of the time series, then the network models are described and the results are presented. Finally a comparison is made between the three networks and the strong similarities which exist are emphasised.
Fig. 5. Time series output surface y(t) vs y(t-1) × y(t-2).

4.1. The nonlinear time series
Consider the two-input, single-output nonlinear time series described by

y(t) = (0.8 - 0.5 exp[-y²(t-1)]) y(t-1) - (0.3 + 0.9 exp[-y²(t-1)]) y(t-2) + 0.1 sin(πy(t-1)) + η(t),    (10)
where the noise η(t) is a Gaussian white sequence with zero mean and variance 0.01. Defining the network input to be x(t) = (y(t-1), y(t-2))ᵀ and the network output to be y(t), a set of two-input, single-output training samples can be collected which represents the noisy dynamics of Eqn (10). The surface of the time series, across the domain [-1.5, 1.5] × [-1.5, 1.5], is shown in Fig. 5, and the input vectors which represent the noise-free, iterated time series, from the initial condition (0.1, 0.1)ᵀ, are displayed in Fig. 6. The limit cycle displayed in Fig. 6 is a global attrac-
tor. It should also be noted that Eqn (10) is linear with respect to y(t-2) but nonlinear with respect to y(t-1). It is possible to use such a priori knowledge in the construction of the networks, if available, as will be demonstrated in the following sections. The training set was constructed by collecting 1000 iterated noisy observations of Eqn (10), started from the initial condition (0, 0)ᵀ. The noise forced the time series away from the unstable equilibrium at the origin. The distribution of the input vectors is shown in Fig. 7; there is a sparse population of training points about the origin and the boundary, whereas a large number of noisy examples lie near the limit cycle. This will test the ability of the networks to generalise to regions about which the data is sparse. This time series was taken from Ref. 5, where Radial Basis Function NNs and high-order polynomials were used to model the data.
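For reference, the following is a minimal sketch of how such a training set can be generated by iterating Eqn (10) from the origin; the Box-Muller noise generator and all names are illustrative, and no attempt is made to reproduce the exact noise sequence used in the paper.

/* Minimal sketch of generating the noisy training data of Section 4.1 by
   iterating Eqn (10) from (0, 0); the Box-Muller noise generator and all
   names are illustrative. */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#define PI 3.14159265358979323846

static double gauss(double std_dev)              /* zero-mean Gaussian sample */
{
    double u1 = (rand() + 1.0) / (RAND_MAX + 2.0);
    double u2 = (rand() + 1.0) / (RAND_MAX + 2.0);
    return std_dev * sqrt(-2.0 * log(u1)) * cos(2.0 * PI * u2);
}

int main(void)
{
    double y1 = 0.0, y2 = 0.0;                   /* y(t-1), y(t-2) */
    for (int t = 0; t < 1000; ++t) {
        double y = (0.8 - 0.5 * exp(-y1 * y1)) * y1
                 - (0.3 + 0.9 * exp(-y1 * y1)) * y2
                 + 0.1 * sin(PI * y1)
                 + gauss(0.1);                   /* variance 0.01  */
        printf("%f %f %f\n", y1, y2, y);         /* input pair and desired output */
        y2 = y1;
        y1 = y;
    }
    return 0;
}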
Fig. 6. Iterated time series mapping from the initial condition (0.1, 0.1)ᵀ.

Fig. 7. Training data scatter plot, 1000 noisy examples.
4.2. Network models
Contrary to much that is written about NNs and FNs, the majority of the algorithms are model-based, adapting a set of parameters (often by nonlinear optimisation) to achieve a desired behaviour. However, because of the underlying flexibility of the model, they can be termed weak or soft modelling schemes. The basic structure influences the final behaviour of the network, and so the parameters used in each of the three networks are now described. The CMAC network was designed with 17 uniform intervals lying in [-1.5, 1.5] on each of the input variables, and ρ was set to 17. This large value (relative to the number of intervals on each axis) was chosen because the underlying desired function is smooth and because choosing a large ρ increases the ability of the network to filter the noise in the training data. The lattice offset was chosen to be (1, 3),13 which produced a model having 65 weights. A stochastic NLMS learning rule was used to train the weights. The univariate basis function shapes were chosen to be hat functions (similar to dilated B-splines of order 2), which were multiplied together to form multivariate basis functions. The output was then normalised with respect to the sum of the multivariate basis function outputs. These techniques (the generalised lattice displacement, NLMS learning rule, univariate hat functions, product combination and output normalisation) provide a smooth CMAC output and consistently good results. The effects of choosing different values of ρ on the learning speed and performance of the CMAC network are discussed in Ref. 1, and supported with experimental results. A theoretical treatment of the modelling capabilities of the CMAC with respect to ρ is given in Ref. 14. It was decided to show how a priori knowledge can be incorporated into the B-spline network, so 2 B-splines of order 2 were defined on the variable y(t-2) and 8 B-splines of order 3 (defined on 6 uniform intervals of width 0.5) were defined on y(t-1). This made it possible to model the dependency on y(t-2) exactly and to provide a model which was continuous and had a continuous derivative with respect to y(t-1). The basic model therefore has 16 weights and a generalisation parameter of 6, and a stochastic NLMS learning rule was used to train the weights. The FN used a discrete implementation with a fixed discretisation interval of 0.25. To produce sensible network outputs at the extremes of the required range, [-1.5, 1.5], one extra point on each side was required, and hence the actual range used on each variable was [-1.75, 1.75]. An FN with a coarser discretisation was initially implemented but this produced a poor error autocorrelation plot. All fuzzy variables were represented by triangular fuzzy sets, centred on the measured value, with a width of 2 times the discretisation interval. This meant that there were 13 univariate fuzzy
sets defined on each axis and a total of 169 multivariate fuzzy sets. With a 2-dimensional input space this led to an equivalent generalisation parameter of 4. The learning rate was 0.05. These different network structures reflect the extent of a priori knowledge that is available. If it is known that the output is linear with respect to one or more variables, then it is possible to incorporate this knowledge into a B-spline network, and also into the FN under certain conditions. It is difficult to incorporate this kind of knowledge, about how a model generalises with respect to individual variables, directly into the CMAC, which sparsely codes the input space.

4.3. Results
The networks are evaluated by plotting the output surface in order to see how well they have learnt the solution surface. Another test is to iterate the networks from the initial condition (0.1, 0.1)ᵀ in order to see how well the networks have filtered the noise and approximated the underlying dynamics of the time series based only on noisy input/output data. A further test is to plot y(t) against t, which indicates how accurately (in magnitude) the networks have learnt to approximate the (noiseless) time series, and also gives an idea of the prediction horizon of the network. Other tests include plotting the Root Mean Square (RMS) error as the network learns the training data, and evaluating the network against noiseless test data. Finally, a prediction error autocorrelation test is performed in order to assess how well the network has learnt the underlying data. For some of these tests, the result of only one of the networks is given, due to lack of space, although the results are similar for the other networks. Figure 8 shows the learning history of the FN when evaluated against the noisy training data and the noise-free test data. For both tests there is a large reduction in the initial error after only one learning cycle, and further passes through the training data allow the network to filter out the additive noise. Similar plots were obtained for the B-spline and CMAC networks, demonstrating the very fast initial convergence rate that occurs in AMNs. Figure 9 shows the prediction error autocorrelation function for the CMAC network and the 95% confidence band. In general the autocorrelation function lies within this band, which indicates that the model is adequate. Again, similar results were obtained for the fuzzy and B-spline networks. Figure 10 shows the evolution of the B-spline model and the true time series from the initial condition (1.0, 0.5)ᵀ, which lies close to the stable limit cycle. After 80-90 iterations, the model and the time series are beginning to go out of phase and the errors in the predicted values are starting to diverge significantly. Hence this quantity may be regarded as the prediction horizon for these networks. In order to highlight the differences between the networks, the output surface and the limit cycle figures
were plotted for all three networks. The outputs of the CMAC, B-spline and fuzzy networks are shown in Figs 11, 12 and 13 respectively. The basic model structure is evident from looking at the output surfaces of these networks. The B-spline model output is clearly linear with respect to one of the variables, and has a continuous output and derivative with respect to the other variable. The fuzzy network output surface is continuous (apart from regions where there was no training data) and nearly piecewise linear within each interval, thus reflecting the shape of the fuzzy input sets. The output of the CMAC network is also very smooth and the piecewise linear nature of the basis functions is reflected in the output of the network. However, the large generalisation parameter has resulted in poor extrapolation ability; the approximation is bad where there exists little training data and learning is one-sided. The ability of the networks to approximate the underlying dynamics of the noiseless time series was compared by starting each of the networks from the initial condition (0.1, 0.1)ᵀ, letting them iterate autonomously and plotting the resulting limit cycle generated from the input data. Figures 14, 15 and 16 show the limit cycles generated for the CMAC, B-spline and fuzzy networks respectively. The (normalised) piecewise linear nature of the CMAC's basis functions is reflected in the shape of the limit cycle, where it is easy to see the derivative discontinuities. The overall shape of the limit cycle is a fairly good match and the five-armed interior spiral (Fig. 6) is a reasonable approximation, due to the large generalisation parameter. The limit cycle formed by the fuzzy network is also a good match, and again it has derivative discontinuities which are due to the piecewise linear fuzzy sets. This time, however, the basis functions had a smaller width, which is reflected in the poorer ability of the network to construct the interior five-armed spiral. The network was unable to extract enough information from the sparse, noisy data close to the origin. The B-spline limit cycle,
however, produces a very good approximation to the limit cycle and to the five-armed spiral. This is due to the incorporation of a priori knowledge in the network, which enables the network to correctly generalise and locally extrapolate to nearby states where the training data is sparse or non-existent. It is worth noting that a B-spline model constructed with no a priori knowledge performed similarly to the other two networks.12

Fig. 8. Fuzzy network learning history: training data (thin line), test data (thick line).

Fig. 9. Residual autocorrelation plot for the CMAC after 20 training cycles.

Fig. 10. Time history from (1.0, 0.5)ᵀ: B-spline prediction (thin line), desired (thick line).

Fig. 11. CMAC output surface.

Fig. 12. B-spline output surface.

Fig. 13. Fuzzy output surface.

Fig. 14. Iterated dynamics of the CMAC, initial condition (0.1, 0.1)ᵀ.

Fig. 15. Iterated dynamics of the B-spline network, initial condition (0.1, 0.1)ᵀ.

Fig. 16. Iterated dynamics of the fuzzy network, initial condition (0.1, 0.1)ᵀ.

5. SOFTWARE IMPLEMENTATION AND GRAPHICAL USER INTERFACE
The neural network research literature has traditionally concentrated on a specific global network, the multi-layered perceptron (MLP), as is evident from the widely available commercial22 and academic23 software packages that simulate the MLP. This paper, on the other hand, places a strong emphasis on local networks, as they exhibit not only real-time learning abilities but also well-conditioned convergence properties. To be specific, the networks of interest in this paper are the CMAC network, the B-spline network, and a certain class of fuzzy logic network. The selection of these networks is based on their similar network structures and learning strategies. While more attention has recently been focused on these networks,17,24,25 their software implementation has not been documented in the literature. Apart from a few commercial software packages which are written explicitly for fuzzy logic networks (e.g. CubiCalc), none is written explicitly for either the CMAC network or the B-spline network. Although these networks can be simulated using Ref. 26 if they are arranged in a multi-layered structure, the weight adjustment procedure becomes excessively inefficient. This justifies the need for a well-balanced data structure which can preserve the real-time learning abilities. In Section 2 a discrete and a continuous implementation technique were described for fuzzy logic networks.
A discrete implementation is used here since it is more suitable for adaptive modelling, and the continuous case can also be approximated if required. A standard approach to this implementation is to create the relational matrix as a block of data and to use a single process to step through every element, calculating the effect of each element on the output. Although this sounds computationally intensive, in reality only a small number of elements (equivalent to the generalisation parameter ρ) have an effect on the output, and any element which has no effect can be quickly detected and ignored. An alternative technique is to partition the relational matrix into ρ segments such that only one element is active in each segment at any time. This implementation technique is very similar to that used for CMAC or B-spline networks, with each segment of the relational matrix equivalent to a local layer, and the following software description then becomes equally
relevant to fuzzy logic networks. The organisation of the second half of this paper is as follows. In the next section, a common data structure is proposed for these networks, and its advantages and disadvantages are discussed. Section 7 presents a set of high-level user-interface procedures which can be used to initialise the network, train the network, and evaluate the network. These procedures decouple the internal network operation from the user so that the user's programming confidence, and in turn the ability to evaluate the network, can be significantly strengthened. Section 8 presents a graphical user interface (GUI) for these networks. Its availability completely eliminates any user programming, provides extensive network evaluation facilities (such as the learning curve, the error autocorrelation, on-line design/interrogation, and the network output plot), and sup-
ports certain off-line modelling applications. Section 9 summarises the effectiveness of the proposed software library and the GUI.

6. AMN DATA STRUCTURE

Figure 17 shows the proposed data structure of these AMNs. The top level of the data structure is divided into a global parameter structure, ρ independent sets of local-layer parameter structures (ρ is the number of layers, defined as one of the global parameters in the network),26 and finally the I/O buffer structure. This arrangement is based on the fact that all the layers in these networks are identical in their structures and learning strategies, and each of these layers is defined by the same global parameters. The I/O buffer structure is allocated to handle the user
interface so that the entire network structure is hidden from the user. In the bottom level of the data structure, the global parameter structure is divided into a learning substructure and a network substructure because these substructures are independent of each other. The local-layer parameter structure is divided into a fixed substructure and a varying substructure. This emphasises which parameters are set when the network is initialised and which parameters are changed during learning. The proposed data structure has two desirable features. Firstly, it allows the user to use the same software and compare the performances of these AMNs easily. Secondly, based on the fact that the computations among the layers are mutually independent, the suggested data structure supports parallel
processing for the local-layer structures if each local layer structure and a copy of the global structure are implemented on a node of a SIMD (Single Instruction Multiple Data) parallel machine. This greatly reduces the network computation bottleneck, especially for high-dimensional input spaces. The additional memory requirement for storing the global structure in each processor is acceptable because the size of the global structure is very small (for example, consists of flags) compared to the local-layer structure (for example, consists of the entire weight vector and the random mapping, see section 6.1.2.). However, one disadvantage of this approach is that each univariate basis function will be evaluated several times on different layers although this is compensated for because there is no communication between the individual layers. The
following section describes the features of both the top level and the bottom level of the data structure.

Fig. 17. Network data structure.
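The following is a minimal C sketch of this top-level decomposition, given to make the arrangement concrete; every type and field name here is illustrative and not taken from the library's actual source.

/* Minimal sketch of the AMN data structure described above: a global parameter
   block, rho identical local-layer blocks and an I/O buffer.  All type and
   field names are illustrative, not the library's actual definitions. */
#include <stddef.h>

typedef struct {                  /* global learning substructure               */
    int    rule;                  /* LMS, NLMS, stochastic LMS, stochastic NLMS */
    double delta_init, delta_decay, delta_min;
    double dead_zone;
} learning_params_t;

typedef struct {                  /* global network substructure              */
    int n_inputs, n_outputs;
    int hashing_flag;
    int n_layers;                 /* rho                                      */
    /* knot density/distribution, field shape, field operator, ...           */
} network_params_t;

typedef struct {                  /* one local layer (fixed + varying parts)  */
    int      *displacement;       /* fixed: placement on the lattice          */
    unsigned *hash_vector;        /* fixed: present only if hashing is used   */
    size_t    active_address;     /* varying: current active weight address   */
    double    active_output;      /* varying: current receptive field output  */
    double   *weights;            /* varying: weights (or rule confidences)   */
    unsigned *weight_history;     /* varying: usage count for each weight     */
} local_layer_t;

typedef struct {                  /* I/O buffer seen by the user              */
    double *inputs, *desired, *outputs, *derivatives;
} io_buffer_t;

typedef struct {                  /* top level: the whole AMN                 */
    learning_params_t learning;
    network_params_t  network;
    local_layer_t    *layers;     /* rho independent layer structures         */
    io_buffer_t       io;
} amn_t;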
6.1. Global parameter definition

6.1.1. Global learning parameters

The following three global learning parameters are usually fixed during learning. They are:
Rule selection--A global learning rule, applied to each of the active basis functions, must be selected before learning. Typical rules include LMS, NLMS, stochastic LMS and stochastic NLMS, which together form a rule selection vector.

Learning rate characteristic--Once the rule is chosen, an initial learning rate (δ_init) must be defined. The learning rate can be gradually reduced, by a predefined decay rate (δ_decay), to a minimum value (δ_min). (δ_init, δ_decay, δ_min) forms a learning characteristic vector.

Dead-zone specification--A dead-zone (γ) is used to ensure that adaptation will only take place if the absolute error between the network output and the actual process output is greater than γ.

6.1.2. Global network parameters

The following six network parameters are fixed during learning.
I/O dimensionality--These parameters define the number of network inputs and outputs.

Memory hashing flag--This flag allows the weight indexing to be generated by hashing. Hashing is a memory compression process, and is required so that learning is possible for a high-dimensional input and a small amount of available memory. Hashing can be implemented with collisions6 (where two independent inputs are mapped to the same adaptable parameter, which is undesirable) or without collisions.27 The corresponding trade-off is numerical complexity vs. approximation accuracy. It is important to notice that hashing is useful only if a small subset of the input space is of interest for
function approximation, otherwise a significant amount of collision is likely to occur.

Knot density/distribution--The "density" parameter determines the density of the receptive fields in Rⁿ. The receptive fields are commonly distributed uniformly in the input space, although a nonlinear knot placement facility is provided. The number and position of the knots must be provided by the user. The addition of nonlinear knot placement introduces two extra procedures into the network calculation; the existing procedures remain unaltered. These procedures have the simple effect of normalising the input space so that it can still be treated as a regular lattice. The first procedure finds the active interval on a given axis; the second determines the support of the active receptive field on the given axis. For fuzzy logic networks the knot distribution is equivalent to the distribution of the discrete points used to define the fuzzy sets.

Univariate field shape--For the B-spline network, the field shape is a (dilated) B-spline of order k; however, the original CMAC has a rectangular field shape, which means that the network output is only piecewise constant. Higher-order receptive fields (such as B-splines) can be used to increase the network resolution with respect to the input.15 For fuzzy logic networks this is equivalent to the univariate fuzzy set shape defined on each axis.

Multivariate field operator--Once the individual univariate field shapes are defined, a multivariate basis function (or multivariate fuzzy set) is formed using a multivariate field operator. This operator combines the outputs of the n individual univariate fields into a single number which defines the membership of the multivariate field. The operator can be either the minimum or the product operator. The product operator will generally give a smoother output. However, when n is large and ρ is small, the network precision can be badly conditioned due to numerical truncation. This problem can be avoided by normalising the network output with respect to the sum of the active multivariate field outputs. (minimum, product) forms a selection vector for the multivariate field operator.
Number of layers--For the B-spline network this parameter is determined by the order of the univariate B-splines, while for the CMAC network it is determined by the width of the univariate receptive field. This is a critical parameter associated with both networks, which affects both the generalisation ability and the size of the network. This parameter is not normally encountered in fuzzy systems implementations since the rule confidences or relational matrix elements are encoded as a block of data. To enable the same software to implement all three networks, the relational matrix is partitioned into ρ segments such that only one element is active in each segment at any time. For example, if an input variable is represented by a fuzzy set of width 3 times the discretisation width, then every third element in this axis of the relational matrix can belong to the same local layer and hence three independent layers are required by this axis. If ρ_i is the ratio of set width to discretisation width for the ith input variable then the total number of layers is equal to ρ = Π_{i=1}^{n} ρ_i.

6.2. Local-layer parameters

Each of the ρ layer structures is divided into a fixed type and a varying type.
6.2.1. Fixed type

The fixed parameters consist of n placement coordinate values for each layer in Rⁿ, and a memory hashing vector if the memory hashing flag is set.

Layer placement--Each layer within the AMN is displaced with respect to the others on the n-dimensional lattice. Each layer stores its own displacement vector. For B-spline and fuzzy logic networks, the layers are distributed at every possible position on the lattice, and so the displacement vectors are automatically generated. The CMAC network provides a sparse coding of the lattice and so the displacement of each layer is generated (using modulo arithmetic) from a single n-dimensional displacement vector supplied by the designer. For the standard CMAC this displacement vector is simply a unity vector, although different displacement schemes, which provide a more uniform distribution of receptive fields, can be found in Refs 10 and 28.

Memory hashing--If the hashing flag is set, a vector (g_i, for the ith layer) with indices of random values is allocated, and is used to map any network input onto a weight vector. The number of elements in g_i is the sum of the numbers of univariate receptive fields in the ith layer. Because these vectors are generated independently for different layers, different layers of receptive fields map onto the same weight vector independently.
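One plausible arrangement of such hashed addressing is sketched below; the random vector layout, table size and function names are all assumptions for illustration, not the scheme used in the library.

/* Minimal sketch of hashed weight addressing: one random vector g per layer
   with an entry per univariate receptive field on each axis, combined and
   reduced modulo the physical weight-table size.  The layout, sizes and names
   are illustrative assumptions, not the library's actual scheme. */
#include <stdlib.h>

#define N            2      /* input dimension                  */
#define TABLE_SIZE   512    /* physical weights in this layer   */
#define FIELDS_AXIS  32     /* univariate receptive fields/axis */

typedef struct {
    unsigned g[N][FIELDS_AXIS];   /* random hashing vector for this layer */
    double   w[TABLE_SIZE];       /* hashed weight table                  */
} hashed_layer_t;

static void init_layer(hashed_layer_t *layer)
{
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < FIELDS_AXIS; ++j)
            layer->g[i][j] = (unsigned)rand();
    for (int k = 0; k < TABLE_SIZE; ++k)
        layer->w[k] = 0.0;
}

/* map the active univariate field indices (one per axis) to a weight address */
static unsigned weight_address(const hashed_layer_t *layer, const int field[N])
{
    unsigned h = 0;
    for (int i = 0; i < N; ++i)
        h += layer->g[i][field[i]];   /* combine per-axis random entries  */
    return h % TABLE_SIZE;            /* collisions are possible but rare */
}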
6.2.2. Varying type

The varying parameters consist of the following:

Active weight address--This provides temporary storage for the current active weight address so that this address only needs to be computed once. If the knots are not distributed uniformly in Rⁿ, a search of the active receptive fields is required in order to calculate the weight indices.
Active receptive field output/derivative--This provides temporary storage for the current active receptive field outputs and their derivatives.

Weight storage--A weight vector of size ρ_i (for the ith layer) times the number of network outputs is allocated to store the mapping. For the fuzzy logic network, a relational matrix is instead allocated to store the confidence values of all the rules associated between inputs and outputs in the ith layer.

Weight history--A vector of size ρ_i records the frequency with which each weight is used during training. This is used in both stochastic learning and the "Display Memory Usage" procedure (Section 7). For the fuzzy logic network, this vector records the frequency with which each rule vector is applied during training.

6.3. I/O buffer

These input/output buffers provide temporary storage for the network inputs, the training outputs, the network outputs, and also the network derivatives, which the user sets and interrogates using the high-level procedures.

7. AMN CODE STRUCTURE

This section presents a set of high-level user procedures based on the suggested data structure.

Initialise network--This procedure first reads in the global/local network parameters from a network specification file (an example for the CMAC network is shown in Table 1), and then allocates the data structure. At the end of the procedure, a network ID is returned to the user. A network ID is required when using these procedures for multiple networks.

Table 1. CMAC network specification file

    Number_of_inputs: 2
    Number_of_outputs: 1
    Number_of_layers: 5
    Number_of_interior_knots:
    First_knot_position: 0.0
    Last_knot_position: 1.0
    Hash_flag: no
    Displacement_vector: 1 2
    Output_scaling: yes
    Multivariate_field_operator: product
    Dead_zone: 0.0
    Update_rule: nlms delta_init 0.5 decay_rate 50.0 delta_min 0.0
    Initial_weight_zeroing: yes
Return global learning/network parameters--These procedures return the global learning parameters' values (such as the learning rate, learning rule and dead-zone specification) and the network parameters' values (such as the I/O dimensions and the knot density/distribution) respectively.

Return ith weight/weight history--These procedures return the weight values and the weights' history for the ith layer respectively.

Return memory usage--This procedure examines the "weight history" vector associated with each layer, computes the number of weights that have been used in training, and returns this number to the user. It indicates the network's efficiency in terms of the number of weights used relative to the number of weights available.

Set input/desired output--These procedures take the user's input vector and the desired output vector respectively, and store them in the I/O buffer.

Calculate output--After the "Set Input" procedure is called, this procedure internally generates all the active weight addresses and active receptive field outputs, which are used to form a network output. The network output is then stored in the I/O buffer.

Calculate derivative--After the "Set Input" procedure is called, this procedure generates all the active weight addresses and active receptive field outputs, which are used to form a network derivative. The network derivative is then stored in the I/O buffer.

Get output--After the "Calculate Output" procedure is called, this procedure returns the network output vector to the address given by the user.

Get derivative--After the "Calculate Derivative" procedure is called, this procedure returns the network derivative vector to the address given by the user.

Adjust--After the "Set Input", "Set Desired Output" and "Calculate Output" procedures are called, this procedure adjusts the active weights according to the selected learning rule.

Learn--After the "Set Input" and "Set Desired Output" procedures are called, this procedure calls "Calculate Output" followed by the "Adjust" procedure.

Dump network--This procedure dumps both the global and the local parameters' values, including the entire set of weight values, to a network file for later use.

Deallocate network--This procedure deallocates the entire data structure associated with the network ID.

For clarity, pseudo code is given in Table 2 to illustrate how these procedures are combined. In the example the network is first initialised from the network specification file supplied by the user, and a network ID is returned. The network is then trained on N samples, each presented by calling the "Set Input", "Set Desired Output" and "Learn" procedures. After training, the code interrogates the network by calling the "Set Input", "Get Output" and/or "Get Derivative" procedures. The trained network profile is then dumped to a file. Finally, the network is deallocated. A summary of these procedures is given in Table 3. With these procedures the programming effort is greatly simplified. To reduce the user's responsibility further, Section 8 presents a GUI with which the networks can be designed, trained and analysed without any programming.

Table 2. Pseudo code

User procedure starts here:
  Network ID = Initialise Network (network file)
  DO LOOP for i = 0 to N training samples {
      user generates desired (input, output) pair
      Set Input (Network ID, input)
      Set Desired Output (Network ID, output)
      Learn (Network ID)
  }
  Set Input (Network ID, input)
  Calculate Output (Network ID, input)
  Get Output (Network ID, output)
  Calculate Derivative (Network ID, input)
  Get Derivative (Network ID, derivative)
  Dump Network (Network ID, network file)
  Deallocate Network (Network ID)
User procedure ends here.
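As an indication of how this flow might look from C, the following sketch mirrors the pseudo code of Table 2 with illustrative function prototypes; every declaration below is an assumption made for the example, not the library's actual interface.

/* Hypothetical prototypes mirroring the high-level procedures; the real
 * library's argument lists may differ. */
int  InitialiseNetwork(const char *spec_file);   /* returns a network ID      */
void SetInput(int id, const double *x);
void SetDesiredOutput(int id, const double *y);
void Learn(int id);                              /* Calculate Output + Adjust */
void CalculateOutput(int id);
void GetOutput(int id, double *y_hat);
void DumpNetwork(int id, const char *net_file);
void DeallocateNetwork(int id);

void train_and_query(const double x[][2], const double y[][1], int N)
{
    int id = InitialiseNetwork("cmac.net");

    for (int i = 0; i < N; i++) {        /* training phase */
        SetInput(id, x[i]);
        SetDesiredOutput(id, y[i]);
        Learn(id);
    }

    double y_hat[1];
    SetInput(id, x[0]);                  /* interrogate the trained model */
    CalculateOutput(id);
    GetOutput(id, y_hat);

    DumpNetwork(id, "cmac_trained.net");
    DeallocateNetwork(id);
}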
8. GRAPHICAL USER INTERFACE
The main menu of the GUI is shown in Fig. 18. In the top portion of this menu, the user can select a network specification file; if no network file is specified, a default file is loaded. A "Dump File" button is available which saves the modified network parameters to a specified file. Besides the loading and saving functions, the GUI consists of three sub-menus. The "Display Network Parameters" sub-menu initialises the network, either using a network specification file supplied by the user or using the on-line editing/design utilities; the details are covered in Section 8.1. The "Train the Network" sub-menu trains and tests the network based on a given set of training and testing data, and is covered in Section 8.2. The "Display Weight Edit Panel" sub-menu allows the user to edit and analyse the weights/rule confidences, and is covered in Section 8.3. Within the GUI, error-checking procedures examine all the values entered by the user.

8.1. Design/edit network
Before training can take place, the network structure and learning rule must be specified. Designing a network is an iterative process; changes are made in order to improve the network's capability to learn the desired function. These changes will be based on the training results and any insight which can be obtained from analysis of the trained network. Hence it is
necessary to be able to return to the design stage at any time. The tool described here has a pop-up Design/Edit window, which enables the current network design parameters to be viewed or edited. In the absence of current network parameters a default network is loaded so that the other tool functions remain defined. When invoked, the Design/Edit window is in viewing mode, i.e. editing is disabled, so that the window may be used purely for network parameter display; an "enable/disable edit" switch allows parameter editing and network design. The Design/Edit window has a similar structure for all the networks. It is designed on the principle that each design parameter should depend only on those preceding it, so that each design parameter has the correct field format when the parameters are entered in order, i.e. from top to bottom. With reference to the Design/Edit window pictured in Fig. 19 for the B-spline network, the first two parameters are the input and output dimensions. When the user enters the input dimension, the correct number of spline order and knot position fields appears; similarly, the number of dead-zone fields corresponds to the output dimension. The number and format of the knot position fields depend upon the input dimension, the spline orders and the number of interior knots for each axis. For all the networks, the learning parameter fields depend on the selected learning rule. A facility is also provided for automatic generation of uniformly spaced knots, which may then be edited in the window. The automatic knot generation depends on the number of knots specified and their ranges for each input axis; the ranges are determined by entering the first and last knot values for each axis. Clicking on the automatic knot generation button calculates values for the intervening
knots, corresponding to uniform placement, and displays them in their respective fields. The "Save Network" button can either save the designed/edited network parameters onto a new network file, or overwrite the same network file (this saving utility is also available in the main menu). At the end of this submenu, a "finish viewing/editing" button will alert the user if the edited network parameters have not been saved.
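A sketch of what this automatic generation might compute for one axis is given below; this is our own illustration, as the tool's actual implementation is not reproduced in the paper.

/* Illustrative sketch only: place n_interior knots uniformly (strictly)
 * between the first and last knot values entered for one input axis. */
void uniform_knots(double first_knot, double last_knot,
                   int n_interior, double *interior_knots)
{
    double step = (last_knot - first_knot) / (n_interior + 1);
    for (int j = 1; j <= n_interior; j++)
        interior_knots[j - 1] = first_knot + j * step;
}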
8.2. Network training

Once the network has been designed/edited, the network weights can be adapted, during the training phase, to approximate the desired input/output mapping based on a training data set. In order to monitor the training performance some testing is required. The ability to create intermediate data files during training allows communication with other sections of the tool, and possibly with other tools, for analysis and comparison of weight adaptation methods. Training data sets may be either obtained from existing data files or generated/saved within the GUI. These features are accessed via three subwindows.
8.2.1. Training from data file

A subwindow is used to input the training data set and the training strategy. The file name for the data set must be specified and a particular strategy must be selected from a menu of available strategies, such as cyclic or random sampling of the data points. Other information, such as the number of training cycles, must also be supplied. A button on this subwindow then allows training to be started. Switches on this subwindow determine whether a training data error file is produced and whether an MSE history is plotted during the training process for the user to monitor.
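As an illustration of the two sampling strategies mentioned (cyclic or random presentation of the training set), a minimal sketch is given below; train_on_sample() stands in for one Set Input / Set Desired Output / Learn call and is purely hypothetical.

#include <stdlib.h>

/* Illustrative sketch only: present the training set for n_cycles passes,
 * either cyclically or by random sampling. */
void run_training(int n_samples, int n_cycles, int random_order,
                  void (*train_on_sample)(int index))
{
    for (int c = 0; c < n_cycles; c++)
        for (int i = 0; i < n_samples; i++) {
            int k = random_order ? rand() % n_samples : i;
            train_on_sample(k);
        }
}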
Table 3. A summary of high level procedures

User procedure                       Internal operation

Initialise network                   reads the parameters from a network file;
                                     allocates memory for the data structure;
                                     returns a network ID to the user
Return global learning parameters    returns the global learning parameters' values
Return global network parameters     returns the global network parameters' values
Return ith weight                    returns the weight values in the ith layer
Return ith weight history            returns the weight history values in the ith layer
Set input                            stores the user's input in the I/O buffer
Set desired output                   stores the user's output
Get output                           returns the network output
Get derivative                       returns the network derivative
Calculate output                     generates active weight addresses and active
                                     receptive field outputs;
                                     generates/stores the network output
Calculate derivative                 generates active weight addresses and active
                                     receptive field outputs;
                                     generates/stores the network derivative
Adjust                               adjusts the active weights
Learn                                calls "Calculate Output" and "Adjust" procedures
Dump network                         dumps the global/local parameters to a file
Deallocate network                   deallocates the structure
Fig. 18. Main panel.
8.2.2. Training data generation

Training data generation allows the user to construct a simple process and generate an input/output data set for network training (Fig. 20). The data generation subwindow enables the user to edit a set of process parameters, such as the damping ratio and natural frequency of a simple second-order linear process. A measurement device is also simulated by allowing the user to specify additive measurement noise. A switch then selects one of two possible simulation methods, open- or closed-loop. For open-loop stable systems, a series of step inputs is supplied by the GUI. For systems which are not open-loop stable, a simple set of control gains (proportional, integral and derivative) can be supplied by the user to enable the simulation to generate useful input/output data. In the closed-loop case, the quality of control achieved is of little importance, since this is simply a means of generating process data. Other information which must be supplied to this subwindow includes the total simulation time, the measurement time step and the data file name. A button then starts the simulation and produces the training data set. This facility is useful for testing and comparing the performance of the networks; for example, the performance of a particular network or adaptation strategy can be tested in the presence of changing process parameters or measurement noise.
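A rough sketch of such a data generator is given below, assuming a second-order linear process discretised by forward-Euler integration with additive, uniformly distributed measurement noise; the parameter names and the discretisation are our own choices, not the tool's implementation.

#include <stdlib.h>

/* Illustrative sketch only: one simulation step of the second-order process
 *   x'' + 2*zeta*wn*x' + wn^2*x = wn^2*u
 * x1 is the process output, x2 its derivative, dt the measurement time step. */
static double uniform_noise(double amplitude)
{
    return amplitude * (2.0 * rand() / (double)RAND_MAX - 1.0);
}

void simulate_step(double *x1, double *x2, double u,
                   double wn, double zeta, double dt,
                   double noise_amp, double *y_measured)
{
    double dx1 = *x2;
    double dx2 = -2.0 * zeta * wn * (*x2) - wn * wn * (*x1) + wn * wn * u;
    *x1 += dt * dx1;                                /* forward-Euler step    */
    *x2 += dt * dx2;
    *y_measured = *x1 + uniform_noise(noise_amp);   /* simulated measurement */
}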
8.2.3. Testing

Testing is required in order to monitor network adaptation. A file name is specified by the user for the test data set and the network output predictions. Mean squared error, absolute error or chi-squared analysis can be requested from this subwindow. A prediction error autocorrelation plot can also be requested, to assess how well the network has learnt the training data set and whether training should continue. The results of any testing at this stage can be dumped to a file for use during subsequent analysis.
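Two of the test statistics mentioned above are sketched below (our own illustration): the mean squared prediction error and a normalised autocorrelation of the prediction errors, which should be close to zero at all non-zero lags if the residuals are white.

/* Illustrative sketch only: e[t] are the prediction errors on the test set. */
double mean_squared_error(const double *e, int N)
{
    double s = 0.0;
    for (int t = 0; t < N; t++)
        s += e[t] * e[t];
    return s / N;
}

double error_autocorrelation(const double *e, int N, int lag)
{
    double num = 0.0, den = 0.0;
    for (int t = 0; t < N; t++)
        den += e[t] * e[t];
    for (int t = lag; t < N; t++)
        num += e[t] * e[t - lag];
    return num / den;   /* 1 at lag 0; near 0 for lag > 0 if e is white */
}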
8.3. Analyse and edit weights/rules

One of the major advantages of using AMNs for learning applications is the transparency of the network design. In this context, a network is transparent if the relationship between the network's output and the set of adjustable parameters is easy to understand and analyse. This is one of the major differences between AMNs and other neural networks (such as MLPs), where the relationship between the network's output and the set of weights is poorly understood; such networks are termed opaque. This property is important because it allows designers to initialise and edit a set of rules; indeed, in the fuzzy logic field the number of static fuzzy rulebases far outweighs the number of adaptive fuzzy networks. Therefore it is necessary to have design modules within the GUI which provide a convenient interface for initialising and editing the individual rules. Figure 21 gives an example of such a tool, designed for the B-spline network. The information requested and displayed is presented in a variety of formats according to which type of network is currently loaded into the tool; while this section concentrates on the B-spline network display, the differences between the other displays will be mentioned.
Fig. 19. Design/edit window.
Fig. 20. Network training window.
Firstly at the top of the window a variety of static network parameters are displayed. This is simply a reminder for the designer. Then a set of fields are presented, one for each input axis. These fields remind the designer of the univariate spline order, the number of univariate rules and the support of the univariate rule set. A designer then selects a multivariate rule by specifying the univariate addresses (integers lying between 1 and the number of univariate rules) and the support of the selected rule is echoed to the screen.
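As an illustration of this addressing scheme (our own sketch; the library's internal indexing is not given in the paper), the selected univariate addresses can be combined into a single index into the multivariate rule set in the usual lattice fashion:

/* Illustrative sketch only: addr[k] is the selected univariate rule on axis k
 * (1-based, as entered in the window), n_rules[k] is the number of univariate
 * rules on axis k, and n is the input dimension. */
long multivariate_rule_index(const int *addr, const int *n_rules, int n)
{
    long index = 0;
    for (int k = 0; k < n; k++)
        index = index * n_rules[k] + (addr[k] - 1);   /* row-major lattice index */
    return index;   /* 0-based index into the multivariate rule set */
}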
Fig. 21. Rule design/edit window.
This scheme is analogous to choosing a multivariate fuzzy input set by specifying the individual univariate fuzzy linguistic variables. The CMAC addressing scheme is different because of the sparse distribution of the basis functions: it works by entering a particular network input and then selecting a particular overlay, and the basis function which is non-zero on that overlay is then selected. Associated with each basis function is a weight vector (of length equal to the dimension of the network output)
and a counter which remembers how many times this weight vector has been updated. The weight values can be inspected and edited by the designer, and a non-zero value for the counter must then be set. For the fuzzy networks a relational matrix (of dimension equal to the number of outputs × the number of fuzzy sets defined on each output) is displayed instead of the weight vector. These facilities allow the designer to address individual rules and modify the information stored. The number of uninitialised rules is displayed in the window, to give the designer a measure of the completeness of the rule base; a button is also provided which will automatically search for, and display, the next uninitialised rule.

A set of facilities is provided which allows the user to investigate the output of the network. Firstly, for a particular input to the network, the output of the network and its partial derivative with respect to each input are displayed. This information allows the user to determine numerically the sensitivity of the network with respect to individual rules. However, displaying information numerically is not always the best way, and so a "Display Graph" button is made available for plotting the local or global network surface in 2 or 3 dimensions (Fig. 22). The plotting options shown are self-explanatory.

Fig. 22. Display graph window.

9. CONCLUSION

This paper has described in a unified mathematical framework a class of associative memory neural networks (AMN), together with an associated software library and its Sunview-based GUI for practical process modelling and control. Whilst the AMN's initial input nonlinear mapping has to be specified, it results in very
fast learning and guaranteed convergence to the minimum MSE. Such networks readily allow the incorporation of a priori process knowledge, enabling the network to generalise and locally extrapolate correctly. The AMNs considered have similar structures, which makes it possible to represent them using a common data type. The software development has broadly followed object-oriented design principles: the data structure is highly modular, which allows the network to be easily extended, and a set of interface procedures has been designed. One advantage of using AMNs is that their network computations are inherently parallel, which is reflected in the proposed data structure; the parallel implementation allows the memory requirements and the computational cost of the network to be distributed evenly across each layer of the data structure. Access to the data structure is controlled by the set of high-level procedures. This set of procedures covers all of the common neural network tasks, and so minimises the user's programming workload. Besides the software library, the proposed GUI provides a convenient front-end for these procedures, and has three main functions: it allows the user to design/edit any of these AMNs by various means, trains/tests the network according to the strategy specified by the user, and finally allows the user to analyse or edit the weights/rules. This software library and the GUI are expected to provide a complete set of AMN tools for neural network researchers, and may readily be incorporated into systems such as Matlab.

Acknowledgements--The authors would like to thank
Lucas Aerospace, the CEC (ESPRIT II, no. 2483), the PROMETHEUS project, SERC and DRA (Farnborough) for the financial support of this work.
REFERENCES

1. Tolle H. and Ersü E. Neurocontrol: learning control systems inspired by neuronal architectures and human problem solving. Lecture Notes in Control and Information Sciences 172. Springer, New York (1992).
2. Harris C. J., Brown M. and Moore C. G. Intelligent Control: Some Aspects of Fuzzy Logic and Neural Networks. World Scientific Press, London (1992).
3. Lawrence A. J. and Harris C. J. A label driven CMAC intelligent control strategy. IMC Colloq. Application of Neural Networks to Modelling and Control, Liverpool (1992). (Revised version is available from the authors.)
4. Cox M. G. Algorithms for spline curves and surfaces. NPL Report DITC 166/90 (1990).
5. Chen S. and Billings S. A. Neural networks for nonlinear dynamic system modelling and identification. Int. J. Control 56, 319-346 (1992).
6. Miller W. T., Glanz F. H. and Kraft L. G. Application of a general learning algorithm to the control of robotic manipulator. J. Robot. Res. 6, 84-98 (1987).
7. Miller W. T. Real-time application of neural networks for sensor-based control of robots with vision. IEEE Trans. Systems, Man Cybernet. 19, 825-831 (1989).
8. Miller W. T. Real-time control of a biped walking robot. World Congress on Neural Networks, Portland, Oregon, Vol. 3, pp. 153-156 (1993).
9. An P. C. E., Harris C. J., Tribe R. and Clarke N. Aspects of neural networks in intelligent collision avoidance systems for Prometheus. Joint Framework for Information Technology, pp. 129-135, University of Keele, U.K. (1993).
10. Arain M., Tribe R., An P. C. E. and Harris C. J. Action planning for the collision avoidance system using neural networks. Intelligent Vehicle Symposium, Tokyo, Japan (1993).
11. Kavli T. Learning principles in dynamic control. Thesis, Institute for Informatics, University of Oslo (1992).
12. Brown M. and Harris C. J. The B-spline neurocontroller. In Parallel Processing in Real-Time Control, Rogers E. and Li Y. (Eds), Chap. 6. Prentice Hall, Englewood Cliffs, NJ (1992).
13. An P. C. E., Miller W. T. and Parks P. C. Design improvements in associative memories for cerebellar model articulation controllers (CMAC). Proc. Int. Conf. on Artificial Neural Networks, Helsinki, North Holland, Vol. 2, pp. 1207-1210 (1991).
14. Brown M., Harris C. J. and Parks P. C. The interpolation capabilities of the binary CMAC. Neural Networks 6, 429-440 (1993).
15. Lane S. H., Handelman D. A. and Gelfand J. J. Theory and development of higher order CMAC neural networks. IEEE Control Systems Mag., April, pp. 23-30 (1992).
16. Kavli T. ASMOD--an algorithm for adaptive spline modelling of observation data. Technical Report, Centre for Industrial Research, Box 124 Blindern, 0314 Oslo 3, Norway (1992).
17. Brown M. and Harris C. J. A nonlinear adaptive controller: a comparison between fuzzy logic control and neurocontrol. IMA J. Math. Control Inform. 8(3), 239-265 (1991). (Revised version is available from the authors.)
18. Moore C. G. and Harris C. J. Indirect adaptive fuzzy control. Int. J. Control 56, 441-469 (1992).
19. Widrow B. and Lehr M. A. 30 years of adaptive neural networks: perceptron, madaline and backpropagation. Proc. IEEE 78(9), 1415-1441 (1990).
20. Brown M. and Harris C. J. Aspects of learning in associative memory networks. PANORAMA Technical Report (1992).
21. Parks P. C. and Militzer J. Convergence properties of associative memory storage for learning control systems. Automatica Remote Control 50, 254-286 (1989).
22. NeuralWorks Professional II/Plus and NeuralWorks Explorer: Reference Guide. NeuralWare, Inc.
23. Ahalt S. C., Chen P., Chou C. T., Kuttuva S. and Little T. E. The neural shell: a neural network simulation tool. Engng Applic. Artif. Intell. 5, 183-192 (1992).
24. Kosko B. Neural Networks and Fuzzy Systems. Prentice Hall, Englewood Cliffs, NJ (1992).
25. Miller W. T., Glanz F. H. and Kraft L. G. CMAC: an associative neural network alternative to backpropagation. Proc. IEEE 78, 1561-1567 (1990).
26. Nielson R. H. Neurocomputing. Addison Wesley, Reading, MA (1989).
27. Tolle H. and Ersü E. Lecture Notes in Control and Information Sciences (Thoma M. and Wyner A., Eds). Springer, New York (1992).
28. Parks P. C. and Militzer J. Improved allocation of weights for associative memory storage in learning control systems. Proc. 1st IFAC Symp. on Design Methods of Control Systems, Zürich, Vol. 2, pp. 777-782. Pergamon Press, Oxford (1991).
AUTHORS' BIOGRAPHIES

Chris Harris received degrees from the Universities of Leicester, Southampton and Oxford, and has held previous appointments at Hull, Manchester and Oxford Universities, as well as at the Royal Military College of Science (Cranfield). Currently he is the Lucas Professor of Aerospace Systems Engineering, and directs the Advanced Systems Research Group. His research interests are in the theory and application of intelligent controls, multi-sensor data fusion, and intelligent systems architectures and command systems.

Alan J. Lawrence was born in London in 1968. He received his B.Sc. degree from Southampton University, England, in 1990. He is currently writing his Ph.D. thesis in the subject of control systems engineering. His research interests are the automatic tuning of low-order controllers.

Pak-Cheung Edgar An was born in Hong Kong in 1963. He received his B.S. degree from the University of Mississippi in 1985, and M.S. and Ph.D. degrees from the University of New Hampshire in 1988 and 1991, all in Electrical Engineering. He is currently a post-doctoral fellow at the Advanced Systems Research Group in the Department of Aeronautics and Astronautics at Southampton University, working on the pan-European Prometheus Project to provide intelligent vehicle manoeuvring for collision avoidance.

Martin Brown was born in England in 1967. He received a B.Sc. from the University of Bath in England in 1989, and completed a Ph.D. at the University of Southampton in 1993. He is currently a Senior Research Fellow at the Advanced Systems Research Group in the Department of Aeronautics and Astronautics at Southampton University. His research interests include learning algorithms for modelling and control, the relationship between neural and fuzzy systems and the modelling capabilities of various neural algorithms.

Chris Moore was born in England in 1965. He received a B.Sc. in 1988 and a Ph.D. in 1992 from the Department of Aeronautics and Astronautics, University of Southampton. His current work involves the application of intelligent control techniques to the estimation and guidance levels of six-degree-of-freedom autonomous vehicles.