Neurocomputing 14 (1997) 139-156

Associating arbitrary-order energy functions to an artificial neural network: implications concerning the resolution of optimization problems

G. Joya *, M.A. Atencia, F. Sandoval

Departamento de Tecnología Electrónica, Universidad de Málaga, Campus Teatinos, 29071 Málaga, Spain

* Corresponding author. Email: joya@tecmal.ctima.uma.es.

Received 6 February 1995; accepted 31 January 1996
Abstract

We have studied the restrictions that a first order asynchronous feedback neural network must fulfill to be associated to an arbitrary order energy function of the kind described by Kobuchi [6], i.e., one such that the network evolution is related to the descent to a minimum of that function. These restrictions do not prevent the association of even order energy functions to a first order network. However, for odd order energy functions, most of the weights of each neuron must be zero. This result rules out the use of first order neural networks for the solution of optimization problems associated with an odd order function, justifying in this way the use of high order neural networks. For the latter, we have obtained a general expression of their possible energy functions, which includes, as a special case, the high order generalization of Hopfield's energy functions used until now, for example, in [5], [8].

Keywords: Asynchronous feedback neural network; Optimization problem; Energy function; High order neural network; Multinomial function order
1. Introduction
An artificial neural network designed to work as an associative memory or to solve an optimization problem is a dynamic system, whose convergence and stability must be
guaranteed. The Lyapunov function method is most often used to study these characteristics: using this method, the network evolution is related to the descent to a minimum of such a Lyapunov function (also called energy function). In this way, Hopfield [3] presents an ad hoc Lyapunov function that describes the evolution of an asynchronous feedback network with zero valued self-weights and symmetrical first order weights. Braham and Hamblen [1] show that the convergence and stability of these networks may be guaranteed even with non-negative (rather than zero) self-weights. Kobuchi [6] builds a general frame for the generation of Lyapunov functions associated with a first order network, where the Hopfield function appears as a special case. Another direction for the generalization of Hopfield's work consists in using high order asynchronous feedback neural networks (whose neurons include as inputs not only the state values of other neurons, but also products of these values). Ad hoc energy functions for these networks, with zero valued self-weights and symmetrical weights in all orders, are proposed in [8], [9]. These functions are studied by Kobuchi [7], obtaining a theoretical support for the above conditions.

To sum up, given an arbitrary order neural network, there is a well established method to find an energy function associated to that network. Our guideline is, nevertheless, different: we want to study the design process of artificial neural networks to solve optimization problems. This process follows a path inverse to that of finding an energy function for a given neural network. Thus, in the case of an optimization problem, one would start with the function to be optimized, which is identified as an energy function, and, finally, one should find a network that evolves according to this energy function. In [4], [5], and [8], a direct way to find this network is exposed: a q-th degree multinomial function E(s) is an energy function of a (q-1)-th order asynchronous feedback neural network. In other words, any q-th order optimization problem may be solved with some (q-1)-th order neural network. This method has the advantage of giving a network for every function to be optimized, but it has the disadvantage of using networks of high order. If this order is too high, the use of these networks is impossible, due to the computational and hardware implementation problems that arise.

The main aim of this paper is to avoid this difficulty, trying to solve practical optimization problems with less complex networks than those used up to now. This will be possible if we can associate a q-th order energy function to a neural network of order less than (q-1). Firstly, we study whether a first order neural network (the simplest one) may be associated with arbitrary order energy functions and the restrictions that such a network would have. We demonstrate that this is not possible in the case of odd order functions; the optimization of these functions must be carried out by high order networks. This justifies our second task: to establish the general expression of the energy functions that may be associated to a q-th order neural network; this expression is a generalization of the one given by Kobuchi [6] for a first order network.

In Section 2 we give some definitions and previous results, on which this work, its nomenclature and its results are based. In Section 3 we find the restrictions that a first order network must fulfill to be associated to an arbitrary order energy function.
These restrictions will limit the set of optimization problems that may be solved by a first order network (Section 4), justifying the existence of high order neural networks. In Section 5 we find the general expression to obtain energy functions associated to a high order
neural network. This energy is built upon a set of auxiliary functions f_k; the form of these auxiliary functions will determine the degree of the energy function and the restrictions to be satisfied by the network. In Section 6 we particularize the above results by using linear auxiliary functions, obtaining a formal justification of the so-called high order Hopfield energy function (and network conditions) referred to in [8] and [9], by a route different from Kobuchi's. In Section 7, as an example, we study the optimization of a fourth order function that may be carried out by a third order neural network (the already known method) or by a first order one (a practical implication of this paper). In Section 8 we summarize the conclusions and possible future directions.
2. Definitions and previous results

Definition 1. An artificial neural network (ANN) is a system of n neurons (in the McCulloch-Pitts sense) where the state value of the i-th neuron is s_i \in \{0,1\}.

Definition 2. We call network state or configuration the vector s = (s_1, \ldots, s_n) \in \{0,1\}^n.

In the case of a q-th order asynchronous feedback neural network, the state of the k-th neuron at instant t+1 is:

s_k(t+1) = H(d_k(s(t)))          (1)

with

H(x) = \begin{cases} 0 & \text{if } x \le 0 \\ 1 & \text{if } x > 0 \end{cases}          (2)

and

d_k(s) = \sum_{j=1}^{q} \sum_{(i_1,\ldots,i_j) \in C_j^n} w_{k i_1 \ldots i_j} \, s_{i_1} \cdots s_{i_j} - \theta_k,          (3)

where w_{k i_1 \ldots i_j} is the weight of the connection that joins the product of neurons i_1, i_2, \ldots, i_j to the input of neuron k, and the second summation extends over all possible combinations of j elements out of n. The network is denoted N(n,q,W,\theta), where W is the weight set and \theta = (\theta_1, \ldots, \theta_n) is the threshold vector. (A conceptual explanation of these networks, though using a different formulation, may be found in [2].) In the particular case of first order networks, Eq. (3) reduces to:

d_k(s(t)) = \sum_{i=1}^{n} w_{ki} s_i(t) - \theta_k.          (4)

To simplify the expressions, we will write \bar{s}_i = 1 - s_i and S_i = s_i - \bar{s}_i = 2 s_i - 1 (\bar{s}_i is the opposite state value to s_i, and S_i represents the translation from our network to another that uses states belonging to \{-1,1\}^n).
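As an illustration of the dynamics defined by Eqs. (1)-(3), the following Python sketch (not part of the original paper) simulates asynchronous updates of a small q-th order network; the weight layout, the example values and all identifiers are illustrative assumptions rather than anything prescribed by the text.

```python
import random

def potential(k, s, weights, theta):
    """d_k(s) of Eq. (3): for every order j, the weight w_{k,i1..ij} multiplies the
    product s_{i1}*...*s_{ij}; weights[k] maps index tuples to these weights."""
    d = -theta[k]
    for idx, w in weights[k].items():
        prod = 1
        for i in idx:
            prod *= s[i]
        d += w * prod
    return d

def heaviside(x):
    """H(x) of Eq. (2)."""
    return 1 if x > 0 else 0

def async_step(s, weights, theta, k=None):
    """One asynchronous update, Eq. (1): a single neuron k (random if not given) is refreshed."""
    if k is None:
        k = random.randrange(len(s))
    s = list(s)
    s[k] = heaviside(potential(k, s, weights, theta))
    return tuple(s)

# Tiny, made-up second order (q = 2) network with n = 3 neurons.
theta = [0.0, 0.0, 0.0]
weights = [
    {(1,): 0.5, (2,): -0.2, (1, 2): 0.3},   # connections feeding neuron 0
    {(0,): 0.5, (2,): 0.4},                 # connections feeding neuron 1
    {(0,): -0.2, (1,): 0.4, (0, 1): 0.3},   # connections feeding neuron 2
]
s = (0, 1, 0)
for _ in range(10):
    s = async_step(s, weights, theta)
print("configuration after 10 asynchronous updates:", s)
```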
Definition 3. We call state evaluation function E(s) any function from \{0,1\}^n to \mathbb{R}.
Definition 4. Let J be a subset of N_n = \{1, \ldots, n\}. We define a difference function as \Delta E_J(s) = E(s) - E(s_J), where s_J denotes the configuration resulting from changing in s the state of the neurons whose index belongs to J:

(s_J)_i = \begin{cases} \bar{s}_i & \text{if } i \in J \\ s_i & \text{if } i \notin J \end{cases}          (5)

Every difference function is also a state evaluation function.
Definition 5. First and second Kobuchi conditions. (In [6], Kobuchi calls them first and second order conditions, but we change the nomenclature for reasons of clarity.) We say that a state evaluation function fulfills the first Kobuchi condition if \Delta E_k(s) + \Delta E_k(s_k) = 0. It means that the energy variation along a closed circuit is zero. We say that a state evaluation function fulfills the second Kobuchi condition if \Delta E_k(s) + \Delta E_j(s_k) = \Delta E_j(s) + \Delta E_k(s_j). It means that the energy variation between two network states is independent of the path followed from the original state to the final one.

Theorem 1 [6]. Given the set of difference functions \Delta E_k(s): \{0,1\}^n \to \mathbb{R} for each k \in N_n, the first and second Kobuchi conditions are necessary and sufficient for a state evaluation function E(s): \{0,1\}^n \to \mathbb{R} to exist such that \Delta E_k(s) = E(s) - E(s_k). And the form of this function is:

E(s) = \Delta E_{i_1}(s) + \Delta E_{i_2}(s_{i_1}) + \Delta E_{i_3}(s_{i_1 i_2}) + \cdots + \Delta E_{i_r}(s_{i_1 i_2 \ldots i_{r-1}}) + C,          (6)

where i_1, i_2, \ldots, i_r are the subindexes of the neurons that have output 1 in configuration s.
Comment to Theorem 1. The difference functions used assign a real value to the transition between two configurations differing only in one neuron state (adjacent configurations). The significance of this theorem lies in the fact that it allows us to assign a real value to every configuration s, starting at a reference configuration (0,\ldots,0) and adding the values associated to successive changes of a single neuron state, until the state s is reached. That is, we may obtain a state evaluation function from any set of n functions of s that fulfill Kobuchi's conditions.

Lyapunov condition. A state evaluation function E(s) will be a Lyapunov function for a neural network N(n,q,W,\theta) if the change of state of any neuron k from s_k to \bar{s}_k implies a decrease of the E value from E(s) to E(s_k), i.e. E(s_k) < E(s). In our case, according to Definition 4, E(s) will be a Lyapunov function if its difference functions satisfy \Delta E_k(s) > 0. From now on, we will use Kobuchi's criterion of building difference functions of the form \Delta E_k(s) = f_k(-S_k d_k(s)), where f_k(x) > 0 when x > 0. This criterion guarantees that E(s) is a Lyapunov function: looking at Eq. (1) and Eq. (2), it is easy to prove that every change of the state of neuron k (s_k(t) \neq s_k(t+1)) implies S_k(t) d_k(s(t)) < 0. It cannot be guaranteed that this is the only way to build a Lyapunov function, but it seems reasonable to think that d_k(s) is a determining factor. It is worth noting that the functions f_k are just auxiliary functions, without any special physical meaning.

Using these auxiliary functions f_k, in the case of first order networks, the first and second Kobuchi conditions may be expressed as follows:

First Kobuchi condition:

f_k(-S_k d_k(s)) + f_k(S_k d_k(s) - w_{kk}) = 0,   k \in \{1,2,\ldots,n\}.          (7)

Second Kobuchi condition:

f_k(-S_k d_k(s)) - f_k(-S_k d_k(s) + \hat{w}_{kj}) = f_j(-S_j d_j(s)) - f_j(-S_j d_j(s) + \hat{w}_{jk}),   k, j \in \{1,2,\ldots,n\},          (8)

with \hat{w}_{kj} = w_{kj} S_k S_j, and we have the following theorem:

Theorem 2 [6]. Let the functions f_k: \mathbb{R} \to \mathbb{R}, k = 1,2,\ldots,n, be such that if x > 0 then f_k(x) > 0. If the conditions given in Eq. (7) and Eq. (8) are fulfilled, then the network has a Lyapunov energy function, and this function is:

E(s) = \sum_{j=1}^{n} s_j f_j\Big(-S_j d_j(s) + \sum_{i=1}^{j-1} \hat{w}_{ji} s_i\Big) + C.          (9)
Comment to Theorem 2. It allows us to generate energy functions for a first order network. The order of these functions will depend on the degree of the auxiliary functions f_k, which, in any case, must fulfill Kobuchi's conditions.
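A minimal numerical illustration of Theorem 2 may help here: assuming linear auxiliary functions f_k(x) = A_k x + B_k with A_k = 1 and B_k = w_kk/2, and a symmetric weight matrix (choices under which Eqs. (7) and (8) are easy to satisfy), the sketch below builds E(s) from Eq. (9) and checks by brute force that asynchronous updates never increase it. The concrete weights and thresholds are made up for the example.

```python
import itertools

n = 4
# Made-up symmetric first order weights with non-negative self-weights.
W = [[0.2,  0.5, -0.3,  0.1],
     [0.5,  0.0,  0.4, -0.6],
     [-0.3, 0.4,  0.0,  0.7],
     [0.1, -0.6,  0.7,  0.0]]
theta = [0.0, 0.1, -0.2, 0.0]
A = [1.0] * n
B = [A[k] * W[k][k] / 2 for k in range(n)]          # makes Eq. (7) hold for linear f_k

f = lambda k, x: A[k] * x + B[k]                    # linear auxiliary functions
d = lambda k, s: sum(W[k][i] * s[i] for i in range(n)) - theta[k]   # Eq. (4)
S = lambda s, k: 2 * s[k] - 1                       # S_k = 2 s_k - 1

def energy(s):
    """Eq. (9): E(s) = sum_j s_j f_j( -S_j d_j(s) + sum_{i<j} w_ji S_j S_i s_i )."""
    total = 0.0
    for j in range(n):
        corr = sum(W[j][i] * S(s, j) * S(s, i) * s[i] for i in range(j))
        total += s[j] * f(j, -S(s, j) * d(j, s) + corr)
    return total

for s in itertools.product((0, 1), repeat=n):
    for k in range(n):
        # First Kobuchi condition, Eq. (7).
        assert abs(f(k, -S(s, k) * d(k, s)) + f(k, S(s, k) * d(k, s) - W[k][k])) < 1e-9
        # Second Kobuchi condition, Eq. (8), with w_hat_kj = w_kj S_k S_j.
        for j in range(n):
            if j == k:
                continue
            wh_kj = W[k][j] * S(s, k) * S(s, j)
            wh_jk = W[j][k] * S(s, j) * S(s, k)
            lhs = f(k, -S(s, k) * d(k, s)) - f(k, -S(s, k) * d(k, s) + wh_kj)
            rhs = f(j, -S(s, j) * d(j, s)) - f(j, -S(s, j) * d(j, s) + wh_jk)
            assert abs(lhs - rhs) < 1e-9
        # Lyapunov behaviour: one asynchronous update (Eq. (1)) never raises E.
        s_next = s[:k] + (1 if d(k, s) > 0 else 0,) + s[k + 1:]
        assert energy(s_next) <= energy(s) + 1e-9
print("Eqs. (7), (8) and energy descent verified on all configurations.")
```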
Definition 6. Let s = (s_1, \ldots, s_n) be the configuration of a network N(n,q,W,\theta). Given the expression T = c\, s_{i_1} s_{i_2} \cdots s_{i_k}, with c being a real number, we call order of T, denoted o(T), the number of different factors s_i appearing. That is, o(c\, s_{i_1} s_{i_2} \cdots s_{i_k}) = k.

Definition 7. Given a function f, multinomial in s = (s_1, \ldots, s_n), that is, a sum where every addend is an expression like T, we call order of such a function the maximum order of its terms, and we denote it o(f).

From now on, we will restrict ourselves to the study of energy functions multinomial in s and functions f_k(x) polynomial in x.

3. Restrictions on first order networks for the existence of energy functions
Theorem 2 provides us with a way to obtain several energy functions for a first order ANN. We will use functions f_k(x) polynomial in x, so we will have f_k and E(s) multinomial in (s_1, \ldots, s_n). In this case, according to Eq. (9), the order of the function E(s) will be determined by the order of f_k, so that:

o(E(s)) \le o(f_k) + 1,          (10)
because s_k^i = s_k for every positive integer i, and therefore these products do not contribute to raise the order of the function. Generally speaking, if we do not impose any restriction on the weights or on the coefficients of f_k, and if we have a sufficient number of neurons, the above inequality becomes an equality.

Initially, one could think that, given a first order neural network, we could generate an arbitrary order energy function just by looking for an f_k of the appropriate order. In this section we will demonstrate that this is not always possible, depending on the order of the functions f_k, due to the restrictions that the Kobuchi conditions impose on the network weights. These restrictions may be summarized in the following theorem:

Theorem 3. Let N(n,1,W,\theta) be a first order neural network. For an energy function E(s) (obtained from the m-th degree polynomial auxiliary functions f_k(x) = \sum_i c_i x^i, i = 0, \ldots, m) to exist, the first Kobuchi condition imposes the following restrictions on the network weights:
(a) If m is an even number, n - m + 1 weights of each neuron must be zero.
(b) If m is an odd number, w_{kk} = 2 c_{m-1}/(m c_m) must hold, where c_{m-1} and c_m are the coefficients of the (m-1)-th and m-th degree terms, respectively, of the functions f_k, k = 1, \ldots, n.

Proof. (a) Let us study the case of a neural network N(n,1,W,\theta) whose energy function must be built with m-th (even) degree auxiliary functions f_k. We suppose m \le n, because if n < m, m-th order weights cannot be obtained and so the network order would be less than m. The greatest order terms in the first Kobuchi condition (7) take the following form (see Appendix A):

2 c_m w_{k i_1} w_{k i_2} \cdots w_{k i_m} s_{i_1} \cdots s_{i_m},   k \neq i_1, \ldots, i_m,          (11)

2 w_{kk} (1-m) c_m w_{k i_1} \cdots w_{k i_{m-1}} s_k s_{i_1} \cdots s_{i_{m-1}},   k \neq i_1, \ldots, i_{m-1}.          (12)

Terms of the form (11) cannot be cancelled out by any other, so their coefficients must be zero (for the first Kobuchi condition to be fulfilled for any s) and, as c_m \neq 0, we have:

w_{k i_1} \cdots w_{k i_m} = 0,   k \neq i_1, \ldots, i_m.          (13)

We must cancel out all possible combinations of m elements out of the n - 1 weights w_{ki} (i \neq k); to cover all these combinations, we need to make zero at least n - m weights for each neuron k. To make zero the terms like (12), it must be fulfilled that either

w_{k i_1} \cdots w_{k i_{m-1}} = 0,   k \neq i_1, \ldots, i_{m-1},          (14)

or

w_{kk} (1 - m) = 0.          (15)

This implies that n - m + 1 weights w_{ki} with i \neq k must be zero, or else w_{kk} = 0. In all situations, conditions (13), (14) and (15) together imply that n - m + 1 connections of each neuron must have null weights.

(b) Let us study now the case of odd m, that is, f_k(x) of odd degree in x. The condition w_{kk} = 2 c_{m-1}/(m c_m) is directly obtained from Eq. (A.14) (see Appendix A). □

Comment to Theorem 3. The key point of this theorem is that it proves that an optimization problem of odd order cannot be solved by a first order network, justifying in this way the use of high order neural networks. A more detailed explanation is developed in the following section.

Corollary. In the conditions of Theorem 3, case (b), if we particularize for functions of the form f_k(x) = a(x - b)^m, then the first Kobuchi condition is equivalent to w_{kk} = -2b.

Proof. The expansion of f_k(x) = a(x - b)^m produces the coefficients c_m = a and c_{m-1} = -mab, so from Theorem 3, w_{kk} = 2(-mab)/(ma) = -2b. On the other hand, when replacing this equality in Eq. (A.4), all coefficients of x are cancelled out; that is, w_{kk} = -2b is a necessary and sufficient condition for the first Kobuchi condition to be fulfilled. □
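The corollary is easy to check numerically: for f_k(x) = a(x-b)^m with odd m, setting w_kk = -2b makes the left-hand side of Eq. (7) vanish identically, whatever the remaining weights are. A small sketch (with arbitrary, made-up weights) under that assumption:

```python
import itertools, random

n, m = 4, 3                      # m odd, as in case (b) of Theorem 3
a, b = 0.7, -0.25                # a > 0; the corollary prescribes w_kk = -2b
W = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
for k in range(n):
    W[k][k] = -2 * b             # self-weight forced by the corollary
theta = [random.uniform(-1, 1) for _ in range(n)]

f = lambda x: a * (x - b) ** m
d = lambda k, s: sum(W[k][i] * s[i] for i in range(n)) - theta[k]

for s in itertools.product((0, 1), repeat=n):
    for k in range(n):
        Sk = 2 * s[k] - 1
        # First Kobuchi condition for a first order network, Eq. (7):
        # f(-x) + f(x - w_kk) = a(-x-b)^m + a(x+b)^m = 0 for odd m.
        assert abs(f(-Sk * d(k, s)) + f(Sk * d(k, s) - W[k][k])) < 1e-9
print("w_kk = -2b makes Eq. (7) hold identically, as the corollary states.")
```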
4. Energy functions and network order

Let us remember that the target of our work is to solve an optimization problem with a neural network that is as simple as possible. Eq. (9), given by Kobuchi, allows first order neural networks to have arbitrary order energy functions. This fact makes us think about the inverse way: given an arbitrary order, say (m+1)-th, function to be optimized, could we find a first order neural network which has such a function as its associated energy function? The first Kobuchi condition, as shown by Theorem 3, does not prevent this possibility when we have even order energy functions (functions to be optimized); moreover, that condition is guaranteed if we use f_k(x) = a(x + w_{kk}/2)^m. However, the application of the same method to the optimization of odd order functions faces great obstacles: firstly, because of Eq. (10), f_k must be chosen among even order ones (i.e., m-th order with m even); but, in that case, because of Theorem 3, n - m + 1 weights of neuron k must be zero, that is, at most m - 1 weights w_{k i_j} (j = 1, \ldots, m-1) may be different from zero. The potential of neuron k due to these m - 1 weights will be

d_k(s) = \sum_{j=1}^{m-1} w_{k i_j} s_{i_j} - \theta_k,          (16)

and f_k(-S_k d_k(s) + \sum_{i<k} \hat{w}_{ki} s_i), which is part of the expression of E(s) in Eq. (9), will produce a multinomial function in s whose greatest order terms will be of the form

c \cdot s_k \cdot \prod_{j=1}^{m-1} s_{i_j},

where c is a constant. As a result, E(s) is m-th order at most and, consequently, an (m+1)-th order energy cannot be obtained. Thus, if every neuron has only m - 1 non-null weights, an energy function of order greater than m cannot be obtained by the application of Kobuchi's formula.

This result limits the set of optimization problems that may be solved by a first order neural network using Kobuchi's assumptions: those problems whose associated function is an odd order one cannot be solved by a first order neural network. We believe that this result justifies the use of high order neural networks; for instance, a particular optimization problem with a third order function (as in [8]) can only be solved by means of a second order neural network.
5. Lyapunov functions for high order neural networks
In the previous sections we have studied the possibilities of first order neural networks to solve any optimization problem. It has been shown that these networks cannot optimize odd order functions. From now on, this work will follow a different path: our target is still to solve an optimization problem with a neural network as simple as possible, but now this network must be a high order one. Thus, in this section we will search for the expression of all possible energy functions that may be associated with a general high order neural network. We will use a generalization of the method described by Kobuchi, which associates several energy functions to a first order network.

As shown in Eq. (3), in a high order neural network neurons receive at their input not only the outputs of other neurons, but also products of these outputs. Let us now find the general expression of the energy of a q-th order neural network. We start with expression (6), which makes no reference to the network order, so it is valid for high order networks too. It is repeated here:

E(s) = \Delta E_{i_1}(s) + \Delta E_{i_2}(s_{i_1}) + \Delta E_{i_3}(s_{i_1 i_2}) + \cdots + \Delta E_{i_r}(s_{i_1 i_2 \ldots i_{r-1}}) + C,          (17)

where i_1, i_2, \ldots, i_r are the subindexes of the neurons that have output 1 in configuration s, and the difference functions are defined as:

\Delta E_k(s) = f_k(-S_k d_k(s)),          (18)

with f_k(x) > 0 if x > 0, as in the first order case.

To develop Eq. (17) as an expression of the auxiliary functions f_k, let us firstly analyze the way a neuron potential varies when a change in the network configuration occurs. The potential at neuron k resulting from a change of the activation state of another neuron i is calculated by subtracting the terms containing s_i and adding them back once s_i is substituted with \bar{s}_i:

d_k(s_i) = \sum_{j=1}^{q} \sum_{(i_1,\ldots,i_j)} w_{k i_1 \ldots i_j} s_{i_1} \cdots s_{i_j} - \theta_k
         - \sum_{j=1}^{q} \sum_{(i, i_1,\ldots,i_{j-1})} w_{k i i_1 \ldots i_{j-1}} s_i s_{i_1} \cdots s_{i_{j-1}}
         + \sum_{j=1}^{q} \sum_{(i, i_1,\ldots,i_{j-1})} w_{k i i_1 \ldots i_{j-1}} \bar{s}_i s_{i_1} \cdots s_{i_{j-1}}
       = d_k(s) - \sum_{j=1}^{q} \sum_{(i, i_1,\ldots,i_{j-1})} w_{k i i_1 \ldots i_{j-1}} S_i s_{i_1} \cdots s_{i_{j-1}}.          (19)

The potential at neuron k resulting from a change of its own activation state is:

d_k(s_k) = d_k(s) - \sum_{j=1}^{q} \sum_{(k, i_1,\ldots,i_{j-1})} w_{k k i_1 \ldots i_{j-1}} S_k s_{i_1} \cdots s_{i_{j-1}}.          (20)

In general, after changing the state of a set of neurons \{s_h\} (with h belonging to the set of indexes J), the potential of neuron k will be:

d_k(s_J) = d_k(s) - \sum_{j=1}^{q} \sum_{l=1}^{\min(|J|,j)} (-1)^{l+1} \sum_{\substack{(i_1,\ldots,i_l) \\ i_1,\ldots,i_l \in J}} \; \sum_{\substack{(p_1,\ldots,p_{j-l}) \\ p_1,\ldots,p_{j-l} \notin \{i_1,\ldots,i_l\}}} w_{k i_1 \ldots i_l p_1 \ldots p_{j-l}} S_{i_1} \cdots S_{i_l} s_{p_1} \cdots s_{p_{j-l}}.          (21)

To simplify notation, we will use the functions:

\Gamma(l,j,i,\{i_1,\ldots,i_l\}) = \sum_{\substack{(p_1,\ldots,p_{j-l}) \\ p_1,\ldots,p_{j-l} \notin \{i_1,\ldots,i_l\}}} w_{i i_1 \ldots i_l p_1 \ldots p_{j-l}} S_i S_{i_1} \cdots S_{i_l} s_{p_1} \cdots s_{p_{j-l}},

H(l,j,i,\{i_1,\ldots,i_l\}) = \sum_{\substack{(p_1,\ldots,p_{j-l}) \\ p_1,\ldots,p_{j-l} \notin \{i_1,\ldots,i_l\}}} w_{i i_1 \ldots i_l p_1 \ldots p_{j-l}} S_{i_1} \cdots S_{i_l} s_{p_1} \cdots s_{p_{j-l}}.          (22)

With these functions and Eq. (17), Eq. (18), Eq. (19), Eq. (20) and Eq. (21) we obtain the following expression for the energy function:

E(s) = f_{i_1}(-S_{i_1} d_{i_1}(s))
     + f_{i_2}\Big(-S_{i_2} d_{i_2}(s) + \sum_{j=1}^{q} \Gamma(1,j,i_2,\{i_1\})\Big)
     + f_{i_3}\Big(-S_{i_3} d_{i_3}(s) + \sum_{j=1}^{q} \sum_{l=1}^{\min(j,2)} (-1)^{l+1} \sum_{(i_1',\ldots,i_l') \subseteq \{i_1,i_2\}} \Gamma(l,j,i_3,\{i_1',\ldots,i_l'\})\Big)
     + \cdots
     + f_{i_r}\Big(-S_{i_r} d_{i_r}(s) + \sum_{j=1}^{q} \sum_{l=1}^{\min(j,r-1)} (-1)^{l+1} \sum_{(i_1',\ldots,i_l') \subseteq \{i_1,\ldots,i_{r-1}\}} \Gamma(l,j,i_r,\{i_1',\ldots,i_l'\})\Big) + C
     = \sum_{h=1}^{r} f_{i_h}\Big(-S_{i_h} d_{i_h}(s) + \sum_{j=1}^{q} \sum_{l=1}^{\min(j,h-1)} (-1)^{l+1} \sum_{(i_1',\ldots,i_l') \subseteq \{i_1,\ldots,i_{h-1}\}} \Gamma(l,j,i_h,\{i_1',\ldots,i_l'\})\Big) + C,          (23)

where I(s) is the set of neurons whose state s_i = 1 in configuration s. The summation over the neurons with activation value 1 may be extended to all neurons by multiplying by every s_i, and the energy becomes

E(s) = \sum_{i=1}^{n} s_i f_i\Big(-S_i d_i(s) + \sum_{j=1}^{q} \sum_{l=1}^{\min(j,i-1)} (-1)^{l+1} \sum_{\substack{(i_1,\ldots,i_l) \subseteq I(s) \\ i_1,\ldots,i_l < i}} H(l,j,i,\{i_1,\ldots,i_l\})\Big) + C.          (24)
This equation gives the expression of the energy function of a q-th order asynchronous feedback neural network for general auxiliary functions f_k. For Eq. (17), and then Eq. (24), to be fulfilled, the difference functions must fulfill the first and second Kobuchi conditions, whose implications on the functions f_k, in the high order generalization, are as follows:

(a) First Kobuchi condition:

\Delta E_k(s) + \Delta E_k(s_k) = 0,          (25)

f_k(-S_k d_k(s)) + f_k(S_k d_k(s_k)) = 0.          (26)

Developing d_k(s_k) we have:

f_k(-S_k d_k(s)) + f_k\Big(S_k d_k(s) - \sum_{j=1}^{q} \sum_{(k, i_1,\ldots,i_{j-1})} w_{k k i_1 \ldots i_{j-1}} s_{i_1} \cdots s_{i_{j-1}}\Big) = 0.          (27)

(b) Second Kobuchi condition:

\Delta E_k(s) + \Delta E_h(s_k) = \Delta E_h(s) + \Delta E_k(s_h),          (28)

f_k(-S_k d_k(s)) + f_h(-S_h d_h(s_k)) = f_h(-S_h d_h(s)) + f_k(-S_k d_k(s_h)).          (29)

Joining terms and developing d_k(s_h) and d_h(s_k) we have:

f_k(-S_k d_k(s)) - f_k\Big(-S_k d_k(s) + \sum_{j=1}^{q} \sum_{(h, i_1,\ldots,i_{j-1})} w_{k h i_1 \ldots i_{j-1}} S_k S_h s_{i_1} \cdots s_{i_{j-1}}\Big)
= f_h(-S_h d_h(s)) - f_h\Big(-S_h d_h(s) + \sum_{j=1}^{q} \sum_{(k, i_1,\ldots,i_{j-1})} w_{h k i_1 \ldots i_{j-1}} S_h S_k s_{i_1} \cdots s_{i_{j-1}}\Big).          (30)
Eq. (24), Eq. (27) and Eq. (30) provide us with the general form of the energy and of the first and second Kobuchi conditions, respectively, for any auxiliary functions f_k and for any network order. The use of a particular f_k will imply a particular energy and particular conditions for the network. It may be easily proved that if we make the network order q = 1, these equations become the expressions of the energy and of the first and second conditions given by Kobuchi.
6. Linear functions f_k: energy and restrictions for a high order network

In this section we will particularize the above equations for the case of linear functions f_k. We will show that the energy turns out to be the high order generalization of Hopfield's, used, for instance, in [5], [8]. So, for f_k(x) = A_k x + B_k with A_k > 0 and B_k \ge 0, the first and second Kobuchi conditions impose the following restrictions:

(a) First Kobuchi condition:

-A_k S_k d_k(s) + B_k = -A_k S_k d_k(s) + A_k \sum_{j=1}^{q} \sum_{(k, i_1,\ldots,i_{j-1})} w_{k k i_1 \ldots i_{j-1}} s_{i_1} \cdots s_{i_{j-1}} - B_k,          (31)

and joining terms:

2 B_k - A_k w_{kk} - \sum_{j=2}^{q} \sum_{(k, i_1,\ldots,i_{j-1})} A_k w_{k k i_1 \ldots i_{j-1}} s_{i_1} \cdots s_{i_{j-1}} = 0.          (32)

As expression (32) must be fulfilled for any value of the combination of s_i, we have:

2 B_k - A_k w_{kk} = 0,          (33)

A_k w_{k k i_1 \ldots i_{j-1}} = 0,   j = 2, \ldots, q,          (34)

and, so:

B_k = \frac{A_k w_{kk}}{2},          (35)

w_{k k i_1 \ldots i_{j-1}} = 0.          (36)

As a conclusion, the first Kobuchi condition imposes first order self-weights greater than or equal to zero, and higher order self-connections with zero weight.

(b) Second Kobuchi condition:

-A_k S_k d_k(s) + B_k - A_h S_h\Big(d_h(s) - \sum_{j=1}^{q} \sum_{(k, i_1,\ldots,i_{j-1})} w_{h k i_1 \ldots i_{j-1}} S_k s_{i_1} \cdots s_{i_{j-1}}\Big) + B_h
= -A_h S_h d_h(s) + B_h - A_k S_k\Big(d_k(s) - \sum_{j=1}^{q} \sum_{(h, i_1,\ldots,i_{j-1})} w_{k h i_1 \ldots i_{j-1}} S_h s_{i_1} \cdots s_{i_{j-1}}\Big) + B_k,          (37)

and grouping terms:

A_h \sum_{j=1}^{q} \sum_{(k, i_1,\ldots,i_{j-1})} w_{h k i_1 \ldots i_{j-1}} S_h S_k s_{i_1} \cdots s_{i_{j-1}} = A_k \sum_{j=1}^{q} \sum_{(h, i_1,\ldots,i_{j-1})} w_{k h i_1 \ldots i_{j-1}} S_k S_h s_{i_1} \cdots s_{i_{j-1}}.          (38)

As the first condition imposes w_{h k i_1 \ldots i_{j-1}} and w_{k h i_1 \ldots i_{j-1}} to be zero if any i_m is equal to h or k, respectively, we may write:

\sum_{j=1}^{q} \sum_{\substack{(i_1,\ldots,i_{j-1}) \\ i_m \neq h, k}} \big(A_k w_{k h i_1 \ldots i_{j-1}} - A_h w_{h k i_1 \ldots i_{j-1}}\big) S_k S_h s_{i_1} \cdots s_{i_{j-1}} = 0,          (39)

and as the equality must be fulfilled for any value of s_i we have:

A_k w_{k h i_1 \ldots i_{j-1}} = A_h w_{h k i_1 \ldots i_{j-1}},   j = 1, \ldots, q.          (40)

Therefore, the second condition imposes quasi-symmetry between high order connections when f_k is of linear form.
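The two restrictions can be verified by brute force on a small example. The sketch below builds a hypothetical second order network whose weights satisfy Eqs. (35), (36) and (40) (non-negative first order self-weights, null higher order self-weights, quasi-symmetry) and checks that, with linear f_k, the conditions (25) and (28) hold for every configuration; all concrete values are illustrative.

```python
import itertools, random

n = 4
A = [0.5, 1.0, 1.5, 2.0]                    # A_k > 0, deliberately unequal
theta = [random.uniform(-1, 1) for _ in range(n)]
wkk = [0.3, 0.0, 0.1, 0.5]                  # first order self-weights >= 0, Eq. (35)
B = [A[k] * wkk[k] / 2 for k in range(n)]   # Eq. (35)
f = lambda k, x: A[k] * x + B[k]            # linear auxiliary functions

# Quasi-symmetric connections, Eq. (40): A_k * w_{k,...} depends only on the index set.
c1 = {p: random.uniform(-1, 1) for p in itertools.combinations(range(n), 2)}
c2 = {t: random.uniform(-1, 1) for t in itertools.combinations(range(n), 3)}
w1 = lambda k, i: c1[tuple(sorted((k, i)))] / A[k]
w2 = lambda k, i, j: c2[tuple(sorted((k, i, j)))] / A[k]

def d(k, s):
    """Potential of neuron k, Eq. (3) with q = 2; higher order self-weights are zero, Eq. (36)."""
    lin = sum(w1(k, i) * s[i] for i in range(n) if i != k) + wkk[k] * s[k]
    quad = sum(w2(k, i, j) * s[i] * s[j]
               for i, j in itertools.combinations(range(n), 2) if k not in (i, j))
    return lin + quad - theta[k]

S = lambda s, k: 2 * s[k] - 1
flip = lambda s, k: tuple(1 - v if i == k else v for i, v in enumerate(s))
dE = lambda k, s: f(k, -S(s, k) * d(k, s))          # difference function, Eq. (18)

for s in itertools.product((0, 1), repeat=n):
    for k in range(n):
        # First Kobuchi condition, Eq. (25).
        assert abs(dE(k, s) + dE(k, flip(s, k))) < 1e-9
        for h in range(k + 1, n):
            # Second Kobuchi condition (path independence), Eq. (28).
            assert abs(dE(k, s) + dE(h, flip(s, k)) - dE(h, s) - dE(k, flip(s, h))) < 1e-9
print("Eqs. (25) and (28) hold for this quasi-symmetric second order network.")
```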
These constraints, here obtained from linear auxiliary functions f_k, are the same as those established ad hoc for the high order Hopfield network and, in general, the same as those established by Kobuchi [7] for a q-th order network from a (q+1)-th order energy. On the other hand, these functions f_k give the following energy function for high order networks:

E(s) = -\sum_{i=1}^{n} s_i A_i \sum_{j=1}^{q} \sum_{(i_1,\ldots,i_j)} w_{i i_1 \ldots i_j} s_{i_1} \cdots s_{i_j}
       + \sum_{i=1}^{n} A_i \theta_i s_i
       + \sum_{i=1}^{n} s_i A_i \sum_{j=1}^{q} \sum_{l=1}^{\min(j,i-1)} (-1)^{l+1} \sum_{(i_1,\ldots,i_l)} H(l,j,i,\{i_1,\ldots,i_l\})
       + \sum_{i=1}^{n} \frac{A_i w_{ii}}{2} s_i + C,          (41)

which happens to be the Hopfield energy for a q-th order network. For instance, for the case of a second order network we have:

E(s) = -\sum_{i=1}^{n} \sum_{j=1}^{n} A_i w_{ij} s_i s_j - \sum_{i=1}^{n} \sum_{(j,k)} A_i w_{ijk} s_i s_j s_k + \sum_{i=1}^{n} A_i \theta_i s_i + \sum_{i=1}^{n} \frac{A_i w_{ii}}{2} s_i
       + \sum_{i=1}^{n} \sum_{j<i} A_i w_{ij} s_i s_j + \sum_{i=1}^{n} \sum_{j<i} \sum_{p \neq j} A_i w_{ijp} s_i s_j s_p - \sum_{i=1}^{n} \sum_{\substack{(j,k) \\ j,k<i}} A_i w_{ijk} s_i s_j s_k.          (42)

Taking into account that

s_i^2 = s_i, \qquad s_i S_i = s_i,          (43)

and the quasi-symmetry relations (40), we obtain

E(s) = -\sum_{i} \sum_{j>i} \sum_{k>j} A_i w_{ijk} s_i s_j s_k - \sum_{i} \sum_{j>i} A_i w_{ij} s_i s_j + \sum_{i} A_i \Big( \theta_i - \frac{w_{ii}}{2} \Big) s_i,          (44)

which, with A_i = A and w_{ii} = 0 for every i, together with the restrictions obtained from the first and second Kobuchi conditions, is the energy of a second order Hopfield network.
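Under the same assumptions (quasi-symmetric weights, non-negative first order self-weights, null higher order self-weights), Eq. (44) can be checked to be non-increasing along the dynamics of Eq. (1). A self-contained sketch with made-up weights:

```python
import itertools, random

n = 4
A = [0.5, 1.0, 1.5, 2.0]
theta = [random.uniform(-1, 1) for _ in range(n)]
wkk = [0.3, 0.0, 0.1, 0.5]                  # w_ii >= 0; higher order self-weights are zero
c1 = {p: random.uniform(-1, 1) for p in itertools.combinations(range(n), 2)}
c2 = {t: random.uniform(-1, 1) for t in itertools.combinations(range(n), 3)}
w1 = lambda k, i: c1[tuple(sorted((k, i)))] / A[k]          # quasi-symmetry, Eq. (40)
w2 = lambda k, i, j: c2[tuple(sorted((k, i, j)))] / A[k]

def d(k, s):
    lin = sum(w1(k, i) * s[i] for i in range(n) if i != k) + wkk[k] * s[k]
    quad = sum(w2(k, i, j) * s[i] * s[j]
               for i, j in itertools.combinations(range(n), 2) if k not in (i, j))
    return lin + quad - theta[k]

def energy(s):
    """Second order Hopfield-type energy, Eq. (44)."""
    e = -sum(A[i] * w1(i, j) * s[i] * s[j] for i, j in itertools.combinations(range(n), 2))
    e -= sum(A[i] * w2(i, j, k) * s[i] * s[j] * s[k]
             for i, j, k in itertools.combinations(range(n), 3))
    e += sum(A[i] * (theta[i] - wkk[i] / 2) * s[i] for i in range(n))
    return e

for s in itertools.product((0, 1), repeat=n):
    for k in range(n):
        s_next = s[:k] + (1 if d(k, s) > 0 else 0,) + s[k + 1:]
        assert energy(s_next) <= energy(s) + 1e-9      # Eq. (1) never raises E
print("Eq. (44) is non-increasing along the asynchronous dynamics.")
```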
7. An example: two neural networks with different orders and the same energy function

In the first part of this paper we tried to associate a high order energy function to a first order neural network. Theorem 3 showed that this is impossible for odd order energy functions, but it does not prevent it for even order energy functions. In this section we will see an example of how to actually carry out this association: we will study a fourth order multinomial function, which will be an energy function of two neural networks. On the one hand, we will obtain a third order network by means of the already known method, as in [5], [8]. On the other hand, a first order network will be presented: it is obtained by matching the multinomial function to be optimized with Eq. (9), where we choose third order f_k, due to Eq. (10).

Let us consider the problem of optimizing the fourth order multinomial function E(s) that appears in Eq. (45):

E(s) = -0.125 (s_1 s_2 + s_3 s_4) - 0.027 (s_1 s_3 + s_2 s_4) + 0.512 (s_1 s_4 + s_2 s_3) - 0.36 \sum_{i=1}^{4} \sum_{j>i} \sum_{k>j} s_i s_j s_k + 0.72\, s_1 s_2 s_3 s_4.          (45)
Table 1
s configurations for which E(s) reaches a local minimum

s            E(s)
(1,0,1,0)    -0.027
(0,1,0,1)    -0.027
(0,0,1,1)    -0.125
(1,1,0,0)    -0.125
(0,0,0,0)     0.000
Table 2
Connections for the third order neural network

First order connections        Second order connections          Third order connections
w_12 = w_21 = 0.125            w_123 = w_213 = w_312 = 0.36      w_1234 = w_2134 = w_3124 = w_4123 = -0.72
w_13 = w_31 = 0.027            w_124 = w_214 = w_412 = 0.36
w_14 = w_41 = -0.512           w_134 = w_314 = w_431 = 0.36
w_23 = w_32 = -0.512           w_234 = w_324 = w_423 = 0.36
w_24 = w_42 = 0.027
w_34 = w_43 = 0.125
This function reaches its local minima at the values of s shown in Table 1, and it may be considered as the energy associated to a neural network that describes the behaviour of a switch element with two inputs and two outputs, which must receive information through a single input and send it through a single output. If we associate the variable s_1 with the input I_1, s_2 with the output O_1, s_3 with O_2 and s_4 with I_2, the configurations that produce a minimum of the function E(s) (Table 1) correspond to good behaviour states of the switch element.

The first neural network that has Eq. (45) as its energy function is the third order Hopfield network resulting from associating Eq. (45) with the generalized Hopfield energy function (46):

E(s) = -\sum_{i}\sum_{j>i} w_{ij} s_i s_j - \sum_{i}\sum_{j>i}\sum_{k>j} w_{ijk} s_i s_j s_k - \sum_{i}\sum_{j>i}\sum_{k>j}\sum_{l>k} w_{ijkl} s_i s_j s_k s_l.          (46)

In this case, a network with four neurons is obtained, with twelve first order connections, twelve second order connections and four third order connections that fulfill the symmetry restrictions of any Hopfield network (Table 2). The second neural network associated to the energy function (45) is a first order feedback network obtained by associating Eq. (45) with Eq. (9) of Theorem 2, with f(x) = x^3. In this case we will have a network with four neurons and twelve first order connections, whose weights (Table 3) fulfill the conditions of Theorem 2 and Theorem 3(b). Besides, the corollary of Theorem 3 justifies that w_{kk} = 0.
Table 3
Connections for the first order neural network

First order connections
w_12 = w_21 = 0.5
w_13 = w_31 = 0.3
w_14 = w_41 = -0.8
w_23 = w_32 = -0.8
w_24 = w_42 = 0.3
w_34 = w_43 = 0.5
Table 4
Possible transitions from state 0111

Initial state    Activated neuron    Next state
0111             1                   0111
0111             2                   0011
0111             3                   0101
0111             4                   0111

It is easy to observe that both networks behave in the same way; the evolution paths for any initial state are the same. For instance, starting at state 0111 (activated input I_2 and outputs O_1 and O_2), both networks will end at state 0011 (activated input I_2 and output O_2) or at state 0101 (activated input I_2 and output O_1). The final state will be the former or the latter, depending on the random sequence of neuron activation (Table 4).
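The whole example can be reproduced with a short brute-force script. The sketch below (our own illustration, not from the paper) encodes Eq. (45), the first order network of Table 3 and the third order network of Table 2; thresholds are taken as zero, which is consistent with the absence of linear terms in Eq. (45). It recovers the stable states of Table 1 and the transitions of Table 4, and checks that both networks make exactly the same single-neuron moves.

```python
import itertools
from fractions import Fraction as F

def E(s):
    """The function to be optimized, Eq. (45)."""
    s1, s2, s3, s4 = s
    third = sum(a * b * c for a, b, c in itertools.combinations(s, 3))
    return (-F('0.125') * (s1 * s2 + s3 * s4) - F('0.027') * (s1 * s3 + s2 * s4)
            + F('0.512') * (s1 * s4 + s2 * s3) - F('0.36') * third
            + F('0.72') * s1 * s2 * s3 * s4)

H = lambda x: 1 if x > 0 else 0

# First order network of Table 3 (exact rational weights, thresholds zero).
W1 = {frozenset(p): F(w) for p, w in
      {(0, 1): '0.5', (0, 2): '0.3', (0, 3): '-0.8',
       (1, 2): '-0.8', (1, 3): '0.3', (2, 3): '0.5'}.items()}
def d_first(k, s):
    return sum(W1[frozenset((k, i))] * s[i] for i in range(4) if i != k)

# Third order network of Table 2.
W3 = {frozenset(p): F(w) for p, w in
      {(0, 1): '0.125', (0, 2): '0.027', (0, 3): '-0.512',
       (1, 2): '-0.512', (1, 3): '0.027', (2, 3): '0.125'}.items()}
def d_third(k, s):
    others = [i for i in range(4) if i != k]
    d = sum(W3[frozenset((k, i))] * s[i] for i in others)
    d += sum(F('0.36') * s[i] * s[j] for i, j in itertools.combinations(others, 2))
    d += F('-0.72') * s[others[0]] * s[others[1]] * s[others[2]]
    return d

def step(d, s, k):
    return s[:k] + (H(d(k, s)),) + s[k + 1:]

for s in itertools.product((0, 1), repeat=4):
    # Both networks make the same move whatever neuron is activated
    # (the third order potential turns out to be the cube of the first order one).
    assert all(step(d_first, s, k) == step(d_third, s, k) for k in range(4))

stable = [s for s in itertools.product((0, 1), repeat=4)
          if all(step(d_first, s, k) == s for k in range(4))]
print("stable states and their energies (Table 1):")
for s in stable:
    print(s, float(E(s)))

print("transitions from 0111 (Table 4):")
for k in range(4):
    print("activate neuron", k + 1, "->", "".join(map(str, step(d_first, (0, 1, 1, 1), k))))
```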
8. Conclusions

The method established by Kobuchi to build energy functions for an asynchronous feedback network opens the way to solving optimization problems with neural networks of lower order than those used up to now. Ideally, we would hope that these networks were first order. This possibility is explored in the first part of this work, and we obtain the restrictions that the order of an optimization function (energy function) would impose on the weights of such a first order neural network. For even order energy functions, that is, odd order auxiliary functions f_k, the restrictions obtained from the first Kobuchi condition are very light, because they only impose a relation between the self-weights and the coefficients of the highest order terms of f_k. In particular, the first condition is equivalent to w_{kk} = -2 b_k, with a_k > 0, for f_k(x) = a_k (x - b_k)^m and b_k \le 0. The function studied in Section 7 is a particular case of this kind of energy functions; it is possible to associate it with two neural networks of different order. On the other hand, obtaining odd order energy functions by means of even order auxiliary functions f_k imposes the restriction that n - m + 1 weights of every neuron be zero, where n is the number of neurons and m is the degree in x of f_k(x). Consequently, a first order neural network cannot solve optimization problems whose associated function is of odd order. The use of high order neural networks is then justified, despite the disadvantage of the combinatorial growth of connectivity.

This limitation of first order networks justifies dedicating the second part of this work to the generalization of the above results to high order, obtaining an expression that allows us to generate different energy functions for an arbitrary order network. A particularization of this method for linear functions f_k(x) gives us an already known case: the Hopfield energy function for high order networks, with non-negative first order self-weights, null second and greater order self-weights, and quasi-symmetrical connections for every order. For future research, we will study the restrictions that non-linear functions f_k would impose on networks of any order. In conclusion, we hope to find the least order network that optimizes an arbitrary order function.
Acknowledgements

This work has been partially supported by the Spanish Comisión Interministerial de Ciencia y Tecnología (CICYT), Project No. TIC92-1325-PB. The authors also want to thank the referees for their collaboration and suggestions.
Appendix A

From the auxiliary functions f(x) = \sum_{i} c_i x^i, i = 1, \ldots, m, we may define the even and odd power polynomials:

f_e(x) = \sum_{\text{even } i} c_i x^i, \qquad f_o(x) = \sum_{\text{odd } i} c_i x^i.          (A.1)

Then, we have the following relations:

f(-x) = \sum_{i=0}^{m} c_i (-x)^i = f_e(x) - f_o(x),          (A.2)

f(x-c) = \sum_{i=0}^{m} c_i (x-c)^i = \sum_{i=0}^{m} \sum_{j=0}^{i} c_i \binom{i}{j} x^j (-c)^{i-j} = f_e(x) + f_o(x) + \sum_{j=0}^{m-1} x^j \sum_{i=j+1}^{m} c_i \binom{i}{j} (-c)^{i-j},          (A.3)

and if we call g_c(x) = f(-x) + f(x-c), adding Eq. (A.2) and Eq. (A.3), we have

g_c(x) = 2 f_e(x) + \sum_{j=0}^{m-1} x^j \sum_{i=j+1}^{m} c_i \binom{i}{j} (-c)^{i-j}
       = \sum_{\text{even } j} x^j \Big( 2 c_j + \sum_{i=j+1}^{m} c_i \binom{i}{j} (-c)^{i-j} \Big) + \sum_{\text{odd } j} x^j \sum_{i=j+1}^{m} c_i \binom{i}{j} (-c)^{i-j}.          (A.4)

We are looking for the terms of g_{w_{kk}}(S_k d_k(s)) with the greatest order in s.

(a) If m is an even number, the term for j = m is:

2 c_m (S_k d_k(s))^m = 2 c_m (d_k(s))^m,          (A.5)

and for j = m-1 it is:

(S_k d_k(s))^{m-1} c_m m (-w_{kk}) = -w_{kk} c_m m S_k^{m-1} (d_k(s))^{m-1} = -w_{kk} c_m m S_k (d_k(s))^{m-1}.          (A.6)

Replacing in Eq. (A.5) the greatest order terms of Eq. (4), we obtain the m-th order terms:

2 c_m w_{k i_1} w_{k i_2} \cdots w_{k i_m} s_{i_1} \cdots s_{i_m},   k \neq i_1, \ldots, i_m,          (A.7)

2 c_m w_{kk} w_{k i_1} \cdots w_{k i_{m-1}} s_k s_{i_1} \cdots s_{i_{m-1}},   k \neq i_1, \ldots, i_{m-1}.          (A.8)

And, in the same way, from Eq. (A.6) we obtain:

-w_{kk} c_m m S_k w_{k i_1} \cdots w_{k i_{m-1}} s_{i_1} \cdots s_{i_{m-1}},   k \neq i_1, \ldots, i_{m-1},          (A.9)

-w_{kk} c_m m S_k w_{kk} w_{k i_1} \cdots w_{k i_{m-2}} s_k s_{i_1} \cdots s_{i_{m-2}},   k \neq i_1, \ldots, i_{m-2},          (A.10)

of which only Eq. (A.9) may produce m-th order terms, as S_k s_k = s_k. Replacing S_k = 2 s_k - 1 in (A.9) we have terms like:

w_{kk} c_m m w_{k i_1} \cdots w_{k i_{m-1}} s_{i_1} \cdots s_{i_{m-1}},   k \neq i_1, \ldots, i_{m-1},          (A.11)

-2 w_{kk} c_m m w_{k i_1} \cdots w_{k i_{m-1}} s_k s_{i_1} \cdots s_{i_{m-1}},   k \neq i_1, \ldots, i_{m-1}.          (A.12)

Terms of the form of Eq. (A.7), Eq. (A.8) and Eq. (A.12) happen to be the only ones of m-th order. Terms (A.8) and (A.12) may be grouped in pairs because they have the same variables s_i, obtaining:

2 w_{kk} (1-m) c_m w_{k i_1} \cdots w_{k i_{m-1}} s_k s_{i_1} \cdots s_{i_{m-1}},   k \neq i_1, \ldots, i_{m-1}.          (A.13)

Finally, the terms (A.7) and (A.13) are the only ones of greatest order. They are of m-th order.

(b) If m is an odd number, from Eq. (A.4), the m-th order coefficient of g_{w_{kk}}(S_k d_k(s)) is zero and the (m-1)-th order coefficient is:

2 c_{m-1} - m c_m w_{kk}.          (A.14)
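The coefficients in Eq. (A.4) and Eq. (A.14) can be confirmed symbolically, for instance with sympy (a convenience check added here; it is not part of the original appendix):

```python
import sympy as sp

x, c = sp.symbols('x c')
m = 5                                            # an odd degree, as in case (b) of Theorem 3
coeffs = sp.symbols(f'c0:{m + 1}')               # c_0, ..., c_m
f = sum(coeffs[i] * x**i for i in range(m + 1))

g = sp.expand(f.subs(x, -x) + f.subs(x, x - c))  # g_c(x) = f(-x) + f(x - c), Eq. (A.4)

print(g.coeff(x, m))        # 0: the m-th order coefficient vanishes for odd m
print(g.coeff(x, m - 1))    # 2*c4 - 5*c*c5, i.e. 2 c_{m-1} - m c c_m, Eq. (A.14)
```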
References

[1] R. Braham and J.O. Hamblen, On the behavior of some associative neural networks, Biological Cybernetics 60 (1988) 145-151.
[2] C.L. Giles and T. Maxwell, Learning, invariance, and generalization in high-order networks, Applied Optics 26 (23) (1987) 4972-4978.
[3] J.J. Hopfield, Neurons with graded response have collective computational properties like those of two-state neurons, Proc. Nat. Acad. Sci. USA 81 (1984) 3088-3092.
[4] J.J. Hopfield and D.W. Tank, Neural computation of decisions in optimization problems, Biological Cybernetics 52 (1985) 141-152.
[5] G. Joya, M.A. Atencia and F. Sandoval, Application of high-order Hopfield neural networks to the solution of diophantine equations, in: A. Prieto (ed.), Artificial Neural Networks, Lecture Notes in Computer Science, No. 540 (Springer-Verlag, 1991) 395-400.
[6] Y. Kobuchi, State evaluation functions and Lyapunov functions for neural networks, Neural Networks 4 (1991) 505-510.
[7] Y. Kobuchi and H. Kawai, Quasi-symmetric logic networks have Lyapunov functions, Proceedings of the Int. Joint Conf. on Neural Networks 2 (1991) 1379-1384.
[8] T. Samad and P. Harper, High-order Hopfield and Tank optimization networks, Parallel Computing 16 (1990) 287-292.
[9] T.J. Sejnowski, Higher-order Boltzmann machines, in: J.S. Denker (ed.), Neural Networks for Computing (American Institute of Physics, 1986).
Gonzalo Joya was born in 1960 and received his B.S. degree in Physics from the University of Granada (Spain) in 1982. He is an Assistant Professor in the Electronics Department of the University of Málaga (Spain) and a member of the Local Committee of the International Workshop on Artificial Neural Networks (IWANN'95) held in Málaga, 1995. He works in the research project "Control and optimization of communication networks with artificial neural networks", supported by the Spanish Science and Technology Interministerial Committee (CICYT), and takes part in the Integrated Action "Computational properties of artificial neural networks: factorial structure and high order" between the University of Málaga and the University Paris 1. His research interests include high-order neural networks and their applications to optimization problems, invariant pattern recognition, and the modeling of reflex behavior in autonomous systems.

Miguel A. Atencia Ruiz was born in 1966 and is a system manager in the Hospital "Carlos Haya" at Málaga. In 1988 he received the diplomate degree in Computer Engineering from the University of Málaga. From 1988 to 1992 he was an investigation support technician in the Computer Architecture and Electronics Department, as well as in the Data Processing Laboratory, at the University of Málaga. Currently, he is carrying out his graduating project, dealing with artificial neural network simulations, in computer engineering at the University of Málaga. He is a member of ALIA (Association of Computer Engineering Graduates).
Francisco Sandoval was born in Spain in 1947. He received the title of Telecommunication Engineering and the Ph.D. degree from the Technical University of Madrid, Spain, in 1972 and 1980, respectively. From 1972 to 1975 he was engaged in research on silicon integrated circuits in the Department of Physics Electronics, UPM. In 1976 he joined the Electronics Department as an Assistant Professor, engaged in research on the characterization and modeling of electro-optical devices. In 1981 he became a Lecturer in the Electronics Department, teaching and carrying out research in the fields of opto-electronics and integrated circuits. In 1990 he joined the University of Málaga as a Full Professor, starting his research on Artificial Neural Networks (ANN). He is currently involved in VLSI design of ANN, application of ANN to Broad Band Communications, high order ANN, and ANN learning models.