Neural Networks, Vol. 6, pp. 1095-1103, 1993. Copyright © 1993 Pergamon Press Ltd.

ORIGINAL CONTRIBUTION

Model of Competitive Learning Based Upon a Generalized Energy Function

TADASHI MASUDA

Industrial Products Research Institute

(Received 30 September 1991; revised and accepted 11 December 1992)

Abstract-A neural network model of competitive learning is proposed. In this model, output cells in the network are self-organized to represent the distribution of input pattern vectors. The self-organization is based upon a generalized energy function. The network is mathematically proved to converge to the global minimum of the energy function when the number of output cells is the same as that of input patterns. In this global minimum, a one-to-one correspondence is established between input patterns and output cells, and an output cell responds exclusively to its corresponding input pattern. The model is compared with conventional models of competitive learning or feature detection. Typical behavior of the network is demonstrated by computer simulation, including the case of clustered input patterns.

Keywords-Competitive learning, Convergence, Global minimum, Potential function, Energy function, Feature detection.

Requests for reprints should be sent to Tadashi Masuda, National Institute of Bioscience and Human Technology, Agency of Industrial Science and Technology, Tsukuba, Ibaraki 305, Japan.

1. INTRODUCTION

Learning of neural networks is either supervised or unsupervised (Lippmann, 1987). In unsupervised learning, a network is expected to self-organize to a state that reflects the distribution of input patterns, while a desired response is not given in an explicit form. Unsupervised learning therefore has the possibility of discovering unknown relationships among input patterns and can be a model of rule finding or concept formation. One type of unsupervised learning is competitive learning, in which output cells in the network inhibit one another (Rumelhart & Zipser, 1985). Various models of competitive learning have been proposed (von der Malsburg, 1973; Fukushima, 1975; Grossberg, 1976; Amari & Takeuchi, 1978; Amari, 1983; Rumelhart & Zipser, 1985; Hecht-Nielsen, 1987; Carpenter & Grossberg, 1988; Kohonen & Makisara, 1989).

In these competitive learning models, the phenomenon of monopoly is a common problem. A monopoly is defined as a state in which a small number of output cells respond to all the input patterns and the remaining cells never respond to any of the input patterns. When the number of output cells is the same as that of input patterns, each output cell is expected to respond to a different input pattern. To avoid the state of monopoly, some models set the initial synaptic connections to be random (von der Malsburg, 1973; Amari & Takeuchi, 1978). However, in this case, once the network falls into the state of monopoly it cannot escape from this state. Other modifications to the models have also been proposed to avoid the state of monopoly. For example, Nakano, Niizuma, and Omori (1989) introduced fatigue into output cells, while Bienenstock, Cooper, and Munro (1982) made the threshold of output cells variable. The effect of these modifications is, however, still unclear from the mathematical point of view.

In this article, we propose a new model of competitive learning based upon a generalized energy function. We prove that the network avoids the state of monopoly independent of the initial values of the synaptic connections and independent of the distribution of input patterns. Typical behavior of the network is demonstrated by computer simulations. We then compare our model with other neural network models of competitive learning or feature detection.

2. COMPETITIVE LEARNING MODEL

Let x_α = (x_α1, ..., x_αK) (α = 1, ..., M) be M input pattern vectors with dimension K. Each component x_αk of these vectors takes any real value. The neural network has two layers of cells. The first layer consists of K input cells and the second includes N output cells (Fig. 1). The synaptic connection between the input and output cells is denoted by w_β = (w_β1, ..., w_βK) (β = 1, ..., N).

[FIGURE 1. Structure of a neural network for competitive learning. Cells in an input layer hold input pattern vectors x_α (α = 1, ..., M), the components x_αk of which take any real value. An output layer consists of N cells. The total activity of the output cells is kept constant, and consequently the output cells are mutually competitive. The learning of the network is carried out by changing the synaptic connections w_β between the cells in the input and output layers.]

When an input pattern x_α is presented to the network, the response ν_β(x_α) of an output cell β is given by

    \nu_\beta(x_\alpha) = |w_\beta - x_\alpha|^{-\phi} \Big\{ \sum_{\gamma=1}^{N} |w_\gamma - x_\alpha|^{-\phi} \Big\}^{-1},    (1)

where φ is a positive integer parameter, which defines the response characteristics. When it is necessary to show the parameter φ explicitly, we will denote the response by ν_β^(φ)(x_α) instead of ν_β(x_α). In a strict sense, it is inadequate to call w_β a synaptic connection, because the response of an output cell is defined by a distance |w_β - x_α| between the input x_α and the connection w_β. In this article, however, we will use the term synaptic connection for convenience. This point is mentioned later in this section. From the definition in (1), the response ν_β(x_α) has the property 0 ≤ ν_β(x_α) ≤ 1, and the total response of all the output cells is constant:

    \sum_{\beta=1}^{N} \nu_\beta(x_\alpha) = 1.

If there is a single output cell that has connection w_β = x_α for a given input x_α, the response becomes ν_β(x_α) = 1 for this cell and consequently ν_γ(x_α) = 0 for the other cells γ (≠ β). In the case φ → ∞, only the cells nearest to an input have a response not equal to 0. When there are n cells nearest to an input pattern at the same distance, the response of these cells becomes ν_β^(∞)(x_α) = 1/n.

We define the potential function for learning by

    R(x_\alpha, W) = \sum_{\beta=1}^{N} \nu_\beta(x_\alpha)^{\psi/\phi} |w_\beta - x_\alpha|^{\psi} = N \Big\{ \sum_{\gamma=1}^{N} |w_\gamma - x_\alpha|^{-\phi} \Big\}^{-\psi/\phi}    (2)

for an input pattern x_α, where ψ is another positive integer parameter and W represents all the connections w_β (β = 1, ..., N). Assuming the input patterns x_α are randomly presented to the network with probability p_α (α = 1, ..., M), the energy (or loss) function of the network is defined by

    E(W) = \sum_{\alpha=1}^{M} p_\alpha R(x_\alpha, W) = \sum_{\alpha=1}^{M} p_\alpha \sum_{\beta=1}^{N} \nu_\beta(x_\alpha)^{\psi/\phi} |w_\beta - x_\alpha|^{\psi}.    (3)

In the case the number of output cells is equal to or greater than the number of input patterns (M ≤ N), the minimum value of E(W) is 0, which is achieved by the following configuration: there exists at least one output cell β for each input pattern x_α such that this cell satisfies |w_β - x_α| = 0, and the other cells γ (≠ β) consequently have a response ν_γ(x_α) = 0. In the case M = N, E(W) = 0 is achieved if a one-to-one correspondence between input patterns and output cells is established and w_β = x_α holds between the corresponding pairs of input patterns and output cells. We call this state a complete matching between input patterns and output cells.

The rule of learning is derived from the potential function in (2):

    \Delta w_\beta = -c_m \, \partial R(x_\alpha, W)/\partial w_\beta = -c_m N \psi \, \nu_\beta(x_\alpha)^{(\psi+\phi)/\phi} |w_\beta - x_\alpha|^{\psi-2} (w_\beta - x_\alpha),    (4)

where c_m is a small positive real constant defining the rate of self-organization and the partial derivative of R(x_α, W) by w_β is

    \partial R(x_\alpha, W)/\partial w_\beta = (\partial R/\partial w_{\beta 1}, ..., \partial R/\partial w_{\beta K}).

Δw_β represents the change of the synaptic connection when an input pattern x_α is presented to the network. As a result of this self-organization, the synaptic connection w_β changes on average so as to decrease the energy E(W), which is an ensemble average of R(x_α, W) (Amari, 1977). In the following discussion, we assume ψ = 2, which simplifies the learning rule as

    \Delta w_\beta = -2 c_m N \, \nu_\beta(x_\alpha)^{(\phi+2)/\phi} (w_\beta - x_\alpha).    (5)

It is noted, however, that we can obtain similar results without this assumption. The rule of learning indicates that when an input pattern x_α is presented, the synaptic connection w_β changes toward x_α. The magnitude of the change is proportional to ν_β(x_α)^{(φ+2)/φ} |w_β - x_α|. This indicates that a cell nearer to an input pattern modifies its synaptic connection by a greater amount, because if |w_β - x_α| < |w_γ - x_α| we have the relation

    \nu_\beta(x_\alpha)^{(\phi+2)/\phi} |w_\beta - x_\alpha| > \nu_\gamma(x_\alpha)^{(\phi+2)/\phi} |w_\gamma - x_\alpha|.

When only a single cell β has connection w_β = x_α for a given input x_α, the response becomes ν_β(x_α) = 1 for this cell and ν_γ(x_α) = 0 for the other cells γ (≠ β). In this case, their connections w_γ (γ ≠ β) never change. When each output cell has its corresponding input pattern and is organized to be w_β = x_α (the state of a complete matching), the connection w_β is not affected by the probability of the presentation p_α.

After the self-organization of the network is completed, the recall of memorized patterns is simply carried out by choosing the cell that responds most strongly to a given input. The recalled pattern is then given by the synaptic connection w_β of this cell. Our model can therefore be regarded as a kind of table look-up memory (Kohonen, 1980).

To define the response ν_β(x_α) of output cells in (1), we used a distance |w_β - x_α|. In conventional models of neural networks, the output of a neuron is usually defined as a function of an inner product w_β · x_α of the input pattern x_α and the synaptic connection w_β. If we assume the norm of the input patterns and that of the synaptic connections to be constant, these two definitions become equivalent, because we have the relation |w_β - x_α|² = |w_β|² + |x_α|² - 2 w_β · x_α. Most of the models of competitive learning assume these conditions (von der Malsburg, 1973; Fukushima, 1975; Amari & Takeuchi, 1978; Rumelhart & Zipser, 1985; Hecht-Nielsen, 1987). In this article, we used a distance because it makes the analysis of the convergence possible. Kohonen and Makisara (1989) also used a distance to define the response of output cells.
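To make the definitions of this section concrete, the model can be written in a few lines of code. The following NumPy sketch is our illustration, not part of the original article; it assumes ψ = 2 as in (5), and all function names are ours.

    import numpy as np

    def responses(W, x, phi):
        # Eq. (1): nu_beta(x) = |w_beta - x|^(-phi) / sum_gamma |w_gamma - x|^(-phi)
        d = np.linalg.norm(W - x, axis=1)           # distances |w_beta - x|
        if np.any(d == 0.0):                        # cells matching exactly share the whole response
            hit = (d == 0.0)
            return hit / hit.sum()
        r = d ** (-float(phi))
        return r / r.sum()

    def energy(W, X, p, phi):
        # Eq. (3) with psi = 2, via the closed form in eq. (2):
        # E(W) = sum_alpha p_alpha * N * {sum_gamma |w_gamma - x_alpha|^(-phi)}^(-2/phi)
        N = W.shape[0]
        E = 0.0
        for pa, xa in zip(p, X):
            d = np.maximum(np.linalg.norm(W - xa, axis=1), 1e-12)  # guard exact matches
            E += pa * N * np.sum(d ** (-float(phi))) ** (-2.0 / phi)
        return E

    def present(W, x, phi, c_m):
        # Eq. (5): Delta w_beta = -2 c_m N nu_beta(x)^((phi+2)/phi) (w_beta - x)
        nu = responses(W, x, phi)
        return W - 2.0 * c_m * W.shape[0] * (nu ** ((phi + 2.0) / phi))[:, None] * (W - x)

    def recall(W, x, phi):
        # table look-up recall: the connection of the cell responding most strongly
        return W[np.argmax(responses(W, x, phi))]

Note that every cell moves toward every presented pattern; the factor ν_β(x_α)^{(φ+2)/φ} merely concentrates the update on the cells closest to the input. This soft weighting is what distinguishes the rule from hard winner-take-all learning.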

3. BEHAVIOR OF THE MODEL

Let us first assume that the number of output cells is the same as the number of input patterns (M = N). In this case, if the parameter φ is large enough, the generalized energy function E(W) has no local minima and the network converges to the global minimum of the energy function, where a complete matching between input patterns and output cells is established. The proof is given in Appendix A. We also show in Appendix D that in the case φ is not large enough, the network may have a local minimum of the energy function. The important point of this convergence property is that the parameter φ must be large enough, but it must not be infinite. If φ is infinite, only the cell nearest to a given input pattern modifies its synaptic connection. In this case, once the network is in the state of monopoly it cannot escape from this state, and the energy function does not converge to the global minimum. As cited in Section 1, this is a common problem of competitive learning models. In our model, if we make the parameter φ larger, the network needs a longer time to escape from the state of monopoly.

When the number of input patterns is larger than the number of output cells (M > N), a one-to-one correspondence between input patterns and output cells cannot be established. In this case, however, we can prove that the network will be organized to the configuration in which each output cell β has at least one input pattern x_α that satisfies the relation ν_β(x_α) > 1/2. The proof is given in Appendix B. This means each output cell has at least one input pattern to which the cell gives a stronger response than the other cells. In other words, no cell is organized to have a weak response of ν_β(x_α) < 1/2 to all of the input patterns.

In the special case of the condition mentioned above, where the input patterns form several clusters and the number of the clusters is the same as the number of output cells, each cluster can be regarded as a single input as far as the diameter of the clusters is small compared with the distance between the clusters. The network is then organized to establish a one-to-one correspondence between output cells and the clusters of input patterns, because the learning is based upon a continuous energy function, and input patterns with small variability within clusters will yield a small deviation of the energy function compared with input patterns without variability. It should be noticed that with a larger value of φ the diameter of the clusters must be smaller.

When the number of input patterns is smaller than the number of output cells (M < N), each input pattern comes to have its corresponding output cell and the energy function becomes 0. Some output cells do not respond to any of the input patterns. No further change will occur after the network has reached this state. If there is a small variability around the input patterns, however, these input patterns satisfy the condition mentioned above where M > N and the input patterns form several clusters. The network therefore converges to the state where each cluster has at least one output cell and each cell responds to at least one input pattern.
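The role of a large but finite φ can be illustrated numerically. In the following sketch (our illustration, with made-up distances), one cell monopolizes an input while a second cell sits far away; the far cell's update weight ν_γ(x_α)^{(φ+2)/φ}|w_γ - x_α| in (5) shrinks rapidly as φ grows, which is why escape from a monopoly becomes slower for larger φ and impossible in the limit φ → ∞.

    import numpy as np

    d = np.array([0.1, 2.0])                  # distances of a monopolist cell and a distant cell
    for phi in (2, 4, 8, 16):
        r = d ** (-float(phi))
        nu = r / r.sum()                      # responses, eq. (1)
        upd = nu ** ((phi + 2.0) / phi) * d   # update magnitudes per eq. (5), up to the factor 2 c_m N
        print(phi, upd[1] / upd[0])           # relative motion of the distant cell; decays with phi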

4. COMPUTER SIMULATION

To confirm the behavior of the model described in the previous section, we made computer simulations. The details of the results will be reported in a separate article. In this section, we show typical results obtained from three sets of input patterns.

In the first simulation, the input patterns were 10 alphabetic characters consisting of 4 x 3 pixels, as shown in the top row of Figure 2(a). These patterns were used by Matsuoka (1989) in a simulation of an associative memory. The initial synaptic connections w_β^0 were generated randomly from values between 0 and 1. These initial values are shown in the row of iteration 0. As in the study by Matsuoka, we indicate the values of the synaptic connections at three levels (0.0-0.4, 0.4-0.7, and 0.7-1.0). The parameter of the model was φ = 2. We presented the 10 patterns in a random order with an equal probability p_α = 0.1. Figure 2(b) shows the change of the generalized energy function. During the process of self-organization described in (5), the energy E(W) defined in (3) decreased monotonically to 0 after 3,250 iterations. In this state, the network established a complete matching between the 10 input patterns and the 10 output cells, and each cell was organized to respond exclusively to its corresponding input pattern.

[FIGURE 2. Simulation of memorizing the 10 character patterns x_α shown in the top row (φ = 2, c_m = 0.01). The initial values of the synaptic connections w_β^0 were randomly generated from values between 0 and 1. As the learning proceeded, the 10 cells in the output layer were self-organized to respond to different input patterns. Finally, a one-to-one correspondence was established between the input patterns and the output cells (a). The generalized energy function E(W) of the network decreased monotonically during the learning and converged to 0 when the one-to-one correspondence was established (b).]

In the second simulation, we presented 10 clusters of input patterns. These clusters were generated from the 10 character patterns used in Figure 2: from each original pattern shown in the top rows of Figures 2 and 3, one of the 12 pixels was changed from 0 to 1 or from 1 to 0. The 13 patterns including the original one comprised an individual cluster. The result is shown in Figure 3. The network successfully converged to establish a one-to-one correspondence between the clusters of input patterns and the output cells. The energy function decreased monotonically, but it did not converge to 0 because the input patterns have a variability within the clusters.

[FIGURE 3. Simulation of memorizing input patterns that formed 10 clusters. Each cluster comprised one of the 10 original patterns shown in the top row and the 12 patterns that differ from it in one pixel. Similarly to the results shown in Figure 2, the 10 output cells successfully converged to establish a one-to-one correspondence with the input pattern clusters. During the self-organization, the generalized energy function decreased monotonically but did not converge to 0 because of the variability of input patterns within the clusters.]

In the third simulation, the number of clusters was reduced to 5 whereas the number of output cells remained 10 (Fig. 4). Initially, five output cells were organized to respond to the five clusters. During this phase, the energy function decreased rapidly. As the self-organization proceeded further, the other five output cells also came to respond to one of the input clusters. Finally, each cluster had two output cells responsive to it. During this phase, the energy function decreased relatively slowly. As in the result shown in Figure 3, the energy function did not converge to 0 because of the variability of input patterns within the clusters.

[FIGURE 4. Simulation of memorizing input patterns that formed five clusters. In this case, the number of clusters is less than the number of output cells. Initially, five output cells were organized to detect the five input pattern clusters. As the self-organization proceeded, the remaining five output cells were also organized to detect one of the input pattern clusters. Finally, each cluster had two output cells responsive to its input patterns. The generalized energy function decreased monotonically. After the first five cells were organized to respond to the five clusters, the rate of the decrease became smaller.]
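The first simulation can be reproduced in outline with the following self-contained script. This is our sketch, not the original code: the exact 4 x 3 character bitmaps of Matsuoka (1989) are not listed in the article, so random binary patterns stand in for them; the parameters follow the text (φ = 2, c_m = 0.01, p_α = 0.1).

    import numpy as np

    rng = np.random.default_rng(0)
    M = N = 10                                   # input patterns and output cells
    K = 12                                       # 4 x 3 = 12 pixels
    phi, c_m = 2, 0.01
    p = np.full(M, 1.0 / M)                      # equal presentation probability
    X = rng.integers(0, 2, size=(M, K)).astype(float)   # stand-in binary patterns
    W = rng.random((N, K))                       # initial connections in [0, 1)

    def nu(W, x):
        d = np.maximum(np.linalg.norm(W - x, axis=1), 1e-12)
        r = d ** (-float(phi))
        return r / r.sum()                       # eq. (1)

    def energy(W):
        E = 0.0
        for pa, xa in zip(p, X):
            d = np.maximum(np.linalg.norm(W - xa, axis=1), 1e-12)
            E += pa * N * np.sum(d ** (-float(phi))) ** (-2.0 / phi)
        return E                                 # eq. (3) with psi = 2

    for t in range(4001):
        x = X[rng.choice(M, p=p)]                # random presentation
        W -= 2 * c_m * N * (nu(W, x) ** ((phi + 2.0) / phi))[:, None] * (W - x)  # eq. (5)
        if t % 1000 == 0:
            print(t, round(energy(W), 4))        # should decay toward 0 (complete matching)

The clustered simulations of Figures 3 and 4 follow the same loop; one only replaces X by samples drawn from the clusters, in which case E(W) levels off at a small positive value instead of reaching 0.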

5. COMPARISON WITH OTHER MODELS

We consider that the following four conditions should be required as the characteristics of competitive learning networks:
1. When the number of output cells is the same as the number of input patterns, the network establishes a one-to-one correspondence between input patterns and output cells. This means the network must avoid the state of monopoly.
2. The one-to-one correspondence is established independent of the probability p_α of the presentation of input patterns.
3. The one-to-one correspondence is established even if there is a small variability around the input patterns.
4. The network has no prior knowledge concerning the configuration of input patterns, for example, the minimum distance between different input patterns.

As mentioned in Section 1, when only the output cell nearest to a given input pattern modifies its synaptic connection (Hecht-Nielsen, 1987), the network has the problem of monopoly and therefore does not satisfy condition 1. To avoid the problem of monopoly, some models revised the rule of learning such that the output cell nearest to a given input pattern modifies its synaptic connection only when the input pattern is within a distance determined before the self-organization (Carpenter & Grossberg, 1988). However, this modification is against condition 4. If the minimum distance between input patterns is smaller than the one expected before the self-organization, the network cannot establish a one-to-one correspondence between input patterns and output cells; some output cells may remain irresponsive to any input patterns. On the other hand, if the distance to input patterns is set too small and individual input patterns have variability around them, the variability of input patterns attracts more than one output cell to a single input pattern, and some input patterns do not have any output cells responsive to them.

Amari (1983) proposed a different model of competitive learning. In his model, if the mutual inhibition between output cells is weak or negligible and the output cells work independently, the network falls into a state in which all the output cells respond to a small number of input patterns and the other input patterns have no output cells responsive to them. This state is regarded as a monopoly of output cells by a small number of input patterns. If the mutual inhibition is made stronger, the network suffers from a monopoly of input patterns, which was discussed in Section 1.

In the model of topographical mapping, Kohonen (1982) and Kohonen and Makisara (1989) introduced a self-organization rule in which not only the output cell nearest to an input pattern but also the output cells adjacent to this nearest cell modify their synaptic connections. Kohonen (1982) mathematically analyzed the ordering process of this model in the case that the output cells are arranged in a one-dimensional space. No further analysis has, however, been made for a general space with dimensions higher than one.

Some other models introduced fatigue into the characteristics of output cells and made the threshold of output cells variable with time (Bienenstock et al., 1982; Nakano et al., 1989). In this case, the number of output cells responsive to an input pattern becomes proportional to the probability of the presentation p_α. Consequently, an input pattern that appears less frequently may not have an output cell responsive to it. These models therefore do not satisfy condition 2.

To satisfy condition 3 and establish a one-to-one correspondence between output cells and clusters of input patterns, it is sufficient for the neural network to have a rule of self-organization that is based upon a continuous potential function. An energy function is an ensemble average of a potential function weighted by the probability p_α of the presentation of input patterns. In the models proposed by Fukushima (1975) and Rumelhart and Zipser (1985), only the rule of self-organization is given and a potential function underlying the rule is not clarified. Consequently, it is difficult to analyze the convergence of the self-organization.

6. CONCLUSION

We proposed a model of competitive learning based upon a generalized energy function and proved that when the network has the same number of output cells as that of input patterns, the network converges to the state where the global minimum of the energy function is achieved. In this state, a one-to-one correspondence is established between input patterns and output cells. The network thus avoids the state of monopoly, where a small number of output cells respond to all the input patterns and the remaining cells never respond to any of the input patterns.

The convergence of the self-organization was not proved for the case where the number of output cells is smaller than that of input patterns. However, we can expect the network to converge to the global minimum of the energy function when input patterns are distributed in a small number of clusters, because the self-organization of the network is based upon a continuous energy function. Computer simulations for three sets of input patterns demonstrated the behavior of the network predicted by the mathematical analysis.

REFERENCES

Amari, S. (1977). Neural theory of association and concept-formation. Biological Cybernetics, 26, 175-185.
Amari, S., & Takeuchi, A. (1978). Mathematical theory on formation of category detecting nerve cells. Biological Cybernetics, 29, 127-136.
Amari, S. (1983). Field theory of self-organizing neural network. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13, 741-748.
Bienenstock, E. L., Cooper, L. N., & Munro, P. W. (1982). Theory for the development of neuron selectivity: Orientation specificity and binocular interaction in visual cortex. Journal of Neuroscience, 2, 32-48.
Carpenter, G. A., & Grossberg, S. (1988). The ART of adaptive pattern recognition by a self-organizing neural network. IEEE Computer, 21, 77-88.
Fukushima, K. (1975). Cognitron: A self-organizing multilayered neural network. Biological Cybernetics, 20, 121-136.
Grossberg, S. (1976). Adaptive pattern classification and universal recoding: I. Parallel development and coding of neural feature detectors. Biological Cybernetics, 23, 121-134.
Hecht-Nielsen, R. (1987). Counterpropagation networks. Applied Optics, 26, 4979-4984.
Kohonen, T. (1980). Content-addressable memories. Berlin: Springer.
Kohonen, T. (1982). Analysis of a simple self-organizing process. Biological Cybernetics, 44, 135-145.
Kohonen, T., & Makisara, K. (1989). The self-organizing feature maps. Physica Scripta, 39, 168-172.
Lippmann, R. P. (1987). An introduction to computing with neural nets. IEEE ASSP Magazine, 4, 4-22.
Matsuoka, K. (1989). An associative network with cross inhibitory connections. Biological Cybernetics, 61, 393-399.
Nakano, K., Niizuma, M., & Omori, T. (1989). Model of neural visual system with self-organizing cells. Biological Cybernetics, 60, 195-202.
Rumelhart, D. E., & Zipser, D. (1985). Feature discovery by competitive learning. Cognitive Science, 9, 75-112.
von der Malsburg, C. (1973). Self-organization of orientation sensitive cells in the striate cortex. Kybernetik, 14, 85-100.

NOMENCLATURE

c_m: small positive real constant defining the rate of self-organization
E(W): generalized energy function of the neural network, indicating the degree of adaptation to input patterns
K: dimension of input pattern vectors
M: number of input patterns
N: number of output cells
p_α (α = 1, ..., M): presentation probability of input pattern x_α
R(x_α, W): potential function defining the self-organization rule of the neural network
ν_β(x_α): response of an output cell β to an input pattern x_α
w_β = (w_β1, ..., w_βK) (β = 1, ..., N): synaptic connection vector of an output cell β
W: matrix representation of the synaptic connection vectors w_β for all the output cells in the network
x_α = (x_α1, ..., x_αK) (α = 1, ..., M): input pattern vectors presented during self-organization
φ, ψ: positive integer parameters defining the characteristics of the network

APPENDIX A: PROOF OF THE CONVERGENCE OF THE LEARNING

In this appendix, we discuss the case where the number of output cells N is the same as the number of input patterns M. We will prove that if the parameter φ is large enough, the energy function E(W) defined in (3) does not have any local minima, and the network converges to the state that realizes the global minimum of the energy function, independent of the initial values of the synaptic connections w_β^0. In this global minimum, the network establishes a complete matching between input patterns and output cells.

The equilibrium of the learning is given by the condition that the partial derivative of the energy function E(W) by w_β is 0, that is, that

    \partial E(W)/\partial w_\beta = 2N \sum_{\alpha=1}^{M} p_\alpha \nu_\beta(x_\alpha)^{(\phi+2)/\phi} (w_\beta - x_\alpha) = 0    (A.1)

holds for any output cell, where the parameter ψ is set to be ψ = 2. When a complete matching between input patterns and output cells is established, the energy function satisfies condition (A.1). The state of a complete matching, therefore, is an equilibrium of the network. Moreover, because this state gives an isolated minimum of the energy E(W), this equilibrium is stable.

To analyze the stability of other possible equilibria, we calculate the second derivative of the energy function with respect to w_βk (1 ≤ β ≤ N, 1 ≤ k ≤ K):

    \partial^2 E/\partial w_{\beta k}^2 = 2N \sum_{\alpha=1}^{M} p_\alpha \nu_\beta(x_\alpha)^{(\phi+2)/\phi} |w_\beta - x_\alpha|^{-2} \{-(\phi+2)(1 - \nu_\beta(x_\alpha))(w_{\beta k} - x_{\alpha k})^2 + |w_\beta - x_\alpha|^2\}.    (A.2)

If this term is negative, the equilibrium of the network becomes unstable. By summing up this term with respect to k, we have

    \sum_{k=1}^{K} \partial^2 E/\partial w_{\beta k}^2 = 2N \sum_{\alpha=1}^{M} p_\alpha \nu_\beta(x_\alpha)^{(\phi+2)/\phi} \{-(\phi+2)(1 - \nu_\beta(x_\alpha)) + K\}.    (A.3)

If this summation is negative, ∂²E/∂w_βk² < 0 holds for at least one of the k, and the network again becomes unstable.

In the following discussion, we classify the configurations of input patterns and synaptic connections according to the value of ν_β^(∞)(x_α). We will prove that the network is not in a stable equilibrium unless a complete matching is established between input patterns and output cells. We classify the configurations into the following two cases:
1. There exists at least one output cell β such that for any input pattern x_α it has a value ν_β^(∞)(x_α) ≤ 1/2.
2. For each output cell β, there exists an input pattern x_α such that ν_β^(∞)(x_α) > 1/2.
Condition 2 is the negation of condition 1. Because the value ν_β^(∞)(x_α) can only be 0 or 1/n (n = 1, 2, ...) (see Appendix C), condition 2 becomes
2'. For each output cell β, there exists an input pattern x_α such that ν_β^(∞)(x_α) = 1.

Let us first discuss case 1. We denote by β an output cell that satisfies this condition. In the case ν_β^(∞)(x_α) > 0, we have 1 - ν_β^(φ)(x_α) ≥ 1 - ν_β^(∞)(x_α) > 0 (see Appendix C) because of the condition ν_β^(∞)(x_α) ≤ 1/2. In the case ν_β^(∞)(x_α) = 0, we also have 1 - ν_β^(φ)(x_α) > 0 for a large value of the parameter φ. Therefore, in both cases, if we make φ sufficiently large we can increase the term (φ+2)(1 - ν_β(x_α)) to any extent. This means we can make {-(φ+2)(1 - ν_β(x_α)) + K} in (A.3) negative. We then consider the term ν_β(x_α)^{(φ+2)/φ} in (A.3). If a complete matching between input patterns and output cells is not established, there exists at least one input x_α such that x_α ≠ w_γ and 0 < ν_γ^(φ)(x_α) < 1 for any cell γ, including β. The term ν_β(x_α)^{(φ+2)/φ} therefore becomes positive. In conclusion, the summation of the second derivatives of the energy function, Σ_{k=1}^K ∂²E/∂w_βk², becomes negative for a large value of φ. This means an equilibrium is unstable if the network satisfies condition 1.

Next we discuss case 2'. In this case, we will show that the network does not satisfy the equilibrium condition (A.1) unless the network establishes a complete matching between input patterns and output cells. For a given output cell β, we denote by x_α the corresponding input pattern that satisfies ν_β^(∞)(x_α) = 1. When we make φ larger, ν_β^(φ)(x_α) approaches 1 and the response ν_γ^(φ)(x_α) of the other cells approaches 0. For a sufficiently large φ, we therefore have

    \nu_\beta(x_\alpha) > \sum_{\gamma=1, \gamma \neq \beta}^{N} \nu_\gamma(x_\alpha).    (A.4)

Because the denominator Σ_{γ=1}^N |w_γ - x_α|^{-φ} of ν_β(x_α) and ν_γ(x_α) is common to all the output cells, by multiplying this term into both sides of (A.4) we have

    |w_\beta - x_\alpha|^{-\phi} > \sum_{\gamma=1, \gamma \neq \beta}^{N} |w_\gamma - x_\alpha|^{-\phi}.    (A.5)

Because the synaptic connection vector w_β is located nearer to the input pattern vector x_α than any other connection w_γ,

    |w_\beta - x_\alpha|^{-1} > |w_\gamma - x_\alpha|^{-1}.

By multiplying this inequality into each side of (A.5),

    |w_\beta - x_\alpha|^{-\phi-1} > \sum_{\gamma=1, \gamma \neq \beta}^{N} |w_\gamma - x_\alpha|^{-\phi-1}.    (A.6)

From the definition of ν_β(x_α),

    \nu_\beta(x_\alpha)^{(\phi+2)/\phi} |w_\beta - x_\alpha| = |w_\beta - x_\alpha|^{-\phi-1} \Big\{ \sum_{\gamma=1}^{N} |w_\gamma - x_\alpha|^{-\phi} \Big\}^{-(\phi+2)/\phi}.

By multiplying the term {Σ_{γ=1}^N |w_γ - x_α|^{-φ}}^{-(φ+2)/φ} into both sides of (A.6),

    \nu_\beta(x_\alpha)^{(\phi+2)/\phi} |w_\beta - x_\alpha| \ge \sum_{\gamma=1, \gamma \neq \beta}^{N} \nu_\gamma(x_\alpha)^{(\phi+2)/\phi} |w_\gamma - x_\alpha|.    (A.7)

In this inequality, the equality holds when w_β = x_α. By multiplying by the probability p_α and summing up both sides of (A.7) over all the input patterns x_α, we have

    \sum_{\alpha=1}^{M} p_\alpha \nu_\beta(x_\alpha)^{(\phi+2)/\phi} |w_\beta - x_\alpha| - \sum_{\alpha=1}^{M} p_\alpha \sum_{\gamma=1, \gamma \neq \beta}^{N} \nu_\gamma(x_\alpha)^{(\phi+2)/\phi} |w_\gamma - x_\alpha| \ge 0.    (A.8)

The equality holds when all the output cells β have the relation w_β = x_α for their corresponding inputs x_α. It is noticed that β in (A.8) is determined by α; that is, β is a function of α. We then modify (A.8) into the form of a summation over output cells. We again denote by x_α the input pattern that corresponds to a given cell β; in this case, α is a function of β. We then have

    \sum_{\beta=1}^{N} p_\alpha \nu_\beta(x_\alpha)^{(\phi+2)/\phi} |w_\beta - x_\alpha| - \sum_{\beta=1}^{N} \sum_{\delta=1, \delta \neq \alpha}^{M} p_\delta \nu_\beta(x_\delta)^{(\phi+2)/\phi} |w_\beta - x_\delta| \ge 0.

Because the left-hand side of this inequality is either positive or 0, the relation

    p_\alpha \nu_\beta(x_\alpha)^{(\phi+2)/\phi} |w_\beta - x_\alpha| \ge \sum_{\delta=1, \delta \neq \alpha}^{M} p_\delta \nu_\beta(x_\delta)^{(\phi+2)/\phi} |w_\beta - x_\delta|    (A.9)

holds for at least one β. In this inequality, the equality holds under the condition w_β = x_α and ν_β(x_δ) = 0 (δ ≠ α). From (A.1), the condition for the equilibrium is

    p_\alpha \nu_\beta(x_\alpha)^{(\phi+2)/\phi} (w_\beta - x_\alpha) = - \sum_{\delta=1, \delta \neq \alpha}^{M} p_\delta \nu_\beta(x_\delta)^{(\phi+2)/\phi} (w_\beta - x_\delta).

If we calculate the absolute value of this vector, we have the relation

    p_\alpha \nu_\beta(x_\alpha)^{(\phi+2)/\phi} |w_\beta - x_\alpha| \le \sum_{\delta=1, \delta \neq \alpha}^{M} p_\delta \nu_\beta(x_\delta)^{(\phi+2)/\phi} |w_\beta - x_\delta|    (A.10)

for any cell β, because |a + b| ≤ |a| + |b| for any vectors a and b. To make inequalities (A.9) and (A.10) compatible, the equality must hold in both inequalities. This means there exists no equilibrium other than the state of a complete matching under condition 2'.
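For reference, the gradient in (A.1) can be checked directly from the closed form of the potential function in (2). The following short derivation is our addition, not part of the original text; it assumes ψ = 2, as in the body of the article:

    \begin{align*}
    R(x_\alpha, W) &= N S_\alpha^{-2/\phi}, \qquad S_\alpha = \sum_{\gamma=1}^{N} |w_\gamma - x_\alpha|^{-\phi}, \\
    \frac{\partial S_\alpha}{\partial w_\beta} &= -\phi\, |w_\beta - x_\alpha|^{-\phi-2} (w_\beta - x_\alpha), \\
    \frac{\partial R}{\partial w_\beta} &= -\frac{2N}{\phi}\, S_\alpha^{-(\phi+2)/\phi}\, \frac{\partial S_\alpha}{\partial w_\beta}
      = 2N\, S_\alpha^{-(\phi+2)/\phi}\, |w_\beta - x_\alpha|^{-\phi-2} (w_\beta - x_\alpha) \\
    &= 2N\, \nu_\beta(x_\alpha)^{(\phi+2)/\phi} (w_\beta - x_\alpha),
    \end{align*}

since ν_β(x_α)^{(φ+2)/φ} = |w_β - x_α|^{-(φ+2)} S_α^{-(φ+2)/φ}. Averaging over the presentation probabilities p_α then gives (A.1), and differentiating once more componentwise gives (A.2).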

APPENDIX B: CONVERGENCE OF THE NETWORK WHEN THE NUMBER OF INPUT PATTERNS IS GREATER THAN THAT OF OUTPUT CELLS

We will show in this appendix that when the number of input patterns is greater than that of output cells, each output cell β will be organized to have at least one input pattern x_α such that ν_β^(∞)(x_α) > 1/2. This means each output cell responds to some input patterns more strongly than the other cells do. Let us assume there exists an output cell β that has a response ν_β^(∞)(x_α) ≤ 1/2 for all of the input patterns x_α. As shown in Appendix A, we then have 1 - ν_β^(φ)(x_α) > 0. Consequently, the term {-(φ+2)(1 - ν_β(x_α)) + K} in (A.3) becomes negative when we make the parameter φ sufficiently large. Because the number of input patterns is assumed to be greater than that of output cells, there exists at least one input pattern x_α that satisfies x_α ≠ w_γ for every cell γ. The summation of the second derivatives of the energy function, Σ_{k=1}^K ∂²E/∂w_βk², therefore becomes negative, and the network is unstable. As a conclusion, every output cell has at least one input pattern x_α such that ν_β^(∞)(x_α) > 1/2.

APPENDIX C: A LEMMA CONCERNING THE RESPONSE OF OUTPUT CELLS

In the case φ → ∞, output cells can only have a response ν_β^(∞)(x_α) = 0 or ν_β^(∞)(x_α) = 1/n (n = 1, 2, ...). A response ν_β^(∞)(x_α) = 1/n > 0 means that there exist n output cells whose connections are nearest to the given input x_α. In this appendix, we will show that when the synaptic connections w_β and the input patterns x_α are given, the responses of the output cells satisfy the relation

    \nu_\beta^{(\phi)}(x_\alpha) \le \nu_\beta^{(\phi+1)}(x_\alpha) \le \nu_\beta^{(\infty)}(x_\alpha)

if ν_β^(∞)(x_α) = 1/n > 0 (n = 1, 2, ...). Let us denote by w_β (β = 1, ..., n) the synaptic connections of the n output cells nearest to the input x_α and by w_γ (γ = n+1, ..., N) the connections of the other cells. We can then express the response as

    \nu_\beta^{(\phi)}(x_\alpha) = |w_\beta - x_\alpha|^{-\phi} \Big\{ n |w_\beta - x_\alpha|^{-\phi} + \sum_{\gamma=n+1}^{N} |w_\gamma - x_\alpha|^{-\phi} \Big\}^{-1}.

Assuming |w_β - x_α| ≠ 0, the relation |w_β - x_α|^{-1} > |w_γ - x_α|^{-1} holds for γ = n+1, ..., N. By multiplying |w_β - x_α|^{-1} and |w_γ - x_α|^{-1} into the left-hand and the right-hand sides of the above expression, respectively, we obtain

    |w_\beta - x_\alpha|^{-\phi-1} - n \nu_\beta^{(\phi)}(x_\alpha) |w_\beta - x_\alpha|^{-\phi-1} > \nu_\beta^{(\phi)}(x_\alpha) \sum_{\gamma=n+1}^{N} |w_\gamma - x_\alpha|^{-\phi-1}.

We therefore have

    \nu_\beta^{(\phi)}(x_\alpha) < |w_\beta - x_\alpha|^{-\phi-1} \Big\{ n |w_\beta - x_\alpha|^{-\phi-1} + \sum_{\gamma=n+1}^{N} |w_\gamma - x_\alpha|^{-\phi-1} \Big\}^{-1} = \nu_\beta^{(\phi+1)}(x_\alpha).

In the case |w_β - x_α| = 0, we have ν_β^(φ)(x_α) = ν_β^(φ+1)(x_α). As a conclusion, in both cases we have the relation ν_β^(φ)(x_α) ≤ ν_β^(φ+1)(x_α). Because this holds for every φ and ν_β^(φ)(x_α) converges to ν_β^(∞)(x_α) = 1/n as φ → ∞, the relation above follows.

APPENDIX D: EXAMPLE OF THE FAILURE IN THE CONVERGENCE

In this appendix, we will show that the energy function E(W) has a stable local minimum if the parameter φ is sufficiently small. Let us assume that the number of input patterns M is the same as the number of output cells N and that M = N = 2K, where K is the dimension of the input pattern vectors. Further, let the input vectors be

    x_1 = (1, 0, ..., 0), \quad x_2 = (-1, 0, ..., 0), \quad x_3 = (0, 1, ..., 0), \quad x_4 = (0, -1, ..., 0), \quad ...,

presented with the equal probability p_α = N^{-1}, and let the initial values of the synaptic connections be

    w_\beta = (0, ..., 0) \quad (\beta = 1, ..., N).

Consequently, ν_β(x_α) = N^{-1}. From (A.1), we have

    \partial E(W)/\partial w_\beta = 2N \sum_{\alpha=1}^{M} p_\alpha \nu_\beta(x_\alpha)^{(\phi+2)/\phi} (w_\beta - x_\alpha) = 0

for any cell β, because the input vectors sum to 0. This means these w_β give an equilibrium of the energy function E(W). We will show that this equilibrium is stable when φ is small. The second derivative of the energy function E(W) with respect to w_β is

    \partial^2 E/\partial w_{\beta j} \partial w_{\beta k} = 2N \sum_{\alpha=1}^{M} p_\alpha \nu_\beta(x_\alpha)^{(\phi+2)/\phi} |w_\beta - x_\alpha|^{-2} \{-(\phi+2)(1 - \nu_\beta(x_\alpha))(w_{\beta j} - x_{\alpha j})(w_{\beta k} - x_{\alpha k}) + \delta_{jk} |w_\beta - x_\alpha|^2\}
        = 2N^{-(\phi+2)/\phi} \sum_{\alpha=1}^{M} \{-(\phi+2)(1 - N^{-1})(w_{\beta j} - x_{\alpha j})(w_{\beta k} - x_{\alpha k}) + \delta_{jk}\},

where δ_jk = 1 (j = k) and δ_jk = 0 (j ≠ k). In the case j = k,

    \partial^2 E/\partial w_{\beta k}^2 = 2N^{-(\phi+2)/\phi} \{-2(\phi+2)(1 - N^{-1}) + N\} = 2N^{-2(\phi+1)/\phi} \{N^2 - 2(\phi+2)(N-1)\},

whereas in the case j ≠ k, ∂²E/∂w_βj∂w_βk = 0, because (w_βj - x_αj)(w_βk - x_αk) = x_αj x_αk = 0 for every input pattern. For γ ≠ β,

    \partial^2 E/\partial w_{\gamma j} \partial w_{\beta k} = 2N \sum_{\alpha=1}^{M} p_\alpha (\phi+2) \nu_\beta(x_\alpha)^{(\phi+2)/\phi} \nu_\gamma(x_\alpha) |w_\gamma - x_\alpha|^{-2} (w_{\gamma j} - x_{\alpha j})(w_{\beta k} - x_{\alpha k})
        = 2(\phi+2) N^{-2(\phi+1)/\phi} \sum_{\alpha=1}^{M} (w_{\gamma j} - x_{\alpha j})(w_{\beta k} - x_{\alpha k}).

In the case j = k, ∂²E/∂w_γk∂w_βk = 4(φ+2)N^{-2(φ+1)/φ}, whereas in the case j ≠ k it is 0.

The equilibrium therefore becomes stable if the Hessian is positive definite, that is, the NK x NK matrix whose diagonal elements are a, whose elements coupling the same component k of two different cells are b, and whose remaining elements are 0, where

    a = 2N^{-2(\phi+1)/\phi} \{N^2 - 2(\phi+2)(N-1)\}, \qquad b = 4(\phi+2) N^{-2(\phi+1)/\phi}.

The necessary and sufficient condition for this matrix to be positive definite is that all of its principal minors are positive. By rearranging the rows and columns so that the N elements sharing the same component k are grouped together, the matrix becomes block diagonal, and the positive definiteness can be judged by the sign of the following N x N determinant:

    \begin{vmatrix} a & b & \cdots & b \\ b & a & \cdots & b \\ \vdots & \vdots & \ddots & \vdots \\ b & b & \cdots & a \end{vmatrix}.

This determinant becomes positive when a > b, that is,

    a - b = 2N^{-2(\phi+1)/\phi} \{N^2 - 2(\phi+2)(N-1) - 2(\phi+2)\} > 0.

This condition is reduced to φ < (N - 4)/2. If the parameter φ satisfies this condition, the equilibrium becomes stable.
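The stability threshold φ < (N - 4)/2 can also be checked numerically. The following self-contained script is our addition, not part of the original article; it evaluates E(W) for the configuration above with K = 5 (so N = 10 and the threshold is 3) and inspects the smallest eigenvalue of a finite-difference Hessian at the equilibrium w_β = 0:

    import numpy as np

    K = 5
    N = M = 2 * K                                     # threshold (N - 4)/2 = 3
    X = np.vstack([np.eye(K), -np.eye(K)])            # inputs +/- e_k
    w0 = np.zeros(N * K)                              # all connections at the origin

    def energy(w_flat, phi):
        # E(W) = sum_alpha (1/N) * N * {sum_gamma |w_gamma - x_alpha|^(-phi)}^(-2/phi)
        W = w_flat.reshape(N, K)
        return sum(np.sum(np.linalg.norm(W - x, axis=1) ** (-float(phi))) ** (-2.0 / phi)
                   for x in X)

    def hessian(f, w, eps=1e-4):
        # central-difference Hessian of a scalar function f at w
        n = w.size
        H = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                wpp = w.copy(); wpp[i] += eps; wpp[j] += eps
                wpm = w.copy(); wpm[i] += eps; wpm[j] -= eps
                wmp = w.copy(); wmp[i] -= eps; wmp[j] += eps
                wmm = w.copy(); wmm[i] -= eps; wmm[j] -= eps
                H[i, j] = (f(wpp) - f(wpm) - f(wmp) + f(wmm)) / (4 * eps * eps)
        return H

    for phi in (2, 4):                                # below and above the threshold
        lam = np.linalg.eigvalsh(hessian(lambda w: energy(w, phi), w0))
        print(phi, lam.min())                         # positive for phi = 2, negative for phi = 4

For φ = 2 the smallest eigenvalue is positive (analytically a - b = 0.04), so the all-zero configuration is a stable local minimum even though every cell responds equally to every pattern; for φ = 4 it is negative, and the self-organization can leave this state.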