Neural Networks, Vol. 9, No. 9, pp. 1491-1495, 1996. Copyright © 1996 Elsevier Science Ltd. All rights reserved. Printed in Great Britain. 0893-6080/96 $15.00 + .00
Pergamon
NEURAL NETWORKS LETTER
Object Generation with Neural Networks (When Spurious Memories are Useful)

A. A. EZHOV¹ AND V. L. VVEDENSKY²

¹Troitsk Institute of Innovative and Thermonuclear Investigations, Russia, and ²Kurchatov Institute, Russia

(Received 29 December 1995; revised and accepted 9 May 1996)
Abstract: Object generation constitutes a new class of problems which can be solved using neural networks. It is the reverse of classification and makes good use of the stable network modes generally considered as undesired (spurious memories). Single-class networks are considered as basic elements instead of multi-class ones, and attractors of the former are treated as potential objects of the corresponding class. Development of multiple attractors reflects the network generalization abilities and converts the single-class network into an active generator of templates. Object generating networks can also be used as moduli of recognition systems which are free from some disadvantages inherent in multi-class discriminant networks. Copyright © 1996 Elsevier Science Ltd.
Keywords: Object generation, Spurious memory, Single-class network, Active template generator, Generalization, Grammatical inference.

1. INTRODUCTION

The subtitle of this letter is inspired by the article of Caianiello et al. (1992), "Can spurious states be useful?" Indeed, spurious memory is an "enfant terrible" of neural models of content-addressable memory (Hopfield, 1982), because it can attract stimuli (externally evoked network states to be recognized), and the meaning of these events is not clear. A number of interpretations of these "undesirable" modes of neural networks have been suggested. They were interpreted as alogical conclusions (Hopfield et al., 1983), "dreams" (Crick & Mitchison, 1983), prototype versions (Ezhov et al., 1990), empty class representatives (Ezhov, 1994), etc. In fact, Caianiello et al. (1992) discuss a very typical behavior of neural systems which, when used to store a set of "images", generate additional attractors. These attractors seem to be "useful" for representation of such objects as digits and characters. They reflect the creative role of neural networks in information processing. Neural networks demonstrate a nontrivial mapping of learned patterns onto the set of network attractors. Caianiello suggested considering the latter as strings of primitive "letters" which identify the stored patterns (Caianiello et al., 1992). Unfortunately, in this case again only a few strings (attractors) form meaningful syllables. (Caianiello concluded that this is typical for any language.) In this letter we argue that all stable states of attractor neural networks (ANN) without exception are meaningful when the networks are used to solve a new important problem, object generation. These arguments can easily be extended to other neural architectures (including multilayer perceptrons, MLP) where extraneous memory is responsible for the generalization abilities.
2. INVERSE AND DIRECT PROBLEMS
Artificial neural networks are commonly used for classification, which can be considered as the direct problem. Classification: given an object, find the class this object belongs to. It is possible to formulate a corresponding inverse problem. Object generation: given a class of objects, generate some object belonging to this class.
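Stated as interfaces, the two problems are duals of each other. A minimal sketch (the names are ours, not the authors'): the direct map takes an object to a class label, while the inverse map takes a class label, plus a source of randomness, to an object of that class.

```python
from typing import Protocol, Sequence
import random

Vector = Sequence[float]

class Classifier(Protocol):
    def classify(self, x: Vector) -> str:
        """Direct problem: given an object, find its class."""
        ...

class ObjectGenerator(Protocol):
    def generate(self, label: str, rng: random.Random) -> Vector:
        """Inverse problem: given a class, produce some object of that
        class; the rng reflects the required non-uniqueness."""
        ...
```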
It is possible to formulate the direct problem of signature classification. The classifying ANN must be able to recognize whether a given signature belongs to person A or to person B. The problem of object generation can be stated as: generate a signature that could be written by person A. Here, two important points should be stressed. First, the simplest way to generate an object is to store all the signatures correctly and to reproduce a certain pattern as output. However, we believe that a productive approach can generate a new version of a signature which, when classified by an ANN, should be recognized as belonging to person A. The object generating network is expected to model the variability of the signatures really observed in the human drawing process. The classifying ANN can be considered as an analog of the reading system, while the object generating ANN corresponds to the drawing system. It is preferable to call the latter system drawing rather than writing, since the images, and not the strings of characters, are important. Writing handwritten characters can also be considered as "drawing". The second point is that object generation implies a non-unique solution and has to incorporate some kind of random process.

3. SINGLE-CLASS ANN

Spurious memories can be considered undesirable in neural networks which are multi-class systems. However, one can use single-class networks. As a simple but important illustrative example, consider a Hopfield network. The attempt to guarantee the stability of a prescribed set of patterns x^s = \{x_i^s, \; i = 1, \ldots, N\}, s = 1, \ldots, n; x_i^s \in \{-1, 1\} in the fully connected network with symmetrical Hebbian interconnections

T_{ij} = \frac{1}{n} \sum_{s=1}^{n} x_i^s x_j^s, \qquad i, j = 1, \ldots, N, \quad i \neq j, \qquad (1)

\theta_i = 0, \qquad i = 1, \ldots, N, \qquad (2)

leads to the emergence of stationary network states which are far from any "learned" pattern x^s. This nontrivial mapping of the set \{x^s\} onto the set of network attractors \{a^k\} provokes splitting of the latter into the memorized patterns (which are close to the states described by the x^s) and spurious memories. However, one can show that all network attractors can be naturally treated in the same manner using a probabilistic interpretation of the network state energy functional (Ezhov et al., 1990):

E(x) = -\frac{1}{2} \sum_{i=1}^{N} \sum_{j \neq i} T_{ij} x_i x_j. \qquad (3)

Example: Imagine that all vectors x^s are corrupted versions of only one prototype vector y. They can be considered as outputs of a noisy channel which transmits the same message y n times. To avoid ambiguity due to the invariance of the prototype vector with respect to component inversion, it is better to consider parallelity (y_i y_j = 1) and antiparallelity (y_i y_j = -1) of the "spin" variables y_i and y_j in the prototype message y. We can show that the Hopfield network generates locally the most plausible versions of the prototype on the basis of the received messages. It is quite natural to introduce the probabilities of parallelity and antiparallelity of the spin variables of the vector y:

P_{ij}^{\pm}(y) = (2n)^{-1} \sum_{s=1}^{n} (1 \pm x_i^s x_j^s) = \frac{1}{2}(1 \pm T_{ij}), \qquad i \neq j. \qquad (4)

Then, the matrix elements T_{ij} can be written as

T_{ij} = P_{ij}^{+} - P_{ij}^{-}, \qquad (5)

and the energy functional after simple algebra takes the form

E(x) = \frac{1}{2} \left( \sum_{i=1}^{N} \sum_{j \neq i} \left[ P_{ij}^{-}(1 + x_i x_j) + P_{ij}^{+}(1 - x_i x_j) \right] - N(N-1) \right). \qquad (6)

Now imagine a prototype network with Hebbian interconnections generated by only the noncorrupted prototype vector y (unknown to us). Let x be a state of the network with interconnection matrix (5). It can be seen from (6) that, neglecting a constant factor and an additive term, the energy E(x) in this network coincides with the probability P of finding a frustrated bond T_{ij} x_i x_j:

P = \frac{1}{2N(N-1)} \sum_{i=1}^{N} \sum_{j \neq i} \left[ P_{ij}^{-}(1 + x_i x_j) + P_{ij}^{+}(1 - x_i x_j) \right]. \qquad (7)

Since all bonds in the prototype state y of the prototype network are nonfrustrated by the definition of T_{ij} (for n = 1, T_{ij} y_i y_j > 0), the minimum of the energy corresponds to the minimal frustration probability. This means that all attractors of the network can be considered as locally best versions of the prototype state y.
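To make the correspondence between (3) and (7) concrete, here is a minimal numerical sketch (all variable and function names are ours, and the noise level is an arbitrary choice): it builds T from n corrupted copies of a random prototype, checks identity (5), and verifies that E(x) and the frustration probability differ only by the constant factor N(N-1) and the additive term -N(N-1)/2.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, flip = 16, 25, 0.15

# Prototype y and n corrupted copies x^s (each component flipped with prob. `flip`).
y = rng.choice([-1, 1], size=N)
xs = y * np.where(rng.random((n, N)) < flip, -1, 1)

# Eq. (1): T_ij = (1/n) sum_s x_i^s x_j^s, zero diagonal.
T = (xs.T @ xs) / n
np.fill_diagonal(T, 0.0)

# Eq. (4): P^+/- = (1 +/- T_ij)/2; eq. (5): T_ij = P^+ - P^-.
P_plus, P_minus = (1 + T) / 2, (1 - T) / 2
assert np.allclose(T, P_plus - P_minus)

off_diag = ~np.eye(N, dtype=bool)      # i != j pairs

def energy(x):
    """Eq. (3): E(x) = -(1/2) sum_{i != j} T_ij x_i x_j."""
    return -0.5 * (x @ T @ x)

def frustration(x):
    """Eq. (7): probability of finding a frustrated bond T_ij x_i x_j."""
    par = np.outer(x, x)
    s = (P_minus * (1 + par) + P_plus * (1 - par))[off_diag].sum()
    return s / (2 * N * (N - 1))

# Up to a constant factor and an additive term the two quantities coincide,
# so they rank candidate versions of the prototype identically.
x = rng.choice([-1, 1], size=N)
assert np.isclose(energy(x), N * (N - 1) * (frustration(x) - 0.5))
print("E(x) =", energy(x), " P(x) =", frustration(x))
```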
We can conclude that it is adequate to consider a Hopfield network as a system which is best suited to store single-class information, since in any case it will try to identify all stored vectors. [Similar considerations can easily be generalized to networks with asymmetric interconnections (Ezhov et al., 1990), multi-state neurons (Ezhov et al., 1995), etc.] It is not only possible to consider neural networks (at least Hopfield networks) as single-class systems, but it is preferable to do so, making the division of network memory states into "true" and "spurious" ones irrelevant. Postulate: Only the objects of one class are to be located (stored) in one network. Corollary: All modes generated in this network can only be versions of objects of this class.
Briefly speaking, if we store different images of the character A in the A-class network, then all the output states of this network can only be some, maybe chimerical, versions of A, but not of B or C. Stationary states of self-organized systems can be considered as such output states. Some comments are needed. First, it is very important to stress that the corollary is not a trivial statement. Indeed, in the recently published article by Gascuel et al. (1994), single-class networks were used to memorize objects of the same class. However, the spurious memories generated in such modular networks were still considered by the authors as unknown decisions.
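As an illustration of the postulate and corollary, the following minimal sketch (our toy data and parameter choices, not the authors' experiment) stores several corrupted variants of a single class prototype with eqs (1)-(2) and relaxes the network from random initial states; every stable state it finds, stored or "spurious", is read as a version of the same class.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 64

def train_single_class(patterns):
    """Eq. (1): symmetric Hebbian matrix of one single-class modulus."""
    X = np.asarray(patterns)
    T = (X.T @ X) / len(X)
    np.fill_diagonal(T, 0.0)
    return T

def relax(T, x, sweeps=50):
    """Asynchronous sign updates (zero thresholds, eq. 2) until the
    state stops changing, i.e., an attractor is reached."""
    x = x.copy()
    for _ in range(sweeps):
        changed = False
        for i in rng.permutation(len(x)):
            s = np.sign(T[i] @ x)
            if s != 0 and s != x[i]:
                x[i] = int(s)
                changed = True
        if not changed:
            break
    return x

# Store several corrupted variants of one "class A" prototype ...
proto = rng.choice([-1, 1], size=N)
variants = [proto * np.where(rng.random(N) < 0.1, -1, 1) for _ in range(5)]
T = train_single_class(variants)

# ... then relax random stimuli: every attractor found, including the
# "spurious" ones, is by construction a version of class A only.
for _ in range(3):
    attractor = relax(T, rng.choice([-1, 1], size=N))
    print("overlap with prototype:", attractor @ proto / N)
```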
Second, two concepts of the object class are known (Lakoff, 1987). The older one declares that objects are assigned to the same class provided they have common properties (monothetic classes). A more recent one (Rosch, 1973) states that a human observer groups objects into the same class if they possess a common prototype, a typical representative of the class (polythetic classes). Here we conjecture that for assigning objects to the same class it is enough to store them in the same place (i.e., in one network). No common properties, no common prototypes, no similarity of feature vectors are assumed. A clear analogy can be found in set theory, where a set can be described both by the properties of its elements (the analog of a monothetic class) or by a list of elements (the list is similar to a "common location" of elements). A set defined by a list of elements can contain completely different objects, such as "horse", "star", "Guardian" or "Macintosh PC". Of course, this reasoning brings us back to locally addressable rather than to distributed content-addressable memory. However, this is not a trivial comeback, since the network is able to store distributed representations of all members of the given class. The single-class network can be considered as an active template generator, because its attractors form a set of templates of the
objects belonging to this class. More important, these templates can be matched with an external stimulus in parallel.
We suggest information processing systems having two parts. The right one contains a set of single-class modular networks, while the left one stores the set of addresses of the networks from the right side. We shall also refer to these addresses as nouns or names of the classes, while the states of the modular networks are particular images illustrating the corresponding noun. Now we describe the processes of classification and object generation performed by this two-component system.

4. CLASSIFICATION

An external stimulus (image) is presented simultaneously to all the single-class modular networks which constitute the right part of the system. Relaxation processes start, and the fastest modulus, reaching an equilibrium state first, stops the process and activates the corresponding address state in the left part of the system. This is the result of classification. The procedure can be named "fastest-produces-answer". In other words, the answer is the address label of the single-class modulus evolving to a stable state in the shortest time.

Commentary: The presented classification process is not completely new. A similar one was described by Schwenk and Milgram (1994), who used autoassociative perceptrons (Cottrell et al., 1985) as single-class moduli and trained them to recognize handwritten uppercase characters. Instead of selecting the fastest modulus, which can be done by feeding the network output back to the input until the process converges, they simply compared output and input. They assumed that if an unknown image of the character "A" is presented to all the moduli, then the minimal distance between network input and output will be observed just for the modulus trained to encode and decode "A", rather than for other characters. In our studies we have found that a similar truncated (single-step) process is very effective for other applications, in particular for ECG noise elimination with phasor neural networks. Schwenk and Milgram achieved a recognition rate of about 95%, which proves the efficiency of single-class neural networks and their competitiveness with other systems in solving classification problems. Obviously, if the stimulus coincides with a stable state of a certain single-class modulus, then that modulus comes into equilibrium immediately and defines the image class. Hence, if the single-class moduli are good memories, they will recognize correctly every stored image. In order to provide the ability to classify other stimuli properly, it is desirable to generate in advance multiple stable states in the vicinity of prognosed images of this class.
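A sketch of the truncated (single-step) variant of the "fastest-produces-answer" procedure, assuming Hopfield-style moduli given as Hebbian matrices (the names and the Hamming-distance criterion are our simplification of the Schwenk-Milgram comparison of input and output):

```python
import numpy as np

def classify(moduli, stimulus):
    """Truncated "fastest-produces-answer": one synchronous update per
    single-class modulus; the class whose state is perturbed least wins.
    A stimulus coinciding with a stable state is not perturbed at all,
    so its modulus answers immediately."""
    def distortion(T):
        out = np.sign(T @ stimulus)
        out[out == 0] = 1                     # resolve zero fields arbitrarily
        return int(np.sum(out != stimulus))   # Hamming(input, output)
    return min(moduli, key=lambda name: distortion(moduli[name]))

# Usage (hypothetical matrices T_A, T_B built per class as in eq. (1)):
#   label = classify({"A": T_A, "B": T_B}, stimulus)
```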
Generation of these additional memories indicates the generalization abilities of the single-class networks and their competitiveness in the recognition of new stimuli. In some sense such a system can be considered as an analog of the immune system, where antibodies are generated in advance in order to be ready to bind a completely new antigen.

5. OBJECT GENERATION
When the single-class network is trained and all the images assigned to this class are stored, object generation can be started. After a random input, the network finds a stable state which in general can be "spurious" and represent some forecast object of the given class. In the suggested two-part system, the process starts with activation of the "word" or "noun" indicating the class in the left part, which sends a random stimulus to the corresponding right-side single-class network. The equilibrium state reached by the network represents a version of an object of this class.
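A minimal sketch of this two-part generation process (names ours): the "noun" addresses a single-class modulus in the right part, which then relaxes a random stimulus to an equilibrium state.

```python
import numpy as np

def generate(moduli, noun, rng, sweeps=50):
    """Two-part object generation: the address `noun` (left part) selects
    a single-class network (right part); a random stimulus is sent to it
    and the equilibrium state reached is returned as a generated object."""
    T = moduli[noun]
    x = rng.choice([-1, 1], size=len(T))      # random stimulus
    for _ in range(sweeps):                   # relax to a stable state
        out = np.sign(T @ x)
        out[out == 0] = 1
        if np.array_equal(out, x):            # equilibrium reached
            break
        x = out
    return x

# Usage (hypothetical matrix T_nine built from images of "9"):
#   nine = generate({"nine": T_nine}, "nine", np.random.default_rng(2))
```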
6. EXAMPLE

As examples of the class "9", nine black and white images of the digit "9" were used. Each image was encoded using a 105-component DCT code (discrete cosine transform code). The corresponding 105-component feature vectors x^s were used to construct the Hebb-like synaptic matrix of a phasor neural network with clock neurons (Noest, 1988):

T_{kj} = \sum_{s=1}^{n} e^{i(\phi_k^s - \phi_j^s)}, \qquad k \neq j, \qquad (8)

T_{kk} = 0, \qquad k = 1, \ldots, N, \qquad (9)

\phi_k^s = 2\pi x_k^s, \qquad s = 1, \ldots, n. \qquad (10)

In (10), x_k^s denotes the kth DCT component of the sth image, N = 105.
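The dynamics of eqs (8)-(10) can be sketched as follows (a simplification, not the authors' code: we assume the DCT components are normalized to [0, 1) so that (10) defines a phase, and we use a synchronous q-state clock update):

```python
import numpy as np

rng = np.random.default_rng(3)
q = 8                   # number of clock positions per neuron (our choice)

def phasor_matrix(X):
    """Eqs (8)-(10): phi_k^s = 2*pi*x_k^s, T_kj = sum_s exp(i(phi_k^s - phi_j^s)),
    zero diagonal; X holds the normalized DCT components of the images."""
    Z = np.exp(2j * np.pi * np.asarray(X))    # unit phasors, eq. (10)
    T = Z.T @ Z.conj()                        # eq. (8)
    np.fill_diagonal(T, 0.0)                  # eq. (9)
    return T

def relax(T, z, sweeps=100):
    """Synchronously align every clock neuron with its local field,
    quantizing the phase to q positions, until the state is stable."""
    step = 2 * np.pi / q
    for _ in range(sweeps):
        new = np.exp(1j * step * np.round(np.angle(T @ z) / step))
        if np.allclose(new, z):
            break
        z = new
    return z

# Toy stand-in for the 105-component DCT codes of the nine stored "9"s:
X = rng.random((9, 105))
T = phasor_matrix(X)

# Object generation: random initial phases relax to a stable state whose
# phases can be decoded back into DCT components, i.e., into an image.
z0 = np.exp(1j * 2 * np.pi * rng.integers(0, q, size=105) / q)
z = relax(T, z0)
```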
The use of the DCT makes it possible to visualize attractors of the network, because it permits reconstruction of images from the corresponding state vectors. Figure 1 represents the set of nine images stored in the single-class network having the name "nine" (top). The top line of framed "attractors" represents network stable states attracting the corresponding stored images (indicated by arrows). Other network attractors can be found by observing the evolution of the network from random initial states to final stable states. The bottom line of framed "attractors" contains some of the network stable states found in the object generation process after presentation of a set of random stimuli. Some of these attractors coincide with the images stored in the network.
FIGURE 1. Object generation in a phasor neural network. Top: nine black-and-white images of the handwritten digit "9", the DCT codes of which were used to compute the network interconnections (8); top framed attractors: stable network states attracting the stored images (arrows show the correspondence between input images and final output stable states); bottom framed attractors: some network stable states found after presentation of random stimuli (object generation); bottom: test images of "9" which were not used to form the interconnections but, nevertheless, are attracted to network stable states which do not attract the learned images (top). This illustrates active generation of new templates of "9". Question marks indicate attractors which potentially can be used for matching other images of "9" (they are really "9-like").
Other attractors are completely new, though they are not in conflict with our notion of how the digit "9" might look. We can consider these new attractors as versions of the digit "9" generated in an object generation process. Maybe some of these new attractors correspond to previously unforeseen images of "9"? We checked a test set of 50 handwritten images of "9" not used for the network formation and observed that some of them do evolve to the new attractors found in the object generation process. The bottom line in the figure contains test images of "9" which were attracted by those stable states which did not attract the images of the top line. One can conclude that the object generation process is responsible for the generalization abilities of a network, since it forecasts the feature-space regions where potential examples of "9" can appear. Commentary: Similar examples can be presented for other neural architectures which can process more complicated data structures, for example, recursive autoassociative memories (Pollack, 1990).
7. CONCLUSION
The advantages of the suggested single-class object generating networks are obvious. When training multi-class neural networks (e.g., multilayer perceptrons), one has to avoid contradictions in the data set, since two identical vectors cannot be assigned to different classes. When vectors are different though
close in configuration space, the training can be complicated. This problem disappears in multi-modulus single-class network systems. The same example can be assigned to different classes (for example, different people can recognize the same digit image as 3 or 5). In our system the conflict manifests itself as the simultaneous answer of two different moduli, which can be interpreted as "3 or 5". The concept of object generation can be relevant to the well-known phenomenon of easy interpretation of the chimerical patterns inherent in human dreams (for example, people can firmly recognize a rather strange fantastic city as London or Rome, etc.). It can be explained under the assumption that during dream sleep the images are not recognized (classified) but, just the opposite, the objects are generated, and the interpretation (the class, London) precedes the formation of the corresponding chimerical image. The treatment of object generating single-class neural systems leads us to syntactic rather than discriminant recognition systems (Fu, 1982). Indeed, it is possible to consider them as nonclassical grammatical inference systems. Every object stored in a single-class network can be considered as an initially grammatically correct though corrupted sentence of a given language-class (see the similar considerations for the Hopfield system above). All attractors generated in this network can be treated as new grammatically correct sentences of the same language. We can interpret the formation of the landscape of the network configuration space as the "instantaneous" inference of all such grammatically correct sentences of a given language. The single-class object generating networks can produce new language structures, without explicit extraction of grammatical rules, on the basis of examples which can be contradictory and need not suggest any underlying grammar.
Acknowledgements: We are grateful to Professor Maria Marinaro, who kindly sent us reprints of Caianiello et al. (1992) and other papers, to A. Yu. Turygin for the DCT preprocessor code, to Ya. B. Kazanovich for providing us with the proceedings of ICANN'94, and also to R. M. Borisyuk, Tony Martinez, Ernesto Della Sala and V. R. Chechetkin for useful discussions. Requests for reprints should be sent to A. A. Ezhov, Troitsk Institute of Innovative and Thermonuclear Investigations, Troitsk, Moscow Region, Russia 142092.
REFERENCES

Caianiello, E. R., Ceccarelli, M., & Marinaro, M. (1992). Can spurious states be useful? Complex Systems, 6, 1-12.
Cottrell, G. W., Munro, P., & Zipser, D. (1985). Learning internal representations from gray-scale images: An example of extensional programming. In Proceedings of the Ninth Annual Conference of the Cognitive Science Society, Irvine, CA.
Crick, F., & Mitchison, G. (1983). The function of dream sleep. Nature, 304, 111-114.
Ezhov, A. A. (1994). Empty classes, predictive and clustering thinking networks. Neural Network World, 4, 671-688.
Ezhov, A. A., Kalambet, Yu. A., & Knizhnikova, L. A. (1990). Neural networks: general properties and particular applications. In A. Holden & V. Kryukov (Eds.), Neural networks: theory and architecture (pp. 39-47). Manchester: Manchester University Press.
Ezhov, A. A., Chechetkin, V. R., Knizhnikova, L. A., Turygin, A. Yu., Molochkov, V. A., & Ezhova, M. N. (1995). Energy-minimizing neural network with symbol-valued units and its application to the analysis of noisy symbolic data. Neural Network World, 5, 271-286.
Fu, K. S. (1982). Syntactic pattern recognition and applications. Englewood Cliffs, NJ: Prentice-Hall.
Gascuel, J.-D., Moobed, B., & Weinfeld, M. (1994). An internal mechanism for detecting parasite attractors in a Hopfield network. Neural Computation, 6, 902-915.
Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences USA, 79, 2554-2558.
Hopfield, J. J., Feinstein, D. I., & Palmer, R. G. (1983). "Unlearning" has a stabilizing effect in collective memories. Nature, 304, 158-159.
Lakoff, G. (1987). Women, fire and dangerous things. Chicago: University of Chicago Press.
Noest, A. J. (1988). Associative memory in sparse phasor neural networks. Europhysics Letters, 6, 469-474.
Pollack, J. B. (1990). Recursive distributed representations. Artificial Intelligence, 46, 77-105.
Rosch, E. (1973). Natural categories. Cognitive Psychology, 4, 328-350.
Schwenk, H., & Milgram, M. (1994). Structured diabolo-networks for hand-written character recognition. In M. Marinaro & P. G. Morasso (Eds.), International Conference on Artificial Neural Networks (Vol. 2, pp. 985-988). Sorrento, Italy.