Physica D 80 (1995) 140-150
Synergetic learning for unsupervised texture classification tasks

T. Wagner 1, U. Schramm, F.G. Boebel
Fraunhofer-Institute for Integrated Circuits, Am Weichselgarten 3, 91058 Erlangen, Germany
Received 13 October 1993; revised 4 March 1994; accepted 4 March 1994
Communicated by L. Kramer
Abstract
Synergetic computers form a class of self-organized algorithms. Due to their close similarity to nonlinear self-organized systems in physics and chemistry they are potential candidates for a new sort of image processing hardware. We study the performance of an unsupervised synergetic learning algorithm on classification problems with both artificial and real texture data and show that unsupervised synergetic learning can be successfully used for unsupervised pattern classification.
1. Introduction
Nonlinear systems can be studied with respect to many different aspects. Quite often the main task is the understanding of a nonlinear system in the real world, and a large variety of methods for this is well established: from the time series a dynamics in phase space is reconstructed, which can then be used for modelling the underlying differential equations, for quantifying characteristics such as fractal dimensions or entropies of attractors, or for visualising the behavior of the system by means of Poincaré maps as time tends towards infinity. Some practical limitations of these methods stem from the fact that with increasing dimension of the underlying problem, the amount of data necessary for a sufficient performance of the evaluation algorithms tends to rise drastically. As a consequence, this approach runs into trouble if the underlying problems become too complex. For the investigation of extremely high-dimensional systems the interdisciplinary field of Synergetics has contributed some helpful ideas during the recent decade. It was established by Haken, who originally applied it to processes of self-organization in the dynamics of lasers. Synergetics has shown a way to analyze quantitatively many real systems with a large number of degrees of freedom in terms of features with few degrees of freedom.
1 Email: [email protected]
[Fig. 1 shows a tree of synergetic computer concepts: hardware realization (optical systems, solid state systems) and computer simulation (direct calculation of prototypes with SCAP and SCAPAL; dynamic learning of prototypes with SCLERN).]
Fig. 1. Overview of different synergetic computer concepts.
The basic underlying idea of adiabatic elimination assumes that a few unstable modes take control of the system. A summary of the synergetic approach can be found in [5,4]. Meanwhile, synergetic approaches are used in many different fields of the natural sciences, economics and the social sciences, since the underlying dynamics of many self-organized phenomena seems to obey similar mathematics. At the level of physics [16] or of chemical reaction-diffusion processes [18,19] the aspect of pattern formation can often be explained by means of synergetics. Morphogenesis in biology has been modelled with such equations [3,11], and neurobiological phenomena have also been described successfully [10,2]. There are also attempts to apply synergetic ideas to social phenomena such as the formation of economic cycles [23] or sociological processes [22,17], although the larger timescales of these problems make their study difficult. Haken has pointed out that the concept of synergetics can be applied not only to pattern formation processes, but also to pattern recognition tasks. Starting from this idea, he has proposed different synergetic computer algorithms. The crucial point about synergetic computers is that their underlying mathematics is the same as that obeyed by synergetic processes in the real world. Therefore they offer a clear perspective for hardware realizations. The state of the art in the field of synergetic computing is the topic of the next section.
2. Synergetic computing

Mapping pattern recognition processes onto self-organized systems implies that one has to find a way to express similarity and pattern classes in terms of local or global energies and dynamic forces. The different approaches chosen so far (Fig. 1) will be explained briefly in this section. Note that the underlying dynamics is the same for both the software and the hardware approaches. In his first experiments on pattern recognition, Haken used the potential function
V_1 = -\tfrac{1}{2} \sum_{k=1}^{L} \lambda_k (v_k^+ q)^2 + \tfrac{B}{4} \sum_{k \neq k'} (v_k^+ q)^2 (v_{k'}^+ q)^2 + \tfrac{C}{4} (q^+ q)^2   (1)
in order to generate a recognition dynamics [6]. The potential function is motivated by physical systems in which the same potential occurs for pattern generation processes. In the case of a pattern recognition process, the scalar potential value is interpreted as a measure of similarity between a test sample q and the learned adjoined prototypes v_k^+. The first term in the potential drives the pattern away from the origin of the prototype space. The second term generates the borders between different prototype classes, whereas the third term limits the length of the pattern vector q during the recognition process. In this context, recognition is equivalent to minimizing the value of the potential function by changing q according to a special dynamics. The minimum that q ends up in represents the class into which the test vector is classified. B, C, and the λ_k are parameters of the recognition dynamics. The dynamics acts on linescanned and normalized images. To this end the representatives v_i of the L different classes (so-called prototypes) are normalized according to
\sum_{n=1}^{N} v_{ni} = 0,   (2)

\sum_{n=1}^{N} (v_{ni})^2 = 1,   (3)
and are used to construct L adjoined prototypes v_i^+ so that

v_i^+ = \sum_{k=1}^{L} a_{ik} v_k^T   (4)

and

v_i^+ v_j = \delta_{ij}   (5)
hold. The calculation of the adjoined prototypes is a simple matrix inversion problem. From the potential V_1 a recognition dynamics can be generated using

\dot{q} = -\frac{\partial V_1}{\partial q^+},   (6)

\dot{q}^+ = -\frac{\partial V_1}{\partial q}.   (7)
This dynamics drives an arbitrary test pattern q into a final pattern which is identical with the prototype of the class to which the original pattern is considered most similar. The important point about the recognition potential V_1 is that it is identical with potentials of synergetic systems in the real world. Therefore it is possible to map the recognition problem onto synergetic hardware which then "integrates" the differential equation in a parallel way and with time constants related to the underlying physical effects. Attempts have been made to investigate the performance of such hardware systems realized by means of computer generated holograms [15] and current filaments in semiconductors [13]. There are also examples of chemical systems capable of performing image processing steps such as edge extraction or local enhancement [7,9].
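As an illustration of Eqs. (2)-(5), the following minimal sketch (our own illustrative code; the helper names normalize and adjoin_prototypes are hypothetical and not taken from the original implementation) normalizes a small set of linescanned prototypes and computes the adjoined prototypes through a pseudo-inverse, which is one way of carrying out the matrix inversion mentioned above:

```python
import numpy as np

def normalize(patterns):
    """Normalize linescanned patterns to zero mean and unit length, Eqs. (2)-(3)."""
    p = patterns - patterns.mean(axis=1, keepdims=True)    # Eq. (2): zero mean
    return p / np.linalg.norm(p, axis=1, keepdims=True)    # Eq. (3): unit length

def adjoin_prototypes(V):
    """Adjoined prototypes v_k^+ with v_i^+ . v_j = delta_ij, Eqs. (4)-(5).

    V holds one normalized prototype per row; the adjoined prototypes are
    the rows of the transposed Moore-Penrose pseudo-inverse, i.e. the result
    of inverting the prototype overlap matrix."""
    return np.linalg.pinv(V).T    # row k is the adjoined prototype v_k^+

# toy example: L = 2 prototypes of dimension N = 6
V = normalize(np.array([[1., 0., 1., 0., 1., 0.],
                        [0., 1., 1., 0., 0., 1.]]))
V_plus = adjoin_prototypes(V)
print(np.round(V_plus @ V.T, 6))    # identity matrix, i.e. Eq. (5) holds
```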
For software simulations, the differential equations for the recognition process do not have to be integrated, since due to special properties of the potential V_1 the original test pattern q will always develop into the prototype class i with

|v_i^+ q| = \max_j \{ |v_j^+ q| \}, \qquad j = 1, \ldots, L.   (8)
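Continuing the hypothetical helpers from the sketch above, the direct classification rule of Eq. (8) amounts to a single line:

```python
def classify(q, V_plus):
    """Direct synergetic classification, Eq. (8): class i with the largest |v_i^+ . q| wins."""
    return int(np.argmax(np.abs(V_plus @ q)))

# a test pattern close to the first prototype is assigned to class 0
test = normalize(np.array([[1., 0., 1., 0., 0.9, 0.1]]))[0]
print(classify(test, V_plus))
```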
This direct calculation of the recognized class has led to a number of what we call direct calculation methods for pattern recognition such as SCAP, SCAPAL, and MELT. Since we have described these methods earlier [20,21], we only want to point out that these algorithms are already in use for pattern recognition on the industrial floor, where short learning times are crucial. Together with Schramm [14] we have compared this class of algorithms to conventional and neural approaches, especially with respect to learning and recognition effort and performance. Haken's group has also mapped other image processing problems onto synergetic potentials, such as stereooptic image analysis and the recognition of motion. Rees and his collaborators have started to apply synergetic potentials to texture generation processes [12]. So far we have only been dealing with synergetic recognition. An even more powerful aspect of synergetic computing is the field of unsupervised synergetic learning: just by being offered learning samples of different classes, the system is capable of forming the appropriate classes on its own. Haken [6] has shown that such an approach can be successfully chosen for the restoration of disturbed or partially covered images. For this one has to keep in mind that, strictly speaking, V_1 is a function of both q and the v_k^+, that is V_1(q, v_k^+). During a recognition process, the v_k^+ were given and kept fixed, and V_1 was minimized with respect to q. The learning process acts exactly vice versa: now the adjoined prototypes v_k^+ are modified by the gradient dynamics, whereas the shape of the potential is formed by a superposition of all learning samples q. For this the potential V_1 is slightly modified to
\tilde{V}_1 = -\tfrac{1}{2} \sum_{k=1}^{L} \lambda_k (v_k^+ q)^2 + \tfrac{B}{4} \sum_{k \neq k'} (v_k^+ q)^2 (v_{k'}^+ q)^2 + \tfrac{C}{4} \sum_{k,k'} (v_k^+ q)^2 (v_{k'}^+ q)^2   (9)
The modification in the third term results from the fact that now it is no longer the length of q, but that of the v_k^+ which has to be constrained. In order to include all learning samples q_j, a total potential V_1' is formed:

V_1' = \sum_{j} \tilde{V}_1(v_k^+, q_j).   (10)
But there is another point which has to be taken into account: during the minimisation process, the v_k^+ may run out of the subspace which is spanned by the q_j. Haken has shown that this problem can be fixed by adding a second potential term V_2,

V_2 = \tfrac{\gamma_1}{2} \left[ \big( (1 - P_1) q \big)^2 + \big( (1 - P_2) q \big)^2 \right],   (11)

which is derived from projection operators

P_i = \sum_k v_k^+ \bar{v}_k^+   (12)
introduced to restrict the v_k^+ to this subspace. As done above for the recognition process, a learning dynamics is derived from the potential V = V_1' + V_2 and leads to the equations
\dot{v}_k^+ = \lambda_k (v_k^+ q)\, q - B \sum_{k' \neq k}^{L} (v_k^+ q)(v_{k'}^+ q)^2\, q - C \sum_{k'=1}^{L} (v_k^+ q)(v_{k'}^+ q)^2\, q
\qquad + \gamma_1 \Big( 2 (\bar{v}_k q)\, q - \sum_{k'=1}^{L} (\bar{v}_k v_{k'}) (\bar{v}_{k'} q)\, q - \sum_{k'=1}^{L} (\bar{v}_k q)(\bar{v}_{k'} q)\, \bar{v}_{k'} \Big),   (14)

\dot{v}_k = \gamma_1 \Big( 2 (\bar{v}_k^+ q)\, q - \sum_{k'=1}^{L} (\bar{v}_k^+ v_{k'}^+)(\bar{v}_{k'}^+ q)\, q - \sum_{k'=1}^{L} (\bar{v}_k^+ q)(\bar{v}_{k'}^+ q)\, \bar{v}_{k'}^+ \Big).   (15)
The bars in these equations (such as in \bar{v}_k^+) denote transposed vectors. Haken has used these equations to perform autoassociative pattern analysis on images of persons. Banzhaf [1] has shown that this sort of dynamics is related to Kohonen's self-organized approaches [8]. Furthermore, he was able to show that receptive fields form if Gaussian distributed noise spots are offered to such a system. We suggest that it may be helpful to discuss such dynamics in the context of unsupervised feature extraction. In many pattern recognition problems, classes have to be separated by means of features. Finding the appropriate features is often the most difficult part of practical image processing problems and unfortunately has to be done again whenever the objects under investigation are changed. From the engineer's point of view it would be very helpful to have a system which extracts characteristic features on its own. We have conducted many experiments testing the performance of the above Eqs. (14) and (15) for unsupervised feature extraction, both on artificial and on real problems. Some of these experiments are presented in the following section.
3. Experiments
3.1. Computer generated images

Four different datasets have been used to demonstrate the performance of an unsupervised synergetic learning algorithm (Table 1). All computer generated data sets consist of 64 x 64 binary images. Fig. 2 shows data from samples POSITION_LEARN and POSITION_TEST, which contain learning and testing data of horizontal and vertical lines at different positions and of variable width. Samples ANGLE_LEARN and ANGLE_TEST contain images with lines of constant width but different orientations (Fig. 3). All patterns are normalized according to Eqs. (2) and (3), but do not undergo any further preprocessing steps. For the learning process, the number of learning classes is fixed in advance, that is, a fixed value is assigned to the variable L in Eqs. (14) and (15); this value is listed in the second column of Table 1.
Table 1
Computer generated datasets used for the unsupervised learning algorithm

Name            Learning classes   Patterns   Description
POSITION_LEARN  2                  80         Horizontal and vertical lines, variable width and variable position; learning sample
POSITION_TEST   -                  80         Horizontal and vertical lines, variable width and variable position; testing sample
ANGLE_LEARN     3                  36         Lines of variable directions, constant width; learning sample
ANGLE_TEST      -                  36         Lines of variable directions, constant width; testing sample

[Fig. 2: examples of the binary input images, horizontal and vertical lines of different widths.]
[Fig. 3: examples of the binary input images, lines of different orientations and constant width.]
Fig. 2. Input for unsupervised learning of two directions: vertical and horizontal lines of different width (used for samples POSITION_LEARN and POSITION_TEST).
Fig. 3. Input for unsupervised learning of many directions: lines of different orientation and constant width (used for samples ANGLE_LEARN and ANGLE_TEST).
The two directions of the patterns of POSITION_LEARN are forced into two classes, and for the variable directions of sample ANGLE_LEARN we allow three classes. All prototypes and adjoined prototypes are assigned random values. Then patterns of the learning sample are picked randomly and are offered repeatedly to the learning dynamics of Eqs. (14) and (15). Default values for the parameters B, C, and γ_1 were 1, 0.5, and 8, respectively. The value of λ has been varied from 2 to 10 in different simulations. In order to integrate the system of differential equations, we used a simple fixed time step method. We believe that in this case high accuracy in the integration is not necessary, since the underlying potential changes in time as different learning patterns are offered. The outcome for the adjoined prototypes is the result of a statistical average of the potentials related to the different learning samples. Details can be found in [6]. Some experiments with a fourth order Runge-Kutta method did not change the final result, but slowed down the integration process considerably.
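For illustration, the training procedure described above can be sketched as follows. This is our own simplified stand-in, not a literal transcription of Eqs. (14) and (15): it takes a plain gradient step on the modified potential of Eq. (9) and then re-projects the adjoined prototypes onto the subspace spanned by the learning patterns, which is the role played by the γ_1 terms. The function name and the stopping tolerance are assumptions.

```python
import numpy as np

def unsupervised_synergetic_learning(Q, L, lam=4.0, B=1.0, C=0.5,
                                     dt=0.1, tol=1e-4, max_iter=10000, seed=0):
    """Sketch of unsupervised synergetic learning (simplified; see lead-in).

    Q : learning patterns, one pattern per row, normalized as in Eqs. (2), (3).
    L : number of learning classes fixed in advance."""
    rng = np.random.default_rng(seed)
    M, N = Q.shape
    P = np.linalg.pinv(Q) @ Q                     # projector onto the subspace spanned by the patterns
    V_plus = 0.01 * rng.standard_normal((L, N))   # random initial adjoined prototypes
    for _ in range(max_iter):
        q = Q[rng.integers(M)]                    # offer a randomly picked learning pattern
        s = V_plus @ q                            # order parameters s_k = v_k^+ . q
        S = np.sum(s ** 2)
        drive = lam * s - B * s * (S - s ** 2) - C * s * S   # negative gradient of Eq. (9) along q
        step = dt * np.outer(drive, q)            # simple fixed-time-step (Euler) update
        V_plus = (V_plus + step) @ P              # keep the v_k^+ inside the pattern subspace
        if np.mean(np.abs(step)) < tol:           # average change of all components
            break
    return V_plus
```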
[Fig. 4: image panels showing prototypes 1 and 2 after 20, 40, 80, 160, 320, and 640 iterations.]
Fig. 4. Unsupervised learning of two directions and variable line width; time evolution of prototypes: each prototype specializes on one direction.
[Fig. 5: image panels showing prototypes 1, 2, and 3 after 0, 20, 40, 80, and 160 iterations.]
Fig. 5. Unsupervised learning of many directions and constant line width; time evolution of prototypes: each prototype specializes on a group of similarly oriented lines.
[Fig. 6: sensitivity of the three prototypes plotted against line orientation (in steps of 5°).]
Fig. 6. Sensitivity of the three prototypes for different input directions: each prototype is representing one direction.
The learning process is ended by either visually checking the adjoined prototypes for convergence or by calculating an average change in the components of all adjoined prototypes. The time evolution of the adjoined prototypes for the two learning samples POSITION_LEARN and ANGLE_LEARN is shown in Figs. 4 and 5, respectively. Fig. 4 demonstrates that the adjoined prototypes develop from their initially random state via a mixture of the different classes towards a specialization with respect to the orientation of the lines.
[Fig. 7: grey-scale matrix; y axis: prototypes 1 and 2; x axis: the 80 testing samples (horizontal lines, samples 1-40; vertical lines, samples 41-80).]
Fig. 7. Recognition of testing sample with lines of two orientations and variable line width: the classes are clearly separated.
For the integration of the differential equations leading to Figs. 4 and 5, a time step of Δt = 0.1 has been used. The number of necessary iterations can be seen from the figures. Fig. 5 shows the time evolution of the adjoined prototypes which form if the lines of variable directions (ANGLE_LEARN) are forced into three classes. Though there is no direct correspondence between the pattern space and the space of the adjoined prototypes, looking at the adjoined prototypes gives an idea of the features the prototypes have specialized on. Fig. 6 shows the sensitivity of the three prototypes as a function of line orientation. It can be clearly seen that each prototype has specialized on one direction. In order to test the generalization capabilities of the learning algorithm, we investigated the classification of the testing samples POSITION_TEST and ANGLE_TEST according to Eq. (8). As adjoined prototypes v_j^+ we used the results of the learning process. The dot products with the test images are depicted as grey scale values in Figs. 7 and 8, with high values of the dot product corresponding to dark areas. On the x axis the different testing samples are lined up, the y axis represents the adjoined prototypes. The testing samples are ordered, that is, for Fig. 7 all horizontal images have numbers less than 40, the rest being the vertical ones. In Fig. 8 the images are ordered according to their angle of rotation. Fig. 7 shows the results for test images with horizontal and vertical lines of variable width. The x axis represents the testing sample (40 horizontal and 40 vertical lines; only 3 representatives per class are given in the label) and the y axis the two prototypes. Recognition is depicted as a grey value, with dark areas representing large values of similarity. It can be seen that the test sample is classified properly into two classes by using the adjoined prototypes from the unsupervised learning process. The lines of variable direction get caught by the adjoined prototype with the closest direction (Fig. 8). As above, the x axis represents the 36 testing samples and the y axis the three prototypes. Sensitivity with respect to orientation has formed.
[Fig. 8: grey-scale matrix; y axis: the three prototypes; x axis: the 36 testing samples with variable orientations in steps of 5° (whole axis represents 180°).]
Fig. 8. Recognition of testing sample with lines of different orientations: the adjoined prototypes have developed sensitivity for different directions.
3.2. Real textural defects

In order to demonstrate the performance on real images, a texture with a real defect has been selected: it shows a fabric which is locally disturbed (Fig. 9). The image has been broken into subimages of smaller size. In order to achieve invariance with respect to translation, the logarithmic two-dimensional Fourier transform of each subimage was taken as input. Randomly chosen subimages were used as the learning sample. Again the resulting adjoined prototypes were used to classify the whole image according to Eq. (8). The classification result is shown as a binary decision in Fig. 10. The adjoined prototypes which have formed have developed the capability of distinguishing between the original and the disturbed areas. Note that the learning and recognition dynamics are exactly the same as in Section 3.1.
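A minimal sketch of this preprocessing step (our own illustration; the tile size of 32 pixels and the function name are assumptions, and taking the logarithm of the magnitude spectrum is one reading of the "logarithmic two-dimensional Fourier transform" mentioned above):

```python
import numpy as np

def log_fft_tiles(image, tile=32):
    """Break an image into square subimages and use the logarithm of the
    magnitude of their two-dimensional Fourier transforms as
    translation-invariant input patterns."""
    h, w = image.shape
    patterns = []
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            sub = image[y:y + tile, x:x + tile]
            spec = np.log1p(np.abs(np.fft.fft2(sub)))   # log magnitude spectrum
            patterns.append(spec.ravel())
    return np.asarray(patterns)   # one feature vector per subimage
```

The resulting feature vectors are then normalized according to Eqs. (2) and (3) and offered to the same learning and recognition dynamics as in Section 3.1.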
4. Discussion and outlook
We have demonstrated that synergetic algorithms can be successfully used to perform unsupervised texture classification tasks. Example runs on artificial as well as on real data have shown that the classes generated by synergetic algorithms correspond to features with a global meaning. Due to the close similarity of synergetic algorithms to self-organized phenomena in real systems, such algorithms are potential candidates for synergetic hardware. In this respect they are superior to conventional or even neural pattern classification algorithms. Due to the large variety of self-organized systems studied in recent years, we expect important contributions to hardware realizations from the fields of physics, chemistry and biology. The use of other synergetic potentials for problems in machine vision still has to be investigated.
[Fig. 9: photograph of the fabric texture with a local defect. Fig. 10: grid of subimage classifications; the defective region is marked.]
Fig. 9. Examples of a real texture defect.
Fig. 10. Classification result for unsupervised learning on a real texture defect.
Though there is some work in the field of stereopsis and the recognition of motion, a general means of applying synergetic algorithms to machine vision problems has still to be found. In our opinion, extending the concept of synergetic computing to the local neighborhood relations between single pixels could be very interesting. Including approaches which describe the different scales of a texture could yield synergetic algorithms in which the dynamics takes place at a global and a local scale at the same time. This would definitely constitute an important step towards unsupervised learning in complex machine vision systems.
Acknowledgements

We want to thank R. Frischholz for many inspiring discussions, U. Hassler and D. Geuder for implementing some of the algorithms, and K. Nicolich for checking our English.
References

[1] W. Banzhaf and H. Haken, Learning in a Competitive Network, Neural Networks 3 (1990) 423-435.
[2] E. Basar, H. Flohr and H. Haken, eds., Synergetics of the Brain (Springer, Berlin, 1983).
[3] L.A. Blumenfeld, Physics of Bioenergetic Processes (Springer, Berlin, 1983).
[4] H. Haken, Advanced Synergetics (Springer, Berlin, 1983).
[5] H. Haken, Synergetics. An Introduction (Springer, Berlin, 1983).
[6] H. Haken, Synergetic Computers and Cognition. A Top-Down Approach to Neural Nets (Springer, Berlin, 1991).
[7] A.V. Holden, J.V. Tucker and B.C. Thompson, Can excitable media be considered as computational systems?, Physica D 49 (1991) 240-246.
[8] T. Kohonen, Self-organized formation of topologically correct feature maps, Biological Cybernetics 43 (1982) 59-69.
[9] V.L Krinsky, V.N. Biktashev and LR. Efimov, Autowave Principles for Parallel Image Processing, Physica D 49 (1991) 247-253. [101 J. KriJger, ed., Neural Cooperativity (Springer, Berlin, 1991). [11] M. Markus, S.C. Miiller and G. Nicolis, eds., From Chemical to Biological Organisation (Springer, Berlin, 1988). [12] D. Rees, ed,, Proc. Introductory Course on Synergetic Engineering, Sydney: CSIRO Division of Radiophysics (1993). [13] M. Schindel, Theorie eines Halbleitersystems zur Realisierung der Ordnungsparameterdynamik eines Synergetischen Computers, Stuttgart: PhD thesis (1993). [14] U. Schramm et al., A Practical Comparison of Synergetic Computer, Restricted Coulomb Energy Networks and Multilayer Perceptron, Hillsdale, Lawrence Erlbaum Associates, World Congress of Neural Networks III (1993) 657-660. [15] C.-D. Schulz, Theorie eines Lasersystems zur Mustererkennung als optische Realisierung eines synergetischen Computers, Stuttgart: PhD thesis (1992). [ 16] H. Takayama, ed., Cooperative Dynamics in Complex Physical System (Springer, Berlin, 1989) [ 17] H. Ulrich and G.J.B. Probst, eds., Self-Organization and Management of Social Systems (Springer, Berlin, 1984). [ 18] C. Vidal and A. Pacault, eds., Nonlinear Phenomena in Chemical Dynamics (Springer, Berlin, 1984). [ 19] C. Vidal and A. Pacault, eds., Non-Equlilibrium Dynamics in Chemical Systems (Springer, Berlin, 1984). [20] T. Wagner et al. (1993), Using a Synergetic Computer in an Industrial Classification Problem, in: R.F. Albrecht, C.R. Reeves and N.C. Steele, eds., Artificial Neural Networks and Genetic Algorithms (Springer, Wien, 1993) pp. 206-212. [21] T. Wagner et al., Testing Synergetic Algorithms with Industrial Classification Problems, INNS Neural Networks (1993), accepted. [22] W. Weidlich and G. Haag, Concepts and Models of a Quantitative Sociology (Springer, Berlin, 1983). [ 23 ] W.-B. Zhang, Synergetic Economics (Springer, Berlin, 1991 ).