Pattern Recognition, Vol. 25, No. 8, pp. 883-889, 1992. Printed in Great Britain.
0031-3203/92 $5.00+.00. Pergamon Press Ltd. © 1992 Pattern Recognition Society.
PRIMITIVE PATTERN LEARNING

TONY Y. T. CHAN and LEV GOLDFARB†

Faculty of Computer Science, University of New Brunswick, Box 4400, Fredericton, N.B., Canada E3B 5A3; Intelligent Information Systems (IIS), 29 Pembroke Crescent, Fredericton, N.B., Canada E3B 2V1

(Received 13 March 1990; in revised form 7 October 1991; received for publication 11 December 1991)

Abstract--A new approach to the feature detection problem, i.e. learning "useful" primitive features from raw images, is proposed. The "useful" features are defined within the training environment as those that allow the learning agent (learning system) to form object representations sufficient for subsequent object recognition. In other words, the "useful" features detected are discriminating features.

Keywords: Primitive image features; Pattern learning; Machine vision; Local disparity function.

† Author to whom all correspondence should be addressed.
1. INTRODUCTION

The main objective of this paper is to propose a novel approach to learning "useful" syntactic primitive features, or primitive image fragments (PFs), from raw images. The approach conforms to the metric "philosophy" in pattern recognition (1-3), which is applied here to a low-level learning process. One of the main ideas behind the metric "philosophy", if not the main one, is that an appropriate distance function on the pattern space should be the basic tool for formalizing the learning processes at all levels where learning occurs. First of all, "pattern" in the above "pattern space" does not refer only to a usual high-level, gestalt, pattern but also to a PF pattern. Second, one of the important outcomes of the recent work (3) is that, in general, any useful distance function defined on a pattern space can be viewed as either composite or integrated: in the former case the distance function is computed using some given set of primitive operations, while in the latter case the distance function is represented by a "primitive" black-box procedure (see also reference (4)). In the classical pattern recognition model (in Euclidean space) these distinctions are blurred, because in this special case the distance is essentially predetermined by the vector representation itself. We suggest that the distance functions used in low-level learning can be viewed as integrated and that their efficient computation can be accomplished by special-purpose hardware devices.

We use the Taxicab (Cityblock) distance function defined for pairs of (primitive) image fragments (PFs) of the same size, each enclosed within a fixed window. Our choice of the Taxicab distance is rather incidental; several other low-level distance functions are known. An interesting application of one such distance function to the design of autonomous mobile robots is presented in reference (5).

Our basic model for analytical image representation,
which we call the local disparity function (LDF), comes from reference (6), as does the "static" version of the proposed algorithm. However, our dynamic learning algorithm solves a problem different from the one considered in reference (6). The basic problem we address in this paper can be stated as follows: given two disjoint sets J1 and J2 of image functions, how can a machine learn "useful" PFs? By a useful PF for J1, for example, we mean a PF by which images in set J1 can be distinguished from those in set J2. An important distinction of the proposed learning algorithm from that of reference (6) is that both the mask size (in the construction of the LDF) and the clipping window size (PF size) are adaptively selected by the algorithm. We are not aware of any PF selection algorithms that utilize both a dynamic image representation and dynamic window adjustment. On the other hand, such a general algorithm should be quite a useful and important tool in syntactic preprocessing models.

2. LOCAL DISPARITY FUNCTION AND THE CORRESPONDING CHARACTERISTIC PRIMITIVE FEATURES

Following reference (6), one can introduce an important concept associated with image representation which, unfortunately, has not been sufficiently investigated. First, let us choose the Gaussian mask (GM) (see Fig. 1). Given an input image I and a GM of size m, Gm (m is the number of pixels in the side of the mask), the LDF, f(I, Gm), is the function defined on the image whose value at a pixel p is equal to the Taxicab distance between the GM (considered as a vector) and the image fragment (also considered as a vector) cut out by the window of size m centered at p (Fig. 2). To obtain the values of the LDF for the border pixels of the image, we pad the image with blank pixels. It is clear that the LDF changes with the size of the GM. The image fragments corresponding to the local extrema in the LDF will be called extremal fragments,
Fig. 1. Three-dimensional plot of a Gaussian mask (GM).
Fig. 2. Three-dimensional plot of a local disparity function for the first triangle in Fig. 4(b).
or extremal primitive features, and those corresponding to the local maxima of the LDF will be called characteristic fragments, or characteristic primitive features (CPF).

Regarding human vision, it is important to note that there are three related facts concerning the role fixation points play in the processing of the image by the vision system. First, "the eye movements of a human consist almost entirely of the periods during which the vision axis remains in a relatively stable position (the fixation point), and these periods take the main share of the time during which the vision system perceives the image. Second,
under normal conditions the position of the fixation point coincides with a place where the viewer's attention is concentrated. Third, fixation points, and therefore the places of attention concentration, are not arbitrarily positioned in the image: there exist image fragments that as a rule attract the attention of any viewer" (p. 9 of reference (7)). Experiments have shown that the locations of fixation points coincide with the locations of the local extrema in the LDFs.
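To make the construction concrete, the following minimal Python sketch computes an LDF. It is not the authors' implementation: the paper does not specify the GM's variance or normalization, so the choices sigma = m/4 and unit peak height below, as well as the helper names gaussian_mask and ldf, are assumptions made here for illustration.

    import numpy as np

    def gaussian_mask(m, sigma=None):
        # m x m Gaussian mask (GM); sigma = m/4 and unit peak height are
        # assumed here, since the paper does not specify them.
        sigma = sigma if sigma is not None else m / 4.0
        c = (m - 1) / 2.0
        y, x = np.mgrid[0:m, 0:m]
        g = np.exp(-((x - c) ** 2 + (y - c) ** 2) / (2.0 * sigma ** 2))
        return g / g.max()

    def ldf(image, m):
        # LDF f(I, Gm): at each pixel p, the Taxicab (L1) distance between
        # the GM and the m x m image fragment centered at p; the image is
        # padded with blank (zero) pixels for the border, as in the text.
        g = gaussian_mask(m)
        r = m // 2                      # m is assumed odd
        padded = np.pad(image.astype(float), r, constant_values=0.0)
        out = np.empty(image.shape, dtype=float)
        for i in range(image.shape[0]):
            for j in range(image.shape[1]):
                out[i, j] = np.abs(padded[i:i + m, j:j + m] - g).sum()
        return out

A direct double loop is used for clarity; in practice the same computation can be vectorized or, as suggested in the Introduction, delegated to special-purpose hardware.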
3. ALGORITHM FOR LEARNING USEFUL PRIMITIVE FEATURES
Let us assume that we begin with two finite disjoint sets of images J1 and J2, the learning sets. From the algorithmic point of view, which we adopt here, the precise nature of the two sets is not important except that they are disjoint. Let us also assume that the set C of all CPFs (for all images from J = J1 ∪ J2) is partitioned into k disjoint clusters: C = C1 ∪ C2 ∪ ... ∪ Ck.
We will say that cluster Ci is a class of useful PFs if it can alternatively be obtained by taking one or more CPFs from each of the images in one learning set. If some CPFs of Ci are found in both learning sets, Ci is not useful.

In the classical pattern recognition setting, the learning (training) occurs in a fixed Euclidean space (with the associated distance function). The learning process, however, becomes more powerful if it can be expanded to include processes that can dynamically modify the pattern representation space, i.e. the actual representation space becomes one of the "parameters" in the learning process.

The following algorithm takes the learning sets J1 and J2 of images as inputs and outputs the set UPFC, the set of all useful PF classes. To facilitate earlier detection of the UPFC, we introduce in the algorithm a clipping window (CW): a CW is used to extract fragments that cover the characteristic fragments. Thus, although the LDFs and the CPFs are computed using the same window size m, the size c of the clipping window becomes larger than m. The two nested while-loops in the algorithm below iterate on the mask size m and the CW size c until UPFC is found (success) or UPFC remains empty (failure). We note that in the case of failure the input learning sets J1 and J2 can be modified and the process can be repeated. (Some of the sets in the algorithm below are indexed by the raw images I ∈ J, although at the beginning of the algorithm each raw image I is replaced by the preprocessed image Ī.)

Algorithm LEARNUPFC(J1, J2, UPFC)
    /* Two constants are used: ubm is an upper bound and lbm is a lower bound on the mask size */
    UPFC ← ∅
    J ← J1 ∪ J2
    /* Preprocess the raw images to obtain the set J̄ */
    J̄ ← { Ī | Ī ← PREPROCESS(I), I ∈ J }
    m ← lbm
    while (m ≤ ubm) and (UPFC = ∅)
        /* For each image Ī, compute the matrix M_Ī representing its LDF */
        F ← { M_Ī | M_Ī ← f(Ī, Gm), Ī ∈ J̄ }
        /* For each of the LDFs, compute the set ℓ_Ī of the coordinates of all its local maxima */
        L ← { ℓ_Ī | ℓ_Ī ← LOCAL_MAX(M_Ī), M_Ī ∈ F }
        c ← m
        while (c ≤ ubm) and (UPFC = ∅)
            /* For each preprocessed image Ī, extract a set of fragments H_Ī by positioning the CW of size c at the locations specified in ℓ_Ī; if there are k maxima in ℓ_Ī, then there are k CPFs in H_Ī */
            H_Ī ← CLIP(Ī, ℓ_Ī, c)
            P ← ∪ H_Ī over Ī ∈ J̄
            /* Compute the distance matrix D of the Taxicab distances between all CPFs */
            D ← [ DIST(p_i, p_j) ], p_i, p_j ∈ P
            /* Perform clustering on (P, D) and obtain, if any, the useful PF classes, UPFC */
            CLUSTER(P, D, UPFC)
            c ← c + 1
        end while
        m ← m + 1
    end while
end LEARNUPFC
4. EXPERIMENTAL RESULTS

In this section we follow closely Algorithm LEARNUPFC of the last section to present some experimental results. The two input learning sets of images are shown in Fig. 3. Next, the preprocessing of the input images was performed by morphological operations, dilation followed by erosion: the images Ī shown in Fig. 4(b) are the closings of the images in J by the structural element S shown in Fig. 4(a), Ī = I • S = (I ⊕ S) ⊖ S (see Definition 36 in reference (8)). Obviously, if necessary, other preprocessing techniques can be applied: scaling, rotations, skeletonization, etc. The lower bound for the mask size (lbm) was chosen to be 11 and the upper bound (ubm) to be the smallest of the learning shape sizes. The corresponding set of coordinates of all the local maxima (for the first triangle in Fig. 4(b)) is ℓ1 = {(15, 95), (15, 96), (15, 97), (29, 134), (46, 17), (53, 175)}. To find the local maxima we used the following (parametric) definition of a local maximum. Pixel x* corresponds to a local maximum if and only if for the value of the LDF at this point, f(x*), we have: f(x*) ≥ f(x) for all pixels x under the mask Gm centered at x*, and f(x*) > f(x) for all pixels under the mask but outside the smaller window of size r (r ≤ m).
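A sketch of this parametric definition in Python (border pixels are simply skipped here, whereas padding could be used as in the LDF computation):

    import numpy as np

    def local_maxima(M, m, r):
        # Pixel (i, j) is a local maximum iff M[i, j] >= M over the whole
        # m x m window centered there, and strictly greater than M outside
        # the smaller r x r window (r <= m), per the definition in the text.
        hm, hr = m // 2, r // 2
        maxima = []
        for i in range(hm, M.shape[0] - hm):
            for j in range(hm, M.shape[1] - hm):
                window = M[i - hm:i + hm + 1, j - hm:j + hm + 1]
                v = M[i, j]
                if v < window.max():
                    continue                 # the >= condition fails
                outer = window.copy()
                outer[hm - hr:hm + hr + 1, hm - hr:hm + hr + 1] = -np.inf
                if v > outer.max():          # strict outside the r-window
                    maxima.append((i, j))
        return maxima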
The result of the CLIPping applied to the above image, the set H_Ī, is given in Fig. 5.
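The CLIP step itself reduces to extracting the c x c fragment covered by the CW at each maximum; a sketch, assuming odd c and blank (zero) padding at the image border:

    import numpy as np

    def clip(image, maxima, c):
        # One CPF per local maximum: the fragment cut out by the clipping
        # window (CW) of size c centered at each location in `maxima`.
        h = c // 2
        padded = np.pad(image.astype(float), h, constant_values=0.0)
        return [padded[i:i + c, j:j + c] for (i, j) in maxima]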
Fig. 3. The learning sets of images, J1 and J2, used in the experiment.
Fig. 4. (a) The structural element used in the morphological preprocessing of the learning sets of images. (b) The learning sets after preprocessing.
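The closing used in this preprocessing, Ī = (I ⊕ S) ⊖ S, is available in standard libraries; a sketch using scipy (the authors' own implementation is not specified, and here S is passed explicitly, whereas the skeleton in Section 3 assumed a one-argument preprocess):

    from scipy.ndimage import binary_closing

    def preprocess(I, S):
        # Morphological closing of binary image I by structural element S:
        # dilation followed by erosion, as in Section 4.
        return binary_closing(I.astype(bool), structure=S.astype(bool))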
Fig. 5. The characteristic fragments extracted (CLIPped) from the first triangle in set J1 in Fig. 4(b).
For our experiment we chose the distance DIST(p1, p2) to be the Taxicab distance between the fragments p1 and p2 after dilating them by the same structural element that was used in the preprocessing (Fig. 4(a)). Finally, the clustering of the CPFs is performed using the minimal spanning tree clustering algorithm (see, for example, reference (9)). The resulting two useful PF classes are shown in Fig. 6.
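A sketch of this final stage follows. DIST dilates both fragments by the structural element S and takes the Taxicab distance; the clustering builds a minimal spanning tree over the distance matrix and cuts long edges, in the spirit of reference (9). The particular cut rule (edges longer than twice the mean MST edge length) is an assumption, since the paper does not state which criterion it uses; strictly positive distances between distinct fragments are also assumed.

    import numpy as np
    from scipy.ndimage import binary_dilation
    from scipy.sparse.csgraph import minimum_spanning_tree, connected_components

    def DIST(p1, p2, S):
        # Taxicab distance between two fragments after dilating each by the
        # same structural element S that was used in the preprocessing.
        d1 = binary_dilation(p1.astype(bool), structure=S).astype(int)
        d2 = binary_dilation(p2.astype(bool), structure=S).astype(int)
        return np.abs(d1 - d2).sum()

    def mst_clusters(D, factor=2.0):
        # Build the MST of the complete distance graph and delete edges
        # longer than factor * mean MST edge length (an assumed cut rule);
        # the connected components that remain are the clusters.
        T = minimum_spanning_tree(D).toarray()
        edges = T[T > 0]
        T[T > factor * edges.mean()] = 0.0
        n, labels = connected_components(T != 0, directed=False)
        return [np.where(labels == i)[0] for i in range(n)]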
5. SUMMARY
In this section we contrast the primitive pattern learning model proposed here with the one proposed by Muchnik in reference (6). First of all, it is important to single out the new form of image representation introduced in reference (6) under the name "informative function", which we have called the local disparity function (LDF) and which is used in both learning algorithms. The LDF is a function defined over the image and is obtained by measuring the local disparity (distance) between a fixed Gaussian mask and the image fragment under it. The basic difference between the two learning algorithms is best understood if one keeps in mind the fact that they provide solutions to two quite different problems. One of the problems addressed in reference (6) is: given one set of images, construct an alphabet of primitive "forms" (features) for image representation. On the other hand, the problem considered in this paper can be stated as follows: given two sets of images, find useful primitive features, where a useful feature is a discriminating feature with respect to the two input sets of images.

Behind the two problem formulations lurks an important issue related to the meaningfulness of clustering: is it legitimate (1) to cluster image fragments for a single class of images and (2) to form the primitive features on the basis of only a fixed Taxicab distance? Our answer is no to both. That is why we assume two input learning sets instead of one, and allow for other distance functions between the image fragments. In addition, the presence of two learning image sets is used in the learning algorithm to choose values for some of the basic parameters associated with the image fragments, such as the mask and clipping window sizes. Other parameters can similarly be introduced into the algorithm.

It goes without saying that the proposed learning algorithm can also be used when there are more than two input learning sets. In this case, as always, it is reasonable to apply the above algorithm within a loop which determines an optimal partition of the input learning sets into two learning sets. In fact, referring to the above experiments, it is interesting to note that if the original learning sets are rearranged in such a way that J1 is the set of all triangles and J2 the set of all rectangles, then the acute angle becomes the only UPFC.

Although we have not discussed the recognition stage, one should point out that in general the on-line recognition of new primitive patterns can be accomplished within the metric model proposed in references (1, 2) (using the above distance DIST).
""
•
NN
-•• •N MRI
N~
• •
~•KN
M.
NM
MN
~N
MK
.•l
uu
TM
I•
• ~•
•N
..R.
• u
Fig. 6. Two output classes corresponding to two useful primitive features for the two input learning sets of images.
--
N••
INN~mlnXMK~NXXNmU~H~NXMNI~N
U
M.
N ~-
-
~INMWU~mN~N~Nm~NMN~NI INIMmi
). Im
o
.t-
z
00
Finally, we note that a stronger version of the proposed learning algorithm, utilizing the weight learning model of reference (3), will be considered in a follow-up paper.
REFERENCES

1. L. Goldfarb, A new approach to pattern recognition, Progress in Pattern Recognition 2, L. N. Kanal and A. Rosenfeld, eds. North-Holland, Amsterdam (1985).
2. L. Goldfarb, A unified approach to pattern recognition, Pattern Recognition 17, 575-582 (1984).
3. L. Goldfarb, On the foundations of intelligent processes--I. An evolving model for pattern learning, Pattern Recognition 23, 595-616 (1990).
4. L. Goldfarb, On the foundations of intelligent processes--II. A general structure of learning machines, in preparation.
5. H. Suzuki and S. Arimoto, Visual control of autonomous mobile robot based on self-organizing model for pattern learning, J. Robotic Syst. 5(5), 453-470 (1988).
6. I. B. Muchnik, Simulation of the process of forming the language for description and analysis of the forms of images, Pattern Recognition 4, 101-140 (1972).
7. N. V. Zavalishin and I. B. Muchnik, Models of Visual Perception and Image Analysis Algorithms. Nauka, Moscow (1974).
8. R. M. Haralick, S. R. Sternberg and X. Zhuang, Image analysis using mathematical morphology: Part I, IEEE Trans. Pattern Anal. Mach. Intell. 9(4), 532-550 (1987).
9. C. T. Zahn, Graph-theoretic methods for detecting and describing gestalt clusters, IEEE Trans. Comput. C-20, 68-86 (1971).
About the Author--TONY CHAN was born in Hong Kong in 1960. He received the B.S. degree in computer science from the University of New Brunswick in 1983 and the M.S. in 1985 under the supervision of Professor Goldfarb, and he is currently finishing his Ph.D. program, again with Professor Goldfarb. He has been involved with the development of a unified model for (machine) learning based on (pseudometric) distances. This "distance" direction was first set out by Goldfarb. He taught full-time at the UNB Saint John campus between 1989 and 1991. He married a Christian girl by the name of Lynda J. Golding on 28 December 1991.
About the Author--LEV GOLDFARB received the Diploma in mathematics and computer science from Leningrad University, U.S.S.R., in 1973 and the Ph.D. degree in systems design from the University of Waterloo, Canada, in 1980. He was awarded an N.S.E.R.C. Post-doctoral Fellowship for 1980-82, and he spent the two years at the School of Computer Science, McGill University, teaching and writing the monograph "A New Approach to Pattern Recognition". Since 1982 he has been with the School (now Faculty) of Computer Science, University of New Brunswick. His main research interests are in the foundations of artificial intelligence, pattern recognition, machine learning, and computer vision, as well as in the design of associated parallel computer architectures. Lev Goldfarb is an associate editor of Pattern Recognition and a member of the Pattern Recognition Society.