Pattern Recognition, Vol. 29, No. 8, pp. 1323-1333, 1996. Copyright © 1996 Pattern Recognition Society. Published by Elsevier Science Ltd. Printed in Great Britain. All rights reserved. 0031-3203/96 $15.00+.00
0031-3203(95)00158-1
SHAPE ANALYSIS USING HYBRID LEARNING

JERZY BALA and HARRY WECHSLER
Department of Computer Science, George Mason University, Fairfax, VA 22030, U.S.A.

(Received 13 March 1995; in revised form 17 October 1995; received for publication 24 November 1995)
Abstract
This paper is concerned with hybrid learning and it describes how to combine evolution and symbolic learning for shape analysis. The methodology introduced in this paper integrates genetic algorithms (GAs), characteristic of evolutionary learning, with empirical inductive generalization, characteristic of symbolic learning. GAs evolve operators that discriminate among image classes comprising different shapes, where the operators are defined as variable morphological structuring elements that can be sequenced as program forms. The optimal operators evolved by GAs are used to derive discriminant feature vectors, which are then used by empirical inductive learning to generate rule-based class descriptions in disjunctive normal form (DNF). The GA constitutes the data-driven, performance-oriented part of the shape analysis system, while the empirical inductive generalization is the model-driven part of the system. The rule-based descriptions are finally optimized by removing small disjuncts in order to enhance the robustness of the shape analysis system. Experimental results are presented to illustrate the feasibility of our novel methodology for discriminating among classes of different shaped objects and for learning the concepts of convexity and concavity. Copyright © 1996 Pattern Recognition Society. Published by Elsevier Science Ltd.
Shape analysis    Hybrid learning    Morphological processing    Inductive learning    Genetic algorithms
1. INTRODUCTION
Shape, an intrinsic property of objects from which many other properties such as motion and depth can be derived, is crucial for object recognition. Shape analysis, doubtless one of the most important capabilities of the human visual system, is also one of the major challenges for computer vision. It is not easy to describe or quantify shape information such that it can be learned and discriminated. One of the major difficulties is the lack of an appropriate representation for shape description and analysis. Learning strategies and representations are essential components of adaptive intelligent systems. In many situations it is unlikely that a single strategy and representation would serve all the system's needs. For example, a vision system capable of learning to recognize shapes at different levels of detail must have at its disposal the corresponding representations and learning strategies to do so efficiently. When multiple tasks need to be learned and performed in a coordinated manner, it is necessary for a system to handle multiple representations and execute different learning strategies. This paper is concerned with hybrid learning and it describes how to integrate evolution and symbolic learning in order to create high-performance shape analysis systems. The specific methodology introduced in this paper integrates morphological processing and genetic algorithms (GAs) with empirical inductive generalization. GAs evolve operators that discriminate among image classes comprising different shapes, where the operators are defined as variable
morphological structuring elements that can be sequenced as program forms. The optimal operators evolved by GAs derive discriminant feature vectors, which are then used by empirical inductive learning to generate rule-based class descriptions in disjunctive normal form (DNF). The GA constitutes the data-driven, performance-oriented part of the shape recognition system, while the empirical inductive generalization is the model-driven part of the system. The rule-based descriptions are finally optimized by removing small disjuncts in order to enhance the robustness of the shape analysis system. The following sections of this paper present the methodology for building such shape analysis systems, describe the analytical tools being used and their specific instantiation, and finally report on experimental results to prove the feasibility of our novel approach.
2. METHODOLOGY
The rationale behind our approach is the belief(1) that further advances in pattern analysis and classification require the integration of various learning processes in a modular fashion. Learning systems that employ several strategies can potentially offer significant advantages over single-strategy systems. Since the types of input and acquired knowledge are more flexible, such hybrid systems can be applied to a wider range of problems. Examples of such integration include combinations of genetic algorithms and neural networks(2) and genetic algorithms and rule-based systems.(3)
Subsymbolic learning techniques (e.g. connectionist networks, genetic algorithms) have been steadily gaining in popularity and now outperform other techniques on many real-world tasks (e.g. learning in vision). One advantage of these approaches is that they seem to be more effective than their symbolic counterparts in dealing with noisy and incomplete information. The downside is that subsymbolic learning is usually a time-consuming process and, more importantly, that the representations learned lack explanatory power. On the other hand, symbolic reasoning techniques are very good at producing explanations of how input data is mapped onto output. This paper addresses the problem of combining the advantages of genetic algorithm and rule-based learning techniques in order to yield high-performance shape analysis systems. Shape analysis should start by processing numerical information, while later stages of analysis, for compactness and efficiency reasons, require symbolic processing. Figure 1 describes our system approach and shows the specific components and their interconnections. We briefly explain the structure and the flow of control supporting the shape analysis system, while a detailed description is deferred to Section 3.
Fig. 1. System diagram of the shape analysis system.
Original shapes are subject to morphological processing (MP), where the goal is to derive optimal operators using genetic algorithms (GAs). Optimality is in terms of the classification performance of specific discriminative feature vectors (DFV) obtained as a result of using the morphological operation of erosion, to be described later on. The shape training exemplars described by such numerical feature vectors are used by empirical inductive learning (EIL) to derive rule-based descriptions in disjunctive normal form (DNF). The rules obtained are not necessarily optimal, so an optimization cycle is applied that iteratively changes the DNF into temporary forms DNF' by deleting insignificant rule components. The net result of each optimization cycle is to delete disjuncts deemed insignificant for shape analysis in terms of their influence on class membership assignment. The optimization component of each iteration has to be followed by a relearning EIL stage, where the training exemplars (DFVs) corresponding to the disjuncts previously eliminated are labeled as outliers and removed from further learning by a truncation process taking place once an initial DNF' has been derived. The need for relearning is motivated by the fact that outlier removal affects not only the description of the outliers' own class, but other classes as well, as would be the case for nonlinearly separable classes. Such rule-based optimization leads to both efficiency, i.e. simpler rules, and robustness. The DNF learned using such an optimization-relearning cycle becomes the final classifier.
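Read as a pipeline, the flow of control of Fig. 1 can be summarized by the following minimal Python sketch. The four injected functions are hypothetical stand-ins for the components detailed in Sections 3 and 4, not an actual API:

```python
def shape_analysis_pipeline(images, labels, evolve_operator, extract_features,
                            learn_dnf, optimize_rules, generations=15):
    """End-to-end flow: MP + GA -> DFVs -> EIL -> optimized DNF classifier."""
    # Subsymbolic stage (Section 3): evolve a morphological operator that
    # maximizes the interclass distance of the resulting feature vectors.
    operator = evolve_operator(images, labels, generations)
    dfvs = [extract_features(operator, image) for image in images]
    # Symbolic stage (Section 4): induce DNF rules, then apply the
    # optimization-truncation-relearning cycle for the final classifier.
    dnf = learn_dnf(dfvs, labels)
    return optimize_rules(dnf, dfvs, labels)
```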
3. SUBSYMBOLIC LEARNING
We describe in this section the specific tools used for shape analysis, their implementation and interaction, and how they provide for efficient and robust shape discrimination. As we will show below, the system as a whole is representative of a new generation of image analysis systems whose emphasis is on employing different learning techniques and strategies.
3.1. Morphology
Mathematical morphology(4,5) can quantify many aspects of the geometrical structure of images in a way that agrees with human intuition and perception. Mathematical morphology has been widely used for biomedical and electron microscopy image analysis and has also been a valuable tool in many computer vision applications, especially in the area of automated visual inspection. Morphological processing can be employed for many purposes, including preprocessing, edge detection, segmentation and object recognition. Morphological expressions are defined as combinations of image operations, the simplest of which are the operations of erosion and dilation. The morphological approach is based upon the analysis of an image in terms of some predetermined geometric shape templates known as elemental structuring elements. The manner in which the structuring elements can be embedded into the original shape using a specific
Fig. 2. Example of two-dimensional morphology: (a) image X; (b) structuring element B (filter); (c) erosion of X by B; (d) dilation of X by B.
sequence of operators leads eventually to shape classification and/or discrimination. The morphological operators can be thought of as filters that encode the original shape, lead to data compression and provide "features" as needed for shape discrimination. Morphological operators are defined as those operations where an object is filtered with some elemental structuring element and thereby reduced to its more revealing characteristic features. Most morphological operations can be defined in terms of two basic operations, erosion and dilation. Suppose the shape X and the structuring element B are represented as sets in two-dimensional (2D) Euclidean space. Let Bx denote the translation of B so that its origin is located at x. The erosion of X by B is then defined as the set of all points x such that Bx is included in X, that is:

Erosion:
X ⊖ B = {x : Bx ⊆ X}.   (1)
Similarly, the dilation of X by B is defined as the set of all points x such that Bx hits X, that is, their intersection is not empty:

Dilation:
X ⊕ B = {x : Bx ∩ X ≠ ∅}.   (2)
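As a concrete illustration of equations (1) and (2), the following minimal Python sketch computes both operations on a small binary image; the choice of scipy.ndimage here is ours and not part of the original system:

```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

# Shape X and structuring element B as binary arrays (True = foreground).
X = np.zeros((9, 9), dtype=bool)
X[2:7, 2:7] = True                 # a 5 x 5 square shape
B = np.ones((3, 3), dtype=bool)    # 3 x 3 structuring element

# Eq. (1): all points x whose translated B fits entirely inside X (shrinking).
eroded = binary_erosion(X, structure=B)
# Eq. (2): all points x whose translated B intersects X (expansion).
dilated = binary_dilation(X, structure=B)

print(X.sum(), eroded.sum(), dilated.sum())    # 25 9 49
```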
Figure 2 shows examples of erosion and dilation; erosion is a shrinking operation, whereas dilation is an expansion operation. The encoding scheme used herein to implement morphological processing consists of an ordered list of fields together with a look-up function, which indicates how the bit strings are to be decoded to produce feature extraction operators. Each operator field encodes an elemental structuring element together with the type of operation applied to it and the corresponding repetition factor needed to replicate the operation. The morphological bit strings as a whole thus encode how to combine different elemental structuring elements into a valid algebraic expression and how to use the encoded operators to extract shape features. Figure 3 describes the morphological encoding scheme used to implement specific operator structures and to extract predefined discriminating features. The scheme consists of four operator fields (F1, ..., F4). Each of the four fields encodes one of 32 different structuring elements, a type of operation and a repetition factor. Structuring elements are defined over 3 × 3 windows. There are two
types of operation, 0 for dilation and 1 for erosion, while a repetition factor bit of one indicates that a given operation (dilation or erosion) is repeated. The first (left-hand) seven bits of the string are decoded and the first elemental structuring element B is chosen. The operations of dilation or erosion are performed on this element. The first element serves as the initial 2D structure and at the same time as the structuring element. It is chosen randomly using a look-up table that defines initial elements. To start the whole process, the result of the first dilation or erosion operation is defined (by default) as the initial 2D structure itself, or its equivalent structuring element. The next seven bits are decoded and the same process is repeated, but this time the dilation or erosion operation is applied to the 2D structure obtained from the previous operation, using the next structuring element. Successive operations, as defined by the remaining fields of the expression, evolve into the final operator defined over an 11 × 11 window. It should be noted that the first elements of the string (on its left-hand side) are mainly responsible for the final shape of the operator, while the elements on the right-hand side introduce only minor changes. This type of "linear" coding is important for the crossover operation (Section 3.2). The final operator erodes each shape, resulting in a new shape, and the feature set is extracted from this new shape. The feature extraction process is illustrated in Fig. 4 using an eroded shape. The first feature is defined as the percentage of eroded pixels relative to the original number of pixels in the window, and its values are normalized to the range [1, ..., 10]. Four additional features are derived as illustrated in Fig. 4. Four radius lines R1 to R4, from the center of the window to the boundary of the shape contour, are defined; the longest radius is denoted R1 and the others are defined counterclockwise, each 45° apart from the previous one. Radii R1 to R4 are measured, in terms of pixel length, before and after applying the final erosion operator. The absolute differences for each radius constitute the remaining four features. The most discriminating feature vectors, as found by the genetic algorithm, are eventually used by empirical inductive learning to generate rule-based class descriptions.
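The decoding just described can be sketched as follows. The field layout (five bits for the element index, one bit for the operation type, one bit for the repetition factor) follows the text, but the look-up table contents, the doubling interpretation of the repetition bit and the helper names are our own illustrative assumptions:

```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

def decode_operator(bits, lookup):
    """Decode a 28-bit string (four 7-bit fields F1..F4) into an operator
    shape defined over an 11 x 11 window."""
    window = None
    for i in range(4):
        field = bits[7 * i:7 * i + 7]
        element = lookup[int(field[:5], 2)]      # 5 bits: one of 32 elements
        if window is None:
            # By default, the result of the first operation is the initial
            # 2D structure itself: the first element embedded in 11 x 11.
            window = np.pad(element, 4)
            continue
        op = binary_erosion if field[5] == '1' else binary_dilation
        for _ in range(2 if field[6] == '1' else 1):   # repetition bit
            window = op(window, structure=element)
    return window

def first_feature(shape, operator_shape):
    """Feature x1: percentage of eroded pixels, scaled to roughly 1..10.
    Features x2..x5 would be the four radii differences of Fig. 4."""
    eroded = binary_erosion(shape, structure=operator_shape)
    fraction = 1.0 - eroded.sum() / max(shape.sum(), 1)
    return max(1, round(10 * fraction))

# Hypothetical usage with a placeholder look-up table of 32 elements.
lookup = {i: np.ones((3, 3), dtype=bool) for i in range(32)}
operator = decode_operator('1100110' '1111011' '1010010' '1110101', lookup)
```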
Fig. 3. Morphological encoding scheme for operator structure and feature extraction.
Fig. 4. Four radius features.
3.2. Genetic algorithms (GAs)

Genetic algorithms(6) are optimization and adaptation techniques that maintain a constant-sized population of candidate solutions, known as individuals.
The initial seed population can be chosen randomly or on the basis of heuristics, if those are available for a given application. At each iteration, known as a generation, each individual is evaluated and recombined with others on the basis of its overall quality or fitness in solving the task. The expected number of times an individual is selected for recombination is proportional to its fitness relative to the rest of the population. The power of a genetic algorithm lies in its ability to exploit, in a highly efficient manner, information about a large number of individuals. The search underlying GAs is such that breadth and depth are balanced according to the observed performance of the algebraic expressions evaluated so far. By allocating more reproductive occurrences to above-average individuals, the overall effect is to increase the population's average fitness. New individuals are created using two
main genetic recombination operators known as crossover and mutation. Crossover operates by selecting a random location in the genetic string of the parents (the crossover point) and concatenating the initial segment of one parent with the final segment of the other parent to create a new child. A second child is simultaneously generated using the remaining segments of the two parents. Mutation provides for occasional disturbances in the crossover operation by inverting one or more genetic elements during reproduction. This operation ensures diversity in the genetic strings over long periods of time and prevents stagnation in the evolution of optimal morphological expressions. The individuals in the population are typically represented using a binary notation to promote efficiency and application independence of the genetic operations. GAs are used to generate the population of morphological expressions for shape discrimination. By using simple genetic operators (crossover, mutation and selection), the population evolves and only the strongest feature extraction operators survive, thus contributing to improved overall performance in terms of shape recognition and/or discrimination. This performance is used as an objective evaluation function to drive the learning process in its search for new and useful morphological expressions. A crossover position is chosen randomly in the initial string representation. The crossover operation, when applied to two operators, produces offspring strings (operators). Examples of crossover and mutation operations are depicted in Fig. 5.
Fig. 5. Crossover and mutation operations.
Selection is determined using fitness evaluation. The evaluation of each shape operator is based on its ability to cluster training examples. For each training exemplar, the Euclidean distance to all other exemplars is computed. The overall measure of performance is the sum of the pairwise class distances computed over their corresponding members. Note that the reason to maximize the interclass distance rather than the usual ratio of interclass/intraclass distance is due to the way empirical inductive learning (EIL) operates. Specifically, the primary goal for EIL is to generate discriminatory rule-based descriptions rather than to specialize the cover descriptions and make them compact. Further information on the subsymbolic part is available in Bala and Wechsler (1993).(7)
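The genetic operators and the fitness criterion described above can be sketched as follows; this is a minimal illustration, and the mutation rate is an arbitrary choice rather than a value from the paper:

```python
import random
import numpy as np

def crossover(parent1, parent2):
    """One-point crossover: swap the tails of two binary strings."""
    point = random.randrange(1, len(parent1))
    return (parent1[:point] + parent2[point:],
            parent2[:point] + parent1[point:])

def mutate(individual, rate=0.01):
    """Invert each bit with a small probability, preserving diversity."""
    flip = {'0': '1', '1': '0'}
    return ''.join(flip[bit] if random.random() < rate else bit
                   for bit in individual)

def fitness(feature_vectors, labels):
    """Sum of pairwise Euclidean distances between exemplars of different
    classes: the interclass-distance criterion used to evaluate operators."""
    vectors = [np.asarray(v, dtype=float) for v in feature_vectors]
    total = 0.0
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if labels[i] != labels[j]:
                total += float(np.linalg.norm(vectors[i] - vectors[j]))
    return total
```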
4. EMPIRICAL INDUCTIVE LEARNING

In learning from examples (also called concept acquisition), the task is to determine a general cover description explaining all positive examples of the target concept but excluding examples of other concepts (negative examples). We briefly overview next the AQ algorithm as an example of such empirical inductive learning.

4.1. Inductive learning using AQ learning

The AQ algorithm(8) learns attributional cover descriptions from examples. When building a decision rule, AQ performs a heuristic search through a space of logical expressions to determine those that account for all positive examples and fail to account for negative examples. Since there are usually many such complete and consistent expressions, the goal of AQ is to find the most preferred one, according to some criterion. Training exemplars are given in the form of events, which are vectors of attribute values. Attributes may
be of three types: nominal (e.g. color), linear (e.g. temperature) or structured. Events represent different decision classes. Events from a given class are counted as positive examples, while all other events are counted as negative examples. For each class a decision rule is then produced that covers all positive examples and no negative ones. The concept descriptions learned by the AQ algorithm are represented in VL1, a simplified version of the variable-valued logic system VL,(9) and are used to represent attributional concept descriptions. A description of a concept is a disjunctive normal form (DNF), which is termed a cover. A cover is a disjunction of complexes, and a complex is a conjunction of selectors. A selector is a form [L # R], where L, termed the referee, is an attribute; R, the referent, is a set of values in the domain of the attribute L; and # is one of the following relational symbols: =, <, >, >=, <=, <>. The following is an example of an AQ complex (equality is used as the relational symbol):
[x1 = 1..7] and [x2 = 5..7] and [x4 = 2] and [x5 = 1]   (t:3)

where (t:3) represents the total-weight parameter, which is the total number of examples covered by the complex and is used to order disjuncts in decreasing order. The following is an example of the convex cover generated by the AQ algorithm for the convex-concave shape discrimination experiment described later:

(1) [x1 = 5..10] and [x2 = 4..7] and [x3 = 2..5] and [x4 = 0..2]   (t:4)
(2) [x1 = 4..8] and [x2 = 8..10] and [x3 = 4..6] and [x5 = 1..3]   (t:3)
(3) [x3 = 5..12] and [x4 = 4..6] and [x5 = 0..3]   (t:2)
(4) [x1 = 4..7] and [x3 = 9..12] and [x4 = 4..8] and [x5 = 1]   (t:1)
(5) [x1 = 1..6] and [x2 = 8..9] and [x3 = 0..4] and [x4 = 2]   (t:1)

4.2. Flexible matching for shape recognition

The DNF covers, once learned using AQ, are used to classify unknown shape instances. There are two methods for recognizing the membership of an instance: strict and flexible matching.(10) In strict matching, one tests whether an instance strictly satisfies the condition part of a rule (a complex), while in flexible matching one determines only the degree of closeness between the example and the concept description. The degree of closeness is calculated according to the following procedure. For a given condition of a rule [xn = valj] and an instance where xn = valk, the normalized value of the match of the instance to the conditional part of the rule is computed as:

1 - (|valj - valk|/levels),   (3)

where levels is the total number of attribute values for xn. If a condition of a rule is expressed as the range [xn = valj1..valj2], then the closest value (valj1 or valj2) to valk is chosen as valj in formula (3). The total match is then computed by multiplying the evaluation values of the matches to each condition of the rule. The total evaluation of class membership of a given test instance to a concept description (set of rules) is equal to the value of the best matching rule. For example, the match c of the test instance:

x = (4, 5, 24, 34, 0, 12, 6, 25)

(each attribute assumes a range consisting of 55 values) to the rule:

[x1 = 0] and [x2 = 1..2] and [x7 = 10] and [x8 = 10..20]

is computed as follows:

cx1 = 1 - (|0 - 4|/55) = 0.928
cx2 = 1 - (|2 - 5|/55) = 0.946
cx7 = 1 - (|10 - 6|/55) = 0.928
cx8 = 1 - (|20 - 25|/55) = 0.91

Since x3, x4, x5 and x6 are not present in the rule, cx3 = cx4 = cx5 = cx6 = 1, and hence

c = cx1 · cx2 · cx3 · cx4 · cx5 · cx6 · cx7 · cx8 = 0.74.

An example of a matching function is depicted in Fig. 6(a). This function shows the degree of match for a simple two-attribute rule, [x1 = 2..4] and [x2 = 3..5], for all points of the representation space (x1, x2) defined by 10 levels for each attribute. Figure 6(b) shows four types of regions in the representation space created by the evaluation function. Region type 1 is strictly matched by the rule. Region type 2 represents the part of the representation space satisfied by the selector x1 only. Points of this region are evaluated to 1 - Dx2/Nx2, where Dx2 is the distance from a given point to the rule border line (x2 = 3 or x2 = 5) and Nx2 is the total number of x2 attribute values (10 values for the Fig. 6 example). Region type 3 represents the part of the representation space satisfied by the selector x2 only. Points of this region are evaluated to 1 - Dx1/Nx1, where Dx1 is the distance from a given point to the rule border line (x1 = 2 or x1 = 4) and Nx1 is the total number of x1 attribute values. The last region (type 4) represents the areas satisfied by neither selector of the rule. Points of this region are evaluated to:

(1 - Dx1/Nx1)(1 - Dx2/Nx2).   (4)
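The computation of formula (3), including the range handling and the multiplicative combination used in the worked example above, can be written compactly as follows (a sketch; the dictionary-based rule representation is our own):

```python
def flexible_match(rule, instance, levels=55):
    """Degree of match of an instance to a rule per formula (3). `rule` maps
    an attribute name to its (low, high) range; attributes absent from the
    rule contribute a factor of 1."""
    c = 1.0
    for attr, (low, high) in rule.items():
        val = instance[attr]
        if low <= val <= high:
            continue                       # condition strictly satisfied
        closest = low if abs(low - val) < abs(high - val) else high
        c *= 1.0 - abs(closest - val) / levels
    return c

# The worked example above, reproduced: the result is about 0.74.
rule = {'x1': (0, 0), 'x2': (1, 2), 'x7': (10, 10), 'x8': (10, 20)}
instance = {'x1': 4, 'x2': 5, 'x3': 24, 'x4': 34,
            'x5': 0, 'x6': 12, 'x7': 6, 'x8': 25}
print(round(flexible_match(rule, instance), 2))    # 0.74
```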
Figure 6(b) shows that although logic-style rules are used to represent class descriptions, a flexible evaluation of the match of an unknown example not covered by the rule can still be computed.
Fig. 6. Flexible matching function for the rule [x1 = 2..4] and [x2 = 3..5]: (a) degree of match; (b) four types of regions in the representation space.
Moreover, depending on the region in which the example is located, one can detect which selectors (rule conditional parts) are satisfied. This detection process adds comprehensibility to the flexible matching function.

4.3. Optimization of learned descriptions

There are two basic groups of approaches to learning from data with outliers. One is to allow a certain degree of inconsistent classification of training examples so that the learned descriptions are general enough to describe the basic characteristics of a concept. This approach has been taken by the ID family of algorithms.(11) The main outlier-handling mechanism for decision trees is tree pruning. There are two types of tree pruning:(12) pre-pruning (example removal), performed during the construction of a decision tree, and post-pruning (tree pruning), used after the decision tree is constructed. The second group of approaches discards some of the unimportant rules/subtrees and
retains those covering the largest number of examples. The remaining rules thus provide a general description of the concept. This approach has been taken by the AQ family of programs.(13,14) Since the above methods try to remove outliers in one step, they share a common problem: the final descriptions are based on the initial data. For methods applying pre-truncation, the search for the best disjunct/subtree is also influenced by noise in the training data. This problem is more severe for post-pruning (subtree removal) and post-truncation (rule removal) methods. Such post-learning concept optimization cannot reorganize concept boundaries to generalize descriptions over unseen examples. For example, post-learning concept optimization (tree pruning or rule truncation) cannot merge concept components (i.e. rules, subtrees) broken by outlier examples, so the "gaps" in concept descriptions remain unfilled. This also causes the complexity of concept
descriptions to decrease only by the magnitude of the truncated concept components. The effectiveness of more recent learning methods, which learn concept descriptions regardless of the consistency criterion, is also affected by the existence of positive and negative outliers, by the unknown/irregular distribution of the target data, and by the lack of statistical guidance for the direction of concept generalization due to the unknown distribution of noise.

The approach employed in our system is an iterative detection of rules deemed unimportant in terms of their cover and the elimination of the training exemplars covered by those disjuncts.(14) The first novel aspect of this approach is that descriptions optimized through disjunct removal are used to filter outliers, and the filtered set of training data is then used to relearn improved rules. The second novel aspect is that outlier detection is carried out at a higher level (the DNF model level) and can be more effective than traditional data filtering applied at the input level only. The expected effect of such a learning approach is an improvement in recognition performance and a decrease in the complexity of the learned class descriptions. The optimization stage is applied in order to modify the rules derived earlier in such a way that the resulting rules perform better than the original cover descriptions and require less storage and processing time. The specific method used, that of indirect optimization, is as follows:

(1) Remove insignificant (small t-weight) disjuncts from the rule description. Small disjuncts are not likely to represent representative patterns in the training data.
(2) Use the truncated description generated in the previous step as a filter to remove those examples covered only by the removed disjuncts.
(3) Learn new descriptions from the filtered training data.

The following is an example of the optimization process for the convex-concave experiment described later in this paper (Section 5.2 and Fig. 8). The decision rule to be optimized is:

(1) [x1 = 5..10] and [x2 = 4..7] and [x3 = 2..5] and [x4 = 0..2]   (t:4)
(2) [x1 = 4..8] and [x2 = 8..10] and [x3 = 4..6] and [x5 = 1..3]   (t:3)
(3) [x3 = 5..12] and [x4 = 4..6] and [x5 = 0..3]   (t:2)
(4) [x1 = 4..7] and [x3 = 9..12] and [x4 = 4..8] and [x5 = 1]   (t:1)
(5) [x1 = 1..6] and [x2 = 8..9] and [x3 = 0..4] and [x4 = 2]   (t:1)
The least significant component of the rule (disjunct #5) is removed and the optimized (DNF') rule is:

(1) [x1 = 5..10] and [x2 = 4..7] and [x3 = 2..5] and [x4 = 0..2]   (t:4)
(2) [x1 = 4..8] and [x2 = 8..10] and [x3 = 4..6] and [x5 = 1..3]   (t:3)
(3) [x3 = 5..12] and [x4 = 4..6] and [x5 = 0..3]   (t:2)
(4) [x1 = 4..7] and [x3 = 9..12] and [x4 = 4..8] and [x5 = 1]   (t:1)
During the truncation step, the DNF' rule obtained above filters out the training exemplars it does not cover. For this example, one exemplar was filtered out and the truncated set of training exemplars was then processed again by AQ, resulting in the new rule (DNF):

(1) [x1 = 8] and [x2 = 2..8] and [x3 = 3..5] and [x4 = 3] and [x5 = 0..1]   (t:4)
(2) [x1 = 2..10] and [x3 = 4..11] and [x5 = 1..5]   (t:3)
(3) [x1 = 6..9] and [x3 = 4..8] and [x4 = 5..7] and [x5 = 0..2]   (t:2)
(4) [x1 = 5..9] and [x3 = 8..11] and [x4 = 5]   (t:1)

Filtered-out exemplars are no longer used to learn concept descriptions.
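The three-step indirect optimization amounts to the following loop, given here as a sketch with injected dependencies: `learn` stands in for the AQ induction step, `t_weight` returns a disjunct's total weight, `covers` tests whether a disjunct covers an exemplar, and the `min_weight` threshold is illustrative; none of these names come from the paper:

```python
def optimize_descriptions(training_set, learn, t_weight, covers, min_weight=2):
    """One optimization-relearning cycle (steps 1-3 above)."""
    dnf = learn(training_set)
    # Step 1: remove insignificant (small t-weight) disjuncts.
    truncated = [d for d in dnf if t_weight(d) >= min_weight]
    # Step 2: keep only exemplars still covered by the truncated description;
    # exemplars covered solely by the removed disjuncts are outliers.
    filtered = [e for e in training_set if any(covers(d, e) for d in truncated)]
    # Step 3: relearn improved (simpler, more robust) rules from the
    # filtered training data.
    return learn(filtered)
```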
5. EXPERIMENTAL RESULTS
The following experiments were performed to assess the feasibility and future potential of our novel shape analysis methodology. The first experiment was concerned with learning to recognize and explain the shape concepts of ellipse, triangle and rectangle, while the goal of the second experiment was to learn the shape concepts of concavity and convexity. The shape exemplars used in both experiments were partitioned into training and testing exemplars. The testing exemplars were used to evaluate the performance of the learned descriptions, prior to and after the optimization, and led us to conclude that optimization followed by truncation leads to enhanced robustness (defined as a higher correct recognition rate for unseen shapes).
5.1. Shape discrimination and recognition

The GA step was executed over 15 generations and the best 15th-generation feature extraction operator was used to derive the discriminative feature vectors (DFV) for three shape classes, as shown in Fig. 7. Three class descriptions were generated using the AQ empirical inductive learning (EIL) method. When testing exemplars were matched against the class descriptions (using the flexible matching method), two testing shapes, depicted as hatched, were misclassified. After rule optimization (removal of the least significant disjunct from each class description and truncation of the training set by eliminating those shapes depicted as hatched), the learning process was repeated with only eight exemplars per class. The newly derived DNF (from the learning data with outliers filtered out) correctly matched all the shapes tested. Note that the misclassified shapes in the testing set are different from the outliers eliminated from further learning, and
Fig. 7. Geometrical shapes used for learning and testing. (Outlier shapes are marked; hatched shapes were misclassified before the optimization and correctly recognized after the optimization.)

Fig. 8. Convex and concave shapes used for learning and testing. (Outlier shapes are marked; hatched shapes were misclassified before the optimization and correctly recognized after the optimization; one shape was misclassified both before and after the optimization.)
that following truncation they are now correctly classified. Note that optimization and truncation lead not only to robustness, but to efficiency as well. Specifically, with regard to recognition accuracy, the level of performance achieved after running the GA for 15 generations with optimization is better than running the GA without optimization until saturation of the evaluation function occurs (in our case after 27 generations).

5.2. Learning the concepts of convexity and concavity

The GA step was executed over 15 generations and the best 15th-generation feature extraction operator was used to derive the discriminative feature vectors (DFV) for convex and concave shapes, as shown in Fig. 8.
6. CONCLUSIONS

We have shown in this paper how to combine evolutionary and symbolic processes in order to create robust shape analysis systems. The specific methodology introduced in this paper integrates morphological processing and genetic algorithms (GAs) with empirical inductive generalization. The experimental results presented illustrate the feasibility of our novel methodology for discriminating among classes of arbitrarily shaped objects and for learning the concepts of convexity and concavity. Future work involves closing the loop between the symbolic and evolutionary components, using the correct classification rate of the obtained rules on tuning data shapes (a subset of the training shapes) as the fitness (feedback) measure.
REFERENCES
1. R. Michalski, Inferential theory of learning: developing foundations for multistrategy learning, Machine Learning: A Multistrategy Approach, R. S. Michalski and G. Tecuci, eds, Vol. 4, pp. 3-61. Morgan Kaufmann, San Mateo, California (1994).
2. F. Gruau and D. Whitley, Adding learning to the cellular development of neural networks: evolution and the Baldwin effect, Evol. Comput. 1(3), 213-234 (1993).
3. H. Vafaie and K. DeJong, Improving a rule induction system using genetic algorithms, Machine Learning: A Multistrategy Approach, R. S. Michalski and G. Tecuci, eds, Vol. 4, pp. 453-469. Morgan Kaufmann, San Mateo, California (1994).
4. R. Haralick, S. Sternberg and X. Zhuang, Image analysis using mathematical morphology, IEEE Trans. Pattern Anal. Mach. Intell. 9(4), 532-550 (1987).
5. J. Serra, Image Analysis and Mathematical Morphology. Academic Press, New York (1982).
6. K. DeJong, Learning with genetic algorithms: an overview, Mach. Learning 3, 123-138 (1988).
7. J. Bala and H. Wechsler, Shape analysis using genetic algorithms, Pattern Recognition Lett. 14(12), 965-973 (1993).
8. R. Michalski, A theory and methodology of inductive learning, Artif. Intell. 20, 111-161 (1983).
9. R. Michalski and J. Larson, Selection of the most representative training examples and incremental generation of VL1 hypotheses: the underlying methodology and the description of programs ESEL and AQ11, Technical Report 867, Computer Science Department, University of Illinois (1978).
10. F. Bergadano, S. Matwin, R. Michalski and J. Zhang, Learning two-tiered descriptions of flexible concepts: the POSEIDON system, Mach. Learning 8, 5-43 (1992).
11. J. Quinlan, The effect of noise on concept learning, Machine Learning: An Artificial Intelligence Approach, R. S. Michalski, J. G. Carbonell and T. M. Mitchell, eds, Vol. 2. Morgan Kaufmann, Los Altos, California (1986).
12. J. Mingers, An empirical comparison of pruning methods for decision-tree induction, Mach. Learning 3(4), 227-243 (1989).
13. R. Michalski, How to learn imprecise concepts: a method for employing a two-tiered knowledge representation in learning, Proc. 4th Int. Machine Learning Workshop, University of California, Irvine (1987).
14. J. Bala and P. Pachowicz, Issues in learning from noisy sensory data, AAAI Fall Symposium on Machine Learning in Computer Vision, AAAI Technical Report FS-93-04, Raleigh, North Carolina, 22-24 October (1993).
APPENDIX
AQ15 Algorithm

In generating the description of a shape class we have been using Michalski's AQ15 inductive learning program.(8) This appendix gives a short overview of the AQ15 algorithm. The AQ15 program is based on the AQ algorithm, which generates decision rules from a set of examples. When building a decision rule, AQ performs a heuristic search through a space of logical expressions to determine those that account for all positive examples and no negative ones. Since there are usually many such complete and consistent expressions, the goal of AQ is to find the most preferred one, according to flexible extra-logical criteria. Learning examples are given in the form of events, which are vectors of attribute values. Attributes may be of three types: nominal, linear or structured (hierarchical). Events represent different decision classes or, more generally, concepts. Events from a given class are considered to be positive examples, and all other events are considered to be negative examples. For each class a decision rule is produced that covers all positive examples and no negative ones. Rules are represented in VL1 (variable-valued logic system 1). VL1 is a multiple-valued logical attributional calculus with typed variables. The multivalued variables are expressed by using selectors, which are two-valued functions. Examples of selectors are:

[x7 = 2, 5, 6]
[weather_type = cloudy or rain].
Conjunctions of selectors form complexes. An example of a complex is:

[x3 = 2, 3, 5] [x1 = 3, 7].

Complexes are assembled into covers. A cover is a disjunction of complexes describing all positive examples and none of the negative examples of the concept. A cover is formed for each decision class separately. It defines the condition part of the corresponding decision rule. The following are two examples of decision rules:

[transport = car] <= [weather_type = cloudy or rain] [Temp = 40..60]

or

[transport = bike] <= [weather_type = sun] [Temp > 60].
The major idea behind the covering algorithm is to generate
a cover in steps, each step producing one conjunctive term (complex) of the cover. A condition is represented by a complex. A VL1 complex is a conjunction of relational statements called selectors. A selector, representing a logical statement concerning the value range of an attribute, is of the following form:
[attribute = lower-range..upper-range]

where both lower-range and upper-range are integer numbers. For example, the VL1 complex [x = 10..13] [y = 1..5] [d = 2] indicates that the value of the attribute x falls inclusively in the range between 10 and 13, the value of y falls inclusively in the range between 1 and 5, and the d attribute represents direction value 2.
About the Author--JERZY BALA received his M.S. degree in electrical and computer engineering and his Ph.D. in information technology from George Mason University. His dissertation research dealt with the development of symbolic machine learning systems that learn descriptions of visual concepts from low-level vision data. This research was in the new and rapidly developing area of learning from sensory data, which investigates problems of incorporating learning capabilities in computer vision systems. His research in this area has led to over 35 publications. In May 1993 he was awarded a postdoctoral research grant by the National Science Foundation in Computational Science and Engineering. From September 1994 to May 1995 he was Research Assistant Professor in the School of Information Technology and Engineering at George Mason University. He was also a member of the Laboratory for Machine Learning and Inference at George Mason University. He is presently Senior Scientist with Datamat Systems Research, Inc. and Affiliate Professor of the Computer Science Department at George Mason University. He is a member of the Association for Computing Machinery, the American Association for Artificial Intelligence and the American Defense Preparedness Association.
About the Author--HARRY WECHSLER received the Ph.D. in Computer Science from the University of California, Irvine, in 1975, and is presently Professor of Computer Science at George Mason University. His research in the areas of Computer Vision and Neural Networks (NN) includes hybrid learning (statistics, adaptive signal processing, information theory, genetic algorithms, machine learning and neural networks), scale-space and joint (Gabor and wavelet) image representations, attention, functional and selective perception, face recognition and object recognition. He was Director of the NATO Advanced Study Institutes on "Active Perception and Robot Vision" (Maratea, Italy, 1989) and "From Statistics to Neural Networks" (Les Arcs, France, 1993), and he will serve as co-Chair for the International Conference on Pattern Recognition to be held in Vienna, Austria, in 1996. He has authored over 100 scientific papers; his book "Computational Vision" was published by Academic Press in 1990, and he is the editor of the book "Neural Networks for Perception" (Vols 1 and 2), published by Academic Press in 1991. He was elected an IEEE Fellow in 1992.