Pattern Recognition, Vol. 27, No. 10, pp. 1291-1302, 1994
Copyright © 1994 Pattern Recognition Society. Printed in Great Britain. All rights reserved.
0031-3203(94)E0052-M
A NEURAL NETWORK CAPABLE OF LEARNING AND INFERENCE FOR VISUAL PATTERN RECOGNITION

HO J. KIM and HYUN S. YANG†
Department of Computer Science, Center for Artificial Intelligence Research, KAIST, 373-1 Kusong-dong, Yusong-gu, Taejon, 305-701 Korea
(Received 28 July 1993; in revised form 5 April 1994; accepted for publication 20 April 1994)

Abstract: In this paper, a neural network model which has logical inference capability as well as inductive learning capability for pattern recognition is presented. This model, the Adaptive Inference Network (AINET), is a feedforward neural network consisting of two types of connections and can be trained by a back-propagation algorithm. The proposed model exhibits four major characteristics: logical inference ability, knowledge acquisition by learning, performance improvement by utilizing expert knowledge, and result explanation capability. We first describe the network structure and behavior, the learning method, and other features of the proposed network. We then introduce a hybrid image recognition model using the neural network. The model consists of two stages: feature extraction and recognition. For the feature extraction stage, modular-structure neural networks and conventional approaches are used together to extract the features more effectively. In the image recognition model, as a methodology of utilizing the knowledge from human experts, three forms of knowledge representation and a multistage learning technique are also presented. From the results of the typed and handwritten digit recognition experiment, the effectiveness of the proposed network on image recognition applications is evaluated.

Image recognition    Neural networks    Knowledge-based vision    Inference system    Learning theory
1. INTRODUCTION

In recent years, expert system techniques have achieved considerable success in various applications related to artificial intelligence. However, in the computer vision and pattern recognition fields, the difficulties of knowledge acquisition and representation are still the major problems of the knowledge-based approaches. The neural network approach has emerged as a promising approach for contributing to the solution of the knowledge acquisition bottleneck.(1-5) A principal advantage of the neural network is the ability to acquire the knowledge of the given problem directly from example data. But it is known that in some cases even a large number of examples is not enough to train the neural network for generalized performance. From this point of view, we have investigated the methodology of incorporating a priori knowledge about the problem in the structuring and learning of neural networks. Recently, various models and methodologies for combining neural networks and expert system techniques have been proposed by Lacher et al.,(6) Gallant,(7) Low et al.,(8) Keller and Tahani,(9) Yager and Keller,(10) Yang and Bhargava,(11) Stottler and Henke(12) and Fu.(13) The image recognition problem is one of the most popular applications of neural networks. Many
† Author to whom all correspondence should be sent.
researchers have recently presented various neural network techniques for image recognition in the areas of learning methods,(14-16) invariant recognition,(17) knowledge processing(18-20) and uncertainty processing,(21,22) etc. In this paper, we propose a new approach to integrating neural network and knowledge-based techniques for image recognition. We will first propose a new neural network model named the "Adaptive Inference NETwork (AINET)", which is capable of inductive learning from example data and logical inference from a rule base. We will then introduce an image recognition model using AINET, and discuss the usefulness of the model. The proposed model exhibits four main characteristics.
• Logical inference capability. By adjusting the connection weights of the network, the combinational logic or rule base composed of logical operations such as AND, OR, NOT and IMPLY can be realized by AINET.
• Knowledge acquisition by learning from examples. AINET is well suited to applications in which only partial knowledge to solve the problem exists. The knowledge can be acquired and refined by the learning process.
• Utilizing expert knowledge in the learning process. AINET has two types of connections, called positive connections and negative connections, which represent the fuzzy relations between feature variables. By using the characteristics of these connections, we can effectively
apply the expert knowledge to the network learning process to improve the recognition performance.
• Result explanation capability. During the recall process, we can justify the current output of AINET by tracing the values of the nodes and connection weights. This function allows one to design a more flexible system.
Unlike the Connectionist Expert System(7) and the Adaptive CES,(8) both of which use binary or three-state logic, each node in AINET takes a continuous real number in the range [0, 1]. AINET has an acyclic feedforward network structure and can be trained by a back-propagation (BP) learning algorithm.(2,6) In this paper, we will also develop an image recognition model using AINET. In order to show the effectiveness of the model, we have applied it to the character recognition problem for typed and handwritten digits. The model consists of two main components: a feature extractor and a pattern recognizer. The feature extractor is divided into two parts. One extracts the feature variables represented by fuzzy sets.(9,23) The other consists of neural networks acting as local feature detectors, which are trained to extract the predefined feature variables. The pattern recognizer is implemented using AINET, which is initialized and trained through a multistage procedure utilizing expert knowledge. One of the merits of AINET for pattern recognition is that external knowledge from human expertise can be effectively utilized to improve learning efficiency and recognition performance. If there is a complete rule base to solve the given problem, we can realize the inference system using AINET. In such a case, AINET is constructed to provide the same function as the rule base. Furthermore, even in the event that the rule base is not sufficient or too rough to solve the problem, we can gradually refine it through the learning process. In order to effectively utilize the knowledge from human experts in the pattern recognition problem, we have considered three elements of the representational form of knowledge: feature variables, relationships among the features, and model patterns. Each of these knowledge representations is applied to network construction, weight initialization and the training process, respectively.
2. ADAPTIVE INFERENCE NETWORK
2.1. Structure and behavior
The AINET proposed in this paper is an acyclic feedforward neural network. As shown in Fig. 1, it can be represented by a directed graph with multiple arcs. Neurons in the network are classified into three types of nodes, named input nodes, intermediate nodes and output nodes, depending on whether they have incoming or outgoing connections. In other words, the input nodes have no incoming connections, the output nodes have only incoming connections, and both types of connections are incident to the intermediate nodes. In Fig. 1, single circles, double circles and squares denote input nodes, output nodes and intermediate nodes, respectively. There are two types of connections in AINET: positive connections and negative connections. Figure 2 illustrates the connections from node i to node j, where solid lines denote positive connections and dashed lines represent negative connections, and w_ji and c_ji denote the weights of the positive connection and the negative connection, respectively. For an arbitrary node j in the network, let m_j be the number of nodes connected to node j and x_i the activation value of the ith node.
Fig. 1. An example of AINET structure.
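To make the structure concrete, here is a minimal Python sketch (my own illustration; class and variable names such as AInet are hypothetical, not from the paper) of a directed graph whose edges each carry a positive-connection weight w_ji and a negative-connection weight c_ji.

```python
# Minimal sketch of the AINET graph structure described above (hypothetical names).
# Each directed edge i -> j carries two weights: w[j][i] (positive connection)
# and c[j][i] (negative connection).

class AInet:
    def __init__(self):
        self.nodes = {}   # name -> {"kind": "input" | "intermediate" | "output"}
        self.w = {}       # j -> {i: positive-connection weight w_ji}
        self.c = {}       # j -> {i: negative-connection weight c_ji}

    def add_node(self, name, kind):
        self.nodes[name] = {"kind": kind}
        self.w.setdefault(name, {})
        self.c.setdefault(name, {})

    def connect(self, i, j, w_ji=0.0, c_ji=0.0):
        # Node i feeds node j through a positive and a negative connection.
        self.w[j][i] = w_ji
        self.c[j][i] = c_ji

# Example: a tiny network with two inputs, one intermediate node and one output node.
net = AInet()
for name, kind in [("x1", "input"), ("x2", "input"),
                   ("h1", "intermediate"), ("y1", "output")]:
    net.add_node(name, kind)
net.connect("x1", "h1", w_ji=0.8, c_ji=0.1)
net.connect("x2", "h1", w_ji=0.5, c_ji=0.0)
net.connect("h1", "y1", w_ji=1.0, c_ji=0.0)
```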
The activation function for node j is then given by

f(Net_j) = \begin{cases} 1 & \text{if } Net_j \geq 1 \\ (Net_j + 1)/2 & \text{if } -1 < Net_j < 1 \\ 0 & \text{if } Net_j \leq -1 \end{cases}    (1)

where

Net_j = \sum_{i=1}^{m_j} w_{ji} x_i + \sum_{i=1}^{m_j} c_{ji}(1 - x_i).    (2)

In order to use the generalized delta rule as the learning algorithm,(2) equation (1) can be approximated by the sigmoid function of equation (3), which is differentiable and monotonically increasing:

f(Net_j) = \frac{1}{1 + e^{-Net_j / T}}.    (3)

From the two terms \sum_{i=1}^{m_j} w_{ji} x_i and \sum_{i=1}^{m_j} c_{ji}(1 - x_i) on the right-hand side of equation (2), we can see the conceptual roles of the positive and negative connections. While the positive connection represents the relation associated with the degree of activation of the lower node, the negative connection represents the relation associated with the degree of deactivation of the lower node. These characteristics of the two types of connections allow one to utilize expert knowledge effectively in the learning process. More details are described in the next sections.

Fig. 2. Interconnections between nodes (solid lines: positive connections w_ji; dashed lines: negative connections c_ji).

2.2. Learning algorithm

Like the multilayer perceptron (MLP), which is the most popular neural network model,(2) AINET can be trained by a back-propagation (BP) learning algorithm. We described the behavior of AINET in the previous section: the activation values of the nodes can be computed by equations (1) and (2), and equation (1) is approximated by equation (3). In order to use a back-propagation algorithm, we take equation (3) as the activation function, which is differentiable and nondecreasing. We now describe how the weight corrective steps are made. For a learning pattern p, let t_j^p be the desired output of the jth output node and z_j^p its actual output. The total error E^p for pattern p is defined as

E^p = \frac{1}{2} \sum_j (t_j^p - z_j^p)^2.    (4)

The chain rule shows that

\frac{\partial E^p}{\partial w_{ji}} = \frac{\partial E^p}{\partial z_j^p} \frac{\partial z_j^p}{\partial Net_j^p} \frac{\partial Net_j^p}{\partial w_{ji}}    (5)

and

\frac{\partial E^p}{\partial c_{ji}} = \frac{\partial E^p}{\partial z_j^p} \frac{\partial z_j^p}{\partial Net_j^p} \frac{\partial Net_j^p}{\partial c_{ji}},    (6)

which are proportional to -\Delta^p w_{ji} and -\Delta^p c_{ji}, respectively. Here Net_j^p denotes the input value of the jth node for pattern p, as given by equation (2), and \Delta^p w_{ji} and \Delta^p c_{ji} denote the changes to be made to the two weights w_{ji} and c_{ji}. From equations (2)-(4), the derivatives are given by

\frac{\partial E^p}{\partial z_j^p} = \begin{cases} -(t_j^p - z_j^p) & \text{if } j \text{ is an output node} \\ \sum_k \frac{\partial E^p}{\partial Net_k^p} w_{kj} & \text{if } j \text{ is an intermediate node,} \end{cases}    (7)

\frac{\partial z_j^p}{\partial Net_j^p} = f'(Net_j^p) = z_j^p (1 - z_j^p),    (8)

\frac{\partial Net_j^p}{\partial w_{ji}} = x_i^p    (9)

and

\frac{\partial Net_j^p}{\partial c_{ji}} = 1 - x_i^p.    (10)

Thus, the changes to be made to the two weights of the connections from the ith node to the jth node are computed as

\Delta^p w_{ji} = -\eta_1 \frac{\partial E^p}{\partial z_j^p} f'(Net_j^p) x_i^p    (11)

and

\Delta^p c_{ji} = -\eta_2 \frac{\partial E^p}{\partial z_j^p} f'(Net_j^p)(1 - x_i^p),    (12)

where \eta_1 and \eta_2 are positive constants controlling the learning rates.
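As a hedged illustration of equations (2), (3), (7), (8), (11) and (12), the following Python sketch (not the authors' code; the function names, data layout and learning rates are assumptions) computes the net input and sigmoid activation of one node and applies one corrective step to its incoming weights when the node is an output node.

```python
import math

def net_input(x, w_j, c_j):
    """Net_j of equation (2): sum_i w_ji*x_i + sum_i c_ji*(1 - x_i).
    x, w_j and c_j are dicts keyed by the indices of the lower nodes."""
    return (sum(w_j[i] * x[i] for i in w_j)
            + sum(c_j[i] * (1.0 - x[i]) for i in c_j))

def activation(net_j, T=1.0):
    """Sigmoid approximation of equation (3)."""
    return 1.0 / (1.0 + math.exp(-net_j / T))

def update_output_weights(x, w_j, c_j, t_j, eta1=0.1, eta2=0.1, T=1.0):
    """One corrective step for an output node j (equations (7), (8), (11), (12))."""
    net_j = net_input(x, w_j, c_j)
    z_j = activation(net_j, T)
    dE_dz = -(t_j - z_j)              # equation (7), output-node case
    f_prime = z_j * (1.0 - z_j)       # equation (8)
    for i in w_j:
        w_j[i] += -eta1 * dE_dz * f_prime * x[i]            # equation (11)
    for i in c_j:
        c_j[i] += -eta2 * dE_dz * f_prime * (1.0 - x[i])    # equation (12)
    return z_j

# Example: one output node fed by two lower nodes.
x = {0: 0.9, 1: 0.2}
w_j = {0: 0.4, 1: 0.3}
c_j = {0: 0.1, 1: 0.2}
print(update_output_weights(x, w_j, c_j, t_j=1.0))
```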
3. LOGICAL INFERENCE CAPABILITY
The logical inference capability from IF-THEN rules is one of the major advantages of AINET. The basic logical operations such as AND, OR, NOT and IMPLY can easily be represented by AINET. Figure 3 illustrates a realization of the four logical operations. By combining these network structures, we can implement any combinational logic with AINET. As a simple example, the half-adder logic represented by AINET is shown in Fig. 1. From a knowledge base represented as IF-THEN rules we can easily build an implementation of the inference process using AINET. The input nodes are initially assigned the value one or zero, which represent true and false, respectively. The activation of each node in the network is computed by equation (1) in an event-driven manner(6) starting from the input nodes. To determine the truth value associated with each node as a binary value, we may apply a thresholding technique to the output value of the node.

Fig. 3. A realization of logical operations (A AND B, A OR B, NOT(A), A IMPLY B) by AINET.
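To make this concrete, the sketch below (my own illustration; the particular weight values are assumptions chosen so that the piecewise activation of equation (1) returns exactly 0 or 1 for binary inputs, not values taken from Fig. 3) realizes NOT, AND and OR with positive and negative connections.

```python
def f(net):
    """Piecewise activation of equation (1)."""
    if net >= 1.0:
        return 1.0
    if net <= -1.0:
        return 0.0
    return (net + 1.0) / 2.0

def node(xs, ws, cs):
    """Net_j = sum w_ji*x_i + sum c_ji*(1 - x_i), followed by equation (1)."""
    net = (sum(w * x for w, x in zip(ws, xs))
           + sum(c * (1 - x) for c, x in zip(cs, xs)))
    return f(net)

# Hypothetical weight choices realizing basic logical operations.
NOT_A  = lambda a:    node([a],    ws=[-2],   cs=[2])
AND_AB = lambda a, b: node([a, b], ws=[1, 1], cs=[-2, -2])
OR_AB  = lambda a, b: node([a, b], ws=[3, 3], cs=[-1, -1])

for a in (0, 1):
    print(f"NOT({a}) = {NOT_A(a)}")
    for b in (0, 1):
        print(f"AND({a},{b}) = {AND_AB(a, b)}   OR({a},{b}) = {OR_AB(a, b)}")
```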
4. EXPLAINING THE NETWORK OUTPUT

When applying neural networks to the image recognition problem, it may be practically difficult to select a proper set of training patterns, as there are too many variants of the objective patterns. Thus, there is a need for methodologies that can find out the effect of the input features on the current output state of the network. In this section, as an additional function of AINET, the ability to justify the current network output is introduced. In particular, this function can be utilized as an explanation facility in an expert system. The function provides two kinds of services: the WHY-function and the WHYNOT-function. The WHY-function is applied to the currently activated output node. By using the WHY-function, we can retrieve the information called desirable features, which is the list of feature variables positively contributing to the current activation of the objective output node. On the contrary, the WHYNOT-function is applied to the currently deactivated output nodes. In that case, the function provides the lists of extraneous features and missing features. The searching algorithm for desirable features in the WHY-function is structured as follows.

Algorithm 1: search for desirable features

procedure WHY(j)
    CheckSum <- 0
    repeat
        [ i <- ARGMAX_k(w_jk * x_k) ]
        CheckSum <- CheckSum + w_ji * x_i
        if (i is an input node) then DISPLAY(FeatureName(i), x_i)
        else ADDQUEUE(i, Q) endif
        Remove node i from the target list.
    [ until (CheckSum >= alpha * sum_{k: w_jk > 0} w_jk * x_k) ]
    while (Q is not empty) do
        DELETEQUEUE(i, Q)
        if (x_i >= beta) then WHY(i) else WHYNOT(i) endif
    endwhile
endprocedure.

In the algorithm, j is the output node to be justified and Q is a queue of the intermediate nodes traversed. The ARGMAX function returns the index of the node that maximizes the value of its argument. The parameters alpha and beta are positive constants in the range [0, 1], which determine the limit of the total weighted summation over the nodes traversed and the threshold value for node activation, respectively. The other features can be found by similar algorithms. In order to look for extraneous features and missing features with the WHYNOT-function, the boxed parts of Algorithm 1 are changed into the following. The other parts of each algorithm are the same as those of Algorithm 1.

Algorithm 2: search for extraneous features

procedure WHYNOT(j)
    repeat
        [ i <- ARGMIN_k(w_jk * x_k) ]
        CheckSum <- CheckSum + w_ji * x_i
    [ until (CheckSum >= alpha * sum_{k: w_jk > 0} w_jk * x_k) ]
endprocedure.

Algorithm 3: search for missing features

procedure WHYNOT(j)
    repeat
        [ i <- ARGMIN_k(c_jk * (1 - x_k)) ]
        CheckSum <- CheckSum + c_ji * (1 - x_i)
    [ until (CheckSum >= alpha * sum_{k: c_jk > 0} c_jk * (1 - x_k)) ]
endprocedure.
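A runnable Python rendering of the WHY-procedure may help; the sketch below is my own (the data layout, feature names and default values of alpha and beta are assumptions) and follows Algorithm 1's greedy traversal.

```python
def why(j, w, x, node_kind, feature_name, alpha=0.8, beta=0.5):
    """Greedy search for 'desirable features' of node j (after Algorithm 1).
    w[j] maps lower-node index i -> positive-connection weight w_ji,
    x[i] is the activation of node i, node_kind[i] is 'input' or 'intermediate'."""
    desirable = []
    queue = []
    remaining = dict(w[j])                                    # target list of lower nodes
    total = sum(wk * x[k] for k, wk in w[j].items() if wk > 0)
    checksum = 0.0
    while remaining and checksum < alpha * total:
        i = max(remaining, key=lambda k: remaining[k] * x[k])  # ARGMAX_k(w_jk * x_k)
        checksum += remaining.pop(i) * x[i]
        if node_kind[i] == "input":
            desirable.append((feature_name[i], x[i]))
        else:
            queue.append(i)
    for i in queue:                                            # recurse through intermediate nodes
        if x[i] >= beta:
            desirable.extend(why(i, w, x, node_kind, feature_name, alpha, beta))
        # else: a WHYNOT-style search for missing/extraneous features would apply here
    return desirable

# Tiny example: output node 2 is fed by intermediate node 1, which is fed by inputs 0 and 3.
w = {2: {1: 0.9}, 1: {0: 0.8, 3: 0.2}}
x = {0: 0.9, 1: 0.85, 2: 0.7, 3: 0.1}
node_kind = {0: "input", 3: "input", 1: "intermediate", 2: "output"}
feature_name = {0: "upper-right-curve", 3: "lower-horizontal-stroke"}
print(why(2, w, x, node_kind, feature_name))
```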
5. IMAGE RECOGNITION MODEL

This section describes a hybrid pattern classification model using the proposed network model. The structure of the proposed model is shown in Fig. 4. We have adopted the multistage learning procedure shown in Fig. 5 for the image recognition model. The underlying assumptions of the learning procedure in AINET are:

• In AINET, some of the intermediate nodes, called visible nodes, can be assigned external meanings as feature variables. Thus, the nodes in the network are classified into two classes: visible and hidden nodes. The terminology of hidden node, as for ordinary multilayer perceptrons,(2) means that it hides the specific information about its role as an internal representation. However, each of the visible nodes is initially assigned its meaning, and we can see and utilize the information on these nodes in the design process. The kinds of feature variables assigned to visible nodes are derived from the given rule base. Therefore, the number of visible nodes is determined in proportion to the relative size of the given rule base.
• During the learning process, the weights of some connections in the network can be selectively fixed, i.e. excluded from updating. We call these fixed weights and the others variable weights. The constraint on each connection is determined at the beginning of each stage in the multistage learning procedure.
Fig. 4. Image recognition model with AINET (input image -> preprocessor -> feature extractor, consisting of a fuzzy feature extractor implemented as a conventional program and feature extraction neural networks operating on primitive features -> pattern recognizer: AINET for pattern recognition -> recognition results).

Fig. 5. Multistage learning for image recognition (expert knowledge supplies the feature variables for feature learning, the relationships among features for initialization, and the model patterns for primary learning; the training patterns are used for secondary learning).
As shown in Fig. 4, the image recognition model with AINET consists of two components: a feature extractor and a pattern recognizer. The multistage learning procedure, as shown in Fig. 5, consists of four steps: feature learning, network initialization, primary learning and secondary learning. First, the feature learning stage is applied to the neural networks of the feature extractor, and then the AINET of the pattern recognizer is trained through the primary and secondary learning. In the following sections, we give a detailed description of each step of the procedure.
6. EXTRACTING FEATURE VARIABLES

In the image recognition problem, there is a growing need for modeling and managing uncertainties such as vagueness of class definitions and ambiguity of representations.(9) From the fact that most of the expert knowledge for image recognition may have a little fuzziness, we consider knowledge of the form

IF (antecedents) THEN (consequents) with (a certainty factor).

The feature variables to be extracted correspond to the propositions included in the antecedents and consequents of the rule base represented in linguistic form. In order to extract such feature variables effectively, we have adopted the following two approaches.

Fuzzy feature extractor: the first approach is based on fuzzy set theory. In this approach two kinds of feature variables defined as fuzzy sets are considered.

• For the case where the value of a feature variable can be defined as the membership degree for an element of a predefined fuzzy set, we can extract the feature value as follows. Let v_A, v_B and v_C be the values of feature variables A, B and C, and \mu_A(x), \mu_B(x) and \mu_C(x) the membership degrees of the input x in the fuzzy sets representing A, B and C, respectively. Then we can compute v_A, v_B and v_C by directly assigning the values of \mu_A(x), \mu_B(x) and \mu_C(x), respectively. In order to define the membership functions of the predefined fuzzy sets, we have considered two methods: the histogram method and the heuristic method.(24,25) In the histogram method, we compute the histogram of the feature for each class from the training data. The normalized histogram from the training data is then treated as a possibility distribution. In the heuristic method, the membership values are calculated using preassumed possibility functions such as Gaussian, triangular or trapezoidal functions.

• In the case where both the feature variable definition and the primitive feature set for the input are fuzzy sets over the same universe of discourse,† we can compute the values of such feature variables by using a similarity measure between the two fuzzy sets. Let A be the predefined fuzzy set representing a feature variable and X another fuzzy set for the current input, and suppose that A and X have the same universe of discourse U. The value of the feature variable v_A can be computed by

v_A = \max\left( 1 - \sqrt{\sum_{u \in U} \bigl(\mu_A(u) - \mu_X(u)\bigr)^2},\; 0 \right).    (13)

† Universe of discourse means the domain over which the fuzzy sets are defined.

Feature extraction by neural networks: the second approach to extracting the feature variables is based on neural networks with a sub-network structure. The method is well suited to the case where the relations between the feature variables and the primitive feature sets associated with them are too complex to be represented as simple formulae or statements. In such a case it may be very difficult not only to determine the universe of discourse but also to assign the membership degree for each element of the fuzzy set. Therefore the neural networks learn how to extract such features directly from example data through the learning process. The model patterns given by a human expert for each feature variable are used in the learning process. In other words, the kinds of feature variables used in all the experiments are selected from the rule base, and the neural networks learn the methodology of extracting their values through the learning process. The predefined feature set contains not only the global features but also the local features of the objects. In our system, the input image pattern is divided into several parts by spatial locality, and the local features are then extracted from each part of the image. These local features represent the spatial characteristics of the objective patterns. Therefore, even if the objects are partially distorted or only partially visible, they can be recognized based on the fact that there may be some relevant local features preserved in their incomplete images.
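As an illustration of the fuzzy feature extractor, the following sketch (hypothetical names and data; the Gaussian membership function is only one example of the heuristic method, and equation (13) is implemented as reconstructed above) computes a feature value both by direct membership lookup and by the fuzzy-set similarity of equation (13).

```python
import math

def gaussian_membership(u, center, sigma):
    """Heuristic membership function (one of the preassumed possibility functions)."""
    return math.exp(-((u - center) ** 2) / (2.0 * sigma ** 2))

def similarity_value(mu_A, mu_X):
    """Feature value after equation (13): 1 minus the distance between two fuzzy
    sets over the same universe of discourse, clipped at 0.
    mu_A and mu_X are lists of membership degrees over the same universe U."""
    dist = math.sqrt(sum((a - x) ** 2 for a, x in zip(mu_A, mu_X)))
    return max(1.0 - dist, 0.0)

# Example: a universe U of 5 sampled positions along a stroke.
U = [0.0, 0.25, 0.5, 0.75, 1.0]
mu_A = [gaussian_membership(u, center=0.5, sigma=0.2) for u in U]  # predefined feature "A"
mu_X = [0.1, 0.5, 0.9, 0.6, 0.2]                                   # fuzzy set from the input
print("v_A (direct membership for x = 0.6):", gaussian_membership(0.6, 0.5, 0.2))
print("v_A (similarity of equation (13)):", similarity_value(mu_A, mu_X))
```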
7. UTILIZING EXPERT KNOWLEDGE

7.1. Knowledge representation

As described in the previous sections, in order to enhance the recognition performance, we have considered three types of knowledge representation: feature variables, relationships among features, and model patterns. Each of the knowledge representations can be generated from rules of the form IF A THEN B with (cf), where cf denotes the certainty factor of the rule. The certainty factor is a real number in the range [0, 1].

Feature variables: each of the propositions or simple features in the antecedents and consequents of the rules is defined as a feature variable corresponding to a node in the network. These feature variables can be described as linguistic terms, and can be utilized in the network construction process. The methodologies for extracting the feature variables were described in the previous section.

Relationships among the features: there may be many kinds of relations according to the forms of the feature variables. In order to represent these relations effectively, we define six kinds of fuzzy relations as follows. If the following four types of rules are given, we can consider the relation between feature variables X and Y.

Rule (1): IF X THEN Y with (cf).
Rule (2): IF X THEN Not(Y) with (cf).
Rule (3): IF Not(X) THEN Y with (cf).
Rule (4): IF Not(X) THEN Not(Y) with (cf).
• Excitatory relation, inhibitory relation: from the forms of Rules (1) and (2) we can define two fuzzy relations, named the excitatory and inhibitory relations. The excitatory relation represents the degree to which the presence of the feature in the antecedent has an excitatory effect on the presence of the feature in the consequent; the inhibitory relation means that the former has an inhibitory effect on the latter.
• Negative excitatory relation, negative inhibitory relation: from Rules (3) and (4), we can also define two other relations, the negative excitatory and negative inhibitory relations. The negative excitatory relation represents the degree to which the absence of the feature in the antecedent has an excitatory effect on the presence of the feature in the consequent; the negative inhibitory relation means that the former has an inhibitory effect on the latter.
• Independent relation: in the exceptional case of a rule of the form IF X THEN Y with (0), we can consider that the relation indicates that there are no dependencies between X and Y. We define such a relation as an independent relation.
• Unknown relation: in addition, we define the unknown relation as the case in which no rules about the two features are given.
Model patterns: as a special form of knowledge representation, we consider model patterns, which are ideal examples of the typical cases for the given problem. The model patterns are used not only for the feature extractor learning but also for the first-phase learning of the pattern recognizer.

7.2. Network construction and initialization

The three forms of knowledge representation can be utilized in each step of the multistage training procedure. Now, as the second step, preceded by the feature learning, the network construction and initialization procedure is introduced. Firstly, we establish the input, output and intermediate nodes of the network by referencing the feature variable list generated from the given rule base. Then we connect the nodes to each other according to the dependency information among the feature variables. If the given rule base is not sufficient for solving the problem, we can selectively add some hidden nodes as intermediate nodes. Secondly, from the six relations defined in the previous subsection, we initialize the weights of the connections. We assign initial values to the weights between nodes representing the feature variables. The weight initialization process is carried out by the method described in Table 1.
Table 1. Weight initialization from the fuzzy relations

Relation              Rule                               Initial weight
Excitatory            x_i -> x_j with (cf)               w_ji = γ·cf
Inhibitory            x_i -> Not(x_j) with (cf)          w_ji = -γ·cf
Negative excitatory   Not(x_i) -> x_j with (cf)          c_ji = γ·cf
Negative inhibitory   Not(x_i) -> Not(x_j) with (cf)     c_ji = -γ·cf
Independent           x_i -> x_j with (0)                w_ji = 0, c_ji = 0
Unknown               --                                 small random number

In the table, the parameter γ is a positive constant which determines the range of the initial weight values.
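A small Python sketch of this initialization step (my own rendering of Table 1; the rule encoding and the value of γ are assumptions) is given below.

```python
import random

GAMMA = 0.5   # range parameter for initial weights (the constant gamma of Table 1)

def init_weights(rules, nodes):
    """Initialize w[j][i] and c[j][i] from rules of the form
    (neg_i, i, neg_j, j, cf): 'IF [Not] i THEN [Not] j with (cf)'."""
    w = {j: {} for j in nodes}
    c = {j: {} for j in nodes}
    for neg_i, i, neg_j, j, cf in rules:
        sign = -1.0 if neg_j else 1.0        # inhibitory relations get negative weights
        if not neg_i:                        # excitatory / inhibitory: set w_ji
            w[j][i] = sign * GAMMA * cf
        else:                                # negative excitatory / inhibitory: set c_ji
            c[j][i] = sign * GAMMA * cf
    # Remaining (unknown) connections receive small random numbers.
    # For brevity this fills every node pair; in the paper only the connections
    # created from the dependency information would exist.
    for j in nodes:
        for i in nodes:
            if i != j:
                w[j].setdefault(i, random.uniform(-0.05, 0.05))
                c[j].setdefault(i, random.uniform(-0.05, 0.05))
    return w, c

# Example rules, after the style of Table 3 (hypothetical encoding):
# IF Lower-right-curve THEN Pattern-6 (cf=0.8); IF Upper-single-hole THEN NOT Pattern-6 (cf=0.6);
# IF NOT Left-long-curve THEN Pattern-3 (cf=0.5).
rules = [(False, "lower-right-curve", False, "pattern-6", 0.8),
         (False, "upper-single-hole", True,  "pattern-6", 0.6),
         (True,  "left-long-curve",   False, "pattern-3", 0.5)]
nodes = ["lower-right-curve", "upper-single-hole", "left-long-curve", "pattern-6", "pattern-3"]
w, c = init_weights(rules, nodes)
print(w["pattern-6"]["lower-right-curve"], w["pattern-6"]["upper-single-hole"],
      c["pattern-3"]["left-long-curve"])
```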
After initializing the weights from the relations, each of the remaining weights not initialized by this method is assigned a small random number.

Fig. 7. An example of subregions of the input image (e.g. upper-right zone, lower zone).

7.3. Two-phase learning

As shown in Fig. 5, the two-phase learning follows the initialization process. During the primary learning, the weights assigned by the method of Table 1 are clamped as fixed weights, and the other weights are updated as variable weights. During the secondary learning, all weights are updated as variable weights. From the two-phase learning of our model we can expect two additional benefits.

• The fixed weights of the primary learning provide a guideline for the direction of convergence in the early stage of the learning process.
• If the given knowledge is insufficient or partially incorrect, it can be gradually refined through the second-phase learning.
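One simple way to realize the distinction between fixed and variable weights is to mask the corrective steps during the primary phase, as in the hedged sketch below (not the authors' implementation; compute_deltas stands in for the corrections of equations (11) and (12), and all names are hypothetical).

```python
def apply_update(weights, deltas, trainable):
    """Apply corrective steps only to variable weights.
    weights, deltas: {j: {i: value}}; trainable[j][i] is False for fixed weights."""
    for j in deltas:
        for i, d in deltas[j].items():
            if trainable[j][i]:
                weights[j][i] += d

def train(weights, trainable_primary, compute_deltas, epochs_primary, epochs_secondary):
    # Primary learning: weights initialized from the rule base are clamped (fixed).
    for _ in range(epochs_primary):
        apply_update(weights, compute_deltas(weights), trainable_primary)
    # Secondary learning: all weights become variable.
    all_trainable = {j: {i: True for i in weights[j]} for j in weights}
    for _ in range(epochs_secondary):
        apply_update(weights, compute_deltas(weights), all_trainable)
    return weights

# Toy usage with a dummy gradient rule (real deltas would come from equations (11)-(12)).
weights = {"y": {"a": 0.4, "b": 0.0}}
trainable_primary = {"y": {"a": False, "b": True}}   # 'a' was set from a rule, so it is fixed
compute_deltas = lambda w: {"y": {"a": 0.01, "b": 0.01}}
print(train(weights, trainable_primary, compute_deltas, epochs_primary=3, epochs_secondary=2))
```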
8. EXPERIMENTAL RESULTS

This section reports the experimental results used to evaluate the learning speed, recognition performance and result explanation capability of the proposed model. We have simulated the proposed model to recognize images representing typed and handwritten digits "0" to "9". We have used 30 typed digits and 40 handwritten digits for training, as shown in Fig. 6. A digitized binary image of 64 x 64 pixels is generated for each typed or handwritten digit. We have divided the image plane into 10 zones, as shown in Fig. 7, and selected 72 kinds of feature variables such as "Stroke", "Curve", etc. In other words, the kinds of feature variables are selected from the expert knowledge, and their values are extracted by the feature extractors shown in Fig. 4. Figure 8 illustrates handwritten digits used for testing. As shown in Fig. 4, the image is then encoded
into a set of primitive features such as contour shape, mesh vector, horizontal and vertical profiles, and the distribution of marked pixels in each subregion.(14,26) From the primitive features, the feature variables used in the pattern recognizer are extracted by the feature extractor, as described in the previous section. We have considered 80 rules as the expert knowledge. Tables 2 and 3 show some examples of feature variables and knowledge representations, respectively. The AINET for the recognition stage in our digit recognition system consists of 162 nodes in all: 72 input nodes, 10 output nodes and 80 intermediate nodes.

Fig. 6. Training patterns: typed and handwritten digits.

Fig. 8. Examples of test patterns.

Table 2. Examples of feature variables

Feature variables extracted by feature extractor 1: Upper-left-stroke, Upper-left-curve, Upper-right-curve, Upper-slant-stroke, Lower-left-curve, Lower-right-curve, Upper-horizontal-stroke, Upper-curve, Vertical-long-stroke, Left-long-curve.

Feature variables extracted by feature extractor 2: Upper-centroid, Lower-large-hole, Upper-single-hole, Upper-left-endpoint, Upper-right-endpoint, Lower-left-endpoint, Lower-right-endpoint, Double-hole-shape, Single-contour-shape, Lower-centroid.

Table 3. Examples of expert knowledge representation for digit classification

IF (Upper-stroke-shape) THEN (Pattern-5) with (cf = 0.7).
IF (Lower-right-curve) THEN (Pattern-6) with (cf = 0.8).
IF (Lower-horizontal-stroke) THEN (Pattern-2) with (cf = 0.9).
IF (Lower-horizontal-stroke) THEN (Pattern-1) with (cf = 0.4).
IF (Upper-single-hole) THEN NOT (Pattern-6) with (cf = 0.6).
IF NOT (Left-long-curve) THEN (Pattern-3) with (cf = 0.5).
IF NOT (Lower-single-hole) THEN NOT (Pattern-8) with (cf = 0.7).
8.1. Experiment 1. Learning speed

To evaluate the learning speed of the proposed network compared with that of the ordinary MLP, the same feature set has been used in the learning for both
models. The AINET and the MLP, which have the same number of nodes, have been trained with the same learning patterns shown in Fig. 6. Figure 9 compares their learning speeds. In the graph, cases (a) and (b) represent the change of total error for the randomly initialized MLP and AINET, respectively. Case (c) represents that of the AINET initialized by the method of Table 1 from the given knowledge. From the results, we can see that the AINET converges to a low-error state more rapidly than the ordinary MLP, and that the AINET initialized by the knowledge, case (c), starts from a lower error state than the others and converges to the low-error state after a similar number of iterations to the randomly initialized AINET.
Table 4. Number of incorrect results and recognition rates

           Typical handwritten digits (200)        Partially distorted digits (100)
Model      No. of incorrect  Recognition rate (%)  No. of incorrect  Recognition rate (%)
MLP        96                52.0                  67                33.0
AINET 1    61                69.5                  43                57.0
AINET 2    38                81.0                  22                78.0
Fig. 9. Learning speed comparisons of three types of neural networks (total error versus number of iterations, 0-1600).
8.2. Experiment 2. Recognition performance

We have used 200 typical handwritten digits and 100 partially distorted digits as test patterns, as illustrated in Fig. 8. The training patterns of Fig. 6, which were used for the learning process, are excluded from this experiment. Table 4 shows the performance comparison of our model and the ordinary MLP using primitive features. In the table, AINET 1 represents the case in which all weights of the AINET in our model were randomly initialized, and AINET 2 represents the case using the AINET initialized by the method of Table 1. In conventional neural network approaches to pattern recognition, external knowledge is not considered; such networks are trained on primitive features which are selected independently of the objective patterns. In this experiment, we have compared the recognition performance of our model with that of the MLP using primitive features. We have assumed that the MLP does not consider any external knowledge about the objective patterns in its learning process. Therefore, while the MLP has 1420 input nodes for the primitive features, AINET 1 and AINET 2 for the recognition stage of Fig. 4 have only 72 input nodes, since the primitive features are transformed into 72 feature variables by the feature extractors in a preliminary stage, as shown in Fig. 4. We can see that the size of the AINET in our model can be remarkably reduced compared with the MLP using primitive features. Also, the results of
Table 4 show that the recognition performance of our model is much better than that of the MLP which does not consider any external knowledge, and that the initialization technique utilizing expert knowledge is particularly useful for partially distorted character recognition.
8.3. Experiment 3. Justification of network output

The above-mentioned justification algorithms have been tested on a set of variously distorted digits. Figures 10 and 11 demonstrate the results of the experiments. For the given input images in the figures, the activation values of the 10 output nodes are displayed as a bar chart. Through the WHY-function and the WHYNOT-function described in the previous section, the desirable features, missing features and extraneous features for the winner node or for a specific loser node can be retrieved and displayed. For example, in Fig. 10, the desirable features of the given image for pattern 3 have been enumerated by the WHY-function, and the missing features and extraneous features of the image for pattern 5 have been enumerated by the WHYNOT-function. Figure 11 shows the result of the same experiment for a partially distorted image. From the results, we can see that the functions provide considerably reasonable information about the present state of the network.
Fig. 10. Example of justification experiment 1 (input image, bar chart of the 10 output-node activations, and the desirable, missing and extraneous features, with their activation values, retrieved by the WHY- and WHYNOT-functions).
Neural network for visual pattern recognition
< m T I l
0
1
2
3
4
1301
•
5
6
8
7
9
O00000OO00 • NJJTIFI~£TIOJJ •
WHY( Deulreble
(61
Featurea
-
1 owe=.~:Lght_¢
-
1owe= h o l e _ s h a p e
~o,z' ( Nlssl~j
)
-
u~-~e
-
rtstu~ee lett
(s) )
endl~lnt
: O. 0 0 0 0
: O. 0534
:
lowe~
-
( IL~.z'ane(m:
1.0000
-
u p p o = _ r £ q h t _ond_po £ n t
lover
Foat-a:es
)
hole_shape
: 1. 0000
: 1.0000
Fig. 11. Example of the justification experiment 2.
Table 5. The performance of the AINET initialized by the rule base

No. of rules used     Typical handwritten digits (200)        Partially distorted digits (100)
for initialization    No. of incorrect  Recognition rate (%)  No. of incorrect  Recognition rate (%)
0 rules               61                69.5                  43                57.0
10 rules              53                73.5                  39                61.0
20 rules              49                75.5                  33                67.0
30 rules              48                76.0                  29                71.0
40 rules              45                77.5                  28                72.0
50 rules              41                79.5                  28                72.0
60 rules              40                80.0                  25                75.0
70 rules              38                81.0                  23                77.0
80 rules              38                81.0                  22                78.0
9. CONCLUSION
In this paper, we have studied the methodology for integrating neural networks and knowledge-based approaches for pattern recognition. We have proposed a neural network model called Adaptive Inference Network which is capable of learning and inference. We have introduced the structure and behavior, learning algorithm and other features of the proposed network. It has been shown that the neural network can realize any combinational logic or inference system for the knowledge base represented by IF-THEN rules, and can be trained for knowledge acquisition from example
data by the learning process. The image recognition model using the proposed neural network has also been presented. We have defined six types of relations among feature variables, and proposed a method of utilizing them in the learning process of the network. Through the experimental results, we have shown that the proposed model performs significantly better than the conventional neural network in terms of both recognition rate and computational load. The feasibility of the model was demonstrated by a digit recognition example. For partial pattern recognition, we have divided the input image into several parts, and extracted the local features from each part of the image. The
multistage learning technique with the AINET in the proposed image recognition model can provide better recognition performance than the ordinary MLP when tested on partially distorted as well as transformed input patterns. The experiment on the result explanation capability of AINET has shown that it can provide reasonable information about the recognition results.
REFERENCES

1. Y. H. Pao and D. J. Sobajic, Neural networks and knowledge engineering, IEEE Trans. Knowledge Data Engng 3, 185-192 (1991).
2. D. E. Rumelhart and J. L. McClelland, Parallel Distributed Processing. MIT Press, Cambridge, MA (1986).
3. S. I. Gallant, Perceptron-based learning algorithms, IEEE Trans. Neural Networks 1, 179-191 (1990).
4. R. P. Brent, Fast training algorithms for multilayer neural nets, IEEE Trans. Neural Networks 2, 346-354 (1991).
5. R. P. Lippmann, An introduction to computing with neural nets, IEEE ASSP Mag. April, 4-22 (1987).
6. R. C. Lacher, I. Hruska and D. C. Kuncicky, Back-propagation learning in expert networks, IEEE Trans. Neural Networks 3, 62-72 (1992).
7. S. I. Gallant, Connectionist expert systems, Commun. ACM 31, 152-169 (1988).
8. B. T. Low, H. C. Lui, A. H. Tan and H. H. Teh, Connectionist expert system with adaptive learning capability, IEEE Trans. Knowledge Data Engng 3 (1991).
9. J. M. Keller and H. Tahani, Implementation of conjunctive and disjunctive fuzzy logic rules with neural networks, Int. J. Approx. Reasoning 6, 221-240 (1992).
10. J. M. Keller and R. R. Yager, Fuzzy logic inference neural networks, Intelligent Robots and Computer Vision VIII: Algorithms and Techniques, Proc. SPIE 1192, 582-591 (1989).
11. Q. Yang and V. K. Bhargava, Building expert systems by a modified perceptron network with rule-transfer algorithms, Proc. IJCNN 90, 2, 77-82 (1990).
12. R. H. Stottler and A. L. Henke, Automatic translation from an expert system to a neural network representation, Proc. IJCNN 92, 1, 13-20 (1992).
13. L. Fu, A neural network model for learning rule-based systems, Proc. IJCNN, 1, 343-348 (1992).
14. H. Guo and S. B. Gelfand, Classification trees with neural network feature extraction, IEEE Trans. Neural Networks 3, 923-933 (1992).
15. B. A. Macdonald and H. Witten, A framework for knowledge acquisition through techniques of concept learning, IEEE Trans. Syst. Man Cybern. 19, 499-512 (1989).
16. P. W. Baim, A method for attribute selection in inductive learning systems, IEEE Trans. Pattern Anal. Mach. Intell. 10, 888-896 (1988).
17. K. Fukushima, Neocognitron: a hierarchical neural network capable of visual pattern recognition, Neural Networks 1, 119-130 (1988).
18. W. B. Rouse, J. M. Hammer and C. M. Lewis, On capturing human skills and knowledge: algorithmic approaches to model identification, IEEE Trans. Syst. Man Cybern. 19, 558-573 (1989).
19. R. M. Goodman, J. W. Miller and P. Smith, An information theoretic approach to rule-based connectionist expert systems, Adv. Neural Inform. Process. Syst. 1, 256-263 (1989).
20. R. Sun, Neural network models for rule-based reasoning, Proc. IJCNN 91, 1, 503-508 (1991).
21. J. M. Keller and D. J. Hunt, Incorporating fuzzy membership functions into the perceptron algorithm, IEEE Trans. Pattern Anal. Mach. Intell. 7 (1985).
22. A. Mogre, R. McLaren and J. Keller, Utilizing context in computer vision by confidence modification, Intelligent Robots and Computer Vision, Proc. SPIE 1002, 384-393 (1988).
23. H. J. Zimmermann, Fuzzy Set Theory and its Applications. Kluwer-Nijhoff, Dordrecht (1985).
24. R. Krishnapuram and J. Lee, Fuzzy-set-based hierarchical networks for information fusion in computer vision, Neural Networks 5, 335-350 (1992).
25. T. L. Huntsberger, C. Rangarajan and S. N. Jayaramamurthy, Representation of uncertainty in computer vision using fuzzy sets, IEEE Trans. Comput. 35, 145-156 (1986).
26. R. M. Bozinovic and S. N. Srihari, Off-line cursive script word recognition, IEEE Trans. Pattern Anal. Mach. Intell. 11, 68-83 (1989).
About the Author -- HO JOON KIM was born in 1961. He received the B.S. degree from the Department of Electronics Engineering, Kyeongpook National University, in 1987, and the M.S. degree from the Department of Computer Science, Chungnam National University, Korea, in 1991. From January 1987 until March 1991, he was a member of the research staff in the Department of Computer Code Development, Korea Atomic Energy Research Institute. He is currently a Ph.D. student at KAIST. His research interests include computer vision, neural networks, pattern recognition and expert systems.
About the Author -- HYUN SEUNG YANG was born in 1953. He received the B.S.E.E. degree from the Department of Electronics Engineering, Seoul National University, Korea, in 1976, and the M.S.E.E. and Ph.D. degrees from the School of Electrical Engineering, Purdue University, West Lafayette, U.S.A., in 1983 and 1986, respectively. From August 1986 to July 1988, he worked as an assistant professor in the Department of Electrical and Computer Engineering, University of Iowa, Iowa City, IA. While attending Purdue University, he worked at the Robot Vision Lab as a research assistant from May 1983 to August 1985 and as a research associate from September 1985 to August 1986. He was engaged in research on 3-D shape representation and recognition, knowledge-based vision systems, and intelligent robot manipulation with visual feedback. While he was with the University of Iowa, he developed the Robot and Machine Intelligence Lab in the Department of Electrical and Computer Engineering. He is currently with the Department of Computer Science, Korea Advanced Institute of Science and Technology (KAIST), as an associate professor and director of the Computer Vision and Intelligent Robotics Lab in the Center for Artificial Intelligence Research at KAIST. His current research interests include 3-D object representation and recognition, CAD-based 3-D robot vision, knowledge-based vision, neural network-based vision and mobile robots. He is currently a governing board member of the International Association for Pattern Recognition.