Pattern Recognition, Vol. 29, No. 12, pp. 2047-2060, 1996
Copyright © 1996 Pattern Recognition Society. Published by Elsevier Science Ltd. Printed in Great Britain. All rights reserved.
0031-3203/96 $15.00+.00
PII: S0031-3203(96)00043-X

A ROBUST BOUNDARY-BASED OBJECT RECOGNITION IN OCCLUSION ENVIRONMENT BY HYBRID HOPFIELD NEURAL NETWORKS

JUNG H. KIM,*† SUNG H. YOON* and KWANG H. SOHN§

†Department of Electrical Engineering, North Carolina A&T State University, Greensboro, NC 27411, U.S.A.
*Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, NC 27695, U.S.A.
§Department of Communication Engineering, Yonsei University, Seoul, Korea

* Author to whom correspondence should be addressed.

(Received 8 September 1995; in revised form 6 March 1996; received for publication 4 April 1996)

Abstract--This paper presents a new method of occluded object matching for machine vision applications. The current methods for occluded object matching lack robustness and require high computational effort. In this paper, a new Hybrid Hopfield Neural Network (HHN) algorithm, which combines the advantages of both a Continuous Hopfield Network (CHN) and a Discrete Hopfield Network (DHN), is described and applied to partially occluded object recognition in multi-object scenes. The proposed HHN provides great fault tolerance and robustness and requires less computation time. The advantages of HHN, such as reliability and speed, are also discussed. Copyright © 1996 Pattern Recognition Society. Published by Elsevier Science Ltd.

Keywords: Object recognition; Matching algorithm; Machine vision; Hopfield network; Neural network; Occluded object; Boundary representation; Curvature function.

1. INTRODUCTION
Recognition of partially occluded objects has been an important issue in the fields of industrial automation and military applications, because occlusion causes significant problems when one attempts to identify and locate an object in the workspace of a robot, in baggage inspection at airports, etc. Occlusion occurs when two or more objects in a given image touch or overlap one another. In such situations, vision techniques using global features to identify and locate an object may fail because the descriptors of a part of a shape may bear no resemblance to the descriptors of the entire shape. Local-feature-based methods have been developed in an effort to solve the occlusion problem instead of global features.(1-4) Corners, protrusions, holes, lines, and textures are examples of local features often mentioned in the papers referred to above. Reliable features are needed to recognize occluded objects, since false features, in addition to features of occluded parts, aggravate mismatching of objects. A polygonal approximation method,(5) which has been widely used, may not provide reliable features because it may not produce the unique break points which are necessary for reliable matching. Researchers are therefore burdened with compensating for phantom segments as well as for the occlusion problem. In our early work,(6) we tried to represent a boundary with lines and arcs, because lines and arcs have explicit features such as line length, angle, radius, arc length, and convexity. However, it is
very difficult to find consistent break points between segments. Price(7) used a conceptually simple technique to solve the occlusion problem by following the order of the matched line segments in the model and the input image. He used a device called the disparity matrix. The algorithm used line segments resulting from the polygonal approximation. His method then compared every line segment in the model with the line segments in the input image. If the segment pairs were compatible in terms of length and angle between successive segments, the rotational offset between the two segments was entered into the disparity matrix. After all line segments had been compared, the matrix contained the offsets, or disparities, for all line segments. By traversing this newly formed matrix diagonally, the longest sequence in the matrix that contains compatible entries can be found. From the longest sequence, Price's method then computed the transform dictated by the segment pairs in the sequence; this value is the final result. The method is very simple but lacks tolerance. It uses local features such as the length and angle of segments; therefore, matching based on such locally focused information may fail because there is no relational information from features of remote nodes. To improve on this primitive method, Bhanu and Ming(8) combined a cluster-structure approach with Price's method. Their method gives more reliable matching results than Price's method, but it takes several additional steps to recognize occluded objects in the input. The method still lacks tolerance and requires high computational complexity due to the
rule-based approach and the excessive number of steps in the algorithm. Koch and Kashyap(5) used a graph matching technique to handle the partial occlusion problem. They used local features such as rectangular corners and holes. The algorithm is based on the graph matching technique of Bolles and Cain(4) and polygonal approximation. However, they do not consider relational features, which are important for matching based on geometrical features. To overcome the above problems, we apply a neural network approach to the occluded object matching problem in this paper. We use corner points for matching based on neural networks, since it is well known that shape information is concentrated at the points having high curvature.(9) The corner points are usually detected in a curvature function space by capturing all the local extrema whose curvature values are above a certain threshold value. The corner points can generate useful local features such as the angle between neighboring corners and the distance between every pair of corners. In addition, the corner points are appropriate for constructing neural networks. To extract reliable features, it is very important to detect consistent corner points invariant under translation, rotation, and scale. For the purposes of experimentation, the consistent corner points are obtained by using a constrained regularization technique.(10) With the result of this unique segmentation, a graph matching technique(4,11,12) based on neural networks is presented in this paper. The inherent parallelism of neural networks allows rapid pursuit of many hypotheses in parallel with a high computation rate.(13,14) Moreover, it provides a great degree of robustness, or fault tolerance, compared to conventional computers because of the many processing nodes, each of which is responsible for a small portion of the task. Thus, damage to a few nodes or links does not impair overall performance significantly. For this reason, a Hopfield-style neural network is proposed to solve matching problems for partially occluded objects. Hopfield proposed two types of neural networks: the Discrete Hopfield Network (DHN) and the Continuous Hopfield Network (CHN).(15-17) The Hopfield networks can be applied to content addressable memory or combinatorial optimization problems.(18-22) The Hopfield neural networks have been used to solve NP-complete optimization problems and have been attempted in applications to pattern recognition problems, which can be cast into a combinatorial optimization class.(21) Some researchers apply the CHN to three-dimensional object matching problems.(22) However, the CHN takes much computational time in simulating a differential equation, even though it provides good solutions. The DHN has been used for two-dimensional object matching problems.(23) However, the DHN is an approximation method and gives only rough solutions, although it reduces computational time. In this paper, a new method for partial object recognition using a two-dimensional Hopfield neural network is presented. A Hybrid Hopfield Neural Network (HHN) algorithm, which combines the ad-
vantages of both the CHN and the DHN, is proposed. Unlike the traveling salesman problem implemented by the Hopfield neural network, the matching problem is handled by normalizing features with a fuzzy function, which gives distinguishable values to a connectivity matrix. HHN is derived by estimating the behavior of the neurons based on the distinguishable values of the connectivity matrix. The proposed method reduces the amount of simulation time and provides optimal solutions. This paper is structured as follows: Section 2 gives the background explanation of the Hopfield neural networks. Matrix associative memory and the two types of Hopfield networks, DHN and CHN, are reviewed. Current methods to recognize occluded target objects and their problems are also reviewed. In Section 3, the theory of the new method, HHN, is presented. In Section 4, simulation results are shown. Performances of HHN and DHN are compared by using optical images obtained by a camera and frame grabber (DT-2852). Finally, a summary of the important results and the contributions of the paper are given in Section 5.
2. ROBUST BOUNDARY SMOOTHING AND FEATURE EXTRACTION
2.1. Curvature estimation

The curvature is generally defined as the derivative of the tangent angle to the curve. The formula for computing the curvature function using a parametric representation of the curve is obtained as follows:
\kappa(t) = \frac{\dot{X}(t)\ddot{Y}(t) - \ddot{X}(t)\dot{Y}(t)}{[\dot{X}(t)^2 + \dot{Y}(t)^2]^{3/2}}.   (1)
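Equation (1) can be evaluated directly on a digitized closed boundary. The following sketch (ours; the paper gives no implementation) uses periodic central differences in NumPy, with function names chosen for illustration only.

import numpy as np

def curvature(x, y):
    """Discrete curvature of a closed boundary sampled as x(t), y(t).

    Derivatives are taken with periodic central differences, matching the
    parametric formula k = (x'y'' - x''y') / (x'^2 + y'^2)^(3/2).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = (np.roll(x, -1) - np.roll(x, 1)) / 2.0    # first derivatives (wrap-around)
    dy = (np.roll(y, -1) - np.roll(y, 1)) / 2.0
    ddx = np.roll(x, -1) - 2.0 * x + np.roll(x, 1)  # second derivatives
    ddy = np.roll(y, -1) - 2.0 * y + np.roll(y, 1)
    denom = (dx**2 + dy**2) ** 1.5 + 1e-12          # guard against degenerate points
    return (dx * ddy - ddx * dy) / denom

if __name__ == "__main__":
    t = np.linspace(0.0, 2.0 * np.pi, 256, endpoint=False)
    k = curvature(20.0 * np.cos(t), 20.0 * np.sin(t))
    print(k.mean())   # approximately 1/20 for a circle of radius 20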
In a discrete image, the parametrized boundary is represented by an 8-neighbor Freeman chain code, i.e., it is represented by only eight integer values (0-7), and is therefore severely affected by quantization. Since the formula to compute the curvature function involves the first and second derivatives and the data themselves are noisy, the resulting curvature function computed on a digitized boundary is very ragged and is difficult to use for further processing. The following equation is one of the k-curvature methods, developed by Groen and Verbeek:(24)

\kappa_i^k = \frac{1}{k}\sum_{j=-k}^{-1} q_{i-j} - \frac{1}{k}\sum_{j=0}^{k-1} q_{i-j},   (2)
where q_{i-j} is an integer-valued (0-7) chain code element. The value of k works as a smoothing factor in equation (2). However, it is difficult to determine an optimal (or unique) value of k. Mokhtarian and Mackworth(25) used Gaussian smoothing to compute curvature at varying levels of detail. They convolved x(t) and y(t) with a one-dimensional Gaussian kernel of width σ, resulting in smoothed functions X(t,σ) and Y(t,σ). Hence, rewriting
equation (1), the discrete curvature using Gaussian smoothing becomes:

\kappa(t,\sigma) = \frac{\dot{X}(t,\sigma)\ddot{Y}(t,\sigma) - \ddot{X}(t,\sigma)\dot{Y}(t,\sigma)}{[\dot{X}(t,\sigma)^2 + \dot{Y}(t,\sigma)^2]^{3/2}}.   (3)

A 256 x 256 original gun image was acquired through a DT-2851 image acquisition system and the boundary [Fig. 1(a)] was extracted by a contour tracing algorithm.(26) Figures 1(b) and (c) show the results after Gaussian smoothing for two values of σ. We obtain noisy estimates of curvature with σ = 1 [Fig. 1(b)], i.e., the function is undersmoothed as a result of insufficient filtering. On the other hand, the filter with σ = 8 oversmooths the curvature function [Fig. 1(c)]: the curvature extrema are significantly attenuated and poorly localized. We can clearly see the difficulty in determining a unique (or optimal) smoothing factor.

2.2. Optimal boundary smoothing for curvature estimation

We assume that the original boundary (f) is not known; it is only known that the function is smooth. The objective of the algorithm is to optimally estimate the original boundary f, given a measured boundary f_m and some knowledge about the noise v,
f_m = f + v.   (4)
We find the optimally estimated smooth boundary (f_e) using a Constrained Regularization (CR) approach. The following smoothness requirement is used as a stabilizing functional to obtain the smooth curvature estimate, since the second derivative represents the roughness of the data:

\int_0^N [\,f_e''(u)\,]^2\, du.   (5)
Fig. 1. (a) The boundary of a gun. (b) The curvature function at σ = 1. (c) The curvature function at σ = 8.
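The under- and over-smoothing behavior illustrated in Fig. 1 can be reproduced with a short sketch. The synthetic noisy contour below and the use of SciPy's Gaussian-derivative filters are our own illustrative choices, not part of the original implementation.

import numpy as np
from scipy.ndimage import gaussian_filter1d

def smoothed_curvature(x, y, sigma):
    """Curvature of a closed boundary after Gaussian smoothing of width sigma.

    Derivatives of the smoothed coordinates are obtained directly by convolving
    with first- and second-order Gaussian derivative kernels (periodic boundary).
    """
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    dx = gaussian_filter1d(x, sigma, order=1, mode="wrap")
    dy = gaussian_filter1d(y, sigma, order=1, mode="wrap")
    ddx = gaussian_filter1d(x, sigma, order=2, mode="wrap")
    ddy = gaussian_filter1d(y, sigma, order=2, mode="wrap")
    return (dx * ddy - ddx * dy) / ((dx**2 + dy**2) ** 1.5 + 1e-12)

# Hypothetical noisy closed contour standing in for the traced gun boundary.
t = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
rng = np.random.default_rng(0)
x = 100.0 * np.cos(t) + 10.0 * np.cos(5 * t) + rng.normal(0.0, 0.8, t.size)
y = 100.0 * np.sin(t) + 10.0 * np.sin(5 * t) + rng.normal(0.0, 0.8, t.size)

k_under = smoothed_curvature(x, y, sigma=1.0)   # ragged, cf. Fig. 1(b)
k_over = smoothed_curvature(x, y, sigma=8.0)    # extrema attenuated, cf. Fig. 1(c)
print(float(np.abs(k_under).max()), float(np.abs(k_over).max()))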
Using a discrete form, we summarize the problem as
\min_{f_e} \Big\{ \sum_k [\,f_e(k+1) - 2 f_e(k) + f_e(k-1)\,]^2 \Big\}.   (6)
2.2.1. Problem statement. Find a boundary vector f_e that minimizes the stabilizing functional above subject to the following noise equality constraint:

(f_m - f_e)^T (f_m - f_e) = f_n^T f_n = \epsilon \approx N\sigma_n^2,   (7)

where N is the number of data points, f_n is the residual vector, ε is the residual error and σ_n^2 is the noise variance. We do not know the noise function, but we assume we know some of its statistical properties. In other words, if v(k) has zero mean, then

v^T v = \sum_k v(k)^2 \approx N\sigma_n^2.   (8)

Thus, we find f_e such that the residual f_n has the same statistical property as the noise vector v. The above problem can be handled without difficulty by using the method of Lagrange multipliers as follows:

J(f_e, \lambda) = f_e^T C_e^T C_e f_e + \lambda\,[(f_m - f_e)^T (f_m - f_e) - \epsilon],   (9)

where λ is a Lagrange multiplier. Since the boundary is closed, C_e is an N x N circulant matrix,

C_e = \begin{bmatrix} -2 & 1 & 0 & \cdots & 0 & 1 \\ 1 & -2 & 1 & \cdots & 0 & 0 \\ 0 & 1 & -2 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 1 & 0 & 0 & \cdots & 1 & -2 \end{bmatrix}.   (10)

Differentiating equation (9) with respect to f_e and λ yields

(f_m - f_e)^T (f_m - f_e) = \epsilon \approx N\sigma_n^2,
f_e = \Big(I + \frac{1}{\lambda}\, C_e^T C_e\Big)^{-1} f_m = (I + \gamma\, C_e^T C_e)^{-1} f_m,   (11)

where γ = 1/λ and I is the N x N identity matrix. Since the size of the matrix used in equation (11) is N x N and N is usually large, it is clear that direct inversion is impractical. This difficulty can be avoided by making use of the properties of circulant matrices. That is, we can estimate the boundary in the frequency domain using the Discrete Fourier Transform (DFT) as follows:

F_e(u) = \frac{1}{1 + \gamma D_c(u)}\, F_m(u),   (12)

where F_e(u) and F_m(u) are the DFTs of f_e(k) and f_m(k), and D_c is the diagonalized matrix of C_e^T C_e. Finally, we obtain the optimally estimated boundary vector f_e by taking the inverse DFT of equation (12). The optimal value γ* is obtained iteratively according to the constraint [equation (7)]. The CR technique is a data-driven method; in other words, the level of noise of the boundary data determines the amount of smoothing. We can obtain the smooth curvature function from the above optimal smooth boundary by equation (1).
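A sketch of the frequency-domain solution of equations (11)-(12) with the data-driven choice of γ, assuming the circulant second-difference matrix of equation (10). The bisection search for γ* is our own illustrative choice; the paper only states that γ* is obtained iteratively from the constraint (7).

import numpy as np

def cr_smooth_boundary(fm, noise_var, tol=0.02, max_iter=60):
    """Constrained-regularization smoothing of one coordinate of a closed boundary.

    Solves f_e = (I + gamma * Ce^T Ce)^{-1} f_m in the DFT domain and adjusts
    gamma by bisection until the residual energy ||f_m - f_e||^2 matches
    N * noise_var (equation (7)).
    """
    fm = np.asarray(fm, dtype=float)
    n = fm.size
    target = n * noise_var
    # Eigenvalues of Ce^T Ce: squared magnitudes of the DFT of Ce's first column.
    first_col = np.zeros(n)
    first_col[0], first_col[1], first_col[-1] = -2.0, 1.0, 1.0
    dc = np.abs(np.fft.fft(first_col)) ** 2
    Fm = np.fft.fft(fm)

    def residual(gamma):
        Fe = Fm / (1.0 + gamma * dc)
        fe = np.real(np.fft.ifft(Fe))
        return np.sum((fm - fe) ** 2), fe

    lo, hi = 0.0, 1.0
    while residual(hi)[0] < target and hi < 1e12:   # grow the upper bracket
        hi *= 10.0
    fe = fm
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        r, fe = residual(mid)
        if abs(r - target) <= tol * target:
            break
        if r < target:
            lo = mid        # not enough smoothing yet
        else:
            hi = mid
    return fe

# Usage: smooth x(k) and y(k) separately, then compute curvature from the result.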
2.3. Feature extraction and graph formation

In boundary-based approaches, corner points are important since the information of the shape is concentrated at the points having high curvature.(9) Corner points are detected in a curvature function space by capturing the points whose curvature values are above a certain threshold value. We presented a new corner detection algorithm, which provides reliable and invariant corners for the matching procedure, in a previous paper.(27) From the corner points we can extract useful features: a local feature, the angle between neighboring corners, and relational features, the distances between corners. These two kinds of features, which are invariant under translational and rotational changes, are used for a robust description of the shape of the boundary. A graph can be constructed for a model object using corner points as the nodes of the graph. Each node has a local feature as well as relational features with other nodes. For the matching process, a similar graph is constructed for the input image, which may consist of one or several overlapping objects. Each model graph is then matched against the input image graph to find the best matching subgraph.
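The feature extraction and graph formation described above might be sketched as follows. The simple peak-picking corner rule here stands in for the constrained-regularization corner detector of reference (27), and the function names are ours.

import numpy as np

def corner_indices(curv, threshold):
    """Indices of curvature extrema whose magnitude exceeds a threshold."""
    k = np.abs(np.asarray(curv, float))
    is_peak = (k >= np.roll(k, 1)) & (k >= np.roll(k, -1)) & (k > threshold)
    return np.nonzero(is_peak)[0]

def build_graph(points, corners):
    """Local and relational features for the graph nodes (corner points).

    Returns the angle at each corner formed with its two neighbouring corners
    (local feature) and the matrix of pairwise corner distances (relational
    feature); both are invariant to translation and rotation.
    """
    p = np.asarray(points, float)[corners]              # corner coordinates, in order
    prev_p, next_p = np.roll(p, 1, axis=0), np.roll(p, -1, axis=0)
    v1, v2 = prev_p - p, next_p - p
    cos_a = np.sum(v1 * v2, axis=1) / (np.linalg.norm(v1, axis=1)
                                       * np.linalg.norm(v2, axis=1))
    angles = np.arccos(np.clip(cos_a, -1.0, 1.0))       # local feature per node
    distances = np.linalg.norm(p[:, None, :] - p[None, :, :], axis=2)  # r_ij
    return angles, distances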
3. THEORY OF HYBRID HOPFIELD NEURAL NETWORK

3.1. Discrete Hopfield network (DHN)

The DHN is the original model of the Hopfield-style neural network and has the advantage that it is simple and quick to implement. A two-dimensional array of neurons is constructed to map the matching problem onto a neural network. The columns of the array are labeled by the nodes of an object model, and the rows by the nodes of an input object. Therefore, the state of each neuron represents the measure of match between two nodes, one from each graph. The matching process can be characterized as minimizing the following energy function:
E = -A \sum_i \sum_j \sum_k \sum_l C_{ikjl} V_{ik} V_{jl} + B_1 \sum_i \sum_k \sum_{l \ne k} V_{ik} V_{il} + B_2 \sum_k \sum_i \sum_{j \ne i} V_{ik} V_{jk},   (13)

where V_{ik} is a binary variable which converges to "1" if the ith node in the input image matches the kth node in the object model; otherwise, it converges to "0". The first term in equation (13) is a compatibility constraint. Local and relational features, which have different measures, are normalized to give tolerance for ambiguity of the features. The last two terms are included to enforce the uniqueness constraint, so that each node in the object model eventually matches only one node in the input image and the summation of the outputs of the neurons in each row or column is no more than 1. Some
papers concerning matching problems with the Hopfield-style neural network have used \sum (1 - V_{ik})^2 as a uniqueness constraint. This term implies a global restriction. However, in the matching of occluded objects there is no guarantee that every row or every column has exactly one active neuron; thus the energy function of the occluded matching problem excludes this global restriction condition from equation (13). In the traveling salesman problem, the coefficients B_1 and B_2 are emphasized more than the coefficient A, because B_1 and B_2 contribute to yielding valid solutions. However, the conditions for valid solutions in the matching of occluded objects are indefinite. In addition, C_{ikjl} is normalized by a fuzzy function, which helps us obtain good solutions. Therefore the coefficient A should be emphasized more in the matching problem. The compatibility measure C_{ikjl} is expressed as follows:

C_{ikjl} = W_1 \times F(f_i, f_k) + W_2 \times F(f_j, f_l) + W_3 \times F(r_{ij}, r_{kl}).   (14)

The fuzzy function F(x, y), shown in Fig. 2, takes the value 1 for a positive support and -1 for a negative support: if the absolute value of the difference between x and y is less than a threshold, then F(x, y) is set to 1; otherwise F(x, y) is set to -1. The first term is related to the local feature of the (i, k)th neuron: if the ith node of the model and the kth node of the input are similar in their local features, then the value of the fuzzy function F(f_i, f_k) is 1, otherwise -1. The second term is related to the local feature of the (j, l)th neuron; F(f_j, f_l) is set to 1 or -1 by the same procedure. The third term is related to the relational feature between the two neurons: if the relational feature between the ith and jth nodes in the model is similar to the relational feature between the kth and lth nodes, then F(r_{ij}, r_{kl}) is set to 1, otherwise -1. The coefficients W_i sum to 1, so the value of C_{ikjl} is normalized to lie between -1 and 1.

Fig. 2. The shape of the fuzzy function.

The performance of the algorithm is significantly influenced by the weights and by the tolerance of the fuzzy function. As the tolerance θ in the fuzzy function becomes larger, the robustness of the algorithm increases, but mismatching may occur. On the other hand, as the tolerance θ becomes smaller, the algorithm becomes very sensitive to the noise level of the input, so that a matchable node may not be detected; however, the mismatching rate decreases. For example, when the current corner detection algorithm is applied to a noisy boundary or to a boundary with few data points, corner points can be displaced by the smoothing effect. In this case, the tolerance should be increased even though this may cause mismatching. The weight W_i is decided by the significance of the features. In this graph matching, local features do not contribute to interactions between neurons but relational features do. Therefore, relational features are emphasized more than local features in the neural network application, i.e., the weight of the relational feature, W_3, has a larger value than the other weights.
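A sketch of the compatibility measure of equation (14), assembled into the connection weights together with the δ-penalty terms of equation (15) given below. The weight values, tolerances and coefficients here are illustrative only (the paper states merely that the W_i sum to 1 and that W_3 dominates).

import numpy as np

def fuzzy(x, y, tol):
    """Fuzzy comparison F(x, y): +1 if |x - y| is below the tolerance, else -1."""
    return np.where(np.abs(x - y) < tol, 1.0, -1.0)

def connection_weights(ang_in, dist_in, ang_mod, dist_mod,
                       w=(0.2, 0.2, 0.6), coef_a=1.0, b1=1.0, b2=1.0,
                       tol_angle=0.15, tol_dist=5.0):
    """Connection tensor T[i, k, j, l] built from the compatibility C of equation (14).

    ang_in (n,) and dist_in (n, n) hold the input-image node angles and pairwise
    distances; ang_mod (m,) and dist_mod (m, m) hold the model features.
    """
    f_ik = fuzzy(ang_in[:, None], ang_mod[None, :], tol_angle)            # F(f_i, f_k)
    f_rel = fuzzy(dist_in[:, None, :, None], dist_mod[None, :, None, :],  # F(r_ij, r_kl)
                  tol_dist)
    c = (w[0] * f_ik[:, :, None, None]      # local feature of neuron (i, k)
         + w[1] * f_ik[None, None, :, :]    # local feature of neuron (j, l)
         + w[2] * f_rel)                    # relational feature between the two
    n, m = f_ik.shape
    d_ij = np.eye(n)[:, None, :, None]
    d_kl = np.eye(m)[None, :, None, :]
    # T_ikjl = A*C_ikjl - B1*delta_ij - B2*delta_kl + (B1+B2)*delta_ij*delta_kl
    return coef_a * c - b1 * d_ij - b2 * d_kl + (b1 + b2) * d_ij * d_kl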
The second term of the energy function of the matching problem is represented by a quadratic function so that it can be cast into the Hopfield energy function, and the third term has the same form as the second. Therefore, equation (13) can be cast into a Hopfield-style energy function as follows:

E = -\frac{1}{2} \sum_i \sum_j \sum_k \sum_l T_{ikjl} V_{ik} V_{jl} - \sum_i \sum_k I_{ik} V_{ik},   (15)

with

T_{ikjl} = A C_{ikjl} - B_1 \delta_{ij} - B_2 \delta_{kl} + (B_1 + B_2)\,\delta_{ij}\delta_{kl},

where δ_{ij} = 1 when i = j and δ_{ij} = 0 otherwise. Hopfield proved that the energy function is a Liapunov function; thus the energy function converges to a local minimum when the states of the neurons converge to stable states. The matching process is based on global information of the image, which provides excitatory or inhibitory support for the matching of local features. The simulation is a random process which arrives at a stable state when the energy function of equation (15) is at its minimum. The algorithm is summarized as follows:

1. Set the initial state of the neurons.
2. Randomly pick a neuron (i, k).
3. Calculate its input,

u_{ik} = \sum_j \sum_l T_{ikjl} V_{jl} + I_{ik}.

4. Decide the new state of the neuron according to the following rules: V_{ik} = 1 if u_{ik} > 0.5; V_{ik} = 0 if u_{ik} < -0.5.
5. Count the changes of state. If the states have not changed for a given number of steps, stop and go to the next step; otherwise repeat the process from step 2.
6. Output the final states of the neurons, V_{ik}, which give the matching assignment between the model features and the input features.
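A minimal sketch of the update loop in steps 1-6, assuming the connection tensor T from the earlier sketch. The sweep counts and the random-number seeding are our own choices.

import numpy as np

def run_dhn(T, V0, I=None, max_sweeps=200, stable_sweeps=5, seed=0):
    """Asynchronous discrete Hopfield updates following steps 1-6 above.

    T has shape (n, m, n, m), V0 is the (n, m) initial state set from the local
    features, and I is the optional external input (zero by default).
    """
    rng = np.random.default_rng(seed)
    V = np.asarray(V0, dtype=float).copy()
    n, m = V.shape
    I = np.zeros((n, m)) if I is None else I
    unchanged = 0
    for _ in range(max_sweeps):
        changed = False
        for i, k in rng.permutation([(i, k) for i in range(n) for k in range(m)]):
            u = np.sum(T[i, k] * V) + I[i, k]     # step 3: net input of neuron (i, k)
            new = V[i, k]
            if u > 0.5:                           # step 4: threshold rules
                new = 1.0
            elif u < -0.5:
                new = 0.0
            if new != V[i, k]:
                V[i, k] = new
                changed = True
        unchanged = 0 if changed else unchanged + 1
        if unchanged >= stable_sweeps:            # step 5: no changes for a while
            break
    return V                                      # step 6: matching assignment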
The DHN is very sensitive to the initial state: the outputs of the neural network converge to the local minimum close to the initial state. Therefore, we need to set the initial conditions from the local features in order to prevent the algorithm from failing to find desired solutions. For each neuron, the local features of the corresponding nodes are compared; if they are similar to each other, the initial value of the neuron is set to 1, otherwise it is set to 0. The threshold of this initial setting is the same as the tolerance in the compatibility term of the energy function. The initial condition may then be close to a desired solution, in which case the algorithm finds the desired solution quickly.

When T_{ikik} is equal to 0, the convergence of the network has been proved. In our algorithm, although T_{ikik} is not equal to 0, the network still converges to a stable state, since the updating of the neuron states is conducted such that the energy of the system decreases. The convergence can be shown as follows. The change of the energy due to the change of V_{ik} is

\Delta E = -\Big( \sum_j \sum_l T_{ikjl} V_{jl} + I_{ik} + \tfrac{1}{2} T_{ikik}\, \Delta V_{ik} \Big)\, \Delta V_{ik}.   (16)

From equation (15),

T_{ikik} = A C_{ikik} - B_1 \delta_{ii} - B_2 \delta_{kk} + (B_1 + B_2)\,\delta_{ii}\delta_{kk}.   (17)

Therefore,

T_{ikik} = A C_{ikik} - B_1 - B_2 + (B_1 + B_2) = A C_{ikik}.   (18)

Let A be unity for convenience; then -1 < T_{ikik} < 1. Now, when \Delta V_{ik} > 0,

\sum_j \sum_l T_{ikjl} V_{jl} + I_{ik} + \tfrac{1}{2} T_{ikik}\, \Delta V_{ik} > 0,   (19)

and when \Delta V_{ik} < 0,

\sum_j \sum_l T_{ikjl} V_{jl} + I_{ik} + \tfrac{1}{2} T_{ikik}\, \Delta V_{ik} < 0.   (20)

Therefore, ΔE always decreases, and the network converges to a local minimum when the above updating rule is used.

3.2. Mapping DHN to HHN

The DHN gives an approximate solution of the problem, so some neurons might have unexpected final states; this may cause mismatching of the objects. On the other hand, the CHN gives a near-optimal solution, since it seeks a solution in a continuous domain. The HHN combines the above two types of Hopfield neural networks. The principal concept of HHN is that the output of DHN is used as the input of CHN, since the configuration of the output of DHN is very close to the stable state of the desired output of CHN. After running DHN, the output is adjusted by an analytic procedure based upon the CHN theory. In fact, the adjusting of the neurons is accomplished without iterations, so the running time of HHN is as fast as that of DHN. This method differs from the assumption of Lin's method,(22) in which the constraint \sum V_{ik} = N is valid for the initial states, since occluded objects can lose many segments of the original.

Let us consider the adjusting procedure, beginning with the CHN. The matching process of CHN can be characterized by the same energy function as that of DHN; only an integral term is added to the energy function, as follows:

E_{CHN} = E_{DHN} + \sum_i \sum_k \frac{1}{R_{ik}} \int_0^{V_{ik}} g^{-1}(V)\, dV,   (21)

where g is a sigmoid function and R_{ik} is the input resistance of a neuron. The sigmoid function g(u_{ik}) can be represented as

g(u_{ik}) = \frac{1}{1 + e^{-u_{ik}/\theta_t}},   (22)

where θ_t is a parameter by which the sigmoid function is rendered more gradual or more abrupt. The integral term reflects the point of view that the neural input state u_{ik} lags because of the capacitance present in an analog electrical circuit. Thus there is a resistance-capacitance charging equation, called the equation of motion, that determines the rate of change of u_{ik}; it is a first-order differential equation. As explained in Section 3.1, simulation of the differential equation requires a lot of computation. We can instead solve the equation of motion in a small time interval to reduce this computational cost. The equation of motion is

\frac{du_{ik}}{dt} = -\frac{u_{ik}}{\tau} + \sum_j \sum_l T_{ikjl} V_{jl} + I_{ik}.   (23)

The sigmoid function g is linearly modeled in HHN:

g(u_{ik}) = \begin{cases} 0, & u_{ik} \le -u_0, \\ a u_{ik} + b, & -u_0 < u_{ik} < u_0, \\ 1, & u_{ik} \ge u_0, \end{cases}   (24)

where u_0 is positive. Figure 3 shows the linearly modeled sigmoid function. Therefore, the modified equation of motion is

\frac{du_{ik}}{dt} = -\frac{u_{ik}}{\tau} + T_{ikik}(a u_{ik} + b) + \sum_{j(j \ne i)} \sum_{l(l \ne k)} T_{ikjl} V_{jl} + I_{ik}.   (25)

In a small time interval, \sum T_{ikjl} V_{jl} (jl \ne ik) can be considered constant in synchronous/asynchronous neural systems, so that the behavior of the input state u_{ik} in equation (25) is

u_{ik}(t) = (u_{ini} - K_1)\, e^{-K_2 t} + K_1,   (26)

where K_1 = (b T_{ikik} + \sum\sum T_{ikjl} V_{jl} + I_{ik}) / (1 - a T_{ikik}), K_2 = (1 - a T_{ikik}), and u_{ini} is the output state of DHN. As shown in equation (26), the transient part of u_{ik}(t) decays exponentially within a small change of time Δt. The range of Δt over which \sum T_{ikjl} V_{jl} stays constant depends on the initial states and on the interactions of unstable states, which cause fluctuation of the change ΔE in the energy function E.
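As a check on equation (26), a short derivation of our own, taking the time constant τ as unity (which the stated form of K_2 implies): substituting the linear segment of the sigmoid into equation (25) and treating the interaction sum as a constant gives

\frac{du_{ik}}{dt} = -(1 - a T_{ikik})\,u_{ik} + \Big(b\,T_{ikik} + \sum_{j \ne i}\sum_{l \ne k} T_{ikjl}V_{jl} + I_{ik}\Big) = -K_2\,\big(u_{ik} - K_1\big),

a linear first-order equation whose solution for the initial condition u_{ik}(0) = u_{ini} is exactly u_{ik}(t) = (u_{ini} - K_1)\,e^{-K_2 t} + K_1, i.e., equation (26).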
Fig. 3. A linearly modeled sigmoid function.

In our algorithm, almost all of the initial states u_{ini} of the neurons are close to stable states and only a few neurons are unstable. It is therefore possible to estimate the future behavior of each neuron from equation (26). For example, if u_{ini} is greater than K_1, then u_{ik} will monotonically decrease, so that the input of the (i, k)th neuron approaches K_1; the input state gets closer to K_1 as the transient part decays exponentially. K_1 is not a constant but varies with the inputs of the neuron as time evolves. Suppose that M is the number of columns and N is the number of rows in the neural structure. We might have mismatched neurons at the initial states of CHN. Let the number of active neurons at the initial state be N' + e, where N' is the number of exactly matched neurons and e is the number of mismatched neurons at the final state of DHN; then the number of inactive neurons is N x M - (N' + e). Let us assume, for simplification, that T_{ikjl} is α (α > 0) for a positive support and -α for a negative support. Now K_1(t) can be calculated, and the final output of each neuron can be analysed and predicted as follows.

(1) For a neuron that should be unmatched:

K_1(t_0) = \frac{\sum_j \sum_l T_{ikjl} V_{jl} + I_{ik} + b T_{ikik}}{1 - a T_{ikik}} = \frac{-\alpha(N' + e) + I_{ik} + b T_{ikik}}{1 - a T_{ikik}} \ge \frac{-\alpha N + I_{ik} + b T_{ikik}}{1 - a T_{ikik}}.

Therefore,

V_{ik}(t_f) = g(u_{ik}(t_f)) = g(K_1(t_f)) = g(K_1(t_0)), \quad \text{if } K_1(t_0) = -\alpha(N' + e) + I_{ik} + b T_{ikik} < -u_0.   (27)

(2) For a neuron that should be matched:

K_1(t_0) = \frac{\sum_j \sum_l T_{ikjl} V_{jl} + I_{ik} + b T_{ikik}}{1 - a T_{ikik}} = \frac{\alpha(N' - e) + I_{ik} + b T_{ikik}}{1 - a T_{ikik}} \le \frac{\alpha N + I_{ik} + b T_{ikik}}{1 - a T_{ikik}}.

Therefore,

V_{ik}(t_f) = g(u_{ik}(t_f)) = g(K_1(t_f)) = g(K_1(t_0)), \quad \text{if } K_1(t_0) = \alpha(N' - e) + I_{ik} + b T_{ikik} > u_0,   (28)

where t_0 indicates the initial time and t_f the final time, when the neural states reach stable points. As indicated in equations (27) and (28), the final output state of a neuron can be predicted from its initial output state, because the values of K_1(t_0) for a matched neuron and for an unmatched neuron are distinguishable. For a neuron to be unmatched, K_1(t_0) always corresponds to a negative support, because -α(N' + e) << -1 and I_{ik} + b T_{ikik} is not more than 1 (b is set to 0.5); this restriction is therefore always satisfied. For a neuron to be matched, if e is less than N' - (1/α)(u_0 - I_{ik} - b T_{ikik}), then the matched neuron becomes active; otherwise mismatching occurs. This condition requires DHN to provide an approximate solution as the initial state of the neurons. If the restriction for a neuron to be matched is violated, all neurons that should be matched become inactive; one can thus detect that the procedure has gone the wrong way and correct the situation by starting again from the first stage of the algorithm. After running DHN, K_1(t_0) is calculated from the output state of DHN, and the final output state of HHN is obtained directly from K_1(t_0).
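Putting equations (26)-(28) together, the HHN adjustment can be sketched as a single vectorized pass over the DHN output. The parameter values a = b = 0.5 and u_0 = 1 are illustrative; the text fixes only b = 0.5.

import numpy as np

def hhn_adjust(T, V_dhn, I=None, a=0.5, b=0.5, u0=1.0):
    """One-shot HHN adjustment of the DHN output (equations (26)-(28)).

    K1(t0) is evaluated for every neuron from the DHN output state V_dhn and the
    final state is read off directly: K1 above u0 drives the neuron active, K1
    below -u0 drives it inactive, and values in between take the linear sigmoid
    value a*K1 + b.
    """
    V = np.asarray(V_dhn, dtype=float)
    n, m = V.shape
    I = np.zeros((n, m)) if I is None else I
    T_self = np.einsum('ikik->ik', T)           # T_ikik
    full = np.einsum('ikjl,jl->ik', T, V)       # sum over all (j, l)
    row = np.einsum('ikil,il->ik', T, V)        # j = i terms
    col = np.einsum('ikjk,jk->ik', T, V)        # l = k terms
    support = full - row - col + T_self * V     # restricted sum: j != i, l != k
    K1 = (support + I + b * T_self) / (1.0 - a * T_self)
    V_out = np.clip(a * K1 + b, 0.0, 1.0)
    V_out[K1 >= u0] = 1.0
    V_out[K1 <= -u0] = 0.0
    return V_out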
Fig. 4. Model objects and occluded images.
A few neurons might have similar local features generated by the fuzzy function, because of the use of simple features and tolerant threshold levels of the fuzzy function. These similarities between different segments can lead to false decisions. When such neurons share correspondences of relational features with neurons that should be matched, they remain unstable or cause mismatching in DHN. Once the stable output states of the DHN neurons are obtained, we can emphasize the relational features more strongly in order to adjust the states of the neurons, because the relational features are related to both the distances and the order of the positions of all active neurons. This gives more confidence to the theory of HHN and can even improve on the performance of CHN.

Fig. 5. Boundaries and corner points of model images.

3.3. Reconstruction and decision
HHN provides good matching results between a model object and an input image. Active neurons have corresponding matched nodes. We decide from the information of the active neurons whether the two
objects are matched or not. Two strategies are chosen for the decision. In the first strategy, the matching rate (MR) is measured as follows:
MR = \frac{\text{Number of active neurons}}{\text{Number of columns}}.   (29)
MR indicates the extent of occlusion when all neurons are exactly matched. We set a threshold on MR: if the matching rate is less than this threshold, we consider the model to be heavily occluded or not matched with the input image. If the first strategy is passed, then the second strategy is applied. The corresponding node
numbers of active neurons are extracted and inserted in MATCH_ARR so that it has the information of matched nodes between a model object and an input image. The model object is located and reconstructed on the input image by translating and rotating it from the knowledge of MATCH_ARR. As the second strategy, a kind of template matching is performed as follows:
\sum_{x_{ik} \in \{g_m(x_{ik}) = 0\}} (g_m(x_{ik}) - g_i(x_{ik}))^2 < \xi,   (30)

where g_m(x_{ik}) and g_i(x_{ik}) are the gray levels of the model object and the input image, and x_{ik} is a pixel inside the model object after segmentation. If the summation of the squared error is less than the error bound ξ, then the two objects are considered matched. This procedure finally establishes whether the objects are matched or not.
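A sketch of the two-stage decision, assuming the model has already been translated and rotated onto the input image from the knowledge of MATCH_ARR; the threshold values and function names are ours.

import numpy as np

def decide_match(V, model_mask, scene_gray, model_gray,
                 mr_threshold=0.4, error_bound=1.0e4):
    """Two-stage decision of Section 3.3 (equations (29) and (30)).

    V is the (n, m) output of HHN; model_mask is True for pixels inside the
    reconstructed model object; scene_gray and model_gray are the registered
    gray-level images.
    """
    # First strategy: matching rate = active neurons / model nodes (columns).
    active = V > 0.5
    mr = active.sum() / float(V.shape[1])
    if mr < mr_threshold:
        return False                       # heavily occluded or wrong model
    # Second strategy: template matching over the model's interior pixels.
    err = np.sum((model_gray[model_mask].astype(float)
                  - scene_gray[model_mask].astype(float)) ** 2)
    return bool(err < error_bound)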
Fig. 6. Boundaries and corner points of occluded images.
4. EXPERIMENTAL RESULTS
We generated model and input images using a camera and a digitizer for the experiments. The performance of the proposed algorithm is tested and compared with that of other algorithms such as DHN. Twelve sets of model and occluded input images are obtained and used in the matching procedure to
understand the capability of the HHN to find objects in occluded images. The models consist of a set of tools and a couple of guns. Images are obtained by a camera and a digitizer (DT-2852). Several model objects and occluded images are shown in Fig. 4. Once the images are obtained, the boundary is extracted as an 8-neighbor Freeman chain code. The number of boundary points for the models ranges from 225 to 514. After extracting the boundaries of the images, corner points are detected by the constrained regularization approach.(23) Figure 5 shows the boundaries and corner points detected for the model images. The number of segments (nodes) in the models ranges from 4 to 15. From each segment, features are extracted: an angle as a local feature and the distance between nodes as a relational feature.
Fig. 7. Matched nodes in occluded images.
Fig. 8. Neuron states after convergence (initial states, output states of DHN, output states of HHN).
Occluded images are obtained through the same procedure as the models, as shown in Figs 4 and 6. For the experiments, 50 images which are combinations of the models are created. The number of boundary points in the occluded images ranges from 421 to 816, and the number of nodes from 6 to 26. The boundary segmentation algorithm is very reliable in the sense that it is not noise dependent, and thus it keeps detecting consistent corner points from the object in different scenes. However, some models in occluded images are occasionally oversegmented or lose
some corner points. These affect the matching procedure just as occluded parts do. A matching algorithm should be tolerant of false segmentation, which occurs in the preprocessing stage, as well as of occlusion. HHN shows good performance in this situation. Figure 7(f) shows the robustness of the algorithm under over-segmentation as well as occlusion: the number of segments of the model is 8, but the model has 14 segments in the occluded image, and the 8 nodes of the model are exactly matched with those of the occluded image. Experiments are also conducted for heavily occluded images.
Fig. 9. Reconstructed image: (a) model object; (b) reconstructed model.
When 67% of the nodes in a model are lost, the HHN still picks up the corresponding nodes exactly. To compare the performance of DHN and HHN, the matching score (MS) is computed as follows:

MS = 1 - \frac{\text{No. of mismatched nodes} + \text{No. of unmatched nodes}}{\text{No. of matchable nodes}}.   (31)

The average matching scores of DHN and HHN are 0.62 and 0.95, respectively. This means that HHN eliminates most of the nodes mismatched by DHN, and we conclude from the average matching scores that the performance of HHN is superior to that of DHN. Figure 8 shows the output states of the neurons in DHN and HHN; in HHN, the remaining active neurons are diagonally located after the output states of DHN are adjusted. Figure 7 shows the output plots of HHN, where "*" marks matched nodes between the models and the occluded images. The figure shows that the desired matching results are successfully obtained. After the matching procedures are performed, a decision is made based on the template matching technique. An exact reconstruction is shown in Fig. 9; the black area indicates the reconstructed model object in the input image. From Fig. 9, we can recognize that the model object is present in the input image.
5. CONCLUSION

Issues related to unique boundary representation and reliable matching have been discussed in this paper. The current methods for computing curvature on a digitized boundary share the difficulty of determining a unique smoothing factor. We solved this problem by applying a constrained regularization technique to the digitized boundary. By using the properties of circulant matrices we significantly reduced the computation time, since we avoided the inversion of a large matrix. From the smooth boundary, we obtained a unique curvature function with invariant properties; thus, we could detect corner points invariantly. We defined corner sharpness to compensate for the slight smoothing effect of the regularization as well as to mimic a human's behavior in detecting corner points.

Once solutions close to a global minimum are obtained by DHN, HHN can find the desired output by adjusting the states of the neuron outputs. If the output of DHN is trapped in a local minimum far from the global minimum, then HHN fails to find a desired solution. In the experiment, 7 out of 100 images fail to be matched; the models in these cases are heavily occluded in the images and the remaining nodes number only three to four among the many nodes of the images. In such cases, even human perception can hardly recognize the models in the occluded images.

In conclusion, HHN gives a reliable matching of the corresponding segments between two objects. The method eliminates the possibility of a part of an object being matched to similar segments in a different object by finally adjusting the states of the neurons. The template matching reconstructs a prospective model object in the input image and decides whether the input image contains the one corresponding to the model object.
under Grant No. 93-G-012, ARO under Grant No. DAAL03-900913, NASA-CORE under Grant No. NAGW-2924, and ARPA under Grant No. N00600-93-K-2051.
REFERENCES
1. B. Bhanu and O. D. Faugeras, Shape matching of two-dimensional objects, IEEE Trans. Pattern Anal. Mach. Intell. PAMI-6, 137-155 (1984).
2. W. K. Chow and J. K. Aggarwal, Computer analysis of planar curvilinear moving images, IEEE Trans. Comput. C-26, 179-185 (1977).
3. J. L. Turney, T. N. Mudge and R. A. Volz, Recognizing partially occluded parts, IEEE Trans. Pattern Anal. Mach. Intell. PAMI-7, 410-421 (1985).
4. R. C. Bolles and R. A. Cain, Recognizing and locating partially visible objects: The local-feature-focus method, Int. J. Robot. Res. 1, 57-82 (1982).
5. M. W. Koch and R. L. Kashyap, Using polygons to recognize and locate partially occluded objects, IEEE Trans. Pattern Anal. Mach. Intell. PAMI-9, 483-494 (1987).
6. J. H. Kim, S. H. Yoon and K. H. Sohn, Significant point detection and boundary representation with lines and circular arcs, The Fourth Int. Conf. on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems (IEA/AIE-91), 694-701 (June 1991).
7. K. E. Price, Matching closed contours, Proc. 7th Int. Conf. on Pattern Recognition, 990-992 (July-August 1984).
8. B. Bhanu and J. C. Ming, Recognition of occluded objects: A cluster-structure algorithm, Pattern Recognition 20(2), 199-211 (1987).
9. F. Attneave, Some informational aspects of visual perception, Psychol. Rev. 61(3), 183-193 (1954).
10. K. Sohn, J. H. Kim, S. H. Yoon, Y. Kim, E. H. Park, C. A. Ntuen and W. E. Alexander, Optimal boundary smoothing for curvature estimation, Proc. 25th Asilomar Conf. on Signals, Systems, and Computers, Pacific Grove, CA, 1220-1224 (4-6 November 1991).
11. W. S. Rutkowski, Recognition of occluded shapes using relaxation, Comput. Graphics Image Process. 19, 111-128 (1982).
12. B. Bhanu, Shape matching and image segmentation using stochastic labeling, Image Processing Institute, University of Southern California (August 1981).
13. R. P. Lippmann, An introduction to computing with neural networks, IEEE ASSP Magazine 4(2), 4-22 (1987).
14. M. Takeda and J. W. Goodman, Neural networks for computation: Number representations and programming complexity, Appl. Opt. 25(18) (1986).
15. J. J. Hopfield, Neurons with graded response have collective computational properties like those of two-state neurons, Proc. Natl. Acad. Sci. USA 81, 3088-3092 (May 1984).
16. J. J. Hopfield, Neural networks and physical systems with emergent collective computational abilities, Proc. Natl. Acad. Sci. USA 79, 2554-2558 (1982).
17. J. J. Hopfield and D. W. Tank, Computing with neural circuits: A model, Science 233, 625-633 (1986).
18. R. J. McEliece et al., The capacity of the Hopfield associative memory, IEEE Trans. Inf. Theory IT-33(4), 461-482 (1987).
19. D. W. Tank and J. J. Hopfield, Simple neural optimization networks: An A/D converter, signal decision circuit, and a linear programming circuit, IEEE Trans. Circuits Syst. 33(5), 533-541 (1986).
20. B. W. Lee and B. J. Sheu, Modified Hopfield neural networks for retrieving the optimal solution, IEEE Trans. Neural Networks 2(1), 137-142 (1991).
21. J. J. Hopfield and D. W. Tank, Neural computation of decisions in optimization problems, Biol. Cybernet. 52, 141-152 (1985).
22. W. Lin, F. Liao, C. Tsao and T. Lingutla, A hierarchical multiple-view approach to three-dimensional object recognition, IEEE Trans. Neural Networks 2, 84-92 (1991).
23. W. Li and M. Nasrabadi, Object recognition based on graph matching implemented by a Hopfield-style neural network, Proc. Int. Joint Conf. on Neural Networks, II, 287-290 (18-22 June 1989).
24. F. C. Groen and P. W. Verbeek, Freeman-code probabilities of object boundary quantized contours, Comput. Graphics Image Process. 7, 391-402 (1978).
25. F. Mokhtarian and A. Mackworth, Scale-based description and recognition of planar curves and two-dimensional shapes, IEEE Trans. Pattern Anal. Mach. Intell. PAMI-8(1), 34-43 (1986).
26. T. Pavlidis, Algorithms for Graphics and Image Processing. Computer Science Press, Murray Hill, NJ (1981).
27. K. Sohn, W. Alexander, J. Kim and W. Snyder, A constrained regularization approach to robust corner point detection, IEEE Trans. Syst. Man Cybernet. SMC-24(5), 820-828 (1994).
About the Author--JUNG HYOUN KIM received the B.S. degree in Electronics Engineering from Yonsei University, Seoul, Korea in 1974, the M.S. degree in Electrical Engineering and the Ph.D. degree in Electrical and Computer Engineering from North Carolina State University, Raleigh, in 1982 and 1985, respectively. He was employed as an Engineer at Samsung Electronic Ltd, Korea, from 1973 to 1976. He worked as a Team Leader of an R&D Group at Gold Star Precision Central Research Laboratories, Korea, from 1977 to 1980. Currently, he is Professor of Electrical Engineering at North Carolina A&T State University, Greensboro, NC. He has published 70 papers in refereed journals, and national and international conference proceedings. His research interests include image processing, computer vision, computational algorithms, and neural networks. Dr Kim is a member of IEEE, ACM, INNS and Sigma Xi.
About the Author-- SUNG HO YOON received the B.S. degree in Electrical Engineering from Seoul National University, Seoul, Korea, in 1984 and the M.S.E.E. degree in Electrical Engineering from North Carolina A&T State University in 1992. Currently, he is pursuing the doctoral degree at North Carolina State University, Raleigh. Mr Yoon is a member of IEEE, Signal Processing Society and Korean Scientists and Engineers Association. His research interests include digital signal processing, image processing, machine vision, communication, and wavelet transform.
About the Author--KWANGHOON SOHN received the B.S. degree in Electrical Engineering from Yonsei University, Seoul, Korea in 1983, the M.S.E.E. degree in Electrical Engineering from the University of Minnesota in 1985, and the Ph.D. degree in Electrical and Computer Engineering from North Carolina State University in 1992. He was employed as a senior member of the research staff of the Satellite Communications Division at Electronics and Telecommunications Research Institute, Daeduk Science Town, Korea from 1992 to 1993. Also, he was employed as a postdoctoral fellow at the Magnetic Resonance Imaging Center in the Medical School of Georgetown University. Currently, he is an Assistant Professor of Communication Engineering at Yonsei University. His research interests include pattern recognition, computer vision, image processing, and neural networks. Dr Sohn is a member of IEEE, Korean Institute of Communications Science, and Korean Institute of Telematics and Electronics.