Recognition of occluded objects with heuristic search


0031-3203/90 $3.00 + .00 Pergamon Press plc. © 1990 Pattern Recognition Society. Pattern Recognition, Vol. 23, No. 6, pp. 617-635, 1990. Printed in Great Britain.


RECOGNITION OF OCCLUDED OBJECTS WITH HEURISTIC SEARCH*

S. CHAUDHURY, A. ACHARYYA and S. SUBRAMANIAN†
Department of Computer Science and Engineering, I.I.T., Kharagpur, PIN 721302, India

and

GUTURU PARTHASARATHY
Department of Electronics and Communication Engineering, I.I.T. Kharagpur, India

(Received 10 March 1989; in revised form 21 June 1989; received for publication 27 September 1989)

Abstract. This paper presents a new heuristic search based approach for the recognition of partially obscured planar shapes. Based on a general scheme for representing planar shapes in terms of their contour segments, a state space formulation is obtained for the recognition problem. The search in the state space is guided by an admissible heuristic function which does not depend upon the features actually used for representing the shapes. Some schemes for improving the efficiency of the method are also discussed. The method was studied by experimenting with some typical objects, and the results of the experiments are presented.

Occlusion; Planar shape recognition; Pruning strategy; State space; Heuristic search; Aε* algorithm

* Part of this work is sponsored by the Department of HRD, Government of India, through the project on Knowledge Based Systems.
† To whom correspondence should be addressed.

1. INTRODUCTION

Object recognition is a widely studied problem in computer vision.(1,2) The complexity of the problem depends upon the ways in which objects are configured in the scene. Occlusions are inevitable when objects are allowed to occur in random orientations and positions. When objects are only partially visible, the information available for recognition is limited and objects have to be identified despite the missing information. Consequently, specialised recognition strategies are needed to tackle the problem of partial shape recognition. This paper addresses precisely this issue in the context of planar shapes. Planar shapes are considered because the associated problem is less complex but still widely applicable, since many industrial objects can be considered planar owing to their small thickness.

Object recognition systems generally use precompiled descriptions of the model objects that can possibly occur in a scene. This is known as model based vision. 2-D object models are built around either global or local features. Global feature based methods are of little use for partial shape recognition because of the difficulty of correlating global descriptors computed from a part of the object with those of the entire object. Local features can be further classified into (i) closely localised and (ii) extended features. Closely localised features, such as corners and holes, can be extracted from small image windows. Extended features, on the other hand, are extracted from the shape boundaries. They are, in general, contour segments of significant extent. These features are more informative, less ambiguous and more precisely located. Closely localised features are mostly used in conjunction with the spatial relations between them in what are called relational model based methods. These methods, however, require a large number of features to be detected and grouped together for object recognition. Extended feature based methods require fewer features for characterising the shapes, and consequently the algorithms are less complex.

One of the earliest object recognition systems capable of recognising partial planar shapes was developed by Perkins.(3) He used extended boundary segments called concurves for representing the shapes and obtained tentative matches between model and image concurves. Based on these matches, transformations from model to image coordinates are computed. A global check of the tentative transformation is performed by matching the complete model with the image. Turney et al.(4) used a subtemplate matching scheme in generalised Hough space. The method uses salient subtemplates of the object boundaries as features. The model subtemplates are matched with segments of the image contours using a least-squares fit in θ-s space. Depending on the position of the match, an accumulator cell in Hough space is incremented. The amount of the increment is related to the quality of the match as well as the saliency of the subtemplate. The accumulator cell having the largest value is taken as the centroid of the matched object. The method sequentially searches for all known models in the scene to find the best match. Knoll and Jain,(5) in their feature-indexed hypothesis method, consider boundary features which are multiply present and well distributed over the set of objects as reliable in the event of occlusions. The recognition procedure involves verification of hypotheses formed on the basis of the best matches of the appropriately positioned model segments with the segments of the image boundary. For all the above methods, once a match is found for a model feature in the image, the corresponding object hypotheses are considered for further analysis. Consequently, the performance of these systems depends critically upon the nature of the features and the complexity of the feature detection algorithms.

Instead of specialised features, many recognition methods rely on easily detectable features and consider all possible matches before making recognition decisions. Stockman et al.(6) proposed a method of object detection where local evidence is accumulated by pose clustering. For planar partial shape recognition, Bhanu and Ming(7) used clustering and subsequent verification based on the sequences of matched sides of the polygons approximating the object boundaries. These systems need to untangle a large number of spurious matches, and consequently the recognition techniques are elaborate and complex.

Another class of methods adopts a different policy. These methods neither consider all possible matches nor commit to a hypothesis immediately on detection of a match of a model feature. They consider promising matches of the model features and look for more features of the objects in the image before coming to a conclusion.
Ayache and Faugeras(8) propose a method named "HYPER" for recognition and positioning of 2D objects. It involves generation and recursive evaluation of hypotheses. The method uses polygonal approximations of the contours of the model and scene objects. Rives et al.(9) employ a Hough transform based technique using polygonal approximations of the object contours. Rummel and Beutel(10) suggest a heuristic search based method. The features used are corners, circles and straight line segments. The search is guided by a generative model of the objects produced interactively. Wallace(11) used another heuristic search based method using local features and the spatial relations between them.

There is yet another class of methods that use strings of primitives for representing the shapes and apply string matching techniques for shape recognition. Tsai and Yu(12) used attributed string matching with merging for recognition of distorted shapes. Mehrotra and Grosky,(13) adopting a similar approach, incorporated in their system a data-driven indexing mechanism for model retrieval. The method works for both occluded and non-occluded shapes. Gorman et al.(14) present a dynamic programming based shape recognition technique using local features described by Fourier descriptors.

Most of the recognition schemes discussed above tend to depend on the choice of features which, in turn, depends upon the set of object models used. The heuristics employed to speed up the system are also entwined with the features chosen. Hence, there is a need to evolve a general problem solving paradigm to meet various problem situations and application demands.

In this paper a heuristic search based recognition scheme is proposed. The search is carried out in an appropriately designed state space with a formal specification of transition rules and goal states. This formulation exploits general characteristics of the highly structured organisation of extended feature based shape representations. Consequently, the method can be applied to any class of object models irrespective of the type of extended features used for representing the shapes. An admissible heuristic function proposed for guiding the search guarantees an optimal solution. The admissibility of the function does not depend upon the features actually used. It is also possible to improve the efficiency of the method by different search space pruning strategies which are not directly related to the object models involved. These are the distinguishing and advantageous aspects of the work in contrast to earlier heuristic search based methods.(10,11) The dissimilarity between the shapes is defined in terms of the transformations normally used in string matching operations. Unlike the conventional string matching schemes, however, the matching procedure here accommodates possible rotations and occlusions. Also, instead of trying all possible matches, the best match is sought dynamically through the search process. These features make the current scheme more suitable for various applications than the previous works based on string matching.(12-14)

A study of this method was carried out by experimenting with some typical industrial objects. Even though the present method can be used with any kind of extended features, we have considered the sides of the polygons approximating the object contours as features because of the universal applicability of the polygonal representation. A new operation for merging sides during matching is proposed in this connection. This operation helps to make the matching process robust enough to take care of errors in the polygonal approximation and distortions in object contours due to noise. Experimental results show the effectiveness and utility of the recognition technique. In the next section, we introduce a general representational framework for planar shapes.

2. SHAPE REPRESENTATION

In general, 2D objects are characterised by the shapes of their bounding contours. If these closed boundaries can be appropriately segmented into primitives, these primitives and their contextual relations form a logical representation of the shapes. The description of the shapes in terms of the primitives and their contexts can be generated using a structure which we call an ARC (Attributed Relational Cycle). ARCs are a special class of attributed relational graphs, which have been found very useful for computer vision problems.(15) In an ARC, each node has only one incoming and one outgoing arc. For a given problem domain, a description of the shapes is usually formed on the basis of a known set of primitive types and/or their contextual relations. The set of primitive types and the possible contextual relations between them represent a particular modelling strategy. Using different sets of primitive types, different descriptions can be constructed for the same shape. Hence, any description of a shape should be considered with reference to the universe $U$ of primitives. We may represent $U$ as $(A, B)$, where $A$ is the set of descriptions of the primitives and $B$ is the set of descriptions of the contextual relations between the primitives. More precisely, $A$ is defined as

$$A = \{(s_i, \mathbf{x}) \mid i = 1 \text{ to } M \text{ and } \mathbf{x} \in R^{d}\}.$$

Each element of $A$ is a tuple consisting of a syntactic symbol $s_i$ and a $d$-dimensional attribute vector $\mathbf{x}$. The syntactic symbol $s_i$ denotes a particular primitive structure. The corresponding attribute vector holds the relevant geometrical measurements. Values of the components of the attribute vector may change for different instances of structurally identical primitives (i.e. with the same $s_i$). The set $A$ is basically the collection of descriptions of all such instances of the different primitive types. Thus, even though the number of syntactic symbols is finite, the set $A$ is potentially infinite as the set of possible values of $\mathbf{x}$ can be infinite. Similarly, $B$ is defined as

$$B = \{(c_i, \mathbf{y}) \mid i = 1 \text{ to } K \text{ and } \mathbf{y} \in R^{d'}\}.$$

With reference to a universe $U$ of primitives, an ARC representation of a shape ($H_U$) is defined as the tuple

$$H_U = (V, E, G_V, G_E).$$

Here, $V = \{v_0, v_1, v_2, \ldots, v_m\}$ is a finite set of nodes representing contour segments. The nodes are enumerated according to the order in which they are encountered while traversing the contour, starting from an arbitrary segment, in a particular (clockwise or anticlockwise) sense. $E = \{e_1, e_2, \ldots, e_m\}$ is a set of edges or ordered node pairs such that $e_i = (v_i, v_{(i+1) \bmod m})$ denotes a branch emanating from $v_i$ and terminating on $v_{(i+1) \bmod m}$. These edges represent contextual relations between the contour segments. $G_V$ and $G_E$ are the functions which associate the elements of $V$ and $E$ with the elements of $U$. With reference to the sets $A$ and $B$, $G_V$ and $G_E$ are defined as

$$G_V : V \to A$$

and

$$G_E : E \to B.$$

Hence, shapes are uniquely characterised by the particular way of mapping contour segments and their associated relations to the elements of $A$ and $B$, as indicated by the functions $G_V$ and $G_E$. The functions $G_V$ and $G_E$ are constructed through the process of feature detection. It is easy to see that a contour segment and its relation with its successor can be considered together as a composite entity incorporating the properties of the corresponding node and arc. Accordingly, an ARC $H_U$ can be transformed into a simpler representation. To perform the transformation, the universe of the primitives should be appropriately modified by coalescing the sets $A$ and $B$. The new universe is $U_N = (L)$, where $L = \{l \mid l$ is constructed from $(a, b)$, $a \in A$, $b \in B$ and $(a, b)$ is structurally compatible$\}$. Each element $l$ is a tuple $(f_i, \mathbf{z})$, where $f_i$ is the syntactic symbol and $\mathbf{z}$ is formed by augmenting the attribute vectors of the corresponding elements of $A$ and $B$. Now, in order to represent the node-arc combinations in a shape, we may construct a set $N$ such that $N = \{n'_i \mid i = 1$ to $m$ and $n'_i$ represents the node-arc combination $(v_i, e_i)\}$. It is required to define a function $F : N \to L$ which maps each element $n'_i$ of $N$, representing the combination $(v_i, e_i)$, to a label $l \in L$ constructed from $(a, b)$ such that $G_V(v_i) = a$ and $G_E(e_i) = b$. Using the set $N$ and the function $F$, an attributed set $N_g$ of node-arc composites can be defined in the following way:

$$N_g = \{n_i = (n'_i, F(n'_i)) \mid i = 1 \text{ to } m\}.$$
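As a rough illustration of the coalescing step, the following Python sketch builds the composite labels of $N_g$ from per-node and per-edge labels. The names `Label` and `build_composites`, the way the two syntactic symbols are joined, and the example attribute values are all assumptions made for this sketch; the paper does not prescribe a data layout.

```python
from typing import List, Tuple

# A label is (syntactic symbol, attribute vector); this layout is an assumption.
Label = Tuple[str, Tuple[float, ...]]


def build_composites(node_labels: List[Label], edge_labels: List[Label]) -> List[Label]:
    """Coalesce G_V(v_i) and G_E(e_i) into one label per node-arc pair (v_i, e_i)."""
    if len(node_labels) != len(edge_labels):
        raise ValueError("an ARC has exactly one outgoing arc per node")
    composites = []
    for (node_sym, node_attr), (edge_sym, edge_attr) in zip(node_labels, edge_labels):
        symbol = f"{node_sym}/{edge_sym}"      # combined syntactic symbol f_i
        attributes = node_attr + edge_attr     # augmented attribute vector z
        composites.append((symbol, attributes))
    return composites


# Hypothetical contour: two straight sides and two curves joined at right angles.
nodes = [("L", (2.0,)), ("c", (0.5, 0.7)), ("c", (0.3, 0.9)), ("L", (2.0,))]
edges = [("Ang", (90.0,))] * 4
n_g = build_composites(nodes, edges)
```

The list order stands in for the cyclic traversal order of the contour, so the element following the last one is the first.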

Another function $\tau : N_g \to N_g$, defining a one-to-one mapping of the elements of $N_g$ onto itself such that $n_i \to n_{(i+1) \bmod m}$, is required to represent the structural connectivity between node-arc pairs. Therefore, an ARC representation finally reduces to $(N_g, \tau)$. We term this tuple a shape structure. The examples in Figs 1 and 2 illustrate the scheme.

Fig. 1. ARC and SHAPE STRUCTURE of a shape I. [Figure not reproduced. Recoverable labels for model object I: $V = \{v_1, v_2, v_3, v_4\}$, $E = \{e_1, e_2, e_3, e_4\}$, $B = \{(\mathrm{Ang}, [90^\circ])\}$, $G_E : e_i \to (\mathrm{Ang}, [90^\circ])$ for $i = 1$ to 4, and $N_g = \{a, b, c, d\}$.]

Fig. 2. ARC and SHAPE STRUCTURE of another shape J. [Figure not reproduced. Recoverable labels for model object J: $N_g = \{e, f, g\}$.]

Shape structures form an adequate representation of 2-D contours. In fact, for most structural methods of object recognition, contours are represented as an ordered list of primitives,(1) and these ordered lists are essentially domain dependent realisations of the shape structures. The definition of the universe of primitives $U$, however, varies between approaches depending upon the actual choice of features. In this paper, using the basic representational framework of shape structures, we have developed a recognition strategy which does not depend on the way in which $U$ is specified for a particular problem domain.

The basic objective is to recognise a partially obscured shape which is more clearly visible than any other known object present in the scene. Hence, the recognition process involves a search for the model shape which most closely resembles the test shape. A measure of dissimilarity is used for comparing the shapes. The dissimilarity between the test and a reference shape structure is computed as the minimum total cost of a sequence of transformations required to produce the reference shape structure from the test shape. One shape structure can be modified into another by inserting, deleting or substituting elements of the corresponding $N_g$. The costs of the individual operations depend on the components of the attribute vectors of the elements of $N_g$. The problem of finding the reference shape having minimum distance from the given test input is modelled as the problem of finding a minimum cost path to the goal in a state space graph. The state space graph is described in the next section.
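To make the notion of a minimum-cost transformation sequence concrete, here is a minimal sketch of the classical weighted edit distance computed by dynamic programming. It is a deliberate simplification and not the method of this paper: it treats both shapes as linear strings with fixed starting points, so it handles neither rotations nor occlusions. The function name and the cost callables `sub`, `ins` and `dele` are hypothetical.

```python
from typing import Callable, Sequence, TypeVar

P = TypeVar("P")


def transformation_cost(test: Sequence[P], ref: Sequence[P],
                        sub: Callable[[P, P], float],
                        ins: Callable[[P], float],
                        dele: Callable[[P], float]) -> float:
    """Minimum total cost of turning `test` into `ref` by insert/delete/substitute."""
    rows, cols = len(test) + 1, len(ref) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):                     # delete every test primitive
        dist[i][0] = dist[i - 1][0] + dele(test[i - 1])
    for j in range(1, cols):                     # insert every reference primitive
        dist[0][j] = dist[0][j - 1] + ins(ref[j - 1])
    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j - 1] + sub(test[i - 1], ref[j - 1]),
                dist[i - 1][j] + dele(test[i - 1]),
                dist[i][j - 1] + ins(ref[j - 1]),
            )
    return dist[-1][-1]


# Hypothetical primitives: polygon side lengths, with costs based on length.
cost = transformation_cost(
    [3.0, 4.0, 5.0], [3.0, 5.0],
    sub=lambda a, b: abs(a - b), ins=lambda b: b, dele=lambda a: a,
)
```

In this toy case the cheapest sequence keeps the sides of length 3 and 5 and deletes the side of length 4.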

3. STATE SPACE FORMULATION

Portions of the object contours can be logically represented by arrangements of the elements of $N_g$ as defined by the function $\tau$. For example, the string $n_i \,\|\, \tau(n_i) \,\|\, \tau^2(n_i) \,\|\, \tau^3(n_i)$ represents that portion of the contour which starts from the primitive $n_i$ and spans three of its successors ($\|$ is the concatenation operator). We define such strings as subshapes. A state $S_i$ in our state space graph is a tuple $(C_i^I, C_i^J)$, where $C_i^I$ and $C_i^J$ are the subshapes corresponding to shape structures $I$ and $J$. In addition to elements of $N_g^I$ or $N_g^J$, $C_i^I$ and $C_i^J$ may also contain null elements. The state space graph consists of a set of labelled states $S = \{S_0, S_1, \ldots, S_w\}$ and a set of directed labelled branches $R$ of the form $R = \{(S_i, S_j) \mid S_i, S_j \in S\}$, where $(S_i, S_j)$ indicates a branch originating from $S_i$ and terminating on $S_j$. Each branch is labelled with $c(S_i, S_j)$, i.e. the cost of transition from $S_i$ to $S_j$. State transitions represent individual steps in the transformation of one shape structure into another. Consequently, the successors of a state $S_i$ can be partitioned according to the transformation operations by which they are obtained, namely insertion ($S_{i_{\mathrm{ins}}}$), deletion ($S_{i_{\mathrm{del}}}$) and substitution ($S_{i_{\mathrm{sub}}}$). The state $S_0$ is a special state indicating an empty state $(\phi, \phi)$. The search for the minimum cost path begins from the empty state $S_0$. Through state transitions both the shapes $I$ and $J$ are reconstructed in parallel, and the cost of the path to a state denotes a measure of similarity between the partial shapes reconstructed so far.

The transition from a state $S_i$, representing the partially reconstructed shape pair $(C_i^I, C_i^J)$, to another state $S_j$ can be achieved in two possible ways:

(i) Extend $C_i^I$ by augmenting it with the element of $N_g^I$ succeeding the last non-null element of $C_i^I$. Extend $C_i^J$ similarly. The cost of transition from $S_i$ to $S_j$ in this case is the cost of substitution of the newly added element of $C_j^I$ by the corresponding element of $C_j^J$.

(ii) Extend one of the shapes (say $C_i^I$) as above to obtain $C_j^I$. Extend the other with the null element $\lambda$. The cost of transition from $S_i$ to $S_j$ in this case logically represents the cost of deletion of the newly added primitive from the $I$th shape. Alternatively, it can be considered as the cost of insertion of the new primitive in the $J$th shape.

It can be easily observed that the use of the null element helps in establishing a one-to-one correspondence between the elements of the subshapes $C_i^I$ and $C_i^J$ of a state $S_i$. The precise specification of the transition rules is as follows:

(i) For the initial state $S_0 = (\phi, \phi)$, i.e. both the constituent strings are null: choose the first element $n_1^T$ of $N_g^T$ (i.e. $N_g$ of the test shape) and generate the following states, $S_{\mathrm{init}} = S_{0_{\mathrm{ins}}} \cup S_{0_{\mathrm{sub}}}$. Here

$$S_{0_{\mathrm{ins}}} = \{(n_1^T, C^J) \mid C^J = \lambda \text{ and } J = 1 \text{ to } N_1\},$$

where $N_1$ is the total number of reference shapes known a priori. Thus, the elements of $S_{0_{\mathrm{ins}}}$ logically represent the process of insertion of $n_1^T$ in the $N_g$'s of all the reference shapes.

$$S_{0_{\mathrm{sub}}} = \{(n_1^T, n_i^J) \mid i = 1 \text{ to } |N_g^J| \text{ and } J = 1 \text{ to } N_1\}.$$

The elements of $S_{0_{\mathrm{sub}}}$ represent substitutive transformations for all elements of $N_g^J$, $J = 1$ to $N_1$, with $n_1^T$.

(ii) For any intermediate state $S_i$ where $C_i^J$ is a string of $\lambda$s, i.e. only insertion of elements of $N_g^T$ has taken place: let $n_k^T$ be the last element embedded in $C_i^T$. Then $S_{i_{\mathrm{succ}}} = S_{i_{\mathrm{ins}}} \cup S_{i_{\mathrm{sub}}}$, where

$$S_{i_{\mathrm{ins}}} = \{(C_i^T \,\|\, \tau(n_k^T),\; C_i^J \,\|\, \lambda)\}$$

and

$$S_{i_{\mathrm{sub}}} = \{(C_i^T \,\|\, \tau(n_k^T),\; C_i^J \,\|\, n_u^J) \mid u = 1 \text{ to } |N_g^J|\}.$$

(iii) For any intermediate state $S_i$ where $C_i^T$ and $C_i^J$ are both non-null strings: let $n_k^T$ and $n_l^J$ be the last non-$\lambda$ elements embedded in $C_i^T$ and $C_i^J$ respectively. Then $S_{i_{\mathrm{succ}}} = S_{i_{\mathrm{ins}}} \cup S_{i_{\mathrm{del}}} \cup S_{i_{\mathrm{sub}}}$, where

$$S_{i_{\mathrm{ins}}} = \{(C_i^T \,\|\, \tau(n_k^T),\; C_i^J \,\|\, \lambda)\},$$
$$S_{i_{\mathrm{del}}} = \{(C_i^T \,\|\, \lambda,\; C_i^J \,\|\, \tau(n_l^J))\},$$
$$S_{i_{\mathrm{sub}}} = \{(C_i^T \,\|\, \tau(n_k^T),\; C_i^J \,\|\, \tau(n_l^J))\}.$$

(iv) For any intermediate state $S_i$ where all $n_u^J \in N_g^J$ are in $C_i^J$ but not all $n_k^T \in N_g^T$ are in $C_i^T$:

$$S_{i_{\mathrm{succ}}} = \{(C_i^T \,\|\, \tau(n_k^T),\; C_i^J \,\|\, \lambda)\},$$

where $n_k^T$ is the last non-$\lambda$ element embedded in $C_i^T$.

(v) For any intermediate state $S_i$ where all $n_k^T \in N_g^T$ are in $C_i^T$ but not all $n_u^J \in N_g^J$ are in $C_i^J$:

$$S_{i_{\mathrm{succ}}} = \{(C_i^T \,\|\, \lambda,\; C_i^J \,\|\, \tau(n_l^J))\},$$

where $n_l^J$ is the last non-$\lambda$ element embedded in $C_i^J$.

The transition rules define all possible sequences of transformations applicable between shape structures. We illustrate the transition rules in Fig. 3. The description of a state $S_j$ stores the complete sequence in which elements of the $N_g$'s have been transformed for arriving at $S_j$. A state $S_g = (C_g^I, C_g^J)$ is recognised as a goal state if all $n_i^I \in N_g^I$ and all $n_j^J \in N_g^J$ are in $C_g^I$ and $C_g^J$ respectively. Therefore, there exist goal states corresponding to all possible paths in the state space. The cost of each path is calculated as the sum of the individual transformation costs along it. The minimum cost path to a goal state determines the nearest neighbour of the unknown shape as well as its orientation. The description of the corresponding goal node uniquely specifies the correspondence between the elements of the test and reference shape structures.
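The transition rules above can be sketched in Python for one test shape and a single reference shape. This is an illustrative reading of rules (i) to (v), not the authors' implementation: primitives are represented by their indices in $N_g$, `None` stands for $\lambda$, and the names `State` and `successors` are hypothetical. The rules leave one case open (test shape exhausted before any reference element has been used); the sketch starts the reference walk at index 0 there, which is purely an assumption.

```python
from typing import List, Optional, Tuple

Sub = Tuple[Optional[int], ...]   # subshape: indices into N_g, None is the null element
State = Tuple[Sub, Sub]           # (C^T, C^J)


def successors(state: State, m_test: int, m_ref: int) -> List[State]:
    """Successor states of `state` for shapes with m_test and m_ref primitives."""
    ct, cj = state
    used_t = [x for x in ct if x is not None]
    used_j = [x for x in cj if x is not None]
    t_done = len(used_t) == m_test
    j_done = len(used_j) == m_ref
    if t_done and j_done:
        return []                                      # goal state: nothing to expand
    # tau on the test shape; the walk starts at the first test element
    next_t = 0 if not used_t else (used_t[-1] + 1) % m_test
    if t_done:
        # Rule (v). Starting at reference index 0 when no reference element has
        # been used yet is an assumption of this sketch.
        next_j = 0 if not used_j else (used_j[-1] + 1) % m_ref
        return [(ct + (None,), cj + (next_j,))]
    insertion = (ct + (next_t,), cj + (None,))
    if j_done:
        return [insertion]                             # rule (iv)
    if not used_j:
        # Rules (i) and (ii): the reference shape may start at any of its elements.
        return [insertion] + [(ct + (next_t,), cj + (u,)) for u in range(m_ref)]
    next_j = (used_j[-1] + 1) % m_ref                  # tau on the reference shape
    deletion = (ct + (None,), cj + (next_j,))
    substitution = (ct + (next_t,), cj + (next_j,))
    return [insertion, deletion, substitution]         # rule (iii)


initial = successors(((), ()), 3, 3)
```

Because every reference element is a possible starting point in rules (i) and (ii), rotations of the reference shape are covered by the branching at the top of the tree rather than by a separate alignment step.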

Fig. 3. Illustration of state transition rules. [Figure not reproduced. Recoverable labels: test shape with $N_g^T = \{i, j, k\}$; reference shapes with $N_g^I = \{a, b, c, d\}$ (cf. Fig. 1) and $N_g^J = \{e, f, g\}$ (cf. Fig. 2); panels (a) Rule (I), (b) Rule (II), (c) Rule (III).] (For a hypothetical object with $N_g^J = \{e, f\}$, the second transition in (c) illustrates rule (V). Similarly, for a test object with $N_g^T = \{i, j\}$, the third transition in (c) illustrates rule (IV).)

4. STATE SPACE SEARCH ALGORITHMS

The state space formulation of the shape matching problem discussed in the previous section can be explored efficiently by using heuristic knowledge for guiding the search in the state space. The additive nature of the transformation costs suggests that the algorithm A*(16) can be employed here for a guaranteed optimal solution, provided an appropriate admissible heuristic function can be designed. A* distinguishes between nodes that have been expanded and nodes that have been generated but not yet expanded. Two separate lists, CLOSED and OPEN, are used to keep track of the nodes of these categories. Storage of these two lists, in general, imposes memory constraints. However, in this case, since the state description conveys all the information regarding the transformation procedure, we need not store the actual path to each node. In addition, the state space is a tree. Therefore we can do away with the list CLOSED and thereby obtain substantial savings in space.

It has been found that A* tends to spend a large amount of time discriminating among paths whose costs do not vary significantly. For the given problem domain this issue is of considerable importance, because situations are not uncommon where promising states, with subshapes formed by concatenation of a number of contour segments, have path costs nearly equal to those of states representing smaller but dissimilar partial shapes. The algorithm spends time unnecessarily exploring states of the second category. This problem can be at least partially circumvented if the set of nodes with roughly equal $f$ values is re-evaluated using a more direct discriminant function $E(\cdot)$, and the node for which this function has the maximum value is chosen for expansion. If we always consider nodes having $f$ values lying within an ε-bound of the current minimum $f$, then the faster solution thus obtained may deviate from the correct minimum solution by a value bounded by ε.(17) This algorithm is known in the AI literature as Aε*.(17) The parameter ε can represent the discriminating resolution for the recognition task, in the sense that we need not distinguish between solutions lying within the ε-bound. A suboptimal solution lying within the ε-bound may represent, instead of the correct object, one of the other plausible candidate objects whose visible portions closely resemble the correct one. The suboptimal solution can also correspond to one of the other objects which are nearly as visible as the correct one and are actually present in the scene. Also, the suboptimal solution may be a good approximation of the correct transformational sequence for the correct object. Therefore, although with Aε* we may not always get an optimal solution, we are guaranteed to obtain at least a valid and reasonable solution in less time.

Sometimes the images of the objects may be degraded to such an extent that reliable recognition becomes impossible. To take this possibility into account, and to reduce the complexity of the search by eliminating deviant nodes, we define a parameter $D$. $D$ is an upper bound on $f$ specified by the user. If the $f$ value of any node exceeds $D$, we do not add it to OPEN. Therefore, we consider all recognition possibilities that incur a total cost greater than $D$ as unreliable. To reduce the search space further, at the beginning of the matching procedure we can use depth first search to find an estimate $D'$ of the recognition cost. Since every path leads to a goal state, the estimate can be calculated along any arbitrary path. If the estimate $D'$ is less than $D$, then $D'$ can be used in place of $D$. Finally, it may be mentioned that although zero arc costs are possible in this formulation, the algorithm is guaranteed to terminate because each path terminates in a goal and the search tree consists of a finite number of nodes.

4.1. An admissible heuristic function

In this sub-section an admissible heuristic function is proposed to guide the search. The heuristic has evolved from a simplified and relaxed model of the unexplored state space. For computing the heuristic function $h(n)$ of the node $n = (C_i^T, C_i^J)$, attention is focussed on the subtree of the state space which is rooted at node $n$. The easiest way to estimate the cost of the optimal path to the goal for that subtree is to find the number of insertion and deletion operations which are unavoidable to reach the goal, and to make an assessment of the total cost involved. Accordingly, a function can be defined in the following way:

$$h(n) = \mathrm{abs}\big((|N_g^T| - |C_i^T|) - (|N_g^J| - |C_i^J|)\big) * \min(i', d')$$

where $|C_i^T|$ gives the number of non-$\lambda$ elements in the string $C_i^T$; $i'$ is a lower bound on the insertion cost and $d'$ is a lower bound on the deletion cost. In computing the above function it has been implicitly assumed that, for the remaining elements of $N_g^T$ or $N_g^J$, substitutions take place on the optimal path at every possible instance, and with zero cost. This assumption is restrictive, and consequently the discriminating power of $h(n)$ turns out to be limited. A more powerful heuristic function can be computed by considering the best possible matching configuration of the next unmatched element of $N_g^T$, i.e. $\tau(n_k^T)$, where $n_k^T$ is the last non-$\lambda$ element concatenated to $C_i^T$. For computing this function, the element $\tau(n_k^T)$ is slid along the remaining segments of the reference shape. The total cost that would be incurred if the corresponding element of the reference shape is substituted by $\tau(n_k^T)$ is computed for all remaining elements of $N_g^J$, assuming that all other possible substitutions occur and that these substitutions incur zero cost. The minimum of these costs gives an underestimate of the cost of the optimal path, provided that $\tau(n_k^T)$ is actually substituted on the optimal path. If $n_l^J$ is the last non-$\lambda$ element embedded in $C_i^J$, the function can be defined as follows:

$$V(n) = \min_j \Big[ C_s\big(\tau(n_k^T), \tau^j(n_l^J)\big) + (j - 1) * i' + \big|(|N_g^T| - |C_i^T| - 1) - (|N_g^J| - |C_i^J| - j)\big| * \min(i', d') \Big].$$
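The two estimates defined above translate fairly directly into code. The sketch below is illustrative only: the function names, the argument layout and the example costs are assumptions. Here `rem_test` stands for $|N_g^T| - |C_i^T|$, and `remaining_ref` lists the unused reference elements in cyclic order, i.e. $\tau^1(n_l^J), \tau^2(n_l^J), \ldots$, so that its length equals $|N_g^J| - |C_i^J|$.

```python
from typing import Callable, Sequence, TypeVar

P = TypeVar("P")


def count_bound(rem_test: int, rem_ref: int, i_min: float, d_min: float) -> float:
    """First estimate: unavoidable insertions/deletions charged at the cheapest rate."""
    return abs(rem_test - rem_ref) * min(i_min, d_min)


def sliding_bound(next_test: P, remaining_ref: Sequence[P],
                  sub_cost: Callable[[P, P], float],
                  rem_test: int, i_min: float, d_min: float) -> float:
    """V(n): slide the next test element along the remaining reference elements."""
    rem_ref = len(remaining_ref)
    best = float("inf")            # stays infinite if no reference element remains
    for j, ref_elem in enumerate(remaining_ref, start=1):
        cost = (sub_cost(next_test, ref_elem)          # actual substitution cost C_s
                + (j - 1) * i_min                      # skipping j - 1 reference elements
                + abs((rem_test - 1) - (rem_ref - j)) * min(i_min, d_min))
        best = min(best, cost)
    return best


# Hypothetical primitives: side lengths compared by absolute difference.
v = sliding_bound(4.0, [1.0, 4.0, 9.0], lambda a, b: abs(a - b),
                  rem_test=2, i_min=1.0, d_min=1.0)
h1 = count_bound(2, 3, 1.0, 1.0)
```

In this toy case the best alignment skips one reference element and substitutes the test element for the identical second one, so both estimates happen to coincide.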

The function $V(n)$ gives a better estimate than the first function because in this case the problem constraints are less relaxed, in the sense that the actual substitutive cost between $\tau(n_k^T)$ and $\tau^j(n_l^J)$ is taken into account along with the cost incurred for skipping $j - 1$ elements of $N_g^J$. In other words, we now assume that only for the remaining $[|N_g^T| - |C_i^T| - 1]$ and $[|N_g^J| - |C_i^J| - j]$ elements of the test and reference shapes do all possible substitutions occur with zero cost. This estimate, however, may not always be an underestimate. If the element $\tau(n_k^T)$ is actually deleted, the function may err. So another quantity $D_{\min}$ is computed:

$$D_{\min}(n) = \Big[ C_d\big(\tau(n_k^T)\big) + \big|(|N_g^T| - |C_i^T| - 1) - (|N_g^J| - |C_i^J| - 1)\big| * \min(i', d') \Big]$$

and finally h is computed as

h(n) = min(V(n), Drain(n)). For computation of Dmi., actual deletion cost of the element is taken into account. Due to this reason, even when h = Dmin this function turns out to be better than the first heuristic. Theorem 1. The heuristic function is an underestimate. Proof. Obvious from the above discussions. The heuristic, thus calculated, guides the search on the basis of the cost likely to be incurred on the path. Other sources of heuristic information can be brought into play to further reduce the search complexity. One possible source is the proportion of the shape contours accounted for by the individual states. This information is used for calculating the discriminant function E for reevaluating the nodes as required for A - ~* algorithm. Exact specification of the function E is obviously dependent on the nature of the contour primitives. A formulation of this function for a particular case will be presented in Section 5. Search complexity can still be reduced by considering the information regarding the quality of substitutive transformations along each path and using them for eliminating nonpromising paths. Also, if substitutive costs among the elements of N 9 are defined to be infinite then many of the possible paths in the state space are automatically removed from consideration (due to the bound D). Usually in these cases, search complexity is reduced but primitive identification and cost definitions may become more involved. 4.2. A pruning strategy As indicated in the previous section, the search complexity can be reduced by taking into account the quality of match at the substitutive nodes. The approach followed here for this purpose uses a quality measure called markcount. It is initialised to zero to start with. Now, the procedure for pruning may be described as follows: A set of primitives of the reference shapes are marked privileged according to some domain dependent discriminating criterion. 
Whenever a privileged primitive is substituted and the substitution cost is less than a threshold, the corresponding node is marked and its markcount is incremented. A node inherits the markcount of its parent and increments it if the node itself is marked. Therefore, nodes with a higher markcount indicate that, for the paths on which these nodes lie, parts of the reference shape have matched well with the test shape. Hence, there is a good likelihood that the actual solution path lies among the paths through these nodes. So, these paths must be identified at an intermediate stage of the search in order to prune away nonpromising nodes.

When a marked node with a markcount greater than an initial threshold is chosen for expansion, the path with good partial matches has become globally promising. In turn, this implies that the search has progressed sufficiently for promising and unpromising paths to be differentiated. Under this condition the algorithm can prune away all the nodes whose f-values are greater than $p \cdot f_{\min}$ ($p$ is a prespecified factor and $f_{\min}$ is the f-value of the node currently chosen for expansion) and whose markcount is less than a threshold min_count. The threshold min_count can be fixed a priori, or it can be made dynamic by prespecifying the difference between the markcount of the node to be expanded and the markcount of the node to be deleted.

Pruning is initiated for the first time when the initial threshold condition is satisfied. The threshold on the markcount is then incremented, which prevents nodes with an identical markcount from initiating further, ineffective pruning operations. The same procedure is repeated for all subsequent pruning operations. Therefore, pruning occurs only once for each increment of the markcount, and only for the marked node with the minimum f-value. This is essentially a parameterised pruning strategy which can be tuned for the actual application domain. The overhead of pruning is kept minimal, since it is initiated only when it is expected to be most effective.
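The pruning rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the names `Node`, `child_markcount`, `maybe_prune`, `trigger` and `count_gap` are hypothetical, and the dynamic form of min_count (markcount of the expanded node minus a fixed gap) is one assumed reading of the text.

```python
from dataclasses import dataclass


@dataclass
class Node:
    f: float              # f-value of the node
    markcount: int        # good privileged substitutions on the path so far
    marked: bool = False  # True if this node itself was such a substitution


def child_markcount(parent_markcount, is_substitution, privileged, cost, cost_threshold):
    """Return (markcount, marked) for a newly generated child node."""
    marked = is_substitution and privileged and cost < cost_threshold
    return parent_markcount + (1 if marked else 0), marked


def maybe_prune(open_list, chosen, trigger, p, count_gap):
    """Prune OPEN when a marked node with markcount > trigger is expanded.

    Returns the surviving nodes and the updated trigger threshold.
    Assumption: min_count = chosen.markcount - count_gap (dynamic variant).
    """
    if not (chosen.marked and chosen.markcount > trigger):
        return open_list, trigger
    min_count = chosen.markcount - count_gap
    kept = [n for n in open_list
            if not (n.f > p * chosen.f and n.markcount < min_count)]
    # Raise the trigger so that nodes with the same markcount do not prune again.
    return kept, chosen.markcount
```

A node is deleted only when both conditions hold (high f-value and low markcount), so a costly node that has already matched several privileged primitives survives. As the text notes, the pruned search is no longer guaranteed to be optimal.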
Obviously, the algorithm with the pruning operation does not guarantee an optimal solution, but by a proper choice of the relevant parameters the possibility of missing the correct solution can be minimised.

4.3. Extension of the methodology for recognition of all the objects present in the scene

The techniques presented and discussed in the previous sections are concerned with recognition of an object in a conglomerate compared to which no other object is more visible. This technique can be easily extended to identify all the objects present in a scene provided (i) sufficient portions of the individual object contours are present in the composite contour of the conglomerate to enable unambiguous recognition and (ii) the number of confusing segments in the composite contour, i.e. segments actually belonging to object A but inferred as elements of object B, is small. As is evident from the previous sections, the first optimal solution obtained by any of the above strategies identifies an object, a major portion of whose contour has matched well with the segments of the conglomerate boundary. The remaining segments of the conglomerate contour, which do not have a one-to-one correspondence with the segments of the object in the final goal node description, therefore possibly belong to other objects present in the scene. To identify these objects the same algorithm can be applied in a modified setting. The conglomerate segments already matched are marked 'done'. The segments thus marked are effectively removed from consideration by associating zero transformation cost with all state transitions involving these segments. By applying the algorithm a second time, we obtain a goal which identifies the object having the best overall correspondence with the unmatched segments of the conglomerate contour. To identify all the objects which can be reliably recognised in the conglomerate, the above process is continued until only a very small number of the conglomerate-contour segments are left unmarked. However, if the number of confusing segments turns out to be large, they get associated with wrong objects and the method fails.

5. SHAPE DESCRIPTION AND MATCHING WITH POLYGONAL REPRESENTATION

To illustrate the use of the generalised framework presented in the previous sections, an example problem involving polygonal representations of the shapes is taken up. A digital boundary can be approximated with arbitrary accuracy by a polygon, but the basic goal of polygonal approximation is to capture the essence of the shape boundary with the fewest polygonal sides. Although the problem is not trivial and can very quickly turn into a time-consuming iterative search, there are a number of polygonal approximation algorithms whose modest complexity and processing requirements make them well suited for industrial automation applications.

The polygonal approximation of an object contour can be easily represented in the shape-structure-based representation scheme. In this case all elements of $N_g$ have an identical symbol 's' which represents sides of the polygons, while the components of the attribute vector are the length of the polygonal side and the internal angle that it makes with the succeeding side. The polygonal approximation of the object contours is extracted using the split-merge procedure of Pavlidis and Horowitz.(18,19) The initial set of points required for the approximation are chosen to be curvature maxima. These are approximately located on the object boundaries using the corner finding algorithm of Johnston and Rosenfeld.(20) At the cost of accuracy, the initial segmentation operation is speeded up by considering small neighbourhoods for each boundary point. The split-merge algorithm is then initiated with this set of segmentation points. For the split-merge procedure, the following error criterion was used:

$$E_t = \max_i (E_i)$$

where $E_i$ is the maximum perpendicular distance of a point on the contour from the corresponding ($i$th) polygonal side. The algorithm aims to reduce $E_t$ below a prespecified upper bound ($E_b$). The procedure splits a polygonal side at the point of maximal perpendicular distance as long as $E_t$ is above the predefined bound, and merges sides for which the newly computed $E_i$ is less than $E_b$. The resulting polygon approximates the shape within the error bound and has the minimum number of sides for the given constraints. The representation is invariant to the choice of starting points and to changes in orientation.

If objects occlude each other or only partial views of the objects are available, then it is possible that some of the contour primitives (i.e. the polygonal sides) are missing, some additional primitives have appeared or some primitives are replaced in a given scene. The insertion, deletion and substitution operations for transforming one $N_g$ into another therefore adequately cover the above possibilities and can be used for matching distorted shapes. Hence, for the given primitives of $N_g$, we need to define costs for the individual operations. The definitions are as follows:

(i) The substitution cost of the $i$th primitive of the $I$th shape by the $j$th primitive of the $J$th shape is defined as

$$\frac{|l_i - l_j|}{\max(L_I, L_J)} + \frac{A(\alpha_i - \alpha_j)}{180}$$

where $l_i$ and $l_j$ are the length components of the $i$th and $j$th primitives respectively, $\alpha_i$ and $\alpha_j$ are the angular components of the corresponding attribute vectors of the primitives, $L_I$ and $L_J$ are the perimeters of the shapes $I$ and $J$ respectively, and

$$A(x) = \begin{cases} |x| & \text{if } |x| \le 180^\circ \\ 360^\circ - |x| & \text{if } |x| > 180^\circ. \end{cases}$$

The first term of the substitution cost measures the difference in the lengths of the segments. The quantity is normalised with respect to the perimeter of the larger contour. Consequently, the total contribution to the matching cost due to this term can never exceed 1. There will, however, always be a finite non-zero contribution due to this term when two identical but scaled objects are being matched. On the other hand, when a segment of the reference shape is substituted by an identical segment of an object conglomerate of greater or smaller perimeter, this term evaluates to zero. Thus, the normalising factor in the denominator of this term regulates the contribution due to the actual difference in lengths of the primitives. The second term in the cost definition measures the difference in angular orientation of the succeeding sides with respect to the current primitives. This term has a higher weightage because the contextual organization of the primitives characterises the shapes, and it is determined by the angle between the primitives.

(ii) Insertion/deletion cost of the $i$th primitive of the $I$th shape: insertion or deletion of a primitive means


that it is being effectively substituted by a null primitive which does not possess any length or angular value. In that case, the corresponding cost can be defined as

$$\frac{l_i}{L_I} + \frac{A(\alpha_i)}{180}.$$

Now, to avoid substitution between segments that do not match properly, the actual lengths $l_i$ and $l_j$ of the primitives are compared. If $l_i < S_M \cdot l_j$ or $l_j < S_M \cdot l_i$, where $0 < S_M < 1$, the substitution is defined to be invalid and the arc cost of the substitutive transformation is correspondingly made $\infty$. To guide the search more effectively, we designed an evaluation function $E(n)$ which gives priority to nodes in which greater portions of the test and reference shape boundaries are accounted for. To be more specific,

$$E(n) = \frac{L(C^T)}{L_T} + \frac{L(C^J)}{L_J}$$

where

$$L(C^T) = \sum_{i=1}^{w} l_i,$$

$w$ is the number of non-$\lambda$ elements of $C^T$ and $l_i$ represents the length component of the $i$th non-$\lambda$ element of $C^T$. $L_T$ is the perimeter of the test shape. The parameters for the $J$th shape are similarly defined. The above formulation is tested with a number of example scenes. The results are presented in Section 7.

As indicated in reference (12), the given set of transformation operations cannot adequately handle the problem of deformations in polygonal representations caused by noise and errors in approximation. Distortions due to random noise or errors may split up a side of the polygon representing the shape. Inclusion of an operation for merging and creating new primitives during matching would, in that case, eliminate the need for very accurate contour segmentation and make the matching operation more robust against noise. Hence, we focussed our attention on merging of primitives at match time and have come up with a new operation for merging. In this situation, this operation is more appropriate than the one proposed in reference (2). In the next section we discuss matching of shapes and structures with this merging operation.

6. SHAPE MATCHING WITH MERGING

Whenever a polygonal side is split due to noise, a proper merging operation should create from these primitives a new primitive which would match perfectly with the undistorted side. The proposed merge operation merges two primitives to create a new primitive as the third side of a triangle containing the first two as its other sides. For merging $k$ primitives, one may merge the first two primitives, then merge the resultant with the third one, and so on, $(k - 1)$ times. By merging primitives $n_a$ and $n_b$ with the attribute vectors $[l_a, \theta_a]$ and $[l_b, \theta_b]$ respectively, we create a new primitive $n_c$ with the attribute vector $[l_c, \theta_c]$, where $l_c$ and $\theta_c$ are given by the following expressions:

(i) $l_c = (l_a^2 + l_b^2 - 2 l_a l_b \cos\theta_a)^{1/2}$

(ii) $\theta_c = \left| \theta_b - \cos^{-1}\!\left( \dfrac{-l_a^2 + l_b^2 + l_c^2}{2 l_c l_b} \right) \right|$

Figure 4 illustrates the above operation. It is easy to see that the resultant due to merging closely resembles the original undistorted side. The cost for the merging operation should be defined in such a way as to take the following issues into account:

(i) the cost in case (i) of Fig. 5 should be less than that in case (ii), because in the first case at least one side closely corresponds with the new side formed;

(ii) the cost should be greater when the number of primitives merged is larger;

(iii) when a merging operation occurs immediately after a primitive, the angle that this primitive makes with its successor changes, because the orientation of the successor (the resultant) is now different. This fact should be taken care of in defining the costs.

In conformity with these criteria, we may define a merging cost $M$ as follows:

$$M = M_L + M_A$$

where

$$M_L = 2 \cdot \frac{|l_a - l_c|}{\max(l_a, l_c)} \cdot \frac{|l_b - l_c|}{\max(l_b, l_c)}, \qquad M_A = \frac{A(\mathit{angd})}{180}.$$

Here, $\mathit{angd}$ is the difference in orientation between the first primitive merged and the new primitive created. This cost can be very easily generalised for $n$ mergings. The merged primitive is then subjected to substitutive transformation during matching.

The state space and the search strategy should now be modified to include possibilities for merging during state transitions. We have considered merging of at most three sides. The transition rules are modified in the following manner:

(i) for a transition from $S_0$:

$$S_{\mathrm{init}} = S_{0\mathrm{ins}} \cup S_{0\mathrm{merg}}$$

Here, $S_{0\mathrm{ins}}$ is the same as in Section 3, and the states in $S_{0\mathrm{merg}}$ correspond to substitutive transformations involving single primitives as well as those created through merge operations. Hence $S_{0\mathrm{merg}}$ is defined as

So .... = {(~r,~J)ld = 1 to N,} ~r and aJ are defined in such a way that all merging possibilities corresponding to the first primitive of

Fig. 4. Illustration of merge operation. Side m in shape 1 is split into a, b, c, d in shape 2 due to noise; a, b, c, d are merged into m3, where m1, m2, m3 are the primitives generated at each stage of the merge operation.

Ngr and those corresponding to all the primitives of NoJ are taken into consideration. More precisely,

ar E {~(n[), ,(n~') • ~2(nI), ,(hi). zZ(nT) • "cS(nT)} and

~ r ~ {(n~lNorl_l)• n,N~,'nO, T r

{n,,ni ~(n~),n['z(n~)'z2(n[)l

(n~:l ," nr),(n[q~:l ," n r" nl),

S/,o, is as defined in section 3. (iii) from any state S/where none of the strings are null:

(hr. nz),(nl • r 'nz" r nr),(n r)

where Si . . . . =

• indicates merging operation; where and

(ii) from any state

S~ where

SIa,~and

Siins U Sidel U Siraerg

S/~°, are as defined in Section 3 and

S/.... = {(C r IIhr, Cl II#)}

a' ~ {(n~' • z(n:)), (n[- ~(n[)- ~Z(n~'), ( nJl ) l nJ/ ~ N 39 and

i = 1 to [N~I }

If n T and n] are the last non-), symbols embedded in C T and C~ respectively then,

J= 1 toNt} C~ is a null string:

Sisu¢ = Siins U Siraerl I

~r e {(n~r )' * 2(n~), r ,(n~)" r • 2(n~)', r 3(n~), r ,(n~)} r and

where

a ~ ~ (~(n~). ~Z(nr)),(~(nl)" ~Z(nl)" ~3(nl), ~(nl)l S/ .... = {(C[ II~r, #)}

If n T is the last non-), symbol embedded in C r then

~

The rule (iii) is illustrated in the Fig. 6. Other two rules remain identical.
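The cost definitions of Section 5 and the merge operation of this section can be sketched together in Python. This is a minimal sketch under stated assumptions, not the authors' C implementation: all function names are hypothetical, angles are taken in degrees, the default `sm=0.6` is the value of S_M reported in the experiments, and `orientation_change` (the triangle angle between the first merged side and the resultant) is an assumed reading of the quantity angd.

```python
import math


def angle_diff(x):
    """A(x): fold an angular difference in degrees into the range [0, 180]."""
    x = abs(x)
    return x if x <= 180.0 else 360.0 - x


def substitution_cost(li, ai, lj, aj, perim_i, perim_j, sm=0.6):
    """Cost of substituting primitive (li, ai) by (lj, aj); inf if lengths differ too much."""
    if li < sm * lj or lj < sm * li:
        return math.inf
    return abs(li - lj) / max(perim_i, perim_j) + angle_diff(ai - aj) / 180.0


def indel_cost(li, ai, perim_i):
    """Cost of inserting or deleting primitive (li, ai) of a shape with perimeter perim_i."""
    return li / perim_i + angle_diff(ai) / 180.0


def _triangle_angle(opposite, s1, s2):
    """Angle in degrees between sides s1 and s2 of a triangle, opposite the third side."""
    cos_v = (s1 * s1 + s2 * s2 - opposite * opposite) / (2.0 * s1 * s2)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_v))))


def merge(la, theta_a, lb, theta_b):
    """Merge sides a and b into the third side c of their triangle; return (lc, theta_c)."""
    lc = math.sqrt(la * la + lb * lb - 2.0 * la * lb * math.cos(math.radians(theta_a)))
    return lc, abs(theta_b - _triangle_angle(la, lb, lc))


def orientation_change(la, lb, lc):
    """Assumed definition of angd: angle between the first merged side a and the resultant c."""
    return _triangle_angle(lb, la, lc)


def merge_cost(la, lb, lc, angd):
    """M = M_L + M_A for merging sides of lengths la and lb into a side of length lc."""
    ml = 2.0 * (abs(la - lc) / max(la, lc)) * (abs(lb - lc) / max(lb, lc))
    return ml + angle_diff(angd) / 180.0
```

For a side of length 7 that noise has split into two collinear pieces of lengths 3 and 4 (internal angle 180 degrees between them), `merge(3, 180, 4, 90)` recovers a single side of length 7 that keeps the 90 degree angle with its successor. The sketch does not guard against the degenerate case in which the merged side has zero length.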

Fig. 5. Two cases for merge cost calculation.

Fig. 6. Illustration of state transitions with merging.

Because of inclusion of the merge operation, the heuristic proposed in Section 4.1 is no longer guaranteed to be admissible. In this case, during experimentation $h$ is set to zero. Consequently, the overhead for computation of $h$ during each node expansion is eliminated. But the number of node expansions is expected to be larger, since $h(\,)$ is not used to guide the search. Even without $h(\,)$, use of the function $E(\,)$ for the algorithm $A_\varepsilon^*$ will have the same desired effect as in the other situations. Also, the pruning strategy developed in Section 4.2 can also be effectively used. Some of the longer segments of the reference shape are marked privileged. During the course of the search, whenever a node is created by substitutive transformation of a privileged primitive or of a merged primitive containing a privileged one, and the corresponding transformation cost is less than a threshold, the node is marked and its markcount is incremented. The pruning strategy then works as in Section 4.2. The results of matching experiments with the merge operation using the above search strategies are presented in the next section.

7. RESULTS AND DISCUSSIONS

The methodologies presented in the previous sections are used for recognising occluded objects. Object models are organised into two libraries. One consists of polygonal shapes and the other contains shapes which are not inherently polygonal. The number of contour segments for the second set of objects is in general larger than that for the first set. The aim of the experiments described here is to assess the effectiveness of the technique for recognising partially visible shapes and to study the characteristics of the different matching techniques proposed. The algorithms are implemented in C on an HP-9000 system.

One set of experiments is designed to recognise distorted and/or partially visible objects in the scenes (Fig. 8) formed by model objects present in Fig. 7. Six different scenes with varying degrees of occlusion are considered. The results of these experiments are presented in Table 1. For this experiment the value of $S_M$ (cf. Section 5) used is 0.6. From the table, it is evident that the algorithm has identified the correct

Fig. 7. Library 1 of model objects.

object in all the cases. Also, the correspondences between the segments of the reference shape and the test objects are more or less correct. With an increase in the value of $\varepsilon$ the correctness of the results has not changed, but the number of nodes expanded has decreased in most cases. This observation becomes more prominent in the results of Table 2. Even with unnatural variations of $\varepsilon$ we have not obtained wrong results. This is probably due to the absence of other candidate solutions, since the test objects strongly resemble the model objects. In an isolated case in Table 1 we find that the number of nodes expanded has increased for a higher value of $\varepsilon$. This effect can occur if a number of unpromising nodes are found to be promising according to the function $E(\,)$.

In Table 3 the recognition results for the same scenes are presented, but in this case $A_\varepsilon^*$ has been modified to incorporate the pruning strategy. For this purpose, sides of the reference shape whose lengths are greater than 5% of the perimeter are considered to be privileged sides. The corresponding nodes are marked only if the substitution cost is less than 0.3. The threshold for initiation of pruning is kept at 3. min_count is made dynamic in the sense that the difference between the markcount of the node to be deleted and the threshold is fixed at 2. For experimentation, the value of $p$ used is 2 (cf. Section 4.2). The results in Table 3 show that the parameters chosen never lead to wrong results. From the table it is also easy to see that the number of nodes expanded decreases substantially. Therefore, the pruning strategy effectively focusses the search on promising candidates and thereby avoids unnecessary node expansions.

In another set of experiments, attempts are made to analyse scenes (Fig. 10) formed by object models

Fig. 8. Example scenes--1.

Table 1. Recognition results for scenes in Fig. 8 without merging

Scene  Object model  Distance  ε     No. of nodes  Correspondence* between the segments
       matched                       expanded
1      1             2.36      0     471           (a,a), (b,b), (d,e), (f,h)
1      1             2.36      0.25  359           --
1      1             2.36      0.5   297           --
2      3             4.52      0     798           (a,a), (b,b), (c,c), (d,d), (e,e)
2      3             4.52      0.25  676           --
2      3             4.52      0.5   589           --
3      1             2.66      0     612           (a,a), (f,g), (g,h), (h,i)
3      1             2.66      0.25  558           --
3      1             2.66      0.5   510           --
4      1             2.79      0     819           (c,h), (d,i), (e,j), (f,k), (g,l), (h,m)
4      1             2.79      0.25  753           --
4      1             2.79      0.5   767           --
5      3             3.65      0     1912          (a,m), (b,n), (c,o), (d,p), (e,q), (f,r)
5      3             3.65      0.25  1823          --
5      3             3.65      0.5   1791          --
6      2             3.32      0     1073          (b,a), (c,b), (d,c), (e,d), (f,e), (g,i), (h,g), (i,h), (j,i), (k,j), (l,k)
6      2             3.32      0.25  1021          --
6      2             3.32      0.5   1092          --

* The first element of each tuple in this column corresponds to a segment of the model object and the second element to a segment of the scene object. "--" indicates the same correspondence as in the row above.

Table 2. Recognition results for scenes in Fig. 8 with variations in ε

Scene  Object model  Distance  ε    No. of nodes  Segment correspondence
       matched                      expanded
1      1             2.36      0.6  262           Same as in Table 1
1      1             2.36      0.7  229
1      1             2.36      0.8  193
1      1             2.36      1    99
3      1             2.66      0.6  487           Same as in Table 1
3      1             2.66      0.7  436
3      1             2.66      0.8  423
3      1             2.66      1    405

Table 3. Recognition results with pruning (without merging) for scenes in Fig. 8

Scene  Object model  Distance  ε     No. of nodes  Segment correspondence
       matched                       expanded
1      1             2.36      0.25  239           Same as in Table 1
2      3             4.52      0.25  466
3      1             2.66      0.25  375
4      1             2.79      0.25  413
5      3             3.65      0.25  817
6      2             3.32      0.25  772


presented in Fig. 9. The corresponding object models are not inherently polygonal. The results of the experiment are given in Table 5. The value of $S_M$ used here is 0.6. In this case also, correct recognition results are obtained. Only in one case, for scene S3 and $\varepsilon = 0.25$, is the result found different from that with $\varepsilon = 0$. However, in this case also we can consider the result to be valid, since one of the possible competing results is obtained. Since the numbers of sides of the test and reference shapes are greater than those in the previous experiments, the number of nodes expanded in this experiment is in general larger.

Tables 4, 6 and 7 present results for experimentation with the merging operation. Inclusion of the merging operation increases the size of the corresponding state space. To keep the number of node expansions manageable, pruning operations are used while searching in all but one situation: for the results of Table 7, no pruning procedure is used. In all the cases $S_M$ is kept at 0.6. For the merged primitives, substitutions are considered valid on the basis of comparison between the lengths of the resultant primitives. It is evident from the results of Tables 4 and 6 that more appropriate segment correspondences are obtained if the merging operation is used during matching. It is also clear from the results that the pruning operation is useful for keeping the number of node expansions manageable. From the results of Table 7, it is found that with merging, slightly distorted objects match with the undistorted ones at a cost much lower than that without merging. Hence, the dissimilarity index thus obtained reflects the relation between the shapes more correctly.

Fig. 9. Library 2 of model objects.

From the above discussions and experimental results, we can conclude that this technique for identifying the more clearly visible object in a multiple or single object scene is effective and useful.

Table 4. Recognition results for scenes in Fig. 8 with merging and pruning

Scene  Object   Distance  ε    No. of nodes  Correspondence between the segments
       matched                 expanded
1      1        1.57      0    310           (a,a), (b,b), (c,c.d), (d.e,e.f.g), (f,h)
1      1        1.57      0.1  294           --
1      1        1.57      0.2  282           --
2      3        4.52      0    812           (a,a), (b,b), (c,c), (d,d), (e,e)
2      3        4.52      0.1  781           --
2      3        4.52      0.2  775           --
3      1        2.02      0    471           (a,a), (d.e,e.f), (f,g), (g,h), (h,i)
3      1        2.02      0.1  466           --
3      1        2.02      0.2  458           --
4      1        2.79      0    827           (c,h), (d,i), (e,j), (f,k), (g,l), (h,m)
4      1        2.79      0.1  811           --
4      1        2.79      0.2  779           --
5      3        3.65      0    1089          (a,m), (b,n), (c,o), (d,p), (e,q), (f,r)
5      3        3.65      0.1  1081          --
5      3        3.65      0.2  1094          --
6      1        3.32      0    753           (b,a), (c,b), (d,c), (e,d), (f,e), (g,f), (h,g), (i,h), (j,i), (k,j), (l,k)
6      1        3.32      0.1  649           --
6      1        3.32      0.2  633           --

Fig. 10. Example scenes--2.

Table 5. Recognition results for scenes in Fig. 10 without merging

Scene  Object model  Distance  ε     No. of nodes  No. of nodes expanded  Correspondence between segments
       matched                       expanded      with pruning
S1     O2            3.31      0     3312          1031                   (a,f), (b,g), (d,h), (e,i), (f,j), (g,k), (h,l), (i,m), (j,n)
S1     O2            3.31      0.25  3383          979                    --
S2     O2            3.99      0     3114          1178                   (m,b), (g,h), (h,i), (j,m)
S2     O2            3.99      0.25  3001          1063                   --
S3     O2            3.88      0     3947          1027                   (a,x), (b,y), (d,z), (e,α), (f,β), (h,k), (i,l), (j,m), (k,p), (l,r)
S3     O3            4.11      0.25  3813          1038*                  (a,v), (b,δ), (c,a), (e,b), (g,e), (h,f), (i,g), (j,h), (m,m), (n,n), (o,o)

* Without pruning same as in previous case.


Table 6. Recognition results for scenes in Fig. 10 with merging and pruning

Scene  Object   ε    No. of nodes expanded  Distance  Segment correspondences
       matched       (with pruning)
S1     O2       0.1  1463                   3.31      Same as that of Table 5
S1     O2       0.2  1451                   3.31      Same as that of Table 5
S2     O2       0.1  1892                   3.99      Same as that of Table 5
S2     O2       0.2  1814                   3.99      Same as that of Table 5
S3     O3       0.1  592                    3.59      (a,v), (b,δ), (c,a), (e,b), (g,e), (h,f), (i,g), (j,h), (m,m), (n,n), (o.p,o.p), (q,q.r.s)
S3     O3       0.2  511                    3.59      --

Table 7. Some more results to illustrate efficiency of merge operation with objects of Fig. 11

Test object: T(a)
Object matched: O(a)
Cost without merging: 2.313
Cost with merging: 0.412
Segment correspondence without merging: (a,a), (d,c), (h,f), (i,h)
Segment correspondence with merging: (a,a), (b.c,b), (d,c), (e.f,d.e), (g.h,f.g), (i,h)

Test object: T(b)
Object matched: O(b)
Cost without merging: 1.953
Cost with merging: 0.718
Segment correspondence without merging: (a,a), (b,b), (c,c), (h,i), (i,j), (j,k), (k,l), (l,m)
Segment correspondence with merging: (a,a), (b,b), (c,c), (d,d.e), (e,f), (f.g,g.h), (h.i), (j,k), (k,l), (l,m)

8. SUMMARY AND CONCLUSIONS

In this paper a technique for recognition of partially occluded planar shapes is proposed and discussed. The technique makes use of extended features extracted from object contours. The methodology is general in the sense that it is not dependent upon the actual choice of features for representing the shapes. A general representational framework is evolved for the planar shapes. Recognition methodology is specified with respect to that representational scheme. A state space formulation is obtained for the recognition problem. Heuristic search strategies are proposed for searching the state space.

The methodology has been applied for recognition of polygons approximating the object contours. A new operation for merging sides during matching is proposed in this connection. This operation helps in making the matching process robust enough to take care of errors in polygonal approximation and distortions in object contours due to noise. Experimental results show the effectiveness and utility of the recognition technique. It is found that $A_\varepsilon^*$ combined with the pruning strategy forms a fast and effective technique for matching shapes represented by shape structures. The description of the goal state gives a one-to-one correspondence between local features of the object model and its instance in the scene. From this

Fig. 11. Example shapes: model objects O(a), O(b) and test objects T(a), T(b).

correspondence, the object model can be correctly localised in the scene. This information is also useful for analysing contours of the objects present in the scene for inspection and grasping purposes. The methodology presented in this paper is, therefore, useful for identifying an object in a conglomerate compared to which no other known object is more clearly visible. It can also be applied for finding out whether a known object is present in a conglomerate.

The methodology is advantageous in the sense that it can be used with any choice of contour segments after appropriate cost definitions. The basic matching strategy and the related modifications are in no way connected with the features used for representing the contours. Another advantage of this method is that the search for the correct reference shape and its segment correspondences is conducted simultaneously through the entire set of object models. For the given representational scheme, the number of local features required for each object model is not large and no redundancy is involved in the storage of shape descriptions. With simple contour segmentation schemes like polygonal approximation, extraction of the contour description in terms of shape structures is simple and straightforward. The same scheme can be used for objects having curved contours, for which polygonal approximation may not be efficient. One such application is reported in reference (21). The method can also be applied to non-planar shapes which have a limited number of stable positions.

Finally, it may be mentioned that the interesting features of the present methodology are the facilities for incorporating the options (i) of not distinguishing between more or less identical objects (or object configurations), using a proper value of $\varepsilon$ in the $A_\varepsilon^*$ algorithm; and (ii) of rejecting a scene not containing any object with a reasonable degree of resemblance to one of the model objects, by a proper choice of the value for the tolerance factor $D$.
These features will be useful in many applications. An industrial robot, for example, need not waste its time in figuring out a particular one of a number of spanners which possibly differ only in the handles. Similarly, it need not bring about a forcible correspondence between a scene object and one of the model objects in a situation where the scene contains none of the models. Since these facilities are not available in many of the existing systems, the present methodology will be, in our opinion, more suitable for industrial automation applications.
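The roles of the two parameters mentioned above can be illustrated with a short Python sketch of node selection. It is a hedged illustration only: it assumes the usual formulation of A_ε* from Pearl (reference 17), in which any open node whose f-value is within a factor (1 + ε) of the minimum may be chosen and a secondary function breaks the choice, and it assumes that nodes with f-values above D are discarded. The names `OpenNode` and `select_for_expansion` are hypothetical.

```python
from typing import List, NamedTuple, Optional


class OpenNode(NamedTuple):
    name: str
    f: float  # estimated total dissimilarity through this node
    e: float  # discriminant E(n): portion of the contours accounted for


def select_for_expansion(open_list: List[OpenNode], eps: float,
                         bound_d: float) -> Optional[OpenNode]:
    """Pick the next node, or return None if every candidate exceeds the bound D."""
    candidates = [n for n in open_list if n.f <= bound_d]
    if not candidates:
        return None  # no model resembles the scene closely enough: reject
    f_min = min(n.f for n in candidates)
    # Nodes within a factor (1 + eps) of the best f-value are treated as equally good.
    focal = [n for n in candidates if n.f <= (1.0 + eps) * f_min]
    # Among those, prefer the node that accounts for more of the contours.
    return max(focal, key=lambda n: n.e)
```

With ε = 0 the selection reduces to picking a minimum-f node, so nearly identical alternatives are still told apart; a larger ε lets the search settle for any of them. A small D makes the search give up early on scenes that match none of the models.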

REFERENCES

1. R. T. Chin and C. R. Dyer, Model based recognition in robot vision, ACM Comput. Surv. 18, 69 (1986).
2. A. M. Wallace, A comparison of approaches to high level image interpretation, Pattern Recognition 21, 241 (1988).
3. W. A. Perkins, A model-based vision system for industrial parts, IEEE Trans. Comput. C-27, 210 (1978).
4. J. Turney, T. Mudge and R. Volz, Recognizing partially occluded parts, IEEE Trans. Pattern Anal. Mach. Intell. PAMI-7, 410 (1985).
5. T. F. Knoll and R. C. Jain, Recognising partially visible objects using feature indexed hypotheses, IEEE J. Robotics Automation RA-2, 3 (1986).
6. G. Stockman, S. Kopstein and S. Bennet, Matching images to models for recognition and object detection via clustering, IEEE Trans. Pattern Anal. Mach. Intell. PAMI-4, 229 (1982).
7. B. Bhanu and J. C. Ming, Recognition of occluded objects: a cluster-structure algorithm, Pattern Recognition 20, 199 (1987).
8. N. Ayache and O. D. Faugeras, HYPER: A new approach for the recognition and positioning of two-dimensional objects, IEEE Trans. Pattern Anal. Mach. Intell. PAMI-8, 44 (1986).
9. G. Rives, J.-T. Lapraste, M. Dhome and M. Richetin, Planar partially occluded objects scene analysis, Proc. 8th IEEE Conf. Pattern Recognition, Paris (1986).
10. P. Rummel and W. Beutel, Workpiece recognition and inspection by a model based scene analysis system, Pattern Recognition 17, 241 (1984).
11. A. M. Wallace, An informed strategy for matching models to images of segmented scenes, Pattern Recognition 20, 309 (1987).
12. W. Tsai and S. Yu, Attributed string matching with merging for shape recognition, IEEE Trans. Pattern Anal. Mach. Intell. PAMI-7, 453 (1985).
13. R. Mehrotra and W. I. Grosky, SMITH: An efficient model-based two dimensional shape matching technique, NATO ASI Series F45 (1987).
14. J. W. Gorman, O. R. Mitchell and F. P. Kuhl, Partial shape recognition using dynamic programming, IEEE Trans. Pattern Anal. Mach. Intell. PAMI-10, 257 (1988).
15. M. A. Eshera and K. S. Fu, A graph distance measure for image analysis, IEEE Trans. Syst. Man Cybern. SMC-14, 398 (1984).
16. N. J. Nilsson, Principles of Artificial Intelligence. Tioga, Palo Alto, CA (1980).
17. J. Pearl, Heuristics. Addison Wesley, Reading, MA (1984).
18. T. Pavlidis and S. Horowitz, Segmentation of plane curves, IEEE Trans. Comput. C-23, 860 (1974).
19. T. Pavlidis, Structural Pattern Recognition. Springer, Berlin (1977).
20. E. Johnston and A. Rosenfeld, Angle detection on digital curves, IEEE Trans. Comput. C-24, 1006 (1975).
21. S. Chaudhury, Development of methodologies for recognition of partially distorted planar shapes, Ph.D. Thesis, Department of Computer Science and Engineering, I.I.T., Kharagpur (1989).


About the Author--Mr SANTANU CHAUDHURY obtained his B.Tech. (Hons) in Electronics and Electrical Communication Engineering from the Indian Institute of Technology, Kharagpur in 1984. He is at present a scientific officer in a project on Knowledge Based Systems sponsored by HRD, GOI in the Department of Computer Science and Engineering at I.I.T., Kharagpur. His research interests are in the areas of Computer Vision and Artificial Intelligence.

About the Author--Mr ARUP ACHARYYA obtained his B.Tech. (Hons) in Computer Science and Engineering from I.I.T., Kharagpur, India in 1987 and is presently pursuing his Ph.D. at Rutgers University, U.S.A.

About the Author--Dr S. SUBRAMANIAN obtained his B.E. in Electrical Engineering from Government College of Technology, Coimbatore, India (1970), M.Sc. (Engng) in Applied Electronics and Servo Mechanisms from P.S.G. College of Technology, Coimbatore (1972), and Ph.D. in Electrical Engineering from Indian Institute of Technology, Kharagpur (1979). He is presently an assistant professor in the Computer Science and Engineering Department of I.I.T., Kharagpur. His present research interests are in the areas of Computer Vision, Artificial Intelligence, VLSI design and Neural Computing.

About the Author--Dr GUTURU PARTHASARATHY obtained his B.Tech. (Hons) in Electronics and Electrical Communication Engineering (1973), D.I.I.T. in Computer Technology (1974) and Ph.D. in Engineering (1984), all from the Indian Institute of Technology, Kharagpur. He is currently an assistant professor in the Electronics and Electrical Communication Engineering Department of I.I.T., Kharagpur. His research interests are in the areas of Computer Vision, Artificial Intelligence and Neural Networks.
