An improved algorithm for relational distance graph matching


Pattern Recognition, Vol. 29, No. 2, pp. 349-359, 1996. Elsevier Science Ltd. Copyright © 1996 Pattern Recognition Society. Printed in Great Britain. All rights reserved. 0031-3203/96 $15.00+.00


0031-3203(95)00089-5

AN IMPROVED ALGORITHM FOR RELATIONAL DISTANCE GRAPH MATCHING

L. CINQUE,*† D. YASUDA,‡ L. G. SHAPIRO,‡ S. TANIMOTO‡ and B. ALLEN‡

† Dipartimento di Scienze dell'Informazione, Universita' "La Sapienza" di Roma, Via Salaria 113, 00198 Roma, Italy
‡ Department of Computer Science and Engineering, University of Washington, Box 352350, Seattle, WA 98195, U.S.A.

(Received 1 November 1994; in revised form 16 May 1995; received for publication 28 June 1995)

Abstract: One of the problems in computer vision is to give a computer a representation of a scene and recognize the objects in the scene and their spatial relationships. This involves low-level vision (image processing), mid-level vision (feature extraction and measurement) and high-level vision (interpretation). One important part of high-level vision is relational matching, the process of matching two relational descriptions of objects, often for the purpose of object identification. This paper presents an improvement of the original method by Shapiro and Haralick for solving the relational distance graph matching problem for unlabeled graphs on a parallel computer.

Graph matching    Tree search    Inexact matching    Forward checking    Relational distance    Structural description

1. INTRODUCTION

Model-based vision is the analysis of a scene, utilizing knowledge about the structures of the objects being recognized. Part of the analysis process consists of:

• constructing a structural description of a portion of the image;
• comparing this description to a model from a database of possible models.

The result of this comparison should indicate the likelihood that this portion of the image is a projection of the object represented by the model. When the structural description is a relational model, this comparison is known as relational matching. Relational matching has been used by many researchers and is expressed in several different formalisms. Frequently the relational descriptions are graphs, and the relational matching problem becomes a problem in graph matching. One of the earliest papers on matching was the work of reference (1), where several strategies for finding a mapping from a relational description of a model to a relational description of an image were compared. The method that worked best in their study was the method of hierarchical synthesis, which first looks for subgraph matches and then tries to extend them to larger mappings. This work, along with a paper that addressed the problem of graph and subgraph isomorphisms,(2) led to many algorithms for discrete relaxation and the introduction of probabilistic relaxation.(3) Haralick and Elliott(4) defined and compared several discrete relaxation operators and concluded that the operator that worked best was the forward-checking operator. The exact matching problem was generalized to the consistent labeling problem(5) and the inexact matching problem.(6) This was extended further to the problem of determining the relational distance between two structural descriptions.(7,8) The simplest structural description is a single binary relation, a graph. In reference (7) a distance measure known as the relational distance between graphs was defined and used to determine the best mapping between two graphs. Recent work has underscored the need to find computationally efficient solutions to graph matching problems.(9) In this paper, we will focus on the efficient solution of this problem.

Our goal in this work is to find the best match between two graphs: the unit graph and the label graph.(5) The best match is a permutation of the vertices of the label graph such that as few edges as possible differ from the unit graph. Our method is to find the best match by a standard backtracking search of the permutation tree, with algorithmic methods for pruning the search. For each possible mapping, the edges of the unit and permuted label graphs are compared, and an error of 1 is recorded for each incorrect edge. An incorrect edge can be a unit graph edge not present in the label graph or a label edge not present in the unit graph. The number of possible vertex assignments is very large, growing as the factorial of the number of vertices of the unit graph. Without clever means of finding

* Author to whom correspondence should be addressed.


cutoffs quickly, this number grows so rapidly that any standard backtracking scheme will take too long to be useful, even on a parallel machine. Shapiro and Haralick proposed forward-checking and look-ahead functions to reduce the number of comparisons needed. In this paper we present an improvement on these techniques to solve the graph matching problem for unlabeled graphs. We have developed a method that significantly improves the pruning function, leading to smaller search trees and reduced search times. The method has been implemented on both a serial machine and on the MasPar parallel computer.

Shapiro and Haralick used a structure called the forward-checking matrix (FCM) in their pruning algorithm. This matrix is indexed by unmatched vertices in the unit graph versus unmatched vertices in the label graph; its elements hold the independent error caused by assigning a presently unmatched vertex of the unit graph to a presently unmatched vertex of the label graph. Our improvements in the algorithm lie in two basic areas: (1) estimating the relational distance between the subgraphs of the unit and label graphs composed of edges for which both incident vertices are unmatched (future-future edges); (2) calculating the minimum possible error from the FCM. These improvements greatly reduce the number of nodes examined in the permutation tree, which enables us to compare graphs with many more nodes than the basic forward-checking algorithm (the improvement depends greatly on the graph involved).

The paper is organized as follows. First we give a formal definition of relational descriptions and relational distance, followed by a brief description of the problem (Section 2). In Section 3, a strategy to estimate the relational distance between edges in the unmatched portion of the graph (FF edges) is presented; this source of error was ignored in the basic forward-checking algorithm. Our FF edge estimate compares the number of edges going in and out of each unmatched unit vertex with the number of corresponding edges for each unmatched label vertex. In Section 4 we present our methods to calculate, from the values in the FCM, the minimum error of all matchings stemming from the current partial matching. Finding this minimum error is equivalent to solving the Weighted Bipartite Matching (WBM) problem (sometimes called the Assignment problem): we need to find the row-column permutation of the matrix for which the sum of the selected elements is minimized. Our estimation methods take O(n^2) time, while finding the exact minimum error is an O(n^3) operation using the fastest known algorithm. Using the exact solution in place of our best estimation method reduces the number of explicit searches by a small factor, which is generally not great enough to improve the overall run times. Finally, in the last section, we compare the

performances of our algorithms to the performance of the original forward-checking algorithm.

2. PROBLEM DEFINITION

Relational matching refers to the process of finding a correspondence between two relational structures. In this section we review the necessary definitions and briefly define the problem we are addressing. Our notation is similar to that used by Shapiro and Haralick.(6)

A relational description of an object D = (P, R) consists of a set of "primitives", each having its own attribute description, and a set of named relations. Let P = (P_1, ..., P_N) be a set of primitives, one for each of the N primitive parts of the object. For example, P_1 could represent the back of a chair and P_2 a leg. The set R = (R_1, ..., R_K) is a set of K relations over P. Each relation R_k is a set of M_k-tuples, R_k ⊆ P^{M_k}. For example, R_1 could represent adjacency of parts of the chair. Relational descriptions are used to define prototype objects known as stored models(1,10) and are used as part of the knowledge base for a recognition system. Such a system inputs candidate objects, computes their relational descriptions and tries to identify each candidate by matching it to a stored model.

Given two relational descriptions D_1 = (U, R) and D_2 = (L, S), the goal of a matching algorithm is to find a good mapping h: U → L comparing the corresponding elements of the relational descriptions. Shapiro and Haralick(7) have defined the structural error of the mapping h for the kth relation as the number of tuples in R_k that are not mapped by h to tuples in S_k, plus the number of tuples in S_k that are not mapped by h^{-1} to tuples in R_k. This error is given by:

E_k(h) = |R_k ∘ h − S_k| + |S_k ∘ h^{-1} − R_k|,

where R ∘ h, for a relation R and mapping h, is the set of tuples {(h(u_1), h(u_2), ..., h(u_n))} such that (u_1, u_2, ..., u_n) is a tuple of R. The total error of h is given by the sum over all the relations. That is:

E(h) = Σ_{k=1}^{K} E_k(h).

In this way, the distance between two descriptions is defined as the minimum total error obtained over all bijective mappings h from U to L:

GD(D_1, D_2) = min_{h: U → L} E(h).

We name this mapping a best mapping from D_1 to D_2. When the parts of the first relational description D_1 exactly match the parts of the second relational description D_2 with respect to all required attributes and relationships, we say that the relational distance GD(D_1, D_2) = 0 and h is a relational isomorphism.
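For a single binary relation, the definitions above can be made concrete with a small brute-force sketch (our own illustration, not the paper's algorithm; it enumerates all bijections, so it is usable only for tiny graphs):

```python
from itertools import permutations

def structural_error(R, S, h):
    """E(h) = |R∘h - S| + |S∘h⁻¹ - R| for binary relations R and S
    and a bijective mapping h given as a dict."""
    h_inv = {v: u for u, v in h.items()}
    R_h = {(h[a], h[b]) for a, b in R}            # R composed with h
    S_hinv = {(h_inv[a], h_inv[b]) for a, b in S} # S composed with h⁻¹
    return len(R_h - S) + len(S_hinv - R)

def relational_distance(U, L, R, S):
    """GD(D1, D2): minimum total error over all bijections h: U -> L."""
    U = list(U)
    best = None
    for image in permutations(L):
        err = structural_error(R, S, dict(zip(U, image)))
        best = err if best is None else min(best, err)
    return best
```

With the example graphs used later in the paper (R = {(1,2), (2,3), (3,1)}, S = {(A,B), (B,C)}), the distance is 1, realized by the mapping 1→A, 2→B, 3→C.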


The specific problem we are addressing is graph matching. A graph is a relational description with one binary relation. The goal is to find a mapping of N units (vertices of the first graph) to N labels (vertices of the second graph) by permuting the adjacency matrix of the label graph, so that as few edges as possible of the permuted graph differ from those of the unit graph. An error of 1 is recorded for each incorrect edge. This problem is combinatorial in nature and can be solved by a brute-force backtracking tree search. Each iteration of the search answers the following question: are there any unexplored paths from the root to a leaf going through the current node that might have an error less than the current error threshold value? If there is such a path, move down the tree by adding the labeling corresponding to the next node along the path and repeat the iteration. If there is no such path, move back toward the root by removing the labeling corresponding to the current node. The error threshold is the lowest error found prior to this point in the search.

The path from the root to the current node in the search defines a partial labeling. At a partial labeling L a subset U_M of the unit graph's vertices is matched with a subset L(U_M) of the label graph's vertices. Given this labeling, we wish to find a large lower bound for the error of all complete labelings containing L, so that the search can be pruned as soon as possible. To find this lower bound we need to examine the possible sources of error. The minimum error of the permutation branch of this labeling can be divided into three sources, the past-past, past-future and future-future errors, which refer to the differences found among edges with both, one or none of their incident vertices in the matched portion of the graph.

Since the past-past error is the error between the fixed subgraphs of the graphs U and L, it can always be computed exactly by comparing the edges of the graphs against each other. The past-future error is the error resulting from comparing those edges with exactly one incident vertex in U_M with corresponding edges in the label graph. All information from the past-future error can be placed into a structure called the forward-checking matrix (FCM).

The FCM has one row for each future unit and one column for each future label. The entry FCM[i, j] represents the accumulated error for the ith unit and jth label. The FCM is initialized to zero, meaning that each unit-label pair has no error accumulated at the beginning of the search. The values in the FCM are updated by the forward-checking procedure, which is called every time a unit-label pair is added to or subtracted from the partial mapping. The value of FCM[u', l'] is incremented whenever a unit-label pair (u, l) that is inconsistent with (u', l') is added to the mapping being constructed. Thus each matrix element of the FCM contains the error, accumulated so far in the current search path, resulting from matching a future unit with a future label. Since the future labeling is not yet determined, we need to estimate a lower bound for the minimum-cost permutation in this matrix. We call this estimation process "solving the FCM"; it is equivalent to solving the weighted bipartite matching problem.

The future-future error is the error between the unmatched subgraphs of the graphs U and L. The future-future error constitutes a major portion of the potential error between the graphs when the partial labeling is small and pruning is most effective. It cannot be exactly computed, because neither vertex incident to an edge is fixed. Some information from the future-future error, however, can be added to the forward-checking matrix. If there are differences in the in-degree or out-degree between a unit vertex and a label vertex, then an error must occur when they are matched. The addition of this error to the FCM, although often far smaller than the true error that results from a full mapping, provides us with much faster cutoffs than when using the past-past and past-future errors only.

2.1. Forward checking algorithm

The graph matching problem is combinatorial in nature and can be solved by a brute-force backtracking tree search. However, several discrete relaxation algorithms have been proposed in order to cut down search time by reducing the size of the tree that is searched. Shapiro and Haralick(6) proposed the inexact forward-checking tree search based on the idea that once a unit-label pair (u, l) is instantiated at a node in the tree, the constraints imposed by the relations cause instantiation of some future unit-label pair (u', l') to become impossible. The principle of this method is to rule out (u', l') at the time that (u, l) is instantiated and keep a record of that information.

The inexact forward-checking tree search deals with the three types of errors described above. The variable past_error is input to the tree search procedure, representing the error of the partial mapping that has been constructed so far. The tree search is initially called with a value of 0 for past_error, and the value of the variable is never allowed to exceed the minimum error, min_error, which is initialized to a high value prior to the tree search. As unit-label pairs are added to the mapping and some constraints are not satisfied by the resultant function, past errors will increase.

The variable current_error is a local variable of the tree-search procedure representing the error associated with the current pair (u, l). This is the error that the addition of the pair (u, l) to the partial mapping would add to the error already associated with the partial mapping. At the time of considering a pair (u, l) the tree search does not have to compute its current_error. It has been gradually computed all along by the update procedure, which is called every time a pair is added to the mapping. The value of current_error for a pair (u, l) is found in the forward-checking matrix at position FCM(u, l). Thus, in the inexact forward-checking tree search, the FCM contains real numbers instead of 0s and 1s. The


table is initialized to all zero (no error), and the value for a pair (u, l) increases whenever a unit-label pair (u', l') that is inconsistent with (u, l) is added to the mapping being constructed. Furthermore, the FCM is augmented by an extra vector MINERR; MINERR(u) is used to store the minimum error over all the labels for a given unit u. It is also initialized to all zero.

Finally, the third variable, future_error, represents the possible error that can be incurred by the instantiation of future (not yet instantiated) units. Since the FCM associates an accumulated error with each future unit and possible label, based on the compatibility between that future unit-label pair and the partial mapping, future_error can be estimated by the sum over all future units u of MINERR(u), the minimum error for any label of u. This sum is guaranteed to be not greater than the real future error, which has to take into account not only the error caused by future units interacting with past units, but also the error caused by future units interacting with future units. The variable future_error is also set to zero for the initial call to the tree-search procedure.

There are two global variables: best_map and min_error. The variable best_map will contain the best mapping found by the procedure upon exit. The variable min_error will contain the error of the best mapping. It is initially set to a large value (999999), so that the first mapping found can become the initial best mapping.
With this notation, the Shapiro/Haralick algorithm for inexact forward-checking tree search is as follows:

function inexact_forward_checking_treesearch(U, L, f, FCM, MINERR, R, S, past_error, future_error)
  u := first(U);
  for each l ∈ L do
    current_error := FCM(u, l);
    if past_error + current_error + future_error − MINERR(u) < min_error then begin
      f' := f ∪ {(u, l)};
      U' := remainder(U);
      if isempty(U') then
        if past_error + current_error < min_error then begin
          min_error := past_error + current_error;
          best_map := f'
        end
      else begin
        NEWFCM := copy(FCM);
        new_future_error := inexact_forward_check(NEWFCM, MINERR, u, l, U', L, R, S, past_error, f');
        if new_future_error + past_error + current_error < min_error then
          inexact_forward_checking_treesearch(U', L, f', NEWFCM, MINERR, R, S, past_error + current_error, new_future_error)
      end
    end
  end for
end inexact_forward_checking_treesearch

function inexact_forward_check(FCM, MINERR, u, l, future_units, L, R, S, past_error, f')
  inexact_forward_check := 0;
  for each u' ∈ future_units do
    smallest_error := 9999999;
    for each l' ∈ L with past_error + inexact_forward_check + FCM(u', l') ≤ min_error do
      error := inexact_compatible(u, l, u', l', R, S, f');
      FCM(u', l') := FCM(u', l') + error;
      if FCM(u', l') < smallest_error then smallest_error := FCM(u', l')
    end for
    MINERR(u') := smallest_error;
    inexact_forward_check := inexact_forward_check + smallest_error;
    if inexact_forward_check + past_error > min_error then break
  end for
end inexact_forward_check

function inexact_compatible(u, l, u', l', R, S, f')
  f'' := f' ∪ {(u', l')};
  inexact_compatible := 0;
  if (u, u') ∈ R and (l, l') ∉ S then inexact_compatible := inexact_compatible + 1;
  if (u', u) ∈ R and (l', l) ∉ S then inexact_compatible := inexact_compatible + 1;
  if (l, l') ∈ S and (u, u') ∉ R then inexact_compatible := inexact_compatible + 1;
  if (l', l) ∈ S and (u', u) ∉ R then inexact_compatible := inexact_compatible + 1;
end inexact_compatible

In order to illustrate the concepts presented in this section, we will use the following example. A mapping from the set of units {1, 2, 3} to the set of labels {A, B, C} is sought, subject to the constraints of the two graphs R and S. The best mapping will be {(1, A), (2, B), (3, C)}, which has an error of 1, because (3, 1) ∈ R but (C, A) ∉ S. The inexact forward-checking tree search must find the best mapping from the unit set U = {1, 2, 3} to the label set L = {A, B, C} constrained by the graphs R and S shown in Fig. 1. To keep the illustration small, assume that the global min_error has been initialized to 1.1, which will cause the procedure to search for the best mapping with error less than 1.1.
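The pseudocode above can be rendered compactly in Python. The following is our own sketch (function and variable names are ours; the FCM is stored as a dictionary, and, like the original procedure, the search does not force the mapping to be 1:1):

```python
def incompatibility(u, l, u2, l2, R, S):
    """Error added by the pair (u2, l2) once (u, l) is instantiated:
    each edge between u and u2 (or l and l2) lacking a counterpart costs 1."""
    return ((1 if (u, u2) in R and (l, l2) not in S else 0)
          + (1 if (u2, u) in R and (l2, l) not in S else 0)
          + (1 if (l, l2) in S and (u, u2) not in R else 0)
          + (1 if (l2, l) in S and (u2, u) not in R else 0))

def forward_check(fcm, u, l, future_units, labels, R, S):
    """Fold the constraints of the new pair (u, l) into the FCM and return
    the future-error estimate: the sum of each future unit's row minimum."""
    future_error = 0
    for u2 in future_units:
        for l2 in labels:
            fcm[(u2, l2)] += incompatibility(u, l, u2, l2, R, S)
        future_error += min(fcm[(u2, l2)] for l2 in labels)
    return future_error

def inexact_search(units, labels, R, S):
    """Inexact forward-checking tree search for the best unit-to-label map."""
    best = {"map": None, "err": float("inf")}

    def rec(i, mapping, fcm, past_error, future_error):
        u = units[i]
        minerr_u = min(fcm[(u, l)] for l in labels)     # MINERR(u)
        for l in labels:
            current = fcm[(u, l)]
            if past_error + current + future_error - minerr_u >= best["err"]:
                continue                                 # prune this branch
            if i == len(units) - 1:
                if past_error + current < best["err"]:
                    best["err"] = past_error + current
                    best["map"] = {**mapping, u: l}
            else:
                new_fcm = dict(fcm)                      # copy(FCM)
                new_future = forward_check(new_fcm, u, l,
                                           units[i + 1:], labels, R, S)
                if new_future + past_error + current < best["err"]:
                    rec(i + 1, {**mapping, u: l}, new_fcm,
                        past_error + current, new_future)

    fcm0 = {(u, l): 0 for u in units for l in labels}
    rec(0, {}, fcm0, 0, 0)
    return best["map"], best["err"]
```

On the graphs of Fig. 1 this returns the mapping {1: 'A', 2: 'B', 3: 'C'} with error 1, matching the walk-through that follows.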
The search procedure is called at the top level with unit set {1, 2, 3}


Units: {1, 2, 3}
Labels: {A, B, C}
R = {(1, 2), (2, 3), (3, 1)}
S = {(A, B), (B, C)}

Fig. 1. Unit graph R and label graph S to illustrate graph matching.

and the forward-checking matrix FCM initialized to all zero, indicating no errors so far:

FCM   A   B   C   MINERR
 1    0   0   0     0
 2    0   0   0     0
 3    0   0   0     0

The procedure begins with u = 1 and tries the label l = A. This results in a call to function inexact_forward_check to update the forward-checking matrix and calculate a new future-error estimate. For future unit u' = 2, the function determines that (1, 2) ∈ R and that (A, A) ∉ S, (A, B) ∈ S, and (A, C) ∉ S. For future unit u' = 3, it finds that (3, 1) ∈ R and that (A, A) ∉ S, (B, A) ∉ S and (C, A) ∉ S. The new forward-checking matrix returned by the function is:

FCM   A   B   C   MINERR
 2    1   0   1     0
 3    1   1   1     1

and the value returned for new_future_error is the sum of the MINERR column, which is 1. Now that the mapping {(1, A)} is instantiated, the search procedure is called recursively for the reduced unit set {2, 3}. The label set is still {A, B, C}, because this general procedure is not restricted to 1:1 mappings. The procedure begins with u = 2 and tries the label l = A. The current error of (2, A) from the forward-checking matrix is 1. This makes the sum of past plus current plus future errors come out to 2, which is too large, and (2, A) is not considered further in this branch of the tree. The procedure then tries the label l = B. This time the sum of past plus current plus future errors is only 1, which is still acceptable. So the mapping becomes {(1, A), (2, B)}, and the search procedure is once again called recursively, this time for the further reduced unit


set {3}. The procedure immediately rules out the possible labels l = A and l = B for unit 3, because the sum of past plus current plus future errors comes out to 2 in both cases. The label l = C passes the test, and since it is at the bottom level of the tree, the mapping best_map is set to the current mapping {(1, A), (2, B), (3, C)} with min_error equal to 1.0. The procedure has already found the best mapping, but it has no way of knowing it, so it will back up and see if it can do better. The forward-checking procedure will keep it from investigating any path where the error is going to be larger than the current minimum 1.0. Thus, when it backs up and tries the label l = C for unit u = 2, the sum of current plus past plus future errors will be 2, and this path will be immediately ruled out. While the savings are small in our toy example, large subtrees can be eliminated in real problems.

The search procedure uses the three variables past_error, current_error and future_error to achieve cutoffs in the tree search. The forward-checking algorithm given by Shapiro and Haralick estimates the future error as the sum of the minimum value in each row of the forward-checking matrix. This estimate is only a lower bound on the real future error. A better calculation of the future error is described in this paper and is shown to greatly speed up the search.

3. SOLVING THE FORWARD-CHECKING MATRIX

Suppose we have a partial mapping M of unit and label vertices. Any full mapping which agrees with M on the mapped vertices corresponds to an isomorphism in the FCM. Each element of the FCM holds the minimum error that can be attributed to the pairing of a future unit vertex with a future label vertex. The minimum error of all full mappings stemming from M is therefore at least equal to M's error plus the minimum-cost isomorphism in the FCM. Finding the minimum-cost isomorphism of a matrix is called the assignment problem or the weighted bipartite matching (WBM) problem. Since the FCM will store our past-future and future-future error approximations, our problem reduces to obtaining a more efficient lower-bound approximation to the WBM.

To find early cutoffs in our search, estimating minimum errors for each child of our current node in the search tree is more beneficial than estimating the minimum error for our current search tree node alone. We name the problem of finding estimates for all children simultaneously the modified WBM (MWBM) problem. In the FCM, finding the minimum error for the kth child corresponds to finding the minimum-cost isomorphism in which the element in the first row is assigned to column k. We have developed a good lower-bound approximation algorithm for the MWBM that works in O(n^2) time, and there is an exact solution that takes O(n^3) time with a low constant, where n is the number of unmatched vertices in the graphs. We now describe the approximation algorithm.
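For reference, the exact MWBM vector that these approximations bound from below can be computed by brute force on small matrices (our own sketch; a real implementation would use an O(n^3) Hungarian-style algorithm instead of this O(n!) enumeration):

```python
from itertools import permutations

def mwbm_exact(cost):
    """Exact MWBM vector: entry k is the cheapest row-to-column assignment
    in which row 0 (the next unit to be matched) is forced to column k.
    Enumerates all n! assignments; for illustration only."""
    n = len(cost)
    best = [None] * n
    for p in permutations(range(n)):
        total = sum(cost[i][p[i]] for i in range(n))
        k = p[0]                       # column taken by row 0
        if best[k] is None or total < best[k]:
            best[k] = total
    return best
```

The minimum entry of the returned vector is the plain WBM solution for the matrix.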


4. THE APPROXIMATION ALGORITHM

Marriage terminology will be used to depict the bipartite graph, with women (units) and men (labels) being the two sets of vertices and the edges representing the costs of heterosexual marriages. We define a marital function to be a mapping from the set of women into the set of men. If the mapping is bijective, we call it a marital isomorphism. We let ψ_k denote the cheapest marital function with k men married, so ψ_n, the cheapest marital isomorphism, denotes the WBM solution. The MWBM solution vector is the set of cheapest marital isomorphisms for which the first woman is assigned to a particular man (or, alternatively, for which the first man is assigned to a particular woman). We define C(H) to be the cost of a marital function H. Our approximation algorithms use marital functions as lower-bound approximations to the minimum-cost marital isomorphism. All approximations are

Estimated solutions to the MWBM problem for the following matrix, using the 3 approximation algorithms.

[Original 6 × 6 matrix and its perturbed version omitted.]

(a) smallest element in column (excluding top row); (b) second smallest element in column (excluding top row). The perturbed matrix is computed by subtracting the third cheapest element in each row from every element in that row.

Algorithm 1 (original algorithm): top row element (child cost in permutation tree) plus smallest element in each column (excluding top row).

Element in top row:         0   1   1   2   3   0
Sum of column selections:   4   3   3   3   2   4
MWBM Estimate 1:            4   4   4   5   5   4

Algorithm 2 (improved algorithm): top row element plus smallest or second smallest element in column (excluding top row).

Element in top row:         0   1   1   2   3   0
Sum of column selections:   6   5   5   4   4   5
MWBM Estimate 2:            6   6   6   6   7   5

Algorithm 3 (best algorithm): apply Algorithm 2 to the perturbed matrix. The perturbed matrix's MWBM solutions are 12 less than those of the original matrix.

Perturbation total:        12  12  12  12  12  12
Element in top row:        -1   0   0   1   2  -1
Sum of column selections:  -2  -4  -4  -5  -5  -5
MWBM Estimate 3:            9   8   8   8   9   6

Exact solution:             9   8   8   9  10   7

Fig. 2. Estimates to the MWBM problem.

computed in O(n^2) time for the WBM and MWBM problems. The lower-bound approximation algorithms we have used are as follows:

(1) (Original algorithm.) Allow each woman to marry her favourite man, with polygamy allowed. This is the cheapest overall marital function, ψ_{k_0}, k_0 being the number of men married by the function. For the MWBM, the first man marries a particular woman; every other woman marries her favourite man, excluding the first man.

(2) (Improved algorithm.) Beginning with ψ_{k_0} from algorithm (1), each woman keeps track of her second favourite man as well. When several women claim the same mate, they each consider their second choices. The woman who would experience the greatest relative loss by marriage to her second choice is allowed to marry her first choice. The other women must marry their second choice, with no further replacements allowed. This method gives a lower approximation to the cheapest marital isomorphism ψ_n. The replacement strategy finds the cheapest marital function for which no more than k_0 women get to keep their first choices. Since no more than k_0 women can keep their first choices in ψ_n, the cost of this scheme cannot exceed C(ψ_n). For the MWBM, each woman must know her two favourite men, excluding the first man. By precomputing total replacement costs, the run time remains O(n^2).

(3) (Best algorithm.) When working with larger matrices, the chance of the above algorithms finding a near-monogamous marital function decreases. This is especially likely in a badly balanced matching (i.e. all the women wish to marry 1 or 2 men). In addressing this problem, we found that by perturbing columns (offering a "dowry" for undesirable men) we can enhance our chances of finding a near-monogamous marital function with algorithm (2). For large FCMs an effective dowry is the cost of matching a man with his 4th most compatible woman. This dowry ensures that each man has four women who tolerate him, thereby increasing the probability that a man will find a wife in the resulting marital function. The dowry is subtracted from each element in the man's column to create an adjusted FCM. Algorithm (2) is then applied to the adjusted matrix, after which the sum of the dowries is added back to recover the approximation for the original FCM. Determining the fourth smallest element in each column slows the algorithm down, but the benefits for large or poorly suited matrices make this adjustment worthwhile.

To test the accuracy of these lower-bound approximation algorithms, we also computed the exact MWBM solution, which takes O(n^3) steps. Figure 2 illustrates the three approximation methods on a 6 × 6 forward-checking matrix.
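Algorithms (1) and (2) can be sketched as follows. This is our own simplified rendering, not the authors' code: the matrix rows play the women, the columns the men, and `wbm_exact` is a brute-force check included only to confirm on a small matrix that both estimates are lower bounds on the true WBM cost.

```python
from itertools import permutations

def estimate1(cost):
    """Algorithm (1): each woman (row) marries her favourite man (cheapest
    column), polygamy allowed -- the cheapest marital function."""
    return sum(min(row) for row in cost)

def estimate2(cost):
    """Algorithm (2): where several women claim the same man, the woman with
    the greatest relative loss (second choice minus first) keeps him; the
    others take their second choice, with no further replacements."""
    claims = {}
    for row in cost:
        order = sorted(range(len(row)), key=lambda j: row[j])
        first, second = row[order[0]], row[order[1]]
        claims.setdefault(order[0], []).append((second - first, first, second))
    total = 0
    for contenders in claims.values():
        contenders.sort(reverse=True)              # greatest regret first
        total += contenders[0][1]                  # winner keeps first choice
        total += sum(sec for _, _, sec in contenders[1:])
    return total

def wbm_exact(cost):
    """Exact WBM cost by exhaustive search (O(n!); for verification only)."""
    n = len(cost)
    return min(sum(cost[i][p[i]] for i in range(n))
               for p in permutations(range(n)))
```

On a small matrix such as [[4, 1, 3], [2, 0, 5], [3, 2, 2]], estimate (1) gives 3, estimate (2) gives 5, and the exact WBM cost is 5, illustrating how the replacement step tightens the bound.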

5. FUTURE-FUTURE EDGES

The backtracking tree search explores mappings from the units to the labels. Every node of the tree represents a partial mapping. Based on the partial mapping at a node, the edges of each graph can be classified as past-past edges (both incident vertices are part of the partial mapping), past-future edges (one incident vertex is part of the partial mapping and the other is not) and future-future edges (neither incident vertex is part of the partial mapping). In the basic forward-checking algorithm presented by Shapiro and Haralick, only the past-future edges and past-past edges are examined. The future-future edges are examined in the lookahead algorithm in the same paper, but this algorithm was dismissed as being slower than the forward-checking algorithm. Our treatment of the future-future error allocates the errors to unmatched unit-label vertex pairs in the FCM according to the local properties of the graphs, so that the FCM contains both the exact past-future and the estimated future-future error information. While only an estimate of the future future error information can be used, it is important to maximize its contribution to the total error. Rather than counting edge inaccuracies, we found it simpler to modify our error computation formula to count errors only when an edge appears in the unit graph but not in the label graph. This type of error is called an edge deficiency and counts as two errors. If the number of label edges equals the number of unit edges, then each missing unit edge would have a corresponding missing label edge. In general, we use this formula to compute the error from the number of deficiencies: Error = Ilabel e d g e s ] - ] u n i t edges[ + 2[unit-edge deficiencies[. Other than simplifying the logic of our algorithms, the new edge counting method is equivalent to the standard one. When will henceforth count edge deficiencies rather than edge differences. 
One way of estimating the future-future error compares the in-degrees or out-degrees of vertices in the unmatched subgraphs (future-future edges). For instance, counting in-edges, a unit vertex with three in-future-future edges matched to a label vertex with one in-future-future edge yields a two-edge deficiency, so an error of 4 is attributed to the unit-label pair. A better method uses the average of the in- and out-edge deficiencies as the error estimate, rather than the in-edges or out-edges alone. This tends to spread the errors among more elements of the FCM, which generally gives a higher FCM solution. Also, after the FCM is solved, its solution may not be an even integer. Since the actual error contribution of the FCM must be an even integer, we can round the solution up to the nearest even number. Exploiting this further, our current O(n²) future-future error formula


[Unit graph and label graph drawings]

Ex. 1. Matching unit vertices (1,2,3,4,5,6) with label vertices (A,B,C,D,E,F) (ignore box) yields 7 incorrect edges. With 9 label edges, 10 unit edges, and 4 unit edges without label-edge counterparts under this matching, the edge-deficiency formula also yields an error of 9 - 10 + 4*2 = 7.

Ex. 2. Now match (1,2) with (A,B), with the other vertices unmatched. The future-future edge counts are:

    Unit In   (3,4,5,6): 0 1 2 1
    Unit Out  (3,4,5,6): 2 1 1 0
    Label In  (C,D,E,F): 0 1 1 2
    Label Out (C,D,E,F): 1 2 1 0

The base error (|label edges| - |unit edges| + 2*past-past deficiencies) is 9 - 10 + 1*2 = 1. The total predicted error is this base error plus the solution of the FCM matrix. The rows of the FCM correspond to vertices (3,4,5,6); the columns to vertices (C,D,E,F). Each element of the FCM contains the past-future and future-future deficiencies for its unit-label pair, pf-error + 1.01(in-deficiencies) + 0.99(out-deficiencies):

    2 + 1.01(0) + 0.99(1)   4 + 1.01(0) + 0.99(0)   4 + 1.01(0) + 0.99(1)   4 + 1.01(0) + 0.99(2)
    0 + 1.01(1) + 0.99(0)   0 + 1.01(0) + 0.99(0)   0 + 1.01(0) + 0.99(0)   0 + 1.01(0) + 0.99(1)
    0 + 1.01(2) + 0.99(0)   2 + 1.01(1) + 0.99(0)   2 + 1.01(1) + 0.99(0)   0 + 1.01(0) + 0.99(1)
    0 + 1.01(1) + 0.99(0)   2 + 1.01(0) + 0.99(0)   2 + 1.01(0) + 0.99(0)   0 + 1.01(0) + 0.99(0)

Total FCM:

    2.99   4.00   4.99   5.98
    1.01   0.00   0.00   0.99
    2.02   3.01   3.01   0.99
    1.01   2.00   2.00   0.00

Fig. 3. Future-future edges.

for the unit-label pair (u, l) is:

    future-future error(u, l) = 1.01 (in-edge deficiencies) + 0.99 (out-edge deficiencies).

This improves upon the straight average when the sum of the in-edge and out-edge deficiencies is even and the number of in-edges exceeds the number of out-edges. The past-future and future-future contributions to the error in the FCM are illustrated in Fig. 3. Other strategies to improve the future-future error contribution might take into account the number of edges attached to the vertices adjacent to each vertex, rather than at the vertex itself; this would make the errors less local. We have not yet implemented such a method.
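For a single unit-label pair the formula above can be written directly. In the sketch below (our names, not the paper's), a deficiency counts only edges that the unit vertex has but the label vertex lacks; the check values come from the worked example in Fig. 3 (unit vertex 3 against label vertex C, and unit vertex 5 against C).

```python
def future_future_error(unit_in, unit_out, label_in, label_out):
    # A deficiency is an edge present at the unit vertex but missing at
    # the label vertex; a surplus on the label side contributes nothing.
    in_def = max(unit_in - label_in, 0)
    out_def = max(unit_out - label_out, 0)
    # Asymmetric weights (1.01 / 0.99) rather than a straight average,
    # per the formula in the text; the FCM solution built from these
    # values is later rounded up to the nearest even integer.
    return 1.01 * in_def + 0.99 * out_def

# Unit vertex 3 (in 0, out 2) vs label C (in 0, out 1) -> 0.99
# Unit vertex 5 (in 2, out 1) vs label C (in 0, out 1) -> 2.02
```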


Table 1. List of experiments

    Test   Graph 1   (V, E)     Graph 2   (V, E)     Start error   Depth     Best error
     1     m8a       (8, 20)    m8b       (8, 20)    none          none      12
     2     m13a      (13, 25)   m13b      (13, 30)   none          none      19
     3     m17a      (17, 25)   m17b      (17, 27)   6             none      2
     4     m17a      (17, 25)   m17c      (17, 27)   6             none      2
     5     m17b      (17, 25)   m17c      (17, 27)   6             none      0
     6     m27a      (27, 54)   m27b      (27, 55)   10            none      0
     7     m16a      (16, 20)   m16b      (16, 20)   26            depth 3   none
     8     m19a      (19, 40)   m19b      (19, 40)   24            depth 2   none
     9     m19a      (19, 40)   m19b      (19, 40)   32            depth 4   none
    10     m27a      (27, 54)   m27b      (27, 55)   40            depth 6   none
    11     m27a      (27, 54)   m27b      (27, 55)   56            depth 8   none

Table 2. Results for algorithms (A), (B), (C) and (D)

6. EXPERIMENTS AND RESULTS

If we seek a unit-label mapping only when its error is below a certain threshold, it is desirable for faster cutoffs to run the algorithm with this threshold as the initial best error. The algorithm will then find the best mapping that beats the threshold or decide that no such mapping exists. The efficiency gained from a low initial best error is often great enough that it pays to start with a low value and try successively higher initial best errors until a mapping is found. We tested our improved algorithms against the original forward-checking algorithm on several different graphs, with different initial errors. In later tests we searched a small branch of the entire search tree using a low initial error. This error was known to be below the smallest error for mappings in that branch, but larger than the smallest error over all mappings in the tree. These later tests were designed to simulate a typical portion of the entire search, which would not be feasible for the slower algorithms. The four algorithms we employed for each graph were:

(A) The original forward-checking algorithm presented in the paper by Shapiro and Haralick (algorithm ), without future-future considerations.
(B) The original algorithm to solve the FCM, with our best future-future error method (in/out-edge deficiencies added to the FCM).
(C) The best O(n²) FCM approximation (algorithm 3) used to solve the FCM, with our best future-future error method.
(D) The forward-checking matrix solved exactly, with our best future-future error method.

Algorithms (A)-(C) take O(n²) time per step, while (D) takes O(n³) time per step, running several times slower on a 19-vertex graph. Tests 7-11 examine small branches of the tree. The experiments are listed in Table 1, with the names of the graphs, the numbers of vertices and edges, the initial error, the depth of the branch (if applicable) and the best error found (if applicable). The numbers in Table 2 represent moves in the search tree. The starred items in column A are estimates found by comparing 500,000 moves with the number of moves taken by algorithm B to reach the same permutation.

    Test        A        B       C      D
      1       4641      803     405    343
      2     186059    18143    3255   2251
      3       6.0M*   20327     121     79
      4       1895      203      65     63
      5        803      147     109     99
      6       8.0M*    4251     347    239
      7      79186     3038     682    402
      8     160467     2615     391    267
      9     350169      663      77     55
     10       1.4M*   10279       5      4
     11       9.8M*   45447     707     93

These tests were run on a serial computer, but we have also run experiments on a MasPar parallel computer, with similar results. Our parallel programming approach first divides the entire permutation search into branch permutations, so that every processor has a queue of branches on which to work. The branches are initially all of the same depth, and are divided so that a maximal number of sizeable branches is given to each processor. If some PEs are without branches and all the remaining branches are larger than the minimal size, the branches are split into smaller branches and spread among the PEs. Whenever a new best error is achieved, it is broadcast to every PE so as to maximize pruning. In this context, we have described and analysed strategies that significantly improve the pruning function, leading to a smaller search tree and reduced search times. The details of the parallel implementation can be found in reference (11).

7. CONCLUSION

We have presented a method for solving the relational distance graph matching problem for unlabeled graphs on a parallel computer. We have discussed


methods involved in finding a feasible way to store the graphs, to represent the space of all possible matches and to manipulate the branches of the search tree, and we have described and analysed strategies to estimate the relational distance between edges in the unmapped portion of the graph and to calculate the minimum possible error resulting from the forward-checking matrix. We have developed a method that estimates the future-future error in the unmapped portion of the graph. This estimated error is added to the base values of the FCM, after which the best possible error for all mappings in the branch of the search tree is estimated from the data in the FCM. Finally, we have calculated the minimum error from the values in the FCM. We have shown that finding this minimum error is equivalent to solving the weighted bipartite matching problem. Our estimation methods take O(n²) time, while finding the exact minimum error is an O(n³) operation. The original future-error estimation method in the Shapiro and Haralick algorithm was to take the minimum value in each row of the FCM and add these to get a lower bound for the error. This is a poor estimate because many of these minima can occur in the same column, which ignores much of the matrix. An improvement is to force a row to take its second minimum if another row chose the same column and replacing the other row's choice would be more expensive than replacing this row's. A further improvement is to precondition the matrix so that such column collisions are less likely. These improvements are enough to make

the FCM estimate approach the exact WBM solution for moderate-sized matrices.

REFERENCES

1. H. G. Barrow, A. P. Ambler and R. M. Burstall, Some techniques for recognizing structures in pictures, Frontiers of Pattern Recognition, pp. 1-29, Academic Press, New York (1972).
2. J. R. Ullman, An algorithm for subgraph isomorphism, J. Ass. Comput. Mach. 23, 31-42 (1976).
3. A. Rosenfeld, R. A. Hummel and S. W. Zucker, Scene labeling by relaxation operators, IEEE Trans. Syst. Man Cybernet. SMC-6, 420-443 (1976).
4. R. M. Haralick and G. Elliot, Increasing tree search efficiency for constraint satisfaction problems, Proc. 6th Int. Joint Conf. Artif. Intell. (1979).
5. R. M. Haralick and L. G. Shapiro, The consistent labeling problem: Part 1, IEEE Trans. Pattern Anal. Mach. Intell. PAMI-1, 173-184 (1979).
6. L. G. Shapiro and R. M. Haralick, Structural descriptions and inexact matching, IEEE Trans. Pattern Anal. Mach. Intell. PAMI-3, 504-519 (1981).
7. L. G. Shapiro and R. M. Haralick, A metric for comparing relational descriptions, IEEE Trans. Pattern Anal. Mach. Intell. PAMI-7, 90-94 (1985).
8. A. Sanfeliu and K. S. Fu, A distance measure between attributed relational graphs for pattern recognition, IEEE Trans. Syst. Man Cybernet. SMC-13, 353-362 (1983).
9. K. Sengupta and K. L. Boyer, Information theoretic clustering of large structural modelbases, Proc. IEEE Conf. Comput. Vis. Pattern Recognition, 174-179 (1993).
10. R. M. Haralick and J. Kartus, Arrangements, homomorphisms, and discrete relaxation, IEEE Trans. Syst. Man Cybernet. SMC-8, 600-612 (1978).
11. R. Allen, D. Yasuda, S. Tanimoto, L. Shapiro and L. Cinque, A parallel algorithm for graph matching and its MasPar implementation, Proc. IEEE Workshop Comput. Architect. Mach. Percept. 13-18 (1993).

About the Author--LINDA G. SHAPIRO was born in Chicago, Illinois, in 1949. She received the B.S. degree in Mathematics from the University of Illinois, Urbana, in 1970, and the M.S. and Ph.D. degrees in Computer Science from the University of Iowa, Iowa City, in 1972 and 1974, respectively. She was an Assistant Professor of Computer Science at Kansas State University, Manhattan, from 1974 to 1978, and was an Assistant Professor of Computer Science from 1979 to 1981 and Associate Professor of Computer Science from 1981 to 1984 at Virginia Polytechnic Institute and State University, Blacksburg. She was Director of Intelligent Systems at Machine Vision International in Ann Arbor from 1984 to 1986. She is currently Professor of Computer Science and Engineering and of Electrical Engineering at the University of Washington. Her research interests include computer vision, artificial intelligence, pattern recognition, robotics and spatial database systems. She has co-authored two textbooks, one on data structures and one on computer and robot vision. Dr Shapiro is a senior member of the IEEE Computer Society and a member of the Association for Computing Machinery, the Pattern Recognition Society and the American Association for Artificial Intelligence. She is past Editor of CVGIP: Image Understanding and is currently an editorial board member of IEEE Transactions on Pattern Analysis and Machine Intelligence and of Pattern Recognition. She was co-Program Chairman of the IEEE Conference on Computer Vision and Pattern Recognition in 1994, General Chairman of the IEEE Workshop on Directions in Automated CAD-Based Vision in 1991, General Chairman of the IEEE Conference on Computer Vision and Pattern Recognition in 1986, General Chairman of the IEEE Computer Vision Workshop in 1985 and co-Program Chairman of the IEEE Computer Vision Workshop in 1982; she has served on the program committees of a number of vision and AI workshops and conferences.

About the Author--STEVEN L. TANIMOTO is Professor of Computer Science and Adjunct Professor of Electrical Engineering at the University of Washington in Seattle. His interests include parallel image processing, artificial intelligence and educational technology. He has held visiting positions at the University of Paris, Linköping University (Sweden), Kobe University (Japan), the National Institute of Standards and Technology and Thinking Machines Corporation. Dr Tanimoto served as an Associate Editor of IEEE


Transactions on Pattern Analysis and Machine Intelligence from 1983 to 1986, and as Editor-in-Chief from 1986 to 1990. His text, The Elements of Artificial Intelligence Using Common Lisp, Second Edition, was published by W. H. Freeman in 1995. He served as the program chair for the 1994 International Conference on Pattern Recognition subconference on parallel computation and as co-program chair of the 1994 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. He received the A.B. from Harvard in 1971 and the Ph.D. from Princeton in 1975.

About the Author--LUIGI CINQUE received his Ph.D. degree in Physics from the University of Napoli in 1983. After a few years spent at the Artificial Intelligence Laboratory (Selenia S.p.A.), working on process monitoring and malfunction diagnosis, expert systems, user interfaces and knowledge-based vision systems, in 1990 he joined the Department of Computer Science of the University "La Sapienza" of Rome as a research associate, working on object recognition, parallel architectures and algorithms for image processing and computer vision. In 1992 he was a visiting scientist at the University of Washington (Seattle, USA), working on parallel graph matching algorithms and their implementation on parallel machines. His current interests include parallel algorithms and architectures, pattern recognition, 3D object recognition and computer vision. Dr Cinque is a member of the IEEE Computer Society, the Pattern Recognition Society and the International Association for Artificial Intelligence.

About the Author--ROBERT ALLEN received his B.S. in Computer Software from Regent's College of the State of New York. After a successful career in non-profit business management, he entered the M.S. program at Stanford University, where he earned his master's degree in Computer Science. He is currently employed by Hewlett-Packard Corporation.

About the Author--DEAN YASUDA received his B.S. in Mathematics from the University of Washington in 1988, and his M.A. in Mathematics from the University of California, San Diego, in 1990. He is currently a graduate student at the University of Washington and a scientific computer programmer at the Center for Quantitative Sciences, University of Washington, working on tracking the wild salmon runs in the Columbia River tributaries.
