Pattern Recognition, Vol. 31, No. 2, pp. 205-218, 1998 © 1997 Pattern Recognition Society. Published by Elsevier Science Ltd Printed in Great Britain. All rights reserved 0031-3203/98517.00 + .00
Pergamon
PII: S0031-3203 (97) 00041-1
A NEW A L G O R I T H M FOR SUBGRAPH O P T I M A L ISOMORPHISM YASSER EL-SONBATY t and M. A. I S M A I L : t Department of Computer and Electrical Engineering, Arab Academy for Science & Technology, Alexandria 1029, Egypt Department of Computer Science, University of Alexandria, Alexandria 21544, Egypt (Received 30 April 1996; accepted 13 February 1997)
Abstract--In this paper a new algorithm for subgraph isomorphism is proposed. The main idea of the new algorithm is to decompose the graphs to be matched into smaller subgraphs. The matching process is then done at the level of the decomposed subgraphs based on the concept of error-correcting transformations. The cost of matching two graphs is defined as the minimum of a weighted bipartite graph constructed from the decomposed subgraphs. The average computational complexity of the proposed algorithm is found to be O(N4). The results of the application of the new algorithm show that the new technique is quite efficient and, in many respects, superior to similar existing techniques in the literature. Besides, it overcomes most of the problems encountered in using tree search algorithms. The new algorithm is also suitable for parallel processing implementation. © 1997 Pattern Recognition Society. Published by Elsevier Science Ltd. Decision tree Direct transformation Subgraph isomorphism Graph decomposition Weighted bipartite graph
I. INTRODUCTION Describing structured objects in terms of their subpatterns, primitives and mutual relations, plays an important role in many applications in pattern recognition, image analysis and computer vision, t~) Attributed relational graphs (ARGs) are one of the most powerful tools in describing structured objects. In this representation, nodes represent primitives or subpatterns of structured objects and branches between nodes represent relations between primitives or subpatterns. ~1) One way to recognize the structure of an unknown pattern is to transform it into an ARG, then match it with other ARGs representing structures of prototype patterns. This process of matching is called graph isomorphism. Formally, two graphs G and G' are said to be isomorphic (to each other) if there is a one-to-one correspondence between their vertices and between their edges such that incidence relationship is preserved/2) If the isomorphism is encountered between a graph and a subgraph of another larger graph, then the problem is called subgraph isomorphism or graph monomorphism. The notion of optimality becomes important in subgraph isomorphism when a perfect matching between the attributed values of the two graphs may not be available/3) Inexact subgraph matching of weighted relational graphs has been investigated by Shapiro and Haralick.cA) Relaxation methods were used in Kitchen, <5) Davis, ~6) Li, t7) Christmas et al. ~s) and Berthod et al. ~9) Ghahraman et al. ~°) proposed an algorithm based on
getting subgraphs of the Cartesian graph product. Subgraph error-correcting isomorphism was introduced by Tsai and Fu t11) and Eshera and Fu/12'1a) An algorithm for graph optimal monomorphism was proposed by Wong et al. ~13) The approach of neural networks was also used for solving the problem of graph isomorphism like in references (14-16). Some recent techniques for graph matching were introduced by Gold and Rangarjan ~17)based on combining graduated nonconvexity and the two-way (assignment) constraints for matching both weighted and attributed relational graphs, and by Depero et al. (~s) using a direct classification of node attendance. Various applications of the subgraph optimal isomorphism have been outlined in reference (10). An experiment on image understanding by inexact graphmatching can be found in reference (12). Trials for matching graphs of the same size can be found in references (19-27). In all these papers, we can find just few papers for matching attributed relational graphs (1'3'11-13'17'xs) while there are many researches for matching other types of graphs. Most of the techniques mentioned above and which deal with the problem of subgraph optimal isomorphism for attributed relational graphs have the problem of computational complexity as it may grow exponentially when increasing the sizes of matched graphs. Some algorithms have polynomial complexity for special types of graphs but still have exponential complexity at the worst case. The core of this paper is to introduce a new algorithm for solving the problem of subgraph optimal
205
206
Y. EL-SONBATY and M. A. ISMAIL
isomorphism based on the idea of decomposing the graphs into smaller subgraphs and perform the matching between the graphs at the level of their decomposed subgraphs. The main advantage of the proposed algorithm is its average computational complexity which is O(N4), while the worst case complexity is O(NS). The definition of the attributed relational graph, how to decompose it into smaller subgraph, and finally how to match the decomposed subgraphs are shown in Section 2. Section 3 introduces the proposed algorithm for subgraph optimal isomorphism with analysis of its computational complexity. Experimental results are presented in Section 4, and finally, the conclusions of the proposed algorithm are given in Section 5.
2. G R A P H D E C O M P O S I T I O N
In this section the definition of attributed relational graphs is reviewed, followed by an introduction as to how to decompose a graph into smaller subgraphs, and finally how to match these decomposed subgraphs.
2.1. Definition of attributed relational graphs An attributed relational graph (ARG) is defined a s (11)
G = (N, B, A, E, GN, GB), where N (N = {nl, nz . . . . . nlNI}). A finite set of nodes; [NI is the number of nodes in N. B (B = {bl, b2 . . . . . blBi).A set of ordered node pairs (or directed branch) i.e., b = (ni, ni) for some 1 < I, J < [NI denotes the branch emanating from node nl to node nj, and Inl is the number of branches in B.
A An alphabet of node attributes. E An alphabet of branch attributes. GN A function (or a set of functions) for generating the node attributes. GB A function (or a set of functions) for generating the branch attributes.
2.2. Decomposition of attributed relational graphs The main idea of decomposing the ARGs is that, the smaller the graph structure, the easier the matching process, and this will result in lower complexity than matching the original ARGs, which is the main contribution of this paper. The graphs resulting from decomposing an ARG are called basic attributed relational graphs (BARGs) and this notion is adopted from reference (13). The BARG is in the form of one-level tree which consists of a node and all nodes connected to that node whether the connected branches are in or out of that node. BARGs are different from those in reference (13) where only emanating branches and nodes on which these branches terminate are encountered. The new
(b)( e ~ )
(e)
!
(d)
Fig. 1. (a) Original graph; (b), (c), (d), (e), (f) Graph decomposition of (a).
definition gives the matching process more stability and preserves the structure of the original graph. Figure 1 shows an example of decomposing a graph with five nodes into five basic graphs 1-(b)-(f). At each step of decomposition, a node from the original graph is selected to be the root node and its basic graph is completed according to the structure of the original graph. The basic graphs for nodes a, b, c, d, and e are shown in Figs 1 (b)-(f), respectively. The graph decomposition can be done using the following algorithm: Decompose (Graph) F o r i = 1 to No_of_Nodes(Graph) BARG [i] = {} For i = 1 to No_of_Nodes(Graph) For j = 1 to No_of_Nodes(Graph) ir aij ~ 0 then BARG[i] = BARG[i] U ( j t h node, aij) {undirected graph} if alj or alj ~ 0 then BARG[i] = BARG[i] U(jth node, aij, aij) {directed graph}
f
End Decompose
2.3. Matching BARGs Matching BARGs is much easier than matching ARGs and this is because the structure of BARGs is simpler and easier in processing than ARGs. The same distance measures for ARGs can also be adopted for BARGs. There are three general methods for calculating the distance between two ARGs: (1) extracting the main features of each graph and computing the distance as a function of the distance between the features. This type of distance needs the graphs to be well defined by specific features and this cannot be available all the time. One of the applications based on this methodology can be found in. (28) (2) based on grammar matching. Grammar matching consists of a set of nonterminal node and edge labels, a set of terminal node and edge labels, a starting graph and a set of productions. The left and right sides of a production are graphs. The language generated by a graph grammar is the set
A new algorithm for subgraph optimal isomorphism of all graphs with terminal node and edge labels that can be derived from the starting graph using productions. The distance is based on the minimum number of rules applied to the staring graph of the grammar to produce the input graph. There are some problems in using graph grammar that graph grammar is not applicable when training samples are too few to infer pattern grammar or when each pattern itself could be regarded as a prototype of the pattern class. Also, the construction of graph grammar from a group of graphs is a tough problem with high complexity associated with the parsing operation which is needed to determine if the input graph is syntactically well formed in the context of one or more prespecified grammar. ~29) A study about graph grammar was introduced by Shaw, ~3°' 31) followed by Web grammar. ~32'33) Some research in the theory of graph grammar has been carried out by Pavlidis. ~34'35) A survey about grammar matching can be found in. t36) (3) using the concept of error-correcting transformation. The cost of matching two ARGs is defined as the cost of the sequence of transformations that possesses minimum total cost and that must be performed on one of the two ARGs in order to produce the other ARG. The operations permitted to transform one A R G to the other are: node insertion, node deletion, branch insertion, branch deletion, node label substitution and branch label substitution. A cost (weight) is associated with each operation and its value is determined by an optimization procedure or heuristically. The costs corresponding to each operation are: Wni, Wna, Wbi, Wbd, Wns, and Wbs respectively. This type of distance is logical and intuitive but its main problem is the complexity that may grow up exponentially when increasing the graph size. Some research in this area can be found in references (1,11-13,37-39). In this paper we adopt the method of error-correcting transformations to calculate the distance between two BARGs because it is more logical and easier in processing than other methodologies, besides its
results have strong interpretation. A new operation structure preservation is added here with cost Wsp.The main function of the new operation is to help preserving the global structure of the original ARG after performing the operation of graph decomposition and its use will be declared later in this section. Given two BARGs U and V, as shown in Fig. 2, the cost of matching U and V is Distance (U,V) = Wns*dist(ni, vj) + min(w,i, w,d)*abs (k - p) + min(wbi, Wba)*abs(k - p) + dist (b's,e's)
.o
.
(1)
where • dist(n~,vj) is the distance between node labels of ni and v~ and node attributes of n'i and v), and is computed depending on their data types. • k and p are the number of branches connected to the root node of the BARGs being matched, and • dist(b's,e's) is calculated as the minimum of a weighted bipartite graph constructed from b's and e's as their nodes and its structure is as shown in Fig. 3. There are many algorithms ~4°- 45) for computing the minimum of a weighted bipartite graph which is defined as the complete matching with the sum of the weights of matching edges is minimum. ~46) This process is known to have O(N 2) average case complexity, and O(N 3) worst-case complexity. (4s) The weight of any branch connecting two nodes, say by and eh in the bipartite graph is the distance between by and eh and is calculated as follows: distance(by, eh) = Wbs*dist(bf, eh) 4- Wsr,*dist(n r, Vh) (2) where dist(x,y) is the distance between x and y, and their attributes, which is computed based on their data type. Illustrative example: assume we would like to get the cost of matching of two BARGs U and V shown in Fig. 4.
n" I
.
207
J
.o
-
oo
.. V"
U
V
Fig. 2. Two BARGs; U and V.
P
208
Y. EL-SONBATY and M. A. ISMAIL
el bI
e2
b,.
e3
bk
ep
Fig. 3. Weighted bipartite graph corresponding to branches distance.
So, d i s t a n c e ( U , V ) = l + l + l + 2 = 5 .
3. THE ALGORITHM
u
v
Fig. 4. Two B A R G s to be matched.
Distance (U,V) = w,s*dist(m, n) + min(wni, w,d) *abs(k - p) + min(wbl, Wbd)*abs(k -- p) + dist(branches of U, branches of V) We assume, for simplicity, that the distance between any two qualitative values is equal to 0 if they are identical, 1 otherwise. If the values are numeric, the distance is defined as the absolute difference between these values. The values of Wni, Wnd, Wbi, Wbd, Wns, Wb~and Wsp are all assumed to equal 1. k = number of branches in U = 2 p = number of branches in V = 3 dist(m,n) = 1
min (Wni,Wnd)* abs(k -- p) = 1 min(wbi,wb~) * abs(k - p) = 1 dist(branches of U, branches of V) =
®®
(~ (~
The main idea of the proposed algorithm is to decompose both reference and input graphs; say Gr and Gi, respectively; to be matched into BARGs as previously mentioned in Section 2.2. A distance matrix "D" between reference and input graphs is then constructed and is equivalent to the distances between the BARGs of both reference and input graphs. The labels of the rows in the distance matrix represent the BARGs of the input graph, while the labels of the columns represent the BARG~ of the reference graph. Di,j represents the distance between the ith BARG in the input graph and the jth BARG in the reference graph and is calculated as mentioned in Section 2.3. After calculating the distance matrix D, a weighted bipartite graph is constructed which is equivalent to the distance matrix D where each branch connecting two nodes, one represents a BARG from the input graph and the other node represents a BARG from the reference graph, has a weight equivalent to the distance between the two BARGs connected by this branch as shown in Fig. 5. The minimum of the weighted bipartite graph which is equivalent to the distance matrix is considered as the cost of matching the input and the reference graphs. Every pair of BARGs (one BARG from input graph and the other is from reference graph) in the weighted bipartite graph connected by a branch whose weight is taken in calculating the minimum of the weighted bipartite graph, is considered to be matched, i.e., the
®
1+3
1+3
0+I
i 1+I
0+I
1"3
®® ®
,
@ ,
[5;
=
2
A new algorithm for subgraph optimal isomorphism
REFERENCE
R
GRAPH
BARGI BARC.2r ...................... BARGI BAR~ BARC~
209
BAR(:
X
X . . . . . . . . . . . . . . . . . . .
X
X
X
. . . . . . . . . . . . . . . . . . .
X
X
X . . . . . . . . . . . . . . . . . . .
X
!
N P U T BARCn
BARG~
E F E
BARG2
R
BARGa
N C E
G G
R
G
A P BARC,m x
X . . . . . . . . . . . . . . . . . . .
X
R
B,~RG,
H
A P H
(a)
(b)
Fig. 5. (a) The distance matrix between input and reference graphs. (b) Weighted bipartite graph corresponding to the distance matrix.
R
BARG) BARC~ r2
E
ff~
BARC~.RE BARC~N
so,.
C
G
0v
R
G
A P H
R
BARC~O
~ BARG.AP
lJ
V
[]
(a)
(b)
Fig. 6. (a) The structure of the minimum weightedbipartite graph. (b) If U matches V then ni matches vj.
node rooted the BARG of input graph matches the node rooted the BARG of reference graph. Figure 6 explains the above idea.
(5) Construct the weighted bipartite graph equivalent to the Distance_matrix. (6) Cost_of_matching = the minimum of the weighted bipartite graph.
3.1. Proposed algorithm
3.2. Analysis of computational complexity
The proposed algorithm can be summarized in the following steps: (1) Read both reference and input graphs. (2) D e c o m p o s e (reference graph) (3) Decompose (input graph) (4) For i = ! to No_of~nodes(input graph) F o r j = I to No_of_nodes(reference graph) Distance-matrix [i,j] = distance between ith BARG of input graph and jth BARG of reference graph {review equations 1,2} End For End For
It is well known that the graph matching problem belongs to the class of NP problems and any conventional search algorithm will require exponential time. The proposed cost of matching is defined as the minimum of the weighted bipartite graph constructed from the BARGs of both input and reference graphs as its nodes and the weight associated with a branch connecting two BARGs is defined as the minimum total-cost sequence of error transformations that is required to transform one BARG to the other. In this section we investigate the computational complexity of the proposed approach.
210
Y. EL-SONBATY and M. A. ISMAIL
Suppose that we are given a reference graph R with M nodes and an input graph I with N nodes. The first step in the algorithm is to decompose both R and I into BARGs. It is obvious that this step is done in quadratic time for each graph, i.e., has time complexity of O(M 2 + N2). The second step in the algorithm is to calculate the distance between all the BARGs of input and reference graphs. The computational complexity of matching two BARGs mainly depends on the calculation of error-transformations between the two BARGs and the computational complexity of getting the minimum of a weighted bipartite graph. The computational complexity of calculating the error-correcting transformations is O(62), where 6 denotes the maximum number of branches connected to any node in the matched BARGs. The average computational complexity of getting the minimum of a weighted bipartite graph is O(M*N) [45], assuming that any node is connected to all other nodes in the graph. So we can consider the computational complexity of matching two BARGs is O(M*N). The calculation of the distance between the BARGs is repeated for all the BARGs of both input and reference graphs, and that means, the average computational complexity of second step is O(M2*N2), where M*N >~62. The best and
~feC
worst cases for this step have computational complexity of O(M*N min(M,N)) and O(M2*N 2 min(M,N)), respectively. The third step is to calculate the cost of matching input and reference graphs and which is defined as the minimum of the weighted bipartite graph constructed in step 2 and the computational complexity of this step is O(M*N). In summary, the average computational complexity of the proposed algorithm in calculating the cost of matching two ARGs with M and N nodes is O(M2*N2). The best and worst computational complexity are O(M*N min(M,N)), and O(M2*N 2 min(M, N)), respectively. It can be easily proven that the average computational complexity of the proposed algorithm with respect to the number of edges is O (l p), where I and p are the number of edges in the two graphs. The computational complexity of the new algorithm is much better than the graduated assignment algorithm I1~1that has a low-order complexity of O(lp) and also than algorithms (1' 3.11-13) which have exponential complexity. Illustrative example: assume we need to match the two letters shown in Fig. 7(a). The primitives and relations used in representing these letters are shown in Fig. 7(b), where (X .... Y.... Xmi,, Ymin) shows the
(Xmax, Ymax, Xmin, Ymin) 1 I I I
e
cl
L =
VY
Primitives
{ o 1 2 3
equal le~ than greaterthan all possible
Relations
(a)
(b)
(09-J2)~
v
(/(22,02)
"r" (c)
Fig. 7 (a) Letters to be matched. (b) Primitives and relations. (c) Graph representation of the two letters.
A new algorithm for subgraph optimal isomorphism
211
l.O,t~.o.2)
02 02
7
11
7
t,0A,0)~
6
Fig. 8. Distance matrix of graphs in Fig. 7(c).
/
v
\
y
Fig. 9. Matched strokes obtained from the minimum of the weighted bipartite graph equivalent to Fig. 8. relations between the x - y coordinates of two primitives. The graphs equivalent to the letters are shown in Fig. 7(c). Figure 8 shows the distance matrix equivalent to the graphs shown in Fig. 7(c) where we assumed the distance between two primitive is equal to the absolute difference of their order and the distance between any two relations is equal to the sum of the binary distance (0 when identical, 1 otherwise} between the four relations Xm,x, Ymax, Xmin and Ymin. The values of w.~, w.a, wbi, Wbd,Wbs,Wns and Wspare all assumed to equal 1. In getting the minimum of the weighted bipartite graph equivalent to the distance matrix, the cost of matching the two letters is 3 + 3 = 6. Figure 9 shows the mapping between matched strokes in the two letters obtained from the minimum weighted bipartite graph in Fig. 8. 4. EXPERIMENTALRESULTS In this section, the proposed algorithm is tested using some experiments that show its efficiency and capability of handling various ARGs with different structures. Experiment 1: one of the computer vision problems is used to test the proposed algorithm. The problem is to identify an image graph of the prominent runways of the Jacksonville international airport from the picture shown in Fig. 10: 3' '~) The image graph has to be matched with three runway models of the open-V runway configuration. Airports that have open-V runway are Houston intercontinental airport,
(a)
1 2
3 (b)
Fig. 10. (a) Photograph of Jacksonville International Airport.(3:7J (b) Image graph of Jacksonville International Airport. Jacksonville international airport, and finally, Midcontinent airport at Wichita and are shown in Fig. 1 l(a), (b), and (c), respectively. The matrix representation of the attributed graphs of Figs 10 and 11 are shown in Tables 1-4 respectively. An entry a~j of each matrix represents: (1) The attribute value of vertex i, i.e., the length of runways i, for i --j; or otherwise,
212
Y. EL-SONBATY and M. A. ISMAIL Table 1. Attributed values of Fig. 6 1
1 2 3 4 5 6
654
62
2
3
4
5 + 0 54+82 27+70 59 49+82 22+70 60 27+12 25 Symmetrical
5
31+12 26+12 23+70 4 + 58 26
6
29+41 24+41 25+41 7 + 29 7 + 29 54
Table 2. Attributed values of Fig. 7(a)
(.) 1
2 3 4 5 6
6
1
2
6
4
(c) Fig. l 1. (a) Runway model A. (b) Runway model B. (c) Runway model C.
(2) The attributed value of edge (/j), i.e., the centre-tocenter distance between runway i and runway j plus the cute distance angle between them. TM In this example, we find optimal matching between the image graph and each of the three models. The values needed for w's are given in order to dominate the weights of Wspto preserve the graph structure and
1
2
50
3+0
50
3
4
5
6
5+0
38+62
2+0 60
35+62 35+62 65
40+62 37+62 37+62 2+0 65
42+62 39+62 39+62 5+0 3+ 0 65
Symmetrical
Wb~for the branch substitution of the second attribute (cute angle between runways) in order to minimize the effect of rotation encountered when taking the image from different views. Table 5 shows the output of the proposed algorithm, which indicates that the image graph of the Jacksonville runways and runway model B incur the lowest cost of matching. The results of matching are consistent with the results published in reference (3). Experiment 2: In this experiment, an application from character recognition is used. The problem is to match a group of multi-font characters shown in Fig. 12(a) with a set of normal-font characters shown in Fig 12(b). Each character is represented as an ARG, where nodes represent primitives which are shown in Fig. 13. The attributes of an edge, connecting nodes ni and nj, are in the form of (X .... Y.... Xmi,, Ymin,Connection), where X .... Y.... X~i,, and Yminhave the values 0, 1, 2, or 3 when X - Y coordinates of ni and nj are in the relation = , < , > , or all possible, respectively. Connection has the value of 1 when ni and nj are connected, 0 otherwise. Each character in Fig. 12(a) is matched with all characters shown in Fig. 12(b) and the character with the minimum distance is selected as the matched character. Table 6 shows the results of the matching process. From Table 6, the multi-font characters are correctly mapped to their equivalent normal-font characters except ~ , I and ~ because these characters are closer in structure to Z, T and 0 than ~, z and 2. Experiment 3: In this experiment we use the simulation method to test the performance of the new algorithm in handling sparsely connected graphs.
A new algorithm for subgraph optimal isomorphism
213
Table 3. Attributed values of Fig. 7(b)
1
1
2
3
4
5
6
70
6 + 0
45 + 60
70
40 + 60 72
49 + 60 45 + 60 6+ 0 72
22 + 78 17 + 78 23 + 20 27 + 20 30 10 + 42
27 + 20 22 + 20 18 + 78 23 + 78 5 + 82
2 3 4 5 6 7
Symmetrical
30
7
28 + 22 + 22 + 27 + 10 +
60 60 60 60 40
50
Table 4. Attribute values of Fig. 7(c)
1
1
2
3
4
5
6
7
8
75
17 + 0 38
44 +50 38 +50 37
51 +50 45 +50 9 +0 55
47 +50 46 +50 13 +0 17 +0 85
21 +65 23 +65 24 +65 32 +65 25 +65 25
21 +65 25 +65 26 +65 34 +65 26 +65 4 +0 22
27 +25 37 +25 37 +25 45 +25 32 +25 19 +90 15 +90 30
2 3 4 5 6 7 8
Symmetrical
Table 5. Results of subgraph optimal isomorphism Base graph
Matching cost
Model A
5660
Model B
3847
Model C
4430
Vertex mapping (1,6),(2,5),(3,1), (4,2),(5,4),(6,3) (1,1),(2,2),(3,4), (4,5),(5,6),(6,7) (1,5),(2,4),(3,1), (4,2),(5,3),(6,8)
Attributed relational graphs of size 100 are generated with 10% degree of connectivity. The weights of edges are real numbers and are produced randomly in the range of 0-1. Nodes have either three or five random binary valued attributes. After the generation of the graphs, 60% or 80% nodes are deleted and uniform noise is added to the edges. The noise levels are in the range of {0.00, 0.02, 0.04, 0.06, 0.08, 0.i0, 0.12, 0.14,
A
B
C
HT N
~:
,
-
V
f
Fig. 13. Primitives used in experiment 2.
0.16, 0.18, 0.20}. One hundred trials are run at each noise level. The results of the proposed algorithm in comparison with the results obtained from the algorithm of graduated assignment (~7~ are shown in Figs 14-16 where (a) is the proposed algorithm and
[
FG
A BCDE
K
LM
H I J K LM
D
J
b
Q
v'~ X
R
y
S
Z
T
0
Pq
FG
R ST
U VWX
(a) Fig. 12. (a) Multi-font characters. (b) Normal-font characters.
YZ
(b)
N
214
Y. EL-SONBATY and M. A. ISMAIL Table 6. Results of experiment 2.
IA.A) IE.EI Iz.T} IN.MI I.~ .0) lu, U) ~ , Y)
IB.BI IF.Z) (/.JI (II.N) JR.R) IV.V) IZ,ZI
IC.Cl IGoGI (K.KI 10.OI iS.S) (:..W)
(D.D} (H.HI I/..L) (P.PI iT.T) IX.X)
2O
15
(b)
10 '-
/ --- " t
(a)
16
20
5 n n
n
0
4
8
l n
12
Noise level Fig. 14. Results of matching ARGs with 5 binary attributes
and 60% deleted.
('0)
60
/
/
40
/ J
2O /
o
o
f
(~
I
f p
0
4
8
12
16
20
Noise level Fig. 15. Results of matching ARGs with 5 binary attributes
and 80% deleted.
(b) is the graduated assignment algorithm. The percent of incorrect matches is calculated as the average percent of mismatched nodes over the 100 trials. From Figs 14-16, it can be concluded that the proposed algorithm has lower percentage of incorrect matches than the recent technique of graduated assignment3 ~71 Its percent of incorrect matches is almost half that of graduated assignment algorithm. Experiment 4: Figure 17 demonstrates a comparison between the proposed algorithm and the algorithm of DCA recently introduced in reference (18). In this experiment, two graphs of sizes 15 and 30 are matched at different noise levels r/, where r/is in the range of {0.00, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45}. Nodes and edges have four attributes of integer type and each node has a connectivity degree of ~4. It can be easily concluded that the proposed algorithm is better and more stable than the DCA algorithm. Experiment 5: In this experiment, the performance of the proposed algorithm in matching weighted graphs is tested. The same simulation experiments as in experiment 3 are rerun but for weighted graphs {i.e. removing node attributes}. Noise levels in the range of +_ {0.00, 0.02, 0.04, 0.06, 0.08, 0.10} are added to edges that have random numeric weights in the range of {0 -1.00}. Two experiments are run for 60% deleted and 10% connectivity and 60% deleted and 15% connectivity. Figures 18 and 19 show the results of these experiments in comparison with the results of the graduated assignment algorithm °7) where (a) is the proposed algorithm and (b) is the graduated assignment algorithm. It can be deduced from Figs 18 and 19 that the proposed algorithm has much lower percentage of incorrect matches than the graduated assignment algorithm. Experiment 6: In this experiment, we use the simulation method to test the behavior of the proposed algorithm in matching fully connected attributed relational graphs at different noise levels. Attributed relational graphs of different sizes ranging from 10 to 50 are generated. The attributes of both nodes and edges are real numbers and are produced randomly in the
80
40
. . . . ~ ~ ~'~
20 0
I 0
-7"---- I 4
I 8
I
~ 12
I
b 16
I
1 20
Noise level Fig. 16. Results of matching ARGs with 3 binary attributes and 80% deleted.
A new algorithm for subgraph optimal isomorphism
215
Sparsely Attributed Relational Graphs proposeAalgorithm
100 80 60
E
40
~algorithm
20 0 0%
I
t
I
[
f
I
I
I
I
5%
10%
15%
20%
25%
30%
35%
40%
45%
noise Fig. 17. Results of the comparison between the proposed algorithm and DCA algorithm.°8)
100
(b)
80
E
f
60
(a)
,"
~0 k,,
¢~
20 0 0
2
4
6
8
10
Noise level Fig. 18. Results of matching weighted graphs with 10% connectivity and 60% deleted.
attributes are added to produce new graphs to be matched with the original ones. A hundred graphs of attributed relational graphs are generated and matched with each other to get the subgraph isomorphism using the proposed algorithm. Figures 20 (a)-(c) shows the percentage of incorrect matching obtained for subgraphs of sizes 10, 20 and 30, respectively. From Fig. 20, we can see that the proposed algorithm can get the correct matching for graphs with large and complex structures in different noise levels. The new algorithm got a 100% correct matching for totally complete attributed relational graphs at noise levels less than 16% and minimum of 96% for noise level of 20%. 5. CONCLUSIONS
100
~
80
.~
60
E ~
40
t_
e
20 0
(b)
(a)
jY t
I
I
4
6
8
10
Noise level Fig. 19. Results of matching weighted graphs with 15% connectivity and 60% deleted.
range of 0-1.0. Different scales of uniform noise are added to node and edge attributes. The noise are in the range of - ~ to 4, where ( = {0.00, 0.04, 0.80, 0.12, 0.16, 0.20}. After adding the noise, the graphs are shuffled and new nodes and branches with random
In this paper, a new algorithm for subgraph optimal isomorphism is proposed. The proposed algorithm decomposes the reference and input graphs into smaller subgraphs called BARGs. A weighted bipartite graph is then constructed with the BARGs as its nodes and the weight associated with a branch is calculated as the distance between the two BARGs connected by this branch. The distance is calculated based on the concept of error-correcting transformations. The subgraph optimal isomorphism is found to be the minimum of this weighted bipartite graph. From experimental results and complexity analysis, the following points can be concluded: (1) the new algorithm can be used efficiently for sparsely and fully connected attributed relational graphs, and also for other types of graphs like attributed graphs and weighted graphs; (2) the proposed algorithm proved its superiority over other recent techniques reported in the literature in matching sparsely and fully connected attributed relational graphs;
216
Y. EL-SONBATY and M. A. ISMAIL
Totally Complete Attributed Relational Graphs 20
8
noise 20%
~ o
._~
10/20
10/15
l 0/25
10/30
10/35
10/40
10/45
~raphsizes
~ 10/50
noise 0% - 16%
(a)
Totally Complete Attributed Relational Graphs
| 12
noise20%
~
,
•~
20125
i
ill
i
ii
i
I
ii
20•30
~"~'I
, ,,,
"
~
....... ~ ~.
20/35
,,,,
r
20/40
,,
,
20/45
graphsizes
~
I
~ 20•50 noise
0%
-/6%
Totally Complete Attributed Relational Graphs 20 16
.
noise20%
8
~ 0 .m 30/35
30/40
30/45
graphstzes
50 noise 0% o 16%
(c) Fig. 20. (a), (b) and (c) Percentage of incorrect matches for totally complete attributed relational graphs for subgraphs of sizes 10, 20, 30, respectively. (3) the proposed algorithm showed its efficiency in handling the problem of matching weighted graphs; (4) the computational complexity of this algorithm is much lower than other techniques found in the literature. The average computational complexity of the proposed algorithm is found to be O(M2* NZ);
(5) the best and worst case for the computational complexity of the new algorithm is O(M* N m i n ( M , N ) ) and O ( M 2 * N 2 m i n ( M , N ) ) respectively, which is better than many techniques; (6) the proposed algorithm is based on the concept of error-correcting transformations which is more powerful and has more meaning in calculating the distance than other techniques;
A new algorithm for subgraph optimal isomorphism (7) the proposed algorithm is parallel in nature and can take advantage of hardware parallel architecture.
REFERENCES
1. A. Sanfeliu and K. S. Fu, A distance between attributed relational graphs for pattern recognition, IEEE Trans. Systems, Man Cybernet., 13(3), 353-362 (1983). 2. Narsingh Deo, Graph Theory with Applications to Engineering and Computer Science, Printice-Hall, Englewood Cliffs, New Jersey (1995). 3. A. K. C. Wong, M. You and S. C. Chan, An algorithm for graph optimal monomorphism, IEEE Trans. Systems, Man Cybernet. 20(3), 628-636 (1990). 4. L. G. Shapiro and R. M. Haralick, Structural descriptions and inexact matching, Technical Report CS-7901 lR, Virginia Polytechnic Institute and State University, Blacksburg, Virginia (November 1979). 5. L. Kitchen, Relaxation applied to matching quantitative relational structures, IEEE Trans. Systems Man Cybernet. 10(2), (1980). 6. L. S. Davis, Shape matching using relaxation techniques, IEEE Trans. Pattern Analysis Mach. Intell. 1, 60-72 (1979). 7. S. Z. Li, Matching: invariant to translation, rotation, and scale changes, Pattern Recognition 25, 583 594 (1992). 8. W. J. Christmas, J. Kittler and M. Petrou, Structural matching in computer vision using probabilistic relaxation, IEEE Trans. Pattern Analysis Mach. Intell. 17(8), 749-764 (1995). 9. M. Berthod, M. Kato and J. Zerubia, DPA: a deterministic approach to the MAP problem, 1EEE Trans. Image Process. (1996). 10. D. E. Ghahraman, A. K. C. Wong and T. Au, Graph optimal monomorphism algorithms, IEEE Trans. Systems, Man Cybernet. 10(4), (1980). I 1. W. Tsai and K. S. Fu, Subgraph error-correcting isomorphisms for syntactic pattern recognition, IEEE Trans. Systems, Man Cybernet. 13 (1), 48 62 (1983). 12 M. A. Eshera and K. S. Fu, An image understanding system using attributed symbolic representation and inexact graph matching, Pattern Analysis Mach. lntell. 8(5), 604 617 (1986). 13. M.A. Eshera and K. S. Fu, A graph distance measure for image analysis, 1EEE Tran. Systems, Man Cybernet. 14 (3), 398 408 (1984). 14. S. S. Young, P. D. Scott and N. M. Nasrabadi, Object recognition using multilayer hopfield neural networks, Proc. 1EEE Conf. Comput. Vision and Pattern Recognition, PP. 417 422 (1994). 15. T. W. Chen and W. C. Lin, A neural networks approach to CSG-based 3D object recognition, IEEE Trans. Pattern Analysis Mach. Intell. 16, 719-725 (1994). 16. P. Suganthan, E. Teoh and D. Mital, Pattern recognition by graph matching using the potts MFT neural networks, Pattern Recognition, 28, 997-1009 (1995). 17. S. Gold and A. Rangarajan, A graduated assignment algorithm for graph matching, IEEE Trans. Pattern Analysis Mach. Intell. 18(4), 377-388 (1996). 18. F. Depiero, M. Trivedi and S. Serbin, Graph matching using a direct classification of node attendance, Pattern Recognition, 29(6), 1031- 1048 (1996). 19. S. Umeyama, An Eigendecomposition approach to weighted graph matching problems, IEEE Trans. Pattern Analysis Mach. Intell. 10, 695-703 (1988). 20. H. A. Almohamad, A polynomial transform for matching pairs of weighted graphs, J. Appl. Math. Modeling 15 (4), (1991).
i
217
21. H.A. Almohamad and S. O. Duffuaa, A linear programming approach for the weighted graph matching problem, IEEE Trans. Pattern Analysis Mach. Intell. 15(5), (1993). 22. H. L. Bodlaender, Polynomial algorithm for graph isomorphism and chromatic index on partial K-trees, J. Algorithms 11, 631-643 (1990). 23. J. Chen, A linear time algorithm for isomorphism of graphs of bounded average genus, Technical Report 91-015, Department of Computer Science Texas A and M University, 1991. 24. H. B. Mittal, A fast backtrack algorithm for graph isomorphism, Inform. Process. Lett. 29, 105-110 (1988). 25. U. Schoning, Graph isomorphism is in the low hierarchy, J. Comput. System Sci. 37, 312 323 (1988). 26. R. C. Read and D. G. Corneil, The graph isomorphism disease, J. Graph Theory, 339-363 (1977) 27. V. G. Timkovskii, Some Recognition Problems Related to Graph Isomorphism, UDC 519.1, PP. 153-160. Plenum Press, New York, (1988). 28. A. Lowe, Three dimension object recognition from single two dimension images, Artificial Intell. 31, 335 395 (1987). 29. R. Schalkoff, Pattern Recognition: Statistical, Structural and Neural Approaches, Wiley, New York (1992). 30. A. C. Shaw, A formal picture description scheme or a basis for picture processing systems, Inform. Control. 14, 9-52 (1969). 31. A. S. Shaw, Parsing of graph-representable pictures, J. ACM, Vol. 17, 453-481 (1970). 32. N. Abe, M. Mizumoto, J. Toyoda and K. Tanaka, Web grammar and several graphs, J. Comput. Systems Sci., 7, 37-65 (1973). 33. C. R. Cook, First order graph grammars, S I A M J. Comput., 3, 90-99 (1974). 34. T. Pavlidis, Linear and context-free graph grammars, J. A C M 19, 11-22 (1972). 35. T. Pavlidis, Representation of figures by labelled graphs, Pattern Recognition, 4, 5-17 (1972). 36. E. Tanaka, Theoretical aspects of syntactic pattern recognition, Pattern Recognition, 28(7), 1053-1061 (1995). 37. Y. Kajitani and H. Sakurai, On distance of graphs defined by use of orthogonality between circuits and cutsets, Report CT-73-30, The Institute of Electrical and Communication Engineers of Japan (1973). 38. Y. Kajitani and H. Uede, On the metric space of labeled Graphs, Rep. CST-74-115, The Institute of Electrical and Communication Engineers of Japan (1975). 39. E. Tanaka, A metric on graphs and its applica ions, IEE Japan ip-77-55, 31 October (1977). 40. H. W. Kuhn, The Hungarian method for the assignment problem, Naval Res. Log. Quart. 12, 83-97 (1955). 41. R. M. Karp, An algorithm to solve the m*n assignment problem in expected time O(mn log n), Networks, 10, 143 152 (1980). 42. M. S. Hung and W. O. Rom, Solving the assignment problem by relaxation, Oper. Res. 4, 969-982 (1980). 43. S. Yamada and T. Hasai, An efficient algorithm for planner assignment problem, Int. Symp. on Circuits and Systems., pp. 284 287 (1987). 44. D. Avis, A survey of heuristics for the weighted matching problem, Networks, 13, 475-493 (1983). 45. S. Yamada and T. Hasai, An efficient algorithm for the linear assignment problem, Elect. Comm. Japan, Part 3, 73(12), 28-36 (1990). 46. A. Hsieh, K. C. Fan and T. Fan, Bipartite weighted matching for on-line handwritten chiness character recognition, Pattern Recognition 28(2), 143-151 (1995). 47. J. Stroud, Airports of the world, Putnam and Company, London (1980).
218
Y. EL-SONBATY and M. A. ISMAIL About the Autbor--YASSER EL-SONBATY received B.Sc. (Honors) and M.Sc. degrees in computer science from the university of Alexandira. He is currently a Ph.D. student at the university of Alexandria. His research interests include object representation and recognition, machine vision and pattern recognition.
About the Autbor--M. A. ISMAIL is a professor of computer science at the department of computer
science. Alexandria university, Egypt. He has taught computer science and engineering at the university of Waterloo, Canada, UPM, Saudi Arabia; the university of Windsor, Canada; the university of Michigan, U.S.A. His research interests include pattern analysis and machine intelligence, data structures and analysis, medical computer science, and non-traditional databases.