Journal of Algorithms 33, 267-280 (1999) Article ID jagm.1999.1044, available online at http://www.idealibrary.com on
Faster Subtree Isomorphism*
Ron Shamir and Dekel Tsur
Department of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
E-mail:
[email protected],
[email protected] Received March 7, 1997
We study the subtree isomorphism problem: Given trees $H$ and $G$, find a subtree of $G$ which is isomorphic to $H$ or decide that there is no such subtree. We give an $O((k^{1.5}/\log k)n)$-time algorithm for this problem, where $k$ and $n$ are the number of vertices in $H$ and $G$, respectively. This improves over the $O(k^{1.5}n)$ algorithms of Chung and Matula. We also give a randomized (Las Vegas) $O(k^{1.376}n)$-time algorithm for the decision problem. © 1999 Academic Press
* A preliminary version of this paper has appeared in the Fifth Israel Symposium on Theory of Computing and Systems (ISTCS 97).
0196-6774/99 $30.00 Copyright © 1999 by Academic Press. All rights of reproduction in any form reserved.

1. INTRODUCTION

A fundamental problem in graph theory is the subgraph isomorphism problem: Given two graphs $G$ and $H$, find a subgraph of $G$ which is isomorphic to $H$ (if there is one). This problem is NP-hard as it includes, for example, the clique problem. When restricting the graph $G$, the subgraph isomorphism problem may become easier. Polynomial-time algorithms for the subgraph isomorphism problem were given for trees [20], two-connected outerplanar graphs [16], and two-connected series-parallel graphs [17]. All these graphs have a treewidth of at most 2. The subgraph isomorphism problem is NP-hard when the graph $G$ is an arbitrary graph with treewidth at most 2 [19]. However, the problem is polynomial for the family of graphs with treewidth at most $p$ for every $p \ge 2$, if we also add the restriction that $G$ is $p$-connected, or that $H$ has bounded degree [19]. A faster algorithm for the former case was given in [6].

In this paper we study subtree isomorphism, i.e., the subgraph isomorphism problem when $G$ and $H$ are trees. Figure 1 gives an instance of this problem. Throughout this work $n$ and $k$ denote the number of vertices in $G$ and $H$, respectively. When $k = n$ we get the tree isomorphism problem, which has a linear time algorithm due to Hopcroft and Tarjan [13]. Polynomial algorithms for subtree isomorphism were first given by Matula [20] and by Edmonds (cf. [21]). Faster algorithms, with worst-case time complexity of $O(k^{1.5}n)$, were given by Matula [21] and Chung [4]. In contrast, the subgraph isomorphism problem is NP-complete when $G$ is a tree and $H$ is a forest (subforest isomorphism [10]). The subgraph isomorphism and the subtree isomorphism problems have applications in pattern recognition, computational biology, and chemistry [1, 24].

FIG. 1. An instance of the subtree isomorphism problem. The tree $G$ has a subtree which is isomorphic to $H$.

Our main result here is an $O((k^{1.5}/\log k)n)$-time algorithm. Our algorithm, like most previous studies of the problem, is based on the close relationship between subtree isomorphism and maximum matching in bipartite graphs. (This relationship was also utilized to obtain fast parallel algorithms for subtree isomorphism [11, 15].) The subtree isomorphism problem is recursively translated into a collection of smaller subtree isomorphism problems, which are solved using maximum matching algorithms. The improved complexity is achieved by a combinatorial lemma that bounds the possible number of distinct subtrees involved, and by using the notion of clique partition and its application by Feder and Motwani to finding maximum matching in bipartite graphs [9]. We show that for the matching problem resulting from the subtree isomorphism problem, we can find a clique partition in a simple way. The ideas we use here also give an improved algorithm for the maximum subforest problem (finding a subforest of a given tree with the maximum number of edges which does not have a subgraph isomorphic to a tree from a given set of trees) [23].
We also give a randomized (Las Vegas) algorithm for the decision problem whose expected running time is $O(k^{\omega-1}n)$, where $\omega$ is the exponent of matrix multiplication. Using the best known bound for $\omega$ [5], the above bound is $O(k^{1.376}n)$. This algorithm follows directly from a
randomized algorithm for the cardinality of a maximum matching given by Cheriyan [3].

The rest of the paper is organized as follows: Section 2 describes an $O(k^{1.5}n)$ algorithm for subtree isomorphism. In Section 3 we prove some simple lemmas on matching that are needed in later sections. In Section 4 we prove the combinatorial lemma and use it together with the clique partition in order to improve the running time of the algorithm from Section 2. Finally, Section 5 describes the randomized algorithm.

We finish this section with some basic definitions. For more definitions and basic graph terminology see, e.g., [2]. A rooted tree is a triplet $G = (V, E, r)$, where $(V, E)$ is an unrooted tree, and $r$ is some vertex in $V$ which is called the root. We will sometimes write $G^r$ to denote that $r$ is the root of the rooted tree $G$. Also, for an unrooted tree $G$, we denote by $G^r$ the rooted tree formed by choosing the vertex $r$ to be its root. We denote by $G^r_v$ the rooted subtree of $G^r$ whose vertices are all descendants of $v$, and its root is $v$. We say that two rooted trees $G^r$ and $H^{r'}$ are isomorphic if there is an isomorphism between $G$ and $H$ which maps $r$ to $r'$. Likewise, we write $H^{r'} \subseteq_R G^r$ if there is a rooted subtree $J^r$ of $G^r$ which is isomorphic to $H^{r'}$ (note that the subtree $J^r$ must have the same root as $G^r$). We define the open neighborhood of a vertex $v$ in a graph $G$ by $N(v) = \{u : uv \in E\}$, and the closed neighborhood by $N[v] = N(v) \cup \{v\}$. We use $d_G(v)$ to denote the degree of a vertex $v$ in a graph $G$ (or $d(v)$ if $G$ is clear from the context).

2. AN $O(k^{1.5}n)$ ALGORITHM

In this section we describe an $O(k^{1.5}n)$ algorithm for the subtree isomorphism problem that will be the basis for the improved algorithms in Sections 4 and 5. It is based on Chung's algorithm [4] and has the same asymptotic time complexity, but is simpler as the tree $G$ is traversed only once, compared to twice in Chung's algorithm. We briefly describe the algorithm for completeness.
We note that the improvements of Sections 4 and 5 can also be applied to Chung's original algorithm. For simplicity, we describe an algorithm for the decision problem. It can be extended easily to an algorithm for the search problem.

Let $G = (V, E)$ and $H = (V_H, E_H)$ be the input trees (w.l.o.g. $|V_H| > 1$). Select a vertex $r$ of $G$ to be the root. We wish to know for each $v \in V$ and $u \in V_H$ whether $H^u_u \subseteq_R G^r_v$. In order to compute this efficiently we also need to know for each neighbor $w$ of $u$ if $H^w_u \subseteq_R G^r_v$ (notice that $H^w_u$ is the graph formed by removing $H^u_w$ from $H^u$). This information is relevant because $H^u_u \subseteq_R G^r_v$ iff for every child $u'$ of $u$ there is a distinct child $v'$ of
$v$ such that $H^u_{u'} \subseteq_R G^r_{v'}$. More generally, we have the following lemma:

LEMMA 2.1. For any vertex $v$ in $G^r$, vertex $u$ in $H$, and a vertex $w \in N[u]$ we have that $H^w_u \subseteq_R G^r_v$ iff for every child $u'$ of $u$ in $H^w_u$ there is a distinct child $v'$ of $v$ such that $H^u_{u'} \subseteq_R G^r_{v'}$.

We store this information in sets $S(v, u)$ defined as follows: for every $v \in V$, and for every $u \in V_H$,

$$S(v, u) = \{\, w \in N[u] : H^w_u \subseteq_R G^r_v \,\}.$$
See Fig. 2 for an example for this definition. Notice that (1) $u \in S(v, u)$ if and only if $H^u = H^u_u \subseteq_R G^r_v$, (2) $u \in S(v, u)$ implies $S(v, u) = N[u]$, and (3) $d(v) < d(u) - 1$ implies $S(v, u) = \emptyset$.

The algorithm computes the sets $S(v, u)$ for all $v$ and $u$. We show how to compute $S(v, u)$ inductively by going over the vertices $v$ of $G^r$ from the leaves to the root and computing $S(v, u)$ for all $u \in V_H$ (the algorithm is also described using pseudo-code in Fig. 3).

The base of the induction is when $v$ is a leaf of $G^r$. Then $S(v, u)$ can be computed for all $u$ as follows: If $u$ is a leaf of $H$, then $S(v, u)$ consists of one vertex which is the single neighbor of $u$ in $H$. Otherwise, if $u$ is an internal vertex, $S(v, u)$ is empty.

For the inductive step, consider an internal vertex $v$ (from (3) we can assume that $d(v) \ge d(u) - 1$). We first need to compute $S(v', u)$ for all the children $v'$ of $v$, for all $u \in V_H$. Then, in order to decide for $w \in N[u]$ if $w \in S(v, u)$, we construct a bipartite graph $B_w(v, u)$ with the two parts $X_w$ and $Y_v$, where $X_w$ is the set of children of $u$ in $H^w_u$, $Y_v$ is the set of children of $v$, and $u'v'$ is an edge of $B_w(v, u)$ iff $H^u_{u'} \subseteq_R G^r_{v'}$ (i.e., iff $u \in S(v', u')$). By Lemma 2.1, $w$ is in $S(v, u)$ iff $B_w(v, u)$ has a matching of size $|X_w|$. Therefore, in order to compute $S(v, u)$ we need to find maximum matchings in $d(u) + 1$ bipartite graphs. However, all these graphs are similar to one another: Each graph $B_w(v, u)$ (for $w \ne u$) is obtained by deleting the vertex $w$ from the graph $B_u(v, u)$. In Section 3 (Corollary 3.2) we shall show that it suffices to find a maximum matching only in $B_u(v, u)$, and then we can efficiently compute the size of the maximum matching in $B_w(v, u)$ for all $w \ne u$. In the following, we will use $B(v, u)$ instead of $B_u(v, u)$. See Fig. 2 for an example of the relation between $S(v, u)$ and the graph $B(v, u)$. Algorithm Subtree-Isomorphism is described by the pseudo-code shown in Fig. 3.

FIG. 2. An instance of subtree isomorphism. Here, we have $H^u, H^{u_2}_u \not\subseteq_R G^r_v$ and $H^{u_1}_u, H^{u_3}_u \subseteq_R G^r_v$, so $S(v, u) = \{u_1, u_3\}$. The graph $B(v, u)$ is the bipartite graph which we construct in order to compute $S(v, u)$. There is an edge $u_i v_j$ in this graph iff $u \in S(v_j, u_i)$. $H^u \not\subseteq_R G^r_v$ as $B(v, u)$ does not contain a matching of size 3. $H^{u_1}_u \subseteq_R G^r_v$ as $B_{u_1}(v, u) = B(v, u) - u_1$ contains a matching of size 2.

     1  Select a vertex r of G to be the root of G.
     2  for all u ∈ H, v ∈ G do S(v, u) ← ∅.
     3  for all leaves v of G^r do
     4      for all leaves u of H do S(v, u) ← N(u).
     5  for all internal vertices v of G^r in a postorder do
     6      Let v_1, v_2, ..., v_t be the children of v.
     7      for all vertices u = u_0 of H with degree at most t + 1 do
     8          Let u_1, u_2, ..., u_s be the neighbors of u.
     9          Construct a bipartite graph B(v, u) = (X, Y, E_vu), where X = {u_1, ..., u_s},
                Y = {v_1, ..., v_t}, and E_vu = {u_i v_j : u ∈ S(v_j, u_i)}.
                Denote X_0 = X and X_i = X − {u_i}.
    10          for all 0 ≤ i ≤ s do compute the size m_i of a maximum matching between X_i and Y.
    11          S(v, u) ← {u_i : m_i = |X_i|, 0 ≤ i ≤ s}.
    12          if u ∈ S(v, u) then answer YES and stop
    13      end for
    14  end for
    15  Answer NO.

FIG. 3. Algorithm Subtree-Isomorphism(G, H).

THEOREM 2.2. Algorithm Subtree-Isomorphism solves the subtree isomorphism problem in $O(k^{1.5}n)$ time and $O(kn)$ space.
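Before the analysis, the procedure of Fig. 3 can be made concrete with a small Python sketch. This is an illustration only, not the authors' implementation: the function names, the adjacency-dictionary input format, and the use of a simple augmenting-path matching that is recomputed for every $X_i$ are all assumptions made here for brevity, so the sketch does not attain the $O(k^{1.5}n)$ bound (that requires Hopcroft-Karp together with Corollary 3.2). The leaves of $G^r$ need no special case, since for a vertex with no children the general step already yields $S(v, u) = N(u)$ for leaves $u$ of $H$.

```python
def max_matching_size(xs, ys_of):
    """Size of a maximum bipartite matching, by simple augmenting paths.

    xs: left vertices to match; ys_of[x]: right neighbours of x.
    """
    match_of_y = {}

    def try_augment(x, seen):
        for y in ys_of[x]:
            if y in seen:
                continue
            seen.add(y)
            if y not in match_of_y or try_augment(match_of_y[y], seen):
                match_of_y[y] = x
                return True
        return False

    return sum(try_augment(x, set()) for x in xs)


def has_subtree_isomorphic_to(G, H, root=None):
    """Decide whether tree G has a subtree isomorphic to tree H.

    G and H are adjacency dictionaries {vertex: [neighbours]} (assumed format).
    """
    if len(H) == 1:
        return len(G) >= 1
    root = next(iter(G)) if root is None else root
    # Root G; in `order` every parent appears before its descendants.
    parent, order, stack = {root: None}, [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        for c in G[v]:
            if c != parent[v]:
                parent[c] = v
                stack.append(c)
    S = {}  # S[v, u] = set of w in N[u] with H^w_u embedded at v
    for v in reversed(order):  # children are handled before their parent
        children = [c for c in G[v] if c != parent[v]]
        for u in H:
            nbrs = list(H[u])
            if len(nbrs) > len(children) + 1:
                S[v, u] = set()
                continue
            # Edge u_i v_j of B(v, u) iff u is in S(v_j, u_i).
            edges = {ui: [vj for vj in children if u in S[vj, ui]] for ui in nbrs}
            result = set()
            if max_matching_size(nbrs, edges) == len(nbrs):
                result.add(u)  # X_0 = X is saturated
            for ui in nbrs:
                rest = [x for x in nbrs if x != ui]  # X_i = X - {u_i}
                if max_matching_size(rest, edges) == len(rest):
                    result.add(ui)
            S[v, u] = result
            if u in result:
                return True
    return False
```

For instance, a path on four vertices contains a path on three vertices but no star with three leaves, whichever vertex is chosen as the root.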
Proof. The correctness of the algorithm follows from Lemma 2.1. Let us consider the space complexity. Let $V = \{x_1, x_2, \ldots, x_n\}$ and $V_H = \{y_1, y_2, \ldots, y_k\}$. As each set $S(v, u)$ is a subset of $N[u]$, we can maintain $S(v, u)$ using a binary vector $A(v, u)$ of size $|N[u]|$. Also, we maintain a $k \times k$ matrix $I$, where $I(u, w)$ is the index of $w \in N[u]$ in the vector $A(v, u)$. In other words, $w \in S(v, u)$ iff bit number $I(u, w)$ in $A(v, u)$ is set.
Thus the space complexity is $O(k^2 + \sum_{j=1}^{k} d(y_j)\,n) = O(kn)$ and each access to $S(v, u)$ takes constant time.

As for the algorithm's time complexity, the crux is step 10. It computes the size of several maximum matchings in some graphs. Since these graphs are closely related to each other, we are able to show in Section 3 that computing the sizes of their maximum matchings can be done in the same time bound taken by computing a single maximum matching. We therefore perform step 10 of the algorithm using Corollary 3.2, and therefore the dominant part of the algorithm's time complexity is finding maximum matchings in the graphs $B(x_i, y_j)$ for all $i, j$ in step 10. Finding a maximum matching in $B(x_i, y_j)$ takes $O(d(y_j)^{1.5} d(x_i))$ time using the algorithm of Hopcroft and Karp [12] (or the equivalent algorithm of Dinic [7]) and therefore the time needed to handle a vertex $x_i$ is at most $O(\sum_{j=1}^{k} d(y_j)^{1.5} d(x_i)) = O((2k)^{1.5} d(x_i))$. Thus, the total time complexity is $O(k^{1.5}n)$. ∎

We note that the above algorithm can be changed to solve the subtree homeomorphism problem without changing the asymptotic complexity. The same modification applies to the algorithms in the sections below.

3. MATCHING

In this section, we give several lemmas about matchings which are needed for obtaining the more efficient algorithms for the subtree isomorphism problem. For the following lemmas, let $B = (X, Y, E)$ be a bipartite graph, where $X = \{x_1, x_2, \ldots, x_s\}$, $Y = \{y_1, y_2, \ldots, y_t\}$, and $s \le t + 1$. Denote $X_0 = X$ and $X_i = X - \{x_i\}$ for $1 \le i \le s$. For $0 \le i \le s$, denote by $m_i$ the size of a maximum matching between $X_i$ and $Y$. Clearly for every $i \ge 1$, either $m_i = m_0$ or $m_i = m_0 - 1$. An important notion in matching theory is critical vertices (see, e.g., [18]): A vertex $x$ in a graph $G$ is critical if the size of a maximum matching in $G - x$ is strictly less than the size of a maximum matching in $G$ (i.e., $x_i$ is critical iff $m_i = m_0 - 1$). Let $M$ be a maximum matching of $B$.
We define a directed graph $B_M$ by $B_M = (X \cup Y, E_M)$ where

$$E_M = \{(x, y) : xy \in E - M,\ x \in X,\ y \in Y\} \cup \{(y, x) : xy \in M,\ x \in X,\ y \in Y\}.$$

We denote by $X_M$ all the vertices from $X$ which are unmatched in $M$.

LEMMA 3.1. For any maximum matching $M$ of $B$ and any vertex $x_i \in X$, $x_i$ is critical (i.e., $m_i = m_0 - 1$) if and only if $x_i$ is matched in $M$, and there is no directed path in $B_M$ from a vertex in $X_M$ to $x_i$.
Proof. ($\Rightarrow$) The proof is by contradiction. If $x_i$ is unmatched in $M$, then $M$ is a matching between $X_i$ and $Y$ and therefore $m_i = m_0$, a contradiction. Also, if there is a path $P$ in $B_M$ from a vertex in $X_M$ to $x_i$, then $M \,\Delta\, P$ ($= M \cup P - M \cap P$) is a maximum matching in which $x_i$ is unmatched, and again we have a contradiction as $m_i = m_0$.

($\Leftarrow$) Conversely, assume by contradiction that $m_i = m_0$. Let $y$ be the vertex matched to $x_i$ in $M$. As $|M - \{x_i y\}| = m_0 - 1 < m_i$, $M - \{x_i y\}$ is not a maximum matching between $X_i$ and $Y$, and by Berge's theorem (see [18]) there is an augmenting path $P$ whose one end is a vertex $x_l \in X_M$, and the other end is $y$ (because if the other end is a vertex in $Y$ that is unmatched in $M$ then $P$ is an augmenting path for $M$). But this implies a directed path in $B_M$ from $x_l$ to $x_i$, a contradiction. ∎
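The characterization in Lemma 3.1 translates directly into a reachability search. The Python sketch below is a hedged illustration: the function name and the dictionary-based inputs (`adj` for the edges of $B$, `match_x` for a maximum matching $M$ given as a map from matched $X$-vertices to their partners) are assumptions of this example, and breadth-first search is used in place of depth-first search, which does not affect the linear running time. The directed graph $B_M$ is never built explicitly: non-matching edges are followed from $X$ to $Y$ and matching edges from $Y$ back to $X$.

```python
from collections import deque


def deletion_matching_sizes(X, adj, match_x):
    """Return {x: m_x}, where m_x is the maximum matching size of B - x.

    X: left vertices; adj[x]: right neighbours of x;
    match_x: a MAXIMUM matching as {x: y} (unmatched x are absent).
    """
    match_y = {y: x for x, y in match_x.items()}
    reached = {x for x in X if x not in match_x}  # X_M: unmatched vertices of X
    queue = deque(reached)
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if match_x.get(x) == y:
                continue  # matching edges are directed from Y to X
            x2 = match_y.get(y)  # follow the matching edge out of y, if any
            if x2 is not None and x2 not in reached:
                reached.add(x2)
                queue.append(x2)
    m0 = len(match_x)
    # x is critical iff it is matched and not reachable from X_M.
    return {x: (m0 - 1 if x in match_x and x not in reached else m0) for x in X}
```

With $X = \{1, 2\}$, edges $1a$, $2a$, $2b$ and $M = \{1a, 2b\}$, both vertices are critical and each deletion leaves a maximum matching of size 1.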
COROLLARY 3.2. Given a maximum matching $M$ of $B$, we can compute the value of $m_i$ for $0 \le i \le s$ in $O(st)$ time.

Proof. Building the graph $B_M$ and finding all the vertices reachable from $X_M$ can be done in $O(st)$ time using a depth-first search [25]. We then apply Lemma 3.1. For each vertex $x_i$, if $x_i$ is matched in $M$ and is not reachable from $X_M$ we set $m_i = |M| - 1$ and otherwise $m_i = |M|$. ∎

LEMMA 3.3. The problem of finding a maximum matching in $B$ can be reduced in $O(st)$ time to the problem of finding a maximum matching in a subgraph of $B$ with at most $s^2$ vertices and edges and with maximum degree at most $s$.

Proof. Let $X'$ be the set of all vertices of $X$ with degree less than $s$. Let $B'$ be the subgraph induced from $B$ by the vertices of $X'$ and their neighbors. Building $X'$ and $B'$ takes $O(|X| + |Y| + |E|) = O(st)$ time. We claim that given a maximum matching $M'$ of $B'$ we can build a maximum matching $M$ of $B$: First, add all the edges of $M'$ to $M$. For each vertex $x \in X - X'$ find an unmatched neighbor $y \in Y$ and add $xy$ to $M$ (such a vertex $y$ must exist as each such $x$ has at least $s$ neighbors and at most $s - 1$ of them are matched). Finding an unmatched neighbor of $x$ takes $O(s)$ time, and therefore building $M$ takes $O(s^2)$ time. ∎

4. AN $O((k^{1.5}/\log k)n)$ ALGORITHM

We now improve the algorithm from Section 2 by a $\log k$ factor. We use the previous algorithm but we solve the maximum matching problems more efficiently using the idea of clique partitions of a bipartite graph and its usage in finding maximum matching [9]. The algorithm of Feder and Motwani, originally stated for bipartite graphs with equal size parts, can be
extended to general bipartite graphs. This allows one to give an algorithm for subtree isomorphism whose time complexity is $O((k^{1.5}/\log k)n)$. Instead of describing the modifications to the algorithm of Feder and Motwani, we will give here a simpler way for obtaining an $O((k^{1.5}/\log k)n)$-time algorithm for subtree isomorphism. In contrast with [9] where the denseness of the graph is exploited, we achieve the reduction in time complexity by utilizing the special structure of the matching problems that must be solved in the subtree isomorphism algorithm.

The modified algorithm, called Improved-Subtree-Isomorphism, is the same as the algorithm Subtree-Isomorphism with the exception that, in step 10, we solve the maximum matching problems differently. Let $v$ be some vertex in $G^r$ whose children are $v_1, v_2, \ldots, v_t$, and let $u$ be a vertex in $H$ whose neighbors are $u_1, u_2, \ldots, u_s$. We now consider finding a maximum matching in $B = B(v, u)$. Recall that $B = (X, Y, E)$ with $X = \{u_1, \ldots, u_s\}$ and $Y = \{v_1, \ldots, v_t\}$. We assume that $s \le t + 1$ (because otherwise $S(v, u) = \emptyset$). We first apply Lemma 3.3 and build a subgraph $B' = (X', Y', E')$ of $B$ having maximum degree at most $s$. Then, like in [9], we partition the edges of $B'$ into complete bipartite graphs $C_1, C_2, \ldots, C_p$. We do the partition in the following way: First, we sort the vertices of $X'$ in lexicographic order where the key of a vertex $u_i$ is $N(u_i)$. Afterward, we split $X'$ into sets of equal keys $X^1, X^2, \ldots, X^p$ (i.e., all the vertices in a set $X^i$ have the same neighbors in $Y'$). Now, for $1 \le i \le p$ we set $C_i$ to be the subgraph induced by the vertices of $X^i$ and all their neighbors in $Y'$. We now follow the method of [9] and build a network $B''$ whose vertices are $V'' = X' \cup Y' \cup \{c_1, \ldots, c_p, a, b\}$. The edges are $E'' = E_1 \cup E_2$, where

$$E_1 = \{a u_i : u_i \in X'\} \cup \{v_j b : v_j \in Y'\},$$

$$E_2 = \{u_i c_j : j \le p,\ u_i \in C_j\} \cup \{c_j v_i : j \le p,\ v_i \in C_j\}.$$
All edges have capacity 1. The source is $a$ and the sink is $b$. We find a maximum (integral) flow $f$ in $B''$ using Dinic's algorithm [7] (see also [8]), and construct from this flow a maximum matching in $B'$. (Since the capacity of all edges is 1, the flow $f$ can be decomposed into edge-disjoint paths from $a$ to $b$ where the flow along each path is 1. Since each such path is of the form $a - u_i - c_k - v_j - b$, we can define a matching in $B'$ by taking the edge $u_i v_j$ for each such path. The maximality of this matching follows from the maximality of the flow.)

We will now analyze the time complexity of the algorithm described above. We denote by $D(u)$ the number of distinct trees in the forest $H^u_{u_1}, \ldots, H^u_{u_s}$.
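The partition into complete bipartite graphs and the resulting network can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, `adj` maps each vertex of $X'$ to its neighbors in $Y'$, vertices with equal neighborhoods are grouped by hashing rather than by the radix sort used in the analysis, and the source and sink are labelled `"source"` and `"sink"` (standing for $a$ and $b$), which assumes that no graph vertex carries those labels.

```python
from collections import defaultdict


def neighbourhood_partition(adj):
    """Group the vertices of X' that have identical neighbour sets in Y'.

    Returns a list of (X^i, common neighbours) pairs, one per clique C_i.
    """
    groups = defaultdict(list)
    for x, ys in adj.items():
        groups[frozenset(ys)].append(x)
    return [(xs, sorted(key)) for key, xs in groups.items()]


def compressed_network(adj):
    """Unit-capacity edges of B'': source -> x -> c_k -> y -> sink."""
    edges = []
    ys_seen = set()
    for k, (xs, ys) in enumerate(neighbourhood_partition(adj)):
        c = ("c", k)  # the extra vertex c_k of clique C_k
        edges += [("source", x) for x in xs]
        edges += [(x, c) for x in xs]
        edges += [(c, y) for y in ys]
        ys_seen.update(ys)
    edges += [(y, "sink") for y in sorted(ys_seen)]
    return edges
```

A clique $C_i$ with $|X^i|$ left vertices and $m$ common neighbors contributes $|X^i| + m$ middle edges instead of $|X^i| \cdot m$, which is where the saving comes from when many subtrees of $H$ are isomorphic.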
LEMMA 4.1. Algorithm Improved-Subtree-Isomorphism finds a maximum matching in $B(v, u)$ in $O(st + ts^{0.5}D(u))$ time.
Proof. Note that if for some $i, j$ the rooted trees $H^u_{u_i}$ and $H^u_{u_j}$ are isomorphic, then in $B = B(v, u)$ the vertices $u_i, u_j$ have exactly the same neighbors, and this remains true in $B'$ (assuming that $u_i$ and $u_j$ were not deleted in $B'$). Therefore $p \le D(u)$. The time for constructing $B$, $B'$, and $B''$ is $O(st)$ (we sort the vertices of $X'$ using radix-sort). We now bound the time for finding a maximum flow in $B''$: The size of $E_1$ is $|X'| + |Y'|$, where $|X'| \le s$ and $|Y'| \le sp$ (since each vertex in $Y'$ has at least one edge arriving from some vertex $c_j$, and no more than $s$ edges depart from each vertex $c_j$ to the vertices in $Y'$). The size of $E_2$ is at most $2sp$ as the number of edges in $E_2$ incident on some vertex $c_j$ is at most $|X'| + d_{B'}(u_i) \le 2s$, where $u_i \in C_j$. Hence, the number of edges in $B''$ is $O(sp)$. The number of vertices in $B''$ is at most $s + t + p + 2 = O(t)$ (as $s \le t + 1$). Now, Dinic's algorithm performs $O(\sqrt{|V''|})$ stages (see [9]), and each stage takes $O(|E''|)$ time. Hence, the total time is $O(t^{0.5}sp) = O(ts^{0.5}p) = O(ts^{0.5}D(u))$. ∎

We denote again $V = \{x_1, x_2, \ldots, x_n\}$ and $V_H = \{y_1, y_2, \ldots, y_k\}$. By Lemma 4.1, the time complexity of algorithm Improved-Subtree-Isomorphism is

$$O\Big(\sum_{i=1}^{n}\sum_{j=1}^{k}\big(d(x_i)\,d(y_j) + d(x_i)\,d(y_j)^{0.5}D(y_j)\big)\Big) = O\Big(kn + n\sum_{j=1}^{k} d(y_j)^{0.5}D(y_j)\Big).$$
We will bound the summation $\sum_j d(y_j)^{0.5}D(y_j)$ by $O(k^{1.5}/\log k)$. We first need a simple combinatorial lemma. Let $g(n)$ denote the number of distinct (i.e., non-isomorphic) rooted trees with $n$ vertices. We shall use the following result:

LEMMA 4.2 (see, e.g., [22, p. 1197]). $g(n) = 2^{\Theta(n)}$.

Let $f(n)$ denote the maximum number of distinct rooted trees in a forest of $n$ vertices.

LEMMA 4.3. $f(n) = \Theta(n/\log n)$.
Proof. To show the lower bound, we use the fact that $g(n) \ge 2^n$ for large $n$. Hence, the number of distinct rooted trees with $l = \lceil \log(n/\log n) \rceil$ vertices is $g(l) \ge 2^{\log(n/\log n)} = n/\log n$. Therefore we can build a forest by taking $\lfloor n/\log n \rfloor$ distinct trees with $l$ vertices each, and the total number of vertices in this forest is $\lfloor n/\log n \rfloor\, l \le n$. Thus $f(n) = \Omega(n/\log n)$.

We will now show the upper bound. If we have a forest of rooted trees and $r_i$ is the number of trees with $i$ vertices, then the number of distinct trees in this forest is at most $\sum_i \min(r_i, g(i))$. Hence,

$$f(n) \le \max\Big\{ \sum_{i=1}^{n} \min(r_i, g(i)) : r_1, \ldots, r_n \in \mathbb{N},\ \sum_{i=1}^{n} i r_i \le n \Big\}.$$

By Lemma 4.2,

$$f(n) \le \max\Big\{ \sum_{i=1}^{n} \min(r_i, c^i) : r_1, \ldots, r_n \in \mathbb{N},\ \sum_{i=1}^{n} i r_i \le n \Big\}$$

for some integer constant $c$. Let $x$ be the minimum integer for which $\sum_{i=1}^{x} i c^i \ge n$. Let $r_1, \ldots, r_n$ be the integers that maximize $\sum_i \min(r_i, c^i)$ under the constraint $\sum_{i=1}^{n} i r_i \le n$. We can assume that $r_i \le c^i$ for all $i$, because if $r_j > c^j$ for some $j$, we can set $r_j = c^j$ and the value of $\sum_i \min(r_i, c^i)$ does not change. Now, suppose that $r_j > 0$ for some $j > x$. This implies that there is a $k \le x$ for which $r_k < c^k$ (because otherwise $\sum_{i=1}^{n} i r_i \ge \sum_{i=1}^{x} i r_i + j r_j = \sum_{i=1}^{x} i c^i + j r_j > n$, a contradiction). If we decrease $r_j$ by one, and increase $r_k$ by one, then the value of $\sum_i \min(r_i, c^i)$ does not change, and the constraint $\sum_{i=1}^{n} i r_i \le n$ still holds (as $k < j$). We can repeat this process until $r_j = 0$ for all $j > x$ and therefore

$$f(n) \le \sum_{i=1}^{x} \min(r_i, c^i) \le \sum_{i=1}^{x} c^i = O(c^x).$$

The lemma follows from the fact that $x = \log_c n - \log_c \log_c n - O(1)$. ∎
We now continue with the analysis of algorithm Improved-Subtree-Isomorphism. Let $\epsilon$ be some constant $0 < \epsilon < 1/3$. We call a vertex of $H$ heavy if its degree is at least $2k^{1-\epsilon}$, and otherwise it is called light. Clearly, the number of heavy vertices is at most $k^{\epsilon}$. If $u$ is a heavy vertex and $w$ is a neighbor of $u$ such that $H^u_w$ does not contain a heavy vertex, then we call every vertex in $H^u_w$ a private vertex of $u$. We denote by $l_j$ the number of the private vertices of a heavy vertex $y_j$.

LEMMA 4.4. For any heavy vertex $y_j$, $d(y_j) \le k^{\epsilon} + l_j$ and $D(y_j) \le k^{\epsilon} + f(l_j)$.
Proof. Denote by $u_1, \ldots, u_p$ the neighbors of $y_j$ which are private vertices of $y_j$ and denote by $v_1, \ldots, v_q$ the rest of the neighbors of $y_j$. Clearly, $p$ is at most the total number of private vertices of $y_j$ which is $l_j$. Furthermore, for each vertex $v_i$ we can choose a heavy vertex $w_i$ in $H^{y_j}_{v_i}$.
As the vertices we choose are distinct, we have that $q$ is less than the number of heavy vertices. Hence, $d(y_j) = q + p \le k^{\epsilon} + l_j$. The trees $H^{y_j}_{u_1}, \ldots, H^{y_j}_{u_p}$ constitute all the private vertices of $y_j$, and therefore they have a total of $l_j$ vertices and there are at most $f(l_j)$ distinct trees among them. Hence, $D(y_j) \le q + f(l_j) \le k^{\epsilon} + f(l_j)$. ∎

LEMMA 4.5. $\sum_j d(y_j)^{0.5}D(y_j) = O(k^{1.5}/\log k)$.
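Proof. We split $\sum_{j=1}^{k} d(y_j)^{0.5}D(y_j)$ into two sums. Summing over the light vertices of $H$ we have

$$\sum_{j:\, y_j \text{ is light}} d(y_j)^{0.5}D(y_j) \le \sum_{j:\, y_j \text{ is light}} d(y_j)^{1.5} \le (2k^{1-\epsilon})^{0.5} \sum_{j=1}^{k} d(y_j) = O(k^{1.5-\epsilon/2}),$$

where the last inequality follows from the fact that $\sum_{j=1}^{k} d(y_j) = 2k - 2$. Summing over the heavy vertices and using Lemma 4.4 we have

$$\sum_{j:\, y_j \text{ is heavy}} d(y_j)^{0.5}D(y_j) \le \sum_{j:\, y_j \text{ is heavy}} (k^{\epsilon} + l_j)^{0.5}(k^{\epsilon} + f(l_j)) \le \sum_{j:\, y_j \text{ is heavy}} (k^{\epsilon/2} + l_j^{0.5})(k^{\epsilon} + f(l_j)).$$

Since $f(l_j) \le l_j \le k$ and since the number of heavy vertices is at most $k^{\epsilon}$, we have

$$\sum_{j:\, y_j \text{ is heavy}} d(y_j)^{0.5}D(y_j) \le O(k^{1+1.5\epsilon}) + \sum_{j:\, y_j \text{ is heavy}} l_j^{0.5} f(l_j),$$

and by Lemma 4.3, the fact that $\sum_{j:\, y_j \text{ is heavy}} l_j \le k$ (as each vertex can be a private vertex of at most one heavy vertex), and the fact that the function $h(x) = x^{1.5}/\log x$ is convex,

$$\sum_{j:\, y_j \text{ is heavy}} l_j^{0.5} f(l_j) = O\Big(\sum_{j:\, y_j \text{ is heavy}} h(l_j)\Big) = O(h(k)) = O(k^{1.5}/\log k).$$

As $\epsilon < 1/3$, the terms $O(k^{1.5-\epsilon/2})$ and $O(k^{1+1.5\epsilon})$ are both $O(k^{1.5}/\log k)$, and the lemma follows. ∎

We therefore proved the following theorem:

THEOREM 4.6. Algorithm Improved-Subtree-Isomorphism solves the subtree isomorphism problem in $O((k^{1.5}/\log k)n)$ time.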
5. AN $O(k^{1.376}n)$ ALGORITHM

In this section we give a randomized algorithm for the decision problem of subtree isomorphism. The algorithm is more efficient asymptotically than the deterministic algorithm of the previous section. Again, the algorithm Randomized-Subtree-Isomorphism is based on the algorithm Subtree-Isomorphism but solving the maximum matching problems is done differently. Consider some vertex $v$ in $G^r$ whose children are $v_1, v_2, \ldots, v_t$, and some vertex $u$ in $H$ whose neighbors are $u_1, u_2, \ldots, u_s$. We use the algorithm of Cheriyan [3] to find the size of a maximum matching in $B(v, u)$ and to find all the critical vertices in $B(v, u)$.

Cheriyan's algorithm for finding critical vertices (in a general graph) is as follows: Given an input graph $G = (V, E)$, where $V = \{1, 2, \ldots, n\}$, choose a large prime number $q$ and build an $n \times n$ matrix $C$ in the following way: For each edge $ij \in E$, choose a random number $w_{ij}$ uniformly from $\{1, 2, \ldots, q - 1\}$, and set $c_{ij} = w_{ij}$ and $c_{ji} = -w_{ij}$. Set all the other elements of $C$ to zero. Finally, find a basis $a_1, a_2, \ldots, a_r$ for the null space of $C$ over the field $Z_q$. Then, with high probability, the size of a maximum matching in $G$ is equal to $(n - r)/2$ and for all $i \in V$, vertex $i$ is critical iff all the $i$th coordinates of $a_1, a_2, \ldots, a_r$ are zeros.

As the graph $B(v, u)$ is bipartite, when applying Cheriyan's algorithm on $B(v, u)$, the matrix $C$ is of the form

$$C = \begin{pmatrix} 0 & C' \\ C'' & 0 \end{pmatrix},$$

where $C'$ is a $t \times s$ block and $C''$ is an $s \times t$ block. Therefore, we can find a basis for the null space of $C$ by finding bases for the null spaces of $C'$ and $C''$. This can be done in $O(s^{\omega-1}t)$ time [14], where $\omega$ denotes the exponent of matrix multiplication. Thus, the running time of algorithm Randomized-Subtree-Isomorphism is $O(\sum_{i=1}^{n}\sum_{j=1}^{k} d(y_j)^{\omega-1}d(x_i)) = O(k^{\omega-1}n)$.

THEOREM 5.1. Algorithm Randomized-Subtree-Isomorphism solves the decision problem in $O(k^{\omega-1}n)$ expected time.

Since $\omega < 2.376$ [5], we have

COROLLARY 5.2. Algorithm Randomized-Subtree-Isomorphism solves the decision problem in $O(k^{1.376}n)$ expected time.
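The algebraic idea behind this section can be illustrated on a small scale in Python. The sketch below is an assumption-laden simplification, not Cheriyan's algorithm: it only estimates the size of a maximum matching in a bipartite graph as the rank over $Z_q$ of a single off-diagonal block filled with random nonzero entries (the block is indexed here with rows for $X$ and columns for $Y$), it does not compute null-space bases or critical vertices, and it uses plain cubic-time Gaussian elimination rather than fast matrix multiplication. The function name, the edge-list input, and the choice $q = 2^{61} - 1$ are choices made for this example. The returned rank never exceeds the true matching size and equals it with high probability.

```python
import random


def matching_size_by_rank(s, t, edges, q=(1 << 61) - 1, rng=random):
    """Rank over Z_q of a random s-by-t matrix supported on `edges`.

    edges: pairs (i, j) with 0 <= i < s and 0 <= j < t; q must be prime.
    """
    rows = [[0] * t for _ in range(s)]
    for i, j in edges:
        rows[i][j] = rng.randrange(1, q)  # uniform in {1, ..., q - 1}
    rank = 0
    for col in range(t):
        # Look for a pivot in this column among the rows not yet used.
        pivot = next((r for r in range(rank, s) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, q)  # modular inverse (Python 3.8+)
        for r in range(rank + 1, s):
            factor = rows[r][col] * inv % q
            if factor:
                rows[r] = [(a - factor * b) % q for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank
```

In the decision procedure, a vertex $w$ would be placed in $S(v, u)$ when the corresponding matching size equals $|X_w|$; because the rank can only underestimate, a wrong answer of this sketch is always a false negative.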
ACKNOWLEDGMENT

We are grateful to the anonymous referee for a careful reading of the manuscript and many helpful comments.
REFERENCES

1. S. Anderson, Graphical representation of molecules and substructure-search queries in MACCS, J. Mol. Graphics 2 (1984), 8-90.
2. M. Behzad, G. Chartrand, and L. Lesniak-Foster, “Graphs & Digraphs,” Wadsworth, Belmont, CA, 1979.
3. J. Cheriyan, Randomized $\tilde{O}(M(|V|))$ algorithms for problems in matching theory, SIAM J. Comput. 26 (1997), 1635-1655.
4. M. J. Chung, $O(n^{2.5})$ time algorithms for the subgraph homeomorphism problem on trees, J. Algorithms 8 (1987), 106-112.
5. D. Coppersmith and S. Winograd, Matrix multiplication via arithmetic progressions, J. Symbolic Comput. 9 (1990), 23-52.
6. A. Dessmark, A. Lingas, and A. Proskurowski, Faster algorithms for subgraph isomorphism of k-connected partial k-trees, in “Proceedings, Fourth European Symposium on Algorithms (ESA 96),” pp. 501-513, Lecture Notes in Computer Science, Springer-Verlag, Berlin/New York, 1996.
7. E. A. Dinic, An algorithm for solution of a problem of maximum flow in a network with power estimation, Soviet Math. Doklady 11 (1970), 1277-1280.
8. S. Even, “Graph Algorithms,” Computer Science Press, Rockville, MD, 1979.
9. T. Feder and R. Motwani, Clique partitions, graph compression and speeding-up algorithms, in “Proceedings, 23rd Symposium on the Theory of Computing (STOC 91),” pp. 123-33, Am. Chem. Soc., Washington, DC, 1991.
10. M. R. Garey and D. S. Johnson, “Computers and Intractability: A Guide to the Theory of NP-Completeness,” Freeman, San Francisco, 1979.
11. P. B. Gibbons, R. M. Karp, G. L. Miller, and D. Soroker, Subtree isomorphism is in random NC, Discrete Appl. Math. 29 (1990), 35-62.
12. J. E. Hopcroft and R. M. Karp, An $n^{5/2}$ algorithm for maximum matching in bipartite graphs, SIAM J. Comput. 2 (1973), 225-231.
13. J. E. Hopcroft and R. E. Tarjan, Isomorphism of planar graphs, in “Complexity of Computer Computations” (R. E. Miller and J. W. Thatcher, Eds.), pp. 131-152, Plenum, New York, 1972.
14. O. H. Ibarra, S. Moran, and R. Hui, A generalization of the fast LUP matrix decomposition algorithm and applications, J. Algorithms 3 (1982), 45-46.
15. M. Karpinski and A. Lingas, Subtree isomorphism is NC reducible to bipartite perfect matching, Inform. Process. Lett. 30, No. 1 (1989), 27-32.
16. A. Lingas, Subgraph isomorphism for biconnected outerplanar graphs in cubic time, Theoret. Comput. Sci. 63 (1989), 295-302.
17. A. Lingas and M. M. Syslo, A polynomial-time algorithm for subgraph isomorphism of two-connected series-parallel graphs, in “Proceedings, 15th Int. Colloq. Automata, Languages and Programming,” pp. 394-409, Lecture Notes in Computer Science, Vol. 317, Springer-Verlag, Berlin/New York, 1988.
18. L. Lovasz and M. D. Plummer, “Matching Theory,” North-Holland, Amsterdam, 1986.
19. J. Matoušek and R. Thomas, On the complexity of finding iso- and other morphisms for partial k-trees, Discrete Math. 108 (1992), 343-364.
20. D. W. Matula, An algorithm for subtree identification, SIAM Rev. 10 (1968), 273-274.
21. D. W. Matula, Subtree isomorphism in $O(n^{5/2})$, Ann. Discrete Math. 2 (1978), 91-106.
22. A. M. Odlyzko, Asymptotic enumeration methods, in “Handbook of Combinatorics” (R. L. Graham, M. Grotschel, and L. Lovasz, Eds.), Vol. 2, pp. 1063-1229, Elsevier and MIT Press, Cambridge, MA, 1995.
23. R. Shamir and D. Tsur, The maximum subforest problem: Approximation and exact algorithms, in “Proceedings, Ninth Symposium on Discrete Algorithms (SODA 98),” pp. 394-399, Am. Chem. Soc., Washington, DC, 1998.
24. R. E. Stobaugh, Chemical substructure searching, J. Chem. Inform. Comput. Sci. 25 (1985), 271-275.
25. R. E. Tarjan, Depth-first search and linear graph algorithms, SIAM J. Comput. 1 (1972), 146-160.