Information Processing Letters 103 (2007) 14–18 www.elsevier.com/locate/ipl
On Chen and Chen’s new tree inclusion algorithm

Hai-Lung Cheng ∗, Biing-Feng Wang

Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan 30043, Republic of China

Received 24 August 2006; received in revised form 27 January 2007
Available online 16 February 2007
Communicated by L. Boasson
Abstract

Very recently, Chen and Chen [Y. Chen, Y. Chen, A new tree inclusion algorithm, Information Processing Letters 98 (2006) 253–262] gave a new algorithm for the tree inclusion problem, which requires O(|T| × min{depth(P), |leaves(P)|}) time and no extra space. In this note, we show that there are flaws in their time-complexity analysis by presenting two counterexamples. We also give an example to show that the worst-case time complexity of their algorithm is non-polynomial. Consequently, the asymptotically most efficient algorithm for the tree inclusion problem is still the earlier algorithm in [W. Chen, More efficient algorithm for ordered tree inclusion, Journal of Algorithms 26 (1998) 370–385].

© 2007 Elsevier B.V. All rights reserved.

Keywords: Trees; Tree inclusion; Algorithms
1. Introduction

An ordered labeled tree is a tree whose nodes are labeled and in which the left-to-right order among siblings is significant. Given two ordered labeled trees T and P, the tree inclusion problem [1–6] is to determine whether P can be obtained from T by deleting some nodes. Deleting a node v in T means making the children of v become children of the parent of v and then removing v. If P can be obtained from T by deleting some nodes, we say that T includes P or, equivalently, that P is embedded in T.

Very recently, Chen and Chen [2] proposed a new algorithm for the tree inclusion problem, which requires O(|T| × min{depth(P), |leaves(P)|}) time and no extra space. In this note, we show that there are flaws in their time-complexity analysis by presenting two counterexamples. We also give an example to show that the worst-case time complexity of their algorithm is non-polynomial.

* Corresponding author. Tel.: +886 3 5742806.
E-mail addresses: [email protected] (H.-L. Cheng), [email protected] (B.-F. Wang).
0020-0190/$ – see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.ipl.2007.02.006

2. Chen and Chen’s algorithm

Let T be an ordered labeled tree. The root of T is denoted by root(T), and the label of each node v in T is denoted by label(v). The subtrees rooted at the children of root(T) are called the subtrees of T. For ease of presentation, a tree with root t and subtrees T1, T2, ..., Tk is denoted by ⟨t; T1, T2, ..., Tk⟩. An ordered forest is a sequence of ordered trees. Let G = (P1, P2, ..., Pq) be an ordered forest. If q = 1, G = P1 is a tree. If q > 1, for convenience, we also regard it as a tree ⟨pv; P1, P2, ..., Pq⟩, where pv denotes a virtual node.
function top–down-process(T, G)
Input: T = ⟨t; T1, T2, ..., Tk⟩ and G = ⟨p; P1, P2, ..., Pq⟩ (* p may or may not be a virtual node *)
Output: the maximum number j ≥ 0 such that T includes the sub-forest (P1, P2, ..., Pj)
begin
1   if root(G) is virtual
2   then {if (|T| < |P1| + |P2| or p has only one child)
3         then G ← P1
4         else {j ← bottom–up-process(T, G);   (* handling Case 3.1 *)
5               if (j = 0 and label(t) = label(root(P1)))   (* handling Case 3.2 *)
6               then {change root(P1) to a virtual node; x ← bottom–up-process(T, P1);
7                     if (x = (the number of subtrees of P1)) then j ← 1 else j ← 0}
8               return j}}
9   if |T| < |G| then return 0
10  else {if (label(t) = label(p))   (* handling Case 2 *)
11        then {p ← virtual node;
12              j ← bottom–up-process(T, G);
13              if (j = (the number of children of p)) then return 1 else return 0}
14        else {if t is a leaf then return 0;   (* handling Case 1 *)
15              i ← 1;
16              while (i ≤ k) do
17              {if top–down-process(Ti, G) > 0 then return 1;
18               i ← i + 1}
19              return 0}}
end

function bottom–up-process(T, G) (* p is a virtual node *)
Input: T = ⟨t; T1, T2, ..., Tk⟩ and G = ⟨p; P1, P2, ..., Pq⟩
Output: the maximum number j ≥ 0 such that T1, T2, ..., Tk include the sub-forest (P1, P2, ..., Pj)
begin
1   j ← 0; i ← 1;
2   while (j < q and i ≤ k) do
3   {x ← top–down-process(Ti, G);
4    j ← j + x; G ← ⟨p; Pj+1, Pj+2, ..., Pq⟩; i ← i + 1}
5   return j
end
Thus, root(G) returns root(P1) if q = 1, and returns pv otherwise.

Given an ordered tree T = ⟨t; T1, T2, ..., Tk⟩ and an ordered forest G = (P1, P2, ..., Pq), Chen and Chen’s tree inclusion algorithm attempts to find the maximum number j ≥ 0 such that T includes the sub-forest (P1, P2, ..., Pj). If j = q, G is embedded in T; otherwise, only the sub-forest (P1, P2, ..., Pj) is embedded in T. Let p1 be the root of P1. In the algorithm, three cases are considered.

Case 1: root(G) ≠ pv (i.e., G = P1 and root(G) = p1) and label(p1) ≠ label(t). If G is embedded in T, there must exist a subtree Ti that includes G. Thus, the algorithm returns j = 1 if such an inclusion can be found, and returns j = 0 otherwise.

Case 2: root(G) ≠ pv and label(p1) = label(t). Let P1 = ⟨p1; P11, P12, ..., P1l⟩. If G is embedded in T, there must exist two increasing sequences $(k_1, k_2, \ldots, k_g)$ and $(l_1, l_2, \ldots, l_g)$ of integers such that $l_g = l$ and each $T_{k_i}$, $1 \le i \le g$, includes the sub-forest $(P_{1(l_{i-1}+1)}, P_{1(l_{i-1}+2)}, \ldots, P_{1 l_i})$, where $l_0 = 0$. Thus, if two such sequences exist, the algorithm returns j = 1; otherwise, it returns j = 0.

Case 3: root(G) = pv. Let j be the maximum number such that (P1, P2, ..., Pj) is embedded in T. There are two sub-cases to be considered when looking for j.

Case 3.1: This sub-case is similar to Case 2: there exist two increasing sequences $(k_1, k_2, \ldots, k_g)$ and $(l_1, l_2, \ldots, l_g)$ such that $l_g = j \ge 1$ and each $T_{k_i}$, $1 \le i \le g$, includes the sub-forest $(P_{l_{i-1}+1}, P_{l_{i-1}+2}, \ldots, P_{l_i})$.

Case 3.2: label(p1) = label(t) and the subtrees of T include the subtrees of P1. In this sub-case, j = 1.

The algorithm consists of two functions: top–down-process and bottom–up-process. Given two ordered labeled trees T and P, whether T includes P is determined by performing top–down-process(T, P). If the function call returns 1, T includes P; otherwise, T does not include P.
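To cross-check answers on small inputs, the definition of inclusion from Section 1 can be tested directly. The sketch below is a brute-force reference check only: it is not Chen and Chen's algorithm, nor the algorithms of [1,3], and it makes no claim to efficiency. The tuple encoding of trees and the names `forest_includes` and `includes` are assumptions introduced for illustration.

```python
from functools import lru_cache

# ASSUMPTION: a tree is encoded as (label, (child_1, ..., child_k)),
# and an ordered forest as a tuple of such trees.

@lru_cache(maxsize=None)
def forest_includes(ts: tuple, ps: tuple) -> bool:
    """True if forest ps can be obtained from forest ts by deleting nodes."""
    if not ps:
        return True            # delete every remaining node of ts
    if not ts:
        return False           # nothing left to match the first tree of ps
    (t_label, t_kids), t_rest = ts[0], ts[1:]
    # Option 1: delete the first root of ts; its children take its place.
    if forest_includes(t_kids + t_rest, ps):
        return True
    # Option 2: keep the first root of ts. It is first in preorder, so it
    # can only correspond to the root of the first tree of ps.
    (p_label, p_kids), p_rest = ps[0], ps[1:]
    return (t_label == p_label
            and forest_includes(t_kids, p_kids)
            and forest_includes(t_rest, p_rest))

def includes(T: tuple, P: tuple) -> bool:
    """True if tree T includes tree P."""
    return forest_includes((T,), (P,))

leaf = lambda label: (label, ())
P = ("a", (leaf("b"), leaf("c")))
T_yes = ("a", (("x", (leaf("b"),)), leaf("c")))   # deleting x yields P
T_no = ("a", (leaf("c"), leaf("b")))              # sibling order differs
print(includes(T_yes, P), includes(T_no, P))
```

Such a checker can serve as an oracle when experimenting with the functions described in this section, since it depends only on the definition of node deletion.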
3. Time complexity

For each node v in T, the subtree rooted at v is denoted by T[v]. For each node u in P, the subtree rooted at u is denoted by P[u]. For any node v in T and any node u in P, we say that u is checked against v if a function call top–down-process(T[v], G) is invoked such that u is the root of the first tree in the forest G. Suppose that a function call top–down-process(T, G) is invoked. For each node v in T, let G(v) be the sequence of nodes in G that are checked against v during the execution of the function call. We say that two nodes u1 and u2 in G are on the same path if they have an ancestor/descendant relation. In [2], Chen and Chen proved that the running time of top–down-process(T, G) is O(|T| × depth(G)) based upon the following proposition.
Proposition 1. (See [2].) Let G(v) = (u1, u2, ..., us). Then, (1) ui ≠ uj if i ≠ j, and (2) all ui are on the same path.

Proposition 1 indicates that for any node v in T, no node in G is checked against v more than once and, further, that the nodes checked against v are on the same path. In the following, we show that Proposition 1 is incorrect by presenting two counterexamples. The first shows that a node in G may appear in G(v) more than once. The second shows that G(v) may contain two nodes that are not on the same path.

We begin with the first counterexample. Let T and P be the trees depicted in Fig. 1. Consider the operation of top–down-process(T, P). Since label(t) = label(p), Case 2 holds. Thus, the problem is reduced to checking whether T[v1] and T[v] can include the forest (P[u1], P[u]). Therefore, top–down-process(T, P) recursively calls top–down-process(T[v1], (P[u1], P[u])) first. Since the call returns 0, it then calls top–down-process(T[v], (P[u1], P[u])). This call also returns 0. Thus, top–down-process(T, P) returns 0, indicating that T does not include P. Note that top–down-process(T, P) invokes these two calls via bottom–up-process(T, (P[u1], P[u])). The execution of top–down-process(T, P) described above is shown in Fig. 2(a).

Let us further discuss the recursive call top–down-process(T[v1], (P[u1], P[u])). Since root((P[u1], P[u])) = pv, Case 3 holds. Thus, it first checks whether the subtree T[v2] can include (P[u1], P[u]) by calling top–down-process(T[v2], (P[u1], P[u])) (Case 3.1). Since the call returns 0 and label(v1) = label(u1), it then calls top–down-process(T[v2], P[u2]) to see whether T[v2] can include P[u2] (Case 3.2). Fig. 2(b) is obtained from Fig. 2(a) by expanding the recursive call top–down-process(T[v1], (P[u1], P[u])) according to the above discussion.

Let us expand the two recursive calls top–down-process(T[v2], (P[u1], P[u])) and top–down-process(T[v2], P[u2]) in Fig. 2(b) one step further. First, consider the call top–down-process(T[v2], (P[u1], P[u])). Since |T[v2]| < |P[u1]| + |P[u]|, according to lines 2–3 of top–down-process, this call is equivalent to top–down-process(T[v2], P[u1]). Since Case 2 holds, top–down-process(T[v3], P[u2]) is invoked. Next, consider the call top–down-process(T[v2], P[u2]). Since Case 1 holds, top–down-process(T[v3], P[u2]) is invoked. Fig. 2(c) shows the above further expansion. From Fig. 2(c), we observe that u2 is checked against v3 twice.

Fig. 1. A counterexample, in which u2 is checked against v3 twice.

Fig. 2. Recursive expansions of top–down-process on the trees T and P in Fig. 1, in which we write top–down in place of top–down-process for short. Panels (a), (b), (c).

Fig. 3. A counterexample, in which u3 and u4 are checked against v5, but they are not on the same path.

Fig. 4. Recursive expansion of top–down-process on the trees T and P in Fig. 3.

Fig. 5. An example for which the running time of top–down-process(T, P) is non-polynomial.

Fig. 6. The subtree $P[u_{i+1}]$ and the forest $G_i$.
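The call sequence of the first counterexample can be replayed with a small Python transliteration of the two functions. This is a sketch under stated assumptions, not the authors' code: the `Node` class, the `virtual` helper, and the `checked` counter are hypothetical names, and the concrete trees below are a made-up instance chosen to satisfy the conditions stated in the text (label(t) = label(p), label(v1) = label(v2) = label(u1), and |T[v2]| < |P[u1]| + |P[u]|), since Fig. 1 itself is not reproduced here. Comments refer to the line numbers of the pseudocode in Section 2.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    name: str                  # identity, used only for logging (hypothetical)
    label: Optional[str]       # None marks a virtual node
    kids: List["Node"] = field(default_factory=list)

def size(x: Node) -> int:      # only ever called on real (non-virtual) trees
    return 1 + sum(size(c) for c in x.kids)

def virtual(kids) -> Node:     # forest regarded as a tree with virtual root
    return Node("pv", None, list(kids))

checked = Counter()            # (v, u) -> number of times u is checked against v

def top_down(T: Node, G: Node) -> int:
    first = G.kids[0] if G.label is None else G
    checked[(T.name, first.name)] += 1
    if G.label is None:                                    # line 1
        if len(G.kids) == 1 or size(T) < size(G.kids[0]) + size(G.kids[1]):
            G = G.kids[0]                                  # lines 2-3
        else:
            j = bottom_up(T, G)                            # line 4, Case 3.1
            P1 = G.kids[0]
            if j == 0 and T.label == P1.label:             # line 5, Case 3.2
                x = bottom_up(T, virtual(P1.kids))         # line 6
                j = 1 if x == len(P1.kids) else 0          # line 7
            return j                                       # line 8
    if size(T) < size(G):                                  # line 9
        return 0
    if T.label == G.label:                                 # line 10, Case 2
        j = bottom_up(T, virtual(G.kids))                  # lines 11-12
        return 1 if j == len(G.kids) else 0                # line 13
    for Ti in T.kids:                                      # lines 14-18, Case 1
        if top_down(Ti, G) > 0:
            return 1
    return 0                                               # line 19

def bottom_up(T: Node, G: Node) -> int:
    j, i, rest = 0, 0, list(G.kids)
    while rest and i < len(T.kids):                        # line 2
        x = top_down(T.kids[i], virtual(rest))             # line 3
        j, rest, i = j + x, rest[x:], i + 1                # line 4
    return j

# ASSUMPTION: hypothetical instance consistent with the description above.
T = Node("t", "a", [Node("v1", "b", [Node("v2", "b", [Node("v3", "d")])]),
                    Node("v", "e")])
P = Node("p", "a", [Node("u1", "b", [Node("u2", "c")]), Node("u", "f")])
result = top_down(T, P)
print(result, checked[("v3", "u2")])   # prints "0 2"
```

On this instance the call returns 0 and the pair (v3, u2) is recorded twice, once through the Case 3.1 branch and once through the Case 3.2 branch, which matches the expansion described for Fig. 2(c).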
The second counterexample is depicted in Fig. 3. A recursive expansion of top–down-process on the trees T and P in Fig. 3 is depicted in Fig. 4. In the expansion, there are two function calls, top–down-process(T[v5], (P[u3], P[u4])) and top–down-process(T[v5], P[u4]). Thus, both u3 and u4 are checked against v5, but they are not on the same path.

In the remainder of this section, we give an example to show that the worst-case time complexity of Chen and Chen’s tree inclusion algorithm is non-polynomial. Consider the trees T and P depicted in Fig. 5, where n and m are integers with n ≥ m ≥ 1. In the following, we show that the number of times that u is checked against v during the execution of top–down-process(T, P) is $C^n_m$, which is the number of ways of selecting m objects from n distinct objects.

For 1 ≤ i ≤ m, let $G_i$ be the ordered forest obtained from $P[u_{i+1}]$ by replacing $u_{i+1}$ with pv. (See Fig. 6.) Consider the operation of top–down-process(T, P) on the trees in Fig. 5. Since label(t) = label($u_{m+1}$), Case 2 holds. Thus, top–down-process(T, P) calls top–down-process($T[v_n]$, $G_m$) first. Let us count the number of times that u is checked against v during the execution of top–down-process($T[v_n]$, $G_m$). Denote the number by f(n, m). Clearly, f(1, 1) = 1. If n = m, since $|T[v_n]| = |P[u_m]| < |G_m|$, it is easy to see that f(n, m) = f(n − 1, m − 1). Thus, we conclude that f(n, m) = 1 for n = m.

Assume that n > m. Let us expand top–down-process($T[v_n]$, $G_m$) one step further. Since Case 3 holds, top–down-process($T[v_n]$, $G_m$) first calls top–down-process($T[v_{n-1}]$, $G_m$) (Case 3.1). Since the call returns 0 and label($v_n$) = label($u_m$), it then calls top–down-process($T[v_{n-1}]$, $G_{m-1}$) (Case 3.2). Fig. 7 illustrates the above expansion. According to the expansion, we have f(n, m) = f(n − 1, m) + f(n − 1, m − 1), from which it is concluded that f(n, m) = $C^n_m$. Therefore, the worst-case time complexity of top–down-process(T, P) is non-polynomial.

Fig. 7. Recursive expansion of top–down-process($T[v_n]$, $G_m$) with n > m.

The algorithms in [1,3] use, respectively, O(|leaves(P)| min{depth(T), |leaves(T)|}) and O(|T||P|) space. Chen and Chen tried to design an efficient algorithm without using extra space. Although they did not succeed, their work still suggests a possible direction for further study.
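The recurrence f(n, m) = f(n − 1, m) + f(n − 1, m − 1) with f(n, n) = 1 can be checked numerically against the binomial coefficient. The sketch below is illustrative only. It assumes the boundary value f(n, 0) = 1, which is not stated in the text but is needed to close the recurrence at m = 1 (it is the usual boundary of Pascal's rule); the function name `f` simply mirrors the notation above.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def f(n: int, m: int) -> int:
    """Count given by the recurrence in the text, for n >= m >= 0."""
    if m == 0:
        return 1          # ASSUMPTION: boundary value, not stated in the text
    if n == m:
        return 1          # f(n, m) = 1 for n = m, as derived in the text
    return f(n - 1, m) + f(n - 1, m - 1)

# The recurrence reproduces the binomial coefficients on a small range.
agrees = all(f(n, m) == comb(n, m)
             for n in range(1, 15) for m in range(1, n + 1))
print(agrees, f(20, 10))   # prints "True 184756"

# With n = 2m the count is at least 2**m, since each factor (m + i) / i
# in the product form of C(2m, m) is at least 2 for 1 <= i <= m.
print(all(f(2 * m, m) >= 2 ** m for m in range(1, 12)))
```

Choosing n = 2m therefore already gives a count that grows exponentially in m, which is the sense in which the number of checks is not bounded by a polynomial in this family of instances.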
References

[1] W. Chen, More efficient algorithm for ordered tree inclusion, Journal of Algorithms 26 (1998) 370–385.
[2] Y. Chen, Y. Chen, A new tree inclusion algorithm, Information Processing Letters 98 (2006) 253–262.
[3] P. Kilpeläinen, H. Mannila, Ordered and unordered tree inclusion, SIAM Journal on Computing 24 (1995) 340–356.
[4] D.E. Knuth, The Art of Computer Programming, vol. 1, Addison-Wesley, Reading, MA, 1969.
[5] H. Mannila, K.-J. Raiha, On query languages for the p-string data model, in: H. Kangassalo, S. Ohsuga, H. Jaakola (Eds.), Information Modelling and Knowledge Bases, IOS Press, Amsterdam, 1990, pp. 469–482.
[6] J. Matoušek, R. Thomas, On the complexity of finding iso- and other morphisms for partial k-trees, Discrete Mathematics 108 (1992) 343–364.