Journal of Algorithms 59 (2006) 107–124 www.elsevier.com/locate/jalgor
Efficient algorithms for a constrained k-tree core problem in a tree network ✩ Biing-Feng Wang a,∗ , Shietung Peng b , Hong-Yi Yu a , Shan-Chyun Ku a a Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan 30043, Republic of China b Faculty of Computer and Information Sciences, Hosei University, Tokyo 184-8584, Japan
Received 13 January 2003 Available online 19 February 2005
Abstract

Let T = (V, E) be a free tree in which each vertex has a weight and each edge has a length. Let n = |V|. Given T and parameters k and l, a (k, l)-tree core is a subtree X of T with diameter at most l, having k leaves, which minimizes the sum of the weighted distances from all vertices in T to X. In this paper, two efficient algorithms are presented for finding a (k, l)-tree core of T. The first algorithm has O(n²) time complexity for the case that each edge has an arbitrary length. The second algorithm has O(lkn) time complexity for the case that the lengths of all edges are 1. The (k, l)-tree core problem has an application in distributed database systems.

© 2005 Elsevier Inc. All rights reserved.

Keywords: Algorithms; Trees; Cores; Diameters; Centers
✩ Work supported by the National Science Council of the Republic of China under grants NSC-92-2213-E-007006 and NSC-93-2752-E-007-004-PAE.
* Corresponding author (B.-F. Wang). Fax: 886-3-5723694.
0196-6774/$ – see front matter © 2005 Elsevier Inc. All rights reserved. doi:10.1016/j.jalgor.2004.12.002
B.-F. Wang et al. / Journal of Algorithms 59 (2006) 107–124
1. Introduction

The problem of optimally locating a service facility in a network has long been of considerable interest. Owing to the variety of facilities and of optimality criteria, many location problems have been defined and studied [7–9,12–16,18,22–25]. The facility can be a point, a set of points, a path, a tree, or a forest in the network. The optimality criteria most extensively studied in the literature are the minimum cumulative distance, in which the sum of the distances from all vertices to the facility is minimized, and the minimum eccentricity, in which the distance from the farthest vertex to the facility is minimized. These location problems have applications in computer science, information science, and operations research.

Motivated by practical considerations, many researchers have turned to constrained versions of existing location problems. Location problems with a length constraint, which limits the total edge length of the selected facility, were studied in [12,13,16,23]. Location problems with an eccentricity constraint, which limits the eccentricity of the selected facility, were studied in [2,21]. A location problem with a diameter constraint, which limits the longest distance between vertices of the selected facility, was studied in [10]. In a conditional location problem, the selected facility must include a given subset of vertices; many such problems have been defined and studied [3–5,11].

Let T = (V, E) be a free tree in which each v ∈ V has a nonnegative weight w(v) and each edge has a nonnegative length. Let n = |V|. Let d(α, β) be the distance function in T, where α and β are either vertices or subtrees of T.
In this paper, we consider the following optimization problem: Given T and parameters k and l, identify a subtree X that satisfies the following: (1) X has precisely k leaves; (2) X has diameter at most l; and (3) the weighted cumulative distance from V to X, namely Σ_{v∈V} w(v)d(v, X), is minimized. The subtree X is called a (k, l)-tree core of T. This problem is a constrained version of the k-tree core problem [15], in which the constraint limits the diameter of the selected k-tree core. If l exceeds the diameter of T, then the (k, l)-tree core problem is just the k-tree core problem. Peng, Stephens, and Yesha [15] first presented O(n log n) and O(kn) time algorithms for the k-tree core problem. Later, Shioura and Uno [17] improved their results to O(n) time.

The main contribution of this paper is to show that the (k, l)-tree core problem can be solved efficiently. More specifically, we give two algorithms for constructing a (k, l)-tree core. The first algorithm has O(n²) time complexity for the case that each edge has an arbitrary length. The second algorithm has O(lkn) time complexity for the case that the lengths of all edges are 1. When the values of k and l are small compared to n, the O(lkn) time algorithm is preferred. Recently, Becker et al. [1] gave an O(n² log n)-time algorithm for arbitrary edge lengths and an O(n²)-time algorithm for equal edge lengths.
The (k, l)-tree core problem is motivated by a distributed database application in computer science. Consider the following replicated data placement problem in a distributed tree network T with n vertices. The tree T is edge-unweighted and vertex-unweighted, and each edge in T may fail with probability p. Users can log in at random vertices of the network. In order to increase the availability of a data object, we assume that it has been replicated and that k copies of it exist on the network. Let x_1, x_2, ..., x_k be the vertices holding the copies of the data object, and let X be the subtree induced by {x_1, x_2, ..., x_k}. A write operation for the replicated data has to be propagated to all copies of the data object. For data consistency, once a copy x_i is updated, all the other copies should be updated within a short period of time. Thus, it is reasonable to restrict the diameter of X within a bound l. Let P(X) be the probability of a successful read operation from a random login site. We wish to place the copies on T in such a way as to maximize P(X). Stephens, Yesha, and Humenik [19] showed that for p sufficiently close to zero, P(X) is maximized by those subtrees X which minimize Σ_{v∈V} d(v, X), and this minimum must occur when all copies x_i are leaves of X. Therefore, the optimal placement of the replicated data in T can be solved by finding a (k, l)-tree core of T.

The remainder of this paper is organized into four sections. In Section 2, we give necessary notation, definitions, and some preliminary results. In Section 3, we present the algorithm for arbitrary edge lengths. In Section 4, we present the algorithm for equal edge lengths. Finally, in Section 5, we give concluding remarks.
2. Preliminaries

A free tree is a connected, acyclic, undirected graph. Let T = (V, E) be a free tree, where V is the vertex set and E is the edge set. Let V = {1, 2, ..., n}. Each v ∈ V has a nonnegative vertex weight w(v). The weight of T is w(T) = Σ_{v∈V} w(v). Each e ∈ E has a nonnegative length. If the lengths of all edges are 1, then T is edge-unweighted; otherwise, it is edge-weighted. For any two vertices u, v ∈ V, let P(u, v) be the unique path from u to v, and let d(u, v) be its length. The diameter of T is max_{u,v∈V} d(u, v). For each v ∈ V, let N(v) be the set of vertices adjacent to v. The degree of a vertex v ∈ V is |N(v)|. A leaf of T is a vertex with degree one.

For a subtree X of T, the vertex set and edge set of X are denoted, respectively, by V(X) and E(X). For notational simplicity, given a vertex v ∈ V, the subtree having vertex set {v} and edge set ∅ is simply denoted by v. For a vertex v ∈ V and a subtree X of T, the distance from v to X is d(v, X) = min_{u∈V(X)} d(v, u). For two subtrees X and Y of T, the weighted cumulative distance from Y to X is D(Y, X) = Σ_{v∈V(Y)} w(v)d(v, X). If Y = T, we simply write D(X) in place of D(Y, X).

The center of T is a point of T for which the distance from the farthest vertex is minimized. In this paper, the tree T is regarded as a set of points in the plane, and the center is a point in such a tree which need not be a vertex. The center of a tree T is unique and can be found in O(n) time [7,9]. A (k, l)-tree core of T is a subtree X with diameter at most l, having k leaves (leaves of X need not be leaves of T), which minimizes D(X).

For A, B ⊆ V, define the set difference A\B = {v | v ∈ A and v ∉ B}. For S ⊆ V, the subtree induced by S, denoted by ⟨S⟩, is the minimum subtree in T whose vertices include
all the vertices in S. By definition, a vertex v ∈ V is in ⟨S⟩ if and only if v is on P(x, y) for some pair of vertices x, y ∈ S.

When T is oriented into a rooted tree, the parent of a vertex v ∈ V is denoted by p(v). Let r ∈ V be a vertex. Let T_r be the tree obtained by making T rooted at r. The height of T_r is max_{v∈V} d(r, v). Let v be a vertex in T_r. We denote the subtree rooted at v by T_r(v). If v ∈ N(r), T_r(v) is called a subtree of r. In the following, some preliminary results are presented.

Lemma 1 ([14]). In O(n) time, we can compute w(T_v(u)) and D(v) for all v ∈ V and all (u, v) ∈ E.

Proof. For any (u, v) ∈ E, we have w(T_v(u)) + w(T_u(v)) = w(T) and D(v) = D(u) + d(v, u)(w(T_v(u)) − w(T_u(v))). From these two equations, it is easy to compute all w(T_v(u)) and D(v) in linear time by first orienting T into a rooted tree (with an arbitrary root) and then performing top-down and bottom-up computations on it. □

For any u, v ∈ V, the distance saving of P(u, v), denoted by δ(P(u, v)), is D(u) − D(P(u, v)). We have the following lemma.

Lemma 2 ([16]). Let (u_1, u_2, ..., u_t) be the path in T_r from a vertex u_1 to a descendant u_t. Then, we have δ(P(u_1, u_t)) = Σ_{1≤i≤t−1} d(u_i, u_{i+1}) w(T_r(u_{i+1})).

Proof. Since P(u_1, u_{i+1}), 1 ≤ i ≤ t − 1, is the union of P(u_1, u_i) and the edge (u_i, u_{i+1}), it is easy to see that δ(P(u_1, u_{i+1})) = δ(P(u_1, u_i)) + d(u_i, u_{i+1}) w(T_r(u_{i+1})). Combining this and δ(P(u_1, u_1)) = 0, we conclude the lemma immediately. □

The following lemma shows that the concept of distance saving is very useful for computing the cumulative distance from T to a given subtree.

Lemma 3 ([17]). Let P(u, v) be a path in T. Let X be any subtree of T which intersects P(u, v) only at u. Let X′ be the subtree obtained from X by adding P(u, v). Then, D(X′) = D(X) − δ(P(u, v)).

Proof. Let u′ be the second vertex on the path from u to v. A vertex i ∈ V(T_u(u′)) has d(i, X) = d(i, u) and d(i, X′) = d(i, P(u, v)), whereas a vertex i ∉ V(T_u(u′)) has d(i, X′) = d(i, X) and d(i, P(u, v)) = d(i, u). We have

\[
\begin{aligned}
D(X) - D(X') &= \sum_{i \in V} w(i)\,\{d(i, X) - d(i, X \cup P(u, v))\} \\
&= \sum_{i \in V(T_u(u'))} w(i)\,\{d(i, u) - d(i, P(u, v))\} + \sum_{i \notin V(T_u(u'))} w(i)\,\{d(i, X) - d(i, X)\} \\
&= \sum_{i \in V(T_u(u'))} w(i)\,\{d(i, u) - d(i, P(u, v))\} + \sum_{i \notin V(T_u(u'))} w(i)\,\{d(i, u) - d(i, u)\} \\
&= \sum_{i \in V} w(i)\,\{d(i, u) - d(i, P(u, v))\} \\
&= D(u) - D(P(u, v)) = \delta(P(u, v)).
\end{aligned}
\]

Thus, the lemma holds. □
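The two equations in the proof of Lemma 1 translate directly into a two-pass computation. The following is a minimal sketch, not the authors' code: the function name `all_cumulative_distances`, the 0-based vertex labels, and the `(u, v, length)` edge format are assumptions made for illustration.

```python
from collections import defaultdict

def all_cumulative_distances(n, edges, w):
    """Return (D, sub) for a tree on vertices 0..n-1.

    edges: list of (u, v, length); w: list of vertex weights.
    D[v] is the weighted cumulative distance from all vertices to v.
    sub[(v, u)] is w(T_v(u)): the weight on u's side of the edge (u, v).
    """
    adj = defaultdict(list)
    for u, v, length in edges:
        adj[u].append((v, length))
        adj[v].append((u, length))

    # Orient the tree at vertex 0; `order` lists parents before children.
    parent, plen, order = {0: None}, {0: 0}, []
    stack = [0]
    while stack:
        u = stack.pop()
        order.append(u)
        for v, length in adj[u]:
            if v != parent[u]:
                parent[v], plen[v] = u, length
                stack.append(v)

    # Bottom-up pass: subtree weights and D restricted to each subtree.
    below = list(w)
    down = [0] * n
    for u in reversed(order):
        p = parent[u]
        if p is not None:
            below[p] += below[u]
            down[p] += down[u] + plen[u] * below[u]

    total = below[0]
    sub = {}
    D = [0] * n
    D[0] = down[0]
    # Top-down pass: D(v) = D(u) + d(u, v) * (w(T_v(u)) - w(T_u(v))).
    for v in order[1:]:
        u = parent[v]
        sub[(u, v)] = below[v]            # w(T_u(v))
        sub[(v, u)] = total - below[v]    # w(T_v(u))
        D[v] = D[u] + plen[v] * (sub[(v, u)] - sub[(u, v)])
    return D, sub
```

With `sub` available, the sum in Lemma 2 can be evaluated along any root-to-descendant path by adding `length * sub[(u_i, u_{i+1})]` edge by edge.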
3. The algorithm for arbitrary edge lengths

A subtree of T is l-maximal if it has diameter ≤ l and any larger subtree containing it has diameter > l. Let M be the set of all l-maximal subtrees of T. Any subtree within an l-maximal subtree has diameter ≤ l. Clearly, there is a subtree in M that contains a (k, l)-tree core of T. Therefore, the (k, l)-tree core problem can be solved as follows. First, we construct M. Second, within each Q ∈ M we find a k-leaf subtree X that minimizes D(X). Finally, we compute a (k, l)-tree core of T from the k-leaf subtrees obtained in the second step.

We now show how to construct M. The following lemma gives an upper bound on the size of M.

Lemma 4. The number of l-maximal subtrees of T is at most n.

Proof. We prove this lemma by induction on n. The base case n = 1 is trivial. Suppose, by induction, that the lemma is true for all values less than n; we will show that the lemma holds for n as well. We arbitrarily select an r ∈ V and orient T into T_r. Let q be a leaf that is farthest from r. We classify the l-maximal subtrees of T into two groups. The first includes those not containing q, and the second includes those containing q. Consider the first group. By the induction hypothesis, the number of l-maximal subtrees in this group is at most n − 1, since all such subtrees are contained in the (n − 1)-vertex tree obtained by removing q from T. Next, consider the second group. Let U = {v | v ∈ V, d(q, v) ≤ l}. Any subtree of T containing both q and a vertex v ∉ U has diameter > l. Therefore, only subtrees whose vertex sets are included in U can be in the second group. In the following we show that the subtree induced by U has diameter ≤ l, from which we conclude that there is only one l-maximal subtree containing q. Consider any pair of vertices u, v ∈ U. Let (x_1, x_2, ..., x_k) be the path from r to q, where x_1 = r and x_k = q. Let x_i be the lowest common ancestor of q and u, and let x_j be the lowest common ancestor of q and v. Without loss of generality, assume that i ≥ j. Since q is a vertex farthest from r, we have d(u, x_i) ≤ d(q, x_i). If i = j, we have d(u, v) ≤ d(u, x_i) + d(x_i, v). If i > j, we have d(u, v) = d(u, x_i) + d(x_i, v). In either case, we have d(u, v) ≤ d(u, x_i) + d(x_i, v) ≤ d(q, x_i) + d(x_i, v) = d(q, v) ≤ l. Therefore, the subtree induced by U has diameter ≤ l, which completes the proof. □
The proof of Lemma 4 also suggests an efficient way to construct M. We arbitrarily select an r ∈ V and orient T into T_r. Initially, set M = ∅. Then, iterate as follows. First, find a leaf q that is farthest from r, determine U = {v | v ∈ V(T_r), d(q, v) ≤ l}, and put ⟨U⟩ into M. Second, if the diameter of the current T_r is larger than l, then remove q from T_r and continue with the next iteration; otherwise, stop. There are |M| iterations. Since each iteration takes O(n) time, the total time is O(n|M|) = O(n²). From the above discussion, we obtain the following.
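The iteration just described can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the name `l_maximal_subtrees` and the nested-dictionary adjacency format are hypothetical, and edge lengths are assumed strictly positive so that a vertex farthest from r is always a leaf of the current tree. The stopping test uses the fact shown in the proof of Lemma 4 that ⟨U⟩ has diameter ≤ l, so U covers the current tree exactly when the current tree has diameter ≤ l.

```python
def l_maximal_subtrees(adj, r, l):
    """adj: {v: {u: length}} for a tree; returns a list of vertex sets,
    one per iteration of the construction of M."""
    alive = set(adj)          # vertices of the current (pruned) tree
    result = []

    def distances(src):
        # Distances from src inside the current tree (one traversal, O(n)).
        dist = {src: 0}
        stack = [src]
        while stack:
            u = stack.pop()
            for v, length in adj[u].items():
                if v in alive and v not in dist:
                    dist[v] = dist[u] + length
                    stack.append(v)
        return dist

    while True:
        from_root = distances(r)
        q = max(from_root, key=from_root.get)   # a farthest vertex from r
        from_q = distances(q)
        U = {v for v, d in from_q.items() if d <= l}
        result.append(U)
        # U equals the whole current tree iff its diameter is <= l.
        if len(U) == len(alive):
            break
        alive.remove(q)
    return result
```

On the unit-length path 0-1-2-3 rooted at 0 with l = 2, the sketch produces the vertex sets {1, 2, 3} and {0, 1, 2}.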
Lemma 5. The set M can be constructed in O(n²) time.

Next, we show how to solve the following problem: find a k-leaf subtree X minimizing D(X) within each Q ∈ M. Consider a fixed Q ∈ M. We transform the above problem into the k-tree core problem as follows. We arbitrarily select an r ∈ V(Q) and orient T into T_r. If we remove Q from T_r, we get some rooted subtrees. Let r_1, r_2, ..., r_m be the roots of the subtrees, and let q_1, q_2, ..., q_m be their parents. We obtain a tree Q′ from T_r by adding w(T_r(r_i)) to w(q_i) and then removing T_r(r_i) for all i = 1, 2, ..., m. Since Q and Q′ have the same topology, for any subtree in Q there is a corresponding subtree in Q′. Let X be a subtree in Q and X′ be its corresponding subtree in Q′. We have

\[
\begin{aligned}
D(T, X) &= \sum_{v \in V(Q)} w(v)\,d(v, X) + \sum_{1 \le i \le m}\ \sum_{v \in V(T_r(r_i))} w(v)\,d(v, X) \\
&= \sum_{v \in V(Q)} w(v)\,d(v, X) + \sum_{1 \le i \le m}\ \sum_{v \in V(T_r(r_i))} w(v)\,\{d(v, q_i) + d(q_i, X)\} \\
&= \sum_{v \in V(Q)} w(v)\,d(v, X) + \sum_{1 \le i \le m} w(T_r(r_i))\,d(q_i, X) + \sum_{1 \le i \le m}\ \sum_{v \in V(T_r(r_i))} w(v)\,d(v, q_i) \\
&= \sum_{v \in V(Q')} w(v)\,d(v, X') + \sum_{1 \le i \le m}\ \sum_{v \in V(T_r(r_i))} w(v)\,d(v, q_i) \\
&= D(Q', X') + \sum_{1 \le i \le m}\ \sum_{v \in V(T_r(r_i))} w(v)\,d(v, q_i).
\end{aligned}
\]

Here the sum over V(Q′) uses the updated vertex weights of Q′.
Since Σ_{1≤i≤m} Σ_{v∈V(T_r(r_i))} w(v)d(v, q_i) is independent of X, the problem of finding a k-leaf subtree X within Q to minimize D(T, X) becomes the problem of finding a k-leaf subtree X′ in Q′ to minimize D(Q′, X′). Therefore, by applying Shioura and Uno's k-tree core algorithm [17] on Q′, finding a k-leaf subtree X minimizing D(X) for a given Q can be done in O(n) time. Consequently, it takes O(n|M|) = O(n²) time for all Q ∈ M. We summarize this result in the following theorem.

Theorem 1. A (k, l)-tree core of an edge-weighted tree T can be found in O(n²) time.

Remark 1. In case l = ∞, we have |M| = 1. Thus, the complexity in Theorem 1 is O(n) for the original k-tree core problem.

Remark 2. Ku et al. [10] imposed a diameter constraint on the well-known p-median problem of a tree and gave an O(n log n) time algorithm for p = 2. So far, no algorithms have been proposed for general p. Following the same strategy as our first algorithm, an efficient algorithm for general p is as follows. For each Q ∈ M, we transform T into Q′ as above and then compute the p-median of Q′ as a candidate by using Tamir's p-median algorithm [20], which requires O(pn²) time. Then, we determine a solution by finding the best among the candidates. The above algorithm takes O(pn²|M|) = O(pn³) time.
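The transformation from T to Q′ only moves vertex weight onto the attachment vertices q_i. The sketch below illustrates it under stated assumptions: the helper names `collapse_outside` and `cumulative_distance` are hypothetical, the tree is given as a nested dictionary of edge lengths, and the k-tree core algorithm of [17] itself is not reproduced.

```python
def collapse_outside(adj, w, Q):
    """Return the vertex weights of Q' (same topology as the subtree Q).

    adj: {v: {u: length}}, w: {v: weight}, Q: vertex set of a subtree.
    Every component hanging off Q adds its total weight to the vertex
    q_i of Q it is attached to.
    """
    w_new = {v: w[v] for v in Q}
    for q in Q:
        for r_i in adj[q]:
            if r_i in Q:
                continue
            # Total weight of the component reached from q through r_i.
            stack, seen, total = [r_i], {q, r_i}, 0
            while stack:
                u = stack.pop()
                total += w[u]
                for v in adj[u]:
                    if v not in seen:
                        seen.add(v)
                        stack.append(v)
            w_new[q] += total
    return w_new


def cumulative_distance(adj, w, X, vertices):
    """Sum of w(v) * d(v, X) over `vertices`, for a subtree X inside them."""
    dist = {x: 0 for x in X}
    stack = list(X)
    while stack:
        u = stack.pop()
        for v, length in adj[u].items():
            if v in vertices and v not in dist:
                dist[v] = dist[u] + length
                stack.append(v)
    return sum(w[v] * dist[v] for v in vertices)
```

For any subtree X inside Q, `cumulative_distance(adj, w, X, set(adj))` and `cumulative_distance(adj, w_new, X, Q)` then differ by the same constant, which is why minimizing over Q′ suffices.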
4. The algorithm for equal edge lengths

Assume that T is edge-unweighted. In this section, we present a new algorithm, which finds a (k, l)-tree core in O(lkn) time. This algorithm is efficient if k and l are small. Let X be a subtree of T with diameter ≤ l and having k leaves. Let c be the unique center of X. For ease of discussion, assume that c is a vertex of T. Clearly, the degree of c in X is at least 2. Let h = l/2. If we orient T into T_c, then X is a subtree rooted at c, with height ≤ h, and having k leaves. Therefore, if the center c of a (k, l)-tree core of T is given, our (k, l)-tree core problem becomes the following rooted (k, h)-tree core problem: Given a rooted tree T_c, identify a subtree X that satisfies the following:

(1) c is the root of X and has degree at least 2;
(2) X has precisely k leaves;
(3) X has height at most h; and
(4) the weighted cumulative distance D(X) is minimized.
Let C = {c | c is the center of P(u, v), where u, v ∈ V and d(u, v) ≤ l}. Since T is edge-unweighted, we can rewrite C = {c | c is a nonleaf vertex in T or the midpoint of an edge in T} and thus |C| = O(n). Handler and Mirchandani [9] showed that the center of any longest path of a tree is the unique center of the tree. Therefore, there exists a point in C that is the center of a (k, l)-tree core of T.

Intuitively, the new algorithm works as follows. Given a free tree T, the algorithm tries to find a rooted (k, h)-tree core of T_c for each c in C. This is done by selecting k vertices among the set of candidates (vertices in T_c whose distance to c is ≤ h and whose children are at distance > h from c). The selection is based on the value of each candidate v, which is the distance saving of a path uniquely associated with v (these paths, to be defined later, are edge-disjoint). Generally speaking, the k vertices with the k largest values are selected (the case that all these k vertices are in the same proper subtree of T_c should be handled separately). Denote the set of these k vertices by G_c. The induced subtree ⟨G_c⟩ is a rooted (k, h)-tree core of T_c. To find G_c for every c in C efficiently, the algorithm uses dynamic programming to compute a table B, where each entry B[i, m, j] of B contains a set of up to k vertices in T_i(j) (the subtree of T_i rooted at j) whose values are the largest among the set of candidates in T_i(j) with respect to h = m. Finally, the algorithm selects the G_c that minimizes D(⟨G_c⟩) to be a (k, l)-tree core of T.

Before presenting details of the new algorithm, we first define some new terminology for a given T_c. Let A be the set of vertices within distance h of c, all of whose children are more than distance h away from c. That is, A = {v | v ∈ V, d(c, v) ≤ h, and d(c, u) > h for every child u of v}. For example, consider the rooted tree T_c in Fig. 1. If h = 3, we have A = {3, 7, 9, 10, 11, 12}.
Clearly, as a consequence of the optimization criterion, all leaves of a rooted (k, h)-tree core of T_c are vertices in A. Therefore, our problem is to select a subset S ⊆ A of k vertices such that the induced tree ⟨S⟩ is a rooted (k, h)-tree core. We call each v ∈ A a candidate (with respect to h). Note that the selected k candidates cannot all be contained in the same subtree of c. Otherwise, c is not contained in the induced tree.
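The candidate set A depends only on the depths in T_c. A minimal sketch, assuming an edge-unweighted tree stored as adjacency lists and a vertex root c (the name `candidate_set` is hypothetical):

```python
def candidate_set(adj, c, h):
    """Return A for the rooted tree T_c and height bound h.

    adj: {v: [neighbours]} of an edge-unweighted tree; c is the root.
    A vertex is a candidate if it lies within distance h of c and all
    of its children lie farther than h from c (leaves qualify trivially).
    """
    depth, parent, order = {c: 0}, {c: None}, [c]
    for u in order:                      # BFS; `order` grows as we iterate
        for v in adj[u]:
            if v != parent[u]:
                parent[v], depth[v] = u, depth[u] + 1
                order.append(v)
    A = set()
    for v in order:
        kids = [u for u in adj[v] if u != parent[v]]
        if depth[v] <= h and all(depth[u] > h for u in kids):
            A.add(v)
    return A
```

For instance, on the tree with edges 0-1, 0-2, 1-3, 1-4, 3-5 rooted at 0, the sketch gives A = {2, 3, 4} for h = 2 and A = {1, 2} for h = 1.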
Fig. 1. An edge-unweighted tree T_c.
Let v ∈ V be a vertex with d(c, v) ≤ h. We say that a candidate u ∈ A ∩ V(T_c(v)) dominates v if and only if for every x ∈ (A ∩ V(T_c(v)))\{u}, either δ(P(v, u)) > δ(P(v, x)), or δ(P(v, u)) = δ(P(v, x)) and u > x. By definition, the vertex dominating v is unique. The path P(v, u) has the maximum distance saving among all paths from v to the vertices in A ∩ V(T_c(v)). Therefore, according to Lemma 3, u is the best choice when only one vertex in T_c(v) can be selected as a leaf of a rooted (k, h)-tree core. Clearly, if a vertex v ∈ V is dominated by a candidate u ∈ A, all vertices on P(u, v) are dominated by u. For each u ∈ A, let π(u) be the farthest ancestor of u that is dominated by it. Our construction of a rooted (k, h)-tree core is based upon a greedy selection of k candidates from A as the leaves. Given u ∈ A, we define its value as

Δ(u) = δ(P(c, u)) if π(u) = c, and
Δ(u) = δ(P(p(π(u)), u)) otherwise.

The values Δ(u) are defined by the distance savings of a set of |A| paths. The set of paths forms a partition of the edges of T. Consider the T_c in Fig. 1. Assume that w(v) = 1 for all v ∈ V. Such a partition for h = 3 is depicted in Fig. 2. In this example, we have (Δ(3), Δ(7), Δ(9), Δ(10), Δ(11), Δ(12)) = (1, 1, 19, 1, 6, 7), which can be easily verified by Lemma 2.

The rank of a candidate u ∈ A, denoted by r(u), is the number of candidates x ∈ A such that either Δ(x) > Δ(u), or Δ(x) = Δ(u) and x ≥ u. In the following, we denote the vertex with rank i in A by a_i, 1 ≤ i ≤ |A|. The following lemma shows that the cumulative distance from T to the subtree induced by {c, a_1, a_2, ..., a_i} can be easily computed by using D(c) and the total value of a_1, a_2, ..., and a_i.

Lemma 6. D(⟨{c, a_1, a_2, ..., a_i}⟩) = D(c) − Σ_{1≤j≤i} Δ(a_j) for all i = 1, 2, ..., |A|.
Fig. 2. The paths that define the values Δ(u).
Proof. Let X_i be ⟨{c, a_1, a_2, ..., a_i}⟩, 1 ≤ i ≤ |A|. Consider the subtree X_1. By definition, a_1 dominates the root c. Thus, we have Δ(a_1) = δ(P(c, a_1)) and D(X_1) = D(c) − δ(P(c, a_1)) = D(c) − Δ(a_1). For i > 1, it is easy to see that X_i differs from X_{i−1} only in P(p(π(a_i)), a_i). Thus, by Lemma 3, D(X_i) = D(X_{i−1}) − δ(P(p(π(a_i)), a_i)) = D(X_{i−1}) − Δ(a_i) for 1 < i ≤ |A|. From the above, it is easy to conclude that D(X_i) = D(c) − Σ_{1≤j≤i} Δ(a_j) for all i = 1, 2, ..., |A|. □

Let t be the smallest integer such that a_1 and a_t are not contained in the same subtree of c. We define

G_c = {a_1, a_2, ..., a_k} if t ≤ k, and
G_c = {a_1, a_2, ..., a_{k−1}, a_t} otherwise.

We remark that G_c maximizes Σ_{u∈S} Δ(u) among all S ⊆ A such that |S| = k and the k candidates in S are not all contained in the same subtree of c. The following lemma shows that the cumulative distance from T to the subtree induced by G_c can be easily computed by using D(c) and the total value of the candidates in G_c.

Lemma 7. D(⟨G_c⟩) = D(c) − Σ_{u∈G_c} Δ(u).

Proof. If t ≤ k, ⟨{c, a_1, a_2, ..., a_k}⟩ and ⟨{a_1, a_2, ..., a_k}⟩ are the same and thus the lemma follows immediately from Lemma 6. Assume that t > k. In this case, ⟨G_c⟩ differs from ⟨{c, a_1, a_2, ..., a_{k−1}}⟩ only in P(c, a_t). Thus, D(⟨G_c⟩) = D(⟨{c, a_1, a_2, ..., a_{k−1}}⟩) − δ(P(c, a_t)), which by Lemma 6 is D(c) − Σ_{1≤j≤k−1} Δ(a_j) − δ(P(c, a_t)). Let T_c(u) be the subtree of c that contains a_1. Since a_t has the highest rank among all the candidates that are not contained in T_c(u), we have p(π(a_t)) = c. Thus, Δ(a_t) = δ(P(c, a_t)). Therefore, D(⟨G_c⟩) = D(c) − Σ_{u∈G_c} Δ(u), which completes the proof. □
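The definitions of domination, Δ, rank, and G_c can be exercised directly on a single rooted tree. The following sketch is illustrative only and is not the table-based method of this paper: it assumes that c is a vertex, that edges have unit length, that k ≥ 2, and that vertex labels are comparable integers; the name `greedy_rooted_core` is hypothetical. The second returned value uses the formula of Lemma 7.

```python
def greedy_rooted_core(adj, w, c, h, k):
    """Return (G_c, D(<G_c>)) for the rooted tree T_c, or None if no k
    candidates spanning at least two subtrees of c exist.

    adj: {v: [neighbours]}, w: {v: weight}, unit edge lengths.
    """
    # Root the tree at c (BFS order lists parents before children).
    parent, depth, order = {c: None}, {c: 0}, [c]
    for u in order:
        for v in adj[u]:
            if v != parent[u]:
                parent[v], depth[v] = u, depth[u] + 1
                order.append(v)
    sub = dict(w)                                   # sub[v] = w(T_c(v))
    for v in reversed(order[1:]):
        sub[parent[v]] += sub[v]

    # best[v] = (saving of P(v, u), u) for the candidate u dominating v.
    best, delta = {}, {}
    for v in reversed(order):
        if depth[v] > h:
            continue
        kids = [u for u in adj[v] if u != parent[v] and depth[u] <= h]
        if not kids:                                # v is a candidate
            best[v] = (0, v)
            continue
        options = [(sub[u] + best[u][0], best[u][1]) for u in kids]
        best[v] = max(options)                      # ties: larger label wins
        for saving, u in options:                   # paths that start at v
            if u != best[v][1]:
                delta[u] = saving
    delta[best[c][1]] = best[c][0]                  # a_1 dominates c

    top = {}                                        # subtree of c holding v
    for v in order[1:]:
        top[v] = v if parent[v] == c else top[parent[v]]

    ranked = sorted(delta, key=lambda u: (delta[u], u), reverse=True)
    if len(ranked) < k:
        return None
    # 0-based position of a_t: first candidate outside a_1's subtree of c.
    t = next((i for i, u in enumerate(ranked)
              if top[u] != top[ranked[0]]), None)
    if t is None:
        return None
    G = ranked[:k] if t < k else ranked[:k - 1] + [ranked[t]]
    D_c = sum(w[v] * depth[v] for v in order)
    return set(G), D_c - sum(delta[u] for u in G)
```

On the unit-weight tree with edges 0-1, 0-2, 1-3, 1-4, 3-5 rooted at c = 0 with h = 2, the values are Δ(3) = 6, Δ(4) = 1, Δ(2) = 1. For k = 2 the two best candidates 3 and 4 lie in the same subtree of c, so the sketch swaps in a_t = 2 and returns ({2, 3}, 2).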
Our construction of a rooted (k, h)-tree core for a given T_c mainly depends on the following theorem, whose proof is lengthy and thus is given in Appendix A.

Theorem 2. ⟨G_c⟩ is a rooted (k, h)-tree core of T_c.

Now, we present the new algorithm for finding a (k, l)-tree core of T. The algorithm first uses dynamic programming to compute a table B in bottom-up fashion. Then, it uses the table to compute G_c for each c ∈ C efficiently. Finally, it finds a (k, l)-tree core of T by determining the G_c that minimizes D(⟨G_c⟩).

To describe the algorithm, we need a few more definitions. First, we generalize the terms candidate and value to the case that neither the root of the tree nor the height of the subtrees to be considered is fixed. The root of the tree can be any vertex i in T and the subtrees considered are those of height m ≤ ⌊h + 1/2⌋. For a given pair of i and m, we define A(i, m) = {v | v ∈ V(T_i), d(i, v) ≤ m, and d(i, u) > m for every child u of v}. The value of a candidate u ∈ A(i, m), denoted by Δ(i, m, u), is defined similarly to Δ(u). For ease of description, for each j ∈ N(i), we further define A(i, m)[j] = A(i, m) ∩ V(T_i(j)). Since the length of edge (i, j) is 1, a vertex u ∈ V(T_i(j)) is a candidate of T_i with respect to h = m if and only if it is a candidate of T_j with respect to h = m − 1. That is, for any vertex u ∈ V(T_i(j)), u ∈ A(i, m) if and only if u ∈ A(j, m − 1). The following lemma shows that we can compute the value Δ(i, m, u) for each u ∈ A(i, m)[j] easily by using the value Δ(j, m − 1, u).

Lemma 8. Let i be a vertex in T and j ∈ N(i). We have
(1) A(i, m)[j] = ∪_{v≠i, v∈N(j)} A(j, m − 1)[v]; and
(2) for each u ∈ A(i, m)[j],
    Δ(i, m, u) = Δ(j, m − 1, u) + w(T_i(j)) if u dominates j, and
    Δ(i, m, u) = Δ(j, m − 1, u) otherwise.

Proof. By comparing trees T_i and T_j, we can see that T_i(j) is the rooted tree obtained from T_j by pruning T_j(i). (See Fig. 3.) Therefore, the lemma follows immediately from the definitions of A and Δ. We remark that δ(P(i, j)) = w(T_i(j)) since d(i, j) = 1. □

For any subset S ⊆ A(i, m), the rank of a candidate u ∈ A(i, m) in S, denoted by r(u, S), is the number of candidates x ∈ S such that either Δ(i, m, x) > Δ(i, m, u), or Δ(i, m, x) = Δ(i, m, u) and x ≥ u. For each j ∈ N(i), there is an entry B[i, m, j] that stores a set of up to k candidates as well as their values. The candidates stored in B[i, m, j] are those of rank at most k in A(i, m)[j]. That is, B[i, m, j] = {(u, Δ(i, m, u)) | u ∈ A(i, m)[j] and r(u, A(i, m)[j]) ≤ k}.

Our computation of table B uses standard dynamic programming. A recursive formula for B[i, m, j] is given as follows. When m = 1, A(i, m)[j] = {j}. Thus,

B[i, 1, j] = {(j, δ(P(i, j)))} = {(j, w(T_i(j)))}.

If m > 1 and j is a leaf, we have

B[i, m, j] = {(j, w(j))}.
Fig. 3. (a) T_i and (b) T_j.
If m > 1 and j is not a leaf, we can recursively compute

B[i, m, j] = {(α*, β* + w(T_i(j)))} ∪ {(α, β) | (α, β) ∈ Y, 2 ≤ r(α, Y) ≤ k},

where Y = ∪_{v≠i, v∈N(j)} B[j, m − 1, v] and (α*, β*) ∈ Y is the pair with r(α*, Y) = 1. The above recursive formula says that the best k candidates in T_i(j) for T_i with respect to h = m are the best k candidates in Y. The correctness follows directly from Lemma 8. For example, letting k = 2 and T be the tree depicted in Fig. 4, some entries of B are listed in Table 1. Consider the computation of B[1, 2, 4]. We have Y = ∪_{v≠1, v∈N(4)} B[4, 1, v] = {(5, 1), (6, 6), (7, 3), (8, 5)}. The vertex 6 has r(6, Y) = 1. Thus, α* = 6. Since w(T_1(4)) = 16, B[1, 2, 4] = {(6, 6 + 16)} ∪ {(8, 5)} = {(6, 22), (8, 5)}.

Finding the largest k elements in a given set can be done efficiently by using a linear time selection algorithm. Thus, for each i, m, and j, it takes O(k|N(j)|) time to compute B[i, m, j] according to the above recursive formulas. Therefore, the time for constructing table B is Σ_{1≤m≤⌊h+1/2⌋} Σ_{(i,j)∈E} O(k|N(j)|) = O(lnkg), where g is the
Fig. 4. An edge-unweighted tree T , in which the weight of every vertex is 1.
Table 1. An example of constructing table B

(i, j)  | B[i, 1, j] | B[i, 2, j]
(1, 2)  | {(2, 3)}   | {(10, 4), (9, 1)}   /* α* = 10, Y = {(9, 1), (10, 1)} */
(1, 3)  | {(3, 1)}   | {(3, 1)}            /* 3 is a leaf */
(1, 4)  | {(4, 16)}  | {(6, 22), (8, 5)}   /* α* = 6, Y = {(5, 1), (6, 6), (7, 3), (8, 5)} */
(2, 9)  | {(9, 1)}   | {(9, 1)}            /* 9 is a leaf */
(2, 10) | {(10, 1)}  | {(10, 1)}           /* 10 is a leaf */
(4, 5)  | {(5, 1)}   | {(5, 1)}            /* 5 is a leaf */
(4, 6)  | {(6, 6)}   | {(12, 10), (11, 1)} /* α* = 12, Y = {(11, 1), (12, 4)} */
(4, 7)  | {(7, 3)}   | {(14, 4), (13, 1)}  /* α* = 14, Y = {(13, 1), (14, 1)} */
(4, 8)  | {(8, 5)}   | {(17, 7), (16, 1)}  /* α* = 17, Y = {(15, 1), (16, 1), (17, 2)} */
maximum degree of the vertices in T. Next, we show how to reduce the time for constructing table B. The key to constructing table B efficiently is the following lemma.

Lemma 9. Let Z_i, 1 ≤ i ≤ s, be s (> 1) sets of distinct numbers such that |Z_i| = k for all i and Z_i ∩ Z_j = ∅ for all i ≠ j. Let W be the set of the largest 2k numbers in ∪_{1≤i≤s} Z_i. For each q, 1 ≤ q ≤ s, let Y_q be the set of the largest k numbers in ∪_{i≠q, 1≤i≤s} Z_i. Then, we have Y_q ⊂ W for all q.
Proof. Consider a fixed Y_q. Assume y ∈ Y_q. By definition, there are at most k − 1 numbers in ∪_{i≠q, 1≤i≤s} Z_i that are larger than y. Since |Z_q| = k, there are at most 2k − 1 numbers in ∪_{1≤i≤s} Z_i that are larger than y. Thus, y ∈ W and the lemma holds. □

Using Lemma 9, table B can be constructed efficiently by the following subroutine.

Subroutine TABLE(T, k, h).
Input: an edge-unweighted tree T and parameters k and h
Output: table B
begin
    for each (i, j) ∈ E do B[i, 1, j] ← {(j, w(T_i(j)))};
    for m ← 2 to ⌊h + 1/2⌋ do
        for each j ∈ V do
        begin
            W ← {(α, β) | (α, β) ∈ Z, 1 ≤ r(α, Z) ≤ 2k}, where Z = ∪_{v∈N(j)} B[j, m − 1, v];
            for each i ∈ N(j) do
                if j is a leaf then B[i, m, j] ← {(j, w(j))}
                else begin
                    Y ← W \ B[j, m − 1, i];
                    (α*, β*) ← the pair (α, β) ∈ Y with r(α, Y) = 1;
                    B[i, m, j] ← {(α*, β* + w(T_i(j)))} ∪ {(α, β) | (α, β) ∈ Y, 2 ≤ r(α, Y) ≤ k};
                end
        end
    return (B)
end

Lemma 10. Subroutine TABLE has O(lkn) time complexity.

Proof. We consider fixed m and j first. It takes O(k|N(j)|) time to find W. For each i ∈ N(j), finding the k smallest-rank elements in W \ B[j, m − 1, i] and updating their values takes O(k) time. Thus, it takes Σ_{i∈N(j)} O(k) = O(k|N(j)|) time for all i ∈ N(j). Therefore, we conclude that the time complexity of the subroutine is Σ_{1≤m≤⌊(l+1)/2⌋} Σ_{j∈V} O(k|N(j)|) = O(lkn). □

Next, we show how to compute G_c for a fixed c ∈ C using table B. Assume first that c ∈ V. Let W = ∪_{v∈N(c)} B[c, ⌊h⌋, v]. We compute a_1 as the candidate in W having rank 1, a_k as the candidate in W having rank k, K as the set of candidates in W having ranks at most k, and a_t as the candidate in W \ B[c, ⌊h⌋, u] with the highest rank, where u is the vertex in N(c) such that a_1 ∈ V(T_c(u)). Then, G_c = K if a_t ∈ K, and G_c = (K\{a_k}) ∪ {a_t} otherwise. By using a linear time selection algorithm, the above computation of G_c requires O(k|N(c)|) time. Next, assume that c is the midpoint of an edge (i, j) ∈ E.
Fig. 5. (a) T_c, and (b) T_i and T_j.
Let h′ = ⌊h + 1/2⌋. In this case, N(c) = {i, j} and G_c can be found from B[i, h′, j] and B[j, h′, i] similarly in O(k) time according to the following lemma, which shows that the values of the candidates for T_c with respect to height h can be obtained from the values of the candidates of T_i and T_j with respect to height h′.

Lemma 11. Let c ∈ C be the midpoint of an edge (i, j). Let h′ = ⌊h + 1/2⌋. Let α_1* and α_2* be the candidates of highest rank in A(i, h′)[j] and A(j, h′)[i], respectively. We have
(1) A(c, h)[i] = A(j, h′)[i];
(2) for each u ∈ A(c, h)[i],
    Δ(c, h, u) = Δ(j, h′, u) if u ≠ α_2*, and
    Δ(c, h, u) = Δ(j, h′, u) − (1/2)w(T_j(i)) otherwise;
(3) A(c, h)[j] = A(i, h′)[j]; and
(4) for each u ∈ A(c, h)[j],
    Δ(c, h, u) = Δ(i, h′, u) if u ≠ α_1*, and
    Δ(c, h, u) = Δ(i, h′, u) − (1/2)w(T_i(j)) otherwise.

Proof. By comparing tree T_c with trees T_i and T_j (see Fig. 5), it is easy to see that the formulas in (1)–(4) are true. □
Thus, in the case that c is the midpoint of an edge, the time for constructing G_c from B is O(k). A (k, l)-tree core is the ⟨G_c⟩ that minimizes D(⟨G_c⟩) over all c ∈ C. Using the values in B, the value of D(⟨G_c⟩) for every c ∈ C can be easily computed by using Lemma 7. Therefore, a (k, l)-tree core of T can be constructed from B in Σ_{c∈(C∩V)} O(k|N(c)|) + Σ_{c∈(C\V)} O(k) = O(kn) time.

Theorem 3. A (k, l)-tree core of an edge-unweighted tree T can be found in O(lkn) time.
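Subroutine TABLE of this section can be sketched in Python as follows. This is a sketch under stated assumptions rather than a reference implementation: the name `build_table`, the adjacency-list input, and the representation of each entry as a rank-ordered list of `(vertex, value)` pairs are choices made here, and sorting is used in place of linear-time selection, so the running time of the sketch carries an extra logarithmic factor compared with Lemma 10.

```python
def build_table(adj, w, k, m_max):
    """Return B as a dict: B[i, m, j] is a list of (vertex, value) pairs,
    rank 1 first, for every ordered edge (i, j) and 1 <= m <= m_max.

    adj: {v: [neighbours]} of an edge-unweighted tree; w: {v: weight}.
    """
    # Subtree weights for an arbitrary root, used to get w(T_i(j)).
    root = next(iter(adj))
    parent, order = {root: None}, [root]
    for u in order:
        for v in adj[u]:
            if v != parent[u]:
                parent[v] = u
                order.append(v)
    below = dict(w)
    for v in reversed(order[1:]):
        below[parent[v]] += below[v]
    total = below[root]

    def side(i, j):                       # w(T_i(j)) for an edge (i, j)
        return below[j] if parent[j] == i else total - below[i]

    def ranked(pairs):                    # larger value first; ties: larger vertex
        return sorted(pairs, key=lambda p: (p[1], p[0]), reverse=True)

    B = {}
    for j in adj:
        for i in adj[j]:
            B[i, 1, j] = [(j, side(i, j))]
    for m in range(2, m_max + 1):
        for j in adj:
            Z = [p for v in adj[j] for p in B[j, m - 1, v]]
            W = ranked(Z)[:2 * k]         # enough for every i, by Lemma 9
            for i in adj[j]:
                if len(adj[j]) == 1:      # j is a leaf
                    B[i, m, j] = [(j, w[j])]
                    continue
                drop = {u for u, _ in B[j, m - 1, i]}
                Y = [p for p in W if p[0] not in drop]
                (a, b), rest = Y[0], Y[1:k]
                # Only the top candidate dominates j and gains edge (i, j).
                B[i, m, j] = [(a, b + side(i, j))] + rest
    return B
```

Calling `build_table(adj, w, k, m_max)` with `m_max` set to ⌊h + 1/2⌋ yields the entries needed to assemble G_c for every center c as described above.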
5. Concluding remarks

Another interesting optimization problem is to find a $k$-tree core of bounded total edge-length. Consider the replicated data placement problem described in Section 1 again. Assume that the MST (minimum-spanning-tree) write policy [26] is used. In the MST write, the writer propagates the data object along the edges of the subtree induced by $\{r, x_1, x_2, \ldots, x_k\}$, where $r$ is the site of the writer. Since a write operation is expensive, it is reasonable to restrict the total edge-weight of $X$ within a bound $l$. Suppose that we wish to place the copies on $T$ in such a way as to minimize the average communication cost of a random write operation, which can be specified as $(1/n)\sum_{v \in V} d(v, X)$ plus the total edge-length of $X$. Since the copies $x_i$ should be kept as far apart as possible to increase their availability and the lengths of all edges are 1, $X$ should be the same. Thus, with the assumptions given above, the optimal placement of the replicated data in $T$ becomes the problem of finding a $k$-tree core of bounded total edge-length.

In the following, we prove that this optimization problem on an edge-weighted tree is NP-hard by reducing the partition problem to it. In the partition problem, we are given a set $A$ of $n$ positive integers and we want to determine whether or not there exists a subset $A' \subseteq A$ such that $|A'| = n/2$ and $\sum_{a \in A'} a = m/2$, where $m = \sum_{a \in A} a$. For the NP-completeness of the partition problem, we refer the interested reader to [6]. Given an instance $A = \{a_1, a_2, \ldots, a_n\}$ of the partition problem, we construct the following edge-weighted tree $T = (V, E)$: $V = \{c, v_1, v_2, \ldots, v_n\}$; $E = \{(c, v_1), (c, v_2), \ldots, (c, v_n)\}$; and $d(c, v_i) = a_i$, $1 \le i \le n$. Each $v \in V$ has $w(v) = 1$. Let $X$ be a $k$-tree core of $T$ with bounded total edge-length $l$, where $k = n/2$ and $l = m/2$. It is easy to see that there is a subset $A'$ of $A$ such that $|A'| = n/2$ and $\sum_{a \in A'} a = m/2$ if and only if $D(X) = m/2$.
Therefore, the problem of finding a $k$-tree core of bounded total edge-length on an edge-weighted tree is NP-hard. We remark that the above proof holds only for edge-weighted trees. The problem of finding a $k$-tree core with bounded total edge-length on an edge-unweighted tree may be worth further research.
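The construction used in the reduction is simple enough to write down directly. The following Python sketch builds the star tree from a partition instance. The function name `partition_to_star` and the dictionary layout are hypothetical, and the sketch assumes that both $n$ and $m$ are even so that $k = n/2$ and $l = m/2$ are integers. It only constructs the instance and does not solve the core problem.

```python
def partition_to_star(A):
    """Build the star-tree instance of the reduction from a partition
    instance A (a list of positive integers).

    Assumption of this sketch: len(A) and sum(A) are both even.
    """
    n, m = len(A), sum(A)
    if n % 2 or m % 2:
        raise ValueError("this sketch assumes n and m are even")
    center = "c"
    leaves = [f"v{i}" for i in range(1, n + 1)]
    # Edge (c, v_i) has length a_i.
    length = {(center, v): a for v, a in zip(leaves, A)}
    # Every vertex has weight 1.
    weight = {v: 1 for v in [center] + leaves}
    return {
        "V": [center] + leaves,
        "length": length,
        "weight": weight,
        "k": n // 2,   # number of leaves required in the core
        "l": m // 2,   # bound on the total edge-length
    }
```

For example, `partition_to_star([3, 1, 2, 2])` yields a star with four leaves, `k = 2`, and `l = 4`.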
Acknowledgments

The authors are grateful to the anonymous referees for their valuable comments, which greatly improved the presentation of this paper.
Appendix A. Proof of Theorem 2

For each $S \subseteq A$, let $X(S)$ be the subtree induced by $S \cup \{c\}$. Note that for any $S \subseteq A$, $X(S)$ is a rooted tree having root $c$ and leaf set $S$. Let $\Gamma = \{S \mid S \subseteq A, |S| = k\}$. The key to proving Theorem 2 is the following.

Lemma A.1. $\{a_1, a_2, \ldots, a_k\}$ minimizes $D(X(S))$ among all $S \in \Gamma$.

Proof. Let $y$ be the largest integer such that there exists a subset $Q \in \Gamma$ with $Q \supseteq \{a_1, a_2, \ldots, a_y\}$ and $D(X(Q)) = \min_{S \in \Gamma} D(X(S))$. We prove this lemma by showing that $y = k$. The proof is by contradiction. Suppose that $y < k$. Let $f$ be the vertex in $X(Q)$ that is closest to $a_{y+1}$. Two cases are discussed: (1) $f$ is dominated by $a_{y+1}$ and (2) $f$ is not dominated by $a_{y+1}$.

Case 1. $f$ is dominated by $a_{y+1}$. Since $f \in V(X(Q))$, $X(Q)$ has a leaf $s$ that is a proper descendant of $f$. (See Fig. A.1(a).) Let $Q' = (Q \setminus \{s\}) \cup \{a_{y+1}\}$. (See Fig. A.1(b).) Let $g$ be the vertex in $X(Q')$ that is closest to $s$. Since $f \in V(X(Q'))$, $g$ is a descendant of $f$. Since $X(Q')$ can be obtained from $X(Q)$ by removing $P(g, s)$ and adding $P(f, a_{y+1})$, we have $D(X(Q')) = D(X(Q)) + \delta(P(g, s)) - \delta(P(f, a_{y+1}))$. Since $f$ is dominated by $a_{y+1}$, $\delta(P(f, a_{y+1})) \ge \delta(P(f, s)) \ge \delta(P(g, s))$. Consequently, $D(X(Q')) \le D(X(Q))$ and thus $D(X(Q')) = \min_{S \in \Gamma} D(X(S))$. The set $Q' \supseteq \{a_1, a_2, \ldots, a_{y+1}\}$, which contradicts the definition of $y$.

Case 2. $f$ is not dominated by $a_{y+1}$. Let $s$ be any vertex in $Q \setminus \{a_1, a_2, \ldots, a_y\}$. Let $g$ be the vertex in $X(Q \setminus \{s\})$ that is closest to $s$. (See Fig. A.2(a).) Let $g'$ be the second vertex on $P(g, s)$. Note that since $X(Q \setminus \{s\})$ contains $c$, $g$ is on $P(c, s)$ and $g'$ is a child of $g$. Clearly, $s$ is the unique leaf of $X(Q)$ that is contained in $T_c(g')$; otherwise, $g$ is not the vertex closest to $s$ in $X(Q \setminus \{s\})$. Thus, none of $a_1, a_2, \ldots, a_y$ are contained in $T_c(g')$. Also, $T_c(g')$ does not contain $a_{y+1}$; otherwise, since $g' \in V(X(Q))$, $f$ is a descendant of $g'$ and is dominated
Fig. A.1. Case 1 in the proof of Lemma A.1. (a) A part of $X(Q)$. (b) A part of $X(Q')$.
Fig. A.2. Case 2 in the proof of Lemma A.1. (a) A part of $X(Q)$. (b) A part of $X(Q')$.
by $a_{y+1}$. Let $s' \in A$ be the candidate that dominates $g'$. From the above discussion, the rank $r(s') > y + 1$ and thus $\delta(P(g, s)) \le \Delta(s') \le \Delta(a_{y+1})$. Since $f$ is not dominated by $a_{y+1}$, $\delta(P(f, a_{y+1})) \ge \Delta(a_{y+1}) \ge \delta(P(g, s))$. Let $Q' = (Q \setminus \{s\}) \cup \{a_{y+1}\}$. (See Fig. A.2(b).) We have $D(X(Q')) = D(X(Q)) + \delta(P(g, s)) - \delta(P(f, a_{y+1})) \le D(X(Q))$. Therefore, $D(X(Q')) = \min_{S \in \Gamma} D(X(S))$, which contradicts the definition of $y$. From the above discussion, we have $y = k$ and therefore the lemma holds. □

The proof of Theorem 2 is now presented as follows. Let $\Gamma^* = \{S \mid S \in \Gamma,\ S \not\subset V(T_c(u)) \text{ for any } u \in N(c)\}$. Note that $G_c \in \Gamma^*$ and, for any $S \in \Gamma^*$, the subtree induced by $S$ and $X(S)$ are the same. In the following, we prove the theorem by showing that $D(G_c) \le D(S)$ for any $S \in \Gamma^*$. Let $u \in N(c)$ be the vertex such that $T_c(u)$ contains $a_1$. Recall that $t$ is the smallest integer such that $a_t \notin V(T_c(u))$.

Assume first that $t \le k$. In this case, $G_c = \{a_1, a_2, \ldots, a_k\}$. By Lemma A.1 and $\Gamma^* \subseteq \Gamma$, we obtain $D(G_c) = \min_{S \in \Gamma} D(X(S)) \le \min_{S \in \Gamma^*} D(S)$. Therefore, the theorem holds for $t \le k$.

Next, assume that $t > k$. In this case, $G_c$ is the union of $X(\{a_1, a_2, \ldots, a_{k-1}\})$ and $P(c, a_t)$. Thus, $D(G_c) = D(X(\{a_1, a_2, \ldots, a_{k-1}\})) - \delta(P(c, a_t))$. Consider a fixed $S \in \Gamma^*$. Select an arbitrary $s \in S \setminus V(T_c(u))$. Let $S' = S \setminus \{s\}$. Clearly, the subtree induced by $S$ is the union of $X(S')$ and some sub-path of $P(c, s)$. Thus, $D(S) \ge D(X(S')) - \delta(P(c, s))$. By Lemma A.1, $\{a_1, a_2, \ldots, a_{k-1}\}$ minimizes $D(X(Q))$ for all $Q \subseteq A$ of $k - 1$ candidates. Thus, $D(X(\{a_1, a_2, \ldots, a_{k-1}\})) \le D(X(S'))$. By the definition of $t$, we have $\delta(P(c, a_t)) \ge \delta(P(c, s))$. Therefore,
$$D(G_c) = D(X(\{a_1, a_2, \ldots, a_{k-1}\})) - \delta(P(c, a_t)) \le D(X(S')) - \delta(P(c, s)) \le D(S),$$
which completes the proof.
References

[1] R.I. Becker, I. Lari, G. Storchi, A. Scozzari, Efficient algorithms for finding the (k, l)-core of tree networks, Networks 40 (4) (2002) 208–251.
[2] R.I. Becker, Y.-I. Chiang, I. Lari, A. Scozzari, The centdian path problem on tree networks, in: Proceedings of the 12th International Symposium on Algorithms and Computation, in: Lecture Notes in Comput. Sci., vol. 2223, Springer, 2001, pp. 743–755.
[3] O. Berman, D. Simchi-Levi, Conditional location problems on networks, Transport. Sci. 24 (1) (1990) 77–78.
[4] Z. Drezner, Conditional p-center problems, Transport. Sci. 23 (1989) 51–53.
[5] Z. Drezner, On the conditional p-median problem, Comput. Oper. Res. 22 (1995) 525–530.
[6] M.R. Garey, D.S. Johnson, Computers and Intractability, Freeman, 1979.
[7] A.J. Goldman, Optimal center location in simple networks, Transport. Sci. 5 (1971) 212–221.
[8] S.L. Hakimi, Optimal distribution of switching centers in communication networks and some related graph theoretical problems, Oper. Res. 13 (1965) 462–475.
[9] G.Y. Handler, P. Mirchandani, Location on Networks, MIT Press, Cambridge, MA, 1979.
[10] S.-C. Ku, C.-J. Lu, B.-F. Wang, P.-C. Lin, Efficient algorithms for two generalized 2-median problems on trees, in: Proceedings of the 12th International Symposium on Algorithms and Computation, in: Lecture Notes in Comput. Sci., vol. 2223, Springer, 2001, pp. 768–778.
[11] E. Minieka, Conditional centers and medians on a graph, Networks 10 (1980) 265–272.
[12] E. Minieka, The optimal location of a path or tree in a tree network, Networks 15 (1985) 309–321.
[13] E. Minieka, N.H. Patel, On finding the core of a tree with a specified length, J. Algorithms 4 (1983) 345–352.
[14] C.A. Morgan, P.L. Slater, A linear time algorithm for a core of a tree, J. Algorithms 1 (1980) 247–258.
[15] S. Peng, A.B. Stephen, Y. Yesha, Algorithms for a core and k-tree core of a tree, J. Algorithms 15 (1993) 143–159.
[16] S. Peng, W. Lo, Efficient algorithms for finding a core of a tree with specified length, J. Algorithms 15 (1996) 143–159.
[17] A. Shioura, T. Uno, A linear time algorithm for finding a k-tree core, J. Algorithms 23 (1997) 281–290.
[18] P.J. Slater, Locating central paths in a network, Transport. Sci. 16 (1) (1982) 1–18.
[19] A.B. Stephens, Y. Yesha, K. Humenik, Optimal allocation for partially replicated database systems on tree-based networks, in: Proceedings of the 11th International Phoenix Conference on Computing and Communication, 1992, pp. 125–129.
[20] A. Tamir, An $O(pn^2)$ algorithm for the p-median and related problems on tree graphs, Oper. Res. Lett. 19 (1996) 59–64.
[21] A. Tamir, D. Perez-Brito, J.A. Moreno-Perez, A polynomial algorithm for the p-centdian problem on a tree, Networks 32 (1998) 255–262.
[22] B.C. Tansel, R.L. Francis, T.J. Lowe, Location on networks: a survey, Management Sci. 29 (1983) 482–511.
[23] B.-F. Wang, Efficient parallel algorithms for optimally locating a path and a tree of a specified length in a weighted tree network, J. Algorithms 34 (2000) 90–108.
[24] B.-F. Wang, Finding a two-core of a tree in linear time, SIAM J. Discrete Math. 15 (2) (2002) 193–210.
[25] B.-F. Wang, S.C. Ku, K.-H. Shi, Cost-optimal parallel algorithms for the tree bisector problem and applications, IEEE Trans. Parallel Distrib. Systems 12 (9) (2001) 888–898.
[26] O. Wolfson, A. Milo, The multicast policy and its relationship to replicated data placement, ACM Trans. Database Systems 16 (1) (1991) 181–205.