ARTICLE IN PRESS
Journal of Computer and System Sciences 69 (2004) 547–561 http://www.elsevier.com/locate/jcss
Approximating the dense set-cover problem Reuven Bar-Yehuda,1 and Zehavit Kehat Computer Science Department, Technion - Israel Institute of Technology, Haifa 32000, Israel Received 25 May 2001; revised 3 March 2003
Abstract We study the set-cover problem, i.e. given a collection C of subsets of a finite set U; find a minimum size subset C 0 DC such that every element in U belongs to at least one member of C: An instance ðC; UÞ of the set-cover problem is k-bounded if the number of occurrences in C of any element is bounded by a constant kX2: We present an approximation algorithm for the k-bounded set-cover problem, that achieves the ratio k pffiffiffiffiffiffi þ oð1Þ; where e is defined as jUj=ðjCj Þ: If e is relatively high, we say that the problem is dense, k kðk1Þ k 1e and this ratio in this case is better than k; which is the best known constant ratio for this problem. In the case that the number of occurrences in C of any element is exactly k ¼ 2 the problem is known as the vertex-cover problem. For dense graphs, our algorithm achieves an approximation ratio better than that of Nagamochi and Ibaraki (Japan J. Indust. Appl. Math. 16 (1999) 369), and the same approximation ratios as Karpinski and Zelikovsky (Proceedings of DIMACS Workshop on Network Design: Connectivity and Facilities Location, Vol. 40, Princeton, 1998, pp. 169–178). In our algorithm we use a combinatorial property of the set-cover problem, which is based on the classical greedy algorithm for the set-cover problem. We use this property to define a ‘‘greedy-sequence’’, which is defined over a given instance of the set-cover problem and its cover. In addition, we show evidence that the ratio we achieve for the e-dense k-bounded set-cover problem is the best constant ratio one can expect. We do this by showing that finding a better constant ratio is as hard as finding a constant ratio better than k for the k-bounded set-cover problem in which the optimal cover is known to be of size at least jCj k : (k is the best known constant ratio for this version of the k-bounded setcover problem.) We show a similar lower bound for the approximation ratio for the vertex-cover problem in e-dense graphs. r 2004 Elsevier Inc. All rights reserved. Keywords: Approximation algorithm; Set-cover; Vertex-cover
Corresponding author. E-mail address:
[email protected] (R. Bar-Yehuda). 1 This research was supported by the fund for the promotion of research at the Technion. 0022-0000/$ - see front matter r 2004 Elsevier Inc. All rights reserved. doi:10.1016/j.jcss.2004.03.006
ARTICLE IN PRESS R. Bar-Yehuda, Z. Kehat / Journal of Computer and System Sciences 69 (2004) 547–561
548
1. Introduction Given a collection C ¼ fs1 ; y; sn g of subsets of a finite set U of size m; a sub-collection C 0 DC is called a set-cover if every element in U belongs to at least one member of C 0 : The minimum setcover problem is defined as follows: Given an instance ðC; UÞ; find a set-cover of minimum cardinality. Throughout this paper we assume that all set cover instances ðC; UÞ are simple, i.e. for every two elements u1 ; u2 AU; there are two sets s1 ; s2 AC such that u1 As1 ; u1 es2 ; u2 As2 and u2 es1 ; otherwise one of these elements can be deleted without changing the problem. We call an instance ðC; UÞ of the set-cover problem k-bounded if the number of occurrences in C of any element is bounded by a constant kX2; and there exists an element which appears in C exactly k times. If the number of occurrences in C is the same for all elements in U we call the instance homogeneous. An instance is called k-homogeneous if it is k-bounded and also homogeneous. Note that the 2-homogeneous set-cover problem is known as the vertex-cover problem (where C and U are the vertex set and the edge set of a graph). Given an instance ðC; UÞ of the set-cover problem, a set-cover C 0 is called an r-approximation, if jC 0 jprjOPTj; where OPT is an optimal set cover for ðC; UÞ: We say that an algorithm has an approximation ratio of r; if for every instance it returns a cover which is an r-approximation. Johnson [6] has presented the best known approximation algorithm for the set-cover problem which has a ratio of 1 þ ln m: The best approximation ratio for the k-bounded set-cover problem ln n due to Halperin [4]. Bar-Yehuda and Even [1], and Hochbaum [5] presented is k ðk1Þln ln n k-approximation algorithms for the k-bounded set-cover problem. k is the best known constant ratio for the k-bounded problem. k Define e ¼ m=nk! : We call a k-bounded instance ðC; UÞ dense if e is relatively high (e.g., m ¼ Oðnk Þ). Formally, given a positive eo1; we call a k-bounded instance of ðC; UÞ e-dense if 2
m=nk! ¼ e: A graph G ¼ ðV; EÞ is called e-dense if jEj=jV2j ¼ e: We present a kðk1Þk pk ffiffiffiffiffiffi -approximation algorithm for the e-dense k-homogeneous set-cover 1e k
problem, and a
k pffiffiffiffiffiffi kðk1Þ k 1e
þ oð1Þ-approximation algorithm for the k-bounded set-cover
problem. As far as we know, there were no previous works for the e-dense k-homogeneous and k-bounded problems. In the special case of e-dense graphs, our algorithm achieves a better approximation ratio than the algorithm presented by Nagamochi and Ibaraki [9], and the same ratio as the one presented by Karpinski and Zelikovsky [7]. The parameter e; which represents the density of the problem, can be defined in other ways. Given a k-bounded instance ðC; UÞ; we assume it is simple, thus the maximum number of elements m is bounded by ðknÞ: Therefore, another definition, possibility more natural, is e ¼ m=ðknÞ: k
We discuss the former definition (e ¼ m=nk! ), as did Karpinski and Zelikovsky [7] with respect to dense graphs. However, we show that when using the latter definition we get similar results (up to an additive factor of oð1Þ). We show evidence that the ratio we achieve for the e-dense k-bounded set-cover problem is the best constant ratio one can expect. We do this by showing that finding a better constant ratio is as hard as finding a constant ratio better than k for the k-bounded set-cover problem in which the optimal cover is known to be at least of size kn: (k is the best known constant ratio for this version
ARTICLE IN PRESS R. Bar-Yehuda, Z. Kehat / Journal of Computer and System Sciences 69 (2004) 547–561
549
of the k-bounded set-cover problem.). This is also true for the case of e-dense graphs, in which our ratio is the best constant ratio, unless there exists an r-approximation algorithm for the vertexcover problem, where r is a constant less than 2. It has been recently shown by Eremeev [3], that the vertex-cover problem in all-over e-dense graphs is NP-hard to approximate within a factor less 7þe 2 : Clearly there is still a gap between the approximation 1þe for this problem, and the than 6þ2e factor which is NP-hard to obtain. We try to measure the hardness of finding better 2 approximation ratios than the ratios 1þe and 2p2ffiffiffiffiffiffi for the vertex-cover problem in all-over e1e dense graphs and in e-dense graphs, respectively. The remainder of the paper is organized as follows: In Section 2 we present a property of a cover in the set cover problem, which we use to define ‘‘the greedy sequence’’. We use this property to construct an approximation algorithm for the k-bounded set-cover problem in Section 3. In Section 4 we analyze the performance of the algorithm when applied to e-dense k-bounded set-cover problem, and we show that the ratio we achieve is the best constant ratio we can expect. Section 5 deals with the special case of the 2-homogeneous set-cover problem, which is the vertexcover problem.
2. Greedy-sequences The classical algorithm for the set-cover problem is the greedy algorithm [6], which in each iteration selects a set that covers a maximal number of uncovered elements. Given ðC; UÞ we define greedy-sort(C; U) as a list of the sets in C sorted according to the order of their selection by the greedy algorithm. Given an instance ðC; UÞ; let k be the maximum number of occurrences of any element in C: Given a cover C 0 DC we define the greedy-sequence of ðC 0 ; C; UÞ as a vector ðnk ; y; n1 Þ: The first nk sets in greedy-sortðC; UÞ are in C 0 and the following set, which we denote by snk þ1 ; is not. Let s˜nk þ1 denote the set of the elements of snk þ1 which are still uncovered when it is selected by the greedy algorithm. Consider the instance in which we need to cover only the elements in s˜nk þ1 ; i.e., ðCk1 ; Uk1 Þ ¼ ðfs-˜snk þ1 j sAC\fsnk þ1 gg; U-˜snk þ1 Þ; which is a ðk 1Þbounded instance. The first nk1 sets in greedy-sortðfs-˜snk þ1 j sAC\fsnk þ1 gg; U-˜snk þ1 Þ are in C 0 and the next set is not. The other ni ’s are defined similarly, and we finally get a 2-bounded instance, ðC2 ; U2 Þ; in which the first n2 sets in greedy-sortðC2 ; U2 Þ are in C 0 and the next set sn2 þ1 is not. Therefore the collection of sets fsj sAC2 and s-˜sn2 þ1 afg is in C 0 : We denote by n1 the size of this collection. In ðC2 ; U2 Þ every element appears in at most two sets, so each of these n1 sets covers at most one element of s˜n2 þ1 ; and therefore j˜sn2 þ1 j ¼ n1 : (See Figs. 1 and 2.) We say that a sequence ðnk ; y; n1 Þ is a possible greedy-sequence of ðC; UÞ; if there exists a setcover C 0 DC such that the sequence ðnk ; y; n1 Þ is the greedy-sequence of ðC 0 ; C; UÞ: We present a procedure called extract-cover, which receives as an input an instance ðC; UÞ and a sequence of numbers ðnk ; y; n2 Þ; and returns a partial cover, denoted by SC; such that ðnk ; y; n2 Þ is the prefix of a possible greedy-sequence of ðC; UÞ: Note that given a sequence nk ; y; n2 and an instance ðC; UÞ the number n1 is defined uniquely, therefore, ðnk ; y; n2 Þ is sufficient when calling Procedure Extract-Cover.
ARTICLE IN PRESS 550
R. Bar-Yehuda, Z. Kehat / Journal of Computer and System Sciences 69 (2004) 547–561
Fig. 1. ‘‘Greedy-Sequences’’—Example. Let ðC; UÞ be an instance of the set-cover problem, where C ¼ fc1 ; y; c10 g; U ¼ fu1 ; y; u7 g; and k ¼ 3: Let C 0 ¼ fc1 ; c5 ; c8 ; c9 g be a possible cover for the instance. The first two sets from C that the greedy algorithm may choose are c1 and c10 : We can see that c1 AC 0 and c10 eC 0 ; and therefore, n3 ¼ 1 in the greedysequence of ðC 0 ; C; UÞ: Following the greedy algorithm on ðC2 ; U2 Þ we can see that the first set to be chosen is c6 : The set c6 is not in C 0 ; thus n2 ¼ 0 and n1 ¼ 2 in the greedy-sequence of ðC 0 ; C; UÞ:
Fig. 2. ‘‘Greedy-Sequences’’—A schematic figure.
Procedure Extract-Cover(C; U; ðnk ; y; n2 Þ) ðCk ; Uk Þ=ðC; UÞ SC ¼ f For i ¼ k downto 2 If ni XjCi j Return ‘‘illegal greedy-sequence’’ SC ¼ SC,the first ni sets in greedy-sortðCi ; Ui Þ sni þ1 ¼ the next set in greedy-sortðCi ; Ui Þ If i42 ðCi1 ; Ui1 Þ ¼ ðs-ðsni þ1 \SCÞjsACi ; Ui -ðsni þ1 \SCÞÞ SC ¼ SC,s j sAC2 and s-ðsn2 þ1 \SCÞaf Return SC
ARTICLE IN PRESS R. Bar-Yehuda, Z. Kehat / Journal of Computer and System Sciences 69 (2004) 547–561
551
In the next section we use this procedure in our main approximation algorithm. We would like to find a lower bound for the size of the partial cover, which is returned by the procedure. The resulting partial cover is of the same size as the sum of the numbers in the input greedy-sequence. Given a homogeneous instance ðC; UÞ of the set-cover problem, we prove that for any possible P k greedy-sequence of ðC; UÞ; there exists a lower bound for ki¼1 ni ; which is a function of e ¼ m=nk! : Lemma 1. Let ðC; UÞ be a homogeneous instance of the set-cover problem and let ðnk ; y; n1 Þ be a possible greedy-sequence of ðC; UÞ: Then k X ffiffiffiffiffiffiffiffiffiffiffi p k ni Xð1 1 eÞn: i¼1
Proof. Let xi denote the number of elements in the set sni þ1 ; which are still uncovered when sniþ1 is chosen by the greedy algorithm, for 2pipk: Let Sk be the first nk sets in greedy-sortðC; UÞ and let UA be the set of elements that are covered by Sk : Denote UB ¼ U\UA : Examine the instance ðC\Sk ; UB Þ: The number of sets in C\Sk is n nk and the maximum size of a set is xk : Also, the k number of occurrences in C\Sk of any element is exactly k; therefore, jUA jpðknÞ ðnn k Þ and P kjUB j ¼ sAC\Sk jsjpxk ðn nk Þ: Thus n n nk 1 ð1Þ þ xk ðn nk Þ: jUj ¼ jUA j þ jUB jp k k k By definition jUi j ¼ xiþ1 ; for 2piok: Let Ck ¼ C; Uk ¼ U and xkþ1 ¼ jUk j: We can bound jUi j similarly to the way we have bounded jUj in (1), therefore, any possible-greedy-sequence, ðnk ; y; n1 Þ; has to satisfy the following, where 2pipk: jCi j ni jCi j xi xiþ1 ¼ jUi jp ð2Þ þ ðjCi j ni Þ i i i 1 0 1 0 ! k k P P k X nj C B n nj C xi Bn p@ ð3Þ nj n Aþ A@ j¼i j¼iþ1 i j¼i i i P P Xk ðn kj¼iþ1 nj Þi ðn kj¼i nj Þi xi p n þ n : ð4Þ j j¼i i! i! i P The inequality from (3) is due to jCi jpn kj¼iþ1 nj ; and due to the following claim: For any x; y; y yz z; k such that 0pzpx and 1pkpx z; if xpy then ðxkÞ ðxz k ÞpðkÞ ð k Þ: The inequality from (4) is due the following claim: For any n; z; k such that 1pkpn z and 0pzpn: ðknÞ k
ðnzÞ n ðnz k Þp k! k! : Both claims can be easily proven by induction on k: Therefore, we get the following constraint for any i such that 2pipk: !i !i ! k k k X X X nj n nj þði 1Þ!xi n nj : i!xiþ1 ¼ i!jUi jp n k
j¼iþ1
j¼i
j¼i
ð5Þ
ARTICLE IN PRESS R. Bar-Yehuda, Z. Kehat / Journal of Computer and System Sciences 69 (2004) 547–561
552
P It follows that in order to bound ki¼1 ni for some possible-greedy-sequence ðnk ; y; n1 Þ; it is sufficient to solve the following optimization problem: Min
k P
ni
i¼1
s:t:
constraint ð5Þ; k P and ni pn
ð6Þ
i¼1
and ni ; xi AR: In the appendix we show that the optimal solution for this optimization problem is ð1 ffiffiffiffiffiffiffiffiffiffiffi p k 1 eÞn: Note that we find an optimal solution for the relaxed problem, where ni ; xi AR; instead of ni ; xi AN: &
3. Approximating the k-bounded set-cover problem In this section we present an approximation algorithm called multi-greedy. This algorithm checks all the possibilities of a prefix ðnk ; y; n2 Þ of a greedy-sequence. For each possibility it calls Procedure Extract-Cover in order to get a partial cover, which is a subset of a cover that has such a greedy-sequence, and extends it to a possible set-cover. Then the algorithm chooses a set-cover of minimum size among all these possibilities. At least one possibility will be a prefix of a greedysequence of some optimal cover of the problem. In this case, Procedure Extract-Cover will return a subset of an optimal cover, whose size is equal to the sum of the numbers in the corresponding greedy-sequence vector.
Algorithm Multi-Greedy(C,U) APPX ¼ C Let Uk ¼ the set of all the elements that belong to exactly k sets For each possible prefix nk ; y; n2 of a greedy-sequence SC ¼ Extract-CoverðC; Uk ; ðnk ; y; n2 ÞÞ S Covered ¼ sASC s Find kSC ¼ a k-approximation for ðC\SC; U\CoveredÞ If jSC,kSCjojAPPXj APPX ¼ SC,kSC Return APPX Let OPT denote some optimal cover of a given k-bounded instance, ðC; UÞ: We define the sequence (nˆ k ; y; nˆ 1 ) as the greedy-sequence of ðOPT; C; Uk Þ: Algorithm Multi-Greedy checks all the possible sequences, therefore, there will be a step in the algorithm when it calls Procedure d denote the values of the c and C^vered Extract-Cover with the sequence (nˆ k ; y; nˆ 2 ). Let SC
ARTICLE IN PRESS R. Bar-Yehuda, Z. Kehat / Journal of Computer and System Sciences 69 (2004) 547–561
553
parameters SC and Covered in this specific step. Note that as mentioned in Section 2, given a sequence nˆ k ; y; nˆ 2 and an instance ðC; UÞ; the number nˆ 1 is defined uniquely. Our first step in the analysis of Algorithm Multi-Greedy is to bound the size of the resulting cover APPX as a function of nˆ k ; y; nˆ 1 : Lemma 2. k jAPPXjp Pk jOPTj: nˆ i 1 þ ðk 1Þ i¼1 n c c ¼ Pk nˆ i : The sets in SC c cover all the elements in Proof. By definition SC SCDOPT and jSC SCj i¼1 d C^vered C^vered: Therefore the rest of the sets in OPT, denote them by OPT0 ; cover the elements in d d C^vered: Algorithm Multi-Greedy applies U\C^vered C^vered: Moreover, OPT0 is an optimal cover of U\C^vered d a k-approximation algorithm on U\C^vered C^vered; therefore its result will have the size of at most 0 kjOPT j: Let R ¼ jAPPXj jOPTj : By the algorithm, APPX is the set of minimum size among all possible covers, in c therefore we can bound the set particular it is smaller or equal to the possibility that includes SC SC; cover APPX as follows: ! k k k X X X nˆ i þ kjOPT0 j ¼ nˆ i þ k jOPTj nˆ i RjOPTj ¼ jAPPXjp i¼1
i¼1
¼ kjOPTj ðk 1Þ
k X
nˆ i ¼ kjOPTj ðk 1Þ
i¼1
i¼1
Pk
i¼1
nˆ i
n
n:
By definition nXjAPPXj ¼ RjOPTj; thus,
Pk nˆ i RjOPTj ¼ jAPPXjpkjOPTj ðk 1Þ i¼1 RjOPTj n ! Pk ˆi i¼1 n R ¼ jOPTj k ðk 1Þ n
and, therefore, k Rp Pk : nˆ i 1 þ ðk 1Þ i¼1 n
&
The following step is to find a lower bound for the expression (for the same nˆ k ; y; nˆ 1 as defined above). Lemma 3. Given a k-bounded instance ðC; UÞ k X ffiffiffiffiffiffiffiffiffiffiffi p k nˆ i Xð1 1 e oð1ÞÞn: i¼1
Pk
i¼1
k
nˆ i as a function of e ¼ m=nk!
ARTICLE IN PRESS R. Bar-Yehuda, Z. Kehat / Journal of Computer and System Sciences 69 (2004) 547–561
554
Proof. Given a k-bounded instance ðC; UÞ; we consider the following division of U into two sets: Uk is defined as the set of all the elements that belong to exactly k sets in C; and U% k ¼ U\Uk : It is k simple to see that jU% k j ¼ oðnk! Þ: Therefore, by the definition of e we get k nk n % e ¼ jUj ¼ jUk j þ jUk j ¼ jUk j þ o k! k! nk ðe oð1ÞÞ: k! We denote ek ¼ e oð1Þ: The proof of Lemma 1 is based on constraint (5), which has been constructed under the assumption that the instance of the problem is homogeneous. This lemma gives a lower bound for any possible greedy-sequence of the homogeneous instance ðC; Uk Þ; in particular it is true for the sequence nˆ k ; y; nˆ 1 ; therefore, k X pffiffiffiffiffiffiffiffiffiffiffiffiffi nˆ i Xð1 k 1 ek Þn: jUk j ¼
i¼1
By the definition of ek k X pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffiffiffiffiffiffiffiffiffiffiffi p k nˆ i Xð1 k 1 e þ oð1ÞÞn ¼ ð1 1 e oð1ÞÞn:
&
i¼1
Theorem 4. Algorithm Multi-Greedy has an approximation ratio of homogeneous set-cover problem, and a ratio of
k pffiffiffiffiffiffi kðk1Þ k 1e
k pffiffiffiffiffiffi kðk1Þ k 1e
for the k-
þ oð1Þ for the k-bounded set-cover
problem. Proof. Given a k-bounded instance ðC; UÞ; by Lemma 2, k jAPPXjp Pk jOPTj: nˆ i 1 þ ðk 1Þ i¼1 n If ðC; UÞ is homogeneous, then Uk ¼ U; and nˆ k ; y; nˆ 1 is the greedy-sequence of ðOPT; C; UÞ: Lemma 1 gives a lower bound for any possible greedy-sequence of ðC; UÞ; in particular for the sequence nˆ k ; y; nˆ 1 ; thus, k X
nˆ i Xð1
ffiffiffiffiffiffiffiffiffiffiffi p k 1 eÞn
i¼1
and, therefore, k
pk ffiffiffiffiffiffi jOPTj 1 þ ðk 1Þð1 n1eÞn k pffiffiffiffiffiffiffiffiffiffiffijOPTj: ¼ k ðk 1Þ k 1 e
jAPPXjp
ARTICLE IN PRESS R. Bar-Yehuda, Z. Kehat / Journal of Computer and System Sciences 69 (2004) 547–561
555
Otherwise, by Lemma 3 we get k X
nˆ i Xð1
ffiffiffiffiffiffiffiffiffiffiffi p k 1 e oð1ÞÞn
i¼1
and, therefore, k
pk ffiffiffiffiffiffi jOPTj 1 þ ðk 1Þð1 1en oð1ÞÞn k p ffiffiffiffiffiffiffiffiffiffiffi jOPTj ¼ k ðk 1Þ k 1 e oð1Þ ! k pffiffiffiffiffiffiffiffiffiffiffi þ oð1Þ jOPTj: ¼ k ðk 1Þ k 1 e
jAPPXjp
&
Corollary 5. Algorithm Multi-Greedy has an approximation ratio of
k pffiffiffiffiffiffi kðk1Þ k 1e
for the e-dense k-
k pffiffiffiffiffiffi homogeneous set-cover problem, and a ratio of kðk1Þ þ oð1Þ for the e-dense k-bounded set-cover k 1e
problem. Now we intend to bound the approximation ratio of Algorithm Multi-Greedy as a function of a possibly more natural expression, ðknÞ; which is the maximum number of possible elements in a kbounded instance, ðC; UÞ; of the set-cover problem. Theorem 6. Algorithm Multi-Greedy has an approximation ratio of k qffiffiffiffiffiffiffiffiffiffiffiffiffi þ oð1Þ k ðk 1Þ k 1 mn ðkÞ
for the k-bounded set-cover problem. k
Proof. The approximation ratio from Theorem 4, denoted by Ratio; was computed for e ¼ m=nk! ; therefore, k pffiffiffiffiffiffiffiffiffiffiffi þ oð1Þ Ratio ¼ k ðk 1Þ k 1 e k ffi þ oð1Þ qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ¼ k ðk 1Þ k 1 n m ðkÞþoð1Þ
¼
k ffi þ oð1Þ qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi k ðk 1Þ k 1 mn þ oð1Þ ðkÞ
¼
k qffiffiffiffiffiffiffiffiffiffiffiffiffi þ oð1Þ k ðk 1Þ k 1 mn oð1Þ ðkÞ
ARTICLE IN PRESS R. Bar-Yehuda, Z. Kehat / Journal of Computer and System Sciences 69 (2004) 547–561
556
¼
k qffiffiffiffiffiffiffiffiffiffiffiffiffi þ oð1Þ: k ðk 1Þ k 1 mn ðkÞ
Since there are nk1 candidates for nk ; y; n2 ; and each candidate can be checked in polynomial time, we get the following lemma: Lemma 7. Algorithm Multi-Greedy can be implemented in polynomial time.
4. Hardness result The best constant approximation ratio that is known today for the k-bounded set-cover problem is k (see [1,5]). This is the best constant approximation ratio even if we know that the size of an optimal cover of a problem is at least kn: The following theorem tries to measure the hardness of finding an approximation for the e-dense k-bounded set-cover problem. Theorem 8. For every fixed constant e there is no polynomial time algorithm with an approximation k pffiffiffiffiffiffi ratio better than kðk1Þ by a constant for the e-dense k-bounded set-cover problem, unless there k 1e is a polynomial time algorithm with an approximation ratio better than k by a constant for the general k-bounded set-cover problem in which the size of the optimal cover is at least kn: Proof. We call a k-hyper-graph a hyper-graph in which each edge has at most k endpoints, for a given k: In this proof we consider the vertex-cover problem in k-hyper-graphs, which is equivalent to the k-bounded set-cover problem. We assume there exists a polynomial time algorithm A with an approximation ratio of R for the vertex-cover problem in e-dense k-hyper-graphs, and we show a lower bound for R: Given a general k-hyper-graph GðV; EÞ; ðn ¼ jV jÞ; we build an e-dense k-hyper-graph G0 ðV 0 ; E 0 Þ; ðn0 ¼ jV 0 jÞ: Then we apply Algorithm A on G0 and get a cover VCA ðG 0 Þ; which is by assumption an Rapproximation for G 0 : We show that if R is less than this lower bound, then the part of VCA ðG 0 Þ; that is a cover of G; has an approximation ratio better than k by a constant. Therefore, we get an approximation ratio better than k by a constant for the vertex-cover problem in a general khyper-graph, which is known to be impossible. We build G 0 as follows: We add a clique of an0 hyper-vertices to V ; and add all possible hyperedges with k endpoints, such that their endpoints are from V and at least one endpoint is from the n : We denote by OPT the optimal vertex cover of G; and by clique. Then, n0 ¼ an0 þ n ) n0 ¼ 1a 0 OPT’ the optimal vertex cover of G : Let us look at the total number of hyper-edges in G 0 : This has to be at least all possible hyperedges in a k-hyper-graph with n0 hyper-vertices, possibly without the hyper-edges, which all their endpoints are from V : Therefore, 0 0 n n an0 0 jE jX k k
ARTICLE IN PRESS R. Bar-Yehuda, Z. Kehat / Journal of Computer and System Sciences 69 (2004) 547–561
557
1 ¼ n0 ðn0 1Þ?ðn0 k þ 1Þ k! 1 ðn0 an0 Þðn0 an0 1Þ?ðn0 an0 k þ 1Þ k! ! 1 0 1 0 1 0k k1 k k k 0 k X ðn k þ 1Þ ðn an Þ ¼ n 1 0 ð1 aÞ : k! k! k! n 0k
k k Given e; we try to find a value of a for which jE 0 j ¼ enk! : Therefore by ð1 k1 n0 Þ ð1 aÞ ¼ qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi k k e oð1Þ; we get that G 0 is e-dense for a ¼ 1 ð1 k1 n0 Þ e þ oð1Þ: Any optimal cover of G 0 includes all the hyper-vertices of the clique. Also, we can assume that VCA ðG0 Þ includes them, otherwise all the hyper-vertices of V 0 would be in this cover (except maybe a constant number of hyper-vertices). Therefore, jOPT0 j ¼ an0 þ jOPTj: The hyper-vertices of the clique cover all the hyper-edges of the clique and the hyper-edges between the clique and V; therefore, the other hyper-vertices in the cover VCA ðG 0 Þ are a cover for G; which we denote by VCA ðGÞ: By the above and by the assumption on Algorithm A:
jVCA ðG 0 Þj ¼ an0 þ jVCA ðGÞjpRjOPT0 j ¼ Rðan0 þ jOPTjÞ; jVCA ðGÞjpan0 ðR 1Þ þ RjOPTj: k jOPTj: Therefore, we get By our assumption on G; npkjOPTj; thus, n0 p1a ak ak ðR 1ÞjOPTj þ RjOPTj ¼ jOPTj ðR 1Þ þ R : jVCA ðGÞjp 1a 1a ak We assume that 1a ðR 1Þ þ RXk; otherwise we get an approximation ratio which is better than k for the general k-hyper-graph G: Then, k : RX 1 þ ðk 1Þa qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi k k By choosing a ¼ 1 ð1 k1 n0 Þ e þ oð1Þ we get
RX
X
k qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi k k k ðk 1Þ ð1 k1 n0 Þ e þ oð1Þ k
pffiffiffiffiffiffiffiffiffiffiffi oð1Þ: k ðk 1Þ k 1 e
k Note that the value of a is legal only for epð1 k1 n0 Þ ; which is true asymptotically for all k1 pk ffi : & n0 X1 e
ARTICLE IN PRESS 558
R. Bar-Yehuda, Z. Kehat / Journal of Computer and System Sciences 69 (2004) 547–561
5. The vertex-cover problem in e-dense graphs The case of the k-homogeneous set-cover problem when k ¼ 2 is known as the vertex-cover problem. Karpinski and Zelikovsky [7] discuss two kinds of dense graphs: For some e40; they call a graph GðV ; EÞ all-over e-dense if every vertex in G has at least ejV j neighbors, and they call a graph GðV ; EÞ e-dense if the number of edges is relatively high, i.e. jEjX12ejV j2 : The later definition is similar to our definition of the e-dense 2-homogeneous set-cover problem. 2 and 1þe as Karpinski Algorithm Multi-Greedy achieves the same approximation ratios of 2p2ffiffiffiffiffiffi 1e and Zelikovsky in e-dense graphs and in all-over e-dense graphs, respectively. The ratio for edense graphs is a special case of Corollary 5. It can be easily shown that the algorithm achieves the 2 in the case of all-over e-dense graphs. ratio of 1þe log jV j The best approximation ratios known for the vertex-cover problem are 2 2log 2 log jV j due to j Monien and Speckenmeyer [8] and due to Bar-Yehuda and Even [2], and 2 2 lnln lnjVjV j :ð1 oð1ÞÞ due to Halperin [4]. It has been recently shown by Eremeev [3], that the vertex-cover problem in 7þe all-over e-dense graphs is NP-hard to approximate within a factor less than 6þ2e : Clearly, there is 2 still a gap between the approximation 1þe for this problem, and the factor which is NP-hard to obtain. The following theorem tries to measure the hardness of finding better approximation 2 ratios than the ratios 1þe and 2p2ffiffiffiffiffiffi for the vertex-cover problem in all-over e-dense graphs and in 1e e-dense graphs, respectively.
Theorem 9. For every fixed constant e there is no polynomial time algorithm with an approximation 2 and 2p2ffiffiffiffiffiffi by a constant for the vertex-cover problem in all-over e-dense and in ratio better than 1þe 1e e-dense graphs, unless there is a polynomial time algorithm for the vertex-cover problem in general graphs with an approximation ratio better than 2 by a constant. Proof. By Theorem 8, for every fixed constant e there cannot be a polynomial time algorithm with for the vertex-cover problem in e-dense an approximation ratio better by a constant than 2p2ffiffiffiffiffiffi 1e graphs, unless there is a polynomial time algorithm with an approximation ratio better by a constant than 2 for the general vertex-cover problem in graphs in which the size of the optimal cover is at least n 2: By Nemhauser and Trotter (see [2,10]) we can assume that for any given graph GðV ; EÞ; the size of the optimal vertex-cover of the graph is at least n2: Therefore the claim is true for any graph. The case of all-over e-dense graphs can be proven similarly to the proof of Theorem 8, by choosing a ¼ e: &
Acknowledgments We would especially thank Dror Rawitz for the helpful discussions and corrections, and Ronit Reger for her helpful suggestions. Many thanks to Hadas Heier for her help in typing the manuscript.
ARTICLE IN PRESS R. Bar-Yehuda, Z. Kehat / Journal of Computer and System Sciences 69 (2004) 547–561
559
Appendix In the proof of Lemma 1 we presented the following optimization problem. In this section we pffiffiffiffiffiffiffiffiffiffiffi prove that the optimal solution of this optimization problem is ð1 k 1 eÞn: Min
k P
ni
i¼1
s:t:
ð5Þ and
k P
ðA:1Þ
ni pn;
i¼1
ni ; xi AR; where (5) is as follows: i!xiþ1 ¼ i!jUi jp n
k X
!i nj
j¼iþ1
n
k X
!i nj
j¼i
þði 1Þ!xi n
k X
! nj
j¼i
for any i such that 2pipk: Proof. We intend to show that there exists a solution, whose value is Nmin for the problem, that satisfies 81piok; ni ¼ 0; 82pipk; xi ¼ 0; nk ¼ Nmin :
ðA:2Þ
Therefore by using (5) and by the definition of e; enk ¼ k!jUjpnk ðn nk Þk ; nk k enk pnk nk 1 ; n nk k ep1 1 ; n ffi ffiffiffiffiffiffiffiffiffiffi p k nk Xð1 1 eÞn: Let N 0 ¼ ðnk 0 ; y; n1 0 Þ be an optimal solution for the problem which was stated in (A.1), i.e. nk 0 þ ? þ n1 0 ¼ Nmin : And let ðxk 0 ; y; x2 0 Þ be the suitable parameters for the solution N 0 : If N 0 satisfies (A.2) then we are done. Otherwise we show that we can take the optimal solution N 0 ; and build another optimal solution from it, which satisfies (A.2). We also show that after this process constraint (5) is still satisfied. We show the following claim: For each 2pipk if for each joi nj ¼ 0 and xj ¼ 0 then xi ¼ 0:
ARTICLE IN PRESS R. Bar-Yehuda, Z. Kehat / Journal of Computer and System Sciences 69 (2004) 547–561
560
In the special case of i ¼ 2; by definition x2 ¼ n1 ; so x2 ¼ 0 whenever n1 ¼ 0: By (5) we get for 2oipk: !i1 !i1 ! k k k X X X ði 1Þ!xi p n nj n nj þði 2Þ!xi1 n nj : j¼i
j¼i1
j¼i1
By the assumption, ni1 ¼ 0 and xi1 ¼ 0; therefore xi p0; which means that xi ¼ 0: Now we intend to show how we can change N 0 into a suitable optimal solution for (A.1) that satisfies (A.2): Let jok be the first index such that for each ioj; ni 0 ¼ 0 and nj 0 a0: By the previous claim, for each ipj; xi 0 ¼ 0: We move the value of nj 0 to njþ1 0 ; then after the change nj 0 ¼ 0 and njþ1 0 is equal to the sum of the old values of nj 0 and njþ1 0 : This change influences only the following two constraints of (5): Before the change, since xj 0 ¼ 0 !j !j k k X X 0 0 0 j!xjþ1 p n ni n ni ; ðA:3Þ i¼j
i¼jþ1
ðj þ 1Þ!xjþ2 0 p n
k X
!jþ1 ni 0
n
i¼jþ2
þ j!xjþ1 0 n
k X
!
k X
!jþ1 ni 0
i¼jþ1
ðA:4Þ
ni :
i¼jþ1
After the change: j!xjþ1 0 p0 We decrease xjþ1 0 to 0 in order to satisfy the first constraint. Therefore, the second constraint after the changes will be as follows: !jþ1 !jþ1 k k X X 0 0 0 ðj þ 1Þ!xjþ2 p n ni n ni : ðA:5Þ i¼jþ2
i¼j
Pk
Denote z ¼ n i¼jþ1 ni 0 ; and by D the change in the second constraint, i.e. the difference between (A.5) and (A.4), then D ¼ ðz nj 0 Þjþ1 þ zjþ1 j!xjþ1 0 z: By (A.3) DX ðz nj 0 Þjþ1 þ zjþ1 zðzj ðz nj 0 Þj Þ ¼ ðz nj 0 Þjþ1 þ zðz nj 0 Þj ¼ nj 0 ðz nj 0 Þj X0: After this change we get an optimal solution such that for ipj; ni 0 ¼ 0; while the validity of constraint (5) is not harmed. By induction we can move the value of njþ1 0 to njþ2 0 and so on, until we get an optimal solution in which for each iok; ni 0 ¼ 0 and nk 0 ¼ N 0 : &
ARTICLE IN PRESS R. Bar-Yehuda, Z. Kehat / Journal of Computer and System Sciences 69 (2004) 547–561
561
References [1] R. Bar-Yehuda, S. Even, A linear time approximation algorithm for the weighted vertex cover problem, J. Algorithms 2 (1981) 198–203. [2] R. Bar-Yehuda, S. Even, A local-ratio theorem for approximating the weighted vertex cover problem, Ann. Discrete Math. 25 (1985) 27–46. [3] A. Eremeev, On some approximation algorithms for dense vertex cover problem, Proceedings of SOR’99, Springer, 2000, pp. 58–62. [4] E. Halperin, Improved approximation algorithms for the vertex cover problem in graphs and hypergraphs, Proceedings of the 11th Annual ACM-SIAM Symposium on Discrete Algorithms, ACM-SIAM, 2000. [5] D.S. Hochbaum, Approximation algorithms for the set covering and vertex cover problems, SIAM J. Comput. 11 (3) (1982) 555–556. [6] D.S. Johnson, Approximation algorithms for combinatorial problems, J. Comput. System Sci. 9 (1974) 256–278. [7] M. Karpinski, A. Zelikovsky, Approximating dense cases of covering problems, Proceedings of DIMACS Workshop on Network Design: Connectivity and Facilities Location, Vol. 40, Princeton, 1998, pp. 169–178 (the slightly extended version appeared as a preprint at ICSI in Berkeley). [8] B. Monien, E. Speckenmeyer, Ramsey numbers and an approximation algorithm for the vertex cover problem, Acta Inform. 22 (1985) 115–123. [9] H. Nagamochi, T. Ibaraki, An approximation of the minimum vertex cover in a graph, Japan J. Indust. Appl. Math. 16 (1999) 369–375. [10] G.L. Nemhauser, L.E. Trotter, Vertex packings: structural properties and algorithms, Math. Programming 8 (1975) 232–248.