Information Processing Letters 113 (2013) 477–480
On the hardness of finding subsets with equal average

Edith Elkind a,∗, James B. Orlin b

a Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore
b MIT Sloan School of Management, USA
Article info

Article history: Received 26 August 2011; received in revised form 28 March 2013; accepted 6 April 2013; available online 8 April 2013. Communicated by R. Uehara.

Keywords: Computational complexity; Expected utility theory; Subset ranking

Abstract

We show that, given a set of positive integers, it is NP-complete to decide whether it contains two subsets with the same average. Our interest in this problem is motivated by questions in decision theory that are related to defining preferences on sets of objects given preferences over individual objects. © 2013 Elsevier B.V. All rights reserved.
∗ Corresponding author. Tel.: +65 6513 2028; fax: +65 6515 8213.
E-mail addresses: [email protected] (E. Elkind), [email protected] (J.B. Orlin).
0020-0190/$ – see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.ipl.2013.04.001

1. Introduction

In decision-making scenarios, an agent often has to compare two objects from a given, fixed set of objects Q, and choose the one that she prefers. An agent is said to be rational if her preferences over the elements of Q are transitive, i.e., for any triple of elements a, b, c ∈ Q, if she prefers a over b and b over c, she also prefers a over c. A fundamental result in decision theory is that transitive preferences can be encoded by a utility function u : Q → R, so that an agent prefers a to b if and only if u(a) > u(b) [5]. Thus, to describe the agent's behavior, it suffices to list the values of u(x) for all x ∈ Q. As long as u(x) ≠ u(y) for all distinct x, y ∈ Q, the function u(·) uniquely determines which of the two given objects will be chosen by the agent.

The situation is more complicated when the agent has to choose between sets of objects, i.e., subsets of Q. There are multiple ways of extending preferences from objects to sets of objects: for instance, one can order sets according to their total utility, their average utility, or the utility of their best/worst element [1]. The choice of a preference extension method depends on what it means to select a set of objects rather than a single object: would the agent be able to enjoy all objects in the set, or just one of them, and if so, which one? Sometimes, selecting a set simply means that the object will eventually be chosen from this set uniformly at random. This is the case, for instance, when a group of agents votes in a single-winner election, an agent's vote determines which candidates will have the top score, and the winner is chosen among the top-scoring candidates by tossing a fair coin; this setting is discussed in, e.g., [2,3,4]. In such settings, it is natural to compare subsets according to their expected utility with respect to this eventual probabilistic choice; if this choice can be assumed to be uniform, this implies that the agent should always choose a set with the highest average utility.

However, it may happen that two subsets of Q have the same average utility with respect to u(·), even if the values of u(·) on elements of Q are all distinct. If this is the case, to fully specify the agent's behavior, we will need to provide an additional tie-breaking rule that describes how she makes her choice when faced with two subsets that have the same average utility. Therefore, given a utility function u(·) on a set Q, it is natural to ask whether this function is sufficient to fully describe the decision-making process, i.e., whether it is the case that
for every pair of subsets $S, T \subseteq Q$ with $S \neq T$, we have

\[ \frac{1}{|S|} \sum_{q \in S} u(q) \neq \frac{1}{|T|} \sum_{q \in T} u(q). \]

In this note, we will show that this problem is computationally hard. More specifically, we will show that the complement of this problem, i.e., deciding whether a set of numbers contains two subsets with the same average, is NP-complete. In what follows, to simplify notation, instead of considering sets of objects and utility functions defined on these sets, we simply consider sets of natural numbers; these numbers can be thought of as the utilities of the objects in the set. As one would expect, our proof follows by a reduction from Subset Sum; however, the reduction is surprisingly complicated.

2. Main result

In this section, we state and prove our main result. We first provide a formal definition of our problem.

Definition 1. An instance of the Subset Average problem is given by a set of positive integers $Q = \{q_1, \dots, q_n\}$. It is a "yes"-instance if there are two (possibly overlapping) subsets $S$ and $T$ of $Q$ such that $S \neq T$ and

\[ \frac{\sum_{q_i \in S} q_i}{|S|} = \frac{\sum_{q_i \in T} q_i}{|T|}, \]

and a "no"-instance otherwise.

We will now prove that Subset Average is computationally hard.

Theorem 1. Subset Average is NP-complete.
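To make Definition 1 concrete, the following minimal Python sketch decides Subset Average by exhaustive search. It is an illustration rather than part of the paper: the function name `equal_average_pair` is hypothetical, subsets are assumed to be non-empty, and the running time is exponential in the number of elements, which is in line with what Theorem 1 leads one to expect of a naive approach.

```python
from fractions import Fraction
from itertools import combinations


def equal_average_pair(numbers):
    """Return two distinct non-empty subsets with the same average, or None.

    Assumption (not stated in the definition): only non-empty subsets count.
    """
    items = sorted(set(numbers))
    seen = {}  # exact average -> first subset found with that average
    for size in range(1, len(items) + 1):
        for subset in combinations(items, size):
            avg = Fraction(sum(subset), size)  # exact rational, no rounding
            if avg in seen:
                return seen[avg], subset
            seen[avg] = subset
    return None
```

For example, on {1, 2, 3} the sketch finds {2} and {1, 3}, both with average 2, whereas {1, 2, 4} is a "no"-instance.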
Proof. It is not hard to see that this problem is in NP: we can guess two sets $S$ and $T$ and compute the averages of their elements. To show that the problem is NP-hard, we give a reduction from Subset Sum. Recall that an instance of Subset Sum is given by a set of positive integers $A = \{a_1, \dots, a_m\}$ and another positive integer $b$. It is a "yes"-instance if there exists an $A' \subseteq A$ such that $\sum_{a_i \in A'} a_i = b$, and a "no"-instance otherwise. We can assume without loss of generality that $m > 3$ and $\max\{a_i \mid a_i \in A\} \ge 2$, as otherwise the problem is easily solvable.

Given an instance $I$ of Subset Sum, we construct an instance of Subset Average as follows. Set $a^* = \max\{a_i \mid a_i \in A\}$, let $B = 50a^*m$, and, for $i = 1, \dots, m$, set $M_i = B^i$, $N_i = B^{m+i+4}$. Now define

\[ x_i = M_i + N_i, \qquad y_i = 4M_i + N_i + a_i, \qquad z_i = 7M_i + N_i + a_i, \]

and let $Q_i = \{x_i, y_i, z_i\}$. Also, set $K = B^{2m+7}$, and let $u = K$, $v = K + 3\sum_{i=1}^{m} M_i + b$. Observe that for any $j \le m$ we have

\[ a_j < M_j, \tag{1} \]
\[ M_j < \frac{N_j}{150m}, \tag{2} \]
\[ x_j + y_j + z_j < N_j \left( 3 + \frac{1}{10m} \right), \tag{3} \]
\[ \sum_{i=1}^{j-1} (x_i + y_i + z_i) < \frac{N_j}{10m}, \tag{4} \]
\[ 16m^2 N_m < K. \tag{5} \]

Indeed, inequalities (1) and (2) are immediate from the definitions of $M_j$ and $N_j$, inequality (3) follows from (1) and (2), inequality (4) follows from (3) and the observations that $\sum_{i=1,\dots,j-1} N_i < N_j/(B-1)$ and $B - 1 > 30m + 1$, and inequality (5) follows from the definition of $K$ and the fact that $B^3 > 16m^2$.

Finally, define $Q = \bigcup_{i=1}^{m} Q_i \cup \{u, v\}$. For any subset $Q'$ of $Q$, we will denote by $\Sigma(Q')$ the sum of all elements of $Q'$. Note that inequality (3) implies that $\Sigma(Q') < 4mN_m$ for every $Q' \subseteq Q \setminus \{u, v\}$.

Throughout the proof, we will consider $\Sigma(S)$ and $\Sigma(T)$ written in base $B = 50a^*m$. For $i = 0, \dots, 2m+7$, let $s_i$ (respectively, $t_i$) denote the $(i+1)$-st least significant digit of $\Sigma(S)$ (respectively, $\Sigma(T)$) in base $B$. Observe that when we add the elements of any $Q' \subseteq Q$ in base $B$, there is no carry. Therefore, for $i = 1, \dots, m$, $s_i$ and $s_{m+i+4}$ are fully determined by the set $S \cap (Q_i \cup \{v\})$, while $t_i$ and $t_{m+i+4}$ are fully determined by $T \cap (Q_i \cup \{v\})$. Specifically, $s_{m+i+4} = |S \cap Q_i|$, $t_{m+i+4} = |T \cap Q_i|$, and, moreover, if $v \notin S$, we have $s_i \in \{0, 1, 4, 7, 5, 8, 11, 12\}$ and if $v \in S$, we have $s_i \in \{3, 4, 7, 10, 8, 11, 14, 15\}$; the same holds for $t_i$. In fact, given the value of $s_i$ (respectively, $t_i$), we can reconstruct $S \cap Q_i$ (respectively, $T \cap Q_i$) as long as we know whether $v \in S$ (respectively, $v \in T$).

Suppose first that $I$ is a "yes"-instance of Subset Sum, i.e., for some set $A' \subseteq A$ we have $\sum_{a_i \in A'} a_i = b$. Then we can set

\[ S = \{y_i \mid a_i \in A'\} \cup \{z_i \mid a_i \notin A'\} \cup \{u\}, \]
\[ T = \{x_i \mid a_i \in A'\} \cup \{y_i \mid a_i \notin A'\} \cup \{v\}. \]

Indeed, we have $|S| = |T| = m + 1$, and

\[ \begin{aligned} \Sigma(S) &= \sum_{y_i \in S} y_i + \sum_{z_i \in S} z_i + u \\ &= \sum_{a_i \in A'} (4M_i + N_i + a_i) + \sum_{a_i \notin A'} (7M_i + N_i + a_i) + K \\ &= \sum_{i=1}^{m} (4M_i + N_i + a_i) + 3 \sum_{a_i \notin A'} M_i + K, \end{aligned} \]

\[ \begin{aligned} \Sigma(T) &= \sum_{x_i \in T} x_i + \sum_{y_i \in T} y_i + v \\ &= \sum_{a_i \in A'} (M_i + N_i) + \sum_{a_i \notin A'} (4M_i + N_i + a_i) + K + 3 \sum_{i=1}^{m} M_i + \sum_{a_i \in A'} a_i \\ &= \sum_{i=1}^{m} (M_i + N_i + a_i) + 3 \sum_{a_i \notin A'} M_i + K + 3 \sum_{i=1}^{m} M_i, \end{aligned} \]
i.e., $\Sigma(S) = \Sigma(T)$.

For the converse direction, suppose that there exist two sets $S$ and $T$, $S \neq T$, with $\frac{\Sigma(S)}{|S|} = \frac{\Sigma(T)}{|T|}$. Pick $S$ and $T$ so that they form a minimal pair with this property, i.e., so that there do not exist $S' \subset S$ and $T' \subset T$ such that $\frac{\Sigma(S')}{|S'|} = \frac{\Sigma(T')}{|T'|}$. Note that for this choice of $S$ and $T$, it cannot be the case that $S \cap T \neq \emptyset$ and $|S| = |T|$: indeed, otherwise we could set $S' = S \setminus \{q\}$, $T' = T \setminus \{q\}$, where $q \in S \cap T$, and obtain

\[ \frac{\Sigma(S')}{|S'|} = \frac{\Sigma(S) - q}{|S| - 1} = \frac{\Sigma(T) - q}{|T| - 1} = \frac{\Sigma(T')}{|T'|}. \]

We will now show how to construct the set $A'$ given $S$ and $T$. Suppose first that $S \cap \{u, v\} \neq \emptyset$, $T \cap \{u, v\} = \emptyset$. Using (5) and (3), we obtain
\[ \frac{\Sigma(S)}{|S|} \ge \frac{K}{3m+2} > 4mN_m \ge \frac{\Sigma(T)}{|T|}, \]

a contradiction. Thus, either $S \cap \{u, v\} = T \cap \{u, v\} = \emptyset$, or $S \cap \{u, v\} \neq \emptyset$, $T \cap \{u, v\} \neq \emptyset$. To further simplify our analysis, we need the following lemma.

Lemma 1. Suppose that $S \cap \{u, v\} \neq \emptyset$, $T \cap \{u, v\} \neq \emptyset$, and set $\alpha = |S \cap \{u, v\}|$, $\beta = |T \cap \{u, v\}|$. We have

\[ \frac{|S|}{|T|} = \frac{\alpha}{\beta}. \]

Proof. We have $\alpha, \beta \in \{1, 2\}$, and

\[ \alpha K \le \Sigma(S) < \alpha K + 4mN_m, \qquad \beta K \le \Sigma(T) < \beta K + 4mN_m. \]

Hence, if $\beta|S| > \alpha|T|$, we have $|S| \ge \frac{\alpha|T| + 1}{\beta}$, and

\[ \frac{\Sigma(S)}{|S|} \le \frac{\beta(\alpha K + 4mN_m)}{\alpha|T| + 1} < \frac{\beta K}{|T|} \le \frac{\Sigma(T)}{|T|}, \]

where we use the fact that $4mN_m|T| < K$. Similarly, if $\beta|S| < \alpha|T|$, we have $|T| \ge \frac{\beta|S| + 1}{\alpha}$, and

\[ \frac{\Sigma(S)}{|S|} \ge \frac{\alpha K}{|S|} > \frac{\alpha(\beta K + 4mN_m)}{\beta|S| + 1} \ge \frac{\Sigma(T)}{|T|}, \]

where we use the fact that $4mN_m|S| < K$. Thus, we have $\beta|S| = \alpha|T|$, i.e., the lemma is proven. □

In particular, Lemma 1 implies that if $S \cap \{u, v\} \neq \emptyset$, we cannot have $S \cap \{u, v\} = T \cap \{u, v\}$, as this will mean $|S| = |T|$, $S \cap T \neq \emptyset$, and hence contradict our choice of $S$, $T$. Further, if $S \cap \{u, v\} \neq \emptyset$, we have $\frac{\Sigma(S)}{\Sigma(T)} = \frac{|S|}{|T|} = \frac{\alpha}{\beta}$. Consequently, since $\beta s_i, \alpha t_i < B$ for all $i = 0, \dots, m$, for each $i = 0, \dots, m$ we have either $s_i = t_i = 0$ or $\frac{s_i}{t_i} = \frac{\alpha}{\beta}$.

We will now consider all the remaining possibilities for $S \cap \{u, v\}$ and $T \cap \{u, v\}$, namely,

(1) $S \cap \{u, v\} = \emptyset$, $T \cap \{u, v\} = \emptyset$;
(2) $S \cap \{u, v\} = \{u\}$, $T \cap \{u, v\} = \{u, v\}$ (or, symmetrically, $T \cap \{u, v\} = \{u\}$, $S \cap \{u, v\} = \{u, v\}$);
(3) $S \cap \{u, v\} = \{v\}$, $T \cap \{u, v\} = \{u, v\}$ (or, symmetrically, $T \cap \{u, v\} = \{v\}$, $S \cap \{u, v\} = \{u, v\}$);
(4) $S \cap \{u, v\} = \{u\}$, $T \cap \{u, v\} = \{v\}$ (or, symmetrically, $T \cap \{u, v\} = \{u\}$, $S \cap \{u, v\} = \{v\}$).

We show that case (1) leads to a contradiction; in all the remaining cases, we will construct a set $A'$ with $\sum_{a \in A'} a = b$. In cases (2)–(4), we will only consider the first of the two symmetric scenarios listed above; the other scenario can be handled in a similar manner. We will analyze these four possibilities one by one.

(1) $u, v \notin S \cup T$.

Set $k = \max\{i \mid Q_i \cap (S \cup T) \neq \emptyset\}$. Suppose that $S \cap Q_k = \emptyset$, but $T \cap Q_k \neq \emptyset$. We have

\[ \frac{\Sigma(S)}{|S|} < \frac{N_k}{10m} \le \frac{\Sigma(T)}{|T|}, \]

a contradiction. Similarly, $T \cap Q_k = \emptyset$, $S \cap Q_k \neq \emptyset$ leads to a contradiction, too. Hence, we have $S \cap Q_k \neq \emptyset$, $T \cap Q_k \neq \emptyset$. In fact, a stronger statement holds.

Lemma 2. Let $\gamma = |S \cap Q_k|$, $\delta = |T \cap Q_k|$. Then we have

\[ \frac{|S|}{|T|} = \frac{\gamma}{\delta}. \]

Proof. The proof is similar to that of Lemma 1. By the argument above, we have $\gamma, \delta \in \{1, 2, 3\}$. Furthermore,

\[ \gamma N_k \le \Sigma(S) \le \gamma N_k + 14M_k + \sum_{i=1}^{k-1} (x_i + y_i + z_i) < \gamma N_k + 15 \cdot \frac{N_k}{150m} + \frac{N_k}{10m} = N_k \left( \gamma + \frac{1}{5m} \right). \]

Similarly,

\[ \delta N_k \le \Sigma(T) < N_k \left( \delta + \frac{1}{5m} \right). \]

Hence, if $\frac{|S|}{|T|} > \frac{\gamma}{\delta}$, we have $|S| \ge \frac{\gamma|T| + 1}{\delta}$, and

\[ \frac{\Sigma(S)}{|S|} \le \frac{\delta \left( \gamma + \frac{1}{5m} \right) N_k}{\gamma|T| + 1} < \frac{\delta N_k}{|T|} \le \frac{\Sigma(T)}{|T|}, \]

where the strict inequality follows from the fact that $|T| < 5m$. Similarly, if $\frac{|S|}{|T|} < \frac{\gamma}{\delta}$, we have $|T| \ge \frac{\delta|S| + 1}{\gamma}$, and

\[ \frac{\Sigma(S)}{|S|} \ge \frac{\gamma N_k}{|S|} > \frac{\gamma \left( \delta + \frac{1}{5m} \right) N_k}{\delta|S| + 1} \ge \frac{\Sigma(T)}{|T|}, \]

where the strict inequality follows from the fact that $|S| < 5m$. In both cases, we get a contradiction with $\frac{\Sigma(S)}{|S|} = \frac{\Sigma(T)}{|T|}$. Hence, $\frac{|S|}{|T|} = \frac{\gamma}{\delta}$. □
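The digit bookkeeping behind these arguments can be checked mechanically. The sketch below is an illustration under stated assumptions, not part of the proof: the names `digit_table` and `compatible_pairs` are hypothetical, and the letters x, y, z stand for the elements $x_i$, $y_i$, $z_i$ of a single block $Q_i$. It lists the base-$B$ digit produced in position $i$ by each subset of $Q_i$ (offset by 3 when $v$ is present, since $v$ contributes $3M_i$) and enumerates the pairs of subsets whose digits satisfy $\beta s_i = \alpha t_i$, excluding the trivial pair $s_i = t_i = 0$.

```python
from itertools import combinations

# Multiples of M_i carried by x_i, y_i and z_i in the construction.
COEFF = {"x": 1, "y": 4, "z": 7}


def digit_table(v_present):
    """Map each subset of Q_i to the digit it produces in position i."""
    offset = 3 if v_present else 0  # v adds 3 * M_i to every position i
    table = {}
    for size in range(4):
        for subset in combinations("xyz", size):
            table[subset] = offset + sum(COEFF[e] for e in subset)
    return table


def compatible_pairs(alpha, beta, v_in_s, v_in_t):
    """Pairs (S & Q_i, T & Q_i) with beta * s_i == alpha * t_i, not both 0."""
    s_table = digit_table(v_in_s)
    t_table = digit_table(v_in_t)
    return sorted(
        (s_set, t_set)
        for s_set, s in s_table.items()
        for t_set, t in t_table.items()
        if beta * s == alpha * t and (s, t) != (0, 0)
    )
```

The eight digit values in each table are pairwise distinct, which is why $s_i$ determines $S \cap Q_i$ once it is known whether $v \in S$. Calling `compatible_pairs(1, 2, False, False)` corresponds to the situation $\gamma = 1$, $\delta = 2$ with $u, v \notin S \cup T$, and the other ratio constraints can be explored the same way.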
Now, for $i = 1, \dots, m$, we have $s_i, t_i \in \{0, 1, 4, 7, 5, 8, 11, 12\}$, and, moreover,

\[ s_0 = \sum_{y_i \in S} a_i + \sum_{z_i \in S} a_i, \qquad t_0 = \sum_{y_i \in T} a_i + \sum_{z_i \in T} a_i. \]

Observe that $\frac{\Sigma(S)}{\Sigma(T)} = \frac{|S|}{|T|} = \frac{\gamma}{\delta}$, and, furthermore, $\delta s_i, \gamma t_i < B$ for all $i = 0, \dots, m$, $\gamma, \delta \in \{1, 2, 3\}$. Hence, for $i = 0, \dots, m$ we have either $s_i = t_i = 0$ or $\frac{s_i}{t_i} = \frac{\gamma}{\delta}$. Assuming without loss of generality that $\gamma \le \delta$, we have the following possibilities:

- $\gamma = \delta$. By Lemma 2, we have $|S| = |T|$, and, by the argument above, $s_k = t_k$. This is only possible if $S \cap Q_k = T \cap Q_k$. Further, $S \cap Q_k \neq \emptyset$ by our choice of $k$. Together with $|S| = |T|$, this contradicts our choice of $S$ and $T$.

- $\gamma = 1$, $\delta = 2$. For all $i = 1, \dots, k$, we have either $s_i = t_i = 0$ or $s_i = 4$, $t_i = 8$, i.e., $S \cap Q_i = \{y_i\}$, $T \cap Q_i = \{x_i, z_i\}$. Set $A' = \{a_i \mid y_i \in S\}$; the set $A'$ is non-empty since $a_k \in A'$. We have

\[ s_0 = \sum_{a_i \in A'} a_i, \qquad t_0 = \sum_{a_i \in A'} a_i, \]

i.e., $s_0 = t_0$. On the other hand, we have $2s_0 = t_0$ and hence $s_0 = 0$, a contradiction with $A' \neq \emptyset$.

- $\gamma = 1$, $\delta = 3$. For all $i = 1, \dots, k$, we have either $s_i = t_i = 0$ or $s_i = 4$, $t_i = 12$, i.e., $S \cap Q_i = \{y_i\}$, $T \cap Q_i = \{x_i, y_i, z_i\}$. Set $A' = \{a_i \mid y_i \in S\}$; the set $A'$ is non-empty since $a_k \in A'$. We have

\[ s_0 = \sum_{a_i \in A'} a_i, \qquad t_0 = 2 \sum_{a_i \in A'} a_i, \]

i.e., $t_0 = 2s_0$. On the other hand, we have $3s_0 = t_0$ and hence $s_0 = 0$, a contradiction with $A' \neq \emptyset$.

- $\gamma = 2$, $\delta = 3$. For all $i = 1, \dots, k$, we have either $s_i = t_i = 0$ or $s_i = 8$, $t_i = 12$, i.e., $S \cap Q_i = \{x_i, z_i\}$, $T \cap Q_i = \{x_i, y_i, z_i\}$. Set $A' = \{a_i \mid z_i \in S\}$; the set $A'$ is non-empty since $a_k \in A'$. We have

\[ s_0 = \sum_{a_i \in A'} a_i, \qquad t_0 = 2 \sum_{a_i \in A'} a_i, \]

i.e., $t_0 = 2s_0$. On the other hand, we have $3s_0 = 2t_0$ and hence $s_0 = 0$, a contradiction with $A' \neq \emptyset$.

In all cases, we obtain a contradiction. Thus, we can conclude that the case $u, v \notin S \cup T$ is impossible.

(2) $S \cap \{u, v\} = \{u\}$, $T \cap \{u, v\} = \{u, v\}$.

We have $\alpha = 1$, $\beta = 2$. Further, for $i = 1, \dots, m$, we have $s_i \in \{0, 1, 4, 7, 5, 8, 11, 12\}$ and $t_i \in \{3, 4, 7, 10, 8, 11, 14, 15\}$. Thus, $2s_i = t_i$ implies that one of the following three cases holds:

(i) $s_i = 4$, $t_i = 3 + 1 + 4$, i.e., $S \cap Q_i = \{y_i\}$, $T \cap Q_i = \{x_i, y_i\}$;
(ii) $s_i = 7$, $t_i = 3 + 4 + 7$, i.e., $S \cap Q_i = \{z_i\}$, $T \cap Q_i = \{y_i, z_i\}$;
(iii) $s_i = 1 + 4$, $t_i = 3 + 7$, i.e., $S \cap Q_i = \{x_i, y_i\}$, $T \cap Q_i = \{z_i\}$.

However, case (iii) is impossible, as it implies $s_{i+m+4} = 2$, $t_{i+m+4} = 1$, a contradiction with $2s_i = t_i$. Thus, each $i = 1, \dots, m$ satisfies either (i) or (ii). Set $A' = \{a_i \mid y_i \in S\}$. We have

\[ s_0 = \sum_{y_i \in S} a_i + \sum_{z_i \in S} a_i = \sum_{a_i \in A'} a_i + \sum_{a_i \notin A'} a_i, \]
\[ t_0 = \sum_{y_i \in T} a_i + \sum_{z_i \in T} a_i + b = \sum_{a_i \in A'} a_i + 2 \sum_{a_i \notin A'} a_i + b. \]

Since $2s_0 = t_0$, we have $\sum_{a_i \in A'} a_i = b$, as required.

(3) $S \cap \{u, v\} = \{v\}$, $T \cap \{u, v\} = \{u, v\}$.

We have $\alpha = 1$, $\beta = 2$. Further, for $i = 1, \dots, m$, we have $s_i, t_i \in \{3, 4, 7, 10, 8, 11, 14, 15\}$. Thus, $2s_i = t_i$ implies that one of the following two cases holds:

(i) $s_i = 3 + 1$, $t_i = 3 + 1 + 4$, i.e., $S \cap Q_i = \{x_i\}$, $T \cap Q_i = \{x_i, y_i\}$, or
(ii) $s_i = 3 + 4$, $t_i = 3 + 4 + 7$, i.e., $S \cap Q_i = \{y_i\}$, $T \cap Q_i = \{y_i, z_i\}$.

Set $A' = \{a_i \mid x_i \in S\}$. We have

\[ s_0 = \sum_{y_i \in S} a_i + \sum_{z_i \in S} a_i + b = \sum_{a_i \notin A'} a_i + b, \]
\[ t_0 = \sum_{y_i \in T} a_i + \sum_{z_i \in T} a_i + b = 2 \sum_{a_i \notin A'} a_i + \sum_{a_i \in A'} a_i + b. \]

Since $2s_0 = t_0$, we have $\sum_{a_i \in A'} a_i = b$, as required.

(4) $S \cap \{u, v\} = \{u\}$, $T \cap \{u, v\} = \{v\}$.

We have $\alpha = 1$, $\beta = 1$. Further, for $i = 1, \dots, m$, we have $s_i \in \{0, 1, 4, 7, 5, 8, 11, 12\}$ and $t_i \in \{3, 4, 7, 10, 8, 11, 14, 15\}$. Thus, $s_i = t_i$ implies that one of the following four cases holds:

(i) $s_i = 4$, $t_i = 3 + 1$, i.e., $S \cap Q_i = \{y_i\}$, $T \cap Q_i = \{x_i\}$;
(ii) $s_i = 7$, $t_i = 3 + 4$, i.e., $S \cap Q_i = \{z_i\}$, $T \cap Q_i = \{y_i\}$;
(iii) $s_i = 7 + 1$, $t_i = 3 + 4 + 1$, i.e., $S \cap Q_i = \{x_i, z_i\}$, $T \cap Q_i = \{x_i, y_i\}$;
(iv) $s_i = 7 + 4$, $t_i = 3 + 1 + 7$, i.e., $S \cap Q_i = \{y_i, z_i\}$, $T \cap Q_i = \{x_i, z_i\}$.

Further, cases (iii) and (iv) are not possible since our analysis implies that $|S| = |T|$ and therefore $S \cap T = \emptyset$. Set $A' = \{a_i \mid y_i \in S\}$, $A'' = \{a_i \mid z_i \in S\}$. We have

\[ s_0 = \sum_{y_i \in S} a_i + \sum_{z_i \in S} a_i = \sum_{a_i \in A'} a_i + \sum_{a_i \in A''} a_i, \]
\[ t_0 = \sum_{y_i \in T} a_i + \sum_{z_i \in T} a_i + b = \sum_{a_i \in A''} a_i + b. \]

Since $s_0 = t_0$, we have $\sum_{a_i \in A'} a_i = b$, as required. □
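The construction and the "yes"-instance direction of the proof can be replayed on a small example. The following sketch is an illustration only: `build_instance`, `witness_pair` and `average` are hypothetical names, `chosen` is assumed to hold 0-based indices of a subset $A'$ of $A$, and Python's arbitrary-precision integers are relied on because $K = B^{2m+7}$ is far beyond machine-word size.

```python
from fractions import Fraction


def build_instance(a, b):
    """Build x_i, y_i, z_i, u, v from a Subset Sum instance (a, b)."""
    m, a_star = len(a), max(a)
    B = 50 * a_star * m
    M = [B ** i for i in range(1, m + 1)]            # M_i = B^i
    N = [B ** (m + i + 4) for i in range(1, m + 1)]  # N_i = B^(m+i+4)
    K = B ** (2 * m + 7)
    x = [M[i] + N[i] for i in range(m)]
    y = [4 * M[i] + N[i] + a[i] for i in range(m)]
    z = [7 * M[i] + N[i] + a[i] for i in range(m)]
    u = K
    v = K + 3 * sum(M) + b
    return x, y, z, u, v


def witness_pair(a, b, chosen):
    """Return the sets S and T from the proof for the index set `chosen`."""
    x, y, z, u, v = build_instance(a, b)
    indices = range(len(a))
    S = [y[i] if i in chosen else z[i] for i in indices] + [u]
    T = [x[i] if i in chosen else y[i] for i in indices] + [v]
    return S, T


def average(values):
    """Exact average as a rational number."""
    return Fraction(sum(values), len(values))
```

For instance, with $A = \{3, 5, 6, 8\}$ and $b = 11$, choosing the indices of 3 and 8 yields sets $S$ and $T$ of size $m + 1 = 5$ with equal averages, while an index set whose elements do not sum to $b$ yields unequal averages, since $\Sigma(S) - \Sigma(T) = \sum_{a_i \in A'} a_i - b$ for sets built this way.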
Acknowledgements

Edith Elkind was supported by Singapore NRF Research Fellowship 2009-08 and by NTU SUG grant. James B. Orlin was partially supported by Office of Naval Research grant N000141110056. The authors would like to thank the anonymous IPL referees for their very useful feedback. This work was initiated during Dagstuhl seminar 10171, and the authors would like to thank the Dagstuhl staff for providing a great research environment.

References

[1] S. Barbera, W. Bossert, P.K. Pattanaik, Ranking sets of objects, in: S. Barbera, P.J. Hammond, C. Seidl (Eds.), Handbook of Utility Theory, Kluwer, Boston, 1998, pp. 895–979 (Chapter 17).
[2] Y. Desmedt, E. Elkind, Equilibria of plurality voting with abstentions, in: ACM EC'10, 2010, pp. 347–356.
[3] S. Obraztsova, E. Elkind, On the complexity of voting manipulation under randomized tie-breaking, in: IJCAI'11, 2011, pp. 319–324.
[4] S. Obraztsova, E. Elkind, N. Hazon, Ties matter: Complexity of voting manipulation revisited, in: AAMAS'11, 2011, pp. 71–78.
[5] J. von Neumann, O. Morgenstern, Theory of Games and Economic Behavior, Princeton University Press, Princeton, 1944.