JID:YINCO AID:104439 /FLA
[m3G; v1.260; Prn:30/08/2019; 17:14] P.1 (1-11)
Information and Computation ••• (••••) ••••••
Contents lists available at ScienceDirect
Information and Computation www.elsevier.com/locate/yinco
On XOR lemmas for the weight of polynomial threshold functions ✩
Kazuyuki Amano ∗, Shoma Tate
Department of Computer Science, Gunma University, Tenjin 1-5-1, Kiryu, Gunma 376-8515, Japan
Article info
Article history: Received 30 June 2016; Received in revised form 17 February 2017; Accepted 30 July 2017; Available online xxxx
Keywords: Computational complexity; Boolean functions; PTF; Integer programming

Abstract
A multilinear polynomial p is said to sign-represent a Boolean function f : {−1,1}^n → {−1,1} if f(x) = sgn(p(x)) for all x ∈ {−1,1}^n. In this paper, we consider the length and weight of polynomials sign-representing Boolean functions of the form ⊕^k f, the XOR of k copies of f on disjoint sets of variables. First, we show that for an infinite family of functions f, the naive construction does not yield a shortest polynomial sign-representing ⊕^k f. More precisely, we construct polynomials sign-representing ⊕^k AND_n whose length is strictly smaller than the k-th power of the minimum length of a polynomial sign-representing AND_n, for every k ≥ 2 and n ≥ 2 except k = n = 2. Previously, such polynomials were known only for n = 2 (Sezener and Oztop, 2015). A similar result for the weight is also provided. Second, we introduce a parameter v*_f of a Boolean function f and show that the k-th root of the minimum weight of a polynomial sign-representing ⊕^k f converges to a value between v*_f and (v*_f)^2 as k goes to infinity.
© 2019 Elsevier Inc. All rights reserved.
1. Introduction
Throughout the paper, we regard a Boolean function as a mapping from {−1,1}^n to {−1,1}; −1 denotes True and 1 denotes False. We discuss the representation of Boolean functions by polynomial threshold functions (PTFs, for short). Let p be a multilinear polynomial on n variables. If f(x) = sgn(p(x)) for every x ∈ {−1,1}^n, we say that f is computed by the polynomial threshold function p, or that p sign-represents f. The PTF representation of Boolean functions has been extensively studied, especially in complexity theory and learning theory (see, e.g., [3–9]). The most thoroughly investigated measure in the study of PTF representations is the degree, defined as the minimum degree over all polynomials sign-representing f. Other important but less understood measures are the weight and the length. The weight of a Boolean function f is the minimum, over all polynomials with integer coefficients that sign-represent f, of the sum of the absolute values of the coefficients. The length of f is the minimum number of monomials in a polynomial that sign-represents f. For known results on PTF weight and length, see, e.g., [3,5,10–14] and the references therein.
✩ A part of this work was presented by the first author under the title “On XOR Lemma for Polynomial Threshold Weight and Length” at the 10th International Conference on Language and Automata Theory and Applications (LATA ’16), LNCS 9618, pp. 256–269 (2016).
∗ Corresponding author.
https://doi.org/10.1016/j.ic.2019.104439 0890-5401/© 2019 Elsevier Inc. All rights reserved.
In this paper, we focus on these two measures for a certain class of Boolean functions. For Boolean functions f and g, let f ⊕ g denote the XOR of f and g on disjoint sets of variables. If p sign-represents f and q sign-represents g, then pq sign-represents f ⊕ g. It is natural to ask whether this naive construction gives a best polynomial for f ⊕ g. O’Donnell and Servedio [1] proved that the answer is “yes” for PTF degree: their “XOR Lemma” states that the degree of f ⊕ g equals the sum of the degrees of f and g. One might expect the analogous property to hold for PTF length; the problem of verifying this is listed, e.g., among the open problems compiled by O’Donnell [15, p. 10]. Recently, Sezener and Oztop [2] gave a heuristic algorithm for finding short PTFs and computationally verified that every 6-variable Boolean function can be sign-represented using at most 26 monomials. This is surprising because it implies that the length of the function x1x2 ⊕ x3x4 ⊕ x5x6 is at most 26, which is strictly smaller than 3^3 = 27, the cube of the length of x1x2 = sgn(x1 + x2 + 1). This shows that the “XOR Lemma” does not hold for PTF length in its “ideal” form, and gives us strong motivation for further research. Which functions admit such a saving? How large can the saving be?

1.1. Our results

Let ⊕^k f denote the XOR of k copies of f on disjoint sets of variables, and let AND_n denote the AND of n variables. In the first part of this paper (Section 3), we focus on the PTF representation of ⊕^k AND_n, which is also known as the generalized inner product function. First, we give explicit polynomials that sign-represent AND_n ⊕ AND_n whose length or weight is smaller than that of the naive construction.
Using this, we show that the length of ⊕^k AND_n is strictly smaller than the k-th power of the length of AND_n for every k ≥ 2 and n ≥ 2 except k = n = 2 (Theorem 4), and that the weight of ⊕^k AND_n is strictly smaller than the k-th power of the weight of AND_n for every k ≥ 2 and n ≥ 3 (Theorem 5). In the second part of this paper (Section 4), we analyze the asymptotic behavior of the length and weight of ⊕^k f. We introduce a parameter v*_f of a Boolean function f and obtain a weak version of the “XOR Lemma” for PTF weight. Namely, we show that the k-th root of the weight of ⊕^k f converges to a value between v*_f and (v*_f)^2 as k goes to infinity. The parameter is defined as the value of a certain linear programming (LP) problem (the actual definition is given in Section 4).¹ Interestingly, this LP problem is simultaneously a relaxation of two other problems: the first is a problem whose value is the weight of f, and the second is a problem whose value is the spectral norm (a.k.a. Fourier L1-norm) of f.

1.2. Organization of the paper

In Section 2, we give notation and definitions. In Section 3, we exhibit explicit constructions of short and low-weight polynomials sign-representing the XOR of ANDs. In Section 4, we show that the weight and length of Boolean functions of the form ⊕^k f are strongly related to a parameter defined by a certain LP problem. Finally, in Section 5, we list some open questions.

2. Preliminaries

Let [m, n] denote the set {m, m + 1, ..., n}, and write [n] for [1, n]. Let Z denote the set of integers and Z≥0 the set of non-negative integers. For a set S ⊆ [n], we write x_S for the monomial ∏_{i∈S} x_i. Note that x_∅ is the constant 1. A multilinear polynomial is a sum of monomials

$$p(x_1, \dots, x_n) = \sum_{S \subseteq [n]} p_S x_S.$$
Throughout the paper, we assume that all the coefficients p_S are integers.

Definition 1. Let f : {−1,1}^n → {−1,1} be a Boolean function and let p : {−1,1}^n → Z be a multilinear polynomial with integer coefficients. We say that p sign-represents f if f(x) = sgn(p(x)) and p(x) ≠ 0 for every x ∈ {−1,1}^n, where sgn(y) = 1 if y > 0 and sgn(y) = −1 if y < 0.

Definition 2. The weight of a polynomial p(x) = Σ_{S⊆[n]} p_S x_S is the sum of the absolute values of its coefficients, Σ_{S⊆[n]} |p_S|. Let f : {−1,1}^n → {−1,1} be a Boolean function. The weight of f, denoted by wt(f), is the minimum weight of a polynomial that sign-represents f. The length of f, denoted by len(f), is the minimum number of monomials in a polynomial that sign-represents f.
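The definitions above can be made concrete with a small brute-force checker. The Python sketch below is our own illustration, not code from the paper: a polynomial is assumed to be a dict mapping a frozenset S ⊆ [n] (1-based indices) to the integer p_S, and inputs use −1 for True. It checks sign-representation, computes weight and length, and confirms that the naive product (x1 + x2 + 1)(x3 + x4 + 1) sign-represents AND_2 ⊕ AND_2.

```python
from itertools import product

def evaluate(poly, x):
    """Evaluate sum_S p_S * x_S at x in {-1,1}^n; variable i is x[i - 1]."""
    total = 0
    for S, coeff in poly.items():
        term = coeff
        for i in S:
            term *= x[i - 1]
        total += term
    return total

def sign_represents(poly, f, n):
    """True iff p(x) != 0 and sgn(p(x)) == f(x) for every x in {-1,1}^n."""
    for x in product((-1, 1), repeat=n):
        value = evaluate(poly, x)
        if value == 0 or (1 if value > 0 else -1) != f(x):
            return False
    return True

def weight(poly):
    return sum(abs(c) for c in poly.values())

def length(poly):
    return sum(1 for c in poly.values() if c != 0)

def multiply(p, q):
    """Product of polynomials on disjoint variable sets (no x_i^2 terms arise)."""
    return {S | T: a * b for S, a in p.items() for T, b in q.items()}

def and_poly(variables):
    """x_{i1} + ... + x_{in} + (n - 1), which sign-represents AND_n."""
    poly = {frozenset([i]): 1 for i in variables}
    poly[frozenset()] = len(variables) - 1
    return poly

def AND(*bits):
    return -1 if all(b == -1 for b in bits) else 1   # -1 encodes True

# AND_3 on x1, x2, x3
p = and_poly([1, 2, 3])
print(sign_represents(p, lambda x: AND(*x), 3), weight(p), length(p))  # True 5 4

# Naive construction for AND_2(x1, x2) XOR AND_2(x3, x4); XOR is a product in +-1
pq = multiply(and_poly([1, 2]), and_poly([3, 4]))
g = lambda x: AND(x[0], x[1]) * AND(x[2], x[3])
print(sign_represents(pq, g, 4), weight(pq), length(pq))  # True 9 9
```

The dictionary encoding keeps the sketch close to the notation p(x) = Σ p_S x_S; it is meant only for the small n used in this paper.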
1 An anonymous reviewer pointed out that our parameter is essentially equivalent to the inverse of the correlation to a certain family of functions discussed in [14]. See Section 4 for more details on this connection.
Definition 3. Let f be a Boolean function on {x1, ..., xn} and g a Boolean function on {y1, ..., yn}. Let f ⊕ g denote the Boolean function on {x1, ..., xn, y1, ..., yn} whose value is the XOR of f(x1, ..., xn) and g(y1, ..., yn). Let ⊕^k f denote the XOR of k copies of f on disjoint sets of variables.

3. Upper bounds on the length and weight of XOR of ANDs

Let AND_n denote the AND of n variables. The function IP_{2k} := ⊕^k AND_2 is usually called the inner product mod 2 function. In this section, we give constructive proofs of the following theorems.

Theorem 4. For k ≥ 2 and n ≥ 2, len(⊕^k AND_n) is strictly smaller than len(AND_n)^k, except for k = n = 2.

Theorem 5. For k ≥ 2 and n ≥ 3, wt(⊕^k AND_n) is strictly smaller than wt(AND_n)^k.

We note here that we have not succeeded in proving or disproving that wt(⊕^k AND_2) = wt(AND_2)^k (= 3^k).

3.1. Optimal representation for AND_n

Observe that AND_n is sign-represented by x1 + x2 + ··· + xn + (n − 1). We first show that this polynomial is in fact optimal in terms of both length and weight. The following fact may be known, but we include a proof for completeness.

Fact 6. For every n ≥ 2, len(AND_n) = n + 1 and wt(AND_n) = 2n − 1.

Proof. The upper bounds are immediate from the representation AND_n(x1, ..., xn) = sgn(x1 + ··· + xn + (n − 1)). Below we show the lower bounds.

We first show the lower bound on the length. Suppose that a length-ℓ polynomial p(x) = Σ_{i∈[ℓ]} p_i x_{S_i} sign-represents AND_n. For S ⊆ [n], let S̃ denote the n-bit vector over GF(2) whose j-th bit is 1 iff j ∈ S. Let χ : {1, −1} → GF(2) be the mapping with χ(1) = 0 and χ(−1) = 1; when applied to a vector, χ acts element-wise. Let M_p be the ℓ × n matrix over GF(2) whose i-th row is S̃_i, and let q be the length-ℓ row vector over Z whose i-th entry is p_i. Observe that
$$p(x) = q \cdot \chi^{-1}(M_p\, \chi(x)),$$
since x_{S_i} = (−1)^{⟨S̃_i, χ(x)⟩}. We first suppose that the rank of M_p is less than n. Since p sign-represents AND_n, p(x) ≤ −1 holds only for x = (−1, ..., −1). However, since M_p has a nontrivial kernel, there also exists an input y ≠ (−1, ..., −1) such that M_p χ(−1, ..., −1) = M_p χ(y), which implies p(y) ≤ −1, a contradiction. In particular, ℓ ≥ n. We now assume that ℓ = n, so that M_p is a nonsingular n × n matrix. For any n-bit vector a, if M_p χ(x1) = a and M_p χ(x2) = ā, then p(x1) = −p(x2), where ā denotes the bitwise complement of a. Since M_p is nonsingular, every a is attained by exactly one input, so the 2^n inputs split into 2^{n−1} ≥ 2 such pairs, and there is a pair x1, x2 with x1, x2 ≠ (−1, ..., −1). Then p(x1) and p(x2) cannot both be positive, so no length-n polynomial sign-represents AND_n. Together with the first case, this shows len(AND_n) ≥ n + 1.

We now proceed to the lower bound on the weight. Suppose that a weight-w polynomial p(x) = Σ_{i∈[w]} p_i x_{S_i} sign-represents AND_n. For notational convenience, we restrict p_i ∈ {−1, 1} but allow S = {S_1, ..., S_w} to be a multiset. Let M_p be the w × n matrix over GF(2) defined as above. We can assume that M_p has rank n, since otherwise p does not sign-represent AND_n. Without loss of generality, we can assume that S̃_1, S̃_2, ..., S̃_n are linearly independent (by renumbering if necessary). Write p(x) as
$$p(x) = \sum_{i \in [n]} p_i x_{S_i} + \sum_{i \in [w] \setminus [n]} p_i x_{S_i}, \qquad (1)$$
and let q1 and q2 be the first and second sums on the right-hand side of Eq. (1). By the non-singularity of M_{q1}, the vector (x_{S_1}, ..., x_{S_n}) takes every value in {−1,1}^n exactly once as x ranges over {−1,1}^n, so the range of q1(x) is {−n, −n + 2, ..., n}. In particular, q1(x) ≤ −n + 2 for at least one input x ≠ (−1, ..., −1). Since p(x) ≥ 1 for every x ≠ (−1, ..., −1), we need q2(x) ≥ n − 1 at such an input, so q2 contains at least n − 1 terms. This implies w ≥ n + (n − 1) = 2n − 1, completing the proof. □

3.2. Short polynomials for XOR of ANDs

In this subsection, we prove Theorem 4 by giving explicit constructions of short polynomials that sign-represent the XOR of ANDs. We begin with small cases.

Fact 7. len(⊕^2 AND_2) = 9 and len(⊕^3 AND_2) ≤ 26.
The 9-monomial polynomial (x1 + x2 + 1)(x3 + x4 + 1) obviously sign-represents x1x2 ⊕ x3x4, and its optimality can easily be verified by computer. The second inequality was obtained by Sezener and Oztop [2, Sect. 7.2], who gave a polynomial of length 26 and total weight 686 that sign-represents ⊕^3 AND_2, found by a heuristic method. Below we describe another polynomial p(x) of the same length that sign-represents x1x2 ⊕ x3x4 ⊕ x5x6.
p(x) = 2(x{3,5} − x{1,2,3,5} + x{4,5} + x{2,4,5} − x{2,3,4,5} + x{1,2,3,4,5} + x{1,6} + x{2,3,6} + x{4,6} + x{1,2,3,4,6} − x{3,5,6} − x{2,3,5,6} + x{1,2,3,5,6} − x{2,4,5,6} + x{2,3,4,5,6})
 + 3(x{2} − x{1,2} − x{1,3,4} + x{1,5} − x{1,2,4,5} + x{1,3,6} − x{1,2,3,6} − x{2,3,4,6} − x{4,5,6} + x{1,2,4,5,6} − x{1,2,3,4,5,6}).

The weight of this polynomial is 2 · 15 + 3 · 11 = 63, the smallest among all 26-monomial polynomials that we have found. We found this polynomial using an IP solver [16]. We strongly believe that len(⊕^3 AND_2) is actually 26, but we have not succeeded in showing this. Fact 7 shows that len(⊕^k AND_2) is strictly smaller than len(AND_2)^k for every k ≥ 3, since len(⊕^{i+j} f) ≤ len(⊕^i f) · len(⊕^j f). Hence, to prove Theorem 4, it suffices to show that len(AND_n ⊕ AND_n) is strictly smaller than len(AND_n)^2 for n ≥ 3.

Theorem 8. For every n ≥ 3, len(AND_n ⊕ AND_n) ≤ n^2 + n + 4.

Proof. The proof is constructive. Let U = {1, ..., n} and V = {n + 1, ..., 2n}. Consider a polynomial of the form
$$p(x_1, \dots, x_n, x_{n+1}, \dots, x_{2n}) = a + b \sum_{i \in U \cup V} x_i + c \sum_{\substack{i \in U,\ j \in V \\ i \not\equiv j \pmod{n}}} x_i x_j + d\,(x_U + x_V) + e \cdot x_{U \cup V}.$$
Let a = (n − 1)^2, b = n − 1.5, c = 1, d = 0.5(−1)^{n+1} and e = 1. Here we use half-integral coefficients for simplicity; multiplying p by 2 yields integer coefficients without changing any sign. The number of monomials in p is

$$1 + 2n + n(n - 1) + 2 + 1 = n^2 + n + 4.$$

Below we show that p sign-represents the XOR of AND_n(x1, ..., xn) and AND_n(x_{n+1}, ..., x_{2n}). For an input x = (x1, ..., x_{2n}), let u and v denote the numbers of −1's in the first and second halves of x, respectively, and let s := |{i ∈ [n] | x_i ≠ x_{n+i}}|. Observe that the value of p depends only on u, v and s, so we denote it by p(u, v, s). An easy calculation shows that
$$p(u, v, s) = a + b\{2n - 2(u + v)\} + c\{(n - 2u)(n - 2v) + 2s - n\} + d\{(-1)^u + (-1)^v\} + e(-1)^{u+v}. \qquad (2)$$
We need to verify that p(u, v, s) > 0 if u = v = n or u, v < n, and that p(u, v, s) < 0 otherwise. Without loss of generality we can assume that u ≥ v. Since p(u, v, s) is increasing in s (as c > 0) and s ≥ u − v, it is sufficient to verify that (a) p(n, n, 0) > 0, (b) p(u, v, u − v) > 0 (∀ v ≤ u < n), and (c) p(n, v, n − v) < 0 (∀ v < n); note that when u = n the value s = n − v is forced. The proof of these inequalities is a bit tedious but elementary. First we consider the case u = v, which covers (a) and part of (b). We have
$$p(u, u, 0) = 4(n - u)(n - u - 1.5) + 1 + 2 \cdot \mathbb{1}[(n - u) \text{ is odd}],$$
where 1[·] is the indicator function. It is easy to check that the right-hand side is positive for every integer 0 ≤ u ≤ n. Now we consider the case u ≠ v (so u > v), which covers the rest of (b) and (c). Let p̃(u, v) denote the sum of the first three terms of p(u, v, u − v) in Eq. (2). Putting v = u − α (α ≥ 1), we have
$$\tilde{p}(u, u - \alpha) = 4(n - u)(n - u + \alpha - 1.5) - \alpha + 1.$$
We need to show that p(u, u − α, α) is negative when u = n and positive when u ≤ n − 1. Since p̃(n, n − α) = 1 − α and, for u = n, the last two terms of Eq. (2) sum to 0.5((−1)^α − 1) ≤ 0, we have p(n, n − α, α) < 0 for every α ≥ 1. Since the derivative
$$\frac{d\,\tilde{p}(u, u - \alpha)}{du} = -8(n - u) - 4\alpha + 6$$
is negative when u ≤ n − 1 < n + (2α − 3)/4, we have
$$\tilde{p}(u, u - \alpha) \ge \tilde{p}(n - 1, n - 1 - \alpha) = 3\alpha - 1 \ge 2.$$
The last two terms of Eq. (2) sum to −1 when α is odd and are non-negative when α is even, which implies p(u, u − α, α) ≥ 1 for every 1 ≤ α ≤ u ≤ n − 1, as desired. □
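The explicit polynomials of this subsection are small enough to check exhaustively. The Python sketch below is our own illustration, not the authors' code: it assumes a dict encoding from frozensets of 1-based variable indices to coefficients, uses −1 for True, and evaluates each polynomial on every input. It checks the 26-monomial polynomial of Fact 7, the construction in the proof of Theorem 8 for n = 3, ..., 6 (scaled by 2 so that every coefficient is an integer, which does not change signs), and the 15-monomial polynomial for n = 3 that is given right after this proof.

```python
from itertools import product

def evaluate(poly, x):
    """sum_S p_S * prod_{i in S} x_i, with variables 1-indexed (x_i = x[i - 1])."""
    total = 0
    for S, coeff in poly.items():
        term = coeff
        for i in S:
            term *= x[i - 1]
        total += term
    return total

def xor_of_ands(x, blocks):
    """XOR of ANDs over disjoint blocks; -1 is True and XOR is multiplication."""
    value = 1
    for block in blocks:
        value *= -1 if all(x[i - 1] == -1 for i in block) else 1
    return value

def sign_represents(poly, blocks):
    nvars = sum(len(b) for b in blocks)
    for x in product((-1, 1), repeat=nvars):
        v = evaluate(poly, x)
        if v == 0 or (v > 0) != (xor_of_ands(x, blocks) > 0):
            return False
    return True

# Fact 7: signs of the monomials with coefficient 2 and 3, respectively.
TWOS = {(3, 5): 1, (1, 2, 3, 5): -1, (4, 5): 1, (2, 4, 5): 1, (2, 3, 4, 5): -1,
        (1, 2, 3, 4, 5): 1, (1, 6): 1, (2, 3, 6): 1, (4, 6): 1, (1, 2, 3, 4, 6): 1,
        (3, 5, 6): -1, (2, 3, 5, 6): -1, (1, 2, 3, 5, 6): 1, (2, 4, 5, 6): -1,
        (2, 3, 4, 5, 6): 1}
THREES = {(2,): 1, (1, 2): -1, (1, 3, 4): -1, (1, 5): 1, (1, 2, 4, 5): -1,
          (1, 3, 6): 1, (1, 2, 3, 6): -1, (2, 3, 4, 6): -1, (4, 5, 6): -1,
          (1, 2, 4, 5, 6): 1, (1, 2, 3, 4, 5, 6): -1}
fact7 = {frozenset(S): 2 * s for S, s in TWOS.items()}
fact7.update({frozenset(S): 3 * s for S, s in THREES.items()})

def theorem8_doubled(n):
    """2p for the Theorem 8 construction, so 2a, 2b, 2c, 2d, 2e are integers."""
    U, V = range(1, n + 1), range(n + 1, 2 * n + 1)
    poly = {frozenset(): 2 * (n - 1) ** 2}                 # 2a
    for i in range(1, 2 * n + 1):
        poly[frozenset([i])] = 2 * n - 3                   # 2b
    for i in U:
        for j in V:
            if (j - i) % n != 0:                           # i not congruent to j (mod n)
                poly[frozenset([i, j])] = 2                # 2c
    poly[frozenset(U)] = (-1) ** (n + 1)                   # 2d, monomial x_U
    poly[frozenset(V)] = (-1) ** (n + 1)                   # 2d, monomial x_V
    poly[frozenset(range(1, 2 * n + 1))] = 2               # 2e, monomial x_{U u V}
    return poly

# The 15-monomial polynomial for x1x2x3 XOR x4x5x6 (n = 3).
n3 = {(): 8, (1,): 3, (2,): 3, (3,): 3, (4,): 3, (1, 4): 2, (3, 4): 2, (5,): 3,
      (1, 5): 2, (2, 5): 2, (6,): 3, (2, 6): 2, (3, 6): 2, (4, 5, 6): 1,
      (1, 2, 3, 4, 5, 6): 2}
n3 = {frozenset(S): c for S, c in n3.items()}

print(len(fact7), sum(abs(c) for c in fact7.values()),
      sign_represents(fact7, [(1, 2), (3, 4), (5, 6)]))
for n in range(3, 7):
    p = theorem8_doubled(n)
    halves = [tuple(range(1, n + 1)), tuple(range(n + 1, 2 * n + 1))]
    print(n, len(p) == n * n + n + 4, sign_represents(p, halves))
print(len(n3), sign_represents(n3, [(1, 2, 3), (4, 5, 6)]))
```

Such a check is only a sanity test for small n; the proof above is what covers all n ≥ 3.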
By Theorem 8, we have

$$\mathrm{len}(\oplus^2 \mathrm{AND}_n) \le n^2 + n + 4 = \mathrm{len}(\mathrm{AND}_n)^2 - (n - 3),$$

which is strictly smaller than len(AND_n)^2 for n ≥ 4. For n = 3, one can verify that the following polynomial with 15 monomials sign-represents x1x2x3 ⊕ x4x5x6.
p(x) = 8 + 3x{1} + 3x{2} + 3x{3} + 3x{4} + 2x{1,4} + 2x{3,4} + 3x{5} + 2x{1,5} + 2x{2,5} + 3x{6} + 2x{2,6} + 2x{3,6} + x{4,5,6} + 2x{1,2,3,4,5,6}.

This finishes the proof of Theorem 4.

3.3. Low-weight polynomials for XOR of ANDs

In this subsection, we construct polynomials for AND_n(x1, ..., xn) ⊕ AND_n(x_{n+1}, ..., x_{2n}) of weight less than (2n − 1)^2 = wt(AND_n)^2 for n ≥ 3, which is sufficient to prove Theorem 5. Our polynomials are of the form
$$p(x_1, x_2, \dots, x_{2n}) = \sum_{(i,j) \in [0,n] \times [0,n]} p_{(i,j)} \Bigl( \sum_{S \subseteq U,\ |S| = i} \ \sum_{T \subseteq V,\ |T| = j} x_S x_T \Bigr), \qquad (3)$$
where U = {1, ..., n} and V = {n + 1, ..., 2n}. The coefficients p_{(i,j)} differ slightly depending on whether n is even or odd.

Lemma 9. For every even n ≥ 4, wt(AND_n ⊕ AND_n) ≤ 4n^2 − 5n + 1.

Proof. Consider a polynomial p of the form Eq. (3) where

$$p_{(0,0)} = (n-2)^2 - \Bigl(\frac{n}{2} - 1\Bigr), \quad p_{(1,1)} = 1, \quad p_{(n,n)} = -\frac{n}{2} + 2,$$
$$p_{(0,1)} = p_{(1,0)} = n - 3, \quad p_{(0,n-1)} = p_{(n-1,0)} = 1, \quad p_{(0,n)} = p_{(n,0)} = -\frac{n}{2} + 1, \quad p_{(n-1,n)} = p_{(n,n-1)} = -1,$$

and p_{(i,j)} = 0 otherwise. Since p_{(i,j)} multiplies $\binom{n}{i}\binom{n}{j}$ distinct monomials, the weight of p is 4n^2 − 5n + 1, as stated in the lemma. By the symmetry of p, the value of p(x) depends only on the numbers of −1's in the first and second halves of x. We write p(u, v) for the value of p on an input x ∈ {−1,1}^{2n} containing u and v −1's in the first and second halves of x, respectively. An easy calculation shows that
$$p(u, v) = \sum_{(i,j) \in [0,n] \times [0,n]} p_{(i,j)}\, G_i(u)\, G_j(v),$$
where G_k(c) denotes the sum of all degree-k monomials in n variables evaluated at an input with c entries equal to −1; the ones we need are
$$G_0(c) = 1, \quad G_1(c) = -2c + n, \quad G_{n-1}(c) = (-1)^c(-2c + n), \quad G_n(c) = (-1)^c.$$
We need to show that (i) p(n, n) ≥ 1, (ii) p(u, n) ≤ −1 and p(n, v) ≤ −1 for u, v ∈ [0, n − 1], and (iii) p(u, v) ≥ 1 for u, v ∈ [0, n − 1]. All of them can be verified easily.
(i) A direct calculation shows that p(n, n) = 9 ≥ 1.
(ii) A simple calculation shows p(u, n) = 3((−1)^u + 2(1 + u − n)), which is at most −3 for u ∈ [0, n − 1]: for u ≤ n − 2 the second summand is at most −2, and for u = n − 1 (which is odd) the value is −3. The case p(n, v) follows by symmetry.
(iii) We divide into subcases depending on the parities of u and v.
(iii-a) (u, v) = (even, even). A direct calculation shows p(u, v) = (3 + 2u − 2n)(3 + 2v − 2n) ≥ 1 when u, v ∈ {0, 2, 4, ..., n − 2}.
(iii-b) (u, v) = (even, odd). We have p(u, v) = (3 + 2u − 2n)(1 + 2v − 2n), which is strictly positive when u ∈ {0, 2, 4, ..., n − 2} and v ∈ {1, 3, 5, ..., n − 1}.
(iii-c) (u, v) = (odd, even). Same as (iii-b) by symmetry.
(iii-d) (u, v) = (odd, odd). We have p(u, v) = (1 + 2u − 2n)(1 + 2v − 2n) + 4 + 8u + 8v − 10n. Writing A = 2n − 1 − 2u and B = 2n − 1 − 2v, this equals (A − 4)(B − 4) + 6n − 20 with A, B ∈ {1, 3, ..., 2n − 3}; since (A − 4)(B − 4) ≥ −3(2n − 7), we get p(u, v) ≥ 1 for u, v ∈ {1, 3, 5, ..., n − 1}. □
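Because p(x) depends only on (u, v), the case analysis above can be double-checked by evaluating p(u, v) = Σ p_(i,j) G_i(u) G_j(v) on all (n + 1)^2 pairs. The Python sketch below is our own illustration: it uses the closed forms of G_0, G_1, G_{n−1}, G_n from the proof, checks even n up to 30 for Lemma 9, and, with the same routine, the odd-n coefficients of Lemma 10 stated next (whose correctness proof is left to the reader). Weights are recomputed from the fact that p_(i,j) multiplies C(n, i) · C(n, j) distinct monomials.

```python
from math import comb

def G(k, c, n):
    """Sum of all degree-k monomials in n variables at an input with c entries
    equal to -1; only the closed forms from the text (k = 0, 1, n-1, n)."""
    if k == 0:
        return 1
    if k == 1:
        return n - 2 * c
    if k == n - 1:
        return (-1) ** c * (n - 2 * c)
    if k == n:
        return (-1) ** c
    raise ValueError("k must be 0, 1, n - 1 or n")

def lemma9_coeffs(n):
    """p_(i,j) of Lemma 9 (even n >= 4)."""
    h = n // 2
    return {(0, 0): (n - 2) ** 2 - (h - 1), (1, 1): 1, (n, n): -h + 2,
            (0, 1): n - 3, (1, 0): n - 3, (0, n - 1): 1, (n - 1, 0): 1,
            (0, n): -h + 1, (n, 0): -h + 1, (n - 1, n): -1, (n, n - 1): -1}

def lemma10_coeffs(n):
    """p_(i,j) of Lemma 10 (odd n >= 7)."""
    h = (n + 1) // 2
    return {(0, 0): (n - 2) ** 2 - (h - 3), (1, 1): 1, (n, n): -h + 4,
            (0, 1): n - 3, (1, 0): n - 3, (0, n - 1): -1, (n - 1, 0): -1,
            (0, n): h, (n, 0): h, (n - 1, n): -1, (n, n - 1): -1}

def p_value(coeffs, u, v, n):
    return sum(c * G(i, u, n) * G(j, v, n) for (i, j), c in coeffs.items())

def weight(coeffs, n):
    # p_(i,j) multiplies C(n, i) * C(n, j) distinct monomials x_S x_T
    return sum(abs(c) * comb(n, i) * comb(n, j) for (i, j), c in coeffs.items())

def sign_represents(coeffs, n):
    """sgn p(u, v) must equal AND_n XOR AND_n, i.e. +1 iff (u == n) == (v == n)."""
    for u in range(n + 1):
        for v in range(n + 1):
            value = p_value(coeffs, u, v, n)
            if value == 0 or (value > 0) != ((u == n) == (v == n)):
                return False
    return True

for n in range(4, 31, 2):
    c = lemma9_coeffs(n)
    print(n, sign_represents(c, n), weight(c, n) == 4 * n * n - 5 * n + 1)
for n in range(7, 32, 2):
    c = lemma10_coeffs(n)
    print(n, sign_represents(c, n), weight(c, n) == 4 * n * n - 5 * n + 4)
```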
Lemma 10. For every odd n ≥ 7, wt(AND_n ⊕ AND_n) ≤ 4n^2 − 5n + 4.

Proof. Consider a polynomial p of the form Eq. (3) where

$$p_{(0,0)} = (n-2)^2 - \Bigl(\frac{n+1}{2} - 3\Bigr), \quad p_{(1,1)} = 1, \quad p_{(n,n)} = -\frac{n+1}{2} + 4,$$
$$p_{(0,1)} = p_{(1,0)} = n - 3, \quad p_{(0,n-1)} = p_{(n-1,0)} = -1, \quad p_{(0,n)} = p_{(n,0)} = \frac{n+1}{2}, \quad p_{(n-1,n)} = p_{(n,n-1)} = -1,$$
and p_{(i,j)} = 0 otherwise. The weight of p is

$$4n^2 - \frac{11}{2}n + \Bigl|\frac{n+1}{2} - 4\Bigr| + \frac{15}{2},$$

which equals 4n^2 − 5n + 4 when n ≥ 7. The proof of correctness of this polynomial is analogous to the proof of Lemma 9 and is left to the reader. □

The remaining cases are n = 3 and n = 5, for which we give the constructions separately; the actual polynomials are shown in the Appendix.

Lemma 11. wt(AND_3 ⊕ AND_3) ≤ 21 and wt(AND_5 ⊕ AND_5) ≤ 57.

4. The XOR lemma for PTF weight

In this section, we consider the weight and length of Boolean functions of the form ⊕^k f for large k. We first introduce a parameter v*_f of a Boolean function f.
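The problem of finding a minimum-weight polynomial p(x) = Σ_S p_S x_S that sign-represents f can be formulated as the following integer programming problem, which we call the problem P_f:

$$\begin{aligned} \text{Minimize:}\quad & \sum_{S \subseteq [n]} |p_S|, \\ \text{Subject to:}\quad & \sum_{S \subseteq [n]} p_S\, M_{f,(x,S)} \ge 1 \quad (\forall x \in \{-1,1\}^n), \\ & p_S \in \mathbb{Z} \quad (\forall S \subseteq [n]), \end{aligned}$$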
where M_{f,(x,S)} := f(x) x_S for x ∈ {−1,1}^n and S ⊆ [n]. (Since the coefficients are integers, the condition f(x)p(x) > 0 is equivalent to f(x)p(x) ≥ 1.) The value of P_f is the weight of f, and an optimal solution to P_f gives the coefficients of a minimum-weight sign-representing polynomial. It is natural to consider the LP-relaxation of P_f, denoted by P*_f, in which the integrality conditions on the p_S's are removed. Let v_f denote the value of P_f and let v*_f denote the value of P*_f. Interestingly, the problem P*_f is also a relaxation of a problem whose value is the spectral norm (a.k.a. Fourier L1-norm) of f. For S ⊆ [n], we define χ_S : {−1,1}^n → {−1,1} by

$$\chi_S(x) = \prod_{i \in S} x_i.$$
In fact, χ_S(x) = x_S in our notation. Every Boolean function f can be uniquely represented as

$$f(x) = \sum_{S \subseteq [n]} \hat{f}(S)\, \chi_S(x). \qquad (4)$$
This expression is called the Fourier expansion of f, and f̂(S) is called the Fourier coefficient of f on S. The spectral norm of f is defined as

$$\|\hat{f}\|_1 = \sum_{S \subseteq [n]} |\hat{f}(S)|.$$

Multiplying both sides of Eq. (4) by f(x), using f(x)^2 = 1, and substituting x_S for χ_S(x), we have

$$1 = \sum_{S \subseteq [n]} \hat{f}(S)\, f(x)\, x_S.$$
Recalling that M_{f,(x,S)} = f(x)x_S, we see that the system

$$\sum_{S \subseteq [n]} p_S\, M_{f,(x,S)} = 1 \quad (\forall x \in \{-1,1\}^n)$$

has the unique solution p_S = f̂(S) (for S ⊆ [n]), since its 2^n × 2^n matrix is the invertible character matrix (x_S)_{x,S} with each row multiplied by f(x) = ±1. Hence Σ_{S⊆[n]} |p_S| = ||f̂||_1. This means that the value of the following problem, which we denote by Q_f,

$$\begin{aligned} \text{Minimize:}\quad & \sum_{S \subseteq [n]} |p_S|, \\ \text{Subject to:}\quad & \sum_{S \subseteq [n]} p_S\, M_{f,(x,S)} = 1 \quad (\forall x \in \{-1,1\}^n), \end{aligned}$$

is the spectral norm of f. We can immediately see that the problem P*_f is a relaxation of Q_f obtained by replacing the equality (“=”) in the constraints with the inequality (“≥”). The following is now obvious.

Fact 12. For every f, v*_f ≤ v_f = wt(f) and v*_f ≤ ||f̂||_1. □
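The identity behind Fact 12 can be checked numerically. The Python sketch below is our own illustration (variables are 0-indexed and polynomials are dicts from index tuples to coefficients, both choices of the sketch): it computes the Fourier coefficients of AND_n exactly by brute force, confirms that p_S = f̂(S) satisfies every constraint of Q_f with equality, compares ||f̂||_1 with the closed form 3 − 2^{−(n−2)}, and checks that the integral solution x_1 + ··· + x_n + (n − 1) of weight 2n − 1 is feasible for P_f.

```python
from fractions import Fraction
from itertools import product

def monomial(x, S):
    value = 1
    for i in S:
        value *= x[i]
    return value

def all_subsets(n):
    return [tuple(i for i in range(n) if (mask >> i) & 1) for mask in range(2 ** n)]

def fourier(f, n):
    """Exact Fourier coefficients hat f(S) = 2^{-n} sum_x f(x) x_S."""
    points = list(product((-1, 1), repeat=n))
    return {S: Fraction(sum(f(x) * monomial(x, S) for x in points), 2 ** n)
            for S in all_subsets(n)}

def constraint_values(p, f, n):
    """Left-hand sides sum_S p_S M_{f,(x,S)}, where M_{f,(x,S)} = f(x) x_S."""
    return [sum(c * f(x) * monomial(x, S) for S, c in p.items())
            for x in product((-1, 1), repeat=n)]

def and_n(x):
    return -1 if all(b == -1 for b in x) else 1     # -1 encodes True

for n in range(2, 7):
    fh = fourier(and_n, n)
    tight = all(v == 1 for v in constraint_values(fh, and_n, n))   # feasible for Q_f
    spectral = sum(abs(c) for c in fh.values())
    integral = {(i,): 1 for i in range(n)}
    integral[()] = n - 1                                # x_1 + ... + x_n + (n - 1)
    feasible = min(constraint_values(integral, and_n, n)) >= 1     # feasible for P_f
    print(n, tight, spectral, feasible, sum(integral.values()))
```

Exact rational arithmetic (`Fraction`) is used so that the equality constraints of Q_f are tested exactly rather than up to rounding.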
If we wish to remove the absolute values in the problem P_f or P*_f, we can do so by splitting each variable p_S into two non-negative variables p^+_S and p^−_S. Given the problem P_f, we define the integer programming problem P'_f as follows:

$$\begin{aligned} \text{Minimize:}\quad & \sum_{S \subseteq [n]} p^+_S + \sum_{S \subseteq [n]} p^-_S, \\ \text{Subject to:}\quad & \sum_{S \subseteq [n]} p^+_S M_{f,(x,S)} - \sum_{S \subseteq [n]} p^-_S M_{f,(x,S)} \ge 1 \quad (\forall x \in \{-1,1\}^n), \\ & p^+_S,\ p^-_S \in \mathbb{Z}_{\ge 0} \quad (\forall S \subseteq [n]). \end{aligned}$$
The following fact shows that the problems P_f and P'_f are essentially the same, i.e., the value of P'_f is v_f and the value of the LP-relaxation of P'_f is v*_f.

Fact 13. In any optimal solution to P'_f, for every S ⊆ [n], at least one of p^+_S and p^−_S is 0. The same holds for the LP-relaxation of P'_f.

Proof. Suppose that α = min(p^+_S, p^−_S) > 0 in an optimal solution to P'_f (or to its LP-relaxation). Then by changing the value of p^+_S to p^+_S − α and p^−_S to p^−_S − α, the value of the objective function decreases by 2α while all the constraints remain satisfied, contradicting optimality. □
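The dual of the LP-relaxation of P'_f is

$$\begin{aligned} \text{Maximize:}\quad & \sum_{x \in \{-1,1\}^n} a_x, \\ \text{Subject to:}\quad & \Bigl| \sum_{x \in \{-1,1\}^n} a_x M_{f,(x,S)} \Bigr| \le 1 \quad (\forall S \subseteq [n]), \\ & a_x \ge 0 \quad (\forall x \in \{-1,1\}^n), \end{aligned}$$

where the two inequalities hidden in the absolute value come from the variables p^+_S and p^−_S, respectively. By LP duality, the value of the dual problem is also v*_f. This can be restated as follows.

Fact 14. There is a probability distribution μ on {−1,1}^n such that, for every S ⊆ [n],

$$\bigl| \mathbb{E}_{x \sim \mu}[M_{f,(x,S)}] \bigr| \le \frac{1}{v^*_f}. \qquad \square$$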
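Conversely, weak LP duality turns any distribution into a lower bound: if μ is a probability distribution on {−1,1}^n and m = max_S |E_{x∼μ}[M_{f,(x,S)}]|, then a_x = μ(x)/m is feasible for the dual above, so v*_f ≥ 1/m. The Python sketch below (our own illustration) evaluates this bound for the uniform distribution, for which E_{x∼μ}[M_{f,(x,S)}] = f̂(S). For IP_4 = AND_2 ⊕ AND_2, all 16 Fourier coefficients have absolute value 1/4, so the lower bound 4 meets the upper bound ||f̂||_1 = 4 of Fact 12; for AND_3 it only gives 4/3 ≤ v*_f ≤ 5/2.

```python
from fractions import Fraction
from itertools import product

def monomial(x, S):
    value = 1
    for i in S:
        value *= x[i]
    return value

def all_subsets(n):
    return [tuple(i for i in range(n) if (mask >> i) & 1) for mask in range(2 ** n)]

def dual_lower_bound(mu, f, n):
    """1 / max_S |E_{x~mu}[f(x) x_S]|, a lower bound on v*_f by weak duality."""
    worst = max(abs(sum(prob * f(x) * monomial(x, S) for x, prob in mu.items()))
                for S in all_subsets(n))
    return 1 / worst

def spectral_norm(f, n):
    points = list(product((-1, 1), repeat=n))
    return sum(abs(Fraction(sum(f(x) * monomial(x, S) for x in points), 2 ** n))
               for S in all_subsets(n))

def AND(*bits):
    return -1 if all(b == -1 for b in bits) else 1

def ip4(x):
    return AND(x[0], x[1]) * AND(x[2], x[3])    # XOR is a product in the +-1 encoding

def and3(x):
    return AND(*x)

for f, n in [(ip4, 4), (and3, 3)]:
    uniform = {x: Fraction(1, 2 ** n) for x in product((-1, 1), repeat=n)}
    print(n, dual_lower_bound(uniform, f, n), spectral_norm(f, n))
# prints "4 4 4", then "3 4/3 5/2"
```

The uniform distribution is optimal only in special cases such as bent functions; in general, finding the μ of Fact 14 requires solving the LP.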
The following theorem shows that the gap between v_f and v*_f, which is in fact the integrality gap of P*_f, is at most quadratic (when v*_f is sufficiently large). An anonymous reviewer pointed out that 1/v*_f is essentially equivalent to the correlation of f with respect to H := {±x_S | S ⊆ [n]} defined by Goldmann, Håstad and Razborov [14, Definition 3], and that the following theorem can be proved by combining Lemma 4 and Theorem 10 in [14] (in which Lemma 4 is credited to Hajnal et al. [17] and Theorem 10 to Freund [18]). We include the proof, which is essentially the same as the proof of Theorem 10 in [14], for completeness.

Theorem 15 [14]. For every Boolean function f on n variables, v_f = wt(f) ≤ 2n(v*_f)^2.

Proof. Let v(p^+_S) and v(p^−_S) be the values of an optimal solution to the LP-relaxation of P'_f, so that Σ_S (v(p^+_S) + v(p^−_S)) = v*_f. Let D be the distribution on the family of all signed monomials {±x_S | S ⊆ [n]} in which x_S and −x_S are chosen with probabilities v(p^+_S)/v*_f and v(p^−_S)/v*_f, respectively. Then, for every x ∈ {−1,1}^n, we have

$$\mathbb{E}_{h \sim D}[f(x)h(x)] \ge \frac{1}{v^*_f}.$$
Now let ℓ = 2n(v*_f)^2. Consider independent samples h_1, ..., h_ℓ from the distribution D and denote their sum by H. By the Chernoff bound (see, e.g., [19, Theorem 4.5-2]), we have, for every fixed x,

$$\Pr[f(x)H(x) \le 0] < \exp\bigl(-\ell\,(v^*_f)^{-2}/2\bigr) = e^{-n} < 2^{-n}.$$

Then, by the union bound over the 2^n inputs, we get

$$\Pr[\operatorname{sgn}(H(x)) \ne f(x) \text{ for some } x] < 1,$$

which means that there exists a polynomial of weight at most ℓ = 2n(v*_f)^2 (a sum of ℓ signed monomials) that sign-represents f. This completes the proof of Theorem 15. □

We next see that the quadratic gap in Theorem 15 is tight for almost all functions.

Fact 16. For almost all functions f, the gap between v_f and v*_f is quadratic.

Proof. By Parseval's theorem (Σ_{S⊆[n]} f̂(S)^2 = 1) and the Cauchy–Schwarz inequality, we have ||f̂||_1 ≤ 2^{n/2} for every f on n variables. This implies v*_f ≤ 2^{n/2} by Fact 12. On the other hand, Saks [13] proved that, for almost all Boolean functions f on n variables, len(f) ≥ (0.11)2^n. The fact follows from v_f = wt(f) ≥ len(f). □
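The randomized argument in the proof of Theorem 15 is easy to simulate. The Python sketch below is our own illustration, with one substitution: instead of an optimal LP solution (which would need an LP solver), it uses the feasible solution p_S = f̂(S) from Fact 12, whose value V = ||f̂||_1 is at least v*_f. The same Chernoff and union-bound argument then shows that a sum of ℓ = ⌈2nV^2⌉ signed monomials, each drawn as sgn(p_S)·x_S with probability |p_S|/V, sign-represents f with probability greater than 1 − (2/e)^n. For AND_3, V = 5/2 and ℓ = 38.

```python
import math
import random
from fractions import Fraction
from itertools import product

def monomial(x, S):
    value = 1
    for i in S:
        value *= x[i]
    return value

def and_n(x):
    return -1 if all(b == -1 for b in x) else 1     # -1 encodes True

def fourier(f, n):
    points = list(product((-1, 1), repeat=n))
    subsets = [tuple(i for i in range(n) if (m >> i) & 1) for m in range(2 ** n)]
    return {S: Fraction(sum(f(x) * monomial(x, S) for x in points), 2 ** n)
            for S in subsets}

def sample_sign_representation(f, n, p, rng, max_tries=200):
    """Sum ell signed monomials sgn(p_S) x_S drawn with probability |p_S| / V."""
    support = [(S, c) for S, c in p.items() if c != 0]
    V = sum(abs(c) for _, c in support)
    ell = math.ceil(2 * n * V ** 2)
    weights = [float(abs(c)) for _, c in support]
    points = list(product((-1, 1), repeat=n))
    for _ in range(max_tries):
        H = {}
        for S, c in rng.choices(support, weights=weights, k=ell):
            H[S] = H.get(S, 0) + (1 if c > 0 else -1)
        if all(f(x) * sum(a * monomial(x, S) for S, a in H.items()) > 0 for x in points):
            return H, ell
    return None, ell

n = 3
H, ell = sample_sign_representation(and_n, n, fourier(and_n, n), random.Random(1))
print(ell, H is not None, sum(abs(a) for a in H.values()) if H else None)
```

The weight of the sampled polynomial is at most ℓ, since repeated monomials with opposite signs can cancel.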
We are now ready to discuss the weight and length of the function ⊕^k f. The parameters

$$\mathrm{wt}_\oplus(f) := \lim_{k \to \infty} \sqrt[k]{\mathrm{wt}(\oplus^k f)}, \qquad \mathrm{len}_\oplus(f) := \lim_{k \to \infty} \sqrt[k]{\mathrm{len}(\oplus^k f)}$$

seem to characterize the complexity of f well. Since wt(⊕^{i+j} f) ≤ wt(⊕^i f) wt(⊕^j f) and len(⊕^{i+j} f) ≤ len(⊕^i f) len(⊕^j f) for all i and j, these limits exist by Fekete's lemma (and equal the infima of the corresponding k-th roots).
For example, we know

$$2 \le \mathrm{len}_\oplus(x_1 \wedge x_2) \le \sqrt[3]{26} < 2.966, \qquad 2 \le \mathrm{wt}_\oplus(x_1 \wedge x_2) \le 3.$$

The lower bounds follow from the bound len(IP_{2n}) ≥ 2^n by Bruck [5] and the trivial inequality len⊕(f) ≤ wt⊕(f). The upper bound on len⊕(x1 ∧ x2) follows from Fact 7. Below we show that wt⊕(f) lies between v*_f and (v*_f)^2.

Theorem 17. For every Boolean function f, v*_f ≤ wt⊕(f) ≤ (v*_f)^2.

Proof. Let n be the number of input variables of f. We first verify that v*_{⊕^k f} = (v*_f)^k. Recall that v*_{⊕^k f} is the value of the problem P*_{⊕^k f}, which is given by

$$\begin{aligned} \text{Minimize:}\quad & \sum_{\mathbf{S}} |q_{\mathbf{S}}|, \\ \text{Subject to:}\quad & \sum_{\mathbf{S}} q_{\mathbf{S}} \Bigl( \prod_{i \in [k]} M_{f,(x_i,S_i)} \Bigr) \ge 1 \quad (\forall \mathbf{x} = (x_1, \dots, x_k) \in \{-1,1\}^{nk}), \\ & q_{\mathbf{S}} \in \mathbb{R} \quad (\forall \mathbf{S}), \end{aligned}$$
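where the summations are over all tuples S = (S_1, ..., S_k) with S_i ⊆ [n] (note that M_{⊕^k f,(x,S)} = ∏_{i∈[k]} M_{f,(x_i,S_i)}). Let v(p_S) (S ⊆ [n]) be an optimal solution to P*_f. Then, by setting

$$q_{\mathbf{S}} = \prod_{i \in [k]} v(p_{S_i}),$$

we obtain a feasible solution, since each constraint value factors as ∏_{i∈[k]} (Σ_{S⊆[n]} v(p_S) M_{f,(x_i,S)}) ≥ 1, and its objective value is (v*_f)^k; hence v*_{⊕^k f} ≤ (v*_f)^k. Similarly, we can verify the lower bound v*_{⊕^k f} ≥ (v*_f)^k by considering the dual of P*_{⊕^k f}: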
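$$\begin{aligned} \text{Maximize:}\quad & \sum_{\mathbf{x}} b_{\mathbf{x}}, \\ \text{Subject to:}\quad & \Bigl| \sum_{\mathbf{x}} b_{\mathbf{x}} \prod_{i \in [k]} M_{f,(x_i,S_i)} \Bigr| \le 1 \quad (\forall \mathbf{S} = (S_1, \dots, S_k)), \\ & b_{\mathbf{x}} \ge 0 \quad (\forall \mathbf{x} \in \{-1,1\}^{nk}), \end{aligned}$$

where the summations are over all x = (x_1, ..., x_k) ∈ {−1,1}^{nk}. By LP duality, the value of the above problem is v*_{⊕^k f}. Let μ be the probability distribution on {−1,1}^n in Fact 14, and let μ^k be the product distribution of k copies of μ. Then we have

$$\Bigl| \mathbb{E}_{\mathbf{x} \sim \mu^k}\Bigl[ \prod_{i \in [k]} M_{f,(x_i,S_i)} \Bigr] \Bigr| = \prod_{i \in [k]} \bigl| \mathbb{E}_{x_i \sim \mu}[M_{f,(x_i,S_i)}] \bigr| \le \frac{1}{(v^*_f)^k},$$

so b_x = (v*_f)^k μ^k(x) is a feasible dual solution with value (v*_f)^k, which implies v*_{⊕^k f} ≥ (v*_f)^k. We now have

$$(v^*_f)^k = v^*_{\oplus^k f} \le v_{\oplus^k f} = \mathrm{wt}(\oplus^k f) \le 2nk\,(v^*_{\oplus^k f})^2 = 2nk\,(v^*_f)^{2k}.$$

Here we use Theorem 15 (applied to ⊕^k f, a function on nk variables) to derive the last inequality. By taking k-th roots, we have

$$v^*_f \le \mathrm{wt}_\oplus(f) \le (v^*_f)^2,$$

since (2nk)^{1/k} goes to 1 as k goes to infinity. This completes the proof of the theorem. □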
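The upper-bound half of the proof above, q_S = ∏_i v(p_{S_i}), is simply the coefficient vector of a product polynomial, and its feasibility comes from the factorization of the constraints. The Python sketch below (our own illustration; variables 0-indexed, polynomials as dicts from index tuples to coefficients) checks this for k = 2 and f = AND_2 with two feasible solutions of P*_f: the Fourier solution p_S = f̂(S) (value 2) and the integral solution x_1 + x_2 + 1 (value 3). In both cases the tensor product is feasible for P*_{⊕^2 f}, and its objective value is the square of the original one.

```python
from fractions import Fraction
from itertools import product

def monomial(x, S):
    value = 1
    for i in S:
        value *= x[i]
    return value

def AND2(a, b):
    return -1 if a == -1 and b == -1 else 1

def f(x):                       # AND_2 on (x_0, x_1)
    return AND2(x[0], x[1])

def g(x):                       # AND_2 XOR AND_2 on (x_0, ..., x_3)
    return AND2(x[0], x[1]) * AND2(x[2], x[3])

def tensor(p, q, shift):
    """q_(S,T) = p_S * q_T, with the second copy's variables shifted by `shift`."""
    return {S + tuple(j + shift for j in T): a * b
            for S, a in p.items() for T, b in q.items()}

def min_constraint(p, func, n):
    """min_x sum_S p_S * func(x) * x_S; the solution is feasible iff this is >= 1."""
    return min(sum(c * func(x) * monomial(x, S) for S, c in p.items())
               for x in product((-1, 1), repeat=n))

fourier_solution = {(): Fraction(1, 2), (0,): Fraction(1, 2), (1,): Fraction(1, 2),
                    (0, 1): Fraction(-1, 2)}
integral_solution = {(): 1, (0,): 1, (1,): 1}

for p in (fourier_solution, integral_solution):
    q = tensor(p, p, 2)
    print(min_constraint(p, f, 2), min_constraint(q, g, 4),
          sum(abs(c) for c in p.values()), sum(abs(c) for c in q.values()))
# prints "1 1 2 4", then "1 1 3 9"
```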
As P*_f is an LP problem with 2^n variables and 2^n constraints, we can compute v*_f for moderate n, say n ∼ 10, with an LP solver. Based on some computer experiments, it is very plausible that, for f = AND_n,

$$3 - 2^{-(n-2)} = \|\hat{f}\|_1 = v^*_f < v_f = 2n - 1, \qquad (5)$$

and, for f = MAJ_n (= sgn(Σ_{i∈[n]} x_i), n odd),

$$(n + 1)/2 = v^*_f < v_f = n.$$

A formal proof of the above values of v*_f could be given by considering the dual of P*_f.
For example, Ineq. (5) together with Theorem 17 implies that 3 − 2^{−(n−2)} ≤ wt⊕(AND_n) < 9 and len⊕(AND_n) < 9, which shows that there is a large gap between wt⊕(AND_n) and wt(AND_n) = 2n − 1 (respectively, between len⊕(AND_n) and len(AND_n) = n + 1). For the lower bound on len⊕(AND_n), we only know 2 ≤ len⊕(AND_n), which follows from AND_2(x1, x2) = AND_n(x1, x2, −1, ..., −1). Currently, the relationship among the three parameters v_f, v*_f and ||f̂||_1 seems mysterious.

5. Future work

There are many interesting problems for future research. Below we list some of them.
• Can we obtain a good parameter that gives a lower bound on len⊕(f)? By the result of Bruck [5], we have 1/||f̂||_∞ ≤ len⊕(f), where ||f̂||_∞ is the Fourier L∞-norm of f. However, this seems weak for non-bent functions. One may expect that v*_f ≤ len⊕(f), but this is far from the truth, since it is known that there is a function f satisfying len⊕(f) ≤ len(f) ≤ n + 1 but v_f = wt(f) ≥ exp(Ω(√n)), and hence v*_f ≥ exp(Ω(√n)) by Theorem 15 (see [14, Corollary 9]).
• Identify the functions satisfying wt(f ⊕ f) = wt(f)^2 or len(f ⊕ f) = len(f)^2. We can observe that the integer programming problem whose value is wt(f ⊕ f) is the tensor square of the problem P_f for wt(f). (See, e.g., [20] for a formal definition and discussion of tensor powers of integer programming problems.) Hence, the weight version of this problem is closely related to the problem of finding (sufficient or necessary) conditions on an integer programming problem P satisfying v(P ⊗ P) = v(P)^2, which would be interesting in its own right.
• Is there ever a trade-off between weight and length? In other words, is there a function for which the optimum weight and the optimum length cannot be achieved simultaneously by the same sign-representation?
• Give a constructive proof of the upper bound of Theorem 17.
• Prove or disprove that wt(IP_{2n}) = 3^n.

Declaration of competing interest

There is no competing interest.

Acknowledgments

The authors would like to thank the anonymous reviewers for their careful reading and many constructive suggestions that significantly improved the paper. The first author would also like to thank an anonymous referee of LATA ’16 for pointing out an error in an earlier version of the proof of Theorem 8. This work was partially supported by KAKENHI Nos. 15K00006, 24106006 and 24500006.

Appendix A

We give the polynomials verifying Lemma 11, i.e., wt(AND_3 ⊕ AND_3) ≤ 21 and wt(AND_5 ⊕ AND_5) ≤ 57. The following polynomials p3 and p5 sign-represent x1 ⋯ xk ⊕ x_{k+1} ⋯ x_{2k} (that is, AND_k ⊕ AND_k) for k = 3 and k = 5, respectively. Note that p3 has been shown to be optimal, whereas we could not verify the optimality of p5. We obtained these polynomials using an IP solver.
p3(x) = 3 + x{2} + x{3} + x{1,2,3} + x{4} + x{3,4} + x{5} + x{3,5} + x{6} + x{2,6} + x{1,2,3,6} + x{1,2,4,6} + x{2,3,4,6} + x{4,5,6} + x{2,4,5,6} − x{1,2} − x{2,3} − x{4,6} − x{1,5,6}.

p5(x) = 10 + x{1,2,3} + x{1,3,4} + x{5} + x{1,2,5} + x{2,3,5} + x{1,4,5} + x{3,4,5} + x{6,7,8} + x{1,2,4,6,7,8} + x{1,2,3,6,7,9} + x{6,8,9} + x{1,3,4,6,8,9} + x{2,3,4,7,8,9} + x{1,2,4,5,6,7,8,9} + x{10} + x{5,10} + x{6,7,10} + x{3,5,8,10} + x{7,8,10} + x{2,3,5,7,8,10} + x{6,9,10} + x{1,2,3,4,6,7,9,10} + x{8,9,10} + x{1,3,4,5,6,8,9,10}
 − x{1,2} − x{2,3} − x{1,2,3,4} − x{2,5} − x{1,2,3,5} − x{4,5} − x{1,2,4,5} − x{6,7} − x{3,4,5,6,7} − x{7,8} − x{1,4,5,7,8} − x{6,7,8,9} − x{4,5,10} − x{7,10} − x{1,2,3,5,6,7,10} − x{6,7,8,10} − x{1,2,5,6,7,8,10} − x{9,10} − x{5,9,10} − x{2,3,6,9,10} − x{6,7,9,10} − x{1,2,8,9,10} − x{2,5,8,9,10}.
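The two polynomials can be checked exhaustively (64 and 1024 inputs). In the Python sketch below (our own illustration, not the authors' IP-solver code), each polynomial is transcribed from the display above as a constant plus lists of monomials with coefficient +1 and −1 (1-based indices). The script recomputes the weights 21 and 57 and tests whether p_k sign-represents AND_k ⊕ AND_k on x_1, ..., x_{2k}, with −1 encoding True.

```python
from itertools import product

P3 = (3,
      [(2,), (3,), (1, 2, 3), (4,), (3, 4), (5,), (3, 5), (6,), (2, 6), (1, 2, 3, 6),
       (1, 2, 4, 6), (2, 3, 4, 6), (4, 5, 6), (2, 4, 5, 6)],
      [(1, 2), (2, 3), (4, 6), (1, 5, 6)])
P5 = (10,
      [(1, 2, 3), (1, 3, 4), (5,), (1, 2, 5), (2, 3, 5), (1, 4, 5), (3, 4, 5), (6, 7, 8),
       (1, 2, 4, 6, 7, 8), (1, 2, 3, 6, 7, 9), (6, 8, 9), (1, 3, 4, 6, 8, 9),
       (2, 3, 4, 7, 8, 9), (1, 2, 4, 5, 6, 7, 8, 9), (10,), (5, 10), (6, 7, 10),
       (3, 5, 8, 10), (7, 8, 10), (2, 3, 5, 7, 8, 10), (6, 9, 10),
       (1, 2, 3, 4, 6, 7, 9, 10), (8, 9, 10), (1, 3, 4, 5, 6, 8, 9, 10)],
      [(1, 2), (2, 3), (1, 2, 3, 4), (2, 5), (1, 2, 3, 5), (4, 5), (1, 2, 4, 5), (6, 7),
       (3, 4, 5, 6, 7), (7, 8), (1, 4, 5, 7, 8), (6, 7, 8, 9), (4, 5, 10), (7, 10),
       (1, 2, 3, 5, 6, 7, 10), (6, 7, 8, 10), (1, 2, 5, 6, 7, 8, 10), (9, 10),
       (5, 9, 10), (2, 3, 6, 9, 10), (6, 7, 9, 10), (1, 2, 8, 9, 10), (2, 5, 8, 9, 10)])

def evaluate(poly, x):
    const, plus, minus = poly

    def mono(S):
        value = 1
        for i in S:
            value *= x[i - 1]          # variables are 1-indexed as in the text
        return value

    return const + sum(mono(S) for S in plus) - sum(mono(S) for S in minus)

def check(poly, k):
    """Return (weight, whether poly sign-represents AND_k XOR AND_k on x_1..x_2k)."""
    const, plus, minus = poly
    weight = abs(const) + len(plus) + len(minus)   # all monomials are distinct
    for x in product((-1, 1), repeat=2 * k):
        first = -1 if all(b == -1 for b in x[:k]) else 1
        second = -1 if all(b == -1 for b in x[k:]) else 1
        value = evaluate(poly, x)
        if value == 0 or (value > 0) != (first * second > 0):
            return weight, False
    return weight, True

print(check(P3, 3), check(P5, 5))
```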
References
[1] R. O’Donnell, R.A. Servedio, New degree bounds for polynomial threshold functions, Combinatorica 30 (2010) 327–358.
[2] C.E. Sezener, E. Oztop, Minimal sign representation of Boolean functions: algorithms and exact results for low dimensions, Neural Comput. 27 (2015) 1796–1823.
[3] K. Amano, New upper bounds on the average PTF density of Boolean functions, in: O. Cheong, K. Chwa, K. Park (Eds.), ISAAC 2010, Part I, LNCS, vol. 6506, Springer, 2010, pp. 304–315.
[4] R. Beigel, The polynomial method in circuit complexity, in: Proc. of 8th Conf. on Structure in Complexity Theory, 1993, pp. 82–95.
[5] J. Bruck, Harmonic analysis of polynomial threshold functions, SIAM J. Discrete Math. 3 (1990) 168–177.
[6] A.R. Klivans, R. O’Donnell, R.A. Servedio, Learning intersections and thresholds of halfspaces, J. Comput. Syst. Sci. 68 (2004) 808–840.
[7] A.R. Klivans, R.A. Servedio, Learning DNF in time 2^{Õ(n^{1/3})}, J. Comput. Syst. Sci. 68 (2004) 303–318.
[8] A.R. Klivans, A.A. Sherstov, Unconditional lower bounds for learning intersections of halfspaces, Mach. Learn. 69 (2007) 97–114.
[9] A.A. Sherstov, The intersection of two halfspaces has high threshold degree, SIAM J. Comput. 42 (2013) 2329–2374.
[10] T. Hayasaka, K. Amano, Improved upper bounds on PTF density of Boolean functions (in Japanese), in: Proceedings of the IEICE General Conference, Institute of Electronics, Information and Communication Engineers, 2011, S-5–S-6.
[11] E. Oztop, An upper bound on the minimum number of monomials required to separate dichotomies of {−1, 1}^n, Neural Comput. 18 (2006) 3119–3138.
[12] R. O’Donnell, R.A. Servedio, Extremal properties of polynomial threshold functions, J. Comput. Syst. Sci. 74 (2008) 298–312.
[13] M.E. Saks, Slicing the hypercube, in: Surveys in Combinatorics, 1993, pp. 211–255.
[14] M. Goldmann, J. Håstad, A.A. Razborov, Majority gates vs. general weighted threshold gates, Comput. Complex. 2 (1992) 277–300.
[15] R. O’Donnell, Open problems in analysis of Boolean functions, arXiv:1204.6447, 2012.
[16] Gurobi Optimization Inc., Gurobi optimizer, http://www.gurobi.com, 2016.
[17] A. Hajnal, W. Maass, P. Pudlák, M. Szegedy, G. Turán, Threshold circuits of bounded depth, J. Comput. Syst. Sci. 46 (1993) 129–154.
[18] Y. Freund, Boosting a weak learning algorithm by majority, Inf. Comput. 121 (1995) 256–285.
[19] M. Mitzenmacher, E. Upfal, Probability and Computing, Cambridge University Press, 2005.
[20] R. Pemantle, J.G. Propp, D. Ullman, On tensor powers of integer programs, SIAM J. Discrete Math. 5 (1992) 127–143.