A deterministic annealing algorithm for approximating a solution of the min-bisection problem


Neural Networks 22 (2009) 58–66


Chuangyin Dang a,*, Wei Ma a, Jiye Liang b

a Department of Manufacturing Engineering and Engineering Management, City University of Hong Kong, Kowloon, Hong Kong
b School of Computer and Information Technology, Shanxi University, Taiyuan, China

Article info

Article history: Received 25 April 2008; Received in revised form 12 September 2008; Accepted 17 September 2008.

Keywords: Log-cos barrier function; Descent direction; Deterministic annealing; Iterative algorithm.

Abstract

The min-bisection problem is an NP-hard combinatorial optimization problem. In this paper an equivalent linearly constrained continuous optimization problem is formulated and an algorithm is proposed for approximating its solution. The algorithm is derived from the introduction of a logarithmic-cosine barrier function, where the barrier parameter behaves as temperature in an annealing procedure and decreases from a sufficiently large positive number to zero. The algorithm searches for a better solution in a feasible descent direction, which has a desired property that lower and upper bounds are always satisfied automatically if the step length is a number between zero and one. We prove that the algorithm converges to at least a local minimum point of the problem if a local minimum point of the barrier problem is generated for a sequence of descending values of the barrier parameter with a limit of zero. Numerical results show that the algorithm is much more efficient than two of the best existing heuristic methods for the min-bisection problem, the Kernighan–Lin method with multiple starting points (MSKL) and the multilevel graph partitioning scheme (MLGP).

1. Introduction

Consider an undirected graph G = (V, E), where V = {1, 2, . . . , n} is the node set of G and E the edge set of G. We denote by (i, j) an edge between nodes i and j. Let

W = \begin{pmatrix} 0 & w_{12} & \cdots & w_{1n} \\ w_{21} & 0 & \cdots & w_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n1} & w_{n2} & \cdots & 0 \end{pmatrix}

be a given symmetric weight matrix such that w_{ij} > 0 if (i, j) ∈ E and w_{ij} = 0 if (i, j) ∉ E. Assume that G has an even number of nodes. The min-bisection problem is to partition V into two sets, S and V \ S, of equal cardinality such that

w(S) = \sum_{i \in S,\; j \in V \setminus S} w_{ij}

is minimized. This problem is NP-hard (Murty & Kabadi, 1987) and has many applications. A variant of it, the bisection problem with k-resource sets, can be found in Ishii, Iwata, and Nagamochi (2007).




Due to its computational complexity, the min-bisection problem is very difficult to solve to optimality with an exact algorithm. The min-bisection problem is to partition a graph into p = 2 parts of equal cardinality; a natural generalization to p ≥ 2 yields the graph partitioning problem. This problem has attracted much attention in recent years because of its extensive applications in such areas as scientific computing, VLSI design, etc. According to Karypis and Kumar (1998), algorithms for graph partitioning are of three major types: spectral partitioning methods (Amin, 2005), geometric partitioning algorithms, and multilevel graph partitioning schemes with three phases: coarsening, partitioning of the coarsest graph, and refining (Hendrickson & Leland, 1993). During the past few decades, meta-heuristics have become popular, among which simulated annealing (SA) and evolutionary algorithms (EAs) are two of the most prominent. Applications of meta-heuristic algorithms to the graph partitioning problem can be found in the literature; for details, please refer to Bui and Moon (1996), Gil, Ortega, Diaz, and Montoya (1998), Jerrum and Sorkin (1998) and Soper, Walshaw, and Cross (2004). In addition, some software packages such as JOSTLE, METIS and CHACO are available. In Loiola, Abreu, Boaventura-Netto, Hahn, and Querido (2007), the graph partitioning problem was formulated as a special case of the quadratic assignment problem (QAP). Furthermore, owing to its special structure, the min-bisection problem has received dedicated theoretical and algorithmic treatment; for details, the reader is referred to Saab and Rao (1992) and Feige and Krauthgamer (2006).


Neural networks, since their emergence, have experienced significant advances in both theory and applications, especially in optimization; combinatorial optimization, due to Hopfield and Tank (1985), has become a popular topic in the literature of neural computation. Although initial results were disappointing, modified network dynamics and better problem mappings have contributed significantly to solution quality (Gee, Aiyer, & Prager, 1993). Peterson and Soderberg (1989) mapped the graph partitioning problem onto a neural network with a graded-neuron encoding, which reduces the solution space by one dimension. Gee et al. (1993) presented a method for evaluating problem mappings without recourse to purely experimental means. Gee and Prager (1994) proposed a rigorous mapping for quadratic 0–1 programming problems with linear equality and inequality constraints. After transforming variables with exponential functions, Urahama (1996) presented an analog solver for nonlinear programming problems with linear constraints. A feasible-solution construction mechanism was introduced in Horio, Ikeguchi, and Aihara (2005) to improve the performance of the Hopfield-type chaotic neuro-computer system for quadratic assignment problems (QAPs). Bout and Miller (1990) and Wu (2004) applied the mean field annealing (MFA) algorithm to the graph bisection problem. Waugh and Westervelt (1993) introduced a neural network architecture that is applicable to optimization. By combining deterministic annealing, self-amplification, algebraic transformations, clocked objectives, and softassign, an optimizing network architecture was constructed in Rangarajan, Gold, and Mjolsness (1996). A hybrid of Lagrange and transformation approaches (Hybrid LT) was proposed in Xu (1994) for solving combinatorial optimization problems whose constraints are separated into linear constant-sum constraints, treated by a Lagrange approach, and binary constraints, treated by penalty or barrier functions. Furthermore, special network models were constructed for the traveling salesman problem (Aiyer, Niranjan, & Fallside, 1990; Dang & Xu, 2001; Durbin & Willshaw, 1987; Wacholder, Han, & Mann, 1989; Wolfe, Parry, & MacMillan, 1994). Statistical mechanics as the underlying theory of optimization neural networks was studied in Simic (1990). A systematic investigation of such neural computational models for combinatorial optimization can be found in Berg (1996) and Cichocki and Unbehauen (1993). Most of these algorithms are of the deterministic annealing type, which is a heuristic continuation method that attempts to find the global minimum of the effective energy at a high temperature and track it as the temperature decreases. There is no guarantee that the minimum at a high temperature can always be tracked to the minimum at a low temperature, but the experimental results are encouraging (Yuille & Kosowsky, 1994).

In this paper we adapt the idea of deterministic annealing for approximating a solution of the min-bisection problem. An equivalent linearly constrained continuous optimization problem is formulated and an algorithm is proposed for approximating its solution. The algorithm is derived from the introduction of a logarithmic-cosine barrier function, where the barrier parameter behaves as temperature in an annealing procedure and decreases to zero from a sufficiently large positive number at which the barrier function is convex.
The algorithm searches for a better solution in a feasible descent direction, which has a desired property that lower and upper bounds are always satisfied automatically if the step length is a number between zero and one. We prove that the algorithm converges to at least a local minimum point of the problem if a local minimum point of the barrier problem is generated for a sequence of descending values of the barrier parameter with a limit of zero. The main differences between the proposed algorithm and existing neural computational models are the barrier function and the feasible descent direction. Numerical results show


that the algorithm is much more efficient than MSKL and MLGP, two of the best existing heuristic methods for the min-bisection or graph partitioning problem.

The rest of this paper is organized as follows. In Section 2 we formulate a linearly constrained continuous optimization problem equivalent to the min-bisection problem, introduce a logarithmic-cosine barrier function, and derive several important theoretical results. In Section 3 we describe a deterministic annealing algorithm for approximating a solution of the min-bisection problem. In Section 4 we present some numerical results to show that the algorithm is effective and efficient. Finally, we conclude the paper with some remarks in Section 5.

2. A logarithmic-cosine barrier function

It is clear that the min-bisection problem is equivalent to

\min \;\; \frac{1}{4} \sum_{i=1}^{n} \sum_{j=1}^{n} (1 - x_i x_j)\, w_{ij}    (1)

subject to

\sum_{i=1}^{n} x_i = 0, \qquad x_i \in \{-1, 1\}, \; i = 1, 2, \ldots, n.
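For a partition encoded by x_i = 1 if i ∈ S and x_i = −1 otherwise, the term (1 − x_i x_j) vanishes for nodes on the same side and equals 2 across the cut, so the double sum in (1) counts each cut edge twice with weight 2 and the factor 1/4 recovers w(S). The following minimal sketch (ours, in Python/NumPy rather than the authors' MATLAB implementation) checks this equivalence numerically on a small random instance.

```python
import numpy as np

def cut_weight(W, S):
    """w(S): total weight of edges with one endpoint in S and one outside."""
    n = W.shape[0]
    mask = np.zeros(n, dtype=bool)
    mask[list(S)] = True
    return W[np.ix_(mask, ~mask)].sum()

def quadratic_objective(W, x):
    """Objective of problem (1): (1/4) * sum_ij (1 - x_i x_j) * w_ij."""
    return 0.25 * np.sum((1.0 - np.outer(x, x)) * W)

# small random symmetric instance with an even number of nodes
rng = np.random.default_rng(0)
n = 6
W = np.triu(rng.integers(1, 10, size=(n, n)), k=1)
W = W + W.T

S = {0, 1, 2}                                          # any bisection
x = np.where(np.isin(np.arange(n), list(S)), 1.0, -1.0)
assert abs(cut_weight(W, S) - quadratic_objective(W, x)) < 1e-12
```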

Let

\xi_i = \max_{1 \le j \le n} w_{ji}, \qquad i = 1, 2, \ldots, n,

and ξ = (ξ_1, ξ_2, . . . , ξ_n)^⊤. Then the min-bisection problem is equivalent to

\min \;\; f(x) = -\frac{1}{2} x^{\top} (W + \Xi + \alpha I)\, x    (2)

subject to

\sum_{i=1}^{n} x_i = 0, \qquad x_i \in \{-1, 1\}, \; i = 1, 2, \ldots, n,

where Ξ is the diagonal matrix formed by the components of ξ, α is any given positive number, and I is the n × n identity matrix. A continuous relaxation of (2) yields

\min \;\; f(x) = -\frac{1}{2} x^{\top} (W + \Xi + \alpha I)\, x    (3)

subject to

\sum_{i=1}^{n} x_i = 0, \qquad -1 \le x_i \le 1, \; i = 1, 2, \ldots, n.

Let

b(x) = -\sum_{i=1}^{n} \ln \cos\!\left(\frac{\pi}{2} x_i\right),

which will be used as a barrier term to incorporate −1 ≤ x_i ≤ 1, i = 1, 2, . . . , n, into the objective function. For any given positive number β, consider

\min \;\; h(x; \beta) = f(x) + \beta\, b(x)    (4)

subject to

\sum_{i=1}^{n} x_i = 0.

Let F = {x | \sum_{i=1}^{n} x_i = 0}, B = {x | −1 ≤ x_i ≤ 1, i = 1, 2, . . . , n}, and int(B) = {x | −1 < x_i < 1, i = 1, 2, . . . , n}.
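For concreteness, here is a minimal sketch of the quantities defined so far, assuming Python/NumPy and our own function names (the paper itself gives no code): the matrix W + Ξ + αI, the objective f of (2)/(3), the log-cosine barrier b, the barrier objective h of (4), and its gradient.

```python
import numpy as np

def build_A(W, alpha):
    """W + Xi + alpha*I, where Xi = diag(xi) and xi_i = max_j w_ji."""
    return W + np.diag(W.max(axis=0)) + alpha * np.eye(W.shape[0])

def f(x, A):
    """Objective of (2)/(3): f(x) = -0.5 * x' A x."""
    return -0.5 * x @ A @ x

def b(x):
    """Log-cosine barrier; finite only on int(B), i.e. for -1 < x_i < 1."""
    return -np.sum(np.log(np.cos(0.5 * np.pi * x)))

def h(x, beta, A):
    """Barrier objective of (4): h(x; beta) = f(x) + beta * b(x)."""
    return f(x, A) + beta * b(x)

def grad_h(x, beta, A):
    """Gradient of h: -A x + beta * (pi/2) * tan(pi x / 2)."""
    return -A @ x + beta * 0.5 * np.pi * np.tan(0.5 * np.pi * x)
```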


Note that

\frac{\partial b(x)}{\partial x_i} = \frac{\pi}{2} \tan\!\left(\frac{\pi}{2} x_i\right).

Thus,

\lim_{x_i \to (-1)^{+}} \frac{\partial b(x)}{\partial x_i} = -\infty \quad \text{and} \quad \lim_{x_i \to 1^{-}} \frac{\partial b(x)}{\partial x_i} = +\infty.

Due to the boundedness of ∇f(x) = −(W + Ξ + αI)x on B, it follows that x^∗ ∈ int(B) if x^∗ is a local or global minimum point of (4).

Let e = (1, 1, . . . , 1)^⊤ ∈ R^n. Then \sum_{i=1}^{n} x_i = 0 can be written as e^⊤ x = 0. Let

P_e = I - e (e^{\top} e)^{-1} e^{\top} = I - \frac{1}{n} e e^{\top}.

Then the orthogonal projection of any point y ∈ R^n onto F is given by P_e y. Thus, if x^∗ is a minimum point of (4), from the first-order and second-order necessary optimality conditions, we obtain that P_e ∇h(x^∗; β) = 0 and that P_e ∇²h(x^∗; β) P_e is positive semidefinite, where ∇²h(x^∗; β) is the Hessian matrix of h(x; β) at x = x^∗.

Let β_k, k = 1, 2, . . . , be any sequence of positive numbers such that β_1 > β_2 > · · · and \lim_{k\to\infty} β_k = 0. For k = 1, 2, . . . , let x(β_k) be a global minimum point of (4) with β = β_k, i.e. x(β_k) = argmin{h(x; β_k) | x ∈ F}.

Theorem 1. For k = 1, 2, . . . ,

f(x(β_k)) ≥ f(x(β_{k+1})),

and every limit point of x(β_k), k = 1, 2, . . . , is an optimal solution of (2).

Proof. Let x^∗ be a global minimum point of (3). Then f(x^∗) ≤ f(x) for any x ∈ F ∩ B. Clearly, 0 ≤ b(x) for any x ∈ int(B). By the definitions of x(β_k) and x(β_{k+1}), we have

f(x(β_k)) + β_k b(x(β_k)) ≤ f(x(β_{k+1})) + β_k b(x(β_{k+1}))

and

f(x(β_{k+1})) + β_{k+1} b(x(β_{k+1})) ≤ f(x(β_k)) + β_{k+1} b(x(β_k)).

Thus,

(β_k − β_{k+1}) b(x(β_k)) ≤ (β_k − β_{k+1}) b(x(β_{k+1})).

Thus, b(x(β_k)) ≤ b(x(β_{k+1})) since β_k > β_{k+1}. Therefore, f(x(β_k)) ≥ f(x(β_{k+1})).

For any k, we can write

f(x^∗) ≤ f(x(β_k)) ≤ f(x(β_k)) + β_k b(x(β_k)) = h(x(β_k); β_k).    (5)

Note that for any ε > 0, there exists an interior point x̄ ∈ F ∩ B such that f(x̄) ≤ f(x^∗) + ε. It follows that for any k,

f(x^∗) + ε + β_k b(x̄) ≥ f(x̄) + β_k b(x̄) ≥ f(x(β_k)) + β_k b(x(β_k)) = h(x(β_k); β_k).

Then,

\lim_{k\to\infty} h(x(β_k); β_k) ≤ f(x^∗) + ε.

From (5), we obtain

\lim_{k\to\infty} h(x(β_k); β_k) ≥ f(x^∗).

Thus,

\lim_{k\to\infty} h(x(β_k); β_k) = f(x^∗).

Observe that \lim_{k\to\infty} β_k b(x(β_k)) = 0. Therefore,

\lim_{k\to\infty} f(x(β_k)) = f(x^∗).

Let x(β_{k_j}), j = 1, 2, . . . , be a convergent subsequence of x(β_k), k = 1, 2, . . . . Assume \lim_{j\to\infty} x(β_{k_j}) = v^∗. Then f(v^∗) = f(x^∗). In the following we show that every component of v^∗ is equal to either −1 or 1.

The Hessian matrix of h(x; β) is given by

∇²h(x; β) = −(W + Ξ + αI) + β D(x),

where

D(x) = \mathrm{diag}\!\left( \frac{1}{\left(\frac{\pi}{2}\cos\left(\frac{\pi}{2} x_1\right)\right)^2},\; \frac{1}{\left(\frac{\pi}{2}\cos\left(\frac{\pi}{2} x_2\right)\right)^2},\; \ldots,\; \frac{1}{\left(\frac{\pi}{2}\cos\left(\frac{\pi}{2} x_n\right)\right)^2} \right).

Since x(β_{k_j}) is a minimum point, hence,

P_e \left( −(W + Ξ + αI) + β_{k_j} D(x(β_{k_j})) \right) P_e

is positive semidefinite. Suppose that v^∗ has a component equal to neither −1 nor 1. Without loss of generality, we assume that v_1^∗ is such a component. Then −1 < v_1^∗ < 1. For i = 2, 3, . . . , n, let

y^i = (1, 0, \ldots, 0, −1, 0, \ldots, 0)^{\top},

where −1 is the ith component of y^i. Then,

P_e y^i = \left( I - \frac{1}{n} e e^{\top} \right) y^i = y^i.

Thus,

0 ≤ (y^i)^{\top} P_e \left( −(W + Ξ + αI) + β_{k_j} D(x(β_{k_j})) \right) P_e y^i
  = −(y^i)^{\top} (W + Ξ + αI) y^i + β_{k_j} (y^i)^{\top} D(x(β_{k_j})) y^i
  = −(ξ_1 + ξ_i + 2α − 2 w_{1i}) + β_{k_j} \left( \frac{1}{\left(\frac{\pi}{2}\cos\left(\frac{\pi}{2} x_1(β_{k_j})\right)\right)^2} + \frac{1}{\left(\frac{\pi}{2}\cos\left(\frac{\pi}{2} x_i(β_{k_j})\right)\right)^2} \right).    (6)

Since α > 0, hence, ξ_1 + ξ_i + 2α − 2w_{1i} > 0. We derive from (6) that as j → ∞, x_i(β_{k_j}) must approach either −1 or 1 (so that the second barrier term in (6) remains bounded away from zero), since β_{k_j} goes to zero and −1 < v_1^∗ < 1. Hence, v_i^∗ is equal to either −1 or 1 for i = 2, 3, . . . , n. From \sum_{i=1}^{n} v_i^∗ = 0, we obtain that v_1^∗ must be equal to either −1 or 1. A contradiction occurs. Therefore, every component of v^∗ is equal to either −1 or 1, which means that v^∗ is a vertex of B. Hence, v^∗ is an optimal solution of (2). The theorem follows. □
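The following sketch (ours, not from the paper) spells out the second-order objects used in the proof: the diagonal barrier Hessian D(x), the Hessian of h, and the projected Hessian P_e ∇²h(x; β) P_e, whose smallest eigenvalue is nonnegative at a constrained minimizer of (4). Python/NumPy is assumed, and A stands for W + Ξ + αI.

```python
import numpy as np

def D(x):
    """Diagonal barrier Hessian: D_ii = 1 / ((pi/2) * cos(pi * x_i / 2))**2."""
    return np.diag(1.0 / (0.5 * np.pi * np.cos(0.5 * np.pi * x)) ** 2)

def projected_hessian(x, beta, A):
    """P_e (-A + beta * D(x)) P_e with P_e = I - (1/n) e e', the projector onto e'x = 0."""
    n = len(x)
    Pe = np.eye(n) - np.ones((n, n)) / n
    return Pe @ (-A + beta * D(x)) @ Pe

# At a constrained minimizer x_opt of (4), the projected Hessian has no negative
# eigenvalues (up to numerical tolerance):
# assert np.linalg.eigvalsh(projected_hessian(x_opt, beta, A)).min() >= -1e-8
```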


This theorem shows that (3) has a global minimum point with every component equal to either −1 or 1. Furthermore, this theorem implies that finding an optimal solution of (2) is equivalent to finding a global minimum point of (3) with every component equal to either −1 or 1. Since (3) is also an NP-hard problem, instead of solving it directly for a minimum point with every component equal to either −1 or 1, let us consider a scheme which obtains such a solution of (3) from the solution of (4) in the limit β ↓ 0. Note that in the proof of Theorem 1, to derive that v_i^∗ is equal to either −1 or 1 for i = 1, 2, . . . , n, we need the condition ξ_1 + ξ_i + 2α − 2w_{1i} > 0 for i ≥ 2. If ξ_1 + ξ_i − 2w_{1i} > 0 always holds for any i ≥ 2, then α can be zero; otherwise, α needs to be positive. In the case ξ_1 + ξ_i + 2α − 2w_{1i} = 0 for some i ≥ 2, we do not know whether it is still true that v_i^∗ is equal to either −1 or 1 for i = 1, 2, . . . , n.

Theorem 2. For any vertex y of B with e^⊤ y = 0, assume that ∇f(y) ≠ 0 and ∇f(y) is not a vector of the one-dimensional subspace spanned by e. For k = 1, 2, . . . , let x^k be a local minimum point of (4) with β = β_k. Then every limit point of x^k, k = 1, 2, . . . , is at least a local minimum point of (3) with every component equal to either −1 or 1.

Proof. Since x^k, k = 1, 2, . . . , are contained in the bounded set B, we can extract a convergent subsequence. Let x^{k_q}, q = 1, 2, . . . , be a convergent subsequence of x^k, k = 1, 2, . . . . Assume \lim_{q\to\infty} x^{k_q} = v^∗. In the same way as in the proof of Theorem 1, one can easily derive that v^∗ is a vertex of B. Since x^{k_q} is a local minimum point of (4) with β = β_{k_q}, from the first-order necessary optimality condition, we obtain

−(W + Ξ + αI)\, x^{k_q} = λ_{k_q} e − β_{k_q} \left( \frac{\pi}{2}\tan\!\left(\frac{\pi}{2} x_1^{k_q}\right),\; \frac{\pi}{2}\tan\!\left(\frac{\pi}{2} x_2^{k_q}\right),\; \ldots,\; \frac{\pi}{2}\tan\!\left(\frac{\pi}{2} x_n^{k_q}\right) \right)^{\top},    (7)

where λ_{k_q} is a Lagrange multiplier. Hence,

0 ≠ −(W + Ξ + αI)\, v^∗ = \lim_{q\to\infty} −(W + Ξ + αI)\, x^{k_q}
  = \lim_{q\to\infty} \left[ λ_{k_q} e − β_{k_q} \left( \frac{\pi}{2}\tan\!\left(\frac{\pi}{2} x_1^{k_q}\right),\; \ldots,\; \frac{\pi}{2}\tan\!\left(\frac{\pi}{2} x_n^{k_q}\right) \right)^{\top} \right].

Let x be an arbitrary interior point of F ∩ B. Then,

−(x − x^{k_q})^{\top} (W + Ξ + αI)\, x^{k_q} = λ_{k_q} e^{\top} (x − x^{k_q}) − β_{k_q} \sum_{i=1}^{n} (x_i − x_i^{k_q}) \frac{\pi}{2}\tan\!\left(\frac{\pi}{2} x_i^{k_q}\right)
  = −β_{k_q} \sum_{i=1}^{n} (x_i − x_i^{k_q}) \frac{\pi}{2}\tan\!\left(\frac{\pi}{2} x_i^{k_q}\right).

Consider v_i^∗ = −1. We have x_i − v_i^∗ > 0 and \lim_{q\to\infty} x_i^{k_q} = −1. Thus, when q is sufficiently large,

−β_{k_q} (x_i − v_i^∗) \frac{\pi}{2}\tan\!\left(\frac{\pi}{2} x_i^{k_q}\right) > 0.

Consider v_i^∗ = 1. We have x_i − v_i^∗ < 0 and \lim_{q\to\infty} x_i^{k_q} = 1. Thus, when q is sufficiently large,

−β_{k_q} (x_i − v_i^∗) \frac{\pi}{2}\tan\!\left(\frac{\pi}{2} x_i^{k_q}\right) > 0.

From (7) and the assumption that −(W + Ξ + αI)v^∗ − λe ≠ 0 for any λ, we obtain that at least one of

\lim_{q\to\infty} −β_{k_q} \frac{\pi}{2}\tan\!\left(\frac{\pi}{2} x_i^{k_q}\right), \qquad i = 1, 2, \ldots, n,

is not equal to zero. Therefore, at least one of

−(x_i − v_i^∗) \lim_{q\to\infty} β_{k_q} \frac{\pi}{2}\tan\!\left(\frac{\pi}{2} x_i^{k_q}\right), \qquad i = 1, 2, \ldots, n,

is positive, and none of them is negative. Hence,

−(x − v^∗)^{\top} (W + Ξ + αI)\, v^∗ = \lim_{q\to\infty} −(x − x^{k_q})^{\top} (W + Ξ + αI)\, x^{k_q}
  = −\sum_{i=1}^{n} \lim_{q\to\infty} β_{k_q} (x_i − x_i^{k_q}) \frac{\pi}{2}\tan\!\left(\frac{\pi}{2} x_i^{k_q}\right)
  = −\sum_{i=1}^{n} (x_i − v_i^∗) \lim_{q\to\infty} β_{k_q} \frac{\pi}{2}\tan\!\left(\frac{\pi}{2} x_i^{k_q}\right) > 0.

Now we have obtained that for any interior point x of F ∩ B,

0 < −(x − v^∗)^{\top} (W + Ξ + αI)\, v^∗.    (8)

Observe

f(x) − f(v^∗) = −\frac{1}{2} x^{\top} (W + Ξ + αI)\, x + \frac{1}{2} (v^∗)^{\top} (W + Ξ + αI)\, v^∗
  = −(x − v^∗)^{\top} (W + Ξ + αI)\, v^∗ − \frac{1}{2} (x − v^∗)^{\top} (W + Ξ + αI)(x − v^∗).    (9)

Then, when x is sufficiently close to v^∗, using (8) and (9), we obtain

f(x) − f(v^∗) > 0,

since −\frac{1}{2}(x − v^∗)^{\top}(W + Ξ + αI)(x − v^∗) goes to zero twice as fast as −(x − v^∗)^{\top}(W + Ξ + αI)v^∗ as x approaches v^∗. This implies that v^∗ is a local minimum point of (3). The theorem follows. □

This theorem means that at least a local minimum point of (3) with every component equal to either −1 or 1 can be obtained if we are able to generate a local minimum point of (4) for a sequence of descending values of β with a limit of zero. Our algorithm for approximating a solution of the min-bisection problem is motivated by these results.

3. A deterministic annealing algorithm

In this section we develop an algorithm for approximating a solution of (2) by attempting to obtain a minimum point of (3) with every component equal to −1 or 1 from the solution of (4) in the limit β ↓ 0. Let

L(x, λ) = h(x; β) − λ e^⊤ x.

For any given barrier parameter β > 0, the first-order necessary optimality condition for (4) says that if x is a minimum point of (4) then there exists a Lagrange multiplier λ satisfying

∇_x L(x, λ) = ∇h(x; β) − λ e = 0, \qquad e^⊤ x = 0.

From

\frac{\partial L(x, λ)}{\partial x_i} = \frac{\partial f(x)}{\partial x_i} − λ + β \frac{\pi}{2}\tan\!\left(\frac{\pi}{2} x_i\right) = 0,    (10)


we derive

x_i = −\frac{2}{\pi} \arctan\!\left( \frac{2}{\beta\pi} \left( \frac{\partial f(x)}{\partial x_i} − λ \right) \right), \qquad i = 1, \ldots, n.

Let

d_i(x, λ) = −\frac{2}{\pi} \arctan\!\left( \frac{2}{\beta\pi} \left( \frac{\partial f(x)}{\partial x_i} − λ \right) \right), \qquad i = 1, \ldots, n,

and d(x, λ) = (d_1(x, λ), d_2(x, λ), . . . , d_n(x, λ))^⊤. The following lemma shows that d(x, λ) − x is a descent direction of L(x, λ) if x ∈ int(B).

Lemma 1. Assume that x ∈ int(B).

• If d_i(x, λ) − x_i < 0, then ∂L(x, λ)/∂x_i > 0.
• If d_i(x, λ) − x_i > 0, then ∂L(x, λ)/∂x_i < 0.
• If d_i(x, λ) − x_i = 0, then ∂L(x, λ)/∂x_i = 0.
• If d(x, λ) − x ≠ 0, then ∇_x L(x, λ)^⊤ (d(x, λ) − x) < 0.
• If e^⊤ (d(x, λ) − x) = 0 and d(x, λ) − x ≠ 0, then ∇h(x; β)^⊤ (d(x, λ) − x) < 0.

Proof. We only need to prove that ∂L(x, λ)/∂x_i > 0 if d_i(x, λ) − x_i < 0. The rest can be obtained similarly. From d_i(x, λ) − x_i < 0, we obtain

−\frac{2}{\pi} \arctan\!\left( \frac{2}{\beta\pi} \left( \frac{\partial f(x)}{\partial x_i} − λ \right) \right) < x_i.

Then,

\arctan\!\left( \frac{2}{\beta\pi} \left( \frac{\partial f(x)}{\partial x_i} − λ \right) \right) > −\frac{\pi}{2} x_i.    (11)

Note that tan(y) = −tan(−y) and tan(y) is a monotonically increasing function on (−π/2, π/2). Thus, applying tan to both sides of (11), we get

\frac{2}{\beta\pi} \left( \frac{\partial f(x)}{\partial x_i} − λ \right) > −\tan\!\left( \frac{\pi}{2} x_i \right),    (12)

and multiplying both sides of (12) by βπ/2, we get

\frac{\partial f(x)}{\partial x_i} − λ > −β \frac{\pi}{2} \tan\!\left( \frac{\pi}{2} x_i \right).

Therefore, when d_i(x, λ) − x_i < 0,

\frac{\partial L(x, λ)}{\partial x_i} = \frac{\partial f(x)}{\partial x_i} − λ + β \frac{\pi}{2} \tan\!\left( \frac{\pi}{2} x_i \right) > 0.

The lemma follows. □

Note that for any x ∈ int(B), d(x, λ) − x = 0 if and only if ∇_x L(x, λ) = 0. We remark that d(x, λ) − x has a desired property that when searching for a point in the direction d(x, λ) − x, the constraint {−1 < x_i < 1, i = 1, 2, . . . , n} is always satisfied automatically if the step length is a number between zero and one.

Let

s(λ) = \sum_{i=1}^{n} d_i(x, λ) = −\sum_{i=1}^{n} \frac{2}{\pi} \arctan\!\left( \frac{2}{\beta\pi} \left( \frac{\partial f(x)}{\partial x_i} − λ \right) \right).

Given any point x ∈ F ∩ int(B), for d(x, λ) − x to become a feasible descent direction of (4), we need to solve for λ the equation

s(λ) = 0.    (13)

Clearly, a solution of (13) must be a number between \min_{1 \le i \le n} \partial f(x)/\partial x_i and \max_{1 \le i \le n} \partial f(x)/\partial x_i, since s(λ) is continuous,

s(λ) = −\sum_{i=1}^{n} \frac{2}{\pi} \arctan\!\left( \frac{2}{\beta\pi} \left( \frac{\partial f(x)}{\partial x_i} − λ \right) \right) < 0

for any λ < \min_{1 \le i \le n} \partial f(x)/\partial x_i, and

s(λ) = −\sum_{i=1}^{n} \frac{2}{\pi} \arctan\!\left( \frac{2}{\beta\pi} \left( \frac{\partial f(x)}{\partial x_i} − λ \right) \right) > 0

for any λ > \max_{1 \le i \le n} \partial f(x)/\partial x_i.

To compute a solution of (13) in [\min_{1 \le i \le n} \partial f(x)/\partial x_i, \max_{1 \le i \le n} \partial f(x)/\partial x_i], we use the simple bisection method (Minoux, 1986), which is as follows. Let a_0 = \min_{1 \le i \le n} \partial f(x)/\partial x_i and b_0 = \max_{1 \le i \le n} \partial f(x)/\partial x_i. For k = 1, 2, . . . , compute

\bar{λ} = \frac{a_{k-1} + b_{k-1}}{2},

and if s(\bar{λ}) < 0, let a_k = \bar{λ} and b_k = b_{k−1}; otherwise, let a_k = a_{k−1} and b_k = \bar{λ}. The bisection method terminates as soon as |s(\bar{λ})| is smaller than a given tolerance.
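A minimal sketch of this bisection, assuming Python/NumPy and our own names: the gradient of f at the current x is passed as a vector grad, and the closed-form target d(x, λ) from (10) is used to evaluate s(λ).

```python
import numpy as np

def d_target(grad, lam, beta):
    """Componentwise closed form from (10): d_i = -(2/pi) * arctan((2/(beta*pi)) * (g_i - lam))."""
    return -(2.0 / np.pi) * np.arctan((2.0 / (beta * np.pi)) * (grad - lam))

def solve_lambda(grad, beta, tolb=1e-3, max_iter=200):
    """Bisection for s(lambda) = sum_i d_i = 0 on [min grad, max grad];
    s is increasing in lambda, negative at the left end and positive at the right end."""
    a, b = grad.min(), grad.max()
    lam = 0.5 * (a + b)
    for _ in range(max_iter):
        lam = 0.5 * (a + b)
        s = d_target(grad, lam, beta).sum()
        if abs(s) < tolb:
            break
        if s < 0.0:
            a = lam          # the root lies above the midpoint
        else:
            b = lam
    return lam
```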

For any x ∈ B, let λ(x) be the corresponding solution of s(λ) = 0. Since {∇f(x) | x ∈ B} is bounded, hence, {λ(x) | x ∈ B} is bounded.

Based on the feasible descent direction d(x, λ(x)) − x and the bisection method for updating the Lagrange multiplier λ, we have developed an algorithm (DAA) for approximating a solution of problem (2). The idea of the algorithm is as follows. Let β_q, q = 1, 2, . . . , be any given sequence of positive numbers such that β_1 > β_2 > · · · and \lim_{q\to\infty} β_q = 0. The value of β_1 should be sufficiently large so that h(x; β_1) is convex over the set int(B). Let x^0 be an arbitrary nonzero interior point of B satisfying e^⊤ x^0 = 0. For q = 1, 2, . . . , starting at x^{q−1}, we employ the feasible descent direction d(x, λ(x)) − x to search for a better feasible interior point x^q satisfying d(x^q, λ(x^q)) − x^q = 0. The algorithm can be described as follows (a code sketch follows the description of Step 2).

Initialization: Let β_0 be a sufficiently large positive number such that h(x; β_0) is convex (e.g., β_0 = 1 − 4 s_min/π² with s_min being the minimum eigenvalue of −(W + Ξ + αI)). Take x^{∗,0} to be an arbitrary nonzero feasible interior point (e.g., x^{∗,0} = (0.5, . . . , 0.5, −0.5, . . . , −0.5)^⊤). Choose θ ∈ (0, 1), which, in general, should be close to one (e.g., θ = 0.95). Let x^0 = x^{∗,0}, q = 0, and k = 0. Go to Step 1.

Step 1: Given x = x^k, use the bisection method to find a Lagrange multiplier λ(x^k) satisfying s(λ(x^k)) = 0 (an approximate solution is sufficient in the implementation of the algorithm, e.g. |s(λ(x^k))| < tolb with tolb being a given tolerance). Go to Step 2.

Step 2: Compute

d_i(x^k, λ(x^k)) = −\frac{2}{\pi} \arctan\!\left( \frac{2}{\beta\pi} \left( \frac{\partial f(x^k)}{\partial x_i} − λ(x^k) \right) \right), \qquad i = 1, 2, \ldots, n.

If ‖d(x^k, λ(x^k)) − x^k‖ < tola with tola being a given tolerance, then either the algorithm terminates, when β is sufficiently small so that a feasible solution with every component equal to either −1 or 1 can be generated by rounding off x^k, or let β_{q+1} = θ β_q, x^{∗,q+1} = x^k, x^0 = x^k,


q = q + 1, and k = 0, and go to Step 1. Otherwise, compute

x^{k+1} = x^k + μ_k (d(x^k, λ(x^k)) − x^k),    (14)

where μ_k ∈ [0, 1] satisfies

h(x^{k+1}; β_q) = \min_{μ \in [0,1]} h(x^k + μ (d(x^k, λ(x^k)) − x^k); β_q).

Let k = k + 1 and go to Step 1.
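Putting the pieces together, the following sketch (ours, not the authors' MATLAB code) outlines the whole DAA procedure: the initialization of β_0 and x^{∗,0}, Steps 1 and 2 for a fixed β, the cooling step β_{q+1} = θβ_q, and the final rounding. It reuses solve_lambda and d_target from the sketch above; a crude backtracking search stands in for the exact minimization in (14), for which the authors use the Armijo-type rule given in Section 4.

```python
import numpy as np

def daa(W, alpha=0.01, theta=0.95, tola=1e-2, tolb=1e-3, beta_stop=0.1):
    """Illustrative sketch of the deterministic annealing algorithm (DAA)."""
    n = W.shape[0]
    A = W + np.diag(W.max(axis=0)) + alpha * np.eye(n)      # W + Xi + alpha*I

    def grad_f(x):
        return -A @ x

    def h(x, beta):
        return -0.5 * x @ A @ x - beta * np.sum(np.log(np.cos(0.5 * np.pi * x)))

    # Initialization: beta_0 making h(.; beta_0) convex, and a balanced interior point
    beta = 1.0 - 4.0 * np.linalg.eigvalsh(-A).min() / np.pi ** 2
    x = np.concatenate([0.5 * np.ones(n // 2), -0.5 * np.ones(n // 2)])

    while beta > beta_stop:
        for _ in range(10_000):                              # iterations for fixed beta
            g = grad_f(x)
            lam = solve_lambda(g, beta, tolb)                # Step 1: bisection for lambda(x^k)
            d = d_target(g, lam, beta)                       # Step 2: feasible-direction target
            if np.linalg.norm(d - x) < tola:
                break
            mu, hx = 1.0, h(x, beta)                         # crude backtracking stand-in for (14)
            while h(x + mu * (d - x), beta) > hx and mu > 1e-8:
                mu *= 0.5
            x = x + mu * (d - x)
        beta *= theta                                        # cool the barrier parameter
    return np.where(x >= 0.0, 1.0, -1.0)                     # round off to the +/-1 vector z*
```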

Note that an exact solution of

\min_{μ \in [0,1]} h(x^k + μ (d(x^k, λ(x^k)) − x^k); β_q)

is not required in the implementation of the algorithm, and an approximate solution will do. One can find several ways to determine μ_k in Minoux (1986). We remark that the algorithm is insensitive to the starting point since h(x; β) is convex at the beginning of the algorithm. From the algorithm, one can see that x^k ∈ int(B) and e^⊤ x^k = 0, k = 1, 2, . . . . The following theorem shows that when β is a given positive number, the iterative procedure (14) generates at least a stationary point of (4).

Theorem 3. For β = β_q, every limit point of x^k, k = 1, 2, . . . , generated by (14) is a stationary point of (4).

Proof. Let

a_{max} = \max_{1 \le i \le n} \max_{x \in B} \left| \frac{\partial f(x)}{\partial x_i} − λ(x) \right|.

We define

x^{min} = −\frac{2}{\pi} \arctan\!\left( \frac{2}{\beta\pi} a_{max} \right) e \quad \text{and} \quad x^{max} = \frac{2}{\pi} \arctan\!\left( \frac{2}{\beta\pi} a_{max} \right) e,

where e = (1, 1, . . . , 1)^⊤ ∈ R^n. Since {∇f(x) | x ∈ B} and {λ(x) | x ∈ B} are bounded, hence, a_{max} is bounded. Thus,

−1 < x_i^{min} ≤ d_i(x, λ(x)) ≤ x_i^{max} < 1, \qquad i = 1, 2, \ldots, n,

for any x ∈ B, where d(x, λ(x)) = (d_1(x, λ(x)), d_2(x, λ(x)), . . . , d_n(x, λ(x)))^⊤. Therefore, no limit of x_i^k, k = 1, 2, . . . , is equal to either −1 or 1 for i = 1, 2, . . . , n. From Lemma 1, we obtain that d(x^k, λ(x^k)) − x^k is a feasible descent direction of (4). Let

X = \{ x ∈ F \mid x_i^{min} ≤ x_i ≤ x_i^{max},\; i = 1, 2, \ldots, n \}

and

Ω = \{ x ∈ X \mid d(x, λ(x)) − x = 0 \}.

It follows from Lemma 1 that every point in Ω is a stationary point of (4). For any x ∈ X, let

A(x) = \left\{ x + μ^∗ (d(x, λ(x)) − x) \;\middle|\; μ^∗ ∈ [0, 1],\; h(x + μ^∗ (d(x, λ(x)) − x); β) = \min_{μ \in [0,1]} h(x + μ (d(x, λ(x)) − x); β) \right\}.

Then, A(x) consists of all the points that minimize h(x + μ(d(x, λ(x)) − x); β) on the line segment between x and d(x, λ(x)). In the following we prove that A(x) is closed at every point x ∈ X \ Ω.

Let x̄ be an arbitrary point of X \ Ω. Let x^q ∈ X \ Ω, q = 1, 2, . . . , be a sequence convergent to x̄, and y^q ∈ A(x^q), q = 1, 2, . . . , a sequence convergent to ȳ. To prove that A(x̄) is closed, we only need to show ȳ ∈ A(x̄). From x^q ∈ X \ Ω and x̄ ∈ X \ Ω, we have d(x^q, λ(x^q)) − x^q ≠ 0 and d(x̄, λ(x̄)) − x̄ ≠ 0. Observe that d(x, λ(x)) is continuous. Thus, d(x^q, λ(x^q)) converges to d(x̄, λ(x̄)) as q → ∞. Since y^q ∈ A(x^q), hence, there is some number μ_q^∗ ∈ [0, 1] satisfying y^q = x^q + μ_q^∗ (d(x^q, λ(x^q)) − x^q). From d(x^q, λ(x^q)) − x^q ≠ 0 we obtain that

μ_q^∗ = \frac{\| y^q − x^q \|}{\| d(x^q, λ(x^q)) − x^q \|},

and as q → ∞,

μ_q^∗ → \bar{μ}^∗ = \frac{\| ȳ − x̄ \|}{\| d(x̄, λ(x̄)) − x̄ \|}

with \bar{μ}^∗ ∈ [0, 1]. Therefore, ȳ = x̄ + \bar{μ}^∗ (d(x̄, λ(x̄)) − x̄). Furthermore, since y^q ∈ A(x^q), we have h(y^q; β) ≤ h(x^q + μ(d(x^q, λ(x^q)) − x^q); β) for any μ ∈ [0, 1]. It implies that h(ȳ; β) ≤ h(x̄ + μ(d(x̄, λ(x̄)) − x̄); β) for any μ ∈ [0, 1], which proves that

h(ȳ; β) = \min_{μ \in [0,1]} h(x̄ + μ (d(x̄, λ(x̄)) − x̄); β).

According to the definition of A(x), it follows that ȳ ∈ A(x̄).

Since X is bounded and x^k ∈ X, k = 1, 2, . . . , we can extract a convergent subsequence from the sequence x^k, k = 1, 2, . . . . Let x^{k_j}, j = 1, 2, . . . , be a convergent subsequence of the sequence x^k, k = 1, 2, . . . , and let x^∗ be the limit point of the subsequence. We show x^∗ ∈ Ω in the following. Clearly, as k → ∞, h(x^k; β) converges to h(x^∗; β) since h(x; β) is continuous and h(x^{k+1}; β) < h(x^k; β), k = 1, 2, . . . . Consider the sequence x^{k_j + 1}, j = 1, 2, . . . . Note that

x^{k_j + 1} = x^{k_j} + μ_{k_j} (d(x^{k_j}, λ(x^{k_j})) − x^{k_j})

and

h(x^{k_j + 1}; β) = \min_{μ \in [0,1]} h(x^{k_j} + μ (d(x^{k_j}, λ(x^{k_j})) − x^{k_j}); β).

According to the definition of A(x), we have x^{k_j + 1} ∈ A(x^{k_j}). Since x^{k_j + 1}, j = 1, 2, . . . , are bounded, we can extract a convergent subsequence from the sequence x^{k_j + 1}, j = 1, 2, . . . . Let x^{k_j + 1}, j ∈ K, be such a convergent subsequence, and let x^{#} be its limit point. Suppose that x^∗ ∉ Ω. Since A(x^∗) is closed, we have x^{#} ∈ A(x^∗). Thus, h(x^{#}; β) < h(x^∗; β), which contradicts that h(x^k; β) converges as k → ∞. Therefore, x^∗ ∈ Ω. The theorem follows. □

Although it is difficult to prove that for β = β_q, any limit point of x^k, k = 1, 2, . . . , generated by (14) is at least a local minimum point of (4), in general, such limit points are indeed at least local minimum points of (4). Theorem 2 implies that every limit point of x^{∗,q}, q = 1, 2, . . . , is at least a local minimum point of (3) with every component equal to either −1 or 1 if x^{∗,q} is a minimum point of (4) with β = β_q.

4. Numerical results

In this section we present some numerical results to show the effectiveness and efficiency of the algorithm. The algorithm is programmed in MATLAB. To compare the algorithm (DAA) with the other two heuristic methods, MSKL (Kernighan & Lin, 1970) and MLGP (Karypis & Kumar, 1998), we also programmed them in MATLAB. In our implementation of the algorithm, α = 0.01, β_0 = 1 − 4 s_min/π² with s_min being the minimum eigenvalue of −(W + Ξ + αI), x^{∗,0} = (0.5, . . . , 0.5, −0.5, . . . , −0.5)^⊤, tola = 0.01, and tolb = 0.001.


Table 1
Mean computational time (in seconds) of the three algorithms for different graphs.

Density  Algorithm  n = 1000, θ = 0.9  n = 1000, θ = 0.95  n = 2000, θ = 0.9  n = 2000, θ = 0.95
0.01     DAA        21.7               28.7                101.9              113.7
0.01     MSKL       108.7              97.8                1030.6             918.5
0.01     MLGP       26.9               37                  401.7              323.8
0.1      DAA        20.5               28.8                106.4              116.5
0.1      MSKL       150.3              160                 1419.5             1476.3
0.1      MLGP       50.9               61.4                555.1              466.5
0.2      DAA        20.7               26.9                94.9               114.2
0.2      MSKL       185.4              156.9               1586.8             1516.1
0.2      MLGP       63.2               50.4                566.8              622.9
0.5      DAA        24                 33.4                97.8               123.2
0.5      MSKL       153                152.4               1511.3             1629.8
0.5      MLGP       49.3               55.4                549.7              645.5
0.8      DAA        22.1               29.3                103                149.9
0.8      MSKL       147.3              173.9               1619.8             1581.6
0.8      MLGP       52.7               54.4                821.8              573.5
1.0      DAA        21.8               25.5                109.6              126.3
1.0      MSKL       163.1              146.6               1522.6             1481.2
1.0      MLGP       51.8               56.2                553.2              533

Table 2
Mean results for R1 and R2.

Density  Ratio  n = 1000, θ = 0.9  n = 1000, θ = 0.95  n = 2000, θ = 0.9  n = 2000, θ = 0.95
0.01     R1     0.0339             0.0310              0.0173             0.0168
0.01     R2     0.0250             0.0304              0.0174             0.0168
0.1      R1     0.0032             0.0018              0.0033             0.0026
0.1      R2     0.0044             0.0036              0.0039             0.0039
0.2      R1     0.0010             0.0027              0.0021             0.0020
0.2      R2     0.0028             0.0023              0.0027             0.0026
0.5      R1     0.0011             0.0015              0.0010             0.0009
0.5      R2     0.0019             0.0019              0.0011             0.0014
0.8      R1     0.0006             0.0007              0.0005             0.0005
0.8      R2     0.0010             0.0010              0.0006             0.0009
1.0      R1     0.0003             0.0003              0.0003             0.0004
1.0      R2     0.0006             0.0005              0.0005             0.0006

To determine μ_k, we employ an Armijo-type rule, which is as follows. Let δ and ν be any two given numbers in (0, 1). Choose m_k to be the smallest nonnegative integer satisfying

h(x^k + ν^{m_k}(d(x^k, λ(x^k)) − x^k); β_q) ≤ h(x^k; β_q) + ν^{m_k} δ (d(x^k, λ(x^k)) − x^k)^⊤ ∇_x h(x^k; β_q),

and let μ_k = ν^{m_k}. We set δ = 0.8 and ν = 0.4. The algorithm terminates if β_q < 0.1. When the algorithm stops, we define z^∗ = (z_1^∗, z_2^∗, . . . , z_n^∗)^⊤ with

z_i^∗ = 1 if x_i^{∗,q} ≥ 0, and z_i^∗ = −1 if x_i^{∗,q} < 0,

which always satisfies e^⊤ z^∗ = 0 in our numerical tests.
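A minimal sketch of the Armijo-type rule and the final rounding just described, assuming Python/NumPy; h and grad_h are the barrier objective and gradient of (4), passed in as callables, and all names are ours.

```python
import numpy as np

def armijo_step(x, d, beta, h, grad_h, delta=0.8, nu=0.4, m_max=50):
    """Armijo-type rule: mu_k = nu**m_k for the smallest m_k >= 0 with
    h(x + nu**m (d - x); beta) <= h(x; beta) + nu**m * delta * (d - x)' grad_h(x; beta)."""
    hx = h(x, beta)
    slope = delta * (d - x) @ grad_h(x, beta)   # negative along a descent direction
    for m in range(m_max):
        mu = nu ** m
        if h(x + mu * (d - x), beta) <= hx + mu * slope:
            return mu
    return nu ** m_max

def round_solution(x):
    """Final rounding: z_i = 1 if x_i >= 0 and z_i = -1 otherwise."""
    return np.where(x >= 0.0, 1.0, -1.0)
```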

All of our testing graphs are generated randomly. The procedure to generate a random graph is as follows: let p ∈ (0, 1) be any given number; there is an edge between two nodes if a number taken randomly from (0, 1) is less than or equal to p. We call p the density of the graph. If there is an edge between nodes i and j, w_ij is a random integer between 1 and 10, and if there is no edge between nodes i and j, w_ij = 0. In our numerical presentations, OBJDAA stands for the objective function value of (1) at the solution obtained by DAA, OBJMSKL for the objective function value of (1) at the solution obtained by the MSKL method, and OBJMLGP for the objective function value of (1) at the solution obtained by MLGP; furthermore, R1 = (OBJMSKL − OBJDAA)/OBJDAA and R2 = (OBJMLGP − OBJDAA)/OBJDAA.
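The test-graph generator just described can be sketched as follows (ours, in Python/NumPy; the authors' experiments were run in MATLAB).

```python
import numpy as np

def random_weighted_graph(n, p, rng=None):
    """Test graph: an edge (i, j) exists with probability p (the density);
    its weight is a random integer in {1, ..., 10}, and 0 when there is no edge."""
    rng = np.random.default_rng(rng)
    upper = np.triu(rng.random((n, n)) <= p, k=1)   # undirected: decide on the upper triangle
    weights = rng.integers(1, 11, size=(n, n))
    W = np.where(upper, weights, 0)
    return W + W.T                                  # symmetric weight matrix, zero diagonal

W = random_weighted_graph(1000, 0.1, rng=0)
```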

Numerical results are shown in Tables 1 and 2, where the first column is the density of a graph and n is the number of nodes in the graph. For each pair of density and n, ten graphs are generated randomly. The entries of the two tables are, respectively, the average computational time in seconds for each algorithm to solve these ten graphs for a min-bisection and the average values of R1 and R2. Two criteria are used for the comparison of the three algorithms: one is CPU time and the other is solution quality. From Tables 1 and 2, one can see that:

• DAA has the best performance, in terms of both computational time and solution quality. It is many times faster than MSKL, and also many times faster than MLGP when n = 2000.
• In terms of computational time, the next best algorithm is MLGP, exhibiting the power of its three phases (coarsening, partitioning, and refining), whereas MSKL attains higher solution quality than MLGP, underlining the importance of multiple starting points, since Kernighan–Lin moves are easily trapped in local minima.
• Computational times for MSKL and MLGP increase much faster with the number of nodes in a graph than that of DAA does, rendering DAA a more suitable alternative for larger graphs.
• Under the same conditions, different large values of θ have little effect on the performance of DAA, except that larger values need a longer computational time.
• On the whole, when n is fixed, R1 and R2 decrease as the graph density increases, showing that the differences among DAA, MSKL and MLGP decline when the graph density increases; this indicates that DAA is particularly well suited to sparse graphs.

Fig. 1. Relationship between density of a graph and computational time.

A box plot for the evaluation of the three algorithms is given in Fig. 1, where the horizontal axis is the density of a graph and the vertical axis corresponds to the computational time for each algorithm to solve the graph for a min-bisection; the first three values at each density on the horizontal axis are for the algorithm DAA, the second three for MSKL, and the last three for MLGP. From Fig. 1, it is clear that:

• DAA outperforms MSKL and MLGP dramatically.
• At the same density, such as 0.5, the computational time of MSKL shows the largest variation, followed by MLGP, indicating again that the starting points of these algorithms have a significant effect on their performance.
• Different densities have little effect on DAA.


The last point above reflects the robustness of DAA with respect to different graph densities; some insight into this may be gained from the fact that DAA, as a mathematical method, uses little information about the graph's structure, hence its robustness.

In order to demonstrate how insensitive DAA is to the starting point, the following experiment is conducted. A starting point x = (x_1, x_2, . . . , x_n)^⊤ is generated by the following procedure (a sketch in code is given below): (1) S = (s_1, s_2, . . . , s_{n/2}) with each of its members an integer randomly selected from V = {1, 2, . . . , n}, and S̄ = V \ S = {s̄_1, s̄_2, . . . , s̄_{n/2}}; (2) x_{s_i} = r_i, i = 1, 2, . . . , n/2, where r_i is a random number in (−1, 1), and x_{s̄_i} = −r_i, i = 1, 2, . . . , n/2. For the same graph, ten starting points are generated for DAA. Fig. 2 illustrates the algorithm's insensitivity to the starting points, where the horizontal axis is the number of nodes of a graph and the vertical axis represents the ratio of the objective values obtained by DAA using different starting points to their maximum. In the case of n = 1000, the standard deviation of the ten objective values is 0.0184, and when n = 2000, the standard deviation is 0.0131, a statistical fact exhibiting the immunity of DAA to different starting points.

Fig. 2. Insensitivity tests of DAA to starting points. The symbol '+' denotes the outliers.

Furthermore, another noteworthy fact is that the length of the sequence {β_k} is much shorter than O(2^n), which is illustrated in Fig. 3, with the horizontal axis representing the number of nodes n in the graph, i.e., the dimensionality of the problem, and the vertical axis the length of the sequence {β_k}. The node numbers of the graphs used in the experiments are between 100 and 3000 with an increment of 100.

Fig. 3. Relationship between the number of nodes in a graph, n, and the length of the sequence {β_k}, L.
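A sketch of this starting-point generator, assuming Python/NumPy and our own names; pairing x_{s̄_i} = −r_i with x_{s_i} = r_i makes the point feasible for the equality constraint by construction.

```python
import numpy as np

def random_balanced_start(n, rng=None):
    """Starting-point generator described above: split the nodes at random into S and
    its complement, set x_s = r for s in S with r uniform in (-1, 1), and x_t = -r for
    the paired node t outside S, so that sum_i x_i = 0 holds exactly."""
    rng = np.random.default_rng(rng)
    perm = rng.permutation(n)
    S, S_bar = perm[: n // 2], perm[n // 2 :]
    r = rng.uniform(-1.0, 1.0, size=n // 2)
    x = np.empty(n)
    x[S] = r
    x[S_bar] = -r
    return x
```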

5. Conclusions

In this paper we have formulated a linearly constrained continuous optimization problem equivalent to the min-bisection problem and developed an algorithm based on a logarithmic-cosine barrier function and a feasible descent direction. We have proved that the algorithm converges to at least a local minimum point of (3) with every component equal to either −1 or 1 if the algorithm yields a local minimum point of (4) for a sequence of descending values of the barrier parameter with a limit of zero. Numerical results show that the algorithm is much more efficient than two of the best existing heuristic methods, the Kernighan–Lin method with multiple starting points (MSKL) and the multilevel graph partitioning scheme (MLGP).

Acknowledgments

This research was partially supported by CERG: CityU 101005 of the Government of Hong Kong SAR.

References

Aiyer, S., Niranjan, M., & Fallside, F. (1990). A theoretical investigation into the performance of the Hopfield model. IEEE Transactions on Neural Networks, 1, 204–215.
Amin, C. O. (2005). A spectral heuristic for bisecting random graphs. Available online: www.interscience.wiley.com.
Berg, J. van den (1996). Neural relaxation dynamics. Ph.D. thesis. The Netherlands: Erasmus University of Rotterdam.
Bout, D. van den, & Miller, T., III (1990). Graph partitioning using annealed networks. IEEE Transactions on Neural Networks, 1, 192–203.
Bui, T. N., & Moon, B. R. (1996). Genetic algorithm and graph partitioning. IEEE Transactions on Computers, 45(7), 841–855.
Cichocki, A., & Unbehauen, R. (1993). Neural networks for optimization and signal processing. New York: John Wiley & Sons.
Dang, C., & Xu, L. (2001). A Lagrange multiplier and Hopfield-type barrier function method for the traveling salesman problem. Neural Computation, 14, 303–324.
Durbin, R., & Willshaw, D. (1987). An analogue approach to the traveling salesman problem using an elastic network method. Nature, 326, 689–691.
Feige, U., & Krauthgamer, R. (2006). A polylogarithmic approximation of the minimum bisection. SIAM Review, 48(1), 99–130.
Gee, A., Aiyer, S., & Prager, R. (1993). An analytical framework for optimizing neural networks. Neural Networks, 6, 79–97.
Gee, A., & Prager, R. (1994). Polyhedral combinatorics & neural networks. Neural Computation, 6, 161–180.
Gil, C., Ortega, J., Diaz, A. F., & Montoya, M. D. G. (1998). Annealing based heuristics and genetic algorithms for circuit partitioning in parallel test generation. Future Generation Computer Systems, 14, 439–451.
Hendrickson, B., & Leland, R. (1993). A multilevel algorithm for partitioning graphs. Tech. report SAND93-1301. Albuquerque, NM: Sandia National Laboratories.
Hopfield, J., & Tank, D. (1985). Neural computation of decisions in optimization problems. Biological Cybernetics, 52, 141–152.
Horio, Y., Ikeguchi, T., & Aihara, K. (2005). A mixed analog/digital chaotic neuro-computer system for quadratic assignment problems. Neural Networks, 18, 505–513.
Ishii, T., Iwata, K., & Nagamochi, H. (2007). Bisecting a 4-connected graph with three resource sets. Discrete Applied Mathematics, 155, 1441–1450.
Jerrum, M., & Sorkin, G. B. (1998). The Metropolis algorithm for graph bisection. Discrete Applied Mathematics, 82, 155–175.
Karypis, G., & Kumar, V. (1998). A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 20(1), 359–392.
Kernighan, B. W., & Lin, S. (1970). An efficient heuristic procedure for partitioning graphs. The Bell System Technical Journal, 49, 291–307.
Loiola, E. M., Abreu, N. M. M. de, Boaventura-Netto, P. O., Hahn, P., & Querido, T. (2007). A survey for the quadratic assignment problem. European Journal of Operational Research, 176, 657–690.
Minoux, M. (1986). Mathematical programming: Theory and algorithms. New York: John Wiley & Sons.
Murty, K. G., & Kabadi, S. N. (1987). Some NP-complete problems in quadratic and nonlinear programming. Mathematical Programming, 39, 117–129.
Peterson, C., & Soderberg, B. (1989). A new method for mapping optimization problems onto neural networks. International Journal of Neural Systems, 1, 3–22.
Rangarajan, A., Gold, S., & Mjolsness, E. (1996). A novel optimizing network architecture with applications. Neural Computation, 8, 1041–1060.
Saab, Y., & Rao, V. (1992). On the graph bisection problem. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, 39(9), 760–762.
Simic, P. (1990). Statistical mechanics as the underlying theory of elastic and neural optimizations. Networks, 1, 89–103.


Soper, A. J., Walshaw, C., & Cross, M. (2004). A combined evolutionary search and multilevel optimization approach to graph-partitioning. Journal of Global Optimization, 29, 225–241.
Urahama, K. (1996). Gradient projection network: Analog solver for linearly constrained nonlinear programming. Neural Computation, 6, 1061–1073.
Wacholder, E., Han, J., & Mann, R. (1989). A neural network algorithm for the multiple traveling salesman problem. Biological Cybernetics, 61, 11–19.
Waugh, F., & Westervelt, R. (1993). Analog neural networks with local competition: I. Dynamics and stability. Physical Review E, 47, 4524–4536.
Wolfe, W., Parry, M., & MacMillan, J. (1994). Hopfield-style neural networks and the TSP. IEEE International Conference on Neural Networks, 7, 4577–4582.
Wu, J. (2004). Annealing by two sets of interactive dynamics. IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics, 34(3), 1519–1525.
Xu, L. (1994). Combinatorial optimization neural nets based on a hybrid of Lagrange and transformation approaches. In Proceedings of the world congress on neural networks (pp. 399–404).
Yuille, A., & Kosowsky, J. (1994). Statistical physics algorithms that converge. Neural Computation, 6, 341–356.