ELSEVIER
European Journal of Operational Research 88 (1996) 382-403
EUROPEAN JOURNAL OF OPERATIONAL RESEARCH
Theory and Methodology
Conditional subgradient optimization - Theory and applications Torbj6rn Larsson, Michael Patriksson, Ann-Brith Str6mberg* Department of Mathematics, Link6ping Institute of Technology, S-581 83 Linki~ping, Sweden
Received April 1993; accepted June 1994
Abstract
We generalize the subgradient optimization method for nondifferentiable convex programming to utilize conditional subgradients. Firstly, we derive the new method and establish its convergence by generalizing convergence results for traditional subgradient optimization. Secondly, we consider a particular choice of conditional subgradients, obtained by projections, which leads to an easily implementable modification of traditional subgradient optimization schemes. To evaluate the subgradient projection method we consider its use in three applications: uncapacitated facility location, twoperson zero-sum matrix games, and multicommodity network flows. Computational experiments show that the subgradient projection method performs better than traditional subgradient optimization; in some cases the difference is considerable. These results suggest that our simple modification may improve subgradient optimization schemes significantly. This finding is important as such schemes are very popular, especially in the context of Lagrangean relaxation. Keywords: Convex programming;Nonsmooth optimization; Subgradient methods; Conditional subgradients; Lagrangean relaxation
1. I n t r o d u c t i o n
and motivation
Subgradient optimization methods for minimizing a nondifferentiable convex function originate in a work by N.Z. Shor from 1962; see Shor (1991) for a review of the early history of nonsmooth optimization. For the case of unconstrained optimization, Ermol'ev (1966) proved convergence of the method using step lengths according to a divergent series, and Polyak (1967, 1969) extended the method to the case of convex constraints and gave additional convergence results. These methods have been frequently and often successfully applied, particularly in connection with Lagrangean duality (e.g., Geoffrion, 1974; Held et al., 1974; Fisher, 1981, 1985); subgradient methods also are closely related to relaxation methods for solving systems of linear inequalities (e.g., Goffin, 1978). We present conditional subgradient optimization methods, which include as special cases the methods just mentioned and subgradient projection methods; the latter are shown to have better practical performances. * Corresponding author.
0377-2217/96/$09.50 (~) 1996 Elsevier Science B.V. All rights reserved SSDI 0377-2217(94)00200-2
T. Larsson et al./ European Journal of Operational Research 88 (1996) 382-403
383
Let the function f • R ~ ~ IR be convex and thus everywhere continuous and subdifferentiable. Further, let X C_ R ~ be a nonempty, closed and convex set, and assume that inf~cx f ( x ) = f ( x * ) > - o c for some x* E X. The program considered is [P]
:" with the nonempty and convex set of optimal solutions X* = {x E X t f ( x ) = f*}. The stated properties of the program [P] are assumed to hold throughout the paper. The subdifferential of f at x is defined as
Of(x) = { ~ / E ~ n I f ( Y ) > - f ( x ) + ' y T ( y - - x ) ,
VyEN[n},
which is a nonempty, convex and compact set (Rockafellar, 1970, Theorem 23.4). Letting t denote iteration number and ),(x t) be a subgradient of f at x t E X, i.e., y ( x t) C a f ( x t ) , the subgradient optimization method for the program [P] is given by
xt+l=Px(xt-o6.y(xt)),
/=0,1
.....
(I)
where
Px(x)
=
argmixn
IIY -
xll 2
(2)
denotes the Euclidean projection of a point x C R n onto the set X, and the step length at is chosen according to some rule which guarantees convergence (see, e.g., Shor, 1985). When X = R n, step lengths satisfying the divergent series conditions oo
at > 0 , Vt,
lim
Olt = O
t --* O0
and
~'~ at = e~
(3)
t...O
yield an infinite sequence {x t} of iteration points such that { f ( x t) } ~ f* (Ermol'ev, 1966). When the optimal value of [P] is known a priori, one may use the step length formula (Polyak, 1969) f(xt) = o,.
--
Ib'(x')112
f*
0 < el < Ot < 2
-
e2 <
2,
Vt,
(4)
'
where 0t is a step length (or relaxation) parameter. Example 1.1 (Slow convergence of Polyak's method). The following (trivial) instance of [P] illustrates the possibly slow convergence of Polyak's method ( l ) , (4). Let X = {x C R 2 I xl + x2 = l; xt,x2 >_ 0} and f ( x ) = xl + 2x2 - l; the optimal solution is x* = (1,0) x, with f* = 0. Applying the method ( I ) , (4) with the starting point x ° = (0, 1 )X and step length parameter 0t - 1 gives step lengths at = f ( x t ) / 5 , for all t. The iterates then are x t = (1 - (9/10) t, ( 9 / 1 0 ) t ) x, with objective values f ( x t) = (9/10) t, for all t (see Fig. 1). Clearly, for this instance, Polyak's method converges very slowly towards the optimal solution. The zig-zagging phenomenon in this example is due to the fact that the subgradients are almost antiparallei to the normals of the feasible set. This behaviour was very pronounced in an application studied in Damberg (1990), and it motivated us to investigate the possibility of using projected subgradient directions in subgradient optimization schemes. It subsequently lead to the development of conditional subgradient optimization methods, of which subgradient projection methods are special cases. Because of the high computational cost of performing
384
T. Larsson et al./ EuropeanJournal of OperationalResearch 88 (1996) 382--403 X2
x2 :ll3 x4
-o~o ~' (x °) ~7 .c 8
x9 xll
~- X 1
1 Fig. 1. Iteration point sequencefor the exampleusing Pnlyak's method. projections onto general convex sets, both Polyak's method ( 1 ), (4) and the subgradient projection method are most conveniently used when the feasible set is simple (for example a simplex). To the best of our knowledge, subgradient projection approaches have only been studied by Svanberg and BMthe (1985), and Kim and Um (1989). A more frequently utilized strategy for the minimization of nondifferentiable functions subject to linear constraints is to construct descent directions and perform line searches; see, e.g., Bertsekas and Mitter (1973), Polak et al. (1983), Strodiot et al. (1983), Bihain et al. (1987), Panier (1987), Sreedharan (1987), and Tamarit Goerlich (1988). For an application with general linear constraints it is likely that a descent method is preferable to a subgradient optimization method, since in that case both methods involve the solution of general quadratic programs, but the former requires fewer iterations. The outline of the paper is as follows. The general conditional subgradient optimization method is derived in Section 2, together with convergence results for step length rules of the forms (3) and (4). In Section 3, we present the subgradient projection method. Implementation and evaluation of the subgradient projection method for applications to uncapacitated facility location, two-person zero-sum matrix games, and minimum cost multicommodity network flows are reported in Section 4. In Section 5, we make some concluding remarks and suggest further research.
2. Conditional subgradient optimization 2.1. Preliminaries Dem'janov and Somesova (1978, 1980) generalize the definition of the subdifferential so that it takes the feasible set into account. This generalization is the conditional subdifferential aXf(x)={yE~n
If(Y) >--f(x)+yT(y--x),
VyEX},
xEX,
(5)
T. Larsson et al./ European Journal of Operational Research 88 (1996) 382-403
385
the elements of which will be referred to as conditional subgradients. Clearly, OXf(x) o_ Of(x) for all x E X. The following proposition is immediate. Proposition 2.1 (Properties of the conditional subdifferential). The conditional subdifferential Ox f ( x ) is nonempty, closed and convex for all x E X. The normal cone of X at x is defined by {v E R" I vT(y -- x) < 0, 0,
Nx(x) =
Vy E X},
xEX, x ~ X.
Dem'janov and Somesova show the following result. Proposition 2.2 (Characterization of the conditional subdifferential). oXf(x) = af(x) + Nx(x),
x E X.
Note that o X f ( x ) is unbounded whenever x is on the boundary of X. In order to establish the relationships between traditional subgradient optimization and the conditional subgradient optimization method to be derived, we reformulate [P] as the unconstrained auxiliary program [AP] f* = min f X ( x ) a'6R"
dej f ( x )
+ ~Px(X),
where
~Ox(x)
J" 0, +oo,
I
x E X, x ~ X
is the indicator function of the set X at x. The auxiliary function f x is lower semi-continuous, proper and convex, and its subdifferentiai is characterized in the proposition below, which follows from Theorem 5C in Rockafellar ( 1981 ) since X 4= 0 and dom f = R" imply that int(dom f ) fq dom~Px :¢ 0. Here, int denotes interior and dom denotes effective domain. Proposition 2.3 (Characterization of the subdifferential of the auxiliary function). Of X(x) = O f ( x ) + O ~ x ( x ) ,
x E X.
Since a~bx(x) = N x ( x ) for all x E X (Rockafellar, 1981, Proposition 3J), Propositions 2.2 and 2.3 imply the following result. Proposition 2.4 (Equality of subdifferentials). oXf(x) = ofX(x),
x E X.
Note that Of x denotes the ordinary subdifferential of the auxiliary function f x , whereas a x f denotes the conditional subdifferential of f with respect to X. As a conditional subgradient of f is a subgradient of f x ,
386
T. Larsson et aL / European Journal of Operational Research 88 (1996) 382~I03
the conditional subgradient method to be derived below may be viewed as traditional subgradient optimization applied to the auxiliary unconstrained program [AP], with an additional projection of the point obtained onto d o m f x. However, if traditional subgradient optimization is applied to the program [AP], then the existing convergence results can not be directly invoked since they require the objective function to be finite on the feasible set. The optimality condition of the program [P] is given in the next proposition, which follows from Proposition 5A and Equation (5.5) in Rockafellar ( 1981 ).
Proposition 2.5 (An optimality condition). x E X*
¢==~
0 E aXf(x)
¢==~
- - O f ( x ) fq N x ( x )
~ O.
Note that if X is a polyhedral set, i.e., if X = {x E R n I aTx <-- bi, i = 1. . . . . m}, then m
Nx(x)
=
( v E ]Rn I v = ~-~ aicoi; ~oi(aTx -- bi) - 0 ; i-I
~oi>0, i f 1. . . . . m},
x E X, (6)
0,
xCx,
and by defining an index set for the active constraints at x, Z ( x ) ffi {i E {1 . . . . . m }
l aTx =bi},
(7)
the necessary and sufficient conditions for the optimality of x in [P] can be expressed as 3~, E c l f ( x )
and
coi> 0, i E 2"(x),
such that
~, = -- ~
ai¢oi
iEZ(x)
(cf. the Karush-Kuhn-Tucker conditions for a differentiable program). The negative of the shortest conditional subgradient of f at x with respect to X equals the conditional steepest descent direction (Dem'yanov and Vasil'ev, 1985, Section 1.11.3); it is found by solving min{llYl[ 2 [ T E a X f ( x )
= af(x) + Nx(x)}.
(8)
The resulting direction is feasible if X is polyhedral. From Proposition 2.5, ~, = 0 is optimal in (8) if and only if x is optimal in [P]. The program (8) is the basis for the development of descent methods for [P] (e.g., Dem'yanov and Vasil'ev, 1985, Section 4.4), and also for our subgradient projection scheme (see Section 3). Its specialization to differentiable nonlinear programming has been utilized in the convergence analysis of descent methods (e.g., Burke and Mor6, 1988). In addition, the step direction in Rosen's (1960) gradient projection method (e.g., Luenberger, 1984, Exercise 11.10:8) can be interpreted as a heuristic solution to a dual formulation of (8). 2.2. M e t h o d analysis
Let ~,X ( x t) be a conditional subgradient of f at x t E X. T h e conditional subgradient optimization m e t h o d is given by x t+l
m
Px (x' - at" ~ / X ( x t ) ) ,
t = 0, 1 . . . . .
In the (unlikely) case that ~,X(xt) = 0 for some t, x t is optimal, and the procedure (9) is terminated.
(9)
T. Larsson et al./ European Journal of Operational Research 88 (1996) 382-403
387
Our main convergence result for the method (9), (3) establishes convergence towards the optimal set X*; the distance from a point x to this set is defined as dx. (x) ffi minyex. IIY - xll. The following theorem generalizes a classical convergence result of Ermol'ev (1966, Section 9). Theorem 2.6 (Convergence to the optimal set using divergent series step lengths). Let {x t} be generated by the method (9), (3) applied to [P]. If X* is bounded and sup,{ll~,X(x')ll} < co, then { f ( x t ) } ~ f * and {dx* (xt)} --* 0.
Proof. The proof is a non-trivial extension of that of Ermol'ev's theorem. We show that the iterates will eventually belong to an arbitrarily small neighbourhood of the set of optimal solutions to [P]. Let 8 > 0 and B 8 ffi {x E R n I Ilxll -< 8}. Since f is convex, X is nonempty, closed and convex, and X* is bounded, it follows (from Theorem 27.2 of Rockafeilar, 1970, applied to the lower semi-continuous, proper and convex function f + ~bx) that there exists an e ffi e ( 8 ) > 0 such that the level set {x E X I f ( x ) < f * + e} C_ X* + B8/2; this level set is denoted by X ~. Moreover, since for all t, II~,X(x')ll ___sup~{ll~,X(xS)ll} < ~ , and {trt} ---, 0, there exists an N ( 8 ) such that ,~,ll~,X(x')[I 2 <_ ~ and ,~,ll~,X(x')ll _ ,V2 for all t > N ( 8 ) . The sequel of the proof is based on induction and is organized as follows. In the first part, we show that there exists a finite t ( 8 ) >_ N ( 8 ) such that x itS) E X* + B ~. In the second part, we establish that if x t belongs to X* + B ~ for some t > N ( 8 ) , then so does x t+j, by showing that d x * ( x t+l ) < d x * ( x t ) , or x t E X e so that x t+~ E X* + B 8 since the step taken is not longer than ½8. Let x* E X* be arbitrary. In every iteration t we then have llx* - x'÷'ll 2 ~ llx* - p ~ ( x ' - ~,~X(x'))II =
Ilx* - x' + ~,~X(x,)112 ffi IIX* -- xtll 2 d- O t t ( 2 T X ( x t ) T ( x * -- X') -']-
,~,ll~,X(x ') 112),
(10)
where the inequality follows from the projection property. Now, suppose that 2rX(xS)t(x
* _ x ~) + ~llrX(x~)llZ
< -~
(11)
for all s ~_ N ( 8 ) . Then, using (lO) repeatedly, we obtain that for any t _~ N ( 8 ) , t
IIx*-x'+lll2 < IIx*--xN¢8~ll2--~
~
~.
s-N(8)
and from (3) it follows that the right-hand-side.of this inequality tends to minus infinity as t ~ cx~, which clearly is impossible. Therefore, 2TX ( x ' ) T ( x * -- x ' ) + ~,llTX(x')ll 2 _> - ~
(12)
for at least one t >_ N ( 8 ) , say t ffi t ( 8 ) . From the definition of N ( 8 ) , it follows that TX(xt~8))T(x * - - x '~8)) >_ --e. From (5) we have that f ( x * ) - f ( x ,~8~) > T X ( x ' t 8 ) ) T ( x * --xtt~)), since x * , x '~8~ E X. Hence, f ( x ttS>)< f* + e, i.e., x t(8) E X ~ C_ X* + B 8/2 C X* + B ~. Now, suppose that x' E X* + B 8 for some t > N ( 8 ) . If (11) holds, then, using (10), we have that IIx* - x '+~ II < IIx* - x'll for any x* E X*. Hence, d x . ( x '+~) <_ I l e x . ( x ' )
-x'÷'ll
< IIPx.(x') -x'll •dx.(x')
<_ ,~.
Thus, x '+l E X* + B 8. Otherwise, (12) must hold and, using the same arguments as above, we obtain that f ( x t) "( f * q- e, i.e., x t E X ~ C_ X* + B 8/2. As
IIx '÷~ - x'll - IIPx(x' - ~ , r X ( x ' ) ) - x'll _< IIx' - , ~ , r X ( x ') - x'll - ~,ll~,X(x') II _< ½8
388
T. Larsson et al. / European Journal o f Operational Research 88 (1996) 382-403
whenever t _> N(t~), it follows that x t+l C X* + B 8/2 + B 8/2 = X* + B ~. By induction with respect to t _> t(t~), it follows that x t E X* + B ~ for all t > t(t~). Since this holds for arbitrarily small values of 6 > 0 and f is continuous, the theorem follows. [] From the proof of the theorem it can be seen that the condition supt{ilTX(x t) II} < ~ can be relaxed into {,~, II~,x (x,) It2 } ~ 0. However, we have chosen to impose the, generally, stronger condition sup, { II~,x (x') II} < cxz because it is both necessary and sufficient for the convergence of the important special case of conditional subgradient optimization studied in the next section. By imposing an additional condition on the choice of step lengths, it is possible to establish convergence of the whole sequence of iterates to an optimal solution, even without the boundedness condition on X* (see, e.g., Shepilov, 1976, for a corresponding result for traditional subgradient optimization). Theorem 2.7 (Convergence to an optimal solution using divergent series step lengths). Let {x t} be generated by the method (9), (3) applied to [P] with the step lengths also fulfilling ~ c~2 < oo. lf sup,{ll~X(x ,) II} < c~, then {x t} converges to an element of X*. Proof. Let x* E X* and t > 1. Repeated application of (10) yields that t--I IIx* - - x t l [ 2 "( I I X * - - x O I I 2 + 2 ~ - ~ O t s ' y X ( x s ) T ( x s=0
t--I . X(xS. * - - X s) + 2 .X-"~ . . ~ ~ 2sHY
)l] 2.
(13)
s=0
Since x* E X* and yX(xs) C o X f ( x S ) for all s > 0 we obtain that
f ( x s ) >- f* >- f ( x s ) + y X ( x S ) T ( x* --xS),
(14)
and hence that yX(x~)T(x* -- x s) __ < 0 for all s __ > 0. Define c = supt{llg, X(xt)l[ } and p = ~ t =ooo %2, so that
lit x (x~) II -< c for any s > 0 and y~tsSola s2 < p. From (13) we then conclude that I[x* - x' II2 < IIx* - x ° 112+pc 2 for any t > 1, and thus that the sequence {x t} is bounded. Assume now that there is no subsequence {x t' } of {x t} with {~,X(xt')T(x* - x t,) } ~ 0. Then there must exist an e > 0 with yX(xs)T(x* --X s) < --e for all sufficiently large values of s. From (13) and the conditions on the step lengths it follows that {llx* - x t l l } ~ - o o , which clearly is impossible. The sequence {x t} must therefore contain a subsequence {x"} such that {yX(xt')T(x* - x t,) } ~ 0. From (14) it follows that { f ( x t') } ~ f*. The boundedness of {x t} implies the existence of an accumulation point of the subsequence {xt'}, say x ~ . From the continuity of f it follows that x °° E X*. To show that x °° is the only accumulation point of the sequence {xt}, let S > 0 and find an M ( 6 ) such that IIx°~ - x M ~ ) IIm _< ~ / 2 and ~ Moo <~ '~s2 <_ 8/(2cZ). Consider any t > M(t~) Analogously to the derivation of (13), and using (14), we then obtain that t--I
iix~ -
xtll2 <_ llxOO-
xM(~)II2+
~
, ~2l l r X i x )sl l 2 < ~6 + ~ ~ c2 = ~ .
s=M(8)
Since this holds for arbitrarily small values of 6 > 0, the theorem follows.
[]
The results of Theorems 2.6 and 2.7 can, through the almost complete relaxation strategy of Dem'yanov and Vasil'ev (1985, Section 3.4), be used to establish convergence for other step length selection rules; the following corollary is immediate. Corollary 2.8 (Convergence under the almost complete relaxation condition). Let {fl,} be an arbitrary sequence of step lengths replacing {at} in the method (9). If there exist sequences {~-t} and {~t} that both
T.
Larsson et al./ European Journal of Operational Research 88 (1996) 382-403
satisfy (3) and a_q_ t < fit <_ -fit for all t, then the invocation of Theorem 2.6 is valid. If, in addition,
389
Et~ 012 and
~ t ~ -f2 are finite, then the invocation o f Theorem 2.7 is valid.
Since the elements of the sequences {at} and {-ft} can be made arbitrarily small and arbitrarily large, respectively, while satisfying (3) (e.g., ~ = c / t and -ft = -~/t, where c and ? are very small and very large constants, respectively), the almost complete relaxation strategy may be used as a safe-guard in step length selections based on line searches or formulas involving estimates of the optimal value, among others, through an (almost never used) upwards or downwards rounding of the tentative step length. This safe-guarding strategy could be introduced to enforce convergence also in descent methods with line searches for nondifferentiable optimization, which may otherwise yield false convergence (e.g,, Fletcher, 1987, Section 14.4). We now consider step lengths f(x~)--f* ~, = 0,.
0
Vt.
(15)
Ib'X(x')ll 2 ,
The next convergence theorem generalizes Polyak's (1969, Theorem 1 ) convergence result for the method (1), (4), Theorem 2.9 (Convergence to an optimal solution using Polyak step lengths). Let {x t} be generated by the method (9), (15) applied to [P]. lf sup,{Ib, X(x')ll} < ~ , then { f ( x t ) } ~ f * , while {x t} converges to an element of X*. Proof. Identical to the proof of Polyak's (1969) Theorem 1, with y replaced by yx.
[]
It is well known (e.g., Ermoi'ev, 1966, and Poljak, 1967) that the method (1), (3) cannot exhibit a linear convergence rate; this is also true for the more general method (9), (3). For the method (9), (15) linear convergence is achieved if f satisfies a weak sharpness condition, i.e., if there is a bt > 0 such that f ( x ) -- f * >_ t z d x . ( x ) ,
x E {y E
x llly-Px.(x°)l[ ~
dx.(X°)}.
(16)
Polyak (1969, Theorem 3) establishes this result for the method (1), (4). See Polyak (1969, 1979, 1987) and Burke and Ferris (1993) for characterizations of the condition (16). Theorem 2.10 (Linear convergence under a weak sharpness condition). Under the assumptions o f Theorem 2.9 and the condition (16), the method (9), (15) applied to [P] exhibits linear convergence towards an element x* E X*, i.e., I[xt - x* [[ < Kq t, where K >_ 0 and q ~ ( 1 - el e2 (l~/C) 2) I/2 < 1 with c = suPt { IIyx ( x t) II}
Proof. Identical to the proof of Polyak's (1969) Theorem 3, with y replaced by yx.
[]
Analogously to Polyak (1969) it is also possible to show that in the method (9), (15) with f* replaced by a lower bound f < f* and with Ot - 1 there exists, for any e > O, a finite t(e) such that f ( x t(e)) ~ 2f* - - f + e . A fundamental property of subgradients is that they define halfspaces containing X*. In the case of conditional subgradients, these halfspaces may be given an interesting interpretation. For some x E X, let y E a f ( x ) and ~, E N x ( x ) . Then X* C_ {y E N n J 7 T ( y - X) _< 0} since f ( x * ) < f ( x ) for all x* E X*, and, further, X* C_ {y E R n J uT(y _ x) _< 0} since X* C X. By adding these two set-defining inequalities, the surrogate inequality ( y + ~,)T(y _ x) < 0 is obtained and, clearly, X* C {y E R" J (3' + u)T(y _ X) < 0}. But this is also a consequence of the fact that y + u E a X f ( x ) = a f X ( x ) .
390
T. Larsson et al./ European Journal of Operational Research 88 (1996) 382--403
3. The subgradient projection method 3.1. Preliminaries In many applications of subgradient optimization, for instance in Lagrangean relaxation schemes, only one subgradient of f at x is immediately available, while the feasible set is explicitly described by a set of (simple) constraints. The restriction of the projection problem (8) to the corresponding partial subdifferential of f x at x is the subgradient projection problem
[SP] 7X(x) ~ argmin{[[7[[ 2 [ y E 7(x) + N x ( x ) } . Henceforth, ~,X(x) will denote a projected subgradient, and not an arbitrary conditional subgradient. The solution x is optimal in [P] if 7X(x) ~ 0; the reverse is, however, not true in general. The following characteristic of the projected subgradient is immediate from the projection property. Lemma 3.1 (Projection property). Let 7(x) C ? f ( x ) , x C X. Then, [[TX(x)[] < 117(x)11. In order to give an alternative characterization of the projected subgradient we consider a dual formulation of [SP]. Here, Tx(x) denotes the tangent cone of X at x (see Rockafellar, 1981, Chapter 2).
Lemma 3.2 (Dual characterization of the projected subgradient). The projected subgradient is alternatively obtained from [DSP]
7X(x) - arg min{lly - 7(x)ll 2 17 c - T x ( x ) } .
Proof. The convexity of X yields that Tx(x) is polar to N x ( x ) (Rockafellar, 1981, Chapter 2). Let C - N x ( x ) and C ° be its polar; then the Moreau decomposition (Zarantonello, 1971, Lemma 2.2) gives r ( x ) - e c ( y ( x ) ) + Pc°
(r(x)).
But Pc ( 7( x ) ) ~ )fix) - ~,X ( x ) , which yields the desired result.
[]
Hence, the projected subgradient ~,X(x) equals the projection of 7(x) onto - T x ( x ) ; this is illustrated in Fig. 2. (Further, the program [DSP] is the Wolfe dual of [SP].) When the set X satisfies the Guignard constraint qualification (Guignard, 1969), which in particular holds for polyhedral sets, the cones Nx and Tx are polyhedral (Gould and Tolle, 1971), and the problems [SP] and [DSP] are (dual) quadratic programs. If X ~ {x E R n I aTx <-- bi, i ~ 1. . . . . m}, then (6) and (7) can be used to explicitly state [SP] as the linearly constrained problem [SPL] minIllTl[217=7(x)+
E i6Z(x)
and [DSP] as
aiwi;
wi>O,
iEZ(x) I,
T. Larsson et al./ European Journal of Operational Research 88 (1996) 382--403
391
- ('Y(~) - ~x (x) )~'/'/
Fig. 2. Illustration of the Moreau decomposition.
[DSPL]
min{ll~,-~,(x)llZlaT~,>_O, iCZ(x)}. Y
3.2. Method analysis For an x t c X, letting y( x t) E a f ( x t) and yX ( xt) be given by [DSP], the subgradient projection method is given by x t+j = P x ( x t - o ~ t ' y X ( x t ) ) ,
t = O , 1. . . . .
(17)
Observe that if X is polyhedral then the direction _yX(x t) is feasible, and the projection (2) onto X is therefore not needed when the step length is small enough. If sup{il~'(x')ll} < ~ ,
(18)
t
then, from Lemma 3.1, supt { II'yX(xt) II} < ~ . In particular, the condition (18) holds if X is bounded or if f is piecewise linear with a finite number of segments; similarly to Shor ( 1985, p. 27) one may also show that for any x ° E X there exists an a(x °) > 0 such that if supt{at } < a(x °) then (18) holds. Corollary 3.3 below follows from Theorems 2.6 and 2.7 and Lemmata 3.1 and 3.2. Corollary 3.3 (Convergence using divergent series step lengths). L e t { x t } be generated by the method (17), (3) applied to [P] and assume that sup,{ll~'(x')ll} < ~ . If x* is bounded, then { f ( x t ) } ~ f* and c~ 2 { d x . ( x t ) } ~ O. If ~-~t~oO~ t < cx~, then {x t } converges to an element of X*. From the continuity on R" of the convex function f , it follows that the sequence {x t } cannot be bounded if the sequence {y(x t) } is unbounded. Hence, the sequence {x t } converges to X* if and only if the condition (18) holds. Corollary 3.4 (Convergence using Polyak step lengths). Let {x t } be generated by the method (17), (15) (with yX(xt) being a projected subgradient) applied to [P]. Then { f ( x t ) } ~ f*, while {x'} converges to an element of X*.
392
T. Larsson et al. / European Journal of Operational Research 88 (I 996) 382-403
Proof. The sequence {x t} C_ {x C x lllx -x*ll ~ IIx° -x*ll}, where x* C X* (cf. Polyak, 1969, Theorem 1). Hence, {x t} is bounded and, from the continuity on R n of the convex function f , the condition (18) holds. Apply Theorem 2.9 and Lemmata 3.1 and 3.2. [] If the weak sharpness condition (16) is satisfied, then it follows from Theorem 2.10 and Lemmata 3.1 and 3.2 that the method (17), (15) converges linearly. Corollary 3.5 (Linear convergence under a weak sharpness condition). Under the assumptions of Corollary 3.4 and the condition (16), the method (17), (15) (with yX ( xt) being a projected subgradient) applied to [P] exhibits linear convergence towards an element x* E X*, i.e., I[xt - x*l[ < Kq t, where K > 0 and q = (1 t~le2(~/C) 2) I/2 < 1 with c = sup,{ll~,X(x ') [[}. -
Example 3.6 (Enhanced convergence using projected subgradients). To show the effect of subgradient projection, we apply the method (17), (15) to the instance of Example 1.1, with x ° = (0, 1) x and 0t - 1. Then y ( x °) = (1,2) T, ?IX(x°) = ( - 1 / 2 , 1/2) T, and a0 = 2, which yields x J = P x ( ( 1 , 0 ) T) = (1,0) x = x*, i.e., the optimal solution is reached in one iteration. Kim and Um (1989) establish convergence of a subgradient projection method being the special case of the method (17), (15) in which the set X is required to be polyhedral and the value of the step length parameter is chosen as OrE / [el'Or]' [ el, 2 - e2 ],
if el _ < O t _ < 2 - e 2 , otherwise,
(19)
where the value aT~-(-x-~ • ~, [[7X(x,)[I2
i q~ I ( x ' )
such that aTTX(x t) < 0
corresponds to the maximal feasible step length (15) from x t in the direction - y X ( x t ) . (Since, from (19), 0t C [ e l , 2 - e2], convergence of their method follows from Corollary 3.4.) The reason for using (19) is to avoid performing the projection (2), and this is accomplished by choosing ej small. It is, however, our experience (see Sections 4.2 and 4.3) that step lengths according to (19) may significantly deteriorate the performance of the method (17), (15). A theoretical explanation for this is that the rate of convergence given in Corollary 3.5 decreases as the value of el decreases. The subgradient projection suggested by Svanberg and B~fftthe (1985) is the solution to [DSPL] for the case when the feasible set is the nonnegative orthant R~_. They use the projected direction in the subgradient scheme of Camerini et al. (1975), where the step direction is a convex combination of the negative of a subgradient at the current iteration point and the previous step direction. Svanberg and BMthe consider a Lagrangean relaxation application and observe that it is favourable to project the direction before making the convex combination.
4. Computational tests In order to compare the performance of the proposed subgradient projection method (17), (15) to that of Polyak's (1969) and Kim and Um's (1989) methods, we have experimented with a Lagrangean relaxation scheme for the uncapacitated facility location problem, a nondifferentiable optimization approach to the twoperson zero-sum matrix game, and a combined right-hand-side allocation and Lagrangean relaxation scheme for minimum cost muiticommodity network flows. The first two applications were chosen for the purpose of
T. Larsson et al./ European Journal of Operational Research 88 (1996) 382-403
393
illustration only, while in the third application the subgradient projection also leads to an improved performance of a well known, efficient solution approach. In all three applications, the projection problems [DSPL] and (2) have simple constraints and may thus be solved analytically or at a low computational cost. Therefore, the computational efforts per iteration of the three methods are similar, and the comparison is made with respect to the number of iterations. Further, in all three applications the convergence condition (18) is fulfilled. In the first and the third application, the optimal values are not known a priori; when referring to the formulas (4) or (15), the optimal value f* is for a minimization (maximization) problem then understood to be replaced by the best lower (upper) bound f ( f ) found so far (e.g., Fisher, 1981).
4.1. The uncapacitated facility location problem Assume that a number of facility locations is to be chosen among m alternatives and that each of n customers is to be assigned to exactly one of the located facilities. Let f~ be the cost for locating facility i, cij the cost for assigning customer j to facility i, and k and p be the lower and upper limits, respectively, on the number of located facilities. The problem of minimizing the sum of assignment and location costs subject to these restrictions is [LOCI min s.t.
~-~ ~ CijXij + ~ fiYi, i=1 j=l i=l nl Z Xij = 1,
j = 1. . . . . n,
i=1
Xij
--
i = 1 . . . . . m, j = l . . . . . n,
Yi <-- O,
rtl
k<_Zyi<_p,
i~l xij,Yi C {0, 1},
i = 1. . . . . m, j = 1 . . . . . n.
We consider solving [LOCI heuristically by a standard Lagrangean relaxation scheme (e.g., Fisher, 1981 ). The variable upper bound constraints, xij < Yi, are dualized with Lagrange multipliers uij, for all i, j. The Lagrangean dual problem is given by maxu>_Oh ( u ) , where h " ~ " ~ R is concave and piecewise linear with a finite number of segments (and the condition (18) will therefore be fulfilled). The method ( l ) applied to the dual problem consists of the following steps. Let u ° E R~ n. Given an iterate u' > 0, evaluate h(ut), which gives a lower bound on the optimal value of [LOCI. A subgradient of h ( d ) is given by Wij(U t) = Xij(U t) - - y i ( u t ) , for all i,j, where ( x ( u t ) , y ( u t ) ) is found when evaluating h ( u t ) . An upper bound on the optimal value is calculated by solving the restriction of [LOCI obtained from fixing y = y ( u t ) . A step computed according to the formula (4) is taken in the direction of the subgradient w(ut), and the resulting point is projected onto R~ n, i.e., u~+j = max{0, u~j + ce, wij(u')} for all i,j. The procedure is then repeated from the new iterate u t+~ . In the application of the subgradient projection method, the subgradient w(u t) is, in the above procedure, replaced by its projection onto the tangent cone of R~_~ at u t, i.e.,
w~i(u') = { max{O, w o ( u ' ) }, ' Wij(llt),
if u~j = 0, if U~i > 0,
Vi, j.
The step length is computed according to the formula (15). For the computational experiments we generated ten instances of [LOCI, each with 50 possible facility locations and 100 customers. The coordinates of the facility and customer sites are uniformly distributed
T. Larsson et al./ European Journal of Operational Research 88 (1996) 382-403
394
1.06 ", ~
....
1.04 'i~.
\ ............
1.02
1
.... "7.'"
0.98
..* .e ,." /" j,"/,'"
0.96
// 0
lo
Polyak's method: . . . . . . Kim and Um's method: . . . . . Subgradient projection: . . . .
" ."
, f 0.94
,.....'"" ....
..
2;
3'0
4'0
5'0
6'0
7'0
s.0
9'0
Fig. 3. Uncapacitated facility location with the lower location costs.
between 0 and 100 and the assignment costs are the Euclidean distances between sites. The number of located facilities is between k = 10 and p = 15. We constructed two test series by setting the location costs to 25 and 50, respectively, for each instance. In a preparatory experiment, we calibrated the step length parameter Ot for each method and test series. Then, each instance was solved by each of the three methods and in every iteration the lower and upper bounds, normalized with the optimal values (which were computed in advance), were recorded. Finally, for each test series and every iteration, we computed the average values of these normalized bounds over the ten instances. Fig. 3 shows the normalized bounds versus iterations for the test series with location costs f i = 25, for all i. Here, for all t, 8t equals 1.9 in Polyak's method, 1.0 in the subgradient projection method, and 1.1 in the method by Kim and Um. (As did Kim and Um, 1989, we used el = 1 0 - 6 . ) The relative difference between the average values of the normalized bounds becomes less than 1.0% at iteration 50 of the subgradient projection method and at iteration 53 of Kim and Um's method. After 100 iterations of Polyak's method, the relative difference 1.3% was obtained. For this test series, the average number of located facilities was 14.6. For the test series with location costs f i = 50, for all i, we obtained the result shown in Fig. 4. Here, for all t, 0t equals 1.9 in Polyak's method, 1.1 in the subgradient projection method, and 1.5 in the method by Kim and Um. The relative difference between the average values of the normalized bounds reaches below 1.0% at iteration 73 of the subgradient projection method and at iteration 84 of Kim and Um's method. For Polyak's method, the relative difference did not reach below 5.4% during the first 100 iterations. For these location costs, on the average 10.8 facilities were located. Figs. 3 and 4 show that the subgradient projection strategy in this application significantly enhances the convergence rate of subgradient optimization. The poor performance of Polyak's method is due to the large number of active nonnegativity constraints at an optimal dual solution, which results in a zig-zagging behaviour similar to that illustrated in Fig. 1. 4.2. The two-person zero-sum m a t r i x g a m e
Consider a two-person zero-sum matrix game in which the two players have n and m choices, respectively, which they choose with certain probabilities. If player one chooses alternative j and player two chooses
T. Larsson et al. / European Journal o f Operational Research 88 (I 996) 382-403
395
1.06
._,
1.04
...........
1.02
.......... ,....'-.- _.-_,_.. . . . . . . . . .
1
0.98
........... '" -.._,'"i~2-7:- Polyak's method: . . . . . . Kim and U m ' s method: . . . . . Subgr~:lient projection: . . . .
-'"" ."
0.96
:"
/
/
.....
r" ;
tO
• ....
f t
0"940
•
.......
'
•
20
3'0
.... so'
4'0
"
6'0
"
7'0
8'0
9'0
Fig. 4. Uncapacitated facility location with the higher location costs.
alternative i, then player one looses a U monetary units to player two; the m × n-matrix A = ( a u ) is referred to as the loss matrix of player one. Assume that player one chooses alternative j with the probability x j, j = 1 . . . . . n, and that he/she wishes to find a strategy x = (x i) which minimizes the expected loss, irrespective of the strategy of player two; similarly, player two tries to minimize his/her expected loss irrespective of player one's strategy. The problems of the two players define a primal-dual pair of linear programs. Further, if A is skew symmetric, i.e., if AT = - A , then these dual programs are equivalent, with optimal values zero; such games are called fair. We consider formulating and solving the game problem as a nondifferentiable optimization problem. Introducing the unit simplex S = {x E ll~" [ ~-~jfj xj = 1; x _> 0}, the problem of player one is (e.g., Tamarit Goerlich, 1988) n
[ PLAY ] f* = m i n f ( x ) , xES
where f ( x ) = max,-s{~ ....... ) aTx, with a T denoting the i:th row of A. The function f " R" H R is convex and piecewise linear with at most m segments (and the condition (18) will therefore be fulfilled). The subdifferential of f at x is given by O f ( x ) = conv{ai, i E {1 . . . . . m} I ai[x = f ( x ) } , where cony denotes the convex hull, and can easily be obtained when evaluating f at x. We consider fair games, so that f* = O. Solution methods for two-person zero-sum matrix games based on nondifferentiable optimization formulations have been proposed earlier. Korpelevi~, (1979) considers non-fair games and applies Polyak's method to a problem (with optimal value zero) in the product space of the two players. Tamarit Goerlich (1988) discusses a descent method applied to [PLAY]. Polyak's method (1), (4) applied to [PLAY] results in the following steps. Given an iterate x ~ c S, find an it such that aTxti, = f ( x t ) , and let x t+j/2 = x t o,f(xt)ai,/]lai, ll 2. The next iterate is x t~l = PT(xt+l/2). This projection can be performed through a Lagrangean dualization of the convexity constraint (e.g., Held et al., 1974), with multiplier u, say. The evaluation of the corresponding Lagrangean dual objective function h ( u ) yields y ( u ) , defined by y j ( u ) = max{O, xtj+1/2 - u / 2 } , for all j. The optimal multiplier value is found by solving the one-dimensional equation h ' ( u ) ffi 0, where the derivative h ' ( u ) = ~-~-i"=1y j ( u ) - 1 is a nonincreasing -
-
T. Larsson et al. / European Journal of Operational Research 88 (1996) 382 ~103
396
and piecewise linear function, by a simple line search based on the computation of the break-points. (This line search can be done in linear time by using a median search with respect to the break-points; see, e.g., Brucker, 1984.) The solution u* yields the projection x t+j = y ( u * ) . The projection v S ( x t) of the subgradient ai, is found by solving [DSPL], which in this application is given by
vS(xt) = a r g m i n { l l y - a i ,
ll2
~-~yj =O;
yj ~ O,
j c j(xt)},
j=l
where ,.7(x t) = {j [ x~ = 0}. This projection problem is solved analogously to the one above by dualizing the equality constraint, with multiplier u, say. The optimal multiplier u* is obtained by solving h~(u) = O, where n h~(u) = ~~j=j y j ( u ) and y • 1~ H R" is given by yj(u)
= ~ min{O, ai, j - u/2},
L ai,j -- u/2,
j C ,7(x'),
j ~ J(x').
The subgradient projection is given by vS(x t) = y ( u * ) . The step length in the direction - v S ( x t) is in the subgradient projection method calculated according to (15). We now turn to the generation of the test problems, and first observe that if the upper triangular part of A contains only positive elements, then the optimal solution is given by a pure strategy (i.e., an extreme point of S). If, on the other hand, the elements of the upper triangular part of A are uniformly distributed on an interval which is symmetric around zero, then an optimal solution is (as indicated by experiments) a mixed strategy with around 50% of the variables being positive. We created instances of [PLAY] with varying problem characteristics by taking convex combinations of randomly generated loss matrices of these two extreme types. We chose m = n = 100, and generated 20 instances of each problem characteristic; the portion (in percentage) of positive variables in an optimal solution, taken as the average of the 20 instances, is denoted by p. Whenever the optimal value of the program [P] is known a priori, the methods (1), (4) and (17), (15) are (according to Theorem 3 of Polyak, 1969, and our Corollary 3.5, respectively) linearly convergent, with maximal rates of convergence for el = e2 = 1; in this application we therefore chose Ot =- 1. As did Kim and Um (1989), we used ej = 1 0 - 6 in their method. As an example of the convergence behaviour of the three methods, Fig. 5 shows the average objective values over 20 instances (with p = 5) versus iterations; the methods were initiated at xjo = 1 / n , j = l , .. . , n. Here, the subgradient projection method performs much better than the other two; Kim and Um's method in particular exhibits very slow convergence. The subgradient projection method needed on the average 87 iterations to reach below 5% of the initial objective value. For Polyak's method and the method by Kim and Um this number was 834 and 4476, respectively. In order to investigate how the three methods perform for varying problem characteristics, we chose ten values of p in the interval [4, 52] and recorded after 250 iterations, for each of these choices and each method, the average objective values over 20 instances; the methods were again initiated at x)o = l / n , j = 1. . . . n. The average objective values obtained by Polyak's and Kim and Um's methods were normalized with respect to those obtained by the subgradient projection method. Fig. 6 shows these normalized average values versus the portion p. Clearly, the subgradient projection method becomes increasingly better than the other two methods as the optimal solutions tend towards pure strategies. For small values of p, almost all of the nonnegativity constraints will be active in an optimal solution, and, as in the facility location application, Polyak's method then suffers from zig-zagging. In Kim and Um's method, the steps taken will become very short when an optimal solution is approached from a starting point located in the relative interior of the unit simplex. For large values of p, relatively few nonnegativity constraints are active in an optimal solution. Further, a typical row of a loss
T. Larsson et al. / European Journal of Operational Research 88 (1996) 382-403
397
0.6
to4 0.3
Polyak's method: . . . . . . Kim and Um's method: Subgradient projection:
...........
!~
\
0.2
. .....
..
•
0.1
0
..... . . . .
"...... ......
-, . . . . . . . . . . . . . . . .
.7 ................
50
0
Fig.
1OO 5.
; ................
,. . . . . . . . . . . . . . . . .
150
200
Two-person zero-sum matrix game,
250
p = 5.
30 +,
25 l
20 :ti
Polyak's method: . . . . . . Kim and Um's method: Subgrndient projection:
'l
..... . . . .
15
10 Q
. . . . . . . . .
O
~
t'o
O Fig.
6.
_ - o _ -_- :.. a ~ . : ._. ~ . . . ~
2'0
.....
~ . . . . . . . . . . . . . .
~'o
~. . . . . . . . . . .
20
Normalized objective values, initial solution at xjo
~
. . . . . .
5'o I/n for all
60 j.
matrix which gives a large value of p is nearly orthogonal to a normal of the convexity constraint; therefore, a subgradient of f then defines a near-feasible direction with respect to this constraint. Thus, for large values of p, the step directions as well as the asymptotic performances of the three methods are similar. To investigate the stability of the methods with respect to their initialization, we then changed the initial solution for each instance to its best pure strategy. The result is shown in Fig. 7. The notable difference from the result of the previous experiment is in the performance of Kim and Um's method. In that experiment, Kim and Um's method suffered from taking very small steps when approaching the relative boundary. In this experiment, a similar phenomenon occurred when trying to leave the relative boundary. Note that the negative effects of the step length truncations now seem to be even more severe, since the performance is poor for all values of p. We conclude that the subgradient projection method is robust with respect to the choice of starting point, and that it has a higher overall efficiency than the other two methods. Concerning Kim and Um's method, our experience is that the shortening of the step, done in order to reduce the computations in each iteration, is the reason for its poor performance.
T. Larsson et aL / European Journal of Operational Research 88 (1996) 382-403
398
30
25
Polynk's 20
Kim
method:
and
Subgradient
Um's
...... method:
.....
projection:
. . . .
15
j"
s
"÷ .....
"4- . . . . . . . . . . . . . . .
"t-" - " -
I0 .,sl ' f
5
~" " "~'"
"0
. . . . . . . . . . . 0
0
" - ° _ ' S = L°_'_7.: - ° _ ' = ' = = _o=.= = =. = - = . = = =-=-_- -_.Q =-=-_- ~ =-=- . . . .
i
i
I0
20
m
30
t
40
o . . . . . .
,
50
60
Fig. 7. Normalizedobjectivevalues, initial solution at the best pure strategy.
4.3. The minimum cost multicommodity network flow problem Let G = (.A/',.A) denote a directed network, with .N" and ..4 being its sets of nodes and directed arcs, respectively. Further, let ]C be a set of flow commodities, dk be the demand vector for commodity k E/C, and xk be the vector of arc flows for commodity k C ]C. Associated with the commodity flows are vectors of unit transportation costs ck, k E/C, and the total arc flows are upper bounded by b. Letting A denote the node-arc incidence matrix for the network G, the minimum cost multicommodity network flow problem is [ MCNFP ] f*
min Z
=
c~xk,
kEK~
s.t. Axk =dk,
Z
k E IC,
xk <_ b,
kE1C
xk >_O,
kGIC.
Kennington and Shalaby (1977) suggest the following solution procedure, which is based on allocations of the capacities of the total arc flows to separate commodities. (See, e.g., Chapter 9 of Lasdon, 1970, for a general description of the right:hand-side allocation technique.) Let y = (Yk)kcPC, where yk C RI+"al, k C /C, denote a commodity capacity allocation. Then [MCNFP] can be equivalently formulated as the master problem [MP]
ming(y), yCY
where Y = {y C RI-aI.IK;i [ ~]k~x: Yk = b; y > 0} is the set of feasible allocations, and g • /I~1+ "alljcl H • is the implicit function
[SUB] g(Y) = Zmin{c~'xk [ a x k = d k ; kCK~
O<_xk <_y~},
y C RI+"Al'lxz[.
T. Larsson et al. / European Journal of Operational Research 88 (1996) 382--403
399
Its evaluation amounts to the solution of I/CI single-commodity minimum cost network flow problems. We presume, without any loss of generality, that g is finite on R ~ Il/cl _D Y; this can always be achieved by including artificial arcs with associated large costs and capacities in [MCNFP]. (However, g is not finite outside l~/~'llPc[.) Letting uk E II~IA;I and vk E R I~'1_, k E/C, be the linear programming dual variables associated with the flow conservation and arc capacity constraints, respectively, of the I Cl single-commodity problems of [SUB] , and letting t,,t, ,,e)t,E;,,, k E 1C, be the extreme points of the corresponding dual feasible sets, the ~,~k,~k function g may be expressed as [ DSUB ] g(y) =
f.T
I"
. W. I" ~
max ,takUk + YkUk ~ ,
VE R
LAI'I/¢I.
kEIC pE~k
Hence, this function is convex and piecewise linear with a finite number of segments. From this expression of g it is also clear that if, for some y E Y, (Uk(yk),vk(yk)), k E /C, are optimal dual extreme points, then v(y) = (vk(Yk))kE~: is a subgradient of g at y. The method of Kennington and Shalaby employs a subgradient optimization scheme for solving the nondifferentiable program [MP]. From the finiteness of the sets 79k, k E /C, it follows that the sequence {t'(y t) } generated in the solution process is bounded; therefore, the condition (18) will be fulfilled. Letting x ° be the solution to the relaxation of [MCNFP] obtained by replacing the set of constraints ~-]kEJCXk < b by xk _< b, k E /C, Kennington and Shalaby's procedure is initialized at yO = pv(xO). Given an iterate yt E Y, the value g(yt) defines an upper bound on the optimal value of [MCNFP]. A step is taken in the direction --t2(yt), with step length according to the formula (4), and the point obtained, yt+l/2 = yt _ ottu(yt), is projected onto the set Y. This projection separates into single-arc problems, each of which is solved by the dual line search method outlined in Section 4.2. Our modification of the above scheme is to replace the subgradient v ( y t) by its projection, found by solving [DSPL]; this projection separates into single-arc problems which are solved by the dual method outlined in Section 4.2. The step length is computed according to the formula (15). In order to obtain strong lower bounds on the optimal value of [MCNFP], Allen (1985) combines the procedure of Kennington and Shalaby (1977) with Lagrangean relaxation of the capacity constraints. In our application of Allen's relaxation scheme, we include in [MCNFP] the redundant commodity capacity constraints xk < b, k E/C, thereby obtaining a stronger Lagrangean subproblem. (Allen considers a version of [MCNFP] which includes commodity capacity constraints Xk _< b~, k E /C.) Introducing a vector u of multipliers, the Lagrangean dual problem then is to find max,_>0 h(u), where h(u) = ~-~min{(Ck + u ) T x k I A x k = d k ;
O< xk < b } - b V u .
kE1C
The function h • RI+"AI H I~ is concave and piecewise linear with a finite number of segments (and the condition (18) will therefore be fulfilled), and its evaluation amounts to solving I/Cl single-commodity problems. Let u° E R~+"al. Given an iterate u t > O, the value h(u t) is a lower bound on the optimal value of [MCNFP]. A subgradient of h at u t is given by w(u t) = ~-]kEPCxk(ut) - b , where Xk(Ut), k E IC, is found when calculating h(ut). A step is taken in the direction w(ut), with step length according to the formula (4), and the resulting point is projected onto the nonnegative orthant (cf. Section 4.1 ). In Allen's combination of this dual scheme with the procedure of Kennington and Shalaby, each main iteration consists of one dual subgradient step, which gives u ~+j and a lower bound on f*, and one primal subgradient step, resulting in yt+l and an upper bound on f*.
Our modification of the dual scheme is to project the subgradient w(u t) onto the tangent cone of R ~ I at u t (cf. Section 4.1 ). The step length is computed according to the formula (15).
400
T. Larsson et al. / European Journal of Operational Research 88 (1996) 3 82 ~103 1.05 1.04 1.03 1.02 1.01
1 0.99 0.98
/
;~-" .."ij' : '
0.97 0.96 0.95
•? ~
0
-'" "'
Polyak'a method: ...... Kim and Um'a method: ..... Subgr~lient projection:
,'"" ,'
o.'"
10
3'0
4'0
5'0
6'0
7'0
8'0
9'0
100
Fig. 8. M i n i m u m c o s t m u l t i c o m m o d i t y n e t w o r k flow, Ot = 1 f o r all three m e t h o d s .
The test problems have a transportation network structure and are randomly generated in accordance with the principles in Kennington and Shalaby (1977). We generated 15 instances, each having 10 sources, 10 sinks and 10 commodities. The supply/demand for each source/sink and commodity is in the range 0-300, arc capacities in the range 100-400, and arc costs in the range 0-100. All data is uniformly distributed. The transportation subproblems were solved using the minimum cost network flow code RELAX (Bertsekas and Tseng, 1988). For each of the three methods and for each instance and iteration we recorded current upper and lower bounds normalized by the corresponding optimal value (which was computed by the linear programming package XMP; see Marsten, 1981 ). Then, for each method and iteration, we computed the average value of these normalized bounds over the 15 instances. In our first experiment we investigated the difference in quality of the step directions of Polyak's (1969) method and the subgradient projection method, and the effect of the shortening of the step (19) made in the method by Kim and U m ( 1 9 8 9 ) . In order to make the comparison unbiased we chose 0t ---- 1 in both the primal and the dual application of each of the three methods. (As did Kim and Um, 1989, we used el = 1 0 - 6 in their method.) Fig. 8 shows average values of normalized upper and lower bounds for the three methods versus iterations. The relative difference between the average values of the normalized bounds became less than 2.0% at iteration 48 of Polyak's method and at iteration 25 of the subgradient projection method. For Kim and Um's method, this difference did not reach below 3.3% during the first 100 iterations. A first observation is that the introduction of the projection [ DSPL] yields significantly better step directions. Secondly, a shortening of the step length deteriorates the performance of the subgradient projection scheme. In the second experiment we compared the three methods' best possible performances. For this purpose we calibrated the primal and dual step length parameters for each of the three methods. This resulted in choosing, for all t, 0t equal to 1.9 in both the primal and the dual application of Polyak's method, 1.2 and 0.3 in the primal and dual applications of the subgradient projection method respectively, and correspondingly, 1.0 and 0.3 in the applications of Kim and Um's method. The result of the second experiment is shown in Fig. 9, which was generated in the same way as Fig. 8. The relative difference between the average values of the normalized bounds became less than 1.0% at iteration 63 of Polyak's method and at iteration 33 of the subgradient projection method. In the primal application of Kim and Um's method, the step length reduction resulted in a poor performance for all feasible choices of the step length parameter Or.
401
T. Larsson et al./ European Journal of Operational Research 88 (1996) 382-403 1.05
1.04 1.03
L. \.
1.02
1.01 1
0.99
/..;, 0.98 U;'
Polyak's method: ...... Kim Lnd Um's method: ..... Subgradient projection: ....
~i,
0.97 0.96 0"950
1'0
2'0
3'0
4'0
5'0
6'0
7'0
8'0
90
10o
Fig. 9. Minimumcost multicommoditynetworkflow, Ot calibratedfor each method. 5. C o n c l u s i o n
and discussion
The contribution of this paper is twofold. The theoretical contribution is the introduction of the conditional subgradient optimization method and the special case of subgradient projection, and the establishment of these methods' convergence under several step length selection rules. The computational contribution is found in the results of our experiments with three applications; these results justify the use of conditional subgradient optimization by showing that the practical rate of convergence of a subgradient optimization scheme may be significantly improved by the introduction of projected subgradients. The good overall performance of the subgradient projection method compared to that of Polyak's and Kim and Um's methods may be explained by the following observations. The method by Kim and Um (1989) performs like an active set method in the sense that the set of active constraints changes very slowly; this behaviour degrades the practical performance of the method and makes it sensitive to changes in the location of the starting solution. In the subgradient projection method, the active set is taken into account only when computing the step direction. Longer steps are therefore allowed, thus enabling a more rapid change in the active set; this property makes the method less sensitive to the choice of starting solution. (In some tests, not reported here, we found that the performance of Kim and Um's method can be improved by increasing the value of el considerably, so that their method tends to perform as the subgradient projection method.) Rapid changes in the active set are typical also for Polyak's (1969) method. However, when approaching an optimal solution, the zig-zagging phenomenon illustrated in Fig. 1 occurs. (As can be seen from the results of the experiments reported in Sections 4.1 and 4.3, the best performance of Polyak's method was always obtained from large values of the step length parameter, the reason being that the use of large step lengths compensates, to some extent, for the effects of the zig-zagging behaviour.) We conclude with a discussion of some suggestions for continued research. The method of Kennington and Shalaby (1977) for multicommodity network flows is claimed (Minoux, 1986, Section 8.5.2) to be efficient for obtaining good approximate solutions to [MCNFP]. It is our experience that the combination of this method with Allen's (1985) lower bounding procedure improves on its performance, and further that the introduction of the subgradient projection enhances the performance of the combined scheme. A more sophisticated implementation of the combined subgradient projection scheme may therefore
402
T. Larsson et al./ European Journal of Operational Research 88 (1996) 382--403
yield a highly efficient tool for the approximate solution of large-scale problems; it may also be applied to yield an advanced starting solution for an optimizing method (e.g., Saviozzi, 1986). As discussed in Section 1, the subgradient projection method is most beneficially used when the feasible set is simple, and this may seem to limit its applicability. It would, however, be possible to embed the method in a simplicial decomposition scheme (e.g., von Hohenbalken, 1977), which would convert any linearly constrained nondifferentiable optimization problem into a sequence of nondifferentiable programs over unit simplices, and our method may then be applied to each of these programs. In a current endeavour to establish such a scheme, we are exploiting an ergodic sequence of directions constructed from the subgradients used within the conditional subgradient optimization method. Any limit point of this sequence of directions is both a subgradient and an element of the negative of the normal cone at an optimal solution (cf. Proposition 2.5); the latter property provides a means for defining a new nondifferentiable program over a unit simplex (in a higher dimension).
Acknowledgements The research leading to this report was financially supported by the Swedish Research Council for Engineering Sciences (TFR). We thank Dr. Zhuangwei Liu for letting us use her implementation of the dual part of the multicommodity network flow algorithm, and Dr. Ulf Br~nnlund for drawing our attention to the report by Svanberg and B~fithe. We also thank two anonymous referees for giving constructive suggestions leading to an improved presentation, and one of them in particular for also finding a flaw in a proof.
References Allen, E.P. (1985), "Using two sequences of pure network problems to solve the multicommodity network flow problem", Dissertation, School of Engineering and Applied Science, Southern Methodist University, Dallas, TX. Bertsekas, D.E, and Mitter, S.K. (1973), "A descent method for optimization problems with nondifferentiable cost functionals", SIAM Journal on Control 11,637-652. Bertsekas, D.E, and Tseng, P. (1988), "The RELAX codes for linear minimum cost network flow problems", Annals of Operations Research 13, 125-190. Bihain, A., Nguyen, V.H., and Strodiot, J.-J. (1987), "A reduced subgradient algorithm", Mathematical Programming Study 30, 127-149. Brucker, P. (1984), "An O(n) algorithm for quadratic knapsack problems", Operations Research Letters 3, 163-166. Burke, J.V., and Ferris, M.C. (1993), "Weak shaW minima in mathematical programming", SlAM Journal on Control and Optimization 31, 1340-1359. Burke, J.V., and Mor6, J.J. (1988), "On the identification of active constraints", SIAM Journal on Numerical Analysis 25, I 197-1211. Camerini, P.M., Fratta, L., and Maffioli, E (1975), "On improving relaxation methods by modified gradient techniques", Mathematical Programming Study 3, 26-34. Damberg, O. (1990), "Solving the traffic assignment problem - A new approach using the gap function", Masters Thesis, LiTH-MAT-EX90-06, Department of Mathematics, Link6ping Institute of Technology, Link6ping, Sweden. Dem'janov, V.E, and Somesova, V.K. (1978), "Conditional subdifferentials of convex functions", Soviet Mathematics Doklady 19, 11811185. Dem'yanov, V.E, and Shomesova, V.K. (1980), "Subdifferentials of functions on sets", Cybernetics 16/I, 24-31. Dem'yanov, V.E, and Vasil'ev, L.V. (1985), Nondifferentiable Optimization, Optimization Software, New York, NY. Ermol'ev, Yu.M. (1966), "Methods for solving nonlinear extremal problems", Cybernetics 2/4, 1-14. Fisher, M.L. ( 1981 ), "The Lagrangian relaxation method for solving integer programming problems", Management Science 27, 1-18. Fisher, M.L. ( 1985 ), "An applications oriented guide to Lagrangian relaxation", Interfaces 15, 10-2 I. Fletcher, R. (1987), Practical Methods of Optimization, 2nd edn., Wiley, Chichester. Geoffrion, A.M. (1974), "Lagrangean relaxation for integer programming", Mathematical Programming Study 2, 82-114. Goffin, J.L. (1978), "Nondifferentiable optimization and the relaxation method", in: C. Lemar6chal and R. Mifflin (eds.), Nonsmooth Optimization (Proceedings of a IIASA Workshop, 1977), Pergamon Press, Oxford, 31-49. Gould, F.J., and Tolle, W. (1971), "A necessary and sufficient qualification for constrained optimization", SlAM Journal on Applied Mathematics 20, 164-172. Guignard, M. (1969), "Generalized Kuhn-Tucker conditions for mathematical programming problems in a Banach space", SIAM Journal on Control 10, 93-98.
T. Larsson et aL / European Journal of Operational Research 88 (1996) 382-403
403
Held, M., Wolfe, P., and Crowder, H.P. (1974), "Validation of subgradient optimization", Mathematical Programming 6, 62-88. Kennington, J., and Shalaby, M. (1977), "An effective subgradient procedure for minimal cost multicommodity flow problems", Management Science 23, 994-1004. Kim, S., and Um, B.-S. (1989), "'Polyak's subgradient method with simplified projection for nondifferentiable optimization with linear constraints", Optimization 20, 451-456. Korpelevi~, G.M. (1979), "A relaxation method of solution for matrix games", MatematiPeskie Metody ReYenija Ekonomi?eskie Zada( (Moscow) 8, 21-31 (in Russian). Lasdon, L.S. (1970), Optimization TheoryJor Large Systems, Macmillan, New York, NY. Luenberger, D.G. (1984), Linear and Nonlinear Programming, 2nd edn., Addison-Wesley, Reading, MA. Marsten, R.E. ( 1981 ), "The design of the XMP linear programming library", ACM Transactions on Mathematical Software 7, 481-497. Minoux, M. (1986), Mathematical Programming: Theory and Algorithms, Wiley, Chichester. Panier, E. (1987), "An active set method for solving linearly constrained nonsmooth optimization problems", Mathematical Programming 37, 269-292. Polak, E., Mayne, D.Q., and Wardi, Y. (1983), "On the extension of constrained optimization algorithms from differentiable to nondifferentiable problems", SIAM Journal on Control and Optimization 21, 179-203. Poljak, B.T. (1967), "A general method of solving extremum problems", Soviet Mathematics Doklady 8, 593-597. Polyak, B.T. (1969), "Minimization of unsmooth functionals", USSR Computational Mathematics and Mathematical Physics 9, 14-29. Polyak, B.T. (1979), "Sharp minima", Lecture Notes, Institute of Control Sciences, Moscow, USSR. Presented at the IIASA Workshop on Generalized Lagrangians and their Applications, IIASA, Laxenburg, Austria, 1979. Polyak, B.T. (1987), Introduction to Optimization, Optimization Software, New York, NY. Rockafellar, R.T. (1970), Convex Analysis, Princeton University Press, Princeton, NJ. Rockafellar, R.T. ( 1981 ), The Theory of Subgradients and its Applications to Problems of Optimization: Com'ex and Noncom,ex Functions, Heldermann Verlag, Berlin. Rosen, J.B. (1960), "The gradient projection method for nonlinear programming, part I: linear constraints", Journal of the Society Jbr Industrial and Applied Mathematics 8, 181-217. Saviozzi, G. (1986), "Advanced start for the multicommodity network flow problem", Mathematical Programming Study 26, 221-224. Shepilov, M.A. (1976), "Method of the generalized gradient for finding the absolute minimum of a convex function", Cybernetics 12, 547-553. Shor, N.Z. ( 1985 ), Minimization MethodsJbr Non-Differentiable Functions, translated from the Russian by K.C. Kiwiel and A. Ruszczyfiski, Springer-Verlag, Berlin. Shor, N.Z. ( 1991 ), "The development of numerical methods for nonsmooth optimization in the USSR", in: J.K. Lenstra, A.H.G. Rinnoy Kan and A. Schrijver (eds.), History of Mathematical Programming: A Collection of Personal Reminiscences, North-Holland, Amsterdam, 135-139. Sreedharan, V.P. (1987), "e-subgradient projection algorithm", Journal of Approximation Theory 51,27-46. Strodiot, J.-J., Nguyen, V.H., and Heukemes, N. (1983), "e-optimal solutions in nondifferentiable convex programming and some related questions", Matematical Programming 25, 307-328. Svanberg, K., and B/igtthe, O. (1985), "'Computational experiments with different subgradient techniques", TRITA-MAT-1985-3, The Swedish Institute of Applied Mathematics, Sweden. Tamarit Goerlich, J.M. (1988), "Improving feasible directions for a class of nondifferentiable functions", European Journal of Operational Research 34, 99-104. von Hohenbalken, B. (1977), "Simplicial decomposition in nonlinear programming algorithms", Mathematical Programming 13, 49-68. Zarantonello, E.H. ( 1971 ), "Projections on convex sets in Hilbert space and spectral theory", in: E.H. Zarantonello (ed.), Contributions to Nonlinear Functional Analysis ( Proceedings of a Symposium Conducted by the Mathematics Research Center, University of Wisconsin, Madison, WI, April t971 ), Academic Press, New York, NY, 237-424.