
Automatica, Vol. 25, No. 2, pp. 233-242, 1989. Printed in Great Britain. 0005-1098/89 $3.00 + 0.00. Pergamon Press plc. 1989 International Federation of Automatic Control.

New Dual-type Decomposition Algorithm for Nonconvex Separable Optimization Problems*

P. TATJEWSKI†

* Received 15 May 1986; revised 15 June 1987; received in final form 27 July 1988. The original version of this paper was presented at the 4th IFAC/IFORS Symposium on Large Scale Systems: Theory and Applications which was held in Zurich, Switzerland during August 1986. The published proceedings of this IFAC meeting may be ordered from: Pergamon Press plc, Headington Hill Hall, Oxford OX3 0BW, England. This paper was recommended for publication in revised form by Associate Editor M. Jamshidi under the direction of Editor A. P. Sage.
† Institute of Automatic Control, Warsaw University of Technology, ul. Nowowiejska 15/19, 00-665 Warszawa, Poland.

Although the augmented Lagrange function is not separable, a suitable reformulation of this function and the introduction of auxiliary variables lead to a decomposition algorithm with simple and efficient adjusting rules for the upper level variables.

Key Words--Large-scale systems; decomposition; nonlinear programming; augmented Lagrange functions.

Abstract--The aim of the paper is to present a new dual-type decomposition algorithm for large-scale nonconvex optimization problems of general separable structure. After a reformulation of the augmented Lagrange function and the introduction of auxiliary variables, called approximation points, separable local optimization problems are created. The lower level of the method consists of a few optimization runs of these problems for successively improved approximation points, whereas standard updating of the Lagrange multipliers constitutes the highest level. Simple rules for adjusting the approximation points and the Lagrange multipliers are given and thoroughly analysed. Applicability conditions and example numerical results indicate that the presented algorithm eliminates, to a great extent, drawbacks of the previous approaches.

1. INTRODUCTION

Let us consider the following separable optimization problem

$$\text{minimize} \;\Big\{ Q(x) = \sum_{i=1}^{N} Q_i(x_i) \Big\}$$
$$\text{subject to: } h(x) \triangleq \sum_{i=1}^{N} h_i(x_i) = 0, \qquad g_i(x_i) \le 0, \quad i = 1, \ldots, N, \tag{1}$$

where $x_i \in R^{n_i}$ are decision variables, $x = (x_1, \ldots, x_N) \in R^n$, $n = \sum_{i=1}^{N} n_i$, $h_i: R^{n_i} \to R^m$ and $g_i: R^{n_i} \to R^{p_i}$ are given vector functions describing constraints, and $Q_i: R^{n_i} \to R$, $i = 1, \ldots, N$. Among several possible methods, the dual method (also called the price method) of decomposition is ideally suited for problem (1), especially when N is large (see e.g. Lasdon (1970)). However, application of the original dual approach to problem (1) requires the existence of a saddle-point of the Lagrange function and uniqueness of the solutions of the lower level problems (see e.g. Findeisen et al. (1980)). This is often not satisfied, especially when the problem results from engineering applications and is nonlinear and nonconvex. To make the dual approach applicable to such problems, various convexification procedures have been proposed. In most approaches an application of the quadratic penalty term was tried, leading to the augmented Lagrange function

$$L_r(x, \lambda) = \sum_{i=1}^{N} Q_i(x_i) + \lambda^T \sum_{i=1}^{N} h_i(x_i) + \tfrac{1}{2} r \Big\| \sum_{i=1}^{N} h_i(x_i) \Big\|^2, \tag{2}$$

where $r > 0$ is a penalty coefficient. The approach is well known and has been successful in the theory and practice of general purpose constrained optimization algorithms. However, the quadratic penalty term is not separable due to its crossproduct terms, and thus the function $L_r(\cdot, \lambda)$ is not separable.
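The coupling responsible for this can be made explicit by expanding the penalty norm in (2):

$$\Big\| \sum_{i=1}^{N} h_i(x_i) \Big\|^2 = \sum_{i=1}^{N} \| h_i(x_i) \|^2 + \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \ne i}}^{N} h_i(x_i)^T h_j(x_j),$$

so each crossproduct term $h_i(x_i)^T h_j(x_j)$, $i \ne j$, ties the variables of subproblems i and j together and prevents a decomposition of $L_r(\cdot, \lambda)$ into N independent parts.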

For an interconnected system optimization problem, which is a very special case of (1), Stephanopoulos and Westerberg (1975) proposed a kind of approximation of the crossproduct terms by linear functions, and a decomposition algorithm based on it.


The algorithm was heuristic, complex and poorly effective, as pointed out by Watanabe et al. (1978). A simpler version of it was proposed by Stoilov (1977). Watanabe et al. (1978) proposed a quite different approach to the interconnected system optimization problem, based on replacing the penalty terms by equivalent minimum functions with additional unknown variables. However, an additional minimization level was then necessary, still leading to increased complexity. Moreover, it is not clear how to adapt this approach to the more general problem (1).

In this paper a new decomposition method based on the augmented Lagrange function (2) is proposed and analysed, for the general case of the optimization problem (1). It is based on a separable approximation of the augmented Lagrangian (2) which uses auxiliary variables called approximation points. The key elements of the method are the proposed and thoroughly analysed simple rules for adjusting both the approximation points and the Lagrange multipliers. The applied function approximation reduces, in fact, to that of Stephanopoulos and Westerberg (1975) in the case of the interconnected system optimization problem, but the whole algorithm is quite different. It will be called the augmented price method, since it seems to be a natural generalization of the classical price method. It was first proposed by Tatjewski (1984) and improved by Engelmann and Tatjewski (1986), for the special case of the interconnected system optimization problem.

It should be noted that another decomposition algorithm for nonconvex optimization problems of the general form (1) was proposed by Bertsekas (1979), based on a quite different convexification of the Lagrange function. The algorithm was inefficient; Tanikawa and Mukai (1985) therefore modified and improved it significantly, but only for problems without local constraints $g_i(x_i) \le 0$, $i = 1, \ldots, N$. Tatjewski (1986) proposed further developments of this approach to include local constraints and to increase the efficiency. Although the algorithm was clearly stated in Tatjewski (1986), its applicability conditions and numerical behaviour are still under research, and will be published elsewhere.

2. ALGORITHM STRUCTURE AND ASSUMPTIONS

The augmented Lagrange function (2) is not separable with respect to $x_i$, $i = 1, \ldots, N$, since it contains crossproduct terms $h_i(x_i)^T h_j(x_j)$, $i \ne j$. In order to make this function separable it is proposed in this paper to replace it by the function

$$\Lambda_r(x, \lambda, s) = \sum_{i=1}^{N} Q_i(x_i) + \lambda^T \sum_{i=1}^{N} h_i(x_i) + \tfrac{1}{2} r \sum_{i=1}^{N} \Big\| \sum_{\substack{j=1 \\ j \ne i}}^{N} h_j(s_j) + h_i(x_i) \Big\|^2, \tag{3}$$

where $s = (s_1, s_2, \ldots, s_N) \in R^n$ is the approximation point. The function (3) is obviously separable,

$$\Lambda_r(x, \lambda, s) = \sum_{i=1}^{N} \Lambda_{ri}(x_i, \lambda, s), \tag{4}$$

$$\Lambda_{ri}(x_i, \lambda, s) = Q_i(x_i) + \lambda^T h_i(x_i) + \tfrac{1}{2} r \| \tilde{h}_i(x_i, s) \|^2, \tag{5}$$

where

$$\tilde{h}_i(x_i, s) = \sum_{\substack{j=1 \\ j \ne i}}^{N} h_j(s_j) + h_i(x_i) \tag{6}$$

was introduced to shorten the notation, $i = 1, \ldots, N$. Let us note an interesting interpretation of the local performance functions $\Lambda_{ri}$: treating the interaction constraints $\sum_{i=1}^{N} h_i(x_i) = 0$ as a common resources constraint for the N subproblems, the approximate consumption of these resources by the remaining subproblems, $\sum_{j=1, j \ne i}^{N} h_j(s_j)$, is the only global information needed to build the ith local function $\Lambda_{ri}$ (apart from the Lagrange multipliers $\lambda$).
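Formulas (4)-(6) translate directly into code. The sketch below is illustrative only (the callables `Q_list`, `h_list` and the coordinator helper are assumed interfaces, not notation from the paper); it shows that, besides $\lambda$, the ith subproblem needs only the aggregate $\sum_{j \ne i} h_j(s_j)$:

```python
import numpy as np

def local_objective(i, x_i, lam, r, Q_list, h_list, h_sum_others):
    """Lambda_ri of (5): Q_i(x_i) + lam^T h_i(x_i) + (r/2) ||h_tilde_i||^2,
    with h_tilde_i = sum_{j != i} h_j(s_j) + h_i(x_i) as in (6)."""
    h_i = h_list[i](x_i)
    h_tilde = h_sum_others + h_i            # the only global data needed
    return Q_list[i](x_i) + lam @ h_i + 0.5 * r * (h_tilde @ h_tilde)

def aggregate_for(i, s_blocks, h_list):
    """Coordinator side: sum_{j != i} h_j(s_j) for the current point s."""
    total = sum(h_list[j](s_blocks[j]) for j in range(len(s_blocks)))
    return total - h_list[i](s_blocks[i])
```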

We are now in a position to define the following structure of the augmented price method.

(1) Set initial values $x^0$ and $\lambda^0$, a sequence $\delta_l \to 0$, $\delta_l > 0$, and a final accuracy $\varepsilon > 0$. Set the iteration count $l = 0$.
(2) For $\lambda = \lambda^l$ perform, with the accuracy $\delta_l$, the following approximation loop.
(2a) Set the inner iteration count $k = 0$, and $s^0 = x^l$.
(2b) Find solutions $\hat{x}_i^k$ of the N local optimization problems
$$\text{minimize } \Lambda_{ri}(x_i, \lambda^l, s^k) \quad \text{subject to } g_i(x_i) \le 0, \quad i = 1, \ldots, N. \tag{7}$$
(2c) If
$$\| s^k - \hat{x}^k \| \le \delta_l \tag{8}$$
then set $x^{l+1} = \hat{x}^k$ and go to (3); else improve the approximation point
$$s^{k+1} = B(s^k, \hat{x}^k), \tag{9}$$
set $k = k + 1$ and go to (2b).
(3) If
$$d^l \triangleq \| h(x^{l+1}) \| \le \varepsilon \tag{10}$$
then stop, taking $x^{l+1}$ as an acceptable representation of the optimal point $\hat{x}$; else adjust the multipliers
$$\lambda^{l+1} = M(\lambda^l, x^{l+1}), \tag{11}$$
set $l = l + 1$ and go to (2).
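This structure maps onto a short driver loop. The following Python skeleton is a minimal sketch under stated assumptions: `solve_local` solves the N problems (7) exactly, and `B`, `M`, `deltas` are user-supplied callables corresponding to (9), (11) and the sequence $\{\delta_l\}$ (all names here are placeholders, not specifications from the paper):

```python
import numpy as np

def augmented_price_method(x0, lam0, solve_local, B, M, h, deltas,
                           eps=1e-2, max_outer=100):
    """Driver for steps (1)-(3): inner approximation loop, outer multiplier loop."""
    x = np.asarray(x0, dtype=float)
    lam = np.asarray(lam0, dtype=float)
    for l in range(max_outer):
        s = x.copy()                                    # (2a): s^0 = x^l
        while True:
            x_hat = solve_local(lam, s)                 # (2b): the N problems (7)
            if np.linalg.norm(s - x_hat) <= deltas(l):  # (2c): criterion (8)
                x = x_hat
                break
            s = B(s, x_hat)                             # (9): new approximation point
        if np.linalg.norm(h(x)) <= eps:                 # (3): criterion (10)
            return x, lam
        lam = M(lam, x)                                 # (11): multiplier adjustment
    return x, lam
```

With the relaxation rule (29) of Section 4 one would pass `B = lambda s, x_hat: s + alpha * (x_hat - s)`, and with the Hestenes-Powell rule (66) of Section 5, `M = lambda lam, x: lam + r * h(x)`.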

The adjusting rules B and M, see (9) and (11), will be discussed in the sequel. The method can be considered as a generalization of the classical price method, since it has a similar structure. The only difference is the presence of the approximation loop (2a)-(2c), instead of a single minimization of the local problems in the classical case. This is, of course, due to the use of the augmented instead of the usual Lagrange function, and the aim of the approximation loop is to find, with accuracy $\delta_l$, a point $\hat{x}^l = x^{l+1}$ minimizing the nonseparable problem

$$\text{minimize}_x \; L_r(x, \lambda^l) \quad \text{subject to } g_i(x_i) \le 0, \quad i = 1, \ldots, N. \tag{12}$$

An augmented price method algorithm will be well defined and effective if the local optimization problems (7) are well defined and the adjusting rules B and M are effective. Let us now formulate assumptions. Let us denote by $g: R^n \to R^p$ the vector function consisting of all local constraint functions $g_i$, $g = (g_1, g_2, \ldots, g_N)$. Let $g_A: R^n \to R^{p_A}$ denote the vector function consisting of those elements of g which are active at the optimal point $\hat{x}$, i.e. $g_{ij}(\hat{x}_i) = 0$, $j = 1, \ldots, p_{Ai}$, $i = 1, \ldots, N$, where $g_{ij}$ denotes the jth component of the vector $g_i$. The following assumptions, which are rather standard conditions assuring that a saddle-point of the augmented Lagrange function (2) exists (Arrow et al., 1973; Bertsekas, 1982; Findeisen et al., 1980), are used throughout the paper:

(S1): the functions Q, h and g are of class $C^2$ in some neighbourhood of the optimal point $\hat{x}$;
(S2): $\hat{x}$ is a regular point, i.e. the gradients of all active constraints are linearly independent at $\hat{x}$;
(S3): the point $\hat{x}$ satisfies
(a) the first order necessary optimality conditions
$$\nabla_x \mathcal{L}_0(\hat{x}, \hat{\lambda}, \hat{\mu}) = 0, \tag{13}$$
$$h(\hat{x}) = 0, \quad g(\hat{x}) \le 0, \quad \hat{\mu}^T g(\hat{x}) = 0, \quad \hat{\mu} \ge 0, \tag{14}$$
together with the strict complementarity condition
$$\hat{\mu}_{ij} > 0 \iff g_{ij}(\hat{x}_i) = 0, \quad j = 1, \ldots, p_i, \quad i = 1, \ldots, N, \tag{15}$$
where
$$\mathcal{L}_0(x, \lambda, \mu) = Q(x) + \lambda^T h(x) + \mu^T g(x); \tag{16}$$
(b) the second order sufficient optimality condition
$$\forall x \in T(\hat{x}) \setminus \{0\}: \quad x^T \nabla_{xx}^2 \mathcal{L}_0(\hat{x}, \hat{\lambda}, \hat{\mu})\, x > 0, \tag{17}$$
where
$$T(\hat{x}) = \{ x \in R^n : h'(\hat{x})x = 0, \; g_A'(\hat{x})x = 0 \}. \tag{18}$$

3. PROPERTIES OF LOCAL OPTIMIZATION PROBLEMS

The following lemma is a generalization of a result sometimes referred to as Finsler's Lemma (see e.g. Arrow et al. (1973)).

Lemma 1. Let us consider $n \times n$ matrices A and F, and a $p \times n$ matrix G, $p < n$. Assume that $x^T F x \ge 0$ for every $x \in R^n$ satisfying $Gx = 0$, and $x^T A x > 0$ for every $x \ne 0$ satisfying $Gx = 0$ and $x^T F x = 0$. Then there exists a scalar $\bar{r} > 0$ such that for every $r > \bar{r}$ and every $x \ne 0$ satisfying $Gx = 0$ it holds that $x^T (A + rF) x > 0$.

Proof. The result can be obtained in the same way as in the case without the matrix G; see Lemma 1.25 in Bertsekas (1982), or Lemma 2.1 in Arrow et al. (1973). ∎
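Lemma 1 lends itself to a quick numerical illustration. In the sketch below (an illustration only; the sizes, seed and boost constant are arbitrary choices, and SciPy is assumed available) the hypotheses are enforced by construction, and the smallest eigenvalue of $Z^T(A + rF)Z$, with Z a basis of $\{x : Gx = 0\}$, turns positive once r is large enough:

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(1)
n, p, m = 8, 2, 3                          # illustrative sizes, p < n

G = rng.standard_normal((p, n))
C = rng.standard_normal((m, n))
F = C.T @ C                                # x^T F x >= 0 for every x

# Enforce the second hypothesis: make A positive definite on the subspace
# {x : Gx = 0 and x^T F x = 0} = null(G) ∩ null(C), by boosting A there.
A = rng.standard_normal((n, n)); A = 0.5 * (A + A.T)
W = null_space(np.vstack([G, C]))          # orthonormal basis of that subspace
A = A + 10.0 * (W @ W.T)                   # boost chosen to dominate ||A|| for this seed

Z = null_space(G)                          # orthonormal basis of {x : Gx = 0}
for r in [0.0, 1.0, 10.0, 100.0, 1000.0]:
    lam_min = np.linalg.eigvalsh(Z.T @ (A + r * F) @ Z).min()
    print(f"r = {r:7.1f}   min eig of Z^T (A + rF) Z = {lam_min:+.4f}")
```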

The following theorem states that the lower level problems (7) are locally well defined. It is, in fact, a consequence of the existence of a local saddle-point of the augmented Lagrangian (2).

Theorem 1. If assumptions (S1), (S2) and (S3) are satisfied, then for every sufficiently large r the lower level problem

$$\text{minimize}_x \; \Lambda_r(x, \lambda^l, s^k) \quad \text{subject to } g(x) \le 0 \tag{19}$$

has a unique optimal point $\hat{x}(\lambda^l, s^k)$ and K-T (Kuhn-Tucker) multiplier $\hat{\mu}(\lambda^l, s^k)$ in some neighbourhood of the point $(\hat{\lambda}, \hat{s})$, where $\hat{s} = \hat{x}$, and the mappings $\hat{x}(\cdot, \cdot)$, $\hat{\mu}(\cdot, \cdot)$ are continuous and differentiable.

Proof. Let us consider the optimality conditions for problem (19), in a neighbourhood of $\hat{x}$ in which precisely the constraints $g_A$ are active:

$$Q_i'(x_i)^T + h_i'(x_i)^T \lambda^l + r\, h_i'(x_i)^T \tilde{h}_i(x_i, s^k) + g_{A_i}'(x_i)^T \mu_{A_i} = 0, \tag{20}$$
$$g_{A_i}(x_i) = 0, \quad i = 1, \ldots, N, \tag{21}$$

where $\mu_A = (\mu_{A_1}, \ldots, \mu_{A_N})$ is a K-T multiplier for the constraints $g_A(x) \le 0$. When $\lambda^l = \hat{\lambda}$ and $s^k = \hat{x}$, equations (20), (21) are satisfied at $x = \hat{x}$ and $\mu_A = \hat{\mu}_A$. The Jacobi matrix is then

$$J_r = \begin{bmatrix} \bar{A}_0 + r\bar{F} & \bar{G}_A^T \\ \bar{G}_A & 0 \end{bmatrix}, \tag{22}$$

where

$$\bar{A}_0 = \nabla_{xx}^2 \mathcal{L}_0(\hat{x}, \hat{\lambda}, \hat{\mu}), \tag{23}$$
$$\bar{F} = \mathrm{diag}\,\{ h_i'(\hat{x}_i)^T h_i'(\hat{x}_i), \; i = 1, \ldots, N \}, \tag{24}$$
$$\bar{G}_A = g_A'(\hat{x}). \tag{25}$$

We shall show that $J_r$ is nonsingular, provided r is sufficiently large. From definition (24), $x^T \bar{F} x = 0$ implies $h_i'(\hat{x}_i) x_i = 0$, $i = 1, \ldots, N$, which further implies $h'(\hat{x}) x = 0$. Since $\bar{F}$ is positive semidefinite, then using (S3) and Lemma 1 we obtain, for sufficiently large r,

$$x^T (\bar{A}_0 + r\bar{F}) x > 0 \quad \text{if } \bar{G}_A x = 0 \text{ and } x \ne 0. \tag{26}$$

$J_r$ is nonsingular iff the set of equations

$$(\bar{A}_0 + r\bar{F}) x + \bar{G}_A^T \mu_A = 0, \tag{27}$$
$$\bar{G}_A x = 0, \tag{28}$$

has only the zero solution. Suppose $x \ne 0$. Then (27)-(28) lead to a contradiction with (26). Hence $x = 0$, which implies $\bar{G}_A^T \mu_A = 0$ and, due to (S2), $\mu_A = 0$. The nonsingularity of $J_r$ has been proved.

We can now apply the implicit function theorem to investigate the dependence of the solutions $\hat{x}^k$, $\hat{\mu}^k$ of equations (20), (21) on $\lambda$ and s. Hence, there are neighbourhoods $N_1(\hat{\lambda}, \hat{x})$ and $N_2(\hat{x}, \hat{\mu}_A)$ of the points $(\hat{\lambda}, \hat{x})$ and $(\hat{x}, \hat{\mu}_A)$ such that for every $(\lambda^l, s^k) \in N_1(\hat{\lambda}, \hat{x})$ there is a unique, continuous and differentiable solution $(\hat{x}^k(\lambda^l, s^k), \hat{\mu}^k(\lambda^l, s^k)) \in N_2(\hat{x}, \hat{\mu}_A)$ of (20) and (21). Taking into account the derived continuity, (26), and the facts that every $\hat{\mu}_{A_{ij}} > 0$ and that $g_{ij}(\hat{x}_i) < 0$ for every $g_{ij}$ not contained in $g_A$, one can always choose $N_1(\hat{\lambda}, \hat{x})$ sufficiently small to secure that if $(\lambda^l, s^k) \in N_1(\hat{\lambda}, \hat{x})$ then the second order sufficient optimality conditions are satisfied for problem (19) at the points $\hat{x}^k(\lambda^l, s^k)$. ∎

4. APPROXIMATION LOOP ALGORITHMS

Let us consider, first, the following simple relaxation formula $B(s^k, \hat{x}^k)$:

$$s^{k+1} = s^k + \alpha(\hat{x}^k - s^k), \tag{29}$$

where $\alpha > 0$ is a relaxation parameter, and assume that $(\lambda^l, s^k)$ belongs to the neighbourhood of $(\hat{\lambda}, \hat{x})$ stated in Theorem 1. Then $\hat{x}^k$, and the corresponding multiplier $\hat{\mu}_A^k$, are uniquely defined by the necessary optimality conditions (20) and (21). We shall treat this set of equations as defining an operator equation

$$G(\hat{w}^k, w^k) = 0, \tag{30}$$

where $\hat{w}^k = (\hat{x}^k, \hat{\mu}_A^k)$, $w^k = (s^k, \bar{\mu}_A^k)$, and $\bar{\mu}_A^k$ is an auxiliary variable introduced for the convenience of further analysis. Iteration (29) can now be written in the augmented form

$$w^{k+1} = w^k + \alpha(\hat{w}^k - w^k), \tag{31}$$

and an operator $\varphi$,

$$\varphi(w^k) \triangleq w^{k+1}, \tag{32}$$

can be defined, since $\hat{w}^k$ is a (locally) unique function of $w^k$. Hence, algorithm (31) can be considered as one of finding a fixed point $w^* = \varphi(w^*)$. Finally, let us define the following operator P:

$$P(\hat{w}, w) = -G_1'(\hat{w}, w)^{-1} G_2'(\hat{w}, w), \tag{33}$$

where $G_i'$ denotes the partial derivative with respect to the ith variable, $i = 1, 2$. We are now in a position to formulate the convergence theorem.

Theorem 2. If assumptions (S1), (S2) and (S3) are satisfied, then there exists a value $r_0$ of the penalty coefficient r such that for every $r > r_0$ the following holds.
(a) There exists a neighbourhood $N(\hat{\lambda})$ of the optimal multiplier $\hat{\lambda}$ such that for every $\lambda \in N(\hat{\lambda})$ the point $\hat{w}(\lambda) = (\hat{x}(\lambda), \hat{\mu}_A(\lambda))$ is a fixed point of $\varphi$, where $\hat{x}(\lambda)$ is the (unique) solution of the problem (12), and $\hat{\mu}_A(\lambda)$ is the non-zero part of the corresponding K-T multiplier.
(b) For every $\lambda \in N(\hat{\lambda})$ there exists a value $\bar{\alpha}(\lambda)$ of the relaxation parameter such that for every $\alpha \in (0, \bar{\alpha}(\lambda))$ the algorithm (31) is locally convergent to $\hat{w}(\lambda)$. All eigenvalues of $P(\hat{w}(\lambda), \hat{w}(\lambda))$ are real and, denoting by $\theta_m$ and $\theta_M$ the smallest and largest of them, we have
$$\bar{\alpha}(\lambda) = 2 / (1 - \theta_m). \tag{34}$$
(c) For every $\alpha \in (0, \bar{\alpha}(\lambda))$ the convergence is linear, with the convergence ratio
$$q = \max \{ |1 + \alpha(\theta_m - 1)|, \; |1 + \alpha(\theta_M - 1)| \}; \tag{35}$$
the optimal convergence ratio is
$$\bar{q} = (\theta_M - \theta_m) / (2 - \theta_M - \theta_m), \tag{36}$$
and is obtained for $\alpha = \hat{\alpha}$,
$$\hat{\alpha} = 2 / (2 - \theta_M - \theta_m). \tag{37}$$
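Parts (b) and (c) can be checked on a toy linear model of the inner iteration, in which the exact map $w \mapsto \hat{w}$ is replaced by an affine map with a prescribed real spectrum. The following Python sketch (purely illustrative; the spectrum and dimensions are hypothetical) confirms the thresholds (34) and (37):

```python
import numpy as np

rng = np.random.default_rng(2)
theta = np.array([-1.5, -0.3, 0.2, 0.8])            # hypothetical real spectrum, all < 1
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
P = Q @ np.diag(theta) @ Q.T                        # plays the role of P(w_hat, w_hat)
c = rng.standard_normal(4)
w_star = np.linalg.solve(np.eye(4) - P, c)          # fixed point of w = P w + c

alpha_bar = 2.0 / (1.0 - theta.min())               # (34): here 0.8
alpha_opt = 2.0 / (2.0 - theta.max() - theta.min()) # (37): here about 0.74

for alpha in (0.5 * alpha_opt, alpha_opt, 1.05 * alpha_bar):
    w = np.zeros(4)
    for _ in range(200):
        w = w + alpha * ((P @ w + c) - w)           # relaxation step (31)
    print(f"alpha = {alpha:.3f}   error = {np.linalg.norm(w - w_star):.2e}")
```

With the spectrum used here the run converges for the first two choices of $\alpha$ and diverges just above $\bar{\alpha}$, as (34)-(37) predict.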

Proof. (a) Let us consider the following set of necessary optimality conditions for the problem (12):

$$Q'(x)^T + h'(x)^T \lambda + r\, h'(x)^T h(x) + g_A'(x)^T \mu_A = 0, \tag{38}$$
$$g_A(x) = 0, \tag{39}$$

and let us denote its solution by $\hat{w}(\lambda) = (\hat{x}(\lambda), \hat{\mu}_A(\lambda))$. Assumptions (S1), (S2) and (S3) imply, for every sufficiently large r, that there exists a neighbourhood $N_1(\hat{\lambda})$ of $\hat{\lambda}$ such that for every $\lambda \in N_1(\hat{\lambda})$ the point $\hat{x}(\lambda)$ is a unique, continuous and differentiable (with respect to $\lambda$) minimizing point for the problem (12). This can be shown by arguing analogously as in the proof of Theorem 1. Equations (20), (21) and (38), (39) yield identical solutions $\hat{w}(\lambda) = (\hat{x}(\lambda), \hat{\mu}_A(\lambda))$ if $s^k = \hat{x}(\lambda)$. This means that $w^* = \hat{w}(\lambda)$ is a fixed point of $\varphi$, $\lambda \in N_1(\hat{\lambda})$.

(b) Let us restrict further analysis to $N_1(\hat{\lambda})$ and consider the properties of the solutions $\hat{w}^k = f(w^k)$ of the operator equation (30), i.e. of the set of equations (20) and (21), in a neighbourhood of $\hat{w}(\lambda)$. To this end the implicit function theorem will be applied. The Jacobi matrix of equations (20) and (21), at the point $(x, \mu_A, \lambda, s) = (\hat{x}(\lambda), \hat{\mu}_A(\lambda), \lambda, \hat{x}(\lambda))$, i.e. at $(\hat{w}(\lambda), \bar{w}(\lambda))$ with $\bar{w}(\lambda) = (\hat{x}(\lambda), \hat{\mu}_A(\lambda))$, is

$$G_1'(\hat{w}(\lambda), \bar{w}(\lambda)) = \begin{bmatrix} A_0(\hat{x}(\lambda), \lambda) + r[F(\hat{x}(\lambda)) + K(\hat{x}(\lambda))] & G_A(\hat{x}(\lambda))^T \\ G_A(\hat{x}(\lambda)) & 0 \end{bmatrix}, \tag{40}$$

where

$$A_0(\hat{x}(\lambda), \lambda) = \nabla_{xx}^2 \mathcal{L}_0(\hat{x}(\lambda), \lambda, \hat{\mu}(\lambda)), \tag{41}$$
$$F(\hat{x}(\lambda)) = \mathrm{diag}\,\{ h_i'(\hat{x}_i(\lambda))^T h_i'(\hat{x}_i(\lambda)), \; i = 1, \ldots, N \}, \tag{42}$$
$$K(\hat{x}(\lambda)) = \mathrm{diag}\,\Big\{ \sum_j h_j(\hat{x}(\lambda)) \nabla^2 h_{ij}(\hat{x}_i(\lambda)), \; i = 1, \ldots, N \Big\}, \tag{43}$$
$$G_A(\hat{x}(\lambda)) = g_A'(\hat{x}(\lambda)), \tag{44}$$

and the subscript "j" denotes the jth component of the appropriate vector. Since $\hat{w}(\hat{\lambda}) = (\hat{x}, \hat{\mu}_A)$, then

$$G_1'(\hat{w}(\hat{\lambda}), \bar{w}(\hat{\lambda})) = J_r, \tag{45}$$

and it is nonsingular, see (22) and below. Due to the continuity of $\hat{w}(\cdot)$ the matrix $G_1'(\hat{w}(\lambda), \bar{w}(\lambda))$ remains nonsingular in a neighbourhood $N_2(\hat{\lambda}) \subseteq N_1(\hat{\lambda})$. Hence for every $\lambda \in N_2(\hat{\lambda})$ there is a unique, continuous and differentiable solution $\hat{w}^k = f(w^k)$ of equation (30), if only $w^k$ is sufficiently close to $\hat{w}(\lambda)$. Moreover,

$$f'(\hat{w}(\lambda)) = -G_1'(\hat{w}(\lambda), \hat{w}(\lambda))^{-1} G_2'(\hat{w}(\lambda), \hat{w}(\lambda)) = P(\hat{w}(\lambda), \hat{w}(\lambda)), \tag{46}$$

where

$$G_2'(\hat{w}(\lambda), \hat{w}(\lambda)) = \begin{bmatrix} r[H(\hat{x}(\lambda)) - F(\hat{x}(\lambda))] & 0 \\ 0 & 0 \end{bmatrix}, \tag{47}$$
$$H(\hat{x}(\lambda)) = h'(\hat{x}(\lambda))^T h'(\hat{x}(\lambda)). \tag{48}$$

It follows directly from (31), (32) and (46) that

$$\varphi'(\hat{w}(\lambda)) = \alpha P(\hat{w}(\lambda), \hat{w}(\lambda)) + (1 - \alpha) I. \tag{49}$$

Now, using Ostrowski's theorem (see e.g. Ortega and Rheinboldt (1970)), one concludes that iteration (31) is locally convergent to $\hat{w}(\lambda)$ if $\mathrm{sr}\{\varphi'(\hat{w}(\lambda))\} < 1$, where $\mathrm{sr}\{\cdot\}$ denotes the spectral radius.

It will be shown first that for every sufficiently large r all eigenvalues of the matrix $P(\hat{w}(\lambda), \hat{w}(\lambda))$ are real. Let $(z, t) \ne 0$ be an eigenvector corresponding to an eigenvalue $\theta \ne 0$ (the case $\theta = 0$ is trivial). Then it follows from (46), (40) and (47) that

$$\theta \begin{bmatrix} A_0 + r[F + K] & G_A^T \\ G_A & 0 \end{bmatrix} \begin{bmatrix} z \\ t \end{bmatrix} = \begin{bmatrix} r(F - H)z \\ 0 \end{bmatrix}, \tag{50}$$

where the matrix arguments have been omitted to shorten the notation. Equation (50) can be written in the form

$$\theta (A_0 + r[F + K]) z + \theta\, G_A^T t = -r(H - F) z, \tag{51}$$
$$\theta\, G_A z = 0. \tag{52}$$

Let us restrict the analysis to a neighbourhood $N_3(\hat{\lambda}) \subseteq N_2(\hat{\lambda})$ such that the columns of the matrix $G_A^T = G_A(\hat{x}(\lambda))^T$ are linearly independent; it exists due to assumption (S2) and the continuity of $G_A(\hat{x}(\cdot))$. We will show that $z \ne 0$. If $z = 0$ then (51) yields $G_A^T t = 0$, which implies $t = 0$ due to the linear independence of all columns of $G_A^T$; this is a contradiction, since $(z, t)$ must not be zero for $\theta \ne 0$. Let us denote $z = z_R + j z_I$, $\bar{z} = z_R - j z_I$ and $\theta = \theta_R + j \theta_I$. Since $\theta \ne 0$, (52) implies $G_A z = 0$. Multiplying now (51) by $\bar{z}^T$ we obtain

$$\theta\, \bar{z}^T (A_0 + r[F + K]) z = r\, \bar{z}^T (F - H) z. \tag{53}$$

Since the matrices $A_0 + r[F + K]$ and $F - H$ are symmetric, (53) implies

$$(\theta_R + j\theta_I) [ z_R^T (A_0 + r[F + K]) z_R + z_I^T (A_0 + r[F + K]) z_I ] = r [ z_R^T (F - H) z_R + z_I^T (F - H) z_I ], \tag{54}$$

hence

$$\theta_I [ z_R^T (A_0 + r[F + K]) z_R + z_I^T (A_0 + r[F + K]) z_I ] = 0. \tag{55}$$

Since $h(\hat{x}(\hat{\lambda})) = 0$, for $\lambda = \hat{\lambda}$ we have $A_0 + r[F + K] = \bar{A}_0 + r\bar{F}$; compare (41) and (42) with (23) and (24). It has been shown in the proof of Theorem 1 that for sufficiently large r, $\bar{A}_0 + r\bar{F}$ is positive definite on the subspace $\bar{G}_A x = 0$. Due to continuity the matrix $A_0 + r[F + K]$ remains positive definite on the subspace $G_A x = 0$, for all $\lambda$ from some neighbourhood $N_4(\hat{\lambda}) \subseteq N_3(\hat{\lambda})$. Equality (55) then implies $\theta_I = 0$, due to $z \ne 0$. Therefore, all eigenvalues of $P(\hat{w}(\lambda), \hat{w}(\lambda))$ are real for $\lambda \in N_4(\hat{\lambda})$.

It will now be shown that for sufficiently large r all eigenvalues of $P(\hat{w}(\lambda), \hat{w}(\lambda))$ are less than one. To this end let us consider an eigenvalue $\theta \ne 0$ of the matrix $P(\hat{w}, \hat{w})$, where $\hat{w} = \hat{w}(\hat{\lambda}) = (\hat{x}, \hat{\mu}_A)$, and a corresponding eigenvector $(z, t) \ne 0$, which is real since $\theta$ is real. Let $\theta = 1 + \Delta\theta$; then (53) implies

$$z^T (\bar{A}_0 + r\bar{H}) z = -\Delta\theta\, z^T (\bar{A}_0 + r\bar{F}) z, \tag{56}$$

where $\bar{H} = H(\hat{x}(\hat{\lambda}))$. Due to Lemma 1, assumption (S3) implies that $z^T (\bar{A}_0 + r\bar{H}) z > 0$ for $z \ne 0$, $\bar{G}_A z = 0$, provided r is sufficiently large. Hence, using (26), it must be that $\Delta\theta < 0$ and thus $\theta < 1$. Due to the continuity of $P(\hat{w}(\cdot), \hat{w}(\cdot))$ there is a neighbourhood $N(\hat{\lambda}) \subseteq N_4(\hat{\lambda})$ and a value $r_0 > 0$ such that all eigenvalues $\theta_i$ of the matrix $P(\hat{w}(\lambda), \hat{w}(\lambda))$ are less than one for $\lambda \in N(\hat{\lambda})$ and $r > r_0$. If $\sigma_i$ denotes an eigenvalue of $\varphi'(\hat{w}(\lambda))$, then (49) implies that there is an eigenvalue $\theta_i$ such that

$$\sigma_i = 1 + \alpha(\theta_i - 1). \tag{57}$$

Thus, due to $\theta_i < 1$ for all i, $\sigma_i \in (-1, 1)$ for all i provided $\alpha \in (0, \bar{\alpha}(\lambda))$, where $\bar{\alpha}(\lambda)$ is defined by (34).

(c) Let $\sigma_m$ and $\sigma_M$ denote the smallest and largest eigenvalues of $\varphi'(\hat{w}(\lambda))$, $\lambda \in N(\hat{\lambda})$. For $\alpha \in (0, \bar{\alpha}(\lambda))$ linear convergence of the algorithm (31), with convergence ratio equal to $\mathrm{sr}\{\varphi'(\hat{w}(\lambda))\}$, follows from corollaries of Ostrowski's theorem (Ortega and Rheinboldt, 1970). Therefore, (35) follows from (57). The linear dependence of $\sigma_i$ on $\theta_i$ implies that q is smallest when $|\sigma_m| = |\sigma_M|$, which gives (36) and (37). ∎

Theorem 2 shows that, under standard conditions, the simple relaxation algorithm (29) is always convergent, for sufficiently large r and suitably chosen $\alpha$. The convergence is linear, and depends on the spectral properties of the matrix P. Of course, it is rather unreasonable to try to find the best values of r and $\alpha$ for a given problem by analysing the spectrum of P, which is itself a rather complex problem.

Practically, some simple trials around the initial point are sufficient. Numerical experience gathered to date (see Tatjewski (1984, 1988) and Section 6 of this paper) indicates that convergence is good for a wide range of moderate values of r, becoming slower as $r \to \infty$. It was also rather easy to find values of $\alpha$ assuring satisfactory convergence. For the interconnected system optimization problem, which is a special case of (1), an additional condition was also formulated (Tatjewski, 1984) assuring that algorithm (29) with $\alpha = 1$, i.e. the algorithm $s^{k+1} = \hat{x}^k$, is convergent. However, a similar condition for the problem (1) is much more restrictive.

To derive an updating scheme B with faster convergence for cases when (29) happens to be slow, let us view the aim of the approximation loop as finding a solution $\bar{s}(\lambda) = \hat{x}(\lambda)$ of the vector equation

$$F_1(s) \triangleq s - f_1(s) = 0, \tag{58}$$

or, more conveniently,

$$F(w) \triangleq w - f(w) = 0, \tag{59}$$

where $f(w^k) = \hat{w}^k = (\hat{x}^k, \hat{\mu}_A^k)$ is the (locally unique) solution of the operator equation (30), $f(w^k) = (f_1(w^k), f_2(w^k))$, $f_1(w^k) = \hat{x}^k$, and $f_1(s)$ can be written in (58) instead of $f_1(w)$ since $\bar{\mu}_A^k$ is not present in (20) and (21). It follows now from the implicit function theorem that

$$f'(w) = -G_1'(f(w), w)^{-1} G_2'(f(w), w) = P(f(w), w). \tag{60}$$

It was shown in the proof of Theorem 2 that all eigenvalues of $P(\hat{w}(\lambda), \hat{w}(\lambda))$ are less than 1, provided r is sufficiently large and $\lambda$ is sufficiently close to $\hat{\lambda}$. Hence $F'(w) = I - f'(w)$ is nonsingular provided w is sufficiently close to $\hat{w}(\hat{\lambda})$. A Newton algorithm for solving (59) can then be proposed, which reduces, due to the structure of the matrix P, to

$$s^{k+1} = s^k - (I - P_1(\hat{w}^k, w^k))^{-1}(s^k - \hat{x}^k), \tag{61}$$

where $P_1$ is a submatrix of P, with all eigenvalues less than 1, too. Due to the complex nature of $P_1$, direct application of (61) seems rather unreasonable. However, using an appropriate quasi-Newton algorithm is a good possibility. The following algorithm of Broyden is an example:

$$s^{k+1} = s^k - B_k^{-1}(s^k - \hat{x}^k), \tag{62}$$

$$B_{k+1}^{-1} = B_k^{-1} + \frac{(\Delta s^k - B_k^{-1} \Delta F_1^k)(\Delta s^k)^T B_k^{-1}}{(\Delta s^k)^T B_k^{-1} \Delta F_1^k}, \tag{63}$$

$$\Delta s^k = s^{k+1} - s^k, \tag{64}$$

where

$$\Delta F_1^k = F_1(s^{k+1}) - F_1(s^k) = \Delta s^k - \hat{x}^{k+1} + \hat{x}^k. \tag{65}$$
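A compact implementation of (62)-(65) is sketched below, assuming $B_0^{-1} = I$ (the example in Section 6 uses $B_0^{-1} = \alpha I$) and a user-supplied routine `solve_local_points` standing in for one pass of the lower level problems (7); no safeguarding of the denominator in (63) is included:

```python
import numpy as np

def broyden_approximation_loop(s0, solve_local_points, tol=1e-8, max_iter=50):
    """Quasi-Newton updating of the approximation point, (62)-(65)."""
    s = np.asarray(s0, dtype=float)
    B_inv = np.eye(s.size)                # inverse is approximated directly
    x_hat = solve_local_points(s)
    for _ in range(max_iter):
        F_s = s - x_hat                   # F_1(s^k) of (58)
        if np.linalg.norm(F_s) <= tol:
            break
        ds = -B_inv @ F_s                 # (62): s^{k+1} = s^k - B_k^{-1} F_1(s^k)
        s = s + ds
        x_hat_new = solve_local_points(s)
        dF = ds - x_hat_new + x_hat       # (65)
        BdF = B_inv @ dF
        B_inv = B_inv + np.outer(ds - BdF, ds @ B_inv) / (ds @ BdF)   # (63)
        x_hat = x_hat_new
    return s, x_hat
```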

It should be emphasized that the matrix $F_1' = I - P_1$, the inverse of which is to be approximated, is not symmetric. It is well known (see e.g. Schwetlick (1979)) that Broyden's algorithm is one of the most effective for such problems. The quasi-Newton algorithm (62) and (63) requires many more computations per iteration than the simple relaxation algorithm (29), the more so the larger the dimensionality n of the initial problem. Therefore, its application is reasonable when the gain in computation time due to a smaller number of iterations outweighs the loss due to the increased complexity per iteration. This happens, usually, when the local optimization problems (7) are not simple.

5. MULTIPLIER UPDATING RULE

An algorithm for updating the Lagrange multipliers $\lambda$ (i.e. the adjusting rule M, see Section 2) will now be considered, to complete the analysis. We are in a good position here, since after every run of the approximation loop a point $x^{l+1} = \hat{x}^k$ is obtained minimizing the original augmented Lagrange function $L_r(x, \lambda^l)$ subject to the local constraints $g_i(x_i) \le 0$, $i = 1, \ldots, N$. Therefore, for updating $\lambda$ we can adapt schemes from general purpose optimization routines. Adaptation is needed since, in the standard approach, all (or at least all nonlinear) constraints are involved in the augmented Lagrangian, whereas in our case the local constraints are not; we use, in fact, a "partial" augmented Lagrange function. The best known and most useful adjusting rule for $\lambda$ is the following Hestenes-Powell multiplier rule:

$$\lambda^{l+1} = \lambda^l + r\, h(x^{l+1}). \tag{66}$$
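In code, rule (66) is a one-line update. A sketch, assuming `h` returns the stacked interaction constraint values of problem (1):

```python
def multiplier_update(lam, x_next, r, h):
    """Hestenes-Powell rule (66) for the 'partial' augmented Lagrangian:
    only the interaction constraint h of (1) enters the update; the local
    constraints g_i stay inside the lower level problems (7)."""
    return lam + r * h(x_next)
```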

The following result is basic for the convergence analysis of (66) in our case of the "partial" augmented Lagrangian. It takes into account only approximate solutions of problem (12) and a sequence $\{r^l\}$ of penalty coefficients, a constant value $r^l = r$ being a special case.

Theorem 3. If the assumptions (S1), (S2) and (S3) are satisfied, then there are positive scalars $\bar{r}$, $\tau$, M and $\varepsilon$ such that there is a unique, within a sphere $S((\hat{x}, \hat{\mu}_A), \varepsilon)$, solution $(\bar{x}^l, \bar{\mu}_A^l)$ of the equations

$$\nabla_x L_{r^l}(x, \lambda^l) + g_A'(x)^T \mu_A = a_1^l, \tag{67}$$
$$g_A(x) = a_2^l, \tag{68}$$

for every $(\lambda^l, r^l, a^l) \in D$, where $a^l = (a_1^l, a_2^l) \in R^{n + p_A}$, and

$$D = \{ (\lambda, r, a) : ( \| \lambda - \hat{\lambda} \|^2 / r^2 + \| a \|^2 )^{1/2} < \tau, \; r \ge \bar{r} \}. \tag{69}$$


Moreover, the following estimation holds:

$$\| \lambda^l + r^l h(\bar{x}^l) - \hat{\lambda} \| \le M ( \| \lambda^l - \hat{\lambda} \| / r^l + \| a^l \| ). \tag{70}$$

If a sequence $\{\delta_l\}$ is used such that

$$\| a^l \| \le \delta_l \| \lambda^l - \hat{\lambda} \| / r^l, \tag{71}$$
(71)

where $\{\delta_l\}$ is a nonincreasing sequence converging to zero, then for $r^l > M(1 + 2\gamma)$

$$\| \lambda^l + r^l h(\bar{x}^l) - \hat{\lambda} \| \le \frac{M(1 + 2\gamma)}{r^l} \| \lambda^l - \hat{\lambda} \|. \tag{72}$$
(72)

Proof. The proof proceeds along the lines of the proof of Theorem 2.14 in Bertsekas (1982), and of the considerations following that theorem. It is omitted here, to keep the paper to a reasonable length. ∎

It follows from Theorem 3 that if the subsequent points $\bar{x}^l = x^{l+1}$ satisfy

$$\left( \| \nabla_x L_{r^l}(\bar{x}^l, \lambda^l) + g_A'(\bar{x}^l)^T \bar{\mu}^l \|^2 + \| g_A(\bar{x}^l) \|^2 \right)^{1/2} \le \delta_l, \tag{73}$$

where $\{\delta_l\}$ is an arbitrary sequence converging to zero or is given by (71), then $\lambda^l \to \hat{\lambda}$. However, the convergence rate is linear only when $\delta_l \to 0$ as fast as $\| \lambda^l - \hat{\lambda} \| / r^l$, which is satisfied for (71). The termination criterion for the approximation loop corresponding to (73) is

$$\left( \| \nabla_x L_{r^l}(\hat{x}^k, \lambda^l) + g_A'(\hat{x}^k)^T \hat{\mu}^k \|^2 + \| g_A(\hat{x}^k) \|^2 \right)^{1/2} \le \delta_l, \tag{74}$$

and can be used instead of (8). However, (8) gives good practical results and is simpler. In fact, it can easily be shown that (8) implies (74), if only the local optimization problems (7) are solved sufficiently accurately, with the solution error of the necessary optimality conditions tending to zero as fast as $\delta_l$. When using (71), a posteriori building of the sequence $\{\delta_l\}$ is usually recommended. In this case criterion (8) can be written in the compact form

$$\| s^k - \hat{x}^k \| \le \delta_l, \tag{75}$$
$$\delta_{l+1} = \gamma d^l, \quad \delta_0 \text{ given}. \tag{76}$$

Relation (72) implies that the larger the value of r, the better the convergence rate of (66). However, as r increases, the local optimization problems (7) become ill-conditioned, so a trade-off must be found. The method can work well with a constant, sufficiently large (but moderate) value of r, chosen by some initial trials. This way of operation is recommended, first of all, for the case when a sequence of


problems (1) with varying parameters is solved. Theorem 3 also justifies the use of an increasing sequence $\{r^l\}$ of penalty coefficients. Automatic procedures for building such sequences, with penalty limitation, are well known in general purpose optimization theory (see e.g. Bertsekas (1982), and Polak and Tits (1980)). They can easily be incorporated into our method.

6. AN APPLICATION EXAMPLE

Let us consider the following separable optimization problem:

$$\text{minimize } \sum_{k=0}^{11} t(k) \left[ 294 u_1(k)^2 + 503 u_1(k) + 707.5 u_2(k)^2 + 1385 u_2(k) + 154.5 \right] \tag{77}$$

subject to:

$$z_1(k+1) = z_1(k) + 3.6 u_3(k); \tag{78}$$
$$z_2(k+1) = z_2(k) + 7.2 \left( u_1(k) + u_2(k) - u_3(k) - d(k) \right), \quad k = 0, \ldots, 11; \tag{79}$$
$$z_1(k+1) + 155.2 u_1(k) + a u_3(k) + 47 d_1(k)^2 - b - 59.5 \le 0, \quad k = 0, \ldots, 7; \tag{80}$$
$$z_1(k+1) + 124 u_2(k) + a u_3(k) + 47 d_1(k)^2 - b - 59.5 \le 0; \tag{81}$$
$$z_1(k+1) + 155.2 u_1(k) + a u_3(k) + 47 d_1(k)^2 - b - 60 \le 0, \quad z_1(k+1) + 124 u_2(k) + a u_3(k) + 47 d_1(k)^2 - b - 60 \le 0, \quad k = 8, \ldots, 11; \tag{82}$$
$$0.15 \le u_1(k) \le 0.39; \tag{83}$$
$$0.146 \le u_2(k) \le 0.528; \tag{84}$$
$$0.5 \le z_1(k) \le \ldots; \tag{85}$$
$$1 \le z_2(k) \le 6, \quad k = 0, \ldots, 11; \tag{86}$$
$$-0.632 \le u_3(k) \le 0, \quad k = 0, \ldots, 7; \tag{87}$$
$$0 \le u_3(k), \quad k = 8, \ldots, 11; \tag{88}$$
$$z_1(0) = z_1(12) = z_2(0) = z_2(12) = 0; \tag{89}$$
$$t(k) = \begin{cases} 0.8, & k = 8, 9, 10, 11; \\ 2, & k = 1, 2, 3, 4; \\ 3.5, & k = 0, 5, 6, 7; \end{cases} \tag{90}$$
$$a = \begin{cases} 5 + 94 d_1(k), & u_3(k) \in [0, 0.1); \\ 27.5 + 94 d_1(k), & u_3(k) \ge 0.1; \end{cases} \tag{91}$$
$$d = (0.7153, 0.7734, 0.8103, 0.8723, 0.9321, 0.7163, 0.5941, 0.6553, 0.6921, 0.4440, 0.4334, 0.5063); \tag{92}$$
$$d_1 = (0.2748, 0.3377, 0.3729, 0.3738, 0.3796, 0.2234, 0.1854, 0.2527, 0.2189, 0.2452, 0.2618). \tag{93}$$

The presented problem constitutes a discrete optimal control problem with many constraints on the controls $u_1(k)$, $u_2(k)$, $u_3(k)$ and the states $z_1(k)$, $z_2(k)$. The problem has been formulated for repeated calculations of optimal schedules of reservoir levels of a water supply network, being an element of a large water management system (Findeisen et al., 1984). Mathematically, (77)-(93) is a convex optimization problem with 60 variables $u_1(k)$, $u_2(k)$, $u_3(k)$, $k = 0, 1, \ldots, 11$, and $z_1(k)$, $z_2(k)$, $k = 1, 2, \ldots, 12$. It is not strictly convex, and this precludes application of the classical dual method (see e.g. Findeisen et al. (1980)). The augmented price method was therefore applied. The augmented Lagrange function $L_r$, see (2), was formulated for (77)-(93) after converting the inequalities into equalities using additional (slack) variables, which increased the dimensionality of the problem to n = 76; the dimensionality of $\lambda$ is m = 40. The function $\Lambda_r$, see (4), was formulated for N = 76, i.e. treating each scalar variable as belonging to its own "subproblem". This was possible due to the simple structure of all the local constraints (83)-(89) and led to 76 one-dimensional, simply constrained local optimization problems (7), and thus to high effectiveness of the approach.

An example run of the method, with the relaxation algorithm (29) with $\alpha = 0.48$ and dynamic accuracy using (75) and (76) with $\gamma = 0.1$ in the approximation loop, is presented in Fig. 1. Points on the curves indicate subsequent modifications of the multipliers, adjusted according to (66), and $I_k$ denotes the total number of solutions of local optimization problems. The curves show the performance function $\Lambda_r$ and the norm $d^l$, see (10), which is a good measure of the distance from the optimum. They are shown until $d^l \le \varepsilon = 0.01$, which yields a quite accurate solution. The initial values of the 76 + 40 variables were such that $d^l \approx 10.4$ after the first termination of the approximation loop, and r = 150 was used.

FIG. 1. Example run of the algorithm.

Figures 2 and 3 provide insight into the sensitivity of the method to its parameters. Figure 2 shows the dependence of $I_k$ on the penalty coefficient r. Two cases are presented: with the relaxation algorithm (29) and with the quasi-Newton algorithm (62)-(63) with $B_0^{-1} = \alpha I$, both for $\alpha = 0.48$ and all other choices as for Fig. 1.

FIG. 2. Effect of the value of r on performance.

For small values of r both versions become inefficient due to the low efficiency of (66). For larger values of r, the quasi-Newton updating yields results practically insensitive to r, as expected. Figure 3 shows the dependence of $I_k$ on $\alpha$, all other choices being as for Fig. 1. It is remarkable that the range of acceptable initial diagonal matrices $B_0^{-1} = \alpha I$ is rather wide. Similar behaviour of the method was also encountered for other examples (Tatjewski, 1988); it can therefore be treated as typical, at least for some classes of problems. As indicated earlier, the number of lower level solutions $I_k$ cannot be the only measure for comparing the two versions of the method. For easy local optimization problems (7) and large dimensionality n, the time needed for the quasi-Newton updating can be dominating. This was just the case in our example, where every run with this updating took more computing time than the one with the relaxation algorithm (29), in spite of the smaller value of $I_k$.

FIG. 3. Effect of the value of $\alpha$ on performance.

7. CONCLUSIONS

A new decomposition method for nonconvex optimization problems of general separable form has been presented and thoroughly analysed in


the paper; it eliminates, to a great extent, the drawbacks of previous approaches. The method is a dual-type one, based on the augmented Lagrange function in an approximate, separable form involving auxiliary variables. The method is, in fact, a three-level one, but with the optimization task at the fully decentralized lowest level only, and with simple and efficient adjusting rules at the higher levels. Convergence of these rules has been thoroughly analysed in the paper, and a representative numerical example, stemming from optimal control applications, has also been given. It should also be noted that the basic idea underlying the presented method has found a fruitful application in on-line steady-state control of interconnected processes (Tatjewski, 1985).

Acknowledgements--This work was supported in part by the Polish Ministry of National Education under Grant RP.1.02. The author also wishes to thank the referees for helpful comments.

REFERENCES

Arrow, K. J., F. J. Gould and S. M. Howe (1973). A general saddle point result for constrained optimization. Math. Prog., 5, 225-234.
Bertsekas, D. P. (1979). Convexification procedures and decomposition methods for nonconvex optimization problems. J. Opt. Theory Applic., 29, 169-197.
Bertsekas, D. P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York.
Engelmann, B. and P. Tatjewski (1986). Accelerated algorithms of the augmented interaction balance method for large-scale system optimization and control. J. Syst. Anal. Model. Simul., 3, 209-226.
Findeisen, W., F. N. Bailey, M. Brdyś, K. Malinowski, P. Tatjewski and A. Woźniak (1980). Control and Coordination in Hierarchical Systems. J. Wiley, Chichester/New York.
Findeisen, W., M. Brdyś, B. Frelek, H. Michalska, P. Tatjewski and A. Woźniak (1984). Hierarchical structure design for decision making and control in a large water management system--a case study paper. In Proc. 3rd IFAC Symposium Large Scale Systems: Theory and Applications, pp. 619-636. Pergamon Press, Oxford.
Lasdon, L. S. (1970). Optimization Theory for Large Systems. Macmillan, London.
Ortega, J. M. and W. C. Rheinboldt (1970). Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, New York/London.
Polak, E. and A. Tits (1980). A globally convergent, implementable multiplier method with automatic penalty limitation. Appl. Math. Optimiz., 6, 335-360.
Schwetlick, H. (1979). Numerische Lösung Nichtlinearer Gleichungen. Verlag der Wissenschaften, Berlin.
Stephanopoulos, G. and A. W. Westerberg (1975). The use of Hestenes' method of multipliers to resolve dual gaps in engineering system optimization. J. Opt. Theory Applic., 15, 285-309.
Stoilov, E. (1977). An algorithm of augmented Lagrangians in two-level static optimization. Arch. Aut. Telemech., XXII, 219-237 (in Polish).
Tanikawa, A. and H. Mukai (1985). A new technique for nonconvex primal-dual decomposition of a large scale separable optimization problem. IEEE Trans. Aut. Control, AC-30, 133-143.
Tatjewski, P. (1984). A hierarchical algorithm for large-scale system optimization problems with duality gaps. In P. Thoft-Christensen (Ed.), Modelling and Optimization, Proc. 11th IFIP Conf., pp. 662-667. Springer, Berlin.
Tatjewski, P. (1985). On-line hierarchical control of steady-state systems using the augmented interaction balance method with feedback. Large Scale Syst., 8, 1-18.
Tatjewski, P. (1986). New dual-type decomposition algorithms for nonconvex separable optimization problems. In Prepr. 4th IFAC Symposium Large Scale Systems: Theory and Applications, Zürich, pp. 296-303.
Tatjewski, P. (1988). Hierarchical Methods of Optimization and Steady-state Control of Complex, Nonconvex Processes. Wydawnictwa Politechniki Warszawskiej, Warsaw (in Polish).
Watanabe, N., Y. Nishimura and M. Matsubara (1978). Decomposition in large system optimization using the method of multipliers. J. Opt. Theory Applic., 25, 181-193.

and Steady-state Control of Complex, Nonconvex Processes. Wydawnictwa Politechniki Warszawskiej~ Warsaw (in Polish). Watanabe, N., Y. Nishimura and M. Matsubara (1978). Decomposition in large system optimization using the method of multipliers. J. Opt. Theory. Applic., 25, 181-193.