A gradient method for the modified Lagrange function

U.S.S.R. Comput. Maths. Math. Phys. Vol. 19, pp. 57-71. © Pergamon Press Ltd. 1980. Printed in Great Britain.

0041-5553/79/0201/0057$07.50/0

A GRADIENT METHOD FOR THE MODIFIED LAGRANGE FUNCTION*

G. D. MAISTROVSKII

Kharkov

(Received 4 October 1977)

A GRADIENT method with an adaptive procedure for choosing the step length is applied to search for the saddle point of the modified Lagrange function of a convex programming problem. It is shown that the process is convergent to a saddle point. When sufficient conditions for a strict regular maximum are satisfied, the rate of convergence is exponential.

Consider in $R^n$ the convex programming problem

$$\max f(x),\qquad g_i(x)\ge 0,\quad i=1,\dots,m, \tag{0.1}$$

with non-empty set $Z^*=X^*\times Y^*$ of saddle points of the Lagrange function $L$. We shall assume that the concave functions $f, g_1,\dots,g_m$ are differentiable, and that their derivatives satisfy a Lipschitz condition in any compactum. A gradient method for seeking a point of the set $Z^*$ was proposed in [1], and is usually known as the method of Lagrange multipliers. This method is not in general convergent to the set $Z^*$, even when the standard sufficient conditions for a second-order maximum are satisfied along with the conditions for strict regularity. The convergence holds only when extra conditions on the spectrum of the Hessian of the function $L$ are satisfied, assuming, of course, that the step length of the method is sufficiently small. If the set of saddle points is stable with respect to $x$, the method is convergent to the $\varepsilon$-neighbourhood of the set $Z^*$ for any $\varepsilon>0$, provided that the step length is sufficiently small compared with $\varepsilon$ (see [2]). In the absence of stability (this is the situation e.g. in a linear programming problem), even $\varepsilon$-convergence cannot be guaranteed.

The method described in [1] is not an algorithm, inasmuch as there is no procedure for finding the step length. Selection of the step factors is difficult, for the following reasons. A fixed step multiplier may not ensure the required smallness of the step. If a sequence of step multipliers $\{\tau^k\}_0^\infty$ is chosen, so that the conditions

$$\lim_{k\to\infty}\tau^k=0,\qquad \sum_{k=0}^{\infty}\tau^k=\infty,\qquad \max_k\tau^k\le\bar\tau$$

are satisfied, convergence to the set $Z^*$ is ensured (provided that the set of saddle points is stable with respect to $x$), though only in the case when $\bar\tau$ is sufficiently small [2]. But it is not obvious a priori how this latter quantity should be chosen, since no effective estimates are available. Moreover, if we use a method in which the step multiplier tends to zero, we inevitably get slow convergence.

In the present paper, the method described in [1], supplemented by an adaptive procedure for choosing the step length, is applied to a modified Lagrange function, instead of to the classical function itself. We will show that the method is convergent to the set $Z^*$, while $\lim_{k\to\infty}\tau^k>0$.

*Zh. vychisl. Mat. mat. Fiz., 19, 1, 56-69, 1979.

In the case when the standard conditions for a maximum, and the conditions of strict regularity, are satisfied, the method converges at the rate of a geometric progression. In Sections 1 and 2, the method is studied in the context of a class of concave functions which are not in general connected with problem (0.1); and in Section 3 our results are illustrated by taking the example of the modified function, studied in [4-7] and some other papers.

1. Description of the method and proof of convergence

Let $\psi$ be a concave-convex differentiable function in $R^n\times R^m$, with a derivative that satisfies a Lipschitz condition in any compactum. Let $\psi'(z)$ denote the value of the derivative at an arbitrary point $z=(x,y)$, and $\psi_x(x,y)$, $\psi_y(x,y)$ the partial derivatives with respect to $x$ and $y$. Hence $\psi'(z)=(\psi_x(x,y),\psi_y(x,y))$. We shall assume that the set $Z^*=X^*\times Y^*$ of saddle points of $\psi$ is non-empty. Given any concave-convex function, we have the inequality (see [1, 8])

$$(x-x^*,\psi_x(x,y))-(y-y^*,\psi_y(x,y))\le0,\qquad (x,y)\in R^n\times R^m,\quad (x^*,y^*)\in Z^*. \tag{1.1}$$

We shall impose a stronger condition on $\psi$: a $\gamma>0$ exists such that

$$(x-x^*,\psi_x(x,y))-(y-y^*,\psi_y(x,y))\le-\gamma\|\psi_y(x,y)\|^2,\qquad (x,y)\in R^n\times R^m,\quad (x^*,y^*)\in Z^*. \tag{1.2}$$

Here and below, $\|\cdot\|$ denotes the Euclidean norm. Let $z^0=(x^0,y^0)$ be an arbitrary point of space $R^n\times R^m$, and $\tau^0$ any positive number satisfying the inequality $\tau^0\le\min\{\gamma,\,2^{-1}\|\psi'(z^0)\|^{-2}\}$. Consider the sequence $\{z^k\}_0^\infty$, $z^k=(x^k,y^k)\in R^n\times R^m$, and the numerical sequence $\{\tau^k\}_0^\infty$, defined respectively by the recurrence relations

$$x^{k+1}=x^k+\tau^k\psi_x(x^k,y^k),\qquad y^{k+1}=y^k-\tau^k\psi_y(x^k,y^k), \tag{1.3}$$

$$\tau^{k+1}=\min\{\tau^k(1-\tau^k\|\psi'(z^k)\|^2),\ 2^{-1}\|\psi'(z^{k+1})\|^{-2}\}. \tag{1.4}$$
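As a hedged illustration of recurrences (1.3), (1.4), the following Python sketch runs the process on a toy function that is not taken from the paper: $\psi(x,y)=-x^2/2+xy$, which is concave in $x$, linear in $y$, has the saddle point $(0,0)$, and satisfies (1.2) with $\gamma=1$. The step rule is coded under the reading $\tau^{k+1}=\min\{\tau^k(1-\tau^k\|\psi'(z^k)\|^2),\,2^{-1}\|\psi'(z^{k+1})\|^{-2}\}$; the names `toy_grad` and `run_process` are illustrative assumptions.

```python
import math

def toy_grad(x, y):
    """Partial derivatives (psi_x, psi_y) of the assumed toy psi = -x**2/2 + x*y."""
    return -x + y, x

def run_process(x, y, gamma=1.0, steps=20000):
    """Ascent in x, descent in y, with the adaptive step multiplier tau."""
    gx, gy = toy_grad(x, y)
    g2 = gx * gx + gy * gy                      # ||psi'(z^0)||^2
    tau = min(gamma, 0.5 / g2) if g2 > 0.0 else gamma
    taus = [tau]
    for _ in range(steps):
        if g2 == 0.0:                           # reached a saddle point (or underflow)
            break
        x, y = x + tau * gx, y - tau * gy       # recurrence (1.3)
        shrunk = tau * (1.0 - tau * g2)         # first candidate in (1.4)
        gx, gy = toy_grad(x, y)
        g2 = gx * gx + gy * gy                  # ||psi'(z^{k+1})||^2
        cap = 0.5 / g2 if g2 > 0.0 else math.inf
        tau = min(shrunk, cap)                  # recurrence (1.4)
        taus.append(tau)
    return x, y, taus

x, y, taus = run_process(1.0, 1.0)
print(math.hypot(x, y), taus[-1])
```

In this toy run the step multiplier never increases and stays bounded away from zero while the iterates approach the saddle point, which is the behaviour that Theorem 1 asserts for the general process.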

To avoid the need to make formal remarks regarding division by a quantity that vanishes for $z^k\in Z^*$, we shall exclude from our discussion the trivial case of convergence to the set $Z^*$ after a finite number of steps.

Theorem 1

1. There exists a number $k_0$ such that $\tau^{k+1}=\tau^k(1-\tau^k\|\psi'(z^k)\|^2)$, $k>k_0$.

2. $\lim_{k\to\infty}\tau^k>0$. (1.5)

3. The sequence $\{z^k\}_0^\infty$ is convergent to a point of the set $Z^*$.

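For the proof, we need three preliminary lemmas. In accordance with the accepted terminology, we shall say that process (1.3), (1.4) is compact, if the sequence $\{\|z^k\|\}_0^\infty$ is bounded.

Lemma 1

Process (1.3), (1.4) is compact.

Proof. Let $z^*=(x^*,y^*)\in Z^*$. By (1.3),

$$\|z^{k+1}-z^*\|^2=\|z^k-z^*\|^2+2\tau^k[(x^k-x^*,\psi_x(x^k,y^k))-(y^k-y^*,\psi_y(x^k,y^k))]+(\tau^k)^2\|\psi'(z^k)\|^2.$$

Hence we find, using inequality (1.1),

$$\|z^{k+1}-z^*\|^2\le\|z^k-z^*\|^2+(\tau^k)^2\|\psi'(z^k)\|^2. \tag{1.6}$$

On the other hand, it follows from (1.4) that

$$\tau^{k+1}\le\tau^k-(\tau^k)^2\|\psi'(z^k)\|^2. \tag{1.7}$$

Adding (1.6) and (1.7), we get

$$\|z^{k+1}-z^*\|^2+\tau^{k+1}\le\|z^k-z^*\|^2+\tau^k,$$

and hence

$$\|z^k-z^*\|^2+\tau^k\le\|z^0-z^*\|^2+\tau^0.$$

Now observe that, from relation (1.4), we have $\tau^i\le2^{-1}\|\psi'(z^i)\|^{-2}$, $i=1,2,\dots$. Hence, by the same expression, $\tau^{k+1}\ge2^{-1}\min\{\tau^k,\|\psi'(z^{k+1})\|^{-2}\}>0$, $k=0,1,\dots$. Hence $\|z^k-z^*\|^2\le\|z^0-z^*\|^2+\tau^0$. This proves the lemma.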
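The compactness argument rests on a quantity that cannot grow along the process: combining inequality (1.1) with the step rule (1.4) shows that $\|z^k-z^*\|^2+\tau^k$ is non-increasing. The sketch below checks this numerically, as an assumption-laden illustration only: it uses the toy function $\psi(x,y)=-x^2/2+xy$ with saddle point $z^*=(0,0)$ and $\gamma=1$, neither of which comes from the paper, and the helper names are hypothetical.

```python
def toy_grad(x, y):
    """(psi_x, psi_y) for the assumed toy psi = -x**2/2 + x*y, saddle at (0, 0)."""
    return -x + y, x

def invariant_trace(x, y, gamma=1.0, steps=200):
    """Return the values ||z^k - z*||^2 + tau^k along the process, with z* = (0, 0)."""
    gx, gy = toy_grad(x, y)
    g2 = gx * gx + gy * gy
    tau = min(gamma, 0.5 / g2)                  # admissible initial multiplier
    values = [x * x + y * y + tau]
    for _ in range(steps):
        x, y = x + tau * gx, y - tau * gy       # recurrence (1.3)
        shrunk = tau * (1.0 - tau * g2)
        gx, gy = toy_grad(x, y)
        g2 = gx * gx + gy * gy
        tau = min(shrunk, 0.5 / g2)             # recurrence (1.4)
        values.append(x * x + y * y + tau)
    return values

values = invariant_trace(2.0, -1.0)
print(values[0], values[-1])
```

Since the traced quantity starts at $\|z^0-z^*\|^2+\tau^0$ and never increases, every iterate stays in a fixed ball around $z^*$, which is exactly the boundedness claimed by Lemma 1.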

Lemma 2

A number $k_0$ exists such that

$$\tau^{k+1}=\tau^k(1-\tau^k\|\psi'(z^k)\|^2),\qquad k>k_0. \tag{1.8}$$

Proof. Assume that the lemma is false. Then it follows from (1.4) that, for an infinite subsequence of numbers $k$, we have

$$\tau^{k+1}=2^{-1}\|\psi'(z^{k+1})\|^{-2}.$$

To arrive at a contradiction, it is sufficient to show that, in fact,

$$\lim_{i\to\infty}\tau^i\|\psi'(z^i)\|^2=0. \tag{1.9}$$

If the series

$$\sum_{i=0}^{\infty}\tau^i\|\psi'(z^i)\|^2 \tag{1.10}$$

is convergent, then Eq. (1.9) is satisfied. Now assume that series (1.10) is divergent. From inequality (1.7) we have

$$\tau^{k+1}\le\tau^0\prod_{i=0}^{k}(1-\tau^i\|\psi'(z^i)\|^2).$$

Since the series (1.10) is divergent, it follows from the last inequality that

$$\lim_{k\to\infty}\tau^k=0. \tag{1.11}$$

But process (1.3), (1.4) is compact. Hence

$$\sup_k\|\psi'(z^k)\|<\infty.$$

Equation (1.9) follows from the last two relations. This proves the lemma.

Fix a point $z^*=(x^*,y^*)\in Z^*$ and consider the function

$$G(z)=\|z-z^*\|^2-\gamma\psi(z),$$

where $\gamma$ is the quantity appearing in inequality (1.2). We choose a convex compactum $K$ containing all points $z^k$. The possibility of such a choice is ensured by the compactness of the process. Let $M$ be the Lipschitz constant for the derivative of the function $G$ in the set $K$, and $\bar\tau$ any positive number satisfying

$$\bar\tau<2\gamma M^{-1}.$$

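Lemma 3

If $\tau^k<\bar\tau$, then

$$G(z^{k+1})\le G(z^k)-\tau^k\Bigl(\gamma-\frac{M\bar\tau}{2}\Bigr)\|\psi'(z^k)\|^2. \tag{1.12}$$

Proof. Using Taylor's formula, we arrive at the inequality

$$G(z^{k+1})\le G(z^k)+(G'(z^k),z^{k+1}-z^k)+\frac{M}{2}\|z^{k+1}-z^k\|^2,$$

i.e.

$$G(z^{k+1})\le G(z^k)+2\tau^k[(x^k-x^*,\psi_x(x^k,y^k))-(y^k-y^*,\psi_y(x^k,y^k))]-\gamma\tau^k[\|\psi_x(x^k,y^k)\|^2-\|\psi_y(x^k,y^k)\|^2]+\frac{M}{2}(\tau^k)^2\|\psi'(z^k)\|^2.$$

Hence, using inequality (1.2), and collecting like terms, we obtain

$$G(z^{k+1})\le G(z^k)-\tau^k\Bigl(\gamma-\frac{M\tau^k}{2}\Bigr)\|\psi'(z^k)\|^2.$$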

This proves the lemma.

Proof of Theorem 1. Para. 1 is already proved. To prove Para. 2, we start by assuming that series (1.10) is divergent. We showed in the proof of Lemma 2 that this assumption implies that Eq. (1.11) is satisfied. Hence a number $k_1$ exists, such that, for all $k>k_1$, we have $\tau^k<\bar\tau$, where $\bar\tau$ is the quantity appearing in the statement of Lemma 3. Hence, for $k>k_1$, inequalities (1.12) hold. On adding these inequalities, we obtain

$$\Bigl(\gamma-\frac{M\bar\tau}{2}\Bigr)\sum_{k=k_1}^{\infty}\tau^k\|\psi'(z^k)\|^2\le G(z^{k_1})-\liminf_{k\to\infty}G(z^k).$$

Since the process is compact, the sequence $\{G(z^k)\}_0^\infty$ is bounded. Hence series (1.10) is in fact convergent. Hence, from Eq. (1.8) and the well-known property of an infinite product, we obtain inequality (1.5).

Let us now prove Para. 3 of the theorem. It follows from inequality (1.5) and the convergence of series (1.10) that

$$\sum_{k=0}^{\infty}\|\psi'(z^k)\|^2<\infty. \tag{1.13}$$

Hence

$$\lim_{k\to\infty}\psi'(z^k)=0. \tag{1.14}$$

Since the process is compact, it has at least one limit point $z^*$. It follows from (1.14) that $\psi'(z^*)=0$, and hence $z^*\in Z^*$. We choose an arbitrary $\varepsilon>0$. In view of inequality (1.13), a number $k_2$ exists such that

$$\sum_{i=k_2}^{\infty}(\tau^i)^2\|\psi'(z^i)\|^2<\frac{\varepsilon^2}{2}. \tag{1.15}$$

Since $z^*$ is the limit point of the process, a number $k_3>k_2$ exists, such that

$$\|z^{k_3}-z^*\|^2<\frac{\varepsilon^2}{2}. \tag{1.16}$$

But it follows from inequality (1.6) that, for any number $k>k_3$,

$$\|z^k-z^*\|^2\le\|z^{k_3}-z^*\|^2+\sum_{i=k_3}^{k-1}(\tau^i)^2\|\psi'(z^i)\|^2. \tag{1.17}$$

On combining inequalities (1.15)-(1.17), we obtain

$$\|z^k-z^*\|<\varepsilon,\qquad k>k_3.$$

Since $\varepsilon$ is an arbitrary positive number, it now follows that

$$\lim_{k\to\infty}z^k=z^*.$$

This proves the theorem.

2. Rate of convergence

Consider the conditions under which the convergence of the process at the rate of a geometric progression can be guaranteed. We shall use the following standard notation. If $T: R^n\to R^m$ is a linear operator, then $T^*$ is the adjoint operator to $T$, $\operatorname{Im}T$ is the image of $T$, and $\operatorname{Ker}T$ is the kernel of $T$. We denote by $I$ the identity mapping in any space. By an eigenvalue of a real operator we mean an eigenvalue of its complex extension. The term “spectrum” will be used in the corresponding way.

Theorem 2

Let the set $Z^*$ consist of a single point $z^*=(x^*,y^*)$, let the function $\psi$ be thrice differentiable at this point, let the operator $\psi_{xx}(x^*,y^*)$ be negative definite, and the operator $\psi_{yy}(x^*,y^*)$ non-negative, and let $\operatorname{Ker}\psi_{yy}(x^*,y^*)=\operatorname{Im}\psi_{yx}(x^*,y^*)$. Then process (1.3), (1.4) is convergent to the point $z^*$ at the rate of a geometric progression.

We consider in $R^{n+m}$ the mapping $F$, defined by the equation $F(z)=(\psi_x(x,y),-\psi_y(x,y))$, $z=(x,y)$. Let

$$\tau=\lim_{k\to\infty}\tau^k.$$

By Theorem 1, $\tau>0$, the sequence $\{z^k\}_0^\infty$ is convergent to the point $z^*$, while, for all sufficiently large $k$, we have the equations

$$z^{k+1}=z^k+\tau^kF(z^k),\qquad \tau^{k+1}=\tau^k(1-\tau^k\|F(z^k)\|^2). \tag{2.1}$$

We put $D=I+\tau F'(z^*)$. Then the operator $D$ has the matrix representation

$$D=\begin{pmatrix}I-A & B^*\\ -B & I-C\end{pmatrix}, \tag{2.2}$$

where $A=-\tau\psi_{xx}(x^*,y^*)$, $B=\tau\psi_{yx}(x^*,y^*)$, $C=\tau\psi_{yy}(x^*,y^*)$. It can be shown that, if $\|A\|<2$ and $\|C\|<2$, i.e.

$$\tau\|\psi_{xx}(x^*,y^*)\|<2,\qquad \tau\|\psi_{yy}(x^*,y^*)\|<2, \tag{2.3}$$

then the spectral radius of operator $D$ is less than 1. In this situation, our theorem can easily be derived from Lyapunov's theorem. The difficulty thus lies in the fact that satisfaction of inequalities (2.3) cannot in general be guaranteed. (In particular, this means that the unit circle does not necessarily split up the spectrum of operator $D$.) A similar difficulty was overcome in [9, 10] when proving the exponential convergence of the iterative process $\{z^k\}_0^\infty$ satisfying the condition for finiteness of the path length, i.e.

$$\sum_{k=0}^{\infty}\|z^{k+1}-z^k\|<\infty. \tag{2.4}$$

Admittedly, our process (2.1) is not iterative with respect to the sequence $\{z^k\}_0^\infty$, since the step multiplier is not constant. Also, instead of condition (2.4), we can only guarantee a priori for it the weaker condition

$$\sum_{k=0}^{\infty}\|z^{k+1}-z^k\|^2<\infty.$$

Yet in spite of these differences, it turns out that the scheme developed in [9, 10] is applicable to our present situation. We shall therefore use this scheme. As a preliminary, we shall examine the spectral properties of the operator $D$.

Lemma 4

Let $A$, $B$, $C$ be linear operators, $A: R^n\to R^n$, $B: R^n\to R^m$, $C: R^m\to R^m$, satisfying the conditions

$$A=A^*>B^*B,\qquad C=C^*\ge0,\qquad \operatorname{Ker}C=\operatorname{Im}B, \tag{2.5}$$

while $D: R^{n+m}\to R^{n+m}$ is the operator defined by Eq. (2.2). Then:

(1) all the real eigenvalues of the operator $D$ are less than 1,
(2) all the non-real eigenvalues of the operator $D$ are inside the unit circle,
(3) the Jordan cells, corresponding to the real eigenvalues $\mu\le-1$, are one-dimensional.

Proof. Let $\mu+i\lambda$ be an eigenvalue of the operator $D$, and $(u+ir,\,s+iv)$ the corresponding non-zero eigenvector. Then

$$D(u+ir,\,s+iv)=(\mu+i\lambda)(u+ir,\,s+iv).$$

We rewrite this equation as

$$(I-A)u+B^*s=\mu u-\lambda r, \tag{2.6}$$

$$-Bu+(I-C)s=\mu s-\lambda v, \tag{2.7}$$

$$(I-A)r+B^*v=\lambda u+\mu r, \tag{2.8}$$

$$-Br+(I-C)v=\lambda s+\mu v. \tag{2.9}$$

We shall make use of the consequences of these equations obtained from them in the same basic way: we form the scalar product of each equation with a certain vector, and then add the resulting equations. As the four sets of four vectors we take $(u,\,s,\,r,\,v)$, $(-(1-\mu)u+\lambda r,\,Bu,\,-(1-\mu)r-\lambda u,\,Br)$, $(r,\,-v,\,-u,\,s)$ and $(0,\,Cv,\,0,\,-Cs)$. As a result, we obtain respectively:

$$(1-\mu)(\|u\|^2+\|r\|^2+\|s\|^2+\|v\|^2)=(Au,u)+(Ar,r)+(Cs,s)+(Cv,v), \tag{2.10}$$

$$[\lambda^2-(1-\mu)^2](\|u\|^2+\|r\|^2)+(1-\mu)((Au,u)+(Ar,r))-(\|Bu\|^2+\|Br\|^2)=0, \tag{2.11}$$

$$\lambda(\|s\|^2+\|v\|^2-\|u\|^2-\|r\|^2)=0,\qquad \lambda((Cs,s)+(Cv,v))=0. \tag{2.12}$$

Here we have made use of the equation $CB=0$, which follows from the third of conditions (2.5).

Let us prove para. (1) of the lemma. Let the real number $\mu$ be an eigenvalue of operator $D$. This implies that $\lambda=0$, $r=0$, $v=0$. It follows from (2.10) that $\mu\le1$. Let $\mu=1$. Since the operator $A$ is positive definite, and the operator $C$ is non-negative definite, we find from (2.10) that $u=0$ and $Cs=0$. Then (2.6) gives us $B^*s=0$. Since $\operatorname{Ker}C=\operatorname{Im}B$, the equations $Cs=0$ and $B^*s=0$ imply that $s=0$. Hence $(u+ir,\,s+iv)$ is a zero vector; and this contradicts our hypothesis.

Turn to para. (2). Let $\mu+i\lambda$ be a non-real eigenvalue. Since $\lambda\ne0$, it follows from (2.12) that $\|s\|^2+\|v\|^2=\|u\|^2+\|r\|^2\ne0$ and $Cv=Cs=0$. Then we can rewrite Eq. (2.10) as

$$2(1-\mu)(\|u\|^2+\|r\|^2)=(Au,u)+(Ar,r). \tag{2.13}$$

From (2.11) and (2.13) we obtain

$$\mu^2+\lambda^2=1-\frac{((A-B^*B)u,u)+((A-B^*B)r,r)}{\|u\|^2+\|r\|^2}.$$

Since, by hypothesis, $A-B^*B>0$, the last equation implies that $\mu^2+\lambda^2<1$.

We now prove para. (3). Let a Jordan cell of the operator $D$ which is not one-dimensional correspond to the real eigenvalue $\mu$. Let $w_1=(u_1,s_1)$ be the non-zero eigenvector, and $w_2=(u_2,s_2)$ the associated first-order vector, of this cell. Then

$$Dw_1=\mu w_1,\qquad Dw_2=w_1+\mu w_2,$$

i.e.

$$(I-A)u_1+B^*s_1=\mu u_1, \tag{2.14}$$

$$-Bu_1+(I-C)s_1=\mu s_1, \tag{2.15}$$

$$(I-A)u_2+B^*s_2=u_1+\mu u_2, \tag{2.16}$$

$$-Bu_2+(I-C)s_2=s_1+\mu s_2. \tag{2.17}$$

We multiply Eqs. (2.14)-(2.17) by the vectors $(1-\mu)^2u_2-(1-\mu)u_1$, $2Bu_1-(1-\mu)Bu_2$, $-(1-\mu)^2u_1$, $(1-\mu)Bu_1$ respectively and add. Recalling that $CB=0$, we obtain

$$(1-\mu)(Au_1,u_1)=2\|Bu_1\|^2. \tag{2.18}$$

In the same way, multiplying (2.14)-(2.17) by the vectors $u_2$, $-s_2$, $-u_1$, $s_1$ and adding, we obtain $\|s_1\|^2=\|u_1\|^2$. Since $w_1\ne0$, it follows from the last equation that $u_1\ne0$ also. Then the condition $A>B^*B$ gives $(Au_1,u_1)>\|Bu_1\|^2$. On combining this inequality with (2.18), we find that $\mu>-1$. The lemma is proved.
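The statements of Lemma 4 can be checked by hand in the scalar case $n=m=1$, where $A$, $B$, $C$ are numbers $a$, $b$, $c$ and $D$ is a $2\times2$ matrix. The Python sketch below is only an illustration under that assumption; the numerical values and the helper name `eigenvalues_of_D` are hypothetical. Note that in the scalar case $\operatorname{Ker}C=\operatorname{Im}B$ forces $c=0$ when $b\ne0$ and $c>0$ when $b=0$.

```python
import cmath

def eigenvalues_of_D(a, b, c):
    """Eigenvalues of D = [[1 - a, b], [-b, 1 - c]] via the quadratic formula."""
    trace = (1.0 - a) + (1.0 - c)
    det = (1.0 - a) * (1.0 - c) + b * b
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0

# Case b != 0, c = 0, a > b**2: a non-real pair inside the unit circle.
lam1, lam2 = eigenvalues_of_D(0.5, 0.6, 0.0)
print(abs(lam1) ** 2)        # compare with 1 - (a - b**2)

# Case b = 0, c > 0: real eigenvalues 1 - a and 1 - c, both less than 1.
mu1, mu2 = eigenvalues_of_D(0.5, 0.0, 0.3)
print(mu1.real, mu2.real)

# Case with a large a: a real eigenvalue below -1 is possible, so the
# spectral radius of D need not be less than 1.
nu1, nu2 = eigenvalues_of_D(2.5, 0.0, 0.5)
print(nu1.real, nu2.real)
```

In the first case the squared modulus agrees with the scalar form $1-(a-b^2)$ of the expression obtained in the proof of para. (2); the last case shows why conditions of the type (2.3) cannot be dispensed with by Lemma 4 alone.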

Proof of Theorem 2. By the hypothesis of the theorem, the mapping $F$ is twice differentiable at the point $z^*$. Moreover, $F(z^*)=0$. To simplify the notation, we shall assume during our proof that $z^*=0$. This obviously implies no loss of generality. Then, in the neighbourhood of zero,

$$F(z)=F'(0)z+o(\|z\|).$$

Since $\tau=\lim_{k\to\infty}\tau^k$, the first of Eqs. (2.1) can be rewritten as

$$z^{k+1}=Dz^k+o(\|z^k\|). \tag{2.19}$$

We decompose space $R^{n+m}$ into the direct sum of subspaces $U$, $V$, and $W$, corresponding to the parts of the spectrum of the operator $D$ lying respectively inside, outside, and on the unit circle. Let the corresponding decomposition for the operator $D$ be $D=D_1+D_2+D_3$. Since the spectral radius of the operator $D_1$ is less than 1, we can introduce into the space $U$ a norm such that $D_1$ is a contraction operator. Denote this norm by $|\cdot|$. Hence

$$|D_1|\le q<1. \tag{2.20}$$

Similarly, in the appropriate norm in space $V$:

$$|D_2^{-1}|\le q<1. \tag{2.21}$$

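In space $W$ we choose any norm $|\cdot|$. By $|z|$, with $z\in R^{n+m}$, we shall mean the quantity $|z|=|u|+|v|+|w|$, where $z=u+v+w$ is the direct resolution of the vector $z$.

The conditions of Lemma 4 are satisfied for the operator $D$. For, expanding the vectors $\psi_x(x,y^*)$ and $\psi_y(x,y^*)$ in Taylor series with respect to $x$ in the neighbourhood of $x^*$, we can rewrite condition (1.2), with $y=y^*$, as

$$(x-x^*,\psi_{xx}(x^*,y^*)(x-x^*))\le-\gamma\|\psi_{yx}(x^*,y^*)(x-x^*)\|^2+o(\|x-x^*\|^2),$$

i.e.

$$(A(x-x^*),x-x^*)\ge\frac{\gamma}{\tau}\|B(x-x^*)\|^2+o(\|x-x^*\|^2).$$

Since $x$ is any point of the neighbourhood of $x^*$, this last inequality is equivalent to $A\ge\gamma B^*B/\tau$. Since $\tau^0\le\gamma$, and the sequence $\{\tau^k\}_0^\infty$ is decreasing, we have $\gamma>\tau$. Moreover, by the hypothesis of the theorem, $A>0$. The first of conditions (2.5) follows from these facts. The other conditions of Lemma 4 are obviously satisfied. By the lemma, all the eigenvalues of the operator $D$ lying on the unit circle are equal to $-1$, and the corresponding Jordan cells are one-dimensional. This means that

$$D_3=-I. \tag{2.22}$$

Using relations (2.20)-(2.22), we find from Eq. (2.19) that

$$|u^{k+1}|\le q|u^k|+\varepsilon^k|z^k|,\qquad |v^{k+1}|\ge q^{-1}|v^k|-\varepsilon^k|z^k|,\qquad \bigl||w^{k+1}|-|w^k|\bigr|\le\varepsilon^k|z^k|, \tag{2.23}$$

where $u^k+v^k+w^k=z^k$, while $\{\varepsilon^k\}_0^\infty$ is a sequence of non-negative numbers, convergent to zero. We put $\alpha^k=|z^k|^{-1}(|v^k|+|w^k|)$. Then $|u^k|=(1-\alpha^k)|z^k|$. It follows from (2.23) that

$$|v^{k+1}|+|w^{k+1}|\ge(\alpha^k-2\varepsilon^k)|z^k|,\qquad |u^{k+1}|\le[q(1-\alpha^k)+\varepsilon^k]|z^k|.$$

On cross-multiplying these inequalities, we obtain

$$[q(1-\alpha^k)+\varepsilon^k](|v^{k+1}|+|w^{k+1}|)\ge(\alpha^k-2\varepsilon^k)|u^{k+1}|.$$

Hence we obtain

$$\alpha^{k+1}\ge\frac{\alpha^k-2\varepsilon^k}{(1-q)\alpha^k+q}. \tag{2.24}$$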

Let us now show that there exists

$$\alpha=\lim_{k\to\infty}\alpha^k, \tag{2.25}$$

where $\alpha=0$ or 1. We put

$$\alpha=\limsup_{k\to\infty}\alpha^k. \tag{2.26}$$

Since $0\le\alpha^k\le1$, then also $0\le\alpha\le1$. If $\alpha=0$, our assertion is proved. Let $\alpha>0$. We choose an arbitrary $\nu\in(0,\alpha)$. Since $\varepsilon^k\to0$, a number $k_1$ exists such that

$$\nu(1-\nu)(1-q)>2\varepsilon^k,\qquad k\ge k_1. \tag{2.27}$$

By definition (2.26) of $\alpha$, a $k_2>k_1$ exists such that $\alpha^{k_2}>\nu$. Then, from (2.24),

$$\alpha^k>\nu,\qquad k\ge k_2. \tag{2.28}$$

For, if (2.28) holds for some $k\ge k_2$, we obtain from (2.24) and (2.27):

$$\alpha^{k+1}>\frac{\nu-2\varepsilon^k}{(1-q)\nu+q}>\nu.$$

Since $\nu$ is an arbitrary number less than $\alpha$, we derive (2.25) from (2.26) and (2.28). Let us now pass to the limit in (2.24):

$$\alpha\ge\frac{\alpha}{(1-q)\alpha+q}.$$

Since $0<q<1$ and $0<\alpha\le1$, it follows that $\alpha\ge1$. Hence $\alpha=0$ or 1.

We put $\beta^k=|z^k|^{-1}(|u^k|+|w^k|)$. Using the same method as when proving (2.25), we can show that there exists

$$\beta=\lim_{k\to\infty}\beta^k,$$

where $\beta=0$ or 1. In short, four cases are logically possible: $\alpha=0$ and $\beta=0$, $\alpha=1$ and $\beta=0$, $\alpha=1$ and $\beta=1$, and $\alpha=0$ and $\beta=1$. The first case is not realised, since $\alpha^k+\beta^k=1+|z^k|^{-1}|w^k|\ge1$ and hence $\alpha+\beta\ge1$. The second case implies that

$$z^k=v^k+o(|v^k|), \tag{2.29}$$

the third, that

$$z^k=w^k+o(|w^k|), \tag{2.30}$$

and the fourth, that

$$z^k=u^k+o(|u^k|). \tag{2.31}$$

Let us show that, in fact, either of relations (2.29) or (2.30) contradicts the convergence to zero of the sequence $\{z^k\}_0^\infty$. In fact, it follows from (2.29) and the second of inequalities (2.23) that

$$|v^{k+1}|\ge q^{-1}|v^k|+o(|v^k|),$$

and hence, for all sufficiently large $k$, we have $|z^{k+1}|>|z^k|$.

Now assume that relations (2.30) hold. From the Taylor expansion

$$F(z)=F'(0)z+\tfrac12F''(0)[z,z]+o(\|z\|^2)$$

we obtain

$$z^{k+1}=z^k+\tau^kF(z^k)=z^k+\tau^kF'(0)z^k+\tfrac12\tau^kF''(0)[z^k,z^k]+o(|z^k|^2). \tag{2.32}$$

We put $\varepsilon_k=\tau^k-\tau$ and observe that $F'(0)=\tau^{-1}(D-I)$. Hence we can rewrite (2.32) as

$$z^{k+1}=Dz^k+\frac{\varepsilon_k}{\tau}(D-I)z^k+\tfrac12\tau^kF''(0)[z^k,z^k]+o(|z^k|^2).$$

Since $D_3=-I$, this equation gives in the $W$-component:

$$w^{k+1}=-\Bigl(1+\frac{2\varepsilon_k}{\tau}\Bigr)w^k+R[z^k,z^k]+o(|z^k|^2),$$

where $R$ is the projection of the mapping $\tfrac12\tau F''(0)$ on to the subspace $W$. Using relation (2.30), we rewrite the last equation in the final form

$$w^{k+1}=-\Bigl(1+\frac{2\varepsilon_k}{\tau}\Bigr)w^k+R[w^k,w^k]+\sigma^k,\qquad \sigma^k=o(|w^k|^2). \tag{2.33}$$

We shall first show that, by virtue of Eq. (2.33), a constant $C_1>0$ exists such that

$$|w^k|\le C_1\varepsilon_k. \tag{2.34}$$

In fact, (2.33) implies that

$$|w^{k+1}|\ge|w^k|-C_2|w^k|^2,\qquad C_2>0. \tag{2.35}$$

On the other hand,

$$\tau^{k+1}=\tau^k-(\tau^k)^2\|Fz^k\|^2\le\tau^k-\tau^2\|Fz^k\|^2. \tag{2.36}$$

By Lemma 4, unity is not a point of the spectrum of operator $D$. This means that 0 is not a point of the spectrum of the operator $F'(0)$. Hence a constant $C_3>0$ exists such that $\|Fz^k\|\ge C_3\|z^k\|$. But, in finite-dimensional space, all norms are equivalent. Hence the last inequality gives $\|Fz^k\|\ge C_4|z^k|\ge C_4|w^k|$, where $C_4>0$. In short, from (2.36) we obtain

$$\tau^{k+1}\le\tau^k-\tau^2C_4^2|w^k|^2. \tag{2.37}$$

On comparing inequalities (2.35) and (2.37), we obtain

$$|w^k|-|w^{k+1}|\le C_1(\tau^k-\tau^{k+1}),\qquad C_1=\frac{C_2}{\tau^2C_4^2}.$$

On adding these inequalities, from an arbitrary fixed superscript $k$ to infinity, we arrive at relation (2.34). The mapping $R$ occurring in Eq. (2.33) is quadratic. Hence, after elementary working, we can obtain from (2.33) the following recurrence relation of depth 2:

$$w^{k+2}=\Bigl(1+\frac{2\varepsilon_k}{\tau}\Bigr)\Bigl(1+\frac{2\varepsilon_{k+1}}{\tau}\Bigr)w^k+s^k, \tag{2.38}$$

where $s^k=o(|w^k|^2)$. In the last relation, there is no quadratic term. This circumstance is decisive. For, it follows from (2.38) and (2.34) that

$$|w^{k+2}|\ge|w^k|+\frac{2}{C_1\tau}|w^k|^2+o(|w^k|^2),$$

i.e., for all sufficiently large $k$, we have $|w^{k+2}|>|w^k|$. The inequality obtained obviously contradicts the convergence to zero of the sequence $\{w^k\}_0^\infty$.

In short, in the conditions of the theorem, relations (2.31) are necessarily satisfied. Hence it follows from (2.19) that $u^{k+1}=D_1u^k+o(|u^k|)$. From this and (2.20) we find that $|u^{k+1}|\le q|u^k|+o(|u^k|)$. Consequently,

$$\limsup_{k\to\infty}(|u^k|)^{1/k}\le q.$$

It only remains to use relation (2.31) a second time. Theorem 2 is proved.
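The geometric rate asserted by Theorem 2 can be observed numerically. The sketch below is a hedged illustration on the toy function $\psi(x,y)=-x^2/2+xy$ (an assumption, not an example from the paper), for which $\psi_{xx}=-1$, $\psi_{yy}=0$, $\psi_{yx}=1$ and $z^*=(0,0)$, so that $D=I+\tau F'(0)$ has the complex eigenvalue pair of modulus $\sqrt{1-\tau+\tau^2}$. The function name `empirical_rate` and the iteration counts are hypothetical choices.

```python
import math

def toy_grad(x, y):
    """(psi_x, psi_y) for the assumed toy psi = -x**2/2 + x*y."""
    return -x + y, x

def empirical_rate(x=1.0, y=1.0, gamma=1.0, k0=1000, k1=3000):
    """Compare the observed per-step contraction of ||z^k|| with the spectral radius of D."""
    gx, gy = toy_grad(x, y)
    g2 = gx * gx + gy * gy
    tau = min(gamma, 0.5 / g2)
    norm_k0 = None
    for k in range(k1):
        if k == k0:
            norm_k0 = math.hypot(x, y)          # ||z^{k0}||
        x, y = x + tau * gx, y - tau * gy       # recurrence (1.3)
        shrunk = tau * (1.0 - tau * g2)
        gx, gy = toy_grad(x, y)
        g2 = gx * gx + gy * gy
        tau = min(shrunk, 0.5 / g2)             # recurrence (1.4)
    rate = (math.hypot(x, y) / norm_k0) ** (1.0 / (k1 - k0))
    rho = math.sqrt(1.0 - tau + tau * tau)      # spectral radius of D for this toy psi
    return rate, rho, tau

rate, rho, tau = empirical_rate()
print(rate, rho, tau)
```

The observed ratio and the spectral radius computed from the limiting step multiplier should nearly coincide and both lie below 1; the agreement is only approximate because the norm of the iterates oscillates along the rotating eigen-directions.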

3. Application to a problem of convex programming

Now let $\psi$ be the modified Lagrangian of problem (0.1): i.e.

$$\psi(x,y)=f(x)+\frac{1}{2\gamma}\sum_{i=1}^{m}\bigl\{y_i^2-[y_i-\gamma g_i(x)]_+^2\bigr\}. \tag{3.1}$$

Here, $[a]_+=(a+|a|)/2$ is the positive part of the number $a$. The function $\psi$ is concave-convex. The set of its saddle points in $R^n\times R^m$ is the same as the set $X^*\times Y^*$ of saddle points of the classical Lagrangian $L$ in $R^n\times R^m_+$. Under our assumptions about problem (0.1), the function $\psi$ is differentiable and its derivative satisfies in any compactum a Lipschitz condition (see [3-6]). It is easily shown by direct working that, given any points $(x,y)\in R^{n+m}$ and $(x^*,y^*)\in Z^*$, we have the equation

$$(x-x^*,\psi_x(x,y))-(y-y^*,\psi_y(x,y))=[(x-x^*,L_x(x,u))-(u-y^*,L_y(x,u))]-\gamma\|\psi_y(x,y)\|^2+\gamma^{-1}(y^*,v), \tag{3.2}$$

where $u_i=[y_i-\gamma g_i(x)]_+$, $v_i=[y_i-\gamma g_i(x)]_-$.

The first term on the right-hand side of (3.2) is non-positive, since inequality (1.1) holds for the function $L$ in $R^n\times R^m_+$, while the third term is also non-positive, since $y^*\ge0$ and $v\le0$. On discarding these terms, we arrive at inequality (1.2). This inequality was also obtained for the modified Lagrangian in [7].

In short, all the assumptions made in Section 1 hold for the modified Lagrange function. Hence we obtain from Theorem 1:

Corollary 1

If $\psi$ is the modified function (3.1) of problem (0.1), the following statements are true for process (1.3), (1.4):

1) a number $k_0$ exists, such that $\tau^{k+1}=\tau^k(1-\tau^k\|\psi'(z^k)\|^2)$, $k>k_0$;
2) $\lim_{k\to\infty}\tau^k>0$;
3) the sequence $\{z^k\}_0^\infty$ is convergent to a point of the set $Z^*$.

Now let $z^*=(x^*,y^*)$ be a saddle point of the Lagrange function $L$ of problem (0.1) and let the functions $f$ and $g_i$ be thrice differentiable at the point $x^*$. For clarity, we shall assume that the constraints $g_i(x)\ge0$, $i=1,2,\dots,p$, are active at the point $x^*$, and are passive for $i=p+1,p+2,\dots,m$. Let $L^0$ be the classical Lagrange function corresponding to the active constraints, i.e.

$$L^0(x,y)=f(x)+\sum_{i=1}^{p}y_ig_i(x).$$

We shall assume that the sufficient conditions for a second-order maximum, and the conditions of strict regularity, are satisfied at the point $(x^*,y^*)$. Recall that this implies the following:

(a) the operator $L^0_{yx}(x^*,y^*)$ has rank $p$ (i.e. the gradients of the active constraints are linearly independent),
(b) $y_i^*>0$, $i=1,2,\dots,p$,
(c) there is no non-zero vector $h$ satisfying the conditions $L^0_{yx}(x^*,y^*)h=0$ and $L^0_{xx}(x^*,y^*)h=0$.

Under these assumptions, $z^*$ is the unique saddle point of the function $\psi$ in $R^n\times R^m$.

If $i=1,2,\dots,p$, then, by condition (b), $y_i^*-\gamma g_i(x^*)=y_i^*>0$. If $i=p+1,p+2,\dots,m$, then $y_i^*-\gamma g_i(x^*)=-\gamma g_i(x^*)<0$. Hence, in some neighbourhood of the point $z^*$, we have $y_i-\gamma g_i(x)>0$ for $i\le p$ and $y_i-\gamma g_i(x)<0$ for $i>p$. Hence Eq. (3.1) takes the form in this neighbourhood:

$$\psi(x,y)=L^0(x,y)-\frac{\gamma}{2}\sum_{i=1}^{p}g_i^2(x)+\frac{1}{2\gamma}\sum_{i=p+1}^{m}y_i^2.$$

Then

$$\psi_{xx}=L^0_{xx}-\gamma(L^0_{yx})^*L^0_{yx},\qquad \psi_{yx}=L^0_{yx},\qquad \psi_{yy}=\gamma^{-1}P.$$

Here, all the derivatives are calculated at the point $(x^*,y^*)$, while $P$ is the orthogonal projector in $R^m$ onto the coordinate subspace corresponding to the coordinates numbers $p+1,p+2,\dots,m$.

The operator $\psi_{xx}$ is obviously non-positive, while it follows from condition (c) that it is negative definite. Condition (a) implies that $\operatorname{Im}\psi_{yx}$ is the same as the coordinate subspace corresponding to the coordinates numbers $1,2,\dots,p$. Hence $\operatorname{Ker}\psi_{yy}=\operatorname{Im}\psi_{yx}$. The conditions of Theorem 2 are thus satisfied.

Corollary 2

Assume that the sufficient conditions for a second-order maximum, and the condition of strict regularity, are satisfied at the saddle point $z^*=(x^*,y^*)$ of the Lagrange function of problem (0.1), and that the functions $f$ and $g_i$ are thrice differentiable at the point $x^*$. Then process (1.3), (1.4) is convergent to the point $z^*$ at the rate of a geometric progression.

Translated by D. E. Brown
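To make the application in Section 3 concrete, the following Python sketch applies the process to a one-dimensional toy problem that is not from the paper: maximise $f(x)=-(x-2)^2$ subject to $g(x)=1-x\ge0$, whose saddle point is $(x^*,y^*)=(1,2)$. It assumes the modified Lagrangian in the form $\psi(x,y)=f(x)+\frac{1}{2\gamma}\{y^2-[y-\gamma g(x)]_+^2\}$, so that $\psi_x=f'(x)+ug'(x)$ and $\psi_y=(y-u)/\gamma$ with $u=[y-\gamma g(x)]_+$. The value $\gamma=1$, the starting point and all function names are illustrative assumptions.

```python
import math

GAMMA = 1.0                  # penalty parameter gamma (assumed value)
X_STAR, Y_STAR = 1.0, 2.0    # saddle point of the toy problem

def g(x):
    return 1.0 - x           # single constraint g(x) >= 0

def f_prime(x):
    return -2.0 * (x - 2.0)  # derivative of f(x) = -(x - 2)**2

G_PRIME = -1.0               # derivative of g

def psi_grad(x, y):
    """(psi_x, psi_y) of the assumed modified Lagrangian."""
    u = max(0.0, y - GAMMA * g(x))
    return f_prime(x) + u * G_PRIME, (y - u) / GAMMA

def worst_violation(points):
    """Largest value of lhs + gamma*psi_y**2; inequality (1.2) says it is <= 0."""
    worst = -math.inf
    for x, y in points:
        px, py = psi_grad(x, y)
        lhs = (x - X_STAR) * px - (y - Y_STAR) * py
        worst = max(worst, lhs + GAMMA * py * py)
    return worst

def run_process(x, y, steps=5000):
    """Process (1.3), (1.4) applied to the modified Lagrangian."""
    px, py = psi_grad(x, y)
    g2 = px * px + py * py
    tau = min(GAMMA, 0.5 / g2) if g2 > 0.0 else GAMMA
    for _ in range(steps):
        if g2 == 0.0:
            break
        x, y = x + tau * px, y - tau * py
        shrunk = tau * (1.0 - tau * g2)
        px, py = psi_grad(x, y)
        g2 = px * px + py * py
        tau = min(shrunk, 0.5 / g2) if g2 > 0.0 else shrunk
    return x, y, tau

grid = [(0.5 * i, 0.5 * j) for i in range(-6, 7) for j in range(-6, 11)]
print(worst_violation(grid))
print(run_process(1.5, 1.5))
```

The grid check illustrates inequality (1.2) for this $\psi$, and the run illustrates Corollaries 1 and 2: the iterates approach $(1,2)$ while the step multiplier settles at a positive value.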

REFERENCES

1. UZAWA, H., Iterative methods of concave programming, in: Studies in linear and non-linear programming, Stanford U.P., 1960 (Russian translation, IIL, Moscow, 1962), pp. 228-245.

2. MAISTROVSKII, G. D., A gradient method for finding saddle points, Ekonomika matem. metody, 12, No. 5, 917-929, 1976.

3. WIERZBICKI, A. P., A penalty function shifting method in constrained static optimization and its convergence properties, Arch. automat. telemech., 16, No. 4, 395-416, 1971.

4. ROCKAFELLAR, R. T., The multiplier method of Hestenes and Powell applied to convex programming, J. Optimiz. Theory Appl., 12, No. 6, 555-562, 1973.

5. TRET'YAKOV, N. V., The method of penalty estimates for problems of convex programming, Ekonomika matem. metody, 9, No. 3, 526-540, 1973.

6. GOL'SHTEIN, E. G. and TRET'YAKOV, N. V., A gradient method of minimization and convex programming algorithms connected with modified Lagrange functions, Ekonomika matem. metody, 11, No. 4, 730-742, 1975.

7. GOL'SHTEIN, E. G., The convergence of a gradient method for seeking saddle points of modified Lagrange functions, Ekonomika matem. metody, 13, No. 2, 322-329, 1977.

8. ZANGWILL, W. I., Nonlinear programming (Russian translation, Sov. radio, Moscow, 1973).

9. LYUBICH, Yu. I., The rate of convergence of stationary gradient relaxation, Zh. vychisl. Mat. mat. Fiz., 6, No. 2, 356-359, 1966.

10. OL'KHOVSKII, Yu. G., The rate of convergence of iterative processes, in: Computational mathematics and computing techniques (Vychisl. matem. i vychisl. tekhn.), No. II, FTINT Akad. Nauk UkSSR, Kharkov, 1971, pp. 7-10.