Two schemes for a non-linear method of optimization in extremal problems


U.S.S.R. Comput. Maths. Math. Phys., Vol. 24, No. 4, pp. 14-18, 1984
Printed in Great Britain
0041-5553/84 $10.00+0.00 © 1985 Pergamon Press Ltd.

TWO SCHEMES FOR A NON-LINEAR METHOD OF OPTIMIZATION IN EXTREMAL PROBLEMS*

A.A. TRET'YAKOV

*Zh. vychisl. Mat. mat. Fiz., 24, 7, 986-992, 1984.

Two schemes of the method of non-linear descent are proposed for solving the unconstrained minimization problem $\varphi(x)\to\min$, $x\in E_n$: a method with independent choice of coefficients, and a method of quadratic type. Conditions are stated for the existence of a minimizing arc for the general problem of mathematical programming.

The methods considered below for solving both unconstrained and constrained extremal problems have a scheme which differs from the traditional linear scheme of iterative optimization methods. Most methods are constructed using the recurrence relation

$$x_{k+1}=x_k+\alpha_k s_k, \qquad(1)$$

where $s_k$ is the direction of descent, chosen by a certain rule, $\alpha_k$ is the length of the step in this direction, and $x_{k+1}$ is the next approximation. The scheme of the present paper is

$$x_{k+1}=x_k+\alpha_k s_k+\beta_k d_k, \qquad(2)$$

where $s_k,d_k$ are directions of descent (the minimizing pair), which specify the arc of descent, and $\alpha_k,\beta_k$ are the step lengths in these directions. In this scheme the minimization is carried out not along a straight line but over a certain set, specified by the directions $s_k,d_k$ and the coefficients $\alpha_k,\beta_k$. In this sense the process is non-linear with respect to the minimization set at each step.

Construction (2) is considered for the following reasons: to solve problems with strong ravines, where linear methods usually converge slowly or not at all, owing to the influence of rounding errors; to solve problems of degenerate type, when the first derivatives of the objective function or of the constraint functions vanish at the minimum point; and to solve problems of non-linear programming with an admissible set of complex configuration.

Below we propose two versions of the methods of scheme (2) and prove convergence theorems.
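To make the difference between the two schemes concrete, a minimal sketch of the two update rules is given below in Python; the function names and calling conventions are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def linear_step(x, s, alpha):
    """Scheme (1): x_{k+1} = x_k + alpha_k s_k (descent along a single direction)."""
    return x + alpha * np.asarray(s)

def nonlinear_step(x, s, d, alpha, beta):
    """Scheme (2): x_{k+1} = x_k + alpha_k s_k + beta_k d_k
    (descent over the plane spanned by the minimizing pair s_k, d_k)."""
    return x + alpha * np.asarray(s) + beta * np.asarray(d)
```

The concrete rules for choosing the pair $(s_k,d_k)$ and the coefficients $\alpha_k,\beta_k$ are what distinguish the two versions developed below.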

1. Non-linear methods of minimization with independent choice of coefficients.

The unconstrained extremal problem $\varphi(x)\to\min$, $x\in E_n$, will be solved by a method whose scheme is given by the relations

$$x_{k+1}=x_k-\alpha_k s_k-\beta_k d_k, \qquad(3a)$$

$$(\varphi'(x_k),s_k)\ge0,\qquad(\varphi'(x_k),d_k)\ge0,\qquad\|s_k\|=\|d_k\|=1, \qquad(3b)$$

i.e., $s_k$ and $d_k$ are directions of decrease. It would be natural to seek the coefficients $\alpha_k$ and $\beta_k$ from the condition

$$\varphi(x_k-\alpha_k s_k-\beta_k d_k)=\min_{\alpha,\beta}\varphi(x_k-\alpha s_k-\beta d_k).$$

In general, however, this problem can only be solved approximately, which causes further difficulties. Henceforth it will always be assumed that the function $\varphi(x)$ belongs to the class $C^{1,1}(X_0)$, $X_0=\{x\in E_n\mid\varphi(x)\le\varphi(x_0)\}$, and we shall seek coefficients $\alpha_k,\beta_k$ satisfying the condition

$$\varphi(x_k)-\varphi(x_{k+1})\ge q_1\alpha_k(\varphi'(x_k),s_k)+q_1\beta_k(\varphi'(x_k),d_k)-L\alpha_k\beta_k, \qquad(4)$$

where $L$ is the Lipschitz constant of the gradient of $\varphi(x)$ and $q_1$ is a coefficient. We shall prove below the existence of coefficients $\alpha_k,\beta_k$ guaranteeing that (4) holds. We have:

Lemma. Let a sequence $\{x_k\}$ be constructed according to scheme (3) and let the coefficients $\alpha_k,\beta_k$ satisfy condition (4). Then

$$\varphi(x_k)-\varphi(x_{k+1})\ge L^{-1}q_1A_k^2\|\varphi'(x_k)\|^2, \qquad(5)$$

where


$$A_k^2=(1-2q_1)\varepsilon_k^2+(1-2q_1)\gamma_k^2-2\varepsilon_k\gamma_k\ge0,\qquad
\varepsilon_k=\frac{(\varphi'(x_k),s_k)}{\|\varphi'(x_k)\|},\qquad
\gamma_k=\frac{(\varphi'(x_k),d_k)}{\|\varphi'(x_k)\|}.$$

Proof. We put
$$\alpha_k=(1-q)\,\frac{(\varphi'(x_k),s_k)}{L},\qquad
\beta_k=(1-q)\,\frac{(\varphi'(x_k),d_k)}{L},\qquad q_1=\frac q2,\qquad q\in\Bigl(\frac12,\,1\Bigr).$$
Substituting these values into the expansion of $\varphi(x_k)-\varphi(x_{k+1})$ that follows from the Lipschitz continuity of $\varphi'$ on $X_0$, we obtain at once inequality (4), and with it estimate (5). The lemma is proved.
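The coefficient choice used in the proof suggests the following schematic one-step implementation of scheme (3) in Python. The gradient `grad_phi`, the direction rule `choose_directions` and the Lipschitz constant `L_const` are assumed to be supplied by the user (the paper does not prescribe them), so this is only a sketch of relations (3)-(4), not the author's code.

```python
import numpy as np

def nonlinear_descent_step(x, grad_phi, choose_directions, L_const, q=0.75):
    """One step of scheme (3): x_{k+1} = x_k - alpha_k s_k - beta_k d_k.

    choose_directions(x, g) must return unit vectors s, d with
    (g, s) >= 0 and (g, d) >= 0, as required by (3b).  The coefficients
    follow the choice used in the proof of the lemma:
        alpha_k = (1 - q) (phi'(x_k), s_k) / L,
        beta_k  = (1 - q) (phi'(x_k), d_k) / L,    q in (1/2, 1).
    """
    g = grad_phi(x)
    s, d = choose_directions(x, g)
    alpha = (1.0 - q) * np.dot(g, s) / L_const
    beta = (1.0 - q) * np.dot(g, d) / L_const
    return x - alpha * s - beta * d


if __name__ == "__main__":
    # Illustration on a ravine-like quadratic phi(x) = 0.5*(x1^2 + 100*x2^2), L = 100.
    A = np.diag([1.0, 100.0])
    grad = lambda x: A @ x

    def directions(x, g):
        s = g / np.linalg.norm(g)              # normalized gradient
        i = int(np.argmax(np.abs(g)))          # steepest coordinate axis
        e = np.zeros_like(g)
        e[i] = 1.0 if g[i] >= 0 else -1.0      # sign chosen so that (g, e) >= 0
        return s, e

    x = np.array([10.0, 1.0])
    for _ in range(2000):
        x = nonlinear_descent_step(x, grad, directions, L_const=100.0)
    print(x)   # tends toward the minimizer (0, 0)
```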

Before considering the convergence of methods (3), (4), we shall state some necessary propositions. Following /1/, we introduce the set of stationary points $X^*=\{x^*\mid\varphi'(x^*)=0\}$. We put $X_0^*=X_0\cap X^*$ and write down two conditions that will be required.

Condition 1. Given any $\varepsilon>0$, there exists $\delta=\delta(\varepsilon)>0$ such that, for all $x\in X_0\setminus U_\varepsilon(X^*)$, we have $\|\varphi'(x)\|\ge\delta$.

Condition 2. For every $k$ we have at least one of the relations $\varepsilon_k\ge\varepsilon>0$ or $\gamma_k\ge\gamma>0$.

Theorem 1. Let Conditions 1 and 2 and the hypotheses of the lemma be satisfied for the function $\varphi(x)$, which is lower-bounded, and let the set $X_0^*$ be non-empty. Then

$$\lim_{k\to\infty}\rho(x_k,X_0^*)=0.$$

Proof. By condition (4) and the lemma,

$$\varphi(x_k)-\varphi(x_{k+1})\ge C\|\varphi'(x_k)\|^2,\qquad C=L^{-1}A_k^2q_1.$$

The sequence $\{\varphi(x_k)\}$ is monotonic and lower-bounded, so that $\varphi(x_k)-\varphi(x_{k+1})\to0$. Hence $\|\varphi'(x_k)\|\to0$ as $k\to\infty$, which proves the theorem.

We next obtain convergence-rate estimates for the method of non-linear descent when solving the unconstrained minimization problem for a convex and a strongly convex function.

Theorem 2. For the convex function $\varphi(x)$, let the hypotheses of the lemma and Condition 2 hold. Then, if $\operatorname{diam}X_0=D<\infty$, we have

$$\mu_k\le\frac{D^2}{Ck}, \qquad(6)$$

where $\mu_k=\varphi(x_k)-\varphi^*$ and $0<C\le L^{-1}q_1A^2$, $A^2=\inf_kA_k^2$. The proof follows obviously from the lemma and Theorem 9.3.3 of /1/.

Theorem 3. For the strongly convex function $\varphi(x)$, let the hypotheses of the lemma hold. If $\varepsilon_k\ge\varepsilon>0$, $\gamma_k\ge\gamma>0$ for all $k$, then

$$\varphi(x_k)-\varphi^*\le(\varphi(x_0)-\varphi^*)\exp\Bigl(-2\mu L^{-1}q_1\sum_{i=0}^{k-1}A_i^2\Bigr), \qquad(7a)$$

$$\|x_k-x^*\|^2\le2\mu^{-1}(\varphi(x_0)-\varphi^*)\exp\Bigl(-2\mu L^{-1}q_1\sum_{i=0}^{k-1}A_i^2\Bigr); \qquad(7b)$$

here $\mu$ is the constant of strong convexity. The proof follows from our lemma and Theorem 9.3.4 of /1/.

We have thus shown that methods (3), (4) converge and that the rate-of-convergence estimates (6), (7), similar to those obtained in /1/, hold. Notice that, at each iteration, these constructions give a point closer to the optimum than is given by methods of type (1). This is because, in scheme (1), the minimization is along the direction $s_k$, whereas in scheme (2) it is over the plane generated by the vectors $s_k$, $d_k$.

2. Non-linear methods of minimization of quadratic type.

Consider the iterative minimization process (constrained or not) in which the next approximation is obtained from the relation $x_{k+1}=x_k(a_k)$, where
$$x_k(a)=x_k+as_k+a^2d_k.$$
Henceforth we assume that $\|s_k\|\le1$, $\|d_k\|\le1$ and $0<a<1$ for all $k$. Following /2/, we put
$$\|\varphi^{(r)}(x)\|=\sup_{\|z\|=1}\bigl|\varphi^{(r)}(x)[z]^r\bigr|,$$
and we define the minimizing pair $(s_k,d_k)$ at the point $x_k$.

Definition. We call the pair $(s_k,d_k)$ minimizing (or a pair of descent) at the point $x_k$ if
$$\varphi^{(r)}(x_k)[s_k]^r\le0,\qquad\varphi^{(r)}(x_k)[d_k]^r\le0,\qquad\varphi^{(r+1)}(x_k)[d_k]^{r+1}\le0,$$
provided that $\varphi^{(j)}(x_k)=0$, $j=1,2,\dots,r-1$.
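The quantities $\varphi^{(r)}(x)[z]^r$ entering the definition are the $r$-th derivatives of the scalar function $t\mapsto\varphi(x+tz)$ at $t=0$. The following sketch (Python, with a hypothetical objective `phi`) estimates them by a finite difference, which can be convenient when checking numerically whether a candidate pair satisfies the definition; it is an illustration only, not part of the paper.

```python
import math
import numpy as np

def directional_derivative(phi, x, z, r, h=1e-2):
    """Finite-difference estimate of phi^{(r)}(x)[z]^r, i.e. the r-th
    derivative of g(t) = phi(x + t z) at t = 0, via the forward
    difference Delta_h^r g(0) / h^r."""
    total = 0.0
    for j in range(r + 1):
        total += (-1.0) ** (r - j) * math.comb(r, j) * phi(x + j * h * z)
    return total / h ** r


if __name__ == "__main__":
    phi = lambda x: x[0] ** 4 + x[1] ** 2          # degenerate along the x1 axis
    x0 = np.zeros(2)
    z = np.array([1.0, 0.0])
    print(directional_derivative(phi, x0, z, 2))   # ~0: second derivative vanishes
    print(directional_derivative(phi, x0, z, 4))   # ~24: first non-vanishing term
```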

Denote by $SD(x_k)$ the set of minimizing pairs at the point $x_k$, and let us give an extended scheme of the method of quadratic descent for the unconstrained minimization problem.

Scheme of the method. The numbers $\sigma,a$, $0<\sigma,a<1$, and an initial point $x_0$ are specified. We construct the next approximation $x_{k+1}$ as follows: we take $i_k+1$ ($k$ is the iteration number) equal to the least integer from $i=0,1,\dots$ for which the sufficient-decrease condition (8a) along the arc $x_k(a^i)$ is satisfied, where $(s_k,d_k)\in SD(x_k)$; then
$$a_k=a^{i_k+1},\qquad x_{k+1}=x_k(a_k). \qquad(8b)$$
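Condition (8a) is a sufficient-decrease test along the arc $x_k(a)$; since its printed form is not reproduced above, the test used in the following Python sketch (a simple monotone-decrease check with a fixed tolerance) is only a stand-in assumption, as are the names `phi` and `minimizing_pair`.

```python
import numpy as np

def quadratic_descent_step(phi, x, minimizing_pair, a=0.5, max_halvings=60, tol=0.0):
    """One step of the quadratic-type scheme: x_{k+1} = x_k(a_k),
    x_k(t) = x_k + t s_k + t^2 d_k, with a_k = a^{i_k + 1} chosen as the
    smallest power of a for which the trial point decreases phi.

    The decrease test below (phi drops by more than `tol`) is an assumed
    stand-in for condition (8a), whose exact form is not reproduced here.
    """
    s, d = minimizing_pair(x)          # a pair of descent at x_k, (s_k, d_k) in SD(x_k)
    step = a
    for _ in range(max_halvings):
        trial = x + step * s + step ** 2 * d
        if phi(x) - phi(trial) > tol:  # assumed sufficient-decrease test
            return trial
        step *= a
    return x                           # no acceptable step found; keep the current point
```

Here the choice of the minimizing pair is abstracted into the user-supplied rule `minimizing_pair`; in the paper it is taken from the set $SD(x_k)$.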

For this method we have:

Theorem 4. Let $\varphi\in C^{r+1}(E_n)$, let the set $X_0=\{x\mid\varphi(x)\le\varphi(x_0)\}$ be a compactum of $E_n$, and let the minimizing sequence $\{(s_k,d_k)\}$ satisfy the relations
$$\|\varphi^{(r)}(x_k)[s_k]^j\|\ge c\,\|\varphi^{(r)}(x_k)\|,\qquad j=1,2,\dots,r-1,$$
$$\|\varphi^{(r+1)}(x_k)[d_k]^{r+1}\|\ge c\,\|\varphi^{(r+1)}(x_k)\|,$$
whenever $\varphi^{(j)}(x_k)=0$, $j=1,2,\dots,r-1$, for all $k$. Then $\{x_k\}$ converges to $X_0^*$.

Proof. We consider two cases.

Case 1. The sequence $\{i_k\}$ is not bounded in aggregate. Then, given any number $m$, no matter how large, there is a number $k(m)$ such that, for $i_{k(m)}=m$, we have inequality (9) (it is assumed that $x_{k(m)+1}=x_{k(m)}(a^m)$). Expanding the left-hand side by Taylor's formula, we obtain
$$\varphi(x_{k(m)+1})-\varphi(x_{k(m)})=\varphi^{(r)}(x_{k(m)})[a^ms_{k(m)}]^r+r\,\varphi^{(r)}(x_{k(m)})[a^ms_{k(m)}]^{r-1}[a^{2m}d_{k(m)}]+o(a^{mr}).$$
Using inequality (9) and the hypotheses of the theorem, we obtain from this expansion relation (10), from which it follows automatically that $\|\varphi^{(r)}(x_{k(m)})\|\to0$ as $m\to\infty$, i.e. $x_k\to X_0^*$, $k\to\infty$.

Case 2. The sequence $\{i_k\}$ is bounded in aggregate, i.e. $i_k\le l$, $a_k\ge a^l>0$. By hypothesis, the sequence $\{\varphi(x_k)\}$ is lower-bounded and $\varphi(x_{k+1})-\varphi(x_k)\to0$, $k\to\infty$; hence the sequence $\{\varphi(x_k)\}$ converges to a limit $\bar\varphi$: $\varphi(x_k)\to\bar\varphi$, $k\to\infty$. Noting that
$$c_1\|\varphi^{(r)}(x_k)\|\le\|\varphi^{(r)}(x_k)[s_k]^r\|\le c\,\|\varphi^{(r)}(x_k)\|,$$
we can write, in view of inequality (8),
$$\varphi(x_k)-\varphi(x_{k+1})\ge\mathrm{const}\cdot\|\varphi^{(r)}(x_k)\|$$
with a positive constant independent of $k$. But $\varphi(x_k)-\varphi(x_{k+1})\to0$, and hence $\|\varphi^{(r)}(x_k)\|\to0$.

With respect to the rate of convergence in the functional, it can be seen that, in case 1,
$$|\varphi(x_{k+1})-\varphi(x_k)|\le a_k\cdot\mathrm{const}\cdot\|\varphi^{(r)}(x_k)\|,$$
where $a_k\to0$ and $\|\varphi^{(r+1)}(x_k)\|\to0$, $k\to\infty$, while in case 2
$$|\varphi(x_{k+1})-\varphi(x_k)|\le\mathrm{const}\cdot\sup_{x\in[x_k,x_{k+1}]}\|\varphi^{(r)}(x)\|,$$
where $\|\varphi^{(r)}(x_k)\|\to0$, $k\to\infty$. The theorem is proved.

Denote by $X_p^*=\{x^*\mid\varphi^{(r)}(x^*)=0,\ r=1,2,\dots,p\}$, and by $k_i$ the subscripts such that $\varphi^{(r)}(x_{k_i})\to0$ as $i\to\infty$, $r=1,2,\dots,p-1$.

Theorem 5. Let the hypotheses of the previous theorem hold, and let $k_i\to\infty$ as $i\to\infty$. Then
$$\lim_{i\to\infty}\rho(x_{k_i},X_p^*)=0.$$

Proof. Given any $a_{k_i}$ ($a_{k_i}\to0$ or $a_{k_i}\ge\bar a>0$), we have the inequality
$$\varphi(x_{k_i})-\varphi(x_{k_i+1})\ge\frac{a_{k_i}^{\,2(p+1)}}{(p+1)!}\,\bigl|\varphi^{(p+1)}(x_{k_i})[d_{k_i}]^{p+1}\bigr|\ge0.$$
By hypothesis, the sequence $\{\varphi(x_{k_i})\}$ is lower-bounded and $\varphi(x_{k_i+1})-\varphi(x_{k_i})\to0$, $i\to\infty$. Hence the sequence $\{\varphi(x_{k_i})\}$ converges to a limit $\bar\varphi$: $\varphi(x_{k_i})\to\bar\varphi$, $i\to\infty$, and $\|\varphi^{(r)}(x_{k_i})\|\to0$, $i\to\infty$, $r\le p$. The theorem is proved.

From these results we have the following corollaries.

Corollary 1. Under the conditions of Theorem 4 and with the relation $\|\varphi^{(r)}(x_k)[d_k]^r\|\ge c\,\|\varphi^{(r+1)}(x_k)\|$, we have
$$\lim_{k\to\infty}\rho(x_k,X_p^*)=0.$$

Corollary 2. Under the conditions of Theorem 4, there is a sequence $\{x_{k_i}\}\subset\{x_k\}$ such that $\lim_{i\to\infty}x_{k_i}=y\in X_p^*$. If, in addition, $\varphi^{(p+1)}(x^*)[z]^{p+1}>0$ for all $z\ne0$, then $x^*$ is the optimal solution.

3. The condition for the existence of a minimizing arc of quadratic type in the constrained minimization problem.

Consider the existence of a minimizing arc
$$x_k(a)=x_k+as_k+a^2d_k \qquad(11)$$
for the non-linear programming problem
$$f(x)\to\min,\qquad x\in A=\{x\in E_n\mid\varphi_i(x)\ge0,\ i=1,2,\dots,m\}, \qquad(12)$$
such that $x_k(a)\in A$ for $a\in[0,\varepsilon]$, $\varepsilon>0$, and the function $f(x_k(a))$ is decreasing for $a\in[0,\varepsilon]$, i.e. relation (8) is satisfied.
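Whether a candidate pair $(z,l)$ actually yields an arc of the form (11) that stays in $A$ and decreases $f$ can be checked numerically. The sketch below (Python) samples the arc on a grid of $a\in[0,\varepsilon]$; the functions `f` and `constraints` are user-supplied placeholders, and the check is only an illustration of the requirement just stated, not part of the paper's theory.

```python
import numpy as np

def is_minimizing_arc(f, constraints, x, z, l, eps=1e-1, samples=50):
    """Check on a finite grid that the arc x(a) = x + a z + a^2 l
    (i) stays in A = {x : g_i(x) >= 0 for all i} and
    (ii) makes f(x(a)) non-increasing, for a in [0, eps]."""
    grid = np.linspace(0.0, eps, samples + 1)
    prev = f(x)
    for a in grid[1:]:
        xa = x + a * z + a ** 2 * l
        if any(g(xa) < 0.0 for g in constraints):
            return False               # leaves the admissible set
        fa = f(xa)
        if fa > prev:
            return False               # f fails to decrease along the arc
        prev = fa
    return True


if __name__ == "__main__":
    # Corner of A = {x1 >= 0, x2 >= 0} with f = x1 + x2^2 (illustrative only):
    f = lambda x: x[0] + x[1] ** 2
    g = [lambda x: x[0], lambda x: x[1]]
    print(is_minimizing_arc(f, g, np.zeros(2), np.array([0.0, 1.0]), np.zeros(2)))
    # False: f increases along this arc.
    print(is_minimizing_arc(f, g, np.array([1.0, 1.0]),
                            np.array([-1.0, -1.0]) / np.sqrt(2), np.zeros(2)))
    # True: the arc stays feasible and f decreases.
```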

Before proving our main theorem on the existence of an arc (11), we consider the existence, at a point $x\in A$, of an arc
$$x(a)=x+az+a^2l,\qquad a\in[0,\varepsilon],\qquad\|z\|=1,\qquad x(a)\in A.$$
Given any point $x\in A$, we introduce the following system of sets:
$$I(x)=\{i\mid\varphi_i(x)=0\},\qquad Z(x)=\{z\in E_n\mid(\varphi_i'(x),z)\ge0,\ i\in I(x)\},$$
$$K(x,\Lambda)=\Bigl\{z\in E_n\Bigm|z=\sum_{i\in I(x)}\lambda_i\varphi_i'(x),\ \lambda_i\ge0\Bigr\},$$
$$Z^1(x)=\{z\in Z(x)\mid\exists\,i\in I(x):(z,\varphi_i'(x))=0\},\qquad I^1(x)=\{i\in I(x)\mid(z,\varphi_i'(x))=0,\ z\in Z^1(x)\},$$
and, recursively for $r=2,3,\dots,p$,
$$Z^r(x)=\{z\in Z^{r-1}(x)\mid\exists\,i\in I^{r-1}(x):\varphi_i^{(r)}(x)[z]^r=0\},\qquad I^r(x)=\{i\in I^{r-1}(x)\mid\varphi_i^{(r)}(x)[z]^r=0,\ z\in Z^r(x)\},$$
$$K^r(x)=\{i\in I^{r-1}(x)\mid\varphi_i^{(r)}(x)[z]^r<0\},\qquad I_+^r(x)=\{i\in I^{r-1}(x)\mid\varphi_i^{(r)}(x)[z]^r>0\},$$
together with
$$M_{z,l}(x)=\{z,l\in E_n\mid\|z\|=1,\ \varphi_i^{(k)}(x)=0,\ k=1,2,\dots,r-2,\ \varphi_i^{(r-1)}(x)[z]^{r-1}>0,\ i\in I(x)\}\cup\{z,l\in E_n\mid\varphi_i^{(r-1)}(x)[l]^{r-1}+\varphi_i^{(r)}(x)[z]^r>0,\ i\in I^{r-1}(x)\},$$
$$N_{z,l}(x)=\{z,l\in E_n\mid\|z\|=1,\ f^{(k)}(x)=0,\ k=1,2,\dots,q-2,\ f^{(q-1)}(x)[z]^{q-1}<0\}\cup\{z,l\in E_n\mid f^{(q-1)}(x)[l]^{q-1}+f^{(q)}(x)[z]^q<0\}.$$

Theorem 6. Let $\varphi_i(x)\in C^2(E_n)$, $i=1,2,\dots,m$, and let $x\in A$, $z\in Z(x)$, $\|z\|=1$. For the existence of an arc $x(a)=x+az+a^2l$ belonging to $A$ for all sufficiently small $a\ge0$ it is sufficient that at least one of the following conditions hold:
1) $(\varphi_i'(x),z)>0$ for all $i\in I(x)$;
2) $$\varphi_i^{(2)}(x)[z]^2>0,\qquad i\in I^1(x),\quad\text{if }K^1(x)\ne\varnothing; \qquad(13)$$
3) condition (13) holds and there exists $l\ne0$ such that $(l,\varphi_i'(x))+\varphi_i^{(2)}(x)[z]^2>0$, $i\in I^1(x)$.
The proof follows from Theorem 4 of /3/.

In our case $\varphi_i'(x)=0$, $i\in I(x)$, and we have:

Theorem 7. Let $\varphi_i(x)\in C^p(E_n)$ and $I^p(x)\ne\varnothing$, $i=1,2,\dots,m$. Then, for the existence of an arc (11) belonging to the set $A$, it is sufficient that vectors $l\ne0$, $z\in Z^p(x)$ exist such that
$$\varphi_i^{(p-1)}(x)[l]^{p-1}+\varphi_i^{(p)}(x)[z]^p>0,\qquad i\in I^{p-1}(x).$$

The proof follows from Theorem 5 of /3/.

We shall summarize our results by stating an existence theorem for a minimizing arc (11) in the non-linear programming problem, the proof of which follows at once from Theorems 4-7.

Theorem 8. Let the functions $f\in C^q(E_n)$, $\varphi_i(x)\in C^r(E_n)$, and let $f^{(k)}(x)=0$, $k=1,2,\dots,q-2$, $\varphi_i^{(k)}(x)=0$, $k=1,2,\dots,r-2$, $i\in I(x)$. If vectors $z\in E_n$ and $l\in E_n$ exist such that $M_{z,l}(x)\cap N_{z,l}(x)\ne\varnothing$, then the arc $x(a)=x+az+a^2l$ is a minimizing arc of type (11) for problem (12).

In conclusion, we mention an optimization method obtained from a non-linear approximation of the function at the next point of approximation $x_k$. It can be assumed without loss of generality that the function $\varphi(x)$ is convex and that the solution $x^*$ is sought as the root of the functional equation $F(x)=\varphi'(x)=0$. We expand the function $F(x)$ by Taylor's formula up to second order:
$$F(y)=F(x)+F'(x)[y-x]+\tfrac12F''(x)[y-x]^2+\omega(x,y),$$
where $\|\omega(x,y)\|=o(\|x-y\|^2)$, and we seek a point $y$ satisfying the relation
$$F(x)+F'(x)[y-x]+\tfrac12F''(x)[y-x]^2=0. \qquad(14)$$

In general, the solution $y$ of this equation is a non-linear function of $x$ and can only be found approximately. However, if we replace the unknown quantity $y-x$ by the vector $-(F'(x))^{-1}F(x)$ in the third term of (14) (i.e. by the classical Newton correction), the resulting equation is linear:
$$F(x)+\bigl\{F'(x)-\tfrac12F''(x)\bigl[(F'(x))^{-1}F(x)\bigr]\bigr\}[y-x]=0,$$
and we have for the point $y$ the expression
$$y=x-\bigl\{F'(x)-\tfrac12F''(x)\bigl[(F'(x))^{-1}F(x)\bigr]\bigr\}^{-1}F(x). \qquad(15)$$
In this case we have the rate-of-convergence estimate $\|y-x^*\|\le c_1\|x-x^*\|^3$. By following a similar procedure further, up to an approximation of $p$-th order, we arrive at expressions similar to (15), but the rate-of-convergence estimate becomes $\|y-x^*\|\le c_p\|x-x^*\|^{p+1}$. However, this class of methods will not be discussed here.
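A direct transcription of iteration (15) for a finite-dimensional system $F(x)=0$ might look as follows in Python/NumPy; the test system, the starting point and the fixed iteration count are illustrative assumptions only.

```python
import numpy as np

def modified_newton_step(F, J, H, x):
    """One iteration of (15):
        y = x - {F'(x) - 0.5 * F''(x)[(F'(x))^{-1} F(x)]}^{-1} F(x),
    where J(x) is the Jacobian F'(x) and H(x) is the array of second
    derivatives, H[i, j, k] = d^2 F_i / dx_j dx_k."""
    Fx = F(x)
    Jx = J(x)
    newton_dir = np.linalg.solve(Jx, Fx)                     # (F'(x))^{-1} F(x)
    correction = 0.5 * np.einsum('ijk,k->ij', H(x), newton_dir)
    return x - np.linalg.solve(Jx - correction, Fx)


if __name__ == "__main__":
    # Illustrative system F(x) = (x1^2 + x2^2 - 4, x1 - x2), root at (sqrt(2), sqrt(2)).
    F = lambda x: np.array([x[0] ** 2 + x[1] ** 2 - 4.0, x[0] - x[1]])
    J = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])
    H = lambda x: np.array([[[2.0, 0.0], [0.0, 2.0]],
                            [[0.0, 0.0], [0.0, 0.0]]])
    x = np.array([2.0, 1.0])
    for _ in range(5):
        x = modified_newton_step(F, J, H, x)
    print(x)   # close to (1.41421..., 1.41421...)
```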

REFERENCES

1. KARMANOV V.G., Mathematical programming (Matematicheskoe programmirovanie), Nauka, Moscow, 1975.

2. ALEKSEEV V.M., TIKHOMIROV V.M. and FOMIN S.V., Optimal control (Optimal'noe upravlenie), Nauka, Moscow, 1979.

3. DENISOV D.V. and TRET'YAKOV A.A., Properties of regular sets in Euclidean spaces, in: Computational methods and programming (Vychisl. metody i programmirovanie), No.31, Izd-vo MGU, Moscow, 1981.

Translated by D.E.B.