U.S.S.R. Comput. Maths. Math. Phys., Vol. 24, No. 4, pp. 14-18, 1984. Printed in Great Britain
0041-5553/84 $10.00+0.00 © 1985 Pergamon Press Ltd.
TWO SCHEMES FOR A NON-LINEAR METHOD OF OPTIMIZATION IN EXTREMAL PROBLEMS*

A.A. TRET'YAKOV

Two schemes of the method of non-linear descent are proposed for solving the unconstrained minimization problem $\varphi(x) \to \min$, $x \in E_n$: a method with independent choice of coefficients, and a method of quadratic type. Conditions are stated for the existence of a minimizing arc for the general problem of mathematical programming.

The methods considered below for solving both unconstrained and constrained extremal problems have a scheme which differs from the traditional linear scheme of iterative optimization methods. Most methods are constructed using the recurrence relation

$x_{k+1} = x_k + \alpha_k s_k$,   (1)

where $s_k$ is the direction of descent, chosen by a certain rule, $\alpha_k$ is the length of the step in this direction, and $x_{k+1}$ is the next approximation. The scheme in the present paper is

$x_{k+1} = x_k + \alpha_k s_k + \beta_k d_k$,   (2)

where $s_k$, $d_k$ are directions of descent, or the minimizing pair, which specifies the arc of descent, and $\alpha_k$, $\beta_k$ are the step lengths in these directions. The minimization in this scheme is not along a straight line, but along a certain set, specified by the directions $s_k$, $d_k$ and the coefficients $\alpha_k$, $\beta_k$. In this sense the process is non-linear with respect to the minimization set at each step.

Construction (2) is considered for the following reasons: to solve problems with strong ravines, where linear methods usually converge slowly or not at all, due to the influence of rounding errors; to solve problems of degenerate type, when the first derivatives of the objective function or of the constraint functions vanish at the minimum point; and to solve problems of non-linear programming with an admissible set of complex configuration. Below we propose two versions of the methods of scheme (2) and prove convergence theorems.

1.
Non-linear methods of minimization with independent choice of coefficients. The unconstrained extremal problem $\varphi(x) \to \min$, $x \in E_n$, will be solved by a method whose scheme is given by the relations

$x_{k+1} = x_k - \alpha_k s_k - \beta_k d_k$,   (3a)

$(\varphi'(x_k), s_k) > 0$, $(\varphi'(x_k), d_k) > 0$, $\|s_k\| = \|d_k\| = 1$,   (3b)

i.e., $-s_k$ and $-d_k$ are directions of decrease. It would be natural to seek the coefficients $\alpha_k$ and $\beta_k$ from the condition

$\varphi(x_k - \alpha_k s_k - \beta_k d_k) = \min_{\alpha, \beta} \varphi(x_k - \alpha s_k - \beta d_k)$.
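For a quadratic objective the two-parameter subproblem above can be solved in closed form, which gives a convenient illustration of one step of scheme (3). The sketch below is not taken from the paper: the names `phi`, `plane_step` and `line_step`, the test matrix, and the particular pair $s_k = \varphi'(x_k)/\|\varphi'(x_k)\|$, $d_k = \operatorname{sign}\varphi'(x_k)/\|\operatorname{sign}\varphi'(x_k)\|$ are assumptions, chosen only so that conditions (3b) hold. The signs of the computed coefficients are not constrained here.

```python
import numpy as np

# Illustrative assumption: phi(x) = 0.5 * x^T A x - b^T x, A symmetric positive definite.
def phi(A, b, x):
    return 0.5 * x @ A @ x - b @ x

def plane_step(A, b, x):
    """One step of scheme (3) with exact minimisation over (alpha, beta)."""
    g = A @ x - b                    # gradient phi'(x), assumed non-zero
    s = g / np.linalg.norm(g)        # (phi'(x), s) = ||g||_2 > 0, ||s|| = 1
    sg = np.sign(g)
    d = sg / np.linalg.norm(sg)      # (phi'(x), d) = ||g||_1 / ||sign g|| > 0, ||d|| = 1
    D = np.column_stack([s, d])      # n x 2 matrix holding the two directions
    # phi(x - D t) is quadratic in t = (alpha, beta); its stationarity
    # condition is (D^T A D) t = D^T g. lstsq also copes with s parallel to d.
    t, *_ = np.linalg.lstsq(D.T @ A @ D, D.T @ g, rcond=None)
    return x - D @ t

def line_step(A, b, x):
    """One step of the linear scheme (1): exact line search along -phi'(x)."""
    g = A @ x - b
    alpha = (g @ g) / (g @ A @ g)
    return x - alpha * g

A = np.diag([1.0, 10.0, 100.0])      # a mildly ravine-like quadratic
b = np.zeros(3)
x0 = np.array([1.0, 2.0, 3.0])
print(phi(A, b, line_step(A, b, x0)), phi(A, b, plane_step(A, b, x0)))
```

Because the line through $x_k$ along $-s_k$ lies inside the plane spanned by $s_k$ and $d_k$, the plane step can never give a larger value of $\varphi$ than the exact line search from the same point.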
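In general, however, this problem can only be solved approximately, which causes further difficulties. Henceforth it will always be assumed that the function $\varphi(x)$ belongs to the class $C^{1,1}(X_0)$, $X_0 = \{x \in E_n \mid \varphi(x) \le \varphi(x_0)\}$, and we shall seek coefficients $\alpha_k$, $\beta_k$ satisfying the condition

$\varphi(x_k) - \varphi(x_{k+1}) \ge q_1 \alpha_k (\varphi'(x_k), s_k) + q_1 \beta_k (\varphi'(x_k), d_k) - L \alpha_k \beta_k$,   (4)

where $L$ is the Lipschitz constant of the gradient of $\varphi(x)$, and $q_1$ is a coefficient. We shall prove below the existence of $\alpha_k$, $\beta_k$ guaranteeing that (4) holds. We have:

Lemma. Let a sequence $\{x_k\}$ be constructed according to scheme (3) and let the coefficients $\alpha_k$, $\beta_k$ satisfy condition (4). Then,

$\varphi(x_k) - \varphi(x_{k+1}) \ge L^{-1} q_1 A_k \|\varphi'(x_k)\|^2$,   (5)

where

$A_k = (1 - 2q_1)\varepsilon_k^2 + (1 - 2q_1)\gamma_k^2 - \frac{(1 - 2q_1)^2}{q_1}\,\varepsilon_k \gamma_k \ge 0$, $\varepsilon_k = \frac{(\varphi'(x_k), s_k)}{\|\varphi'(x_k)\|}$, $\gamma_k = \frac{(\varphi'(x_k), d_k)}{\|\varphi'(x_k)\|}$.

*Zh. vychisl. Mat. mat. Fiz., 24, 7, 986-992, 1984.

Proof. We put $\alpha_k = (1 - q)(\varphi'(x_k), s_k)/L$, $\beta_k = (1 - q)(\varphi'(x_k), d_k)/L$, $q_1 = q/2$. Then, with $q \in (1/2, 1)$, we have $A_k \ge 0$, and the Lipschitz condition on $\varphi'$ gives

$\varphi(x_k) - \varphi(x_{k+1}) \ge \alpha_k (\varphi'(x_k), s_k) + \beta_k (\varphi'(x_k), d_k) - \frac{L}{2}\|\alpha_k s_k + \beta_k d_k\|^2$.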
Hence we obtain at once inequality (4) and the lemma.

Before considering the convergence of methods (3), (4), we shall state some necessary propositions. Following /1/, we introduce the set of stationary points $X^* = \{x^* \mid \varphi'(x^*) = 0\}$. We put $X_0^* = X_0 \cap X^*$ and write two required conditions.

Condition 1. Given any $\varepsilon > 0$, a $\delta = \delta(\varepsilon) > 0$ exists such that, for all $x \in X_0 \setminus U_\varepsilon(X^*)$, we have $\|\varphi'(x)\| \ge \delta$.

Condition 2. Given any $k \in Z$, we have at least one of the relations $\varepsilon_k \ge \varepsilon > 0$ or $\gamma_k \ge \gamma > 0$.

Theorem 1. Let conditions 1 and 2 and the hypotheses of the lemma be satisfied for the function $\varphi(x)$, lower-bounded in the non-empty set $X_0^*$. Then

$\lim_{k \to \infty} \rho(x_k, X_0^*) = 0$.

Proof. By condition (4) and the lemma,

$\varphi(x_k) - \varphi(x_{k+1}) \ge C \|\varphi'(x_k)\|^2$, $C = L^{-1} A_k q_1$.
The sequence $\{\varphi(x_k)\}$ is monotonic and lower-bounded, so that $\varphi(x_k) - \varphi(x_{k+1}) \to 0$. Hence $\|\varphi'(x_k)\| \to 0$, $k \to \infty$, which proves the theorem.

We next obtain convergence rate estimates for the method of non-linear descent when solving the unconstrained minimization problem for a convex and strongly convex function.

Theorem 2. For the convex function $\varphi(x)$, let the hypotheses of the lemma and condition 2 hold. Then, if $\operatorname{diam} X_0 = \eta < \infty$, we have estimate (6), where $\mu_0 = \varphi(x_0) - \varphi^*$, $0 < C < L^{-1}\eta^{-2}$.

The proof follows obviously from the lemma and Theorem 9.3.3 of /1/.

Theorem 3. For the strongly convex function $\varphi(x)$, let the hypotheses of the lemma hold. If $\varepsilon_k > 0$, $\gamma_k > 0$, $k \in Z$, then
(7a)
*--1 Ilx~--r'11'C2p-'exp(-~~q,A,~);
(7b)
here, $\rho$ is the constant of strong convexity.

The proof follows from our lemma and Theorem 9.3.4 of /1/.

We have thus shown that methods (3), (4) converge and that the rate of convergence estimates (6), (7), similar to those obtained in /1/, hold. Notice that, at each iteration, these constructions give a point closer to the optimal than is given by type (1) methods. This is because, in scheme (1), the minimization is along the direction $s_k$, whereas in scheme (2) it is over the plane generated by the vectors $s_k$, $d_k$.

2. Non-linear methods of minimization of quadratic type. Consider the iterative minimization process (constrained or not), when the next approximation is obtained from the relation $x_{k+1} = x_k(\alpha_k)$, where $x_k(\alpha) = x_k + \alpha s_k + \alpha^2 d_k$. Henceforth we assume
that
[s+l
, O
k=Z.
Following
/2/, we put
IIq+“(2) II- sup IW” (3 [xl’ll IN-i and we define the minimizing Definition. if
m'p'(zJ[s,]‘
pair
(a,,&) at the point
We call the pair $~‘(zJ[d~‘cn..
(a, d*)
minimizing
xk. (or a pair of descent)
$'+"b~)[&l'+'~O provided
that
#"(x&)=0.
at the point xI r=i, 2..... p.
Denote by $SD(x_k)$ the set of minimizing pairs at the point $x_k$ and let us give an extended scheme of the method of quadratic descent for the unconstrained minimization problem.

Scheme of the method. The numbers $0 < \sigma, \alpha < 1$, and an initial point $x_0$, are specified. We construct the next approximation $x_{k+1}$ as follows: we put $i_{k+1}$ ($k$ is the iteration number) equal to the least integer from $i = 0, 1, \dots$, for which

$(s_k, d_k) \in SD(x_k)$. Then, $\alpha_k = \alpha^{i_{k+1}}$,

$x_{k+1} = x_k(\alpha_k)$.   (8b)

For this method we have:
Theorem 4. Let cpH?+'(E,), let the set c={zlcp(z)G~p(z~)) the minimizing sequence {(&,d,)} satisfy the relations 11'0"' (4
IIq”+” w r 41 ‘+‘ll24p converges
Proof.
ofE,,and
let
rskl’ll~c&+” (21)II, j=l,
9”’ (z*) =o, Then(zS
be a compactum
b*) II, r
2,...,r-i,
k=Z.
to X.'.
We consider
two cases:
Case 1. Sequence i, is not bounded in aggregate. Then, giving any number m=Z , no matter how large, there is a number k(m) such that, for i,,,,=m we have the inequality (it is assumed that %,,,,-~,,,,(a"))
Expanding
the left-hand
side by Taylor's
formula,
we obtain
cp(f.,“,)-cp(zrcm,)=cp”‘(zrcm,) [a”s*(,,l’+rcp”‘(zI,,,) Using inequality
(9) and the hypothesis
of the theorem,
(P"'(%v) [a”s*,,,l’+r~“‘(z,,,,)
From (10) there automatically
which implies,
as
m-t=,
Case 2. Sequence ity (81, we have
follows
that
we have
[amslcn,]'-'[a'"d,,,]+
(10)
the relation
IIm""(zkC~,) jl-+O or
(i*)is bounded
[a”s*(~n,],-‘[a’d,,,,]+o(arn’)
q+X,', k-+m.
in aggregate,
i.e.,
&Gl,
a&a>O.
In view of inequal-
By hypothesis, the sequence {cp(s&} is lower-bounded and r~(t~+~)--$(z~)+O,k+~. Hence the sequence (cp(zr)} converges to the element m': cp(z,)+cp',k+=. Noting that c,ll~'~'(~,)((~ll~'~'(z,) [rr]'ll*;cllcp("(zr)II, we can write ~(s~)-p,(z*+,)> "m;:)"
I
But
t
~~~(z~)y(tr+l)++& and hence (Icp"'(z~)((+O. With respect to the functional rate of convergence,
II‘ph+J - ‘p(4 II Q ak _$w+I, where
ar+O,
20.
it can be seen that, in case 1,
IIcp”’Pd II1
IIq+"(fJ11+0, k-c-, while in case 2
II‘p(a+d - cp(4 IIQ f.0n.a
SUP ;kEI'k.It+*,l m,"'(&)'*
where
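$\|\varphi^{(1)}(x_k)\| \to 0$, $k \to \infty$. The theorem is proved.

Denote by $X_p^* = \{x^* \mid \varphi^{(r)}(x^*) = 0,\ r = 1, 2, \dots, p\}$, and by $k_i$ the subscripts such that $\varphi^{(r)}(x_{k_i}) = 0$, $r = 1, 2, \dots, p-1$.

Theorem 5. Let the hypotheses of the previous theorem hold, and let $k_i \to \infty$ as $i \to \infty$. Then $\lim_{i \to \infty} \rho(x_{k_i}, X_p^*) = 0$.

Proof. Given any $\alpha_{k_i}$ ($\alpha_{k_i} \to 0$ or $\alpha_{k_i} \ge \alpha > 0$), we have the inequality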
1 -(L:"+"(p(p+ll(t*([d~,]~+I $0. a~:P(~T,,)lcl,l~(P+,)! ,
cp("*,)-9('.,+, BY hypothesis,
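the sequence $\{\varphi(x_{k_i})\}$ is lower-bounded and $\varphi(x_{k_{i+1}}) - \varphi(x_{k_i}) \to 0$, $i \to \infty$. Hence the sequence $\{\varphi(x_{k_i})\}$ converges to an element $\varphi^*$: $\varphi(x_{k_i}) \to \varphi^*$, $i \to \infty$, and $\|\varphi^{(r)}(x_{k_i})\| \to 0$, $i \to \infty$, $r \le p$. The theorem is proved.

From these results we have the following corollaries:

Corollary 1. Under the conditions of Theorem 1 and with the relation $\|\varphi^{(p)}(x_k)[d_k]^p\| \ge c\,\|\varphi^{(p)}(x_k)\|$, we have the relation

$\lim_{k \to \infty} \rho(x_k, X_p^*) = 0$.

Corollary 2. Under the conditions of Theorem 1, there is a sequence $\{x_{k_i}\} \subset \{x_k\}$ such that $\lim_{i \to \infty} x_{k_i} = x^* \in X_p^*$. If $\varphi^{(p+1)}(x^*)[z]^{p+1} > 0$ $\forall z \in E_n$, $z \ne 0$, then $x^*$ is the optimal solution.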
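The quadratic-descent scheme of this section can be sketched numerically in the simplest, non-degenerate case $\varphi'(x_k) \ne 0$. Everything specific in the sketch is an assumption rather than the paper's construction: the function name `arc_descent`, the Armijo-type acceptance test (used here as a stand-in for inequality (8)), the choice $s_k = -\varphi'(x_k)$, and the choice of $d_k$ as the second-order term of the steepest-descent trajectory, estimated from a gradient difference.

```python
import numpy as np

def arc_descent(phi, grad, x0, sigma=0.1, alpha=0.5, iters=50, max_halvings=60):
    """Backtracking along the arc x(a) = x + a*s + a**2*d (hypothetical acceptance rule)."""
    x = np.asarray(x0, dtype=float)
    history = [phi(x)]
    for _ in range(iters):
        g = grad(x)
        gnorm = np.linalg.norm(g)
        if gnorm < 1e-12:                    # stationary point reached
            break
        s = -g                               # first-order descent direction
        # Assumed choice of d: d = 0.5 * H g, the second-order Taylor term of the
        # steepest-descent trajectory, with H g taken from a gradient difference.
        h = 1e-6 / gnorm
        d = 0.5 * (grad(x + h * g) - g) / h
        a = 1.0                              # trial steps a = alpha**i, i = 0, 1, ...
        for _ in range(max_halvings):
            x_new = x + a * s + a**2 * d
            # Armijo-type sufficient decrease test (assumption, not the paper's (8)).
            if phi(x_new) <= phi(x) + sigma * a * (g @ s):
                x = x_new
                break
            a *= alpha
        history.append(phi(x))               # unchanged if no trial step was accepted
    return x, history

# Hypothetical ravine-like test function.
phi = lambda x: 0.5 * (x[0] ** 2 + 100.0 * x[1] ** 2)
grad = lambda x: np.array([x[0], 100.0 * x[1]])
x_final, history = arc_descent(phi, grad, [1.0, 1.0])
print(history[0], history[-1])
```

The step is accepted only when the objective decreases sufficiently, so the recorded values of $\varphi$ are non-increasing by construction; the degenerate case $\varphi^{(r)}(x_k) = 0$, $r \le p$, which is the main subject of the theorems above, is not covered by this sketch.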
3. The condition for the existence of a minimizing arc of quadratic type in the constrained minimization problem. Consider the existence of a minimizing arc

$x_k(\alpha) = x_k + \alpha s_k + \alpha^2 d_k$   (11)

for the non-linear programming problem

$f(x) \to \min$, $x \in A = \{x \in E_n \mid \varphi_i(x) \ge 0,\ i = 1, 2, \dots, m\}$,   (12)

such that $x_k(\alpha) \in A$, $\alpha \in [0, \varepsilon]$, $\varepsilon > 0$, and the function $f(x_k(\alpha))$ is decreasing for $\alpha \in [0, \varepsilon]$, i.e., relation (8) is satisfied. Before proving our main theorem on the existence of an arc (11), consider the existence at the point $x \in A$ of the arc

$x(\alpha) = x + \alpha z + \alpha^2 l$, $\alpha \in [0, \varepsilon]$, $\|z\| = 1$, $x(\alpha) \in A$.

Given any point $x \in A$, we introduce the following system of sets:

$I(x) = \{i \mid \varphi_i(x) = 0\}$, $Z(x) = \{z \in E_n \mid (\varphi_i'(x), z) \ge 0,\ i \in I(x)\}$.
K(z,M)-
(
zdLlz==
zD(I)-{z~E.(3iEI(I):(z, l.‘(z)-{id(Z))(z,
c lLch’(4. !Lao). A.” v;(r))-o), ~l'(t))PO,zEZ'(z)},
2'(2)-{zE2~(2)(31~1.*(~):p:*'(t)[zj*-0), r,'(l)-(&I,~(Z) ~~!"(2)[21*-0, 2=2'(t)), "+')(r)(z]'+'-0). P(z)-(Z&~-'(I)IZIi&'(r):cp, S-L Izp(2) - (id. (t) 19,‘p+” (=) [*]‘+I -0, z=??(r) ), K.‘(z)-(kd:-’
(I) Iqi’”
(2) [zl’+‘-co),
r-1.2,.
l,‘(z) - (jEr:-’ (I) Iql,“+” (2) [zj’+‘>O), M,,,(Z)-(z,hsE.I cpi“-“(I)[z]‘-‘>O, k-l,
~~~~~-1. $(4-O. i4(Z)}U(Z,l~E.I
.p.
r-1.2....,p,
k==i+2..-.,r-2. llz11-1, cp?(r)-0,
2,....r-2. z&'-'(z) ( 9,"-" (I)[!]'-'+~:"(r)[z]~>O.id. '-'(2)). N;,(Z)-(z,kE.(
II+I,
1"'(t)-0, k-l,2,....9-2.
/"-"(I)[z]~-'
K.‘(t)\k),
if
K’(r)*@;
(131
3) condition (13) holds and 3kE.. /+o:(l+);(z), 1)+@) (z)[z]'>O. The proof follows from Theorem 4 of /3/. In our case m:')(r)=0 and we have: arc
Thmrem 7. Let I&(z)EC~(E,) and .P(z)+@, i=l, 2,.. . m. Then, for the existence of an (11) belonging to set A, it is sufficient that vectors 1#O.:~Zp(r) exist such that w '~-"(2)[1]~-~+~:~'(2)[z]~>O,
iElI-*(I).
The proof follows from Theorem 5 of /3/.

We shall summarize our results by stating an existence theorem for a minimizing arc (11) in the non-linear programming problem, the proof of which follows at once from Theorems 4-7.

Theorem 8. Let the functions $f(x) \in C^q(E_n)$, $\varphi_i(x) \in C^r(E_n)$, $f^{(k)}(x) = 0$, $k = 1, 2, \dots, q-2$, $\varphi_i^{(k)}(x) = 0$, $k = 1, 2, \dots, r-2$, $i \in I(x)$. If vectors $z \in E_n$ and $l \in E_n$ exist such that $M(x) \cap N(x) \ne \emptyset$, then the arc $x(\alpha) = x + \alpha z + \alpha^2 l$ is a minimizing arc of type (11) for problem (12).

In conclusion, we mention an optimization method obtained from the non-linear approximation of the function at the next point of approximation $x_k$. It can be assumed without loss of generality that the function $\varphi(x)$ is convex and that the solution $x^*$ is sought as the root of the functional equation $F(x) = \varphi'(x) = 0$. We expand the function $F(x)$ by Taylor's formula up to second order:

$F(y) = F(x) + F'(x)[y - x] + \tfrac{1}{2}F''(x)[y - x]^2 + \omega(x, y)$,

where
$\|\omega(x, y)\| = o(\|x - y\|^2)$,

and we seek a point $y$ satisfying the relation

$F(x) + F'(x)[y - x] + \tfrac{1}{2}F''(x)[y - x]^2 = 0$.   (14)

In general, the solution $y$ of this equation is a non-linear function of $x$ and is found approximately. However, if we replace the unknown quantity $y - x$ by the vector $-(F'(x))^{-1}F(x)$ in one of the two factors of the third term of (14) (using the classical Newton method), the resulting equation is linear:

$F(x) + F'(x)[y - x] - \tfrac{1}{2}F''(x)[(F'(x))^{-1}F(x)][y - x] = 0$,

and we have for the point $y$ the expression

$y = x - \{F'(x) - \tfrac{1}{2}F''(x)(F'(x))^{-1}F(x)\}^{-1}F(x)$.   (15)
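Expression (15) is easy to try out in one dimension, where it coincides with the classical Halley iteration. The sketch below is only an illustration under assumptions: the function name `step_15` and the test problem $\varphi(x) = e^x - 2x$ (so that $F(x) = e^x - 2$ with root $\ln 2$) are hypothetical and do not come from the paper.

```python
import math

def step_15(F, dF, d2F, x):
    """Scalar form of (15): y = x - F / (F' - 0.5 * F'' * F / F')."""
    Fx, dFx, d2Fx = F(x), dF(x), d2F(x)
    newton = Fx / dFx                        # (F'(x))^{-1} F(x), the Newton correction
    return x - Fx / (dFx - 0.5 * d2Fx * newton)

# Hypothetical test problem: phi(x) = exp(x) - 2x is convex, F = phi' = exp(x) - 2.
F = lambda x: math.exp(x) - 2.0
dF = lambda x: math.exp(x)
d2F = lambda x: math.exp(x)

x = 0.0
for k in range(5):
    x = step_15(F, dF, d2F, x)
    print(k, abs(x - math.log(2.0)))         # error after each step
```

Starting from the same point, a single step of (15) lands closer to the root than a single Newton step $x - F(x)/F'(x)$ for this test problem.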
In this case we have the rate of convergence estimate $\|y - x^*\| \le c\,\|x - x^*\|^3$. By following a similar procedure further, up to approximation of $p$-th order, we arrive at expressions similar to (15), but the rate of convergence estimate becomes $\|y - x^*\| \le c\,\|x - x^*\|^{p+1}$. However, this class of methods will not be discussed here.

REFERENCES
1. KARMANOV V.G., Mathematical programming (Matematicheskoe programmirovanie), Nauka, Moscow, 1975.
2. ALEKSEEV V.M., TIKHOMIROV V.M. and FOMIN S.V., Optimal control (Optimal'noe upravlenie), Nauka, Moscow, 1979.
3. DENISOV D.V. and TRET'YAKOV A.A., Properties of regular sets in Euclidean spaces, in: Computational methods and programming (Vychisl. metody i programmirovanie), No. 31, Izd-vo MGU, Moscow, 1981.

Translated by D.E.B.