CONVERGENCE ESTIMATES FOR ITERATIVE MINIMIZATION METHODS*

V. G. KARMANOV

Moscow

(Received 23 May 1973)

ESTIMATES are obtained whereby the efficiency of the chosen iterative method can be established during the computational process, for unconstrained or constrained minimization problems. Topics relating to the convergence of quite a wide class of iterative methods are discussed.

We shall discuss the minimization of a convex differentiable functional $\varphi(x)$ on a convex closed set $X$ of a Hilbert space $H$. A sequence of elements $\{x^k\}$ will be called an iterative sequence (the term relaxation sequence is also sometimes used) if:

Condition 1. $x^k \in X$, $k = 0, 1, \ldots$.

Condition 2. $\varphi(x^{k+1}) \le \varphi(x^k)$, $k = 0, 1, \ldots$.

It will be assumed throughout that the following hold:

Condition 3. The convex set $X_0 = \{x \in X : \varphi(x) \le \varphi(x^0)\}$ is bounded: $\operatorname{diam} X_0 = d < +\infty$.

Condition 4. $\|\varphi'(x)\| \le K < +\infty$ for all $x \in X_0$.
Information required to prove the convergence theorems is given in Section 1. The theorems of Section 2 relate to convergence estimates for iterative methods of an extremely general type. In Section 3 we examine the convergence of various methods of unconstrained minimization. Finally, in Section 4, we discuss the convergence of methods for solving constrained minimization problems.
1. Required information
The following lemmas provide the basis for our future discussion.

Lemma 1

If the sequence $\{\mu_k\}$ is such that $\mu_k \ge 0$ and
$$\mu_k - \mu_{k+1} \ge a_k\mu_k^2, \qquad a_k \ge 0, \quad k = 0, 1, \ldots,$$
and $a_k > 0$ for at least one $k \le m - 1$, then
$$\mu_m \le \Big[\sum_{k=0}^{m-1} a_k\Big]^{-1}.$$
For, if every $\mu_k > 0$ (otherwise the assertion is trivial, since the $\mu_k$ are non-increasing), division by $\mu_k\mu_{k+1}$ gives
$$\frac{1}{\mu_{k+1}} - \frac{1}{\mu_k} \ge a_k\,\frac{\mu_k}{\mu_{k+1}} \ge a_k,$$
whence, on summing over $k = 0, 1, \ldots, m-1$,
$$\frac{1}{\mu_m} \ge \frac{1}{\mu_0} + \sum_{k=0}^{m-1} a_k \ge \sum_{k=0}^{m-1} a_k.$$

*Zh. vychisl. Mat. mat. Fiz., 14, 1, 3–14, 1974.
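Lemma 1 is easy to check numerically. The following sketch (plain Python; the value of $a_k$ and the starting value $\mu_0$ are illustrative, not from the paper) generates the extremal case $\mu_{k+1} = \mu_k - a_k\mu_k^2$ and verifies the stated bound along the whole sequence.

```python
# Numerical illustration of Lemma 1: if mu_k >= 0 and
# mu_k - mu_{k+1} >= a_k * mu_k**2, then mu_m <= 1 / sum(a_k, k < m).
# Illustrative choice: a_k = 0.1 and the equality-case recursion
# mu_{k+1} = mu_k - a_k * mu_k**2.

a = [0.1] * 50
mu = [2.0]                                # mu_0 > 0, illustrative
for ak in a:
    mu.append(mu[-1] - ak * mu[-1] ** 2)

# hypothesis of the lemma holds (with equality) ...
for k, ak in enumerate(a):
    assert mu[k] - mu[k + 1] >= ak * mu[k] ** 2 - 1e-12

# ... and so does its conclusion, for every m
for m in range(1, len(mu)):
    assert mu[m] <= 1.0 / sum(a[:m]) + 1e-12
print("Lemma 1 bound verified:", mu[-1], "<=", 1.0 / sum(a))
```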
Lemma 2

If the functional $\varphi(x)$ is strongly convex on a convex and closed set $X$, i.e., a $\lambda > 0$ exists such that, for any $x', x'' \in X$ and $0 \le \alpha \le 1$,
$$\varphi(\alpha x' + (1-\alpha)x'') \le \alpha\varphi(x') + (1-\alpha)\varphi(x'') - \alpha(1-\alpha)\lambda\|x' - x''\|^2,$$
and since there exists an $x^* \in X$ such that $\varphi(x^*) = \min_{x\in X}\varphi(x)$, we have, for any $x \in X$,
$$\lambda\|x - x^*\|^2 \le \varphi(x) - \varphi(x^*).$$
If, here, $\varphi(x) \in C^1(X)$, then
$$\varphi(x) - \varphi(x^*) \le \frac{1}{4\lambda}\,\|\varphi'(x)\|^2.$$
For the proof see e.g., [2].

Consider an element $v^k = x^k - \nu_k\varphi'(x^k)$, where $0 < \nu_k$, and denote by $p(v^k)$ the projection of $v^k$ onto $X$: $\|v^k - p(v^k)\| = \inf_{x\in X}\|v^k - x\|$.

Lemma 3

For any $x^k \in X$ and $\nu_k > 0$,
$$(\varphi'(x^k),\,x^k - p(v^k)) \ge \frac{1}{\nu_k}\,\|x^k - p(v^k)\|^2, \qquad \|x^k - p(v^k)\| \le \nu_k\,\|\varphi'(x^k)\|.$$
For the proof see e.g., [2].
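Both inequalities of Lemma 3 can be checked numerically. In the sketch below the setup is assumed for illustration: $X$ is a box in $R^n$, so the projection $p$ is coordinate-wise clipping, and $\varphi$ is a simple convex quadratic.

```python
import numpy as np

# Numerical check of Lemma 3 for X = {x : a <= x <= b} in R^n,
# phi(x) = 0.5*||x - z||^2 (convex, with gradient phi'(x) = x - z).
# p is the projection onto the box, i.e. coordinate-wise clipping.
rng = np.random.default_rng(0)
a, b = -np.ones(5), np.ones(5)
z = rng.normal(size=5)                    # data defining phi (illustrative)

def grad(x):                              # phi'(x)
    return x - z

def proj(y):                              # p(y): projection onto the box
    return np.clip(y, a, b)

for _ in range(100):
    xk = proj(rng.normal(size=5))         # a feasible point x^k
    nu = rng.uniform(0.01, 5.0)           # step nu_k > 0
    pv = proj(xk - nu * grad(xk))         # p(v^k)
    d = xk - pv
    # first inequality: (phi'(x^k), x^k - p(v^k)) >= ||x^k - p(v^k)||^2 / nu
    assert grad(xk) @ d >= d @ d / nu - 1e-10
    # second inequality: ||x^k - p(v^k)|| <= nu * ||phi'(x^k)||
    assert np.linalg.norm(d) <= nu * np.linalg.norm(grad(xk)) + 1e-10
print("Lemma 3 inequalities hold at all sampled points")
```

Both inequalities follow from the defining property of the projection, $(v^k - p(v^k),\,x - p(v^k)) \le 0$ for all $x \in X$, which the clipped point satisfies.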
2. General convergence estimates

Since $\varphi(x)$ is convex and continuous, it attains its minimum on the bounded and closed set $X_0$, and hence on $X$:
$$\min_{x\in X}\varphi(x) = \varphi(x^*).$$
We now observe that, if $x^k = p(v^k)$, then $\varphi(x^k) = \varphi(x^*)$, and in view of this the iterative process terminates. In the processes discussed below we assume that $\|x^k - p(v^k)\| > 0$.

Theorem 1

If conditions 1–4 are satisfied, then
$$\varphi(x^m) - \varphi(x^*) \le c\Big[\sum_{k=0}^{m-1}\nu_k^2\,\frac{\varphi(x^k)-\varphi(x^{k+1})}{\|x^k - p(v^k)\|^2}\Big]^{-1}, \eqno(1)$$
where $c = \mathrm{const} > 0$.
Proof. Put $\mu_k = \varphi(x^k) - \varphi(x^*)$. Since $\varphi(x)$ is convex on the convex set $X$, we have
$$\mu_k \le (\varphi'(x^k),\,x^k - x^*) = (\varphi'(x^k),\,x^k - p(v^k)) + (\varphi'(x^k),\,p(v^k) - x^*).$$
Since $p$ is the projection operator, for all $x \in X$ we have $(v^k - p(v^k),\,x - p(v^k)) \le 0$; taking $x = x^*$ and recalling that $v^k = x^k - \nu_k\varphi'(x^k)$,
$$\nu_k(\varphi'(x^k),\,p(v^k) - x^*) \le (x^k - p(v^k),\,p(v^k) - x^*) \le \|x^k - p(v^k)\|\,\|x^* - p(v^k)\|,$$
whence
$$\mu_k \le \Big(\|\varphi'(x^k)\| + \frac{1}{\nu_k}\|x^* - p(v^k)\|\Big)\|x^k - p(v^k)\|.$$
Using condition 4 and the boundedness guaranteed by condition 3, we have
$$\mu_k \le \Big(K + \frac{d}{\nu_k}\Big)\|x^k - p(v^k)\| = \frac{K\nu_k + d}{\nu_k}\,\|x^k - p(v^k)\|.$$
Further,
$$\mu_k - \mu_{k+1} = \frac{\varphi(x^k)-\varphi(x^{k+1})}{\|x^k - p(v^k)\|^2}\,\|x^k - p(v^k)\|^2 \ge \frac{\nu_k^2}{(K\nu_k + d)^2}\,\frac{\varphi(x^k)-\varphi(x^{k+1})}{\|x^k - p(v^k)\|^2}\,\mu_k^2,$$
whence, since the $\nu_k$ may be assumed bounded above, $\nu_k \le \bar\nu$, we can apply Lemma 1 with
$$a_k = \frac{\nu_k^2(\varphi(x^k)-\varphi(x^{k+1}))}{(K\bar\nu + d)^2\,\|x^k - p(v^k)\|^2}.$$
Applying Lemma 1, we get (1) with $c = (K\bar\nu + d)^2$.
Hence, in the case when $\varphi(x)$ is strongly convex, (3) indicates the strong convergence of the methods. Convergence with respect to the functional will be discussed throughout. We shall not quote estimates of strong convergence for the case of strongly convex functionals (unless there is some special reason for doing so), since the reader only needs to recall the second inequality of Lemma 2 in order to obtain the relevant results for himself.

It is clear that divergence of the series on the right-hand side of (1) guarantees convergence of the iterative process. It must be emphasized that, as $x^k$ approaches the minimum point $x^*$, we have $\|x^k - p(v^k)\| \to 0$. However, no success can be achieved if we try to apply the estimate (1) directly for determining a priori whether a given iterative process of minimization is convergent. For each concrete method (or class of methods), we need to be able to obtain a lower bound for the quantity $\varphi(x^k) - \varphi(x^{k+1})$, and later sections of this paper are in fact devoted to this topic. On the other hand, turning to the computational aspect of the minimization procedure, it can be said that the estimate (1) is often extremely promising. There are so far no really sound criteria for saying that a chosen minimization procedure is not worth continuing, so that we have either to stop the computation or else use another method. The estimate (1) allows us to accumulate, during the computational process, information on the speed of the approach of $\varphi(x^m)$ to $\varphi(x^*)$ and on the convergence of the method.
As long as
$$\nu_k^2\,\frac{\varphi(x^k)-\varphi(x^{k+1})}{\|x^k - p(v^k)\|^2} \ge \frac{\sigma}{k}, \qquad \sigma > 0,$$
the minimization process remains, so to speak, profitable. But as soon as the situation
$$\nu_k^2\,\frac{\varphi(x^k)-\varphi(x^{k+1})}{\|x^k - p(v^k)\|^2} = o\Big(\frac{1}{k}\Big)$$
starts to become established, and there is no promise of the situation changing if the process is continued, the process should be stopped.
It must be mentioned that, when the element $v^k \notin X$, the evaluation of the projection $p(v^k)$ often poses a problem as difficult as the initial one; it is only when the evaluation of $p(v^k)$ is relatively easy that it is worth carrying out in order to determine the convergence of the process. For instance, if the set $X$ in a finite-dimensional problem has the form $\{x : a \le x \le b\}$, we can find $p(v^k)$ as a result of comparing $n$ numbers. Notice that, in the context of problems of mathematical programming, the method of penalty functions leads to a problem of minimization on a set $X = \{x : x \ge 0\}$. If the evaluation of $p(v^k)$ for auxiliary purposes proves difficult, the estimate (1) may be "corrupted", making it easy to realize computationally, by using the relationship $\|x^k - p(v^k)\| \le \nu_k\|\varphi'(x^k)\|$ of Lemma 3.
The estimate changes as follows:
$$\varphi(x^m) - \varphi(x^*) \le c\Big[\sum_{k=0}^{m-1}\frac{\varphi(x^k)-\varphi(x^{k+1})}{\|\varphi'(x^k)\|^2}\Big]^{-1}. \eqno(4)$$
Admittedly, it is possible to construct an example in which
$$\nu_k^2\,\frac{\varphi(x^k)-\varphi(x^{k+1})}{\|x^k - p(v^k)\|^2} \ge \mathrm{const} > 0$$
for arbitrary $k$, and the process converges at a rate $O(1/m)$, whereas the series
$$\sum_{k=0}^{\infty}\frac{\varphi(x^k)-\varphi(x^{k+1})}{\|\varphi'(x^k)\|^2}$$
is convergent, so that the estimate (4) does not reflect the fact that the iterative process is convergent. If, however, $v^k \in X$ (as is always the case in unconstrained extremum problems), the estimate (4) is equivalent to (1).

Theorem 2
If $\varphi(x) \in C^{1,p}(X_0)$, i.e., $\varphi'(x)$ satisfies on the set $X_0$ the Hölder condition
$$\|\varphi'(x') - \varphi'(x'')\| \le L\|x' - x''\|^p, \qquad x', x'' \in X_0, \quad 0 < p \le 1,$$
and the sequence $\{x^k\}$ is such that
$$\alpha_k > 0, \eqno(5)$$
$$\|x^k - x^{k+1}\|^p \le \frac{(p+1)\alpha_k}{2L}\,\|\varphi'(x^k)\|, \eqno(6)$$
where
$$\alpha_k = \frac{(\varphi'(x^k),\,x^k - x^{k+1})}{\|\varphi'(x^k)\|\,\|x^k - x^{k+1}\|},$$
then
$$\varphi(x^k) - \varphi(x^{k+1}) \ge \frac{1}{2}\,(\varphi'(x^k),\,x^k - x^{k+1}). \eqno(7)$$
Proof. Consider
$$\varphi(x^k) - \varphi(x^{k+1}) = (\varphi'(x^k),\,x^k - x^{k+1}) - \int_0^1\big(\varphi'(x^k) - \varphi'(x^{k+1} + \tau(x^k - x^{k+1})),\;x^k - x^{k+1}\big)\,d\tau$$
$$\ge (\varphi'(x^k),\,x^k - x^{k+1}) - L\|x^k - x^{k+1}\|^{p+1}\int_0^1(1-\tau)^p\,d\tau = (\varphi'(x^k),\,x^k - x^{k+1}) - \frac{L}{p+1}\,\|x^k - x^{k+1}\|^{p+1}. \eqno(8)$$
From this and conditions (5) and (6), we get
$$\varphi(x^k) - \varphi(x^{k+1}) \ge (\varphi'(x^k),\,x^k - x^{k+1}) - \frac{1}{2}\,\alpha_k\|\varphi'(x^k)\|\,\|x^k - x^{k+1}\| = \frac{1}{2}\,(\varphi'(x^k),\,x^k - x^{k+1}),$$
i.e., the inequality (7).
The fact that the right-hand side of this inequality is positive follows from (5) and (6).

Corollary

The inequality (7) implies that condition 2 holds, while the Hölder condition implies that $\|\varphi'(x)\|$ is bounded on $X_0$. Hence, if $\varphi(x)$ is convex, Theorem 1 holds, and hence
$$\varphi(x^m) - \varphi(x^*) \le c\Big[\sum_{k=0}^{m-1}\nu_k^2\,\frac{(\varphi'(x^k),\,x^k - x^{k+1})}{\|x^k - p(v^k)\|^2}\Big]^{-1}. \eqno(9)$$
This inequality allows some conclusions to be drawn regarding the "mechanism" of an iterative process. For this, consider the following.

Example. Let the length $\|x^k - x^{k+1}\|$ of the iterative step satisfy the condition
$$\|x^k - x^{k+1}\| \ge q_k\,\|x^k - p(v^k)\|, \qquad 0 < q_k \le 1.$$
The quantity $q_k$ characterizes the rate of decrease of the step length.
For $\nu_k > 0$ we have
$$\nu_k^2\,\frac{(\varphi'(x^k),\,x^k - x^{k+1})}{\|x^k - p(v^k)\|^2} = \nu_k^2\,\frac{\alpha_k\|\varphi'(x^k)\|\,\|x^k - x^{k+1}\|}{\|x^k - p(v^k)\|^2} \ge \alpha_k q_k\nu_k\,\frac{\nu_k\|\varphi'(x^k)\|}{\|x^k - p(v^k)\|}.$$
But $\nu_k\|\varphi'(x^k)\|/\|x^k - p(v^k)\| \ge 1$ by Lemma 3, so that
$$\nu_k^2\,\frac{(\varphi'(x^k),\,x^k - x^{k+1})}{\|x^k - p(v^k)\|^2} \ge \alpha_k q_k\nu_k.$$
Hence, with $\alpha_k q_k\nu_k \ne o(1/k)$, the process is convergent.
We can now draw the following conclusion. While the quantity $[\varphi(x^k) - \varphi(x^{k+1})]/\|x^k - p(v^k)\|^2$ enables the actual convergence of a process to be established, the values of $\alpha_k$ (the cosine of the angle between the antigradient $-\varphi'(x^k)$ and the direction $x^{k+1} - x^k$) and $q_k$ assist in revealing why the iterative process is divergent.
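The role of $\alpha_k$ is easy to see in a contrived unconstrained experiment (everything below is illustrative, not from the paper): if the cosine $\alpha_k$ between the antigradient and the chosen direction is forced to tend to zero quickly, every step still decreases $\varphi$, yet the process stalls, in agreement with estimate (9).

```python
import numpy as np

# Illustration of the alpha_k diagnostic on phi(x) = 0.5*||x||^2 (so phi'(x) = x).
# We deliberately use directions whose cosine with the gradient is ~ 1/(k+1):
# sum alpha_k^2 then converges, and the functional gap stops decreasing to 0.
grad = lambda x: x

x = np.array([1.0, 0.0])
alphas, gaps = [], []
for k in range(1, 60):
    g = grad(x)
    c = 1.0 / (k + 1)                     # prescribed cosine alpha_k (contrived)
    u = g / np.linalg.norm(g)
    w = np.array([-u[1], u[0]])           # unit vector orthogonal to the gradient
    s = c * u + np.sqrt(1.0 - c * c) * w  # unit direction with (g, s) = c*||g||
    beta = c * np.linalg.norm(g)          # exact line search step for this phi
    x = x - beta * s
    alphas.append(c)
    gaps.append(0.5 * x @ x)              # phi(x^k) - phi(x*), x* = 0

# phi decreases monotonically, but the gap stays bounded away from 0:
assert all(g2 <= g1 + 1e-12 for g1, g2 in zip(gaps, gaps[1:]))
print("final gap:", gaps[-1], " last alpha:", alphas[-1])
```

With exact line search each step multiplies the gap by $1 - \alpha_k^2$, so the gap converges to a positive limit precisely because $\sum\alpha_k^2 < \infty$; monitoring $\alpha_k$ reveals this long before the stall is visible in the function values.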
Theorem 3

If conditions 1–4 are satisfied and the functional $\varphi(x)$ is strongly convex on the set $X$, then
$$\varphi(x^m) - \varphi(x^*) \le (\varphi(x^0) - \varphi(x^*))\prod_{k=0}^{m-1}\Big[1 - 4\lambda\,\frac{\varphi(x^k)-\varphi(x^{k+1})}{\|\varphi'(x^k)\|^2}\Big], \eqno(10)$$
$$\|x^m - x^*\|^2 \le \frac{1}{\lambda}\,(\varphi(x^0) - \varphi(x^*))\prod_{k=0}^{m-1}\Big[1 - 4\lambda\,\frac{\varphi(x^k)-\varphi(x^{k+1})}{\|\varphi'(x^k)\|^2}\Big]. \eqno(11)$$

Proof. By Lemma 2, $\|\varphi'(x^k)\|^2 \ge 4\lambda\mu_k$ for a strongly convex functional, so that
$$\mu_k - \mu_{k+1} = \frac{\varphi(x^k)-\varphi(x^{k+1})}{\|\varphi'(x^k)\|^2}\,\|\varphi'(x^k)\|^2 \ge 4\lambda\,\frac{\varphi(x^k)-\varphi(x^{k+1})}{\|\varphi'(x^k)\|^2}\,\mu_k,$$
whence
$$\mu_{k+1} \le \Big[1 - 4\lambda\,\frac{\varphi(x^k)-\varphi(x^{k+1})}{\|\varphi'(x^k)\|^2}\Big]\mu_k. \eqno(12)$$
As already said, $\mu_k > 0$, $k = 0, 1, \ldots$ (since, if $\mu_k = 0$, the process ends); in view of this, the factor in square brackets in (12) is non-negative. Applying (12) repeatedly, we obtain $\mu_m \le \mu_0\prod_{k=0}^{m-1}[\,\cdot\,]$, i.e., the inequality (10). The inequality (11) is an immediate consequence of (10) and the second inequality of Lemma 2.

Note 2. The estimates (10) and (11) are effective when $\|\varphi'(x^k)\| \to 0$ as $x^k \to x^*$, i.e., primarily for unconstrained minimization problems.
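The right-hand side of (10) can be tracked alongside the iteration. A sketch, under assumed illustrative data (a strongly convex quadratic $\varphi(x) = \frac12 x'Ax$, for which the strong-convexity constant of the text is $\lambda = \frac12\min\operatorname{eig}A$, and plain gradient steps):

```python
import numpy as np

# Tracking estimate (10):
#   phi(x^m)-phi(x*) <= (phi(x^0)-phi(x*)) * prod_k [1 - 4*lam*(phi(x^k)-phi(x^{k+1}))/||phi'(x^k)||^2]
# Illustrative setup: phi(x) = 0.5 x'Ax, so x* = 0 and lam = 0.5*min(eig(A)).
A = np.diag([1.0, 4.0])
lam = 0.5 * 1.0
phi = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

x = np.array([3.0, 1.0])
gap0 = phi(x)                             # phi(x^0) - phi(x*)
bound = gap0
for k in range(25):
    g = grad(x)
    x_new = x - 0.2 * g                   # step 0.2 < 2/L, L = 4
    factor = 1.0 - 4.0 * lam * (phi(x) - phi(x_new)) / (g @ g)
    assert 0.0 <= factor <= 1.0           # each factor shrinks the bound
    bound *= factor
    x = x_new
    assert phi(x) <= bound + 1e-12        # estimate (10) holds along the way

print("actual gap:", phi(x), " bound from (10):", bound)
```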
3. Unconstrained minimization problems
1. In this case $p(v^k) = v^k = x^k - \nu_k\varphi'(x^k)$ and the estimate (1) transforms into (4). (Notice that (4) can easily be obtained directly, since $\mu_k = \varphi(x^k) - \varphi(x^*) \le d\,\|\varphi'(x^k)\|$; we then have $c = d^2$ in (4). Notice also that condition 4 is superfluous as regards obtaining (4).) The inequality (9) is correspondingly modified:
$$\varphi(x^m) - \varphi(x^*) \le c\Big[\sum_{k=0}^{m-1}\frac{(\varphi'(x^k),\,x^k - x^{k+1})}{\|\varphi'(x^k)\|^2}\Big]^{-1}. \eqno(13)$$
Put
$$x^{k+1} = x^k - \beta_k s^k.$$

Theorem 4

If the convex functional $\varphi(x) \in C^{1,1}$, condition 3 is satisfied, and in addition
$$\alpha_k = \frac{(\varphi'(x^k),\,s^k)}{\|\varphi'(x^k)\|\,\|s^k\|} \ge 0,$$
with $\alpha_k > 0$ for at least one $k$, then, for an iterative process such that
$$\varphi(x^{k+1}) = \min_{\beta>0}\varphi(x_\beta^k), \qquad x_\beta^k = x^k - \beta s^k, \qquad k = 0, 1, \ldots,$$
we have
$$\varphi(x^m) - \varphi(x^*) \le \Big[c_1\sum_{k=0}^{m-1}\alpha_k^2\Big]^{-1}, \eqno(14)$$
where $c_1 = \mathrm{const} > 0$. If, in addition, $\varphi(x)$ is a strongly convex functional,
$$\varphi(x^m) - \varphi(x^*) \le (\varphi(x^0) - \varphi(x^*))\exp\Big[-\frac{2\lambda}{L}\sum_{k=0}^{m-1}\alpha_k^2\Big], \eqno(15)$$
$$\|x^m - x^*\| \le \Big[\frac{1}{\lambda}\,(\varphi(x^0) - \varphi(x^*))\Big]^{1/2}\exp\Big[-\frac{\lambda}{L}\sum_{k=0}^{m-1}\alpha_k^2\Big]. \eqno(16)$$

Proof. Since $\varphi(x) \in C^{1,1}$, the inequality (8) takes the form (with $p = 1$)
$$\varphi(x^k) - \varphi(x^{k+1}) \ge \varphi(x^k) - \varphi(x_\beta^k) \ge (\varphi'(x^k),\,x^k - x_\beta^k) - \frac{L}{2}\,\|x^k - x_\beta^k\|^2 = \alpha_k\beta\|s^k\|\,\|\varphi'(x^k)\| - \frac{L}{2}\,\beta^2\|s^k\|^2. \eqno(17)$$
And since this inequality holds for all $\beta > 0$, we obtain, with $\beta = \alpha_k\|\varphi'(x^k)\|/(L\|s^k\|)$,
$$\varphi(x^k) - \varphi(x^{k+1}) \ge \frac{\alpha_k^2}{2L}\,\|\varphi'(x^k)\|^2. \eqno(18)$$
From this and (4) we obtain (14), where $c_1 = 1/(2Lc)$. The inequalities (15) and (16) follow immediately from (10), (11), and (18).

Note 3. If, instead of $\varphi(x) \in C^{1,1}$, we require that $\varphi(x) \in C^{1,p}$, $0 < p \le 1$, it is easily shown by similar working that
$$\varphi(x^k) - \varphi(x^{k+1}) \ge c_2\,\alpha_k^{1+1/p}\,\|\varphi'(x^k)\|^{1+1/p},$$
while (14)–(16) are appropriately modified. For example,
$$\varphi(x^m) - \varphi(x^*) \le \Big[c_3\sum_{k=0}^{m-1}\alpha_k^{1+1/p}\,\|\varphi'(x^k)\|^{1/p-1}\Big]^{-1}.$$
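Inequality (18) is easy to check numerically. A sketch (assumed illustrative setup: a $C^{1,1}$ quadratic with $L = \max\operatorname{eig}A$; random descent directions $s^k$ and $\beta$ chosen by exact line search):

```python
import numpy as np

# Check of (18): with exact line search along any direction s^k making
# cosine alpha_k with the gradient,
#   phi(x^k) - phi(x^{k+1}) >= alpha_k^2/(2L) * ||phi'(x^k)||^2.
# Illustrative phi(x) = 0.5 x'Ax, Lipschitz constant L = max eig(A).
rng = np.random.default_rng(2)
A = np.diag([1.0, 3.0, 9.0])
L = 9.0
phi = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

x = rng.normal(size=3)
for k in range(50):
    g = grad(x)
    s = rng.normal(size=3)
    if g @ s < 0:
        s = -s                            # make (phi'(x^k), s^k) >= 0
    alpha = (g @ s) / (np.linalg.norm(g) * np.linalg.norm(s))
    beta = (g @ s) / (s @ A @ s)          # exact line search along s
    x_new = x - beta * s
    assert phi(x) - phi(x_new) >= alpha ** 2 / (2 * L) * (g @ g) - 1e-10
    x = x_new
print("final phi:", phi(x))
```

For a quadratic the exact-line-search decrease is $\frac12(g,s)^2/(s'As)$, and bounding $s'As \le L\|s\|^2$ gives exactly the right-hand side of (18).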
2. Methods of conjugate gradients. This class of methods can be described by the following scheme:
$$x^{k+1} = x^k - \beta_k s^k, \qquad k = 0, 1, \ldots, \eqno(19)$$
$$s^0 = \varphi'(x^0), \eqno(20)$$
$$s^k = \varphi'(x^k) - \xi_k s^{k-1}, \qquad k = 1, 2, \ldots, \eqno(21)$$
$$\varphi(x^{k+1}) = \min_{\beta>0}\varphi(x_\beta^k). \eqno(22)$$
The numbers $k$ for which $\xi_k = 0$ are called renewals, or instants of renewal. This scheme includes both the method of steepest descent (with $\xi_k = 0$, $k = 1, 2, \ldots$) and various versions of the conjugate gradient method, since these versions differ only in the method of choosing the $\xi_k$ [3].

Convergence. The estimate for these methods follows from (14) and the equation $(\varphi'(x^k),\,s^k) = \|\varphi'(x^k)\|^2$, the truth of which is obvious: from (21) we have
$$(\varphi'(x^{k+1}),\,s^{k+1}) = \|\varphi'(x^{k+1})\|^2 - \xi_{k+1}(\varphi'(x^{k+1}),\,s^k),$$
and from (22), $(\varphi'(x^{k+1}),\,s^k) = 0$. On observing that, from (21) and this orthogonality, $\|s^k\|^2 = \|\varphi'(x^k)\|^2 + \xi_k^2\|s^{k-1}\|^2$, we now get
$$\alpha_k^2 = \frac{\|\varphi'(x^k)\|^2}{\|\varphi'(x^k)\|^2 + \xi_k^2\|s^{k-1}\|^2}. \eqno(23)$$
It follows from this and (14) that the convergence is guaranteed provided that the series $\sum\alpha_k^2$ diverges. If, on the other hand,
$$\xi_k^2\|s^{k-1}\|^2 \le c\,\|\varphi'(x^k)\|^2, \eqno(24)$$
where $c = \mathrm{const} > 0$, the convergence of the process is at the rate $O(1/m)$ for a convex functional $\varphi(x)$ or, in the case of a strongly convex functional, at the rate of a geometric progression.

A convergence rate estimate for the method of steepest descent ($\xi_k = 0$, $k = 1, 2, \ldots$, so that $\alpha_k = 1$) follows in an obvious way from (14) and (23):
$$\varphi(x^m) - \varphi(x^*) \le \frac{1}{c_1 m},$$
while if the functional is strongly convex,
$$\varphi(x^m) - \varphi(x^*) \le (\varphi(x^0) - \varphi(x^*))\exp\Big[-\frac{2\lambda}{L}\,m\Big].$$

Take as an example the following method for choosing the $\xi_k$ (see [3]):
$$\xi_k = \frac{(\varphi'(x^k),\,\varphi'(x^k) - \varphi'(x^{k-1}))}{\|\varphi'(x^{k-1})\|^2}, \quad k \in I_1; \qquad \xi_k = 0, \quad k \in I_2; \qquad I_1 \cup I_2 = \{1, 2, \ldots\}.$$
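A minimal sketch of the scheme (19)–(22) with this choice of $\xi_k$ (the quadratic test functional, the renewal set $I_2$ taken as every fourth step, and the iteration count are all illustrative assumptions):

```python
import numpy as np

# Scheme (19)-(22) with the xi_k quoted in the text:
#   s^0 = phi'(x^0),  s^k = phi'(x^k) - xi_k s^{k-1},
#   xi_k = (phi'(x^k), phi'(x^k)-phi'(x^{k-1})) / ||phi'(x^{k-1})||^2  for k in I_1,
#   xi_k = 0 at the instants of renewal (k in I_2; here every 4th step).
# Illustrative quadratic phi(x) = 0.5 x'Ax with exact line search (22).
A = np.diag([1.0, 2.0, 5.0, 10.0])
phi = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

x = np.ones(4)
g = grad(x)
s = g.copy()                              # s^0 = phi'(x^0), by (20)
phis = [phi(x)]
for k in range(1, 21):
    beta = (g @ s) / (s @ A @ s)          # (22): exact minimization along s^{k-1}
    x = x - beta * s                      # (19)
    phis.append(phi(x))
    g_new = grad(x)
    xi = 0.0 if k % 4 == 0 else (g_new @ (g_new - g)) / (g @ g)
    s = g_new - xi * s                    # (21), in the sign convention of the text
    g = g_new
print("phi(x^0) =", phis[0], " phi(x^20) =", phis[-1])
```

By the orthogonality argument above, $(\varphi'(x^k), s^k) = \|\varphi'(x^k)\|^2 > 0$ at every step, so each exact line search strictly decreases $\varphi$ regardless of how the renewal instants are chosen.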
It was shown in [3] that, if $\varphi(x)$ is strongly convex, condition (24) holds, i.e., whatever the method of choosing the instants of renewal (i.e., the set of indices $I_2$), the process converges at the rate of a geometric progression.

3. Method of coordinate descent for finding an unconstrained minimum in finite-dimensional problems. Convergence of the method can be guaranteed if the following procedure is adopted. Let
$$\max_{1\le i\le n}\Big|\frac{\partial\varphi(x^k)}{\partial x_i}\Big| = \Big|\frac{\partial\varphi(x^k)}{\partial x_j}\Big| > 0$$
at the point $x^k$ of $n$-dimensional Euclidean space $E^n$. We choose $s^k = e_j$ or $-e_j$, where $e_j$ is the $j$-th coordinate vector, the sign being taken so that $(\varphi'(x^k), s^k) > 0$. Then
$$\alpha_k = \frac{|\partial\varphi(x^k)/\partial x_j|}{\|\varphi'(x^k)\|} \ge \frac{1}{\sqrt{n}}.$$
In this case
$$\sum_{k=0}^{m-1}\alpha_k^2 \ge \frac{m}{n},$$
and the process is guaranteed convergent at a rate $O(1/m)$ or, if $\varphi(x)$ is strongly convex, at a rate $O(q^m)$, $0 < q < 1$.
If the coordinate of descent is chosen at random at each step, a similar conclusion holds in the probabilistic sense: with a probability which tends to unity as $m \to \infty$, we have
$$\varphi(x^m) - \varphi(x^*) = O(1/m).$$
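The coordinate-descent rule just described (step along $\pm e_j$ for the coordinate of largest partial derivative) can be sketched as follows; the quadratic functional and the exact one-dimensional minimization are illustrative assumptions.

```python
import numpy as np

# Coordinate descent with the max-|partial derivative| rule of the text:
# s^k = +-e_j with j = argmax_i |d phi / d x_i|, so that alpha_k >= 1/sqrt(n).
# Illustrative phi(x) = 0.5 x'Ax; the step is an exact line search along e_j.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
phi = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

x = np.array([2.0, -1.0])
n = x.size
for k in range(60):
    g = grad(x)
    j = int(np.argmax(np.abs(g)))         # coordinate of largest partial derivative
    alpha = abs(g[j]) / np.linalg.norm(g)
    assert alpha >= 1.0 / np.sqrt(n) - 1e-12   # hence sum alpha_k^2 >= m/n
    x[j] -= g[j] / A[j, j]                # exact minimization along e_j
print("phi after 60 steps:", phi(x))
```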
4. Constrained minimization problems

1. Let $\varphi(x) \in C^{1,1}(X)$. Consider the following method of constructing the iterative sequence $\{x^k\}$:
$$x^{k+1} = x^k - \beta_k s^k.$$
In order to satisfy the conditions of Theorems 1 and 2, we have to choose $s^k$ and $\beta_k$ in such a way that
$$\frac{(\varphi'(x^k),\,s^k)}{\|\varphi'(x^k)\|\,\|s^k\|} \ge \alpha_k > 0, \eqno(25)$$
$$\varphi(x^{k+1}) = \min_{0<\beta\le\bar\beta_k}\varphi(x_\beta^k), \qquad x_\beta^k = x^k - \beta s^k, \qquad \bar\beta_k = \max_{\beta\in R}\beta, \qquad R = \{\beta > 0 : x^k - \beta s^k \in X\}.$$
If a subsequence $\{x^{k_i}\}$ can be found for which the steps are cut short by the boundary, $\beta_{k_i} = \bar\beta_{k_i} < (\alpha_{k_i}/L)\,\|\varphi'(x^{k_i})\|/\|s^{k_i}\|$, the following now becomes evident from the example of Section 2. If the point $x^{k_i}$ is not located in a sufficiently small neighbourhood of $x^*$, and hence $\|x^{k_i} - p(v^{k_i})\|$ is not a small quantity, the convergence of the process will be influenced by $\bar\beta_{k_i}$, characterizing the distance from $x^{k_i}$ along the direction $-s^{k_i}$ to the boundary of the set $X$. Hence the rate of approach of the point $x^{k_i}$ to the boundary along the direction $-s^{k_i}$ is significant for elements of this subsequence.

For problems of mathematical programming (finite-dimensional problems with a finite number of constraints), most of the well-known algorithms for iterative processes are so constructed that the number of small steps $\beta_{k_i}$ outside the neighbourhood of the minimum point is finite, i.e., the directions $-s^k$ are so chosen that the number of "walks" with a small step close to a given boundary surface (constraint) is finite. And, since the number of constraints is finite, the total number of small steps of the type $\beta_{k_i}$ will also be finite. In the well-known methods of feasible directions, special so-called anti-zigzag devices have been proposed for such purposes (see [4]). Finally, in gradient projection and conditional gradient methods, walks of this type are eliminated automatically, so that methods with renewals, where by "renewal" we mean, say, a step in accordance with the gradient projection method, become extremely promising.

As an example of the application of our theorems, convergence rate estimates will be discussed below for the gradient projection method.

2. Gradient projection methods. Let $\varphi(x) \in C^{1,1}(X_0)$. In the methods in question, $s^k = x^k - p(v^k)$; hence
$$x^{k+1} = x^k - \beta_k(x^k - p(v^k)), \qquad v^k = x^k - \nu_k\varphi'(x^k), \qquad 0 < \beta_k \le 1.$$
We shall use the estimate (8). Here it takes the form
$$\varphi(x^k) - \varphi(x^{k+1}) \ge \beta_k(\varphi'(x^k),\,x^k - p(v^k)) - \frac{L}{2}\,\beta_k^2\|x^k - p(v^k)\|^2.$$
Applying the first inequality of Lemma 3, we get
$$\varphi(x^k) - \varphi(x^{k+1}) \ge \frac{\beta_k}{\nu_k}\,\|x^k - p(v^k)\|^2 - \frac{L}{2}\,\beta_k^2\|x^k - p(v^k)\|^2 = \Big(\frac{\beta_k}{\nu_k} - \frac{L}{2}\,\beta_k^2\Big)\|x^k - p(v^k)\|^2.$$
Convergence of the gradient projection method is thus guaranteed by appropriate choice of the quantity
$$\frac{\beta_k}{\nu_k} - \frac{L}{2}\,\beta_k^2$$
(see [1]).
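A minimal sketch of the method with the decrease bound above (assumed setup: box constraints, a $C^{1,1}$ quadratic with $L = 1$, and constant $\nu_k$, $\beta_k$ with $\beta_k/\nu_k - (L/2)\beta_k^2 > 0$; all concrete values are illustrative):

```python
import numpy as np

# Gradient projection: v^k = x^k - nu*phi'(x^k), x^{k+1} = x^k - beta*(x^k - p(v^k)).
# Assumed setup: phi(x) = 0.5*||x - z||^2 (so L = 1) on the box 0 <= x <= 1.
z = np.array([1.5, -0.5, 0.25])           # unconstrained minimum, partly outside the box
phi = lambda x: 0.5 * np.sum((x - z) ** 2)
grad = lambda x: x - z
proj = lambda y: np.clip(y, 0.0, 1.0)

L, nu, beta = 1.0, 0.5, 1.0               # beta/nu - (L/2)*beta^2 = 1.5 > 0
x = np.full(3, 0.5)
for k in range(40):
    pv = proj(x - nu * grad(x))
    d = x - pv
    x_new = x - beta * d                  # here beta = 1, so x^{k+1} = p(v^k)
    # guaranteed decrease: phi(x^k)-phi(x^{k+1}) >= (beta/nu - (L/2)*beta^2)*||d||^2
    assert phi(x) - phi(x_new) >= (beta / nu - 0.5 * L * beta ** 2) * (d @ d) - 1e-12
    x = x_new

print("x:", x, " minimizer over box:", proj(z))
```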
As an example, the following schemes may be quoted (see [2]).

Scheme 1. $0 < \nu_k = \nu$, and $\beta_k$ is chosen from the condition
$$\varphi(x^{k+1}) = \min_{0\le\beta\le1}\varphi(x_\beta^k).$$
Then,
$$\varphi(x^k) - \varphi(x^{k+1}) \ge \varphi(x^k) - \varphi(x_\beta^k) \ge \Big(\frac{\beta}{\nu} - \frac{L}{2}\,\beta^2\Big)\|x^k - p(v^k)\|^2.$$
Since this inequality holds for any $0 \le \beta \le 1$,
$$\varphi(x^k) - \varphi(x^{k+1}) \ge \varepsilon_1\|x^k - p(v^k)\|^2, \qquad \varepsilon_1 > 0.$$

Scheme 2. $0 < \beta_0 \le \beta_k \le 1$, $0 < \nu_k \le \bar\nu < 2/L$. Then,
$$\varphi(x^k) - \varphi(x^{k+1}) \ge \beta_k\Big(\frac{1}{\nu_k} - \frac{L}{2}\,\beta_k\Big)\|x^k - p(v^k)\|^2 \ge \beta_0\Big(\frac{1}{\bar\nu} - \frac{L}{2}\Big)\|x^k - p(v^k)\|^2 = \varepsilon_2\|x^k - p(v^k)\|^2, \qquad \varepsilon_2 > 0.$$
Scheme 3. Here, $\beta_k = 1$, $k = 0, 1, \ldots$, while the choice of the $\nu_k$ is varied. We put $y_\nu = x^k - \nu\varphi'(x^k)$. The scheme consists of the following:
$$x^{k+1} = p(v^k), \qquad \varphi(p(v^k)) = \min_{\nu_0\le\nu\le\bar\nu}\varphi(p(y_\nu)), \qquad 0 < \nu_0 \le \bar\nu < \frac{2}{L}.$$
In this case, since $\beta_k = 1$ while $\nu_k \le \bar\nu$,
$$\varphi(x^k) - \varphi(x^{k+1}) = \varphi(x^k) - \varphi(p(v^k)) \ge \Big(\frac{1}{\bar\nu} - \frac{L}{2}\Big)\|x^k - p(v^k)\|^2 \ge \varepsilon_3\|x^k - p(v^k)\|^2, \qquad \varepsilon_3 > 0.$$
Scheme 4. The $\nu_k$ are chosen so that
$$(\varphi'(x^k),\,x^k - p(v^k)) \ge \gamma_k\|x^k - p(v^k)\|^2, \qquad 0 < \varepsilon_4 \le \gamma_k \le \bar\gamma < \infty$$
(by the first inequality of Lemma 3 this holds with $\gamma_k = 1/\nu_k$). Then (see [2], pp. 92–93), with $\beta_k = 1$,
$$\varphi(x^k) - \varphi(x^{k+1}) \ge \frac{1}{2}\,(\varphi'(x^k),\,x^k - p(v^k)),$$
while with $\beta_k = \lambda_k(\varphi'(x^k),\,x^k - p(v^k))/\|x^k - p(v^k)\|^2$, $\varepsilon_4 \le \lambda_k \le 1$,
$$\varphi(x^k) - \varphi(x^{k+1}) \ge \varepsilon_5(\varphi'(x^k),\,x^k - p(v^k)) \ge \varepsilon_4\varepsilon_5\|x^k - p(v^k)\|^2.$$
Hence, in either case, $\varphi(x^k) - \varphi(x^{k+1}) \ge \varepsilon_6\|x^k - p(v^k)\|^2$, $\varepsilon_6 > 0$.

If we additionally require that $0 < \nu_0 \le \nu_k$, we obtain in all the schemes quoted
$$\varphi(x^m) - \varphi(x^*) \le \frac{c}{m},$$
where $c = \mathrm{const} > 0$ is suitably defined for each scheme.

Translated by D. E. Brown

REFERENCES
1. LYUBICH, Yu. I. and MAISTROVSKII, G. D. General theory of relaxation processes for convex functionals. Usp. Mat. Nauk, 25, 1, 57–112, 1970.

2. LEVITIN, E. S. and POLYAK, B. T. Constrained minimization methods. Zh. vychisl. Mat. mat. Fiz., 6, 5, 787–823, 1966.

3. POLYAK, B. T. Conjugate gradient method in extremum problems. Zh. vychisl. Mat. mat. Fiz., 9, 4, 808–821, 1969.

4. DEM'YANOV, V. F. and RUBINOV, A. M. Approximate Methods for Solving Extremal Problems (Priblizhennye metody resheniya ekstremal'nykh zadach), Izd-vo LGU, Leningrad, 1968.

5. ZOUTENDIJK, G. Methods of Feasible Directions, Elsevier, 1960.