Available online at www.sciencedirect.com
Applied Mathematics and Computation 194 (2007) 102–107 www.elsevier.com/locate/amc
Two conditions concerning Newton's method

Min Wu
Department of Mathematics, Zhejiang University, Zhejiang, Hangzhou 310027, PR China
Abstract
In this note, we compare two conditions concerning Newton's method and find that, even under the weaker of the two conditions, Newton's method converges and admits a better error bound than under the relatively strong one.
© 2007 Elsevier Inc. All rights reserved.
Keywords: Newton's method; Majorizing function; Center Lipschitz condition
1. Introduction
In this note, we investigate the convergence of Newton's method, i.e. the iterative scheme
$$x_{n+1} = x_n - f'(x_n)^{-1} f(x_n), \qquad n = 0, 1, \ldots, \tag{1.1}$$
used to solve the equation
$$f(x) = 0, \tag{1.2}$$
where $f$ is an operator defined on a convex subset $D$ of a Banach space $E_1$ with values in a Banach space $E_2$; in other words, $f : D \subseteq E_1 \to E_2$. Many authors have studied the convergence of the sequence (1.1) towards a solution of (1.2) under the hypotheses of the Kantorovich theorem (see [4]) or closely related ones (see [5,7]). Roughly speaking, those results assume that the second Fréchet derivative $f''$ is continuous and bounded in $D$, or that $f'$ is Lipschitz continuous in $D$. To obtain better error bounds, the authors of [2,3] studied the convergence of Newton's method under the assumption that $f''$ is Lipschitz continuous in $D$. Recently, Argyros (see [1]) investigated the method under the assumption that $f$ is $m$-times Fréchet differentiable. To state his result, let $\bar B(x_0, r) = \{x \in D : \|x - x_0\| \le r\}$ and $B(x_0, r) = \{x \in D : \|x - x_0\| < r\}$. The condition in [1] can then be formulated as follows.
E-mail address: [email protected]
0096-3003/$ - see front matter © 2007 Elsevier Inc. All rights reserved.
doi:10.1016/j.amc.2007.04.035
M. Wu / Applied Mathematics and Computation 194 (2007) 102–107
Condition A. Let $m \ge 2$ be a positive integer, $a_2 > 0$, $\eta > 0$, and $a_i \ge 0$ for $3 \le i \le m + 1$. Further, let $f$ be an $m$-times Fréchet differentiable operator. Assume there exists $x_0 \in D$ such that
$$\|f'(x_0)^{-1} f(x_0)\| \le \eta, \tag{1.3}$$
$$\|f'(x_0)^{-1} f^{(i)}(x_0)\| \le a_i, \qquad i = 2, \ldots, m, \tag{1.4}$$
$$\|f'(x_0)^{-1}[f^{(m)}(x) - f^{(m)}(x_0)]\| \le a_{m+1}\|x - x_0\| \quad \text{for all } x \in D, \tag{1.5}$$
$$p(s) \le 0, \tag{1.6}$$
where $p$ is defined by
$$p(r) = \frac{a_{m+1}}{(m+1)!} r^{m+1} + \frac{a_m}{m!} r^m + \cdots + \frac{a_2}{2} r^2 - r + \eta$$
and $s$ is such that $p'(s) = 0$. He showed, among other results, the following.

Theorem A. Under Condition A, the iterative sequence $\{x_n\}$ $(n \ge 0)$ generated by (1.1) is well defined and contained in $\bar B(x_0, r^*)$. Moreover, $x_n$ converges to a solution $x^*$ of (1.2), which is unique in $\bar B(x_0, r^*) \cup B(x_0, r^{**})$, where $r^*$ and $r^{**}$ are the only two positive zeros of $p(r)$. Furthermore, the following error estimates hold for all $n \ge 0$:
$$\|x_{n+1} - x_n\| \le r_{n+1} - r_n, \tag{1.7}$$
$$\|x_n - x^*\| \le r^* - r_n, \tag{1.8}$$
where
$$r_{n+1} = r_n - \frac{p(r_n)}{p'(r_n)}, \qquad r_0 = 0, \quad n = 0, 1, \ldots$$
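All quantities in Condition A and Theorem A are scalar, so they can be computed directly. The Python sketch below is illustrative only: the coefficients $\eta$ and $a_2, \ldots, a_{m+1}$ are hypothetical inputs passed as a dict, and the helper names are ours, not from [1]. It locates $s$ with $p'(s) = 0$ by bisection (possible because $p'(0) = -1$ and $p'$ is increasing when $a_2 > 0$, $a_i \ge 0$), checks (1.6), and runs the majorizing sequence $r_n$, which increases to the smallest positive zero $r^*$ since $p$ is convex and decreasing on $[0, s]$.

```python
from math import factorial


def make_p(eta, a):
    """p(r) = sum_k a_k r^k / k! - r + eta and its derivative p'(r).

    `a` is a dict {k: a_k} for k = 2, ..., m+1 (a hypothetical input format).
    """
    def p(r):
        return sum(ak * r**k / factorial(k) for k, ak in a.items()) - r + eta

    def dp(r):
        return sum(ak * r**(k - 1) / factorial(k - 1) for k, ak in a.items()) - 1.0

    return p, dp


def bisect(g, lo, hi, tol=1e-14):
    """Root of g in [lo, hi], assuming g(lo) and g(hi) differ in sign."""
    g_lo = g(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if (g_mid < 0) == (g_lo < 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def theorem_a_data(eta, a, n_steps=20):
    """Return s, p(s), r* and the majorizing sequence r_0, ..., r_{n_steps}."""
    p, dp = make_p(eta, a)
    # p'(0) = -1 and p' is increasing, so doubling finds a bracket for s.
    hi = 1.0
    while dp(hi) <= 0:
        hi *= 2.0
    s = bisect(dp, 0.0, hi)
    if p(s) > 0:
        raise ValueError("condition (1.6) fails: p(s) > 0")
    r_star = bisect(p, 0.0, s)  # smallest positive zero of p
    r = [0.0]                   # r_0 = 0
    for _ in range(n_steps):
        r.append(r[-1] - p(r[-1]) / dp(r[-1]))
    return s, p(s), r_star, r
```

For example, `theorem_a_data(0.2, {2: 1.0, 3: 1.0})` corresponds to $m = 2$ with $p(r) = r^3/6 + r^2/2 - r + 0.2$; here $s = \sqrt{3} - 1$ and $p(s) \approx -0.199 < 0$, so (1.6) holds.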
This result generalizes the classical convergence results for Newton's method. On the other hand, in 1999 Wang (see [6]) considered a similar problem under quite different conditions. Let $\rho(x) = \|x - x_0\|$, $\rho(x x') = \rho(x) + \|x' - x\|$, and let $L(u)$ be a positive nondecreasing integrable function on $[0, d]$ for some $d > 0$. For some $\beta > 0$, denote
$$h(t) := \beta - t + \int_0^t L(u)(t - u)\,du, \qquad 0 \le t \le R, \tag{1.9}$$
where $R$ satisfies $\int_0^R L(u)(R - u)\,du = 1$. We need the following condition.
Condition B. Suppose that $f$ has a continuous derivative in $\bar B(x_0, d)$ and that $f'(x_0)^{-1}$ exists. Let $f'(x_0)^{-1} f'$ satisfy the so-called center Lipschitz condition in the inscribed sphere with the average $L$, i.e. for all $x \in B(x_0, d)$ and $x' \in \bar B(x, d - \rho(x))$ there holds
$$\|f'(x_0)^{-1}(f'(x') - f'(x))\| \le \int_{\rho(x)}^{\rho(x x')} L(u)\,du, \tag{1.10}$$
where $\rho(x x') \le d$. Let $b = \int_0^{d_0} L(u)\,u\,du$ with $d_0$ satisfying $\int_0^{d_0} L(u)\,du = 1$. Suppose
$$\beta = \|f'(x_0)^{-1} f(x_0)\| \le b \tag{1.11}$$
and $t^* \le d$, where $t^*$ and $t^{**}$ are the two positive zeros of $h(t)$.
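The scalar data of Condition B can likewise be computed numerically for a given average $L$. Since $h'(t) = -1 + \int_0^t L(u)\,du$ and $h'' = L > 0$, the point $d_0$ is the minimum point of $h$ and $h(d_0) = \beta - b$; thus (1.11) says exactly that $h$ has a zero in $(0, d_0]$. The Python sketch below is a minimal illustration under our own assumptions (composite Simpson quadrature and bisection; the helper names are hypothetical). It reproduces the values $d_0 = \sqrt{2}$, $b = 2\sqrt{2}/3$ and $d_0 = 2\sqrt{2}$, $b = 4\sqrt{2}/3$ quoted for Examples 1 and 2 in Section 3.

```python
def simpson(g, lo, hi, n=2000):
    """Composite Simpson rule on [lo, hi] with an even number n of panels."""
    if hi == lo:
        return 0.0
    step = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * step)
    return total * step / 3


def bisect(g, lo, hi, tol=1e-12):
    """Root of g in [lo, hi], assuming g(lo) < 0 <= g(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def condition_b_data(L, beta):
    """Return d_0, b and the majorizing function h of (1.9) for an average L."""
    def mass(d):  # int_0^d L(u) du - 1, increasing in d since L > 0
        return simpson(L, 0.0, d) - 1.0

    hi = 1.0
    while mass(hi) < 0:
        hi *= 2.0
    d0 = bisect(mass, 0.0, hi)
    b = simpson(lambda u: L(u) * u, 0.0, d0)

    def h(t):
        return beta - t + simpson(lambda u: L(u) * (t - u), 0.0, t)

    return d0, b, h
```

With `condition_b_data(lambda u: u, 1 / 3)` (Example 1) one finds $\beta \le b$, whereas `condition_b_data(lambda u: u / 4, 2.0)` (Example 2) gives $\beta > b$.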
One of the results in [6] is the following.

Theorem B. Under Condition B, the sequence generated by (1.1) is well defined for all $n$ and converges to a solution $x^*$ of (1.2) which satisfies
$$x^* \in \bar B(x_1, t^* - \beta) \subseteq \bar B(x_0, t^*). \tag{1.12}$$
Moreover, for all $n \ge n_0 \ge 0$ the best possible error bounds are
$$\|x^* - x_n\| \le (t^* - t_n)\left(\frac{\|x^* - x_{n_0}\|}{t^* - t_{n_0}}\right)^{2^{n - n_0}} \tag{1.13}$$
and
$$\frac{2\|x_{n+1} - x_n\|}{1 + \sqrt{1 + 4\dfrac{t^* - t_{n+1}}{(t^* - t_n)^2}\,\|x_{n+1} - x_n\|}} \le \|x^* - x_n\| \le (t^* - t_n)\left(\frac{\|x_{n_0+1} - x_{n_0}\|}{t_{n_0+1} - t_{n_0}}\right)^{2^{n - n_0}}, \tag{1.14}$$
where $t_0 = 0$ and $t_n$ is defined by
$$t_{n+1} = t_n - \frac{h(t_n)}{h'(t_n)}, \qquad n = 0, 1, \ldots \tag{1.15}$$
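To illustrate (1.13) and (1.15) concretely, the Python sketch below borrows the data of Example 1 in Section 3: $f(x) = x^3/6 - x + 1/3$, $x_0 = 0$, $L(u) = u$, $\beta = 1/3$. With $L(u) = u$, (1.9) gives $h(t) = 1/3 - t + t^3/6$, which is exactly $f$; hence $t_n = x_n$ up to rounding, and (1.13) holds with equality, so for this $f$ the bound cannot be improved. The limits $x^*$ and $t^*$ are approximated by the last iterates, and all helper names are hypothetical.

```python
def newton_sequence(F, dF, z0, n_steps):
    """Scalar Newton iterates z_{n+1} = z_n - F(z_n) / F'(z_n), as in (1.1) and (1.15)."""
    zs = [z0]
    for _ in range(n_steps):
        z = zs[-1]
        zs.append(z - F(z) / dF(z))
    return zs


def example1_sequences(n_steps=8):
    """Newton iterates x_n for f and majorizing iterates t_n for h (Example 1 data)."""
    f = lambda x: x**3 / 6 - x + 1 / 3
    df = lambda x: x**2 / 2 - 1
    beta = 1 / 3
    # (1.9) with L(u) = u: h(t) = beta - t + t^3 / 6.
    h = lambda t: beta - t + t**3 / 6
    dh = lambda t: t**2 / 2 - 1
    return newton_sequence(f, df, 0.0, n_steps), newton_sequence(h, dh, 0.0, n_steps)


def bound_113(x, t, x_star, t_star, n, n0):
    """Right-hand side of (1.13); needs t_star > t[n0], i.e. n0 before convergence."""
    ratio = abs(x_star - x[n0]) / (t_star - t[n0])
    return (t_star - t[n]) * ratio ** (2 ** (n - n0))


x, t = example1_sequences()
x_star, t_star = x[-1], t[-1]  # both sequences have converged after 8 steps
for n in range(4):
    print(n, abs(x_star - x[n]), t_star - t[n], bound_113(x, t, x_star, t_star, n, 0))
```

The loop also lets one compare the steps $|x_{n+1} - x_n|$ with $t_{n+1} - t_n$, the estimate (1.7) that, as noted below, is contained in the proof of Theorem B.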
On the one hand, it is easy to see that the assertions of Theorem B are essentially stronger than those of Theorem A. On the other hand, the two conditions (Conditions A and B) are quite different. One may therefore ask whether Condition A is essentially weaker than Condition B. In this note, we show that, in fact, Condition B is weaker than Condition A. Consequently, Theorem B can be used to improve the assertions of Theorem A. Our main result is the following.

Theorem 1.1. If $f$ satisfies Condition A, then $f$ satisfies Condition B. Consequently, under Condition A the sequence $\{x_n\}$ and $x^*$ satisfy (1.12)–(1.14).

2. Proof
We begin with the following simple fact.

Lemma 2.1. In Condition B, a sufficient condition for (1.10) is that $f$ is twice Fréchet differentiable and
$$\|f'(x_0)^{-1} f''(x)\| \le L(\rho(x)). \tag{2.1}$$
Proof. We have to verify that (2.1) implies (1.10). To this end, we notice
$$f'(x') - f'(x) = \int_0^1 f''(x + s(x' - x))(x' - x)\,ds.$$
Hence,
$$\|f'(x_0)^{-1}(f'(x') - f'(x))\| \le \int_0^1 \|f'(x_0)^{-1} f''(x + s(x' - x))\|\,ds\,\|x' - x\|.$$
Since $L(u)$ is nondecreasing, we conclude from (2.1) that
$$\begin{aligned}
\|f'(x_0)^{-1}(f'(x') - f'(x))\| &\le \int_0^1 L(\|x + s(x' - x) - x_0\|)\,ds\,\|x' - x\| \\
&\le \int_0^1 L(\|x - x_0\| + s\|x' - x\|)\,ds\,\|x' - x\| \\
&= \int_{\|x - x_0\|}^{\|x - x_0\| + \|x' - x\|} L(u)\,du \\
&= \int_{\rho(x)}^{\rho(x x')} L(u)\,du,
\end{aligned}$$
which gives (1.10). ∎
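In the scalar case, Lemma 2.1 can be sanity-checked numerically. The sketch below borrows the function of Example 1 in Section 3, $f(x) = x^3/6 - x + 1/3$ with $x_0 = 0$, for which $f'(x_0) = -1$, $f''(x) = x$, and (2.1) holds with $L(u) = u$. Both sides of (1.10) then have closed forms, $|x'^2 - x^2|/2$ and $((|x| + |x' - x|)^2 - x^2)/2$, and the code samples random pairs $(x, x')$ to confirm that the left side never exceeds the right side. A random check is of course no proof, and the helper names are hypothetical.

```python
import random


def lhs_110(x, xp):
    """|f'(x0)^{-1}(f'(x') - f'(x))| with f'(x) = x^2/2 - 1 and f'(x0) = f'(0) = -1."""
    return abs(((xp**2 / 2 - 1) - (x**2 / 2 - 1)) / -1.0)


def rhs_110(x, xp):
    """int_{rho(x)}^{rho(x x')} L(u) du with L(u) = u, rho(x) = |x|, rho(x x') = |x| + |x' - x|."""
    lo = abs(x)
    hi = abs(x) + abs(xp - x)
    return (hi**2 - lo**2) / 2


def max_violation(samples=10_000, radius=1.0, seed=0):
    """Largest lhs - rhs over random pairs x, x' in [-radius, radius] (should be <= 0)."""
    rng = random.Random(seed)
    worst = float("-inf")
    for _ in range(samples):
        x = rng.uniform(-radius, radius)
        xp = rng.uniform(-radius, radius)
        worst = max(worst, lhs_110(x, xp) - rhs_110(x, xp))
    return worst
```

Equality occurs when $x$ and $x'$ have the same sign and $|x'| \ge |x|$, so the maximum violation is zero up to rounding rather than strictly negative.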
We are now in a position to prove Theorem 1.1.

Proof of Theorem 1.1. We first prove that if $x \in \bar B(x_0, r)$ and $r \in [0, r^*]$, then
$$\|f'(x_0)^{-1} f''(x)\| \le p''(\|x - x_0\|). \tag{2.2}$$
For this purpose, let $e$, $b_1$, $b_i$, $i = 2, \ldots, m$, be defined by $e = x - x_0$, $b_1 = x_0 + s_1 e$, $b_i = x_0 + s_i(b_{i-1} - x_0)$, $s_i \in [0, 1]$. Thus (see also [1])
$$\begin{aligned}
f'(x_0)^{-1} f''(x) &= f'(x_0)^{-1} f''(x_0) + f'(x_0)^{-1}[f''(x) - f''(x_0)] \\
&= f'(x_0)^{-1} f''(x_0) + f'(x_0)^{-1}\int_0^1 f'''[x_0 + s_1(x - x_0)](x - x_0)\,ds_1.
\end{aligned}$$
Using the same approach for $f'''$ instead of $f''$, we conclude from the above that
$$f'(x_0)^{-1} f''(x) = f'(x_0)^{-1} f''(x_0) + f'(x_0)^{-1}\int_0^1 f'''(x_0)(x - x_0)\,ds_1 + f'(x_0)^{-1}\int_0^1 \big[f'''[x_0 + s_1(x - x_0)] - f'''(x_0)\big](x - x_0)\,ds_1.$$
Clearly,
$$\int_0^1 \big[f'''[x_0 + s_1(x - x_0)] - f'''(x_0)\big](x - x_0)\,ds_1$$
can be written as
$$\int_0^1\!\!\int_0^1 f^{(4)}\{x_0 + s_2[x_0 + s_1(x - x_0) - x_0]\}[s_1(x - x_0)](x - x_0)\,ds_2\,ds_1.$$
Repeating this argument, we obtain
$$f'(x_0)^{-1} f''(x) = f'(x_0)^{-1} f''(x_0) + f'(x_0)^{-1}\int_0^1 f'''(x_0)\,e\,ds_1 + \cdots + \underbrace{\int_0^1\!\!\cdots\!\int_0^1}_{m-2} f'(x_0)^{-1} f^{(m)}(x_0)(b_{m-3} - x_0)\cdots(b_1 - x_0)\,e\,ds_{m-2}\cdots ds_1 + T,$$
where
$$T = \underbrace{\int_0^1\!\!\cdots\!\int_0^1}_{m-2} f'(x_0)^{-1}\big[f^{(m)}(b_{m-2}) - f^{(m)}(x_0)\big](b_{m-3} - x_0)\cdots(b_1 - x_0)\,e\,ds_{m-2}\cdots ds_1.$$
Next we apply (1.5) to the last term and (1.4) to the other terms to obtain
$$\begin{aligned}
\|f'(x_0)^{-1} f''(x)\| \le{}& a_2 + a_3\int_0^1 \|e\|\,ds_1 + \cdots + a_m\underbrace{\int_0^1\!\!\cdots\!\int_0^1}_{m-2}\|(b_{m-3} - x_0)\cdots(b_1 - x_0)\,e\|\,ds_{m-2}\cdots ds_1 \\
&+ a_{m+1}\underbrace{\int_0^1\!\!\cdots\!\int_0^1}_{m-2}\|b_{m-2} - x_0\|\,\Big\|e\prod_{i=1}^{m-3}(b_i - x_0)\Big\|\,ds_{m-2}\cdots ds_1.
\end{aligned}$$
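The evaluation of these iterated integrals may be worth spelling out. Assuming $m \ge 3$ (for $m = 2$ the estimate $\|f'(x_0)^{-1} f''(x)\| \le a_2 + a_3\|e\|$ follows directly from (1.4) and (1.5)), each multilinear norm is bounded by the product of the norms of its arguments, and the definition of $b_i$ gives $\|b_i - x_0\| = s_1 s_2 \cdots s_i\,\|e\|$. Each iterated integral therefore factors into one-dimensional integrals of powers of the $s_l$:

```latex
\begin{aligned}
&\|b_i - x_0\| = s_i\,\|b_{i-1} - x_0\| = s_1 s_2 \cdots s_i\,\|e\|, \qquad b_0 := x,\\[4pt]
&\underbrace{\int_0^1\!\!\cdots\!\int_0^1}_{j-2} \|e\|\prod_{i=1}^{j-3}\|b_i - x_0\|\,ds_{j-2}\cdots ds_1
  = \|e\|^{j-2}\prod_{l=1}^{j-3}\int_0^1 s_l^{\,j-2-l}\,ds_l
  = \frac{\|e\|^{j-2}}{(j-2)!}, \qquad 3 \le j \le m,\\[4pt]
&\underbrace{\int_0^1\!\!\cdots\!\int_0^1}_{m-2} \|b_{m-2} - x_0\|\,\|e\|\prod_{i=1}^{m-3}\|b_i - x_0\|\,ds_{m-2}\cdots ds_1
  = \|e\|^{m-1}\prod_{l=1}^{m-2}\int_0^1 s_l^{\,m-1-l}\,ds_l
  = \frac{\|e\|^{m-1}}{(m-1)!},\\[4pt]
&\|f'(x_0)^{-1} f''(x)\| \le \sum_{j=2}^{m+1} \frac{a_j}{(j-2)!}\,\|e\|^{j-2}
  = p''(\|x - x_0\|).
\end{aligned}
```

The last equality uses $\frac{a_j}{j!}\,j(j-1) = \frac{a_j}{(j-2)!}$, i.e. term-by-term differentiation of $p$ twice.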
The inequality (2.2) now follows from the definitions of $p$ and $b_i$. To verify Condition B for $f$, let
$$L(u) = p''(u) = \frac{a_{m+1}}{(m-1)!}u^{m-1} + \cdots + a_3 u + a_2.$$
Then $L(u)$ is positive, nondecreasing and integrable on $[0, r]$. Thus, (2.2) can be rewritten as
$$\|f'(x_0)^{-1} f''(x)\| \le L(\rho(x)),$$
which implies (1.10) by Lemma 2.1. To determine the remaining quantities in Condition B, let $d_0 > 0$ be such that $\int_0^{d_0} p''(u)\,du = 1$. Since $p'(0) = -1$, we get $p'(d_0) = 0$, i.e. $d_0 = s$, and (1.6) tells us that $p(d_0) \le 0$. Hence, for
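$$b = \int_0^{d_0} L(u)\,u\,du = \int_0^{d_0} p''(u)\,u\,du = \eta - p(d_0) \tag{2.3}$$
(the last equality follows by integration by parts, using $p'(d_0) = 0$ and $p(0) = \eta$), we have by (1.3) and $p(d_0) \le 0$
$$\beta = \|f'(x_0)^{-1} f(x_0)\| \le \eta \le \eta - p(d_0) = b,$$
i.e. (1.11) is satisfied. Thus, Condition B is satisfied. Moreover, since
$$\int_0^t p''(u)(t - u)\,du = -t\,p'(0) + p(t) - p(0) = t + p(t) - \eta,$$
we obtain by (1.9), taking $\beta = \eta$,
$$h(t) = \eta - t + \int_0^t p''(u)(t - u)\,du = p(t).$$
Hence, the majorizing functions of the two theorems are the same, so the sequences $\{t_n\}$ in Theorem B and $\{r_n\}$ in Theorem A coincide. Consequently, $t^* = r^*$, $t^{**} = r^{**}$, and $x^* \in \bar B(x_1, t^* - \beta) \subseteq \bar B(x_0, t^*) = \bar B(x_0, r^*)$. Furthermore, using (1.13) together with $\|x^* - x_{n_0}\| \le r^* - r_{n_0}$ from (1.8), we get for all $n \ge n_0 \ge 0$
$$\|x^* - x_n\| \le (r^* - r_n)\left(\frac{\|x^* - x_{n_0}\|}{r^* - r_{n_0}}\right)^{2^{n - n_0}} \le r^* - r_n,$$
which is better than (1.8), while (1.7) is contained in the proof of Theorem B (see [6]).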
∎
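The two identities at the heart of this proof, (2.3) and $h = p$, can be checked numerically for a concrete polynomial. The Python sketch below is a minimal check under our own assumptions: the coefficients are hypothetical, $\beta$ is taken equal to $\eta$ as in the proof, and Simpson's rule is used because it integrates the cubic integrands that arise for $m = 3$ exactly up to rounding.

```python
from math import factorial


def simpson(g, lo, hi, n=2000):
    """Composite Simpson rule; exact (up to rounding) for polynomials of degree <= 3."""
    if hi == lo:
        return 0.0
    step = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * step)
    return total * step / 3


def check_h_equals_p(eta, a, ts):
    """For p(r) = sum_k a_k r^k / k! - r + eta and L = p'', return
    (int_0^{d0} L, b, eta - p(d0), [h(t) - p(t) for t in ts])."""
    p = lambda r: sum(ak * r**k / factorial(k) for k, ak in a.items()) - r + eta
    dp = lambda r: sum(ak * r**(k - 1) / factorial(k - 1) for k, ak in a.items()) - 1.0
    L = lambda u: sum(ak * u**(k - 2) / factorial(k - 2) for k, ak in a.items())  # L = p''

    lo, hi = 0.0, 1.0
    while dp(hi) <= 0:  # bracket d_0 = s, the zero of p'
        hi *= 2.0
    while hi - lo > 1e-14:  # bisection on p', which is increasing
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if dp(mid) < 0 else (lo, mid)
    d0 = 0.5 * (lo + hi)

    mass = simpson(L, 0.0, d0)                # should equal 1
    b = simpson(lambda u: L(u) * u, 0.0, d0)  # left-hand side of (2.3)
    h = lambda t: eta - t + simpson(lambda u: L(u) * (t - u), 0.0, t)  # (1.9), beta = eta
    return mass, b, eta - p(d0), [h(t) - p(t) for t in ts]
```

For instance, `check_h_equals_p(0.2, {2: 1.0, 3: 0.5, 4: 0.25}, [0.0, 0.1, 0.3, 0.7])` corresponds to $m = 3$; the returned mass is $1$, $b$ agrees with $\eta - p(d_0)$, and the differences $h(t) - p(t)$ vanish up to rounding.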
Remark 1. Consider (2.3). Clearly, a necessary and sufficient condition for $\beta \le b$ is $p(d_0) \le 0$, i.e. $p(s) \le 0$. Therefore, we cannot find an $m$-times Fréchet differentiable function $f$ which satisfies Condition B but not Condition A. In this sense, Theorem B is superior to Theorem A. On the other hand, Theorem A provides a method to compute $L(u)$ in Theorem B.

3. More about Theorems A and B
In Theorem B, if $L(u) \equiv L$ is a positive constant, then the function $h$ defined by $h(t) = \beta - t + Lt^2/2$ is the majorizing function in the Kantorovich theorem. Consequently, we recover the convergence of Newton's method and the corresponding a posteriori error estimate. Moreover, if $L(u) = 2\gamma/(1 - \gamma u)^3$, where $\gamma$ satisfies $\|f'(x_0)^{-1} f^{(n)}(x_0)\| \le n!\,\gamma^{n-1}$, $n \ge 2$, we obtain a convergence theorem under a Smale-type premise, together with the corresponding error estimate (see [6]). From the computational point of view, if $m$ in Theorem A is large, computing (1.3)–(1.6) takes considerable time. However, with the choice $L(u) = 2\gamma/(1 - \gamma u)^3$ we only need to compute $\|f'(x_0)^{-1} f''(x)\|$, which may save computation time.

Taking $m = 2$ in Theorem A, we obtain the main result of Huang (see [3]). Since Condition B is weaker than Condition A (see Theorem 1.1), we can also deduce Huang's result from Theorem B with the choice $L(u) = Ku + \gamma$, where $K$ and $\gamma$ are defined as in [3]. In the same way, one can also obtain the result concerning Newton's method in [2].

Although Theorem B yields convergence theorems in some cases, it does not seem to be comparable with the Kantorovich theorem. In other words, one can construct examples in which the Kantorovich assumptions fail while the hypotheses of Theorem B hold, and vice versa. To this end, we note that the Kantorovich theorem assumes that $f$ is Fréchet differentiable and that there exists $x_0 \in D$ such that
$$\|f'(x_0)^{-1} f(x_0)\| \le \alpha, \qquad \|f'(x_0)^{-1}[f'(x) - f'(y)]\| \le \gamma\|x - y\| \quad \text{for all } x, y \in D,$$
and $\alpha\gamma \le 1/2$. (This is slightly weaker than the original Kantorovich assumptions.)

Example 1. Let $E_1 = E_2 = \mathbb{R}$, $D = [-5, 5]$, $x_0 = 0$, and let $f$ be defined on $D$ by
$$f(x) = \frac{1}{6}x^3 - x + \frac{1}{3}.$$
We have $\alpha = 1/3$ and $\gamma = 5$, so $\alpha\gamma = 5/3 > 1/2$. Therefore, the Kantorovich condition fails, and the Kantorovich theorem does not yield the convergence of Newton's sequence starting from $x_0$. On the other hand, (1.11) is satisfied with $\beta = \alpha = 1/3$, $d_0 = \sqrt{2}$ and $b = 2\sqrt{2}/3$. Moreover, $f'(x_0) = -1$, $f''(x) = x$ and $L(u) = u$, so $\|f'(x_0)^{-1} f''(x)\| \le L(\|x - x_0\|)$; therefore, (1.10) is fulfilled by Lemma 2.1. Consequently, Theorem B can be applied to obtain the convergence of Newton's sequence starting from $x_0$.

Example 2 (Gutiérrez [2]). Let $E_1 = E_2 = \mathbb{R}$ and $x_0 = 0$, and let $f : E_1 \to E_2$ be given by
$$f(x) = \sin(x) - 5x - 8.$$
In this case, $\alpha = 2$ and $\gamma = 1/4$, so $\alpha\gamma = 1/2$. Hence the hypotheses of the Kantorovich theorem hold. However, for $L(u) = u/4$, $d_0 = 2\sqrt{2}$, $b = 4\sqrt{2}/3$ and $\beta = \alpha = 2$, one has $\beta > b$. So the conditions of Theorem B are not satisfied.

References
[1] I.K. Argyros, A Newton–Kantorovich theorem for equations involving m-Fréchet differentiable operators and applications in radiative transfer, J. Comput. Appl. Math. 131 (2001) 149–159.
[2] J.M. Gutiérrez, A new semilocal convergence theorem for Newton's method, J. Comput. Appl. Math. 79 (1997) 131–145.
[3] Z. Huang, A note on the Kantorovich theorem for Newton iteration, J. Comput. Appl. Math. 47 (1993) 211–217.
[4] L.V. Kantorovich, G.P. Akilov, Functional Analysis, Pergamon Press, Oxford, 1982.
[5] A.M. Ostrowski, Solution of Equations in Euclidean and Banach Spaces, Academic Press, New York, 1979.
[6] Xinghua Wang, Convergence of Newton's method and inverse function theorem in Banach space, Math. Comput. 68 (1999) 169–186.
[7] T. Yamamoto, A method for finding sharp error bounds for Newton's method under the Kantorovich assumptions, Numer. Math. 49 (1986) 203–220.