Accepted Manuscript

Approximate Gauss-Newton methods for solving underdetermined nonlinear least squares problems
Ji-Feng Bao, Chong Li, Wei-Ping Shen, Jen-Chih Yao, Sy-Ming Guu
PII: S0168-9274(16)30157-X
DOI: http://dx.doi.org/10.1016/j.apnum.2016.08.007
Reference: APNUM 3078
To appear in: Applied Numerical Mathematics
Received date: 24 October 2015
Revised date: 15 August 2016
Accepted date: 21 August 2016
Please cite this article in press as: J.-F. Bao et al., Approximate Gauss-Newton methods for solving underdetermined nonlinear least squares problems, Appl. Numer. Math. (2016), http://dx.doi.org/10.1016/j.apnum.2016.08.007
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Approximate Gauss-Newton methods for solving underdetermined nonlinear least squares problems

Ji-Feng Bao∗, Chong Li†, Wei-Ping Shen‡, Jen-Chih Yao§, Sy-Ming Guu¶
Abstract. We propose several approximate Gauss-Newton methods, namely the truncated, perturbed, and truncated-perturbed GN methods, for solving underdetermined nonlinear least squares problems. Under the assumption that the Fréchet derivatives are Lipschitz continuous and of full row rank, Kantorovich-type convergence criteria for the truncated GN method are established, and local convergence theorems are presented with the radii of the convergence balls obtained. As consequences of the convergence results for the truncated GN method, convergence theorems for the perturbed and truncated-perturbed GN methods are also presented. Finally, numerical experiments are reported, including comparisons with the standard inexact Gauss-Newton method and the inexact trust-region method for bound-constrained least squares problems [M. Porcelli, On the convergence of an inexact Gauss-Newton trust-region method for nonlinear least-squares problems with simple bounds, Optim. Lett. 7 (2013), 447–465].
Key words. nonlinear least squares problems, approximate Gauss-Newton methods, Lipschitz condition.

AMS subject classification. 65K05, 93E24

∗School of Mathematics, Physics and Information Science, Zhejiang Ocean University, Zhoushan, Zhejiang 316022, P. R. China; Key Laboratory of Oceanographic Big Data Mining & Application of Zhejiang Province, Zhoushan, Zhejiang 316022, P. R. China ([email protected]). This author was supported by the Scientific Research Foundation of Zhejiang Ocean University (21065013613).
†Department of Mathematics, Zhejiang University, Hangzhou 310027, P. R. China ([email protected]). This author was supported in part by the National Natural Science Foundation of China (grants 11171300, 11371325).
‡Department of Mathematics, Zhejiang Normal University, Jinhua 321004, P. R. China ([email protected]).
§Center for General Education, China Medical University, Taichung 40402, Taiwan ([email protected]). This author was supported in part by Grant MOST 105-2221-E-039-MY3.
¶Corresponding author. Graduate Institute of Business and Management, College of Management, Chang Gung University, and Department of Neurology, Chang Gung Memorial Hospital, Taoyuan City, Taiwan ([email protected]; [email protected]). Research partially supported by MOST 105-2221-E-182052 and BMRPD17.
1 Introduction
Let D be an open set and f : D ⊆ R^n → R^m a nonlinear operator with continuous Fréchet derivative denoted by f′. Let φ : D ⊆ R^n → R be defined by

φ(x) := (1/2) f(x)^T f(x)  for each x ∈ D.

The nonlinear least squares problem (NLSP) for f is

min_{x ∈ D} φ(x).  (1.1)
The NLSP (1.1) arises most commonly from data-fitting applications [3, 8, 14]. The model is particularly useful in the formulation of a parameterized system in which a chemical, physical, financial, or economic application uses the function φ to measure the discrepancy between the model and the output of the system at various observation points; minimizing φ selects the parameter values that best match the model to the data [2, 21]. Note that finding the stationary points of φ is equivalent to solving the nonlinear gradient equation

∇φ(x) := f′(x)^T f(x) = 0.  (1.2)

Based on this equivalence, Newton's method for solving the nonlinear equation (1.2) can be applied to the NLSP (1.1) (cf. [13]). However, Newton's method for (1.2) requires the computation of the Hessian matrix of φ at each iterate x_k:

∇²φ(x_k) := f′(x_k)^T f′(x_k) + R(x_k),  (1.3)

where R(x_k) is the second-order term, which may be difficult to obtain, especially for large-scale problems [9]. To make the procedure more efficient, Newton's method can be approximated by ignoring the second-order term in the Hessian matrix (1.3), and this yields the Gauss-Newton (GN) method. More precisely, the GN step d_k is defined as the minimum norm solution of the equation

f′(x_k)^T f′(x_k) d_k = −f′(x_k)^T f(x_k).  (1.4)

That is, in terms of f′(x_k)†, the GN step d_k is given by

d_k := −f′(x_k)† f(x_k),  (1.5)
where f′(x_k)† denotes the Moore-Penrose inverse of f′(x_k) (see section 2 for the definition). Many authors have studied the local as well as semi-local convergence of the GN method; see, for example, [6, 11, 12, 17, 21, 22]. However, as expressed in (1.4), the GN method has the following two disadvantages from the point of view of practical computation: (a) it requires the Fréchet derivative f′(x_k) at each outer iteration; (b) it requires solving equation (1.4) exactly at each outer iteration. This sometimes makes the GN method inefficient, especially when the problem size is large. Noting these two disadvantages, Gratton et al. designed in [9] some approximate GN methods for solving the NLSP (1.1) with m ≥ n. To overcome disadvantage (a), the perturbed GN method was proposed in [9], where the Fréchet derivative f′(x_k) is replaced by a perturbed derivative J_k which is easier or computationally less expensive to calculate. More precisely, the perturbed GN method finds the step d_k such that

J_k^T J_k d_k = −J_k^T f(x_k).  (1.6)

As for drawback (b), a natural approach is to solve equation (1.4) inexactly instead of exactly, and this yields the truncated GN method proposed in [9]: for the residual r_k and the iterate x_k, the truncated GN method finds the step d_k such that

f′(x_k)^T f′(x_k) d_k = −f′(x_k)^T f(x_k) + r_k.  (1.7)

The third approximate GN method designed in [9] is the truncated-perturbed GN method, which avoids both disadvantages (a) and (b): for the residual r_k and the iterate x_k, it finds the step d_k such that

J_k^T J_k d_k = −J_k^T f(x_k) + r_k.

Under the assumption that the (perturbed) Fréchet derivatives are of full column rank, the truncated, perturbed, and truncated-perturbed GN methods were proved to be convergent in [9]. However, in the case when m < n, or without the full column rank assumption on the (perturbed) Fréchet derivatives, the sequences {x_k} generated by these approximate GN methods are not convergent in general. For example, consider the function f : R² → R defined by

f(x) := x^(1) x^(2)  for each x = (x^(1), x^(2))^T ∈ R².

Let x_0 = (x_0^(1), x_0^(2))^T be any point in R². For each k ≥ 1, define x_k := (0, k! x_0^(2))^T. Then d_0 = (−x_0^(1), 0)^T and d_k = (0, k·k! x_0^(2))^T for each k ≥ 1. Thus one can check that the sequences {x_k} and {d_k} satisfy (1.7) with r_k ≡ 0. This implies that {x_k} is generated by the truncated GN method proposed in [9] (with r_k ≡ 0). Obviously, {x_k} does not converge. (Note that in the case when each r_k ≡ 0 and each search direction d_k is chosen to be the minimum norm solution of (1.7), the approximate GN method reduces to the GN method; hence {x_k} is convergent.)

The purpose of this paper is, in the case when m ≤ n (i.e., the NLSP (1.1) is underdetermined), to propose some approximate GN methods for solving the NLSP (1.1) and to study their convergence under the full row rank assumption. For this purpose, note that, in the case when f′(x_k) is of full row rank, f′(x_k)† = f′(x_k)^T (f′(x_k) f′(x_k)^T)^{-1} (see section 2 for details); thus computing the step (1.5) is equivalent to solving the equation

f′(x_k) f′(x_k)^T s_k = −f(x_k)  (1.8)
and setting the step

d_k := f′(x_k)^T s_k.  (1.9)
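Stated concretely, one GN step can be computed either from (1.5) or from the reformulation (1.8)–(1.9). A minimal NumPy sketch of both forms (the Jacobian, residual vector, and test point below are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def gn_step_small_system(J, fx):
    # Reformulation (1.8)-(1.9): solve the small m x m system
    # J J^T s = -f(x), then set d = J^T s.
    s = np.linalg.solve(J @ J.T, -fx)
    return J.T @ s

def gn_step_pinv(J, fx):
    # Direct form (1.5): d = -J^+ f(x), the minimum norm solution of (1.4).
    return -np.linalg.pinv(J) @ fx

# Illustrative underdetermined case with m = 2 < n = 3 (full row rank Jacobian).
J = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 2.0]])
fx = np.array([0.5, -0.3])

d1 = gn_step_small_system(J, fx)
d2 = gn_step_pinv(J, fx)
```

When J has full row rank the two forms coincide, by the identity f′(x_k)† = f′(x_k)^T (f′(x_k) f′(x_k)^T)^{-1}; the (1.8)–(1.9) form only ever factors the small m × m matrix J J^T.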
Note that in the case when m < n, the problem size of (1.8) is smaller than that of (1.4). In particular, when m ≪ n, methods based on the reformulation (1.8)–(1.9) are more attractive than methods based on (1.4), as shown by the numerical experiments in section 5. Hence, based on (1.8) and (1.9), and motivated by [9], we propose here the truncated GN method (Algorithm TGNU) for solving the underdetermined NLSP (1.1), which solves (1.8) inexactly and computes d_k by (1.9). We also design the perturbed GN method (Algorithm PGNU), where the Fréchet derivatives in (1.8) and (1.9) are replaced by perturbed derivatives. Furthermore, by solving the corresponding equation in Algorithm PGNU inexactly, we propose the third approximate GN method, the truncated-perturbed GN method (Algorithm TPGNU). Under the assumption that the Fréchet derivatives are Lipschitz continuous and of full row rank, Kantorovich-type convergence criteria for Algorithm TGNU are established and local convergence theorems are presented, with the radii of the convergence balls obtained. Applying the convergence results for Algorithm TGNU, we also provide convergence theorems for Algorithms PGNU and TPGNU. As a direct consequence of the main results, our result (Corollary 3.1) improves the corresponding one in [25, Theorem 3.1] (and so partially the one in [10, Theorem 2.6]) for inexact Newton methods. Numerical experiments are carried out in section 5 which illustrate that, in terms of CPU time, the proposed methods are more effective than the standard inexact Gauss-Newton method and the inexact trust-region method for bound-constrained least squares problems (Algorithm ITREBO in [23]).

This paper is organized as follows. In section 2, we present some preliminary notions and results. In section 3, we propose Algorithm TGNU and present its local as well as semi-local convergence results. Algorithms PGNU and TPGNU and their corresponding convergence results are given in section 4. Numerical experiments are reported in section 5, and concluding remarks are presented in the last section.
2 Preliminaries
Let a, b, and λ be positive constants. We define the function ϕ_b : [0, +∞) → R by

ϕ_b(t) := at² − bt + λ  for each t ≥ 0.  (2.1)

In particular, we write ϕ_1 := ϕ_b in the case when b = 1. Obviously, the derivative ϕ′_b is strictly increasing on [0, +∞). Moreover, if

aλ ≤ b²/4,  (2.2)

then the function ϕ_b has two positive zeros, and the smaller one is

t* := (b − √(b² − 4aλ)) / (2a).  (2.3)
Let {t_k} and {t̃_k} be the sequences generated respectively by

t_0 := 0,  t_{k+1} := t_k − ϕ_b(t_k)/ϕ′_1(t_k)  for each k = 0, 1, ...  (2.4)

and

t̃_0 := 0,  t̃_{k+1} := t̃_k − ϕ_b(t̃_k)/ϕ′_b(t̃_k)  for each k = 0, 1, ....  (2.5)
The following lemma presents some properties of {t_k} and {t̃_k} which are known from [12] and [25].

Lemma 2.1. Suppose that (2.2) holds. Let t* be given by (2.3), and let {t_k} and {t̃_k} be generated by (2.4) and (2.5), respectively. Then

t_k < t_{k+1} < t*,  t̃_k < t̃_{k+1} < t*  for each k = 0, 1, ...,  (2.6)

and {t_k} and {t̃_k} converge increasingly to t*. Moreover,

t̃_{k+1} − t̃_k ≤ λ/b  for each k = 0, 1, ...

and

t* − t̃_k = (ξ^{2^k − 1} / Σ_{j=0}^{2^k − 1} ξ^j) t*  for each k = 0, 1, ...,

where

ξ := (b − √(b² − 4aλ)) / (b + √(b² − 4aλ)).  (2.7)
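The behaviour described in Lemma 2.1 is easy to observe numerically; a short sketch with illustrative constants a, b, λ (chosen so that (2.2) holds, not taken from the paper), iterating both scalar sequences:

```python
import math

# Illustrative constants with a*lam <= b**2/4, so (2.2) holds.
a, b, lam = 1.0, 0.9, 0.2

phi  = lambda t, beta: a * t * t - beta * t + lam   # phi_beta(t), cf. (2.1)
dphi = lambda t, beta: 2.0 * a * t - beta           # phi_beta'(t)

t_star = (b - math.sqrt(b * b - 4.0 * a * lam)) / (2.0 * a)  # smaller zero, (2.3)

t = tt = 0.0
for _ in range(60):
    t  = t  - phi(t, b)  / dphi(t, 1.0)   # sequence (2.4): step uses phi_1'
    tt = tt - phi(tt, b) / dphi(tt, b)    # sequence (2.5): Newton's method on phi_b
    # Both sequences increase monotonically and stay below t*, cf. (2.6).
    assert t < t_star + 1e-12 and tt < t_star + 1e-12
```

Both iterates approach t* from below, as Lemma 2.1 asserts; (2.5) is plain Newton on ϕ_b and converges quadratically, while (2.4) freezes the slope at ϕ′_1.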
Now we present the definition and some properties of the Moore-Penrose inverse. For details, one may refer to [1, 26, 27]. Let A : R^n → R^m be a linear operator (or an m × n matrix) and let A^T denote the adjoint of A. We say that an operator A† : R^m → R^n (or an n × m matrix A†) is the Moore-Penrose inverse of A if it satisfies the following four equalities:

AA†A = A,  A†AA† = A†,  (AA†)^T = AA†,  (A†A)^T = A†A.

In particular, if A is of full row rank, then

A† = A^T (AA^T)^{-1}  and  AA† = I_{R^m},

where I_{R^m} denotes the m × m identity matrix. Moreover, by the definition of the Moore-Penrose inverse, one can easily check that (A†B)† = B†A if both A and B are of full row rank.

For the remainder of this paper, we always assume that m ≤ n. Let B(x, r) and B̄(x, r) denote, respectively, the open and closed balls in R^n with center x and radius r > 0.
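The identities above are easy to check numerically for a full-row-rank matrix; a sketch with arbitrary illustrative matrices A and B (both of full row rank):

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])        # 2 x 3, full row rank
A_pinv = A.T @ np.linalg.inv(A @ A.T)  # A^+ = A^T (A A^T)^{-1}

# The four defining Moore-Penrose equalities.
assert np.allclose(A @ A_pinv @ A, A)
assert np.allclose(A_pinv @ A @ A_pinv, A_pinv)
assert np.allclose((A @ A_pinv).T, A @ A_pinv)
assert np.allclose((A_pinv @ A).T, A_pinv @ A)

# Full row rank: A A^+ is the m x m identity, and the explicit formula
# agrees with the SVD-based pseudoinverse.
assert np.allclose(A @ A_pinv, np.eye(2))
assert np.allclose(A_pinv, np.linalg.pinv(A))

# (A^+ B)^+ = B^+ A when both A and B have full row rank.
B = np.array([[1.0, 0.0, 1.0],
              [2.0, 1.0, 0.0]])
assert np.allclose(np.linalg.pinv(A_pinv @ B), np.linalg.pinv(B) @ A)
```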
Let x̄ ∈ R^n and r̄ > 0, and let ‖·‖ be the Euclidean vector norm or its induced matrix norm. A nonlinear operator g : R^n → R^m is said to satisfy the Lipschitz condition with Lipschitz constant L on B(x̄, r̄) if

‖g(x) − g(y)‖ ≤ L ‖x − y‖  for each x, y ∈ B(x̄, r̄).

Recall that f : D ⊆ R^n → R^m is a nonlinear operator with continuous Fréchet derivative f′. Applying [12, Lemma 3.1] to the function f̃(·) := f′(x̄)† f(·) in place of F(·), we obtain the following lemma (noting, by the proof of [15, Corollary 3.2], that f̃′(x) is of full row rank and f̃′(x)† = f′(x)† f′(x̄)).

Lemma 2.2. Let x̄ ∈ D be such that f′(x̄) is of full row rank. Let r̄ > 0 be such that B(x̄, r̄) ⊆ D. Suppose that f′(x̄)† f′(·) satisfies the Lipschitz condition with Lipschitz constant L on B(x̄, r̄), and suppose that 0 < r ≤ min{r̄, 1/L}. Then, for each x ∈ B(x̄, r), f′(x) is of full row rank and, moreover,

‖f′(x)† f′(x̄)‖ ≤ 1 / (1 − L ‖x − x̄‖).

3 Algorithm TGNU and convergence analysis
Given a non-negative sequence {ε_k}, we propose the following truncated GN method, i.e., Algorithm TGNU({ε_k}), for solving the underdetermined NLSP (1.1).

Algorithm TGNU({ε_k}). Choose an initial guess x_0 ∈ D ⊆ R^n. For k = 0, 1, ... until convergence, do:

1. Compute the Fréchet derivative f′(x_k) of the function f at x_k.

2. Solve (1.8) to find s_k such that the residual r_k defined by

r_k := f′(x_k) f′(x_k)^T s_k + f(x_k)  (3.1)

satisfies

‖f′(x_0)† r_k‖ ≤ ε_k.  (3.2)

3. Set x_{k+1} := x_k + f′(x_k)^T s_k.

Remark 3.1. Let k ≥ 0 and suppose that f′(x_k) is of full row rank. Then one has by (3.1) that s_k = (f′(x_k) f′(x_k)^T)^{-1}(−f(x_k) + r_k), which together with Step 3 in Algorithm TGNU({ε_k}) yields that

x_{k+1} − x_k = −f′(x_k)† f(x_k) + f′(x_k)† r_k.

This equality will be useful in the following convergence analysis.
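A compact NumPy sketch of Algorithm TGNU({ε_k}) on an illustrative problem with m = 2 < n = 3: the inner system (1.8) is solved inexactly by conjugate-gradient steps stopped once the inner residual is below θ‖f(x_k)‖, a simpler unweighted surrogate for the test (3.2); the test problem, forcing constant, and stopping rule are assumptions for illustration:

```python
import numpy as np

def cg(A, b, tol):
    # Plain conjugate gradients on the SPD system A s = b (A = J J^T is SPD
    # when J has full row rank), stopped early: an inexact inner solve.
    s = np.zeros_like(b)
    r = b - A @ s
    p = r.copy()
    while np.linalg.norm(r) > tol:
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        s += alpha * p
        r_new = r - alpha * Ap
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
    return s

def tgnu(f, J, x0, theta=0.3, maxit=50, tol=1e-12):
    # Truncated GN for an underdetermined NLSP: solve (1.8) inexactly,
    # then take the step x_{k+1} = x_k + J^T s_k as in (1.9).
    x = x0
    for _ in range(maxit):
        fx, Jx = f(x), J(x)
        if np.linalg.norm(fx) <= tol:
            break
        s = cg(Jx @ Jx.T, -fx, theta * np.linalg.norm(fx))
        x = x + Jx.T @ s
    return x

# Illustrative problem: a point on the unit sphere lying on a plane.
f = lambda x: np.array([x @ x - 1.0, x[0] + x[1] + x[2]])
J = lambda x: np.vstack([2.0 * x, np.ones(3)])

x = tgnu(f, J, np.array([1.0, 0.3, -0.7]))
```

Despite the loose inner tolerance, the outer iteration drives ‖f(x_k)‖ to zero on this well-conditioned example, in line with the convergence theory below.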
Below we present some convergence results, including local and semi-local convergence, for Algorithm TGNU({ε_k}). For this purpose, let x_0 ∈ D be such that f′(x_0) is of full row rank. Throughout this paper, we define

α := ‖f′(x_0)† f(x_0)‖.  (3.3)

Let {θ_k} be a non-negative sequence such that 0 ≤ θ := sup_{k≥0} θ_k < 1, and let {x_k} be a sequence generated by Algorithm TGNU({ε_k}) with initial point x_0. The following conditions are considered respectively for each k ≥ 0:

ε_k ≤ θ_k ‖f′(x_0)† f(x_k)‖  (3.4)

and

ε_k ≤ θ_k ‖f′(x_0)† f(x_k)‖².  (3.5)
3.1 Convergence of Algorithm TGNU({ε_k}) with (3.4) satisfied
To establish the convergence of Algorithm TGNU({ε_k}) with (3.4) satisfied, we write

G := (1 − θ)² / {(1 + θ)[√((2θ² − θ + 1)² + 2θ(1 − θ)³) + 2θ² − θ + 1]}  (3.6)

and set

a := L(1 + θ) / (2[1 + Lαθ(1 + θ)]),  b := 1 − θ[1 − Lα(1 + θ)] / [1 + Lαθ(1 + θ)],  λ := α(1 + θ).  (3.7)
Then we have the following lemma.

Lemma 3.1. Let α and G be defined by (3.3) and (3.6), respectively. Let a, b, and λ be defined by (3.7), and recall that L is the Lipschitz constant. Suppose that

Lα ≤ G.  (3.8)

Then (2.2) holds, i.e., aλ ≤ b²/4.

Proof. In the case when θ = 0, one can easily check that (3.8) implies (2.2). Below we consider the case when 0 < θ < 1. For simplicity, we write

s := Lα(1 + θ)  and  z := (1 − θ)² / [√((2θ² − θ + 1)² + 2θ(1 − θ)³) + 2θ² − θ + 1].  (3.9)

Obviously, z is the positive zero of the function h : R → R defined by

h(x) := 2θ(θ − 1)x² − 2(2θ² − θ + 1)x + (1 − θ)²  for each x ∈ R.
Then, thanks to (3.8) and using the definitions of s, z, and G, we have that

0 ≤ s ≤ (1 − θ)² / [√((2θ² − θ + 1)² + 2θ(1 − θ)³) + 2θ² − θ + 1] = z.

It follows that

h(s) ≥ 0  (3.10)

(noting that h(0) = (1 − θ)² > 0 and 2θ(θ − 1) ≤ 0). On the other hand, using (3.7) and the definition of s in (3.9), we get

b² − 4aλ = (1 − θ(1 − s)/(1 + θs))² − 2(1 + θ)s/(1 + θs) = h(s)/(1 + θs)².
Therefore, we derive from (3.10) that b² − 4aλ ≥ 0, and hence (2.2) holds. The proof is complete. □

Let t* and {t_k} be defined by (2.3) and (2.4) with a, b, and λ given by (3.7). Then we have the following semi-local convergence result for Algorithm TGNU({ε_k}) with (3.4) satisfied.

Theorem 3.1. Let x_0 ∈ D be such that f′(x_0) is of full row rank. Let r̄ > 0 be such that B(x_0, r̄) ⊆ D. Suppose that f′(x_0)† f′(·) satisfies the Lipschitz condition with Lipschitz constant L on B(x_0, r̄). Suppose that (3.8) holds and t* ≤ r̄. Let {x_k} be a sequence generated by Algorithm TGNU({ε_k}) with initial point x_0. If {ε_k} satisfies (3.4), then {x_k} is well-defined and converges to a solution x* of f(x) = 0 in B̄(x_0, t*). Moreover, the following estimate holds:

‖x_k − x*‖ ≤ t* − t_k  for each k = 0, 1, ....  (3.11)

Proof. By (3.6), it is easy to derive that G < 1/(1 + θ). Then (3.8) gives that 0 ≤ Lα(1 + θ) < 1, which together with (3.7) implies that 0 < b ≤ 1. On the other hand, using (3.8) and applying Lemma 3.1, one sees that (2.2) holds. Thus Lemma 2.1 is applicable to conclude that (2.6) holds and hence
t_k ≥ t_1 = λ  and  −ϕ′_1(t_k) ≥ −ϕ′_b(t_k) ≥ −ϕ′_b(t*) ≥ 0  for each k = 1, 2, ....  (3.12)

Therefore, by (3.12), (3.7), and the definition of ϕ_1, we have that, for each k = 1, 2, ...,

1 − Lt_k ≥ 1 + Lθλ − L(1 + θ)t_k = −(1 + Lαθ(1 + θ)) ϕ′_1(t_k) ≥ 0.  (3.13)
By letting k → ∞ in the above inequality, we obtain t* ≤ 1/L. We first show that {x_k} is well-defined and that, for each k = 0, 1, ..., the following two estimates hold:

‖f′(x_0)† f(x_k)‖ ≤ (1 − Lt_k)(t_{k+1} − t_k) / (1 + θ);  (3.14)
‖x_{k+1} − x_k‖ ≤ t_{k+1} − t_k.  (3.15)
To do this, we suppose that, for each k ≥ 0, if x_k is well-defined, then ε_k satisfies (3.4). Then, by (3.2) and the fact that θ = sup_{k≥0} θ_k,

‖f′(x_0)† r_0‖ ≤ θ_0 ‖f′(x_0)† f(x_0)‖ ≤ θ ‖f′(x_0)† f(x_0)‖.  (3.16)

Recalling that α = ‖f′(x_0)† f(x_0)‖, t_0 = 0 and t_1 = λ = α(1 + θ), (3.14) is seen to hold for k = 0. Noting that f′(x_0) is of full row rank, one sees that x_1 is well-defined. Moreover, by Remark 3.1 and (3.16), we have that

‖x_1 − x_0‖ = ‖−f′(x_0)† f(x_0) + f′(x_0)† r_0‖ ≤ (1 + θ) ‖f′(x_0)† f(x_0)‖ = t_1 − t_0.

This means that (3.15) holds for k = 0. Assume that {x_k}_{k=0}^{l} is well-defined and that (3.14) and (3.15) hold for all k ≤ l − 1. Then, by (3.2), (3.4) and the definition of θ, we get that, for each 0 ≤ k ≤ l,

‖f′(x_0)† r_k‖ ≤ θ_k ‖f′(x_0)† f(x_k)‖ ≤ θ ‖f′(x_0)† f(x_k)‖.  (3.17)

Moreover, using (2.6) and (3.15) (with k ≤ l − 1), one has that
k
xi+1 − xi ≤
i=0
k
(ti+1 − ti ) = tk+1 < t∗
for each k ≤ l − 1.
i=0
In particular, xl − x0 ≤ tl < t∗
and
xl−1 − x0 ≤ tl−1 < t∗ .
(3.18)
Since 0 < t* ≤ min{r̄, 1/L} and f′(x_0)† f′(·) satisfies the Lipschitz condition on B(x_0, r̄), Lemma 2.2 (with x̄ = x_0 and x = x_{l−1}) is applicable to conclude that f′(x_{l−1}) has full row rank and hence, by Remark 3.1 (with k = l − 1),

x_l − x_{l−1} = −f′(x_{l−1})† f(x_{l−1}) + f′(x_{l−1})† r_{l−1}.

Consequently, writing x^τ_{l−1} := x_{l−1} + τ(x_l − x_{l−1}) for each 0 ≤ τ ≤ 1, we get

‖f′(x_0)† f(x_l)‖ = ‖f′(x_0)† [f(x_l) − f(x_{l−1}) − f′(x_{l−1})(x_l − x_{l−1}) + r_{l−1}]‖
  ≤ ‖f′(x_0)† ∫_0^1 (f′(x^τ_{l−1}) − f′(x_{l−1}))(x_l − x_{l−1}) dτ‖ + ‖f′(x_0)† r_{l−1}‖.  (3.19)

Note by (3.18) that ‖x^τ_{l−1} − x_0‖ ≤ τ‖x_l − x_0‖ + (1 − τ)‖x_{l−1} − x_0‖ < t*. Thus, using the Lipschitz assumption, we obtain by (3.15) (with k = l − 1) that

‖f′(x_0)† ∫_0^1 (f′(x^τ_{l−1}) − f′(x_{l−1}))(x_l − x_{l−1}) dτ‖ ≤ ∫_0^1 Lτ ‖x_l − x_{l−1}‖² dτ ≤ (L/2)(t_l − t_{l−1})².  (3.20)
On the other hand, (3.17) (with k = l − 1) and (3.14) (with k = l − 1) imply that

‖f′(x_0)† r_{l−1}‖ ≤ θ ‖f′(x_0)† f(x_{l−1})‖ ≤ θ(1 − Lt_{l−1})(t_l − t_{l−1}) / (1 + θ).  (3.21)

Substituting (3.20) and (3.21) into (3.19), we derive by (3.12) and (3.13) that

‖f′(x_0)† f(x_l)‖ ≤ ((1 − Lt_l) / (−(1 + θ)[1 + Lαθ(1 + θ)] ϕ′_1(t_l))) · [(L(1 + θ)/2)(t_l − t_{l−1})² + θ(1 − Lλ)(t_l − t_{l−1})].

It follows from (3.7), (2.1), and (2.4) that

‖f′(x_0)† f(x_l)‖ ≤ (1/(1 + θ)) · ((1 − Lt_l)/(−ϕ′_1(t_l))) · [a(t_l − t_{l−1})² + (1 − b)(t_l − t_{l−1})]
  = (1/(1 + θ)) · ((1 − Lt_l)/(−ϕ′_1(t_l))) · [ϕ_b(t_l) − ϕ_b(t_{l−1}) − ϕ′_1(t_{l−1})(t_l − t_{l−1})]
  = (1 − Lt_l)(t_{l+1} − t_l) / (1 + θ).  (3.22)
Therefore, (3.14) holds for k = l. Since ‖x_l − x_0‖ ≤ t_l < t* ≤ min{r̄, 1/L} and f′(x_0)† f′(·) satisfies the Lipschitz condition on B(x_0, r̄), Lemma 2.2 (with x̄ = x_0 and x = x_l) is applicable to conclude that f′(x_l) has full row rank and

‖f′(x_l)† f′(x_0)‖ ≤ 1/(1 − L‖x_l − x_0‖) ≤ 1/(1 − Lt_l).  (3.23)

Then x_{l+1} is well-defined. On the other hand, we note by (3.17) (with k = l) and Remark 3.1 (with k = l) that

‖x_{l+1} − x_l‖ = ‖f′(x_l)† f′(x_0)(−f′(x_0)† f(x_l) + f′(x_0)† r_l)‖ ≤ (1 + θ) ‖f′(x_l)† f′(x_0)‖ · ‖f′(x_0)† f(x_l)‖.

Substituting (3.22) and (3.23) into the above inequality, one sees that (3.15) holds for k = l. Therefore, {x_k} is well-defined and, moreover, (3.14) and (3.15) hold for each k ≥ 0.

Below we prove that {x_k} converges to a solution x* of f(x) = 0 in B̄(x_0, t*) and that (3.11) holds. In fact, by (3.15), we get that

‖x_{k+p} − x_k‖ ≤ Σ_{i=1}^{p} ‖x_{k+i} − x_{k+i−1}‖ ≤ Σ_{i=1}^{p} (t_{k+i} − t_{k+i−1}) = t_{k+p} − t_k  (3.24)
for any p ≥ 0 and k ≥ 0. This implies that {x_k} is a Cauchy sequence, and hence there exists a vector x* ∈ R^n such that x_k → x*. Then, letting p → ∞ in (3.24), one can see that ‖x_k − x*‖ ≤ t* − t_k, which together with the assumption t* ≤ r̄ implies that

‖x_0 − x*‖ ≤ t* ≤ min{r̄, 1/L}.
Thus Lemma 2.2 (with x̄ = x_0 and x = x*) is applicable to conclude that f′(x*) is of full row rank; hence f′(x*)^T f(x*) = 0 if and only if f(x*) = 0. Moreover, letting k → ∞ in (3.14) and using the fact that f(x) = f′(x_0) f′(x_0)† f(x), we conclude that f(x*) = 0. □

Note that in the case when m = n, (3.2) and (3.4) reduce to the following condition:

‖f′(x_0)^{-1} r_k‖ ≤ ε_k ≤ θ_k ‖f′(x_0)^{-1} f(x_k)‖  for each k = 0, 1, ....  (3.25)
Thus, applying Theorem 3.1 to the special case when m = n, we have Corollary 3.1 below for inexact Newton methods which has been studied extensively ([5, 16, 20, 24, 25, 28]). In particular, it improves the corresponding one in [25, Theorem 3.1] (and so partially the one in [10, Theorem 2.6]), where the following stronger assumption is required: Lα ≤ G1 :=
(1 − θ)2 (1 + θ) (2(1 + θ) − θ(1 − θ)2 )
(noting that G ≥ G1 holds trivially). Corollary 3.1. Let m = n and x0 ∈ D be such that f (x0 ) is invertible. Let r¯ > 0 be such that B(x0 , r¯) ⊆ D. Suppose that f (x0 )−1 f (·) satisfies the Lipschitz condition with Lipschitz constant L on B(x0 , r¯). Suppose that (3.8) holds and t∗ ≤ r¯. Let {xk } be a sequence generated by the inexact Newton method with initial point x0 . If {k } satisfies (3.25), then {xk } is welldefined and converges to a solution x∗ of f (x) = 0 in B(x0 , t∗ ). Moreover, the estimate (3.11) holds. The following example provides a case where our Corollary 3.1 is applicable but not [25, Theorem 3.1]. Example 3.1. Define f : R2 → R2 by 1 2 1 2 1 23 T ξ − ξ , ξ1 − f (x) := 2 1 2 2 2 50
It is clear that

f′(x) = ( ξ_1  −ξ_2 ; 1/2  0 )  for each x = (ξ_1, ξ_2)^T ∈ R²,

and hence

f′(x) − f′(y) = ( ξ_1 − ζ_1  ζ_2 − ξ_2 ; 0  0 )  for each x = (ξ_1, ξ_2)^T, y = (ζ_1, ζ_2)^T ∈ R².

Choosing x_0 ∈ R² such that f′(x_0)^{-1} exists, one can see that

‖f′(x_0)^{-1}(f′(x) − f′(y))‖ ≤ ‖f′(x_0)^{-1}‖ · ‖x − y‖  for each x, y ∈ R².
This implies that the operator f′(x_0)^{-1} f′ satisfies the Lipschitz condition on R² with Lipschitz constant L := ‖f′(x_0)^{-1}‖. Choose x_0 = (0.95, 0.96)^T. Then

Lα = ‖f′(x_0)^{-1}‖ · ‖f′(x_0)^{-1} f(x_0)‖ = 0.14484818···.

On the other hand, if we take θ = 0.38, then G = 0.14566136··· > Lα and G_1 = 0.10656403··· < Lα. Therefore, Corollary 3.1 is applicable but [25, Theorem 3.1] is not.

Now we present the local convergence result for Algorithm TGNU({ε_k}) with (3.4) satisfied, by applying Theorem 3.1. For this purpose, let

L̃ := L / (1 − Lr).  (3.26)
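As a numerical aside, the constants reported in Example 3.1 can be reproduced directly; a sketch assuming the closed-form expressions for G in (3.6) and G_1 above (spectral norms throughout):

```python
import numpy as np

theta = 0.38
x0 = np.array([0.95, 0.96])

# f'(x0) for Example 3.1 and its inverse.
Jx0 = np.array([[x0[0], -x0[1]],
                [0.5,    0.0 ]])
Jinv = np.linalg.inv(Jx0)

fx0 = np.array([0.5 * x0[0]**2 - 0.5 * x0[1]**2, 0.5 * x0[0] - 23.0 / 50.0])

L = np.linalg.norm(Jinv, 2)         # Lipschitz constant: ||f'(x0)^{-1}||
alpha = np.linalg.norm(Jinv @ fx0)  # alpha = ||f'(x0)^{-1} f(x0)||, cf. (3.3)

q = 2 * theta**2 - theta + 1
G  = (1 - theta)**2 / ((1 + theta) * (np.sqrt(q**2 + 2*theta*(1 - theta)**3) + q))
G1 = (1 - theta)**2 / ((1 + theta) * (2*(1 + theta) - theta*(1 - theta)**2))

# G1 < L*alpha < G: Corollary 3.1 applies, [25, Theorem 3.1] does not.
assert G1 < L * alpha < G
```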
The following lemma is crucial for the local convergence analysis.

Lemma 3.2. Let x* ∈ D be a solution of f(x) = 0 such that f′(x*) is of full row rank. Let r̄ > 0 be such that B(x*, r̄) ⊆ D. Suppose that f′(x*)† f′(·) satisfies the Lipschitz condition with Lipschitz constant L on B(x*, r̄), and suppose that 0 < r < min{r̄, 1/L}. Then, for each x_0 ∈ B(x*, r), the following assertions hold:

(i) f′(x_0) is of full row rank and ‖f′(x_0)† f′(x*)‖ ≤ 1/(1 − Lr);

(ii) f′(x_0)† f′(·) satisfies the Lipschitz condition with Lipschitz constant L̃ on B(x_0, r̄ − r);

(iii) α ≤ Lr²/(2(1 − Lr)) + r, where α is defined by (3.3).

Proof. Let x_0 ∈ B(x*, r). Then, applying Lemma 2.2 (with x̄ = x* and x = x_0), one can see that assertion (i) holds. Note that, for each x ∈ B(x_0, r̄ − r), ‖x − x*‖ ≤ ‖x − x_0‖ + ‖x_0 − x*‖ < r̄. Thus, by assertion (i) and the Lipschitz assumption on f′(x*)† f′(·), we have that

‖f′(x_0)†(f′(x) − f′(y))‖ ≤ ‖f′(x_0)† f′(x*)‖ · ‖f′(x*)†(f′(x) − f′(y))‖ ≤ (L/(1 − Lr)) ‖x − y‖

for each x, y ∈ B(x_0, r̄ − r). This shows assertion (ii). For the proof of assertion (iii), we note that

α = ‖f′(x_0)† f(x_0)‖ = ‖f′(x_0)†[f(x_0) + f′(x_0)(x* − x_0) − f(x*)] − f′(x_0)† f′(x_0)(x* − x_0)‖.

Then, writing x^τ_0 := x_0 + τ(x* − x_0) for each 0 ≤ τ ≤ 1, one has

α ≤ ‖f′(x_0)† f′(x*)‖ · ‖∫_0^1 f′(x*)†(f′(x_0) − f′(x^τ_0))(x* − x_0) dτ‖ + ‖f′(x_0)† f′(x_0)(x* − x_0)‖.
Thus, we deduce from assertions (i) and (ii) that

α ≤ L‖x* − x_0‖² / (2(1 − Lr)) + ‖x* − x_0‖ < Lr² / (2(1 − Lr)) + r.
Hence assertion (iii) holds and the proof is complete. □

Then, applying Theorem 3.1, we present the following local convergence result for Algorithm TGNU({ε_k}). Recall that G is defined by (3.6).

Theorem 3.2. Let x* ∈ D be a solution of f(x) = 0 such that f′(x*) is of full row rank. Let r̄ > 0 be such that B(x*, r̄) ⊆ D. Suppose that f′(x*)† f′(·) satisfies the Lipschitz condition with Lipschitz constant L on B(x*, r̄). Let

r := min{ (1/L)(1 − 1/√(1 + 2G)), (1 − θ)r̄ / (1 − θ + 4(1 + θ)²) }  (3.27)

and x_0 ∈ B(x*, r). Let {x_k} be a sequence generated by Algorithm TGNU({ε_k}) with initial point x_0. If {ε_k} satisfies (3.4), then {x_k} is well-defined and, moreover, converges to x̂*, a solution of f(x) = 0.

Proof. By (3.27), one can see that 0 < r < min{r̄, 1/L}. Then assertions (i)–(iii) in Lemma 3.2 are seen to hold. Thus, thanks to Theorem 3.1, it suffices to prove that

L̃α ≤ G  and  t* ≤ r̄ − r.  (3.28)

Here L̃ and t* are defined by (3.26) and (2.3) with

a := L̃(1 + θ) / (2[1 + L̃αθ(1 + θ)]),  b := 1 − θ[1 − L̃α(1 + θ)] / [1 + L̃αθ(1 + θ)],  λ := α(1 + θ).  (3.29)

Since r ≤ (1/L)(1 − 1/√(1 + 2G)), we have by assertion (iii) that

L̃α ≤ (1/2)(Lr/(1 − Lr))² + Lr/(1 − Lr) ≤ G.  (3.30)

Below we estimate t*. By (2.3) and (3.29), one has

t* = 2λ / (b + √(b² − 4aλ)) ≤ 2λ/b ≤ 2α(1 + θ)[1 + L̃αθ(1 + θ)] / (1 − θ) < 2α(1 + θ)² / (1 − θ),  (3.31)

where the last inequality holds because L̃α ≤ G < 1/(1 + θ). On the other hand, by (3.27), one has

0 < Lr < 1  and  1 − Lr ≥ 1/√(1 + 2G) > 1/√3.
Then it follows from assertion (iii) that

α ≤ Lr² / (2(1 − Lr)) + r ≤ 2r.

Thus, we have by (3.31) that

t* + r ≤ 4r(1 + θ)² / (1 − θ) + r ≤ r̄

(noting that r ≤ (1 − θ)r̄ / (1 − θ + 4(1 + θ)²)). This together with (3.30) shows that (3.28) holds, and hence the proof is complete. □
The following corollary is a direct consequence of Theorem 3.2 for the GN method.

Corollary 3.2. Let x* ∈ D be a solution of f(x) = 0 such that f′(x*) is of full row rank. Let r̄ > 0 be such that B(x*, r̄) ⊆ D. Suppose that f′(x*)† f′(·) satisfies the Lipschitz condition with Lipschitz constant L on B(x*, r̄). Let

r := min{ r̄/5, (1 − 1/√2)/L }

and x_0 ∈ B(x*, r). Let {x_k} be the sequence generated by the GN method with initial point x_0. Then {x_k} is well-defined and, moreover, converges to x̂*, a solution of f(x) = 0.
3.2 Convergence of Algorithm TGNU({ε_k}) with (3.5) satisfied
In this subsection, we present some convergence results for Algorithm TGNU({ε_k}) with (3.5) satisfied. It should be mentioned that condition (3.5) is more attractive because it ensures quadratic convergence of Algorithm TGNU({ε_k}). Let us write

a := L(1 + θα) / (2[1 + Lθα²(1 + θα)][1 + θ(α − 1)]) + θ / [1 + θ(α − 1)],  b := 1 + θ / [1 + θ(α − 1)]  (3.32)

and

λ := αb(1 + αθ).  (3.33)

Obviously, b ≥ 1. Then, based on the definitions of a, b, and λ given by (3.32) and (3.33), we present the following lemma, which is similar to Lemma 3.1.

Lemma 3.3. Let α be defined by (3.3). Let a, b, and λ be defined by (3.32) and (3.33), and recall that L is the Lipschitz constant. Suppose that

α ≤ 1 / [√((L + 2θ)² + 2Lθ) + L + 2θ].  (3.34)

Then (2.2) holds, i.e., aλ ≤ b²/4.
Proof. In the case when θ = 0, it is easy to check that (3.34) implies (2.2). Below we consider the case when 0 < θ < 1. Indeed, by (3.34), one has

α ≤ [√((L + 2θ)² + 2Lθ) − (L + 2θ)] / (2Lθ).

This gives that 2Lθα² + 2α(L + 2θ) ≤ 1, which is equivalent to

α ≤ 1 / [2L(1 + αθ) + 4θ].  (3.35)

On the other hand, we have by (3.32) that

b / (4a(1 + αθ)) = [2L(1 + αθ)/(1 + Lθα²(1 + αθ)) + 4θ]^{-1} ≥ 1 / [2L(1 + αθ) + 4θ].

Combining this with (3.35), we get that

α ≤ b / (4a(1 + αθ)).

Then, using (3.33), one sees that (2.2) holds and the proof is complete. □

Thanks to the estimates for t̃_{k+1} − t̃_k and t* − t̃_k in Lemma 2.1, and using Lemma 3.3 (instead of Lemma 3.1), one can establish the following convergence result for Algorithm TGNU({ε_k}) with (3.5) satisfied. The proof is very similar to that of Theorem 3.1 and so is omitted here.

Theorem 3.3. Let x_0 ∈ D be such that f′(x_0) is of full row rank. Let r̄ > 0 be such that B(x_0, r̄) ⊆ D. Suppose that f′(x_0)† f′(·) satisfies the Lipschitz condition with Lipschitz constant L on B(x_0, r̄). Suppose that (3.34) holds and t* ≤ r̄. Let {x_k} be a sequence generated by Algorithm TGNU({ε_k}) with initial point x_0. If {ε_k} satisfies (3.5), then {x_k} is well-defined and converges quadratically (in the root-convergence sense) to a solution x* of f(x) = 0 in B̄(x_0, t*). Moreover, the following estimate holds:
‖x_k − x*‖ ≤ (ξ^{2^k − 1} / Σ_{j=0}^{2^k − 1} ξ^j) t*  for each k = 0, 1, ....  (3.36)
Here t* and ξ are defined by (2.3) and (2.7) with a, b, and λ given in (3.32) and (3.33).

Remark 3.2. Let G(θ) := G be defined by (3.6) for each θ ∈ (0, 1), and let Ĝ be defined by

Ĝ(θ, L) := L / [√((L + 2θ)² + 2Lθ) + L + 2θ]  for each L ∈ (0, +∞) and θ ∈ (0, 1).

Then (3.34) and (3.8) are equivalent to Lα ≤ Ĝ(θ, L) and Lα ≤ G(θ), respectively. The relationship between G and Ĝ is illustrated in Figures 1 and 2. Figure 1 presents the graphs of
z = 0 and z = G − Ĝ, while Figure 2 marks the regions G < Ĝ and G > Ĝ, respectively. From these two figures we can see that, in the case when L > 1, G < Ĝ holds for any 0 < θ < 1, while in the case when 0 < L < 1, the relationship between G and Ĝ depends on the value of θ (although we cannot prove these facts theoretically).
Figure 1: the graphs of G − Ĝ

Figure 2: the top view of Figure 1
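The empirical observation in Remark 3.2 can be probed numerically; a sketch assuming the closed forms of G and Ĝ above, sampling θ over (0, 1) for the illustrative value L = 2 > 1:

```python
import numpy as np

def G(theta):
    # G(theta) from (3.6); depends on theta only.
    q = 2 * theta**2 - theta + 1
    return (1 - theta)**2 / ((1 + theta) * (np.sqrt(q**2 + 2*theta*(1 - theta)**3) + q))

def G_hat(theta, L):
    # G_hat(theta, L) from Remark 3.2; depends on both theta and L.
    return L / (np.sqrt((L + 2*theta)**2 + 2*L*theta) + L + 2*theta)

# For L > 1 the remark observes G < G_hat on the whole interval 0 < theta < 1.
thetas = np.linspace(0.01, 0.99, 99)
assert np.all(G(thetas) < G_hat(thetas, 2.0))
```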
In the special case when ε_k ≡ 0, Algorithm TGNU({ε_k}) reduces to the GN method. In this case, we can choose θ_k ≡ 0, and so

G = 1/2,   t* = (1 − √(1 − 2Lα))/L,   and   ξ = (1 − √(1 − 2Lα))/(1 + √(1 − 2Lα)).   (3.37)

Therefore, applying Theorem 3.3, we have the following Kantorovich-type theorem for the GN method.

Corollary 3.3. Let x0 ∈ D be such that f′(x0) is of full row rank. Let r̄ > 0 be such that B(x0, r̄) ⊆ D. Suppose that f′(x0)† f′(·) satisfies the Lipschitz condition with Lipschitz constant L on B(x0, r̄). Let α be defined by (3.3). Suppose that

0 < Lα ≤ 1/2   and   t* ≤ r̄.

Let {xk} be the sequence generated by the GN method with initial point x0. Then {xk} is well defined and converges quadratically to a solution x* of f(x) = 0 in B(x0, t*). Moreover, the following estimate holds:

‖xk − x*‖ ≤ (ξ^{2^k − 1} / ∑_{j=0}^{2^k − 1} ξ^j) t*   for each k = 0, 1, . . . .

Here t* and ξ are given in (3.37).
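For the exact GN case the quantities in (3.37) and the a priori error bound in Corollary 3.3 are easy to evaluate numerically. The following sketch is our own illustration (not code from the paper); it computes t* and ξ for sample values of L and α satisfying the criterion 0 < Lα ≤ 1/2:

```python
import math

def kantorovich_quantities(L, alpha):
    """Compute t* and xi from (3.37) for the exact GN method (theta_k = 0).

    Requires the Kantorovich-type criterion 0 < L*alpha <= 1/2.
    """
    assert 0 < L * alpha <= 0.5, "criterion 0 < L*alpha <= 1/2 violated"
    root = math.sqrt(1.0 - 2.0 * L * alpha)
    t_star = (1.0 - root) / L          # radius of the convergence ball
    xi = (1.0 - root) / (1.0 + root)   # ratio in the quadratic error bound
    return t_star, xi

def error_bound(k, t_star, xi):
    """A priori bound ||x_k - x*|| <= xi^(2^k - 1) / sum_{j=0}^{2^k - 1} xi^j * t*."""
    n = 2 ** k - 1
    return xi ** n / sum(xi ** j for j in range(n + 1)) * t_star

# sample values (our own choice): L = 1, alpha = 0.3, so L*alpha = 0.3 <= 1/2
t_star, xi = kantorovich_quantities(L=1.0, alpha=0.3)
bounds = [error_bound(k, t_star, xi) for k in range(5)]
```

For k = 0 the bound equals t*, and it decreases (doubly exponentially in k) thereafter, reflecting the root-quadratic convergence.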
APPROXIMATE GAUSS-NEWTON METHODS FOR SOLVING NLSP
We end this section with another local convergence result for Algorithm TGNU({ε_k}), obtained by applying Theorem 3.3.

Theorem 3.4. Let x* ∈ D be a solution of f(x) = 0 such that f′(x*) is of full row rank. Let r̄ > 0 be such that B(x*, r̄) ⊆ D. Suppose that f′(x*)† f′(·) satisfies the Lipschitz condition with Lipschitz constant L on B(x*, r̄). Let

r := min{ (2/11) r̄, 1/(3(√((L + θ)² + Lθ) + L + θ)) }   (3.38)

and x0 ∈ B(x*, r). Let {xk} be a sequence generated by Algorithm TGNU({ε_k}) with initial point x0. If {ε_k} satisfies (3.5), then {xk} is well defined and, moreover, converges quadratically (in the root-convergence sense) to x̂*, a solution of f(x) = 0.

Proof. By (3.38), one can see that 0 < r ≤ min{r̄, 1/(2L)}. Then assertions (i)–(iii) in Lemma 3.2 are seen to hold. Thus, thanks to Theorem 3.3, it suffices to prove that

α ≤ 1/(√((L̃ + 2θ)² + 2L̃θ) + L̃ + 2θ)   and   t* ≤ r̄ − r.   (3.39)

Here L̃ and t* are defined by (3.26) and (2.3) with b, λ given by (3.32) and (3.33), and

a := L̃(1 + θα) / (2[1 + L̃θα²(1 + θα)][1 + θ(α − 1)]) + θ/(1 + θ(α − 1)).

Since 0 < Lr ≤ 1/2 and α ≤ Lr²/(2(1 − Lr)) + r, one has

α ≤ Lr²/(2(1 − Lr)) + r ≤ (3/2) r   (3.40)

and

1/(2(√((L + θ)² + Lθ) + L + θ)) ≥ 1/(√((L̃ + 2θ)² + 2L̃θ) + L̃ + 2θ).   (3.41)

Thus, we obtain by (3.38), (3.40) and (3.41) that

α ≤ (3/2) r ≤ 1/(2(√((L + θ)² + Lθ) + L + θ)) ≤ 1/(√((L̃ + 2θ)² + 2L̃θ) + L̃ + 2θ).   (3.42)

This gives that αθ ≤ 1/2. Hence, using (2.3), (3.32) and (3.33), we have

t* ≤ 2λ/b = 2α(1 + αθ) ≤ 3α.

It follows from (3.40) and (3.38) that t* ≤ (9/2) r ≤ r̄ − r. Combining this with (3.42), one can see that (3.39) holds and the proof is complete.
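The chain of inequalities in the proof above can be checked numerically for concrete parameter values. The sketch below is our own sanity check of the reconstructed radius formula (3.38); `local_radius` is a hypothetical helper, not a routine from the paper:

```python
import math

def local_radius(L, theta, r_bar):
    """Convergence radius r from (3.38), as reconstructed:
    r = min{ 2*r_bar/11, 1 / (3*(sqrt((L+theta)^2 + L*theta) + L + theta)) }."""
    D = math.sqrt((L + theta) ** 2 + L * theta) + L + theta
    return min(2.0 * r_bar / 11.0, 1.0 / (3.0 * D))

# numerical check of the bounds used in the proof of Theorem 3.4
L, theta, r_bar = 2.0, 0.5, 1.0
D = math.sqrt((L + theta) ** 2 + L * theta) + L + theta
r = local_radius(L, theta, r_bar)

assert L * r <= 0.5               # 0 < Lr <= 1/2
assert 1.5 * r <= 1.0 / (2.0 * D) # alpha <= (3/2) r <= 1/(2D), cf. (3.40)-(3.42)
assert 4.5 * r <= r_bar - r       # t* <= (9/2) r <= r_bar - r
```

By construction r ≤ 1/(3D) and r ≤ (2/11) r̄, so all three bounds hold for any admissible L, θ, r̄.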
4 Algorithms PGNU, TPGNU and convergence analysis

In this section, we propose the perturbed and truncated-perturbed GN methods (i.e., Algorithm PGNU and Algorithm TPGNU) for solving the underdetermined NLSP (1.1), and then present the convergence results for these two algorithms. Throughout, {xk} denotes a generated sequence and {Jk} denotes perturbed Fréchet derivatives of f at {xk}.
4.1 Algorithm PGNU and convergence analysis
Algorithm PGNU
Choose an initial guess x0 ∈ D ⊆ R^n. For k = 0, 1, . . . until convergence, do:

1. Form the perturbed Fréchet derivative Jk of f at xk.

2. Solve sk from the following equation:

   Jk Jk^T sk = −f(xk).   (4.1)

3. Set x_{k+1} := xk + Jk^T sk.

Remark 4.1. Let k ≥ 0 and suppose that Jk is of full row rank. Then one has by (4.1) that sk = −(Jk Jk^T)^{−1} f(xk), which together with Step 3 in Algorithm PGNU yields that x_{k+1} − xk = −Jk† f(xk). Note that the full row rank assumption on {Jk} is reasonable. In fact, by the arguments in the proof of Theorem 3.1, the full row rank assumption on f′(x0) ensures that f′(xk) is of full row rank; hence, thanks to the well-known perturbation result for the Moore-Penrose inverse (cf. [1]), Jk is of full row rank provided that it is close enough to f′(xk) (for example, ‖f′(xk)†‖ · ‖Jk − f′(xk)‖ < 1).

Below we apply the convergence results of Algorithm TGNU established in Section 3 to obtain some convergence results for Algorithm PGNU.

Theorem 4.1. Let x0 ∈ D be such that f′(x0) is of full row rank. Let r̄ > 0 be such that B(x0, r̄) ⊆ D. Suppose that f′(x0)† f′(·) satisfies the Lipschitz condition with Lipschitz constant L on B(x0, r̄). Suppose that (3.8) (resp. (3.34)) holds and that t* ≤ r̄. Let {xk} be a sequence generated by Algorithm PGNU with initial point x0. Suppose that all elements of {Jk} are of full row rank and that

‖f′(x0)†(I − f′(xk)Jk†) f(xk)‖ ≤ μk ‖f′(x0)† f(xk)‖   for each k = 0, 1, . . .   (4.2)

(resp.

‖f′(x0)†(I − f′(xk)Jk†) f(xk)‖ ≤ μk ‖f′(x0)† f(xk)‖²   for each k = 0, 1, . . .),   (4.3)

where {μk} is a non-negative sequence. If sup_{k≥0} μk ≤ θ, then {xk} is well defined and converges to a solution x* of f(x) = 0 in B(x0, t*). Moreover, the estimate (3.11) (resp. (3.36)) holds. Here t* is defined by (2.3) with a, b, λ given by (3.7) (resp. (3.32) and (3.33)).
Proof. Suppose that (3.8) and (4.2) hold (for brevity, we only prove this case, as the other one is similar). We assert that the iteration in Algorithm PGNU can be regarded as one in Algorithm TGNU({ε_k}). For this purpose, let k ≥ 0. Thanks to Algorithm PGNU, one has

x_{k+1} = xk − Jk† f(xk) = xk − f′(xk)† f(xk) + (f′(xk)† − Jk†) f(xk),

which together with Remark 3.1 implies that Algorithm PGNU can be regarded as Algorithm TGNU({ε_k}) with

f′(xk)† rk = (f′(xk)† − Jk†) f(xk).

Thus one has by (4.2) that

‖f′(x0)† rk‖ = ‖f′(x0)† f′(xk) f′(xk)† rk‖ = ‖f′(x0)†(I − f′(xk)Jk†) f(xk)‖ ≤ μk ‖f′(x0)† f(xk)‖.

Therefore the assertion holds, and so we can complete the proof by arguments similar to those for Theorem 3.1.

Similar to Theorems 3.2 and 3.4, and using Theorem 4.1, we obtain the following local convergence theorem for Algorithm PGNU.

Theorem 4.2. Let x* ∈ D be a solution of f(x) = 0 such that f′(x*) is of full row rank. Let r̄ > 0 be such that B(x*, r̄) ⊆ D. Suppose that f′(x*)† f′(·) satisfies the Lipschitz condition with Lipschitz constant L on B(x*, r̄). Let r be given by (3.27) (resp. (3.38)) and x0 ∈ B(x*, r). Let {xk} be a sequence generated by Algorithm PGNU with initial point x0. Suppose that all elements of {Jk} are of full row rank and that (4.2) (resp. (4.3)) holds. If sup_{k≥0} μk ≤ θ, then {xk} is well defined and, moreover, converges to x̂*, a solution of f(x) = 0.
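To make the iteration concrete, here is a minimal sketch of the PGNU step on a toy underdetermined system. All names are our own illustration, not the paper's code; when `jacobian` returns the exact derivative, the sketch reduces to the plain GN method:

```python
import numpy as np

def pgnu(f, jacobian, x0, tol=1e-10, max_iter=100):
    """A minimal sketch of Algorithm PGNU: at each step form a (possibly
    perturbed) Jacobian J_k, solve J_k J_k^T s_k = -f(x_k), and set
    x_{k+1} = x_k + J_k^T s_k (a minimum-norm GN-type step)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        fx = f(x)
        if np.linalg.norm(fx) < tol:
            break
        J = jacobian(x)                    # m x n with m <= n (underdetermined)
        s = np.linalg.solve(J @ J.T, -fx)  # normal equations (4.1)
        x = x + J.T @ s
    return x

# toy underdetermined system f : R^4 -> R^2, f_i(x) = x_i * x_{2+i} - c_i
def f(x):
    return np.array([x[0] * x[2] - 1.0, x[1] * x[3] - np.sqrt(2.0)])

def jac(x):  # exact Jacobian (so this run is plain GN)
    return np.array([[x[2], 0.0, x[0], 0.0],
                     [0.0, x[3], 0.0, x[1]]])

x = pgnu(f, jac, x0=np.array([2.0, 2.0, 2.0, 2.0]))
```

Solving the small m × m system Jk Jk^T sk = −f(xk) and then applying Jk^T is exactly what makes the method attractive in the underdetermined case m < n.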
4.2 Algorithm TPGNU and convergence analysis
Given a non-negative sequence {ε_k}, we propose the following truncated-perturbed GN method, i.e., Algorithm TPGNU({ε_k}), for solving the NLSP (1.1).

Algorithm TPGNU({ε_k})
Choose an initial guess x0 ∈ D ⊆ R^n. For k = 0, 1, . . . until convergence, do:

1. Form the perturbed Fréchet derivative Jk of f at xk.

2. Solve (4.1) inexactly to find sk such that the residual r̃k defined by

   r̃k := Jk Jk^T sk + f(xk)   (4.4)

satisfies ‖f′(x0)† r̃k‖ ≤ ε_k.

3. Set x_{k+1} := xk + Jk^T sk.
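Step 2 can be realized by running the conjugate gradient method on (4.1) and stopping as soon as the residual is small enough. The sketch below is our own illustration; it monitors the plain residual norm rather than ‖f′(x0)† r̃k‖, in the spirit of the practical control used later in Example 5.2:

```python
import numpy as np

def cg_truncated(A, b, tol, max_iter=200):
    """Plain conjugate gradients on the SPD system A s = b, stopped as soon as
    ||b - A s|| <= tol -- a sketch of the truncation in Step 2 of TPGNU."""
    s = np.zeros_like(b)
    r = b - A @ s
    p = r.copy()
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol:
            break
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        s = s + alpha * p
        r_new = r - alpha * Ap
        p = r_new + (r_new @ r_new) / (r @ r) * p
        r = r_new
    return s

# With A = J_k J_k^T (SPD when J_k has full row rank) and b = -f(x_k),
# the returned s_k leaves a residual r~_k = J_k J_k^T s_k + f(x_k) of norm <= tol.
J = np.array([[1.0, 0.0, 2.0], [0.0, 3.0, 1.0]])
fx = np.array([1.0, -2.0])
A, b = J @ J.T, -fx
s = cg_truncated(A, b, tol=1e-12)
```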
Remark 4.2. Let k ≥ 0 and suppose that Jk is of full row rank. Then we can deduce from Algorithm TPGNU that x_{k+1} − xk = −Jk† f(xk) + Jk† r̃k. Moreover, as explained in Remark 4.1, the full row rank assumption on Jk is reasonable.

Applying the convergence results of Algorithm TGNU established in Section 3, we obtain a semi-local convergence result for Algorithm TPGNU. For this purpose, we introduce the following two conditions on {ε_k}:

ε_k ≤ θ̃_k ‖f′(x0)† f(xk)‖   for each k = 0, 1, . . . ;   (4.5)

ε_k ≤ θ̃_k ‖f′(x0)† f(xk)‖²   for each k = 0, 1, . . . ,   (4.6)

where {θ̃_k} is a non-negative sequence.

Theorem 4.3. Let x0 ∈ D be such that f′(x0) is of full row rank. Let r̄ > 0 be such that B(x0, r̄) ⊆ D. Suppose that f′(x0)† f′(·) satisfies the Lipschitz condition with Lipschitz constant L on B(x0, r̄). Suppose that (3.8) (resp. (3.34)) holds and that t* ≤ r̄. Let {xk} be a sequence generated by Algorithm TPGNU({ε_k}) with initial point x0. Suppose that all elements of {Jk} are of full row rank and moreover satisfy (4.2) (resp. (4.3)). Suppose that the following inequality holds:

‖f′(x0)† f′(xk) Jk† f′(x0)‖ ≤ ωk   for each k = 0, 1, . . . ,   (4.7)

where {ωk} is a non-negative sequence. If {ε_k} satisfies (4.5) (resp. (4.6)) and sup_{k≥0} {μk + ωk θ̃_k} ≤ θ, then {xk} is well defined and converges to a solution x* of f(x) = 0 in B(x0, t*). Moreover, the estimate (3.11) (resp. (3.36)) holds. Here t* is defined by (2.3) with a, b, λ given by (3.7) (resp. (3.32) and (3.33)).

Proof. Suppose that (3.8), (4.2), and (4.5) hold (for brevity, we only prove this case, as the other case is similar). Similar to Theorem 4.1, we only need to prove that the iteration in Algorithm TPGNU can be written as one in Algorithm TGNU. To this end, let k ≥ 0. Then, by Algorithm TPGNU, we have

x_{k+1} = xk − Jk† f(xk) + Jk† r̃k = xk − f′(xk)† f(xk) + (f′(xk)† − Jk†) f(xk) + Jk† r̃k,

which together with Remark 4.2 implies that Algorithm TPGNU can be regarded as Algorithm TGNU with

f′(xk)† rk = (f′(xk)† − Jk†) f(xk) + Jk† r̃k.
Thus, one has

f′(x0)† rk = f′(x0)† f′(xk) f′(xk)† rk = f′(x0)†(I − f′(xk)Jk†) f(xk) + f′(x0)† f′(xk) Jk† f′(x0) f′(x0)† r̃k.

Hence, using (4.2), (4.4), (4.7), and (4.5), one has

‖f′(x0)† rk‖ ≤ (μk + ωk θ̃_k) ‖f′(x0)† f(xk)‖.

Therefore, the assertion holds and the proof is complete.

Similar to Theorems 3.2 and 3.4, and using Theorem 4.3, we have the following local convergence result for Algorithm TPGNU.

Theorem 4.4. Let x* ∈ D be a solution of f(x) = 0 such that f′(x*) is of full row rank. Let r̄ > 0 be such that B(x*, r̄) ⊆ D. Suppose that f′(x*)† f′(·) satisfies the Lipschitz condition with Lipschitz constant L on B(x*, r̄). Let r be given by (3.27) (resp. (3.38)) and x0 ∈ B(x*, r). Let {xk} be a sequence generated by Algorithm TPGNU({ε_k}) with initial point x0. Suppose that all elements of {Jk} are of full row rank and that (4.2) (resp. (4.3)) and (4.7) hold. If (4.5) (resp. (4.6)) holds and sup_{k≥0} {μk + ωk θ̃_k} ≤ θ, then {xk} is well defined and, moreover, converges to x̂*, a solution of f(x) = 0.
5 Numerical examples
In this section, we present some numerical experiments. In all tested problems, the computational cost of the involved Jacobian matrices is small, and so it is easy to apply Algorithm TGNU to solve the corresponding NLSP. However, in order to compare Algorithm TGNU with Algorithms PGNU and TPGNU, we construct {Jk} artificially in Example 5.1. Moreover, in Example 5.2, we only compare Algorithm TGNU with Algorithm ITREBO (i.e., Algorithm 2.1 in [23]) and the standard inexact Gauss-Newton method (cf. [23, p. 451]). For demonstration purposes, all linear systems that appear in this section are solved by the conjugate gradient method. Moreover, similar to [3] and [7], we stop the outer iterations in all tested problems when ‖f(xk)‖ < 10⁻¹⁰. Throughout this section, we use No and Ni to denote the outer and inner iteration numbers (of the conjugate gradient method), respectively. All tests were implemented in MATLAB 7.13.0.564 on a Lenovo PC with an Intel(R) Core(TM) i5-3210 CPU @ 2.5 GHz.

Example 5.1. Consider the nonlinear least squares problem (1.1) with f : R¹⁰ → R⁶ defined
by

f1(x) := sin x^{(1)} − (x^{(2)})²,
f2(x) := (x^{(1)})² + x^{(2)} + (x^{(3)})² − 4x^{(3)} + 4,
f3(x) := x^{(1)} sin x^{(1)} + x^{(3)} − 2,
f4(x) := 2x^{(4)} + (x^{(1)})² x^{(5)} − 2,
f5(x) := 2(x^{(1)})² + (x^{(2)})³ e^{x^{(6)}} + 4x^{(5)} + (x^{(6)})² − 3,
f6(x) := ∑_{i=1}^{10} e^{x^{(i)} − x^{(6)}} (x^{(1)})²,
for each x = (x^{(1)}, · · · , x^{(10)})^T ∈ R¹⁰. Clearly, the Jacobian of f at each x ∈ R¹⁰ is not invertible but is of full row rank. Moreover, the solution of f′(x)^T f(x) = 0 is not unique, and one subset of the solution set is

S := {(0, 0, 2, 1, 0.75, 0, x^{(7)}, x^{(8)}, x^{(9)}, x^{(10)})^T | x^{(7)}, x^{(8)}, x^{(9)}, x^{(10)} ∈ R}.

Note that Algorithms TGNU, PGNU, and TPGNU converge locally, and so we choose the following three initial points near the solution set:

(i1) x0 = (2, 2, · · · , 2)^T;
(i2) x0 = (1.2, 1.4, 0.7, 1.2, 1.6, 0.0, 0.2, 1.9, 1.3, 0.5)^T;
(i3) x0 = (1.2, 1.4, 0.8, 1.5, 1.8, 0.1, 0.3, 1.7, 0.8, 0.8)^T.

Algorithms TGNU, PGNU and TPGNU are all tested. In Algorithms PGNU and TPGNU, we choose the following finite difference as the perturbed Fréchet derivative Jk:

[Jk]_{i,j} = (f_i(xk + ε e_j) − f_i(xk)) / ε,

where ε = 10⁻¹⁰ and e_j is the j-th column of the identity matrix. The inner loop stopping tolerance for (1.8) in Algorithm TGNU is given by (3.2) with ε_k = θ_k ‖f′(x0)† f(xk)‖ for control (3.4) and ε_k = min{1, θ_k ‖f′(x0)† f(xk)‖²} for control (3.5), respectively. Similarly, the inner loop stopping tolerance for (4.1) in Algorithm TPGNU is given by (4.4) with ε_k = θ̃_k ‖f′(x0)† f(xk)‖ for control (4.5) and ε_k = min{1, θ̃_k ‖f′(x0)† f(xk)‖²} for control (4.6), respectively. For simplicity, we choose θ_k ≡ θ and θ̃_k ≡ θ̃ for each k = 0, 1, 2, . . .. As for (4.1) in Algorithm PGNU, we solve it up to machine precision eps (which is about 2.2 × 10⁻¹⁶).

We now report our experimental results. Similar to [4], we illustrate in Table 1 the values of ‖f(xk)‖ for the last three iterations of Algorithm TGNU with (3.5) satisfied. Table 2 presents the outer and inner iteration numbers of Algorithm TGNU. In order to see how θ influences the convergence performance of Algorithm TGNU, we run the experiments with different values of θ in both tables. Finally, we list the outer and inner iteration numbers of Algorithms PGNU and TPGNU in Table 3. In particular, for Algorithm TPGNU, we adopt different values of θ̃.
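The finite-difference construction of Jk can be sketched as follows; this is our own NumPy illustration (the paper's experiments use MATLAB), implementing the forward-difference formula above:

```python
import numpy as np

def fd_jacobian(f, x, eps=1e-10):
    """Forward-difference approximation [J]_{i,j} = (f_i(x + eps*e_j) - f_i(x))/eps,
    used as the perturbed Frechet derivative J_k (the paper takes eps = 1e-10)."""
    fx = np.asarray(f(x), dtype=float)
    J = np.empty((fx.size, x.size))
    for j in range(x.size):
        xj = x.copy()
        xj[j] += eps
        J[:, j] = (np.asarray(f(xj)) - fx) / eps
    return J

# small check on f(x) = (x1*x2 - 1, x2^2 - 4), whose exact Jacobian at x is
# [[x2, x1], [0, 2*x2]]
x = np.array([3.0, 2.0])
J = fd_jacobian(lambda x: np.array([x[0] * x[1] - 1.0, x[1] ** 2 - 4.0]), x)
```

With such a small ε the truncation error is tiny but the rounding error is noticeable; this is precisely what makes Jk a *perturbed* derivative rather than an exact one.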
From Tables 1–3, we can see that Algorithms TGNU, PGNU, and TPGNU converge from the initial points (i1), (i2) and (i3). For Algorithms TGNU and TPGNU, a larger θ requires more outer iterations. In terms of the inner iterations of Algorithm TGNU (resp. TPGNU), the numbers corresponding to θ ≈ 0.5 (resp. θ̃ ≈ 0.5) are smaller than those corresponding to other values of θ (resp. θ̃). Finally, we can also see that the convergence performance of Algorithms PGNU and TPGNU with control (4.6) is comparable to that of Algorithm TGNU with control (3.5).

Table 1: ‖f(xk)‖ of the last three iterations of Algorithm TGNU with control (3.5).

initial point  θ = 10⁻¹⁰                θ = 0.25                 θ = 0.5                  θ = 0.75                 θ = 0.9
(i1)           4.9e-01 1.5e-02 1.7e-12  3.1e-01 3.3e-04 6.4e-12  3.3e-04 2.8e-10 1.7e-25  3.3e-04 2.8e-10 1.7e-25  8.9e-04 1.1e-10 9.0e-19
(i2)           1.0e-03 1.0e-06 2.0e-20  3.5e-05 1.2e-09 1.4e-22  2.9e-02 7.8e-04 8.3e-11  7.7e-04 2.8e-10 1.0e-21  3.8e-04 1.0e-08 9.6e-15
(i3)           7.1e-04 5.0e-07 1.1e-21  1.7e-06 3.6e-10 3.4e-23  1.0e-02 4.0e-08 1.3e-15  7.1e-05 1.8e-10 5.8e-20  7.1e-05 1.8e-10 5.8e-20
Table 2: Outer and inner iteration numbers of Algorithm TGNU.

control  initial point  θ = 10⁻¹⁰  θ = 0.25  θ = 0.5  θ = 0.9
                        No  Ni     No  Ni    No  Ni   No  Ni
(3.4)    (i1)           9   71     14  47    14  37   28  44
(3.4)    (i2)           8   50     11  34    14  36   23  38
(3.4)    (i3)           8   47     12  41    19  44   19  31
(3.5)    (i1)           9   71     8   39    9   43   9   43
(3.5)    (i2)           8   54     10  39    8   31   11  37
(3.5)    (i3)           8   54     10  44    9   33   9   31
Table 3: Outer and inner iteration numbers of Algorithms PGNU and TPGNU.

     initial point  PGNU  TPGNU with control (4.5)        TPGNU with control (4.6)
                          θ̃ = 0.25  θ̃ = 0.5  θ̃ = 0.9      θ̃ = 0.25  θ̃ = 0.5  θ̃ = 0.9
No   (i1)           10    11        14       28           9         9        9
No   (i2)           8     14        14       23           9         11       11
No   (i3)           8     16        22       19           9         9        9
Ni   (i1)           110   42        37       44           43        43       41
Ni   (i2)           77    41        36       38           37        36       37
Ni   (i3)           75    51        52       31           36        33       31
Example 5.2. In this example, we consider some medium/large-dimension test problems with f(·) := (f1(·), f2(·), . . . , fl(·))^T defined as follows. In particular, test problems (1) and (2) are taken from [4].

(1) f : R^{2l} → R^l, where f_i(x) := (x^{(i)} + x^{(l+i)})(x^{(i)} + x^{(l+i)} − √i), i = 1, · · · , l.

(2) f : R^{2l} → R^l, where f_i(x) := x^{(i)} x^{(l+i)} − √i, i = 1, · · · , l.

(3) f : R^{3l} → R^l, where f_i(x) := x^{(i)} x^{(l+i)} x^{(2l+i)} − ∛i, i = 1, · · · , l.

(4) f : R^{2l} → R^l, where f_i(x) := (3 − 2x^{(2i−1)}) x^{(2i−1)} − x^{(2i−2)} − 2x^{(2i)} + 1, i = 1, · · · , l.

Here x^{(0)} := 0. In this example, we illustrate the convergence performance of Algorithm TGNU. For comparison, Algorithm ITREBO and the standard inexact Gauss-Newton method (Algorithm IGN) (cf. [23]) are also tested. For each test problem, we consider l = 1000, 2000, and 4000. Motivated by [4] and [18], we choose the initial point x0 = (l/2, · · · , l/2)^T in problems (1)–(3) and x0 = (−1, · · · , −1)^T in problem (4). To reduce the cost of computation, the inner loop stopping tolerance for (1.8) is given by

‖rk‖ ≤ θ_k ‖f(xk)‖.
Note that, as explained in the concluding section, this inner loop stopping tolerance is equivalent to the following tolerance (in the case when f′(x0) is of full row rank):

‖f′(x0)† rk‖ ≤ θ_k ‖f′(x0)† f(xk)‖,

but with a different {θ_k}. For simplicity, we choose θ_k ≡ 10⁻¹⁰ for each k = 0, 1, 2, . . .. As for Algorithm ITREBO, the parameters are set according to [23] and [19], i.e.,

Δmin = √(2 × 10⁻¹⁶), Δ0 = 1, β1 = 0.25, β2 = 0.75, δ = 0.25, ηmax = 0.95, η_k = min{0.1, ‖f(xk)‖∞, ‖f′(xk)^T f(xk)‖}.

As for Algorithm IGN, we solve the linear system f′(xk)^T f′(xk) dk = −f′(xk)^T f(xk) inexactly such that

‖f′(xk)^T f′(xk) dk + f′(xk)^T f(xk)‖ ≤ θ_k ‖f′(xk)^T f(xk)‖,

where {θ_k} is chosen as in Algorithm TGNU. We report our experimental results in Tables 4–7, where t denotes the CPU time in seconds and the ‖f(xk)‖ column lists its values for the last three iterations of each algorithm. From these tables, we can see that Algorithms TGNU, ITREBO and IGN converge from the chosen initial points. Moreover, on the whole, Algorithm TGNU needs much less CPU time to converge than the other two algorithms. The outer and inner iteration numbers of Algorithms TGNU and IGN are almost the same. Finally, compared with Algorithm ITREBO, Algorithm TGNU needs far fewer outer iterations but more inner iterations.

Table 4: Results for problem (1).

m × n        algorithm  No  Ni    t      ‖f(xk)‖
1000 × 2000  TGNU       15  797   2.5    2.7e-04, 7.6e-08, 5.7e-15
             ITREBO     28  431   7.3    3.5e-04, 1.1e-07, 1.1e-14
             IGN        15  726   5.4    2.7e-04, 7.6e-08, 5.7e-15
2000 × 4000  TGNU       16  1074  14.3   2.7e-04, 7.6e-08, 5.7e-15
             ITREBO     31  709   51.6   4.7e-05, 1.8e-09, 0
             IGN        16  915   34.3   2.7e-04, 7.6e-08, 5.7e-15
4000 × 8000  TGNU       17  1442  101.3  2.7e-04, 7.6e-08, 5.7e-15
             ITREBO     33  803   359.5  7.0e-04, 4.5e-07, 1.9e-13
             IGN        17  1210  224.5  2.7e-04, 7.6e-08, 5.7e-15
Table 5: Results for problem (2).

m × n        algorithm  No  Ni   t      ‖f(xk)‖
1000 × 2000  TGNU       13  242  1.6    1.1e-03, 3.0e-07, 8.0e-14
             ITREBO     27  135  6.1    3.6e-05, 5.4e-10, 8.1e-14
             IGN        13  244  3.5    1.1e-03, 3.0e-07, 8.0e-14
2000 × 4000  TGNU       14  286  10.7   1.1e-03, 3.0e-07, 1.5e-13
             ITREBO     29  116  40.6   2.5e-03, 2.2e-06, 1.3e-12
             IGN        14  290  21.2   1.1e-03, 3.0e-07, 1.5e-13
4000 × 8000  TGNU       15  337  77.8   1.1e-03, 3.0e-07, 2.6e-13
             ITREBO     32  165  321.8  2.7e-04, 3.0e-08, 2.6e-13
             IGN        15  343  158.6  1.1e-03, 3.0e-07, 2.6e-13
Table 6: Results for problem (3).

m × n         algorithm  No  Ni   t      ‖f(xk)‖
1000 × 3000   TGNU       20  244  5.1    3.8e-05, 4.4e-10, 4.1e-14
              ITREBO     33  115  20.4   1.0e-04, 3.6e-09, 4.6e-14
              IGN        20  247  11.8   3.8e-05, 4.4e-10, 4.1e-14
2000 × 6000   TGNU       21  257  26.4   3.5e-03, 2.9e-06, 2.8e-12
              ITREBO     36  121  119.8  3.2e-04, 4.6e-08, 8.4e-14
              IGN        21  260  71.6   3.5e-03, 2.9e-06, 2.8e-12
4000 × 12000  TGNU       23  308  170.2  7.0e-04, 1.3e-07, 1.3e-13
              ITREBO     39  122  855.2  1.9e-03, 1.6e-06, 8.4e-13
              IGN        23  311  522.9  7.0e-04, 1.3e-07, 1.3e-13
Table 7: Results for problem (4).

m × n        algorithm  No  Ni  t     ‖f(xk)‖
1000 × 2000  TGNU       4   30  1.0   1.1e-03, 3.1e-08, 1.4e-14
             ITREBO     6   14  1.9   8.3e-04, 1.2e-07, 1.4e-14
             IGN        4   30  1.4   1.1e-03, 3.1e-08, 1.4e-14
2000 × 4000  TGNU       4   30  5.5   1.4e-03, 3.1e-08, 1.9e-14
             ITREBO     7   18  13.0  6.6e-05, 6.3e-10, 1.9e-14
             IGN        4   30  8.1   1.4e-03, 3.1e-08, 1.9e-14
4000 × 8000  TGNU       4   30  41.7  1.9e-03, 3.1e-08, 2.8e-14
             ITREBO     7   15  86.7  4.8e-04, 8.5e-08, 2.8e-14
             IGN        4   30  54.7  1.9e-03, 3.1e-08, 3.5e-15
6 Concluding remarks
In the present paper, we establish several semi-local convergence results for the proposed approximate Gauss-Newton methods under the residual control

‖f′(x0)† rk‖ ≤ θ_k ‖f′(x0)† f(xk)‖   (resp. ‖f′(x0)† rk‖ ≤ θ_k ‖f′(x0)† f(xk)‖²).   (6.1)

The obtained convergence criterion depends heavily on the sequence {θ_k}. Note that, in the case when f′(x0) is of full row rank, the residual control (6.1) is equivalent to the residual control

‖rk‖ ≤ η_k ‖f(xk)‖   (resp. ‖rk‖ ≤ η_k ‖f(xk)‖²),   (6.2)

since the inequalities ‖rk‖ ≤ ‖f′(x0)‖ · ‖f′(x0)† rk‖ and ‖f′(x0)† rk‖ ≤ ‖f′(x0)†‖ · ‖rk‖ hold. To present a clear and sharp convergence criterion, we adopt the residual control (6.1) instead of (6.2) in the theoretical analysis. However, in practical computations, one may adopt the residual control (6.2) to avoid the cost of computing f′(x0)† (which is expensive when the problem size is very large), as we have done in Example 5.2.
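The two norm inequalities behind the equivalence of (6.1) and (6.2) are easy to check numerically. The snippet below is our own illustration, with a random full-row-rank J0 standing in for f′(x0) (full row rank makes J0 J0† = I, so any r satisfies r = J0 (J0† r)):

```python
import numpy as np

rng = np.random.default_rng(0)
J0 = rng.standard_normal((4, 7))   # 4 x 7 Gaussian matrix: full row rank a.s.
P = np.linalg.pinv(J0)             # Moore-Penrose inverse J0^dagger
r = rng.standard_normal(4)         # an arbitrary inner residual

# ||r|| <= ||J0|| * ||J0^dagger r||   (since r = J0 (J0^dagger r))
lhs1, rhs1 = np.linalg.norm(r), np.linalg.norm(J0, 2) * np.linalg.norm(P @ r)
# ||J0^dagger r|| <= ||J0^dagger|| * ||r||
lhs2, rhs2 = np.linalg.norm(P @ r), np.linalg.norm(P, 2) * np.linalg.norm(r)
```

Both inequalities are instances of the operator-norm bound ‖Ax‖ ≤ ‖A‖ ‖x‖ in the spectral norm, which is why (6.1) and (6.2) differ only by a rescaling of the forcing sequence.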
References

[1] A. Ben-Israel, T. N. E. Greville, Generalized Inverses: Theory and Applications, John Wiley, New York, 1974; 2nd edition, Springer-Verlag, New York, 2003.
[2] Å. Björck, Numerical Methods for Least Squares Problems, SIAM, Philadelphia, PA, 1996.
[3] P. Courtier, J. N. Thépaut, A. Hollingsworth, A strategy for operational implementation of 4D-Var, using an incremental approach, Quart. J. Roy. Meteor. Soc. 120 (1994) 1367-1387.
[4] H. Dan, N. Yamashita, M. Fukushima, Convergence properties of the inexact Levenberg-Marquardt method under local error bound conditions, Optim. Methods Softw. 17 (2002) 605-626.
[5] R. S. Dembo, S. C. Eisenstat, T. Steihaug, Inexact Newton methods, SIAM J. Numer. Anal. 19 (1982) 400-408.
[6] J. E. Dennis, R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice-Hall, Englewood Cliffs, NJ, 1983.
[7] J. B. Francisco, N. Krejić, J. M. Martínez, An interior-point method for solving box-constrained underdetermined nonlinear systems, J. Comput. Appl. Math. 177 (2005) 67-88.
[8] M. Ghil, P. Malanotte-Rizzoli, Data assimilation in meteorology and oceanography, Adv. Geophys. 33 (1991) 141-266.
[9] S. Gratton, A. S. Lawless, N. K. Nichols, Approximate Gauss-Newton methods for nonlinear least squares problems, SIAM J. Optim. 18 (2007) 106-132.
[10] X. P. Guo, On semilocal convergence of inexact Newton methods, J. Comput. Math. 25 (2007) 231-242.
[11] W. M. Häußler, A Kantorovich-type convergence analysis for the Gauss-Newton method, Numer. Math. 48 (1986) 119-125.
[12] N. C. Hu, W. P. Shen, C. Li, Kantorovich's type theorems for systems of equations with constant rank derivatives, J. Comput. Appl. Math. 219 (2008) 110-122.
[13] L. V. Kantorovich, On Newton's method for functional equations, Dokl. Akad. Nauk SSSR (N.S.) 59 (1948) 1237-1240.
[14] A. S. Lawless, S. Gratton, N. K. Nichols, An investigation of incremental 4D-Var using non-tangent linear models, Quart. J. Roy. Meteor. Soc. 131 (2005) 459-476.
[15] C. Li, N. C. Hu, J. H. Wang, Convergence behavior of Gauss-Newton's method and extensions of the Smale point estimate theory, J. Complexity 26 (2010) 268-295.
[16] C. Li, W. P. Shen, Local convergence of inexact methods under the Hölder condition, J. Comput. Appl. Math. 222 (2008) 544-560.
[17] C. Li, W. H. Zhang, X. Q. Jin, Convergence and uniqueness properties of Gauss-Newton's method, Comput. Math. Appl. 47 (2004) 1057-1067.
[18] J. J. Moré, B. S. Garbow, K. E. Hillstrom, Testing unconstrained optimization software, ACM Trans. Math. Software 7 (1981) 17-41.
[19] M. Macconi, B. Morini, M. Porcelli, A Gauss-Newton method for solving bound-constrained underdetermined nonlinear systems, Optim. Methods Softw. 24 (2009) 219-235.
[20] B. Morini, Convergence behaviour of inexact Newton methods, Math. Comput. 68 (1999) 1605-1613.
[21] J. Nocedal, S. J. Wright, Numerical Optimization, Springer, New York, NY, 1999.
[22] J. M. Ortega, W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, 1970.
[23] M. Porcelli, On the convergence of an inexact Gauss-Newton trust-region method for nonlinear least-squares problems with simple bounds, Optim. Lett. 7 (2013) 447-465.
[24] W. P. Shen, C. Li, Convergence criterion of inexact methods for operators with Hölder continuous derivatives, Taiwan. J. Math. 12 (2008) 1865-1882.
[25] W. P. Shen, C. Li, Kantorovich-type convergence criterion for inexact Newton methods, Appl. Numer. Math. 59 (2009) 1599-1611.
[26] G. W. Stewart, J. G. Sun, Matrix Perturbation Theory, Academic Press, New York, 1990.
[27] G. R. Wang, Y. M. Wei, S. Qiao, Generalized Inverses: Theory and Computations, Science Press, Beijing, 2004.
[28] T. J. Ypma, Local convergence of inexact Newton methods, SIAM J. Numer. Anal. 21 (1984) 583-590.