Applied Mathematics Letters 21 (2008) 181–186 www.elsevier.com/locate/aml
A new generalized APPA for maximal monotone operators

Min Li

Department of Management Science and Engineering, School of Economics and Management, Southeast University, Nanjing, 210096, PR China

Received 20 November 2005; received in revised form 5 February 2007; accepted 5 March 2007
Abstract

In this work, the conditions on the well-known Bregman function in generalized approximate proximal point algorithms (APPA) are replaced by some easily checked ones, which are more practical. The resulting new generalized APPA with optimal step sizes converges to a solution globally under rather relaxed restrictions on the error sequence.

© 2007 Elsevier Ltd. All rights reserved.
Keywords: Bregman function; Global convergence; Inexact criterion; Maximal monotone operator; Proximal point algorithm
1. Introduction

Let the point-to-set mapping $\hat T = T + N_\Omega$ be a maximal monotone operator on $\mathbb{R}^n$, where $N_\Omega$ is the normal cone operator with respect to $\Omega$:
$$N_\Omega = \{(x, y) \mid x \in \Omega,\ y^\top (w - x) \le 0\ \forall w \in \Omega\};$$
$T$ is a point-to-set maximal monotone operator on $\mathbb{R}^n$ and $\Omega$ is a closed convex subset of $\mathbb{R}^n$. The canonical problem associated with $\hat T$ is that of finding a root of it, i.e., finding a point $x \in \Omega$ such that $0 \in \hat T(x)$. One powerful approach to solving this problem is the proximal point algorithm (abbreviated as PPA), proposed first by Martinet [1] and then developed by many researchers; see, e.g., [2,3]. Instead of finding $x \in \Omega$ such that $0 \in \hat T(x)$ directly, starting from the current approximation to a root of $\hat T$, say $x^k \in \Omega$, PPA generates the next iterate $x^{k+1}$ by solving the following proximal subproblem:

$$\text{(PPA)} \qquad 0 \in \beta_k \hat T(x^{k+1}) + x^{k+1} - x^k, \tag{1}$$

where $\{\beta_k\}_{k=0}^{\infty} \subset [\beta, \infty)$ is a non-decreasing sequence of scalars and $\beta > 0$. Note that solving the subproblem (1) exactly can be computationally as difficult as solving the original problem itself. This makes straightforward applications of PPA impractical in many cases. Thus it is essential to devise implementable PPA-type algorithms. The first contribution in this area is the approximate PPA (APPA) presented by Rockafellar in [2]. In particular, Rockafellar's APPA finds an inexact solution $x^{k+1}$ of (1) in the following sense:

$$\text{(APPA)} \qquad 0 \in \beta_k \hat T(x^{k+1}) + x^{k+1} - x^k - e^k, \tag{2}$$

E-mail address: [email protected].
0893-9659/$ - see front matter © 2007 Elsevier Ltd. All rights reserved. doi:10.1016/j.aml.2007.03.010
where $e^k \in \mathbb{R}^n$ is the error term. Obviously, the restriction placed on $e^k$ is crucial for obtaining practical and efficient APPA-type algorithms. Recently, some authors [4,5] significantly relaxed the restriction on $e^k$ to

$$\|e^k\| \le \nu_k \|x^k - x^{k+1}\| \quad \text{with } \sup_{k \ge 0} \{\nu_k\} = \nu \in (0, 1), \tag{3}$$

which implies that the relative errors in solving the proximal subproblems can be fixed at $\nu$ and need not tend to zero. To ensure convergence of the algorithm under the criterion (3), the authors of [5] used an extragradient step to correct the approximate solution, and thus presented an extragradient-proximal algorithm. The extragradient step has been improved by choosing a refined step size in [4,6].

For further theoretical improvement and convergence results, some recent developments of the PPA focus on replacing the linear term $x^{k+1} - x^k$ with more general terms, e.g., Bregman functions [7–9]. In particular, a function $h : \Omega \to \mathbb{R}$ is called a Bregman function on $\Omega$ if it satisfies the following conditions:

(1) $h$ is strictly convex and continuous on $\Omega$.
(2) $h$ is continuously differentiable on $\operatorname{int} \Omega$ (the interior of $\Omega$).
(3) Let $D_h(x, y) = h(x) - h(y) - \langle \nabla h(y), x - y \rangle$. Given any $x \in \Omega$ and $\alpha \in \mathbb{R}$, the right partial level set $L(x, \alpha) = \{y \in \operatorname{int} \Omega \mid D_h(x, y) \le \alpha\}$ is bounded.
(4) If $\{y^k\} \subset \operatorname{int} \Omega$ is a convergent sequence with limit $y^\infty$, then $D_h(y^\infty, y^k) \to 0$.
(5) The left partial level set $L_0(y, \alpha) = \{x \in \operatorname{int} \Omega \mid D_h(x, y) \le \alpha\}$ is bounded.
(6) If $\{x^k\} \subset \Omega$ and $\{y^k\} \subset \operatorname{int} \Omega$ are sequences such that $\{x^k\}$ is bounded, $\{y^k\}$ converges to $y^\infty$ and $D_h(x^k, y^k) \to 0$, then $x^k \to y^\infty$.

Note that (5)–(6) are in fact implied by (1)–(4); see, e.g., [10]. Therefore, (5)–(6) are redundant in the definition of a Bregman function. Consequently, a natural idea is to design a Bregman-based APPA (denoted by B-APPA) following the scheme

$$0 \in \beta_k \hat T(x^{k+1}) + \nabla h(x^{k+1}) - \nabla h(x^k) - e^k. \tag{4}$$
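As a numerical illustration (not part of the original analysis), the negative-entropy function $h(x) = \sum_i x_i \log x_i$ on $\Omega = \mathbb{R}^n_+$ is a classical Bregman function, and its distance $D_h$ is the generalized Kullback–Leibler divergence. The following sketch evaluates $D_h$ directly from the definition in condition (3) and spot-checks that $D_h(x, x) = 0$, that $D_h(x, y) > 0$ for $x \ne y$ (strict convexity), and the convergence property in condition (4); the particular vectors are arbitrary test data.

```python
import math

def kl_bregman(x, y):
    """Bregman distance D_h(x, y) for h(x) = sum_i x_i log x_i
    (negative entropy), with x in Omega = R^n_+ (0 log 0 := 0)
    and y in int Omega = R^n_++."""
    return sum(
        (xi * math.log(xi) if xi > 0 else 0.0)   # h(x) term
        - yi * math.log(yi)                       # h(y) term
        - (math.log(yi) + 1.0) * (xi - yi)        # <grad h(y), x - y>
        for xi, yi in zip(x, y)
    )

x = [0.2, 0.5, 0.3]
y = [0.3, 0.3, 0.4]
assert kl_bregman(x, x) < 1e-12   # D_h(x, x) = 0
assert kl_bregman(x, y) > 0.0     # strict convexity: D_h(x, y) > 0 for x != y

# Condition (4): if y^k -> y_inf in int Omega, then D_h(y_inf, y^k) -> 0.
y_inf = [0.3, 0.3, 0.4]
yk = [[0.3 + 0.1 / k, 0.3, 0.4 - 0.1 / k] for k in range(1, 200)]
dists = [kl_bregman(y_inf, v) for v in yk]
assert dists[-1] < dists[0] and dists[-1] < 1e-5
```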
Note that in the existing Bregman-based methods [9], the assumptions on $\{e^k\}$ are

$$\sum_{k=1}^{\infty} \|e^k\| < +\infty \quad \text{and} \quad \sum_{k=1}^{\infty} \langle e^k, x^k \rangle \ \text{exists and is finite},$$

which are rather strict and difficult to enforce in practice. In addition, conditions (3) and (4) in the definition of the Bregman function are not practical in the sense that they are difficult to check. One approach to overcoming these difficulties is to replace Bregman functions with other general functions. In particular, at the $k$th iteration, rather than the linear term $x^{k+1} - x^k$ or the Bregman term $\nabla h(x^{k+1}) - \nabla h(x^k)$, the original problem is regularized by $f(x^{k+1}) - f(x^k)$, where $f : \Omega \to \mathbb{R}^n$ satisfies:

(a) $f$ is $\mu$-strongly monotone uniformly on $\Omega$: $(x - x')^\top (f(x) - f(x')) \ge \mu \|x - x'\|^2$, $\forall x, x' \in \Omega$.
(b) $f$ is $L$-Lipschitz continuous on $\Omega$: $\|f(x) - f(x')\| \le L \|x - x'\|$, $\forall x, x' \in \Omega$.

Remark 1. Conditions (a) and (b) are stronger than conditions (1)–(6) for a Bregman function. However, they are easier to verify in practice. The general functions that satisfy conditions (a) and (b) include the gradients of certain Bregman functions (i.e., $f = \nabla h$). Note that in such a case the explicit information of the Bregman function $h$ is not required.

The theoretical framework of the new generalized APPA (denoted by $f$-APPA) consists of the following steps.

Algorithm ($f$-APPA): Let $x^k \in \Omega$, $\beta > 0$, $[\gamma_L, \gamma_U] \subset (0, 2)$ and let $f$ be a function satisfying (a) and (b). Choose $\beta_k \in [\beta, \infty)$ and $\gamma_k \in [\gamma_L, \gamma_U]$.

Step 1. Find $\tilde x^k \in \Omega$ and $e^k \in \mathbb{R}^n$ satisfying

$$0 \in \beta_k \hat T(\tilde x^k) + f(\tilde x^k) - f(x^k) - e^k, \tag{5}$$

under the inexact criterion

$$\|e^k\| \le \mu \nu_k \|x^k - \tilde x^k\| \quad \text{with } \sup_{k \ge 0} \{\nu_k\} = \nu \in (0, 1). \tag{6}$$
Step 2. Correct $\tilde x^k$:

$$x^{k+1} = P_\Omega[x^k - \alpha_k \beta_k y^k], \tag{7}$$

where

$$\alpha_k = \gamma_k \alpha_k^*, \quad \alpha_k^* = \frac{(x^k - \tilde x^k)^\top d^k}{\|d^k\|^2}, \quad d^k = f(x^k) - f(\tilde x^k) + e^k, \quad y^k \in T(\tilde x^k) \tag{8}$$

and $P_\Omega(\cdot)$ denotes the projection onto $\Omega$ under the Euclidean norm. Here $y^k$ satisfies

$$\beta_k y^k + f(\tilde x^k) - f(x^k) - e^k \in -N_\Omega(\tilde x^k). \tag{9}$$
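To make the two steps concrete, the following minimal sketch instantiates $f$-APPA in one dimension under simplifying assumptions that are illustrative choices, not from the paper: $T(x) = 2(x - 3)$ (the gradient of $(x-3)^2$, with unique zero $x^* = 3$), $\Omega = [0, 10]$, $f(x) = x$ (so $\mu = L = 1$), and the inner subproblem (5) solved exactly, i.e., $e^k = 0$. For this $T$ the subproblem is a resolvent available in closed form, and $\alpha_k^*$ is computed from (8).

```python
def project(x, lo=0.0, hi=10.0):
    """P_Omega for the box Omega = [lo, hi]."""
    return max(lo, min(hi, x))

def f_appa(x0, beta=1.0, gamma=1.0, iters=60):
    """1-D sketch of f-APPA for T(x) = 2*(x - 3), Omega = [0, 10],
    f(x) = x (mu = L = 1), exact inner solves (e^k = 0).
    Step 1 is the resolvent
        x_tilde = argmin_{x in Omega} beta*(x - 3)^2 + 0.5*(x - x^k)^2,
    i.e., the unconstrained minimizer clamped to Omega (valid in 1-D)."""
    x = x0
    for _ in range(iters):
        # Step 1: exact solution of the subproblem (5)
        x_tilde = project((2.0 * beta * 3.0 + x) / (2.0 * beta + 1.0))
        y = 2.0 * (x_tilde - 3.0)        # y^k in T(x_tilde^k), satisfies (9)
        d = x - x_tilde                  # d^k from (8): f = identity, e^k = 0
        if abs(d) < 1e-14:               # iterate already solves the problem
            break
        alpha_star = (x - x_tilde) * d / d**2    # optimal step size (8)
        # Step 2: correction (7)
        x = project(x - gamma * alpha_star * beta * y)
    return x

assert abs(f_appa(9.0) - 3.0) < 1e-6
```

With $f = \mathrm{id}$, $\gamma_k \equiv 1$ and $e^k = 0$ this collapses to the classical exact PPA; the algorithm's flexibility lies in allowing inexact inner solves under (6), a general $f$, and relaxation factors $\gamma_k \in (0, 2)$.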
Compared with the traditional hybrid APPA, this new algorithm uses the general function $f$ and adopts an optimal step size strategy that takes full advantage of the information in the existing iterates. From a computational point of view, this strategy may accelerate convergence [4,6], while the iterates are generated under the same relaxed restrictions as in the improved hybrid APPA [4,5] and the global convergence property is preserved.

Throughout, we make the following standard assumptions.

Assumption 1. $(\operatorname{int} \Omega) \cap (\text{Re-int}\, \operatorname{dom} T) \ne \emptyset$, where "Re-int $X$" denotes the relative interior of the set $X$ and "$\operatorname{dom} T$" denotes the domain of the operator $T$: $\operatorname{dom} T = \{x \mid \exists y : (x, y) \in T\} = \{x \mid T(x) \ne \emptyset\}$.

Assumption 2. The sequences $\{\tilde x^k\} \subset \Omega$ and $\{e^k\} \subset \mathbb{R}^n$ conforming to the recursions (5) and (6) exist.

Assumption 3. The zero-point set of $\hat T(\cdot)$, denoted by $\Omega^*$, is nonempty.

Remark 2. Assumption 2 can be satisfied under some simple conditions on $f$, such as those imposed on $\nabla h$ in [7–9,11]. If $f(x) = x$, the existence of $\{\tilde x^k\}$ and $\{e^k\}$ is assured by Theorem 2 of [12]. Also, for monotone variational inequalities, i.e., $T(\cdot) = F(\cdot)$ where $F(\cdot)$ is a continuous monotone mapping from $\mathbb{R}^n$ into itself, it is easy to prove that Assumption 2 is satisfied without any extra conditions.

2. Convergence

In this section, we analyze the convergence of the algorithm. A basic inequality for the projection mapping onto a closed convex set is

$$(P_\Omega(v) - w)^\top (v - P_\Omega(v)) \ge 0, \quad \forall v \in \mathbb{R}^n,\ w \in \Omega. \tag{10}$$
Consequently, we have

$$\|P_\Omega(v) - w\|^2 \le \|v - w\|^2 - \|v - P_\Omega(v)\|^2, \quad \forall v \in \mathbb{R}^n,\ w \in \Omega. \tag{11}$$
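The two projection inequalities can be sanity-checked numerically when $\Omega$ is a box, where $P_\Omega$ reduces to a componentwise clamp. A small randomized sketch (illustrative only; the box and sampling ranges are arbitrary):

```python
import random

def project(v, lo=-1.0, hi=1.0):
    """Euclidean projection onto the box Omega = [lo, hi]^n: clamp each entry."""
    return [max(lo, min(hi, vi)) for vi in v]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

random.seed(0)
for _ in range(1000):
    v = [random.uniform(-3, 3) for _ in range(4)]    # arbitrary v in R^n
    w = [random.uniform(-1, 1) for _ in range(4)]    # arbitrary w in Omega
    pv = project(v)
    # inequality (10): (P(v) - w)^T (v - P(v)) >= 0
    assert dot([p - wi for p, wi in zip(pv, w)],
               [vi - p for vi, p in zip(v, pv)]) >= -1e-12
    # inequality (11): ||P(v) - w||^2 <= ||v - w||^2 - ||v - P(v)||^2
    lhs = dot([p - wi for p, wi in zip(pv, w)], [p - wi for p, wi in zip(pv, w)])
    rhs = (dot([vi - wi for vi, wi in zip(v, w)], [vi - wi for vi, wi in zip(v, w)])
           - dot([vi - p for vi, p in zip(v, pv)], [vi - p for vi, p in zip(v, pv)]))
    assert lhs <= rhs + 1e-12
```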
Lemma 1. For any $\beta > 0$, $(x, y) \in N_\Omega$ if and only if $x = P_\Omega[x + \beta y]$.

Proof. See [13, p. 267]. □
It follows from the definition of $d^k$, conditions (a) and (b) on $f$ and the inexact criterion (6) that

$$(x^k - \tilde x^k)^\top d^k = (x^k - \tilde x^k)^\top [f(x^k) - f(\tilde x^k)] + (x^k - \tilde x^k)^\top e^k \ge \mu \|x^k - \tilde x^k\|^2 - \|x^k - \tilde x^k\| \|e^k\| \ge \mu (1 - \nu) \|x^k - \tilde x^k\|^2 \tag{12}$$

and

$$\|d^k\| \le \|f(x^k) - f(\tilde x^k)\| + \|e^k\| \le (L + \mu \nu) \|x^k - \tilde x^k\|. \tag{13}$$

In Lemma 2, we list a few inequalities associated with the recursion (5).
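As a quick numerical illustration of (12)–(13), consider a hypothetical one-dimensional example (not from the paper): $f(x) = 2x + \sin x$ has $f'(x) \in [1, 3]$, so condition (a) holds with $\mu = 1$ and condition (b) with $L = 3$. Sampling random pairs and errors that obey the criterion (6) with $\nu = 0.5$, both bounds can be verified directly:

```python
import math, random

# f(x) = 2x + sin(x): f'(x) in [1, 3], so mu = 1 and L = 3.
f = lambda x: 2.0 * x + math.sin(x)
mu, L, nu = 1.0, 3.0, 0.5

random.seed(1)
for _ in range(1000):
    xk = random.uniform(-5, 5)
    xt = random.uniform(-5, 5)                       # plays the role of x_tilde^k
    # pick an error e^k obeying the inexact criterion (6)
    ek = random.uniform(-1, 1) * mu * nu * abs(xk - xt)
    dk = f(xk) - f(xt) + ek                          # d^k as in (8)
    # inequality (12): (x^k - x_tilde^k) d^k >= mu (1 - nu) |x^k - x_tilde^k|^2
    assert (xk - xt) * dk >= mu * (1.0 - nu) * (xk - xt) ** 2 - 1e-9
    # inequality (13): |d^k| <= (L + mu nu) |x^k - x_tilde^k|
    assert abs(dk) <= (L + mu * nu) * abs(xk - xt) + 1e-9
```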
Lemma 2. For given $x^k \in \Omega$ and $\beta_k > 0$, let $\tilde x^k$ and $e^k$ conform to the set-valued equation (5) and the inexact criterion (6). Then for any $x^* \in \Omega^*$ and $y^k \in T(\tilde x^k)$, we have

$$(x^k - x^*)^\top \beta_k y^k \ge (x^k - \tilde x^k)^\top \beta_k y^k \ge \mu (1 - \nu) \|x^k - \tilde x^k\|^2. \tag{14}$$

Proof. Since $x^*$ is a zero point of $\hat T(\cdot)$, $y^k \in \hat T(\tilde x^k)$ (because $y^k \in T(\tilde x^k)$, $0 \in N_\Omega(\tilde x^k)$ and $\hat T = T + N_\Omega$) and $\hat T$ is monotone, we have $(\tilde x^k - x^*)^\top (y^k - 0) \ge 0$. Using $\beta_k > 0$, the first assertion is proved. Moreover, it follows from (9) and the definition of $N_\Omega$ that $\tilde x^k \in \Omega$ and

$$(x - \tilde x^k)^\top [\beta_k y^k + f(\tilde x^k) - f(x^k) - e^k] \ge 0, \quad \forall x \in \Omega. \tag{15}$$

Setting $x = x^k$ and using the definition of $d^k$, we get

$$(x^k - \tilde x^k)^\top \beta_k y^k \ge (x^k - \tilde x^k)^\top d^k \overset{(12)}{\ge} \mu (1 - \nu) \|x^k - \tilde x^k\|^2.$$

The proof is complete. □
Lemma 2 shows that $-y^k$ is a descent direction of $\|x - x^*\|^2 / 2$ at the point $x^k$. On the basis of this observation, it is natural to generate the new iterate $x^{k+1}$ via

$$x^{k+1} = P_\Omega[x^k - \alpha_k \beta_k y^k]. \tag{16}$$

If we define

$$\Theta_k := \|x^k - x^*\|^2 - \|x^{k+1} - x^*\|^2, \tag{17}$$

then it is reasonable to use $\Theta_k$ to measure the progress made at the $k$th iteration.

Theorem 1. Let $\{x^k\}$, $\{\tilde x^k\}$, $\{e^k\}$ and $\{\nu_k\}$ be sequences conforming to the set-valued equation (5) and the inexact criterion (6). Then the sequence $\{x^k\}$ generated by (7) satisfies

$$\|x^{k+1} - x^*\|^2 \le \|x^k - x^*\|^2 - \frac{\mu^2 (1 - \nu)^2 \gamma_L (2 - \gamma_U)}{(L + \mu \nu)^2} \|x^k - \tilde x^k\|^2. \tag{18}$$
Proof. First, it follows from (11) and $x^{k+1} = P_\Omega[x^k - \alpha_k \beta_k y^k]$ that

$$\|x^{k+1} - x^*\|^2 \le \|x^k - \alpha_k \beta_k y^k - x^*\|^2 - \|x^k - \alpha_k \beta_k y^k - x^{k+1}\|^2.$$

Substituting this into (17), we obtain

$$\Theta_k \ge 2 \alpha_k \beta_k (x^k - x^*)^\top y^k + \|x^k - x^{k+1}\|^2 - 2 \alpha_k \beta_k (x^k - x^{k+1})^\top y^k. \tag{19}$$

Then substituting (14) into (19), we get

$$\begin{aligned}
\Theta_k &\ge 2 \alpha_k \beta_k (x^k - \tilde x^k)^\top y^k + \|x^k - x^{k+1}\|^2 - 2 \alpha_k \beta_k (x^k - x^{k+1})^\top y^k \\
&= 2 \alpha_k \beta_k (x^k - \tilde x^k)^\top y^k - \alpha_k^2 \|d^k\|^2 + \|(x^k - x^{k+1}) - \alpha_k d^k\|^2 + 2 \alpha_k (x^k - x^{k+1})^\top (d^k - \beta_k y^k).
\end{aligned} \tag{20}$$

Now we consider the last term on the right-hand side of (20). From (8) and (9), using Lemma 1 we know that $\tilde x^k = P_\Omega[d^k - \beta_k y^k + \tilde x^k]$. Using $v := d^k - \beta_k y^k + \tilde x^k$, $\tilde x^k = P_\Omega(v)$ and $w := x^{k+1}$ in the basic projection inequality (10), we get $(\tilde x^k - x^{k+1})^\top (d^k - \beta_k y^k) \ge 0$. Obviously, we then have

$$(x^k - x^{k+1})^\top (d^k - \beta_k y^k) \ge (x^k - \tilde x^k)^\top (d^k - \beta_k y^k) \tag{21}$$
and thus from (20) we obtain

$$\Theta_k \ge 2 \alpha_k \beta_k (x^k - \tilde x^k)^\top y^k - \alpha_k^2 \|d^k\|^2 + 2 \alpha_k (x^k - \tilde x^k)^\top (d^k - \beta_k y^k) = 2 \alpha_k (x^k - \tilde x^k)^\top d^k - \alpha_k^2 \|d^k\|^2. \tag{22}$$

Using (8), (12) and (13), we have

$$\alpha_k^* \ge \frac{\mu (1 - \nu) \|x^k - \tilde x^k\|^2}{(L + \mu \nu)^2 \|x^k - \tilde x^k\|^2} = \frac{\mu (1 - \nu)}{(L + \mu \nu)^2}. \tag{23}$$

From (22) and (8), we obtain

$$\Theta_k \ge 2 \gamma_k \alpha_k^* (x^k - \tilde x^k)^\top d^k - (\gamma_k \alpha_k^*)^2 \|d^k\|^2 = 2 \gamma_k \alpha_k^* (x^k - \tilde x^k)^\top d^k - (\gamma_k^2 \alpha_k^*) (x^k - \tilde x^k)^\top d^k \ge \alpha_k^* \gamma_L (2 - \gamma_U) (x^k - \tilde x^k)^\top d^k.$$

Then it follows from (17) and (12) that

$$\|x^{k+1} - x^*\|^2 \le \|x^k - x^*\|^2 - \alpha_k^* \gamma_L (2 - \gamma_U) (x^k - \tilde x^k)^\top d^k \le \|x^k - x^*\|^2 - \alpha_k^* \gamma_L (2 - \gamma_U) \mu (1 - \nu) \|x^k - \tilde x^k\|^2.$$

The assertion follows from (23) immediately. □

From the inequality (18), we can see that $\{x^k\}$ is a bounded sequence and

$$\lim_{k \to \infty} \|x^k - \tilde x^k\| = 0. \tag{24}$$
Consequently, $\{\tilde x^k\}$ is also bounded. In the following theorem, we prove the convergence of the proposed method.

Theorem 2. Let $\{x^k\}$, $\{\tilde x^k\}$, $\{e^k\}$ and $\{\nu_k\}$ be sequences conforming to the set-valued equation (5) and the inexact criterion (6). Then the sequence $\{x^k\}$ generated by (7) converges to a zero point of $\hat T(\cdot)$.

Proof. Since $\{\tilde x^k\}$ is bounded, it has at least one cluster point. Let $x^\infty$ be a cluster point of $\{\tilde x^k\}$ and let the subsequence $\{\tilde x^{k_j}\}$ converge to $x^\infty$. If we define

$$z^k = \frac{1}{\beta_k} [f(x^k) - f(\tilde x^k) + e^k],$$

then $z^{k_j} \in \hat T(\tilde x^{k_j})$ (according to (5)). Using $x^k - \tilde x^k \to 0$, $e^k \to 0$, $\inf_k \beta_k = \beta > 0$ and the $L$-Lipschitz continuity of $f(\cdot)$, we obtain

$$\lim_{j \to \infty} z^{k_j} = \lim_{j \to \infty} \frac{1}{\beta_{k_j}} [f(x^{k_j}) - f(\tilde x^{k_j}) + e^{k_j}] = 0.$$

Because $\hat T$ is maximal, it is a closed set in $\mathbb{R}^n \times \mathbb{R}^n$; then

$$\lim_{j \to \infty} (\tilde x^{k_j}, z^{k_j}) = (x^\infty, 0) \in \hat T,$$

that is, $x^\infty$ is a zero point of $\hat T(\cdot)$. Note that the inequality (18) holds for all zero points of $\hat T(\cdot)$, and hence we have

$$\|x^{k+1} - x^\infty\| \le \|x^k - x^\infty\|, \quad \forall k \ge 0. \tag{25}$$

Since $\{\tilde x^{k_j}\} \to x^\infty$ and $x^k - \tilde x^k \to 0$, for any given $\varepsilon > 0$ there exists an $l > 0$ such that

$$\|\tilde x^{k_l} - x^\infty\| < \varepsilon / 2 \quad \text{and} \quad \|x^{k_l} - \tilde x^{k_l}\| < \varepsilon / 2. \tag{26}$$

Therefore, for any $k \ge k_l$, it follows from (25)–(26) that

$$\|x^k - x^\infty\| \le \|x^{k_l} - x^\infty\| \le \|x^{k_l} - \tilde x^{k_l}\| + \|\tilde x^{k_l} - x^\infty\| < \varepsilon,$$

and thus the sequence $\{x^k\}$ converges to $x^\infty$, which is a zero point of $\hat T(\cdot)$. □
3. Conclusions

To relax the restrictions on the error sequence $\{e^k\}$, the conditions on the Bregman function used in some generalized APPA are replaced by new conditions which, although somewhat stronger, should be easier to verify in practice. An extragradient step with optimal step sizes is also used to guarantee global convergence of the new algorithm.

Acknowledgments

This research was supported by NSFC Grant 10571083 and MOEC Grant 20060284001. The author wishes to thank the anonymous referees for their valuable comments and suggestions.

References

[1] B. Martinet, Régularisation d'inéquations variationnelles par approximations successives, Rev. Fr. Inform. Rech. Opér. 4 (1970) 154–159.
[2] R.T. Rockafellar, Monotone operators and the proximal point algorithm, SIAM J. Control Optim. 14 (5) (1976) 877–898.
[3] M. Teboulle, Convergence of proximal-like algorithms, SIAM J. Optim. 7 (4) (1997) 1069–1083.
[4] B.S. He, A new approximate proximal point algorithm for maximal monotone operator, Sci. China Ser. A 46 (2) (2003) 200–206.
[5] M.V. Solodov, B.F. Svaiter, A hybrid approximate extragradient-proximal point algorithm using the enlargement of a maximal monotone operator, Set-Valued Anal. 7 (4) (1999) 323–345.
[6] M.V. Solodov, B.F. Svaiter, A hybrid projection-proximal point algorithm, J. Convex Anal. 6 (1) (1999) 59–70.
[7] R.S. Burachik, A.N. Iusem, A generalized proximal point algorithm for the variational inequality problem in a Hilbert space, SIAM J. Optim. 8 (1) (1998) 197–216.
[8] J. Eckstein, Nonlinear proximal point algorithms using Bregman functions, with applications to convex programming, Math. Oper. Res. 18 (1) (1993) 202–226.
[9] J. Eckstein, Approximate iterations in Bregman-function-based proximal algorithms, Math. Program. 83 (1) (1998) 113–123.
[10] H.H. Bauschke, J.M. Borwein, Legendre functions and the method of random Bregman projections, J. Convex Anal. 4 (1997) 27–67.
[11] Y. Censor, S.A. Zenios, Proximal minimization algorithm with D-functions, J. Optim. Theory Appl. 73 (3) (1992) 451–464.
[12] J. Eckstein, D.P. Bertsekas, On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators, Math. Program. 55 (3) (1992) 293–318.
[13] D.P. Bertsekas, J.N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods, Prentice-Hall, Englewood Cliffs, NJ, 1989.