Applied Mathematics and Computation 228 (2014) 258–263
A note on the convergence of alternating proximal gradient method

Miantao Chao, Caozong Cheng

Department of Mathematics, Beijing University of Technology, 100124 Beijing, China
Keywords: Alternating proximal gradient method; Alternating direction method of multipliers; Strongly convex functions; Global convergence
Abstract. We consider a class of linearly constrained separable convex programming problems whose objective functions are the sum of $m$ convex functions without coupled variables. The alternating proximal gradient method is an effective method for the case $m = 2$, but it is unknown whether its convergence can be extended to the general case $m \ge 3$. This note shows the global convergence of this extension when the involved functions are strongly convex.
1. Introduction

We consider the following convex minimization model with linear constraints and a separable objective function:

$$\min \sum_{i=1}^{m} f_i(x_i) \quad \text{s.t.} \quad \sum_{i=1}^{m} A_i x_i = b, \qquad x_i \in \mathcal{X}_i, \quad i = 1, \ldots, m, \tag{1}$$

where $f_i : \mathbb{R}^{n_i} \to \mathbb{R}$ $(i = 1, \ldots, m)$ are closed proper functions, $\mathcal{X}_i \subseteq \mathbb{R}^{n_i}$ $(i = 1, \ldots, m)$ are closed convex sets, $A_i \in \mathbb{R}^{l \times n_i}$ $(i = 1, \ldots, m)$ are given matrices, and $b \in \mathbb{R}^l$ is a given vector. Throughout, the solution set of (1) is assumed to be nonempty. Our discussion focuses on the particular case of (1) where $m \ge 3$.

A fundamental method for solving (1) is the alternating direction method of multipliers (ADMM), which was originally presented in [1,2]. The standard ADMM iterative scheme is
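As a concrete toy instance of (1) (our illustration, not an example from the paper), one may take $m = 3$, $A_i = I$, $\mathcal{X}_i = \mathbb{R}^n$, and strongly convex quadratics:

$$\min_{x_1, x_2, x_3} \sum_{i=1}^{3} \frac{\alpha_i}{2}\|x_i - c_i\|^2 \quad \text{s.t.} \quad x_1 + x_2 + x_3 = b,$$

with $\alpha_i > 0$. This instance satisfies every assumption used later (each $f_i$ is strongly convex with modulus $\alpha_i$) and is reused in the numerical sketches below.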
$$\begin{cases}
x_1^{k+1} \in \arg\min\limits_{x_1 \in \mathcal{X}_1} \left\{ f_1(x_1) + \dfrac{1}{2\mu}\left\| A_1 x_1 + \sum_{j=2}^{m} A_j x_j^k - b - \mu\lambda^k \right\|^2 \right\},\\[1mm]
\qquad \vdots\\[1mm]
x_i^{k+1} \in \arg\min\limits_{x_i \in \mathcal{X}_i} \left\{ f_i(x_i) + \dfrac{1}{2\mu}\left\| \sum_{j=1}^{i-1} A_j x_j^{k+1} + A_i x_i + \sum_{j=i+1}^{m} A_j x_j^k - b - \mu\lambda^k \right\|^2 \right\},\\[1mm]
\qquad \vdots\\[1mm]
x_m^{k+1} \in \arg\min\limits_{x_m \in \mathcal{X}_m} \left\{ f_m(x_m) + \dfrac{1}{2\mu}\left\| \sum_{j=1}^{m-1} A_j x_j^{k+1} + A_m x_m - b - \mu\lambda^k \right\|^2 \right\},\\[1mm]
\lambda^{k+1} = \lambda^k - \dfrac{1}{\mu}\left( \sum_{j=1}^{m} A_j x_j^{k+1} - b \right).
\end{cases} \tag{2}$$
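To make the scheme concrete, here is a minimal numerical sketch of (2) (ours, not from the paper) on the toy quadratic instance introduced after (1), where every subproblem has a closed-form solution. The names `alpha`, `c`, `mu` and all parameter values are illustrative assumptions.

```python
import numpy as np

# Sketch of the ADMM scheme (2) for min sum_i (alpha_i/2)||x_i - c_i||^2
# s.t. x_1 + ... + x_m = b (so A_i = I, X_i = R^n). Illustrative only:
# for m >= 3 the convergence of (2) is not guaranteed in general, which
# is precisely the issue discussed in this note.
rng = np.random.default_rng(0)
m, n = 3, 5
alpha = np.array([1.0, 2.0, 3.0])            # strong convexity moduli of f_i
c = [rng.standard_normal(n) for _ in range(m)]
b = rng.standard_normal(n)
mu = 1.0                                      # penalty parameter mu > 0

x = [np.zeros(n) for _ in range(m)]
lam = np.zeros(n)
for k in range(500):
    for i in range(m):
        # Gauss-Seidel residual: new blocks j < i, old blocks j > i
        r = sum(x[j] for j in range(m) if j != i) - b
        # closed-form argmin of f_i(x_i) + (1/(2 mu))||x_i + r - mu*lam||^2
        x[i] = (alpha[i] * c[i] + lam - r / mu) / (alpha[i] + 1.0 / mu)
    lam -= (sum(x) - b) / mu                  # multiplier update in (2)

print(np.linalg.norm(sum(x) - b))             # feasibility residual
```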
The ADMM has been well studied in the literature for the special case $m = 2$. Without further assumptions, the convergence of (2) for the general case $m \ge 3$ remains open, even though its efficiency has been verified empirically in [7,8]. In [3], the convergence of (2) is shown under the condition that all $f_i$ are strongly convex. When each convex function $f_i$ in (1) has a particular structure and the step size $1/\mu$ in the update of $\lambda^{k+1}$ is sufficiently small to fulfill a certain error bound, the resulting scheme is convergent (see [5]). He et al. [4] showed that the resulting sequence is convergent if the output of (2) is further corrected by a substitution procedure.

Based on the framework of the ADMM (2), Ma [6] proposed an alternating proximal gradient method (APGM) for problem (1) in the special case $m = 2$, as follows:
$$\begin{cases}
x_1^{k+1} = \arg\min\limits_{x_1 \in \mathcal{X}_1} \left\{ f_1(x_1) + \dfrac{1}{2\mu s_1}\left\| x_1 - \left[ x_1^k - s_1 A_1^T\left( A_1 x_1^k + A_2 x_2^k - b - \mu\lambda^k \right) \right] \right\|^2 \right\},\\[1mm]
x_2^{k+1} = \arg\min\limits_{x_2 \in \mathcal{X}_2} \left\{ f_2(x_2) + \dfrac{1}{2\mu s_2}\left\| x_2 - \left[ x_2^k - s_2 A_2^T\left( A_1 x_1^{k+1} + A_2 x_2^k - b - \mu\lambda^k \right) \right] \right\|^2 \right\},\\[1mm]
\lambda^{k+1} = \lambda^k - \dfrac{1}{\mu}\left( A_1 x_1^{k+1} + A_2 x_2^{k+1} - b \right),
\end{cases} \tag{3}$$
where $s_1$ and $s_2$ are the step sizes for the proximal gradient steps. It is easy to see that the subproblems in the APGM (3) are easier to solve than those in the ADMM (2). A natural idea for solving (1) is to extend the APGM (3) from the special case $m = 2$ to the general case $m \ge 3$. This yields the following scheme for (1) when $m \ge 3$:
8 8 " !#2 9 = < m > X > > T k kþ1 1 k k > x1 ¼ arg min f1 ðx1 Þ þ 2ls1 x1 x1 s1 A1 A j xj b l k ; > > x 2X > ; : 1 1 > j¼1 > > > > > 8 > > " !#2 9 > = i1 m > < X X > > kþ1 T k kþ1 1 k k > ¼ arg min f ðx Þ þ x x s A A x þ A x b l k x > ; i i i i j j i i j j > 2lsi < i xi 2X i : ; j¼1 j¼i > 8 > " !#2 9 > > = m1 < > X > > T k kþ1 kþ1 1 k k > ¼ arg min fm ðxm Þ þ 2ls x x s A A x þ A x b l k xm ; > m m m j m m m j > m xm 2X m : > ; > j¼1 > > > ! > > m > X > kþ1 > > ¼ kk l1 Aj xkþ1 b ; >k j :
ð4Þ
j¼1
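The following sketch (ours, under the same toy assumptions as before, not the authors' code) runs (4) on the quadratic instance with $A_i = I$; with $m = 2$ it reduces to (3). Each subproblem is an exact proximal step with a closed form; `alpha`, `c`, `s`, and the choice of `mu` (taken just above the bound of Theorem 3 below) are illustrative.

```python
import numpy as np

# Sketch of the extended APGM (4) for min sum_i (alpha_i/2)||x_i - c_i||^2
# s.t. x_1 + ... + x_m = b, with A_i = I and X_i = R^n.
rng = np.random.default_rng(0)
m, n = 3, 5
alpha = np.array([1.0, 2.0, 3.0])              # moduli eta_i of strong convexity
c = [rng.standard_normal(n) for _ in range(m)]
b = rng.standard_normal(n)
# ||A_i|| = 1 for A_i = I, so Theorem 3 suggests mu > (3m+1)/min_i eta_i
mu = 1.1 * (3 * m + 1) / alpha.min()
s = np.ones(m)                                  # proximal step sizes s_i > 0

x = [np.zeros(n) for _ in range(m)]
lam = np.zeros(n)
for k in range(2000):
    x_old = [xi.copy() for xi in x]
    for i in range(m):
        # residual uses new blocks j < i and old blocks j >= i, as in (4)
        r = sum(x[j] for j in range(i)) + sum(x_old[j] for j in range(i, m)) - b
        v = x_old[i] - s[i] * (r - mu * lam)    # gradient step on the quadratic
        # prox step: argmin f_i(x) + (1/(2 mu s_i))||x - v||^2, in closed form
        x[i] = (alpha[i] * c[i] + v / (mu * s[i])) / (alpha[i] + 1.0 / (mu * s[i]))
    lam -= (sum(x) - b) / mu                    # multiplier update in (4)

print(np.linalg.norm(sum(x) - b))               # feasibility residual
```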
2. Preliminaries

Let $\|\cdot\|$ denote the Euclidean norm. For any positive definite matrix $M$, we write $\|\cdot\|_M$ for the $M$-norm. If $M$ is the product of a positive parameter $\beta$ and the identity matrix $I$, i.e., $M = \beta I$, we use the simpler notation $\|\cdot\|_M = \|\cdot\|_\beta$. Let $f : \mathbb{R}^n \to (-\infty, +\infty]$. If the domain of $f$, denoted by $\operatorname{dom} f := \{x \in \mathbb{R}^n \mid f(x) < +\infty\}$, is not empty, $f$ is said to be proper. We say that $f$ is convex if
$$f(tx + (1-t)y) \le t f(x) + (1-t) f(y), \qquad \forall x, y \in \mathbb{R}^n,\ \forall t \in [0, 1].$$
Furthermore, $f$ is said to be strongly convex with modulus $\eta > 0$ if
$$f(tx + (1-t)y) \le t f(x) + (1-t) f(y) - \frac{1}{2}\eta\, t(1-t)\|x - y\|^2, \qquad \forall x, y \in \mathbb{R}^n,\ \forall t \in [0, 1].$$
For a convex function $f$, the subdifferential of $f$ is the set-valued operator defined by

$$\partial f(x) := \left\{ \xi \in \mathbb{R}^n \mid f(y) \ge f(x) + \langle y - x, \xi \rangle,\ \forall y \in \operatorname{dom} f \right\}.$$
If a proper function $f$ is strongly convex with modulus $\eta > 0$, then its subdifferential is strongly monotone:

$$\langle \xi_1 - \xi_2, x_1 - x_2 \rangle \ge \frac{1}{2}\eta\|x_1 - x_2\|^2, \qquad \forall \xi_1 \in \partial f(x_1),\ \forall \xi_2 \in \partial f(x_2).$$
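As a simple illustration (ours, not from the paper): for $f(x) = \frac{\eta}{2}\|x\|^2$, the strong convexity inequality above holds with equality, $\partial f(x) = \{\eta x\}$, and

$$\langle \eta x_1 - \eta x_2, x_1 - x_2 \rangle = \eta\|x_1 - x_2\|^2 \ge \frac{1}{2}\eta\|x_1 - x_2\|^2,$$

so the monotonicity bound holds with room to spare (for this particular $f$ the sharp constant is $\eta$).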
The Lagrange function of (1) is given by

$$L(x_1, x_2, \ldots, x_m, \lambda) = \sum_{i=1}^{m} f_i(x_i) - \lambda^T\left( \sum_{i=1}^{m} A_i x_i - b \right), \tag{5}$$
and it is defined on the set

$$\mathcal{W} := \mathcal{X}_1 \times \mathcal{X}_2 \times \cdots \times \mathcal{X}_m \times \mathbb{R}^l.$$

Let $(x_1^*, x_2^*, \ldots, x_m^*, \lambda^*)$ be a saddle point of the Lagrange function (5). Then, for any $\lambda \in \mathbb{R}^l$ and $x_i \in \mathcal{X}_i$ $(i = 1, 2, \ldots, m)$, we have

$$L(x_1^*, x_2^*, \ldots, x_m^*, \lambda) \le L(x_1^*, x_2^*, \ldots, x_m^*, \lambda^*) \le L(x_1, x_2, \ldots, x_m, \lambda^*).$$

Finding a saddle point of $L(x_1, x_2, \ldots, x_m, \lambda)$ is equivalent to finding $w^* = (x_1^*, x_2^*, \ldots, x_m^*, \lambda^*) \in \mathcal{W}$ and $\xi_i^* \in \partial f_i(x_i^*)$ $(i = 1, 2, \ldots, m)$ such that

$$\langle x_i - x_i^*, \xi_i^* - A_i^T \lambda^* \rangle \ge 0 \quad \text{and} \quad \left\langle \lambda - \lambda^*, \sum_{i=1}^{m} A_i x_i^* - b \right\rangle \ge 0, \qquad \forall (x_1, x_2, \ldots, x_m, \lambda) \in \mathcal{W}. \tag{6}$$
Finally, we define two auxiliary block-diagonal matrices $\widehat{H}$ and $H$:

$$\widehat{H} = \operatorname{diag}\left( \frac{1}{\mu s_1} I + \frac{m}{\mu} A_1^T A_1,\ \frac{1}{\mu s_2} I + \frac{m}{\mu} A_2^T A_2,\ \ldots,\ \frac{1}{\mu s_m} I + \frac{m}{\mu} A_m^T A_m \right),$$

$$H = \operatorname{diag}\left( \frac{1}{\mu s_1} I + \frac{m}{\mu} A_1^T A_1,\ \frac{1}{\mu s_2} I + \frac{m}{\mu} A_2^T A_2,\ \ldots,\ \frac{1}{\mu s_m} I + \frac{m}{\mu} A_m^T A_m,\ \mu I \right),$$

where $I$ denotes identity matrices of appropriate dimensions. Since $\mu > 0$ and $s_i > 0$ $(i = 1, 2, \ldots, m)$, both $\widehat{H}$ and $H$ are positive definite matrices.
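For instance, one can assemble $\widehat{H}$ and $H$ numerically and confirm positive definiteness. The sketch below is ours; the sizes and data are arbitrary illustrative choices.

```python
import numpy as np
from scipy.linalg import block_diag

# Build the block-diagonal matrices H_hat and H and check positive
# definiteness numerically (illustrative sizes, not from the paper).
rng = np.random.default_rng(1)
m, l = 3, 4
n = [2, 3, 2]                              # block dimensions n_i
A = [rng.standard_normal((l, ni)) for ni in n]
mu, s = 2.0, [0.5, 1.0, 1.5]               # mu > 0 and s_i > 0

blocks = [np.eye(n[i]) / (mu * s[i]) + (m / mu) * A[i].T @ A[i]
          for i in range(m)]
H_hat = block_diag(*blocks)
H = block_diag(*blocks, mu * np.eye(l))    # H appends the mu*I block for lambda

print(np.linalg.eigvalsh(H_hat).min() > 0,
      np.linalg.eigvalsh(H).min() > 0)     # True True: both positive definite
```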
3. Convergence analysis

In this section, we prove the convergence of the extended APGM (4) for solving (1) under the strong convexity assumption on the $f_i$'s. Assume that each $f_i$ is strongly convex with modulus $\eta_i > 0$ $(i = 1, 2, \ldots, m)$. Let $\mathcal{X}^*$ denote the solution set of (1), $\mathcal{W}^*$ denote the solution set of (6), and let the sequence $\{w^k\}$ be generated by (4), where $w^k = (x_1^k, x_2^k, \ldots, x_m^k, \lambda^k)$. We first prove the following lemma.

Lemma 1. Let $(x_1^*, x_2^*, \ldots, x_m^*) \in \mathcal{X}^*$ and let $\lambda^*$ be a corresponding Lagrange multiplier associated with the linear constraint. Then
$$\begin{aligned}
\left\langle \lambda^k - \lambda^*, \sum_{i=1}^{m} A_i x_i^{k+1} - b \right\rangle \ge{}& \frac{1}{2}\sum_{i=1}^{m} \eta_i \left\| x_i^{k+1} - x_i^* \right\|^2 + \sum_{i=1}^{m} \frac{1}{\mu s_i}\left\langle x_i^{k+1} - x_i^*, x_i^{k+1} - x_i^k \right\rangle + \frac{1}{\mu}\left\| \sum_{i=1}^{m} A_i x_i^{k+1} - b \right\|^2 \\
&+ \frac{1}{\mu}\sum_{i=1}^{m}\left\langle A_i x_i^{k+1} - A_i x_i^*, \sum_{j=i}^{m}\left( A_j x_j^k - A_j x_j^{k+1} \right) \right\rangle.
\end{aligned} \tag{7}$$
Proof. By invoking the first-order optimality condition for the $x_i^{k+1}$-related subproblem in (4), there exists $\xi_i^{k+1} \in \partial f_i(x_i^{k+1})$ such that

$$\left\langle x_i - x_i^{k+1},\ \xi_i^{k+1} + \frac{1}{\mu s_i}\left[ x_i^{k+1} - x_i^k + s_i A_i^T\left( \sum_{j=1}^{i-1} A_j x_j^{k+1} + \sum_{j=i}^{m} A_j x_j^k - b - \mu\lambda^k \right) \right] \right\rangle \ge 0, \qquad \forall x_i \in \mathcal{X}_i. \tag{8}$$

Taking $x_i := x_i^*$ and $x_i := x_i^{k+1}$ in (8) and (6), respectively, we obtain

$$\left\langle x_i^* - x_i^{k+1},\ \xi_i^{k+1} + \frac{1}{\mu s_i}\left( x_i^{k+1} - x_i^k \right) + A_i^T\left[ \frac{1}{\mu}\left( \sum_{j=1}^{i-1} A_j x_j^{k+1} + \sum_{j=i}^{m} A_j x_j^k - b \right) - \lambda^k \right] \right\rangle \ge 0 \tag{9}$$

and

$$\left\langle x_i^{k+1} - x_i^*,\ \xi_i^* - A_i^T \lambda^* \right\rangle \ge 0. \tag{10}$$
Adding (9) and (10), we get

$$\left\langle x_i^{k+1} - x_i^*,\ \left( \xi_i^* - \xi_i^{k+1} \right) - A_i^T\left( \lambda^* - \lambda^k \right) - \frac{1}{\mu s_i}\left( x_i^{k+1} - x_i^k \right) - \frac{1}{\mu} A_i^T\left( \sum_{j=1}^{i-1} A_j x_j^{k+1} + \sum_{j=i}^{m} A_j x_j^k - b \right) \right\rangle \ge 0.$$
According to the strong convexity of $f_i$, it follows that

$$\begin{aligned}
\left\langle A_i\left( x_i^{k+1} - x_i^* \right), \lambda^k - \lambda^* \right\rangle &\ge \left\langle x_i^{k+1} - x_i^*, \xi_i^{k+1} - \xi_i^* \right\rangle + \frac{1}{\mu s_i}\left\langle x_i^{k+1} - x_i^*, x_i^{k+1} - x_i^k \right\rangle + \frac{1}{\mu}\left\langle A_i\left( x_i^{k+1} - x_i^* \right), \sum_{j=1}^{i-1} A_j x_j^{k+1} + \sum_{j=i}^{m} A_j x_j^k - b \right\rangle \\
&\ge \frac{1}{2}\eta_i\left\| x_i^{k+1} - x_i^* \right\|^2 + \frac{1}{\mu s_i}\left\langle x_i^{k+1} - x_i^*, x_i^{k+1} - x_i^k \right\rangle + \frac{1}{\mu}\left\langle A_i\left( x_i^{k+1} - x_i^* \right), \sum_{j=1}^{i-1} A_j x_j^{k+1} + \sum_{j=i}^{m} A_j x_j^k - b \right\rangle.
\end{aligned}$$
Summing the above inequality over all $i = 1, 2, \ldots, m$, using the equality $\sum_{i=1}^{m} A_i x_i^* = b$ together with the decomposition $\sum_{j=1}^{i-1} A_j x_j^{k+1} + \sum_{j=i}^{m} A_j x_j^k - b = \left( \sum_{j=1}^{m} A_j x_j^{k+1} - b \right) + \sum_{j=i}^{m}\left( A_j x_j^k - A_j x_j^{k+1} \right)$, we have

$$\begin{aligned}
\sum_{i=1}^{m}\left\langle A_i\left( x_i^{k+1} - x_i^* \right), \lambda^k - \lambda^* \right\rangle &= \left\langle \sum_{i=1}^{m} A_i x_i^{k+1} - b, \lambda^k - \lambda^* \right\rangle \\
&\ge \frac{1}{2}\sum_{i=1}^{m} \eta_i\left\| x_i^{k+1} - x_i^* \right\|^2 + \sum_{i=1}^{m}\frac{1}{\mu s_i}\left\langle x_i^{k+1} - x_i^*, x_i^{k+1} - x_i^k \right\rangle + \frac{1}{\mu}\left\| \sum_{i=1}^{m} A_i x_i^{k+1} - b \right\|^2 \\
&\quad + \frac{1}{\mu}\sum_{i=1}^{m}\left\langle A_i x_i^{k+1} - A_i x_i^*, \sum_{j=i}^{m}\left( A_j x_j^k - A_j x_j^{k+1} \right) \right\rangle.
\end{aligned}$$
The proof is completed. □

Next, we find an upper bound for the quantity $\|w^{k+1} - w^*\|_H^2 - \|w^k - w^*\|_H^2$, which measures the progress made by the new iterate $w^{k+1}$. With the help of Lemma 1, we can get the lemma below.

Lemma 2. Let $w^* = (x_1^*, x_2^*, \ldots, x_m^*, \lambda^*) \in \mathcal{W}^*$. Then
$$\left\| w^{k+1} - w^* \right\|_H^2 \le \left\| w^k - w^* \right\|_H^2 - \sum_{i=1}^{m} c_i\left\| x_i^{k+1} - x_i^* \right\|^2 - \frac{1}{\mu}\left\| \sum_{i=1}^{m} A_i x_i^{k+1} - b \right\|^2, \tag{11}$$

where $c_i = \eta_i - \frac{3m+1}{\mu}\|A_i\|^2$.
Proof. It follows from the last equality in (4) and (7) that
$$\begin{aligned}
\left\| \lambda^{k+1} - \lambda^* \right\|_\mu^2 &= \mu\left\| \lambda^k - \lambda^* - \frac{1}{\mu}\left( \sum_{i=1}^{m} A_i x_i^{k+1} - b \right) \right\|^2 = \left\| \lambda^k - \lambda^* \right\|_\mu^2 - 2\left\langle \lambda^k - \lambda^*, \sum_{i=1}^{m} A_i x_i^{k+1} - b \right\rangle + \frac{1}{\mu}\left\| \sum_{i=1}^{m} A_i x_i^{k+1} - b \right\|^2 \\
&\le \left\| \lambda^k - \lambda^* \right\|_\mu^2 - \sum_{i=1}^{m} \eta_i\left\| x_i^{k+1} - x_i^* \right\|^2 - \sum_{i=1}^{m}\frac{2}{\mu s_i}\left\langle x_i^{k+1} - x_i^*, x_i^{k+1} - x_i^k \right\rangle - \frac{1}{\mu}\left\| \sum_{i=1}^{m} A_i x_i^{k+1} - b \right\|^2 \\
&\quad - \frac{2}{\mu}\sum_{i=1}^{m}\left\langle A_i x_i^{k+1} - A_i x_i^*, \sum_{j=i}^{m}\left( A_j x_j^k - A_j x_j^{k+1} \right) \right\rangle.
\end{aligned} \tag{12}$$
On the other hand, for any i,
$$\begin{aligned}
-\frac{2}{\mu s_i}\left\langle x_i^{k+1} - x_i^*, x_i^{k+1} - x_i^k \right\rangle &= -\frac{2}{\mu s_i}\left\langle x_i^{k+1} - x_i^*, x_i^{k+1} - x_i^* \right\rangle - \frac{2}{\mu s_i}\left\langle x_i^{k+1} - x_i^*, x_i^* - x_i^k \right\rangle \\
&\le -\frac{2}{\mu s_i}\left\| x_i^{k+1} - x_i^* \right\|^2 + \frac{2}{\mu s_i}\left| \left\langle x_i^{k+1} - x_i^*, x_i^* - x_i^k \right\rangle \right| \le -\frac{1}{\mu s_i}\left\| x_i^{k+1} - x_i^* \right\|^2 + \frac{1}{\mu s_i}\left\| x_i^k - x_i^* \right\|^2,
\end{aligned} \tag{13}$$

where the last step uses the elementary inequality $2|\langle a, b \rangle| \le \|a\|^2 + \|b\|^2$,
and
$$\begin{aligned}
-2\left\langle A_i x_i^{k+1} - A_i x_i^*, \sum_{j=i}^{m}\left( A_j x_j^k - A_j x_j^{k+1} \right) \right\rangle &= -2\left\langle A_i\left( x_i^{k+1} - x_i^* \right), \sum_{j=i}^{m}\left( A_j x_j^k - A_j x_j^* \right) \right\rangle - 2\left\langle A_i\left( x_i^{k+1} - x_i^* \right), \sum_{j=i}^{m}\left( A_j x_j^* - A_j x_j^{k+1} \right) \right\rangle \\
&\le 2\sum_{j=i}^{m}\left| \left\langle A_i\left( x_i^{k+1} - x_i^* \right), A_j\left( x_j^k - x_j^* \right) \right\rangle \right| + 2\sum_{j=i}^{m}\left| \left\langle A_i\left( x_i^{k+1} - x_i^* \right), A_j\left( x_j^{k+1} - x_j^* \right) \right\rangle \right| \\
&\le 2(m - i + 1)\left\| A_i\left( x_i^{k+1} - x_i^* \right) \right\|^2 + \sum_{j=i}^{m}\left\| A_j\left( x_j^{k+1} - x_j^* \right) \right\|^2 + \sum_{j=i}^{m}\left\| A_j\left( x_j^k - x_j^* \right) \right\|^2.
\end{aligned} \tag{14}$$
Summing up (13) for all i’s, we obtain
$$-2\sum_{i=1}^{m}\frac{1}{\mu s_i}\left\langle x_i^{k+1} - x_i^*, x_i^{k+1} - x_i^k \right\rangle \le -\sum_{i=1}^{m}\frac{1}{\mu s_i}\left\| x_i^{k+1} - x_i^* \right\|^2 + \sum_{i=1}^{m}\frac{1}{\mu s_i}\left\| x_i^k - x_i^* \right\|^2. \tag{15}$$
Since

$$\sum_{i=1}^{m}\sum_{j=i}^{m}\left\| A_j x_j^{k+1} - A_j x_j^* \right\|^2 = \sum_{i=1}^{m} i\left\| A_i x_i^{k+1} - A_i x_i^* \right\|^2 \quad \text{and} \quad \sum_{i=1}^{m}\sum_{j=i}^{m}\left\| A_j x_j^k - A_j x_j^* \right\|^2 = \sum_{i=1}^{m} i\left\| A_i x_i^k - A_i x_i^* \right\|^2,$$

summing up (14) for all $i$'s, we obtain

$$\begin{aligned}
-2\sum_{i=1}^{m}\left\langle A_i x_i^{k+1} - A_i x_i^*, \sum_{j=i}^{m}\left( A_j x_j^k - A_j x_j^{k+1} \right) \right\rangle &\le 2\sum_{i=1}^{m}(m - i + 1)\left\| A_i\left( x_i^{k+1} - x_i^* \right) \right\|^2 + \sum_{i=1}^{m} i\left\| A_i\left( x_i^{k+1} - x_i^* \right) \right\|^2 + \sum_{i=1}^{m} i\left\| A_i\left( x_i^k - x_i^* \right) \right\|^2 \\
&= \sum_{i=1}^{m}(2m - i + 2)\left\| A_i\left( x_i^{k+1} - x_i^* \right) \right\|^2 + \sum_{i=1}^{m} i\left\| A_i\left( x_i^k - x_i^* \right) \right\|^2 \\
&\le (2m + 1)\sum_{i=1}^{m}\left\| A_i\left( x_i^{k+1} - x_i^* \right) \right\|^2 + m\sum_{i=1}^{m}\left\| A_i\left( x_i^k - x_i^* \right) \right\|^2.
\end{aligned} \tag{16}$$
By substituting (15) and (16) into (12), we obtain

$$\begin{aligned}
\left\| \lambda^{k+1} - \lambda^* \right\|_\mu^2 &+ \frac{m}{\mu}\sum_{i=1}^{m}\left\| A_i\left( x_i^{k+1} - x_i^* \right) \right\|^2 + \sum_{i=1}^{m}\frac{1}{\mu s_i}\left\| x_i^{k+1} - x_i^* \right\|^2 \\
&\le \left\| \lambda^k - \lambda^* \right\|_\mu^2 + \frac{m}{\mu}\sum_{i=1}^{m}\left\| A_i\left( x_i^k - x_i^* \right) \right\|^2 + \sum_{i=1}^{m}\frac{1}{\mu s_i}\left\| x_i^k - x_i^* \right\|^2 \\
&\quad - \sum_{i=1}^{m} \eta_i\left\| x_i^{k+1} - x_i^* \right\|^2 - \frac{1}{\mu}\left\| \sum_{i=1}^{m} A_i x_i^{k+1} - b \right\|^2 + \frac{3m+1}{\mu}\sum_{i=1}^{m}\left\| A_i\left( x_i^{k+1} - x_i^* \right) \right\|^2 \\
&\le \left\| \lambda^k - \lambda^* \right\|_\mu^2 + \frac{m}{\mu}\sum_{i=1}^{m}\left\| A_i\left( x_i^k - x_i^* \right) \right\|^2 + \sum_{i=1}^{m}\frac{1}{\mu s_i}\left\| x_i^k - x_i^* \right\|^2 \\
&\quad - \sum_{i=1}^{m}\left( \eta_i - \frac{3m+1}{\mu}\|A_i\|^2 \right)\left\| x_i^{k+1} - x_i^* \right\|^2 - \frac{1}{\mu}\left\| \sum_{i=1}^{m} A_i x_i^{k+1} - b \right\|^2,
\end{aligned}$$

where the second inequality follows from $\|A_i(x_i^{k+1} - x_i^*)\|^2 \le \|A_i\|^2\|x_i^{k+1} - x_i^*\|^2$. In view of the definition of $H$, this is exactly (11). □
With the help of the preceding two lemmas, we can establish the convergence of the extended APGM (4) for solving (1) with strongly convex $f_i$'s.

Theorem 3. Let the $f_i$'s in (1) be strongly convex with moduli $\eta_i$. For any

$$\mu > \max_{1 \le i \le m}\left\{ \frac{(3m+1)\|A_i\|^2}{\eta_i} \right\},$$

the sequence $\{w^k\}$ generated by the APGM (4) converges to a solution of (6).

Proof. Since $\mu > \max_{1 \le i \le m}\{(3m+1)\|A_i\|^2/\eta_i\}$, we have $c_i := \eta_i - \frac{3m+1}{\mu}\|A_i\|^2 > 0$. It follows from (11) that

$$\left\| w^{k+1} - w^* \right\|_H^2 \le \left\| w^k - w^* \right\|_H^2 \le \cdots \le \left\| w^0 - w^* \right\|_H^2 < +\infty. \tag{17}$$
Taking (11) into account, we have

$$\sum_{k=0}^{\infty}\left[ \sum_{i=1}^{m} c_i\left\| x_i^{k+1} - x_i^* \right\|^2 + \frac{1}{\mu}\left\| \sum_{i=1}^{m} A_i x_i^{k+1} - b \right\|^2 \right] \le \sum_{k=0}^{\infty}\left( \left\| w^k - w^* \right\|_H^2 - \left\| w^{k+1} - w^* \right\|_H^2 \right) < +\infty.$$

Thus

$$\lim_{k \to \infty}\left\| x_i^{k+1} - x_i^* \right\| = 0 \quad \text{for all } i = 1, \ldots, m, \tag{18}$$

$$\lim_{k \to \infty}\left\| \sum_{i=1}^{m} A_i x_i^{k+1} - b \right\| = 0. \tag{19}$$
Taking (17) and (18) into account, it follows that the sequence $\{\|\lambda^k - \lambda^*\|_\mu^2\}$ converges. Without loss of generality, in view of the boundedness of $\{\lambda^k\}$ and $\{\xi_i^k\}$, where $\xi_i^k \in \partial f_i(x_i^k)$, we take subsequences $\{\lambda^{k_j}\}$ and $\{\xi_i^{k_j}\}$ such that $\lim_{k_j \to \infty} \lambda^{k_j} = \bar{\lambda}$ and $\lim_{k_j \to \infty} \xi_i^{k_j} = \bar{\xi}_i$. Taking the limit along this subsequence in (8) and (19), we obtain

$$\bar{\xi}_i \in \partial f_i(x_i^*), \qquad \left\langle x_i' - x_i^*, \bar{\xi}_i - A_i^T \bar{\lambda} \right\rangle \ge 0, \quad \forall x_i' \in \mathcal{X}_i, \qquad \text{and} \qquad \sum_{i=1}^{m} A_i x_i^* = b,$$

which means that $(x_1^*, x_2^*, \ldots, x_m^*, \bar{\lambda})$ is a saddle point of the Lagrange function (5). Since $\lambda^*$ is an arbitrary Lagrange multiplier corresponding to $(x_1^*, x_2^*, \ldots, x_m^*)$, we can set $\lambda^* = \bar{\lambda}$ in (17) and conclude that the whole generated sequence $\{\lambda^k\}$ converges to $\bar{\lambda}$. This completes the proof. □
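In practice, the step condition of Theorem 3 is straightforward to verify numerically. The sketch below (ours, with illustrative data) computes the threshold for $\mu$, using that $\|A_i\|$ is the spectral norm.

```python
import numpy as np

# Check the step condition of Theorem 3: mu > max_i (3m+1)||A_i||^2 / eta_i.
# The matrices A_i and moduli eta_i below are illustrative assumptions.
rng = np.random.default_rng(2)
m = 3
A = [rng.standard_normal((4, ni)) for ni in (2, 3, 2)]
eta = np.array([1.0, 2.0, 3.0])

# np.linalg.norm(., 2) returns the largest singular value (spectral norm)
threshold = max((3 * m + 1) * np.linalg.norm(Ai, 2) ** 2 / eta_i
                for Ai, eta_i in zip(A, eta))
mu = 1.01 * threshold   # any mu strictly above the threshold works
print(mu > threshold)   # True; then c_i = eta_i - (3m+1)||A_i||^2/mu > 0
```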
4. Conclusion

We consider the linearly constrained separable convex programming problem whose objective function is separable into $m$ $(\ge 3)$ individual convex functions without coupled variables. The alternating proximal gradient method is an effective method for the case $m = 2$, but it is unknown whether its convergence can be extended to the general case $m \ge 3$. In this paper, we showed the global convergence of the alternating proximal gradient method for minimizing the sum of any number of strongly convex separable functions.

Acknowledgments

This work was supported by the National Science Foundation of China (Grant No. 61179033) and the Doctoral Fund of Innovation of Beijing University of Technology.

References

[1] D. Gabay, B. Mercier, A dual algorithm for the solution of nonlinear variational problems via finite element approximations, Comput. Math. Appl. 2 (1976) 17-40.
[2] D. Gabay, Applications of the method of multipliers to variational inequalities, in: M. Fortin, R. Glowinski (Eds.), Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary-Value Problems, North-Holland, Amsterdam, 1983, pp. 299-331.
[3] D.R. Han, X.M. Yuan, A note on the alternating direction method of multipliers, J. Optim. Theory Appl. 155 (2012) 227-238.
[4] B.S. He, M. Tao, X.M. Yuan, Alternating direction method with Gaussian back substitution for separable convex programming, SIAM J. Optim. 22 (2012) 313-340.
[5] M.Y. Hong, Z.Q. Luo, On the linear convergence of the alternating direction method of multipliers, arXiv preprint arXiv:1208.3922 (2012).
[6] S.Q. Ma, Alternating proximal gradient method for convex minimization, preprint (2012).
[7] Y.G. Peng, A. Ganesh, J. Wright, W.L. Xu, Y. Ma, RASL: Robust alignment by sparse and low-rank decomposition for linearly correlated images, IEEE Trans. Pattern Anal. Mach. Intell. 34 (11) (2012) 2233-2246.
[8] M. Tao, X.M. Yuan, Recovering low-rank and sparse components of matrices from incomplete and noisy observations, SIAM J. Optim. 21 (2011) 57-81.