Fast communication
Accelerated reweighted nuclear norm minimization algorithm for low rank matrix recovery
Xiaofan Lin*, Gang Wei
School of Electronic and Information Engineering, South China University of Technology, Wushan Road, Tianhe District, Guangzhou 510641, PR China
Article history: Received 16 August 2014; received in revised form 3 November 2014; accepted 5 February 2015.

Keywords: Matrix rank minimization; Matrix completion; Compressed sensing

* Corresponding author. E-mail address: [email protected] (X. Lin).

Abstract: In this paper we propose an accelerated reweighted nuclear norm minimization algorithm to recover a low rank matrix. Our approach differs from other iterative reweighted algorithms in that we design an accelerated procedure which makes the objective function descend further at every iteration. The proposed algorithm is an accelerated version of a state-of-the-art algorithm. We provide a new analysis of the original algorithm to derive our accelerated version, and we prove that our algorithm is guaranteed to converge to a stationary point of the reweighted nuclear norm minimization problem. Numerical results show that our algorithm requires distinctly fewer iterations and less computational time than the original one to achieve the same (or very close) accuracy, in some problem instances requiring only about 50% of the computational time of the original algorithm, and it is also notably faster than several other state-of-the-art algorithms.
1. Introduction
There is rapidly growing interest in the low rank matrix recovery problem, or rank minimization problem, which recovers an unknown low rank matrix from very limited information [1-5]. This paper deals with the following affine rank minimization problem:
$$\min_X \ \operatorname{rank}(X) \quad \text{s.t.}\ \ \mathcal{A}(X) = b, \tag{1}$$
where the linear map $\mathcal{A}\colon \mathbb{R}^{m\times n}\to\mathbb{R}^{s}$ and the vector $b\in\mathbb{R}^{s}$ are known. The above problem aims to find a matrix of minimum rank that satisfies a given system of linear equality constraints, a useful idea that can be applied in various settings such as signal or image processing [6,7], subspace segmentation [8], collaborative filtering [9] and system identification [10]. Although (1) is simple in form, it is a challenging optimization problem due to the discrete nature of the rank function. A commonly used heuristic, introduced in [11], is to replace the rank function with the nuclear norm, i.e., the sum of the singular values of the matrix. This technique is based on the fact that nuclear norm minimization is the tightest convex relaxation of the rank minimization problem. The new formulation is

$$\min_X \ \|X\|_{*} \quad \text{s.t.}\ \ \mathcal{A}(X) = b, \tag{2}$$
which, under some conditions (recall the Lagrangian relaxation), can be rewritten as

$$\min_X \ \lambda\,\|X\|_{*} + \tfrac12\,\|\mathcal{A}(X)-b\|_2^2, \tag{3}$$
where $\|X\|_{*} := \sum_{i=1}^{r}\sigma_i(X)$ denotes the nuclear norm. However, nuclear norm minimization requires more measurements for exact recovery of the low rank solution.
Recently, a tighter relaxation, Schatten p-norm minimization [12-14], was introduced in place of nuclear norm minimization in order to give a better approximation to the original problem. (Strictly speaking, the Schatten p-norm is not a norm when $0 < p < 1$ since it is nonconvex; however, it is customarily called a norm.) The Schatten p-norm with $0 < p \le 1$ is defined as

$$\|X\|_{p} := \Bigl(\sum_{i=1}^{r}\sigma_i(X)^{p}\Bigr)^{1/p}. \tag{4}$$

It is easy to see that when $p = 1$ the Schatten p-norm is exactly the nuclear norm, and as $p$ tends towards 0 the Schatten p-norm gets closer to the rank function. Thus, the Schatten p-norm is more general than the nuclear norm. Theoretical analysis in [12,13,15] shows that the Schatten p-norm with small $p$ needs fewer measurements for exact recovery. With the above definition, the Schatten p-norm minimization problem can be formulated as follows:
$$\min_X \; P_0(X) := \lambda p\,\|X\|_{p}^{p} + \tfrac12\,\|\mathcal{A}(X)-b\|_2^2. \tag{5}$$

When $0 < p < 1$, the above problem is nonconvex. To solve this nonconvex problem efficiently, several iterative reweighted algorithms have been proposed and analyzed in the recent literature [2,3,14]. The key idea of the iterative reweighted technique is to solve a convex problem with a given weight at each iteration, and to update the weight at every turn. This idea is commonly used to solve Schatten p-norm or Lp-norm minimization. In this paper we propose a novel iterative reweighted algorithm that solves the Schatten p-norm minimization problem (5) very efficiently. Differently from other iterative reweighted algorithms, we add an accelerated procedure which makes the objective function descend further at every iteration. The proposed algorithm is an accelerated version of a state-of-the-art algorithm, the reweighted nuclear norm minimization (RNNM) algorithm [16]; our accelerated version is notably faster than the original one and achieves the same (or very close) accuracy.

The rest of this paper is organized as follows. In Section 2, we give some preliminary results that are used in our analysis. In Section 3, we give our own analysis of the original RNNM algorithm, and Section 4 uses this analysis to derive a sketch of the accelerated version. Some algorithm details are discussed in Sections 5 and 6. The convergence analysis, which proves that our algorithm is guaranteed to converge to a stationary point of $P_0(X)$ (see (5)), is given in Section 7. Numerical results are reported in Section 8, and Section 9 concludes.
2. Preliminaries

We first introduce the following unconstrained smooth nonconvex problem to approximate (5):

$$\min_X \; P(X,\varepsilon) := \lambda p\,\|X\|_{p,\varepsilon}^{p} + \tfrac12\,\|\mathcal{A}(X)-b\|_2^2, \tag{6}$$

where $X\in\mathbb{R}^{m\times n}$,

$$\|X\|_{p,\varepsilon} := \Bigl(\sum_{i=1}^{m}(\sigma_i(X)+\varepsilon)^{p}\Bigr)^{1/p}, \tag{7}$$

and $\varepsilon > 0$. (Strictly speaking, (7) is not a norm either, but we keep the norm notation for convenience, as is common in the literature.) Here we assume $m \le n$ without loss of generality. Comparing (7) with (4), one can see that as $\varepsilon$ tends to 0, (7) approaches (4). The following definition will be used frequently in this paper.

Definition 1. Let $X = U\Sigma V^{T}$ be the reduced singular value decomposition (SVD) of $X$ with rank $r$. For each $v \ge 0$, the weighted singular value shrinkage operator $\mathcal{D}_v(\cdot\,; w)$ is defined by

$$\mathcal{D}_v(X, w) = U\,\mathrm{Diag}\{(\sigma_i(X) - v\,w_i)_+,\; i = 1,2,\dots,r\}\,V^{T},$$

where $(a)_+ := \max\{a, 0\}$ for any $a\in\mathbb{R}$; $w_i$, $i = 1,2,\dots,r$, are the elements of the vector $w$; and $\sigma_i(X)$, $i = 1,2,\dots,r$, are the singular values of $X$.

With Definition 1, we give the following lemma, which has been proved in [16].

Lemma 1. For each $v \ge 0$ and a constant matrix $Y\in\mathbb{R}^{m\times n}$, the weighted singular value shrinkage operator $\mathcal{D}_v(Y, w)$ is the optimal solution of the following problem:

$$\min_X \; v\sum_{i=1}^{m} w_i\,\sigma_i(X) + \tfrac12\,\|X-Y\|_F^2.$$
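As an illustration only, the quantities introduced in this section can be sketched in a few lines of NumPy. The function names and the use of NumPy are our own choices, not the authors' implementation; the sketch assumes $w$ holds one weight per singular value.

```python
import numpy as np

def schatten_p(X, p, eps=0.0):
    """Smoothed Schatten p-norm (7); with eps = 0 it reduces to the Schatten p-norm (4)."""
    s = np.linalg.svd(X, compute_uv=False)
    return float(np.sum((s + eps) ** p) ** (1.0 / p))

def weighted_svt(Y, w, v):
    """Weighted singular value shrinkage operator D_v(Y, w) of Definition 1.

    w is assumed to hold one weight per singular value (length min(m, n))."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)   # reduced SVD, s sorted descending
    w = np.asarray(w, dtype=float)[: s.size]
    s_shrunk = np.maximum(s - v * w, 0.0)              # (sigma_i - v * w_i)_+
    return (U * s_shrunk) @ Vt                         # U diag(s_shrunk) V^T

# By Lemma 1, weighted_svt(Y, w, v) solves
#     min_X  v * sum_i w_i * sigma_i(X) + 0.5 * ||X - Y||_F^2,
# which is the subproblem appearing in Step 1 of the algorithms below.
```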
For convenience, the abbreviation ORNNM refers to the original RNNM algorithm that we aim to accelerate in the sequel.

3. A new analysis for the original RNNM algorithm

In this paper, we propose an accelerated procedure to improve the performance of ORNNM. The ORNNM algorithm has been analyzed in [16]; however, to derive our accelerated version, here we give our own analysis. Inspired by [17,4], where auxiliary functions for analyzing L1-norm/Schatten p-norm minimization were constructed, we define a different auxiliary function as follows:

$$Q(X, Y, w, \varepsilon) = F(X, w, \varepsilon) + G(X, Y), \tag{8}$$

where

$$F(X, w, \varepsilon) := \lambda\sum_{i=1}^{m}\bigl[p\,w_i\,(\sigma_i(X)+\varepsilon) + (1-p)\,w_i^{p/(p-1)}\bigr] + \tfrac12\,\|\mathcal{A}(X)-b\|_2^2 \tag{9}$$

and

$$G(X, Y) := -\tfrac12\,\|\mathcal{A}(X-Y)\|_2^2 + \tfrac12\,\|X-Y\|_F^2. \tag{10}$$

In the following, we illustrate that minimizing $Q(X, Y, w, \varepsilon)$ leads to minimizing $P_0(X)$ (see (5) for its definition). First, it can be verified that $w^{*}$, whose entries are $w_i^{*} = (\sigma_i(X)+\varepsilon)^{p-1}$, $i = 1,2,\dots,m$, minimizes $F(X, w, \varepsilon)$ over $w$, since $w_i^{*} = (\sigma_i(X)+\varepsilon)^{p-1}$ is the optimal solution of the following convex problem:
$$\min_{w_i > 0}\; p\,w_i\,(\sigma_i(X)+\varepsilon) + (1-p)\,w_i^{p/(p-1)}. \tag{11}$$

Therefore, substituting $w_i^{*} = (\sigma_i(X)+\varepsilon)^{p-1}$, $i = 1,\dots,m$, into (9), we have

$$\min_w F(X, w, \varepsilon) = \lambda\sum_{i=1}^{m}(\sigma_i(X)+\varepsilon)^{p} + \tfrac12\|\mathcal{A}(X)-b\|_2^2 \;\ge\; \lambda p\sum_{i=1}^{m}(\sigma_i(X)+\varepsilon)^{p} + \tfrac12\|\mathcal{A}(X)-b\|_2^2 = P(X, \varepsilon), \tag{12}$$

where the inequality follows from the fact that $0 < p \le 1$. The multiplier $p$ is necessary here because we focus on solving (6), whose stationary point the ORNNM algorithm converges to. Secondly, it can also be verified that for a fixed $Y$ the second derivative of $G(X, Y)$ is $H_X = (I - \mathcal{A}^{*}\mathcal{A})\otimes I$, where $\otimes$ denotes the Kronecker product. If $\|\mathcal{A}^{*}\mathcal{A}\| \le 1$, then $H_X$ is positive semidefinite, which means $G(X, Y) \ge 0$. In that case,

$$Q(X, Y, w, \varepsilon) \ge F(X, w, \varepsilon). \tag{13}$$

Equality holds when $Y = X$. Combining (12) and (13), we have the following lemma.

Lemma 2. If $\|\mathcal{A}^{*}\mathcal{A}\| \le 1$, then the following inequalities hold:

$$Q(X, Y, w, \varepsilon) \ge F(X, w, \varepsilon) \ge P(X, \varepsilon) \ge P_0(X). \tag{14}$$

The last part of (14) can be easily verified from the definitions of $P(X, \varepsilon)$ and $P_0(X)$. Without loss of generality, we assume that the condition $\|\mathcal{A}^{*}\mathcal{A}\| \le 1$ always holds, since $\mathcal{A}$ can always be renormalized. In Section 8, where the algorithms are compared on the matrix completion problem, $\|\mathcal{A}^{*}\mathcal{A}\| \le 1$ holds since $\mathcal{A}$ is a projector.

Due to Lemma 2, we expect to minimize $P_0(X)$ by minimizing $Q(X, Y, w, \varepsilon)$. Next, we illustrate that the ORNNM algorithm is a block-coordinate descent algorithm for minimizing the function $Q(X, Y, w, \varepsilon)$. By basic matrix computations, we have

$$\begin{aligned}
\|\mathcal{A}(X)-b\|_2^2 - \|\mathcal{A}(X-Y)\|_2^2
&= (\mathcal{A}(X))^{T}\mathcal{A}(X) - 2(\mathcal{A}(X))^{T}b + b^{T}b - (\mathcal{A}(X))^{T}\mathcal{A}(X) + 2(\mathcal{A}(X))^{T}\mathcal{A}(Y) - (\mathcal{A}(Y))^{T}\mathcal{A}(Y)\\
&= 2(\mathcal{A}(X-Y))^{T}(\mathcal{A}(Y)-b) + (\mathcal{A}(Y)-b)^{T}(\mathcal{A}(Y)-b)\\
&= 2\langle \mathcal{A}(X-Y),\,\mathcal{A}(Y)-b\rangle + \langle \mathcal{A}(Y)-b,\,\mathcal{A}(Y)-b\rangle\\
&= 2\langle X-Y,\,\mathcal{A}^{*}(\mathcal{A}(Y)-b)\rangle + \langle \mathcal{A}^{*}(\mathcal{A}(Y)-b),\,\mathcal{A}^{*}(\mathcal{A}(Y)-b)\rangle,
\end{aligned} \tag{15}$$

where $\langle\cdot,\cdot\rangle$ denotes the inner product operator. Denoting $\nabla f(Y) = \mathcal{A}^{*}(\mathcal{A}(Y)-b)$, the gradient of the function $f(Y) := \tfrac12\|\mathcal{A}(Y)-b\|_2^2$, we then have

$$\|\mathcal{A}(X)-b\|_2^2 - \|\mathcal{A}(X-Y)\|_2^2 = 2\langle X-Y,\,\nabla f(Y)\rangle + \langle\nabla f(Y),\,\nabla f(Y)\rangle. \tag{16}$$

Therefore,

$$\|\mathcal{A}(X)-b\|_2^2 - \|\mathcal{A}(X-Y)\|_2^2 + \|X-Y\|_F^2 = 2\langle X-Y,\,\nabla f(Y)\rangle + \langle\nabla f(Y),\,\nabla f(Y)\rangle + \langle X-Y,\,X-Y\rangle = \|X-Y+\nabla f(Y)\|_F^2. \tag{17}$$

With (8)-(10) and (17), (8) can be further formulated as

$$Q(X, Y, w, \varepsilon) = \tfrac12\,\|X - (Y - \nabla f(Y))\|_F^2 + \lambda\sum_{i=1}^{m}\bigl[p\,w_i\,(\sigma_i(X)+\varepsilon) + (1-p)\,w_i^{p/(p-1)}\bigr]. \tag{18}$$

By Lemma 1, for fixed $Y^k$, $w^k$ and $\varepsilon^k$, the minimum of $Q(X, Y, w, \varepsilon)$ over $X$ is attained at $X_{k+1}^{*} = \mathcal{D}_{\lambda p}(Y^k - \nabla f(Y^k),\, w^k)$. (In this paper, the superscript $k$ (or $k+1$) and the subscript $k$ (or $k+1$) have the same meaning; we place them in different positions only to keep the notation readable.) By setting $Y = X^k$ we have $G(X^k, Y) = 0$ and $Q(X^k, Y, w^k, \varepsilon^k) = F(X^k, w^k, \varepsilon^k)$; therefore, the minimum of $Q(X, Y, w, \varepsilon)$ over $Y$ is $Y_{k+1}^{*} = X^k$. By (11), the minimum of $Q(X, Y, w, \varepsilon)$ over $w$ is $w_{k+1}^{*}$, whose entries are $w_i^{*} = (\sigma_i(X)+\varepsilon)^{p-1}$, $i = 1,2,\dots,m$. From (9) we can see that if the sequence $\varepsilon_k$ decreases, then $Q(X, Y, w, \varepsilon_k)$ decreases. The above statements indicate a block-coordinate descent scheme for minimizing $Q(X, Y, w, \varepsilon)$. Therefore, according to Lemma 2, a block-coordinate descent algorithm, outlined in Algorithm 1, can be derived to solve (5). It can be observed that Algorithm 1 is equivalent to ORNNM in its original paper [16].
Algorithm 1. Equivalent ORNNM algorithm.
Input: linear map $\mathcal{A}\colon\mathbb{R}^{m\times n}\to\mathbb{R}^{s}$, vector $b\in\mathbb{R}^{s}$, $p\in(0,1)$, $\lambda, \varepsilon_0 > 0$, $\eta\in(0,1)$.
Output: matrix $X^{*}\in\mathbb{R}^{m\times n}$.
Initialize: $Y^0 = 0$, $k := 0$, $w^0 = 0$.
Loop:
1. $X^{k+1} = \mathcal{D}_{\lambda p}(Y^k - \nabla f(Y^k),\, w^k)$;
2. $Y^{k+1} = X^{k+1}$;
3. $w_i^{k+1} = (\sigma_i(X^{k+1}) + \varepsilon_k)^{p-1}$;
4. $X^{*} = X^{k+1}$ if converged; otherwise $\varepsilon_{k+1} = \eta\,\varepsilon_k$, $k := k+1$, go to Step 1.
4. Sketch of the accelerated RNNM algorithm

In this section, we point out a way to accelerate the ORNNM algorithm according to the analysis in the last section, and give a sketch of the proposed accelerated version. Assume $Q(X^k, Y^k, w^k, \varepsilon^k)$ is generated by Algorithm 1 at the end of the $k$th ($k \ge 1$) iteration; then, according to (13),

$$Q(X^k, Y^k, w^k, \varepsilon^k) = F(X^k, w^k, \varepsilon^k). \tag{19}$$

In Step 1 of the $(k+1)$th iteration,

$$Q(X^{k+1}, Y^k, w^k, \varepsilon^k) \le Q(X^k, Y^k, w^k, \varepsilon^k). \tag{20}$$
In Step 2 of the $(k+1)$th iteration,

$$Q(X^{k+1}, Y^{k+1}, w^k, \varepsilon^k) = Q(X^{k+1}, X^{k+1}, w^k, \varepsilon^k) = F(X^{k+1}, w^k, \varepsilon^k) \le Q(X^{k+1}, Y^k, w^k, \varepsilon^k). \tag{21}$$

It should be noted that choosing $Y^{k+1} = X^{k+1}$ is not the only option in Step 2, since in the next iteration $X^{k+2} = \mathcal{D}_{\lambda p}(Y^{k+1} - \nabla f(Y^{k+1}),\, w^{k+1})$ minimizes $Q(X, Y^{k+1}, w^{k+1}, \varepsilon^{k+1})$ over $X$ for any choice of $Y^{k+1}$. In fact, by choosing $\hat{Y}^{k+1} = \hat{X}^{k+1}$ with $\hat{X}^{k+1}$ satisfying

$$F(\hat{X}^{k+1}, w^k, \varepsilon^k) \le F(X^{k+1}, w^k, \varepsilon^k), \tag{22}$$

we obtain

$$Q(\hat{X}^{k+1}, \hat{Y}^{k+1}, w^k, \varepsilon^k) \le Q(X^{k+1}, Y^{k+1}, w^k, \varepsilon^k). \tag{23}$$

The above inequality follows from the facts that $F(\hat{X}^{k+1}, w^k, \varepsilon^k) = Q(\hat{X}^{k+1}, \hat{Y}^{k+1}, w^k, \varepsilon^k)$ and $F(X^{k+1}, w^k, \varepsilon^k) = Q(X^{k+1}, Y^{k+1}, w^k, \varepsilon^k)$.
Eq. (23) means that the function $Q(X, Y, w^k, \varepsilon^k)$ descends further by choosing $X = Y = \hat{X}^{k+1}$ in Step 2 than with the original option $X = Y = X^{k+1}$. According to the above analysis, a sketch of an accelerated reweighted nuclear norm minimization (ARNNM) algorithm can be derived; it is outlined in Algorithm 2. In addition, we have the following theorem.

Theorem 1. The ARNNM algorithm (i.e., Algorithm 2) descends further than the ORNNM algorithm (i.e., Algorithm 1) at every iteration.

Algorithm 2. Sketch of the accelerated RNNM (ARNNM) algorithm.
Input: linear map $\mathcal{A}\colon\mathbb{R}^{m\times n}\to\mathbb{R}^{s}$, vector $b\in\mathbb{R}^{s}$, $p\in(0,1)$, $\lambda, \varepsilon_0 > 0$, $\eta\in(0,1)$.
Output: matrix $X^{*}\in\mathbb{R}^{m\times n}$.
Initialize: $Y^0 = 0$, $k := 0$, $w^0 = 0$.
Loop:
1. $X^{k+1} = \mathcal{D}_{\lambda p}(Y^k - \nabla f(Y^k),\, w^k)$;
2. choose $Y^{k+1} = \hat{X}^{k+1}$ such that $F(\hat{X}^{k+1}, w^k, \varepsilon^k) \le F(X^{k+1}, w^k, \varepsilon^k)$;
3. $X^{k+1} \leftarrow \hat{X}^{k+1}$;
4. $w_i^{k+1} = (\sigma_i(X^{k+1}) + \varepsilon_k)^{p-1}$;
5. $X^{*} = X^{k+1}$ if converged; otherwise $\varepsilon_{k+1} = \eta\,\varepsilon_k$, $k := k+1$, go to Step 1.
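The sketch below shows how the acceleration slots into the loop of the previous sketch. Here refine_step2 is only a placeholder name standing for Algorithm 4 of Section 6 (it must return an $\hat{X}^{k+1}$ satisfying (22)); everything else mirrors the ORNNM sketch and is an illustration under the matrix completion setting, not the authors' implementation.

```python
import numpy as np

def arnnm_iteration(Y, w, eps, mask, M_obs, lam, p, eta, refine_step2):
    """One iteration of Algorithm 2 (ARNNM) for matrix completion (A = P_Omega).

    refine_step2(X_new, w, eps) stands for Algorithm 4: it returns X_hat with
    F(X_hat, w, eps) <= F(X_new, w, eps), realizing Steps 2-3."""
    grad = mask * (Y - M_obs)                     # gradient of f at Y for A = P_Omega
    X_new = weighted_svt(Y - grad, w, lam * p)    # Step 1
    X_hat = refine_step2(X_new, w, eps)           # Steps 2-3: accelerated refinement
    s = np.linalg.svd(X_hat, compute_uv=False)
    w_new = (s + eps) ** (p - 1.0)                # Step 4
    return X_hat, X_hat, w_new, eta * eps         # new X, new Y, new w, decayed eps (Step 5)
```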
5. Power method for approximated SVD

In both the ARNNM and ORNNM algorithms, computing the SVD of a large matrix is required in Step 1 and is usually expensive. Inspired by [4,18], we use the power method [18] to compute an approximated SVD, and we use the eigenvectors from the previous iteration to initialize the power method. This approach reduces the computational time for the SVD. The power method is summarized in Algorithm 3.

Algorithm 3. Power method (approximated SVD method for Step 1 of Algorithm 2).
Input: matrix $A$, rank $rr$, initial $R\in\mathbb{R}^{n\times rr}$.
Output: approximated SVD of the matrix, $A \approx U\Sigma V^{T}$.
1. $Q \leftarrow \mathrm{QR}(AR)$;
2. for $t = 1, 2, \dots, T_{\max}$ do
3. $Q \leftarrow \mathrm{QR}(AA^{T}Q)$;
4. end
5. $[\hat{U}, \Sigma, V] = \mathrm{SVD}(Q^{T}A)$;
6. $U = Q\hat{U}$.
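A compact NumPy version of Algorithm 3 is sketched below; it follows the randomized power-iteration scheme of [18] and is only an illustration of the steps listed above. In ARNNM, R would be warm-started from the singular vectors of the previous iteration, as described in the text.

```python
import numpy as np

def power_method_svd(A, R, T_max=2):
    """Approximate SVD of A (Algorithm 3): A ~ U diag(s) V^T with rank rr = R.shape[1]."""
    Q, _ = np.linalg.qr(A @ R)                    # 1. Q <- QR(A R)
    for _ in range(T_max):                        # 2.-4. power iterations
        Q, _ = np.linalg.qr(A @ (A.T @ Q))        # 3. Q <- QR(A A^T Q)
    U_hat, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)  # 5. small SVD of Q^T A
    return Q @ U_hat, s, Vt                       # 6. U = Q U_hat
```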
The power method is not part of the accelerated procedure proposed in this paper. In Section 8, where various algorithms are compared, we apply the power method to both the ORNNM and ARNNM algorithms.

6. Implementation of Step 2 of ARNNM

How to obtain $\hat{X}^{k+1}$ efficiently is key to implementing Step 2 of Algorithm 2. Solving the following problem is one option:

$$\hat{X}^{k+1} = \arg\min_X \; F(X, w^k, \varepsilon^k). \tag{24}$$

However, since $X$ is usually a large variable in practice, the above problem is non-scalable and expensive. A better option is to adjust the top singular values of $X^{k+1}$: since $X^{k+1}$ is usually a low-rank matrix in practice, the problem size is then significantly reduced compared with (24). Let

$$X^{k+1} = U_{k+1}\,\mathrm{Diag}(\sigma^{k+1})\,V_{k+1}^{T}, \tag{25}$$

$$X^{k+1} = \sum_{i=1}^{r_1} \sigma_i^{k+1} u_i v_i^{T} + \sum_{j=r_1+1}^{rr} \sigma_j^{k+1} u_j v_j^{T} \tag{26}$$

be the reduced SVD of $X^{k+1}$. Here $\sigma_i^{k+1}$, $i = 1,\dots,r_1$, are the top $r_1$ ($r_1 < rr$) singular values and $\sigma_j^{k+1}$, $j = r_1+1,\dots,rr$, are the rest; $\sigma^{k+1}\in\mathbb{R}^{rr}$ is the vector whose entries are $\sigma_i^{k+1}$, $i = 1,\dots,rr$, and $\mathrm{Diag}(\sigma^{k+1})\in\mathbb{R}^{rr\times rr}$ is the diagonal matrix whose diagonal entries are $\sigma_i^{k+1}$, $i = 1,\dots,rr$. Define the function

$$F(\sigma) = F\Bigl(\sum_{i=1}^{r_1} \sigma_i u_i v_i^{T} + \sum_{j=r_1+1}^{rr} \sigma_j^{k+1} u_j v_j^{T},\; w^k,\; \varepsilon^k\Bigr) \tag{27}$$

with $\sigma = \mathrm{vec}(\sigma_i,\, i = 1,\dots,r_1)\in\mathbb{R}^{r_1}$ being the variable and the rest (i.e., $\sigma_j^{k+1}$, $j = r_1+1,\dots,rr$, $u_i$, $v_i$, $i = 1,\dots,rr$, $w^k$, $\varepsilon^k$) being constants. We propose the following approach to obtain $\hat{X}^{k+1}$:

$$\hat{\sigma}^{k+1} = \arg\min_{\sigma\in\mathbb{R}^{r_1}} F(\sigma), \qquad r_1 < rr, \tag{28}$$

$$\sigma^{k+1}[1{:}r_1] \leftarrow \hat{\sigma}^{k+1}, \tag{29}$$

$$\hat{X}^{k+1} \leftarrow U_{k+1}\,\mathrm{Diag}(\sigma^{k+1})\,V_{k+1}^{T}. \tag{30}$$

In (28)-(30), $\sigma, \hat{\sigma}^{k+1}\in\mathbb{R}^{r_1}$. Eq. (29) means replacing the first $r_1$ elements of $\sigma^{k+1}$ with $\hat{\sigma}^{k+1}$. Here we choose $r_1 < rr$ to further reduce the problem size: since the small singular values do not much affect the result, computing them is unnecessary and would cost more time. Note that (28) is a convex quadratic problem and the variable $\sigma$ is small-scale; thus (28)-(30) can be computed efficiently.
Eq. (28) can be further formulated as follows:

$$\hat{\sigma}^{k+1} = \arg\min_{\sigma\in\mathbb{R}^{r_1}} \;\lambda p\Bigl(\sum_{i=1}^{r_1} w_i^{k}\sigma_i + \sum_{j=r_1+1}^{rr} w_j^{k}\sigma_j^{k+1}\Bigr) + \frac12\Bigl\|\mathcal{A}\Bigl(\sum_{i=1}^{r_1}\sigma_i u_i v_i^{T} + \sum_{j=r_1+1}^{rr}\sigma_j^{k+1} u_j v_j^{T}\Bigr) - b\Bigr\|_2^2. \tag{31}$$

It can be derived from (31) that the gradient and Hessian of the function $F(\sigma)$ are, respectively,

$$\nabla F_{\sigma_i} = u_i^{T}\,\mathcal{A}^{*}\bigl(\mathcal{A}(U_{k+1}\mathrm{Diag}(\sigma)V_{k+1}^{T}) - b\bigr)\,v_i + \lambda p\, w_i, \qquad i = 1,2,\dots,r_1, \tag{32}$$

and

$$H_{\sigma\,i,j} = \bigl(\mathcal{A}(u_i v_i^{T})\bigr)^{T}\,\mathcal{A}(u_j v_j^{T}), \qquad i, j = 1,2,\dots,r_1. \tag{33}$$

With (32) and (33), the Newton method can be used to solve (28) in only one iteration (since the function is quadratic). However, for a relatively large $r_1$, computing every element of the Hessian matrix with (33) is still not efficient enough. To avoid this problem, we take advantage of the following fact:

$$d^{T}H_{\sigma}d = \Bigl(\mathcal{A}\Bigl(\sum_{i=1}^{r_1} d_i u_i v_i^{T}\Bigr)\Bigr)^{T}\,\mathcal{A}\Bigl(\sum_{j=1}^{r_1} d_j u_j v_j^{T}\Bigr), \tag{34}$$

where $d\in\mathbb{R}^{r_1}$ is a given vector and $H_{\sigma}\in\mathbb{R}^{r_1\times r_1}$ is the Hessian matrix whose $(i,j)$th entry is $H_{\sigma\,i,j}$. Computing (33) needs on the order of $O\bigl(r_1^2(mn + smn + s^2)\bigr)$ operations, while (34) needs about $O(r_1 mn + smn + s^2)$. Clearly, (34) can be computed more efficiently than (33). Thus, we propose the following iterative approach to solve (28):

$$\beta \leftarrow \frac{\nabla F_{\sigma}^{T}\,\nabla F_{\sigma}}{\nabla F_{\sigma}^{T}\,H_{\sigma}\,\nabla F_{\sigma}}, \tag{35}$$

$$\sigma \leftarrow \sigma - \beta\,\nabla F_{\sigma}. \tag{36}$$

The above approach is the steepest descent method, and (35) computes the optimal step length for the quadratic problem (28) in each iteration. The denominator in (35) can be computed by applying (34). Using this approach, the implementation of Step 2 of Algorithm 2 is derived and outlined in Algorithm 4.

Algorithm 4. Implementation of Step 2 of Algorithm 2.
Input: linear map $\mathcal{A}\colon\mathbb{R}^{m\times n}\to\mathbb{R}^{s}$, vector $b\in\mathbb{R}^{s}$, $p\in(0,1)$, $\lambda, \varepsilon^k > 0$, $w^k$, $r_1 < rr$, $X^{k+1}$ and its reduced SVD $U_{k+1}\mathrm{Diag}(\sigma^{k+1})V_{k+1}^{T}$.
Output: matrix $\hat{X}^{k+1}\in\mathbb{R}^{m\times n}$.
1. Initialize the vector $\sigma$ with $\sigma_i = \sigma_i^{k+1}$, $i = 1,2,\dots,r_1$;
2. for $\mathrm{iter} = 1,2,\dots,t_{\max}$ do
3. update $\nabla F_{\sigma}$ by (32);
4. update $\beta$ by (35);
5. update $\sigma$ by (36);
6. end
7. set $\hat{\sigma}^{k+1} = \sigma$;
8. compute $\hat{X}^{k+1}$ by (29) and (30).

Note that even one iteration of Algorithm 4 already produces an accelerating effect, since (22) is satisfied after one iteration of the steepest descent method. In our experiments, one or two iterations are enough for Algorithm 4.
7. Convergence results

In this section, we discuss the convergence of the ARNNM algorithm (i.e., Algorithm 2). Here the convergence of an algorithm means that the sequence generated by the algorithm gets arbitrarily close to a limit point.

Lemma 3. The ARNNM algorithm (i.e., Algorithm 2) generates a non-increasing sequence of the function $P_0(X)$ (see (5) for its definition).

Proof. Consider the $(k+1)$th ($k \ge 1$) iteration of Algorithm 2. After Step 2, by (22) we have

$$F(\hat{X}^{k+1}, w^k, \varepsilon^k) \le F(X^{k+1}, w^k, \varepsilon^k) \le F(X^k, w^k, \varepsilon^k). \tag{37}$$

Then, after Step 3, $X^{k+1}$ equals $\hat{X}^{k+1}$. According to (12), after Step 4 we have

$$P(X^{k+1}, \varepsilon^k) \le P(X^k, \varepsilon^k). \tag{38}$$

Since $\{\varepsilon_k\}$ is a decreasing sequence,

$$P(X^{k+1}, \varepsilon^{k+1}) \le P(X^{k+1}, \varepsilon^k). \tag{39}$$

Combining (38) and (39), we conclude that Algorithm 2 generates a non-increasing sequence of the function $P(X, \varepsilon)$. According to the fact that

$$P_0(X) \le P(X, \varepsilon), \tag{40}$$

we further conclude that Algorithm 2 generates a non-increasing sequence of the function $P_0(X)$. □

Theorem 2. If the sequence $\{\varepsilon_k\}$ converges to zero, then the sequence $\{X^k\}$ generated by the ARNNM algorithm (i.e., Algorithm 2) converges to a stationary point of the function $P_0(X)$ (see (5) for its definition). When $p = 1$, the sequence converges to the global optimum of $P_0(X)$.

Proof. By definition, the function $P_0(X)$ is bounded below. Since the sequence $\{P_0(X^k)\}$ generated by Algorithm 2 is non-increasing (by Lemma 3), there exists a subsequence $\{X^{s_k}\}$ that converges to a limit point $\bar{X}$. For convenience we denote $s_k$ by $k$ in the following. Consider the $(k+1)$th ($k \ge 1$) iteration of Algorithm 2. According to Step 3, $X^{k+1}$ comes from $\hat{X}^{k+1}$, and according to Step 2, $Y^{k+1} = \hat{X}^{k+1}$. Thus, the convergence of $\{X^k\}$ to $\bar{X}$ implies that $\{Y^k\}$ also converges to $\bar{X}$. Besides, since $\sigma_i(\cdot)$ is a continuous function, it can be observed from Algorithm 2 that $\{w^k\}$ converges to $\bar{w}$, where $\bar{w}_i = (\sigma_i(\bar{X}) + \varepsilon_{\min})^{p-1}$ and $\varepsilon_{\min}$ is the limit point of the sequence $\{\varepsilon_k\}$. When $\varepsilon_{\min} = 0$, we have

$$\bar{w}_i = (\sigma_i(\bar{X}))^{p-1}. \tag{41}$$

Since $\mathcal{D}_{\lambda p}(\cdot,\cdot)$ is a continuous function, $\mathcal{D}_{\lambda p}(Y^k - \nabla f(Y^k),\, w^k)$ eventually converges to $\mathcal{D}_{\lambda p}(\bar{X} - \nabla f(\bar{X}),\, \bar{w})$. Thus, the following formula holds according to Step 1 of Algorithm 2:

$$\bar{X} = \mathcal{D}_{\lambda p}(\bar{X} - \nabla f(\bar{X}),\, \bar{w}). \tag{42}$$
By Lemma 1, $\bar{X}$ is an optimal solution of the following problem:

$$\min_X \; \lambda p\sum_{i=1}^{r}\bar{w}_i\,\sigma_i(X) + \tfrac12\,\bigl\|X - (\bar{X} - \nabla f(\bar{X}))\bigr\|_F^2, \tag{43}$$

where $\bar{w}_i = \sigma_i(\bar{X})^{p-1}$. Therefore,

$$0 \in \lambda p\,\partial\Bigl(\sum_{i=1}^{r}\bar{w}_i\,\sigma_i(\bar{X})\Bigr) + \bar{X} - (\bar{X} - \nabla f(\bar{X})). \tag{44}$$

Combining (41) and (44), we obtain

$$0 \in \lambda p\,\partial\Bigl(\sum_{i=1}^{r}\sigma_i(\bar{X})^{p}\Bigr) + \mathcal{A}^{*}(\mathcal{A}(\bar{X}) - b), \tag{45}$$

which means $\bar{X}$ is also a stationary point of $P_0(X)$. When $p = 1$, $P_0(X)$ is convex; in this case $\bar{X}$ is the global optimum of $P_0(X)$. □
8. Numerical results

In this section, we compare the proposed ARNNM algorithm with the ORNNM algorithm and several state-of-the-art algorithms. In the experiments, we solve the following matrix completion problem:

$$\min_X \; \lambda p\,\|X\|_p^p + \tfrac12\,\|P_\Omega(X) - P_\Omega(M)\|_2^2, \tag{46}$$

which is problem (5) with $\mathcal{A} = P_\Omega$ and $b = P_\Omega(M)$. Here $P_\Omega$ is a projection operator, $\Omega$ is a random subset of the index set, and $M\in\mathbb{R}^{m\times n}$ is the matrix to be recovered, with the sampled entries $\{M_{ij} : (i,j)\in\Omega\}$ known.

The approximated SVD approach, such as the truncated SVD or the power method used in this paper, reduces the computational time significantly. Therefore, the choice of the estimated rank $\tilde r$ is important for speeding up the algorithm. In [16], where the ORNNM algorithm is proposed, the following rank estimation approach is applied:

$$\tilde r_k \leftarrow \arg\max\bigl\{\, r_k\in\mathbb{Z}_+ : \sigma_{r_k}(X^k) > \rho\,\sigma_1(X^k)\,\bigr\}, \qquad \tilde r = \min\{m, n, \tilde r_k\}, \tag{47}$$

where $\rho$ is a parameter chosen manually. Eq. (47) is executed in every iteration, and the first iteration needs a parameter $r_{\max}$ to initialize $\tilde r$. This approach has also been commonly applied in several other papers such as [3,5]; in our experiments we follow the same rank estimation approach. We also use the relative error

$$\mathrm{rel.err} := \frac{\|X - M\|_F}{\max\{1, \|M\|_F\}} \tag{48}$$

to measure the recovery level, as [16] does. The synthetic data is generated by

$$M = M_L M_R^{T} + \sigma_n\,\Xi \tag{49}$$

with $M_L\in\mathbb{R}^{m\times r}$, $M_R\in\mathbb{R}^{n\times r}$ and $\Xi\in\mathbb{R}^{m\times n}$ whose entries are generated from an i.i.d. standard Gaussian distribution. Here $\Xi$ is the noise matrix and $\sigma_n$ is the noise level. The sampling ratio is $sr = |\Omega|/(mn)$ and the degree-of-freedom ratio is $fr = r(m+n-r)/|\Omega|$, where $|\Omega|$ is the number of samples. All the experiments are performed on a PC with an Intel Core i5 2.5 GHz processor and 4 GB of RAM.
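The following snippet shows how synthetic test data of the form (49) and the sampling operator $P_\Omega$ can be generated with NumPy; the function name, the use of a dense boolean mask, and the seed are our own illustrative choices.

```python
import numpy as np

def make_problem(m, n, r, sample_ratio, sigma_n=0.0, seed=0):
    """Generate M = M_L M_R^T + sigma_n * Xi (Eq. (49)) and a random sampling mask."""
    rng = np.random.default_rng(seed)
    M_L = rng.standard_normal((m, r))
    M_R = rng.standard_normal((n, r))
    Xi = rng.standard_normal((m, n))
    M = M_L @ M_R.T + sigma_n * Xi

    num_samples = int(round(sample_ratio * m * n))             # |Omega| = sr * m * n
    idx = rng.choice(m * n, size=num_samples, replace=False)   # random index subset Omega
    mask = np.zeros(m * n, dtype=bool)
    mask[idx] = True
    mask = mask.reshape(m, n)

    M_obs = np.where(mask, M, 0.0)       # b = P_Omega(M), stored as a masked matrix
    fr = r * (m + n - r) / num_samples   # degree-of-freedom ratio
    return M, mask, M_obs, fr
```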
8.1. Comparison with the ORNNM algorithm

In this subsection, we compare the proposed ARNNM algorithm with the original one. The following stopping criterion is applied to both algorithms:

$$\frac{\|X^k - X^{k+1}\|_F}{\max\{1, \|X^k\|_F\}} < \epsilon. \tag{50}$$

Note that the approximated SVD and rank estimation approaches in the implementations of the two algorithms are the same; the only difference between them is that ARNNM has the additional accelerated procedure, i.e., Steps 2 and 3 in Algorithm 2. The rank estimation approach has a great effect on the final recovery accuracy: if the final estimated rank is not precise, the relative error is usually larger than 1e-3 in our tests. To make a fair comparison, we tune the parameter $\rho$ so that the final estimated rank is precise in every run of this subsection. The remaining parameters are set as follows: $\epsilon$ = 1e-4, $\lambda$ = 1e-6, $p$ = 1, $r_1$ = 10, $r_{\max}$ = 100, $\varepsilon_0$ = 10, $\eta$ = 0.9, $t_{\max}$ = 1.
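A sketch of the auxiliary rules used in the experiments, the rank estimate (47), the stopping criterion (50), and the relative error (48), is given below; rho, r_max and the tolerance are the manually chosen parameters mentioned above, and the helper names are ours.

```python
import numpy as np

def estimate_rank(s, rho, m, n):
    """Rank estimate (47): largest index r_k with sigma_{r_k} > rho * sigma_1."""
    r_k = int(np.sum(s > rho * s[0]))   # singular values s are sorted in descending order
    return min(m, n, r_k)

def stop(X_prev, X_new, tol=1e-4):
    """Stopping criterion (50)."""
    return np.linalg.norm(X_prev - X_new, "fro") / max(1.0, np.linalg.norm(X_prev, "fro")) < tol

def rel_err(X, M):
    """Relative error (48) used to measure the recovery level."""
    return np.linalg.norm(X - M, "fro") / max(1.0, np.linalg.norm(M, "fro"))
```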
[Fig. A1. Computation time vs. dimension of the problem. The labels of the x-axis give one dimension of the square matrix; for example, the label "4000" means that the matrix is 4000 x 4000.]

[Fig. A2. Computation time vs. rank of the matrix. Here the dimension of the matrix is 4000 x 4000.]
We test the algorithms on noiseless synthetic data and then on noisy synthetic data. Table A1 shows the performance of both algorithms in the noiseless case. It can be observed that the proposed ARNNM algorithm achieves accuracy close to that of the ORNNM algorithm with distinctly fewer iterations and less computational time. Table A2 shows the performance of both algorithms in the noisy case. Compared with Table A1, one can see that the accuracy of both algorithms decreases due to the noise. However, ARNNM is still faster than ORNNM, and it achieves accuracy close to that of ORNNM. In some problem instances in Tables A1 and A2, ARNNM even requires only about half the computational time of ORNNM.

[Fig. A3. Comparison of five algorithms (ARNNM, ORNNM, ALT, FPCA, sIRLS; relative RMSE vs. CPU time) in the noiseless case (sigma_n = 0), r = 60. The problem sizes are (a) 1000 x 1000, (b) 2000 x 2000, and (c) 3000 x 3000.]
[Fig. A4. Comparison of five algorithms (ARNNM, ORNNM, ALT, FPCA, sIRLS; relative RMSE vs. CPU time) in the noisy case (sigma_n = 0.1), r = 60. The problem sizes are (a) 1000 x 1000, (b) 2000 x 2000, and (c) 3000 x 3000.]
We further test our approach on larger matrices and larger ranks. Figs. A1 and A2 show the computation time with respect to the dimension of the problem and the rank of the matrix, respectively. It is clear that ARNNM remains faster than ORNNM for larger matrices and larger ranks.
8.2. Comparison with other algorithms
In this subsection, we compare the performance of ARNNM with ORNNM and three other state-of-the-art algorithms: FPCA [5], sIRLS [3], and Active ALT [4]. For these three algorithms, we use the codes released by their authors, with the parameters set by their authors; the implementation details are given in the Appendix. Figs. A3 and A4 show the performance of all algorithms in the noiseless case and the noisy case, respectively. They are the problem instances of Tables A1 and A2 with r = 60. The results show that ARNNM is the fastest algorithm among all. It can also be observed that FPCA does not converge in Fig. A4(a) and (b) due to the noise, while ARNNM always converges. The accuracy of ARNNM in the noisy case is lower than in the noiseless case, but so is that of the other algorithms, and ARNNM achieves good accuracy among them. Comparing ARNNM and ORNNM in Fig. A3, it can be observed that ARNNM descends much faster than ORNNM at first, but afterwards they have similar descent rates. This illustrates that the accelerated procedure has a significant effect at first; later, when the top singular values of the variable get closer to those of the optimum, the accelerated procedure is less useful. However, the accelerated procedure still saves a distinct amount of time overall.
9. Conclusion
In this paper, we have developed an accelerated reweighted nuclear norm minimization algorithm, named ARNNM, to recover a low rank matrix. The proposed algorithm is an accelerated version of a state-of-the-art algorithm, which we call ORNNM in this paper. We design an accelerated procedure to make the objective function descend further at every iteration, and we prove that the proposed algorithm is guaranteed to converge to a stationary point of the reweighted nuclear norm minimization problem. Numerical results show that ARNNM is significantly faster than ORNNM and several other state-of-the-art algorithms.
Appendix A. Implementation details for the comparison
The code for FPCA is downloaded from http://www.columbia.edu/sm2756/FPCA.htm. The code for sIRLS is downloaded from http://faculty.washington.edu/mfazel/. We add our problem instances to their code, and set the parameter η in their code to 2. The code for Active ALT is downloaded from http://www.cs.utexas.edu/cjhsieh/. We use some of their code (mainly the power method and some matrix computation routines) in our implementations of ARNNM and ORNNM. The parameters required by ARNNM or ORNNM are given in Section 8.1, and Section 8.2 uses the same parameters.
Table A1. Comparison of the ARNNM and ORNNM algorithms without noise (i.e., sigma_n = 0).

| Problem (m x n) | r  | SR   | FR     | rho   | ORNNM iter. | ORNNM time | ORNNM rel.err. | ARNNM iter. | ARNNM time | ARNNM rel.err. |
|-----------------|----|------|--------|-------|-------------|------------|----------------|-------------|------------|----------------|
| 1000 x 1000     | 20 | 0.30 | 0.1320 | 0.050 | 56          | 7.82       | 5.44e-04       | 37          | 3.82       | 5.71e-04       |
| 1000 x 1000     | 40 | 0.30 | 0.2613 | 0.040 | 74          | 11.84      | 9.69e-04       | 52          | 7.64       | 9.92e-04       |
| 1000 x 1000     | 60 | 0.30 | 0.3880 | 0.030 | 114         | 23.04      | 1.55e-03       | 76          | 15.35      | 1.53e-03       |
| 2000 x 2000     | 20 | 0.30 | 0.0663 | 0.040 | 37          | 15.43      | 3.93e-04       | 29          | 10.42      | 4.10e-04       |
| 2000 x 2000     | 40 | 0.30 | 0.1320 | 0.030 | 43          | 24.06      | 5.65e-04       | 35          | 19.48      | 5.58e-04       |
| 2000 x 2000     | 60 | 0.30 | 0.1970 | 0.022 | 53          | 38.45      | 6.63e-04       | 42          | 32.14      | 7.13e-04       |
| 3000 x 3000     | 20 | 0.25 | 0.0532 | 0.030 | 44          | 39.95      | 4.19e-04       | 33          | 22.45      | 4.52e-04       |
| 3000 x 3000     | 40 | 0.25 | 0.1060 | 0.020 | 52          | 62.01      | 6.12e-04       | 39          | 41.45      | 5.73e-04       |
| 3000 x 3000     | 60 | 0.25 | 0.1584 | 0.015 | 57          | 83.40      | 7.81e-04       | 45          | 68.11      | 7.34e-04       |

Table A2. Comparison of the ARNNM and ORNNM algorithms with noise (i.e., sigma_n = 0.1).

| Problem (m x n) | r  | SR   | FR     | rho   | ORNNM iter. | ORNNM time | ORNNM rel.err. | ARNNM iter. | ARNNM time | ARNNM rel.err. |
|-----------------|----|------|--------|-------|-------------|------------|----------------|-------------|------------|----------------|
| 1000 x 1000     | 20 | 0.30 | 0.1320 | 0.050 | 56          | 7.71       | 8.51e-03       | 37          | 3.74       | 8.45e-03       |
| 1000 x 1000     | 40 | 0.30 | 0.2613 | 0.040 | 74          | 12.32      | 6.13e-03       | 52          | 7.83       | 6.07e-03       |
| 1000 x 1000     | 60 | 0.30 | 0.3880 | 0.030 | 114         | 24.43      | 5.39e-03       | 76          | 16.79      | 5.32e-03       |
| 2000 x 2000     | 20 | 0.30 | 0.0663 | 0.040 | 37          | 15.65      | 7.61e-03       | 29          | 10.95      | 7.54e-03       |
| 2000 x 2000     | 40 | 0.30 | 0.1320 | 0.030 | 43          | 25.71      | 5.07e-03       | 35          | 20.09      | 4.99e-03       |
| 2000 x 2000     | 60 | 0.30 | 0.1970 | 0.022 | 53          | 41.34      | 4.03e-03       | 42          | 33.51      | 3.97e-03       |
| 3000 x 3000     | 20 | 0.25 | 0.0532 | 0.030 | 44          | 41.37      | 7.45e-03       | 33          | 23.15      | 7.37e-03       |
| 3000 x 3000     | 40 | 0.25 | 0.1060 | 0.020 | 53          | 67.30      | 4.87e-03       | 39          | 43.20      | 4.80e-03       |
| 3000 x 3000     | 60 | 0.25 | 0.1584 | 0.015 | 57          | 87.20      | 3.88e-03       | 45          | 72.13      | 3.76e-03       |
References
[1] J. Ma, J. Xu, Y. Bao, S. Yu, Compressive sensing and its application: from sparse to low-rank regularized optimization, Signal Process. 28 (2012) 609-623.
[2] K. Konishi, K. Uruma, T. Takahashi, T. Furukawa, Iterative partial matrix shrinkage algorithm for matrix rank minimization, Signal Process. 100 (2014) 124-131.
[3] K. Mohan, M. Fazel, Iterative reweighted algorithms for matrix rank minimization, J. Mach. Learn. Res. 13 (1) (2012) 3441-3473.
[4] C. Hsieh, P. Olsen, Nuclear norm minimization via active subspace selection, in: Proceedings of the 31st International Conference on Machine Learning, 2014, pp. 575-583.
[5] S. Ma, D. Goldfarb, L. Chen, Fixed point and Bregman iterative methods for matrix rank minimization, Math. Program. 128 (1-2) (2011) 321-353.
[6] H. Akcay, Subspace-based spectrum estimation in frequency-domain by regularized nuclear norm minimization, Signal Process. 99 (2014) 69-85.
[7] T. Takahashi, K. Konishi, T. Furukawa, Rank minimization approach to image inpainting using null space based alternating optimization, in: Proceedings of the IEEE International Conference on Image Processing, 2012, pp. 1717-1720.
[8] G. Liu, Z. Lin, Y. Yu, Robust subspace segmentation by low-rank representation, in: Proceedings of the 27th International Conference on Machine Learning, 2010, pp. 663-670.
[9] E.J. Candes, B. Recht, Exact matrix completion via convex optimization, Found. Comput. Math. 9 (6) (2009) 717-772.
[10] Z. Liu, L. Vandenberghe, Interior-point method for nuclear norm approximation with application to system identification, SIAM J. Matrix Anal. Appl. 31 (3) (2009) 1235-1256.
[11] M. Fazel, Matrix Rank Minimization with Applications, Ph.D. thesis, Stanford University, Stanford, CA, 2002.
[12] M. Zhang, Z.-H. Huang, Y. Zhang, Restricted p-isometry properties of nonconvex matrix recovery, IEEE Trans. Inf. Theory 59 (7) (2013) 4316-4323.
[13] L. Liu, W. Huang, D.R. Chen, Exact minimum rank approximation via Schatten p-norm minimization, J. Comput. Appl. Math. 267 (2014) 218-227.
[14] M. Lai, Y. Xu, W. Yin, Improved iteratively reweighted least squares for unconstrained smoothed lq minimization, SIAM J. Numer. Anal. 51 (2) (2013) 927-957.
[15] L. Kong, N. Xiu, Exact low-rank matrix recovery via nonconvex Schatten p-minimization, Asia-Pac. J. Oper. Res. 30 (03).
[16] Y. Li, Y. Zhang, Z. Huang, A reweighted nuclear norm minimization algorithm for low rank matrix recovery, J. Comput. Appl. Math. 263 (2014) 338-350.
[17] I. Daubechies, M. Defrise, C. De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Commun. Pure Appl. Math. 57 (11) (2004) 1413-1457.
[18] N. Halko, P.G. Martinsson, J.A. Tropp, Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions, SIAM Rev. 53 (2) (2011) 217-288.