Accelerated reweighted nuclear norm minimization algorithm for low rank matrix recovery


Fast communication


Xiaofan Lin, Gang Wei


School of Electronic and Information Engineering, South China University of Technology, Wushan Road, Tianhe District, Guangzhou 510641, PR China


Article history: Received 16 August 2014; received in revised form 3 November 2014; accepted 5 February 2015.

Abstract

In this paper we propose an accelerated reweighted nuclear norm minimization algorithm to recover a low rank matrix. Our approach differs from other iterative reweighted algorithms in that we design an accelerated procedure which makes the objective function descend further at every iteration. The proposed algorithm is an accelerated version of a state-of-the-art algorithm. We provide a new analysis of the original algorithm to derive our accelerated version, and prove that our algorithm is guaranteed to converge to a stationary point of the reweighted nuclear norm minimization problem. Numerical results show that our algorithm requires distinctly fewer iterations and less computational time than the original algorithm to achieve the same (or very close) accuracy; in some problem instances it requires only about 50% of the computational time of the original algorithm. It is also notably faster than several other state-of-the-art algorithms.


Keywords: Matrix rank minimization; Matrix completion; Compressed sensing


1. Introduction


There is a rapidly growing interest in the low rank matrix recovery problem, or the rank minimization problem, which recovers an unknown low rank matrix from very limited information [1–5]. This paper deals with the following affine rank minimization problem:


$$\min_X \ \operatorname{rank}(X) \quad \text{s.t.} \quad \mathcal{A}(X) = b, \tag{1}$$

where the linear map $\mathcal{A}: \mathbb{R}^{m\times n} \to \mathbb{R}^{s}$ and the vector $b \in \mathbb{R}^{s}$ are known. The above problem aims to find a matrix of minimum rank that satisfies a given system of linear equality constraints, a useful idea that can be applied to various applications such as signal or image processing [6,7], subspace segmentation [8], collaborative filtering [9] and system identification [10]. Although (1) is simple in form, it is a challenging optimization problem due to the discrete nature of the rank function. A commonly used heuristic, introduced in [11], is to replace the rank function with the nuclear norm, i.e., the sum of the singular values of the matrix. This technique is based on the fact that nuclear norm minimization is the tightest convex relaxation of the rank minimization problem. The new formulation is given by

$$\min_X \ \|X\|_* \quad \text{s.t.} \quad \mathcal{A}(X) = b, \tag{2}$$


which can be rewritten as follows under some conditions (recall the Lagrangian relaxation):

$$\min_X \ \lambda\|X\|_* + \tfrac{1}{2}\|\mathcal{A}(X) - b\|_2^2, \tag{3}$$


where $\|X\|_* := \sum_{i=1}^{r} \sigma_i(X)$ denotes the nuclear norm. However, nuclear norm minimization requires more measurements for exact recovery of the low rank solution. Recently, a tighter relaxation, called the Schatten p-norm minimization [12–14], was introduced in place of nuclear norm minimization in order to give a better approximation to the original problem. The Schatten p-norm with $0 < p \le 1$ is defined as

$$\|X\|_p := \left( \sum_{i=1}^{r} \sigma_i(X)^p \right)^{1/p}. \tag{4}$$

(Strictly speaking, the Schatten p-norm is not a norm when $0 < p < 1$, since it is nonconvex; however, it is customarily called a norm.) It is easy to see that when $p = 1$ the Schatten p-norm is exactly the nuclear norm, and as $p$ tends towards 0 the Schatten p-norm gets closer to the rank function. Thus, the Schatten p-norm is more general than the nuclear norm. Theoretical analysis in [12,13,15] shows that the Schatten p-norm needs fewer measurements for exact recovery when $p$ is small.
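To make definition (4) concrete (together with the smoothed variant (7) introduced below), here is a small NumPy sketch that evaluates both from the singular values of a matrix; the function name and structure are ours, added purely for illustration.

```python
import numpy as np

def schatten_p_norm(X, p, eps=0.0):
    """Schatten p-norm (4); with eps > 0 it is the smoothed surrogate (7)."""
    s = np.linalg.svd(X, compute_uv=False)      # singular values sigma_i(X)
    return float(np.sum((s + eps) ** p) ** (1.0 / p))

# sanity check: for p = 1 and eps = 0 this reduces to the nuclear norm
X = np.random.randn(5, 4)
assert np.isclose(schatten_p_norm(X, 1.0),
                  np.linalg.svd(X, compute_uv=False).sum())
```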

With the above definition, the Schatten p-norm minimization can be formulated as follows:

$$\min_X \ P_0(X) := \lambda p\,\|X\|_p^p + \tfrac{1}{2}\|\mathcal{A}(X) - b\|_2^2. \tag{5}$$

When $0 < p < 1$, the above problem is nonconvex. To solve this nonconvex problem efficiently, several iterative reweighted algorithms have been proposed and analyzed in the recent literature [2,3,14]. The key idea of the iterative reweighted technique is to solve a convex problem with a given weight at each iteration, and to update the weight at every turn. This idea is commonly used to solve Schatten p-norm or $\ell_p$ norm minimization.

In this paper we propose a novel iterative reweighted algorithm that solves the Schatten p-norm minimization problem (i.e., (5)) very efficiently. Different from other iterative reweighted algorithms, we add an accelerated procedure which makes the objective function descend further at every iteration. The proposed algorithm is an accelerated version of a state-of-the-art algorithm, called the reweighted nuclear norm minimization (RNNM) algorithm [16]. Our accelerated version is notably faster than the original one, and achieves the same (or very close) accuracy.

The rest of this paper is organized as follows. In Section 2, we give some preliminary results that are used in our analysis. In Section 3, we give our own analysis of the original RNNM algorithm, and Section 4 uses this analysis to derive a sketch of the accelerated version. Some algorithmic details are discussed in Sections 5 and 6. The convergence analysis, which proves that our algorithm is guaranteed to converge to a stationary point of $P_0(X)$ (see (5)), is given in Section 7. Numerical results are reported in Section 8. Section 9 concludes the paper.

2. Preliminaries

We first introduce the following unconstrained smooth nonconvex problem to approximate (5):

$$\min_X \ P(X,\varepsilon) := \lambda p\,\|X\|_{p,\varepsilon}^p + \tfrac{1}{2}\|\mathcal{A}(X) - b\|_2^2, \tag{6}$$

where $X \in \mathbb{R}^{m\times n}$ and

$$\|X\|_{p,\varepsilon} := \left( \sum_{i=1}^{m} (\sigma_i(X) + \varepsilon)^p \right)^{1/p}, \tag{7}$$

with $\varepsilon > 0$. (Again, strictly speaking $\|\cdot\|_{p,\varepsilon}$ is not a norm, but we keep the norm notation by convention.) Here we assume $m \le n$ without loss of generality. Comparing (7) with (4), one can see that as $\varepsilon$ tends to 0, (7) gets closer to (4). The following definition will be used frequently in this paper.

Definition 1. Let $X = U\Sigma V^{T}$ be the reduced singular value decomposition (SVD) of $X$ with rank $r$. For each $v \ge 0$, the weighted singular value shrinkage operator $D_v(\cdot, w)$ is defined by

$$D_v(X, w) = U\,\mathrm{Diag}\{(\sigma_i(X) - v w_i)_+,\ i = 1,2,\ldots,r\}\,V^{T},$$

where $(a)_+ := \max\{a, 0\}$ for any $a \in \mathbb{R}$; $w_i$, $i = 1,2,\ldots,r$, are the elements of the vector $w$; and $\sigma_i(X)$, $i = 1,2,\ldots,r$, are the singular values of $X$.

With Definition 1, we give the following lemma, which has been proved in [16].

Lemma 1. For each $v \ge 0$ and a constant matrix $Y \in \mathbb{R}^{m\times n}$, the weighted singular value shrinkage operator $D_v(Y, w)$ is the optimal solution of the following problem:

$$\min_X \ v \sum_{i=1}^{m} w_i \sigma_i(X) + \tfrac{1}{2}\|X - Y\|_F^2.$$

For convenience, the abbreviation ORNNM refers to the original RNNM algorithm that we aim to accelerate in the sequel.
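As an illustration of Definition 1 and Lemma 1, the following NumPy sketch implements the weighted singular value shrinkage operator $D_v(X, w)$ via a dense SVD. The function name and the dense-SVD route are our own choices for exposition; a large-scale implementation would use the approximate SVD of Section 5.

```python
import numpy as np

def weighted_svt(X, w, v):
    """Weighted singular value shrinkage D_v(X, w) from Definition 1:
    each singular value sigma_i is replaced by (sigma_i - v*w_i)_+ .
    w must supply one weight per singular value of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = np.maximum(s - v * np.asarray(w, dtype=float), 0.0)
    return (U * s_shrunk) @ Vt

# By Lemma 1, weighted_svt(Y, w, v) minimizes
#     v * sum_i w_i * sigma_i(X) + 0.5 * ||X - Y||_F^2   over X.
```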

3. A new analysis of the original RNNM algorithm

In this paper, we propose an accelerated procedure to improve the performance of ORNNM. This algorithm has been analyzed in [16]; however, to derive our accelerated version, here we give our own analysis. Inspired by [17,4], where auxiliary functions for analyzing $\ell_1$ norm/Schatten p-norm minimization were constructed, we define a different auxiliary function as follows:

$$Q(X, Y, w, \varepsilon) = F(X, w, \varepsilon) + G(X, Y), \tag{8}$$

where

$$F(X, w, \varepsilon) := \lambda \sum_{i=1}^{m} \left( p\,w_i(\sigma_i(X) + \varepsilon) + (1-p)\,w_i^{p/(p-1)} \right) + \tfrac{1}{2}\|\mathcal{A}(X) - b\|_2^2 \tag{9}$$

and

$$G(X, Y) := -\tfrac{1}{2}\|\mathcal{A}(X - Y)\|_2^2 + \tfrac{1}{2}\|X - Y\|_F^2. \tag{10}$$

In the following, we illustrate that minimizing $Q(X, Y, w, \varepsilon)$ leads to minimizing $P_0(X)$ (see (5) for its definition). First, it can be verified that $w^*$, whose entries are $w_i^* = (\sigma_i(X) + \varepsilon)^{p-1}$, $i = 1,2,\ldots,m$, minimizes $F(X, w, \varepsilon)$ over $w$, since $w_i^* = (\sigma_i(X) + \varepsilon)^{p-1}$ is an optimal solution of the following convex problem:

$$\min_{w_i > 0} \ p\,w_i(\sigma_i(X) + \varepsilon) + (1-p)\,w_i^{p/(p-1)}. \tag{11}$$

Therefore, substituting $w_i^* = (\sigma_i(X) + \varepsilon)^{p-1}$, $i = 1,\ldots,m$, into (9), we have

$$\min_w F(X, w, \varepsilon) = \lambda \sum_{i=1}^{m} (\sigma_i(X) + \varepsilon)^p + \tfrac{1}{2}\|\mathcal{A}(X) - b\|_2^2 \ \ge\ \lambda p \sum_{i=1}^{m} (\sigma_i(X) + \varepsilon)^p + \tfrac{1}{2}\|\mathcal{A}(X) - b\|_2^2 = P(X, \varepsilon), \tag{12}$$

where the inequality follows from the fact that $0 < p \le 1$. Here the multiplier $p$ is necessary because we focus on solving (6), whose stationary point the ORNNM algorithm converges to.

Secondly, it can also be verified that for a fixed $Y$, the second derivative of $G(X, Y)$ is $H_X = (I - \mathcal{A}^*\mathcal{A}) \otimes I$, where $\otimes$ denotes the Kronecker product. If $\|\mathcal{A}^*\mathcal{A}\| \le 1$, then the corresponding Hessian matrix $H_X$ is positive semidefinite, which means $G(X, Y) \ge 0$. In that case,

$$Q(X, Y, w, \varepsilon) \ge F(X, w, \varepsilon). \tag{13}$$

The equality holds when $Y = X$. Combining (12) and (13), we have the following lemma.

Lemma 2. If $\|\mathcal{A}^*\mathcal{A}\| \le 1$, then the following inequalities hold:

$$Q(X, Y, w, \varepsilon) \ge F(X, w, \varepsilon) \ge P(X, \varepsilon) \ge P_0(X). \tag{14}$$

The last part of (14) can be easily verified from the definitions of $P(X, \varepsilon)$ and $P_0(X)$. Without loss of generality, we assume that the condition $\|\mathcal{A}^*\mathcal{A}\| \le 1$ always holds, since $\mathcal{A}$ can always be renormalized. In Section 8, where algorithms are compared on the matrix completion problem, $\|\mathcal{A}^*\mathcal{A}\| \le 1$ holds since $\mathcal{A}$ is a projector. Due to Lemma 2, we expect to minimize $P_0(X)$ by minimizing $Q(X, Y, w, \varepsilon)$.

Next, we illustrate that the ORNNM algorithm is a block-coordinate descent algorithm for minimizing the function $Q(X, Y, w, \varepsilon)$. By basic matrix computations, we have

$$\begin{aligned}
\|\mathcal{A}(X) - b\|_2^2 - \|\mathcal{A}(X - Y)\|_2^2
&= (\mathcal{A}(X))^T\mathcal{A}(X) - 2(\mathcal{A}(X))^T b + b^T b - \left( (\mathcal{A}(X))^T\mathcal{A}(X) - 2(\mathcal{A}(X))^T\mathcal{A}(Y) + (\mathcal{A}(Y))^T\mathcal{A}(Y) \right) \\
&= -2(\mathcal{A}(X))^T b + b^T b + 2(\mathcal{A}(X))^T\mathcal{A}(Y) - (\mathcal{A}(Y))^T\mathcal{A}(Y) \\
&= 2(\mathcal{A}(X - Y))^T(\mathcal{A}(Y) - b) + (\mathcal{A}(Y) - b)^T(\mathcal{A}(Y) - b) \\
&= 2\langle \mathcal{A}(X - Y), \mathcal{A}(Y) - b \rangle + \langle \mathcal{A}(Y) - b, \mathcal{A}(Y) - b \rangle \\
&= 2\langle X - Y, \mathcal{A}^*(\mathcal{A}(Y) - b) \rangle + \langle \mathcal{A}^*(\mathcal{A}(Y) - b), \mathcal{A}^*(\mathcal{A}(Y) - b) \rangle,
\end{aligned} \tag{15}$$

where $\langle \cdot, \cdot \rangle$ denotes the inner product. Denote by $\nabla f(Y) = \mathcal{A}^*(\mathcal{A}(Y) - b)$ the gradient of the function $f(Y) := \tfrac{1}{2}\|\mathcal{A}(Y) - b\|_2^2$; then

$$\|\mathcal{A}(X) - b\|_2^2 - \|\mathcal{A}(X - Y)\|_2^2 = 2\langle X - Y, \nabla f(Y) \rangle + \langle \nabla f(Y), \nabla f(Y) \rangle. \tag{16}$$

Therefore,

$$\begin{aligned}
\|\mathcal{A}(X) - b\|_2^2 - \|\mathcal{A}(X - Y)\|_2^2 + \|X - Y\|_F^2
&= 2\langle X - Y, \nabla f(Y) \rangle + \langle \nabla f(Y), \nabla f(Y) \rangle + \langle X - Y, X - Y \rangle \\
&= \langle X - Y + \nabla f(Y),\ X - Y + \nabla f(Y) \rangle = \|X - Y + \nabla f(Y)\|_F^2.
\end{aligned} \tag{17}$$

With (8)–(10) and (17), (8) can be further formulated as follows:

$$Q(X, Y, w, \varepsilon) = \tfrac{1}{2}\|X - (Y - \nabla f(Y))\|_F^2 + \lambda \sum_{i=1}^{m} \left( p\,w_i(\sigma_i(X) + \varepsilon) + (1-p)\,w_i^{p/(p-1)} \right). \tag{18}$$

By Lemma 1, for fixed $Y$, $w$ and $\varepsilon$, the minimum of $Q(X, Y, w, \varepsilon)$ over $X$ is attained at $X_{k+1}^* = D_{\lambda p}(Y^k - \nabla f(Y^k), w^k)$. (In this paper, the superscript $k$ (or $k+1$) and the subscript $k$ (or $k+1$) have the same meaning; we place them in different positions only to keep the notation clear.) By setting $Y = X^k$ we have $G(X^k, Y) = 0$ and $Q(X^k, Y, w^k, \varepsilon^k) = F(X^k, w^k, \varepsilon^k)$; therefore, the minimum of $Q(X, Y, w, \varepsilon)$ over $Y$ is attained at $Y_{k+1}^* = X^k$. By (11), the minimum of $Q(X, Y, w, \varepsilon)$ over $w$ is attained at $w_{k+1}^*$, whose entries are $w_i^* = (\sigma_i(X) + \varepsilon)^{p-1}$, $i = 1,2,\ldots,m$. From (9) we can see that if the sequence $\varepsilon^k$ decreases, $Q(X, Y, w, \varepsilon^k)$ decreases. The above statements indicate a block-coordinate descent algorithm to minimize $Q(X, Y, w, \varepsilon)$. Therefore, according to Lemma 2, a block-coordinate descent algorithm, outlined in Algorithm 1, can be derived to solve (5). It can be observed that Algorithm 1 is equivalent to ORNNM in its original paper [16].

Algorithm 1. Equivalent ORNNM algorithm.
Input: linear map $\mathcal{A}: \mathbb{R}^{m\times n} \to \mathbb{R}^{s}$, vector $b \in \mathbb{R}^{s}$, $p \in (0,1)$, $\lambda, \varepsilon_0 > 0$, $\eta \in (0,1)$.
Output: matrix $X^* \in \mathbb{R}^{m\times n}$.
Initialize: $Y^0 = 0$, $k := 0$, $w^0 = 0$.
Loop:
1. $X^{k+1} = D_{\lambda p}(Y^k - \nabla f(Y^k), w^k)$;
2. $Y^{k+1} = X^{k+1}$;
3. $w_i^{k+1} = (\sigma_i(X^{k+1}) + \varepsilon^k)^{p-1}$;
4. $X^* = X^{k+1}$ if converged; otherwise $\varepsilon^{k+1} = \eta\,\varepsilon^k$, $k := k+1$, go to Step 1.
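To make the block-coordinate structure of Algorithm 1 concrete, here is a schematic NumPy translation. It reuses the `weighted_svt` helper sketched after Lemma 1, represents the linear map by two callables `A` and `At` (its adjoint), and the default parameter values are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def ornnm(A, At, b, shape, p=0.8, lam=1e-6, eps0=10.0, eta=0.9,
          max_iter=200, tol=1e-4):
    """Schematic version of Algorithm 1 (ORNNM).
    A  : callable, the linear map R^{m x n} -> R^s (assumed ||A*A|| <= 1)
    At : callable, its adjoint R^s -> R^{m x n}
    Uses the weighted_svt helper sketched after Lemma 1."""
    m, n = shape
    Y = np.zeros(shape)
    w = np.zeros(min(m, n))
    eps = eps0
    X_old = Y
    for _ in range(max_iter):
        grad = At(A(Y) - b)                       # gradient of f(Y) = 0.5*||A(Y) - b||^2
        X = weighted_svt(Y - grad, w, v=lam * p)  # Step 1
        Y = X                                     # Step 2
        s = np.linalg.svd(X, compute_uv=False)
        w = (s + eps) ** (p - 1.0)                # Step 3: reweighting
        if np.linalg.norm(X - X_old) / max(1.0, np.linalg.norm(X_old)) < tol:
            return X                              # Step 4: relative-change stopping test
        X_old, eps = X, eta * eps
    return X
```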

4. Sketch of the accelerated RNNM algorithm

In this section, we point out a way to accelerate the ORNNM algorithm according to the analysis in the last section, and give a sketch of the proposed accelerated version. Assume $Q(X^k, Y^k, w^k, \varepsilon^k)$ is generated by Algorithm 1 at the end of the $k$-th ($k \ge 1$) iteration; then according to (13),

$$Q(X^k, Y^k, w^k, \varepsilon^k) = F(X^k, w^k, \varepsilon^k). \tag{19}$$

In Step 1 of the $(k+1)$-th iteration,

$$Q(X^{k+1}, Y^k, w^k, \varepsilon^k) \le Q(X^k, Y^k, w^k, \varepsilon^k). \tag{20}$$

In Step 2 of the $(k+1)$-th iteration,

$$Q(X^{k+1}, Y^{k+1}, w^k, \varepsilon^k) = Q(X^{k+1}, X^{k+1}, w^k, \varepsilon^k) = F(X^{k+1}, w^k, \varepsilon^k) \le Q(X^{k+1}, Y^k, w^k, \varepsilon^k). \tag{21}$$

It should be noted that choosing $Y^{k+1} = X^{k+1}$ is not the only option in Step 2, since in the next iteration $X^{k+2} = D_{\lambda p}(Y^{k+1} - \nabla f(Y^{k+1}), w^{k+1})$ always minimizes $Q(X, Y^{k+1}, w^{k+1}, \varepsilon^{k+1})$ over $X$, for any $Y^{k+1}$. In fact, by choosing $\hat{Y}^{k+1} = \hat{X}^{k+1}$ satisfying

$$F(\hat{X}^{k+1}, w^k, \varepsilon^k) \le F(X^{k+1}, w^k, \varepsilon^k), \tag{22}$$

we obtain

$$Q(\hat{X}^{k+1}, \hat{Y}^{k+1}, w^k, \varepsilon^k) \le Q(X^{k+1}, Y^{k+1}, w^k, \varepsilon^k). \tag{23}$$

The above inequality follows from the facts that

$$F(\hat{X}^{k+1}, w^k, \varepsilon^k) = Q(\hat{X}^{k+1}, \hat{Y}^{k+1}, w^k, \varepsilon^k), \qquad F(X^{k+1}, w^k, \varepsilon^k) = Q(X^{k+1}, Y^{k+1}, w^k, \varepsilon^k).$$

Eq. (23) means that the function $Q(X, Y, w^k, \varepsilon^k)$ descends further by choosing $X = Y = \hat{X}^{k+1}$ at Step 2 than with the original option $X = Y = X^{k+1}$. According to the above analysis, a sketch of an accelerated reweighted nuclear norm minimization (ARNNM) algorithm can be derived; it is outlined in Algorithm 2. Besides, we have the following theorem.

Theorem 1. The ARNNM algorithm (i.e., Algorithm 2) descends further than the ORNNM algorithm (i.e., Algorithm 1) at every iteration.

Algorithm 2. Sketch of the accelerated RNNM (ARNNM) algorithm.
Input: linear map $\mathcal{A}: \mathbb{R}^{m\times n} \to \mathbb{R}^{s}$, vector $b \in \mathbb{R}^{s}$, $p \in (0,1)$, $\lambda, \varepsilon_0 > 0$, $\eta \in (0,1)$.
Output: matrix $X^* \in \mathbb{R}^{m\times n}$.
Initialize: $Y^0 = 0$, $k := 0$, $w^0 = 0$.
Loop:
1. $X^{k+1} = D_{\lambda p}(Y^k - \nabla f(Y^k), w^k)$;
2. choose $Y^{k+1} = \hat{X}^{k+1}$ such that $F(\hat{X}^{k+1}, w^k, \varepsilon^k) \le F(X^{k+1}, w^k, \varepsilon^k)$;
3. $X^{k+1} \leftarrow \hat{X}^{k+1}$;
4. $w_i^{k+1} = (\sigma_i(X^{k+1}) + \varepsilon^k)^{p-1}$;
5. $X^* = X^{k+1}$ if converged; otherwise $\varepsilon^{k+1} = \eta\,\varepsilon^k$, $k := k+1$, go to Step 1.

5. Power method for the approximate SVD

In both the ARNNM and ORNNM algorithms, computing the SVD of a large matrix is required in Step 1 and is usually expensive. Inspired by [4,18], we use the power method [18] to compute an approximate SVD, and use the eigenvectors from the previous iteration to initialize the power method. This approach reduces the computational time spent on the SVD. The power method is summarized in Algorithm 3.

Algorithm 3. Power method (approximate SVD for Step 1 of Algorithm 2).
Input: matrix $A$, rank $rr$, initial $R \in \mathbb{R}^{n\times rr}$.
Output: approximate SVD of the matrix, $A \approx U\Sigma V^{T}$.
1. $Q \leftarrow \mathrm{QR}(AR)$;
2. for $t = 1,2,\ldots,T_{\max}$ do
3. $\quad Q \leftarrow \mathrm{QR}(AA^{T}Q)$;
4. end
5. $[\hat{U}, \Sigma, V^{T}] = \mathrm{SVD}(Q^{T}A)$;
6. $U = Q\hat{U}$.

The power method is not part of the accelerated procedure proposed in this paper. In Section 8, where the various algorithms are compared, we apply the power method to both the ORNNM and ARNNM algorithms.
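A compact NumPy rendering of Algorithm 3 might look as follows; the function name, the default number of power iterations, and the random fallback initialization are our own choices for this sketch.

```python
import numpy as np

def power_svd(A, rr, R=None, T_max=3):
    """Approximate rank-rr SVD by subspace (power) iteration, as in Algorithm 3.
    Warm-starting R with the right singular vectors from the previous ARNNM
    iteration is the initialization the paper suggests."""
    m, n = A.shape
    if R is None:
        R = np.random.randn(n, rr)
    Q, _ = np.linalg.qr(A @ R)                                  # 1. Q <- QR(A R)
    for _ in range(T_max):                                      # 2-4. power iterations
        Q, _ = np.linalg.qr(A @ (A.T @ Q))
    U_hat, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)  # 5.
    return Q @ U_hat, s, Vt                                     # 6. U = Q U_hat
```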

6. Implementation of Step 2 of ARNNM

How to obtain $\hat{X}^{k+1}$ efficiently is the key to implementing Step 2 of Algorithm 2. Solving the following problem is one option:

$$\hat{X}^{k+1} = \arg\min_X F(X, w^k, \varepsilon^k). \tag{24}$$

However, since $X$ is usually a large variable in practice, the above problem is non-scalable and expensive. A better option is to adjust the top singular values of $X^{k+1}$. Since $X^{k+1}$ is usually a low-rank matrix in practice, the problem size is then significantly reduced compared to (24). Let

$$X^{k+1} = U_{k+1}\,\mathrm{Diag}(\sigma^{k+1})\,V_{k+1}^{T} \tag{25}$$
$$\phantom{X^{k+1}} = \sum_{i=1}^{r_1} \sigma_i^{k+1} u_i v_i^{T} + \sum_{j=r_1+1}^{rr} \sigma_j^{k+1} u_j v_j^{T} \tag{26}$$

be the reduced SVD of $X^{k+1}$. Here $\sigma_i^{k+1}$, $i = 1,\ldots,r_1$, are the top $r_1$ ($r_1 < rr$) singular values, and $\sigma_j^{k+1}$, $j = r_1+1,\ldots,rr$, are the rest; $\sigma^{k+1} \in \mathbb{R}^{rr}$ is the vector whose entries are $\sigma_i^{k+1}$, $i = 1,\ldots,rr$, and $\mathrm{Diag}(\sigma^{k+1}) \in \mathbb{R}^{rr\times rr}$ is the diagonal matrix whose diagonal entries are $\sigma_i^{k+1}$, $i = 1,\ldots,rr$. Define the function

$$F(\sigma) = F\!\left( \sum_{i=1}^{r_1} \sigma_i u_i v_i^{T} + \sum_{j=r_1+1}^{rr} \sigma_j^{k+1} u_j v_j^{T},\ w^k,\ \varepsilon^k \right) \tag{27}$$

with $\sigma = \mathrm{vec}(\sigma_i,\ i = 1,\ldots,r_1) \in \mathbb{R}^{r_1}$ being the variable and the rest (i.e., $\sigma_j^{k+1}$, $j = r_1+1,\ldots,rr$, $u_i$, $v_i$, $i = 1,\ldots,rr$, $w^k$, $\varepsilon^k$) being constants. We propose the following approach to obtain $\hat{X}^{k+1}$:

$$\hat{\sigma}^{k+1} = \arg\min_{\sigma \in \mathbb{R}^{r_1},\ r_1 < rr} F(\sigma), \tag{28}$$
$$\sigma^{k+1}[1{:}r_1] \leftarrow \hat{\sigma}^{k+1}, \tag{29}$$
$$\hat{X}^{k+1} \leftarrow U_{k+1}\,\mathrm{Diag}(\sigma^{k+1})\,V_{k+1}^{T}. \tag{30}$$

In (28)–(30), $\sigma, \hat{\sigma}^{k+1} \in \mathbb{R}^{r_1}$. Eq. (29) means replacing the first $r_1$ elements of $\sigma^{k+1}$ with $\hat{\sigma}^{k+1}$. Here we choose $r_1 < rr$ to further reduce the problem size: since the small singular values do not affect the result much, computing them is unnecessary and would cost more time. Note that (28) is a quadratic convex problem and the scale of the variable $\sigma$ is small, so (28)–(30) can be computed efficiently.

Eq. (28) can be further formulated as follows:

$$\hat{\sigma}^{k+1} = \arg\min_{\sigma \in \mathbb{R}^{r_1}} \ \lambda p\!\left( \sum_{i=1}^{r_1} w_i^k \sigma_i + \sum_{j=r_1+1}^{rr} w_j^k \sigma_j^{k+1} \right) + \frac{1}{2}\left\| \mathcal{A}\!\left( \sum_{i=1}^{r_1} \sigma_i u_i v_i^{T} + \sum_{j=r_1+1}^{rr} \sigma_j^{k+1} u_j v_j^{T} \right) - b \right\|_2^2. \tag{31}$$

It can be derived from (31) that the gradient and Hessian of the function $F(\sigma)$ are, respectively,

$$\nabla F_{\sigma_i} = u_i^{T}\,\mathcal{A}^*\!\left( \mathcal{A}(U_{k+1}\,\mathrm{Diag}(\sigma)\,V_{k+1}^{T}) - b \right) v_i + \lambda p\,w_i, \quad i = 1,2,\ldots,r_1, \tag{32}$$

and

$$H_{\sigma_{i,j}} = \left\langle \mathcal{A}(u_i v_i^{T}),\ \mathcal{A}(u_j v_j^{T}) \right\rangle, \quad i, j = 1,2,\ldots,r_1. \tag{33}$$

With (32) and (33), the Newton method can be used to solve (28) in only one iteration (since the function is quadratic). However, for a relatively large $r_1$, computing every element of the Hessian matrix with (33) is still not efficient enough. To avoid this problem, we take advantage of the following fact:

$$d^{T} H_\sigma d = \left( \mathcal{A}\!\left( \sum_{i=1}^{r_1} d_i u_i v_i^{T} \right) \right)^{T} \left( \mathcal{A}\!\left( \sum_{j=1}^{r_1} d_j u_j v_j^{T} \right) \right), \tag{34}$$

where $d \in \mathbb{R}^{r_1}$ is a given vector and $H_\sigma \in \mathbb{R}^{r_1\times r_1}$ is the Hessian matrix whose $(i,j)$-th entry is $H_{\sigma_{i,j}}$. Computing (33) needs about $O(r_1^2(mn + smn + s^2))$ operations, while (34) needs about $O(r_1 mn + smn + s^2)$. Clearly, (34) can be computed more efficiently than (33). Thus, we propose the following iterative approach to solve (28):

$$\beta \leftarrow \frac{\nabla F_\sigma^{T} \nabla F_\sigma}{\nabla F_\sigma^{T} H_\sigma \nabla F_\sigma}, \tag{35}$$
$$\sigma \leftarrow \sigma - \beta\,\nabla F_\sigma. \tag{36}$$

The above approach is the steepest descent method, and (35) computes the exact step length for the quadratic problem (28) at each iteration. The denominator in (35) can be computed by applying (34). Using this approach, the implementation of Step 2 of Algorithm 2 is derived and outlined in Algorithm 4.

Algorithm 4. Implementation of Step 2 of Algorithm 2.
Input: linear map $\mathcal{A}: \mathbb{R}^{m\times n} \to \mathbb{R}^{s}$, vector $b \in \mathbb{R}^{s}$, $p \in (0,1)$, $\lambda, \varepsilon^k > 0$, $w^k$, $r_1 < rr$, $X^{k+1}$ and its reduced SVD $U_{k+1}\,\mathrm{Diag}(\sigma^{k+1})\,V_{k+1}^{T}$.
Output: matrix $\hat{X}^{k+1} \in \mathbb{R}^{m\times n}$.
1. Initialize the vector $\sigma$ with $\sigma_i = \sigma_i^{k+1}$, $i = 1,2,\ldots,r_1$;
2. for $iter = 1,2,\ldots,t_{\max}$ do
3. $\quad$ update $\nabla F_\sigma$ by (32);
4. $\quad$ update $\beta$ by (35);
5. $\quad$ update $\sigma$ by (36);
6. end
7. set $\hat{\sigma}^{k+1} = \sigma$;
8. compute $\hat{X}^{k+1}$ by (29) and (30).

Note that even one iteration in Algorithm 4 already results in an accelerating effect, since (22) is satisfied after one iteration of the steepest descent method. In our experiments, one or two iterations are enough for Algorithm 4.
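The core of Algorithm 4 is one exact steepest-descent step on the quadratic function (31). The sketch below, written for illustration with the linear map again given as callables `A` and `At`, computes the gradient (32), the curvature term through the Hessian-vector identity (34), and the update (35)–(36); the function name and calling convention are assumptions made here, not the authors' code.

```python
import numpy as np

def accelerate_top_singvals(A, At, b, U, s, Vt, w, lam, p, r1, n_steps=1):
    """Adjust the top r1 singular values of X = U diag(s) V^T by exact
    steepest descent on the quadratic (31) (Step 2 of ARNNM)."""
    sigma = np.asarray(s, dtype=float).copy()
    w = np.asarray(w, dtype=float)
    for _ in range(n_steps):
        X = (U * sigma) @ Vt
        G = At(A(X) - b)                                   # A*(A(X) - b)
        # gradient (32): u_i^T A*(A(X)-b) v_i + lam*p*w_i for the top r1 values
        grad = np.einsum('ai,ab,bi->i', U[:, :r1], G, Vt[:r1, :].T) + lam * p * w[:r1]
        # curvature via (34): grad^T H grad = ||A(sum_i grad_i u_i v_i^T)||^2
        D = (U[:, :r1] * grad) @ Vt[:r1, :]
        AD = A(D)
        denom = float(AD @ AD)
        if denom <= 0.0:
            break
        beta = float(grad @ grad) / denom                  # exact step length (35)
        sigma[:r1] -= beta * grad                          # descent update (36)
    return (U * sigma) @ Vt                                # X_hat^{k+1}, cf. (29)-(30)
```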

7. Convergence results

In this section, we discuss the convergence of the ARNNM algorithm (i.e., Algorithm 2). Here the convergence of an algorithm means that the sequence generated by the algorithm gets arbitrarily close to a limit point.

Lemma 3. The ARNNM algorithm (i.e., Algorithm 2) generates a non-increasing sequence of the function $P_0(X)$ (see (5) for its definition).

Proof. Consider the $(k+1)$-th ($k \ge 1$) iteration of Algorithm 2. After Step 2, by (22) we have

$$F(\hat{X}^{k+1}, w^k, \varepsilon^k) \le F(X^{k+1}, w^k, \varepsilon^k) \le F(X^k, w^k, \varepsilon^k). \tag{37}$$

Then, after Step 3, $X^{k+1}$ equals $\hat{X}^{k+1}$. According to (12), after Step 4 we have

$$P(X^{k+1}, \varepsilon^k) \le P(X^k, \varepsilon^k). \tag{38}$$

Since $\{\varepsilon^k\}$ is a decreasing sequence,

$$P(X^{k+1}, \varepsilon^{k+1}) \le P(X^{k+1}, \varepsilon^k). \tag{39}$$

Combining (38) and (39), we conclude that Algorithm 2 generates a non-increasing sequence of the function $P(X, \varepsilon)$. According to the fact that

$$P_0(X) \le P(X, \varepsilon), \tag{40}$$

we further conclude that Algorithm 2 generates a non-increasing sequence of the function $P_0(X)$. $\square$

Theorem 2. If the sequence $\{\varepsilon^k\}$ converges to zero, then the sequence $\{X^k\}$ generated by the ARNNM algorithm (i.e., Algorithm 2) converges to a stationary point of the function $P_0(X)$ (see (5) for its definition). When $p = 1$, the sequence converges to the global optimum of $P_0(X)$.

Proof. By definition, the function $P_0(X)$ is bounded below. Since the sequence $\{P_0(X^k)\}$ generated by Algorithm 2 is non-increasing (by Lemma 3), there exists a subsequence $\{X^{s_k}\}$ that converges to a limit point $\bar{X}$. For convenience we denote $s_k$ by $k$ in the following.

Consider the $(k+1)$-th ($k \ge 1$) iteration of Algorithm 2. According to Step 3, $X^{k+1}$ comes from $\hat{X}^{k+1}$, and according to Step 2, $Y^{k+1} = \hat{X}^{k+1}$. Thus, $\{X^k\}$ converging to $\bar{X}$ means that $\{Y^k\}$ also converges to $\bar{X}$. Besides, since $\sigma_i(\cdot)$ is a continuous function, it can be observed from Algorithm 2 that $\{w^k\}$ converges to $\bar{w}$, where $\bar{w}_i = (\sigma_i(\bar{X}) + \varepsilon_{\min})^{p-1}$ and $\varepsilon_{\min}$ is the limit point of the sequence $\{\varepsilon^k\}$. When $\varepsilon_{\min} = 0$, we have

$$\bar{w}_i = (\sigma_i(\bar{X}))^{p-1}. \tag{41}$$

Since $D_{\lambda p}(\cdot,\cdot)$ is a continuous function, $D_{\lambda p}(Y^k - \nabla f(Y^k), w^k)$ eventually converges to $D_{\lambda p}(\bar{X} - \nabla f(\bar{X}), \bar{w})$. Thus, the following holds according to Step 1 of Algorithm 2:

$$\bar{X} = D_{\lambda p}(\bar{X} - \nabla f(\bar{X}), \bar{w}). \tag{42}$$

By Lemma 1, $\bar{X}$ is an optimal solution of the following problem:

$$\min_X \ \lambda p \sum_{i=1}^{r} \bar{w}_i \sigma_i(X) + \tfrac{1}{2}\left\| X - \left( \bar{X} - \nabla f(\bar{X}) \right) \right\|_F^2, \tag{43}$$

where $\bar{w}_i = \sigma_i(\bar{X})^{p-1}$. Therefore,

$$0 \in \lambda p\,\partial\!\left( \sum_{i=1}^{r} \bar{w}_i \sigma_i(\bar{X}) \right) + \bar{X} - \left( \bar{X} - \nabla f(\bar{X}) \right). \tag{44}$$

Combining (41) and (44), we obtain

$$0 \in \lambda p\,\partial\!\left( \sum_{i=1}^{r} \sigma_i(\bar{X})^p \right) + \mathcal{A}^*(\mathcal{A}(\bar{X}) - b), \tag{45}$$

which means $\bar{X}$ is also a stationary point of $P_0(X)$. When $p = 1$, $P_0(X)$ is convex; in this case $\bar{X}$ is the global optimum of $P_0(X)$. $\square$

8. Numerical results

In this section, we compare the proposed ARNNM algorithm with the ORNNM algorithm and several state-of-the-art algorithms. In the experiments, we solve the following matrix completion problem:

$$\min_X \ \lambda p\,\|X\|_p^p + \tfrac{1}{2}\|P_\Omega(X) - P_\Omega(M)\|_2^2, \tag{46}$$

which is problem (5) with $\mathcal{A} = P_\Omega$ and $b = P_\Omega(M)$. Here $P_\Omega$ is a projection operator, $\Omega$ is a random subset of the index set, and $M \in \mathbb{R}^{m\times n}$ is the matrix to be recovered, with the sampled entries $\{M_{ij} : (i,j) \in \Omega\}$ known.

An approximate SVD approach, such as the truncated SVD or the power method used in this paper, reduces the computational time significantly. Therefore, the choice of the estimated rank $\tilde{r}$ is important for speeding up the algorithm. In [16], where the ORNNM algorithm is proposed, the following rank estimation approach is applied:

$$\tilde{r}_k \leftarrow \arg\max_{r_k \in \mathbb{Z}_+} \left\{ \sigma_{r_k}(X^k) > \rho \cdot \sigma_1(X^k) \right\}, \qquad \tilde{r} = \min\{m, n, \tilde{r}_k\}, \tag{47}$$

where $\rho$ is a parameter chosen manually. Eq. (47) is executed at every iteration, and the first iteration needs a parameter $r_{\max}$ to initialize $\tilde{r}$. This approach has also been commonly applied in several other papers, such as [3,5]. In our experiments, we follow the same rank estimation approach. We also use the relative error

$$\mathrm{rel.err} := \frac{\|X - M\|_F}{\max\{1, \|M\|_F\}} \tag{48}$$

to measure the recovery level, as [16] does. The synthetic data is generated by

$$M = M_L M_R^{T} + \sigma_n \Xi, \tag{49}$$

with $M_L \in \mathbb{R}^{m\times r}$, $M_R \in \mathbb{R}^{n\times r}$ and $\Xi \in \mathbb{R}^{m\times n}$ whose entries are generated from the i.i.d. standard Gaussian distribution. Here $\Xi$ is the noise matrix and $\sigma_n$ is the noise level. The sampling ratio is $\mathrm{sr} = |\Omega|/(mn)$ and the degree-of-freedom ratio is $\mathrm{fr} = r(m+n-r)/|\Omega|$, where $|\Omega|$ is the number of samples. All the experiments are performed on a PC with an Intel Core i5 2.5 GHz processor and 4 GB of RAM.

8.1. Comparison with the ORNNM algorithm

In this subsection, we compare the proposed ARNNM algorithm with the original one. The following stopping criterion is applied to both algorithms:

$$\frac{\|X^k - X^{k+1}\|_F}{\max\{1, \|X^k\|_F\}} < \epsilon. \tag{50}$$

Note that the approximate SVD and rank estimation approaches in the implementations of these two algorithms are the same; the only difference between them is that ARNNM has the additional acceleration, i.e., Steps 2 and 3 in Algorithm 2. The rank estimation approach has a great effect on the final recovery accuracy: if the final estimated rank is not precise, the relative error is usually larger than 1e−3 in our tests. To make a fair comparison, we tune the parameter $\rho$ so that the final estimated rank is precise in every run of this subsection. The remaining parameters are set as follows: $\epsilon = 1\mathrm{e}{-4}$, $\lambda = 1\mathrm{e}{-6}$, $p = 1$, $r_1 = 10$, $r_{\max} = 100$, $\varepsilon_0 = 10$, $\eta = 0.9$, $t_{\max} = 1$.

Fig. A1. Computation time vs. dimension of the problem (ARNNM vs. ORNNM). The labels on the x-axis give one dimension of the square matrix; for example, "4000" means that the matrix is 4000 × 4000.

Fig. A2. Computation time vs. rank of the matrix (ARNNM vs. ORNNM). Here the dimension of the matrix is 4000 × 4000.
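To tie the experimental setup together, the following NumPy sketch generates synthetic data as in (49), builds the sampling operator $P_\Omega$ of (46), and evaluates the relative error (48) and the rank estimate (47). The concrete sizes and all function names are illustrative assumptions, not the authors' experiment code.

```python
import numpy as np

rng = np.random.default_rng(0)
m = n = 1000
r, sr, sigma_n = 20, 0.30, 0.0            # rank, sampling ratio, noise level

# synthetic data as in (49): M = M_L M_R^T + sigma_n * Xi
M = rng.standard_normal((m, r)) @ rng.standard_normal((n, r)).T \
    + sigma_n * rng.standard_normal((m, n))

# random index set Omega and the sampling operator P_Omega of (46)
mask = rng.random((m, n)) < sr

def P_Omega(X):
    """Sampling operator: keeps the observed entries, zeros the rest."""
    return X * mask

b = P_Omega(M)

def rel_err(X, M):
    """Relative error (48)."""
    return np.linalg.norm(X - M) / max(1.0, np.linalg.norm(M))

def estimate_rank(X, rho):
    """Rank estimate (47): number of singular values exceeding rho * sigma_1."""
    s = np.linalg.svd(X, compute_uv=False)
    return max(1, int(np.sum(s > rho * s[0])))
```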

We test the algorithms on noiseless synthetic data and then on noisy synthetic data. Table A1 shows the performance of both algorithms in the noiseless case. It can be observed that the proposed ARNNM algorithm achieves accuracy close to that of the ORNNM algorithm with distinctly fewer iterations and less computational time. Table A2 shows the performance of both algorithms in the noisy case.

Fig. A3. Comparison of five algorithms (ARNNM, ORNNM, ALT, FPCA, sIRLS) in the noiseless case (σ_n = 0), r = 60: relative RMSE vs. CPU time. The problem sizes are (a) 1000 × 1000, (b) 2000 × 2000, and (c) 3000 × 3000.

Compared with Table A1, one can see that the accuracy of both algorithms degrades due to the noise. However, ARNNM is still faster than ORNNM and achieves accuracy close to that of ORNNM. In some problem instances in Tables A1 and A2, ARNNM requires only about half the computational time of ORNNM.

Fig. A4. Comparison of five algorithms (ARNNM, ORNNM, ALT, FPCA, sIRLS) in the noisy case (σ_n = 0.1), r = 60: relative RMSE vs. CPU time. The problem sizes are (a) 1000 × 1000, (b) 2000 × 2000, and (c) 3000 × 3000.

We further test our approach on larger matrices and larger ranks. Figs. A1 and A2 show the computation time with respect to the dimension of the problem and the rank of the matrix, respectively. It is clear that ARNNM is still faster than ORNNM for larger matrices and larger ranks.

Table A1. Comparison of the ARNNM and ORNNM algorithms without noise (σ_n = 0).

Problem (m×n) | r  | SR   | FR     | ρ     | ORNNM iter. | ORNNM time | ORNNM rel.err. | ARNNM iter. | ARNNM time | ARNNM rel.err.
1000 × 1000   | 20 | 0.30 | 0.1320 | 0.050 | 56  | 7.82  | 5.44e−04 | 37 | 3.82  | 5.71e−04
1000 × 1000   | 40 | 0.30 | 0.2613 | 0.040 | 74  | 11.84 | 9.69e−04 | 52 | 7.64  | 9.92e−04
1000 × 1000   | 60 | 0.30 | 0.3880 | 0.030 | 114 | 23.04 | 1.55e−03 | 76 | 15.35 | 1.53e−03
2000 × 2000   | 20 | 0.30 | 0.0663 | 0.040 | 37  | 15.43 | 3.93e−04 | 29 | 10.42 | 4.10e−04
2000 × 2000   | 40 | 0.30 | 0.1320 | 0.030 | 43  | 24.06 | 5.65e−04 | 35 | 19.48 | 5.58e−04
2000 × 2000   | 60 | 0.30 | 0.1970 | 0.022 | 53  | 38.45 | 6.63e−04 | 42 | 32.14 | 7.13e−04
3000 × 3000   | 20 | 0.25 | 0.0532 | 0.030 | 44  | 39.95 | 4.19e−04 | 33 | 22.45 | 4.52e−04
3000 × 3000   | 40 | 0.25 | 0.1060 | 0.020 | 52  | 62.01 | 6.12e−04 | 39 | 41.45 | 5.73e−04
3000 × 3000   | 60 | 0.25 | 0.1584 | 0.015 | 57  | 83.40 | 7.81e−04 | 45 | 68.11 | 7.34e−04

Table A2. Comparison of the ARNNM and ORNNM algorithms with noise (σ_n = 0.1).

Problem (m×n) | r  | SR   | FR     | ρ     | ORNNM iter. | ORNNM time | ORNNM rel.err. | ARNNM iter. | ARNNM time | ARNNM rel.err.
1000 × 1000   | 20 | 0.30 | 0.1320 | 0.050 | 56  | 7.71  | 8.51e−03 | 37 | 3.74  | 8.45e−03
1000 × 1000   | 40 | 0.30 | 0.2613 | 0.040 | 74  | 12.32 | 6.13e−03 | 52 | 7.83  | 6.07e−03
1000 × 1000   | 60 | 0.30 | 0.3880 | 0.030 | 114 | 24.43 | 5.39e−03 | 76 | 16.79 | 5.32e−03
2000 × 2000   | 20 | 0.30 | 0.0663 | 0.040 | 37  | 15.65 | 7.61e−03 | 29 | 10.95 | 7.54e−03
2000 × 2000   | 40 | 0.30 | 0.1320 | 0.030 | 43  | 25.71 | 5.07e−03 | 35 | 20.09 | 4.99e−03
2000 × 2000   | 60 | 0.30 | 0.1970 | 0.022 | 53  | 41.34 | 4.03e−03 | 42 | 33.51 | 3.97e−03
3000 × 3000   | 20 | 0.25 | 0.0532 | 0.030 | 44  | 41.37 | 7.45e−03 | 33 | 23.15 | 7.37e−03
3000 × 3000   | 40 | 0.25 | 0.1060 | 0.020 | 53  | 67.30 | 4.87e−03 | 39 | 43.20 | 4.80e−03
3000 × 3000   | 60 | 0.25 | 0.1584 | 0.015 | 57  | 87.20 | 3.88e−03 | 45 | 72.13 | 3.76e−03

8.2. Comparison with other algorithms

In this subsection, we compare the performance of ARNNM with ORNNM and three other state-of-the-art algorithms: FPCA [5], sIRLS [3], and Active ALT [4]. For these three algorithms, we use the code released by their authors, with the parameters set by their authors; the implementation details are given in Appendix A. Figs. A3 and A4 show the performance of all the algorithms in the noiseless case and the noisy case, respectively. They are the problem instances of Tables A1 and A2 with r = 60. The results show that ARNNM is the fastest of all the algorithms. It can also be observed that FPCA does not converge in Fig. A4(a) and (b) due to the noise, while ARNNM always converges. The accuracy of ARNNM in the noisy case is lower than in the noiseless case, but so is that of the other algorithms, and ARNNM achieves good accuracy among them. Comparing ARNNM and ORNNM in Fig. A3, it can be observed that ARNNM descends much faster than ORNNM at first, after which the two have similar descent rates. This illustrates that the accelerated procedure has a significant effect at first; later, when the top singular values of the variable get close to those of the optimum, the accelerated procedure is less useful. However, the accelerated procedure still saves a distinct amount of time on the whole.

9. Conclusion

In this paper, we have developed an accelerated reweighted nuclear norm minimization algorithm, named ARNNM, to recover a low rank matrix. The proposed algorithm is an accelerated version of a state-of-the-art algorithm, which we call ORNNM in this paper. We design an accelerated procedure to make the objective function descend further at every iteration, and prove that the proposed algorithm is guaranteed to converge to a stationary point of the reweighted nuclear norm minimization problem. Numerical results show that ARNNM is significantly faster than ORNNM and several other state-of-the-art algorithms.

Appendix A. Implementation details for the comparison

The code for FPCA is downloaded from http://www.columbia.edu/~sm2756/FPCA.htm. The code for sIRLS is downloaded from http://faculty.washington.edu/mfazel/. We add our problem instances to their code, and set the parameter η in their code to 2. The code for Active ALT is downloaded from http://www.cs.utexas.edu/~cjhsieh/. We use some of their code (mainly the power method and some matrix computation routines) in our implementations of ARNNM and ORNNM.

The parameters required by ARNNM and ORNNM are given in Section 8.1; Section 8.2 uses the same parameters.

References

[1] J. Ma, J. Xu, Y. Bao, S. Yu, Compressive sensing and its application: from sparse to low-rank regularized optimization, Signal Process. 28 (2012) 609–623.
[2] K. Konishi, K. Uruma, T. Takahashi, T. Furukawa, Iterative partial matrix shrinkage algorithm for matrix rank minimization, Signal Process. 100 (2014) 124–131.
[3] K. Mohan, M. Fazel, Iterative reweighted algorithms for matrix rank minimization, J. Mach. Learn. Res. 13 (1) (2012) 3441–3473.
[4] C. Hsieh, P. Olsen, Nuclear norm minimization via active subspace selection, in: Proceedings of the 31st International Conference on Machine Learning, 2014, pp. 575–583.
[5] S. Ma, D. Goldfarb, L. Chen, Fixed point and Bregman iterative methods for matrix rank minimization, Math. Program. 128 (1–2) (2011) 321–353.
[6] H. Akcay, Subspace-based spectrum estimation in frequency-domain by regularized nuclear norm minimization, Signal Process. 99 (2014) 69–85.
[7] T. Takahashi, K. Konishi, T. Furukawa, Rank minimization approach to image inpainting using null space based alternating optimization, in: Proceedings of the IEEE International Conference on Image Processing, 2012, pp. 1717–1720.
[8] G. Liu, Z. Lin, Y. Yu, Robust subspace segmentation by low-rank representation, in: Proceedings of the 27th International Conference on Machine Learning, 2010, pp. 663–670.
[9] E.J. Candes, B. Recht, Exact matrix completion via convex optimization, Found. Comput. Math. 9 (6) (2009) 717–772.
[10] Z. Liu, L. Vandenberghe, Interior-point method for nuclear norm approximation with application to system identification, SIAM J. Matrix Anal. Appl. 31 (3) (2009) 1235–1256.
[11] M. Fazel, Matrix Rank Minimization with Applications, Ph.D. thesis, Stanford University, Stanford, CA, 2002.
[12] M. Zhang, Z.-H. Huang, Y. Zhang, Restricted p-isometry properties of nonconvex matrix recovery, IEEE Trans. Inf. Theory 59 (7) (2013) 4316–4323.
[13] L. Liu, W. Huang, D.R. Chen, Exact minimum rank approximation via Schatten p-norm minimization, J. Comput. Appl. Math. 267 (2014) 218–227.
[14] M. Lai, Y. Xu, W. Yin, Improved iteratively reweighted least squares for unconstrained smoothed lq minimization, SIAM J. Numer. Anal. 51 (2) (2013) 927–957.
[15] L. Kong, N. Xiu, Exact low-rank matrix recovery via nonconvex Schatten p-minimization, Asia-Pac. J. Oper. Res. 30 (3).
[16] Y. Li, Y. Zhang, Z. Huang, A reweighted nuclear norm minimization algorithm for low rank matrix recovery, J. Comput. Appl. Math. 263 (2014) 338–350.
[17] I. Daubechies, M. Defrise, C. De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Commun. Pure Appl. Math. 57 (11) (2004) 1413–1457.
[18] N. Halko, P.G. Martinsson, J.A. Tropp, Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions, SIAM Rev. 53 (2) (2011) 217–288.