Convergence of projected Landweber iteration for matrix rank minimization


Appl. Comput. Harmon. Anal. 36 (2014) 316–325


Letter to the Editor

Junhong Lin, Song Li *
Department of Mathematics, Zhejiang University, Hangzhou 310027, PR China

Article history: Received 21 December 2012; received in revised form 7 June 2013; accepted 13 June 2013; available online 19 June 2013. Communicated by Amit Singer.

Abstract

In this paper, we study the performance of the projected Landweber iteration (PLW) for general low rank matrix recovery. The PLW was first proposed by Zhang and Cheng (2010) [43] in the matrix completion setting, based on the sparse recovery algorithm APG of Daubechies et al. (2008) [14], and numerical experiments demonstrating its efficiency were given in [43]. In this paper, we focus on providing a convergence rate analysis of the PLW in the general setting of low rank matrix recovery with the affine transform having the matrix restricted isometry property. We show robustness of the algorithm to noise, with a strong geometric convergence rate even for noisy measurements, provided that the affine transform satisfies a matrix restricted isometry property condition.

Keywords: Low rank matrix recovery; Projected Landweber iteration; Nuclear norm; Restricted isometry property

1. Introduction

1.1. Affine rank minimization

In this paper, we consider the general affine rank minimization problem (ARMP),

\min_{X \in \mathbb{R}^{n_1 \times n_2}} \mathrm{rank}(X) \quad \text{s.t.} \quad \mathcal{A}(X) = b,   (1.1)

where A : R^{n1×n2} → R^m (m < n1 n2) is a known affine transform and b ∈ R^m is a given measurement vector. Such problems arise in many engineering applications, including system identification [27,28], collaborative filtering [9,39], quantum state tomography [19,18], and face recognition [3,8].

1.2. Nuclear norm minimization

Unfortunately, solving the affine rank minimization problem directly is NP-hard in general and thus computationally infeasible [38,33]. It is therefore natural to consider nuclear norm minimization, which can be viewed as a convex relaxation of rank minimization [15] and consists of solving the following problem:

\min_{X \in \mathbb{R}^{n_1 \times n_2}} \|X\|_* \quad \text{s.t.} \quad \mathcal{A}(X) = b.   (1.2)

Rewritten as a semidefinite program, nuclear norm minimization can be solved with efficient methods [38].
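For completeness, one standard semidefinite reformulation (cf. [15,38]) introduces symmetric auxiliary variables W_1, W_2 and reads

\min_{X, W_1, W_2} \ \tfrac{1}{2}\big(\mathrm{Tr}(W_1) + \mathrm{Tr}(W_2)\big) \quad \text{s.t.} \quad \begin{pmatrix} W_1 & X \\ X^* & W_2 \end{pmatrix} \succeq 0, \qquad \mathcal{A}(X) = b.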

✩ This work is supported by NSF of China No. 11171299, No. 91130009 and NSF of Zhejiang Province, China, No. Y6090091.
* Corresponding author. E-mail addresses: [email protected] (J. Lin), [email protected] (S. Li).


It has now been shown that nuclear norm minimization recovers all r rank matrices provided that the affine transform A satisfies a matrix restricted isometry property (M-RIP) condition δ_{cr} ≤ C for some constants c, C > 0 [38,6,36,41,13]. Recall that the matrix restricted isometry property [38] is defined as follows.

Definition 1.1. Let A : R^{n1×n2} → R^m be a linear map. Without loss of generality, assume n2 ≥ n1. For every integer r with 1 ≤ r ≤ n1, define the r-restricted isometry constant to be the smallest number δ_r such that

(1 - \delta_r)\|X\|_F^2 \le \|\mathcal{A}(X)\|_2^2 \le (1 + \delta_r)\|X\|_F^2

holds for all matrices X of rank at most r.

Many types of random measurement maps, such as Gaussian measurement maps, have M-RIP constant δ_r ≤ δ with overwhelming probability [38,6] provided that

m \ge C \delta^{-2} \max\{n_1, n_2\}\, r.   (1.3)

Gaussian measurement maps are of the form

\mathcal{A}(X)_l = \sum_{k,j} a_{l,k,j} X_{k,j}, \qquad l = 1, \dots, m,   (1.4)

where the a_{l,k,j} are independent normally distributed random variables with mean zero and variance 1/m. The bound in (1.3) is optimal since the number of degrees of freedom of an n1 × n2 matrix of rank r equals r(n1 + n2 − r). In fact, any random measurement map A obeying, for any fixed X ∈ R^{n1×n2},

\mathbb{P}\big( \big| \|\mathcal{A}(X)\|_2^2 - \|X\|_F^2 \big| \ge \delta \|X\|_F^2 \big) \le c\, e^{-\gamma m \delta^2}, \qquad \delta \in (0,1),   (1.5)

(γ, c are positive numerical constants) will satisfy the M-RIP δ_r ≤ δ with overwhelming probability provided that m ≥ C δ^{-2} max{n1, n2} r [6]. Subgaussian measurement maps satisfy (1.5). Very recently, Kramer and Ward [25] showed that randomizing the column signs of any matrix that satisfies the standard vector restricted isometry property [10] results in a matrix which satisfies the Johnson–Lindenstrauss lemma. As a result, random partial Fourier measurements, i.e., maps A of the form (1.4) where the a_{l,k,j} (unraveled for each l into vectors of length n1 n2) correspond to m rows randomly selected from the n1 n2 × n1 n2 discrete Fourier matrix (or 2D Fourier transform matrix) with randomized column signs, satisfy (1.5) with γ = C_0 log^{-4}(n1 n2) for a positive constant C_0 [25]. Therefore, such random partial Fourier measurements have the M-RIP δ_r ≤ δ with high probability as long as [6]

m \ge C \delta^{-2} \max\{n_1, n_2\}\, r \log^4(n_1 n_2).

Another theory regarding the power of nuclear norm minimization to recover low rank matrices from subsampled measurements is the matrix completion setup. Let P_Ω : R^{n1×n2} → R^{n1×n2} denote the projection onto the index set Ω; that is, (P_Ω(X))_{ij} = X_{ij} for (i, j) ∈ Ω and (P_Ω(X))_{ij} = 0 otherwise. The nuclear norm minimization in the matrix completion setup can then be formulated as follows:

\min_{X \in \mathbb{R}^{n_1 \times n_2}} \|X\|_* \quad \text{s.t.} \quad P_\Omega(X) = P_\Omega(M).   (1.6)
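For concreteness, the following is a minimal NumPy sketch of the two sampling operators discussed in this section: a Gaussian measurement map of the form (1.4) together with its adjoint, and the entry-sampling operator P_Ω. This is our own illustration, not code from the cited papers; the function names and the choice of random generator are ours.

```python
import numpy as np

def gaussian_map(m, n1, n2, rng=np.random.default_rng(0)):
    """Gaussian measurement map (1.4): A(X)_l = sum_{k,j} a_{l,k,j} X_{k,j},
    with i.i.d. N(0, 1/m) entries.  Returns A and its adjoint A*."""
    G = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n1 * n2))
    A = lambda X: G @ X.ravel()               # R^{n1 x n2} -> R^m
    At = lambda y: (G.T @ y).reshape(n1, n2)  # R^m -> R^{n1 x n2}
    return A, At

def P_Omega(Omega, n1, n2):
    """Entry-sampling operator: (P_Omega(X))_{ij} = X_{ij} for (i, j) in Omega
    and 0 otherwise.  Note that P_Omega is self-adjoint."""
    mask = np.zeros((n1, n2))
    for i, j in Omega:
        mask[i, j] = 1.0
    return lambda X: mask * X
```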

Note that there are obvious rank one matrices in the kernel of the operator P_Ω. Therefore, the RIP fails completely in this setting. The recent papers [9,12,23,37] show that, under certain incoherence assumptions on the singular vectors of the rank r matrix, exact recovery from |Ω| randomly chosen entries is possible provided that

|\Omega| \ge C\, r \max\{n_1, n_2\} \log^2\big(\max\{n_1, n_2\}\big).

We refer to [9,12,23,37] for detailed statements. Extensions to quantum state tomography can be found in [19,18].

1.3. Other algorithms

Although there exist polynomial time algorithms to solve a standard semidefinite program, in practice they do not scale well to large problem instances. Therefore, it is crucial to develop fast algorithms specialized to large scale low rank matrix recovery. To this end, algorithms based on nuclear norm minimization, such as singular value thresholding (SVT) [7], the accelerated proximal gradient algorithm [40], fixed point continuation (FPC) and fixed point continuation with approximate SVD (FPCA) [30,20], projected Landweber iteration (PLW) [43], a spectral regularization algorithm [31], and matrix iteratively reweighted least squares minimization [17,29], have been proposed. Greedy algorithms designed for affine rank minimization, such as atomic decomposition for minimum rank approximation (ADMiRA) [26], singular value projection (SVP) [20,32], and OptSpace [24], have also been proposed for large scale low rank matrix recovery.


Both ADMiRA and SVP are greedy algorithms that aim at providing approximate solutions to the following minimum rank approximation formulation:

\min_X \|\mathcal{A}(X) - b\|_2 \quad \text{s.t.} \quad \mathrm{rank}(X) \le r.
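Both algorithms rely on computing best rank r approximations, i.e., on projecting onto the set of matrices of rank at most r via a truncated SVD. The following is a minimal NumPy sketch of this projection, given only for illustration; it is not the implementation used in [26] or [20,32].

```python
import numpy as np

def best_rank_r(X, r):
    """Best rank-r approximation of X in Frobenius norm (Eckart-Young):
    keep the r largest singular values and discard the rest."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s[r:] = 0.0
    return (U * s) @ Vt
```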

The ADMiRA algorithm is motivated by CoSaMP [34] in compressed sensing, while SVP is based on the classical projected gradient algorithm IHT [1]. ADMiRA recovers the optimal solution of the ARMP if the affine transform A satisfies the M-RIP with δ_{4r} ≤ 0.04, and for noisy measurements it has a geometric convergence rate [26]. Similarly, SVP has a strong geometric convergence rate provided that the affine transform A satisfies an M-RIP condition [20,32]. The fixed point continuation and Bregman iteration algorithms for the basis pursuit problem [22] have been extended to low rank matrix recovery in [30]. The authors solve a Lagrangian version of problem (1.2):

\min_{X \in \mathbb{R}^{n_1 \times n_2}} \|X\|_* + \mu\, \|\mathcal{A}(X) - b\|_2^2,   (1.7)

where μ is a parameter. The parameter μ is then increased by a continuation technique to obtain the solution. Although they proved the convergence of the FPC, they did not provide a convergence rate analysis, and the efficiency of this algorithm is restricted to the noiseless, affine constraint case (i.e., linear equality constraints). Cai et al. [7] proposed the SVT, which is based on the linearized Bregman algorithm [42,35] and is a simple, easy-to-implement first-order algorithm. Compared with the FPC, the SVT can handle larger scale problems in matrix completion, since it utilizes the sparsity of the sampled entries and the low rank property of the recovered matrix. They also provided a convergence analysis showing that the sequence of iterates converges. However, they did not provide a convergence rate analysis and, like the FPC, the efficiency of this algorithm is restricted to the noiseless, affine constraint case.

1.4. Contributions

In this paper, we consider the following formulation of nuclear norm minimization:

\min_{X \in \mathbb{R}^{n_1 \times n_2}} \|\mathcal{A}(X) - b\|_2 \quad \text{s.t.} \quad \|X\|_* \le R   (1.8)

for a parameter R. Based on the sparse recovery algorithm APG [14], we propose an algorithm using a gradient method, with projection onto closed nuclear norm balls, to give approximate solutions of (1.8). Note that in the matrix completion setting such an algorithm first appeared in [43]; following [43], we call it the PLW. Just like the relationship [14] between the iterative soft-thresholding algorithm and the APG for sparse recovery, the PLW can be viewed as a self adaptive thresholding FPC algorithm [43]: it computes a singular value thresholding in each iteration with self adaptive thresholding parameters. Numerical examples show that not only is the number of iterations of the PLW smaller than that of the SVT, but also the average time per iteration of the PLW is much less than that of the SVT in matrix completion [43]. Although the authors of [43] proved its convergence, they did not provide a convergence rate analysis. In this paper, we focus on providing a convergence rate analysis of the PLW in the general setting of low rank matrix recovery with the affine transform having the M-RIP. We show robustness of the PLW to noise, with a strong geometric convergence rate even for noisy measurements, provided that the affine transform satisfies an M-RIP condition. Unfortunately, the M-RIP fails in the matrix completion setup, and the theoretical analysis of the algorithm in that setting seems to be much more involved; such an analysis has to be postponed to later investigations. We restrict this work to real valued matrices X ∈ R^{n1×n2}; our result can easily be extended to complex valued matrices.

1.5. Notation

In the following and without loss of generality, we assume that n1 ≤ n2. The entries of a matrix X ∈ R^{n1×n2} are denoted by lower case letters with the corresponding indices, i.e., X_{ij} = x_{ij}. The matrix X^* is the adjoint of X, and for a linear operator A : R^{n1×n2} → R^m, A^* : R^m → R^{n1×n2} is the adjoint operator. For a generic matrix X ∈ R^{n1×n2}, we write its singular value decomposition (SVD) as X = U Σ V^* for unitary matrices U ∈ R^{n1×n1}, V ∈ R^{n2×n2} and a diagonal matrix Σ = diag(σ_1, ..., σ_{n1}) ∈ R^{n1×n2}, where σ_1 ≥ σ_2 ≥ ... ≥ σ_{n1} are the singular values. We denote by σ(X) the vector of singular values of X. For a specific matrix X we indicate the singular value decomposition as X = U_X Σ_X V_X^*. Denote by rank(X) the number of nonzero singular values of X. We will say that X is an r rank matrix if rank(X) ≤ r. We define the inner product of two matrices X, Y ∈ R^{n1×n2} to be ⟨X, Y⟩ = Tr(X^* Y), and denote the Frobenius norm of X by ‖X‖_F = (Tr(X^* X))^{1/2}, so that ‖X‖_F^2 = Tr(X^* X) = Σ_i σ_i^2(X). The nuclear norm of X is the sum of its singular values, ‖X‖_* = Σ_i σ_i(X), and the spectral norm is the largest singular value, ‖X‖ = σ_1(X). Let the l1 norm and l2 norm of a vector be denoted by ‖x‖_1 and ‖x‖_2 respectively. Let B_R = {x : ‖x‖_1 ≤ R} be the closed l1 norm ball with radius R (in a finite dimensional space), and \mathcal{B}_R = {X ∈ R^{n1×n2} : ‖X‖_* ≤ R} be the closed nuclear norm ball with radius R.


Let P_{B_R} denote the projection operator onto the closed l1 ball B_R with radius R, abbreviated P_R with a slight abuse of notation; similarly, P_R also denotes the projection operator onto the closed nuclear norm ball \mathcal{B}_R with radius R.

1.6. Organization

The rest of this paper is organized as follows. In Section 2, we set up the problem and state the PLW algorithm. Subsequently, we state our main result, which gives a convergence rate of the PLW in the general setting of low rank matrix recovery with the affine transform having the M-RIP, in the noisy and mismodeled cases. Section 3 is devoted to proving the main result. We begin by proving the exactly low rank matrix case, and then extend the result to the general case.

2. Main result

2.1. Problem setup

Suppose that we observe the data b from the model

b = \mathcal{A}(M) + z,   (2.1)

where M ∈ R^{n1×n2} is an unknown low rank matrix, A : R^{n1×n2} → R^m is a known linear mapping, and z is an m dimensional noise term. The goal is to reconstruct the unknown matrix M based on b and A.

2.2. Projected Landweber iteration

The PLW is based on the Landweber iteration and a projection method. We first set the initial value X^0 = 0. The iterative formulation of the algorithm with step size η_n is as follows:

E^n = X^n - \eta_n \mathcal{A}^*\big(\mathcal{A}(X^n) - b\big), \qquad X^{n+1} = P_R(E^n).

In particular, when A = P_Ω in the noiseless matrix completion setup (1.6), we have the following iterative formulation [43]:

E^n = X^n - \eta_n P_\Omega\big(X^n - M\big), \qquad X^{n+1} = P_R(E^n).

Note that, from the definition of the nuclear norm, for a matrix X ∈ R^{n1×n2} with SVD X = U diag(σ(X)) V^*,

P_R(X) = U \,\mathrm{diag}\big(P_R(\sigma(X))\big)\, V^*.

On the basis of the computation of the vector projection P_R(σ(X)) in [14], an algorithm for computing the matrix projection can be found in [43, Algorithm 1]. Note that the matrix projection is computed by a singular value thresholding with an adaptive thresholding parameter.
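For illustration, the following is a minimal NumPy sketch of the PLW iteration and of the nuclear norm ball projection computed through an adaptively thresholded projection of the singular values. This is our own sketch under the notation above, not the authors' code or [43, Algorithm 1] itself; the callables A and At (the measurement map and its adjoint), the constant step size eta, and the iteration count n_iter are illustrative assumptions.

```python
import numpy as np

def project_l1_ball(sigma, R):
    """Euclidean projection of a nonnegative, non-increasing vector (e.g. the
    singular values) onto {x >= 0 : sum(x) <= R}.  This amounts to soft
    thresholding with an adaptively chosen threshold, cf. [14]."""
    if sigma.sum() <= R:
        return sigma
    css = np.cumsum(sigma)
    j = np.arange(1, sigma.size + 1)
    rho = np.nonzero(sigma - (css - R) / j > 0)[0][-1]   # last active index
    theta = (css[rho] - R) / (rho + 1)                   # adaptive threshold
    return np.maximum(sigma - theta, 0.0)

def project_nuclear_ball(X, R):
    """P_R(X) = U diag(P_R(sigma(X))) V^*: projection onto {X : ||X||_* <= R}."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * project_l1_ball(s, R)) @ Vt

def plw(A, At, b, n1, n2, R, eta=1.0, n_iter=500):
    """Projected Landweber iteration:
       E^n = X^n - eta_n * A^*(A(X^n) - b),   X^{n+1} = P_R(E^n),
    started from X^0 = 0; a constant step size eta_n = eta is used here."""
    X = np.zeros((n1, n2))
    for _ in range(n_iter):
        E = X - eta * At(A(X) - b)
        X = project_nuclear_ball(E, R)
    return X
```

As a usage illustration (again under assumed names), plw can be paired with the Gaussian measurement map sketched in Section 1.2, taking b = A(M) + z for a low rank M and a radius R close to ‖M‖_*; in the noiseless matrix completion case, A and At are both replaced by the self-adjoint operator P_Ω. Note that the full SVD computed in every iteration is only practical for moderately sized problems.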

2.3. Main result

Our main result shows the following convergence guarantee for the PLW in terms of the M-RIP.

Theorem 2.1. Suppose that the measurement operator A satisfies the M-RIP of order 2r. Let b = A(M) + z be a vector of measurements of an arbitrary matrix M. Denote by M_{[r]} the best rank r approximation of M, i.e.,

M_{[r]} = \arg\min_X \|X - M\|_F \quad \text{s.t.} \quad \mathrm{rank}(X) \le r.

Furthermore, we assume that the step size η_n and the M-RIP constant δ_{2r} satisfy

|\eta_n - 1| \le \alpha \ \text{ for some } \alpha \ge 0, \qquad \gamma = 10\big(\alpha + (1 + \alpha)\delta_{2r}\big) < 1.   (2.2)

Then the nth iterate of the PLW algorithm with step size η_n and radius R = (1 − τ)‖M_{[r]}‖_* for some τ ∈ [0, 1) satisfies

\|X^n - M\|_F \le \gamma^n \|M\|_F + \Big( \frac{4(\alpha+1)(1+\delta_{2r})}{1-\gamma} + 1 \Big) \tau \|M_{[r]}\|_F + \frac{4(\alpha+1)\sqrt{1+\delta_{2r}}}{1-\gamma} \|z\|_2 + \frac{4(\alpha+1)(1+\delta_{2r})}{1-\gamma} \Big( \|M - M_{[r]}\|_F + \frac{\|M - M_{[r]}\|_*}{\sqrt{r}} \Big).   (2.3)


Remark 2.2.
1. Note that the parameter τ indicates how well the feasible set \mathcal{B}_R approximates the ideal feasible set \mathcal{B}_{\hat R} with radius \hat R = ‖M_{[r]}‖_*. Ideally, one has τ = 0. Then the residual error in (2.3) becomes merely dependent on the unrecoverable energy

\|z\|_2 + \frac{\|M - M_{[r]}\|_*}{\sqrt{r}} + \|M - M_{[r]}\|_F,

and such a result is essentially optimal up to the constants [2,26].
2. A natural choice of the step size η_n is η_n = 1. This leads to α = 0 in (2.2). In this case, the sufficient condition on the M-RIP becomes δ_{2r} < 0.1. Compared with ADMiRA, which requires δ_{4r} ≤ 0.04 to guarantee a geometric convergence rate [26], our assumption on the M-RIP is weaker. Compared with SVP, which requires δ_{2r} ≤ 1/3 in [32], our assumption on the M-RIP is slightly stronger. Note that the result on SVP in [32] depends on the choice of the step size η_t = 1/(1 + δ_{2r}).
3. Note that we have not tried to optimize the M-RIP condition. We expect that, with a more complicated proof as in [6,36,13], one can still improve this condition.

3. Proofs

3.1. Exactly low rank matrix case

We begin by giving the following theorem for the exactly low rank matrix case.

Theorem 3.1. Suppose that the measurement operator A satisfies the M-RIP of order 2r. Let b = A(M) + z be a vector of measurements of an arbitrary matrix M with rank at most r. Furthermore, we assume that the step size η_n and the M-RIP constant δ_{2r} satisfy

|\eta_n - 1| \le \alpha \ \text{ for some } \alpha \ge 0, \qquad \gamma = 10\big(\alpha + (1 + \alpha)\delta_{2r}\big) < 1.   (3.1)

Then the nth iterate of the PLW algorithm with step size η_n and radius R = (1 − τ)‖M‖_* for some τ ∈ [0, 1) satisfies

\|X^n - M\|_F \le \gamma^n \|M\|_F + \Big( \frac{4(\alpha+1)(1+\delta_{2r})}{1-\gamma} + 1 \Big) \tau \|M\|_F + \frac{4(\alpha+1)\sqrt{1+\delta_{2r}}}{1-\gamma} \|z\|_2.

In the rest of this subsection, we focus on giving the proof of this theorem. The proof makes use of ideas from [11,4,38,29]. Before the proof, we introduce some basic lemmas and notation. For arbitrary X with SVD X = U Σ V^*, write Σ = Σ_0 + Σ_1 + ..., where

\Sigma_0 = \mathrm{diag}(\sigma_1, \sigma_2, \dots, \sigma_r, 0, \dots), \quad \Sigma_1 = \mathrm{diag}(0, \dots, 0, \sigma_{r+1}, \dots, \sigma_{2r}, 0, \dots), \quad \Sigma_2 = \mathrm{diag}(0, \dots, 0, \sigma_{2r+1}, \dots, \sigma_{3r}, 0, \dots), \ \dots

Similarly, write U = U_0 + U_1 + U_2 + ... and V = V_0 + V_1 + V_2 + ... accordingly. Thus

X = X_0 + X_{0^c} = X_0 + X_1 + X_2 + \cdots, \qquad X_i = U_i \Sigma_i V_i^*.

We will always use the notation Σ_i, U_i, V_i and X_i to denote such truncations unless otherwise specified. Denote X_0 + X_1 by X_{01}. A standard computation shows the following.

Lemma 3.2.

\sum_{i \ge 2} \|X_i\|_F \le \frac{\|X_{0^c}\|_*}{\sqrt{r}}.

Proof. Note that for each i ≥ 2,

\|X_i\|_F \le \sqrt{r}\, \sigma_{ir}(X) \le \frac{\|X_{i-1}\|_*}{\sqrt{r}},

and thus

\sum_{i \ge 2} \|X_i\|_F \le \sum_{i \ge 2} \frac{\|X_{i-1}\|_*}{\sqrt{r}} = \frac{\|X_{0^c}\|_*}{\sqrt{r}}. \qquad \Box


Lemma 3.3. Suppose that A satisfies the M-RIP of order 2r. Then for arbitrary X ∈ R^{n1×n2},

\|\mathcal{A}(X)\|_2 \le \sqrt{1+\delta_{2r}} \Big( \|X_{01}\|_F + \frac{\|X_{0^c}\|_*}{\sqrt{r}} \Big).

Proof. The proof is similar to those given by Needell and Tropp [34] and by Goldfarb and Ma [20]. By the triangle inequality and then the M-RIP, we get

\|\mathcal{A}(X)\|_2 \le \|\mathcal{A}(X_{01})\|_2 + \sum_{j \ge 2} \|\mathcal{A}(X_j)\|_2 \le \sqrt{1+\delta_{2r}} \Big( \|X_{01}\|_F + \sum_{j \ge 2} \|X_j\|_F \Big).

Applying Lemma 3.2 to the above proves the result. \Box

The following lemma is a straightforward generalization of [6, Lemma 3.3], and states that M-RIP operators approximately preserve inner products between low rank matrices. Its proof parallels those of [5,16,4] for sparse vector recovery.

Lemma 3.4. For X, X' ∈ R^{n1×n2}, suppose that A satisfies the M-RIP of order max(rank(X + X'), rank(X − X')) with constant δ. Then we have

\big| \eta \langle \mathcal{A}(X), \mathcal{A}(X') \rangle - \langle X, X' \rangle \big| \le \big( |\eta - 1| + \eta\delta \big) \|X\|_F \|X'\|_F.

Proof. We may assume that ‖X‖_F = ‖X'‖_F = 1. According to the parallelogram identity and the definition of the M-RIP, we have

\langle \mathcal{A}(X), \mathcal{A}(X') \rangle = \frac{1}{4} \Big( \|\mathcal{A}(X + X')\|_2^2 - \|\mathcal{A}(X - X')\|_2^2 \Big) \le \frac{1}{4} \Big( (1+\delta)\|X + X'\|_F^2 - (1-\delta)\|X - X'\|_F^2 \Big) = \langle X, X' \rangle + \delta.

It thus follows that

\eta \langle \mathcal{A}(X), \mathcal{A}(X') \rangle - \langle X, X' \rangle \le (\eta - 1)\langle X, X' \rangle + \eta\delta \le |\eta - 1| + \eta\delta.

Similarly, one can easily prove that

\eta \langle \mathcal{A}(X), \mathcal{A}(X') \rangle - \langle X, X' \rangle \ge -|\eta - 1| - \eta\delta.

Combining the above two inequalities, we get

\big| \eta \langle \mathcal{A}(X), \mathcal{A}(X') \rangle - \langle X, X' \rangle \big| \le |\eta - 1| + \eta\delta.

Thus, by a simple renormalization, we conclude the proof. \Box

Lemma 3.5. Suppose that A satisfies the M-RIP of order 2r with constant δ. Then for X, X' ∈ R^{n1×n2} with ‖X_{0^c}‖_* ≤ ‖X_0‖_* and ‖X'_{0^c}‖_* ≤ ‖X'_0‖_*,

\big| \langle X, X' \rangle - \eta \langle \mathcal{A}(X), \mathcal{A}(X') \rangle \big| \le 5 \big( |\eta - 1| + \eta\delta \big) \|X\|_F \|X'\|_F.

Proof. By Lemma 3.4,

\big| \langle X, X' \rangle - \eta \langle \mathcal{A}(X), \mathcal{A}(X') \rangle \big| = \Big| \sum_{i,j} \big( \langle X_i, X'_j \rangle - \eta \langle \mathcal{A}(X_i), \mathcal{A}(X'_j) \rangle \big) \Big| \le \big( |\eta - 1| + \eta\delta \big) \sum_{i,j} \|X_i\|_F \|X'_j\|_F.   (3.2)

Note that, by Lemma 3.2 and the assumption ‖X_{0^c}‖_* ≤ ‖X_0‖_*, we have

\sum_i \|X_i\|_F \le \|X_0\|_F + \|X_1\|_F + \frac{\|X_{0^c}\|_*}{\sqrt{r}} \le \|X_0\|_F + \|X_1\|_F + \frac{\|X_0\|_*}{\sqrt{r}} \le 2\|X_0\|_F + \|X_1\|_F \le \sqrt{5}\, \|X_{01}\|_F \le \sqrt{5}\, \|X\|_F.

Similarly, we have

\sum_i \|X'_i\|_F \le \sqrt{5}\, \|X'\|_F.


Introducing the above two inequalities into (3.2), we get

\big| \langle X, X' \rangle - \eta \langle \mathcal{A}(X), \mathcal{A}(X') \rangle \big| \le 5 \big( |\eta - 1| + \eta\delta \big) \|X\|_F \|X'\|_F. \qquad \Box

In what follows, M_⊥ denotes a projection of M onto \mathcal{B}_R, and M − M_⊥ is denoted by D. Furthermore, for k = 0, 1, ..., we denote X^k − M_⊥ by D^k for compactness. Note that D^0 = −M_⊥.

Lemma 3.6.

\|D^{n+1}\|_F^2 \le 2\langle D^{n+1}, D^n \rangle - 2\eta_n \langle \mathcal{A}(D^{n+1}), \mathcal{A}(D^n) \rangle + 2\eta_n \langle \mathcal{A}(D^{n+1}), \mathcal{A}(D) + z \rangle.

Proof. The proof parallels that of [4] for sparse vector recovery. Since X^{n+1} = P_R(X^n − η_n A^*(A(X^n) − b)) and R = ‖M_⊥‖_*, we have

\big\| X^{n+1} - X^n + \eta_n \mathcal{A}^*\big(\mathcal{A}(X^n) - b\big) \big\|_F^2 \le \big\| M_\perp - X^n + \eta_n \mathcal{A}^*\big(\mathcal{A}(X^n) - b\big) \big\|_F^2.

Introducing b = A(M) + z into the above,

\big\| X^{n+1} - X^n + \eta_n \mathcal{A}^*\big(\mathcal{A}(X^n) - \mathcal{A}(M) - z\big) \big\|_F^2 \le \big\| M_\perp - X^n + \eta_n \mathcal{A}^*\big(\mathcal{A}(X^n) - \mathcal{A}(M) - z\big) \big\|_F^2,

that is,

\big\| D^{n+1} - D^n + \eta_n \mathcal{A}^*\big(\mathcal{A}(D^n) - \mathcal{A}(D) - z\big) \big\|_F^2 \le \big\| -D^n + \eta_n \mathcal{A}^*\big(\mathcal{A}(D^n) - \mathcal{A}(D) - z\big) \big\|_F^2,

which is equivalent to

\big\langle D^{n+1},\, D^{n+1} - 2D^n + 2\eta_n \mathcal{A}^*\big(\mathcal{A}(D^n) - \mathcal{A}(D) - z\big) \big\rangle \le 0.

By a simple computation, one obtains the result. \Box

We next introduce the following useful lemma.

Lemma 3.7. For any matrix E ∈ R^{n1×n2} with SVD E = U Σ V^*, we have

P_R(E) = U \,\mathrm{diag}\big(P_R(\sigma(E))\big)\, V^*

and

\mathrm{supp}\big(P_R(\sigma(E))\big) \subset \mathrm{supp}\big(\sigma(E)\big).

The first identity of this lemma is a direct consequence of the definition of the matrix projection operator and the definition of the nuclear norm, while the second is implied by [4, Lemma 2.3]. The following lemma, a matrix nuclear norm inequality which is useful for our proof, can be found in [21].

Lemma 3.8. For any matrices X, Y ∈ R^{n1×n2},

\sum_i \big| \sigma_i(X) - \sigma_i(Y) \big| \le \|X - Y\|_*.

By using the above lemma, we can prove the following result.

Lemma 3.9.

\|D^k_{0^c}\|_* \le \|D^k_0\|_* \le \sqrt{r}\, \|D^k_0\|_F.

Proof. We have ‖X^k‖_* ≤ R = ‖M_⊥‖_*, that is,

\|D^k + M_\perp\|_* \le \|M_\perp\|_*.   (3.3)

Let the SVD of D^k be D^k = U Σ V^*, and set \tilde M_\perp = -U \Sigma_{M_\perp} V^*, where \Sigma_{M_\perp} = \mathrm{diag}(\sigma(M_\perp)). By Lemma 3.8 and (3.3), we have

\|D^k + \tilde M_\perp\|_* = \sum_i \big| \sigma_i(D^k) - \sigma_i(M_\perp) \big| \le \|D^k + M_\perp\|_* \le \|M_\perp\|_* = \|\tilde M_\perp\|_*.   (3.4)


Set H^k = D^k + \tilde M_\perp = U(\Sigma - \Sigma_{M_\perp})V^*, and write H^k = H^k_T + H^k_{T^c}, where H^k_T = U(\Sigma_0 - (\Sigma_{M_\perp})_0)V^* and H^k_{T^c} = U(\Sigma_{0^c} - (\Sigma_{M_\perp})_{0^c})V^*. With this notation, (3.4) is equivalent to

\|H^k_T\|_* + \|H^k_{T^c}\|_* \le \|\tilde M_\perp\|_*.   (3.5)

By Lemma 3.7, we have rank(M_⊥) ≤ rank(M) ≤ r. Thus (\Sigma_{M_\perp})_0 = \Sigma_{M_\perp} and (\Sigma_{M_\perp})_{0^c} = 0. Consequently, by (3.5), we get

\|D^k_{0^c}\|_* = \|H^k_{T^c}\|_* \le \|\tilde M_\perp\|_* - \|H^k_T\|_* \le \|\tilde M_\perp - H^k_T\|_* = \|D^k_0\|_*,

which leads to the conclusion. \Box

Now we are ready to give the proof of Theorem 3.1.

Proof of Theorem 3.1. By Lemmas 3.6, 3.5 and 3.9, we have

\|D^{n+1}\|_F^2 \le 2\langle D^{n+1}, D^n \rangle - 2\eta_n \langle \mathcal{A}(D^{n+1}), \mathcal{A}(D^n) \rangle + 2\eta_n \langle \mathcal{A}(D^{n+1}), \mathcal{A}(D) + z \rangle
\le 10\big( |\eta_n - 1| + \eta_n\delta_{2r} \big) \|D^n\|_F \|D^{n+1}\|_F + 2\eta_n \langle \mathcal{A}(D^{n+1}), \mathcal{A}(D) + z \rangle
\le 10\big( |\eta_n - 1| + \eta_n\delta_{2r} \big) \|D^n\|_F \|D^{n+1}\|_F + 2\eta_n \|\mathcal{A}(D^{n+1})\|_2 \big( \|\mathcal{A}(D)\|_2 + \|z\|_2 \big).   (3.6)

Note that, by Lemmas 3.3 and 3.9,

\|\mathcal{A}(D^{n+1})\|_2 \le \sqrt{1+\delta_{2r}} \Big( \|D^{n+1}_{01}\|_F + \frac{\|D^{n+1}_{0^c}\|_*}{\sqrt{r}} \Big) \le 2\sqrt{1+\delta_{2r}}\, \|D^{n+1}_{01}\|_F \le 2\sqrt{1+\delta_{2r}}\, \|D^{n+1}\|_F.

Introducing the above into (3.6),

\|D^{n+1}\|_F^2 \le 10\big( |\eta_n - 1| + \eta_n\delta_{2r} \big) \|D^n\|_F \|D^{n+1}\|_F + 4\eta_n \sqrt{1+\delta_{2r}}\, \|D^{n+1}\|_F \big( \|\mathcal{A}(D)\|_2 + \|z\|_2 \big),

which is equivalent to

\|D^{n+1}\|_F \le 10\big( |\eta_n - 1| + \eta_n\delta_{2r} \big) \|D^n\|_F + 4\eta_n \sqrt{1+\delta_{2r}} \big( \|\mathcal{A}(D)\|_2 + \|z\|_2 \big).   (3.7)

Since M_⊥ = P_R(M) is a closest point to M in \mathcal{B}_R and (1 − τ)M ∈ \mathcal{B}_R (as ‖(1 − τ)M‖_* = (1 − τ)‖M‖_* = R), we have

\|D\|_F = \|M - M_\perp\|_F \le \|M - (1-\tau)M\|_F = \tau \|M\|_F.   (3.8)

By Lemma 3.7, we have rank(D) ≤ r. It thus follows from the definition of the M-RIP and (3.8) that

\|\mathcal{A}(D)\|_2 \le \sqrt{1+\delta_{2r}}\, \|D\|_F \le \sqrt{1+\delta_{2r}}\, \tau \|M\|_F.

Introducing the above into (3.7), we get

\|D^{n+1}\|_F \le 10\big( |\eta_n - 1| + \eta_n\delta_{2r} \big) \|D^n\|_F + 4\eta_n \tau (1+\delta_{2r}) \|M\|_F + 4\eta_n \sqrt{1+\delta_{2r}}\, \|z\|_2.

Applying the assumptions (3.1), we get

\|D^{n+1}\|_F \le \gamma \|D^n\|_F + 4(\alpha+1)\tau(1+\delta_{2r}) \|M\|_F + 4(\alpha+1)\sqrt{1+\delta_{2r}}\, \|z\|_2,

where γ = 10(α + (1 + α)δ_{2r}) < 1. Applying this inequality recursively and using the fact that

\sum_{j=0}^{n-1} \gamma^j < \sum_{j=0}^{\infty} \gamma^j = \frac{1}{1-\gamma},

we get

\|D^n\|_F \le \gamma^n \|M_\perp\|_F + \frac{4(\alpha+1)\tau(1+\delta_{2r})}{1-\gamma} \|M\|_F + \frac{4(\alpha+1)\sqrt{1+\delta_{2r}}}{1-\gamma} \|z\|_2.

Note that ‖M_⊥‖_F ≤ ‖M‖_F, thus

\|D^n\|_F \le \gamma^n \|M\|_F + \frac{4(\alpha+1)\tau(1+\delta_{2r})}{1-\gamma} \|M\|_F + \frac{4(\alpha+1)\sqrt{1+\delta_{2r}}}{1-\gamma} \|z\|_2.

By using the above inequality and (3.8), we can finally deduce

\|X^n - M\|_F = \|D^n - D\|_F \le \|D^n\|_F + \|D\|_F \le \gamma^n \|M\|_F + \Big( \frac{4(\alpha+1)(1+\delta_{2r})}{1-\gamma} + 1 \Big) \tau \|M\|_F + \frac{4(\alpha+1)\sqrt{1+\delta_{2r}}}{1-\gamma} \|z\|_2. \qquad \Box


3.2. Extension to the general matrix case

Theorem 2.1 is proved by combining Theorem 3.1 and the following lemma, which shows how to convert the mismodeling error (deviation from a low rank matrix) into an equivalent additive measurement noise with a quantified norm. The proof of this lemma is similar to those of [34,2] and first appeared in [26]. We give the proof for completeness.

Lemma 3.10. Suppose that A satisfies the M-RIP of order 2r. Let X be an arbitrary n1 × n2 matrix. The measurement b = A(X) + z can also be represented as b = A(X_{[r]}) + \tilde z, where

\|\tilde z\|_2 \le \|z\|_2 + \sqrt{1+\delta_{2r}} \Big( \|X - X_{[r]}\|_F + \frac{\|X - X_{[r]}\|_*}{\sqrt{r}} \Big).

Proof. Decompose X = (X - X_{[r]}) + X_{[r]} to obtain b = A(X_{[r]}) + \tilde z, where \tilde z = z + A(X - X_{[r]}). To bound the size of the error term, we simply apply the triangle inequality and Lemma 3.3:

\|\tilde z\|_2 \le \|z\|_2 + \|\mathcal{A}(X - X_{[r]})\|_2 \le \|z\|_2 + \sqrt{1+\delta_{2r}} \Big( \|X - X_{[r]}\|_F + \frac{\|X - X_{[r]}\|_*}{\sqrt{r}} \Big). \qquad \Box

Now we are ready to prove Theorem 2.1.

Proof of Theorem 2.1. Write b = A(M) + z = A(M_{[r]}) + \tilde z. Applying Theorem 3.1 and Lemma 3.10, we get

\|X^n - M\|_F \le \gamma^n \|M_{[r]}\|_F + \Big( \frac{4(\alpha+1)(1+\delta_{2r})}{1-\gamma} + 1 \Big) \tau \|M_{[r]}\|_F + \frac{4(\alpha+1)\sqrt{1+\delta_{2r}}}{1-\gamma} \|\tilde z\|_2
\le \gamma^n \|M_{[r]}\|_F + \Big( \frac{4(\alpha+1)(1+\delta_{2r})}{1-\gamma} + 1 \Big) \tau \|M_{[r]}\|_F + \frac{4(\alpha+1)\sqrt{1+\delta_{2r}}}{1-\gamma} \|z\|_2 + \frac{4(\alpha+1)(1+\delta_{2r})}{1-\gamma} \Big( \|M - M_{[r]}\|_F + \frac{\|M - M_{[r]}\|_*}{\sqrt{r}} \Big). \qquad \Box

Acknowledgments

We would like to thank the referees for their valuable comments that improved the presentation of this paper.

References

[1] T. Blumensath, M. Davies, Iterative thresholding for sparse approximations, J. Fourier Anal. Appl. 14 (2008) 629–654.
[2] T. Blumensath, M. Davies, Iterative hard thresholding for compressive sensing, Appl. Comput. Harmon. Anal. 27 (2009) 265–274.
[3] R. Basri, D. Jacobs, Lambertian reflectance and linear subspaces, IEEE Trans. Pattern Anal. Mach. Intell. 25 (2003) 218–233.
[4] S. Bahmani, B. Raj, A unifying analysis of projected gradient descent for lp-constrained least squares, Appl. Comput. Harmon. Anal. 34 (2013) 366–378.
[5] E.J. Candès, The restricted isometry property and its implications for compressed sensing, C. R. Math. Acad. Sci. Paris, Ser. I 346 (2008) 589–592.
[6] E.J. Candès, Y. Plan, Tight oracle bounds for low-rank recovery from a minimal number of random measurements, IEEE Trans. Inform. Theory 57 (2011) 2342–2359.
[7] J.F. Cai, E.J. Candès, Z.W. Shen, A singular value thresholding algorithm for matrix completion, SIAM J. Optim. 20 (2010) 1956–1982.
[8] E.J. Candès, X. Li, Y. Ma, J. Wright, Robust principal component analysis? J. ACM 58 (2011).
[9] E.J. Candès, B. Recht, Exact matrix completion via convex optimization, Found. Comput. Math. 9 (2009) 717–772.
[10] E.J. Candès, J. Romberg, T. Tao, Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information, IEEE Trans. Inform. Theory 52 (2006) 489–509.
[11] E.J. Candès, J. Romberg, T. Tao, Stable signal recovery from incomplete and inaccurate measurements, Comm. Pure Appl. Math. 59 (2006) 1207–1223.
[12] E.J. Candès, T. Tao, The power of convex relaxation: Near-optimal matrix completion, IEEE Trans. Inform. Theory 56 (2010) 2053–2080.
[13] T. Cai, A. Zhang, Sharp RIP bound for sparse signal and low-rank matrix recovery, Appl. Comput. Harmon. Anal. 35 (2013) 74–93.
[14] I. Daubechies, M. Fornasier, I. Loris, Accelerated projected gradient method for linear inverse problems with sparsity constraints, J. Fourier Anal. Appl. 14 (2008) 764–792.
[15] M. Fazel, Matrix rank minimization with applications, PhD thesis, Stanford University, 2002.
[16] S. Foucart, Hard thresholding pursuit: An algorithm for compressive sensing, SIAM J. Numer. Anal. 49 (2011) 2543–2563.
[17] M. Fornasier, H. Rauhut, R. Ward, Low-rank matrix recovery via iteratively reweighted least squares minimization, SIAM J. Optim. 21 (2011) 1614–1640.
[18] D. Gross, Recovering low-rank matrices from few coefficients in any basis, IEEE Trans. Inform. Theory 57 (2011) 1548–1566.
[19] D. Gross, Y.K. Liu, S.T. Flammia, S. Becker, J. Eisert, Quantum state tomography via compressed sensing, Phys. Rev. Lett. 105 (2010) 150401.
[20] D. Goldfarb, S. Ma, Convergence of fixed point continuation algorithms for matrix rank minimization, Found. Comput. Math. 11 (2011) 183–210.
[21] R. Horn, C. Johnson, Matrix Analysis, Cambridge University Press, 1990.
[22] E.T. Hale, W. Yin, Y. Zhang, Fixed-point continuation for l1-minimization: Methodology and convergence, SIAM J. Optim. 19 (2008) 1107–1130.
[23] R.H. Keshavan, A. Montanari, S. Oh, Matrix completion from a few entries, IEEE Trans. Inform. Theory 56 (2010) 2980–2998.
[24] R.H. Keshavan, S. Oh, OptSpace: A gradient descent algorithm on the Grassman manifold for matrix completion, available at http://arxiv.org/abs/0910.5260, 2009.
[25] F. Kramer, R. Ward, New and improved Johnson–Lindenstrauss embeddings via the restricted isometry property, SIAM J. Math. Anal. 43 (2011) 1269–1281.


[26] K. Lee, Y. Bresler, ADMiRA: Atomic decomposition for minimum rank approximation, IEEE Trans. Inform. Theory 56 (2010) 4402–4416.
[27] Z. Liu, L. Vandenberghe, Interior-point method for nuclear norm approximation with application to system identification, SIAM J. Matrix Anal. Appl. 31 (2008) 1235–1256.
[28] K. Mohan, M. Fazel, Reweighted nuclear norm minimization with application to system identification, in: Proc. of the American Control Conference, 2010, pp. 2953–2959.
[29] K. Mohan, M. Fazel, Iterative reweighted algorithms for matrix rank minimization, J. Mach. Learn. Res. 13 (2012) 3253–3285.
[30] S. Ma, D. Goldfarb, L. Chen, Fixed point and Bregman iterative methods for matrix rank minimization, Math. Program. 128 (2011) 321–353.
[31] R. Mazumder, T. Hastie, R. Tibshirani, Spectral regularization algorithms for learning large incomplete matrices, J. Mach. Learn. Res. 11 (2010) 2287–2322.
[32] R. Meka, P. Jain, I.S. Dhillon, Guaranteed rank minimization via singular value projection, in: Proc. of the Neural Information Processing Systems Conference, 2010.
[33] B.K. Natarajan, Sparse approximate solutions to linear systems, SIAM J. Comput. 24 (1995) 227–234.
[34] D. Needell, J.A. Tropp, CoSaMP: Iterative signal recovery from noisy samples, Appl. Comput. Harmon. Anal. 26 (2008) 301–321.
[35] S. Osher, Y. Mao, B. Dong, W. Yin, Fast linearized Bregman iteration for compressive sensing and sparse denoising, Commun. Math. Sci. 8 (2010) 93–111.
[36] S. Oymak, K. Mohan, M. Fazel, B. Hassibi, A simplified approach to recovery conditions for low rank matrices, in: IEEE International Symposium on Information Theory Proceedings, ISIT, 2011, pp. 2318–2322.
[37] B. Recht, A simpler approach to matrix completion, J. Mach. Learn. Res. 12 (2011) 3413–3430.
[38] B. Recht, M. Fazel, P. Parrilo, Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization, SIAM Rev. 52 (2010) 471–501.
[39] N. Srebro, J.D.M. Rennie, T.S. Jakkola, Maximum-margin matrix factorization, in: Advances in Neural Information Processing Systems, 2005.
[40] K.C. Toh, S.W. Yun, An accelerated proximal gradient algorithm for nuclear norm regularized least squares problem, Pac. J. Optim. 6 (2010) 615–640.
[41] H. Wang, S. Li, The bounds of restricted isometry constants for low rank matrices recovery, Sci. China Math. 56 (2013) 1117–1127.
[42] W. Yin, S. Osher, D. Goldfarb, J. Darbon, Bregman iterative algorithms for l1-minimization with applications to compressed sensing, SIAM J. Imaging Sci. 1 (2008) 143–168.
[43] H. Zhang, L.Z. Cheng, Projected Landweber iteration for matrix completion, J. Comput. Appl. Math. 235 (2010) 593–601.