Journal of Computational and Applied Mathematics 263 (2014) 338–350
A reweighted nuclear norm minimization algorithm for low rank matrix recovery

Yu-Fan Li (a), Yan-Jiao Zhang (a), Zheng-Hai Huang (b,a,*)
(a) Department of Mathematics, School of Science, Tianjin University, Tianjin 300072, PR China
(b) Center for Applied Mathematics of Tianjin University, PR China
* Corresponding author at: Department of Mathematics, School of Science, Tianjin University, Tianjin 300072, PR China. E-mail: [email protected] (Z.-H. Huang).
This work was partially supported by the National Natural Science Foundation of China (Grant No. 11171252).

Article history: Received 19 July 2012; received in revised form 6 November 2013.
MSC: 65K05; 90C59; 15A60.
Keywords: Low rank matrix minimization; Matrix completion problem; Reweighted nuclear norm minimization; Weighted fixed point method.
Abstract

In this paper, we propose a reweighted nuclear norm minimization algorithm based on the weighted fixed point method (RNNM–WFP algorithm) to recover a low rank matrix, which iteratively solves an unconstrained L2–Mp minimization problem introduced as a nonconvex smooth approximation of the low rank matrix minimization problem. We prove that any accumulation point of the sequence generated by the RNNM–WFP algorithm is a stationary point of the L2–Mp minimization problem. Numerical experiments on randomly generated matrix completion problems indicate that the proposed algorithm has better recoverability compared to existing iteratively reweighted algorithms.
1. Introduction

The low rank matrix minimization problem (LRM for short) is to find a lowest rank matrix based on some feasible measurement ensembles. When the set of measurements is affine in the matrix variable, the LRM is given by

    min_X  rank(X)   s.t.  A(X) = b,                                                    (1.1)

where the linear map A : R^{m×n} → R^s and the vector b ∈ R^s are known. The LRM has various applications in image compression, statistics, embedding, system identification and control problems; and it is NP-hard. The tightest convex relaxation of problem (1.1) is the following nuclear norm minimization problem:

    min_X  ∥X∥_*   s.t.  A(X) = b,                                                      (1.2)

where ∥X∥_* := Σ_{i=1}^r σ_i(X) is the sum of all the singular values of X ∈ R^{m×n} with rank(X) = r (here n ≤ m without loss of generality). When X is restricted to be a diagonal matrix, problems (1.1) and (1.2) reduce to the sparse signal recovery problem:

    min_x  ∥x∥_0   s.t.  Ax = b                                                         (1.3)
and the l1 norm minimization problem:

    min_x  ∥x∥_1   s.t.  Ax = b,                                                        (1.4)

respectively, where ∥x∥_0 := |support(x)| := |{i | x_i ≠ 0}| is the cardinality of the support set of x ∈ R^n, and ∥x∥_1 := Σ_{i=1}^n |x_i| is the l1-norm of the vector x.

It is known that under some conditions on the linear transformation A (or the matrix A), one can obtain an exact solution of problem (1.1) (or (1.3)) via (1.2) (or (1.4)) [1,2]. Besides, problem (1.1) (or (1.3)) can also be relaxed to a nonconvex optimization problem, for example by replacing the term ∥X∥_* in (1.2) by the nonconvex term ∥X∥_p^p := Σ_{i=1}^n σ_i^p(X) (or replacing ∥x∥_1 in (1.4) by ∥x∥_p^p := Σ_{i=1}^n |x_i|^p) with 0 < p < 1. It was shown by Chartrand [3] that a nonconvex variant of (1.3) can produce exact reconstruction with fewer measurements. Recently, there has been an explosion of research on this topic in both the vector case and the matrix case, see, e.g., [4–14].

Some nonconvex smooth approximation models have also been proposed in the literature. In the vector case, one such model is the following unconstrained l2–lp minimization problem:

    min_x  λ Σ_{i=1}^n (|x_i| + ε)^p + ∥Ax − b∥_2^2,   where 0 < p ≤ 1.                 (1.5)
This problem can be solved by an iteratively reweighted method: given x^k at an iteration, it generates x^{k+1} as the unique solution of the reweighted l1 norm minimization problem min_x λ∥W^k x∥_1 + ∥Ax − b∥_2^2, where W^k = Diag{(|x_i^k| + ε)^{p−1} : i = 1, . . . , n}. Such a reweighted l1 norm minimization algorithm for sparse signal recovery was originally introduced by Candès, Wakin and Boyd [15]. In 2009, Foucart and Lai [6] proved that under a Restricted Isometry Property (RIP for short) condition, the reweighted l1 norm minimization algorithm with weights w_i^k = (|x_i^k| + ε)^{p−1} for every index i ∈ {1, . . . , n} can exactly recover the sparse signal, where p ∈ (0, 1) is a given parameter. Recently, Chen and Zhou [16] gave an unconstrained iteratively reweighted l1 minimization algorithm to solve (1.5), and further proved that under RIP/NSP (Null Space Property) type conditions the accumulation points of the sequence generated by the reweighted l1 norm minimization algorithm converge to a stationary point of (1.5). Moreover, Chen and Zhou [17] also proposed an effective globally convergent smoothing nonlinear conjugate gradient method for solving nonconvex minimization problems, which guarantees that any accumulation point of a sequence generated by this method is a Clarke stationary point of the problem concerned.

In the matrix case, Mohan and Fazel [18] proposed an iteratively reweighted least squares (IRLS for short) algorithm (see, e.g., [18–21]) for minimizing the smooth Schatten-p function f_p(X) = Tr(X^T X + γ I)^{p/2}, i.e.,

    min_X  f_p(X)   s.t.  A(X) = b,   where 0 < p < 1.
Recently, Lai et al. [21] considered the unconstrained minimization problem:

    min_X  Tr(X^T X + ε^2 I)^{p/2} + (1/(2λ)) ∥A(X) − b∥_2^2,   where 0 < p < 1.

Given W^k at an iteration, the IRLS algorithm [21] generates X^{k+1} by

    X^{k+1} := arg min_X  Tr(W_p^k X^T X) + (1/λ) ∥A(X) − b∥_2^2,

where W_p^k = p((X^k)^T X^k + (ε^k)^2 I)^{p/2−1}. In both [18,21], the regularization parameters γ and ε are updated by the same rule as in [22], i.e., γ^k = min{γ^{k−1}, γ_s σ_{r+1}(X^k)} and ε^k = min{ε^{k−1}, γ_s σ_{r+1}(X^k)}, with γ_s < 1 and r being a guesstimate of the rank.

In this paper, we first introduce an unconstrained smooth nonconvex approximation of problem (1.1), the following L2–Mp problem:

    min_X  λ∥X∥_{p,ε} + ∥A(X) − b∥_2^2,                                                 (1.6)

where λ is a sufficiently small parameter and ∥X∥_{p,ε} := Σ_{i=1}^n (σ_i(X) + ε)^p with ε > 0 and p ∈ (0, 1). For any fixed scalar ε > 0 and p ∈ (0, 1), ∥·∥_{p,ε} is subdifferentiable; and it satisfies ∥X∥_{p,ε} → rank(X) as ε → 0 and p → 0. We propose a reweighted nuclear norm minimization algorithm for solving problem (1.6), i.e.,

    X^{k+1} := arg min_X  λp Σ_{i=1}^n w_i^k σ_i(X) + ∥A(X) − b∥_2^2,                    (1.7)

where w_i^k = (σ_i(X^k) + ε)^{p−1} for all i ∈ {1, 2, . . . , n} and σ_i(X^k) is the i-th singular value of X^k. We call the weighted form Σ_{i=1}^n w_i^k σ_i(X) the weighted nuclear norm. It is not a norm strictly speaking, but it is convex. Thus, the iteration (1.7) is called a reweighted nuclear norm minimization algorithm. For the implementation of the iteration (1.7), how to solve the
problem

    min_X  λp Σ_{i=1}^n w_i σ_i(X) + ∥A(X) − b∥_2^2                                     (1.8)
with known w_i for all i ∈ {1, 2, . . . , n} is a key issue. We propose a weighted fixed point method for solving (1.8). We also prove that any accumulation point of the sequence generated by the reweighted nuclear norm minimization algorithm based on the proposed weighted fixed point method (RNNM–WFP algorithm for short) is a stationary point of the L2–Mp minimization problem (1.6). Preliminary numerical experiments on randomly generated matrix completion problems indicate that this algorithm has better recoverability than some existing algorithms.

The rest of this paper is organized as follows. In Section 2, we discuss the subdifferential of the new smooth approximation term ∥X∥_{p,ε} and the subdifferential of the weighted nuclear norm of a matrix X. In Section 3, we propose an RNNM–WFP algorithm to solve problem (1.6). The convergence results are discussed in Section 4, and numerical results are reported in Section 5. In Section 6, we give some conclusions.

Throughout this paper, R^n denotes the space of n-dimensional real column vectors, and R^{m×n} denotes the space of m × n real matrices. Let I = {1, 2, . . . , n} and Diag{u_i : i ∈ I} (or Diag(u) with u ∈ R^n) denote the diagonal matrix whose i-th diagonal element is u_i. For a matrix X ∈ R^{m×n} of rank r, we write its singular value decomposition (SVD for short) as X = UΣV^T with U ∈ R^{m×n}, V ∈ R^{n×n} and Σ = Diag(σ(X)). Here, σ(X) ∈ R^n denotes the singular value vector of X, and we assume σ_1(X) ≥ · · · ≥ σ_n(X) and n ≤ m without loss of generality.

2. Preliminaries

In this section, we give some basic concepts and preliminary results which are used in our analysis. In particular, we give the subdifferential of the objective function of problem (1.6), and then generalize the subdifferential of the nuclear norm to the weighted form.

2.1. The subdifferential of singular value functions

In this subsection, we give the subdifferential of the objective function of problem (1.6). In order to characterize it, we first introduce some basic concepts and preliminary results.

Definition 2.1. (i) (Absolutely symmetric function) f : R^n → R is called an absolutely symmetric function if f(x_1, x_2, . . . , x_n) = f(|x_{π(1)}|, |x_{π(2)}|, . . . , |x_{π(n)}|) holds for any permutation π of {1, . . . , n}.
(ii) (Orthogonally invariant function) A function F : R^{m×n} → [−∞, +∞], with n ≤ m, is orthogonally invariant if F(U^T X V) = F(X) holds for all X ∈ R^{m×n} and all orthogonal matrices U ∈ R^{m×n} and V ∈ R^{n×n}.

The relationship between absolutely symmetric functions and orthogonally invariant functions was essentially worked out in [23] (see also, for instance, [24, Proposition 5.1]). We state it here as the following proposition.

Proposition 2.1. A function F : R^{m×n} → [−∞, +∞] with n ≤ m is orthogonally invariant if and only if F(X) = f ∘ σ(X) for some absolutely symmetric function f : R^n → [−∞, +∞].

In the following, we recall a very useful formula for the subdifferential of a singular value function F at X ∈ R^{m×n} in terms of the subdifferential of the corresponding absolutely symmetric function f at the singular value vector σ(X) [24, Theorem 7.1].

Proposition 2.2. Let f : R^n → [−∞, +∞] be an absolutely symmetric function. Then the subdifferential of the corresponding singular value function f ∘ σ at a matrix X ∈ R^{m×n} is given by the formula ∂(f ∘ σ)(X) = U Diag(∂f(σ(X))) V^T, where X = UΣV^T with U ∈ R^{m×n}, V ∈ R^{n×n} and Σ = Diag(σ(X)) ∈ R^{n×n}.
Denote the regularization term in the objective function of problem (1.6) by

    F(X) := Σ_{i=1}^n (|σ_i(X)| + ε)^p = f ∘ σ(X),

with F : R^{m×n} → [−∞, +∞] being orthogonally invariant and f : R^n → [−∞, +∞] being absolutely symmetric; here f(t) = Σ_{i=1}^n (|t_i| + ε)^p for any t ∈ R^n and 0 < p < 1. Then, by using Proposition 2.2, we show that the subdifferential of F can be given as below.

Lemma 2.1. Let F : R^{m×n} → [−∞, +∞] and f : R^n → [−∞, +∞] be defined as above. Then F = f ∘ σ is subdifferentiable at any matrix X ∈ R^{m×n} and

    ∂F(X) = p U Diag{ c_i / (σ_i(X) + ε)^{1−p} : i ∈ I } V^T,

with X = UΣV^T being the SVD of X, and

    c_i = 1            if σ_i(X) > 0,
    c_i ∈ [−1, 1]      if σ_i(X) = 0,

a constant that depends only on the value of σ_i(X) for each i ∈ I.
Proof. Since F(X) = Σ_{i=1}^n (|σ_i(X)| + ε)^p = f ∘ σ(X) with f : R^n → [−∞, +∞] defined by f(t) = Σ_{i=1}^n (|t_i| + ε)^p for any t ∈ R^n and 0 < p < 1, the following three statements hold:

• f is an absolutely symmetric function by Definition 2.1;
• F is orthogonally invariant by Proposition 2.1; and
• the subdifferential of f at t ∈ R^n can be written as

      ∂f(t) = ( p(|t_1| + ε)^{p−1} ∂|t_1|, . . . , p(|t_n| + ε)^{p−1} ∂|t_n| )^T ∈ R^n,

  where

      ∂|t_i| = 1            if t_i > 0,
      ∂|t_i| ∈ [−1, 1]      if t_i = 0,
      ∂|t_i| = −1           if t_i < 0.

Thus, by Proposition 2.2, it follows that

    ∂F(X) = ∂(f ∘ σ)(X) = p U Diag{ c_i / (σ_i(X) + ε)^{1−p} : i ∈ I } V^T,

which completes the proof.
Let L_λ(X, ε) denote the objective function of problem (1.6), that is,

    L_λ(X, ε) := λ Σ_{i=1}^n (σ_i(X) + ε)^p + ∥A(X) − b∥_2^2.                            (2.1)

Then, by Lemma 2.1, the first-order necessary optimality condition of problem (1.6) is given by

    0 ∈ ∂_X L_λ(X, ε) = λp U Diag{ ∂|σ_i(X)| / (σ_i(X) + ε)^{1−p} : i ∈ I } V^T + 2A*(A(X) − b).    (2.2)
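As a quick illustration of (2.2) (not part of the paper), the following NumPy sketch evaluates the formula in the smooth case where all singular values of X are positive, so the subdifferential reduces to a single gradient, and compares it with finite differences. The dense matrix A acting on vec(X), the dimensions and all parameter values are made-up examples.

```python
import numpy as np

# Numerical check of (2.2) in the smooth case (all sigma_i(X) > 0), where
#   grad L_lambda(X, eps) = lam*p * U diag((sigma_i + eps)^(p-1)) V^T + 2 A^*(A(X) - b).
# Here A is a dense s x (m*n) matrix acting on vec(X); all values are illustrative.
rng = np.random.default_rng(0)
m, n, s = 8, 6, 20
p, eps, lam = 0.5, 1e-2, 1e-3
A = rng.standard_normal((s, m * n))
b = rng.standard_normal(s)
X = rng.standard_normal((m, n))            # generic X: its singular values are all positive

def objective(X):
    sig = np.linalg.svd(X, compute_uv=False)
    return lam * np.sum((sig + eps) ** p) + np.sum((A @ X.ravel() - b) ** 2)

def gradient(X):
    U, sig, Vt = np.linalg.svd(X, full_matrices=False)
    grad_reg = lam * p * (U * (sig + eps) ** (p - 1)) @ Vt      # U diag(.) V^T term
    grad_fit = 2 * (A.T @ (A @ X.ravel() - b)).reshape(m, n)    # 2 A^*(A(X) - b)
    return grad_reg + grad_fit

G, G_fd, h = gradient(X), np.zeros_like(X), 1e-6
for i in range(m):
    for j in range(n):
        E = np.zeros((m, n)); E[i, j] = h
        G_fd[i, j] = (objective(X + E) - objective(X - E)) / (2 * h)
print("max abs gradient error:", np.max(np.abs(G - G_fd)))
```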
2.2. The subdifferential of the weighted nuclear norm

In this subsection, we give the subdifferential of the weighted nuclear norm, which is a generalization of the nuclear norm. The weighted nuclear norm can be defined by f_W(X) = Σ_{i=1}^n w_i σ_i(X) = φ(σ), where W = Diag{w_i : i ∈ I} is known and σ := (σ_1(X), . . . , σ_n(X))^T. Hence, φ(σ) is an absolutely symmetric function by Definition 2.1. For simplicity, we denote f(X) = f_W(X) for any given weight matrix W ∈ R^{n×n}. Combining this with [23, Theorem 1], and in a similar way as in [23, Theorem 2], it is easy to show that the subdifferential of f(X) = φ(σ) in X is given by

    ∂f(X) = ∂_X φ(σ) = conv{ U D V^T : X = UΣV^T and D = Diag{d : d ∈ ∂φ(σ)} },          (2.3)

where conv S is the convex hull of the set S. Let the SVD of the matrix X with rank r be X = UΣV^T, with the matrices partitioned as U = [U^0  U^1] and V = [V^0  V^1], where U^0 and V^0 have r columns. Let W = Diag{w_i : i ∈ I} be partitioned accordingly as W = Diag(W_r, W_{n−r}) with W_r ∈ R^{r×r}. Recall that

    ∂φ(σ) = { x ∈ R^n | x_i = w_i, i = 1, . . . , r;  |x_i| ≤ w_i, i = r + 1, . . . , n }.

Let G ∈ ∂_X φ(σ). Then it follows from (2.3) that G = Σ_i λ_i U_i D_i V_i^T with λ_i ≥ 0 and Σ_i λ_i = 1, where for each i, X = U_i Σ V_i^T with U_i and V_i having the same partition as U and V, and d_i ∈ ∂φ(σ). Then G = U^0 W_r (V^0)^T + Σ_i λ_i U_i^1 W_{n−r} T_i (V_i^1)^T, where T_i is an (n − r) × (n − r) diagonal matrix whose diagonal elements are less than or equal to 1 in magnitude. Thus, there exist orthogonal matrices Y_i, Z_i ∈ R^{(n−r)×(n−r)} such that U_i^1 = U^1 Y_i, V_i^1 = V^1 Z_i, and

    G = U^0 W_r (V^0)^T + Σ_i λ_i U^1 Y_i W_{n−r} T_i Z_i^T (V^1)^T = U^0 W_r (V^0)^T + U^1 W̄_{n−r} (V^1)^T

with W̄_{n−r} = Σ_i λ_i Y_i W_{n−r} T_i Z_i^T satisfying

    σ_1(W̄_{n−r}) = σ_1( Σ_i λ_i Y_i W_{n−r} T_i Z_i^T ) ≤ Σ_i λ_i σ_1(W_{n−r} T_i) ≤ σ_1(W_{n−r}),

since the diagonal elements of T_i are less than or equal to 1 in magnitude. Thus, we obtain the following lemma.

Lemma 2.2. Let all symbols be given as above. Then the subdifferential of f(X) = Σ_{i=1}^n w_i σ_i(X) is

    ∂f(X) = { U^0 W_r (V^0)^T + U^1 W̄_{n−r} (V^1)^T : σ_1(W̄_{n−r}) ≤ σ_1(W_{n−r}) }.
3. Algorithm design

In this section, we propose a weighted fixed point method for solving (1.8), and then give a reweighted nuclear norm minimization algorithm based on the weighted fixed point method for solving problem (1.6).

3.1. Weighted fixed point method

How to solve (1.8) is the key to the iteration (1.7). In this subsection, we propose a weighted fixed point method to solve (1.8). It should be noted that the fixed point iterative algorithm, which uses an operator splitting technique, was first proposed in [25] for the l1-regularized problem, and was then developed by Ma et al. [26] for solving the unconstrained nuclear norm minimization problem.

Since the objective function in (1.8) is convex in X, X* is an optimal solution if and only if 0 ∈ λp ∂ Σ_{i=1}^n w_i σ_i(X*) + g(X*), where g(X) = 2A*(A(X) − b). Now, by using the operator splitting technique, the above condition is equivalent to

    0 ∈ τλp ∂ Σ_{i=1}^n w_i σ_i(X*) + X* − (X* − τ g(X*))

for any τ > 0. Let Y* = X* − τ g(X*); then 0 ∈ τλp ∂ Σ_{i=1}^n w_i σ_i(X*) + X* − Y*, which states that X* is an optimal solution of

    min_X  τλp Σ_{i=1}^n w_i σ_i(X) + (1/2) ∥X − Y*∥_2^2.                                (3.1)
Let X = UΣV^T be the SVD of X ∈ R^{m×n} with rank r. For each ν ≥ 0, we define the weighted singular value shrinkage operator D_ν(·, W) by

    D_ν(X, W) = U D_ν(Σ, W) V^T   with   D_ν(Σ, W) = Diag{(σ_i(X) − ν w_i)_+ : i ∈ I},

where the plus function (·)_+ is defined by a_+ := max{a, 0} for any a ∈ R. In the following, we prove that the weighted singular value shrinkage operator D_ν(·, W) is a proximity operator associated with the weighted nuclear norm.

Lemma 3.1. For each ν ≥ 0, W = Diag{w_i : i ∈ I}, and Y ∈ R^{m×n}, the weighted singular value shrinkage operator D_ν(Y, W) is an optimal solution of the following problem:

    min_X  ν Σ_{i=1}^n w_i σ_i(X) + (1/2) ∥X − Y∥_F^2.                                   (3.2)
Proof. For W = Diag{w_i : i ∈ I}, we denote f(X) = f_W(X) = Σ_{i=1}^n w_i σ_i(X) for simplicity. Let X̄ = D_ν(Y, W); we need to show that X̄ obeys the optimality condition of (3.2):

    0 ∈ ν ∂f(X̄) + X̄ − Y.

For this purpose, we rewrite the SVD of Y ∈ R^{m×n} as Y = U^(1) Σ^(1) (V^(1))^T + U^(2) Σ^(2) (V^(2))^T, where Σ^(1) is a diagonal matrix whose diagonal elements are those σ_i(Y) satisfying σ_i(Y) ≥ ν w_i, and Σ^(2) is a diagonal matrix with the other σ_i(Y) as its diagonal elements. Let I_1 = {i | σ_i(Y) ≥ ν w_i}, I_2 = I \ I_1, and W^(1) = Diag{w_i : i ∈ I_1}, W^(2) = Diag{w_i : i ∈ I_2}. Then,

    X̄ = D_ν(Y, W) = U^(1) (Σ^(1) − ν W^(1)) (V^(1))^T,

and hence,

    Y − X̄ = ν [ U^(1) W^(1) (V^(1))^T + U^(2) ((1/ν) Σ^(2)) (V^(2))^T ]

with σ_1((1/ν)Σ^(2)) = (1/ν) σ_1(Σ^(2)) = (1/ν) max_{i∈I_2} σ_i(Y) = (1/ν) σ_{i_0}(Y) < w_{i_0} ≤ max_{i∈I_2} w_i = σ_1(W^(2)), where i_0 ∈ I_2 attains the maximum. By Lemma 2.2, there exists a matrix W̄ such that

    ∂f(X̄) = { U^(1) W^(1) (V^(1))^T + U^(2) W̄ (V^(2))^T : σ_1(W̄) ≤ σ_1(W^(2)) }.

Thus, Y − X̄ ∈ ν ∂f(X̄), which completes the proof.

By Lemma 3.1, we can easily derive the following theorem.

Theorem 3.1. Given τ > 0, λ > 0, p ∈ (0, 1), and W = Diag{w_i : i ∈ I}, X* is an optimal solution of problem (1.8) if and only if X* = D_{τλp}(h(X*), W), where h(·) = I(·) − τ g(·) with g(X) = 2A*(A(X) − b).

Proof. Obviously, X* is an optimal solution of problem (1.8) if and only if X* is an optimal solution of problem (3.1). Taking Y = Y* and ν = τλp in (3.2), the theorem follows easily from Lemma 3.1.
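For illustration only (this sketch is ours, not code from the paper), the weighted singular value shrinkage operator D_ν(·, W) can be implemented in a few lines of NumPy; by Lemma 3.1 it is the proximity operator of the weighted nuclear norm.

```python
import numpy as np

def weighted_svt(Y, w, nu):
    """Weighted singular value shrinkage D_nu(Y, W) with W = Diag(w):
    each singular value sigma_i(Y) is replaced by (sigma_i(Y) - nu * w_i)_+ ,
    pairing weights and singular values by index (both in descending order)."""
    U, sig, Vt = np.linalg.svd(Y, full_matrices=False)
    sig_shrunk = np.maximum(sig - nu * w, 0.0)
    return (U * sig_shrunk) @ Vt

# By Lemma 3.1, weighted_svt(Y, w, nu) minimizes
#   nu * sum_i w_i * sigma_i(X) + 0.5 * ||X - Y||_F^2  over X.
```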
Given τ > 0, λ > 0, p ∈ (0, 1), and W = Diag{w_i : i ∈ I}, and starting with X^0 = 0_{m×n}, the weighted fixed point iterative method for solving problem (1.8) inductively defines

    Y^k = X^k − τ g(X^k)   and   X^{k+1} = D_{τλp}(Y^k, W)

until a stopping criterion is reached.

3.2. An RNNM–WFP algorithm

In this subsection, we propose an RNNM–WFP algorithm for solving problem (1.6), which is given as follows.

Algorithm 3.1 (RNNM–WFP Algorithm).
Input: linear operator A : R^{m×n} → R^s, vector b ∈ R^s, and estimated rank R.
Output: matrix X* ∈ R^{m×n}.
Step 0: Given p ∈ (0, 1), choose parameters λ > 0, τ ∈ (0, 1], and a positive constant γ_s < 1. Initialize X^0 = 0_{m×n}, ε^0 > 0. Set k := 0 and decompose X^0.
Step 1: Compute W^k by W^k = Diag{(σ_i(X^k) + ε^k)^{p−1} : i ∈ I}.
Step 2: Compute X^{k+1} by X^{k+1} = D_{τλp}(X^k − τ g(X^k), W^k).
Step 3: Update ε^{k+1} by ε^{k+1} = min{ε^k, γ_s σ_{R+1}(X^{k+1})}.
Step 4: If converged, set X* := X^{k+1}; otherwise, set k := k + 1 and go to Step 1.
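To show how the steps of Algorithm 3.1 fit together, here is a hedged NumPy sketch specialized to the matrix completion case A = P_Ω used later in Section 5, so that g(X) = 2 P_Ω(X − M). It computes a full SVD at each iteration for simplicity; the function name, defaults, and the zero-filled M_obs convention are our own choices, not prescribed by the paper.

```python
import numpy as np

def rnnm_wfp_completion(M_obs, mask, R, p=0.2, lam=1e-6, tau=1.0,
                        gamma_s=0.9, eps0=1.0, tol=1e-6, maxit=1000):
    """Sketch of Algorithm 3.1 for matrix completion (A = P_Omega).
    M_obs: observed entries of M (zeros elsewhere); mask: boolean array for Omega;
    R: estimated rank used in the eps update. Parameter defaults are illustrative."""
    m, n = M_obs.shape
    X = np.zeros((m, n))                                    # Step 0: X^0 = 0
    eps = eps0
    for _ in range(maxit):
        sig = np.linalg.svd(X, compute_uv=False)            # sigma(X^k), descending
        w = (sig + eps) ** (p - 1.0)                         # Step 1: W^k = Diag{(sigma_i + eps)^(p-1)}
        grad = 2.0 * mask * (X - M_obs)                      # g(X^k) = 2 P_Omega(X^k - M)
        U, s, Vt = np.linalg.svd(X - tau * grad, full_matrices=False)
        X_new = (U * np.maximum(s - tau * lam * p * w, 0.0)) @ Vt   # Step 2: D_{tau*lam*p}(Y^k, W^k)
        sig_new = np.linalg.svd(X_new, compute_uv=False)
        if R < sig_new.size:
            eps = min(eps, gamma_s * sig_new[R])             # Step 3: eps <- min{eps, gamma_s * sigma_{R+1}}
        if np.linalg.norm(X_new - X, 'fro') < tol:           # Step 4: stopping test ||X^k - X^{k+1}||_F < tol
            return X_new
        X = X_new
    return X
```

For the experiments reported in Section 5.3 one would pass p = 0.2, lam = 1e−6 and an estimated rank R, matching the settings stated there; in practice the full SVD would be replaced by a rank-R truncated SVD, as discussed in Section 5.1.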
4. Convergence results

In this section, we discuss the convergence of the RNNM–WFP algorithm (i.e., Algorithm 3.1). We prove that any accumulation point of the sequence generated by the RNNM–WFP algorithm is a stationary point of problem (1.6). We define a functional by

    J(X, W, ε) := p [ Σ_{i=1}^n w_i σ_i(X) + ε tr(W) + ((1 − p)/p) tr(W^{−p/(1−p)}) ].   (4.1)
Then, the iteration formula (1.7) is equivalent to

    X^{k+1} := arg min_X  λ J(X, W^k, ε^k) + ∥A(X) − b∥_2^2.
Here, we update the regularization parameter ε in the algorithm as ε^{k+1} = min{ε^k, γ_s σ_{R+1}(X^{k+1})} with γ_s < 1 and R being a guesstimate of the rank, which follows the same rule as in [18,21,22].

Lemma 4.1. Given X ∈ R^{m×n} and ε > 0, W* = Diag{(σ_i(X) + ε)^{−(1−p)} : i ∈ I} is a solution to min_W J(X, W, ε); and J(X, W*, ε) = Σ_{i=1}^n (σ_i(X) + ε)^p.
Proof. By the definition of J(X, W, ε) (see (4.1)), we have that

    J(X, W, ε) = p [ Σ_{i=1}^n w_i σ_i(X) + ε Σ_{i=1}^n w_i + ((1 − p)/p) Σ_{i=1}^n (w_i)^{−p/(1−p)} ]
               = p Σ_{i=1}^n [ (σ_i(X) + ε) w_i + ((1 − p)/p) (w_i)^{−p/(1−p)} ]
               ≥ Σ_{i=1}^n (σ_i(X) + ε)^p,

where the inequality follows from the fact that w_i* = (σ_i(X) + ε)^{−(1−p)} is an optimal solution of the following convex problem:

    min_{w_i > 0}  (σ_i(X) + ε) w_i + ((1 − p)/p) w_i^{−p/(1−p)},                        (4.2)

since w_i* satisfies the first order condition of (4.2). Thus, the results of the lemma hold.

Remark 4.1. By Lemma 4.1, we actually have that

    J(X^k, W^k, ε) := min_W J(X^k, W, ε) = Σ_{i=1}^n (σ_i(X^k) + ε)^p,

where W^k = Diag{w_i^k : i ∈ I} with w_i^k = (σ_i(X^k) + ε)^{−(1−p)} for i ∈ I. By (2.1) we have

    L_λ(X^k, ε) = λ J(X^k, W^k, ε) + ∥A(X^k) − b∥_2^2.
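The identity in Lemma 4.1 is easy to confirm numerically; the following short NumPy check (illustrative only) verifies that with the closed-form weights the functional J in (4.1) collapses to Σ_i (σ_i(X) + ε)^p.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((7, 5))
p, eps = 0.3, 1e-2
sig = np.linalg.svd(X, compute_uv=False)

w_star = (sig + eps) ** -(1.0 - p)                   # minimizing weights from Lemma 4.1
J = p * (np.sum(w_star * sig) + eps * np.sum(w_star)
         + (1.0 - p) / p * np.sum(w_star ** (-p / (1.0 - p))))
print(np.allclose(J, np.sum((sig + eps) ** p)))      # True: J(X, W*, eps) = sum_i (sigma_i + eps)^p
```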
In the following, we prove that the sequence {L_λ(X^k, ε^k)} is nonincreasing.

Lemma 4.2. Let {X^k} be the sequence generated by Algorithm 3.1. Then L_λ(X^k, ε^k) ≥ L_λ(X^{k+1}, ε^{k+1}) holds for all k ≥ 1. Thus, the sequence {L_λ(X^k, ε^k)} is nonincreasing and convergent.

Proof. Since W^k = Diag{(σ_i(X^k) + ε^k)^{p−1} : i ∈ I} is a solution to min_W J(X^k, W, ε^k) by Lemma 4.1, it is easy to see that

    J(X^{k+1}, W^{k+1}, ε^{k+1}) ≤ J(X^{k+1}, W^k, ε^{k+1}) ≤ J(X^{k+1}, W^k, ε^k),       (4.3)

where the second inequality follows from the definition of J(X, W, ε) and the fact that ε^k is decreasing in Algorithm 3.1. In addition, by Step 2 of Algorithm 3.1 and Lemma 3.1 we obtain

    X^{k+1} := arg min_X  λp Σ_{i=1}^n w_i^k σ_i(X) + ∥A(X) − b∥_2^2.

This further yields X^{k+1} := arg min_X λ J(X, W^k, ε^k) + ∥A(X) − b∥_2^2 by the definition of J(X, W, ε). Thus, we obtain

    λ J(X^{k+1}, W^k, ε^k) + ∥A(X^{k+1}) − b∥_2^2 ≤ λ J(X^k, W^k, ε^k) + ∥A(X^k) − b∥_2^2.    (4.4)

Furthermore, we have

    L_λ(X^k, ε^k) − L_λ(X^{k+1}, ε^{k+1})
      = λ J(X^k, W^k, ε^k) + ∥A(X^k) − b∥_2^2 − λ J(X^{k+1}, W^{k+1}, ε^{k+1}) − ∥A(X^{k+1}) − b∥_2^2
      ≥ λ J(X^{k+1}, W^k, ε^k) + ∥A(X^{k+1}) − b∥_2^2 − λ J(X^{k+1}, W^{k+1}, ε^{k+1}) − ∥A(X^{k+1}) − b∥_2^2
      = λ J(X^{k+1}, W^k, ε^k) − λ J(X^{k+1}, W^{k+1}, ε^{k+1})
      ≥ λ J(X^{k+1}, W^k, ε^k) − λ J(X^{k+1}, W^{k+1}, ε^k)
      ≥ 0,

where the first equality follows from Remark 4.1, the first inequality from (4.4), and the last two inequalities from (4.3).
Now, we show the convergence of Algorithm 3.1.

Theorem 4.1. Given λ > 0 and p ∈ (0, 1), let ε_min := lim_{k→∞} ε^k > 0. Then the sequence {X^k} generated by Algorithm 3.1 is bounded and every accumulation point of the sequence is a stationary point of (1.6) with ε = ε_min.

Proof. By Lemma 4.2, the sequence {L_λ(X^k, ε^k)} is nonincreasing and convergent. This, together with ε^k > 0, λ > 0 and the definition of L_λ(·, ·) in (2.1), implies that the sequence {X^k} is bounded. Thus, there exists a subsequence {X^{k_j}} of {X^k} which converges to some X̃. By the iteration in Algorithm 3.1, we have X^{k_j+1} = D_{τλp}(X^{k_j} − τ g(X^{k_j}), W^{k_j}) with W^{k_j} = Diag{(σ_i(X^{k_j}) + ε^{k_j})^{p−1} : i ∈ I}. Since the plus function (·)_+ is continuous, it follows from the definition of the weighted singular value shrinkage operator D_ν(·, ·) that D_{τλp}(·, ·) is continuous, which, together with the continuity of g(·), implies that

    X^{k_j+1} = D_{τλp}(X^{k_j} − τ g(X^{k_j}), W^{k_j}) → D_{τλp}(X̃ − τ g(X̃), W̃) =: X̂.

This shows that {X^{k_j+1}} is another convergent subsequence of {X^k}, converging to X̂. Moreover, by Step 2 of Algorithm 3.1 and Lemma 3.1, we have

    X^{k_j+1} := arg min_X  λp Σ_{i=1}^n w_i^{k_j} σ_i(X) + ∥A(X) − b∥_2^2.               (4.5)

It further implies that

    0 ∈ λp ∂ Σ_{i=1}^n w_i^{k_j} σ_i(X^{k_j+1}) + 2A*(A(X^{k_j+1}) − b).

By Lemma 2.2, there exist U^{k_j+1} ∈ R^{m×n} and V^{k_j+1} ∈ R^{n×n} such that

    0 = λp [ U_0^{k_j+1} W_r^{k_j} (V_0^{k_j+1})^T + U_1^{k_j+1} W̄_{n−r}^{k_j} (V_1^{k_j+1})^T ] + 2A*(A(X^{k_j+1}) − b),    (4.6)
where W^{k_j} = Diag(W_r^{k_j}, W_{n−r}^{k_j}), X^{k_j+1} = U^{k_j+1} Σ^{k_j+1} (V^{k_j+1})^T is the singular value decomposition with the matrices partitioned as U^{k_j+1} = [U_0^{k_j+1}  U_1^{k_j+1}] and V^{k_j+1} = [V_0^{k_j+1}  V_1^{k_j+1}], where U_0^{k_j+1} and V_0^{k_j+1} have r columns, and σ_1(W̄_{n−r}^{k_j}) ≤ σ_1(W_{n−r}^{k_j}). Letting j → ∞, (4.6) becomes

    0 = λp [ Û_0 W̃_r (V̂_0)^T + Û_1 Ŵ_{n−r} (V̂_1)^T ] + 2A*(A(X̂) − b)                     (4.7)

with U^{k_j+1} → Û, V^{k_j+1} → V̂, W^{k_j} → W̃, W̄_{n−r}^{k_j} → Ŵ_{n−r}, and σ_1(Ŵ_{n−r}) ≤ σ_1(W̃_{n−r}). Thus, by combining (4.7) with Lemma 2.2, we conclude that X̂ is an optimal solution of

    min_X  λp Σ_{i=1}^n w̃_i σ_i(X) + ∥A(X) − b∥_2^2.                                      (4.8)
• In fact, if X˜ is a minimizer of problem (4.8), then W˜ r ˜ 0 ∈ λp U V˜ T + 2A∗ (A(X˜ ) − b). ¯ Wn˜−r T W˜ r ˜¯ ˜ Let B = satisfy that σi (B) = w ˜ i , i = 1, . . . , r, and σi (B) ≤ σ1 (W ˜ i, i = n−r T ) ≤ σ1 (Wn−r ) = maxi=r +1,...,n w ¯ Wn˜−r T 1, σi (X˜ ) > 0, r + 1, . . . , n with w ˜ i = (σi (X˜ ) + εmin )p−1 for i ∈ I. Then, there exists ci ∈ ∂|σi (X˜ )| satisfying ci = such ∈ [−1, 1], σ (X˜ ) = 0 i
that
˜ 0 = λp UDiag
ci
(σi (X˜ ) + εmin )1−p
: i ∈ I V˜ T + 2A∗ (A(X˜ ) − b),
which demonstrates by (2.2) that X˜ is a stationary point of (1.6). Thus, by the assumption we know that X˜ is not a minimizer of (4.8). Since Xˆ minimizes (4.8), we have
λp
n i =1
w ˜ i σi (Xˆ ) + ∥A(Xˆ ) − b∥22 < λp
n
w ˜ i σi (X˜ ) + ∥A(X˜ ) − b∥22 .
i=1
By the definition of J (X , W , ε) given in (4.1), the above inequality is equivalent to
˜ , εmin ) + ∥A(Xˆ ) − b∥22 < λJ (X˜ , W ˜ , εmin ) + ∥A(X˜ ) − b∥22 . λJ (Xˆ , W Moreover, from Remark 4.1, we have that Lλ (X kj , ε kj ) = λJ (X kj , W kj , ε kj ) + ∥A(X kj ) − b∥22 ; Lλ (X kj +1 , ε kj +1 ) = λJ (X kj +1 , W kj +1 , ε kj +1 ) + ∥A(X kj +1 ) − b∥22 . Since the sequence {Lε (X k , λ)} converges by Lemma 4.2, it holds that
˜ , εmin ) + ∥A(X˜ ) − b∥22 = lim λJ (X kj , W kj , εkj ) + ∥A(X kj ) − b∥22 λJ (X˜ , W j→∞
= lim Lλ (X kj , εkj ) = lim Lλ (X kj +1 , εkj +1 ) j→∞
j→∞
= lim λJ (X kj +1 , W kj +1 , εkj +1 ) + ∥A(X kj +1 ) − b∥22 j→∞
≤ lim λJ (X kj +1 , W kj , εkj ) + ∥A(X kj +1 ) − b∥22 j→∞
˜ , εmin ) + ∥A(Xˆ ) − b∥22 , = λJ (Xˆ , W where the inequality follows from (4.3). This contradicts (4.9), which indicates that X˜ is a stationary point of (1.6).
(4.9)
346
Y.-F. Li et al. / Journal of Computational and Applied Mathematics 263 (2014) 338–350
Hence, the sequence {X^k} generated by Algorithm 3.1 is bounded and every accumulation point of the sequence is a stationary point of (1.6).

5. Numerical results

In this section, we report numerical results for the RNNM–WFP algorithm (i.e., Algorithm 3.1). In the experiments, we solve the following matrix completion problem:

    min_X  λ Σ_{i=1}^n (σ_i(X) + ε)^p + ∥P_Ω(X − M)∥_2^2,

which is problem (1.6) with A = P_Ω, where

    P_Ω(X)_{ij} = X_{ij} if (i, j) ∈ Ω,   and   P_Ω(X)_{ij} = 0 otherwise,

is a projection operator. Here, M ∈ R^{m×n} is the matrix to be recovered, with the sampled entries {M_{ij}, (i, j) ∈ Ω} known, and Ω is a random subset of the index set. We consider recovering a randomly created matrix M = M_L M_R^T + σΞ, where M_L ∈ R^{m×r}, M_R ∈ R^{n×r}, and Ξ ∈ R^{m×n} have i.i.d. standard Gaussian entries generated in MATLAB, Ξ is the noise matrix, and σ is the noise level. We sample a subset Ω of q entries uniformly at random, and set the sampling ratio sr = q/(mn) and the degree of freedom ratio fr = r(m + n − r)/q.
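A minimal sketch of this test-problem generation (our own illustration; function and variable names are not from the paper):

```python
import numpy as np

def make_completion_problem(m, n, r, sr, sigma=0.0, seed=0):
    """Random matrix completion instance: M = M_L M_R^T + sigma * Xi with
    i.i.d. standard Gaussian factors; q = sr*m*n entries sampled uniformly."""
    rng = np.random.default_rng(seed)
    ML = rng.standard_normal((m, r))
    MR = rng.standard_normal((n, r))
    M = ML @ MR.T + sigma * rng.standard_normal((m, n))
    q = int(round(sr * m * n))
    idx = rng.choice(m * n, size=q, replace=False)       # Omega: q positions without replacement
    mask = np.zeros(m * n, dtype=bool)
    mask[idx] = True
    mask = mask.reshape(m, n)
    fr = r * (m + n - r) / q                             # degree-of-freedom ratio
    return M, mask, fr

# Example: a 200 x 200 rank-10 noiseless instance with sampling ratio 0.5.
M, mask, fr = make_completion_problem(200, 200, 10, sr=0.5, sigma=0.0)
```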
In the experiments, we use the following two conditions as the stopping rules:

    ∥X^k − X^{k+1}∥_F < tol   or   ∥X^k − X^{k+1}∥_F / max{1, ∥X^k∥_F} < tol,            (5.1)
where tol is a very small number; if one of the above two stopping criteria is satisfied, the iteration is terminated. We use the relative error

    rel.err := ∥X* − M∥_F / ∥M∥_F

to measure the recovery quality, where X* is the output of Algorithm 3.1. All experiments were done on a PC with a 2.93 GHz CPU and 2.00 GB of RAM. For each matrix M ∈ R^{m×n} with different ranks r and sample numbers q, we report the average over ten independent experiments as the final result.

5.1. The parameters

In our computational experiments, the parameters used in the algorithm are chosen as follows. We set the maximum iteration number maxit to 1000; if the stopping criterion (5.1) is not satisfied after 1000 iterations, the algorithm is terminated. We choose the parameters τ = 1 and γ_s = 0.9. To avoid the expensive cost of computing the full singular value decomposition of Y^k in the k-th iteration, we compute a rank-R truncated SVD of Y^k in our experiments. The choice of the estimated rank R is very important for speeding up the algorithm: the larger R is, the more time the SVD computation costs. In the numerical experiments, we set R = min{l, R^k} with l = min{m, n} and R^k the largest number such that σ_{R^k}(Y^k) > ρ × σ_1(Y^k). It is obvious that the sequence {R^k} is nonincreasing. In the following, we compare the effect of different choices of ρ through some tests. In the tests, we only consider 200 × 200 matrices with sr = 0.5 and rank r varying from 10 to 20 for the values ρ = 0.16, 0.20, . . . , 0.45. We set the parameters λ = 1e−6, tol = 1e−6, and p = 0.2 in the tests. Fig. 1 depicts the performance for the different values of ρ and different ranks r. From this, we find that ρ = 0.3 performs well both in CPU time and in recovery precision.
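The adaptive truncation rank described above can be sketched as follows (illustrative only); in practice this thresholding would be combined with a partial SVD of Y^k rather than the full decomposition used here.

```python
import numpy as np

def estimate_rank(Y, rho=0.3):
    """R = min{l, R^k} with R^k the largest index such that
    sigma_{R^k}(Y) > rho * sigma_1(Y), and l = min(m, n).
    Uses a full SVD for clarity; a truncated SVD would be used in practice."""
    sig = np.linalg.svd(Y, compute_uv=False)       # singular values in descending order
    if sig[0] == 0.0:
        return 1                                   # safeguard for the zero matrix (our choice)
    return int(np.count_nonzero(sig > rho * sig[0]))
```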
5.2. Comparison with nuclear norm minimization

In this subsection, we compare our algorithm with the classical nuclear norm minimization methods. When p = 1 and ε = 0, problem (1.6) reduces to the nuclear norm minimization problem:

    min_X  λ∥X∥_* + ∥A(X) − b∥_2^2,                                                      (5.2)
and Algorithm 3.1 reduces to a fixed point method for solving the nuclear norm minimization problem (NNM–FP). It is easy to verify that the results obtained in this paper still hold in this case. Moreover, when a continuation technique for λ is adopted, the corresponding fixed point continuation iterative scheme (FPCA) for solving (5.2) was outlined in [26]. Next, we compare the performance of the algorithms RNNM–WFP (0 < p < 1), NNM–FP (p = 1), and FPCA. In all these experiments, we set λ = 1e−4, tol = 1e−4 and take the first condition of (5.1) as the stopping rule for RNNM–WFP and NNM–FP. The other parameters for FPCA are taken the same as in [26]. The performance of NNM–FP and RNNM–WFP for p = 0.2, 0.4, 0.6, 0.8 on randomly created problems with 500 × 500 matrices and varying rank r is shown in Fig. 2. Fig. 2 shows that RNNM–WFP with different p recovers better solutions than NNM–FP in less CPU time almost all the time. Table 1 displays the behavior of FPCA and RNNM–WFP with p taken randomly from (0, 1) on randomly created matrix completion problems.
[Fig. 1. Recovery results of 200 × 200 matrices with different ranks r by RNNM–WFP: (a) relative error; (b) CPU time in seconds.]

[Fig. 2. Recovery results of 500 × 500 matrices by RNNM–WFP and NNM–FP: (a) relative error; (b) CPU time in seconds.]

Table 1
Numerical results of FPCA and RNNM–WFP.

  n (m)   r     sr     fr       FPCA Time   FPCA Rel. err.   RNNM–WFP Time   RNNM–WFP Rel. err.
  200     10    0.39   0.250    1.05E+00    6.47E−05         4.98E−01        7.72E−07
  500     10    0.23   0.172    5.52E+00    1.32E−04         3.81E+00        4.50E−07
  500     20    0.39   0.201    1.02E+01    3.31E−05         2.62E+00        1.31E−07
  500     50    0.57   0.333    1.55E+01    2.71E−05         7.41E+00        4.69E−08
  1000    50    0.39   0.250    5.97E+01    3.10E−05         2.23E+01        4.87E−08
  1000    80    0.39   0.394    6.69E+01    7.52E−05         5.06E+01        7.81E−08
  1000    100   0.57   0.333    8.97E+01    2.23E−05         6.44E+01        3.00E−08
  3000    100   0.26   0.250    6.16E+02    9.01E−05         5.95E+02        1.94E−08
From Table 1, we can see that RNNM–WFP always obtains better solutions than FPCA in less CPU time. The numerical results show that the RNNM–WFP algorithm performs better than the classical nuclear norm minimization methods.

5.3. Comparison with other algorithms

In this subsection, we compare the performance of Algorithm 3.1 with two recent algorithms: the popular solver LMaFit proposed in [27] and the truncated iteratively reweighted unconstrained Lq minimization algorithm t-IRucLq proposed in [21]. In the following experiments, we take p = 0.2 for RNNM–WFP and set λ = 1e−6, tol = 1e−4 for all the algorithms. We compare the behavior of RNNM–WFP, t-IRucLq, and LMaFit with respect to different ranks of M, with and without noise.

Table 2 gives the comparison results of RNNM–WFP and t-IRucLq on randomly created matrix completion problems with varying rank and sampling ratio, without noise. From Table 2, we can see that RNNM–WFP obtains better solutions than t-IRucLq in less CPU time almost all the time. Also, it recovers solutions comparable to LMaFit for rank r = 50, 80, 100. It is worth mentioning that LMaFit takes less CPU time than both RNNM–WFP and t-IRucLq since it avoids computing SVDs.
[Fig. 3. Recovery results of 300 × 300 matrices at different noise levels: (a) relative error; (b) CPU time in seconds.]
Table 2
Numerical results of LMaFit, t-IRucLq and RNNM–WFP without noise (i.e., σ = 0).

  n (m)   r     sr      fr      LMaFit Time   LMaFit Rel. err.   t-IRucLq Time   t-IRucLq Rel. err.   RNNM–WFP Time   RNNM–WFP Rel. err.
  1000    10    0.119   0.167   4.90E−01      1.52E−04           2.30E+01        1.67E−02             1.77E+01        8.03E−04
  1000    20    0.130   0.305   1.73E+00      2.28E−04           3.28E+01        2.41E−02             2.93E+01        1.34E−03
  1000    50    0.390   0.250   1.44E+00      1.41E−04           3.62E+01        1.00E−03             1.31E+01        7.96E−05
  1000    80    0.390   0.394   2.62E+00      1.54E−04           9.74E+01        2.21E−03             2.44E+01        2.53E−04
  1000    100   0.570   0.333   2.33E+00      1.57E−04           8.91E+01        4.06E−04             3.21E+01        2.29E−05
  2000    10    0.060   0.166   1.13E+00      1.43E−04           1.08E+02        7.49E−02             1.39E+02        1.99E−03
  2000    20    0.080   0.249   2.70E+00      1.64E−04           1.52E+02        5.13E−02             1.44E+02        1.88E−03
  2000    50    0.167   0.296   7.72E+00      1.77E−04           3.44E+02        1.34E−02             1.18E+02        7.71E−04
  2000    80    0.250   0.314   9.74E+00      2.01E−04           3.97E+02        5.50E−03             1.12E+02        3.99E−04
  2000    100   0.290   0.336   1.14E+01      1.70E−04           5.49E+02        4.14E−03             1.21E+02        3.44E−04
  3000    10    0.040   0.166   1.81E+00      1.48E−04           1.53E+02        1.46E−01             4.91E+02        3.15E−03
  3000    20    0.080   0.166   5.16E+00      1.24E−04           2.28E+02        4.16E−02             3.08E+02        1.19E−03
  3000    50    0.165   0.200   1.34E+01      1.49E−04           4.90E+02        1.00E−02             2.39E+02        4.41E−03
  3000    80    0.263   0.200   1.70E+01      1.15E−04           6.96E+02        3.01E−03             2.10E+02        1.45E−04
  3000    100   0.262   0.250   2.05E+01      1.66E−04           1.06E+03        3.74E−03             2.60E+02        2.44E−04
Table 3
Numerical results of LMaFit, t-IRucLq and RNNM–WFP with noise (i.e., σ = 0.1).

  n (m)   r     sr      fr      LMaFit Time   LMaFit Rel. err.   t-IRucLq Time   t-IRucLq Rel. err.   RNNM–WFP Time   RNNM–WFP Rel. err.
  1000    10    0.119   0.167   3.55E−01      4.47E−02           4.27E+01        1.77E−02             1.95E+01        1.42E−02
  1000    20    0.130   0.305   1.06E+00      7.04E−02           4.80E+01        3.09E−02             3.03E+01        1.47E−02
  1000    50    0.390   0.250   1.19E+00      5.50E−02           2.74E+01        1.15E−02             1.29E+01        7.79E−03
  1000    80    0.390   0.394   2.18E+00      6.53E−02           7.82E+01        1.14E−02             2.38E+01        8.31E−03
  1000    100   0.570   0.333   1.95E+00      6.39E−02           6.75E+01        9.00E−02             2.96E+01        6.37E−03
  2000    10    0.060   0.166   8.07E−01      4.51E−02           8.12E+01        7.57E−02             2.03E+02        1.43E−02
  2000    20    0.080   0.249   1.74E+00      5.77E−02           1.20E+02        5.70E−02             2.18E+02        1.30E−02
  2000    50    0.167   0.296   5.39E+00      6.95E−02           2.42E+02        1.85E−02             1.41E+02        8.99E−03
  2000    80    0.250   0.314   6.52E+00      6.82E−02           3.44E+02        1.20E−02             1.32E+02        7.27E−03
  2000    100   0.290   0.336   7.69E+00      6.75E−02           4.34E+02        1.06E−02             1.35E+02        6.79E−03
  3000    10    0.040   0.166   1.38E+00      4.52E−02           1.57E+02        1.52E−01             5.31E+02        1.46E−02
  3000    20    0.080   0.166   3.60E+00      4.47E−02           2.47E+02        4.46E−02             3.28E+02        1.00E−02
  3000    50    0.165   0.200   8.97E+00      4.93E−02           5.07E+02        1.46E−02             2.44E+02        6.98E−03
  3000    80    0.263   0.200   1.05E+01      4.88E−02           6.19E+02        9.37E−03             2.17E+02        5.45E−03
  3000    100   0.262   0.250   1.21E+01      5.60E−02           9.24E+02        9.36E−03             2.65E+02        5.60E−03
Table 3 gives the comparison results of RNNM–WFP, t-IRucLq, and LMaFit for the same problems with noise level σ = 0.1. From Table 3, we can see that RNNM–WFP always obtains much better solutions than t-IRucLq and LMaFit, which shows that RNNM–WFP is less sensitive to noise than t-IRucLq and LMaFit; RNNM–WFP almost always takes less CPU time than t-IRucLq, but it needs more CPU time than LMaFit.
Fig. 4. Image recovery of a 512 × 512 image by RNNM–WFP and t-IRucLq. (a): Original image with full rank; (b): Original image truncated to a rank-40 image; (c): Original image with 50% sampling; (d): Recovery of (c) by RNNM–WFP; (e): Recovery of (c) by t-IRucLq; (f): Rank-40 image with 50% sampling; (g): Recovery of (f) by RNNM–WFP; (h): Recovery of (f) by t-IRucLq; (i): Rank-40 image with 4% masked; (j): Recovery of (i) by RNNM–WFP; (k): Recovery of (i) by t-IRucLq.
In particular, the comparison between RNNM–WFP and t-IRucLq for matrix completion problems on randomly created 300 × 300 matrices with sr = 0.5 and varying rank r is presented in Fig. 3, which depicts the performance of the two algorithms at different noise levels. It is easy to see that when the sampling ratio is fixed, the higher the rank, the harder the problem; and when the rank is fixed, the higher the noise level, the harder the problem. In Fig. 3(a), RNNM–WFP
always recovers better solutions than t-IRucLq at every noise level and for every matrix rank. On the other hand, Fig. 3(b) shows that RNNM–WFP takes less CPU time than t-IRucLq in almost all cases.

5.4. Image recovery

In the following, we apply the algorithms to grayscale image recovery. A grayscale image can be represented by a matrix, while a color image can be described by a tensor. The task is to recover the missing pixel values of the image at some positions, i.e., the inpainting process. When the matrix corresponding to the image has low rank, we can treat the problem as a low rank matrix minimization problem (more specifically, a matrix completion problem). Although an image is usually not of low rank, we can still use the matrix completion model through a low rank approximation. The image in our test is 512 × 512, shown in Fig. 4(a), and its truncated SVD of rank 40 is shown in Fig. 4(b). Images (c) and (f) in Fig. 4 are obtained from (a) and (b), respectively, by keeping half of the pixels uniformly at random. Fig. 4(i) is another image obtained from Fig. 4(b) with 4% of the pixels masked in a non-random fashion. The images recovered from (c), (f), (i) in Fig. 4 by RNNM–WFP and t-IRucLq are shown as (d), (g), (j) and (e), (h), (k) in Fig. 4, respectively. In this comparison, we take λ = 1e−6, tol = 1e−4, and the maximum iteration number maxit = 500.

6. Conclusions

In this paper, we designed a reweighted nuclear norm minimization algorithm based on the weighted fixed point method for solving the unconstrained L2–Mp problem (1.6), which is a smooth nonconvex approximation of the low rank matrix minimization problem. The convergence result for Algorithm 3.1 was established: we showed that any accumulation point of the sequence generated by Algorithm 3.1 is a stationary point of problem (1.6). The numerical results demonstrated that Algorithm 3.1 is more effective for matrix completion problems than other iteratively reweighted algorithms, in both the noiseless and the noisy case. Moreover, Algorithm 3.1 is not very sensitive to noise. For further study, it is worth considering whether the reweighted nuclear norm minimization algorithm proposed in this paper can be applied to other nonconvex approximation problems.

References
[1] E.J. Candès, T. Tao, Decoding by linear programming, IEEE Trans. Inform. Theory 51 (12) (2005) 4203–4215.
[2] B. Recht, M. Fazel, P. Parrilo, Guaranteed minimum rank solutions of matrix equations via nuclear norm minimization, SIAM Rev. 52 (2010) 471–501.
[3] R. Chartrand, Exact reconstruction of sparse signals via nonconvex minimization, IEEE Signal Process. Lett. 14 (10) (2007) 707–710.
[4] R. Chartrand, V. Staneva, Restricted isometry properties and nonconvex compressive sensing, Inverse Problems 24 (2008) 035020.
[5] X. Chen, F. Xu, Y. Ye, Lower bound theory of nonzero entries in solutions of l2–lp minimization, SIAM J. Sci. Comput. 32 (2010) 2832–2852.
[6] S. Foucart, M.-J. Lai, Sparsest solutions of underdetermined linear systems via lq-minimization for 0 < q < 1, Appl. Comput. Harmon. Anal. 26 (2009) 395–407.
[7] G. Gasso, A. Rakotomamonjy, S. Canu, Recovering sparse signals with non-convex penalties and DC programming, IEEE Trans. Signal Process. 57 (2009) 4686–4698.
[8] D. Ge, X. Jiang, Y. Ye, A note on the complexity of Lp minimization, Math. Program. 129 (2011) 285–299.
[9] L. Kong, N. Xiu, Exact low-rank matrix recovery via nonconvex Schatten p-minimization, Asia-Pac. J. Oper. Res. 30 (3) (2013) 1340010 (13 pages).
[10] R. Saab, O. Yilmaz, Sparse recovery by non-convex optimization — instance optimality, Appl. Comput. Harmon. Anal. 29 (1) (2010) 30–48.
[11] Y. Shen, S. Li, Restricted p-isometry property and its application for nonconvex compressive sensing, Adv. Comput. Math. 37 (2012) 441–452.
[12] M. Wang, W. Xu, A. Tang, On the performance of sparse recovery via lp-minimization, IEEE Trans. Inform. Theory 57 (11) (2011) 7255–7278.
[13] Y.B. Zhao, An approximation theory of matrix rank minimization and its applications to quadratic equations, Linear Algebra Appl. 437 (2012) 77–93.
[14] M. Zhang, Z.H. Huang, Y. Zhang, Restricted p-isometry properties of nonconvex matrix recovery, IEEE Trans. Inform. Theory 59 (7) (2013) 4316–4323.
[15] E.J. Candès, M.B. Wakin, S.P. Boyd, Enhancing sparsity by reweighted l1 minimization, J. Fourier Anal. Appl. 14 (5) (2008) 877–905.
[16] X. Chen, W. Zhou, Convergence of reweighted l1 minimization algorithms for l2–lp minimization, Comput. Optim. Appl. (2013). http://dx.doi.org/10.1007/s10589-013-9553-8.
[17] X. Chen, W. Zhou, Smoothing nonlinear conjugate gradient method for image restoration using nonsmooth nonconvex minimization, SIAM J. Imaging Sci. 3 (2010) 765–790.
[18] K. Mohan, M. Fazel, Iterative reweighted least squares for matrix rank minimization, in: 48th Annual Allerton Conference on Communication, Control, and Computing, Allerton, 2010, pp. 653–661.
[19] I. Daubechies, R. DeVore, M. Fornasier, C.S. Güntürk, Iteratively reweighted least squares minimization for sparse recovery, Comm. Pure Appl. Math. 63 (1) (2010) 1–38.
[20] M.-J. Lai, J. Wang, An unconstrained lq minimization with 0 < q < 1 for sparse solution of underdetermined linear systems, SIAM J. Optim. 21 (1) (2011) 82–101.
[21] M.-J. Lai, Y. Xu, W. Yin, Improved iteratively reweighted least squares for unconstrained smoothed lq minimization, SIAM J. Numer. Anal. 51 (2) (2013) 927–957.
[22] M. Fornasier, H. Rauhut, R. Ward, Low-rank matrix recovery via iteratively reweighted least squares minimization, SIAM J. Optim. 21 (4) (2011) 1614–1640.
[23] G.A. Watson, Characterization of the subdifferential of some matrix norms, Linear Algebra Appl. 170 (1992) 33–45.
[24] A.S. Lewis, H.S. Sendov, Nonsmooth analysis of singular values, part I: theory, Set-Valued Anal. 13 (3) (2005) 213–241.
[25] E.T. Hale, W. Yin, Y. Zhang, A fixed-point continuation method for l1-regularized minimization with applications to compressed sensing, Technical Report, CAAM TR 07-07, Rice University, 2007.
[26] S. Ma, D. Goldfarb, L. Chen, Fixed point and Bregman iterative methods for matrix rank minimization, Math. Program. 128 (1–2) (2011) 321–353.
[27] Z. Wen, W. Yin, Y. Zhang, Solving a low-rank factorization model for matrix completion by a nonlinear successive over-relaxation algorithm, Math. Program. Comput. 4 (2012) 333–361.