Sparse recovery based on q-ratio constrained minimal singular values

Signal Processing 155 (2019) 247–258
Zhiyong Zhou∗, Jun Yu. Department of Mathematics and Mathematical Statistics, Umeå University, Umeå 901 87, Sweden

Article history: Received 10 May 2018; Revised 5 September 2018; Accepted 2 October 2018; Available online 3 October 2018.

Keywords: Compressive sensing; q-ratio sparsity; q-ratio constrained minimal singular values; Convex–concave procedure

Abstract

We study verifiable sufficient conditions and computable performance bounds for sparse recovery algorithms such as the Basis Pursuit, the Dantzig selector and the Lasso estimator, in terms of a newly defined family of quality measures for the measurement matrices. With high probability, the developed measures for subgaussian random matrices are bounded away from zero as long as the number of measurements is reasonably large. Compared to the restricted isometry constant based performance analysis, the arguments in this paper are much more concise and the obtained bounds are tighter. Numerical experiments are presented to illustrate our theoretical results. © 2018 Elsevier B.V. All rights reserved.



∗ Corresponding author. E-mail addresses: [email protected] (Z. Zhou), [email protected] (J. Yu).
https://doi.org/10.1016/j.sigpro.2018.10.002. 0165-1684/© 2018 Elsevier B.V. All rights reserved.

1. Introduction

Sparse signal recovery, particularly compressive sensing [1–4], aims to reconstruct a sparse signal from noisy underdetermined linear measurements:

$$y = Ax + w, \qquad (1)$$

where $x \in \mathbb{R}^N$ is the true sparse or compressible signal, $y \in \mathbb{R}^m$ is the measurement vector with $m \ll N$, $A \in \mathbb{R}^{m\times N}$ is the measurement matrix, and $w \in \mathbb{R}^m$ is the noise vector. If the measurement matrix satisfies the stable or robust null space property (NSP) [5] or the restricted isometry property (RIP) [1,6], stable and robust recovery can be guaranteed. Although probabilistic results show that the NSP and RIP are fulfilled for some specific random matrices with high probability [7–9], it is computationally hard to verify the NSP and to compute the restricted isometry constant (RIC) for a given measurement matrix [10,11]. Several relaxation techniques have been used to obtain approximate solutions, for instance semidefinite programming [12,13] and linear programming [14].

Recently, [15] and [16] defined new kinds of computable quality measures of the measurement matrix. Specifically, [15] developed the $\ell_1$-constrained minimal singular value (CMSV) $\rho_s(A) = \min_{z\neq 0,\,\|z\|_1^2/\|z\|_2^2 \le s} \|Az\|_2$ and obtained $\ell_2$ error bounds in terms of this quality measure for the Basis Pursuit (BP) [17], the Dantzig selector (DS) [18], and the Lasso estimator [19]. Essentially the same quantity, called the strong restricted eigenvalue, i.e., $\theta(s, c_0) = \min_{\delta\neq 0,\,\delta\in C_{SRE}(s,c_0)} \frac{\|A\delta\|_2}{\sqrt{n}\|\delta\|_2}$, where $C_{SRE}(s, c_0) = \{\delta \in \mathbb{R}^N : \|\delta\|_1 \le (1 + c_0)\sqrt{s}\|\delta\|_2\}$ is a cone in $\mathbb{R}^N$, was used to establish the minimax optimal bounds of the $\ell_q$ estimation errors with $1 \le q \le 2$ for the Sorted L-One Penalized Estimation (Slope) [20] and the Lasso; see [21]. The corresponding results were also obtained for the Square-Root Slope and Square-Root Lasso via this quantity in [22]. Similarly, in [16], the authors defined another quantity $\omega_\diamond(A, s) = \min_{z\neq 0,\,\|z\|_1/\|z\|_\infty \le s} \frac{\|Az\|_\diamond}{\|z\|_\infty}$ with $\|\cdot\|_\diamond$ denoting

a general norm, and derived performance bounds on the $\ell_\infty$ norm of the recovery error vector based on this quality measure. This kind of measure has also been used to establish results for block sparsity recovery [23] and low-rank matrix recovery [24]. In this paper we generalize these two quantities to a more general quantity called the q-ratio CMSV with $1 < q \le \infty$, and establish performance bounds for both the $\ell_q$ norm and the $\ell_1$ norm of the reconstruction errors. Our contribution has four main aspects. First, we propose a sufficient condition based on a q-ratio sparsity level for exact recovery using $\ell_1$ minimization in the noise-free case, and design a convex–concave procedure to solve the corresponding non-convex problem, leading to an acceptable verification algorithm. Second, we introduce the q-ratio CMSV and derive concise bounds on both the $\ell_q$ norm and the $\ell_1$ norm of the reconstruction error for the BP, the DS and the Lasso estimator in terms of the q-ratio CMSV. We establish the corresponding stable and robust recovery results involving both sparsity defect and measurement error; as far as we know, we are the first to establish such results for compressible (not exactly sparse) signals, since the literature mentioned above only considered the exactly sparse case. Third, we demonstrate that for subgaussian random matrices, the q-ratio CMSVs are bounded away from zero with high probability, as long as the number of measurements is large enough.


Finally, we present algorithms to compute the q-ratio CMSV for an arbitrary measurement matrix, and study the effects of different parameters on the proposed q-ratio CMSV. Moreover, we illustrate that the q-ratio CMSV based bound is tighter than the RIC based one.

The paper is organized as follows. In Section 2, we present the definitions of q-ratio sparsity and q-ratio CMSV, and give a sufficient condition for unique noiseless recovery based on the q-ratio sparsity together with an inequality for the q-ratio CMSV. In Section 3, we derive performance bounds on both the $\ell_q$ norm and the $\ell_1$ norm of the reconstruction errors for several convex recovery algorithms in terms of q-ratio CMSVs. In Section 4, we demonstrate that subgaussian random matrices have non-degenerate q-ratio CMSVs with high probability as long as the number of measurements is relatively large. In Section 5, we design algorithms to verify the sufficient condition for unique recovery in the noise-free case and to compute the q-ratio CMSV; the q-ratio CMSV based bound and the RIC based bound on the BP are also compared therein. Section 6 contains the conclusion. Finally, the proofs are postponed to the Appendix.

Throughout the paper, we denote vectors by lower case letters and matrices by upper case letters; vectors are columns by default. $z^T$ denotes the transpose of the vector z and $z_i$ denotes the i-th entry of z. For any vector $z \in \mathbb{R}^N$, we denote the $\ell_0$ norm $\|z\|_0 = \sum_{i=1}^N 1\{z_i \neq 0\}$, the $\ell_\infty$ norm $\|z\|_\infty = \max_{1\le i\le N} |z_i|$, and the $\ell_q$ norm $\|z\|_q = (\sum_{i=1}^N |z_i|^q)^{1/q}$ for $0 < q < \infty$. We say a signal x is k-sparse if $\|x\|_0 \le k$. $[N]$ denotes the set $\{1, 2, \ldots, N\}$ and $|S|$ denotes the cardinality of a set S. Furthermore, we write $S^c$ for the complement $[N]\setminus S$ of a set S in $[N]$, and $\operatorname{supp}(x) := \{i \in [N] : x_i \neq 0\}$. For a vector $x \in \mathbb{R}^N$ and a set $S \subset [N]$, we denote by $x_S$ the vector that coincides with x on the indices in S and is extended to zero outside S. For any matrix $A \in \mathbb{R}^{m\times N}$, $\ker A := \{z \in \mathbb{R}^N : Az = 0\}$, $A^T$ is the transpose and $\operatorname{trace}(A)$ is the usual trace function. $\langle\cdot,\cdot\rangle$ is the inner product and $\lfloor\cdot\rfloor$ denotes the floor function.

2. q-Ratio sparsity and q-ratio CMSV

In this section, we present the definitions of q-ratio sparsity and q-ratio CMSV, and give their basic properties. A sufficient condition is established for unique sparse recovery via the noise-free BP. We start with a stable sparsity measure, which we call the q-ratio sparsity level.

Definition 1 ([25,26]). For any non-zero $z \in \mathbb{R}^N$ and non-negative $q \notin \{0, 1, \infty\}$, the q-ratio sparsity level of z is defined as

$$s_q(z) = \left(\frac{\|z\|_1}{\|z\|_q}\right)^{\frac{q}{q-1}}. \qquad (2)$$

The cases of $q \in \{0, 1, \infty\}$ are evaluated as limits:

$$s_0(z) = \lim_{q\to 0} s_q(z) = \|z\|_0, \qquad (3)$$

$$s_1(z) = \lim_{q\to 1} s_q(z) = \exp(H_1(\pi(z))), \qquad (4)$$

$$s_\infty(z) = \lim_{q\to\infty} s_q(z) = \frac{\|z\|_1}{\|z\|_\infty}. \qquad (5)$$

Here $\pi(z) \in \mathbb{R}^N$ has entries $\pi_i(z) = |z_i|/\|z\|_1$ and $H_1$ is the ordinary Shannon entropy $H_1(\pi(z)) = -\sum_{i=1}^N \pi_i(z)\log\pi_i(z)$. This kind of sparsity measure was proposed in [25,26], where estimation and statistical inference via the α-stable random projection method were studied. Its extension to block sparsity was developed in [27]. In fact, this sparsity measure is entropy-based: it counts effective coordinates of z by counting effective states of $\pi(z)$ via entropy. Formally, we have

$$s_q(z) = \begin{cases} \exp(H_q(\pi(z))) & \text{if } z \neq 0, \\ 0 & \text{if } z = 0, \end{cases} \qquad (6)$$

where $H_q$ is the Rényi entropy of order $q \in [0, \infty]$ [28,29]. When $q \notin \{0, 1, \infty\}$, the Rényi entropy is given by $H_q(\pi(z)) = \frac{1}{1-q}\log\left(\sum_{i=1}^N \pi_i(z)^q\right)$, and the cases $q \in \{0, 1, \infty\}$ are defined by evaluating limits, with $H_1$ being the ordinary Shannon entropy. The sparsity measure $s_q(z)$ has the following basic properties (see also [25,26]):

• Continuity: Unlike the traditional sparsity measure, the $\ell_0$ norm, the function $s_q(\cdot)$ is continuous on $\mathbb{R}^N\setminus\{0\}$ for all $q > 0$. Thus, it is stable with respect to small perturbations of the signal.

• Scale-invariance: For all $c \neq 0$, it holds that $s_q(cz) = s_q(z)$. This property is in line with the common sense that sparsity should be based on relative (rather than absolute) magnitudes of the entries of the signal.

• Non-increasing in q: For any $q' \ge q \ge 0$, we have

$$\frac{\|z\|_1}{\|z\|_\infty} = s_\infty(z) \le s_{q'}(z) \le s_q(z) \le s_0(z) = \|z\|_0,$$

which follows from the non-increasing property of the Rényi entropy $H_q$ with respect to q.

• Range equal to [1, N]: For all $z \in \mathbb{R}^N\setminus\{0\}$ and all $q \in [0, \infty]$, we have $1 \le \frac{\|z\|_1}{\|z\|_\infty} = s_\infty(z) \le s_q(z) \le s_0(z) = \|z\|_0 \le N$.

Next, we present a sufficient condition for exact recovery via the noise-free BP in terms of q-ratio sparsity. First, it is well known that when the true signal x is k-sparse, the sufficient and necessary condition for exact recovery by the noise-free BP problem

$$\min_{z\in\mathbb{R}^N} \|z\|_1 \quad \text{s.t.} \quad Az = Ax \qquad (7)$$

is the null space property of order k:

$$\|z_S\|_1 < \|z_{S^c}\|_1, \quad \forall z \in \ker A\setminus\{0\},\ S \subset [N] \text{ with } |S| \le k;$$

see Theorem 4.5 in [4]. The sufficient condition for exact recovery of a k-sparse signal via the noise-free BP (7) in terms of q-ratio sparsity then goes as follows.

Proposition 1. If x is k-sparse and there exists some $1 < q \le \infty$ such that k is strictly less than

$$\min_{z\in\ker A\setminus\{0\}} 2^{\frac{q}{1-q}} s_q(z), \qquad (8)$$

then the unique solution to problem (7) is the true signal x.

Remarks. This result is a direct extension of Proposition 1 in [15], which states that if the sparsity level k is strictly less than either $\min_{z\in\ker A\setminus\{0\}} \frac{1}{4}s_2(z)$ or $\min_{z\in\ker A\setminus\{0\}} \frac{1}{2}s_\infty(z)$, then unique recovery is achieved via (7). We extend it to the weaker condition $k < \sup_{q\in(1,\infty]}\min_{z\in\ker A\setminus\{0\}} 2^{\frac{q}{1-q}} s_q(z)$. When $q = \infty$, the minimization problem (8) can be solved via N linear programs in polynomial time; see algorithm (30). However, in the cases $1 < q < \infty$, it is very difficult to solve exactly. In Section 5, we adopt a convex–concave procedure algorithm to solve it approximately.

Now we are ready to present the definition of the q-ratio constrained minimal singular value, which is built on the q-ratio sparsity level.

Definition 2. For any real number $s \in [1, N]$, $1 < q \le \infty$ and matrix $A \in \mathbb{R}^{m\times N}$, the q-ratio constrained minimal singular value (CMSV) of A is defined as

$$\rho_{q,s}(A) = \min_{z\neq 0,\, s_q(z)\le s} \frac{\|Az\|_2}{\|z\|_q}. \qquad (9)$$
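To make Definition 1 concrete, here is a short numerical sketch (in Python with NumPy, rather than the Matlab used in our experiments) that evaluates $s_q(z)$, including the limiting cases (3)–(5), and checks the scale-invariance and non-increasing-in-q properties listed above.

```python
import numpy as np

def q_ratio_sparsity(z, q):
    """q-ratio sparsity s_q(z) = (||z||_1/||z||_q)^(q/(q-1)) of Definition 1,
    with the limiting cases q = 0, 1, inf handled via their closed forms
    (eqs. (3)-(5)): ||z||_0, exp(Shannon entropy), ||z||_1/||z||_inf."""
    z = np.asarray(z, dtype=float)
    if not np.any(z):
        return 0.0
    if q == 0:
        return float(np.count_nonzero(z))
    if q == 1:
        p = np.abs(z) / np.sum(np.abs(z))
        p = p[p > 0]
        return float(np.exp(-np.sum(p * np.log(p))))
    if np.isinf(q):
        return float(np.sum(np.abs(z)) / np.max(np.abs(z)))
    norm1 = np.sum(np.abs(z))
    normq = np.sum(np.abs(z) ** q) ** (1.0 / q)
    return float((norm1 / normq) ** (q / (q - 1.0)))

z = np.array([3.0, -1.0, 0.5, 0.0, 0.0])
# Scale-invariance: s_q(cz) = s_q(z) for c != 0.
assert np.isclose(q_ratio_sparsity(10 * z, 2), q_ratio_sparsity(z, 2))
# Non-increasing in q: s_inf(z) <= s_2(z) <= s_1.5(z) <= s_0(z) = ||z||_0.
vals = [q_ratio_sparsity(z, q) for q in (np.inf, 2, 1.5, 0)]
assert all(a <= b + 1e-12 for a, b in zip(vals, vals[1:]))
```

For the example vector above, $s_\infty(z) = 4.5/3 = 1.5$ while $s_0(z) = 3$, illustrating the range property $1 \le s_\infty \le s_q \le s_0 \le N$.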

Remarks. When $q = 2$ and $q = \infty$, $\rho_{q,s}(A)$ reduces to $\rho_s(A)$ given in [15] (or equivalently $\theta(s, c_0)$ used in [21,22]) and to $\omega_2(A, s)$ given in [16], respectively. For measurement matrices A with columns of unit norm, it is obvious that $\rho_{q,s}(A) \le 1$ for any $q > 1$, since $\|Ae_i\|_2 = 1$, $\|e_i\|_q = 1$ and $s_q(e_i) = 1$, where $e_i$ is the i-th canonical basis vector of $\mathbb{R}^N$. Moreover, when q and A are fixed, $\rho_{q,s}(A)$ is non-increasing as a function of s. For any $\alpha \in \mathbb{R}$, we have $\rho_{q,s}(\alpha A) = |\alpha|\rho_{q,s}(A)$. This fact, together with Theorem 1 in Section 3, implies that when adopting a measurement matrix $\alpha A$, increasing the measurement energy ($|\alpha|$) without changing the measurement matrix structure (A) proportionally reduces the $\ell_q$ norm of the reconstruction error when the true signal is exactly sparse. In fact, other extensions of this definition are possible; for instance we can define, for any $1 < q \le \infty$,

$$\rho_{\diamond,q}(A, s) = \min_{z\neq 0,\, s_q(z)\le s} \frac{\|Az\|_\diamond}{\|z\|_q},$$

where $\|\cdot\|_\diamond$ denotes a general norm. Then the measure $\omega_\diamond(A, s)$ defined in [16] is exactly $\rho_{\diamond,\infty}(A, s)$, so the corresponding results there can be generalized in terms of this newly defined measure; we do not pursue these extensions in this paper. Basically, the recovery condition ($\rho_{2,s}(A) > 0$ with some proper s) discussed later to achieve the $\ell_2$ norm error bounds is equivalent to the robust width property investigated in [30–32] and the strong restricted eigenvalue condition used in [21,22]. The recovery condition in terms of the q-ratio CMSV is a bit stronger than the restricted eigenvalue condition [33,34] or the more general restricted strong convexity condition [35]. The difference is that here we use a restricted set defined via the q-ratio sparsity, which is more intuitive and makes the proofs more concise. Not least, it is computable! Compared to the RIP, none of these conditions require upper bounds on the restricted eigenvalues. As for different q, we have the following important inequality, which will play a crucial role in analyzing the probabilistic behavior of $\rho_{q,s}(A)$ via the existing results for $\rho_{2,s}(A)$ established in [15].

Proposition 2. If $1 < q_2 \le q_1 \le \infty$, then for any real number $1 \le s \le N^{1/\tilde{q}}$ with $\tilde{q} = \frac{q_2(q_1-1)}{q_1(q_2-1)}$, we have

$$\rho_{q_1,s}(A) \ge \rho_{q_2,s^{\tilde{q}}}(A) \ge s^{-\tilde{q}}\,\rho_{q_1,s^{\tilde{q}}}(A). \qquad (10)$$

Remarks. Let $q_1 = \infty$ and $q_2 = 2$ (thus $\tilde{q} = 2$); then $\rho_{\infty,s}(A) \ge \rho_{2,s^2}(A) \ge \frac{1}{s^2}\rho_{\infty,s^2}(A)$. The left hand side is exactly the right hand side inequality of Proposition 4 in [16], i.e., $\omega_2(A, s) \ge \rho_{s^2}(A)$. But $\tilde{q} = \frac{q_2(q_1-1)}{q_1(q_2-1)} = 1 + \frac{q_1-q_2}{q_1(q_2-1)} \ge 1$ when $q_1 \ge q_2 > 1$, so $\rho_{q_2,s^{\tilde{q}}}(A) \le \rho_{q_2,s}(A)$. Similarly, from the right hand side of the inequality we have, for any $t \in [1, N]$, $\rho_{q_2,t}(A) \ge \frac{1}{t}\rho_{q_1,t}(A)$ by letting $t = s^{\tilde{q}}$; but obviously $\frac{1}{t}\rho_{q_1,t}(A) \le \rho_{q_1,t}(A)$. Therefore, we cannot obtain monotonicity of $\rho_{q,s}(A)$ with respect to q when s and A are fixed. However, when $s = N$, since $s_q(z) \le N$ for any $z \in \mathbb{R}^N$, it holds trivially that $\rho_{q,N}(A)$ is increasing with respect to q, by the decreasing property of the $\ell_q$ norm.

3. Recovery results

In this section, we derive performance bounds on both the $\ell_q$ norm and the $\ell_1$ norm of the reconstruction errors for several convex sparse recovery algorithms in terms of the q-ratio CMSV of the measurement matrix. Let $y = Ax + w \in \mathbb{R}^m$, where $x \in \mathbb{R}^N$ is the true sparse or compressible signal, $A \in \mathbb{R}^{m\times N}$ is the measurement matrix and $w \in \mathbb{R}^m$ is the noise vector. We focus on three of the most renowned sparse recovery algorithms based on convex relaxation: the BP, the DS and the Lasso estimator.

BP: $\min_{z\in\mathbb{R}^N} \|z\|_1$ s.t. $\|y - Az\|_2 \le \varepsilon$.

DS: $\min_{z\in\mathbb{R}^N} \|z\|_1$ s.t. $\|A^T(y - Az)\|_\infty \le \lambda_N\sigma$.

Lasso: $\min_{z\in\mathbb{R}^N} \frac{1}{2}\|y - Az\|_2^2 + \lambda_N\sigma\|z\|_1$.

Here $\varepsilon$, $\lambda_N$ and $\sigma$ are parameters used in the conditions to control the noise levels. We first present the following main recovery results for the case that the true signal x is exactly sparse.

Theorem 1. Suppose x is k-sparse. For any $1 < q \le \infty$, we have:

1) If $\|w\|_2 \le \varepsilon$, then the solution $\hat{x}$ to the BP obeys

$$\|\hat{x} - x\|_q \le \frac{2\varepsilon}{\rho_{q,2^{q/(q-1)}k}(A)}, \qquad (11)$$

$$\|\hat{x} - x\|_1 \le \frac{4k^{1-1/q}\varepsilon}{\rho_{q,2^{q/(q-1)}k}(A)}. \qquad (12)$$

2) If the noise w in the DS satisfies $\|A^Tw\|_\infty \le \lambda_N\sigma$, then the solution $\hat{x}$ to the DS obeys

$$\|\hat{x} - x\|_q \le \frac{4k^{1-1/q}\lambda_N\sigma}{\rho^2_{q,2^{q/(q-1)}k}(A)}, \qquad (13)$$

$$\|\hat{x} - x\|_1 \le \frac{8k^{2-2/q}\lambda_N\sigma}{\rho^2_{q,2^{q/(q-1)}k}(A)}. \qquad (14)$$

3) If the noise w in the Lasso satisfies $\|A^Tw\|_\infty \le \kappa\lambda_N\sigma$ for some $\kappa \in (0, 1)$, then the solution $\hat{x}$ to the Lasso obeys

$$\|\hat{x} - x\|_q \le \frac{1+\kappa}{1-\kappa}\cdot\frac{2k^{1-1/q}}{\rho^2_{q,(\frac{2}{1-\kappa})^{q/(q-1)}k}(A)}\,\lambda_N\sigma, \qquad (15)$$

$$\|\hat{x} - x\|_1 \le \frac{1+\kappa}{(1-\kappa)^2}\cdot\frac{4k^{2-2/q}}{\rho^2_{q,(\frac{2}{1-\kappa})^{q/(q-1)}k}(A)}\,\lambda_N\sigma. \qquad (16)$$

Remarks. When the noise vector $w \sim N(0, \sigma^2 I_m)$, the conditions on the noise for the DS and the Lasso hold with high probability if $\lambda_N$ (a parameter depending on the signal dimension N) is properly chosen. As a by-product of (11), if $\rho_{q,2^{q/(q-1)}k}(A) > 0$, then the noise-free BP (7) can uniquely recover any k-sparse signal, by letting $\varepsilon = 0$. We established both $\ell_q$ norm and $\ell_1$ norm error bounds. Our results for the $\ell_q$ error bounds generalize the existing results in [15] ($q = 2$) and [16] ($q = \infty$) to any $1 < q \le \infty$. The $\ell_q$ error bounds depend on the q-ratio CMSV of the measurement matrix A, which is bounded away from zero for subgaussian random matrices and can be computed approximately by specific algorithms; the details are discussed in the later sections. As discussed in Section II.C of [15], the RIC based recovery error bounds for the BP [6], the DS [18] and the Lasso [36] are all quite complicated; interested readers can refer to the related literature for the detailed recovery conditions and the corresponding expressions of the error bounds. In contrast, as presented in this theorem, the $\ell_1$-CMSV based bounds for the $\ell_2$ error norm [15] and the q-ratio CMSV based bounds for any $\ell_q$ error norm with $q \in (1, \infty]$ are much more concise, and their derivations, given in the Appendix, are much less involved. In addition, similar to the sharp RIC based recovery condition established in [37], the problem of obtaining a sharp q-ratio CMSV based recovery condition is left for future work.

Next, we extend Theorem 1 to the case where the true signal is not exactly sparse but compressible, i.e., it can be well approximated by an exactly sparse signal.

Theorem 2. Let the $\ell_1$-error of the best k-term approximation of x be $\sigma_{k,1}(x) = \inf\{\|x - z\|_1 : z \in \mathbb{R}^N \text{ is } k\text{-sparse}\}$, which measures how close x is to being k-sparse. For any $1 < q \le \infty$, we have:


1) If $\|w\|_2 \le \varepsilon$, then the solution $\hat{x}$ to the BP obeys

$$\|\hat{x} - x\|_q \le \frac{2\varepsilon}{\rho_{q,4^{q/(q-1)}k}(A)} + k^{1/q-1}\sigma_{k,1}(x), \qquad (17)$$

$$\|\hat{x} - x\|_1 \le \frac{4k^{1-1/q}\varepsilon}{\rho_{q,4^{q/(q-1)}k}(A)} + 4\sigma_{k,1}(x). \qquad (18)$$

2) If the noise w in the DS satisfies $\|A^Tw\|_\infty \le \lambda_N\sigma$, then the solution $\hat{x}$ to the DS obeys

$$\|\hat{x} - x\|_q \le \frac{8k^{1-1/q}}{\rho^2_{q,4^{q/(q-1)}k}(A)}\,\lambda_N\sigma + k^{1/q-1}\sigma_{k,1}(x), \qquad (19)$$

$$\|\hat{x} - x\|_1 \le \frac{16k^{2-2/q}}{\rho^2_{q,4^{q/(q-1)}k}(A)}\,\lambda_N\sigma + 4\sigma_{k,1}(x). \qquad (20)$$

3) If the noise w in the Lasso satisfies $\|A^Tw\|_\infty \le \kappa\lambda_N\sigma$ for some $\kappa \in (0, 1)$, then the solution $\hat{x}$ to the Lasso obeys

$$\|\hat{x} - x\|_q \le \frac{1+\kappa}{1-\kappa}\cdot\frac{4k^{1-1/q}}{\rho^2_{q,(\frac{4}{1-\kappa})^{q/(q-1)}k}(A)}\,\lambda_N\sigma + k^{1/q-1}\sigma_{k,1}(x), \qquad (21)$$

$$\|\hat{x} - x\|_1 \le \frac{1+\kappa}{(1-\kappa)^2}\cdot\frac{8k^{2-2/q}}{\rho^2_{q,(\frac{4}{1-\kappa})^{q/(q-1)}k}(A)}\,\lambda_N\sigma + \frac{4}{1-\kappa}\sigma_{k,1}(x). \qquad (22)$$

Remarks. As we can see, all the error bounds consist of two components: one caused by the measurement error and the other caused by the sparsity defect. According to the proof procedure presented later, we can sharpen the error bounds to the maximum of these two components instead of their summation. Compared to the exactly sparse case, we need slightly stronger conditions to achieve the valid error bounds. Concisely, we require $\rho_{q,4^{q/(q-1)}k}(A) > 0$, $\rho_{q,4^{q/(q-1)}k}(A) > 0$ and $\rho_{q,(\frac{4}{1-\kappa})^{q/(q-1)}k}(A) > 0$ for the BP, the DS and the Lasso in the compressible case, while the conditions are $\rho_{q,2^{q/(q-1)}k}(A) > 0$, $\rho_{q,2^{q/(q-1)}k}(A) > 0$ and $\rho_{q,(\frac{2}{1-\kappa})^{q/(q-1)}k}(A) > 0$ in the exactly sparse case, respectively. Moreover, analogous to the necessity of the robust width property for stable and robust recovery discussed in [30–32], the necessity of a q-ratio CMSV based condition could also be established. We leave this issue for future work, since it is beyond the scope of the present paper.

4. Random matrices

In this section, we study the properties of q-ratio CMSVs for subgaussian random matrices. A random vector $X \in \mathbb{R}^N$ is called isotropic and subgaussian with constant L if it holds for all $u \in \mathbb{R}^N$ that $E|\langle X, u\rangle|^2 = \|u\|_2^2$ and $P(|\langle X, u\rangle| \ge t) \le 2\exp\left(-\frac{t^2}{L^2\|u\|_2^2}\right)$. Then, as shown in Theorem 2 of [15], we have the following lemma.

Lemma 1 ([15]). Suppose the rows of the scaled measurement matrix $\sqrt{m}A$ are i.i.d. isotropic and subgaussian random vectors with constant L. Then there exist constants $c_1$ and $c_2$ such that for any $\eta > 0$ and $m \ge 1$ satisfying

$$m \ge c_1 \frac{L^2 s \log N}{\eta^2},$$

we have

$$E|1 - \rho_{2,s}(A)| \le \eta$$

and

$$P(1 - \eta \le \rho_{2,s}(A) \le 1 + \eta) \ge 1 - \exp\left(-c_2\eta^2\frac{m}{L^4}\right).$$

Then, as a direct consequence of Proposition 2 (i.e., if $1 < q < 2$, $\rho_{q,s}(A) \ge s^{-1}\rho_{2,s}(A)$, while if $2 \le q \le \infty$, $\rho_{q,s}(A) \ge \rho_{2,s^{2(q-1)/q}}(A)$) and Lemma 1, we have the following probabilistic statements about $\rho_{q,s}(A)$.

Theorem 3. Under the assumptions and notations of Lemma 1, it holds that:

1) When $1 < q < 2$, there exist constants $c_1$ and $c_2$ such that for any $\eta > 0$ and $m \ge 1$ satisfying $m \ge c_1\frac{L^2 s\log N}{\eta^2}$, we have

$$E[\rho_{q,s}(A)] \ge s^{-1}(1 - \eta), \qquad (23)$$

$$P\{\rho_{q,s}(A) \ge s^{-1}(1 - \eta)\} \ge 1 - \exp\left(-c_2\eta^2\frac{m}{L^4}\right). \qquad (24)$$

2) When $2 \le q \le \infty$, there exist constants $c_1$ and $c_2$ such that for any $\eta > 0$ and $m \ge 1$ satisfying $m \ge c_1\frac{L^2 s^{2(q-1)/q}\log N}{\eta^2}$, we have

$$E[\rho_{q,s}(A)] \ge 1 - \eta, \qquad (25)$$

$$P\{\rho_{q,s}(A) \ge 1 - \eta\} \ge 1 - \exp\left(-c_2\eta^2\frac{m}{L^4}\right). \qquad (26)$$

Remarks. Theorem 3 shows that, at least for subgaussian random matrices, the q-ratio CMSV is bounded away from zero as long as the number of measurements is large enough. Random measurement matrices with i.i.d. isotropic subgaussian random vector rows include the Gaussian and Bernoulli ensembles. The order of the required number of measurements m is close to the optimal order for establishing the $\ell_q$ norm error bound; see [38]. Besides, [39] shows that $\rho_{2,s}(A)$ for a class of structured random matrices, including Fourier random matrices and Hadamard random matrices, is bounded away from zero with high probability as long as the number of measurements is reasonably large. By adopting Proposition 2 again, this conclusion still holds for $\rho_{q,s}(A)$ with $1 < q \le \infty$; see [40] for detailed arguments. In Fig. 1, we plot histograms of $\rho_{q,s}(A)$, computed using algorithm (33), for Gaussian random matrices $A \in \mathbb{R}^{40\times 60}$ normalized by $\frac{1}{\sqrt{40}}$. We set $s = 4$ with three different values of q, i.e., $q = 1.8$, $2$ and $3$, and obtain each histogram from 100 Gaussian random matrices. As expected, the q-ratio CMSVs are bounded away from zero both in expectation and with high probability in this setting.

5. Numerical experiments

In this section, we first describe a convex–concave procedure used to compute the maximal sparsity level k such that the sufficient condition (8) is fulfilled, and then introduce the computation of the q-ratio CMSV and compare the q-ratio CMSV based bound and the RIC based bound on the BP. All experiments were done on a platform with an Intel(R) Xeon(R) CPU @ 3.60 GHz, 16 GB RAM, and a Windows 8 operating system; the version of Matlab used is R2015b.

5.1. Verifying sufficient conditions

In order to use the q-ratio sparsity level to verify the sufficient condition (8), for each $1 < q \le \infty$, we need to solve the optimization


Fig. 1. Histograms of the q-ratio CMSVs for Gaussian random matrices of size 40 × 60 with s = 4 and q = 1.8, 2, 3.

problem:

$$\min_{z\in\ker A\setminus\{0\}} 2^{\frac{q}{1-q}}\left(\frac{\|z\|_1}{\|z\|_q}\right)^{\frac{q}{q-1}}. \qquad (27)$$

Regardless of the constant, this is essentially equivalent to solving the problem

$$\max_{z\in\mathbb{R}^N} \|z\|_q \quad \text{s.t.} \quad Az = 0 \ \text{and} \ \|z\|_1 \le 1. \qquad (28)$$

Unfortunately, this problem of maximizing an $\ell_q$ norm over a polyhedron is non-convex. For $q = 2$, [15] proposed a semidefinite relaxation to obtain an upper bound:

$$(L_2):\quad \max_{Z\in\mathbb{R}^{N\times N}:\,Z\succeq 0} \operatorname{trace}(Z) \quad \text{s.t.} \quad \operatorname{trace}(AZA^T) = 0,\ \|Z\|_1 \le 1, \qquad (29)$$

where $Z = zz^T$ and $\|Z\|_1$ is the entry-wise $\ell_1$ norm of Z. For $q = \infty$, the problem can be solved via N linear programs (see [15,16]):

$$(L_\infty):\quad \max_{1\le i\le N}\left\{\max_{z\in\mathbb{R}^N} z_i \quad \text{s.t.} \quad Az = 0 \ \text{and}\ \|z\|_1 \le 1\right\}. \qquad (30)$$

Here we adopt the convex–concave procedure (CCP; see [41] for details) to solve problem (28) for any $1 < q < \infty$. The basic CCP algorithm goes as follows:

Algorithm: CCP to solve (28).
Given an initial point $z^0$. Let $k = 0$.
Repeat:
1. Convexify. Linearize $\|z\|_q$ with the approximation $\|z^k\|_q + \nabla(\|z\|_q)^T\big|_{z=z^k}(z - z^k) = \|z^k\|_q + [\|z^k\|_q^{1-q}\,|z^k|^{q-1}\operatorname{sign}(z^k)]^T(z - z^k)$.
2. Solve. Set the value of $z^{k+1}$ to be a solution of
$$\max_{z\in\mathbb{R}^N} \|z^k\|_q + [\|z^k\|_q^{1-q}|z^k|^{q-1}\operatorname{sign}(z^k)]^T(z - z^k) \quad \text{s.t.}\ Az = 0,\ \|z\|_1 \le 1. \qquad (31)$$
3. Update the iteration: $k = k + 1$.
Until a stopping criterion is satisfied.

We first compare the $L_2$ and $L_\infty$ algorithms for verifying the sufficient condition with our CCP algorithm, presenting the results of the CCP algorithm for $q = 1.8, 2, 3$ and $20$.

Table 1. Comparison of the maximal sparsity levels calculated via different algorithms for a Bernoulli matrix with N = 40. The corresponding CPU times (s) are listed in brackets.

m  | L2       | L∞       | CCP1.8   | CCP2     | CCP3     | CCP20
20 | 1 (7.87) | 1 (8.33) | 1 (9.40) | 1 (8.91) | 2 (8.64) | 2 (8.69)
24 | 1 (7.76) | 2 (8.29) | 2 (8.78) | 2 (8.68) | 2 (8.72) | 2 (8.76)
28 | 2 (7.41) | 2 (8.33) | 3 (8.94) | 3 (8.65) | 4 (8.67) | 2 (8.67)
32 | 2 (7.57) | 3 (8.25) | 4 (8.71) | 4 (8.66) | 5 (8.71) | 3 (8.65)
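The CCP iteration above reduces, after the standard split $z = u - v$ with $u, v \ge 0$, to one linear program per step over the polyhedron $\{Az = 0, \|z\|_1 \le 1\}$. The following is a minimal sketch of this idea (ours, in Python with SciPy's `linprog`, not the CVX implementation used in the paper):

```python
import numpy as np
from scipy.optimize import linprog

def ccp_max_q_norm(A, q, z0, iters=20):
    """CCP sketch for (28): max ||z||_q s.t. Az = 0, ||z||_1 <= 1.

    Each step maximizes the linearization g^T z of ||z||_q at the current
    iterate over the polyhedron, as an LP in split variables z = u - v,
    u, v >= 0 (so ||z||_1 <= 1 becomes sum(u) + sum(v) <= 1)."""
    m, N = A.shape
    z = z0 / np.sum(np.abs(z0))                   # feasible start on the l1 ball
    for _ in range(iters):
        nq = np.sum(np.abs(z) ** q) ** (1.0 / q)
        g = nq ** (1 - q) * np.abs(z) ** (q - 1) * np.sign(z)  # grad of ||.||_q
        res = linprog(
            c=np.concatenate([-g, g]),            # minimize -g^T (u - v)
            A_ub=np.ones((1, 2 * N)), b_ub=[1.0], # sum(u) + sum(v) <= 1
            A_eq=np.hstack([A, -A]), b_eq=np.zeros(m),  # A (u - v) = 0
            bounds=[(0, None)] * (2 * N),
        )
        z_new = res.x[:N] - res.x[N:]
        if np.linalg.norm(z_new - z) < 1e-10:     # stopping criterion
            return z_new
        z = z_new
    return z

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 8))
v = rng.standard_normal(8)
z0 = v - A.T @ np.linalg.solve(A @ A.T, A @ v)    # a point in ker(A)
z = ccp_max_q_norm(A, q=2.0, z0=z0)
assert np.allclose(A @ z, 0, atol=1e-6)           # stays in the kernel
assert np.sum(np.abs(z)) <= 1 + 1e-6              # stays in the l1 ball
```

The returned point is feasible for (28), so it yields a lower bound on the optimal value of (28); the matrix sizes and iteration count here are illustrative only.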

Table 2. Comparison of the maximal sparsity levels calculated via different algorithms for a Gaussian matrix with N = 256. The corresponding CPU times (s) are listed in brackets.

m   | L∞          | CCP1.8      | CCP2        | CCP3        | CCP20
25  | 1 (61.00)   | 1 (62.32)   | 1 (61.72)   | 1 (62.07)   | 1 (61.72)
51  | 2 (69.25)   | 2 (72.74)   | 3 (72.18)   | 3 (70.48)   | 2 (70.43)
76  | 2 (79.77)   | 4 (82.74)   | 4 (83.11)   | 4 (82.39)   | 2 (82.87)
102 | 3 (95.00)   | 7 (98.56)   | 7 (97.99)   | 6 (97.13)   | 3 (97.58)
128 | 4 (113.02)  | 10 (115.03) | 10 (115.97) | 9 (115.77)  | 4 (114.31)
153 | 5 (131.60)  | 13 (135.58) | 14 (134.47) | 12 (133.13) | 6 (113.37)
179 | 7 (153.03)  | 17 (159.35) | 18 (157.35) | 16 (156.71) | 7 (156.95)
204 | 9 (170.53)  | 20 (179.87) | 23 (174.99) | 23 (173.45) | 10 (177.64)
230 | 12 (191.65) | 27 (192.90) | 31 (194.68) | 32 (195.30) | 13 (196.24)

All the convex problems, including (29), (30) and (31), are solved with the CVX toolbox in Matlab [42]. The initial point $z^0$ used in the CCP is taken to be the solution of $L_\infty$ (30). If we denote the optimal objective values obtained by solving (29) and (30), and by solving (28) via the CCP algorithm with some q, by $L_2^{\mathrm{optval}}$, $L_\infty^{\mathrm{optval}}$ and $CCP_q^{\mathrm{optval}}$, respectively, then the maximal sparsity levels to achieve unique recovery for the noise-free BP are calculated as $1/(4L_2^{\mathrm{optval}})$, $1/(2L_\infty^{\mathrm{optval}})$ and $2^{\frac{q}{1-q}}(1/CCP_q^{\mathrm{optval}})^{\frac{q}{q-1}}$, respectively. In Table 1, we present the corresponding maximal sparsity levels calculated via the different algorithms for a small Bernoulli matrix with fixed N = 40 and a varying number of measurements m. As is shown, the convex relaxation algorithm


Fig. 2. q-ratio CMSV ρ q, s for Bernoulli random matrices of size 60 as a function of s with q = 1.8, 2, 3 and m = 20, 30, 40.

Fig. 3. q-ratio CMSV ρ q, s for Bernoulli random matrices of size 60 as a function of m with q = 1.8, 2, 3 and s = 4, 6, 8.

$L_2$ actually gives a lower bound for the solution in the case $q = 2$. In Table 2, we compare the results computed by $L_\infty$ and the CCP for a larger Gaussian random matrix with N = 256, again varying the number of measurements m. From both tables, it is observed that the maximal sparsity levels to achieve unique recovery for the noise-free BP computed by the algorithms $L_2$ and $L_\infty$ are quite conservative. The levels computed by our proposed CCP algorithm with some proper q, for instance $q = 2$, are much more acceptable and closer to the well-known optimal recovery condition $m \ge 2k\ln(N/k)$ for the Gaussian measurement matrix. According to Proposition 1, to obtain unique recovery for the noise-free BP, the sparsity level of the unknown true signal is merely required to be less than or equal to


Fig. 4. q-ratio CMSV ρ q, s for Bernoulli random matrices of size 60 as a function of q with m = 20, 30, 40 and s = 2, 4, 8.

the maximal sparsity level calculated for some q. The corresponding computing times for these algorithms are also listed in the tables.

5.2. Computing q-ratio CMSVs

For each q, the computation of the q-ratio CMSV is equivalent

Table 3. Computational results of the IP algorithm (33) for a Bernoulli random matrix with m = 20, N = 60 and s = 5.

q   | min_opt | mean_opt | std_opt | time (s)
1.8 | 0.0178  | 0.0995   | 0.0916  | 38.75
2   | 0.0161  | 0.0974   | 0.0908  | 32.06
3   | 0.0138  | 0.0528   | 0.0231  | 30.93

to

$$\min_{z\in\mathbb{R}^N} \|Az\|_2 \quad \text{s.t.} \quad \|z\|_1 \le s^{\frac{q-1}{q}},\ \|z\|_q = 1. \qquad (32)$$

The above optimization problem is not convex because of the constraint $\|z\|_q = 1$. Here we use an interior point (IP) algorithm to directly compute an approximate numerical solution of (32). However, the IP approach requires the objective and constraint functions to possess continuous second-order derivatives, which is not fulfilled by the constraint $\|z\|_1 - s^{\frac{q-1}{q}} \le 0$. The problem can be addressed by defining $z = z^+ - z^-$ with $z^+ = \max(z, 0)$ and $z^- = \max(-z, 0)$. This leads to the following augmented optimization problem:

$$\min_{z^+,z^-\in\mathbb{R}^N} (z^+ - z^-)^T A^T A (z^+ - z^-) \quad \text{s.t.} \quad \sum_i z_i^+ + \sum_i z_i^- - s^{\frac{q-1}{q}} \le 0,\ \|z^+ - z^-\|_q^q = 1,\ z^+ \ge 0,\ z^- \ge 0. \qquad (33)$$

The IP algorithm is implemented using the Matlab function fmincon. Due to the existence of local minima, we run the IP 30 times and select the minimal objective value over all trials. In Fig. 2, we compare the q-ratio CMSVs, approximated by the IP, as a function of s for Bernoulli random matrices. We set N = 60 with three different m = 20, 30, 40 and three different q = 1.8, 2, 3. A Bernoulli random matrix of dimension 40 × 60 is first simulated; then, for m = 20, 30, 40, the corresponding matrix is obtained by taking the first m rows of that full Bernoulli random matrix. The columns of all the used matrices are normalized to have unit norm, which guarantees that $\rho_{q,s}(A) \le 1$ for any $q > 1$. In general, the q-ratio CMSVs decrease as s increases in all nine cases. For fixed s, the q-ratio CMSVs increase as m increases for all q. The influence of q on the q-ratio CMSVs is relatively small and apparently not monotonic. In Fig. 3, we plot the q-ratio CMSVs as a function of m with varying q = 1.8, 2, 3 and s = 4, 6, 8. For each q and s, we compute the q-ratio CMSVs with m increasing from 20 to 40; for each m, the corresponding matrix is constructed following the same procedure as before. Under the same settings, the q-ratio CMSVs as a function of q with varying m = 20, 30, 40 and s = 2, 4, 8 are presented in Fig. 4. Similar behaviors as in Fig. 2 are observed in these two figures. Furthermore, Table 3 shows the computational results of the IP algorithm performed on the Bernoulli random matrix with m = 20, N = 60 and s = 5 for three different q = 1.8, 2, 3: the minimum, mean and standard deviation of the optimal objective values, and the total computing time for the 30 trials. In Fig. 5, we show how the q-ratio CMSVs calculated for a Bernoulli random matrix decrease as a function of the number of trials; our choice of 30 trials for the experiments appears reasonable.
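As a rough cross-check of the IP computation, problem (33) can be prototyped with a general-purpose nonlinear solver. The sketch below is an assumption-laden Python stand-in, not the paper's code: SciPy's SLSQP instead of Matlab's interior-point fmincon, a near-canonical-basis initialization of our own choosing, and illustrative matrix sizes; it keeps the best objective over random restarts to mitigate local minima.

```python
import numpy as np
from scipy.optimize import minimize

def q_ratio_cmsv(A, q, s, restarts=10, seed=0):
    """Approximate rho_{q,s}(A) = min ||Az||_2 s.t. ||z||_1 <= s^((q-1)/q)
    and ||z||_q = 1, via the split z = z+ - z- of problem (33)."""
    m, N = A.shape
    r = s ** ((q - 1.0) / q)                 # l1 budget s^((q-1)/q)
    rng = np.random.default_rng(seed)

    def obj(w):
        z = w[:N] - w[N:]
        Az = A @ z
        return float(Az @ Az)                # ||Az||_2^2

    cons = [
        {"type": "ineq", "fun": lambda w: r - np.sum(w)},             # sum(z+)+sum(z-) <= r
        {"type": "eq",
         "fun": lambda w: np.sum(np.abs(w[:N] - w[N:]) ** q) - 1.0},  # ||z||_q^q = 1
    ]
    best = np.inf
    for _ in range(restarts):
        z0 = np.zeros(N)
        z0[rng.integers(N)] = 1.0            # feasible, maximally sparse start
        z0 += 0.01 * rng.standard_normal(N)
        z0 /= np.sum(np.abs(z0) ** q) ** (1.0 / q)
        w0 = np.concatenate([np.maximum(z0, 0), np.maximum(-z0, 0)])
        res = minimize(obj, w0, method="SLSQP", bounds=[(0, None)] * (2 * N),
                       constraints=cons, options={"maxiter": 200})
        if res.success and res.fun < best:
            best = res.fun
    return float(np.sqrt(best))

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 30)) / np.sqrt(20)
rho = q_ratio_cmsv(A, q=2.0, s=4.0)
assert rho >= 0.0   # a minimal-singular-value-type quantity is non-negative
```

As with the IP runs above, restarts matter: a single start can easily land in a poor local minimum, which is why the paper repeats the trials 30 times.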


Fig. 5. q-ratio CMSVs calculated for a Bernoulli matrix of size 20 × 60 with s = 5 and q = 1.8, 2, 3 via IP algorithm as a function of repeat times.

Fig. 6. The q-ratio CMSV ρ q, s based bound vs the RIC based bound for Hadamard submatrices with N = 64, k = 1, 2, 4 and q = 1.8.

5.3. Bounds comparison

Finally, we compare the q-ratio CMSV based bound and the RIC based bound on the BP for different configurations of m and k. It is known that if the 2k-order RIC of the measurement matrix A satisfies $\delta_{2k}(A) < \sqrt{2}-1$, then any solution $\hat{x}$ of the noisy BP approximates the true k-sparse signal x with error

$$\|\hat{x}-x\|_q \le C\,k^{1/q-1/2}\,\varepsilon, \quad \text{where } C = \frac{4\sqrt{1+\delta_{2k}(A)}}{1-(1+\sqrt{2})\,\delta_{2k}(A)}, \tag{34}$$

for any 1 ≤ q ≤ 2.
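For reference, the constant C in (34) can be evaluated directly: it equals 4 at $\delta_{2k}(A)=0$ and diverges as $\delta_{2k}(A) \to \sqrt{2}-1 \approx 0.414$, which is why the RIC based bound stops applying beyond that point. A small illustrative sketch (the function name is ours):

```python
import numpy as np

def ric_constant(delta):
    """Constant C in (34); only finite for delta < sqrt(2) - 1."""
    if delta >= np.sqrt(2) - 1:
        return np.inf
    return 4 * np.sqrt(1 + delta) / (1 - (1 + np.sqrt(2)) * delta)

# C(0) = 4 and C blows up near sqrt(2) - 1 ~ 0.414:
for d in (0.0, 0.2, 0.4, 0.42):
    print(d, ric_constant(d))
```

For q = 1.8 this matches the remark below: as $\delta_{2k}(A) \to 0$ the bound in (34) approaches $4k^{1/1.8-1/2}\varepsilon \ge 4\varepsilon$.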

Without loss of generality, we set ε = 1. The RIC is approximated using Monte Carlo simulations. Specifically, to compute δ_{2k}(A), we randomly take 1000 submatrices of A ∈ R^{m×N} of size m × 2k, and approximate δ_{2k}(A) by the maximum of max(σ₁² − 1, 1 − σ_{2k}²) over all sampled submatrices, where σ₁ and σ_{2k} are the maximal and minimal singular values of the sampled submatrix. Since the approximated RIC is always smaller than or equal to the exact RIC, the error bounds based on the exact RIC are always worse than those based on the approximated RIC. Therefore, if our q-ratio CMSV based bound is better than the approximated RIC based bound, it is even better than the exact RIC based one. We approximate the q-ratio CMSV and the RIC for column-normalized submatrices of a row-randomly-permuted Hadamard matrix with N = 64, k = 1, 2, 4, m = 10k : N, and q = 1.8. Fig. 6 shows that in all tested cases the q-ratio CMSV based bounds are smaller than those based on the RIC. For certain m and k, the q-ratio CMSV based bounds apply even when the RIC based bound does not (i.e., when δ_{2k}(A) ≥ √2 − 1). As m approaches N, the q-ratio CMSV based bounds are slightly larger than 2, while the RIC based bounds approach 4k^{1/1.8−1/2} ≥ 4 as δ_{2k}(A) → 0.
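The Monte Carlo approximation of δ_{2k}(A) described above can be sketched as follows (assumed details: columns are sampled uniformly without replacement, and `n_trials` plays the role of the 1000 sampled submatrices):

```python
import numpy as np

def approx_ric(A, k, n_trials=1000, seed=0):
    """Monte Carlo (lower) approximation of the 2k-order RIC delta_{2k}(A):
    sample m x 2k column submatrices of A and take the largest value of
    max(sigma_1^2 - 1, 1 - sigma_{2k}^2) over the samples."""
    rng = np.random.default_rng(seed)
    N = A.shape[1]
    delta = 0.0
    for _ in range(n_trials):
        cols = rng.choice(N, size=2 * k, replace=False)
        s = np.linalg.svd(A[:, cols], compute_uv=False)
        delta = max(delta, s[0] ** 2 - 1.0, 1.0 - s[-1] ** 2)
    return delta

# Sanity check: a matrix with orthonormal columns has RIC exactly zero.
print(round(approx_ric(np.eye(8), k=1, n_trials=50), 12))  # 0.0
```

Because only a subset of the $\binom{N}{2k}$ submatrices is examined, the returned value never exceeds the exact RIC, consistent with the argument in the text that a bound beating the approximated RIC bound also beats the exact one.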

6. Conclusion

In this paper, we proposed a new measure of the measurement matrix's incoherence, the q-ratio CMSV, defined via the q-ratio sparsity measure. We established bounds on both the ℓ_q norm and the ℓ₁ norm of the reconstruction errors of the Basis Pursuit, the Dantzig selector and the Lasso estimator in terms of the q-ratio CMSV. For subgaussian random matrices, we showed that the q-ratio CMSV is bounded away from zero as long as the number of measurements is relatively large. A CCP algorithm was developed to verify the sufficient conditions guaranteeing unique noiseless recovery, and an interior point algorithm was used to compute the q-ratio CMSV. Numerical experiments were presented to illustrate our theoretical results and assess all the algorithms. Further generalizations, including block sparsity recovery, low-rank matrix recovery and more general high dimensional M-estimation, are left for future work.

Acknowledgments

We would like to thank two anonymous referees, the Associate Editor, and the Editor for their detailed and insightful comments and suggestions that helped to improve the quality of this paper. This work is supported by the Swedish Research Council grant (Reg. No. 340-2013-5342).

Appendix A. Proofs

Proof of Proposition 1. If there exist z ∈ ker A\{0} and |S| ≤ k such that $\|z_S\|_1 \ge \|z_{S^c}\|_1$, then for any 1 < q ≤ ∞ we have

$$\|z\|_1 = \|z_S\|_1 + \|z_{S^c}\|_1 \le 2\|z_S\|_1 \le 2k^{1-1/q}\|z_S\|_q \le 2k^{1-1/q}\|z\|_q,$$

which implies that $k \ge 2^{\frac{q}{1-q}} s_q(z)$ for any 1 < q ≤ ∞. By contraposition, when there exists some 1 < q ≤ ∞ such that $k < 2^{\frac{q}{1-q}} \min_{z\in\ker A\setminus\{0\}} s_q(z)$, then $\|z_S\|_1 < \|z_{S^c}\|_1$ holds for all z ∈ ker A\{0} and |S| ≤ k. Thus the null space property of order k is fulfilled and the unique solution to problem (7) is exactly the true k-sparse signal x.

Proof of Proposition 2. We first prove the left hand side of (10). For any z ∈ R^N\{0} and 1 < q₂ ≤ q₁ ≤ ∞, when $s_{q_1}(z) \le s$ we have

$$\frac{\|z\|_1}{\|z\|_{q_1}} \le s^{\frac{q_1-1}{q_1}} \;\Rightarrow\; \|z\|_1 \le s^{\frac{q_1-1}{q_1}}\|z\|_{q_1}.$$

Then, with $\tilde{q} := \frac{q_2(q_1-1)}{q_1(q_2-1)}$ and since $\|z\|_{q_2} \ge \|z\|_{q_1}$, we have

$$s_{q_2}(z) = \left(\frac{\|z\|_1}{\|z\|_{q_2}}\right)^{\frac{q_2}{q_2-1}} \le \left(s^{\frac{q_1-1}{q_1}}\right)^{\frac{q_2}{q_2-1}} = s^{\frac{q_2(q_1-1)}{q_1(q_2-1)}} = s^{\tilde{q}},$$

which implies that $\{z : s_{q_1}(z)\le s\} \subseteq \{z : s_{q_2}(z)\le s^{\tilde{q}}\}$. Therefore,

$$\rho_{q_1,s}(A) = \min_{z\ne0,\,s_{q_1}(z)\le s}\frac{\|Az\|_2}{\|z\|_{q_1}} \ge \min_{z\ne0,\,s_{q_2}(z)\le s^{\tilde q}}\frac{\|Az\|_2}{\|z\|_{q_1}} = \min_{z\ne0,\,s_{q_2}(z)\le s^{\tilde q}}\frac{\|Az\|_2}{\|z\|_{q_2}}\cdot\frac{\|z\|_{q_2}}{\|z\|_{q_1}} \ge \min_{z\ne0,\,s_{q_2}(z)\le s^{\tilde q}}\frac{\|Az\|_2}{\|z\|_{q_2}} = \rho_{q_2,s^{\tilde q}}(A).$$

Next we verify the right hand side of (10). When $s_{q_2}(z)\le s^{\tilde q}$, for any z ∈ R^N\{0}, the non-increasing property of the q-ratio sparsity gives $\frac{\|z\|_1}{\|z\|_\infty} = s_\infty(z) \le s_{q_2}(z) \le s^{\tilde q}$ since q₂ ≤ ∞. Then, as 1 < q₂ ≤ q₁ ≤ ∞, it holds that $\frac{\|z\|_{q_2}}{\|z\|_{q_1}} \le \frac{\|z\|_1}{\|z\|_\infty} \le s^{\tilde q}$, i.e., $\|z\|_{q_2} \le s^{\tilde q}\|z\|_{q_1}$. In addition, the non-increasing property $s_{q_1}(z)\le s_{q_2}(z)$ implies that $\{z : s_{q_2}(z)\le s^{\tilde q}\} \subseteq \{z : s_{q_1}(z)\le s^{\tilde q}\}$. Therefore,

$$\rho_{q_2,s^{\tilde q}}(A) = \min_{z\ne0,\,s_{q_2}(z)\le s^{\tilde q}}\frac{\|Az\|_2}{\|z\|_{q_2}} \ge \min_{z\ne0,\,s_{q_1}(z)\le s^{\tilde q}}\frac{\|Az\|_2}{\|z\|_{q_2}} = \min_{z\ne0,\,s_{q_1}(z)\le s^{\tilde q}}\frac{\|Az\|_2}{\|z\|_{q_1}}\cdot\frac{\|z\|_{q_1}}{\|z\|_{q_2}} \ge s^{-\tilde q}\min_{z\ne0,\,s_{q_1}(z)\le s^{\tilde q}}\frac{\|Az\|_2}{\|z\|_{q_1}} = s^{-\tilde q}\rho_{q_1,s^{\tilde q}}(A).$$

The proof is completed.

Proof of Theorem 1. The proof follows arguments similar to those in [15,16], which are simpler than the ones employed to obtain the RIC based bounds. The derivation has two key steps.

Step 1: For all algorithms, show that the residual $h = \hat{x}-x$ is q-ratio sparse.

As x is k-sparse, we assume supp(x) = S with |S| ≤ k. First, for BP and DS, since $\|\hat{x}\|_1 = \|x+h\|_1$ is minimal among all z satisfying the constraints of BP and DS (including the true signal x), we have

$$\|x\|_1 \ge \|\hat{x}\|_1 = \|x+h\|_1 = \|x_S+h_S\|_1 + \|x_{S^c}+h_{S^c}\|_1 \ge \|x_S\|_1 - \|h_S\|_1 + \|h_{S^c}\|_1 = \|x\|_1 - \|h_S\|_1 + \|h_{S^c}\|_1.$$

Therefore $\|h_{S^c}\|_1 \le \|h_S\|_1$, which leads to

$$\|h\|_1 = \|h_S\|_1 + \|h_{S^c}\|_1 \le 2\|h_S\|_1 \le 2k^{1-1/q}\|h_S\|_q \le 2k^{1-1/q}\|h\|_q$$

for any 1 < q ≤ ∞. Thus $s_q(h) = \left(\frac{\|h\|_1}{\|h\|_q}\right)^{\frac{q}{q-1}} \le 2^{\frac{q}{q-1}}k$.

Next, for the Lasso, since the noise w satisfies $\|A^Tw\|_\infty \le \kappa\lambda_N\sigma$ for some small κ > 0 and $\hat{x}$ is a solution of the Lasso, we have

$$\frac{1}{2}\|A\hat{x}-y\|_2^2 + \lambda_N\sigma\|\hat{x}\|_1 \le \frac{1}{2}\|Ax-y\|_2^2 + \lambda_N\sigma\|x\|_1.$$

Substituting y = Ax + w yields

$$\lambda_N\sigma\|\hat{x}\|_1 \le \frac{1}{2}\|w\|_2^2 - \frac{1}{2}\|A(\hat{x}-x)-w\|_2^2 + \lambda_N\sigma\|x\|_1 = \frac{1}{2}\|w\|_2^2 - \frac{1}{2}\|A(\hat{x}-x)\|_2^2 + \langle A(\hat{x}-x), w\rangle - \frac{1}{2}\|w\|_2^2 + \lambda_N\sigma\|x\|_1 \le \langle A(\hat{x}-x), w\rangle + \lambda_N\sigma\|x\|_1.$$

Thus, it yields

$$\lambda_N\sigma\|\hat{x}\|_1 \le \langle \hat{x}-x, A^Tw\rangle + \lambda_N\sigma\|x\|_1 \le \|\hat{x}-x\|_1\|A^Tw\|_\infty + \lambda_N\sigma\|x\|_1 \le \kappa\lambda_N\sigma\|h\|_1 + \lambda_N\sigma\|x\|_1,$$

which leads to

$$\|\hat{x}\|_1 \le \kappa\|h\|_1 + \|x\|_1. \tag{35}$$

Therefore, it holds that

$$\|x\|_1 \ge \|\hat{x}\|_1 - \kappa\|h\|_1 = \|x+h_{S^c}+h_S\|_1 - \kappa\|h_{S^c}+h_S\|_1 \ge \|x+h_{S^c}\|_1 - \|h_S\|_1 - \kappa(\|h_{S^c}\|_1 + \|h_S\|_1) = \|x\|_1 + (1-\kappa)\|h_{S^c}\|_1 - (1+\kappa)\|h_S\|_1.$$

As a consequence, we obtain

$$\|h_{S^c}\|_1 \le \frac{1+\kappa}{1-\kappa}\|h_S\|_1,$$

which implies that

$$\|h\|_1 = \|h_{S^c}\|_1 + \|h_S\|_1 \le \frac{2}{1-\kappa}\|h_S\|_1 \le \frac{2}{1-\kappa}k^{1-1/q}\|h_S\|_q \le \frac{2}{1-\kappa}k^{1-1/q}\|h\|_q.$$

Hence $s_q(h) = \left(\frac{\|h\|_1}{\|h\|_q}\right)^{\frac{q}{q-1}} \le \left(\frac{2}{1-\kappa}\right)^{\frac{q}{q-1}}k$.

Step 2: Obtain an upper bound on $\|Ah\|_2$, and then derive the ℓ_q norm and ℓ₁ norm bounds on the error vector h via the definition of the q-ratio CMSV.

(a) For the BP, this is trivial: since both x and $\hat{x}$ satisfy the constraint $\|y-Az\|_2\le\varepsilon$, the triangle inequality implies

$$\|Ah\|_2 = \|A(\hat{x}-x)\|_2 \le \|A\hat{x}-y\|_2 + \|y-Ax\|_2 \le 2\varepsilon. \tag{36}$$

Then it follows from the definition of the q-ratio CMSV and $s_q(h)\le 2^{\frac{q}{q-1}}k$ that

$$\rho_{q,2^{q/(q-1)}k}(A)\,\|h\|_q \le \|Ah\|_2 \le 2\varepsilon \;\Rightarrow\; \|h\|_q \le \frac{2\varepsilon}{\rho_{q,2^{q/(q-1)}k}(A)}.$$

Meanwhile, $\|h\|_1 \le 2k^{1-1/q}\|h\|_q$ gives $\|h\|_1 \le \frac{4k^{1-1/q}\varepsilon}{\rho_{q,2^{q/(q-1)}k}(A)}$.

(b) Now for the DS, since $\|A^Tw\|_\infty\le\lambda_N\sigma$,

$$\|A^TAh\|_\infty \le \|A^T(y-A\hat{x})\|_\infty + \|A^T(y-Ax)\|_\infty \le 2\lambda_N\sigma.$$

Therefore, we have

$$\|Ah\|_2^2 = h^TA^TAh = \sum_{i=1}^N h_i(A^TAh)_i \le \sum_{i=1}^N |h_i||(A^TAh)_i| \le \|A^TAh\|_\infty\|h\|_1 \le 2\lambda_N\sigma\|h\|_1. \tag{37}$$

Thus, with $s_q(h)\le 2^{\frac{q}{q-1}}k$,

$$\rho^2_{q,2^{q/(q-1)}k}(A)\|h\|_q^2 \le \|Ah\|_2^2 \le 2\lambda_N\sigma\|h\|_1 \le 4\lambda_N\sigma k^{1-1/q}\|h\|_q \;\Rightarrow\; \|h\|_q \le \frac{4k^{1-1/q}\lambda_N\sigma}{\rho^2_{q,2^{q/(q-1)}k}(A)}.$$

Hence, $\|h\|_1 \le 2k^{1-1/q}\|h\|_q \le \frac{8k^{2-2/q}\lambda_N\sigma}{\rho^2_{q,2^{q/(q-1)}k}(A)}$.

(c) Finally, we establish an upper bound on $\|Ah\|_2^2$ for the Lasso with $\|A^Tw\|_\infty\le\kappa\lambda_N\sigma$. We have

$$\|A^TAh\|_\infty \le \|A^T(y-Ax)\|_\infty + \|A^T(y-A\hat{x})\|_\infty \le \|A^Tw\|_\infty + \|A^T(y-A\hat{x})\|_\infty \le \kappa\lambda_N\sigma + \|A^T(y-A\hat{x})\|_\infty.$$

Moreover, since $\hat{x}$ is a solution of the Lasso, the optimality condition yields

$$A^T(y-A\hat{x}) \in \lambda_N\sigma\,\partial\|\hat{x}\|_1,$$

where $\partial\|\hat{x}\|_1 \subseteq [-1,1]^N$ is the subdifferential of $\|\cdot\|_1$ evaluated at $\hat{x}$. Thus $\|A^T(y-A\hat{x})\|_\infty \le \lambda_N\sigma$, which leads to

$$\|A^TAh\|_\infty \le (\kappa+1)\lambda_N\sigma.$$

Following the same argument as before, we get

$$\|Ah\|_2^2 \le (\kappa+1)\lambda_N\sigma\|h\|_1. \tag{38}$$

As a consequence, with $s_q(h)\le\left(\frac{2}{1-\kappa}\right)^{\frac{q}{q-1}}k$,

$$\rho^2_{q,(\frac{2}{1-\kappa})^{q/(q-1)}k}(A)\|h\|_q^2 \le \|Ah\|_2^2 \le (\kappa+1)\lambda_N\sigma\|h\|_1 \le \frac{2(\kappa+1)}{1-\kappa}\lambda_N\sigma k^{1-1/q}\|h\|_q, \tag{39}$$

which implies that

$$\|h\|_q \le \frac{k^{1-1/q}}{\rho^2_{q,(\frac{2}{1-\kappa})^{q/(q-1)}k}(A)}\cdot\frac{2(\kappa+1)}{1-\kappa}\lambda_N\sigma.$$

Therefore, (16) holds since $\|h\|_1 \le \frac{2}{1-\kappa}k^{1-1/q}\|h\|_q$.

Proof of Theorem 2. Assume that S is the index set containing the largest k absolute entries of x, so that $\sigma_{k,1}(x) = \|x_{S^c}\|_1$, and let $h = \hat{x}-x$. The derivation again has two steps.

Step 1: For all algorithms, bound $\|h\|_1$ in terms of $\|h\|_q$ and $\|x_{S^c}\|_1$.

First, for BP and DS, since $\|\hat{x}\|_1 = \|x+h\|_1$ is minimal among all z satisfying the constraints of BP and DS, we have

$$\|x_S\|_1 + \|x_{S^c}\|_1 = \|x\|_1 \ge \|\hat{x}\|_1 = \|x+h\|_1 = \|x_S+h_S\|_1 + \|x_{S^c}+h_{S^c}\|_1 \ge \|x_S\|_1 - \|h_S\|_1 - \|x_{S^c}\|_1 + \|h_{S^c}\|_1,$$

which implies that

$$\|h_{S^c}\|_1 \le \|h_S\|_1 + 2\|x_{S^c}\|_1. \tag{40}$$

As a consequence,

$$\|h\|_1 = \|h_S\|_1 + \|h_{S^c}\|_1 \le 2\|h_S\|_1 + 2\|x_{S^c}\|_1 \le 2k^{1-1/q}\|h_S\|_q + 2\|x_{S^c}\|_1 \le 2k^{1-1/q}\|h\|_q + 2\|x_{S^c}\|_1. \tag{41}$$

Next, regarding the Lasso, adopting (35) we obtain

$$\|x_S\|_1 + \|x_{S^c}\|_1 = \|x\|_1 \ge \|\hat{x}\|_1 - \kappa\|h\|_1 \ge \|x_S+x_{S^c}+h_S+h_{S^c}\|_1 - \kappa\|h_S+h_{S^c}\|_1 \ge \|x_S+h_{S^c}\|_1 - \|x_{S^c}\|_1 - \|h_S\|_1 - \kappa\|h_S\|_1 - \kappa\|h_{S^c}\|_1 = \|x_S\|_1 + (1-\kappa)\|h_{S^c}\|_1 - \|x_{S^c}\|_1 - (1+\kappa)\|h_S\|_1,$$

which implies that

$$\|h_{S^c}\|_1 \le \frac{1+\kappa}{1-\kappa}\|h_S\|_1 + \frac{2}{1-\kappa}\|x_{S^c}\|_1. \tag{42}$$

Therefore, we have

$$\|h\|_1 \le \|h_S\|_1 + \|h_{S^c}\|_1 \le \frac{2}{1-\kappa}\|h_S\|_1 + \frac{2}{1-\kappa}\|x_{S^c}\|_1 \le \frac{2}{1-\kappa}k^{1-1/q}\|h\|_q + \frac{2}{1-\kappa}\|x_{S^c}\|_1. \tag{43}$$

Step 2: Verify that the q-ratio sparsity level of h is bounded below whenever $\|h\|_q$ exceeds the bound component caused by the error.

(a) Specifically, for the BP, we assume $h\ne0$ and $\|h\|_q > \frac{2\varepsilon}{\rho_{q,4^{q/(q-1)}k}(A)}$, otherwise (17) holds trivially. Since $\|Ah\|_2\le2\varepsilon$, see (36), we have $\|h\|_q > \frac{\|Ah\|_2}{\rho_{q,4^{q/(q-1)}k}(A)}$. Then it holds that

$$\frac{\|Ah\|_2}{\|h\|_q} < \rho_{q,4^{q/(q-1)}k}(A) = \min_{z\ne0,\,s_q(z)\le4^{q/(q-1)}k}\frac{\|Az\|_2}{\|z\|_q} \;\Rightarrow\; s_q(h) > 4^{\frac{q}{q-1}}k \;\Rightarrow\; \|h\|_1 > 4k^{1-1/q}\|h\|_q. \tag{44}$$

Combining (41), we have $\|h\|_q < k^{1/q-1}\|x_{S^c}\|_1 = k^{1/q-1}\sigma_{k,1}(x)$, which completes the proof of (17). The error ℓ₁ norm bound (18) follows immediately from (17) and (41).

(b) As for the DS, we assume $h\ne0$ and $\|h\|_q > \frac{8k^{1-1/q}\lambda_N\sigma}{\rho^2_{q,4^{q/(q-1)}k}(A)}$, otherwise (19) holds trivially. As $\|Ah\|_2^2 \le 2\lambda_N\sigma\|h\|_1$, see (37), we have $\|h\|_q > \frac{4k^{1-1/q}}{\rho^2_{q,4^{q/(q-1)}k}(A)}\cdot\frac{\|Ah\|_2^2}{\|h\|_1}$. Since every z with $s_q(z)\le4^{q/(q-1)}k$ satisfies $\|z\|_1\le4k^{1-1/q}\|z\|_q$, it implies that

$$\frac{\|Ah\|_2^2}{\|h\|_1\|h\|_q} < \frac{\rho^2_{q,4^{q/(q-1)}k}(A)}{4k^{1-1/q}} \le \min_{z\ne0,\,s_q(z)\le4^{q/(q-1)}k}\frac{\|Az\|_2^2}{\|z\|_1\|z\|_q} \;\Rightarrow\; s_q(h) > 4^{\frac{q}{q-1}}k \;\Rightarrow\; \|h\|_1 > 4k^{1-1/q}\|h\|_q. \tag{45}$$

Combining (41), we have $\|h\|_q < k^{1/q-1}\|x_{S^c}\|_1 = k^{1/q-1}\sigma_{k,1}(x)$, which completes the proof of (19). Then (20) holds as a result of (19) and (41).

(c) Finally, regarding the Lasso, we assume $h\ne0$ and $\|h\|_q > \frac{1+\kappa}{1-\kappa}\cdot\frac{4k^{1-1/q}\lambda_N\sigma}{\rho^2_{q,(\frac{4}{1-\kappa})^{q/(q-1)}k}(A)}$, otherwise (21) holds trivially. Since in this case $\|Ah\|_2^2 \le (1+\kappa)\lambda_N\sigma\|h\|_1$, see (38), we have $\|h\|_q > \frac{4k^{1-1/q}}{(1-\kappa)\rho^2_{q,(\frac{4}{1-\kappa})^{q/(q-1)}k}(A)}\cdot\frac{\|Ah\|_2^2}{\|h\|_1}$. Since every z with $s_q(z)\le(\frac{4}{1-\kappa})^{q/(q-1)}k$ satisfies $\|z\|_1\le\frac{4}{1-\kappa}k^{1-1/q}\|z\|_q$, it leads to

$$\frac{\|Ah\|_2^2}{\|h\|_1\|h\|_q} < \frac{(1-\kappa)\,\rho^2_{q,(\frac{4}{1-\kappa})^{q/(q-1)}k}(A)}{4k^{1-1/q}} \le \min_{z\ne0,\,s_q(z)\le(\frac{4}{1-\kappa})^{q/(q-1)}k}\frac{\|Az\|_2^2}{\|z\|_1\|z\|_q} \;\Rightarrow\; s_q(h) > \left(\frac{4}{1-\kappa}\right)^{\frac{q}{q-1}}k \;\Rightarrow\; \|h\|_1 > \frac{4}{1-\kappa}k^{1-1/q}\|h\|_q. \tag{46}$$

Combining (43), we have $\|h\|_q < k^{1/q-1}\|x_{S^c}\|_1 = k^{1/q-1}\sigma_{k,1}(x)$, which completes the proof of (21). Consequently, (22) is obtained by (21) and (43).

References

[1] E.J. Candes, J.K. Romberg, T. Tao, Stable signal recovery from incomplete and inaccurate measurements, Commun. Pure Appl. Math. 59 (8) (2006) 1207–1223.
[2] D.L. Donoho, Compressed sensing, IEEE Trans. Inf. Theory 52 (4) (2006) 1289–1306.
[3] Y.C. Eldar, G. Kutyniok, Compressed Sensing: Theory and Applications, Cambridge University Press, 2012.
[4] S. Foucart, H. Rauhut, A Mathematical Introduction to Compressive Sensing, Birkhäuser, Basel, 2013.
[5] A. Cohen, W. Dahmen, R. DeVore, Compressed sensing and best k-term approximation, J. Am. Math. Soc. 22 (1) (2009) 211–231.
[6] E.J. Candes, The restricted isometry property and its implications for compressed sensing, C. R. Math. 346 (9–10) (2008) 589–592.
[7] R. Baraniuk, M. Davenport, R. DeVore, M. Wakin, A simple proof of the restricted isometry property for random matrices, Constr. Approx. 28 (3) (2008) 253–263.
[8] M. Rudelson, R. Vershynin, On sparse reconstruction from Fourier and Gaussian measurements, Commun. Pure Appl. Math. 61 (8) (2008) 1025–1045.
[9] M. Stojnic, W. Xu, B. Hassibi, Compressed sensing – probabilistic analysis of a null-space characterization, in: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2008, pp. 3377–3380.
[10] A.S. Bandeira, E. Dobriban, D.G. Mixon, W.F. Sawin, Certifying the restricted isometry property is hard, IEEE Trans. Inf. Theory 59 (6) (2013) 3448–3450.
[11] A.M. Tillmann, M.E. Pfetsch, The computational complexity of the restricted isometry property, the nullspace property, and related concepts in compressed sensing, IEEE Trans. Inf. Theory 60 (2) (2014) 1248–1259.
[12] A. d'Aspremont, L. El Ghaoui, Testing the nullspace property using semidefinite programming, Math. Program. 127 (1) (2011) 123–144.
[13] A. d'Aspremont, L. El Ghaoui, M.I. Jordan, G.R. Lanckriet, A direct formulation for sparse PCA using semidefinite programming, SIAM Rev. 49 (3) (2007) 434–448.
[14] A. Juditsky, A. Nemirovski, On verifiable sufficient conditions for sparse signal recovery via ℓ1 minimization, Math. Program. 127 (1) (2011) 57–88.
[15] G. Tang, A. Nehorai, Performance analysis of sparse recovery based on constrained minimal singular values, IEEE Trans. Signal Process. 59 (12) (2011) 5734–5745.
[16] G. Tang, A. Nehorai, Computable performance bounds on sparse recovery, IEEE Trans. Signal Process. 63 (1) (2015) 132–141.
[17] S.S. Chen, D.L. Donoho, M.A. Saunders, Atomic decomposition by basis pursuit, SIAM J. Sci. Comput. 20 (1998) 33–61.
[18] E. Candes, T. Tao, The Dantzig selector: statistical estimation when p is much larger than n, Ann. Stat. (2007) 2313–2351.
[19] R. Tibshirani, Regression shrinkage and selection via the lasso, J. R. Stat. Soc. Ser. B (Methodological) (1996) 267–288.
[20] M. Bogdan, E. van den Berg, C. Sabatti, W. Su, E.J. Candès, SLOPE – adaptive variable selection via convex optimization, Ann. Appl. Stat. 9 (3) (2015) 1103.
[21] P.C. Bellec, G. Lecué, A.B. Tsybakov, Slope meets lasso: improved oracle bounds and optimality, arXiv:1605.08651 (2016).
[22] A. Derumigny, Improved bounds for square-root lasso and square-root slope, Electron. J. Stat. 12 (1) (2018) 741–766.
[23] G. Tang, A. Nehorai, Semidefinite programming for computable performance bounds on block-sparsity recovery, IEEE Trans. Signal Process. 64 (17) (2016) 4455–4468.
[24] G. Tang, A. Nehorai, The stability of low-rank matrix reconstruction: a constrained singular value view, IEEE Trans. Inf. Theory 58 (9) (2012) 6079–6092.
[25] M. Lopes, Estimating unknown sparsity in compressed sensing, in: International Conference on Machine Learning, 2013, pp. 217–225.
[26] M.E. Lopes, Unknown sparsity in compressed sensing: denoising and inference, IEEE Trans. Inf. Theory 62 (9) (2016) 5145–5166.
[27] Z. Zhou, J. Yu, Estimation of block sparsity in compressive sensing, arXiv:1701.01055 (2017).
[28] Y. Plan, R. Vershynin, One-bit compressed sensing by linear programming, Commun. Pure Appl. Math. 66 (8) (2013) 1275–1297.
[29] R. Vershynin, Estimation in high dimensions: a geometric perspective, in: Sampling Theory, a Renaissance, Springer, 2015, pp. 3–66.
[30] J. Cahill, D.G. Mixon, Robust width: a characterization of uniformly stable and robust compressed sensing, arXiv:1408.4409 (2014).
[31] H. Zhang, On robust width property for lasso and Dantzig selector, arXiv:1501.03643 (2015).
[32] Z. Zhou, J. Yu, Stable and robust ℓp-constrained compressive sensing recovery via robust width property, arXiv:1705.03810 (2017).
[33] P.J. Bickel, Y. Ritov, A.B. Tsybakov, Simultaneous analysis of lasso and Dantzig selector, Ann. Stat. (2009) 1705–1732.
[34] G. Raskutti, M.J. Wainwright, B. Yu, Restricted eigenvalue properties for correlated Gaussian designs, J. Mach. Learn. Res. 11 (2010) 2241–2259.
[35] S. Negahban, B. Yu, M.J. Wainwright, P.K. Ravikumar, A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers, in: Advances in Neural Information Processing Systems, 2009, pp. 1348–1356.
[36] V. Kekatos, G.B. Giannakis, Sparse Volterra and polynomial regression models: recoverability and estimation, IEEE Trans. Signal Process. 59 (12) (2011) 5907–5920.
[37] T.T. Cai, A. Zhang, Sparse representation of a polytope and recovery of sparse signals and low-rank matrices, IEEE Trans. Inf. Theory 60 (1) (2014) 122–132.
[38] S. Dirksen, G. Lecué, H. Rauhut, On the gap between restricted isometry properties and sparse recovery conditions, IEEE Trans. Inf. Theory (2016).
[39] H. Zhang, L. Cheng, On the constrained minimal singular values for sparse signal recovery, IEEE Signal Process. Lett. 19 (8) (2012) 499–502.
[40] Z. Zhou, J. Yu, On q-ratio CMSV for sparse recovery, arXiv:1805.12022 (2018).
[41] T. Lipp, S. Boyd, Variations and extension of the convex–concave procedure, Optim. Eng. 17 (2) (2016) 263–287.
[42] M. Grant, S. Boyd, Y. Ye, CVX: Matlab software for disciplined convex programming, 2008.