Journal of the Franklin Institute 351 (2014) 2711–2727
Globally sparse and locally dense signal recovery for compressed sensing

Yipeng Liu

ESAT-STADIUS/iMinds Future Health Department, Department of Electrical Engineering, KU Leuven, Kasteelpark Arenberg 10, Box 2446, 3001 Heverlee, Belgium

Received 4 September 2013; received in revised form 8 January 2014; accepted 11 January 2014; available online 23 January 2014
Abstract

Sparsity regularized least squares are very popular for solving underdetermined linear inverse problems. One recent advance is the incorporation of structural information into sparse signal recovery for compressed sensing. The sparse group signal model, also called the block-sparse signal model, is one example. In this paper, the internal structure of each group is further specified to obtain the globally sparse and locally dense group signal model, which assumes that most of the entries in the active groups are nonzero. To estimate this newly defined signal, minimization of the ℓ1 norm of the total variation is incorporated into the group Lasso, which combines a sparsity constraint with a data fitting constraint. The newly proposed optimization model is called the globally sparse and locally dense group Lasso. The added total-variation-based constraint encourages a locally dense distribution in each group. Theoretical analysis gives a class of sufficient conditions that guarantee successful recovery. Simulations demonstrate the proposed method's performance gains against the Lasso and the group Lasso.
© 2014 The Franklin Institute. Published by Elsevier Ltd. All rights reserved.
1. Introduction

Compressive sensing (CS) is a signal processing technique that can, with high probability, reconstruct a signal from far fewer randomized samples than Nyquist sampling requires, on condition that the signal has a sparse representation. The least absolute shrinkage and selection operator (Lasso) is one of the popular convex approaches that exploit sparsity to recover the sparse signal [1]. It has wide applications in signal processing [2], machine learning [1], etc. Recent research on sparse signal recovery for CS shows that adding structural constraints can improve the estimation performance [3].
The group Lasso (GLasso) is a popular example of model-based sparse signal recovery [4–6], and it extends the popular Lasso method [1,2]. Besides sparsity, it exploits another physical property of the signal. Group-sparsity is also called block-sparsity [7–9]. In the group-sparse signal model, most of the nonzero entries are accumulated in groups; this means that a few groups can represent the signal. The GLasso replaces the ℓ1 norm regularization with a sum of ℓ2 norms, which has the effect of grouping the variables within each group so that the recovered entries are encouraged to be zero or nonzero simultaneously. A series of papers has addressed the statistical properties of the GLasso. In [6,10,11], asymptotic properties such as risk consistency and estimation consistency are provided. Sufficient conditions for successful recovery are given in [7,8,12]. To allow overlapping of different groups, the overlapping group Lasso has been proposed and corresponding algorithms developed [13,14]. The weighted group Lasso was proposed to enhance sparsity [15–18]. Combining the ℓ1 norm regularization for the standard sparse constraint and the sum of ℓ2 norms for the group constraint, the Hi-Lasso was proposed to further enforce a sparse distribution within each group [19–24]. There are many papers on the group Lasso; only a part of them is named here.

In this paper, internal structural information of the active groups is used to define a globally sparse and locally dense group signal model, which differs from the signal model of the Hi-Lasso. In the proposed signal model, the estimated signal is globally sparse and locally dense. This kind of signal model applies to many areas, such as wireless communication [18], array signal processing [17], regression [5] and bioinformatics [13]. A corresponding novel convex recovery model, called the globally sparse and locally dense group Lasso (SDGLasso), is developed by adding a new constraint to the signal recovery. This newly added constraint, in the form of total variation minimization, encourages a dense distribution in each group. Similar to the Lasso [25,26], the GLasso [8] and the Hi-Lasso [21], a coherence evaluation for the proposed SDGLasso is performed to obtain a group of sufficient conditions for successful recovery of the globally sparse and locally dense group signal. Numerical experiments show the performance improvement.

In the rest of this paper, the globally sparse and locally dense group signal model is formulated in Section 2. In Section 3, the corresponding recovery model, the SDGLasso, is proposed. Section 4 gives a class of sufficient conditions for the SDGLasso and their derivation. In Section 5, simulations demonstrate the performance improvement of the proposed method. Finally, Section 6 draws the conclusion.

2. Signal model

Consider an N-by-1 signal x that can be expanded in an N-by-N orthogonal complete dictionary Ψ, with the representation
\[ x = \Psi\theta \tag{1} \]
When most elements of the N-by-1 vector θ are zero or trivially small, the signal x is sparse. When the number of nonzero entries of θ is S (S ≪ N), the signal is said to be S-sparse. The signal model (1) is the standard sparse one. In many applications, practical signals enjoy some other common structural information. One example is the sparse group signal model [7]
\[ \theta^T = \big[\,\underbrace{\theta_1 \cdots \theta_d}_{\theta_1^T}\;\; \underbrace{\theta_{d+1} \cdots \theta_{2d}}_{\theta_2^T}\;\; \cdots\;\; \underbrace{\theta_{(K-1)d+1} \cdots \theta_N}_{\theta_K^T}\,\big] \tag{2} \]
where d is the length of each group. Without loss of generality, we can assume that N = Kd. Here most of the nonzero entries are accumulated in groups; that is, only a small part of the d-by-1 vectors θ_k, k = 1, 2, …, K, are active. We assume that there are at most K_0 active groups, and within each group we assume that no more than s entries are nonzero.

Based on the sparse group signal model (2), more detailed structural information within each group can be used. One common structure is that the nonzero entries within each group are densely distributed: if θ_k = [θ_{(k−1)d+1} θ_{(k−1)d+2} ⋯ θ_{kd}]^T is active, then most of its entries θ_{(k−1)d+i}, i = 1, 2, …, d, are nonzero at the same time. As there are only a few active groups and most of the entries in them are nonzero, i.e. the modeled signal is globally sparse but locally dense, it is named the globally sparse and locally dense group signal model.

In CS, instead of measuring the signal directly as in Nyquist sampling, an M-by-N random measurement matrix Φ is used to sample the signal. In matrix notation, the obtained M-by-1 random sample vector can be represented as
\[ y = \Phi x = \Phi\Psi\theta = A\theta \tag{3} \]
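For concreteness, the following is a minimal NumPy sketch of the signal and measurement model of Eqs. (2)–(3). The function name, the chosen sizes, the Gaussian sensing matrix and the choice Ψ = I are illustrative assumptions (the sizes mirror the settings later used in Section 5).

```python
import numpy as np

def make_sdg_signal(N=1000, d=10, K0=6, density=1.0, seed=0):
    """Globally sparse, locally dense group signal: K0 active groups of width d,
    with roughly `density` of the entries in each active group nonzero."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(N)
    for k in rng.choice(N // d, size=K0, replace=False):
        block = rng.standard_normal(d)
        zeros = rng.choice(d, size=int(round((1 - density) * d)), replace=False)
        block[zeros] = 0.0
        theta[k * d:(k + 1) * d] = block
    return theta / np.linalg.norm(theta)        # unit l2 norm

M, N = 100, 1000
theta = make_sdg_signal(N)
Phi = np.random.default_rng(1).standard_normal((M, N)) / np.sqrt(M)  # random sensing matrix
y = Phi @ theta                                  # Eq. (3) with A = Phi (taking Psi = I)
```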
3. The proposed signal recovery

When the restricted isometry property (RIP) holds, a series of recovery algorithms can reconstruct the sparse signal. One class is greedy algorithms, such as matching pursuit (MP) [27] and orthogonal matching pursuit (OMP) [28]; another class is convex programming, such as basis pursuit (BP), the Lasso and the Dantzig selector (DS) [29,30]. Both convex programming and greedy algorithms have advantages and disadvantages in different problem scenarios. A short assessment of their differences is that convex programming has higher reconstruction accuracy while greedy algorithms have lower computational complexity.

The standard Lasso finds the smallest ℓ1 norm of coefficients among all decompositions of the signal into a superposition of dictionary elements. It is a decomposition principle based on a true global optimization and can be formulated as
\[ \theta_{\mathrm{Lasso}} = \arg\min_{\theta}\big( \|y - A\theta\|_2 + \alpha\|\theta\|_1 \big) \tag{4} \]
where \( \|x\|_2 = \sqrt{|x_1|^2 + |x_2|^2 + \cdots + |x_N|^2} \), for an N-by-1 vector x = [x_1 x_2 ⋯ x_N]^T, is the ℓ2 norm, \( \|x\|_1 = |x_1| + |x_2| + \cdots + |x_N| \) is the ℓ1 norm, and α is the weighting factor between the least squares data fitting and the sparse constraint. Computing the Lasso solution is a quadratic programming problem, or more generally a convex optimization problem, and can be tackled by standard numerical algorithms. Its solution has been well investigated [2].

The standard Lasso is developed to recover the sparse signal (1); it only exploits the sparsity of the signal. To take advantage of structural information, the GLasso is proposed for the sparse group signal model (2), and it can be formulated as [6,7]
\[ \theta_{\mathrm{GLasso}} = \arg\min_{\theta}\Big( \|y - A\theta\|_2 + \beta \sum_{k=1}^{K} \|\theta_k\|_2 \Big) \tag{5} \]
where β is the weighting factor between the least squares data fitting and the sparse group constraint. Here \( \sum_{k=1}^{K} \|\theta_k\|_2 \) encourages distributions in which the nonzero entries are accumulated in a few clusters. The GLasso (5) is a second-order cone program and can be solved efficiently in polynomial time.
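To make Eq. (5) concrete, a minimal sketch of the GLasso objective in CVXPY follows; CVXPY itself, the helper name and the weighting factor are illustrative assumptions, not the solver used in the paper.

```python
import numpy as np
import cvxpy as cp

def group_lasso(A, y, d, beta):
    """Sketch of the GLasso of Eq. (5): least squares fit plus a sum of group l2 norms."""
    N = A.shape[1]
    theta = cp.Variable(N)
    group_norms = sum(cp.norm(theta[k * d:(k + 1) * d], 2) for k in range(N // d))
    objective = cp.norm(y - A @ theta, 2) + beta * group_norms
    cp.Problem(cp.Minimize(objective)).solve()
    return theta.value
```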
The GLasso trades sparsity at the single-coefficient level for sparsity at the group level, while inside each group no explicit structural constraint is enforced. Since it is assumed that the nonzero entries of each active group are densely distributed, and the entries of each inactive group are densely zero, the entries of all the groups are piecewise-densely distributed. Therefore, a new structural constraint based on total variation minimization can be added to the GLasso model (5) to formulate the SDGLasso as
\[ \theta_{\mathrm{SDGLasso}} = \arg\min_{\theta}\Big( \|y - A\theta\|_2 + \gamma_1 \sum_{k=1}^{K} \|\theta_k\|_2 + \gamma_2 \sum_{i=1}^{I} \|D_i\theta\|_1 \Big) \tag{6} \]
where γ1 and γ2 are the regularization parameters balancing the least squares data fitting, the sparse group constraint and the total variation minimization (TVM) [31]. The variation matrix can be either a forward or a backward differential one. For the i-th order matrices,
\[ D_{i,F} = \begin{bmatrix} 1 & -1 & 0 & \cdots & 0 & 0 & 0 \\ 0 & 1 & -1 & \cdots & 0 & 0 & 0 \\ \vdots & & \ddots & \ddots & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 1 & -1 \\ 0 & 0 & 0 & \cdots & 0 & 0 & 1 \end{bmatrix} \tag{7} \]
\[ D_{i,B} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\ -1 & 1 & 0 & \cdots & 0 & 0 & 0 \\ \vdots & \ddots & \ddots & \ddots & & & \vdots \\ 0 & 0 & 0 & \cdots & -1 & 1 & 0 \\ 0 & 0 & 0 & \cdots & 0 & -1 & 1 \end{bmatrix} \tag{8} \]
Here D_{i,F} and D_{i,B} are the i-th order forward and backward differential matrices, respectively; in them, 1 denotes a 1 × i row vector with all elements equal to 1, and −1 denotes a 1 × i row vector with all elements equal to −1. The globally sparse and locally dense group signals possess a locally dense structure, and this structural information can be exploited by the total variation regularization to encourage the corresponding distribution. In CS, with overwhelming probability, the number of measurements required for successful sparse vector recovery by convex programming is proportional to the sparsity S. The total variation operation obviously results in a sparser vector for globally sparse and locally dense group signals; thus the required number of measurements decreases and the performance improves. The SDGLasso model (6) is a convex program. It can be solved by standard software packages such as SDPT3 [32] and SeDuMi [33]. An efficient subgradient algorithm can also compute the optimal solution [19,34], and the SpaRSA algorithm can be applied to the SDGLasso model as well [35].
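As an illustration of Eq. (6), the sketch below extends the GLasso sketch above with the total variation term, using only the first-order forward difference matrix of Eq. (7). CVXPY, the helper names and the default solver are assumptions; parameter values are illustrative.

```python
import numpy as np
import cvxpy as cp

def forward_diff(N):
    """First-order forward difference matrix of Eq. (7) (minus ones on the superdiagonal)."""
    return np.eye(N) - np.diag(np.ones(N - 1), 1)

def sdg_lasso(A, y, d, gamma1, gamma2):
    """Sketch of the SDGLasso of Eq. (6) with a single (first-order) difference matrix."""
    N = A.shape[1]
    D = forward_diff(N)
    theta = cp.Variable(N)
    data_fit = cp.norm(y - A @ theta, 2)
    group_norms = sum(cp.norm(theta[k * d:(k + 1) * d], 2) for k in range(N // d))
    tv = cp.norm(D @ theta, 1)                      # total variation term
    cp.Problem(cp.Minimize(data_fit + gamma1 * group_norms + gamma2 * tv)).solve()
    return theta.value
```

With the measurements from the Section 2 sketch, `theta_hat = sdg_lasso(Phi, y, d=10, gamma1=2.0, gamma2=2.0)` would mirror the parameter settings reported in Section 5.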
4. Theoretical evaluation

In this section, theoretical guarantees are developed showing that the proposed SDGLasso model (6) recovers the true unknown globally sparse and locally dense group signal. As scaling
does not affect the optimal solution, Eq. (6) can be rewritten as
\[ \min_{\theta}\; \gamma \sum_{k=1}^{K} \|\theta_k\|_2 + (1-\gamma)\sum_{i=1}^{I} \|D_i\theta\|_1 \quad \text{s.t.} \quad y = A\theta \tag{9} \]
For convenience of derivation, only the first-order forward differential matrix is considered in this section; the index of the differential matrix is dropped and the N-by-N matrix D denotes the first-order differential matrix. Then Eq. (9) reduces to
\[ \min_{\theta}\; \gamma \sum_{k=1}^{K} \|\theta_k\|_2 + (1-\gamma)\|D\theta\|_1 \quad \text{s.t.} \quad y = A\theta \tag{10} \]
Here we can define
\[ B = A D^{\dagger} \tag{11} \]
\[ b = D\theta \tag{12} \]
and we assume that the number of nonzero entries in each group of b is no more than s_b. From the definition of D in Eq. (7), it is easy to see that D† is an N-by-N upper triangular matrix with all nonzero entries in the upper triangular area equal to one. Then the measurements can be rewritten as
\[ y = A\theta = Bb \tag{13} \]
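A small NumPy check of the claim about D†, under the assumption that D is the first-order forward difference matrix of Eq. (7):

```python
import numpy as np

N = 6
D = np.eye(N) - np.diag(np.ones(N - 1), 1)   # first-order forward difference, Eq. (7)
D_pinv = np.linalg.pinv(D)
print(np.round(D_pinv, 6))
# Prints an upper triangular matrix whose nonzero entries are all 1, so b = D @ theta
# can be mapped back via theta = D_pinv @ b, and B = A @ D_pinv as in Eqs. (11)-(13).
```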
Then Eq. (10) can be rewritten as
\[ \min_{\theta}\; \gamma \sum_{k=1}^{K} \|\theta_k\|_2 + (1-\gamma)\|b\|_1 \quad \text{s.t.} \quad y = A\theta,\; b = D\theta \tag{14} \]
This simplified model has a structure similar to the Hi-Lasso. Previous theory has given classes of sufficient conditions for successful signal recovery by the Lasso [25,26], the GLasso [8,7], and the Hi-Lasso [21]. Here a class of sufficient conditions for the SDGLasso can be obtained in a similar way. For the globally sparse and locally dense group signal θ, let θ_0 be the vector constituted by the entries in the active groups and A_0 the corresponding sub-dictionary; let θ_{S_0} be the vector whose entries are the nonzero entries of θ, and A_{S_0} the sub-dictionary corresponding to θ_{S_0}. Then we have
\[ y = A\theta = A_0\theta_0 = A_{S_0}\theta_{S_0} \tag{15} \]
Correspondingly, $\bar{A}_0$ is the sub-dictionary of A whose columns are not in $A_0$; the columns of $\bar{A}_{S_0}$ are the columns of $A_0$ that are not in $A_{S_0}$; and $\bar{A}$ is the matrix of all the columns of A not in $A_{S_0}$. Thus, with proper column index adjustment, $[\bar{A}_0 \; \bar{A}_{S_0}]$ equals $\bar{A}$. The differential matrix D does not affect the grouping of the sparse vector θ, so the corresponding $B_0$, $B_{S_0}$, $\bar{B}_{S_0}$, $\bar{B}_0$, and $\bar{B}$ can be obtained with the same column indexes as their counterparts in A; and $b_0$, $b_{S_0}$, $\bar{b}_{S_0}$ can be given with the same supports as their counterparts in θ. To achieve a unique sparse representation, the columns of $A_{S_0}$ must be linearly independent for any choice of $A_0$ and the support S. Then $(A_{S_0})^T A_{S_0}$ is invertible and the
pseudo-inverse of $A_{S_0}$ can be represented as
\[ (A_{S_0})^{\dagger} = \big((A_{S_0})^T A_{S_0}\big)^{-1}(A_{S_0})^T \tag{16} \]
Here the linear independence assumption is kept for the columns of $B_{S_0}$ too. The spectral norm ρ(Z) of a matrix Z can be defined in two ways as [36–38]
\[ \rho(Z) = \max_{x,y} |x^T Z y| \quad \text{s.t.} \; \|x\|_2 = 1,\ \|y\|_2 = 1 \tag{17} \]
and
\[ \rho(Z) = \sqrt{\lambda_{\max}(\Xi)} \tag{18} \]
where, with $\Xi = Z^T Z$,
\[ \lambda_{\max}(\Xi) = \max_{y} \; y^T \Xi y \quad \text{s.t.} \; \|y\|_2 = 1 \tag{19} \]
Based on the above assumptions, we can get a group of sufficient conditions for the SDGLasso (14):
\[ \rho_c\big((A_{S_0})^{\dagger}\bar{A}_0\big) < 1 \tag{20} \]
\[ \rho_c\big((A_{S_0})^{\dagger}\bar{A}_{S_0}\big) < 1 \tag{21} \]
\[ \big\|(B_{S_0})^{\dagger}\bar{B}\big\|_{1,1} < 1 \tag{22} \]
where
\[ \rho_c(Z) = \max_{r} \sum_{l} \rho(Z_{lr}) \tag{23} \]
In Eq. (20), $Z_{lr}$ is the (l,r)-th $s \times d$ block of Z; in Eq. (21), $Z_{lr}$ is the (l,r)-th $s \times (d-s)$ block of Z; and
\[ \|Z\|_{1,1} = \max_{r} \|z_r\|_1 \tag{24} \]
where $z_r$ is the r-th column of Z. The proof of the sufficient conditions (20)–(22) is given in Appendix A.

The standard coherence of the dictionary (measurement matrix) A is defined as
\[ \mu_A = \max_{i,\,j \neq i} |a_i^T a_j| \tag{25} \]
where $a_i$ and $a_j$, i, j = 1, 2, …, N, are the i-th and j-th atoms (column vectors) of the dictionary A; and the standard coherence of the dictionary B is
\[ \mu_B = \max_{i,\,j \neq i} |b_i^T b_j| \tag{26} \]
where $b_i$ and $b_j$, i, j = 1, 2, …, N, are the i-th and j-th atoms of the dictionary B. Similarly, the group coherence of the dictionary A is defined as
\[ \mu_G = \max_{i,\,j \neq i} \frac{1}{d}\,\rho\big(A_i^T A_j\big) \tag{27} \]
where $A_i$ and $A_j$, i, j = 1, 2, …, K, are the i-th and j-th sub-dictionaries of A corresponding to the sub-vectors $\theta_i$ and $\theta_j$.
To describe the local properties, the sub-coherence of A is defined as
\[ \nu_A = \max_{k = 1,2,\ldots,K}\; \max_{i,\,j \neq i} |a_i^T a_j|, \quad a_i, a_j \in A_k \tag{28} \]
and the sub-coherence of B as
\[ \nu_B = \max_{k = 1,2,\ldots,K}\; \max_{i,\,j \neq i} |b_i^T b_j|, \quad b_i, b_j \in B_k \tag{29} \]
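The coherence measures of Eqs. (25), (27) and (28) can be evaluated directly; the small NumPy sketch below is an illustration on a random dictionary (function names and sizes are my own choices).

```python
import numpy as np

def standard_coherence(A):
    """mu_A of Eq. (25): largest |inner product| between distinct (normalized) columns."""
    G = np.abs(A.T @ A)
    np.fill_diagonal(G, 0.0)
    return G.max()

def group_coherence(A, d):
    """mu_G of Eq. (27): (1/d) * largest spectral norm of an off-diagonal block A_i^T A_j."""
    K = A.shape[1] // d
    best = 0.0
    for i in range(K):
        for j in range(K):
            if i != j:
                block = A[:, i * d:(i + 1) * d].T @ A[:, j * d:(j + 1) * d]
                best = max(best, np.linalg.norm(block, 2))   # spectral norm, Eq. (18)
    return best / d

def sub_coherence(A, d):
    """nu_A of Eq. (28): largest |inner product| between distinct columns within a group."""
    best = 0.0
    for k in range(A.shape[1] // d):
        G = np.abs(A[:, k * d:(k + 1) * d].T @ A[:, k * d:(k + 1) * d])
        np.fill_diagonal(G, 0.0)
        best = max(best, G.max())
    return best

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 60))
A /= np.linalg.norm(A, axis=0)                  # normalize the atoms
print(standard_coherence(A), group_coherence(A, d=5), sub_coherence(A, d=5))
```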
When d = 1, $\nu_A$ and $\nu_B$ are defined to be zero. If all the columns of $A_k$ are orthogonal within each group, then $\nu_A = 0$. If all the atoms are normalized, then $\mu_A$, $\mu_G$ and $\nu_A$ all lie in [0, 1], with $\nu_A \le \mu_A$ and $\mu_G \le \mu_A$. The same properties hold for $\mu_B$ and $\nu_B$. Let $M = A^T A$ be the Gram matrix of inner products of the atoms of A; the group coherence $\mu_G$ can then be represented in terms of the largest singular value of an off-diagonal sub-block of M. Similar to the definitions of the spectral norm in (17) and (18), the sparse spectral norm can be defined as
\[ \rho^{SS}(Z) = \max_{x,y} |x^T Z y| \quad \text{s.t.} \; \|x\|_2 = 1,\ \|y\|_2 = 1,\ \|x\|_0 \le s,\ \|y\|_0 \le s \tag{30} \]
The largest sparse singular value of $\Xi = Z^T Z$ can be defined as [37,38]
\[ \lambda^{S}_{\max}(\Xi) = \max_{y} \; y^T \Xi y \quad \text{s.t.} \; \|y\|_2 = 1,\ \|y\|_0 \le s \tag{31} \]
Then the sparse matrix norm is given by
\[ \rho^{S}(Z) = \sqrt{\lambda^{S}_{\max}(Z^T Z)} \tag{32} \]
It is easy to see that
\[ \rho^{SS}(Z) \le \rho^{S}(Z) \tag{33} \]
For any matrix Z,
\[ \rho^{SS}(Z) = \rho\big(I_S^T Z I_S\big) \tag{34} \]
\[ \rho^{S}(Z) = \rho\big(Z I_S\big) \tag{35} \]
where $I_S$ is a selection matrix with s columns, each column containing a single entry equal to one, and the locations of the ones are chosen to maximize the corresponding singular value. With the definitions of the largest sparse singular value and the sparse matrix norm, two kinds of sparse group coherence can be given as
\[ \mu_G^{SS} = \max_{i,\,j \neq i} \frac{1}{d}\,\rho^{SS}\big(A_i^T A_j\big) \tag{36} \]
\[ \mu_G^{S} = \max_{i,\,j \neq i} \frac{1}{d}\,\rho^{S}\big(A_i^T A_j\big) \tag{37} \]
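For very small dictionaries, the sparse norms of Eqs. (32) and (34) and the sparse group coherences of Eqs. (36)–(37) can be evaluated by brute force over all size-s supports; the sketch below is only such an illustrative check (sizes and names are my own), not a practical algorithm.

```python
import numpy as np
from itertools import combinations

def sparse_matrix_norm(Z, s):
    """Brute-force rho^S(Z) of Eq. (32): largest spectral norm over all s-column restrictions."""
    return max(np.linalg.norm(Z[:, list(c)], 2) for c in combinations(range(Z.shape[1]), s))

def sparse_spectral_norm(Z, s):
    """Brute-force rho^SS(Z) of Eq. (34): largest spectral norm over all s x s submatrices."""
    m, n = Z.shape
    return max(np.linalg.norm(Z[np.ix_(rows, cols)], 2)
               for rows in combinations(range(m), s)
               for cols in combinations(range(n), s))

rng = np.random.default_rng(0)
d, s = 3, 2
A = rng.standard_normal((8, 9))
A /= np.linalg.norm(A, axis=0)                   # 3 groups of width d, normalized atoms
blocks = [A[:, k * d:(k + 1) * d] for k in range(A.shape[1] // d)]
mu_SS_G = max(sparse_spectral_norm(Bi.T @ Bj, s)
              for i, Bi in enumerate(blocks) for j, Bj in enumerate(blocks) if i != j) / d
mu_S_G = max(sparse_matrix_norm(Bi.T @ Bj, s)
             for i, Bi in enumerate(blocks) for j, Bj in enumerate(blocks) if i != j) / d
assert mu_SS_G <= mu_S_G + 1e-9                  # consistent with Eq. (33)
print(mu_SS_G, mu_S_G)
```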
Obviously $\mu_G^{SS} \le \mu_G$ and $\mu_G^{S} \le \mu_G$. Besides, to quantify the coherence between the active part and the passive part of each block, a new sub-coherence can be defined as
\[ \nu^{S} = \max_{i} \frac{1}{d}\,\rho\big(I_S^T A_i^T \bar{A}_i\big) \tag{38} \]
where $\bar{A}_i$ consists of the $d-s$ columns of $A_i$ that are not selected by $I_S$. With the above definitions, it can be obtained that [8]
\[ 0 \le \mu_G^{SS} \le \frac{s}{d}\,\mu_A \tag{39} \]
\[ 0 \le \mu_G^{S} \le \sqrt{\frac{s}{d}}\,\mu_A \tag{40} \]
\[ 0 \le \nu^{S} \le \frac{\sqrt{s(d-s)}}{d}\,\nu_A \tag{41} \]
The proof of Eqs. (39)–(41) is given in Appendix B.

Let $\mu_G^{SS}$, $\mu_G^{S}$ and $\nu^{S}$ be the sparse group-coherence measures defined in Eqs. (36)–(38), respectively, and let $\nu_A$ and $\nu_B$ be the sub-coherences of the dictionaries A and B defined by Eqs. (28) and (29), respectively. The sufficient conditions (20)–(22) are satisfied if
\[ \frac{K_0\, d\, \mu_G^{S}}{1 - (s-1)\nu_A - (K_0-1)\, d\, \mu_G^{SS}} < 1 \tag{42} \]
Fig. 1. The signals recovered by the standard Lasso, the GLasso and the SDGLasso, with the number of measurements M = 100 and all the elements in the active groups nonzero (panels: True, Lasso, GLasso, SDGLasso).
\[ \frac{K_0\, d\, \nu^{S}}{1 - (s-1)\nu_A - (K_0-1)\, d\, \mu_G^{SS}} < 1 \tag{43} \]
\[ \frac{K_0\, s_b\, \nu^{S}}{1 - (s_b-1)\nu_B - (K_0-1)\, d\, \mu_B} < 1 \tag{44} \]
The proof of the sufficient conditions (42)–(44) is given in Appendix C.

5. Simulation

In this section, numerical experiments are used to show the performance of the proposed SDGLasso. A globally sparse and locally dense group signal is generated. The length of the signal is N = 1000 and the group width is d = 10; thus K = N/d = 100. The number of randomly chosen active groups is 6. The synthetic sparse signal is normalized to have unit ℓ2 norm. α and β are set to 4, while γ1 and γ2 are set to 2. The random sampling matrix is formed by drawing i.i.d. entries from a white Gaussian distribution. The standard Lasso, the GLasso and the SDGLasso are used to recover the globally sparse and locally dense group signal. The first three differential matrices are used in the SDGLasso. Fig. 1 shows the recovered signals with the number of measurements M = 100 when all of the entries in the active groups are nonzero, and Fig. 2 shows the recovered signals with M = 100 when 80% of the entries in the active groups are nonzero.
Fig. 2. The signals recovered by the standard Lasso, the GLasso and the SDGLasso, with the number of measurements M = 100 and 80% of the elements in the active groups nonzero (panels: True, Lasso, GLasso, SDGLasso).

Fig. 3. The MSE performance of the recovered signals from M = 20 to M = 230 with all of the elements in the active groups nonzero (curves: standard Lasso, GLasso, SDGLasso).

Fig. 4. The MSE performance of the recovered signals from M = 20 to M = 230 with 80% of the elements in the active groups nonzero (curves: standard Lasso, GLasso, SDGLasso).
These figures show that the signals recovered by the SDGLasso are better than those recovered by the other two methods, in that the active groups recovered by the SDGLasso clearly stand out and the noise is better suppressed. Besides, from the analysis of the added total variation minimization and the simulation results, the denser the active groups, the larger the performance gain the SDGLasso achieves.

Here L = 1000 Monte Carlo simulations are used to further demonstrate the performance improvement. The mean square error (MSE) of each sparse signal recovery method is defined as
\[ \mathrm{MSE} = \frac{1}{L}\sum_{l=1}^{L} \big\|\hat{\theta}_l - \theta_l\big\|_2^2 \tag{45} \]
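A sketch of the Monte Carlo evaluation behind Eq. (45) and Figs. 3–4 follows. It reuses `make_sdg_signal` from the Section 2 sketch, and `recover` is a placeholder for any of the three estimators (Lasso, GLasso, SDGLasso); the default values follow the settings stated above and are otherwise my own assumptions.

```python
import numpy as np

def monte_carlo_mse(recover, L=1000, M=100, N=1000, d=10, density=1.0, seed=0):
    """Empirical MSE of Eq. (45): average squared l2 error over L random trials.
    `recover(A, y)` should return an estimate of theta; `make_sdg_signal` is the
    signal generator sketched in Section 2."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(L):
        theta = make_sdg_signal(N, d=d, density=density, seed=int(rng.integers(1 << 31)))
        A = rng.standard_normal((M, N)) / np.sqrt(M)
        theta_hat = recover(A, A @ theta)
        errors.append(np.sum((theta_hat - theta) ** 2))
    return float(np.mean(errors))

# e.g. monte_carlo_mse(lambda A, y: sdg_lasso(A, y, d=10, gamma1=2.0, gamma2=2.0), L=50)
```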
where $\theta_l$ and $\hat{\theta}_l$ are the true and estimated sparse vectors, respectively, for the l-th simulation. Figs. 3 and 4 show the MSE performance of the recovered signals from M = 20 to M = 230 when all and when 80% of the entries in the active groups are nonzero, respectively. It is obvious that the SDGLasso outperforms the other two methods. Moreover, as the density of the nonzero entries increases, the SDGLasso's performance gain over the Lasso and the GLasso becomes more remarkable.

6. Conclusion

In this paper, the structural characteristic that most entries of the active groups are nonzero is incorporated into the sparse group signal model to obtain the globally sparse and locally dense group signal model. To exploit this structural information and improve the estimation performance, a total variation minimization constraint is added to the GLasso model. Since the computational complexity of evaluating the newly added constraint is trivial in comparison with the overall computational complexity of solving the GLasso [39], the computational time of the method is approximately the same as that of the classical ones, if the same algorithm is used to solve these convex optimization models. To theoretically guarantee successful recovery from compressive measurements by the proposed SDGLasso, a class of sufficient conditions is given based on coherence analysis. Numerical simulations demonstrate that the proposed SDGLasso outperforms the standard Lasso and the GLasso in terms of estimation accuracy.

Future work can apply this technique to several application scenarios. In wireless communication, cognitive radio needs to detect unused spectrum holes in a wide spectrum range for dynamic spectrum access [40,18]. Compressive sensing can be used to reduce the sampling frequency, and the proposed SDGLasso can be used to recover the spectrum, since many of the detected bands are allocated to wideband communications. In compressive sensing for radar applications [41], when the detected targets are not far from the receiver, they cannot be modeled as separate dots but rather as several groups of dense dots on a discrete grid; in such a scenario it is natural to prefer the proposed SDGLasso over the classical methods. Other possible applications include the multiple measurement vectors (MMV) problem, where multi-channel signals share almost the same supports [42], and climate prediction, where the places in a small area have almost the same climate [43].

Acknowledgment

Yipeng Liu is supported by FWO PhD/postdoc Grant G.0108.11 (compressed sensing).

Appendix A

Assume that there is another N-by-1 vector $\tilde{\theta}$ which satisfies
\[ y = A\tilde{\theta} \tag{A.1} \]
Similar to the definition of $\theta_0$, $\tilde{\theta}_0$ consists of the entries from the active groups of $\tilde{\theta}$, and $\tilde{A}_0$ is the sub-dictionary corresponding to $\tilde{\theta}_0$. Thus we have
\[ y = A_0\theta_0 = \tilde{A}_0\tilde{\theta}_0 \tag{A.2} \]
As $A_{S_0}$ has all its columns linearly independent, we can get
\[ \theta_{S_0} = (A_{S_0})^{\dagger} A_{S_0}\theta_{S_0} = Q A_{S_0}\theta_{S_0} \tag{A.3} \]
where
\[ Q = (A_{S_0})^{\dagger} \tag{A.4} \]
Combining this with the fact that
\[ A_{S_0}\theta_{S_0} = A_0\theta_0 = \tilde{A}_0\tilde{\theta}_0 \tag{A.5} \]
we can get
\[ \theta_{S_0} = Q\tilde{A}\tilde{\theta} \tag{A.6} \]
With appropriate permutations of the blocks, $\tilde{A}$ can be represented as
\[ \tilde{A} = [P \; R] \tag{A.7} \]
where the blocks of P are also contained in $A_0$ and the remaining blocks constitute R. Corresponding to $\tilde{A}$, the permuted $\tilde{\theta}$ can be partitioned into p and r. As p lies in the active groups, it can be further divided into
\[ p = \bar{p}^S + p^S \tag{A.8} \]
where $p^S$ indicates the values in p corresponding to columns in $A_{S_0}$, and $\bar{p}^S$ corresponds to the remaining columns. The matrix P is divided in the same way to get $P^S$ and $\bar{P}^S$; thus we can get
\[ Q P^S p^S = \Pi p^S \tag{A.9} \]
where Π is a $(K_0 s) \times (rs)$ matrix consisting of blocks of size $s \times s$ that are either equal to the identity or to zero, and r is the number of groups in $P^S$. Combining Eqs. (A.6) and (A.9) results in
\[ \theta_{S_0} = Q\tilde{A}\tilde{\theta} = Q(Pp + Rr) = Q P^S p^S + Q\bar{P}^S\bar{p}^S + QRr = \Pi p^S + Q\bar{P}^S\bar{p}^S + QRr \tag{A.10} \]
Therefore, for the sparse group regularization term $g(\theta) = \sum_{k=1}^{K}\|\theta_k\|_2$, it holds that
\[ g(\theta) = g(\theta_0) = g(\theta_{S_0}) \le g(p^S) + g(Q\bar{P}^S\bar{p}^S) + g(QRr) \tag{A.11} \]
Since $\bar{P}^S$ is contained in $\bar{A}_{S_0}$ and R is contained in $\bar{A}_0$, when Eqs. (20) and (21) hold we can get
\[ \rho_c(Q\bar{P}^S) < 1 \tag{A.12} \]
\[ \rho_c(QR) < 1 \tag{A.13} \]
In [8], it has been proved that for any matrix Z of appropriate dimensions and any vector v with $\|v\|_2 > 0$,
\[ g(Zv) \le \rho_c(Z)\,g(v) \tag{A.14} \]
Combining Eqs. (A.12)–(A.14), we can get
\[ g(\theta) < g(p^S) + g(\bar{p}^S) + g(r) = g(\tilde{\theta}_0) = g(\tilde{\theta}) \tag{A.15} \]
For the TVM term, we can similarly get that
\[ \|b_0\|_1 < \|\tilde{b}_0\|_1 \tag{A.16} \]
where $\tilde{b}_0 = D\tilde{\theta}_0$ is constituted by all the entries in the active groups. In [25], it has been proved that
\[ \|\theta_0\|_1 < \|Q\tilde{A}\|_{1,1}\,\|\tilde{\theta}\|_1 \tag{A.17} \]
Similarly we can get
\[ \|b_0\|_1 < \|Q_B\tilde{B}\|_{1,1}\,\|\tilde{b}\|_1 \tag{A.18} \]
where
\[ Q_B = (B_{S_0})^{\dagger} \tag{A.19} \]
\[ \tilde{B} = \tilde{A}D^{\dagger} \tag{A.20} \]
Eq. (A.18) holds on the condition that at least one column of $\tilde{B}$ is not in $B_{S_0}$, which is true because the columns of $B_{S_0}$ are linearly independent. Considering the condition (22), we can get
\[ \|(B_{S_0})^{\dagger}b_r\|_1 < 1 \tag{A.21} \]
for every column $b_r$ of $\bar{B}$. Otherwise, if $b_r$ is a column of $B_{S_0}$, then $(B_{S_0})^{\dagger}b_r$ would be a vector whose only nonzero entry is in the location corresponding to $b_r$, so that
\[ \|(B_{S_0})^{\dagger}b_r\|_1 = 1 \tag{A.22} \]
Thus from Eqs. (A.21) and (A.22), we can get
\[ \|(B_{S_0})^{\dagger}b_r\|_1 \le 1 \tag{A.23} \]
Combining Eqs. (A.18) and (A.23), we can conclude that
\[ \|b\|_1 = \|b_0\|_1 < \|\tilde{b}\|_1 \tag{A.24} \]
Combining Eqs. (A.15) and (A.24), we can conclude that
\[ \gamma\sum_{k=1}^{K}\|\theta_k\|_2 + (1-\gamma)\|D\theta\|_1 \le \gamma\sum_{k=1}^{K}\|\tilde{\theta}_k\|_2 + (1-\gamma)\|D\tilde{\theta}\|_1 \tag{A.25} \]
Thus $\theta = D^{\dagger}b$ achieves the minimal objective among all vectors consistent with $y = A\theta$.

Appendix B

Using the Gershgorin disc theorem, we can get
\[ \rho^{SS}(Z) = \lambda_{\max}^{1/2}\big(I_S^T Z^T I_S I_S^T Z I_S\big) \le \sqrt{\max_{l}\sum_{r=1}^{s}|e_{lr}|} \le \sqrt{s\,\max_{l,r}|e_{lr}|} \tag{B.1} \]
\[ \rho^{S}(Z) = \lambda_{\max}^{1/2}\big(I_S^T Z^T Z I_S\big) \le \sqrt{\max_{l}\sum_{r=1}^{s}|h_{lr}|} \le \sqrt{s\,\max_{l,r}|h_{lr}|} \tag{B.2} \]
where $e_{lr}$ and $h_{lr}$ are the entries of
\[ E = M_{ij}^T I_S I_S^T M_{ij} \tag{B.3} \]
and
\[ H = M_{ij}^T M_{ij} \tag{B.4} \]
respectively. The entries of the $d \times d$ matrix $M_{ij} = A_i^T A_j$ for $i \neq j$ have absolute values not larger than $\mu_A$. Therefore
\[ |e_{lr}| \le s\,\mu_A^2 \tag{B.5} \]
and
\[ |h_{lr}| \le d\,\mu_A^2 \tag{B.6} \]
Thus Eqs. (39)–(41) hold.

Appendix C

It has been proved in [8] that
\[ \rho_c(XY) \le \rho_c(X)\,\rho_c(Y) \tag{C.1} \]
for all real $N \times N$ matrices X and Y. Thus,
\[ \rho_c\big(Q\bar{A}_0\big) \le \rho_c\big(((A_{S_0})^T A_{S_0})^{-1}\big)\,\rho_c\big((A_{S_0})^T\bar{A}_0\big) \tag{C.2} \]
By definition,
\[ \rho_c\big((A_{S_0})^T\bar{A}_0\big) = \max_{j \notin \Lambda_0}\sum_{i \in \Lambda_0}\rho\big(I_S^T A_i^T A_j\big) \tag{C.3} \]
where $\Lambda_0$ is the set of indices l for which $A_l$ is in $A_0$. As every term in the sum is bounded above by $d\mu_G^{S}$ and $\Lambda_0$ contains $K_0$ indices, we can get
\[ \rho_c\big(Q\bar{A}_0\big) \le \rho_c\big(((A_{S_0})^T A_{S_0})^{-1}\big)\,K_0\,d\,\mu_G^{S} \tag{C.4} \]
Assuming the columns of A are normalized, we can write
\[ (A_{S_0})^T A_{S_0} = I_{(K_0 s)\times(K_0 s)} + W \tag{C.5} \]
where W is a $(K_0 s)\times(K_0 s)$ matrix composed of $s \times s$ blocks $W_{l,q}$ with $W_{l,q}[i,i] = 0$ for all i, and $I_{(K_0 s)\times(K_0 s)}$ is the $(K_0 s)\times(K_0 s)$ identity matrix, because
\[ W_{l,q} = [A_{S_0}]_l^T [A_{S_0}]_q \tag{C.6} \]
for all $l \neq q$ and
\[ W_{l,l} = [A_{S_0}]_l^T [A_{S_0}]_l - I_{s\times s} \tag{C.7} \]
We have
\[ \rho_c(W) = \max_{q}\sum_{l}\rho(W_{l,q}) \le \max_{q}\rho(W_{q,q}) + \max_{q}\sum_{l\neq q}\rho(W_{l,q}) \le (s-1)\nu_A + (K_0-1)\,d\,\mu_G^{SS} \tag{C.8} \]
From the sufficient conditions (42) and (43), it is easy to get
\[ (s-1)\nu_A + (K_0-1)\,d\,\mu_G^{SS} < 1 \tag{C.9} \]
Thus we get
\[ \rho_c(W) < 1 \tag{C.10} \]
On condition that Eq. (C.10) holds, the result from [8] gives
\[ (I + W)^{-1} = \sum_{k=0}^{\infty}(-W)^k \tag{C.11} \]
Combining Eqs. (C.2) and (C.11), we get
\[ \rho_c\big(((A_{S_0})^T A_{S_0})^{-1}\big) = \rho_c\Big(\sum_{k=0}^{\infty}(-W)^k\Big) \le \sum_{k=0}^{\infty}\big(\rho_c(W)\big)^k = \frac{1}{1-\rho_c(W)} \le \frac{1}{1-(s-1)\nu_A-(K_0-1)\,d\,\mu_G^{SS}} \tag{C.12} \]
Combining Eqs. (C.4) and (C.12), we obtain
\[ \rho_c\big(Q\bar{A}_0\big) \le \frac{K_0\,d\,\mu_G^{S}}{1-(s-1)\nu_A-(K_0-1)\,d\,\mu_G^{SS}} \tag{C.13} \]
Therefore we get the sufficient condition (42). Eqs. (43) and (44) can be obtained in the same way. Hence Eqs. (42)–(44) form a class of sufficient conditions for successful recovery of the globally sparse and locally dense group signal via Eq. (10). Based on the above analysis, it is easy to further see that these conditions are also sufficient for recovery via Eq. (6).
References

[1] R. Tibshirani, Regression shrinkage and selection via the lasso, J. R. Stat. Soc.: Ser. B 58 (1996) 267–288.
[2] S. Chen, D. Donoho, M. Saunders, Atomic decomposition by basis pursuit, SIAM J. Sci. Comput. 20 (1999) 33–61.
[3] R. Baraniuk, V. Cevher, M. Duarte, C. Hegde, Model-based compressive sensing, IEEE Trans. Inf. Theory (2010) 982–2001. [4] P. Bühlmann, S. van de Geer, The group Lasso, Statistics for High-Dimensional Data, Springer Berlin Heidelberg, Gemany, 2011, pp. 55–76. [5] M. Yuan, Y. Lin, Model selection and estimation in regression with grouped variables, J. R. Stat. Soc.: Ser. B (Stat. Methodol.) 68 (2006) 49–67. [6] L. Meier, S.V.D. Geer, P. Bühlmann, The group lasso for logistic regression, J. R. Stat. Soc.: Ser. B 70 (2008) 53–71. [7] M. Stojnic, F. Parvaresh, B. Hassibi, On the reconstruction of block-sparse signals with an optimal number of measurements, IEEE Trans. Signal Process. 57 (2010) 3075–3085. [8] Y.C. Eldar, P. Kuppinger, H. Bolcskei, Block-sparse signals: uncertainty relations and efficient recovery, IEEE Trans. Signal Process. 58 (2010) 3042–3054. [9] E. Elhamifar, R. Vidal, Block-sparse recovery via convex optimization, IEEE Trans. Signal Process. 60 (2012) 4094–4107. [10] Y. Nardi, A. Rinaldo, On the asymptotic properties of the group lasso estimator for linear models, Electron. J. Stat. 2 (2008) 605–633. [11] H. Liu, J. Zhang, On the estimation consistency of the group lasso and its applications, in: Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS), Clearwater Beach, Florida, USA. [12] M. Stojnic, l_ {2}/l_{1}-optimization in block-sparse compressed sensing and its strong thresholds, IEEE J. Sel. Top. Signal Process. 4 (2010) 350–357. [13] L. Jacob, G. Obozinski, J.-P. Vert, Group lasso with overlap and graph lasso, in: Proceedings of the 26th Annual International Conference on Machine Learning, ACM, pp. 433–440. [14] L. Yuan, J. Liu, J. Ye, Efficient methods for overlapping group lasso, IEEE Trans. Pattern Anal. Mach. Intell. 35 (2013) 2104–2116. [15] K. Hirose, S. Konishi, Variable selection via the weighted group lasso for factor analysis models, Can. J. Stat. 40 (2012) 345–361. [16] X. Zhu, Z. Huang, H.T. Shen, Video-to-shot tag allocation by weighted sparse group lasso, in: Proceedings of the 19th ACM International Conference on Multimedia, ACM, pp. 1501–1504. [17] Y. Liu, Q. Wan, Robust beamformer based on total variation minimization and sparse constraint, Electron. Lett. 46 (2010) 1697–1699. [18] Y. Liu, Q. Wan, Enhanced compressive wideband frequency spectrum sensing for dynamic spectrum access, EURASIP J. Adv. Signal Process. 2012 (2012) 1–11. [19] J. Friedman, T. Hastie, R. Tibshirani, A Note on the Group Lasso and a Sparse Group Lasso, 2010. [20] P. Sprechmann, I. Ramirez, G. Sapiro, Y. Eldar, Collaborative hierarchical sparse modeling, in: 44th Annual Conference on Information Sciences and Systems (CISS 2010), Princeton, NJ, pp. 1–6. [21] P. Sprechmann, I. Ramirez, G. Sapiro, Y. Eldar, C-Hilasso: a collaborative hierarchical sparse modeling framework, IEEE Trans. Signal Process. 59 (2011) 4183–4198. [22] N. Simon, J. Friedman, T. Hastie, R. Tibshirani, A sparse-group lasso, J. Comput. Graph. Stat. 22 (2013) 231–245. [23] M. Lim, T. Hastie, Learning Interactions Through Hierarchical Group-Lasso Regularization, 2013, arXiv preprint arXiv:1308.2719. [24] J. Friedman, T. Hastie, R. Tibshirani, A Note on the Group Lasso and a Sparse Group Lasso, 2010, arXiv preprint arXiv:1001.0736. [25] J. Tropp, Greed is good: algorithmic results for sparse approximation, IEEE Trans. Inf. Theory 50 (2004) 2231–2242. [26] D.L. Donoho, M. Elad, Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization, Proc. 
Natl. Acad. Sci. 100 (2003) 2197–2202. [27] S. Mallat, Z. Zhang, Matching pursuits and time-frequency dictionaries, IEEE Trans. Signal Process. 41 (2003) 3397–3415. [28] Y.C. Pati, R. Rezaifar, P.S. Krishnaprasad, Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition, in: 27th Asilomar Conference on Signals, Systems, Computer, USA, pp. 40–44. [29] E. Candes, T. Tao, The dantzig selector: statistical estimation when p is much larger than n, Ann. Stat. 35 (2007) 2313–2351. [30] M. Asif, J. Romberg, On the lasso and dantzig selector equivalence, in: 44th Annual Conference on Information Sciences and Systems (CISS 2010), Princeton, NJ, pp. 1–6. [31] A. Chambolle, P.-L. Lions, Image recovery via total variation minimization and related problems, Numer. Math. 76 (1997) 167–188.
[32] M. Grant, S. Boyd, Y. Ye, Cvx: Matlab Software for Disciplined Convex Programming, 2010. [33] J. Sturm, Using sedumi 1.02, a matlab toolbox for optimization over symmetric cones, Optim. Methods Softw. 11 (1999) 625–653. [34] M. Yuan, Y. Lin, Model selection and estimation in regression with grouped variables, J. R. Stat. Soc., Ser. B 68 (2007) 49–67. [35] S. Wright, R. Nowak, M. Figueiredo, Sparse reconstruction by separable approximation, IEEE Trans. Signal Process. 57 (2009) 2479–2493. [36] R.A. Horn, C.R. Johnson, Matrix Analysis, Cambridge Press, New York, NY, 1985. [37] H. Zou, T. Hastie, R. Tibshirani, Sparse principal component analysis, J. Comput. Graph. Stat. 15 (2006) 265–286. [38] B. Moghaddam, Y. Weiss, S. Avidan, Spectral bounds for sparse PCA: exact and greedy algorithms, in: Advances in Neural Information Processing Systems, pp. 915–922. [39] Y. Liu, M. De Vos, I. Gligorijevic, V. Matic, Y. Li, S. Van Huffel, Multi-structural signal recovery for biomedical compressive sensing, IEEE Trans. Biomed. Eng. 60 (2013) 2794–2805. [40] Y. Liu, Q. Wan, Compressive Wideband Spectrum Sensing for Fixed Frequency Spectrum Allocation, 2010, arXiv preprint arXiv:1005.1804. [41] J.H. Ender, On compressive sensing applied to radar, Signal Process. 90 (2010) 1402–1414. [42] S.F. Cotter, B.D. Rao, K. Engan, K. Kreutz-Delgado, Sparse solutions to linear inverse problems with multiple measurement vectors, IEEE Trans. Signal Process. 53 (2005) 2477–2488. [43] S. Chatterjee, K. Steinhaeuser, A. Banerjee, S. Chatterjee, A.R. Ganguly, Sparse group lasso: consistency and climate applications, SDM J., 2012, 47–58.