Research Highlights
• The popular cosparse analysis model is discussed.
• A new alternative to the l_0-norm, based on the cosparsity inducing function, is presented.
• A new constrained optimization model and a family of subgradient algorithms for the cosparse recovery problem are given.
• The convergence analysis of the proposed algorithm is given.
• Simulations on recovering an unknown signal indicate its better performance.
Pattern Recognition Letters
journal homepage: www.elsevier.com

A family of the subgradient algorithm with several cosparsity inducing functions to the cosparse recovery problem

Guinan Wang^a, Hongjuan Zhang^{a,*}, Shiwei Yu^a, Shuxue Ding^{b,**}

^a Department of Mathematics, Shanghai University, Shanghai, 200444, P. R. China
^b Department of Computer Software, The University of Aizu, Tsuruga, Ikki-Machi, Aizu-Wakamatsu City, Fukushima, 965-8580, Japan

* Corresponding author. Email address: [email protected]
** Corresponding author. Email address: [email protected]

ABSTRACT

In the past decade there has been great interest in the sparse synthesis model for signals, and researchers have obtained a series of results on sparse representation. The cosparse analysis model, the counterpart of the sparse synthesis model, has drawn much attention in recent years, and many approaches have been proposed to solve it. Conventionally, these methods relax the l_0-norm to the l_1-norm or l_2-norm to represent the cosparsity of a signal, and reasonable algorithms have been developed from these relaxations. This work presents a new alternative to the l_0-norm based on the cosparsity inducing function, which is closer to the l_0-norm than the l_1-norm and l_2-norm. Based on this function, we first construct the objective function and give a constrained optimization model of the cosparse recovery problem. We then propose a subgradient algorithm, the Cosparsity Inducing Function (CIF) algorithm, which is a two-layer optimization algorithm. Specifically, after converting the constrained optimization problem into an unconstrained one, we obtain a temporary optimal variable, in which the cosparsity inducing function is approximated by its local linear approximation in order to avoid its nonconvexity. Second, a new cosupport is obtained by projecting the temporary optimal variable onto the cosparse subspace and keeping the l smallest elements. The desired signal is then estimated with a conjugate gradient algorithm on the new cosupport. Moreover, we provide a theoretical analysis of the CIF Algorithm. Finally, simulations on recovering an unknown signal in the cosparse analysis model indicate its better performance.

Keywords: cosparse analysis model, cosparsity inducing function, local linear approximation

© 2016 Elsevier Ltd. All rights reserved.

1. Introduction
In recent years, signal models have drawn much attention and have been used successfully for a variety of signal processing tasks such as denoising, deblurring and compressed sensing. To obtain a sparse representation of a signal, researchers are interested in the following problem: the signal of interest x ∈ R^d is observable only through a set of linear measurements y ∈ R^m and an observation matrix M ∈ R^{m×d} (m < d),

    y = Mx + e,                                                                (1)
where e is additive noise satisfying ‖e‖_2 ≤ ε. When e = 0, equation (1) becomes y = Mx, the noiseless case. The aim of solving problem (1) is to recover or approximate x from y. Since this is a linear system with more unknowns than equations, i.e., underdetermined, this is in general impossible. If x is known a priori to be sparse, however, the problem has been shown to be solvable. So far, there are two important signal models for solving problem (1): the sparse synthesis model and the cosparse analysis model. In the sparse synthesis model, the main optimization problem for the sparse representation is x̂ = Dα̂ with

    α̂ = argmin_{α ∈ R^n} ‖α‖_0   subject to   ‖y − MDα‖_2 ≤ ε,                 (2)

where the signal x ∈ R^d is assumed to be composed of a linear combination of a few atoms from a given dictionary D ∈ R^{d×n}, which is overcomplete, i.e., n > d, so that x = Dα. The vector α ∈ R^n is the sparse representation of x; that is, α contains few nonzero elements, and the sparsity k is the number of nonzero elements in α, i.e., ‖α‖_0 = k ≪ d [1, 2, 3]. In the cosparse analysis model, one often considers the following optimization problem to recover x from y:

    x̂ = argmin_{x ∈ R^d} ‖Ωx‖_0   subject to   ‖y − Mx‖_2 ≤ ε,                 (3)

where Ω ∈ R^{p×d} (p > d) is a fixed analysis operator. The goal of problem (3) is to make the cosparse representation vector Ωx sparse; in other words, Ωx contains many zeros. The cosparsity l is the number of zeros in Ωx, i.e., l = p − ‖Ωx‖_0 (0 ≤ l ≤ d), where ‖Ωx‖_0 ≥ p − d [4]. There are connections and differences between the sparse synthesis model and the cosparse analysis model, which have been studied in the literature [2, 5, 6]. In particular, the two models become equivalent when D is a square invertible matrix, i.e., D = Ω^{-1}; more generally, for a full-column-rank Ω one may take D = Ω^+ = (Ω^T Ω)^{-1} Ω^T. While this seems like a perfect transfer from the analysis model to the synthesis model, it in fact misses a key element: the representation must reside in the range of Ω, i.e., ΩΩ^+ α = Ω(Ω^T Ω)^{-1} Ω^T α = α. Adding this as a constraint to the synthesis model gives an exact equivalence; otherwise the synthesis model has a larger number of degrees of freedom, and thus its minimum is deeper. In this work, we concentrate on the cosparse analysis model. As is well known, the l_0-norm problem is generally NP-hard, and previous works provide several alternatives to the l_0-norm, for example the l_1-norm or l_2-norm, owing to their convexity [2, 6, 7]:
    x̂ = argmin_{x ∈ R^d} ‖Ωx‖_1   subject to   ‖y − Mx‖_2 ≤ ε,                 (4)

    x̂ = argmin_{x ∈ R^d} ‖Ωx‖_2   subject to   ‖y − Mx‖_2 ≤ ε.                 (5)
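To make the quantities in (1)–(5) concrete, the following Python/NumPy sketch (added by the editor, not taken from the paper) builds a small one-dimensional finite-difference operator as an example of Ω, applies it to a piecewise-constant signal, and reports the cosupport and the cosparsity l = p − ‖Ωx‖_0. All names and the choice of operator are purely illustrative.

    import numpy as np

    def finite_difference_operator(d):
        """1-D finite-difference analysis operator Omega of size (d-1) x d."""
        omega = np.zeros((d - 1, d))
        for i in range(d - 1):
            omega[i, i] = -1.0
            omega[i, i + 1] = 1.0
        return omega

    d = 10
    x = np.concatenate([2.0 * np.ones(4), 5.0 * np.ones(6)])   # piecewise-constant signal
    omega = finite_difference_operator(d)                      # rows w_i, here p = d - 1

    alpha = omega @ x                                          # analysis vector Omega x
    cosupport = np.where(np.abs(alpha) < 1e-12)[0]             # indices i with w_i x = 0
    l = len(cosupport)                                         # cosparsity l = p - ||Omega x||_0

    print("p =", omega.shape[0], ", cosparsity l =", l)        # here p = 9 and l = 8

Because Ωx has a single nonzero entry (at the jump of x), this signal is highly cosparse under the chosen operator.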
At present, for these problems in the cosparse analysis model, the main works concern not only algorithms to estimate or approximate the cosparse representation of the observed signal, but also theoretical success guarantees for such algorithms [4, 6, 8]. Specifically, for the model (4), Cai et al. introduce a split Bregman method which solves this model using the Bregman iteration and provide a detailed convergence analysis [6]. The Bregman iteration converges very quickly when applied to certain types of objective functions, especially problems involving an l_1 regularization term, so the split Bregman method takes only a few iterations to give good results for the cosparse analysis model. To solve the model (5), [4] gives a greedy algorithm termed "Greedy Analysis Pursuit" (GAP), an effective pursuit method similar to Orthogonal Matching Pursuit (OMP) [2]. This approach updates the cosupport of the cosparse signal in a greedy fashion to find an approximate solution under the noiseless condition, and GAP recovers the signal perfectly in the relevant experiments. Besides, another kind of greedy algorithm is also used to solve this model. The work presented in [8] describes a new family of greedy-like methods for the cosparse analysis model, including Analysis IHT (AIHT), Analysis HTP (AHTP), Analysis CoSaMP (ACoSaMP) and Analysis SP (ASP). These algorithms are the analysis versions of the synthesis counterparts Iterative Hard Thresholding (IHT), Hard Thresholding Pursuit (HTP), Compressive Sampling Matching Pursuit (CoSaMP) and Subspace Pursuit (SP) [9, 10, 11, 12]. When x is a low-dimensional signal, AHTP, ACoSaMP and ASP solve a transformation of problem (5), i.e., min_x ‖y − Mx‖_2^2 s.t. ‖Ω_Λ x‖_2^2 = 0, where Λ is the cosupport of x; for high-dimensional signals, model (5) is replaced by the unconstrained minimization problem min_x ‖y − Mx‖_2^2 + λ‖Ω_Λ x‖_2^2, where λ is a relaxation constant. Meanwhile, [8] also provides performance guarantees for these methods, which rely on a restricted isometry property (RIP) adapted to the context of the cosparse analysis model. Although the l_1-norm and l_2-norm overcome the computational difficulty, researchers still prefer to find better substitutes for these popular relaxations. For problem (2), many works on sparse recovery replace the l_0-norm with the l_q-norm (0 < q < 1) and have proved that l_q (0 < q < 1) minimization has better sparse recovery ability than l_1 minimization [13, 14, 15, 16, 17, 18, 19, 20, 21]. Besides, another idea, the sparsity inducing functions used in the synthesis model (2), was proposed by Montefusco et al. [22], in which the sparse recovery of x from y in the noiseless situation is cast as

    min_{x ∈ R^d} Φ(x)   subject to   y = Mx,                                  (6)
where Φ(x) is called the sparsity inducing function, which can be chosen as the l_q-norm (0 < q < 1), the atan function, the log-sum function and so on. These sparsity inducing functions more or less closely resemble the l_0-norm, and related experiments in other works have shown that their recovery performance is better [23, 24, 25]. Although these functions are nonsmooth and nonconvex (concave), they maintain some good properties of the l_1-norm, such as continuity and differentiability (for x ≠ 0). In fact, there are several methods for this nonsmooth and nonconvex optimization problem [22, 26, 27, 28, 29, 30], including first-order approximation methods, a neural network approach based on smoothing approximation, and so on. Here we mainly focus on the first-order approximation method. Its good properties are that it is easy to implement and that the approximation is easily obtained by exploiting the concavity of the function, which always lies below its tangent; it has also been shown that this method yields the best convex majorization of a concave objective function. At present, there are some representative works on the first-order approximation method [22, 26, 27]. One of them is the local linear approximation (LLA) method proposed in [22, 26], which makes it possible to transform the nonconvex constrained minimization problem into a convex unconstrained problem by inserting the local linear approximation in the context of a Lagrangian approach.
In [27], there is another first-order method, the smoothing quadratic regularization (SQR) algorithm, which solves a strongly convex quadratic minimization problem with a diagonal Hessian matrix at each iteration.

Inspired by [22, 26, 27], this work concentrates on a more suitable relaxation of the l_0-norm in the analysis model, different from the l_1-norm and l_2-norm. A new substitution, named the cosparsity inducing function, is given to replace the l_0-norm; it is closer to the l_0-norm than the l_1-norm and l_2-norm. The cosparsity inducing function includes the A-l_q, A-atan and A-log-sum functions, whose details are given in Section 2. Based on these functions, we first construct the objective function and give a constrained optimization model of the cosparse recovery problem. Then we propose a subgradient algorithm, the Cosparsity Inducing Function (CIF) algorithm, which is a two-layer optimization algorithm. Specifically, after converting the constrained optimization problem into an unconstrained one, we obtain a temporary optimal variable through a gradient learning step, in which the cosparsity inducing function is approximated by its local linear approximation to avoid its nonconvexity. Second, a new cosupport is obtained by projecting the temporary optimal variable onto the cosparse subspace and keeping the l smallest elements. Finally, the desired signal is estimated with a conjugate gradient algorithm on the new cosupport. The CIF Algorithm has better recovery ability than some existing methods for the cosparse analysis problem, as the numerical experiments will indicate.

The manuscript is organized as follows. Section 2 introduces the optimization model, gives the cosparsity inducing functions with their expressions and subgradients, and presents the CIF Algorithm and its main procedure. Section 3 provides theoretical guarantees for the recovery performance of the CIF Algorithm. Section 4 presents the numerical experiments, and conclusions are drawn in the final section.

2. Model and Algorithm

2.1. The optimization model

To recover the unknown cosparse signals in the cosparse analysis model, i.e., to find the solution of the optimization model (3), we consider

    min_{x ∈ R^d} Φ(Ωx)   subject to   ‖y − Mx‖_2 ≤ ε.                         (7)

Here we address several cosparsity inducing functions Φ(Ωx), which are close to the l_0-norm, for the cosparse recovery. These functions are chosen as the analysis l_q function (0 < q < 1), the analysis atan function and the analysis log-sum function, abbreviated as the A-l_q, A-atan and A-log-sum functions for convenience. Specifically, when Φ(Ωx) = ‖Ωx‖_1 or ‖Ωx‖_2, problem (7) reduces to problem (4) or (5), respectively.

In the following, we first convert problem (7) into an unconstrained problem by the Lagrangian multiplier method [13]:

    min_{x ∈ R^d} (1/2)‖y − Mx‖_2^2 + λΦ(Ωx),                                  (8)

where λ is the penalty factor. For simplicity, the i-th row of the analysis operator Ω is written as w_i (1 ≤ i ≤ p). The functions Φ(Ωx) may be chosen as follows:

    A-l_q function:      Φ(Ωx) = ‖Ωx‖_q^q = Σ_{i=1}^p |w_i x|^q   (0 < q < 1),  (9)

    A-atan function:     Φ(Ωx) = Σ_{i=1}^p atan(|w_i x| / ε),                  (10)

    A-log-sum function:  Φ(Ωx) = Σ_{i=1}^p log(|w_i x| + ε).                   (11)

From the above expressions, we see that these functions are nonconvex in x, and we notice that Φ(Ωx) can be expressed as a sum of functions of |w_i x|, i.e.,

    Φ(Ωx) = Σ_{i=1}^p φ(|w_i x|),                                              (12)

where |w_i x| is the variable and φ : R_+ → R is continuous and differentiable. In order to make good use of these functions, we first give their local linear approximation [22, 26], i.e.,

    φ(z) ≈ φ(z_0) + φ′(z_0)(z − z_0),   for z ≈ z_0.                           (13)

With this approximation, the optimization problem (8) is converted into the following form:

    min_{x ∈ R^d} (1/2)‖y − Mx‖_2^2 + λ Σ_{i=1}^p [φ(|w_i x⋆|) + φ′(|w_i x⋆|)(|w_i x| − |w_i x⋆|)],   (14)

where x⋆ is a known vector with |w_i x| ≈ |w_i x⋆|; the expressions of φ′(|w_i x|) are shown in Table 1. For the A-l_q function (0 < q < 1), we add the smoothing factor ε to ensure that the denominator of the gradient is nonzero. Therefore, problem (14) can essentially be written as the following convex problem:

    min_{x ∈ R^d} (1/2)‖y − Mx‖_2^2 + λ Σ_{i=1}^p φ′(|w_i x⋆|)|w_i x|,          (15)

which is the final optimization model of the cosparse analysis problem. For convenience, we set

    Y(x) = (1/2)‖y − Mx‖_2^2 + λ Σ_{i=1}^p φ′(|w_i x⋆|)|w_i x|.                 (16)

Table 1: Cosparsity inducing functions and their gradients with respect to |w_i x|.

    Φ(Ωx)        φ(|w_i x|)           φ′(|w_i x|)
    A-l_q        |w_i x|^q            q / (|w_i x|^{1-q} + ε)
    A-atan       atan(|w_i x| / ε)    ε / (|w_i x|^2 + ε^2)
    A-log-sum    log(|w_i x| + ε)     1 / (|w_i x| + ε)
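To summarize Section 2.1, the sketch below (an editorial illustration, not the authors' code) evaluates the three cosparsity inducing functions and the derivatives φ′(|w_i x|) of Table 1, and assembles the reweighted objective Y(x) of (16). The placement of the smoothing factor ε inside the A-atan function follows the reading of Table 1 adopted above and should be treated as an assumption; the names phi_and_grad and cif_objective are invented for this example.

    import numpy as np

    EPS = 1e-6   # smoothing factor epsilon (the value suggested in Section 4.1)

    def phi_and_grad(z, kind="lq", q=0.01, eps=EPS):
        """phi(|w_i x|) and phi'(|w_i x|) for the A-lq, A-atan and A-log-sum
        functions of Table 1; z is a nonnegative array of |w_i x| values."""
        z = np.asarray(z, dtype=float)
        if kind == "lq":                         # A-lq: phi(z) = z^q, 0 < q < 1
            phi = z ** q
            grad = q / (z ** (1.0 - q) + eps)    # smoothed gradient q / (z^{1-q} + eps)
        elif kind == "atan":                     # A-atan: phi(z) = atan(z / eps)  (assumed form)
            phi = np.arctan(z / eps)
            grad = eps / (z ** 2 + eps ** 2)
        elif kind == "log-sum":                  # A-log-sum: phi(z) = log(z + eps)
            phi = np.log(z + eps)
            grad = 1.0 / (z + eps)
        else:
            raise ValueError("unknown cosparsity inducing function: " + kind)
        return phi, grad

    def cif_objective(x, y, M, Omega, lam, x_star, kind="lq", q=0.01):
        """Y(x) in (16): 0.5*||y - Mx||_2^2 + lam * sum_i phi'(|w_i x_star|)*|w_i x|."""
        _, weights = phi_and_grad(np.abs(Omega @ x_star), kind, q)
        return 0.5 * np.sum((y - M @ x) ** 2) + lam * np.sum(weights * np.abs(Omega @ x))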
2.2. The proposed algorithm

In this section, we solve (15) based on a (sub)gradient algorithm. First, let ϕ(x) = |w_i x| and define its subgradient with respect to x as

    ∂ϕ(w_i x) = { w_i^T if w_i x > 0;   α w_i^T if w_i x = 0;   −w_i^T if w_i x < 0 },   (17)

where α ∈ [−1, 1]. Next, we introduce the proposed Cosparsity Inducing Function (CIF) Algorithm in detail; it is a two-layer optimization algorithm. In the first optimization procedure, we calculate a temporary variable by solving the unconstrained problem (15):

    x̃ = x_{t-1} + μ_t [ M^T (y − M x_{t-1}) − λ_{t-1} Σ_{i=1}^p φ′(|w_i x_{t-1}|) ∂ϕ(w_i x_{t-1}) ],   (18)

and the step size μ_t is learned by the following adaptive rule:

    μ_t = ‖∂Y(x_{t-1})‖_2^2 / ‖M ∂Y(x_{t-1})‖_2^2.                             (19)

As for the penalty factor, we update λ_t subject to ∂Y(x̃) = 0, i.e.,

    M^T (y − M x̃) − λ_t Σ_{i=1}^p φ′(|w_i x_{t-1}|) ∂ϕ(w_i x̃) = 0.             (20)

In the second optimization procedure, we project x̃ onto the cosparse subspace by keeping its smallest l elements to find a new cosupport [8], i.e.,

    Λ_t = cosupport(Ω x̃, l),                                                   (21)

where cosupport(Ωx̃, l) is the set of indices of the l smallest elements of Ωx̃. Finally, in order to obtain a better solution x_t, we use the following optimization model based on the new cosupport Λ_t:

    x_t = argmin_x (1/2)‖y − Mx‖_2^2 + λ Σ_{i=1}^p φ′(|w_i x_{t-1}|)|w_i x|   subject to   Ω_{Λ_t} x = 0,   (22)

where Ω_{Λ_t} denotes the submatrix of Ω formed by the rows indexed by the new cosupport Λ_t. In the following, the specific procedure of the CIF Algorithm is described. Note that the CIF Algorithm has three corresponding versions due to the different forms of the cosparsity inducing function Φ(Ωx) introduced in Section 2.1; we do not discuss them one by one here.

——————————————————————
Proposed Algorithm: CIF Algorithm
——————————————————————
1. Input: M ∈ R^{m×d}, analysis dictionary Ω ∈ R^{p×d}, y ∈ R^m where y = Mx + e and e is the additive noise; l, the cosparsity of x under Ω; q, the power of the A-l_q function, 0 < q < 1.
2. Initialization: x_0 = 0, λ_0 = 0.001, t = 0.
3. Iteration: while the halting criterion is not satisfied do
   – t = t + 1.
   The first optimization:
   – Calculate a temporary variable x̃ using (18), where the step size μ_t is updated by (19).
   – Learn the Lagrange multiplier λ_t with (20).
   The second optimization:
   – Find a new cosupport by (21).
   – Estimate the final optimal solution x_t, i.e., solve the second optimization problem (22) with a conjugate gradient algorithm.
   end while
4. Output: x̂, the l-cosparse approximation of x.
——————————————————————

Note that the CIF Algorithm is a two-layer optimization method. In the first optimization procedure, a temporary variable is calculated by converting the constrained optimization problem into an unconstrained one, in which the cosparsity inducing function is approximated by its first-order expansion to avoid its nonconvexity. Then a new cosupport is obtained by projecting the temporary optimal variable onto the cosparse subspace and keeping the l smallest elements.
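As a reading aid, the following sketch mirrors the two-layer structure of steps (17)–(22) under simplifications that are the editor's own and not part of the paper: the Lagrange multiplier λ is kept fixed rather than learned from (20), only the A-l_q weights of Table 1 are used, and the constrained subproblem (22) is approximated by the penalized least-squares problem min_x ‖y − Mx‖_2^2 + γ‖Ω_{Λ_t} x‖_2^2 with a large γ, solved with a plain conjugate gradient routine. All function names are hypothetical.

    import numpy as np

    def alq_weights(z, q=0.01, eps=1e-6):
        """phi'(|w_i x|) for the A-lq function (Table 1): q / (|w_i x|^{1-q} + eps)."""
        return q / (np.abs(z) ** (1.0 - q) + eps)

    def conjugate_gradient(A, b, x0, iters=200, tol=1e-10):
        """Plain conjugate gradient for A x = b, A symmetric positive (semi)definite."""
        x = x0.copy()
        r = b - A @ x
        p = r.copy()
        rs = r @ r
        for _ in range(iters):
            Ap = A @ p
            step = rs / (p @ Ap + 1e-30)
            x = x + step * p
            r = r - step * Ap
            rs_new = r @ r
            if np.sqrt(rs_new) < tol:
                break
            p = r + (rs_new / rs) * p
            rs = rs_new
        return x

    def cif_recover(y, M, Omega, l, lam=1e-3, q=0.01, iters=50, gamma=1e6):
        """Sketch of the two-layer CIF iteration: LLA-weighted subgradient step,
        cosupport projection, then conjugate gradient on the cosupport (via a penalty)."""
        d = M.shape[1]
        x = np.zeros(d)
        for _ in range(iters):
            # first layer: subgradient step on Y(x) in (16); cf. (17)-(19)
            z = Omega @ x
            w = alq_weights(z, q)                                # phi'(|w_i x_{t-1}|)
            grad = -M.T @ (y - M @ x) + lam * Omega.T @ (w * np.sign(z))
            mu = (grad @ grad) / (np.linalg.norm(M @ grad) ** 2 + 1e-30)  # adaptive step (19)
            x_tilde = x - mu * grad                              # temporary variable, cf. (18)

            # second layer: cosupport of the l smallest entries of Omega x_tilde, cf. (21)
            cosup = np.argsort(np.abs(Omega @ x_tilde))[:l]
            Omega_cosup = Omega[cosup, :]

            # approximate (22): quadratic part only, Omega_cosup x = 0 enforced by a penalty
            A = M.T @ M + gamma * Omega_cosup.T @ Omega_cosup
            x = conjugate_gradient(A, M.T @ y, x_tilde)
        return x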
3. Analysis of CIF Algorithm

In this section we provide theoretical guarantees for the CIF Algorithm; our analysis is based on the optimization problem (15). First, we need the M-restricted isometry property (M-RIP), a version of the RIP adapted to the analysis model [31, 32].

Definition 1. (Definition 1 in [31]) A matrix M obeys the M-restricted isometry property with constant δ_M over a given subset M ⊆ R^N if δ_M is the smallest quantity satisfying

    (1 − δ_M)‖v‖_2^2 ≤ ‖Mv‖_2^2 ≤ (1 + δ_M)‖v‖_2^2   for all v ∈ M.            (23)

Theorem 2. (Stable recovery of the CIF Algorithm) Suppose that x* is the true signal vector satisfying y = Mx* + e with ‖e‖_2 ≤ ε. Assume that the measurement matrix M satisfies the M-RIP of order 2l − p with δ_{2l−p} < 1 over the set M = {x : Ω_Λ x = 0, |Λ| ≥ l}, and that the analysis operator Ω has full column rank. Then the sequence {x_t : t = 1, 2, ...} generated by the CIF Algorithm is convergent, and the solution x̂ obtained by the CIF Algorithm satisfies the error bound

    ‖x* − x̂‖_2 < Cε,                                                           (24)

where C is a constant depending on δ_{2l−p}.

Proof. The model (15) can be regarded as an iteratively reweighted l_1 (IRL1) minimization problem no matter which cosparsity inducing function is chosen; we therefore give the stability analysis only for the CIF Algorithm with the A-l_q function. Set β = Ωx, so that

    β_i = w_i x,   i = 1, 2, ..., p.                                           (25)
Then

    Y(x) = (1/2)‖y − Mx‖_2^2 + λ Σ_{i=1}^p [ q / (|w_i x⋆|^{1-q} + ε) ] |w_i x|          (26)

is written as

    F(β) = (1/2)‖y − MΩ^+ β‖_2^2 + λ Σ_{i=1}^p [ q / (|β_i⋆|^{1-q} + ε) ] |β_i|.         (27)

That is, min F(β) is an IRL1 optimization problem. For fixed ε, Chen and Zhou [33] have proved that, under some conditions, the iterates of IRL1 converge to the global minimizer. Suppose β̂ = Ωx̂ is the global minimizer; then there exists a sequence {β_t} converging to β̂, i.e.,

    β_i^t → β̂_i,   i = 1, 2, ..., p.                                                    (28)

Because rank(Ω) = d, there exists an invertible submatrix Ω* ∈ R^{d×d} of Ω. Then

    Ω* x̂ = β̂*,                                                                          (29)

where β̂* ∈ R^d is composed of the corresponding elements of β̂. We then obtain

    x_t = (Ω*)^{-1} β*_t.                                                                (30)

So {x_t} converges to x̂. Meanwhile, we have

    F(β̂) ≤ F(0),                                                                        (31)

i.e.,

    Y(x̂) ≤ Y(0).                                                                        (32)

Then

    ‖x* − x̂‖_2 ≤ (1/√(1 − δ_{2l−p})) ‖M(x* − x̂)‖_2
               = (1/√(1 − δ_{2l−p})) ‖(y − Mx*) − (y − Mx̂)‖_2
               ≤ (1/√(1 − δ_{2l−p})) (‖y − Mx*‖_2 + ‖y − Mx̂‖_2)                         (33)
               ≤ (1/√(1 − δ_{2l−p})) (ε + ‖y − Mx̂‖_2).

In addition,

    ‖y − Mx̂‖_2 ≤ √(2Y(x̂)) ≤ √(2Y(0)) = ‖y‖_2.                                          (34)

As in [8], there exists a positive constant η such that

    ‖y‖_2 ≤ η‖e‖_2 ≤ ηε.                                                                (35)

Finally,

    ‖x* − x̂‖_2 ≤ Cε,   where C = (1 + η) / √(1 − δ_{2l−p}).                             (36)

From this analysis, we can see that the CIF Algorithm recovers an approximate solution whose distance from the true signal vector is controlled by the factors η and ε. □

4. Numerical Experiments

In this section, we demonstrate the proposed CIF Algorithm on experiments similar to those in [8], namely the reconstruction of the Shepp–Logan phantom from a small number of measurements taken along 35 radial lines. In this experiment, the cosparse operator is Ω_{2D-DIF} and the actual cosparsity of the original signal under this operator is l = 128014. The accuracy of recovery is measured by the index l_rec, the cosparsity recovered by the given algorithm, i.e., the number of zero elements in Ωx̂; the closer l_rec is to the exact cosparsity l = 128014, the better the signal is recovered. We also calculate the performance index

    err = ‖y − Mx̂‖_2 / ‖Mx̂‖_2;                                                          (37)

err is smaller when the signal is recovered better. Besides, we estimate the effectiveness using another index,

    SNR = 10 log_10 ( E‖x̂‖_2^2 / E‖y − Mx̂‖_2^2 ),                                       (38)

where SNR is short for signal-to-noise ratio; the higher the SNR, the better the performance.

4.1. The performance of the CIF Algorithm for different parameters

In the CIF Algorithm there are two parameters, q and ε, where q is the power of the A-l_q function and ε is the smoothing factor that avoids division by zero in the cosparsity inducing function. The parameter ε needs to be fixed first in the procedure of the CIF Algorithm; it is set to 10^{-6} in this experiment. Thus, when Φ(Ωx) is the A-l_q function, the performance of the CIF Algorithm is affected by only one parameter q, and when Φ(Ωx) is the A-atan or A-log-sum function, there is no free parameter. In order to verify the effectiveness of the CIF Algorithm with the A-l_q function, we test 6 different values of q and compare the results in Table 2. Table 2 shows that the recovery performance of the CIF Algorithm is good for most of the chosen values of q: the gap between l_rec and l stays small (within a few tens for most choices), i.e., the recovery rate of the cosparsity is up to 99.98%, the values of err are below 0.018, and the SNR is above 42 dB. In practice, we suggest using q = 0.01 for the CIF Algorithm with the A-l_q function.

Table 2: The results of the CIF Algorithm with the A-l_q function.

    q        0.001      0.01       0.1        1/3        1/2        2/3
    l_rec    128004     128015     127982     127869     127993     127979
    err      0.0134     0.0133     0.0177     0.0160     0.0142     0.0160
    SNR      43.5305    43.7577    42.707     42.8905    43.0888    43.0888
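For reference, the three indices reported in Table 2 and in the rest of this section (l_rec, err from (37) and SNR from (38)) can be computed as in the short editorial sketch below; replacing the expectations in (38) by the empirical energies of a single trial is the editor's reading of the experimental setup, and recovery_indices is an invented name.

    import numpy as np

    def recovery_indices(x_hat, y, M, Omega, zero_tol=1e-10):
        """l_rec, err (37) and SNR (38) for an estimate x_hat."""
        alpha = Omega @ x_hat
        l_rec = int(np.sum(np.abs(alpha) < zero_tol))    # number of zero elements in Omega x_hat
        residual = y - M @ x_hat
        err = np.linalg.norm(residual) / np.linalg.norm(M @ x_hat)
        snr = 10.0 * np.log10(np.sum(x_hat ** 2) / (np.sum(residual ** 2) + 1e-30))
        return l_rec, err, snr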
The related performance of the CIF Algorithm with the A-atan and A-log-sum functions is shown in Fig. 1. From this figure, we can see that the cosparsity l_rec is nearly fully recovered and that both err and SNR attain good values. Meanwhile, the err and SNR of the CIF Algorithm with the A-log-sum function are better than those with the A-atan function, but its convergence is worse.
[Figure 1 here: panels (a) log(l_rec), (b) err and (c) SNR versus iteration for the A-atan and A-log-sum functions; the reference level l = 128014 is marked in panel (a).]
Fig. 1. The results of the CIF Algorithm with the A-atan function and the A-log-sum function.
As a whole, the CIF Algorithm obtains good recovery performance no matter which cosparsity inducing function Φ(Ωx) is chosen, and its recovery performance is acceptable for appropriate parameter values.
4.2. The comparison between the CIF Algorithm and some existing algorithms

Here we compare numerical results of the CIF Algorithm with those of some previous algorithms. We first show, in Fig. 2, the different performance obtained when Φ(Ωx) is chosen as a cosparsity inducing function and when Φ(Ωx) = ‖Ωx‖_1; the CIF Algorithm with the A-l_q function uses the best-performing setting in Table 2, i.e., q = 0.01. Clearly, the CIF Algorithm with the above cosparsity inducing functions is superior to Φ(Ωx) = ‖Ωx‖_1 in terms of l_rec, err and SNR: the error of the CIF Algorithm is closer to zero, and its SNR improves by more than 1 dB. Meanwhile, we can compare the CIF Algorithm across the different cosparsity inducing functions. The CIF Algorithm with the A-l_q function converges faster in the cosparsity recovery than the other choices; the A-atan function obtains good values of l_rec, err and SNR; and the CIF Algorithm with the A-log-sum function has the best err and SNR (err is smallest and SNR is largest) but the worst convergence. Thus the A-l_q and A-atan functions are better choices than the A-log-sum function.
[Figure 2 here: panels (a) log(l_rec), (b) err and (c) SNR versus iteration for Φ(Ωx) chosen as the A-l_q, A-atan and A-log-sum functions and as ‖Ωx‖_1; the reference level l = 128014 is marked in panel (a).]
Fig. 2. The comparison between Φ(Ωx) chosen as a cosparsity inducing function and Φ(Ωx) = ‖Ωx‖_1. Subgraph (c) has the same legend as (b).
Finally, we compare the CIF Algorithm with some previous approaches, namely AIHT, AHTP, GAP, ASP and ACoSaMP from [8] and CoIRLq from [31], and report the recovered cosparsity and SNR in Table 3. The recovered cosparsity l_rec of the CIF Algorithm is better than that of AIHT, AHTP and GAP, and its SNR is also higher than that of AIHT and AHTP. Although the SNR of the CIF Algorithm is smaller than that of ASP, ACoSaMP and CoIRLq [31], the cosparsity is still recovered very well; we will study how to improve the SNR in future work. Overall, the CIF Algorithm with the above cosparsity inducing functions obtains good recovery results.

Table 3: The comparison of the related algorithms.

    Proposed algorithm    A-l_q       A-atan      A-log-sum
    cosparsity            128015      128013      128010
    SNR                   43.7577     43.3768     44.3419

    Previous algorithm    AHTP        AIHT        GAP
    cosparsity            127907      125070      127660
    SNR                   42.173      40.8994     165.488

    Previous algorithm    ASP         ACoSaMP     CoIRLq
    cosparsity            128014      128014      128014
    SNR                   147.7146    138.223     105.3298
5. Discussions and Conclusions

In this work, we focused on the cosparse analysis model and proposed the CIF Algorithm with a family of cosparsity inducing functions for the cosparse representation problem. These cosparsity inducing functions are better substitutes for the l_0-norm; they differ from other relaxations such as the l_1-norm and l_2-norm and are closer to the l_0-norm. Based on them, we constructed the objective functions and gave a constrained optimization model of the cosparse analysis problem, and we proposed a subgradient algorithm, the Cosparsity Inducing Function (CIF) algorithm, which is a two-layer optimization algorithm. Specifically, after converting the constrained optimization problem into an unconstrained one, we first obtained a temporary optimal variable, in which the cosparsity inducing function was approximated by its local linear approximation in order to avoid its nonconvexity. Second, a new cosupport was obtained by projecting the temporary optimal variable onto the cosparse subspace and keeping the l smallest elements. Finally, the desired signal was estimated with a conjugate gradient algorithm on the new cosupport. The numerical experiments indicate that the CIF Algorithm has better recovery ability than some existing methods for the cosparse analysis problem. Moreover, we have studied theoretical guarantees for the stability and convergence of the CIF Algorithm.

Note that the above cosparsity inducing functions are nonconvex and nonsmooth at the origin. We employed a convexification technique that has been used successfully in the synthesis model, namely their local linear approximation, to overcome the nonconvexity. Besides, the proposed CIF Algorithm belongs to the class of subgradient methods, whose main disadvantages are slow convergence and the need for a suitable step size. To improve the convergence, other approaches may be chosen, such as fixed-point iteration or the alternating direction method of multipliers (ADMM) [34, 35]. As for the choice of the step size, the CIF Algorithm adopts an adaptive learning rule, which partly avoids the difficulty of parameter selection. Certainly, the CIF Algorithm also involves other parameters, such as the penalty parameter λ, whose initial value is not easy to determine; in this work these parameters are chosen based on the experimental results. Therefore, in the future we will discuss these questions further. Meanwhile, we also hope that the proposed method can be applied to more areas related to the cosparse analysis model.

Acknowledgment

The authors acknowledge the insightful comments provided by the anonymous reviewers, which have added much to the clarity of the paper. This work was supported by the National Natural Science Foundation of China under Grants No. 11126057, 61304178 and 11501351, the Shanghai Leading Academic Discipline Project (J50101), Key Disciplines of Shanghai Municipality (Operations Research and Cybernetics, S30104), the Shanghai Key Laboratory of Intelligent Information Processing (No. IIPL-2014-003) and Grants-in-Aid for Scientific Research, Ministry of Education, Culture, Sports, Science and Technology, Japan, Project (No. 24500280).
References

[1] Y. Lu, M. Do, A theory for sampling signals from a union of subspaces, IEEE Trans. Signal Process. 56 (6) (2008) 2334–2345.
[2] M. Elad, Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing, Springer.
[3] A.M. Bruckstein, D.L. Donoho, M. Elad, From sparse solutions of systems of equations to sparse modeling of signals and images, SIAM Rev. 51 (1) (2009) 34–81.
[4] S. Nam, M.E. Davies, M. Elad, R. Gribonval, The cosparse analysis model and algorithms, Appl. Comput. Harmon. Anal. 34 (1) (2013) 30–56.
[5] M. Elad, P. Milanfar, R. Rubinstein, Analysis versus synthesis in signal priors, Inverse Probl. 23 (3) (2007) 947–968.
[6] J.-F. Cai, S. Osher, Z. Shen, Split Bregman methods and frame based image restoration, Multiscale Model. Simul. 8 (3) (2009) 337–369.
[7] H. Zhang, M. Yan, W. Yin, One condition for solution uniqueness and robustness of both l_1-synthesis and l_1-analysis minimizations, 2014, available from: http://arxiv.org/abs/1304.5038.
[8] R. Giryes, S. Nam, M. Elad, R. Gribonval, M. Davies, Greedy-like algorithms for the cosparse analysis model, Linear Algebra Appl. 441 (2014) 22–60.
[9] T. Blumensath, M. Davies, Iterative hard thresholding for compressed sensing, Appl. Comput. Harmon. Anal. 27 (3) (2009) 265–274.
[10] S. Foucart, Hard thresholding pursuit: an algorithm for compressive sensing, SIAM J. Numer. Anal. 49 (6) (2011) 2543–2563.
[11] D. Needell, J. Tropp, CoSaMP: iterative signal recovery from incomplete and inaccurate samples, Appl. Comput. Harmon. Anal. 26 (3) (2009) 301–321.
[12] W. Dai, O. Milenkovic, Subspace pursuit for compressive sensing signal reconstruction, IEEE Trans. Inform. Theory 55 (5) (2009) 2230–2249.
[13] Q. Lyu, Z. Lin, Y. She, C. Zhang, A comparison of typical l_p minimization algorithms, Neurocomputing 119 (2013) 413–424.
[14] S. Guo, Z. Wang, Q. Ruan, Enhancing sparsity via l_p (0 < p < 1) minimization for robust face recognition, Neurocomputing 99 (2013) 592–602.
[15] S. Foucart, M.-J. Lai, Sparsest solutions of underdetermined linear systems via l_q-minimization for 0 < q ≤ 1, Appl. Comput. Harmon. Anal. 26 (3) (2009) 395–407.
[16] M. Wang, W. Xu, A. Tang, On the performance of sparse recovery via l_p-minimization (0 < p < 1), IEEE Trans. Inform. Theory 57 (11) (2011) 7255–7278.
[17] R. Chartrand, Exact reconstruction of sparse signals via nonconvex minimization, IEEE Signal Process. Lett. 14 (10) (2007) 707–710.
[18] M.E. Davies, R. Gribonval, Restricted isometry constants where l_p sparse recovery can fail for 0 < p ≤ 1, IEEE Trans. Inform. Theory 55 (5) (2009) 2203–2214.
[19] M.-J. Lai, J. Wang, An unconstrained l_q minimization for sparse solutions of underdetermined linear systems, SIAM J. Optim. 21 (1) (2011) 82–101.
[20] S. Foucart, A note on guaranteed sparse recovery via l_q-minimization, Appl. Comput. Harmon. Anal. 29 (1) (2010) 97–103.
[21] Z. Xu, X. Chang, l_{1/2} regularization: a thresholding representation theory and a fast solver, IEEE Trans. Neural Netw. Learn. Syst. 23 (7) (2012) 1013–1027.
[22] L.B. Montefusco, D. Lazzaro, S. Papi, A fast algorithm for nonconvex approaches to sparse recovery problems, Signal Process. 93 (9) (2013) 2636–2647.
[23] E.J. Candès, M.B. Wakin, S.P. Boyd, Enhancing sparsity by reweighted l_1 minimization, J. Fourier Anal. Appl. 14 (5–6) (2008) 877–905.
[24] R. Chartrand, W. Yin, Iteratively reweighted algorithms for compressive sensing, Acoustics, Speech and Signal Processing (2008) 3869–3872.
[25] G. Gasso, A. Rakotomamonjy, S. Canu, Recovering sparse signals with a certain family of nonconvex penalties and DC programming, IEEE Trans. Signal Process. 57 (12) (2009) 4686–4698.
[26] H. Zou, R. Li, One-step sparse estimates in nonconcave penalized likelihood models, Ann. Stat. 36 (4) (2008) 1509–1533.
[27] W. Bian, X. Chen, Worst-case complexity of smoothing quadratic regularization methods for non-Lipschitzian optimization, SIAM J. Optim. 23 (3) (2013) 1718–1741.
[28] W. Bian, X. Chen, Smoothing neural network for constrained non-Lipschitz optimization with applications, IEEE Trans. Neural Netw. Learn. Syst. 23 (3) (2012) 399–411.
[29] W. Bian, X. Chen, Neural network for nonsmooth, nonconvex constrained minimization via smooth approximation, IEEE Trans. Neural Netw. Learn. Syst. 25 (3) (2014) 545–556.
[30] W. Bian, X. Chen, Y. Ye, Complexity analysis of interior point algorithms for non-Lipschitz and nonconvex minimization, Math. Program. 149 (1) (2015) 301–327.
[31] S. Zhang, Stable cosparse recovery via l_q-analysis optimization, arXiv:1409.4575v1 [cs.IT], 16 Sep 2014.
[32] T. Blumensath, M.E. Davies, Sampling theorems for signals from the union of finite-dimensional linear subspaces, IEEE Trans. Inform. Theory 55 (4) (2008) 1872–1882.
[33] X. Chen, W. Zhou, Convergence of reweighted l_1 minimization algorithms and unique solution of truncated l_p minimization, Technical Report.
[34] R. Burden, J. Faires, Numerical Analysis, Thomson Brooks.
[35] S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Found. Trends Mach. Learn. 3 (1) (2011) 1–122.