
Reprint of "Nesterov's algorithm solving dual formulation for compressed sensing"✩,✩✩

Feishe Chen (a), Lixin Shen (a), Bruce W. Suter (b), Yuesheng Xu (a,∗)

(a) Department of Mathematics, Syracuse University, Syracuse, NY 13244, USA
(b) Air Force Research Laboratory, AFRL/RITB, Rome, NY 13441-4505, USA

Dedicated to Professor Benyu Guo on the occasion of his seventieth birthday with friendship and esteem

Keywords: Nesterov's algorithm; Proximity operator; Moreau envelope; ℓ1 regularization; Compressed sensing

Abstract. We develop efficient algorithms for solving the compressed sensing problem. We modify the standard ℓ1 regularization model for compressed sensing by adding a quadratic term to its objective function so that the objective function of the dual formulation of the modified model is Lipschitz continuous. In this way, we can apply the well-known Nesterov algorithm to solve the dual formulation, and the resulting algorithms have quadratic convergence. Numerical results presented in this paper show that the proposed algorithms significantly outperform the state-of-the-art algorithm NESTA in accuracy. © 2014 Published by Elsevier B.V.

DOI of original article: http://dx.doi.org/10.1016/j.cam.2013.09.032.
✩ A publishers' error resulted in this article appearing in the wrong issue. The article is reprinted here for the reader's convenience and for the continuity of the special issue. For citation purposes, please use the original publication details: Journal of Computational and Applied Mathematics 260, pp. 1–17.
✩✩ This research is supported in part by the 2012 Air Force Visiting Faculty Research Program, by the US National Science Foundation under grant DMS-1115523, and by the US Air Force Office of Scientific Research under grant FA9550-09-1-0511.
∗ Corresponding author. E-mail addresses: [email protected] (F. Chen), [email protected] (L. Shen), [email protected] (B.W. Suter).

1. Introduction

Compressed sensing [1–3] is an emerging theory in applied mathematics that reshapes the way people work with large data sets. Traditionally, to avoid losing information when representing a signal, one has to follow the Shannon/Nyquist sampling theory: the sampling frequency must be greater than twice the highest frequency component of the signal. Compressed sensing is a new way to represent signals. The breakthrough of the compressed sensing theory is that one can represent a signal at a rate significantly below the Nyquist sampling frequency. Many potential applications of compressed sensing, such as magnetic resonance imaging [4,5], sensor networks [6], and digital photography [7], have been reported in the literature.

The basic principle of compressed sensing is that a sparse or compressible signal can be reconstructed from a small number of measurements, obtained through appropriate linear combinations of signal values, via an optimization approach. Specifically, we wish to reconstruct a signal (vector) with the fewest nonzero components from given measurements, arranged in a vector that equals the product of a given measurement matrix and the signal to be reconstructed. An essential goal in compressed sensing is to reconstruct the ideal signal from a small number of measurements. A key to this goal is the notion of sparsity. It was shown mathematically in [3] that under the sparsity assumption the signal can be exactly reconstructed from the given measurements, with the chance of failure being infinitesimally small.

In the seminal work of Candes, Romberg, and Tao [1], and that of Donoho [8], the compressed sensing problem was described as solving an ℓ1 minimization problem subject to linear constraints that involve the measurements and the measurement matrix. This optimization problem is essentially the basis pursuit problem discussed earlier in [9], where interior-point methods in convex analysis were employed to solve the resulting minimization problem.


In passing, we remark that the interior-point method is efficient only for problems of small size and is not suitable for large scale optimization problems, since the sparsity of the underlying signal is not fully exploited. Solving the basis pursuit problem at a large scale is challenging. In recent years, many algorithms for this optimization problem have been developed by exploiting the sparsity of the solution to be sought. A typical approach is to use the idea of the Bregman iteration and the linearized Bregman iteration (see, e.g., [10,11]). However, when encountering data of high dynamic range, the linearized Bregman iteration does not perform as well as it does for data of low dynamic range. By using Nesterov's smoothing technique, an efficient method called Algorithm NESTA was developed in [12] for this optimization problem. Intensive testing has shown that Algorithm NESTA is a powerful approach for sparse recovery. However, Algorithm NESTA requires that the product of the measurement matrix and its transpose be the identity matrix. This limits its applications in a range of problems.

The purpose of this paper is to develop efficient algorithms for solving the ℓ1 regularization model which arises in compressed sensing. Our main point is to make use of the well-known Nesterov algorithm in order to take advantage of its quadratic convergence property. The quadratic convergence of the Nesterov algorithm requires that the objective function of the optimization problem under consideration be differentiable with Lipschitz continuous gradient. Hence, we modify the objective function of the original model by adding to it a quadratic term so that the objective function of the dual formulation of the modified model is Lipschitz continuous. We then apply the Nesterov algorithm to the dual formulation, and the resulting algorithms have quadratic convergence.

The rest of the paper is organized as follows. In Section 2 we describe the modified model and provide mathematical properties of the modified model and its dual formulation. In Section 3, we characterize a solution of the modified model in terms of the proximity operator of the ℓ1 norm evaluated at a point determined by the solution of its dual problem. By specializing the results of Section 3 to noiseless and noisy measurements, we develop in Section 4 algorithms for a solution of the dual formulation of the modified model with noise-free or noisy measurements via FISTA. Numerical experiments for noise-free and noisy measurements are presented in Sections 5.1 and 5.2, respectively. We draw our conclusion in Section 6.

2. A modified compressed sensing problem and its dual problem

We review in this section the notion of compressed sensing and describe a modified compressed sensing problem and its dual problem which can be efficiently solved by Nesterov's algorithm.

Consider a general linear measurement process that computes a measurement vector b ∈ Rm by an m × n measurement matrix A from an n-dimensional signal u, with the number of measurements m much smaller than the number of samples n. That is, the vector b can be written as

    b = Au.    (1)

A central issue in compressed sensing is to develop algorithms for recovering sparse solutions of the underdetermined system (1). A sparse solution to this system is a solution of the following optimization problem

    min{∥u∥0 : u ∈ Rn and Au = b},    (2)

where the ℓ0-norm ∥u∥0 counts the number of non-zero entries in u. With the optimization model (2), 2s measurements are sufficient to obtain uniform recovery with high probability of an s-sparse vector (that is, a vector having at most s non-zero components). Unfortunately, solving (2) is NP-complete, requiring an exhaustive enumeration of all possible locations of the nonzero components of u. The difficulties in solving the optimization problem (2) stem from the nonconvexity and nonsmoothness of the ℓ0-norm ∥ · ∥0. A common practice in the compressed sensing community is to replace the nonconvex ℓ0-norm ∥ · ∥0 by the convex ℓ1-norm.

To describe our model, we introduce the necessary notation. We use ⟨·, ·⟩ and ∥ · ∥, respectively, to denote the inner product and the corresponding ℓ2-norm of the Euclidean space Rd. By ∥ · ∥1 and ∥ · ∥∞, respectively, we denote the ℓ1-norm and ℓ∞-norm. We write xi for the i-th component of a vector x ∈ Rd, i = 1, 2, . . . , d. The standard inner product in the space Rd is defined by

    ⟨x, y⟩ := ∑_{i=1}^{d} xi yi,   for x, y ∈ Rd.

In the optimization problem (2), replacing the nonconvex ℓ0-norm by the convex ℓ1-norm leads to a new optimization problem

    min{∥u∥1 : u ∈ Rn and Au = b}.    (3)

Surprisingly, optimization based on the ℓ1 model (3) can exactly recover s-sparse signals and closely approximate compressible signals with high probability using m ≥ cs log(n/s) i.i.d. Gaussian measurements [2,8]. Roughly speaking, the ratio of the number of measurements required for model (3) over that required for model (2) is about a multiple of log(n/s). In other words, the number of measurements required for model (3) increases significantly for a highly sparse signal. The method NESTA was developed in [12] for the optimization problem (3), using Nesterov's algorithm, which has quadratic convergence. Intensive numerical testing shows that NESTA is a powerful approach for sparse recovery.


However, NESTA has the rather restrictive requirement that AA⊤ be the identity matrix, which limits its range of application. It is the purpose of this paper to develop an efficient algorithm for problem (3) that takes advantage of the quadratic convergence of Nesterov's algorithm while avoiding the limitation of NESTA. For this purpose, we propose an alternative model which approximates model (3). That is,

    min{ µ∥u∥1 + (1/2)∥u − uapp∥² : u ∈ Rn and Au = b },    (4)

where µ is a regularization parameter and uapp is a pre-estimated solution that is close enough to a solution of (3). The solution of model (4) is a good approximation to an ideal solution of model (3) if either uapp is close enough to the ideal solution or the parameter µ is sufficiently large.

The benefits of adding the quadratic term (1/2)∥ · −uapp∥² can be seen from both the theoretical analysis and the algorithmic development for model (4). From the viewpoint of theoretical analysis, the objective functional of model (4) is strongly convex due to this quadratic term. It turns out that even though the objective functional is not differentiable, its conjugate is differentiable with Lipschitz continuous gradient. The differentiability of the conjugate is desirable from the viewpoint of algorithmic development and allows us to use existing efficient optimization tools on the dual problem of model (4). For instance, we propose to apply the well-known fast algorithm FISTA [13], a variant of Nesterov's algorithm, to the dual problem of model (4).

Models (1)–(4) break down once the measurements b are contaminated by noise, which is in general unavoidable. In such a scenario, the linear constraint b = Au is relaxed to

    ∥b − Au∥ ≤ ϵ,    (5)

where ϵ is the estimated noise power. Models (1)–(4) can be extended accordingly to adapt to this change of measurements. The theory established for model (4) is developed for measurements with this inequality constraint. Replacing the linear constraint b = Au in model (4) by the inequality constraint (5) results in the objective functional of the corresponding dual formulation being the sum of a differentiable term with Lipschitz continuous gradient and the nondifferentiable ℓ2 norm. Again, FISTA, a variant of Nesterov's algorithm, can be employed to find a solution of the dual problem.

We describe below the proposed model. To this end, we review a notion of convex analysis. For the Hilbert space Rd, the class of all lower semicontinuous convex functions f : Rd → (−∞, +∞] such that

    dom f := {x ∈ Rd : f(x) < +∞} ≠ ∅

is denoted by Γ0(Rd). The indicator function of a closed convex set C in Rd is defined, at u ∈ Rd, as

    ιC(u) := 0 if u ∈ C,   and   ιC(u) := +∞ otherwise.

Clearly, the indicator ιC is in Γ0(Rd) for any nonempty closed convex set C. In compressed sensing with noise involved in the collected data, as described in (5), we wish to find a solution of the following constrained minimization problem

    min_u {∥u∥1 : ∥Au − b∥ ≤ ϵ}.    (6)

With the notion of the indicator function we can rewrite the constrained minimization problem as an unconstrained problem

    min_u {∥u∥1 + ιC(Au − b)},    (7)

where C = {z ∈ Rm : ∥z∥ ≤ ϵ}. We remark that when ϵ = 0, we have C = {0} and (7) reduces to the compressed sensing model with noise-free observation data. Motivated by this example, we consider below a somewhat more general form of problem (7). For a given matrix A ∈ Rm×n, a vector b ∈ Rm, and functions ψ ∈ Γ0(Rn) and ϕ ∈ Γ0(Rm), we consider the optimization problem

    min{ψ(u) + ϕ(Au − b) : u ∈ Rn}.    (8)

As mentioned earlier, we wish to apply the fast algorithm FISTA to find a solution of problem (8). The use of this algorithm requires the regularity condition that the gradient of one of the two terms in the objective function be Lipschitz continuous [13]. This requirement prevents us from using FISTA directly to solve (8) in the setting of compressed sensing, because in that context neither of the two terms in (8) has the required regularity. This point leads us to modify the original model. For this purpose, we choose a vector uapp in Rn and add the quadratic term (1/(2µ))∥ · −uapp∥² to the objective function of model (8). This leads to a new optimization problem

    min{ ψ(u) + ϕ(Au − b) + (1/(2µ))∥u − uapp∥² : u ∈ Rn },    (9)

where µ is a positive number. In particular, when ψ and ϕ are chosen to be ∥ · ∥1 and ι{0}, respectively, model (8) reduces to model (3) while model (9) becomes model (4). Although (9) still does not satisfy the regularity condition required for the use of FISTA, we shall show later that its dual formulation does.


Further, we shall show that the Lipschitz constant of the gradient of the term having better regularity in the dual formulation is independent of the introduced parameter µ. This implies that the introduced parameter will not change the Lipschitz constant and therefore will not affect the convergence speed of the resulting algorithm.

The approximation of model (9) to model (8) is discussed in the next proposition. We define F := ψ + ϕ(A · −b).

Proposition 2.1. Let A ∈ Rm×n, b ∈ Rm, uapp ∈ Rn, ψ ∈ Γ0(Rn), ϕ ∈ Γ0(Rm), and µ a positive number. Suppose that the inclusion

    b ∈ A(dom ψ) − dom ϕ    (10)

holds. If u⋆ and u⋄ are minimizers of models (8) and (9), respectively, then

    0 ≤ F(u⋄) − F(u⋆) ≤ (1/(2µ))∥u⋆ − uapp∥².    (11)

Proof. The inclusion (10) ensures that both F(u⋄) and F(u⋆) are finite. Therefore, the difference F(u⋄) − F(u⋆) is well-defined. The first inequality in (11) holds because u⋆ is a solution to the minimization problem (8), while the second inequality holds because u⋄ is a solution to the minimization problem (9). □

The implication of Proposition 2.1 is twofold. On one hand, it implies that when µ in (9) is large enough (that is, when the term (1/(2µ))∥u⋆ − uapp∥² is small enough), the solution to (9) can be viewed as a reasonable approximation to the solution of (8) (in the sense of the objective function value). On the other hand, the solution to (9) can also be viewed as a reasonable approximation to the solution of (8) when the vector uapp is close enough to a solution of (8). Therefore, with a properly chosen parameter µ and vector uapp, we resort to finding a solution of model (9), due to several promising properties of the model that we describe now.

First, we establish the existence and uniqueness of the solution of model (9).

Proposition 2.2. Let A ∈ Rm×n, b ∈ Rm, uapp ∈ Rn, ψ ∈ Γ0(Rn), ϕ ∈ Γ0(Rm), and µ a positive number. If the inclusion (10) holds, then the optimization problem (9) has a unique solution in Rn.

Proof. The inclusion (10) implies that the domain of the objective functional of model (9) is nonempty. Since ψ ∈ Γ0(Rn) and ϕ ∈ Γ0(Rm), the objective function is convex. The uniqueness of the minimizer follows from the fact that the objective function is strictly convex. To show the existence of a solution to the minimization problem, it suffices to show that the objective function is coercive. This is true because ψ + ϕ(A · −b) is in Γ0(Rn) and the quadratic term (1/(2µ))∥ · −uapp∥² is supercoercive. □

Next, we show that although the objective function of model (9) is itself not differentiable, its conjugate is differentiable and the gradient of the conjugate is Lipschitz continuous. To this end, we review below the concepts of the proximity operator, the Moreau envelope, and the conjugate function. The proximity operator was introduced in [14,15]. The proximity operator of a function f ∈ Γ0(Rd) with parameter λ, denoted by proxλf, is a mapping from Rd to itself, defined for a given point x ∈ Rd by

    proxλf(x) := argmin{ (1/(2λ))∥u − x∥² + f(u) : u ∈ Rd }.

The Moreau envelope of the function f with parameter λ at x ∈ Rd is defined by

    envλf(x) := min{ (1/(2λ))∥u − x∥² + f(u) : u ∈ Rd }.

It can be verified that envλf ∈ Γ0 (Rd ). Furthermore, the Moreau envelope of f is differentiable and its gradient

    ∇envλf(x) = (1/λ)(I − proxλf)(x),   x ∈ Rd,    (12)

is Lipschitz continuous with Lipschitz constant 1/λ. This is because I − proxλf is Lipschitz continuous with Lipschitz constant 1 (see, for example, [16,17] for details). The conjugate of f ∈ Γ0(Rd) is the function f∗ ∈ Γ0(Rd) defined by

    f∗ : u → sup{⟨x, u⟩ − f(x) : x ∈ Rd}.

The proximity operator, the Moreau envelope, and the conjugate function of a function f ∈ Γ0(Rd) are linked together via the Moreau decomposition

    (1/(2λ))∥ · ∥² = envλf(·) + env(1/λ)f∗(·/λ)    (13)

and

    I = proxλf + λ prox(1/λ)f∗ ∘ ((1/λ)I).    (14)

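As a quick illustration of the decomposition (14), consider f = ∥ · ∥1: its proximity operator is the soft-thresholding operator, and its conjugate is the indicator of the unit ℓ∞ ball, whose proximity operator is a componentwise clipping. The following MATLAB fragment is a minimal numerical check of (14) under these choices; the variable names are illustrative only.

```matlab
% Numerical check of the Moreau decomposition (14) for f = ||.||_1.
% prox_{lambda f} is soft-thresholding; f* is the indicator of the unit
% l_inf ball, so prox_{(1/lambda) f*} is the projection onto that ball.
lambda = 0.7;
x      = randn(10,1);
soft   = @(z,t) sign(z).*max(abs(z)-t,0);        % prox_{t||.||_1}(z)
projB  = @(z)   min(max(z,-1),1);                % projection onto {||z||_inf <= 1}
rhs    = soft(x,lambda) + lambda*projB(x/lambda);% right-hand side of (14)
disp(norm(x - rhs));                             % should be (numerically) zero
```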

From these two identities, we see that the proximity operator (or the Moreau envelope) of f can be computed easily as long as the proximity operator (or the Moreau envelope) of the conjugate f∗ of f can be computed easily, and vice versa.

Now we can show that the sum of the first and the third terms of the objective function of model (9) has a differentiable conjugate.

Lemma 2.3. For ψ ∈ Γ0(Rn), uapp ∈ Rn, and a positive number µ, define

    f := µψ + (1/2)∥ · −uapp∥².

Then the conjugate f∗ is differentiable and its gradient is Lipschitz continuous with Lipschitz constant 1. Moreover,

    f∗ = env(µψ)∗(· + uapp) − (1/2)∥uapp∥².    (15)

Proof. From the definition of the conjugate function and the Moreau envelope, we have that

    f∗ = sup{ ⟨·, u⟩ − (µψ)(u) − (1/2)∥u − uapp∥² : u ∈ Rn }
       = −(1/2)∥uapp∥² + (1/2)∥ · +uapp∥² − min{ (µψ)(u) + (1/2)∥u − (· + uapp)∥² : u ∈ Rn }
       = −(1/2)∥uapp∥² + (1/2)∥ · +uapp∥² − envµψ(· + uapp).

Applying the Moreau decomposition (13) with λ = 1 to the last two terms of the above equation yields (15). From Eq. (15) we conclude that f∗ is differentiable. By (12), we know that the gradient of f∗ is Lipschitz continuous with constant 1. □

The above result can also be understood from the viewpoint of strong convexity. A function f ∈ Γ0(Rd) is strongly convex with constant β if f − (β/2)∥ · ∥² is in Γ0(Rd) as well. A useful fact is that for a strongly convex function with constant β its conjugate function is differentiable and the gradient of the conjugate is Lipschitz continuous with Lipschitz constant 1/β (see, e.g., [18]). Notice that the function µψ + (1/2)∥ · −uapp∥² in Lemma 2.3 can be written as µψ − ⟨·, uapp⟩ + (1/2)∥uapp∥² + (1/2)∥ · ∥², and the function µψ − ⟨·, uapp⟩ + (1/2)∥uapp∥² is convex; that is, µψ + (1/2)∥ · −uapp∥² is strongly convex with constant β = 1. Hence, the conclusion of Lemma 2.3 follows immediately.

We now turn to a study of the solution of model (9). The approach we adopt is to obtain the solution via the solution of the dual formulation of model (9). Prior to our presentation, we review two concepts in convex analysis: the strong relative interior of a subset of Rd and the subdifferential of a function in Γ0(Rd). For a subset C of Rd, the smallest cone in Rd containing C is called the conical hull of C and is denoted by cone C. The strong relative interior of C is

    sri(C) := {x : x ∈ C and cone(C − x) = span(C − x)}.

The subdifferential of a function f ∈ Γ0(Rd) at a given vector x ∈ Rd is the set defined by

∂ f (x) := {y : y ∈ Rd and f (z ) ≥ f (x) + ⟨y, z − x⟩ for all z ∈ Rd }. In particular, ∂ f (x) = {∇ f (x)} if the function f is differentiable at x. The subdifferential and the proximity operator of the function f ∈ Γ0 (Rd ) are intimately related. Specifically, for x in the domain of f and y ∈ Rd we have that y ∈ ∂ f (x) if and only if

    x = proxf(x + y).    (16)
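As a simple one-dimensional illustration of relation (16), take f = | · | and x = 2, so that ∂f(x) = {1}; then x = proxf(x + 1), since soft-thresholding 3 with threshold 1 returns 2. A minimal MATLAB check of this reads as follows.

```matlab
% Relation (16) for f = |.| at x = 2, where the subdifferential is {1}:
soft = @(z,t) sign(z).*max(abs(z)-t,0);   % prox of t*|.|
x = 2;  y = 1;
disp(soft(x + y, 1));                      % prints 2, i.e. equals x
```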

For a discussion of this relation, see [19] or [20, Proposition 16.34]. Applications of this relation in image processing are reported in our recent work [21,22]. With the help of this notation, we are able to present the well-known Fenchel–Rockafellar duality theorem.

Lemma 2.4 (Fenchel–Rockafellar Duality Theorem). Let f ∈ Γ0(Rn), g ∈ Γ0(Rm), and A ∈ Rm×n be such that 0 ∈ sri(dom g − A(dom f)). If x ∈ Rn is a solution of the (primal) problem min{f(u) + g(Au) : u ∈ Rn} and y ∈ Rm is a solution of its dual problem min{f∗(A⊤v) + g∗(−v) : v ∈ Rm}, then x ∈ ∂f∗(A⊤y).


We next present our main result for model (9). Clearly, the set of solutions to model (9) is identical to the set of solutions to the optimization problem

    min{ µψ(u) + µϕ(Au − b) + (1/2)∥u − uapp∥² : u ∈ Rn }.    (17)

Consider the problem

    min{ −(1/2)∥uapp∥² + env(µψ)∗(A⊤v + uapp) + (µϕ)∗(−v) − ⟨v, b⟩ : v ∈ Rm }.    (18)

We shall study the relation between the two problems stated above.

Proposition 2.5. Let ψ ∈ Γ0(Rn), ϕ ∈ Γ0(Rm), uapp ∈ Rn, b ∈ Rm, A ∈ Rm×n, and µ be a positive number. Suppose that −b ∈ sri(dom ϕ − A(dom ψ)). If v ∈ Rm is a solution of problem (18), then proxµψ(A⊤v + uapp) is the unique solution of model (17).

Proof. Set

    f = µψ + (1/2)∥ · −uapp∥²   and   g = µϕ(· − b).

Clearly, f ∈ Γ0(Rn) and g ∈ Γ0(Rm). By Proposition 2.2, model (17) has a unique solution, which we denote by u⋆. Since dom g = b + dom ϕ and dom f = dom ψ, one has that dom g − A(dom f) = b + dom ϕ − A(dom ψ). By the assumption −b ∈ sri(dom ϕ − A(dom ψ)), one has that 0 ∈ sri(b + dom ϕ − A(dom ψ)) = sri(dom g − A(dom f)). By Lemma 2.3, we know that f∗ = env(µψ)∗(· + uapp) − (1/2)∥uapp∥². Using the definition of the conjugate function, we deduce that g∗ = (µϕ)∗ + ⟨·, b⟩. By the Fenchel–Rockafellar duality theorem, problem (18) is the dual problem of (17). Hence, u⋆ ∈ ∂f∗(A⊤v). Since the conjugate function f∗ is differentiable, we actually have that u⋆ = ∇f∗(A⊤v). By Eq. (12) we have that

    ∇f∗ = (I − prox(µψ)∗)(· + uapp).

Further, by the Moreau decomposition (14), we have that

    ∇f∗ = proxµψ(· + uapp).

Putting the above arguments together, we have that u⋆ = proxµψ(A⊤v + uapp). This completes the proof. □

Proposition 2.5 demonstrates how we can compute a solution of the primal problem (17) once a solution of the dual problem (18) is given. However, we need information on whether a solution of the dual problem (18) exists. We shall discuss the existence of solutions of the dual formulation in the next section.

3. Solution of the approximate models

In this section, we apply Proposition 2.5 to the approximate compressed sensing model with an inequality or equality constraint, provide a characterization of a solution of the dual formulation, and investigate the relationship between a solution of the approximate model and a solution of the dual formulation. In particular, the solution of the generalized approximate model for model (4) is given in terms of the solution of its dual formulation via the proximity operator.

3.1. Compressed sensing with noise-free data

In an ideal situation, we expect the collected data to be noise free, that is, the observed data is collected via measurement without perturbation by noise: b = Au. Under the assumption that the underlying signal u is sparse, we can recover the signal by solving the basis pursuit problem (3). As mentioned earlier, the basis pursuit problem is the special case of (7) in which C is chosen as {0}; the parameter ϵ indicating the noise power is 0. Correspondingly, in the context of (8), ψ = ∥ · ∥1 and ϕ = ι{0}. By the definition of the Fenchel conjugate, it is easily deduced that (µϕ)∗ = 0.


Applying Proposition 2.1 to models (3) and (4), we obtain the following result.

Corollary 3.1. Let A ∈ Rm×n, b ∈ Rm, uapp ∈ Rn, and µ be a positive number. Suppose that b ∈ A(Rn). If u⋆ and u⋄ are minimizers of models (3) and (4), respectively, then

    0 ≤ ∥u⋄∥1 − ∥u⋆∥1 ≤ (1/(2µ))∥u⋆ − uapp∥².

Proof. Define ψ := ∥ · ∥1 and ϕ := ι{0}. By Proposition 2.1, it suffices to verify the inclusion condition of Proposition 2.1. Clearly, dom ψ = Rn and dom ϕ = {0}. Hence, the assumption b ∈ A(Rn) is identical to b ∈ A(dom ψ) − dom ϕ. The result therefore follows. □

To apply Proposition 2.5 to model (4), we need to study an auxiliary minimization problem. For v ∈ Rm we let

    J(v) := −(1/2)∥uapp∥² + env(µ∥·∥1)∗(A⊤v + uapp) − ⟨v, b⟩

and we consider the auxiliary minimization problem

    min{ J(v) : v ∈ Rm }.    (19)

We first investigate the function J.

Lemma 3.2. If uapp ∈ Rn, b ∈ Rm, and A ∈ Rm×n, then J ∈ Γ0(Rm). Furthermore, if rank(A) = m ≤ n, i.e., A is a full row rank matrix, then J is coercive.

Proof. Since env(µ∥·∥1)∗(A⊤ · +uapp) ∈ Γ0(Rm) and −⟨·, b⟩ is linear, the function J is clearly in Γ0(Rm). By the Moreau decomposition (13), one can rewrite the function J as

    J(v) = −(1/2)∥uapp∥² + (1/2)∥A⊤v + uapp∥² − envµ∥·∥1(A⊤v + uapp) − ⟨v, b⟩.

The definition of the Moreau envelope ensures that envµ∥·∥1(A⊤v + uapp) ≤ µ∥A⊤v + uapp∥1. Applying the triangle inequality to ∥A⊤v + uapp∥1 and using the fact that ∥A⊤v∥1 ≤ √n ∥A⊤v∥, we derive that

    envµ∥·∥1(A⊤v + uapp) ≤ µ√n ∥A⊤∥₂ ∥v∥ + µ∥uapp∥1.

It follows, after some simplification, that

    J(v) ≥ (1/2)∥A⊤v∥² − (∥Auapp∥ + µ√n ∥A⊤∥₂ + ∥b∥)∥v∥ + ((1/2)∥uapp∥² − µ∥uapp∥1).

Furthermore, since rank(A) = m ≤ n, the matrix AA⊤ is positive definite and thus all of its eigenvalues are positive. Let λmin be its smallest eigenvalue. Hence, ∥A⊤v∥² ≥ λmin∥v∥². Set

    C1 := ∥Auapp∥ + µ√n ∥A⊤∥₂ + ∥b∥   and   C2 := −µ∥uapp∥1.

We have that

    J(v) ≥ (λmin/2)∥v∥² − C1∥v∥ + C2.

Clearly, we have that lim_{∥v∥→+∞} J(v) = +∞. Thus, J is coercive. □

In the next lemma we present a formula for env(µ∥·∥1)∗.

Lemma 3.3. If µ is a positive number, then for any u = (u1, u2, . . . , ud) ∈ Rd,

    env(µ∥·∥1)∗(u) = ∑_{|ui|≥µ} (1/2)(|ui| − µ)².    (20)

Proof. By the Moreau decomposition (13), we know that

    env(µ∥·∥1)∗(u) = (1/2)∥u∥² − envµ∥·∥1(u).    (21)

A direct computation shows that, for any number x ∈ R,

    envµ|·|(x) = µ|x| − (1/2)µ²   if |x| ≥ µ,   and   envµ|·|(x) = (1/2)|x|²   otherwise.    (22)


Using formula (22), we obtain a formula for envµ∥·∥1(u):

    envµ∥·∥1(u) = ∑_{|ui|≥µ} ( µ|ui| − (1/2)µ² ) + ∑_{|ui|<µ} (1/2)|ui|².

Substituting this equation into the right-hand side of (21) gives the desired formula. □
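The formulas of Lemma 3.3 are easy to verify numerically: one can evaluate (20) directly and compare it with the right-hand side of (21), computing the Moreau envelope of µ∥ · ∥1 componentwise from the Huber-type expression (22). A minimal MATLAB check along these lines is sketched below; the variable names are illustrative.

```matlab
% Compare formula (20) with (21)-(22) for env_{(mu||.||_1)*}.
mu  = 1.5;
u   = 3*randn(8,1);
hub = @(x) (abs(x) >= mu).*(mu*abs(x) - mu^2/2) + (abs(x) < mu).*(x.^2/2);  % (22)
env_l1 = sum(hub(u));                                 % env_{mu||.||_1}(u)
via_20 = sum((abs(u) >= mu).*(abs(u) - mu).^2)/2;     % formula (20)
via_21 = norm(u)^2/2 - env_l1;                        % formula (21)
disp(abs(via_20 - via_21));                           % should be ~0
```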



We now study the auxiliary minimization problem (19).

Lemma 3.4. Let uapp ∈ Rn, b ∈ Rm, and A ∈ Rm×n. If b is an element of the range of A, that is, there exists a vector u0 ∈ Rn such that Au0 = b, then the set of solutions of problem (19) is nonempty.

Proof. By Lemma 3.2, J ∈ Γ0(Rm). Notice that the objective function can be written as

    J(v) = −(1/2)∥uapp∥² + env(µ∥·∥1)∗(A⊤v + uapp) − ⟨A⊤v, u0⟩.

Let N(A⊤) denote the null space of A⊤ and (N(A⊤))⊥ the orthogonal complement of N(A⊤). We consider two cases: v ∈ N(A⊤) and v ∈ (N(A⊤))⊥. For all v ∈ N(A⊤), we immediately have that

    J(v) = −(1/2)∥uapp∥² + env(µ∥·∥1)∗(uapp).

Now, take an arbitrary nonzero unit vector p ∈ (N(A⊤))⊥. Then the index set S := {i : (A⊤p)i ≠ 0} is nonempty. Hence, when a positive number α is large enough, we have that

    S = {i : |(αA⊤p + uapp)i| ≥ µ}.

By Lemma 3.3, for a large enough α we have that

    J(αp) = −(1/2)∥uapp∥² + (1/2)∑_{i∈S} (|(αA⊤p + uapp)i| − µ)² − α⟨p, b⟩.

Clearly, lim_{α→+∞} J(αp) = +∞; that is, J is coercive on (N(A⊤))⊥. Thus, J achieves its minimal value on (N(A⊤))⊥. In summary, the set of solutions to problem (19) is nonempty. □

We remark that the condition in Lemma 3.4 is satisfied when rank(A) = m ≤ n, that is, when A is a full row rank matrix. In this scenario, we point out that there is an alternative way to show the nonemptiness of the set of solutions of problem (19): it suffices to show the coercivity of its objective function, which is an immediate consequence of Lemma 3.2.

Proposition 3.5. Let uapp ∈ Rn, A ∈ Rm×n, and µ > 0. If b is an element of the range of A, then the following statements hold.
(i) The gradient of the objective functional of problem (19) is Lipschitz continuous with constant ∥A∥₂², which is the largest eigenvalue of A⊤A.
(ii) If v ∈ Rm is a solution of problem (19), then proxµ∥·∥1(A⊤v + uapp) is the unique solution of model (4).

Proof. We first prove (i). It follows from (12) and the chain rule of the subdifferential that

    ∇J(v) = A(I − prox(µ∥·∥1)∗)(A⊤v + uapp) − b.

Using the Moreau decomposition (14), we simply have

    ∇J(v) = A proxµ∥·∥1(A⊤v + uapp) − b.

Hence, (i) follows immediately from the nonexpansivity of the proximity operator proxµ∥·∥1.

Now, we prove (ii) by using Proposition 2.5. To this end, we verify the hypotheses of Proposition 2.5. Set ψ := ∥ · ∥1 and ϕ := ι{0}. Then the solutions of model (4) coincide with those of model (17). It is clear that dom ψ = Rn and dom ϕ = {0}. Further, because there exists a vector u0 ∈ Rn such that Au0 = b, we have cone(A(Rn) + b) = span(A(Rn) + b). Note that A(Rn) = −A(Rn) and dom ϕ − A(dom ψ) = −A(Rn). Hence, −b ∈ sri(dom ϕ − A(dom ψ)). By the definition of the conjugate function, we have that (µϕ)∗ = 0. This means that model (19) is exactly model (18) in the current situation. Moreover, all other requirements of Proposition 2.5 are satisfied. Thus, (ii) follows from Proposition 2.5. □

3.2. Compressed sensing with noisy data

In realistic compressed sensing, the measurements that we collect are usually corrupted by noise. Unlike the ideal data collection model (1), the actual model has the form

    b = Au + noise,    (23)


where A ∈ Rm×n, u ∈ Rn, b ∈ Rm, and the noise may be assumed to be normally distributed with mean 0. As a result, the optimization problems (2)–(4) are no longer suitable for the model (23). A common practice in compressed sensing is to reconstruct the ideal signal u in (23) by solving

    min{∥u∥1 : u ∈ Rn and ∥Au − b∥ ≤ ϵ},    (24)

where ϵ is an estimated upper bound of the noise power. Many different approaches have been proposed to solve the optimization problem (24). For example, in [12], the solution of problem (24) was obtained by smoothing the ℓ1 norm and then applying Nesterov's algorithm. Frequently, the Lagrange form of the constrained optimization problem (24),

    min_u { λ∥u∥1 + (1/2)∥Au − b∥² },    (25)

is considered. Here, λ is the regularization parameter. Studies of model (25) and its various solvers appear in [23–25,11]. We remark that model (24) is more natural than model (25), since the noise power ϵ is easier to determine than the regularization parameter λ in (25).

We now apply the idea discussed in the previous subsection to the current scenario of measurements with noise. To this end, we define the ball centered at the origin with radius ϵ in Rm as Bϵ := {v : v ∈ Rm and ∥v∥ ≤ ϵ}. With this notation, model (24) becomes min{∥u∥1 + ιBϵ(Au − b) : u ∈ Rn}. Therefore, for noisy data we propose to study the following model

    min{ µ∥u∥1 + ιBϵ(Au − b) + (1/2)∥u − uapp∥² : u ∈ Rn },    (26)

where µ is a regularization parameter and uapp is a pre-estimated solution that is close enough to a solution of (24). As an immediate consequence of Proposition 2.1 we obtain the following result.

Corollary 3.6. Let A ∈ Rm×n, b ∈ Rm, uapp ∈ Rn, and µ be a positive number. Suppose that b ∈ A(Rn) − Bϵ. If u⋆ and u⋄ are minimizers of models (24) and (26), respectively, then

    0 ≤ ∥u⋄∥1 − ∥u⋆∥1 ≤ (1/(2µ))∥u⋆ − uapp∥².

Proof. Define ψ := ∥ · ∥1 and ϕ := ιBϵ. Clearly, dom ψ = Rn and dom ϕ = Bϵ. Hence, b ∈ A(dom ψ) − dom ϕ. Furthermore, since u⋆ and u⋄ are minimizers of models (24) and (26), respectively, we have that Au⋆ − b ∈ Bϵ and Au⋄ − b ∈ Bϵ. By Proposition 2.1, the conclusion of this corollary holds. □

Lemma 3.7. Let ψ = ∥ · ∥1 be defined on Rn, let ϕ = ιBϵ be defined on Rm, and let A ∈ Rm×n be of full row rank. If −b ∈ dom ϕ − A(dom ψ), then −b ∈ sri(dom ϕ − A(dom ψ)).

Proof. We know that dom ϕ = Bϵ and dom ψ = Rn. Since A is of full row rank, A(dom ψ) = A(Rn) = Rm. As a result, we obtain that dom ϕ − A(dom ψ) = Bϵ − Rm = Rm. Clearly, cone(Rm + b) = span(Rm + b) = Rm. By the definition of the strong relative interior, −b ∈ sri(dom ϕ − A(dom ψ)). □

Note that for every positive number µ, µιBϵ = ιBϵ, and that the conjugate function of ιBϵ is

    (ιBϵ)∗ = ϵ∥ · ∥.    (27)
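Identity (27) simply states that the conjugate of the indicator of Bϵ is the support function of that ball, and the supremum of ⟨z, v⟩ over ∥z∥ ≤ ϵ is attained at z = ϵv/∥v∥. The following short MATLAB check illustrates this for a nonzero v; the values chosen are arbitrary.

```matlab
% The conjugate of the indicator of B_eps is its support function, eps*||v||.
ep = 0.3;  v = randn(5,1);
zstar = ep*v/norm(v);                 % maximizer of <z,v> over ||z|| <= ep
disp([zstar'*v, ep*norm(v)]);         % the two numbers agree
```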

Now, we present a theoretical result on the solution of the minimization problem (26) via a solution of its dual problem. We let J̃(v) = J(v) + ϵ∥v∥ and consider the problem

    min{ J̃(v) : v ∈ Rm }.    (28)


Proposition 3.8. Let uapp ∈ Rn, A ∈ Rm×n, and let both µ and ϵ be positive numbers. Suppose that b ∈ A(Rn) − Bϵ and that the matrix A has full row rank. Then the following statements hold:
(i) The set of solutions to problem (28) is nonempty;
(ii) If v ∈ Rm is a solution of problem (28), then proxµ∥·∥1(A⊤v + uapp) is the unique solution of model (26).

Proof. (i) By Lemma 3.2, the function J is convex and coercive. So is the function J̃. Hence, (i) follows.
(ii) Set ψ := ∥ · ∥1 and ϕ := ιBϵ. By Lemma 3.7, −b ∈ sri(dom ϕ − A(dom ψ)). By the definition of the conjugate function, we have that (µϕ)∗ = ϵ∥ · ∥. This means that model (28) is exactly model (18) in the current situation. Moreover, all requirements of Proposition 2.5 are satisfied. Hence proxµ∥·∥1(A⊤v + uapp) is a solution of model (26). The uniqueness of the solution to model (26) follows from Proposition 2.2. We conclude that (ii) follows. □

The requirement that the matrix A be of full row rank is often satisfied in compressed sensing. The partial discrete cosine transform matrix, one such instance, will be explored in Section 5.

4. Numerical algorithms

We present in this section numerical algorithms for the dual formulation via a variant of Nesterov's acceleration scheme. Based on the relationship between a solution of the approximate model and a solution of the dual formulation, this gives an efficient way to find a solution of the approximate model. Specifically, Propositions 3.5 and 3.8 provide a method to find the solution of the proposed approximate models: we first find a solution v of the dual formulation of the approximate model, and then identify the solution u of the approximate model as u = proxµ∥·∥1(A⊤v + uapp). We shall demonstrate that this method is efficient.

We shall apply a modified version (FISTA) of Nesterov's algorithm, which was proposed in [13], to find solutions of the dual formulation of the approximate model. We first recall FISTA, which solves the following minimization problem:

    min{ f(x) + g(x) : x ∈ Rn }.    (29)

In the above equation, g : Rn → R is continuous, convex but possibly nonsmooth and f : Rn → R is convex, continuously differentiable, and its gradient is Lipschitz continuous with constant L, that is,

    ∥∇f(x) − ∇f(y)∥ ≤ L∥x − y∥   for all x, y ∈ Rn.

The algorithm FISTA may be described as follows.

Algorithm 1 (FISTA).
Initialization: y1 = x0 ∈ Rn, k = 1, t1 = 1.
repeat
  Step 1: Compute ∇f(yk).
  Step 2: Compute xk:
      xk = argmin{ g(x) + (L/2)∥x − (yk − (1/L)∇f(yk))∥² : x ∈ Rn }.
  Step 3: Compute tk+1:
      tk+1 = (1 + √(1 + 4tk²))/2.
  Step 4: Compute yk+1:
      yk+1 = xk + ((tk − 1)/tk+1)(xk − xk−1).
until "convergence"

For the sequences {xk} and {yk} generated by FISTA, it was shown in [13] that for any k ≥ 1,

    F(xk) − F(x⋆) ≤ (2L/(k + 1)²)∥x0 − x⋆∥²,

where F := f + g and x⋆ is a solution of the minimization problem (29). As exhibited in the above inequality, FISTA has a remarkable convergence rate O(1/k²). We remark that in Step 2, xk can be written in terms of the proximity


operator as

    xk = prox(1/L)g( yk − (1/L)∇f(yk) ).

Next, we apply the algorithm FISTA to the dual formulation of the modified model. We first consider the noise-free case. Let us look at the objective function J of the dual formulation (19). We already know from the proof of Proposition 3.5 that J is differentiable and that its gradient, given by

    ∇J(v) = A proxµ∥·∥1(A⊤v + uapp) − b,

is Lipschitz continuous with constant ∥A∥₂². In this case, we take f := J and g := 0 in FISTA. For such a choice of g, prox(1/L)g is the identity operator. Applying FISTA to J, we obtain the following algorithm for the noise-free case.

Algorithm 2 (Nesterov's acceleration scheme for the noise-free case).
Initialization: v0 ∈ Rm, y1 = v0, t1 = 1, and L = ∥A∥₂².
repeat (k ≥ 1)
  Step 1: Compute vk:
      vk = yk − (1/∥A∥₂²)( A proxµ∥·∥1(A⊤yk + uapp) − b ).
  Step 2: Compute tk+1:
      tk+1 = (1 + √(1 + 4tk²))/2.
  Step 3: Compute yk+1:
      yk+1 = vk + ((tk − 1)/tk+1)(vk − vk−1).
until a given stopping criterion is met
The solution of the approximate model (4): u⋄ = proxµ∥·∥1(A⊤vk + uapp).

We remark that the proximity operator proxµ∥·∥1 appearing in Algorithm 2 can be computed explicitly. Indeed, proxµ∥·∥1 is the soft-thresholding operator from wavelet analysis and is given, at x ∈ Rn, by

    (proxµ∥·∥1(x))i = max{|xi| − µ, 0} sign(xi),    (30)

for i = 1, 2, . . . , n.

For the case of noisy observed data, the objective function in (28) becomes J̃(v) = J(v) + ϵ∥v∥. As we discussed earlier in this section, J has a Lipschitz continuous gradient. We further notice that there is an explicit expression for the proximity operator of ϵ∥ · ∥, namely

    proxϵ∥·∥(v) = 0   if ∥v∥ ≤ ϵ,   and   proxϵ∥·∥(v) = v − ϵ v/∥v∥   otherwise,

see, e.g., [19]. The minimizer of J̃ can be computed by using FISTA with f := J and g := ϵ∥ · ∥. As a result, we can develop a procedure similar to Algorithm 2 for noisy observed data. In the case of noisy data, the formula for computing vk in Algorithm 2 becomes

    vk = prox(ϵ/∥A∥₂²)∥·∥( yk − (1/∥A∥₂²)( A proxµ∥·∥1(A⊤yk + uapp) − b ) ).
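Both proximity operators appearing in these updates have closed forms, so each dual iteration costs essentially one multiplication by A and one by A⊤. The following MATLAB handles sketch how the two operators and the noisy-data update can be coded; the variable names (A, b, uapp, mu, epsilon, yk, L) are illustrative, and the update line is shown as a comment since it assumes those variables are already defined.

```matlab
% Closed-form proximity operators used in the dual updates.
soft     = @(x,t) sign(x).*max(abs(x)-t,0);          % (30): prox_{t||.||_1}
prox_nrm = @(v,t) v*max(1 - t/max(norm(v),t), 0);    % prox_{t||.||}: 0 if ||v|| <= t
% One dual step for noisy data, with L = norm(A)^2 and A, b, uapp, mu, epsilon, yk given:
% vk = prox_nrm(yk - (A*soft(A'*yk + uapp, mu) - b)/L, epsilon/L);
% The noise-free update of Algorithm 2 is the same line without the outer prox_nrm.
```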

If a convergence criterion is satisfied, u⋄ = proxµ∥·∥1(A⊤vk + uapp) is obtained, and it can be considered a solution of the approximate model (26).

In view of Corollary 3.1, one can see that the closer uapp is to the solution of the original problem (3), the better the solution u⋄ of (4) approximates the solution u⋆ of (3). Hence a good approximate solution uapp of the original problem (3) is desirable. Let us also discuss the role of the parameter µ. On one hand, the larger µ is chosen, the smaller the error ∥u⋄∥1 − ∥u⋆∥1 becomes. On the other hand, the parameter µ balances the terms ∥ · ∥1 and ∥ · −uapp∥² in (4). If uapp is far from the solution that we expect, the parameter µ should be relatively large; otherwise, the parameter µ should be chosen smaller. As a result, the choice of the parameter µ depends on the closeness of uapp to an expected solution. This point will be illustrated in the numerical experiment section. Based on these observations, we propose below a procedure that successively decreases the parameter µ and updates the initial guess uapp.


Algorithm 3 (A continuation-like version of Algorithm 2).
Initialization: uapp = 0, a given scaling factor 0 < γ < 1, a given parameter µ, and a positive integer T.
repeat
  Apply Algorithm 2 with the given uapp and µ, and denote the output of the algorithm by u⋄.
  Reset uapp = u⋄ and µ ← γµ.
until the number of repetitions exceeds T.

In the same way, we can propose a similar procedure of successively decreasing the parameter µ and updating the initial guess uapp appearing in model (26). The procedure results in a continuation-like algorithm for noisy data.

5. Numerical experiments

In this section, we apply the continuation-like algorithms to solve our proposed approximate models in compressed sensing for both the noise-free case and the noisy case. We also provide comparisons of our proposed algorithm with the state-of-the-art algorithm NESTA introduced in [12]. The experiments were performed in Matlab 7.11 on a Thinkpad T400 with an Intel Core Duo CPU @ 2.26 GHz and 3 GB RAM, running the Windows Vista Home Basic operating system.

We first describe the sensing matrix A and the sparse signals u. In our experiments, the m × n sensing matrix A is generated by randomly picking m rows from the n × n discrete cosine transform (DCT) matrix. Note that the n × n DCT matrix can be created by the Matlab command dct(eye(n)), and the norm ∥A∥ of the matrix is always 1. The signals u used in our experiments are generated according to [12]. They are s-sparse, that is, there are exactly s nonzero components in each signal. Each nonzero entry is generated as

    η1 · 10^(α η2),    (31)

where η1 = ±1 with probability 1/2 and η2 is uniformly distributed in [0, 1]. Clearly, the dynamic range of the amplitudes of the nonzero entries of an s-sparse signal is [0, 10^α], with the parameter α controlling this dynamic range. The accuracy of our numerical solutions is quantified by the relative ℓ2-error, the relative ℓ1-error, and the absolute ℓ∞-error, defined as

    ∥u − u⋄∥ / ∥u∥,   |∥u∥1 − ∥u⋄∥1| / ∥u∥1,   ∥u − u⋄∥∞,

respectively, where u is the true data and u⋄ is the restored data.

5.1. The case of noise-free data

In the last section, we addressed the issues arising when the proposed Algorithm 2 is applied to the dual formulation of model (17). In particular, we discussed the effect of varying the parameter µ and of selecting the vector uapp appearing in the proposed algorithm. We now illustrate this point by numerical experiments. In the current setting, the measurement b is the vector Au, and our goal is to recover the sparse signal u from the measurement b. We set n = 512, m = 256, s = 0.05n (rounded up to the nearest integer), and α = 5, and generate sparse signals whose nonzero components are computed with (31). We test two different choices of uapp, namely, uapp being the 0-sparse (zero) vector and uapp being the exact sparse signal to be recovered. We remark that the second choice of uapp is not practical; it is used only to show the behavior of the proposed model when uapp is close to the ideal data. The parameter µ is chosen to be 10^3, 10^4, 10^5, 10^6, 10^7 for each choice of uapp. We run Algorithm 2 with the stopping criteria: the number of iterations exceeds 2000 or

    ∥Au⋄ − b∥ / ∥b∥ < 10^−5 / ∥A⊤b∥∞,

and obtain u⋄, the solution of (4). For each of the given values of the parameter µ, we repeat the test 100 times, using a different sensing matrix (randomly generated from the n × n DCT matrix) each time as well as a new random instance of the testing data.

Table 1 displays the mean and the standard deviation of the number of iterations and of the errors of the recovered signals measured by the above three error metrics when Algorithm 2 is applied with uapp being the 0-sparse data. In this table, we use the pair (·, ·) to represent the mean and the standard deviation of the results obtained from the 100 tests. The same pair notation will be used in the other tables in this paper, with the number of tests indicated in the specific context. We can see that the model with a relatively large parameter µ usually leads to a better result with the proposed algorithm. This phenomenon matches the result of Corollary 3.1. In the other case, where uapp happens to be the true data, as shown in Table 2, the model with a relatively small parameter µ usually leads to a better result with the proposed algorithm. Further, we observe that the algorithm with a larger parameter µ always requires more iterations to fulfill a pre-given stopping criterion. This can be intuitively explained from the step of computing vk in Algorithm 2. To have fast convergence, a relatively small change is expected in vk after a certain number of iterations. This requires that proxµ∥·∥1(A⊤yk + uapp) should not change dramatically relative to A⊤yk + uapp.
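For concreteness, the experimental setup just described (a partial DCT sensing matrix, an s-sparse signal with nonzero entries generated as in (31), and Algorithm 2 applied to the dual problem) can be sketched in MATLAB as follows. The sketch assumes the dct function mentioned above (Signal Processing Toolbox); the fixed iteration count and the variable names are illustrative and do not reproduce the exact stopping rule used for the tables.

```matlab
% Sketch of the noise-free experiment: partial DCT sensing, signal model (31),
% and Algorithm 2 (FISTA on the dual) with soft-thresholding as prox_{mu||.||_1}.
n = 512;  m = 256;  s = ceil(0.05*n);  alpha = 5;  mu = 1e5;
D = dct(eye(n));                          % n x n DCT matrix (Signal Proc. Toolbox)
rows = randperm(n);  A = D(rows(1:m),:);  % m rows picked at random; ||A||_2 = 1
u = zeros(n,1);  idx = randperm(n);
u(idx(1:s)) = sign(randn(s,1)).*10.^(alpha*rand(s,1));   % nonzero entries, cf. (31)
b = A*u;                                  % noise-free measurements

soft = @(x,t) sign(x).*max(abs(x)-t,0);   % prox_{t||.||_1}, see (30)
uapp = zeros(n,1);  L = norm(A)^2;        % here L = 1 for a partial DCT matrix
v = zeros(m,1);  y = v;  t = 1;
for k = 1:2000                            % Algorithm 2 (stopping rule simplified)
    vold = v;
    v = y - (A*soft(A'*y + uapp, mu) - b)/L;      % Step 1
    tnew = (1 + sqrt(1 + 4*t^2))/2;               % Step 2
    y = v + ((t-1)/tnew)*(v - vold);              % Step 3
    t = tnew;
end
urec = soft(A'*v + uapp, mu);             % recovered signal: prox_{mu||.||_1}(A'v + uapp)
fprintf('relative l2-error: %.2e\n', norm(u - urec)/norm(u));
```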


Table 1
Errors between the recovered signals and the true data, measured by the relative ℓ2-error, the relative ℓ1-error, and the absolute ℓ∞-error, with uapp being the 0-sparse data.

µ    | Iterations         | Relative ℓ2-error    | Relative ℓ1-error    | Absolute ℓ∞-error
10^3 | (1.41e+2, 1.60e+2) | (5.89e−1, 6.84e−2)   | (2.68, 5.93e−1)      | (3.05e+4, 1.14e+4)
10^4 | (1.21e+3, 6.96e+2) | (3.21e−2, 6.59e−2)   | (9.77e−2, 1.95e−1)   | (1.93e+3, 4.06e+3)
10^5 | (1.32e+3, 2.46e+2) | (3.02e−10, 2.18e−10) | (1.17e−10, 1.25e−10) | (1.13e−5, 4.33e−6)
10^6 | (1.99e+3, 5.38e+1) | (7.64e−5, 7.72e−5)   | (6.56e−5, 8.12e−5)   | (4.29, 1.97)
10^7 | (2.00e+3, 0)       | (2.58e−3, 1.72e−2)   | (9.83e−4, 3.06e−3)   | (8.39e+1, 4.00e+2)

Table 2
Errors between the recovered signals and the true data, measured by the relative ℓ2-error, the relative ℓ1-error, and the absolute ℓ∞-error, with uapp being the exact sparse signal to be recovered.

µ    | Iterations         | Relative ℓ2-error    | Relative ℓ1-error    | Absolute ℓ∞-error
10^3 | (1.70e+2, 2.31e+1) | (3.03e−10, 2.27e−10) | (9.89e−11, 1.13e−10) | (1.14e−5, 4.30e−6)
10^4 | (4.47e+2, 7.49e+1) | (2.83e−10, 2.12e−10) | (1.03e−10, 8.86e−11) | (1.10e−5, 4.45e−6)
10^5 | (1.32e+3, 2.37e+2) | (2.97e−10, 2.09e−10) | (1.13e−10, 1.12e−10) | (1.14e−5, 3.95e−6)
10^6 | (1.99e+3, 5.35e+1) | (8.20e−5, 1.03e−4)   | (6.95e−5, 8.44e−5)   | (5.20, 1.08e+1)
10^7 | (2.00e+3, 0)       | (2.48e−3, 1.72e−2)   | (1.04e−3, 3.74e−3)   | (7.58e+1, 2.92e+2)

Table 3
The columns "relative ℓ2-error", "relative ℓ1-error", and "absolute ℓ∞-error" display the errors between the recovered signals and the true data. The column "CPU time" shows the CPU time of the algorithms. The number m of measurements is m = n/2 and the test signals are s-sparse with s = 0.05n.

Method | Parameter | Relative ℓ2-error | Relative ℓ1-error | Absolute ℓ∞-error | CPU time (s)

α = 1, n = 2^13:
Algorithm 3 |           | (4.01e−10, 6.23e−11) | (2.61e−11, 2.02e−11) | (5.85e−9, 1.15e−9)  | (1.46, 2.88e−2)
NESTA       | µ̃ = 10^−2 | (3.48e−3, 1.33e−4)   | (6.53e−3, 2.23e−4)   | (2.64e−2, 2.11e−3)  | (9.91e−1, 1.16e−1)
NESTA       | µ̃ = 10^−7 | (3.48e−8, 1.33e−9)   | (6.54e−8, 2.23e−9)   | (2.65e−7, 2.12e−8)  | (2.09, 9.32e−2)

α = 1, n = 2^15:
Algorithm 3 |           | (2.10e−10, 2.40e−11) | (2.76e−11, 1.54e−11) | (3.41e−9, 4.85e−10) | (7.35, 2.96e−1)
NESTA       | µ̃ = 10^−2 | (3.49e−3, 6.39e−5)   | (6.56e−3, 9.70e−5)   | (2.82e−2, 1.48e−3)  | (4.16, 2.39e−1)
NESTA       | µ̃ = 10^−7 | (3.49e−8, 6.37e−10)  | (6.56e−8, 9.66e−10)  | (2.82e−7, 1.48e−8)  | (8.19, 2.06e−1)

α = 1, n = 2^17:
Algorithm 3 |           | (9.01e−11, 7.45e−12) | (1.47e−11, 2.69e−12) | (1.62e−9, 1.70e−10) | (4.24e+1, 2.17e−1)
NESTA       | µ̃ = 10^−2 | (3.48e−3, 3.07e−5)   | (6.54e−3, 5.38e−5)   | (3.07e−2, 1.57e−3)  | (1.99e+1, 6.95e−1)
NESTA       | µ̃ = 10^−7 | (3.49e−8, 3.07e−10)  | (6.54e−8, 5.38e−10)  | (3.07e−7, 1.57e−8)  | (4.09e+1, 8.26e−1)

α = 3, n = 2^13:
Algorithm 3 |           | (4.80e−12, 4.60e−13) | (1.10e−12, 4.54e−13) | (3.86e−9, 5.79e−10) | (1.70, 2.97e−2)
NESTA       | µ̃ = 10^−2 | (5.93e−5, 4.31e−6)   | (1.74e−4, 1.53e−5)   | (2.67e−2, 1.98e−3)  | (1.10, 8.81e−2)
NESTA       | µ̃ = 10^−7 | (5.96e−10, 4.34e−11) | (1.75e−9, 1.53e−10)  | (2.70e−7, 2.08e−8)  | (2.54, 1.06e−1)

α = 3, n = 2^15:
Algorithm 3 |           | (2.08e−12, 2.77e−13) | (2.17e−13, 1.62e−13) | (1.92e−9, 2.57e−10) | (8.37, 3.53e−1)
NESTA       | µ̃ = 10^−2 | (5.94e−5, 1.99e−6)   | (1.76e−4, 5.87e−6)   | (2.77e−2, 1.60e−3)  | (4.48, 2.57e−1)
NESTA       | µ̃ = 10^−7 | (5.97e−10, 1.97e−11) | (1.77e−9, 5.80e−11)  | (2.80e−7, 1.66e−8)  | (9.97, 5.77e−1)

α = 3, n = 2^17:
Algorithm 3 |           | (9.99e−13, 1.44e−13) | (4.52e−14, 3.45e−14) | (1.06e−9, 1.64e−10) | (4.89e+1, 1.65e−1)
NESTA       | µ̃ = 10^−2 | (6.01e−5, 9.44e−7)   | (1.77e−4, 3.23e−6)   | (3.00e−2, 8.90e−4)  | (2.13e+1, 6.82e−1)
NESTA       | µ̃ = 10^−7 | (6.04e−10, 9.21e−12) | (1.78e−9, 3.20e−11)  | (3.05e−7, 8.79e−9)  | (5.12e+1, 6.32e−1)

α = 5, n = 2^13:
Algorithm 3 |           | (4.82e−14, 7.52e−15) | (4.80e−15, 4.11e−15) | (3.18e−9, 5.58e−10) | (2.30, 5.24e−2)
NESTA       | µ̃ = 10^−2 | (7.40e−7, 5.06e−8)   | (2.87e−6, 2.87e−7)   | (2.42e−2, 1.07e−3)  | (1.31, 1.84e−1)
NESTA       | µ̃ = 10^−7 | (7.38e−12, 5.32e−13) | (2.87e−11, 2.98e−12) | (2.39e−7, 1.41e−8)  | (3.36, 1.20e−1)

α = 5, n = 2^15:
Algorithm 3 |           | (2.42e−14, 3.05e−15) | (2.06e−15, 1.57e−15) | (1.76e−9, 2.63e−10) | (1.14e+1, 4.53e−1)
NESTA       | µ̃ = 10^−2 | (7.62e−7, 3.02e−8)   | (2.91e−6, 1.55e−7)   | (2.74e−2, 2.03e−3)  | (5.77, 2.50e−1)
NESTA       | µ̃ = 10^−7 | (7.53e−12, 2.66e−13) | (2.89e−11, 1.43e−12) | (2.65e−7, 2.14e−8)  | (1.18e+1, 7.84e−1)

α = 5, n = 2^17:
Algorithm 3 |           | (1.22e−14, 1.11e−15) | (1.17e−15, 7.95e−16) | (1.06e−9, 1.45e−10) | (6.60e+1, 5.08e−1)
NESTA       | µ̃ = 10^−2 | (7.62e−7, 1.41e−8)   | (2.91e−6, 7.03e−8)   | (2.91e−2, 1.37e−3)  | (2.58e+1, 8.82e−1)
NESTA       | µ̃ = 10^−7 | (1.10e−11, 1.53e−11) | (4.67e−11, 7.92e−11) | (4.37e−7, 6.82e−7)  | (6.53e+1, 1.33)


Table 4
The columns "relative ℓ2-error", "relative ℓ1-error", and "absolute ℓ∞-error" display the errors between the recovered signals and the true data. The column "CPU time" shows the CPU time of the algorithms. The number m of measurements is n/4 and the test signals are s-sparse with s = 0.02n.

Method | Parameter | Relative ℓ2-error | Relative ℓ1-error | Absolute ℓ∞-error | CPU time (s)

α = 1, n = 2^13:
Algorithm 3 |           | (9.58e−10, 1.00e−10) | (1.25e−10, 8.23e−11) | (1.28e−8, 2.2e−9)   | (1.81, 2.99e−2)
NESTA       | µ̃ = 10^−2 | (8.33e−3, 3.01e−4)   | (1.73e−2, 6.89e−4)   | (6.32e−2, 4.16e−3)  | (1.28, 1.18e−1)
NESTA       | µ̃ = 10^−7 | (8.32e−8, 3.01e−9)   | (1.73e−7, 6.89e−9)   | (6.30e−7, 4.15e−8)  | (3.11, 1.27e−1)

α = 1, n = 2^15:
Algorithm 3 |           | (4.43e−10, 2.77e−11) | (6.09e−11, 2.67e−11) | (6.74e−9, 7.77e−10) | (1.00e+1, 2.23e−1)
NESTA       | µ̃ = 10^−2 | (8.34e−3, 2.59e−4)   | (1.74e−2, 4.95e−4)   | (6.66e−2, 3.34e−3)  | (4.24, 2.09e−1)
NESTA       | µ̃ = 10^−7 | (8.33e−8, 2.60e−9)   | (1.74e−7, 4.96e−9)   | (6.64e−7, 3.34e−8)  | (1.05e+1, 2.72e−1)

α = 1, n = 2^17:
Algorithm 3 |           | (2.14e−10, 8.93e−12) | (2.83e−11, 9.63e−12) | (3.66e−9, 3.91e−10) | (5.43e+1, 2.92e−1)
NESTA       | µ̃ = 10^−2 | (8.35e−3, 1.22e−4)   | (1.74e−2, 2.56e−4)   | (7.06e−2, 2.47e−3)  | (2.44e+1, 4.14e−1)
NESTA       | µ̃ = 10^−7 | (8.34e−8, 1.23e−9)   | (1.74e−7, 2.56e−9)   | (7.04e−7, 2.46e−8)  | (6.04e+1, 5.60e−1)

α = 3, n = 2^13:
Algorithm 3 |           | (1.06e−11, 9.55e−13) | (1.32e−12, 6.71e−13) | (7.63e−9, 9.55e−10) | (2.22, 6.59e−2)
NESTA       | µ̃ = 10^−2 | (1.40e−4, 1.56e−5)   | (4.60e−4, 6.15e−5)   | (6.27e−2, 6.58e−3)  | (1.44, 1.12e−1)
NESTA       | µ̃ = 10^−7 | (1.38e−9, 1.51e−10)  | (4.55e−9, 6.01e−10)  | (6.10e−7, 6.18e−8)  | (3.88, 1.65e−1)

α = 3, n = 2^15:
Algorithm 3 |           | (5.01e−12, 3.86e−13) | (3.33e−13, 2.54e−13) | (4.32e−9, 4.47e−10) | (1.22e+1, 4.38e−1)
NESTA       | µ̃ = 10^−2 | (1.45e−4, 6.53e−6)   | (4.72e−4, 2.99e−5)   | (6.76e−2, 3.58e−3)  | (4.83, 1.82e−1)
NESTA       | µ̃ = 10^−7 | (1.43e−9, 6.40e−11)  | (4.67e−9, 2.96e−10)  | (6.55e−7, 3.55e−8)  | (1.59e+1, 1.11)

α = 3, n = 2^17:
Algorithm 3 |           | (2.44e−12, 1.67e−13) | (9.55e−14, 7.26e−14) | (2.42e−9, 2.85e−10) | (6.63e+1, 3.00e−1)
NESTA       | µ̃ = 10^−2 | (1.45e−4, 3.93e−6)   | (4.72e−4, 1.56e−5)   | (7.32e−2, 4.28e−3)  | (2.72e+1, 3.19e−1)
NESTA       | µ̃ = 10^−7 | (1.43e−9, 3.87e−11)  | (4.67e−9, 1.55e−10)  | (7.06e−7, 3.79e−8)  | (7.29e+1, 2.26)

α = 5, n = 2^13:
Algorithm 3 |           | (1.02e−13, 1.46e−14) | (1.09e−14, 8.03e−15) | (6.61e−9, 1.36e−9)  | (3.10, 1.57e−1)
NESTA       | µ̃ = 10^−2 | (1.68e−6, 1.60e−7)   | (7.55e−6, 1.06e−6)   | (5.36e−2, 3.86e−3)  | (1.58, 1.10e−1)
NESTA       | µ̃ = 10^−7 | (1.99e−11, 2.08e−12) | (8.65e−11, 1.20e−11) | (7.45e−7, 7.29e−8)  | (5.35, 5.69e−1)

α = 5, n = 2^15:
Algorithm 3 |           | (5.29e−14, 3.65e−15) | (3.12e−15, 2.23e−15) | (3.54e−9, 5.89e−10) | (1.63e+1, 5.89e−1)
NESTA       | µ̃ = 10^−2 | (1.73e−6, 1.16e−7)   | (7.32e−6, 5.88e−7)   | (5.85e−2, 2.92e−3)  | (5.15, 1.29e−1)
NESTA       | µ̃ = 10^−7 | (2.05e−11, 1.48e−12) | (8.41e−11, 6.93e−12) | (8.20e−7, 5.76e−8)  | (2.14e+1, 8.17e−1)

α = 5, n = 2^17:
Algorithm 3 |           | (2.56e−14, 1.25e−15) | (1.90e−15, 1.02e−15) | (1.97e−9, 1.55e−10) | (8.96e+1, 7.56e−1)
NESTA       | µ̃ = 10^−2 | (1.75e−6, 6.27e−8)   | (7.46e−6, 3.77e−7)   | (6.23e−2, 2.96e−3)  | (3.00e+1, 8.15e−1)
NESTA       | µ̃ = 10^−7 | (2.08e−11, 7.98e−13) | (8.57e−11, 4.36e−12) | (8.95e−7, 4.06e−8)  | (1.11e+2, 1.19)

Since the relation ∥(I − proxµ1∥·∥1)(w)∥ ≤ ∥(I − proxµ2∥·∥1)(w)∥ holds for any vector w and arbitrary numbers µ2 > µ1 > 0, the change of proxµ∥·∥1(A⊤yk + uapp) relative to A⊤yk + uapp for a larger µ is bigger than that for a smaller µ. As a consequence, for a larger µ, Algorithm 2 will take more iterations to converge. These phenomena provide the evidence that supports the need for the continuation-like algorithm.

The next experiment is designed to evaluate the proposed Algorithm 3. We set

    T = 4,   µ = 2∥A⊤b∥∞,   γ = 5000^(−1/T).    (32)

Since T = 4, Algorithm 2 is invoked T times within Algorithm 3. In the kth application of Algorithm 2, it is stopped if the number of iterations exceeds 300 × 2^(k−T) or

    ∥Au⋄ − b∥ / ∥b∥ < 10^−7 / ( √n · max{∥A⊤b∥∞, 1} ).

The tested signals in Table 3 have different sizes (n ∈ {2^13, 2^15, 2^17}) and dynamic ranges (α ∈ {1, 3, 5}). The number of nonzero elements of the signals is always set to s = 0.05n (rounded up to the nearest integer), and the number of measurements is half the length of the signals, that is, m = n/2. Table 3 reports the mean and the standard deviation of the errors of the recovered signals, measured by the above three error metrics, and the CPU times consumed, from 20 repeated experiments using Algorithm 3. The performance of our proposed algorithm is compared with that of NESTA developed in [12]. The parameter µ̃ in Table 3 is used in [12] to smooth the ℓ1 norm by Nesterov's smoothing technique; the smaller µ̃ is, the closer the smoothed ℓ1 norm is to the ℓ1 norm.


Table 5
The columns "relative ℓ2-error", "relative ℓ1-error", and "absolute ℓ∞-error" display the errors between the recovered signals and the true data. The column "CPU time" shows the CPU time of the algorithms. The number of measurements m is n/2 and the test signals are s-sparse with s = 0.05n.

Method | Relative ℓ2-error | Relative ℓ1-error | Absolute ℓ∞-error | CPU time (s)

α = 1, σ = 0.05:
n = 2^13, Algorithm 3 | (3.62e−2, 1.43e−3) | (3.90e−3, 1.22e−3) | (3.69e−1, 3.48e−2) | (3.51, 1.01e−1)
n = 2^13, NESTA       | (4.19e−2, 1.28e−3) | (1.96e−2, 8.60e−4) | (4.23e−1, 3.64e−2) | (1.47, 9.27e−2)
n = 2^15, Algorithm 3 | (3.60e−2, 7.20e−4) | (3.58e−3, 8.20e−4) | (3.94e−1, 2.72e−2) | (1.74e+1, 5.69e−1)
n = 2^15, NESTA       | (4.20e−2, 7.52e−4) | (1.88e−2, 7.23e−4) | (4.54e−1, 3.04e−2) | (4.83, 1.90e−1)
n = 2^17, Algorithm 3 | (3.59e−2, 3.24e−4) | (3.43e−3, 3.48e−4) | (4.15e−1, 1.91e−2) | (9.48e+1, 1.06)
n = 2^17, NESTA       | (4.18e−2, 3.79e−4) | (1.84e−2, 3.21e−4) | (4.89e−1, 2.75e−2) | (2.72e+1, 3.07e−1)

α = 3, σ = 1:
n = 2^13, Algorithm 3 | (1.18e−2, 8.62e−4) | (1.13e−3, 6.53e−4) | (7.17, 4.99e−1)    | (3.33, 1.49e−1)
n = 2^13, NESTA       | (1.27e−2, 7.89e−4) | (1.17e−2, 1.14e−3) | (7.97, 4.52e−1)    | (1.66, 2.10e−1)
n = 2^15, Algorithm 3 | (1.17e−2, 2.33e−4) | (1.18e−3, 3.21e−4) | (7.78, 5.53e−1)    | (1.73e+1, 5.38e−1)
n = 2^15, NESTA       | (1.26e−2, 3.36e−4) | (1.12e−2, 4.40e−4) | (8.69, 8.17e−1)    | (5.38, 1.54e−1)
n = 2^17, Algorithm 3 | (1.16e−2, 1.26e−4) | (1.24e−3, 1.90e−4) | (8.33, 5.70e−1)    | (9.84e+1, 6.78e−1)
n = 2^17, NESTA       | (1.27e−2, 1.85e−4) | (1.10e−2, 2.40e−4) | (9.10, 4.25e−1)    | (3.14e+1, 1.03)

α = 5, σ = 5:
n = 2^13, Algorithm 3 | (7.13e−4, 7.34e−5) | (5.00e−5, 4.64e−5) | (3.41e+1, 3.43)    | (3.40, 1.28e−1)
n = 2^13, NESTA       | (7.83e−4, 4.74e−5) | (8.92e−4, 1.07e−4) | (3.86e+1, 2.53)    | (1.46, 1.09e−1)
n = 2^15, Algorithm 3 | (7.08e−4, 3.22e−5) | (2.63e−5, 2.20e−5) | (3.79e+1, 2.68)    | (1.71e+1, 4.72e−1)
n = 2^15, NESTA       | (7.95e−4, 3.47e−5) | (8.46e−4, 5.54e−5) | (4.19e+1, 2.28)    | (4.83, 1.70e−1)
n = 2^17, Algorithm 3 | (7.10e−4, 1.23e−5) | (3.21e−5, 1.42e−5) | (4.04e+1, 1.71)    | (9.59e+1, 6.43e−1)
n = 2^17, NESTA       | (7.91e−4, 1.49e−5) | (8.26e−4, 2.79e−5) | (4.58e+1, 2.19)    | (2.73e+1, 4.90e−1)

The parameter µ̃ in Table 3 is the one used in [12] to smooth the ℓ1 norm by Nesterov's smoothing technique; the smaller µ̃ is, the closer the smoothed ℓ1 norm is to the ℓ1 norm. Two different levels of µ̃, namely µ̃ = 10^{−2} and µ̃ = 10^{−7}, are used for the NESTA in our experiments. A tolerance parameter δ in the NESTA varies with the smoothing parameter µ̃ and needs to be determined. We finally choose δ = 10^{−8} for µ̃ = 10^{−2} and δ = 10^{−12} for µ̃ = 10^{−7}, since these choices lead to reasonable results. As illustrated in Table 3, our proposed Algorithm 3 can accurately recover sparse signals with different dynamic ranges. Compared with the results from the NESTA, the results of our proposed algorithm show up to 3, 5, and 3 orders of magnitude of improvement in accuracy measured in the ''relative ℓ2-error'', ''relative ℓ1-error'', and ''absolute ℓ∞-error'', respectively.

The experimental settings for Table 4 are the same as those for Table 3 except that the number m of measurements is n/4 and the number of nonzero elements of the sparse signals is set to 0.02n. The results from the NESTA are again included for comparison. We conclude that, with comparable computational cost, the results from our algorithm show up to 3, 5, and 2 orders of magnitude of improvement over those from the NESTA in accuracy measured in the ''relative ℓ2-error'', ''relative ℓ1-error'', and ''absolute ℓ∞-error'', respectively.

5.2. The case of noisy data

Numerical experiments were carried out to test the performance of our proposed algorithm for noisy measurements. The s-sparse signals u of length n are generated by (31). The lengths of the sparse signals are chosen to be n = 2^13, 2^15, 2^17 and three levels of dynamic range are used, α = 1, 3, and 5. Our sensing matrices A are formed by m randomly selected rows of the n × n DCT matrix. Our measurements b are simply Au with added Gaussian noise of mean 0 and standard deviation σ. Corresponding to the dynamic ranges α = 1, 3, and 5, the standard deviation σ of the noise is set to 0.05, 1, and 5, respectively. In Table 5, for all combinations of signal length and dynamic range, we set m = n/2 and s = 0.05n. As in the noise-free case, the numerical results reported here are obtained by repeating the test 20 times. Again, we include the results from the NESTA for comparison.

The experimental settings for Table 6 are the same as those for Table 5 except that we change m from n/2 to n/4 and s from 0.05n to 0.02n. From both tables we observe that our algorithm performs better than the NESTA in terms of the three error measures. We also notice that our algorithm needs more CPU time than the NESTA to achieve the reported results. We remark that the accuracy of the recovered signals cannot be further improved by the NESTA. To illustrate this point, we pick the specific setting n = 2^15, m = n/2, s = 0.05n, α = 5 as an example. Table 7 shows the performance of the NESTA for different values of µ̃ and δ. We repeat the experiments 10 times for each pair (µ̃, δ) to obtain the results in Table 7.
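For readers who wish to reproduce a setup of this kind, the following Python sketch generates a noisy test problem along the lines described above and evaluates the three error columns of Tables 5–7. The sparse-signal construction is only a stand-in for (31), which is not repeated here (random signs scaled by 10^{α·U} with U uniform on [0, 1] is a common choice); the dense partial-DCT matrix is built explicitly and so is suitable only for small n; and the least-squares recovery at the end is a placeholder to be replaced by Algorithm 3 or the NESTA.

```python
import numpy as np
from scipy.fft import dct

rng = np.random.default_rng(0)

def make_test_problem(n, alpha, sigma, m=None, s=None):
    # Assumed setup: s-sparse signal with dynamic range alpha, sensing matrix
    # formed by m randomly selected rows of the n x n orthonormal DCT matrix,
    # and measurements b = A u + Gaussian noise of standard deviation sigma.
    m = n // 2 if m is None else m
    s = int(np.ceil(0.05 * n)) if s is None else s
    u = np.zeros(n)
    support = rng.choice(n, size=s, replace=False)
    u[support] = rng.choice([-1.0, 1.0], size=s) * 10.0 ** (alpha * rng.random(s))
    rows = rng.choice(n, size=m, replace=False)
    A = dct(np.eye(n), axis=0, norm="ortho")[rows, :]   # dense; small n only
    b = A @ u + sigma * rng.standard_normal(m)
    return A, b, u

def error_metrics(u_hat, u):
    # The three error columns reported in Tables 5-7.
    rel_l2 = np.linalg.norm(u_hat - u) / np.linalg.norm(u)
    rel_l1 = np.linalg.norm(u_hat - u, 1) / np.linalg.norm(u, 1)
    abs_linf = np.linalg.norm(u_hat - u, np.inf)
    return rel_l2, rel_l1, abs_linf

A, b, u = make_test_problem(n=2 ** 10, alpha=3, sigma=1.0)
u_hat = np.linalg.lstsq(A, b, rcond=None)[0]            # placeholder recovery
print(error_metrics(u_hat, u))
```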


Table 6
The columns ''relative ℓ2-error'', ''relative ℓ1-error'', and ''absolute ℓ∞-error'' display the errors between recovered signals and the true data. The column ''CPU time'' shows CPU time of algorithms. The number of measurements m is n/4 and the test signals are s-sparse with s = 0.02n.

Method                    Relative ℓ2-error      Relative ℓ1-error      Absolute ℓ∞-error      CPU time (s)

α = 1, σ = 0.05
n = 2^13   Algorithm 3    (7.19e−2, 6.39e−3)     (4.88e−3, 3.24e−3)     (6.35e−1, 9.15e−2)     (3.51, 3.91e−2)
           NESTA          (7.27e−2, 2.96e−3)     (2.53e−2, 2.82e−3)     (6.77e−1, 6.80e−2)     (1.69, 9.79e−2)
n = 2^15   Algorithm 3    (6.92e−2, 2.70e−3)     (4.80e−3, 1.77e−3)     (6.88e−1, 4.95e−2)     (1.65e+1, 7.91e−1)
           NESTA          (7.18e−2, 2.07e−3)     (2.27e−2, 1.33e−3)     (7.17e−1, 3.09e−2)     (6.54, 1.82e−1)
n = 2^17   Algorithm 3    (6.76e−2, 1.28e−3)     (5.08e−3, 9.08e−4)     (7.42e−1, 7.07e−2)     (9.50e+1, 1.01)
           NESTA          (7.17e−2, 1.25e−3)     (2.20e−2, 8.85e−4)     (7.52e−1, 3.65e−2)     (3.28e+1, 3.03e−1)

α = 3, σ = 1
n = 2^13   Algorithm 3    (1.94e−2, 2.13e−3)     (1.36e−3, 8.54e−4)     (1.03e+1, 1.05)        (3.38, 1.40e−1)
           NESTA          (2.02e−2, 1.85e−3)     (1.57e−2, 1.87e−3)     (1.20e+1, 7.92e−1)     (1.90, 1.10e−1)
n = 2^15   Algorithm 3    (2.06e−2, 9.19e−4)     (1.53e−2, 1.20e−3)     (1.30e+1, 7.85e−1)     (7.39, 1.41e−1)
n = 2^17   Algorithm 3    (1.93e−2, 4.71e−4)     (1.38e−3, 4.13e−4)     (1.29e+1, 7.72e−1)     (9.79e+1, 1.45)
           NESTA          (2.04e−2, 5.61e−4)     (1.47e−2, 6.33e−4)     (1.36e+1, 6.71e−1)     (3.67e+1, 4.38e−1)

α = 5, σ = 5
n = 2^13   Algorithm 3    (1.09e−3, 1.36e−4)     (9.10e−5, 6.42e−5)     (5.04e+1, 5.20)        (3.45, 1.04e−1)
           NESTA          (1.25e−3, 1.20e−4)     (1.21e−3, 1.91e−4)     (5.92e+1, 5.25)        (1.84, 1.22e−1)
n = 2^15   Algorithm 3    (1.20e−3, 7.38e−5)     (6.22e−5, 3.93e−5)     (5.50e+1, 3.80)        (1.74e+1, 5.15e−1)
           NESTA          (1.27e−3, 8.32e−5)     (1.10e−3, 1.10e−4)     (6.32e+1, 3.96)        (7.18, 1.55e−1)
n = 2^17   Algorithm 3    (1.16e−3, 3.70e−5)     (3.63e−5, 2.28e−5)     (6.13e+1, 5.36)        (9.63e+1, 6.30e−1)
           NESTA          (1.29e−3, 4.24e−5)     (1.08e−3, 6.12e−5)     (6.94e+1, 4.63)        (3.48e+1, 1.03)

Table 7
Performance of NESTA with different values of the smoothing (approximation) parameter µ̃ and different values of the tolerance parameter δ. The columns ''relative ℓ2-error'', ''relative ℓ1-error'', and ''absolute ℓ∞-error'' display the errors between recovered signals and the true data. The column ''CPU time'' shows CPU time of algorithms. The data has setting n = 2^15, m = n/2, s = 0.05n, α = 5, and Gaussian noise standard deviation σ = 5.

µ̃           δ            Relative ℓ2-error      Relative ℓ1-error      Absolute ℓ∞-error      CPU time (s)

10^{−2}     10^{−6}      (8.01e−4, 3.25e−5)     (9.74e−4, 6.45e−5)     (4.20e+1, 9.28e−1)     (6.52, 1.95e−1)
            10^{−8}      (7.46e−4, 3.13e−5)     (9.84e−4, 6.45e−5)     (3.93e+1, 9.86e−1)     (1.40e+1, 2.71e−1)
            10^{−10}     (7.41e−4, 3.12e−5)     (9.85e−4, 6.45e−5)     (3.90e+1, 1.00)        (2.88e+1, 5.97e−1)
            10^{−12}     (7.41e−4, 3.12e−5)     (9.85e−4, 6.45e−5)     (3.90e+1, 1.00)        (5.89e+1, 8.30e−1)

10^{−4}     10^{−6}      (9.38e−4, 3.42e−5)     (8.95e−4, 6.41e−5)     (4.90e+1, 2.52)        (5.27, 1.67e−1)
            10^{−8}      (7.81e−4, 3.16e−5)     (9.84e−4, 6.47e−5)     (4.09e+1, 9.80e−1)     (1.37e+1, 3.81e−1)
            10^{−10}     (7.43e−4, 3.12e−5)     (9.88e−4, 6.47e−5)     (3.91e+1, 9.94e−1)     (1.03e+2, 1.37)
            10^{−12}     (7.40e−4, 3.12e−5)     (9.88e−4, 6.47e−5)     (3.90e+1, 1.00)        (1.58e+2, 1.71)

10^{−6}     10^{−6}      (9.40e−4, 3.76e−5)     (6.73e−4, 6.35e−5)     (4.63e+1, 1.43)        (4.78, 1.00e−1)
            10^{−8}      (8.38e−4, 3.37e−5)     (9.53e−4, 6.44e−5)     (4.39e+1, 1.36)        (9.27, 1.95e−1)
            10^{−10}     (7.56e−4, 3.15e−5)     (9.87e−4, 6.47e−5)     (3.98e+1, 9.07e−1)     (6.87e+1, 1.02)
            10^{−12}     (7.43e−4, 3.12e−5)     (9.88e−4, 6.47e−5)     (3.91e+1, 9.92e−1)     (2.34e+2, 5.82)

6. Conclusions

We propose efficient algorithms for the compressed sensing optimization problem by solving its dual formulation with Nesterov's algorithm. All numerical results presented in this paper show that the proposed algorithms perform significantly better in accuracy than the state-of-the-art algorithm NESTA, in terms of the relative ℓ2, relative ℓ1, and absolute ℓ∞ error measures, for tested signals ranging from low to high dynamic ranges.

References

[1] E. Candes, J. Romberg, T. Tao, Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, IEEE Transactions on Information Theory 52 (2) (2006) 489–509.
[2] E. Candes, J. Romberg, T. Tao, Stable signal recovery from incomplete and inaccurate measurements, Communications on Pure and Applied Mathematics 59 (8) (2006) 1207–1223.
[3] E. Candes, T. Tao, Near optimal signal recovery from random projections: universal encoding strategies?, IEEE Transactions on Information Theory 52 (12) (2006) 5406–5425.


[4] M. Lustig, D. Donoho, J.M. Pauly, Sparse MRI: the application of compressed sensing for rapid MR imaging, Magnetic Resonance in Medicine 58 (2007) 1182–1195.
[5] M. Lustig, D. Donoho, J. Santos, J. Pauly, Compressed sensing MRI, IEEE Signal Processing Magazine 25 (2008) 72–82.
[6] W. Bajwa, J. Haupt, A. Sayeed, R. Nowak, Compressive wireless sensing, in: Proc. Fifth Int. Conf. on Information Processing in Sensor Networks, 2006, pp. 134–142.
[7] D. Takhar, V. Bansal, M. Wakin, M. Duarte, D. Baron, K.F. Kelly, R.G. Baraniuk, A compressed sensing camera: new theory and an implementation using digital micromirrors, in: Proc. Comp. Imaging IV at SPIE Electronic Imaging, San Jose, California, 2006.
[8] D. Donoho, Compressed sensing, IEEE Transactions on Information Theory 52 (4) (2006) 1289–1306.
[9] S. Chen, D. Donoho, M. Saunders, Atomic decomposition by basis pursuit, SIAM Journal on Scientific Computing 20 (1998) 33–61.
[10] J. Cai, S. Osher, Z. Shen, Convergence of the linearized Bregman iteration for ℓ1 minimization, Mathematics of Computation 78 (2009) 2127–2136.
[11] W. Yin, S. Osher, D. Goldfarb, J. Darbon, Bregman iterative algorithms for ℓ1 minimization with applications to compressed sensing, SIAM Journal on Imaging Sciences 1 (2008) 143–168.
[12] S. Becker, J. Bobin, E. Candes, NESTA: a fast and accurate first-order method for sparse recovery, SIAM Journal on Imaging Sciences 4 (1) (2009) 1–39.
[13] A. Beck, M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM Journal on Imaging Sciences 2 (2009) 183–202.
[14] J.-J. Moreau, Fonctions convexes duales et points proximaux dans un espace hilbertien, Comptes Rendus de l'Academie des Sciences, Serie A (Mathematique) 255 (1962) 2897–2899.
[15] J.-J. Moreau, Proximité et dualité dans un espace hilbertien, Bulletin de la Société Mathématique de France 93 (1965) 273–299.
[16] P. Combettes, V. Wajs, Signal recovery by proximal forward-backward splitting, Multiscale Modeling and Simulation: A SIAM Interdisciplinary Journal 4 (2005) 1168–1200.
[17] R.T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ, 1970.
[18] J.-B. Hiriart-Urruty, C. Lemarechal, Convex Analysis and Minimization Algorithms II, in: Grundlehren der Mathematischen Wissenschaften (Fundamental Principles of Mathematical Sciences), vol. 306, Springer-Verlag, Berlin, 1993.
[19] C.A. Micchelli, L. Shen, Y. Xu, Proximity algorithms for image models: denoising, Inverse Problems 27 (2011) 045009. p. 30.
[20] H.L. Bauschke, P.L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, in: CMS Books in Mathematics, Springer, New York, 2011.
[21] A. Krol, S. Li, L. Shen, Y. Xu, Preconditioned alternating projection algorithms for maximum a posteriori ECT reconstruction, Inverse Problems 28 (2012) 115005. p. 34.
[22] Q. Li, C.A. Micchelli, L. Shen, Y. Xu, A proximity algorithm accelerated by Gauss–Seidel iterations for L1/TV denoising models, Inverse Problems 28 (2012) 095003. p. 20.
[23] R. Chan, T. Chan, L. Shen, Z. Shen, Wavelet algorithms for high-resolution image reconstruction, SIAM Journal on Scientific Computing 24 (4) (2003) 1408–1432.
[24] I. Daubechies, M. Defrise, C. De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Communications on Pure and Applied Mathematics 57 (2004) 1413–1541.
[25] S. Wright, R. Nowak, M. Figueiredo, Sparse reconstruction by separable approximation, IEEE Transactions on Signal Processing 57 (7) (2009) 2479–2493.