Journal of Computational and Applied Mathematics 281 (2015) 152–168
Vectorial additive half-quadratic minimization for isotropic regularization✩

Wen-Ping Li a,∗, Zheng-Ming Wang b, Tao Zhang a, Bao-Cun Bai a, Yong-He Ding a, Ya Deng c

a Beijing Institute of Tracking and Telecommunications Technology, Beijing 100094, China
b College of Science, National University of Defense Technology, Changsha 410073, China
c Department of Electrical Engineering, Ecole Polytechnique de Montreal, Quebec, Canada
Article history: Received 17 January 2014; received in revised form 7 December 2014
MSC: 49J24
Keywords: Convergence analysis; Image restoration; Optimal parameters; Additive half-quadratic regularization; Isotropic; Variational methods
Abstract: We propose the vectorial additive half-quadratic (VAHQ) algorithm to minimize the isotropic regularized cost function with general edge-preserving potential functions (PFs) in image restoration. By introducing an auxiliary vectorial variable, the cost function is changed into an augmented one which can be minimized alternately. One minimization has an explicit expression; the other is implemented by the Fast Fourier Transform (FFT). On an extended domain of parameters, VAHQ is shown to converge globally to a stationary point for nonconvex PFs, provided all stationary points are isolated, and to the unique minimum for convex PFs without any isolation assumption. What is more, the linear convergence rate of VAHQ is proved to be less than 1, and explicit expressions for the optimal parameters and the optimal bound of the convergence rate are presented for convex PFs. Image restoration examples show the restoration performance of nonconvex PFs, the lower computational cost of our algorithm compared to the Majorize–Minimize memory gradient (MMMG) algorithm thanks to the FFT, and some interesting phenomena confirming our conclusions on the convergence domain and the convergence rate.
1. Introduction

Digital image restoration aims to reconstruct an image of an unknown scene from an observed image; it plays an important part in various applied areas such as medical and astronomical imaging, image and video coding, computer vision and many others [1,2]. For simplicity, we focus on the image observation model:

g = Hf + η  (1)

where g and f ∈ R^p are the observed image and the true image sized n × n with p = n², respectively; H ∈ R^{p×p} is usually generated by a linear blurring (or convolution) operator; η ∈ R^p is additive noise. The problem of restoring f from (1) is typically ill-posed when H is singular or η is unknown. A flexible means to overcome such an ill-posed problem is
✩ This work was supported in part by NSFC: 61205190, 61271437 and NUDT Project: JC12-02-05, JC13-02-03.
∗ Corresponding author. E-mail addresses: [email protected] (W.-P. Li), [email protected] (Z.-M. Wang), [email protected] (Y. Deng).
http://dx.doi.org/10.1016/j.cam.2014.12.011
regularization [3,4], where f∗ is a minimizer of the cost function:

E(f) = (β/2) ∑_{i=1}^p [Hf − g]_i² + ∑_{i=1}^p ϕ(∥[Df]_i∥).  (2)

The cost function (2) is classical in variational methods and Bayesian estimation. In a statistical setting, the first quadratic data-fidelity term supposes that the noise is white and Gaussian. The role of the second regularization term is to push f∗ to exhibit some expected features, such as preserving edges and smoothing regions. In (2), β is a parameter to balance the data-fidelity term and the regularization term. D = [D1, D2]ᵀ is the digital gradient operator for a 2-D image, with D1, D2 : R^p → R^p the digital gradients along the two directions, respectively; the vector [Df]_i = [[D1 f]_i, [D2 f]_i]ᵀ ∈ R² with i = 1, …, p. H and D satisfy the basic requirement:

ker(HᵀH) ∩ ker(DᵀD) = {0}.  (3)
ϕ is called the potential function (PF). Following the definition [5] of total variation (TV) regularization [6], the regularization is called isotropic if one sets ϕ(∥[Df]_i∥) = ϕ(√([D1 f]_i² + [D2 f]_i²)) and anisotropic if one sets ϕ(∥[Df]_i∥) = ϕ(|[D1 f]_i|) + ϕ(|[D2 f]_i|), respectively. In practice, the isotropic regularization yields image restorations of better quality than the anisotropic one [7], because the isotropic one involves the dependence of the gradient in the two directions. Various PFs have been considered in the literature. A basic requirement in image restoration is that ϕ allows the recovery of large differences ∥Df∥ at the locations of edges and smoothes the other differences. The regularization was originally introduced by Tikhonov and Arsenin [3], who set ϕ(t) = t², although it has very strong smoothing properties and cannot preserve edges. Since the pioneering work of Geman and Geman [4], convex [6,8,9] and nonconvex [10–13] PFs have been introduced to preserve discontinuous features in images. Compared to convex PFs, nonconvex PFs offer much richer possibilities to restore high quality images with neat edges [1,7,14,15] from piecewise constant images, but lead to bad mathematical properties (the existence and uniqueness of the solution may fail) and challenging computation. In order to minimize (2), half-quadratic (HQ) reformulation of (2) was pioneered in [12,13]. The basic idea of HQ reformulation is to convert the original (possibly nonconvex) cost function (2) into a sequence of convex and quadratic functions by introducing an auxiliary variable. There are two forms of HQ formulation. Geman and Reynolds [12] first considered the multiplicative form, which can be applied to both anisotropic and isotropic regularization. However, the multiplicative HQ cost function cannot be efficiently minimized by the FFT for blurring problems [16], because its Hessian has no block circulant structure. Later, Geman and Yang [13] proposed the additive form for the anisotropic regularization, whose Hessian is of block circulant structure. Therefore, the additive HQ cost function can be efficiently minimized by the FFT.
However, the additive HQ method could not be directly applied to the isotropic regularization, since its augmented cost function is not quadratic in f [5,16]. The reason the additive HQ method is not suited to the isotropic regularization is that the introduced auxiliary variable is scalar (we call it the scalar additive HQ (SAHQ) method to distinguish it from VAHQ), not vectorial. Wang et al. [5] proposed an alternating algorithm to minimize the approximate isotropic TV regularization; such an algorithm can be viewed as a special case of VAHQ, but they did not generalize it to general PFs. Recently, some subspace algorithms [17–20] have been proposed to minimize a generalized form of the cost function (2); a small number of memory gradient descent directions is introduced to avoid the inverse of a large scale matrix, with relatively lower computational cost and faster convergence speed. In this paper, we focus on the cost function (2) with the isotropic regularization. Firstly, we introduce an auxiliary vectorial variable to reformulate (2) into additive HQ form for a wide class of edge-preserving PFs (convex or nonconvex) using duality theory, then develop the alternate minimization algorithm (i.e., VAHQ) for the resulting series of convex or quadratic reformulated cost functions; each alternate minimization step either has a simple explicit form or can be efficiently implemented by the FFT. This is our first contribution. Next, we show that VAHQ is essentially a class of gradient related iterative schemes [21], then analyze the global convergence of VAHQ, provided all stationary points are isolated, and obtain an extended convergence domain of parameters for nonconvex and convex PFs, with unconstrained optimization tools such as the Armijo stepsize rule and the global convergence lemma; such analysis is similar to that of [22].
What is more, we show that any stationary point of the cost function with the isotropic regularization is isolated and that VAHQ is globally convergent to the unique minimum without any isolation assumption when PFs are convex (not necessarily strictly); such conclusions do not hold for the scalar HQ methods. Finally, we consider the bound of the linear convergence rate of VAHQ for convex PFs; the relationship between the optimal bound of the convergence rate and the optimal parameters is presented analytically. Such conclusions have not emerged for the scalar HQ methods, to the best of our knowledge.

Notations. When necessary, a vector f ∈ R^p is also written as [f_i]_{i=1}^p, and [f]_n denotes the n × n folded matrix of f with p = n². Given a p × p matrix H, diag(H) denotes its diagonal vector. ∀f, g ∈ R^p, the inner product is ⟨f, g⟩ = ∑_{i=1}^p f_i g_i, and ∥f∥ is the ℓ2 norm. Given a matrix H ∈ R^{p×q}, H_{i,·} ∈ R^q and H_{·,j} ∈ R^p represent the ith row vector and the jth column vector of H, respectively. Denote H ≻ 0 and H ≽ 0 when the symmetric matrix H is positive definite and positive semi-definite, respectively, and Hᵀ the transpose of H. For a function φ of the scalar t, φ′ and φ′′ denote the first-order and second-order derivatives in t, respectively. For a function φ(∥x∥) of the vector x, ∇φ and ∇²φ denote the gradient vector and the Hessian matrix of φ(∥x∥) in x, respectively. For a binary function φ(a, b), φ(·, b) represents the function of a with b fixed, and φ(a, ·) represents the function of b with a fixed. Denote R+ = {t ∈ R : t ≥ 0}, 0 the zero vector, 0_p the p × p zero matrix, I_p the p × p identity matrix, and ''⊗'' the Kronecker product.
2. Vectorial additive half-quadratic algorithm

2.1. Basic hypotheses

Consider the cost function (2) with the isotropic regularization. Let ϕ satisfy the hypotheses:
H1: ϕ : R+ → R+ is twice continuously differentiable and strictly increasing.
H2: ϕ′(t)/t is differentiable and monotonically decreasing on R+, and there exists 0 < ĉ < +∞ such that

ϕ′(0)/0 = lim_{t→0+} ϕ′(t)/t = ĉ and lim_{t→+∞} ϕ′(t)/t = 0.  (4)
The hypothesis in H2 that ϕ′(t)/t is differentiable and monotonically decreasing on R+ implies that ϕ(√t) is concave, which is the basic hypothesis for the multiplicative HQ method. As shown by Charbonnier [23], H2 is called the edge-preserving condition. From H2, ϕ′(t)/t is differentiable and monotonically decreasing, then

ĉ = lim_{t→0+} ϕ′(t)/t ≥ ϕ′(t)/t ≥ ϕ′′(t),  ∀t ∈ R+.  (5)
Thus, H2 implies the basic hypothesis for the SAHQ method [16]: there exists ĉ > 0 such that ĉt²/2 − ϕ(t) is convex, when ϕ(t) is twice differentiable and t ∈ R+. The differentiability conditions in H1 and H2 are technical mathematical requirements. From H2, the derivative of ϕ′(t)/t at 0 is finite, then

ĉ = ϕ′(0)/0 = ϕ′′(0).  (6)
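Hypotheses H1–H2, the chain (5) and the limits in (4) are easy to check numerically for a concrete PF. The sketch below is our own illustration (not from the paper); it uses ϕ(t) = (K²/2)log(1 + t²/K²), which appears below as example (11) with ĉ = 1:

```python
import numpy as np

K = 2.0      # edge threshold parameter of the PF
c_hat = 1.0  # c_hat = phi'(0)/0 = phi''(0) for this family

# phi(t) = (K^2/2) log(1 + t^2/K^2) and its first two derivatives.
def phi(t):   return 0.5 * K**2 * np.log1p(t**2 / K**2)
def dphi(t):  return t / (1.0 + t**2 / K**2)
def ddphi(t): return (1.0 - t**2 / K**2) / (1.0 + t**2 / K**2)**2

t = np.linspace(1e-6, 50.0, 10_000)
ratio = dphi(t) / t  # phi'(t)/t, monotonically decreasing under H2

# Chain (5): c_hat >= phi'(t)/t >= phi''(t) on R+.
assert np.all(c_hat + 1e-12 >= ratio) and np.all(ratio + 1e-12 >= ddphi(t))
# Limits (4): phi'(t)/t -> c_hat as t -> 0+ and -> 0 as t -> +infinity.
assert abs(dphi(1e-8) / 1e-8 - c_hat) < 1e-6 and dphi(1e6) / 1e6 < 1e-6
# Monotone decrease of phi'(t)/t.
assert np.all(np.diff(ratio) <= 1e-12)
```

The same check applies verbatim to the other PFs below after swapping the three function definitions.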
Thus, ∀y ∈ R², following [24] we have

∇ϕ(∥y∥) = (ϕ′(∥y∥)/∥y∥) y,  y ∈ R²  (7)

∇²ϕ(∥y∥) = (ϕ′(∥y∥)/∥y∥)(I₂ − (y ⊗ yᵀ)/∥y∥²) + ϕ′′(∥y∥)(y ⊗ yᵀ)/∥y∥²,  y ≠ 0;
∇²ϕ(∥y∥) = ĉI₂,  y = 0.  (8)
Obviously, ϕ(∥y∥) is twice continuously differentiable, and so is E(f) in (2). We can see that ∇²ϕ(∥y∥) involves both ϕ′(∥y∥)/∥y∥ and ϕ′′(∥y∥), which makes the cost function E(f) of the isotropic regularization enjoy better properties than that of the anisotropic regularization; for example, each stationary point is isolated for convex PFs, as shown in Theorem 3.2.

Theorem 2.1. Suppose ϕ satisfies H1 and H2. If c ≥ ĉ, then the vectorial function

Φ(y) = (c/2)∥y∥² − ϕ(∥y∥)  (9)

is convex.

Proof. According to (8), ∀x ∈ R² and y ≠ 0, it follows that

xᵀ∇²Φ(y)x = c∥x∥² − ϕ′′(∥y∥)⟨x, y⟩²/∥y∥² − (ϕ′(∥y∥)/∥y∥)(∥x∥² − ⟨x, y⟩²/∥y∥²).

From (5), we have

xᵀ∇²Φ(y)x ≥ (c − ϕ′(∥y∥)/∥y∥)(∥x∥² − ⟨x, y⟩²/∥y∥²).

According to the Cauchy–Schwarz inequality, ∥x∥² − ⟨x, y⟩²/∥y∥² ≥ 0; with the condition c ≥ ĉ, we have xᵀ∇²Φ(y)x ≥ 0 and Φ is convex. When y = 0, we have xᵀ∇²Φ(y)x = (c − ĉ)∥x∥² and Φ is convex. □

Examples of PFs satisfying H1 and H2 are
ϕ(t) = K²√(1 + t²/K²)  (convex)  (10)

ϕ(t) = (K²/2) log(1 + t²/K²)  (nonconvex)  (11)

ϕ(t) = (K²/2) · (t²/K²)/(1 + t²/K²)  (nonconvex)  (12)

ϕ(t) = (K²/2)(1 − exp(−t²/K²))  (nonconvex)  (13)
with K a parameter to distinguish edges from noise and ĉ = 1. However, the PF of the well-known TV regularization (ϕ(t) = t) does not satisfy H2, since lim_{t→0+} ϕ′(t)/t = +∞. Thus, the additive HQ method cannot deal with the exact TV regularization, but only with its smoothed version (10) with K small enough.

2.2. Vectorial additive half-quadratic reformulation

Under H1 and H2, let c ≥ ĉ. The vectorial function Φ in (9) is convex according to Theorem 2.1, and the Legendre–Fenchel conjugate function of Φ reads
Φ∗(x) = sup_{y∈R²} {⟨x, y⟩ − Φ(y)},  ∀x ∈ R².  (14)

From basic results of duality theory, Φ∗ is convex; reciprocally,

Φ(y) = sup_{x∈R²} {⟨x, y⟩ − Φ∗(x)},  ∀y ∈ R².  (15)

Denote φ(x) = Φ∗(x) − ∥x∥²/(2c). From (9) and (15) it is easy to see that

ϕ(∥y∥) = inf_{x∈R²} {(1/2)∥√c y − x/√c∥² + φ(x)}.  (16)
Thus, the cost function (2) is augmented as

E(f) = inf_{b∈R^{p×2}} J(f, b)  (17)

where

J(f, b) = (β/2) ∑_{i=1}^p [Hf − g]_i² + ∑_{i=1}^p {(1/2)∥√c [Df]_i − b_i/√c∥² + φ(b_i)}  (18)

with b ∈ R^{p×2}, and [Df]_i, b_i ∈ R². We call (18) the vectorial additive augmented function of (2). Compare it with the scalar additive augmented function of (2) shown in [16]:

J₁(f, b) = (β/2) ∑_{i=1}^p [Hf − g]_i² + ∑_{i=1}^p {(1/2)(√c ∥[Df]_i∥ − b_i/√c)² + φ(b_i)}
with b_i ∈ R. J₁(·, b) is obviously nonquadratic in f, but J(·, b) is quadratic (hence convex) in f since H and D are linear. In addition, J(f, ·) is convex in b for each fixed f, since Φ∗ is convex. However, J(·, ·) is not necessarily convex in (f, b). Thus, the minimizer (f∗, b∗) of J in (18) can be calculated using an alternate minimization. Let the solution at the (k−1)th step be (f^{k−1}, b^{k−1}). For the kth step, we calculate b^k such that

J(f^{k−1}, b^k) ≤ J(f^{k−1}, b),  ∀b ∈ R^{p×2}  (19)

and f^k such that

J(f^k, b^k) ≤ J(f, b^k),  ∀f ∈ R^p  (20)

until ∥f^k − f^{k−1}∥/∥f^k∥ < ε, with an arbitrary initial point f⁰ and the allowed error ε.
2.3. Efficient implementation of the alternate minimization

2.3.1. Computation of b^k in (19)

From basic results of duality theory, we have the following conclusion to solve (19) explicitly:

Theorem 2.2. Suppose ϕ satisfies H1 and H2. If c ≥ ĉ, then the infimum of (16) is attained at

x = cy − (ϕ′(∥y∥)/∥y∥) y,  ∀y ∈ R².  (21)

Proof. From (16), we have

ϕ(∥y∥) = (c/2)∥y∥² − sup_{x∈R²} {⟨x, y⟩ − Φ∗(x)}.  (22)

If c ≥ ĉ, the function Φ is convex and continuous; it follows that the infimum is attained at y = ∇Φ∗(x). For the dual pair Φ∗ and Φ, x = ∇Φ(y) from [25, Theorem 23.5]; then the infimum of (16) is attained at (21) from (9). □
The closed-form expression of φ is usually difficult or not analytical for most PFs, such as (10)–(13). According to Theorem 2.2, there is no need to know the expression of φ; the minimizer b^k of (19) can be calculated explicitly and pixel by pixel:

b_i^k = (c − ϕ′(√([D1 f^{k−1}]_i² + [D2 f^{k−1}]_i²)) / √([D1 f^{k−1}]_i² + [D2 f^{k−1}]_i²)) [Df^{k−1}]_i  (23)

where [Df^{k−1}]_i = [[D1 f^{k−1}]_i, [D2 f^{k−1}]_i]ᵀ. Notice that the condition c ≥ ĉ is the basic requirement of the classical additive HQ methods; we will show in the next section that such a requirement is not necessary for the convergence of VAHQ.
2.3.2. Computation of f^k in (20)

The Euler–Lagrange equation associated with (20) reads

G_c f = βHᵀg + D1ᵀ b^k_{·,1} + D2ᵀ b^k_{·,2}  (24)

with

G_c = βHᵀH + cD1ᵀD1 + cD2ᵀD2.  (25)

It is easy to see that G_c ≻ 0 under (3), so G_c is invertible. Here we do not solve (24) exactly, but introduce a parameter θ > 0 and change (24) into

G_c f = (1 − θ)G_c f + θ(βHᵀg + D1ᵀ b^k_{·,1} + D2ᵀ b^k_{·,2}).  (26)

The following iteration consists in performing one step of (26):

f^k = (1 − θ)f^{k−1} + θ G_c^{−1}(βHᵀg + D1ᵀ b^k_{·,1} + D2ᵀ b^k_{·,2}).  (27)
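One full VAHQ iteration, the b-step (23) followed by the relaxed f-step (27), can be sketched as follows. This is our own illustration, not the authors' code: D1 and D2 are built as forward-difference matrices under periodic boundary conditions, the PF is assumed to have ĉ = 1 (as (10)–(13) do), and the f-step uses a dense linear solve, which is only practical for tiny images:

```python
import numpy as np

def gradient_ops(n):
    """Forward-difference matrices D1, D2 with periodic boundary conditions,
    acting on stacked vectors of length p = n*n (our construction)."""
    I = np.eye(n)
    d = np.roll(I, -1, axis=0) - I   # 1-D periodic forward difference
    return np.kron(d, I), np.kron(I, d)

def vahq_iteration(f, g, H, D1, D2, beta, c, theta, dphi, eps=1e-12):
    """One VAHQ iteration: b-step (23), then relaxed f-step (27).
    `dphi` is the derivative phi' of the potential function."""
    d1f, d2f = D1 @ f, D2 @ f
    t = np.sqrt(d1f**2 + d2f**2)
    # phi'(t)/t -> c_hat = 1 as t -> 0+, so flat pixels get weight c - 1.
    ratio = np.where(t > eps, dphi(np.maximum(t, eps)) / np.maximum(t, eps), 1.0)
    b1, b2 = (c - ratio) * d1f, (c - ratio) * d2f              # (23), pixel by pixel
    Gc = beta * H.T @ H + c * (D1.T @ D1 + D2.T @ D2)          # (25)
    rhs = beta * H.T @ g + D1.T @ b1 + D2.T @ b2
    return (1 - theta) * f + theta * np.linalg.solve(Gc, rhs)  # (27)
```

Running this with (θ, c) in the convergence domain Γ of Section 3 makes E(f^k) decrease monotonically, which is a convenient correctness check; the FFT implementation described next removes the cost of the linear solve.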
Usually, the blurring matrix H is generated by a symmetric point spread function, so HᵀH is block circulant. Obviously, D1ᵀD1 and D2ᵀD2 are block circulant, and so is G_c. Under the periodic boundary condition [26] on f^k, G_c can be diagonalized by the 2-dimensional FFT. Using the convolution theorem, (27) can be calculated pixel by pixel, by folding the vectors in R^p into n × n matrices, without referring to the inverse of the p × p matrix G_c:

[f^k]_n = (1 − θ)[f^{k−1}]_n + θ F⁻¹( F(βH[g]_n + D1[b^k_{·,1}]_n + D2[b^k_{·,2}]_n) / (βF(H)² + cF(D1)² + cF(D2)²) )  (28)

where F, F⁻¹, H, D1, D2 represent the FFT, the inverse FFT, the symmetric point spread operator, the digital horizontal gradient operator and the digital vertical gradient operator for n × n matrices, respectively, and the division is element-wise. We can compute H[g]_n using the MATLAB function ''imfilter'' and the n × n matrix βF(H)² + cF(D1)² + cF(D2)² using ''psf2otf'' before starting the iterations. Since all variables except b^k_{·,1} and b^k_{·,2} in (28) are constant, computing f^k with (28) involves one FFT and one inverse FFT of an n × n matrix, with complexity O(n² log n), at each iteration. There is no need to invert G_c, so the computation of f^k is at a low cost, especially for large images. The alternate minimization (19) and (20) implemented by (23) and (28) is called the vectorial additive half-quadratic (VAHQ) algorithm. Some remarks follow:
• Remark 1: The efficient calculation of f^k in (28) with the FFT has appeared in [5,16] with θ = 1, where the current iteration does not use the former iterative information directly (it is indirect, since b^k is connected with f^{k−1} according to (23)), so the convergence speed is usually not optimal. Notice that, when θ ≠ 1, (27) does not exactly minimize J(·, b^k) of (20) at each iteration, but it can bring a faster convergence speed of the total iteration without destroying the convergence of VAHQ, as will be shown in the next section.
• Remark 2: VAHQ cannot be applied to the exact TV regularization [6], since (23) makes no sense when t → 0+. To avoid such a drawback, consider the approximate isotropic TV regularization with the Huber-like PF [16]:

ϕ(t) = t²/2,  0 ≤ t ≤ K;  ϕ(t) = Kt − K²/2,  t > K.

Set c = 1; the augmented cost function (18) becomes

J(f, b) = (β/2) ∑_{i=1}^p [Hf − g]_i² + ∑_{i=1}^p {(1/2)∥[Df]_i − b_i∥² + K∥b_i∥}.

Its alternate minimization algorithm is suggested to be implemented by the 2-D shrinkage formula and the FFT in [5]. The formula (23) applied to the Huber-like PF is nothing but the 2-D shrinkage formula. Thus, the alternate minimization algorithm of [5] is a special case of VAHQ, without considering the differentiability conditions in H1 and H2.
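To make the connection with (23) concrete, here is a minimal sketch of the 2-D shrinkage formula (our own illustration, not from the paper), which coincides with (23) specialized to the Huber-like PF with c = 1:

```python
import numpy as np

def shrink2d(v1, v2, K):
    """Pixel-wise 2-D shrinkage: argmin_b (1/2)||v - b||^2 + K||b||, v = (v1, v2).
    Equals (23) for the Huber-like PF with c = 1: b = max(||v|| - K, 0) v/||v||."""
    norm = np.sqrt(v1**2 + v2**2)
    scale = np.maximum(norm - K, 0.0) / np.maximum(norm, 1e-12)  # guard norm = 0
    return scale * v1, scale * v2

# Example: a gradient vector of norm 5 shrunk by K = 2 keeps its direction, norm 3.
b1, b2 = shrink2d(np.array([3.0]), np.array([4.0]), 2.0)
```

For ∥v∥ ≤ K the output is 0, matching b_i = (1 − ϕ′(t)/t)v = 0 on the quadratic branch of the Huber-like PF.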
• Remark 3: Since the duality theory holds in n-dimensional Euclidean space, VAHQ can be extended to n-dimensional vectorial spaces, such as 3-D images and multichannel images. An analogous extension of the algorithm to the multichannel TV regularization has been proposed in [27] for multichannel image restoration.

3. Convergence analysis

3.1. Convergence

We focus our attention on general PFs (convex or nonconvex) in this section. We will show that VAHQ is a class of gradient related iterative schemes with constant stepsize [21]. The convergence is decided by the parameter pair (θ, c); the domain of (θ, c) ensuring convergence will be obtained by the Armijo stepsize rule. First of all, let us show that VAHQ amounts to finding the fixed point of a map.

Theorem 3.1. The alternate minimization (19), (20) implemented by (23), (27) is equivalent to the fixed point iteration of the following map:

M_{θ,c} : R^p → R^p,  M_{θ,c}(f) = f − θG_c^{−1}∇E(f).  (29)

Proof. Given f^{k−1}, b^k is calculated with (23):

b_i^k = (c − ϕ′(∥[Df^{k−1}]_i∥)/∥[Df^{k−1}]_i∥)[Df^{k−1}]_i.

Alternately, according to (27), we have

f^k = (1 − θ)f^{k−1} + θG_c^{−1}(βHᵀg + ∑_{j=1}^2 Djᵀ diag([c − ϕ′(∥[Df^{k−1}]_i∥)/∥[Df^{k−1}]_i∥]_{i=1}^p) Dj f^{k−1}).

The expression above is the fixed point iteration of the following map:

M_{θ,c}(f) = (1 − θ)f + θG_c^{−1}(βHᵀg + ∑_{j=1}^2 Djᵀ diag([c − ϕ′(∥[Df]_i∥)/∥[Df]_i∥]_{i=1}^p) Dj f).

The expression for ∇E given by (2) with the ℓ2 norm yields

∇E(f) = βHᵀ(Hf − g) + Dᵀ diag([ϕ′(∥[Df]_i∥)/∥[Df]_i∥]_{i=1}^p) Df.  (30)

According to the above two equations and (25), the conclusion holds. □
Such a result holds for both the multiplicative HQ and SAHQ methods [16,28]. Theorem 3.1 describes the essence of VAHQ: it is indeed a pertinent correction of the steepest descent method to minimize (2) with the isotropic regularization. The general form of gradient related iterative schemes has been directly considered for both the multiplicative HQ and SAHQ methods to minimize (2) with the anisotropic regularization in [22], and the special case θ = 1 has been studied in [16]. VAHQ is equivalent to the iteration:

f^{k+1} = f^k − θG_c^{−1}∇E(f^k)  (31)

which will be mainly considered next, instead of the alternate iteration shown in Section 2. Since G_c is positive definite, the direction d^k = −G_c^{−1}∇E(f^k) satisfies ∇E(f^k)ᵀd^k < 0 when ∇E(f^k) ≠ 0, so the direction sequence {d^k} is gradient related.¹ θ is called the stepsize; an admissible stepsize is necessary for the convergence and the convergence speed of (31). θ should be neither too large (otherwise the iteration could diverge) nor too small (otherwise the convergence speed is too slow). The well-known stepsize rule is the Armijo inequality:

E(f^{k+1}) − E(f^k) ≤ −ωθ∇E(f^k)ᵀG_c^{−1}∇E(f^k)  (32)

where ω ∈ (0, 1) is a constant independent of k. In the Armijo rule, the stepsize need not be a constant; it can depend on k, and a line search is usually needed to find a suitable stepsize satisfying (32) at each step. For VAHQ, if we can find a constant stepsize θ satisfying (32), then every limit point of {f^k} obtained by (31) is a stationary point² of E(f) [29, Proposition 1.2.1]. However, such a result does not provide a global convergence guarantee; Allain et al. [22, Proposition 3] proposed the following lemma of practical interest.

¹ A sequence {d^k} is gradient related to {f^k} if, for any subsequence {f^k}_{k∈K} → f⋆ with ∇E(f⋆) ≠ 0, {d^k}_{k∈K} is bounded and satisfies lim sup_{k→∞, k∈K} ∇E(f^k)ᵀd^k < 0.
² f∗ is a stationary point if ∇E(f∗) = 0.
Lemma 3.1. Suppose E(f) is continuously differentiable, bounded below and coercive,³ G_c is positive definite, all stationary points of E(f) are isolated⁴ and θ satisfies the Armijo inequality (32). Then the sequence {f^k} obtained by (31) from an arbitrary initial point f⁰ converges to a stationary point.

Lemma 3.1 shows global convergence, defined as the convergence of the iterations from any initialization; such global convergence has been addressed for the multiplicative HQ method in [30] and for both the multiplicative HQ and SAHQ methods in [22], provided each minimum is isolated. To meet such a rigorous condition, strictly convex PFs are usually required to make the cost function strictly convex. For VAHQ, it is easy to see that E(f) in (2) and G_c in (25) satisfy the conditions of Lemma 3.1 except the isolation of all stationary points; the following theorem shows that all stationary points of E(f) are isolated when ϕ is convex. Let us introduce some notations first. Let f∗ ∈ R^p be a stationary point of E(f); denote

Θ = {i : [Df∗]_i = 0, 1 ≤ i ≤ p},  Θ̄ = {i : [Df∗]_i ≠ 0, 1 ≤ i ≤ p}

and Ẅ the diagonal p × p matrix whose ith diagonal element is

Ẅ_{i,i} = ϕ′′(∥[Df∗]_i∥)/∥[Df∗]_i∥² − ϕ′(∥[Df∗]_i∥)/∥[Df∗]_i∥³,  i ∈ Θ̄;  Ẅ_{i,i} = 0,  i ∈ Θ.

From H1, (2) and (8), we have

∇²E(f) = βHᵀH + Dᵀ diag([ϕ′(∥[Df]_i∥)/∥[Df]_i∥]_{i=1}^p) D + Dᵀ Ẅ (Df ⊗ (Df)ᵀ) D.  (33)
Theorem 3.2. Under H1 and H2, suppose ϕ is convex and (3) holds. Then each stationary point of E(f) is an isolated minimum; furthermore, E(f) admits a unique minimum.

Proof. From (33), ∀u ∈ R^p, we have

uᵀ∇²E(f∗)u = βuᵀHᵀHu + ĉ ∑_{i∈Θ} ∥[Du]_i∥² + ∑_{i∈Θ̄} ϕ′′(∥[Df∗]_i∥)⟨[Du]_i, [Df∗]_i⟩²/∥[Df∗]_i∥² + ∑_{i∈Θ̄} (ϕ′(∥[Df∗]_i∥)/∥[Df∗]_i∥)(∥[Du]_i∥² − ⟨[Du]_i, [Df∗]_i⟩²/∥[Df∗]_i∥²).  (34)

When ϕ is convex, uᵀ∇²E(f∗)u ≥ 0; then E(f) is convex and E(f) has at least one minimum, which is also a stationary point. Suppose f∗ is a stationary point of E(f); we want to show that ∇²E(f∗) is positive definite, i.e., uᵀ∇²E(f∗)u > 0. It only needs to be proved that uᵀ∇²E(f∗)u > 0 when u ∈ ker(H). By contradiction, suppose there exists ū ∈ ker(H) such that ūᵀ∇²E(f∗)ū = 0. Then there exists some i such that [Dū]_i ≠ 0 from (3), and each i satisfying [Dū]_i ≠ 0 belongs to Θ̄ from (34); so we have [Dū]_i = k[Df∗]_i ≠ 0, with k a nonzero constant and Θ̄ ≠ ∅, to make the fourth term on the right-hand side of (34) equal to 0. However, according to (30), we have

ūᵀ∇E(f∗) = β(Hū)ᵀ(Hf∗ − g) + ∑_{i∈Θ̄} (kϕ′(∥[Df∗]_i∥)/∥[Df∗]_i∥)⟨[Df∗]_i, [Df∗]_i⟩.

Thus, ūᵀ∇E(f∗) ≠ 0, since β(Hū)ᵀ(Hf∗ − g) = 0, k ≠ 0, [Df∗]_i ≠ 0 and ϕ′(t)/t > 0 for any finite t; this contradicts that f∗ is a stationary point. According to the second order sufficient optimality conditions [29, Proposition 1.1.3], f∗ is a strict local minimum of E(f), i.e., it is an isolated minimum. Suppose there exist two different isolated minima f₁∗ and f₂∗ with E(f₁∗) ≤ E(f₂∗); then E(αf₁∗ + (1 − α)f₂∗) ≤ E(f₂∗) for any α ∈ (0, 1), which contradicts that f₂∗ is an isolated minimum. Thus, E(f) admits a unique minimum. □

From the proof of Theorem 3.2, we can see that, when E(f) is convex (not necessarily strictly), ∇E(f) = 0 and a merely semi-positive definite (singular) ∇²E(f) cannot happen at the same time for the isotropic regularization; such a good property is applied in the proof of the convergence rate, as shown in Theorem 3.5. A similar property does not hold for the anisotropic regularization. Therefore, VAHQ globally converges to the unique minimum for convex PFs; there is no need to suppose that all stationary points are isolated as in Lemma 3.1.

³ E(f) is coercive if E(f^k) → +∞ when ∥f^k∥ → +∞.
⁴ A stationary point f∗ is isolated if there exists a ball centered at f∗ such that f∗ is the only stationary point in the ball.

When ϕ is nonconvex, the extra condition that all stationary points of E(f) are isolated is required to get global convergence. The convergent point can be a desirable minimum, or an undesirable maximum or saddle point, if such exist. However,
in contrast to the attractive minima, the latter are unstable fixed points, since E(f^k) is strictly decreasing as long as f^k is not a stationary point. Hence, a proper perturbation away from a saddle point or a maximum will lead away from it [30]. To ensure the convergence of VAHQ, what remains is to find θ satisfying the Armijo inequality (32). Let us show a preliminary conclusion first:

Theorem 3.3. Under H1 and H2, ∀y₁, y₂ ∈ R²,

ϕ(∥y₁ + y₂∥) ≤ ϕ(∥y₁∥) + ⟨y₂, ∇ϕ(∥y₁∥)⟩ + (ĉ/2)⟨y₂, y₂⟩.  (35)

Proof. Under H1 and H2, set c = ĉ in (9); Φ(y) is convex from Theorem 2.1, then

⟨y₂, ∇Φ(y₁ + y₂) − ∇Φ(y₁)⟩ = ⟨y₂, ĉy₂ − (∇ϕ(∥y₁ + y₂∥) − ∇ϕ(∥y₁∥))⟩.

Besides, according to the differential mean value theorem, there exists y₃ between y₁ and y₁ + y₂ such that

⟨y₂, ∇Φ(y₁ + y₂) − ∇Φ(y₁)⟩ = ⟨y₂, ∇²Φ(y₃)y₂⟩ ≥ 0.

The last inequality results from the convexity of Φ. Then

⟨y₂, ∇ϕ(∥y₁ + y₂∥) − ∇ϕ(∥y₁∥)⟩ ≤ ĉ⟨y₂, y₂⟩.  (36)

Consider the function γ(t) : R → R defined as

γ(t) = ϕ(∥y₁ + ty₂∥)

where t is a scalar independent of y₁ and y₂. According to the chain rule, γ′(t) = ⟨y₂, ∇ϕ(∥y₁ + ty₂∥)⟩; then

ϕ(∥y₁ + y₂∥) − ϕ(∥y₁∥) = γ(1) − γ(0) = ∫₀¹ γ′(t) dt
 = ∫₀¹ ⟨y₂, ∇ϕ(∥y₁∥)⟩ dt + ∫₀¹ ⟨y₂, ∇ϕ(∥y₁ + ty₂∥) − ∇ϕ(∥y₁∥)⟩ dt
 ≤ ∫₀¹ ⟨y₂, ∇ϕ(∥y₁∥)⟩ dt + ĉ ∫₀¹ ⟨y₂, y₂⟩ t dt
 = ⟨y₂, ∇ϕ(∥y₁∥)⟩ + (ĉ/2)⟨y₂, y₂⟩.

The inequality results from (36) applied with y₂ replaced by ty₂. Thus, the inequality (35) holds. □
The conclusion of Theorem 3.3 is analogous to the descent lemma [29], but their conditions are different: the descent lemma supposes that ∇ϕ(∥y∥) is ĉ-Lipschitz.⁵ Consider the iteration (31); we have the following conclusion:

Theorem 3.4. Under H1 and H2, let (θ, c) belong to

Γ = {(θ, c) : θ ∈ (0, 2); c ∈ (θĉ/2, +∞)};  (37)

then there exists ω ∈ (0, 1) such that the Armijo inequality (32) holds.

Proof. Consider the second-order approximation of E(f) in the neighborhood of f^k:

Ẽ(f) = E(f^k) + ∇E(f^k)ᵀ(f − f^k) + (1/2)(f − f^k)ᵀG_c(f − f^k).

Since the first term of E(f) in (2) is quadratic in f, it is easy to deduce

Ẽ(f^{k+1}) − E(f^{k+1}) = ∑_{i=1}^p {ϕ(∥[Df^k]_i∥) − ϕ(∥[Df^{k+1}]_i∥) + ⟨δ_i^k, ∇ϕ(∥[Df^k]_i∥)⟩ + (c/2)∥δ_i^k∥²}

where δ_i^k = [Df^{k+1} − Df^k]_i ∈ R². From Theorem 3.3 we have

Ẽ(f^{k+1}) ≥ E(f^{k+1}) + ((c − ĉ)/2) ∑_{i=1}^p ∥δ_i^k∥².

⁵ ∇ϕ is ĉ-Lipschitz if ∥∇ϕ(x₁) − ∇ϕ(x₂)∥ ≤ ĉ∥x₁ − x₂∥, ∀x₁, x₂ ∈ R².
According to the above inequality and ∇E(f^k) = −(1/θ)G_c(f^{k+1} − f^k) from (31), we have

E(f^{k+1}) − E(f^k) + ωθ∇E(f^k)ᵀG_c^{−1}∇E(f^k)
 ≤ Ẽ(f^{k+1}) − E(f^k) + ωθ∇E(f^k)ᵀG_c^{−1}∇E(f^k) + ((ĉ − c)/2) ∑_{i=1}^p ∥δ_i^k∥²
 = ∇E(f^k)ᵀ(f^{k+1} − f^k) + (1/2)(f^{k+1} − f^k)ᵀG_c(f^{k+1} − f^k) + ωθ∇E(f^k)ᵀG_c^{−1}∇E(f^k) + ((ĉ − c)/2) ∑_{i=1}^p ∥δ_i^k∥²
 = β((ω − 1)/θ + 1/2)∥Hf^{k+1} − Hf^k∥² + (((ω − 1)/θ)c + ĉ/2) ∑_{i=1}^p ∥δ_i^k∥².

Set ω = min{1 − θ/2, 1 − θĉ/(2c)}; according to (37), it is easy to see that ω ∈ (0, 1) and that both coefficients above are nonpositive. Thus, the Armijo inequality (32) holds. □
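The choice ω = min{1 − θ/2, 1 − θĉ/(2c)} can be sanity-checked over the convergence domain Γ in (37). The snippet below (our own check, not from the paper) samples (θ, c) ∈ Γ with ĉ = 1 and confirms ω ∈ (0, 1) and that both coefficients (ω − 1)/θ + 1/2 and ((ω − 1)/θ)c + ĉ/2 are nonpositive:

```python
import numpy as np

c_hat = 1.0
rng = np.random.default_rng(1)

for _ in range(10_000):
    theta = rng.uniform(1e-3, 2.0 - 1e-3)            # theta in (0, 2)
    c = rng.uniform(theta * c_hat / 2 + 1e-6, 10.0)  # c > theta * c_hat / 2
    omega = min(1 - theta / 2, 1 - theta * c_hat / (2 * c))
    assert 0 < omega < 1
    assert (omega - 1) / theta + 0.5 <= 1e-12            # coefficient of ||H(f^{k+1} - f^k)||^2
    assert (omega - 1) / theta * c + c_hat / 2 <= 1e-12  # coefficient of sum ||delta||^2
```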
From Theorems 3.4 and 3.2 and Lemma 3.1, when (θ, c) ∈ Γ, the iteration (31) converges to a stationary point for nonconvex PFs, provided all stationary points are isolated, and converges to the unique minimum for convex PFs, from any initialization. Notice that, in Section 2, all results about VAHQ require c ≥ ĉ, but here the domain of (θ, c) ensuring the convergence of VAHQ is extended to Γ, called the convergence domain of (θ, c), where c can be less than ĉ. In our experiments, Γ is shown to be the largest convergence domain of (θ, c). Allain et al. [22] originally introduced such an extended convergence domain for SAHQ. Distinctively, the ĉ-Lipschitz condition on ∇ϕ is required in [22], while our convergence results for VAHQ require H1 and H2.

3.2. Convergence rate

To consider the linear convergence rate of VAHQ, we restrict our attention to convex PFs. For the scalar HQ methods, the quantitative relationship between the convergence rate and the parameter c is given for θ = 1 in [16]; the qualitative relationship between the convergence rate and the parameter c is given and the optimal parameter θ is suggested in [22]. From [21, Proposition 10.1.4], the linear convergence rate⁶ equals the spectral radius (i.e., the largest-in-magnitude eigenvalue) of the matrix ∇M_{θ,c} at the convergence point f∗ of (31) when the spectral radius is strictly less than 1. Denoting the spectral radius ρ(∇M_{θ,c}(f∗)), we will show the bound of ρ(∇M_{θ,c}(f∗)) on the convergence domain Γ. Furthermore, the convergence speed decreases as the convergence rate increases, and the smallest convergence rate corresponds to the fastest convergence speed. We will show how to choose (θ, c) to obtain the lowest bound of ρ(∇M_{θ,c}(f∗)) for the fastest convergence speed.

Theorem 3.5. Under H1 and H2, suppose ϕ is convex and (3) holds. Let f∗ be the minimum obtained by (31) and (θ, c) ∈ Γ; then

ρ(∇M_{θ,c}(f∗)) < 1.  (38)
Proof. Let λ be an eigenvalue of ∇M_{θ,c}(f*) and u ∈ ℝ^p an associated eigenvector with ∥u∥ = 1. Then

\[
\lambda\, u^T G_c u = u^T G_c \nabla M_{\theta,c}(f^*) u. \tag{39}
\]
From (33) and the definition of \(G_c\) in (25), we have

\[
u^T G_c \nabla M_{\theta,c}(f^*)u = (1-\theta)\beta\, u^T H^T H u + c\, u^T D^T D u
- \hat{c}\,\theta \sum_{i\in\Theta} \|[Du]_i\|^2
- \theta \sum_{i\in\bar{\Theta}} \frac{\varphi'\!\left(\|[Df^*]_i\|\right)}{\|[Df^*]_i\|}\,\|[Du]_i\|^2
+ \theta \sum_{i\in\bar{\Theta}} \left(\frac{\varphi'\!\left(\|[Df^*]_i\|\right)}{\|[Df^*]_i\|} - \varphi''\!\left(\|[Df^*]_i\|\right)\right)\frac{\left\langle [Du]_i,\,[Df^*]_i\right\rangle^2}{\|[Df^*]_i\|^2}. \tag{40}
\]

According to the Cauchy–Schwarz inequality \(\langle [Du]_i,[Df^*]_i\rangle^2/\|[Df^*]_i\|^2 \le \|[Du]_i\|^2\), together with \(\varphi'(t)/t \ge \varphi''(t)\) and \(\varphi''(0)=\hat{c}\), (40)
implies

\[
u^T G_c \nabla M_{\theta,c}(f^*)u \le (1-\theta)\beta\, u^T H^T H u + c\, u^T D^T D u - \hat{c}\,\theta \sum_{i\in\Theta} \|[Du]_i\|^2 - \theta \sum_{i\in\bar{\Theta}} \varphi''\!\left(\|[Df^*]_i\|\right)\|[Du]_i\|^2
= (1-\theta)\beta\, u^T H^T H u + c\, u^T D^T D u - \theta \sum_{i=1}^{p} \varphi''\!\left(\|[Df^*]_i\|\right)\|[Du]_i\|^2. \tag{41}
\]
⁶ For a convergent sequence \(f_n \to f^*\), the linear convergence rate is defined as \(\lim_{n\to\infty} \|f_{n+1}-f^*\|/\|f_n-f^*\|\).
W.-P. Li et al. / Journal of Computational and Applied Mathematics 281 (2015) 152–168
Because \(\varphi'(t)/t \ge \varphi''(t)\) (so the last term of (40) is nonnegative) and \(\lim_{t\to 0^+}\varphi'(t)/t = \hat{c}\), (40) also implies

\[
u^T G_c \nabla M_{\theta,c}(f^*)u \ge (1-\theta)\beta\, u^T H^T H u + c\, u^T D^T D u - \hat{c}\,\theta \sum_{i\in\Theta}\|[Du]_i\|^2 - \theta \sum_{i\in\bar{\Theta}} \frac{\varphi'(\|[Df^*]_i\|)}{\|[Df^*]_i\|}\,\|[Du]_i\|^2
= (1-\theta)\beta\, u^T H^T H u + c\, u^T D^T D u - \theta \sum_{i=1}^{p} \frac{\varphi'(\|[Df^*]_i\|)}{\|[Df^*]_i\|}\,\|[Du]_i\|^2. \tag{42}
\]
Thus,
\[
\frac{(1-\theta)A + (c-\theta\omega_1)B}{A+cB} \;\le\; \lambda \;\le\; \frac{(1-\theta)A + (c-\theta\omega_2)B}{A+cB} \tag{43}
\]

with

\[
A = \beta\, u^T H^T H u,\qquad B = u^T D^T D u,\qquad \omega_1 = \max_{1\le i\le p} \frac{\varphi'(\|[Df^*]_i\|)}{\|[Df^*]_i\|},\qquad \omega_2 = \min_{1\le i\le p} \varphi''(\|[Df^*]_i\|). \tag{44}
\]
From (3), A and B cannot both vanish. Because \(\varphi\) is convex and \((\theta,c)\in\Gamma\), we have \(-1 < 1-\theta < 1\), \(-c < c-\theta\omega_1\) and \(c-\theta\omega_2 \le c\); hence, by (43), \(-1 < \lambda \le 1\). Next we show that \(\lambda \ne 1\) when \(\varphi\) is convex. Since \(\varphi'' \ge 0\), (40) implies

\[
u^T G_c \nabla M_{\theta,c}(f^*)u \le (1-\theta)\beta\, u^T H^T H u + c\, u^T D^T D u - \hat{c}\,\theta \sum_{i\in\Theta}\|[Du]_i\|^2 - \theta \sum_{i\in\bar{\Theta}} \frac{\varphi'(\|[Df^*]_i\|)}{\|[Df^*]_i\|}\left(\|[Du]_i\|^2 - \frac{\langle [Du]_i,[Df^*]_i\rangle^2}{\|[Df^*]_i\|^2}\right) \le (1-\theta)\beta\, u^T H^T H u + c\, u^T D^T D u;
\]
the second inequality is strict, and hence \(\lambda < 1\), except when \(Du = 0\) or \(Du = Df^*\). When \(Du = 0\), (3) gives \(u^T H^T H u > 0\); together with \(1-\theta < 1\) this yields \(\lambda < 1\). When \(Du = Df^* \ne 0\), suppose \(\lambda = 1\); it is then easy to show that

\[
0 = u^T \nabla^2 E(f^*) u = \beta\, u^T H^T H u + \sum_{i=1}^{p} \varphi''(\|[Df^*]_i\|)\|[Du]_i\|^2,
\]

so \(Hu = 0\). However, \(f^*\) is a minimizer of \(E(f)\), so the first-order necessary condition for optimality gives \(\nabla E(f^*) = 0\) and therefore

\[
0 = u^T \nabla E(f^*) = \beta\, u^T H^T (Hf^* - g) + \sum_{i=1}^{p} \frac{\varphi'(\|[Df^*]_i\|)}{\|[Df^*]_i\|}\,\|[Du]_i\|^2.
\]

This contradicts \(u^T H^T (Hf^* - g) = (Hu)^T (Hf^* - g) = 0\), \(\|Du\| > 0\) and \(\varphi'(t)/t > 0\) for every finite \(t\). Hence \(\lambda < 1\). \(\square\)
Similar conclusions hold for both the multiplicative HQ and SAHQ methods when ∇ϕ is ĉ-Lipschitz, as proven in [22]; our proof is entirely different from theirs. As far as we can tell, their proof implicitly requires ϕ to be strictly convex, whereas Theorem 3.5 only requires ϕ to be convex, thanks to the first-order necessary condition for optimality. Moreover, our proof readily yields conclusions on the optimal bound of the convergence rate. Define the bound function of λ:
\[
\rho(\theta, c) = \max\left\{ \frac{\left|(1-\theta)A + (c-\theta\omega_1)B\right|}{A+cB},\; \frac{\left|(1-\theta)A + (c-\theta\omega_2)B\right|}{A+cB} \right\}. \tag{45}
\]
According to the definition of the convergence rate, ρ(θ, c) bounds the convergence rate. We seek the optimal parameter pair (θ*, c*) ∈ Γ that minimizes ρ(θ, c), giving the fastest convergence speed. This optimization problem cannot be solved with gradient methods: ρ(θ, c) is the maximum of two parts, and its optimal solution lies at the intersection of the two parts, where ρ is not differentiable. According to (43), ρ(θ, c) in (45) is equivalent to
\[
\max\left\{ \theta\,\frac{A+\omega_1 B}{A+cB} - 1,\;\; 1 - \theta\,\frac{A+\omega_2 B}{A+cB} \right\}. \tag{46}
\]
The minimum of ρ is attained at the intersection of the two surfaces:

\[
\theta^* = \frac{2A + 2c^*B}{2A + (\omega_1+\omega_2)B}, \qquad (\theta^*, c^*) \in \Gamma. \tag{47}
\]
Inserting (47) into (46), the optimal bound of the convergence rate is

\[
\rho(\theta^*, c^*) = \frac{(\omega_1-\omega_2)B}{2A + (\omega_1+\omega_2)B}. \tag{48}
\]
To the best of our knowledge, such conclusions on the convergence rate have not previously appeared for the scalar additive HQ methods. Some remarks follow:
• Remark 1: From (44), ω₁ and ω₂ depend on the minimum f* obtained by (31), which is unknown, so it is hard to choose the optimal parameter pair (θ*, c*) beforehand. Fortunately, in practice we can loosely choose ω₁ ≲ ĉ and ω₂ ≳ 0 from (4) of H2, and the optimal parameter pair then lies near the line c* = θ*/2, since in our extensive experiments B is usually much larger than A.
• Remark 2: The optimal parameter pair (θ*, c*) achieving the fastest convergence speed is not unique, by (47), and the optimal bound of the convergence rate is a constant, by (48). Strictly speaking, A and B in (44) depend on u, an eigenvector of ∇M_{θ,c}(f*), so the optimal bound of the convergence rate is influenced by (θ, c) indirectly. In our experiment (Table 2), different optimal parameter pairs lead to almost the same convergence rate, so the influence of the particular pair on the optimal convergence rate is inconspicuous.
• Remark 3: In theory, a parameter pair (θ, c) satisfying (47) is optimal for the bound of the convergence rate, not necessarily for the convergence rate itself. In practice, however, as shown in Fig. 4, the theoretical bound (45) is consistent with the practical convergence rate, and their minima are located at the same position; thus the definition (45) is reasonable. Moreover, the linear relationship between the optimal parameters θ* and c* is shown theoretically by (47) and practically in Table 2. Note that (47) is a necessary condition for the optimal convergence rate but not a sufficient one: Table 2 shows that θ = 0.1, 0.2 or 1.9 is not optimal.
• Remark 4: Consider the typical convex PF (10): ĉ = 1, and choosing ω₁ ≲ 1 and ω₂ ≳ 0 gives c* > θ*/2 from the definition of Γ in (37), so the optimal parameter usually satisfies θ* > 1 by (47). This is why we introduce the stepsize parameter θ in (26) for VAHQ.

4. Numerical results

In this section, the restoration performance of convex and nonconvex PFs is compared on piecewise constant images; the computational efficiency of our algorithm is compared with that of MMMG [20]; and the convergence domain and convergence rate results are validated with nonconvex and convex PFs.
All experiments are performed on a PC with an Intel dual-core 1.8 GHz CPU and 2 GB of RAM, under MATLAB.

4.1. Restoration performance comparison

VAHQ alternates (23) and (28), taking ϕ as (10) (convex) and (11) (nonconvex) as examples. The initial point f⁰ is set to the observed image g, and the iteration proceeds until the relative error norm

\[
\frac{\|f^k - f^{k-1}\|}{\|f^k\|} < 10^{-4} \tag{49}
\]

or k > 5000, where k is the iteration step. Denote by k₀ the last step of the iteration; the iteration is regarded as divergent when k₀ = 5000. We compare the restoration performance of the convex PF (10) and the nonconvex PF (11) on the piecewise constant images "TwoCircle" and "Phantom", shown in Figs. 1 and 2 respectively. The observed images are blurred by a Gaussian kernel and corrupted by white noise. The parameter pair (θ, c) of VAHQ is fixed at (1, 0.52), which lies in Γ; the other parameters are given in the captions of Figs. 1 and 2. In our extensive experiments, different parameter pairs (θ, c) in Γ lead to almost the same restoration results when the other parameters are fixed. The isotropic regularization with the convex PF (10) is close to isotropic TV regularization when K is small, and can be called smoothed TV regularization [31]; its restoration result looks the same as that of the Huber-like PF [16], so we choose (10) to represent convex PFs. We can see that the restoration performance of nonconvex PFs is better than that of convex PFs, which is why we emphasize the global convergence of VAHQ for nonconvex PFs such as (11) and advocate them for better restoration. In particular, the Peak Signal-to-Noise Ratio (PSNR⁷) of the nonconvex PF (11) is higher than that of the convex PF (10) on piecewise constant images. However, the nonconvex PF (11) needs more iterations than the convex PF (10), i.e., the convergence speed of nonconvex PFs is slower than that of convex PFs.

⁷ PSNR = \(-20\log_{10}\left(\|f_b - f_r\|/\sqrt{p}\right)\), with \(f_r\) the original image and \(f_b\) the object image.

4.2. Computation efficiency comparison to the MMMG algorithm

In this subsection, we compare the computational efficiency of our algorithm with a recent algorithm [20] suited to isotropic regularization. The multiplicative HQ algorithm needs the inverse of a large-scale matrix, and its computation cost
Fig. 1. Performance comparison with the "TwoCircle" image. (a) Original "TwoCircle" image. (b) Observed image: blurred by a 3 × 3 Gaussian kernel with standard deviation 1 and noised by a normal distribution with mean 0 and standard deviation 10. (c) Restoration with the convex PF (10): β = 1/6, K = 0.5, PSNR = 12.56, k₀ = 141. (d) Restoration with the nonconvex PF (11): β = 1/6, K = 4, PSNR = 18.33, k₀ = 295.
is much higher than that of VAHQ, so we do not consider the multiplicative HQ algorithm here; such a comparison has been carried out in [16]. We compare the computational efficiency of VAHQ with Chouzenoux's MMMG⁸, which also has a low computation cost. The experiment is a restoration problem with the "Brain" image, blurred by a uniform kernel and noised by a normal distribution. The nonconvex PF (11) is chosen for both algorithms. The parameters λ and δ of MMMG in [20] and β and K of VAHQ are adjusted to make the cost function (2) consistent and the restoration performance best; θ and c of VAHQ are tuned for the fastest convergence speed. Both algorithms stop when (49) is satisfied. The CPU time and k₀ of MMMG are 23.0413 s and 178, respectively; those of VAHQ are 6.9264 s and 185. The computation cost of VAHQ is clearly lower than that of MMMG, even though its convergence speed is slightly slower. Moreover, from (c) and (d) in Fig. 3, the PSNR of VAHQ is higher than that of MMMG; from (e), the minimum of the cost function E(f) obtained by VAHQ is lower than that of MMMG, indicating the higher calculation precision of VAHQ.

4.3. Convergence check with nonconvex PFs

We use the nonconvex PF (11) to validate the convergence domain results of Section 3.1 with the "TwoCircle" image. Table 1 shows the iteration number and the PSNR near the interior and exterior boundaries of Γ. We can see that, if

⁸ Code is available at http://www-syscom.univ-mlv.fr/~chouzeno/Recherche.html.
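VAHQ's low cost per iteration stems from the FFT step: under periodic boundary conditions, the quadratic subproblem reduces to a pointwise division in the Fourier domain. The following sketch assumes a system of the form (βHᵀH + cDᵀD)f = r with H a convolution and D the horizontal/vertical first differences; the kernel, image size and parameter values below are illustrative assumptions, not the paper's exact discretization:

```python
import numpy as np

def op_symbol(h, beta, c, shape):
    """Fourier symbol of beta*H^T H + c*D^T D under periodic boundaries."""
    H = np.fft.fft2(h, s=shape)                    # eigenvalues of the blur
    dx = np.zeros(shape); dx[0, 0], dx[0, -1] = 1.0, -1.0
    dy = np.zeros(shape); dy[0, 0], dy[-1, 0] = 1.0, -1.0
    Dx, Dy = np.fft.fft2(dx), np.fft.fft2(dy)      # eigenvalues of differences
    return beta * np.abs(H) ** 2 + c * (np.abs(Dx) ** 2 + np.abs(Dy) ** 2)

def fft_quadratic_solve(r, h, beta, c):
    """One FFT solve of (beta*H^T H + c*D^T D) f = r.
    Assumes beta > 0 and a kernel h with nonzero sum, so the symbol is positive."""
    S = op_symbol(h, beta, c, r.shape)
    return np.real(np.fft.ifft2(np.fft.fft2(r) / S))

# Round trip: apply the operator in the Fourier domain, then invert it.
rng = np.random.default_rng(0)
f = rng.standard_normal((16, 16))
h = np.ones((3, 3)) / 9.0        # illustrative uniform blur kernel
beta, c = 0.1, 0.52
S = op_symbol(h, beta, c, f.shape)
r = np.real(np.fft.ifft2(S * np.fft.fft2(f)))
f_rec = fft_quadratic_solve(r, h, beta, c)
```

Three FFTs per quadratic step replace the large matrix inverse needed by multiplicative HQ, which is the source of the CPU-time gap reported above.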
Fig. 2. Performance comparison with the "Phantom" image. (a) Original "Phantom" image. (b) Observed image: blurred by a 3 × 3 Gaussian kernel with standard deviation 2 and noised by a normal distribution with mean 0 and standard deviation 2. (c) Restoration with the convex PF (10): β = 1/6, K = 0.5, PSNR = 31.72, k₀ = 434. (d) Restoration with the nonconvex PF (11): β = 1/6, K = 2, PSNR = 38.28, k₀ = 1289.
θ = −0.1, 0 or 2.1, the iteration number is 5000 or the PSNR is abnormal (NaN, −∞ or <10), which means VAHQ diverges or fails. If 0 < θ ≤ 2, then when c ≥ θ/2 the iteration number is less than 5000 and the PSNR is normal (>10), so VAHQ converges; when c < θ/2 the iteration number is 5000, so VAHQ diverges. Thus VAHQ converges or diverges when the parameter pair (θ, c) lies just inside or just outside the boundary of Γ, respectively, which is consistent with Theorem 3.4. Moreover, this numerical result suggests hypotheses such as Γ being the maximal convergence domain in a certain sense. However, VAHQ also converges on the boundary of Γ when θ ≠ 0, which is not covered by our result; this may stem from the discretization.

4.4. Convergence rate check with convex PFs

We use the convex PF (10) to validate the convergence rate results of Section 3.2 with the "TwoCircle" image. In this subsection, the termination criterion of VAHQ is changed to
\[
\frac{\|f^k - f^{k-1}\|}{\|f^k\|} < 10^{-10}.
\]
Fig. 3. Computation efficiency comparison. (a) The original "Brain" image, sized 256 × 256. (b) The observed image: blurred by a 5 × 5 uniform kernel and noised by a normal distribution with mean 0 and standard deviation 4. (c) The restored image by MMMG with δ = K = 20, λ = δ²β/2 = 20 in [20]. (d) The restored image by VAHQ with K = 20, β = 1/10, θ = 1, c = 0.52. (e) The cost function (2) and (f) the relative error norm (49) as functions of the iteration step for the two tested algorithms.
The relative error threshold is chosen this small in order to obtain the practical convergence rate near the optimal solution. The practical convergence rate is approximated by

\[
\frac{\|f^{k_0} - f^{k_0-1}\|}{\|f^{k_0-1} - f^{k_0-2}\|}. \tag{50}
\]

The model parameters β = 1/6 and K = 0.5 are adjusted for the best PSNR.
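The estimator (50) can be checked on a synthetic, exactly linearly convergent sequence with a known rate (the sequence below is illustrative; its rate is set to the measured 0.8364 only for concreteness):

```python
import numpy as np

def practical_rate(f_seq):
    """Estimate the linear convergence rate by (50) from the last three iterates."""
    a = np.linalg.norm(f_seq[-1] - f_seq[-2])
    b = np.linalg.norm(f_seq[-2] - f_seq[-3])
    return a / b

# Synthetic sequence f_k = f* + rho^k * v, whose exact linear rate is rho.
f_star = np.array([1.0, -2.0, 0.5])
v = np.array([3.0, 1.0, -1.0])
rho = 0.8364
seq = [f_star + rho ** k * v for k in range(30)]
est = practical_rate(seq)
```

For this sequence, f^k − f^{k−1} = ρ^{k−1}(ρ − 1)v, so the ratio in (50) recovers ρ exactly; on real iterates it is only an approximation near f*.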
Table 1
The convergence behavior near the boundary of Γ. θ is chosen as −0.1, 0.0, . . . , 2.1, and c as θ/2 − 0.1, θ/2, θ/2 + 0.1 for each fixed θ. In each cell, the first number is the terminating step and the second the PSNR; PSNR = NaN or −∞ indicates divergence of VAHQ, and a terminating step of 1 means VAHQ was not performed.
(each cell: terminating step / PSNR)

               θ = −0.1   0.0      0.1         0.2         0.3         0.4         0.5         0.6
c = θ/2 − 0.1  685/NaN    1/6.77   252/NaN     5000/NaN    5000/20.59  5000/21.36  5000/21.49  5000/21.76
c = θ/2        264/NaN    1/6.77   2585/21.01  2575/21.01  2560/21.08  2554/21.08  2552/21.08  2551/21.08
c = θ/2 + 0.1  5000/−∞    1/6.77   5000/19.78  4968/21.04  4151/21.05  3746/21.06  3505/21.06  3345/21.06

               θ = 0.7    0.8         0.9         1.0         1.1         1.2         1.3         1.4
c = θ/2 − 0.1  5000/21.69 5000/21.78  5000/21.85  5000/21.90  5000/21.89  5000/21.92  5000/21.95  5000/21.94
c = θ/2        2550/21.08 2550/21.08  2550/21.08  2549/21.08  2549/21.08  2549/21.08  2549/21.08  2549/21.08
c = θ/2 + 0.1  3230/21.06 3145/21.07  3078/21.07  3025/21.07  2982/21.07  2945/21.07  2915/21.07  2888/21.07

               θ = 1.5    1.6         1.7         1.8         1.9         2.0         2.1
c = θ/2 − 0.1  5000/21.94 5000/21.94  5000/21.94  5000/21.92  5000/21.92  5000/21.85  5000/−∞
c = θ/2        2549/21.08 2549/21.08  2548/21.08  2548/21.08  2548/21.08  2549/21.08  5000/−∞
c = θ/2 + 0.1  2866/21.07 2846/21.07  2828/21.07  2812/21.07  2799/21.07  2786/21.07  5000/−∞
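Assuming Γ = {(θ, c) : 0 < θ < 2, c > θ/2}, which is our reading of the definition (37) given earlier in the paper, the convergence pattern of Table 1 reduces to a simple membership test:

```python
def in_gamma(theta, c):
    """Membership in the open domain Gamma = {0 < theta < 2, c > theta/2}
    (our reading of (37)). The boundary itself is excluded by the theory,
    although Table 1 observes convergence on it when theta != 0."""
    return 0.0 < theta < 2.0 and c > theta / 2.0
```

The test below mirrors Table 1: interior pairs converge, pairs below the line c = θ/2 or with θ outside (0, 2) do not lie in Γ, and the boundary pair (1.0, 0.5) is excluded despite the observed convergence there.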
Fig. 4. Convergence rate surfaces for the convex PF (10). (a) The theoretical bound surface of the convergence rate defined by (45). (b) The practical convergence rate surface defined by (50). The 19 × 19 grid is formed by dividing θ and c in Γ: 0.1 ≤ θ ≤ 1.9 is divided into 19 parts, and θ/2 + 0.05 ≤ c ≤ θ/2 + 0.95 into 19 parts for each θ. The minimum theoretical bound of the convergence rate is 0.8465, located at (θ, c) = (1.3, 0.70), approximately satisfying the optimal parameter line (51); such optimal points cannot be reached exactly because of the coarse grid. The minimum practical convergence rate is 0.8364, also located at (θ, c) = (1.3, 0.70).
According to (47), the theoretical optimal parameter line is

\[
\theta^* = \frac{0.2 + 10c^*}{5.525}, \tag{51}
\]

which is near the line θ* = 2c*. Fig. 4 shows the theoretical bound surface of the convergence rate and the practical convergence rate surface obtained by coarsely gridding the convergence domain Γ. The two surfaces share an analogous shape, which supports the rationality of the bound function (45) of the convergence rate. The practical optimal convergence rate is 0.8364, slightly less than the theoretical optimal bound 0.8465; both are located at (θ, c) = (1.3, 0.70), which approximately satisfies (51). It cannot satisfy (51) exactly because of the coarse gridding.
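Line (51) is consistent with (47): dividing the numerator and denominator of (47) by B gives θ* = (2a + 2c*)/(2a + ω₁ + ω₂) with a = A/B, and the choices a = 0.02 and ω₁ + ω₂ = 1.065 — back-solved by us from (51), not stated in the paper — reproduce it exactly, including the reported optimum near (θ, c) = (1.3, 0.70):

```python
def theta_opt_line(c):
    """Optimal parameter line (51): theta* = (0.2 + 10*c*) / 5.525."""
    return (0.2 + 10.0 * c) / 5.525

def theta_opt_47(c, a=0.02, w_sum=1.065):
    """Equivalent form of (47) divided by B, with a = A/B and w_sum = w1 + w2.
    The default values are back-solved from (51), not given in the paper."""
    return (2.0 * a + 2.0 * c) / (2.0 * a + w_sum)

theta_at_070 = theta_opt_line(0.70)   # close to the reported optimum theta = 1.3
```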
Table 2
Statistics of the practical convergence rate and the iteration number k₀ under different parameter pairs (θ, c). In each two-number cell, the upper number is the practical convergence rate and the lower one the iteration number. In each row (θ fixed), the minimal practical convergence rate and the minimal iteration number are emphasized in bold and italic style, respectively.
To examine the practical iterative behavior near the optimal parameter line in detail, Table 2 lists the practical convergence rate computed with (50) and the iteration number k₀ on a dense grid, with c ranging from θ/2 + 0.01 to θ/2 + 0.1 for each θ from 0.1 to 1.9. We can see that the practical optimal parameter c is linear in θ, in accordance with (47). The practical optimal convergence rate is approximately a constant around 0.835–0.840 (corresponding to a minimal iteration number of about 100) and is independent of the position of the optimal parameter pair (θ*, c*), in accordance with (48). Besides, when θ is relatively small (θ < 1) or relatively large (θ = 1.9), the practical convergence rate cannot reach the optimal value even when (47) is satisfied, so (47) is only a necessary condition for the optimal convergence rate. Notably, the optimal parameter c is usually less than ĉ = 1, which lies outside the basic convergence domain c ≥ ĉ of the classical additive HQ methods [13,16,32].

5. Conclusion

We have proposed a vectorial additive half-quadratic algorithm to minimize a class of cost functions with general isotropic regularization; the algorithm consists of alternating minimizations and can be implemented at low cost. We then established its global convergence behavior on an extended parameter domain, for both convex and nonconvex PFs. Furthermore, we gave an analytical way of choosing the parameters that attains the optimal bound of the convergence rate for convex PFs.

References
[1] G. Aubert, P. Kornprobst, Mathematical Problems in Image Processing, second ed., Springer-Verlag, Berlin, 2006.
[2] A.K. Jain, Fundamentals of Digital Image Processing, Prentice-Hall, Englewood Cliffs, 1989.
[3] A. Tikhonov, V. Arsenin, Solutions of Ill-Posed Problems, Prentice-Hall, Washington, 1977.
[4] S. Geman, D. Geman, Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Trans. Pattern Anal. Mach. Intell. 6 (1984) 721–741.
[5] Y. Wang, J. Yang, W. Yin, Y. Zhang, A new alternating minimization algorithm for total variation image reconstruction, SIAM J. Imag. Sci. 1 (2008) 248–272.
[6] L. Rudin, S. Osher, E. Fatemi, Nonlinear total variation based noise removal algorithms, Physica D 60 (1992) 259–268.
[7] M. Nikolova, M.K. Ng, C.P. Tam, Fast nonconvex nonsmooth minimization methods for image restoration and reconstruction, IEEE Trans. Image Process. 19 (2010) 3073–3088.
[8] C. Bouman, K. Sauer, A generalized Gaussian image model for edge-preserving MAP estimation, IEEE Trans. Image Process. 2 (1993) 296–310.
[9] C.R. Vogel, M.E. Oman, Iterative methods for total variation denoising, SIAM J. Sci. Comput. 17 (1996) 227–238.
[10] T. Hebert, R. Leahy, A generalized EM algorithm for 3-D Bayesian reconstruction from Poisson data using Gibbs priors, IEEE Trans. Med. Imaging 8 (1989) 194–202.
[11] P. Perona, J. Malik, Scale-space and edge detection using anisotropic diffusion, IEEE Trans. Pattern Anal. Mach. Intell. 12 (1990) 629–639.
[12] S. Geman, G. Reynolds, Constrained restoration and the recovery of discontinuities, IEEE Trans. Pattern Anal. Mach. Intell. 14 (1992) 367–383.
[13] D. Geman, C.D. Yang, Nonlinear image recovery with half-quadratic regularization, IEEE Trans. Image Process. 4 (1995) 932–946.
[14] M. Nikolova, M.K. Ng, S. Zhang, W. Ching, Efficient reconstruction of piecewise constant images using nonsmooth nonconvex minimization, SIAM J. Imag. Sci. 1 (2008) 2–25.
[15] W.P. Li, Z.M. Wang, Y. Deng, Efficient algorithm for nonconvex minimization and its application to PM regularization, IEEE Trans. Image Process. 21 (2012) 4322–4333.
[16] M. Nikolova, M.K. Ng, Analysis of half-quadratic minimization methods for signal and image recovery, SIAM J. Sci. Comput. 27 (2005) 937–966.
[17] C. Labat, J. Idier, Convergence of conjugate gradient methods with a closed-form stepsize formula, J. Optim. Theory Appl. 136 (2008) 43–60.
[18] M. Zibulevsky, M. Elad, ℓ1–ℓ2 optimization in signal and image processing, IEEE Signal Process. Mag. 27 (2010) 76–88.
[19] E. Chouzenoux, J. Idier, S. Moussaoui, Majorize–minimize strategy for subspace optimization applied to image restoration, IEEE Trans. Image Process. 20 (2011) 1517–1528.
[20] E. Chouzenoux, A. Jezierska, J.C. Pesquet, H. Talbot, A majorize–minimize subspace approach for ℓ2–ℓ0 image regularization, SIAM J. Imag. Sci. 6 (2013) 563–591.
[21] J. Ortega, W. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, SIAM, Philadelphia, 2000.
[22] M. Allain, J. Idier, Y. Goussard, On global and local convergence of half-quadratic algorithms, IEEE Trans. Image Process. 15 (2006) 1130–1142.
[23] P. Charbonnier, L. Blanc-Féraud, G. Aubert, M. Barlaud, Deterministic edge-preserving regularization in computed imaging, IEEE Trans. Image Process. 6 (1997) 298–311.
[24] T.F. Chan, P. Mulet, On the convergence of the lagged diffusivity fixed point method in total variation image restoration, SIAM J. Numer. Anal. 36 (1999) 354–367.
[25] R.T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, New Jersey, 1970.
[26] M.K. Ng, R.H. Chan, W. Tang, A fast algorithm for deblurring models with Neumann boundary conditions, SIAM J. Sci. Comput. 21 (1999) 851–866.
[27] Y. Wang, J. Yang, W. Yin, Y. Zhang, A fast algorithm for edge-preserving variational multichannel image restoration, SIAM J. Imag. Sci. 2 (2009) 569–592.
[28] M. Nikolova, R.H. Chan, The equivalence of half-quadratic minimization and the gradient linearization iteration, IEEE Trans. Image Process. 15 (2006) 1130–1142.
[29] D. Bertsekas, Nonlinear Programming, second ed., Athena Scientific, Belmont, Massachusetts, 1999.
[30] A.H. Delaney, Y. Bresler, Globally convergent edge-preserving regularized reconstruction: an application to limited-angle tomography, IEEE Trans. Image Process. 7 (1998) 204–221.
[31] J.F. Aujol, Some first-order algorithms for total variation based image restoration, J. Math. Imaging Vision 34 (2009) 307–327.
[32] J. Idier, Convex half-quadratic criteria and interacting auxiliary variables for image restoration, IEEE Trans. Image Process. 10 (2001) 1001–1009.