To appear in: Information Sciences (2017). doi: 10.1016/j.ins.2017.03.023
An Adaptive Fixed-Point Proximity Algorithm for Solving Total Variation Denoising Models
Jin-He Wang$^{a,b}$, Fan-Yun Meng$^{b,\dagger}$, Li-Ping Pang$^{c}$, Xing-Hua Hao$^{c}$

$^{a}$ School of Engineering, Huzhou University, Huzhou 313000, China.
$^{b}$ Computer Engineering Institute, Qingdao Technological University, Qingdao 266033, China.
$^{c}$ School of Mathematical Sciences, Dalian University of Technology, Dalian 116024, China.
Abstract  We study an adaptive fixed-point proximity algorithm for solving the total variation denoising model. The objective function is the sum of two convex functions, one of which is composed with an affine transformation and usually serves as a regularization term. By decoupling and splitting, the problem is transformed into two subproblems. Because the subproblems are nonsmooth and nondifferentiable, we solve their proximity minimization problems instead of the original ones. To overcome the "staircase" effect that arises during denoising, an adaptive criterion on the proximity parameters is put forward. Finally, we apply the improved algorithm to solve the isotropic total variation denoising model; numerical results are given to illustrate the efficiency of the algorithm.

Keywords  ROF total variation, fixed-point algorithm, split Bregman, image denoising.
2000 MR Subject Classification  65T60, 90C25, 68U10.

1. Introduction
We consider a class of unconstrained optimization problems of the form
$$\min\{\Phi(x) + \Psi(Bx) : x \in \mathbb{R}^n\}, \tag{1.1}$$
where $\Phi : \mathbb{R}^n \to \overline{\mathbb{R}}$ and $\Psi : \mathbb{R}^m \to \overline{\mathbb{R}}$ are proper lower semicontinuous (l.s.c.) convex functions, $B$ is an $m \times n$ matrix, and $\overline{\mathbb{R}} := \mathbb{R} \cup \{+\infty\}$. The model (1.1) covers a wide variety of optimization problems arising in image and signal processing. A familiar example is the Rudin-Osher-Fatemi (ROF) total variation model in image denoising [29]; other examples include high-resolution image reconstruction [6], [32], [33], the total variation based impulsive noise removal model [22] and the multiresolution sparse regularized model [11]. The difficulty in solving (1.1) is the nondifferentiability of the objective function. For example, the classic ROF total variation model [30] minimizes the sum of an $\ell_2$ ($\ell_1$) fidelity term and a total variation (TV) regularization, which is nondifferentiable. There are great challenges in solving such problems simply and efficiently.
* This research is partially supported by the Huzhou science and technology plan, No. 2016GY03, and the Natural Science Foundation of China, Grants 11501074 and 31271077.
† Corresponding author. Email: [email protected]
Many ideas have been put forward to address these issues. For different total variation models, various methods have been proposed, including fixed-point algorithms, smooth approximation methods, subgradient descent methods, interior point methods, second-order cone programming, and primal-dual approaches based on Newton methods; see [3], [4], [5], [7], [13], [15], [25]. Since the TV norm measures an image through its differences, it is insensitive to the discontinuous parts of the image. Most algorithms for solving TV models are effective only on the salient features of an image: the edges and highlighted details are well preserved, but there is trouble in the regions where the features of the image are not obvious, and a loss appears in the relatively smooth parts of the image. This phenomenon is usually called the "staircase" effect. When a method such as the fourth-order partial differential equation [12] is applied to smooth an image, it may smooth the image excessively and seriously degrade its quality. Hence, it is necessary to improve the existing algorithms so as to weaken the "staircase" effect in the process of denoising.
Recently, the split Bregman iteration algorithm [14] has been successfully utilized to solve the total variation model. Its main idea is to "decouple" the composite function from the original problem, so that the problem is decomposed into a differentiable portion and a nondifferentiable portion. One can use Gauss-Seidel iterations to handle the differentiable part, while the nondifferentiable portion can be solved by shrinkage operators; for more details on this aspect, see [2] and [27]. The advantage of split Bregman iteration algorithms is that they converge quickly, with few iterations. However, after most of the noise is eliminated in the first few steps, many more steps are needed to ensure the smoothness of the image. If an algorithm guarantees the smoothness of the image in the early steps, then the computation of the later iterations is reduced while the quality of the image is maintained. To achieve this goal, we minimize a proximity subproblem instead of the original one. This not only smooths the image but also improves the computational speed of the algorithm.
The idea of proximity minimization is popular and extensively applied to solve structured optimization problems [16], [17]. Recently many scholars have exploited this strategy to deal with models arising in image processing; see [10], [18], [19], [20], [24], [34]. The proximity problem inherits the advantages of the original problem while overcoming its disadvantages. From a practical point of view, on the one hand, the proximity term can be seen as a regularization: it strengthens the local convexity of the nonsmooth problem, and the solution is more easily obtained. On the other hand, by constantly adjusting the proximity parameters, the sequence generated by the algorithm approaches a stationary point quickly; this has the same effect as the steepest descent method, and the convergence rate of the algorithm is maintained. Hence, the idea of proximity minimization helps to design an algorithm with low cost and high efficiency. Due to the nondifferentiability of the nonsmooth subproblems, we employ the proximity idea to deal with the presented problem. In this case, we can still transform the problem into a familiar fixed-point problem. By adjusting the proximity parameters adaptively during the iterations, the "staircase" effect is weakened in the process of denoising while the convergence rate of the algorithm is kept.

The main contribution of this paper is to propose an adaptive fixed-point proximity algorithm to solve the problem (1.1). Using the techniques of decoupling and splitting, the optimization problem is decomposed into several subproblems. Because of the nondifferentiability and nonsmoothness of the problem, we solve its proximity minimization problem instead of the original one. Furthermore, to weaken the "staircase" effect, an adaptive tactic on the proximity parameters is put forward. The convergence of the algorithm is also proved under some assumptions.

We organize this paper as follows. In section 2, some preliminaries on various models in image processing are first provided, and then the properties of subdifferentials and proximity operators are introduced. We improve the split Bregman iteration algorithm and apply it to the anisotropic ROF total variation model in section 3. To overcome the "staircase" effect during the process of denoising, in section 4 we propose an adaptive fixed-point proximity algorithm, and the convergence of the algorithm is obtained by choosing proper parameters. Section 5 applies the proposed algorithm to solve the isotropic total variation model, and some numerical results for removing Gaussian noise from an image are reported. Finally, a conclusion is provided in section 6.

2. Preliminaries
In this section, we review some total variation models widely used in image processing, which are special instances of (1.1). In addition, some preliminaries on subdifferentials and proximity operators are given.

We begin by introducing some notation used in this paper. For $x$ and $y$ in $\mathbb{R}^n$, denote by $\langle x, y\rangle := \sum_{i=1}^{n} x_iy_i$ the inner product of two vectors. The $\ell_2$-norm is defined by $\|x\|_2 := \langle x, x\rangle^{1/2}$. For a positive definite matrix $H$, the weighted inner product is defined by $\langle x, y\rangle_H := \langle x, Hy\rangle$; correspondingly, the weighted $\ell_2$-norm is defined by $\|x\|_H := \langle x, x\rangle_H^{1/2}$. The $\ell_1$-norm is given by $\|x\|_1 := \sum_{i\in I} |x_i|$, where $I := \{1, 2, \dots, n\}$.

By properly choosing $\Phi$ and $\Psi$, we obtain several total variation models in image processing, as follows.

The $\ell_2$-TV image denoising model [14]: This model, also known as the ROF model, has the form
$$\min_u \frac{\lambda}{2}\|u - f\|_2^2 + \|u\|_{TV}, \tag{2.2}$$
where $f$ is the noisy image to be denoised and $\|u\|_{TV}$ denotes the total variation of $u$; it is designed for Gaussian noise removal. In this model, $\Phi := \frac{\lambda}{2}\|\cdot - f\|_2^2$ and $\Psi := \|\cdot\|_{TV}$. There are two definitions of the TV semi-norm, employed in different scenarios in image processing. One is the anisotropic TV, where $\Psi$ is the composition of an $\ell_1$-norm with a matrix $B$; the other is the isotropic TV, where $\Psi$ is given by a combination of the $\ell_2$-norm and $B$. The matrix $B$ is usually a first-order difference matrix. For more details on the choices of $\Psi$ and $B$, see [4] and [28]. Both discretizations can be computed directly from finite differences of the image, as in the sketch below.
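As a concrete illustration (our own sketch, not part of the original paper), the two discrete TV semi-norms can be evaluated from the image's finite differences; the pairing of the two difference fields in the isotropic case is one common convention and an assumption here:

```python
import numpy as np

def tv_seminorms(u):
    """Anisotropic and isotropic discrete TV of a 2-D image array,
    using forward differences; the boundary handling is one common choice."""
    gx = np.diff(u, axis=1)  # horizontal differences, shape (m, n-1)
    gy = np.diff(u, axis=0)  # vertical differences, shape (m-1, n)
    aniso = np.abs(gx).sum() + np.abs(gy).sum()          # ||Bu||_1
    iso = np.sqrt(gx[:-1, :]**2 + gy[:, :-1]**2).sum()   # sum of pixelwise 2-norms
    return aniso, iso
```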
Nikolova et al. [22], [23] proposed to replace the $\ell_2$-norm fidelity term by an $\ell_1$-norm. Compared with the $\ell_2$-norm, the $\ell_1$-norm is insensitive to extreme values when the elements of an image matrix are disturbed by noise. Hence, this model is usually applied to impulsive noise removal.

The $\ell_1$-TV image denoising model [29]: This model has the form
$$\min_u \frac{\lambda}{2}\|u - f\|_1 + \|u\|_{TV}.$$
The above total variation models are mainly used to remove pure noise from an image. The next two models are extensions of the above models and are often applied to restore a blurred image corrupted by Gaussian noise.

The $\ell_2$-TV image restoration model [8]:
$$\min_u \frac{\lambda}{2}\|Ku - f\|_2^2 + \|u\|_{TV},$$
where $K$ is an $n \times n$ matrix representing a convolution kernel. Similarly, replacing the $\ell_2$-norm by the $\ell_1$-norm, we obtain the following model to restore an image corrupted by impulsive noise.

The $\ell_1$-TV image restoration model [19]:
$$\min\Big\{\frac{\lambda}{2}\|Ku - f\|_1 + \|u\|_{TV} : u \in \mathbb{R}^d\Big\}.$$
These models are all special cases of (1.1), and in this paper we mainly apply the considered algorithm to solve the $\ell_2$-TV image denoising model.

Next we review some definitions concerning subdifferentials and proximity operators, which play an important role in the algorithm and its convergence analysis. The subdifferential of a convex function $\phi$ at a point $x \in \mathbb{R}^n$ is the set defined by
$$\partial\phi(x) := \{y \in \mathbb{R}^n : \phi(z) \ge \phi(x) + \langle y, z - x\rangle,\ \forall z \in \mathbb{R}^n\}.$$
The conjugate function of $\phi$ is $\phi^*$, defined by $\phi^*(y) := \sup_x\{\langle x, y\rangle - \phi(x)\}$. A familiar characterization of subdifferentials relating a function and its conjugate is that for $x \in \mathrm{dom}\,\phi$ and $y \in \mathrm{dom}\,\phi^*$,
$$y \in \partial\phi(x) \iff x \in \partial\phi^*(y). \tag{2.3}$$
For a convex function $\varphi$, the proximity operator of $\varphi$ with respect to a positive semi-definite matrix $H$ at $x$ is defined by
$$\mathrm{prox}_{\varphi,H}(x) := \arg\min\Big\{\frac{1}{2}\|u - x\|_H^2 + \varphi(u) : u \in \mathbb{R}^d\Big\}.$$
If $\varphi$ is strictly convex and coercive, then its proximity operator exists and is unique [7]. If the positive definite matrix $H$ is the identity matrix $I$, the above proximity operator reduces to
$$\mathrm{prox}_{\varphi,I}(x) := \arg\min\Big\{\frac{1}{2}\|u - x\|_2^2 + \varphi(u) : u \in \mathbb{R}^d\Big\},$$
and $\mathrm{prox}_\varphi(\cdot)$ is short for $\mathrm{prox}_{\varphi,I}(\cdot)$. The relation between subdifferentials and proximity operators is that for a convex function $f$ on $\mathbb{R}^n$, $x$ in the domain of $f$ and $y \in \mathbb{R}^n$,
$$y \in \partial f(x) \iff x = \mathrm{prox}_f(x + y),$$
which is frequently used to prove the convergence of the algorithm.
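For a concrete instance of a proximity operator, consider $\varphi = \gamma\|\cdot\|_1$: its proximity operator is componentwise soft-thresholding, a standard fact we use here for illustration (the sketch and its sanity check are ours, not the paper's):

```python
import numpy as np

def prox_l1(x, gamma):
    """Proximity operator of gamma*||.||_1: componentwise soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

# Sanity check against the defining minimization on a random instance:
rng = np.random.default_rng(0)
x, gamma = rng.normal(size=5), 0.7
obj = lambda u: 0.5 * np.sum((u - x) ** 2) + gamma * np.sum(np.abs(u))
u_star = prox_l1(x, gamma)
for _ in range(1000):  # random perturbations never improve the objective
    assert obj(u_star) <= obj(u_star + 0.1 * rng.normal(size=5)) + 1e-12
```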
At the end of this section, we give some preliminaries about fixed points. For an operator $\mathcal{P} : \mathbb{R}^n \to \mathbb{R}^n$, a point $x \in \mathbb{R}^n$ satisfying $x = \mathcal{P}(x)$ is called a fixed point of $\mathcal{P}$. In general, we choose a convergent iteration to compute the fixed point of an operator. Choosing a starting point $x^0 \in \mathbb{R}^n$, the iteration is given by
$$x^{k+1} = \mathcal{P}(x^k),$$
and a sequence $\{x^k : k \in \mathbb{N}\}$ is obtained, known as the Picard sequence of $\mathcal{P}$. To analyze the convergence of Picard sequences, we introduce the definition of non-expansive operators [9]. An operator $\mathcal{P}$ satisfying
$$\|\mathcal{P}(x) - \mathcal{P}(y)\|_2 \le \|x - y\|_2$$
is called non-expansive. If $\mathcal{P}$ is non-expansive, we can obtain its fixed point by iterating its $K$-averaged operator, defined by
$$\mathcal{P}_K = KI + (1 - K)\mathcal{P},$$
where $K \in (0, 1)$. Opial [26] proves that the Picard sequence of the $K$-averaged operator converges to a fixed point of $\mathcal{P}$. Moreover, the Picard sequence of $\mathcal{P}$ itself converges to a fixed point if $\mathcal{P}$ is strongly non-expansive, that is,
$$\|\mathcal{P}(x) - \mathcal{P}(y)\|_2^2 \le \langle x - y, \mathcal{P}(x) - \mathcal{P}(y)\rangle.$$
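The difference between iterating $\mathcal{P}$ and iterating $\mathcal{P}_K$ can be seen on a toy example (ours, for illustration only): a rotation by $90^\circ$ is non-expansive with unique fixed point $0$, yet its Picard sequence merely circles, while the averaged operator converges:

```python
import numpy as np

R = np.array([[0.0, -1.0], [1.0, 0.0]])  # rotation by 90 degrees: non-expansive
P = lambda x: R @ x

def picard(op, x0, iters=200):
    x = x0
    for _ in range(iters):
        x = op(x)
    return x

K = 0.5
P_K = lambda x: K * x + (1 - K) * P(x)   # K-averaged operator

x0 = np.array([1.0, 0.0])
print(picard(P, x0))    # stays on the unit circle: no convergence
print(picard(P_K, x0))  # converges to the fixed point 0
```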
3. The Split Bregman Iteration Algorithm

The split Bregman iteration algorithm performs very well on $\ell_1$-regularized problems. [14] provided an approach for applying the split Bregman algorithm to a general optimization problem, which stimulates us to exploit the split Bregman algorithm for the considered problem. As an application to the anisotropic ROF total variation model, some numerical experiments are given to analyze the advantages and disadvantages of the algorithm.

We start with the concept of the "Bregman distance" [1], which is associated with a convex function $F$ and given by
$$D_F^p(u, v) := F(u) - F(v) - \langle p, u - v\rangle,$$
where $p$ is in the subdifferential of $F$ at $v$. This concept is primarily used to solve the constrained optimization problem
$$\min_u F(u) \quad \text{s.t. } Bu = b. \tag{3.4}$$
Using a quadratic penalty function, we change the problem (3.4) into the unconstrained problem $\min_u\{F(u) + \frac{\lambda}{2}\|Bu - b\|_2^2\}$, where $\lambda$ is a constant. [14] gave the following Bregman iteration:
$$u^{k+1} = \arg\min_u\Big\{D_F^{p^k}(u, u^k) + \frac{\lambda}{2}\|Bu - b\|_2^2\Big\} = \arg\min_u\Big\{F(u) - \langle p^k, u - u^k\rangle + \frac{\lambda}{2}\|Bu - b\|_2^2\Big\},$$
$$p^{k+1} = p^k - \lambda B^T(Bu^{k+1} - b).$$
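The following toy run (our illustration, not an experiment from the paper) applies this Bregman iteration to $\min \frac{1}{2}\|u\|_2^2$ s.t. $Bu = b$, where the inner minimization has a closed form; the iterates approach the least-norm solution of the constraint:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.normal(size=(3, 6))
b = rng.normal(size=3)
lam = 1.0
u, p = np.zeros(6), np.zeros(6)   # p is a subgradient of F(u) = 0.5*||u||^2
M = np.eye(6) + lam * B.T @ B     # normal matrix of the inner problem
for _ in range(300):
    # u^{k+1} = argmin F(u) - <p^k, u - u^k> + (lam/2)||Bu - b||^2
    u = np.linalg.solve(M, p + lam * B.T @ b)
    # p^{k+1} = p^k - lam * B^T (B u^{k+1} - b)
    p = p - lam * B.T @ (B @ u - b)

print(np.linalg.norm(B @ u - b))                  # constraint residual -> small
print(np.linalg.norm(u - np.linalg.pinv(B) @ b))  # close to the least-norm solution
```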
A simplified Bregman iteration is
$$u^{k+1} = \arg\min_u\Big\{F(u) + \frac{\lambda}{2}\|Bu - b + b^k\|_2^2\Big\}, \qquad b^{k+1} = b^k + (Bu^{k+1} - b).$$
Goldstein and Osher [14] proposed the split Bregman algorithm, which is an improvement of Bregman algorithms. The difficulty in solving the ROF total variation model is the nondifferentiability of the TV semi-norm, which is the composition of a linear mapping with an $\ell_1$-norm ($\ell_2$-norm). The "decoupling" technique in split Bregman algorithms splits the differentiable part and the nondifferentiable part of the objective function, which helps to solve regularized $\ell_1$ problems more quickly. The problem (1.1) is equivalent to the constrained optimization problem
$$\min_{u,v}\{\Phi(u) + \Psi(v) : v = Bu\},$$
which we change into the following unconstrained problem:
$$\min_{u,v}\Big\{\Phi(u) + \Psi(v) + \frac{\lambda}{2}\|v - Bu\|_2^2\Big\}.$$
Applying the Bregman iteration, we obtain
$$(u^{k+1}, v^{k+1}) = \arg\min_{u,v}\Big\{\Phi(u) + \Psi(v) + \frac{\lambda}{2}\|v - Bu - b^k\|_2^2\Big\}, \qquad b^{k+1} = b^k + (Bu^{k+1} - v^{k+1}).$$
To decouple the above problem, we minimize it iteratively with respect to $u$ and $v$ separately:
$$u^{k+1} = \arg\min_u\Big\{\Phi(u) + \frac{\lambda}{2}\|v^k - Bu - b^k\|_2^2\Big\},$$
$$v^{k+1} = \arg\min_v\Big\{\Psi(v) + \frac{\lambda}{2}\|v - Bu^{k+1} - b^k\|_2^2\Big\}.$$
Since the nondifferentiable part $\Psi(\cdot)$ has been "split" out, the minimization defining $u^{k+1}$ is differentiable. Hence, the split Bregman iteration algorithm solves the following two subproblems alternately and iteratively:

Subproblem 1:
$$u^{k+1} = \arg\min_u\Big\{\Phi(u) + \frac{\lambda}{2}\|v^k - Bu - b^k\|_2^2\Big\}, \qquad v^{k+1} = \arg\min_v\Big\{\Psi(v) + \frac{\lambda}{2}\|v - Bu^{k+1} - b^k\|_2^2\Big\}.$$

Subproblem 2:
$$b^{k+1} = b^k + (Bu^{k+1} - v^{k+1}).$$
Now we apply the split Bregman iteration algorithm to solve the anisotropic ROF total variation model, which is
$$\min_u \frac{\mu}{2}\|u - f\|_2^2 + |\nabla_x u| + |\nabla_y u|. \tag{3.5}$$
It is equivalent to the constrained optimization problem
$$\min_{u,d_x,d_y} \frac{\mu}{2}\|u - f\|_2^2 + |d_x| + |d_y| \quad \text{s.t. } d_x = \nabla_x u,\ d_y = \nabla_y u.$$
Utilizing the penalty function, we obtain
$$\min_{u,d_x,d_y} \frac{\mu}{2}\|u - f\|_2^2 + |d_x| + |d_y| + \frac{\lambda}{2}\|d_x - \nabla_x u\|_2^2 + \frac{\lambda}{2}\|d_y - \nabla_y u\|_2^2.$$
Furthermore, applying the Bregman iteration to the above problem, we have
$$(u^{k+1}, d^{k+1}) = \arg\min_{u,d_x,d_y}\Big\{\frac{\mu}{2}\|u - f\|_2^2 + |d_x| + |d_y| + \frac{\lambda}{2}\|d_x - \nabla_x u - b_x^k\|_2^2 + \frac{\lambda}{2}\|d_y - \nabla_y u - b_y^k\|_2^2\Big\},$$
$$b^{k+1} = b^k + (\nabla u^{k+1} - d^{k+1}).$$
To derive the solution $(u^{k+1}, d^{k+1})$, we split the problem by Bregman iterations and use an inner loop to obtain the solution:

for $n = 1$ to $N$
$$u^{k+1} = \arg\min_u\Big\{\frac{\mu}{2}\|u - f\|_2^2 + \frac{\lambda}{2}\|d_x^k - \nabla_x u - b_x^k\|_2^2 + \frac{\lambda}{2}\|d_y^k - \nabla_y u - b_y^k\|_2^2\Big\},$$
$$d^{k+1} = \arg\min_{d_x,d_y}\Big\{|d_x| + |d_y| + \frac{\lambda}{2}\|d_x - \nabla_x u^{k+1} - b_x^k\|_2^2 + \frac{\lambda}{2}\|d_y - \nabla_y u^{k+1} - b_y^k\|_2^2\Big\}.$$

The largest number of iterations in the inner loop is $N$. In fact, it has been found empirically that for many applications the optimal efficiency is obtained when only one iteration of the inner loop is performed, i.e., $N = 1$; we refer the reader to [2] and [14] for more details. Therefore, to avoid confusion, we assume that the solution $(u^{k+1}, d^{k+1})$ is obtained by one step of the inner loop, that is, $N = 1$. Hence, exploiting the split Bregman iteration algorithm to solve the problem (3.5) reduces to solving the following Subproblem 3 and Subproblem 4:
Subproblem 3:
$$u^{k+1} = \arg\min_u\Big\{\frac{\mu}{2}\|u - f\|_2^2 + \frac{\lambda}{2}\|d_x^k - \nabla_x u - b_x^k\|_2^2 + \frac{\lambda}{2}\|d_y^k - \nabla_y u - b_y^k\|_2^2\Big\},$$
$$d^{k+1} = \arg\min_{d_x,d_y}\Big\{|d_x| + |d_y| + \frac{\lambda}{2}\|d_x - \nabla_x u^{k+1} - b_x^k\|_2^2 + \frac{\lambda}{2}\|d_y - \nabla_y u^{k+1} - b_y^k\|_2^2\Big\}.$$

Subproblem 4:
$$b_x^{k+1} = b_x^k + (\nabla_x u^{k+1} - d_x^{k+1}), \qquad b_y^{k+1} = b_y^k + (\nabla_y u^{k+1} - d_y^{k+1}).$$
Since the minimization defining $u^{k+1}$ in Subproblem 3 is differentiable, the first-order optimality condition gives
$$(\mu I - \lambda\Delta)u^{k+1} = \mu f + \lambda\nabla_x^T(d_x^k - b_x^k) + \lambda\nabla_y^T(d_y^k - b_y^k),$$
where $\Delta := -(\nabla_x^T\nabla_x + \nabla_y^T\nabla_y)$ denotes the discrete Laplacian. There are many effective iterative algorithms for obtaining $u^{k+1}$, and the most familiar one is the Gauss-Seidel method; in this case $u^{k+1} = G^k$, where $G^k$ is the Gauss-Seidel update [14]. For $d^{k+1}$, we can compute the optimal value by the shrinkage operator
$$\mathrm{shrink}(x, \gamma) = \frac{x}{|x|}\max(|x| - \gamma, 0),$$
which yields
$$d^{k+1} = \mathrm{shrink}\Big(\nabla u^{k+1} + b^k, \frac{1}{\lambda}\Big).$$
We present the split Bregman iteration algorithm for the anisotropic TV denoising in Algorithm 3.1.
Algorithm 3.1 Split Bregman for Anisotropic TV Denoising
  Initialize: $u^0 = f$, $d_x^0 = d_y^0 = b_x^0 = b_y^0 = 0$
  while $\|u^k - u^{k-1}\|_2 > tol$ do
    $u^{k+1} = G^k$
    $d_x^{k+1} = \mathrm{shrink}(\nabla_x u^{k+1} + b_x^k, \frac{1}{\lambda})$
    $d_y^{k+1} = \mathrm{shrink}(\nabla_y u^{k+1} + b_y^k, \frac{1}{\lambda})$
    $b_x^{k+1} = b_x^k + (\nabla_x u^{k+1} - d_x^{k+1})$
    $b_y^{k+1} = b_y^k + (\nabla_y u^{k+1} - d_y^{k+1})$
  end while
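For concreteness, here is a minimal NumPy sketch of Algorithm 3.1. It is our illustration, with two assumptions the paper does not make: periodic boundary conditions (so the $u$-subproblem can be solved exactly by FFT instead of a Gauss-Seidel sweep), and default parameter values chosen arbitrarily:

```python
import numpy as np

def grad_x(u):   return np.roll(u, -1, axis=1) - u   # periodic forward difference
def grad_y(u):   return np.roll(u, -1, axis=0) - u
def grad_xT(p):  return np.roll(p, 1, axis=1) - p    # adjoint of grad_x
def grad_yT(p):  return np.roll(p, 1, axis=0) - p

def shrink(x, gamma):
    """Componentwise soft-thresholding (the shrink operator above)."""
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def solve_u(rhs, mu, lam, shape):
    """Exact solve of (mu*I - lam*Laplacian) u = rhs in the Fourier domain."""
    m, n = shape
    wy = 2.0 * np.pi * np.fft.fftfreq(m)
    wx = 2.0 * np.pi * np.fft.fftfreq(n)
    lap = (2 - 2 * np.cos(wy))[:, None] + (2 - 2 * np.cos(wx))[None, :]
    return np.real(np.fft.ifft2(np.fft.fft2(rhs) / (mu + lam * lap)))

def split_bregman_aniso_tv(f, mu=50.0, lam=100.0, tol=1e-3, max_iter=200):
    u = f.copy()
    dx = np.zeros_like(f); dy = np.zeros_like(f)
    bx = np.zeros_like(f); by = np.zeros_like(f)
    for _ in range(max_iter):
        u_prev = u
        rhs = mu * f + lam * (grad_xT(dx - bx) + grad_yT(dy - by))
        u = solve_u(rhs, mu, lam, f.shape)              # Subproblem 3 (u-step)
        gx, gy = grad_x(u), grad_y(u)
        dx, dy = shrink(gx + bx, 1.0 / lam), shrink(gy + by, 1.0 / lam)
        bx, by = bx + gx - dx, by + gy - dy             # Subproblem 4
        if np.linalg.norm(u - u_prev) <= tol * np.linalg.norm(u_prev):
            break
    return u
```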
We test the split Bregman iteration algorithm on the "Cameraman" image. By choosing proper parameters, we obtain the denoising results in Figure 1. From Figure 1, most of the noise is visibly eliminated and the convergence of the algorithm is fully retained. Furthermore, it is noted that the noise of the image becomes hard to distinguish visually after the first few iterations, yet the iteration does not terminate because the convergence condition is not yet satisfied. To verify this, we compute the peak signal-to-noise ratio (PSNR) between the true image and the denoised image at each step; the result is presented in Figure 2. From the figure, the value of PSNR grows quickly in the first few iterations and then tends to a stable value in the later iterations. This is because the TV norm tolerates a certain amount of nonsmoothness, and the effect of $\frac{\mu}{2}\|u - f\|_2^2$ is relatively small in the early iterations, where the noise is large. As a result, a "staircase" effect appears in the regions where the features of the image are not obvious, and the quality of the image is affected.
Figure 1: Split Bregman for anisotropic TV denoising

After most of the noise is eliminated, the split Bregman algorithm has to spend more time flattening the image, and the computation speed of the algorithm drops. Moreover, the inconspicuous details of the image become difficult to distinguish; for example, the cloth in front of the man and the gray floor are hard to discern. Hence, it is necessary to improve the algorithm so as to preserve the inconspicuous details of images as much as possible. Since the convergence of the split Bregman algorithm is insensitive to the precision of the subproblems, we add a new proximity term $\frac{\theta}{2}\|u - u^k\|_2^2$ to the subproblem. By adjusting the proximity parameters, there is little deviation between two consecutive inner iterations and the characteristics of the image are kept. We discuss this in detail in the next section.
Figure 2: The relation between PSNR and the iteration count

4. The Adaptive Fixed-Point Proximity Algorithm
In this section we use the fixed-point proximity algorithm to solve the considered problem, with an adaptive tactic employed to adjust the proximity parameters. We call the improved algorithm the adaptive fixed-point proximity algorithm. The convergence of the algorithm is proved under some proper assumptions.

Since the stopping criterion of the algorithm is to check the condition $\frac{\|u^k - u^{k-1}\|}{\|u^k\|} > tol$, to avoid unnecessary computation we wish to determine whether the iteration should stop as soon as $u^k$ is updated. We therefore change the order of the iterations and obtain
$$\begin{aligned}
v^{k+1} &= \arg\min_v\Big\{\Psi(v) + \frac{\lambda}{2}\|v - Bu^k - b^k\|_2^2\Big\},\\
b^{k+1} &= b^k + (Bu^k - v^{k+1}),\\
u^{k+1} &= \arg\min_u\Big\{\Phi(u) + \frac{\lambda}{2}\|v^{k+1} - Bu - b^{k+1}\|_2^2 + \frac{\theta}{2}\|u - u^k\|_2^2\Big\}.
\end{aligned} \tag{4.6}$$
Here we give a general formulation of the algorithm with respect to (1.1), which is also suitable for other ROF total variation problems. The next lemma states that the considered algorithm can be written as the fixed-point equations of certain operators.
Lemma 4.1. The problem (4.6) is equivalent to solving the following equations:
$$\begin{cases}
w^{k+1} = \mathrm{prox}_{\lambda\Psi^*}(\lambda Bu^k + w^k),\\
u^{k+1} = \mathrm{prox}_{\Phi}\big(u^{k+1} - (\lambda B^TB + \theta I)(u^{k+1} - u^k) - B^T(2w^{k+1} - w^k)\big).
\end{cases} \tag{4.7}$$

Proof: The first-order optimality conditions of (4.6) are
$$\begin{aligned}
0 &\in \partial\Psi(v^{k+1}) + \lambda(v^{k+1} - Bu^k - b^k),\\
b^{k+1} &= b^k + (Bu^k - v^{k+1}),\\
0 &\in \partial\Phi(u^{k+1}) + \lambda B^T(Bu^{k+1} + b^{k+1} - v^{k+1}) + \theta(u^{k+1} - u^k).
\end{aligned} \tag{4.8}$$
From the first inclusion of (4.8), using the relations between subdifferentials and conjugate functions, we obtain
$$\lambda v^{k+1} \in \partial(\lambda\Psi^*)\big(\lambda(Bu^k - v^{k+1} + b^k)\big). \tag{4.9}$$
Denote $w^k := \lambda b^k$; multiplying both sides of the second equality in (4.8) by $\lambda$, we have
$$w^{k+1} = \lambda Bu^k + w^k - \lambda v^{k+1}. \tag{4.10}$$
Utilizing subdifferentials and proximity operators, it follows from (4.9) and (4.10) that
$$w^{k+1} = \mathrm{prox}_{\lambda\Psi^*}(\lambda Bu^k + w^k). \tag{4.11}$$
From $\lambda Bu^k + w^k = \lambda v^{k+1} + \lambda(Bu^k + b^k - v^{k+1})$ and (4.10), we have
$$\lambda B^T(Bu^{k+1} + b^{k+1} - v^{k+1}) + \theta(u^{k+1} - u^k) = (\lambda B^TB + \theta I)(u^{k+1} - u^k) + B^T(2w^{k+1} - w^k).$$
Combining the above equality with the third inclusion in (4.8), we obtain
$$u^{k+1} = \mathrm{prox}_{\Phi}\big(u^{k+1} - (\lambda B^TB + \theta I)(u^{k+1} - u^k) - B^T(2w^{k+1} - w^k)\big). \tag{4.12}$$
From (4.11) and (4.12), the equivalent form of (4.6) can be written as
$$\begin{cases}
w^{k+1} = \mathrm{prox}_{\lambda\Psi^*}(\lambda Bu^k + w^k),\\
u^{k+1} = \mathrm{prox}_{\Phi}\big(u^{k+1} - (\lambda B^TB + \theta I)(u^{k+1} - u^k) - B^T(2w^{k+1} - w^k)\big).
\end{cases}$$
This completes the proof.
For the problem (1.1), [19] provides a general fixed-point formulation. Given two positive definite matrices $P$ and $Q$, denote
$$S_B := \begin{pmatrix} 0 & -B^T \\ B & 0 \end{pmatrix}, \qquad R := \begin{pmatrix} P & 0 \\ 0 & Q \end{pmatrix}, \qquad E := I + R^{-1}S_B.$$
Define $T := \mathrm{prox}_{(\Phi+\Psi^*),R}$; then (1.1) is equivalent to the fixed-point problem
$$z = (T \circ E)(z). \tag{4.13}$$
Furthermore, [19] proposes a multi-step fixed-point algorithm, which is
$$z^{k+1} = T\Big(E_0 z^k + R^{-1}\sum_{i=1}^{n} M_i z^{k-i+1}\Big), \tag{4.14}$$
where $E_0 = I + R^{-1}(S_B - M_0)$. Given a matrix group $\mathcal{M} = \{M_i : i = 0, 1, 2, \dots, n\}$, the above fixed-point problem is rewritten as
$$z^{k+1} = T_{\mathcal{M}}(z^k, z^{k-1}, \dots, z^{k+1-n}).$$
By selecting a proper $\mathcal{M}$ and placing different weights on the sequence $\{z^{k-i+1} : i = 0, 1, 2, \dots, n\}$, the sequence generated by the algorithm converges to a fixed point of the problem. Hence, we construct a sequence of matrices satisfying the "M-conditions" [19]:

M-conditions on the matrix group $\mathcal{M}$:
1. $M_0 = \sum_{i=1}^{n} M_i$;
2. $M_1 = M_2 = \cdots = M_{n-1}$;
3. $H := M_0 + M_n$ is positive definite;
4. $N(H) \subseteq N(M_n) \cap N(M_n^T)$;
5. $\|(H^\dagger)^{\frac{1}{2}} M_n (H^\dagger)^{\frac{1}{2}}\| < \frac{1}{2}$;

where $N(A) = \{x \in \mathbb{R}^m : Ax = 0\}$ is the null space of $A$ and $A^\dagger$ is the generalized inverse of $A$. By choosing a proper matrix group, we establish the convergence of the algorithm (4.6).
Theorem 4.1. Let $\sigma$ be the largest singular value of the matrix $B$ and let the parameters $\lambda, \theta > 0$ satisfy $\lambda + \frac{\theta}{\sigma^2} > 1$. Define the matrix group $\mathcal{M}$ by
$$M_0 = M_1 = \begin{pmatrix} \lambda B^TB + \theta I & B^T \\ B & \frac{1}{\lambda}I \end{pmatrix}, \tag{4.15}$$
and $M_i = 0$ for $i \neq 0, 1$. Then $\mathcal{M}$ satisfies the M-conditions; moreover, the sequence generated by (4.7) converges to a solution of (1.1).

Proof: Firstly, since $M_i = 0$ for $i \ge 2$, the second condition is satisfied; since $M_0 = M_1$, the first condition is also satisfied. For any $x_H \in N(H)$, because $H = M_0 + M_n = 2M_1$, we have $Hx_H = 2M_1x_H = 0$, that is, $x_H \in N(M_1)$. Since $M_1$ is symmetric, it also holds that $x_H \in N(M_1^T)$. Hence $x_H \in N(M_n) \cap N(M_n^T)$, that is, $N(H) \subseteq N(M_n) \cap N(M_n^T)$, and the fourth condition is satisfied. Next we prove that the third condition is also satisfied, i.e., that $H = 2M_1$ is positive definite, which amounts to $M_1$ being positive definite.
Decompose $M_1$ as
$$M_1 = D^T\widetilde{M}D,$$
where
$$D := \begin{pmatrix} (\lambda B^TB + \theta I)^{\frac{1}{2}} & 0 \\ 0 & (\frac{1}{\lambda}I)^{\frac{1}{2}} \end{pmatrix}.$$
The corresponding matrix is
$$\widetilde{M} = \begin{pmatrix} I & \big(B(\lambda B^TB + \theta I)^{-\frac{1}{2}}\big)^T \\ B(\lambda B^TB + \theta I)^{-\frac{1}{2}} & I \end{pmatrix}.$$
Since $\widetilde{M}$ and $M_1$ are congruent, the signs of their eigenvalues are the same, so they have the same positive definiteness; thus we only need to verify the positive definiteness of $\widetilde{M}$. From [19], $\widetilde{M}$ is positive definite if and only if
$$\|B(\lambda B^TB + \theta I)^{-\frac{1}{2}}\|_2 < 1. \tag{4.16}$$
Without loss of generality, assume that $B$ is square. Take a singular value decomposition of $B$,
$$B = U\Sigma V^T,$$
where $U$ and $V$ are unitary matrices and $\Sigma$ is a diagonal matrix whose largest diagonal entry is the largest singular value $\sigma$. Taking the singular value decomposition of the matrix in (4.16), we obtain
$$B(\lambda B^TB + \theta I)^{-\frac{1}{2}} = U\Sigma(\lambda\Sigma^2 + \theta I)^{-\frac{1}{2}}V^T.$$
Because the largest singular value of a matrix equals its 2-norm, we have
$$\|B(\lambda B^TB + \theta I)^{-\frac{1}{2}}\|_2 = \sigma(\lambda\sigma^2 + \theta)^{-\frac{1}{2}} = \frac{1}{\sqrt{\lambda + \frac{\theta}{\sigma^2}}}.$$
If $\lambda + \frac{\theta}{\sigma^2} > 1$, then (4.16) is satisfied, and $M_1$ is positive definite, so the third condition holds. From [19, Theorem 6.4], the sequence generated by (4.7) converges to a solution of (1.1). The proof is completed.
Furthermore, to improve the algorithm, we consider an adaptive criterion for the penalty parameters. For the subproblem in (4.6), we add a proximity term $\frac{\theta}{2}\|u - u^k\|_2^2$ to control the gap between two consecutive iterations. In the first few steps, the parameter $\theta$ is taken large enough that the "staircase" effect is almost eliminated; however, when the algorithm approaches convergence, little noise remains in the image while the computation speed of the algorithm drops. To resolve this contradiction, we consider a new criterion for updating the proximity parameters during the iterations. It is natural to update $\theta$ as follows:
$$\theta_{k+1} = \frac{\|u^{k+1} - u^k\|}{\|u^k - u^{k-1}\|}\,\theta_k.$$
Denoting by $\theta_0$ the initial value of $\theta$, the above criterion can be written as $\theta_k = \rho_k\theta_0$, where $\rho_k = \|u^k - u^{k-1}\|$.
Since $\lambda$ represents the tolerance for the noise, it is closely related to the penalty parameters. The trends of $\theta$ and $\lambda$ are inverse: $\theta$ decreases while $\lambda$ increases. To achieve this, choose $\lambda_k = (1 - \frac{\theta_k}{\theta_0})\lambda_0$, where $\lambda_0$ is a positive constant. With the help of $\rho_k$, we obtain a new criterion for $\lambda_k$, namely $\lambda_k = (1 - \rho_k)\lambda_0$. Hence $\lambda$ and $\theta$ are sufficiently controlled by choosing $\rho \in (0, 1)$. In particular, taking $\rho = 1$, $\theta = \theta_0$ and $\lambda = 0$, the value of $\frac{\theta}{2}\|u - u^k\|_2^2$ is largest and the intent of the penalty is fully exerted. During the iterations, we decrease $\theta_k$ to reduce the weight of the penalty term. Since $\|u^{k+1} - u^k\|$ in the stopping criterion decreases as the algorithm approaches convergence, we employ the following updating criterion:
$$\rho_{k+1} = \min(\upsilon\|u^{k+1} - u^k\|, 1),$$
where $\upsilon > 0$; apparently $\rho_k \in (0, 1]$. Since $\|u^{k+1} - u^k\|$ is computed at each iteration anyway, the cost of updating $\rho_k$ is negligible compared with inverting a large sparse matrix; this is one of the good points of adaptive tactics. To guarantee the convergence of the algorithm, substituting $\theta_k$ and $\lambda_k$ into the condition "$\lambda + \frac{\theta}{\sigma^2} > 1$" of Theorem 4.1 gives
$$\frac{\sigma}{\sqrt{(1 - \rho_k)\lambda_0\sigma^2 + \rho_k\theta_0}} < 1.$$
After a few calculations, we obtain
$$\frac{\theta_0}{\lambda_0} > \sigma^2, \tag{4.17}$$
where $\sigma$ is the largest singular value of $B$. Hence the initial values $\theta_0$ and $\lambda_0$ only need to satisfy the condition (4.17).
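In code, one adaptive step costs only a vector norm; the helper below is our own sketch (the names and packaging are assumptions, not the authors'):

```python
import numpy as np

def update_parameters(u_new, u_old, upsilon, lam0, theta0):
    """One adaptive step: rho = min(upsilon*||u^{k+1} - u^k||, 1),
    lambda_k = (1 - rho)*lambda_0, theta_k = rho*theta_0."""
    rho = min(upsilon * np.linalg.norm(u_new - u_old), 1.0)
    return rho, (1.0 - rho) * lam0, rho * theta0
```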
5. Numerical Results for Isotropic Total-Variation Denoising

In this section, we apply the adaptive fixed-point proximity algorithm to solve the classic ROF total variation model for removing Gaussian noise. Not only are numerical denoising results provided, but comparisons with other familiar algorithms are also given. Consider the isotropic total variation denoising model
$$\min\Big\{\frac{\mu}{2}\|u - f\|_2^2 + \|u\|_{TV} : u \in \mathbb{R}^n\Big\}.$$
Regarding (4.6), here $\Phi(\cdot) = \frac{\mu}{2}\|\cdot - f\|_2^2$, and $\Psi$ is the TV norm, $\Psi(x) = \sum_{i=1}^{n}\|(x_i, x_{n+i})\|_2$. Instead of solving (4.7), we consider the following multi-step fixed-point iteration:
$$\begin{cases}
u^{k+1} = \mathrm{prox}_{\Phi,P}\big(u^k - P^{-1}B^T(w^k + \rho_k(w^k - w^{k-1}))\big),\\
w^{k+1} = \mathrm{prox}_{\lambda_0\Psi^*}\big(w^k + \lambda_0 B(u^{k+1} + (1 - \rho_k)(u^{k+1} - u^k))\big),
\end{cases} \tag{5.18}$$
where $P := \kappa I + \theta_0 B^TB$ and $B$ is the first-order difference matrix
$$B := \begin{bmatrix} I_q \otimes D_p \\ D_q \otimes I_p \end{bmatrix},$$
where "$\otimes$" denotes the matrix Kronecker product. The difference matrix $D_p$ is given by
$$D_p := \begin{bmatrix} 1 & & & \\ -1 & 1 & & \\ & \ddots & \ddots & \\ & & -1 & 1 \end{bmatrix}.$$
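This structure is convenient to build with sparse Kronecker products. The sketch below is ours; in particular, the column-major vectorization of the image implied by this ordering is an assumption:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import svds

def difference_matrix(p):
    """Lower-bidiagonal first-order difference matrix D_p shown above."""
    return (sp.eye(p) - sp.eye(p, k=-1)).tocsr()

def make_B(p, q):
    """B = [I_q kron D_p; D_q kron I_p] for a p-by-q image."""
    Dp, Dq = difference_matrix(p), difference_matrix(q)
    return sp.vstack([sp.kron(sp.eye(q), Dp), sp.kron(Dq, sp.eye(p))]).tocsr()

# Largest singular value of B, used in condition (4.17): theta_0/lambda_0 > sigma^2.
B = make_B(8, 8)
sigma = svds(B, k=1, return_singular_vectors=False)[0]
print(sigma ** 2)   # lower bound for the ratio theta_0 / lambda_0
```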
Since $\mathrm{prox}_{\Phi,P}(x) = (P + \mu I)^{-1}(Px + \mu f)$, replacing $x$ by $u^k - P^{-1}B^T(w^k + \rho_k(w^k - w^{k-1}))$, we obtain
$$u^{k+1} = (\mu I + P)^{-1}\big(\mu f + Pu^k - B^T(w^k + \rho_k(w^k - w^{k-1}))\big). \tag{5.19}$$
For the second equation of (5.18), denote $v^{k+1} := w^k + \lambda_0 B(u^{k+1} + (1 - \rho_k)(u^{k+1} - u^k))$; then $w^{k+1} = \mathrm{prox}_{\lambda_0\Psi^*}(v^{k+1})$. By the Moreau decomposition, which is
$$I - \mathrm{prox}_{\lambda_0\Psi^*} = \lambda_0\,\mathrm{prox}_{\frac{1}{\lambda_0}\Psi} \circ \Big(\frac{1}{\lambda_0}I\Big),$$
we have
$$w^{k+1} = \mathrm{prox}_{\lambda_0\Psi^*}(v^{k+1}) = v^{k+1} - \lambda_0\,\mathrm{prox}_{\frac{1}{\lambda_0}\Psi}\Big(\frac{1}{\lambda_0}v^{k+1}\Big).$$
Therefore, the second iteration of (5.18) becomes
$$v^{k+1} = w^k + \lambda_0 B\big(u^{k+1} + (1 - \rho_k)(u^{k+1} - u^k)\big), \qquad w^{k+1} = v^{k+1} - \lambda_0\,\mathrm{prox}_{\frac{1}{\lambda_0}\Psi}\Big(\frac{1}{\lambda_0}v^{k+1}\Big). \tag{5.20}$$
Based on (5.19) and (5.20), we give the adaptive fixed-point proximity algorithm to solve the isotropic TV denoising model. The algorithm is provided in Algorithm 5.2.
CE
PT
Algorithm 5.2 Adaptive Fixed-Point Proximity Algorithm for TV Denoising Initialize: µ, λ0 , θ0 , υ > 0, u0 = f, w−1 = w0 = 0 while kuk − uk−1 k22 > tol do uk+1 ← (µI + P )−1 (µf + P uk − B T (wk + ρk (wk − wk−1 ))) v k+1 ← wk + λ0 B(uk+1 + (1 − ρk )(uk+1 − uk )) wk+1 ← v k+1 − λ0 prox 1 Ψ ( λ10 v k+1 ) λ0
AC
if υkuk+1 − uk k < 1 then ρk+1 ← υkuk+1 − uk k end if end while
In the first experiment, we choose the images of ”Cameraman”, ”Lena” and ”Peppers” as the original images f with sizes 256 × 256, 512 × 512 and 512 × 512, respectively. The noisy image is added by Gaussian white noise. We use PSNR as the evaluation of the quality of denoised images. It is given by 2552 n P SN R := 10 log10 ∗ (dB), ku − uk2 14
ACCEPTED MANUSCRIPT
where u∗ is the image matrix after denoising, n is the pixels number of matrices. The iteration of the algorithm is terminated when the following condition is satisfied: kuk − uk+1 k2 ≤ tol, kuk+1 k2
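Both the quality measure and the stopping test are one-liners; for reference (our sketch):

```python
import numpy as np

def psnr(u_star, u_true):
    """PSNR in dB between the denoised image and the ground truth."""
    mse = np.mean((u_star - u_true) ** 2)   # = ||u* - u||^2 / n
    return 10.0 * np.log10(255.0 ** 2 / mse)

def stop(u_prev, u_curr, tol=1e-5):
    """Relative-change stopping test used above."""
    return np.linalg.norm(u_prev - u_curr) <= tol * np.linalg.norm(u_curr)
```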
In this example, we set $tol = 1.0\mathrm{e}{-}05$ as the tolerance. The algorithm is carried out with $\mu = 50$, and the images are denoised by the two-step iteration (5.18). The results are provided in Figure 3: the first and second rows of Figure 3 are the original and noisy images, and the third and fourth rows are the resulting denoised images. From the comparison, we can see that the presented algorithm retains most of the information where the details of the images are obscure, and there is no obvious "staircase" effect. We also plot the stopping criterion against the iteration count for each image in Figure 4. It shows that the algorithm is stable along the iterations and that the strategy of adjusting the proximity parameters is reasonable: as $\rho_{k+1}$ is adjusted constantly, the iterate $u^{k+1}$ deviates little in the last few iterations and the stopping quantity evolves stably. Furthermore, we report the PSNR and the iteration counts for different $\mu$ in Table 1; the larger the PSNR, the better the denoising result. From Table 1, as $\mu$ increases, both the PSNR and the iteration count improve considerably, which means that the algorithm performs well in removing Gaussian noise from the image.
Table 1: Numerical results for removing Gaussian noise

         Cameraman          Lena               Peppers
  µ      PSNR(dB)   Iter    PSNR(dB)   Iter    PSNR(dB)   Iter
  20     31.7968     67     31.6238     52     31.5469     49
  30     32.6716     53     32.6725     46     32.7230     46
  40     33.2727     48     33.3888     35     33.5134     36
  50     33.7854     43     34.0106     34     34.2199     31
  60     34.2533     37     34.5431     31     34.7854     29
In the second experiment, we compare the adaptive fixed-point proximity algorithm (AFPP) with two other familiar algorithms: the fixed-point proximity (FPP) algorithm [20] and the split Bregman (SP) algorithm [14]. We choose the "Cameraman" image as the original image. For different $\mu$, the PSNR, iteration counts and CPU times of the three algorithms are reported in Table 2. From Table 2, under the same stopping condition, the performance of AFPP is superior to that of the other two algorithms: when the iteration counts of the three algorithms are almost the same, the PSNR of AFPP is higher and its CPU time is smaller. Since there is no penalty in SP, its efficiency in removing the noise of the image is low. When the penalty is added, as in FPP, the problem becomes easier to solve; however, the penalty parameter of FPP is fixed, which leads to the "staircase" effect in the less obvious parts of the image, and the quality of the image is affected. We therefore employ an adaptive criterion to adjust the penalty parameter constantly, which not only accelerates the algorithm but also weakens the "staircase" effect. Hence, the presented adaptive fixed-point proximity algorithm performs very well on this total variation model.
Figure 3: Denoising images by the adaptive fixed-point proximity algorithm. The first row shows the original images, the second row the noisy images, and the third and fourth rows the resulting denoised images.
Figure 4: Iterations of the stopping condition for (a) Cameraman, (b) Lena, (c) Peppers.
Table 2: Comparisons of AFPP, FPP and SP

         AFPP                       FPP                        SP
  µ      PSNR(dB)  Iter   CPU      PSNR(dB)  Iter   CPU      PSNR(dB)  Iter   CPU
  50     33.7869    40   12.5575   33.7828    42   12.9697   33.7478    43   13.4066
         33.8432    41   12.8394   33.7636    41   12.7081   33.7669    42   13.0614
  60     34.2541    36   11.1741   34.2472    37   11.4598   34.2574    38   12.3515
         34.2923    37   11.0960   34.2336    38   11.8438   34.2641    40   12.2461
  70     34.7184    32    9.8405   34.7093    33   10.1560   34.6998    33   10.1368
         34.7460    32    9.9726   34.7206    33   10.0840   34.7371    35   10.7379
  80     35.1632    29    9.0293   35.1520    29    9.0017   35.1349    29    9.1763
         35.1772    29    8.9009   35.1362    29    8.9100   35.1482    29    9.6834
  90     35.5531    27    8.2893   35.4847    29    8.9722   35.5497    27    8.8800
         35.5384    27    8.3499   35.5309    29    8.6866   35.5365    27    9.0683
6. Conclusion
In this paper, we introduce a convex-sum optimization model. Using the techniques of decoupling and splitting, the optimization problem is decomposed into several subproblems. Due to the nondifferentiability and nonsmoothness of the problem, we solve its proximity minimization problem instead. To weaken the "staircase" effect in image denoising, an adaptive criterion on the penalty parameters is given to improve the algorithm. Numerical experiments are given to illustrate the good performance of the algorithm.
Acknowledgements  The authors thank the two anonymous referees for a number of valuable and helpful suggestions. This research was supported in part by the Natural Science Foundation of China, Grants 11301347, 11601061 and 31271077.
References

[1] Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics. 7(3), 200-217 (1967).
[2] Cai, J.F., Osher, S., Shen, Z.W.: Split Bregman methods and frame based image restoration. Multiscale Modeling and Simulation. 8(2), 337-369 (2009).
[3] Chambolle, A., Lions, P.L.: Image recovery via total variation minimization and related problems. Numerische Mathematik. 76(2), 167-188 (1997).
[4] Chambolle, A.: An algorithm for total variation minimization and applications. Journal of Mathematical Imaging and Vision. 20(1), 89-97 (2004).
[5] Chambolle, A.: Total variation minimization and a class of binary MRF models. Energy Minimization Methods in Computer Vision and Pattern Recognition. 3757, 136-152 (2005).
[6] Chan, R.H., Chan, T.F., Shen, L.X., Shen, Z.W.: Wavelet algorithms for high-resolution image reconstruction. SIAM Journal on Scientific Computing. 24(4), 1408-1432 (2003).
[7] Chan, T.F., Esedoglu, S.: Aspects of total variation regularized L1 function approximation. SIAM Journal on Applied Mathematics. 65(5), 1817-1837 (2005).
[8] Chen, D.Q., Zhang, H., Cheng, L.Z.: A fast fixed point algorithm for total variation deblurring and segmentation. Journal of Mathematical Imaging and Vision. 43(3), 167-179 (2012).
[9] Combettes, P.L., Wajs, V.: Signal recovery by proximal forward-backward splitting. Multiscale Modeling and Simulation. 4(4), 1168-1200 (2005).
[10] Combettes, P.L., Pesquet, J.C.: Proximal splitting methods in signal processing. Fixed-Point Algorithms for Inverse Problems in Science and Engineering. 49, 185-212 (2011).
[11] Esser, E., Zhang, X.Q., Chan, T.F.: A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science. SIAM Journal on Imaging Sciences. 3(4), 1015-1046 (2010).
[12] Gao, Y.: Nonsmooth Analysis. Science Press, Beijing (2008).
[13] Goldfarb, D., Yin, W.T.: Second-order cone programming methods for total variation-based image restoration. SIAM Journal on Scientific Computing. 27(2), 622-645 (2005).
[14] Goldstein, T., Osher, S.: The split Bregman method for L1-regularized problems. SIAM Journal on Imaging Sciences. 2(2), 323-343 (2009).
[15] Hintermüller, M., Stadler, G.: An infeasible primal-dual algorithm for TV-based inf-convolution-type image restoration. SIAM Journal on Scientific Computing. 28(1), 1-23 (2004).
[16] Hu, Y.H., Li, C., Yang, X.Q.: On convergence rates of linear proximal algorithms for convex composite optimization with applications. SIAM Journal on Optimization. 26(2), 1207-1235 (2016).
[17] Lewis, A.S., Wright, S.J.: A proximal method for composite minimization. Mathematical Programming. 158(1), 501-546 (2016).
[18] Li, Q., Micchelli, C.A., Shen, L.X., Xu, Y.S.: A proximity algorithm accelerated by Gauss-Seidel iterations for L1/TV denoising models. Inverse Problems. 28(9), 095003 (2012).
[19] Li, Q., Shen, L.X., Xu, Y.S., Zhang, N.: Multi-step fixed-point proximity algorithms for solving a class of optimization problems arising from image processing. Advances in Computational Mathematics. 41(2), 387-422 (2015).
[20] Micchelli, C.A., Shen, L., Xu, Y.: Proximity algorithms for image models: denoising. Inverse Problems. 27(4), 045009 (2011).
[21] Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. C. R. Acad. Sci. Paris Sér. A Math. 255, 2897-2899 (1962).
[22] Nikolova, M.: Local strong homogeneity of a regularized estimator. SIAM Journal on Applied Mathematics. 61(2), 633-658 (2000).
[23] Nikolova, M.: A variational approach to remove outliers and impulse noise. Journal of Mathematical Imaging and Vision. 20(1), 99-120 (2004).
[24] Nitanda, A.: Stochastic proximal gradient descent with acceleration techniques. Advances in Neural Information Processing Systems 27 (NIPS 2014).
[25] Qin, Z., Goldfarb, D., Ma, S.: An alternating direction method for total variation denoising. Optimization Methods and Software. 30(3), 594-615 (2015).
[26] Opial, Z.: Weak convergence of the sequence of successive approximations for nonexpansive mappings. Bulletin of the American Mathematical Society. 73(4), 591-597 (1967).
[27] Osher, S., Burger, M., Goldfarb, D.: An iterative regularization method for total variation-based image restoration. Multiscale Modeling and Simulation. 4(2), 460-489 (2005).
[28] Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena. 60(1), 259-268 (1992).
[29] Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena. 60(1), 259-268 (1992).
[30] Rudin, L.I., Osher, S.: Total variation based image restoration with free local constraints. Proceedings of the IEEE International Conference on Image Processing (ICIP-94). 1, 31-35 (1994).
[31] Shefi, R., Teboulle, M.: Rate of convergence analysis of decomposition methods based on the proximal method of multipliers for convex minimization. SIAM Journal on Optimization. 24(1), 269-297 (2014).
[32] Wang, R.X., Tao, D.C.: Non-local auto-encoder with collaborative stabilization for image restoration. IEEE Transactions on Image Processing. 25(5), 2117-2129 (2016).
[33] Wang, Y.H., Xu, C., You, S., Tao, D.C., Xu, C.: Parts for the whole: the DCT norm for extreme visual recovery. arXiv:1604.05451 (2016).
[34] Xu, C., Liu, T.L., Tao, D.C.: Local Rademacher complexity for multi-label learning. IEEE Transactions on Image Processing. 25(3), 1495-1506 (2016).