Image denoising via 2D dictionary learning and adaptive hard thresholding




Pattern Recognition Letters 34 (2013) 2110–2117


Image denoising via 2D dictionary learning and adaptive hard thresholding

Xuande Zhang a,b, Xiangchu Feng a,*, Weiwei Wang a, Guojun Liu b

a Department of Applied Mathematics, Xidian University, Xi'an, China
b School of Mathematics and Computer Science, Ningxia University, Yinchuan, China

Article history: Received 7 June 2012. Available online 8 August 2013. Communicated by G. Borgefors.

Keywords: Image denoising; 2D dictionary learning; Singular value decomposition; Adaptive hard thresholding

Abstract

In recent years there has been extensive interest in exploiting the self-similarity inherent in images to learn adaptive dictionaries for effective image representation and denoising. In this letter, we present a complementary view. When a group of similar patches is arranged into the so-called similarity data matrix (SDM), there exist linear correlations among both the columns and the rows of the SDM. Based on this observation, we propose an image denoising algorithm built on 2D dictionary learning and adaptive hard thresholding (2DDL-AHT). In this algorithm, both the row-correlation and the column-correlation of the SDM are explored by 2D dictionary learning, and each group of similar patches is estimated by adaptive hard thresholding. Experiments indicate that the proposed algorithm performs on par with or slightly better than state-of-the-art denoising methods. © 2013 Elsevier B.V. All rights reserved.

* Corresponding author at: Department of Applied Mathematics, Xidian University, Xi'an, China. Tel./fax: +86 0951 2061246. E-mail address: [email protected] (X. Feng).

doi: 10.1016/j.patrec.2013.07.018

1. Introduction

Image denoising is an indispensable step in any imaging system, and it remains one of the most important problems in the field of image processing. It is now well recognized that image priors (assumptions about the principles that image data complies with) play a central role in image denoising, and a long sequence of image priors has accompanied progress in image restoration research, ranging from global smoothness, to sparsity with respect to a certain dictionary, to self-similarity of image content.

Variational methods assume a smooth function space, such as the bounded variation space or a Besov space, as a prior, and measure the global smoothness of natural images by the norm or semi-norm defined in this space (Rudin et al., 1992; Aubert and Kornprobst, 2006). These methods have drawn considerable attention due to their mathematical elegance. However, natural images are too complex to be uniformly contained in a single smooth function space; as a consequence, such methods tend to smooth out the tiny structures of natural images, such as fine textures and weak edges.

The sparse representation method assumes that any informative image can be well represented by a few atoms drawn from a dictionary (which may or may not be redundant). The key to this method is to construct or learn an appropriate dictionary that accurately fits the local structures of images. Wavelets have long been widely used due to their multiscale and localization properties compared to the Fourier transform (Sendur and Selesnick, 2002; Pizurica et al., 2002), but they cannot represent anisotropic structures well due to their isotropic nature. To capture anisotropic structures, a flood of directional extensions, such as the curvelet, contourlet and bandelet (Mallat, 2008), have been constructed. These directional extensions show certain advantages over the classical wavelet. However, natural images always contain diverse and irregular patterns which cannot be well characterized by such predefined analytic dictionaries, and denoising algorithms built on these dictionaries can introduce visual artifacts in the denoised output. To alleviate this problem, several adaptive learning methods have been proposed. Specifically, Elad and Aharon proposed a dictionary learning paradigm, known as K-SVD, for sparse and redundant image representation (Aharon et al., 2006; Elad and Aharon, 2006). Chatterjee and Milanfar and Zhang et al. proposed to learn local dictionaries from groups of similar patches by principal component analysis (PCA) (Chatterjee and Milanfar, 2009; Zhang et al., 2010). All these adaptive learning methods show promising denoising performance.

The self-similarity image prior assumes that there are abundant repeating patterns inherent in natural images. To the best of our knowledge, two classes of methods are available for exploring these self-similarity structures, namely nonlocal methods (Buades and Morel, 2005; Mahmoudi and Sapiro, 2005; Dabov et al., 2007; Rajwade et al., 2013) and graph-based methods (Coifman and Maggioni, 2006; Hammond et al., 2011; Shuman et al., 2013). The nonlocal means (NLM) algorithm (Buades and Morel, 2005) is viewed as the seminal work on the nonlocal method. In this algorithm, each pixel


is estimated by the weighted average of all pixels in the image, and the weights are evaluated according to the similarity between the corresponding neighborhood patches. The BM3D algorithm (Dabov et al., 2007) is the most celebrated nonlocal method; it jointly estimates a group of similar patches by 3D wavelet approximation and achieves state-of-the-art denoising performance. It is now well recognized that the joint estimation of a group of similar patches (group-wise estimation) usually leads to promising results, at the expense of relatively high computational complexity. Indeed, most recent nonlocal denoising algorithms focus on how to jointly model or estimate a group of similar patches (Dabov et al., 2007; Mairal et al., 2009; Rajwade et al., 2013). The graph-based method constructs a graph from the image data: each edge of the graph is usually assigned the similarity measure between two neighborhood patches, and the image data is viewed as a function defined on the graph. The significance of this method is that the image is analyzed and processed in an inhomogeneous space or domain.

In this letter, we sail under the flag of the nonlocal method and present a complementary view. When a group of similar patches is arranged into a matrix, there exist linear correlations among both the columns and the rows of this matrix. In the spirit of exploiting as much of the structure underlying images as possible, we simultaneously consider the correlations among both columns and rows, and take advantage of the singular value decomposition (SVD) to learn 2D dictionaries for image denoising. In this approach, each group of similar patches is estimated by sparse approximation with respect to the learned 2D dictionaries, and the sparse approximation is implemented by adaptive hard thresholding. Additionally, the proposed approach is developed in a framework similar to that of K-SVD, which facilitates its extension to other inverse problems in imaging.

Experiments indicate that the proposed algorithm achieves peak signal-to-noise ratio (PSNR) performance comparable to the BM3D algorithm, and is very competitive in preserving fine structures. The rest of the letter is organized as follows. Section 2 presents the proposed 2DDL-AHT denoising algorithm in detail, Section 3 presents the experimental results, and Section 4 concludes.

2. The 2DDL-AHT denoising algorithm

2.1. Notations

We consider the following image formation model

$$f = u + v, \qquad (1)$$

where u represents the ideal true image observed as f, and v represents independent identically distributed Gaussian noise of variance $\sigma^2$. We follow the notation style of Aharon et al. (2006) and Elad and Aharon (2006). Denote the image u by $u = [u_1, \ldots, u_i, \ldots, u_N]^T$, where i indexes the pixel location and N is the total number of pixels. Let $R_i$ be the operator that extracts the $\sqrt{s} \times \sqrt{s}$ patch centered at position i from the image and vectorizes it into an $s \times 1$ vector. Let $G_i$ be the operator that extracts the group of patches having a certain degree of similarity with $R_i u$; specifically, $G_i u = [R_{i_1}u, R_{i_2}u, \ldots, R_{i_t}u]_{s \times t}$, where each $R_{i_j}u$, $j = 1, 2, \ldots, t$, satisfies $m(R_{i_j}u, R_i u) < \tau$. Here m is the similarity metric, $\tau$ is the predefined similarity tolerance, and t is the number of patches extracted by $G_i$. We call $G_i u$ the similarity data matrix (SDM) corresponding to $u_i$. Note that both $R_i$ and $G_i$ can be represented as matrices, and both $R_i^T R_i$ and $G_i^T G_i$ are diagonal matrices. In this study, the patch size is set to $7 \times 7$, namely s = 49.

The similarity is measured in a global principal component analysis (PCA) subspace (an excellent introduction to PCA can be found in Abdi and Williams (2010)). Specifically, PCA is performed on all the patches $\{R_i u\}_{i=1}^{N}$ extracted from the image, and the first 10 principal vectors are taken to form a projection matrix U. The similarity between $R_{i_j}u$ and $R_i u$ is then defined as

$$m(R_{i_j}u, R_i u) = \frac{1}{10}\,\|U R_{i_j}u - U R_i u\|_2^2.$$

For each pixel, the SDM is constructed by block matching within a $41 \times 41$ neighborhood window; the similarity tolerance is set as $\tau = 50 + 2\sigma^2$, and t is constrained to lie between 16 and 32. This similarity metric has two benefits. First, pairwise similarity computed in the subspace spanned by the significant eigenvectors is more robust to additive noise than similarity computed directly in the signal space. Second, the computational load of grouping similar patches is greatly reduced by projecting all patches onto the low-dimensional subspace. The choice of the dimension of the PCA subspace is an important issue for the PCA-induced similarity metric; it has been extensively discussed in Tasdizen (2009). In this letter, we empirically set the dimension to 10.

2.2. Related works and our model

Regularization-based image denoising methods share the general formulation

$$\arg\min_{u}\ \frac{\lambda}{2}\,\|f - u\|_2^2 + \mathrm{prior}(u), \qquad (2)$$

in which the first term is the data-fitting term requiring proximity between the observed image and its denoised version, the second term is the regularization term modeling image priors, and the parameter $\lambda$ trades off between fitting fidelity and regularization force. As noted above, the construction of the regularization term is crucial for denoising performance. The challenge of modeling natural images mainly originates from the high dimensionality of image data; the patch-based approach, which models patches of relatively low dimension, is a natural way to tackle this problem. The K-SVD algorithm is one of the most effective patch-based methods and can be formulated as

$$\arg\min_{u,\{\alpha_i\}}\ \frac{\lambda}{2}\,\|f - u\|_2^2 + \sum_{i=1}^{N}\left\{\|R_i u - D\alpha_i\|_2^2 + \mu_i\,\|\alpha_i\|_0\right\}; \qquad (3)$$

in this formula, the first term is the data-fitting term in the same sense as in Eq. (2), and the second term integrates the patch-wise sparse representation prior to form the global regularization force, where D is the redundant dictionary, $\alpha_i$ is the coefficient vector and $\mu_i$ is the tradeoff parameter. The K-SVD algorithm can be viewed as a loose generalization of the K-means algorithm, which means that K-SVD exploits the clustering nature of image patches and hence implicitly exploits the self-similarity inherent in images. To explicitly incorporate self-similarity into the K-SVD framework, we shift from patch-wise modeling to cluster-wise modeling and extend Eq. (3) to

$$\arg\min_{u,\{\alpha_i\}}\ \frac{\lambda}{2}\,\|f - u\|_2^2 + \sum_{i=1}^{N} \mathrm{prior}(G_i u). \qquad (4)$$
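The grouping step of Section 2.1 (global PCA projection, subspace distances, and selection within the tolerance τ) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `build_sdm` and the flat patch array it operates on are our own assumptions, and the search runs over all patches rather than a 41 × 41 window.

```python
import numpy as np

def build_sdm(patches, ref_idx, n_components=10, tau=200.0, t_min=16, t_max=32):
    """Group patches similar to a reference patch into a similarity data
    matrix (SDM), using the PCA-subspace metric of Section 2.1.

    patches: (num_patches, s) array, each row a vectorized sqrt(s) x sqrt(s) patch.
    ref_idx: index of the reference patch R_i u.
    tau:     similarity tolerance (the paper uses tau = 50 + 2*sigma**2).
    """
    # Global PCA: project every patch onto the first `n_components`
    # principal directions (the projection matrix U of the paper).
    mean = patches.mean(axis=0)
    centered = patches - mean
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    U = Vt[:n_components]                      # (d, s) projection matrix
    proj = centered @ U.T                      # (num_patches, d) coordinates

    # m(R_j u, R_i u) = (1/d) * || U R_j u - U R_i u ||_2^2
    dist = np.sum((proj - proj[ref_idx]) ** 2, axis=1) / n_components

    # Keep the closest patches; the count within tolerance is clamped to
    # [t_min, t_max], as the paper constrains t to lie between 16 and 32.
    order = np.argsort(dist)
    t = int(np.clip(np.sum(dist < tau), t_min, t_max))
    group = order[:t]

    # SDM: an s x t matrix whose columns are the grouped (raw) patches.
    return patches[group].T, group
```

Because the reference patch has zero distance to itself, it always appears as the first column of the SDM.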

Then the problem is cast as the modeling of the SDM $G_i u$. To develop an effective model, the structure of the SDM needs further investigation. Fig. 1 illustrates two SDMs: we can see that there is high linear correlation among the columns of the SDM, and each row shares a constant-like structure, which means that there also exists high linear correlation among the rows of the SDM.

Fig. 1. Similarity data matrix (SDM). The highlighted pixels in (a) illustrate the trace of grouping similar patches around pixels B and C. The SDMs corresponding to B and C are displayed in (b) and (c), respectively.

This fact can also be verified experimentally. We randomly sample 2000 pixels from the $512 \times 512$ image Baboon and extract the SDMs corresponding to these pixels (Fig. 2). For each SDM $G_i u = [R_{i_1}u, R_{i_2}u, \ldots, R_{i_t}u]_{s \times t}$, the column-correlation is computed by

$$CC(G_i u) = \frac{2}{t(t-1)} \sum_{1 \le j_1 < j_2 \le t} \mathrm{cor}(R_{i_{j_1}}u,\, R_{i_{j_2}}u), \qquad (5)$$

in which cor is the linear correlation coefficient between two vectors, defined as

$$\mathrm{cor}(a, b) = \frac{\sum_{i=1}^{n}(a_i - \bar{a})\,(b_i - \bar{b})}{\sqrt{\sum_{i=1}^{n}(a_i - \bar{a})^2\ \sum_{i=1}^{n}(b_i - \bar{b})^2}},$$

where a and b are vectors of the form $a = (a_1, \ldots, a_i, \ldots, a_n)^T$ and $b = (b_1, \ldots, b_i, \ldots, b_n)^T$, respectively. The row-correlation of the SDM $G_i u$ is computed in the same way as Eq. (5). It turns out that the average column-correlation of the 2000 extracted SDMs is 0.8127 and the average row-correlation is 0.8461; there indeed exist high linear correlations among both the columns and the rows of the SDM.

In image modeling, exploring more of the correlation/redundancy/structure inherent in images generally improves the accuracy of the model. In this letter, we take advantage of the SVD to simultaneously explore the correlations among both the columns and the rows of the SDM. The left singular vectors of $G_i u$ are derived by exploiting the correlations among its columns; similarly, the right singular vectors of $G_i u$ are derived by exploiting the correlations among its rows. The tensor product of the left and right singular vectors forms a 2D dictionary $D_i^{2D}$, and we model $G_i u$ by sparse approximation with respect to $D_i^{2D}$ (Fig. 3). Then Eq. (4) can be specified as

$$\arg\min_{u,\{\alpha_i\}}\ \frac{\lambda}{2}\,\|f - u\|_2^2 + \sum_{i=1}^{N}\left\{\|G_i u - D_i^{2D}\alpha_i\|_2^2 + \mu_i\,\|\alpha_i\|_0\right\}, \qquad (6)$$

where $D_i^{2D}$, $i = 1, 2, \ldots, N$, is the locally learned 2D dictionary whose atoms have the same size as $G_i u$, and $\alpha_i$ is the coefficient vector.

The proposed model (6) is closely related to several recently published denoising methods. First, it enhances the PCA-based methods (Chatterjee and Milanfar, 2009; Zhang et al., 2010) in that the correlations among both the columns and the rows of the SDM are taken into account simultaneously. Second, it shares a framework similar to the K-SVD algorithm, but goes beyond patch-wise modeling to cluster-wise modeling. Third, it differs from existing cluster-wise models such as that of Mairal et al. (2009), which jointly models a cluster of similar patches by structured sparse representation under the assumption that similar patches share similar sparse representations; that choice makes the approach computationally rather heavy. Model (6) jointly models the cluster of similar patches by sparse representation with respect to the 2D dictionaries, which permits a relatively lower computational load.

2.3. Local 2D dictionary learning and adaptive hard thresholding

Assume that $G_i u$ is of size $s \times t$ with $s \le t$, and write the SVD of $G_i u$ as

$$G_i u = \sum_{j=1}^{s} \mu_{i_j}\ u^{(j)} \otimes v^{(j)}, \qquad (7)$$

where $\otimes$ is the tensor product symbol, and $\mu_{i_j}$, $u^{(j)}$, $v^{(j)}$, $j = 1, 2, \ldots, s$, are the singular values, left singular vectors and right singular vectors, respectively. The orthogonal systems

$$\{u^{(j)} : j = 1, 2, \ldots, s\} \quad \text{and} \quad \{v^{(j)} : j = 1, 2, \ldots, s\} \qquad (8)$$

respectively provide bases for the column space and the row space of the SDM $G_i u$. We construct the 2D atoms as tensor products of the column basis and the row basis (see Fig. 3); specifically, $d_{i_j}^{2D} = u^{(j)} \otimes v^{(j)}$, $j = 1, 2, \ldots, s$, and $D_i^{2D} = [d_{i_1}^{2D}, \ldots, d_{i_j}^{2D}, \ldots, d_{i_s}^{2D}]$. The 2D dictionary $D_i^{2D}$ is highly adaptive to the SDM $G_i u$, and $G_i u$ has a unique representation with respect to $D_i^{2D}$, namely

$$G_i u = \sum_{j=1}^{s} \mu_{i_j}\, d_{i_j}^{2D}. \qquad (9)$$

Fig. 2. 2000 pixels are randomly sampled from the image Baboon, and the SDMs corresponding to these pixels are extracted.


Fig. 3. The 2D sparse modeling of the SDMs.
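The dictionary construction of Eqs. (7)–(9) — SVD of the SDM, then rank-one tensor-product atoms — can be sketched as follows. This is an illustrative helper under our own naming (`learn_2d_dictionary`), not the authors' implementation.

```python
import numpy as np

def learn_2d_dictionary(sdm):
    """Learn the local 2D dictionary of Eqs. (7)-(9) from one SDM.

    sdm: s x t similarity data matrix.
    Returns the singular values mu and the 2D atoms d_j = u^(j) (x) v^(j),
    each a rank-one s x t matrix built from a left/right singular-vector pair.
    """
    # With full_matrices=False: U is (s, r), Vt is (r, t), r = min(s, t).
    U, mu, Vt = np.linalg.svd(sdm, full_matrices=False)
    # Tensor (outer) product of each left/right singular-vector pair.
    atoms = [np.outer(U[:, j], Vt[j, :]) for j in range(len(mu))]
    return mu, atoms
```

Since the atoms come directly from the SVD, the μ-weighted sum of all atoms reproduces the SDM exactly, which is precisely the unique representation stated in Eq. (9).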

This facilitates sparse approximation of $G_i u$. The sparse approximation sub-problem

$$\tilde{\alpha}_i = \arg\min_{\alpha_i}\left\{\|G_i u - D_i^{2D}\alpha_i\|_2^2 + \mu_i\,\|\alpha_i\|_0\right\} \qquad (10)$$

can be reformulated as

$$\tilde{\alpha}_i = \arg\min_{\alpha_i}\ \|\alpha_i\|_0 \quad \text{subject to} \quad \|G_i u - D_i^{2D}\alpha_i\|_2^2 \le \varepsilon \cdot s\,t\,\sigma^2, \qquad (11)$$

where $s\,t\,\sigma^2$ is the energy of the noise and $\varepsilon$ is a relaxation parameter. For a proper choice of $\varepsilon$, the two problems (10) and (11) are equivalent. The solution of problem (11) is obtained by adaptive hard thresholding of the representation coefficients in Eq. (9). Assume that the $\mu_{i_j}$, $j = 1, 2, \ldots, s$, are in decreasing order; then the solution of (11) has the form

$$\tilde{\alpha}_i = [\mu_{i_1}, \ldots, \mu_{i_{\dim}}, 0, \ldots, 0]^T,$$

and we only need to determine the dimension $\dim$. The approximation error is

$$\|G_i u - D_i^{2D}\tilde{\alpha}_i\|_2^2 = \Big\|\sum_{j=\dim+1}^{s} \mu_{i_j}\, d_{i_j}^{2D}\Big\|_2^2 = \sum_{j=\dim+1}^{s} \mu_{i_j}^2, \qquad (12)$$

and we take the value of $\dim$ such that

$$\sum_{j=\dim+1}^{s} \mu_{i_j}^2 \ \le\ \varepsilon \cdot s\,t\,\sigma^2 \ <\ \sum_{j=\dim}^{s} \mu_{i_j}^2. \qquad (13)$$
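The rule in Eq. (13) amounts to a scan over the cumulative tail energies of the squared singular values: keep the smallest number of leading coefficients whose discarded tail fits within the noise budget. A sketch (the helper name is ours):

```python
import numpy as np

def adaptive_hard_threshold(mu, s, t, sigma, eps):
    """Solve the constrained problem (11) by hard-thresholding the
    singular values mu (assumed sorted in decreasing order), choosing the
    number of kept coefficients `dim` according to Eq. (13)."""
    budget = eps * s * t * sigma ** 2
    energy = mu ** 2
    # tail[d] = sum_{j >= d} mu_j^2 (0-based), with tail[len(mu)] = 0.
    tail = np.append(np.cumsum(energy[::-1])[::-1], 0.0)
    # tail is nonincreasing, so the first index with tail <= budget is the
    # smallest dim whose discarded energy fits within the noise budget.
    dim = int(np.argmax(tail <= budget))
    coeffs = mu.copy()
    coeffs[dim:] = 0.0          # the thresholded coefficient vector of (11)
    return coeffs, dim
```

For example, with singular values (10, 5, 1, 0.5) and a budget of 1.25, the two smallest coefficients carry exactly 1.25 units of energy, so dim = 2 and the solution keeps only (10, 5).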

In this case, the constraint in (11) is satisfied.

2.4. Summary of the proposed 2DDL-AHT denoising algorithm

With the local dictionaries $D_i^{2D}$, $i = 1, 2, \ldots, N$, learned, model (6) can be optimized by alternating minimization with respect to the output image u and the sparse approximation coefficients $\alpha_i$, $i = 1, 2, \ldots, N$. This means we alternately solve the following sub-problems:

(I) $\alpha_i^k = \arg\min_{\alpha_i}\left\{\|G_i u^k - D_i^{2D}\alpha_i\|_2^2 + \mu_i\,\|\alpha_i\|_0\right\}$, $i = 1, 2, \ldots, N$.
(II) $u^{k+1} = \arg\min_{u}\ \frac{\lambda}{2}\,\|f - u\|_2^2 + \sum_{i=1}^{N}\|G_i u - D_i^{2D}\alpha_i^k\|_2^2$.

Sub-problem (I) is solved through the constrained optimization problem of the same form as Eq. (11), using adaptive hard thresholding. Meanwhile, we update the relaxation parameter $\varepsilon$ during the alternating minimization process. Note that $s\,t\,\sigma^2$ in Eq. (11) is the energy of the noise contained in the SDM (here the SDM is extracted from the observed image, which is contaminated by noise of variance $\sigma^2$). Additionally, it can be observed from Eqs. (11)–(13) that $\varepsilon \cdot s\,t\,\sigma^2$ is approximately the energy filtered out by adaptive hard thresholding. After the whole iteration process is completed, the total energy filtered out should be close to the noise energy. Since

$$s\,t\,\sigma^2 = \lim_{p \to \infty} \sum_{k=1}^{p} \frac{1}{2^k}\, s\,t\,\sigma^2,$$

we update $\varepsilon$ as $\varepsilon_k = \frac{1}{2^k}$ at the kth iteration. Sub-problem (II) has a closed-form solution which is easily derived. The parameter $\lambda$ in sub-problem (II) is inherited from model (6), where it controls the degree of proximity between the observed image and its denoised version; we therefore set it inversely proportional to the noise standard deviation, $\lambda = \frac{C}{\sigma}$, where C is a predefined constant.

The proposed 2DDL-AHT denoising algorithm is summarized as follows.

Task: Estimate the ideal image u from the noisy observation f.
Initialization: $u^1 = f$, $k = 1$.
Repeat J times:
(1) Update the dictionaries $D_i^{2D}$, $i = 1, 2, \ldots, N$.
(2) Update the sparse representation coefficients as
$$\alpha_i^k = \arg\min_{\alpha_i}\ \|\alpha_i\|_0 \quad \text{subject to} \quad \|G_i u^k - D_i^{2D}\alpha_i\|_2^2 \le \varepsilon_k \cdot s\,t\,\sigma^2.$$
(3) Update the output image as
$$u^{k+1} = \Big(\lambda I + 2\sum_{i=1}^{N} G_i^T G_i\Big)^{-1}\Big(\lambda f + 2\sum_{i=1}^{N} G_i^T D_i^{2D}\alpha_i^k\Big).$$
(4) k = k + 1.
End

There are in total two parameters (the predefined constant C and the iteration number J) to be configured in this algorithm. In the experiments, we set C = 0.01 and J = 3; these values are used for all test images and noise levels.
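Under the simplifying assumption that each operator $G_i$ is represented by an integer index array into the flattened image, one outer iteration (steps (1)–(3) of the summary above) can be sketched as follows. The closed-form update of step (3) reduces to an element-wise division because $\lambda I + 2\sum_i G_i^T G_i$ is diagonal. The names and data layout here are our own assumptions, not the authors' code.

```python
import numpy as np

def denoise_iteration(f, index_sets, s, sigma, eps_k, lam):
    """One outer iteration of 2DDL-AHT (simplified sketch).

    f:          flattened noisy image, shape (N,).
    index_sets: list of (s, t) integer arrays; index_sets[i][:, j] holds the
                pixel indices of the j-th patch in the i-th group, so that
                f[index_sets[i]] plays the role of the SDM G_i u.
    """
    N = f.size
    numer = lam * f.copy()        # accumulates lam*f + 2*sum_i G_i^T (D_i alpha_i)
    denom = lam * np.ones(N)      # accumulates the diagonal lam*I + 2*sum_i G_i^T G_i
    for idx in index_sets:
        sdm = f[idx]                                        # (s, t) SDM
        U, mu, Vt = np.linalg.svd(sdm, full_matrices=False)
        # Adaptive hard thresholding (Eqs. 11-13): discard the trailing
        # singular values whose energy fits within eps_k * s * t * sigma^2.
        t = sdm.shape[1]
        budget = eps_k * s * t * sigma ** 2
        tail = np.append(np.cumsum((mu ** 2)[::-1])[::-1], 0.0)
        dim = int(np.argmax(tail <= budget))
        estimate = (U[:, :dim] * mu[:dim]) @ Vt[:dim, :]    # denoised SDM
        # Scatter-add the group estimate back to the image (the G_i^T step).
        np.add.at(numer, idx, 2.0 * estimate)
        np.add.at(denom, idx, 2.0)
    return numer / denom          # closed-form solution of sub-problem (II)
```

On a noise-free constant image the SDM is rank one, the thresholding keeps a single singular value, and the update returns the image unchanged, which is a convenient sanity check.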


3. Experimental results

3.1. Experimental setup

Extensive experiments were conducted to evaluate the proposed 2DDL-AHT algorithm for image denoising. Due to the diversity and complexity of natural images, the same denoising algorithm may perform differently on different images. A way to remedy this is to employ a large dataset containing representative images of varied content. We employ two groups of test images. The first group includes eight benchmark test images (Fig. 4), in which the images Baboon, Barbara and House contain repeating patterns of different morphology, while the remaining five images mainly contain macro structures. The second group is the Kodak dataset, which consists of 24 RGB images of size 512 × 768. Since 2DDL-AHT is designed for grayscale images, we only perform denoising on the luminance component of the RGB images in this group; the luminance component is extracted using the MATLAB routine rgb2ycbcr.

Three representative denoising algorithms, namely NLM (Buades and Morel, 2005), K-SVD (Elad and Aharon, 2006) and BM3D (Dabov et al., 2007), are included for comparison. The source codes of K-SVD and BM3D were downloaded from the authors' homepages; NLM and 2DDL-AHT were implemented in MATLAB by ourselves. For NLM, a large portion of the computation is consumed by computing the nonlocal weights. In our implementation the nonlocal weights are computed by convolution, which yields at least a 20-fold speedup over the classical implementation. The source code of the proposed 2DDL-AHT and our implementation of NLM are available online at http://www.mathworks.cn/matlabcentral/fileexchange/authors/96116. In the experiments, the parameters of NLM, K-SVD and BM3D are set as configured by their authors, and the parameters of 2DDL-AHT are set as C = 0.01 and J = 3. All experiments were performed on a DELL D630 (dual-core 2.1 GHz CPU, 2 GB RAM).
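For reference, the luminance extraction and the PSNR figure reported in this section can be computed as below. The luminance coefficients follow the ITU-R BT.601 convention used by MATLAB's rgb2ycbcr for RGB values in [0, 1]; treat this as an assumed equivalent of that routine, not its exact code.

```python
import numpy as np

def luminance(rgb):
    """Luminance (Y) channel, BT.601 convention, for rgb in [0, 1].
    Assumed to match the Y output of MATLAB's rgb2ycbcr (range 16..235)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 16.0 + 65.481 * r + 128.553 * g + 24.966 * b

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB, as reported in Tables 1 and 2."""
    diff = reference.astype(np.float64) - estimate.astype(np.float64)
    mse = np.mean(diff ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```

With these definitions, a pure white pixel maps to Y = 235, and an image differing from its reference by 1 gray level everywhere scores about 48.13 dB.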

Fig. 4. The test images: Cameraman, Butterfly, Man, Baboon, Barbara, Montage, House, Peppers.

Table 1
PSNR results (in dB) of the competing algorithms on the eight test images. In each block, the columns correspond to the noise levels σ = 5, 10, 15, 20, 25, 30, 40, 50 (left to right).

Cameraman (256 × 256)
NLM       34.20 32.07 30.48 29.27 28.23 27.27 25.61 24.26
K-SVD     37.84 33.65 31.35 29.86 28.81 27.97 26.68 25.63
BM3D      38.29 34.18 31.91 30.48 29.45 28.64 27.18 26.12
2DDL-AHT  38.31 34.23 31.98 30.49 29.43 28.63 27.41 26.41

Butterfly (300 × 300)
NLM       34.17 32.25 30.60 29.23 28.10 27.13 25.42 23.88
K-SVD     37.34 33.22 30.92 29.42 28.31 27.36 25.97 24.87
BM3D      37.86 33.71 31.46 29.94 28.81 27.90 26.10 25.13
2DDL-AHT  38.27 34.29 32.06 30.51 29.34 28.39 26.95 25.80

Man (512 × 512)
NLM       35.73 32.77 30.78 29.37 28.28 27.40 25.99 24.88
K-SVD     37.50 33.55 31.47 30.09 29.07 28.29 27.06 26.07
BM3D      37.82 33.98 31.93 30.59 29.62 28.86 27.65 26.81
2DDL-AHT  37.85 34.03 31.94 30.56 29.56 28.79 27.64 26.74

Baboon (512 × 512)
NLM       30.63 28.91 27.14 25.66 24.52 23.65 22.42 21.59
K-SVD     35.19 30.47 27.99 26.38 25.22 24.33 22.97 21.98
BM3D      35.25 30.58 28.18 26.61 25.45 24.57 23.11 22.35
2DDL-AHT  35.21 30.57 28.19 26.64 25.52 24.64 23.33 22.37

Barbara (256 × 256)
NLM       36.43 33.65 31.42 29.66 28.26 27.11 25.34 24.06
K-SVD     38.08 34.38 32.33 30.81 29.56 28.54 26.88 25.47
BM3D      38.31 34.98 33.11 31.78 30.72 29.81 27.99 27.23
2DDL-AHT  38.43 35.14 33.23 31.84 30.71 29.73 28.16 26.95

Montage (256 × 256)
NLM       38.72 35.58 33.35 31.67 30.32 29.17 27.24 25.56
K-SVD     40.09 36.05 33.76 32.18 30.97 30.01 28.41 27.18
BM3D      41.14 37.35 35.15 33.61 32.37 31.37 29.52 27.90
2DDL-AHT  41.15 37.47 35.25 33.66 32.39 31.32 29.60 28.21

House (256 × 256)
NLM       38.03 35.30 33.49 31.94 30.60 29.44 27.48 25.92
K-SVD     39.33 35.94 34.29 33.08 32.13 31.17 29.62 28.07
BM3D      39.83 36.71 34.94 33.77 32.86 32.09 30.65 29.69
2DDL-AHT  39.78 36.56 34.97 33.95 33.09 32.34 30.92 29.66

Peppers (256 × 256)
NLM       35.70 33.28 31.34 29.83 28.61 27.57 25.83 24.40
K-SVD     37.78 34.19 32.18 30.79 29.68 28.84 27.43 26.21
BM3D      38.12 34.68 32.70 31.29 30.16 29.28 27.70 26.68
2DDL-AHT  37.96 34.71 32.80 31.41 30.32 29.41 27.98 26.88

Table 2
Average PSNR (in dB) of the four denoising algorithms on the Kodak dataset for each noise level σ.

Method     σ=5    σ=10   σ=15   σ=20   σ=25   σ=30   σ=40   σ=50
NLM        36.39  33.45  31.49  30.06  28.93  28.00  26.54  25.42
K-SVD      37.93  34.16  32.12  30.75  29.72  28.88  27.55  26.52
BM3D       38.72  34.90  32.88  31.55  30.57  29.81  28.56  27.73
2DDL-AHT   38.80  35.01  32.96  31.60  30.61  29.82  28.61  27.66

3.2. Comparison of time complexity

Assume that the total number of image pixels is N, that the patch size is $\sqrt{s} \times \sqrt{s}$, that the average number of patches similar to the reference patch is t, and that the search window is of size $\sqrt{T} \times \sqrt{T}$. The complexity of NLM is $O(s^2 T N)$, because computing the Gaussian-kernelized L2 distance between two patches requires $O(s^2)$ operations. K-SVD is an iterative algorithm whose every iteration is composed of a sparse-coding step and a dictionary-updating step. The complexity of the sparse-coding step depends on the sparse coding algorithm used; we denote it $O(C_{\mathrm{sparse}})$. The complexity of dictionary updating can be written as $O(M \min(s^2 K, s K^2))$, with M the number of atoms (the size of the dictionary) and K the average number of patches involved in updating one atom. The total complexity of K-SVD then comes to $O([C_{\mathrm{sparse}} + M \min(s^2 K, s K^2)]\,J)$, with J the number of iterations. BM3D requires $O(sT)$ time to collect a stack of similar patches around each pixel, because the L2 distance is employed as the similarity measure; it also requires $O(t \sqrt{s^3})$ time for the 2D transforms and $O(t^2 s)$ time for the 1D transform when the transforms are implemented by matrix multiplication. This leads to a total complexity of $O([sT + t\sqrt{s^3} + t^2 s]\,N)$. 2DDL-AHT measures similarity in the global PCA subspace, so it requires $O(dT)$ time to search for the similar patches around each pixel, where d is the dimension of the subspace and $d \ll s$ (see Section 2.1). The similar patches are arranged into an $s \times t$ matrix, whose SVD requires $O(\min(s^2 t, s t^2))$ operations. The total complexity of 2DDL-AHT thus comes to $O([dT + \min(s^2 t, s t^2)]\,N)$.
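Plugging the paper's settings (s = 49 for 7 × 7 patches, t at its upper bound of 32, a 41 × 41 search window so T = 1681, and subspace dimension d = 10) into the per-pixel grouping and SVD costs gives a rough sense of the constants involved. This back-of-the-envelope comparison is ours, not the paper's, and it ignores hidden constants.

```python
# Per-pixel operation counts (order-of-magnitude only) using the paper's
# settings: s = 49, t = 32, T = 41**2, d = 10.
s, t, T, d = 49, 32, 41 ** 2, 10

bm3d_grouping = s * T                   # L2 distances in the signal space
aht_grouping = d * T                    # distances in the PCA subspace
svd_cost = min(s ** 2 * t, s * t ** 2)  # SVD of the s x t SDM

# Projecting onto the d-dimensional PCA subspace makes the grouping step
# s/d = 4.9 times cheaper per pixel than signal-space block matching.
print(bm3d_grouping / aht_grouping)     # → 4.9
```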

Fig. 5. Visual performance comparison on sub-images clipped from the images Butterfly and Baboon.


Fig. 6. Visual performance comparison on sub-images clipped from the images House and Peppers.

The above analysis indicates that the computational complexity of 2DDL-AHT is comparable with that of BM3D.

3.3. Comparison of denoising performance

The PSNR results of the four denoising algorithms on each test image in Fig. 4 are compared in Table 1, and the average PSNR results on the Kodak dataset are compared in Table 2. We can see that 2DDL-AHT uniformly outperforms NLM and K-SVD, and achieves denoising performance comparable to BM3D. NLM and K-SVD employ pixel-wise and patch-wise estimation, respectively, while both 2DDL-AHT and BM3D employ cluster-wise estimation. The comparison in these tables implies that denoising performance improves progressively from pixel-wise, to patch-wise, to cluster-wise estimation. A statistical foundation for this phenomenon is worth further study.

Figs. 5 and 6 compare the structure-preserving ability of these algorithms. We can see that 2DDL-AHT better recovers the fine structures in images and provides quite good visual quality. This means that, by using locally learned dictionaries with the two-direction linear correlation taken into account, the 2D sparse modeling characterizes the cluster of similar patches considerably well. Compared to BM3D, which uniformly uses a wavelet transform to characterize the similar patches, 2DDL-AHT has a better chance of producing clear edges in the denoised output.

4. Conclusion

This letter presents an iterative denoising algorithm based on 2D dictionary learning and adaptive hard thresholding. The 2D dictionary is learned by exploring the two-direction correlation inherent in image data, and the hard thresholding is designed such that the total energy filtered out by thresholding is close to the noise energy. On the one hand, the algorithm enhances traditional PCA-based methods in that the correlations among both the columns and the rows of the SDM are taken into account. On the other hand, it extends the classical K-SVD algorithm by going beyond patch-wise 1D sparse modeling to cluster-wise 2D sparse modeling. Numerical results indicate that the proposed algorithm achieves at least comparable denoising performance with the BM3D algorithm and is very competitive in preserving fine structures.

Acknowledgements

This work was supported by the National Science Foundation of China (Grants 61001156, 11261044, 61105011, 11101292, 60872138 and 61271294) and by the National Science Foundation of Ningxia Province (NZ13049).

References

Abdi, H., Williams, L.J., 2010. Principal component analysis. Wiley Interdiscip. Rev. Comput. Stat. 2, 433–459.
Aharon, M., Elad, M., Bruckstein, A.M., 2006. The K-SVD: an algorithm for designing of overcomplete dictionaries for sparse representation. IEEE Trans. Signal Process. 54 (11), 4311–4322.
Aubert, G., Kornprobst, P., 2006. Mathematical Problems in Image Processing: Partial Differential Equations and the Calculus of Variations, second ed. Springer, New York.
Buades, A., Coll, B., Morel, J.M., 2005. A non-local algorithm for image denoising. In: Proc. IEEE Conf. CVPR, pp. 60–65.
Chatterjee, P., Milanfar, P., 2009. Clustering-based denoising with locally learned dictionaries. IEEE Trans. Image Process. 18 (7), 1438–1451.


Coifman, R.R., Maggioni, M., 2006. Diffusion wavelets. Appl. Comput. Harmon. Anal. 21, 53–94.
Dabov, K., Foi, A., Katkovnik, V., et al., 2007. Image denoising by sparse 3D transform-domain collaborative filtering. IEEE Trans. Image Process. 16 (8), 2080–2095.
Elad, M., Aharon, M., 2006. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans. Image Process. 15 (12), 3736–3745.
Hammond, D.K., Vandergheynst, P., Gribonval, R., 2011. Wavelets on graphs via spectral graph theory. Appl. Comput. Harmon. Anal. 30 (2), 129–150.
Mahmoudi, M., Sapiro, G., 2005. Fast image and video denoising via nonlocal means of similar neighborhoods. IEEE Signal Process. Lett. 12 (12), 839–842.
Mairal, J., Bach, F., Ponce, J., et al., 2009. Non-local sparse models for image restoration. In: Proc. ICCV, pp. 2272–2279.
Mallat, S., 2008. A Wavelet Tour of Signal Processing: The Sparse Way, third ed. Academic Press, New York.
Pizurica, A., Philips, W., Lemahieu, I., et al., 2002. A joint inter- and intrascale statistical model for Bayesian wavelet based image denoising. IEEE Trans. Image Process. 11 (5), 545–557.
Rajwade, A., Rangarajan, A., Banerjee, A., 2013. Image denoising using the higher order singular value decomposition. IEEE Trans. Pattern Anal. Mach. Intell. 35 (4), 826–849.
Rudin, L., Osher, S., Fatemi, E., 1992. Nonlinear total variation based noise removal algorithms. Phys. D 60, 259–268.
Sendur, L., Selesnick, I.W., 2002. Bivariate shrinkage functions for wavelet-based denoising exploiting interscale dependency. IEEE Trans. Signal Process. 50 (11), 2744–2756.
Shuman, D.I., Narang, S.K., Frossard, P., Ortega, A., Vandergheynst, P., 2013. The emerging field of signal processing on graphs: extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Process. Mag. 30 (3), 83–98.
Tasdizen, T., 2009. Principal neighborhood dictionaries for non-local means image denoising. IEEE Trans. Image Process. 18 (12), 2649–2660.
Zhang, L., Dong, W., Zhang, D., et al., 2010. Two-stage image denoising by principal component analysis with local pixel grouping. Pattern Recogn. 43 (4), 1531–1549.