Information Sciences 482 (2019) 334–349
Contents lists available at ScienceDirect
Information Sciences journal homepage: www.elsevier.com/locate/ins
No-Reference quality assessment of noisy images with local features and visual saliency models Mariusz Oszust a,∗ Department of Computer and Control Engineering, Rzeszow University of Technology, Wincentego Pola 2, 35–959 Rzeszow, Poland
Article info
Article history: Received 14 April 2017 Revised 8 August 2017 Accepted 11 January 2019 Available online 12 January 2019 Keywords: Image quality assessment No-reference image quality assessment Local features Visual saliency Image denoising
Abstract Image quality assessment (IQA) measures predict the perceived quality of evaluated images, aiming to replace time-consuming human evaluation. This is particularly important for the automatic comparison of image processing techniques, which often modify image content. Since the presence of noise strongly affects the perception of images and only a few IQA measures address this type of distortion, a new no-reference IQA measure is proposed in this paper. In the introduced measure, two aspects of the Human Visual System (HVS) are considered. The sensitivity of the HVS to local image distortions is expressed by similarities between descriptors obtained with the Speeded-Up Robust Features (SURF) technique. In order to improve the quality prediction, image patches for description are extracted from a gradient map of an input image. Noise also affects the visual saliency, since it decreases attention caused by image content. In the approach, visual saliency models of image patches are obtained and evaluated using an adapted Visual Saliency-based Index (VSI). Experimental evaluation on popular IQA benchmarks reveals that the proposed measure outperforms the related state-of-the-art techniques. The applicability of the proposed method to an automatic selection of the best denoising algorithm is also discussed in the paper. © 2019 Elsevier Inc. All rights reserved.
1. Introduction The rapid growth of multimedia processing and communication systems is associated with a high demand for quality assessment of displayed visual content. Such assessment typically involves tests with human subjects. However, collecting the subjective ratings is expensive and time-consuming. It may also result in opinion scores that depend on the preferences of a given test group. Therefore, automatic image quality assessment (IQA) measures have been developed to support a variety of computer vision techniques for image acquisition, transmission, enhancement, or restoration [8,37]. There are full-reference (FR), reduced-reference (RR), and no-reference (NR) objective IQA measures [27]. Full-reference techniques predict the quality of distorted images by comparing them with the corresponding distortion-free reference images [2,30,47]. Reduced-reference measures, in turn, use only some properties of reference images, and NR techniques have no access to such information. NR-IQA measures are of practical importance, due to the lack of undistorted versions of evaluated images in most applications [27]. These measures can be further classified into general-purpose and distortion-specific
∗ Corresponding author. URL: http://marosz.kia.prz.edu.pl
https://doi.org/10.1016/j.ins.2019.01.034 0020-0255/© 2019 Elsevier Inc. All rights reserved.
techniques [27]. General-purpose NR techniques are designed for the assessment of images degraded with a variety of distortion types. However, the usage of the distortion-specific approach seems to be more justified for some isolated tasks. For example, distortion-specific measures evaluate compressed images [19] or images affected by blur [17,25], contrast change [11], or noise [9,15,50]. The development of denoising techniques requires reliable methods for their comparison. This is particularly important in cases in which a pristine version of the denoised image is not available. In the literature, there are only a few NR-IQA methods for quality prediction of noisy images, despite their practical significance and the high impact of noise on the perceptual image quality. In this paper, a novel method for NR-IQA of noisy images is proposed. The method addresses the sensitivity of the HVS to local distortions and irregularities caused by noise [10], which is expressed by similarities between descriptors obtained with the Speeded-Up Robust Features (SURF) technique [1]. In order to improve the quality prediction, image patches for description are extracted from a gradient map of an input image. Then, they are matched within the image and their similarities are used for the image assessment. This part of the approach does not require training and can be regarded as a novel NR technique. Noise also affects the visual saliency, since it decreases attention caused by image content. Therefore, in this paper, another approach to determining the quality of a noisy image is proposed. In the approach, visual saliency models of image patches are obtained and evaluated using the FR Visual Saliency-based Index (VSI) [47]. It is observed that visual saliency models of image patches degraded by noise are similar. This leads to a novel saliency-based NR technique in which such models for the assessed image are compared.
The introduced NR measure is composed of two proposed NR techniques which are based on local features and visual saliency models. These techniques can also be used separately. However, their fusion is beneficial and leads to superior performance. To avoid dependence on a learning set used for determination of their possible joint usage, they equally contribute to the resulted measure. As a result, the method has a small number of parameters and does not require training. The introduced measure is compared with the state-of-the-art NR techniques using a widely accepted protocol in which an IQA method is evaluated using all images from a benchmark dataset. In this work, also a new protocol is proposed. In the protocol, the degraded versions of an image are considered. This image-oriented benchmark is of practical significance, since many computer vision approaches are evaluated comparing qualities of their output images. Therefore, an NR-IQA technique which shows superior performance in such test could be used for evaluation of image processing algorithms. Furthermore, this new protocol seems to better reflect the way in which many human trials are organised [32,33]. The development of an NR measure for the quality assessment of noisy images is very important from the practical point of view. Such a measure could support image denoising algorithms in which different denoised versions of an image are often compared, or to develop an approach which aggregates several denoising techniques to provide the best restored image, disregarding the type of noise. The paper contains a comparison of NR techniques on a database with images processed by the state-of-the-art denoising algorithms, as well as a discussion on the suitability of NR-IQA measures to an automatic selection of the best denoising algorithm. The database has been prepared for the needs of this work. 
The aim of such application is to obtain better denoising results than it can be achieved by a single restoration method. The contributions of this work are as follows: 1) A novel NR measure based on comparison of SURF descriptors for image patches. 2) A novel NR technique which uses visual saliency models of image patches obtained with FR VSI. 3) An effective fusion of the introduced NR measures. 4) A new protocol for evaluation of IQA measures in which the degraded versions of an image are considered. 5) An application of NR measures to an automatic selection of the best denoising algorithm. The rest of the paper is organised as follows. The next section shows related work. Section 3, in turn, describes the proposed approach. Then, in Section 4, evaluation protocols and used public benchmark datasets are presented. The section contains a comparison of the introduced measures with the related state-of-the-art techniques. In Section 5, the NR techniques are compared on the dataset of restored images. This section also contains a practical application of NR measures. Finally, Section 6 concludes the paper. 2. Related work In a typical NR-IQA general-purpose measure, feature vectors extracted from images are mapped into subjective opinion scores using a regression technique, as shown in, e.g., [29]. In the approach proposed by Moorthy and Bovik [29], the Distortion Identification-based Image Verity and INtegrity Evaluation (DIIVINE) is introduced. DIIVINE uses 88 features and employs two-stage framework for the prediction of the distortion type and the image quality estimation. A single-stage framework for NR measure, Blind Image Integrity Notator (BLIINDS-II), is proposed in [35]. The measure improves previously introduced BLIINDS-I and uses generalised natural scene statistics (NSS) model of local discrete cosine transform coefficients. Then, a Bayesian inference approach is used for the quality prediction. 
Scene statistics of locally normalised luminance coefficients are used in Blind/Referenceless Image Spatial QUality Evaluator (BRISQUE) for quality measurement [28]. In BRISQUE, a model incorporating pairwise products of neighbouring luminance values is trained using a support vector machine regressor (SVR) and then applied to quantify the quality of distorted images. An unsupervised feature learning framework introduced in [46], namely Codebook Representation for No-Reference Image Assessment (CORNIA), uses unlabelled data available on the Internet for learning Gabor features. CORNIA represents an image employing soft-assignment coding with max pooling. In Local Gradient Patterns (LGP) measure [49], a Gaussian derivative filter is used to decompose an image into gradient magnitude and phase. Then, local statistics are extracted and mapped to subjective scores using SVR.
An analysis of the correlation structure of image gradient orientations can be found in [23]. The method, Oriented Gradients Image Quality Assessment (OG-IQA) index, uses AdaBoosting-backpropagation neural network for learning. In [43], in turn, a framework based on image High Order Statistics Aggregation (HOSA) is presented. HOSA employs statistical differences between a codebook and images for image quality prediction. A semi-supervised and fuzzy framework for NR-IQA is presented in [26]. Here, a Gaussian function models the membership relation between subjective scores and values from a dictionary of limited grades, which are considered as the ground truth. Then, features of labelled and unlabelled images are mapped to the ground truth values. In [45], a simple method trained on images without access to human scores is reported. This Quality-Aware Clustering (QAC) technique is trained on a set of centroids of quality levels that belong to four distortion types. In the work of Zhang et al. [48], an NR-IQA measure is presented which integrates natural image statistics features derived from multiple cues using multivariate Gaussian model obtained from natural images. The Integrated Local Natural Image Quality Evaluator (IL-NIQE) does not require subjective scores for distorted images for training. The development of denoising techniques requires reliable methods for their comparison. This is particularly important in cases in which a pristine version of the denoised image is not available. In the literature, there are only several NR-IQA methods for quality prediction of noisy images, despite their practical significance and high impact of noise on the perceptual image quality. This is also reflected by a large number of noisy images in popular IQA benchmark datasets [16,32,33,41]. The state-of-the-art NR-IQA measures for quality prediction of noisy images are simple and opinion-unaware. 
For example, Zhu and Milanfar proposed Q-metric in which a fixed threshold is used to differentiate anisotropic patches with strong structure and statistical properties of the singular value decomposition [50]. The authors also indicate that known sharpness metrics can hardly distinguish high-frequency behaviour caused by noise. The use of anisotropy for assessment of blurred and noisy images is first introduced in [9]. Here, generalised Renyi entropy and normalised pseudo-Wigner distribution are applied. Objective scores are obtained measuring the averaged anisotropy of an image by means of pixelwise directional entropy. In the method, a set of distorted images is sorted in order to select the best version of an image. In [15], Kong et al. compared Q-metric to their metric which also takes into account all pixels in the image. The method is very simple and uses the structural similarity (SSIM) index [41] to obtain structure similarity maps of the input image, the denoised image, and their difference. The image quality score is calculated using the linear correlation between maps. A measure for the evaluation of noisy images, which is reported in [12], uses adaptively selected homogeneous image regions and their statistics. A division of the assessed image into blocks of pixels and their further assessment in order to find homogeneous blocks for statistics extraction seems to be intuitive and very popular. For example, Chen et al. in [3] proposed a metric for assessing videos distorted by spatially correlated noise. In the metric, features derived from the power spectral densities and a visual importance pooling are used for the estimation. In [40], Tang et al. proposed an NR measure based on the free energy theory. The method uses a bilateral filter and an auto-regression model to compute the visual saliency. Then, it employs NSS to predict the perceived image quality. 
The method is tested on images distorted by blur and AWGN, obtaining accuracy comparable to BRISQUE [28] and IL-NIQE [48]. There are also methods for noise level estimation used by denoising algorithms which require the standard deviation of the Gaussian noise as input [7]. However, these techniques do not correlate well with the human perception [40]. The state-of-the-art general-purpose NR measures often consider a wide spectrum of image distortions. This means that in such a measure a feature set used for image content description is selected to cover different distortion types, weakening the capability of a method to perceive noise. Furthermore, many techniques require training in which a feature set is mapped to subjective scores from a learning dataset. The training often introduces a dependence of the measure on a dataset [28,48]. There are only several NR-IQA measures designed for assessment of noisy images. They use anisotropy, correlation between structural similarity maps, or NSS models. Therefore, while notable advances have been achieved in the NR image quality metrics, new metrics are highly demanded for more effective assessment of images affected by noise. In this paper, a novel method for NR-IQA of noisy images is proposed to address the disadvantages of known NR metrics and explore other approaches to the HVS modelling. The method addresses the sensitivity of the HVS to local distortions and irregularities caused by noise [10] using similarity of SURF descriptors extracted from the evaluated image. The comparison of descriptors for patches within an image is novel in NR-IQA. Furthermore, visual saliency models of image patches are used for image quality prediction. In the approach, such models are obtained and evaluated using FR VSI technique, leading to a novel saliency-based NR measure. Both introduced techniques are used jointly, leading to an NR-IQA measure which does not require training and demonstrates very promising performance.
3. Methodology The HVS is biased for processing natural images, which makes it very sensitive to change in the regularities in natural images introduced by noise [10]. Its sensitivity cannot be determined by a simple passive filtering, since it requires active neural interactions which are present even in a visual system with disorders [18]. These findings confirm that the HVS is efficient in the noise detection of a broad range of frequencies [14]. Furthermore, local anisotropies are likely to be changed by image distortions [9], which indicates that the HVS modelling should use a technique which shares some properties with neurones in primary visual cortex [24,36], i.e., invariance to translation, scale or robustness to local distortions. As an example of such technique, Speeded-Up Robust Features method (SURF) [1] is used. The details of modelling the HVS sensitivity to noise with an approach based on local features are shown in Section 3.1.
The visual saliency (VS) has become a widely studied phenomenon [22], since some parts of the image attract more attention than others. Therefore, many IQA measures model VS, e.g., [19,47]. In this paper, similarities among VS models obtained for image patches are used to determine saliency-based image quality. This part of the introduced NR technique is described in Section 3.2. In Section 3.3, it is shown that these two concepts that mimic the HVS can be used to develop a novel NR measure for the assessment of noisy images. This section also considers their contribution to the results of the method and the influence of their parameters on the performance. 3.1. Local feature descriptor In IQA measures, a variety of gradient-based features are often employed to model NSS [23,27,49,50]. The proposed method incorporates SURF [1] for image patch description. SURF uses an approximation of the determinant of the Hessian for interest point detection. However, in the proposed NR approach, interest points are not detected, since their number is often large and some content may cause their uneven distribution in the image. Therefore, in this work, image patches are used for description. Furthermore, the usage of image patches significantly shortens the computation time of this part of the proposed approach. In the proposed NR technique, for an input image $I$, a grid with $N_{LF}$ disjoint image patches of the same size is created. However, instead of using a grayscale image, its gradient map is employed. The gradient map is obtained with Prewitt operators in horizontal and vertical directions. This step significantly improves the image quality assessment performance of this part of the measure, along with the usage of evenly distributed image patches instead of detected interest points. Then, following the feature description pipeline present in SURF, for the centre of each n-th patch, $n = 1, \ldots, N_{LF}$, a circular neighbourhood $O_n(x, y)$ of size $6s$ is convolved with two box filters, $D_x$ and $D_y$, to obtain directional gradients. In the algorithm, $s$ expresses the scale of the described feature, and $(x, y)$ its centre. These two box filters, or Haar wavelet responses, are approximations of the first-order derivative operators [1]. Gradients are then weighted using a discrete Gaussian kernel of standard deviation $2s$ and summed with gradients of a similar angle. Then, the global orientation $\theta_{O_n}$ of the n-th patch is determined as the maximal orientation found in a sliding window of size $\pi/3$. Given the orientation of the patch $\theta_{O_n}$, its descriptor is computed. At first, a $20s$ grid is selected and divided into 16 disjoint blocks, $P_i$, $i = 1, \ldots, 16$. Each i-th block is further divided into 5 × 5 regions, which define locations of sampling points used for convolution with the $D_x$ and $D_y$ box filters. Gradients in this step are rotated using the global orientation $\theta_{O_n}$ and weighted with the Gaussian kernel of standard deviation $3.3s$ [1]. The weighted gradients are summed and used for the description of a patch (Eq. (1)):
$$F_n = \left[ D_x^{i=1}, \ldots, D_x^{i=16},\; D_y^{i=1}, \ldots, D_y^{i=16},\; |D_x^{i=1}|, \ldots, |D_x^{i=16}|,\; |D_y^{i=1}|, \ldots, |D_y^{i=16}| \right], \tag{1}$$
where $D_x^{i}$ and $D_y^{i}$ denote sums of Haar wavelet responses obtained for sampling points within the i-th block in horizontal and vertical directions, respectively. Finally, a 64-dimensional real-valued descriptor is created. In the 128-dimensional version of the descriptor, better distinctiveness is provided by separating sums for positive and negative responses [1]. Given the vector of $N_{LF}$ descriptors for the assessed image, they are compared (matched) with each other using the Euclidean distance. There are $U = \binom{N_{LF}}{2}$ distance computations. Finally, the quality of the image $I$, obtained with the approach based on local features (LF), is computed as the mean of the obtained distances, as shown in Eq. (2):
$$Q_{LF} = \operatorname{mean}\left(\left[ q_{LF}^{u=1}, \ldots, q_{LF}^{u=U} \right]\right), \tag{2}$$
where $q_{LF}^{u} = \lVert P_k - P_l \rVert$, and $\{k, l\}$ denote two different compared patches, $k \neq l$, $\{k, l\} = 1, \ldots, 16$. In a typical application of local features, they are matched with features in another image to establish correspondences between them. However, in an NR measure, only a degraded image is available. Therefore, in this work, a novel method is proposed for expressing the image degradation caused by noise using the similarities among the features. In the introduced approach, the 128-dimensional version of SURF is used. The scale of the described feature $s$ is $0.1w$, where $w$ denotes the length of the patch side. Typically, SURF uses the $20s$ image patch weighted with the Gaussian kernel for a feature description and the $6s$ patch for determination of its dominant orientation. However, the $0.1w$ scale is selected to describe all pixels that belong to the patch and its near-boundary. Setting the scale that corresponds to the patch size would reduce the contribution of the pixels located closer to the patch boundaries to the description due to the Gaussian weighting.
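As an illustration, the pooling of Eq. (2) can be sketched in Python as follows. This is a minimal sketch rather than the author's implementation: it assumes the descriptors have already been extracted, and the function name `q_lf` and the toy two-dimensional descriptors are hypothetical.

```python
from itertools import combinations
from math import comb, dist
from statistics import mean


def q_lf(descriptors):
    """Mean pairwise Euclidean distance between patch descriptors (Eq. (2))."""
    if len(descriptors) < 2:
        raise ValueError("at least two descriptors are needed")
    # One distance per unordered pair of distinct descriptors: U = C(N, 2).
    distances = [dist(a, b) for a, b in combinations(descriptors, 2)]
    assert len(distances) == comb(len(descriptors), 2)
    return mean(distances)


# Toy 2-D "descriptors" (hypothetical); real SURF descriptors are 128-D.
toy_descriptors = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
toy_score = q_lf(toy_descriptors)
```

For $N_{LF} = 169$ patches, this amounts to 14196 distance computations per image.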
3.2. Visual saliency model The visual saliency is modelled using the computational pipeline of VSI measure [47]. VSI is based on the concept that suprathreshold distortions affect the visual saliency. Since the technique proposed in this paper does not use reference images and VSI is an FR-IQA technique, a comparison between visual saliency models of image patches is employed. In contrast to the measure introduced in [15] which simply uses FR-IQA SSIM technique for comparison of distorted and enhanced versions of an input image, this VSI-based approach uses a self-similarity for image assessment. The adaptation of VSI to the NR-IQA is among the contributions of this work.
Fig. 1. Processing flow of the proposed method.
This part of the introduced NR measure ($Q_{VS}$) obtains the perceptual quality of an image by calculating pairwise VSI similarities between $N_{VS}$ image patches. For two image patches, $P_i$ and $P_j$ ($i \neq j$, $\{i, j\} = 1, \ldots, N_{VS}$), the following features are computed: VS maps based on the GBVS model [47] ($VS_i$, $VS_j$), gradient maps ($GM_i$, $GM_j$) obtained using the Scharr operator, and gradients from channels in an opponent colour space ($M_i$, $M_j$). Then, a similarity of a pixel location in both patches is calculated using a combination of three components, $S_{VS}(x)$, $S_{GM}(x)$, and $S_{M}(x)$ [47]:
$$S(P_i, P_j) = S_{VS}(x)\,[S_{GM}(x)]^{a}\,[S_{M}(x)]^{b}, \tag{3}$$
where a and b are parameters used to adjust the importance of the components, and x denotes a pixel location in both patches. Finally, VSI(Pi , Pj ) is computed as follows:
$$\mathrm{VSI}(P_i, P_j) = \frac{\sum_{x \in P_{i,j}} S(P_i, P_j)\,\max(VS_i, VS_j)}{\sum_{x \in P_{i,j}} \max(VS_i, VS_j)}. \tag{4}$$
There are $U = \binom{N_{VS}}{2}$ pairwise VSI patch similarities, stored in a vector $q_{VS} = [q_{VS}^{u=1}, \ldots, q_{VS}^{u=U}]$, where $q_{VS}^{u} = \mathrm{VSI}(P_i, P_j)$, and $\{i, j\}$ denote two compared patches ($i \neq j$, $\{i, j\} = 1, \ldots, N_{VS}$). Finally, $Q_{VS}$ is calculated as:
$$Q_{VS} = \frac{\max(q_{VS})}{\min(q_{VS})}. \tag{5}$$
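A minimal sketch of Eq. (5) is given below. It takes an arbitrary similarity function in place of VSI; the scalar "patches" and `toy_similarity` are hypothetical stand-ins used only to exercise the pooling.

```python
from itertools import combinations


def q_vs(patches, similarity):
    """Ratio of the largest to the smallest pairwise similarity (Eq. (5))."""
    scores = [similarity(p, q) for p, q in combinations(patches, 2)]
    if not scores:
        raise ValueError("at least two patches are needed")
    if min(scores) <= 0:
        raise ValueError("similarities must be positive for the ratio")
    return max(scores) / min(scores)


def toy_similarity(p, q):
    """Hypothetical stand-in for VSI on scalar 'patches'."""
    return 1.0 / (1.0 + abs(p - q))


toy_ratio = q_vs([0.0, 1.0, 3.0], toy_similarity)
```

A ratio close to 1 indicates that all patch pairs are similarly related, which agrees with the observation that saliency models of noisy patches resemble each other.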
3.3. Proposed NR technique The proposed approach is composed of two parts that refer to different concepts present in the HVS. The sensitivity of the HVS to small changes in local image neighbourhoods is reflected by QLF and the visual saliency is taken into account in QVS . The values of both components are normalised to [0, 1] range. They equally contribute to the introduced Quality Evaluator of Noisy Images (QENI), i.e., the predicted score for a noisy image is computed as the sum of QLF and QVS . Such simple pooling ensures that the resulted QENI has fewer parameters while achieving acceptable performance. Furthermore, it does not depend on any training dataset. The performances of various pooling methods are shown in Section 4.4. The illustration of the computational scheme of the proposed NR measure can be seen in Fig. 1. Fig. 2, in turn, shows exemplary images from CSIQ dataset [16] that are corrupted with AWGN and additive Gaussian pink noise (AGPN). For each image, differential mean opinion scores (DMOS), as well as objective opinion scores for QLF , QVS , and QENI, are presented. Pearson correlation coefficient (PCC) between DMOS for all noisy images in CSIQ and QLF , QVS , and QENI for AWGN is calculated. The methods obtain 0.9887, 0.9974, and 0.9926, respectively. For AGPN, in turn, PCC for QLF is 0.9993, for QVS 0.8959, and 1 for QENI. In practice, isolated distortions are rarely considered. Therefore, it is important to calculate correlations taking into account all distorted versions of an image. In the case of the image presented in Fig. 2, PCC for QLF is 0.8913, 0.7219 for QVS , and 0.863 for QENI. As illustrated in this example, QVS is better correlated with DMOS than QLF for AWGN and worse for AGPN. QENI obtains results closer to the higher value. However, taking into account all noisy images, QLF outperforms QVS and QENI is better correlated with DMOS than them both. 
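The equal-weight fusion can be sketched as follows. The text does not state how the components are normalised to [0, 1], so the min-max scaling and the calibration bounds used here are assumptions made only for illustration.

```python
def min_max(value, low, high):
    """Scale value to [0, 1]; low and high are assumed calibration bounds."""
    if high <= low:
        raise ValueError("high must exceed low")
    return min(1.0, max(0.0, (value - low) / (high - low)))


def qeni(q_lf, q_vs, lf_range, vs_range):
    """Sum of the two normalised components, each contributing equally."""
    return min_max(q_lf, *lf_range) + min_max(q_vs, *vs_range)


# Hypothetical raw scores and bounds; the result lies in [0, 2].
example_score = qeni(0.5, 1.5, (0.0, 1.0), (1.0, 2.0))
```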
To show that the usage of evenly distributed image patches instead of detected interest points improves the quality assessment of QLF , a version of QLF in which SURF technique performs interest point detection is used. Interestingly, SRCC
Fig. 2. Exemplary distorted images in CSIQ dataset for AWGN (a-e) and AGPN (f-j). DMOS, QLF , QVS , and QENI scores are presented under each image.
to DMOS for CSIQ calculated for QLF with NLF = 169 is 0.7053 and with 169 strongest interest points is 0.4187. Furthermore, in QLF , as described in Section 3.1, a grayscale image is replaced by its gradient map to improve the quality prediction performance. Here, SRCC for QLF in which the gradient map is not used is only 0.5250. These two findings justify the usage of image patches and the gradient map in QLF . The sensitivity of the introduced technique to changes of the standard deviation of AWGN (σ ) is shown using an exemplary image. That is, for each σ , the noise is introduced ten times and then the mean values for QENI and its contributing parts are reported in Fig. 3. It can be seen that QVS is better suited to assess the severity of AWGN than QLF . However, considering the better correlation with subjective scores of QLF , both techniques seem to complement each other. These examples show that QLF and QVS can also be used separately, since they seem to be good indicators of noisiness and they correlate well with subjective scores. These findings justify the development of QLF and QVS , as well as their joint usage as QENI technique. 4. Experimental results and discussions 4.1. Protocol and benchmark datasets The typical protocol for comparison of IQA measures involves image databases which contain reference images, their distorted versions, and subjective opinion scores obtained in tests with human subjects. Such scores are reported as MOS or DMOS. In order to measure consistency of the output of an IQA measure with subjective scores, the following four indices are used [38]: Spearman Rank order Correlation Coefficient (SRCC), Kendall Rank order Correlation Coefficient (KRCC), PCC, and Root Mean Square Error (RMSE). SRCC between objective scores Q and subjective scores S is calculated as:
$$\mathrm{SRCC}(Q, S) = 1 - \frac{6 \sum_{i=1}^{m} d_i^{2}}{m(m^{2} - 1)}, \tag{6}$$
where $d_i$ is the difference between the ranks of the i-th image in $Q$ and $S$, and $m$ is the total number of images in a dataset. KRCC uses the number of concordant and discordant pairs ($m_c$, $m_d$) in the dataset, as illustrated by Eq. (7).
$$\mathrm{KRCC}(Q, S) = \frac{m_c - m_d}{0.5\,m(m - 1)}. \tag{7}$$
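Both rank correlations can be sketched directly from Eqs. (6) and (7). The sketch below assumes that the scores contain no ties, as the closed form of Eq. (6) does; the function names are illustrative.

```python
from itertools import combinations


def ranks(values):
    """Rank positions 1..m of the values; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def srcc(q, s):
    """Spearman rank order correlation coefficient (Eq. (6))."""
    m = len(q)
    d2 = sum((rq - rs) ** 2 for rq, rs in zip(ranks(q), ranks(s)))
    return 1.0 - 6.0 * d2 / (m * (m * m - 1))


def krcc(q, s):
    """Kendall rank order correlation coefficient (Eq. (7))."""
    m = len(q)
    concordant = discordant = 0
    for i, j in combinations(range(m), 2):
        sign = (q[i] - q[j]) * (s[i] - s[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (0.5 * m * (m - 1))
```

Both functions need at least two images, since the denominators vanish for m = 1.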
PCC and RMSE, in turn, use a vector Qp obtained after a nonlinear mapping between Q and S:
$$Q_p = \beta_1 \left( \frac{1}{2} - \frac{1}{1 + \exp(\beta_2 (Q - \beta_3))} \right) + \beta_4 Q + \beta_5, \tag{8}$$
Fig. 3. Sensitivity of QLF , QVS , and QENI to AWGN levels.
where β = [β1 , β2 , . . . , β5 ] are parameters of the regression model to be fitted [38]. Then, PCC is calculated as:
$$\mathrm{PCC}(Q_p, S) = \frac{\bar{Q}_p^{T} \bar{S}}{\sqrt{\bar{Q}_p^{T} \bar{Q}_p \, \bar{S}^{T} \bar{S}}}, \tag{9}$$
where $\bar{Q}_p$ and $\bar{S}$ denote mean-removed vectors. Eq. (10) presents the calculation of RMSE.
$$\mathrm{RMSE}(Q_p, S) = \sqrt{\frac{(Q_p - S)^{T} (Q_p - S)}{m}}. \tag{10}$$
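Eqs. (8)-(10) can be sketched as follows. The parameters of the mapping are taken as given here; fitting them to a dataset requires a nonlinear least-squares routine, which is outside this sketch. Function names are illustrative.

```python
from math import exp, sqrt


def logistic_map(q, beta):
    """Five-parameter mapping of Eq. (8); beta = (b1, b2, b3, b4, b5)."""
    b1, b2, b3, b4, b5 = beta
    return [b1 * (0.5 - 1.0 / (1.0 + exp(b2 * (x - b3)))) + b4 * x + b5
            for x in q]


def pcc(qp, s):
    """Pearson correlation of mean-removed vectors (Eq. (9))."""
    mean_q = sum(qp) / len(qp)
    mean_s = sum(s) / len(s)
    qc = [x - mean_q for x in qp]
    sc = [y - mean_s for y in s]
    numerator = sum(x * y for x, y in zip(qc, sc))
    denominator = sqrt(sum(x * x for x in qc) * sum(y * y for y in sc))
    return numerator / denominator


def rmse(qp, s):
    """Root mean square error between mapped and subjective scores (Eq. (10))."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(qp, s)) / len(s))
```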
The better IQA measure has smaller RMSE values and higher values of SRCC, KRCC, and PCC than other measures. These four indices are calculated considering all images in a given dataset. In most practical applications of NR-IQA measures, they are used to compare distorted versions of an image produced by different algorithms. This applies to a wide range of approaches, including image denoising, enhancement, or restoration algorithms, in which tests with human subjects are replaced by IQA measures [4,20,42,44]. Furthermore, subjective scores for large image datasets designed for IQA purposes are obtained in tests which include a pairwise comparison of distorted versions of an image [32,33]. Therefore, in this paper, a new protocol for evaluation of IQA algorithms is proposed. In this image-oriented protocol, SRCC, KRCC, PCC, and RMSE values are obtained for subsets that contain all distorted versions of the image. Then, their mean values are reported. In sections with an experimental evaluation of QENI and other methods, both protocols are used. Furthermore, in order to evaluate the statistical significance of results obtained by compared techniques, hypothesis tests based on the prediction residuals of each measure after the non-linear mapping are conducted using the left-tailed F-test [16,25]. In the test, smaller residual variance denotes the better prediction. In a pair of compared measures, a measure which is statistically better on a given dataset than the other measure with a confidence greater than 95% obtains a score
Table 1 IQA benchmark image datasets.
Benchmark   No. of ref. images   No. of dist. images   No. of distortions   No. of used noisy images
TID2013     25                   3000                  24                   1125
TID2008     25                   1700                  17                   700
CSIQ        30                   866                   6                    300
“1”. Consequently, scores “0” or “-1” are assigned to the indistinguishable, or worse measure, respectively. Finally, a table with scores is presented. Three large popular IQA benchmark datasets are used to evaluate the introduced QENI technique: TID2013 [32], TID2008 [33], and CSIQ [16]. Table 1 presents the number of reference images and distortion levels for each dataset. However, only a subset of these images is affected by noise. Therefore, only several distortion types are taken into account in this study. Specifically, the following distortions are used for TID2013 dataset: AWGN, additive noise in colour components that is more intensive than additive noise in the luminance component, spatially correlated noise, masked noise, high-frequency noise, impulse noise, non-eccentricity pattern noise, multiplicative Gaussian noise (MGN), and comfort noise (CN). There are 1125 images used in experiments. TID2008 contains 700 distorted images that share distortion types with TID2013, except MGN and CN. CSIQ dataset, in turn, contains images distorted with AWGN and AGPN (300 images in total). Since some compared state-of-the-art NR-IQA measures are trained on LIVE [41] dataset, it is only used in the approach to an automatic denoising algorithm selection, as shown in Section 5.
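The image-oriented protocol described earlier in this section can be sketched for SRCC as follows: scores are grouped by reference image, the coefficient is computed within each group, and the mean over groups is reported. The record layout and function names are hypothetical, and the sketch assumes tie-free scores and at least two distorted versions per reference image.

```python
from collections import defaultdict
from statistics import mean


def _ranks(values):
    """Rank positions 1..m of the values; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for position, index in enumerate(order, start=1):
        out[index] = position
    return out


def srcc(q, s):
    """Spearman rank order correlation coefficient, as in Eq. (6)."""
    m = len(q)
    d2 = sum((a - b) ** 2 for a, b in zip(_ranks(q), _ranks(s)))
    return 1.0 - 6.0 * d2 / (m * (m * m - 1))


def image_oriented_srcc(records):
    """Mean per-image SRCC.

    records: iterable of (reference_id, objective_score, subjective_score).
    """
    groups = defaultdict(list)
    for reference_id, q, s in records:
        groups[reference_id].append((q, s))
    per_image = []
    for pairs in groups.values():
        q, s = zip(*pairs)
        per_image.append(srcc(q, s))
    return mean(per_image)
```

The same grouping applies to KRCC, PCC, and RMSE, with the per-image values averaged in the same way.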
4.2. Implementation details

There are two key parameters in QENI: the number of patches described with local features, NLF, and the number of obtained VS models, NVS. In order to demonstrate the sensitivity of QENI to variations of these two parameters, an experiment with images from the CSIQ dataset is performed. The range of NLF in the experiment is from 1 to 400 and the range of NVS is from 1 to 25. Due to the long computation time of VSI, NVS is much smaller than NLF. The implementation of QVS employs a look-up table in which saliency models of compared patches within an image are stored to avoid redundant and time-consuming object creation caused by external execution of the VSI code. The results, in terms of SRCC, for the standard (dataset-oriented) protocol are presented in Fig. 4. They show that SRCC for QLF is almost unchanged once NLF exceeds 100, and that the performance of QVS is maximal with 25 patches. The larger number of image patches in QLF can be explained by the need to capture local image variations with this technique. In the experiments presented in the subsequent parts of this work, QENI parameters are set to NLF = 169 and NVS = 9. They provide acceptable results in reasonably short computation time.
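The look-up table mentioned above can be sketched as a simple cache keyed by patch index. This is an illustrative assumption about its structure, not the published implementation: `build_model` and `compare` are hypothetical placeholders for saliency model creation and the adapted VSI comparison, and the toy versions below only stand in for them. With NVS = 9 patches there are 36 pairs, but each model is built only once.

```python
from itertools import combinations


def pairwise_scores(patches, build_model, compare):
    """Compare every unordered pair of patches, building each model once."""
    cache = {}  # look-up table: patch index -> model

    def model(index):
        if index not in cache:
            cache[index] = build_model(patches[index])
        return cache[index]

    return [compare(model(i), model(j))
            for i, j in combinations(range(len(patches)), 2)]


calls = []  # records how often a model is actually built


def toy_build_model(patch):
    """Hypothetical stand-in for an expensive saliency model (here: the mean)."""
    calls.append(patch)
    return sum(patch) / len(patch)


def toy_compare(model_a, model_b):
    """Hypothetical stand-in for a similarity index in (0, 1]."""
    return 1.0 / (1.0 + abs(model_a - model_b))


patches = [[i, i + 1, i + 2] for i in range(9)]  # NVS = 9 toy patches
scores = pairwise_scores(patches, toy_build_model, toy_compare)
```

Without the cache, the 36 comparisons would trigger 72 model constructions instead of 9.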
4.3. Performance evaluation

There are only a few IQA measures with accessible source code that are designed for the perceptual assessment of noisy images. Therefore, QENI is compared with the recently introduced technique developed by Kong et al. [15] and the method designed by Gabarda and Cristobal [9]. It is worth noticing that the first method selects which of two degraded versions of an image is better, which makes it closer to FR-IQA measures. General-purpose NR measures are often employed to assess images with a variety of distortion types, including noise of different origin. Thus, the following state-of-the-art measures are compared with QENI: BRISQUE, IL-NIQE, HOSA, and OG-IQA. Interestingly, IL-NIQE is reported [48] to outperform BIQI, BRISQUE, BLIINDS2, DIIVINE, CORNIA, NIQE, and QAC. HOSA, in turn, outperforms all of them and IL-NIQE, as shown in [43]. OG-IQA is added to the comparison, since it was not compared with HOSA and IL-NIQE [23]; BRISQUE is used due to its popularity. S-index [17] is applied as a recently introduced sharpness measure. BRISQUE, HOSA, and OG-IQA are opinion-aware, learning-based approaches [48]. In total, QENI is compared with seven state-of-the-art NR measures. The QENI components (QLF and QVS) are also added to the evaluation to show that they can be used separately. In practice, the distortion type and its severity are not known in advance. Therefore, in order to ensure a fair comparison between opinion-aware and blind NR measures, the implementations of learning-based methods which are pre-trained by their authors on external datasets are used. All methods run with default parameter values. Table 2 summarises the experimental results obtained with the dataset-oriented protocol, while the results for the proposed image-oriented protocol are shown in Table 3. In both tables, the best two results are marked in boldface.
As shown in Table 2, QENI outperforms other measures on the CSIQ database by a large margin and is listed among the two best techniques on the TID2013 and TID2008 image datasets. The approach proposed in [15] and OG-IQA also perform well on TID2008. IL-NIQE obtained the best results on TID2013, followed by QENI and the measure of Kong et al. [15]. The overall results show that the proposed method outperforms other approaches in terms of SRCC and KRCC, and is among the two best measures in terms of PCC and RMSE. The image-oriented protocol reveals that QENI performs better than
Fig. 4. Influence of parameters NLF and NVS on SRCC obtained for noisy images in the CSIQ dataset.
other techniques, as shown in Table 3. Here, the proposed measure obtains the best performances in all tests. This means that QENI is able to better assess distorted versions of an image than other techniques. It is observed from Tables 2 and 3 that QLF is often listed among the best two compared measures, with performance better than IL-NIQE on the CSIQ and TID2008 datasets, according to the dataset-oriented evaluation protocol, and with performance close to QENI in tests for the image-oriented protocol. The advantages of QVS, in turn, are more visible for the image-oriented protocol. Specifically, it is only outperformed by QENI and QLF. The statistical significance of the results obtained by the compared techniques is evaluated using the left-tailed F-test (see Section 4.1). In the test, a measure obtains “1”, “0”, or “-1” if it is significantly better than, indistinguishable from, or worse than the second measure in the pair. To save space, the summary of statistical significance tests for both protocols is shown in Table 4 as sums of scores obtained for all image datasets. Tests are performed between measures in rows against measures in columns. The last column contains the sum of obtained scores, indicating the best measure overall. It can be seen that measures compared using the first protocol are worse than QENI. Furthermore, significance tests with the image-oriented protocol show that the introduced technique is significantly better than other measures for all datasets. QENI is also better than QLF and QVS, and its components outperform other measures according to the image-oriented protocol. For the dataset-oriented protocol, QLF is worse than QENI, and QVS is outperformed by QENI, QLF, and IL-NIQE. The statistical significance tests confirm previously obtained findings.
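The scoring rule of the significance test can be sketched as follows. This is a minimal stdlib illustration under stated assumptions, not the code used in the paper: the residuals are taken as already computed after the non-linear mapping, the F statistic is the ratio of sample variances with n − 1 degrees of freedom each, and the F distribution function is evaluated through the regularized incomplete beta function with a standard continued-fraction expansion. All function names are hypothetical.

```python
import math
import statistics


def _beta_cf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (
            m * (b - m) * x / ((qam + m2) * (a + m2)),
            -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2)),
        ):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def f_cdf(f, dfn, dfd):
    """P(F <= f) for an F distribution with (dfn, dfd) degrees of freedom."""
    if f <= 0.0:
        return 0.0
    x = dfn * f / (dfn * f + dfd)
    if x >= 1.0:
        return 1.0
    a, b = dfn / 2.0, dfd / 2.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def left_tailed_f_test(res_a, res_b, alpha=0.05):
    """True if the residual variance of A is significantly smaller than B's."""
    f = statistics.variance(res_a) / statistics.variance(res_b)
    return f_cdf(f, len(res_a) - 1, len(res_b) - 1) < alpha


def significance_score(res_a, res_b, alpha=0.05):
    """Return 1, 0, or -1 for better, indistinguishable, or worse."""
    if left_tailed_f_test(res_a, res_b, alpha):
        return 1
    if left_tailed_f_test(res_b, res_a, alpha):
        return -1
    return 0


# Hypothetical residuals of two measures on one dataset.
good = [0.1 * (-1) ** i for i in range(50)]
poor = [0.5 * (-1) ** i for i in range(50)]
score_good_vs_poor = significance_score(good, poor)
score_poor_vs_good = significance_score(poor, good)
score_self = significance_score(good, good)
```

Summing such scores over the three benchmark datasets gives entries in the range from −3 to 3, as in Table 4.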
4.4. Evaluation of pooling methods

There are many ways in which QLF and QVS can be combined. In order to evaluate the impact of such methods on the results obtained by QENI, they are described as linear or nonlinear regression models. Then, a least-squares fit of MOS or
Table 2
Evaluation on IQA datasets using the dataset-oriented protocol.

Metric   IL-NIQE  OG-IQA   HOSA     BRISQUE  [9]      S-index  [15]     QENI     QLF      QVS
TID2013, 1125 images
SRCC     0.5201   0.0854   0.3845   0.1729   0.0098   0.3635   0.4091   0.4922   0.4551   0.2454
KRCC     0.3704   0.0752   0.2688   0.1305   0.0060   0.2474   0.2774   0.3540   0.3225   0.1670
PCC      0.5856   0.3426   0.3585   0.3065   0.0243   0.3397   0.3599   0.3422   0.3723   0.1468
RMSE     0.7679   0.8900   0.8843   0.9018   0.9470   0.8910   0.8838   0.8901   0.8792   0.9370
TID2008, 700 images
SRCC     0.3539   0.0347   0.2283   0.0386   0.0037   0.3295   0.4341   0.5322   0.4747   0.3245
KRCC     0.2480   0.0049   0.1702   0.0443   0.0024   0.2238   0.2993   0.3859   0.3345   0.2246
PCC      0.3816   0.3985   0.2551   0.3409   0.0458   0.3119   0.3943   0.3353   0.3589   0.1720
RMSE     0.8769   0.8701   0.9173   0.8918   0.9477   0.9013   0.8718   0.8938   0.8855   0.9345
CSIQ, 300 images
SRCC     0.4598   0.1352   0.1036   0.1461   0.0916   0.3793   0.3899   0.6964   0.7053   0.3899
KRCC     0.3104   0.1163   0.1010   0.1384   0.0623   0.2520   0.2630   0.4925   0.5116   0.2630
PCC      0.4418   0.1622   0.1598   0.1894   0.1066   0.3694   0.2860   0.6727   0.6618   0.4006
RMSE     0.1904   0.2094   0.2095   0.2084   0.2110   0.1972   0.2034   0.1570   0.1591   0.1945
Overall weighted
SRCC     0.4568   0.0757   0.2934   0.1249   0.0193   0.3545   0.3992   0.5342   0.4969   0.2919
KRCC     0.3216   0.0578   0.2126   0.1032   0.0128   0.2403   0.2727   0.3841   0.3531   0.1995
PCC      0.4981   0.3355   0.2964   0.3013   0.0430   0.3347   0.3608   0.3866   0.4088   0.1909
RMSE     0.7223   0.7874   0.7999   0.8006   0.8433   0.7964   0.7838   0.7878   0.7796   0.8314
Overall average
SRCC     0.4446   0.0851   0.2388   0.1192   0.0350   0.3574   0.3747   0.5736   0.5450   0.3199
KRCC     0.3096   0.0655   0.1800   0.1044   0.0236   0.2411   0.2565   0.4108   0.3895   0.2182
PCC      0.4697   0.3011   0.2578   0.2789   0.0589   0.3403   0.3467   0.4501   0.4643   0.2398
RMSE     0.6117   0.6565   0.6704   0.6673   0.7019   0.6632   0.6530   0.6470   0.6413   0.6887
Table 3
Evaluation on IQA datasets using the image-oriented protocol.

Metric   IL-NIQE  OG-IQA   HOSA     BRISQUE  [9]      S-index  [15]     QENI     QLF      QVS
TID2013, 1125 images
SRCC     0.5324   0.1073   0.4043   0.1830   0.1521   0.5001   0.4752   0.6586   0.6020   0.4843
KRCC     0.4030   0.0999   0.3002   0.1495   0.1185   0.3611   0.3396   0.5189   0.4637   0.3617
PCC      0.6335   0.4343   0.4264   0.4624   0.2322   0.5221   0.5323   0.7091   0.6277   0.5277
RMSE     0.7170   0.8372   0.8445   0.8235   0.9039   0.7913   0.7776   0.6495   0.7122   0.7819
TID2008, 700 images
SRCC     0.3634   0.1018   0.2432   0.0864   0.1902   0.3955   0.5283   0.6633   0.5796   0.5698
KRCC     0.2722   0.0645   0.2006   0.0738   0.1370   0.2890   0.3991   0.5384   0.4481   0.4312
PCC      0.4128   0.4553   0.2962   0.4944   0.2433   0.4363   0.5244   0.6785   0.5853   0.6089
RMSE     0.8532   0.8317   0.8944   0.7970   0.9026   0.8368   0.7667   0.6732   0.7422   0.7321
CSIQ, 300 images
SRCC     0.4920   0.2160   0.2076   0.2733   0.2809   0.6289   0.4185   0.8704   0.8964   0.7993
KRCC     0.3818   0.1787   0.1536   0.2160   0.2382   0.4769   0.3344   0.7246   0.7676   0.6445
PCC      0.4445   0.1797   0.2747   0.2712   0.2305   0.6593   0.4398   0.8770   0.8994   0.7738
RMSE     0.1825   0.2016   0.1933   0.1949   0.1979   0.1534   0.1779   0.0965   0.0875   0.1234
Overall weighted
SRCC     0.4710   0.1208   0.3235   0.1639   0.1828   0.4838   0.4847   0.6900   0.6362   0.5569
KRCC     0.3569   0.0994   0.2467   0.1340   0.1415   0.3537   0.3585   0.5544   0.5015   0.4245
PCC      0.5341   0.4053   0.3621   0.4459   0.2356   0.5132   0.5166   0.7227   0.6521   0.5892
RMSE     0.6864   0.7457   0.7690   0.7260   0.8038   0.7162   0.6893   0.5792   0.6339   0.6725
Overall average
SRCC     0.4626   0.1417   0.2850   0.1809   0.2077   0.5082   0.4740   0.7308   0.6927   0.6178
KRCC     0.3523   0.1144   0.2181   0.1464   0.1646   0.3757   0.3577   0.5940   0.5598   0.4791
PCC      0.4969   0.3564   0.3324   0.4093   0.2353   0.5392   0.4988   0.7549   0.7041   0.6368
RMSE     0.5842   0.6235   0.6441   0.6051   0.6681   0.5938   0.5741   0.4731   0.5140   0.5458
DMOS values from a training dataset to the corresponding objective scores (QLF, QVS) is used to determine their coefficients (parameters) [31]. Finally, the models are assessed using their predicted scores provided for a testing dataset. The following pooling methods are compared: weighted sum, weighted product, weighted mean, weighted harmonic mean, product,
Table 4
Summary of statistical significance tests on CSIQ, TID2008 and TID2013 datasets.

         IL-NIQE  OG-IQA   HOSA     BRISQUE  [9]      S-index  [15]     QENI     QLF      QVS      Sum
Dataset-oriented protocol
IL-NIQE  0        0        0        0        1        0        0        −1       0        1        1
OG-IQA   0        0        0        0        1        0        0        −1       −1       0        −1
HOSA     0        0        0        0        0        0        −1       −1       −1       0        −3
BRISQUE  0        0        0        0        0        0        0        −1       −1       0        −2
[9]      −1       −1       0        0        0        0        −2       −1       −2       0        −7
S-index  0        0        0        0        0        0        0        −1       −1       0        −2
[15]     0        0        1        0        2        0        0        −1       −1       0        1
QENI     1        1        1        1        1        1        1        0        0        1        8
QLF      0        1        1        1        2        1        1        0        0        2        9
QVS      −1       0        0        0        0        0        0        −1       −2       0        −4
Image-oriented protocol
IL-NIQE  0        2        1        0        3        0        0        −3       −2       −1       0
OG-IQA   −2       0        1        0        2        −2       −2       −3       −3       −3       −12
HOSA     −1       −1       0        −1       1        −2       −2       −3       −3       −3       −15
BRISQUE  0        0        1        0        2        0        −1       −3       −3       −3       −7
[9]      −3       −2       −1       −2       0        −3       −3       −3       −3       −3       −23
S-index  0        2        2        0        3        0        0        −3       −3       −2       −1
[15]     0        2        2        1        3        0        0        −3       −2       −2       1
QENI     3        3        3        3        3        3        3        0        1        3        25
QLF      2        3        3        3        3        3        2        −1       0        2        20
QVS      1        3        3        3        3        2        2        −3       −2       0        12
Table 5
Comparison of pooling methods.

Pooling method               Training dataset   Testing dataset: CSIQ   Testing dataset: TID2013
Weighted sum (Q1)            CSIQ               –                       0.5150
                             TID2013            0.5345                  –
Weighted product (Q2)        CSIQ               –                       0.5382
                             TID2013            0.4384                  –
Weighted mean (Q3)           CSIQ               –                       0.5201
                             TID2013            0.3761                  –
Weighted harmonic mean (Q4)  CSIQ               –                       0.5323
                             TID2013            0.6685                  –
Product (Q5)                 –                  0.6647                  0.4631
Sum (Q6, QENI)               –                  0.6964                  0.4922

Note: SRCC values are reported. The results for the pooling used in QENI are written in bold.
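and sum. They are expressed as Q1, Q2, ..., Q6 in Eq. (11), where QENI = Q6, and δi, γi (i = 1, ..., 4) are parameters of these methods.

\begin{equation}
\begin{aligned}
Q_1 &= \delta_1 Q_{LF} + \gamma_1 Q_{VS}, \\
Q_2 &= Q_{LF}^{\delta_2} \times Q_{VS}^{\gamma_2}, \\
Q_3 &= \frac{\delta_3 Q_{LF} + \gamma_3 Q_{VS}}{\delta_3 + \gamma_3}, \\
Q_4 &= \frac{\delta_4 + \gamma_4}{\dfrac{\delta_4}{Q_{LF}} + \dfrac{\gamma_4}{Q_{VS}}}, \\
Q_5 &= Q_{LF} \times Q_{VS}, \\
Q_6 &= Q_{LF} + Q_{VS}.
\end{aligned}
\tag{11}
\end{equation}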
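The six pooling methods of Eq. (11) can be written down directly. The sketch below is illustrative only: the function names are hypothetical, and the fitting routine covers just the weighted sum Q1, assuming a least-squares fit without an intercept term that is solved through the 2×2 normal equations. The nonlinear models (Q2, Q4) would need an iterative optimiser, which is not shown.

```python
def q1(qlf, qvs, delta, gamma):
    """Weighted sum."""
    return delta * qlf + gamma * qvs


def q2(qlf, qvs, delta, gamma):
    """Weighted product."""
    return qlf ** delta * qvs ** gamma


def q3(qlf, qvs, delta, gamma):
    """Weighted mean."""
    return (delta * qlf + gamma * qvs) / (delta + gamma)


def q4(qlf, qvs, delta, gamma):
    """Weighted harmonic mean."""
    return (delta + gamma) / (delta / qlf + gamma / qvs)


def q5(qlf, qvs):
    """Product."""
    return qlf * qvs


def q6(qlf, qvs):
    """Sum (the pooling used in QENI)."""
    return qlf + qvs


def fit_weighted_sum(qlf_scores, qvs_scores, mos):
    """Least-squares fit of mos ~ delta*QLF + gamma*QVS (no intercept assumed)."""
    a11 = sum(x * x for x in qlf_scores)
    a12 = sum(x * y for x, y in zip(qlf_scores, qvs_scores))
    a22 = sum(y * y for y in qvs_scores)
    b1 = sum(x * m for x, m in zip(qlf_scores, mos))
    b2 = sum(y * m for y, m in zip(qvs_scores, mos))
    det = a11 * a22 - a12 * a12  # zero if the two score lists are collinear
    delta = (b1 * a22 - a12 * b2) / det
    gamma = (a11 * b2 - a12 * b1) / det
    return delta, gamma


# Hypothetical training scores; the targets are built with delta = 2, gamma = 3.
train_qlf = [0.2, 0.5, 0.9, 0.4]
train_qvs = [0.7, 0.1, 0.3, 0.8]
train_mos = [q1(x, y, 2.0, 3.0) for x, y in zip(train_qlf, train_qvs)]
fitted_delta, fitted_gamma = fit_weighted_sum(train_qlf, train_qvs, train_mos)
```

Q5 and Q6 have no parameters, so they need no training data.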
The best pooling method should demonstrate good performance for a dataset which is not used for training. Therefore, parameters of pooling methods are determined using one dataset and tested on another. The CSIQ and TID2013 datasets are used for training the regression models of pooling methods and their evaluation. The cross-dataset results, in terms of SRCC, are shown in Table 5. The product and the sum of QLF and QVS are also evaluated and presented. The cross-dataset results reveal the lack of generalisation capabilities of more complex methods, which obtained much worse SRCC values on CSIQ in comparison to the methods with the equal contribution of components. Here, the models try to express the diversity of distortion types present in TID2013, leading to the decreased performance on CSIQ with two distortion types. In general, the weighted harmonic mean and the sum are better than other pooling methods. Since in practical applications a suitable training dataset cannot be easily determined, it is advisable to use a method
Table 6
Computational complexity and run-time comparison of evaluated IQA measures.

Measure    Computational complexity   Run-time (s)
IL-NIQE    O(K(d^2 + h + gh))         22.196
OG-IQA     O(Kd)                      10.184
HOSA       O(Kd^2 c)                  0.973
BRISQUE    O(Kd^2)                    0.871
[9]        O(Kwo)                     5.515
S-index    O(d log(d))                0.476
[15]       O(Kd^2)                    0.254
QENI       O(NLF^2 + NVS^2 d^2)       0.764
QLF        O(NLF^2)                   0.163
QVS        O(NVS^2 d^2)               0.624

Note: K: number of pixels, d: patch size, h: filter size, g: log-Gabor filter size, c: codebook size, w: window size, and o: number of orientations.
which produces acceptable results and does not require training using subjective scores. Such results can be obtained with the sum of QLF and QVS, and therefore it is used as the pooling technique in QENI.

4.5. Computational complexity

The computational complexity of the proposed NR measure depends on the complexities of its two parts, which predict the perceptual quality using local features and visual saliency models, respectively. In the first part, an image patch is described using the SURF technique, which is computed in constant time for a patch. Then, pairwise distance computations between 128-dimensional floating-point vectors are performed, leading to the quadratic dependence of this step on NLF. Therefore, the computational complexity of QLF is of the order of O(NLF^2). In the second part of QENI, QVS, visual saliency maps and other supporting objects are created for each pair of image patches. Then, they are compared using the VSI metric. Since the computation time of QVS depends on the size (d) of the compared patches and there are C(NVS, 2) = NVS(NVS − 1)/2 pairs of patches, its computational complexity is O(NVS^2 d^2). Table 6 contains the complexity and run-time comparison of the evaluated IQA measures. The run-time is measured in seconds as the average time taken to assess an image in the CSIQ dataset. All presented experiments were run on an Intel Core i5-5200U 2.2 GHz CPU with 8 GB RAM, Microsoft Windows 7, and Matlab R2012a. The run-time of QLF is shorter than the time obtained by the approach of Kong et al. [15]. However, due to the longer computation time of saliency models in QVS, QENI can be placed between BRISQUE and S-index. S-index and the method introduced in [15] are less computationally demanding, but they show inferior performance in comparison to QENI. Taking into account the complexities of the best performing NR measures, the implementation of QENI is 13.3 times faster than OG-IQA and 29 times faster than IL-NIQE.
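The quadratic terms and the reported speed-ups can be checked with a few lines of arithmetic. The sketch assumes that every unordered pair of patches is compared in both parts and uses the parameter values from Section 4.2 and the run-times from Table 6; the variable names are illustrative.

```python
import math

N_LF, N_VS = 169, 9  # parameter values used in the experiments

# Number of unordered pairs, C(N, 2) = N(N - 1) / 2, grows quadratically in N.
lf_pairs = math.comb(N_LF, 2)
vs_pairs = math.comb(N_VS, 2)

# Average run-times in seconds per CSIQ image, as listed in Table 6.
runtime = {"QENI": 0.764, "OG-IQA": 10.184, "IL-NIQE": 22.196}
speedup_vs_og_iqa = runtime["OG-IQA"] / runtime["QENI"]
speedup_vs_il_niqe = runtime["IL-NIQE"] / runtime["QENI"]
```

With these values there are 14196 descriptor pairs but only 36 saliency model pairs, which is consistent with keeping NVS much smaller than NLF.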
Furthermore, most of the computation time (80%) for QENI is spent on the description of image patches, which can be performed independently. This means that in a production environment the image assessment with QENI can be accelerated. Since the introduced measure can be used, for example, for an automatic selection of the best denoising algorithm, as shown in Section 5, it is worth noticing that its computation time is also much shorter than the time spent on denoising.

5. Application of QENI to automatic selection of denoising algorithm

The proposed measure addresses the problem of the perceptual quality prediction of noisy images. Assuming that the state-of-the-art denoising techniques are not always able to successfully restore an image due to their distortion-specific design, QENI could be used in this problem as an approach to aggregate several such restoration algorithms. In general, the joint usage of denoising algorithms should be able to provide better results than can be achieved with a single-input technique. However, this requires an IQA measure that selects the best version of the restored image. In order to evaluate the suitability of the compared IQA measures for such an application, a dataset of restored images has been prepared. At first, 59 pristine images from the LIVE and CSIQ datasets are distorted by introducing three types of noise: Gaussian noise with the standard deviation σ = 50, Gaussian noise mixed with salt-and-pepper noise (σ = 50, the density of the second noise is 5%), and speckle noise. The standard deviation of the multiplying factor in speckle noise is 0.02. The following ten state-of-the-art denoising algorithms are applied to the resulting dataset of 177 noisy images: PCLR [4], PNLM [42], ROF [34], Wiener [21], Median [21], NLM [6], IMTG [39], BM3D [5], PGPD [44], and WESNR [13]. An exemplary pristine image, its corrupted derivatives, as well as restoration results obtained by three denoising algorithms, are shown in Fig. 5.
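The three noise types can be sketched as below. This is an illustration under explicit assumptions rather than the generation code used for the dataset: pixels are taken as grey levels in the 0–255 range (so σ = 50 is on that scale), results are clipped to that range, salt and pepper are equally likely, and speckle is modelled as p · (1 + n) with n drawn from a zero-mean Gaussian. The function names are hypothetical.

```python
import random


def clip(value):
    """Keep a grey level inside the assumed 0-255 range."""
    return max(0.0, min(255.0, value))


def add_gaussian(pixels, sigma, rng):
    """Additive zero-mean Gaussian noise with standard deviation sigma."""
    return [clip(p + rng.gauss(0.0, sigma)) for p in pixels]


def add_salt_and_pepper(pixels, density, rng):
    """Replace a fraction `density` of pixels by 0 or 255 with equal odds."""
    out = []
    for p in pixels:
        u = rng.random()
        if u < density / 2.0:
            out.append(0.0)
        elif u < density:
            out.append(255.0)
        else:
            out.append(p)
    return out


def add_speckle(pixels, factor_std, rng):
    """Multiplicative noise: p * (1 + n), n ~ N(0, factor_std^2) (assumed model)."""
    return [clip(p * (1.0 + rng.gauss(0.0, factor_std))) for p in pixels]


rng = random.Random(0)
flat = [128.0] * 10000  # toy constant image stored as a flat list

gaussian = add_gaussian(flat, 50.0, rng)
mixed = add_salt_and_pepper(add_gaussian(flat, 50.0, rng), 0.05, rng)
speckle = add_speckle(flat, 0.02, rng)
impulses_only = add_salt_and_pepper(flat, 0.05, rng)

n_noisy = 59 * 3          # three noisy versions per pristine image
n_restored = n_noisy * 10  # ten denoising algorithms per noisy image
```

The last two lines reproduce the dataset sizes quoted in the text: 177 noisy and 1770 restored images.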
Finally, 1770 restored images are obtained. The size of this dataset exceeds the size of the subset of noisy images in the IQA benchmarks used and is larger than TID2008 or CSIQ (see Table 1). In order to compare the performance of the evaluated IQA measures on this dataset, the FR-IQA SFF measure [2] is used as a proxy for subjective scores. A similar approach to the creation of a benchmark for six denoising algorithms can be found in [20]. In contrast, in this work, more denoising algorithms are applied and the noise level is more severe to make the denoising problem more difficult. It is worth noticing
Fig. 5. Exemplary image (a), its three distorted versions (b-d), and images restored using three denoising algorithms: BM3D, PCLR, and PGPD (e).
Table 7
Evaluation of quality prediction performance on the dataset of 1770 denoised images.

           Dataset-oriented protocol           Image-oriented protocol
Measure    SRCC     KRCC     PCC      RMSE     SRCC     KRCC     PCC      RMSE
IL-NIQE    0.5358   0.3594   0.3746   0.0390   0.5641   0.3940   0.5277   0.0321
OG-IQA     0.0111   0.0068   0.3543   0.0394   0.1433   0.1057   0.5343   0.0308
HOSA       0.0812   0.0569   0.1150   0.0418   0.1225   0.0853   0.2855   0.0363
BRISQUE    0.1094   0.0738   0.0881   0.0419   0.1225   0.0800   0.3802   0.0347
[9]        0.2237   0.1501   0.1880   0.0413   0.3054   0.2220   0.5133   0.0325
S-index    0.5066   0.3391   0.3789   0.0390   0.5330   0.3634   0.4732   0.0336
[15]       0.3053   0.2160   0.2992   0.0402   0.4009   0.2767   0.5472   0.0310
QENI       0.5835   0.3967   0.4197   0.0382   0.5904   0.3997   0.5474   0.0317
QLF        0.5443   0.3616   0.3379   0.0396   0.5418   0.3597   0.4822   0.0328
QVS        0.5792   0.4007   0.4042   0.0385   0.6704   0.4960   0.6389   0.0290
that SFF used in this work is significantly better correlated with the human assessment on the LIVE and CSIQ datasets than the popular PSNR, SSIM, or VSI, which are extensively used to assess processed images [2,31]. This justifies its usage in the presented experiments. The evaluation of quality prediction of the compared IQA measures on the dataset of restored images is performed in terms of SRCC, KRCC, PCC, and RMSE. In these tests, both dataset- and image-oriented protocols are employed. The results are shown in Table 7. In the table, the best two results are written in bold. As shown, QENI outperforms other techniques in the dataset-oriented protocol and is the second best measure in the image-oriented protocol, where QVS obtained better results. In both protocols, QLF obtained better or slightly worse results than IL-NIQE. Since QENI demonstrates its superiority on this dataset, it is worth examining its performance in the automatic selection of the denoising algorithm, or in other words, in the selection of the best restored image. In this test, SFF scores are assigned
Table 8
SRCC values for IQA algorithms or denoising techniques for three noise types.

                 Gaussian   Mixed    Speckle   Overall
Denoising techniques
PCLR [4]         0.9663     0.9194   0.7596    0.8818
PNLM [42]        0.8922     0.8562   0.7669    0.8384
ROF [34]         0.5652     0.6384   0.5180    0.5739
Wiener [21]      0.5720     0.6669   0.7018    0.6469
Median [21]      0.4387     0.5477   0.8702    0.6189
NLM [6]          0.3829     0.5325   0.9051    0.6068
IMTG [39]        0.2887     0.4588   0.9235    0.5570
BM3D [5]         0.9917     0.9208   0.6503    0.8543
PGPD [44]        0.9999     0.8809   0.7615    0.8808
WESNR [13]       0.4461     0.2620   0.7466    0.4849
IQA measures
IL-NIQE          0.9027     0.9192   0.6112    0.8110
OG-IQA           0.5264     0.2601   0.8070    0.5312
HOSA             0.3974     0.1830   0.7779    0.4528
BRISQUE          0.8922     0.6760   0.8064    0.7915
[9]              0.3636     0.4449   0.6982    0.5022
S-index          0.6500     0.8891   0.6921    0.7437
[15]             0.4501     0.2096   0.5781    0.4126
QENI             0.9663     0.9175   0.9205    0.9348
QLF              0.9663     0.9185   0.8276    0.9041
QVS              0.8523     0.8472   0.8462    0.8486
to the restored images which are selected by an IQA algorithm or obtained with a denoising technique. Therefore, IQA algorithms and denoising techniques can be fairly evaluated together using SRCC between the assigned values and the values for the best restored images selected with SFF. SRCC, which measures the monotonic association between two variables, is used to determine the technique which selects or produces images that are better correlated with subjective scores simulated by SFF. In the test, the dataset-oriented protocol is employed. However, in order to outperform other measures in this test, an IQA measure should be able to select the best image among several restored versions of the distorted image. Table 8 shows the obtained results for NR-IQA measures and single-input denoising techniques. The results for the best IQA measures are written in bold and those for the best denoising methods are written in italics. It can be seen that, for Gaussian noise, SFF selected images denoised by PGPD, BM3D, and PCLR as those of the best perceptual quality. Images produced by these techniques are also often selected by QENI and IL-NIQE. It is worth noticing that these well-performing denoising techniques require the standard deviation of the Gaussian noise. Nevertheless, they are used in these experiments to show a broad spectrum of methods that can be used in the automatic selection of the denoising algorithm. The noise level for these techniques, if needed, can be provided by a different approach, e.g., [7]. For mixed noise, in turn, IL-NIQE and QENI also correctly selected the best quality images among the outputs of PCLR and BM3D. In the case of speckle noise, other denoising techniques, i.e., NLM and IMTG, outperform the related approaches. Their images are also correctly assessed by QENI. Overall, PCLR and PGPD perform better than IL-NIQE and BRISQUE. However, QENI shows stable performance across different types of noise and it outperforms all compared techniques.
Taking into account the results obtained by QLF and QVS, it can be noticed that they also show promising performance.
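The selection step itself is simple once an NR-IQA measure is available. The sketch below is a hedged illustration: `select_best` and `toy_roughness` are hypothetical names, the toy measure is only a stand-in and is not QENI, and the `higher_is_better` flag is an assumption, since the direction of a score depends on the measure used.

```python
def select_best(candidates, quality, higher_is_better=True):
    """Score every restored image and return the name of the best one.

    candidates: dict mapping a denoiser name to its restored image.
    quality: callable returning an NR-IQA score for one image.
    """
    scores = {name: quality(image) for name, image in candidates.items()}
    pick = max if higher_is_better else min
    best = pick(scores, key=scores.get)
    return best, scores


def toy_roughness(image):
    """Hypothetical stand-in for an NR measure: mean absolute neighbour difference."""
    diffs = [abs(a - b) for a, b in zip(image, image[1:])]
    return sum(diffs) / len(diffs)


# Toy one-dimensional "restored images" from three hypothetical denoisers.
restored = {
    "denoiser_a": [10, 12, 11, 13, 12, 14],
    "denoiser_b": [10, 30, 5, 40, 2, 35],
    "denoiser_c": [10, 15, 9, 16, 11, 17],
}
best_name, all_scores = select_best(restored, toy_roughness, higher_is_better=False)
```

In the experiment described above, the score of the selected image under the FR proxy (SFF) would then be compared with the score of the truly best restored image.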
6. Conclusions

This paper presents a novel no-reference approach for the perceptual quality assessment of noisy images. The approach does not require training. In the method, gradient-based local features and visual saliency models of image patches are used to provide predicted scores that are highly correlated with human evaluation. The results of extensive experiments conducted on three large IQA datasets with noisy images, using dataset- and image-oriented protocols, reveal that the proposed technique outperforms the state-of-the-art NR perceptual noisiness measures, as well as recently introduced general-purpose approaches. Furthermore, the QENI components, QLF and QVS, can be used separately, since they provide promising performance. The introduced measure is also more suitable than other NR-IQA measures for the assessment of results of the state-of-the-art denoising techniques. Furthermore, QENI demonstrates superior performance in a practical application in which NR techniques are used to automatically select which restoration algorithm produces the best restored image. Future work will consider the suitability of QENI to support denoising techniques or the incorporation of other HVS-based findings into the approach. The Matlab code of the proposed approach is available at http://www.marosz.kia.prz.edu.pl/QENI.html.
References

[1] H. Bay, T. Tuytelaars, L.V. Gool, SURF: Speeded-up Robust Features, in: Proc. European Conf. on Computer Vision (ECCV), Springer, 2006, pp. 404–417.
[2] H.W. Chang, H. Yang, Y. Gan, M.H. Wang, Sparse feature fidelity for perceptual image quality assessment, IEEE Trans. Image Process. 22 (10) (2013) 4007–4018.
[3] C. Chen, M. Izadi, A. Kokaram, A no-reference perceptual quality metric for videos distorted by spatially correlated noise, ACM Multimedia 2016, Amsterdam, The Netherlands (2016).
[4] F. Chen, L. Zhang, H. Yu, External patch prior guided internal clustering for image denoising, 2015 IEEE Int. Conf. Comput. Vision (ICCV) (2015) 603–611.
[5] K. Dabov, A. Foi, V. Katkovnik, K. Egiazarian, Image denoising by sparse 3-d transform-domain collaborative filtering, IEEE Trans. Image Process. 16 (8) (2007) 2080–2095.
[6] J. Darbon, A. Cunha, T.F. Chan, S. Osher, G.J. Jensen, Fast nonlocal filtering applied to electron cryomicroscopy, in: 2008 5th IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 2008, pp. 1331–1334.
[7] L. Dong, J. Zhou, Y.Y. Tang, Noise level estimation for natural images based on scale-invariant kurtosis and piecewise stationarity, IEEE Trans. Image Process. 26 (2) (2017) 1017–1030.
[8] F. Fan, Y. Ma, C. Li, X. Mei, J. Huang, J. Ma, Hyperspectral image denoising with superpixel segmentation and low-rank representation, Inf. Sci. 397–398 (2017) 48–68.
[9] S. Gabarda, G. Cristóbal, Blind image quality assessment through anisotropy, J. Opt. Soc. Am. A 24 (12) (2007) B42–B51.
[10] H.E. Gerhard, F.A. Wichmann, M. Bethge, How sensitive is the human visual system to the local statistics of natural images? PLoS Comput. Biol. 9 (1) (2013) 1–15.
[11] K. Gu, W. Lin, G. Zhai, X. Yang, W. Zhang, C.W. Chen, No-reference quality metric of contrast-distorted images based on information maximization, IEEE Trans. Cybern. 99 (2016) 1–7.
[12] X. Huang, L. Chen, J. Tian, X. Zhang, X. Fu, Homogeneity based blind noisy image quality assessment, 2013 IEEE Int. Conf. Syst., Man, and Cybernetics (2013) 2963–2967.
[13] J. Jiang, L. Zhang, J. Yang, Mixed noise removal by weighted encoding with sparse nonlocal regularization, IEEE Trans. Image Process. 23 (6) (2014) 2651–2662.
[14] D. Kersten, Statistical efficiency for the detection of visual noise, Vision Res. 27 (6) (1987) 1029–1040.
[15] X. Kong, K. Li, Q. Yang, L. Wenyin, M.H. Yang, A new image quality metric for image auto-denoising, 2013 IEEE Int. Conf. Comput. Vision (2013) 2888–2895.
[16] E.C. Larson, D.M. Chandler, Most apparent distortion: full-reference image quality assessment and the role of strategy, J. Electron. Imag. 19 (1) (2010) 011006.
[17] A. Leclaire, L. Moisan, No-reference image quality assessment and blind deblurring with sharpness metrics exploiting fourier phase information, J. Math. Imag. Vis. 52 (1) (2015) 145–172.
[18] D.M. Levi, S.A. Klein, I. Chen, The response of the amblyopic visual system to noise, Vision Res. 47 (19) (2007) 2531–2542.
[19] L. Li, Y. Zhou, W. Lin, J. Wu, X. Zhang, B. Chen, No-reference quality assessment of deblocked images, Neurocomputing 177 (2016) 572–584.
[20] H. Liang, D.S. Weller, Denoising method selection by comparison-based image quality assessment, in: 2016 IEEE International Conference on Image Processing (ICIP), 2016, pp. 3106–3110.
[21] J.S. Lim, Two-Dimensional Signal and Image Processing, Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1990.
[22] H. Liu, I. Heynderickx, Visual attention in objective image quality assessment: based on eye-tracking data, IEEE Trans. Circuits Syst. Video Technol. 21 (7) (2011) 971–982.
[23] L. Liu, Y. Hua, Q. Zhao, H. Huang, A.C. Bovik, Blind image quality assessment by relative gradient statistics and adaboosting neural network, Signal Process. Image Commun. 40 (2016) 1–15.
[24] D.G. Lowe, Object recognition from local scale-invariant features, Proc. Seventh IEEE Int. Conf. Comput. Vision 2 (1999) 1150–1157.
[25] Q. Lu, W. Zhou, H. Li, A no-reference image sharpness metric based on structural information using sparse representation, Inf. Sci. 369 (2016) 334–346.
[26] W. Lu, N. Mei, F. Gao, L. He, X. Gao, Blind image quality assessment via semi-supervised learning and fuzzy inference, Appl. Inf. 2 (1) (2015) 1–20.
[27] R.A. Manap, L. Shao, Non-distortion-specific no-reference image quality assessment: a survey, Inf. Sci. 301 (2015) 141–160.
[28] A. Mittal, A.K. Moorthy, A.C. Bovik, No-reference image quality assessment in the spatial domain, IEEE Trans. Image Process. 21 (12) (2012) 4695–4708.
[29] A.K. Moorthy, A.C. Bovik, Blind image quality assessment: from natural scene statistics to perceptual quality, IEEE Trans. Image Process. 20 (12) (2011) 3350–3364.
[30] M. Oszust, Full-reference image quality assessment with linear combination of genetically selected quality measures, PLoS ONE 11 (6) (2016) 1–17.
[31] M. Oszust, Image quality assessment with lasso regression and pairwise score differences, Multimed. Tools Appl. 76 (11) (2017) 13255–13270.
[32] N. Ponomarenko, L. Jin, O. Ieremeiev, V. Lukin, K. Egiazarian, J. Astola, B. Vozel, K. Chehdi, M. Carli, F. Battisti, C.C.J. Kuo, Image database TID2013: peculiarities results and perspectives, Signal Process.-Image 30 (2015) 57–77.
[33] N. Ponomarenko, V. Lukin, A. Zelensky, K. Egiazarian, M. Carli, F. Battisti, TID2008 - A database for evaluation of full-reference visual quality assessment metrics, Advances of Modern Radioelectronics 10 (2009) 30–45.
[34] L.I. Rudin, S. Osher, E. Fatemi, Nonlinear total variation based noise removal algorithms, Physica D 60 (1) (1992) 259–268.
[35] M.A. Saad, A.C. Bovik, C. Charrier, Blind image quality assessment: a natural scene statistics approach in the DCT domain, IEEE Trans. Image Process. 21 (8) (2012) 3339–3352.
[36] T. Serre, M. Kouh, C. Cadieu, U. Knoblich, G. Kreiman, T. Poggio, A theory of object recognition: computations and circuits in the feedforward path of the ventral stream in primate visual cortex, Tech. rep., DTIC Document (2005).
[37] L. Shao, R. Yan, X. Li, Y. Liu, From heuristic optimization to dictionary learning: a review and comprehensive comparison of image denoising algorithms, IEEE Trans. Cybern. 44 (7) (2014) 1001–1013.
[38] H.R. Sheikh, M.F. Sabir, A.C. Bovik, A statistical evaluation of recent full reference image quality assessment algorithms, IEEE Trans. Image Process. 15 (11) (2006) 3440–3451.
[39] K. Shirai, M. Okuda, FFT based solution for multivariable l2 equations using KKT system via FFT and efficient pixel-wise inverse calculation, in: 2014 IEEE Int. Conf. Acoustics, Speech Signal Process. (ICASSP) (2014) 2629–2633.
[40] L. Tang, X. Min, V. Jakhetiya, K. Gu, X. Zhang, S. Yang, No-reference quality assessment for image sharpness and noise, in: 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2016, pp. 1–6.
[41] Z. Wang, A.C. Bovik, H.R. Sheikh, E.P. Simoncelli, Image quality assessment: from error visibility to structural similarity, IEEE Trans. Image Process. 13 (4) (2004) 600–612.
[42] Y. Wu, B. Tracey, P. Natarajan, J.P. Noonan, Probabilistic non-local means, IEEE Signal Process. Lett. 20 (8) (2013) 763–766.
[43] J. Xu, P. Ye, Q. Li, H. Du, Y. Liu, D. Doermann, Blind image quality assessment based on high order statistics aggregation, IEEE Trans. Image Process. 25 (9) (2016) 4444–4457.
[44] J. Xu, L. Zhang, W. Zuo, D. Zhang, X. Feng, Patch group based nonlocal self-similarity prior learning for image denoising, 2015 IEEE Int. Conf. Comput. Vision (ICCV) (2015) 244–252.
[45] W. Xue, L. Zhang, X. Mou, Learning without human scores for blind image quality assessment, Proc. IEEE Conf. Comput. Vision Pattern Recog. (2013) 995–1002.
[46] P. Ye, J. Kumar, L. Kang, D. Doermann, Unsupervised feature learning framework for no-reference image quality assessment, 2012 IEEE Conf. Comput. Vision Pattern Recog. (2012) 1098–1105.
[47] L. Zhang, Y. Shen, H. Li, VSI: A visual saliency-induced index for perceptual image quality assessment, IEEE Trans. Image Process. 23 (10) (2014) 4270–4281.
[48] L. Zhang, L. Zhang, A.C. Bovik, A feature-enriched completely blind image quality evaluator, IEEE Trans. Image Process. 24 (8) (2015) 2579–2591.
[49] W. Zhou, L. Yu, W. Qiu, Y. Zhou, M. Wu, Local gradient patterns (LGP): an effective local-statistical-feature extraction scheme for no-reference image quality assessment, Inf. Sci. 397–398 (2017) 1–14.
[50] X. Zhu, P. Milanfar, Automatic parameter selection for denoising algorithms using a no-reference measure of image content, IEEE Trans. Image Process. 19 (12) (2010) 3116–3132.