No-Reference Image Quality Assessment with Local Features and High-Order Derivatives

Mariusz Oszust
Department of Computer and Control Engineering, Rzeszow University of Technology, Wincentego Pola 2, 35-959 Rzeszow, Poland
Email address: [email protected] (Mariusz Oszust)
URL: http://marosz.kia.prz.edu.pl (Mariusz Oszust)

Abstract

The perceptual quality of images is often affected by the applied image processing techniques. Their evaluation requires tests which involve human subjects. However, in most cases, image quality assessment (IQA) should be automatic and reproducible. Therefore, in this paper, a novel no-reference IQA method is proposed. The method uses high-order derivatives to extract the detailed structure deformation present in distorted images. Furthermore, it employs local features, considering that only some regions of an image carry interesting information. Then, statistics of local features are used by a support vector regression technique to provide an objective quality score. To improve the quality prediction, the luminance and chrominance channels of the image are processed. Experimental results on six large-scale public IQA image datasets show that the proposed method outperforms state-of-the-art hand-crafted and deep-learning techniques in terms of visual quality prediction accuracy. Furthermore, the method is better than popular full-reference approaches (i.e., SSIM and PSNR).

Keywords: Image quality assessment, No-reference, Local features, Support Vector Regression
1. Introduction

The main role of image quality assessment (IQA) techniques is to provide an objective and reproducible evaluation of images, aiming at the replacement of time-consuming and expensive tests with human observers. Such evaluation has become particularly important for various image processing applications, involving the quality monitoring of visual content or the development of algorithms for image/video processing [1, 2, 3]. Taking into account the availability of a distortion-free, reference image for the evaluation, objective measures are divided into full-reference (FR), reduced-reference (RR), and no-reference (NR) techniques [1, 4]. Frequently used FR and RR techniques include the peak signal-to-noise ratio (PSNR), the structural similarity index (SSIM) [5], or RR-SSIM [6]. Apart from structural image information, FR measures also employ contrast changes [7], visual saliency maps [8], or image statistical properties [9]. Furthermore, some measures combine other FR approaches [10, 11] or fuse complementary features [12].
The NR-IQA measures can be divided into distortion-specific and general-purpose methods. The techniques which belong to the first category are designed to evaluate images corrupted by specific distortion types, such as blurriness, contrast change, compression, or noise [13, 14, 15]. The general-purpose methods, in turn, address a variety of distortion types. The focus of this paper is general-purpose NR-IQA. It is worth noticing that the development of such measures is both challenging and the most desirable, since pristine images are seldom available in practical applications. In a typical NR measure, a vector of perceptual features is mapped into subjective scores using a regression technique in order to obtain a quality model. For example, in DIIVINE [16], a distortion type is predicted and then the image quality is estimated. In BLIINDS-II [17], a generalized natural scene statistics (NSS) model of local discrete cosine transform coefficients is used with a Bayesian inference approach. BRISQUE [18], in turn, uses pairwise products of neighboring luminance values to train a support vector regression (SVR) model. In the OG-IQA index [19], an AdaBoosting backpropagation neural network learns image gradient orientations for the IQA. A gradient magnitude map and the Laplacian of Gaussian (LOG) response are used with the SVR in GM-LOG [20]. A more complex approach can be found in HOSA [21], in which K-means clustering of normalized image patches is used to create a codebook based on low- and high-order statistics. HOSA describes an image using 14,700 features and uses statistical differences between the codebook and images for the IQA. Local Binary Patterns (LBPs) extracted from first- and second-order image structures are used in BSD [22]. In BHOD [23], in turn, the SVR learns LBP histograms obtained for up to fourth-order gradient maps. The usage of high-order information for the NR-IQA can also be found in HOLDPM, which extracts local structures with the high-order local derivative pattern (LDP) [24]. Here, the authors report that LBP-based first-order measures are incapable of extracting high-order information. In HOLDPM, the LOG is used for scale-space decomposition. Then, histograms extracted from the second-order LDPs are used to train a regression model. In CIQM [25], which is based on the Quaternion Wavelet Transform (QWT), the magnitude and entropy of the subband QWT coefficients and natural scene statistics of the third phase are used.
Statistics for an assessed image, an equivalent of the image filtered with Prewitt operators, and descriptors obtained with Speeded-Up Robust Features (SURF) for the processed images are mapped into subjective ratings using the SVR in NOREQI [26]. In UCA [15], which is dedicated to block-based image and video compression, the Shi-Tomasi corner detection technique is used to identify statistical differences among corners and edges between block boundaries. Similarly, BPRI [27] finds interest points in compressed images to determine their quality based on corners which are also detected in an introduced pseudo-reference image. Then, both images are compared using blockiness, sharpness, and noisiness metrics. In that measure, the LBP descriptor characterizes local structures for the sharpness and noisiness metrics. QAC [28], which similarly to BPRI does not require a training step, incorporates a set of centroids of quality levels which belong to four distortion types. Another technique of this type, IL-NIQE [29], uses a multivariate Gaussian model to describe natural image statistics derived from multiple cues.
With the development of deep neural network (DNN) approaches, many new NR measures have been introduced. Most of them rely on architectures inherited from other vision-based tasks which are suited to the NR-IQA [30, 31]. Due to the lack of sufficient training samples, such measures use FR-IQA methods to provide an approximation of subjective scores [32, 33]. Alternatively, image patches are employed [31, 32, 34]. Despite encouraging results, these methods suffer from complex architectures which often require specific hardware implementations for efficient use, and their black-box representations are difficult to interpret. Consequently, the evaluation of their performance is often limited to small subsets of images from popular IQA benchmarks or a few training-testing experiments.
The human visual system (HVS) is adapted to extract structural information for understanding the visual content of an image [5, 14, 22]. Therefore, many NR-IQA approaches incorporate first-order gradient information [23]. However, edges and lines are not sufficient to capture essential information regarding a region in an image [35]. To address this issue, some works employ high-order derivative patterns or high-order derivative magnitude maps [23, 24]. Despite the mimicked physical phenomenon, better results are obtained via the extraction of a large number of features, as reported for HOSA [21]. However, the application of high-order image derivatives to the NR-IQA remains largely uninvestigated. In this work, a novel IQA method is proposed in which such derivatives are incorporated. Since the bilaplacian captures sharp boundaries and high-order smooth image variations, it has been successfully used for image vectorization [36]. Moreover, the Laplace or biharmonic operators have a high potential for image inpainting since they enhance structural information. Consequently, the Laplacian is often used for edge detection, with additional Gaussian filtering to suppress its sensitivity to noise. Taking into account these findings, it is worth investigating whether the use of high-order derivatives (i.e., the Laplacian and the bilaplacian) can be beneficial to the IQA. Unlike most of the related works, in which all pixels in a distorted image are processed, the proposed NR measure incorporates an interest point detection technique to indicate the image parts that should be described. The interest point detection, or feature detection, method mimics the attention of the HVS drawn by visually attractive image regions. In this work, features are detected in the bilaplacian domain obtained for the luminance and chrominance channels of an image. Then, statistics of pixel blocks describing the detected interest points are used as perceptual features to train a quality model. Here, the SVR is applied to model a nonlinear relationship between perceptual features and quality scores. The main contribution of this study concerns the application of local features in the bilaplacian domain to the NR quality prediction of color images.
The experimental results on six large-scale popular IQA benchmark datasets are encouraging and show that the proposed NR-IQA technique estimates the perceptual quality of distorted images better than related hand-crafted and DNN-based state-of-the-art measures. It also outperforms popular FR measures, i.e., PSNR and SSIM, on the most demanding datasets.

The rest of the paper is organized as follows. The next section describes the proposed approach. Section 3, in turn, explains the evaluation protocols, introduces public IQA benchmark datasets, and covers the extensive comparative evaluation of the introduced method against other IQA techniques. Finally, Section 4 concludes the paper.
2. Proposed method

In this section, the proposed NR-IQA method is described. As illustrated in its block diagram (Fig. 1), the introduced technique processes the luminance and chrominance channels of an assessed image. Specifically, for each channel, an interest point detection method indicates regions in the bilaplacian domain (Δ²). Then, statistics of pixel blocks of these regions are calculated and used for training the SVR to obtain an image quality model.

Figure 1: Flowchart of the proposed NR-IQA method.
2.1. High-order image derivatives

The HVS is sensitive to changes in the regularities of natural images caused by noise [37]. Hence, the subjective perception of an image depends on local semantic structural information forming primitives in the visual cortex [38]. Accordingly, many IQA approaches use first-order derivatives to capture information carried by edges and lines. However, geometric properties and more discriminative information can be captured using high-order derivatives. This is consistent with models of the receptive fields in the visual cortex which employ up to fourth-order image derivatives [35]. Therefore, it can be assumed that an application of high-order derivatives to the IQA can lead to a significant improvement in the image quality prediction performance.

First-order edge detection algorithms provide locations of image intensity changes, which are identified by peaks in the gradient domain. All image operators which involve differentiation are noise-sensitive. Consequently, high-order derivatives enhance noisy areas and edges. Therefore, an image is often smoothed with a Gaussian function before the Laplacian is applied to find edges [39]. However, considering a possible application of the Laplacian to the IQA, such smoothing can be seen as an additional and unwanted distortion.
In the introduced method, images converted into the YCbCr color space are used. This color space is recommended in ITU-R BT.601 [40] for video broadcasting. In that recommendation, the efficient use of channel bandwidth is achieved by reducing the bandwidth of the chrominance components, which provide considerably less perceptual information than the luminance channel.
The directional derivatives of a grayscale image I are written as ∂I/∂x and ∂I/∂y; I can also represent a single YCbCr channel. The Laplacian is then expressed as Δ = ∂²I/∂x² + ∂²I/∂y². The finite difference approximation of the horizontal second-order partial derivative ∂²/∂x² can be written using the following kernel (or mask): ∂²/∂x² = [1, −2, 1]. The transposition of this kernel leads to the approximation of ∂²/∂y². Then, the respective horizontal and vertical masks can be used to obtain the following popular Laplacian kernels:

\Delta_1 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix}, \quad
\Delta_2 = \begin{bmatrix} 1 & -2 & 1 \\ -2 & 4 & -2 \\ 1 & -2 & 1 \end{bmatrix}, \quad
\Delta_3 = \begin{bmatrix} 1 & 0 & 1 \\ 0 & -4 & 0 \\ 1 & 0 & 1 \end{bmatrix}, \quad
\Delta_4 = \begin{bmatrix} -2 & 1 & -2 \\ 1 & 4 & 1 \\ -2 & 1 & -2 \end{bmatrix}.   (1)

As shown in Eq. (1), masks Δ2 and Δ3 highlight diagonal edges. Consequently, a kernel which includes diagonals can be calculated by combining either Δ1 with Δ3 or Δ2 with Δ4:

\Delta_5 = \begin{bmatrix} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{bmatrix}.   (2)

The Laplacian is applied to the image using convolution. The bilaplacian Δ², in turn, is obtained as:

\Delta^2 = \Delta\Delta = \frac{\partial^4}{\partial x^4} + 2\,\frac{\partial^2}{\partial x^2}\frac{\partial^2}{\partial y^2} + \frac{\partial^4}{\partial y^4}.   (3)
To provide kernels for the bilaplacian operator, the Laplacian masks are combined. Then, an image is transformed to the bilaplacian domain by convolving it with two Laplacian kernels. This can be written as Δ²ᵢⱼ ∗ I = Δᵢ ∗ Δⱼ ∗ I, where i, j ∈ {1, 2, . . . , 5} and "∗" denotes the convolution. To reduce the number of investigated bilaplacian kernels, only Laplacian masks for i = j and masks which are complementary (j = i + 2; i ∈ {1, 2}; see Eq. (1)) are combined. Finally, the following bilaplacian masks are considered: Δ²₁₁, Δ²₂₂, Δ²₃₃, Δ²₄₄, Δ²₅₅, Δ²₁₃, and Δ²₂₄.
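To make this construction concrete, the following minimal Python sketch builds the Laplacian masks of Eqs. (1) and (2), forms a bilaplacian mask Δ²ᵢⱼ by convolving two of them, and applies it to a single channel. It is an illustration only, not the author's reference code: the function names and the use of NumPy/SciPy are choices made here.

```python
# A minimal sketch of the bilaplacian-domain transform of Section 2.1, assuming a
# single grayscale (YCbCr) channel as input; kernel values follow Eqs. (1)-(2).
import numpy as np
from scipy.signal import convolve2d

LAPLACIANS = {
    1: np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=np.float64),
    2: np.array([[1, -2, 1], [-2, 4, -2], [1, -2, 1]], dtype=np.float64),
    3: np.array([[1, 0, 1], [0, -4, 0], [1, 0, 1]], dtype=np.float64),
    4: np.array([[-2, 1, -2], [1, 4, 1], [-2, 1, -2]], dtype=np.float64),
    5: np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]], dtype=np.float64),
}

def bilaplacian_kernel(i, j):
    """Bilaplacian mask obtained by convolving two Laplacian masks (Eq. (3))."""
    return convolve2d(LAPLACIANS[i], LAPLACIANS[j])  # 'full' mode -> 5x5 kernel

def to_bilaplacian_domain(channel, i=2, j=4):
    """Convolve a channel with the bilaplacian mask (the paper finally adopts i=2, j=4)."""
    return convolve2d(channel.astype(np.float64), bilaplacian_kernel(i, j), mode='same')
```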
2.2. Local feature detector

A keypoint (feature, corner, or interest point) is a stable group of pixels which characterizes an image region despite image distortions or transformations. Since human eyes tend to focus on interesting image regions, feature detectors can be used for the prediction of visual saliency [41]. In the proposed NR method, the Features from Accelerated Segment Test (FAST) [42] technique indicates image regions in the bilaplacian domain for further processing. In other words, it provides a list of keypoints for the image Δ²ᵢⱼ ∗ I. In FAST, a circle of 16 pixels is used to determine whether the center pixel is a corner. Specifically, a feature is detected if 12 contiguous pixels on the circle are significantly darker or brighter than the potential corner. To speed up computations, only several pixels are checked first (1, 5, 9, and 13). To address the problems with the ordering of the selected pixels for the comparison and their number, a decision tree is trained on a large number of images with keypoints [42]. For the selection of the features with the strongest response, non-maximum suppression based on a threshold is applied. FAST is up to two orders of magnitude faster than other popular feature detectors and identifies considerably stable corners [42].
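A possible implementation of this keypoint selection step is sketched below. It uses OpenCV's FAST detector as a stand-in for the detector described above (the author's reference implementation is in Matlab); the threshold value and the rescaling of the signed bilaplacian response to an 8-bit image are assumptions made for the example.

```python
# A minimal sketch of FAST corner detection on a bilaplacian response `bilap`
# (e.g., the output of to_bilaplacian_domain above); not the paper's code.
import cv2
import numpy as np

def strongest_fast_keypoints(bilap, n_max=2000, threshold=10):
    """Detect FAST corners on the bilaplacian response and keep the N strongest."""
    # FAST expects an 8-bit image, so the signed response is rescaled to [0, 255].
    resp8 = cv2.normalize(bilap, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    detector = cv2.FastFeatureDetector_create(threshold=threshold)
    keypoints = detector.detect(resp8, None)
    keypoints = sorted(keypoints, key=lambda k: k.response, reverse=True)[:n_max]
    return [k.pt for k in keypoints]  # (x, y) locations of the kept corners
```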
Figure 2: YCbCr channels of an exemplary image and the images resulting from their convolution with Laplacian and bilaplacian kernels. The bottom row contains interest points detected by the FAST technique in the bilaplacian domain (shown as green dots).

Figure 2 presents an exemplary image from the TID2013 dataset [43], its YCbCr components, as well as the images obtained by convolving the channels with two Laplacian masks (i.e., Δ2 and Δ4) and the bilaplacian kernel Δ²₂₄. The figure also shows FAST keypoints detected in the bilaplacian domain. As can be seen, the bilaplacian domain seems to provide more information about the image than its second-order derivatives. Interestingly, FAST detects interest points in different image areas of the channels, suggesting that the channels in the bilaplacian domain contain complementary information and should be used together to describe the visual content.
2.3. Quality prediction

In the proposed method, an input RGB image is transformed into the YCbCr color space. Each channel (Y, Cb, or Cr), denoted as I, is independently processed. Consequently, I is convolved with the bilaplacian mask and the image Δ²ᵢⱼ(I) = Δᵢ ∗ Δⱼ ∗ I is obtained. In the next step, the FAST detector indicates up to N regions in the bilaplacian domain to be described. The parameter N is used to discard the keypoints with the weakest response, which leads to a better IQA model with the proposed approach, as will be shown in the next section. Then, each n-th keypoint (n = 1, 2, . . . , N; N ≤ N̂) is described with an M × M pixel block centered at its location, where N̂ denotes the number of all keypoints detected in the image. The block is used as a feature descriptor since the characterized region contains rich information about intensity changes. In order to assess the quality of an image, the list of N blocks must be transformed into a perceptual feature vector used as the input to the SVR. Here, different approaches can be used, based on the positions of pixels in all blocks or on the keypoints described with such blocks. For example, the pixel positions contributing most to the variability of the data in all blocks can be found using Principal Component Analysis (PCA) and then characterized with some statistics. Furthermore, approaches which are commonly used to provide a representation of an image based on described keypoints, such as the bag of visual words, Fisher vectors, or VLAD, can be applied [44]. However, to provide an NR-IQA measure with a reasonably low complexity, without a sophisticated intermediate representation, the global description of Δ²ᵢⱼ(I) is obtained using the following statistics. For each m-th pixel position (m = 1, 2, . . . , M²), the sample mean μ(D), standard deviation σ(D), and histogram variance hvar(D) are calculated, where D denotes all m-th pixels of the blocks which describe the N keypoints of Δ²ᵢⱼ(I). The histogram variance is obtained as [19]:

\mathrm{hvar}(D) = \sum_{D} \left( h(D) - \mu(D) \right)^2.   (4)

Finally, the perceptual feature vector that contains the statistics of pixel blocks is obtained. Since distortions affect images across scales [18], the introduced method also provides statistics for the assessed image downsampled by a factor of two. The resulting feature vector has 18M² dimensions (three statistics, three channels, two scales, and M² pixel positions). However, in the experiments on the MLIVE dataset, four scales are used, as suggested in works devoted to the quality prediction of images corrupted by multiple distortions [45]. To obtain a quality model, the popular SVR technique with the radial basis function (RBF) kernel is employed [46]. It learns the mapping from the feature space to subjective scores. Before the SVR is applied, the statistics are linearly scaled to the range [0, 1]. The min-max normalization is used to scale the training and testing subsets of the data.
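The sketch below assembles the per-pixel-position statistics into a feature vector for one channel and one scale, and fits an RBF-kernel SVR to subjective scores. It is a simplified illustration under several assumptions: scikit-learn replaces LIBSVM, the SVR hyperparameters and the histogram-variance reading of Eq. (4) are choices made here, and keypoint locations are expected as (x, y) pairs (e.g., from the FAST sketch above).

```python
# A simplified sketch of the RATER feature construction and SVR training of
# Section 2.3; helper names and hyperparameters are assumptions, not the paper's.
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR

def block_statistics(bilap, keypoints, M=15, bins=256):
    """Per-pixel-position mean, std, and histogram variance over all M x M blocks."""
    r = M // 2
    H, W = bilap.shape
    blocks = []
    for (x, y) in keypoints:                      # keypoints as (x, y) pairs
        x, y = int(round(x)), int(round(y))
        if r <= y < H - r and r <= x < W - r:     # skip blocks crossing the border
            blocks.append(bilap[y - r:y + r + 1, x - r:x + r + 1].ravel())
    B = np.asarray(blocks)                        # shape: (num_blocks, M*M)
    mu, sigma = B.mean(axis=0), B.std(axis=0)
    hvar = np.empty(B.shape[1])
    for m in range(B.shape[1]):                   # one reading of Eq. (4)
        h, _ = np.histogram(B[:, m], bins=bins)
        hvar[m] = np.sum((h - h.mean()) ** 2)
    return np.concatenate([mu, sigma, hvar])      # 3 * M^2 values per channel and scale

def train_quality_model(features, mos):
    """Min-max scale the per-image feature vectors and fit an RBF-kernel SVR."""
    scaler = MinMaxScaler()
    X = scaler.fit_transform(np.asarray(features))
    model = SVR(kernel='rbf', C=100.0, gamma='scale', epsilon=0.1)
    model.fit(X, np.asarray(mos))
    return scaler, model
```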
Influence of distortions on the bilaplacian. Figure 3 illustrates the impact of different distortion types and their severity on the images convolved with an exemplary bilaplacian kernel (Δ²₂₄). The first row contains a pristine image from the TID2013 dataset and a small region which is magnified to show more details. The distorted regions belong to images corrupted by Gaussian noise and JPEG2K compression. As illustrated, different distortions modify the pristine image in their unique ways, and their severity can be discriminated in the bilaplacian domain. It is worth noticing how the neighborhood of the magnified object is affected by the considered distortions. Specifically, the Gaussian noise seems to be easily detected since it adds more information captured in the bilaplacian domain. The amount of information is associated with the severity of this distortion type. For the JPEG2K compression, the introduced artifacts make the intensity changes in the magnified object less distinctive, depending on the compression rate. In general, the luminance component seems to carry more information in comparison to the chrominance channels of the image. This is consistent with the recommendation regarding the applicability of the YCbCr color space [40]. Consequently, the application of YCbCr and the bilaplacian domain to the IQA seems justified.
Sensitivity of statistics to distortions. To provide a more thorough investigation of whether the statistics of pixel blocks for features detected in the bilaplacian domain are suitable for the IQA, the following experiment has been carried out. In the experiment, the Spearman's Rank Correlation Coefficient (SRCC) between subjective scores for images from the LIVE dataset [5] and the statistics obtained with the proposed method is used for their evaluation. Here, the mask Δ²₂₄ is used. The technique employs 256 bins for the calculation of the histogram variance hvar, and processes up to N = 2000 FAST features per image in the bilaplacian domain, where each feature is described by a 15×15 pixel block (M = 15). The results are presented in Table 1 in terms of the mean of the SRCC values obtained for the statistics which describe a given YCbCr channel. The SRCC performance is shown considering the entire dataset or images distorted by five distortion types; the best three values are written in bold. As reported, the luminance channel carries most of the perceptual information, while the chrominance channels play a supportive role. The results for the statistics show that hvar is clearly the best performing quality feature. However, for the quality prediction of images corrupted by white Gaussian noise (WN) or Gaussian blur (GB), the other statistics exhibit a competitive SRCC performance. The employed statistics for the cumulative description of pixel blocks seem to complement each other, and the observed diversity of performances encourages their joint usage. Therefore, they are used to train the SVR in order to obtain the quality model (cf. Fig. 1).

Figure 3: Influence of distortions seen in YCbCr channels convolved with the bilaplacian kernel Δ²₂₄. The pristine image is shown in the first two rows. The remaining rows contain a selected region in images distorted by Gaussian noise (3rd and 4th rows) and JPEG2K compression (5th and 6th rows). The images in even rows are more severely distorted than the images in odd rows.

Table 1: Mean SRCC performance of the used statistics for the YCbCr channels on the LIVE dataset.

Dist. type   Y: μ     Y: σ     Y: hvar   Cb: μ    Cb: σ    Cb: hvar   Cr: μ    Cr: σ    Cr: hvar
JP2K         0.7966   0.7162   0.8344    0.4406   0.3988   0.5037     0.4371   0.3971   0.4921
JPEG         0.8114   0.6348   0.9542    0.1230   0.2314   0.8078     0.2707   0.0622   0.7929
WN           0.9764   0.9733   0.9813    0.9842   0.9843   0.9355     0.9858   0.9854   0.9484
GB           0.9067   0.8984   0.9211    0.8333   0.7099   0.8119     0.8143   0.5980   0.7985
FF           0.7332   0.7267   0.7633    0.3454   0.3404   0.3587     0.4133   0.4066   0.4627
All          0.2898   0.2290   0.8303    0.0932   0.0023   0.5702     0.1387   0.0230   0.5909

3. Experimental results and discussion
In this section, the introduced NR-IQA method, which applies stATistics of pixel blocks of local fEatuRes detected in the bilaplacian domain of YCbCr channels (RATER), is compared against state-of-the-art NR techniques and two popular FR-IQA methods.
3.1. IQA datasets and evaluation protocol

The quality prediction performance of NR-IQA measures can be evaluated using IQA benchmark datasets. Each dataset contains reference images, distorted images, and subjective scores obtained in tests with human subjects. Subjective ratings are denoted as mean opinion scores (MOS) or difference MOS (DMOS). In this work, the following six publicly available IQA datasets are used: (i) TID2013 [43], (ii) TID2008 [47], (iii) CSIQ [48], (iv) LIVE [5], (v) LIVE In the Wild Image Quality Challenge (LIVE WIQC) [49], and (vi) LIVE Multiply Distorted Image Quality Database (MLIVE) [50].
The LIVE dataset contains 29 reference images corrupted by the following five distortion types at various levels: JPEG compression, JPEG2K compression, Gaussian blur, additive white Gaussian noise (AWGN), and a simulated fast-fading Rayleigh channel. There are 779 distorted images in this dataset [5]. The CSIQ dataset contains 30 reference images and 866 images corrupted by six types of distortion with up to four or five distortion levels. The distortions used in this dataset are JPEG compression, JPEG2K compression, Gaussian blur, additive white Gaussian noise, global contrast decrements, and additive pink Gaussian noise. The TID2008 is two times larger than the CSIQ and contains 1700 images distorted by 17 distortion types. There are 25 reference images in this dataset and four distortion levels for each distortion type. The TID2013 contains 3000 distorted images, 24 distortion types, and five levels of distortion. This dataset is considered the most challenging IQA benchmark due to its size and the diversity of distortion types. The experimental evaluation presented in this paper also contains tests on the LIVE WIQC dataset [49]. This dataset contains 1162 images captured by mobile camera devices which are corrupted by multiple distortions. It is worth noticing that the LIVE WIQC does not contain reference images and all tests with human observers were performed in an uncontrolled manner using the Amazon Mechanical Turk. The MLIVE dataset contains 450 images distorted with Gaussian blur followed by JPEG compression and Gaussian blur followed by Gaussian noise. Interestingly, TID2013 also contains examples of multiple distortions (e.g., the lossy compression of noisy images).
The performance of a given objective measure is evaluated using the SRCC, the Kendall Rank-order Correlation Coefficient (KRCC), the Pearson Correlation Coefficient (PCC), and the Root Mean Square Error (RMSE). As recommended by the Video Quality Experts Group [51], the nonlinear relationship between subjective and predicted scores can be taken into account by the application of a nonlinear logistic regression before the PCC and RMSE are calculated [52]:

Q_p = \beta_1 \left( \frac{1}{2} - \frac{1}{1 + \exp\left(\beta_2 (Q - \beta_3)\right)} \right) + \beta_4 Q + \beta_5,   (5)

where Q is the predicted score, Q_p is the fitted score, and \beta = [\beta_1, \beta_2, \ldots, \beta_5] are parameters determined by the regression.

The SRCC is obtained as:

\mathrm{SRCC}(Q, S) = 1 - \frac{6 \sum_{i=1}^{m} d_i^2}{m(m^2 - 1)},   (6)

where d_i is the difference between the ranks of the i-th image in Q and S, and m is the total number of images. The KRCC uses the numbers of concordant and discordant pairs in the dataset, m_c and m_d, respectively:

\mathrm{KRCC}(Q, S) = \frac{m_c - m_d}{m(m-1)/2}.   (7)

The PCC is calculated as:

\mathrm{PCC}(Q_p, S) = \frac{\bar{Q}_p^T \bar{S}}{\sqrt{\bar{Q}_p^T \bar{Q}_p} \sqrt{\bar{S}^T \bar{S}}},   (8)

where the mean-removed vectors are denoted as \bar{Q}_p and \bar{S}. The RMSE, in turn, is written as:

\mathrm{RMSE}(Q_p, S) = \sqrt{\frac{(Q_p - S)^T (Q_p - S)}{m}}.   (9)

A better IQA measure is characterized by a smaller RMSE and larger SRCC, KRCC, and PCC in comparison to other IQA methods.
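For reference, the criteria of Eqs. (5)-(9) can be computed as in the following sketch; the initial guess for the logistic fit is an assumption, not a value reported in the paper.

```python
# A sketch of the four evaluation criteria, for 1-D arrays of objective scores Q
# and subjective scores S.
import numpy as np
from scipy import stats
from scipy.optimize import curve_fit

def logistic(q, b1, b2, b3, b4, b5):
    """Five-parameter logistic mapping of Eq. (5)."""
    return b1 * (0.5 - 1.0 / (1.0 + np.exp(b2 * (q - b3)))) + b4 * q + b5

def evaluate(Q, S):
    Q, S = np.asarray(Q, float), np.asarray(S, float)
    srcc, _ = stats.spearmanr(Q, S)               # Eq. (6)
    krcc, _ = stats.kendalltau(Q, S)              # Eq. (7)
    p0 = [np.max(S) - np.min(S), 0.1, np.mean(Q), 0.1, np.mean(S)]  # assumed guess
    beta, _ = curve_fit(logistic, Q, S, p0=p0, maxfev=20000)
    Qp = logistic(Q, *beta)                       # fitted scores
    pcc, _ = stats.pearsonr(Qp, S)                # Eq. (8)
    rmse = np.sqrt(np.mean((Qp - S) ** 2))        # Eq. (9)
    return srcc, krcc, pcc, rmse
```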
Since the presented NR technique requires training, like many other learning-based methods evaluated in further sections of this paper, the SVR model used for the visual quality prediction is obtained using a widely accepted protocol. In the protocol, each IQA benchmark dataset is divided into disjoint learning and testing subsets, i.e., all distorted images which belong to 80% of the reference images are used for training and the remaining 20% of images are used in tests [21, 27]. The performance of an IQA measure is reported as the median values of SRCC, KRCC, PCC, and RMSE over 100 random training-testing iterations [53]. In order to avoid bias and fairly evaluate methods using the protocol, all learning-based techniques are always run on the same 100 subsets. This protocol is also used to evaluate the influence of the parameters of the RATER on its performance. The SVR models which map the perceptual features into subjective ratings for the RATER and other methods are obtained using the LIBSVM library [46], aiming at their best performance [20, 29].
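A minimal sketch of this protocol is given below, assuming that each distorted image is tagged with the index of its reference image and that a caller-supplied train_and_eval function (a hypothetical helper, not part of the paper) trains a model on the training split and returns the SRCC on the test split.

```python
# A minimal sketch of the content-disjoint 80/20 evaluation protocol of Section 3.1.
import numpy as np

def protocol_median_srcc(features, mos, ref_ids, train_and_eval, iterations=100, seed=0):
    """Median SRCC over random 80/20 splits made at the reference-image level."""
    rng = np.random.default_rng(seed)
    refs = np.unique(ref_ids)
    scores = []
    for _ in range(iterations):
        train_refs = rng.choice(refs, size=int(0.8 * len(refs)), replace=False)
        train_mask = np.isin(ref_ids, train_refs)
        srcc = train_and_eval(features[train_mask], mos[train_mask],
                              features[~train_mask], mos[~train_mask])
        scores.append(srcc)
    return np.median(scores)
```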
Figure 4: SRCC performance of the proposed method with the considered Laplacian and bilaplacian kernels on the TID2013 and LIVE datasets. The performance with raw YCbCr images is also shown.

3.2. Implementation details and feature analysis

The effectiveness of the introduced method, based on statistics of features detected and described in the high-order domain, and the influence of the parameters of the method on its performance are investigated. Figure 4 contains the SRCC performance of the method in the Laplacian and bilaplacian domains obtained with different kernels, as well as with raw YCbCr images. Here, TID2013 and LIVE are used. The SRCC is shown since the other performance criteria (KRCC, PCC, and RMSE) lead to similar findings. The RATER is run with M = 15 and N = N̂ (i.e., all detected keypoints are processed). As illustrated, the RATER with bilaplacian kernels achieves greater SRCC values than can be observed for the Laplacian domain or raw YCbCr images. Among the bilaplacian kernels, the mask Δ²₂₄ provides the best performance, followed by the Δ²₂₂ and Δ²₄₄ kernels. The results for the Laplacian masks Δ2 and Δ4 are also promising. Since these masks perform better than the others and are complementary, their combination (i.e., Δ²₂₄) is beneficial to the IQA. The inferior prediction performance of the method using raw color images justifies the proposed application of high-order derivatives. In the experiments presented in the remaining parts of this paper, the bilaplacian mask Δ²₂₄ is employed.

Figure 5: Influence of the size of the pixel block M (a) and the maximum number of processed keypoints per image N (b) on the performance of the RATER, in terms of SRCC on the TID2013 and LIVE benchmarks.
Apart from the bilaplacian mask, the impact of the parameters M and N also requires investigation. Therefore, Fig. 5 presents the performance of the method on TID2013 and LIVE when varying the size of the pixel block M and the number of keypoints per assessed image N. As reported, the RATER is not sensitive to changes of these two parameters. For M = 15, the pixel block is two times larger than the region used for the interest point detection with FAST; therefore, this value is used in the proposed method. To reduce the number of possible test configurations, M = 15 is used in the experiments with the parameter N. Since N = 2000 reflects a typical number of features per image applied in image matching and influences the speed of the image description, it is used in further experiments with the RATER.

Since the introduced technique incorporates three statistics, Table 2 presents their contribution to the SRCC performance on LIVE and TID2013. The table also contains the values obtained for separate color components as well as for the RGB and HSV color spaces. As shown in the table and discussed in Section 2.3, hvar is the most contributing perceptual feature used in the RATER. However, the use of all three statistics leads to the best results on TID2013. Taking into account the results for the color channels, it is evident that the luminance channel carries more perceptual information in comparison to the chrominance channels.
Table 2: Contribution of statistics, color spaces, or methods for feature detection in images to the SRCC performance of the RATER.

Test                      LIVE     TID2013
Statistic or channel
  μ                       0.9268   0.7173
  σ                       0.9111   0.6965
  hvar                    0.9364   0.7801
  Y                       0.9348   0.6934
  Cb                      0.7971   0.6725
  Cr                      0.7966   0.6755
  All                     0.9422   0.8269
Color space
  RGB                     0.9387   0.7338
  HSV                     0.8902   0.6729
  YCbCr                   0.9422   0.8269
Feature detector
  Shi-Tomasi              0.9276   0.7017
  Harris-Stephens         0.9285   0.6827
  SURF                    0.8770   0.7177
  FAST                    0.9422   0.8269
The performance difference observed for the channels is smaller for the TID2013 dataset. This can also be attributed to its large number of distortions and their levels, including distortions affecting the perceptual quality of color images. Consequently, the quality of images in this dataset is better predicted by the RATER using the color information. The RATER with the RGB color space performs better than with the HSV, but it is clearly outperformed by the case in which the YCbCr color space is employed. To show that the RATER achieves the best performance with FAST keypoints, the results obtained with other popular keypoint detectors are also reported in Table 2. The suitability of the FAST detector comes from the fact that it decides whether an image region contains a corner by taking into account the values of neighboring pixels in the bilaplacian domain, while the other methods apply blurring to the region to find features. It is also faster than these techniques.
3.3. Performance on IQA datasets

The introduced NR measure is compared with the following state-of-the-art measures: HOSA [21], BPRI [27], BRISQUE [18], IL-NIQE [29], OG-IQA [19], and NOREQI [26]. Furthermore, PSNR and SSIM [5], as the most popular full-reference IQA measures, are added to the comparative evaluation. NOREQI and BPRI are recently introduced general-purpose measures which incorporate a feature detection step to facilitate the quality prediction. IL-NIQE and HOSA are devoted to the assessment of color images and they outperform BLIINDS-II, DIIVINE, CORNIA, NIQE, BRISQUE, and QAC [21, 29]. HOSA is also better than GM-LOG and IL-NIQE [21]. Since BPRI, IL-NIQE, PSNR, and SSIM do not require a learning step, their performance is evaluated using the testing subsets defined according to the applied protocol (see Section 3.1) [29].

The median values of SRCC, KRCC, PCC, and RMSE obtained for the compared IQA measures on the six datasets are reported in Table 3. In the table, the best performing IQA measure is written in italics and the best performing NR-IQA technique is written in bold. As demonstrated, the RATER outperforms all compared IQA measures on TID2013, TID2008, and CSIQ. The large difference between the results obtained by the RATER and the second best technique (i.e., SSIM) is encouraging and justifies the application of local features in the bilaplacian domain to the NR-IQA. On the LIVE dataset, the RATER is the best performing NR measure, with values of the evaluation criteria within 1% of the results obtained by the SSIM. Since LIVE WIQC does not contain reference images, the FR measures cannot be evaluated on this dataset. Here, the best results are obtained by the RATER and NOREQI. On MLIVE, in turn, IL-NIQE, RATER, and BRISQUE exhibit a similar level of performance. Table 3 also contains overall values calculated as the average and the weighted average; for the weighted average, the number of images in a database is used as its weight. The overall values are not reported for the RMSE due to the different ranges of DMOS values for the LIVE datasets. The overall results reveal the superiority of the introduced NR-IQA technique. Among the remaining NR measures, HOSA exhibits acceptable performance, and BPRI seems to be more suitable for less diverse datasets due to distortion-specific steps in its implementation.

Table 3: Performance of the evaluated methods on six IQA datasets.

                    PSNR     SSIM     HOSA     BPRI     BRISQUE  IL-NIQE  OG-IQA   NOREQI   RATER
TID2013, 3000 images
  SRCC              0.6344   0.7423   0.7132   0.2222   0.5551   0.5126   0.4855   0.5565   0.8269
  KRCC              0.4667   0.5631   0.5392   0.1527   0.3988   0.3631   0.3473   0.3930   0.6411
  PCC               0.6983   0.7947   0.7823   0.4660   0.6486   0.6307   0.6228   0.6556   0.8409
  RMSE              0.8858   0.7468   0.7734   1.0946   0.9422   0.9679   0.9712   0.9361   0.6703
TID2008, 1700 images
  SRCC              0.5539   0.7788   0.7732   0.1825   0.6066   0.1510   0.5802   0.6203   0.8257
  KRCC              0.4039   0.5831   0.5935   0.1291   0.4423   0.1005   0.4191   0.4504   0.6496
  PCC               0.5382   0.7788   0.8136   0.4747   0.6759   0.1984   0.6666   0.7008   0.8362
  RMSE              1.1274   0.8414   0.7732   1.1801   0.9831   1.3157   1.0024   0.9529   0.7361
CSIQ, 866 images
  SRCC              0.8117   0.8803   0.8290   0.5679   0.8608   0.8683   0.7689   0.8215   0.8983
  KRCC              0.6172   0.7006   0.6400   0.4238   0.6801   0.6852   0.5759   0.6346   0.7240
  PCC               0.8083   0.8674   0.8473   0.7250   0.8851   0.8860   0.8064   0.8494   0.9211
  RMSE              0.1528   0.1291   0.1433   0.1781   0.1250   0.1254   0.1589   0.1418   0.1024
LIVE, 779 images
  SRCC              0.8788   0.9473   0.9408   0.8826   0.9391   0.8993   0.9159   0.8670   0.9422
  KRCC              0.6937   0.8003   0.7922   0.7211   0.7923   0.7200   0.7638   0.6886   0.7987
  PCC               0.8790   0.9451   0.9415   0.8808   0.9427   0.9061   0.9195   0.8850   0.9428
  RMSE              13.135   8.9088   9.1579   13.002   8.9522   11.567   10.801   12.888   8.9412
LIVE WIQC, 1162 images
  SRCC              NA       NA       0.5481   0.1700   0.6049   0.1917   0.4702   0.5827   0.6033
  KRCC              NA       NA       0.3734   0.1140   0.4276   0.1289   0.3223   0.4128   0.4277
  PCC               NA       NA       0.5853   0.2969   0.6422   0.1930   0.5134   0.6307   0.6285
  RMSE              NA       NA       16.376   19.289   15.494   19.730   17.245   15.728   15.748
MLIVE, 450 images
  SRCC              0.6834   0.8636   0.8817   0.0041   0.8943   0.9077   0.8256   0.8760   0.8915
  KRCC              0.5110   0.6797   0.7051   0.2880   0.7148   0.7364   0.6410   0.7015   0.7218
  PCC               0.7595   0.8873   0.9143   0.4640   0.9183   0.8951   0.8834   0.8935   0.9191
  RMSE              12.0767  8.7075   7.5447   1.9488   7.6526   8.2772   8.8110   8.5553   7.6194
Overall direct
  SRCC              0.7124   0.8425   0.8276   0.3719   0.7712   0.6678   0.7152   0.7483   0.8769
  KRCC              0.5385   0.6654   0.6540   0.3429   0.6057   0.5210   0.5494   0.5736   0.7070
  PCC               0.7366   0.8547   0.8598   0.6021   0.8141   0.7033   0.7797   0.7969   0.8920
Overall weighted
  SRCC              0.6681   0.8006   0.7802   0.3176   0.6734   0.5380   0.6172   0.6630   0.8532
  KRCC              0.4991   0.6205   0.6056   0.2555   0.5116   0.4041   0.4616   0.4925   0.6772
  PCC               0.6970   0.8234   0.8254   0.5486   0.7371   0.6042   0.7084   0.7337   0.8668

Table 4: Results of statistical significance tests.

              PSNR   SSIM   HOSA   BPRI   BRISQUE   IL-NIQE   OG-IQA   NOREQI
TID2013       1      1      1      1      1         1         1        1
TID2008       1      1      1      1      1         1         1        1
CSIQ          1      1      1      1      1         1         1        1
LIVE          1      0      0      1      0         1         1        1
LIVE WIQC     NA     NA     1      1      0         1         1        1
MLIVE         1      1      0      1      0         0         1        1
The results on the LIVE datasets are similar for several measures. Therefore, in order to investigate whether the relative performance differences between the RATER and the other measures are statistically significant, the Wilcoxon rank-sum test is used. In the test, the equivalence of the median values of independent samples is tested at a 5% significance level. The null hypothesis assumes that the SRCC values of the compared IQA techniques are drawn from populations with equal medians. The obtained results are reported in Table 4. The symbol "1" in a cell denotes that the RATER is statistically better than the measure in the column on the dataset in the row with a confidence greater than 95%, while "0" denotes statistically indistinguishable results. As reported, the results are consistent with the conclusions drawn from the previous experiments, i.e., the RATER is statistically superior to the compared methods on the TID2013, TID2008, and CSIQ datasets. It can be seen that on LIVE the RATER is on par with SSIM, HOSA, and BRISQUE, and on par with BRISQUE on LIVE WIQC. On MLIVE, the results of the RATER are statistically indistinguishable from the results of HOSA, BRISQUE, and IL-NIQE. In general, the RATER achieves statistically better performance than the compared methods.
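The test behind Table 4 can be reproduced with a few lines, as sketched below for two arrays of per-iteration SRCC values; combining the p-value with a comparison of medians is an interpretation made here, not a detail given in the paper.

```python
# A small sketch of the Wilcoxon rank-sum significance test, assuming two arrays of
# SRCC values from the 100 training-testing iterations of two compared measures.
import numpy as np
from scipy.stats import ranksums

def rater_is_better(srcc_rater, srcc_other, alpha=0.05):
    """Return 1 if RATER's median SRCC is significantly higher, 0 otherwise."""
    stat, p_value = ranksums(srcc_rater, srcc_other)
    better_median = np.median(srcc_rater) > np.median(srcc_other)
    return int(p_value < alpha and better_median)
```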
The performance of recently introduced DNN-based measures is often reported in terms of median SRCC and PCC values obtained in 10 training-testing iterations. Such a small number of tests is dictated, and partially justified, by the complexity of the used architectures. In order to facilitate a fair comparison of the RATER with these measures, its SRCC and PCC median values are reported accordingly. In this test, results for TID2013, CSIQ, and LIVE are shown, since most methods are evaluated on only one or two IQA datasets. In many cases, only small subsets of distorted images are employed and the results for full datasets are seldom reported. The comparative evaluation of the measures, presented in Table 5, is based on results published in the literature. As reported, the RATER clearly outperforms the DNN-based measures on the largest IQA datasets. On the LIVE dataset, the results favor other measures, as they seem to be more focused on the small number of distortion types present in this dataset. It can be concluded that the RATER achieves superior performance to the compared hand-crafted and DNN-based measures.

Table 5: Comparison of the RATER with DNN-based measures. Median values of SRCC and PCC over 10 training-testing iterations are reported.
Compared measures: BIECON [32], PQR (S_CNN) [30], PQR (ResNet50) [30], PQR (AlexNet) [30], Imagewise CNN [54], IQAMSCN [55], RankIQA+FT [56], HPSC+DAP [31], MEON [57], DeepIQA [33], and RATER.
TID2013, 3000 images - SRCC: 0.717, 0.692, 0.740, 0.574, 0.800, 0.780, 0.808, 0.761, 0.843 (RATER); PCC: 0.762, 0.750, 0.798, 0.669, 0.802, 0.861 (RATER).
CSIQ, 866 images - SRCC: 0.815, 0.908, 0.873, 0.871, 0.812, 0.829, 0.893 (RATER); PCC: 0.823, 0.927, 0.901, 0.896, 0.791, 0.871, 0.921 (RATER).
LIVE, 779 images - SRCC: 0.958, 0.964, 0.965, 0.955, 0.963, 0.953, 0.981, 0.919, 0.945 (RATER); PCC: 0.960, 0.966, 0.971, 0.964, 0.964, 0.957, 0.921, 0.946 (RATER).
Figure 6 shows the scatter plots of the subjective scores against the scores predicted by the introduced measure on all IQA datasets. In the plots, each point represents a distorted image. Since the RATER requires training, each scatter plot shows the result of an exemplary quality prediction experiment in which 80% of the reference images and their distorted images are used for training and the remaining images for testing. For convenience and to ensure a coherent comparison, all subjective and objective scores are normalized to the range [0, 1]. As presented, the predictions of the RATER are consistent with the subjective scores.
Figure 6: Scatter plots of subjective opinion scores against the scores obtained by the RATER on the used datasets: (a) TID2013, (b) TID2008, (c) CSIQ, (d) LIVE, (e) LIVE WIQC, and (f) MLIVE. Curves fitted with logistic functions are also shown.

3.4. Performance on individual distortion types

In order to determine the performance of the introduced measure taking into account individual distortions, it is evaluated on the TID2013 dataset. TID2013 is used since it contains images corrupted by more distortion types than any other dataset. In this experiment, the previously described evaluation protocol is used. Figure 7 shows the results obtained by the RATER and the other methods in terms of SRCC. The results obtained with all distortions are also reported. It is evident from the figure that the RATER exhibits stable performance across distortion types, clearly outperforming the compared NR techniques. This experiment also indicates how important the information about the reference images is for the image ordering used to calculate SRCC. Hence, considering individual distortions, the PSNR and SSIM yield leading quality prediction accuracy for distortions such as Gaussian blur or JPEG compression. However, the RATER outperforms the other IQA measures on images corrupted by AWGN, high-frequency noise, contrast change, change of color saturation, or lossy compression of noisy images. Interestingly, all NR methods have difficulties in cases in which local distortions are considered, i.e., non-eccentricity pattern noise or local block-wise distortions of different intensity. Here, the RATER outperforms them, being more focused on local regions of an image. In general, taking into account individual distortion types, the performance of the introduced measure is comparable to those of the evaluated FR-IQA methods.

Figure 7: Performance of the methods on individual distortions in terms of SRCC.
3.5. Cross-database performance

The dataset-independence of the method is verified by a cross-dataset validation. In this experiment, the learning-based NR measures are tested on a dataset which is not used for their training. Since the numbers of distortions in the TID2013 and TID2008 datasets are similar, they are used together. Consequently, CSIQ and LIVE are also paired. The results, in terms of SRCC, are shown in Table 6. It can be observed that the RATER and HOSA exhibit similar performance. HOSA is better on LIVE if learned on CSIQ, and the RATER is better than the other measures in the case in which it is trained on LIVE and tested on CSIQ. Overall, the RATER demonstrates database independence and robustness.

Table 6: Cross-dataset performance of learning-based NR-IQA methods in terms of SRCC.

Training dataset   Testing dataset   HOSA    BRISQUE   OG-IQA   NOREQI   RATER
TID2013            TID2008           0.839   0.752     0.580    0.662    0.927
TID2008            TID2013           0.772   0.656     0.507    0.661    0.753
CSIQ               LIVE              0.904   0.689     0.830    0.770    0.858
LIVE               CSIQ              0.584   0.597     0.583    0.520    0.678
3.6. Computational complexity

In practice, the usage of an NR measure is often justified by its computational complexity. Therefore, the TID2013 dataset is used to analyze the computational complexity of each method in terms of the average time taken to assess a 512 × 384 image. The compared methods are run using their publicly available Matlab implementations. The experiments are performed on a 3.3 GHz Intel Core CPU with 16 GB RAM. Table 7 reports the obtained timings. As reported, the RATER is slower than the FR measures and BRISQUE, but it is faster than the remaining NR methods. This confirms the applicability of the RATER in systems that require a fast execution time as well as superior quality prediction accuracy. Since the perceptual features in the introduced RATER can be obtained independently for the YCbCr channels, Table 7 also contains the results for a version of the RATER which uses a parallel implementation; it is denoted by "◦". Note that the Matlab code of IL-NIQE uses such an implementation by default. In conclusion, the RATER manifests mild computational complexity. The observed speedup of its parallel implementation indicates that an efficient native implementation, using e.g. C++, would be beneficial.

Table 7: Time-cost comparison (in seconds).

PSNR    SSIM    HOSA    BPRI    BRISQUE   IL-NIQE◦   OG-IQA   NOREQI   RATER   RATER◦
0.004   0.044   0.440   0.997   0.049     8.20       3.77     0.510    0.356   0.168
4. Conclusions

Existing NR-IQA methods often rely on computational models which capture changes in the structural information of degraded images using first-order image derivatives. However, since the HVS can be modeled using up to fourth-order derivatives, in this work, a novel NR-IQA measure has been presented which uses image characteristics captured in the bilaplacian domain. To assess a distorted image, the introduced RATER employs statistics of pixel blocks which describe features detected in the bilaplacian domain of the YCbCr channels. Then, the statistics are employed to train an SVR-based quality model. The applicability of the investigated relationship between the features extracted from high-order image derivatives and image distortions to the NR-IQA is discussed in the paper. The presented technique has been thoroughly evaluated against related state-of-the-art NR methods as well as two popular FR techniques (PSNR and SSIM). The NR methods include popular hand-crafted measures and DNN-based approaches. The experimental evaluation reveals that the RATER is superior to the compared measures in terms of the visual quality prediction accuracy and achieves a short computation time. Future work will involve an investigation of the usability of approaches which provide a global image representation based on a set of detected features [44] to the NR-IQA.
The Matlab code of the RATER is publicly available at http://marosz.kia.prz.edu.pl/RATER.html.

References

[1] D. M. Chandler, Seven challenges in image quality assessment: Past, present, and future research, ISRN Signal Processing (2013) 53. doi:10.1155/2013/905685.
[2] M. Leszczuk, K. Kowalczyk, L. Janowski, Z. Papir, Lightweight implementation of no-reference (NR) perceptual quality assessment of H.264/AVC compression, Signal Process.-Image 39 (2015) 457–465, Recent Advances in Vision Modeling for Image and Video Processing. doi:10.1016/j.image.2015.05.003.
[3] S. Gabarda, G. Cristóbal, N. Goel, Anisotropic blind image quality assessment: Survey and analysis with current methods, J. Vis. Commun. Image R. 52 (2018) 101–105. doi:10.1016/j.jvcir.2018.02.008.
[4] W. Lin, C.-C. J. Kuo, Perceptual visual quality metrics: A survey, J. Vis. Commun. Image R. 22 (4) (2011) 297–312. doi:10.1016/j.jvcir.2011.01.005.
[5] Z. Wang, A. C. Bovik, H. R. Sheikh, E. P. Simoncelli, Image quality assessment: From error visibility to structural similarity, IEEE T. Image Process. 13 (4) (2004) 600–612. doi:10.1109/tip.2003.819861.
[6] A. Rehman, Z. Wang, Reduced-reference image quality assessment by structural similarity estimation, IEEE T. Image Process. 21 (8) (2012) 3378–3389. doi:10.1109/TIP.2012.2197011.
[7] A. Liu, W. Lin, M. Narwaria, Image quality assessment based on gradient similarity, IEEE T. Image Process. 21 (4) (2012) 1500–1512. doi:10.1109/tip.2011.2175935.
[8] Y. Wen, Y. Li, X. Zhang, W. Shi, L. Wang, J. Chen, A weighted full-reference image quality assessment based on visual saliency, J. Vis. Commun. Image R. 43 (2017) 119–126. doi:10.1016/j.jvcir.2016.12.005.
[9] H. R. Sheikh, A. C. Bovik, Image information and visual quality, IEEE T. Image Process. 15 (2) (2006) 430–444. doi:10.1109/TIP.2005.859378.
[10] K. Okarma, Quality assessment of images with multiple distortions using combined metrics, Elektron. Elektrotech. 20 (6) (2014) 128–131. doi:10.5755/j01.eee.20.6.7284.
[11] M. Oszust, Decision fusion for image quality assessment using an optimization approach, IEEE Signal Proc. Let. 23 (1) (2016) 65–69. doi:10.1109/LSP.2015.2500819.
[12] X. Shang, X. Zhao, Y. Ding, Image quality assessment based on joint quality-aware representation construction in multiple domains, Journal of Engineering 2018 (2018) ID 1214697. doi:10.1155/2018/1214697.
[13] J. Ospina-Borras, H. D. B. Restrepo, Non-reference assessment of sharpness in blur/noise degraded images, J. Vis. Commun. Image R. 39 (2016) 142–151. doi:10.1016/j.jvcir.2016.05.015.
[14] R. A. Manap, L. Shao, Non-distortion-specific no-reference image quality assessment: A survey, Inform. Sciences 301 (2015) 141–160. doi:10.1016/j.ins.2014.12.055.
[15] X. Min, K. Ma, K. Gu, G. Zhai, Z. Wang, W. Lin, Unified blind quality assessment of compressed natural, graphic, and screen content images, IEEE T. Image Process. 26 (11) (2017) 5462–5474. doi:10.1109/TIP.2017.2735192.
[16] A. K. Moorthy, A. C. Bovik, Blind image quality assessment: From natural scene statistics to perceptual quality, IEEE T. Image Process. 20 (12) (2011) 3350–3364. doi:10.1109/TIP.2011.2147325.
[17] M. A. Saad, A. C. Bovik, C. Charrier, Blind image quality assessment: A natural scene statistics approach in the DCT domain, IEEE T. Image Process. 21 (8) (2012) 3339–3352. doi:10.1109/TIP.2012.2191563.
[18] A. Mittal, A. K. Moorthy, A. C. Bovik, No-reference image quality assessment in the spatial domain, IEEE T. Image Process. 21 (12) (2012) 4695–4708. doi:10.1109/TIP.2012.2214050.
[19] L. Liu, Y. Hua, Q. Zhao, H. Huang, A. C. Bovik, Blind image quality assessment by relative gradient statistics and adaboosting neural network, Signal Process.-Image 40 (2016) 1–15. doi:10.1016/j.image.2015.10.005.
[20] W. Xue, X. Mou, L. Zhang, A. C. Bovik, X. Feng, Blind image quality assessment using joint statistics of gradient magnitude and Laplacian features, IEEE T. Image Process. 23 (11) (2014) 4850–4862. doi:10.1109/TIP.2014.2355716.
[21] J. Xu, P. Ye, Q. Li, H. Du, Y. Liu, D. Doermann, Blind image quality assessment based on high order statistics aggregation, IEEE T. Image Process. 25 (9) (2016) 4444–4457. doi:10.1109/TIP.2016.2585880.
[22] Q. Li, W. Lin, Y. Fang, BSD: Blind image quality assessment based on structural degradation, Neurocomputing 236 (Supplement C) (2017) 93–103. doi:10.1016/j.neucom.2016.09.105.
[23] Q. Li, W. Lin, Y. Fang, No-reference image quality assessment based on high order derivatives, in: 2016 IEEE International Conference on Multimedia and Expo (ICME), 2016, pp. 1–6. doi:10.1109/ICME.2016.7552997.
[24] S. Du, Y. Yan, Y. Ma, Blind image quality assessment with the histogram sequences of high-order local derivative patterns, Digit. Signal Process. 55 (2016) 1–12. doi:10.1016/j.dsp.2016.04.006.
[25] L. Tang, L. Li, K. Sun, Z. Xia, K. Gu, J. Qian, An efficient and effective blind camera image quality metric via modeling quaternion wavelet coefficients, J. Vis. Commun. Image R. 49 (2017) 204–212. doi:10.1016/j.jvcir.2017.09.010.
[26] M. Oszust, No-reference image quality assessment using image statistics and robust feature descriptors, IEEE Signal Proc. Let. 24 (11) (2017) 1656–1660. doi:10.1109/LSP.2017.2754539.
[27] X. Min, K. Gu, G. Zhai, J. Liu, X. Yang, C. W. Chen, Blind quality assessment based on pseudo reference image, IEEE T. Mult. PP (99) (2017) 1–1. doi:10.1109/TMM.2017.2788206.
[28] W. Xue, L. Zhang, X. Mou, Learning without human scores for blind image quality assessment, in: 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 995–1002. doi:10.1109/CVPR.2013.133.
[29] L. Zhang, L. Zhang, A. C. Bovik, A feature-enriched completely blind image quality evaluator, IEEE T. Image Process. 24 (8) (2015) 2579–2591. doi:10.1109/TIP.2015.2426416.
[30] H. Zeng, L. Zhang, A. C. Bovik, A probabilistic quality representation approach to deep blind image quality prediction, arxiv.org/abs/1708.08190.
[31] Z. Zhang, H. Wang, S. Liu, T. S. Durrani, Deep activation pooling for blind image quality assessment, Applied Sciences 8 (4). doi:10.3390/app8040478.
[32] J. Kim, S. Lee, Fully deep blind image quality predictor, IEEE J. Sel. Top. Signal 11 (1) (2017) 206–220. doi:10.1109/JSTSP.2016.2639328.
[33] K. Ma, W. Liu, T. Liu, Z. Wang, D. Tao, dipIQ: Blind image quality assessment by learning-to-rank discriminable image pairs, IEEE T. Image Process. 26 (8) (2017) 3951–3964. doi:10.1109/TIP.2017.2708503.
[34] S. Bosse, D. Maniry, T. Wiegand, W. Samek, A deep neural network for image quality assessment, in: IEEE International Conference on Image Processing (ICIP), 2016, pp. 3773–3777. doi:10.1109/ICIP.2016.7533065.
[35] K. Ghosh, S. Sarkar, K. Bhaumik, Understanding image structure from a new multi-scale representation of higher order derivative filters, Image Vision Comput. 25 (8) (2007) 1228–1238. doi:10.1016/j.imavis.2006.07.022.
[36] G. Xie, X. Sun, X. Tong, D. Nowrouzezahrai, Hierarchical diffusion curves for accurate automatic image vectorization, ACM Trans. Graph. 33 (6) (2014) 230:1–230:11. doi:10.1145/2661229.2661275.
[37] H. E. Gerhard, F. A. Wichmann, M. Bethge, How sensitive is the human visual system to the local statistics of natural images?, PLOS Computational Biology 9 (1) (2013) 1–15. doi:10.1371/journal.pcbi.1002873.
[38] W. Zhou, W. Qiu, M. W. Wu, Utilizing dictionary learning and machine learning for blind quality assessment of 3-D images, IEEE T. Broadcast. 63 (2) (2017) 404–415. doi:10.1109/TBC.2016.2638620.
[39] D. Marr, E. Hildreth, Theory of edge detection, Proceedings of the Royal Society of London Series B 207 (1980) 187–217.
[40] International Telecommunications Union, ITU-R Recommendation BT.601-5: Studio encoding parameters of digital television for standard 4:3 and wide-screen 16:9 aspect ratios (1995).
[41] X. Zhang, S. Wang, S. Ma, W. Gao, A study on interest point guided visual saliency, in: 2015 Picture Coding Symposium (PCS), 2015, pp. 307–311. doi:10.1109/PCS.2015.7170096.
[42] E. Rosten, T. Drummond, Machine learning for high-speed corner detection, in: A. Leonardis, H. Bischof, A. Pinz (Eds.), Computer Vision – ECCV 2006, Springer Berlin Heidelberg, Berlin, Heidelberg, 2006, pp. 430–443. doi:10.1007/11744023_34.
[43] N. Ponomarenko, L. Jin, O. Ieremeiev, V. Lukin, K. Egiazarian, J. Astola, B. Vozel, K. Chehdi, M. Carli, F. Battisti, C.-C. J. Kuo, Image database TID2013: Peculiarities, results and perspectives, Signal Process.-Image 30 (2015) 57–77. doi:10.1016/j.image.2014.10.009.
[44] K. E. A. v. d. Sande, C. G. M. Snoek, A. W. M. Smeulders, Fisher and VLAD with FLAIR, in: 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2377–2384. doi:10.1109/CVPR.2014.304.
[45] Y. Zhou, L. Li, J. Wu, K. Gu, W. Dong, G. Shi, Blind quality index for multiply distorted images using bi-order structure degradation and nonlocal statistics, IEEE T. Mult. (2018) 1–1. doi:10.1109/TMM.2018.2829607.
[46] C.-C. Chang, C.-J. Lin, LIBSVM: A library for support vector machines, ACM Trans. Intell. Syst. Technol. 2 (3) (2011) 1–27. doi:10.1145/1961189.1961199.
[47] N. Ponomarenko, V. Lukin, A. Zelensky, K. Egiazarian, M. Carli, F. Battisti, TID2008 - a database for evaluation of full-reference visual quality assessment metrics, Advances of Modern Radioelectronics 10 (2009) 30–45.
[48] E. C. Larson, D. M. Chandler, Most apparent distortion: full-reference image quality assessment and the role of strategy, J. Electron. Imaging 19 (1) (2010) 011006. doi:10.1117/1.3267105.
[49] D. Ghadiyaram, A. C. Bovik, Massive online crowdsourced study of subjective and objective picture quality, IEEE T. Image Process. 25 (1) (2016) 372–387. doi:10.1109/TIP.2015.2500021.
[50] D. Jayaraman, A. Mittal, A. K. Moorthy, A. C. Bovik, Objective quality assessment of multiply distorted images, in: Proc. IEEE Int. Conf. on Signals, Systems, and Computers (ASILOMAR), IEEE, 2012. doi:10.1109/acssc.2012.6489321.
[51] Video Quality Experts Group, Final report from the video quality experts group on the validation of objective models of video quality assessment, Phase II (FR-TV2), https://www.itu.int/ITUT/studygroups/com09/docs/tutorial opavc.pdf.
[52] H. Sheikh, A. Bovik, G. de Veciana, An information fidelity criterion for image quality assessment using natural scene statistics, IEEE T. Image Process. 14 (12) (2005) 2117–2128. doi:10.1109/tip.2005.859389.
[53] Q. Lu, W. Zhou, H. Li, A no-reference image sharpness metric based on structural information using sparse representation, Inform. Sciences 369 (2016) 334–346. doi:10.1016/j.ins.2016.06.042.
[54] J. Kim, H. Zeng, D. Ghadiyaram, S. Lee, L. Zhang, A. C. Bovik, Deep convolutional neural models for picture-quality prediction: Challenges and solutions to data-driven image quality assessment, IEEE Signal Proc. Mag. 34 (6) (2017) 130–141. doi:10.1109/MSP.2017.2736018.
[55] C. Fan, Y. Zhang, L. Feng, Q. Jiang, No reference image quality assessment based on multi-expert convolutional neural networks, IEEE Access 6 (2018) 8934–8943. doi:10.1109/ACCESS.2018.2802498.
[56] X. Liu, J. van de Weijer, A. D. Bagdanov, RankIQA: Learning from rankings for no-reference image quality assessment, in: International Conference on Computer Vision (ICCV), 2017.
[57] K. Ma, W. Liu, K. Zhang, Z. Duanmu, Z. Wang, W. Zuo, End-to-end blind image quality assessment using deep neural networks, IEEE T. Image Process. 27 (3) (2018) 1202–1213. doi:10.1109/TIP.2017.2774045.

Highlights

- A no-reference image quality assessment method is proposed.
- The method extracts local features from high-order derivatives of color channels.
- Statistics of pixel blocks are used to train a quality model.
- Experimental results on six image datasets reveal the effectiveness of the method.