No-reference image quality assessment with local features and high-order derivatives


Accepted manuscript, to appear in J. Vis. Commun. Image R. Received 31 May 2018; accepted 21 August 2018. doi: https://doi.org/10.1016/j.jvcir.2018.08.019

Please cite this article as: M. Oszust, No-Reference Image Quality Assessment with Local Features and High-Order Derivatives, J. Vis. Commun. Image R. (2018), doi: https://doi.org/10.1016/j.jvcir.2018.08.019


Mariusz Oszust
Department of Computer and Control Engineering, Rzeszow University of Technology, Wincentego Pola 2, 35-959 Rzeszow, Poland

Abstract

The perceptual quality of images is often affected by applied image processing techniques. Their evaluation requires tests which involve human subjects. However, in most cases, image quality assessment (IQA) should be automatic and reproducible. Therefore, in this paper, a novel no-reference IQA method is proposed. The method uses high-order derivatives to extract detailed structural deformations present in distorted images. Furthermore, it employs local features, considering that only some regions of an image carry interesting information. Then, statistics of local features are used by a support vector regression technique to provide an objective quality score. To improve the quality prediction, luminance and chrominance channels of the image are processed. Experimental results on six large-scale public IQA image datasets show that the proposed method outperforms state-of-the-art hand-crafted and deep-learning techniques in terms of the visual quality prediction accuracy. Furthermore, the method is better than popular full-reference approaches (i.e., SSIM and PSNR).

Keywords: Image quality assessment, No-reference, Local features, Support Vector Regression

Email address: [email protected] (Mariusz Oszust) URL: http://marosz.kia.prz.edu.pl (Mariusz Oszust)


1. Introduction

The main role of image quality assessment (IQA) techniques is to provide an objective and reproducible evaluation of images, aiming at the replacement of time-consuming and expensive tests with human observers. Such evaluation has become particularly important for various image processing applications, involving the quality monitoring of visual content or the development of algorithms for image/video processing [1, 2, 3]. Taking into account the availability of a distortion-free, reference image for the evaluation, objective measures are divided into full-reference (FR), reduced-reference (RR), and no-reference (NR) techniques [1, 4]. Frequently used FR and RR techniques include the peak signal-to-noise ratio (PSNR), the structural similarity index (SSIM) [5], or RR-SSIM [6]. Apart from structural image information, FR measures also employ contrast changes [7], visual saliency maps [8], or image statistical properties [9]. Furthermore, some measures combine other FR approaches [10, 11] or fuse complementary features [12].

The NR-IQA measures can be divided into distortion-specific and general-purpose methods. The techniques which belong to the first category are designed to evaluate images corrupted by specific distortion types, such as blurriness, contrast change, compression, or noise [13, 14, 15]. The general-purpose methods, in turn, address a variety of distortion types. The focus of this paper is general-purpose NR-IQA. It is worth noticing that the development of such measures is both the most challenging and the most desirable, since pristine images are seldom available in practical applications. In a typical NR measure, a vector of perceptual features is mapped into subjective scores using a regression technique in order to obtain a quality model. For example, in DIIVINE [16], a distortion type is predicted and then the image quality is estimated. In BLIINDS-II [17], a generalized natural scene statistics (NSS) model of local discrete cosine transform coefficients is used with a Bayesian inference approach. The BRISQUE [18], in turn, uses pairwise products of neighboring luminance values to train a support vector regression (SVR) model. In the OG-IQA index [19], an AdaBoosting backpropagation neural network learns image gradient orientations for the IQA. A gradient magnitude map and the Laplacian of Gaussian (LOG) response are used with the SVR in GM-LOG [20]. A more complex approach can be found in HOSA [21], in which K-means clustering of normalized image patches is used to create a codebook based on low- and high-order statistics. The HOSA describes an image using 14,700 features and uses statistical differences between a codebook and images for the IQA. The Local Binary Patterns (LBPs) extracted from the first- and second-order image structures are used in BSD [22]. In BHOD [23], in turn, the SVR learns LBP histograms obtained for up to fourth-order gradient maps. The usage of high-order information for the NR-IQA can also be found in HOLDPM, which extracts local structures with the high-order local derivative pattern (LDP) [24]. Here, the authors report that LBP-based first-order measures are incapable of extracting the high-order information. In HOLDPM, the LOG is used for scale-space decomposition. Then, histograms extracted from the second-order LDPs are used to train a regression model. In CIQM [25], which is based on the Quaternion Wavelet Transform (QWT), the magnitude and entropy of the subband QWT coefficients and natural scene statistics of the third phase are used.

Statistics for an assessed image, an equivalent of the image filtered with Prewitt operators, and descriptors obtained with Speeded-Up Robust Features (SURF) for the processed images are mapped into subjective ratings using the SVR in NOREQI [26]. In UCA [15], which is dedicated to block-based image and video compression, the Shi-Tomasi corner detection technique is used to identify statistical differences among corners and edges between block boundaries. Similarly, the BPRI [27] finds interest points in compressed images to determine their quality based on corners which are also detected in an introduced pseudo-reference image. Then, both images are compared using blockiness, sharpness, and noisiness metrics. In that measure, the LBP descriptor characterizes local structures for the sharpness and noisiness metrics. The QAC [28], which similarly to BPRI does not require a training step, incorporates a set of centroids of quality levels which belong to four distortion types. Another technique of this type, IL-NIQE [29], uses a multivariate Gaussian model to describe natural image statistics derived from multiple cues.

With the development of deep neural network (DNN) approaches, many new NR measures have been introduced. Most of them rely on architectures inherited from other vision-based tasks which are adapted to the NR-IQA [30, 31]. Due to the lack of sufficient training samples, such measures use FR-IQA methods to provide an approximation of subjective scores [32, 33]. Alternatively, image patches are employed [31, 32, 34]. Despite encouraging results, these methods suffer from complex architectures which often require dedicated hardware for efficient use, and their black-box representations are difficult to interpret. Consequently, the evaluation of their performance is often limited to small subsets of images from popular IQA benchmarks or to a few training-testing experiments.

The human visual system (HVS) is adapted to extract structural information for understanding the visual content of an image [5, 14, 22]. Therefore, many NR-IQA approaches incorporate first-order gradient information [23]. However, edges and lines are not sufficient to capture the essential information regarding a region in an image [35]. To address this issue, some works employ high-order derivative patterns or high-order derivative magnitude maps [23, 24]. Despite the mimicked physical phenomenon, better results are obtained via extraction of a large number of features, as reported for the HOSA [21]. However, it can be stated that the application of high-order image derivatives to the NR-IQA remains largely uninvestigated. In this work, a novel IQA method is proposed in which such derivatives are incorporated. Since the bilaplacian captures sharp boundaries and high-order smooth image variations, it has been successfully used for image vectorization [36]. Moreover, the Laplace and biharmonic operators have a high potential for image inpainting since they enhance the structural information. Consequently, the Laplacian is often used for edge detection with an additional Gaussian filtering to suppress its sensitivity to noise. Taking these findings into account, it is worth investigating whether the usage of high-order derivatives (i.e., the Laplacian and the bilaplacian) can be beneficial to the IQA. Unlike most of the related works, in which all pixels of a distorted image are processed, the proposed NR measure incorporates an interest point detection technique to indicate image parts that should be described. The interest point detection, or feature detection, method mimics the attention of the HVS drawn by visually attractive image regions. In this work, features are detected in the bilaplacian domain obtained for luminance and chrominance channels of an image. Then, statistics of pixel blocks describing the detected interest points are used as perceptual features to train a quality model. Here, the SVR is applied to model a nonlinear relationship between perceptual features and quality scores. The main contribution of this study concerns the application of local features in the bilaplacian domain for the NR quality prediction of color images.

The experimental results on six large-scale popular IQA benchmark datasets are encouraging and show that the proposed NR-IQA technique estimates the perceptual quality of distorted images better than the related hand-crafted and DNN-based state-of-the-art measures. It also outperforms popular FR measures, i.e., PSNR and SSIM, on the most demanding datasets.

The rest of the paper is organized as follows. The next section describes the proposed approach. Section 3, in turn, explains the evaluation protocols, introduces public IQA benchmark datasets, and covers the extensive comparative evaluation of the introduced method against other IQA techniques. Finally, Section 4 concludes the paper.

2. Proposed method

In this section, the proposed NR-IQA method is described. As illustrated in its block diagram (Fig. 1), the introduced technique processes the luminance and chrominance channels of an assessed image. Specifically, for each channel, an interest point detection method indicates regions in the bilaplacian domain (Δ²). Then, statistics of pixel blocks of these regions are calculated and used for training the SVR to obtain an image quality model.

Figure 1: Flowchart of the proposed NR-IQA method.

2.1. High-order image derivatives

The HVS is sensitive to changes in the regularities of natural images caused by noise [37]. Hence, the subjective perception of an image depends on local semantic structural information forming primitives in the visual cortex [38]. Accordingly, many IQA approaches use first-order derivatives to capture the information carried by edges and lines. However, geometric properties and more discriminative information can be captured using high-order derivatives. This is consistent with models of the receptive fields in the visual cortex which employ up to fourth-order image derivatives [35]. Therefore, it can be assumed that an application of high-order derivatives to the IQA can lead to a significant improvement in the image quality prediction performance.

First-order edge detection algorithms provide locations of image intensity changes, which are identified by peaks in the gradient domain. All image operators which involve differentiation are noise-sensitive. Consequently, high-order derivatives enhance noisy areas and edges. Therefore, an image is often smoothed by a Gaussian function before the Laplacian is applied to find edges [39]. However, taking into account a possible application of the Laplacian to the IQA, such smoothing can be seen as an additional and unwanted distortion.

In the introduced method, images converted into the YCbCr color space are used. This color space is recommended in ITU-R BT.601 [40] for video broadcasting. Consequently, the efficient use of channel bandwidth is achieved by reducing the bandwidth of the chrominance components, which provide considerably less perceptual information than the luminance channel.

The directional derivatives of a grayscale image I are written as ∂I/∂x and ∂I/∂y; I can also represent a single YCbCr channel. The Laplacian is then expressed as Δ = ∂²I/∂x² + ∂²I/∂y². The finite difference approximation of the horizontal second-order partial derivative ∂²/∂x² can be written using the following kernel (or mask): ∂²/∂x² ≈ [1, −2, 1]. The transposition of this kernel leads to the approximation of ∂²/∂y². Then, the respective horizontal and vertical masks can be used to obtain the following popular Laplacian kernels:

$$
\Delta_1 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix},\quad
\Delta_2 = \begin{bmatrix} 1 & -2 & 1 \\ -2 & 4 & -2 \\ 1 & -2 & 1 \end{bmatrix},\quad
\Delta_3 = \begin{bmatrix} 1 & 0 & 1 \\ 0 & -4 & 0 \\ 1 & 0 & 1 \end{bmatrix},\quad
\Delta_4 = \begin{bmatrix} -2 & 1 & -2 \\ 1 & 4 & 1 \\ -2 & 1 & -2 \end{bmatrix}.
\tag{1}
$$

As shown in Eq. (1), masks Δ₂ and Δ₃ highlight diagonal edges. Consequently, a kernel which includes the diagonals can be calculated by combining either Δ₁ with Δ₃ or Δ₂ with Δ₄:

$$
\Delta_5 = \begin{bmatrix} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{bmatrix}.
\tag{2}
$$

The Laplacian is applied to the image using convolution. The bilaplacian Δ², in turn, is obtained as:

$$
\Delta^2 = \Delta\Delta = \frac{\partial^4}{\partial x^4} + 2\,\frac{\partial^4}{\partial x^2 \partial y^2} + \frac{\partial^4}{\partial y^4}.
\tag{3}
$$

To provide kernels for the bilaplacian operator, the Laplacian masks are combined. Then, an image is transformed to the bilaplacian domain by convolving it with two Laplacian kernels. This can be written as Δ²ᵢⱼ ∗ I = Δᵢ ∗ Δⱼ ∗ I, where i, j ∈ {1, 2, . . . , 5} and "∗" denotes the convolution. To reduce the number of investigated bilaplacian kernels, only Laplacian masks with i = j and masks which are complementary (j = i + 2, i ∈ {1, 2}; see Eq. (1)) are combined. Finally, the following bilaplacian masks are considered: Δ²₁₁, Δ²₂₂, Δ²₃₃, Δ²₄₄, Δ²₅₅, Δ²₁₃, and Δ²₂₄.
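To make this construction concrete, the following minimal sketch (an illustration in Python, not the author's Matlab implementation) builds the Δ²₂₄ kernel by convolving the Laplacian masks Δ₂ and Δ₄ from Eq. (1) and applies it to a single YCbCr channel; the border handling is an assumption of this sketch.

```python
# Illustrative sketch: bilaplacian kernel Delta^2_24 = Delta_2 * Delta_4 (Eq. (1)-(3)).
import numpy as np
from scipy.signal import convolve2d

# Laplacian masks Delta_2 and Delta_4 from Eq. (1)
D2 = np.array([[ 1, -2,  1],
               [-2,  4, -2],
               [ 1, -2,  1]], dtype=float)
D4 = np.array([[-2,  1, -2],
               [ 1,  4,  1],
               [-2,  1, -2]], dtype=float)

# Bilaplacian mask: convolution of the two Laplacian masks (yields a 5x5 kernel)
D2_24 = convolve2d(D2, D4, mode="full")

def bilaplacian_response(channel):
    """Convolve one luminance or chrominance channel with Delta^2_24."""
    # Symmetric border extension is an assumption; the paper does not specify it.
    return convolve2d(channel.astype(float), D2_24, mode="same", boundary="symm")
```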

2.2. Local feature detector

A keypoint (feature, corner, or interest point) is a stable group of pixels which characterizes an image region despite image distortions or transformations. Since human eyes tend to focus on interesting image regions, feature detectors can be used for the prediction of visual saliency [41]. In the proposed NR method, the Features from Accelerated Segment Test (FAST) [42] technique indicates image regions in the bilaplacian domain for further processing. In other words, it provides a list of keypoints for the image Δ²ᵢⱼ ∗ I. In the FAST, a circle of 16 pixels is used to determine whether the center pixel is a corner. Specifically, a feature is detected if 12 contiguous pixels on the circle are significantly darker or brighter than the potential corner. To speed up computations, only several pixels (1, 5, 9, and 13) are checked first. To address the problems with the ordering of the selected pixels for the comparison and with their number, a decision tree is trained on a large number of images with keypoints [42]. For the selection of the features with the strongest response, a non-maximum suppression based on a threshold is applied. The FAST is up to two orders of magnitude faster than other popular feature detectors and identifies considerably stable corners [42].

Figure 2: YCbCr channels of an exemplary image and the images resulting from their convolution with Laplacian and bilaplacian kernels. The bottom row contains interest points detected by the FAST technique in the bilaplacian domain (shown as green dots).

Figure 2 presents an exemplary image from the TID2013 dataset [43], its YCbCr components, as well as the images obtained by convolving the channels with two Laplacian masks (i.e., Δ₂ and Δ₄) and the bilaplacian kernel Δ²₂₄. The figure also shows FAST keypoints detected in the bilaplacian domain. As can be seen, the bilaplacian domain seems to provide more information about the image than its second-order derivatives. Interestingly, FAST detects interest points in different image areas of the channels, suggesting that the channels in the bilaplacian domain contain complementary information and should be used together to describe the visual content.
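As a rough illustration of this detection step (a sketch under stated assumptions, not the reference implementation), OpenCV's FAST detector can be run on the bilaplacian response; mapping the signed response to 8-bit values and the detector threshold are assumptions made here, since the paper does not specify them.

```python
# Illustrative sketch: FAST keypoints in the bilaplacian domain using OpenCV.
import cv2
import numpy as np

def fast_keypoints(bilap, n_strongest=2000, threshold=20):
    # Rescale the signed bilaplacian response to [0, 255] for the 8-bit detector
    # (assumption: the original Matlab pipeline may handle this differently).
    img = cv2.normalize(bilap, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    fast = cv2.FastFeatureDetector_create(threshold=threshold, nonmaxSuppression=True)
    kps = fast.detect(img, None)
    # Keep up to n_strongest keypoints with the largest response (the parameter N).
    kps = sorted(kps, key=lambda k: k.response, reverse=True)[:n_strongest]
    return [(int(k.pt[0]), int(k.pt[1])) for k in kps]   # (x, y) positions
```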

2.3. Quality prediction

In the proposed method, an input RGB image is transformed into the YCbCr color space. Here, each channel (Y, Cb, or Cr), denoted as I, is independently processed. Consequently, I is convolved with the bilaplacian mask and the image Δ²ᵢⱼ(I) = Δᵢ ∗ Δⱼ ∗ I is obtained. In the next step, the FAST detector indicates N regions in the bilaplacian domain to be described; the parameter N is used to discard the keypoints with the weakest response, which leads to a better IQA model with the proposed approach, as will be shown in the next section. Each n-th keypoint (n = 1, 2, . . . , N, where N does not exceed the total number of keypoints detected in the image) is described with an M × M pixel block centered at its location. The block is used as a feature descriptor since the characterized region contains rich information about intensity changes. In order to assess the quality of an image, the list of N blocks must be transformed into a perceptual feature vector used as the input to the SVR. Here, different approaches can be used, based either on the positions of pixels in all blocks or on the keypoints described with such blocks. For example, the pixel positions contributing most to the variability of the data in all blocks can be found using Principal Component Analysis (PCA) and then characterized with some statistics. Furthermore, approaches which are commonly used to provide a representation of an image based on described keypoints, such as the bag of visual words, Fisher vectors, or VLAD, can be applied [44]. However, to provide an NR-IQA measure with a reasonably low complexity, without a sophisticated intermediate representation, the global description of Δ²ᵢⱼ(I) is obtained using the following statistics. Here, for each m-th pixel position (m = 1, 2, . . . , M²), the sample mean μ(D), the standard deviation σ(D), and the histogram variance hvar(D) are calculated, where D denotes all m-th pixels which describe the N keypoints of Δ²ᵢⱼ(I). The histogram variance is obtained as [19]:

$$
\mathrm{hvar}(D) = \sum_{D} \left( h(D) - \mu(D) \right)^{2}.
\tag{4}
$$

Finally, the perceptual feature vector that contains the statistics of pixel blocks is obtained. Since distortions affect images across scales [18], the introduced method also provides statistics for the assessed image downsampled by a factor of two. The resulting feature vector has 18M² dimensions (three statistics and M² pixel positions per channel, computed for three channels at two scales). However, in the experiments on the MLIVE dataset, four scales are used, as suggested in works devoted to the quality prediction of images corrupted by multiple distortions [45]. To obtain a quality model, the popular SVR technique with the radial basis function (RBF) kernel is employed [46]. It learns the mapping from the feature space to subjective scores. Before the SVR is applied, the statistics are linearly scaled to the range [0, 1]; the min-max normalization is used to scale the training and testing subsets of the data.
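The statistics can be sketched as follows (a simplified Python illustration rather than the author's implementation). The 256-bin histogram follows the setting reported for the sensitivity experiment later in the paper; the reading of Eq. (4) as the spread of histogram bin counts and the reflective border padding are assumptions of this sketch.

```python
# Simplified sketch of the per-position statistics for one channel and one scale.
import numpy as np

def hvar(values, bins=256):
    # One reading of Eq. (4): spread of the histogram counts around their mean;
    # the exact normalization used in the paper may differ.
    h, _ = np.histogram(values, bins=bins)
    return float(np.sum((h - h.mean()) ** 2))

def channel_features(bilap, keypoints, M=15):
    r = M // 2
    padded = np.pad(bilap, r, mode="reflect")        # keep blocks inside the image
    blocks = np.stack([padded[y:y + M, x:x + M]      # (x, y) keypoint coordinates
                       for (x, y) in keypoints])     # shape: (N, M, M)
    D = blocks.reshape(len(keypoints), -1)           # column m: m-th pixel of every block
    mu = D.mean(axis=0)                              # M^2 sample means
    sigma = D.std(axis=0)                            # M^2 standard deviations
    hv = np.array([hvar(D[:, m]) for m in range(D.shape[1])])
    return np.concatenate([mu, sigma, hv])           # 3 * M^2 values

# Concatenating these statistics over the Y, Cb, and Cr channels and over two
# scales (the image and its half-resolution version) yields the 18 M^2
# dimensional perceptual feature vector fed to the SVR.
```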

Influence of distortions on the bilaplacian. Figure 3 illustrates the impact of different distortion types and their severity on images convolved with an exemplary bilaplacian kernel (Δ²₂₄). The first row contains a pristine image from the TID2013 dataset and a small region which is magnified to show more details. The distorted regions belong to images corrupted by Gaussian noise and JPEG2K compression. As illustrated, different distortions modify the pristine image in their unique ways, and their severity can be discriminated in the bilaplacian domain. It is worth noticing how the neighborhood of the magnified object is affected by the considered distortions. Specifically, the Gaussian noise seems to be easily detected since it adds more information captured in the bilaplacian domain, and the amount of this information is associated with the severity of the distortion. For the JPEG2K compression, the introduced artifacts make the intensity changes in the magnified object less distinctive, depending on the compression rate. In general, the luminance component seems to carry more information than the chrominance channels of the image. This is consistent with the recommendation regarding the applicability of the YCbCr color space [40]. Consequently, the application of YCbCr and the bilaplacian domain to the IQA seems justified.

Figure 3: Influence of distortions seen in YCbCr channels convolved with the bilaplacian kernel Δ²₂₄. The pristine image is shown in the first two rows. The remaining rows contain a selected region in images distorted by Gaussian noise (3rd and 4th rows) and JPEG2K compression (5th and 6th rows). The images in even rows are more severely distorted than the images in odd rows.

Sensitivity of statistics to distortions. To investigate more thoroughly whether the statistics of pixel blocks for features detected in the bilaplacian domain are suitable for the IQA, the following experiment has been carried out. In the experiment, the Spearman's Rank Correlation Coefficient (SRCC) between the subjective scores for images from the LIVE dataset [5] and the statistics obtained with the proposed method is used for their evaluation. Here, the mask Δ²₂₄ is used. The technique employs 256 bins for the calculation of the histogram variance hvar and processes up to N = 2000 FAST features per image in the bilaplacian domain, where each feature is described by a 15×15 pixel block (M = 15). The results are presented in Table 1 in terms of the mean of the SRCC values obtained for the statistics which describe a given YCbCr channel. The SRCC performance is shown considering the entire dataset or images distorted by five distortion types; the best three values are written in bold. As reported, the luminance channel carries most of the perceptual information, while the chrominance channels play a supportive role. The results for the statistics show that the hvar is clearly the best performing quality feature. However, for the quality prediction of images corrupted by white Gaussian noise (WN) or Gaussian blur (GB), the other statistics exhibit a competitive SRCC performance. The employed statistics for the cumulative description of pixel blocks seem to complement each other, and the observed diversity of their performances encourages their joint usage. Therefore, they are used to train the SVR in order to obtain the quality model (cf. Fig. 1).

Table 1: Mean SRCC performance of the used statistics for YCbCr channels on the LIVE dataset.

| Dist. type | Y: μ | Y: σ | Y: hvar | Cb: μ | Cb: σ | Cb: hvar | Cr: μ | Cr: σ | Cr: hvar |
|---|---|---|---|---|---|---|---|---|---|
| JP2K | 0.7966 | 0.7162 | 0.8344 | 0.4406 | 0.3988 | 0.5037 | 0.4371 | 0.3971 | 0.4921 |
| JPEG | 0.8114 | 0.6348 | 0.9542 | 0.1230 | 0.2314 | 0.8078 | 0.2707 | 0.0622 | 0.7929 |
| WN | 0.9764 | 0.9733 | 0.9813 | 0.9842 | 0.9843 | 0.9355 | 0.9858 | 0.9854 | 0.9484 |
| GB | 0.9067 | 0.8984 | 0.9211 | 0.8333 | 0.7099 | 0.8119 | 0.8143 | 0.5980 | 0.7985 |
| FF | 0.7332 | 0.7267 | 0.7633 | 0.3454 | 0.3404 | 0.3587 | 0.4133 | 0.4066 | 0.4627 |
| All | 0.2898 | 0.2290 | 0.8303 | 0.0932 | 0.0023 | 0.5702 | 0.1387 | 0.0230 | 0.5909 |

3. Experimental results and discussion

In this section, the introduced NR-IQA method, which applies stATistics of pixel blocks of local fEatuRes detected in the bilaplacian domain of YCbCr channels (RATER), is compared against state-of-the-art NR techniques and two popular FR-IQA methods.

3.1. IQA Datasets and Evaluation Protocol

The quality prediction performance of NR-IQA measures can be evaluated using IQA benchmark datasets. Each dataset contains reference images, distorted images, and subjective scores obtained in tests with human subjects. Subjective ratings are denoted as mean opinion scores (MOS) or difference MOS (DMOS). In this work, the following six publicly available IQA datasets are used: (i) TID2013 [43], (ii) TID2008 [47], (iii) CSIQ [48], (iv) LIVE [5], (v) LIVE In the Wild Image Quality Challenge (LIVE WIQC) [49], and (vi) LIVE Multiply Distorted Image Quality Database (MLIVE) [50].

The LIVE dataset contains 29 reference images corrupted by the following five distortion types at various levels: JPEG compression, JPEG2K compression, Gaussian blur, white noise (AWGN), and a simulated fast fading Rayleigh channel. There are 779 distorted images in this dataset [5]. The CSIQ contains 30 reference images and 866 images corrupted by six types of distortion with up to four or five distortion levels. The distortions used in this dataset are JPEG compression, JPEG2K compression, Gaussian blur, global contrast decrements, and additive pink Gaussian noise. The TID2008 is two times larger than the CSIQ and contains 1700 images distorted by 17 distortion types. There are 25 reference images in this dataset and four distortion levels for each distortion type. The TID2013 contains 3000 distorted images, 24 distortion types, and five levels of distortions. This dataset is considered the most challenging IQA benchmark due to its size and the diversity of distortion types. The experimental evaluation presented in this paper also contains tests on the LIVE WIQC dataset [49]. This dataset contains 1162 images captured by mobile camera devices which are corrupted by multiple distortions. It is worth noticing that the LIVE WIQC does not contain reference images and that all tests with human observers were performed in an uncontrolled manner using the Amazon Mechanical Turk. The MLIVE dataset contains 450 images distorted with Gaussian blur followed by JPEG compression and Gaussian blur followed by Gaussian noise. Interestingly, TID2013 also contains examples of multiple distortions (e.g., the lossy compression of noisy images).

The performance of a given objective measure is evaluated using the SRCC, the Kendall Rank order Correlation Coefficient (KRCC), the Pearson Correlation Coefficient (PCC), and the Root Mean Square Error (RMSE). As recommended by the Video Quality Experts Group [51], the nonlinear relationship between subjective and predicted scores can be taken into account by the application of a nonlinear logistic regression before the PCC and RMSE are calculated [52]:

$$
Q_p = \beta_1 \left( \frac{1}{2} - \frac{1}{1 + \exp\left(\beta_2 (Q - \beta_3)\right)} \right) + \beta_4 Q + \beta_5,
\tag{5}
$$

where Q is the predicted score, Qₚ is the fitted score, and β = [β₁, β₂, . . . , β₅] are parameters determined by the regression.

The SRCC is obtained as:

$$
\mathrm{SRCC}(Q, S) = 1 - \frac{6 \sum_{i=1}^{m} d_i^2}{m (m^2 - 1)},
\tag{6}
$$

where dᵢ is the difference between the ranks of the i-th image in Q and S, and m is the total number of images. The KRCC uses the numbers of concordant and discordant pairs in the dataset, m_c and m_d, respectively:

$$
\mathrm{KRCC}(Q, S) = \frac{m_c - m_d}{\frac{1}{2}\, m (m - 1)}.
\tag{7}
$$

The PCC is calculated as:

$$
\mathrm{PCC}(Q_p, S) = \frac{\bar{Q}_p^{T} \bar{S}}{\sqrt{\bar{Q}_p^{T} \bar{Q}_p \; \bar{S}^{T} \bar{S}}},
\tag{8}
$$

where the mean-removed vectors are denoted as $\bar{Q}_p$ and $\bar{S}$. The RMSE, in turn, is written as:

$$
\mathrm{RMSE}(Q_p, S) = \sqrt{\frac{(Q_p - S)^{T} (Q_p - S)}{m}}.
\tag{9}
$$

A better IQA measure is characterized by a smaller RMSE and larger SRCC, KRCC, and PCC in comparison to other IQA methods.
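For illustration, the criteria above can be computed as in the following sketch (assuming a SciPy environment; the starting point for the β parameters is an assumption, as the paper does not report one).

```python
# Sketch of the evaluation criteria of Eqs. (5)-(9).
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import spearmanr, kendalltau, pearsonr

def logistic(q, b1, b2, b3, b4, b5):
    # Eq. (5): five-parameter logistic mapping of the objective scores Q.
    return b1 * (0.5 - 1.0 / (1.0 + np.exp(b2 * (q - b3)))) + b4 * q + b5

def evaluate(Q, S):
    """Q: objective (predicted) scores, S: subjective scores (NumPy arrays)."""
    p0 = [np.max(S), 1.0, np.mean(Q), 0.1, 0.1]       # rough starting point (assumption)
    beta, _ = curve_fit(logistic, Q, S, p0=p0, maxfev=20000)
    Qp = logistic(Q, *beta)                           # fitted scores
    srcc = spearmanr(Q, S).correlation                # Eq. (6)
    krcc = kendalltau(Q, S).correlation               # Eq. (7)
    pcc = pearsonr(Qp, S)[0]                          # Eq. (8)
    rmse = np.sqrt(np.mean((Qp - S) ** 2))            # Eq. (9)
    return srcc, krcc, pcc, rmse
```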

Since the presented NR technique requires training, as do many other learning-based methods evaluated in further sections of this paper, the SVR model used for the visual quality prediction is obtained using a widely accepted protocol. In the protocol, each IQA benchmark dataset is divided into disjoint learning and testing subsets, i.e., all distorted images which belong to 80% of the reference images are used for training and the remaining 20% of the images are used in tests [21, 27]. The performance of an IQA measure is reported as the median values of SRCC, KRCC, PCC, and RMSE over 100 random training-testing iterations [53]. In order to avoid bias and to fairly evaluate methods using the protocol, all learning-based techniques are always run on the same 100 subsets. This protocol is also used to evaluate the influence of the parameters of the RATER on its performance. The SVR models which map the objective scores into subjective ratings for the RATER and the other methods are obtained using the LIBSVM library [46], aiming at their best performance [20, 29].
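A compact sketch of this protocol is given below (scikit-learn's SVR stands in for LIBSVM, and the SVR hyper-parameters are assumptions of the sketch; the paper tunes them for best performance).

```python
# Sketch of the 80%/20% split-by-reference protocol with median SRCC over 100 runs.
import numpy as np
from scipy.stats import spearmanr
from sklearn.svm import SVR

def protocol_srcc(features, mos, ref_ids, iterations=100, seed=0):
    rng = np.random.default_rng(seed)
    refs = np.unique(ref_ids)
    scores = []
    for _ in range(iterations):
        train_refs = rng.choice(refs, size=int(0.8 * len(refs)), replace=False)
        tr = np.isin(ref_ids, train_refs)                # split by reference image
        lo, hi = features[tr].min(axis=0), features[tr].max(axis=0)
        scale = lambda x: (x - lo) / (hi - lo + 1e-12)   # min-max scaling to [0, 1]
        model = SVR(kernel="rbf", C=100.0, gamma="scale")   # hyper-parameters assumed
        model.fit(scale(features[tr]), mos[tr])
        pred = model.predict(scale(features[~tr]))
        scores.append(spearmanr(pred, mos[~tr]).correlation)
    return float(np.median(scores))
```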

Figure 4: SRCC performance of the proposed method with the considered Laplacian and bilaplacian kernels on TID2013 and LIVE datasets. The performance with raw YCbCr images is also shown.

3.2. Implementation details and feature analysis

The effectiveness of the introduced method, based on statistics of features detected and described in the high-order domain, and the influence of the parameters of the method on its performance are investigated. Figure 4 shows the SRCC performance of the method in the Laplacian and bilaplacian domains obtained with different kernels, as well as with raw YCbCr images. Here, TID2013 and LIVE are used. The SRCC is shown since the other performance criteria (KRCC, PCC, and RMSE) lead to similar findings. The RATER is run with M = 15 and with N equal to the number of all detected keypoints (i.e., without keypoint reduction). As illustrated, the RATER with bilaplacian kernels achieves greater SRCC values than those observed for the Laplacian domain or raw YCbCr images. Among the bilaplacian kernels, the mask Δ²₂₄ provides the best performance, followed by the Δ²₂₂ and Δ²₄₄ kernels. The results for the Laplacian masks Δ₂ and Δ₄ are also promising. Since these masks perform better than the others and are complementary, their combination (i.e., Δ²₂₄) is beneficial to the IQA. The inferior prediction performance of the method using raw color images justifies the proposed application of high-order derivatives. In the experiments presented in the remaining parts of this paper, the bilaplacian mask Δ²₂₄ is employed.

Figure 5: Influence of the size of the pixel block M (a) and the maximum number of processed keypoints per image N (b) on the performance of the RATER, in terms of SRCC on the TID2013 and LIVE benchmarks.

Apart from the bilaplacian mask, the impact of the parameters M and N also requires investigation. Therefore, Fig. 5 presents the performance of the method on TID2013 and LIVE while varying the size of the pixel block M and the number of keypoints per assessed image N. As reported, the RATER is not sensitive to changes of these two parameters. For M = 15, the pixel block is two times larger than the region used for the interest point detection with FAST; therefore, this value is used in the proposed method. To reduce the number of possible test configurations, M = 15 is used in the experiments with the parameter N. Since N = 2000 reflects a typical number of features per image applied in image matching and influences the speed of the image description, it is used in further experiments with the RATER.

Since the introduced technique incorporates three statistics, Table 2 presents their contribution to the SRCC performance on LIVE and TID2013. The table also contains the values obtained for separate color components as well as for the RGB and HSV color spaces. As shown in the table and discussed in Section 2.3, the hvar is the most contributing perceptual feature used in the RATER. However, the use of all three statistics leads to the best results on TID2013. Taking into account the results for the color channels, it is evident that the luminance channel carries more perceptual information than the chrominance channels. The performance difference observed for the channels is smaller for the TID2013 dataset. This can also be attributed to its large number of distortions and their levels, including distortions affecting the perceptual quality of color images. Consequently, the quality of images in this dataset is better predicted by the RATER using the color information. The RATER with the RGB color space performs better than with the HSV, but it is clearly outperformed by the variant in which the YCbCr color space is employed. To show that the RATER achieves the best performance with FAST keypoints, the results with other popular keypoint detectors are also reported in Table 2. The suitability of the FAST detector comes from the fact that it decides whether an image region contains a corner taking into account the values of neighboring pixels in the bilaplacian domain, while the other methods apply blurring to the region to find features. It is also faster than these techniques.

Table 2: Contribution of statistics, color spaces, or methods for feature detection in images to the SRCC performance of the RATER.

| Test | LIVE | TID2013 |
|---|---|---|
| Statistic: μ | 0.9268 | 0.7173 |
| Statistic: σ | 0.9111 | 0.6965 |
| Statistic: hvar | 0.9364 | 0.7801 |
| Channel: Y | 0.9348 | 0.6934 |
| Channel: Cb | 0.7971 | 0.6725 |
| Channel: Cr | 0.7966 | 0.6755 |
| All | 0.9422 | 0.8269 |
| Color space: RGB | 0.9387 | 0.7338 |
| Color space: HSV | 0.8902 | 0.6729 |
| Color space: YCbCr | 0.9422 | 0.8269 |
| Feature detector: Shi-Tomasi | 0.9276 | 0.7017 |
| Feature detector: Harris-Stephens | 0.9285 | 0.6827 |
| Feature detector: SURF | 0.8770 | 0.7177 |
| Feature detector: FAST | 0.9422 | 0.8269 |

3.3. Performance on IQA datasets

The introduced NR measure is compared with the following state-of-the-art measures: HOSA [21], BPRI [27], BRISQUE [18], IL-NIQE [29], OG-IQA [19], and NOREQI [26]. Furthermore, PSNR and SSIM [5], as the most popular full-reference IQA measures, are added to the comparative evaluation. The NOREQI and BPRI are recently introduced general-purpose measures which incorporate a feature detection step to facilitate the quality prediction. The IL-NIQE and HOSA are devoted to the assessment of color images, and they outperform BLIINDS2, DIIVINE, CORNIA, NIQE, BRISQUE, and QAC [21, 29]. The HOSA is also better than GM-LOG and IL-NIQE [21]. Since BPRI, IL-NIQE, PSNR, and SSIM do not require a learning step, their performance is evaluated using the testing subsets defined according to the applied protocol (see Section 3.1) [29].

The median values of SRCC, KRCC, PCC, and RMSE obtained for the compared IQA measures on the six datasets are reported in Table 3. In the table, the best performing IQA measure is written in italics and the best performing NR-IQA technique is written in bold. As demonstrated, the RATER outperforms all compared IQA measures on TID2013, TID2008, and CSIQ. The large difference between the results obtained by the RATER and the second best technique (i.e., SSIM) is encouraging and justifies the application of local features in the bilaplacian domain to the NR-IQA. On the LIVE dataset, the RATER is the best performing NR measure, with values of the evaluation criteria within 1% of the results obtained by the SSIM. Since LIVE WIQC does not contain reference images, FR measures cannot be evaluated on this dataset. Here, the best results are obtained by the RATER and NOREQI. On MLIVE, in turn, the IL-NIQE, RATER, and BRISQUE exhibit a similar level of performance. Table 3 also contains overall values calculated as the average and the weighted average. For the weighted average, the number of images in a database is used as its weight. The overall values are not reported for the RMSE due to the different range of DMOS values for the LIVE datasets. The overall results reveal the superiority of the introduced NR-IQA technique. Among the remaining NR measures, the HOSA exhibits acceptable performance, and BPRI seems to be more suitable for less diverse datasets due to distortion-specific steps in its implementation.

Table 3: Performance of the evaluated methods on six IQA datasets.

| Dataset | Criterion | PSNR | SSIM | HOSA | BPRI | BRISQUE | IL-NIQE | OG-IQA | NOREQI | RATER |
|---|---|---|---|---|---|---|---|---|---|---|
| TID2013 (3000 images) | SRCC | 0.6344 | 0.7423 | 0.7132 | 0.2222 | 0.5551 | 0.5126 | 0.4855 | 0.5565 | 0.8269 |
| | KRCC | 0.4667 | 0.5631 | 0.5392 | 0.1527 | 0.3988 | 0.3631 | 0.3473 | 0.3930 | 0.6411 |
| | PCC | 0.6983 | 0.7947 | 0.7823 | 0.4660 | 0.6486 | 0.6307 | 0.6228 | 0.6556 | 0.8409 |
| | RMSE | 0.8858 | 0.7468 | 0.7734 | 1.0946 | 0.9422 | 0.9679 | 0.9712 | 0.9361 | 0.6703 |
| TID2008 (1700 images) | SRCC | 0.5539 | 0.7788 | 0.7732 | 0.1825 | 0.6066 | 0.1510 | 0.5802 | 0.6203 | 0.8257 |
| | KRCC | 0.4039 | 0.5831 | 0.5935 | 0.1291 | 0.4423 | 0.1005 | 0.4191 | 0.4504 | 0.6496 |
| | PCC | 0.5382 | 0.7788 | 0.8136 | 0.4747 | 0.6759 | 0.1984 | 0.6666 | 0.7008 | 0.8362 |
| | RMSE | 1.1274 | 0.8414 | 0.7732 | 1.1801 | 0.9831 | 1.3157 | 1.0024 | 0.9529 | 0.7361 |
| CSIQ (866 images) | SRCC | 0.8117 | 0.8803 | 0.8290 | 0.5679 | 0.8608 | 0.8683 | 0.7689 | 0.8215 | 0.8983 |
| | KRCC | 0.6172 | 0.7006 | 0.6400 | 0.4238 | 0.6801 | 0.6852 | 0.5759 | 0.6346 | 0.7240 |
| | PCC | 0.8083 | 0.8674 | 0.8473 | 0.7250 | 0.8851 | 0.8860 | 0.8064 | 0.8494 | 0.9211 |
| | RMSE | 0.1528 | 0.1291 | 0.1433 | 0.1781 | 0.1250 | 0.1254 | 0.1589 | 0.1418 | 0.1024 |
| LIVE (779 images) | SRCC | 0.8788 | 0.9473 | 0.9408 | 0.8826 | 0.9391 | 0.8993 | 0.9159 | 0.8670 | 0.9422 |
| | KRCC | 0.6937 | 0.8003 | 0.7922 | 0.7211 | 0.7923 | 0.7200 | 0.7638 | 0.6886 | 0.7987 |
| | PCC | 0.8790 | 0.9451 | 0.9415 | 0.8808 | 0.9427 | 0.9061 | 0.9195 | 0.8850 | 0.9428 |
| | RMSE | 13.135 | 8.9088 | 9.1579 | 13.002 | 8.9522 | 11.567 | 10.801 | 12.888 | 8.9412 |
| LIVE WIQC (1162 images) | SRCC | NA | NA | 0.5481 | 0.1700 | 0.6049 | 0.1917 | 0.4702 | 0.5827 | 0.6033 |
| | KRCC | NA | NA | 0.3734 | 0.1140 | 0.4276 | 0.1289 | 0.3223 | 0.4128 | 0.4277 |
| | PCC | NA | NA | 0.5853 | 0.2969 | 0.6422 | 0.1930 | 0.5134 | 0.6307 | 0.6285 |
| | RMSE | NA | NA | 16.376 | 19.289 | 15.494 | 19.730 | 17.245 | 15.728 | 15.748 |
| MLIVE (450 images) | SRCC | 0.6834 | 0.8636 | 0.8817 | 0.0041 | 0.8943 | 0.9077 | 0.8256 | 0.8760 | 0.8915 |
| | KRCC | 0.5110 | 0.6797 | 0.7051 | 0.2880 | 0.7148 | 0.7364 | 0.6410 | 0.7015 | 0.7218 |
| | PCC | 0.7595 | 0.8873 | 0.9143 | 0.4640 | 0.9183 | 0.8951 | 0.8834 | 0.8935 | 0.9191 |
| | RMSE | 12.0767 | 8.7075 | 7.5447 | 1.9488 | 7.6526 | 8.2772 | 8.8110 | 8.5553 | 7.6194 |
| Overall (direct) | SRCC | 0.7124 | 0.8425 | 0.8276 | 0.3719 | 0.7712 | 0.6678 | 0.7152 | 0.7483 | 0.8769 |
| | KRCC | 0.5385 | 0.6654 | 0.6540 | 0.3429 | 0.6057 | 0.5210 | 0.5494 | 0.5736 | 0.7070 |
| | PCC | 0.7366 | 0.8547 | 0.8598 | 0.6021 | 0.8141 | 0.7033 | 0.7797 | 0.7969 | 0.8920 |
| Overall (weighted) | SRCC | 0.6681 | 0.8006 | 0.7802 | 0.3176 | 0.6734 | 0.5380 | 0.6172 | 0.6630 | 0.8532 |
| | KRCC | 0.4991 | 0.6205 | 0.6056 | 0.2555 | 0.5116 | 0.4041 | 0.4616 | 0.4925 | 0.6772 |
| | PCC | 0.6970 | 0.8234 | 0.8254 | 0.5486 | 0.7371 | 0.6042 | 0.7084 | 0.7337 | 0.8668 |

Table 4: Results of statistical significance tests.

| Dataset | PSNR | SSIM | HOSA | BPRI | BRISQUE | IL-NIQE | OG-IQA | NOREQI |
|---|---|---|---|---|---|---|---|---|
| TID2013 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| TID2008 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| CSIQ | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| LIVE | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 |
| LIVE WIQC | NA | NA | 1 | 1 | 0 | 1 | 1 | 1 |
| MLIVE | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 |

The results on the LIVE datasets are similar for several measures. Therefore, in order to investigate whether the relative performance differences between the RATER and the other measures are statistically significant, the Wilcoxon rank-sum test is used. In the test, the equivalence of the median values of independent samples is measured at a 5% significance level. The null hypothesis assumes that the SRCC values of the compared IQA techniques are drawn from populations with equal medians. The obtained results are reported in Table 4. The symbol "1" in a cell denotes that the RATER is statistically better than the measure in the column on the dataset in the row with a confidence greater than 95%, while "0" denotes statistically indistinguishable results. As reported, the results are consistent with the conclusions drawn from the previous experiments, i.e., the RATER is statistically superior to the compared methods on the TID2013, TID2008, and CSIQ datasets. It can be seen that on LIVE the RATER is on par with the SSIM, HOSA, and BRISQUE, and on par with the BRISQUE on LIVE WIQC. On MLIVE, the results of the RATER are statistically indistinguishable from the results of the HOSA, BRISQUE, and IL-NIQE. In general, the RATER achieves statistically better performance than the compared methods.
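The test itself can be reproduced along the lines of the following sketch (SciPy's rank-sum test is used here; combining the p-value with the median ordering to decide the "1"/"0" entry is an assumption about how the table entries are derived).

```python
# Sketch: Wilcoxon rank-sum test on the SRCC values of two measures over the
# 100 training-testing splits, at a 5% significance level.
import numpy as np
from scipy.stats import ranksums

def rater_is_better(srcc_rater, srcc_other, alpha=0.05):
    """Return 1 if RATER is statistically better, 0 if indistinguishable."""
    stat, p = ranksums(srcc_rater, srcc_other)
    better = np.median(srcc_rater) > np.median(srcc_other)
    return int(p < alpha and better)
```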

The performance of recently introduced DNN-based measures is often reported in terms of median SRCC and PCC values obtained over 10 training-testing iterations. Such a small number of tests is dictated, and partially justified, by the complexity of the used architectures. In order to facilitate a fair comparison of the RATER with these measures, its SRCC and PCC median values are reported accordingly. In this test, results for TID2013, CSIQ, and LIVE are shown, since most methods are evaluated on only one or two IQA datasets. In many cases, only small subsets of distorted images are employed, and results for the full datasets are seldom reported. The comparative evaluation of the measures, presented in Table 5, is based on results published in the literature. As reported, the RATER clearly outperforms the DNN-based measures on the largest IQA datasets. On the LIVE dataset, the results favor other measures, as they seem to be more focused on the small number of distortion types present in this dataset. It can be concluded that the RATER achieves superior performance to the compared hand-crafted and DNN-based measures.

Table 5: Comparison of the RATER with DNN-based measures. Median values of SRCC and PCC over 10 training-testing iterations are reported. The compared methods are, in column order: BIECON [32], PQR (S CNN) [30], PQR (ResNet50) [30], PQR (AlexNet) [30], Imagewise CNN [54], IQAMSCN [55], RankIQA+FT [56], HPSC+DAP [31], MEON [57], DeepIQA [33], and the RATER; not every method reports results on every dataset.

| Dataset | Criterion | Published results of the compared DNN-based measures | RATER |
|---|---|---|---|
| TID2013 (3000 images) | SRCC | 0.717, 0.692, 0.740, 0.574, 0.800, 0.780, 0.808, 0.761 | 0.843 |
| | PCC | 0.762, 0.750, 0.798, 0.669, 0.802 | 0.861 |
| CSIQ (866 images) | SRCC | 0.815, 0.908, 0.873, 0.871, 0.812, 0.829 | 0.893 |
| | PCC | 0.823, 0.927, 0.901, 0.896, 0.791, 0.871 | 0.921 |
| LIVE (779 images) | SRCC | 0.958, 0.964, 0.965, 0.955, 0.963, 0.953, 0.981, 0.919 | 0.945 |
| | PCC | 0.960, 0.966, 0.971, 0.964, 0.964, 0.957, 0.921 | 0.946 |

Figure 6 shows the scatter plots of the subjective scores against the scores predicted by the introduced measure on all IQA datasets. In the plots, each point represents a distorted image. Since the RATER requires training, each scatter plot shows the result of an exemplary quality prediction experiment in which 80% of the reference images and their distorted images are used for training and the remaining images for testing. For convenience and to ensure a coherent comparison, all subjective and objective scores are normalized to the range [0, 1]. As presented, the predictions of the RATER are consistent with the subjective scores.

Figure 6: Scatter plots of subjective opinion scores against scores obtained by the RATER on the used datasets; curves fitted with logistic functions are also shown. Panels: (a) TID2013, (b) TID2008, (c) CSIQ, (d) LIVE, (e) LIVE WIQC, (f) MLIVE. Axes: objective score (horizontal) vs. subjective score (vertical).

3.4. Performance on individual distortion type

In order to determine the performance of the introduced measure on individual distortions, it is evaluated on the TID2013 dataset. The TID2013 is used since it contains images corrupted by more distortion types than any other dataset. In this experiment, the previously described evaluation protocol is used. Figure 7 shows the results obtained by the RATER and the other methods in terms of SRCC. The results obtained with all distortions are also reported. It is evident from the figure that the RATER exhibits stable performance across distortion types, clearly outperforming the compared NR techniques. This experiment also indicates how important the information about the reference image is for the image ordering used to calculate the SRCC. Hence, considering individual distortions, the PSNR and SSIM yield leading quality prediction accuracy for distortions such as Gaussian blur or JPEG compression. However, the RATER outperforms the other IQA measures on images corrupted by AWGN, high-frequency noise, contrast change, change of color saturation, or lossy compression of noisy images. Interestingly, all NR methods have difficulties in cases in which local distortions are considered, i.e., non-eccentricity pattern noise or local block-wise distortions of different intensity. Here, the RATER outperforms them, being more focused on local regions of an image. In general, taking into account individual distortion types, the performance of the introduced measure is comparable to that of the evaluated FR-IQA methods.

Figure 7: Performance of the methods on individual distortions in terms of SRCC.

3.5. Cross-database performance

The dataset-independence of the method is verified by a cross-dataset validation. In the experiment, learning-based NR measures are tested on a dataset which is not used for their training. Since the numbers of distortions in the TID2013 and TID2008 datasets are similar, they are used together. Consequently, CSIQ and LIVE are also paired. The results, in terms of SRCC, are shown in Table 6. It can be observed that the RATER and HOSA exhibit similar performance. The HOSA is better on LIVE if learned on CSIQ, and the RATER is better than the other measures in the case where it is trained on LIVE and tested on CSIQ. Overall, the RATER demonstrates database independence and robustness.

Table 6: Cross-dataset performance of learning-based NR-IQA methods in terms of SRCC.

| Training dataset | Testing dataset | HOSA | BRISQUE | OG-IQA | NOREQI | RATER |
|---|---|---|---|---|---|---|
| TID2013 | TID2008 | 0.839 | 0.752 | 0.580 | 0.662 | 0.927 |
| TID2008 | TID2013 | 0.772 | 0.656 | 0.507 | 0.661 | 0.753 |
| CSIQ | LIVE | 0.904 | 0.689 | 0.830 | 0.770 | 0.858 |
| LIVE | CSIQ | 0.584 | 0.597 | 0.583 | 0.520 | 0.678 |
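Such a cross-dataset run can be sketched as follows (same assumptions about feature scaling and SVR settings as in the earlier protocol sketch).

```python
# Sketch of the cross-dataset validation: train on one full dataset, test on another.
from scipy.stats import spearmanr
from sklearn.svm import SVR

def cross_dataset_srcc(train_X, train_mos, test_X, test_mos):
    lo, hi = train_X.min(axis=0), train_X.max(axis=0)
    scale = lambda x: (x - lo) / (hi - lo + 1e-12)       # scaling fitted on training data
    model = SVR(kernel="rbf", C=100.0, gamma="scale")    # hyper-parameters assumed
    model.fit(scale(train_X), train_mos)
    pred = model.predict(scale(test_X))
    return spearmanr(pred, test_mos).correlation
```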

3.6. Computational complexity

In practice, the usage of an NR measure is often justified by its computational complexity. Therefore, the TID2013 dataset is used to analyze the computational complexity of the methods in terms of the average time taken to assess a 512 × 384 image. The compared methods are run using their publicly available Matlab implementations. The experiments are performed on a 3.3 GHz Intel Core CPU with 16 GB RAM. Table 7 reports the obtained timings. As reported, the RATER is slower than the FR measures and the BRISQUE, but it is faster than the remaining NR methods. This confirms the applicability of the RATER in systems that require a fast execution time together with superior quality prediction accuracy. Since the perceptual features in the introduced RATER can be obtained independently for the YCbCr channels, Table 7 also contains the results for the version of the RATER which uses a parallel implementation; it is denoted by "◦". Note that the Matlab code of IL-NIQE uses such an implementation by default. In conclusion, the RATER manifests mild computational complexity. The observed speedup of its parallel implementation indicates that an efficient native implementation, using e.g. C++, would be beneficial.

Table 7: Time-cost comparison (in seconds).

| PSNR | SSIM | HOSA | BPRI | BRISQUE | IL-NIQE◦ | OG-IQA | NOREQI | RATER | RATER◦ |
|---|---|---|---|---|---|---|---|---|---|
| 0.004 | 0.044 | 0.440 | 0.997 | 0.049 | 8.20 | 3.77 | 0.510 | 0.356 | 0.168 |
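A simple way to obtain such per-image timings is sketched below (a generic wall-clock harness, not the benchmarking code used in the paper).

```python
# Sketch: average wall-clock time per assessed image, as reported in Table 7.
import time

def average_assessment_time(assess, images):
    """assess: callable returning a quality score; images: list of test images."""
    start = time.perf_counter()
    for img in images:
        assess(img)
    return (time.perf_counter() - start) / len(images)
```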

4. Conclusions

Existing NR-IQA methods often rely on computational models which capture changes in the structural information of degraded images using first-order image derivatives. However, since the HVS can be modeled using up to fourth-order derivatives, in this work a novel NR-IQA measure has been presented which uses image characteristics captured in the bilaplacian domain. To assess a distorted image, the introduced RATER employs statistics of pixel blocks which describe features detected in the bilaplacian domain of the YCbCr channels. Then, the statistics are employed to train the SVR-based quality model. The applicability of the investigated relationship between the features extracted from the high-order image derivatives and image distortions to the NR-IQA is discussed in the paper. The presented technique has been thoroughly evaluated against the related state-of-the-art NR methods as well as two popular FR techniques (PSNR and SSIM). The NR methods include popular hand-crafted measures and DNN-based approaches. The experimental evaluation reveals that the RATER is superior to the compared measures in terms of the visual quality prediction accuracy and achieves a short computation time. Future work will involve an investigation of the usability of approaches which provide a global image representation based on a set of detected features [44] to the NR-IQA.

528 529

530 531 532

533 534 535

The Matlab code of the RATER is publicly available at http: // marosz. kia. prz. edu. pl/ RATER. html . [1] D. M. Chandler, Seven challenges in image quality assessment: Past, present, and future research, ISRN Signal Processing (2013) 53doi:10.1155/2013/905685. [2] M. Leszczuk, K. Kowalczyk, L. Janowski, Z. Papir, Lightweight implementation of no-reference (NR) perceptual quality assessment of H.264/AVC compression, Signal Process.-Image 39 (2015) 457 – 465, 26

536 537

538 539 540

541 542 543

544 545 546

547 548 549

550 551 552

553 554 555

556 557

558 559 560

561 562 563

564 565 566

567 568 569

570 571 572

recent Advances in Vision Modeling for Image and Video Processing. doi:10.1016/j.image.2015.05.003. [3] S. Gabarda, G. Cristbal, N. Goel, Anisotropic blind image quality assessment: Survey and analysis with current methods, J. Vis. Commun. Image R. 52 (2018) 101 – 105. doi:10.1016/j.jvcir.2018.02.008. [4] W. Lin, C.-C. J. Kuo, Perceptual visual quality metrics: A survey, J. Vis. Commun. Image R. 22 (4) (2011) 297 – 312. doi:10.1016/j.jvcir.2011.01.005. [5] Z. Wang, A. C. Bovik, H. R. Sheikh, E. P. Simoncelli, Image quality assessment: From error visibility to structural similarity, IEEE T. Image Process. 13 (4) (2004) 600–612. doi:10.1109/tip.2003.819861. [6] A. Rehman, Z. Wang, Reduced-reference image quality assessment by structural similarity estimation, IEEE T. Image Process. 21 (8) (2012) 3378– 3389. doi:10.1109/TIP.2012.2197011. [7] A. Liu, W. Lin, M. Narwaria, Image quality assessment based on gradient similarity, IEEE T. Image Process. 21 (4) (2012) 1500–1512. doi:10.1109/tip.2011.2175935. [8] Y. Wen, Y. Li, X. Zhang, W. Shi, L. Wang, J. Chen, A weighted fullreference image quality assessment based on visual saliency, J. Vis. Commun. Image R. 43 (2017) 119 – 126. doi:10.1016/j.jvcir.2016.12.005. [9] H. R. Sheikh, A. C. Bovik, Image information and visual quality, IEEE T. Image Process. 15 (2) (2006) 430–444. doi:10.1109/TIP.2005.859378. [10] K. Okarma, Quality assessment of images with multiple distortions using combined metrics, Elektron. Elektrotech. 20 (6) (2014) 128–131. doi:10.5755/j01.eee.20.6.7284. [11] M. Oszust, Decision fusion for image quality assessment using an optimization approach, IEEE Signal Proc. Let. 23 (1) (2016) 65–69. doi:10.1109/LSP.2015.2500819. [12] X. Shang, X. Zhao, Y. Ding, Image quality assessment based on joint quality-aware representation construction in multiple domains, Journal of Engineering 2018 (2018) ID 1214697. doi:10.1155/2018/1214697. [13] J. Ospina-Borras, H. D. B. Restrepo, Non-reference assessment of sharpness in blur/noise degraded images, J. Vis. Commun. Image R. 39 (2016) 142 – 151. doi:10.1016/j.jvcir.2016.05.015. [14] R. A. Manap, L. Shao, Non-distortion-specific no-reference image quality assessment: A survey, Inform. Sciences 301 (2015) 141 – 160. doi:10.1016/j.ins.2014.12.055.

27

573 574 575 576

577 578 579

580 581 582

583 584 585

586 587 588

589 590 591 592

593 594 595

596 597 598

599 600 601

602 603 604

605 606 607 608

609 610 611

[15] X. Min, K. Ma, K. Gu, G. Zhai, Z. Wang, W. Lin, Unified blind quality assessment of compressed natural, graphic, and screen content images, IEEE T. Image Process. 26 (11) (2017) 5462–5474. doi:10.1109/TIP.2017.2735192. [16] A. K. Moorthy, A. C. Bovik, Blind image quality assessment: From natural scene statistics to perceptual quality, IEEE T. Image Process. 20 (12) (2011) 3350–3364. doi:10.1109/TIP.2011.2147325. [17] M. A. Saad, A. C. Bovik, C. Charrier, Blind image quality assessment: A natural scene statistics approach in the DCT domain, IEEE T. Image Process. 21 (8) (2012) 3339–3352. doi:10.1109/TIP.2012.2191563. [18] A. Mittal, A. K. Moorthy, A. C. Bovik, No-reference image quality assessment in the spatial domain, IEEE T. Image Process. 21 (12) (2012) 4695–4708. doi:10.1109/TIP.2012.2214050. [19] L. Liu, Y. Hua, Q. Zhao, H. Huang, A. C. Bovik, Blind image quality assessment by relative gradient statistics and adaboosting neural network, Signal Process.-Image 40 (2016) 1 – 15. doi:10.1016/j.image.2015.10.005. [20] W. Xue, X. Mou, L. Zhang, A. C. Bovik, X. Feng, Blind image quality assessment using joint statistics of gradient magnitude and Laplacian features, IEEE T. Image Process. 23 (11) (2014) 4850–4862. doi:10.1109/TIP.2014.2355716. [21] J. Xu, P. Ye, Q. Li, H. Du, Y. Liu, D. Doermann, Blind image quality assessment based on high order statistics aggregation, IEEE T. Image Process. 25 (9) (2016) 4444–4457. doi:10.1109/TIP.2016.2585880. [22] Q. Li, W. Lin, Y. Fang, BSD: Blind image quality assessment based on structural degradation, Neurocomputing 236 (Supplement C) (2017) 93 – 103. doi:10.1016/j.neucom.2016.09.105. [23] Q. Li, W. Lin, Y. Fang, No-reference image quality assessment based on high order derivatives, in: 2016 IEEE International Conference on Multimedia and Expo (ICME), 2016, pp. 1–6. doi:10.1109/ICME.2016.7552997. [24] S. Du, Y. Yan, Y. Ma, Blind image quality assessment with the histogram sequences of high-order local derivative patterns, Digit. Signal Process. 55 (2016) 1 – 12. doi:10.1016/j.dsp.2016.04.006. [25] L. Tang, L. Li, K. Sun, Z. Xia, K. Gu, J. Qian, An efficient and effective blind camera image quality metric via modeling quaternion wavelet coefficients, J. Vis. Commun. Image R. 49 (2017) 204 – 212. doi:10.1016/j.jvcir.2017.09.010. [26] M. Oszust, No-reference image quality assessment using image statistics and robust feature descriptors, IEEE Signal Proc. Let. 24 (11) (2017) 1656– 1660. doi:10.1109/LSP.2017.2754539. 28

[27] X. Min, K. Gu, G. Zhai, J. Liu, X. Yang, C. W. Chen, Blind quality assessment based on pseudo reference image, IEEE T. Mult. PP (99) (2017) 1–1. doi:10.1109/TMM.2017.2788206.
[28] W. Xue, L. Zhang, X. Mou, Learning without human scores for blind image quality assessment, in: 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 995–1002. doi:10.1109/CVPR.2013.133.
[29] L. Zhang, L. Zhang, A. C. Bovik, A feature-enriched completely blind image quality evaluator, IEEE T. Image Process. 24 (8) (2015) 2579–2591. doi:10.1109/TIP.2015.2426416.
[30] H. Zeng, L. Zhang, A. C. Bovik, A probabilistic quality representation approach to deep blind image quality prediction, arxiv.org/abs/1708.08190.
[31] Z. Zhang, H. Wang, S. Liu, T. S. Durrani, Deep activation pooling for blind image quality assessment, Applied Sciences 8 (4). doi:10.3390/app8040478.
[32] J. Kim, S. Lee, Fully deep blind image quality predictor, IEEE J. Sel. Top. Signal 11 (1) (2017) 206–220. doi:10.1109/JSTSP.2016.2639328.
[33] K. Ma, W. Liu, T. Liu, Z. Wang, D. Tao, dipIQ: Blind image quality assessment by learning-to-rank discriminable image pairs, IEEE T. Image Process. 26 (8) (2017) 3951–3964. doi:10.1109/TIP.2017.2708503.
[34] S. Bosse, D. Maniry, T. Wiegand, W. Samek, A deep neural network for image quality assessment, in: IEEE International Conference on Image Processing (ICIP), 2016, pp. 3773–3777. doi:10.1109/ICIP.2016.7533065.
[35] K. Ghosh, S. Sarkar, K. Bhaumik, Understanding image structure from a new multi-scale representation of higher order derivative filters, Image Vision Comput. 25 (8) (2007) 1228–1238. doi:10.1016/j.imavis.2006.07.022.
[36] G. Xie, X. Sun, X. Tong, D. Nowrouzezahrai, Hierarchical diffusion curves for accurate automatic image vectorization, ACM Trans. Graph. 33 (6) (2014) 230:1–230:11. doi:10.1145/2661229.2661275.
[37] H. E. Gerhard, F. A. Wichmann, M. Bethge, How sensitive is the human visual system to the local statistics of natural images?, PLOS Computational Biology 9 (1) (2013) 1–15. doi:10.1371/journal.pcbi.1002873.
[38] W. Zhou, W. Qiu, M. W. Wu, Utilizing dictionary learning and machine learning for blind quality assessment of 3-D images, IEEE T. Broadcast. 63 (2) (2017) 404–415. doi:10.1109/TBC.2016.2638620.
[39] D. Marr, E. Hildreth, Theory of edge detection, Proceedings of the Royal Society of London Series B 207 (1980) 187–217.

[40] International Telecommunications Union, ITU-R Recommendation BT.601-5: Studio encoding parameters of digital television for standard 4:3 and wide-screen 16:9 aspect ratios (1995).
[41] X. Zhang, S. Wang, S. Ma, W. Gao, A study on interest point guided visual saliency, in: 2015 Picture Coding Symposium (PCS), 2015, pp. 307–311. doi:10.1109/PCS.2015.7170096.
[42] E. Rosten, T. Drummond, Machine learning for high-speed corner detection, in: A. Leonardis, H. Bischof, A. Pinz (Eds.), Computer Vision – ECCV 2006, Springer Berlin Heidelberg, Berlin, Heidelberg, 2006, pp. 430–443. doi:10.1007/11744023_34.
[43] N. Ponomarenko, L. Jin, O. Ieremeiev, V. Lukin, K. Egiazarian, J. Astola, B. Vozel, K. Chehdi, M. Carli, F. Battisti, C.-C. J. Kuo, Image database TID2013: Peculiarities, results and perspectives, Signal Process.-Image 30 (2015) 57–77. doi:10.1016/j.image.2014.10.009.
[44] K. E. A. v. d. Sande, C. G. M. Snoek, A. W. M. Smeulders, Fisher and VLAD with FLAIR, in: 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2377–2384. doi:10.1109/CVPR.2014.304.
[45] Y. Zhou, L. Li, J. Wu, K. Gu, W. Dong, G. Shi, Blind quality index for multiply distorted images using bi-order structure degradation and nonlocal statistics, IEEE T. Mult. (2018) 1–1. doi:10.1109/TMM.2018.2829607.
[46] C.-C. Chang, C.-J. Lin, LIBSVM: A library for support vector machines, ACM Trans. Intell. Syst. Technol. 2 (3) (2011) 1–27. doi:10.1145/1961189.1961199.
[47] N. Ponomarenko, V. Lukin, A. Zelensky, K. Egiazarian, M. Carli, F. Battisti, TID2008 - a database for evaluation of full-reference visual quality assessment metrics, Advances of Modern Radioelectronics 10 (2009) 30–45.
[48] E. C. Larson, D. M. Chandler, Most apparent distortion: full-reference image quality assessment and the role of strategy, J. Electron. Imaging 19 (1) (2010) 011006. doi:10.1117/1.3267105.
[49] D. Ghadiyaram, A. C. Bovik, Massive online crowdsourced study of subjective and objective picture quality, IEEE T. Image Process. 25 (1) (2016) 372–387. doi:10.1109/TIP.2015.2500021.
[50] D. Jayaraman, A. Mittal, A. K. Moorthy, A. C. Bovik, Objective quality assessment of multiply distorted images, in: Proc. IEEE Int. Conf. on Signals, Systems, and Computers (ASILOMAR), IEEE, 2012. doi:10.1109/acssc.2012.6489321.
[51] Video Quality Experts Group, Final report from the Video Quality Experts Group on the validation of objective models of video quality assessment, Phase II (FR-TV2), https://www.itu.int/ITU-T/studygroups/com09/docs/tutorial_opavc.pdf.

[52] H. Sheikh, A. Bovik, G. de Veciana, An information fidelity criterion for image quality assessment using natural scene statistics, IEEE T. Image Process. 14 (12) (2005) 2117–2128. doi:10.1109/tip.2005.859389.
[53] Q. Lu, W. Zhou, H. Li, A no-reference image sharpness metric based on structural information using sparse representation, Inform. Sciences 369 (2016) 334–346. doi:10.1016/j.ins.2016.06.042.
[54] J. Kim, H. Zeng, D. Ghadiyaram, S. Lee, L. Zhang, A. C. Bovik, Deep convolutional neural models for picture-quality prediction: Challenges and solutions to data-driven image quality assessment, IEEE Signal Proc. Mag. 34 (6) (2017) 130–141. doi:10.1109/MSP.2017.2736018.
[55] C. Fan, Y. Zhang, L. Feng, Q. Jiang, No reference image quality assessment based on multi-expert convolutional neural networks, IEEE Access 6 (2018) 8934–8943. doi:10.1109/ACCESS.2018.2802498.
[56] X. Liu, J. van de Weijer, A. D. Bagdanov, RankIQA: Learning from rankings for no-reference image quality assessment, in: International Conference on Computer Vision (ICCV), 2017.
[57] K. Ma, W. Liu, K. Zhang, Z. Duanmu, Z. Wang, W. Zuo, End-to-end blind image quality assessment using deep neural networks, IEEE T. Image Process. 27 (3) (2018) 1202–1213. doi:10.1109/TIP.2017.2774045.

Highlights

• A no-reference image quality assessment method is proposed.
• The method extracts local features from high-order derivatives of color channels.
• Statistics of pixel blocks are used to train a quality model.
• Experimental results on six image datasets reveal the effectiveness of the method.