Image Quality Assessment Based on Adaptive Multiple Skyline Query

Siyuan He 1,a,* and Zezheng Liu 1,b

1 College of Electrical and Information Engineering, Hunan University, Changsha, Hunan, P.R. China
a [email protected], b [email protected]
* Corresponding author

Abstract—Non-reference image quality assessment has attracted great attention in recent years. Traditional image quality assessment algorithms based on structural similarity cannot make full use of image gradient features, and contrast similarity features often ignore the consistency of continuous color blocks within the image, which leads to large discrepancies between the evaluation results and the subjective judgment of the human visual system. In this paper, we propose a deep model for image quality assessment in which both the spatial and the visual features of the image are considered. For better feature fusion, we design an adaptive multiple Skyline query algorithm named MSFF, which takes multiple features of images as input and learns the feature weights through end-to-end training. Extensive experiments on image quality assessment tasks show that the proposed model exhibits superior performance compared with existing solutions.

Index Terms—image quality, Skyline, feature fusion, Gabor wavelet

I. INTRODUCTION

With the development of the Internet and the popularization of digital devices, big data driven image processing technologies have attracted more and more attention. In order to improve the efficiency of image processing while ensuring accuracy, many researchers have conducted extensive and in-depth research. The metric Skyline query algorithm has been well applied in the field of image processing. The Skyline query was first proposed in 2001 [1] and has attracted a lot of attention from researchers ever since. Skyline queries now play a very important role in various applications, including data visualization, multi-objective decision making, and user preference queries [2, 14]. In image quality assessment, this algorithm aims to improve evaluation efficiency by pruning data in the metric space. The metric space of existing image data Skyline algorithms is mostly based on semantic metric space modeling. Although the semantic information of an image is rich, it is complex, subjective, and very difficult to express. These shortcomings affect the effectiveness of metric space modeling.

In recent years, image quality assessment (IQA) [3] technology has developed rapidly and is widely used in daily life. IQA can visually reflect the quality of images, so it has become an indispensable part of the image processing area. In this paper, we address the shortcomings of the metric space algorithm in selecting image semantic information in traditional IQA algorithms. Starting from the image content, the spatial features and visual features of the image are selected as the research object. In order to improve the assessment results, the metric space Skyline algorithm is designed from the perspective of multi-feature fusion. A metric space Skyline algorithm based on the fusion of image visual and spatial features is proposed. In the metric space Skyline algorithm, the dominant cost is the overhead of calculating similarity. The feature similarity calculation in this paper is based on the bag-of-words model, where the vocabulary index is pre-generated and does not occupy processing time.

The major contributions of this paper can be summarized as follows:
1) We propose a novel model for non-reference image quality assessment based on deep learning techniques. The raw image features in different directions and scales are extracted using multiple Gabor wavelets, followed by Restricted Boltzmann Machines (RBMs) to generate dimension-reduced spatial and visual feature descriptors for further processing.
2) We use content-based clustering techniques for image feature matching, and design an adaptive multi-user Skyline query algorithm for image multi-feature fusion, named MSFF (Skyline based Multi-Feature Fusion). MSFF takes multiple feature similarities of images as the main evaluation index of the Skyline operation. Compared with traditional multi-feature fusion methods, this algorithm is general and simple, with better adaptability and a smaller parameter search space.

II. RELATED WORK

A. Multi-user Skyline Query

According to the operating environment, the specific algorithms of the Skyline query can be divided into centralized Skyline algorithms and distributed Skyline algorithms. Centralized Skyline algorithms can be further divided, according to whether an index is used, into non-indexed and index-based Skyline algorithms. The non-indexed Skyline algorithms include BNL [1], SFS [2], and D&C [1], which all use the memory space of the system for calculation. The index-based Skyline algorithms use indexes to assist the calculation process; typical examples include the NN [4] and BBS [5] algorithms. Distributed Skyline algorithms can likewise be classified by operating environment, including P2P based [6], web application based [7], and MapReduce based [8].

With the ever-changing computing environment, the development of information technology, and the emergence of new application requirements, more and more applications involve the completion of a query by multiple users. For a data set, multiple users initiate subspace Skyline queries on different combinations of dimensions [9]. The system integrates the simultaneous sub-results and finally returns a result set. Such queries can be summarized as multi-user Skyline query (MUSQ) problems. In such queries, the subject initiating the query operation changes from a single user to multiple users, and in most cases a user is only interested in certain dimensions of the data set, not all of them. Because different users are interested in different dimensions, query condition conflicts may occur on some data sets.


B. Image Quality Assessment

Non-reference IQA (NR-IQA) is also called blind image quality assessment (BIQA) [19]. In practical applications, NR-IQA algorithms are widely needed but are more difficult to implement than algorithms that have either reference images or reference features. Peak Signal-to-Noise Ratio (PSNR) [11] and Mean Square Error (MSE) are the most classic image quality assessment algorithms. Researchers have further considered the human visual system when designing image quality assessment algorithms. Since the human eye is sensitive to structural information in images, Wang et al. [12] proposed the Structural Similarity (SSIM) algorithm. Subsequently, a multi-scale structural similarity algorithm (Multiscale SSIM, MSSIM) was proposed [13]. Gradient information can reflect changes in image structure and contrast; Liu et al. [14] proposed Gradient Similarity (GSIM), and Xue et al. [15] extended the gradient local quality map to overall image quality prediction, proposing the Gradient Magnitude Similarity Deviation (GMSD). Zhang et al. [16] proposed a new full-reference IQA index, Feature Similarity (FSIM), based on the characteristics of the human visual system (HVS) [10]. Chandler [17] proposed an effective algorithm for quantifying the visual fidelity of natural images based on the near-threshold and supra-threshold characteristics of human vision (Visual Signal-to-Noise Ratio, VSNR).

Moorthy et al. [20] proposed a Blind Image Quality Index (BIQI) that performs image quality assessment in two steps. First, image features are extracted by modeling the wavelet decomposition coefficients with a generalized Gaussian distribution (GGD). Next, a support vector machine (SVM) is used to estimate the probability that the input image belongs to each distortion class, and support vector regression (SVR) is leveraged to determine an image quality index for every degradation type; the final quality index is obtained according to the probability weights. The Distortion Identification-based Image Verity and INtegrity Evaluation (DIIVINE) index [21] uses more complex 88-dimensional features. Subsequently, Mittal et al. [22, 23] proposed the Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE) algorithm, which also uses SVM + SVR. They first calculate the multi-scale Mean Subtracted Contrast Normalized (MSCN) coefficients, then perform asymmetric generalized Gaussian fitting on these coefficients and their correlation coefficients in different directions, and take the fitted parameters as features.

In the Blind Image Integrity Notator using DCT Statistics (BLIINDS) proposed by Saad et al. [24], the image contrast characteristics are first estimated from the DCT transform coefficients of 17 × 17 sub-blocks; then the kurtosis value and the anisotropic entropy maximum of the DCT coefficient histogram are calculated on two scales as structural features. Finally, a multivariate Gaussian (or Laplacian) probability model is used to describe the probabilistic relationship between the features and the subjective DMOS score, and image quality is predicted by probability maximization. In subsequent studies, Saad et al. [25, 26] proposed the improved BLIINDS-II algorithm, which has a more complex feature extraction process: it uses overlapping 5 × 5 blocks to compute DCT coefficients and fits generalized Gaussian models to the non-DC DCT coefficients. Analyzing the SROCC correlation between these characteristic parameters and the DMOS values selects the appropriate parameters as quality prediction features; all features are calculated on multiple scales, and the DMOS value is finally predicted by training a multivariate generalized Gaussian probability model between these features and the DMOS values.

Ye and Doermann [27, 28] calculate the Gabor transform features of all training images (on randomly, uniformly selected blocks) during training, generate codebooks by clustering, and save the corresponding DMOS values. In prediction, the Gabor transform features are matched against the codebook, a weighted combination is computed according to similarity to obtain an image quality score, and the DMOS value is finally obtained by nonlinear regression. In the Codebook Representation for No-reference Image Assessment (CORNIA) algorithm proposed by Ye et al. [29], blocks are randomly selected from the training images and processed with local MSCN normalization as features, and the K-means algorithm is used to cluster the codebook. In the evaluation process, the codes extracted from the image under evaluation are generated by a soft-assignment strategy based on dot-product operations, and multiple codes are combined by max-pooling. Finally, SVR is used to perform regression on the codes to obtain the image quality evaluation.

To verify image quality assessment results, the difference and correlation between the objective values of the model and the subjective values of human observation are usually compared. Two common evaluation indicators are the Linear Correlation Coefficient (LCC) and Spearman's Rank Order Correlation Coefficient (SROCC). LCC, also known as the Pearson linear correlation coefficient (PLCC), describes the linear correlation between subjective and objective assessments. In addition, there are the Root Mean Square Error (RMSE), the Kendall Rank Order Correlation Coefficient (KROCC), and other evaluation indicators. KROCC, like SROCC, measures the monotonicity of algorithm predictions. RMSE calculates the absolute error between the MOS and the predicted value of the algorithm, measuring the accuracy of the prediction.

III. THE PROPOSED METHOD

In this section, we describe the proposed method in detail. First, multiple Gabor filters are used to extract the raw features of the image in different directions and scales, followed by restricted Boltzmann machines (RBMs) [18] to generate feature descriptors that better reflect the essential content of the image. Next, clustering-based image quantization is performed to generate a visual vocabulary for the feature descriptors. The similarity of the features between the images is then calculated using TF-IDF feature matching. Finally, feature fusion is conducted by a multi-user Skyline query for image quality assessment.


A. Feature Learning with Gabor Wavelet and RBM

Considering that the structure of most areas is almost the same between different images of the same target, and that spatial connections within an image are local, we let each neuron in the network consider only a local area of the image instead of the entire image. Therefore, the image is first divided into small sub-blocks, and the kernel function of the Gabor wavelet is convolved with each sub-block to obtain raw image features. Next, RBMs are used to encode the data and obtain more essential feature descriptors.

The Gabor wavelet combines the Gabor transform with wavelet theory. It has the spatial locality, directionality, and multi-resolution characteristics of the wavelet transform, has a strong ability to classify and recognize images, and can extract local spatial and frequency information of the target domain. The kernel function of a Gabor wavelet is defined as

$$g_{u,w}(k^{*},z)=\frac{\lVert k^{*}\rVert^{2}}{\sigma^{2}}\exp\left(-\frac{\lVert k^{*}\rVert^{2}\lVert z\rVert^{2}}{2\sigma^{2}}\right)\left[\exp(ik^{*}z)-\exp\left(-\frac{\sigma^{2}}{2}\right)\right] \quad (1)$$

where $k^{*}=k_{w}e^{i\varphi_{u}}$ is the Gabor kernel, $k_{w}=K_{max}/f^{w}$ is the sampling scale, $K_{max}$ is the maximum frequency, $f^{w}$ is the kernel interval factor in the frequency domain, $\varphi_{u}=\pi u/m$ is the sampling direction, $u=0,1,\ldots,m-1$ and $w=0,1,\ldots,n$ are the direction and scale indexes, respectively, $z=(x,y)$ represents a pixel, and $\sigma$ is a constant related to the wavelet bandwidth. When using the Gabor wavelet to extract image features, we convolve the image I(x, y) with Gabor filters of different directions and scales, as in (2):

$$G_{u,w}(x,y)=I(x,y)*g_{u,w}(k^{*},z) \quad (2)$$

where $G_{u,w}(x,y)$ is the extracted image feature.
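To make the filter-bank step concrete, a minimal Python sketch of (1)-(2) is given below. It relies on OpenCV's built-in Gabor kernel rather than the exact parameterization of (1), and the kernel size, file name, and parameter values (m, n, K_max, f, σ) are illustrative assumptions, not settings prescribed above.

```python
import cv2
import numpy as np

def gabor_features(image, m=8, n=5, k_max=np.pi / 2, f=np.sqrt(2), sigma=2 * np.pi):
    """Convolve a grayscale image with an m-direction, n-scale Gabor filter
    bank, mirroring Eq. (2): G_{u,w} = I * g_{u,w}."""
    responses = []
    for u in range(m):                    # sampling directions phi_u = pi*u/m
        theta = np.pi * u / m
        for w in range(n):                # sampling scales k_w = K_max / f**w
            lambd = 2 * np.pi / (k_max / f ** w)  # wavelength of the carrier
            kernel = cv2.getGaborKernel((31, 31), sigma, theta, lambd, 1.0, 0.0)
            responses.append(cv2.filter2D(image, cv2.CV_32F, kernel))
    return np.stack(responses)            # shape: (m*n, H, W) raw features

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
raw = gabor_features(img)                 # multi-direction/scale responses
```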


Next, we use RBMs to encode the learned features into spatial and visual feature descriptors. The RBM (Restricted Boltzmann Machine) is a double-layer, bipartite structured neural network. The top layer is comprised of a set of visible nodes that denote the observation, while the bottom layer is the hidden one, which is employed to engineer visual features. In our implementation, the RBM is utilized to encode local image descriptors. Specifically, the visible layer consists of R units, each of which corresponds to the dimensionality of the local descriptors, i.e., 64-D SIFT points [x]. The encoding layer contains M hidden units, each of which represents a visual word. The two layers are linked by a weighting matrix K that is considered a visual codebook. The RBM learning process can be briefed as follows: first the visible layer is trained and its inherent parameters are fixed; afterward, the hidden layer's parameters are inferred accordingly. The entire RBM network is inferred layer by layer.

It is worth emphasizing that, during RBM architecture design, we have to balance model complexity against training cost. Conventional methods typically train RBMs from the raw image pixels by leveraging a large-scale and complicated deep architecture. In our work, toward an efficient training process, local image patches are utilized as the input, and thereby we can obtain descriptive visual descriptors with a succinct deep architecture.

Specifically, our RBM is formulated as follows. Given a training dataset of R images $\{I_{1},I_{2},\ldots,I_{R}\}$, we split each image $I_{c}$ into M×N sub-blocks $e_{c}$. We use Gabor filter groups of m+1 directions and n+1 scales to obtain image features in different directions and scales, as

$$\Phi=\{G_{0,0}^{c},G_{0,1}^{c},\ldots,G_{0,n}^{c},\ldots,G_{m,0}^{c},G_{m,1}^{c},\ldots,G_{m,n}^{c}\} \quad (3)$$

The deep features in (3) are then used as the visible layer nodes to train the RBMs. The generated spatial and visual feature descriptors after dimension reduction are $V_{s}(f_{1},f_{2},\ldots,f_{k})$ and $V_{q}(f_{1},f_{2},\ldots,f_{k})$, respectively, as shown in Fig. 1.

Fig. 1 The pipeline for feature extraction and dimension reduction

In our RBM, fine-tuning is conducted as follows. A linear SVM classifier y is employed to connect the RBM's first-layer output to that of the second layer, where each unit corresponds to a class label. In this way, the RBM is trained by linking the second-layer output to the first. All layers uniformly employ a top-down sampling scheme. The deep features' training errors are penalized by a softmax cross-entropy loss and back-propagated through the parameters of the two codebook layers.
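As a concrete illustration, an encoder of this kind can be trained with one-step contrastive divergence (CD-1), a standard RBM training rule; the exact training procedure is not specified above, and the layer sizes, learning rate, epoch count, and random stand-in descriptors below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Two-layer bipartite RBM: visible layer = local descriptors (R units),
    hidden layer = visual vocabulary (M units), weight matrix K = codebook."""

    def __init__(self, n_visible=64, n_hidden=256, lr=0.05):
        self.K = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)    # visible bias
        self.b_h = np.zeros(n_hidden)     # hidden bias
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.K + self.b_h)

    def cd1_step(self, v0):
        """One contrastive-divergence update on a batch of descriptors."""
        h0 = self.hidden_probs(v0)
        h_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = sigmoid(h_sample @ self.K.T + self.b_v)   # reconstruction
        h1 = self.hidden_probs(v1)
        self.K += self.lr * (v0.T @ h0 - v1.T @ h1) / len(v0)
        self.b_v += self.lr * (v0 - v1).mean(axis=0)
        self.b_h += self.lr * (h0 - h1).mean(axis=0)

descriptors = rng.random((1000, 64))      # stand-in for Gabor-derived patches
rbm = RBM()
for _ in range(30):                        # a few CD-1 epochs
    rbm.cd1_step(descriptors)
codes = rbm.hidden_probs(descriptors)      # dimension-reduced descriptors
```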


B. K-means-based Image Quantization

Based on the above feature descriptor extraction method, we train on the images in the image dataset and generate a visual vocabulary for the feature descriptors. This process is implemented with the K-means clustering algorithm: the obtained k clustering centers are treated as a visual vocabulary. Based on the clustering results, the spatial feature descriptor of each input image is quantized into a bag-of-words representation using the nearest center word in the visual vocabulary. In the visual bag-of-words model, given the visual vocabulary of a feature $C_{j}=\{c_{i}\}, i=1,\ldots,k$, where $j=1,\ldots,m$ and k is the number of clustering centers, each image is quantized as a k-dimensional vector containing the frequency of each visual word. The same process is then performed on the visual feature descriptor to obtain another feature vector.

Notably, there are many state-of-the-art codebook learning algorithms in the literature, such as sparse coding (SC) and its multiple variants [33]. During our system design, we considered a set of more sophisticated codebook learning algorithms (SC, LLC-SC, and super vector (SV) [32]). However, we observed that these methods all have many parameters to tune, which greatly decreases their convenience. Comparatively, k-means has only one parameter to be decided, the codebook size, which makes it highly adaptable to various data sets. Moreover, k-means is highly competitive in time consumption. In our experimental validation, we performed codebook learning with four methods (k-means, SC, LLC-SC, and SV) on 10000 samples. The codebook learning times of the four methods were: k-means (4.33 s), SC (7.21 s), LLC-SC (10.32 s), and SV (6.57 s). Furthermore, the quality of the obtained codebooks is nearly the same, as reported in the experimental part (using the default parameters of SC, LLC-SC, and SV).
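As an illustration, the quantization step can be sketched with scikit-learn's KMeans as below; the codebook size and the helper names are illustrative assumptions, not settings fixed above.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(train_descriptors, k=1000):
    """Cluster all training descriptors; the k centers form the vocabulary."""
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(train_descriptors)

def quantize(image_descriptors, vocab):
    """Assign each descriptor to its nearest center and count word frequencies,
    giving the k-dimensional bag-of-words vector described above."""
    words = vocab.predict(image_descriptors)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)     # frequency of each visual word

# usage: vocab = build_vocabulary(train_desc); v = quantize(img_desc, vocab)
```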


C. TF-IDF Based Feature Matching

Given an input image, we first extract its features and then quantize its feature descriptors into feature vectors based on the generated visual vocabulary. In order to calculate the image similarity distance more accurately, a standardization step is required before the similarity calculation; TF-IDF (Term Frequency-Inverse Document Frequency) weighting is applied. Similar to the weight calculation in document search, for any word $c_{j}$ in the vocabulary, the weight is calculated as

$$\text{TF-IDF}_{c_{j},o_{i}}=TF_{c_{j},o_{i}}\cdot IDF_{c_{j},o_{i}}=\frac{n_{ji}}{n_{i}}\cdot\log\left(\frac{N}{N_{c_{j}}}\right) \quad (4)$$

where $n_{i}$ is the total word count in image $o_{i}$, $n_{ji}$ is the number of occurrences of word $c_{j}$ in image $o_{i}$, N is the total number of images in the training set, and $N_{c_{j}}$ is the number of images that contain word $c_{j}$.

The next step is to calculate the similarity of the features between the images, namely the feature matching process. When an image is represented as a vector, the similarity distance of the image can be calculated using the similarity distance of the vector. Given an image dataset containing n images $I=\{o_{i}\}_{i=1}^{n}$ and an initial image q, a feature vector is generated according to (4). Thus, the similarity distance between the image q and any image in the image dataset I on the t-th feature can be expressed as the L1 distance of the two vectors:

$$dist_{t}(q,o_{i})=Sim(v_{t}(q),v_{t}(o_{i}))=\lVert v_{q}^{t}-v_{o_{i}}^{t}\rVert_{1} \quad (5)$$

where $v_{o_{i}}^{t}$ is the k-dimensional vector that represents the t-th feature of image $o_{i}$. Based on (5), we have the similarity distance between the image q and any image $o_{i}$ in the image dataset I on each dimension. The similarity vector of q and $o_{i}$ is then defined as

$$Vect_{i}(o_{i},q)=\{dist_{1}(o_{i},q),dist_{2}(o_{i},q),\ldots,dist_{m}(o_{i},q)\} \quad (6)$$

where $i\in[1,n]$, m is the number of features, $dist_{k}(o_{i},q)$ is the similarity distance of the two images on the k-th feature (k ≤ m), and $Vect_{i}(o_{i},q)$ is the similarity vector of images q and $o_{i}$. We then calculate the similarity distance of each image in the dataset to the image q, and generate n similarity vectors.
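A direct transcription of (4) and (5) could look as follows; the function names are hypothetical, and the guards against empty histograms and zero document frequencies are defensive assumptions.

```python
import numpy as np

def tfidf_weight(bow_counts, doc_freq, n_images):
    """Eq. (4): (n_ji / n_i) * log(N / N_cj) for every word of one image.
    bow_counts holds raw word counts; doc_freq the number of images per word."""
    tf = bow_counts / max(bow_counts.sum(), 1.0)
    idf = np.log(n_images / np.maximum(doc_freq, 1.0))
    return tf * idf

def feature_distance(v_q, v_o):
    """Eq. (5): L1 distance between two TF-IDF weighted feature vectors."""
    return np.abs(v_q - v_o).sum()
```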

D. Multiple Feature Fusion based on Skyline Query

In the feature fusion stage, the feature matching results are treated as the inputs of a multi-user Skyline query, and the similarity on each feature is one decision objective of the Skyline operation. More specifically, given the image dataset $I=\{o_{i}\}_{i=1}^{n}$ and an initial image q, we define the result set R of the multi-feature fusion as follows. For the m feature vectors $X=\{x_{j}\}_{j=1}^{m}$ of each image, the images in R have similarity vectors $Vect_{i}(o_{i},q)$ with the image q that are not dominated by the similarity vector of any other image in I. That is, each image in R must satisfy

$$\forall o_{cond}\in I\wedge o_{i}\neq o_{cond}\;\big(\exists x_{j}\in X,\ \mathrm{s.t.}\ dist_{j}(q,o_{i})<dist_{j}(q,o_{cond})\big) \quad (7)$$

It can be seen that the result set of the Skyline-based multi-feature fusion is the subset of the image library whose images are not dominated by any other image in the dataset in the multi-feature metric space. Taking the two-dimensional case as an example, the whole process of feature fusion using the Skyline is shown in Fig. 2. The spatial and visual feature similarity distances of an image $o_{i}$ with respect to the initial image q constitute a point: the x-axis of a point represents the similarity distance of the spatial feature between the image and the initial image, and the y-axis represents the similarity distance of the visual features between them. These distances are calculated based on the bag-of-words model in the multi-feature metric space.

Fig. 2 Feature fusion with Skyline query

In Fig. 2, images are more similar to each other when the similarity distance is smaller. Therefore, $\{p_{1},p_{2},p_{3},p_{4}\}$ are the final Skyline results. This indicates that no other image is more similar to the initial image than $\{o_{1},o_{2},o_{3},o_{4}\}$ in terms of both visual and spatial features; that is, no other image in the image library dominates these images on the visual and spatial features.

The input of our feature fusion algorithm is an image dataset $I=\{o_{i}\}_{i=1}^{n}$ and an initial image q, where the images are two-dimensional vectors in the multi-feature metric space. The underlying feature vector is denoted as $Vect_{i}(o_{i},q)=\{dist_{s}(o_{i},q),dist_{q}(o_{i},q)\}$, where the spatial and visual similarity distances are represented by $d_{s}$ and $d_{q}$, respectively. The algorithm is described as follows.

Algorithm: Feature fusion based on Skyline query
Input: I, the image set; q, the initial image.
Output: SKlist, the Skyline result list.
Steps:
1) Extract the spatial and visual features of q: $x_{s}$ and $x_{q}$;
2) For every point $o_{i}\in I$, compute the distance between q and $o_{i}$ according to (5);
3) Initialize the Skyline result set SKlist = ∅;
4) Insert the first point of the set R into SKlist;
5) For every point $r\in R$, calculate the result following (7).
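The pseudocode above can be rendered as the runnable sketch below. Dominance is implemented in the standard Skyline sense implied by (7): an image survives if no other image is at least as close to q on every feature and strictly closer on at least one. The sample similarity vectors are invented for illustration.

```python
import numpy as np

def dominates(a, b):
    """a dominates b if a is no farther on every feature and strictly
    closer on at least one (smaller distance means more similar)."""
    return bool(np.all(a <= b) and np.any(a < b))

def skyline_fusion(similarity_vectors):
    """MSFF-style fusion: keep images whose similarity vectors Vect_i(o_i, q)
    are not dominated by any other image's vector, per Eq. (7)."""
    sk_list = []
    for i, v in enumerate(similarity_vectors):
        if not any(dominates(u, v)
                   for j, u in enumerate(similarity_vectors) if j != i):
            sk_list.append(i)
    return sk_list

# each row is [dist_s, dist_q] for one image; indices of Skyline images returned
vects = np.array([[0.2, 0.5], [0.3, 0.3], [0.6, 0.2], [0.5, 0.6]])
print(skyline_fusion(vects))  # -> [0, 1, 2]; [0.5, 0.6] is dominated
```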


IV. EXPERIMENTS


In this section, we conduct experiments on extensive datasets to verify the effectiveness of the proposed image quality assessment algorithm. The image datasets used in the experiments include the following:
1) LIVE database [30]. The LIVE database consists of 982 distorted images derived from 29 reference images, and the distortion types include JPEG2000 compression, JPEG compression, white noise, Gaussian blur, and bit errors in the JPEG2000 bit stream. The database provides a Differential Mean Opinion Score (DMOS) for each image, between 0 and 100; the higher the DMOS value, the lower the image quality.
2) TID2008 database [31]. There are 1700 distorted images in the TID2008 database, derived from 25 reference images and covering 17 types of distortion. The database provides a Mean Opinion Score (MOS) for each image, between 0 and 9; the higher the MOS, the higher the image quality.
3) TID2013 database [34]. TID2013 comprises the same 25 reference color images as TID2008. In total, the dataset contains 3000 distorted images: 120 distorted images (five levels for each of twenty-four types of distortion) are obtained for each reference image.

The algorithms are objectively evaluated through the correlation coefficients between the objective values of the image quality assessment algorithm and the subjective values. This paper mainly uses the following two performance evaluation measures.
a. Linear Correlation Coefficient ($C_{LC}$). It evaluates the linear correlation between objective and subjective evaluation results:

$$C_{LC}=\frac{N\sum y\hat{y}-\sum y\sum\hat{y}}{\left[N\sum y^{2}-\left(\sum y\right)^{2}\right]^{1/2}\left[N\sum\hat{y}^{2}-\left(\sum\hat{y}\right)^{2}\right]^{1/2}} \quad (8)$$

In (8), N is the number of samples, y is the subjective evaluation value, and $\hat{y}$ is the objective evaluation value. The performance of an image quality assessment algorithm grows as $C_{LC}$ increases.
b. Spearman Rank Order Correlation Coefficient ($C_{SROC}$). It measures how well one quality score can be described as a monotonic function of another:

$$C_{SROC}=1-\frac{6\sum D^{2}}{N(N^{2}-1)} \quad (9)$$

In (9), D is the difference between the ranks of the subjective and objective evaluation values. The accuracy of a quality assessment algorithm is considered higher as $C_{SROC}$ approaches 1.

All experiments are based on the bag-of-words model, where a clustering algorithm is first applied to construct the visual vocabulary. We randomly divide each of the image datasets into a training set and a testing set, where the training-set ratio σ is set to 0.5. The visual vocabulary is built upon the training set, and the initial image is selected from the testing set.
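For reference, both indicators can be computed directly with SciPy; the score arrays below are invented stand-ins for DMOS values and model predictions.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def evaluate_iqa(subjective, objective):
    """Compute the indicators of Eqs. (8)-(9): C_LC (Pearson linear
    correlation) and C_SROC (Spearman rank-order correlation)."""
    c_lc, _ = pearsonr(subjective, objective)
    c_sroc, _ = spearmanr(subjective, objective)
    return c_lc, c_sroc

y = np.array([30.1, 55.4, 72.0, 18.6, 64.3])       # subjective DMOS values
y_hat = np.array([28.7, 57.2, 70.5, 21.0, 61.8])   # objective predictions
print(evaluate_iqa(y, y_hat))
```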

A. A Comparative Study

In this experiment, we evaluate the quality prediction of well-known methods on the above three quality datasets. For each dataset, both SROCC and LCC evaluations are conducted. It can be seen from the tables that the proposed method better evaluates the five kinds of distortion (ALL is a comprehensive evaluation over the five kinds of distortion). The comprehensive evaluation results show that the proposed MSFF-based algorithm is better than classical non-reference image quality assessment algorithms.

Table 1. A comparison of the $C_{SROC}$ results of different algorithms on the LIVE database.

Algorithms   JPEG2000  JPEG   WN     BLUR   BE     ALL
BRISQUE      0.915     0.914  0.980  0.952  0.878  0.933
BLIINDS-II   0.930     0.901  0.947  0.924  0.890  0.922
DIIVINE      0.914     0.911  0.985  0.922  0.864  0.917
NSS-TS       0.932     0.916  0.972  0.940  0.936  0.931
CORNIA       0.925     0.940  0.970  0.960  0.907  0.942
MSFF         0.933     0.948  0.977  0.962  0.914  0.950


Table 2. A comparison of the $C_{LC}$ results of different algorithms on the LIVE database.

Algorithms   JPEG2000  JPEG   WN     BLUR   BE     ALL
BRISQUE      0.923     0.936  0.967  0.946  0.904  0.925
BLIINDS-II   0.935     0.921  0.937  0.928  0.897  0.909
DIIVINE      0.923     0.922  0.989  0.924  0.889  0.918
NSS-TS       0.948     0.934  0.964  0.951  0.943  0.927
CORNIA       0.950     0.965  0.978  0.955  0.918  0.942
MSFF         0.953     0.972  0.985  0.954  0.933  0.949

Table 3. A comparison of the $C_{SROC}$ results of different algorithms on the TID2008 database.

Algorithms   JPEG2000  JPEG   WN     BLUR   BE     ALL
BRISQUE      0.865     0.879  0.912  0.895  0.905  0.897
BLIINDS-II   0.904     0.896  0.916  0.914  0.901  0.901
DIIVINE      0.904     0.907  0.901  0.899  0.909  0.907
NSS-TS       0.904     0.902  0.907  0.899  0.905  0.903
CORNIA       0.904     0.913  0.909  0.912  0.914  0.912
MSFF         0.927     0.922  0.919  0.928  0.911  0.921

Table 4. A comparison of the $C_{LC}$ results of different algorithms on the TID2008 database.

Algorithms   JPEG2000  JPEG   WN     BLUR   BE     ALL
BRISQUE      0.896     0.905  0.905  0.896  0.910  0.902
BLIINDS-II   0.903     0.907  0.899  0.902  0.904  0.907
DIIVINE      0.899     0.900  0.904  0.905  0.906  0.903
NSS-TS       0.903     0.912  0.911  0.904  0.907  0.909
CORNIA       0.911     0.914  0.916  0.909  0.912  0.912
MSFF         0.927     0.931  0.922  0.925  0.923  0.926


Table 5. A comparison of the $C_{SROC}$ results of different algorithms on the TID2013 database.

Algorithms   JPEG2000  JPEG   WN     BLUR   BE     ALL
BRISQUE      0.878     0.888  0.893  0.885  0.894  0.889
BLIINDS-II   0.886     0.892  0.894  0.893  0.891  0.893
DIIVINE      0.892     0.893  0.895  0.883  0.872  0.891
NSS-TS       0.904     0.902  0.907  0.899  0.905  0.903
CORNIA       0.904     0.913  0.909  0.912  0.914  0.912
MSFF         0.924     0.919  0.931  0.928  0.936  0.932

Table 6. A comparison of the $C_{LC}$ results of different algorithms on the TID2013 database.

Algorithms   JPEG2000  JPEG   WN     BLUR   BE     ALL
BRISQUE      0.882     0.874  0.893  0.881  0.873  0.879
BLIINDS-II   0.896     0.904  0.896  0.885  0.875  0.875
DIIVINE      0.906     0.895  0.894  0.898  0.902  0.901
NSS-TS       0.895     0.904  0.896  0.893  0.902  0.898
CORNIA       0.906     0.901  0.905  0.904  0.903  0.906
MSFF         0.917     0.919  0.914  0.921  0.915  0.917

Based on the comprehensive experimental results shown in Tables 1-6, the following observations are made: 1) Our method outperforms its competitors significantly. This clearly demonstrates the necessity of optimally fusing multi-channel visual features for image quality evaluation. Besides, our framework captures both local and global visual features, which is another key attribute contributing to its good performance. 2) Our RBM-based deep features are highly representative for image quality prediction. This is because deep learning can mimic human visual perception and cognition; encoding human visual perception can greatly enhance image quality prediction, as different distortions are ultimately judged by human understanding. Moreover, we notice that image quality prediction and visual recognition can both benefit from well-engineered deep features, as the two tasks are similar in certain aspects. 3) Despite the power of deep features, shallow visual features also contribute to image quality evaluation. In practice, the deep and shallow features should be optimally fused for image quality prediction. That is also the motivation of our approach, i.e., projecting the multi-channel deep/shallow features into a unified feature space for assessing visual quality.

B. Analysis on Feature Fusion and K-means Clustering

To verify the performance of the proposed Skyline query based feature fusion algorithm, we conduct another experiment on the TID2008 image dataset. We use SF to denote the assessment algorithm using only the spatial feature, and VF the algorithm using only the visual feature. FSV denotes a feature fusion algorithm based on a weighted distance whose weights are set by experience. The parameter settings are as follows: σ is set to 0.5, k1 is 10,000, and k2 is 1,000. The test results of SF, VF, FSV, and MSFF are illustrated in Fig. 3. It is observed that the SROCC and LCC of SF are both higher than those of VF, while the feature fusion based algorithms FSV and MSFF obtain higher scores than the single-feature algorithms, since feature fusion can combine the advantages of multiple features and make the assessment process more comprehensive. The performance of MSFF is significantly higher than that of SF and VF, since MSFF benefits from considering images that are similar in the fused feature spaces.

Fig. 3 Test results of SF, VF, FSV, and MSFF on TID2008 image dataset

Moreover, we analyze the clustering error of a set of codebook generation algorithms, i.e., k-means, SC, LLC-SC, and SV. For each algorithm, we calculate the reconstruction error, i.e., the error of reconstructing the entire set of samples based on the codebook. The errors of the four algorithms are comparable: k-means (0.1132), SC (0.1102), LLC-SC (0.1021), and SV (0.1202). This shows that the adopted k-means attains codebook quality close to the more complex alternatives while remaining the simplest and fastest.

V. CONCLUSION

In this paper, we propose a multi-user Skyline query based deep model for image quality assessment. We use Gabor wavelets and RBMs to extract image visual and spatial features in multiple directions and scales. For better feature fusion, we design an adaptive multi-user Skyline query algorithm named MSFF, which can learn the feature weights through end-to-end training. Compared with traditional multi-feature fusion methods, MSFF has better adaptability and a smaller parameter search space. We conduct experiments on extensive image datasets, and the experimental results demonstrate that the proposed model exhibits superior performance compared with existing solutions.

REFERENCES

[1] Borzsony, Stephan, Donald Kossmann, and Konrad Stocker. "The Skyline operator." Proceedings of the 17th International Conference on Data Engineering. IEEE, 2001.
[2] Chomicki, Jan, et al. "Skyline with presorting." ICDE, Vol. 3, 2003.
[3] Wang, Z., A. C. Bovik, H. R. Sheikh, et al. "Image quality assessment: from error visibility to structural similarity." IEEE Transactions on Image Processing 13, no. 4 (2004): 600-612.
[4] Kossmann, Donald, Frank Ramsak, and Steffen Rost. "Shooting stars in the sky: An online algorithm for skyline queries." Proceedings of the 28th International Conference on Very Large Data Bases. VLDB Endowment, 2002.
[5] Papadias, D., Y. Tao, G. Fu, et al. "An optimal and progressive algorithm for skyline queries." Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data. ACM, 2003: 467-478.

[6] Jagadish, H. V., B. C. Ooi, and Q. H. Vu. "BATON: A balanced tree structure for peer-to-peer networks." Proceedings of the 31st International Conference on Very Large Data Bases. VLDB Endowment, 2005: 661-672.
[7] Lo, E., K. Y. Yip, K. I. Lin, et al. "Progressive skylining over web-accessible databases." Data & Knowledge Engineering 57, no. 2 (2006): 122-147.
[8] Mullesgaard, K., J. L. Pederseny, H. Lu, et al. "Efficient Skyline computation in MapReduce." EDBT, 2014: 37-48.
[9] Bai, M., J. Xin, G. Wang, et al. "Discovering the k representative Skyline over a sliding window." IEEE Transactions on Knowledge and Data Engineering 28, no. 8 (2016): 2041-2056.
[10] Thorpe, S., D. Fize, and C. Marlot. "Speed of processing in the human visual system." Nature 381, no. 6582 (1996): 520.
[11] De Boer, J. F., B. Cense, B. H. Park, et al. "Improved signal-to-noise ratio in spectral-domain compared with time-domain optical coherence tomography." Optics Letters 28, no. 21 (2003): 2067-2069.
[12] Wang, L. T., N. E. Hoover, E. H. Porter, et al. "SSIM: a software levelized compiled-code simulator." Proceedings of the 24th ACM/IEEE Design Automation Conference. ACM, 1987: 2-8.
[13] Wang, Z., E. P. Simoncelli, and A. C. Bovik. "Multiscale structural similarity for image quality assessment." The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers. IEEE, 2003, 2: 1398-1402.
[14] Liu, A., W. Lin, and M. Narwaria. "Image quality assessment based on gradient similarity." IEEE Transactions on Image Processing 21, no. 4 (2011): 1500-1512.
[15] Xue, W., L. Zhang, X. Mou, et al. "Gradient magnitude similarity deviation: A highly efficient perceptual image quality index." IEEE Transactions on Image Processing 23, no. 2 (2013): 684-695.
[16] Zhang, L., L. Zhang, X. Mou, et al. "FSIM: A feature similarity index for image quality assessment." IEEE Transactions on Image Processing 20, no. 8 (2011): 2378-2386.
[17] Chandler, D. M., and S. S. Hemami. "VSNR: A wavelet-based visual signal-to-noise ratio for natural images." IEEE Transactions on Image Processing 16, no. 9 (2007): 2284-2298.
[18] Salakhutdinov, R., A. Mnih, and G. Hinton. "Restricted Boltzmann machines for collaborative filtering." Proceedings of the 24th International Conference on Machine Learning. ACM, 2007: 791-798.
[19] Wang, Zhou, and Alan C. Bovik. "Modern image quality assessment." Synthesis Lectures on Image, Video, and Multimedia Processing 2, no. 1 (2006): 1-156.
[20] Moorthy, Anush Krishna, and Alan Conrad Bovik. "A two-step framework for constructing blind image quality indices." IEEE Signal Processing Letters 17, no. 5 (2010): 513-516.
[21] Moorthy, Anush Krishna, and Alan Conrad Bovik. "Blind image quality assessment: From natural scene statistics to perceptual quality." IEEE Transactions on Image Processing 20, no. 12 (2011): 3350-3364.
[22] Mittal, Anish, Anush K. Moorthy, and Alan C. Bovik. "Blind/referenceless image spatial quality evaluator." 2011 Conference Record of the Forty Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), pp. 723-727. IEEE, 2011.
[23] Mittal, Anish, Anush Krishna Moorthy, and Alan Conrad Bovik. "No-reference image quality assessment in the spatial domain." IEEE Transactions on Image Processing 21, no. 12 (2012): 4695-4708.
[24] Saad, Michele A., Alan C. Bovik, and Christophe Charrier. "A DCT statistics-based blind image quality index." IEEE Signal Processing Letters 17, no. 6 (2010): 583-586.
[25] Saad, Michele A., Alan C. Bovik, and Christophe Charrier. "DCT statistics model-based blind image quality assessment." 2011 18th IEEE International Conference on Image Processing, pp. 3093-3096. IEEE, 2011.
[26] Saad, Michele A., Alan C. Bovik, and Christophe Charrier. "Blind image quality assessment: A natural scene statistics approach in the DCT domain." IEEE Transactions on Image Processing 21, no. 8 (2012): 3339-3352.
[27] Ye, Peng, and David Doermann. "No-reference image quality assessment based on visual codebook." 2011 18th IEEE International Conference on Image Processing, pp. 3089-3092. IEEE, 2011.
[28] Ye, Peng, and David Doermann. "No-reference image quality assessment using visual codebooks." IEEE Transactions on Image Processing 21, no. 7 (2012): 3129-3138.
[29] Ye, Peng, Jayant Kumar, Le Kang, and David Doermann. "Unsupervised feature learning framework for no-reference image quality assessment." 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1098-1105. IEEE, 2012.
[30] Sheikh, Hamid R., Zhou Wang, Lawrence Cormack, and Alan C. Bovik. "LIVE image quality assessment database release 2 (2005)." http://live.ece.utexas.edu/research/quality (2005).
[31] Ponomarenko, Nikolay, Vladimir Lukin, Alexander Zelensky, Karen Egiazarian, Marco Carli, and Federica Battisti. "TID2008 - a database for evaluation of full-reference visual quality assessment metrics." Advances of Modern Radioelectronics 10, no. 4 (2009): 30-45.
[32] Zhou, Xi, Kai Yu, Tong Zhang, and Thomas S. Huang. "Image classification using super-vector encoding of local image descriptors." ECCV, 2010.
[33] Lee, Honglak, Alexis Battle, Rajat Raina, and Andrew Y. Ng. "Efficient sparse coding algorithms." NIPS, 2006.
[34] Ponomarenko, Nikolay N., Lina Jin, Oleg Ieremeiev, Vladimir V. Lukin, Karen O. Egiazarian, Jaakko Astola, Benoit Vozel, Kacem Chehdi, Marco Carli, Federica Battisti, and C.-C. Jay Kuo. "Image database TID2013: Peculiarities, results and perspectives." Signal Processing: Image Communication 30 (2015): 57-77.

Siyuan He was born in Changsha, Hunan, P.R. China, in 1989. He received the master's degree from the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, in 2012. He is currently working toward the Ph.D. degree in the College of Electrical and Information Engineering, Hunan University, Changsha, P.R. China. His research interests include parallel computing and big data analysis. E-mail: [email protected]

Zezheng Liu was born in 1991. He received the master's degree from the School of Business, University of Melbourne, Melbourne, Australia, in 2015. He is currently pursuing the Ph.D. degree in the College of Electrical and Information Engineering, Hunan University, Changsha, P.R. China. His research interests include parallel computing and data management. E-mail: [email protected]

There is no conflict of interest.
