Accepted manuscript, J. Vis. Commun. Image R. (2016). doi:10.1016/j.jvcir.2016.06.010. Received 12 November 2015; revised 18 March 2016; accepted 16 June 2016.

Improved Visual Information Fidelity Based on Sensitivity Characteristics of Digital Images

Tien-Ying Kuo¹, Po-Chyi Su²*, Cheng-Mou Tsai¹

¹ Department of Electrical Engineering, National Taipei University of Technology, Taiwan
² Department of Computer Science and Information Engineering, National Central University, Taiwan

Abstract
Digital images may lose information during transmission or transcoding. Since the lost information can influence the visual quality perceived by the human eyes, several quality assessment metrics have been proposed. The structural similarity index (SSIM) and visual information fidelity (VIF) are two of the most common methods that take characteristics of the human perceptual system into account. Although many improved metrics based on SSIM have been developed, methods related to VIF, which outperforms SSIM-based approaches on certain image databases, have rarely been discussed. This research aims at improving VIF to increase its effectiveness and reduce its computational complexity. The enhanced VIF employs the Haar wavelet transform, the log-Gabor filter, and the spectral residual approach to emphasize visual sensitivity in image quality assessment. The experimental results demonstrate the superior performance of the proposed method when compared to various popular and recent assessment indices.

Index Terms: Image quality assessment, log-Gabor filter, visual information fidelity, visual sensitivity

1. Introduction
A significant amount of imagery is digitized nowadays to facilitate storage and transmission. Digital images are easily affected by various types of processing and noise during acquisition and conversion. Whether a distorted image exhibits reduced visual quality is thus a constant focus of study. Image quality assessment plays a crucial role in many applications, such as image enhancement [1-2], acquisition [3], watermarking [33], compression [34] and transmission [35]. Assessment methods are divided into subjective and objective quality measures. The most accurate and reliable way of assessing image quality is subjective evaluation by the human visual system (HVS). However, such evaluation is restricted by the environment and specific viewing conditions. In addition, subjective measures are not only time consuming but also costly, rendering them impractical in many situations. Objective measures involve assessment algorithms that evaluate image quality automatically, which significantly increases their feasibility. Generally, the more consistent the results of an objective assessment are with subjective scores, the closer they are considered to the quality perceived by the human eyes. Therefore, objective quality measures are usually designed to seek consistency with subjective scores in experiments, and various algorithms have been developed in this manner to pursue more accurate objective quality evaluation [1-4, 29].
Regarding the use of reference images, i.e., the undistorted original images, objective quality measures can be classified as full-reference, reduced-reference, and no-reference methods. When the complete original image is available, the full-reference method yields the most accurate results. The reduced-reference method employs partial information from the original image, so the data needed for quality evaluation can be reduced; such methods are thus more suitable for video quality evaluation. The no-reference method, although preferred, is relatively difficult to realize since the distorted areas in images cannot be located easily. This research focuses on improving quality assessment for digital images to produce results that correspond better to the visual quality perceived by the human eyes, so the full-reference methodology is adopted.
The traditional full-reference image quality measures are the mean squared error (MSE) and peak signal-to-noise ratio (PSNR), which use the pixel errors between the reference image and the distorted one to determine the degree of distortion. The major advantage of MSE/PSNR is low computational complexity, but pixel errors that ignore HVS characteristics do not correspond well with subjective scores. Therefore, current image assessment methods usually incorporate models of the HVS to enhance performance. Such methods can be roughly divided into two stages: local information of the two images is first gathered and compared to obtain local assessments, which are then combined in the second stage into an overall quality score. The structural similarity index (SSIM) [5] is the most commonly employed image assessment method.

SSIM separately assesses the local luminance, contrast, and structure of both images, and then averages all local assessments to acquire the overall score. Unlike traditional methods, which compare differences pixel by pixel, SSIM adopts a patch-based approach because the human eyes perceive local differences over an area more readily than individual pixel differences. Several SSIM-based methods have been proposed to improve its effectiveness. The information content weighted SSIM (IW-SSIM) [7] and the feature similarity index (FSIM) [8] are considered very effective image assessment methods in the recent literature. IW-SSIM is an improved assessment method based on SSIM, while FSIM simultaneously assesses the phase congruency [9] and gradient magnitude of images. Compared to other SSIM-based methods, both IW-SSIM and FSIM perform well, as their reported results correspond to subjective scores more closely, at the cost of additional complexity.
Visual information fidelity (VIF) [6] is another image assessment method based on HVS characteristics. VIF draws on information theory and models how image signals pass through the HVS channel to yield perceptual information. Fidelity refers to the similarity between the signals from the reference and distorted images. VIF applies a wavelet decomposition and calculates the mutual information between the two images; the compiled mutual information determines the ratio that generates the overall assessment result. Compared with SSIM-based methods, improvements of VIF are rarely discussed, probably because SSIM is more intuitive and less computationally complex. However, considering the three most frequently used image databases, TID2008 [10], LIVE [11], and CSIQ [12], SSIM-based methods only perform better on TID2008 and are outperformed by VIF on LIVE and CSIQ. Therefore, this research aims at enhancing the effectiveness of VIF on TID2008 to achieve overall superior performance, and at reducing its computational complexity to widen its scope of application.
The proposed method is composed of three parts: the Haar wavelet transform [13], the log-Gabor filter [14], and the spectral residual approach [15]. The Haar wavelet transform is first applied to both the reference and distorted images to filter out high-frequency components, which are less visually sensitive, leaving low-pass images that contain the important content. The log-Gabor filter is used to decompose the spectra of the two low-pass images into horizontal and vertical subbands at multiple scales. According to the spatial-domain responses of each band, the distortion features that the human eyes are sensitive to are captured, along with the corresponding reference image features. The VIF method then calculates the local mutual information between both images and estimates the amount of local image information conveyed via the HVS channel.

[Figure 1. The VIF schema: a natural image source produces C; C passes through the distortion channel to give D; C and D each pass through the HVS channel to give E and F, respectively.]

Finally, the spectral residual approach is adopted to detect the object regions of the reference image that the human eyes attend to, which act as the weighting basis for integrating the local information. The local information of both images is compiled to determine the ratio, which is used to compute the overall image assessment score.
The organization of the rest of the paper is as follows: Section 2 describes the VIF image assessment principles. Section 3 explains the theory, structure, and implementation of the proposed image assessment method, S-VIF. Section 4 presents the experimental results to demonstrate the feasibility of S-VIF. The conclusion is presented in Section 5.

2. VIF
The main structure of VIF is shown in Fig. 1. First, the Gaussian scale mixture (GSM) model [16] is used to transform natural images into a Gaussian vector random field. Image signals without distortion pass directly through the HVS channel and enter the brain, where perceptual information is extracted. If images are distorted, however, the reference image signals can be assumed to have passed through a distortion channel before entering the HVS channel. In this structure, VIF separately calculates two types of mutual information: the mutual information of the image before and after transmission through the HVS channel, called the reference image information, and the mutual information of the image before and after transmission through the distortion and HVS channels, called the distorted image information. Both pieces of information are used in the quality assessment process.

2.1. Source Model
A wavelet transform is first applied to decompose the reference image into multiple-scale horizontal and vertical subbands, and the GSM model is then used to transform each band of the reference image into a Gaussian vector random field. Each reference image band is divided into 3×3 non-overlapping blocks, and the nine coefficients in each block form a vector $\vec{C}_i$, so the overall band can be regarded as the vector random field $\mathcal{C} = \{\vec{C}_i : i \in I\}$, where $I$ denotes the set of spatial indices of the random field. Using the GSM model, we can further represent $\mathcal{C}$ as the product of two independent random fields, namely the positive scalar random field $\mathcal{S} = \{S_i : i \in I\}$ and the zero-mean Gaussian vector random field $\mathcal{U} = \{\vec{U}_i : i \in I\}$ with covariance matrix $\mathbf{C}_U$, as shown in (1). The original vector random field transformed through the GSM model thus possesses Gaussian distribution characteristics with zero mean and covariance matrix $S_i^2\mathbf{C}_U$ [16-18].

$$\mathcal{C} = \mathcal{S}\cdot\mathcal{U} = \{S_i\vec{U}_i : i \in I\} \qquad (1)$$

2.2. Distortion Channel
The wavelet transform is also applied to decompose the distorted image into multiple-scale horizontal and vertical subbands, and the distortion channel between each reference image band and the corresponding distorted image band is calculated. The distortion channel uses signal attenuation and additive noise in the wavelet domain to simulate image distortion, as shown in (2), where $\mathcal{C}$ is the reference image band random field, $\mathcal{D}$ is the corresponding distorted image band random field, $\mathcal{G} = \{g_i : i \in I\}$ is the scalar gain field that simulates signal attenuation, and $\mathcal{V} = \{\vec{V}_i : i \in I\}$ is the zero-mean additive Gaussian noise with covariance matrix $\mathbf{C}_V = \sigma_v^2\mathbf{I}$ that simulates image signal noise. The distorted image random field $\mathcal{D}$ can thus be represented as the reference image random field $\mathcal{C}$ passing through the distortion channel, and it follows a Gaussian distribution with covariance matrix $g_i^2 S_i^2\mathbf{C}_U + \sigma_v^2\mathbf{I}$.

$$\mathcal{D} = \mathcal{G}\,\mathcal{C} + \mathcal{V} = \{g_i\vec{C}_i + \vec{V}_i : i \in I\} \qquad (2)$$

2.3. Human Visual System Channel
The main purpose of the HVS channel is to quantify the uncertainty that increases as image signals pass through the HVS. This uncertainty is simulated with Gaussian noise in the wavelet domain, as shown in Eqs. (3) and (4), where $\mathcal{E}$ and $\mathcal{F}$ are respectively the visual signals extracted by the brain after the reference image $\mathcal{C}$ and the distorted image $\mathcal{D}$ pass through the HVS channel, and $\mathcal{N}$ and $\mathcal{N}'$ are zero-mean Gaussian noises whose covariance matrices $\mathbf{C}_N$ and $\mathbf{C}_{N'}$ can be represented using the HVS model parameter $\sigma_n^2$, as shown in (5). Because $\mathcal{E}$ and $\mathcal{F}$ are obtained by adding Gaussian noise to $\mathcal{C}$ and $\mathcal{D}$, they also possess Gaussian distribution characteristics; their covariance matrices are $S_i^2\mathbf{C}_U + \sigma_n^2\mathbf{I}$ and $g_i^2 S_i^2\mathbf{C}_U + (\sigma_v^2 + \sigma_n^2)\mathbf{I}$ [19].

$$\mathcal{E} = \mathcal{C} + \mathcal{N} \qquad (3)$$

$$\mathcal{F} = \mathcal{D} + \mathcal{N}' \qquad (4)$$

$$\mathbf{C}_N = \mathbf{C}_{N'} = \sigma_n^2\mathbf{I} \qquad (5)$$
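To make the channel models concrete, the following sketch simulates Eqs. (2)-(5) on a toy band. It is a minimal illustration, not part of the paper's pipeline; the gain g and the noise variances sigma_v2 and sigma_n2 are assumed values chosen only for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "wavelet band" of a reference image, standing in for C of Eq. (1).
C = rng.normal(size=(64, 64))

g = 0.8          # assumed scalar gain (signal attenuation)
sigma_v2 = 0.10  # assumed distortion-channel noise variance
sigma_n2 = 0.05  # assumed HVS-channel noise variance (sigma_n^2 of Eq. 5)

# Distortion channel, Eq. (2): D = g*C + V
D = g * C + rng.normal(scale=np.sqrt(sigma_v2), size=C.shape)

# HVS channel, Eqs. (3)-(4): both signals pick up visual noise N, N'
E = C + rng.normal(scale=np.sqrt(sigma_n2), size=C.shape)
F = D + rng.normal(scale=np.sqrt(sigma_n2), size=C.shape)
```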

2.4. VIF Image Assessment
The mutual information $I(\mathcal{C};\mathcal{E}\,|\,s)$ of $\mathcal{C}$ and $\mathcal{E}$ is calculated to estimate the amount of reference image information that is transmitted to the brain via the HVS channel. The mutual information $I(\mathcal{C};\mathcal{F}\,|\,s)$ of $\mathcal{C}$ and $\mathcal{F}$ is also computed to estimate the amount of distorted image information that is transmitted to the brain through the HVS channel, compared to that of the reference image. The two quantities are computed as shown in Eqs. (6) and (7), where $N$ is the number of local blocks in the image band and the conditioning on $s$ indicates that the original image signals have been transformed into the Gaussian vector random field through the GSM model, so that only the covariance structure needs to be considered in the mutual information. Furthermore, because the covariance matrix $\mathbf{C}_U$ is symmetric, its eigenvalues $\lambda_k$ can substitute for $\mathbf{C}_U$ in the calculation to reduce the computation load. The mutual information of all bands of the reference and distorted images is compiled to determine the ratio, which gives the VIF image assessment value, as shown in (8), where $B$ is the number of decomposed bands.

$$I(\mathcal{C};\mathcal{E}\,|\,s) = \frac{1}{2}\sum_{i=1}^{N}\sum_{k=1}^{9}\log_2\!\left(1 + \frac{s_i^2\lambda_k}{\sigma_n^2}\right) \qquad (6)$$

$$I(\mathcal{C};\mathcal{F}\,|\,s) = \frac{1}{2}\sum_{i=1}^{N}\sum_{k=1}^{9}\log_2\!\left(1 + \frac{g_i^2 s_i^2\lambda_k}{\sigma_v^2 + \sigma_n^2}\right) \qquad (7)$$

$$\mathrm{VIF} = \frac{\sum_{b=1}^{B} I(\mathcal{C}^b;\mathcal{F}^b\,|\,s^b)}{\sum_{b=1}^{B} I(\mathcal{C}^b;\mathcal{E}^b\,|\,s^b)} \qquad (8)$$

The parameters $\mathbf{C}_U$, $s_i^2$, $g_i$, $\sigma_v^2$, and $\sigma_n^2$ must be estimated in advance. $\mathbf{C}_U$ can be determined from the coefficients of each reference image band, as shown in (9), and $s_i^2$ can be estimated by maximum-likelihood estimation, as shown in (10). The distortion channel parameters $g_i$ and $\sigma_v^2$ can be determined by simple linear regression on the coefficients of the reference and distorted image bands, as shown in Eqs. (11) and (12), where $\mathrm{Cov}(\cdot,\cdot)$ denotes covariance estimated over a local neighborhood. Finally, the HVS parameter $\sigma_n^2$ could be tuned empirically for the optimum VIF estimation, but it is set as a constant according to [6].

$$\hat{\mathbf{C}}_U = \frac{1}{N}\sum_{i=1}^{N}\vec{C}_i\vec{C}_i^{\,T} \qquad (9)$$

$$\hat{s}_i^2 = \frac{1}{9}\,\vec{C}_i^{\,T}\hat{\mathbf{C}}_U^{-1}\vec{C}_i \qquad (10)$$

$$\hat{g}_i = \frac{\mathrm{Cov}(\vec{C}_i,\vec{D}_i)}{\mathrm{Cov}(\vec{C}_i,\vec{C}_i)} \qquad (11)$$

$$\hat{\sigma}_{v,i}^2 = \mathrm{Cov}(\vec{D}_i,\vec{D}_i) - \hat{g}_i\,\mathrm{Cov}(\vec{C}_i,\vec{D}_i) \qquad (12)$$
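Eqs. (6)-(12) translate almost directly into code. The sketch below is our reading of the computation for a single band pair, with simplified per-block regression estimates for $g_i$ and $\sigma_v^2$ in place of the exact local-window estimators of [6]; the function name, block handling, and the default sigma_n2 are assumptions for illustration.

```python
import numpy as np

def vif_band(ref_band, dist_band, sigma_n2=0.4, blk=3):
    """Single-band VIF information terms, following Eqs. (6)-(12).

    Simplified sketch: parameters are estimated per 3x3 block by plain
    regression rather than the exact local-window estimators of [6].
    Returns (distorted_info, reference_info) for this band.
    """
    h, w = ref_band.shape
    h, w = h - h % blk, w - w % blk

    def blocks(x):
        # Non-overlapping blk x blk blocks as 9-dimensional row vectors.
        x = x[:h, :w]
        return (x.reshape(h // blk, blk, w // blk, blk)
                 .transpose(0, 2, 1, 3).reshape(-1, blk * blk))

    Cv, Dv = blocks(ref_band), blocks(dist_band)

    # Eq. (9): covariance C_U of the Gaussian field from reference blocks.
    C_U = Cv.T @ Cv / Cv.shape[0]
    lam = np.clip(np.linalg.eigvalsh(C_U), 0.0, None)  # eigenvalues of C_U

    # Eq. (10): maximum-likelihood estimate of the local scalars s_i^2.
    s2 = np.einsum('ij,jk,ik->i', Cv, np.linalg.pinv(C_U), Cv) / (blk * blk)

    # Eqs. (11)-(12): per-block gain and distortion-noise variance.
    var_c = (Cv * Cv).mean(axis=1)
    cov_cd = (Cv * Dv).mean(axis=1)
    g = cov_cd / (var_c + 1e-10)
    sv2 = np.maximum((Dv * Dv).mean(axis=1) - g * cov_cd, 1e-10)

    # Eqs. (6)-(7): mutual information through the HVS channel.
    ref_info = 0.5 * np.log2(1 + s2[:, None] * lam / sigma_n2).sum()
    dist_info = 0.5 * np.log2(
        1 + (g ** 2 * s2)[:, None] * lam / (sv2[:, None] + sigma_n2)).sum()
    return dist_info, ref_info
```

Summing the two returned quantities over all decomposed bands and taking their ratio yields the VIF score of Eq. (8).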

3. S-VIF

The proposed VIF-based method is termed S-VIF, and its procedure is shown in Fig. 2. VIF can be divided into two steps: the first calculates the mutual information of the reference and distorted images, and the second compiles the mutual information to determine the ratio and the overall assessment result. The proposed scheme improves both steps.

[Figure 2. The proposed framework: the reference and distorted images each undergo a Haar wavelet transform to filter out high-frequency components and are decomposed by log-Gabor filters; distorted image features are extracted along the maximum response direction, (1) local information is measured based on VIF, and a spectral residual saliency map supplies the visual sensitivity weighting used in (2) pooling to produce the quality score.]

3.1. Mutual Image Information Calculation
3.1.1. Haar Wavelet Transform to Filter High-Frequency Components
The Haar wavelet transform comprises low-pass and high-pass filtering followed by downsampling. An image passed through the first-order Haar wavelet transform is decomposed into a low-pass image LL and three high-pass images HL, LH, and HH. The low-pass image contains most of the image information, while the high-pass images capture pixel-level changes relative to the original image. We discard the high-pass images, to which the human eyes are less sensitive, and use the low-pass images to focus the subsequent assessment on the important image information. In addition, the smaller image size reduces the computational complexity.
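Up to the usual normalization constant, the first-order Haar LL band is a 2×2 block average, so this filtering step reduces to a few lines; a minimal sketch for a grayscale array (the helper name is ours):

```python
import numpy as np

def haar_ll(img):
    """First-order Haar low-pass (LL) band as 2x2 block averages.

    The HL/LH/HH detail bands are discarded; each dimension is halved,
    which also reduces the cost of all later processing. The orthonormal
    Haar LL band equals this result up to a constant scale factor.
    """
    img = img[: img.shape[0] // 2 * 2, : img.shape[1] // 2 * 2].astype(float)
    return (img[0::2, 0::2] + img[0::2, 1::2] +
            img[1::2, 0::2] + img[1::2, 1::2]) / 4.0
```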

[Figure 3. Log-Gabor filter masks in the frequency domain: (a) scale 1, (b) scale 2, (c) scale 3, (d) scale 4.]

3.1.2. Log-Gabor Filter to Extract Distorted Image Features
The log-Gabor filter uses the high- and low-frequency characteristics of images to decompose their spectra into bands of varying scales and orientations. All bands are then transformed back to the spatial domain to detect image features. The transfer function of the log-Gabor filter in the frequency domain, $G(f,\theta)$, is shown in (13):

$$G(f,\theta) = \exp\!\left(-\frac{\left(\log(f/f_0)\right)^2}{2\left(\log(\sigma_f/f_0)\right)^2}\right)\exp\!\left(-\frac{(\theta-\theta_0)^2}{2\sigma_\theta^2}\right) \qquad (13)$$

The first term determines the radial profile of the filter, where $\sigma_f$ sets the bandwidth and $f_0$ is the center frequency, which can be adjusted to generate filters of varying scales. The second term determines the angular profile, where $\sigma_\theta$ is the angular bandwidth and $\theta_0$ is the orientation angle, which can be adjusted to generate filters of varying directions. Because the human eyes are more sensitive to horizontal and vertical image features, the image spectrum is decomposed into horizontal and vertical subbands at four scales by log-Gabor filters, as shown in Fig. 3, to detect the horizontal and vertical features in the spatial domain.
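Equation (13) can be realized as a multiplicative mask on the centered frequency grid. The sketch below builds one scale/orientation mask; the default parameter values are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np

def log_gabor_mask(rows, cols, f0=0.1, sigma_ratio=0.55,
                   theta0=0.0, sigma_theta=np.pi / 6):
    """Log-Gabor transfer function of Eq. (13) on a centered FFT grid.

    f0:          center frequency (cycles/pixel); smaller -> coarser scale.
    sigma_ratio: sigma_f / f0, fixing the radial (log-frequency) bandwidth.
    theta0:      filter orientation (0 = horizontal frequency axis).
    sigma_theta: angular bandwidth.
    """
    fy = np.fft.fftshift(np.fft.fftfreq(rows))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(cols))[None, :]
    f = np.hypot(fx, fy)
    f[f == 0] = 1e-9                      # avoid log(0) at the DC bin
    radial = np.exp(-(np.log(f / f0) ** 2) /
                    (2 * np.log(sigma_ratio) ** 2))
    theta = np.arctan2(fy, fx)
    dtheta = np.angle(np.exp(1j * (theta - theta0)))  # wrapped angle diff
    angular = np.exp(-(dtheta ** 2) / (2 * sigma_theta ** 2))
    return radial * angular
```

Multiplying the shifted image spectrum by four such masks per orientation (horizontal and vertical) and inverse transforming yields the spatial-domain band responses used next.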

When conducting subjective image assessment, the human eyes first observe salient areas in the distorted image and compare those features with the ones at identical locations in the reference image to determine the degree of distortion. Therefore, the saliencies of the distorted image must be considered before assessing images. The log-Gabor filter is first applied to the distorted image spectrum. We compare the responses of the horizontal and vertical subbands of the same scale in the spatial domain; at each location, the larger response is taken as the pixel response of that scale. Integrating the responses from the horizontal and vertical subbands of the same scale, shown in Figs. 4(b) and (c), produces the maximum band response, shown in Fig. 4(d). Subsequently, the log-Gabor filter is also used to decompose the reference image spectrum following the same procedure, and the horizontal and vertical subband responses of the same scale are integrated into a band response that corresponds, location by location, with the maximum response determined from the distorted image, as shown in Fig. 4(h).

[Figure 4. Distorted and reference image feature extraction: (a) distorted image, (b) horizontal response, (c) vertical response, (d) maximum response; (e) reference image, (f) horizontal response, (g) vertical response, (h) correspondence response.]

With this method, we integrate the four scales of horizontal and vertical subband responses of the distorted and reference images into four scales of maximum band responses and corresponding band responses. It should be noted that the original VIF uses a wavelet transform to decompose the reference and distorted images into four scales of horizontal and vertical subband responses and directly calculates the mutual information of each band in the wavelet domain, without first extracting the salient distorted image features and the corresponding reference image features. The original VIF therefore does not account for the human eyes being more sensitive to distorted image features, and improvement is warranted.
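The selection step of Fig. 4 is a per-pixel comparison; a minimal sketch, assuming the spatial-domain response magnitudes of one scale have already been computed:

```python
import numpy as np

def max_direction_responses(dist_h, dist_v, ref_h, ref_v):
    """Per-pixel maximum of the distorted horizontal/vertical responses
    (Fig. 4(d)), plus the reference response taken from the SAME
    direction that won on the distorted side (Fig. 4(h))."""
    use_h = np.abs(dist_h) >= np.abs(dist_v)   # direction chosen per pixel
    dist_max = np.where(use_h, dist_h, dist_v)
    ref_corr = np.where(use_h, ref_h, ref_v)
    return dist_max, ref_corr
```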

3.1.3. VIF Calculation of Local Mutual Information of Images
The VIF method described above is used to calculate the local mutual information between each scale of the maximum band response of the distorted image and the corresponding reference image band response captured in the previous step. Assuming that the reference image band response is transformed into the Gaussian vector random field $\mathcal{C}$ by the GSM model, we use the reference and distorted image band responses to calculate the distortion channel, which reflects the degree of distortion between the two images. $\mathcal{C}$ can therefore be combined with the distortion channel to produce the distorted image band response $\mathcal{D}$, while $\mathcal{E}$ and $\mathcal{F}$ are the respective signals delivered to the brain after $\mathcal{C}$ and $\mathcal{D}$ pass through the HVS channel. We use $\vec{C}_i$, $\vec{D}_i$, $\vec{E}_i$, and $\vec{F}_i$ to represent the individual vectors of the random fields $\mathcal{C}$, $\mathcal{D}$, $\mathcal{E}$, and $\mathcal{F}$. By calculating the mutual information $I(\vec{C}_i;\vec{E}_i\,|\,s_i)$, the amount of local reference image information transmitted to the brain through the HVS channel can be estimated; by calculating the mutual information $I(\vec{C}_i;\vec{F}_i\,|\,s_i)$ of $\vec{C}_i$ and $\vec{F}_i$, the amount of local distorted image information transmitted to the brain through the HVS channel can likewise be estimated and compared to that of the reference image. The calculation of mutual information in the proposed method is identical to VIF. However, VIF uses $I(\mathcal{C};\mathcal{E}\,|\,s)$ and $I(\mathcal{C};\mathcal{F}\,|\,s)$ to represent the amount of information transmitted to the brain from the overall band responses of the reference and distorted images. By contrast, to account for the sensitivity of the human eyes, we retain $I(\vec{C}_i;\vec{E}_i\,|\,s_i)$ and $I(\vec{C}_i;\vec{F}_i\,|\,s_i)$ as the local information of the band responses of the reference and distorted images to facilitate the subsequent processing.

3.2. Integrating Overall Assessment Results

3.2.1. Using the Spectral Residual Approach to Detect Object Regions

[Figure 5. Object region extraction based on the spectral residual approach: (a) reference image, (b) saliency map, (c) object feature image, (d) overlapped image.]

Although there are various saliency detection algorithms [36-37], the spectral residual approach [15] is included in the proposed method because, in our empirical tests, it efficiently focuses the image assessment on visually significant areas. Since the human eyes tend to be less sensitive to repeated backgrounds in images and more sensitive to unexpected objects, the spectral residual approach removes the repeated and invariable parts of the image spectrum and treats the residual as the unexpected part, identifying the object regions that the human eyes care about more. First, based on the human visual characteristics, the longer side of the reference image is resized to 64 pixels, with the other side resized proportionally. The resized image is transformed by the Fourier transform, whose real and imaginary parts give the amplitude $A(f)$ and phase $P(f)$. We take $L(f)$ as the logarithm of the amplitude $A(f)$, and $\bar{A}(f)$ as $L(f)$ further passed through an averaging low-pass filter $h_n(f)$ to model the consistent tendency of log spectra across natural images. The spectral residual $R(f)$ is determined by deducting $\bar{A}(f)$ from $L(f)$, as shown in (14):

$$R(f) = L(f) - h_n(f) * L(f) \qquad (14)$$

The spectral residual $R(f)$ retains the original phase $P(f)$, and an inverse Fourier transform is conducted before smoothing with a Gaussian filter $g(x)$ to generate the saliency map of the resized image in the spatial domain, as shown in (15):

$$S(x) = g(x) * \left\| \mathcal{F}^{-1}\!\left[\exp\!\left(R(f) + jP(f)\right)\right] \right\|^2 \qquad (15)$$

$S(x)$ is then upsampled back to the original size to obtain the saliency map $S_M(x)$ of the reference image [15][21-22]. Finally, we use 1.5 times the average pixel value of the saliency map $S_M(x)$ as the threshold to extract the regions containing objects and generate the object map $O_M(x)$, as shown in (16):

$$O_M(x) = \begin{cases} 1, & S_M(x) > 1.5\,\overline{S_M} \\ 0, & \text{otherwise} \end{cases} \qquad (16)$$

By overlapping the object map $O_M(x)$ with the reference image, we find that the spectral residual approach effectively extracts the regions that include objects, as shown in Fig. 5. We further use the object map $O_M(x)$ as the basis for weighting in the image assessment, and set the weight of the object region to twice that of the non-object region to produce the integrated weighting map $W(x)$, as shown in (17). By contrast, the original VIF compiles the mutual information of both images and determines the ratio without considering the varying sensitivities of the human eyes to local image information.

$$W(x) = \begin{cases} 2, & O_M(x) = 1 \\ 1, & O_M(x) = 0 \end{cases} \qquad (17)$$
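Eqs. (14)-(17) map onto a few lines of NumPy/SciPy. In the sketch below, the 3×3 averaging filter follows [15], while the Gaussian sigma and the resampling details are our assumptions:

```python
import numpy as np
from scipy.ndimage import uniform_filter, gaussian_filter, zoom

def object_weight_map(ref_img):
    """Spectral-residual saliency (Eqs. 14-15), object map (Eq. 16),
    and the 2x/1x weighting map (Eq. 17) for a grayscale image."""
    scale = 64.0 / max(ref_img.shape)                 # longer side -> 64 px
    small = zoom(ref_img.astype(float), scale)
    F = np.fft.fft2(small)
    L = np.log(np.abs(F) + 1e-9)                      # log amplitude L(f)
    R = L - uniform_filter(L, size=3)                 # Eq. (14): residual
    sal = np.abs(np.fft.ifft2(np.exp(R + 1j * np.angle(F)))) ** 2
    sal = gaussian_filter(sal, sigma=2.5)             # Eq. (15): saliency
    sal = zoom(sal, (ref_img.shape[0] / sal.shape[0],
                     ref_img.shape[1] / sal.shape[1]))  # back to full size
    obj = sal > 1.5 * sal.mean()                      # Eq. (16): object map
    return np.where(obj, 2.0, 1.0)                    # Eq. (17): weights
```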



3.2.2. Image Assessment Results
The weighting map $W$ serves as the integration weight for the local image information. The local band-response information of both images is compiled to determine the ratio and obtain the overall image assessment score, as shown in (18), where $B$ and $N$ represent the number of scales decomposed in the image and the number of local blocks in each scale, respectively. Again, the conditioning on $s$ indicates that the calculation is based on transforming the image into the Gaussian vector random field.

$$\text{S-VIF} = \frac{\sum_{b=1}^{B}\sum_{i=1}^{N} W_i\, I(\vec{C}_i^{\,b};\vec{F}_i^{\,b}\,|\,s_i^{\,b})}{\sum_{b=1}^{B}\sum_{i=1}^{N} W_i\, I(\vec{C}_i^{\,b};\vec{E}_i^{\,b}\,|\,s_i^{\,b})} \qquad (18)$$
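The pooling of Eq. (18) then reduces to a weighted sum ratio; a sketch, assuming the per-block information values and weights have been arranged on matching block grids per scale:

```python
def svif_score(dist_info_maps, ref_info_maps, weight_maps):
    """Eq. (18): weighted ratio of local distorted/reference information.

    Each argument is a list over scales; entries are 2-D arrays on the
    block grid of that scale (the weight map resampled to match).
    """
    num = sum((w * d).sum() for w, d in zip(weight_maps, dist_info_maps))
    den = sum((w * r).sum() for w, r in zip(weight_maps, ref_info_maps))
    return num / den
```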

TABLE I. FREQUENTLY EMPLOYED IMAGE DATABASES

Database   Reference Images   Distorted Images   Distortion Types   Image Type   Observers
TID2008    25                 1700               17                 color        838
LIVE       29                 779                5                  color        161
CSIQ       30                 866                6                  color        35
MICT       14                 168                2                  color        16
IVC        10                 185                4                  color        15
A57        3                  54                 6                  gray         7
WIQ        7                  80                 5                  gray         60
CCID2014   15                 655                1                  color        22

4. Experimental Results
The literature has frequently employed eight image databases that contain subjective scores, as shown in Table I. These databases are widely used to test the effectiveness of image assessment methods on distorted images, each of which suffers from one type of distortion. We use these image databases to compare and verify our algorithm. Each database is briefly introduced below.
1. The TID2008 image database [10] is collaboratively developed by groups in Finland, Italy, and Ukraine and contains 25 reference images. These reference images are used to generate 1,700 distorted images covering 17 distortion types, each at four levels. TID2008 includes the most distorted images and distortion types.
2. The LIVE image database [11] is developed by the University of Texas at Austin, U.S. Twenty-nine reference images are used to generate 779 distorted images of 5 distortion types.
3. The CSIQ image database [12] is developed by Oklahoma State University, U.S. Thirty reference images are used to generate 866 distorted images of 6 distortion types.
4. The MICT image database [23] is developed by the University of Toyama, Japan, and is composed of 168 distorted images (JPEG and JPEG2000 compression distortion).
5. The IVC image database [24] is developed by the Polytechnic University of Nantes, France. It contains 10 reference images and 185 distorted images of 4 distortion types.
6. The A57 image database [25] is developed by Cornell University, U.S. It consists of 54 distorted images of 6 distortion types.
7. The WIQ image database [26] is developed by the Blekinge Institute of Technology, Sweden, and comprises 7 reference images and 80 distorted images. The distorted images are primarily generated by simulating image signals passing through transmission channels.
8. The CCID2014 database [38] is completed at Shanghai Jiao Tong University and considers contrast distortion exclusively. It includes 655 images produced by applying eight kinds of transfer mappings to 15 source images: negative and positive gamma transfers, convex and concave arcs, cubic and logistic functions, mean shifting, and compound functions with mean shifting before the logistic transfer.
Every image in these databases has a subjective score.

According to the subjective scores and the objective scores of an image assessment method, four types of evaluation criteria determine the performance of the method. The root-mean-square error (RMSE) and the Pearson linear correlation coefficient (PLCC) measure the accuracy of the assessment, while the Kendall rank correlation coefficient (KRCC) and the Spearman rank correlation coefficient (SRCC) measure the monotonicity between the objective assessments and the subjective scores [27-28]. When evaluating the accuracy of image assessment, the ranges of the subjective and objective scores differ from metric to metric. To allow fairer comparisons, a nonlinear mapping of objective scores onto subjective scores is usually performed. Assuming that $s_k$ is the subjective score in the image database and $q_k$ is the objective score of an image assessment, $\hat{q}_k$ is the objective score after the nonlinear mapping:

$$\hat{q}_k = \beta_1\!\left(\frac{1}{2} - \frac{1}{1 + e^{\beta_2 (q_k - \beta_3)}}\right) + \beta_4 q_k + \beta_5 \qquad (19)$$

The nonlinear regression over the parameters $\beta_1,\ldots,\beta_5$ allows for optimal correlation between the mapped objective scores and the subjective scores [27-28].
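In practice the five-parameter fit of Eq. (19) can be carried out with scipy.optimize.curve_fit; a sketch (the initial guesses are our assumption):

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic_map(q, b1, b2, b3, b4, b5):
    """Five-parameter mapping of Eq. (19), objective -> subjective scale."""
    return b1 * (0.5 - 1.0 / (1.0 + np.exp(b2 * (q - b3)))) + b4 * q + b5

def fit_mapping(q, s):
    """Fit the mapping on (objective q, subjective s) pairs; return q_hat."""
    p0 = [np.ptp(s), 0.1, np.mean(q), 0.0, np.mean(s)]  # rough initial guess
    params, _ = curve_fit(logistic_map, q, s, p0=p0, maxfev=20000)
    return logistic_map(q, *params)
```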

TABLE II. S-VIF IMPROVEMENT COMPARISON

Model               RMSE     PLCC     KRCC     SRCC
VIF                 0.7899   0.8084   0.5860   0.7491
Haar transform      0.6794   0.8624   0.6685   0.8461
Log-Gabor filter    0.6620   0.8699   0.6678   0.8387
Spectral residual   0.7773   0.8152   0.5935   0.7598
S-VIF               0.5725   0.9044   0.7226   0.8945

We use the subjective score $s_k$ and the nonlinearly mapped objective score $\hat{q}_k$ to calculate the RMSE and PLCC, as shown in Eqs. (20) and (21), where a lower RMSE or higher PLCC indicates a more accurate image assessment. Furthermore, the ranks of the subjective scores $s_k$ and objective scores $q_k$ are used to calculate the KRCC and SRCC, as shown in Eqs. (22) and (23), where a higher KRCC or SRCC indicates better monotonicity between the image assessment and the subjective scores. Here $n$ denotes the number of test images in the database, $n_c$ and $n_d$ denote the numbers of concordant and discordant rank pairs between the subjective and objective scores, respectively, and $d_k$ denotes the rank difference between the subjective and objective scores of the $k$-th image.

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{k=1}^{n}\left(s_k - \hat{q}_k\right)^2} \qquad (20)$$

$$\mathrm{PLCC} = \frac{\sum_{k=1}^{n}(s_k - \bar{s})(\hat{q}_k - \bar{\hat{q}})}{\sqrt{\sum_{k=1}^{n}(s_k - \bar{s})^2}\sqrt{\sum_{k=1}^{n}(\hat{q}_k - \bar{\hat{q}})^2}} \qquad (21)$$

$$\mathrm{KRCC} = \frac{n_c - n_d}{\frac{1}{2}\,n(n-1)} \qquad (22)$$

$$\mathrm{SRCC} = 1 - \frac{6\sum_{k=1}^{n} d_k^2}{n(n^2 - 1)} \qquad (23)$$
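All four criteria are readily computed with NumPy/SciPy; a brief sketch:

```python
import numpy as np
from scipy import stats

def evaluate(s, q_hat):
    """RMSE/PLCC (accuracy, Eqs. 20-21) and KRCC/SRCC (monotonicity,
    Eqs. 22-23) between subjective scores s and mapped objective q_hat."""
    rmse = np.sqrt(np.mean((s - q_hat) ** 2))
    plcc = stats.pearsonr(s, q_hat)[0]
    krcc = stats.kendalltau(s, q_hat)[0]
    srcc = stats.spearmanr(s, q_hat)[0]
    return rmse, plcc, krcc, srcc
```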

To confirm that the incorporated visual sensitivity characteristics improve the image assessment, we examine the difference in assessment before and after each VIF improvement. The results for the TID2008 image database [10] are shown in Table II. We individually test the original VIF; the VIF combined with the Haar wavelet transform [13]; the VIF combined with the log-Gabor filter [14], which considers the features of distorted images first; the VIF combined with the spectral residual approach [15], which detects the image object regions; and the S-VIF developed in this research. The results show that considering the features of distorted images first and detecting image object regions effectively improve the VIF image assessment and generate results closer to the image quality perceived by the human eyes.

TABLE III. COMPARISON OF S-VIF WITH COMMONLY EMPLOYED ASSESSMENT METHODS

Database  Criteria  PSNR     UQI      SSIM     MS-SSIM  VIF      IW-SSIM  RFSIM    FSIM     LTG      ADD-SSIM  S-VIF
TID2008   RMSE      1.0994   1.0031   0.8511   0.7173   0.7899   0.6895   0.6746   0.6525   0.6158   0.6222    0.5725
          PLCC      0.5734   0.6643   0.7732   0.8451   0.8084   0.8579   0.8645   0.8738   0.8885   0.8860    0.9044
          KRCC      0.4027   0.4255   0.5768   0.6568   0.5860   0.6636   0.6780   0.6946   0.7240   0.6994    0.7226
          SRCC      0.5531   0.5851   0.7749   0.8542   0.7491   0.8559   0.8680   0.8805   0.9055   0.8805    0.8945
LIVE      RMSE      13.360   11.982   8.946    8.6188   7.6137   8.3473   9.6642   7.6780   8.2482   7.7744    8.3145
          PLCC      0.8723   0.8987   0.9449   0.9489   0.9604   0.9522   0.9354   0.9597   0.9533   0.9587    0.9526
          KRCC      0.6865   0.7100   0.7963   0.8045   0.8282   0.8175   0.7816   0.8337   0.8191   0.8358    0.8232
          SRCC      0.8756   0.8941   0.9479   0.9513   0.9636   0.9567   0.9401   0.9634   0.9579   0.9646    0.9558
CSIQ      RMSE      0.1575   0.1460   0.1334   0.1149   0.0980   0.1063   0.1042   0.1077   0.0782   0.0958    0.0955
          PLCC      0.8000   0.8312   0.8613   0.8991   0.9277   0.9144   0.9179   0.9120   0.9546   0.9311    0.9316
          KRCC      0.6084   0.6188   0.6907   0.7393   0.7537   0.7529   0.7645   0.7567   0.8192   0.7697    0.7767
          SRCC      0.8058   0.8098   0.8756   0.9133   0.9195   0.9213   0.9295   0.9242   0.9602   0.9330    0.9342
MICT      RMSE      0.9585   0.8731   0.5738   0.5640   0.5084   0.4761   0.7857   0.5248   0.6625   0.6105    0.5663
          PLCC      0.6429   0.7164   0.8887   0.8927   0.9138   0.9248   0.7783   0.9078   0.8484   0.8729    0.8917
          KRCC      0.4443   0.5227   0.6939   0.7029   0.7315   0.7537   0.5752   0.7302   0.6446   0.6872    0.6980
          SRCC      0.6132   0.7028   0.8794   0.8874   0.9077   0.9202   0.7731   0.9059   0.8422   0.8718    0.8851
IVC       RMSE      0.8460   0.6792   0.4999   0.5029   0.5239   0.4686   0.6684   0.4236   0.4730   0.5573    0.5191
          PLCC      0.7196   0.8302   0.9119   0.9108   0.9028   0.9231   0.8361   0.9376   0.9216   0.8892    0.9047
          KRCC      0.5218   0.6252   0.7223   0.7203   0.7158   0.7339   0.6452   0.7564   0.7357   0.6912    0.7133
          SRCC      0.6884   0.8244   0.9018   0.8980   0.8964   0.9125   0.8192   0.9262   0.9124   0.8804    0.8960
A57       RMSE      0.1737   0.1897   0.1469   0.1253   0.1784   0.1054   0.1305   0.0844   0.2457   0.1425    0.0839
          PLCC      0.7073   0.6356   0.8017   0.8603   0.6915   0.9034   0.8475   0.9393   0.8540   0.8146    0.9399
          KRCC      0.4309   0.3330   0.6058   0.6478   0.4589   0.6842   0.6324   0.7639   0.7275   0.6786    0.7737
          SRCC      0.6189   0.4260   0.8066   0.8414   0.6223   0.8709   0.8215   0.9181   0.8973   0.8681    0.9232
WIQ       RMSE      14.138   16.416   13.805   13.449   14.873   12.677   13.424   11.895   11.943   12.7836   11.085
          PLCC      0.7939   0.6974   0.7980   0.8095   0.7605   0.8329   0.8103   0.8546   0.8533   0.8298    0.8751
          KRCC      0.4626   0.4360   0.5569   0.5740   0.5246   0.6038   0.5493   0.6215   0.6120   0.5949    0.6633
          SRCC      0.6257   0.6084   0.7261   0.7495   0.6918   0.7865   0.7368   0.8006   0.7944   0.7681    0.8330
CCID2014  RMSE      0.6439   0.4531   0.3689   0.3488   0.3349   0.3606   0.3979   0.3758   0.3555   0.2878    0.3187
          PLCC      0.1744   0.7210   0.8256   0.8458   0.8589   0.8342   0.7936   0.8183   0.8393   0.8980    0.8732
          KRCC      0.4834   0.5413   0.6063   0.6236   0.6419   0.5898   0.5386   0.5705   0.5944   0.6924    0.6613
          SRCC      0.6743   0.7182   0.8136   0.8271   0.8349   0.7811   0.7305   0.7654   0.7907   0.8767    0.8583

Because S-VIF employs these two visual sensitivity characteristics, its accuracy is significantly improved over the original VIF. We next compare commonly employed image assessment methods, namely PSNR, the universal quality index (UQI) [30], SSIM [5], multi-scale SSIM (MS-SSIM) [31], VIF [6], IW-SSIM [7], the Riesz-transform based feature similarity metric (RFSIM) [32], FSIM [8], LTG [39] and ADD-SSIM [40], with S-VIF. The test results for the eight image databases are shown in Table III. The best results of each assessment indicator are presented in italic boldface and the second best results in boldface. The test results show that the proposed S-VIF achieves the best assessment on most image databases, with the most occurrences in the top two ranks.
We further use the test results in Table III to compare the overall performance. Previous studies typically employ two methods to calculate an overall assessment.

TABLE IV. COMPARISON OF AVERAGE PERFORMANCES

           Direct Average              Database Size-Weighted Average
Model      PLCC     KRCC     SRCC      PLCC     KRCC     SRCC
PSNR       0.6819   0.5051   0.6605    0.6855   0.5113   0.6250
UQI        0.6961   0.5266   0.7494    0.7143   0.5401   0.7545
SSIM       0.8407   0.6561   0.8507    0.8387   0.6516   0.8385
MS-SSIM    0.8653   0.6837   0.8765    0.8795   0.6963   0.8777
VIF        0.8232   0.6551   0.8530    0.8412   0.6768   0.8708
IW-SSIM    0.8756   0.6999   0.8929    0.8788   0.7022   0.8870
RFSIM      0.8273   0.6456   0.8480    0.8639   0.6843   0.8712
FSIM       0.8855   0.7159   0.9004    0.8884   0.7160   0.8923
LTG        0.8826   0.7096   0.8891    0.9042   0.7355   0.9042
ADD-SSIM   0.8804   0.7062   0.8850    0.9022   0.7327   0.9069
S-VIF      0.8969   0.7290   0.9092    0.9058   0.7398   0.9129

TABLE V. COMPARISON OF ASSESSMENT IN VARIOUS DISTORTION TYPES

KRCC
Distortion Type   VIF      IW-SSIM  FSIM     LTG      ADD-SSIM  S-VIF
JPEG2000          0.8039   0.7981   0.8079   0.8315   0.8350    0.8172
JPEG              0.7956   0.7981   0.8002   0.7799   0.8082    0.8010
Gauss blur        0.8220   0.8079   0.8320   0.7917   0.7831    0.8341
AWGN              0.8067   0.7204   0.7430   0.8460   0.8314    0.8158
Contrast change   0.6661   0.6249   0.6221   0.5982   0.7086    0.7088

SRCC
Distortion Type   VIF      IW-SSIM  FSIM     LTG      ADD-SSIM  S-VIF
JPEG2000          0.9360   0.9325   0.9384   0.9524   0.9553    0.9442
JPEG              0.9336   0.9443   0.9442   0.9244   0.9404    0.9368
Gauss blur        0.9382   0.9319   0.9480   0.9234   0.9080    0.9434
AWGN              0.9434   0.8771   0.9079   0.9626   0.9524    0.9493
Contrast change   0.8627   0.7884   0.7852   0.7823   0.8842    0.8871

TABLE VI. COMPARISON OF COMPUTATIONAL COMPLEXITY (SECONDS/IMAGE)

Model   PSNR     UQI      SSIM     MS-SSIM  VIF      IW-SSIM  RFSIM    FSIM     LTG      ADD-SSIM  S-VIF
Time    0.0013   0.0488   0.0208   0.0977   0.6994   0.3698   0.0899   0.3241   0.0219   0.1797    0.2725

The first method averages the results of all image databases directly, and the second computes a weighted average using the database sizes as weights. We use these two methods separately to calculate the overall assessment of the proposed S-VIF and the other commonly employed methods on the eight image databases, as shown in Table IV, where the best scores are presented in italic boldface. Table IV demonstrates that S-VIF achieves better accuracy on average than IW-SSIM, FSIM, LTG and ADD-SSIM, which are the best-performing existing methods.
To analyze the behavior on individual distortion types, we consider the distortion types that occur most frequently across the eight image databases: JPEG2000, JPEG, Gaussian blurring, additive white Gaussian noise (AWGN), and contrast change. We average the test results of each distortion type over the image databases to determine the performance of the methods, as shown in Table V; again, the best results are presented in italic boldface. The results demonstrate that the SRCC and KRCC scores of S-VIF remain consistently high across the various distortion types.
Finally, we compare the computational complexity of the methods. The hardware is an Intel(R) Core(TM) i7-4790 CPU @ 3.60 GHz with 12 GB RAM, and Matlab R2010a is used to conduct the simulations and tests. Table VI shows the average time required to evaluate a 512×512 image using S-VIF and the other commonly employed image assessment methods. S-VIF not only demonstrates significantly better effectiveness, but also has lower computational complexity than VIF, IW-SSIM and FSIM. In addition, S-VIF assesses more accurately than LTG and ADD-SSIM with only a slight tradeoff in speed.

5. Conclusions
This research presented an image assessment method, S-VIF, that improves the original VIF. S-VIF employs the Haar wavelet transform to filter out high-frequency components and the log-Gabor filter to consider the salient features of distorted images first. The spectral residual approach is then adopted to detect image object regions and focus the assessment on visually significant areas. The test results for various image databases show that S-VIF achieves considerably improved accuracy of image quality assessment with reduced computational complexity, when compared to the original VIF. Furthermore, the effectiveness and efficiency of the overall image assessments of S-VIF surpass those of existing work in extensive experiments.

ACKNOWLEDGMENT
This research was supported by the Ministry of Science and Technology of the Republic of China under Grants MOST 103-2221-E-027-039 and MOST 104-2221-E-008-075.

REFERENCES
[1] T. N. Pappas, R. J. Safranek, and J. Chen, "Perceptual criteria for image quality evaluation," in A. Bovik (ed.), Handbook of Image and Video Processing, Academic Press, 2005.
[2] Z. Wang and A. C. Bovik, Modern Image Quality Assessment, Morgan & Claypool Publishers, Mar. 2006.
[3] T.-J. Liu, Y.-C. Lin, W. Lin, and C.-C. Jay Kuo, "Visual quality assessment: recent developments, coding applications and future trends," APSIPA Trans. on Signal and Information Processing, Jul. 2013.
[4] J. Preiss, F. Fernandes, and P. Urban, "Color-image quality assessment: From prediction to optimization," IEEE Trans. on Image Processing, vol. 23, pp. 1366-1378, Mar. 2014.
[5] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Trans. on Image Processing, vol. 13, no. 4, pp. 600-612, Apr. 2004.
[6] H. R. Sheikh and A. C. Bovik, "Image information and visual quality," IEEE Trans. on Image Processing, vol. 15, no. 2, pp. 430-444, Feb. 2006.
[7] Z. Wang and Q. Li, "Information content weighting for perceptual image quality assessment," IEEE Trans. on Image Processing, vol. 20, no. 5, pp. 1185-1198, May 2011.
[8] L. Zhang, L. Zhang, X. Mou, and D. Zhang, "FSIM: A feature similarity index for image quality assessment," IEEE Trans. on Image Processing, vol. 20, no. 8, pp. 2378-2386, Aug. 2011.
[9] P. Kovesi, "Image features from phase congruency," Videre: Journal of Computer Vision Research, vol. 1, no. 3, pp. 1-26, 1999.
[10] N. Ponomarenko, V. Lukin, A. Zelensky, K. Egiazarian, M. Carli, and F. Battisti, "TID2008 - A database for evaluation of full-reference visual quality assessment metrics," Advances of Modern Radioelectronics, vol. 10, pp. 30-45, 2009.
[11] H. R. Sheikh, Z. Wang, L. Cormack, and A. C. Bovik, LIVE Image Quality Assessment Database Release 2, 2005. [Online]. Available: http://live.ece.utexas.edu/research/quality
[12] E. C. Larson and D. M. Chandler, Categorical Image Quality (CSIQ) Database, 2009. [Online]. Available: http://vision.okstate.edu/csiq
[13] M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies, "Image coding using wavelet transform," IEEE Trans. on Image Processing, vol. 1, no. 2, pp. 205-220, Apr. 1992.
[14] D. J. Field, "Relations between the statistics of natural images and the response properties of cortical cells," Journal of the Optical Society of America A, vol. 4, no. 12, pp. 2379-2394, Dec. 1987.
[15] X. Hou and L. Zhang, "Saliency detection: A spectral residual approach," IEEE Conf. on Computer Vision and Pattern Recognition, pp. 1-8, Jun. 2007.
[16] M. J. Wainwright and E. P. Simoncelli, "Scale mixtures of Gaussians and the statistics of natural images," Advances in Neural Information Processing Systems, vol. 12, pp. 855-861, May 2000.
[17] M. J. Wainwright, E. P. Simoncelli, and A. S. Willsky, "Random cascades on wavelet trees and their use in analyzing and modeling natural images," Applied and Computational Harmonic Analysis, vol. 11, pp. 89-123, 2001.
[18] J. Portilla, V. Strela, M. J. Wainwright, and E. P. Simoncelli, "Image denoising using scale mixtures of Gaussians in the wavelet domain," IEEE Trans. on Image Processing, vol. 12, no. 11, pp. 1338-1351, Nov. 2003.
[19] E. P. Simoncelli and B. A. Olshausen, "Natural image statistics and neural representation," Annual Review of Neuroscience, vol. 24, pp. 1193-1216, May 2001.
[20] H. R. Sheikh, A. C. Bovik, and G. de Veciana, "An information fidelity criterion for image quality assessment using natural scene statistics," IEEE Trans. on Image Processing, vol. 14, no. 12, pp. 2117-2128, Dec. 2005.
[21] Q. Ma and L. Zhang, "Image quality assessment with visual attention," Proc. IEEE Int. Conf. on Pattern Recognition, pp. 1-4, Dec. 2008.
[22] A. Guo, D. Zhao, S. Liu, X. Fan, and W. Gao, "Visual attention based image quality assessment," IEEE Intl. Conf. on Image Processing, pp. 3297-3300, Sept. 2011.
[23] Y. Horita, K. Shibata, Y. Kawayoke, and Z. M. P. Sazzad, MICT Image Quality Evaluation Database, 2000. [Online]. Available: http://mict.eng.u-toyama.ac.jp/mictdb.html
[24] A. Ninassi, P. Le Callet, and F. Autrusseau, Subjective Quality Assessment - IVC Database, 2005. [Online]. Available: http://www2.irccyn.ec-nantes.fr/ivcdb/
[25] D. M. Chandler and S. S. Hemami, A57 Database, 2007. [Online]. Available: http://foulard.ece.cornell.edu/dmc27/vsnr/vsnr.html
[26] U. Engelke, M. Kusuma, H.-J. Zepernick, and M. Caldera, "Reduced-reference metric design for objective perceptual quality assessment in wireless imaging," Signal Processing: Image Communication, vol. 24, pp. 525-547, 2009.
[27] P. Corriveau et al., "Video quality experts group: Current results and future directions," SPIE Visual Communication and Image Processing, vol. 4067, Jun. 2000.
[28] H. R. Sheikh, M. F. Sabir, and A. C. Bovik, "A statistical evaluation of recent full reference image quality assessment algorithms," IEEE Trans. on Image Processing, vol. 15, no. 11, pp. 3440-3451, Nov. 2006.
[29] W. Lin and C.-C. Jay Kuo, "Perceptual visual quality metrics: A survey," Journal of Visual Communication and Image Representation, vol. 22, no. 4, pp. 297-312, May 2011.
[30] Z. Wang and A. C. Bovik, "A universal image quality index," IEEE Signal Processing Letters, vol. 9, no. 3, pp. 81-84, Mar. 2002.
[31] Z. Wang, E. P. Simoncelli, and A. C. Bovik, "Multi-scale structural similarity for image quality assessment," IEEE Asilomar Conf. on Signals, Systems, and Computers, pp. 1398-1402, Nov. 2003.
[32] L. Zhang, L. Zhang, and X. Mou, "RFSIM: A feature based image quality assessment metric using Riesz transforms," IEEE Intl. Conf. on Image Processing, pp. 321-324, Sept. 2010.
[33] M. Xenos, K. Hantzara, E. Mitsou, and I. Kostopoulos, "A model for the assessment of watermark quality with regard to fidelity," Journal of Visual Communication and Image Representation, vol. 16, no. 6, pp. 621-642, Dec. 2005.
[34] C.-Y. Wu, P.-C. Su, L.-W. Huang, and C.-Y. Chiou, "Constant frame quality control for H.264/AVC," APSIPA Trans. on Signal and Information Processing, May 2013.
[35] C.-Y. Wu and P.-C. Su, "A content-adaptive distortion-quantization model for H.264/AVC and its applications," IEEE Trans. on Circuits and Systems for Video Technology, vol. 24, no. 1, pp. 113-126, Jan. 2014.
[36] X. Hou, J. Harel, and C. Koch, "Image signature: Highlighting sparse salient regions," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 34, no. 1, pp. 194-201, Jan. 2012.
[37] K. Gu, G. Zhai, W. Lin, X. Yang, and W. Zhang, "Visual saliency detection with free energy theory," IEEE Signal Processing Letters, vol. 22, no. 10, pp. 1552-1555, Oct. 2015.
[38] K. Gu, G. Zhai, W. Lin, and M. Liu, "The analysis of image contrast: From quality assessment to automatic enhancement," IEEE Trans. on Cybernetics, vol. 46, no. 1, pp. 284-297, Jan. 2016.
[39] K. Gu, G. Zhai, X. Yang, and W. Zhang, "An efficient color image quality metric with local-tuned-global model," IEEE Intl. Conf. on Image Processing (ICIP), pp. 506-510, Paris, 2014.
[40] K. Gu, S. Wang, G. Zhai, W. Lin, X. Yang, and W. Zhang, "Analysis of distortion distribution for pooling in image quality prediction," IEEE Trans. on Broadcasting, no. 99, pp. 1-11, Jan. 2016.

Improved Visual Information Fidelity Based on Sensitivity Characteristics of Digital Images

Highlights
- An objective image quality assessment method is developed based on the mutual information model of the Visual Information Fidelity.
- The mutual information is captured using features provided by the multi-scale maximal directional responses of the log-Gabor filter, where the directional features are dominated by the distorted images.
- The spectral residual approach is utilized to emphasize visual attention to objects in image scenes.
- The experimental results demonstrate the superior performance of the proposed method when compared to various popular or recent assessment indices.