Quality assessment of retargeted images by salient region deformity analysis


Accepted Manuscript

Quality Assessment of Retargeted Images by Salient Region Deformity Analysis

Maryam Karimi, Shadrokh Samavi, Nader Karimi, S.M. Reza Soroushmehr, Weisi Lin, Kayvan Najarian

PII: S1047-3203(16)30266-8
DOI: http://dx.doi.org/10.1016/j.jvcir.2016.12.011
Reference: YJVCI 1915

To appear in: J. Vis. Commun. Image R.

Received Date: 20 August 2016
Revised Date: 16 November 2016
Accepted Date: 20 December 2016

Please cite this article as: M. Karimi, S. Samavi, N. Karimi, S.M. Reza Soroushmehr, W. Lin, K. Najarian, Quality Assessment of Retargeted Images by Salient Region Deformity Analysis, J. Vis. Commun. Image R. (2016), doi: http://dx.doi.org/10.1016/j.jvcir.2016.12.011

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Quality Assessment of Retargeted Images by Salient Region Deformity Analysis

Maryam Karimi, Shadrokh Samavi, Nader Karimi, S.M. Reza Soroushmehr, Weisi Lin, Kayvan Najarian

Abstract— Displaying images on different devices requires resizing of the media. Traditional image resizing methods result in quality degradation. Content-aware retargeting algorithms aim to resize images for display on a new device while preserving the important contents of the image. Quality assessment of retargeted images can be employed to choose among the outputs of different retargeting methods or to help optimize such methods. In this paper we propose a learning-based quality assessment method for retargeted images. An optical flow algorithm is used to find the correspondence between regions in the scaled and retargeted images. Three groups of features are defined to cover different aspects of distortions that are important to human observers. Area-related features are used to detect how the areas of salient regions are retained and how much geometrical deformity is produced in the image. To better assess the retargeted image, we also introduce features that show how well the aspect ratios of objects are retained. More importantly, we introduce the concept of measuring the homogeneity of the distribution of deformities throughout the image. Experimental results demonstrate that our quality estimation method has better correlation with subjective scores and outperforms existing methods.

Index Terms— image quality assessment, image retargeting, geometrical distortions, homogeneity of deformities, saliency preservation.

I. INTRODUCTION

With the increasing use of the Internet, a high volume of visual media is being transmitted. The diversity of display devices, such as mobile phones and tablets, requires displaying an image at different sizes on different devices. Image retargeting performs this resizing task and is becoming an important tool. When an image is retargeted at a receiver device with a display size different from that of the transmitter, quality preservation is of cardinal importance. Although subjective quality assessment is the most accurate method to determine the

Maryam Karimi is with the Department of Electrical and Computer Engineering, Isfahan University of Technology, 84156-83111, Iran. Shadrokh Samavi is with the Department of Electrical and Computer Engineering, Isfahan University of Technology, 84156-83111, Iran, and the University of Michigan Center for Integrative Research in Critical Care, Ann Arbor, 48109 U.S.A. Nader Karimi is with the Department of Electrical and Computer Engineering, Isfahan University of Technology, 84156-83111, Iran. S.M.Reza Soroushmehr is with the Dept. of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, 48109 U.S.A. Weisi Lin is with the School of Computer Engineering, Nanyang Technological University, 639798 Singapore. Kayvan Najarian is with the Michigan Center for Integrative Research in Critical Care, and also with the Dept. of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, 48109 U.S.A.

quality of images, it is time-consuming, laborious, very costly, and in most cases impractical. Thus, many efforts have been made to design computational models for objective quality assessment of such images and videos [1].

The size and aspect ratio of images are changed to display them on different devices. Traditional retargeting methods, such as uniform scaling and cropping, often lead to degradation of salient areas or loss of background content [2, 3]. To solve these problems, many content-aware retargeting algorithms have been proposed [4-9]. These methods aim to protect the salient content of images during the retargeting process.

The well-known method of seam carving (SC) was proposed by Avidan and Shamir [4]. The SC algorithm reduces the width or height of the image by removing pixels that belong to a path with the least fluctuations. Such a pixel path is supposed to belong to a less important area. Warping (WARP), presented by Wolf [5], finds important areas using local saliency, object, and motion detection algorithms and tries to keep them away from shrinkage. Rubinstein in [6] offered a multi-operator (MULTIOP) retargeting algorithm, which combines cropping, scaling, and seam carving operators; it provides better results than single-operator methods. In [7] a scale-and-stretch warping (SNS) method is proposed, which iteratively computes an optimal local scaling factor for each local region and updates a warped image that best matches these scaling factors. In [8] a non-uniform warping is introduced for streaming video (SV), which tries to retain the cinematographic scene composition after the retargeting operation. Another method, named shift-map editing (SM), converts the image retargeting problem to a graph labeling problem and then solves it using graph cuts [9].

Using retargeting methods causes information loss and geometric distortions in images.
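The seam carving idea described above can be illustrated with a minimal sketch. This is not the authors' implementation; gradient magnitude is used here as a stand-in energy function, and the dynamic program finds the 8-connected top-to-bottom path with the smallest cumulative energy.

```python
import numpy as np

def remove_vertical_seam(img: np.ndarray) -> np.ndarray:
    """Remove one minimal-energy vertical seam from a grayscale image.

    Energy is the sum of absolute horizontal and vertical gradients;
    dynamic programming then finds the top-to-bottom pixel path with
    the smallest cumulative energy, and that path is removed.
    """
    h, w = img.shape
    f = img.astype(float)
    energy = np.abs(np.gradient(f, axis=0)) + np.abs(np.gradient(f, axis=1))

    # Cumulative minimal energy: M[i, j] = energy[i, j] + min of the
    # three neighbours in the row above.
    M = energy.copy()
    for i in range(1, h):
        left = np.r_[np.inf, M[i - 1, :-1]]
        up = M[i - 1, :]
        right = np.r_[M[i - 1, 1:], np.inf]
        M[i] += np.minimum(np.minimum(left, up), right)

    # Backtrack the seam from the minimal entry of the last row.
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(M[-1]))
    for i in range(h - 2, -1, -1):
        j = seam[i + 1]
        lo, hi = max(0, j - 1), min(w, j + 2)
        seam[i] = lo + int(np.argmin(M[i, lo:hi]))

    # Drop the seam pixel from every row, shrinking the width by one.
    keep = np.ones((h, w), dtype=bool)
    keep[np.arange(h), seam] = False
    return img[keep].reshape(h, w - 1)

img = np.tile(np.arange(5.0), (4, 1))   # 4 x 5 ramp test image
out = remove_vertical_seam(img)         # width shrinks by one
```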
To determine the efficiency of a retargeting approach, its results are usually perceptually evaluated for a small set of images. But in real-time applications, and for large sets of images, subjective quality assessment is not feasible. Therefore, designing retargeted image quality assessment (RIQA) criteria has become a new challenge in this area. Such criteria not only facilitate online quality monitoring of retargeted images but also enable the optimization of retargeting methods.

Although quality assessment methods for distorted natural images have made good progress [10-13], the field of quality assessment for retargeted images is still in its infancy. For natural images that have gone through noisy channels, been compressed, or been filtered, there are powerful objective image quality assessment (IQA) methods that produce quality scores very close to subjective scores. It is not easy, however, to align non-uniformly retargeted images with

their original versions. Traditional quality assessment metrics, which are used for equal-sized images, are not applicable to retargeted images [11, 14]. In addition, the metrics that assess image quality by examining changes in natural image statistics [12] are inefficient for the geometric distortions that occur in retargeted images.

Most current RIQA methods use image matching procedures to find the correspondence between different parts of each original image and its retargeted versions. The difference between two corresponding areas is used to calculate the amount of damage. One such measure is the bidirectional warping (BDW) distance used in [6], which takes the sum of asymmetric dynamic time warping distances between the original image and the retargeted image. For each element in the source image, this distance finds a single match in the retargeted image that minimizes the warping cost without violating the order-preserving constraint. The distance metric is the maximum sum of squared differences of the pixel values in matched rows (columns). Bidirectional similarity (BDS) in [15] looks for patches in the retargeted image that are similar to patches in the original image. In this approach, the matching criterion is the minimum sum of squared distances in the CIE L*a*b* color space. BDS is the average of matching errors for all patches in both directions. BDS and BDW correlate poorly with subjective rankings of images. These criteria impose large penalties on small local deformations even when such changes are not very important to the human visual system. In addition, because of the global patch comparison in BDS, a deformed region may match different parts of the original image; hence, correct changes may not be considered properly.

The earth mover's distance (EMD) [16, 17] solves a transportation problem instead of a matching problem.
In this case, the histogram of image features, such as color, texture, or position, is considered as the image signature. The EMD of the image, according to a solution of the well-known transportation problem, is the minimum cost of transforming the signature of the original image into that of the retargeted image. In [18] the correspondence search is formulated as a discrete optimization problem using the SIFT flow method. The cost function between the two images is defined based on the fact that spatially close pixels should have similar displacements; also, the SIFT descriptors at corresponding points should be almost alike. The value of the cost function at the optimum is considered the dissimilarity measure of the two images. SIFT flow [18] and EMD [16, 17] use a SIFT descriptor that can robustly capture structural properties of an image. Therefore, their rankings agree more with user labels than other objective measures do.

Another method, called scale space matching (SSM) [19], extracts the global geometric structures of both images using a top-down process. Afterwards, SIFT and SSIM are used to find the pixel correspondence and perform quality assessment across several scales. The final quality score is the saliency-weighted summation of quality assessment results in different parts of the image. In [20] SIFT flow is used to match different areas of the two images. An SSIM map is generated by computing the SSIM of two corresponding windows whose centers are corresponding pixels. This map indicates how much the retargeting process has preserved the structural

information of the original image at each point. The final score is computed as the weighted summation of all SSIM map values using their related saliencies. In [21] an RIQA approach is presented based on perceptual geometric distortion and information loss (PGDIL), which obtains corresponding points using SIFT flow. This technique defines the weighted combination of information loss and geometric distortion values as the distance between the two images. In [22] information about the spatial and frequency contents of the images is used for assessment. Shape distortions and visual content changes are measured separately by the spatial quality factors. Since image matching is needed to find the spatial quality factors and may cause measurement errors, frequency domain parameters are also used. In [23] another method is proposed based on five key factors: salient region preservation, influence of introduced artifacts, preservation of global structure, aesthetics, and preservation of symmetry. In [24], after a backward SIFT flow matching, an aspect ratio similarity metric is calculated for each block. Then the weighted sum of all values, using their corresponding saliencies, is taken as the visual quality of each retargeted image.

Despite the efforts of all these measures to estimate the quality of retargeted images from different perspectives, their results do not align sufficiently with user perception. It seems that a precise matching algorithm and appropriate descriptors for different distortions are needed. Also, proper machine learning methods to incorporate the descriptors, together with effective visual attention maps for weighting the distances in different parts of the image, can lead to RIQA metrics that match users' assessments more accurately.
Image retargeting processes tend to generate geometric distortions, such as broken lines, changes in the aspect ratios of objects, and loss of salient content, which are not negligible to human vision. Thus, to evaluate the objective quality of retargeted images, it is necessary to introduce features that reveal such geometric distortions. Since most traditional quality assessment metrics are not designed for such distortions, their evaluation results are not acceptable for retargeted images [25].

In this paper a new retargeted-image quality assessment method is proposed to achieve objective quality rankings close to the subjective ranks. For this purpose, we introduce the concept of analyzing the distribution of deformities and losses. We not only introduce features for detecting loss of saliency and geometrical deformities, but also consider how these artifacts are distributed throughout the image. We first identify the salient parts of the original image. Then we need a correspondence between regions of the retargeted and original images, for which we use the optical flow between the two images. We propose three sets of features to describe the distortions caused by the retargeting process that are important to human observers. The different steps of the proposed method are shown in Fig. 1. In the first step, the saliency map of the original image is extracted and a block-based correspondence between blocks of the original image and the retargeted image is formed using optical flow. In the second step we extract features that reveal how well the area and aspect ratio of each block are retained in the retargeted image. Our features also reveal deformities

that occur in geometrical shapes. A new feature is introduced to show the level of homogeneity in the distribution of blocks whose areas and aspect ratios have changed. In the third step an adaptive normalization is applied to the extracted features of different retargeted images. A support vector regression (SVR), as a machine learning approach, is then applied to the generated set of features. Our estimated scores are closer to the subjective rankings than those of other RIQA methods.

The rest of this paper is organized as follows. In Section II the pre-processing stage of the algorithm is explained. In Section III a set of dedicated features is proposed that serve as quality measures. In Section IV experimental results on retargeted image databases are presented. Section V concludes the paper.

II. PRE-PROCESSING

Before extracting features, we perform some pre-processing operations, and feature extraction is performed on their results. In the pre-processing step, the original blocks are warped based on the obtained optical flow vectors to find the matched areas. A saliency map is detected and corrected in order to weight distortions in different areas of the image. SIFT flow vectors are also extracted to specify shape distortions.

A. Optical Flow Estimation

Optical flow shows the pattern of changes between two relatively similar images. It is usually applied to video sequences to show how each pixel has moved relative to its corresponding pixel in the previous frame. The coordinates of pixels in the retargeted image may change with respect to the original image. We use optical flow to measure the displacement of each pixel of the retargeted image relative to its position in the original image. This mechanism helps us quantify how much objects in the retargeted image have moved or deformed relative to the original image. Usual optical flow methods cannot provide a good estimate of fine relative movements, but the motion-detail-preserving optical flow estimation of [26] can estimate both small and large displacements using a coarse-to-fine refinement framework. This algorithm is very accurate, but both input images must be of equal size. Suppose that the original image is of size H×W and its retargeted version is of size h×w. As shown in Fig. 2, we uniformly scale the original image to the desired size h×w. The optical flow algorithm is then fed with the scaled and retargeted images to estimate the flow vector of each pixel of the scaled image. The horizontal component of the optical flow vector of a pixel is U(i, j) and its vertical component is V(i, j). Hence, for an h×w retargeted image we produce two h×w matrices, U and V, containing the horizontal and vertical optical flow components of all pixels.
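The scaling-then-flow bookkeeping above can be sketched as follows. The dense flow method of [26] is replaced here by a naive block-matching stand-in, so this is only an illustrative sketch, not the paper's implementation; `block` and `search` are hypothetical parameters.

```python
import numpy as np

def nn_resize(img, h, w):
    """Uniformly scale a grayscale image to h x w (nearest neighbor)."""
    H, W = img.shape
    return img[np.arange(h) * H // h][:, np.arange(w) * W // w]

def block_matching_flow(scaled, retargeted, block=4, search=2):
    """Crude dense flow between two equal-sized images.

    For every block of the scaled image, exhaustively search a small
    window in the retargeted image for the best-matching position and
    assign that displacement to all pixels of the block.  This stands
    in for the motion-detail-preserving flow of [26].
    """
    h, w = scaled.shape
    U = np.zeros((h, w))  # horizontal components
    V = np.zeros((h, w))  # vertical components
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            ref = scaled[i:i + block, j:j + block]
            best, best_uv = np.inf, (0, 0)
            for dv in range(-search, search + 1):
                for du in range(-search, search + 1):
                    ii, jj = i + dv, j + du
                    if 0 <= ii <= h - block and 0 <= jj <= w - block:
                        cand = retargeted[ii:ii + block, jj:jj + block]
                        cost = float(np.sum((ref - cand) ** 2))
                        if cost < best:
                            best, best_uv = cost, (du, dv)
            U[i:i + block, j:j + block] = best_uv[0]
            V[i:i + block, j:j + block] = best_uv[1]
    return U, V

# An H x W original is first scaled to the h x w retargeted size,
# then the flow is estimated between the equal-sized pair.
original = np.arange(12.0)[:, None] * 100 + np.arange(16.0)[None, :]
retargeted = original[:, ::2].copy()            # width halved
scaled = nn_resize(original, *retargeted.shape)
U, V = block_matching_flow(scaled, retargeted)
```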

Fig 1. Block diagram of the proposed method.


Fig 2. Using optical flow to estimate flow vectors and block warping based on the resulting vectors.

B. Block Warping

We want to establish a correspondence between a block and the reshaped version of this block in the retargeted image (a semi-block). Using the optical flow vectors of the pixels, block warping is performed on each scaled block B_k to find the borders of its semi-block B'_k. This semi-block does not necessarily have a regular shape and may even have vanished completely. Each pixel of the scaled block is carried by its optical flow vector from the scaled image I_s to the retargeted image I_r:

B'_k = { (i + V(i, j), j + U(i, j)) | (i, j) ∈ B_k }    (1)

Equation (1) gives the pixel-wise warped version of block B_k, which transforms the scaled image into the retargeted one; the set of optical flow vectors of the block carries the scaled block from I_s to I_r. An example of the formation of the correspondence between blocks and semi-blocks is displayed in Fig. 2. Each pixel of the scaled image is mapped through its optical flow vector to form the warped image. Hence, multiple pixels may be mapped to the same location, and some locations in the warped image may receive no pixel at all. Examples of such cases can be seen in the warped image of Fig. 2.
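With U and V holding per-pixel flow components, the semi-block of a scaled block can be formed roughly as below. This is a sketch of the warping bookkeeping only, not the paper's exact procedure; the block coordinates are illustrative.

```python
import numpy as np

def semi_block_mask(U, V, top, left, size):
    """Warp the pixels of one size x size scaled block through the flow
    fields (U horizontal, V vertical) and mark where they land.

    Returns a boolean mask of the semi-block in the retargeted image.
    Several pixels may land on one location and some locations may
    receive none, so the semi-block can have an irregular shape or
    even vanish entirely.
    """
    h, w = U.shape
    mask = np.zeros((h, w), dtype=bool)
    for i in range(top, top + size):
        for j in range(left, left + size):
            ii = int(round(i + V[i, j]))  # vertical displacement
            jj = int(round(j + U[i, j]))  # horizontal displacement
            if 0 <= ii < h and 0 <= jj < w:
                mask[ii, jj] = True
    return mask

# With zero flow, the semi-block is the block itself.
U = np.zeros((8, 8))
V = np.zeros((8, 8))
m = semi_block_mask(U, V, top=0, left=0, size=4)
```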

C. SIFT Flow Estimation

The SIFT flow method [18] can estimate flow vectors between images of unequal sizes. This method outputs a flow vector for each pixel p, whose horizontal and vertical components, u(p) and v(p), are the components of the SIFT flow vector at p.

D. Saliency Detection and Correction

To grade the importance of different parts of an image, it is necessary to use a saliency detection algorithm. After trying several saliency detection methods in our work, we obtained our best results using hierarchical saliency detection (Hsaliency) [27]. One of the main problems of some saliency detection algorithms is that small-scale structures have a negative impact on saliency detection. This difficulty is very common due to the texture of natural images, where the saliency map becomes cluttered with fragments. Hsaliency extracts importance values from three layers of the image at several scales and finally fuses them using a graphical model. This method can handle complex foregrounds and backgrounds with different levels of detail.

Content losses or geometrical destructions that occur due to retargeting operations in more salient areas are much more important to the human eye than destructions in less important areas. To account for noticeable distortions, we apply gamma correction to the saliency map. This nonlinear function emphasizes higher values and weakens lower values of the initial saliency map; the gamma value giving the best results was selected by testing different values.

III. FEATURE EXTRACTION AND QUALITY ESTIMATION

To reach the target dimensions, different image retargeting algorithms impose geometric changes on the image. We divide such distortions into three main categories: 1) salient area changes, 2) aspect ratio changes, and 3) local shape distortions. To measure salient area changes, we use features that calculate the retention rate of salient content and the distribution of salient content in different parts of the retargeted image. To examine changes in the aspect ratios of blocks, we first look at each block individually and then at the level of concentration of this phenomenon in different parts of the retargeted image. To examine the third category of distortions, we measure the degree of distortion in block-based and strip-based manners.

A. Changes in Salient Areas

When the percentage of size reduction increases, all retargeting algorithms inevitably remove some of the salient content.
The loss of this content will have a negative effect on the perceived quality score [21, 22]. In this paper two features are proposed to cover such effects.
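The saliency correction step of Section II-D can be sketched as below. The gamma value here is an arbitrary placeholder; the paper selects its value empirically.

```python
import numpy as np

def correct_saliency(saliency: np.ndarray, gamma: float = 2.0) -> np.ndarray:
    """Gamma-correct a saliency map normalized to [0, 1].

    With gamma > 1, the power law keeps high saliency values strong and
    pushes low values toward zero, so distortions in clearly salient
    regions dominate the saliency-weighted features.
    """
    s = saliency / (saliency.max() + 1e-12)  # normalize to [0, 1]
    return s ** gamma

s = np.array([[0.1, 0.5, 1.0]])
out = correct_saliency(s)
```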

1) Area Retention

To determine to what extent each block is shrunk or expanded, we consider the ratio of the number of pixels in semi-block B'_k to that of B_k as the area retention of that block. The result of this operation for each retargeted image is an area retention map A. In this map, 0 corresponds to a block that is completely removed in the retargeted image, and the maximum value can even exceed 1 because some blocks may be expanded by the retargeting operation. Hence, different gray levels are assigned to the map blocks based on their level of size retention. The area retention map is formed as follows:

A(k) = Area(B'_k) / Area(B_k)    (2)

where Area(·) counts the pixels contained in the input segment. The sum of the saliency values of a block, w(k), is used as a weight for the area retention of that block, and the sum of all weighted area retentions of the image is used as a feature, called f_area:

f_area = Σ_k w(k) · A(k)    (3)

where the weights are computed from the saliency map of the original image, uniformly scaled to the size of the retargeted image. This feature indicates how much of the overall salient area is preserved.

2) Area Homogeneity

When retargeting an image to smaller dimensions, it is inevitable that some blocks lose area, so some entries of A are smaller than 1. If the shrunken blocks are scattered throughout the image, the human visual system is not disturbed. But when shrunken blocks are concentrated in one region, the shrinkage becomes noticeable and makes the image appear to be of low quality. Hence, we need a criterion that shows the homogeneity of the dispersion of shrunken semi-blocks throughout the image. We measure how much two semi-blocks in the retargeted image differ in their areas, while also considering the spatial distance between the two compared semi-blocks. We define an area-homogeneity map H using the area retention of each semi-block. To compute the homogeneity H(k) of block B_k, the sum of all weighted differences between the area of B'_k and the areas of all blocks of the image is computed, with the normalized chessboard distance between every two compared blocks as the weight:

H(k) = 1 − (1/Z) Σ_l d(k, l) · |A(k) − A(l)|    (4)

where d(k, l) is the normalized chessboard distance between blocks k and l, and Z is the normalization factor. For a given block, the more similar it is to its neighboring blocks, the greater the homogeneity value assigned to it.

We need to combine the property of the semi-blocks' areas with their distribution. A function is needed that returns greater values when semi-blocks that have lost area are distributed throughout the image, and also when non-shrunken semi-blocks are packed close to each other; in other cases, the function should return relatively smaller values. This combination, shown in Fig. 3, adds the homogeneity value of each block to its area retention value, and the resulting sum is weighted by the saliency of the block. Hence, the overall area homogeneity feature f_H_area of the image is measured by:

f_H_area = Σ_k w(k) · (A(k) + H(k))    (5)

Fig. 3. Combining the effects of semi-block area loss and the homogeneity of their distribution.
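A sketch of the two area features follows, assuming per-block area retentions and saliency weights are already available. The homogeneity form follows equation (4) as reconstructed above, so the exact weighting should be treated as an assumption.

```python
import numpy as np

def area_features(areas, sal, grid):
    """Area retention and area homogeneity features from per-block data.

    areas: flat array, ratio of semi-block pixels to block pixels (map A)
    sal:   flat array, per-block saliency weights w(k)
    grid:  (rows, cols) layout of the blocks, for chessboard distances
    """
    rows, cols = grid
    ii, jj = np.divmod(np.arange(areas.size), cols)
    # Normalized chessboard (Chebyshev) distance between block pairs.
    d = np.maximum(np.abs(ii[:, None] - ii[None, :]),
                   np.abs(jj[:, None] - jj[None, :]))
    d = d / d.max()

    f_area = float(np.sum(sal * areas))               # eq. (3)

    # Per-block homogeneity: 1 minus normalized distance-weighted
    # area differences (reconstruction of eq. (4)).
    diffs = np.abs(areas[:, None] - areas[None, :])
    H = 1.0 - (d * diffs).sum(axis=1) / areas.size
    f_H_area = float(np.sum(sal * (areas + H)))       # eq. (5)
    return f_area, f_H_area

# Shrunken blocks packed on one side vs. scattered: same total salient
# area is lost, but the scattered layout should score higher homogeneity.
sal = np.ones(16) / 16
packed = np.array([0.5] * 8 + [1.0] * 8)      # top half shrunk
scattered = np.tile([0.5, 1.0], 8)            # shrinkage spread out
fa1, fh1 = area_features(packed, sal, (4, 4))
fa2, fh2 = area_features(scattered, sal, (4, 4))
```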

B. Aspect Ratio Feature Extraction

Loss of area in salient regions means loss of information, which is important to the human visual system. Another important factor affecting retargeted image quality is the preservation of the aspect ratios of salient objects. If the retargeting preconditions change the aspect ratio of an image, and the retargeting algorithm transfers this aspect ratio change to less salient parts, the visual quality of the image is largely protected. Hence, we propose two features regarding aspect ratio preservation in the semi-blocks of a retargeted image.

1) Aspect Ratio Protection

To measure the aspect ratio of each semi-block B'_k, the best convex quadrilateral is fitted to it. Then the ratio of the second shortest side to the longest side is assigned to AR(k) as the aspect ratio of B'_k:

AR(k) = (min(Q \ {min(Q)}) + ε) / (max(Q) + ε),  Q = Quad(B'_k)    (6)

In equation (6), the Quad function fits the best quadrilateral to semi-block B'_k and outputs its four side lengths. The operator "\" is the subtraction symbol of set theory, and ε is a small positive constant that avoids division by zero.

To obtain a single feature for the retargeted image, we use the saliency of each block as a weight for its AR(k). Then the sum of all weighted values is a feature, called f_AR, that determines the aspect ratio retention of the salient parts of the image:

f_AR = Σ_k w(k) · AR(k)    (7)

2) Aspect Ratio Homogeneity

Suppose that after retargeting some blocks have small aspect ratios. If these blocks are adjacent, a noticeable change in an object or a particular region may result. However, if these blocks are scattered throughout the image, their perceptual effect is negligible. Therefore, analogously to the area homogeneity of equations (4) and (5), we form a homogeneity map of aspect ratios in equation (8) and an overall feature, the homogeneity of aspect ratios f_H_AR, in equation (9).

C. Local Shape Feature Extraction

The removal and shrinkage of some image parts during retargeting forces the remaining parts to move from their original positions. This drift imposes distortions such as deformed objects, broken lines, and degraded edges and patterns. Therefore, in addition to global changes, local block distortions, as well as drifts in the vertical and horizontal strips, affect the quality of images.

1) Block-Based Shape Distortions

To measure local distortions, weighted variances of the horizontal and vertical components of the SIFT flow vectors are considered. We analyze the image in a hierarchical manner over three block sizes, identified by their superscripts. Let u(p) and v(p) be the horizontal and vertical components of the SIFT flow vector at pixel p. For each block, two variance maps of the horizontal and vertical components are defined in equations (10) and (11), where the pixel saliency in the original image serves as the weight. The sums of the weighted variances of the vertical and horizontal components over all blocks produce the features f_vert and f_horz in equations (12) and (13), respectively. The features are calculated in the same way for the two other block sizes.

2) Strip-Based Shape Distortions

The displacement of pixels in the same row (column) relative to each other can lead to the breakage of straight lines and edges. Accordingly, it is necessary to measure the distortions in vertical and horizontal strips. Let w and h be the width and height of the image; the i-th vertical strip and the j-th horizontal strip of the image are then defined accordingly. The sums of the weighted variances of the flow components within the vertical and horizontal strips give the strip-based features in equations (14)-(17).

Now, to illustrate the performance of the proposed features, we present some examples. In Fig. 4 the behaviors of the area-retention and aspect-ratio maps are demonstrated. An original image is shown in Fig. 4(a), and its retargeted versions by SCL and SV are shown in parts (b) and (c), respectively. The corresponding area-retention and aspect-ratio maps are displayed in the second and third rows (Fig. 4(d)-(g)). Darker blocks in these maps are indicative of lower preservation of area or

aspect ratio values, and brighter blocks are blocks with better preserved values. It can be seen that the scaled image in Fig. 4(b) preserves much more salient area than the image in Fig. 4(c), which is retargeted by the SV method. Nevertheless, by looking at (b) and (c) we can easily recognize that the SV-retargeted image in (c) has better visual quality than (b). This shows that, in addition to salient area protection, the aspect ratio protection of salient objects is also important. Figs. 4(f) and (g) show the aspect-ratio maps of the retargeted images in parts (b) and (c). It can be seen that although SCL in (d) preserves salient area better than the SV operator does in (e), the higher aspect ratio values in (g), compared to (f), lead to better perceptual quality. In Fig. 5 an example is used to show the impact of area homogeneity. The images in parts (a) and (b) of Fig. 5 both have

lost the same salient area. Therefore, they have similar values of f_area, but image (b) is visually much better than (a). The area-retention maps in parts (c) and (d) show that the left half of the image in part (a) remains intact while the right half is shrunk; in part (b) the areas of the odd block columns are preserved and the even columns are shrunk. The lower homogeneity values in part (e), compared to (f), indicate the inhomogeneous localization of shrunken semi-blocks in a compact area of image (a), versus the homogeneous shrinkage of the image in part (b). As can be seen in Fig. 6, the large variance of the flow vectors in region B causes a local deflection, while in region A the overall shape is retained due to the similarity of the flow vectors. The variance map in Fig. 6(c) shows greater values in region B than in region A.

Fig. 4. (a) An original image and retargeted versions of the image by operators (b) SCL and (c) SV; (d), (e) the corresponding area-retention maps and (f), (g) the corresponding aspect-ratio maps.
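The block-based shape measure behind Fig. 6 can be sketched as a saliency-weighted variance of one SIFT-flow component per block. The exact weighting of equations (10)-(13) is reconstructed here, so treat the normalization as an assumption.

```python
import numpy as np

def weighted_flow_variance(u, sal, block=4):
    """Saliency-weighted variance of one SIFT-flow component per block.

    Blocks whose flow vectors all agree (rigid shifts, as in region A)
    get variance near zero; blocks whose pixels drift apart (local
    deformation, as in region B) get large values.  Summing the map
    over all blocks yields one shape feature.
    """
    h, w = u.shape
    var_map = np.zeros((h // block, w // block))
    for bi in range(h // block):
        for bj in range(w // block):
            sl = (slice(bi * block, (bi + 1) * block),
                  slice(bj * block, (bj + 1) * block))
            wts = sal[sl] / (sal[sl].sum() + 1e-12)
            mean = np.sum(wts * u[sl])
            var_map[bi, bj] = np.sum(wts * (u[sl] - mean) ** 2)
    return var_map

# Left blocks: uniform flow (variance 0).  Right blocks: shearing flow.
u = np.zeros((8, 8))
u[:, 4:] = np.arange(4)        # pixels drift apart on the right side
sal = np.ones((8, 8))
vm = weighted_flow_variance(u, sal)
```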

Fig. 5. (a) Half of the image is shrunk, (b) shrunken strips are distributed, (c), (d) area-retention maps, (e), (f) area-homogeneity maps.
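The aspect-ratio measure of equation (6) can be approximated as below. Fitting the best convex quadrilateral is deliberately replaced here with the axis-aligned bounding box of the semi-block, a simplification rather than the paper's method.

```python
import numpy as np

def aspect_ratio(mask: np.ndarray, eps: float = 1e-6) -> float:
    """Approximate aspect ratio of a semi-block given its boolean mask.

    The paper fits the best convex quadrilateral and takes the ratio of
    the second-shortest side to the longest; with a bounding box the
    four sides come in equal pairs, so this reduces to short/long side.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:            # semi-block vanished completely
        return 0.0
    height = int(ys.max() - ys.min() + 1)
    width = int(xs.max() - xs.min() + 1)
    sides = sorted([height, width, height, width])
    # Ratio of the second-shortest side to the longest, as in eq. (6).
    return (sides[1] + eps) / (sides[3] + eps)

square = np.ones((4, 4), dtype=bool)      # aspect ratio preserved
squashed = np.ones((2, 8), dtype=bool)    # aspect ratio damaged
```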

8 calculate the ranking based correlation, Kendall Rank Correlation Coefficient (KRCC) [28] is used as follows: (18)

Fig‫‏‬6. Weighted variance of horizontal components of SIFT flow vectors in two parts of SC retargeted image. The variance in region A is smaller than B.

D. Quality Estimation In IQA for a noisy and compressed image that the quality of different contents of the image are comparable with each other, a RIQA method is expected to rank the retargeted images similar to subjective ranking. This property makes it possible for per-image normalization of the extracted features. Hence, for retargeted image , which has been retargeted by operator , feature is normalized independent of other reference images. This normalization provides a reasonable range for each feature for different images. Now a feature vector with length 12 is generated, for quality assessment, in the form of . We use the popular linear SVR with to be trained using the learning set of images. For a given test image and its retargeted versions, we normalize features in the same way before quality estimation and quality ranking based on the learned model. IV. EXPERIMENTAL RESULTS The proposed method is tested on RetargetMe dataset [28] and it is compared with state-of-the-art methods in this area. RetargetMe database is the most famous retargeted image set which is used as a basis of comparison for different RIQA methods. Each of the 37 original images in the dataset has been retargeted by 8 different methods. These retargeting methods are: Cropping (CR), Multi-Operator (MULTIOP), Seam Carving (SC), Scaling (SCL), Shift-Maps (SM), Scaleand-Stretch (SNS), Streaming Video (SV) and Warping (WARP). On the other hand, each of the 37 images is categorized in at least one of the following categories: lines/edges, faces/people, texture, foreground objects, geometric structures and symmetry. The retargeted images have been subjectively evaluated by 38 subjects in a paired comparison manner. Each time two retargeted versions of an original image are randomly shown to the user to determine which one he/she prefers. Ultimately, subjective quality score of each retargeted image is the number of times the image is preferred. 
In this dataset, size reduction occurred in only one of the dimensions of each image: the reduction is 50% for 14 images and 25% for the remaining 23 images.

A. Evaluation Criteria

The correlation between subjective and objective rankings is the most important metric for comparing RIQA methods. To measure it, the Kendall rank correlation coefficient (KRCC) [29] is employed.

KRCC = (Nc - Nd) / (n(n-1)/2)

where Nc and Nd are, respectively, the numbers of concordant and discordant pairs between the subjective and objective rankings. If n is the number of retargeted images, the denominator represents the total number of image pairs. The best and worst values of KRCC are 1 and -1. Let rs(i) denote the subjective rank of image i and ro(i) the objective rank of the same image. For a finite K, the above equation is modified so that only pairs in which at least one image is among the top-K subjectively ranked versions are considered, and the KRCC value is calculated over these pairs. Performance of RIQA methods is usually reported as the average of KRCC values over all original images in the RetargetMe dataset for K=3 and K=∞. KRCC for K=3 tests the performance of RIQA using only the higher-quality images, while KRCC for K=∞ considers all retargeted versions in its calculation.

The second benchmark in the RIQA area is the Pearson linear correlation coefficient (PLCC), which measures the linear correlation between subjective and objective quality scores of retargeted images. If s and o are, respectively, the subjective and objective quality score sequences for the retargeted versions of an original image, PLCC can be measured as:

PLCC = σso / (σs σo)   (19)

where σs and σo are the standard deviations of the subjective and objective scores, and the numerator σso is the covariance of the two sequences. The average of PLCC over all original images of the dataset is reported as the second performance measure for the different methods. PLCC = 1 represents total positive correlation, 0 no correlation, and -1 total negative correlation.

In our method, a feature vector is extracted for each image. Since the dataset uses paired comparisons between retargeted versions of the same original image to obtain the subjective scores, per-image normalization is applied to the features. We then train an SVR model using 80 percent of the original images, selected randomly, together with all their retargeted versions. The remaining 20 percent of the images are used as the test set.
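A minimal implementation of the two criteria is sketched below. The top-K restriction is read as "count only pairs in which at least one image is among the top-K subjective ranks", and normalizing by the number of counted pairs is our assumption, since the truncated definition above does not state it.

```python
import numpy as np
from itertools import combinations

def krcc(subj_rank, obj_rank, K=None):
    """Kendall rank correlation. With a finite K, only pairs in which at
    least one image is among the top-K subjectively ranked versions are
    counted, and the result is normalized by the number of counted pairs
    (this normalization is an assumption)."""
    n = len(subj_rank)
    pairs = [(i, j) for i, j in combinations(range(n), 2)
             if K is None or min(subj_rank[i], subj_rank[j]) <= K]
    sign = lambda i, j: (subj_rank[i] - subj_rank[j]) * (obj_rank[i] - obj_rank[j])
    nc = sum(1 for i, j in pairs if sign(i, j) > 0)   # concordant pairs
    nd = sum(1 for i, j in pairs if sign(i, j) < 0)   # discordant pairs
    return (nc - nd) / len(pairs)

def plcc(s, o):
    """Pearson linear correlation: the covariance of the two score
    sequences over the product of their standard deviations, as in Eq. (19)."""
    s, o = np.asarray(s, float), np.asarray(o, float)
    return float(((s - s.mean()) * (o - o.mean())).mean() / (s.std() * o.std()))
```

For identical rankings KRCC is 1; for fully reversed rankings it is -1, matching the best and worst values stated above.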
To ensure independence of the results from the particular images selected for training or testing, the train-test process was repeated 1000 times. The averages of the 1000 KRCC and PLCC values on the test sets are finally reported as the performance measures of the proposed RIQA method.

B. Feature Analysis

In this section, the effect of the proposed features on quality estimation is studied. For this purpose, the correlation between each feature and the subjective quality scores in the RetargetMe dataset is investigated, as well as the performance in the absence of each group of features. The PLCC, KRCC, and SRCC (Spearman rank-order correlation) values of the proposed features with respect to the subjective scores in the RetargetMe dataset are plotted in Fig. 7.
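The repeated random-split protocol used here (and reused for the feature-ablation experiments) can be sketched as below, on synthetic stand-in data of the same shape as RetargetMe (37 originals × 8 versions × 12 features). The number of repeats is reduced from 1000 to keep the toy run fast, and the Kendall/Pearson correlations come from SciPy.

```python
import numpy as np
from scipy.stats import kendalltau, pearsonr
from sklearn.svm import SVR

gen = np.random.default_rng(1)
X = gen.random((37, 8, 12))                       # synthetic features
y = X.mean(axis=2) + 0.05 * gen.random((37, 8))   # synthetic subjective scores

def split_eval(n_repeats=50):                     # the paper repeats 1000 times
    k_vals, p_vals = [], []
    for _ in range(n_repeats):
        idx = gen.permutation(37)
        tr, te = idx[:30], idx[30:]               # ~80/20 split over ORIGINALS
        model = SVR(kernel="linear", C=10.0, epsilon=0.001)
        model.fit(X[tr].reshape(-1, 12), y[tr].ravel())
        for i in te:                              # correlate per original image
            pred = model.predict(X[i])
            k_vals.append(kendalltau(y[i], pred)[0])
            p_vals.append(pearsonr(y[i], pred)[0])
    return float(np.mean(k_vals)), float(np.mean(p_vals))

mean_krcc, mean_plcc = split_eval()
```

Splitting over original images (not over individual retargeted versions) is what keeps the test originals unseen during training, so the reported averages measure generalization to new content.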

The highest correlations belong to the aspect-ratio features, followed by the area features; the block-based and strip-based shape-distortion features are the next most important ones. To show the impact of the features on the final performance of the proposed method, we performed a set of experiments. Each time, we repeated the 1000 train-test experiment while omitting a single feature or a set of features from our feature list. A feature whose omission causes a greater loss in correlation between the subjective scores and our objective scores plays a more important role in the algorithm. In Table 1, each row reports the PLCC, KRCC (K=∞), and KRCC (K=3) results in the absence of the marked features. The first row of the table is the reference, where all features are present. The next three rows show the impact of the aspect-ratio feature group: omitting either feature lowers the correlation values, with the first feature having a larger effect than the second, and omitting both causes a larger loss than eliminating either one alone. The two are therefore complementary features within the first group. Similar experiments are performed for the area features: rows 5 to 7 of Table 1 show that both area features are necessary. The absence of the area features has the larger negative impact on the correlation results, although, as seen before, the aspect-ratio features are individually more correlated with the subjective scores. The block-based and strip-based groups of variances, tested in rows 9 and 10, have a smaller effect on performance than the first two groups but are still helpful.
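The ablation loop can be sketched as follows. The assignment of the 12 feature indices to groups is purely illustrative (the real mapping follows the feature definitions earlier in the paper); after dropping a group, the reduced feature matrix is fed to the same 1000× train-test protocol.

```python
import numpy as np

# Hypothetical index layout of the 12-D feature vector; illustrative only.
GROUPS = {
    "aspect_ratio":   [0, 1],
    "area":           [2, 3],
    "block_variance": [4, 5, 6, 7],
    "strip_variance": [8, 9, 10, 11],
}

def drop_features(X, omitted):
    """Remove the columns of the omitted group(s) so the remaining
    features can be re-evaluated with the usual train-test protocol."""
    dropped = {i for g in omitted for i in GROUPS[g]}
    keep = [i for i in range(X.shape[-1]) if i not in dropped]
    return X[..., keep]

X = np.random.default_rng(2).random((37, 8, 12))
X_wo_area = drop_features(X, ["area"])
X_wo_both = drop_features(X, ["aspect_ratio", "area"])
```

Comparing the correlation obtained on `X_wo_area` (or any other reduced matrix) against the all-features reference row quantifies each group's contribution.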

Table 1: PLCC and KRCC on the RetargetMe dataset, in the absence of different features.

| Row | Omitted features            | PLCC  | KRCC (K=∞) | KRCC (K=3) |
| 1   | none (all 12 features)      | 0.613 | 0.494      | 0.559      |
| 2   | first aspect-ratio feature  | 0.567 | 0.458      | 0.546      |
| 3   | second aspect-ratio feature | 0.598 | 0.479      | 0.535      |
| 4   | both aspect-ratio features  | 0.531 | 0.424      | 0.497      |
| 5   | first area feature          | 0.558 | 0.443      | 0.513      |
| 6   | second area feature         | 0.606 | 0.488      | 0.549      |
| 7   | both area features          | 0.519 | 0.392      | 0.420      |
| 8   | —                           | 0.559 | 0.445      | 0.527      |
| 9   | block-based variances       | 0.613 | 0.482      | 0.550      |
| 10  | strip-based variances       | 0.578 | 0.447      | 0.495      |

C. Performance Evaluation

The performance of the proposed method is compared with current RIQA methods, including SIFT flow [18], EMD [17], Liu-Luo [19], IR-SSIM [20], Liu-Lin [22], PGDIL [21], Liang [23], and ARS [24]. The mean and standard deviation (std) of KRCC (K=∞), along with mean PLCC and p-value, are compared in Table 2. The KRCC (K=3) values are displayed in Table 3. In all columns, the best two values are bolded. PLCC and p-value are not reported in Table 3 because these values are not meaningful for top-3 ranking [21]. The mean KRCC (K=∞) of the proposed method is 0.494, which is about 0.04 and 0.08 greater than that of the state-of-the-art methods ARS [24] and PGDIL [21], respectively. The mean PLCC is also improved from about 0.47 and 0.57 to more than 0.61. Table 3 illustrates that our method also outperforms the previous correlation results in terms of overall mean KRCC (K=3): the KRCC (K=3) results reported by PGDIL and Liu-Lin are improved by about 0.03 (a 4% relative improvement). The overall results in Tables 2 and 3 indicate that the objective scores estimated by our learning-based method correlate more strongly with the subjective ones than those of the other methods. Rank correlation values are also compared individually for each attribute class in the left parts of Tables 2 and 3.

Fig. 7. Correlation of features with subjective scores in RetargetMe dataset [28].

Table 2. Rank correlation of objective and subjective measures (K=∞) for RetargetMe dataset [28].

| Method         | Lines/Edges | Faces/People | Texture | Foreground Objects | Geometric Structure | Symmetry | Mean KRCC | std   | PLCC  | p-value |
| SIFT flow [18] | 0.097 | 0.252 | 0.119 | 0.218 | 0.085 | 0.071 | 0.145 | 0.262 | 0.231 | 0.031 |
| EMD [17]       | 0.220 | 0.262 | 0.107 | 0.226 | 0.237 | 0.500 | 0.251 | 0.272 | 0.277 | 1e-5  |
| Liu-Luo [19]   | 0.140 | 0.328 | 0.190 | 0.309 | 0.084 | 0.095 | 0.195 | 0.237 | 0.256 | 0.009 |
| IR-SSIM [20]   | —     | —     | —     | —     | —     | —     | 0.363 | —     | —     | —     |
| Liu-Lin [22]   | 0.309 | 0.452 | 0.321 | 0.377 | 0.313 | 0.333 | 0.384 | 0.271 | 0.439 | 1e-3  |
| PGDIL [21]     | 0.431 | 0.390 | 0.286 | 0.389 | 0.438 | 0.523 | 0.415 | 0.296 | 0.468 | 6e-10 |
| Liang [23]     | 0.351 | 0.271 | 0.188 | 0.258 | 0.415 | 0.548 | 0.399 | —     | —     | —     |
| ARS [24]       | 0.463 | 0.519 | 0.330 | 0.444 | 0.505 | 0.464 | 0.452 | 0.283 | 0.567 | 1e-11 |
| Proposed       | 0.453 | 0.589 | 0.494 | 0.564 | 0.431 | 0.380 | 0.494 | 0.261 | 0.613 | 1e-10 |

Table 3. Rank correlation of objective and subjective measures (K=3) for RetargetMe dataset [28].

| Method         | Lines/Edges | Faces/People | Texture | Foreground Objects | Geometric Structure | Symmetry | Mean KRCC | std   |
| SIFT flow [18] | 0.241 | 0.428 | 0.312 | 0.442 | 0.303 | 0.002 | 0.298 | 0.483 |
| EMD [17]       | 0.301 | 0.416 | 0.216 | 0.295 | 0.226 | 0.534 | 0.326 | 0.496 |
| Liu-Luo [19]   | 0.227 | 0.568 | 0.111 | 0.501 | 0.103 | 0.056 | 0.304 | 0.448 |
| Liu-Lin [22]   | —     | —     | —     | —     | —     | —     | 0.537 | —     |
| PGDIL [21]     | 0.547 | 0.558 | 0.471 | 0.552 | 0.580 | 0.614 | 0.533 | 0.383 |
| ARS [24]       | 0.560 | 0.615 | 0.403 | 0.475 | 0.583 | 0.500 | 0.519 | 0.346 |
| Proposed       | 0.543 | 0.650 | 0.578 | 0.636 | 0.505 | 0.417 | 0.559 | 0.340 |

In Table 2, the results of the proposed method are far better than those of the other RIQA metrics for the Faces/People, Texture, and Foreground objects categories. In the Lines/Edges category, our results are also very close to the best KRCC results. Our KRCC values still lag those of Liang [23] and ARS [24] in the Symmetry and Geometric structure classes, respectively. In Table 3, our KRCC (K=3) values in three subsets, Faces/People, Texture, and Foreground objects, are superior to those of the other methods, while they are slightly weaker in the other subsets. Mean values of KRCC over the 1000 train-test runs are displayed for each of the 37 original images of the RetargetMe dataset in Fig. 8 (K=∞) and Fig. 9 (K=3). The KRCC (K=∞) values for 28 of the 37 images (76%) are above 0.40. Also, in Fig. 9, the reported KRCC (K=3) values for 26 images (70%) are greater than 0.50, and only two cases have negative correlation values. These results demonstrate the ability of the proposed technique to properly rank retargeted images.

One reason for these results on the RetargetMe dataset is that the qualities of many retargeted images are very close to each other. In such cases, subjective judgment becomes difficult, and the subjective rankings of such images are not good references for objective quality assessment [21]. A high variance of the subjective scores for the retargeted versions of a single original image indicates a clear preference by the human visual system; low variances belong to images for which viewers often could not make a firm decision to prefer one image over the other. To check the performance of the proposed method in cases where the retargeted images are rankable by the human eye, we followed the protocol suggested in [21]: we measured the KRCC performance of our method on the Top 5, Top 10, and Top 15 original images with the highest variances of subjective scores, and finally on all images. Fig. 10 shows that removing test images that do not have a reliable subjective ranking improves the rank correlation between subjective and objective ratings. Our rank correlation results are also compared with those of [17], [30], [21], and [24] in Fig. 10; the curve of the proposed method lies above those of the other methods.
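The subset selection used for Fig. 10 can be sketched as: rank the originals by the variance of their subjective scores and average the per-image KRCC over the top-N "most rankable" images. Names and data below are placeholders.

```python
import numpy as np

def mean_krcc_on_top_variance(subjective, krcc_per_image, sizes=(5, 10, 15, 37)):
    """Average per-image KRCC over the N originals whose retargeted
    versions received the most decisive (highest-variance) subjective scores."""
    var = np.var(subjective, axis=1)      # variance over the 8 versions
    order = np.argsort(-var)              # most rankable originals first
    return {n: float(np.mean(krcc_per_image[order[:n]])) for n in sizes}

# Placeholder data: 37 originals x 8 retargeted versions.
gen = np.random.default_rng(3)
subj = gen.random((37, 8))
per_image_krcc = np.full(37, 0.5)         # pretend every image scored 0.5
curves = mean_krcc_on_top_variance(subj, per_image_krcc)
```

Plotting the returned values against N = 5, 10, 15, 37 reproduces the kind of curve shown in Fig. 10 for each competing method.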

Fig. 8. The mean KRCC (K=∞) for each of the 37 images of RetargetMe dataset [28] by the proposed method.

Fig. 10. Comparison of mean KRCC for the 5, 10, 15, and 37 images with the highest subjective score variances in RetargetMe dataset [28].

Fig. 9. The mean KRCC (K=3) for each of the 37 images of RetargetMe dataset [28] by the proposed method.

V. CONCLUSION

Image retargeting algorithms aim to resize images for display with different aspect ratios and sizes. Most existing quality assessment methods cannot be applied to retargeted images because the reference and the retargeted images are not of the same size. In this paper, we have proposed a novel method for quality estimation of retargeted images. Retargeting introduces distortions into the image, and we designed a set of features that capture the types of distortion that are important to the human visual system. In particular, we introduced the concept of measuring the effect of the homogeneity of the distribution of deformed blocks throughout an image. Hence, our contributions were: 1) finding how each block of the original image is warped in the retargeted image by applying optical flow between the scaled and retargeted images; 2) formulation of features to show the overall area and aspect ratio changes in salient regions; 3) formulation of features to show the homogeneity of distribution of blocks with geometric shape distortions; 4) formulation of local deformation measures in multi-scale blocks and strips; 5) the use of per-image normalization to make it possible to learn a generalized model on the set of training features. The performed set of experiments showed that the performance of our method is superior to relevant existing methods in terms of correlation with subjective quality scores.

REFERENCES

[1] L. Ma, C. Deng, W. Lin, K. N. Ngan, and L. Xu, "Retargeted image quality assessment: Current progresses and future trends," in Visual Signal Quality Assessment, Springer, pp. 213-242, 2015.
[2] D. Vaquero, M. Turk, K. Pulli, M. Tico, and N. Gelfand, "A survey of image retargeting techniques," in SPIE Optical Engineering + Applications, pp. 779814-779814-15, 2010.
[3] X. Zhang, Y. Hu, and D. Rajan, "Dynamic distortion maps for image retargeting," Journal of Visual Communication and Image Representation, vol. 24, no. 1, pp. 81-92, 2013.
[4] S. Avidan and A. Shamir, "Seam carving for content-aware image resizing," ACM Transactions on Graphics (TOG), vol. 26, no. 3, p. 10, 2007.
[5] L. Wolf, M. Guttmann, and D. Cohen-Or, "Non-homogeneous content-driven video-retargeting," in IEEE 11th International Conference on Computer Vision (ICCV 2007), pp. 1-6, 2007.
[6] M. Rubinstein, A. Shamir, and S. Avidan, "Multi-operator media retargeting," ACM Transactions on Graphics (TOG), p. 23, 2009.
[7] Y.-S. Wang, C.-L. Tai, O. Sorkine, and T.-Y. Lee, "Optimized scale-and-stretch for image resizing," ACM Transactions on Graphics (TOG), vol. 27, no. 5, p. 118, 2008.
[8] P. Krähenbühl, M. Lang, A. Hornung, and M. Gross, "A system for retargeting of streaming video," ACM Transactions on Graphics (TOG), vol. 28, no. 5, p. 126, 2009.
[9] Y. Pritch, E. Kav-Venaki, and S. Peleg, "Shift-map image editing," in IEEE 12th International Conference on Computer Vision (ICCV 2009), pp. 151-158, 2009.
[10] A. K. Moorthy and A. C. Bovik, "Blind image quality assessment: From natural scene statistics to perceptual quality," IEEE Transactions on Image Processing, vol. 20, no. 12, pp. 3350-3364, 2011.
[11] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, 2004.
[12] L. Li, W. Xia, Y. Fang, K. Gu, J. Wu, W. Lin, and J. Qian, "Color image quality assessment based on sparse representation and reconstruction residual," Journal of Visual Communication and Image Representation, vol. 38, pp. 550-560, 2016.
[13] M. Song, D. Tao, C. Chen, J. Bu, and Y. Yang, "Color-to-gray based on chance of happening preservation," Neurocomputing, vol. 119, pp. 222-231, 2013.
[14] W. Lin and C.-C. J. Kuo, "Perceptual visual quality metrics: A survey," Journal of Visual Communication and Image Representation, vol. 22, no. 4, pp. 297-312, 2011.
[15] D. Simakov, Y. Caspi, E. Shechtman, and M. Irani, "Summarizing visual data using bidirectional similarity," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008), pp. 1-8, 2008.
[16] Y. Rubner, C. Tomasi, and L. J. Guibas, "The earth mover's distance as a metric for image retrieval," International Journal of Computer Vision, vol. 40, no. 2, pp. 99-121, 2000.
[17] O. Pele and M. Werman, "Fast and robust earth mover's distances," in IEEE 12th International Conference on Computer Vision (ICCV 2009), pp. 460-467, 2009.
[18] C. Liu, J. Yuen, A. Torralba, J. Sivic, and W. T. Freeman, "SIFT flow: Dense correspondence across different scenes," in 10th European Conference on Computer Vision (ECCV 2008), pp. 28-42, 2008.
[19] Y. J. Liu, X. Luo, Y. M. Xuan, W. F. Chen, and X. L. Fu, "Image retargeting quality assessment," Computer Graphics Forum, vol. 30, no. 2, pp. 583-592, 2011.
[20] Y. Fang, K. Zeng, Z. Wang, W. Lin, Z. Fang, and C.-W. Lin, "Objective quality assessment for image retargeting based on structural similarity," IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 4, no. 1, pp. 95-105, 2014.
[21] C.-C. Hsu, C.-W. Lin, Y. Fang, and W. Lin, "Objective quality assessment for image retargeting based on perceptual geometric distortion and information loss," IEEE Journal of Selected Topics in Signal Processing, vol. 8, no. 3, pp. 377-389, 2014.
[22] A. Liu, W. Lin, H. Chen, and P. Zhang, "Image retargeting quality assessment based on support vector regression," Signal Processing: Image Communication, vol. 39, no. 2, pp. 444-456, 2015.
[23] Y. Liang, Y.-J. Liu, and D. Gutierrez, "Objective quality prediction of image retargeting algorithms," IEEE Transactions on Visualization and Computer Graphics, vol. PP, no. 99, p. 1, 2016.
[24] Y. Zhang, Y. Fang, W. Lin, X. Zhang, and L. Li, "Backward registration based aspect ratio similarity (ARS) for image retargeting quality assessment," IEEE Transactions on Image Processing, vol. 25, no. 9, pp. 4286-4297, 2016.
[25] M. Rubinstein, D. Gutierrez, O. Sorkine, and A. Shamir, "A comparative study of image retargeting," ACM Transactions on Graphics (TOG), p. 160, 2010.
[26] L. Xu, J. Jia, and Y. Matsushita, "Motion detail preserving optical flow estimation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 9, pp. 1744-1757, 2012.
[27] Q. Yan, L. Xu, J. Shi, and J. Jia, "Hierarchical saliency detection," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2013), pp. 1155-1162, 2013.
[28] M. Rubinstein, D. Gutierrez, O. Sorkine, and A. Shamir, "RetargetMe: A benchmark for image retargeting," 2012. Online: http://people.csail.mit.edu/mrub/retargetme/, available 2016.
[29] H. Abdi, "The Kendall rank correlation coefficient," in Encyclopedia of Measurement and Statistics, Sage, Thousand Oaks, CA, pp. 508-510, 2007.
[30] C. Chamaret, O. Le Meur, P. Guillotel, and J.-C. Chevet, "How to measure the relevance of a retargeting approach?," in Trends and Topics in Computer Vision, Springer, pp. 156-168, 2010.


Highlights

 We found how each block is warped by applying optical flow between the scaled and retargeted images.
 We formulated features to show the overall area and aspect ratio changes in salient regions.
 We formulated features to show the homogeneity of the distribution of blocks with geometric shape distortions.
 We modeled local deformations in multi-scale blocks and strips.
 We used per-image normalization to make it possible to learn a generalized model on the set of training features.
 We achieved much better results compared to state-of-the-art research works.