A depth perception and visual comfort guided computational model for stereoscopic 3D visual saliency


Signal Processing: Image Communication


journal homepage: www.elsevier.com/locate/image


Qiuping Jiang a, Feng Shao a,*, Gangyi Jiang a, Mei Yu a, Zongju Peng a, Changhong Yu b

a Faculty of Information Science and Engineering, Ningbo University, Ningbo 315211, China
b Department of Information and Electronic Engineering, Zhejiang Gongshang University, Hangzhou 310018, China

Keywords: 3D visual saliency; Depth saliency; Visual comfort based saliency; Visual comfort assessment

Abstract


With the emerging development of three-dimensional (3D) related technologies, 3D visual saliency modeling is becoming more important and challenging. This paper presents a new depth perception and visual comfort guided saliency computational model for stereoscopic 3D images. The prominent advantage of the proposed model is that the influence of depth perception and visual comfort is incorporated into 3D visual saliency computation. The proposed saliency model is composed of three components: 2D image saliency, depth saliency and visual comfort based saliency. In the model, color saliency, texture saliency and spatial compactness are computed respectively and fused to derive the 2D image saliency. Global disparity contrast is considered to compute the depth saliency. In particular, we train a visual comfort prediction function to classify a stereoscopic image pair as high comfortable stereo viewing (HCSV) or low comfortable stereo viewing (LCSV), and devise different computational rules to generate a visual comfort based saliency map. The final 3D saliency map is obtained by a linear combination and enhanced by a "saliency-center bias" model. Experimental results show that the proposed 3D saliency model outperforms the state-of-the-art models on predicting human eye fixations and on visual comfort assessment. © 2015 Published by Elsevier B.V.


1. Introduction


Image/video saliency detection has become a crucial and meaningful research area for various video processing applications, e.g., video coding [1], image retrieval [2,3], image/video quality assessment [4,5], and content-aware image retargeting [6,7]. The human visual system (HVS) selectively pays more attention to particular regions because it cannot fully process the tremendous amount of incoming visual information. This process is explained by the visual attention mechanism [8].


* Corresponding author. Tel.: +86 574 87600017; fax: +86 574 87600582. E-mail addresses: [email protected], [email protected] (F. Shao).

For two-dimensional (2D) images, salient locations tend to have distinctive visual attributes that distinguish them from their surroundings. Currently, many saliency computational models have been proposed for 2D images/videos [9–21]. State-of-the-art saliency computational models can be classified into two types: top-down [9,10] and bottom-up [11–21]. In particular, bottom-up approaches are data-driven or task-independent, and they have been widely investigated in the past decades. Many bottom-up approaches are based on the assumption that locations distinctive from their surroundings are likely to attract viewers' attention. Itti et al. [11] proposed one of the earliest computational saliency models, in which the bottom-up saliency map is calculated from multi-scale center-surround differences of color, intensity and orientation feature contrasts.
Inspired by Itti's model, Harel et al. [12] computed a bottom-up saliency map by applying graph theory to form activation maps from the raw visual features. Erdem et al. [13] proposed a saliency computational method using covariance matrices of features as region descriptors. Bruce et al. [14] designed a saliency computational algorithm based on information maximization. Hou et al. [15] devised a spectral residual (SR) based saliency method by analyzing the difference between the log Fourier amplitude spectrum of an image and prior knowledge. Achanta et al. [16] proposed a frequency-tuned (FT) approach to capture global contrast for saliency estimation. Levine et al. [17] combined global contrast in the frequency domain and local contrast in the spatial domain for saliency estimation. Related works were also proposed in [18–21]. From another perspective, these methods can be classified into two further types: space-based (eye fixation prediction) [11–15,17,18] and object-based (salient object detection) saliency models [16,19–21]. Space-based saliency computational models aim to predict eye fixations in free viewing, defining saliency as outliers of the distribution of visual features on the image, whereas object-based saliency models aim to detect the most salient objects within images, following the hypothesis that objects may better predict fixations. Accordingly, the ground truths used to assess these two types of models are dramatically different: eye fixation maps vs. manually labeled binary maps. In this work, we address the first task for stereoscopic images.

Currently, three-dimensional (3D) media has become important for information representation and has various emerging applications, such as 3D video coding [22], 3D quality assessment [23], and 3D rendering [24]. Recent studies have shown that 3D visual attention is an important factor affecting the 3D Quality of Experience (3D-QoE) [25,26]. Therefore, for 3D saliency computation, a more challenging issue than its 2D counterpart is how to account for various 2D factors (e.g., color, orientation and texture) and 3D factors (e.g., depth perception, visual comfort) simultaneously. Different from saliency computation for 2D images, depth perception largely influences human viewing behavior, and the depth factor has been considered in saliency computation for 3D images [27–29]. Jansen et al. [30] investigated visual attention mechanisms for both 2D and 3D natural scenes, and found that binocular disparity affects human visual attention in free viewing of 3D natural scenes. Lang et al. [32] analyzed the discrepancies between 2D and 3D human fixations of the same scene, and proposed a 3D saliency detection method by extending existing 2D saliency models. Niu et al. [31] explored saliency analysis for stereoscopic images by extending a 2D image saliency detection model. Wang et al. [33] proposed a computational model of visual attention for 3D images by extending traditional 2D saliency detection methods. Fang et al. [34] extracted color, intensity, texture and depth features from DCT coefficients for 3D saliency computation. Kim et al. [35] presented a 3D video saliency computational model that accounts for diverse low-level attributes and high-level classification of scenes. Related works were also proposed in [36–39].

However, the factor of visual discomfort is not considered in the above methods. As an important aspect of 3D-QoE, visual discomfort refers to the subjective sensation of visual fatigue when viewing 3D signals. The issue of visual discomfort has been widely investigated by subjective [40–45] and objective methods [46–49]. As shown in earlier studies [40–42], visual discomfort can be induced by many factors, such as excessive binocular disparity, accommodation–convergence conflict, binocular mismatch, depth motion, etc. The relationship between visual discomfort and the amount of binocular disparity was investigated in [50,51]. It is generally accepted that excessive binocular disparities negatively affect the experienced 3D visual comfort, leading to visual discomfort. Meanwhile, Khaustova et al. [37] investigated the influence of disparity on visual attention through subjective experiments, and found that gaze deployment is not significantly influenced by stereoscopic content with uncrossed disparities, while a significant difference between 2D and 3D has been observed for the crossed disparity condition. Therefore, there should be some latent relationship between visual discomfort and visual attention, and the factor of visual discomfort has guiding significance for 3D saliency computation. Currently, some attention model-based visual comfort assessment (VCA) approaches have been proposed. Jung et al. [48] proposed a 3D saliency detection model by fusing 2D image and disparity saliency maps, and extracted perceptually significant disparity features, using the 3D saliency as a weighting, for the prediction of visual discomfort. Jung et al. [49] detected the perceptually significant regions in a stereoscopic 3D video and quantified the average magnitude of motion in each region for visual discomfort prediction. The underlying principle is that visual discomfort in salient regions has a dominant effect on the perception of the overall visual discomfort of the entire stereoscopic image. However, these methods are not targeted at quantifying how and to what extent the factor of visual discomfort affects 3D visual attention. In this paper, besides the factor of depth perception, we propose a 3D saliency computational model that quantifies the influence of visual discomfort induced by the conflict between convergence and accommodation. Thus far, little work has been done on the use of visual discomfort in 3D saliency computation. The basic idea behind the proposed model is that visual discomfort is likely to play an essential role in guiding 3D saliency computation for stereoscopic 3D images, because 3D saliency embodies different manifestations under different sensations of visual discomfort. To the best of our knowledge, although some VCA metrics that exploit visual attention have been reported in the literature, no attempt has thus far been made to use visual discomfort for 3D saliency computation. To explore this idea, the research innovations presented in this paper are as follows:


1) To emphasize the influence of visual discomfort, the relationship between visual discomfort and 3D visual attention is formulated in detail.


2) We train a visual comfort prediction function to classify a stereoscopic image as high comfortable stereo viewing (HCSV) or low comfortable stereo viewing (LCSV), and then derive different visual comfort based saliency maps for these two cases.

3) Factors from low-level visual features, depth perception and visual discomfort are considered to derive 2D saliency, depth saliency and visual comfort based saliency, respectively, and the three saliency maps are integrated into a 3D saliency map.


The remainder of this paper is organized as follows. Section 2 analyzes the relationship between visual discomfort and 3D visual attention. Section 3 illustrates each part of the proposed 3D saliency computational model in detail. Experimental results are given in Section 4, and Section 5 concludes the paper.

2. Relationship between visual discomfort and 3D visual attention

As introduced in Section 1, there are numerous potential causes of visual discomfort when viewing 3D images, such as convergence–accommodation conflict, binocular mismatch, crosstalk, depth motion, unnatural blur/sharpness, etc. In this paper, we focus on the vergence–accommodation conflict, because it is present in all disparity-based stereo displays, while the other factors are absent in some instantiations of stereo display technology [42]. Fig. 1 shows the differences in accommodation and convergence under natural viewing and stereo viewing. In natural viewing, the viewer adjusts the vergence of the eyes to look at an object, and accommodation and convergence are always coupled. In stereo viewing, by contrast, the convergence distance is unconsciously adjusted according to the depth position of the perceived object, which may be located behind or in front of the screen, while the accommodation remains on the screen. From the perspective of neural processing, the conflict induced by stereo viewing needs to be resolved:


the viewer must accommodate to a distance different from the distance to which he/she must converge, because of the neural coupling between convergence and accommodation. Otherwise, symptoms such as eyestrain, headache, visual discomfort, and diplopia may occur. However, previous studies have shown that objects can be seen clearly while maintaining binocular fusion when they are positioned inside a specific area. This specific area is known as the Zone of Clear Single Binocular Vision (ZCSBV) [52]. In addition, the set of accommodation and convergence responses that can be achieved without discomfort is defined as Percival's Zone of Comfort (PZC), which is about one-third of the ZCSBV [53,54]. Obviously, objects in natural viewing always fall inside the PZC, while many perceived objects under 3D viewing do not. Fig. 1(b) and (c) shows the cases in which the perceived object falls inside and outside the PZC, respectively. To fuse and focus 3D objects that are perceived outside the PZC, the viewer must struggle to counteract the normal accommodation–convergence coupling, leading to visual discomfort [55]. Therefore, from the perspective of providing high QoE to viewers, salient objects should be positioned inside the PZC (small disparity) to reduce the conflict between accommodation and convergence whenever possible (this motivates us to consider small disparity as a rule to predict salient locations in perceptually comfortable stereoscopic images). From another perspective, it is generally accepted that people are usually interested in regions popping out from the screen, which may have small depth (or large disparity) values against the other regions [56]. These popping-out regions automatically attract more attention no matter whether the stereoscopic image is perceptually comfortable or not. Motivated by this property, regions with large disparity tend to be salient for both comfortable and uncomfortable stereoscopic images. However, this property seems to be completely contradictory to the previously discussed property that salient objects should be assigned small disparities to provide high visual comfort.


Fig. 1. Illustrations of different manifestations of accommodation and convergence under natural viewing and stereo viewing. (a) Natural viewing; (b) high comfortable stereo viewing (the perceived object falls inside the PZC); (c) low comfortable stereo viewing (the perceived object falls outside the PZC).


In order to distinguish the above two properties and to highlight different sensations of visual discomfort, the small-disparity property is enforced only for perceptually comfortable stereoscopic images, while for perceptually uncomfortable stereoscopic images it is no longer applicable. Thus, for perceptually uncomfortable stereoscopic images, regions with large disparity will dominate the perceived saliency. Based on the above analyses, 3D saliency embodies different manifestations under different sensations of visual discomfort. Therefore, we derive two rules for visual comfort based saliency computation as follows:

1) Rule-1: regions falling in the visual comfort zone (small disparity) tend to be salient.

2) Rule-2: regions popping out from the screen plane (large disparity) tend to be salient.

These two rules are activated or deactivated according to the sensation of visual discomfort. In particular, for perceptually comfortable stereoscopic images, both Rule-1 and Rule-2 are activated, while for perceptually uncomfortable stereoscopic images, only Rule-2 is activated.

3. Proposed 3D saliency computational model

Fig. 2 shows the framework of the proposed 3D saliency computational model. The proposed saliency model is composed of three components: 2D image saliency, depth saliency and visual comfort based saliency. Firstly, we segment the input left image into multiple superpixels. Based on these segmented superpixels, we combine a uniqueness measurement with a spatial compactness measurement from color and texture features to generate color, texture and spatial compactness maps, respectively, and fuse these maps to obtain a 2D saliency map. Based on the same superpixels, we compute global disparity contrast to generate a depth saliency map. To compute the visual comfort based saliency, we first train a visual comfort prediction function on a given training database, and predict the degree of visual discomfort of each stereoscopic image pair with the learnt visual comfort prediction function. Then, by classifying a stereoscopic image pair as high comfortable stereo viewing (HCSV) or low comfortable stereo viewing (LCSV), we adopt different computational rules to generate a visual comfort based saliency map. Finally, the three saliency maps are linearly combined to obtain the 3D saliency map, which is further enhanced with a "saliency-center bias" model. As an example, Fig. 3 shows the saliency maps obtained by each component as well as the final 3D saliency maps. We elaborate each part in the following subsections.

3.1. 2D image saliency

In this work, regional saliency is calculated by combining global contrast measurements from low-level features (e.g., color, intensity, texture and location). We define salient regions from three aspects: color uniqueness, texture distinctiveness and spatial compactness. For region segmentation, we use the Simple Linear Iterative Clustering (SLIC) algorithm [57] to oversegment the input left image, which efficiently generates regular, compact and nearly uniform superpixels with low computational cost. In our experiments, the number of superpixels is set to 400.
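The segmentation step can be illustrated with a short Python sketch. It assumes NumPy and scikit-image are available; the compactness value and the helper name are our own choices, while the 400-superpixel setting follows the text.

```python
import numpy as np
from skimage.color import rgb2lab
from skimage.segmentation import slic

def oversegment_left_image(left_rgb, n_segments=400):
    """Oversegment the left image with SLIC and collect per-superpixel statistics."""
    labels = slic(left_rgb, n_segments=n_segments, compactness=10, start_label=0)
    lab = rgb2lab(left_rgb)
    n_sp = labels.max() + 1
    mean_lab = np.zeros((n_sp, 3))      # mean CIELAB color c_i of each superpixel
    centers = np.zeros((n_sp, 2))       # geometrical center p_i (x, y)
    sizes = np.zeros(n_sp, dtype=int)   # |SP_i|
    for i in range(n_sp):
        ys, xs = np.nonzero(labels == i)
        sizes[i] = ys.size
        mean_lab[i] = lab[ys, xs].mean(axis=0)
        centers[i] = (xs.mean(), ys.mean())
    return labels, mean_lab, centers, sizes
```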

3.1.1. Inter-region similarity

Based on these superpixels, we first compute the inter-region similarity from the aspects of color, texture and spatial location, i.e., color, texture and spatial similarities are all taken into account. For two superpixels SPi and SPj, the inter-region similarity ρ(i, j) is computed as

\rho(i, j) = \rho_c(i, j) \cdot \rho_t(i, j) \cdot \rho_s(i, j)   (1)

where ρc(i, j), ρt(i, j) and ρs(i, j) denote the color similarity, texture similarity and spatial similarity between SPi and SPj, respectively. Here, we use a multiplication strategy to combine the color, texture and spatial similarities, as in [21].


Fig. 2. The framework of the proposed saliency computational model for stereoscopic images.


Fig. 3. Results of individual saliency maps obtained by each component and final 3D saliency maps: (a) input 3D image; (b) disparity map; (c) 2D image saliency map; (d) depth saliency map; (e) visual comfort based saliency map; (f) 3D saliency map; (g) human eye fixation density map (Ground truth map).

3.1.1.1. Color similarity map. The color similarity between SPi and SPj is calculated based on color histograms [58]:

\rho_c(i, j) = \sum_{r=1}^{m} \min\{H_i^r, H_j^r\}   (2)

where H_i^r and H_j^r represent the r-th bin of the color histograms Hi and Hj, respectively. To calculate the color histogram, each color channel (using the RGB color space in our experiment) is first quantized into q bins, and a global color histogram H0 with q × q × q bins is constructed using all pixels in the image [59]. The quantized color of each bin c_k (k = 1, 2, ..., q^3) is calculated as the mean color of the pixels falling into the k-th bin. Considering that the colors in a natural image typically cover only a small portion of the full color space, to further reduce the number of color bins, the m most frequently occurring color bins that cover more than p% of the image pixels are selected as the representative colors, and each of the remaining least frequently occurring (q^3 - m) bins is replaced by the one of the m bins with the smallest color difference, measured by the ℓ2-norm distance. Accordingly, the quantized color of the selected bin is updated. In our experiment, q is set to 8 and the percentage p is set to 95%. Thus, all colors that appear in an image can be represented by a color quantization table.
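A minimal sketch of the color quantization and the histogram intersection of Eq. (2), assuming NumPy and an 8-bit RGB input. Two details are simplified relative to the text: dropped bins are folded into the most frequent kept bin rather than the nearest bin in ℓ2 distance, and the per-superpixel histograms are normalized, which the paper does not state explicitly.

```python
import numpy as np

def quantized_color_histograms(rgb, labels, q=8, coverage=0.95):
    """Per-superpixel histograms over the m most frequent quantized colors (setup for Eq. (2))."""
    bins = (rgb // (256 // q)).astype(int)                  # quantize each channel into q levels
    idx = bins[..., 0] * q * q + bins[..., 1] * q + bins[..., 2]
    counts = np.bincount(idx.ravel(), minlength=q ** 3)
    order = np.argsort(counts)[::-1]
    cum = np.cumsum(counts[order]) / idx.size
    m = int(np.searchsorted(cum, coverage) + 1)             # keep bins covering >= coverage of pixels
    kept = order[:m]
    remap = np.full(q ** 3, -1, dtype=int)
    remap[kept] = np.arange(m)
    remap[remap < 0] = 0                                    # simplified: fold dropped bins into the most frequent kept bin
    idx_m = remap[idx]
    n_sp = labels.max() + 1
    hists = np.zeros((n_sp, m))
    for i in range(n_sp):
        h = np.bincount(idx_m[labels == i], minlength=m).astype(float)
        hists[i] = h / h.sum()
    return hists

def color_similarity(hists):
    """rho_c(i, j): histogram intersection between superpixel color histograms (Eq. (2))."""
    return np.minimum(hists[:, None, :], hists[None, :, :]).sum(axis=2)
```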


3.1.1.2. Texture similarity map. We use the Gabor filtering response and the luminance component to measure texture similarity between regions. The Gabor filter is appropriate for texture representation due to its high consistency with simple cells in the primary visual cortex (V1). To generate the Gabor response map, we use the responses of a bank of 20 Gabor filters (five scales and four orientations). The Gabor response map I_G(x, y) is calculated by summing the responses across all scales and orientations. With the luminance component, the texture covariance matrix Ci of SPi is defined as:

C_i = \frac{1}{|SP_i| - 1} \sum_{(x, y) \in SP_i} (f(x, y) - \mu_i)(f(x, y) - \mu_i)^T   (3)

where f(x, y) = [I(x, y), \partial I(x, y)/\partial x, \partial I(x, y)/\partial y, I_G(x, y)], \mu_i represents the mean feature vector over the pixels in SPi, and |SPi| denotes the number of pixels in SPi. Then, we calculate the distance between texture covariance matrices by [60]

d(C_i, C_j) = \sqrt{\sum_{k=1}^{n} \ln^2 \lambda_k(C_i, C_j)}   (4)

where \{\lambda_k(C_i, C_j)\}_{k=1,2,\ldots,n} are the generalized eigenvalues of Ci and Cj. The texture similarity between SPi and SPj is simply the reciprocal of this distance:

\rho_t(i, j) = \frac{1}{d(C_i, C_j)}   (5)
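The covariance-descriptor distance of Eqs. (3)-(5) can be sketched as follows, assuming a per-pixel feature image holding [I, ∂I/∂x, ∂I/∂y, I_G] has already been computed (e.g., with Gabor filtering); the small diagonal regularization is our addition to keep the generalized eigenvalue problem well conditioned.

```python
import numpy as np
from scipy.linalg import eigh

def texture_covariance(features, labels, i):
    """Covariance descriptor C_i of the 4-D feature f over superpixel i (Eq. (3))."""
    f = features[labels == i]            # (|SP_i|, 4) feature vectors
    return np.cov(f, rowvar=False)       # unbiased estimate, 1/(|SP_i|-1) normalization

def covariance_distance(Ci, Cj, eps=1e-8):
    """Log-eigenvalue distance between covariance descriptors (Eq. (4))."""
    # Generalized eigenvalues lambda_k solving Ci v = lambda Cj v.
    lam = eigh(Ci + eps * np.eye(Ci.shape[0]),
               Cj + eps * np.eye(Cj.shape[0]),
               eigvals_only=True)
    return np.sqrt(np.sum(np.log(lam) ** 2))

def texture_similarity(Ci, Cj):
    """rho_t(i, j) = 1 / d(C_i, C_j) (Eq. (5))."""
    return 1.0 / max(covariance_distance(Ci, Cj), 1e-8)
```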


3.1.1.3. Spatial similarity map. The spatial similarity between SPi and SPj is calculated by

\rho_s(i, j) = \exp\left(-\frac{\|p_i - p_j\|}{\sigma^2}\right)   (6)

where p_i and p_j represent the geometrical centers of SPi and SPj, respectively, and σ controls the strength of the spatial distance; it is set to 0.4 as in [61].

3.1.2. Regional saliency calculation

Observed from a variety of images, we find that salient objects are usually distinctive in color appearance and compact in color distribution compared with their backgrounds. Therefore, we combine a regional uniqueness (global contrast) measurement with a spatial compactness measurement from color and texture features. In addition, we adopt the inter-region similarity to refine the global contrast and spatial compactness measurements; the underlying principle of this refinement is that regions with high similarity values usually take high saliency scores. Specifically, three individual saliency maps are calculated by combining the global contrast measurement and the spatial compactness measurement, weighted by the inter-region similarity. The color saliency of SPi is defined based on global contrast from color features and the inter-region similarity as follows:

CS(i) = \frac{\sum_{j=1, j \neq i}^{N} \rho(i, j) \cdot GCC(j)}{\sum_{j=1, j \neq i}^{N} \rho(i, j)}   (7)

GCC(i) = \sum_{j=1, j \neq i}^{N} \rho_d(i, j) \cdot |SP_j| \cdot \|c_i - c_j\|   (8)

where c_i and c_j denote the mean color vectors of SPi and SPj in the CIELAB color space [62], |SP_j| denotes the number of pixels in SPj, and N is the number of superpixels in the image. Similarly, the texture saliency of SPi is defined based on global contrast from texture features and the inter-region similarity measure as follows:

TS(i) = \frac{\sum_{j=1, j \neq i}^{N} \rho(i, j) \cdot GTC(j)}{\sum_{j=1, j \neq i}^{N} \rho(i, j)}   (9)

GTC(i) = \sum_{j=1, j \neq i}^{N} \rho_d(i, j) \cdot |SP_j| \cdot d(C_i, C_j)   (10)

The spatial saliency of SPi is defined as:

SS(i) = \frac{\sum_{j=1, j \neq i}^{N} \rho(i, j) \cdot D(j)}{\sum_{j=1, j \neq i}^{N} \rho(i, j)}   (11)

where D(j) is the Euclidean distance between the geometrical center of SPj and the image center. By integrating the saliency maps using a multiplication strategy [21], the final regional saliency of SPi is defined as:

S_{2D}(i) = NCS(i) \cdot NTS(i) \cdot NSS(i)   (12)

where NCS(i) and NTS(i) are the normalized versions of CS(i) and TS(i), respectively, and NSS(i) is the inversely normalized version of SS(i). Fig. 4 shows examples of each saliency feature map and the final 2D image saliency maps. By combining the saliency feature maps with a multiplication operation, regions with high color/texture distinctiveness and strong spatial compactness are highlighted and the backgrounds are suppressed effectively.
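A compact sketch of Eqs. (7)-(12), assuming the pairwise inter-region similarity matrix ρ from Eq. (1) and the per-superpixel global color contrast, global texture contrast and center distances have been precomputed; the min-max normalization helper is our interpretation of the unspecified normalization.

```python
import numpy as np

def _minmax(v):
    """Normalize a vector to [0, 1]."""
    v = np.asarray(v, dtype=float)
    return (v - v.min()) / (v.max() - v.min() + 1e-12)

def regional_2d_saliency(rho, gcc, gtc, dist_to_center):
    """2D regional saliency per superpixel (Eqs. (7)-(12))."""
    rho = np.array(rho, dtype=float, copy=True)
    np.fill_diagonal(rho, 0.0)                        # exclude the j == i terms
    w = rho / (rho.sum(axis=1, keepdims=True) + 1e-12)
    cs = w @ np.asarray(gcc, dtype=float)             # Eq. (7): similarity-weighted color contrast
    ts = w @ np.asarray(gtc, dtype=float)             # Eq. (9): similarity-weighted texture contrast
    ss = w @ np.asarray(dist_to_center, dtype=float)  # Eq. (11): similarity-weighted center distance
    ncs, nts = _minmax(cs), _minmax(ts)
    nss = 1.0 - _minmax(ss)                           # inversely normalized spatial compactness
    return ncs * nts * nss                            # Eq. (12)
```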

3.2. Depth saliency


The key problem in 3D saliency modeling is how to incorporate depth cues into the 2D features. Existing 3D saliency models can be divided into two groups based on the role of depth cues in saliency modeling: (1) depth cues are used as additional features, i.e., the "depth-weighting model" [24,31,33,34]; (2) depth saliency is directly calculated from the depth cues, i.e., the "depth-saliency model" [27,28,32]. The main difference between the two approaches is whether a separate stage is involved to extract depth features and create a depth saliency map. In this work, the depth saliency of each region is directly generated by computing global contrast from disparity features. It is known that disparity contrast not only provides primary depth perception, but is also an important indicator of potential salient objects [33]. From this perspective, we compute global disparity contrast to characterize the effect of depth contrast in the depth saliency model. For simplicity, we directly extend the color contrast-based saliency model in [59] for global disparity contrast computation. The main differences from the color contrast-based saliency model are that: (1) we simply use the same superpixels as for the color image, instead of using the graph-based segmentation algorithm in [59]; even if region sizes differ between the color image and the disparity map, the disparity contrast remains constant over regions having the same disparity; (2) we do not consider the region size as a factor in measuring disparity contrast, because the superpixels are generated from the color image. Specifically, given an input disparity map I_d(x, y), the global contrast from the disparity feature of SPi is calculated by

S_{DP}(i) = \mathrm{norm}\left(\sum_{j=1, j \neq i}^{N} \omega_d(i, j) \cdot d_v(i, j)\right)   (13)

where norm(·) denotes the normalization operation, and ω_d(i, j) and d_v(i, j) are factors reflecting the spatial distance and the absolute disparity difference, respectively, which are calculated as:

d_v(i, j) = \left| \frac{\sum_{(x, y) \in SP_i} I_d(x, y)}{|SP_i|} - \frac{\sum_{(x, y) \in SP_j} I_d(x, y)}{|SP_j|} \right|   (14)

\omega_d(i, j) = \exp\left(-\lambda \cdot \frac{\|p_i - p_j\|}{\max\{H, W\}}\right)   (15)

where W and H are the width and height of the image, respectively, and λ is a parameter that quantifies the importance of the spatial distance in depth saliency. In our experiment, λ is empirically set to 5. We present the resulting depth saliency maps in Fig. 3(d). The estimated depth saliency maps characterize depth perception well, since regions with protruding objects or disparity discontinuities are highlighted.
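Eqs. (13)-(15) reduce to a few array operations once per-superpixel mean disparities and centers are available. In this sketch norm(·) is interpreted as min-max scaling, which the text does not specify.

```python
import numpy as np

def depth_saliency(mean_disp, centers, height, width, lam=5.0):
    """Depth saliency per superpixel from global disparity contrast (Eqs. (13)-(15))."""
    dv = np.abs(mean_disp[:, None] - mean_disp[None, :])           # Eq. (14)
    dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    wd = np.exp(-lam * dist / max(height, width))                  # Eq. (15)
    np.fill_diagonal(wd, 0.0)                                      # exclude j == i
    s = (wd * dv).sum(axis=1)
    return (s - s.min()) / (s.max() - s.min() + 1e-12)             # norm(.) as min-max scaling
```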


3.3. Visual comfort based saliency


Besides low-level visual features and depth cues, visual discomfort is another important factor affecting 3D visual attention (this is what distinguishes 3D visual attention from its 2D counterpart).

As stated in the previous section, saliency embodies different manifestations under different sensations of visual discomfort. Therefore, we classify all stereoscopic images into two categories, HCSV or LCSV images, according to their degree of visual discomfort, and design different saliency computational models for the HCSV and LCSV images, respectively. The detailed computational flow is shown in Fig. 5. In order to predict the degree of visual discomfort, we propose to learn a visual comfort prediction function from a given training database. Specifically, we randomly select 80 stereoscopic images from the IVY LAB Stereoscopic 3D image database [63] together with their subjective visual comfort scores (i.e., MOS values). The subjective visual comfort label L_t of each stereoscopic image is obtained by comparing its MOS value with a fixed threshold T: if MOS_t ≥ T, we assign L_t = +1 (positive); otherwise, we assign L_t = -1 (negative). The threshold T is empirically set to 3.5 based on the Likert-like scale used in the subjective assessment. In the subjective VCA, each participant was asked to assign a visual comfort score to each stereoscopic image according to a Likert-like scale: 5 = very comfortable, 4 = comfortable, 3 = mildly comfortable, 2 = uncomfortable and 1 = extremely uncomfortable. The selected threshold can well distinguish a stereoscopic image pair as HCSV or LCSV. Then, for each training stereoscopic image, referring to [46,47], disparity features (i.e., mean absolute disparity magnitude d1, disparity variance d2, and disparity range d3) are defined as follows:

d_1 = \frac{1}{M \times N} \sum_{x=1}^{M} \sum_{y=1}^{N} |d(x, y)|   (16)

d_2 = \frac{1}{M \times N} \sum_{x=1}^{M} \sum_{y=1}^{N} \left( |d(x, y)| - d_1 \right)^2   (17)

d_3 = \max_{(x, y)} d(x, y) - \min_{(x, y)} d(x, y)   (18)


Fig. 4. Examples of three saliency components and final 2D image saliency map: (a) input image “Hall”; (b) The corresponding color saliency map of “Hall”; (c) the corresponding texture saliency map of “Hall”; (d) the corresponding spatial compactness map of “Hall”; (e) 2D image saliency map of “Hall”; (f) input image “Umbrella”; (g) the corresponding color saliency map of “Umbrella”; (h) the corresponding texture saliency map of “Umbrella”; (i) the corresponding spatial compactness map of “Umbrella”; (j) 2D image saliency map of “Umbrella”.


Fig. 5. The detailed computation flow of visual comfort based saliency.


The disparity features are combined into a visual comfort feature vector x = [d1, d2, d3]. Then, with the collected feature vectors {x_t}_{t=1}^{80} and the corresponding labels {L_t}_{t=1}^{80} ∈ {+1, -1}, we train a visual comfort prediction function Φ(·) via a support vector machine (SVM). Testing on the remaining stereoscopic images of the IVY LAB Stereoscopic 3D image database shows that the learnt visual comfort prediction function achieves a classification accuracy of 87.5%. With the learnt visual comfort prediction function, the visual comfort label Lp of a testing stereoscopic image Ip can be predicted by

L_p = \Phi(x_p)   (19)

where x_p is the corresponding disparity feature vector of I_p.
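The feature extraction and classification steps around Eqs. (16)-(19) could be implemented roughly as follows with scikit-learn; the RBF kernel and the absence of feature scaling are assumptions, since the text only states that an SVM is used.

```python
import numpy as np
from sklearn.svm import SVC

def disparity_features(disp):
    """Visual comfort feature vector x = [d1, d2, d3] from a disparity map (Eqs. (16)-(18))."""
    a = np.abs(disp)
    d1 = a.mean()                      # mean absolute disparity magnitude
    d2 = ((a - d1) ** 2).mean()        # disparity variance
    d3 = disp.max() - disp.min()       # disparity range
    return np.array([d1, d2, d3])

def train_comfort_classifier(train_disp_maps, labels):
    """Train the visual comfort prediction function Phi via an SVM (labels in {+1, -1})."""
    X = np.stack([disparity_features(d) for d in train_disp_maps])
    clf = SVC(kernel="rbf")            # kernel choice is an assumption; the paper only says "SVM"
    clf.fit(X, labels)
    return clf

def predict_comfort_label(clf, disp):
    """Eq. (19): L_p = Phi(x_p), i.e. +1 (HCSV) or -1 (LCSV)."""
    return int(clf.predict(disparity_features(disp).reshape(1, -1))[0])
```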

In the previous section, we derived two computational rules for visual comfort based saliency prediction. Based on Rule-1, regions perceived on the screen (i.e., with zero disparity) are assigned high saliency scores, while for regions perceived behind or in front of the screen (i.e., with positive or negative disparity), the saliency score decreases as the perceived distance from the screen increases. The saliency map satisfying this rule is defined as:

S_1(i) = \begin{cases} \dfrac{d_{max} - d_i}{d_{max}}, & \text{if } d_i \geq 0 \\ \dfrac{d_{min} - d_i}{d_{min}}, & \text{if } d_i < 0 \end{cases}   (20)

Based on Rule-2, regions perceived furthest in front of the screen (i.e., with the most negative disparity) are assigned high saliency scores. The saliency map satisfying this rule is defined as:

S_2(i) = \frac{d_{max} - d_i}{d_{max} - d_{min}}   (21)

where d_max and d_min are the maximal and minimal disparity values, respectively, and d_i is the mean disparity value of SPi. Here, we design different ways to compute the visual comfort based saliency map for the HCSV and LCSV images. As analyzed in the previous section, both rules are activated for the HCSV images (i.e., perceptually comfortable stereoscopic images), while for the LCSV images (i.e., perceptually uncomfortable stereoscopic images), only Rule-2 is activated. Thus, for a testing stereoscopic image, based on its predicted visual comfort label L_p ∈ {+1, -1} and the derived saliency maps S_1(i) and S_2(i), the final visual comfort based saliency map of SPi is computed as

S_{VC}(i) = \begin{cases} (1 - \gamma) \cdot S_1(i) + \gamma \cdot S_2(i), & \text{if } L_p = +1 \\ S_2(i), & \text{if } L_p = -1 \end{cases}   (22)

where γ is a parameter that balances the importance of S_1(i) and S_2(i). The remaining issue is how to derive the parameter γ so that the combination is consistent with the HVS. In general, a scene with large negative disparities will lead to visual discomfort [41,44,45]. Therefore, we use the percentage of negative disparities as a prominent factor for the parameter γ:

\gamma = \lambda + (1 - \lambda) \cdot \frac{N_{d < 0}}{N}   (23)

where N is the number of superpixels in the image, N_{d<0} is the number of superpixels whose mean disparities are lower than 0, and λ is a parameter with default value 0.5.
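A sketch of Eqs. (20)-(23) operating on per-superpixel mean disparities; the small epsilon terms that guard against zero denominators are our addition.

```python
import numpy as np

def visual_comfort_saliency(mean_disp, comfort_label, lam=0.5):
    """Visual comfort based saliency per superpixel (Eqs. (20)-(23)).
    mean_disp: mean disparity d_i per superpixel; comfort_label: +1 (HCSV) or -1 (LCSV)."""
    d_max, d_min = mean_disp.max(), mean_disp.min()
    # Rule-1 (Eq. (20)): highest saliency at zero disparity, decaying toward d_max / d_min.
    s1 = np.where(mean_disp >= 0,
                  (d_max - mean_disp) / (d_max + 1e-12),
                  (d_min - mean_disp) / (d_min - 1e-12))
    # Rule-2 (Eq. (21)): highest saliency for the most negative (popping-out) disparity.
    s2 = (d_max - mean_disp) / (d_max - d_min + 1e-12)
    if comfort_label == +1:
        gamma = lam + (1 - lam) * np.mean(mean_disp < 0)   # Eq. (23)
        return (1 - gamma) * s1 + gamma * s2               # Eq. (22), HCSV case
    return s2                                              # Eq. (22), LCSV case
```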


3.4. Saliency map fusion


As shown in Fig. 3, the individual saliency components complement each other. We therefore adopt a simple but effective fusion method that linearly combines the 2D image saliency map, the depth saliency map and the visual comfort based saliency map as:

S = \beta_{2D} \cdot S_{2D} + \beta_{DP} \cdot S_{DP} + \beta_{VC} \cdot S_{VC}   (24)

where the weighting parameters β_2D, β_DP and β_VC adjust the relative importance of each component. Theoretically, optimal weighting parameters could be found by training on a large number of samples. In this paper, due to the lack of 3D saliency databases for training, we set β_2D = β_DP = β_VC = 1/3 for simplicity. It is also known that human fixation distributions shift from the image center toward the distributions of image features when searching a scene for a conspicuous target [61]. We adopt a "saliency-center biased" Gaussian function to simulate this shifting process of human fixations. Different from the widely used "image-center bias" model that directly uses the image center as the bias center, we formulate the "saliency-center bias" model as a Gaussian distribution:

G_{x, y} = \exp\left[-\left(\frac{(x - x_c)^2}{2\sigma_x^2} + \frac{(y - y_c)^2}{2\sigma_y^2}\right)\right]   (25)

where (x_c, y_c) denotes the shifted center of the visual field, calculated from the saliency map S in Eq. (24). We set σ_x = 0.25 × W and σ_y = 0.25 × H, as in [64]. The coordinates of the shifted center are calculated as:

x_c = \sum_{x=1}^{M} \sum_{y=1}^{N} x \cdot S(x, y) \Big/ \sum_{x=1}^{M} \sum_{y=1}^{N} S(x, y)   (26)

y_c = \sum_{x=1}^{M} \sum_{y=1}^{N} y \cdot S(x, y) \Big/ \sum_{x=1}^{M} \sum_{y=1}^{N} S(x, y)   (27)


Finally, the 3D saliency considering the center bias factor is computed as:

S_{3D} = \eta \cdot S + (1 - \eta) \cdot G   (28)

where 0 < η < 1 is a parameter that adjusts the importance of the two components; its value is set to 0.7, as in [34]. The final 3D saliency maps are shown in Fig. 3(f) and Fig. 7(i).
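Eqs. (24)-(28) can be combined into a single routine, assuming the three component maps are already rendered at pixel resolution and on comparable scales:

```python
import numpy as np

def fuse_and_center_bias(s2d, sdp, svc, eta=0.7, betas=(1/3, 1/3, 1/3)):
    """Linear fusion (Eq. (24)) followed by the saliency-center bias enhancement (Eqs. (25)-(28))."""
    s = betas[0] * s2d + betas[1] * sdp + betas[2] * svc           # Eq. (24)
    h, w = s.shape
    ys, xs = np.mgrid[0:h, 0:w]
    total = s.sum() + 1e-12
    xc = (xs * s).sum() / total                                    # Eq. (26)
    yc = (ys * s).sum() / total                                    # Eq. (27)
    sx, sy = 0.25 * w, 0.25 * h
    g = np.exp(-(((xs - xc) ** 2) / (2 * sx ** 2) +
                 ((ys - yc) ** 2) / (2 * sy ** 2)))                # Eq. (25)
    return eta * s + (1 - eta) * g                                 # Eq. (28)
```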


4. Experimental results and analyses

In this section, we conduct several experiments to demonstrate the performance of the proposed saliency computational model. To this end, we compare the performance of the proposed model with some state-of-the-art


2D/3D saliency models. In addition, we design multiple combination schemes to investigate the influence of each saliency component. Finally, we examine the application of the proposed 3D saliency computational model for the task of VCA.


4.1. Benchmark database and performance indicators


For performance comparison of different saliency models, the publicly available 3D eye-tracking database [33] is used. The 3D eye-tracking database contains 18 stereoscopic images and the corresponding eye fixation density maps. Among these 18 stereoscopic images, 10 images are selected from the Middlebury database, and the other eight images were captured on the campus of the University of Nantes using a 3D camera. Fig. 7(a) and (j) shows some examples of stereoscopic images (left images as examples) and the corresponding eye fixation density maps. This eye-tracking database was created to investigate how different features (including 2D low-level and depth features) affect the distribution of human visual attention. In this paper, four commonly used performance indicators are used to benchmark the proposed saliency model against the relevant state-of-the-art models [65]: F-measure, Linear Correlation Coefficient (LCC), Area under the ROC Curve (AUC), and Normalized Scanpath Saliency (NSS). The LCC measures the strength of the linear relationship between the predicted and ground truth saliency maps. AUC = 1 implies a perfect prediction, whereas AUC = 0.5 results from random choices. NSS = 1 indicates that the eye positions fall in a region whose predicted saliency is one standard deviation above average, and NSS ≥ 1 indicates significantly higher saliency scores at human fixated locations than at other locations. To compute these indicators, each eye fixation density map is normalized (thresholded at 30%) to obtain the corresponding ground truth saliency map. Overall, with larger F-measure, LCC, AUC and NSS scores, the saliency model predicts salient locations more accurately.
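For reference, the NSS and LCC indicators follow their standard definitions; a minimal sketch is given below (F-measure and AUC, which additionally require thresholding, are omitted).

```python
import numpy as np

def nss(saliency, fixation_mask):
    """Normalized Scanpath Saliency: mean of the standardized saliency at fixated pixels."""
    s = (saliency - saliency.mean()) / (saliency.std() + 1e-12)
    return s[fixation_mask > 0].mean()

def lcc(saliency, fixation_density):
    """Linear Correlation Coefficient between predicted and ground-truth saliency maps."""
    return np.corrcoef(saliency.ravel(), fixation_density.ravel())[0, 1]
```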

3

11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61

4.2. Performance comparison with the existing saliency models In order to quantify the performance of the proposed 3D saliency model, we compare seven competitive models, including three 2D-extended models and four 3D saliency models (Model 1 in [33], Model 2 in [33], Model 3 in [33] and Fang's model [34]). For the 2D-extended models, different 2D saliency maps (Itti's [11], Bruce's [14], Hou's [15]) and depth map are linearly combined to obtain the 3D saliency map. The same weights are assigned for 2D saliency map and depth map combination with [48]. We denote these 2D-extended saliency models as ‘Itti's modelþDepth’, ‘Bruce's modelþDepth’, and ‘Hou's modelþDepth’, respectively. For the three 3D saliency models in [33], the final 3D saliency maps are obtained by point-wise multiplication combination of different 2D saliency maps (Itti's [11], Bruce's [14], Hou's [15]) and depth saliency map (DSM). We denote these 3D saliency models as ‘Itti's model  DSM’, ‘Bruce's model  DSM’, and ‘Hou's model  DSM’, respectively. For Fang's model, we adopt the best combination approach (to provide the best performance in

9

[34]). Table 1 shows the quantitative comparison results of different models on the 3D eye-tracking database. Observed from the table, the proposed model demonstrates the best performance than other competitive models in terms of all quantitative indicators. The ROC curves in Fig. 6 also show the better performance of the proposed saliency model over other existing ones. We also present some visual comparison samples from different models in Fig. 7. Observed from Fig. 7(b)–(g), the saliency maps obtained from ‘Itti's modelþDepth’, ‘Bruce's modelþ Depth’, ‘Hou's model þDepth’, ‘Itti's model  DSM’, ‘Bruce's model  DSM’, and ‘Hou's model  DSM’ mainly highlight the contour of salient objects or some object corners in scenes. For Fang's models, although it can predict much accurate salient regions in scenes, it still fails to highlight the salient objects and suppress the backgrounds in some special scenes, as depicted in the first and the second saliency maps in Fig. 7(h). On the contrast, the proposed model can estimate the majority of salient locations while suppressing the background regions more accurately with respect to the human eye fixation density map, as shown in Fig. 7(i).

63 65 67 69 71 73 75 77 79 81 83 85

4.3. Influence of different combination schemes

87

In order to investigate the influence of different combination of each saliency component, we design nine combination Table 1 Quantitative comparison results for different 3D saliency models (the values in bold: the best performance). 3D saliency model

F-measure

LCC

AUC

NSS

Itti's modelþ Depth Bruce's model þDepth Hou's model þDepth Itti's model  DSM [33] Bruce's model  DSM [33] Hou's model  DSM [33] Fang's model [34] Proposed model

0.5043 0.4430 0.4209 0.4409 0.5349 0.5284 0.5317 0.6298

0.3256 0.2190 0.1656 0.2115 0.4164 0.3707 0.4411 0.4909

0.7004 0.6275 0.6156 0.6721 0.7309 0.7397 0.7591 0.8185

0.6816 0.5583 0.4505 0.5497 0.9071 0.8913 1.0675 1.1694

89 91 93 95 97 99 101 103 105 107 109 111 113 115 117 119 121

Fig. 6. ROC curves for different 3D saliency models.


27 29

Fig. 7. Visual comparison between the proposed model and other existing saliency models: (a) Input left image; (b) Saliency map obtained from ‘Itti's model þDepth’ model; (c) Saliency map obtained from ‘Bruce's modelþ Depth’ model; (d) Saliency map obtained from ‘Hou's modelþ Depth’ model; (e) Saliency map obtained from ‘Itti's model  DSM’ model [33]; (f) Saliency map obtained from ‘Bruce's model  DSM’ model [33]; (g) Saliency map obtained from ‘Hou's model  DSM’ model [33]; (h) Saliency map obtained from Fang's model [34]; (i) Saliency map obtained from the proposed model; (j) Human eye fixation density map.

35 37 39 41 43 45 47 49 51

For Scheme-1, Scheme-2 and Scheme-3, only the 2D image saliency map (S2D), the depth saliency map (SDP) or the visual comfort based saliency map (SVC) is directly regarded as the 3D saliency map, respectively. For Scheme-4, Scheme-5 and Scheme-6, pair-wise combinations of S2D, SDP and SVC are used to generate the 3D saliency map. In addition, a multiplication combination strategy is used in Scheme-7. The quantitative comparison results of the eight schemes are shown in Table 2. Clearly, the proposed model delivers significantly better performance than the other schemes in terms of all criteria, which demonstrates the superiority of the proposed linear combination scheme and the effectiveness of each saliency component. Moreover, while no single saliency component provides standout performance when used alone, combining them together leads to considerable performance improvement. In addition, it can be observed that the linear weighted strategy is more effective than the multiplication combination strategy for fusing the different saliency components. Overall, the proposed model provides an effective way for 3D saliency prediction.

Table 2. Quantitative comparison results of different combination schemes (the values in bold: the best performance).

Scheme                         F-measure   LCC      AUC      NSS
Scheme-1 (S2D)                 0.5254      0.3751   0.7371   0.9075
Scheme-2 (SDP)                 0.5343      0.3620   0.7262   0.6865
Scheme-3 (SVC)                 0.4945      0.3482   0.6925   0.6593
Scheme-4 (S2D + SDP)           0.5959      0.4688   0.7770   0.9777
Scheme-5 (S2D + SVC)           0.5432      0.3379   0.7306   0.7518
Scheme-6 (SDP + SVC)           0.5491      0.3622   0.7328   0.7073
Scheme-7 (S2D × SDP × SVC)     0.4852      0.3949   0.7772   0.9318
Proposed                       0.6298      0.4909   0.8185   1.1694

4.4. Application to VCA

It is known that visual attention is an important cue in addressing the prediction of visual discomfort [48,49]. To further demonstrate whether and to what extent the proposed 3D saliency model can improve the prediction accuracy of visual discomfort, we compare the VCA performances obtained with different 3D saliency models. Referring to [48], we compute the mean of the absolute disparity to reflect the disparity magnitude characteristic, and the mean of the absolute differential disparity to reflect the disparity gradient characteristic. Furthermore, to investigate how 3D saliency weighting affects the VCA, the saliency weighted disparity features are calculated as follows:

f_1 = \frac{1}{A} \sum_{x=1}^{M} \sum_{y=1}^{N} S_{3D}(x, y) \cdot d(x, y)   (29)

f_2 = \frac{1}{A} \sum_{x=1}^{M} \sum_{y=1}^{N} S_{3D}(x, y) \cdot \Delta d(x, y)   (30)

0.9075 0.6865 0.6593 0.9777 0.7518 0.7073 0.9318 1.1694

95 97 99 101 103 105 107

53

57

91 93

31 33

89

109 111 113 115 117

where A = \sum_{x=1}^{M} \sum_{y=1}^{N} S_{3D}(x, y) is a normalizing factor, d(x, y) denotes the absolute disparity value at pixel (x, y), and \Delta d(x, y) denotes the absolute differential disparity value.
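A sketch of Eqs. (29)-(30); the exact definition of the differential disparity Δd(x, y) is not given in the text, so the horizontal gradient magnitude used here is only an assumption.

```python
import numpy as np

def saliency_weighted_vca_features(s3d, disp):
    """Saliency-weighted disparity features f1, f2 (Eqs. (29)-(30))."""
    A = s3d.sum() + 1e-12                      # normalizing factor: sum of 3D saliency
    abs_disp = np.abs(disp)                    # d(x, y): absolute disparity magnitude
    grad = np.abs(np.gradient(abs_disp)[1])    # Delta d(x, y): horizontal gradient magnitude (assumption)
    f1 = (s3d * abs_disp).sum() / A
    f2 = (s3d * grad).sum() / A
    return f1, f2
```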


63

3

65

5

67

7

69

9

71

11

73

13

75

15

77

17

79

19

81

21

Fig. 8. Examples of anaglyph images (the first and third rows) and corresponding 3D saliency maps obtained by the proposed 3D saliency model (the second and fourth rows).

85

23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61

83

Table 3. Quantitative comparison results for different VCA methods (the values in bold: the best performance).

Method      PLCC     SRCC     KRCC     RMSE
Method-1    0.7341   0.7493   0.5625   0.5487
Method-2    0.8088   0.7837   0.5934   0.4776
Proposed    0.8346   0.8168   0.6269   0.4375

The design of the features is based on the work in [48]. We design three methods for objective comparison, denoted Method-1, Method-2 and Proposed. For Method-1, only the original disparity features (without 3D saliency weighting) are used. For Method-2 and Proposed, the 3D saliency model in [48] and the proposed model, respectively, are used to compute the saliency weighted disparity features with Eqs. (29) and (30). We conduct the experiments on the publicly available IVY LAB Stereoscopic 3D image database [62]. The database was generated for visual discomfort prediction and contains 120 stereoscopic image pairs with different disparity ranges; the mean opinion scores of visual comfort are also provided to evaluate the performance of visual discomfort prediction. Details of the database can be found in [62]. Since the perceived visual discomfort is affected by 3D saliency, we use anaglyph (red–green) images to illustrate the visual discomfort of the stereoscopic images (see Fig. 8). These images show that the proposed 3D saliency model can well highlight the salient locations. Four commonly used performance indicators are employed to evaluate the VCA metrics: Pearson linear correlation coefficient (PLCC), Spearman rank order correlation coefficient (SRCC), Kendall rank-order correlation coefficient (KRCC), and root mean squared error (RMSE), computed between the objective and subjective scores. Among these criteria, PLCC and RMSE assess prediction accuracy, while SRCC and KRCC assess prediction monotonicity. For a perfect match between the objective and subjective scores, PLCC = SRCC = KRCC = 1 and RMSE = 0. The results of the performance comparison are shown in Table 3. The proposed method performs much better than the others.

87 89 91

5. Conclusions 93 A saliency computational model for stereoscopic 3D images has been proposed by fusing depth perception and visual comfort factors in this paper. Existing 3D saliency models only consider 2D image saliency and depth saliency and neglect the important role of visual comfort in guiding 3D visual saliency computation. We have formulated a comprehensive relationship between visual discomfort and 3D visual attention to quantify the influence of visual discomfort. To address the fact that 3D saliency will embody different manifestations under different sensations of visual discomfort, we train a visual comfort prediction function to distinguish a stereoscopic image as high comfortable stereo viewing (HCSV) or low comfortable stereo viewing (LCSV), and derive different visual comfort based saliency maps for these two cases. As indicated in the experimental results, by fusing the visual comfort based saliency, the proposed model outperforms the state-of-the-art models on predicting human eye fixations and visual comfort assessment.

95 97 99 101 103 105 107 109 111 113

Acknowledgments

115

117 This work was supported by the National Natural Q6 Q7 Science Foundation of China (Grants 61271021, 61271270, 119 U130125), Natural Science Foundation of Zhejiang Province of China (Grant LQ12F01005). It was also sponsored 121 by K.C.Wong Magna Fund in Ningbo University. The authors would like to thank Dr. Yuming Fang for providing 123 comparison results of Ref. [33].

Please cite this article as: Q. Jiang, et al., A depth perception and visual comfort guided computational model for stereoscopic 3D visual saliency, Signal Processing-Image Communication (2015), http://dx.doi.org/10.1016/j. image.2015.04.007i

12

Q. Jiang et al. / Signal Processing: Image Communication ] (]]]]) ]]]–]]]

1

References

3

[1] Z. Li, S. Qin, L. Itti, Visual attention guided bit allocation in video compression, Image and Vision Computing 29 (1) (2011) 1–14. [2] K. Vu, K.A. Hua, W. Tavanapong, Image retrieval based on regions of interest, IEEE Trans. Knowl. Data Eng. 15 (4) (2003) 1045–1049. [3] S. Wan, P. Jin, L. Yue, An approach for image retrieval based on visual saliency, in: Proceedings of the International Conference on Image Analysis and Signal Processing, 2009. [4] Z. Lu, W. Lin, X. Yang, E. Ong, S. Yao, Modeling visual attention's modulatory aftereffects on visual sensitivity and quality evaluation, IEEE Trans. Image Process. 14 (11) (2005) 1928–1942. [5] H. Liu, I. Heynderickx, Visual attention in objective image quality assessment: based on eye-tracking data, IEEE Trans. Circuits Syst. Video Technol. 21 (7) (2011) 971–982. [6] V. Setlur, T. Lechner, M. Nienhaus, Retargeting images and video for preserving information saliency, IEEE Trans. Comput. Graph. Appl. 27 (5) (2007) 80–88. [7] Y. Fang, Z. Chen, W. Lin, C.W. Lin, Saliency detection in the compressed domain for adaptive image retargeting, IEEE Trans. Image Process. 21 (9) (2012) 3888–3901. [8] A. Borji, L. Itti, State-of-the-art in visual attention modeling, IEEE Trans. Pattern Anal. Mach. Intell. 35 (1) (2013) 185–207. [9] G. Deco, J. Zihl, Top-down selective visual attention: a neurodynamical approach, Vis. Cogn. 8 (1) (2001) 118–139. [10] C. Kanan, M.H. Tong, L. Zhang, SUN: top–down saliency using natural statistics, Vis. Cogn. 17 (6-7) (2009) 979–1003. [11] L. Itti, C. Koch, E. Niebur, A model of saliency-based visual attention for rapid scene analysis, IEEE Trans. Pattern Anal.Mach. Intell. 20 (11) (1998) 1254–1259. [12] J. Harel, C. Koch, P. Perona, Graph-based visual saliency, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2006. [13] E. Erdem, A. Erdem, Visual saliency estimation by nonlinearly integrating features using region covariances, J. Vis. 13 (4) (2013) 11. [14] N.D.B. Bruce, J.K. Tsotsos, Saliency based on information maximization, in: Proceedings of the Advances in Neural Information Processing Systems (NIPS), 2005. [15] X. Hou, L. Zhang, Saliency detection: a spectral residual approach, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2007. [16] R. Achanta, S. Hemami, F. Estrada, S. Susstrunk, Frequency-tuned salient region detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009. [17] M. Levine, X. An, H. He, Saliency detection based on frequency and spatial domain analyses, in: Proceedings of the BMVC, 2011. [18] S. Goferman, L. Zelnik-Manor, A. Tal, Context-aware saliency detection, IEEE Trans. Pattern Anal. Mach. Intell. 34 (10) (2012) 1915–1926 . [19] G. Chen, Y. Ding, J. Xiao, T.X. Hang, Detection evolution with multiorder contextual co-occurrence, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013. [20] Z. Liu, X. Zhang, S. Luo, O. Le Meur, Superpixel-based spatiotemporal saliency detection, IEEE Trans. Circuits Syst. Video Technol. 24 (9) (2014) 1522–1540. [21] Z. Liu, W. Zou, O. Le Meur, Saliency tree: a novel saliency detection framework, IEEE Trans. Image Process. 23 (5) (2014) 1937–1952. [22] F. Shao, G. Jiang, M. Yu, K. Chen, Y. Ho, Asymmetric coding of multiview video plus depth based 3D video for view rendering, IEEE Trans. Multimed. 14 (1) (2012) 157–167. [23] F. Shao, W. Lin, S. Gu, G. Jiang, T. 
Srikanthan, Perceptual fullreference quality assessment of stereoscopic images by considering binocular visual characteristics, IEEE Trans. Image Process. 22 (5) (2013) 1940–1953. [24] C. Chamaret, S. Godeffory, P. Lopez, O. Le Meur, Adaptive 3D rendering based on region-of-interest, in: Proceedings of the SPIE 7524, Stereoscopic Displays and Applications XXI, 75240 V, 2000. [25] Q. Huynh-Thu, M. Barkowsky, P. Le Callet, The importance of visual attention in improving the 3DTV viewing experience: overview and new perspectives, IEEE Trans. Broadcast. 57 (2) (2011) 421–431. [26] P. Hanhart, T. Ebrahimi, Subjective evaluation of two stereoscopic imaging systems exploiting visual attention to improve 3D quality of experience, in: Proceedings of the SPIE 9011, Stereoscopic Displays and Applications XXV, 90110D, 2014. [27] N. Ouerhani, H. Hugli, Computing visual attention from scene depth, in: Proceedings of the 15th International Conference on Pattern Recognition (ICPR), 2000.


[28] N.D.B. Bruce, J.K. Tsotsos, An attention framework for stereo vision, in: Proceedings of the 2nd Canadian Conference on Computer and Robot Vision (CRV), 2005.
[29] D. Khaustova, J. Fournier, E. Wyckens, O. Le Meur, How visual attention is modified by disparities and textures changes?, in: Proceedings of the SPIE 8651, Human Vision and Electronic Imaging XVIII, 865115, 2013.
[30] L. Jansen, S. Onat, P. König, Influence of disparity on fixation and saccades in free viewing of natural scenes, J. Vis. 9 (1) (2009) 29.
[31] Y. Niu, Y. Geng, X. Li, F. Liu, Leveraging stereopsis for saliency analysis, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
[32] C. Lang, T. Nguyen, H. Katti, K. Yadati, M. Kankanhalli, S. Yan, Depth matters: influence of depth cues on visual saliency, in: Proceedings of the European Conference on Computer Vision (ECCV), 2012.
[33] J. Wang, M.P. Da Silva, P. Le Callet, V. Ricordel, Computational model of stereoscopic 3D visual saliency, IEEE Trans. Image Process. 22 (6) (2013) 2151–2165.
[34] Y. Fang, J. Wang, M. Narwaria, P. Le Callet, W. Lin, Saliency detection for stereoscopic images, IEEE Trans. Image Process. 23 (6) (2014) 2625–2636.
[35] H. Kim, S. Lee, A.C. Bovik, Saliency prediction on stereoscopic videos, IEEE Trans. Image Process. 23 (4) (2014) 1476–1490.
[36] A. Maki, P. Nordlund, J.-O. Eklundh, A computational model of depth-based attention, in: Proceedings of the 13th International Conference on Pattern Recognition (ICPR), 1996.
[37] D. Khaustova, J. Fournier, E. Wyckens, O. Le Meur, An investigation of visual selection priority of objects with texture and crossed and uncrossed disparities, in: Proceedings of the SPIE 9014, Human Vision and Electronic Imaging XVIII, San Francisco, United States, 2014, pp. 90140D–90140D-13.
[38] J. Gautier, O. Le Meur, A time-dependent saliency model mixing center and depth bias for 2D and 3D viewing conditions, Cogn. Comput. 4 (2) (2012) 141–156.
[39] T. Dittrich, S. Kopf, P. Schaber, B. Guthier, W. Effelsberg, Saliency detection for stereoscopic video, in: Proceedings of the ACM Multimedia Systems Conference, 2013.
[40] D.W. Kim, J.S. Yoo, Y.H. Seo, Qualitative analysis of individual and composite content factors of stereoscopic 3D video causing visual discomfort, Displays 34 (3) (2013) 223–240.
[41] M. Lambooij, M. Fortuin, I. Heynderickx, Visual discomfort and visual fatigue of stereoscopic displays: a review, J. Imaging Sci. Technol. 53 (3) (2009) 030201.
[42] T. Shibata, J. Kim, D.M. Hoffman, M.S. Banks, Visual discomfort with stereo displays: effects of viewing distance and direction of vergence–accommodation conflict, in: Proceedings of the SPIE 7863, Stereoscopic Displays and Applications XXII, 78630P, 2011.
[43] M. SareyKhanie, M. Andersen, B.M. 't Hart, J. Stoll, W. Einhäuser, Integration of eye-tracking methods in visual comfort assessments, in: Proceedings of the CISBAT, 2011, pp. 14–15.
[44] Y. Nojiri, H. Yamanoue, S. Ide, S. Yano, F. Okano, Parallax distribution and visual comfort on stereoscopic HDTV, in: Proceedings of the IBC, 2006, pp. 373–380.
[45] W.J. Tam, F. Speranza, S. Yano, K. Shimono, H. Ono, Stereoscopic 3DTV: visual comfort, IEEE Trans. Broadcast. 57 (2) (2011) 335–346.
[46] M. Lambooij, W.A. IJsselsteijn, I. Heynderickx, Visual discomfort of 3-D TV: assessment methods and modeling, Displays 32 (4) (2011) 209–218.
[47] D. Kim, K. Sohn, Visual fatigue prediction for stereoscopic image, IEEE Trans. Circuits Syst. Video Technol. 21 (2) (2011) 231–236.
[48] Y. Jung, H. Sohn, S. Lee, H.W. Park, Y.M. Ro, Predicting visual discomfort of stereoscopic images using human attention model, IEEE Trans. Circuits Syst. Video Technol. 23 (12) (2013) 2077–2082.
[49] Y. Jung, S. Lee, H. Sohn, H.W. Park, Y.M. Ro, Visual comfort assessment metric based on salient object motion information in stereoscopic video, J. Electron. Imaging 21 (1) (2011) 011008.
[50] S. Lee, Y.J. Jung, H. Sohn, Y.M. Ro, Subjective assessment of visual discomfort induced by binocular disparity and stimulus width in stereoscopic image, in: Proceedings of the SPIE 8648, Stereoscopic Displays and Applications XXIV, 86481T, 2013.
[51] H. Sohn, Y. Jung, S. Lee, Y.M. Ro, Attention model-based visual comfort assessment for stereoscopic depth perception, in: Proceedings of the International Conference on Digital Signal Processing (DSP), 2011.
[52] G. Fry, Further experiments on the accommodative convergence relationship, Am. J. Optom. 16 (1939) 325–334.
[53] D.M. Hoffman, A.R. Girshick, K. Akeley, Vergence–accommodation conflicts hinder visual performance and cause visual fatigue, J. Vis. 8 (3) (2008) 33.




[54] I.P. Howard, B.J. Rogers, Seeing in Depth, University of Toronto Press, Toronto, ON, Canada, 2002.
[55] T. Shibata, J. Kim, D.M. Hoffman, M.S. Banks, The zone of comfort: predicting visual discomfort with stereo displays, J. Vis. 11 (8) (2011) 11.
[56] J. Häkkinen, T. Kawai, J. Takatalo, R. Mitsuya, G. Nyman, What do people look at when they watch stereoscopic movies?, in: Proceedings of the SPIE 7524, Stereoscopic Displays and Applications XXI, 2010.
[57] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, S. Susstrunk, SLIC superpixels compared to state-of-the-art superpixel methods, IEEE Trans. Pattern Anal. Mach. Intell. 34 (11) (2012) 2274–2282.
[58] Z. Liu, O. Le Meur, S. Luo, Superpixel-based saliency detection, in: Proceedings of the International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS), 2013, pp. 1–4.
[59] M.M. Cheng, G.X. Zhang, N.J. Mitra, X. Huang, S.M. Hu, Global contrast based salient region detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.


[60] W. Förstner, B. Moonen, A metric for covariance matrices, in: Geodesy: The Challenge of the 3rd Millennium, Springer, Berlin Heidelberg, 2003, pp. 299–309.
[61] W.B. Yang, Y.Y. Tang, B. Fang, Z.W. Shang, Y.W. Lin, Visual saliency detection with center shift, Neurocomputing 103 (2013) 63–74.
[62] K. Fu, C. Gong, J. Yang, Y. Zhou, I. Yu-Hua Gu, Superpixel based color contrast and color distribution driven salient object detection, Signal Process.: Image Commun. 28 (10) (2013) 1448–1463.
[63] H. Sohn, Y. Jung, S. Lee, Y.M. Ro, IVY Lab Stereoscopic Image Database, Korea Advanced Institute of Science and Technology [Online], 〈http://ivylab.kaist.ac.kr/demo/3DVCA/3DVCA.htm〉, 2013.
[64] X.H. Li, H.C. Lu, L.H. Zhang, X. Ruan, M.H. Yang, Saliency detection via dense and sparse reconstruction, in: Proceedings of the International Conference on Computer Vision (ICCV), 2013.
[65] A. Borji, D.N. Sihite, L. Itti, Quantitative analysis of human-model agreement in visual saliency modeling: a comparative study, IEEE Trans. Image Process. 22 (1) (2013) 55–69.

