Perceived depth quality - preserving visual comfort improvement method for stereoscopic 3D images


Signal Processing 169 (2020) 107374


Hongwei Ying a,c, Mei Yu a,b, Gangyi Jiang a,b,∗, Zongju Peng a, Fen Chen a

a Faculty of Information Science and Engineering, Ningbo University, Ningbo 315211, China
b National Key Lab of Software New Technology, Nanjing University, Nanjing 210093, China
c College of Electronic and Information Engineering, Ningbo University of Technology, Ningbo 315211, China

Article info

Article history: Received 7 August 2019; Revised 7 November 2019; Accepted 11 November 2019; Available online 23 November 2019

Keywords: Stereoscopic image; Stereoscopic three-dimensional (S3D) display; Visual comfort improvement; Perceived depth quality; 3D visual satisfaction

Abstract

The improvement of visual comfort (VC) of stereoscopic three-dimensional (S3D) images is often accompanied by a decline in perceived depth (PD) quality, which involves three senses: presence, power, and realism. To address this problem, this paper proposes a novel PD quality-preserving VC improvement method for S3D images. First, an overall visual experience index termed 3D visual satisfaction, which accounts for both VC and PD quality, is defined. Then, a new viewing distance nonlinear shifting (VDNS) scheme is developed to improve the VC of S3D images, and a VDNS-based rendering method is proposed to generate new S3D images. A subjective assessment experiment is conducted on an S3D image database consisting of discomfort S3D images and their VDNS-based rendered images to obtain the corresponding ground-truth 3D visual satisfaction scores. Based on the labeled dataset, an objective 3D visual satisfaction assessment model, which integrates VC and PD quality and is denoted as the VCPD model, is presented. Using the VCPD model as guidance, VDNS improves the VC and 3D visual satisfaction of S3D images in a stepwise manner without introducing geometric proportional distortion. As a result, the adjusted S3D image can provide better 3D visual satisfaction to viewers, i.e., improved VC with preserved PD quality. Experimental results show that the proposed method achieves better comprehensive performance in improving both VC and PD quality than other relevant methods, as it provides the optimal 3D visual satisfaction.

© 2019 Elsevier B.V. All rights reserved.

1. Introduction

Stereoscopic three-dimensional (S3D) displays can provide an enhanced viewing experience to viewers, resulting in increasing demand for S3D content generation, processing, and related services [1,2]. However, visual fatigue and discomfort can be induced while watching inappropriate stereoscopic visual contents on S3D displays [3,4]. Visual fatigue and discomfort reflect the adverse physiological and psychological reactions caused by S3D perception [5], which degrade the viewers' quality of visual experience [6,7]. Therefore, visual discomfort has become one of the important issues in visual health [8,9]. Many objective assessment methods have been proposed to measure the visual discomfort of S3D images [2,8,10,11,12] or S3D videos [13,14]. Several factors induce visual discomfort when viewing S3D images: accommodation–vergence conflict (AV conflict), binocular

∗ Corresponding author at: Faculty of Information Science and Engineering, Ningbo University, Ningbo 315211, China. E-mail addresses: [email protected] (M. Yu), [email protected] (G. Jiang).

https://doi.org/10.1016/j.sigpro.2019.107374 0165-1684/© 2019 Elsevier B.V. All rights reserved.

fusion limit, window violation, and binocular mismatches [15]. Binocular disparity is crucial for depth perception [5], but excessive binocular disparity aggravates the AV conflict and leads to failure of binocular fusion, which is a major cause of visual discomfort [16]. Therefore, methods have been proposed to improve the visual comfort (VC) of S3D images by adjusting disparity features such as the disparity magnitude, disparity range, or relative disparity [17–20], in order to achieve binocular fusion. Oh et al. [17] proposed a VC improvement method via an optimization process whereby a predictive indicator of VC was minimized, while still aiming to maintain the viewer's sense of presence by performing a suitable disparity shift and directed blurring of the S3D image. Sohn et al. [18] first performed linear disparity remapping to address visual discomfort induced by excessive disparities; this linear remapping changed the disparities of the scene to obtain an overall target disparity range. Then, a nonlinear disparity remapping process selectively adjusted the disparity of problematic local disparity ranges according to their contribution to the visual discomfort.


Jung et al. [19] proposed an automatic control scheme of disparity scaling and shifting to reduce visual discomfort on S3D displays, in which a stepwise linear search with a predetermined step size was used for disparity scaling and disparity shifting until VC reached a predetermined target. Jung et al. [20] used a weighted disparity map to describe the saliency, sensitivity, and discomfort of S3D images; pixels with larger values in the weighted disparity map were compressed more strongly by a sigmoid function. This method is equivalent to nonlinear scaling and shifting of the original disparity map. In fact, subjective assessment of S3D images is not limited to VC. Urvoy et al. [21] and Zhou et al. [22] indicated that besides visual fatigue and discomfort, visual quality and depth quality are also main axes of the viewers' 3D quality of experience. Although extremely large disparity is one of the main factors leading to visual discomfort, disparity is also the main source of depth perception. It should be noted that VC improvement does not guarantee that 3D visual satisfaction will improve; excessive disparity adjustment, for example, may even reduce 3D visual satisfaction. Thus, there is a trade-off between VC improvement and perceived depth (PD) quality degradation when disparity adjustment is performed. In summary, previous studies paid little attention to considering VC and PD quality simultaneously, and some objective assessment models for VC improvement did not include the PD quality. Therefore, this paper defines 3D visual satisfaction as comprising VC and PD quality. Compared with previous studies, this paper considers the effect of VC improvement on PD quality and tries to find the optimal trade-off between the increase in VC and the decline in PD quality to optimize the overall 3D visual satisfaction of S3D images. This paper proposes a new VC improvement method for S3D images integrated with PD quality. The main contributions of this paper are as follows:

(1) Different from the disparity scaling or shifting methods used to improve VC in previous papers, this paper presents another way to improve VC by shifting the viewing distance. Viewing distance nonlinear shifting (VDNS) is proposed to adjust the VC of S3D images, and a VDNS-based rendering method is proposed to generate new S3D images with improved VC.

(2) The concept of 3D visual satisfaction for S3D images is defined, and a subjective assessment experiment regarding the 3D visual satisfaction of S3D images is designed and carried out. In the experiment, data augmentation of the S3D image database is realized by adding VDNS-based rendered S3D images to the sample set. Based on the experimental results, an objective assessment model of 3D visual satisfaction, which integrates VC and PD quality and is denoted as the VCPD model, is established.

(3) Under the guidance of the VCPD model, a method of using VDNS to improve the VC and 3D visual satisfaction of an S3D image in a stepwise manner is proposed; this method avoids geometric proportional distortion after rendering. As a result, the 3D visual satisfaction of the S3D image is significantly improved.

The rest of this paper is organized as follows. Section 2 describes the basic methods for VC improvement of S3D images and the possible side effects such as the decline in PD quality. Section 3 discusses the proposed method for VC improvement with PD quality. In Section 4, the proposed method is compared with other VC improvement methods in many aspects, such as the effects of VC improvement and 3D visual satisfaction improvement. Conclusions are presented in Section 5.

Fig. 1. Sketch showing the disparity scaling way.

2. Motivation

It is believed that VC can be directly improved by adjusting the disparity magnitude, disparity range, or relative disparity of S3D images. In general, these adjustment methods can be summarized into two ways: disparity scaling and disparity shifting.

The disparity scaling way compresses the disparity in a linear or nonlinear manner so that the scene can be located as far as possible within Panum's fusion area [23]. The disparity scaling way is illustrated with the ping-pong table scene of Fig. 1. The two pink vertical sections represent the planes with angular disparities equal to −1° and +1°, respectively. The area between the two sections is Panum's fusional area. The gray vertical section represents the plane with angular disparity equal to 0°, i.e. the display screen plane. Both ends of the ping-pong table are located outside Panum's fusion area. As long as the scene is scaled down sufficiently, the disparity scaling way can bring the angular disparity of the whole scene into the range of ±1°, so it can basically eliminate viewing discomfort. The shortcoming of the disparity scaling way is that the relative distance between objects in the depth direction decreases as the scene is compressed, which has a negative impact on depth perception. First, it reduces the relative perceived distance in the depth direction and the sense of power. Second, it increases the viewing distance between the eyes and the scene and weakens the sense of presence. Third, the geometric proportions of objects are changed because the objects in the scene are compressed in the depth direction; hence, the visual information conveyed by the objects is easily distorted, which decreases the sense of realism. Therefore, the disparity scaling way improves VC but may reduce PD quality, so the 3D visual satisfaction of viewers may not be improved.

The disparity shifting way adds a constant variation to the disparity of an S3D image, which moves the scene in the depth direction and makes the scene fall within Panum's fusion area as far as possible. As shown in Fig. 2(a), the ping-pong table is located outside Panum's fusion area, thus inducing visual discomfort. After disparity shifting, the ping-pong table is located entirely inside Panum's fusion area and visual discomfort is eliminated, as shown in Fig. 2(b). The advantage of the disparity shifting way is that the disparity range of the scene is not changed; therefore, it can improve the VC of the scene while maintaining the sense of power, which is beneficial for improving the viewers' 3D visual satisfaction. However, when the disparity range of the scene is too large, the disparity shifting way cannot make the whole scene appear in Panum's fusion area. As shown in Fig. 3, when the depth of the ping-pong table is large, the foreground and background cannot fall into Panum's fusion area together, so the VC improvement is limited. In addition, owing to the nonlinear relationship between disparity and depth, the disparity shifting way may also change the shape of objects in the depth direction, which changes the geometric proportions of the objects; thus, the sense of realism is reduced.
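As a minimal sketch of these two families of adjustments (assuming a per-pixel disparity map stored as a NumPy array; the names `scale_disparity`, `shift_disparity`, `s`, and `h` are illustrative, not the notation of any method cited above):

```python
import numpy as np

def scale_disparity(disparity: np.ndarray, s: float) -> np.ndarray:
    """Disparity scaling: compress the disparity range by a factor 0 < s <= 1.

    Scaling pulls every point toward the zero-disparity (screen) plane,
    shrinking the relative depth between objects."""
    return s * disparity

def shift_disparity(disparity: np.ndarray, h: float) -> np.ndarray:
    """Disparity shifting: add a constant offset h (in pixels).

    Shifting moves the whole scene along the depth axis while leaving
    its disparity range, and hence the sense of power, unchanged."""
    return disparity + h
```

Combining both, d′ = s·d + h, corresponds to the scale-and-shift control described for the method in [19].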


Fig. 2. Sketch showing the disparity shifting way (in case of complete comfort).

Fig. 3. Sketch showing the disparity shifting way (in case of incomplete comfort).

The PD quality can be assessed through different attributes such as the sense of realism, power, and presence [21]. It is one of the attributes of the 3D visual experience and reflects the quality of the 3D visual effect [24,25]. According to the above analysis, there may be a reduction in PD quality when the disparity scaling or shifting way is used to improve the VC of an S3D image. For visual discomfort induced by excessive disparity magnitude, reducing disparity may increase the viewing distance between the eyes and the scene (for example, for visual discomfort induced by crossed disparity); this is beneficial for VC improvement. However, viewers generally prefer to watch closer sceneries [26]; increasing the viewing distance weakens the immersive feeling and reduces the sense of presence [27], so it is uncertain whether 3D visual satisfaction will be improved. Reducing the disparity range can also improve VC but results in a decrease in the sense of power; the scene might appear bland and unappealing [5], and the sense of presence also decreases [28]. In addition, changing the viewing distance and the depth of the scene may also lead to distortions in the geometric proportions of objects in the scene, such as the puppet-theater effect or cardboard effect; this has a very big impact on perception quality [29] and can reduce the sense of realism and naturalness of the scene [30]. Therefore, improving VC excessively may lead to a significant decline in the corresponding PD quality, thereby reducing the 3D visual satisfaction of S3D contents.

In summary, while improving the VC of S3D images, 3D visual satisfaction should also be improved. Therefore, VC improvement needs to be guided by a VC assessment method combined with the factors influencing PD quality. However, many current VC assessment methods do not integrate the factors pertaining to PD quality and cannot be used to improve 3D visual satisfaction. Therefore, it is necessary to develop an objective assessment model for 3D visual satisfaction which integrates VC and PD quality. 3D visual satisfaction reflects the human overall visual experience of S3D images, which includes not only VC but also PD quality. To obtain the optimum 3D visual satisfaction when viewing S3D images, this study analyzes the relationship between VC, PD quality,

and subjective 3D visual satisfaction; then an objective assessment model for 3D visual satisfaction is established by considering both VC and PD quality (called the VCPD model). This study also proposes a new VC improvement method based on viewing distance nonlinear shifting of the scene under the guidance of the VCPD model.

3. Proposed method for VC improvement of S3D images with PD quality

According to the above analyses, the disparity scaling and disparity shifting ways may improve VC, but they may also reduce PD quality at the same time, resulting in a decline in overall visual satisfaction. In order to make the S3D image after VC improvement more in line with human visual satisfaction, the process of VC improvement for an S3D image should be guided by a 3D visual satisfaction assessment model which integrates VC and PD quality. In this section, we propose a new method for VC improvement of S3D images with PD quality and a 3D visual satisfaction assessment model. First, we describe the overall framework of the proposed VC improvement method, then present the viewing distance nonlinear shifting (VDNS) method used for VC improvement and the rendering method for generating the new left and right views in Section 3.1. The advantage of VDNS is that the improved S3D image maintains the geometric scale of the scene and does not lose the sense of realism. In Section 3.2, we propose an objective assessment model of 3D visual satisfaction, i.e. the VCPD model, based on a subjective experiment of 3D visual satisfaction. Section 3.3 describes VC improvement in a stepwise manner under the guidance of the VCPD model.

Fig. 4 shows the overall framework of the proposed VC improvement method based on the VCPD model; it includes two main parts, the training phase for VCPD modeling and the testing phase for VC improvement guided by the VCPD model. In Fig. 4, the training phase represents the VCPD modeling process.


Fig. 4. Framework of the proposed VC improvement method based on the VCPD model.

Fig. 5. Two types of viewing distance shifting sketches: VDLS and VDNS.

First, each S3D image in the training set of discomfort S3D images is improved for VC and rendered j times with incremental shifting viewing distances ΔZ_j based on VDNS. Then, three objective assessment models, i.e. VCA, PRE, and POW, are used to calculate the objective assessment scores of VC, sense of presence, and sense of power for the rendered S3D images. The subjective mean opinion scores (MOSs) of 3D visual satisfaction of the rendered S3D images are obtained through subjective experiments. Finally, the support vector regression (SVR) method is used to fit the relationship between the subjective MOSs and the three objective assessment scores, namely Q_VCA, Q_PRE, and Q_POW; thus the VCPD model is established. The testing phase of Fig. 4 represents the procedure of VC improvement. For a discomfort S3D image, the viewing distance is increased in a stepwise manner to improve its VC, and the objective assessment scores of the VCPD and VCA models are calculated at each step. When the VCPD score reaches its maximum and the VCA score is better than before, it is inferred that the VC is improved just right.

3.1. The scheme of viewing distance nonlinear shifting (VDNS)

Different from the disparity scaling or shifting methods used to improve VC in previous papers, this paper presents another way to improve VC by shifting the viewing distance. Fig. 5(a) shows a scheme of viewing distance linear shifting (VDLS). A scene ABCD has a high crossed disparity, which leads to visual discomfort. Therefore, the scene needs to be moved away from the eyes to reduce this crossed disparity. In Fig. 5(a), the scene ABCD is linearly shifted to A′B′C′D′. After shifting, the size and shape of the whole scene do not change at all, so the sense of realism remains unchanged.

However, the viewing distance has been changed; thus, the VC, sense of power, and sense of presence are also changed. According to the law of perspective geometry, an object looks big when near and small when far; therefore, when the scene is far away, the angle of view becomes smaller and the scene looks smaller. The projection points of A and B on the screen are S_A and S_B, respectively, and the projection points of A′ and B′ on the screen are S_A′ and S_B′, respectively. As observed in Fig. 5(a), a black band appears between S_A and S_A′. Similarly, black bands will appear all around the screen. To make full use of the screen area after performing VDLS as shown in Fig. 5(a), both the left and right views of the S3D image have to be zoomed in to avoid black bands around the screen. Therefore, we propose VDNS to replace VDLS, i.e. to shift all depth planes nonlinearly, as shown in Fig. 5(b). VDNS keeps the angle of view and the ratios of width to depth and height to depth unchanged after the scene ABCD is moved to A′B′C′D′. In addition, there is no black band around the screen: the projection of A coincides with that of A′ at S_A on the screen, and the projection of B coincides with that of B′ at S_B on the screen. Since the depth, width, and height of the scene increase and the proportions of the scene remain unchanged after VDNS, VDNS does not affect the sense of realism. At the same time, the viewing distance between the eyes and the scene increases from Z_A to Z_A′, the crossed disparity decreases, the sense of presence decreases, the depth range of the scene expands from AB to A′B′, and the sense of power increases. Reducing excessive crossed disparity can improve VC, which is beneficial for achieving 3D visual satisfaction, but reducing the sense of presence can reduce 3D visual satisfaction.



On the one hand, enhancing the sense of power is beneficial to improving 3D visual satisfaction. On the other hand, the background part of the scene may be shifted into the region with an angular disparity greater than +1°, which is outside Panum's fusion area; this decreases both VC and 3D visual satisfaction. Therefore, there should be a constraint for achieving the desired VC and PD quality, and it is necessary to find the optimal shifting distance in VDNS for the best 3D visual satisfaction. This study uses VDNS to improve the VC of S3D images, which is better than VDLS. In Fig. 5(b), the horizontal viewing distances between the eyes and points A and B are Z_A and Z_B, respectively, before scene shifting. After A and B are shifted to A′ and B′, the horizontal viewing distances from the eyes to A′ and B′ are Z_A′ and Z_B′, respectively. The geometric relationships of the viewing distances are expressed by

$$\Delta Z_A = Z_{A'} - Z_A, \quad \Delta Z_B = Z_{B'} - Z_B \tag{1}$$

$$\frac{\Delta Z_B}{\Delta Z_A} = \frac{Z_B}{Z_A} = \frac{Z_{B'}}{Z_{A'}} \tag{2}$$

$$Z_{B'} = Z_B + \Delta Z_B = Z_B \cdot \left(1 + \frac{\Delta Z_A}{Z_A}\right) \tag{3}$$

Let Z_n be the original viewing distance of the point N, which is the nearest point in the scene, and Z_A be the original viewing distance of an arbitrary point A in the scene. When N is shifted by ΔZ_n in the depth direction, A is shifted to A′ based on VDNS. From Eq. (3), the viewing distance Z_A′ can be calculated by



$$Z_{A'} = Z_A \cdot \left(1 + \frac{\Delta Z_n}{Z_n}\right) \tag{4}$$

Let Z_M^0 be the original viewing distance map of the scene. After performing VDNS, the new viewing distance map Z_M can be computed as follows



$$Z_M = Z_M^0 \cdot \left(1 + \frac{\Delta Z_n}{\min\{Z_M^0\}}\right) \tag{5}$$
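As a minimal sketch of Eqs. (4) and (5), assuming the viewing distance map is a NumPy array in millimetres (the names `z0` and `dz_n` are illustrative):

```python
import numpy as np

def vdns_shift(z0: np.ndarray, dz_n: float) -> np.ndarray:
    """Viewing distance nonlinear shifting, Eq. (5).

    Every depth plane is scaled by the common factor 1 + dz_n / min(z0),
    so the ratios Z_B / Z_A of Eq. (2) are preserved and the scene keeps
    its geometric proportions while moving away from the viewer."""
    return z0 * (1.0 + dz_n / z0.min())
```

For instance, with min(z0) = 564 mm and dz_n = 600 mm (the No.32 example in Section 3.3), every viewing distance grows by the factor 1164/564 ≈ 2.06, so the farthest point moves from 680 mm to about 1404 mm.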

Then, Z_M needs to be converted to a screen disparity map for rendering the left and right virtual views of the new S3D image. Fig. 6 shows the relationship between the viewing distance, screen disparity, and pixel column coordinates of the left and right views before and after performing VDNS, in which points L and R represent the left and right view points, respectively, and p_d is the baseline distance. When A is shifted to A′, the screen disparity of A changes from D_A to D_A′; the column coordinate of A in the left view changes from L_A to L_A′, and the difference between L_A′ and L_A is denoted as ΔL_A; the column coordinate of A in the right view changes from R_A to R_A′, and the difference between R_A′ and R_A is denoted as ΔR_A. That is, with T denoting the viewing distance from the eyes to the screen plane,

$$D_A = L_A - R_A, \quad D_{A'} = L_{A'} - R_{A'} \tag{6}$$

$$\Delta L_A = L_{A'} - L_A = -\frac{p_d \cdot T}{2}\left(\frac{1}{Z_A} - \frac{1}{Z_{A'}}\right) \tag{7}$$

$$\Delta R_A = R_{A'} - R_A = \frac{p_d \cdot T}{2}\left(\frac{1}{Z_A} - \frac{1}{Z_{A'}}\right) \tag{8}$$

Fig. 6. Relationship between screen disparity, viewing distance, and pixel column coordinates of left and right views of S3D image.

Eqs. (7) and (8) clearly show the relationship between the differences of column coordinates in the left and right views of S3D image and the variations in the viewing distance when performing VDNS. From Fig. 6, we can also derive equations for the relationship between screen disparity and viewing distance as follows

$$Z_A = \frac{p_d \cdot T}{p_d - D_A} \tag{9}$$

$$Z_{A'} = \frac{p_d \cdot T}{p_d - D_{A'}} \tag{10}$$

By substituting Eqs. (9) and (10) into Eqs. (7) and (8), Eqs. (11) and (12) can be obtained, which indicate the relationship between the differences of the column coordinates in the left and right views and the variation in screen disparity when performing VDNS.

$$\Delta L_A = (D_A - D_{A'})/2 \tag{11}$$

$$\Delta R_A = -(D_A - D_{A'})/2 \tag{12}$$

Let C_L and C_R be the original left and right views, respectively, D_R be the disparity map of the original right view, and ΔZ_n be the shifting distance of the nearest point in the scene. Finally, a VDNS-based rendering method is proposed to generate the new left and right views, as follows (a code sketch is given after this list):

(1) D_R is substituted for the variable D_A in Eq. (9) to obtain the original viewing distance map Z_R of the right view.
(2) Z_R is substituted for the variable Z_M^0 in Eq. (5) to obtain the viewing distance map Z_R′ of the right view after performing VDNS.
(3) Z_R′ is substituted for the variable Z_A′ in Eq. (10) to obtain the disparity map D_R′ of the right view after performing VDNS.
(4) D_R and D_R′ are substituted for the variables D_A and D_A′ in Eqs. (11) and (12) to get the ΔL and ΔR maps, respectively. ΔL represents the differences of the column coordinates between C_L and C_L′ of the left view; ΔR represents the differences of the column coordinates between C_R and C_R′ of the right view. C_L′ and C_R′ denote the rendered left and right views after VDNS. C_R′ is rendered first; the color of each pixel in C_R′ is mapped as C_R′(x,y) = C_R(x + ΔR(x,y), y).
(5) For the holes H_R in C_R′, fill with the corresponding parts of C_L according to disparity mapping, H_R(x,y) = C_L(x + D_R(x,y) + ΔR(x,y), y), and get C_R″ = C_R′ ∪ H_R.
(6) The left view C_L′ is rendered by using C_R″ and D_R′: C_L′(x,y) = C_R″(x − D_R′(x,y), y).
(7) For the holes H_L in C_L′, fill with the corresponding parts of C_L according to disparity mapping, H_L(x,y) = C_L(x + ΔL(x,y), y), and get C_L″ = C_L′ ∪ H_L.
(8) If there are still some small holes in C_R″ or C_L″, the image inpainting technique [31] in OpenCV is used to fill them, so that the new left and right virtual views are obtained.
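The eight steps can be sketched as follows, assuming rectified views, a dense disparity map of the right view in pixels, and known constants p_d and T. Hole filling is reduced here to OpenCV inpainting alone, so this is a rough sketch of the VDNS-based rendering rather than the authors' exact implementation:

```python
import cv2
import numpy as np

def vdns_render(c_l, c_r, d_r, p_d, T, dz_n):
    """Simplified VDNS-based rendering of new left/right views.

    c_l, c_r : original left/right views (H x W x 3, uint8)
    d_r      : screen disparity map of the right view (pixels, float)
    p_d      : baseline distance between the two view points
    T        : viewing distance from the eyes to the screen
    dz_n     : shift of the nearest scene point in the depth direction
    """
    h, w = d_r.shape
    z_r = p_d * T / (p_d - d_r)                # step (1), Eq. (9)
    z_r_new = z_r * (1.0 + dz_n / z_r.min())   # step (2), Eq. (5)
    d_r_new = p_d - p_d * T / z_r_new          # step (3), Eq. (10)
    delta_r = -(d_r - d_r_new) / 2.0           # step (4), Eq. (12)

    xs = np.arange(w)

    # Step (4): map the right view, marking unfilled pixels as holes.
    c_r_new = np.zeros_like(c_r)
    holes_r = np.full((h, w), 255, np.uint8)
    for y in range(h):
        src = np.round(xs + delta_r[y]).astype(int)
        ok = (src >= 0) & (src < w)
        c_r_new[y, xs[ok]] = c_r[y, src[ok]]
        holes_r[y, xs[ok]] = 0
    # Steps (5) and (8), collapsed: fill holes by inpainting [31].
    c_r_new = cv2.inpaint(c_r_new, holes_r, 3, cv2.INPAINT_TELEA)

    # Step (6): render the left view from the new right view.
    c_l_new = np.zeros_like(c_l)
    holes_l = np.full((h, w), 255, np.uint8)
    for y in range(h):
        src = np.round(xs - d_r_new[y]).astype(int)
        ok = (src >= 0) & (src < w)
        c_l_new[y, xs[ok]] = c_r_new[y, src[ok]]
        holes_l[y, xs[ok]] = 0
    # Steps (7) and (8), collapsed: fill holes by inpainting.
    c_l_new = cv2.inpaint(c_l_new, holes_l, 3, cv2.INPAINT_TELEA)
    return c_l_new, c_r_new
```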


Fig. 7. New left and right view rendering process, taking the No.32 S3D image with ΔZ_n = 1000 mm as an example (panels: original left and right views C_L, C_R; uncompensated left and right views after VDNS C_L′, C_R′; full left and right views after VDNS C_L″, C_R″).

Taking the No.32 S3D image in the IVY database [32] as an example, Fig. 7 shows the rendering process of the left and right views, where ΔZ_n = 1000 mm. According to the above steps, as long as ΔZ_n is determined, the corresponding shifted S3D image can be rendered. In the next section, an objective assessment model of 3D visual satisfaction is established to guide the search for the optimal ΔZ_n, so as to render the new S3D image with the optimum 3D visual satisfaction.

3.2. Objective assessment model of 3D visual satisfaction

3D visual satisfaction reflects the human overall visual experience of S3D images, which includes not only VC but also PD quality. The PD quality includes the sense of realism, power, and presence. When performing VDNS for VC improvement, setting an appropriate viewing distance Z_n from the eyes to the nearest part of the scene may improve the VC of S3D images. However, at the same time, it may also lead to a decline in PD quality. Therefore, it is necessary to optimize Z_n according to the objective assessment model of 3D visual satisfaction. Here, an objective assessment model of 3D visual satisfaction (that is, the VCPD model) is used to assess the rendered S3D images, and the S3D image with the highest objective VCPD score is selected as the optimal result of VC improvement. To guide the improvement of VC with PD quality for S3D images, the VCPD model is constructed. Because VDNS maintains the sense of realism, the VCPD model comprises three parts: VC, sense of power, and sense of presence. Through the subjective experiments on 3D visual satisfaction designed in this study, the subjective MOSs of 3D visual satisfaction of the training samples are obtained; these MOSs are then fitted with the VC, sense of power, and sense of presence scores via SVR. Finally, the objective assessment model for 3D visual satisfaction, the VCPD model, is obtained.

3.2.1. Visual comfort assessment

In the VCPD model, the method in [12] is used to assess objective VC. This method extracts two features, namely the mean of saliency-weighted absolute disparity (MSAD) and the mean of saliency-weighted absolute differential disparity (MSADD), from S3D images. Then, SVR is used to obtain the VC assessment (VCA) model, whose output score is denoted as Q_VCA(D_M), where D_M is the screen disparity map representing the set of angular disparities of each point in the scene. According to the relationship between screen disparity and viewing distance given in Eq. (9), D_M can be converted into the viewing distance map Z_M, so Q_VCA(D_M) can also be expressed as Q_VCA(Z_M).

3.2.2. Perceived depth (PD) quality assessment

The perceived distance increases with the true distance but at a reduced and diminishing rate; the relationship between perceived distance and true distance is as shown below [33]:

$$P = \frac{L \cdot Z}{L + Z} \tag{13}$$

where P is the perceived distance, Z is the true distance, and L is a constant equal to 30.48 m. Let Z_n and Z_f be the nearest and farthest true distances from the eye to the scene, respectively; then Z_n = min(Z_M) and Z_f = max(Z_M). From Eq. (13), the nearest perceived distance to the scene, P_n, can be obtained as follows:

$$P_n = \frac{L \cdot Z_n}{L + Z_n} = \frac{L \cdot \min(Z_M)}{L + \min(Z_M)} \tag{14}$$

Similarly, the farthest perceived distance to the scene, Pf , is computed by

$$P_f = \frac{L \cdot Z_f}{L + Z_f} = \frac{L \cdot \max(Z_M)}{L + \max(Z_M)} \tag{15}$$

The sense of presence is closely related to the viewing distance: if the viewing distance is too far, the sense of presence decreases [27]. Assuming that the nearest possible part of the scene has the best sense of presence, with a score of 5, and the farthest part has the worst, with a score of 1, and normalizing the assessment value of the sense of presence to the range of 1 to 5, the objective assessment model for the sense of presence (denoted as PRE) is defined as

$$Q^{PRE}(Z_M) = 5 - \frac{4 \cdot P_n}{L} = \frac{4 \cdot L}{L + \min(Z_M)} + 1 \tag{16}$$

where Q_PRE(Z_M) is the output score of the PRE model. The sense of power can be expressed by the relative perceived distance within the scene.


When VDNS is performed, the relative perceived distance within the scene changes. If the relative perceived distance between the nearest and farthest parts of the scene is the largest, the sense of power is the best; if it is zero, the sense of power is the worst. In addition, the impact of the scene type on the sense of power is also taken into account; for example, the relative perceived distance of indoor scenes is generally much smaller than that of outdoor scenes. Therefore, the sense of power of the current scene should be measured relative to the original scene. Thus, an enhancement coefficient for the sense of power is defined as follows:

$$G(Z_M^0, Z_M) = \frac{P_f - P_n}{P_f^0 - P_n^0} = \frac{(L + \max(Z_M^0))(L + \min(Z_M^0))(\max(Z_M) - \min(Z_M))}{(L + \max(Z_M))(L + \min(Z_M))(\max(Z_M^0) - \min(Z_M^0))} \tag{17}$$

where G(Z_M^0, Z_M) is the enhancement coefficient for the sense of power, Z_M^0 is the original viewing distance map, P_f^0 and P_n^0 are the original farthest and nearest perceived distances, respectively, Z_M is the current viewing distance map after performing VDNS, and P_f and P_n are the current farthest and nearest perceived distances after performing VDNS, respectively. The stronger the sense of power, the higher the value of G(Z_M^0, Z_M). Normalizing G(Z_M^0, Z_M) to the range of 1 to 5, the objective assessment model for the sense of power (denoted as POW) is defined as

$$Q^{POW}(Z_M^0, Z_M) = 5 - 4 \cdot e^{-G(Z_M^0, Z_M)} \tag{18}$$

where Q_POW(Z_M^0, Z_M) is the output score of the POW model. According to the relationship of Z_M^0, ΔZ_n, and Z_M given in Eq. (5), Q_POW(Z_M^0, Z_M) can also be expressed as Q_POW(Z_M^0, ΔZ_n). Next, we construct a subjective experiment for 3D visual satisfaction and fit the MOSs of 3D visual satisfaction with Q_VCA, Q_PRE, and Q_POW to obtain the objective assessment model of 3D visual satisfaction, that is, the VCPD model.
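A minimal sketch of the PRE and POW models (Eqs. (13)–(18)), assuming viewing distance maps as NumPy arrays in metres; the function names are illustrative:

```python
import numpy as np

L = 30.48  # constant of Eq. (13), metres

def perceived(z):
    """Perceived distance for true distance z, Eq. (13)."""
    return L * z / (L + z)

def q_pre(z_m):
    """Sense-of-presence score Q_PRE, Eq. (16)."""
    return 4.0 * L / (L + z_m.min()) + 1.0

def q_pow(z_m0, z_m):
    """Sense-of-power score Q_POW, Eqs. (17) and (18)."""
    g = ((perceived(z_m.max()) - perceived(z_m.min())) /
         (perceived(z_m0.max()) - perceived(z_m0.min())))   # Eq. (17)
    return 5.0 - 4.0 * np.exp(-g)                           # Eq. (18)
```

Note that for an unshifted scene G = 1, so Q_POW = 5 − 4e⁻¹ ≈ 3.53, which is exactly the value reported for every original image in Tables 2 and 3.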

3.2.3. Subjective assessment experiment regarding 3D visual satisfaction

To establish the VCPD model, a subjective experiment is designed to assess the 3D visual satisfaction of S3D images. A Samsung UA65F9000 65-inch Ultra HD 3D-LED TV is used in the subjective experiments. The display has low crosstalk levels (left: 0.38% and right: 0.15%) compared to the visibility threshold of crosstalk reported in another study [34]. The peak luminance of this display is adjusted to 50 cd/m². The viewing distance is three times the height of the screen. A total of 17 testers, 13 males and 4 females, wearing 3D shutter glasses participated in the experiment. All testers met the minimum stereoscopic acuity requirement of less than 60 seconds of arc (sec-arc) and passed a color vision test. A total of 22 S3D images with VC MOSs less than 3.0 in the IVY Lab database were selected as the original test S3D images, denoted as Φ = {φ_i; i ∈ Ω}, where Ω represents the set of numbers of these S3D images, that is, Ω = {2, 28, 29, 30, 32, 33, 35, 39, 46, 47, 49, 50, 51, 52, 53, 55, 70, 73, 74, 101, 102, 103}. The right views of these original test S3D images are shown in Fig. 8. The No.i original test S3D image in Φ, φ_i, is shifted 12 times in the depth direction, and 12 rendered S3D test images {φ′_i,j; j = 1, 2, …, 12} are generated by the rendering method described in Section 3.1. They form the No.i test sequence, denoted as Φ′_i = {φ_i, φ′_i,j; j = 1, 2, …, 12}. All 22 test sequences constitute a set, denoted as Φ′ = {Φ′_i; i ∈ Ω}. The shifting step length S_l is set to 100 mm. All testers rate all test S3D images in all 22 test sequences subjectively. Each test sequence is presented twice; the first time is not rated, while the second time is rated.


Each test S3D image is presented for 10 s, followed by a resting time of 5 s with a mid-gray image. During the resting time of the second presentation, testers are asked to assess the overall VC and PD quality of the S3D image and then rate the degree of 3D visual satisfaction according to the absolute category rating (ACR) test methodology described in ITU-T P.910 [35] and ITU-T P.911 [36]. The degree of 3D visual satisfaction is rated on a scale from 1 to 5, that is, 5 = very good, 4 = good, 3 = middle, 2 = bad, and 1 = very bad. After the subjective assessments, one outlier is eliminated by using the screening methodology recommended in ITU-R BT.500-11 [37]. The final MOS of 3D visual satisfaction for each test S3D image is calculated as the mean of the remaining opinion scores. All these rendered S3D images and their MOSs constitute the S3DI-VDNS database [38].

Fig. 9 shows the subjective MOSs and the objective assessment scores of VCA, PRE, and POW for Φ′_2, Φ′_32, Φ′_53, and Φ′_103, respectively. In each subfigure, the black "O" represents the MOS, the red "∗" represents the VCA score, the blue marker represents the PRE score, and the green marker represents the POW score. Fig. 9(a) shows that for Φ′_2, the best VCA and MOS appear at shifting Steps 7 and 5, respectively. Fig. 9(b) shows that for Φ′_32, the best VCA and MOS appear at shifting Steps 7 and 6, respectively. Fig. 9(c) shows that for Φ′_53, the best VCA and MOS appear at shifting Steps 10 and 6, respectively. Fig. 9(d) shows that for Φ′_103, the best VCA and MOS appear at shifting Steps 8 and 6, respectively. From these observations, it is found that before the VCA score reaches its maximum, the MOS has already reached its maximum and shows a downward trend. This indicates that VCA increases slowly after reaching a certain grade; if VC continues to be improved, it no longer benefits subjective 3D visual satisfaction. At the same time, the negative impact of the PD quality decline on subjective 3D visual satisfaction increases rapidly, ultimately leading to a decline in subjective 3D visual satisfaction. Fig. 10 shows the original S3D image and the rendered S3D images with the best MOS and the best VCA of each test sequence in Φ′_2, Φ′_32, Φ′_53, and Φ′_103. The figure shows that, compared with the original S3D images, the disparity of all rendered S3D images is greatly reduced. It should be noted that the disparity of the rendered S3D image with the best MOS is larger than that of the rendered S3D image with the best VCA; that is, its PD quality is better.

3.2.4. The VCPD model for 3D visual satisfaction

Based on the subjective assessment of 3D visual satisfaction and the objective assessments of VCA, PRE, and POW, we construct an objective assessment model for 3D visual satisfaction, namely the VCPD model, by using the ε-SVR [39]; the histogram intersection function is selected as the kernel function of the ε-SVR. 234 S3D images of eighteen Φ′_i s randomly selected from Φ′ form the training set, while the 52 S3D images of the four remaining Φ′_i s in Φ′ constitute the test set. The feature vector X of each training S3D image comprises the three objective assessment scores from the VCA, PRE, and POW models; the label y is the MOS of 3D visual satisfaction. Let X_j = (Q_j^VCA, Q_j^PRE, Q_j^POW) denote the feature vector, where Q_j^VCA, Q_j^PRE, and Q_j^POW denote the assessment scores output from the VCA, PRE, and POW models, and let y_j = MOS_j denote the label of the jth training S3D image; thus, the training sample set can be expressed as {(X_j, y_j); j = 1, 2, …, J}, where J is the number of training samples.
For the input feature vector X_g = (Q_g^VCA, Q_g^PRE, Q_g^POW) of the gth test S3D image, the output score y_g of the VCPD model is expressed by

$$y_g = Q^{VCPD}(X_g) \tag{19}$$
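The fitting step can be sketched with scikit-learn's ε-SVR; since scikit-learn ships no histogram intersection kernel, one is passed as a custom kernel callable here, which is an implementation assumption rather than the authors' code:

```python
import numpy as np
from sklearn.svm import SVR

def hist_intersection(X, Y):
    """Histogram intersection kernel: k(x, y) = sum_i min(x_i, y_i)."""
    return np.minimum(X[:, None, :], Y[None, :, :]).sum(axis=2)

def fit_vcpd(X: np.ndarray, y: np.ndarray) -> SVR:
    """Fit the VCPD model on feature rows (Q_VCA, Q_PRE, Q_POW) and MOS labels.

    epsilon is a placeholder value; the paper does not report it."""
    return SVR(kernel=hist_intersection, epsilon=0.1).fit(X, y)

# Q_VCPD for a new image: fit_vcpd(X, y).predict([[q_vca, q_pre, q_pow]])
```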

As mentioned in Sections 3.2.1 and 3.2.2, the output score Q_VCA of the VCA model and the output score Q_PRE of the PRE model can be calculated from the viewing distance map Z_M.


Fig. 8. The right views of the selected visual discomfort S3D images in the IVY Lab database.

Fig. 9. MOSs, VCA, PRE, and POW scores of (a) Φ′_2, (b) Φ′_32, (c) Φ′_53, and (d) Φ′_103.

The output score Q_POW of the POW model can be calculated from the original viewing distance map Z_M^0 and the viewing distance map Z_M, and Z_M can be calculated from Z_M^0 and ΔZ_n by Eq. (5). Therefore, the output score y of the VCPD model can be represented as follows:

$$y = Q^{VCPD}(Z_M^0, \Delta Z_n) \tag{20}$$

3.3. Visual comfort adjustment based on VDNS

When the VCPD score Q_VCPD reaches its maximum, the visual satisfaction of the S3D image is the best. At this time, if the current VCA score is better than the initial VCA score, the current nearest shifting distance ΔZ_n can be considered as the optimal nearest shifting viewing distance ΔZ_n^opti, which can be expressed as

$$\Delta Z_n^{opti} = \arg\max_{\Delta Z_n}\left(Q^{VCPD}(Z_M^0, \Delta Z_n)\right) \quad \mathrm{s.t.} \quad Q^{VCA}(Z_M^0, \Delta Z_n^{opti}) > Q^{VCA}(Z_M^0, 0) \tag{21}$$

After obtaining ΔZ_n^opti, the scene is shifted in the depth direction via VDNS according to Eq. (5), the new left and right views are rendered according to the VDNS-based rendering method described in Section 3.1, and the final new S3D image with improved VC is obtained.

In order to compute ΔZ_n^opti in Eq. (21), a stepwise manner is used to shift the scene of the S3D image whose VC is to be adjusted. The two assessment scores output from the VCPD and VCA models are calculated at each step of VDNS. When the VCPD score of the S3D image reaches its maximum and its VCA score is better than its initial VCA score, it is inferred that the VC is improved just right, and the corresponding nearest shifting distance is ΔZ_n^opti. (A code sketch of this search is given below.)

Here, an example is given to determine ΔZ_n^opti for the No.32 S3D image in the IVY database. ΔZ_n increases in a stepwise manner from 0 to 1200 mm with a step length of 100 mm.
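A sketch of this stepwise search (Eq. (21)), treating the VCA and VCPD scorers as black boxes and reusing the Eq. (5) shift from the earlier sketch; the step and range follow the No.32 example:

```python
import numpy as np

def find_optimal_shift(z_m0, q_vca, q_vcpd, step=100.0, max_shift=1200.0):
    """Stepwise search for the optimal nearest shifting distance, Eq. (21).

    z_m0   : original viewing distance map (mm)
    q_vca  : callable, viewing distance map -> VCA score
    q_vcpd : callable, (z_m0, dz_n) -> VCPD score
    """
    vca_initial = q_vca(z_m0)
    best_dz, best_score = 0.0, -np.inf
    for dz in np.arange(step, max_shift + step, step):
        score = q_vcpd(z_m0, dz)
        z_m = z_m0 * (1.0 + dz / z_m0.min())        # Eq. (5)
        # Keep the maximal VCPD score subject to the VCA constraint.
        if score > best_score and q_vca(z_m) > vca_initial:
            best_dz, best_score = dz, score
    return best_dz
```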


Fig. 10. The original S3D images and the rendered S3D images with respect to the best MOS and the best VCA of Φ′_2, Φ′_32, Φ′_53, and Φ′_103.

The corresponding rendered results of the left and right views at each shifting step are shown in Fig. 11. From the original S3D image, it can be found that the foreground obviously has crossed disparity, protruding from the screen. According to Eq. (9), the nearest and farthest viewing distances from the scene to the human eye are 564 mm and 680 mm, respectively. Therefore, the relative depth of the scene is 116 mm; the angular disparity of the nearest part is −4.11° and that of the farthest part is −2.99°. The whole scene is far beyond Panum's fusion area, which results in high visual discomfort; thus, the 3D visual satisfaction is also low. However, at this time, the scene is close to the eyes, and the sense of presence is excellent. With an increase in ΔZ_n, the disparity of the foreground decreases gradually, and the visual comfort and 3D visual satisfaction of the No.32 S3D image improve gradually. When ΔZ_n = 600 mm, the nearest viewing distance is 1164 mm and the farthest viewing distance is 1404 mm; the relative depth is 240 mm, so the sense of presence is decreased and the sense of power is increased. The angular disparity of the nearest part is −0.72° and that of the farthest part is −0.17°. The MOS of 3D visual satisfaction reaches the best score of 4.19, and the output score Q_VCA of the VCA model is computed as 4.39. When ΔZ_n = 700 mm, the nearest viewing distance is 1264 mm and the farthest viewing distance is 1524 mm; the relative depth is 260 mm, so the sense of presence is further decreased and the sense of power is further increased. The angular disparity of the nearest part is −0.46° and that of the farthest part is 0.04°. Q_VCA is computed as 4.52, which is the best, but the subjective MOS of 3D visual satisfaction is 4.06 and starts to decline. So, it can be approximated that ΔZ_n^opti is 600 mm. On continually increasing ΔZ_n, the sense of presence and the sense of power are further decreased and increased, respectively, and both the VC and 3D visual satisfaction tend to decline. When ΔZ_n ≥ 900 mm, the whole scene is behind the screen and has only uncrossed disparity. When ΔZ_n = 600 mm and 1000 mm, the Q_VCA values are 4.39 and 4.41, respectively. It is worth noting that the two scores are very close; however, according to the 3D visual satisfaction MOSs shown in Fig. 9(b), viewers favored the S3D image that is closer to the eyes. Therefore, when improving VC, viewers' preferences for the components that make up the PD quality should also be fully taken into account. The VDNS-based rendering method can

improve the VC of the scene; meanwhile, the sense of presence and the sense of power will also be changed. Therefore, it is necessary to improve VC under the guidance of the VCPD model.

4. Experimental results and discussion

To verify the effectiveness of the proposed method, the performance of the proposed objective assessment model for 3D visual satisfaction is first analyzed and then compared with other VC improvement methods [19,20] based on subjective and objective assessment. Finally, the characteristics and limitations of the proposed method are analyzed, which reveal problems to be further studied in the future. Two subjective assessment experiments are conducted: the first is used to establish the VCPD model, as detailed in Section 3.2.3, and the second is used for comparison with the existing VC improvement methods [19,20], as detailed in Section 4.2.2. All the experimental test images are taken from the IVY Lab database.

4.1. Performance of the VCPD model

We use a cross-validation method to verify the validity of the VCPD model. In each validation, 234 S3D images of eighteen Φ′_i s selected from Φ′ constitute the training set, while the 52 S3D images of the four remaining Φ′_i s in Φ′ make up the test set; hence, a total of 7315 cross validations are required. Some nonlinear factors may be introduced into the subjective satisfaction assessment experiment. To avoid the influence of these factors on the performance of the objective assessment model, we use a five-parameter logistic function to perform nonlinear fitting of the objective assessment scores output from the VCPD model [23] in each validation; then the correlation between the nonlinearly fitted scores and the 3D visual satisfaction MOSs of the test S3D image set is calculated to measure the performance of the VCPD model. Four commonly used performance indicators are used to verify the validity of the VCPD model: the Pearson linear correlation coefficient (PLCC), Spearman rank-order correlation coefficient (SROCC), mean absolute error (MAE), and root mean squared error (RMSE).
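The count of 7315 validations follows from holding out four of the 22 test sequences in every round, i.e. C(22, 4) combinations; a one-line check:

```python
from math import comb

# Four of the 22 test sequences are held out in each validation round.
assert comb(22, 4) == 7315
```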


Fig. 11. Stepwise process of VC improvement based on VDNS for the No.32 S3D image in the IVY database (columns: left views, right views, S3D images; rows: shifting steps 0 (original) to 12, ΔZ_n = 100 mm to 1200 mm in 100 mm increments).


Table 1. The performance of the VCPD model.

| PLCC | SROCC | RMSE | MAE |
|------|-------|------|-----|
| 0.9439 | 0.9349 | 0.3308 | 0.2643 |

Among the four indicators, PLCC is used to measure the assessment correlation, SROCC is utilized to benchmark the assessment monotonicity, and MAE and RMSE are used to measure the assessment accuracy. The PLCC and SROCC scores range from 0 to 1, and larger scores indicate better assessment performance; for MAE and RMSE, smaller scores indicate better assessment performance. The average performance indicators of the 7315 cross-validation experiments are given in Table 1. The results show that the VCPD model has a small error and high consistency with human subjective perception.

4.2. Performance comparison of the VC improvement methods

The proposed VC improvement method is compared with other methods [19,20]. Considering that the study in [19] chose S3D images (Nos.86 and 101) in the IVY Lab database and the study in [20] chose discomfort S3D images (Nos.32, 33, 49, 51, and 53) for experiments, the proposed method uses the same test S3D images to compare subjective and objective VC, 3D visual satisfaction, and other related indicators with the studies in [19,20].

4.2.1. Comparison of objective experimental results

The previous method in [19] needed an artificially set target score for VC improvement; two parameters, the disparity scale value s and the disparity shift value h, are then used to control the degrees of disparity scaling and disparity shifting, respectively, so as to adjust the disparity and gradually improve the VC to the target score. In contrast, the proposed method aims at obtaining the optimal solution of the VCPD model, which integrates objective assessment models for VC and PD quality; it is an automatic optimization process and does not need a preset target score of visual comfort. Therefore, we can only compare the proposed method with the study in [19] on the premise of the same VCA score. First, the proposed method is applied to improve VC, and the optimal score of the VCPD model and the scores of the VCA, PRE, and POW models are obtained. Second, the VCA score obtained by the proposed method is set as the target VC improvement score for the method in [19], and then the PRE, POW, and VCPD scores of the new S3D image generated by the method in [19] are calculated. Finally, the VCPD, PRE, and POW scores of the two methods are compared at the same VCA score.

To measure the geometric proportional distortion in the scene after VC improvement, the consistency parameters Z_σ^V of the viewing distance change rate and P_σ^V of the perceived distance change rate of all points in the scene are defined, respectively. With Z_M^0 representing the viewing distance map of the original scene and Z_M representing the viewing distance map of the scene after VC improvement, the viewing distance change rate map of the scene is defined by

$$Z_M^V(x, y) = Z_M(x, y)/Z_M^0(x, y) - 1 \tag{22}$$

Assuming that there are N points in the viewing distance map, the average viewing distance change rate is defined as

$$Z_\mu^V = \frac{1}{N}\sum_{x,y} Z_M^V(x, y) \tag{23}$$

The consistency of the viewing distance change rate is expressed using the mean square deviation of the viewing distance change rate as follows:

$$Z_\sigma^V = \sqrt{\frac{1}{N}\sum_{x,y}\left(Z_M^V(x, y) - Z_\mu^V\right)^2} \tag{24}$$

Similarly, let P_M^0 be the perceived distance map of the original scene and P_M be the perceived distance map of the scene after VC improvement; then the perceived distance change rate map P_M^V, the average perceived distance change rate P_μ^V, and the consistency of the perceived distance change rate P_σ^V are defined as shown in Eqs. (25)–(27), respectively:

$$P_M^V(x, y) = P_M(x, y)/P_M^0(x, y) - 1 \tag{25}$$

$$P_\mu^V = \frac{1}{N}\sum_{x,y} P_M^V(x, y) \tag{26}$$

$$P_\sigma^V = \sqrt{\frac{1}{N}\sum_{x,y}\left(P_M^V(x, y) - P_\mu^V\right)^2} \tag{27}$$
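A sketch of these consistency metrics, assuming NumPy distance maps and the `perceived` helper from the Section 3.2.2 sketch:

```python
import numpy as np

def change_rate_std(m0: np.ndarray, m: np.ndarray) -> float:
    """Std of the per-pixel change rate map, Eqs. (22)-(24) or (25)-(27).

    Returns 0 when every point changes at the same rate, i.e. when the
    scene keeps its geometric proportions (as VDNS guarantees by Eq. (2)).
    """
    v = m / m0 - 1.0        # change rate map, Eq. (22) / Eq. (25)
    return float(v.std())   # root mean square deviation about the mean

# Z_sigma^V = change_rate_std(z_m0, z_m)
# P_sigma^V = change_rate_std(perceived(z_m0), perceived(z_m))
```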

The values of Z_σ^V and P_σ^V increase with the degree of geometric proportional distortion in the scene. As shown in Fig. 5(b), the scene naturally has no geometric proportional distortion after performing VDNS; according to Eqs. (2) and (22), the values of each point in Z_M^V(x, y) are the same, so Z_σ^V = 0.

The proposed method and the method in [19] are compared on the output scores of the VCA, PRE, POW, and VCPD models and on Z_σ^V by using the S3D images Nos. 86 and 101. The scores of these objective assessment indicators, Q_VCA, Q_PRE, Q_POW, Q_VCPD, Z_σ^V, and P_σ^V, are given in Table 2. When the proposed method is applied to improve the VC of the No.86 S3D image, the output score Q_VCPD of the VCPD model is 4.41 and the output score Q_VCA of the VCA model is 4.37. At this time, the nearest shifting distance is increased by 620 mm (that is, ΔZ_n = 620 mm). When the target Q_VCA of the No.86 S3D image is preset to 4.37 using the method in [19], the disparity scale value is s = 0.4 and the disparity shift value is h = 12 pixels.

Table 2. Objective and subjective comparison of the method in [19] and the proposed method with the S3D images Nos.86 and 101 in the IVY Lab database.

| Image ID | Method | Parameters | Q_VCA | Q_PRE | Q_POW | Z_σ^V | P_σ^V | Q_VCPD | VC | GPD | 3D visual satisfaction |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 86 | Original | – | 3.76 | 4.90 | 3.53 | – | – | 1.32 | 3.25 | – | 1.75 |
| 86 | Method [19] | s = 0.4, h = 12 | 4.37 | 4.84 | 3.26 | 0.179 | 0.170 | 2.61 | 4.13 | 3/16 | 2.94 |
| 86 | Proposed | ΔZ_n = 620 mm | 4.37 | 4.82 | 4.25 | 0.000 | 0.005 | 4.41 | 4.25 | 0/16 | 4.31 |
| 101 | Original | – | 2.65 | 4.91 | 3.53 | – | – | 1.46 | 2.69 | – | 1.81 |
| 101 | Method [19] | s = 0.5, h = 27 | 3.76 | 4.84 | 3.73 | 0.257 | 0.236 | 2.18 | 3.75 | 11/16 | 2.56 |
| 101 | Proposed | ΔZ_n = 640 mm | 3.76 | 4.83 | 4.27 | 0.000 | 0.025 | 4.06 | 3.69 | 0/16 | 4.19 |

Table 3. Objective and subjective comparison of the proposed method and the method in [20] with the S3D images Nos.32, 33, 49, 51, and 53 in the IVY Lab database.

| Image ID | Method | Q_VCA | Q_PRE | Q_POW | Z_σ^V | P_σ^V | Q_VCPD | VC | GPD | 3D visual satisfaction |
|---|---|---|---|---|---|---|---|---|---|---|
| 32 | Original | 2.35 | 4.93 | 3.53 | – | – | 1.13 | 2.13 | – | 1.38 |
| 32 | Method [20] | 4.26 | 4.82 | 3.22 | 0.127 | 0.121 | 2.26 | 4.13 | 3/16 | 2.50 |
| 32 | Proposed | 4.34 | 4.86 | 4.41 | 0.000 | 0.002 | 4.25 | 4.31 | 0/16 | 4.19 |
| 33 | Original | 2.35 | 4.94 | 3.53 | – | – | 1.15 | 2.38 | – | 1.50 |
| 33 | Method [20] | 3.88 | 4.84 | 1.91 | 0.670 | 0.639 | 2.04 | 3.94 | 16/16 | 2.31 |
| 33 | Proposed | 3.91 | 4.87 | 4.40 | 0.000 | 0.019 | 3.84 | 4.06 | 0/16 | 3.75 |
| 49 | Original | 1.35 | 4.93 | 3.53 | – | – | 1.00 | 1.19 | – | 1.06 |
| 49 | Method [20] | 4.01 | 4.83 | 1.44 | 0.565 | 0.539 | 1.96 | 4.13 | 15/16 | 2.19 |
| 49 | Proposed | 3.36 | 4.86 | 4.36 | 0.000 | 0.034 | 3.61 | 3.56 | 0/16 | 3.75 |
| 51 | Original | 1.59 | 4.90 | 3.53 | – | – | 1.06 | 1.44 | – | 1.06 |
| 51 | Method [20] | 4.46 | 4.82 | 1.19 | 0.359 | 0.340 | 2.15 | 4.38 | 15/16 | 2.44 |
| 51 | Proposed | 3.81 | 4.83 | 4.18 | 0.000 | 0.019 | 3.67 | 3.75 | 0/16 | 3.94 |
| 53 | Original | 1.76 | 4.93 | 3.53 | – | – | 1.10 | 1.56 | – | 1.00 |
| 53 | Method [20] | 4.36 | 4.82 | 3.92 | 0.095 | 0.091 | 3.06 | 4.13 | 0/16 | 3.19 |
| 53 | Proposed | 4.36 | 4.84 | 4.59 | 0.000 | 0.002 | 4.22 | 4.31 | 0/16 | 4.38 |

Similarly, when the proposed method is used to improve the VC of the No.101 S3D image, its Q_VCPD is 4.06 and Q_VCA is 3.76, with ΔZ_n = 640 mm. When the preset target Q_VCA of the No.101 S3D image is set to 3.76 using the method in [19], the calculation gives s = 0.5 and h = 27. In Table 2, the method in [19] is only slightly ahead of the proposed method in terms of Q_PRE; however, in terms of both Q_POW and Q_VCPD, the proposed method is better than the method in [19]. To improve VC, the method in [19] substantially compresses the disparity range of the scene, leading to a decrease in the sense of power; this consequently leads to a decrease in 3D visual satisfaction. In addition, Z_σ^V is equal to 0 and P_σ^V is approximately equal to 0 for the proposed method, which shows that the viewing distance and perceived distance of each point in the scene change consistently before and after VC improvement; that is, there is no geometric proportional distortion in the depth direction. However, there is obvious geometric proportional distortion in the depth direction with the method in [19].

The method in [20] used for comparison can adaptively compress the disparity range of the scene in a nonlinear manner and improve VC. The proposed method is compared with the method in [20] on five S3D images in the IVY Lab database, as presented in Table 3. The Q_PRE, Q_POW, and Q_VCPD of the proposed method are all better than those of the method in [20]. For the S3D image No.53, the two methods have the same Q_VCA. The Q_VCA values of the proposed method are higher than those of the method in [20] for the S3D images Nos.32 and 33 but lower for the S3D images Nos.49 and 51. The reason is that the S3D images Nos.49 and 51 are outdoor scenes with large depth ranges, while the proposed method does not compress the scene in the depth direction, in order to avoid geometric distortion in the depth direction; thus, the VC improvement effect is limited. The method in [20] compressed the scene in the depth direction significantly, so its VC is improved more. However, it can also be found that the Z_σ^V and P_σ^V values of the S3D images Nos.49 and 51 are relatively high, while the Q_POW is relatively low for the method in [20].


This indicates that the geometric proportional distortion of the scene in the depth direction is more serious and the sense of power decreases severely; hence, the Q_VCPD of the method in [20] is not as good as that of the proposed method.

4.2.2. Comparison of subjective experimental results

To further verify the performance of the proposed method, we use the S3D images listed in Tables 2 and 3 for further verification by subjective assessment comparison. The subjective assessment is based on ITU-R BT.500-11 and ITU-R 1438 [40]. For the S3D images Nos.86 and 101, the original S3D images, the S3D images processed by the proposed method, and the S3D images processed by the method in [19] are shown in Fig. 12. For the S3D images Nos.32, 33, 49, 51, and 53, the original S3D images, the S3D images processed by the proposed method, and the S3D images processed by the method in [20] are shown in Fig. 13. In the experiments, the original S3D image is displayed first, and then the processed S3D images are displayed in random order. Sixteen testers introduced in Section 3.2.3 participate in the experiments under the same viewing conditions. Each original S3D image and its two processed S3D images are presented three times. The first time is not rated. In the second time, testers are asked to rate the degree of VC and 3D visual satisfaction according to the ACR test methodology described in ITU-T P.910 and ITU-T P.911. The degree of VC is rated on a scale from 1 to 5, i.e. 5 = very comfortable, 4 = comfortable, 3 = mildly comfortable, 2 = uncomfortable, and 1 = extremely uncomfortable. The degree of 3D visual satisfaction is also rated on a scale from 1 to 5: 5 = very good, 4 = good, 3 = middle, 2 = bad, and 1 = very bad. In the third time, testers are asked to state yes/no regarding whether there is geometric proportional distortion (GPD) in each processed 3D image; yes has a score of 1, and no has a score of 0. The final MOS of 3D visual satisfaction for each test S3D image is calculated as the mean of the 16 opinion scores regarding 3D visual satisfaction.


Fig. 12. Comparison of the original S3D images Nos.86 and 101 and those with improved VC: (a) the original S3D images; (b) improved S3D images using the method in [19]; (c) improved S3D images using the proposed method; (d) locals of the No.86 S3D image: the first line shows the original, the second line the result of the method in [19], and the third line the result of the proposed method.

From the subjective assessment data given in Table 2, it can be found that both VC and 3D visual satisfaction increase significantly after applying either the proposed method or the method in [19]. The comparison of 3D visual satisfaction between the two methods is conducted on the premise of the same output score of the objective VCA model; accordingly, the two subjective VC scores after applying the two methods are not significantly different. For the subjective assessment of 3D visual satisfaction, however, the proposed method has an obvious advantage over the method in [19], which is consistent with the objective VCPD assessment. In the subjective assessment of GPD, none of the testers observed GPD in the scenes processed by the proposed method, whereas some testers judged that the geometric proportion of the scenes processed by the method in [19] was not consistent with the original scenes. As shown in Fig. 12(b) and (c), which are obtained by the method in [19] and the proposed method, respectively, both methods significantly reduce the disparity of the S3D images and improve VC. The foreground disparity magnitudes produced by the two methods are roughly equal, so the senses of presence they provide are also approximately equal, as shown by the QPRE values in Table 2. However, according to the local comparisons in Fig. 12(d), the background of the S3D image processed by the method in [19] has lower disparity and a weaker sense of power.
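The effect on the sense of power can be illustrated with a small numerical sketch. Assuming, as the parameters s and h reported above suggest, that the adjustment in [19] behaves like a linear disparity remapping d' = s·d + h, and approximating the proposed scene shifting (purely for illustration; VDNS actually shifts the scene in viewing distance) by a constant disparity offset, the relative disparity between foreground and background, which carries the sense of power, shrinks only under the former. The disparity values below are illustrative:

```python
def linear_remap(d, s, h):
    """Linear disparity remapping d' = s*d + h (scale-and-shift).

    A simplified stand-in for disparity-compression methods;
    a scale s < 1 shrinks all relative disparities.
    """
    return s * d + h

def pure_shift(d, h):
    """Disparity shifting only: relative disparities are preserved."""
    return d + h

# Illustrative disparities in pixels (negative = crossed, in front of the screen).
d_foreground, d_background = -40.0, 20.0

s, h = 0.5, 27.0  # example scale/shift parameters, as reported above
remap_fg, remap_bg = linear_remap(d_foreground, s, h), linear_remap(d_background, s, h)
shift_fg, shift_bg = pure_shift(d_foreground, h), pure_shift(d_background, h)

print("relative disparity, original:   ", d_background - d_foreground)  # 60.0
print("after linear remap (s = 0.5):   ", remap_bg - remap_fg)          # 30.0 -> weaker power
print("after pure shift:               ", shift_bg - shift_fg)          # 60.0 -> power preserved
```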


Fig. 13. Comparison of the original S3D images Nos.32, 33, 49, 51, and 53 and those with improved VC: (a) the original S3D images; (b) the S3D images improved using the method in [20]; (c) the S3D images improved using the proposed method; (d) local regions of the No.53 S3D image (left: original; middle: processed by the method in [20]; right: processed by the proposed method).

The QPOW of the No.86 S3D image obtained by the method in [19] is 3.26, while that achieved by the proposed method is 4.25, which means that the PD quality obtained by the proposed method is better than that of the method in [19]. Because the two methods are compared on the premise of the same VC, the proposed method is therefore better overall, as confirmed by the QVCPD: the QVCPD of the No.86 S3D image obtained by the method in [19] is 2.61, while that obtained by the proposed method is 4.41.

According to the subjective assessment data listed in Table 3, the VC and 3D visual satisfaction of the S3D images are considerably increased by either the proposed method or the method in [20]. In the 3D visual satisfaction assessment of the processed S3D images, the proposed method performs better than the method in [20]. In the VC assessment, the two methods are comparable: the proposed method scores higher for the S3D images Nos.32, 33, and 53, but lower for the outdoor images Nos.49 and 51. In the subjective assessment of GPD, testers found that the geometric proportion of the scenes processed by the method in [20] did not match the original images for the S3D images Nos.32, 33, 49, and 51. For the S3D image No.53, the objective indicators ZσV and PσV reflect a small amount of GPD; however, because the original scene is relatively flat and its depth range is very small, the depth range of each region changes little after processing, and no GPD is observed by the testers. According to Fig. 13(b) and (c), compared with the original S3D images in Fig. 13(a), both the method in [20] and the proposed method significantly reduce the disparity magnitude in the foreground of the S3D images and improve the corresponding VC. However, the new foreground disparity magnitude obtained by the proposed method is larger than that of the method in [20], which is evident from the comparison between Fig. 13(b) and (c) for the S3D images Nos.32, 33, 49, and 51. The same phenomenon can be observed in the local comparison of the two processed No.53 S3D images: the disparity magnitude of the local image processed by the method in [20] is close to zero, so the local image approximates a 2D image, as shown in the middle of Fig. 13(d), while the image processed by the proposed method still retains a certain sense of power, as shown on the right of Fig. 13(d). Comparing the middle and right images of Fig. 13(d) intuitively, the left and right edges of the badge in the right image clearly have a larger disparity magnitude than those in the middle image.
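The near-2D appearance of the badge under aggressive disparity compression can be illustrated with a toy nonlinear mapping. The sketch below is not the saliency-adaptive mapping of [20], whose exact form is not reproduced here; it merely uses a generic tanh-shaped compressor to show how a flat object's disparity spread is squeezed toward zero, while a pure shift leaves the spread intact:

```python
import numpy as np

def nonlinear_compress(d, d_max=10.0, gain=0.05):
    """Generic tanh-shaped nonlinear disparity compression (toy model).

    Maps any disparity into (-d_max, d_max); NOT the actual mapping of [20].
    """
    return d_max * np.tanh(gain * np.asarray(d, dtype=float))

# Illustrative disparities (pixels) across the badge region of a flat object.
badge = np.array([-6.0, -4.0, -2.0, 0.0, 2.0])

compressed = nonlinear_compress(badge)
shifted = badge + 4.0  # a pure shift keeps the disparity spread intact

print("original spread:  ", badge.max() - badge.min())                   # 8.0
print("compressed spread:", round(float(compressed.max() - compressed.min()), 2))
print("shifted spread:   ", shifted.max() - shifted.min())               # 8.0
```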


Analyzing the objective assessment indicators, the QPRE, QPOW, and QVCPD of the No.53 S3D image obtained by the method in [20] are 4.82, 3.92, and 3.06, respectively, while those obtained by the proposed method are 4.84, 4.59, and 4.22, respectively. Therefore, the sense of presence, the sense of power, and the PD quality obtained by the proposed method are all better than those of the method in [20]. Although the method in [20] offers a slightly better improvement of VC than the proposed method for the outdoor S3D images Nos.49 and 51, according to the subjective assessment, the 3D visual satisfaction of the S3D images processed by the proposed method is better than that of the method in [20]. Based on the above analysis of the objective and subjective indicators, the proposed method is superior to the methods in [19,20], and the new S3D images produced by the proposed method can better satisfy testers' requirements.

4.3. Discussions

3D visual satisfaction is a basic assessment criterion for measuring the human visual experience of S3D contents. The VCPD model proposed in this paper reflects the 3D visual satisfaction of S3D images by integrating VC and PD quality. Under the guidance of the VCPD model, a VC improvement method based on VDNS is proposed to significantly improve the VC and 3D visual satisfaction of S3D images, which is confirmed by the objective and subjective assessment indicators given in Tables 2 and 3. The proposed method is better than the methods in [19,20] in terms of 3D visual satisfaction improvement. However, in terms of VC improvement alone, it is not always better than the method in [20]: among the five test S3D images, the VC improvement for the S3D images Nos.49 and 51 is not as good as that provided by the method in [20]. Further analysis shows that the scenes in these two images are outdoors. For uncomfortable areas in the foreground caused by an extremely large crossed disparity, when VDNS is implemented to shift the scene, the disparity magnitude of the foreground can be reduced rapidly and VC can be improved effectively. However, the relative distance between the foreground and the background is further enlarged, and the background area may fall outside Panum's fusional area, where the angular disparity exceeds +1°. Although testers generally pay less attention to the distant background, it may still produce a certain degree of discomfort; thus, it is difficult to achieve complete VC. Therefore, it is worthwhile to further study how a large background area outside Panum's fusional area causes discomfort and how to overcome such discomfort.

Some methods improve the VC of S3D images by reducing the relative disparity between the foreground and background and the overall disparity magnitude through disparity shifting or disparity compression. In particular, for outdoor scenes with a larger depth of field, the degree of VC improvement is very high. However, such methods tend to weaken the sense of power and cause GPD in the scene, which reduces the PD quality of the S3D image and consequently lowers 3D visual satisfaction. The proposed VDNS keeps the geometric proportion of the scene consistent before and after VC improvement and maintains or even enhances the sense of power, which is more beneficial for improving the 3D visual satisfaction of S3D images. Comparing ZσV, PσV, and GPD in Tables 2 and 3, it is also found that when the objective ZσV and PσV are small, the subjective GPD is also small or even zero; this indicates that the human visual system has a certain tolerance to GPD in the scene.
This means that, when shifting the scene backward, the viewing distance of the background can be reduced further than the VDNS-prescribed value without the GPD being perceived. This characteristic can be used to reduce the disparity magnitude of the background, which is beneficial for improving VC without changing the subjective GPD. Thus, the just noticeable GPD deserves further research and can then be applied to VC improvement.
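As a rough numerical illustration of when a shifted background drifts outside the comfortable fusion range discussed above, the following sketch computes the angular disparity of a point from its on-screen disparity under a standard stereoscopic viewing geometry. The perceived-distance formula Zp = eV/(e − d), the 65 mm interocular distance, and the ±1° limit are conventional stereoscopic-viewing assumptions rather than quantities taken from this paper, and the function and variable names are illustrative:

```python
import math

def angular_disparity_deg(d_mm, view_dist_mm, iod_mm=65.0):
    """Angular disparity (degrees) of a point with on-screen disparity d_mm.

    Standard viewing geometry: the perceived distance of the point is
    Zp = e*V / (e - d), with d > 0 uncrossed (behind the screen) and
    d < 0 crossed (in front of it). The angular disparity is the
    difference between the vergence angle on the screen plane and the
    vergence angle at the perceived point.
    """
    e, v = iod_mm, view_dist_mm
    zp = e * v / (e - d_mm)                        # perceived distance
    theta_screen = 2.0 * math.atan(e / (2.0 * v))  # vergence on the screen
    theta_point = 2.0 * math.atan(e / (2.0 * zp))  # vergence at the point
    return math.degrees(theta_screen - theta_point)

V = 640.0  # viewing distance in mm, matching the Z reported above

# Shifting the scene backward increases the (uncrossed) background disparity:
for d in (10.0, 20.0, 30.0):
    eta = angular_disparity_deg(d, V)
    inside = abs(eta) <= 1.0  # conventional +/-1 degree fusion/comfort limit
    print(f"d = {d:5.1f} mm -> angular disparity = {eta:5.2f} deg, comfortable: {inside}")
```

Under these assumptions, an uncrossed screen disparity of about 10 mm at V = 640 mm already approaches the 1° limit, which is consistent with the observation that backgrounds in large-depth outdoor scenes can remain uncomfortable after the foreground is shifted.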


5. Conclusions

With the wide application of stereoscopic three-dimensional (S3D) displays, the visual discomfort caused by S3D visual contents has become an urgent problem. Existing methods for improving the visual comfort (VC) of S3D images tend to push VC to the extreme, but this may cause the perceived depth (PD) quality, including the senses of presence, power, and realism, to decline; this induces viewer dissatisfaction, such as a lack of immersive and stereo feeling, or the perception that objects in the scene are inconsistent or distorted in geometric proportion compared with the real objects. Therefore, VC improvement methods for S3D content should also consider the PD quality so that the 3D visual satisfaction is optimal when watching S3D content.

Based on the experimental data from the subjective assessment of 3D visual satisfaction, we establish an objective 3D visual satisfaction assessment model, integrating VC and PD quality and denoted as the VCPD model, and propose a scene shifting scheme, denoted as viewing distance nonlinear shifting (VDNS), which maintains the geometric proportion of the scene. Under the guidance of the VCPD model and based on the VDNS scheme, the VC of S3D images is automatically improved in a stepwise manner. The VCPD model synthesizes the various factors of VC and PD quality, and it thereby solves the problem of optimizing the trade-off between the increase in VC and the decrease in PD quality. The experimental results show that the proposed method comprehensively optimizes the overall VC and PD quality of S3D images; therefore, its 3D visual satisfaction is also optimal.

According to this study, if the scene has a larger depth of field, the background may be shifted into the visually uncomfortable area when the scene is shifted backward based on VDNS; how to improve the VC of the background at the same time is therefore a topic for future study. To bring the background area closer to the VC area, the shifting distance of the background should be reduced as much as possible while keeping the geometric proportion of the scene consistent; hence, an objective assessment model of the just noticeable geometric proportional distortion should be studied. In addition, the technique of rendering left and right views used in VC improvement may introduce rendering distortion into S3D images. It is therefore also necessary to develop a better left/right view rendering algorithm for VC improvement and to integrate the rendering distortion of S3D images into the assessment of 3D visual satisfaction, which is another significant research topic. Finally, it will be interesting to apply the proposed method to VC improvement in S3D video.

Declaration of Competing Interest

None.

Acknowledgements

This work was supported by the Natural Science Foundation of China under Grant Nos. 61671258, 61871247, 61931022, 61671412 and 61620106012. It was also sponsored by the K.C. Wong Magna Fund of Ningbo University.

References

[1] B. Jiang, J. Yang, Z. Lv, H. Song, Wearable vision assistance system based on binocular sensors for visually impaired users, IEEE Internet Things J. 6 (2) (2019) 1375–1383.
[2] H. Oh, S. Ahn, S. Lee, A.C. Bovik, Deep visual discomfort predictor for stereoscopic 3D images, IEEE Trans. Image Process. 27 (11) (2018) 5420–5432.
[3] Y. Fan, M.C. Larabi, F.A. Cheikh, C. Fernandez-Maloigne, A survey of stereoscopic 3D just noticeable difference models, IEEE Access 7 (2019) 8621–8645.


[4] Q. Jiang, F. Shao, G. Jiang, M. Yu, Z. Peng, Leveraging visual attention and neural activity for stereoscopic 3D visual comfort assessment, Multimed. Tools Appl. 76 (7) (2017) 9405–9425.
[5] K. Terzić, M. Hansard, Methods for reducing visual discomfort in stereoscopic 3D: a review, Signal Process. Image Commun. 47 (2016) 402–416.
[6] H. Xu, G. Jiang, M. Yu, T. Luo, Z. Peng, F. Shao, H. Jiang, 3D visual discomfort predictor based on subjective perceived-constraint sparse representation in 3D display system, Future Generat. Comput. Syst. 83 (2018) 85–94.
[7] B. Kan, Y. Zhao, S. Wang, Objective visual comfort evaluation method based on disparity information and motion for stereoscopic video, Opt. Express 26 (9) (2018) 11418–11437.
[8] H.G. Kim, H. Jeong, H. Lim, Y.M. Ro, Binocular fusion net: deep learning visual comfort assessment for stereoscopic 3D, IEEE Trans. Circuits Syst. Video Technol. 29 (4) (2019) 956–967.
[9] L. Rebenitsch, C. Owen, Review on cybersickness in applications and visual displays, Virtual Real. 20 (2) (2016) 101–125.
[10] H. Sohn, Y. Jung, H.W. Park, Y.M. Ro, Attention model-based visual comfort assessment for stereoscopic depth perception, in: Proceedings of the 17th International Conference on Digital Signal Processing, IEEE, 2011, pp. 1–6.
[11] J. Park, S. Lee, A.C. Bovik, 3D visual discomfort prediction: vergence, foveation, and the physiological optics of accommodation, IEEE J. Sel. Top. Signal Process. 8 (3) (2014) 415–427.
[12] Y.J. Jung, H. Sohn, S. Lee, H.W. Park, Y.M. Ro, Predicting visual discomfort of stereoscopic images using human attention model, IEEE Trans. Circuits Syst. Video Technol. 23 (12) (2013) 2077–2082.
[13] T. Lee, J. Yoon, I. Lee, Motion sickness prediction in stereoscopic videos using 3D convolutional neural networks, IEEE Trans. Vis. Comput. Graph. 25 (5) (2019) 1919–1927.
[14] N. Padmanaban, T. Ruban, V. Sitzmann, A. Norcia, G. Wetzstein, Towards a machine-learning approach for sickness prediction in 360° stereoscopic videos, IEEE Trans. Vis. Comput. Graph. 24 (4) (2018) 1594–1603.
[15] W.J. Tam, F. Speranza, S. Yano, K. Shimono, H. Ono, Stereoscopic 3D-TV: visual comfort, IEEE Trans. Broadcast. 57 (2) (2011) 335–346.
[16] H. Sohn, Y.J. Jung, S. Lee, Y.M. Ro, Predicting visual discomfort using object size and disparity information in stereoscopic images, IEEE Trans. Broadcast. 59 (1) (2013) 28–37.
[17] H. Oh, J. Kim, J. Kim, T. Kim, S. Lee, A.C. Bovik, Enhancement of visual comfort and sense of presence on stereoscopic 3D images, IEEE Trans. Image Process. 26 (8) (2017) 3789–3801.
[18] H. Sohn, Y.J. Jung, S. Lee, F. Speranza, Y.M. Ro, Visual comfort amelioration technique for stereoscopic images: disparity remapping to mitigate global and local discomfort causes, IEEE Trans. Circuits Syst. Video Technol. 24 (5) (2014) 745–758.
[19] Y. Jung, H. Sohn, S. Lee, Y.M. Ro, Visual comfort improvement in stereoscopic 3D displays using perceptually plausible assessment metric of visual comfort, IEEE Trans. Consum. Electron. 60 (1) (2014) 1–9.
[20] C. Jung, L. Cao, H. Liu, J. Kim, Visual comfort enhancement in stereoscopic 3D images using saliency-adaptive nonlinear disparity mapping, Displays 40 (2015) 17–23.

[21] M. Urvoy, M. Barkowsky, P. Le Callet, How visual fatigue and discomfort impact 3D-TV quality of experience: a comprehensive review of technological, psychophysical, and psychological factors, Ann. Telecommun. 68 (11–12) (2013) 641–655.
[22] Y. Zhou, W. Zhou, P. An, Z. Chen, Visual comfort assessment for stereoscopic image retargeting, in: Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), 2018.
[23] Q. Jiang, F. Shao, G. Jiang, M. Yu, Z. Peng, Visual comfort assessment for stereoscopic images based on sparse coding with multi-scale dictionaries, Neurocomputing 252 (2017) 77–86.
[24] C.T.E.R. Hewage, M.G. Martini, Quality of experience for 3D video streaming, IEEE Commun. Mag. 51 (5) (2013) 101–107.
[25] P.J.H. Seuntiëns, Visual Experience of 3D TV, Technische Universiteit Eindhoven, 2006.
[26] H. Kim, S. Lee, A.C. Bovik, Saliency prediction on stereoscopic videos, IEEE Trans. Image Process. 23 (4) (2014) 1476–1490.
[27] H. Oh, S. Lee, Visual presence: viewing geometry visual information of UHD S3D entertainment, IEEE Trans. Image Process. 25 (7) (2016) 3358–3371.
[28] W. IJsselsteijn, H. de Ridder, R. Hamberg, D. Bouwhuis, J. Freeman, Perceived depth and the feeling of presence in 3DTV, Displays 18 (4) (1998) 207–214.
[29] B. Jiang, J. Yang, Q. Meng, B. Li, W. Lu, A deep evaluator for image retargeting quality by geometrical and contextual interaction, IEEE Trans. Cybern. (Early Access) (2018) 1–13.
[30] L. Meesters, W. IJsselsteijn, P. Seuntiëns, A survey of perceptual evaluations and requirements of three-dimensional TV, IEEE Trans. Circuits Syst. Video Technol. 14 (3) (2004) 381–391.
[31] A. Telea, An image inpainting technique based on the fast marching method, J. Graph. Tools 9 (1) (2008) 25–36.
[32] IVY Lab stereoscopic image database [Online]. Available: https://ivylabdb.kaist.ac.kr/base/dataset/data2013.php.
[33] A.S. Gilinsky, Perceived size and distance in visual space, Psychol. Rev. 58 (6) (1951) 460–482.
[34] A. Woods, Understanding crosstalk in stereoscopic displays, in: Keynote Presentation at the Three-Dimensional Systems and Applications Conference, Tokyo, Japan, 2010, pp. 19–21.
[35] ITU-T P.910, Subjective video quality assessment methods for multimedia applications, Recommendation ITU-T P.910, ITU Telecom. Sector of ITU, 1999.
[36] ITU-T P.911, Subjective audiovisual quality assessment methods for multimedia applications, Recommendation ITU-T P.911, ITU Telecom. Sector of ITU, 1999.
[37] ITU-R BT.500-11, Methodology for the subjective assessment of the quality of television pictures, ITU-R BT.500-11, 2002.
[38] NBU-S3DI-VDNS stereoscopic image dataset [Online]. Available: https://github.com/yinghongwei/NBU-S3DI-VDNS.
[39] C.-C. Chang, C.-J. Lin, LIBSVM: a library for support vector machines, ACM Trans. Intell. Syst. Technol. 2 (3) (2011) 1–27.
[40] ITU-R BT.1438, Subjective assessment of stereoscopic television pictures, ITU-R BT.1438, 2000.