Accepted Manuscript

Using Distortion and Asymmetry Determination for Blind Stereoscopic Image Quality Assessment Strategy

Sid Ahmed Fezza, Aladine Chetouani, Mohamed-Chaker Larabi

PII: S1047-3203(17)30172-4
DOI: http://dx.doi.org/10.1016/j.jvcir.2017.08.009
Reference: YJVCI 2047

To appear in: J. Vis. Commun. Image R.

Received Date: 10 March 2017
Revised Date: 20 June 2017
Accepted Date: 21 August 2017

Please cite this article as: S.A. Fezza, A. Chetouani, M.-C. Larabi, Using Distortion and Asymmetry Determination for Blind Stereoscopic Image Quality Assessment Strategy, J. Vis. Commun. Image R. (2017), doi: http://dx.doi.org/10.1016/j.jvcir.2017.08.009
Using Distortion and Asymmetry Determination for Blind Stereoscopic Image Quality Assessment Strategy

Sid Ahmed Fezza^a, Aladine Chetouani^b, Mohamed-Chaker Larabi^c,*

^a National Institute of Telecommunications and ICT, Oran, Algeria
^b PRISME Laboratory, University of Orléans, France
^c XLIM Laboratory, University of Poitiers, France
Abstract

Predicting the perceived quality of stereoscopic 3D images is a challenging task, especially when the stereo-pair is asymmetrically distorted. Despite considerable efforts to address this issue, there is no commonly accepted metric. Most attempts consisted in developing full-reference quality metrics, while very few efforts have been dedicated to blind/no-reference (NR) quality assessment of stereoscopic images. In this paper, we propose a blind/NR quality assessment strategy for stereoscopic images based on the identification of the distortion type, in order to select the most efficient impairment measure, in addition to the determination of whether a stereo-pair is symmetrically or asymmetrically distorted, to account for the binocular fusion properties. The last step combines the two key pieces of information derived from the previous steps and estimates the 3D image quality using appropriate binocular combination strategies. Experimental results on four publicly available 3D image quality assessment databases show that the proposed strategy reaches significant prediction consistency and accuracy when compared to state-of-the-art metrics.
Keywords: Blind/NR image quality assessment, stereoscopic images, distortion classification, asymmetric distortion, weighting strategy
*Corresponding author
Email addresses: [email protected] (Sid Ahmed Fezza), [email protected] (Aladine Chetouani), [email protected] (Mohamed-Chaker Larabi)
Preprint submitted to J. Vis. Commun. Image R., June 20, 2017
1. Introduction

In the last few years, significant changes have been noticed in the field of visual media, with the aim of providing richer content and a more immersive viewing experience. In particular, the field of 3D visual applications providing depth sensation has attracted sustained research interest, and important efforts have been made to bring more realism to consumers. In order to provide the user with depth perception, at least two slightly different views coming from the multimedia processing chain should be provided. This chain is prone to artifacts/distortions affecting the overall 3D quality of experience (QoE). Consequently, a central issue in the success of 3D image/video technologies lies in the ability to predict and assess, in a reliable manner, the quality as perceived by the end user. By means of extensive subjective experiments, some works tackled this challenging issue by investigating the different factors affecting the overall 3D visual experience [1–3]. Obviously, subjective evaluation relying on human observers remains the best way to assess the 3D visual experience. However, this user-centric procedure is not applicable to real-time applications, in addition to being expensive, time consuming and highly dependent on the observer's context. Therefore, computational objective tools represent the best alternative to automatically predict the perceived quality of 3D content in line with human perception. While significant progress has been achieved in 2D image/video quality assessment [4], 3D image quality assessment (3D-IQA) is still an open problem and, to date, no commonly accepted metric ensuring reliable 3D quality evaluation can be found in the literature. This observation is even more true when it comes to asymmetrically distorted stereoscopic content, which constitutes a real challenge for the visual quality research community.
3D perception is closely related to 2D perception, because it relies on two slightly shifted 2D images delivered to the left and right retinas. Nevertheless, the underlying process is much more complex and depends on several other perceptual attributes and cues. Many research efforts are still needed to reach a better understanding of 3D quality perception. Similarly to 2D-IQA metrics, 3D-IQA metrics can be classified into three categories, depending on the availability of the pristine reference image: 1) full-reference (FR), 2) reduced-reference (RR), and 3) blind/no-reference (NR). According to the state-of-the-art of stereoscopic image quality assessment (SIQA) metrics, most of the proposed approaches fall into the FR category. Beyond this, several authors checked the applicability of the rich set of FR 2D-IQA metrics on stereoscopic content [5–8]. As a conclusion, 2D-IQA metrics may provide satisfactory results in the case of symmetric distortions, i.e., when the left and right views are affected
by an equivalent impairment. However, when this condition is not fulfilled, i.e., for asymmetric distortion, the results are relatively poor. Indeed, when such content is presented to an observer, different binocular phenomena may occur, and depending on both the distortion type and the level of asymmetry, the perceived 3D quality may be dominated by the high- or low-quality view. To deal with this issue, several recent works tried to simulate the binocular characteristics by incorporating tuned weighting coefficients for the left and right views [13–25]. These methods demonstrated their efficiency by providing the best results. Despite their usefulness and importance, very few works have been dedicated to NR metrics for the IQA problem in general and SIQA in particular. In NR QA algorithms, the assessment is performed on the processed content without any cue about the reference image. This corresponds to the operating conditions of most real multimedia applications such as broadcasting, videoconferencing or streaming services. Furthermore, the few tentative metrics existing in the literature assume a priori knowledge of the distortion affecting the stereo-pair and build their quality prediction models on it [26–28]. Unfortunately, such information is often unavailable in deployed systems. Based on the aforementioned weaknesses, we propose in this paper a blind quality assessment strategy for stereoscopic images based on both impairment and asymmetry determination. The proposed model is structured in three steps: 1) determination of the distortion type using a machine-learning procedure based on a support vector machine (SVM), where the input features correspond to 2D-IQA scores, 2) determination of symmetry/asymmetry through a feature-matching technique between the views of a stereo-pair, and 3) binocular fusion strategies based on both impairment and asymmetry awareness, in addition to a content-adaptive weighting approach.
The obtained scores are compared to human judgment on different databases. The rest of the paper is organized as follows: Section 2 first reviews the most important attempts at objectively predicting the perceived quality of stereoscopic images, with a particular emphasis on NR 3D-IQA algorithms, followed by a description of some findings on binocular vision that motivated the proposed approach. Section 3 describes each step of the proposed method in detail. The experimental evaluations and the performance analysis are given in Section 4. Finally, Section 5 concludes the paper with a summary of the main findings and some open questions for future work.
2. Related work

2.1. Stereoscopic image quality assessment: state-of-the-art

Initial attempts at SIQA consisted merely in applying state-of-the-art 2D-IQA metrics to each view of the stereo-pair, and combining them into an overall 3D quality score [5–8]. For instance, Campisi et al. [5] evaluated the appropriateness of four 2D-IQA metrics for stereoscopic images, where the obtained 2D image quality scores of the left and right views were combined using three different strategies: average score, dominant-eye score and visual acuity approach. Results showed no performance improvement over the average approach. Yasakethu et al. studied the relationship between subjective scores of 3D content and several 2D image quality metrics [7]. They concluded that some 2D image quality metrics provide a relatively good correlation with perceived 3D image quality. However, it is important to note that this class of methods has been validated using symmetrically distorted stereoscopic images, i.e., when both views have approximately the same level and type of distortion. According to the results obtained in the works summarized above, the exclusive application of 2D-IQA metrics may provide sufficient performance in the case of symmetric distortion. Nonetheless, when the stereoscopic image pair is asymmetrically distorted, this approach yields unsatisfactory results. This can be explained by the fact that binocular vision involves several visual phenomena, such as binocular fusion and binocular rivalry/suppression. These are not considered by such straightforward solutions, which thus cannot provide accurate predictions. This aspect is thoroughly described in the next section (cf. Section 2.2). Considering binocular disparity as an important feature of stereoscopic 3D, it has been combined with 2D-IQA in other 3D-IQA models.
For instance, in [9], a score of disparity distortion, estimated by applying 2D-IQA metrics between the original and distorted disparity maps, is combined with 2D image quality scores to provide an overall evaluation of 3D image quality. You et al. evaluated ten well-known 2D-IQA metrics on stereoscopic image pairs as well as on disparity maps, and combined the two respective scores in different ways [10]. The best results were obtained in this case when using the structural similarity (SSIM) [11] metric on the stereo-pair and the mean absolute difference (MAD) on the disparity map. Although the use of depth/disparity together with 2D-IQA metrics brings some improvement with regard to human judgment, the results are globally poor, and a dependency on the performance of the disparity estimation methods can be observed. Recently, the problem has been addressed differently by attempting to understand and model different properties of binocular perception for SIQA. Bensalma
and Larabi proposed a 3D-IQA method based on the binocular fusion process [13]. They modeled the behavior of simple and complex cells of the human visual system (HVS), and computed the 3D quality score as the difference of the binocular energy generated by the reference and impaired stereo-pairs. In [14], a FR quality metric for stereoscopic images considering the asymmetric property was proposed. Based on a binocular perception model, the authors extended the three components of SSIM (luminance, contrast and structural similarities) to the case of stereoscopic images, and finally combined them to derive an overall 3D quality index. They showed that their extended SSIM performs better than the average SSIM of the left and right views. Besides binocular fusion, the human brain creates from the left and right views a mental image called the cyclopean image [35]. This cyclopean image refers to the view perceived by a virtual eye at the middle position between both eyes. With the aim of mimicking this visual process, several approaches evaluated the perceived 3D quality using this image. For instance, Maalouf and Larabi [12] proposed to form the cyclopean images of both the reference and distorted stereo-pairs. Then, the quality score is computed as an average of the difference between the corresponding cyclopean images and the coherence of the disparity maps. In the same vein, to account for binocular rivalry, the energy of the Gabor filter responses is used in the weighting coefficients of the left and right views when computing the cyclopean image [15]. Once the cyclopean images of the reference and impaired stereo-pairs are computed, a FR 2D-IQA metric is applied to them to predict the perceived quality score. In [16], a FR quality metric based on both cyclopean amplitude and phase was proposed for stereoscopic images.
Local amplitude and phase maps are extracted from the reference and impaired stereo-pairs, and then combined to produce cyclopean amplitude/phase maps. Finally, a pooling stage using linear regression is employed to translate the similarity between both cyclopean maps (amplitude and phase) into a quality index. Shao et al. segmented and categorized the stereoscopic images into three kinds of regions, where each region was evaluated separately by considering the binocular properties [17]. Finally, a pooling stage using linear weighting is applied to combine the scores of the different regions into a single quality score. In order to evaluate the quality of stereoscopic 3D images and simulate the neural mechanisms of binocular vision, Lin and Wu proposed to incorporate binocular integration behaviors into existing 2D-IQA metrics [18]. De Silva et al. conducted a comprehensive set of subjective experiments in which the perceptual quality of compressed (symmetric and asymmetric) stereoscopic videos was deeply analyzed [19]. Based on this subjective study, the authors developed a FR 3D-IQA metric that extracts three types of features, namely structural distortions, blurring artifacts and content complexity, to quantify the effect of compression artifacts. In [20, 21], under
the assumption that binocular rivalry is related to the relative energy of the stereo-pair, Wang et al. used the local energy to simulate the strength of view dominance. They proposed a binocular-rivalry-inspired multi-scale model to predict the perceptual quality of stereoscopic images. Zhang and Chandler proposed a FR 3D-IQA metric extending the MAD algorithm to 3D [22]. Their method consists of two stages: first, the conventional MAD algorithm is applied to the left and right images, and the obtained quality scores are combined based on a block-based contrast measure. In the second stage, the cyclopean image quality degradation is computed using statistical-difference-based features. Finally, the outputs of the two stages are fused to yield an overall quality score. In [24], for monocular perception, the authors proposed to combine visual attention with color visual features, while for binocular interactions, cyclopean images are formed using the gain control model and the difference-of-Gaussian responses of the left and right views. Finally, in order to reflect the nonlinear relationship between monocular perception and binocular interaction, a nonlinear pooling is performed to derive the final quality score. All the metrics mentioned above belong to the FR 3D-IQA category, assuming the availability of pristine stereoscopic images. This represents a serious obstacle for the majority of multimedia applications. Consequently, it becomes highly important to focus on image quality metrics capable of predicting the perceptual quality of stereoscopic images without any cue about the reference stereo-pair. Only a few studies have been dedicated to the development of NR/blind metrics for the SIQA problem. NR 3D-IQA metrics addressing JPEG-compressed stereoscopic images were proposed in [26] and [27]. In [26], Sazzad et al. extracted local features of distortions and disparity from both reference and distorted stereo-pairs.
Then, all the extracted features are combined through a non-linear function with weighting coefficients determined using a logistic regression model based on subjective data. Gu et al. proposed to combine a NR 2D-IQA metric with a nonlinear additive model, an ocular dominance model, and a saliency-based parallax compensation [27]. Ryu and Sohn proposed to compute perceptual blurriness and blockiness of each view of the stereo-pair, and then combine them into an overall quality index using a binocular quality perception model [28]. In [29] and [30], natural scene statistics (NSS) have been exploited in the design of NR IQA metrics for stereoscopic images. For instance, Chen et al. proposed a NR 3D-IQA algorithm that extracts both 2D and 3D features using NSS from the cyclopean image and the disparity map, respectively [29]. Next, the same extracted features are exploited to train a support vector regressor (SVR) used to predict the quality score of the tested stereo-pair. In the same vein, Appina et al. proposed a bivariate generalized Gaussian density model to extract the joint luminance and disparity subband statistics of the stereo-pair used for the design of a NR 3D-IQA metric
[30]. Zhou et al. proposed a blind 3D-IQA metric relying on two important properties of the visual system, namely binocular vision and visual structure [31]. The inter- and intra-pixel binocular quality-predictive features are first extracted from the self-similarity of the binocular rivalry response as well as the binocular orientation selectivity. Next, the extracted features are used as input to an SVR to provide the overall quality score. In [32], a NR 3D-IQA metric is introduced, where the stereoscopic images are first decomposed using a Gabor filter bank, and cyclopean and difference maps representing the binocular characteristics and asymmetric information are generated. Then, the statistical characteristics of the latter are estimated using generalized Gaussian distribution fitting. Finally, an SVR is trained to map the estimated characteristics to subjective scores. In [33], the authors proposed a machine learning (ML)-based trainer for blind quality assessment of stereoscopic images. More specifically, they constructed their ML model by mapping monocular feature encoding to the subjective scores. Finally, the inferred model is applied to the left and right images independently, and the two resulting scores are combined based on binocular features. Shao et al. proposed to develop a binocular guided quality lookup and visual codebook to achieve NR 3D-IQA [34]. The training stage was dedicated to the construction of the phase-tuned quality lookup and phase-tuned visual codebook based on the binocular energy responses. The quality score is finally obtained by a simple pooling process.
2.2. Binocular perceptual properties

A sound understanding of binocular vision properties is essential to design and implement an effective stereoscopic quality assessment model. Therefore, in the following, we review some key phenomena of binocular vision related to the present work. Compared to 2D content, the perceptual quality of stereoscopic content depends on the qualities of both views, and according to the degree of similarity/dissimilarity between their visual content, different binocular phenomena may occur. More specifically, when left and right images with similar content are presented to the viewer, the human brain fuses them into a single visual impression [36]. This phenomenon performed by the HVS is known as binocular fusion. On the other hand, when the two images of the stereo-pair are sufficiently dissimilar (asymmetric), causing a match failure, instead of being merged, the two images enter a kind of competition referred to as the binocular rivalry phenomenon. In this situation, two cases may arise: 1) binocular suppression, where one view dominates the perceived content, or 2) an instability due to the alternation of domination/suppression [37, 38]. In vision-related studies, binocular suppression is considered a particular case of binocular rivalry
[38]. Based on the above, it appears essential to consider these binocular phenomena, i.e., binocular rivalry/suppression, in any stereoscopic quality assessment algorithm in order to ensure an effective prediction, as was the case in [13–15, 17–23]. By means of subjective experiments, several studies [19–21, 38–44] tackled the issue of the perceptual quality of asymmetrically distorted stereoscopic images. For instance, when the binocular stimuli are presented with asymmetric blurriness, i.e., one image of the stereo-pair is blurred (low-quality view) while the second is sharp (high-quality view), all these studies demonstrated that the perceived 3D fusion is relatively close to the sharpest view [19, 38–44]. In other words, in the case of asymmetric blur, binocular perception is dominated by the high-quality view, which can be explained by the fact that sharp details mask the blur of the lower-quality image. On the contrary, when it comes to blockiness distortion, some studies [39–41] reported that the perceptual 3D quality can be calculated as the average between the left and right views, whereas other studies showed that in the case of asymmetric blockiness the binocular quality is mainly determined by the low-quality view [42–44]. This reflects the fact that the perceptual quality of stereoscopic images highly depends on the distortion type [20, 21, 45]. As mentioned in [39], the visual quality of stereoscopic content is dominated by the view that contains more information, which is consistent with the psychophysical findings described in [38]. This relationship between visual information and the perceptual quality of stereoscopic images is well illustrated in [28], where the authors proposed to classify distortions into two groups: information-loss distortion (ILD) and information-additive distortion (IAD).
More specifically, in the case of asymmetry caused by ILD (such as asymmetric blurring), the resulting perceptual quality of the stereo-pair follows the high-quality view. This is because the latter (unaltered) view contains more information than the former (blurred) view, thereby completing its missing information. On the other hand, in the case of blockiness or noise distortion (i.e., IAD), the binocular percept does not follow the high-quality view, because the latter carries a lower amount of information than the blocky or noisy image. In other words, noise and blockiness add new information, thereby preventing them from being suppressed by the other view. Fig. 1 visually illustrates this point by showing anaglyph images asymmetrically impaired using Gaussian blur, JPEG, JPEG 2000, and white noise distortions. One can clearly notice that for Gaussian blur and JPEG 2000 (see Figs. 1(b) and (d)), the artifacts are somewhat attenuated and the resulting perceptual quality can be considered high compared to the other distortions. However, for JPEG and white noise (see Figs. 1(c) and (e)), despite the fusion, the artifacts are still visible and provide
Figure 1: Examples of anaglyph images impaired asymmetrically by Gaussian blur (b), JPEG (c), JPEG 2000 (d) and white noise (e) distortions; (a) shows the original.
low perceptual quality. Given these findings, and according to several other psychophysical studies [37, 38], the view with higher contrast or richer contours, i.e., containing more information, tends to dominate the final 3D percept in the case of binocular rivalry.
2.3. Limitations of existing work and possible improvements

As mentioned previously, the applicability of state-of-the-art 2D-IQA metrics based on the average of the left and right view qualities has been demonstrated for the case of symmetric distortions. However, these approaches are not appropriate when the stereo-pair is asymmetrically distorted. This was expected, given that these methods do not consider any of the properties of binocular perception, such as binocular rivalry/suppression. Most of the recent methods addressing this challenging issue focused on modeling the binocular combination process of asymmetric stimuli by means of weighting coefficients for the left and right views [14–20, 23–25, 28, 29]. In other words, in order to mimic view dominance in the case of binocular rivalry, the weighting coefficients were used to model the stimulus strength. However, handling the symmetric and asymmetric cases in the same fashion may not be efficient, since the former can be modeled by a simple average between both views. Moreover, the use of complicated models for predicting the perceptual quality of symmetric distortions unnecessarily increases the complexity, or even reduces the performance, because of the specific tuning intended for asymmetric distortions. To the best of our knowledge, except for [29], no other study has adopted a stage for determining whether a stereoscopic image pair is symmetrically or asymmetrically distorted before performing the quality assessment. Identifying how the distortion is distributed among views enables adopting an appropriate strategy for predicting quality. In [29], this information is only used as an input feature to the SVR; it is exploited neither in the construction of the cyclopean image nor in addressing each case of distortion distribution differently.
As specified previously, the 3D quality depends on the distortion type, making it driven by the high- or low-quality view, i.e., over- or under-weighting the high-quality view. Despite the important role played by the distortion type in the case of binocular rivalry, none of the previous works from the literature has explicitly taken its determination into account in the prediction algorithm. Moreover, when this information is available at the quality prediction stage, it allows selecting an appropriate metric for the specific type of distortion, which performs better than using a universal quality metric. The few blind 3D-IQA metrics proposed recently in the literature can be divided into two classes: 1) distortion-based metrics [26–28], and 2) machine-learning based metrics [29–34]. The applicability of the former class is limited, since information about the type of distortion is not available in most cases. The learning-based approaches exploit local features extracted from the tested stereo-pair and use them as input to a model (e.g., SVM or neural network). However, the generalization of these local features to other stereo content with different characteristics is questionable. In addition, most of these methods train their models using human subjective scores, introducing a dependency on the data set and test design. Therefore, it is highly desirable to develop a blind 3D-IQA metric that is not limited to a specific type of distortion, does not require learning based on subjective scores and does not exhibit dependency on data sets.
3. Proposed method

To overcome the aforementioned limitations, and based on the psychophysical findings on binocular vision, we propose in this paper a blind quality assessment strategy for stereoscopic images. The proposed framework is designed as a three-stage approach, shown in Fig. 2: 1) identification of the nature of the distortion affecting the stereo-pair, 2) determination of the distortion distribution (symmetric or asymmetric) between views, and 3) according to the type and distribution of the distortion, computation of the 3D image quality using an appropriate binocular combination strategy. Each of these stages is described in detail in the following subsections.
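The three-stage strategy can be outlined as follows. This is only an illustrative sketch of the flow described above; all function names and the weighting rules are hypothetical placeholders, not the authors' implementation:

```python
def assess_stereo_quality(left, right,
                          identify_distortion,   # stage 1: SVM-based classifier
                          is_asymmetric,         # stage 2: feature-matching test
                          metric_for,            # distortion-specific NR 2D metric
                          weights_for):          # binocular weighting rules
    """Return a blind 3D quality estimate for a stereo-pair."""
    distortion = identify_distortion(left, right)   # e.g. 'blur', 'jpeg', ...
    metric = metric_for(distortion)                 # best-suited NR 2D-IQA metric
    q_left, q_right = metric(left), metric(right)

    if not is_asymmetric(left, right):
        # Symmetric distortion: a simple average of both views suffices.
        return 0.5 * (q_left + q_right)

    # Asymmetric distortion: weight the views according to the binocular
    # rivalry behaviour of this distortion type (ILD vs. IAD).
    w_left, w_right = weights_for(distortion, left, right)
    return w_left * q_left + w_right * q_right
```

The key design point is that the symmetric case bypasses the rivalry model entirely, while the asymmetric case uses distortion-aware weights.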
3.1. Distortion type identification

Several works proposed to first identify the distortion contained in the image and then perform distortion-specific IQA [46–49]. For instance, in [46] a machine-learning method was used to select the most salient features from a pool of candidate features. Then, the selected features were linearly combined to form a distortion-aware measure. In [47], the distorted image is classified as belonging to a certain type of distortion, followed by a fusion of three image quality metrics based on the k-nearest-neighbor (k-NN) regression approach. In [48], a
Figure 2: Flowchart of the proposed blind quality assessment strategy.
linear discriminant analysis (LDA) classifier was employed to assign a distortion label to the test image, and accordingly, the best metric for the assigned distortion type was selected. Differently, in [49], natural scene statistics were used to first determine the distortion type; then, the same set of statistics was exploited to assess the quality of the distorted image. In the 2D-IQA works described above, the distortion identification/classification stage was employed to choose the most appropriate features and/or IQA metric for the specific degradation. In this paper, besides allowing the selection of the most adapted quality metric, which performs better than a global metric, the distortion identification also reveals which view will dominate the perceived quality in the case of asymmetry, leading to an effective quantification of the global 3D image quality. The distortion type identification is achieved in this work through a two-stage process: first, the degradation is characterized by extracting features from both views; then, a machine-learning process identifies it. Note that both steps are dataset independent and do not require subjective tests. The two steps are described in the following.
3.1.1. Degradation characterization

The literature on distortion-aware IQA metrics reports a significant number of features. In our case, the aim is to select those avoiding dataset dependency and allowing scalability. From the IQA literature, distortion-specific NR IQA
Table 1: Linear Pearson correlation coefficient obtained using the JPEG NR IQA metric [51] for different distortion types.

  Degradation    Pearson correlation
  Blocking       0.93
  Blur           0.34
  Noise          0.52
  Ringing        0.40
metrics show their inefficiency for distortions other than those for which they have been developed. To illustrate this point, consider Table 1, which shows the performance of a NR IQA metric developed for JPEG-compressed images [51]. From this table, we can clearly see that the assessed metric achieves good performance for the blocking artifact; however, the performance drops considerably for the other distortions. This observation shows that the performance of distortion-specific NR IQA metrics varies with the impairment. Therefore, we propose to turn this limitation (in terms of performance) into a discriminative factor and use the quality indexes produced by various NR 2D-IQA metrics as our features. Specifically, several features (i.e., quality indexes of NR 2D-IQA metrics) are considered for each degradation type (two for blockiness [50, 51], three for blurriness [52–54], one for ringing [55], and two for noise [56, 57]). These NR 2D-IQA metrics [50–57] are summarized in Table 2, in which each metric is defined according to the target distortion, the measured feature and the domain in which the measurement is applied. Once these NR 2D-IQA metrics are applied to both left and right views of the stereo-pair, the resulting quality indexes of the selected set of metrics constitute
Table 2: The list of features considered for the degradation characterization.

Degradation   Reference                  Feature          Domain
Blocking      Bovik et al. [50]          Block            DCT
Blocking      Wang et al. [51]           Block            Frequency
Blur          Marichal et al. [52]       Block            DCT
Blur          Marziliano et al. [53]     Edge             Spatial
Blur          Chetouani et al. [54]      Radial analysis  Frequency
Ringing       Sheikh et al. [55]         NSS              Wavelet
Noise         Van de Ville et al. [56]   Variance         Spatial
Noise         Buckley et al. [57]        Variance         DCT
our feature vector, given by:

\vec{f}_i = \left[ q_i^1, q_i^2, \ldots, q_i^8 \right], \quad i \in \{l, r\} \qquad (1)

where \vec{f}_i represents the feature vector of the left (\vec{f}_l) or right (\vec{f}_r) image, and q denotes a quality index. \vec{f}_l and \vec{f}_r are used as input to a classifier. In the following, we describe how the extracted feature vectors are mapped onto distortion categories.
3.1.2. Distortion classification
The last step in predicting the distortion type is to feed the classifier with the 16 input features (i.e., 8 from each view) and obtain in return a single output: the distortion label. Different classifiers can be used for this purpose. In this work, we opted for the SVM, since it performs well in high-dimensional spaces while offering high classification accuracy and generalization capabilities. For a detailed description of the SVM algorithm, the reader can refer to the tutorial paper in [58]. We use the SVM with the radial basis function (RBF) kernel defined as:

K(x_i, x_j) = \exp\left(-\gamma \, \| x_i - x_j \|^2\right), \quad \gamma > 0 \qquad (2)
where x_i and x_j denote the training vectors. The parameters of the SVM, i.e., C (the regularization parameter) and γ (the RBF kernel parameter), are tuned using cross-validation on the training set (see Section 4.2 in the experimental part for a detailed explanation of the training stage). In addition, we used the LIBSVM package [59] to implement the SVM. Once the SVM has been trained, it is applied to the test set. More specifically, for each stereo-pair of the test set, we extract the feature vector (i.e., 16 quality indexes), which the trained SVM exploits to predict the distortion type of the test stereo-pair. Both the training and testing stages are illustrated in Fig. 3.
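As an illustration, the feature-vector assembly of Eq. (1) and the RBF kernel of Eq. (2) can be sketched as follows. This is a minimal sketch: the function names are hypothetical, and the paper itself relies on LIBSVM for the classifier.

```python
import numpy as np

def stereo_feature_vector(q_left, q_right):
    """Eq. (1): concatenate the 8 NR quality indexes of each view into
    the 16-dimensional input of the distortion classifier."""
    q_left = np.asarray(q_left, dtype=float)
    q_right = np.asarray(q_right, dtype=float)
    assert q_left.shape == (8,) and q_right.shape == (8,)
    return np.concatenate([q_left, q_right])

def rbf_kernel(x_i, x_j, gamma=0.5):
    """Eq. (2): K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2), gamma > 0."""
    diff = np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))
```

In practice, such vectors would be fed to an RBF-kernel SVM with C and γ tuned by cross-validation; scikit-learn's `SVC(kernel='rbf')` would be an equivalent off-the-shelf alternative to LIBSVM.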
3.2. Distortion distribution
One key piece of information for accurately predicting 3D image quality is whether the stereo-pair is symmetrically or asymmetrically distorted. Indeed, when the observer is presented with asymmetric stereoscopic stimuli, binocular rivalry/suppression may occur. As evoked previously, these phenomena are related to the difficulty of the fusion process, where the brain tries to match corresponding regions or points with similar content but at different levels of quality or detail. This dissimilarity makes matching the two views difficult, causing them to rival.
Figure 3: The diagram of the degradation type detection.
Therefore, in order to mimic this visual behavior within our quality assessment framework and to determine whether the input stereo-pair is asymmetrically distorted or not, we apply a feature matching algorithm between both views. According to the matching rate, a decision is taken about the distortion distribution, i.e., symmetric/asymmetric. Over the past years, various algorithms for finding correspondences between images have been proposed in the literature [60, 61]. However, the aim here is only to provide an indicator of the matching proportion between images. Besides, given that one or both views can be highly distorted, we opted for methods relying on local invariant features to achieve the matching reliably, as they are more robust than the dense correspondence methods usually used for disparity estimation. We therefore selected the scale invariant feature transform (SIFT) [62] for the matching, as it is considered one of the most efficient local-feature-based image matching techniques [61]. SIFT detects feature points and extracts invariant descriptors, generating large numbers of accurately matched feature points corresponding to the same parts of the scene, while remaining robust to different types of image noise, illumination variations and geometrical changes [62]. In addition, recent works [63, 64] have demonstrated that the regions around detected feature points, such as SIFT keypoints, strongly attract visual attention. Intuitively, one can assume that the HVS attempts to fuse these regions first. More formally, let us consider a stereo-pair of images, denoted by I_l for the left view and I_r for the right one. SIFT is applied to I_l and I_r to detect the sets of SIFT keypoints K_l and K_r, respectively. Once the SIFT keypoints are detected and their descriptors extracted, they are matched. Then a geometric filtering using
the robust random sample consensus (RANSAC) [65] algorithm is performed to remove outliers. Based on the epipolar constraint and the inlier feature points, RANSAC estimates the fundamental matrix and uses it to eliminate outlier correspondences. After the SIFT-RANSAC stage, each view of the stereo-pair has a set of SIFT keypoints labeled as matched or unmatched. From that, we compute the matching rate between views as follows:

\mathrm{Matching\ Rate} = \frac{\mathrm{card}(M_{l,r})}{\max\left(\mathrm{card}(K_l), \mathrm{card}(K_r)\right)} \qquad (3)

where card(M_{l,r}) represents the cardinality of the set of matched SIFT keypoints between I_l and I_r. Finally, the computed matching rate is compared against a fixed threshold T. Let D_{l,r} be an indicator of whether the stereo-pair is symmetrically or asymmetrically distorted, defined as follows:

D_{l,r} = \begin{cases} 1 & \text{if } \mathrm{Matching\ Rate} > T, \\ 0 & \text{otherwise}, \end{cases} \qquad (4)

where D_{l,r} = 1 (resp. 0) indicates a symmetrically (resp. asymmetrically) distorted stereoscopic pair. Based on extensive experiments on various image databases, the threshold T was set to 35%. In other words, if the proportion of matched SIFT keypoints does not exceed 35% of the detected SIFT keypoints, the stereo-pair is considered asymmetrically distorted; otherwise the distortion is considered symmetric. This aspect is discussed in the experimental section (cf. Section 4.3). To illustrate the correlation between feature matching and symmetric/asymmetric distortions, Fig. 4 shows examples of matching applied to a stereo-pair degraded symmetrically and asymmetrically using five standard distortions, namely: Gaussian blur, JPEG, JPEG 2000, white noise and fast fading. One can easily notice that whatever the impairment, the matching rate is higher for symmetric stereo-pairs (Fig. 4, left) than for asymmetric ones (Fig. 4, right). This example generalizes independently of the content type or asymmetry level, demonstrating that the matching rate can reliably predict the nature of the distortion distribution.
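Once the SIFT + RANSAC stage has produced the keypoint and match counts, the symmetric/asymmetric decision of Eqs. (3) and (4) reduces to a few lines. This is a hedged sketch: the function name and the count-based interface are illustrative, and keypoint detection itself would be done with a SIFT implementation such as OpenCV's.

```python
def distortion_distribution(n_matched, n_kp_left, n_kp_right, threshold=0.35):
    """Eqs. (3)-(4): return (D, rate), where D = 1 denotes a symmetrically
    and D = 0 an asymmetrically distorted stereo-pair, given the number of
    matches surviving RANSAC and the keypoint counts of each view."""
    denom = max(n_kp_left, n_kp_right)
    if denom == 0:
        return 1, 0.0  # degenerate case: no keypoints detected at all
    rate = n_matched / denom
    return (1 if rate > threshold else 0), rate
```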
3.3. Quality estimation The aim of this last step is to combine the respective quality scores of the left and right views into an overall 3D image quality score. However, in contrast to previous works addressing the symmetric and asymmetric cases using a single
Figure 4: Examples of matching applied on symmetrically (left) and asymmetrically (right) distorted stereo-pairs.
model, we tackled the problem differently by adopting several options. Thanks to the information derived from the two previous steps, the 3D perceptual quality is adaptively estimated using an appropriate binocular combination strategy. In the following, we present two ways to reach that goal: the first applies a fixed-weight strategy, while the second relies on an adaptive-weight strategy based on the image entropy.
3.3.1. Fixed-weight strategy
When the stereo-pair is symmetrically or near-symmetrically distorted, the global 3D quality index can be calculated as a simple average of both views' qualities, as follows:

Q^{3D} = \frac{1}{2} \left( Q^{2D}_l + Q^{2D}_r \right) \qquad (5)

where Q^{2D}_l and Q^{2D}_r denote the 2D image quality scores of the left and right views, respectively. Moreover, since the type of distortion has already been identified during the first step, well-performing 2D impairment metrics specifically designed for the identified distortion can be used. To do so, several metrics are employed to evaluate blur [54], blockiness [51], noise [57], and fast fading and ringing distortions [53]. It is important to note that these metrics are the same as those employed in the distortion identification stage. In other words,
Table 3: Values of w for the fusion of left and right quality scores for each distortion type.

Distortion type   High-quality view (w)   Low-quality view (1 − w)
Blur              0.753                   0.247
Ringing           0.73                    0.27
Fast Fading       0.583                   0.417
Noise             0.221                   0.779
Blocking          0.304                   0.696
we reuse the same quality indexes generated in the previous step, thus avoiding redundant computation and maintaining an acceptable complexity. However, when the stereo-pair is asymmetrically distorted, the perceptual 3D image quality can be dominated by either the high- or the low-quality view, depending on the identified type of distortion. Therefore, to mimic the predominance of one view in case of asymmetric distortion, we defined the following fusion model to produce an overall 3D quality score:

Q^{3D} = w \cdot \max\left(Q^{2D}_l, Q^{2D}_r\right) + (1 - w) \cdot \min\left(Q^{2D}_l, Q^{2D}_r\right) \qquad (6)
where w is a weighting coefficient controlling the view dominance to simulate the binocular suppression process. Using a few training stereo-pairs, w was empirically set for each distortion type, as reported in Table 3. From this table, one can clearly notice that when the distortion introduces new information, such as noise or blockiness, the low-quality view dominates the 3D perceived quality, i.e., w takes low values. Conversely, when the distortion causes a loss of information, such as blur or ringing, the high-quality view dominates, i.e., w takes high values. This is consistent with findings from 3D perception reporting that the visual quality of stereoscopic content is dominated by the view containing the highest amount of information [39].
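The fixed-weight combination can be sketched as follows, using the weights of Table 3; this is a minimal sketch in which the dictionary keys and function name are hypothetical.

```python
# Weights of Table 3; w applies to the higher-quality view (Eq. (6)).
W_HIGH = {"blur": 0.753, "ringing": 0.73, "fast_fading": 0.583,
          "noise": 0.221, "blocking": 0.304}

def fuse_fixed(q_left, q_right, distortion, symmetric):
    """Eq. (5) when the pair is symmetrically distorted, Eq. (6) otherwise."""
    if symmetric:
        return 0.5 * (q_left + q_right)
    w = W_HIGH[distortion]
    return w * max(q_left, q_right) + (1.0 - w) * min(q_left, q_right)
```

For an additive distortion such as noise, the low weight on the best view lets the low-quality view dominate the fused score, as Table 3 prescribes.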
3.3.2. Adaptive-weight strategy
According to the psychophysical findings and the previously obtained weights (i.e., the fixed-weight strategy), the 3D quality estimation can be improved by adaptively defining the weighting coefficient of each view based on the inner visual information of the image, so as to better account for the content and to model the strength of view dominance in the binocular fusion stage. Hence, when the stereo-pair is symmetrically distorted, we calculate the 3D image quality in the same way as in (5). In the case of asymmetric distortion, however, the image entropy is used to weight the left and right views as follows:

Q^{3D} = e_l \cdot \max\left(Q^{2D}_l, Q^{2D}_r\right) + e_r \cdot \min\left(Q^{2D}_l, Q^{2D}_r\right), \qquad (7)
Figure 5: Examples of local image entropy maps for (a) the original image, (b) Gaussian noise, and (c) Gaussian blur (top: original and distorted images, bottom: corresponding image entropy maps). Brighter pixels indicate stronger information.
where e_l and e_r are defined respectively as:

e_l = \frac{E_l^2}{E_l^2 + E_r^2} \quad \text{and} \quad e_r = \frac{E_r^2}{E_l^2 + E_r^2}, \qquad (8)
and E_* denotes the image entropy, formulated as follows:

E = -\sum_{i=1}^{n} p(x_i) \cdot \log_2 p(x_i) \qquad (9)
with n the number of bins and p(x) the probability of a bin in the histogram. Thanks to this modeling, the proposed metric can efficiently handle asymmetric distortions: if dissimilar stimuli are presented, their entropies will differ accordingly, and the contribution to 3D quality will be dominated by the view having the higher entropy, as suggested by the psychophysical studies. Moreover, to demonstrate that the image entropy correlates well with the distortion types, Fig. 5 shows the entropy maps of an original image and of the corresponding distorted images obtained by introducing a subtractive distortion (blur) and an additive distortion (Gaussian noise). From this figure, we can clearly see that when Gaussian noise is introduced, the amount of information increases, i.e., entropy values are high, while blurring the image reduces the amount of information, i.e., entropy values are low.
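The adaptive combination of Eqs. (7)–(9) can be sketched with NumPy as follows; this is a simplified illustration assuming 8-bit grey-level images, and the function names are hypothetical.

```python
import numpy as np

def image_entropy(img, n_bins=256):
    """Eq. (9): Shannon entropy of the grey-level histogram (8-bit range)."""
    hist, _ = np.histogram(img, bins=n_bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]  # empty bins contribute nothing to the sum
    return -np.sum(p * np.log2(p))

def fuse_adaptive(q_left, q_right, img_left, img_right):
    """Eqs. (7)-(8): entropy-weighted combination for asymmetric pairs."""
    el2 = image_entropy(img_left) ** 2
    er2 = image_entropy(img_right) ** 2
    e_l = el2 / (el2 + er2)   # weight derived from the left-view entropy
    e_r = er2 / (el2 + er2)   # weight derived from the right-view entropy
    return e_l * max(q_left, q_right) + e_r * min(q_left, q_right)
```

As in the figure's discussion, a noisy view yields a higher entropy (hence a larger weight) than a blurred or flat one.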
4. Experimental Results
In this section, each step of the proposed framework is separately tested and validated. Finally, the ability of the proposed metric to predict the perceived quality of stereoscopic images is evaluated and compared to state-of-the-art 3D-IQA metrics. To achieve these tasks, four publicly available 3D-IQA databases are
exploited: LIVE 3D IQA database Phase II [15, 29], IEEE standard association stereo image database [66], Waterloo-IVC 3D image quality database Phase I and Waterloo-IVC 3D image quality database Phase II [20, 21]. A brief description of these databases is provided in the next section.
4.1. 3D image quality databases

1. The LIVE 3D IQA database Phase II (LIVE 3D II) [15, 29] contains 360 distorted stereo-pairs generated from 8 pristine stereo-pairs. Every reference stereo-pair was altered by five distortion types (JPEG and JPEG 2000 (JP2K) compression, additive white Gaussian noise (WN), Gaussian blur (blur) and fast fading (FF)), yielding a total of 120 symmetrically and 240 asymmetrically distorted stereo-pairs; each of them was assessed by 33 subjects to produce differential mean opinion scores (DMOS).

2. The IEEE standard association stereo image database (IEEE stereoscopic) [66] contains 650 distorted stereo-pairs generated from 13 pristine stereo-pairs. The same five distortion types as LIVE Phase II were used, providing only symmetrically distorted stereo-pairs (130 each for JPEG, JP2K, WN, blur and FF). Each of these stereo-pairs has been assigned a DMOS derived from the subjective quality ratings provided by the subjects.

3. The Waterloo-IVC 3D Image Quality Database Phase I (Waterloo-IVC I) [20, 21] contains 330 distorted stereo-pairs generated from 6 pristine stereo-pairs. Three commonly encountered distortion types, specifically WN, JPEG and blur, were used to generate symmetrically and asymmetrically distorted stereo-pairs, which were assessed by 24 subjects. In contrast to existing 3D-IQA databases, in addition to the usual asymmetric distortion case (i.e., same distortion type but at different levels), this database contains asymmetrically distorted stereo-pairs with mixed distortion types and levels. In the context of this study, we did not consider these mixed distorted stereo-pairs and picked only 258 distorted stereo-pairs.

4. The Waterloo-IVC 3D Image Quality Database Phase II (Waterloo-IVC II) [20] includes the same three types of distortions as Phase I, but with more diverse image contents, consisting of 460 distorted stereo-pairs created from 10 pristine stereo-pairs. Each distorted stereo-pair was viewed by 22 subjects providing the subjective scores (i.e., DMOS). As for Phase I, we picked only 340 distorted stereo-pairs, leaving out the mixed distorted stereo-pairs.

Table 4 provides the main features of the four databases, including the number of used images, their resolution, the types of distortions, and finally, whether
Table 4: Description of the main features of the commonly used 3D-IQA databases.

Database                  # of images  Resolution  Distortions                                               Symmetric/Asymmetric distortion
LIVE 3D Phase II          360          640×360     Gaussian blur, white noise, JPEG, JPEG 2000, fast fading  Yes/Yes
IEEE stereoscopic         650          1920×1080   Gaussian blur, white noise, JPEG, JPEG 2000, fast fading  Yes/No
Waterloo-IVC 3D Phase I   258          1920×1080   Gaussian blur, white noise, JPEG                          Yes/Yes
Waterloo-IVC 3D Phase II  340          1920×1080   Gaussian blur, white noise, JPEG                          Yes/Yes
the database contains symmetrically and/or asymmetrically distorted stereo-pairs.
4.2. Performance of the distortion type detection
Since we use an SVM classifier to identify the distortion type, a training step is required to tune its parameters and to calibrate the relationship between the extracted features and the class label (i.e., the distortion type). For this purpose, we randomly divide the LIVE 3D II database into two non-overlapping subsets: training and testing. First, in order to ensure that the performance of our distortion-type classifier is not affected by the size of the training subset, the classification accuracy has been evaluated using the standard procedure with three different train-test splits: 30%, 50% and 80% for training, with the remaining part used for testing. This train-test procedure has been repeated 1000 times, and the median classification accuracy over the iterations is taken as the final result, reported in Table 5.

Table 5: Obtained classification accuracy for different sizes of the training subset of the LIVE 3D Phase II database.

Ratio of samples for training   30%      50%      80%
Classification accuracy         93.24%   94.60%   95.99%
Table 5 shows that for all sizes of the training subset the obtained values are higher than 90% and the performance can be considered as stable. Therefore, according to these results, it is clear that our distortion-classification approach is almost independent of the size of training samples. Next, once the SVM-based classifier has been trained and its parameters tuned, its classification accuracy is evaluated per-category of distortion on the four databases of Table 4. It is important to note that the trained model resulting
Table 6: Obtained confusion matrices (tested vs. available distortions) for the four used 3D-IQA databases: (a) LIVE 3D II, (b) IEEE stereoscopic, (c) Waterloo-IVC I, (d) Waterloo-IVC II. (Ns→Noise, Bk→Blocking, Rg→Ringing, FF→Fast Fading, Bl→Blur)
from the 80% train / 20% test split is employed for this purpose (these sizes are commonly used in machine-learning approaches for IQA). Thus, the training has been done only on the LIVE 3D II database, while for the three remaining databases (IEEE stereoscopic, Waterloo-IVC I & II) all images have been used for testing without any prior training on them. The obtained confusion matrices for the four databases are presented in Table 6. For the LIVE 3D II database (see Table 6a), the mean classification accuracy is 95.99%. For most of the considered degradation types, we achieved 100% correct classification, except for the ringing and fast fading degradations, which are confused with each other and for which we obtained 96.79% and 83.14%, respectively. This can be explained by the fact that the fast fading impairment was simulated using JPEG 2000 bitstreams and, at the same time, the ringing effect is also generated by this type of compression. For the IEEE stereoscopic database (see Table 6b), we obtained a classification accuracy of 88.77% and the same confusion as for the LIVE 3D II database is observed, while for the Waterloo-IVC I & II databases (see Tables 6c and 6d), the classification accuracy is 96.03% and 95.15%, respectively. For both, the blur degradation is well classified, while some confusion between blocking and fast fading can be seen. The slight confusion observed in Table 6 was expected, because the different impairments may share some (spatial or frequency-domain) properties and cannot be considered pure. For instance, fast fading generates gradient variations that could be interpreted as blockiness. Finally, we consider the performance of the distortion-classification stage sufficiently good to proceed to the next step.
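For reference, the per-class and mean accuracies discussed above can be derived from a confusion matrix as sketched below; the example matrix in the test is illustrative and not taken from Table 6.

```python
import numpy as np

def per_class_accuracy(cm):
    """Row-normalised diagonal of a confusion matrix (rows = true class)."""
    cm = np.asarray(cm, dtype=float)
    return np.diag(cm) / cm.sum(axis=1)

def mean_accuracy(cm):
    """Overall classification accuracy: correct decisions over all decisions."""
    cm = np.asarray(cm, dtype=float)
    return np.trace(cm) / cm.sum()
```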
Figure 6: Classification accuracy of distortion distribution classifier over various T values.
4.3. Performance of the distortion distribution determination
The second step of the proposed framework is to determine whether the stereo-pair is symmetrically or asymmetrically distorted. As mentioned in Section 3.2, this is performed based on the matching rate between both views, which is compared against a threshold T. To set the value of the latter, we performed extensive experiments on the four mentioned databases, where for each database the value of T was varied from 0 to 100 (with a step of 0.1), and for each value the symmetric vs. asymmetric distortion classification rate C was estimated. The optimal value of T (T_opt) is obtained as follows:

T_{opt} = \bigcap_{k \in DB} \arg\max_{T} \, C_k(T) \qquad (10)
where DB represents the four used databases. Fig. 6 shows the classification accuracy for the four databases versus the tested values of T. Looking at the results of Fig. 6, it is clear that by setting the value of T to 35%, we achieve optimal results across the four databases, with an accuracy ranging from 90.2% to 100%. The misclassified stereo-pairs represent in turn a rate ranging from only 0% to 9.8% (see Table 7 for a separate result per database).

Table 7: Obtained symmetric/asymmetric classification performance for the four used 3D-IQA databases.

Database                   Classification accuracy
LIVE 3D Phase II           90.2%
IEEE stereoscopic          100%
WATERLOO-IVC 3D Phase I    98.37%
WATERLOO-IVC 3D Phase II   94.86%
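The threshold selection can be sketched as a simple sweep. Note that this is a simplified variant of Eq. (10) that maximises the accuracy summed over databases rather than intersecting per-database argmax sets, and the data interface (matching rates paired with symmetric/asymmetric ground-truth labels) is hypothetical.

```python
import numpy as np

def best_threshold(rates_and_labels, t_grid):
    """Sweep candidate thresholds T and return the one maximising the
    symmetric/asymmetric classification accuracy summed over databases.
    `rates_and_labels` is a list of (matching_rates, is_symmetric) pairs,
    one pair of arrays per database."""
    best_t, best_score = t_grid[0], -1.0
    for t in t_grid:
        score = 0.0
        for rates, labels in rates_and_labels:
            pred = (np.asarray(rates) > t).astype(int)  # Eq. (4) decision
            score += np.mean(pred == np.asarray(labels))
        if score > best_score:
            best_t, best_score = t, score
    return best_t
```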
Figure 7: Examples of asymmetrically distorted stereo-pairs with a low perceptual difference between the left and right views, making them symmetric-like stereo-pairs from the perceptual point of view.
In most cases, the misclassified stereo-pairs are those where the distortion is applied asymmetrically but in a perceptually near-symmetric way, as depicted in Fig. 7. This is due to the small difference between the distortion levels of the two views. Thus, our model classifies them as symmetric stereo-pairs, and for these pairs the observer obviously does not experience any binocular rivalry.
4.4. Performance of the proposed 3D image quality metric
With the aim of providing a comprehensive and convincing evaluation of the performance of the proposed metric with fixed and adaptive weightings (called Proposed FW and AW, respectively), we used the four databases described previously (see Table 4) and compared against a total of eleven metrics from the literature. The set of considered metrics comprises: 1) three FR 2D-IQA metrics, namely PSNR, SSIM and MS-SSIM, because of their wide use for 3D-IQA; for these, the final score is calculated as the average of the left and right view scores; 2) five FR 3D-IQA metrics, namely Gorley's metric [8], Benoit's metric [9], You's metric [10], Chen's metric [15] (called Chen1 in the following), and Lin's metric [18]; and 3) three NR 3D-IQA metrics, namely Chen's metric [29] (called Chen2 in the following), Appina's metric [30] (called StereoQUE in the following) and Shao's metric [34]. The list of metrics used for performance comparison may change depending on the database because of the unavailability of quality scores. The performance of this set of metrics, in addition to the two variants proposed here, is evaluated in terms of prediction accuracy using the Pearson Linear Correlation Coefficient (LCC) and the Root Mean Square Error (RMSE) between the
Table 8: Performance evaluation (LCC, SROCC, RMSE) of the proposed 3D-IQA metric with regard to state-of-the-art metrics on the four used databases (LIVE 3D Phase II, IEEE stereoscopic, Waterloo-IVC Phase I & II).

                   LIVE 3D II               IEEE stereoscopic        Waterloo-IVC I   Waterloo-IVC II
Methods            LCC    SROCC  RMSE       LCC    SROCC  RMSE       LCC    SROCC     LCC    SROCC
PSNR               0.657  0.632  8.829      0.683  0.664  6.086      0.579  0.551     0.528  0.530
SSIM               0.803  0.793  6.721      0.855  0.877  4.555      0.625  0.686     0.661  0.623
MS-SSIM            0.795  0.777  6.851      0.918  0.928  4.193      0.609  0.581     0.713  0.712
Gorley [8]         0.412  0.255  10.875     0.507  0.535  8.092      0.502  0.484     0.593  0.582
Benoit [9]         0.748  0.728  7.490      0.875  0.866  4.683      0.673  0.528     0.547  0.534
You [10]           0.784  0.683  7.862      0.869  0.883  4.128      0.710  0.588     0.679  0.622
Chen1 [15]         0.900  0.889  4.987      0.857  0.882  4.027      0.753  0.694     0.618  0.590
Lin [18]           0.648  0.638  8.600      0.870  0.806  5.246      0.684  0.631     0.587  0.578
Chen2 [29]         0.895  0.880  5.102      N/A    N/A    N/A        N/A    N/A       N/A    N/A
StereoQUE [30]     0.845  0.888  7.279      0.834  0.829  5.548      0.748  0.681     0.667  0.609
Shao [34]          0.738  0.710  7.355      N/A    N/A    N/A        N/A    N/A       N/A    N/A
Proposed (FW)      0.925  0.908  3.018      0.878  0.886  4.687      0.893  0.883     0.887  0.861
Proposed (AW)      0.918  0.895  3.210      N/A    N/A    N/A        0.904  0.898     0.890  0.866
predicted and subjective scores coming from the different databases. The prediction monotonicity is evaluated using the Spearman Rank Order Correlation Coefficient (SROCC). A perfect prediction is achieved by a given quality metric when LCC = SROCC = 1 and RMSE = 0. Recall that, following the ITU-R recommendations on the performance evaluation of quality models [67], prior to any performance measurement, a non-linear regression using a 4-parameter logistic function is applied to obtain the predicted scores for each database separately. Our performance evaluation starts with the inspection of the metrics' behavior on each individual database. The LCC, SROCC and RMSE results are provided in Table 8 for LIVE 3D II, IEEE stereoscopic and Waterloo-IVC I & II, where the top performing metric is given in boldface. An overview of the behavior of the proposed metrics is given by the scatter plots in Fig. 8, in which each data point represents a stereoscopic test image. This figure shows the scatter distributions of subjective MOS versus the predicted scores obtained by the proposed metric. As can be seen, there is a good agreement between objective and subjective scores for the four used databases and for every type of impairment. This is materialized by the fact that the dots are concentrated around the diagonal (the diagonal meaning perfect agreement), indicating a good prediction of human judgment. Obviously, there is some non-linearity due to the nature of the quality judgment; the latter is taken into account by the aforementioned logistic function when computing the performance indicators. From Table 8, one can observe that the proposed metric, with either fixed or adaptive weighting, outperforms the majority of FR and NR quality metrics on almost all databases according to the performance indicators. The sole exception is the IEEE stereoscopic database, where MS-SSIM performs best. Note
Figure 8: Scatter plots of predicted quality scores against subjective scores (DMOS) of the proposed metric for the four used databases: (a) Fixed Weighting (FW), (b) Adaptive Weighting (AW).
however that our metric ranks second in this case while not relying on any cue about the reference image. Moreover, this performance of MS-SSIM is facilitated by the fact that the IEEE stereoscopic database contains only symmetric distortions, for which it has already been shown that 2D-IQA metrics may perform well. We can observe that 2D/3D FR metrics globally perform poorly on the Waterloo-IVC I & II databases, showing a global disagreement between the metrics' predictions and the human quality judgment. On LIVE 3D II, the performance of these metrics is somewhat higher, with some exceptions such as Gorley's metric. Regarding NR metrics, the performance of StereoQUE is low compared to the proposed metric whatever the database. Shao's metric performs poorly on LIVE 3D II, while Chen2 provides good results, though lower than those of the proposed metric.
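The evaluation protocol described above (4-parameter logistic mapping followed by LCC, SROCC and RMSE) can be sketched with NumPy as follows. The logistic parameters would be fitted per database (e.g., with a least-squares routine such as scipy.optimize.curve_fit), and the tie-free rank computation is a simplification; the parameter names b1–b4 are illustrative.

```python
import numpy as np

def lcc(x, y):
    """Pearson linear correlation coefficient."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xm, ym = x - x.mean(), y - y.mean()
    return np.sum(xm * ym) / np.sqrt(np.sum(xm ** 2) * np.sum(ym ** 2))

def srocc(x, y):
    """Spearman rank-order correlation: Pearson on ranks (ties ignored)."""
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    return lcc(rank(x), rank(y))

def rmse(x, y):
    """Root mean square error between predicted and subjective scores."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.sqrt(np.mean((x - y) ** 2))

def logistic4(x, b1, b2, b3, b4):
    """4-parameter logistic mapping applied to objective scores before
    computing LCC and RMSE (its parameters are fitted per database)."""
    return (b1 - b2) / (1.0 + np.exp(-(np.asarray(x, float) - b3) / b4)) + b2
```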
4.5. Performance on Individual Distortion Types
The proposed metric first identifies the type of distortion and then applies the best performing impairment metrics together with a fusion strategy adapted to the binocular findings. Hence, it appears important to analyze the performance on each individual distortion type. In this experimental part, we applied our metric and the set of metrics described above separately to the different sets of distorted images, i.e., Blur, FF, JP2K, JPEG and WN, from the LIVE 3D II and IEEE stereoscopic databases. The performance is examined using LCC, SROCC and RMSE. The results are gathered in Tables 9 and 10, and several observations can be made from them. For the LIVE 3D II database, our metric outperforms the whole set of used metrics except for WN, where it ranks second in terms of LCC and SROCC. In that specific case, it is important to highlight that the best performing metric with regard to LCC is an FR 3D-IQA metric, namely Chen1. This is not the case for SROCC, where the best performing metric is an NR 3D-IQA metric, namely Chen2. Nevertheless, the figures obtained with our metric for WN remain highly competitive, and our metric shows the best RMSE performance for WN. For the IEEE stereoscopic database, the behavior is more variable, depending on both the distortion type and the performance indicator. In this case, our metric competes with 2D/3D FR metrics having access to the pristine images. Nevertheless, the proposed metrics achieve comparable performance for the different explored distortions.
4.6. Asymmetry-based Performance Evaluation
It has been demonstrated that 2D metrics may achieve acceptable results for symmetrically distorted stereoscopic images. Their performance drops drastically for asymmetric distortions, mainly because of their simplistic fusion strategy. Our metric relies on the findings regarding binocular fusion
Table 9: Performance evaluation (LCC, SROCC, RMSE) on individual distortion types (Blur, FF, JP2K, JPEG and WN) on LIVE 3D IQA Phase II.

LCC
Methods          Blur   FF     JP2K   JPEG   WN
PSNR             0.660  0.715  0.607  0.473  0.884
SSIM             0.855  0.869  0.671  0.673  0.931
MS-SSIM          0.798  0.871  0.757  0.843  0.951
Gorley [8]       0.872  0.685  0.384  0.363  0.733
Benoit [9]       0.535  0.807  0.784  0.853  0.926
You [10]         0.712  0.886  0.825  0.808  0.857
Chen1 [15]       0.963  0.901  0.834  0.862  0.957
Lin [18]         0.671  0.739  0.744  0.584  0.909
Chen2 [29]       0.941  0.932  0.899  0.901  0.947
StereoQUE [30]   0.878  0.836  0.867  0.829  0.920
Shao [34]        0.916  0.850  0.846  0.789  0.920
Proposed (FW)    0.974  0.957  0.936  0.905  0.953
Proposed (AW)    0.958  0.961  0.948  0.903  0.933

SROCC
Methods          Blur   FF     JP2K   JPEG   WN
PSNR             0.655  0.742  0.689  0.458  0.901
SSIM             0.842  0.834  0.701  0.679  0.921
MS-SSIM          0.801  0.832  0.797  0.846  0.945
Gorley [8]       0.384  0.475  0.385  0.163  0.534
Benoit [9]       0.455  0.773  0.751  0.867  0.923
You [10]         0.758  0.817  0.742  0.653  0.884
Chen1 [15]       0.908  0.884  0.814  0.843  0.940
Lin [18]         0.711  0.701  0.719  0.613  0.907
Chen2 [29]       0.900  0.933  0.867  0.867  0.950
StereoQUE [30]   0.846  0.860  0.864  0.839  0.932
Shao [34]        0.857  0.819  0.837  0.803  0.930
Proposed (FW)    0.928  0.952  0.927  0.886  0.947
Proposed (AW)    0.892  0.940  0.928  0.902  0.934

RMSE
Methods          Blur    FF     JP2K   JPEG   WN
PSNR             9.487   7.612  7.871  7.624  4.078
SSIM             7.228   5.703  7.279  5.421  3.922
MS-SSIM          8.390   5.652  6.420  7.418  3.327
Gorley [8]       5.628   8.155  9.113  8.347  6.492
Benoit [9]       11.763  6.894  6.096  3.787  4.028
You [10]         8.859   6.484  5.239  4.596  5.774
Chen1 [15]       3.747   4.966  5.562  3.865  3.368
Lin [18]         9.322   7.749  6.559  5.952  4.465
Chen2 [29]       4.725   4.180  4.298  3.342  3.513
StereoQUE [30]   6.662   6.519  5.087  4.756  4.325
Shao [34]        6.484   6.481  5.618  5.015  4.816
Proposed (FW)    3.137   3.314  3.443  3.426  3.240
Proposed (AW)    3.019   3.201  3.522  3.265  3.482
Table 10: Performance evaluation (LCC, SROCC, RMSE) on individual distortion types (Blur, FF, JP2K, JPEG and WN) on IEEE stereoscopic.

LCC
Methods          Blur   FF     JP2K   JPEG   WN
PSNR             0.764  0.789  0.835  0.690  0.672
SSIM             0.689  0.900  0.839  0.923  0.863
MS-SSIM          0.669  0.908  0.867  0.927  0.895
Gorley [8]       0.509  0.583  0.487  0.692  0.524
Benoit [9]       0.734  0.882  0.824  0.915  0.839
You [10]         0.846  0.887  0.745  0.880  0.863
Chen1 [15]       0.842  0.923  0.708  0.894  0.849
Lin [18]         0.794  0.837  0.762  0.869  0.869
StereoQUE [30]   0.784  0.884  0.775  0.908  0.877
Proposed         0.848  0.889  0.839  0.933  0.902

SROCC
Methods          Blur   FF     JP2K   JPEG   WN
PSNR             0.752  0.768  0.829  0.646  0.624
SSIM             0.688  0.910  0.816  0.915  0.885
MS-SSIM          0.665  0.921  0.830  0.921  0.879
Gorley [8]       0.428  0.561  0.505  0.671  0.550
Benoit [9]       0.718  0.894  0.812  0.904  0.834
You [10]         0.800  0.916  0.794  0.902  0.852
Chen1 [15]       0.831  0.917  0.657  0.913  0.876
Lin [18]         0.743  0.812  0.748  0.844  0.841
StereoQUE [30]   0.805  0.894  0.755  0.918  0.854
Proposed         0.808  0.907  0.805  0.925  0.923

RMSE
Methods          Blur   FF     JP2K   JPEG   WN
PSNR             6.762  5.237  6.982  5.733  6.973
SSIM             4.065  4.217  3.886  5.884  5.202
MS-SSIM          4.552  5.108  4.593  3.956  4.231
Gorley [8]       8.551  7.727  7.323  8.193  8.077
Benoit [9]       4.328  4.847  4.058  4.662  5.064
You [10]         3.084  4.225  3.741  4.815  5.073
Chen1 [15]       3.527  4.186  3.847  4.246  4.183
Lin [18]         7.579  5.870  4.315  4.399  5.484
StereoQUE [30]   5.264  6.316  4.857  6.536  4.972
Proposed         4.645  5.027  5.023  4.684  4.463
in order to achieve an agreement with the fusion process occurring in the HVS. It is thus natural to evaluate the performance of the proposed metric on symmetric and asymmetric image sets and to compare the obtained results with the metrics described above. The results of this asymmetry-based evaluation are given in Table 11. One can notice that the proposed metric achieves the best performance whatever the database. It is important to recall that the fusion strategy is only used in the case of asymmetry and that, for symmetric image pairs, the final score is a simple average of the left and right scores. This explains why the performance figures of the fixed and adaptive weighting strategies are identical for the symmetric sets. The performance of 2D FR metrics is relatively weak on the asymmetric sets, which was expected because of their simplistic fusion strategy. It should also be noted that Shao's metric yields poor performance on the asymmetric set of the LIVE 3D II database. The performance of StereoQUE is highly variable depending on the database: for instance, its SROCC figure is acceptable on LIVE 3D II but poor on the Waterloo-IVC I & II databases. The adaptive weighting strategy outperforms the fixed weighting one because its fusion takes into account a priori knowledge about binocular fusion. Nevertheless, the gap between both strategies is small and even
Table 11: Performance comparison on symmetrically (Sym) and asymmetrically (Asym) distorted stereoscopic images of the LIVE 3D Phase II and Waterloo-IVC 3D Phase I & II databases.

LIVE 3D Phase II
Methods           LCC Sym   LCC Asym   SROCC Sym   SROCC Asym
PSNR              0.784     0.455      0.710       0.527
SSIM              0.849     0.717      0.843       0.677
MS-SSIM           0.897     0.705      0.902       0.687
Gorley [8]        0.557     0.384      0.378       0.154
Benoit [9]        0.769     0.732      0.860       0.671
You [10]          0.802     0.763      0.756       0.653
Chen1 [15]        0.932     0.864      0.923       0.842
Lin [18]          0.596     0.741      0.605       0.668
Chen2 [29]        N/A       N/A        0.918       0.834
StereoQUE [30]    0.907     0.811      0.857       0.872
Shao [34]         0.911     0.565      0.896       0.524
Proposed (FW)     0.935     0.957      0.928       0.892
Proposed (AW)     0.935     0.909      0.928       0.882

WATERLOO-IVC 3D Phase I
Methods           LCC Sym   LCC Asym   SROCC Sym   SROCC Asym
PSNR              0.640     0.543      0.642       0.510
SSIM              0.847     0.638      0.875       0.650
MS-SSIM           0.598     0.608      0.598       0.564
Gorley [8]        0.584     0.493      0.566       0.475
Benoit [9]        0.804     0.662      0.728       0.604
You [10]          0.832     0.695      0.721       0.549
Chen1 [15]        0.891     0.769      0.855       0.628
Lin [18]          0.829     0.645      0.688       0.592
Chen2 [29]        N/A       N/A        N/A         N/A
StereoQUE [30]    0.827     0.705      0.842       0.615
Shao [34]         N/A       N/A        N/A         N/A
Proposed (FW)     0.910     0.874      0.902       0.860
Proposed (AW)     0.910     0.882      0.902       0.869

WATERLOO-IVC 3D Phase II
Methods           LCC Sym   LCC Asym   SROCC Sym   SROCC Asym
PSNR              0.682     0.584      0.671       0.515
SSIM              0.750     0.640      0.688       0.615
MS-SSIM           0.836     0.696      0.749       0.690
Gorley [8]        0.695     0.558      0.683       0.537
Benoit [9]        0.718     0.532      0.620       0.551
You [10]          0.775     0.631      0.658       0.521
Chen1 [15]        0.848     0.634      0.768       0.567
Lin [18]          0.796     0.544      0.644       0.527
Chen2 [29]        N/A       N/A        N/A         N/A
StereoQUE [30]    0.860     0.647      0.771       0.568
Shao [34]         N/A       N/A        N/A         N/A
Proposed (FW)     0.914     0.844      0.915       0.801
Proposed (AW)     0.914     0.845      0.915       0.804
negligible in some cases.
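The score-combination logic described above can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the fixed weight `w_fixed` and the adaptive weights `w_adaptive` are hypothetical placeholders for the weighting values actually derived in the method section.

```python
def combine_scores(q_left, q_right, asymmetric, strategy="adaptive",
                   w_fixed=0.6, w_adaptive=None):
    """Combine left/right view quality scores into one 3D score.

    Symmetric pairs: simple average (the fusion strategy is bypassed).
    Asymmetric pairs: fixed weighting applies a constant weight to the
    better view; adaptive weighting uses content-dependent weights
    (here passed in as `w_adaptive`, a placeholder for weights derived
    from a binocular rivalry/dominance model)."""
    if not asymmetric:
        return 0.5 * (q_left + q_right)
    if strategy == "fixed":
        # give more weight to the higher-quality (dominant) view
        hi, lo = max(q_left, q_right), min(q_left, q_right)
        return w_fixed * hi + (1.0 - w_fixed) * lo
    # adaptive: normalized content-dependent weighting
    wl, wr = w_adaptive
    return (wl * q_left + wr * q_right) / (wl + wr)
```

For a symmetric pair the two strategies coincide by construction, which is why their figures are identical on the symmetric subsets of Table 11.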
5. Conclusion
In this paper, we presented a novel blind quality assessment approach for stereoscopic images. The main contributions of this work lie in determining the distortion type and how this distortion impairs the stereo-pair (symmetrically or asymmetrically) prior to predicting image quality. Having these two pieces of information about the distortion makes it possible to select the most appropriate binocular fusion strategy when combining the respective quality scores of the left and right views into an overall 3D image quality score. Experimental results on four of the most widely used databases show that our method achieves high correlation with human judgment, and that the proposed metric is competitive with the most efficient FR and NR 3D-IQA metrics. It is also worth emphasizing that this objective assessment approach effectively handles both symmetric and asymmetric distortions of stereoscopic images without requiring any side information. Since the proposed framework is modular, both the distortion identification and distortion distribution procedures could be exploited for a broad range of problems. For instance, information about the asymmetric nature of the stereo-pair could be exploited to assess visual fatigue, or used to tune the asymmetry level in stereoscopic image/video coding schemes.
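The modular pipeline summarized above can be sketched as a skeleton in which each stage is an injected component. All callables and names below (`classify_distortion`, `quality_measures`, `is_asymmetric`, `combine`) are hypothetical placeholders for the components described in the paper, not its actual interfaces.

```python
def assess_stereo_pair(left, right, classify_distortion, quality_measures,
                       is_asymmetric, combine):
    """Skeleton of the blind 3D-IQA strategy:
    1) identify the distortion type of each view,
    2) score each view with the distortion-specific NR measure,
    3) determine whether the distortion is symmetric or asymmetric,
    4) fuse the two scores with the matching binocular strategy."""
    d_left = classify_distortion(left)
    d_right = classify_distortion(right)
    q_left = quality_measures[d_left](left)
    q_right = quality_measures[d_right](right)
    asym = is_asymmetric(left, right)
    return combine(q_left, q_right, asym)
```

The modularity noted in the conclusion is visible here: swapping the asymmetry test or a per-distortion measure requires no change to the rest of the pipeline.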
References
[1] P. Seuntiëns, L. Meesters, W. IJsselsteijn, Perceived quality of compressed stereoscopic images: effects of symmetric and asymmetric JPEG coding and camera separation, ACM Transactions on Applied Perception 3 (2) (2006) 95–109.
[2] M. Lambooij, W. IJsselsteijn, D. G. Bouwhuis, I. Heynderickx, Evaluation of stereoscopic images: Beyond 2-D quality, IEEE Trans. Broadcast. 57 (2) (2011) 432–444.
[3] M. T. Pourazad, Z. Mai, P. Nasiopoulos, K. Plataniotis, R. K. Ward, Effect of brightness on the quality of visual 3D perception, in: Int. Conf. Image Process. (ICIP 2011), Brussels, Belgium, Sep. 2011, pp. 989–992.
[4] S. Winkler, P. Mohandas, The evolution of video quality measurement: From PSNR to hybrid metrics, IEEE Trans. Broadcasting 54 (3) (2008) 660–668.
[5] P. Campisi, P. Le Callet, E. Marini, Stereoscopic images quality assessment, in: Proc. European Signal Processing Conference (EUSIPCO 2007), Poznan, Poland, Sep. 2007.
[6] C. T. E. R. Hewage, S. T. Worrall, S. Dogan, A. M. Kondoz, Prediction of stereoscopic video quality using objective quality models of 2-D video, Electronics Letters 44 (16) (2008) 963–965.
[7] S. L. P. Yasakethu, C. T. E. R. Hewage, W. A. C. Fernando, A. M. Kondoz, Quality analysis for 3D video using 2D video quality models, IEEE Trans. Consum. Electron. 54 (4) (2008) 1969–1976.
[8] P. Gorley, N. Holliman, Stereoscopic image quality metrics and compression, in: Proc. SPIE Stereoscopic Displays and Applications XIX, San Jose, CA, USA, Jan. 2008.
[9] A. Benoit, P. Le Callet, P. Campisi, R. Cousseau, Quality assessment of stereoscopic images, EURASIP J. Image Video Process. 2008 (2009).
[10] J. You, L. Xing, A. Perkis, X. Wang, Perceptual quality assessment for stereoscopic images based on 2D image quality metrics and disparity analysis, in: Proc. Int. Work. VPQM, Scottsdale, AZ, USA, Jan. 2010.
[11] Z. Wang, A. C. Bovik, H. R. Sheikh, E. P. Simoncelli, Image quality assessment: from error visibility to structural similarity, IEEE Trans. Image Process. 13 (4) (2004) 600–612.
[12] A. Maalouf, M.-C. Larabi, CYCLOP: A stereo color image quality assessment metric, in: Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP 2011), May 2011, pp. 1161–1164.
[13] R. Bensalma, M.-C. Larabi, A perceptual metric for stereoscopic image quality assessment based on the binocular energy, Multidimensional Systems and Signal Processing 24 (2) (2013) 281–316.
[14] S. Ryu, D. H. Kim, K. Sohn, Stereoscopic image quality metric based on binocular perception model, in: IEEE International Conference on Image Processing (ICIP 2012), Orlando, FL, USA, Sep.-Oct. 2012, pp. 609–612.
[15] M.-J. Chen, C.-C. Su, D.-L. Kwon, L. K. Cormack, A. C. Bovik, Full-reference quality assessment of stereopairs accounting for rivalry, Sig. Process.: Image Commun. 28 (9) (2013) 1143–1155.
[16] Y. Lin, J. Yang, W. Lu, Q. Meng, Z. Lv, H. Song, Quality Index for Stereoscopic Images by Jointly Evaluating Cyclopean Amplitude and Cyclopean Phase, IEEE Journal of Selected Topics in Signal Processing 11 (1) (2017) 89–101.
[17] F. Shao, W. Lin, S. Gu, G. Jiang, T. Srikanthan, Perceptual Full-Reference Quality Assessment of Stereoscopic Images by Considering Binocular Visual Characteristics, IEEE Trans. Image Process. 22 (5) (2013) 1940–1953.
[18] Y.-H. Lin, J.-L. Wu, Quality assessment of stereoscopic 3D image compression by binocular integration behaviors, IEEE Trans. Image Process. 23 (4) (2014) 1527–1542.
[19] V. De Silva, H. K. Arachchi, E. Ekmekcioglu, A. Kondoz, Towards an impairment metric for stereoscopic video: A full-reference video quality metric to assess compressed stereoscopic video, IEEE Trans. Image Process. 22 (9) (2013) 3392–3404.
[20] J. Wang, A. Rehman, K. Zeng, S. Wang, Z. Wang, Quality prediction of asymmetrically distorted stereoscopic 3D images, IEEE Trans. Image Process. 24 (11) (2015) 3400–3414.
[21] J. Wang, Z. Wang, Perceptual quality of asymmetrically distorted stereoscopic images: the role of image distortion types, in: Proc. Int. Work. VPQM, Chandler, AZ, USA, Jan. 2014, pp. 29–31.
[22] Y. Zhang, D. Chandler, 3D-MAD: a full reference stereoscopic image quality estimator based on binocular lightness and contrast perception, IEEE Trans. Image Process. 24 (11) (2015) 3810–3825.
[23] X. Geng, L. Shen, K. Li, P. An, A stereoscopic image quality assessment model based on independent component analysis and binocular fusion property, Sig. Process.: Image Commun. 52 (2017) 54–63.
[24] J. Gangyi, H. Xu, M. Yu, T. Luo, Y. Zhang, Stereoscopic Image Quality Assessment by Learning Non-negative Matrix Factorization-based Color Visual Characteristics and Considering Binocular Interactions, Journal of Visual Communication and Image Representation 46 (2017) 269–279.
[25] J. Yang, Y. Liu, Z. Gao, R. Chu, Z. Song, A perceptual stereoscopic image quality assessment model accounting for binocular combination behavior, Journal of Vis. Commun. Image Represent. 31 (2015) 138–145.
[26] Z. M. P. Sazzad, R. Akhter, J. Baltes, Y. Horita, Objective no-reference stereoscopic image quality prediction based on 2D image features and relative disparity, Advances in Multimedia, ID 256130 (2012).
[27] K. Gu, G. Zhai, X. Yang, W. Zhang, No-reference stereoscopic IQA approach: From nonlinear effect to parallax compensation, Journal of Electrical and Computer Engineering (2012).
[28] S. Ryu, K. Sohn, No-Reference Quality Assessment for Stereoscopic Images Based on Binocular Quality Perception, IEEE Transactions on Circuits and Systems for Video Technology 24 (4) (2014) 591–602.
[29] M.-J. Chen, L. K. Cormack, A. C. Bovik, No-Reference Quality Assessment of Natural Stereopairs, IEEE Trans. Image Process. 22 (9) (2013) 3379–3391.
[30] B. Appina, S. Khan, S. S. Channappayya, No-reference Stereoscopic Image Quality Assessment Using Natural Scene Statistics, Sig. Process.: Image Commun. 43 (2016) 1–14.
[31] W. Zhou, S. Zhang, T. Pan, L. Yu, W. Qiu, Y. Zhou, T. Luo, Blind 3D image quality assessment based on self-similarity of binocular features, Neurocomputing 224 (2017) 128–134.
[32] L. Shen, J. Lei, C. Hou, No-reference stereoscopic 3D image quality assessment via combined model, Multimedia Tools and Applications (2017) 1–18.
[33] F. Shao, K. Li, W. Lin, G. Jiang, M. Yu, Using Binocular Feature Combination for Blind Quality Assessment of Stereoscopic Images, IEEE Signal Processing Letters 22 (10) (2015) 1548–1551.
[34] F. Shao, W. Lin, S. Wang, G. Jiang, M. Yu, Blind image quality assessment for stereoscopic images using binocular guided quality lookup and visual codebook, IEEE Transactions on Broadcasting 61 (2) (2015) 154–165.
[35] B. Julesz, Foundations of Cyclopean Perception, Univ. Chicago Press, 1971.
[36] S. B. Steinman, B. A. Steinman, R. P. Garzia, Foundations of Binocular Vision: A Clinical Perspective, New York, USA: McGraw-Hill, 2000.
[37] M. Fahle, Binocular rivalry: Suppression depends on orientation and spatial frequency, Vis. Res. 22 (7) (1982) 787–800.
[38] R. Blake, D. H. Westendorf, R. Overton, What is suppressed during binocular rivalry?, Perception 9 (2) (1980) 223–231.
[39] D. V. Meegan, L. B. Stelmach, W. J. Tam, Unequal weighting of monocular inputs in binocular combinations: implications for the compression of stereoscopic imagery, Journal of Experimental Psychology: Applied 7 (2) (2001) 143–153.
[40] P. Seuntiens, L. Meesters, W. IJsselsteijn, Perceptual evaluation of JPEG coded stereoscopic images, in: Proc. SPIE Stereoscopic Displays and Virtual Reality Systems X, Santa Clara, CA, USA, Jan. 2003, pp. 215–226.
[41] L. B. Stelmach, W. J. Tam, D. V. Meegan, A. Vincent, P. Corriveau, Human perception of mismatched stereoscopic 3D inputs, in: Proc. IEEE International Conference on Image Processing (ICIP 2000), Vancouver, BC, Sep. 2000, pp. 5–8.
[42] P. Seuntiens, L. Meesters, W. IJsselsteijn, Perceived quality of compressed stereoscopic images: Effects of symmetric and asymmetric JPEG coding and camera separation, ACM Trans. Appl. Perception 3 (2) (2006) 95–109.
[43] S. Anstis, A. Ho, Nonlinear combination of luminance excursions during flicker, simultaneous contrast, afterimages, and binocular fusion, Vision Research 38 (4) (1998) 523–539.
[44] V. De Silva, H. Arachchi, E. Ekmekcioglu, A. Fernando, S. Dogan, A. Kondoz, S. Savas, Psychophysical limits of interocular blur suppression and its application to asymmetric stereoscopic video delivery, in: Proc. 19th International Packet Video Workshop (PV 2012), Munich, Germany, May 2012, pp. 184–189.
[45] P. J. H. Seuntiens, Visual experience of 3D TV, Ph.D. thesis, Eindhoven Univ. Technol., Eindhoven, The Netherlands, 2006.
[46] T. H. Falk, Y. Guo, W. Y. Chan, Improving Robustness of Image Quality Measurement with Degradation Classification and Machine Learning, in: Proc. Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, Nov. 2007, pp. 503–507.
[47] P. Peng, Z. Li, Image quality assessment based on distortion-aware decision fusion, in: Proc. International Conference on Intelligent Science and Intelligent Data Engineering, Xi'an, China, Oct. 2011, pp. 644–651.
[48] A. Chetouani, A. Beghdadi, M. Deriche, A hybrid system for distortion classification and image quality evaluation, Sig. Process.: Image Commun. 27 (9) (2012) 948–960.
[49] A. K. Moorthy, A. C. Bovik, Blind image quality assessment: From natural scene statistics to perceptual quality, IEEE Trans. Image Process. 20 (12) (2011) 3350–3364.
[50] A. C. Bovik, S. Liu, DCT-domain blind measurement of blocking artifacts in DCT-coded images, in: IEEE International Conference on Acoustics, Speech, and Signal Processing, Salt Lake City, USA, May 2001.
[51] Z. Wang, H. R. Sheikh, A. C. Bovik, No Reference Perceptual Quality Assessment of JPEG Compressed Images, in: Int. Conf. Image Process. (ICIP 2002), 2002.
[52] X. Marichal, W. Y. Ma, Z. H. Jiang, Blur determination in the compressed domain using DCT information, in: Int. Conf. Image Process. (ICIP 1999), Kobe, Oct. 1999.
[53] P. Marziliano, F. Dufaux, S. Winkler, T. Ebrahimi, Perceptual blur and ringing metrics: application to JPEG2000, Sig. Process.: Image Commun. 19 (2) (2004) 163–172.
[54] A. Chetouani, A. Beghdadi, M. Deriche, A new free reference image quality index for blur estimation in the frequency domain, in: IEEE Symposium on Signal Processing and Information Technology (ISSPIT 2009), Ajman, UAE, Dec. 2009, pp. 155–159.
[55] H. R. Sheikh, A. C. Bovik, L. K. Cormack, No-Reference Quality Assessment Using Natural Scene Statistics: JPEG2000, IEEE Trans. Image Process. 14 (12) (2005) 1918–1927.
[56] D. Van de Ville, M. Kocher, SURE-Based Non-Local Means, IEEE Signal Processing Letters 16 (11) (2009) 973–976.
[57] M. J. Buckley, Fast computation of a discretized thin-plate smoothing spline for image data, Biometrika 81 (2) (1994) 247–258.
[58] C. Burges, A tutorial on support vector machines for pattern recognition, Data Mining Knowl. Discov. 2 (2) (1998) 121–167.
[59] C. Chang, C. Lin, LIBSVM: A Library for Support Vector Machines, 2001. [Online]. Available: http://www.csie.ntu.edu.tw/~cjlin/libsvm/
[60] D. Scharstein, R. Szeliski, A taxonomy and evaluation of dense two-frame stereo correspondence algorithms, International Journal of Computer Vision 47 (1-3) (2002) 7–42.
[61] K. Mikolajczyk, C. Schmid, A performance evaluation of local descriptors, IEEE Trans. Pattern Anal. Mach. Intell. 27 (10) (2005) 1615–1630.
[62] D. G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision 60 (2) (2004) 91–110.
[63] M. Nauge, M.-C. Larabi, C. Fernandez-Maloigne, A statistical study of the correlation between interest points and gaze points, in: Proc. SPIE Conf. Human Vision and Electronic Imaging XVII, Jan. 2012.
[64] X. Zhang, S. Wang, S. Ma, W. Gao, A study on interest point guided visual saliency, in: IEEE Picture Coding Symposium (PCS), May 2015, pp. 307–311.
[65] M. A. Fischler, R. C. Bolles, Random Sample Consensus: a Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography, Communications of the ACM 24 (6) (1981) 381–395.
[66] IEEE Standards Association, Stereoscopic Database, 2008. [Online]. Available: http://grouper.ieee.org/groups/3dhf/
[67] ITU-R Recommendation BT.500–11: Methodology for the Subjective Assessment of the Quality of Television Pictures, International Telecommunication Union Std, 2002.
Highlights
- A blind quality assessment strategy for stereoscopic images is proposed.
- Determination of the distortion type and of how it impairs the stereo-pair (symmetrically or asymmetrically).
- Estimation of perceived 3D image quality based on different binocular combination strategies.
- Experimental results on four widely used databases show a significantly high correlation with human judgment.