A multi-factors approach for image quality assessment based on a human visual system model


Signal Processing: Image Communication 21 (2006) 316–333 www.elsevier.com/locate/image

Giaime Ginesu, Francesco Massidda, Daniele D. Giusto

Department of Electrical and Electronic Engineering, University of Cagliari, Piazza D'Armi, Cagliari 09123, Italy

Received 3 May 2005; received in revised form 9 November 2005; accepted 24 November 2005

Abstract

In this paper, a multi-factor full-reference image quality index is presented. The proposed visual quality metric is based on an effective Human Visual System model. Images are pre-processed in order to take into account luminance masking and contrast sensitivity effects. The proposed metric relies on the computation of three distortion factors: blockiness, edge errors and visual impairments, which take into account the typical artifacts introduced by several classes of coders. A pooling algorithm is used in order to obtain a single distortion index. Results show the effectiveness of the proposed approach and its consistency with subjective evaluations.

© 2005 Elsevier B.V. All rights reserved.

Keywords: Image quality assessment; Human visual system

1. Introduction

Compression of digital images has gained tremendous importance in telecommunications. Its wide diffusion is mainly related to the improvement of acquisition and processing technology, the enhancement of digital data resolution and the limits imposed on transmission bandwidth. Indeed, given current and prospective technologies, data storage and transmission are becoming an issue. Image compression therefore plays an essential role in minimizing data size and organizing it into a scalable stream. Several applications, both commercial and scientific, make extensive use of digital image information. The internet, digital photography and television,


narrow-band broadcasting services, e.g. mobile video-telephony, image and video streaming and advanced sensors are just a few examples. Telemedicine, for instance, is a particular application requiring the transmission of critical information over dedicated or conventional networks for remote diagnosis. Since lossless compression is generally unable to achieve compression ratios higher than 4, lossy compression may be compulsory in transmission-critical applications. However, lossy coding is often unacceptable, since it inevitably introduces artifacts that might undermine the validity of medical data. Under certain conditions, recent coding algorithms [52,53] provide near-lossless quality and may suit medical imaging [54–56]. All of the above tasks depend heavily on the performance of image compression algorithms. In order to evaluate their validity, meaningful and reliable quality metrics are required. Building such



metrics is certainly a very challenging task. In fact, classical compression standards based on the Discrete Cosine Transform (DCT), such as JPEG, MPEG-1/2/4 and H.26x [7–9], and novel standards based on different encoding methods, such as JPEG2000, MJPEG2000 and H.264 [52,65], present different distortion artifacts. A common approach consists in decomposing the global image distortion into single effects, caused by the different nature of the compression methods. Examples of such effects are: blockiness in DCT-based compressed images [21–23,29,31–34], blurring and ringing in wavelet-based encoding standards, and artifacts caused by rapidly changing quality-of-service levels in multimedia streaming applications, such as packet loss, or noisy A/D converted video sequences [11,13,24,25,35,36,58,61,62].

Several approaches to the evaluation of image quality are presented and discussed in the literature. These can be grouped into two main classes: subjective and objective methods. The former estimate visual quality by subjective tests in which a group of observers evaluates several compressed or corrupted images, with appropriate criteria, methodologies and hardware. In order to obtain significant results from this type of data analysis, several rules and procedures have been standardized by the ITU [1]. With subjective evaluations, the final users and the human visual system (HVS) are directly involved, and assessments are exactly a measure of the perceived quality. Nevertheless, performing such tests is complex, time-consuming and expensive. Consequently, subjective testing is employed only occasionally, e.g. for the performance assessment of particular compression techniques, for evaluating different quality metrics or, more often, for the parametric optimization of compression techniques [6–8].

Objective methods try to estimate the amount of distortion within an image using mathematical operations in the 2D or 3D spatial/frequency domain of images or video sequences. Standard objective distortion measures such as MSE, SNR and PSNR are simple methods that give a measure of the differences between two images. Although these measures are quite good for certain applications, they perform best when applied to the evaluation of unstructured analog distortion, thanks to their pixel-by-pixel analysis. Digital systems and coders generally introduce well-known distortion effects and artifacts within videos and images that cannot


be evaluated properly with classical indexes like those previously mentioned.

Subjective and objective metrics can be computed from a comparison between reference and compressed images or using only the compressed or distorted images. For subjective methods, these two approaches are called Double Stimulus (DSCQS) and Single Stimulus (SSCQS) Continuous Quality Scale, respectively, while the corresponding approaches for objective methods are called full-reference and no-reference. Another category of objective methods, called reduced-reference, uses the compressed/distorted images and, in addition, some a priori information about the reference image and encoding method. The main advantage of no-reference metrics is the possibility of assessing the quality level even if the reference data is unknown, as in broadcast transmissions. This is generally possible thanks to a priori knowledge of both the typical distortion effects and the encoding method. Furthermore, no-reference metrics allow for the evaluation of distortion type and level on the receiver side and are particularly useful in real-time transmission systems for in-service applications [6,10], e.g. post-processing for reducing the blockiness effect. However, no-reference methods only provide a rough evaluation of image/video quality and are not indicated for applications that require high reliability, e.g. sensitive data storage, scientific or medical imaging. In such cases, full-reference methods are the only possible choice.

Although several objective methods are implicitly based on some HVS characteristics [4,12,67–69], HVS-based objective methods have recently been proposed for blockiness assessment [8,10,17,19,20,30,59,60] and for global distortion measures [5,12,15]. They potentially represent the most general and accurate approaches. In recent years, considerable work has been done by the Video Quality Experts Group (VQEG) [2] to collect, compare and evaluate a set of psychovisual methods specifically designed for the evaluation of compressed digital video distortion [3]. Some full-reference objective quality measures, e.g. BTFR from British Telecom and VQM from the National Telecommunications and Information Administration, have been chosen among others thanks to their performance and high correlation with subjective evaluations, and they have recently been standardized by the ITU [63].

A separate study and analysis of the different types of possible distortions, which arise from the


encoding and transmission methods, is fundamental for producing effective objective metrics. Typically, a global distortion measure is obtained as a weighted functional of the single distortion evaluations [5,12]. This pooling procedure must take into account the HVS perception level and sensitivity with respect to the different distortion effects.

In this paper, a novel framework for full-reference image quality assessment is proposed. The suggested approach is based on a mathematical HVS model. Images are first pre-processed through a state-of-the-art algorithm, which models the luminance masking, contrast masking and contrast sensitivity effects. Pre-processed reference and compressed images are then compared using three novel full-reference distortion factors for the assessment of blockiness, edge errors and visual impairments. Finally, non-linear regression techniques and an effective pooling algorithm are used to obtain a single distortion index. Results prove that the proposed method achieves a good correlation between objective and subjective evaluations. Furthermore, this approach yields independent impairment measurements, so that several distortion effects or compression artifacts can be assessed separately.

This paper is organized as follows. In Section 2, a background on the HVS and the proposed HVS mathematical model are provided; luminance masking and the contrast sensitivity function (CSF) are considered and embedded in the final framework. In Section 3, the proposed full-reference distortion measures and the global quality index are described, with details on the factors used to evaluate distortion artifacts. Section 4 provides the objective and subjective results. Finally, conclusions are drawn in Section 5.

2. Human visual system

We refer to the HVS as the complex set of biological and psychological elements that allow human vision of the physical world [37,38]. Such a system is generally divided into two parts: the transducer, i.e. the eye, and the processing unit, i.e. the human brain. In reality, light acquisition and interpretation by the eye/brain apparatus are very complex tasks that cannot be precisely modeled. The joint analysis of vision phenomena from a biological and psychological point of view has produced, in the recent past, several mathematical models that try to describe, with increasing levels of accuracy, the behavior of the HVS. A number of high-level human vision effects are known, accepted and modeled in an almost unique manner. Some of these effects are taken into consideration in the proposed HVS-based metric and discussed in this section.

2.1. Luminance and contrast masking effects

An important relationship between human perception and different physical stimuli was theorized and empirically proven by the psychologist E.H. Weber. Let us consider two similar physical stimuli with different magnitude levels, i.e. different luminance stimuli, $L$. Let us suppose both stimuli are increased by the same value $\Delta L$ in consecutive steps until the new values are perceived as different from the starting ones. Weber's law asserts that the minimum difference $\Delta L$ between the starting and the new perceived values is proportional to the initial value $L$:

$$\Delta L = k \cdot L. \tag{1}$$

The parameter $k$ has to be set in order to fit Weber's law to different stimuli. Weber's law is empirical, but its effectiveness has been proven for several application fields. By progressively increasing $\Delta L$, it is possible to find the exact $\Delta L$ that allows the human eye to perceive the new level as different from the starting one, $L$. This is called the visibility threshold. Generally, the non-linearity of the Weber–Fechner–Stevens law is taken into account using a standard gamma correction approach [39–41]:

$$I = 255 \left( \frac{L}{255} \right)^{\gamma}, \tag{2}$$

where $I$ is the perceived luminance level, $L$ the original intensity level and $\gamma$ a control parameter whose values lie in the range 1/2–1/3 [42] (Fig. 1). The chosen value for $\gamma$ is 1/2.2 [43]. While Weber's intensity perception law can be directly applied only to images of reduced complexity, natural scenes often present a combination of variable luminance intensity values with different contrast levels. Such combinations of intensity and contrast masking effects are very hard to merge into a single mathematical model. Thus, a simplified approach that considers luminance and contrast effects complementarily is often adopted.
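For illustration, a minimal NumPy sketch of this luminance pre-processing step follows; the function name and vectorized form are ours, while Eq. (2) and the value γ = 1/2.2 come from the text:

```python
import numpy as np

def gamma_correct(luma: np.ndarray, gamma: float = 1 / 2.2) -> np.ndarray:
    """Perceived luminance per Eq. (2): I = 255 * (L / 255) ** gamma."""
    return 255.0 * (luma.astype(np.float64) / 255.0) ** gamma

# Example: mid-gray appears brighter than its linear value (128 -> ~186).
print(gamma_correct(np.array([0, 64, 128, 255])).round(1))
```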


Fig. 2. Test image for the contrast masking effect. Visibility thresholds are measured by progressively varying the distance of the stimulus (vertical bar) from the strong edge (high-contrast zone).

Such effects are then separated into two uncorrelated aspects affecting human perception. Weber's law can be used within a global HVS model and applied to image quality assessment if luminance and contrast masking effects are considered independent and modeled separately. The crucial point is the exact knowledge of the behavior of visibility thresholds for the same visual stimuli in areas of different contrast level. It has been empirically proven that if a distortion stimulus is located close to a highly contrasted background, the visibility threshold increases. This phenomenon is called the contrast masking effect [44]. Contrast masking, often called texture or activity masking, represents the effect of a strong signal variation (masker) that masks and hides other image details (target) close to the masker signal (Fig. 2). Contrast masking is a local effect: the masking effect decreases rapidly as the distance from the high-contrast zone increases (all pixels within a visual angle of about 0.1° are normally considered to be masked [18]).

2.2. Contrast sensitivity function

HVS perception of contrast depends on the spatial frequency of the visual stimulus. The human eye behaves like a band-pass filter: higher sensitivity and, thus, higher definition is achieved at middle frequencies, while sensitivity is reduced for low and high frequency components [46,57]. In Fig. 3 it is possible to evaluate the exact luminance level (vertical axis) that allows observers to perceive the variation at each spatial frequency.


Fig. 1. Gamma correction function for two values of γ.


Fig. 3. Test image for HVS frequency response evaluation with superimposed contrast sensitivity function.

These points can be plotted as a continuous curve, representing the visibility thresholds for different spatial frequencies. Such a threshold curve is called the contrast sensitivity function (CSF). It is very important to evaluate the visual angle, observation distance and image size in order to obtain a proper representation of the CSF in terms of cycles/degree. Sensitivity has a maximum at 3–4.5 cycles/degree and decreases rapidly at lower and higher frequencies [47]. In the human perception literature, the CSF is often approximated by the function $S(\omega)$ [48,49]:

$$S(\omega) = \frac{3}{2} \left[ e^{-\sigma^2 \omega^2 / 2} - e^{-2 \sigma^2 \omega^2} \right] \cdot \frac{1 + e^{\beta(\omega - \omega_0)} \cos^4(2\theta)}{1 + e^{\beta(\omega - \omega_0)}}, \tag{3}$$

where $\sigma$, $\beta$ and $\omega_0$ are set to fit the real observation conditions. In the model proposed in this paper, $\beta = 8$, $\omega_0 = 11.13$ cycles/degree [50] and $\theta = \arctan(u/v)$.


In order to take the two spatial frequency components into account, the model becomes $S(\omega) \to S(u, v)$, with $u$ and $v$ the horizontal and vertical components, respectively. If a two-dimensional spatial frequency transform of the image, e.g. the DCT, is considered, $F(u, v)$, it is possible to evaluate the perceived quality level by considering $S(u, v)$ as a weighting factor that models the human visual response to visual stimuli at the same frequency:

$$F'(u, v) = S(u, v) \cdot F(u, v). \tag{4}$$
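A sketch of how Eqs. (3) and (4) could be applied in practice follows. The value of σ and the mapping from DCT indices to cycles/degree (max_cpd) are assumptions of ours, since they depend on the viewing conditions that the paper leaves as fitting parameters:

```python
import numpy as np
from scipy.fft import dctn, idctn

def csf(u, v, sigma=2.0, beta=8.0, omega0=11.13):
    """Contrast sensitivity S(u, v) per Eq. (3); sigma is an assumed fit value."""
    omega = np.hypot(u, v)                  # radial frequency, cycles/degree
    theta = np.arctan2(u, v)                # orientation, theta = arctan(u / v)
    radial = 1.5 * (np.exp(-sigma**2 * omega**2 / 2)
                    - np.exp(-2 * sigma**2 * omega**2))
    em = np.exp(-beta * (omega - omega0))   # numerically stable form of Eq. (3)
    angular = (em + np.cos(2 * theta) ** 4) / (em + 1.0)
    return radial * angular

def csf_weight(image, max_cpd=32.0):
    """Eq. (4): weight each DCT coefficient F(u, v) by S(u, v)."""
    h, w = image.shape
    F = dctn(image.astype(float), norm='ortho')
    # Map DCT indices to spatial frequencies; max_cpd is an assumed value that
    # in practice depends on display resolution and viewing distance.
    u = np.arange(h)[:, None] * max_cpd / h
    v = np.arange(w)[None, :] * max_cpd / w
    S = csf(u, v)
    S[0, 0] = 1.0   # keep the DC (mean luminance) term: a common practical choice
    return idctn(S * F, norm='ortho')
```

The angular factor is rewritten as (e^{-β(ω-ω₀)} + cos⁴2θ)/(e^{-β(ω-ω₀)} + 1), which is algebraically identical to Eq. (3) but avoids overflow for frequencies well above ω₀.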

2.3. HVS-based methods

Recently, several HVS-based objective quality metrics have been developed, embedding simple or elaborate HVS models [15,27,28]. In order to describe their typical structure, a generic scheme is illustrated in Fig. 4 [9]. The frequency analysis is initially performed in the transform domain: the signal is divided into several frequency subbands or coefficients, corresponding to the different HVS responses. By performing such analysis for both the original and distorted images, it is possible to map errors according to their spatial frequencies. Different weights are assigned to each error component through the CSF [26]. Subsequently, a masking model, including both luminance and contrast/texture masking, is applied. Finally, the global distortion measure is obtained for the entire image with a weighted pooling of the data at different frequencies and spatial locations. This general scheme can be used for digital distortion assessment of both still images and video sequences. In the latter case, the CSF and the masking models should also include the temporal dimension.

3. Proposed method

In this section, the building blocks of the proposed full-reference image quality index are comprehensively illustrated. The HVS model described in Section 2 is used [39,40,48,49,59]. For the reference and compressed images, luminance levels are altered through (2) and the frequency-domain CSF is applied using (3) and (4), as shown in Fig. 5. Then, separate quality indexes are computed by taking into account the effects of blockiness, edge errors with blurring, and visual impairments. Finally, a pooling algorithm is used to obtain a single quality measure (Fig. 6).

The proposed method follows an approach similar to the PQS index [5]. PQS is a full-reference HVS-based quality index for evaluating the perceived distortion of compressed images. Its HVS model is based on the Weber–Fechner law and a CSF. Five factors are considered: monitor-related distortion, perceived global distortion, blockiness, correlated errors and edge errors, respectively. Perceptual error maps are used as input for the computation of each distortion factor. Parameters are optimized through multiple regression analysis of PQS against MOS over 75 encoded images. As in PQS, the proposed method defines the HVS through a gamma function and a CSF. Moreover, the method is based on the definition of several factors that take into account the influence of different distortion effects.

Fig. 4. Generic scheme for full-reference objective metrics with HVS model (frequency analysis, CSF, masking and pooling of the perceived reference and compressed frames into a quality score Q). IR and ID represent the reference and distorted images.

Fig. 5. Proposed HVS pre-processing scheme (luma extraction from YUV, gamma correction for luminance masking, CSF for frequency masking, yielding the perceived frame).

Fig. 6. Proposed index block scheme (blockiness, edge and visual impairment factors pooled into the full-reference index).


However, the proposed method applies the HVS model to the computation of all distortion factors, while factors 1 and 5 of PQS are computed from the original and compressed images only. Moreover, PQS defines five factors, describing monitor-related distortion, perceived global distortion, blockiness, correlated errors and edge errors, respectively. The first two factors both represent a measure of global distortion, and their contributions to the global index are similar. In fact, the correlation coefficients obtained with F1 and F2 singly (Table V in [5]) are very similar, and the combination of F1 and F2 is the only two-factor case with decreased correlation. This is also confirmed by the fact that the same correlation is obtained by considering either F1 or F2 together with the other three factors. Moreover, the best correlation result is obtained with factors 2, 4 and 5 only. The proposed method, on the other hand, defines only three factors that precisely model three macro distortion effects. This choice results in a good trade-off between the definition of a complete distortion model and computational complexity. Other factors have been investigated but were discarded for the time being due to computational reasons; for instance, a wavelet-domain blurring metric has been developed and might be the object of future work.

3.1. Blockiness

Blockiness is the most typical artifact introduced by block-based image coders. Such coders decompose the original signal into square blocks, which are processed independently in order to exploit the decorrelation of visual information. Encoders use only the information inside each block and do not take into account transitions between neighboring blocks. Blockiness is generally very annoying; the visual degradation consists of the loss of detail inside each block, which results in uniform blocks at the highest compression ratios, and the appearance of horizontal and vertical borders at each block transition. The choice of introducing a blockiness factor comes from both the wide diffusion of block-based coders and the peculiarity of the produced effect: the compression introduces a structured distortion that is easily recognizable by the user thanks to its regularity. Several methods for the evaluation of DCT-based coders have been proposed in the literature [14]. Since the proposed quality index must be universal, i.e. it must not be specialized for a particular class


of coders, a very selective blockiness metric has been developed, so as to avoid false blockiness detection in cases of very high distortion of a different kind. The proposed blockiness metric analyzes the compressed image and evaluates the local error between original and compressed data in a neighborhood of each block transition. In particular, the algorithm has been designed to verify the presence of 8 × 8 pixel blocks, although this choice also allows measuring the blockiness effect for block configurations that are multiples of 8. The analysis is performed on all horizontal and vertical block transitions. The algorithm decides whether a block transition presents the characteristics of structured noise and, consequently, estimates the weight of the introduced artifact. The detail loss inside each block and the differences between pixel errors along directions orthogonal to neighboring block contours are considered as blockiness criteria. The information loss deriving from block quantization is computed from the compressed image through three parameters: the standard deviations of the two regions adjacent to the block transition and the standard deviation of the block transition area, both for horizontal and vertical transitions. Let us define $\sigma_1$, $\sigma_2$ as the standard deviations of the pixels belonging to sets $P_1$ and $P_2$, respectively, and $\sigma_t$ as the standard deviation of the pixels belonging to the two lines across adjacent blocks, which represent the set $P_t$ (Fig. 7). A block becomes visible when the standard deviation of the transition region $\sigma_t$ is higher than the standard deviations $\sigma_1$ and $\sigma_2$ computed in each block. From the previous definitions, the conditions for block visibility are $\sigma_t > th \cdot \sigma_1$ and $\sigma_t > th \cdot \sigma_2$, with $th > 1$.

Fig. 7. Regions P1, Pt and P2 for horizontal (a) and vertical (b) transitions.


The presence of a blocking effect in the compressed image is not necessarily caused by the coding algorithm, but may derive from original image information, e.g. a tiling texture. In order to evaluate the real blocking effect, the tendency of the image difference to show relevant variations along directions orthogonal to the transitions is measured. Fig. 8 illustrates the point-to-point difference (stretched) between the original and a block-based compressed image at a high compression ratio. It can be noticed that the blocking effect results in sharp variations at block transitions. Defining $e(i, j)$ as the pixel-by-pixel difference between the original and compressed images, the errors along directions orthogonal to the transition edges are computed as root mean square differences:

$$H_{\perp} = \sqrt{\frac{1}{b} \sum_{k=1}^{b} \left[ e(i_0, j_0 + k) - e(i_0 - 1, j_0 + k) \right]^2}, \tag{5}$$

$$V_{\perp} = \sqrt{\frac{1}{b} \sum_{k=1}^{b} \left[ e(i_0 + k, j_0) - e(i_0 + k, j_0 - 1) \right]^2}. \tag{6}$$

The quantities in Eqs. (5) and (6) are compared with the standard deviation of the compressed image transition area and must verify $H_{\perp} > k \cdot \sigma_{th}$ and $V_{\perp} > k \cdot \sigma_{tv}$, with $k > 1$. This assessment allows deciding whether the transition area derives from the original image information or was introduced by the block-based coder.

Fig. 8. Point-to-point difference between original and block-based compressed images (high compression ratio).

Concerning the quantitative aspect of the error factor, a combination of the blocking error parameters has been designed so as to be consistent with the visual effect and easily normalized to the range [0, 1]. The error contribution is proportional to the orthogonal errors and to the flattening effect inside each block. Square pooling has been chosen for the summation of the single errors. Defining $n_h$ and $n_v$ as the number of block transitions along the horizontal and vertical directions, respectively, the two error components are:

$$E_h = \sum_{i=1}^{n_h} \left( \frac{H_{\perp}}{H_{\perp} + \sigma_{1h} + \sigma_{2h}} \right)^2, \qquad E_v = \sum_{i=1}^{n_v} \left( \frac{V_{\perp}}{V_{\perp} + \sigma_{1v} + \sigma_{2v}} \right)^2. \tag{7}$$

As the detail loss inside the block increases, $\sigma_1 \to 0$ and $\sigma_2 \to 0$, so that the $E_h$ and $E_v$ errors tend to one. The same behavior is observed for high values of $H_{\perp}$ and $V_{\perp}$. The blockiness errors along the horizontal and vertical directions are linearly averaged in order to produce the desired metric. The final index, proportional to the sum of the two errors, is then normalized as

$$F_1 = \sqrt{\frac{E_h + E_v}{n_h + n_v}}. \tag{8}$$
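The following sketch illustrates the horizontal-transition half of Eqs. (5)–(8); the threshold values th and k are assumptions of ours (the text only requires th, k > 1), and the vertical term is obtained symmetrically:

```python
import numpy as np

def blockiness_factor(orig, comp, b=8, th=1.1, k=1.1):
    """Horizontal-transition term of F1 (Eqs. (5), (7), (8)); th, k > 1 assumed."""
    orig, comp = orig.astype(float), comp.astype(float)
    err = orig - comp                         # pixel-by-pixel difference e(i, j)
    h, w = comp.shape
    E, n = 0.0, 0
    for i0 in range(b, h - b + 1, b):         # horizontal block boundaries
        for j0 in range(0, w - b + 1, b):
            p1 = comp[i0 - b:i0, j0:j0 + b]   # block above the boundary
            p2 = comp[i0:i0 + b, j0:j0 + b]   # block below the boundary
            pt = comp[i0 - 1:i0 + 1, j0:j0 + b]   # two lines across the boundary
            s1, s2, st = p1.std(), p2.std(), pt.std()
            if st <= th * s1 or st <= th * s2:
                continue                      # boundary not visible as a block edge
            # Eq. (5): RMS error step orthogonal to the boundary.
            h_orth = np.sqrt(np.mean(
                (err[i0, j0:j0 + b] - err[i0 - 1, j0:j0 + b]) ** 2))
            if h_orth <= k * st:
                continue                      # edge explained by original content
            E += (h_orth / (h_orth + s1 + s2)) ** 2   # Eq. (7)
            n += 1
    # The vertical term Ev is computed the same way (Eq. (6)) and pooled with
    # Eh as in Eq. (8); only the horizontal half is shown here.
    return np.sqrt(E / n) if n else 0.0
```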

3.2. Edge errors


Edges play a fundamental role in the process of shape and object recognition [58]. By recognizing contours, the vision system is able to correctly acquire the visual information and understand the semantics of the scene. In the proximity of contours, the image shows high contrast, causing the error visibility threshold to rise locally. Although this phenomenon allows for higher error tolerance, edge distortions may result in wrong edge recognition or coarse shape reconstruction. In order to evaluate the edge similarity between the original and compressed images, an edge estimator is required. The Canny edge extractor [45] has been chosen for this task. The hysteresis process is applied to the edge map with parameters $th_{\mathrm{low}}$ and $th_{\mathrm{high}}$, where $th_{\mathrm{low}} < th_{\mathrm{high}}$. Those pixels whose values are higher than $th_{\mathrm{high}}$ are selected as high-visibility points and considered as starting positions for edge tracking. All pixels whose values are higher than


$th_{\mathrm{low}}$ are marked as edge points. The proposed method employs three pairs of hysteresis thresholds, in order to determine three edge levels of different importance. This procedure resembles the observer's behavior when recognizing shapes in a scene, and allows attributing the same importance to all points that are perceived as belonging to the same contour, independently of the single gradient values. In Fig. 9 an example of the Canny edge detector output is shown for the image chest22. The Canny filter is applied to the original image, and the resulting edge map, edge(i, j), is used as a mask for the edge error computation. The edge mask may assume one of three possible positive values, indicating different edge strengths, while zero indicates flat areas. Each edge pixel is employed as the center of mass for selecting the neighborhood in which the error metric is computed. The local edge strength $\rho$ and direction $\theta$ are employed as the main contour characteristics. For each edge point, the difference between edge strength and direction in the original and compressed images is considered. With $\rho_1$, $\theta_1$ and $\rho_2$, $\theta_2$ the strength and direction of the original and compressed edges, respectively, both the absolute strength difference, $\Delta\rho = |\rho_1 - \rho_2|$, and the angular difference, $\Delta\theta = |\theta_1 - \theta_2|$, are normalized and combined in order to obtain the total edge


error as

$$\mathrm{Err}(i, j) = \frac{\Delta\theta}{\pi / 2} + \frac{\Delta\rho}{|\rho_1 + \rho_2|}. \tag{9}$$

The error factor is then the root mean square of the errors weighted by the corresponding edge strength values:

$$F_2 = \sqrt{\frac{1}{h \cdot w} \sum_{i=1}^{h} \sum_{j=1}^{w} \mathrm{edge}(i, j) \cdot \mathrm{Err}^2(i, j)}. \tag{10}$$
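A sketch of Eqs. (9) and (10) follows; here plain image gradients stand in for the local edge strength and direction, and the three-level hysteresis edge mask is assumed to be supplied (e.g. built from three Canny threshold pairs):

```python
import numpy as np

def edge_error_factor(orig, comp, edge_mask, eps=1e-6):
    """Edge error factor F2 per Eqs. (9)-(10); edge_mask holds the
    Canny-derived edge strengths (0 on flat areas), assumed precomputed."""
    gy1, gx1 = np.gradient(orig.astype(float))
    gy2, gx2 = np.gradient(comp.astype(float))
    rho1, rho2 = np.hypot(gx1, gy1), np.hypot(gx2, gy2)    # local edge strength
    th1, th2 = np.arctan2(gy1, gx1), np.arctan2(gy2, gx2)  # local edge direction
    d_theta = np.abs(th1 - th2)
    d_rho = np.abs(rho1 - rho2)
    # Eq. (9): normalized angular difference plus relative strength difference.
    err = d_theta / (np.pi / 2) + d_rho / (np.abs(rho1 + rho2) + eps)
    # Eq. (10): edge-strength-weighted RMS over the whole image plane.
    return float(np.sqrt(np.mean(edge_mask * err ** 2)))
```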

Fig. 9. Canny edge detector output applied to chest22 test image, with 3 hysteresis levels.


3.3. Visual impairments

The global distortion depends on the local visibility of each distorted image pixel. In order to measure this behavior, two local effects are taken into account: block similarity and luminance distortion. These effects are evaluated on each pixel's neighborhood (window) and averaged over the whole image. Typically, the window size is chosen so as to be visible under an angle of a few tenths of a degree from the assumed viewing distance [18]. For instance, the window size is about 5 × 5 pixels for a 512 × 512 pixel image viewed from a distance of 4 times its length. Furthermore, the distortion is computed by considering the threshold levels imposed by the luminance and contrast masking effects described in Section 2. For each original and compressed image pixel, the algorithm computes several parameters representative of luminance and contrast, evaluated through the mean value and the standard deviation of each pixel neighborhood [12,44,51,66]. Once the local distortion measures are derived, the global visual impairments factor is computed with a pooling algorithm over the entire image plane.

Let us consider $p_1$, $m_1$, $\sigma_1$ and $p_2$, $m_2$, $\sigma_2$ the array of image pixels, the average value and the standard deviation in the neighborhood of the original and compressed image, respectively. It is possible to connect these parameters with luminance and contrast levels: $m_1$ and $m_2$ represent the local perceived luminance of the two images, while $\sigma_1$ and $\sigma_2$ are related to the local contrast. In order to evaluate the block similarity distortion, several expressions are evaluated. Error and correlation measures are globally derived from the two images. The correlation between image neighborhoods is computed as

$$r = \frac{\sigma_{12}}{\sqrt{\sigma_1 \sigma_2}}, \tag{11}$$

where $\sigma_{12}$ is the co-variance between the considered blocks, defined as

$$\sigma_{12} = \sqrt{\frac{1}{n} \sum_{k=1}^{n} [p_1(k) - m_1] \cdot [p_2(k) - m_2]}. \tag{12}$$

Moreover, the algorithm computes the standard deviation, $\sigma_{\mathrm{err}}$, of the error array, $e = p_1 - p_2$, representing the difference between corresponding pixels within the selected neighborhood. The error standard deviation, $\sigma_{\mathrm{err}}$, is used as an estimate of the image noise, while the correlation, $r$, is proportional to the block similarity and takes values in the range $-1$ to $1$; large positive values represent good similarity. Thus, the block similarity distortion is computed through the standard deviations and correlation values. The estimated noise is compared with the original image contrast through the introduction of a visual threshold, $th$; those pixels verifying $\sigma_{\mathrm{err}} > th \cdot \sigma_1$ contribute to the similarity distortion factor, defined by

$$E_{\mathrm{sim}} = \frac{\sigma_{\mathrm{err}}}{\sigma_1 + \sigma_2} \cdot \frac{1 - r}{2}, \tag{13}$$

where the error factor $\sigma_{\mathrm{err}} / (\sigma_1 + \sigma_2)$ is multiplied by the similarity coefficient, $(1 - r)/2$.

The single error contribution is proportional to the estimated noise strength $\sigma_{\mathrm{err}}$ and inversely proportional to the sum of the image standard deviations, since the HVS masks higher noise values where high contrast is present. The luminance error is computed through the mean values, which represent the perceived local luminance. Through the introduction of a visibility threshold, $k$, the values verifying $|m_1 - m_2| > k$ are considered and normalized with respect to the maximum value, e.g. $\max = 255$ for an 8 bpp gray-level image:

$$E_{\mathrm{lum}} = \frac{|m_1 - m_2|}{\max}. \tag{14}$$

The pixel error is obtained as a linear combination of the block similarity distortion and the luminance error:

$$\mathrm{Err}(i, j) = \frac{w_1 E_{\mathrm{sim}} + w_2 E_{\mathrm{lum}}}{w_1 + w_2}. \tag{15}$$
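A per-neighborhood sketch of Eqs. (11)–(15) follows; the thresholds th and k and the weights w1, w2 are assumed values, and the radical of Eq. (12) is kept as printed, with negative covariances clamped to zero:

```python
import numpy as np

def pixel_impairment(p1, p2, th=1.0, k=2.0, w1=1.0, w2=1.0, max_val=255.0):
    """Local error Err for one window (Eqs. (11)-(15)); th, k, w1, w2 assumed."""
    p1, p2 = p1.astype(float).ravel(), p2.astype(float).ravel()
    m1, m2, s1, s2 = p1.mean(), p2.mean(), p1.std(), p2.std()
    cov = np.mean((p1 - m1) * (p2 - m2))
    s12 = np.sqrt(max(cov, 0.0))          # Eq. (12) as printed; negatives clamped
    r = s12 / np.sqrt(s1 * s2) if s1 * s2 > 0 else 1.0    # Eq. (11)
    s_err = (p1 - p2).std()               # estimated local noise
    # Eq. (13): block similarity distortion, gated by the visual threshold th.
    e_sim = (s_err / (s1 + s2)) * (1 - r) / 2 if s_err > th * s1 else 0.0
    # Eq. (14): luminance error, gated by the visibility threshold k.
    e_lum = abs(m1 - m2) / max_val if abs(m1 - m2) > k else 0.0
    return (w1 * e_sim + w2 * e_lum) / (w1 + w2)          # Eq. (15)
```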

Finally, the visual impairments factor is obtained by averaging $\mathrm{Err}(i, j)$ over the whole image plane [16]:

$$F_3 = \sqrt{\frac{1}{h \cdot w} \sum_{i=1}^{h} \sum_{j=1}^{w} \mathrm{Err}^2(i, j)}. \tag{16}$$

3.4. Global FR image quality index

The global quality index is obtained as a linear combination of the previous distortion factors, normalized to a MOS-equivalent scale range [1–5]. It is computed as a normalized weighted summation:

$$Q = 5 \left( 1 - k \, \frac{w_1 F_1 + w_2 F_2 + w_3 F_3}{w_1 + w_2 + w_3} \right). \tag{17}$$

The weighting factors have been set through a linear regression method in order to best fit subjective evaluations. The subjective ratings of the LIVE database [71] have been used. Half of the LIVE database has been considered as the training set, i.e. 168 images, of which 84 are JPEG and 84 JPEG2000 compressed images, derived from the originals: bikes, buildings, carnivaldolls, churchandcapitol, dancers, house, lighthouse, monarch, paintedhouse, plane, sailing1, sailing3, statue and woman. This portion of the experimental data set has subsequently been discarded for the validation of the proposed method. The remaining 176 images have been used for testing and evaluation (Section 4.2).


The derived weighting factors are:

$$w_1 = 1.5, \quad w_2 = 2.0, \quad w_3 = 5.0, \quad k = 4. \tag{18}$$
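With these weights, the global index of Eqs. (17)–(18) reduces to a few lines; the final clamp to [1, 5] is our assumption about the MOS-equivalent range:

```python
def global_quality_index(f1, f2, f3):
    """Global MOS-scale index Q per Eqs. (17)-(18)."""
    w1, w2, w3, k = 1.5, 2.0, 5.0, 4.0    # regression-fitted weights of Eq. (18)
    q = 5.0 * (1.0 - k * (w1 * f1 + w2 * f2 + w3 * f3) / (w1 + w2 + w3))
    return min(max(q, 1.0), 5.0)          # assumed clamp to the MOS range [1, 5]
```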

It must be noticed that, while the global index has been built to best fit human observer evaluations, the proposed quality metric may be applied to purposes other than HVS-based objective assessment. With this idea, the single factors (and other possible additions) may be combined differently depending on the application.

4. Results

The reliability of the proposed metric has been evaluated through different experimental tests, in order to prove the selectivity of the single factors (Section 4.1) and the performance of the global metric in terms of its correlation with subjective quality evaluations. The LIVE compressed image data set has been chosen as the experimental data set for evaluating the performance of the proposed system (Section 4.2). Half of the data set has been used as training set for parameter fitting (Section 3.4), while the other half has been used for validation. The quality metrics used as competing approaches are described in Section 4.3. Section 4.4 provides the validation results by comparing the


correlation results of the proposed method with those of the metrics described in the previous section.

4.1. Selectivity of the proposed factors

A selectivity test set has been used in order to assess the performance of the single factors versus different types of artifacts. JPEG coding and six noise effects (blurring, speckle, Gaussian, salt and pepper, stretching and shifting) have been considered from the LIVE database. The data set has been built so as to obtain the same MSE (225) for all corrupted images. In Fig. 10, details of the six distortion effects, applied to the Lena sample image, are shown. Experimental results are shown in Fig. 11 for Lena; similar behavior is observed for the other sample images within the chosen data set. The first factor shows high selectivity for the blockiness effect: high values are achieved only by JPEG encoded images. The second factor represents the edge fidelity: the higher the frequency losses and the stronger the distorted contours, the greater the second factor. In fact, high distortion measures are obtained for JPEG and blurring. The last factor represents global visual impairments and presents higher values when point-to-point errors

Fig. 10. Lena image corrupted by several artifact types with constant MSE.


Fig. 11. Distortion factors and global quality metric computed for the noise-distorted versions of Lena.

exceed the visibility threshold imposed by the HVS luminance and contrast masking effects. In principle, stretching and shifting should obtain low F3 distortion measures, since they do not strongly change the global quality perception level. Indeed, for the third factor, JPEG and blurring present the highest distortion levels; speckle, Gaussian and salt and pepper achieve mid-level distortion; and stretching and shifting obtain low distortion evaluations because, although the total image differences are high, the stretched and shifted gray levels remain under the visibility thresholds, so that their contribution to the F3 factor is low. The global quality measure, represented by a horizontal bold black bar, agrees strongly with subjective evaluations.

4.2. The LIVE database

Before describing the testing environment and the comparative results, it is necessary to illustrate the test image database used for the evaluation. The Laboratory for Image and Video Engineering (LIVE), together with the Department of Psychology at the University of Texas at Austin, has published a great number of subjective evaluations related to several

distorted images. The care spent in the development of the subjective tests, the choice of instrumentation and the evaluation of environmental factors contributed to the wide use of this database as a standard for image quality evaluation [71]. Twenty-nine high-resolution 24-bit/pixel RGB images (typically 768 × 512) were distorted using five distortion types: JPEG2000, JPEG, white noise in the RGB components, Gaussian blur, and transmission errors in the JPEG2000 bit stream using a fast-fading Rayleigh channel model. The JPEG compressed subset consists of 175 images, while the JPEG2000 subset consists of 169 images (compressed with the Kakadu JPEG2000 implementation, ver. 2.2). In both cases, the compression parameters were set so that for each image the perceptual quality roughly covered the entire quality range. Observers were asked to provide their perception of quality on a continuous linear scale that was divided into five equal regions marked with the adjectives "Bad", "Poor", "Fair", "Good" and "Excellent". More than 20 human observers rated each image. Each distortion type was evaluated by different subjects in different experiments using the


same equipment and viewing conditions. In this way, a total of 982 images, of which 203 were reference images, were evaluated by human subjects in seven experiments. The raw scores of each subject were converted to difference scores (between test and reference), then to Z-scores, and finally scaled and shifted to the full range (1–100). A Difference Mean Opinion Score (DMOS) value was thus computed for each distorted image. As already stated in Section 3.4, half of the LIVE database has been used as training set for the definition of the parameters and subsequently discarded during the validation phase. The original images included for validation are shown in Fig. 12.

4.3. Performance comparison

In order to compare the proposed quality index with other competitive methods on the same database, three alternative quality measures have been tested on the LIVE images: PSNR, SSIM [68] and VQM [64]. SSIM starts from the consideration that the main function of the HVS is to extract structural information from the viewing field, and produces a full-reference image and video metric for evaluating structural distortion (Structural SIMilarity). The method computes the SSIM metric on the two signals to be compared with a sliding-window approach, resulting in a quality map of the distorted image. Since the SSIM metric is defined in the range [0, 1], the overall quality value is taken as the average of the quality map. Slight modifications are considered to include chrominance components and video feature contributions, such as the presence of dark frames and large motion.

VQM is a full-reference video quality measurement method considered in the ITU-T J.144 recommendation [63]. It is based on four different steps: after the sampling and calibration phases, quality features are extracted from spatio-temporal sub-regions of both the original and processed streams; then, a set of quality parameters is derived by comparing features extracted from the processed video with features from the original video; pooling criteria are finally applied in order to obtain a particular evaluation model. In this comparison, we choose the general model [63], adapted so as to be applied to still image evaluation. This model consists of a linear combination of video quality parameters and produces


output values that range from zero (no perceived impairment) to approximately one (maximum perceived impairment). It is useful for measuring the perceptual effects of a wide range of impairments such as blurring, block distortion, jerky/unnatural motion, noise, and error blocks.

4.4. Validation criteria and results

This section is dedicated to the validation of the proposed method. The results of the proposed quality assessment model are compared with those obtained with PSNR, SSIM and VQM. Following the VQEG guidelines [2,3], the performance of a visual quality evaluation method may be modeled through three attributes: prediction accuracy, monotonicity and consistency. Several statistical measures have been proposed as validation criteria, among which the RMSE, the Pearson linear correlation index and the outlier ratio (OR) are reported in this work. Together, these metrics express the accuracy, monotonicity and consistency of the assessment model. In addition to these validation criteria, the Spearman rank order correlation factor has been included in order to evaluate the monotonicity between data clusters. The Spearman rank correlation measures the correlation between two sequences of values. DMOSP values have been derived from the proposed method, while the subjective DMOS values have been obtained directly from the LIVE database. Defining the difference between measured (subjective) and predicted (objective) DMOS as the absolute prediction error [3]:

$$P_{\mathrm{error}}(i) = \mathrm{DMOS}(i) - \mathrm{DMOS_P}(i), \tag{19}$$

the root mean square error of $P_{\mathrm{error}}$, representing a measure of the accuracy of the method, is computed as

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} P_{\mathrm{error}}^2(i)}, \tag{20}$$

where the index $i$ denotes the single test image. The Pearson correlation coefficient is calculated as

$$R_P = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{N} (X_i - \bar{X})^2 \cdot \sum_{i=1}^{N} (Y_i - \bar{Y})^2}}, \tag{21}$$

where $X_i$, $Y_i$ denote the subjective and objective scores, respectively, DMOS and DMOSP, and $N$ represents the total number of image samples considered in the analysis. The Pearson correlation


Fig. 12. Subset of original LIVE images used for validation. From top-left: building2, caps, cemetry, coinsinfountain, flowersonih35, lighthouse2, manfishing, ocean, parrots, rapids, sailing2, sailing4, stream, studentsculpture and womanhat.

factor is a measure of the index accuracy and monotonicity. The Spearman coefficient is also calculated using the DMOS and DMOSP scores. The two score sequences are ranked separately and the differences in rank are calculated at each position, $i$. The distance between the sequences is computed using


the following formula:

$$R_S = 1 - \frac{6 \sum_{i=1}^{n} \left( \mathrm{rank}(X_i) - \mathrm{rank}(Y_i) \right)^2}{n (n^2 - 1)}. \tag{22}$$
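These criteria map directly onto standard library calls; a minimal sketch using SciPy, whose pearsonr and spearmanr implement Eqs. (21) and (22):

```python
import numpy as np
from scipy import stats

def validation_metrics(dmos, dmos_p):
    """Accuracy and monotonicity criteria of Eqs. (19)-(22)."""
    p_error = dmos - dmos_p                                # Eq. (19)
    return {
        'RMSE': float(np.sqrt(np.mean(p_error ** 2))),     # Eq. (20)
        'Pearson': stats.pearsonr(dmos, dmos_p)[0],        # Eq. (21)
        'Spearman': stats.spearmanr(dmos, dmos_p)[0],      # Eq. (22)
    }
```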

The OR is generally considered an effective consistency attribute for an objective metric. It represents the ratio between "outlier points" and the total number of points $N$:

$$\mathrm{OR} = N_{\mathrm{outliers}} / N, \tag{23}$$

where an outlier is a point for which

$$|P_{\mathrm{error}}(i)| > 2 \cdot \sigma(\mathrm{DMOS}(i)) \tag{24}$$

and $\sigma(\mathrm{DMOS}(i))$ represents the standard deviation of the individual scores associated with the test image $i$. Individual scores are typically considered normally distributed, and $2\sigma$ is taken as a good threshold for defining an outlier point, representing the 95% confidence interval.

The computation of the statistical metrics has been initially performed directly on the output of the methods under comparison. Table 1 illustrates the results for Pearson, Spearman and RMSE for the JPEG, JPEG2000 and complete data sets. RMSE* equals the RMSE values of (20) normalized to the range [0, 1]. From Table 1 it results that the values of the Pearson and Spearman coefficients are significantly higher for the proposed method (FRI) than for the competing algorithms, except for the Spearman coefficient of SSIM on the complete verification data set. Regarding RMSE, FRI is last for the JPEG subset and second for the JPEG2000 and complete data sets. However, these placements need further attention. In fact, FRI's RMSE value is extremely close to those of SSIM and PSNR for the JPEG data set and almost identical to that of VQM for the JPEG2000 data set. For the complete data set, the best RMSE result is undoubtedly achieved by VQM. However, this


approach achieves significantly inferior results in terms of monotonicity, i.e. its Pearson and Spearman values are significantly worse. Following these remarks, it can be stated that, when the evaluation metrics are applied directly to the output data, the proposed method achieves the best global results and may be preferred to the competing approaches. SSIM constitutes a valid competitor, especially in the case of JPEG images.

Secondly, the non-linear correction suggested in the VQEG guidelines for subjective quality assessment has been considered [3]. It consists in the application of a non-linear regression technique to the method's output before the evaluation of the performance metrics. The following logistic function is adopted:

$$\mathrm{DMOS_P} = \frac{b_1}{1 + e^{-b_2 (VQR - b_3)}}, \tag{25}$$

where DMOSP indicates the predicted DMOS value and VQR is the output of the quality model. The non-linear fitting accounts for the fact that subjective evaluations often show a non-linear compression of the quality ratings at the extremes of the test range. It is then reasonable to adjust the performance metrics so that they are not influenced by this non-linear behavior. Tables 2 and 3 report the values of the considered performance metrics with non-linear regression for the JPEG, JPEG2000 and complete data sets. Fig. 13 illustrates the scatter plots for the proposed Full Reference Index model (a), PSNR (b), SSIM (c) and VQM (d). According to the VQEG recommendations [3], the RMSE is evaluated as

$$\mathrm{RMSE} = \sqrt{\frac{1}{N - d} \sum_{i=1}^{N} P_{\mathrm{error}}^2(i)}, \tag{26}$$

where $P_{\mathrm{error}}(i) = \mathrm{DMOS}(i) - \mathrm{DMOS_P}(i)$, $N$ is the total number of images used and $d$ is the number of degrees of freedom of the logistic function.
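The fitting step of Eq. (25) and the corrected RMSE of Eq. (26) can be sketched as follows; the starting point p0 is an assumption of ours:

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(vqr, b1, b2, b3):
    """VQEG logistic mapping of Eq. (25)."""
    return b1 / (1.0 + np.exp(-b2 * (vqr - b3)))

def fitted_rmse(vqr, dmos):
    """Fit Eq. (25), then return the d.o.f.-corrected RMSE of Eq. (26)."""
    p0 = [dmos.max(), 1.0, float(np.median(vqr))]   # assumed starting point
    params, _ = curve_fit(logistic, vqr, dmos, p0=p0, maxfev=10000)
    p_error = dmos - logistic(vqr, *params)
    d = len(params)                                  # three free parameters
    return float(np.sqrt(np.sum(p_error ** 2) / (len(dmos) - d)))
```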

Table 1
Model verification results

          JPEG subset                   JPEG2000 subset               Complete verification data set
Models    Pearson  Spearman  RMSE*     Pearson  Spearman  RMSE*      Pearson  Spearman  RMSE*
FRI       0.9075   0.9134    0.2755    0.9562   0.9507    0.1965     0.9248   0.9268    0.2406
PSNR      0.8126   0.8101    0.2687    0.8589   0.8618    0.2608     0.8315   0.8395    0.2649
SSIM      0.8482   0.9105    0.2703    0.9152   0.9396    0.2670     0.8821   0.9289    0.2702
VQM       0.7519   0.8066    0.2035    0.8167   0.8478    0.1935     0.7843   0.8342    0.1941


Table 2
Model verification results with non-linear regression (JPEG and JPEG2000 subsets)

          JPEG subset                             JPEG2000 subset
Models    Pearson  Spearman  RMSE*   OR          Pearson  Spearman  RMSE*   OR
FRI       0.9259   0.9134    0.2401  0.0659      0.9579   0.9507    0.1942  0.0000
PSNR      0.8177   0.8101    0.3659  0.1429      0.8724   0.8618    0.3307  0.1176
SSIM      0.9291   0.9105    0.2351  0.0440      0.9501   0.9396    0.2111  0.0353
VQM       0.8229   0.7549    0.3401  0.1556      0.8541   0.8478    0.3383  0.1071

Table 3
Model verification results with non-linear regression (complete data set)

Models    Pearson  Spearman  RMSE*   OR
FRI       0.9331   0.9268    0.2340  0.0398
PSNR      0.8417   0.8395    0.3513  0.1364
SSIM      0.9371   0.9289    0.2270  0.0227
VQM       0.8369   0.8342    0.3565  0.1379

RMSE* represents the [0, 1] normalization of the RMSE values defined in (26). It must be noticed that ORs have not been reported in Table 1, since the three competing methods do not have comparable and significant OR values without non-linear fitting: they have not been studied and fitted for the specific LIVE data set, and thus achieve much higher OR values than the proposed index. Tables 2 and 3 show the complete evaluation of the competing methods after non-linear fitting.

Also in the case of non-linear fitting, the verification results are globally in accordance with those deriving from the direct application of the approach (Table 1). However, a significant improvement can be noticed in the correlation and RMSE values of the SSIM index. This improvement is easily explained, since the behavior of the raw values of this method differs significantly from human judgment; SSIM is therefore the method that benefits most from the non-linear mapping. In this case, the difference between the values of FRI and SSIM is not statistically sufficient to decide on the superiority of one approach over the other [3], while both may be considered superior to PSNR and VQM. Notice that there are several significant outliers in the scatter plots for PSNR and VQM (Fig. 13), resulting in higher OR values, as reported

numerically in Table 3. VQM performance is globally lower than expected from a quality evaluation metric recommended by the ITU. An explanation could be that this measure has been specifically designed for video sequences and its performance decreases in the case of still pictures.

5. Conclusions

In this paper, a full-reference image quality index based on an effective HVS mathematical model has been presented. Images are pre-processed in order to take into account luminance masking and contrast sensitivity effects. The HVS model suggested in this work has been specifically studied for still images, but can easily be extended to video quality assessment by including temporal masking and integration effects. The evaluation of the distortion level is performed with a multi-factor approach and a pooling procedure: three distortion factors (blockiness, edge errors and visual impairments) are considered, and a global quality index is then derived. The multi-factor approach allows for the efficient evaluation of several distortion artifacts. Furthermore, this modular structure allows for easy substitution or addition of single distortion factors within the HVS-based framework, e.g. to provide more effective blockiness or blurring indexes, or to integrate the evaluation of ringing effects in wavelet-compressed images, or of quantization and motion artifacts in MPEG-x/H.26x broadcasting systems.

In brief, the proposed index achieves good average results in terms of correlation between measured quality values and subjective observations. When compared to other state-of-the-art approaches, its performance is often superior and otherwise similar. It thus represents an efficient and effective method for full-reference image quality assessment, while its modular framework may favor further improvements and extensions.


Fig. 13. Scatter plot for the differential mean opinion score (DMOS) versus objective models: (a) FRI, (b) PSNR, (c) SSIM and (d) VQM. Each point represents a different image within the considered test set.

References

[1] ITU-R Recommendation BT.500-11, Methodology for the subjective assessment of the quality of television pictures, ITU, Geneva, Switzerland, 1998.
[2] A.M. Rohaly, et al., Video Quality Experts Group: current results and future directions, in: Proceedings of the SPIE, vol. 4067, Perth, Australia, 2000, pp. 742–753.
[3] ITU-R Document 6Q/14, Final report from the Video Quality Experts Group on the validation of objective models of video quality assessment, Phase II (FR-TV2), September 2003, available at www.vqeg.org.
[4] K. Hosaka, A new picture quality evaluation method, in: Proceedings of the International Picture Coding Symposium PCS '86, Tokyo, Japan, April 1986, pp. 17–18.
[5] M. Miyahara, K. Kotani, V.R. Algazi, Objective picture quality scale (PQS) for image coding, IEEE Trans. Comm. 46 (9) (1998) 1215–1226.
[6] S. Winkler, C.J. van den Branden Lambrecht, M. Kunt, Vision and video: models and applications, in: C.J. van den Branden Lambrecht (Ed.), Vision Models and Applications to Image and Video Processing, Kluwer Academic Publishers, Dordrecht, 2001 (Chapter 10).
[7] J. Berts, A. Persson, Objective and subjective quality assessment of compressed digital video sequences, Final Thesis, Department of Signals and Systems, Chalmers University of Technology, Göteborg, Sweden, 1998.
[8] M.P. Eckert, A.P. Bradley, Perceptual quality metrics applied to still image compression, Signal Processing 70 (1998) 177–200.
[9] S. Winkler, Quality metric design: a closer look, in: Proceedings of the SPIE Human Vision and Electronic Imaging, vol. 3959, San Jose, CA, January 22–28, 2000, pp. 37–44.
[10] S. Winkler, A. Sharma, D. McNally, Perceptual video quality and blockiness metrics for multimedia streaming applications, in: Proceedings of the Fourth International Symposium on Wireless Personal Multimedia Communications, Aalborg, Denmark, September 9–12, 2001, pp. 553–556.
[11] A.B. Watson, DCT quantization matrices visually optimized for individual images, Proc. SPIE 1913-14 (1993) 202–216.
[12] Z. Wang, A.C. Bovik, A universal image quality index, IEEE Signal Process. Lett. 9 (3) (March 2002) 81–84.
[13] A.M. Eskicioglu, Quality measurement for monochrome compressed images in the past 25 years, in: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 4, Istanbul, Turkey, June 5–9, 2000, pp. 1907–1910.
[14] D. Marr, Vision: A Computational Investigation into the Human Representation and Processing of Visual Information, W.H. Freeman & Company, New York, 1982.
[15] T.N. Pappas, R.J. Safranek, Perceptual criteria for image quality evaluation, in: A.C. Bovik (Ed.), Handbook of Image and Video Processing, Academic Press, New York, 2000, pp. 669–684.
[16] H. de Ridder, Minkowski-metrics as a combination rule for digital-image-coding impairments, in: Proceedings of the SPIE, vol. 1666, San Jose, CA, 1992, pp. 16–26.
[17] Z. Yu, H.R. Wu, S. Winkler, T. Chen, Vision model based impairment metric to evaluate blocking artifacts in digital video, Proc. IEEE 90 (1) (January 2002) 154–169.
[18] B.G. Breitmeyer, Visual Masking: An Integrative Approach, Oxford University Press, New York, 1984.
[19] I. Hontsch, L.J. Karam, Adaptive image coding with perceptual distortion control, IEEE Trans. Image Process. 11 (3) (March 2002) 213–222.
[20] R. Rosenholtz, A.B. Watson, Perceptual adaptive JPEG coding, in: Proceedings of the IEEE International Conference on Image Processing, 1996, pp. 901–904.
[21] Z. Wang, A.C. Bovik, B.L. Evans, Blind measurement of blocking artifacts in images, in: Proceedings of the IEEE International Conference on Image Processing, vol. III, Vancouver, Canada, September 10–13, 2000, pp. 981–984.
[22] A. Paquet, Blind measurement of blocking artifacts in images, Project Report, University of British Columbia, Department of Electrical Engineering, December 18, 2000.
[23] S. Liu, A.C. Bovik, Efficient DCT-domain blind measurement and reduction of blocking artifacts, IEEE Trans. Circuits Systems Video Technol. (April 2001).
[24] M. Mocofan, R. Vasiu, Quality of MPEG coded video sequences, in: Proceedings of the Third COST 276 Workshop on Information and Knowledge Management for Integrated Media Communication, Budapest, Hungary, October 11–12, 2002.
[25] D. Melcher, S. Wolf, Objective measures for detecting digital tiling, Standards Project, National Telecommunications and Information Administration, Institute for Telecommunication Sciences, Boulder, CO, January 9, 1995.
[26] E. Peli, Contrast sensitivity function and image discrimination, J. Opt. Soc. Am. A 18 (2) (2001) 283–293.
[27] R.W.S. Chan, P. Goldsmith, Modeling and validation of a psychovisually based image quality evaluator for DCT-based compression, Signal Processing: Image Communication 17 (6) (July 2002) 485–495.
[28] Z. Wang, H.R. Sheikh, A.C. Bovik, No-reference perceptual quality assessment of JPEG compressed images, in: Proceedings of the IEEE International Conference on Image Processing, Rochester, New York, September 22–25, 2002, pp. 477–480.
[29] J. Yang, H. Choi, T. Kim, Noise estimation for blocking artifacts reduction in DCT coded images, IEEE Trans. Circuits Syst. Video Technol. 10 (7) (October 2000).
[30] C. Derviaux, F.X. Coudoux, M.G. Gazalet, P. Corlay, M. Gharbi, A post-processing technique for block effect elimination using a perceptual distortion measure, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Munich, Germany, April 21–24, 1997.
[31] T. Vlachos, Detection of blocking artifacts in compressed video, Electron. Lett. 36 (13) (2000) 1106–1108.
[32] H.R. Wu, M. Yuen, A generalized block-edge impairment metric for video coding, IEEE Signal Process. Lett. 4 (11) (1997) 317–320.
[33] D.D. Giusto, M. Perra, Estimating blockness distortion for performance evaluation of picture coding algorithms, in: Proceedings of the IEEE PACRIM '97, Victoria, BC, Canada, August 20–22, 1997.
[34] C. Perra, F. Massidda, D.D. Giusto, Image blockiness evaluation based on Sobel operator, in: Proceedings of the IEEE International Conference on Image Processing, Genova, September 11–14, 2005.
[35] A.J. Ahumada Jr., Computational image quality metrics: a review, SID Digest 24 (1993) 305–308.
[36] N. Graham, J. Nachmias, Detection of grating patterns containing two spatial frequencies: a comparison of single-channel and multiple-channel models, Vis. Res. 11 (1971) 251–259.
[37] M.H. Pirenne, Vision and the Eye (Science Paperbacks), Chapman & Hall, London, 1967.
[38] G.S. Brindley, Physiology of the Retina and Visual Pathway, Williams & Wilkins, Baltimore, 1970.
[39] G.T. Fechner, Elements of Psychophysics, vol. 1, Holt, Rinehart & Winston, New York, 1966 [1860].
[40] S. Stevens, To honor Fechner and repeal his law, Science (1961) 80–86.
[41] A.B. Watson, Digital Image and Human Vision, MIT Press, Cambridge, 1984.
[42] M. Yasuda, K. Hiwatashi, A model of retinal neural network and its spatio-temporal characteristics, Jpn. J. Med. Electron. Biol. Eng. (1968) 53–62.
[43] F.W. Campbell, J.J. Kulikowsky, J.Z. Levinson, The effect of orientation on the visual resolution of gratings, J. Physiol. 187 (1966) 427–436.
[44] G.E. Legge, J.M. Foley, Contrast masking in human vision, J. Opt. Soc. Am. 70 (1980).
[45] J. Canny, A computational approach to edge detection, IEEE Trans. Pattern Anal. Mach. Intell. 8 (6) (November 1986).
[46] G.E. Legge, A power law for contrast discrimination, Vis. Res. 21 (1981) 457–467.
[47] A.N. Netravali, B.G. Haskell, Digital Pictures, Plenum Press, New York, 1995 (Chapter 4).
[48] M. Yasuda, K. Hiwatashi, A model of retinal neural network and its spatio-temporal characteristics, Jpn. J. Med. Electron. Biol. Eng. (1968) 53–62.
[49] F.W. Campbell, J.J. Kulikowsky, J.Z. Levinson, The effect of orientation on the visual resolution of gratings, J. Physiol. 187 (1966) 427–436.
[50] G.C. Phillips, H.R. Wilson, Orientation bandwidth of spatial mechanisms measured by masking, J. Opt. Soc. Am. (1984) 226–231.
[51] T. Carney, S.A. Klein, Q. Hu, Visual masking near spatiotemporal edges, in: Proceedings of the SPIE, vol. 2657, San Jose, CA, 1996, pp. 393–402.
[52] ISO/IEC 15444-1, Information technology—JPEG 2000 image coding system—Part 1: Core coding system, 2001.
[53] A. Said, W.A. Pearlman, A new, fast, and efficient image codec based on set partitioning in hierarchical trees, IEEE Trans. Circuits Systems Video Technol. 6 (3) (1996).
[54] Z. Xiong, X. Wu, S. Cheng, J. Hua, Lossy-to-lossless compression of medical volumetric data using three-dimensional integer wavelet transforms, IEEE Trans. Med. Imaging 22 (3) (March 2003) 459–470.
[55] P. Schelkens, A. Munteanu, J. Barbarien, M. Galca, X. Giro-Nieto, J. Cornelis, Wavelet coding of volumetric medical datasets, IEEE Trans. Med. Imaging 22 (3) (March 2003) 441–458.
[56] T.H. Oh, R. Besar, Medical image compression using JPEG2000 and JPEG: a comparison study, J. Mech. Med. Biol. 2 (3&4) (2002) 313–328.
[57] G.C. Phillips, H.R. Wilson, Orientation bandwidth of spatial mechanisms measured by masking, J. Opt. Soc. Am. (1984) 226–231.
[58] P. Marziliano, F. Dufaux, S. Winkler, T. Ebrahimi, Perceptual blur and ringing metrics: application to JPEG2000, Signal Processing: Image Communication 19 (2) (February 2004) 163–172.
[59] S.A. Karunasekera, N.G. Kingsbury, A distortion measure for blocking artifacts in images based on human visual sensitivity, IEEE Trans. Image Process. 4 (6) (June 1995) 713–724.
[60] D.R. Fuhrmann, J.A. Baro, J.R. Cox Jr., Experimental evaluation of psychophysical distortion metrics for JPEG-encoded images, J. Electron. Imaging 4 (4) (October 1996) 397–406.
[61] A.M. Eskicioglu, P.S. Fisher, S. Chen, Image quality measures and their performance, IEEE Trans. Comm. 43 (12) (December 1995) 2959–2965.
[62] T. Eude, A. Mayache, An evaluation of quality metrics for compressed images based on human visual sensitivity, in:

[63]

[64] [65]

[66]

[67]

[68]

[69]

[71]

333

Proceedings of the Fourth International Conference on Signal Processing, vol. 1, September 1998, pp. 779–782. ITU-T J.144 (1998), Objective perceptual video quality measurement techniques for digital cable television in the presence of a full reference, March 2004. S. Wolf, M. Pinson, Video quality measurement techniques, NTIA Report 02-392, June 2002. Joint Video Team (JVT) (VCEG/MPEG), H.264/MPEG4 part 10, Official title: advanced video coding (AVC) international standard: December 2002, available on the net at http://www.chiariglione.org/mpeg/working_documents. htm#MPEG-4 A.M. Eskicioglu, P.S. Fisher, The variance of the difference image: an alternative quality measure, in: Proceedings of the Picture Coding Symposium, September 1994, pp. 88–91. Z. Wang, A.C. Bovik, H.R. Sheikh, E.P. Simoncelli, Image quality assessment: from error visibility to structural similarity, IEEE Trans. Image Process. 13 (4) (2004 April) 600–612. Z. Wang, L. Lu, A.C. Bovik, Video quality assessment based on structural distortion measurement, Signal Processing: Image Communications (special issue on Objective video quality metrics) 19 (2) (2004 February) 121–132. A. Toet, M.P. Lucassen, A new universal colour image fidelity metric, Displays 24 (4–5) (2003 December) 197–207. H.R. Sheikh, Z. Wang, L. Cormack, A.C. Bovik, LIVE Image Quality Assessment Database Release 2, http:// live.ece.utexas.edu/research/quality