Design and Evaluation of an Entirely Psychovisual-Based Coding Scheme

Journal of Visual Communication and Image Representation 12, 401–421 (2001) doi:10.1006/jvci.2001.0489, available online at http://www.idealibrary.com...



H. Senane, A. Saadane, and D. Barba
IRESTE/SEI La Chantrerie, C.P. 3003, Nantes Cedex 03, France
Received June 25, 1999; accepted May 14, 2001

In this paper a new psychovisual-based coding scheme is proposed. The analysis and quantization stages, the two main functions that determine the performance of a coding scheme, are based on properties of the human visual system. In the first stage, a filter bank decomposes images into subimages that are perceptually significant once a contrast transformation is applied. Analytic cortex filters are used because they provide an accurate model of visual receptive fields. The choice of subbands relies on psychovisual experiments conducted in the laboratory, which found that visual information is processed through 17 channels. In the second stage, the use of the local band-limited contrast yields very useful properties for quantization. Both scalar and vector quantization have been considered. In the latter case, the vector construction methodology preserves the main properties of the human visual system regarding the perception of quantization impairments and takes into account the masking effect due to interaction between subbands with the same radial frequency but different orientations. The vector components are the local band-limited contrasts $C_{i,j}(m,n)$, defined as the ratio between the luminance $L_{i,j}$ at the point $(m,n)$, which belongs to radial subband $i$ and angular sector $j$, and the average luminance at this location. Hence the vector dimension depends on the orientation selectivity of the chosen decomposition. The low-pass subband, which is nondirectional, is scalar quantized. A methodology for automatic subsampling matrix design was also developed. Performance has been evaluated on a set of images in terms of peak SNR, true bit rates, and visual quality. For the latter, no impairments are visible at a distance of four times the height of the high-quality TV monitor used. The SNRs are about 6 to 8 dB below those of classical subband image coding schemes producing the same visual quality. Another particularity of this approach, due to the use of the local band-limited contrast, lies in the structure of the reconstruction error image, which is found to be highly correlated with the structure of the original image. © 2001 Elsevier Science

I. INTRODUCTION

Data compression aims to reduce the number of bits by removing as much redundancy as possible through a change of representation space. It is of great interest for storage and transmission. Image storage is required for medical images, movies, satellite images, educational and business documents, and so on. Image transmission applications include broadcast television, remote sensing via satellite, video conferencing, computer communications, and so on. The method of compression depends on the nature of the data. Several techniques have been developed in the field of image coding. These include schemes based on the discrete cosine transform (DCT), fractals, wavelets, and subband filtering. To achieve high compression ratios while providing a high visual quality for the images reconstructed by these techniques, the properties of the human visual system (HVS) are often considered. In the JPEG standard [1], the image is first divided into blocks of 8 × 8 pixels. For each block, DCT coefficients are computed, then quantized according to a quantization matrix, and finally transmitted to the decoder, which performs the inverse operations. To avoid the blocking effects that affect the reconstructed images at low bit rates, psychovisual aspects may be taken into account either in the design of the quantization matrix [2] or in the definition of postprocessing algorithms [3]. Watson et al. [4] showed how HVS properties may be considered in the design of wavelet systems. The authors measured the visibility of discrete wavelet transform artifacts, and the thresholds obtained were used to design a quantization matrix such that the induced errors were not perceptible. In [5], O’Rourke and Stevenson proposed an HVS-based wavelet decomposition. A vector quantization based on HVS properties was used to allocate bits to the different subbands. HVS considerations have also been used in a subband coding context by Safranek and Johnston [6] and Van Dyck and Rajala [7]. It has been shown [8] that a full-polar frequency decomposition corresponds better to the HVS.
For this purpose subband-like coding schemes appear to be of particular interest because of their flexibility in partitioning the spectral plane while preserving spatial information. The goal of this paper is to present a new visual subband coding scheme in which both the decomposition and the vector quantization are based on HVS properties. For the decomposition, designed in the laboratory from psychophysical experiments, the radial and angular selectivities vary with radial frequency. For the vector quantization, the vector construction methodology takes into account the masking effects between subbands. Section II describes the visual decomposition used. Section III shows how the local band-limited contrast may be quantized without inducing perceptible impairments. To achieve a high compression ratio with an excellent visual quality of the coded image, vector quantization and a psychovisual lattice are presented. Section IV briefly presents the scheme of A. B. Watson and introduces our adaptive modifications. An automatic subsampling matrix design algorithm is also given. Finally, results are discussed that show particularities consistent with the human representation of visual information (structure of the error image).

II. HUMAN VISUAL SYSTEM PROPERTIES AND SPATIAL FREQUENCY CHANNELS

Several psychophysical studies have shown that the retinal image is likely to be processed in different frequency channels that are narrowly tuned around specific spatial frequencies and orientations. Other considerations, based on biology, also point to the processing of visual information through several frequency channels. The organization of striate cells in the cortex [9] is one of these arguments. In [10] it clearly appears that the HVS is polar separable in the frequency domain. Nevertheless, a great diversity concerning the number of channels


and their characteristics is found in the literature. For instance, in [8, 11, 12] a model with four spatial-frequency channels is proposed, whereas the model in [13] has six channels and that in [14] has 30 channels. The same diversity is found for the central frequencies and the radial and angular bandwidths of the channels. Among the well-known decompositions used in subband coding schemes, let us mention the hexagonal decomposition of Simoncelli and Adelson [15] and the directional decomposition of Bamberger and Smith [16]. However, the hexagonal decomposition does not allow a simultaneous maximum response in the vertical and horizontal directions, while the directional one permits a decomposition into $2^n$ angular subbands but does not allow radial decomposition. In the present state of the art, no method allows a fully polar-separable decomposition with critical subsampling. When this critical subsampling constraint is relaxed, several interesting decompositions are found in the literature (Navaro [17], Campell [18], Schlomot [19], and Watson [20]). Here we are concerned with such decompositions. The one presented is based on a broad synthesis of results from the literature [21–23], completed by numerous experimental studies [24]. The experiments conducted were based on the masking effect [25–27], under the assumption that two adjacent channels are weakly coupled: the masked signal has a much lower differential visibility threshold if its central frequency lies outside the channel to be characterized. Hence, a masking signal whose central frequency lies inside the band to be characterized is presented to the observer. A signal of small magnitude (called the masked signal), with frequency content centered around that of the masking signal, is added to the latter. The visibility threshold (the just noticeable magnitude) of the masked signal is measured at several frequencies. The resulting curve allows the bandwidth of the channel to be determined. The results obtained are summarized in Fig. 1. For this decomposition, three bandpass radial frequency channels are needed (coronas V, IV, and III), each of them being

FIG. 1. Spectrum decomposition in the HVS.


decomposed into angular sectors associated with an orientation selectivity of 30°, 30°, and 45°, respectively. Channels II and I have been merged into a nondirectional low-pass channel. This decomposition obviously resembles the most widely used one, Watson’s subband decomposition, as both model the peripheral parts of the HVS. For computational reasons, Watson intentionally constrained the bandwidths so that the cortex filters are all identical except for scaling and rotation. The bandwidths of Fig. 1 are used here as they are because they are more physiologically plausible and there are no real-time constraints.
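As a quick check of the channel count, the following sketch tallies the channels of Fig. 1 from the orientation selectivities quoted above. It assumes that the angular sectors of each corona tile a 180° half-plane, which is an assumption made here for illustration rather than a statement taken from the text; the dictionary and variable names are likewise illustrative.

```python
# Orientation selectivity (degrees) of each band-pass corona, as quoted in the text.
selectivity_deg = {"V": 30.0, "IV": 30.0, "III": 45.0}

# Assumption: the angular sectors of a corona tile a 180-degree half-plane.
channels_per_corona = {name: round(180.0 / width)
                       for name, width in selectivity_deg.items()}

# Coronas I and II are merged into a single nondirectional low-pass channel.
total_channels = sum(channels_per_corona.values()) + 1

print(channels_per_corona)
print(total_channels)
```

Under this assumption the tally agrees with the 17 channels mentioned in the abstract and with the vector dimensions (six, six, and four) used in Section III.3.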

III. PSYCHOVISUAL QUANTIZATION LAWS

III.1. Local Band-Limited Contrast

Quantizers are of great importance in a coding scheme because they directly determine its performance. Hence, a representation space for image data is needed that is consistent with the way the HVS perceives degradations. It is well known that direct quantization of luminance is not psychovisually meaningful and that contrast is more relevant. Although the physical contrast is well defined by Michelson’s formula [28], several definitions exist for complex signals (Pavel et al. [29], Rubien and Siegel [30], Hess et al. [31]). For the latter, an appropriate definition must treat the contrast of an image as a local quantity that depends on the spatial frequency content of the image at and around the location considered. Based on this consideration, Pelli [32] defined the local band-limited contrast as the ratio between the luminance of a pixel in the image filtered by a channel radially localized in the frequency domain and the luminance of the same pixel in the image filtered by a filter containing all frequencies below that channel. For a decomposition such as that of Fig. 1, where channels are localized not only radially but also angularly in the frequency domain, we have adapted this definition to take the directional components into account. Hence the contrast at point $(m,n)$ is defined as

$$C_{i,j}(m,n) = \frac{L_{i,j}(m,n)}{\sum_{k=0}^{i-1}\sum_{l=0}^{\mathrm{card}(l)-1} L^{i}_{k,l}(m,n)}, \qquad (1)$$

where $i$, $k$ are the indices of the radial bands and $j$, $l$ those of the angular bands; $\mathrm{card}(l)$ is the number of angular bands in the $k$th corona. $L_{i,j}(m,n)$ and $L^{i}_{k,l}(m,n)$ are given by

$$L_{i,j}(m,n) = F^{-1}\{\mathrm{Trunc}_i\{H^{1/2}_{i,j}(u,v)\cdot F(L_0(m,n))\}\}, \qquad (2)$$

$$L^{i}_{k,l}(m,n) = F^{-1}\{\mathrm{Trunc}_i\{H_{k,l}(u,v)\cdot F(L_0(m,n))\}\}, \qquad (3)$$

where $F$ represents the Fourier transform and $F^{-1}$ the inverse Fourier transform. The truncation operator $\mathrm{Trunc}_i$ keeps the signal only on the support of the $i$th radial band from which the subimage $L_{i,j}(m,n)$ is obtained. Hence one can see from these expressions that the sizes of $L_{i,j}(m,n)$ and $L^{i}_{k,l}(m,n)$ are identical. Figure 2 shows how they are obtained. Only $H^{1/2}_{i,j}(u,v)$ is used for $L_{i,j}(m,n)$ because the decoder will also filter $L_{i,j}(m,n)$ by $H^{1/2}_{i,j}(u,v)$ (see Sect. IV), whereas $L^{i}_{k,l}(m,n)$ needs $H_{k,l}(u,v)$ because


FIG. 2. Principle for obtaining $L^{i}_{k,l}(m,n)$ and $L_{i,j}(m,n)$.

of the relation

$$\sum_{k=0}^{i-1}\sum_{l=0}^{\mathrm{card}(l)-1} H_{k,l}(u,v) = M_i(u,v), \qquad (4)$$

where $M_i(u,v)$ is the Mesa filter (Sect. IV.2) [33]: a cylinder in the frequency domain, blurred with a gaussian and containing all frequencies below the $i$th corona. Quantizing the band-limited local contrast is very important because it partially accounts for the masking effect between radial channels. Furthermore, this quantity is the only one that characterizes the contrast simultaneously in the spatial and frequency domains.

III.2. Quantizer Construction

The design of quantizers involves specifying thresholds and levels that minimize both entropy and some measure of distortion. A usual approach is to fix the number of reconstruction levels and to determine the positions of the decision thresholds and reconstruction levels subject to minimizing a given distortion measure. For classical quantizers this distortion may be well measured by the mean square error [34],

$$\varepsilon^2 = \sum_{i=1}^{L}\int_{t_i}^{t_{i+1}} (u - r_i)^2\, P_u(u)\, du, \qquad (5)$$

where $L$ is the number of reconstruction levels, $t_i$ the decision thresholds, $r_i$ the reconstruction levels, and $P_u(u)$ the probability density of the input source. For psychovisual quantizers we are concerned with visual distortion. Indeed, for the human visual system, the same error is less detectable when it is superimposed on strong signals than when it is superimposed on weak ones (small values). In a previous study [35] a methodology to design psychovisual quantizers was developed. This methodology uses natural pictures, characterized by the local band-limited contrast, to evaluate the perception of degradations inside each channel. The use of complex signals is more realistic than the incremental (or decremental) patches used by Whittle [36] and by Kingdom and Moulden [37]. Another important point of this methodology concerns the choice of degradations. Using the degradations induced by the quantizers under construction, rather than noise or contrast increments, also seems more relevant, as explained above. Figure 3 shows the principle of the tests that measure the visibility of the degradations due to a quantizer at the output of a channel.
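To make the distortion measure of Eq. (5) concrete, here is a minimal numerical sketch that evaluates it with a midpoint rule. The function name `quantizer_mse`, the integration resolution, and the uniform test source are illustrative assumptions and are not taken from the paper.

```python
def quantizer_mse(thresholds, levels, pdf, steps_per_cell=2000):
    """Approximate Eq. (5) with a midpoint rule.

    thresholds: decision thresholds t_1 < ... < t_{L+1}
    levels:     reconstruction levels r_1 ... r_L
    pdf:        probability density P_u of the source (a callable)
    """
    if len(thresholds) != len(levels) + 1:
        raise ValueError("need exactly one more threshold than levels")
    total = 0.0
    for t_lo, t_hi, r in zip(thresholds[:-1], thresholds[1:], levels):
        h = (t_hi - t_lo) / steps_per_cell
        for k in range(steps_per_cell):
            u = t_lo + (k + 0.5) * h          # midpoint of the sub-interval
            total += (u - r) ** 2 * pdf(u) * h
    return total


# Illustrative case: a 4-level uniform quantizer on a source uniform over [0, 1).
num_levels = 4
step = 1.0 / num_levels
thresholds = [i * step for i in range(num_levels + 1)]
levels = [(i + 0.5) * step for i in range(num_levels)]
mse = quantizer_mse(thresholds, levels, lambda u: 1.0)
print(mse, step ** 2 / 12.0)
```

For this uniform source the result should approach the classical value of the squared step divided by 12, which gives a simple sanity check on the integration.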


FIG. 3. Principle of quantizer evaluation in terms of degradation visibility. After filtering, two pictures are presented to an observer (with a reference image), who has to decide which one is degraded (here picture A).

The filter $H^{1/2}_{i,j}(u,v)$, the same as in Fig. 2, restricts the input picture to the desired channel. The input picture is chosen so that its spectrum is rich inside the channel in which the quantizer is to be evaluated. The quantizer construction has been performed threshold by threshold and level by level. The results show that contrasts must always be uniformly quantized in order to achieve a just noticeable quantization noise (the quantization step $\Delta_{ij}$ depending on the visual subband considered). It was also shown that the masking effect between channels exists only between angularly adjacent channels. For radially adjacent subbands, the masking effect is already taken into account by the contrast definition. So only the masking effect along directional adjacency has to be considered.

III.3. Vector Quantization

Vector quantization consists of representing a set of data, arranged as a vector, by an index corresponding to a representative vector in a dictionary. Even if the components of the vector to be quantized are not correlated, this approach is advantageous compared to scalar quantization [38]. In our case vector quantization is used to integrate the masking effect between the angular components at the same spatial location. Because of the subsampling, the masking effect between two adjacent spatial locations in the same channel is very small. Furthermore, psychovisual experiments have shown that the visibility threshold is significantly increased only between two angularly adjacent channels. The vectors are constructed as follows. First, from the previous remark about orientational masking effects on the contrast quantization, we deduced preliminary vectors: their components are the contrasts $C_{ij}$ associated with the different orientations $j$ taken at the same location. So the dimension of the vectors depends on the orientation selectivity: six for coronas V and IV and four for corona III. The low-pass band is scalar quantized. We have chosen lattice vector quantization to quantize the vectors. It is simple and fast and requires no learning of the statistics of the input source. Results obtained with $D_n$ lattices are shown in [39]. In such lattices, only statistical dependences between angular channels


are exploited. The resulting average true bit rate is 0.7 bpp. To take advantage of the masking effect between angularly adjacent channels, however, another kind of lattice has to be proposed. The simplest law linking the visibility threshold of the contrast at a given location in a given channel to the contrast magnitude at the same spatial location in an adjacent channel is given by

$$s'_{i,j} = s_{i,j}\cdot\left(1 + \alpha_i\cdot\left(\frac{|c_{i,j+1}(m,n)|}{s_{i,j+1}} + \frac{|c_{i,j-1}(m,n)|}{s_{i,j-1}}\right)\right), \qquad (6)$$

where $s_{i,j}$ is the first quantization threshold of the given channel ($i$th corona and $j$th orientation), which has been determined experimentally, and $s'_{i,j}$ is the first quantization threshold in the same channel when the masking effect between two angularly adjacent channels is taken into account. $c_{i,j}(m,n)$ is the local band-limited contrast at the spatial location $(m,n)$ in channel $(i,j)$. The parameter $\alpha_i$, determined experimentally, gives the strength of the masking effect. A comparison between the corresponding lattice and the $Z^n$ and $D_n$ lattices is given in Fig. 4. Quantizing $c_{i,j}(m,n)$ with a quantization threshold $s'_{i,j}$ gives the same result as quantizing the perceived local band-limited contrast $c^{p}_{i,j}(m,n)$ with a quantization threshold $s_{i,j}$, where $c^{p}_{i,j}(m,n)$ is given by

$$c^{p}_{i,j}(m,n) = \frac{c_{i,j}(m,n)}{1 + \alpha_i\cdot\left(\dfrac{|c_{i,j+1}(m,n)|}{s_{i,j+1}} + \dfrac{|c_{i,j-1}(m,n)|}{s_{i,j-1}}\right)}.$$

Hence it is very easy to quantize a vector according to the psychovisual lattice. The first step converts, using the expression above, the components of the input vector into perceived local band-limited contrast components. The resulting vector is then quantized according to a $Z^n$ lattice. The inverse quantization step consists of recovering the local band-limited contrast by an inverse transformation. To identify this transformation, a matrix formulation of the problem has been adopted. In this formulation, the expression above becomes

$$C_i^{\mathrm{diag}}(m,n) = \left\{I_N + \alpha_i\cdot C_i^{\mathrm{diag}}(m,n)\cdot(M_+ + M_-)\right\}\left(C_i^{\mathrm{diag}\,p}(m,n)\right), \qquad (7)$$

FIG. 4. A graphical comparison between the Zn lattice (a), the Dn lattice (b), and the psychovisual lattice (c) in the case of vectors of dimension two.


where

$$C_i^{p}(m,n) = [c^{p}_{i,1}(m,n) \cdots c^{p}_{i,N}(m,n)]^T,$$

$$C_i(m,n) = [c_{i,1}(m,n) \cdots c_{i,N}(m,n)]^T,$$

with

• $N$ the number of orientations in the $i$th corona,
• $C_i^{\mathrm{diag}}(m,n)$ the $N \times N$ square matrix whose diagonal elements are the absolute values of the components of the vector $C_i(m,n)$, the other elements being zero,
• $C_i^{\mathrm{diag}\,p}(m,n)$ the $N \times N$ square matrix whose diagonal elements are the absolute values of the components of the vector $C_i^{p}(m,n)$, the other elements being zero,
• $M_+$ and $M_-$ given by

$$M_+ = \begin{pmatrix} 0 & 0 & \cdots & 0 & 1/s_{i,1} \\ 1/s_{i,2} & 0 & \cdots & 0 & 0 \\ 0 & 1/s_{i,3} & \ddots & \vdots & \vdots \\ \vdots & \ddots & \ddots & 0 & 0 \\ 0 & \cdots & 0 & 1/s_{i,N} & 0 \end{pmatrix}; \quad M_- = \begin{pmatrix} 0 & 1/s_{i,1} & 0 & \cdots & 0 \\ 0 & 0 & 1/s_{i,2} & \ddots & \vdots \\ \vdots & \vdots & \ddots & \ddots & 0 \\ 0 & 0 & \cdots & 0 & 1/s_{i,N-1} \\ 1/s_{i,N} & 0 & \cdots & 0 & 0 \end{pmatrix}. \qquad (8)$$

Because the absolute values of $c^{p}_{i,j}(m,n)$ are used, the sign of $c_{i,j}(m,n)$ has to be restored from that of $c^{p}_{i,j}(m,n)$. By using the expression (directly obtained from Eq. (7))

$$C_i^{T}(m,n) = [1 \ldots 1]\cdot C_i^{\mathrm{diag}\,p}(m,n)\cdot\left\{I_N - \alpha_i\cdot(M_+ + M_-)\cdot C_i^{\mathrm{diag}\,p}(m,n)\right\}^{-1} \qquad (9)$$

one can recover the quantized vector $C_i^{Q}(m,n) = [c^{Q}_{i,1}(m,n) \cdots c^{Q}_{i,N}(m,n)]^T$. In practice, $C_i^{p}(m,n)$ is deduced from Eq. (12) and $C_i^{Q}(m,n)$ from Eq. (8). The values $\alpha_i$ have been determined experimentally. Subjective quality assessment tests have been conducted with three observers and two images, Port and Lena. Table 1 gives the maximum values of $\alpha_i$ for each corona such that no impairments are visible on the reconstructed images.

TABLE 1
The Experimentally Determined αi for Each Corona

αII      αIII     αIV
0.005    0.07     0.05
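The psychovisual lattice quantization of Section III.3 can be sketched for a single contrast vector as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: orientation indices are treated as cyclic (as the corner entries of $M_+$ and $M_-$ suggest), quantization on the scaled $Z^n$ lattice is taken to be rounding to multiples of $s_{i,j}$, the inverse is obtained by solving the linear system on the magnitudes that is equivalent to Eq. (9), and all function names and the test vector are hypothetical. Only $\alpha = 0.07$ is taken from Table 1.

```python
import math


def masking_term(c, s, alpha, j):
    """alpha * (|c[j+1]|/s[j+1] + |c[j-1]|/s[j-1]) with cyclic orientation indices."""
    n = len(c)
    up, down = (j + 1) % n, (j - 1) % n
    return alpha * (abs(c[up]) / s[up] + abs(c[down]) / s[down])


def to_perceived(c, s, alpha):
    """Perceived contrasts c^p from contrasts c (expression following Eq. (6))."""
    return [c[j] / (1.0 + masking_term(c, s, alpha, j)) for j in range(len(c))]


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def from_perceived(cp, s, alpha):
    """Recover contrasts from perceived contrasts (linear system equivalent to Eq. (9))."""
    n = len(cp)
    a = [[0.0] * n for _ in range(n)]
    for j in range(n):
        a[j][j] += 1.0
        for k in ((j + 1) % n, (j - 1) % n):
            a[j][k] -= alpha * abs(cp[j]) / s[k]
    magnitudes = solve_linear(a, [abs(v) for v in cp])
    # The system works on absolute values; signs are restored from c^p.
    return [math.copysign(m, v) for m, v in zip(magnitudes, cp)]


def quantize(cp, s):
    """Assumed uniform quantization: round each component to a multiple of s[j]."""
    return [s[j] * round(cp[j] / s[j]) for j in range(len(cp))]


# Hypothetical four-orientation vector (dimension of corona III).
alpha = 0.07
s = [0.01, 0.01, 0.01, 0.01]
c = [0.10, -0.02, 0.05, 0.0]
cp = to_perceived(c, s, alpha)
restored = from_perceived(cp, s, alpha)     # no quantization: should give back c
decoded = from_perceived(quantize(cp, s), s, alpha)
print(cp, restored, decoded)
```

Without the quantization step the inverse transformation returns the original contrasts, which is a convenient way to check the forward and inverse formulas against each other.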


IV. IMAGE CODING SCHEME DESCRIPTION

IV.1. Introduction

The filters used to achieve the radial–angular decomposition of Fig. 1 are similar to the cortex filters of Watson [33]. These filters are defined as the product of the DOM filters, which characterize the radial selectivity, and the Fan filters, which provide the angular selectivity. In Section IV.2 we briefly recall the cortex filters and explain the parameters we used and the modifications we made. Subsampling matrices are generated by the process described in Section IV.3. Section IV.4 gives a complete view of the scheme in order to link the notions explained previously.

IV.2. Filters Design

The coronas of Fig. 1 are generated by the DOM filters, which are derived from the Mesa filters. The original Mesa filter is constructed by convolving a cylinder of radius $\beta/2$, $\beta \in [0, 1[$, in the spectral domain with a gaussian, so as to obtain a soft transition from the unity gain in the passband to the almost zero gain at the radial frequency 1/2:

$$M_0(u,v) = \left(\frac{\gamma}{f_0}\right)^2 \cdot e^{-\pi((r\cdot\gamma)/f_0)^2} \otimes \Pi\left(\frac{r}{2\cdot f_0}\right), \qquad (10)$$

where $r^2 = u^2 + v^2$ and $\Pi(r/(2\cdot f_0))$ is a rectangular pulse of unit height and width $2\cdot f_0$ centered at the origin. The softness of the transition is controlled by the parameter $\gamma$, as shown in Fig. 2 of [25]. This parameter is linked to the standard deviation of the gaussian by the relation

$$\sigma_0 = \frac{1}{\sqrt{2\pi}}\cdot\frac{f_0}{\gamma}, \qquad (11)$$

where $f_0$ is the radius of the cylinder, $f_0 = \beta/2$. Based on the trade-off between image quality and a minimum amount of aliasing after subsampling (see Sect. IV.3), we used $\beta = 0.91$ and $\gamma = 12$. The scaled Mesa filters defined by Watson are homothetic versions of the original Mesa filter $M_0(u,v)$:

$$M_k(u,v) = M_0(s_k\cdot u, s_k\cdot v). \qquad (12)$$

Watson set $s_k = 2^k$ for his octave decomposition of the spectrum. The scale factors $s_k$ for the spectral decomposition of Fig. 1 are given by $s_{V} = 1$ for corona V, $s_{IV} = 14.2/28.2$ for corona IV, $s_{III} = 5.7/28.2$ for corona III, and $s_{I+II} = 1.5/28.2$ for corona I+II. However, these values have to be adjusted to account for the subsampling to be performed. Indeed, the ratio between the number of pixels in the channel to be coded and the determinant of the applied subsampling matrix must be an integer. Based on the subsampling matrix design constraint (see Sect. IV.3) and the determinants of Table 4, the chosen $s_k$ values are given in Table 2.


TABLE 2
The Parameters sk Chosen to Realize the Decomposition

sI+II    sIII    sIV    sV
1/18     3/14    1/2    1
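As a small numerical illustration of the filter parameters, the sketch below compares the nominal scale factors quoted in the text with the values retained in Table 2 and evaluates Eq. (11) for $\beta = 0.91$ and $\gamma = 12$. The variable names are illustrative, and the relative deviation is simply one convenient way, assumed here, of expressing how far the adjusted values move from the nominal ones.

```python
import math
from fractions import Fraction

# Nominal scale factors quoted in the text (ratios of radial frequencies).
nominal = {"V": 1.0, "IV": 14.2 / 28.2, "III": 5.7 / 28.2, "I+II": 1.5 / 28.2}

# Values retained in Table 2 after adjustment for the subsampling constraint.
chosen = {"V": Fraction(1), "IV": Fraction(1, 2),
          "III": Fraction(3, 14), "I+II": Fraction(1, 18)}

# Relative deviation of each chosen value from its nominal value.
relative_shift = {name: abs(float(chosen[name]) - nominal[name]) / nominal[name]
                  for name in nominal}

# Standard deviation of the gaussian of the original Mesa filter, Eq. (11).
beta, gamma = 0.91, 12.0
f0 = beta / 2.0
sigma0 = f0 / (gamma * math.sqrt(2.0 * math.pi))

for name in nominal:
    print(name, float(chosen[name]), round(nominal[name], 4),
          round(relative_shift[name], 4))
print(sigma0)
```

With these numbers every adjusted scale factor stays within a few percent of its nominal value, the largest shift being that of corona III.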

The difference-of-Mesa filters, or DOM filters, generate coronas I+II, III, IV, and V (Table 1) and are obtained directly from the scaled Mesa filters:

$$\mathrm{DOM}_k(u,v) = M_k(u,v) - M_{k+1}(u,v). \qquad (13)$$

Now that we have the coronas, we need to construct the direction-selective filters, or Fan filters. For this, Watson took a two-dimensional oriented step function and convolved it with a gaussian. The horizontal blurred edge is given by

$$b_0(u,v) = \mathrm{step}(v) \otimes \left(\frac{1}{\sigma_b\sqrt{2\pi}}\right)\cdot e^{-v^2/(2\sigma_b^2)}. \qquad (14)$$

Hence

$$b_0(u,v) = F\left(\frac{v}{\sigma_b}\right), \qquad (15)$$

where $F(X)$ is the cumulative distribution function of the zero-mean gaussian with unit standard deviation. The previous expression generalizes to any direction $\theta$:

$$b_\theta(u,v) = F\left(\frac{(u,v)^T\cdot u_{\theta+\pi/2}}{\sigma_b}\right) = F\left(\frac{v\cos\theta - u\sin\theta}{\sigma_b}\right). \qquad (16)$$

In fact, for a given $\sigma_0$ of $M_0$ ($\sigma_0 = \beta\cdot(2\gamma\cdot\sqrt{2\pi})^{-1}$, from Eq. (11) with $f_0 = \beta/2$), the corresponding standard deviation for $M_k$ is $\sigma_k = \sigma_0\cdot s_k$. For the $\sigma_b$ parameter of the Fan filters, Watson took $\sigma_b = (\sigma_k + \sigma_{k+1})/2$, which depends on the corona. To obtain a soft transition along the radial edges of the Fan filters, we preferred to adapt this $\sigma_b$ parameter to the radial frequency and took

$$\sigma_b(f) = \sigma_0\cdot\frac{f}{f_0} = \frac{f}{\gamma\cdot\sqrt{2\pi}}, \qquad (17)$$

where $f = (u,v)^T\cdot u_\theta$ represents the radial frequency. This modification leads to

$$b_\theta(u,v) = F\left(\frac{(u,v)^T\cdot u_{\theta+\pi/2}}{(u,v)^T\cdot u_\theta}\cdot\gamma\cdot\sqrt{2\pi}\right). \qquad (18)$$

Thus

$$b_\theta(u,v) = F\left(\frac{v\cos\theta - u\sin\theta}{u\cos\theta + v\sin\theta}\cdot\gamma\cdot\sqrt{2\pi}\right). \qquad (19)$$
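The blurred edge of Eq. (19) is straightforward to evaluate, since $F$ is the standard normal distribution function and can be written with the error function. The sketch below is only an illustration: the function name `blurred_edge` is hypothetical, $\gamma = 12$ is the value quoted above, and the handling of points where the denominator vanishes (an infinite argument, and the value 0.5 at the origin) is a convention assumed here, not something specified in the text.

```python
import math


def blurred_edge(u, v, theta_deg, gamma=12.0):
    """Evaluate b_theta(u, v) of Eq. (19); F is the standard normal CDF."""
    theta = math.radians(theta_deg)
    across = v * math.cos(theta) - u * math.sin(theta)   # component across the edge
    along = u * math.cos(theta) + v * math.sin(theta)    # component along direction theta
    if along == 0.0:
        if across == 0.0:
            return 0.5                  # origin: value undefined, 0.5 assumed here
        x = math.copysign(math.inf, across)
    else:
        x = across / along * gamma * math.sqrt(2.0 * math.pi)
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


# On the edge itself the response is one half; it saturates quickly on either side.
on_edge = blurred_edge(1.0, 0.0, 0.0)
above = blurred_edge(1.0, 0.2, 0.0)
below = blurred_edge(1.0, -0.2, 0.0)
mirrored = blurred_edge(-1.0, -0.2, 0.0)   # same ratio as (1.0, 0.2)
print(on_edge, above, below, mirrored)
```

Because the argument of $F$ depends only on the ratio of the two components, the edge takes the same value at $(u,v)$ and $(-u,-v)$, which matches the point symmetry of the spectrum magnitude of a real image.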


FIG. 5. Analytical filter obtained by multiplying a DOM filter with a Fan filter.

Direction-selective filters are readily obtained from the oriented edges by

$$g_k(u,v) = b_{\theta_k}(u,v)\cdot(1 - b_{\theta_{k+1}}(u,v)),$$
$$g_D(u,v) = b_{\theta_D}(u,v)\cdot b_{\theta_0}(u,v), \qquad (20)$$

where $D$ is the number of directions. Multiplying a DOM filter by a Fan filter gives an analytical filter corresponding to a given channel, as shown in Fig. 5. Table 3 gives the different $\theta_k$, in degrees, for each corona. For a filter localized in a given channel at the $k$th radial frequency and $l$th direction we have

$$f_{k,l}(u,v) = \mathrm{DOM}_k(u,v)\cdot g^{k}_{l}(u,v), \qquad (21)$$

where $\{g^{k}_{l},\ l \in [0, D^k]\}$ is the set of direction-selective filters for the $k$th corona defined by (16). Finally, this construction ensures the property

$$\sum_{k=0}^{i}\sum_{l=0}^{D^k} M_{k,l}(u,v) = M_i(u,v). \qquad (22)$$

Hence, once the reconstruction has been performed, a proportion of almost $\pi/4$ of the spectrum is recovered.

TABLE 3
Orientations (in Degrees) Used inside Each Corona to Construct the Fan Filters; Corona I+II Is Not Direction Selective

Corona   θ0      θ1     θ2     θ3      θ4      θ5    θ6
I+II     0
III      −22.5   22.5   67.5   112.5   157.5
IV       −15     15     45     75      105     135   165
V        −15     15     45     75      105     135   165

IV.3. Subsampling Matrix Design

Once we have the spectrum of the image in a given channel, the number of samples can be reduced, first by limiting the support of the spectrum to its highest frequencies and second by decimation in the spatial domain [20]. We describe here how decimation is performed in order to reduce the number of samples as much as possible while minimizing the aliasing. Given an integer matrix $\Lambda$, the downsampled signal $v(n)$ is defined from the input signal $x(n)$ by

$$v(n) = x(\Lambda n). \qquad (23)$$

This process reduces the number of samples by a factor of $|\det(\Lambda)|$. The Fourier transform of $v(n)$ is [40, 41]

$$V(\vec{\omega}) = \frac{1}{M}\sum_{l=0}^{M-1} X\left((\Lambda^T)^{-1}\cdot(\vec{\omega} - 2\pi k_l)\right), \qquad (24)$$

where $k_l$, $l \in [0, M-1]$, are $M = |\det(\Lambda)|$ distinct coset vectors, i.e., a minimum set of vectors such that $Z^2$ (the integer lattice of points) is covered by the union of the lattices generated by $\Lambda$ shifted by $k_l$. One way to construct the coset vectors is to find the $|\det(\Lambda)|$ vectors $n$ belonging to $[0, 1[ \times [0, 1[$ such that $\Lambda\cdot n \in Z^2$ [40]. The upsampled output $y(n)$ is defined from the input $v(n)$ by

$$y(n) = \begin{cases} v[\Lambda^{-1}\cdot n] & \text{if } \Lambda^{-1}\cdot n \in Z^2 \\ 0 & \text{otherwise.} \end{cases} \qquad (25)$$

This process increases the number of samples by a factor of $|\det(\Lambda)|$. In the Fourier domain one gets

$$Y(\vec{\omega}) = V(\Lambda^T \vec{\omega}). \qquad (26)$$

Therefore, when cascading a downsampler with an upsampler, as shown in Fig. 6, we get

$$Y(\vec{\omega}) = \frac{1}{M}\sum_{l=0}^{M-1} X\left(\vec{\omega} - 2\pi(\Lambda^T)^{-1} k_l\right). \qquad (27)$$

Hence $Y(\vec{\omega})$ consists of the original $X(\vec{\omega})$ plus replicas of it positioned according to the basis vectors given by $(\Lambda^T)^{-1}$. Now our goal is to find, for each channel, the appropriate subsampling (equivalently, upsampling) matrix: its determinant has to be maximized while the aliasing is minimized. For this, consider two matrices $\Lambda_1$ and $\Lambda_2$ applied independently at the output of a given channel. If a Dirac pulse is used as the input signal, the output of a channel is its impulse response $x(n)$ in the spatial domain or, equivalently, its spectrum $X(\vec{\omega})$ in the Fourier domain. $Y_1(\vec{\omega})$ is the spectrum of $x(n)$ downsampled and then upsampled by $\Lambda_1$, and $Y_2(\vec{\omega})$ is the spectrum of $x(n)$ downsampled and then

FIG. 6. A downsampler followed by an upsampler.


upsampled by $\Lambda_2$. Let $D$ denote the square domain $D = [-\pi, \pi[ \times [-\pi, \pi[$. Then one can easily show with relation (23) that

$$\int_D Y_1(\vec{\omega})\, d\omega = \int_D Y_2(\vec{\omega})\, d\omega = \int_D X(\vec{\omega})\, d\omega. \qquad (28)$$

If $Y_2(\vec{\omega})$ creates more aliasing than $Y_1(\vec{\omega})$, then from the elementary inequality $(a + b)^2 > a^2 + b^2$ for $a, b > 0$, and from the fact that $X(\vec{\omega}) \in R^+$ in the aliasing region and $X(\vec{\omega}) \in R$ elsewhere (by construction), we see that

$$\frac{1}{M}\cdot\int_D X^2(\vec{\omega})\, d\omega \le \int_D Y_1^2(\vec{\omega})\, d\omega < \int_D Y_2^2(\vec{\omega})\, d\omega. \qquad (29)$$

And therefore, in the spatial domain, by the Parseval relation,

$$\sum |y_2(n)|^2 > \sum |y_1(n)|^2 \ge \frac{1}{M}\cdot\sum |x(n)|^2. \qquad (30)$$

Hence the quantity

$$\rho = \frac{M\cdot\sum |y(n)|^2}{\sum |x(n)|^2} \qquad (31)$$

is the criterion chosen to evaluate the quality of a matrix in terms of aliasing. This quantity allows comparison between matrices having different determinants, and once the spatial description of the filter, $x(n)$, has been calculated, different downsampling matrices can be evaluated quickly and easily because the criterion $\rho$ has a spatial formulation. In fact, for a given image size, not all determinants are allowed: the ratio between the number of pixels and $|\det(\Lambda)|$ must be an integer. Our results concerning the maximum determinant and the downsampling matrices are given in Tables 4 and 5. Figure 7 shows examples of the replication resulting from a downsampling and upsampling process for some channels. In fact, two matrices $\Lambda_1$ and $\Lambda_2$ have exactly the same behavior if there exists an integer matrix $E$ such that $|\det(E)| = 1$ and $\Lambda_1 = E\cdot\Lambda_2$. So we choose the matrices whose coefficients have the smallest magnitudes. Furthermore, when possible, we use symmetries to deduce one matrix from another. For example, in corona V, the directions are deduced from each other by exchanging the x and y coordinates. Now that each component of the scheme has been described, we can present the entire scheme.
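The spatial form of the criterion $\rho$ in Eq. (31) can be sketched as follows. After downsampling and upsampling by $\Lambda$, $y(n)$ equals $x(n)$ on the lattice generated by $\Lambda$ and is zero elsewhere (Eqs. (23) and (25)), so only a lattice membership test is needed. The function names, the 6 × 6 test signals, and the particular matrix (an integer matrix of determinant 6, the value Table 4 lists for corona III) are assumptions chosen for illustration.

```python
def det2(lam):
    """Determinant of a 2x2 integer matrix given as nested lists."""
    return lam[0][0] * lam[1][1] - lam[0][1] * lam[1][0]


def on_lattice(lam, n1, n2):
    """True if (n1, n2) = lam * k for some integer vector k."""
    d = det2(lam)
    # adj(lam) * n must be divisible by det(lam) component-wise.
    k1 = lam[1][1] * n1 - lam[0][1] * n2
    k2 = -lam[1][0] * n1 + lam[0][0] * n2
    return k1 % d == 0 and k2 % d == 0


def aliasing_criterion(x, lam):
    """rho = M * sum|y|^2 / sum|x|^2, with y = x on the lattice and 0 elsewhere."""
    m = abs(det2(lam))
    kept = sum(x[n1][n2] ** 2
               for n1 in range(len(x))
               for n2 in range(len(x[0]))
               if on_lattice(lam, n1, n2))
    total = sum(value ** 2 for row in x for value in row)
    return m * kept / total


lam = [[2, -2], [1, 2]]                      # illustrative matrix, |det| = 6
size = 6

# A constant response (pure DC, no aliasing) versus a single impulse (full band).
flat = [[1.0] * size for _ in range(size)]
impulse = [[0.0] * size for _ in range(size)]
impulse[0][0] = 1.0

rho_flat = aliasing_criterion(flat, lam)
rho_impulse = aliasing_criterion(impulse, lam)
print(rho_flat, rho_impulse)
```

In line with Eq. (30), the criterion reaches its lower bound of 1 for the alias-free constant response and rises to $M$ for the impulse, whose replicas overlap completely.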

TABLE 4
Maximum Determinant Obtained in Each Corona

Corona        I+II   III   IV   V
Determinant   1      6     9    12


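TABLE 5
Downsampling Matrixes

Corona III
direction 1: $\frac{1}{6}\cdot\begin{pmatrix} -1 & 2 \\ 2 & 2 \end{pmatrix}$    direction 2: $\frac{1}{6}\cdot\begin{pmatrix} 2 & -2 \\ 1 & 2 \end{pmatrix}$
direction 3: $\frac{1}{6}\cdot\begin{pmatrix} 2 & 2 \\ -1 & 2 \end{pmatrix}$    direction 4: $\frac{1}{6}\cdot\begin{pmatrix} -2 & 2 \\ 1 & 2 \end{pmatrix}$

Corona IV
direction 2: $\frac{1}{12}\cdot\begin{pmatrix} 0 & 3 \\ -3 & 1 \end{pmatrix}$    direction 5: $\frac{1}{12}\cdot\begin{pmatrix} 3 & -1 \\ 0 & 3 \end{pmatrix}$

Corona V
direction 2: $\frac{1}{12}\cdot\begin{pmatrix} 3 & -3 \\ 2 & 2 \end{pmatrix}$    direction 5: $\frac{1}{12}\cdot\begin{pmatrix} -2 & -2 \\ 3 & -3 \end{pmatrix}$

Coronas IV and V, directions 1, 3, 4, and 6 (prefactor 1/12): the assignment of matrix entries to directions cannot be recovered from this copy. The surviving values, in the order in which they appear, are: 0 −3 −3 0; 3 1 1 3; −1 3; 2 3; −3 2; −3 −2; −3 1 0 3 0 3; 3 −1; 2 3; 2 −3; −3 −3 2 2.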

IV.4. Overview of the Scheme

The overall block diagrams of the coder and the decoder are shown in Figs. 8 and 9, respectively. Quantizing the local band-limited contrast rather than the luminance itself requires a particular procedure. It was shown in Section III that the local band-limited

FIG. 7. Effect of a downsampling and upsampling process with our matrix for different channels in Fourier domain. (In each downsampled corona, only two directions are shown because others can be deduced by symmetries.)


FIG. 8. Image coding scheme.

contrast may be written as: Ci, j (m, n) =

L i, j (m, n) L i, j (m, n)

.

(32)

The denominator corresponds to the local average of the signal relative to the ith corona. Hence, to quantize the pixels belonging to a given subband, a coding scheme based on the local band-limited contrast needs to reconstruct all the subbands having radial frequencies lower than those of the considered one. To satisfy this constraint, the coder has to construct the low frequencies progressively for each newly coded corona. The information has to be transmitted in the following order:

• Quantization, coding, and transmission of the luminances of the lowest-frequency channels (corona I+II).
• Quantization, coding, and transmission of the contrast (real part and imaginary part) for each directional channel of corona III. Adding this corona to the low frequencies (corona I+II) gives the low frequencies relative to corona IV, after adaptation of the spectrum support by zero padding. These low frequencies are used on the decoder side to recover luminances from the received contrasts.
• Repetition of the last step with coronas IV and V. For the latter, summation with its relative low frequencies yields the decoded luminance spectrum of the picture.

FIG. 9. Image decoding scheme.

Notice that, as in a DPCM (MICD) coding scheme where the prediction is calculated from received data, the contrasts are calculated from the decoded low frequencies relative to the corona being transmitted.

Based on these considerations, the coder may be summarized as follows. The input gray levels are converted to real luminances by a nonlinear transform (photometric measurements were used to model the nonlinearity of the screen). A Fourier transform followed by the filtering described above is then applied. On each spectral subband obtained, an inverse Fourier transform and the appropriate subsampling are performed. After the computation of the local band-limited contrast, the quantization laws of Section III are applied. At the reconstruction stage, the symmetrical operations are performed. The reconstructed image in the frequency domain is obtained by adding the different spectral subband images. After an inverse FFT and the luminance-to-gray-level conversion, the spatial image is displayed on the monitor screen. Note that, in order to take into account the degradations caused by the quantization process, the images used to compute the local band-limited contrast in the coder are those available on the decoder side.
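The closed-loop ordering described above can be sketched on a toy one-dimensional signal. This is a simplified illustration under several assumptions that are not in the paper: a single real-valued channel per corona (no directions, no real and imaginary parts), no subsampling or zero padding, strictly positive local means, and a hypothetical luminance step `low_step` for the low band. The function names are illustrative only.

```python
def quantize_uniform(value, threshold):
    """Uniform quantizer with reconstruction levels 2 * k * threshold, k integer."""
    step = 2.0 * threshold
    return step * round(value / step)


def encode_decode(low_band, coronas, thresholds, low_step=1.0):
    """Closed-loop coding of a low band followed by coronas of rising frequency.

    low_band   : luminance samples of the lowest-frequency band (corona I+II)
    coronas    : list of luminance sample lists, lowest radial frequency first
    thresholds : first quantization threshold of each corona (contrast units)
    low_step   : hypothetical luminance quantization step for the low band
    Returns the decoded luminance samples.
    """
    # The low band is quantized on luminance directly.
    decoded_low = [low_step * round(v / low_step) for v in low_band]
    for band, threshold in zip(coronas, thresholds):
        decoded_band = []
        for lum, local_mean in zip(band, decoded_low):
            # Contrast of Eq. (32), computed against the decoded low frequencies.
            contrast = lum / local_mean
            q_contrast = quantize_uniform(contrast, threshold)
            # The decoder recovers luminance from the received contrast.
            decoded_band.append(q_contrast * local_mean)
        # The decoded corona joins the low frequencies of the next corona.
        decoded_low = [a + b for a, b in zip(decoded_low, decoded_band)]
    return decoded_low
```

Because the denominator is always the decoded signal, coder and decoder compute identical contrasts, which is the property the DPCM analogy refers to.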

V. RESULTS

The performance of this psychovisual subband image coder has been evaluated on a set of well-known images. In the case of scalar quantization, the measured linear laws have been applied with the quantization steps shown in Table 6.

TABLE 6
First Quantization Threshold of Contrast in Each Channel

Corona            V.1     V.2     V.3     V.4     V.5     V.6
First threshold   0.04    0.06    0.06    0.04    0.06    0.06

Corona            IV.1    IV.2    IV.3    IV.4    IV.5    IV.6
First threshold   0.043   0.064   0.064   0.043   0.064   0.064

Corona            III.1   III.2   III.3   III.4
First threshold   0.048   0.057   0.048   0.057


In fact we have measured thresholds only for the horizontal and first oblique directions of each corona (subbands V.1, IV.1, III.1, V.2, IV.2, and III.2). We have assumed that, within a corona, the human visual system has the same acuity in the vertical direction as in the horizontal one, and the same acuity for the remaining directions as in direction 2. Recall that these values were obtained with an average screen luminance of 14 cd/m² and at an observation distance of six times the height of the screen. To take into account both the stimulus size and the background luminance variations, two investigations have been carried out.

First, the quantization step values have been modified according to Ricco's law [42], which states that the visibility threshold decreases with the stimulus area: the product of the differential visibility threshold and the stimulus area is constant. After a spectral limitation by a factor κ and the inverse Fourier transform, the support of the corresponding filter impulse response increases by the factor κ. Hence if the filter H(u, v) used in the coding scheme has an impulse response support (area) of S, then the filter H(λu, λv) has an impulse response support of S/λ². Examples of how the thresholds are modified are given below for each corona:

For corona V, λ = 1 ⇒ the first threshold of subband V.1 becomes 0.04/1 = 0.04.
For corona IV, λ = 2² ⇒ the first threshold of subband IV.1 becomes 0.043/4 ≈ 0.01.
For corona III, λ = (14/3)² ≈ 21.7 ⇒ the first decision threshold becomes 0.048/21.7 ≈ 0.0022.

Image quality is excellent when using such thresholds. For corona III, however, the quality remains excellent when the first quantization threshold is increased by a factor of 2.3. An explanation is that Ricco's law is not valid for a variation as large as 21.7 in the stimulus size.
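The arithmetic of the three examples above can be reproduced with a short sketch. The divisors 1, 2², and (14/3)² and the relaxation factor 2.3 are the values quoted in the text; the function and variable names are hypothetical.

```python
def ricco_scaled_threshold(measured_threshold, area_factor):
    """Threshold times area is constant, so a larger support divides the threshold."""
    return measured_threshold / area_factor


# First thresholds measured for the horizontal direction (Table 6).
measured = {"V.1": 0.04, "IV.1": 0.043, "III.1": 0.048}

# Divisors quoted in the text for each corona.
area_factor = {"V.1": 1.0, "IV.1": 2.0 ** 2, "III.1": (14.0 / 3.0) ** 2}

scaled = {name: ricco_scaled_threshold(measured[name], area_factor[name])
          for name in measured}

# Corona III threshold relaxed by the empirical factor 2.3 reported in the text.
relaxed_iii = 2.3 * scaled["III.1"]
```

The scaled values round to 0.04, 0.01, and 0.0022, and the relaxed corona III threshold rounds to 0.005.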
In fact, when the stimulus area becomes too large, the differential visibility threshold no longer depends on the stimulus area [30]. With these new thresholds (V.1: 0.04, IV.1: 0.01, and III.1: 0.005), we obtained the results of Table 7 in terms of entropy and SNR. One can see from Table 7 that the contribution of corona IV to the global entropy is, paradoxically, higher than that of corona V. Three factors explain this: (1) the downsampling matrix determinant is higher in corona V than in corona IV, (2) the quantization steps in corona IV are finer than those of corona V, and (3) the signal activity is higher in corona IV than in corona V.

TABLE 7
Performances of the Scheme Using Threshold of Table 6

Picture    I+II    III     IV      V       entropy   S.N.R
Port       0.027   0.268   0.687   0.165   1.147     36.00
Chimney    0.028   0.268   0.516   0.085   0.896     37.42
Clown      0.026   0.324   0.757   0.125   1.233     36.99
Couple     0.027   0.288   0.629   0.197   1.141     35.34
Enfa       0.027   0.271   0.669   0.119   1.086     34.85
Lena       0.028   0.279   0.625   0.340   1.270     35.39

Note. For each corona, the contribution to the global entropy is reported.

Visual quality is very good, but on moving closer to the screen one can see that areas of low luminance are free of defects, whereas areas of high luminance show some perceptible impairments. This is because the tests, as indicated, were carried out with a given background luminance of 14 cd/m². The goal of the second investigation is to take this parameter into account. Different models of the visibility threshold as a function of background luminance have been proposed in the literature, such as Moon and Spencer's law (which does not take the spatial frequency into account) or that of Lamming (given for a constant bandwidth). We therefore propose to quantize the following quantity, which is homogeneous to the local contrast, takes the local mean luminance into account, and avoids the observed defect:

\[ C'_{i,j}(m,n) = \frac{L_{i,j}(m,n)}{L_0 \left( \dfrac{\overline{L}_{i,j}(m,n)}{L_0} \right)^{\alpha}}, \qquad (33) \]

where L0 = 14 cd/m², α = 0.7 is a good trade-off, and i (respectively j) is the radial (respectively angular) index of the channel. If we keep the thresholds \( \mathrm{Threshold}^{L_0}_{i,j} \) measured for a background luminance of L0 = 14 cd/m² in a given channel, then the quantization law in this channel is given by

\[ C^{Q}_{i,j}(m,n) = 2 \cdot k \cdot \mathrm{Threshold}^{L_0}_{i,j} \cdot \left( \frac{\overline{L}_{i,j}(m,n)}{L_0} \right)^{\alpha - 1}, \quad k \in \mathbb{Z}. \qquad (34) \]

The new threshold of quantization is then

\[ \mathrm{Threshold}_{i,j} = \mathrm{Threshold}^{L_0}_{i,j} \cdot \left( \frac{\overline{L}_{i,j}(m,n)}{L_0} \right)^{\alpha - 1}. \qquad (35) \]

TABLE 8
Performances of the Scheme with an Excellent Visual Quality Due to the Adaptation of the Quantization Threshold to the Local Luminance Mean

Picture    I+II     III     IV      V       entropy   M.O.S
Port       0.0165   0.278   0.724   0.160   1.179     5
Chimney    0.0174   0.273   0.537   0.082   0.910     4.8
Clown      0.0170   0.311   0.679   0.093   1.100     4.8
Couple     0.0167   0.287   0.625   0.205   1.135     4.6
Toys       0.0168   0.281   0.720   0.109   1.126     4.8
Lena       0.0169   0.277   0.591   0.255   1.140     5

Note. For each corona, the contribution to the global entropy is reported.
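The luminance-adapted quantization law can be sketched as below. The constants L0 = 14 cd/m² and α = 0.7 are those given in the text; the function names are hypothetical, and the local mean luminance is assumed to be strictly positive and already available from the decoded low frequencies.

```python
L0 = 14.0     # cd/m^2, background luminance used for the threshold measurements
ALPHA = 0.7   # trade-off value reported in the text


def adapted_threshold(base_threshold, local_mean, l0=L0, alpha=ALPHA):
    """Eq. (35): base threshold scaled by (local_mean / L0) ** (alpha - 1)."""
    return base_threshold * (local_mean / l0) ** (alpha - 1.0)


def quantize_contrast(contrast, base_threshold, local_mean, l0=L0, alpha=ALPHA):
    """Eq. (34): nearest reconstruction level 2 * k * adapted threshold, k integer."""
    step = 2.0 * adapted_threshold(base_threshold, local_mean, l0, alpha)
    k = round(contrast / step)
    return k * step
```

Since α − 1 is negative, the threshold grows for local means below L0 and shrinks above it, so bright areas are quantized more finely.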

TABLE 9
Performances of the Scheme with Vector Quantization

Picture    I+II     III     IV      V       entropy   M.O.S
Port       0.0165   0.086   0.241   0.192   0.47      4.6
Chimney    0.0174   0.066   0.186   0.185   0.34      4.4
Clown      0.0170   0.079   0.186   0.175   0.33      4.4
Couple     0.0167   0.082   0.199   0.207   0.42      4.2
Iba        0.0174   0.087   0.282   0.244   0.579     4.6
Toys       0.0166   0.077   0.208   0.193   0.4       4.4
Lena       0.0165   0.063   0.169   0.203   0.47      4.6


FIG. 10. The laplacian-like structure of the error image is consistent with the HVS description. On the left is the original image and on the right the error image.

One can see from Eq. (35) that for a local mean luminance smaller than L0 the quantization law is coarser, and for a local mean luminance greater than L0 it is finer. With this modification we get excellent results. The corresponding performance of the coding scheme is given in Table 8 in terms of entropy and mean opinion score (M.O.S.). To obtain the M.O.S., subjective tests based on CCIR Recommendation 500-3 [43] were conducted with five observers. The double stimulus quality scale method (DSQSM) associated with the five-grade quality scale was used.

In the case in which vector quantization is associated with a psychovisual lattice (Sect. IV), the true bit rates obtained are given in Table 9. These true bit rates are reached despite an increase in the number of samples to encode (about 40% more samples). Indeed, because the angular and radial selectivities do not have a parallelepipedic shape, critical subsampling matrices cannot be found. At these entropies, the image quality given by the MOS values remains much better than that of conventional JPEG coding.

An interesting a posteriori aspect of the scheme comes from the spatial structure of the error image (see Fig. 10). On uniform areas the error is low, but near transitions one can see large errors. Hence the error image has a Laplacian-like structure. This is consistent with many descriptions of the human visual system concerning the perception of degradations.

VI. CONCLUSION

In this paper a psychovisual-based coding scheme is presented. We used a subband-like scheme based on Watson's. This scheme allows great flexibility concerning the characteristics of the channels. These channels have decreasing relative bandwidth and decreasing angular bandwidth with increasing radial frequency. Concerning quantization, the use of the local band-limited contrast gave very interesting results because of the linearity of the resulting quantizers and because the radial masking effect is taken into account in this definition of contrast. The vector quantization integrates the masking effect between angularly adjacent components, which significantly improves the total bit rate of the coding scheme. The chosen scheme does not permit critical subsampling, but with the chosen channels and the subsampling matrix design procedure shown here, we reached an oversampling factor of 1.4.


Finally, the scheme exhibits excellent visual quality with an entropy of approximately one bit per pixel in the case of scalar quantization and about 0.4 bpp in the case of vector quantization.

REFERENCES

1. W. B. Pennebaker and J. L. Mitchell, JPEG Still Image Data Compression Standard, Van Nostrand Reinhold, New York, 1993.
2. A. B. Watson, DCTune: A technique for visual optimization of DCT quantization matrices for individual images, in Soc. Inform. Display Digest Technical Papers 24, 1993, 946–949.
3. F. X. Coudoux, M. Gazalet, and P. Corlay, Reduction of blocking effect in DCT coded images based on a visual perception criterion, Signal Process. Image Comm. 11, 1998, 179–186.
4. A. B. Watson, G. Y. Yang, J. A. Solomon, and J. Villasenor, Visibility of wavelet quantization noise, IEEE Trans. Image Process. 6, 1997, 1164–1175.
5. T. P. O'Rourke and R. L. Stevenson, Human visual system based wavelet decomposition for image compression, J. Visual Comm. Image Representation 6, 1995, 109–121.
6. R. J. Safranek and J. P. Johnston, A perceptually tuned sub-band image coder with image dependent quantization and post quantization data compression, in Proc. IEEE ICASSP, Glasgow, Scotland, Vol. 3, pp. 1945–1948, 1989.
7. R. E. Van Dyck and S. A. Rajala, Subband/VQ coding in perceptually uniform color spaces, in Proc. IEEE ICASSP, San Francisco, CA, Vol. 3, pp. 237–240, 1992.
8. D. J. Heeger and J. Nachmias, A computer model of human retinal visual processing: effect of compressive nonlinearity on spatial frequency filters, in Proc. Conf. on Pattern Recognition, Montreal, Canada, pp. 1281–1287, 1984.
9. F. L. Kooi, R. L. De Valois, and E. Switkes, Spatial localisation across channels, Vision Res. 31, 1987, 1627–1631.
10. R. M. Shapley and P. Lennie, Spatial frequency analysis in the visual system, Annual Review of Neuroscience, Vol. 8, pp. 547–583, Annual Reviews, Palo Alto, CA, 1985.
11. M. G. Harris, The perception of moving stimuli: a model of spatio-temporal coding in human vision, Vision Res. 26, 1986, 1281–1287.
12. D. Yager, P. Kramer, M. Shaw, and N. Graham, Detection and identification of spatial frequency: model and data, Vision Res. 24, 1991, 1067–1072.
13. E. T. Davis, P. Kramer, and D. Yager, Shifts in perceived spatial frequency of low-contrast stimuli: data and theory, J. Opt. Soc. Amer. 3, 1986, 1189–1202.
14. C. Zetsche and G. Hauske, Multiple channel model for the prediction of subjective image quality, SPIE Human Vision Visual Process. Digital Display 1077, 1989, 209–216.
15. E. R. Simoncelli and E. H. Adelson, Non-separable extensions of quadrature mirror filters to multiple dimensions, Proc. IEEE 78, 1990, 652–664.
16. R. H. Bamberger and M. J. T. Smith, A filter bank for directional decomposition of images: Theory and design, IEEE Trans. Signal Process. 40, 4, 1992, 882–893.
17. R. Navarro and A. Tabernero, Gaussian wavelet transform: two alternative fast implementations for images, Multidimensional Systems Signal Process. 2, 1991, 421–436.
18. T. G. Campbell, T. R. Reed, and M. Kunt, An orthogonal image transform based on QMF filters, in (J. Torres, E. Mosgraw, and M. A. Lagunas, Eds.), Signal Processing V: Theories and Applications, Proceedings of EUSIPCO '90, Barcelona, Spain, September 18–21, pp. 877–880, Springer-Verlag, Berlin/New York, 1990.
19. E. Shlomot, Y. Y. Zeevi, and W. A. Pearlman, The importance of spatial frequency and orientation in image decomposition and coding, in Visual Communications and Image Processing II, Vol. 845, pp. 152–158, SPIE, Bellingham, WA, 1987.
20. A. B. Watson, Efficiency of a model human image code, J. Opt. Soc. Amer. 4, 1987, 2401–2417.
21. S. J. Anderson and D. C. Burr, Spatial summation properties of directionally selective mechanisms in human vision, J. Opt. Soc. Amer. A 8, 8, 1991, 1330–1339.
22. D. C. Burr and S. A. Wijesundra, Orientation discrimination depends on spatial frequency, Vision Res. 31, 1991, 1449–1452.
23. S. J. Anderson, D. C. Burr, and M. C. Morrone, Two dimensional spatial and spatial frequency selectivity of motion sensitive mechanisms in human vision, J. Opt. Soc. Amer. A 8, 1991, 1340–1351.
24. A. Saadane, D. Barba, and H. Senane, The estimation of visual bandwidths and their impact in image decomposition and coding, Proc. SPIE 2094, 1993, 1508–1515.
25. A. M. Derrington and G. B. Henning, Some observations on the masking effect of 2D stimuli, Vision Res. 29, 1989, 241–246.
26. M. E. Perkins and M. S. Landy, Nonadditivity of masking by narrow-band noise, Vision Res. 31, 1991, 1053–1065.
27. J. Nachmias and B. E. Rogowitz, Masking by spatially-modulated gratings, Vision Res. 23, 1983, 1621–1629.
28. A. A. Michelson, Studies in Optics, University of Chicago Press, Chicago, IL, 1927.
29. M. Pavel, G. Sperling, T. Riedl, and A. Vanderbeek, Limits of visual communication: the effect of signal-to-noise ratio on the intelligibility of American Sign Language, J. Opt. Soc. Amer. A 4, 1987, 2355–2365.
30. G. S. Rubien and K. Siegel, Recognition of low-pass filtered faces and letters, Invest. Ophthalmol. Vis. Sci., Suppl. 25, 1984, 635–642.
31. R. F. Hess, A. Bradley, and L. Piotrowsky, Contrast-coding in amblyopia, I, Differences in the neural basis of human amblyopia, Proc. Roy. Soc. London, Ser. B 217, 1983, 309–330.
32. E. Peli, Contrast in complex images, J. Opt. Soc. Amer. A 7, 1990, 2032–2040.
33. A. B. Watson, The cortex transform: rapid computation of simulated neural images, Comput. Vision Graphics Image Process. 39, 1987, 311–327.
34. A. K. Jain, Fundamentals of Digital Image Processing, Prentice Hall, Englewood Cliffs, NJ, 1989.
35. A. Saadane, H. Senane, and D. Barba, Visual coding: Design of psychovisual quantizers, J. Visual Comm. Image Representation 9, 1998, 381–391.
36. P. Whittle, Increments and decrements: luminance discrimination, Vision Res. 26, 1986, 1677–1691.
37. F. Kingdom and B. Moulden, A model for contrast discrimination with incremental and decremental test patches, Vision Res. 31, 1991, 851–858.
38. P. Zador, Asymptotic quantization error of continuous signals and their quantization dimension, IEEE Trans. Inform. Theory 28, 1982, 139–149.
39. H. Senane, A. Saadane, and D. Barba, Image coding in the context of a psychovisual image representation with vector quantization, in I.C.I.P., Washington D.C., October 23–26, 1995.
40. P. P. Vaidyanathan, Multirate Systems and Filter Banks, Prentice Hall, Englewood Cliffs, NJ, 1993.
41. A. N. Akansu and R. A. Haddad, Multiresolution Signal Decomposition, Academic Press, San Diego, 1992.
42. V. Haese-Coat, Visibilité des dégradations sur textures: application au codage des images par système M.I.C.D. à quantification adaptative, Thèse de doctorat troisième cycle, spécialité traitement de l'information, INSA de RENNES, Ma 1987.
43. CCIR, Method for the Subjective Assessment of the Quality of Television Pictures, Recommendations and Reports of the CCIR, Recommendation 500-3, Vol. XI, Part 1, 1986.
