A vector quantization scheme using prequantizers of human visual effects

Signal Processing: Image Communication 12 (1998) 13-21

Chun-Hung Kuo (a,*), Chang-Fuu Chen (b)

(a) Department of Electrical Engineering, Tatung Institute of Technology, 40 Chung-Shan North Rd., Sec. 3, Taipei, Taiwan 10451, ROC
(b) Department of Computer and Communication Engineering, National Institute of Technology at Kaohsiung, Taiwan, ROC

Received 16 January 1996

Abstract

In this paper, a new human visual system, which considers Weber's law, the spatial masking effect, and the Mach band effect in the gray-level domain, is applied to the vector quantization (VQ) scheme. Since the new human visual system only requires looking up a ROM table to prequantize the image, performing a transformation from the gray level to the index of the prequantizer, it simplifies the VQ process and reduces the computation. The simulation results indicate that vector quantization with the new human visual system outperforms both the conventional VQ and the classified VQ, in both the bit-rate and the quality of the reconstructed images. © 1998 Elsevier Science B.V. All rights reserved.

Keywords: Vector quantization (VQ); Prequantizer; Human visual effect (HVE); Image coding

1. Introduction

The techniques of image compression in the spatial domain mainly include differential pulse code modulation (DPCM) [7], block truncation coding (BTC) [3], and vector quantization (VQ) [4,5,10,13]. The DPCM and BTC are two methods aimed at high-quality compression; therefore, their compression ratios are not high. The VQ, shown in Fig. 1, is a popular method for achieving a high compression ratio in the spatial domain. The codebook, which consists of codevectors, is established at both the transmitter and the receiver for the VQ scheme, and each codevector in the codebook is a block of

*Corresponding author.

0923-5965/98/$19.00 © 1998 Elsevier Science B.V. All rights reserved.
PII S0923-5965(97)00032-5

N × N pixels. At the transmitter, a block of the image is compared against each codevector of the codebook to find the codevector closest to the block in terms of minimum mean square error. Then, the index of that codevector in the codebook is transmitted to the receiver. At the receiver, the codevector corresponding to this index is used as the reconstructed block to rebuild the image. If the number of codevectors in a codebook is 256 and the block size for VQ is 4 × 4 pixels, the bit-rate is 0.5 bits per pixel (bpp); that is, the compression ratio is 16. Though the VQ has a simple receiver structure and a very high compression ratio, the quality of the reconstructed image is not high and it generates the block effect. The reason why the conventional VQ generates the


Fig. 1. The structure of the conventional VQ.

block effect is that images dissimilar to those in the training set may not be well represented by the codevectors in the codebook [11]. Recently, the human visual system (HVS) has been widely utilized in image compression. The idea of the technique is to find the information that is not sensitive to the human eyes, so that this information does not need to be transmitted. The modulation transfer function (MTF) [1,2,6,11,12] is the most famous human visual system model and considers the sensitivity of the human eyes to spatial frequency. The MTF has been widely used in transform coding [12] because it works as a prefilter on the transform coefficients to obtain a lower bit-rate. However, the MTF is not suitable for the VQ because it requires performing the discrete cosine transform (DCT) to obtain the transform coefficients, so the computational complexity of the VQ increases drastically. Therefore, it is necessary to develop a human visual system in the spatial domain for the VQ, supporting a high compression ratio without complex computation. Weber's law, the visual masking effect, and the Mach band effect [6,11] are three famous human visual effects in the spatial domain. These human visual effects show the relation between the

human eyes and the luminance of the light. Recently, Weber's law has been modeled in the gray-level domain, and it can improve the compression ratio of the DPCM considerably [8]. In [8], the model of Weber's law only requires looking up a table to prequantize the original image; therefore, its implementation only requires a read-only memory (ROM), so the realization is much simpler than that of the MTF. The advantage of prequantizing an image is that it increases the correlation between pixels considerably at low computational complexity. This high correlation can increase the compression ratio significantly. Since the VQ compresses in blocks, the visual masking and Mach band effects can also be utilized, in addition to Weber's law, to improve the compression ratio, because both effects consider the sensitivity of the human eyes over a smaller region. In this paper, Weber's law, the visual masking effect, and the Mach band effect are modeled in the gray-level domain to propose an entire human visual system in the spatial domain. The proposed system improves the VQ technique, and simulation results indicate that the proposed VQ can reduce the block effect and increase the image quality considerably at the same compression ratio.
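The full-search VQ described above can be sketched in a few lines. This is a minimal illustrative implementation, not the authors' code; the function and variable names are my own, and the random codebook stands in for a trained one.

```python
import numpy as np

def vq_encode(block, codebook):
    """Return the index of the codevector closest to `block` under
    mean squared error (full search over the codebook)."""
    errors = np.mean((codebook - block.reshape(1, -1)) ** 2, axis=1)
    return int(np.argmin(errors))

rng = np.random.default_rng(0)
codebook = rng.integers(0, 256, size=(256, 16)).astype(float)  # 256 flattened 4x4 codevectors
block = rng.integers(0, 256, size=(4, 4)).astype(float)
idx = vq_encode(block, codebook)            # the 8-bit index is all that is transmitted
decoded = codebook[idx].reshape(4, 4)       # receiver side: a single table lookup
bpp = np.log2(len(codebook)) / block.size   # 8 bits / 16 pixels = 0.5 bpp
```

With 256 codevectors and 4 × 4 blocks of 8-bit pixels, this reproduces the 0.5 bpp figure (compression ratio 16) quoted in the introduction.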


2. The human visual model

Weber's law, the visual masking effect, and the Mach band effect describe the human visual response in the spatial domain. Weber's law considers the sensitivity of the human eyes to a luminance under a background. Experiments indicate that the sensitivity of the human eyes to a luminance is a logarithmic function [11]. In an image system, the luminance is expressed as a gray level. Let x be a gray level from 0 to 255 and μ be the mean of a region in an image, expressing the background of the region; then Weber's law in the gray-level system can be modeled as follows [8].

For the bright background (μ > 128),

C(x) = ln[c_I(c_U − x)/(c_U(127.5 − (x − c_I)))],   0 ≤ x ≤ 128,
     = ln[(x − c_L)(x − c_U)/(c_I(255 − x))],       128 < x ≤ 255,    (1)

where

c_I = 127.5/2,    (2a)
c_U = (128 − e^(−k))/(1 − e^(−k)),    (2b)
c_L = 128/(1 − e^(−k)),    (2c)

and the parameter k is defined as

k = 2.5/(1 + e^((255−μ)/50)).    (2d)

Similarly, for the dark background (μ < 128),

C(x) = ln[c_I c_L/((127.5 − (x − c_I))(c_U − x))],   0 ≤ x ≤ 128,
     = ln[(255 − c_U)(x − c_L)/(c_I(x − c_U))],      128 ≤ x ≤ 255,    (3)

where

c_U = (128 − e^k)/(1 − e^k),    (4a)
c_L = −128e^k/(1 − e^k),    (4b)

and the parameter k is defined as

k = 2.5/(1 + e^(μ/25)).    (4c)

Fig. 2. The result of the quantization for the contrast function C(x) (horizontal axis: gray level).

Thereafter, the contrast function C(x), modeled by Weber's law, is quantized uniformly into n levels, as shown in Fig. 2. The uniform quantization expresses the same sensitivity of the human eyes at each quantization level under the background of the image; that is, the set of gray levels in each quantization level has the same contrast for the human eyes under the background, and the difference between the maximum and minimum gray levels in the set is called the visual threshold [11]. For decoding, a median luminance in each quantization level is used to represent the gray levels in the set for that quantization level. At the transmitter, the gray level is transformed to the index of the quantization level for transmission, and the indexes are decoded to the representative gray level of the quantization level at the receiver. For a specified background, the transformation between the gray level and the index of the corresponding quantization level can be easily implemented by looking up a ROM table of size 256. The number of possible backgrounds, i.e. means of the image, is 256 for an 8-bit gray level; therefore, 256 ROM tables would be required for 256 different contrast functions. By experiment, a slight background error hardly influences the quality of the image; therefore, we can quantize the mean suitably to reduce the number of tables. In this paper, the mean is quantized uniformly to 16 levels, so that only 16 tables are established with the contrast function C(x) for these 16 different backgrounds.
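The table-building step above can be sketched as follows. This is an illustrative construction of the encode/decode lookup tables, assuming only the mechanics described in the text (uniform quantization of C(x), median-gray-level decoding); the logarithmic `contrast` lambda is a stand-in, since the paper's C(x) additionally depends on the background mean μ.

```python
import numpy as np

def build_prequantizer(contrast, n_levels):
    """Build encode/decode lookup tables that uniformly quantize the
    contrast function C(x) into n_levels, as described in the text.
    `contrast` maps gray levels 0..255 to contrast values."""
    x = np.arange(256)
    c = contrast(x)
    # Uniform quantization of the contrast range into n_levels bins.
    edges = np.linspace(c.min(), c.max(), n_levels + 1)
    encode = np.clip(np.digitize(c, edges[1:-1]), 0, n_levels - 1)
    # Decoder: represent each bin by the median gray level it contains
    # (empty bins, possible for a steep contrast curve, fall back to 0).
    decode = np.array([int(np.median(x[encode == q])) if np.any(encode == q)
                       else 0 for q in range(n_levels)])
    return encode, decode

# Stand-in logarithmic contrast function, 16 quantization levels.
encode, decode = build_prequantizer(lambda g: np.log1p(g.astype(float)), 16)
pixel = 200
index = encode[pixel]            # transmitted index (4 bits for 16 levels)
reconstructed = decode[index]    # gray level used at the receiver
```

In the paper's scheme, one such 256-entry table is burned into ROM per (background, class) pair, so encoding a pixel costs a single lookup.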


If an image is divided into nonoverlapping regions, the number of quantization levels can be chosen adaptively by applying the concepts of the visual masking effect and the Mach band effect [11]. In the visual masking effect, it is found that the visual threshold of a luminance increases on both sides of a large change in the background luminance; that is, the sensitivity of the human eyes to a luminance on both sides of an edge is reduced [11]. According to this visual masking effect, a block with an edge can be prequantized with a smaller number of quantization levels, so that the visual threshold increases naturally. In general, the standard deviation of a region, σ, is a good and simple estimator for detecting edges of low or high contrast. That is, a smaller number of quantization levels is assigned to a larger standard deviation. In addition to the visual masking effect, the Mach band effect is another human visual effect which influences the number of quantization levels. The Mach band effect refers to a change in perceived brightness at an edge: there is an apparent increase in brightness on the light side and a decrease on the dark side [11]. Therefore, to the human eyes, the contrast at an edge is increased, so that the number of quantization levels can be reduced further. From experiments, the brighter or darker the background is, the more obvious the Mach band effect is. For example, considering a vertical edge with the same difference between the light and dark sides but with different backgrounds, we can easily observe that the Mach band effect with a bright background is more obvious than that with a moderate background whose mean is about 128. To summarize the visual masking effect and the Mach band effect, a parameter, called the edge-sensitivity ratio (ESR), is proposed to estimate the sensitivity of the human eyes to an edge with a low or high contrast in a region. The parameter ESR is defined as

ESR = σ ln(|μ − 128| + ω),    (5)

where μ and σ express the mean and standard deviation of the region, respectively, and ω is a constant to avoid an undefined logarithm, given as 10 in this paper. It should be noted that the first


factor of the ESR, σ, measures the contrast of the edge, accounting for the visual masking effect, and is weighted by the Mach band effect expressed by the second factor, ln(|μ − 128| + ω), where taking the logarithm of the mean expresses that the sensitivity of the human eyes to the luminance is proportional to a logarithmic function [11]. For this reason, the sensitivity of the human eyes to the background is proportional to ln(|μ − 128|), not to |μ − 128| directly. Eq. (5) expresses that the larger the ESR is, the more easily an edge is perceived within a region. Thus, we can utilize the ESR to set the number of quantization levels, adjusting the visual threshold. In this paper, we classify the ESR into four classes with different numbers of quantization levels, n, in order to reduce the capacity of the ROM. The four classes are defined as

No. of quantization levels = n_0 for small ESR,
                             n_1 for median ESR,
                             n_2 for large ESR,
                             n_3 for very large ESR,    (6)

where n_i expresses the number of quantization levels and n_0 > n_1 > n_2 > n_3. The increase of the visual threshold caused by the visual masking effect and the Mach band effect allows the number of quantization levels for quantizing the contrast function C(x) to decrease. After prequantization, the correlation between pixels increases much more than without prequantization, so that a smaller codebook can still capture the characteristics of all training vectors. It should be noted that the smaller codebook size reduces the bit-rate greatly in the VQ scheme. Since the number of quantization levels is classified into four classes and each class includes 16 ROM tables for a specified number of quantization levels, as described in the second paragraph of this section, 64 ROM tables are available for prequantizing a region, so that 6 bits of overhead information are transmitted for each region. The overhead contributes little to the bit-rate; for example, if the region size is 16 × 16 pixels, the bit-rate for the overhead is 0.023 bpp.
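The ESR computation and the four-way classification above can be sketched directly from Eq. (5). This is an illustrative sketch; `esr` and `esr_class` are my own names, and the thresholds shown are the ones the paper later uses in its simulation (Section 4), not part of the definition itself.

```python
import numpy as np

def esr(region, omega=10.0):
    """Edge-sensitivity ratio of Eq. (5): sigma * ln(|mu - 128| + omega)."""
    mu = region.mean()
    sigma = region.std()
    return sigma * np.log(abs(mu - 128.0) + omega)

def esr_class(value, thresholds=(24.0, 48.0, 72.0)):
    """Map an ESR value to class 0..3 (small/median/large/very large ESR)."""
    return int(np.searchsorted(thresholds, value, side='left'))

# A flat region has sigma = 0, hence ESR = 0: no edge, smallest class,
# and therefore the largest number of quantization levels n_0.
flat = np.full((16, 16), 30.0)
assert esr(flat) == 0.0
```

Choosing the class then amounts to selecting one of the four ROM-table banks; together with the 16 quantized background means this yields the 64 tables (6 bits of overhead) described above.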

Fig. 3. The structure of the proposed VQ: each region is routed, according to its class, through a nonlinear encoder of n_i levels, a conventional VQ, and a nonlinear decoder of n_i levels.

3. The VQ with the human visual system

The structure of the proposed VQ is shown in Fig. 3. The image is first divided into nonoverlapping regions, and the mean and standard deviation of each region are calculated to classify the region into the very large, large, median or small edge-sensitivity-ratio class based on Eq. (5). Then, the system performs a transformation for each pixel of the region from the gray-level value to the index of the quantization level using the corresponding ROM table. The size of a region is set as 2N × 2N pixels in order to process the VQ conveniently, because the block size processed by VQ is generally 4 × 4 pixels. As shown in Fig. 3, the region of each class is processed by an individual subsystem. Therefore, four codebooks for VQ should be established in advance, and the codebook sizes can differ from each other to reduce the bit-rate. Let the size of the codebook be S_i for each class i (i = 0, 1, 2, 3). A large ESR indicates an existing edge, and the many different types of edges, such as horizontal, vertical and diagonal edges, make the range of ESR values widely distributed, so that the codebook requires a large capacity. On the contrary, if a codebook of small size is used for a large ESR, the characteristics of the different edges fail to be included completely, so that large distortion is generated and the quality of the reconstructed image is not good. In this paper, we suggest that S_0 < S_1 < S_2 < S_3. The same idea is also proposed in [13]. For each region of size 16 × 16 pixels, which is divided into 16 blocks of 4 × 4 pixels, 16 codebook indexes are transmitted to the receiver. According to the received indexes, the region is reconstructed from the codevectors of the codebook at the receiver. Finally, the reconstructed region is decoded to gray levels by looking up the ROM table specified by the transmitted overhead (6 bits).
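The encoder path just described (classify region, prequantize via ROM table, VQ each 4 × 4 block) can be sketched end to end. This is a hedged sketch, not the authors' implementation: the table and codebook contents below are random placeholders, the level counts 32/24/16/12 and codebook sizes 8/16/32/64 are the values used later in the simulation section, and all names are my own.

```python
import numpy as np

def encode_region(region, encoders, codebooks, omega=10.0,
                  thresholds=(24.0, 48.0, 72.0)):
    """Sketch of the proposed encoder for one 16x16 region:
    1) classify it by the edge-sensitivity ratio of Eq. (5),
    2) prequantize each pixel via the class's lookup table,
    3) VQ each 4x4 block with the class's codebook."""
    mu, sigma = region.mean(), region.std()
    k = int(np.searchsorted(thresholds, sigma * np.log(abs(mu - 128) + omega)))
    pre = encoders[k][region.astype(int)]          # ROM-table lookup, per pixel
    indices = []
    for r in range(0, 16, 4):                      # 16 blocks of 4x4 pixels
        for c in range(0, 16, 4):
            block = pre[r:r + 4, c:c + 4].reshape(-1).astype(float)
            err = np.mean((codebooks[k] - block) ** 2, axis=1)
            indices.append(int(np.argmin(err)))    # full-search VQ in class k
    return k, indices                              # class overhead + 16 indices

rng = np.random.default_rng(1)
encoders = [rng.integers(0, n, 256) for n in (32, 24, 16, 12)]      # placeholder tables
codebooks = [rng.integers(0, n, (s, 16)).astype(float)              # placeholder codebooks
             for n, s in zip((32, 24, 16, 12), (8, 16, 32, 64))]
k, indices = encode_region(rng.integers(0, 256, (16, 16)), encoders, codebooks)
```

The receiver would invert the last two steps: look up codevectors by index, then map quantization-level indices back to gray levels with the decode table selected by the transmitted class/background overhead.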

4. Simulation

The image 'LENA' of size 256 × 256 pixels, shown in Fig. 4, is utilized to evaluate the proposed VQ, the conventional VQ and the classified VQ [13].


Fig. 4. The test image ‘LENA’.

Seven images of size 256 × 256 pixels, 'PEPPER', 'JET', 'COUPLE', 'BABOON', 'HOUSE', 'CAR' and 'BOATS', are used as the training images. In this paper, the LBG algorithm is used to train the codebook [9]. For the conventional VQ, all training images are used to train one codebook; for the proposed VQ, however, all the training images are first classified into four training sequences, which are trained individually to produce four codebooks. In this simulation, the region for classification is set as a block of size 16 × 16 pixels, and the number of quantization levels is chosen according to the following condition:

No. of quantization levels = 32 for ESR ≤ 24,
                             24 for 24 < ESR ≤ 48,
                             16 for 48 < ESR ≤ 72,
                             12 for ESR > 72.    (7)

It should be noted that the quality of the image quantized and dequantized by the proposed prequantizer alone (the entire process in Fig. 3 without the conventional VQ step) is the same as that of the original image for all images in the simulation. In this classification, the bit-rate, including the overhead, is just 0.3125 bpp if the sizes of the four codebooks are assigned as 8, 16, 32 and 64, respectively. It should be noted that each doubling of the codebook sizes increases the bit-rate by 1/16 bpp. This lets us compare the proposed VQ with the conventional VQ easily under the same bit-rate. For the classified VQ in [13], all the training vectors of size 4 × 4 pixels are classified into six classes (shade, midrange, horizontal edge, vertical edge, diagonal edge and mixed edge), and the training vectors in each class are trained individually to produce six codebooks. Therefore, in this scheme, a block requires an additional 3 bits of overhead to identify which codebook the block should look up.

Assume that the sizes of the four codebooks for the proposed VQ are S_0, ..., S_3, respectively. Clearly, a block of size 4 × 4 pixels is represented by an index of log2 S_i bits, and the bit-rate for each block of class i is (log2 S_i)/16. Let p_0, ..., p_3 be the probabilities of a block belonging to classes 0 to 3, respectively; then the total bit-rate for the proposed VQ is

(bit-rate)_tot = (Σ_{i=0}^{3} p_i log2 S_i)/16 + (bit-rate)_ovh,    (8)

where (bit-rate)_ovh expresses the bit-rate of the overhead for the region, which is transmitted to the receiver to identify which ROM table should be used to reconstruct the region. Fig. 5 shows the profiles of the different bit-rates and PSNRs for the proposed VQ, the conventional VQ and the classified VQ [13], where the PSNR is defined as

PSNR = 10 log10(255^2/mse),    (9)

where mse is the mean square error between the original and reconstructed images. For comparison, Fig. 6 shows some of the reconstructed images with the proposed VQ, the conventional VQ and the classified VQ. From Figs. 5 and 6, under the same bit-rate, the image quality with the proposed VQ is obviously better than that with the conventional VQ. It should be noted that the block effect at edges with the proposed VQ is much less than that with the conventional VQ. Furthermore, comparing the proposed VQ with the classified VQ, we find that the quality with the proposed VQ is much better than that with the classified VQ at a bit-rate of 0.5 bpp, and the quality with the proposed method at 0.5 bpp is nearly the same as that with the classified VQ at 0.625 bpp. Therefore, the performance of the proposed VQ is better than those of the conventional VQ and the classified VQ.
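The total bit-rate of Eq. (8) is easy to evaluate. The sketch below assumes, for illustration only, equal class probabilities; the paper's quoted 0.3125 bpp corresponds to the actual class probabilities measured on 'LENA', which are not given.

```python
import math

def total_bit_rate(sizes, probs, region=16, block=16):
    """Eq. (8): class-weighted VQ index cost per pixel plus the
    6-bit per-region ROM-table overhead."""
    index_bpp = sum(p * math.log2(s) for p, s in zip(probs, sizes)) / block
    overhead_bpp = 6 / (region * region)   # 6 bits per 16x16 region = 0.0234 bpp
    return index_bpp + overhead_bpp

# Codebook sizes 8, 16, 32, 64 with (assumed) equal class probabilities:
rate = total_bit_rate([8, 16, 32, 64], [0.25] * 4)  # -> 0.3046875 bpp
```

Doubling every S_i adds exactly 1 bit per 16-pixel block, i.e. 1/16 bpp, which is the step size visible on the bit-rate axis of Fig. 5.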


Fig. 5. The profiles of the bit-rate (bpp) and PSNR (dB) with the proposed VQ, conventional VQ and classified VQ.

Because the proposed scheme considers the human visual effects, the quality of the reconstructed images with the proposed method is better than those with the conventional and classified VQ. In addition, in the proposed method, each block is prequantized in advance, so that the correlation between pixels increases considerably and the total number of possible image patterns is reduced drastically. For example, if each pixel with 256 gray levels in a 4 × 4 block is prequantized to i levels (i << 256), the total number of possible image patterns is i^16 rather than 256^16. This indicates that if the number of training patterns is sufficient, the proposed method trains a better codebook than the conventional VQ and the classified VQ do. Therefore, each block undergoing VQ can find a good codevector to reconstruct the image, especially a block with an edge, so that the PSNR of the reconstructed image with the proposed VQ is much higher than that with the other methods. Fig. 7 shows the convergence rate of each codebook for the proposed and conventional VQ

when the bit-rate is 0.375 bpp, where the convergence rate, denoted CR, is defined as

CR = (D_{i-1} − D_i)/D_i,    (10)

where D_i expresses the mean square error of the ith iteration, in which all training vectors are quantized with the codebook determined in the (i − 1)th iteration. From Fig. 7, the final convergence rates of all codebooks reach 0 before 40 iterations, where a convergence rate of 0 expresses that the training is finished and the codebook no longer updates. Observing Fig. 7, most of the codebooks of the proposed VQ finish in about 20 iterations, whereas the conventional VQ requires about 40 iterations. Though the convergence rate of class 1 shown in Fig. 7 is not faster than that of the conventional VQ, the total training time of the proposed VQ is less than that of the conventional VQ. The reason the proposed VQ requires fewer iterations is that the correlation between pixels increases after prequantization with the human visual system. In addition, from our experiments, the convergence rate of each class with the proposed VQ is about equal to that

Fig. 7. The convergence rate of training codebook for the proposed VQ and conventional VQ when the bit-rate is 0.375 bpp.

with the classified VQ [13]. Therefore, if the number of iterations is fixed for the proposed and conventional VQ, the codebook with the proposed VQ is better than that with the conventional VQ, so that the quality of the reconstructed image with the proposed VQ is better than that with the conventional VQ, especially regarding the block effect at edges.
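The LBG training loop and the convergence rate of Eq. (10) can be sketched as below. This is a minimal k-means-style LBG variant on synthetic data, assumed only for illustration; the paper's training uses real image blocks and its own initialization, which are not specified here.

```python
import numpy as np

def lbg(train, k, iters=100, tol=0.0, seed=0):
    """Minimal LBG codebook training reporting the convergence rate
    CR_i = (D_{i-1} - D_i)/D_i of Eq. (10) at each iteration."""
    rng = np.random.default_rng(seed)
    codebook = train[rng.choice(len(train), k, replace=False)].copy()
    prev_d, rates = None, []
    for _ in range(iters):
        # Nearest-codevector assignment under minimum MSE.
        d2 = ((train[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        dist = d2[np.arange(len(train)), nearest].mean()
        if prev_d is not None:
            rates.append((prev_d - dist) / dist)
            if rates[-1] <= tol:
                break                    # CR = 0: codebook no longer updates
        prev_d = dist
        # Centroid update; empty cells keep their previous codevector.
        for j in range(k):
            members = train[nearest == j]
            if len(members):
                codebook[j] = members.mean(axis=0)
    return codebook, rates

rng = np.random.default_rng(2)
train = rng.integers(0, 256, size=(512, 16)).astype(float)  # placeholder 4x4 vectors
codebook, rates = lbg(train, 8)
```

Since both LBG steps are non-increasing in distortion, every CR value is nonnegative, matching the monotone curves of Fig. 7; prequantized training vectors cluster more tightly, which is why CR reaches 0 in fewer iterations.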

Fig. 6. The reconstructed images: (a)-(c) with the proposed VQ at 0.375 bpp (28.2 dB), 0.4375 bpp (29.0 dB) and 0.5 bpp (29.7 dB); (d), (e) with the conventional VQ at 0.4375 bpp (27.8 dB) and 0.5 bpp (28.5 dB); (f)-(h) with the classified VQ [13] at 0.5 bpp (27.7 dB), 0.5625 bpp (28.6 dB) and 0.625 bpp (29.7 dB).

5. Conclusion

In this paper, the human visual effects of the spatial domain are modeled in the gray-level domain to propose a human visual system for image compression in the spatial domain. The proposed human visual system works as a prequantizer to remove the information of the image that is not sensitive to the human eyes, and the prequantization only requires looking up a ROM table, avoiding complex computation. The proposed human visual system is then applied to the VQ technique. Because the prequantization increases the correlation between pixels, the training sequence can be classified easily. Due to the resulting satisfactory codebook for the proposed VQ, the PSNR of the reconstructed image with the proposed VQ is better than those with the conventional VQ and classified VQ under the same


bit-rate. Since the proposed VQ uses the human visual system, the quality of the reconstructed image as seen by the human eyes is obviously much better with the proposed method than with the conventional VQ and the classified VQ. In particular, the proposed VQ can remove the block effect at edges caused by the conventional VQ. This indicates that the proposed human visual system can efficiently remove the information that is not sensitive to the human eyes, increasing the compression ratio of the VQ technique.

Acknowledgements

The authors would like to thank the anonymous reviewers for their valuable suggestions. This work was supported in part by the Tatung Company under Contracts 81-1207-40 and 82-1207-38, and by the National Science Council, Taiwan, under Grants NSC 83-0404-E036-013 and NSC 84-2213-E036-007. The authors are grateful for their financial aid.

References

[1] C.F. Chen, H.H. Lin, Progressive image transmission using a prefilter and a modified difference pyramid structure, J. Chinese Inst. Eng. 17 (1994) 259-269.
[2] B. Chitprasert, K.R. Rao, Human visual weighted progressive image transmission, IEEE Trans. Commun. COM-38 (1990) 1040-1044.
[3] E.J. Delp, O.R. Mitchell, Image compression using block truncation coding, IEEE Trans. Commun. COM-27 (1979) 1335-1342.
[4] A. Gersho, R.M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, Dordrecht, 1992, pp. 309-689.
[5] R.M. Gray, Vector quantization, IEEE Acoust. Speech Signal Process. Mag. (1984) 4-29.
[6] A.K. Jain, Fundamentals of Digital Image Processing, Prentice-Hall, Englewood Cliffs, NJ, 1989, pp. 49-78 and 476-561.
[7] N.S. Jayant, P. Noll, Digital Coding of Waveforms: Principles and Applications to Speech and Video, Prentice-Hall, Englewood Cliffs, NJ, 1984, pp. 252-324.
[8] C.-H. Kuo, C.-F. Chen, A prequantizer with the human visual effect for the DPCM, Signal Processing: Image Communication 8 (1996) 433-442.
[9] Y. Linde, A. Buzo, R.M. Gray, An algorithm for vector quantizer design, IEEE Trans. Commun. COM-28 (1980) 84-95.
[10] N.M. Nasrabadi, R.A. King, Image coding using vector quantization: a review, IEEE Trans. Commun. COM-36 (1988) 957-971.
[11] A.N. Netravali, B.G. Haskell, Digital Picture Representation and Compression, Plenum Press, New York, 1988, pp. 245-265.
[12] K.N. Ngan, K.S. Leong, H. Singh, Adaptive cosine transform coding of images in perceptual domain, IEEE Trans. Acoust. Speech Signal Process. ASSP-37 (1989) 1743-1750.
[13] B. Ramamurthi, A. Gersho, Classified vector quantization of images, IEEE Trans. Commun. COM-34 (1986) 1105-1115.