Engng Applic. Artif. Intell. Vol. 5, No. 5, pp. 451-456, 1992

0952-1976/92 $5.00+0.00 Copyright © 1992 Pergamon Press Ltd

Printed in Great Britain. All rights reserved

Contributed Paper

Neural Networks for Classified Vector Quantization of Images

CHENG-CHANG LU, Kent State University
YONG HO SHIN, Kent State University

Correspondence should be sent to: Dr Cheng-Chang Lu, Department of Mathematics and Computer Science, Kent State University, P.O. Box 5190, Kent, OH 44242-0001, U.S.A.

Recently, the vector quantization (VQ) technique has received considerable attention and has become an effective tool for data compression. It provides high compression ratios and simple decoding processes for digital images. However, studies of practical implementations of VQ have revealed some major difficulties, such as preserving the edge integrity of the reconstructed images and the computational complexity of codebook design. Over the past few years, a new wave of research in neural networks has emerged. Neural network models have provided an effective means of solving computationally intensive problems. This paper proposes the implementation of classified vector quantization for image compression with neural network models. In order to preserve edge integrity and improve the efficiency of codebook design, the proposed method includes a multilayer perceptron model for edge classification and a self-organization model for codebook design. A system architecture is proposed, and simulation results demonstrate improvements in visual quality and input encoding.

Keywords: Image compression, neural networks, vector quantization.

INTRODUCTION

In recent years, the demand for digital image transmission and storage has increased dramatically. To minimize the memory required for storage and the bandwidth required for transmission, image data compression techniques have become mandatory in applications such as facsimile transmission, computer communication, medical imaging, teleconferencing and visual communication. Recently, vector quantization (VQ) has received considerable attention and has emerged as an effective tool for image compression.¹ It provides high compression ratios and simple decoding processes. However, studies of the practical implementation of VQ have revealed some major difficulties, such as preserving the edge integrity of the reconstructed images and the computational complexity of codebook design.

Edge integrity plays an important role in visual perception.² In most VQ implementations a single codebook is used to encode all blocks, including edge blocks, and the mean squared error (MSE) is employed as the distortion measurement. Edge blocks usually account for only a small portion of the training sequence. As a result, not enough codewords can be obtained to represent edge blocks.

In addition, edge blocks tend to be misrepresented when an MSE measurement is used, since MSE does not take a visual-perception model into account. Therefore, in order to achieve good subjective visual quality with VQ, preferential treatment is needed for edge blocks.

The most popular algorithm for VQ codebook design so far has been the Linde-Buzo-Gray (LBG) algorithm and its variants.³ Practical implementation of the LBG algorithm has been limited because it is computationally demanding in codebook generation as well as in vector encoding.

Over the past few years, a new wave of research in neural networks has emerged.⁴ Neural network models offer an efficient means of performing certain computations such as pattern classification, artificial intelligence, and speech and vision recognition. Information processing in neural systems tends to be parallel, asynchronous and stochastic, rather than sequential, clocked and deterministic as in conventional computer systems. In applications to image compression, the neural style of information processing holds considerable promise because of its learning and real-time processing capabilities.

In this paper, it is proposed to implement classified vector quantization for image compression with neural network models. In order to preserve the edge integrity and improve the efficiency of codebook design, the


proposed method includes a multilayer perceptron model for edge classification and a self-organizing model for codebook design. First, the main features of the vector quantization of images will be reviewed and the neural-network-based VQ will be described. Then the implementation issues that are needed to handle edge blocks effectively will be discussed. Finally, the system architecture will be proposed and the performance will be evaluated.

NEURAL-NETWORK-BASED VECTOR QUANTIZATION

This section first reviews the vector quantization of images and then describes the neural network model for VQ codebook design.

Vector quantization

An N-level VQ can be viewed as a mapping Q from a k-dimensional Euclidean space $R^k$ into a finite subset Y of $R^k$:

$$Q: R^k \rightarrow Y, \qquad (1)$$

where $Y = \{y_1, y_2, \ldots, y_N\}$ is the set of reproduction vectors. In image VQ, the first step is to partition the image into contiguous small square blocks. The block size is application-dependent and in most cases varies from 2 x 2 to 6 x 6. For every k-dimensional input vector x, the distortion between the input x and every codeword $y_i$ $(i = 1, 2, \ldots, N)$ is computed. The codeword $y_l$ is selected as the representation vector for x if

$$D(x, y_l) = \min_{1 \le i \le N} D(x, y_i). \qquad (2)$$

The mean squared error (MSE) is usually employed as the distortion measurement:

$$D(x, y_j) = \sum_{i=1}^{k} (x_i - y_{ij})^2. \qquad (3)$$

Extensive studies of codebook design have recently been performed by many researchers. The LBG algorithm and its variants have been developed and widely used. Given an initial codebook of size N, the training sequence is first partitioned into a set of N subspaces according to the minimum-distortion criterion. The vector centroid of each subspace is then calculated and an updated distortion is obtained. This process stops only when the average distortion falls below a threshold. Figure 1 shows the VQ encoding and decoding processes.

[Fig. 1. Encoding and decoding processes of VQ.]

Given a codebook, the distortion between the input vector and every codeword must be calculated during encoding. The codeword with minimum distortion is used to represent the input vector. The decoding process is simply a table lookup, from which a reconstructed block is obtained. The LBG algorithm is known to be computationally demanding. Nevertheless, the compression ratio is substantially good. For example, a good-quality reconstructed image can be obtained with a codebook size of 4096 and a block size of 4 x 4, which is equivalent to 0.75 b/pixel compared to 8 b/pixel for the original image.

Neural-network-based VQ

Neural-network models have provided an effective method for solving computationally intensive problems. The principle of designing neural-network models developed from the understanding of the neurons in the brain. Neurons are placed in an orderly fashion and reflect some physical characteristics of an external stimulus. Although much of the low-level organization in the brain is genetically predetermined, it is likely that some high-level organization, which promotes self-organization, is created during learning. The Kohonen self-organizing feature map (KSFM) is known for its ability to form clusters from training samples for pattern-classification applications without supervision.⁵ VQ can be treated as a pattern-classification problem and, therefore, KSFM can be applied to VQ codebook design and encoding.⁶ Figure 2 shows the connections between the input vector and the output neural units.

[Fig. 2. Connections of KSFM.]

The input vector is connected to every output neural unit; a weight $w_{ij}$ is associated with the ith element of the input vector and the jth output unit. During the training phase, each input vector is introduced to the network without specifying the desired output node. The neural unit that gives the lowest distortion is declared the winning node. The weights of the winning codeword and its neighboring entries are then updated according to

$$w_{ij}(n+1) = w_{ij}(n) + \eta(n)\,(x_i - w_{ij}(n)), \qquad (4)$$

where $1 \le i \le k$ and $w_{ij}(n)$ is the weight connecting the ith element of the input vector to output node j at training time n. Both the learning parameter $\eta(n)$ and the size of the affected neighborhood are decreasing functions of the training time n. After enough input vectors have been processed, the weights $\{w_{ij}\}$ between the input space and the output space are determined. This set of weights is equivalent to the set of codewords used in the LBG algorithm.

KSFM has advantages over the LBG algorithm in matters of implementation. KSFM is, relatively speaking, easier to implement and has a lower computational complexity than the LBG algorithm. Unlike the LBG algorithm, KSFM is a non-iterative method: the weight vectors are updated adaptively as each training vector arrives. While the LBG algorithm keeps iterating until the average distortion falls below a predefined threshold, KSFM can achieve comparable results in both compression ratio and reconstructed-image quality without iterations. This technique has been demonstrated in applications to speech and image compression.⁷
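To make the training procedure concrete, the following is a minimal Python (NumPy) sketch of KSFM codebook design around the update rule of equation (4). The exponential decay constants, the initial learning rate and the one-dimensional index neighborhood are illustrative assumptions; the class-dependent schedules actually used in this work are given later, in the simulation section.

```python
import numpy as np

def ksfm_codebook(training_vectors, n_codewords, eta0=0.1, tau=1000.0):
    """One-pass Kohonen self-organizing feature map for VQ codebook design.

    training_vectors : (M, k) array of image blocks flattened to k-vectors.
    n_codewords      : desired codebook size N.
    eta0, tau        : assumed learning-rate schedule parameters.
    """
    rng = np.random.default_rng(0)
    # Initialize the weights (codewords) from randomly chosen training blocks.
    idx = rng.choice(len(training_vectors), n_codewords)
    weights = training_vectors[idx].astype(float)

    for n, x in enumerate(training_vectors):
        # Winning node: codeword with minimum squared-error distortion,
        # equations (2) and (3).
        dist = np.sum((weights - x) ** 2, axis=1)
        j = int(np.argmin(dist))

        # Learning rate and neighborhood size both decrease with time n.
        eta = eta0 * np.exp(-n / tau)
        radius = int(3 * np.exp(-n / tau))

        # Update the winner and its neighbors along a 1-D index lattice,
        # equation (4).
        lo, hi = max(j - radius, 0), min(j + radius, n_codewords - 1)
        weights[lo:hi + 1] += eta * (x - weights[lo:hi + 1])

    return weights

def vq_encode(block, codebook):
    """Encode one k-vector as the index of its minimum-distortion codeword."""
    return int(np.argmin(np.sum((codebook - block) ** 2, axis=1)))
```

A single pass over the training sequence yields the codebook; decoding is then a table lookup, as in Fig. 1. In the classified scheme described below, `vq_encode` is run against only the subcodebook of the class selected for the block.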

EDGE PRESERVATION IN VQ

The human vision system contains special cells in the brain that are very sensitive to edges inside images. In order to obtain a good-quality reconstructed image, edge information has to be preserved. When VQ is employed for image coding, the reconstructed images usually have jagged edges. This indicates that the edge blocks are not well represented in the codebook. Edge blocks usually comprise only a small portion of the entire image, and their variations are much larger than those of blocks without significant edge activity. As a result, the number of codewords inside the codebook that can represent edge blocks with a small amount of distortion is usually not sufficient. Thus, edge blocks have to be coded separately from the rest of the blocks to avoid edge degradation.

Edge orientation is another important issue in designing codebooks for edge blocks. Since edge blocks have large variations and the number of codewords is limited, it is possible that the edge orientations within edge blocks will not be represented correctly, especially when the MSE is used as the distortion measurement.² Therefore, further classification of an edge block according to its edge orientation is needed to make sure the edge block will be represented by a vector that has a similar edge orientation.

[Fig. 4. Block activity analyzer: a 4 x 4 input block is passed through a gradient operator to produce a binary output block.]

Given a total number of codewords for all classes, which include the background and the different edge orientations, the problem of assigning a number of codewords to each class needs to be addressed. According to Ref. 2, the optimal distribution of codewords can be defined as the distribution that results in the same average distortion per codevector in each class. For instance, the MSE for edge blocks is usually large, and more codewords are needed in order to reduce the average distortion in the edge classes. Solving for the optimal distribution analytically has been shown to be difficult, and it has been suggested that a suboptimal solution can be obtained experimentally.

In classified VQ, additional computations are needed to classify the input vector according to its edge features. Nevertheless, the computations in the encoding process are reduced because of the smaller codebook size of each class, and the saving is usually quite significant.

EDGE-ORIENTED CLASSIFIER

Figure 3 illustrates the different edge orientations for an image block that are considered in this work. A classification algorithm using edge detection and a precise definition of the gradient was proposed in Ref. 2. In this section, an edge-oriented classifier based on neural-network models is presented. The implementation details are described as follows.

Block activity analyzer

The input image block is first classified into one of two classes: edge or background. This is accomplished by performing a gradient computation at each pixel inside the block. The block size is chosen as 4 x 4, and a 3 x 3 subblock centered at a given pixel is used to calculate the gradient for that pixel.

[Fig. 3. Different edge orientations: horizontal, vertical and diagonal edges.]

[Fig. 5. Multi-layer perceptron network for edge-oriented classification: an input layer of 16 neurons, one hidden layer, and an output layer of 12 neurons.]


The output of the analyzer is a binarized block in which high-activity pixels are marked with "1" and low-activity pixels are marked with "0". If the number of non-zero pixels is greater than a threshold, the block is classified as an edge block; otherwise it is classified as background. The analyzer used to classify an input block, together with an example, is depicted in Fig. 4.
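A minimal sketch of the block activity analyzer follows. The paper does not specify which gradient operator is applied to the 3 x 3 subblock, so a Sobel magnitude is assumed here; the threshold values `grad_thresh` and `count_thresh` are likewise hypothetical.

```python
import numpy as np

# 3x3 Sobel kernels: an assumed gradient operator; the text states only
# that a 3x3 subblock centered at each pixel is used.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def activity_map(block, grad_thresh):
    """Binarize a 4x4 block: 1 marks a high-activity pixel, 0 a low one."""
    padded = np.pad(block.astype(float), 1, mode="edge")
    binary = np.zeros(block.shape, dtype=int)
    for r in range(block.shape[0]):
        for c in range(block.shape[1]):
            sub = padded[r:r + 3, c:c + 3]
            g = np.hypot((SOBEL_X * sub).sum(), (SOBEL_Y * sub).sum())
            binary[r, c] = int(g > grad_thresh)
    return binary

def classify_block(block, grad_thresh=30.0, count_thresh=3):
    """Edge/background decision: edge if enough high-activity pixels."""
    binary = activity_map(block, grad_thresh)
    label = "edge" if binary.sum() > count_thresh else "background"
    return label, binary
```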

Edge classifier

In order to preserve the edge orientation, edge blocks are further classified into classes with different orientations. Three distinct classes are defined: horizontal, vertical and diagonal, as shown in Fig. 3. Several neural-network-based classifiers have been proposed and applied to pattern-classification problems.⁴ Among those classifiers, back-propagation classifiers have received great attention and have been used successfully for speech and character recognition. In the application described here, a multi-layer perceptron network with one hidden layer is considered, as shown in Fig. 5. The input layer consists of 16 input neurons, each representing a pixel in a 4 x 4 image block. Twelve different orientations are considered, each represented by an output neuron. The back-propagation training algorithm is applied to determine the synapse weights with a set of typical training vectors. Upon completion of the training process, the network is able to classify an input block with binary activity indices as a horizontal, vertical or diagonal block according to its edge orientation.
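A sketch of the 16-input, 12-output perceptron of Fig. 5 with plain back-propagation is given below. The hidden-layer width, sigmoid activation, learning rate and squared-error loss are all assumptions; the text specifies only one hidden layer, 16 inputs (the binary 4 x 4 activity block) and 12 orientation outputs.

```python
import numpy as np

class EdgeOrientationMLP:
    """16-input, one-hidden-layer, 12-output perceptron (cf. Fig. 5)."""

    def __init__(self, n_hidden=24, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.5, (16, n_hidden))   # input  -> hidden
        self.W2 = rng.normal(0, 0.5, (n_hidden, 12))   # hidden -> output
        self.lr = lr

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(self, x):
        h = self._sigmoid(x @ self.W1)
        y = self._sigmoid(h @ self.W2)
        return h, y

    def train_step(self, x, target):
        """One back-propagation step on a binary 16-vector and a one-hot
        12-vector encoding the desired orientation class."""
        h, y = self.forward(x)
        # Output- and hidden-layer deltas for a squared-error loss.
        delta_out = (y - target) * y * (1 - y)
        delta_hid = (delta_out @ self.W2.T) * h * (1 - h)
        self.W2 -= self.lr * np.outer(h, delta_out)
        self.W1 -= self.lr * np.outer(x, delta_hid)

    def classify(self, x):
        """Index (0-11) of the most active orientation output."""
        return int(np.argmax(self.forward(x)[1]))
```

The 12 orientation outputs can then be grouped into the three coarse classes of Fig. 3 (horizontal, vertical, diagonal) for codebook selection.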

SYSTEM CONFIGURATIONS

The system configuration for VQ codebook design is depicted in Fig. 6.

[Fig. 6. System configuration: the block input passes through the activity analyzer and, for edge blocks, the multi-layer perceptron model before the corresponding class codebook is updated.]

The input block is first classified as an edge block or a background block by the block activity analyzer. If it is an edge block, further classification using the multi-layer perceptron model is performed to determine its orientation. In order to ensure the right classification for the input block, a 4 x 4 mask operation is conducted between the binarized input block and the representative binary block of the class into which the input block has been classified (a code sketch of this check is given at the end of this section). If the number of pixel positions with different values of the activity index is greater than a threshold, it usually indicates that there is no definite single edge in the block. In this case, the block is reclassified as a "mixed" edge. An example is illustrated in Fig. 7; the input block is classified as a mixed edge since no single definite edge can be detected.

After a training vector is classified, the codebook for the corresponding class is updated using KSFM. The final codebook consists of five subcodebooks, and its codewords are determined after all training vectors have been passed through the system. In this implementation, the number of codewords assigned to each class needs to be set in advance. This makes sure that there are enough codewords for the edge blocks and improves the visual quality by preserving the edge integrity.

Simulations have been conducted based on the compression system proposed above. Five 512 x 512 gray-scale images, including faces and natural scenes, are employed as the training set. Two images, "Lena" and "Tiffany", are used for testing, where "Lena" is not included in the training set. The block size is chosen as 4 x 4. Two codebook sizes are selected in this experiment: 2048, corresponding to 0.6875 b/pixel, and 4096, corresponding to 0.75 b/pixel. In order to minimize the overall average distortion as suggested in Ref. 2, the distribution of codewords among the different classes is predetermined experimentally, as shown in Table 1. About 38% of the codewords are assigned to the background class, which includes blocks with no significant gradients and blocks with moderate gradients; 62% are used for the various edge classes, which contain mixed edges as well as vertical, horizontal and diagonal orientations.
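Returning to the mask operation described above: it amounts to counting disagreeing activity bits between the binarized input block and the class representative. A minimal sketch follows, with a hypothetical mismatch threshold.

```python
import numpy as np

def reclassify_mixed(binary_block, class_template, mismatch_thresh=4):
    """4x4 mask operation: compare a binarized input block against the
    representative binary block of its assigned orientation class.

    If too many pixel positions disagree, no single definite edge is
    present and the block is reclassified as a "mixed" edge.
    """
    mismatches = int(np.sum(binary_block != class_template))
    return "mixed" if mismatches > mismatch_thresh else "keep"
```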

;_ "24"," " T " "," " T " "',"_". ; . 26 . 62 . 65, 65. 65, ;'-

q

,241 26162165165165,

11511561521951

12 2616216519519s I

1152~s~10611271

,24126162165195195J

11521156~0611271

1 1 1 1

~1 26J 621 65[ 651 6-5i

[15~15(~52195J

I I 0 1

P',41

.,4

11 _..

1 1 1 1

f2" , 26', 62; 6s', 6si 6"5. b .

.I-

.

Jr...I..

,L . . . .

input image



gradient computation threshold Fig. 7. Mixededge example.

Table 1. Codeword distributions among different classes

Edge classification    % in training vectors    % in codewords
Horizontal edge                 5%                     6%
Vertical edge                   5%                     6%
Diagonal (2) edge               5%                    10%
Mixed edge                     20%                    40%
Background                     65%                    38%
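As a worked example of Table 1, the per-class subcodebook sizes for the two codebook sizes used in the simulations follow directly from the codeword percentages; the integer rounding below is illustrative, not from the paper.

```python
# Per-class codeword shares from Table 1.
codeword_share = {
    "horizontal": 0.06, "vertical": 0.06,
    "diagonal": 0.10, "mixed": 0.40, "background": 0.38,
}

for total in (2048, 4096):
    sizes = {c: round(total * p) for c, p in codeword_share.items()}
    print(total, sizes)
    # 4096 -> horizontal/vertical 246 each, diagonal 410,
    #         mixed 1638, background 1556 (sums to 4096)
```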

[Table 2. SNR comparisons for LBG and classified VQ with KSFM.]

Since the number of training vectors associated with each class is different, two sets of learning coefficients and neighborhood functions are used in order to have KSFM converge in one iteration. For the edge classes, the neighborhood function is chosen as

$$NE(n) = 1 + 100 \exp(-t/100).$$

For the background and mixed-edge classes, the following coefficients are used:

$$\eta(n) = 0.07 \exp(-t/10{,}000), \qquad NE(n) = 3 + 400 \exp(-t/100).$$

[Fig. 8. (a) Image blocks with large MSE when the non-classified KSFM is used. (b) Image blocks with large MSE when the classified KSFM is used.]

[Fig. 9. (a) Image blocks with large MSE when the non-classified KSFM is used. (b) Image blocks with large MSE when the classified KSFM is used.]
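The class-dependent schedules can be written directly in code. The background/mixed-edge coefficients below are those given in the text; the edge-class learning coefficient is a placeholder assumption, since only the neighborhood function is given above for the edge classes (the text notes only that a larger coefficient is needed).

```python
import numpy as np

def edge_schedule(t, eta0=0.1):
    """Edge classes: NE(n) = 1 + 100*exp(-t/100) as given in the text.
    eta0 and the decay form of eta are placeholder assumptions."""
    eta = eta0 * np.exp(-t / 100.0)          # assumed
    ne = 1 + 100.0 * np.exp(-t / 100.0)      # as given
    return eta, ne

def background_schedule(t):
    """Background and mixed-edge classes, as given in the text."""
    eta = 0.07 * np.exp(-t / 10_000.0)
    ne = 3 + 400.0 * np.exp(-t / 100.0)
    return eta, ne
```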


For the edge classes, the number of training vectors is small and the variation between training vectors is large. A larger learning coefficient is needed to achieve a significant adjustment of the weight vectors, since on average there are only a few occasions on which each output neural unit is excited. Also, a small neighborhood size is preferred because it reduces the disturbance caused by neighboring neural units.

Table 2 presents a comparison of peak signal-to-noise ratio (SNR) between the proposed approach and the LBG algorithm. There are improvements in SNR over the non-classified LBG and KSFM algorithms. Figure 8(a) displays edge blocks that have large MSE when the non-classified KSFM is applied to the test image "Lena", while Fig. 8(b) shows the result when the classified KSFM is employed. It can be seen that the number of blocks with large MSE is reduced when the edge-classification scheme is introduced, and that most of the improvement occurs at the edges of the picture, indicating that edge integrity has been better preserved by the proposed method. Figure 9 illustrates the results obtained when the test image "Tiffany" is used.

Since KSFM is non-iterative, a substantially faster codebook design process can be expected. In addition, the number of comparisons needed to encode an input block is reduced significantly because of the smaller codebook size of each class.
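To see the scale of this saving, the expected number of codeword comparisons per block can be estimated from the Table 1 distributions. This is an illustrative back-of-the-envelope calculation, not a figure reported in the paper, and it ignores the cost of the classification step itself.

```python
# Fractions of training vectors and of codewords per class (Table 1).
p_block = {"h": 0.05, "v": 0.05, "d": 0.05, "mixed": 0.20, "bg": 0.65}
p_code  = {"h": 0.06, "v": 0.06, "d": 0.10, "mixed": 0.40, "bg": 0.38}

total = 4096
expected = sum(p_block[c] * p_code[c] * total for c in p_block)
print(expected)   # ~1384 comparisons per block vs. 4096 for a full search
```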

CONCLUSION

An efficient implementation of classified vector quantization for image compression, based upon neural-network models, has been proposed. Edge integrity is preserved by having separate codebooks for edge blocks with different orientations. The edge classification is accomplished by employing a multi-layer perceptron model. The efficiency of VQ codebook design has also been improved with a self-organizing neural-network model. A system architecture has been proposed, and improvements in visual quality and input encoding have been achieved.

REFERENCES

1. Nasrabadi N. M. and King R. A. Image coding using vector quantization: a review. IEEE Trans. Commun. COM-36, 957-971 (1988).
2. Ramamurthi B. and Gersho A. Classified vector quantization of images. IEEE Trans. Commun. COM-34, 1105-1115 (1986).
3. Gray R. M. Vector quantization. IEEE ASSP Mag. 1, 4-29 (1984).
4. Lippmann R. P. An introduction to computing with neural nets. IEEE ASSP Mag., pp. 4-22 (1987).
5. Kohonen T. Self-Organization and Associative Memory, 2nd edn. Springer, New York (1988).
6. Nasrabadi N. M. and Feng Y. Vector quantization of images based upon the Kohonen self-organizing feature maps. Proc. IEEE Int. Conf. Neural Networks, pp. 1101-1108 (1988).
7. Krishnamurthy A. K., Ahalt S. C., Melton D. E. and Chen P. Neural networks for vector quantization of speech and images. IEEE J. Selected Areas in Communications SAC-8, 1449-1457 (1990).