A novel training scheme for neural-network-based vector quantizers and its application in image compression

Neurocomputing 61 (2004) 421 – 427

www.elsevier.com/locate/neucom

Letters

N.A. Laskaris∗, S. Fotopoulos

Laboratory of Electronics, Department of Physics, University of Patras, Patras 26500, Greece

Available online 15 June 2004

∗ Corresponding author. Tel.: +30-261-0997287; fax: +30-261-0997456. E-mail address: [email protected] (N.A. Laskaris).

doi:10.1016/j.neucom.2004.03.013

Abstract

A “roulette-wheel” routine (Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, Reading, MA, 1989) for random sampling is adopted as a means of adjusting the resolution of vector quantizers. With the proposed procedure, the sequential presentation of training vectors is controlled according to an external, user-defined criterion. The new training scheme is applied to the problem of codebook design, using the neural-gas network (IEEE Trans. Neural Networks 4 (1993) 558), for image coding, and is shown to improve the visual quality of the reconstructed images.
© 2004 Elsevier B.V. All rights reserved.

Keywords: Vector quantization; Image compression; Neural-gas network; Randomized sampling

1. Introduction

Vector quantization (VQ) is a powerful strategy for data compression [3], and different techniques have been introduced over the last 20 years. A vector quantizer encodes a data manifold $V \subseteq \mathbb{R}^d$ utilizing only a finite set of reference or “codebook” vectors $O_j \in \mathbb{R}^d$, $j = 1, \ldots, K$. Each data vector $X \in V$ is described by the best-matching reference vector $O_{j(X)}$, for which the distortion error $d(X, O_{j(X)}) = \|X - O_{j(X)}\|_{L_2}^2$ is minimal. The efficient application of VQ depends mainly on the proper selection of the reference vectors. For this step, that is the codebook design, the use of traditional clustering algorithms like K-means was originally proposed. It was soon experimentally verified that these algorithms often lead to a suboptimal solution, and this can degrade the subsequent encoding of the data.


Self-organizing neural networks (NNs) like Kohonen’s feature map [2] and the neural gas [1,6] provided efficient alternative schemes for codebook design. Stochastic presentation of the input data, competition among the neural nodes (to which weight vectors $W_j \in \mathbb{R}^d$ have been assigned) and a ‘soft max’ adaptation rule are the common characteristics of these networks that guarantee fast convergence to a set of weight vectors which can serve as a high-fidelity codebook. The resulting codebook vectors are allocated according to the probability distribution of the data vectors over the manifold $V$, and in such a way that the average distortion error is minimized.

VQ is considered an appealing approach for lossy image compression, especially due to its conceptual simplicity. The input image is decomposed into rectangular blocks that are reshaped into vectors of fixed size. Given a codebook, each image block can be represented by the binary address of its closest codebook vector. Such a strategy results in a significant reduction of the information involved in image transmission and storage. The image is reconstructed by replacing each block with its closest codebook vector. Unfortunately, the end result is quite often not visually pleasing, especially for large blocks, since the procedure aims at a low average distortion error and the image context is not particularly taken into consideration.

This work was motivated by the need to improve the visual quality of the reconstructed image, as has been done for other block-based coding schemes (e.g. [7]), but without sacrificing the simplicity of the overall VQ procedure. We propose a modified training scheme for principled learning, in which blocks with high spatial structure (e.g. well-formed edges) are considered more important than homogeneous patches and should therefore be represented by a more detailed repertoire in the overall codebook. The core idea is to represent the data manifold with a variable resolution that depends on the importance of each manifold region (as measured via standard image-analysis operators), rather than on the local density, which is the case for the current NN algorithms. Without loss of generality, we present the new scheme with respect to the neural-gas network and demonstrate its use by incorporating simple image-processing indices.

2. The roulette-wheel-based training scheme

In the original neural-gas network algorithm, a stochastic sequence of incoming data vectors $X(t)$, $t = 1, 2, \ldots, t_{\max}$, governed by the distribution $P(X)$ over the manifold $V$, drives the adaptation step for adjusting the weights of the $K$ neurons $\{W_j\}_{j=1:K}$ (i.e. the reference vectors):

$$\Delta W_j = \varepsilon\, h_\lambda\!\big(k_j(X(t), \{W_i\}_{i=1:K})\big)\,(X(t) - W_j), \qquad j = 1, \ldots, K, \ \ \forall t = 1, \ldots, t_{\max}. \tag{1}$$

The function $h_\lambda(y)$ in the above equation has an exponential form, $h_\lambda(y) = e^{-y/\lambda}$, and $k_j(X, \{W_i\})$ is an indicator function that determines the ‘neighborhood ranking’ of the reference vectors according to their distance from the input vector $X$. For both parameters $\varepsilon$ and $\lambda$ an exponentially decreasing schedule is followed, with $t_{\max}$ being the final number of adaptation steps, which can be defined from the data based on simple convergence criteria (for details see [1,6]).
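As a concrete reference, the following minimal sketch (in Python with NumPy) implements the adaptation rule of Eq. (1) with stochastic presentation of the data; the function name, the default parameter values and the exact decay schedules are illustrative assumptions of this sketch, not specifications taken from [1,6].

```python
import numpy as np

def neural_gas_train(X, K, t_max=10_000, eps=(0.5, 0.005), lam=(10.0, 0.5),
                     order=None, seed=0):
    """Neural-gas codebook training via the adaptation rule of Eq. (1).

    X     : (N, d) array of training vectors
    K     : number of neurons / codebook vectors
    eps   : (initial, final) learning rate epsilon, decayed exponentially
    lam   : (initial, final) neighbourhood range lambda, decayed exponentially
    order : optional length-t_max index sequence sel(t); if None, X(t) is
            drawn uniformly at random (the standard scheme)
    Returns the (K, d) array of weight vectors {W_j}.
    """
    rng = np.random.default_rng(seed)
    N, d = X.shape
    W = X[rng.choice(N, size=K, replace=False)].astype(float)   # initialise on data points

    for t in range(t_max):
        frac = t / t_max
        eps_t = eps[0] * (eps[1] / eps[0]) ** frac               # exponentially decreasing epsilon
        lam_t = lam[0] * (lam[1] / lam[0]) ** frac               # exponentially decreasing lambda
        x = X[order[t] if order is not None else rng.integers(N)]          # X(t)
        ranks = np.argsort(np.argsort(np.linalg.norm(W - x, axis=1)))      # k_j(X(t), {W_i})
        W += eps_t * np.exp(-ranks / lam_t)[:, None] * (x - W)             # Eq. (1)
    return W
```

With the standard scheme, `X(t)` is drawn uniformly from the training set; the modification proposed in the remainder of this section only changes how the presentation sequence is generated.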


The asymptotic density distribution of the codebook vectors, $P(W)$, is nonlinearly proportional to the data distribution, $P(W) \propto P(X)^{d/(d+2)}$ [6] (where $d$ is the intrinsic dimensionality of the input data). Using this fact, and by adjusting the relative frequency of data sampling from a particular manifold region, we can control the density of codebook vectors covering this region. To this end, we borrow a random sampling technique, known as roulette-wheel selection, from the literature of genetic algorithms, where it is employed to simulate the ‘survival of the fittest’ step [4]. Given a set of $N$ input vectors $\{X_i\}_{i=1:N}$ along with a set of external scalar measurements $F_i$, $i = 1, 2, \ldots, N$, describing the importance of the individual $X_i$’s, a ‘wheel’ is built in which $N$ sectors are defined with sizes proportional to the $F_i$’s. This biased wheel is repeatedly spun, resulting each time $t = 1, 2, \ldots, t_{\max}$ in the selection of a new sector. The vector associated with the measurement $F_{\mathrm{sel}(t)}$ (corresponding to the selected sector) is the next vector to be used in the adaptation step (Eq. (1)), i.e. $X(t) = X_{\mathrm{sel}(t)}$.

The roulette-wheel training scheme described above performed, in practice, rather unsatisfactorily. This was due to the fact that the actual data manifold is usually not populated uniformly: data vectors from a manifold region of very high local density naturally tend to appear more often during the sampling (since this region is mapped onto many sectors). In such a case, and in order to serve precisely the goal of adjusting the sequence of training vectors according to a given external index, it is necessary to counterbalance the local density differences across the input data. The incorporation of the potential-functions technique (Parzen’s estimator [9]) was a straightforward way to achieve this ‘equalization’ of the local density. This gave rise to an algorithmic procedure that can be thought of as a ‘whitening’ step and can be outlined with the following three steps. An estimate $EP(X_i)$ of the local density is first attached to each input vector $X_i$:

$$EP(X_i) = \frac{1}{N (2\pi)^{d/2} r_o^{\,d}} \sum_{j=1}^{N} \exp\!\left(\frac{-\|X_i - X_j\|^2}{2 r_o^2}\right), \qquad r_o = \frac{1}{N(N-1)} \sum_{i}^{N} \sum_{j}^{N} \|X_i - X_j\|^2. \tag{2}$$

The $N$ estimates are then normalized to the overall maximum, $EP_{\max}$, and finally used, after negation, in a roulette-wheel-based random sampling, in which they play the role of the external scalar measurements (i.e. $F_i = 1 - EP(X_i)/EP_{\max}$).

In summary, the overall training scheme starts with the optional ‘whitening’ step, during which a portion $\alpha$ of the original data vectors is kept. This is followed by the stochastic presentation of the remaining $N' = \alpha N$ vectors and the corresponding weight adaptation; the frequency of appearance of each vector (from the ‘equalized’ set) is proportional to a user-defined measure. After extensive experimentation we noticed that the final results are relatively insensitive to the choice of $\alpha$, as long as it lies in the range [0.3, 0.9]. In all the results included in this paper we have used the value $\alpha = 0.8$.

An illustration of the previous procedure is provided with the 2D example of Fig. 1. The input data set includes three subsets A, B and D, each containing 50 samples, and a subset C containing 100 samples (Fig. 1a). External measurements have been associated with these $N = 250$ samples, denoting that the subset D is twice as important ($F_i = 2$, $X_i \in D$) as the rest ($F_i = 1$, $X_i \in A \cup B \cup C$). During the ‘whitening’ step, we draw $N' = 200$ samples from the original set (Fig. 1b). These samples were repeatedly used (with a frequency depending on the corresponding 200 $F_i$-labels) to train a neural-gas network of $K = 10$ nodes. The computed weights are depicted in Fig. 1c and can be compared with the ones obtained from the standard training scheme (i.e. adaptation using Eq. (1), without the roulette-wheel-based sampling) applied to the original data set (Fig. 1d).

Fig. 1. Input data set before (a) and after (b) ‘whitening’, and the 10 codebook vectors to which the neural-gas network converges after the roulette-wheel-based training scheme (c) and the conventional training scheme (d).
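The procedure of this section can be sketched as follows (Python with NumPy). This is a non-authoritative illustration: the function names, the random seed, the toy Gaussian blobs standing in for the subsets of Fig. 1, and the use of weighted sampling without replacement in the ‘whitening’ step are assumptions of the sketch, not details taken from the paper.

```python
import numpy as np

def parzen_density(X):
    """Local density estimate EP(X_i) of Eq. (2), using Gaussian potential functions."""
    N, d = X.shape
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)   # pairwise ||X_i - X_j||^2
    r_o = sq.sum() / (N * (N - 1))                             # r_o as in Eq. (2)
    return np.exp(-sq / (2.0 * r_o ** 2)).sum(axis=1) / (N * (2 * np.pi) ** (d / 2) * r_o ** d)

def training_sequence(X, F, keep=0.8, t_max=5_000, seed=0):
    """Proposed scheme: density 'whitening' followed by importance-biased roulette-wheel sampling.

    Returns an index sequence sel(t), t = 1..t_max, so that X(t) = X[sel(t)].
    """
    rng = np.random.default_rng(seed)
    # 1. 'whitening': keep a portion of the vectors, drawn with weights 1 - EP/EP_max
    #    (drawing without replacement here is an assumption of this sketch)
    ep = parzen_density(X)
    w = 1.0 - ep / ep.max() + 1e-12
    kept = rng.choice(len(X), size=int(keep * len(X)), replace=False, p=w / w.sum())
    # 2. biased presentation: spin a 'roulette wheel' whose sector sizes are the F_i
    #    of the kept vectors, once per adaptation step
    p = np.asarray(F, dtype=float)[kept]
    return kept[rng.choice(len(kept), size=t_max, p=p / p.sum())]

# Toy data in the spirit of Fig. 1: four clusters, with subset D twice as important.
rng = np.random.default_rng(1)
centres = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)   # subsets A, B, C, D
sizes = [50, 50, 100, 50]
X = np.vstack([rng.normal(c, 0.1, size=(n, 2)) for c, n in zip(centres, sizes)])
F = np.r_[np.ones(200), 2.0 * np.ones(50)]      # F_i = 2 for the 50 vectors of subset D
sel = training_sequence(X, F)                    # presentation order X(t) = X[sel[t]]
```

The returned `sel` could, for instance, be passed as the `order` argument of the neural-gas sketch given after Eq. (1), which reproduces the flavour of Figs. 1b and 1c.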

3. Application of the new training scheme in image compression

Edges are considered very important for the visual quality of an image [7], especially when clear-cut objects are present over a uniform background. We demonstrate here how edge strength can guide the codebook design and finally improve the quality of the reconstructed images. Fig. 2 visualizes the employed steps using a simplified example with real data. The comparison between the two histograms (Fig. 2, middle) justifies the use of the ‘whitening’ step: its necessity stems from the fact that edge-containing patches are relatively rare in the image. Our overall training scheme results in a codebook specialized in edge representation (Fig. 2, bottom); on the contrary, the codebook from the standard training procedure is better tailored to luminance variations. The two training schemes are compared, based on the quality of the reconstructed image and under more realistic compression-rate scenarios (i.e. larger codebook size $K$), in the first three rows of Fig. 3.

Apart from the existence of edges, the (related) spatial contrast is another factor influencing image quality. Specifically for natural scenes, high spatial contrast has been shown to capture the focus of visual attention [8]. A proper index of contrast can therefore serve as the external measure for guiding the codebook design. As such an index, the local standard deviation within each block, normalized by the global mean intensity of the image, was used in the codebook design for the reconstructed image in the last row of Fig. 3.
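The external measures of this section could be computed along the following lines (a hedged sketch in Python with NumPy; the gradient-magnitude edge map is only a stand-in for the “conventional edge detector” of Fig. 2, and all function names are illustrative).

```python
import numpy as np

def image_to_blocks(img, b=8):
    """Partition a (H, W) grey-scale image into non-overlapping b x b blocks,
    reshaped into the data vectors {X_i} (one row per block)."""
    H, W = img.shape
    img = img[: H - H % b, : W - W % b]                      # crop to a multiple of the block size
    blocks = img.reshape(H // b, b, W // b, b).swapaxes(1, 2)
    return blocks.reshape(-1, b * b)

def edge_strength_indices(img, b=8):
    """F_i from an edge map: mean gradient magnitude inside each block,
    normalised to the overall maximum (stand-in for a conventional edge detector)."""
    gy, gx = np.gradient(img.astype(float))
    e = image_to_blocks(np.hypot(gx, gy), b).mean(axis=1)
    return e / e.max()

def contrast_indices(img, b=8):
    """F_i from spatial contrast: local standard deviation of each block,
    normalised by the global mean intensity of the image."""
    s = image_to_blocks(img.astype(float), b).std(axis=1)
    return s / img.mean()
```

The rows of `image_to_blocks(img)` play the role of the data vectors $X_i$, and either index array can serve as the $F_i$’s that bias the roulette wheel of Section 2.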


Fig. 2. Top: The original 256 × 256 image is partitioned into 8 × 8 blocks and the resulting N = 1024 vectors constitute the original data set $\{X_i\}_{i=1:N}$. The corresponding edge map (obtained by a conventional edge detector) is partitioned accordingly, and from each block an index $e_i$ is computed expressing the edge content of the corresponding image block $X_i$. These indices, after normalization (with respect to the overall maximum), serve as the external measures $\{F_i\}_{i=1:N}$. Middle: The histogram of the edge-strength-related indices before (left) and after (right) the whitening step, in which $N' = 820$ vectors have been sampled from the original set based on local density. Bottom: A small network of K = 5 neurons has been trained on the ‘equalized’ data set and the corresponding indices. The resulting codebook vectors are shown along with the ones produced via standard training.


Fig. 3. Comparison between the original images and the ones reconstructed using codebooks designed either with the proposed training scheme or with the standard procedure. The second row is a zoomed version of the images in the first row. The size of the codebooks used was (from top to bottom) 64, 64, 32 and 64, respectively, while the corresponding blocks were 4 × 4, 4 × 4, 8 × 8 and 8 × 8 in size.
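For completeness, a minimal sketch (NumPy; function names are illustrative) of the encoding/decoding step behind the reconstructions of Fig. 3: each block is coded by the address of its nearest codebook vector and decoded by pasting that vector back. With K = 64 codebook vectors and 4 × 4 blocks, for example, each block costs 6 bits, i.e. 0.375 bits per pixel.

```python
import numpy as np

def vq_encode(img, codebook, b):
    """Code each non-overlapping b x b block by the index of its nearest codebook vector."""
    H, W = img.shape
    blocks = img[: H - H % b, : W - W % b].reshape(H // b, b, W // b, b)
    X = blocks.swapaxes(1, 2).reshape(-1, b * b).astype(float)
    d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)   # squared L2 distortion
    return d2.argmin(axis=1)                                          # one address per block

def vq_decode(indices, codebook, shape, b):
    """Rebuild the image by pasting the codebook vector addressed by each block index."""
    H, W = (shape[0] // b) * b, (shape[1] // b) * b
    blocks = codebook[indices].reshape(H // b, W // b, b, b).swapaxes(1, 2)
    return blocks.reshape(H, W)
```

Given a codebook `W` trained as sketched in Section 2, a reconstruction is obtained with, e.g., `vq_decode(vq_encode(img, W, 4), W, img.shape, 4)`.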


4. Discussion

Despite the simplicity of the two criteria tested (edge strength and spatial contrast), we experimentally verified that the proposed training scheme results in a significantly improved codebook design. More sophisticated criteria (or combinations thereof) are expected to further improve the quality of the reconstructed image. For instance, the saliency map produced by biologically inspired image-processing algorithms [5] can readily replace the edge map of Fig. 2. Our training algorithm is expected to be useful in other application areas as well.

Acknowledgements

The research of the first author was funded by the Greek GSRT (ENTER 2001).

References

[1] A. Atukorale, P. Suganthan, Hierarchical overlapped neural gas network with application to pattern classification, Neurocomputing 35 (2000) 165–176.
[2] E. de Bodt, M. Cottrell, P. Letremy, M. Verleysen, On the use of self-organizing maps to accelerate vector quantization, Neurocomputing 56 (2004) 187–203.
[3] A. Gersho, R.M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, Boston, 1992.
[4] D. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, Reading, MA, 1989.
[5] L. Itti, C. Koch, E. Niebur, A model of saliency-based visual attention for rapid scene analysis, IEEE Trans. Pattern Anal. Mach. Intell. 20 (11) (1998) 1254–1259.
[6] T. Martinetz, S. Berkovich, K. Schulten, “Neural-gas” network for vector quantization and its application to time-series prediction, IEEE Trans. Neural Networks 4 (4) (1993) 558–569.
[7] C.M. Privitera, M. Azzariti, Y.F. Ho, L.W. Stark, A comparative study of focused JPEG compression, Pattern Recognition Lett. 23 (10) (2002) 1119–1127.
[8] P. Reinagel, A.M. Zador, Natural scene statistics at the centre of gaze, Network: Comput. Neural Systems 10 (4) (1999) 341–350.
[9] D. Sindoukas, N. Laskaris, S. Fotopoulos, Algorithms for color image edge enhancement using potential functions, IEEE Signal Processing Lett. 4 (9) (1997) 269–273.