A sequential initialization technique for vector quantizer design




Pattern Recognition Letters 7 (1988) 157-161, North-Holland

March 1988

Guoliu YUAN
Wuhan Technical University of Surveying and Mapping, Wuhan, China

Morris GOLDBERG*
Dept. of Electrical Engineering, University of Ottawa, Ottawa, Canada K1N 6N5

Received 27 May 1987

Abstract: Vector quantization has been used in compressing both speech and image data. In theory, better performance can always be achieved by coding vectors instead of scalars. However, actual results depend upon the proper design of the quantizer. Vector quantizer design typically employs an algorithm such as the K-means algorithm or the Linde-Buzo-Gray algorithm, in which the initialization affects both the design cost (convergence rate) and the achievable performance (quantization error). After reviewing several current initialization techniques, a sequential initialization method called Error Function Initialization is presented. In this method, the seeds are chosen one at a time by attempting to maximize the step-wise reduction in the quantization error. Experimental results show that this technique yields faster convergence and smaller quantization errors. For real-time applications, the technique could be used to design sub-optimal vector quantizers.

1. Introduction

Vector quantization is being applied to speech compression [1, 2], and has been demonstrated to be an effective approach to image coding [3]. A K-level vector quantizer, Q, is a mapping from an N-dimensional vector set, {V}, into a finite codebook, W = {w1, w2, ..., wK}, assigning to an input vector, v, a representative vector (codeword), w. The vector quantizer, Q, is completely described by the codebook, W = {w1, w2, ..., wK}, together with the disjoint partition, R = {R1, R2, ..., RK}, where Ri = {v: Q(v) = wi} and w and v are N-dimensional vectors. Vector quantizer design involves the determination of the partition which ideally minimizes a quantization error measure.

There are three preliminary steps involved in image coding by vector quantization: vector formation, codebook generation, and finally the coding or vector quantization step [2]. The first step concerns the choice of vectors drawn from the image. Typically the image is decomposed into non-overlapping (m, n) rectangular blocks and the pixels in each block are then used as the components of the mn-dimensional vector. Alternatively, some features are extracted from each block and used to form the vector; for example, the lower-order coefficients of some unitary transform applied to each block. In the codebook generation step, the codebook W and its associated partition R are found and a copy of the codebook is sent to the receiver. Quantization of a vector, v, is then the process of finding the region, Ri, to which the vector is mapped by the quantizer, Q. Compression is achieved by transmitting the label assigned to the region. At the receiver, reconstruction is performed by replacing the label with the codeword corresponding to the region. In implementation, vector quantizer design commonly uses a training set, instead of a source model, and employs the K-means clustering algorithm to generate the partition.





The K-means clustering algorithm [4] is an iterative process for finding a locally optimal set of K cluster centers which minimize a criterion function. For vector quantization, this function is simply the average quantization error. Briefly, the algorithm functions as follows (a minimal sketch is given at the end of this section):

Step 1. Choose K initial cluster centers or seeds.
Step 2. Assign the training vectors to the closest cluster centers.
Step 3. Recompute the cluster centers.
Step 4. Return to Step 2 until the process converges.

In this algorithm, the achievable local optimum and the speed of convergence depend upon the initial choice of the cluster centers or seeds [4]. A number of initialization techniques for the K-means clustering algorithm have been proposed; they are reviewed in the next section. In this paper, a sequential initialization method, Error Function Initialization (EFI), is presented, based upon choosing the initial seeds one by one. The position of each new seed is determined by maximizing the estimated amount of quantization error reduction achieved by creating one more output level for the quantizer. As the choice of the initial seeds is related to the distortion measure, better performance results: faster convergence and lower average distortion.

The rest of this paper is organized as follows. Section 2 reviews some of the initialization techniques currently used. In Section 3 the Error Function Initialization method is introduced, and in Section 4 experimental results are presented. Finally, some concluding remarks are given in Section 5.
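For reference, the following is a minimal Python/NumPy sketch of the K-means iteration described above. It is a sketch only, not the implementation used by the authors; the squared-error distortion, the function name kmeans and the tolerance tol are assumptions, and the stopping test mirrors the QED measure defined in Section 4.

import numpy as np

def kmeans(training, seeds, max_iter=100, tol=1e-3):
    # training: (n, N) array of training vectors; seeds: (K, N) initial cluster centers.
    training = np.asarray(training, dtype=float)
    centers = np.asarray(seeds, dtype=float).copy()
    prev_mse = None
    for _ in range(max_iter):
        # Step 2: assign each training vector to the closest cluster center.
        d2 = ((training[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        mse = d2[np.arange(len(training)), labels].mean()   # average quantization error
        # Step 3: recompute each cluster center as the mean of its members.
        for k in range(len(centers)):
            members = training[labels == k]
            if len(members) > 0:
                centers[k] = members.mean(axis=0)
        # Step 4: stop once the relative drop in RMS error is small (cf. the QED measure of Section 4).
        if prev_mse is not None and (np.sqrt(prev_mse) - np.sqrt(mse)) / np.sqrt(prev_mse) < tol:
            break
        prev_mse = mse
    return centers, labels, mse

Any of the initialization techniques reviewed in Section 2, or the EFI seeds of Section 3, would be supplied through the seeds argument.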

2. Previous work on initialization techniques

A good initialization technique should possess two features. Firstly, it should enable the K-means process to achieve a small quantization error (distortion), and secondly, it should make the process converge quickly. In addition, it is clear that the initialization time itself should be short. The initialization methods which have been proposed include the following.

1. Binary Splitting [5]. Two initial cluster centers, Z(1) and Z(2), are first determined using the mean vector of the sample distribution, plus or minus one standard deviation. The K-means process is then applied to yield two representative vectors. Two new seeds are then derived from each representative vector as follows: Z(i) + P and Z(i) - P, where P is a fixed perturbation vector. The four vectors thus derived are then used as seeds to find four representative vectors, and the entire process is iterated until the desired number of seeds is obtained. This scheme generally produces satisfactory results, as is shown below. The initialization step can be quite time consuming, unless the number of iterations used to find the representative vectors at each step is limited.

2. Mode-seeking Initialization [6-8]. This scheme consists of two steps. A histogram of the sample data is first generated, and then the modes of the histogram are determined and used as the initial cluster centers. From the point of view of implementation, the main problem is the difficulty of calculating the histogram, particularly as the dimension of the vectors increases. A judicious choice of the appropriate resolution is required to decrease the computational requirements and at the same time generate the appropriate number of modes. A more fundamental shortcoming is that this method is not related to the quantization error criterion; in other words, some modes may correspond to tightly bound (low variance) clusters while others may have much larger variance.

3. Parametric Initialization [9]. In this method, the mean u(d) and standard deviation a(d) of each vector dimension of the sample data are first calculated. The value of the d-th component of the k-th seed is then given by

z(d, k) = (u(d) - c * a(d)) + (k - 0.5) * 2 * c * a(d) / K,

where c is a multiplicative factor which depends on the sample distribution. In other words, each dimension is partitioned according to the variance. This initialization approach generally gives acceptable results provided that the data is correlated in all its dimensions.
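The rule above can be written compactly; the sketch below is illustrative only, and the function name parametric_init and the NumPy formulation are assumptions, not taken from [9].

import numpy as np

def parametric_init(training, K, c=1.0):
    # training: (n, N) array of sample vectors; c: distribution-dependent factor.
    training = np.asarray(training, dtype=float)
    u = training.mean(axis=0)    # u(d): per-dimension mean
    a = training.std(axis=0)     # a(d): per-dimension standard deviation
    k = np.arange(1, K + 1)      # seed index k = 1, ..., K
    # z(d, k) = (u(d) - c*a(d)) + (k - 0.5) * 2*c*a(d) / K
    return (u - c * a)[None, :] + (k[:, None] - 0.5) * (2.0 * c * a)[None, :] / K

Each row of the returned (K, N) array is one seed; the seeds are spread uniformly over an interval of plus or minus c standard deviations about the mean in every dimension.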



4. Maximum Separation Initialization [10]. This method takes as seeds K sample vectors which are separated from each other as much as possible. It proceeds as follows:

Step 1. Set an initial threshold distance, T0.
Step 2. Let T = (1 - I * C) * T0, where I is the iteration number and C is a constant.
Step 3. Among the training set, find K vectors such that the distances between any two of them are bigger than T. These K vectors are chosen as the initial seeds.
Step 4. If this value of T does not yield enough seeds, then the value of I is increased by one, and the process is recommenced from Step 2.

This method is applicable to both correlated and uncorrelated data. However, it does not directly exploit the statistical characteristics of the data.
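A minimal sketch of this procedure follows, assuming Euclidean distance and a greedy scan of the training set; the function name, the default constant C and the bound on the number of rounds are illustrative assumptions.

import numpy as np

def max_separation_init(training, K, T0, C=0.05, max_rounds=100):
    # training: (n, N) array; T0: initial threshold distance; C: shrink constant.
    training = np.asarray(training, dtype=float)
    for I in range(max_rounds):
        T = (1.0 - I * C) * T0          # Step 2: shrink the threshold with the iteration number I
        seeds = []
        for v in training:              # Step 3: greedily collect vectors pairwise farther apart than T
            if all(np.linalg.norm(v - s) > T for s in seeds):
                seeds.append(v)
                if len(seeds) == K:
                    return np.array(seeds)
        # Step 4: not enough seeds at this threshold; increase I and try again.
    raise ValueError("could not find K sufficiently separated vectors")

Because T shrinks as I grows, the search keeps relaxing the separation requirement until K vectors qualify (or the bound on the number of rounds is reached).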

3. Error function initialization

Compared with the above techniques, the Error Function Initialization (EFI) presented in this paper has some distinguishing features. The main idea is that the seeds are chosen sequentially, one at a time. The position of each new seed is determined by maximizing the estimated amount of quantization error reduction achieved by creating one more output level for the quantizer.

Suppose that, for a given training set, i seeds have already been found. We can consider that they constitute a vector quantizer of i levels, VQ(i), where these i seeds are taken as the representative vectors. Applying VQ(i) to the training set results in a quantization error or distortion, E_i. If a new quantizer of i + 1 levels, VQ(i + 1), is constructed by adding a new representative vector or seed to VQ(i), then the resulting distortion value is given by E_{i+1}(p), where p is the position of the new representative vector or seed. Furthermore, E_{i+1}(p) < E_i, and the difference,

R_i(p) = E_i - E_{i+1}(p),

represents the quantization error reduction resulting from adding this new output level to the quantizer. Under the above conditions, R_i(p), the Error Reduction Function, varies with the position of the (i + 1)th seed.

The problem is now how to determine p, the position of the (i + 1)th seed, so that the Error Reduction Function, R_i(p), is maximized. It is computationally expensive to calculate the value of R_i(p) for every point p in the space so as to find the maximum. A reasonable alternative is to construct and compute a function whose behaviour correlates with the Error Reduction Function. We observe that every time a new output level is added, only those samples close to (neighbouring) the new output level are affected. This implies that R_i(p) can be estimated from the quantization error reductions of the samples near p. This estimate can be expressed as the product of two factors: the frequency of occurrence of the samples near p, and the distance to the nearest representative vector or seed.

Figure 1. The formation of the Quasi-Error Reduction Function in a two-dimensional vector space when there are already 3 seeds. Here f(p) is the frequency of occurrence of the histogram cell in which p is located and D(p) is the distance from the cell to the nearest seed.




To reduce the computational costs, a coarse histogram of the vectors is calculated and the neighbourhood of the point p is replaced by the cell of the histogram in which p is found (see Figure 1). Thus, R_i(p) can be estimated from a Quasi-Error Reduction Function, Q_i(p), which is defined by

Q_i(p) = f(p) * D_i(p)^2,

where f(p) is the frequency of occurrence of the histogram cell in which p is located and D_i(p) is the distance from that cell to the nearest of the i seeds already chosen. In computer implementation, Error Function Initialization consists of the following steps:

Step 1. Divide the vector space into M cells. Count the frequency of occurrence, f(p), in the training set, of each cell; in other words, generate a histogram.
Step 2. Sieve out and discard all the cells whose frequency falls below a threshold value.
Step 3. Choose the most frequent cell as the first seed.
Step 4. Select as the (i + 1)th seed the p which maximizes Q_i(p).
Step 5. Repeat Step 4 until the desired number of initial seeds has been generated.
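These steps can be summarized in the sketch below, which assumes a regular histogram with the same number of bins in every dimension and Euclidean distance between cell centers; the function name, the default bin count and the frequency threshold are illustrative assumptions rather than the parameters used in the experiments reported in Section 4.

import numpy as np

def efi_seeds(training, K, bins=11, min_count=1):
    # training: (n, N) array; bins: histogram resolution per dimension (Step 1);
    # min_count: frequency threshold used to sieve out sparse cells (Step 2).
    training = np.asarray(training, dtype=float)
    lo, hi = training.min(axis=0), training.max(axis=0)
    width = np.where(hi > lo, (hi - lo) / bins, 1.0)
    idx = np.minimum(((training - lo) / width).astype(int), bins - 1)

    # Step 1: count the frequency of occurrence f(p) of each occupied cell.
    cells, counts = np.unique(idx, axis=0, return_counts=True)
    # Step 2: sieve out cells whose frequency is below the threshold.
    keep = counts >= min_count
    cells, counts = cells[keep], counts[keep]
    centers = lo + (cells + 0.5) * width          # representative point of each cell

    # Step 3: the most frequent cell gives the first seed.
    seeds = [centers[counts.argmax()]]
    for _ in range(1, K):
        # Step 4: the (i+1)th seed maximizes Q_i(p) = f(p) * D_i(p)^2, where
        # D_i(p) is the distance from the cell to the nearest seed chosen so far.
        d2 = ((centers[:, None, :] - np.array(seeds)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        seeds.append(centers[(counts * d2).argmax()])
    return np.array(seeds)

The seeds returned in this way would then serve as the initial cluster centers for the K-means iteration sketched in Section 1.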

4. Experimental results

The performance of the Error Function Initialization has been investigated experimentally and compared with three other techniques: Binary Splitting, Parametric Initialization and Maximum Separation Initialization. Two different data sets are used for training. The first, referred to as 'correlated data', comes from a monochrome image where the vectors are formed from blocks of 2 × 2 pixels. The second set, referred to as 'differential data', comes from a Landsat thematic mapper image where the vectors are derived from 3 × 3 pixel blocks. The image is first divided into 3 × 3 pixel blocks; the 9 grey-level pixel values are then normalized by the block mean and variance, thus decreasing the correlation.


The experimental results are given in Tables 1 and 2, where E, P and S stand for Error Function Initialization, Parametric Initialization and Maximum Separation Initialization respectively, and B2 and B4 correspond to Binary Splitting where the number of iterations for each step in the initialization is limited to 2 and 4, respectively. The numbers shown for Binary Splitting include the number of iterations required in the initialization step. In the experiments, the mean square error (MSE) is used as the distortion measure. The convergence criterion in the experiments is the quantization error difference (QED) measure defined by

QED = (√MSE_{t-1} - √MSE_t) / √MSE_{t-1},

Table 1
Comparison of the performances of the initialization techniques for 'correlated data' (vector dimension: 4)

Number of clusters   Method   Number of iterations   Initial distortion   Final distortion
                              for convergence        (MSE)                (MSE)

4                    E         4                      247.2                229.1
                     P        14                     1140.6                229.8
                     S         5                     1029.7                230.0
                     B2        7                     1180.2                229.5
                     B4        8                      982.0                230.5

8                    E         6                       94.7                 79.7
                     P        21                      126.7                 93.6
                     S        20                      327.4                 92.1
                     B2        9                      209.0                 79.7
                     B4       13                      207.8                 79.7

16                   E        13                       47.8                 44.8
                     P         9                       72.5                 52.1
                     S        15                      287.0                 48.9
                     B2        6                       71.3                 44.6
                     B4        8                       70.1                 44.6

32                   E        20                       35.8                 30.1
                     P        21                       53.9                 31.0
                     S        27                      181.9                 30.6
                     B2       20                       39.6                 29.4
                     B4       25                       39.5                 28.8

64                   E        17                       32.4                 19.8
                     P        16                       48.2                 22.0
                     S        19                       75.4                 21.4
                     B2       19                       32.1                 19.8
                     B4       24                       30.9                 19.8

128                  E        17                       31.6                 13.8
                     P        19                       46.2                 14.7
                     S        21                       45.4                 14.9
                     B2       20                       24.0                 13.6
                     B4       17                       20.5                 13.6



where t - 1 and t refer to the previous and current iterations, respectively. A value of 0.001 is used for QED in Table 1, and 0.005 in Table 2.

Table 2
Comparison of the performances of the initialization techniques for the normalized 'differential data' (vector dimension: 9)

Number of clusters   Method   Number of iterations   Initial distortion   Final distortion
                              for convergence        (MSE)                (MSE)

4                    E         4                      0.696                0.567
                     P         3                      1.096                0.939
                     S         4                      0.878                0.590
                     B2       11                      0.919                0.567
                     B4       10                      0.812                0.572

8                    E         4                      0.554                0.470
                     P        10                      1.003                0.471
                     S         6                      0.754                0.472
                     B2        7                      0.689                0.479
                     B4       10                      0.588                0.472

16                   E         4                      0.457                0.381
                     P         9                      0.983                0.385
                     S         6                      0.639                0.399
                     B2        8                      0.507                0.383
                     B4       10                      0.486                0.380

32                   E         5                      0.431                0.315
                     P        11                      0.979                0.322
                     S         7                      0.413                0.317
                     B2        8                      0.412                0.320
                     B4        9                      0.392                0.318

64                   E         6                      0.383                0.262
                     P        10                      0.979                0.263
                     S         6                      0.424                0.264
                     B2        8                      0.346                0.262
                     B4       10                      0.328                0.262

128                  E         8                      0.356                0.214
                     P        11                      0.979                0.216
                     S         7                      0.336                0.216
                     B2        8                      0.285                0.215
                     B4       10                      0.274                0.214

The above experimental results show that EFI can be applied to data with varying degrees of correlation, and in general achieves a low final distortion with fewer iterations. For the 'correlated' data set, the results compare favourably with the Binary Splitting technique, whereas for the less correlated 'differential' data, the results are comparable to those of Maximum Separation Initialization. Note that the computational overhead incurred by using EFI corresponds to much less than one iteration of the K-means algorithm.


Furthermore, the histograms generated for the Error Function Initialization in the experiments are quite coarse. For the first, 'correlated' data set, the histogram contains a total of only 256 cells. For the 'differential' data, each dimension of the vector space is divided into 11 bins or cells.

5. Conclusion

A sequential initialization technique called Error Function Initialization has been introduced for choosing the initial seeds in the K-means algorithm. This technique is shown to compare favourably with other techniques for image vector quantization applications, especially as regards the number of iterations required to achieve convergence. One possible application of EFI is as a replacement for the time-consuming K-means algorithm in cases where real-time processing is required.

References

[1] Buzo, A., A.H. Gray, Jr., R.M. Gray and J.D. Markel (1980). Speech coding based upon vector quantization. IEEE Trans. Acoust., Speech and Signal Proc. 28 (5), 562-574.
[2] Wong, D.Y., B.H. Juang and A.H. Gray, Jr. (1981). Recent developments in vector quantization for speech processing. IEEE Intl. Conf. Acoust., Speech and Signal Proc., 1-4.
[3] Gray, R.M. (1984). Vector quantization. IEEE ASSP Mag., 4-29.
[4] Tou, J.T. and R.C. Gonzalez (1974). Pattern Recognition Principles. Addison-Wesley, Reading, MA.
[5] Linde, Y., A. Buzo and R.M. Gray (1980). An algorithm for vector quantizer design. IEEE Trans. on Communications 28 (1), 84-95.
[6] MacCalla, J.R. and M.V. Chang (1980). Multispectral data compression using hybrid cluster coding. American Institute of Aeronautics and Astronautics Communications Satellite Systems Conf., 404-407.
[7] Narendra, P.M. and M. Goldberg (1977). A non-parametric clustering scheme for LANDSAT. Pattern Recognition 9, 207-215.
[8] Wharton, S.W. (1983). A generalized histogram clustering scheme for multidimensional image data. Pattern Recognition 16 (2), 193-199.
[9] Hilbert, E.E. (1977). Cluster compression algorithm, a joint clustering/data compression concept. Jet Propulsion Laboratory Publication 77-43.
[10] Sun, H.F. and M. Goldberg (1984). Image sequence coding using vector quantization. Proc. of the IEEE Intl. Communications and Energy Conf., Montreal, Canada, 266-269.