Signal Processing 67 (1998) 163—172
A 2-D vector excitation coding technique

Nasir Memon*
Department of Computer Science, Northern Illinois University, DeKalb, IL 60115, USA

Received 24 March 1997; received in revised form 12 January 1998
Abstract

We present a VQ-based technique for coding image data that adopts an analysis-by-synthesis approach. We define a new type of spatial interaction model for image data, called a prediction pattern, which we use along with an excitation vector to generate an approximation of an input block of pixels. A prediction pattern is a k×k array with each element representing a prediction scheme from a given set of predictors; it captures the spatial dependencies present in an image block. Given a codebook of prediction patterns and a codebook of excitation vectors, we encode an image by partitioning it into blocks and, for each block, identifying the prediction pattern from within the codebook that best models the spatial dependencies present in the block. We then search the excitation codebook for a code vector that, in combination with the already chosen prediction pattern, results in the synthesis of the closest approximation to the current image block. We present algorithms for codebook design and give implementation results. The proposed technique gives promising results. © 1998 Elsevier Science B.V. All rights reserved.

Keywords: Image compression; Vector excitation coding; Prediction patterns; Vector quantization

* Tel.: (815) 753 6944; fax: (815) 753 0342; e-mail: [email protected].

0165-1684/98/$19.00 © 1998 Elsevier Science B.V. All rights reserved. PII S0165-1684(98)00034-6
1. Introduction

Vector quantization (VQ) has been found to be a very efficient compression technique due to its inherent ability to exploit the correlation between neighboring pixels. In fact, rate-distortion theory shows that memoryless VQ is capable of achieving the ultimate performance limits of data compression [6]. Despite this, VQ techniques find it difficult to outperform transform-based techniques [9]. The reason is that moderate to large vector dimensions and codebooks are needed before the rate-distortion advantages of VQ come into effect. The time and space complexity encountered while designing vector quantizers of large vector dimension is formidable, to say the least. For example, for a vector dimension of 36 (corresponding to a 6×6 image block) and a bit rate of 0.5 bpp, we require a codebook of size 2^18 = 262 144. This would be prohibitive for most applications. For higher bit rates or larger dimensions, the problem gets even worse. Researchers have identified various approaches to overcome this problem, including tree-structured VQ, product-code VQ, shape-gain VQ, multistage VQ, predictive VQ, hierarchical VQ and finite-state VQ, to name a few [1,6]. One such class of techniques, applied very successfully to speech coding, is generically referred to as vector excitation coding (VXC) [6], code-excited linear prediction (CELP) [10] being one of the better-known VXC techniques. VXC techniques retain the large dimension required for effective vector quantization but drastically reduce the size of the codebook. Reducing the codebook size for a fixed bit
rate allows transmission of a large amount of side information that specifies a model for the input vector — in other words, a processing operation that will convert the code vector into a better approximation of the input vector. In CELP, the processing operation is a time-varying shaping filter. The parameters that describe the filter are determined by performing linear prediction analysis on the input vector X(n). For speech data, if the prediction order is sufficiently large then the prediction residual is white noise and can be quantized effectively by a relatively small codebook, the elements of which are called excitation vectors. The parameters specifying the filter are quantized and transmitted to the receiver for each set of m input vectors, along with the codebook indices of the excitation vectors. In the closed-loop form of VXC (or CELP), after quantization of the filter parameters, the index of the excitation that gives the best reconstructed vector when used in conjunction with the shaping filter is determined and transmitted. This approach is known as an analysis-by-synthesis approach [3]. Closed-loop VXC techniques have been used very successfully for coding speech data [2]. In the case of image data, however, the same success has not been achieved, because it is difficult to model such data as the output of a combination of a finite set of shaping filters and excitation signals. In this paper we present a VQ-based technique for coding image data that, like closed-loop VXC, adopts an analysis-by-synthesis approach. We define a new type of spatial interaction model called a prediction pattern, which we use along with a
quantized residual vector, or excitation vector, to generate an approximation of an input block of pixels. A prediction pattern captures the essential structure of inter-pixel relationships in the image block; having captured this structure, all that is needed is a sequence of residuals that leads to a better approximation of the image block. The paper is organized as follows. In the next section we introduce the notion of a prediction pattern and show how an image can be encoded using a codebook of prediction patterns and a codebook of excitation vectors. In Section 3 we present algorithms for the joint design of codebooks of prediction patterns and excitation vectors. We also consider the problem of designing the set of underlying predictors used to form a prediction pattern, and give an iterative algorithm for predictor and codebook design that descends to a local optimum. In Section 4 we apply the techniques developed to image data and demonstrate that initial results are indeed promising. We conclude in Section 5 with a discussion of proposed future developments. The modeling paradigm that we have introduced opens up many avenues for further research and can potentially lead to a new family of vector quantization techniques.
2. Prediction patterns

Given a set of prediction schemes F = {f_1, f_2, …, f_r}, we define a prediction pattern T = (t_ij) to be a k×k array with each element belonging to the given set F of predictors. If each predictor in T is causal with respect to the raster scan then we say the prediction pattern is causal. In this paper we restrict our attention to causal prediction patterns. For the sake of brevity we drop the qualification and henceforth simply refer to a causal prediction pattern as a prediction pattern.

A prediction pattern T and a vector E consisting of prediction errors (residuals) can be used to synthesize an image block B, pixel by pixel, in raster order (top to bottom, left to right within each row), by using the predictor specified by T[i, j] to arrive at an estimate of the intensity value at the spatial location (i, j) and adding to it the residual E[i, j] to get back the original value B[i, j]. A very simple example is provided by letting F be the following prediction schemes:
1. H: P̂[i, j] = P[i, j−1],
2. V: P̂[i, j] = P[i−1, j],
3. L: P̂[i, j] = P[i−1, j−1],
4. R: P̂[i, j] = P[i−1, j+1],
where P̂[i, j] represents the predicted value for P[i, j]. For this example set of predictors, Fig. 1 shows a 4×4 image block along with a prediction pattern and residual vector that can be combined to exactly synthesize the image block by utilizing previously reconstructed pixel values from neighboring blocks (shown shaded in the figure). Note that for clarity we use the symbols H, V, L and R to denote a specific predictor from the above set. The set of predictors in this example is simple; in practice, we could use a more sophisticated set F designed to effectively capture the spatial dependencies present in the image block. A discussion on selecting F is given in the next section.
Fig. 1. Left: a 4×4 image block. Middle: a prediction pattern that generates the block from known neighborhood pixels (shaded). Right: corresponding prediction residuals needed to exactly synthesize the block.
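As an illustrative sketch of this synthesis procedure (the paper specifies no implementation; the function name and padded-array layout below are our own assumptions, using the H, V, L and R predictors of the example):

```python
import numpy as np

# Hypothetical one-tap predictor set from the example: label -> neighbour offset.
OFFSETS = {'H': (0, -1), 'V': (-1, 0), 'L': (-1, -1), 'R': (-1, 1)}

def synthesize(pattern, residual, neighbors):
    """Reconstruct a k x k block pixel by pixel in raster order.

    pattern   -- k x k array of predictor labels ('H', 'V', 'L', 'R')
    residual  -- k x k array of prediction errors E[i, j]
    neighbors -- (k+1) x (k+2) array whose first row and first/last columns
                 hold previously reconstructed pixels; the block itself
                 occupies rows 1..k, columns 1..k of this padded array
    """
    k = len(pattern)
    P = np.asarray(neighbors, dtype=float).copy()
    for i in range(k):
        for j in range(k):
            di, dj = OFFSETS[pattern[i][j]]
            # estimate from the specified neighbour, then add the residual
            P[i + 1, j + 1] = P[i + 1 + di, j + 1 + dj] + residual[i][j]
    return P[1:k + 1, 1:k + 1]
```

Because reconstruction proceeds in raster order, each predictor only ever reads pixels that are already available, which is exactly the causality restriction imposed above.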
Clearly, there are many prediction pattern/residual vector combinations that can be used to synthesize a given image block. Naturally, given an image block, we would like to find a prediction pattern that results in a residual vector of minimum energy. In Fig. 2 we show one such prediction pattern and the corresponding minimum-energy residual vector for the image block in Fig. 1. A prediction pattern that minimizes the mean-squared prediction error will vary from block to block. However, there are many segments within a digital image that possess similar patterns of inter-pixel relationships. Hence, if we partition the image into k×k blocks, it is reasonable to expect to find prediction patterns that are near optimal for a multitude of blocks. This leads to the idea of constructing a codebook of prediction patterns and, for each block in the image, identifying the best prediction pattern from within the codebook and transmitting the prediction residuals. In the examples used so far the image block is synthesized exactly. However, if the prediction residuals are quantized then we get an approximate reconstruction of the image block. In this case prediction is done based on reconstructed pixels of the neighboring and current blocks, and the quantized residual is added to this predicted value to arrive at an approximate reconstruction of the current pixel. Now that we have a way of approximately synthesizing image blocks, we are ready to describe a VXC-type technique for coding image data. Instead of synthesis filters we have prediction patterns, and the quantized prediction residual vectors serve the role of excitation vectors. In the rest
of this paper we use the terms quantized residual vector and excitation vector interchangeably. The VXC-type technique we propose works as follows. Given an image, a codebook of prediction patterns, and a codebook of excitation vectors, we encode the image block by block, identifying for each block the prediction pattern from within the codebook that best models the spatial dependencies that are present. By best we mean the prediction pattern that yields the least-mean-squares prediction error when used on the image block. Having identified the prediction pattern we then search the excitation codebook for the code vector that in combination with the already chosen prediction pattern results in the synthesis of the closest approximation of the current image block. Here again, the notion of closeness is interpreted in the mean-square sense. We then transmit the index of the prediction pattern along with the index of the excitation vector to the receiver. The receiver reconstructs the block pixel by pixel, by predicting the value of the current pixel by the scheme specified by the corresponding entry in the prediction pattern and then using the associated excitation value to reconstruct a better approximation of the original pixel. In Fig. 3 we show how an approximation of the image block in Fig. 2 can be constructed, given a prediction pattern and an excitation vector. Note that the technique outlined above also bears a relationship to DPCM with switched predictors. A prediction pattern specifies a particular switching sequence. However, only a limited number of ways of switching between predictors is
Fig. 2. Left: a 4×4 image block. Middle: a prediction pattern that results in a residual vector of minimum energy. Right: residuals needed to exactly synthesize the block.
Fig. 3. Left: a prediction pattern. Middle: excitation vector. Right: resulting image block that is synthesized.
allowed, as specified by the prediction pattern codebook. Also, when selecting a prediction pattern (or, equivalently, a switching sequence), original pixel values are used, as opposed to the reconstructed values used in DPCM with switched predictors. The question that arises is: how do we construct the codebooks of prediction patterns and excitation vectors required by the technique outlined above? In the next section we address the codebook design problem. Before we do this, we would like to point out that the encoder search complexity of the above technique, just like that of any product-code VQ or multistage VQ, is simply the sum of the sizes of the two codebooks involved. The number of possible reconstruction vectors, however, is the product of the sizes of the two codebooks. The decoder, on the other hand, is extremely simple, as it needs only two table look-ups followed by reconstruction of the image block. The extra complexity of the proposed technique comes from performing predictions for each pixel in the image block, at both the encoder and decoder ends. However, as is shown later, simple predictors requiring only shifts and additions yield reasonable performance.
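The two-stage search described above can be sketched as follows. This is a hypothetical illustration: the function names, the padded-array layout, and the one-tap predictor set are our own assumptions, not part of the original scheme. Note that pattern selection uses original pixels (open loop), while excitation selection is analysis by synthesis.

```python
import numpy as np

# Assumed one-tap predictor set from Section 2: label -> neighbour offset.
OFFSETS = {'H': (0, -1), 'V': (-1, 0), 'L': (-1, -1), 'R': (-1, 1)}

def synthesize(pattern, excitation, neighbors):
    """Reconstruct a k x k block in raster order from a (k+1) x (k+2)
    padded array of previously reconstructed neighbour pixels."""
    k = len(pattern)
    P = np.asarray(neighbors, dtype=float).copy()
    for i in range(k):
        for j in range(k):
            di, dj = OFFSETS[pattern[i][j]]
            P[i + 1, j + 1] = P[i + 1 + di, j + 1 + dj] + excitation[i][j]
    return P[1:k + 1, 1:k + 1]

def residual_energy(pattern, block, neighbors):
    """Squared prediction error of `pattern` on the ORIGINAL block pixels
    (open-loop selection, as specified in the text)."""
    k = block.shape[0]
    P = np.asarray(neighbors, dtype=float).copy()
    P[1:k + 1, 1:k + 1] = block
    err = 0.0
    for i in range(k):
        for j in range(k):
            di, dj = OFFSETS[pattern[i][j]]
            d = block[i, j] - P[i + 1 + di, j + 1 + dj]
            err += d * d
    return err

def encode_block(block, neighbors, pattern_book, excitation_book):
    """Return (pattern index, excitation index, reconstructed block)."""
    # Stage 1: pattern that best models the block's spatial dependencies.
    p = min(range(len(pattern_book)),
            key=lambda n: residual_energy(pattern_book[n], block, neighbors))
    # Stage 2: analysis by synthesis over the excitation codebook.
    errs = [np.sum((block - synthesize(pattern_book[p], E, neighbors)) ** 2)
            for E in excitation_book]
    e = int(np.argmin(errs))
    return p, e, synthesize(pattern_book[p], excitation_book[e], neighbors)
```

The encoder cost is thus one pass over each pattern plus one synthesis per excitation candidate — the sum, not the product, of the two codebook sizes.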
3. Codebook design

We need to design codebooks for both prediction patterns and excitation vectors. A simple approach is to first design a codebook of prediction patterns. This codebook is then fixed and used to generate a training sequence for excitation codebook design. We give details of the design process below.
3.1. Designing a codebook of prediction patterns

The problem is to design an optimal set of prediction schemes and an optimal codebook of prediction patterns, given an image (or class of images). The problem stated in this manner seems formidable. In order to make it more tractable, we impose some structure on the set of predictors. We defer the discussion on designing F to later in this section. For the moment let us assume that F has been suitably selected.

An optimal codebook of prediction patterns for the current choice of F can be constructed by using the generalized Lloyd algorithm (GLA) [8]. Let B = {B_1, B_2, …, B_l} be a training set of k×k image blocks from which we wish to construct a codebook of prediction patterns C = {T_1, T_2, …, T_n}. We define the distortion incurred in using the prediction pattern T = (t_ij) on block B to be D(T, B), given by

    D(T, B) = Σ_{i=1}^{k} Σ_{j=1}^{k} |d(B, i, j, t_ij)|²,    (1)

where d(B, i, j, t_ij) denotes the prediction error when the prediction scheme specified by t_ij is used on the pixel B[i, j]. Let T_(C,B) denote the prediction pattern from C which yields the minimum squared error when used on block B. Our goal is to construct a codebook C such that Σ_{i=1}^{l} D(T_(C,B_i), B_i) is minimized.

Having defined a distortion measure, in order to use a GLA-like algorithm, we also need a well-defined and efficiently computable 'centroid' prediction pattern for a given cluster of image blocks
that results in the minimum distortion when used as a representative for the cluster. It is easy to show that, given a cluster B = {B_1, B_2, …, B_m} of image blocks each of size k×k, the prediction pattern T that gives the minimum distortion Σ_{i=1}^{m} D(T, B_i) is given by

    T[i, j] = argmin_{1 ≤ s ≤ |F|} Σ_{k=1}^{m} |d(B_k, i, j, f_s)|².    (2)

Essentially, the (i, j)th element of the centroid prediction pattern is the prediction scheme from F that minimizes the mean-squared prediction error over all pixels at location (i, j) in the given cluster of image blocks for which we wish to compute a centroid prediction pattern. Once we are equipped with the notion of a centroid prediction pattern for a cluster of image blocks, we can use the GLA to design a codebook of prediction patterns from a training set.
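Under the assumption of simple one-tap predictors, the centroid computation of Eq. (2) can be sketched as follows (function name and data layout are our own):

```python
import numpy as np

def centroid_pattern(cluster, predictors):
    """Eq. (2): for each position (i, j), pick the predictor from F that
    minimizes the total squared prediction error over the cluster.

    cluster    -- list of (k+1) x (k+2) padded arrays; each original block
                  occupies rows 1..k, columns 1..k, with its true
                  neighbour pixels around it
    predictors -- dict label -> (di, dj) neighbour offset (assumed one-tap
                  linear predictors, for illustration only)
    """
    k = cluster[0].shape[0] - 1
    pattern = [[None] * k for _ in range(k)]
    for i in range(1, k + 1):
        for j in range(1, k + 1):
            def cost(label):
                di, dj = predictors[label]
                # total squared prediction error at (i, j) over the cluster
                return sum((B[i, j] - B[i + di, j + dj]) ** 2
                           for B in cluster)
            pattern[i - 1][j - 1] = min(predictors, key=cost)
    return pattern
```

Each array position is optimized independently, which is what makes the centroid both well defined and cheap to compute.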
3.2. Designing the set of predictors

Above we presented an algorithm that constructs a codebook of prediction patterns, given a set of predictors F. But how do we select F? Although the literature abounds with techniques for estimating coefficient values for predictive and random-field models, techniques for selecting the neighborhood set on which prediction is based have not been reported. Generally, neighborhood sets are selected in an ad hoc manner
Table 1
Set of predictors used in simulations

Prediction for P[i, j]
P[i−1, j]
P[i−1, j−1]
P[i, j−1]
P[i−1, j+1]
P[i, j−1] + P[i−1, j] − P[i−1, j−1]
P[i, j−1] + (P[i−1, j] − P[i−1, j−1])/2
P[i−1, j] + (P[i, j−1] − P[i−1, j−1])/2
(P[i, j−1] + P[i−1, j])/2
by using a fixed set of spatially adjacent pixels that are closest to the predicted pixel. Kashyap and Chellappa [7] propose a procedure for selecting among competing neighborhood sets but do not provide any mechanism for selecting the candidate sets to be considered in the first place. One approach is to select neighborhood sets that comprise a small but well-defined collection of the four or eight neighbors of the predicted pixel. For instance, the simple set of prediction schemes listed in the example in Section 2 provides one such alternative. A more elaborate collection of neighborhood sets can be formed by taking all possible subsets of these four pixels, giving a total of 16 predictors. The initial coefficients assigned to each element could simply be 1/k, where k is the cardinality of the set. Alternatively, coefficients could be optimized for the training set under consideration. Note that in the examples discussed so far, each predictor in F uses a different neighborhood set. This, in general, need not be the case: we could have predictors with the same neighborhood set but different prediction coefficients. Indeed, there are many possibilities for initializing the set of predictors F. If we assume that the elements of F are all linear predictors, then the coefficients for this set of predictors and the codebook of prediction patterns can be jointly optimized in the manner described below. After initializing F, we iteratively descend to a locally optimal design by repeating the following two steps. In the first step we design a locally optimal codebook for the current choice of F. In the second step we derive optimal coefficients for the prediction schemes in F based on the current codebook. The codebook is now no longer optimal with respect to the set of predictors F, so we repeat step 1, followed by step 2, and so on.
It is easy to see that the above procedure converges; we stop when the reduction in prediction error falls below a pre-decided threshold. We have already presented an algorithm for designing a codebook of prediction patterns given a set of predictors. For a given codebook of prediction patterns, coefficients for the prediction schemes in F that minimize a given cost criterion can be computed by collecting all pixels that get mapped to a particular prediction scheme and then
computing optimal coefficients. If the mean-square error is chosen as the cost criterion to minimize, then it is easy to compute optimal coefficients using standard linear regression techniques. For the sake of brevity, we omit the details.
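For the mean-square criterion, the coefficient update for a single prediction scheme reduces to ordinary least squares over the pixels mapped to that scheme; a minimal sketch (the function name and data layout are our own assumptions):

```python
import numpy as np

def fit_coefficients(samples):
    """MSE-optimal linear predictor coefficients for one prediction scheme.

    samples -- list of (neighbour_values, actual_pixel) pairs collected
               from all pixels mapped to this scheme by the current
               prediction pattern codebook
    """
    X = np.array([n for n, _ in samples], dtype=float)  # neighbour matrix
    y = np.array([v for _, v in samples], dtype=float)  # target pixels
    # Ordinary least squares: minimize ||X c - y||^2 over c.
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs
```

Running this once per scheme completes step 2 of the iteration described above.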
3.3. Designing the excitation codebook

Once a codebook for the prediction patterns is fixed, we can design a codebook for the excitations based on the ideal residual sequence generated when the training sequence is modeled by the optimal prediction pattern identified from the codebook. That is, we take a training sequence of image blocks from the image data and for each block first identify the prediction pattern from the codebook that gives the least mean-squared prediction error. The prediction residuals obtained from each block give us the residual sequence from which we design an excitation codebook by using the generalized Lloyd algorithm. The distortion measure we used for excitation codebook design was the mean-square distortion measure. The entire design process is described in more detail below.
1. Select best prediction pattern. For each image block B_i in the training sequence {B_1, B_2, …, B_l}, find the prediction pattern from the codebook that minimizes the mean-squared error of the residual.
2. Form residuals. For each image block B_i in the training sequence and corresponding prediction pattern T_(C,B_i), form the residual vector E_i that can exactly synthesize the image block. The sequence E_1, E_2, …, E_l represents the training sequence for residuals.
3. Initialization. Let the number of iterations i = 0. Set the total distortion for the ith iteration, D_i, to 0. Assign the initial excitation codebook Ê⁰ = {Ê_1, …, Ê_n} in an appropriate manner (possibly at random). Let ε > 0 be some predetermined threshold value.
4. Classification. For each residual vector E_j in E_1, …, E_l, find Ê_k ∈ Êⁱ such that d(Ê_k, E_j) is minimum. Classify residual E_j in cluster C_k. Add d(Ê_k, E_j) to D_i.
5. Codebook updating. For each cluster C_k of blocks, compute the centroid E′_k and assign it to the excitation codebook Ê^{i+1}.
6. Termination test. If D_{i−1} − D_i < ε, terminate. Else, increment the number of iterations i by one and go back to the classification stage.

The approach outlined above for codebook design is called an open-loop approach, because during both prediction pattern and excitation codebook design, prediction residuals in the current block are generated using the original neighboring pixels. In practice, however, since the original image is not available to the receiver, prediction is only possible with respect to the reconstructed image. Hence the codebook is not optimal for the actual data being encoded. However, if the resulting reconstructed image blocks are of sufficiently high quality, then they will be very close to the original and the codebook should give close to optimal quality.

The prediction pattern codebook and the excitation codebook can also be constructed by using a closed-loop approach similar to that given in [5]. In such an approach the prediction pattern codebook is first designed just as in the open-loop method and subsequently remains fixed during the rest of the design process. In the design of the excitation codebook, however, pixels within the current block are predicted with respect to the corresponding adjacent reconstructed pixels. This differs from the open-loop approach, where prediction was done with respect to the original neighboring pixels. In the closed-loop design process the training sequence of residuals changes with every iteration, and hence convergence to a local minimum is not guaranteed. However, it has been observed in practice that the closed-loop technique gives an improvement over the open-loop technique.
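Steps 3–6 above (the classification/update loop of the generalized Lloyd algorithm under squared-error distortion) can be sketched as follows; the function name and vectorized layout are our own:

```python
import numpy as np

def gla(residuals, codebook, eps=1e-6, max_iter=100):
    """Generalized Lloyd algorithm over a training sequence of residual
    vectors, with squared-error distortion (steps 3-6 of the design
    procedure). Returns the updated codebook and final total distortion."""
    residuals = np.asarray(residuals, dtype=float)
    codebook = np.asarray(codebook, dtype=float).copy()
    prev = np.inf
    total = 0.0
    for _ in range(max_iter):
        # Classification: squared distance from every residual to every
        # code vector; assign each residual to its nearest code vector.
        d2 = ((residuals[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        total = d2[np.arange(len(residuals)), labels].sum()
        # Termination test: stop when the distortion reduction is small.
        if prev - total < eps:
            break
        prev = total
        # Codebook updating: centroid of each non-empty cluster.
        for k in range(len(codebook)):
            members = residuals[labels == k]
            if len(members):
                codebook[k] = members.mean(0)
    return codebook, total
```

In the closed-loop variant described above, the `residuals` array would itself be regenerated from reconstructed pixels after each codebook update, which is why convergence is no longer guaranteed.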
4. Simulation results

In Table 2 we give results obtained on a set of 576×720 images from the JPEG test set. The original images are color images with three color planes: Y, U and V. We took the Y plane as our test image. In all cases, the codebooks were generated using the Balloon, Barb, Hotel and Zelda images, and encoding results are presented for the remaining images, which were outside this training set.
Table 2
Comparison of PSNR and bit rates obtained by the proposed technique (PPVQ), standard VQ codecs (FSVQ and PTSVQ), and JPEG

                 PPVQ                      FSVQ            PTSVQ           JPEG
Image    PSNR   % Smooth   Rate      PSNR    Rate     PSNR    Rate     PSNR    Rate
Board    33.39    60%      0.26      31.04   0.38     31.06   0.29     33.16   0.27
Girl     32.97    26%      0.36      31.02   0.41     31.13   0.36     33.01   0.35
Barb2    26.22    30%      0.35      26.14   0.45     26.64   0.46     28.16   0.43
Gold     30.86    15%      0.39      29.79   0.44     29.54   0.39     30.60   0.33
Boats    30.81    43%      0.32      29.80   0.41     30.15   0.36     31.35   0.33
First, we generated a codebook of 256 prediction patterns for 6×6 image blocks by using the algorithm presented in the previous section. The set of predictors used as the initial set is shown in Table 1; this is essentially the set of predictors specified by the lossless JPEG standard [11]. Codebook initialization was done by randomly selecting blocks from the training set and computing the best prediction scheme for each pixel in the block, which was used to form a prediction pattern that was placed in the codebook. Six iterations of the GLA were then run on this initial codebook. Prediction coefficient optimization was not done in this implementation; the same predictors were used in each iteration, without changing their coefficients. Once we obtained a codebook of 256 prediction patterns, we fixed this codebook and used it for excitation codebook construction. An excitation codebook of size 1024 was designed in a closed-loop manner, as described in the previous section. It was observed that in smooth areas of the image the excitation vector has very low energy and can be quantized to the all-zeros vector without significant loss in performance. Here, the prediction pattern alone is enough to reconstruct a good approximation of the original block. In such a case, a special symbol can be transmitted indicating an all-zero excitation vector. Furthermore, such blocks were not taken into consideration while designing the excitation codebook. A threshold of 9 was selected: if the average mean-square error in the block was below this threshold, no excitation was transmitted. A different set of images, none of which belonged to the training set, was then encoded using the pair
of codebooks that was constructed. The first three columns in Table 2 give the results obtained with the proposed technique, which we call predictive pattern VQ (PPVQ). The bit rates were obtained by taking the zero-order entropy of the codeword indices. The second column gives the percentage of vectors that could be reconstructed satisfactorily (average mean-square error less than or equal to 9) by an appropriate prediction pattern alone and did not require the transmission of an excitation vector. Depending on this value, the number of bits needed to encode the residual vectors varied from 30% to 60% of the total bit budget. The large number of smooth blocks in most images necessitated a variable-length coding technique, which reduced the bit rate to 50% to 75% of the rate that would be needed with fixed-length coding. For the sake of comparison we then used the public domain vector quantization package by Jill Goldschneider, available from ftp://isdl.ee.washington.edu/pub/VQ/code/. The package contains two separate codecs. The first codec is based on a simple full-search VQ (FSVQ). It runs the GLA for codebooks of size 2^n, n = 0, 1, 2, …, until the final size is reached; each increase in the size of the codebook is done by splitting codewords from the next smallest codebook. The second codec is the well-known pruned tree-search VQ (PTSVQ) [6] technique. Codebooks of size 256 containing vectors of dimension 16 were constructed with these two codecs, using the same training set that was used for constructing the PPVQ codebooks. Columns 4–7 in Table 2, labelled FSVQ and PTSVQ, show the PSNR values and bit rates that were obtained on
the test set. To make a fair comparison, the bit rates were again obtained from the zero-order entropy of the codeword indices. Sample images are shown in Figs. 4 and 5: Fig. 4 shows the girl image reconstructed using FSVQ and Fig. 5 shows the reconstruction with PPVQ. In the last two columns of Table 2 we give the rates obtained using the JPEG standard [11]. The particular implementation used in this work was the public domain implementation provided by the Independent JPEG Group; a quality factor Q of 15 was used for all images.

Fig. 6. Girl image with JPEG.

We see that PPVQ gives results comparable to JPEG, in contrast to FSVQ and PTSVQ. Further, an examination of the image reconstructed with JPEG in Fig. 6 shows blocking
and ringing artifacts and a poorer subjective quality as compared to PPVQ. We note that better SNR and PSNR values than those given in Table 2, at comparable bit rates, have been reported in the literature with sophisticated enhancements to the basic VQ technique. However, most such enhancements can easily be incorporated into the scheme presented here. We have deliberately used simple codebook generation, organization and search techniques so that a proper estimate of the gains made by the proposed technique can be obtained.
Fig. 4. Girl image with FSVQ.
Fig. 5. Girl image with PPVQ.
5. Conclusions and future work

We have introduced a new type of spatial interaction model for image data called a prediction pattern. A prediction pattern is simply an array of prediction schemes from a fixed set that captures the spatial inter-relationships between intensity values in a local neighborhood. We showed how a prediction pattern along with an excitation vector can exactly synthesize an image block; when the excitation vector is quantized we get an approximation of the image block. We then developed an image coding scheme that partitions an image into blocks and for each block finds the best prediction pattern from a codebook of prediction patterns. An excitation codebook is then searched for the vector that yields the best reconstructed block when used in conjunction with this prediction pattern. Algorithms for
codebook design were developed for both prediction patterns and excitation vectors. The issue of designing an appropriate set of prediction schemes was also discussed. Preliminary results compared favorably with two other VQ codecs, a full-search VQ and a pruned tree-search VQ available in the public domain. What we have presented in this paper is just an approach, together with strong evidence that it leads to new and efficient image coding techniques. However, many issues are still unresolved and many improvements and refinements remain to be made. First of all, some aspects of our technique are ad hoc and need improvement. We have initialized the set of prediction schemes F in an ad hoc manner; the question of selecting the best set of predictors for a given image or family of images needs to be addressed in a more formal setting. Also, we have only used linear and causal predictors in our design. Next, the size of the codebook of prediction patterns was also selected in an ad hoc manner. Perhaps alternative clustering techniques such as PNN [4] can be used to jointly optimize the size of the codebook along with the cost of encoding the image with respect to the codebook. There are other improvements that can be made to the basic technique we have reported. The performance of the GLA can be improved by periodically splitting large clusters, and by stochastic relaxation techniques that avoid local minima. Many alternative distortion measures are also possible when selecting the prediction pattern for a specific block. Another way to improve the codebook is to separate image blocks into different categories based on the texture and edges present within each block; a different codebook of prediction patterns can then be designed for each type of block. The codebooks can also be constructed and maintained in an adaptive manner, which would lead to a symmetric algorithm.
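The cluster-splitting refinement of the GLA mentioned above could take the following form: after the usual nearest-neighbour assignment and centroid update, any cell left empty is re-seeded with a perturbed copy of the centroid of the most populated cluster. This is a sketch under our own assumptions (the function name, perturbation scheme and step size are illustrative, not taken from the paper):

```python
import numpy as np

def gla_step_with_splitting(codebook, training, eps=1e-3):
    """One GLA iteration: nearest-neighbour assignment, centroid update,
    and re-seeding of empty cells by splitting the largest cluster."""
    # Nearest-neighbour assignment under the squared-error distortion.
    d = ((training[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    assign = d.argmin(axis=1)
    counts = np.bincount(assign, minlength=len(codebook))
    new_cb = codebook.copy()
    for j in range(len(codebook)):
        if counts[j] > 0:
            new_cb[j] = training[assign == j].mean(axis=0)
    # Split the most populated cluster into any empty cells
    # by adding a small random perturbation to its centroid.
    largest = counts.argmax()
    for j in np.where(counts == 0)[0]:
        new_cb[j] = new_cb[largest] + eps * np.random.randn(codebook.shape[1])
    return new_cb
```

Iterating this step until the distortion stops decreasing recovers the usual GLA behaviour, while the splitting keeps codewords from going permanently unused.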
The performance of such an adaptive scheme would have to be compared to that of other well-known adaptive VQ techniques. Using a random codebook for excitations, in a manner similar to certain CELP-based speech-coding techniques, is also worth investigating. Another problem with the technique described so far is that the prediction pattern and the excitation vector are selected independently of each other, despite the fact that the two steps are closely related. Independent quantization does not guarantee that the reconstructed vector is optimal in the mean-squared-error sense with respect to the pair of codebooks. Hence, a technique that jointly searches for the best prediction pattern and excitation vector pair could be employed. Although this is clearly a better strategy than simply quantizing the residual, the associated increase in complexity could make it impractical in many applications. Note, however, that the increase in complexity is only at the encoder end; the decoder remains very simple. Besides, the encoding can be done in parallel with suitable hardware.
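Such a joint analysis-by-synthesis search could be sketched as follows: every (prediction pattern, excitation) pair is synthesized and the pair whose reconstruction is closest to the input block is kept. The predictor set, codebook contents and function names below are all hypothetical illustrations, not the sets used in the experiments:

```python
import numpy as np
from itertools import product

# Hypothetical set F of causal predictors, each mapping (west, north) -> prediction.
PREDICTORS = [lambda w, n: 0.0,             # no prediction
              lambda w, n: w,               # west neighbour
              lambda w, n: n,               # north neighbour
              lambda w, n: 0.5 * (w + n)]   # average of the two

def synthesize(pattern, excitation, k):
    """Reconstruct a k x k block: each pixel is its predictor's output on the
    already-reconstructed causal neighbours, plus the excitation sample."""
    rec = np.zeros((k, k))
    exc = excitation.reshape(k, k)
    for i in range(k):
        for j in range(k):
            w = rec[i, j - 1] if j > 0 else 0.0
            n = rec[i - 1, j] if i > 0 else 0.0
            rec[i, j] = PREDICTORS[pattern[i, j]](w, n) + exc[i, j]
    return rec

def joint_encode(block, pattern_cb, excitation_cb):
    """Analysis-by-synthesis: exhaustively try every (pattern, excitation)
    pair and keep the one whose synthesis is closest to the input block."""
    k = block.shape[0]
    return min(product(range(len(pattern_cb)), range(len(excitation_cb))),
               key=lambda pe: ((synthesize(pattern_cb[pe[0]],
                                           excitation_cb[pe[1]], k)
                                - block) ** 2).sum())
```

The cost of this search grows as the product of the two codebook sizes, which illustrates the complexity concern raised above; the decoder still performs only a single synthesis per block.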
References
[1] H. Abut, Vector Quantization, IEEE Press, New York, 1990.
[2] B.S. Atal, V. Cuperman, A. Gersho, Advances in Speech Coding, Kluwer Academic Publishers, Dordrecht, 1991.
[3] B.S. Atal, S.L. Hanauer, Speech analysis and synthesis by linear prediction of the speech wave, J. Acoust. Soc. Amer. 50 (1971) 637–655.
[4] W.H. Equitz, A new vector quantization clustering algorithm, IEEE Trans. Acoust. Speech Signal Process. 37 (1989) 1568–1575.
[5] A. Gersho, V. Cuperman, Vector quantization: A pattern matching technique for speech coding, IEEE Commun. Mag. 21 (1983) 15–21.
[6] A. Gersho, R.M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, Dordrecht, 1991.
[7] R.L. Kashyap, R. Chellappa, Estimation and choice of neighbors in spatial interaction models, IEEE Trans. Inform. Theory IT-29 (1) (1983) 736–745.
[8] Y. Linde, A. Buzo, R.M. Gray, An algorithm for vector quantizer design, IEEE Trans. Commun. 28 (1980) 84–95.
[9] M. Rabbani, P.W. Jones, Digital Image Compression Techniques, Vol. TT7, Tutorial Texts Series, SPIE Optical Engineering Press, Bellingham, WA, 1991.
[10] M.R. Schroeder, B.S. Atal, Code-excited linear prediction (CELP): High-quality speech at very low bit rates, Proc. Internat. Conf. Acoustics, Speech and Signal Processing, IEEE Press, New York, 1985, pp. 937–940.
[11] G.K. Wallace, The JPEG still picture compression standard, Commun. ACM 34 (4) (1991) 31–44.