Signal Processing: Image Communication 11 (1997) 147-159
Compression of multispectral images by address-predictive vector quantization¹
Gerardo R. Canta, Giovanni Poggi*
Dipartimento di Ingegneria Elettronica, Università di Napoli Federico II, Via Claudio 21, 80125 Napoli, Italy
Received 16 May 1996
Abstract

Multispectral images are formed by a large number of component images of a single subject taken in different spectral windows. They are often represented by tens or even hundreds of Mbits of data, and huge resources are required to transmit and store them, making some form of data compression necessary. To obtain a high compression efficiency by exploiting both the spatial and the spectral dependency, we propose two coding schemes based on vector quantization and address prediction: one better suited to the case of strong spectral dependence, the other preferable in the case of strong spatial dependence. The performance of the proposed techniques is assessed by means of numerical experiments and compared to that of other techniques known in the literature. It turns out that for compression ratios on the order of 25:1 the reconstructed images are almost indistinguishable from the originals, and that good image quality is still achieved for ratios as high as 40:1. © 1997 Elsevier Science B.V.

Keywords: Multispectral; Image compression; Vector quantization; Address prediction

* Corresponding author.
¹ Work carried out under the financial support of the Italian Ministero dell'Università e della Ricerca Scientifica e Tecnologica.
0923-5965/97/$17.00 © 1997 Elsevier Science B.V. All rights reserved. PII S0923-5965(96)00043-4

1. Introduction

With the widening distribution of cost-effective satellite technology, remote sensing imagery is gaining more and more popularity as a tool for the analysis and control of wide regions of the Earth. In particular, remotely sensed multispectral images are commonly used for the analysis and classification of the surface of the Earth. A multispectral image is obtained by sensing electromagnetic radiation reflected or emitted at different wavelengths by the same subject. In other words, it is a collection of images of a single subject taken in different spectral windows; each component image is individually denoted as a ‘band’ and is characterized by its center frequency. Color images are an everyday example of multispectral images, composed of the red, blue and green bands. In Earth analysis, multispectral images are sensed from satellites or planes, and are composed of up to a few tens of bands taken at relatively large frequency intervals. The different reflection/emission properties
exhibited by the materials sensed at different wavelengths represent a ‘spectral signature’ that can be exploited to classify distinct areas of the same scene. More sophisticated applications, such as man-made structure identification, require a higher spectral resolution, on the order of ten nm per band, so that in the near future images composed of hundreds or thousands of monochromatic bands (hyperspectral or ultraspectral images) will likely be used.

The storage and transmission of multispectral images can be a major problem due to the huge amount of data involved. For instance, each image produced by the GER sensor (Geophysical Environmental Research Corp.) is composed of 63 bands of approximately 2 Mbytes each, while the HIRIS (High-Resolution Imaging Spectrometer) hyperspectral images will be composed of 192 bands each. The heavy requirements in channel and memory capacity can be reduced by exploiting the statistical and psychovisual redundancy of the images through some form of data compression. To fully preserve the information content, one should use lossless compression [21], but the data reduction obtained in this case is usually very small, of the order of 2:1. To obtain a significant data reduction one has to resort to lossy compression techniques and accept some degradation in image quality.

In the case of multispectral images it is reasonable to expect good compression performance because they exhibit not only a high spatial redundancy, but also a high dependency among different spectral bands. However, the large majority of the existing image compression algorithms concern only the case of still (isolated) images [11], so they do not take into account any spectral dependency. (On the other hand, algorithms devised for the encoding of moving pictures [16] rely heavily on characteristics that are specific to that source, e.g., background invariance.) Therefore, it is necessary to develop new algorithms, specifically devoted to the compression of multispectral images, that exploit the interband dependency.

In the last few years, several such algorithms have been proposed. Some of them [20,6,1] rely on transform coding to exploit the interband dependency. In fact, by applying a suitable linear transformation to a block of original data one can effectively concentrate the energy in the transformed domain, and subsequently carry out a simple compression algorithm. In [20,6] the Karhunen-Loève (or principal component) transform is used, which is optimal once the linear statistics of the data are known, while in [1] the popular discrete cosine transform is used to obtain a simpler implementation.

As an alternative approach, several authors [2,10,9] resort to vector quantization (VQ) [13,14,8]. Indeed, the use of VQ as a building block in a compression algorithm is extremely appealing, since it has been proven [S] to be the optimal algorithm among all block coding techniques, including all transform coding techniques. Its main drawback is its high computational complexity, which imposes a severe limitation on the performance actually achievable. To keep the complexity reasonable, [2,10] propose to encode multispectral images by using VQ only to exploit the interband dependency. A more effective technique is the feature predictive vector quantization (FPVQ) proposed by Gupta and Gersho [9], which takes advantage of both the spectral dependency, by means of nonlinear interband prediction, and the spatial dependency, through the vector quantization of the prediction errors. FPVQ allows for compression ratios of the order of 20:1 with a good quality of the encoded images.

In this paper, we propose two VQ-based techniques for encoding multispectral images, which efficiently exploit both spatial and spectral dependency. They are based on the address-predictive vector quantization (APVQ) algorithm [3,17,19], proposed for the encoding of gray-scale images. APVQ works by first vector quantizing the image and then predicting and encoding the addresses (indexes) of the VQ codewords.
Experiments on well-known test images have shown that, for a given image quality and computational complexity, APVQ yields a compression ratio 60% higher than that of ordinary memoryless VQ, whereas a 100% increase is easily accomplished by accepting some negligible image impairment. Here, we tailor APVQ to the case of multispectral images by considering two different strategies to take into account the spectral redundancy. In the first strategy, we simply substitute three-dimensional (3-D)
blocks (spatial + spectral) for 2-D blocks (spatial only), without changing the APVQ encoding scheme. In the second strategy, we stay with 2-D blocks, and perform address prediction based on both intraband and interband information. In this latter case, the design of the predictor (more generally, of the predictive encoder) becomes the central issue. The choice between the two strategies is based mainly on the type of dependency (spatial versus spectral) that prevails in the images to be coded. The rest of the paper is organized as follows. Section 2 provides the strictly necessary background on VQ and presents the APVQ algorithm. Then, in Section 3 the new encoding schemes for multispectral images are developed, and in Section 4 their performances are assessed by means of numerical experiments and compared to those of other techniques recently proposed in the literature. Finally, Section 5 draws conclusions.
2. Vector quantization

Vector quantization is the extension of ordinary scalar quantization (SQ) to a multidimensional space. In SQ each input sample $x \in \mathbb{R}$ is represented by a value $y$ chosen from a finite set $\{y_n \in \mathbb{R},\ n = 1, \ldots, N\}$ so that the difference $|x - y_n|$ is minimized over the whole set. The minimum distance (and minimum square error) criterion is not the only possible one, but it is by far the most common and we will use it throughout this paper. VQ is conceptually identical to SQ, except that now a block of $K$ input samples is quantized with a single operation. So, given the input vector $x \in \mathbb{R}^K$, the quantizer searches the codebook $\mathcal{Y} = \{y_n \in \mathbb{R}^K,\ n = 1, \ldots, N\}$ for the closest codevector, namely the $y_n$ for which

$$\|x - y_n\|^2 \le \|x - y_m\|^2 \quad \forall\, y_m \in \mathcal{Y}. \qquad (1)$$
The codebook can be designed easily [13,12] on the basis of either the theoretical or the empirical statistical description of the source. Despite their similarity, SQ and VQ are usually employed for quite different tasks. While the main use of SQ is the digitization of truly analog samples, VQ is more often employed as a data compression
Fig. 1. VQ coding/decoding scheme.
technique and operates on samples that are already in numerical form. With VQ, starting from $K$ samples each quantized by $B$ bits, one ends up with an index represented by $\log_2 N$ bits which addresses the best matching codevector and contains all the relevant information at the output (see Fig. 1). The compression ratio $KB/\log_2 N$ is often of the order of ten or even more, and the bit-rate, $R = \log_2 N / K$, less than one bit/pixel. In other words, the encoder of the VQ system maps the input vector to an index into a codebook according to a given rule, thereby achieving a data compression. The decoder simply retrieves the approximating codevector on the basis of the index.

In any lossy compression process, some information is irretrievably lost, and the signal at the output of the decoder is affected by some distortion. The amount of distortion, usually measured by the mean square error (MSE), is related to the compression ratio (the larger the compression, the higher the distortion), and to the effectiveness of the compression technique. Despite its simple approach, VQ can be shown [S] to be the optimal block-based compression technique for a given block size, and its performance improves as the block size increases. Due to its theoretical optimality and to the simple structure of both the encoder and the decoder, VQ is one of the most popular compression techniques and has found a large number of applications in the image coding field.

The major drawback of VQ is its computational/memory complexity, which is proportional to the cardinality of the codebook, $N = 2^{RK}$, and for a given rate grows exponentially with the block size $K$. As a consequence, one is forced to use small blocks (4 x 4 samples is a typical size in image coding), and must neglect the statistical dependence among all samples belonging to different blocks. Clearly, this severely limits the actual performance of VQ.
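The nearest-codevector rule (1) and the rate formulas above can be made concrete with a short sketch. The function names and the toy four-entry codebook are illustrative assumptions, not part of the original scheme; a real codebook would be designed on training data.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def vq_encode(x, codebook):
    """Return the index n of the codevector closest to x, as in Eq. (1)."""
    return min(range(len(codebook)), key=lambda n: sq_dist(x, codebook[n]))

def vq_decode(n, codebook):
    """The decoder is a plain table look-up."""
    return codebook[n]

def bit_rate(num_codevectors, block_size):
    """R = log2(N) / K, in bit/pixel."""
    return math.log2(num_codevectors) / block_size

def compression_ratio(num_codevectors, block_size, bits_per_sample):
    """K * B / log2(N)."""
    return block_size * bits_per_sample / math.log2(num_codevectors)

# Hypothetical toy codebook: K = 4 samples per block, N = 4 codevectors.
toy_codebook = [
    (0, 0, 0, 0),
    (10, 10, 10, 10),
    (0, 10, 0, 10),
    (10, 0, 10, 0),
]
block = (9, 1, 8, 2)
index = vq_encode(block, toy_codebook)
reconstruction = vq_decode(index, toy_codebook)
```

With 4 x 4 blocks (K = 16), `bit_rate(256, 16)` gives 0.5 bit/pixel and `bit_rate(64, 16)` gives 0.375 bit/pixel, which shows how the codebook size alone fixes the rate of memoryless VQ.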
Fig. 2. PVQ coding/decoding scheme: the current block is predicted from past vector quantized blocks
A partial solution to this problem is offered by predictive vector quantization (PVQ) [S] and in general by all VQ coding schemes which, in one way or another, take into account interblock dependency. Fig. 2 shows the coding/decoding scheme of conventional PVQ: a linear prediction $\hat{x}$ of the current block $x$ is formed on the basis of past blocks, then the prediction error $e$ is vector quantized and the corresponding index $n$ is transmitted. The decoder builds the approximate version of the input by summing the prediction and the quantized error, $\tilde{x} = \hat{x} + e_n$. Since encoder and decoder must be synchronized, the prediction is carried out from the reconstructed blocks $\tilde{x}$, which are available at both ends. If the input blocks exhibit a strong statistical dependence and the predictor is well designed, a significant improvement is obtained with respect to memoryless VQ. In general, the optimal predictor is nonlinear, but given the high dimensionality of the problem (we are predicting from one or more blocks of past samples) the design complexity is so high that only linear predictors are commonly implemented.

However, by looking at the PVQ decoder, it is easy to realize that all the information needed to carry out the prediction lies in the indexes. Taking this fact into account, an alternative encoding/decoding scheme can be derived, see Fig. 3, where the prediction depends explicitly only on the indexes. This new predictor is necessarily nonlinear, but since the dimensionality of the problem is much reduced one can afford to carry out the design and obtain a substantial improvement in performance. The scheme in Fig. 3 can be seen as a special form of nonlinear interpolative VQ [7], and has been extended to the case of multispectral images by Gupta and Gersho in their FPVQ algorithm [9].
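The synchronization requirement of conventional PVQ (prediction from reconstructed blocks, which both ends can compute) can be sketched as follows. The predictor used here, which simply repeats the last reconstructed block, and the tiny residual codebook are assumptions chosen for brevity; they stand in for a properly designed linear predictor and codebook.

```python
def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def nearest(v, codebook):
    return min(range(len(codebook)), key=lambda n: sq_dist(v, codebook[n]))

def pvq_encode(blocks, residual_codebook, initial):
    """Closed-loop PVQ: predict from the last RECONSTRUCTED block."""
    previous = initial          # known to both encoder and decoder
    indexes = []
    for x in blocks:
        prediction = previous   # hypothetical predictor: repeat last block
        error = tuple(xi - pi for xi, pi in zip(x, prediction))
        n = nearest(error, residual_codebook)
        indexes.append(n)
        # Track what the decoder will rebuild, not the true input.
        previous = tuple(pi + ei
                         for pi, ei in zip(prediction, residual_codebook[n]))
    return indexes

def pvq_decode(indexes, residual_codebook, initial):
    previous = initial
    output = []
    for n in indexes:
        prediction = previous
        previous = tuple(pi + ei
                         for pi, ei in zip(prediction, residual_codebook[n]))
        output.append(previous)
    return output

residual_codebook = [(0, 0), (2, 2), (-2, -2), (2, -2)]
blocks = [(2, 2), (4, 4), (2, 2), (5, 1)]
indexes = pvq_encode(blocks, residual_codebook, initial=(0, 0))
decoded = pvq_decode(indexes, residual_codebook, initial=(0, 0))
```

Because the decoder state depends only on the transmitted indexes, the same reasoning leads to the index-driven predictor of Fig. 3.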
2.1. Address predictive VQ
The idea of reducing the dimensionality of the problem by working in the index space rather than in the vector space is also present in a class of VQ-based techniques where the indexes generated by the VQ encoder are themselves compressed in order to further reduce the bit-rate. The simplest such technique consists of entropy encoding the indexes exploiting thereby their different frequencies of occurrence. However, the individual encoding of the indexes is not very effective, and some form of joint encoding is necessary in order to
Fig. 3. PVQ coding/decoding scheme: the current block is predicted from the VQ indexes associated with past blocks.
obtain a satisfactory efficiency. In fact, joint index encoding makes sense for strongly correlated sources (like images or speech) because close blocks of samples are highly correlated, and the indexes associated with them are statistically dependent. However, straight joint entropy coding of indexes can be computationally very intensive, and techniques based on this approach [15] have to deal with serious problems of complexity. For this reason, the low-complexity predictive approach proposed in APVQ is more appealing [17,19]. The basic APVQ scheme is shown in Fig. 4: the index emitted by the VQ encoder for the current input block is predicted from past indexes, and the difference is then entropy encoded. Note that the predictor here does not work on blocks of pixels as in PVQ but only on scalar indexes, thereby greatly simplifying the problem. The predictor is of the classified-linear type: the indexes are organized in a square array so that they form a virtual image similar to the original one, and the current index is predicted as the preceding index lying on either the same row or the same column, chosen adaptively on the basis of the context. Such a simple predictor works surprisingly well provided that the indexes are linearly dependent, which is guaranteed by the use of ordered VQ codebooks [17]. For our purposes, a codebook is considered ordered if codevectors having close indexes are similar, namely, if $|n - m|$ small implies $\|y_n - y_m\|^2$ small, at least most of the time. We will assume (realistically) that ordered codebooks can be designed as needed, referring the interested reader to more specific papers [18] for further information. The encoding quality of APVQ is exactly the same as that of memoryless VQ, because the index coding is lossless. The computation/memory complexity is also almost the same, since both prediction and index coding are straightforward.
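A minimal sketch of lossless address prediction on the 'virtual image' of indexes is given below. The text does not spell out the adaptation rule, so the rule used here (trust the direction that would have predicted best around the upper-left neighbor) is an assumption, as are the function names and the toy index array.

```python
import math
from collections import Counter

def predict(idx, i, j):
    """Predict idx[i][j] from its left or upper neighbor (causal context)."""
    if i == 0 and j == 0:
        return 0
    if i == 0:
        return idx[i][j - 1]
    if j == 0:
        return idx[i - 1][j]
    left, up, diag = idx[i][j - 1], idx[i - 1][j], idx[i - 1][j - 1]
    # Assumed rule: |up - diag| is the horizontal prediction error observed
    # in the row above, |left - diag| the vertical one in the column to the
    # left; use the direction with the smaller past error.
    return left if abs(up - diag) <= abs(left - diag) else up

def prediction_errors(idx):
    return [idx[i][j] - predict(idx, i, j)
            for i in range(len(idx)) for j in range(len(idx[0]))]

def reconstruct(errors, rows, cols):
    """Decoder side: rebuild the index array from the prediction errors."""
    idx = [[0] * cols for _ in range(rows)]
    stream = iter(errors)
    for i in range(rows):
        for j in range(cols):
            idx[i][j] = predict(idx, i, j) + next(stream)
    return idx

def entropy(symbols):
    """Empirical first-order entropy in bits per symbol."""
    total = len(symbols)
    return -sum(c / total * math.log2(c / total)
                for c in Counter(symbols).values())

indexes = [
    [10, 11, 12, 13],
    [10, 11, 12, 13],
    [20, 21, 22, 23],
    [20, 21, 22, 23],
]
errors = prediction_errors(indexes)
raw = [n for row in indexes for n in row]
```

On this toy array the prediction errors have a lower empirical entropy than the raw indexes, and `reconstruct` recovers the indexes exactly, which is the sense in which the index coding is lossless.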
However, APVQ exhibits a much higher compression ability than memoryless VQ, and in various experiments on well-known test images [17] a bit-rate reduction of up to 40% was observed. An even better performance is obtained by removing the constraint of lossless address coding in APVQ, and exploiting the use of ordered codebooks to trade off distortion for bit-rate in a
Fig. 4. APVQ coding scheme: the current VQ index is predicted from past indexes.
favourable way. When an ordered codebook is used, the substitution of the correct address with a close address entails the substitution of the best-matching codevector with a similar codevector. Therefore, one can allow for small errors in the address encoding and increase the compression efficiency with only a limited impairment of the decoded image.

To clarify this point, let us consider the encoding of a generic vector $x$ in APVQ. First, on the basis of past indexes, we guess what the current index will be, say $\hat{n}$; then we find the actual index $n$ as the one that minimizes the distortion $d = \|x - y_n\|^2$; finally, the prediction error $e = n - \hat{n}$ is computed and encoded with a codeword $l(e)$ bits long. Note that the actual index $n$ is chosen regardless of what the prediction $\hat{n}$ was, with the sole goal of minimizing the distortion. However, there might exist another index, say $m$, that provides almost the same distortion, $\|x - y_m\|^2 \approx \|x - y_n\|^2$, with a much smaller encoding cost, $l(m - \hat{n}) \ll l(n - \hat{n})$. This trade-off is captured by a generalized cost measure,

$$\|x - y_n\|^2 + \lambda\, l(n - \hat{n}), \qquad (2)$$

which is a weighted sum of distortion and bit-rate, with the scalar $\lambda$ controlling the exchange between the two terms. For each input vector $x$ and index prediction $\hat{n}$ one chooses the index $n$ that minimizes this generalized cost measure rather than the distortion measure alone, and an increase in distortion is accepted whenever it entails a large enough saving in bit-rate. The resulting encoding scheme is shown in Fig. 5. The ‘PRED’ block outputs the
Fig. 5. Generalized-cost-measure APVQ coding scheme.
prediction $\hat{n}$ for the current index, which is simply the index of the previous block on the same row or on the same column depending on a simple adaptation rule; given the input block $x$ and the index prediction $\hat{n}$, the VQ encoder singles out the optimal index $n$ according to the generalized cost measure (2); finally, the prediction error is entropy encoded. In [19], a lookahead strategy was also used, with large groups of blocks encoded at once so as to minimize the cost measure over the whole group. The resulting algorithm is called generalized-cost-measure APVQ (GCM-APVQ) and reduces to APVQ in the special case $\lambda = 0$. In the following sections we will refer to both algorithms simply as APVQ, and specify the value of $\lambda$ if necessary. Numerical experiments [19] showed that by properly choosing $\lambda$ one can significantly increase the compression ratio with respect to basic APVQ, with a negligible loss in the quality of the decoded images.
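The index selection under the generalized cost measure (2) can be sketched in a few lines. The code-length function below is a hypothetical stand-in for the lengths of an actual entropy code, and the codebook and the values of the weight are toy numbers on an arbitrary scale, not the settings used in the experiments.

```python
def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def code_length(e):
    """Hypothetical codeword length in bits: short for small |e|."""
    return 1 + 2 * abs(e)

def gcm_encode(x, codebook, n_hat, lam):
    """Pick the index minimizing distortion + lam * code length, Eq. (2)."""
    def cost(n):
        return sq_dist(x, codebook[n]) + lam * code_length(n - n_hat)
    return min(range(len(codebook)), key=cost)

# Ordered toy codebook: close indexes correspond to similar codevectors.
codebook = [(0, 0), (10, 10), (20, 20), (30, 30), (40, 40)]
x = (26, 26)
n_hat = 2                      # index predicted from past blocks

best_plain = gcm_encode(x, codebook, n_hat, lam=0)    # pure distortion
best_mild = gcm_encode(x, codebook, n_hat, lam=10)    # light rate penalty
best_heavy = gcm_encode(x, codebook, n_hat, lam=30)   # heavy rate penalty
```

With a zero weight the encoder behaves like basic APVQ and picks the nearest codevector (index 3); once the weight is large enough it settles for the predicted index 2, accepting a slightly larger distortion in exchange for a shorter codeword.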
3. The proposed algorithms

We now extend APVQ to the case of multispectral images. Our goal is to take into account both the spatial and the spectral dependencies so as to increase the encoding efficiency as compared to the case in which only the spatial dependence is exploited.

3.1. 3-D APVQ

The first algorithm we consider is the straightforward extension of APVQ to three-dimensional images. We can keep using the scheme of Fig. 5, with the only difference being that now VQ works
on 3-D blocks composed of pixels from several bands. Specifically, the block $x$ is now formed by stacking the subblocks $x_k$ drawn from several bands (the index $k$ specifies the band) in the same spatial location. The whole encoding procedure is just the same as in the 2-D case: for each input block, the vector quantizer looks for the multispectral codevector $y_n$ that minimizes the generalized cost measure (2), and then the address predictor and the entropy coder operate exactly as in the 2-D case. In fact, the operating principle of the VQ encoder is independent of the dimensionality, the size and the shape of the blocks: it always consists of associating a suitable codebook address to each block of pixels. Nonetheless, it is important to realize that the block size controls the computational complexity, while shape and dimensionality influence the encoding performance and should be chosen so that each block is composed of a set of highly dependent pixels.

By using 3-D blocks we allow for a higher structural freedom than in the 2-D case and exploit the statistical dependence in the best possible way because VQ is the optimal encoding strategy. However, we cannot simply stack 2-D blocks at will, because the complexity grows exponentially with the number of blocks stacked: in order to use 3-D blocks it is necessary to reduce their spatial extent. As an example, in place of blocks of 4 x 4 x 1 pixels, where the last number refers to the number of bands encoded jointly, we can use blocks of 2 x 2 x 4 pixels, but not 4 x 4 x 4 because the complexity would already be too high. In other words, to improve the spectral-domain efficiency we have to give up some spatial-domain efficiency. As a further example, consider the case of 16 spectral bands: to encode all of them jointly with a reasonable complexity we are constrained to use blocks of 1 x 1 x 16 pixels and to neglect, therefore, any spatial dependency.
APVQ would still exploit part of the spatial redundancy through address prediction, but could not make up for all the loss in the VQ step. Therefore, it is clear that to choose the shape and the dimensionality of the VQ blocks one has to take into account the amount of spatial and spectral dependency in the multispectral image. If the former is much stronger, it is convenient to keep using large 2-D blocks, and to exploit the interband dependency in a different way.
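The block-shape trade-off can be illustrated by a small sketch that builds 3-D vectors from a band-major image cube and evaluates the codebook size implied by the relation $N = 2^{RK}$ of Section 2. The data layout `cube[k][i][j]` (band, row, column), the helper names and the synthetic cube are assumptions made for the example.

```python
def extract_3d_blocks(cube, block_h, block_w):
    """Stack co-located block_h x block_w subblocks from all bands.

    cube[k][i][j] is the pixel of band k at row i, column j (assumed layout).
    Each returned vector has block_h * block_w * (number of bands) samples.
    """
    bands, rows, cols = len(cube), len(cube[0]), len(cube[0][0])
    blocks = []
    for i in range(0, rows - block_h + 1, block_h):
        for j in range(0, cols - block_w + 1, block_w):
            blocks.append(tuple(
                cube[k][i + di][j + dj]
                for k in range(bands)
                for di in range(block_h)
                for dj in range(block_w)
            ))
    return blocks

def codebook_size(rate, block_size):
    """N = 2^(R K): codebook cardinality at rate R bit/pixel."""
    return round(2 ** (rate * block_size))

# Synthetic 4-band, 4 x 4 cube whose values encode (band, row, column).
cube = [[[100 * k + 10 * i + j for j in range(4)] for i in range(4)]
        for k in range(4)]
blocks = extract_3d_blocks(cube, 2, 2)   # 2 x 2 x 4 blocks, K = 16
```

At 0.5 bit/pixel, `codebook_size(0.5, 16)` is 256 for either 4 x 4 x 1 or 2 x 2 x 4 blocks, whereas `codebook_size(0.5, 64)` for 4 x 4 x 4 blocks exceeds four billion codevectors, which is why the spatial extent has to shrink when bands are stacked.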
3.2. Interband APVQ
Based on the previous considerations, we developed another coding scheme based on APVQ where, however, VQ is always performed on 2-D blocks, and the interband dependency is exploited only through address prediction. The basic algorithm is illustrated by the scheme of Fig. 6 with reference, for the sake of simplicity, to the case of two spectral bands (the extension to the case of multiple predicted bands is straightforward). Both bands are encoded by means of APVQ, but now the indexes generated by the first band (reference band) also affect the prediction of those of the second one, so as to take into account the spectral dependence. Therefore, the ‘PRED₁’ predictor in Fig. 6 is purely intraband, while ‘PRED₂’ is inter/intraband.

Although the encoder is conceptually straightforward, it is not obvious how to design a good inter/intraband index predictor. Of course, such a predictor should not be linear since the dependence among different bands is inherently nonlinear: a given area can exhibit highly variable reflectivity even in very close bands. There is no simple general procedure for the design of nonlinear predictors. The most popular approach is the maximum probability criterion, in which the current index is predicted as the one that has the highest probability of occurrence, given a set of observed indexes in its neighborhood. This approach can be exceedingly complex, depending on the number of predicting indexes considered, namely, the size of the neighborhood. In the simplest case, the index of spatial
coordinates $i$ and $j$ in the second band, $n_{i,j,2}$, is predicted only from the corresponding index in the first band, $n_{i,j,1}$, and the maximum probability prediction is thus

$$\hat{n}_{i,j,2} = f(n_{i,j,1}) = \arg\max_n \Pr\{N_{i,j,2} = n \mid N_{i,j,1} = n_{i,j,1}\}, \qquad (3)$$
where the symbol $N$ denotes that the index has to be considered as a random variable and $n$ is the value it assumes. The probabilities are estimated over a suitable training set, and the maximum is evaluated off-line, so the predictor reduces to a look-up table.

Fig. 6. Interband APVQ coding scheme (case of two spectral bands).

To also take into account the spatial dependency, and to use indexes in both the reference and the current band to carry out the prediction, we have to give up the maximum probability approach since it would require the computation and storage of a huge look-up table: one entry for each set of values assumed by the predicting indexes. As a viable alternative, we implemented a simple classified-nonlinear predictor similar to the one used in [17]. To predict the current index, we use only one of its neighbors, chosen adaptively as the one that is expected to provide the best prediction. More precisely, the index $n_{i,j,2}$ can be predicted from either the corresponding index in the reference band, $n_{i,j,1}$, or one of two neighbors in the same band, $n_{i-1,j,2}$ and $n_{i,j-1,2}$. Therefore, three look-up tables are needed to implement the functions $f_S(\cdot)$, $f_H(\cdot)$ and $f_V(\cdot)$, defined as in (3), that predict in the ‘spectral’, horizontal and vertical directions, respectively. The choice of the predicting direction is made by considering a small neighborhood of the current index and looking for the minimum-error direction. With reference to Fig. 7, let $X$ be the index to be predicted: we compare the quantities

$$\Delta_S = |F - f_S(B)| + |E - f_S(A)| + |G - f_S(C)|,$$
$$\Delta_H = |F - f_H(E)| + |B - f_H(A)| + |D - f_H(C)|,$$
$$\Delta_V = |G - f_V(E)| + |C - f_V(A)| + |D - f_V(B)|, \qquad (4)$$

where $E$, $F$ and $G$ are the same-band neighbors of $X$, and $A$, $B$, $C$ and $D$ are the corresponding indexes in the reference band, and choose the direction corresponding to the smallest $\Delta$. As an example, if $\Delta_H$ is
Fig. 7. Configuration of indexes for the classified-nonlinear predictor: X is the index to be predicted, A to G are known. Panels: (k-1)-st band (indexes A, B, C, D) and k-th band (indexes E, F, G, X).
the smallest, the prediction in the horizontal direction is expected to be the most reliable, so we put $\hat{X} = f_H(G)$. It should be clear that, regardless of how the prediction is performed, one can apply the generalized cost measure (2) to choose the best-matching codeword. The predictor only affects the statistics of the prediction error $n - \hat{n}$, and hence the performance.

Of course, a different predictor has to be used for the reference band, based only on intraband information, and it is reasonable to expect poorer encoding performance in this case. However, the resulting bit-rate increase is negligible for an image composed of a large number of bands, because only the first one is encoded intraband. Indeed, once the second band is predicted and encoded, it can be used in turn to predict the third one, and so on in a chain procedure that continues up to the last band. If the image is composed of only a few bands, the encoding scheme can be slightly modified and a joint minimization can be carried out to improve the performance. In this case, all the blocks occupying the same spatial location in the various bands are jointly encoded with the goal of minimizing an overall generalized distortion measure which takes into account both the intraband and the interband costs. Of course, this strategy is more computationally demanding but still affordable in the case of few bands.
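The two predictors of this section can be sketched together: a look-up table trained by the maximum probability rule (3), and the direction choice based on the three quantities in (4). The neighbor layout assumed here is E, F on the row above X and G to its left in the current band, with A, B, C, D in the same positions of the reference band (D co-located with X); that layout, the identity fall-back for unseen contexts and all names are assumptions made for the illustration.

```python
from collections import Counter, defaultdict

def train_lut(pairs):
    """Maximum probability predictor: most frequent target per context."""
    counts = defaultdict(Counter)
    for context, target in pairs:
        counts[context][target] += 1
    return {context: c.most_common(1)[0][0] for context, c in counts.items()}

def as_function(lut):
    """Wrap a table; unseen contexts fall back to identity (assumption)."""
    return lambda n: lut.get(n, n)

def classified_predict(A, B, C, D, E, F, G, f_S, f_H, f_V):
    """Choose the direction with the smallest past error, then predict X."""
    delta = {
        "S": abs(F - f_S(B)) + abs(E - f_S(A)) + abs(G - f_S(C)),
        "H": abs(F - f_H(E)) + abs(B - f_H(A)) + abs(D - f_H(C)),
        "V": abs(G - f_V(E)) + abs(C - f_V(A)) + abs(D - f_V(B)),
    }
    direction = min(delta, key=delta.get)   # ties resolved in S, H, V order
    prediction = {"S": f_S(D), "H": f_H(G), "V": f_V(F)}[direction]
    return direction, prediction

# Toy training pairs (reference-band index, current-band index).
spectral_lut = train_lut([(10, 12), (10, 12), (10, 13), (30, 33), (30, 33)])
f_S = as_function(spectral_lut)
f_H = as_function({})    # empty tables: identity prediction, for brevity
f_V = as_function({})

direction, prediction = classified_predict(
    A=10, B=10, C=30, D=30, E=12, F=12, G=35, f_S=f_S, f_H=f_H, f_V=f_V)
```

In this toy neighborhood the rows are constant, so the horizontal direction has zero past error and X is predicted from its left neighbor G.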
4. Numerical results

To assess the techniques so far proposed, and compare their performance to that of techniques
already known in the literature, we carried out experiments on a multispectral image provided to us by the European Space Agency. The image portrays an agricultural region near Freiburg (Germany) from a height of 3000 m, and was acquired by the GER sensor. There are 63 spectral bands, covering the wavelength interval 0.4-2.5 µm; each band is formed by 1953 lines of 512 pixels, and each pixel is represented by two bytes, although the actual range of intensities varies with the band, and for most bands no more than ten bits out of 16 actually carry data. We extracted a portion of 512 x 512 pixels, including both agricultural and urban areas, to be used as test image, keeping the rest as a training set to design the VQ codebooks, predictors and Huffman codebooks as needed.

To keep the complexity reasonable for 3-D VQ, we decided to operate on groups of four bands, and to carry out on these all the comparisons of interest. Indeed, the computational complexity is the major limiting factor for the performance of all techniques considered here, since it forces one to use small blocks, be they purely spatial or three-dimensional, and to exploit only a small fraction of the redundancy of the image. In Fig. 8 the test portion of bands 10-13, which form the first set selected for our experiments, is shown. These bands clearly exhibit a high degree of correlation and so the exploitation of spectral dependency is bound to provide a significant gain in performance over the isolated encoding of each band. However, to investigate the case of weakly dependent bands we also considered a spectrally subsampled set, formed by bands 6, 12, 18 and 24, shown in Fig. 9. Besides being less correlated, these bands present strong variations in the mean intensity. This poses a problem when using 3-D VQ where blocks of all bands are quantized jointly, since errors of equal intensity, equally important for the VQ encoder, can be unnoticeable in a high-energy band, and disastrous in a low-energy band. Therefore, prior to processing, all bands are converted to floating point and scaled to unit power, to be restored to the original power once encoding is finished.

In all experiments the VQ codebooks are designed by means of the Kohonen algorithm (see [12,18] for a thorough description) so that they exhibit a high
degree of organization, suited for APVQ. A specific codebook is designed for each band or combination of bands in order to take into account the statistical variations in the spectral domain and improve the performance. As an example, the codebook designed for the set of contiguous bands of Fig. 8 would certainly work poorly on the spectrally subsampled set of Fig. 9, so two ad hoc codebooks are needed. The design is carried out on the training set, while the encoding results concern the corresponding test set, not used for training. The same applies to the design of all predictors and Huffman codebooks. The vector size is fixed to 16 (3-D blocks of 2 x 2 x 4 pixels and 2-D blocks of 4 x 4).

Fig. 8. The first test set: (a) band 10; (b) band 11; (c) band 12; (d) band 13.
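The unit-power scaling applied to each band before joint quantization can be sketched as follows. Power is taken here as the mean of the squared pixel values, which is an assumption about the exact definition used; the function names are illustrative.

```python
import math

def band_power(band):
    """Mean squared pixel value of a band given as a list of rows."""
    pixels = [p for row in band for p in row]
    return sum(p * p for p in pixels) / len(pixels)

def normalize(band):
    """Scale a band to unit power; return the scaled band and the gain."""
    gain = math.sqrt(band_power(band))
    if gain == 0.0:
        return [[float(p) for p in row] for row in band], 1.0
    return [[p / gain for p in row] for row in band], gain

def restore(band, gain):
    """Undo the scaling after decoding."""
    return [[p * gain for p in row] for row in band]

band = [[3.0, 4.0], [0.0, 5.0]]
scaled, gain = normalize(band)
restored = restore(scaled, gain)
```

After this step an error of a given size weighs the same in every band of a 3-D block, so the quantizer no longer favors the high-energy bands.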
In our first experiment we apply 3-D APVQ to the first set of bands using a codebook of size 256, and letting $\lambda$ vary from 0 (square error measure) to 0.1 (generalized cost with heavy weight on the bit-rate). The results are shown in Fig. 10 in terms of average SNR (signal-to-noise ratio) on the four bands as a function of the average bit-rate. For comparison purposes, we also show the results obtained by conventional VQ (always on 3-D blocks), and by APVQ with conventional cost measure ($\lambda = 0$), for various codebook sizes. With a codebook of size $N = 256$ and $\lambda = 0$, APVQ reduces the bit-rate by approximately 20% compared to memoryless VQ: although valuable, the gain is relatively small due to the high spatial activity exhibited by
Fig. 9. The second test set: (a) band 6; (b) band 12; (c) band 18; (d) band 24.

Fig. 10. Average SNR versus average bit-rate for the first test set using 3-D APVQ and 3-D VQ with various codebook sizes N. The curve is obtained for N = 256 and $\lambda \in (0, 0.1)$.
the image. The comparison becomes more favourable when lower bit-rates are of interest. In fact, with conventional VQ the only way to trade off distortion for bit-rate is to use smaller codebooks, while APVQ trades them off optimally (for the selected cost measure) at any desired level through the choice of $\lambda$. As an example, for an SNR of about 23 dB, APVQ with $N = 256$ and $\lambda = 0.04$ costs 0.23 bit/pixel, while VQ with $N = 64$ requires 0.38 bit/pixel, and APVQ itself with $N = 64$ and $\lambda = 0$ requires 0.29 bit/pixel.

In Fig. 11 we compare the results obtained by 3-D APVQ (replotted here for the sake of clarity) to those provided by interband APVQ using both the maximum probability (interband) predictor and the classified (inter/intraband) predictor in which the external band 9 is used as a reference.
Fig. 11. Average SNR versus average bit-rate for the first test set using 3-D APVQ and interband APVQ (N = 1024, λ ∈ (0, 0.1)). Also shown are some points obtained by the FPVQ technique.

Table 1. SNR for bands 10-13 of the first test set encoded by several techniques

                  SNR_3-D (dB)   SNR_m.p. (dB)   SNR_class. (dB)   SNR_FPVQ (dB)
Average bit-rate 0.420 bit/pixel
Band 10           23.62          22.14           22.11             22.92
Band 11           25.16          22.63           22.60             22.90
Band 12           23.45          22.71           22.75             22.95
Band 13           24.07          21.70           21.68             22.88
Average           24.11          22.34           22.30             22.94
Average bit-rate 0.245 bit/pixel
Band 10           22.11          21.68           21.50             21.54
Band 11           24.00          22.07           21.85             22.90
Band 12           22.74          22.23           22.05             21.93
Band 13           23.25          21.39           21.24             22.01
Average           23.11          21.80           21.59             22.00

3-D APVQ is clearly superior at all bit-rates. This is due to the much stronger dependence existing in the spectral domain than in the spatial one when bands this close (spectrally) are considered. When given the possibility, namely when using 3-D blocks, VQ exploits such a dependence very efficiently and this accounts for the 2 dB improvement observed here. Interband APVQ only partially makes up for this gap by exploiting the strong spectral dependency by means of the prediction, and largely reducing the bit-rate at almost no SNR cost. Significantly, the pure interband predictor here performs better than the inter/intraband one. In the same figure we also report some results obtained by implementing the FPVQ technique with various combinations of the relevant parameters (codebook sizes, decision threshold). It appears that FPVQ is slightly superior to interband APVQ (it uses an interband approach itself) but much worse than 3-D APVQ. Table 1 gives some numerical results detailed for each band. To gain better insight into the quality of the encoded images, Fig. 12 shows band 12 encoded by several techniques. The analysis of the figure confirms that 3-D APVQ yields an image quality clearly superior to that of both interband techniques for the same bit-rate (about 0.42 bit/pixel), and still a comparable image quality for half that cost (0.2 bit/pixel).

The situation is quite different, as shown by Fig. 13, when we experiment with the second set, which is the spectrally subsampled one. Here, the spectral dependency is of course much weaker, and interband APVQ performs slightly better than both 3-D APVQ and FPVQ, whose results, in turn, are almost indistinguishable over a wide range of rates. Note that in this case the inter/intraband predictor does provide some improvement over the pure interband one, but hardly enough to justify the extra memory and design complexity required. The results confirm that in this case, and more generally in all cases where the spatial dependency prevails, there is nothing to be gained by using 3-D blocks. However, the performance differences among all techniques are so small here that the computational/memory costs and the implementation complexity are more reasonable ranking criteria. From this point of view, 3-D APVQ is again preferred because it uses only one VQ codebook for all bands and its implementation is straightforward. Interband APVQ and FPVQ have a computational cost comparable to that of 3-D APVQ, but a higher memory cost because they both need a VQ codebook and a predictor for each band. In addition, selecting the FPVQ parameters so as to obtain satisfactory results turns out to be quite difficult.
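The difference between the two block shapes compared above can be sketched as follows. This is only an illustration: the helper names `blocks_3d` and `blocks_2d`, the 2 x 2 spatial block size, and the toy 4 x 4 bands are assumptions, not the settings used in the experiments.

```python
def blocks_3d(bands, size):
    """Cut co-registered bands into vectors that stack the same
    size x size spatial block taken from every band.

    bands is a list of equally sized 2-D lists (one per spectral band).
    One vector is produced per spatial block position.
    """
    rows, cols = len(bands[0]), len(bands[0][0])
    vectors = []
    for r in range(0, rows - size + 1, size):
        for c in range(0, cols - size + 1, size):
            vectors.append([band[r + i][c + j]
                            for band in bands
                            for i in range(size)
                            for j in range(size)])
    return vectors


def blocks_2d(band, size):
    """Cut a single band into size x size spatial blocks (one band at a time)."""
    return blocks_3d([band], size)


# Toy data (assumed): 4 bands of 4 x 4 pixels, pixel value = 100*band + 4*row + col.
bands = [[[100 * b + 4 * r + c for c in range(4)] for r in range(4)]
         for b in range(4)]

vectors_3d = blocks_3d(bands, 2)                      # one set of 16-dimensional vectors
vectors_2d = [blocks_2d(band, 2) for band in bands]   # four sets of 4-dimensional vectors
```

A 3-D scheme therefore works with one codebook of longer vectors for all bands, whereas a 2-D scheme produces a separate stream of short vectors per band, which is where the per-band codebooks and predictors come from.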
Fig. 12. Band 12 of the first test set encoded by several techniques: (a) 3-D APVQ, λ = 0, average bit-rate 0.420 bit/pixel, SNR = 23.45 dB; (b) FPVQ, average bit-rate 0.420 bit/pixel, SNR = 22.95 dB; (c) interband APVQ (m.p.), average bit-rate 0.420 bit/pixel, SNR = 22.77 dB; (d) 3-D APVQ, λ = 0.02, average bit-rate 0.200 bit/pixel, SNR = 22.43 dB.
Fig. 13. Average SNR versus average bit-rate for the second test set using 3-D APVQ and interband APVQ (N = 1024, λ ∈ (0, 0.1)). Also shown are some points obtained by the FPVQ technique.

5. Conclusions

In this paper, we build upon the APVQ technique to devise two new coding schemes for multispectral images, one using 3-D blocks, and the other using 2-D blocks and interband prediction. We trade off distortion for coding efficiency by using a suitable cost measure and reach a bit-rate as low as 0.25 bit/pixel (a compression factor of 40) with little loss of quality with respect to full-rate VQ. Numerical experiments show that the use of 3-D blocks is highly recommended whenever the interband dependence is very strong. In such a situation, indeed, the 3-D APVQ technique performs 1-2 dB better than both interband APVQ and the well-known FPVQ, which are based on 2-D blocks and interband prediction. Notice that this is the case where good compression performance is most urgently needed, since it typically arises when working with hyperspectral images, formed by a large number of spectrally close bands, and represented by hundreds of Mbits of data. All experiments (including unreported ones with more sophisticated predictors) showed the relative ineffectiveness of spatial prediction with images so rich in high-frequency details. For this reason, our current line of research is centered on some form of prior classification/segmentation of the image that integrates information from all of the bands, in order to pool together pixels of similar characteristics and encode them more efficiently in the spatial as well as in the spectral domain.
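As a quick check on the figures quoted in the conclusions, the relation between bit-rate and compression factor can be sketched as below. The source bit depth is not restated in this section; the value of 10 bit/pixel used here is only what 0.25 bit/pixel at a factor of 40 implies, and the function name `compression_factor` is illustrative.

```python
def compression_factor(source_bits_per_pixel, coded_bits_per_pixel):
    """Ratio between the original and the coded bit-rate."""
    return source_bits_per_pixel / coded_bits_per_pixel


# Bit depth implied by the quoted figures (an assumption, not a stated value).
implied_depth = 0.25 * 40

factor_low_rate = compression_factor(implied_depth, 0.25)   # lowest rate quoted
factor_high_rate = compression_factor(implied_depth, 0.42)  # rate used in Fig. 12(a)-(c)
```

Under this assumption, the 0.42 bit/pixel operating point corresponds to a factor slightly below 24.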
Acknowledgements
The authors wish to thank Prof. L. Paura for his helpful comments and suggestions, and Dr. C.W. Tseng for her careful proof-reading of the manuscript.