JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION
Vol. 9, No. 2, June, pp. 107–118, 1998 ARTICLE NO. VC980383
Frame-Adaptive Vector Quantization

F. Idris and S. Panchanathan*

Department of Electrical Engineering, University of Ottawa, Ottawa, Ontario K1N 6N5, Canada

* E-mail: [email protected].

Received December 17, 1993; accepted April 16, 1998
Vector quantization (VQ) is a promising technique for low bit rate image compression. Recently, image sequence compression algorithms based on VQ have been reported in the literature. Image sequences are highly nonstationary and generally exhibit variations from frame to frame and from scene to scene; hence, using a fixed VQ codebook to encode the different frames/sequences may not always guarantee a good coding performance. Several adaptive techniques which improve the coding performance have been reported in the literature. However, we note that most adaptive techniques result in further increases in computational complexity and/or the bit rate. In this paper, a new frame adaptive VQ technique for image sequence compression (FAVQ) is presented. This technique exploits the inter/intraframe correlations and provides frame adaptability at a reduced complexity. A dynamic, self-organized codebook is used to track the local statistics from frame to frame. Computer simulations using standard CCITT sequences demonstrate the superior coding performance of FAVQ. In addition, FAVQ is a single-pass technique, which makes real-time implementation possible. © 1998 Academic Press
1. INTRODUCTION
Image sequence compression is becoming increasingly important with the advent of broadband networks (ISDN, ATM, etc.) and coding standards (JPEG, MPEG, H.261, etc.). The goal of image sequence compression is to reduce the number of bits required to represent an image sequence while maintaining an acceptable fidelity. This provides efficient use of the available channel bandwidth for applications such as teleconferencing and telepresence. Efficient compression is achieved by exploiting the spatial and temporal redundancies in an image sequence. Typically, the spatial redundancies are removed using intraframe techniques, while the temporal redundancies are removed using interframe techniques. Several intraframe and interframe techniques based on scalar quantization have been reported in the literature [1]. Examples include differential pulse code modulation (DPCM), transform coding, and hybrid coding.
According to Shannon's rate distortion theory [3], a better performance can be achieved by coding vectors instead of scalars, even when the source is memoryless. Nasrabadi and King [4] have presented a review of the basic vector quantization techniques and many of their variations for image/image sequence compression. In vector quantization (VQ), a set of representative images (the training set) is decomposed into L-dimensional vectors. An iterative clustering algorithm such as the LBG algorithm [4] is used to generate a codebook CB = {W_i ; i = 1, ..., N}, where N is the size of the codebook. The codebook is then made available at both the transmitter and the receiver. In the encoding process, the image to be compressed is decomposed into L-dimensional vectors. For each input vector V_i = {v_i1, v_i2, ..., v_iL}, the codebook CB is searched using a nearest neighbor rule to find the closest codeword W_j. Compression is achieved by transmitting the label (index) j corresponding to W_j. Reconstruction of the image is implemented by using j as an address into a table containing the codewords.
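As a concrete illustration of the encode/decode loop just described, the following sketch (ours, not from the paper; the block size, helper names, and squared-error distance are assumptions) quantizes non-overlapping blocks against a given codebook and reconstructs the image by table lookup.

```python
import numpy as np

def encode_vq(image, codebook, block=4):
    """Exhaustive nearest-neighbour VQ of non-overlapping block x block tiles.

    image    : 2-D array whose sides are multiples of `block`
    codebook : (N, block*block) array holding the codewords W_1 ... W_N
    returns  : 2-D array of labels (one index per tile)
    """
    h, w = image.shape
    labels = np.empty((h // block, w // block), dtype=np.int64)
    cb = codebook.astype(np.float64)
    for bi in range(0, h, block):
        for bj in range(0, w, block):
            v = image[bi:bi + block, bj:bj + block].reshape(-1).astype(np.float64)
            # squared-error distance to every codeword; keep the closest one
            d = np.sum((cb - v) ** 2, axis=1)
            labels[bi // block, bj // block] = int(np.argmin(d))
    return labels

def decode_vq(labels, codebook, block=4):
    """Reconstruction is a simple table lookup of the transmitted labels."""
    h, w = labels.shape[0] * block, labels.shape[1] * block
    out = np.empty((h, w), dtype=np.float64)
    for bi in range(labels.shape[0]):
        for bj in range(labels.shape[1]):
            out[bi * block:(bi + 1) * block, bj * block:(bj + 1) * block] = \
                codebook[labels[bi, bj]].reshape(block, block)
    return out
```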
Several image sequence compression algorithms based on VQ have been reported in the literature. The first class of algorithms uses a fixed VQ codebook. For example, Murakami et al. [7] have presented a vector quantizer for video signals in which a fixed codebook is generated using a long training sequence of normalized (by mean and standard deviation) vectors. The mean and standard deviation are transmitted as side information along with the label. Corte-real and Alves [8] have proposed an image sequence vector quantization scheme in which the differential image is divided into blocks of different shapes and sizes; several vector quantizers are then used to encode the resulting blocks. Huguet and Torres [9] have extended the concept of two-dimensional VQ to image sequence compression (three-dimensional VQ). In their technique, the image sequence is first divided into three-dimensional blocks; the blocks are then rearranged into vectors and quantized using a previously trained codebook. Guo et al. [10] have proposed a technique in which the difference between the input frame and the predicted frame is vector quantized. The difference frame is divided into 16-dimensional vectors. For each difference vector, directional conditional vector probability matrices are used to select a subcodebook from a larger codebook. The input vector is then encoded using either the subcodebook or the larger codebook, whichever results in a lower distortion. Nasrabadi et al. [5, 6, 11] have proposed an interframe hierarchical vector quantizer using quadtree decomposition (IHA-VQ), where each frame is partitioned into 7 × 7 blocks. Block matching [2] is used to estimate the motion of each block. To indicate the status (motion/no motion) of the blocks, a bit map followed by the motion vectors of the moving blocks is transmitted. The difference between the current frame and the motion-estimated frame is then segmented hierarchically into 32 × 32, 16 × 16, 8 × 8, 4 × 4, and 2 × 2 blocks. The 32 × 32, 16 × 16, and 8 × 8 blocks are replenished from the previous frame, whereas a set of codebooks is used to encode the smaller blocks. There are two problems with this class of algorithms. First, a limited-size codebook is not sufficient to represent the different types of image sequences; a large codebook is therefore required, which increases the bit rate and entails practical difficulties in the codebook generation and encoding processes. Second, the mismatch between the codebook and image sequences outside the training set may degrade the coding performance.

The second class of algorithms is based on updating the codebook during the encoding process so that it reflects the local statistics of the frame to be encoded. For example, Goldberg and Sun [12, 13] have presented several interframe coding techniques in which the changes in successive frames are tracked by incorporating label replenishment and codebook replenishment. Here, the vectors of the first frame are used to generate the codebook. The frame is vector quantized, and the corresponding labels are stored in a label memory. The vectors in the subsequent frames are quantized, and the indices are compared with the corresponding labels in the label memory; if they differ, the label memory is replenished. In order to adapt the codebook to the local frame statistics, codebook replenishment is used, where the means of the new clusters are determined using the vectors of the current frame. The new means are then compared with the corresponding codewords; if the difference is larger than a prespecified threshold, the codeword is modified. Another approach to replenishing the codebook is to design a new codebook for each frame using the vectors of the current frame as a training set and the previous codebook as the initial codebook [13]. The resulting centroids are then compared to the codewords in the initial codebook; if the difference is greater than a threshold, the codeword is replenished.
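The sketch below illustrates this codeword replenishment idea in a generic form; the data layout, threshold test, and function name are our assumptions, not the exact procedure of [12, 13].

```python
import numpy as np

def replenish_codebook(codebook, frame_vectors, labels, threshold):
    """Generic codeword replenishment in the spirit of [12, 13].

    codebook      : (N, L) array of current codewords
    frame_vectors : (K, L) vectors of the current frame
    labels        : (K,) index of the codeword each vector was quantized to
    threshold     : replace a codeword when ||centroid - codeword||^2 exceeds it
    returns       : (updated codebook, list of replenished indices)
    """
    updated = codebook.astype(np.float64).copy()
    replenished = []
    for n in range(updated.shape[0]):
        members = frame_vectors[labels == n]
        if members.shape[0] == 0:
            continue                        # codeword unused in this frame
        centroid = members.mean(axis=0)     # mean of the new cluster
        if np.sum((centroid - updated[n]) ** 2) > threshold:
            updated[n] = centroid           # new codeword sent as side information
            replenished.append(n)
    return updated, replenished
```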
Monet and Labit [14] have proposed applying the codebook replenishment technique to classified pruned tree-structured VQ. This technique uses the average distortion resulting from assigning a set of input vectors to a particular codeword as a measure to determine whether to delete, split, or modify the codeword. Yeh [15] has presented an adaptive VQ technique for image sequence coding. In this technique, a mean/residual binary-tree VQ is employed, where the quantization error is used to determine whether or not to replenish the codeword. However, in all these algorithms frame adaptability results in a further increase in computational complexity, making real-time implementation difficult. Recently, the International Standards Organization (ISO) has proposed a standard for video compression known as MPEG (Moving Picture Experts Group) [17]. In the MPEG video compression standard, a group-of-pictures approach is used instead of frame-by-frame coding. A group of pictures is typically a combination of one or two intra pictures (I), predicted pictures (P), and bi-directional pictures (B). The I frames are coded using the DCT on 8 × 8 blocks. The P frames are decomposed into 16 × 16 blocks for motion compensation, and the motion-compensated difference frame is partitioned into 8 × 8 blocks which then undergo a two-dimensional DCT. We note that, at low bit rates, DCT-based techniques suffer from blocking effects. A more serious consideration from an implementation point of view is that the DCT and motion estimation are computationally dissimilar algorithms and thus essentially require two different architectures (one for the DCT and one for motion estimation). The combination of different architectures may result in an inefficient implementation, since the common elements between the two algorithms and architectures may not be fully exploited. In this paper, a new technique for image sequence compression, FAVQ, is presented [18]. This technique results in a good coding performance at both low and high bit rates. In addition, a unified architecture can be used for real-time implementation [20, 21], since VQ and motion estimation using block matching can both be viewed as template matching algorithms in which each input vector is compared with a finite set of templates (codewords and candidate blocks). The rest of the paper is organized as follows. In Section 2 the proposed algorithm for image sequence compression is presented. The analysis of the computational complexity of the proposed algorithm is detailed in Section 3. Simulation results are presented in Section 4, followed by the conclusions in Section 5.

2. FRAME ADAPTIVE VQ
We now describe the proposed frame adaptive VQ technique for image sequence compression (FAVQ). In FAVQ, two coding modes are employed: interframe and intraframe. The input frame is first partitioned into 4 × 4 blocks. Each block is then reorganized into a 16-dimensional vector. To start with, the encoder attempts to code the input vector in the interframe mode; if this fails, the input vector is encoded in the intraframe mode.
FIG. 1. Block schematic of FAVQ interframe mode.
Interframe mode. A block diagram of the interframe mode is shown in Fig. 1. In this mode, each input vector V^F in the current frame (F) is compared with the vector V^{F-1} at the same spatial location in the previous frame (F - 1). If they match within a prespecified threshold D, a flag Sf is transmitted to the receiver. Otherwise, a search area SA in (F - 1) is defined and a match within SA is sought. The size of SA is determined from the displacements of the neighboring vectors: the displacement of V_i^F is set to the maximum of the displacements of the previously coded vectors V_{n1}^F, V_{n2}^F, and V_{n3}^F shown in Fig. 2. We note that the size of SA can be 8 × 8, 14 × 14, or 18 × 18 pixels, corresponding to a maximum displacement p of 2, 5, or 7, respectively. If a match within SA is obtained, a flag Sm followed by the displacement (motion) vector of V^F is transmitted. If a match is not obtained even within SA, the FAVQ coder switches to the intraframe mode, where a dynamically generated, self-organized codebook is used to encode V^F.
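A minimal sketch of this interframe test follows; the distortion measure (mean squared error), the helper names, and the boundary handling are assumptions on our part rather than details given in the paper.

```python
import numpy as np

def block_distortion(a, b):
    # Mean squared error is used here as the match criterion for threshold D;
    # the paper only specifies "match within a prespecified threshold D".
    return float(np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2))

def interframe_code(cur, prev, bi, bj, p, D, block=4):
    """Attempt to code the block of `cur` at (bi, bj) in the interframe mode.

    p : maximum displacement for the search area (2, 5, or 7 in the paper),
        taken as the maximum displacement of the previously coded neighbours.
    Returns ('Sf', None), ('Sm', (dy, dx)), or (None, None) when the mode
    fails and the coder must fall back to the intraframe mode.
    """
    v = cur[bi:bi + block, bj:bj + block]
    # 1) co-located block in the previous frame
    if block_distortion(v, prev[bi:bi + block, bj:bj + block]) <= D:
        return 'Sf', None
    # 2) search area of (block + 2p)^2 pixels centred on the co-located block
    #    (8 x 8, 14 x 14, or 18 x 18 for p = 2, 5, 7 with 4 x 4 blocks)
    best_mv, best_d = None, float('inf')
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            y, x = bi + dy, bj + dx
            if y < 0 or x < 0 or y + block > prev.shape[0] or x + block > prev.shape[1]:
                continue
            d = block_distortion(v, prev[y:y + block, x:x + block])
            if d < best_d:
                best_mv, best_d = (dy, dx), d
    if best_d <= D:
        return 'Sm', best_mv
    return None, None       # switch to the intraframe mode
```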
Intraframe mode. In the intraframe mode, a highly adaptive VQ technique [16] is employed. Two codebooks are used, namely the primary codebook (PC) and a larger secondary codebook (SC), as shown in Fig. 3. To start with, the vectors of the image to be coded are used as the codewords. For each input vector, PC is searched for a match within a prespecified threshold. If no match is obtained, the input vector is transmitted and is also appended to PC as a new codeword. If a match is obtained, the index of the corresponding codeword is transmitted. When PC becomes full, the least recently used (LRU) codeword is moved from PC to SC, freeing room for the new codeword. From this point on, PC is searched first for a match; if this fails, SC is also searched. A new codeword is appended only if no match is obtained in either codebook, and in this case the LRU codeword in PC is moved to SC. If SC is also full, the LRU codeword in SC is deleted. If a match is obtained in SC, the index of that codeword is transmitted and the codeword is swapped with the LRU codeword in PC. A flag Sc, Sp, or Ss is transmitted before the codeword, the primary label, or the secondary label, respectively. A sketch of this bookkeeping is given after the summary list below. To summarize, the advantages of FAVQ are:
FIG. 2. V_i^F and V_{n1}^F, V_{n2}^F, V_{n3}^F are the input vector and the previously coded neighboring vectors, respectively.
• Both the interframe and the intraframe correlations are exploited, since two coding modes are employed.
• Frame adaptability is achieved at a reduced complexity by using a variable-size search area in the interframe mode and a dynamically generated codebook in the intraframe mode.
• FAVQ is a single-pass technique and, hence, real-time implementation is possible.
• The reconstructed vectors are always within a prespecified error threshold.
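A minimal sketch of the intraframe (PC/SC) bookkeeping described above, assuming a mean-squared-error match test and the codebook sizes used later in the experiments (4 and 256); the class and method names are ours, and the decoder would maintain the same structures from the received flags, labels, and codewords.

```python
from collections import OrderedDict
import numpy as np

class AdaptiveCodebook:
    """Primary (PC) / secondary (SC) codebook bookkeeping of the intraframe
    mode, with LRU movement between the two codebooks."""

    def __init__(self, np_size=4, ns_size=256, D=15.0):
        self.pc = OrderedDict()        # label -> codeword, ordered LRU -> MRU
        self.sc = OrderedDict()
        self.np_size, self.ns_size, self.D = np_size, ns_size, D
        self.next_label = 0

    def _match(self, book, v):
        # first codeword within threshold D (mean squared error per sample)
        for label, w in book.items():
            if np.mean((w - v) ** 2) <= self.D:
                return label
        return None

    def encode(self, v):
        """Return ('Sp', label), ('Ss', label), or ('Sc', codeword)."""
        v = np.asarray(v, dtype=np.float64)
        label = self._match(self.pc, v)
        if label is not None:
            self.pc.move_to_end(label)                 # mark as most recently used
            return 'Sp', label
        label = self._match(self.sc, v)
        if label is not None:
            # bring the matched SC codeword into PC, demoting PC's LRU if full
            if len(self.pc) >= self.np_size:
                lru, w_lru = self.pc.popitem(last=False)
                self.sc[lru] = w_lru
            self.pc[label] = self.sc.pop(label)
            return 'Ss', label
        # no match anywhere: transmit v itself and append it to PC
        if len(self.pc) >= self.np_size:
            lru, w_lru = self.pc.popitem(last=False)   # move PC's LRU to SC
            if len(self.sc) >= self.ns_size:
                self.sc.popitem(last=False)            # delete SC's LRU
            self.sc[lru] = w_lru
        label = self.next_label
        self.next_label += 1
        self.pc[label] = v
        return 'Sc', v
```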
FIG. 3. Block schematic of FAVQ intraframe mode.
3. COMPUTATIONAL COMPLEXITY
For K input vectors of dimension L = n × n and a search area (SA) of size (2p + 1)^2, the computational complexities of VQ and of the search within SA are O(KLN) and O(KL(2p + 1)^2), respectively, where N is the codebook size and p is the maximum displacement. For K = 1, the computational complexity of FAVQ consists of:

• the complexity of the search within SA,

    C_SA = O(L(2p + 1)^2);                                        (1)

• the complexity of searching a primary codebook of size N_p,

    C_primary = O(L N_p);                                         (2)

• the complexity of searching a secondary codebook of size N_s,

    C_secondary = O(L N_s).                                       (3)

If h1 is the probability of obtaining a match within the search area, then for K input vectors the complexity of the interframe mode is

    C_inter = K L h1 (2p + 1)^2.                                  (4)

The complexity C_intra of the intraframe mode for K input vectors follows from Eqs. (2) and (3) and is given by

    C_intra = K L [h2 (N_p + (2p + 1)^2) + (1 - h1 - h2)(N_s + N_p + (2p + 1)^2)],   (5)

where h2 is the probability of obtaining a match in the primary codebook. Hence, the total complexity C_FAVQ of FAVQ is

    C_FAVQ = K L [h1 (2p + 1)^2 + h2 (N_p + (2p + 1)^2) + (1 - h1 - h2)(N_s + N_p + (2p + 1)^2)].   (6)

Typical values of C_FAVQ are indicated in the next section.
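For convenience, Eq. (6) can be evaluated directly; the helper below and the example probabilities are ours and purely illustrative (they are not the measured values of h1 and h2).

```python
def favq_complexity(K, L, p, Np, Ns, h1, h2):
    """Operation count of Eq. (6); dividing by K*L gives operations per pixel.

    h1 : probability of a match within the search area (interframe mode)
    h2 : probability of a match in the primary codebook (intraframe mode)
    """
    sa = (2 * p + 1) ** 2
    return K * L * (h1 * sa
                    + h2 * (Np + sa)
                    + (1 - h1 - h2) * (Ns + Np + sa))

# Example with made-up probabilities (not the measured values of Section 4):
# ops_per_pixel = favq_complexity(K=1, L=16, p=2, Np=4, Ns=256,
#                                 h1=0.8, h2=0.1) / 16
```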
4. SIMULATIONS
Computer simulations were carried out on the standard sequences Miss America, salesman, and church. The sequences were obtained from "ftp://ftp.ipl.rpi.edu/image/sequence/." The specifications of the video sequences are given in Table 1. The first and last frames of the Miss America, salesman, and church sequences are shown in Figs. 4, 5, and 6, respectively. The Miss America sequence (Fig. 4) shows the head and shoulders of a woman talking in front of a static background; the motion in the sequence is confined to the head, face, and shoulders.
TABLE 1
Specifications of the Test Image Sequences

Sequence        Frame size (pixels)   Bits per pixel   Frames/s   No. of frames
Miss America    288 × 360             8                10         100
Salesman        288 × 360             8                30         100
Church          480 × 720             8                30         25
The salesman sequence (Fig. 5) shows a man sitting at a desk in front of a static background; the motion in the sequence is confined mainly to his hands, face, and head. The church sequence (Fig. 6) shows the tower of a church; the motion in this sequence is due to the motion of the camera. Four sets of experiments were performed. In the first experiment, simulations were performed using FAVQ on the Miss America sequence (corresponding to both small and large variations between successive frames) and the results were compared to those of the MPEG coder [22]. In the second experiment, the performance of FAVQ is evaluated and compared to that of IHA-VQ [11]. We recall from Section 1 that the IHA-VQ technique has recently been reported in the literature as a low bit rate VQ-based technique for video conferencing [11]. In the third experiment, the performance of the proposed algorithm is
FIG. 5. (a) First and (b) last frames of the salesman sequence.
studied for sequences with scene changes. In the fourth experiment, the performance of FAVQ for large frame size image sequences is evaluated using the church sequence. In our experiments, the sizes of the primary and secondary codebooks are set to 4 and 256, respectively. We note that the simulations have been performed using different threshold values (D) corresponding to different bit rates. The MPEG encoder [22] used for comparison has the following group-of-pictures structure: I B B B P B B B I B B B P B B B I B .... The following quantization matrix is employed (shown here in conventional 8 × 8 raster order):

    16  11  10  16  24  40  51  61
    12  12  14  19  26  58  60  55
    14  13  16  24  40  57  69  56
    14  17  22  29  51  87  80  62
    18  22  37  56  68 109 103  77
    24  35  55  64  81 104 113  92
    49  64  78  87 103 121 120 101
    72  92  95  98 112 100 103  99
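For reference, the sketch below shows how a quantization matrix of this kind is applied in a DCT-based intra coder; the level shift, rounding convention, and quantizer scale handling are generic assumptions and are not taken from the MPEG encoder of [22].

```python
import numpy as np
from scipy.fft import dctn, idctn

# The quantization matrix listed above, in 8 x 8 raster order.
Q = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99]], dtype=np.float64)

def quantize_block(block, scale=1.0):
    """Forward 8x8 DCT followed by division by the quantization matrix."""
    coeff = dctn(block.astype(np.float64) - 128.0, norm='ortho')
    return np.round(coeff / (Q * scale)).astype(np.int32)

def dequantize_block(qcoeff, scale=1.0):
    """Inverse of quantize_block; the loss comes entirely from the rounding."""
    return idctn(qcoeff * (Q * scale), norm='ortho') + 128.0
```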
FIG. 4. (a) First and (b) last frames of the Miss America sequence.
The full search block matching algorithm is used for forward/backward motion estimation. The size of the search area (for motion estimation) is fixed with a maximum displacement p of 8 pixels. The coding performance of the proposed algorithm is
evaluated using a rate distortion criterion [3]. Although the signal-to-noise ratio (SNR) is used as a fidelity measure in the signal processing literature, the peak signal-to-noise ratio (PSNR) is more widely used in image and video compression [1]. Consequently, we have used the PSNR to evaluate the quality of the reconstructed video sequences. For an image of size N1 × N2 and a maximum pixel value of 255, the PSNR of the reconstructed image is calculated by

    PSNR = 10 log10 { 255^2 / [ (1/(N1 N2)) Σ_{i=1}^{N1} Σ_{j=1}^{N2} (X_ij - Y_ij)^2 ] },   (7)

where X_ij and Y_ij are the intensities of pixel (i, j) in the original and the reconstructed image, respectively. From our simulations we note that the SNR is 2–3 dB less than the PSNR; however, the nature of the results reported in this paper would not change if the SNR were used, since the PSNR values are reported in comparison with the PSNR of other techniques.

The bit rate of the interframe mode (R_inter) consists of two components:

• the bit rate R_f for transmitting the flags Sf,

    R_f = 3 N_f / (N1 N2),                                        (8)

where N_f is the number of Sf flags;

• the bit rate R_mv for transmitting the motion vectors,

    R_mv = [ Σ_{i=1}^{N_m} log2 (2p + 1)^2 ] / (N1 N2) + R_Sm,    (9)

where N_m, N1 N2, and R_Sm are the number of motion vectors, the frame size, and the bit rate of the flags Sm, respectively.

Hence, the total bit rate of the interframe mode is given by

    R_inter = R_f + R_mv.                                         (10)

The bit rate of the intraframe mode (R_intra) consists of three components:

• the bit rate R_p for the transmission of the primary labels,

    R_p = N_pl log2 N_p / (N1 N2) + R_Sp;                         (11)

• the bit rate R_s for the transmission of the secondary labels,

    R_s = N_sl log2 N_s / (N1 N2) + R_Ss;                         (12)

• the bit rate R_c for the transmission of codewords,

    R_c = N_c L b / (N1 N2) + R_Sc,                               (13)

where N_pl, N_sl, N_c, and b are the number of primary labels, secondary labels, codewords, and bits/pixel, respectively, and R_Sp, R_Ss, and R_Sc are the bit rates of the flags indicating that the current transmission corresponds to a primary label, a secondary label, or a codeword, respectively. The total bit rate of the intraframe mode is given by

    R_intra = R_p + R_s + R_c.                                    (14)

The total bit rate R_t is the sum of the bit rates of the interframe mode (R_inter) and the intraframe mode (R_intra) and is calculated using Eqs. (10) and (14):

    R_t = R_f + R_mv + R_p + R_s + R_c.                           (15)

FIG. 6. (a) First and (b) last frames of the church sequence.

FIG. 7. PSNR vs frame number using FAVQ (D = 15) and MPEG on the Miss America sequence.
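The following helpers (ours, not from the paper) show how Eqs. (7)–(15) translate into code; folding the flag bit rates R_Sm, R_Sp, R_Ss, and R_Sc into a single measured term is an assumption, since their cost depends on the entropy coder used.

```python
import numpy as np

def psnr(original, reconstructed):
    """Peak signal-to-noise ratio of Eq. (7) for 8-bit images."""
    x = np.asarray(original, dtype=np.float64)
    y = np.asarray(reconstructed, dtype=np.float64)
    mse = np.mean((x - y) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)

def total_bit_rate(n1, n2, Nf, motion_ps, Npl, Nsl, Nc,
                   Np=4, Ns=256, L=16, b=8, flag_bits=0.0):
    """Bits per pixel of Eq. (15), built from the components of Eqs. (8)-(13).

    n1, n2    : frame dimensions N1 and N2
    Nf        : number of Sf flags (the factor of 3 is taken from Eq. (8))
    motion_ps : maximum displacement p of every transmitted motion vector,
                e.g. [2, 2, 5, 7, ...]                    (Eq. (9))
    Npl, Nsl  : number of primary and secondary labels    (Eqs. (11), (12))
    Nc        : number of codewords transmitted verbatim  (Eq. (13))
    flag_bits : total bits spent on the Sm/Sp/Ss/Sc flags, measured from the
                entropy coder and folded in here as a single term
    """
    pixels = n1 * n2
    r_f = 3.0 * Nf / pixels
    r_mv = sum(np.log2((2 * p + 1) ** 2) for p in motion_ps) / pixels
    r_p = Npl * np.log2(Np) / pixels
    r_s = Nsl * np.log2(Ns) / pixels
    r_c = Nc * L * b / pixels
    return r_f + r_mv + r_p + r_s + r_c + flag_bits / pixels
```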
In our experiments, an entropy coder [19] is used to encode the flags and the codewords. We note that all the data reported in this section are based on the actual bit rates and PSNR values observed, not simply on the rates and PSNR predicted by the preceding formulas. In the first experiment, the performance of FAVQ is evaluated for image sequences with small and large variations (changes) between successive frames. The first 32 frames of the Miss America sequence (corresponding to sequences with small variations between successive frames) are encoded using FAVQ with D = 15 and with the MPEG coder. The resulting average bit rate for both coders is 0.4 bits/pixel (bpp). The comparative chart of the PSNR of the reconstructed sequence as a function of frame number is shown in Fig. 7. It can be seen that FAVQ outperforms the MPEG coder by approximately 2 dB in terms of the average PSNR. The results of applying the FAVQ technique to every fourth frame of the Miss America sequence with D = 12 are tabulated in Table 2. Note that the intermediate frames are skipped to allow for larger changes between successive frames. Table 2 shows the breakdown of the total bit rate R_t into its components R_f, R_mv (for p = 2, 5, and 7), R_p, R_s, and R_c. It can be seen from Table 2 that as the motion in the sequence increases, the bit rate of the motion vectors also increases. The average bit rate and PSNR are 0.53 bpp and 38 dB, respectively. The corresponding average bit rate and PSNR using the
MPEG coder are 0.5 bpp and 36 dB, respectively. The comparative chart of the PSNR on a frame-by-frame basis is shown in Fig. 8. The original image and the reconstructed image (using FAVQ) corresponding to frame number 5 are shown in Figs. 9 and 10, respectively. The corresponding error image is shown in Fig. 11. The reconstructed and error images of frame number 5 using the MPEG coder are shown in Figs. 12 and 13, respectively. It can be seen that for sequences with large variations (between frames), FAVQ outperforms the MPEG coder and yields better objective and subjective quality. In summary, the proposed coder provides better coding performance than the MPEG coder and is more robust to variations in the image sequence. Since the performance of a coder is usually degraded when there are large variations between successive frames, we have chosen to evaluate the performance of FAVQ using sequences with large variations in the following experiments. In the second experiment, simulation results are obtained for the Miss America sequence for threshold values D of 10 and 16, corresponding to average bit rates of 0.60 and 0.25 bpp, respectively. Simulations using IHA-VQ on the same sequence are used for comparison. The comparative charts of the total bit rate and PSNR on a frame-by-frame basis are shown in Figs. 14, 15, 16, and 17. It can be seen that the average coding performances of the two techniques are almost identical. However, when there are significant changes between the frames (frames 58 to 97), FAVQ outperforms IHA-VQ by adapting to the changes and maintaining a nearly constant PSNR throughout the sequence. Moreover, FAVQ has a much smaller variation in bit rate (0.49 to 0.69 and 0.20 to 0.30 bpp) compared to the IHA-VQ technique (0.34 to 0.74 and 0.15 to 0.42 bpp). In addition, we note that the computational complexity of FAVQ is less than that of IHA-VQ; for example, at 0.6 bpp, the complexities of FAVQ and IHA-VQ are approximately 94 and 140 operations per pixel, respectively. The original and the reconstructed images of frame number 64 at 0.25 bpp are shown in Figs. 18 and 19, respectively. The corresponding error image (normalized) is shown in Fig. 20 for subjective evaluation. It is clearly seen that the overall subjective quality of the coded frames is excellent; fine details and edges are preserved after coding. To summarize, FAVQ is suitable for constant-quality applications and also provides frame adaptability at a reduced complexity. The results of the previous experiments using the Miss America sequence (only every fourth frame is encoded) are summarized in Table 3. We note that for FAVQ the D values are 10, 15, and 16. It can be seen from Table 3 that at approximately 0.25 bits/pixel and 0.6 bits/pixel, FAVQ and IHA-VQ perform comparably in terms of the average PSNR. It can also be seen from Table 3 that FAVQ
TABLE 2
Simulation Results Using FAVQ on the Miss America Sequence (D = 12)
and IHA-VQ outperform MPEG, as approximately the same PSNR is obtained at a lower bit rate. In the third experiment, simulations were carried out on a test sequence with a scene change. The test sequence is composed of two different scenes. The first scene consists of six frames of the Miss America sequence. The second scene is composed of six frames of the salesman sequence. Figure 21 shows the percentage of occurrence of the codes generated in the interframe mode (Sf and the motion
vectors) and the codes generated in the intraframe mode (the primary labels, the secondary labels, and the codewords) as a function of the frame number. It can be seen from Fig. 21 that both coding modes are utilized, as expected. In other words, the intraframe codes (i.e., the contribution of the bit rates R_p, R_s, and R_c to the total bit rate R_t) are dominant at scene changes, while the interframe codes (i.e., the contribution of the bit rates R_f and R_mv to the total bit rate R_t) become dominant with increased interframe
FIG. 12. Reconstructed frame number 5 of the Miss America sequence using MPEG at 0.4 bpp.

FIG. 8. PSNR vs frame number using FAVQ (D = 12) and MPEG on the Miss America sequence. Here only every fourth frame is encoded.
correlation. The total bit rate and PSNR of the frames are shown in Figs. 22 and 23, respectively. We note that the lower PSNR value at the scene change (frame 7) is due to the lower average PSNR (34.4 dB) of the salesman sequence compared to the Miss America sequence. In other words, it can be seen from Fig. 23 that the transition from one scene to the next is smooth. In summary, the FAVQ encoder adapts quite well to scene changes. In the last experiment, the performance of FAVQ is evaluated using the church sequence. The average bit rate and PSNR are 0.64 bpp (corresponding to 6.6 Mbits/s) and 34.1 dB, respectively. We note that FAVQ also results in a very
FIG. 9. Original frame number 5 of the Miss America sequence.
FIG. 13. Error frame number 5 of the Miss America sequence using MPEG at 0.4 bpp.

FIG. 10. Reconstructed frame number 5 of the Miss America sequence using FAVQ at 0.4 bpp.
FIG. 11. Error frame number 5 of the Miss America sequence using FAVQ at 0.4 bpp.
FIG. 14. The total bit rate as a function of frame number using FAVQ with D = 10 (average = 0.6 bpp) and IHA-VQ (average = 0.61 bpp) on the Miss America sequence.
FIG. 18. Original frame number 64 of the Miss America sequence.
FIG. 15. The PSNR as a function of frame number using FAVQ with D = 10 (average = 39.3 dB) and IHA-VQ (average = 39.5 dB) on the Miss America sequence.
FIG. 19. Reconstructed frame number 64 of the Miss America sequence using FAVQ at 0.25 bpp.
FIG. 16. The total bit rate as a function of frame number using FAVQ with D = 16 (average = 0.25 bpp) and IHA-VQ (average = 0.24 bpp) on the Miss America sequence.
FIG. 20. Error frame number 64 of the Miss America sequence using FAVQ at 0.25 bpp.
FIG. 17. The PSNR as a function of frame number using FAVQ with D = 16 (average = 36.3 dB) and IHA-VQ (average = 37.65 dB) on the Miss America sequence.
good coding performance for large frame size image sequences.

5. CONCLUSIONS
In this paper, we have presented a frame adaptive VQ (FAVQ) algorithm for image sequence compression. The
TABLE 3
The Average Bit Rate and Average PSNR for the Miss America Sequence, Where Only Every Fourth Frame Is Encoded

Compression technique   Average bit rate (bpp)   PSNR (dB)
FAVQ                    0.25                     36.3
FAVQ                    0.53                     38
FAVQ                    0.60                     39.3
IHA-VQ                  0.24                     37.65
IHA-VQ                  0.61                     39.5
MPEG                    0.50                     36
FIG. 21. Occurrence percentage of Sf, the motion vectors for maximum displacements p of 2, 5, and 7, the primary labels, the secondary labels, and the codewords for the sequence with a scene change (average bit rate = 0.80 bpp for D = 14).
FAVQ coder exploits the inter/intraframe correlations and provides frame adaptability at a reduced complexity. Computer simulations using standard image sequences demonstrate the superior coding performance of FAVQ, compared to other techniques reported in the literature. In addition, FAVQ is suitable for constant quality applica-
tions. Note that real-time implementation of FAVQ is also possible [21].

ACKNOWLEDGMENT

The authors thank Professor M. Goldberg for his valuable comments and suggestions. We also gratefully acknowledge the financial support of the Canadian International Development Agency and the Jordan University of Science and Technology.
REFERENCES 1. A. K. Jain, Image data compression: A review, Proc. IEEE 69, No. 3, 1981, 349–389. 2. J. R. Jain and A. K. Jain, Displacement measurement and its application in interframe image coding, IEEE Trans. Commun. COM-29, No. 12, 1981, 1799–1808. 3. L. D. Davisson, Rate-distortion theory and application, Proc. IEEE 60, No. 7, 1972, 800–808.
FIG. 22. The total bit rate as a function of the frame number for the sequence with a scene change (average bit rate = 0.80 bpp for D = 14).
4. N. M. Nasrabadi and R. A. King, Image coding using vector quantization: A review, IEEE Trans. Commun. COM-36, No. 8, 1988, 957–971. 5. N. M. Nasrabadi, S. E. Lin, and Y. Feng, Interframe hierarchical vector quantization, in IEEE 1989 International Conference on Acoustics, Speech, and Signal Processing, May 1989. 6. N. M. Nasrabadi, Interframe hierarchical address-vector quantization, Visual Commun. Image Process. SPIE ’90 1360, 1990, 558–574. 7. T. Murakami, K. Asai, and E. Yamazaki, Image sequence coding, Electron. Lett. 18, No. 23, 1982, 1005–1006. 8. L. Corte-real and A. P. Alves, Vector quantization of image sequences using variable size and variable shape blocks, Electron. Lett. 26, No. 18, 1990, 1483–1484. 9. J. Huguet and L. Torres, Vector quantization in image sequence coding, Signal Process. V: Theories Appl., 1990, 1079–1081.
FIG. 23. The PSNR as a function of the frame number for the sequence with a scene change (average PSNR = 36.73 dB for D = 14).
10. Q. Guo, N. M. Nasrabadi, and N. Mohesenian, An interframe dynamic FSVQ codec for video sequence coding, in IEEE 1992 International Conference on Acoustics, Speech, and Signal Processing, March 1992, pp. III-501–III-504.
11. N. M. Nasrabadi, C. Y. Choo, and J. U. Roy, Interframe hierarchical address-vector quantization, IEEE J. Selected Areas Commun. 10, No. 5, 1992, 960–967. 12. M. Goldberg and H. Sun, Image sequence coding using vector quantization, IEEE Trans. Commun. COM-34, No. 8, 1986, 703–710. 13. M. Goldberg and H. Sun, Frame adaptive vector quantization for image sequence coding, IEEE Trans. Commun. 36, No. 5, 1986, 629–635. 14. P. Monet and C. Labit, Codebook replenishment in classified pruned tree-structured vector quantization of image sequences, in IEEE 1990 International Conference on Acoustics, Speech, and Signal Processing, Albuquerque, 1990, Vol. 4, pp. 2285–2288. 15. C. L. Yeh, Color image sequence compression using adaptive binarytree vector quantization with codebook replenishment, in IEEE 1987 Internat. Conf. Acoustics Speech and Signal Process., Dallas, Texas, April 1987, pp. 1059–1062. 16. S. Panchanathan and M. Goldberg, A mini-max algorithm for image adaptive vector quantization, IEE Proc.: Part I. Commun. Speech Vision 138, No. 1, 1991, 53–60. 17. ISO, Coding of moving pictures and associated audio, Committee Draft of Standard ISO 11172, ISO/MPEG 90/176, December 1990. 18. F. Idris and S. Panchanathan, Image sequence coding using frame adaptive vector quantization, Visual Commun. Image Process. ’93, November 1993, Vol. 2094, pp. 941–952. 19. R. W. Hamming, Coding and information theory, Prentice-Hall, Englewood Cliffs, NJ, 1980. 20. F. Idris and S. Panchanathan, Adaptive vector quantizer for image coding, Picture Coding Sympos. ’93, March 1993, pp. 5.2–5.3. 21. F. Idris and S. Panchanathan, Associative memory architecture for video compression, in IEE Proc. Comput. Digital Techniques, 1993. 22. R. Gandhi, MPEG Encoder Software, Visual Computing and Communications Laboratory, University of Ottawa, 1993.
FAYEZ M. IDRIS is a Ph.D. candidate at the University of Ottawa, Canada. He received his M.Sc. in Electrical Engineering from the Univer-
sity of Ottawa, Canada in 1993 and the B.Sc. in Electrical Engineering from Yarmouk University, Jordan in 1985. From 1986 to 1990, he was a research and development engineer at the International Systems and Electronics Development Co., Jordan. His research interests are in the areas of image/video databases, image/video compression, and computer architecture.
SETHURAMAN PANCHANATHAN received his B.Sc. degree in Physics from the University of Madras, India in 1981, B.E. degree in Electronics and Communication Engineering from the Indian Institute of Science, India in 1984, M.Tech. degree in Electrical Engineering from the Indian Institute of Technology, Madras, India in 1986; and the Ph.D. degree in Electrical Engineering from the University of Ottawa in 1989. Dr. Panchanathan is currently an associate professor in the Department of Computer Science and Engineering at Arizona State University, Tempe, AZ. He was an associate professor and the Director of the Visual Computing and Communications Laboratory in the Department of Electrical and Computer Engineering at the University of Ottawa, Canada where he led a team of post-doctoral fellows, research engineers, and graduate students working in the areas of compression, indexing, storage, retrieval and browsing of images and video, VLSI architectures for video processing, multimedia hardware architectures, parallel processing, and multimedia communications. He has published over 120 papers in refereed journals and conferences. He was the Co-chair of the IS&T/SPIE Digital Video Compression-Algorithms and Technologies ’96 and Multimedia Hardware Architectures ’97 Conferences held in San Jose. He is the Symposium Chair of Electronic Imaging Symposium held in San Jose in February 1998. He is a senior member of the IEEE and a member of the Professional Engineers of Ontario, SPIE, and EURASIP. He is an Associate Editor of the IEEE Transactions on Circuits and Systems for Video Technology and an Area Editor of the Journal of Visual Communications and Image Representation. He is a guest editor of the special issue on ‘‘Image and Video Processing for Emerging Interactive Multimedia’’ in the IEEE Transactions on Circuits and Systems for Video Technology.