VQ on projection domain of image



Signal Processing: Image Communication 5 (1993) 209-217 Elsevier


Hee Bok Park and Choong Woong Lee
Department of Electronics Engineering, Seoul National University, Shinlim-dong San 56-1, Kwanak-gu, 151-742 Seoul, South Korea

Received 27 January 1992

Abstract. Vector quantization (VQ), which is the optimal block coding technique for a given block size, reduces the bit-rate by exploiting the interpixel correlation within a block of an image. As the block size increases, the performance of VQ improves, but the computational complexity and the required memory become serious problems. Another serious problem of VQ, intrinsic to block coding schemes, is the blocking effect, which severely degrades image quality. This paper presents an image compression scheme, Projection-VQ, based on VQ of the projections of an image instead of the image itself. The vector dimension of a projection is much smaller than that of the image, so a large block can easily be processed in the projection domain. In Projection-VQ, an image block is projected at several angles. If we transmit the projections that carry the edge information, we can reconstruct the image without destroying edge sharpness, even from a few projections. Simulation results show that the proposed scheme outperforms normal VQ in terms of PSNR. A substantial improvement in subjective image quality is also attained because the staircase noise, one kind of blocking effect, is not visible in the reconstructed image.

Keywords. VQ; projection; reconstruction; image compression; blocking effects.

1. Introduction

The purpose of image compression is to reduce the bit-rate while maintaining the necessary fidelity of the image. VQ has been shown to be an effective technique for image compression. It is intrinsically superior to other, suboptimal techniques since it can achieve the optimal rate-distortion performance subject to a constraint on block size (or vector dimension); the performance of VQ improves as the block size increases because more of the statistical interpixel correlation can be exploited in a large block [11]. However, the memory size and the computational complexity required to design a proper codebook increase exponentially with the vector dimension. Even if we assume that such a codebook has been designed, the computational requirement of encoding an image also increases exponentially with the vector dimension [1, 7]. Thus, the block size of VQ is usually limited to about 6 × 6.

One serious and intrinsic problem encountered in a block coding scheme such as VQ is the blocking effect, which severely degrades the quality of the reconstructed image. There are two kinds of blocking effects: grid noise and staircase noise. The grid noise, which appears in the monotone areas of an image, results from the tendency of the intensity of a coded image to change abruptly from one block to another. The continuity of tone may be preserved within each block, but is not assured from block to block. The staircase noise appears near the edges of an image. Since each block is encoded independently of its neighbors, the continuity of edges across block boundaries is not ensured. Also, because the coder cannot adequately represent all of the edge shapes within the blocks, the entire edge is degraded [8].

This paper proposes a new VQ technique, called Projection-VQ, which can not only increase the block size without a severe increase in computational requirements, but also preserve the edge shape clearly. It is based on combining VQ with the theory of projection, which has been applied in CT (computerized tomography) [9]. A projection is a mapping from a vector space into a subspace of it, so the vector dimension of the projection is smaller than that of the original. We project an image at several angles before quantizing it; then only the informative projections are quantized and transmitted. To reduce the bit-rate, we allocate projections adaptively depending on the local features.

0923-5965/93/$06.00 © 1993 Elsevier Science Publishers B.V. All rights reserved

2. Projection and reconstruction of image

2.1. Projection of image

The projection of an image is discussed in detail in [4, 9]. We regard an image as a two-dimensional function $f(x, y)$ (Fig. 1(a)). A line running through $f(x, y)$ is called a ray, and the integral of $f(x, y)$ along a ray is called a ray integral. The set of ray integrals taken at an angle $\theta$ forms a projection $P_\theta(t)$; that is, a projection is formed from a set of non-overlapping, equally spaced parallel rays which completely cover the image. The angle of a projection, $\theta$, is defined as the counter-clockwise angle which the rays make with the vertical axis.

In image compression, an image $f(x, y)$ is sampled in the spatial domain (Fig. 1(b)). We call a sample a pixel, and the summation of the pixels along a ray a ray sum. If we denote the pixels and the ray sums by $f_j$ and $p_i$, respectively, we can write

$$\sum_{j=1}^{N} w_{ij} f_j = p_i, \qquad i = 1, 2, \ldots, M, \tag{1}$$

and in matrix form

$$WF = P, \tag{2}$$

where N is the number of pixels, M is the total number of ray sums in all projections, and $w_{ij}$ is the weighting factor that represents the contribution of the jth pixel to the ith ray sum. In general, because we simply add pixel values to obtain a ray sum, the $w_{ij}$ are set to 1 or 0 depending on whether or not the jth pixel lies within the area of the ith ray sum.
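As an illustration of (1) and (2), the following minimal sketch builds the 0/1 weight matrix W for a small block and computes the ray sums P = WF. It assumes only the two axis-aligned projections (0°, which sums columns, and 90°, which sums rows) and a 4 × 4 block with a vertical edge; the names `weight_matrix` and `ray_sums` are hypothetical and do not come from the paper.

```python
from typing import Callable, List

# (row, col) -> index of the ray that passes through that pixel
RayIndex = Callable[[int, int], int]

def weight_matrix(size: int, ray_maps: List[RayIndex]) -> List[List[int]]:
    """Build W of (1)-(2): one 0/1 row per ray, projections stacked in order."""
    W = []
    for ray_of in ray_maps:
        for ray in range(size):  # an axis-aligned projection has `size` rays
            W.append([1 if ray_of(r, c) == ray else 0
                      for r in range(size) for c in range(size)])
    return W

def ray_sums(W: List[List[int]], F: List[int]) -> List[int]:
    """Compute P = W F for a row-major pixel vector F."""
    return [sum(w * f for w, f in zip(row, F)) for row in W]

size = 4
F = [10, 10, 200, 200] * size  # assumed test block: vertical edge, row-major
W = weight_matrix(size, [lambda r, c: c,    # 0 degrees: vertical rays (columns)
                         lambda r, c: r])   # 90 degrees: horizontal rays (rows)
P = ray_sums(W, F)
print(len(W), len(W[0]))  # 8 16
print(P)                  # [40, 40, 800, 800, 420, 420, 420, 420]
```

In this example M = 8 and N = 16, so W is not square. The edge is visible in the 0° ray sums, while the 90° ray sums are constant.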

2.2. Reconstruction of image

The reconstruction of an image is to obtain F from P. If the inverse matrix of W exists, we can calculate F from matrix theory as follows:

$$F = W^{-1}P. \tag{3}$$

Fig. 1. Projection of an image at angle $\theta$: (a) unsampled image, showing a ray $x\cos\theta + y\sin\theta = t_1$; (b) sampled image.

But in the projection of an image, we cannot easily obtain a matrix W that has an inverse. Even with such a matrix, when noise is present in the measurement data P, the calculated image may be very different from the original [4]. Kaczmarz has proposed an attractive iterative method for solving (1), which is called the 'method of projections' [9]. In this method, we make an initial guess of the solution, $F^{(0)} = (f_1^{(0)}, f_2^{(0)}, \ldots, f_N^{(0)})^T$. Then the image is reconstructed by using the ray sums one by one. With $p_1$, we obtain an image closer to the solution in the following fashion:

$$F^{(1)} = F^{(0)} - \frac{W_1^T F^{(0)} - p_1}{W_1^T W_1}\, W_1, \tag{4}$$

where $W_1 = (w_{11}, w_{12}, \ldots, w_{1N})^T$ is the vector of weighting factors associated with $p_1$ when the image is projected [2, 3, 10]. After getting $F^{(1)}$, we can obtain $F^{(2)}$ with $p_2$. This process is repeated with $p_3$ and so on. The value of the jth iteration is obtained from the value of the (j-1)th iteration by

$$F^{(j)} = F^{(j-1)} - \frac{W_j^T F^{(j-1)} - p_j}{W_j^T W_j}\, W_j. \tag{5}$$

This process continues until $F^{(M)}$ is obtained using all the ray sums of all the projections. Then we check the stopping criterion to end the iteration. If the stopping criterion is not satisfied, we start the iteration again with $p_1$ and the initial value $F^{(M)}$.

3. Projection-VQ

If an image is partitioned into small blocks such as 4 × 4 or 8 × 8, most blocks are monotone or have a simple edge shape. When a block is projected at several angles, the projection at the dominant edge angle has the largest information content. We can usually reconstruct an image similar to the original from only a few projections, provided they include the dominant edge projection. For example, for an image with a vertical edge, we can reconstruct an image very similar to the original from the projection at angle 0°. The projection at angle 90°, however, is of no help. Projection-VQ transmits the codebook indices of the projections which are most useful in the reconstruction.

3.1. Encoder and decoder

As a matter of convenience, we classify the ray sums of (1) by angle. If L is the number of angles, we can write P of (2) as follows:

$$P = (Q_1^T, Q_2^T, \ldots, Q_L^T)^T, \tag{6}$$

where $Q_r = (p_{r1}, p_{r2}, \ldots, p_{rk})^T$ is the projection vector at the rth angle, $p_{rj}$ denotes the jth ray sum of $Q_r$, and k is the vector dimension of $Q_r$. Before explaining the Projection-VQ technique, we define 'single angle reconstruction (SAR)' as a reconstruction from only one projection. If we take the initial guess $F_r^{(0)} = (0, 0, \ldots, 0)^T$, the SAR with $Q_r$ is carried out by using

$$F_r^{(i)} = F_r^{(i-1)} - \frac{W_{ri}^T F_r^{(i-1)} - p_{ri}}{W_{ri}^T W_{ri}}\, W_{ri}, \tag{7}$$

where $W_{ri} = (w_{ri1}, w_{ri2}, \ldots, w_{riN})^T$ is the vector of weighting factors associated with $p_{ri}$ when the image is projected. The iteration of (7) continues for all the ray sums of $Q_r$; therefore, $F_r^{(k)}$ is the SAR of $Q_r$.

The encoding procedure of Projection-VQ is as follows:

(1) Get the projections $Q_1, Q_2, \ldots, Q_L$ of the original $F^0$, and obtain $V_1, V_2, \ldots, V_L$ by VQ of them.
(2) Get the SARs $F_1, F_2, \ldots, F_L$ for $V_1, V_2, \ldots, V_L$.
(3) Find the SAR which has the minimum Euclidean distance from $F^0$, and let it be $F_{\min}$.
(4) Reconstruct $R_n$ as follows:

$$R_n = R_{n-1} + F_{\min}, \tag{8}$$

where $R_0 = (0, 0, \ldots, 0)^T$.
(5) If $R_n$ satisfies the stopping criteria, transmit the codebook indices $V_{\min}$ of the $F_{\min}$'s with overhead bits, and start the processing of a new block.
(6) Else, get a difference image $E_n$,

$$E_n = E_{n-1} - F_{\min}, \tag{9}$$

where $E_0 = F^0$.
(7) Iterate (1)-(6) for $E_n$ instead of $F^0$.

In the decoder, we reconstruct an image by adding the SARs for all transmitted projections. It is advantageous to process $E_n$ instead of $F^0$ because the variance of $E_n$ is smaller than that of $F^0$. Thus we process $E_n$ from the second iteration on. The block diagram of the above procedure is shown in Fig. 2. Overhead bits are needed to keep the encoder and the decoder in the same state. The overhead bits for a block are as follows:

Block_bit = 1 (bit),
Angle_bit ≥ $\log_2 L$ (bits).

Block_bit distinguishes between the beginning and the continuation of the processing of a block. Angle_bit is used to indicate the angles of the projections.

3.2. Consideration for codebook design

The input data of VQ are statistically classified into two kinds of vectors: (a) vectors for $F^0$, and (b) vectors for $E_1, E_2, E_3, \ldots$. Because all the values of (a) are positive, their mean value is always greater than 0, whereas the mean value of (b) is near zero. Also, the variance of (a) is larger than that of (b). The codebook should be designed according to the mean value and the variance of the input data. Therefore, the codebook for (a) should be different from that for (b). If the geometrical shape of a block is not circular, the structure of the vector at each angle differs from the others. If the number of angles is large, too many codebooks are required. Fortunately, if the block is square and the angles are equally spaced and symmetrical, we can group the projections into a small number of groups with the same structure. We may design one codebook for each group.

3.3. Stopping criteria of iteration

Stopping criteria are needed to end the iteration. We propose two of them, both based on the MSE (mean square error). The MSE between the original, $F^0 = (f_1^0, f_2^0, \ldots, f_N^0)$, and the reconstructed image, $R = (r_1, r_2, \ldots, r_N)$, is defined as

$$\mathrm{MSE} = E\{(F^0 - R)^T (F^0 - R)\} = \frac{1}{N} \sum_{i=1}^{N} (f_i^0 - r_i)^2. \tag{10}$$

Fig. 2. Block diagram of (a) the encoder and (b) the decoder in the Projection-VQ scheme.
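The encoder loop of Section 3.1 can be sketched roughly as follows. This is an illustrative sketch under several assumptions, not the authors' implementation: only the 0° and 90° projections are used, the VQ step is omitted so each projection is passed on unquantized, the two MSE-based stopping criteria of Section 3.3 are applied with thresholds `mse_c` and `mse_d`, and all function names and the `max_iter` cap are hypothetical. The `sar` function performs one pass of the update (7) from a zero initial guess; with 0/1 weights, $W^T F$ is the current ray sum and $W^T W$ is the number of pixels on the ray.

```python
from typing import Dict, List, Tuple

Block = List[float]     # row-major pixels of a size x size block
Rays = List[List[int]]  # for each ray, the indices of the pixels it sums

def make_rays(size: int) -> Dict[int, Rays]:
    """Assumed ray layout: 0 degrees sums columns, 90 degrees sums rows."""
    cols = [[r * size + c for r in range(size)] for c in range(size)]
    rows = [[r * size + c for c in range(size)] for r in range(size)]
    return {0: cols, 90: rows}

def project(block: Block, rays: Rays) -> List[float]:
    return [sum(block[j] for j in ray) for ray in rays]

def sar(q: List[float], rays: Rays, n_pixels: int) -> Block:
    """Single angle reconstruction: one pass of update (7) starting from zero."""
    f = [0.0] * n_pixels
    for p, ray in zip(q, rays):
        step = (sum(f[j] for j in ray) - p) / len(ray)
        for j in ray:
            f[j] -= step
    return f

def mse(a: Block, b: Block) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def encode_block(block: Block, rays_by_angle: Dict[int, Rays], mse_c: float,
                 mse_d: float, max_iter: int = 8
                 ) -> Tuple[List[Tuple[int, List[float]]], Block]:
    n = len(block)
    residual = list(block)  # E_0 = F^0
    recon = [0.0] * n       # R_0 = 0
    sent = []               # (angle, projection) pairs that would be transmitted
    prev = mse(block, recon)
    for _ in range(max_iter):
        best = None
        for angle, rays in rays_by_angle.items():
            q = project(residual, rays)  # VQ of q is skipped in this sketch
            cand = sar(q, rays, n)
            d = mse(residual, cand)
            if best is None or d < best[0]:
                best = (d, angle, q, cand)
        _, angle, q, f_min = best
        recon = [r + f for r, f in zip(recon, f_min)]        # update (8)
        residual = [e - f for e, f in zip(residual, f_min)]  # update (9)
        sent.append((angle, q))
        cur = mse(block, recon)
        if cur < mse_c or prev - cur < mse_d:  # criteria (1) and (2)
            break
        prev = cur
    return sent, recon

edge_block = [10, 10, 200, 200] * 4  # assumed 4x4 block with a vertical edge
sent, recon = encode_block(edge_block, make_rays(4), mse_c=1.0, mse_d=0.5)
print([angle for angle, _ in sent])  # [0]
```

For the vertical-edge block, the 0° projection alone reproduces the block exactly in this sketch, which matches the remark in Section 3 that the 90° projection contributes nothing for such a block. The decoder would simply add the SARs of the received projections.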

Let $\mathrm{MSE}_n$ be the MSE of $R_n$. If one of the following criteria is satisfied, the iteration ends:

(1) $\mathrm{MSE}_n < \mathrm{MSE}_c$,
(2) $\mathrm{MSE}_{n-1} - \mathrm{MSE}_n < \mathrm{MSE}_d$,

where $\mathrm{MSE}_c$ and $\mathrm{MSE}_d$ are pre-specified values. Criterion (1) is mainly applied to monotone blocks or simple edge blocks. We can obtain good quality in these blocks using a few projections, so this criterion prevents the transmission of bits that bring only trivial improvement. Criterion (2) is applied to random blocks or complex edge blocks. In these blocks, it is hard to satisfy criterion (1) even with numerous iterations, so the bit transmission for a block may become excessive. The human visual system is not very sensitive to noise in a random area or near a complex edge because of the masking effect, a well-known feature of visual perception. So only a large reduction of the MSE yields a visible improvement of quality in those areas. Criterion (2) restricts excessive bit transmission in such blocks.

4. Simulation and results

Simulation has been performed on the Lena image of size 512 × 512. We set the block size to 8 × 8, so that the number of pixels, N, is 64. The number of angles, L, should be decided depending on the block size. It is useful to let L be $2^r$ (r an integer) in order to minimize the overhead. We can reconstruct a clear block image from 8 equally spaced projections when the block size is 8 × 8 [3]. The angles of the projections which we used in the simulation with L = 8 are shown in Fig. 3. We can group the projections into three kinds of vector by the dimension K: (1) K = 8 (0°, 90°), (2) K = 15 (-45°, 45°) and (3) K = 11 (-67.5°, -22.5°, 22.5°, 67.5°). Figure 4 shows the projection for each group. From Section 3.2, we know that each group needs two codebooks statistically. Thus, we need 6 codebooks in total for all kinds of input data.

Fig. 3. 8 projection angles used in the simulation.

As an objective measure of image quality, we employ the peak signal to noise ratio (PSNR) and the bit-rate, defined as follows:

$$\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\mathrm{MSE}} \ \text{(dB)}, \tag{11}$$

$$\text{bit-rate} = \frac{\text{number of total transmitted projections} \times J}{\text{number of pixels in the image}} \ \text{(bits/pixel)}, \tag{12}$$

where J is the number of bits for a projection. Simulation results without the quantization step are shown in Fig. 5 to demonstrate the validity of the projection and the reconstruction in the proposed scheme. When the average number of projections for a block is only 1.5-1.7, the PSNR is 36-37 dB. It is almost impossible to distinguish the reconstructed image from the original. This fact verifies that the projection and the reconstruction in the Projection-VQ are valid. When the quantization step is inserted, the image quality gets worse, so the PSNRs of Fig. 5 are upper limits under these conditions.

We designed the codebooks of size 256 using 12 images excluding Lena. For the training of the codebooks, we used the LBG (Linde, Buzo, Gray) algorithm [6]. Figure 6 shows the PSNRs and the bit-rates of the proposed scheme. We can see that the PSNRs and the bit-rates vary gradually with $\mathrm{MSE}_c$ and $\mathrm{MSE}_d$. By simply changing $\mathrm{MSE}_c$ and/or $\mathrm{MSE}_d$, it is possible to control the bit-rate or the PSNR in a fixed coding scheme. Figure 7 shows the comparison of performance between a normal VQ with block size 4 × 4 and the Projection-VQ. The codebook size of the normal VQ varies with the bit-rate, but that of the Projection-VQ is fixed at 256 except for one curve, for which it is 1024. The PSNR of the Projection-VQ is superior to that of the normal VQ by about 2.5 dB at the same bit-rate and codebook size (where the bit-rate is 0.5 bits/pixel). This is a surprising result for a memoryless VQ scheme, and is comparable to VQ with memory such as FSVQ (finite state VQ) [5].

Figure 8 shows the original and the reconstructed Lena with $\mathrm{MSE}_c = 64$ and $\mathrm{MSE}_d = 10$, where the PSNR and the bit-rate are 32.09 dB and 0.37 bits/pixel, respectively. The staircase noise, which is intrinsic in conventional block coding schemes, is not visible in Fig. 8(b). To show this in detail, we present a magnified portion of Fig. 8(b) in Fig. 9(a). For comparison, a magnified portion of a normal VQ image, coded at 0.63 bits/pixel, is shown in Fig. 9(b). The PSNRs of the two images are almost the same, but 'the lines on the hat' in Fig. 9(a) are much straighter and clearer than those in Fig. 9(b).

Fig. 4. Projections of an 8 × 8 block image for the 3 angle groups: (a) projections at 0°, 90°; (b) projections at 45°, -45°; (c) projections at 22.5°, -22.5°, 67.5°, -67.5°. A dot denotes a pixel of the image, and the $p_i$'s are the summed values of the pixels between the lines.

Fig. 5. (a) PSNR and (b) the average number of projections used for a block when the quantization step is removed in the Projection-VQ (horizontal axis: $\mathrm{MSE}_c$; curves parameterized by $\mathrm{MSE}_d$).

Fig. 6. (a) PSNR and (b) the bit-rate of the Projection-VQ (horizontal axis: $\mathrm{MSE}_c$; curves for $\mathrm{MSE}_d$ = 5, 10, 15). The image used is Lena and the codebook size is 256.

Fig. 7. Comparison between the Projection-VQ and a normal VQ with block size 4 × 4 (PSNR versus bit rate in bits/pixel).

Fig. 8. (a) Original image and (b) reconstructed image with $\mathrm{MSE}_c = 64$, $\mathrm{MSE}_d = 10$, where the PSNR is 32.09 dB and the bit-rate is 0.37 bits/pixel.

Fig. 9. (a) Magnified portion of Fig. 8(b), and (b) magnified portion of a normal VQ image. The PSNR of both is about 32 dB.

A problem with the image reconstructed by the Projection-VQ is that the grid noise is conspicuous because of the large block size. Ramamurthi and Gersho removed the grid noise using a nonlinear two-dimensional smoothing filter without blurring the edge areas, provided the areas are classified well [8]. We can easily classify the areas by analyzing the transmitted projections, without additional side information. If the number of transmitted projections in a block is small (1 or 2) and the variance of the block is below a pre-determined threshold, we can decide that the block is in a monotone area.

To compare overall system performance, system complexity should be considered. Even if a system has excellent performance in PSNR or image quality, it is not a good system if it is difficult to implement because of its complexity. Generally, the criteria of system complexity are computational complexity and memory size. Computational complexity depends on the number of multiplications and additions. But, because the former is much more complicated than the latter, only the former is usually considered. In normal VQ with block size 4 × 4, 16 multiplications are required to obtain the Euclidean distance between an input vector and a codevector. If the codebook size is $2^8$, 4096 multiplications are required to process a block. In Projection-VQ with block size 8 × 8, the numbers of input vector elements are 8, 15 and 11 for the 0°, 45° and 22.5° groups, respectively, which are smaller than that of the normal VQ with block size 4 × 4. When the codebooks have $2^8$ codevectors, $(2 \times 8 + 2 \times 15 + 4 \times 11) \times 2^8 = 23{,}040$ multiplications are required to obtain the Euclidean distances between the codevectors and the projected vectors at all angles for one iteration. Some more computation is required for projecting, obtaining the SARs, reconstructing, etc., but it is small compared with that for the Euclidean distances. The average number of iterations for a block is about 2 at a reasonable PSNR. Thus we can compare the computational complexity of the normal VQ with block size 4 × 4 and the Projection-VQ with block size 8 × 8: the latter has about 3 times the computational complexity of the former on the same image area. This is a very encouraging result compared with the exponential increase that occurs when the block size increases in normal VQ.

The memory needed to store the 6 codebooks in Projection-VQ is $2 \times 2^8 \times (8 + 15 + 11) = 17{,}408$ bits, which is about 4 times that of the normal VQ with block size 4 × 4. This is also a small increase compared with the exponential increase of the normal VQ when the block size increases.
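The counts quoted above can be reproduced with a few lines of arithmetic. The sketch below assumes a codebook of $2^8$ codevectors throughout, takes the average of 2 iterations per block stated in the text, and assumes that the normal VQ codebook stores 256 codevectors of 16 elements each (an inference from the "about 4 times" remark, not a figure given in the paper). The variable names are hypothetical.

```python
CODEBOOK_SIZE = 2 ** 8             # codevectors per codebook
GROUP_DIMS = {8: 2, 15: 2, 11: 4}  # vector dimension K -> number of angles in the group

# Normal VQ, 4x4 block: one multiplication per vector element per codevector.
vq_block_dim = 4 * 4
vq_mults_per_block = vq_block_dim * CODEBOOK_SIZE

# Projection-VQ, 8x8 block: all 8 projections are searched in each iteration.
pvq_mults_per_iter = sum(k * n for k, n in GROUP_DIMS.items()) * CODEBOOK_SIZE

# Compare on the same image area: one 8x8 block covers four 4x4 blocks.
iterations = 2  # "about 2" in the text
blocks_4x4_per_8x8 = (8 * 8) // (4 * 4)
complexity_ratio = (iterations * pvq_mults_per_iter) / (blocks_4x4_per_8x8 * vq_mults_per_block)

# Memory: two codebooks (for F^0 and for E_n) per angle group.
pvq_memory = 2 * CODEBOOK_SIZE * sum(GROUP_DIMS.keys())
vq_memory = CODEBOOK_SIZE * vq_block_dim  # assumed normal VQ codebook storage
memory_ratio = pvq_memory / vq_memory

print(vq_mults_per_block)  # 4096
print(pvq_mults_per_iter)  # 23040
print(complexity_ratio)    # 2.8125
print(pvq_memory)          # 17408
print(memory_ratio)        # 4.25
```

Under these assumptions the ratios come out at roughly 2.8 and 4.25, consistent with the "about 3" and "about 4" figures in the text.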

5. Conclusions

We have proposed a new image compression technique, Projection-VQ, which combines the theory of projection with VQ. The projection maps a vector space to a subspace of it; thus, Projection-VQ can process larger blocks than conventional VQ schemes because the vector dimension of the projected data is small. The bit-rate is reduced by adaptively allocating the number of projections depending on the block features. The PSNR of Projection-VQ is about 2.5 dB better than that of a normal VQ scheme. This result is remarkable for a memoryless VQ, and competitive with VQ with memory such as FSVQ. The edge shape is preserved well in Projection-VQ because the projections containing edge information are transmitted preferentially. Therefore, the staircase noise is not visible. The grid noise is more conspicuous because the block size is large, but the postprocessing to remove it can easily be carried out using the overhead bits in Projection-VQ. Thus, the overall subjective image quality of Projection-VQ is substantially improved.


References

[1] A. Gersho, "On the structure of vector quantizers", IEEE Trans. Inform. Theory, Vol. IT-28, No. 2, March 1982, pp. 157-166.
[2] R. Gordon, "A tutorial on ART (algebraic reconstruction techniques)", IEEE Trans. Nuclear Sci., Vol. NS-21, June 1974, pp. 78-93.
[3] G.T. Herman, "ART: Mathematics of applications A report on the mathematical foundations and on applicability to real data of the reconstruction techniques", J. Theoret. Biol., Vol. 42, 1973, pp. 1-32.
[4] A.K. Jain, Fundamentals of Digital Image Processing, 1989, Chapter 10, pp. 431-475.
[5] T. Kim, "New finite state vector quantizers for images", Proc. IEEE Internat. Conf. Acoust. Speech Signal Process., April 1988, pp. 1180-1183.
[6] Y. Linde, A. Buzo and R.M. Gray, "An algorithm for vector quantizer design", IEEE Trans. Comm., Vol. COM-28, January 1980, pp. 84-95.
[7] N.M. Nasrabadi and Y. Feng, "A dynamic finite state vector quantization scheme", Proc. IEEE Internat. Conf. Acoust. Speech Signal Process., April 1990, pp. 2261-2264.
[8] B. Ramamurthi and A. Gersho, "Nonlinear space-variant postprocessing of block coded images", IEEE Trans. Acoust. Speech Signal Process., Vol. ASSP-34, October 1986, pp. 1258-1267.
[9] A. Rosenfeld and A.C. Kak, Digital Picture Processing, Vol. 1, 1982, Chapter 8, pp. 353-430.
[10] P.R. Smith, T.M. Peters and R.H.T. Bates, "Image reconstruction from finite numbers of projections", J. Phys. A: Math. Nuclear Gen., Vol. 6, March 1973, pp. 361-382.
[11] Y. Yamada, S. Tazaki and R.M. Gray, "Asymptotic performance of block quantizers with difference distortion measures", IEEE Trans. Inform. Theory, Vol. IT-26, No. 2, January 1980, pp. 6-14.

Vol. 5, No. 3, May 1993