Signal Processing: Image Communication 12 (1998) 231-242
A new coding algorithm for arbitrarily shaped image segments

Jong-Won Yi a,*, Soon-Jae Cho a, Wook-Joong Kim a, Seong-Dae Kim a, Sang-Jee Lee b

a Department of Electrical Engineering, Korea Advanced Institute of Science and Technology, 373-1 Kusong-dong, Yusong-gu, Taejon 305-701, South Korea
b Agency for Defence Development, South Korea

Received 21 February 1996
Abstract

In this paper, a new texture coding algorithm for arbitrarily shaped image segments is introduced. In contrast to other methods described in the literature, the proposed coding algorithm has low computational complexity and is based on the widely used 8 × 8 2D-DCT; it can therefore be readily implemented using existing block-based coding standards such as JPEG, H.261 and MPEG. In addition, the content-based functionalities currently discussed in the MPEG-4 standardization phase can easily be achieved with the proposed algorithm. Computer simulations and comparisons with other results from the literature show that the proposed technique is promising and competitive.

© 1998 Elsevier Science B.V. All rights reserved.
Keywords: Object oriented coding; Content based functionality; Extension-interpolation; Texture information; Arbitrarily shaped image segment
1. Introduction

In traditional image coding systems, the image is divided into N × N blocks. At very low bit rates, however, this approach produces severe degradations such as blocking and mosquito artifacts. To solve these problems, object-oriented coding schemes have been proposed and developed by many researchers. In addition, the VOP (video object plane) concept has been introduced for the content-based functionalities currently discussed in the MPEG-4 standardization phase. Both object-oriented coding and VOP coding, however, yield image segments of arbitrary shape, so the internal contents, or texture information, of an arbitrarily shaped object must be encoded.

Various techniques have been proposed to encode the texture information of arbitrarily shaped image segments. Conventional methods fall into two classes: block-based DCT (discrete cosine transform) methods and shape-adaptive transform coding methods. Both classes have disadvantages. Block-based DCT methods have poor energy compaction, and shape-adaptive transform coding methods have extremely high computational complexity. Therefore, a new approach based on the block-based DCT is required that makes the transform spectrum as compact as possible while keeping the computational complexity low.

The rest of this paper presents a new algorithm that uses an extension-interpolation method based on the block-based DCT. Section 2 briefly reviews conventional coding methods for arbitrarily shaped image segments. Section 3 develops the key idea of the extension-interpolation method and proposes our coding algorithm. Section 4 compares our algorithm with the SA-DCT and with macroblock (MB) padding from MPEG-4. Section 5 concludes with a summary of the paper and a discussion of future work.

* Corresponding author. E-mail: [email protected].

0923-5965/98/$19.00 © 1998 Elsevier Science B.V. All rights reserved.
2. Conventional methods
In block-based DCT algorithms, images are divided into small blocks of fixed size N × N, as depicted in Fig. 1. Since we are interested only in the object segment, we consider only internal and boundary blocks, not background blocks. For internal blocks, all pixel values are defined, and the conventional 2D-DCT is used to encode their texture information. For boundary blocks, however, only the pixels of the object region are defined. One straightforward approach is to fill the region outside the boundary with zeros or with the mean value of the object region and then treat the block as a traditional internal block [2]. Another method is to extend the image segment with its mirror image outside the object region [2]. These methods are illustrated in Fig. 2(b)-(d). Such block-based DCT methods have several advantages: they are simple, have low computational complexity, and can use existing codec hardware to process arbitrarily shaped image segments. Their obvious drawback, however, is a significant increase in the high-order transform coefficients and thus a serious degradation of compression performance.

More promising methods are the SA-DCT (shape-adaptive DCT) [5] and the POCS (projection onto convex sets) algorithm [3], but these methods also have problems. The coding efficiency of the SA-DCT is degraded because the horizontal correlation becomes lower after the vertical 1D-DCTs are performed and the columns are shifted and aligned to the upper border of the 8 × 8 reference block. The POCS algorithm is an iterative technique whose iterations terminate only when the pixels outside the boundary converge, so it requires much computation. Shape-adaptive transform coding methods such as the GOT (generalized orthogonal transform) by Gilge [1] and the iterative coding technique by Kaup and Aach [2] provide much better performance, but they need extremely heavy computation in both the encoder and the decoder. Moreover, they cannot use existing codec hardware, which makes them difficult to implement.

Fig. 1. A typical example of an arbitrarily shaped image segment and block structure (internal, boundary and background blocks).

Fig. 2. 1-dimensional illustration of the methods based on block-based DCT: (a) original segment; (b) zero stuffing method; (c) mean stuffing method; (d) mirror image extension method; (e) extension-interpolation method.
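The three padding rules of Fig. 2(b)-(d) can be sketched in a few lines. This is a minimal illustration, assuming NumPy, a 1D object segment occupying the first M positions of an N-pixel line, and a symmetric (half-sample) reflection for the mirror extension; all function names are hypothetical. The helper `high_order_energy` reports how much energy each padded line places in DCT coefficients of order M and above, which is the high-order energy identified above as the drawback of these methods.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal N-point DCT-II matrix; row k is the k-th basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] = np.sqrt(1.0 / n)
    return mat


def zero_stuffing(seg, n):
    """Fig. 2(b): fill the N - M positions outside the object with zeros."""
    seg = np.asarray(seg, dtype=float)
    return np.concatenate([seg, np.zeros(n - seg.size)])


def mean_stuffing(seg, n):
    """Fig. 2(c): fill the outside positions with the segment mean."""
    seg = np.asarray(seg, dtype=float)
    return np.concatenate([seg, np.full(n - seg.size, seg.mean())])


def mirror_extension(seg, n):
    """Fig. 2(d): symmetric reflection about the segment end, repeated if N > 2M."""
    seg = np.asarray(seg, dtype=float)
    m = seg.size
    idx = np.arange(n) % (2 * m)
    idx = np.where(idx < m, idx, 2 * m - 1 - idx)
    return seg[idx]


def high_order_energy(line, m):
    """Energy of the DCT coefficients with index k >= M of an N-point line."""
    coeffs = dct_matrix(len(line)) @ line
    return float(np.sum(coeffs[m:] ** 2))


segment = [100.0, 104.0, 108.0, 112.0, 116.0]   # smooth ramp, M = 5
for name, pad in [("zero", zero_stuffing), ("mean", mean_stuffing), ("mirror", mirror_extension)]:
    line = pad(segment, 8)
    print(name, line, round(high_order_energy(line, len(segment)), 2))
```

Comparing the printed energies for the three rules gives a quick feel for how much each one spreads into the high-order coefficients for a given segment.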
Fig. 3. Boundary block coding using EI: (a) encoding using EI (vertical and horizontal EI applied to the original boundary block); (b) decoding using inverse EI (vertical and horizontal inverse EI yield the reconstructed boundary block).

3. Extension-interpolation method

3.1. Overview of original extension-interpolation method
To overcome the problems mentioned in Section 2, an extension-interpolation (EI) method has been proposed [4]. The EI method is based on interpolation. Without loss of generality, we describe it for the one-dimensional case. For a block of length N, we interpolate N pixels from the object segment of length M (M < N) and replace the whole block with these N interpolated pixels, as shown in Fig. 2(e). In the two-dimensional case, EI is performed first in one direction and then in the other, as depicted in Fig. 3(a).

The basic concept of the EI method proposed in [4] is as follows. Again we consider only the one-dimensional case. Suppose the original segment has length M and the block length is N (N > M). First, we perform an M-point 1D-DCT to obtain M transform coefficients and then append (N - M) zeros in the transform domain. Finally, we perform an N-point 1D inverse DCT to obtain the N interpolated pixels. This method does not introduce any frequency components higher than those of the original segment, but it requires much computation for the DCT and the inverse DCT. As shown in Fig. 3, when the EI method is applied to a general 8 × 8 2D-DCT block, EI is performed 16 times in the worst case: 8 times in the horizontal direction and 8 times in the vertical direction. About 16 1D-DCTs and 16 1D inverse DCTs are therefore required for each block, and for a general value of M no fast DCT algorithm is available. Consequently, this method needs too much computation. To reduce the computational complexity, we propose a new EI method in the next section.
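A minimal sketch of the original EI of [4] for one line, assuming NumPy, orthonormal DCT-II bases, and that the M object pixels have already been gathered into a contiguous vector; the names `dct_matrix` and `original_ei` are hypothetical. It makes the property stated above checkable: the N interpolated pixels contain no DCT component of order M or higher. Because this sketch does not rescale the DC term, a constant segment comes back scaled by sqrt(M/N), which is the kind of luminance change the scaling factor of Section 3.2 is meant to avoid.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal N-point DCT-II matrix; row k is the k-th basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] = np.sqrt(1.0 / n)
    return mat


def original_ei(segment, n):
    """M-point DCT, append N - M zero coefficients, then N-point inverse DCT."""
    seg = np.asarray(segment, dtype=float)
    m = seg.size
    if not 0 < m <= n:
        raise ValueError("need 0 < M <= N")
    coeffs = dct_matrix(m) @ seg                         # M transform coefficients
    padded = np.concatenate([coeffs, np.zeros(n - m)])   # fill N - M zeros
    return dct_matrix(n).T @ padded                      # orthonormal: inverse = transpose


line = original_ei([100.0, 104.0, 108.0, 112.0, 116.0], 8)
spectrum = dct_matrix(8) @ line
print(np.round(line, 2))
print(np.allclose(spectrum[5:], 0.0))   # no components of order >= M
```

Each call costs one M-point and one N-point transform, so for arbitrary M it cannot rely on a fast power-of-two DCT, which is the cost problem noted above.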
3.2. Optimal EI method in spatial domain

We propose a new optimal EI method in the spatial domain, considering the encoder first. Suppose the original segment has length M and the block length is N (N > M). Let $f_1(n_1)$ denote the luminance function of the M pixels in the original object segment, and $f_2(n_2)$ the luminance function of the N interpolated pixels. $f_2(n_2)$ has the same number of DCT coefficients as $f_1(n_1)$ and thus has no frequency component higher than those of the original segment. The M-point DCT basis $\{b_{kn}\}$ for $f_1(n_1)$ and the N-point DCT basis $\{a_{kn}\}$ for $f_2(n_2)$ are given by
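$$a_{kn} = \begin{cases} \sqrt{1/N}, & k = 0,\\ \sqrt{2/N}\,\cos\left[\dfrac{\pi(2n+1)k}{2N}\right], & k = 1,\ldots,N-1, \end{cases} \qquad (1)$$

$$b_{kn} = \begin{cases} \sqrt{1/M}, & k = 0,\\ \sqrt{2/M}\,\cos\left[\dfrac{\pi(2n+1)k}{2M}\right], & k = 1,\ldots,M-1. \end{cases} \qquad (2)$$

Based on the basic concept described in Section 3.1, the interpolation of $f_2(n_2)$ from $f_1(n_1)$ can be written in matrix form as follows:

$$\begin{bmatrix} f_2(0)\\ f_2(1)\\ \vdots\\ f_2(N-2)\\ f_2(N-1) \end{bmatrix} = \begin{bmatrix} a_{00} & a_{10} & a_{20} & \cdots & a_{(N-1)0}\\ a_{01} & a_{11} & a_{21} & \cdots & a_{(N-1)1}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ a_{0(N-1)} & a_{1(N-1)} & a_{2(N-1)} & \cdots & a_{(N-1)(N-1)} \end{bmatrix} \begin{bmatrix} p_0 b_{00} & p_0 b_{01} & \cdots & p_0 b_{0(M-1)}\\ b_{10} & b_{11} & \cdots & b_{1(M-1)}\\ \vdots & \vdots & \ddots & \vdots\\ b_{(M-1)0} & b_{(M-1)1} & \cdots & b_{(M-1)(M-1)}\\ 0 & 0 & \cdots & 0\\ \vdots & \vdots & & \vdots\\ 0 & 0 & \cdots & 0 \end{bmatrix} \begin{bmatrix} f_1(0)\\ f_1(1)\\ \vdots\\ f_1(M-1) \end{bmatrix}. \qquad (3)$$

The scaling factor $p_0$ in Eq. (3) prevents a change of luminance in the spatial domain. The DC components of $f_1(n_1)$ and $f_2(n_2)$ are given by

$$F_1(0) = \sqrt{M}\,M_{f_1}, \qquad F_2(0) = \sqrt{N}\,M_{f_2}, \qquad (4)$$

where $M_{f_1}$ and $M_{f_2}$ are the mean values of $f_1(n_1)$ and $f_2(n_2)$, respectively. By Eq. (3), the DC coefficient of $f_2(n_2)$ is $p_0$ times that of $f_1(n_1)$:

$$F_2(0) = p_0\,F_1(0). \qquad (5)$$

Since we want $M_{f_1} = M_{f_2}$, we have

$$p_0 = \sqrt{N/M}. \qquad (6)$$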
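Thus,

$$\begin{bmatrix} f_2(0)\\ f_2(1)\\ \vdots\\ f_2(N-1) \end{bmatrix} = \begin{bmatrix} \sum_{k=0}^{M-1} p_k a_{k0} b_{k0} & \cdots & \sum_{k=0}^{M-1} p_k a_{k0} b_{k(M-1)}\\ \vdots & \ddots & \vdots\\ \sum_{k=0}^{M-1} p_k a_{k(N-1)} b_{k0} & \cdots & \sum_{k=0}^{M-1} p_k a_{k(N-1)} b_{k(M-1)} \end{bmatrix} \begin{bmatrix} f_1(0)\\ \vdots\\ f_1(M-1) \end{bmatrix}, \qquad (7)$$

where

$$p_k = \begin{cases} \sqrt{N/M}, & k = 0,\\ 1, & k = 1,\ldots,M-1. \end{cases} \qquad (8)$$

By Eqs. (1) and (2),

$$a_{kl}\,b_{km} = \begin{cases} \dfrac{1}{\sqrt{MN}}, & k = 0,\\ \dfrac{2}{\sqrt{MN}}\cos\left[\dfrac{\pi(2l+1)k}{2N}\right]\cos\left[\dfrac{\pi(2m+1)k}{2M}\right], & k = 1,\ldots,M-1. \end{cases} \qquad (9)$$

Thus we obtain the EI coefficients $\{C_{lm}\}$, given by

$$C_{lm} = \sum_{k=0}^{M-1} p_k\,a_{kl}\,b_{km} \qquad (10)$$

$$= \frac{1}{M} + \frac{2}{\sqrt{MN}}\sum_{k=1}^{M-1}\cos\left[\frac{\pi(2l+1)k}{2N}\right]\cos\left[\frac{\pi(2m+1)k}{2M}\right] \qquad (11)$$

$$= \frac{1}{M} - \frac{1}{\sqrt{MN}} + \frac{1}{2\sqrt{MN}}\left\{ \frac{\sin\left[\frac{\pi}{2}\left(\frac{2l+1}{2N} + \frac{2m+1}{2M}\right)(2M-1)\right]}{\sin\left[\frac{\pi}{2}\left(\frac{2l+1}{2N} + \frac{2m+1}{2M}\right)\right]} + \frac{\sin\left[\frac{\pi}{2}\left(\frac{2l+1}{2N} - \frac{2m+1}{2M}\right)(2M-1)\right]}{\sin\left[\frac{\pi}{2}\left(\frac{2l+1}{2N} - \frac{2m+1}{2M}\right)\right]} \right\}. \qquad (12)$$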
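Eq. (12) follows from Eq. (11) by the product-to-sum identity and the sum $\sum_{k=1}^{M-1}\cos k\theta = \sin[(2M-1)\theta/2]/(2\sin(\theta/2)) - 1/2$. The short check below, a sketch with hypothetical function names, compares the two forms numerically for N = 8. For N = 8 and M < 8 the denominators never vanish, since $(2l+1)/(2N) = (2m+1)/(2M)$ would require M and N to contain the same power of two; for other block sizes a zero-denominator term would have to be replaced by its limit, 2M - 1.

```python
import math


def c_sum(l, m, M, N):
    """Eq. (11): EI coefficient C_lm by direct summation."""
    s = sum(math.cos(math.pi * (2 * l + 1) * k / (2 * N))
            * math.cos(math.pi * (2 * m + 1) * k / (2 * M))
            for k in range(1, M))
    return 1.0 / M + 2.0 / math.sqrt(M * N) * s


def c_closed(l, m, M, N):
    """Eq. (12): closed form; assumes neither denominator sine is zero."""
    u = (2 * l + 1) / (2 * N)
    v = (2 * m + 1) / (2 * M)
    total = 0.0
    for phi in (u + v, u - v):
        h = math.pi * phi / 2
        total += math.sin((2 * M - 1) * h) / math.sin(h)
    return 1.0 / M - 1.0 / math.sqrt(M * N) + total / (2.0 * math.sqrt(M * N))


N = 8
worst = max(abs(c_sum(l, m, M, N) - c_closed(l, m, M, N))
            for M in range(1, N) for l in range(N) for m in range(M))
print(f"max |Eq.(11) - Eq.(12)| for N = 8: {worst:.2e}")
```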
At the decoder, we reconstruct the shape of the object segment using the previously transmitted contour information. As shown in Fig. 3(b), we can then reconstruct the original texture information by reversing the order of the encoder. In the same manner, we obtain the inverse EI coefficients $\{D_{lm}\}$ for the decoder:
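$$D_{lm} = \sum_{k=0}^{M-1} \frac{1}{p_k}\,b_{kl}\,a_{km} \qquad (13)$$

$$= \frac{1}{p_0}\frac{1}{\sqrt{MN}} + \frac{2}{\sqrt{MN}}\sum_{k=1}^{M-1}\cos\left[\frac{\pi(2m+1)k}{2N}\right]\cos\left[\frac{\pi(2l+1)k}{2M}\right] \qquad (14)$$

$$= \frac{1}{N} - \frac{1}{\sqrt{MN}} + \frac{1}{2\sqrt{MN}}\left\{ \frac{\sin\left[\frac{\pi}{2}\left(\frac{2m+1}{2N} + \frac{2l+1}{2M}\right)(2M-1)\right]}{\sin\left[\frac{\pi}{2}\left(\frac{2m+1}{2N} + \frac{2l+1}{2M}\right)\right]} + \frac{\sin\left[\frac{\pi}{2}\left(\frac{2m+1}{2N} - \frac{2l+1}{2M}\right)(2M-1)\right]}{\sin\left[\frac{\pi}{2}\left(\frac{2m+1}{2N} - \frac{2l+1}{2M}\right)\right]} \right\}. \qquad (15)$$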
From the above equations, we can calculate the EI coefficients $\{C_{lm}\}$ and the inverse EI coefficients $\{D_{lm}\}$ for every M with fixed N and store them in memory in advance. EI and inverse EI can then be performed quickly and easily by the simple matrix operations in Eqs. (16) and (17).
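$$\begin{bmatrix} f_2(0)\\ f_2(1)\\ f_2(2)\\ \vdots\\ f_2(N-1) \end{bmatrix} = \begin{bmatrix} C_{00} & C_{01} & \cdots & C_{0(M-1)}\\ C_{10} & C_{11} & \cdots & C_{1(M-1)}\\ C_{20} & C_{21} & \cdots & C_{2(M-1)}\\ \vdots & \vdots & \ddots & \vdots\\ C_{(N-1)0} & C_{(N-1)1} & \cdots & C_{(N-1)(M-1)} \end{bmatrix} \begin{bmatrix} f_1(0)\\ f_1(1)\\ \vdots\\ f_1(M-1) \end{bmatrix}, \qquad (16)$$

$$\begin{bmatrix} f_1(0)\\ f_1(1)\\ \vdots\\ f_1(M-1) \end{bmatrix} = \begin{bmatrix} D_{00} & D_{01} & \cdots & D_{0(N-1)}\\ D_{10} & D_{11} & \cdots & D_{1(N-1)}\\ \vdots & \vdots & \ddots & \vdots\\ D_{(M-1)0} & D_{(M-1)1} & \cdots & D_{(M-1)(N-1)} \end{bmatrix} \begin{bmatrix} f_2(0)\\ f_2(1)\\ \vdots\\ f_2(N-1) \end{bmatrix}. \qquad (17)$$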
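A sketch of how the coefficient tables of Eqs. (10) and (13) might be precomputed for every M and applied as in Eqs. (16) and (17). It assumes NumPy and the orthonormal bases of Eqs. (1) and (2); the function names are hypothetical. The checks at the end confirm two properties implied by the derivation: without quantization, inverse EI recovers the original M pixels exactly (the product of the D and C matrices is the M × M identity, because the first M rows of the N-point basis are orthonormal), and the mean luminance is preserved, which is the purpose of $p_0$.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix of Eqs. (1)/(2): entry [k, n] is the basis value."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] = np.sqrt(1.0 / n)
    return mat


def ei_tables(m, n):
    """Return C (N x M, Eq. (10)) and D (M x N, Eq. (13)) for segment length M."""
    a = dct_matrix(n)[:m, :]        # a_kl for k < M; higher orders are zero-filled
    b = dct_matrix(m)               # b_km
    p = np.ones(m)
    p[0] = np.sqrt(n / m)           # Eq. (8): only the DC term is rescaled
    c = a.T @ (p[:, None] * b)      # C_lm = sum_k p_k a_kl b_km
    d = b.T @ (a / p[:, None])      # D_lm = sum_k b_kl a_km / p_k
    return c, d


N = 8
tables = {m: ei_tables(m, N) for m in range(1, N + 1)}   # stored in advance

f1 = np.array([100.0, 104.0, 108.0, 112.0, 116.0])      # M = 5 object pixels
C, D = tables[f1.size]
f2 = C @ f1        # Eq. (16): N interpolated pixels
back = D @ f2      # Eq. (17): inverse EI at the decoder
print(np.round(f2, 2))
print(np.allclose(back, f1), np.isclose(f2.mean(), f1.mean()))
```

Storing one pair of small matrices per M replaces the per-line M-point DCT and N-point inverse DCT of the original method with a single matrix-vector product.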
3.3. Horizontal-vertical priority decision for EI

As shown in Fig. 3, we perform EI first in the horizontal or vertical direction and then in the other direction. The direction chosen first influences the overall performance of the encoder. To avoid transmitting additional information, the priority of the horizontal or vertical direction must be decided from the shape of the object segment in the block to be encoded alone. For the object pixels in each block, we compute the variance of the lengths of the line segments in the horizontal direction and in the vertical direction, and we perform EI first in the direction with the smaller variance. Consider, for example, the case shown in Fig. 4. The variance of the lengths of the horizontal line segments is

$$\sigma_h^2 = \frac{(5-m_h)^2 + (5-m_h)^2 + (5-m_h)^2 + (4-m_h)^2 + (4-m_h)^2 + (2-m_h)^2}{6} = 1.14,$$

where $m_h = (5+5+5+4+4+2)/6 = 4.17$. In a similar manner, the variance of the lengths of the vertical line segments is $\sigma_v^2 = 1.2$. Since $\sigma_h^2 < \sigma_v^2$, we perform EI in the horizontal direction first.

3.4. Comparison of computational complexity
The computational complexity of the proposed EI and the SA-DCT is compared here. We consider only the computation needed to transform a single row or column, since this is sufficient for the comparison. The SA-DCT needs an M-point (0 < M < N) 1D-DCT, whereas the proposed EI method needs the simple matrix operation of Eq. (16) followed by an N-point 1D-DCT. Since N = 2^k (where k is a positive integer) in the general case, various fast DCT algorithms are available for the latter [6]. Table 1 gives the precise computational complexity of each method. The proposed EI method needs somewhat more computation than the SA-DCT but performs better; the detailed results are presented in Section 4.
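The operation counts of Table 1 can be evaluated for the usual block length N = 8 with the small helper below (hypothetical names). The formulas are copied from Table 1: the EI count is the N × M matrix product of Eq. (16) plus a fast N-point DCT, and the SA-DCT count is a direct M-point DCT. For M = 4, for instance, EI needs 44 multiplications and 53 additions per line against 16 and 12 for the SA-DCT, in line with the statement that EI needs somewhat more computation.

```python
import math


def ei_ops(m, n):
    """Table 1, proposed EI: (multiplications, additions) for one row or column."""
    lg = int(math.log2(n))
    if 2 ** lg != n:
        raise ValueError("the fast N-point DCT assumes N = 2^k")
    mults = m * n + (n // 2) * lg
    adds = (m - 1) * n + (3 * n // 2) * lg - n + 1
    return mults, adds


def sadct_ops(m):
    """Table 1, SA-DCT: direct M-point 1D-DCT."""
    return m * m, m * (m - 1)


for m in range(1, 8):
    print(f"M={m}: EI {ei_ops(m, 8)}, SA-DCT {sadct_ops(m)}")
```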
Fig. 4. Horizontal-vertical priority for EI.

Table 1
Comparison of computational complexity

Method         No. of multiplications    No. of additions
Proposed EI    MN + (N/2) log2 N         (M - 1)N + (3N/2) log2 N - N + 1
SA-DCT         M^2                       M(M - 1)

Fig. 5. Conventional block placement.

3.5. Block placement decision method for efficient EI
Before performing EI, we divide the object segment into N × N blocks, that is, we place N × N blocks over the object segment. The widely used placement method is shown in Fig. 5. This method is simple, but it encodes more blocks than needed. As depicted in Fig. 6, we instead place the blocks on the image segment optimally, i.e. so that the number of blocks to be encoded is as small as possible, in the horizontal and in the vertical direction, respectively. We then adopt the placement of whichever direction yields fewer blocks. With the proposed placement in Fig. 6, only 16 blocks have to be encoded, whereas the conventional block placement in Fig. 5 requires 19. The proposed block placement method thus divides the object segment effectively without introducing any additional information. Moreover, as the number of blocks is reduced, the overall bit rate is reduced and the coding efficiency of EI is improved.
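The search procedure behind Fig. 6 is not spelled out, so the following is only a hypothetical reading of it, assuming NumPy and a boolean object mask. For the horizontal direction, the mask is cut into stripes of N rows starting at the topmost object row, and within each stripe blocks are laid greedily from the leftmost uncovered object column (greedy covering is optimal for covering points on a line with fixed-length intervals, but the stripe offsets are fixed, so the overall result is a heuristic). The vertical direction applies the same rule to the transposed mask, and the placement with fewer blocks is kept. The conventional placement is modelled, again as an assumption, by a fixed grid anchored at the image origin. All names are hypothetical.

```python
import numpy as np


def cover_1d(occupied, n):
    """Greedy cover of all True positions by length-n intervals (optimal in 1D)."""
    starts, last_covered = [], -1
    for pos in np.flatnonzero(occupied):
        if pos > last_covered:               # first position not yet covered
            starts.append(int(pos))
            last_covered = pos + n - 1
    return starts


def place_rowwise(mask, n):
    """Stripes of n rows from the topmost object row; blocks placed per stripe."""
    rows = np.flatnonzero(mask.any(axis=1))
    if rows.size == 0:
        return []
    blocks = []
    for top in range(rows[0], rows[-1] + 1, n):
        stripe = mask[top:top + n]
        blocks += [(int(top), left) for left in cover_1d(stripe.any(axis=0), n)]
    return blocks


def place_blocks(mask, n=8):
    """Block corners (row, col) from whichever direction needs fewer blocks."""
    mask = np.asarray(mask, dtype=bool)
    horizontal = place_rowwise(mask, n)
    # Vertical direction: same rule on the transposed mask, corners swapped back.
    vertical = [(row, col) for (col, row) in place_rowwise(mask.T, n)]
    return horizontal if len(horizontal) <= len(vertical) else vertical


def grid_block_count(mask, n=8):
    """Blocks containing object pixels for a fixed grid anchored at the image origin."""
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    return sum(bool(mask[r:r + n, c:c + n].any())
               for r in range(0, h, n) for c in range(0, w, n))


obj = np.zeros((24, 24), dtype=bool)
obj[4:12, 4:12] = True          # 8 x 8 object that straddles the fixed grid lines
print(len(place_blocks(obj)), grid_block_count(obj))
```

Because the placement depends only on the shape, a decoder that has the shape information could repeat it, which is consistent with the statement that no additional information is needed.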
3.6. Overall procedure of proposed coding algorithm

The overall block diagram of the proposed transform coding of arbitrarily shaped image segments is shown in Fig. 7. As shown in Fig. 7, we perform EI only for the boundary blocks and then treat them as traditional internal blocks. At the decoder, we reconstruct the shape of the object segment using the transmitted shape information and then decode the texture information in the reverse order of the encoder.
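The encoder side of Fig. 7 for a single N × N block might look as follows. This is a sketch under several assumptions the text leaves open: NumPy; the object pixels on a line are gathered in scan order and treated as one segment, as in the one-dimensional derivation; the line-segment "length" used by the priority rule of Section 3.3 is the number of object pixels on that line; ties in the variance go to the vertical direction; and quantization, zig-zag scanning and VLC are omitted. All function names are hypothetical.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is the k-th basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] = np.sqrt(1.0 / n)
    return mat


def ei_matrix(m, n):
    """EI coefficients C of Eq. (10), shape N x M."""
    p = np.ones(m)
    p[0] = np.sqrt(n / m)
    return dct_matrix(n)[:m, :].T @ (p[:, None] * dct_matrix(m))


def horizontal_first(mask):
    """Section 3.3: start with the direction whose line lengths vary less (ties: vertical)."""
    rows = mask.sum(axis=1)
    cols = mask.sum(axis=0)
    return np.var(rows[rows > 0]) < np.var(cols[cols > 0])


def ei_rows(block, mask):
    """1D EI on every row holding object pixels; also returns which rows are now filled."""
    out = np.zeros(block.shape)
    filled = mask.any(axis=1)
    for r in np.flatnonzero(filled):
        idx = np.flatnonzero(mask[r])           # object pixels of this row, in scan order
        out[r] = ei_matrix(idx.size, block.shape[1]) @ block[r, idx]
    return out, filled


def ei_rows_then_cols(block, mask):
    rows_done, filled = ei_rows(block, mask)
    col_mask = np.repeat(filled[:, None], block.shape[1], axis=1)  # filled rows are complete
    cols_done, _ = ei_rows(rows_done.T, col_mask.T)
    return cols_done.T


def ei_2d(block, mask):
    """Two-pass EI of a boundary block (Fig. 3(a)), first direction chosen per Section 3.3."""
    if horizontal_first(mask):
        return ei_rows_then_cols(block, mask)
    return ei_rows_then_cols(block.T, mask.T).T


def encode_block(block, mask):
    """Skip background blocks, fill boundary blocks by EI, then apply the 2D-DCT."""
    block = np.asarray(block, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return None                      # background block: not coded
    if not mask.all():
        block = ei_2d(block, mask)       # boundary block
    d = dct_matrix(block.shape[0])
    return d @ block @ d.T               # ordinary separable 2D-DCT


# Staircase mask with row lengths 5, 5, 5, 4, 4, 2 as in the Section 3.3 example;
# its column lengths are 6, 6, 5, 5, 3.
mask = np.zeros((8, 8), dtype=bool)
mask[0:3, 0:5] = True
mask[3:5, 0:4] = True
mask[5, 0:2] = True
print(horizontal_first(mask))
coeffs = encode_block(np.full((8, 8), 100.0), mask)
print(round(float(coeffs[0, 0]), 6))
```

Because EI preserves the mean and adds no high-order components, a flat boundary block ends up with only a DC coefficient after the 2D-DCT, just like a flat internal block.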
Fig. 6. Efficient block placement for EI: (a) optimal block placement in horizontal direction; (b) optimal block placement in vertical direction.

Fig. 7. Successive steps of overall coding scheme using EI: (a) encoder, from the object segment through optimal block placement to the bit stream; (b) decoder, from the bit stream to the reconstructed image.

Fig. 8. Test images.
Fig. 9. Rate-distortion curve for INTRA mode (PSNR in dB; SA-DCT, macroblock padding and proposed EI).
4. Simulation results

We use the MOTHER AND DAUGHTER image (QCIF format) and its segment mask, shown in Fig. 8, to evaluate the coding performance of the SA-DCT method, the macroblock padding method [7], and our proposed EI algorithm. Identical coding parameters were used in all experiments, including the same quantizer, zig-zag scanning, the VLC of the H.263 coder, and the block placement method proposed in Section 3.5. Since we are interested only in the foreground image segment, we consider only the boundary blocks when calculating the total bit rate and the peak signal-to-noise ratio (PSNR) of the luminance component (Y). The coding of the shape information of the foreground segment was not included in the total bit rate, because it requires the same number of bits for all methods.
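Because only boundary blocks enter the PSNR here, the measure has to be restricted to a subset of pixels. A minimal sketch, assuming 8-bit luminance (peak value 255), NumPy, and a caller-supplied boolean mask selecting the evaluated pixels; exactly which pixels of the boundary blocks were counted is not stated beyond the description above, and `masked_psnr` is a hypothetical name.

```python
import numpy as np


def masked_psnr(original, reconstructed, mask, peak=255.0):
    """PSNR in dB of the luminance (Y) over the pixels selected by a boolean mask."""
    mask = np.asarray(mask, dtype=bool)
    diff = np.asarray(original, float)[mask] - np.asarray(reconstructed, float)[mask]
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))


y = np.full((16, 16), 120.0)
y_hat = y + 2.0                         # uniform error of 2 grey levels
sel = np.zeros((16, 16), dtype=bool)
sel[4:12, 4:12] = True                  # e.g. the pixels of the blocks being evaluated
print(round(masked_psnr(y, y_hat, sel), 2))
```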
Fig. 10. Rate-distortion curve for INTER mode.
We plot the rate-distortion curves of the reconstructed images. The results for a wide range of quantization parameters (QP = 1, ..., 31), which correspond to a range of bit rates, are shown in Figs. 9 and 10 for the INTRA mode and the INTER mode, respectively. For the INTRA mode, the proposed EI method achieves a PSNR gain of about 0.5-0.8 dB over the SA-DCT method for QP = 5-31, and a similar gain over the macroblock padding method. For the INTER mode, the MB padding method performs worst, while the SA-DCT method and the EI method perform quite similarly. Fig. 11 shows the effect of the block placement on the EI method. Because our block placement method reduces the number of blocks to be encoded, the EI method with and without block placement cannot be compared directly in a QP-PSNR rate-distortion plot like the previous ones, so the bit rate is used for the plot instead of QP. We can see that the block placement method improves the coding efficiency of the EI method.
5. Conclusion

A new coding algorithm for arbitrarily shaped image segments has been proposed. In contrast to the methods described in [1,2], the computational burden of the EI method is very low, and it can easily be implemented with existing codec hardware using the widely used standard 8 × 8 2D-DCT. Consequently, our algorithm can be embedded into most coding schemes with very little computational overhead. The proposed scheme provides better performance than the conventional methods over a wide range of compression ratios. We have also presented an optimal block placement method for efficient EI, and it has been shown that this block placement method improves the coding performance of the EI method.
Fig. 11. Rate-distortion curve for block placement (PSNR in dB versus bits).
Note that the block placement method is also effective for other methods such as the SA-DCT and macroblock padding. Note also that our overall EI algorithm achieves its performance improvement without any additional overhead bits.
References

[1] M. Gilge, T. Engelhardt, R. Mehlan, Coding of arbitrarily shaped image segments based on a generalized orthogonal transform, Signal Processing: Image Communication 1 (2) (1989) 153-180.
[2] S.-F. Chang, D.G. Messerschmitt, Transform coding of arbitrarily shaped image segments, Proc. 1st ACM Internat. Conf. on Multimedia, Vol. 1, 1993, pp. 83-90.
[3] H.H. Chen, M.R. Civanlar, B.G. Haskell, A block transform coder for arbitrarily shaped image segments, Proc. Very Low Bit-rate Video 94, Paper No. 1.1, 1994.
[4] S.-J. Cho, S.-W. Lee, J.-G. Choi, S.-D. Kim, Arbitrarily-shaped image segment coding using extension-interpolation, J. Korean Institute of Communication Science 20 (9) (1995) 2453-2463.
[5] T. Sikora, B. Makai, Shape-adaptive DCT for generic coding of video, IEEE Trans. on Circuits and Systems for Video Technology 5 (1) (1995) 59-62.
[6] K.R. Rao, P. Yip, Discrete Cosine Transform: Algorithm, Advantages, Applications, Academic Press, New York, 1990.
[7] Ad Hoc Group on MPEG-4 Video VM Editing, MPEG-4 Video Verification Model Version 3.0, ISO/IEC JTC1/SC29/WG11, N1277, July 1996.