
Signal Processing: Image Communication 4 (1992) 185-193 Elsevier


Image sequence coding using temporal co-occurrence matrices

V. Seferidis and M. Ghanbari

Department of Electronic Systems Engineering, University of Essex, Colchester CO4 3SQ, United Kingdom

Abstract. Applications of the temporal co-occurrence matrices to interframe video coding are discussed. In the area of low bit-rate coding, an adaptive version of a simple predictive/transform coding technique is proposed which increases the subjective quality of the coded images. The adaptive process is based on the homogeneity criterion calculated from the temporal co-occurrence matrices of the image sequence. In multi-layer video coding, an algorithm for the classification of the frame difference information is introduced. It divides the interframe picture element changes into two groups, exploiting the psychovisual characteristics of the human visual system (HVS). The experimental results show the simplicity and speed of operation of the temporal co-occurrence matrices for a wide range of applications in interframe video coding.

Keywords. Temporal co-occurrence matrices, low bit-rate coding, multi-layer video coding.

0923-5965/92/$05.00 © 1992 Elsevier Science Publishers B.V. All rights reserved

1. Introduction

Texture is one of the important characteristics used in identifying objects or regions of interest in still images, and can be defined as a structure composed of a large number of more or less ordered similar elements [19]. The statistical approach to texture analysis regards textures as a set of statistics extracted from the local picture properties [7]. Among the statistics describing a particular image, the human visual system (HVS) has the highest sensitivity to the second order [10]. Well-known examples of such statistics are the grey level difference histogram and the grey level co-occurrence matrix.

The co-occurrence matrix can be defined as the second-order joint conditional probability density function f(i, j | d, θ). Each element of f(i, j | d, θ) is the probability of going from a grey level i to another level j, given that the intersample stepping is d and the direction is specified by the angle θ. The elements can be written in matrix form, the so-called co-occurrence matrix.

The formation of a co-occurrence matrix is straightforward. Consider g: Lx × Ly → G to be a two-dimensional digital image, with horizontal and vertical picture elements (pixels) Lx = {1, ..., nx} and Ly = {1, ..., ny}, respectively, and grey levels G = {0, 1, ..., m - 1}. Let d be the distance that separates two pixels at coordinates (x1, y1) and (x2, y2) with grey levels i and j, respectively. Then the second-order joint conditional probability in a picture region can be defined as [9]

f(i, j) = (Number of pairs of pixels where g(x1, y1) = i and g(x2, y2) = j) / (Total number of such pairs of pixels in the region),   (1)

where i, j = 0, 1, ..., m - 1. Each f(i, j) forms an element of the two-dimensional m × m co-occurrence matrix and counts how often a pair of pixels, separated by a certain distance and lying along a certain direction, occurs in an image [6]. If d is small relative to the texture coarseness, the matrix elements cluster near the main diagonal, while for larger d the values are more spread out.

The co-occurrence matrices can be used to extract a number of useful textural features. Haralick et al. [8] have identified 14 such features, but usually 5 of them have a wider range of applications in image processing [2]. These are energy, correlation, entropy, local homogeneity and inertia.

2. The concept of temporal co-occurrence matrices

The definition of the co-occurrence matrix in (1) can be extended to the temporal domain if the pair of pixels is taken from two successive frames of an image sequence, at the same coordinates [17]. This can be written according to (1) as

f_t(i, j) = (Number of pairs of pixels where g(x, y, t1) = i and g(x, y, t2) = j) / (Total number of such pairs of pixels in the region),   (2)

where t1 and t2 represent the time moments corresponding to two successive frames of the image sequence.

The formation of the temporal co-occurrence matrix is a simple and fast operation. For every block or sub-image of Nx × Ny pixels from two successive frames, g1(x, y, t1) and g2(x, y, t2), the following operations are performed:

    for y = 0, 1, ..., Ny
        for x = 0, 1, ..., Nx
            increment the element f_t(g1(x, y, t1), g2(x, y, t2)) by one   (3)

This requires Nx × Ny operations, each operation being simply a comparison and an addition. The algorithm is then repeated for all the blocks inside the frames, and the accumulated values represent the elements of the co-occurrence matrix. Alternatively, the interframe co-occurrence matrix of the whole two frames can be derived if Nx and Ny are the horizontal and vertical picture dimensions, that is, Nx = nx and Ny = ny.
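The accumulation in (3) can be sketched in a few lines; a minimal Python illustration assuming each frame is a list of rows of grey levels already quantised to m levels (the function name and the final normalisation to the joint probabilities of (2) are our additions, not the paper's):

```python
def temporal_cooccurrence(frame1, frame2, m):
    """Build the temporal co-occurrence matrix f_t(i, j) of two co-sited,
    equally sized frames with grey levels 0..m-1, following (2) and (3):
    one increment per pair of pixels at the same (x, y) coordinates."""
    f = [[0] * m for _ in range(m)]
    pairs = 0
    for row1, row2 in zip(frame1, frame2):
        for i, j in zip(row1, row2):
            f[i][j] += 1          # one count per co-sited pixel pair
            pairs += 1
    # normalise the counts to the joint probabilities of (2)
    return [[count / pairs for count in row] for row in f]

# two 2x2 'frames' quantised to 3 grey levels; each of the
# four pixel pairs contributes a probability mass of 1/4
f_t = temporal_cooccurrence([[0, 1], [1, 2]], [[0, 1], [2, 2]], 3)
```

Because the loop touches every pixel exactly once, the cost is Nx × Ny increments per block, as stated above.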

Figure 1 shows two temporal co-occurrence matrices taken from two different pairs of frames from the luminance signal of the 'Miss America' sequence, quantized to 16 grey levels for convenience. The original sequence of 155 frames consists of 288 lines with 360 pixels for the luminance and 144 lines with 180 pixels for the chrominance components, all quantized to 256 grey levels. The first temporal co-occurrence matrix has been calculated from frames 20 and 21, containing low motion activity, whereas the second one has been extracted from frames 78 and 79, with much higher motion activity. In both cases the matrices were calculated over the whole picture.

Fig. 1. Temporal co-occurrence matrices of the 'Miss America' sequence (a) between frames 20 and 21, (b) between frames 78 and 79.

The definition in (2) implies that the diagonal elements of the temporal co-occurrence matrix represent the number of unchanged pixels, and the off-diagonal elements give the number of changed pixels from one frame to another. This property can be used to define a criterion for the amount of movement within an image, given by

Moving Criterion = (Sum of off-diagonal elements) / (Sum of all elements).   (4)

The moving criterion (MC) takes values between 0 and 1, where 0 corresponds to identical successive frames (no motion at all) and 1 corresponds to completely different frames (e.g. a scene change). The variation of the MC with respect to time is an indication of the smoothness of the motion. Large variations in MC indicate sharp changes in motion, which will result in high bit-rate variations in interframe coders [5]. Figure 2 shows the MC values calculated from the luminance signal of the 'Miss America' sequence. As can be seen, the motion variation, that is, the difference between the maximum and minimum values of MC, is less than 10 percent.

Statistical features similar to the ones defined for pattern analysis can also be extracted from the temporal co-occurrence matrices. For example, all the 5 features mentioned earlier for the spatial co-occurrence matrices [2] can also be applied to the

temporal domain, providing useful information to interframe coding techniques. The homogeneity criterion (HC), for instance, can be defined in the temporal domain by

Homogeneity Criterion = Σ_{i=0}^{m-1} Σ_{j=0}^{m-1} [1 / (1 + (i - j)²)] f_t(i, j),   (5)

which contains information about the coarseness of the moving objects within the frames. This is demonstrated in Fig. 3, where the homogeneity criterion is plotted against the frame number for two different regions (sub-images) of 64 × 64 pixels each. The first region contains the face area, which has a highly detailed pattern compared with the pattern of the second region, taken from a more 'uniform' area such as the clothes. The selected regions are shown in Fig. 4.

The computational simplicity of the temporal co-occurrence matrices is an interesting feature.
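Both criteria reduce to a few lines of arithmetic over the matrix; a minimal Python sketch of (4) and (5), assuming f is the temporal co-occurrence matrix given as a list of rows, in count or probability form (the function names are ours):

```python
def moving_criterion(f):
    """Moving criterion (4): mass of the off-diagonal (changed-pixel)
    elements over the mass of all elements; 0 for identical successive
    frames, 1 for completely different ones (e.g. a scene change)."""
    total = sum(sum(row) for row in f)
    diagonal = sum(f[k][k] for k in range(len(f)))
    return (total - diagonal) / total

def homogeneity_criterion(f):
    """Homogeneity criterion (5): each element is attenuated by
    1 / (1 + (i - j)^2), so mass far from the main diagonal (large
    intensity changes) contributes little to the sum."""
    m = len(f)
    return sum(f[i][j] / (1 + (i - j) ** 2)
               for i in range(m) for j in range(m))
```

For the count matrix [[2, 1], [0, 1]], for example, three of the four pixel pairs are unchanged, so the moving criterion is 0.25.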


Fig. 2. Moving criterion for the 'Miss America' sequence.

Fig. 3. Homogeneity criterion for two regions of the 'Miss America' sequence.

The number of operations required to process an image is directly proportional to the number of pixels N present in the image. All these operations are counts and sums and involve only integer arithmetic. Moreover, temporal co-occurrence matrices of sub-images can easily be implemented in real time by parallel processing techniques. These points indicate that the temporal co-occurrence matrices are fast and simple tools for interframe image processing.

Fig. 4. Selected regions in the calculation of the homogeneity criterion.

3. Applications of temporal co-occurrence matrices

3.1. Adaptive low bit-rate coding

For the first application a 10 Hz version of the 'Miss America' sequence is used. The frame difference pictures are subdivided into 8 × 8 pixel blocks and the luminance and chrominance signals of each block are encoded by a simple predictive/transform coder. The DCT coefficients of each block are first zig-zag scanned and then quantized with a uniform quantizer. Finally, a two-dimensional variable length code (VLC) is applied for further compression of the quantized coefficients. In order to keep the output bit-rate fixed at 64 Kbit/s, a 16 Kbit elastic buffer is employed. The fullness of the buffer is monitored every sixteen 8 × 8 pixel blocks and the step sizes of the uniform quantizer are changed accordingly. The buffer feedback loop is similar to that described in the CCITT Reference Model 8 video coder [1].

The homogeneity criterion is employed together with the buffer control loop to decide the quantization step size. Based on the homogeneity criterion, the coder decides whether a particular sub-image of 32 × 32 pixels (16 blocks of 8 × 8 pixels) contains highly detailed moving objects and, if so, it is coded with a fine quantizer. For a uniform area a coarser step size is chosen. The block diagram of the adaptive coder is shown in Fig. 5.

Fig. 5. Block diagram of the adaptive DPCM/DCT coder used in the first application.

Figure 6 shows the 70th frame of two coded image sequences. In the first picture the quantization step size is determined by the conventional method based on the buffer status control, while in the second the step size of the uniform quantizer is calculated by the adaptive technique described above. The subjective improvement in the coded picture of the adaptive coder is particularly noticeable in highly detailed areas such as the face. This correlates well with the psychovisual behaviour of human visual perception: when viewing a head-and-shoulders picture, people pay more attention to the highly textured areas, such as the eyes and the mouth [13]. The presented method can therefore be used in knowledge-based segmentation and model-based coding algorithms to design low bit-rate coders [14].

Fig. 6. Coded images at 64 Kbit/s (frame 70). (a) Non-adaptive coder; (b) adaptive coder using the homogeneity criterion.

3.2. Pixel classification

In many applications it is advantageous to divide the interframe difference information, in either the pixel or the transform domain, into two or more layers [4, 11, 12]. Usually the first, or 'base', layer coarsely codes the frame difference picture, while the other layers encode the residual distortions of the first layer. The temporal co-occurrence matrices can also classify the interframe difference picture into several layers. The method is based on the fact that the human eye is most sensitive to large intensity changes between successive frames [16]. This characteristic of the HVS, in connection with the temporal co-occurrence matrices, can be used to optimise the coder performance.

It is evident from (2) that the position of the elements within a temporal co-occurrence matrix contains information about the intensity changes. The further a matrix element lies from the main diagonal, the larger the pixel change it represents, and hence the more important it is for the HVS. Using this observation it is possible to separate the elements of the temporal co-occurrence matrix into two (or more) groups. In Fig. 7, for example, the elements closest to the main diagonal form the less important elements (No. 1 areas in Fig. 7) and the remaining elements are the important ones (No. 2 areas in Fig. 7). A quantitative measure for the pixel classification can be defined by the temporal classification ratio (TCR),

TCR = (Sum of the most important elements) / (Sum of the non-diagonal elements).   (6)

For example, TCR = 0.1 implies that only 10% of the changed pixels are classified as important and the rest as non-important. For bandwidth compression, only pixels belonging to the important elements of the temporal co-occurrence matrix are coded. Discarding the non-important pixels has little effect on the reconstructed picture quality, as they are mostly due to noise and background luminance changes.

Fig. 7. Element separation of the temporal co-occurrence matrix.

Fig. 8. Block diagram of the interframe coder using the pixel classification method.
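One way to realise the split of Fig. 7 for a target TCR is to rank the off-diagonal elements by their distance |i - j| from the main diagonal, since by (2) that distance is exactly the grey-level change of the pixel pair. The following sketch, with an inward sweep from the largest changes, is our illustration under that assumption; the paper does not prescribe this exact procedure:

```python
def important_change_threshold(f, tcr):
    """Return the smallest grey-level change |i - j| classified as
    'important' so that the important elements carry at least a fraction
    `tcr` of the changed-pixel mass, per the TCR definition in (6)."""
    m = len(f)
    # total mass of changed pixels: all off-diagonal elements
    changed = sum(f[i][j] for i in range(m) for j in range(m) if i != j)
    important = 0.0
    for d in range(m - 1, 0, -1):   # sweep from the largest changes inwards
        important += sum(f[i][j] for i in range(m) for j in range(m)
                         if abs(i - j) == d)
        if changed > 0 and important / changed >= tcr:
            return d
    return 1                         # TCR = 1: every changed pixel is kept
```

With TCR = 1 the threshold reaches the first off-diagonal band, i.e. all changed pixels are classified as important, matching the definition above.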


Figure 8 shows the block diagram of a coder based on the pixel classification method. The temporal co-occurrence matrices are first calculated from the current frame and its prediction. For a predefined value of the temporal classification ratio, the temporal co-occurrence matrix is divided into two regions, like the ones shown in Fig. 7. Then the frame difference picture is scanned and, depending on the value of each pixel, a decision is made as to whether it corresponds to an element of the temporal co-occurrence matrix within the area marked (1) in Fig. 7 or not. The pixels corresponding to important elements of the co-occurrence matrix form the important pixels, whereas the rest are the non-important ones. Based on this selection the frame difference image is divided into two parts, one containing the important pixels and the other the less important ones.

The picture corresponding to the important pixels is subdivided into 8 × 8 pixel blocks and the luminance and chrominance signals of each block are DCT coded. Since changed interframe pixels are usually clustered, these blocks are mainly composed of important elements, which can be efficiently coded by the DCT. The transform coefficients are zig-zag scanned, linearly quantized with a fixed step size, and two-dimensionally variable length coded for further compression. In our experiments the quantization step size was set to eight (qstep = 8), with the input coefficients in the dynamic range of -2048 to 2047. Areas corresponding to the less important pixels are totally discarded. For higher quality pictures these areas can also be coded by a second coder, providing the second-layer data for transmission over packet-switched networks [4]. Since only the important pixels are employed in the prediction loop, the prediction image contains many high-frequency imperfections, which can be eliminated with a low-pass filter applied at the end of the loop. The filter is the simple loop filter described in the CCITT Reference Model 8 [1].

Figure 9 shows the bit-rate profiles of the coded 'Miss America' sequence at a 10 Hz rate, for two values of the TCR, 0.1 and 1. The figure shows that using only a small portion of the changed pixels in the prediction loop reduces the total bit-rate by 10-25%. This is achieved without much noticeable degradation in the visual quality of the coded pictures. The signal-to-noise ratio (SNR) is kept almost constant at 38 dB for all the frames of the sequence with both methods. Figure 10 shows the quality of the coded images (frame 70), corresponding to the bit-rate profiles of Fig. 9. The quality of the coded image for TCR = 0.1 is close to that for TCR = 1, considering that the bit-rate is 25% less.

Fig. 9. Bit-rates for two values of the temporal classification ratio.

Fig. 10. Coded images with temporal classification ratios: (a) 1; (b) 0.1.
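Since |i - j| in the temporal co-occurrence matrix is the magnitude of the interframe intensity change, the two-part split of the frame difference picture described above can be approximated by thresholding the difference magnitudes directly. This is a simplification under that assumption (the paper classifies via the positions of the matrix elements, and the function name is ours):

```python
def split_frame_difference(diff, d_min):
    """Divide a frame-difference picture into the 'important' part (changes
    of magnitude >= d_min, kept for DCT coding) and the 'non-important'
    remainder (discarded, or passed to a second-layer coder)."""
    important = [[v if abs(v) >= d_min else 0 for v in row] for row in diff]
    remainder = [[v if abs(v) < d_min else 0 for v in row] for row in diff]
    return important, remainder

# changes of magnitude >= 3 are kept; the small ones are dropped
kept, dropped = split_frame_difference([[5, -1], [0, 3]], 3)
```

The threshold d_min here would come from the TCR analysis of the temporal co-occurrence matrix, as in the element separation of Fig. 7.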

4. Conclusions

The concept of temporal co-occurrence matrices has been defined and some of their principal features have been described. The simplicity and speed of their calculation were outlined and possibilities for parallel processing implementations were discussed. Two example applications of temporal co-occurrence matrices were demonstrated.

For the first application an adaptive predictive/transform coder at a rate of 64 Kbit/s was used. Based on the homogeneity criterion, the coder allocates more bits to the areas of the picture with highly detailed moving objects and fewer bits to the uniform areas. The effect of adaptation is particularly evident for head-and-shoulders images, in which the face, containing highly detailed features (e.g. eyes, mouth), is coded with a finer quantizer than the rest of the image.

For the second application a pixel classification technique was proposed. Using the temporal co-occurrence matrix, the coder separates the pixels with the highest intensity changes from the frame difference picture; these separated areas are then coded with a predictive/DCT coder. The experimental results show that it is possible to reduce the bit-rate by 25% without introducing noticeable degradation to the coded images.

References

[1] CCITT SGXV Working Party XV, "Description of Reference Model 8 (RM 8)", Specialists Group on Coding for Visual Telephony, Doc. No. 525, June 1989.
[2] R.W. Conners and C.A. Harlow, "A theoretical comparison of texture algorithms", IEEE Trans. Pattern Anal. Machine Intell., Vol. PAMI-2, No. 3, May 1980, pp. 204-222.
[3] S. Ericsson, "Fixed and adaptive predictors for hybrid predictive/transform coding", IEEE Trans. Comm., Vol. 33, No. 12, December 1985, pp. 1291-1302.
[4] M. Ghanbari, "Two-layer coding of video signals for VBR networks", IEEE J. Sel. Areas Comm., Vol. 7, No. 5, June 1989, pp. 771-781.
[5] M. Ghanbari and D.E. Pearson, "Components of bit-rate variation in videoconference signals", Electron. Lett., Vol. 25, No. 4, February 1989, pp. 285-286.
[6] C.C. Gotlieb and H.E. Kreyszig, "Texture descriptors based on co-occurrence matrices", Comput. Vision Graph. Image Process., Vol. 51, 1990, pp. 70-86.
[7] R.M. Haralick, "Statistical and structural approaches to texture", Proc. IEEE, Vol. 67, No. 5, May 1979, pp. 786-804.
[8] R.M. Haralick, K. Shanmugam and I. Dinstein, "Textural features for image classification", IEEE Trans. Systems Man Cybernet., Vol. SMC-3, No. 6, November 1973, pp. 610-621.
[9] A.K. Jain, Fundamentals of Digital Image Processing, Prentice Hall, Englewood Cliffs, NJ, 1989, Chapter 9, pp. 344-346.
[10] B. Julesz, "Experiments in the visual perception of texture", Sci. Amer., Vol. 232, No. 4, 1975, pp. 2-11.
[11] G. Karlsson and M. Vetterli, "Subband coding of video for packet networks", Optical Engrg., Vol. 27, No. 7, July 1988, pp. 574-586.
[12] F. Kishino, K. Manabe, Y. Hayashi and H. Yasuda, "Variable bit-rate coding of video signals for ATM networks", IEEE J. Sel. Areas Comm., Vol. 7, No. 5, June 1989, pp. 801-806.
[13] D. Lewis, The Secret Language of Success, Guild Publishing, London, 1989, Chapter 16, pp. 210-214.
[14] H.G. Musmann, M. Hötter and J. Ostermann, "Object-oriented analysis-synthesis coding of moving images", Signal Processing: Image Communication, Vol. 1, No. 2, October 1989, pp. 117-138.
[15] T. Pavlidis, Algorithms for Graphics and Image Processing, Computer Science Press, Rockville, MD, 1982, Chapter 6, pp. 113-116.
[16] D.E. Pearson, Transmission and Display of Pictorial Information, Pentech Press, London, 1975, Chapter 2, pp. 31-50.
[17] V. Seferidis and M. Ghanbari, "Use of co-occurrence matrices in the temporal domain", Electron. Lett., Vol. 26, No. 15, July 1990, pp. 1116-1118.
[18] S. Tanimoto and T. Pavlidis, "A hierarchical data structure for picture processing", Comput. Graph. Image Process., Vol. 4, 1975, pp. 104-119.
[19] L. Van Gool, P. Dewaele and A. Oosterlinck, "Texture analysis anno 1983", Comput. Vision Graph. Image Process., Vol. 29, 1985, pp. 336-357.
