Region-Based Coding of Color Images Using Karhunen–Loeve Transform


GRAPHICAL MODELS AND IMAGE PROCESSING

Vol. 59, No. 1, January, pp. 27–38, 1997 ARTICLE NO. IP960402

DRAGANA CAREVIC* AND TERRY CAELLI†
Department of Computer Science, Curtin University, GPO Box U 1987, Perth 6001, Australia
Received October 10, 1995; revised June 20, 1996; accepted July 10, 1996

* E-mail: [email protected].
† This project was funded by a grant from the Australian Research Council. Requests for reprints should be sent to Professor Terry Caelli, E-mail: [email protected].

1077-3169/97 $25.00. Copyright © 1997 by Academic Press. All rights of reproduction in any form reserved.

In this paper we consider a number of extensions to current adaptive compression methods for color images. First, we use clustering or segmentation procedures to determine self-similar image regions. Second, for each such region we use a Karhunen–Loeve compression method to model the important spatio-chromatic information. Third, we employ linear prediction to encode the resultant eigenimages. Finally, comparisons are made with current methods, and improvements are demonstrated, particularly for low bit-rate coding of color textured images such as those that occur in aerial photography. © 1997 Academic Press

1. INTRODUCTION

Most image compression techniques obtain low bit-rate representations of images by removing redundancy inherent in the image data. This redundancy is typically related to the correlation between image pixels, and in color images it has two sources: spatial and spectral correlation. Spatial (within-band) correlation exists among spatially adjacent pixels in the same color band, while spectral (between-band) correlation exists among pixels that have approximately the same spatial location but belong to adjacent spectral ranges. Standard color image compression techniques usually treat the spatial and spectral correlations separately. The source (original) tristimulus signals red R(x, y), green G(x, y), and blue B(x, y) at each pixel are first transformed to a new space through some linear or nonlinear invertible coordinate-conversion process (e.g., the Karhunen–Loeve transform [25, 30], the NTSC YIQ and YUV systems [25], or homomorphic models [8]). The objective is to produce three new, spectrally uncorrelated planes of data (e.g., color signals). Each color plane is then coded separately using some standard technique that exploits the spatial correlation between pixels, such as differential pulse code modulation (DPCM) [25], transform coding [25, 30], or vector quantization [12], all developed initially for encoding monochrome images. This approach is often restrictive, insofar as it fails to decorrelate the source tristimulus signals completely, and the redundancy retained between the color planes usually degrades the coding performance. Additionally, most standard algorithms are based entirely on information theory and are therefore not adapted to particular characteristics of images; i.e., they do not take into account higher levels of interpretation of the scene [16, 21].

The approach to color image compression described in this paper differs from the standard techniques in a number of ways. We model the image as composed of structural features such as regions and contours, and use a specifically designed segmentation process to extract them. It is envisaged that, by exploiting these image features, low bit rates (i.e., high compression ratios) can be achieved while good coding fidelity is retained [1]. The extracted image segments are expected to be statistically homogeneous with respect to color and texture, and the textural differences between neighboring segments are expected to be significant. Each segment is independently modeled as a multivariate (3-D) stationary Gaussian random field with respect to the color signals (vectors) R(x, y), G(x, y), and B(x, y) at each pixel (x, y), using the discrete Karhunen–Loeve transform. This model determines the types of “spatio-chromatic” correlations in each region and provides a set of orthogonal basis vectors that capture the major types of color gradients over each segment.

The discrete Karhunen–Loeve (KL) transform (also called the Principal Components transform) is a linear orthonormal transform that optimally decorrelates image data, provided that the underlying distribution of the data is a zero-mean stationary Gaussian random process [31]. If the second-order statistics of the source random process are known, eigenvectors can be determined and

used as the basis vectors for the KL expansion. As a result, a set of decorrelated coefficients is obtained, and the reduction in bit rate is achieved by transmitting only the lower-order coefficients, i.e., those corresponding to the larger eigenvalues. The optimality of the KL transform is reflected in the fact that, for a given total number of bits, the error incurred is minimized in the mean-square sense. This transform also has the highest energy compaction compared with deterministic transforms such as the discrete cosine transform (DCT), the Fourier transform, and the Hadamard transform. The KL transform has found numerous applications in encoding monochrome and color images [7, 30, 31, pp. 126–131, 33], as well as in the coding, enhancement, and restoration of multispectral imagery [15, 24, 34]. In many cases of processing color or, more generally, multichannel images, however, the within-band and between-band correlations were treated separately, and the KL transform was often computed under the assumption of image stationarity, i.e., without taking into account the fact that images are inherently nonstationary [14]. For these reasons the optimality of the KL transform was of limited practical value. We note that several techniques exist for the restoration of multichannel images that are based on methods other than the KL transform, utilize both between-band and within-band correlations, and process the image channels simultaneously as a single entity. Such techniques apply least-squares methods [9, 11], Kalman filtering [10], and processing of the multichannel images in the frequency domain [19]. Segmentation-based approaches to image compression provide a framework for dealing with the inherent nonstationarity of image data. These techniques also enable higher levels of scene interpretation to be incorporated directly into the compression process.
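To make the conventional, spectral-only use of the KL transform concrete (the coordinate conversion applied to (R, G, B) pixel vectors before each plane is coded separately), here is a minimal NumPy sketch. It is not the region-based, spatio-chromatic method of this paper; the function names `kl_color_planes` and `inverse_kl` are hypothetical, and the covariance is estimated from the whole image, i.e., under exactly the stationarity assumption criticized above.

```python
import numpy as np

def kl_color_planes(rgb):
    """Spectral-only KL transform of an H x W x 3 image.

    Returns (planes, A, mean, eigvals): the H x W x 3 transformed planes ordered
    by decreasing eigenvalue, the 3 x 3 matrix A whose rows are the eigenvectors,
    the per-channel means, and the eigenvalues (the variances of the planes).
    """
    h, w, _ = rgb.shape
    x = rgb.reshape(-1, 3).astype(float)        # one (R, G, B) vector per pixel
    mean = x.mean(axis=0)
    xc = x - mean                               # zero-mean data
    cov = xc.T @ xc / len(xc)                   # 3 x 3 between-band covariance
    lam, E = np.linalg.eigh(cov)                # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]               # reorder: largest eigenvalue first
    A = E[:, order].T                           # rows of A are the eigenvectors
    y = xc @ A.T                                # y = A (x - mean) for every pixel
    return y.reshape(h, w, 3), A, mean, lam[order]

def inverse_kl(planes, A, mean):
    """Invert the transform: x = A^T y + mean for every pixel."""
    h, w, _ = planes.shape
    return (planes.reshape(-1, 3) @ A + mean).reshape(h, w, 3)
```

The transformed planes are mutually uncorrelated and their variances equal the eigenvalues; zeroing the last plane(s) before calling `inverse_kl` gives the truncated, lower-order expansion described in the text.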
Several researchers have investigated segmentation-based approaches to the compression of both monochrome and color images [6, 20, 22, 23, 38], and very good results have been reported. Kwon and Chellappa [23] applied a merge-and-threshold method to decompose the monochrome image and differentiated the following image features: contours, uniform regions, and textured regions. The extracted textured regions were modeled as Gaussian Markov random fields (GMRFs) and were reconstructed using a texture synthesis technique based on this model. The parameters of the GMRFs were computed using least-squares estimation. The uniform image regions were encoded using nth-order polynomials, and an arithmetic coding-based technique was employed to encode the contours. GMRFs have been used to model the spatial statistical interdependencies between neighboring components of 2-D textures in many other applications [3, 4, 26, 27, 37]. This model can also be extended to three dimensions and used to represent 3-D textures, but the techniques for parameter estimation in that case would require additional constraints. For this reason, we applied a simpler approach to modeling the color textures in this paper, one that incorporates an implicit GMRF assumption. That is, under the assumed texture model, the color components within a given small 3-D neighborhood can be treated as jointly Gaussian distributed. Owing to the segmentation process, this distribution is approximately stationary within each image segment. Since the KL transform performs optimally under the stationarity condition, we then apply this transform to decorrelate the color image components R(x, y), G(x, y), and B(x, y) (spatio-chromatic information) within small 3-D image blocks, where a separate optimal KL transform is designed for each extracted segment.

The processing steps of the proposed scheme for region-based coding of color images are shown in the block diagram in Fig. 1. After segmentation and separate encoding of the image segments using the KL transform, the remaining spatial correlations between the transform coefficients are removed using linear predictive coding (LPC). The prediction error is then encoded as a stream of code words that take values from a four-letter alphabet. The code is designed on the joint probability of occurrence of similar quantization bin indices over several eigenimages at a particular spatial location. The resulting stream of code words is compressed using adaptive multilevel arithmetic coding. The coding algorithm also distinguishes between region interiors and borders. The border pixels of all regions in the image are extracted and grouped together to form a separate segment. The signal values of this “border” segment are encoded using the same procedure as for the region interiors, while the positions of the border pixels are coded separately (and are also used to store the position and shape of the image regions).

FIG. 1. Block diagram of the proposed region-based method for encoding color images.

In Section 2 we describe, in detail, the procedure for segmenting color images. Section 3 then presents the optimal KL transform encoding technique applied to each of the segments separately. The LPC scheme for encoding the transform coefficients is discussed in Section 4, and Section 5 describes the method used for encoding the prediction error. Finally, Section 6 presents the practical implementation of the proposed coding method, the results obtained from computer simulation, and comparisons with other methods.

2. SEGMENTATION OF COLOR IMAGES

To extract homogeneous regions in color images, a segmentation procedure based on the clustering of 8-component feature vectors $K_{xy}$ is applied, where

$$K_{xy} = \left[\, a I_{1n}(x,y),\ a I_{2n}(x,y),\ a I_{3n}(x,y),\ b\,\sigma_{Rn}(x,y),\ b\,\sigma_{Gn}(x,y),\ b\,\sigma_{Bn}(x,y),\ c\,x_n,\ c\,y_n \,\right]. \tag{1}$$

The first three components of the vector $K_{xy}$ are calculated as linear combinations of the original tristimulus color signals R(x, y), G(x, y), and B(x, y) at the pixel (x, y) [29]:

$$I_1(x,y) = \frac{R(x,y) + G(x,y) + B(x,y)}{3} \tag{2}$$

$$I_2(x,y) = \frac{R(x,y) - B(x,y)}{2} \tag{3}$$

$$I_3(x,y) = \frac{2G(x,y) - R(x,y) - B(x,y)}{4}. \tag{4}$$

The next three components, $\sigma_R(x,y)$, $\sigma_G(x,y)$, and $\sigma_B(x,y)$, are the variances of the tristimulus signals R(x, y), G(x, y), and B(x, y) calculated in a 3 × 3 window, while the last two components correspond to the actual image point coordinates x and y. The variances are included in the feature vector to provide the clustering algorithm with some elementary information about the image second-order statistics, and the coordinates provide information about the spatial vicinity of the feature vectors, forcing the clusters to be spatially contiguous. To define and control the scale of each feature vector component, the components are weighted by the scaling factors a, b, and c (> 0) in (1). By adjusting a, b, and c, the results of clustering can be much improved; at present this adjustment has been done empirically. The subscript n in (1) denotes the use of z-scores: each variable is normalized to zero mean and unit variance, as, for example,

$$I_{1n}(x,y) = \frac{I_1(x,y) - \bar{I}_1}{\sigma_{I_1}}.$$
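The construction of the feature vectors in Eqs. (1)–(4) can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: the 3 × 3 variance uses edge replication at the image border and the population variance, a constant feature is mapped to zeros instead of being divided by zero, and the helper names `local_variance`, `zscore`, and `feature_vectors` are hypothetical.

```python
import numpy as np

def local_variance(channel, win=3):
    """Variance over each pixel's win x win neighbourhood (edges replicated)."""
    pad = win // 2
    p = np.pad(channel, pad, mode="edge")
    h, w = channel.shape
    # Stack every shifted copy of the image: shape (win * win, h, w)
    stack = np.stack([p[dy:dy + h, dx:dx + w]
                      for dy in range(win) for dx in range(win)])
    return stack.var(axis=0)

def zscore(v):
    """Normalize to zero mean and unit variance (zeros if v is constant)."""
    s = v.std()
    return (v - v.mean()) / s if s > 0 else np.zeros_like(v)

def feature_vectors(rgb, a=1.0, b=1.0, c=1.0):
    """One 8-component feature vector per pixel, in the order of Eq. (1)."""
    R, G, B = (rgb[..., k].astype(float) for k in range(3))
    I1 = (R + G + B) / 3.0              # Eq. (2)
    I2 = (R - B) / 2.0                  # Eq. (3)
    I3 = (2.0 * G - R - B) / 4.0        # Eq. (4)
    h, w = R.shape
    ys, xs = np.mgrid[0:h, 0:w]         # row (y) and column (x) coordinates
    cols = ([a * zscore(I) for I in (I1, I2, I3)]
            + [b * zscore(local_variance(ch)) for ch in (R, G, B)]
            + [c * zscore(xs.astype(float)), c * zscore(ys.astype(float))])
    return np.stack([col.ravel() for col in cols], axis=1)   # shape (h * w, 8)
```

Because every column is z-scored before weighting, the standard deviation of each column equals its weight, which makes the effect of the empirically chosen a, b, and c easy to reason about.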

To cluster the feature vectors $K_{xy}$ we have used the CLUSTER algorithm presented in [18]. The algorithm iterates through two phases. The first phase is a K-means (minimax) pass, which creates a sequence of partitions containing 2, 3, ..., L clusters, where L is specified by the user. In the second phase another set of partitions is created by merging existing clusters two at a time to see whether a better clustering can be obtained. The iterations continue until a square-error criterion is minimized. The complexity of CLUSTER is O(pL), where p is the number of feature vectors to be clustered. To improve the execution time, the number of feature vectors can be reduced by sampling every rth row and cth column of the image [13]; the remaining pixels are then assigned to the cluster with the closest cluster center. The values of the parameters r and c depend on the image size, and we set r = c = 20 for an original image size of 512 × 512. We have used the method proposed by Hoffman and Jain [13] for finding the “best” clustering. For each cluster i, the method computes the average within-cluster interpoint distance D(i),

$$D(i) = \frac{1}{|G_i|} \sum_{x \in G_i} d(x, c(i)), \tag{5}$$

where d represents the Euclidean distance, c(i) is the centroid of cluster i, and $G_i$ is the set of points belonging to cluster i. Using (5), the statistic M(i), which indicates the isolation and compactness of cluster i, is defined as

$$M(i) = \frac{\min_{j:\, j \neq i} d^2(c(i), c(j))}{D(i)}. \tag{6}$$

An overall merit function $M_{ave}$ is then calculated as a weighted average of the M(i)'s, where each M(i) is weighted by the number of pixels in its cluster. Larger values of $M_{ave}$ indicate a more acceptable clustering, but the problem in practical applications is that $M_{ave}$ achieves its absolute maximum for a very small number of clusters, i.e., 2 or 3.
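Eqs. (5) and (6), the pixel-weighted average $M_{ave}$, and the selection rule for the number of clusters can be sketched in NumPy as below. The function names are hypothetical; `merit` assumes at least two clusters, and `pick_num_clusters` treats only interior points of the cluster range as candidate local maxima, falling back to the global maximum when there is none. That fallback is an assumption, not something stated in the text.

```python
import numpy as np

def merit(points, labels):
    """M_ave: pixel-weighted average of M(i) from Eqs. (5) and (6).

    Requires at least two distinct labels.
    """
    ids = np.unique(labels)
    cents = np.array([points[labels == i].mean(axis=0) for i in ids])
    sizes = np.array([np.count_nonzero(labels == i) for i in ids])
    M = []
    for k, i in enumerate(ids):
        members = points[labels == i]
        D = np.linalg.norm(members - cents[k], axis=1).mean()   # Eq. (5)
        d2 = ((cents - cents[k]) ** 2).sum(axis=1)             # squared centroid distances
        d2[k] = np.inf                                         # exclude j == i
        M.append(d2.min() / D if D > 0 else np.inf)            # Eq. (6)
    return float(np.average(M, weights=sizes))

def pick_num_clusters(scores):
    """Largest cluster count at which M_ave has an interior local maximum.

    scores maps a cluster count to M_ave of the corresponding partition.
    """
    ks = sorted(scores)
    peaks = [ks[t] for t in range(1, len(ks) - 1)
             if scores[ks[t]] >= scores[ks[t - 1]] and scores[ks[t]] >= scores[ks[t + 1]]]
    return max(peaks) if peaks else max(ks, key=scores.get)
```

For example, with scores {2: 5.0, 3: 4.0, 4: 4.5, 5: 3.0, 6: 3.5, 7: 3.2} the absolute maximum is at 2 clusters, but the rule returns 6, the largest count at which $M_{ave}$ peaks locally.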


Because $M_{ave}$ tends to peak at only two or three clusters, the best clustering is chosen as the one for which $M_{ave}$ achieves a local maximum for the largest number of clusters within the cluster range.

The image segments resulting from the above procedure can have any number of pixels. However, to reduce the computational complexity of our coding algorithm, we do not encode image segments that are too small separately. That is, image segments with less than a specified […] mean feature vectors, obtained as by-products of the above clustering process. The resulting segmented image is then divided into contiguous nonoverlapping n × n square blocks (n = 4 in our examples). All blocks composed of pixels belonging to two or more different segments (i.e., the border pixels) are called “border” blocks and are grouped together to form a separate border segment, as shown in Fig. 2b. All other segments in the image contain n × n blocks in which all pixels belong exclusively to that segment. In the rest of the paper we refer to such segments as “interior” segments or regions.

3. REGION-BASED KL TRANSFORM OF COLOR IMAGES

The aim of the segmentation process was to partition the original color image into a number of internally homogeneous regions so that techniques based on the assumption of signal stationarity could be used. In the next step, all segments are separately spatio-chromatically decorrelated using the optimal discrete KL transform. That is, the mean values of the tristimulus signals R(x, y), G(x, y), and B(x, y) within each segment are computed and subtracted from these signals at each segment coordinate (x, y). Each nonoverlapping n × n block in the ith (mean-subtracted) segment, i = 1, ..., N, is then mapped to a d-dimensional vector $X_{ij}$, where $d = 3n^2$, by sequentially scanning each location in the block and taking its three color components. Here N denotes the total number of segments to be encoded. Since the segments are implicitly modeled as separate multivariate Gaussian Markov random fields, the sets of vectors $X_{ij}$ created in this way are assumed to have zero-mean, jointly Gaussian distributions. The d × d covariance matrix $R_i$ of each such stationary zero-mean Gaussian source, i = 1, ..., N, is now estimated as

$$R_i = \frac{1}{M_i} \sum_{j=1}^{M_i} X_{ij} X_{ij}^T, \tag{7}$$

where $M_i$ is the number of blocks in the ith segment. The eigenvalues of the covariance matrix $R_i$ are calculated from

$$R_i e_{ik} = \lambda_{ik} e_{ik}, \tag{8}$$

where $e_{ik}$ is the kth eigenvector (k = 1, ..., d) of the ith segment and $\lambda_{ik}$ is its corresponding eigenvalue. The eigenvectors calculated from (8) are orthonormal, i.e.,

$$e_{ik}^T e_{il} = 0 \quad (k \neq l), \tag{9}$$

$$e_{ik}^T e_{ik} = 1. \tag{10}$$

Subsequently, the eigenvectors are arranged such that $\lambda_{i1} \geq \lambda_{i2} \geq \cdots \geq \lambda_{id}$, and the first $L_i$ eigenvectors, where $L_i \leq d$, are used to form the transform matrix $A_i = [e_{i1}\ e_{i2}\ \cdots\ e_{iL_i}]^T$. Each vector $X_{ij}$ from the ith image segment is then processed by the linear transform

$$Y_{ij} = A_i X_{ij}, \qquad j = 1, \ldots, M_i. \tag{11}$$

The components of the vectors $Y_{ij}$ are completely uncorrelated, i.e.,

$$E[Y_{ij} Y_{ij}^T] = \begin{bmatrix} \lambda_{i1} & 0 & \cdots & 0 \\ 0 & \lambda_{i2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_{iL_i} \end{bmatrix}. \tag{12}$$
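The block tiling with border-block detection, followed by Eqs. (7), (8), and (11) for one segment, can be sketched in NumPy as follows. This is a minimal sketch, assuming the image size is a multiple of n, that border blocks are given the hypothetical label −1 (so the border segment can be transformed with `seg=-1`), and that the handling of small segments and the merging of similar covariance matrices are omitted. `block_labels` and `segment_kl` are hypothetical names.

```python
import numpy as np

def block_labels(label_map, n=4):
    """Segment label of every n x n block, or -1 for a border block."""
    h, w = label_map.shape
    blocks = label_map.reshape(h // n, n, w // n, n).transpose(0, 2, 1, 3)
    first = blocks[..., 0, 0]
    uniform = (blocks == first[..., None, None]).all(axis=(2, 3))
    return np.where(uniform, first, -1)        # -1 marks the border segment

def segment_kl(rgb, blabels, seg, L, n=4):
    """KL transform of the blocks of one segment, keeping L components.

    Returns (Y, A, mean_rgb, eigvals): Y is (M_i, L), A is (L, d) with
    d = 3 n^2, and eigvals are all d eigenvalues in decreasing order.
    """
    h, w, _ = rgb.shape
    tiles = rgb.astype(float).reshape(h // n, n, w // n, n, 3).transpose(0, 2, 1, 3, 4)
    blocks = tiles[blabels == seg]                     # (M_i, n, n, 3)
    mean_rgb = blocks.reshape(-1, 3).mean(axis=0)      # segment means of R, G, B
    X = (blocks - mean_rgb).reshape(len(blocks), -1)   # X_ij: row-major scan, (R, G, B) per pixel
    R = X.T @ X / len(X)                               # Eq. (7)
    lam, E = np.linalg.eigh(R)                         # Eq. (8), ascending eigenvalues
    order = np.argsort(lam)[::-1]                      # lambda_i1 >= lambda_i2 >= ...
    A = E[:, order[:L]].T                              # A_i: rows e_i1 ... e_iL
    Y = X @ A.T                                        # Eq. (11): Y_ij = A_i X_ij
    return Y, A, mean_rgb, lam[order]
```

Since $R_i$ in (7) is the average of $X_{ij}X_{ij}^T$, the empirical second-moment matrix of the returned coefficients, `Y.T @ Y / len(Y)`, is exactly the diagonal matrix of the retained eigenvalues, which is the sample counterpart of (12).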

The number $L_i$ of components of the vectors $Y_{ij}$ is chosen such that

$$L_i = \max\{\, k : \lambda_{ik} \geq D_i \,\}. \tag{13}$$

The parameter $D_i$ in (13) represents the allowed distortion within a segment, and we define its value for each image segment separately. To set the desired level of “detail” in the encoding procedure we have used the following model. By analogy with the Fourier power spectrum properties of images, we can characterize the image detail for color and spatial resolution (local contrast variations) in a given region as inversely proportional to the exponential rate of decay of the region-specific eigenvalues. We therefore fit an exponential function $K_i \exp(-\tau_i x)$ to the eigenvalues of each individual segment (with all eigenvalues less than $0.05\,\lambda_{i1}$ approximated by zero) and use the inverse of the estimated exponential constant, $1/\tau_i$, as the measure of the spatio-chromatic detail of the segment. The parameters $1/\tau_i$ typically take values between 0.3 and 3.0 and are used to define the allowed distortion of each segment as

$$D_i = c_{i3} \exp\!\left(\frac{c_1}{\tau_i} - c_2\right) D. \tag{14}$$
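The detail measure $1/\tau_i$, Eq. (14), and the choice of $L_i$ can be sketched as below. Assumptions: the exponential is fitted by least squares on $\log\lambda$ over the eigenvalues at or above $0.05\,\lambda_{i1}$ (the fitting method is not specified in the text); (13) is read as keeping the eigenvalues that are at least $D_i$; the defaults $c_1 = 0.142$ and $c_2 = 0.203$ are the values stated in the text; the function names are hypothetical.

```python
import numpy as np

def detail_parameter(eigvals, floor=0.05):
    """Estimate 1/tau_i by fitting K * exp(-tau * x) to sorted eigenvalues.

    Only eigenvalues >= floor * lambda_1 are fitted; the rest are treated
    as zero and skipped. The fit is linear least squares on log(lambda).
    """
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    if lam[0] <= 0:
        raise ValueError("largest eigenvalue must be positive")
    keep = lam >= floor * lam[0]
    x = np.arange(lam.size)[keep]          # index offset does not affect tau
    if x.size < 2:
        raise ValueError("need at least two significant eigenvalues")
    slope, _ = np.polyfit(x, np.log(lam[keep]), 1)   # log(lambda) ~ log(K) - tau * x
    return np.inf if slope >= 0 else -1.0 / slope    # flat spectrum -> unbounded detail

def allowed_distortion(inv_tau, D, c3=1.0, c1=0.142, c2=0.203):
    """Eq. (14): D_i = c_i3 * exp(c1 / tau_i - c2) * D."""
    return c3 * np.exp(c1 * inv_tau - c2) * D

def num_components(eigvals, D_i):
    """Eq. (13), read as: the number of eigenvalues that are at least D_i."""
    return int(np.count_nonzero(np.asarray(eigvals, dtype=float) >= D_i))
```

Typical use for one segment: `inv_tau = detail_parameter(lam)`, then `D_i = allowed_distortion(inv_tau, D, c3=0.75)` for the border segment (or `c3=1.0` for a region), and finally `L_i = num_components(lam, D_i)`.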

In (14), $c_1$ and $c_2$ are positive constants, which we set to $c_1 = 0.142$ and $c_2 = 0.203$. The detail parameter D is defined by the user, while the constant $c_{i3}$ is chosen according to the individual characteristics of each segment; we set its value to 1.0 for the “region” segments and to 0.75 for the “border” segment.

Since color images may often contain two or more spatially distinct regions with the same texture, it is desirable to process such segments using the same set of basis vectors. For this reason, prior to computing the region-specific eigenvectors, the covariance matrices of the separate image segments are compared with one another (using Euclidean distances) and merged if similar. In this way, one covariance matrix common to each group of similarly textured segments is obtained and used to compute the basis vectors for the KL expansion as explained above.

Having encoded all image segments, the resulting KL transform coefficients (i.e., the components of the transform vectors $Y_{ij}$) are arranged in “eigenimages” in such a way that the region-based layout of the original image is retained. These eigenimages, denoted by $V_k$, form a three-dimensional structure, with all coefficients computed from n × n block samples within a region ordered according to the decreasing values of their respective region-specific eigenvalues. In this structure the number of relevant eigenimages corresponding to the ith image segment, i = 1, ..., N, is $L_i$. The encoding process now reduces to encoding this region-based hierarchy of eigenimages. In addition, region locations and descriptions are also encoded, using the quadtree method [5].

FIG. 2. (a) Aerial Photography (spatial resolution 700 × 700, 24 bits/pixel). (b) The resulting segmentation of this image with block size n = 4.

4. LINEAR PREDICTIVE CODING OF EIGENIMAGES

The KL transform decorrelates the components within each nonoverlapping n × n block of the different image segments. However, neighboring n × n blocks of the original color images will typically be very similar and, consequently, some correlation will also exist between spatially adjacent pixels in the eigenimages. To remove such correlations, a simple first-order linear predictive model is applied, in which the current pixel is predicted only from the previously encoded neighboring pixel in the same row. A separate predictor is designed for each eigenimage within each of the image segments. For the kth eigenimage $V_k$ and the ith segment, the prediction of the current pixel at coordinate (p, q) is defined by

$$\varepsilon(k,p,q) = y(k,p,q) - a_{ik}\,\tilde{y}(k,p-1,q) \tag{15}$$

$$\tilde{y}(k,p,q) = a_{ik}\,\tilde{y}(k,p-1,q) + Q_i[\varepsilon(k,p,q)], \tag{16}$$

where $\varepsilon(k,p,q)$ is the corresponding prediction error or difference signal, $Q_i[\cdot]$ is the quantization function, and $a_{ik}$ is the prediction coefficient, which is calculated according to the method presented in [28]. In order to encode pixels from the border segment, they are first conveniently represented as one-dimensional signals.

To quantize the difference signal $\varepsilon(k,p,q)$, a separate optimal uniform quantizer is assigned to each segment. The quantization step $\delta_i$ for the ith segment is computed from the allowed distortion $D_i$ (defined by (14)) as $\delta_i = \sqrt{12 D_i}$ (for more details on the design of the optimal uniform quantizer see [17, p. 103]). The quantization function applied to the ith segment is then simply defined by

$$Q_i[\varepsilon(k,p,q)] = \kappa(k,p,q)\,\delta_i, \tag{17}$$

where $\kappa(k,p,q)$ is the quantization bin index (QBI) assigned to the difference signal $\varepsilon(k,p,q)$ as

$$\kappa(k,p,q) = \left\lfloor \frac{\varepsilon(k,p,q)}{\delta_i} + \frac{1}{2} \right\rfloor, \tag{18}$$

with $\lfloor x \rfloor$ denoting the largest integer ≤ x.

5. ENCODING THE QUANTIZATION BIN INDICES (QBIs)
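The QBIs encoded in this section are produced by the prediction-and-quantization loop of Eqs. (15)–(18), which can be sketched for one row of one eigenimage as below. Assumptions: the predictor starts from a zero reconstruction at the beginning of each row; (18) is read as rounding to the nearest integer; and, because the method of [28] for computing $a_{ik}$ is not described here, `ls_coefficient` substitutes a plain least-squares estimate. All function names are hypothetical.

```python
import math

def lpc_encode_row(row, a, delta):
    """First-order DPCM of one row: returns (QBIs, reconstructed values)."""
    qbis, recon = [], []
    prev = 0.0                                 # assumed initial reconstruction
    for y in row:
        eps = y - a * prev                     # Eq. (15): prediction error
        kappa = math.floor(eps / delta + 0.5)  # Eq. (18): quantization bin index
        prev = a * prev + kappa * delta        # Eqs. (16)-(17): decoder-side value
        qbis.append(kappa)
        recon.append(prev)
    return qbis, recon

def lpc_decode_row(qbis, a, delta):
    """Rebuild the row from its QBIs, mirroring Eq. (16)."""
    prev, out = 0.0, []
    for kappa in qbis:
        prev = a * prev + kappa * delta
        out.append(prev)
    return out

def step_size(D_i):
    """delta_i = sqrt(12 D_i): a uniform quantizer with MSE delta^2 / 12 = D_i."""
    return math.sqrt(12.0 * D_i)

def ls_coefficient(row):
    """Stand-in least-squares estimate of a_ik (not the method of [28])."""
    num = sum(x1 * x0 for x0, x1 in zip(row, row[1:]))
    den = sum(x0 * x0 for x0 in row[:-1])
    return num / den if den else 0.0
```

Because the encoder predicts from the reconstructed value $\tilde{y}$ rather than from the original, the decoder reproduces the encoder's output exactly, and each reconstructed coefficient lies within $\delta_i/2$ of the original.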

The QBIs κ(k, p, q) obtained by quantizing the prediction error ε(k, p, q) in all eigenimages of the hierarchy are first encoded as one stream of code words that take values from a four-letter alphabet. In this coding procedure, the magnitudes of the QBIs κ(k, p, q) are compared with a set of thresholds arranged in decreasing order. The greatest threshold $T_0$ is chosen to be equal to the maximum absolute value of the QBIs in the hierarchy, and the remaining thresholds are computed from $T_0$ according to $T_m = T_{m-1} - 1$ for $T_m \geq 1$. A QBI κ(k, p, q) is “active” with respect to a threshold $T_m$ if its magnitude |κ(k, p, q)| is equal to or greater than the threshold; if the magnitude is less than the threshold, the index is termed “inactive.” The code exploits the observation that, conditioned upon the occurrence of an inactive QBI κ(k, p, q) in eigenimage $V_k$ with respect to threshold $T_m$, it is also fairly probable that the QBIs at the same spatial location (p, q) in all eigenimages $V_l$ with $k < l < L_i$ will be jointly inactive. This assumption is based on the eigenimages being ordered according to decreasing eigenvalues (which represent the variances of the KL transform coefficients): larger coefficient values are expected in the topmost eigenimages, with magnitudes diminishing toward the bottom of the hierarchy. We remind the reader that $L_i$ represents the number of relevant eigenimages at the spatial location (p, q), where i indexes the image segment used to generate the transform coefficients at this location.

The coding procedure consists of M passes, where M is the number of thresholds in this set. In each coding pass, the QBIs in all eigenimages, starting from the topmost eigenimage $V_0$, are compared (in raster-scan order) with one particular threshold from the set. The thresholds are ordered such that the one used in the previous pass is always greater than the threshold in the current pass. When, in the mth coding pass, an inactive index κ(k, p, q) is detected in eigenimage $V_k$, the activity of all quantization bin indices κ(l, p, q) in eigenimages $V_l$ at the spatial location (p, q) with $k < l < L_i$ is checked. Whenever all such descendants of an inactive root index κ(k, p, q) are also jointly inactive with respect to the threshold $T_m$, the whole vertical string of indices κ(l, p, q), $k \leq l < L_i$, at location (p, q) is encoded using the code word “inactive index string” (“IIS”). In the current pass all such indices are considered encoded and are not compared with the same threshold again; to this end, a list of the coordinates of all indices encoded by “IIS” is kept throughout each coding pass. If, however, an inactive index κ(k, p, q) has at least one active descendant κ(l, p, q) in the lth eigenimage, $k < l < L_i$, only the current index is encoded, by the code word “inactive index” (“II”). When the current QBI is found to be active, i.e., when its magnitude is equal to the threshold value in the current coding pass, its sign is checked and, depending on it, the index is encoded by either the code word “active positive” (“AP”) or “active negative” (“AN”). The coordinates of the index are appended to a second list, which is maintained and updated throughout the whole encoding process. At the same time, the current QBI is set to zero, and this value is kept in the subsequent coding passes.
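The threshold passes and the four code words can be sketched in plain Python as follows, together with the ideal code length of each pass under the adaptive frequency model described in the next paragraph (all counts start at one and the model is reset for every pass). This is a sketch, not the authors' coder: the nested-list layout of the QBIs, the function names, and the use of ideal code length in place of an actual arithmetic coder are assumptions, and indices that became active in an earlier pass are simply coded again as zeros, as the text describes.

```python
import math

def qbi_code(kappa, L):
    """Four-letter code for a hierarchy of QBIs.

    kappa[k][p][q] : QBI of eigenimage V_k at (p, q); zeroed in place once coded active
    L[p][q]        : number of relevant eigenimages L_i at (p, q)
    Returns (passes, active): one symbol list per threshold, and the
    coordinates of the indices coded as "AP"/"AN" (the second list in the text).
    """
    K, P, Q = len(kappa), len(L), len(L[0])
    T0 = max((abs(kappa[k][p][q]) for k in range(K) for p in range(P)
              for q in range(Q) if k < L[p][q]), default=0)
    passes, active = [], []
    for T in range(T0, 0, -1):                  # T_m = T_{m-1} - 1 while T_m >= 1
        symbols, covered = [], set()            # covered: indices inside an "IIS"
        for k in range(K):                      # topmost eigenimage V_0 first
            for p in range(P):                  # raster scan within V_k
                for q in range(Q):
                    if k >= L[p][q] or (k, p, q) in covered:
                        continue
                    v = kappa[k][p][q]
                    if abs(v) >= T:             # active index
                        symbols.append("AP" if v > 0 else "AN")
                        active.append((k, p, q))
                        kappa[k][p][q] = 0      # kept at zero in later passes
                    elif all(abs(kappa[l][p][q]) < T for l in range(k + 1, L[p][q])):
                        symbols.append("IIS")   # whole vertical string is inactive
                        covered.update((l, p, q) for l in range(k, L[p][q]))
                    else:
                        symbols.append("II")    # inactive, with an active descendant
        passes.append(symbols)
    return passes, active

def adaptive_cost_bits(symbols, alphabet=("IIS", "II", "AP", "AN")):
    """Ideal code length (bits) of one pass under an adaptive frequency model."""
    counts = dict.fromkeys(alphabet, 1)         # no prior knowledge: all counts one
    total, bits = len(alphabet), 0.0
    for s in symbols:
        bits -= math.log2(counts[s] / total)    # cost under the current histogram
        counts[s] += 1                          # update after coding the symbol
        total += 1
    return bits
```

For a single location with three eigenimages holding QBIs 2, −1, and 0, the two passes emit `["AP", "IIS"]` and then `["II", "AN", "IIS"]`; summing `adaptive_cost_bits` over the passes approximates what an adaptive arithmetic coder reset at every pass would spend.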
The resulting stream of code words (“IIS,” “II,” “AP,” and “AN”) is entropy coded using adaptive arithmetic coding, which provides an effective mechanism for removing the redundancy of the code words and uses the standard arithmetic coder [36]. This coder is based on an adaptive model in which the symbol frequencies are updated each time a new symbol is presented. Initially, all frequencies are set to the same value (e.g., one), reflecting the fact that the coder has no prior knowledge. As coding proceeds, an increasingly accurate estimate of the data distribution, in the form of a histogram, is obtained at each coding step. In our application the data model is reinitialized at the beginning of each coding pass, i.e., whenever a new threshold value is applied. The advantages of using a small encoder alphabet have been discussed in [32]. This technique for coding the QBIs of the prediction error within the hierarchy of eigenimages bears significant similarity to the embedded coder of image wavelet transform coefficients described by Shapiro [32]. However, with the coding method described in [32], the transmission of the coefficients and the corresponding decoding process can be stopped whenever a predefined bit rate or the required image quality is achieved. This flexibility is not possible in our coding method because LPC is applied: to reconstruct the LPC-coded signal, the decoder requires all of the LPC error values. Nevertheless, this method enables low bit-rate encoding of the QBIs and is easy to implement.

6. EXPERIMENTAL RESULTS

The coding algorithm was implemented in the C programming language on a Silicon Graphics Challenge M with four R4400SC CPUs. In our experiments we used several RGB images with 24-bit color resolution. The overall bit rate for encoding a color image by the proposed method was calculated as the sum of the bit rate for encoding the quantized prediction error and the bit rates for encoding the side information. The side information contains the mean values of the tristimulus signals R(x, y), G(x, y), and B(x, y) within segments, the segment KL transform basis vectors (i.e., the eigenvectors), and the parameters of the LPC for the hierarchy of eigenimages within each separate segment. The shape and position of the segments, represented by the positions of the n × n blocks in the border segment, are encoded using the quadtree code. Additionally, the quantization step $\delta_i$, the number of eigenimages within each segment, and the initial (maximum) threshold value used for encoding the QBIs must also be available to the decoder. The LPC parameters are quantized to 8-bit precision, as are the mean values of the tristimulus signals. The eigenvectors, however, can be quantized more crudely without seriously affecting the image quality, and we quantize them to 5 bits. The quantization of the basis vectors takes place before the KL transform of the image segments, and, to exploit the redundancy of this set of vectors, they are encoded using the standard arithmetic coder [36]. To illustrate the performance of the proposed region-based image coder we used two test images with 8-bit resolution per color component (24 bits/pixel in total): the 700 × 700 Aerial Photography image shown in Fig. 2a and the 512 × 512 Mandrill image shown in Fig. 4a. Examples of the coded versions of these test images produced by the proposed region-based coder are shown in Figs. 3a and 3b and Figs. 4b and 4c, respectively.
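The rate accounting just described can be made concrete with a small sketch. Assumptions: eigenvector components (which lie in [−1, 1] for unit-norm vectors) are quantized uniformly to $2^5$ levels spanning that interval; one 8-bit LPC coefficient is counted per eigenimage per segment; side information is counted before arithmetic coding, and the quadtree, step-size, eigenimage-count, and threshold fields are left out, so the result is only an upper-bound-style illustration. Both function names are hypothetical.

```python
def quantize_unit(values, bits=5):
    """Uniformly quantize values in [-1, 1] to 2**bits levels; returns (indices, values)."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)                      # levels span -1 ... +1 inclusive
    idx = [min(max(round((v + 1.0) / step), 0), levels - 1) for v in values]
    return idx, [i * step - 1.0 for i in idx]

def bits_per_pixel(error_bits, n_segments, eig_counts, d, n_pixels,
                   mean_bits=8, lpc_bits=8, eig_bits=5):
    """Overall rate = (prediction-error bits + raw side-information bits) / pixels.

    eig_counts[i] is L_i, the number of eigenvectors kept for segment i.
    """
    side = 3 * mean_bits * n_segments              # segment means of R, G, B
    side += lpc_bits * sum(eig_counts)             # one LPC coefficient per eigenimage
    side += eig_bits * d * sum(eig_counts)         # L_i eigenvectors of length d
    return (error_bits + side) / n_pixels
```

For instance, two segments keeping 3 and 2 eigenvectors of length d = 48 add 48 + 40 + 1200 = 1288 raw side-information bits, which is negligible next to the prediction-error stream for a 512 × 512 image.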
In coding the test images, the size of the nonoverlapping image blocks was set to 4 × 4. To evaluate the coding performance of the proposed region-based compression scheme and to compare it with other coding procedures, we used two different coding fidelity measures. The first was the commonly used mean-square error (MSE), which is calculated separately for each of the tristimulus color components R(x, y), G(x, y), and B(x, y) as

$$\mathrm{MSE}_i = \frac{1}{N^2} \sum_{x,y} \left( f_i(x,y) - \hat{f}_i(x,y) \right)^2, \tag{19}$$

where $f_i(x,y)$ represents the original color component image, $\hat{f}_i(x,y)$ is its reconstruction, and $N^2$ is the number of pixels in the image. The total mean-square error, MSE, is computed as the sum of the mean-square errors of the component images. Although MSE is an objective measure that enables the comparison of coding fidelity among different schemes, it does not indicate how well perceptually important structures in the image, such as edges, lines, and corners, are encoded. For this reason, in addition to MSE we used a measure of the “spatial stability” of edges in the image. This measure of coding fidelity involves computing the MSE between the “edge stability” maps of the original and the reconstructed color component images, which we denote as $Q_i(x,y)$ and $\hat{Q}_i(x,y)$, respectively. Edge stability maps are calculated by combining the zero-crossings of a number of different band-pass images (resulting from filtering the image with different band-pass isotropic $\nabla^2 G$ filters) into a single gray value corresponding to the evidence for “edges” over all scales. Accordingly, the fidelity measure for each color component image based on the edge stability maps is calculated as

$$\mathrm{EMSE}_i = \frac{1}{N^2} \sum_{x,y} \left( Q_i(x,y) - \hat{Q}_i(x,y) \right)^2, \tag{20}$$

and the total measure, EMSE, is obtained by adding together the measures computed for each tristimulus color component. An outline of the method for computing the image edge stability map is presented in the Appendix (see Bischof and Caelli [2] for more details).

We have compared the proposed method with the JPEG image compression standard [35] with respect to both of the above coding fidelity measures. The resulting MSE and EMSE values of the test images for the proposed region-based coder (RBC) and for the JPEG compression standard, as functions of the bit rate, are shown in Table 1. From these results it can be concluded that, for the test images, the proposed method attains smaller values of MSE than JPEG while retaining comparable perceived quality of the coded images, as measured by the EMSE, with the RBC performing distinctly better for low bit-rate encoding. Examples of the edge stability maps generated for the original test images and for their compressed versions in order to compute the coding fidelity measure EMSE are shown in Fig. 5. Here, Fig. 5a shows the edge stability map of the color component R(x, y) of the original image Mandrill, and Fig. 5b is the edge stability map computed for the component R(x, y) of the encoded image Mandrill at 0.54 bits/pixel (encoded image shown in Fig. 4c). For comparison, Fig. 5c shows the edge stability map computed from the R(x, y) component image of the image Mandrill coded by JPEG at 0.54 bits/pixel.

FIG. 3. (a) Encoded version of the Aerial Photography image at 0.91 bits/pixel (original shown in Fig. 2a). (b) Encoded version of this image at 0.61 bits/pixel.

FIG. 4. (a) Original color image Mandrill, spatial resolution 512 × 512, 24 bits/pixel. (b) Encoded version of this image at 0.83 bits/pixel. (c) Encoded version of this image at 0.54 bits/pixel.

TABLE 1
Coding Performance of the Region-Based Image Coding Algorithm (RBC) Compared with JPEG

Test image           Bit rate (bits/pixel)   MSE (RBC)   MSE (JPEG)   EMSE (RBC)   EMSE (JPEG)
Aerial Photography   0.91                    349.75      432.60       10.47        11.67
Aerial Photography   0.61                    508.04      581.84       13.33        14.62
Mandrill             0.83                    789.52      919.72       14.22        15.09
Mandrill             0.54                    1008.54     1170.15      15.81        16.67

7. CONCLUSION

In this paper we have presented a new region-based method for encoding color images. The novelty of the method lies in modeling images as composed of the structural features of regions and borders. The proposed compression algorithm extracts these features with a specifically designed segmentation technique and encodes them using the optimal discrete Karhunen–Loeve transform. An important, though often overlooked, property of color images is that their color components are correlated both spatially and spectrally. In contrast to the standard waveform coding techniques, which treat the spectral and the spatial correlations separately and therefore fail to completely decorrelate the tristimulus color signals, the discrete Karhunen–Loeve transform is applied here to decorrelate the original color signals spatio-chromatically over nonoverlapping n × n image blocks. The spatial correlation that remains between the transform coefficients is further exploited by an adaptive linear predictive coding (LPC) scheme, and the LPC prediction error is progressively coded as a stream of code words drawn from a four-letter alphabet. This code is based on the joint probabilities of occurrence of particular index values over several consecutive eigenimages, as they are compared to a set of thresholds. It provides substantial coding gains over the standard techniques for encoding transform coefficients and is easily and efficiently compressed with an adaptive arithmetic coder. We used two measures of coding fidelity for comparison with the JPEG image coding standard. The first is the mean-square difference between the original and compressed images. The second compares the edge stability maps of the original and compressed images using a direct least-squares error measure.
Using these two measures of performance as a function of the compression rate, we have shown that the proposed region-based method offers improved performance compared to the JPEG compression standard on the test images. In addition, the method provides a representation of images in terms of encoded region and boundary features and, as such, can be used for other purposes such as recognition, detection, and region-specific analysis.
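The spatio-chromatic decorrelation step summarized above can be illustrated with a minimal Python sketch, under stated assumptions: each nonoverlapping n × n block of an RGB image is flattened (pixels in raster order, R, G, B per pixel) into a 3n²-dimensional vector, and the KLT basis is taken as the eigenvectors of the sample covariance of those vectors, sorted by decreasing eigenvalue. The eigen-solver is a cyclic Jacobi iteration, swapped in only to keep the example dependency-free. All function names are hypothetical, and the segmentation into regions, quantization, the LPC stage, the four-letter code, and arithmetic coding are not reproduced.

```python
import math


def jacobi_eigh(matrix, max_sweeps=100, tol=1e-24):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, basis) sorted by decreasing eigenvalue, where
    basis[k] is the unit-norm eigenvector belonging to eigenvalues[k].
    """
    n = len(matrix)
    a = [list(row) for row in matrix]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        # Stop once the off-diagonal energy is a negligible fraction of the total.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        norm2 = sum(a[i][j] ** 2 for i in range(n) for j in range(n))
        if off <= tol * norm2:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes a[p][q].
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V P (eigenvectors accumulate as columns)
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda k: a[k][k], reverse=True)
    return [a[k][k] for k in order], [[v[i][k] for i in range(n)] for k in order]


def extract_blocks(image, n):
    """Flatten each nonoverlapping n x n block of image[y][x] = (R, G, B)
    into a 3*n*n vector (pixels in raster order, R, G, B per pixel)."""
    h, w = len(image), len(image[0])
    blocks = []
    for by in range(0, h - n + 1, n):
        for bx in range(0, w - n + 1, n):
            vec = []
            for y in range(by, by + n):
                for x in range(bx, bx + n):
                    vec.extend(image[y][x])
            blocks.append(vec)
    return blocks


def klt_basis(vectors):
    """Sample mean, eigenvalues and KLT basis vectors of a set of block vectors."""
    m, d = len(vectors), len(vectors[0])
    mean = [sum(vec[k] for vec in vectors) / m for k in range(d)]
    centered = [[vec[k] - mean[k] for k in range(d)] for vec in vectors]
    cov = [[sum(c[i] * c[j] for c in centered) / m for j in range(d)] for i in range(d)]
    eigenvalues, basis = jacobi_eigh(cov)
    return mean, eigenvalues, basis


def klt_forward(vec, mean, basis):
    """Project one block vector onto the KLT basis (transform coefficients)."""
    centered = [x - mu for x, mu in zip(vec, mean)]
    return [sum(b * c for b, c in zip(bv, centered)) for bv in basis]
```

Because the basis diagonalizes the sample covariance, coefficients belonging to different basis vectors are uncorrelated over the blocks and their variances equal the eigenvalues, so sorting by decreasing eigenvalue orders the coefficients by energy.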


FIG. 5. (a) Edge stability map computed for the color component image R(x, y) of the original color image Mandrill. (b) Edge stability map computed for the color component image R(x, y) of the coded image Mandrill using the proposed region-based coder at 0.54 bits/pixel (this coded image is shown in Fig. 4c). (c) Edge stability map computed for the color component image R(x, y) of the image Mandrill coded by JPEG at 0.54 bits/pixel.

APPENDIX

Calculating Edge Stability Maps

As described in [2], the image f(x, y) is first filtered by a digital approximation of the Laplacian-of-Gaussian filter $\nabla^2 G(x, y, \sigma)$,

$$\nabla^2 G(x, y, \sigma) = \left(\frac{r^2 - 2\sigma^2}{2\pi\sigma^6}\right)\exp\left(-\frac{r^2}{2\sigma^2}\right), \qquad r^2 = x^2 + y^2. \qquad (21)$$
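A digital approximation of Eq. (21) can be obtained by sampling the continuous filter on an integer grid and correlating it with a color component image. The Python sketch below assumes details the text does not give: truncation of the kernel at ±4σ, subtraction of the kernel mean so that uniform regions give exactly zero response, and replicated (clamped) image borders. The function names are hypothetical.

```python
import math


def log_kernel(sigma, radius=None):
    """Sample Eq. (21) on a (2*radius+1) x (2*radius+1) grid centred at the origin.

    Assumptions: radius defaults to ceil(4*sigma), and the kernel mean is
    subtracted so that the kernel sums to zero.
    """
    if radius is None:
        radius = int(math.ceil(4.0 * sigma))
    size = 2 * radius + 1
    kernel = []
    for j in range(size):
        row = []
        for i in range(size):
            r2 = float((i - radius) ** 2 + (j - radius) ** 2)
            row.append((r2 - 2.0 * sigma ** 2) / (2.0 * math.pi * sigma ** 6)
                       * math.exp(-r2 / (2.0 * sigma ** 2)))
        kernel.append(row)
    mean = sum(map(sum, kernel)) / (size * size)
    return [[value - mean for value in row] for row in kernel]


def filter_image(image, kernel):
    """Correlate image[y][x] with a square kernel, replicating border pixels.

    For the symmetric LoG kernel, correlation and convolution coincide.
    """
    h, w = len(image), len(image[0])
    radius = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j in range(-radius, radius + 1):
                row = image[min(max(y + j, 0), h - 1)]
                krow = kernel[j + radius]
                for i in range(-radius, radius + 1):
                    acc += krow[i + radius] * row[min(max(x + i, 0), w - 1)]
            out[y][x] = acc
    return out
```

Calling `filter_image` once per value of σ in the scale schedule would yield the scale-space slices; the schedule itself is left to the caller here. Applied to a vertical step edge, the response has opposite signs on the two sides of the edge, which is what the zero-crossing step looks for.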

Filter parameter σ is varied in increments of Af octave, starting from σ = 1.0, resulting in 17 scale-space images, or “slices.” Zero-crossings are then located in each scale-space slice:

$$Z(x, y, \sigma) = \begin{cases} 1 & \text{if there exists a zero-crossing at position } (x, y) \text{ for filter size } \sigma,\\ 0 & \text{otherwise.} \end{cases} \qquad (22)$$
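Eq. (22) leaves the discrete zero-crossing test open. The sketch below assumes one common choice: a pixel is marked when its filter response and that of its right or lower neighbour have strictly opposite signs, so exact zeros are not counted. The function names are hypothetical; the input would be one filtered slice, for example the ∇²G response at a single value of σ.

```python
def zero_crossing_slice(response):
    """Return Z[y][x] in {0, 1} for one filtered scale-space slice (Eq. 22).

    Assumption: a zero-crossing is recorded at (x, y) when the response
    changes sign between (x, y) and (x + 1, y) or between (x, y) and (x, y + 1).
    """
    h, w = len(response), len(response[0])
    z = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            v = response[y][x]
            right = x + 1 < w and v * response[y][x + 1] < 0.0
            below = y + 1 < h and v * response[y + 1][x] < 0.0
            z[y][x] = 1 if (right or below) else 0
    return z


def zero_crossing_slices(responses):
    """Apply zero_crossing_slice to filtered slices ordered by increasing sigma."""
    return [zero_crossing_slice(r) for r in responses]
```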

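For the sequence of zero-crossing slices $Z(x, y, \sigma_1), \ldots, Z(x, y, \sigma_n)$ obtained in this way, the spatial stability index Q(x, y) corresponds to the length l of the longest subsequence $Z(x, y, \sigma_i), \ldots, Z(x, y, \sigma_{i+l-1})$ such that

$$\prod_{\sigma_i \le \sigma' \le \sigma_{i+l-1}} Z(x, y, \sigma') = 1, \qquad (23)$$

that is, Q(x, y) is the number of consecutive scales over which a zero-crossing persists at (x, y).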
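Eq. (23) amounts to finding, at each pixel, the longest run of consecutive scales whose zero-crossing flag is 1. A minimal Python sketch follows, assuming the slices are nested lists ordered by increasing σ. It also includes a mean-square-difference helper that gives $\mathrm{EMSE}_i$ of Eq. (20) when applied to $Q_i$ and $\hat{Q}_i$, and $\mathrm{MSE}_i$ of Eq. (19) when applied to $f_i$ and $\hat{f}_i$. Both function names are hypothetical.

```python
def stability_index(slices):
    """Q[y][x] = length of the longest run of consecutive slices with Z == 1.

    slices[k][y][x] holds Z(x, y, sigma_k) for increasing sigma_k (Eq. 23).
    """
    h, w = len(slices[0]), len(slices[0][0])
    q = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best = run = 0
            for z in slices:
                run = run + 1 if z[y][x] == 1 else 0  # extend or reset the run
                best = max(best, run)
            q[y][x] = best
    return q


def mean_square_difference(a, b):
    """Squared difference averaged over all pixels (1/N^2 for an N x N image)."""
    n_pixels = len(a) * len(a[0])
    return sum((u - v) ** 2 for row_a, row_b in zip(a, b)
               for u, v in zip(row_a, row_b)) / n_pixels
```

The total EMSE (or MSE) is then the sum of the three per-component values for R, G, and B.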

REFERENCES

1. K. Aizawa and T. S. Huang, Model-based image coding: Advanced video coding techniques for very low bit-rate applications, Proc. IEEE 83(2), 1995, 259–271.

2. W. F. Bischof and T. Caelli, Parsing scale-space and spatial stability analysis, Comput. Vision Graphics Image Process. 42, 1988, 192–205.

3. R. Chellappa, S. Chatterjee, and R. Bagdazian, Texture synthesis and compression using Gaussian-Markov random field models, IEEE Trans. Syst. Man Cybern. 15(2), 1985, 289–303.

4. F. S. Cohen, Z. Fan, and M. A. Patel, Classification of rotated and scaled textured images using Gaussian-Markov random field models, IEEE Trans. Pattern Anal. Mach. Intell. 13(2), 1991, 192–202.

5. L. S. Davis, Two-dimensional shape representation, in Handbook of Pattern Recognition and Image Processing (T. Y. Young and K.-S. Fu, Eds.), Academic Press, San Diego, 1986.

6. F. G. B. De Natale, G. S. Desoli, and D. D. Giusto, Segmentation-based hybrid coding of color images, in Proc. 1991 International Conference on Acoustics, Speech and Signal Processing (ICASSP 1991), pp. 2757–2760.

7. R. D. Dony and S. Haykin, Optimally adaptive transform coding, IEEE Trans. Image Process. 4(10), 1995, 1358–1370.

8. O. D. Faugeras, Digital color image processing within the framework of a human visual model, IEEE Trans. Acoust. Speech Signal Process. 27(4), 1979, 380–393.

9. N. P. Galatsanos and R. T. Chin, Digital restoration of multichannel images, IEEE Trans. Acoust. Speech Signal Process. 37(3), 1989, 415–421.

10. N. P. Galatsanos and R. T. Chin, Restoration of color images by multichannel Kalman filtering, IEEE Trans. Signal Process. 39(10), 1991, 2237–2252.

11. N. P. Galatsanos, A. K. Katsaggelos, R. T. Chin, and A. D. Hillery, Least squares restoration of multichannel images, IEEE Trans. Signal Process. 39(10), 1991, 2222–2236.

12. H.-M. Hang and B. G. Haskell, Interpolative vector quantization of color images, IEEE Trans. Commun. 36(4), 1987, 465–470.

13. R. Hoffman and A. K. Jain, Segmentation and classification of range images, IEEE Trans. Pattern Anal. Mach. Intell. 9(5), 1987, 608–620.

14. B. R. Hunt and T. M. Cannon, Nonstationary assumptions for Gaussian models of images, IEEE Trans. Syst. Man Cybern., Dec. 1976, 876–882.

15. B. R. Hunt and O. Kubler, Karhunen–Loeve multispectral image restoration, Part I: Theory, IEEE Trans. Acoust. Speech Signal Process. 32(3), 1984, 591–599.

16. A. Ikonomopoulos and M. Kunt, High compression image coding via directional filtering, Signal Process. 8, 1985, 179–203.

17. A. K. Jain, Fundamentals of Digital Image Processing, Prentice Hall, Englewood Cliffs, NJ, 1986.

18. A. K. Jain and R. C. Dubes, Algorithms for Clustering Data, Prentice Hall, Englewood Cliffs, NJ, 1988.

19. A. K. Katsaggelos, K. T. Lay, and N. P. Galatsanos, A general framework for frequency domain multi-channel signal processing, IEEE Trans. Image Process. 2(3), 1993, 417–420.

20. M. Kocher and M. Kunt, Image data compression by contour-texture modeling, in Proc. SPIE Int. Conf. on the Applications of Digital Image Processing, Geneva, April 1983, pp. 131–139.

21. M. Kunt, M. Benard, and R. Leonardi, Recent results in high-compression image coding, IEEE Trans. Circuits Syst. 34(11), 1987, 1306–1336.

22. M. Kunt, A. Ikonomopoulos, and M. Kocher, Second-generation image-coding techniques, Proc. IEEE 73(4), 1985, 549–574.

23. O. J. Kwon and R. Chellappa, Region-based image segmentation, Opt. Eng. 32, 1993, 1581–1587.

24. J.-S. Lee and K. Hoppel, Principal component transformation of multifrequency polarimetric SAR imagery, IEEE Trans. Geosci. Remote Sensing 30(4), 1992, 686–696.

25. J. O. Limb, C. B. Rubinstein, and J. E. Thompson, Digital coding of color video signals: A review, IEEE Trans. Commun. 25(11), 1977, 1349–1385.

26. B. S. Manjunath and R. Chellappa, Unsupervised texture segmentation using Markov random field models, IEEE Trans. Pattern Anal. Mach. Intell. 13(5), 1991, 478–482.

27. B. S. Manjunath, T. Simchony, and R. Chellappa, Stochastic and deterministic networks for texture segmentation, IEEE Trans. Acoust. Speech Signal Process. 38(6), 1990, 1039–1049.

28. P. A. Maragos, R. W. Schafer, and R. M. Mersereau, Two-dimensional linear prediction and its application to adaptive coding of images, IEEE Trans. Acoust. Speech Signal Process. 32(6), 1984, 1213–1227.

29. Y. Ohta, T. Kanade, and T. Sakai, Color information for region segmentation, Comput. Graphics Image Process. 13, 1980, 222–241.

30. W. K. Pratt, Spatial transform coding of color images, IEEE Trans. Commun. Technol. 19(6), 1971, 980–992.

31. A. Rosenfeld and A. Kak, Digital Image Processing, Academic Press, Orlando, FL, 1982.

32. J. M. Shapiro, Embedded image coding using zerotrees of wavelet coefficients, IEEE Trans. Signal Process. 41(12), 1993, 3445–3462.

33. M. Tasto and P. A. Wintz, Image coding by adaptive block quantization, IEEE Trans. Commun. Technol. 19(6), 1971, 957–971.

34. D. Tretter and C. A. Bouman, Optimal transforms for multispectral and multilayer image coding, IEEE Trans. Image Process. 4(3), 1995, 269–308.

35. G. K. Wallace, The JPEG still picture compression standard, Commun. ACM 34(4), 1991.

36. I. H. Witten, R. Neal, and J. G. Cleary, Arithmetic coding for data compression, Commun. ACM 30, 1987, 520–540.

37. W. Woods, Two-dimensional discrete Markovian fields, IEEE Trans. Inf. Theory 40, 1982, 232–240.

38. X. Wu, Image coding by adaptive tree-structured segmentation, IEEE Trans. Inf. Theory 38(6), 1992, 1755–1766.