Adaptive low bit rate facial feature enhanced residual image coding method using SPIHT for compressing personal ID images



Int. J. Electron. Commun. (AEÜ) 65 (2011) 589–594


International Journal of Electronics and Communications (AEÜ) journal homepage: www.elsevier.de/aeue

LETTER

K. Somasundaram a,∗, N. Palaniappan b

a Image Processing Lab, Department of Computer Science and Applications, The Gandhigram Rural Institute – Deemed University, Gandhigram 624 302, India
b Computer Centre, The Gandhigram Rural Institute – Deemed University, Gandhigram 624 302, India

Article info
Article history: Received 10 December 2009; accepted 28 July 2010
Keywords: Personal ID image; Facial feature; Residual image

Abstract
The personal identification (ID) image of a person is the main source of authentication in security systems. To retrieve an ID image quickly from the server, a very small image file is required, which is achieved by lossy compression of the image. In this paper we propose a method for compressing an ID image to below 1 KB while keeping good visual quality of the facial features. The image is transformed with the Discrete Wavelet Transform (DWT) using the CDF 9/7 wavelet filter and encoded with SPIHT at a given bit rate (in bits per pixel, bpp). The Maxshift method is applied to the facial-feature coefficients of the residual matrix, which is then encoded with SPIHT at a second bit rate. Both encoded bit streams are sent to the client, where they are decoded and inverse wavelet transformed separately, and the two resulting images are added to give the reconstructed image. Experimental results show that this method gives better visual quality of the face than SPIHT at bit rates below 0.1 bpp.
© 2010 Elsevier GmbH. All rights reserved.

∗ Corresponding author. Tel.: +91 451 2452371; fax: +91 451 2453071. E-mail addresses: [email protected], [email protected] (K. Somasundaram), [email protected] (N. Palaniappan).
1434-8411/$ – see front matter © 2010 Elsevier GmbH. All rights reserved. doi:10.1016/j.aeue.2010.07.009

1. Introduction

Identifying the personnel of an organization and authenticating them before they access vital resources are part of a security system. Authentication systems make use of features of the person such as the personal ID image, fingerprint or iris print. One of the most commonly used and reliable features is the personal ID image. To store a large number of ID images at low storage cost, the images are compressed. Generally, compression is applied to the whole image at a uniform bit rate. Instead of compressing the whole image at the same bit rate, the facial features may be given more importance than the other areas. Zhao et al. [1] observe that facial features such as the eyes, nose and mouth are used to recognize faces in still images. Shih et al. [2] proposed an algorithm for extracting faces and facial features from color images, using the eyes and mouth as areas of interest. In this paper we propose a low bit rate, low complexity facial image compression algorithm that gives better visual quality without using any training images or code books.

DWT-based image compression algorithms give superior performance in lossy compression and achieve higher compression ratios than coding in the spatial domain. A number of algorithms are available for image compression, but very few deal with personal ID images. Gerek and Cinar [3] designed an algorithm for coding facial images in which the facial features are segmented from the ID image, and the facial features and the remaining residual image are compressed separately using vector quantization. Elad et al. [4] use vector quantization (VQ) with geometrical canonization, which gives better image quality at an image size of 1 KB; however, this method requires training images and a vector code book of approximately 40 MB on both sides. The algorithm of Vila-Forcen et al. [5] gives better results with good visual quality, but it needs partial availability of side information during encoding and decoding. Compression of facial images using the K-SVD algorithm [6], which generalizes K-means clustering by means of the singular value decomposition, gives a good compression ratio, but this method requires training images to create the K-SVD dictionaries. It was further extended [7] with a linear deblocking technique that removes visually disturbing artifacts and improves the PSNR of the reconstructed image.

Wavelet-transformed coefficients are encoded efficiently by zero-tree coding algorithms. The embedded zero-tree wavelet algorithm (EZW) by Shapiro [8] gives good image quality. Said and Pearlman [9] extended this coding into set partitioning in hierarchical trees (SPIHT), which gives better results than EZW. SPIHT is an embedded method designed for optimal progressive transmission as well as for compression: decoding can be stopped at any point, and the best possible image is formed from the bits available [10].

In this paper we propose a method based on the DWT and SPIHT (without arithmetic coding). First, a rough image is generated at a bit rate R1, forming bit stream B1. Using the rough image, a residual image in terms of wavelet coefficients is obtained. Using the Maxshift method, the facial features of the residual are enhanced at a bit rate R2, where R = R1 + R2 is the required total bit rate; this gives bit stream B2. The encoded bit streams B1 and B2 are decoded and combined to give the reconstructed image.

The paper is organized as follows: Section 2 gives an overview of the Discrete Wavelet Transform, SPIHT coding and the Maxshift method. Section 3 presents the proposed algorithm. Results and discussion are given in Section 4 and the conclusion in Section 5.

2. Overview of methods used

For continuity, we present an overview of the DWT, SPIHT coding and the Maxshift method, which form the basis of the proposed method.

2.1. Discrete wavelet transform (DWT)

Wavelet-based compression methods give better image quality at high compression ratios. The DWT decomposes the signal into different sub bands, which enables a multi-resolution representation. Common wavelet filters include Haar, Coifman, Daubechies and Symlets [10]. Among them, the Cohen–Daubechies–Feauveau 9/7 (CDF 9/7) biorthogonal wavelet gives very good compression results; because of these properties, JPEG2000 uses CDF 9/7 as the basis for lossy image compression [10]. We also use CDF 9/7 in the proposed method.

2.2. SPIHT coding

SPIHT is based on spatial orientation trees (SOT). It uses three lists: the list of insignificant pixels (LIP), the list of insignificant sets (LIS) and the list of significant pixels (LSP). Each iteration has two passes, a sorting pass and a refinement pass. The method requires n + 1 iterations, where

$$n = \left\lfloor \log_2 \max_{i,j} |C(i,j)| \right\rfloor \tag{1}$$

and i, j denote the ith row and jth column of the coefficient matrix C obtained by the wavelet transform. The iterations use the thresholds $2^n, 2^{n-1}, 2^{n-2}, \ldots, 2^1, 2^0$. In the sorting pass, the wavelet coefficients in LIP and LIS are sorted according to their magnitudes. In every iteration (where n now denotes the exponent of the current threshold $2^n$), the coefficients that satisfy

$$2^n \le |C(i,j)| < 2^{n+1} \tag{2}$$

become significant; the others remain insignificant. The significant coefficients are stored in LSP. For each tested coefficient a bit 1 (significant) or 0 (insignificant) is sent to the decoder, and each newly significant coefficient is followed by a sign bit, 0 for positive and 1 for negative. To reduce complexity, instead of testing every coefficient individually, whole spatial orientation trees are tested with

$$\max_{(i,j) \in T_k} |C(i,j)| \ge 2^n \tag{3}$$

where $T_k$ denotes the kth SOT and C(i, j) belongs to $T_k$. In every iteration, the refinement pass sends one more bit of each coefficient that entered LSP in a previous iteration, so that the magnitudes are refined from the most significant bit (MSB) to the least significant bit (LSB).

2.3. Maxshift method

Generic scaling and maximum shift (Maxshift) are popular methods for region of interest (ROI) coding in the JPEG 2000 standard [11]. In both methods, the wavelet coefficients of the ROI are placed in higher bit planes than the coefficients of the non-ROI region, so that the ROI coefficients are coded first. Generic scaling supports any scaling value S, which allows fine control of the relation between the ROI and non-ROI regions. In the Maxshift method, all ROI coefficients are shifted up by S bit planes (multiplied by $2^S$), with S chosen so that no non-ROI coefficient is coded before them: S is the number of bit planes required to represent the largest magnitude among the non-ROI wavelet coefficients. The main drawback of this method is that no non-ROI coefficient is decoded until all the ROI coefficients have been decoded. Fig. 1 shows the existing ROI methods and the proposed one. Other ROI coding algorithms alter the standard SPIHT coding method to code the ROI and non-ROI regions, using a separate set of lists for every bit plane or iteration.

Fig. 1. ROI methods: (a) no ROI, (b) generic scaling, (c) Maxshift, and (d) proposed method for facial images.

3. Proposed method

The ID image is transformed with the CDF 9/7 wavelet filter to obtain the wavelet coefficient matrix C. C is then coded with SPIHT at a bit rate of R1 bpp, and the result is transmitted as bit stream B1. In another channel, B1 is decoded to produce the matrix $C'$. A residual matrix $C_R = C - C'$ is then computed, as shown in Fig. 2. Using the Maxshift method, the wavelet coefficients representing the facial features in $C_R$ are shifted up and encoded at R2 bpp, and these bits are transmitted as bit stream B2. The required bit rate is R = R1 + R2. At the receiving end, B1 and B2 are decoded separately into images I1 and I2, which are added to give the reconstructed image I.

3.1. Pre-processing

Grayscale personal ID images of size M × M pixels are used for testing. The images are pre-processed so that the face lies in the upper middle part of the image. Fig. 3 shows how the facial features area of the personal ID image is selected.

3.2. Rough facial image encoding

1. The image is transformed with the CDF 9/7 wavelet for N levels, where N = log2(M). The transformed wavelet coefficients form the matrix C.
2. The wavelet coefficients are rounded to integers: C = Round(C).
3. All the wavelet coefficients of C, covering both facial and non-facial features, are encoded with the SPIHT coder at the required bit rate R1.
4. The encoded bit stream is B1.
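Step 1 above relies on the CDF 9/7 transform, which is commonly computed with the lifting scheme. The Python sketch below is not the authors' Matlab code: it shows one level of the 1-D forward and inverse transform with symmetric boundary extension, using the usual lifting constants of the Daubechies–Sweldens factorization rounded to about ten significant digits, plus one 2-D level (rows, then columns). Normalization conventions for the factor K differ between implementations; here the low-pass half is multiplied by K and the high-pass half divided by it. All function names are illustrative.

```python
"""One level of the CDF 9/7 wavelet transform via lifting (illustrative sketch)."""

# Lifting constants of the CDF 9/7 factorization (approximate values).
ALPHA = -1.586134342
BETA = -0.05298011854
GAMMA = 0.8829110762
DELTA = 0.4435068522
K = 1.149604398


def _lift(x, start, coef):
    """Add coef * (left + right neighbour) to every second sample from `start`,
    mirroring at the borders (symmetric extension)."""
    n = len(x)
    for i in range(start, n, 2):
        left = x[i - 1] if i > 0 else x[i + 1]
        right = x[i + 1] if i + 1 < n else x[i - 1]
        x[i] += coef * (left + right)


def cdf97_forward(signal):
    """Return (low, high) halves of one forward CDF 9/7 level."""
    x = [float(v) for v in signal]
    if len(x) < 2 or len(x) % 2:
        raise ValueError("signal length must be even and at least 2")
    _lift(x, 1, ALPHA)  # predict 1: odd samples
    _lift(x, 0, BETA)   # update 1: even samples
    _lift(x, 1, GAMMA)  # predict 2
    _lift(x, 0, DELTA)  # update 2
    low = [v * K for v in x[0::2]]
    high = [v / K for v in x[1::2]]
    return low, high


def cdf97_inverse(low, high):
    """Undo cdf97_forward: unscale, then run the lifting steps backwards."""
    x = [0.0] * (len(low) + len(high))
    x[0::2] = [v / K for v in low]
    x[1::2] = [v * K for v in high]
    _lift(x, 0, -DELTA)
    _lift(x, 1, -GAMMA)
    _lift(x, 0, -BETA)
    _lift(x, 1, -ALPHA)
    return x


def _row_level(values):
    """Forward level on one row or column: low half followed by high half."""
    low, high = cdf97_forward(values)
    return low + high


def cdf97_2d_level(image):
    """One 2-D level: rows first, then columns; LL lands in the top-left quadrant."""
    rows = [_row_level(r) for r in image]
    cols = [_row_level(c) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


if __name__ == "__main__":
    sig = [10, 12, 14, 200, 3, 7, 8, 9]
    lo, hi = cdf97_forward(sig)
    rec = cdf97_inverse(lo, hi)
    # Perfect reconstruction up to floating-point error.
    print(max(abs(a - b) for a, b in zip(sig, rec)))
```

Applying `cdf97_2d_level` to an M × M image and then recursing on the top-left (LL) quadrant N = log2(M) times would give the coefficient pyramid used in Section 3.2; a practical coder would more likely call a tested library implementation.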
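Equations (1) to (3) of Section 2.2 can be checked on a toy matrix. The Python sketch below only evaluates the quantities they define (the number of passes, the threshold sequence, and the coefficient and tree significance tests); it is not a full SPIHT coder, and the function names are hypothetical. It assumes integer coefficients, as produced by the rounding in step 2 of Section 3.2, and represents a tree $T_k$ simply as a list of (i, j) positions.

```python
def num_bit_planes(coeffs):
    """Eq. (1): n = floor(log2(max |C(i, j)|)); SPIHT runs n + 1 passes.
    Assumes at least one coefficient with magnitude >= 1."""
    peak = max(abs(v) for row in coeffs for v in row)
    if peak < 1:
        raise ValueError("all coefficients are below 1 in magnitude")
    return int(peak).bit_length() - 1  # exact floor(log2) for integers


def thresholds(n):
    """Thresholds 2^n, 2^(n-1), ..., 2^1, 2^0 of the successive passes."""
    return [2 ** k for k in range(n, -1, -1)]


def becomes_significant(value, n):
    """Eq. (2): the coefficient turns significant in the pass with threshold 2^n."""
    return 2 ** n <= abs(value) < 2 ** (n + 1)


def tree_is_significant(coeffs, tree, n):
    """Eq. (3): a tree T_k (list of (i, j) positions) is significant if its
    largest magnitude reaches 2^n; otherwise a single 0 bit covers the tree."""
    return max(abs(coeffs[i][j]) for i, j in tree) >= 2 ** n


if __name__ == "__main__":
    c = [[-37, 12], [5, 0]]
    n = num_bit_planes(c)               # 37 lies in [32, 64), so n = 5
    print(n, thresholds(n))             # 5 [32, 16, 8, 4, 2, 1]
    print(becomes_significant(-37, 5))  # True
    print(tree_is_significant(c, [(0, 1), (1, 0), (1, 1)], 4))  # False: 12 < 16
```

The tree test is what makes SPIHT cheap: one comparison, and one transmitted bit, stands in for every coefficient of an insignificant tree.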


Fig. 2. Compression and decompression process of the proposed method.

3.3. Selection of facial features area

The test image is divided into 64 blocks of size M/8 × M/8 pixels, as shown in Fig. 3. By trial, we found that blocks (3,3) to (3,6), (4,3) to (4,6) and (5,4) to (5,5) cover the eyes, nose and mouth in the test images; these ten blocks are taken as the facial features. When the DWT is applied for N levels, most of the high-magnitude wavelet coefficients are concentrated in the higher-level (coarser) sub bands, and these are the coefficients encoded in the rough image coding. The wavelet coefficients representing the facial features are therefore taken from the (N − 3)th level, where the LL sub band has 8 × 8 coefficients: the coefficients of the spatial orientation trees (SOT) originating from coordinates (3,3) to (3,6), (4,3) to (4,6) and (5,4) to (5,5) of the LL sub band of the (N − 3)th level are taken as the facial features.

Fig. 3. Selection of facial features area (eyes, nose and mouth).

3.4. Residual facial image encoding

1. In another channel, the encoded bit stream B1 is decoded to form the matrix of wavelet coefficients $C'$.
2. A residual matrix $C_R$ is computed by subtracting the decoded matrix $C'$ from the original wavelet coefficient matrix C: $C_R = C - C'$. Fig. 4 shows a numerical example of computing C, $C'$ and $C_R$ for an 8 × 8 pixel image.
3. The spatial orientation trees corresponding to the facial features area in $C_R$ are shifted up by S bit planes, where S is the number of bit planes required to represent the largest magnitude in the non-facial area of $C_R$.
4. $C_R$ is encoded with the SPIHT coder at the required bit rate R2. Because of the Maxshift scaling, the facial features are encoded first, and the other areas are encoded only if further bits are available.
5. B2 is the encoded bit stream of the residual coefficients.

3.5. Decoding

1. Both bit streams B1 and B2 are sent from the server to the client.
2. B1 is decoded by the SPIHT decoder and the inverse wavelet transform is applied, giving the rough image I1.
3. B2 is decoded by the SPIHT decoder and the inverse wavelet transform is applied, giving the facial features enhanced image I2.
4. The resulting images I1 and I2 are added to give the image I:

I = I1 + I2

where I is the final reconstructed image. An example is shown in Fig. 5.
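Steps 2 and 3 of Section 3.4, and the later claim that working on the residual lowers the scaling value S, can be illustrated with toy integer coefficient matrices. In the Python sketch below the facial mask is simply given as a boolean matrix (deriving it from the SOTs of Section 3.3 is not reproduced), the decoder-side `unshift` relies on the usual Maxshift property that any magnitude of at least $2^S$ must belong to the ROI, and all names and numbers are hypothetical. Because the inverse DWT is linear, adding I1 and I2 in step 4 of Section 3.5 is equivalent to inverse-transforming the sum of $C'$ and the decoded residual.

```python
def residual(c, c_dec):
    """C_R = C - C': the detail left over after the rough stream B1."""
    return [[a - b for a, b in zip(row, row_dec)] for row, row_dec in zip(c, c_dec)]


def maxshift_scale(coeffs, facial):
    """S = number of bit planes needed for the largest non-facial magnitude."""
    peak = max((abs(v) for row, mrow in zip(coeffs, facial)
                for v, is_face in zip(row, mrow) if not is_face), default=0)
    return int(peak).bit_length()


def maxshift(coeffs, facial, s):
    """Shift facial coefficients up by S bit planes (multiply by 2**S)."""
    return [[v * (1 << s) if is_face else v for v, is_face in zip(row, mrow)]
            for row, mrow in zip(coeffs, facial)]


def unshift(coeffs, s):
    """Decoder side: magnitudes >= 2**S can only be facial, so no mask is needed."""
    out = []
    for row in coeffs:
        out.append([(abs(v) >> s) * (1 if v > 0 else -1) if abs(v) >= (1 << s) else v
                    for v in row])
    return out


if __name__ == "__main__":
    c = [[40, -9], [6, 3]]       # toy original coefficients C
    c_dec = [[32, -8], [0, 0]]   # toy C' decoded from B1
    facial = [[False, True], [True, False]]
    cr = residual(c, c_dec)      # [[8, -1], [6, 3]]
    s = maxshift_scale(cr, facial)
    print(maxshift_scale(c, facial), s)  # 6 4: the residual needs fewer shift bits
    shifted = maxshift(cr, facial, s)
    print(shifted)                       # [[8, -16], [96, 3]]
    print(unshift(shifted, s) == cr)     # True
```

In this toy case the largest non-facial magnitude drops from 40 in C to 8 in $C_R$, so S falls from 6 to 4, which is the effect the paper attributes to coding the residual rather than C itself.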


Fig. 5. A test image at different stages: (a) original image, (b) reconstructed rough facial image at 0.03 bpp, (c) reconstructed residual facial features enhanced image at 0.01 bpp, (d) final reconstructed image, by adding (b) and (c), at 0.04 bpp.

Fig. 4. Numerical example for coding: (a) pixel values of an 8 × 8 pixel block, (b) DWT coefficients at level 3 at 1 bpp (C), (c) SPIHT-decoded coefficients ($C'$), and (d) residual coefficients ($C_R = C - C'$).
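The bit rates used in Fig. 5 and in Table 1 translate directly into coded stream sizes: a rate of R bpp for an M × M image costs R · M² / 8 bytes. The Python sketch below assumes the 256 × 256 test size used in Section 4 and ignores any file header or side information, which the paper does not quantify; it applies the formula to the R1 + R2 splits listed in Table 1. The names are illustrative.

```python
def stream_bytes(bpp, m=256):
    """Payload size in bytes of an m x m image coded at `bpp` bits per pixel."""
    return bpp * m * m / 8


# (R1, R2) splits from Table 1; the total rate is R = R1 + R2.
SPLITS = [(0.02, 0.01), (0.03, 0.01), (0.04, 0.02), (0.06, 0.02), (0.08, 0.02)]

if __name__ == "__main__":
    for r1, r2 in SPLITS:
        b1, b2 = stream_bytes(r1), stream_bytes(r2)
        print(f"R = {r1 + r2:.2f} bpp: B1 {b1:.1f} B + B2 {b2:.1f} B = {b1 + b2:.1f} B")
```

At the highest rate in Table 1, 0.10 bpp, the payload is about 819 bytes, consistent with the below-1 KB target stated in the abstract; the Fig. 5 example (0.03 + 0.01 bpp) needs about 328 bytes.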

4. Results and discussion

Experiments were carried out by applying our method to test images of size 256 × 256 pixels with a uniform background, shown in Fig. 6(a)–(f), and PSNR values were computed at different bit rates. For comparison, results obtained with standard SPIHT (without arithmetic coding) and with JPEG2000 are used.
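The PSNR values in Tables 1 and 2 can be computed as sketched below. The paper does not state its PSNR formula, so this Python sketch assumes the usual definition for 8-bit images, PSNR = 10 log10(255² / MSE). For the facial-feature PSNR of Table 2 it restricts the MSE to the ten facial blocks of Section 3.3, reading the block indices as 1-based (row, column) positions in an 8 × 8 grid of M/8 × M/8 blocks; that reading is an assumption. Function names are illustrative.

```python
import math

# Ten facial blocks of Section 3.3 as 1-based (row, column) indices in an 8 x 8 grid.
FACIAL_BLOCKS = ([(3, c) for c in range(3, 7)] + [(4, c) for c in range(3, 7)]
                 + [(5, c) for c in range(4, 6)])


def facial_pixel_mask(m):
    """Boolean M x M mask that is True inside the facial blocks (M divisible by 8)."""
    b = m // 8
    mask = [[False] * m for _ in range(m)]
    for br, bc in FACIAL_BLOCKS:
        for i in range((br - 1) * b, br * b):
            for j in range((bc - 1) * b, bc * b):
                mask[i][j] = True
    return mask


def psnr(original, decoded, mask=None, peak=255.0):
    """PSNR in dB over all pixels, or only over pixels where mask is True."""
    squared_error, count = 0.0, 0
    for i, row in enumerate(original):
        for j, value in enumerate(row):
            if mask is None or mask[i][j]:
                squared_error += (value - decoded[i][j]) ** 2
                count += 1
    mse = squared_error / count
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    m = 256
    mask = facial_pixel_mask(m)
    print(sum(map(sum, mask)))  # 10 blocks of 32 x 32 = 10240 pixels
    orig = [[100] * m for _ in range(m)]
    dec = [[100 if mask[i][j] else 110 for j in range(m)] for i in range(m)]
    print(psnr(orig, dec), psnr(orig, dec, mask))  # facial PSNR is inf here
```

The example shows why the two tables can disagree: an image whose errors lie entirely outside the facial blocks has a finite whole-image PSNR but a perfect facial-feature PSNR.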

Table 1 shows the results obtained at bit rates of 0.03, 0.04, 0.06, 0.08 and 0.1 bpp for the different methods, and Fig. 7 shows the images reconstructed by the three methods at different bit rates. Of the required bit rate R, more is allotted to R1 than to R2 (R1 > R2) to obtain a better PSNR value, because B1 covers the whole image. The proposed algorithm was implemented in Matlab Release 13, and the JasPer software (version 1.700.0) [12] was used to obtain the JPEG2000 results. Table 1 shows that SPIHT gives better results than JPEG2000, and that the proposed method gives better PSNR and better visual quality than JPEG2000. At 0.03 bpp, JPEG2000 performs very poorly, with no features visible in the image. In terms of PSNR the proposed method comes close to SPIHT, but it gives a better visual appearance.

Fig. 6. (a)–(f) A few of the test images.

Table 1. PSNR values (dB) of the four test images using SPIHT, JPEG2000 (JPEG2K) and the proposed method.

R (bpp)  R1 + R2        Fig. 6(a)                Fig. 6(b)                Fig. 6(c)                Fig. 6(d)
                        SPIHT  JPEG2K  Proposed  SPIHT  JPEG2K  Proposed  SPIHT  JPEG2K  Proposed  SPIHT  JPEG2K  Proposed
0.03     0.02 + 0.01    28.31  11.56   27.04     28.63  16.78   27.46     30.42  19.94   29.33     28.95  20.09   28.10
0.04     0.03 + 0.01    29.58  23.25   28.67     29.55  24.22   29.02     31.34  26.69   30.86     29.74  25.19   29.27
0.06     0.04 + 0.02    31.14  27.05   30.21     31.47  27.41   30.05     32.93  29.56   31.99     31.04  28.07   30.31
0.08     0.06 + 0.02    32.72  29.46   31.77     32.56  29.71   31.99     33.90  31.43   33.39     31.98  29.62   31.54
0.10     0.08 + 0.02    33.58  30.96   33.19     33.42  31.33   32.89     34.78  33.06   34.27     32.66  30.66   32.30

Fig. 7. Results for test image Fig. 6(b) at different bit rates for the different methods: row 1, SPIHT; row 2, JPEG2000; row 3, proposed method.

Table 2. PSNR values (dB) of the facial features of four test images using SPIHT and the proposed method.

Test image  Method      0.03 bpp  0.04 bpp  0.06 bpp  0.08 bpp  0.10 bpp
Fig. 6(a)   SPIHT       27.16     28.17     29.44     31.16     31.97
            Proposed    28.15     28.99     31.34     32.36     33.20
Fig. 6(b)   SPIHT       27.86     28.80     30.27     31.53     32.15
            Proposed    28.49     29.71     31.57     32.53     33.05
Fig. 6(c)   SPIHT       27.86     28.92     31.07     32.01     33.01
            Proposed    28.89     29.44     31.51     32.78     33.33
Fig. 6(d)   SPIHT       26.26     27.01     28.38     29.46     30.16
            Proposed    25.98     27.74     28.57     29.84     30.38

Table 2 shows the PSNR values obtained for the facial features, computed over the 10 facial blocks of the test images, using SPIHT and the proposed method. The facial-feature PSNR of the proposed method is higher than that of SPIHT at all bit rates up to 0.1 bpp, with the single exception of Fig. 6(d) at 0.03 bpp (25.98 dB versus 26.26 dB). As the bit rate increases, the difference in PSNR decreases, and at 0.1 bpp both methods give almost the same PSNR values. For bit rates above 0.1 bpp there is no need to separate the facial features area, because SPIHT by itself gives good visual quality. Another advantage of the proposed method is that it reduces the scaling value S used to scale the facial features area.

When the Maxshift method is applied directly to C, the scaling value is large because of the high-magnitude wavelet coefficients. Once rough facial image coding is complete, most of the high-magnitude coefficients have already been encoded, and only very small values remain in the residual matrix $C_R$. The scaling value S is therefore small, which keeps encoding and decoding simple and hence fast.

5. Conclusion

In this work, a new low-complexity DWT-based method for compressing personal ID images at low bit rates is presented. It gives a compression ratio and image quality comparable to those of SPIHT, but a better visual image than SPIHT at bit rates below 0.1 bpp. In terms of PSNR, the proposed method always performs better than JPEG2000. The proposed method can find applications where fast access to ID images is required.

References
[1] Zhao W, Chellappa R, Phillips PJ. Face recognition: a literature survey. ACM Comput Surv 2003;35:399–458.
[2] Shih FY, Cheng S, Chuang C-FA, Wang PSP. Extracting faces and facial features from color images. Int J Pattern Recognit Artif Intell 2008;22:515–34.
[3] Gerek ON, Cinar H. Segmentation based coding of human face images for retrieval. Signal Process 2004;84:1041–7.
[4] Elad M, Goldenberg R, Kimmel R. Low bit-rate compression of facial images. IEEE Trans Image Process 2007;16:2379–83.
[5] Vila-Forcen JE, Voloshynovskiy S, Koval O, Pun T. Facial image compression based on structured codebooks in overcomplete domain. Eur J Appl Signal Process 2006;69042:1–11.
[6] Bryt O, Elad M. Compression of facial images using the K-SVD algorithm. J Vis Commun Image R 2008;19:3445–62.
[7] Bryt O, Elad M. Improving the K-SVD facial image compression using a linear deblocking method. In: IEEE 25th Convention of Electrical and Electronics Engineers in Israel (IEEEI 2008); 2008. p. 533–7.
[8] Shapiro JM. Embedded image coding using zerotrees of wavelet coefficients. IEEE Trans Signal Process 1993;41:3445–62.
[9] Said A, Pearlman WA. A new, fast and efficient image codec based on set partitioning in hierarchical trees. IEEE Trans Circ Syst Video Technol 1996;6:243–9.
[10] Salomon D. Data compression: the complete reference. 2nd ed. Springer-Verlag; 2000.
[11] Christopoulos C, Skodras A, Ebrahimi T. The JPEG2000 still image coding system: an overview. IEEE Trans Consum Electron 2000;46:1103–27.
[12] Adams MD. The JasPer project. http://www.ece.uvic.ca/mdadams/jasper.

Dr. K. Somasundaram was born in 1953. He received the M.Sc. degree in Physics from the University of Madras, Chennai, India, in 1976, the Ph.D. degree in Theoretical Physics from the Indian Institute of Science, Bangalore, India, in 1984, and the Post Graduate Diploma in Computer Methods from Madurai Kamaraj University, Madurai, India, in 1989. He is presently Professor and Head of the Department of Computer Science and Applications, and Head of the Computer Centre, at the Gandhigram Rural Institute, Gandhigram, India. From 1976 to 1989 he was a Professor in the Department of Physics at the same Institute. He was previously a researcher at the International Centre for Theoretical Physics, Trieste, Italy, and a Development Fellow of Commonwealth Universities at the School of Multimedia, Edith Cowan University, Australia. His research interests are image processing, image compression and medical imaging. He is a life member of the Indian Society for Technical Education and an annual member of ACM, USA, and the IEEE Computer Society, USA.

N. Palaniappan received the M.C.A. degree from Madurai Kamaraj University, Madurai, in 2002. He is currently pursuing his Ph.D. in the Department of Computer Science and Applications, Gandhigram Rural Institute – Deemed University. His area of research is image compression.