A new approach for real-time reduction of blocking effect


Signal Processing 65 (1998) 337—346

Sung-Wai Hong, Yuk-Hee Chan*, Wan-Chi Siu

Department of Electronic Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong

Received 29 April 1996; revised 14 October 1997

Abstract

A new, non-iterative post-processing approach is proposed for real-time reduction of blocking effects. The proposed approach is fully compatible with the JPEG standard and requires no additional transmission overhead. This is achieved by training feed-forward single-layer neural networks to restore classified block boundaries of JPEG-encoded images. Classification is based on the intensity distribution of the pixels on either side of the block boundaries. The approach is easy to implement, and simulation results show that it outperforms various well-known post-processing approaches in terms of both signal-to-noise ratio improvement and processing time. © 1998 Elsevier Science B.V. All rights reserved.

Keywords: Image restoration; JPEG; Blocking effect; Neural network; Classification; Frequency-sensitive competitive learning

* Corresponding author. Tel.: (852) 2766 6224; fax: (852) 2362 8439; e-mail: [email protected].
0165-1684/98/$19.00 © 1998 Elsevier Science B.V. All rights reserved. PII S0165-1684(97)00236-3

1. Introduction

The Joint Photographic Experts Group (JPEG) image compression standard [6] has been widely used as an industrial standard for still-frame, continuous-tone image compression. In this standard, a source image is divided into non-overlapping 8×8 blocks, and each block is transformed with the discrete cosine transform (DCT), quantized, and encoded with Huffman or arithmetic coding. However, such a block-based transformation exploits little of the inter-block correlation. When the compression ratio is high, the reconstructed image exhibits visible discontinuities between adjacent blocks. This artifact is the so-called ‘blocking effect’, and it is commonly considered the most objectionable artifact in the processed image.

In recent years, various approaches have been proposed for the reduction or removal of the blocking effect [4,7,8,10–12]. Owing to the popularity of the JPEG standard, attention is now being drawn to post-processing approaches that are JPEG-compatible and require no extra transmission overhead. Well-known approaches in this category include the filtering approach [7], the non-linear interpolative decoder (NID) approach [10], the iterative constrained least-squares (CLS) recovery approach [11], and the two iterative projections-onto-convex-sets (POCS) [9] recovery approaches proposed separately by Rosenholtz et al. [8] and Yang et al. [11]. Hereafter, we denote the approaches proposed by Rosenholtz et al. and Yang et al. as POCS_RZ and POCS_YGK, respectively.

Each of these approaches has its pros and cons. For instance, the filtering approach, which applies a spatially invariant Gaussian low-pass filter to the block boundaries of the blocky image, tends to blur the sharp details near the block boundaries.
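The origin of the blocking effect described at the start of this section can be reproduced with a small numerical sketch. The code below is an illustration only, not the JPEG codec: it assumes a single uniform quantizer step (the hypothetical parameter `step`) in place of a quantization table, omits entropy coding, and uses a synthetic ramp image. It quantizes each 8×8 block independently in the DCT domain and compares the intensity jumps across block boundaries with those inside the blocks.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix C, so that C @ x is the 1-D DCT of x."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def blockwise_dct_codec(image, step=64.0, n=8):
    """Quantize every n x n block independently in the DCT domain, then decode."""
    c = dct_matrix(n)
    out = np.empty_like(image, dtype=float)
    for r in range(0, image.shape[0], n):
        for s in range(0, image.shape[1], n):
            block = image[r:r + n, s:s + n].astype(float)
            coeff = c @ block @ c.T                    # forward 2-D DCT
            coeff = np.round(coeff / step) * step      # uniform quantizer (assumption)
            out[r:r + n, s:s + n] = c.T @ coeff @ c    # inverse 2-D DCT
    return out

# A smooth horizontal ramp: the source image itself contains no edges.
ramp = np.tile(np.linspace(0.0, 255.0, 32), (32, 1))
decoded = blockwise_dct_codec(ramp)

# Mean absolute difference between horizontally adjacent columns.
jumps = np.abs(np.diff(decoded, axis=1)).mean(axis=0)
boundary = jumps[7::8]                                    # column pairs 7|8, 15|16, 23|24
interior = np.delete(jumps, np.arange(7, jumps.size, 8))  # all other column pairs
print("mean jump across block boundaries:", boundary.mean())
print("mean jump inside blocks:         ", interior.mean())
```

Because each block is quantized without reference to its neighbours, the decoded ramp shows noticeably larger steps at the block boundaries than inside the blocks, even though the source image is smooth.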

The POCS_RZ approach imposes two constraints on the JPEG-encoded image: one bounds the possible alteration of the transform coefficients during restoration, and the other enforces smoothness across the restored block boundaries. Since the latter constraint involves low-pass filtering, this approach suffers, to a certain extent, from the same blurring problem as the filtering approach. The NID approach [10] replaces the conventional inverse transform decoder with a non-linear interpolative decoder, which performs table lookups in trained codebooks to reconstruct the image blocks from the DCT coefficients. However, the performance of the non-linear interpolative decoder is very sensitive to the choice of the training set used in the codebook design procedure when the training set is of limited size. In the iterative CLS and POCS_YGK approaches, a priori knowledge of the smoothness of the original image is used to reduce the blocking effect. They have been shown to be very effective in restoring the block boundaries. However, they require extensive computation for the estimate to converge to the desired solution, which makes them unsuitable for real-time applications.

Some other post-processing approaches aim at enhancing the image quality instead of restoring the image fidelity [7,8]. Although the images processed by these approaches are free from the blocking effect, these approaches may increase the distortion and make the image deviate further from its original version. To prevent this, it is essential to develop an approach that both improves the objective fidelity and enhances the subjective quality of the processed image.

In view of the disadvantages of these conventional approaches, we propose a novel JPEG-compatible, real-time approach that provides a simple and effective way to reduce the blocking effect. This approach classifies the intensity variation across the block boundaries of a JPEG-encoded image and then employs trained feed-forward single-layer neural networks (FFSLNs) to restore the corresponding block boundary pixels. Classification is done with the frequency-sensitive competitive learning (FSCL) [1] algorithm. Simulations showed that the proposed FFSLNs approach performed better than other post-processing approaches [7,8,10,11] in terms of SNR improvement. The performance of the proposed approach can be further improved by classifying the input vectors into more classes.

2. The algorithm

Although the blocking effect is caused by neglecting the inter-block correlation during compression, a considerable amount of inter-block correlation still remains in the encoded image. When no other explicit information about the inter-block correlation is available, one has to rely on this residual correlation to restore the block boundary pixels. In the proposed approach, we exploit the remaining correlation among adjacent blocks and the JPEG-encoded pixels along the block boundary to restore the boundary pixel intensity.

It is well known that a neural network is robust in capturing the correlation between the input and the output of an unknown system. Specifically, it quickly produces a reasonable output even when its input has not been encountered during training. In addition, it supports non-batch training. These abilities make a neural network an ideal candidate for exploiting the residual correlation present in JPEG-encoded images and for implementing an adaptive system that eliminates the blocking effect.

To optimize the generalization capability of the neural network, input training vectors and target vectors are chosen from sources that are considerably correlated. In particular, vectors extracted from the regions of interest in the JPEG-encoded image are chosen as the input vectors, while the corresponding boundary pixels extracted from the original images are used as their target outputs in the training process. This choice is made because the boundary pixels of the original image and their corresponding neighbouring pixels in the JPEG-encoded image are highly correlated. To further improve the performance of the neural network, the input vectors are classified to obtain more precise statistics of the intensity variation of the pixels across the block boundaries. Each class of input vectors is then used to train its own FFSLN, which improves the restoration performance. However, the number of classes is kept small so that the computational burden does not increase too much.

3. Classification scheme

It is well known that competitive learning (CL) [3,5] is an adaptive version of the LBG algorithm for clustering analysis. The traditional LBG algorithm [2] is a batch-mode algorithm that needs to access the entire training vector set every time the codebook is updated during training. CL has the advantage that no batching of the training vectors is needed: the codebook vectors can be updated whenever a new training vector arrives. However, conventional CL may underutilize neurons [5]. Incorporating the concept of frequency sensitivity into the CL rule alleviates this problem while retaining the computational advantages of conventional CL.

The FSCL network consists of three layers: an input-layer neuron broadcasts a given input vector to the second-layer neurons; the second layer, with M neurons, computes the Euclidean distances between their weight vectors and the input vector, where M is the number of classes desired; and an output layer determines the winning neuron based on the distortions computed by the second-layer neurons. The configuration of the FSCL network is illustrated in Fig. 1. The FSCL network is trained with a large amount of training data, and the weight vector $W_j$ associated with the jth second-layer neuron is initialized with a vector taken randomly from the training set. The output $u_i$ of the ith output-layer neuron is given by

$$u_i = \begin{cases} 1 & \text{if } \|X - W_i\|^2 = \min_j \|X - W_j\|^2 \text{ for } j = 0, \ldots, M-1, \\ 0 & \text{otherwise,} \end{cases} \tag{1}$$

where $X$ is the input vector presented to the input-layer neuron. During the training phase, the weight vector of the winning neuron, $W_*$, is updated by

$$W_*^{\mathrm{new}} = W_*^{\mathrm{old}} + \Delta W_* = W_*^{\mathrm{old}} + \varepsilon_*(n)\,(X - W_*^{\mathrm{old}})\,u_*, \tag{2}$$

where $W_*^{\mathrm{old}}$ and $W_*^{\mathrm{new}}$ are the previous and the updated weight vectors of the winning neuron, respectively. $\varepsilon_*(n) = e^{-c_*(n)/n}$ is the learning rate, which decreases monotonically to zero as learning progresses, and $c_*(n)$ denotes the total number of times that the winning neuron has won, up to the current competition $n$. Eq. (2) incorporates the frequency sensitivity of each output-layer neuron into its learning rate. In this way, when an output-layer neuron does not win enough input training vectors, it becomes increasingly sensitive. On the other hand, if it wins frequently, its sensitivity decreases. This frequency-sensitive scheme reduces the likelihood that a frequently winning neuron wins a competition and gives the other neurons more chances. No updating is required after the training phase.

Since we are interested only in the variation of the pixel intensity, all input vectors are first mean-removed and then normalized by the maximum pixel intensity of each input vector before being fed into the FSCL network. The combination of this operator and the FSCL network works as a classifier. As shown in Fig. 2, each input vector consists of two block-boundary pixels and six neighbouring pixels in line with them. All input vectors are presented to the classifier in either normal or reverse order. Vectors that contain pixels from the cross-over regions are ignored in the training phase. The number of classes is set to eight because this provides a clear improvement in SNR without introducing too much processing overhead.

Fig. 1. The configuration of the FSCL network.

Fig. 2. The input vector extraction scheme for the FSCL network and the FFSLNs model.
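The classification stage described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, each vector is assumed to hold four pixels on either side of a vertical block boundary (Fig. 2 is not reproduced here), the normalization divides the mean-removed vector by the maximum of the original vector, the cross-over regions are not excluded, and the learning rate is taken as $e^{-c_*(n)/n}$ as in Eq. (2).

```python
import numpy as np

def extract_boundary_vectors(image, block=8, half=4):
    """Collect horizontal vectors that straddle each vertical block boundary.

    Assumption: `half` pixels are taken on either side of the boundary, i.e.
    two boundary pixels plus six in-line neighbours when half=4.
    """
    image = np.asarray(image, dtype=float)
    cols = range(block, image.shape[1], block)   # first column of every block but the first
    return np.array([row[c - half:c + half] for row in image for c in cols])

def normalize(vectors):
    """Mean-remove each vector, then scale it by its maximum pixel intensity."""
    vectors = np.asarray(vectors, dtype=float)
    peak = vectors.max(axis=1, keepdims=True)
    centered = vectors - vectors.mean(axis=1, keepdims=True)
    return centered / np.maximum(peak, 1e-12)    # guard against all-zero vectors

def train_fscl(vectors, num_classes=8, seed=0):
    """One pass of frequency-sensitive competitive learning over the vectors."""
    rng = np.random.default_rng(seed)
    x = normalize(vectors)
    # Initialise the weight vectors with randomly chosen training vectors.
    weights = x[rng.choice(len(x), size=num_classes, replace=False)].copy()
    wins = np.zeros(num_classes)
    for n, sample in enumerate(x, start=1):
        dist = ((sample - weights) ** 2).sum(axis=1)   # squared Euclidean distances
        j = int(np.argmin(dist))                       # Eq. (1): only the winner has u_j = 1
        wins[j] += 1                                   # c_*(n), counting the current win
        rate = np.exp(-wins[j] / n)                    # learning rate of Eq. (2)
        weights[j] += rate * (sample - weights[j])     # update the winner only
    return weights, wins

def classify(vectors, weights):
    """Assign every vector to the class of its nearest weight vector."""
    x = normalize(vectors)
    dist = ((x[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

rng = np.random.default_rng(1)
image = rng.integers(0, 256, size=(64, 64))   # synthetic stand-in for a JPEG-decoded image
vectors = extract_boundary_vectors(image)
weights, wins = train_fscl(vectors, num_classes=8)
labels = classify(vectors, weights)
print(vectors.shape, wins, np.bincount(labels, minlength=8))
```

Vertical vectors across horizontal block boundaries could be obtained by applying the same extraction to the transposed image, and reverse-order presentation by reversing each vector.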

4. The FFSLNs architecture and training

The proposed neural network model is composed of M feed-forward single-layer neural networks (FFSLNs) with identical structure. Each FFSLN has an input layer with nine neurons (including the bias term) and an output layer with one neuron. Each neuron computes a weighted sum of its inputs followed by a ramp activation function with a range of [0.0, 1.0]. The classical back-propagation learning algorithm is adopted to minimize the mean-square error (MSE) between the target vector and the output vector. A momentum term is added to prevent the network from becoming stuck in a local minimum.

Fig. 3 shows the training scheme of the FFSLNs. At each iteration, the input vectors taken from a set of normalized JPEG-encoded images are sequentially presented to the classifier and then fed into the corresponding FFSLN for training. Unlike the normalization in the classification process, normalization here is done by dividing all pixel values of an image by the maximum possible gray-level value. The input vectors are extracted in the same manner as in the classification scheme (as shown in Fig. 2). Specifically, they are taken horizontally and then vertically in a raster-scan fashion. The classified input vectors are fed into their corresponding FFSLNs in either normal or reverse order, depending on which boundary pixel is chosen as the desired output. The corresponding target output is simultaneously presented to the network, as shown in Fig. 3. Any vector that contains a cross-over region (as indicated in Fig. 2) is ignored in the training phase. The initial weights of the network connections are random values within the range (−0.25, 0.25), and all weights are updated at each epoch. Note that the training process can be carried out off-line. Moreover, once the training is done, no further training is required unless the nature of the input changes sharply.

Fig. 3. The configuration of the FFSLNs architecture and its corresponding training scheme.
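The training of one FFSLN can be sketched as below. This is an illustration under assumptions rather than the authors' code: the function names, the learning rate, the momentum coefficient, the number of epochs and the optional `w0` argument (a fixed starting point used here only to make the demonstration reproducible) are hypothetical, and the data are synthetic. The sketch follows the description above: nine inputs including the bias, a ramp activation clipped to [0, 1], gradient descent on the MSE with a momentum term, and one weight update per epoch.

```python
import numpy as np

def ramp(z):
    """Ramp activation with output range [0.0, 1.0]."""
    return np.clip(z, 0.0, 1.0)

def add_bias(inputs):
    """Append a constant 1 to each 8-pixel vector (8 pixels + bias = 9 inputs)."""
    inputs = np.asarray(inputs, dtype=float)
    return np.hstack([inputs, np.ones((len(inputs), 1))])

def train_ffsln(inputs, targets, epochs=500, lr=0.1, momentum=0.9, seed=0, w0=None):
    """Train a single-output, single-layer network by gradient descent with momentum."""
    rng = np.random.default_rng(seed)
    x = add_bias(inputs)
    if w0 is None:
        w = rng.uniform(-0.25, 0.25, size=x.shape[1])   # random start in (-0.25, 0.25)
    else:
        w = np.array(w0, dtype=float)                   # hypothetical override for demos
    velocity = np.zeros_like(w)
    for _ in range(epochs):
        net = x @ w
        err = ramp(net) - targets
        active = (net > 0.0) & (net < 1.0)    # ramp derivative: 1 inside, 0 when saturated
        grad = (err * active) @ x / len(x)    # gradient of MSE/2 with respect to the weights
        velocity = momentum * velocity - lr * grad
        w += velocity                         # one update of all weights per epoch
    return w

def restore(inputs, w):
    """Estimate the boundary pixel for each input vector."""
    return ramp(add_bias(inputs) @ w)

rng = np.random.default_rng(2)
x_demo = rng.uniform(0.0, 1.0, size=(1000, 8))   # synthetic normalized JPEG-side vectors
true_w = np.array([0.0, 0.0, 0.2, 0.6, 0.2, 0.0, 0.0, 0.0])
t_demo = x_demo @ true_w                         # synthetic "original" boundary pixels
w0 = np.full(9, 0.1)                             # fixed start inside (-0.25, 0.25)
w = train_ffsln(x_demo, t_demo, w0=w0)
mse_before = np.mean((restore(x_demo, w0) - t_demo) ** 2)
mse_after = np.mean((restore(x_demo, w) - t_demo) ** 2)
print(mse_before, mse_after)
```

In the full scheme, one such network would be trained per class, each on the vectors that the classifier assigns to that class.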

5. Performance evaluation

Nine standard 256-gray-level images of size 256×256 were used in our simulations. Among these images, ‘Lena’, ‘Hat’, ‘Peppers’, ‘Germany’ and ‘Girl’ were chosen as the training set for both the NID [10] and the proposed FFSLNs approaches. All images were transform-coded with the JPEG scheme. Table 1 shows the quantization table used in the simulations. We investigated the performance

Table 1
The quantization table used in JPEG encoding

 50   60   70   70   90  120  255  255
 60   60   70   96  130  255  255  255
 70   70   80  120  200  255  255  255
 70   96  120  145  255  255  255  255
 90  130  200  255  255  255  255  255
120  255  255  255  255  255  255  255
255  255  255  255  255  255  255  255
255  255  255  255  255  255  255  255

Table 2
The correlation coefficients of an original boundary pixel and the pixels of its corresponding JPEG-encoded vector

Position       0         1         2         3 (a)     4         5         6         7
Coefficient   −0.0866    0.0797    0.4868    0.7958    0.2493   −0.1704   −0.4450   −0.4698

(a) The corresponding position of the boundary pixel in the original image is identical to the position of the third pixel of the vector in the JPEG-encoded image.

Table 3
The experimental results for the images

ΔSNR (dB) and number of iterations (Iter.) of the existing approaches

JPEG-encoded image      NID [10]        CLS [11]        POCS_RZ [8]     POCS_RZ [8]     POCS_YGK [11]
Image           bpp     ΔSNR    Iter.   ΔSNR    Iter.   ΔSNR    Iter.   ΔSNR    Iter.   ΔSNR    Iter.
‘Lena’ (a)      0.26     1.40   -        0.14   13      −0.22    5      −0.48   10       0.36   16
‘Hat’ (a)       0.28     0.41   -        0.21   14       0.46    5       0.17   10       0.71   10
‘Peppers’ (a)   0.27     1.36   -        0.12   13      −0.26    5      −0.49   10       0.06    5
‘Germany’ (a)   0.20     0.45   -        0.11   13       0.11    5      −0.04   10       0.30   18
‘Girl’ (a)      0.20     0.69   -        0.15   13       0.20    5       0.02   10       0.53    4
‘Couple’        0.20    −0.45   -        0.12   13      −0.23    5      −0.44   10       0.19   12
‘House’         0.28    −2.64   -        0.15   12      −0.42    5      −0.71   10       0.01   20
‘Sailboat’      0.29    −1.73   -        0.10   14      −0.37    5      −0.59   10      −0.02   15
‘Airplane’      0.29    −2.03   -        0.12   12      −0.47    5      −0.75   10       0.16   20
‘Tiffany’       0.19    −1.05   -        0.10   12       0.17    5       0.07   10       0.23    9

The ΔSNR (dB) of the proposed FFSLNs approaches

JPEG-encoded image
Image           bpp     FFSLN_1   FFSLN_4   FFSLN_8   FFSLN_16
‘Lena’ (b)      0.26    0.526     0.570     0.587     0.644
‘Hat’ (b)       0.28    0.977     0.970     1.014     1.043
‘Peppers’ (b)   0.27    0.463     0.512     0.512     0.563
‘Germany’ (b)   0.20    0.442     0.483     0.497     0.500
‘Girl’ (b)      0.20    0.606     0.639     0.657     0.678
‘Couple’        0.20    0.535     0.561     0.575     0.571
‘House’         0.28    0.478     0.579     0.608     0.614
‘Sailboat’      0.29    0.317     0.344     0.347     0.366
‘Airplane’      0.29    0.297     0.382     0.419     0.421
‘Tiffany’       0.19    0.415     0.429     0.449     0.467

(a) Images that were used for training in the simulation of the NID approach.
(b) Images that were used for training in the simulation of the FFSLNs approaches.

of the approaches without and with classification. They are denoted as FFSLN_1, FFSLN_4, FFSLN_8 and FFSLN_16, respectively, where the subscript is the number of classes. Note that the scheme with one class is the case without classification. Table 2 shows the correlation coefficients between a boundary pixel of the original images and the pixels of its corresponding training vector extracted from the corresponding JPEG-encoded images. The statistics were obtained with the training set of images. They show that correlation remains among adjacent blocks and that it can be used to reconstruct the original boundary pixels.

During reconstruction, block boundary pixels outside the cross-over regions were processed first, followed by those inside the cross-over regions. The reconstruction was done simply by feeding the classified input vectors into their corresponding FFSLNs. The improvement in SNR (ΔSNR) is used as the objective figure of merit for performance evaluation. It is defined as

$$\Delta\mathrm{SNR} = 10 \log_{10} \frac{\|f_B - f_I\|^2}{\|f_R - f_I\|^2} \ \mathrm{dB}, \tag{3}$$

where $f_R$, $f_B$ and $f_I$ are the restored, the JPEG-encoded and the original images, respectively. Table 3 lists the ΔSNR performance of the various approaches. It shows that the proposed approaches outperform the other approaches, except the NID approach, in terms of ΔSNR. Moreover, the classification of input vectors improves the performance of the FFSLNs approach, although the marginal improvement diminishes as the number of classes increases. The NID approach works well for the training images, but its performance drops drastically for the test set. The reason is that the performance of the NID approach is very sensitive to the choice of the training set when the training set is of limited size, as in our simulation. In other words, the NID approach cannot guarantee stable performance with a small, arbitrarily selected training set. In contrast, the FFSLNs approaches provide promising results for both the training and the test sets under the same training conditions.

The proposed approaches also provide good subjective restoration results. Two images, namely ‘Hat’ and ‘Sailboat’, were chosen from the training and non-training sets, respectively, to illustrate the subjective performance of this approach. Fig. 4(a,b) show the original ‘Hat’ image and its JPEG-encoded version coded at 0.28 bit/pixel (bpp), respectively. Fig. 4(c-f) show the corresponding restoration results obtained with the FFSLN_8 approach and some other approaches.

Fig. 4. (a) Zoomed original ‘Hat’ image. (b) Zoomed JPEG-encoded ‘Hat’ image; compression rate = 0.28 bpp. (c) Zoomed ‘Hat’ image restored with the proposed FFSLN_8 approach. (d) Zoomed ‘Hat’ image restored with the NID approach [10]. (e) Zoomed ‘Hat’ image restored with the POCS_RZ approach [8]. (f) Zoomed ‘Hat’ image restored with the POCS_YGK approach [11].

Fig. 5. (a) Zoomed original ‘Sailboat’ image. (b) Zoomed JPEG-encoded ‘Sailboat’ image; compression rate = 0.29 bpp. (c) Zoomed ‘Sailboat’ image restored with the proposed FFSLN_8 approach. (d) Zoomed ‘Sailboat’ image restored with the NID approach [10]. (e) Zoomed ‘Sailboat’ image restored with the POCS_RZ approach [8]. (f) Zoomed ‘Sailboat’ image restored with the POCS_YGK approach [11].
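The ΔSNR criterion of Eq. (3) can be evaluated with a few lines of code. The sketch below is illustrative: the function name `delta_snr` and the tiny synthetic arrays are assumptions, not part of the original experiments.

```python
import numpy as np

def delta_snr(original, encoded, restored):
    """SNR improvement of Eq. (3), in dB, of the restored image over the encoded one."""
    original = np.asarray(original, dtype=float)
    err_encoded = np.sum((np.asarray(encoded, dtype=float) - original) ** 2)    # ||f_B - f_I||^2
    err_restored = np.sum((np.asarray(restored, dtype=float) - original) ** 2)  # ||f_R - f_I||^2
    return 10.0 * np.log10(err_encoded / err_restored)

f_i = np.zeros((4, 4))   # original image (synthetic)
f_b = f_i + 2.0          # encoded image: error of 2 gray levels per pixel
f_r = f_i + 1.0          # restored image: error of 1 gray level per pixel
print(delta_snr(f_i, f_b, f_r))   # halving the error amplitude gives 10*log10(4), about 6.02 dB
```

A positive value means that the restored image is closer to the original than the JPEG-encoded image is, and a negative value means that the post-processing increased the distortion.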

Fig. 5(a,b) show the original ‘Sailboat’ image and its JPEG-encoded version coded at 0.29 bpp, respectively. Fig. 5(c-f) show the corresponding images restored with the FFSLN_8 approach and some other approaches. Comparing Fig. 4(c) with Fig. 4(b) and Fig. 5(c) with Fig. 5(b), one can observe that the proposed approach removes most of the blocking effect. The proposed approach took a few milliseconds to restore a 256×256 image on a 200 MHz Pentium Pro computer system.

6. Conclusions

A novel JPEG-compatible approach is proposed for blocking effect reduction. This approach uses the correlation between the block boundaries and their neighbouring pixels to reduce the blocking effect. The use of neural network models also enables a real-time realization of this approach. Simulations showed that the proposed approach achieved better restoration performance than other well-known post-processing approaches [7,8,10,11] in terms of SNR improvement.

References

[1] S.C. Ahalt, A.K. Krishnamurthy, P. Chen, D.E. Melton, Competitive learning algorithms for vector quantization, Neural Networks 3 (1990) 277–291.
[2] Y. Linde, A. Buzo, R.M. Gray, An algorithm for vector quantizer design, IEEE Trans. Commun. 28 (January 1980) 84–95.
[3] J. Makhoul, S. Roucos, H. Gish, Vector quantization in speech coding, Proc. IEEE 73 (11) (1985) 1551–1558.
[4] H.S. Malvar, D.H. Staelin, The LOT: Transform coding without blocking effects, IEEE Trans. Acoust. Speech Signal Process. 37 (4) (April 1989) 553–559.
[5] N.M. Nasrabadi, R.A. King, Image coding using vector quantization: A review, IEEE Trans. Commun. 36 (8) (August 1988) 957–971.
[6] W.B. Pennebaker, J.L. Mitchell, JPEG Still Image Data Compression Standard, Van Nostrand Reinhold, New York, 1993.
[7] H.C. Reeve, J.S. Lim, Reduction of blocking effect in image coding, Opt. Engrg. 23 (1) (January/February 1984) 34–37.
[8] R. Rosenholtz, A. Zakhor, Iterative procedures for reduction of blocking effects in transform image coding, IEEE Trans. Circuits Systems for Video Technol. 2 (1) (March 1992) 91–95. Correction to ‘Iterative procedures for reduction of blocking effects in transform image coding’, 2 (3) (September 1992) 325.

[9] M.I. Sezan, An overview of convex projections theory and its applications to image recovery problems, Ultramicroscopy 40 (1992) 55–67.
[10] S.W. Wu, A. Gersho, Improved decoder for transform coding with application to the JPEG baseline system, IEEE Trans. Commun. 40 (2) (February 1992) 251–254.
[11] Y. Yang, N.P. Galatsanos, A.K. Katsaggelos, Regularized reconstruction to reduce blocking artifacts of block discrete cosine transform compressed images, IEEE Trans. Circuits Systems for Video Technol. 3 (6) (December 1993) 421–432.
[12] Y.Q. Zhang, R.L. Pickholtz, M.H. Loew, A new approach to reduce the blocking effect of transform coding, IEEE Trans. Commun. 41 (2) (February 1993) 299–302.