Signal Processing: Image Communication 27 (2012) 749–759
Contents lists available at SciVerse ScienceDirect
Signal Processing: Image Communication journal homepage: www.elsevier.com/locate/image
Low-complexity high-quality adaptive deblocking filter for H.264/AVC system
Shih-Chang Hsia (a, corresponding author), Wei-Chih Hsu (b), Sheng-Chieh Lee (b)
(a) National Yunlin University of Science and Technology, 123 University Road, Section 3, Douliou, Yunlin, Taiwan
(b) National Kaohsiung First University of Science and Technology, 1 University Road, Yuanchau, Kaohsiung 824, Taiwan
Article info
Article history: Received 25 July 2011; Accepted 23 April 2012; Available online 7 May 2012
Keywords: H.264; Video coding; Blocky effect; Deblocking filter; Adaptive filter

Abstract
Blocking artifacts often appear in the reconstructed image, particularly in low bit-rate video coding systems. This paper presents an adaptive offset method to improve image quality for H.264 decoding. First, histogram statistics are used to analyze the correlation between the offset and the filtering performance; the best filtering performance is found mostly at three offset positions. Second, the best offset is selected as the one with the minimum SAE (summation of absolute error) among the three candidates. The algorithm keeps the computation low while obtaining good filtering quality. On average, the result is about 0.25 to 0.45 dB (decibels) better than the original H.264 deblocking filter, and the blocky effect in the decoded image is visibly smoothed.
© 2012 Elsevier B.V. All rights reserved.
1. Introduction
Video coding is widely used to reduce the coding bit-rate in order to save channel bandwidth and disk space. To improve coding performance, the well-known video coding standard H.264/AVC [1] was developed. This standard adopts advanced coding technologies such as variable block sizes, quarter-pixel motion compensation, intra prediction, a 4×4 integer transform, and multiple reference frames. Its performance is significantly better than that of previous coding standards such as MPEG-2 and H.263: the bit-rate can be roughly halved at the same image quality. However, the blocky effect still appears in the decoded image. Eliminating blocky artifacts is an important issue for improving image quality in video coding
This work was supported by the National Science Council, Taiwan, under NSC96-2221-E-327-006-MY3. Corresponding author. E-mail addresses:
[email protected],
[email protected] (S.-C. Hsia).
0923-5965/$ - see front matter © 2012 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.image.2012.04.004
systems. The H.264/AVC standard employs an adaptive deblocking filter in the decoding loop. The loop filter must be implemented in both the encoder and the decoder so that both keep the same decoded pixels in the frame memory. In video coding, the image is divided into blocks for transformation and quantization. Coarse quantization of the transform coefficients causes visual discontinuities at the boundary between two adjacent blocks. This perceptible artifact is known as the blocking effect or blocking artifact, and it becomes a serious concern when the quantization scale is large under low bit-rate coding. In the H.264/AVC standard, the adaptive deblocking filter [2] is one of the key components for reducing the blocking effect and improving objective and subjective image quality. Recently, several authors [3–6] have proposed reducing the complexity of the adaptive deblocking filter for hardware implementation, but few studies consider improving the filtering quality of the algorithm. In this paper, a deblocking filter algorithm for the H.264/AVC standard is proposed that uses an adaptive offset value. It achieves better objective and subjective quality than the original
S.-C. Hsia et al. / Signal Processing: Image Communication 27 (2012) 749–759
H.264 deblocking filter with a fixed offset [7]. The algorithm quickly finds a suitable offset parameter to improve the coding quality, and its search is faster than the competing adaptive-offset method [8]. Furthermore, the method is compatible with the H.264/AVC standard. This paper is organized as follows. Related work on the current deblocking filter algorithm is reviewed in Section 2. The proposed algorithm is presented in Section 3. The simulations are shown in Section 4. Finally, conclusions are drawn in Section 5.
2. Review of H.264/AVC adaptive deblocking filter
The deblocking filter in the H.264/AVC system uses the boundary strength (BS) and the quantization parameter (QP) to determine the filtering strength. The BS is estimated from the intra- or inter-coding information and the motion vector difference, as shown in Table 1, and takes an integer value from 0 to 4. BS = 4 represents the strongest filtering mode, whereas the filtering operation is skipped at BS = 0. According to the BS value, the filtering is separated into two modes: a common mode (0 < BS < 4) and a special mode (BS = 4). Once the BS and QP values have been decided, the parameters α and β can be found. They determine the filtering threshold for each block boundary and can be calculated from the QP value as

$$\alpha = 0.8\,(2^{QP/6} - 1), \tag{1}$$

$$\beta = 0.5\,QP - 7. \tag{2}$$

Table 1. Filter strength parameter.

| Block modes and conditions | BS |
|---|---|
| One of the blocks is Intra and the edge is a macroblock edge | 4 |
| One of the blocks is Intra | 3 |
| One of the blocks has coded residuals | 2 |
| Difference of block motion ≥ 1 luma sample distance | 1 |
| Motion compensation from different reference frames | 1 |
| Else | 0 |

If QP is a fixed value, α and β are not adaptive. The H.264/AVC standard therefore defines two parameters, offsetA and offsetB, to adjust the filtering strength. They are integer values between −12 and 12 (step of 2), so each offset has 13 possible values. For analysis, the offset range is condensed to −6 to 6 (step of 1), and the final offset value is multiplied by 2 for the deblocking filter. In the reference software JM [7], the offset values can be set manually before encoding. The two offsets adjust the variables indexA and indexB, which are derived by

$$\mathrm{IndexA} = \mathrm{Min}(\mathrm{Max}(0,\ QP + \mathrm{offsetA}),\ 51), \tag{3}$$

$$\mathrm{IndexB} = \mathrm{Min}(\mathrm{Max}(0,\ QP + \mathrm{offsetB}),\ 51). \tag{4}$$

The α and β variables are then modified with indexA and indexB as

$$\alpha = 0.8\,(2^{\mathrm{IndexA}/6} - 1), \tag{5}$$

$$\beta = 0.5\,\mathrm{IndexB} - 7. \tag{6}$$

If offsetA and offsetB are negative, α and β become smaller, which reduces the filtering strength. This case suits high-resolution images, where blurring should be avoided. Conversely, larger offsets strengthen the filter to reduce serious blocking artifacts in a low bit-rate system.

3. Proposed algorithm
From the above, the filtering operation relies on the coding parameters. Although H.264 defines the two variables offsetA and offsetB, in the JM reference software [7] they default to zero or are manually set to a fixed value before the video sequences are encoded. Therefore, offsetA and offsetB are not adapted for each coded block; in other words, they cannot follow the features of the current image. To solve this problem, an adaptive method is proposed that automatically decides a suitable offset value to improve the filtering quality. First, histogram statistics are employed to analyze the probability that each offset gives the best filtering. Next, the offsets with higher probability in the histogram are selected as candidates. Then, the best offset is chosen from the candidates for each block.

3.1. Analysis with histogram statistic
The histogram statistical approach is used to analyze the correlation between the offset value and the filtering performance. It gives the distribution of the best offset over all offsets. One can exhaustively search all offset values to find the best PSNR (peak signal-to-noise ratio), which corresponds to the minimum MSE (mean square error). The evaluated offsets are condensed to the range −6 to 6 to meet the H.264 specification. Fig. 1 illustrates the flowchart for building the histogram of the offsets. Initially, the offset is set to −6 and a variable RMSE (Root Mean Square Error) is set to its maximum value. The RMSE variable records the minimum MSE, which corresponds to the best offset. Then the H.264 deblocking filter processes the current decoded image with the evaluated offset. The MSE is computed between the original block and its filtered block as

$$\mathrm{MSE} = \frac{1}{N \times N} \sum_{i,j=0}^{N-1} \left( f_{ij} - \hat{f}_{ij} \right)^2, \tag{7}$$

where N × N is the block size, and f_ij and f̂_ij are the original image and the filtered image, respectively.
If the current block MSE is less than the previously recorded RMSE value, the RMSE is updated with the current MSE and the offset is recorded. When the current offset value is less than 6, the offset is increased by one for the next offset estimation, and the filtering procedure is performed again. By repeating this procedure, the best offset with the minimum MSE is found among the 13 offsets evaluated at the kernel of the deblocking filter. The evaluation of one block finishes once offset = 6 has been processed. Each block has only one offset for the best filtering result. Finally, the histogram entry of the best offset is increased by one. The next block is then processed with the same procedure, and its result is accumulated in the histogram table. Once all blocks are finished, the histogram table records how many times each offset from −6 to 6 gave the best filtering result, which estimates the probability that each offset yields the best quality. The histogram statistic is therefore useful for identifying which offsets dominate the image filtering performance.

Fig. 1. The flowchart of offset histogram statistic for filtering performance analysis.

An experiment was performed to analyze how the quality changes under different offset values. The test images were selected to have different features. To cover a wide variety of video features, 20 standard benchmark video sequences were evaluated. The H.264 coding kernel and the deblocking filter are those of JM17.0, and the proposed algorithm is embedded in its deblocking filter loop. For each sequence, the histogram table records the offsets that give the best filtering result with the minimum MSE, accumulated over 30 frames. The results for 6 test sequences are shown in Fig. 2(a)–(f). An important finding is that most of the best offsets are located at the values −6, 6, and 0. These three offsets together account for about 90 to 95 percent of the best selections. In other words, the offsets −6, 6, and 0 dominate the filtering quality. The same behavior is observed across the 20 test sequences.
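The exhaustive search and histogram accumulation described in this subsection can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the `deblock` argument is a hypothetical callable standing in for the H.264 loop filter (it is not the JM API), blocks are plain nested lists, and ties are resolved in favor of the first offset tried, which the text does not specify.

```python
from collections import Counter

OFFSETS = range(-6, 7)  # condensed offset range (-6..6) used in the text


def block_mse(original, filtered):
    """Mean square error between two equally sized N x N blocks (Eq. (7))."""
    n = len(original)
    total = sum(
        (original[i][j] - filtered[i][j]) ** 2
        for i in range(n)
        for j in range(n)
    )
    return total / (n * n)


def best_offset(original, decoded, deblock):
    """Try all 13 offsets and return the one with the minimum MSE.

    `deblock(decoded, offset)` is a hypothetical stand-in for the loop filter.
    """
    best, best_mse = None, float("inf")  # "RMSE = Max" in Fig. 1
    for offset in OFFSETS:
        mse = block_mse(original, deblock(decoded, offset))
        if mse < best_mse:  # strict '<': the first offset wins on ties
            best, best_mse = offset, mse
    return best


def offset_histogram(blocks, deblock):
    """Count how often each offset is the best one over all blocks."""
    hist = Counter({offset: 0 for offset in OFFSETS})
    for original, decoded in blocks:
        hist[best_offset(original, decoded, deblock)] += 1
    return hist
```

A real experiment would pass the actual deblocking filter as `deblock`; any toy function with the same signature is enough to exercise the bookkeeping.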
3.2. Theoretical description
The H.264/AVC standard defines various filtering conditions that depend on the differences of the boundary pixels for the special-mode and common-mode processing [1]. The loop filter is controlled by α and β through (5) and (6). The blocky effect occurs along the boundary of two successive blocks in the spatial domain. The strength of the blocky effect (SBE) can be estimated by summing the boundary differences in the vertical and horizontal directions, SBE = SBE_V + SBE_H [9]. The vertical SBE and the horizontal SBE on the boundary of the (m,n)th block are defined by

$$\mathrm{SBE}_V^{(m,n)} = \sum_{j=0}^{N-1} \left| \hat{f}_{j,0}^{(m,n)} - \hat{f}_{j,N-1}^{(m,n)} \right|$$

and

$$\mathrm{SBE}_H^{(m,n)} = \sum_{i=0}^{N-1} \left| \hat{f}_{0,i}^{(m,n)} - \hat{f}_{N-1,i}^{(m,n)} \right|, \tag{8}$$

respectively. The relative positions are shown in Fig. 3. A high SBE denotes either a strong blocky effect or an existing image edge. In that case the offset can be set high, which enhances the filtering power through (3) and (4) and reduces the blocky effect in the H.264 loop filter.

Fig. 2. (a)–(f) The histogram statistic for the best offset with the "Carphone", "Foreman", "Mobile", "Suzie", "Weather", and "Park_joy" sequences, respectively. (Horizontal axis: offset value, −6 to 6; vertical axis: times. Panel labels: Carphone, Foreman, Mobile, and Suzie at QP = 30 with 30 QCIF frames; Weather 720×480 with 30 frames; Park_joy 1080P (HD) with 30 frames.)

Fig. 3. Blocky strength at horizontal and vertical boundary.

However, the H.264/AVC standard limits the offset range to −6 to +6. When the SBE is very strong, the filter still does not have enough power to smooth the discontinuous boundary even with the maximum offset of 6, and the performance might improve further if the offset were larger than 6. In the histogram analysis, when the best offset (BO) would be over 6, the count is in most cases added at offset = 6 instead. Hence

$$\begin{cases} \text{if } BO > 6, & \mathrm{Hist}_{6} \cong \mathrm{Hist}_{6} + 1; \\ \text{if } BO < -6, & \mathrm{Hist}_{-6} \cong \mathrm{Hist}_{-6} + 1, \end{cases} \tag{9}$$

where Hist_n denotes the histogram count of the best offset n. Therefore, Hist_6 ≅ Σ_{k≥6} Hist_k: all best-filtering cases whose offset would exceed 6 contribute to Hist_6. On the other hand, the filter must weaken its filtering when the SBE is low. However, the minimum offset is −6 in the H.264 standard, so Hist_{−6} is increased by one when the best offset would be less than −6, which gives Hist_{−6} ≅ Σ_{k≤−6} Hist_k. Hence we have

$$\mathrm{Hist}_{6} = \mathrm{Hist}_{6} + \mathrm{Hist}_{7} + \mathrm{Hist}_{8} + \cdots + \mathrm{Hist}_{P}, \qquad \mathrm{Hist}_{-6} = \mathrm{Hist}_{-6} + \mathrm{Hist}_{-7} + \mathrm{Hist}_{-8} + \cdots + \mathrm{Hist}_{-P}. \tag{10}$$
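The saturation argument of Eqs. (9) and (10) can be illustrated with a small Python sketch. It assumes a hypothetical list of "unconstrained" best offsets (values a filter might prefer if the range were not limited); the numbers in the usage example are made up for illustration and are not measurements from the paper.

```python
from collections import Counter


def clamped_histogram(unconstrained_best_offsets, limit=6):
    """Accumulate best offsets after clamping them to [-limit, +limit].

    Every offset beyond the allowed range lands in the bin at +limit or
    -limit, which is the mechanism behind the two outer histogram peaks.
    """
    hist = Counter()
    for bo in unconstrained_best_offsets:
        clamped = max(-limit, min(limit, bo))
        hist[clamped] += 1
    return hist


# Illustrative input only: 8 and 7 fold into bin 6, and -9 folds into bin -6.
example = clamped_histogram([8, 7, 6, 0, -9, -6, 2])
```

In this toy input, the bin at 6 collects three entries and the bin at −6 collects two, even though only one block actually preferred each boundary value.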
Clearly, the offsets −6 and 6 form peaks in the histogram, which is verified by the experiments in Fig. 2(a)–(e). The other peak is at the zero offset. The JM program designed the loop filter to maximize the filtering quality at the central point, which is zero since the offset ranges from −6 to +6. Because the zero offset is located at a peak in the histogram, the loop filter achieves good deblocking efficiency even without the adaptive offset. Thus, the JM program employs the zero offset as the default for its deblocking filter.

3.3. The selection of offset
Based on this feature, an efficient deblocking algorithm is presented to increase the filtering quality in the decoding loop. The deblocking filter processes each pixel in the H.264 encoding and decoding system, so the computational complexity is a key concern. From the histogram analysis, when the proposed algorithm selects the offsets −6, 0, and 6 as the candidates, the probability of obtaining the best filtering performance is about 90 percent. This approach achieves a near-optimal result while adding only a few computations. Based on this feature, the detailed flowchart of the proposed deblocking algorithm
is illustrated in Fig. 4 for parallel processing. The parallel processing flow is suitable for VLSI (very large scale integration) implementation or multi-processor systems. For each 4×4 block, the best offset is found at the kernel of the loop filter among the three offset values 0, 6, and −6. According to Eqs. (3) and (4), the variables indexA and indexB are obtained for each offset. Then, α and β are calculated with Eqs. (5) and (6) or by table mapping. The symbols α6 (β6), α0 (β0), and α−6 (β−6) denote the values derived from the offset values 6, 0, and −6, respectively. Once the values α6, α0, α−6, β6, β0, and β−6 are given, the filtering coefficients are obtained from each pair of α and β together with the BS. The boundary pixels are then filtered with the adaptive deblocking filter, and the mean square error (MSE) is computed from the difference between the filtered pixels and the original ones. Since this calculation must run in real time for each pixel, the MSE operator is simplified to save computational cost: the distortion measure is replaced by the summation of absolute error (SAE) for the (m,n)th block, given by

$$\mathrm{SAE}^{(m,n)} = \sum_{i,j=0}^{N-1} \left| f_{i,j}^{(m,n)} - \hat{f}_{i,j}^{(m,n)} \right|. \tag{11}$$
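The first stage of the flow described above, deriving one (α, β) pair per candidate offset, can be sketched in Python as follows. This is a rough sketch under stated assumptions: it uses the closed-form expressions of Eqs. (3) to (6) rather than the standard's lookup tables (which are not reproduced here), it sets offsetA equal to offsetB, and it treats the candidates −6, 0, and 6 as condensed offsets that are multiplied by 2 before use. The function names are illustrative, not part of JM.

```python
CANDIDATE_OFFSETS = (-6, 0, 6)  # condensed offsets; the filter uses 2x the value


def clip_index(qp, offset):
    """Eqs. (3)/(4): clip QP + offset to the range 0..51."""
    return min(max(0, qp + offset), 51)


def thresholds(qp, offset_a, offset_b):
    """Eqs. (5)/(6): closed-form alpha and beta from indexA and indexB."""
    index_a = clip_index(qp, offset_a)
    index_b = clip_index(qp, offset_b)
    alpha = 0.8 * (2 ** (index_a / 6) - 1)
    beta = 0.5 * index_b - 7
    return alpha, beta


def candidate_thresholds(qp):
    """One (alpha, beta) pair per candidate, assuming offsetA == offsetB."""
    return {c: thresholds(qp, 2 * c, 2 * c) for c in CANDIDATE_OFFSETS}
```

For example, at QP = 30 the three candidates give indexA = indexB = 18, 30, and 42, so α grows from about 5.6 to about 101.6 as the offset moves from −6 to 6, which shows how strongly the offset changes the filtering threshold.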
The SAE approach reduces the computational complexity because it needs only additions and subtractions. The best offset is then selected as the one with the minimum SAE value among the three offsets. Finally, the filtering output corresponding to the best offset is selected. The filtering operates on 4×4 blocks after the H.264 video decoding. Fig. 5 depicts the symbols of the processed pixels at the horizontal boundary. After the filtering operation, the pixels (p0, p1, p2, p3 and q0, q1, q2, q3) at the vertical or horizontal boundary are updated. Fig. 6 shows two adjacent blocks in the horizontal direction. The pixels p0 and q0 are adjacent on the block boundary, so the strength of the blocky effect there is relatively high. A simple SAE measure based on (11) is employed to calculate the error level of the block boundary. To reduce computational complexity, the SAE value is computed with the four boundary pixels (p1, p0, q0 and q1) in each row. Since there are 4 rows in the adjacent 4×4 blocks, 8 boundary pixels are used to compute SAE. The SAE value for the vertical block boundary is calculated in a similar way. The coded blocks are processed in left-to-right and top-to-bottom scanning order. The block in the previous row and the left block have already been filtered, so their pixels are available for SAE computing. However, the boundary pixels on the bottom and right of the block have not been filtered yet, so these pixels are not available for evaluation. The positions of the relative blocks for SAE computing are shown in Fig. 7. For offset = 0, the horizontal and vertical SAE are obtained and denoted SAE^0_H and SAE^0_V, respectively. The estimate for offset = 0 is the sum of the horizontal and vertical SAE:

$$\mathrm{SAE}_{\mathrm{offset}=0} = \mathrm{SAE}^{0}_{H} + \mathrm{SAE}^{0}_{V}. \tag{11}$$
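As a hedged illustration of the boundary SAE and the minimum-SAE selection stated above, the following Python sketch sums the horizontal and vertical SAE for each candidate offset and picks the smallest. The data layout is an assumption made for the example: each boundary is a list of rows holding the pixels (p1, p0, q0, q1), and `filtered_by_offset` maps each candidate offset to its already filtered horizontal and vertical boundary pixels. Tie-breaking (first candidate wins) is also an assumption.

```python
def boundary_sae(original_rows, filtered_rows):
    """Sum of absolute errors over the boundary pixels of one block edge."""
    return sum(
        abs(o - f)
        for orig_row, filt_row in zip(original_rows, filtered_rows)
        for o, f in zip(orig_row, filt_row)
    )


def select_best_offset(original_h, original_v, filtered_by_offset):
    """Return (best_offset, sae_per_offset).

    filtered_by_offset: dict mapping offset -> (filtered_h, filtered_v),
    a hypothetical container for the outputs of the three filter branches.
    """
    sae = {
        offset: boundary_sae(original_h, filt_h) + boundary_sae(original_v, filt_v)
        for offset, (filt_h, filt_v) in filtered_by_offset.items()
    }
    best = min(sae, key=sae.get)  # first candidate wins on ties
    return best, sae
```

Because the three branches are independent until the final comparison, the same structure maps naturally onto parallel hardware or threads, with only the `min` step acting as a synchronization point.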
Fig. 4. The flowchart of the proposed algorithm for parallel processing. (Blocks in the flowchart: the non-filtered block, QP, and the offsets −6, 0, and 6 enter the indexA and indexB generator; three parallel branches each perform α and β table mapping, BS-controlled filtering (Filtering-1 to Filtering-3), and SAE computing against the original pixels for offset = −6, 0, and 6; the SAE values are compared to find the minimum, which gives the best offset and selects the corresponding filtering result.)

Fig. 5. The 4×4 block edges at horizontal boundary.
In the same way, SAE_{offset=−6} and SAE_{offset=6} are obtained for offset = −6 and offset = 6, respectively. The best offset (BO) is selected with the minimum SAE as

$$BO = S\left[\mathrm{Min}\left(\mathrm{SAE}_{\mathrm{offset}=-6},\ \mathrm{SAE}_{\mathrm{offset}=0},\ \mathrm{SAE}_{\mathrm{offset}=6}\right)\right], \tag{12}$$

where Min(·) is the function that finds the minimum value among its arguments, and S[m] maps the value m to its offset.

Fig. 6. Calculation of SAE with adjacent 4×4 blocks (gray part).

3.4. Complexity analysis
In the H.264 deblocking filter, offsetA and offsetB are used to adjust the filtering quality. Since offsetA and offsetB each have 13 values, there are 13 × 13 = 169 possible combinations. Intuitively, the BO can be found by an exhaustive search in which each offset is tested through the loop filter, which means 169 filtering passes to find the best performance. The complexity of this search is too high for a real-time deblocking filter. In this study, offsetA and offsetB are set to the same value. The offsets −6, 0, and 6 account for over 90 percent of the best filtering results in the histogram. Thus, only three offsets need to be searched rather than 169. The filtering performance is close to that of the exhaustive search, yet the complexity is greatly reduced. In the H.264/AVC deblocking filter, the maximum number of additions is 14 for filtering the horizontal or vertical pixels in the special mode, so one block requires 28 additions in the worst case. Each offset parameter must be evaluated through the filtering operation, and
Fig. 7. The position of relative blocks for filtering. (Labels in the figure: boundaries shared with blocks that have been filtered are available; boundaries shared with blocks that have not been filtered yet are not available; C marks the current filtering block.)
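Table 2. PSNR improvement (dB) compared with the no-filter case.

| Video | Filter method | QP = 25 | QP = 30 | QP = 35 | QP = 40 |
|---|---|---|---|---|---|
| Salesman (QCIF) | JM filter | 0.10 | 0.13 | 0.16 | 0.17 |
| Salesman (QCIF) | Proposed | 0.36 | 0.49 | 0.51 | 0.45 |
| Carphone (QCIF) | JM filter | 0.13 | 0.29 | 0.39 | 0.40 |
| Carphone (QCIF) | Proposed | 0.39 | 0.64 | 0.73 | 0.70 |
| Foreman (QCIF) | JM filter | 0.05 | 0.15 | 0.14 | 0.30 |
| Foreman (QCIF) | Proposed | 0.37 | 0.49 | 0.52 | 0.54 |
| Suzie (QCIF) | JM filter | 0.09 | 0.19 | 0.32 | 0.44 |
| Suzie (QCIF) | Proposed | 0.36 | 0.54 | 0.63 | 0.68 |
| Coastguard (CIF) | JM filter | 0.02 | 0.01 | 0.05 | 0.11 |
| Coastguard (CIF) | Proposed | 0.18 | 0.32 | 0.36 | 0.34 |
| Mobile (CIF) | JM filter | 0.02 | 0.03 | 0.05 | 0.06 |
| Mobile (CIF) | Proposed | 0.18 | 0.34 | 0.27 | 0.28 |
| Football (a) (CIF) | JM filter | 0.09 | 0.08 | 0.08 | 0.07 |
| Football (a) (CIF) | Proposed | 0.24 | 0.29 | 0.21 | 0.18 |
| Soccer (a) (4CIF) | JM filter | 0.15 | 0.21 | 0.24 | 0.19 |
| Soccer (a) (4CIF) | Proposed | 0.37 | 0.46 | 0.51 | 0.45 |
| Harbour (a) (4CIF) | JM filter | 0.09 | 0.18 | 0.17 | 0.15 |
| Harbour (a) (4CIF) | Proposed | 0.37 | 0.42 | 0.40 | 0.39 |
| City (a) (4CIF) | JM filter | 0.12 | 0.19 | 0.15 | 0.12 |
| City (a) (4CIF) | Proposed | 0.36 | 0.39 | 0.41 | 0.37 |
| Ice (a) (4CIF) | JM filter | 0.15 | 0.23 | 0.25 | 0.26 |
| Ice (a) (4CIF) | Proposed | 0.39 | 0.48 | 0.49 | 0.53 |
| Ducks_Take_Off (a) (1080P) | JM filter | 0.12 | 0.22 | 0.24 | 0.17 |
| Ducks_Take_Off (a) (1080P) | Proposed | 0.36 | 0.51 | 0.52 | 0.48 |
| Crowd_Run (a) (1080P) | JM filter | 0.18 | 0.23 | 0.25 | 0.26 |
| Crowd_Run (a) (1080P) | Proposed | 0.39 | 0.49 | 0.51 | 0.54 |

(a) No training for pre-analysis sequences.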
the result is estimated with (11) to find the minimum distortion. The distortion measurement needs 15 subtractions, which can be implemented with additions and two's complement, for filtering one 4×4 block. Hence, 28 + 15 = 43 additions are required to evaluate one offset parameter. In total, the full search method uses 169 × 43 = 7267 additions to find the best filtering quality. This computational cost is too high for real-time applications. The fast algorithm with the PDS method [8] needs to search 16 points on average, which requires 16 × 43 = 688 additions. With the proposed method, the number of additions is reduced to only 3 × 43 = 129. Clearly, the proposed algorithm greatly reduces the number of operations.

A key concern is that the deblocking filter needs to be compatible with the H.264 decoding system. For adaptive offset filtering, the information about the offset must be sent to the decoder, so extra overhead bits are required. Since the algorithm selects among three offsets, two bits suffice to represent the offset information of each block. In contrast, the full search and PDS [8] methods require 8 bits to cover the range from +6 to −6 for offsetA and offsetB. For compatibility with the H.264 format, the extra bits can be hidden in the appendix (extra information) of the frame layer or the slice layer during encoding. The appendix is not needed for normal decoding, so a decoder can play the video normally without the enhanced deblocking quality. When the high-performance deblocking filter is implemented in the decoder, the appendix is extracted to obtain the offset for each block, and the image quality is improved by the deblocking filter. To reduce the encoding overhead, the extra bits for the offset information can be further compressed with a differential chain coding method [10], since adjacent blocks tend to have the same offset.

4. Experimental results
To evaluate the filtering performance, the proposed algorithm is embedded in the reference software JM 17.0 for the H.264/AVC system [7]. The quantization level QP dominates the strength of the blocky effect. When QP is less than 25, the blocky artifact is not obvious. Hence, four different QPs (25, 30, 35, and 40) are adopted for testing each sequence. Table 2 shows the PSNR improvement compared with the no-filtering case for video sequences with various features under different QPs. The results cover six sequences with pre-histogram analysis and seven non-training sequences without histogram statistics, where only I-frames are evaluated. On average, the PSNR of the proposed algorithm is 0.25 to 0.45 dB higher than that of the JM deblocking filter [2] with zero offset. The curves of the average PSNR under various QPs are shown in Fig. 8(a) and (b) for the "Suzie" and "Carphone" sequences, respectively. The proposed adaptive filter achieves better quality than the traditional deblocking filter with zero offset in each sequence. The PSNR of each frame is shown in Fig. 9(a) and (b) for the "Suzie" and "Carphone" sequences, respectively. In addition, Fig. 10(a) and (b) show the RD (rate-distortion) curves, which plot the PSNR against the coding bit-rate as the bit-rate is changed by using various QP values. The proposed approach gives better RD performance when the coding bit-rate is between 50 and 800 kbit/s. When the bit-rate is over 800 kbit/s, the filtering performance improves only slightly, since the blocky effect is not obvious in high bit-rate coding.

Fig. 8. (a) and (b) PSNR values under various QP, for "Suzie" and "Carphone" sequence, respectively. (Horizontal axis: QP value; vertical axis: PSNR; curves: No Filter, JM Filter, Proposed.)

Fig. 9. (a) and (b) PSNR value in each frame, for "Suzie" and "Carphone" sequence, under QP = 30. (Horizontal axis: the number of frame (100 frames); vertical axis: PSNR; curves: No Filter, JM Filter, Proposed.)

Fig. 10. (a) and (b) RD curve measured from "Suzie" and "Carphone" sequence, under various QP values. (Horizontal axis: bit-rate (Kbit/s); vertical axis: PSNR; curves: No Filter, JM Filter, Proposed.)

Now one can focus on the evaluation of subjective picture quality. Fig. 11(a)–(d) show the visual qualities of the original image, coding without filtering, the JM
Fig. 11. (a) Original image. Reconstructed I-frame with (b) no filter, (c) the adaptive deblocking filter (offset = 0), and (d) the proposed method, for the "Suzie" sequence (QP = 35). (e) and (f) are enlarged local images of (c) and (d), respectively.
adaptive deblocking filter (offset = 0), and the proposed algorithm, respectively. The proposed algorithm achieves a visually better filtering result. To inspect the image more closely, a local region is enlarged: Fig. 11(e) and (f) show the enlarged images of Fig. 11(c) and (d), respectively. The blocky effect is clearly removed in Fig. 11(f) (see the circle mark). Moreover, the filtering results for the "Carphone" sequence are shown in Fig. 12(a)–(f). The proposed algorithm successfully removes the blocking effects and improves the picture quality. For comparison, reference [8], a deblocking algorithm that also adjusts the offset value, is selected. That paper presents the PDS approach to find a better offset. The number of search points ranges from 12 to 20 with four steps. The original full search requires checking 13 × 13 = 169 points, since the range of each offset is from −6 to +6. The searching complexity of the PDS method is
Fig. 12. (a) Original image. Reconstructed I-frame with (b) no filter, (c) the adaptive deblocking filter (offset = 0), and (d) the proposed method, for the "Carphone" sequence (QP = 35). (e) and (f) are enlarged local images of (c) and (d), respectively.
Table 3. Performance improvement comparisons with various methods.

| | Original JM 11 | Paper [8] | Paper [8] | Proposed |
|---|---|---|---|---|
| Algorithm | Fixed offset | Full search | Predicted diamond search | Histogram statistic |
| Extra bits for offset information | 0 | 8 | 8 | 2 |
| Check points | 1 | 169 | 12–20 | 3 |
| No. of additions | 28 | 7267 | 688 | 129 |
| Computational time per frame (ms) | 1.2 | 10.3 | 3.8 | 1.8 |
| Complexity (a) | – | 1 | 13%–7% | 2% |
| PSNR improvement (dB) (b) | – | 0.35–0.6 | 0.28–0.5 | 0.25–0.45 |

(a) Comparison with full search for all offsets.
(b) Comparison with offsetA = 0; offsetB = 0.
about 7 to 12 percent of that of the full search method. The results are listed in Table 3. The proposed algorithm uses histogram statistics to find the offset. The analysis shows three peaks in the histogram: the best offset mostly appears at the points −6, 0, and +6. Only these three peak points are checked, with a simplified SAE method, to reduce the computational complexity. The extra offset information for decoder filtering is 2 bits, whereas the PDS method [8] requires 8 bits since each offset needs 4 bits to represent 13 values. The computational complexity in this study is only 1/4–1/7 of that of [8], and only about 2 percent of that of the full search method. The number of search points directly affects the CPU time. The results show that the proposed algorithm uses about 1/2 and 1/5 of the CPU time of the PDS method and the full search, respectively. Next, the improvement in filtering quality is examined with the PSNR of the decoded image. The proposed algorithm improves the PSNR by about 0.25 to 0.45 dB on average compared with the fixed zero-offset technique. The PSNR performance is close to that of [8], and is about 0.1 to 0.2 dB lower than that of the full search method. In the objective measurement, the blocky effect is efficiently reduced in the displayed decoded image. Therefore, the proposed algorithm achieves better quality with low complexity for a low-power and low bit-rate H.264/AVC system.

5. Conclusions
In this paper, an efficient algorithm with an adaptive offset value is proposed to reduce the blocking artifacts in the reconstructed image. Histogram statistics show that three offsets account for most of the best filtering results. Using this feature, the adaptive algorithm improves the image quality at only a small extra computational cost. The algorithm automatically selects a suitable offset to remove the blocky effect on
the blocking boundary. In the objective evaluation, the PSNR of the proposed algorithm is 0.25 to 0.45 dB higher than that of the JM loop filter with a fixed offset for the various sequences tested. In subjective observation, most of the blocking effects are removed and the image quality is significantly improved. With low complexity and high filtering quality, the proposed algorithm is suitable for VLSI implementation of a low bit-rate H.264/AVC system.

References
[1] G. Cote, ITU-T Draft Recommendation and Final Draft International Standard of Joint Video Specification (ITU-T Rec. H.264/ISO/IEC 14496-10 AVC), JVT-G050, Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG (2003).
[2] P. List, A. Joch, J. Lainema, G. Bjøntegaard, M. Karczewicz, Adaptive deblocking filter, IEEE Transactions on Circuits and Systems for Video Technology 13 (7) (July 2003) 614–619.
[3] Y.H. Lim, K.Y. Min, J.W. Chong, An efficient architecture of deblocking filter with high frame rate for H.264/AVC, Signal Processing: Image Communication 21 (7) (March 2006) 599–607.
[4] K.H. Lam, Reduced complexity deblocking filter for H.264 video coding, Signals, Systems and Computers, The 39th Asilomar Conference, Nov. 2005, pp. 1372–1374.
[5] H. Chen, R. Hu, Y. Gao, An effective method of deblocking filter for H.264/AVC, IEEE International Symposium on Communications and Information Technologies (Oct. 2007) 1092–1095.
[6] H. Yadav, K.P. Rao, Optimization of the deblocking filter in H.264 codec for real time implementation, IEEE International Symposium on Communications and Information Technologies, Oct. 18, 2006, pp. 932–936.
[7] http://iphome.hhi.de/suchring/tml/download/old_jm/jm17.zip.
[8] S.H. Wang, S.W. Wang, Y.C. Huang, Y.S. Tung, J.L. Wu, Boundary-energy sensitive visual deblocking for H.264/AVC coder, SPIE Proceedings of the Applications of Digital Image Processing XXVII, Denver, CO, Aug. 2004.
[9] S.C. Hsia, B.D. Liu, J.F. Yang, Efficient postprocessor for blocky effect removal based on transform characteristics, IEEE Transactions on Circuits and Systems for Video Technology 7 (6) (Dec. 1997) 924–929.
[10] Y.T. Hwang, Y.C. Wang, S.S. Wang, An efficient shape coding scheme and its codec design, IEEE Workshop on Signal Processing Systems (Sept. 2001) 225–232.