Disparity estimation based view interpolation with a simplified prediction structure for bandwidth-limited multiview video coding


The Journal of China Universities of Posts and Telecommunications December 2010, 17(Suppl. 2): 5–9 www.sciencedirect.com/science/journal/10058885

http://www.jcupt.com

WEI Fang, CHENG Fang-min, LU Xiao-han

Digital Media Lab., School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China

Abstract

High bandwidth demand has long been a bottleneck in multiview video technology. This paper proposes an algorithm that estimates the disparity between different viewpoints and then uses the estimated disparity map to interpolate virtual views at arbitrary intermediate viewpoints. For bandwidth-limited multiview applications, some real views can be skipped at the encoder and synthesized automatically at the decoder, so the required bandwidth can be reduced dramatically. Moreover, a simplified prediction structure for this view interpolation algorithm is presented. The experimental results show that the proposed algorithm can reduce the video bitrate by approximately 15%, while the subjective video quality is almost lossless, which is sufficient for some bandwidth-limited multiview applications such as wireless video surveillance.

Keywords disparity estimation, multiview video coding (MVC), view interpolation, prediction structure

1 Introduction

Many video applications, such as free viewpoint TV (FTV) and video surveillance, need a multitude of viewpoints to improve the user experience or enhance safety. Sometimes the number of viewpoints reaches the hundreds. For example, surveillance systems that work throughout the day impose tremendous pressure on both bandwidth and storage space. Hence, compression efficiency is considered the most critical problem for multiview video today.

Many researchers have proposed inter-view prediction algorithms to reduce inter-view redundancy and thereby improve coding efficiency. Xing San [1] presented a geometric prediction algorithm. Its accuracy is good at small camera distances but decreases sharply as the camera distance increases. Both Shimizu [2] and Taguchi [3] proposed prediction algorithms based on depth maps. Although these algorithms achieve high accuracy, the depth map that must be transmitted from encoder to decoder poses a dilemma between its quality and the bitrate it consumes. In addition, Taeyoung Chung described a simple view interpolation method based on disparity estimation in Ref. [4]. However, his algorithm is restricted to interpolating the exact middle view and cannot interpolate arbitrary viewpoints between two cameras. Kenji Yamamoto [5] also provided a similar view interpolation algorithm. Nevertheless, his algorithm does not use the disparities of adjacent blocks to improve the disparity prediction accuracy of the current block.

This paper proposes a view interpolation algorithm based on disparity estimation. The algorithm estimates the disparity of every block and thus builds a disparity map, which is then used to interpolate views at arbitrary viewpoints. It works well under relatively large camera distances compared with Xing San's method [1]; as the disparity increases, the accuracy remains relatively stable. In addition, the algorithm can be applied in the decoder or local decoder to generate reference images for prediction without transmitting disparity vectors, so the trade-off encountered in Refs. [2–3] does not arise. Moreover, arbitrary viewpoints can be generated, which greatly widens the application range of the algorithm. Finally, compared with Kenji Yamamoto [5], we take the disparity relationship between the current block and its adjacent blocks into account not only in the error energy function but also in the disparity matrix to improve accuracy.

This paper is organized as follows. We elaborate on our view interpolation algorithm in Sect. 2. Taking the characteristics of this view interpolation algorithm into consideration, a simplified prediction structure is proposed in Sect. 3. Experimental results are described in Sect. 4. We conclude this paper in Sect. 5.

Received date: 01-12-2010. Corresponding author: WEI Fang, E-mail: [email protected]. DOI: 10.1016/S1005-8885(09)60591-4

2 View interpolation algorithm

To keep the description simple, we assume that there is only a horizontal offset between cameras, although the algorithm can also handle a vertical offset. Under this assumption, there is only a horizontal offset between two corresponding pixels in any two views. If the coordinate of a pixel is $(x, y)$ in the left view and its corresponding pixel is at $(x - d, y)$ in the right view, then the offset $d$ is the disparity of this pixel. The disparity concept is then extended from pixels to blocks, which leads to simpler and more stable results than pixel-wise disparity (Ref. [5] used the disparity of pixels).

Assume that the coordinate of a block (represented by its top-left pixel) in $V_{left}$ is $(x, y)$. If the disparity of this block between $V_{left}$ and $V_{right}$ is $d$, the block's coordinate in $V_{right}$ must be $(x - d, y)$. Using the distance ratio $\beta$, its coordinate in $V_{view}$ can be determined, as shown in Fig. 1. The solid rectangles in Fig. 1 stand for the corresponding blocks in $V_{left}$, $V_{view}$ and $V_{right}$ respectively, and the hollow rectangles in $V_{view}$ stand for the blocks projected from $V_{left}$ and $V_{right}$ onto $V_{view}$.

Fig. 1 Corresponding blocks in left view $V_{left}$, right view $V_{right}$ and a middle view $V_{view}$

As Fig. 2 and Eq. (1) illustrate, $\beta$ is the ratio of the distance between the left view and the middle view to the distance between the left view and the right view. It can be adjusted to represent views at arbitrary viewpoints.

$$\beta = \frac{L_{lm}}{L_{lm} + L_{mr}} \tag{1}$$

Fig. 2 Distance relationship of left camera $C_{left}$, middle camera $C_{middle}$ and right camera $C_{right}$

To find the disparity value $d$ of the current block, an error energy function $E(d)$ is defined as in Eq. (2). The disparity $d_e$ that minimizes $E(d)$ is used as the estimated disparity of the block.

$$E(d) = E_Y(d) + \lambda \left( E_U(d) + E_V(d) \right) + \gamma E_{D\_diff}(d) \tag{2}$$

where:

$$E_Y(d) = \frac{1}{MN} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \psi_Y(i, j) \tag{3}$$

$$\psi_Y(i, j) = \left| D_{leftY}(x + i + \lfloor \beta d \rfloor,\; y + j) - D_{rightY}(x + i + \lfloor (\beta - 1) d \rfloor,\; y + j) \right| \tag{4}$$

$$E_{D\_diff}(d) = \frac{4}{8} \left| M_d(x - 1, y) - d \right| + \frac{2}{8} \left| M_d(x, y - 1) - d \right| + \frac{1}{8} \left| M_d(x - 1, y - 1) - d \right| + \frac{1}{8} \left| M_d(x + 1, y - 1) - d \right| \tag{5}$$

The first three terms of Eq. (2) represent the pixel value differences between the two reference blocks. The fourth term $E_{D\_diff}$ indicates the difference between the currently estimated disparity and the disparities of adjacent blocks. $\lambda$ and $\gamma$ are constant coefficients that determine the influence of each energy term. $M$ and $N$ indicate the block size. $M_d$ is the disparity matrix, which is composed of the estimated disparities of all blocks. $E_U(d)$, $E_V(d)$ and $\psi_U(i, j)$, $\psi_V(i, j)$ for the chrominance components are defined in the same way as in Eq. (3) and Eq. (4). The disparity matrix is then defined as follows:

$$M_d(x, y) = \begin{cases} d_e, & \text{for } E(d_e) < E_{const} \\ \frac{4}{8} M_d(x - 1, y) + \frac{2}{8} M_d(x, y - 1) + \frac{1}{8} M_d(x - 1, y - 1) + \frac{1}{8} M_d(x + 1, y - 1), & \text{for } E(d_e) \geq E_{const} \end{cases} \tag{6}$$

$$M_d = \text{wiener filter}(M_d) \tag{7}$$
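The block-wise estimation of Eqs. (2) to (6) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: it uses the luminance term alone (the chrominance terms weighted by λ are omitted), it searches candidate disparities exhaustively over 0..`d_max`, it rejects candidates whose reference blocks leave the frame, it renormalizes the neighbour weights at frame borders, and it leaves out the Wiener filter of Eq. (7). The function names and the default values of `d_max`, `gamma` and `e_const` are hypothetical.

```python
import math

# Neighbour offsets (in blocks) and weights, following Eqs. (5) and (6):
# left, top, top-left, top-right.
WEIGHTS = (((-1, 0), 4 / 8), ((0, -1), 2 / 8), ((-1, -1), 1 / 8), ((1, -1), 1 / 8))


def luma_energy(left, right, x, y, d, beta, M, N):
    """E_Y(d) of Eq. (3): mean absolute luma difference over an M x N block."""
    width = len(left[0])
    off_left = math.floor(beta * d)
    off_right = math.floor((beta - 1.0) * d)
    total = 0.0
    for i in range(M):          # i runs along x, as in Eq. (4)
        for j in range(N):      # j runs along y
            xl, xr = x + i + off_left, x + i + off_right
            if not (0 <= xl < width and 0 <= xr < width):
                return math.inf  # assumption: skip candidates that leave the frame
            total += abs(left[y + j][xl] - right[y + j][xr])
    return total / (M * N)


def known_neighbours(md, bx, by):
    """Return (disparity, weight) pairs for neighbours that are already estimated."""
    found = []
    for (dx, dy), weight in WEIGHTS:
        nx, ny = bx + dx, by + dy
        if 0 <= ny < len(md) and 0 <= nx < len(md[0]) and md[ny][nx] is not None:
            found.append((md[ny][nx], weight))
    return found


def estimate_disparity_map(left, right, beta, M=4, N=4, d_max=8,
                           gamma=0.1, e_const=20.0):
    """Build the block disparity matrix M_d in raster-scan order."""
    rows, cols = len(left) // N, len(left[0]) // M
    md = [[None] * cols for _ in range(rows)]
    for by in range(rows):
        for bx in range(cols):
            nb = known_neighbours(md, bx, by)
            best_d, best_e = 0, math.inf
            for d in range(d_max + 1):
                # Eq. (2), luminance and neighbour-difference terms only.
                e = luma_energy(left, right, bx * M, by * N, d, beta, M, N)
                e += gamma * sum(w * abs(nd - d) for nd, w in nb)
                if e < best_e:
                    best_d, best_e = d, e
            if best_e < e_const or not nb:
                md[by][bx] = best_d
            else:
                # Eq. (6), second case: weighted mean of the neighbours
                # (weights renormalized when some neighbours are missing).
                md[by][bx] = sum(w * nd for nd, w in nb) / sum(w for _, w in nb)
    return md
```

Processing blocks in raster-scan order guarantees that the left, top, top-left and top-right neighbours used in Eqs. (5) and (6) have already been estimated when the current block is reached.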

In Eq. (6), we take the disparities of adjacent blocks into consideration to improve accuracy, and different adjacent blocks carry different weights. Moreover, we append a Wiener filter at the end of the computation to cope with sharp discontinuities in the disparities and to suppress wrong disparities. As a result, the view-interpolated image $D_{view}$ can be synthesized as in Eq. (8):

$$D_{viewY}(x, y) = \begin{cases} (1 - \beta) D_{leftY}(x + \beta M_d, y) + \beta D_{rightY}(x + (\beta - 1) M_d, y), & \text{for } x + \beta M_d < W,\; x + (\beta - 1) M_d > 0 \\ D_{rightY}(x + (\beta - 1) M_d, y), & \text{for } x + \beta M_d > W \\ D_{leftY}(x + \beta M_d, y), & \text{for } x + (\beta - 1) M_d < 0 \end{cases} \tag{8}$$

where $W$ indicates the width of the encoded frames and $D_{viewY}$ represents the luminance of the view-interpolated frame. $D_{viewU}$ and $D_{viewV}$ are defined in the same way.

It must be emphasized that this algorithm tolerates both rotation and zoom variations as long as the cameras have good vertical alignment, so that every frame has approximately the same rotation or zoom coefficients as its counterparts. In addition, it also works for two-dimensional camera arrays after a few modifications to Eqs. (4)–(8). Nevertheless, the algorithm also has some drawbacks. It is hard to cope with situations where the cameras lie in different planes, because the interpolation in Eq. (8) assumes that the cameras are in the same plane.
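The three cases of Eq. (8) can be illustrated with a short Python sketch for one luminance plane. It is a sketch under assumptions rather than the authors' code: each pixel takes the disparity of the block that contains it, the offsets are rounded down as in Eq. (4), a sample position is treated as valid when it lies inside the frame, and a pixel for which neither reference is valid falls back to the co-located left pixel. The function name `interpolate_view` is hypothetical.

```python
import math


def interpolate_view(left, right, md, beta, M, N):
    """Synthesize one luma plane following the three cases of Eq. (8)."""
    height, width = len(left), len(left[0])
    view = [[0.0] * width for _ in range(height)]
    for y in range(height):
        by = min(y // N, len(md) - 1)
        for x in range(width):
            bx = min(x // M, len(md[0]) - 1)
            d = md[by][bx]  # assumption: a pixel uses its block's disparity
            xl = x + math.floor(beta * d)          # sample position in the left view
            xr = x + math.floor((beta - 1.0) * d)  # sample position in the right view
            left_ok = 0 <= xl < width
            right_ok = 0 <= xr < width
            if left_ok and right_ok:
                # Case 1: both references available, blend by the distance ratio.
                view[y][x] = (1.0 - beta) * left[y][xl] + beta * right[y][xr]
            elif right_ok:
                # Case 2: left sample falls outside the frame, use the right view.
                view[y][x] = right[y][xr]
            elif left_ok:
                # Case 3: right sample falls outside the frame, use the left view.
                view[y][x] = left[y][xl]
            else:
                view[y][x] = left[y][x]  # assumption: fallback when neither is valid
    return view
```

The blending weights mean that a view close to the left camera (small β) is dominated by the left image, and a view close to the right camera by the right image.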

3 Simplified prediction structure of MVC

Prediction structure plays an essential role in MVC systems. The majority of frames are predicted, using either P-prediction or B-prediction, from frames of other views at the same time instant to reduce inter-view redundancy, as well as from frames within the same view to reduce temporal redundancy.

3.1 Reference prediction structure of MVC

A typical hierarchical prediction structure with four hierarchy stages, which was initially proposed by HHI and is described in many papers such as Refs. [6] and [7], is depicted in Fig. 3. Ti in the horizontal direction refers to the time axis, while Si in the vertical direction refers to the view axis. I0 refers to an intra-coded image, P0 refers to a predicted image, and Bi refers to an image that undergoes i levels of B-prediction. In this structure, the S1, S3 and S5 sequences are coded after the surrounding views so that adjacent views can be used as references.

Fig. 3 Reference prediction structure of MVC

3.2 Simplified prediction structure of MVC

In order to use the view interpolation algorithm described in Sect. 2, we propose a simplified prediction structure, shown in Fig. 4.

Fig. 4 Simplified prediction structure of MVC

Compared with the reference structure, we turn off temporal prediction in views S1, S3 and S5, where the algorithm is applied. This not only greatly reduces the computational complexity in the temporal direction but also improves random access performance. The quality of the interpolated views then depends largely on the accuracy of the view interpolation algorithm. Furthermore, in the view direction we choose the IBBIBB mode instead of the IBP mode for anchor frames. The main reason is that it alleviates the accuracy degradation that results from removing temporal prediction in the interpolated views. It also has additional advantages. For example, the IBBIBB mode performs well in applications such as FTV, which need flexible switching between viewpoints. Only one step is needed to switch from viewpoint S0 to S3 or S5 in the IBBIBB mode because S4 is intra-coded, while the IBP mode has twice the switching delay.

In fact, there is no need to encode views S1, S3 and S5 at the encoder, because the decoder can generate them with the interpolation algorithm of Sect. 2. Thus, the coding complexity and bitrate can be reduced substantially. It is noteworthy that adding temporal prediction to interpolated views such as S1, S3 and S5 is not ruled out. The encoder can use the interpolated blocks of these views as reference blocks. In that case, disparity vectors need to be transmitted to the decoder, which enhances the quality of the interpolated views at the expense of higher coding complexity and bitrate.

4 Experimental results

Our experiments consist of two parts. In the first part, we implement the view interpolation algorithm on top of the simplified prediction structure and give objective experimental results. In the second part, we conduct an experiment to confirm that intermediate views at arbitrary viewpoints can be generated by our algorithm, and present subjective experimental results.

4.1 Multiview video coding experiment

In our experiment, we use the test sequences Akko&Kayo and rena (320 × 240) provided by Tanimoto Laboratory, Nagoya University [8]. The QP ranges from 32 to 45. The constant β introduced in Sect. 2 is set to 0.5 because, as Fig. 4 shows, S0 and S2 are used to interpolate the exact middle view S1. The rate-distortion performance of the proposed algorithm is compared with that of the reference algorithm from HHI in Fig. 5.

Fig. 5 R-D performance comparisons

On average, the proposed prediction structure saves as much as 14% of the bitrate with a 0.22 dB PSNR decrease for Akko&Kayo. Although 0.22 dB is not negligible, the subjective quality shows no distinct loss, as Sect. 4.2 will reveal. Overall, the proposed structure coupled with the view interpolation algorithm has a comparative advantage over the reference structure in the low-bitrate range.

4.2 Arbitrary viewpoint generation experiment

The application range of the proposed algorithm is not confined to the structure described in Sect. 3, because it can generate arbitrary viewpoints. For example, we can use S0 and S3 to interpolate S1 instead of using S0 and S2. In other words, even if S1 is not the exact middle view of S0 and S2, we can still use S0 and S2 to interpolate S1 as long as the distance ratio β can be obtained from the camera parameters. We use the test sequences teddy and Ima to conduct our arbitrary viewpoint generation experiment. The results are shown in Fig. 6. Fig. 6(a) and (e) are the left and right views, while Fig. 6(b), (c) and (d) are views interpolated with the proposed algorithm, with the constant β set to 1/4, 1/2 and 3/4 respectively. As shown, no obvious occlusion artifacts or conspicuously mismatched blocks can be found in these views.
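As a small worked example of obtaining the distance ratio β of Eq. (1) from camera parameters, the Python sketch below assumes that the cameras lie on a straight line and that each camera is described by a single one-dimensional position in arbitrary units. The helper name `distance_ratio` and the equally spaced positions 0, 1, 2, 3 for S0 to S3 are hypothetical and chosen only for illustration.

```python
def distance_ratio(c_left, c_view, c_right):
    """Distance ratio of Eq. (1) for collinear cameras given by 1-D positions."""
    if not (c_left <= c_view <= c_right) or c_left == c_right:
        raise ValueError("the interpolated viewpoint must lie between two distinct cameras")
    l_lm = c_view - c_left    # distance from the left camera to the new viewpoint
    l_mr = c_right - c_view   # distance from the new viewpoint to the right camera
    return l_lm / (l_lm + l_mr)


# Hypothetical, equally spaced camera positions for S0..S3.
positions = {"S0": 0.0, "S1": 1.0, "S2": 2.0, "S3": 3.0}

# S1 interpolated from S0 and S2: the exact middle view.
beta_middle = distance_ratio(positions["S0"], positions["S1"], positions["S2"])

# S1 interpolated from S0 and S3: one third of the way from S0 to S3.
beta_third = distance_ratio(positions["S0"], positions["S1"], positions["S3"])
```

Under these assumed positions, the first call gives β = 0.5 and the second gives β = 1/3, so the same interpolation procedure covers both reference pairs by changing only β.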

Fig. 6 Arbitrary viewpoint generation results of teddy (top) and Ima (bottom): (a) left view, (b) 1/4 generated view, (c) 1/2 generated view, (d) 3/4 generated view, (e) right view

5 Conclusions

Based on the experimental results in Sect. 4, we conclude that the view interpolation algorithm can be used to generate arbitrary viewpoints, and that the simplified prediction structure coupled with the view interpolation algorithm can be applied in limited-bitrate environments, especially in bandwidth-limited or storage-limited applications, owing to its good bitrate reduction and preservation of subjective quality.

Acknowledgements This project is supported by the Fundamental Research Funds for the Central Universities, No. 2009RC0131.

References
1. San X, Cai H, Lou J G, et al. Multiview image coding based on geometric prediction. IEEE Trans on Circuits and Systems for Video Technology, 2007, 17(11): 1536–1548
2. Shimizu S, Kitahara M, Kamikura K, et al. Multi view video coding based on 3D warping with depth map. Multi media and Expo (ICME), 2010 IEEE Int conf, July, 2010, Suntec city, Singapore. 2010: 1108–1113
3. Taguchi Y, Naemura T. Ray-space coding based on free-viewpoint image synthesis. Journal of the Institute of Image Information and Television Engineers, 2006, 60(4): 569–575
4. Chung T, Seng K, Kim C S. Compression of 2-D wide multi-view video sequence using view interpolation. Int conference on Image Process, 2008: 2440–2443
5. Yamamoto K, Kitahara M, Kimata H, et al. Multiview video coding using view interpolation and color correction. IEEE Trans on Circuits and Systems for Video Technology, 2007, 17(11): 1436–1449
6. Lee C, Oh K J, Kim S H, et al. An efficient view interpolation scheme and coding method for multi-view video coding. Int Conf on Syst Signals and Image Process: Jun, 2007, Maribor, Slovenia. 2007: 102–105
7. Description of Core Experiments in MVC, ISO/IEC JTC1/SC29/WG11, N8019, Montreux, Switzerland, Apr, 2006
8. Updated Call for Proposals on Multi-View Video Coding, ISO/IEC JTC1/SC29/WG11, N7567, Nice, France, Oct, 2005