Interactive representation of still and dynamic scenes


Signal Processing: Image Communication 21 (2006) 705–708 www.elsevier.com/locate/image

Editorial

The interactive representation of 3D objects and dynamic scenes poses a number of challenges of great interest to the image processing community. 3D objects and dynamic scenes are both created from images and interactively rendered as images. Image processing knowledge is therefore important for ongoing progress in these areas, which historically derive from fields such as computer graphics and computer vision [6]. One aim of this special issue is to illustrate this point by presenting current research topics that lie at the intersection of the 3D modeling and image processing disciplines.

From a general perspective, the issues related to the interactive representation of 3D objects and dynamic scenes can be identified as scene modeling and, once the models are available, their interactive and/or remote inspection. Depending on the application, the modeling step may be either explicit or implicit, and it may be performed during content creation, during interactive content rendering, or both. The term "interactive" here means that the viewing positions and angles are freely chosen by a user (or client) in real time, as opposed to video, where they are fixed during content creation. 3D modeling is particularly challenging in the case of dynamic scenes, where the 3D model must be built in real time [9,5,14]. It remains challenging also for still scenes, particularly where low-cost video cameras or arrays of still cameras are used in place of expensive laser scanners or structured-light systems; this calls for effective solutions to classical open issues such as the missing-view problem.

In recent times, "image-based rendering" (IBR) and "video-based rendering" have received a lot of attention from both the computer vision and the image processing communities [1,10,7,4,13,3,8]. These approaches provide effective means for deriving interactive scene representations without the explicit creation of a 3D model. IBR simplifies the image capture process relative to 3D modeling; at the same time, IBR strategies pose data compression challenges [2,11,12] of great interest to the image and video compression community.

Compression issues arise also in the context of the remote inspection of 3D models, either still or dynamic. On one hand, direct compression of the whole 3D model can dramatically increase communication efficiency, particularly where the model evolves dynamically with time. On the other hand, compression may be applied locally, so that only the subset of the complete representation required by an interactive user needs to be distributed. One major difference with respect to traditional video compression is that, in the remote visualization of interactive 3D representations, a mathematical description of the scene is available. Standard image and video compression approaches are the logical starting point; however, one can also envisage approaches based on the mathematical description of the scene that are totally different from those employed for standard video compression. The importance of these possibilities is only beginning to emerge. One may remotely inspect a 3D scene by transmitting just those images required by the user at any given point in the interaction, possibly distributing some of the modeling tasks to the user (or client). Connected with this is the possibility of progressively compressing both the local geometry and the texture of a 3D representation.
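The local, view-driven alternative sketched above can be made concrete with a small example. The following Python fragment is entirely hypothetical (the patch structure, the visibility test and the function names are illustrative assumptions, not drawn from any of the papers in this issue): it selects, from a set of pre-compressed scene patches, only those that fall inside the client's current viewing cone and have not already been transmitted.

```python
from dataclasses import dataclass

@dataclass
class Patch:
    # A pre-compressed fragment of the scene representation:
    # bounding-sphere centre and radius, plus compressed payload size.
    center: tuple
    radius: float
    nbytes: int

def visible(patch, cam_pos, cam_dir, fov_cos):
    """Coarse visibility test: does the patch's bounding sphere lie
    inside the viewing cone of the client camera?"""
    dx = [p - c for p, c in zip(patch.center, cam_pos)]
    dist = max(sum(d * d for d in dx) ** 0.5, 1e-9)
    cos_angle = sum(d * v for d, v in zip(dx, cam_dir)) / dist
    # Widen the cone by the angular size of the bounding sphere,
    # so patches straddling the frustum boundary are still sent.
    return cos_angle >= fov_cos - patch.radius / dist

def patches_to_send(patches, already_sent, cam_pos, cam_dir, fov_cos):
    """Return only the visible patches the client does not yet hold."""
    return [p for p in patches
            if p not in already_sent
            and visible(p, cam_pos, cam_dir, fov_cos)]
```

The point of the sketch is only that transmission cost scales with what the user actually looks at, rather than with the size of the whole model; a real system would of course refine rather than re-send patches as the view changes.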

0923-5965/$ - see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.image.2006.08.004



These possibilities raise a number of intriguing new questions, including what balance of texture versus geometry information is most visually effective, and what metrics are best suited to measure such effectiveness.

This special issue presents a significant sample of computer vision and computer graphics issues which should be of interest also to the image processing community.

More specifically, the paper "On-line, interactive view synthesis and augmentation," by Indra Geys and Luc Van Gool, describes an on-line, interactive system for enhanced teleconferencing and tele-teaching. A "virtual" camera is created that can move freely between the calibrated, static "real" cameras. The most relevant shot can be selected, typically the frontal view of a person. Moreover, the position of the "virtual" camera can also be selected dynamically by the speaker. For novel view synthesis, foreground and background are treated separately. A coarse contour matching delivers bounding boxes in 3D space which enclose the foreground content. Within these bounding boxes, a region-based matching is performed using graph-cut methods. The resulting object is rendered, and remaining discrepancies in the interpolation are resolved on a per-pixel basis by means of the Birchfield dissimilarity measure. A more accurate, but slower, algorithm is used for the background, which is considered static. Finally, the synthesized view is augmented with a novel functionality called "virtual post-its."

The paper "Inverse Tensor Transfer with Applications to Novel View Synthesis and Multi-baseline Stereo," by Hongdong Li and Richard Hartley, introduces a new method for novel view synthesis. This method avoids the need to explicitly compute the surface geometry (e.g., via a dense depth map), overcoming many of the common problems associated with conventional stereo matching algorithms.
The proposed method is based on a novel inverse tensor transfer technique, which offers a computationally simple mechanism to exploit both photometric and geometric constraints among multiple images. The technique produces photo-realistic novel images, even though the geometry implicitly recovered by the method is highly irregular. This is because the places where accurate geometry is difficult to recover, due to a weak dependence of photo-consistency on geometry, are also the places where novel view synthesis tends to be insensitive to the geometry. The method works well for both calibrated and uncalibrated images. It can also be extended to the multi-baseline stereo (MBS) matching problem, where it is shown to handle non-parallel MBS configurations very efficiently.

The paper "Mosaic-Based 3D Scene Representation and Rendering," by Zhigang Zhu and Allen Hanson, addresses the problem of fusing images from many video cameras or from a single moving video camera. The captured images exhibit obvious motion parallax, but they can be aligned and integrated into a few mosaics with a large field of view that preserve 3D information. The authors develop a compact geometric representation that re-organizes the original perspective images into a set of parallel projections with different oblique viewing angles. In addition to providing a wide field of view, mosaics with various oblique views can provide a good representation of occlusion regions that cannot be seen in the usual nadir view. One or more stereo pairs can be formed from a pair of mosaics with different oblique viewing angles, allowing image-based 3D viewing to be achieved. This representation can serve both as an advanced video interface and as a pre-processing step for 3D reconstruction. The paper presents a ray interpolation method for generating the parallel-projection mosaics, together with an efficient 3D scene/object rendering strategy based on multiple parallel-projection mosaics. Several real-world examples are provided, with applications ranging from aerial video surveillance and environmental monitoring to ground mobile robot navigation and under-vehicle inspection.

The paper "Image Warping for Compressing and Spatially Organizing a Dense Collection of Images," by Daniel Aliaga, Paul Rosen, Voicu Popescu and Ingrid Carlbom, addresses the compression of image-based rendering (IBR) representations.
Since IBR systems create photorealistic views of complex 3D environments by resampling large collections of images captured in the environment, the quality of the resampled images increases significantly with image capture density. A significant challenge in interactive IBR systems is therefore to provide both fast image access along arbitrary viewpoint paths and efficient storage of large image data sets. This paper describes a spatial image hierarchy, combined with an image compression scheme, that meets the requirements of interactive IBR walkthroughs. By using image warping and exploiting image coherence over the image capture plane, the proposed technique is shown to achieve compression performance similar to traditional motion-compensated video coders, while allowing image access along arbitrary paths. By exploiting the image resampling capabilities of existing graphics hardware, interactive rates can be achieved during IBR walkthroughs.

The paper "Compression of Textured Surfaces Represented as Surfel Sets," by Darom, Ruggeri, Saupe and Kiryati, introduces a method for the lossy compression of genus-0 surfaces represented by a set of disks with attributes, known as surfels (surface elements). Each surfel, with its attribute vector, is mapped onto a sphere in a manner that optimally preserves geodesic distances. The resulting spherical vector-valued function is resampled; its components are decorrelated by the Karhunen–Loève transform, represented by spherical wavelets, and encoded using the zero-tree algorithm. The paper presents methods for geodesic distance computation on surfel-based surfaces and introduces a novel, efficient approach to dense surface flattening/mapping using rectangular distance matrices. The distance between each surfel and a set of key surfels is optimally preserved, leading to greatly improved resolution and eliminating the need for the interpolation that complicates and slows down existing surface unfolding methods. Experimental surfel-based surface compression results look promising.

The paper "A Novel Framework for the Interactive Transmission of 3D Scenes," by Pietro Zanuttigh, Nicola Brusco, David Taubman and Guido Cortelazzo, approaches the problem of how to allocate resources between texture and geometry. Rather than treating this problem in full generality, it is considered in an interactive browsing environment, with greedy optimization of the current view, conditioned on the availability of previously transmitted information for other (possibly nearby) views, and subject to a transmission budget constraint.
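The flavour of such a greedy, budget-constrained allocation can be conveyed with a small sketch. The numbers and increment names below are purely illustrative assumptions (the actual optimization in the paper operates on scalably compressed bit-streams and view-dependent distortion estimates): at each step, the server sends whichever increment, texture or geometry, offers the best distortion reduction per transmitted bit and still fits the remaining budget.

```python
def allocate(increments, budget):
    """Greedy rate-distortion allocation.

    `increments` is a list of (name, bits, distortion_reduction) tuples,
    mixing texture and geometry refinements for the current view.
    Increments are taken in order of distortion-reduction-per-bit;
    this is a heuristic, not an exact knapsack solution.
    """
    remaining = budget
    chosen = []
    pool = sorted(increments, key=lambda inc: inc[2] / inc[1], reverse=True)
    for name, bits, gain in pool:
        if bits <= remaining:
            chosen.append(name)
            remaining -= bits
    return chosen, budget - remaining

# Hypothetical mix of refinements available for the current view.
incs = [
    ("texture_L0", 4000, 9.0),   # coarse texture: big payoff
    ("geometry_L0", 1000, 3.0),  # coarse depth-map refinement
    ("texture_L1", 8000, 4.0),   # fine texture detail
    ("geometry_L1", 2000, 0.5),  # fine geometry detail
]
sent, used = allocate(incs, budget=6000)
```

With the figures above, the coarse geometry and coarse texture layers are sent first and the fine layers wait for the next interaction step, which is exactly the behaviour one wants from a view-conditioned, budget-limited transmission policy.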
Texture information is available at a server in the form of scalably compressed images corresponding to a multitude of views. Surface geometry is also represented at the server in a scalable fashion. At any point in the interactive browsing experience, the server must decide how to allocate the transmission budget between the delivery of original content from the image associated with the current view and the delivery of refined geometry details and/or enhancements to already received views, from which the client can render the new view. At the heart of the proposed approach is a novel geometry-, resolution- and distortion-sensitive strategy for blending information available from different views at the client. Geometry can be represented through a collection of scalably compressed depth maps, rather than a monolithic surface model; in this case, a dual distortion-sensitive geometry synthesis procedure is proposed to blend the information from the available depth maps.

The paper "Rate-Distortion-Optimized Predictive Compression of Dynamic 3D Mesh Sequences," by Karsten Müller, Aljoscha Smolic, Matthias Kautzner, Peter Eisert and Thomas Wiegand, considers the compression of dynamic 3D meshes. Recent experience with the compression of general dynamic meshes has shown that the statistical dependencies within a mesh sequence can be exploited by predictive coding approaches. The coders introduced so far use algorithms tuned on the basis of experimentally determined thresholds. The paper instead proposes a Lagrangian rate-distortion (RD) optimization methodology to jointly optimize both the coding modes and the level of detail associated with an octree approximation structure. The proposed approach combines well-performing prediction strategies from existing coders and also introduces a number of novel prediction modes. The general coding structure is derived from a statistical analysis of mesh sequences and exploits temporal as well as spatial mesh dependencies. The coding efficiency of the developed coder is illustrated by comparative coding results for mesh sequences at different resolutions. The proposed algorithm has a variety of applications, including pure graphics applications (virtual reality, games, etc.), 3DTV and free-viewpoint video.

The topics presented in this special issue focus on important aspects of computer vision and computer graphics which also present interesting challenges for the image processing community.
These topics span novel view synthesis, considered both in practical applications such as tele-teaching and in more theoretical settings; mosaic-based representations; and compression issues for both IBR and model-based representations.
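The Lagrangian RD optimization discussed above for dynamic mesh coding rests on a principle simple enough to sketch in a few lines. The candidate modes and their rate/distortion figures below are purely illustrative assumptions: instead of hand-tuned thresholds, each coding unit simply picks the prediction mode that minimizes the Lagrangian cost J = D + λ·R.

```python
def best_mode(candidates, lam):
    """Lagrangian mode decision: minimize J = D + lambda * R
    over candidate (mode, rate_bits, distortion) triples."""
    return min(candidates, key=lambda m: m[2] + lam * m[1])

# Illustrative candidates for one vertex cluster of a mesh frame.
modes = [
    ("intra",         120, 0.8),  # code vertex positions directly
    ("temporal_pred",  30, 1.5),  # predict from the previous frame
    ("spatial_pred",   45, 1.2),  # predict from neighbouring vertices
]
# A small lambda favours low distortion; a large one favours low rate.
```

The same cost comparison, applied jointly over coding modes and octree refinement levels, is what lets such a coder trade rate against distortion consistently instead of via experimentally determined thresholds.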

References

[1] E.H. Adelson, J.R. Bergen, The plenoptic function and the elements of early vision, in: M. Landy, J.A. Movshon (Eds.), Computational Models of Visual Processing, 1991. Online: citeseer.ist.psu.edu/adelson91plenoptic.html.
[2] C.-L. Chang, X. Zhu, P. Ramanathan, B. Girod, Light field compression using disparity-compensated lifting and shape adaptation, IEEE Trans. Image Process. 15 (4) (2006) 793–806.
[3] G.M. Cortelazzo, C. Guerra, Special issue on model-based and image-based 3D scene representation for interactive visualization, Comput. Vision Image Understanding 96 (2004).
[4] S. Gortler, R. Grzeszczuk, R. Szeliski, M. Cohen, The lumigraph, in: Proceedings of ACM SIGGRAPH 96, Computer Graphics (Annual Conference Series), ACM Press, New York, NY, USA, 1996, pp. 43–54.
[5] A. Hilton, Computer vision for human modelling and analysis, Machine Vision Appl. 14 (2003) 206–209.
[6] A. Leonardis, F. Solina, R. Bajcsy (Eds.), Confluence of Computer Vision and Computer Graphics, NATO Science Series 3: High Technology, vol. 84, Kluwer, Dordrecht, 2000.
[7] M. Levoy, P. Hanrahan, Light field rendering, in: Proceedings of ACM SIGGRAPH 96, Computer Graphics (Annual Conference Series), ACM Press, New York, NY, USA, 1996, pp. 31–42.
[8] M. Magnor, Video-Based Rendering, AK Peters, Wellesley, MA, 2005.
[9] T. Matsuyama, X. Wu, T. Takai, S. Nobuhara, Real-time 3D shape reconstruction, dynamic 3D mesh deformation and high fidelity visualization for 3D video, Comput. Vision Image Understanding 96 (2004) 393–434.
[10] P. Rademacher, G. Bishop, Multiple-center-of-projection images, in: Proceedings of ACM SIGGRAPH 98, Computer Graphics (Annual Conference Series), ACM Press, New York, NY, USA, 1998, pp. 199–206.
[11] P. Ramanathan, B. Girod, Receiver-driven rate-distortion optimized streaming of light fields, in: Proceedings of ICIP 2005, vol. 3, September 2005, pp. 25–28.
[12] P. Ramanathan, B. Girod, Rate-distortion analysis for light field coding and streaming, Signal Process. Image Commun. 21 (2006) 462–475.
[13] H.-Y. Shum, E. Petajan, J. Osterman, Special issue on image-based modeling, rendering and animation, IEEE Trans. Circuits Syst. Video Technol. 13 (2003).
[14] M. Waschbüsch, S. Würmlin, D. Cotting, F. Sadlo, M. Gross, Scalable 3D video of dynamic scenes, in: The Visual Computer (Proceedings of Pacific Graphics 2005), vol. 21, Springer, Berlin/Heidelberg, September 2005, pp. 629–638.

Guido M. Cortelazzo
Department of Information Engineering, University of Padua, Via Gradenigo 6A, 35131 Padua, Italy
E-mail address: [email protected]
URL: http://www.freia.dei.unipd.it

David S. Taubman
School of Electrical Engineering and Telecommunications, The University of New South Wales, UNSW Sydney 2052, Australia
E-mail address: [email protected]
URL: http://www.ee.unsw.edu.au/taubman