Coding of video-conference stereo image sequences using 3D models

Signal Processing: Image Communication 9 (1997) 125-135

Sotiris Malassiotis*, Michael G. Strintzis
Information Processing Laboratory, Department of Electrical and Computer Engineering, University of Thessaloniki, Thessaloniki 540 06, Greece

* Corresponding author. E-mail: [email protected]

Received 25 September 1995

Abstract

In this paper we propose an object-based stereo image coding algorithm. The algorithm relies on modeling of the object structure using 3D wire-frame models, and on motion estimation using globally rigid and locally deformable motion models. Algorithms for the estimation of motion and structure parameters from stereo images are described. Motion parameters are used to construct predicted images at subsequent time instances by mapping the image texture onto the object surface. Coding of object parameters, appearing background regions and prediction errors is investigated, and experimental results with video-conference scenes are presented. The proposed algorithm is very efficient for applications such as stereoscopic video transmission, and is especially suited to advanced applications such as the generation and transmission of intermediate views for multiview receiver systems, as well as applications in which object-wise editing of the bit-stream is required, such as video production using preanalysed scenes or virtual reality applications.

Keywords:

Stereo; Synthetic and natural hybrid coding; 3D model coding

1. Introduction

The ability of object-based coding techniques to describe a scene in a structural way, in contrast to traditional waveform-based coding techniques, opens new areas of applications [1]. Video production, realistic computer graphics, multimedia interfaces and medical visualisation are some of the applications that may benefit by exploiting the potential of object-based schemes. In addition, object-based coding offers a viable solution to the problem of blocking artifacts produced by


block-based hybrid-DCT algorithms at very low bit-rates. Object-based codecs consist of an analysis part, in which image pixels are grouped into objects described by their shape (2D silhouette and 3D structure), motion and texture; the objects are then synthesised at the decoder using the transmitted parameters [4,15]. The main problem in object-based analysis is the automatic extraction and modeling of objects directly from the image intensities. This task may require complex image analysis techniques to segment the scene into homogeneous regions, or even user interaction, so that image regions correspond to real objects in the scene. Simpler approaches segment the


images into regions of homogeneous intensities, and model their motion by a simple affine transformation. More sophisticated techniques detect changing regions in the scene and approximate the underlying object surface by a parametrised model. The object is assumed to move rigidly or to undergo local deformations. The proposed modeling techniques are often restricted to video-phone sequences, where the scene structure is known a priori and knowledge-based parametrised models of face, arms and body may be exploited. A more general approach should be able to deal with generic objects at different degrees of detail. 3D models of objects may alternatively be derived from stereo images. This usually requires estimation of dense disparity fields, postprocessing to remove erroneous estimates, and fitting of a parametrised surface model to the calculated depth map [10]. In [13] an algorithm was presented which performs optimal modeling of the scene using a hierarchically structured wire-frame model, directly from intensity images. The wire-frame model consists of adjacent triangles that may be split into smaller ones over areas that need to be represented with further detail. Motion of the model surface, using both the rigid and non-rigid body assumptions, is estimated concurrently with the depth parameters. The present paper proposes an object-based algorithm for the coding of stereo image sequences. Coding of stereo images was also investigated in [3,6,24]. In [24] an extension of the MPEG standard was proposed for the coding of stereo image sequences: the images of the right channel are bidirectionally predicted either from the corresponding image of the left channel using disparity compensation or from the previous right channel image using motion compensation. In [3] the images are segmented into regions according to a motion homogeneity criterion, and the surface of each object is assumed to be a rigidly moving planar patch. In [6] the depth information obtained from stereo images is directly exploited, and schemes are investigated that project the depth map to subsequent time instances using 3D rigid motion parameters. The coding algorithm for stereo image sequences presented in this paper is based on the modeling scheme in [13]. Section 2.1 presents an algorithm for efficient segmentation of the scene into objects


and background, using prior knowledge of the scene structure. In Sections 2.2 and 2.3 we describe the proposed surface and motion models and present methods for the estimation of the model parameters from stereo images. The proposed modeling scheme is designed to be descriptive of a wide variety of scenes while remaining of relatively low complexity. In Section 3 we discuss the update of the object silhouette and 3D shape in time using the 3D motion parameters, and evaluate different schemes for image prediction. In order to assure good-quality reproduction of images we introduce intra-frame coding of the prediction error images. Regions of background pixels, appearing as foreground objects move, are intra-coded using vector quantisation, as described in Section 4. Experimental results of tests conducted on natural stereo image sequences are presented in Section 5. Finally, in Section 6 we discuss various applications of the proposed coding scheme.

2. Modeling

2.1. Segmentation

Most of the proposed segmentation algorithms first define mathematical models and then try to find image regions conforming to the constraints imposed by the models. This procedure usually identifies as objects regions that do not necessarily correspond to natural objects in the scene. For example, the criterion of motion homogeneity used in [3,6] may divide a physical object into several parts if the object undergoes large rotations or deformations. This oversegmentation generally results in redundancy of the transmitted model parameters and thus in reduced coding efficiency. Further, in video production applications object-wise editing of the bit-stream is sometimes required, and non-natural segmentation then increases the editing time and effort. Efficient segmentation may be achieved if the structure of the scene is known a priori, as is the case with video-conference scenes. Since coding of such scenes is important, especially for achieving tele-presence in advanced video-conference applications, we have concentrated our attention on this type of scene. However, more general scenes may be handled by the proposed algorithm, provided


that a segmentation mechanism is available. Such segmentation may, for example, be effected interactively in medical applications or in applications where mixing of natural and synthetic scenes is required. It may also be available through off-line segmentation procedures in cases where video production combines two or more preanalysed sequences. The more general case will be addressed in more detail in future work. Video-conference scenes consist of one or more speakers seated side by side. Usually, only their heads and shoulders are shown in the images, and normally there is no camera zooming or panning. The motion of the speakers is generally small, consisting of almost rigid-body motion and non-rigid motion of the eyes and mouth, while abrupt movements of the hands in front of the body should be expected. A template matching procedure that combines image intensity information with prior knowledge of the object shape is applied to find an initial segmentation of the objects from the static background. The scene change detected by thresholding the difference between subsequent images is also exploited, to cope with moving hands. A template is a set of image points depicting the shape of the object boundary; the boundary of the actual object may be approximated by an affine transformation of the template. The estimation of the affine transformation parameters is performed using intensity information by applying a generalised Hough transform [2]. More than one speaker may be detected by searching the Hough parametric space for additional maxima. The initial segmentation obtained using the above technique is combined with the change detection mask in order to handle large deformations of the object silhouette which may result, for example, from moving the arms. The segmentation is finally refined using an active contour model (snake) that attracts the object boundaries towards the intensity edges of the object [9,11]. The segmentation information is kept in an image, which we shall term an "object map", assigning every image pixel a label l, l = 1, ..., N, where N is the number of objects in the scene. We also use a "background map" marking pixels belonging to the background.
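As an illustration of the template matching step, the voting stage of the generalised Hough transform may be sketched as follows (a minimal Python/NumPy sketch, assuming a binary edge map and a head-and-shoulders template given as a list of boundary points; the affine transformation is restricted here to translation and isotropic scale, the full affine case simply adding accumulator axes; all names are illustrative):

import numpy as np

def hough_template_match(edge_map, template_pts, scales):
    """Vote in a (scale, ty, tx) accumulator; peaks give candidate speakers."""
    h, w = edge_map.shape
    acc = np.zeros((len(scales), h, w))
    ys, xs = np.nonzero(edge_map)              # coordinates of edge pixels
    for si, s in enumerate(scales):
        for ty0, tx0 in template_pts:          # one template boundary point
            # An edge pixel (y, x) votes for the translation that would
            # bring the scaled template point onto it.
            ty = np.round(ys - s * ty0).astype(int)
            tx = np.round(xs - s * tx0).astype(int)
            ok = (ty >= 0) & (ty < h) & (tx >= 0) & (tx < w)
            np.add.at(acc, (si, ty[ok], tx[ok]), 1.0)
    return acc

Additional speakers would then be detected by suppressing the neighbourhood of each accumulator maximum and searching again.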


The background may more generally denote image regions that are irrelevant and do not need to be coded. The segmentation procedure need only be performed at the first (or reference) stereo frame, since region boundaries are updated at subsequent time instances using motion and depth information.

2.2. Surface modeling

In our approach the surface of each object is approximated by a parametrically deformable wire-frame model. This consists of a set of adjacent triangular planar patches (see Fig. 1). The vertices of the triangles, P_i = (x_i, y_i, z_i), control the shape of the surface. The selected model has some desirable properties, including high geometric coverage, local control, and invariance to rigid transformations and scaling. In order to approximate complex surfaces without introducing redundant control points, we apply a spatially adaptive mesh refinement procedure. This is performed by subdividing triangles into subtriangles, as shown in Fig. 2. Since the introduction of new nodes destroys the consistency of the mesh, adjacent triangles are also subdivided by the introduction of dummy control points. The decision to subdivide a triangle is taken during surface parameter estimation in image regions producing a high approximation error. After the boundary of each object has been estimated using the procedure described in Section 2.1, a wire-frame surface model is fitted over the object. A parallelogram bounding the 2D region of the object is first defined; this parallelogram is then subdivided into isosceles triangles of equal area. Finally, on eliminating triangles that do not overlap with

Fig. 1. Wire frame.


the object, a 2D triangulation of the object region is obtained. This may be considered as the projection of the corresponding 3D wire-frame model on the image plane. An initial estimate of the depth of the nodes is obtained by applying a least-squares surface fitting algorithm to a set of sparse depth estimates [13]. These depth estimates are obtained using a simple block matching procedure at image locations with high intensity variance. The initial estimate of the object surface model is further refined by minimising the error in matching the two stereo images:

E(P) = \sum { I_l(X_l, Y_l) - I_r(X_r, Y_r) }^2 + E_s(P),   (1)

where I_l, I_r are the luminance functions corresponding to the left and right stereo views. The points (X_l, Y_l), (X_r, Y_r) are matching points in the left and right images, respectively, and correspond to the projection of the same 3D point P = (x, y, z) on the model surface. In the case of parallel camera geometry these are related by

X_l = X_r + bf/z,   Y_l = Y_r,   (2)

where b is the stereo camera baseline and f its focal length. The function E_s(P) measures the divergence from smoothness of the model surface and corresponds to the energy of a thin-plate spline [14],

E_s = \sum_{e \in E} \rho_e l_e { (\alpha_\Delta^1 - \alpha_\Delta^2)^2 + (\beta_\Delta^1 - \beta_\Delta^2)^2 },   (3)


where E is the set of all non-boundary edges of the wire frame, l_e is the length of edge e and (\alpha_\Delta^1, \beta_\Delta^1), (\alpha_\Delta^2, \beta_\Delta^2) are the slopes of the two triangular patches that share the common edge e. The binary variable \rho_e is zero if e crosses a discontinuity. Discontinuities are detected by combining information from the initial surface estimation and intensity edges in the corresponding image frame [14].
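A direct transcription of Eq. (3), as a sketch only: it assumes the mesh nodes are stored as an (N, 3) NumPy array of (x, y, z) and each non-boundary edge carries the indices of its two endpoints, the node triples of its two adjacent triangles, and the flag rho_e; l_e is taken here as the projected 2D edge length, which is an assumption.

import numpy as np

def patch_slopes(p1, p2, p3):
    """Fit z = a*x + b*y + c through three 3D points; return the slopes (a, b)."""
    A = np.array([[p1[0], p1[1], 1.0],
                  [p2[0], p2[1], 1.0],
                  [p3[0], p3[1], 1.0]])
    a, b, _ = np.linalg.solve(A, np.array([p1[2], p2[2], p3[2]]))
    return a, b

def smoothness_energy(nodes, edges):
    """edges: list of (i, j, tri1, tri2, rho); rho = 0 across a discontinuity."""
    E = 0.0
    for i, j, tri1, tri2, rho in edges:
        if rho == 0:
            continue                                # discontinuity: no penalty
        le = np.linalg.norm(nodes[i][:2] - nodes[j][:2])   # edge length (2D)
        a1, b1 = patch_slopes(*[nodes[k] for k in tri1])
        a2, b2 = patch_slopes(*[nodes[k] for k in tri2])
        E += rho * le * ((a1 - a2) ** 2 + (b1 - b2) ** 2)
    return E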

Fig. 2. Triangle subdivision: (a) initial mesh; (b) refined mesh. (●) active nodes; (○) inactive nodes.



For the estimation of the depth of the control points, E(P) is minimised using an exhaustive search procedure. We visit each node P_i = (x_i, y_i, z_i) of the mesh independently and search for its optimum position along the ray (X_i s, Y_i s, fs), s > 0, where (X_i, Y_i) = (f x_i / z_i, f y_i / z_i) is the perspective projection of P_i on the image plane. This is a one-dimensional search, performed efficiently using interval subdivision. The above procedure is applied iteratively, using a coarse-to-fine relaxation scheme that consists of recursively subdividing triangles over which E(P) is large. The estimation procedure was shown to provide reliable approximations of the object surface, even over areas with little texture or large photometric variations. The successive steps of the estimation procedure are illustrated in Fig. 3.

2.3. Motion modeling

We shall assume that the motion of each object consists of a globally rigid motion plus local deformations, and may be described by

P' = R P + T + \Delta,   (4)

where P and P’ are the 3D positions of a surface point before and after motion, R is a 3 x 3 rotation matrix, T a translation vector and A is the deformation vector. The deformation of a surface point is found by using linear interpolation from the elementary deformations Ai assigned to every node of the wire frame. The rotation matrix R may be expressed [7] in terms of the unit quaternion vector [ql, q2, q3, q4],

R(q_1, q_2, q_3, q_4) =

  [ q_1^2 - q_2^2 - q_3^2 + q_4^2    2(q_1 q_2 - q_3 q_4)               2(q_1 q_3 + q_2 q_4)             ]
  [ 2(q_1 q_2 + q_3 q_4)             -q_1^2 + q_2^2 - q_3^2 + q_4^2     2(q_2 q_3 - q_1 q_4)             ]
  [ 2(q_1 q_3 - q_2 q_4)             2(q_2 q_3 + q_1 q_4)               -q_1^2 - q_2^2 + q_3^2 + q_4^2   ]   (5)


Fig. 3. Wire-frame adaptation to object.

The proposed modeling framework is capable of describing a broad category of movements, including rigid motion, partially rigid motion (e.g. human motion), and elastic and plastic deformations. The motion parameters Q = (q_i, T_i, \Delta_i) associated with each object are estimated using an iterative algorithm minimising the displaced frame difference between subsequent frames of the right stereo channel:

E(Q) = \sum_{X,Y} { I(X, Y) - I'(X', Y') }^2,   (6)

where (X, Y), (X’, Y’) are the perspective projections of P, P’ related by the motion equation (4), while I and I’ are the luminance functions of subsequent image frames. A steepest descent iterative scheme is then applied for the estimation of the parameter vector Q, (7) where a superscript is used to denote the iteration order. To assure convergence of the above algorithm, initial estimates of the rigid-motion parameters are needed. Several linear algorithms for estimating rigidmotion parameters from feature point correspondences are presented in [8,22,23]. In our implementation the algorithm in [23] was used. 2.4.

2.4. Coding of object parameters

The parameters that determine the boundary, depth and motion of each object must be encoded. The boundary of each object is losslessly encoded at the start of the sequence. We have used four-direction


boundary following, and variable-length coding of the resulting bit-stream using a fixed Huffman code table. The object boundary must be coded only at the start of the sequence, since object boundaries at subsequent time instances are updated using the estimated motion parameters. The depth components of the estimated wire-frame control points are differentially quantised and transmitted to the decoder only at the start of the sequence; at subsequent time instances only small updates have to be encoded. Indices of triangles that are split are also encoded. The differential quantisation scheme consists of predicting the current depth value from the previously quantised one using row-wise scanning of the mesh, and quantising the prediction error. Note that the x, y components of each node do not have to be transmitted, since they are uniquely defined by the object boundary as described in Section 2.1, and can therefore be reconstructed at the decoder using the same procedure from the boundary of each object. The deformation parameters corresponding to each node of the wire frame are similarly encoded and transmitted to the decoder, where stereo pairs are reconstructed using motion compensation. Rigid motion parameters are very sensitive to quantisation and are therefore transmitted losslessly (4 bytes each); since these are relatively few, the required bit-rate overhead is negligible. Quantisation of the motion and depth parameters results in degradation of the reconstructed image quality. The prediction error resulting from parameter quantisation is larger when the area of the triangles of the mesh is large. Also, the deformation components in the x, y directions are more sensitive to


Table 1
Quantisation characteristics of wire-frame node parameters (bits per component)

                          100 nodes    200 nodes
z                             8            4
\Delta z                      4            2
\Delta x, \Delta y            6            3

quantisation than the component in the z direction. Due to finite image resolution, perturbations in the motion or depth parameters that perturb the perspective projections of 3D points by less than half a pixel have no effect on the prediction error. We have experimentally determined the number of bits that must be allocated to each parameter component so that the deterioration in reconstructed image quality is unnoticeable (less than 0.1 dB PSNR). The results are given in Table 1.
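The differential (DPCM-style) quantisation of the node depths may be sketched as follows; the uniform quantiser and its step size are illustrative, and the bit depths of Table 1 would bound the symbol range in practice. The encoder predicts each depth from the previously reconstructed one, exactly as the decoder will.

def dpcm_encode(depths, step):
    """depths: node depths in row-wise scan order -> integer error symbols."""
    symbols, pred = [], 0.0
    for z in depths:
        e = int(round((z - pred) / step))       # quantised prediction error
        symbols.append(e)
        pred = pred + e * step                  # track decoder reconstruction
    return symbols

def dpcm_decode(symbols, step):
    out, pred = [], 0.0
    for e in symbols:
        pred = pred + e * step
        out.append(pred)
    return out

The resulting symbols would then be entropy-coded together with the other side information.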

3. Motion compensation

In Section 2 we presented algorithms for the estimation of object motion and structure parameters using a pair of consecutive stereo images. These estimates are used to predict the texture, 2D shape and depth of the objects in the scene at a subsequent time instance. The resulting prediction error images are then encoded using intra-frame coding, as described in Section 4. In this section we focus on prediction of the right channel images. The control points P_i^t of the wire frame modeling the surface of an object at time t are mapped to a new position P_i^{t+1},

P_i^{t+1} = R P_i^t + T + \Delta_i.   (8)

The motion equation that backprojects the wire frame to time t is

P^t = R^T P^{t+1} - R^T T - R^T \Delta.   (9)

The above equation may be used to predict the 2D shape and texture at time t + 1. The motion compensation procedure is performed using well-known computer graphics techniques, namely

texture mapping and z-buffer hidden-surface elimination. It comprises the following steps:

1. For every triangle P_1 P_2 P_3 of the forward-projected wire frame at time instant t + 1 we find the image grid points that are inside the projection of the triangle on the image plane. The equation of the planar patch corresponding to triangle P_1 P_2 P_3 is given by

z = \alpha x + \beta y + \gamma,   (10)

where (x, y, z) is a point on the patch and \alpha, \beta, \gamma are obtained from the 3D coordinates of P_1, P_2, P_3. Using the projection equations X = fx/z, Y = fy/z in (10) we obtain

z = f\gamma / (f - \alpha X - \beta Y).   (11)

2. For every grid point (X_i, Y_i) found in step 1, we calculate the corresponding depth value z_i given by (11). If z_i is greater than the value stored in the z-buffer (a floating-point image buffer initialised to infinity) at (X_i, Y_i), the point in question is hidden by another point; otherwise, we continue with the next step.

3. Motion equation (4) is inverted as

(x, y, z)^T = R^T (x', y', z')^T - R^T T - R^T \Delta,

from which we calculate the image point (\hat{X}, \hat{Y}) at time t corresponding to (X_i, Y_i). If (\hat{X}, \hat{Y}) is located outside the region corresponding to the object in question, we continue with the next grid point. Since (\hat{X}, \hat{Y}) is not necessarily located on an integer grid point, its luminance value is evaluated by bilinear interpolation from the previously reconstructed image \hat{I}^t, and copied to the point (X_i, Y_i) of the predicted image. The label of the object in question is assigned to location (X_i, Y_i) of the updated object map; in this way object boundaries are updated.

4. The procedure is repeated for every object in the scene.

The procedure described above is illustrated graphically in Fig. 4.
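A condensed sketch of steps 1-3 in Python/NumPy, under the parallel-geometry conventions above (f the focal length, images indexed as [Y, X], zbuf initialised to infinity); image bounds checks and the object-map test of step 3 are omitted for brevity, and all helper names are illustrative:

import numpy as np

def plane_coeffs(P1, P2, P3):
    """alpha, beta, gamma of the patch z = alpha*x + beta*y + gamma (Eq. (10))."""
    A = np.array([[P[0], P[1], 1.0] for P in (P1, P2, P3)])
    return np.linalg.solve(A, np.array([P1[2], P2[2], P3[2]]))

def grid_points_inside(pts):
    """Integer grid points inside the 2D triangle pts = [(x1,y1),(x2,y2),(x3,y3)]."""
    (x1, y1), (x2, y2), (x3, y3) = pts
    def side(px, py, ax, ay, bx, by):
        return (px - bx) * (ay - by) - (ax - bx) * (py - by)
    for Y in range(int(min(y1, y2, y3)), int(max(y1, y2, y3)) + 1):
        for X in range(int(min(x1, x2, x3)), int(max(x1, x2, x3)) + 1):
            d1 = side(X, Y, x1, y1, x2, y2)
            d2 = side(X, Y, x2, y2, x3, y3)
            d3 = side(X, Y, x3, y3, x1, y1)
            if (d1 >= 0 and d2 >= 0 and d3 >= 0) or (d1 <= 0 and d2 <= 0 and d3 <= 0):
                yield X, Y

def bilinear(img, x, y):
    """Bilinearly interpolated luminance at the non-integer position (x, y)."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - x0, y - y0
    return ((1 - dx) * (1 - dy) * img[y0, x0] + dx * (1 - dy) * img[y0, x0 + 1] +
            (1 - dx) * dy * img[y0 + 1, x0] + dx * dy * img[y0 + 1, x0 + 1])

def predict_triangle(P1, P2, P3, f, R, T, D, prev, pred, zbuf):
    """Steps 1-3 for one forward-projected triangle at time t+1."""
    a, b, g = plane_coeffs(P1, P2, P3)
    proj = [(f * P[0] / P[2], f * P[1] / P[2]) for P in (P1, P2, P3)]
    for X, Y in grid_points_inside(proj):       # step 1: rasterise the triangle
        z = f * g / (f - a * X - b * Y)         # Eq. (11)
        if z >= zbuf[Y, X]:
            continue                            # step 2: point is hidden
        zbuf[Y, X] = z
        P = np.array([X * z / f, Y * z / f, z])     # 3D point at time t+1
        Pb = R.T @ P - R.T @ T - R.T @ D            # step 3: backproject to t
        Xb, Yb = f * Pb[0] / Pb[2], f * Pb[1] / Pb[2]
        pred[Y, X] = bilinear(prev, Xb, Yb)         # copy interpolated texture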

Fig. 4. Texture mapping procedure.

4. Coding of appearing background and prediction errors

Pixels belonging to the background map are copied to the prediction image. Moving objects are predicted using the motion parameters, as described in Section 3. Pixels of the prediction image not assigned any value are assumed to be background pixels appearing as objects move. Coding of these regions comprises encoding of their boundary and of their texture. In order to avoid artifacts at the object boundaries, the appearing background region boundaries are encoded losslessly. Run-length coding was seen to give slightly better results than boundary following for small, elongated regions; the resulting bit-stream is further compressed using a variable-length code table. Block-based intra-frame coding techniques such as the DCT are not suitable for coding the texture of arbitrarily shaped regions, because these techniques are efficient only for large block sizes (e.g. 8 x 8), while efficient covering of small, elongated regions requires small block sizes in order to maximise the number of pixels included in the blocks. For this reason, the shape-adaptive DCT [5] may be used instead; however, the compression produced by this algorithm is not sufficient in most cases [4]. In our approach, vector quantisation is used for texture coding. The region is segmented into 2 x 2 blocks. The vectors built from the luminance values of the current image for each block are then assigned the index of the code-book entry they are


closest to. A code book with 32 code-vectors was used, trained using the LBG algorithm [17] on the reference frame of the sequence; the code book is transmitted to the decoder at the start of the sequence (1024 bits). About 1-1.5 bits/pixel are needed using this vector quantisation technique. The quantised version of the background texture is kept in a buffer and the background map is updated to include the new regions; in this way no retransmission of the same pixels is required should they reappear at a future time. After the appearing background regions have been encoded, the prediction error image is formed by subtracting the prediction image from the original one. Prediction error images are subsequently encoded using the MPEG [16] intra-frame coding algorithm. In order to avoid accumulation of errors, every tenth frame, including the first one, is intra-coded.
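A bare-bones sketch of the code-book training and block encoding (an LBG-style loop in the spirit of [17]; the 32-entry code book and 2 x 2 blocks follow the text, giving 5 bits per block, i.e. 1.25 bits/pixel, and 32 x 4 x 8 = 1024 bits for the code book; the initialisation and split rules are simplified):

import numpy as np

def train_lbg(vectors, k=32, iters=20):
    """vectors: (N, 4) array of 2x2 luminance blocks from the reference frame."""
    rng = np.random.default_rng(0)
    book = vectors[rng.choice(len(vectors), k, replace=False)].astype(float)
    for _ in range(iters):
        d = ((vectors[:, None, :] - book[None, :, :]) ** 2).sum(axis=2)
        idx = d.argmin(axis=1)                  # nearest code vector
        for c in range(k):                      # centroid update
            if np.any(idx == c):
                book[c] = vectors[idx == c].mean(axis=0)
    return book

def encode_blocks(blocks, book):
    """blocks: (M, 4) array of 2x2 blocks -> code indices (5 bits each, k=32)."""
    d = ((blocks[:, None, :] - book[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)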

5. Experimental results

The proposed coding algorithm was investigated by computer simulations using several typical video-conference stereo image sequences. Results will be given for the sequences "sergio" [18] (256 x 256) and "Claude"^1 (360 x 288). Results for the coding of the luminance component of the right stereo channel will be presented; similar results are obtained for the left channel. The quality of the reconstructed image sequence coded using the object-based scheme will be compared with the quality of the images reconstructed using a basic implementation [25] of the Block Based Stereoscopic Coding (BBSC) scheme described in [24], at the same bit-rate. In BBSC the images of the right channel are bidirectionally predicted either from the corresponding image of the left channel using disparity compensation or from the previous right channel image using motion compensation. Image quality was evaluated by the peak signal-to-noise ratio (PSNR), given by

PSNR = 10 \log_{10} (255^2 / MSE) dB,

where MSE is the mean square error between the original and the predicted images.

^1 This sequence was prepared by Thomson Broadband Systems/Centre de Rennes in the RACE DISTIMA project.


Fig. 5. (a), (c) Left and right image of "sergio"; (b) interpolated view using a virtual camera in the middle of the baseline.

Fig. 6. Segmentation of the first frame from the right channel of "sergio" and the estimated wire-frame.

The bit-rate produced by the BBSC coder is regulated by varying the quantisation step-size; an intra-coded frame every 10 motion-compensated frames was used. For the object-based coder the bit-rate is controlled both by the quantisation step-size and by the number of wire-frame nodes. The results in this section were obtained by keeping the number of nodes fixed, while the quantisation step-size was allowed to vary so as to achieve various compression rates.
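For reference, the quality measure above as code (a direct transcription of the PSNR formula):

import numpy as np

def psnr(orig, recon):
    """Peak signal-to-noise ratio in dB for 8-bit luminance images."""
    mse = np.mean((orig.astype(float) - recon.astype(float)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)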


The left and right frames of the stereo image sequence "sergio" are shown in Fig. 5, together with an intermediate view generated by placing a virtual camera in the middle of the baseline. The segmentation of the object from the static background for a frame of this sequence is illustrated in Fig. 6, together with the estimated wire frame consisting of 300 nodes. The PSNR of the reconstructed sequence for various bit-rates, compared with BBSC, is shown in Fig. 7. In Table 2 we show how the total bit-rate is allocated to


Table 2
Bit allocation

                                              0.2 Mbps    0.5 Mbps
Intra - first frame                              52%         30%
Object structure + boundary + VQ tables           1%          1%
Motion parameters                                11%         12%
Appearing background                             23%         18%
Intra - error images                             13%         39%

Fig. 7. PSNR plot of the reconstructed image sequence "sergio" for various bit-rates (object-based vs. BBSC at 0.2 and 0.5 Mbps).

Fig. 8. (a), (c) Left and right image of "Claude"; (b) interpolated view using a virtual camera in the middle of the baseline (background was artificially generated).

Fig. 9. Segmentation of the first frame from the right channel of "Claude" and the estimated wire-frame.

the coding of the various object parameters and the prediction error. The left, right and generated intermediate view of the first frame of the stereo image sequence "Claude" are shown in Fig. 8. In Fig. 9 the segmentation and estimated wire frame are illustrated.



Fig. 10. PSNR plot of the reconstructed image sequence "Claude" for various bit-rates (object-based vs. BBSC at 0.5 and 1.0 Mbps).

The PSNR of the reconstructed right channel sequence for various bit-rates, compared with BBSC, is shown in Fig. 10. The superiority of the proposed object-based scheme over the block-based approach is clearly illustrated by the above results. The improvement is mainly due to the better motion-compensated prediction achieved by the model-based motion estimation algorithm. The total running time of the proposed coder is approximately 1 min per frame on a Silicon Graphics R-4000 processor, most of which is spent on the motion estimation stage. Decoding of the sequence may be performed in software in near real time, or in real time if graphics hardware is used to perform the texture mapping.

6. Conclusions

In this paper we have presented an algorithm for the coding of stereoscopic image sequences. The method is based on segmentation of the scene into objects and relies on efficient modeling and estimation of 3D object structure, motion and texture. The estimated object parameters are encoded and transmitted to the decoder, where the images are reconstructed using computer graphics techniques. Unlike other facial modeling techniques, the proposed algorithm does not require a complicated analysis phase, and it proved robust for the majority of the scenes examined.

The proposed algorithm may be used in a wide range of applications, including synthetic and natural hybrid coding, 3D-TV coding, computer graphics communication, video production and virtual reality. It is most appropriate for applications where object-based editing of the bit-stream is required, such as video production and synthetic and natural hybrid coding: preanalysed, encoded sequences may be efficiently combined to produce new sequences on demand. It is also highly appropriate for the encoding and production of multiview sequences, since it permits easy construction of intermediate views. Even for less demanding applications such as stereo video-conferencing, the technique was seen to consistently outperform the corresponding block-based techniques.

Acknowledgements

This work was supported by the EU CEC projects RACE DISTIMA (Digital Stereoscopic Imaging and Applications, RACE project R2045) and ACTS PANORAMA (Package for New Autostereoscopic Multiview Systems and Applications, ACTS project 092).

References

[1] K. Aizawa and T.S. Huang, "Model-based image coding: Advanced video coding techniques for very low bit-rate applications", Proc. IEEE, Vol. 83, February 1995, pp. 259-271.
[2] D.H. Ballard, "Generalizing the Hough transform to detect arbitrary shapes", Pattern Recognition, Vol. 13, 1981, pp. 111-122.
[3] J. Dugelay and D. Pele, "Motion and disparity analysis of a stereoscopic sequence. Application to 3DTV coding", EUSIPCO '92, October 1992, pp. 1295-1298.
[4] P. Gerken, "Object-based analysis-synthesis coding of image sequences at very low bit-rates", IEEE Trans. Circuits Systems Video Technology, Vol. 4, June 1994, pp. 228-235.
[5] M. Gilge, T. Engelhardt and R. Mehlan, "Coding of arbitrarily shaped image segments based on a generalized orthogonal transform", Signal Processing: Image Communication, Vol. 1, No. 2, October 1989, pp. 153-180.
[6] N. Grammalidis, S. Malassiotis, D. Tzovaras and M.G. Strintzis, "Stereo image sequence coding based on three-dimensional motion estimation and compensation", Signal Processing: Image Communication, Vol. 7, No. 2, August 1995, pp. 129-145.
[7] B.K.P. Horn, "Closed form solution of absolute orientation using unit quaternions", J. Opt. Soc. America, Vol. 4, 1987, pp. 629-642.
[8] T.S. Huang and A.N. Netravali, "Motion and structure from feature correspondences: A review", Proc. IEEE, Vol. 82, February 1994, pp. 252-268.
[9] M. Kass, A. Witkin and D. Terzopoulos, "Snakes: Active contour models", Proc. 1st Internat. Conf. on Computer Vision, 1987, pp. 259-269.
[10] R. Koch, "Automatic reconstruction of buildings from stereoscopic image sequences", Proc. EUROGRAPHICS '93, Vol. 12, 1993.
[11] K.F. Lai, "Deformable contours: Modeling, extraction, detection and classification", Ph.D. Thesis, University of Wisconsin-Madison, 1994.
[12] L. Lipton, "Compatibility issues and selection devices for stereoscopic television", Signal Processing: Image Communication, Vol. 4, No. 1, 1991, pp. 15-20.
[13] S. Malassiotis and M.G. Strintzis, "Optimal 3D mesh object modeling for depth estimation from stereo images", Proc. 4th European Workshop on 3D Television, Rome, October 1993.
[14] S. Malassiotis and M.G. Strintzis, "Model based joint motion and structure estimation from stereo images", Comp. Vision Graphics Image Process., Vol. 64, July 1996, to appear.
[15] H.G. Musmann, M. Hoetter and J. Ostermann, "Object-oriented analysis-synthesis coding of moving images", Signal Processing: Image Communication, Vol. 1, No. 2, October 1989, pp. 117-138.
[16] MPEG, "MPEG video simulation model three", Tech. Rep. ISO/IEC JTC1/SC2/WG11 N0010, MPEG 90/041, July 1990.
[17] N.M. Nasrabadi and R.A. King, "Image coding using vector quantization: A review", IEEE Trans. Commun., Vol. 36, August 1988, pp. 957-971.
[18] D.V. Papadimitriou and T.J. Dennis, "Stereo in model-based image coding", Proc. Picture Coding Symposium, Sacramento, California, September 1994.
[19] S. Pastoor, "3D-Television: A survey of recent research results on subjective requirements", Signal Processing: Image Communication, Vol. 4, No. 1, 1991, pp. 21-32.
[20] R. Skerjanc and J. Liu, "A three camera approach for calculating disparity and synthesizing intermediate pictures", Signal Processing: Image Communication, Vol. 4, No. 1, 1991, pp. 55-64.
[21] M.G. Strintzis, D. Tzovaras and N. Grammalidis, "Depth map and disparity field coding for the communication of multiview images", Proc. Internat. Conf. on Digital Signal Processing (DSP'95), Nicosia, Cyprus, June 1995.
[22] R.Y. Tsai and T.S. Huang, "Uniqueness and estimation of three-dimensional motion parameters of rigid objects with curved surfaces", IEEE Trans. Pattern Anal. Mach. Intell., Vol. PAMI-6, January 1984, pp. 13-26.
[23] J. Weng, T.S. Huang and N. Ahuja, "Motion and structure from two perspective views: Algorithms, error analysis and error estimation", IEEE Trans. Pattern Anal. Mach. Intell., Vol. 11, May 1989, pp. 451-476.
[24] M. Ziegler, "Digital stereoscopic television - State of the European project DISTIMA", Proc. 4th European Workshop on 3DTV, Rome, 1993.
[25] D. Tzovaras, M.G. Strintzis and H. Sahinoglou, "Evaluation of multiresolution block matching techniques for motion and disparity estimation", Signal Processing: Image Communication, Vol. 6, No. 1, March 1994, pp. 59-67.