Insights into low-level avatar animation and MPEG-4 standardization


Signal Processing: Image Communication 17 (2002) 717–741

Marius Preda, Françoise Preteux
Unité de Projets ARTEMIS, Institut National des Télécommunications, 9 rue Charles Fourier, 91011 Evry Cedex, France

Abstract

Referring to the new functionalities of video access and coding, the survey presented here lies within the scope of MPEG-4 activities related to virtual character (VC) animation. We first describe how Amendment 1 of the MPEG-4 standard offers an appropriate framework for virtual human animation, gesture synthesis and compression/transmission. Specifically, face and body representation and animation are described in detail in terms of node syntax and animation stream encoding methods. Then, we discuss how this framework is extended within the ongoing standardization efforts by (1) allowing the animation of any kind of articulated model, and (2) addressing advanced modeling and animation concepts such as the "skin and bones" approach. The new syntax for node definition and animation streams is presented and discussed in terms of genericity and additional functionalities. The biomechanical properties, modeled by means of the character skeleton that defines the bone influence on the skin region, as well as the local spatial deformations simulating muscles, are supported by specific nodes. Animating the VC consists in instantiating bone transformations and muscle control curves. Interpolation techniques, inverse kinematics, discrete cosine transform and arithmetic encoding techniques make it possible to provide a highly compressed animation stream. The new Animation Framework eXtension tools are finally evaluated in terms of realism, complexity and transmission bandwidth within a sign language communication system. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Virtual character (VC) animation; MPEG-4 standard; Face and body animation (FBA); Skin&Bones (SB); Bone-based animation (BBA); Low-bit-rate compression; Animation Framework eXtension (AFX)

1. Introduction

The continuous development of multimedia software and hardware technologies, together with the explosive growth of the Internet, motivates an increasing interest in effective compression tools for audio-visual content in order to drastically reduce the cost of data transmission in multimedia environments. The Moving Picture Experts Group

(MPEG) aims at providing standardized core technologies allowing efficient storage, transmission and manipulation of audio/video/3D data. MPEG-4, an international standard since December 1999 [29], is specifically intended to cope with the requirements of multimedia applications, allowing new functionalities like video manipulation, scalable video encoding, synthetic and natural hybrid coding, 3D object compression, and face and body animation (FBA) and coding. Parametric model-based video coding provides a very low bit-rate compression, making possible



applications such as video conferencing and videotelephony, mobile communications and mobile multimedia-related applications.

Character animation is a well-known issue in 3D-related applications, especially because of the desire to use human beings as avatars in virtual or hybrid environments. As part of Amendment 1 of the MPEG-4 standard, the Synthetic and Natural Hybrid Coding group [29] addresses the animation of human avatars, the so-called face and body animation (FBA), by specifying (1) the avatar definition data representation, and (2) the animation data representation and compression. With the same objective of low-bit-rate streamed animation, the Animation Framework eXtension (AFX) group has recently adopted the specifications of the so-called bone-based animation (BBA) [52], allowing a more realistic animation of any kind of articulated character. AFX, which will be part of Amendment 4 of the MPEG-4 standard at the end of 2002, extends the 3D-related technologies of MPEG-4 Amendment 1 and addresses new ones such as subdivision surfaces, volume representation, view-dependent texture and mesh transmission.

In the last decade of the 20th century, networked graphics applications using virtual characters (VCs) were mostly prototype systems [7,13] demonstrating the effectiveness of the technology. At the beginning of the 21st century, the related commercial systems are booming, mainly thanks to technical developments in the area of networked games [2,19]. In this context, current efforts for providing real applications within a unified and interoperable framework are materialized by 3D graphics interchange standards such as VRML [55] and MPEG-4, with AFX aiming to become the future 3D interchange reference.

Animating a 3D VC involves continuously changing its shape. However, different animation models can be applied. They can be structured hierarchically according to the animation control type, namely geometric, kinematic, physical, behavioral and cognitive, as proposed in [24] and illustrated in Fig. 1. When dealing with geometric models, the animation controls act directly at the VC vertex level.

Fig. 1. CG animation modeling hierarchy.

Kinematic models make it possible to group vertices into subsets governed by specific kinematic rules. More complex animation controllers use the VC physical properties to generate motion through dynamic simulation. Behavioral modeling allows self-animating characters that react to environmental stimuli. Recent research activities on self-learning VCs have been proposed as cognitive animation models. Let us note that the animation control precision (in terms of resolution) decreases from the low levels of the pyramid to the higher ones. Moreover, in order to achieve the final VC animation, and thus to deform the VC shape, the high-level controllers have to provide the low-level parameters. In order to represent animation independently of the controllers used, our goal, within a standardization process, is to offer a generic and, at the same time, compact representation of the low-level animation model. Thus, we address in this paper aspects related to the first two levels of the hierarchy: the geometric and kinematic levels.

A comprehensive view of the evolution of 3D VC-related techniques is presented in Section 2. This survey is structured according to the three major components of an animation system: (1) modeling, (2) animation, and (3) motion capture, and is based on relevant techniques reported in the literature or used by commercial products. Special attention is paid to communication systems for 3D animation data (networked virtual environments).


Section 3 introduces the basic concepts related to the virtual human animation tools as defined in Amendment 1 of the MPEG-4 standard. The MPEG-4 virtual human body object is described in terms of (1) definition parameters, specifying the model properties (topology, geometry, texture and color), and (2) animation parameters, defining the 3D pose of the avatar. Section 4 shows how the FBA framework is extended within the ongoing standardization process into the BBA framework. Aiming at animating any kind of articulated model, the AFX is extensively described. The advanced modeling concepts then addressed rely on the skin-and-bones (SB) representation and curve-based deformations. The new syntax for node definition is analyzed in detail for the SBBone, SBSegment, SBSite, SBMuscle and SBSkinnedModel nodes. Then, the syntax of the animation stream is discussed in terms of rotation representation, interpolation methods, and animation mask and value parameters. In Section 5, the FBA and BBA frameworks are comparatively evaluated within a specific application related to a sign language communication system. Realistic animation capabilities, compression performance, and compatibility with existing broadcast technologies are the key criteria experimentally discussed. Finally, concluding remarks and perspectives of future work are reported in the last section.

2. 3D VC animation in standalone and networked virtual environments

The first 3D virtual human model was designed and animated by computer in the late 1970s. Since then, VC models have become more and more popular, and a growing population of them is now able to impact the everyday real world. From the simple, easy-to-control models used in commercial games [2,19], to more complex virtual assistants for commercial [33] or informational web sites [28], to the new stars of virtual cinema [25], television [56] and advertising [5], the 3D character model industry is currently booming.


Moreover, the steady improvements within the distributed network area in terms of bandwidth capabilities, bit-rate performance and advanced communication protocols have promoted the emergence of 3D communities [8] and immersion experiences [36] in distributed 3D virtual environments. Here, we present a comprehensive view of the evolution of 3D VC-related techniques. This brief overview is structured according to the three major components of an animation system, namely (1) modeling, (2) animation, and (3) motion capture, which are strongly interconnected and application dependent, as we shall show in the sequel. Special attention will be paid to communication systems for 3D animation data, in other words the networked virtual environments.

2.1. VC modeling

VC modeling consists in specifying the geometry and the appearance properties (color, material, texture) of a model. Designing a VC can be achieved either according to a segment-based approach or within a seamless framework. The so-called segmented character is defined as a hierarchical collection of rigid 3D objects, referred to as segments. The so-called seamless VC is geometrically defined as a single, continuous mesh. The most relevant techniques used in character animation software packages for specifying the model geometry refer to surface-based representations, and more specifically to (1) polygonal meshes, (2) parametric equations, and (3) implicit representations.

The polygonal surface-based technique consists of explicitly specifying the set of planar polygons which compose the 3D object surface and of connecting them together [23]. Two types of information have to be kept about an object, namely purely geometric information (coordinates of vertices) and topological information (which details how the geometric entities relate to each other). Historically, the polygonal surface-based technique is the first method introduced in computer graphics and remains the basic tool used at the rendering stage by any other surface representation technique.


Its main advantage is the capability to define any complex surface, the smoothness depending on the number of polygons used. However, increasing the polygon number may significantly degrade the animation performance. The strategy developed in this case consists of deriving a lower-resolution version of the VC, animating/deforming this version, and recovering the full-resolution version by applying a subdivision surface method [15,34,62].

Non-planar parametric representations of a surface are widely used in computer graphics because of their easy computation and simple mathematical manipulation. A non-planar surface patch, i.e. an elementary curved surface entity, is defined as the surface traced out as two parameters (u, v) sweep [0,1]^2 in a two-parameter representation:

\[ P(u,v) = [\,x(u,v),\; y(u,v),\; z(u,v)\,]. \qquad (1) \]

According to the curve degree, the patch surface can be of linear, cardinal, B-spline or Bézier type. The patch-based modeling method aims at generating patches with respect to curve profile(s) and stitching them together in order to build complex surfaces. Specific constraints have to be introduced in order to properly build a patch-based VC. In particular, it is recommended to (1) completely cover a joint with a unique patch surface, and (2) ensure the continuity of the mesh at the patch borders by identically deforming adjacent frontiers. Mathematically, the patch-based surface representation is the limit surface resulting from the convergence of an iterative surface subdivision procedure. A NURBS-based surface [43] is an extension of the B-spline patch obtained by weighting the influence of each control point. A NURBS surface of degree (p, q) is defined as

\[ P(u,v) = \frac{\sum_{i=0}^{m}\sum_{j=0}^{n} N_{i,p}(u)\,N_{j,q}(v)\,w_{i,j}\,P_{i,j}}{\sum_{i=0}^{m}\sum_{j=0}^{n} N_{i,p}(u)\,N_{j,q}(v)\,w_{i,j}}, \qquad (2) \]

where N_{i,p} and N_{j,q} are the B-spline basis functions, P_{i,j} the control points and w_{i,j} the weight of P_{i,j}. A non-uniform weighting mechanism increases the flexibility of the representation, which becomes suited to modeling a wide range of shapes with very different curvature characteristics while minimizing the number of control points.
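For illustration only, the following minimal Python sketch evaluates Eq. (2) at a parameter pair (u, v); the basis functions are computed with the Cox-de Boor recursion, and the function names and data layout are ours, not part of any MPEG-4 or VRML specification.

def bspline_basis(i, p, u, knots):
    # Cox-de Boor recursion for the B-spline basis function N_{i,p}(u)
    if p == 0:
        return 1.0 if knots[i] <= u < knots[i + 1] else 0.0
    value = 0.0
    if knots[i + p] != knots[i]:
        value += (u - knots[i]) / (knots[i + p] - knots[i]) * bspline_basis(i, p - 1, u, knots)
    if knots[i + p + 1] != knots[i + 1]:
        value += (knots[i + p + 1] - u) / (knots[i + p + 1] - knots[i + 1]) * bspline_basis(i + 1, p - 1, u, knots)
    return value

def nurbs_point(u, v, ctrl, weights, knots_u, knots_v, p, q):
    # Eq. (2): rational, weighted sum of the control points P_{i,j};
    # ctrl is an (m+1) x (n+1) grid of 3D points, weights the matching grid of scalars w_{i,j}
    num = [0.0, 0.0, 0.0]
    den = 0.0
    for i in range(len(ctrl)):
        n_u = bspline_basis(i, p, u, knots_u)
        if n_u == 0.0:
            continue
        for j in range(len(ctrl[0])):
            c = n_u * bspline_basis(j, q, v, knots_v) * weights[i][j]
            den += c
            for k in range(3):
                num[k] += c * ctrl[i][j][k]
    return [x / den for x in num]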

Metaballs [10,61], also known as blobby objects, are a type of implicit modeling technique. A metaball can be interpreted as a particle surrounded by a density field. The density assigned to the particle (its influence) decreases with the distance to the particle location. A surface is implied by taking an isosurface through this density field: the higher the isosurface value, the nearer it will be to the particle. The key to using metaballs is the equation specifying the influence of an arbitrary particle on an arbitrary point. Blinn [10] used exponentially decaying fields for each particle,

\[ C(r) = b\,e^{-ar}, \qquad (3) \]

and Wyvill et al. [61] defined a cubic polynomial based on the radius of influence R of a particle and the distance r from the center of the particle to the field location considered,

\[ C(r) = 2\,\frac{r^3}{R^3} - 3\,\frac{r^2}{R^2} + 1. \qquad (4) \]

The power of metaballs lies in the way they can be combined. By simply summing the influences of each metaball at a given point, we obtain very smooth blendings of the spherical influence fields (Fig. 2), allowing realistic, organic-looking shapes to be represented. However, achieving a real-time implementation of implicit surface rendering remains difficult, limiting the use of metaballs to off-line productions, as in the cinema industry [16]. Moreover, mapping textures onto such a geometry is a difficult task. Usually, the models designed with this technique have their colors specified at the vertex level or are covered by uniform (solid) colors.
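As a simple illustration of this blending mechanism (and not of any standardized syntax), the Python sketch below sums the cubic fall-off of Eq. (4) over a hypothetical set of metaballs and tests a point against a chosen isosurface value.

import math

def wyvill_falloff(r, R):
    # Eq. (4): cubic fall-off; the influence vanishes beyond the radius of influence R
    if r >= R:
        return 0.0
    return 2.0 * r**3 / R**3 - 3.0 * r**2 / R**2 + 1.0

def field(point, balls):
    # Sum the influences of all metaballs (center c, radius R) at the given point
    return sum(wyvill_falloff(math.dist(point, c), R) for c, R in balls)

# Two overlapping metaballs blend smoothly: the point midway between them
# lies inside the implied surface for an isosurface value of 0.5.
balls = [((0.0, 0.0, 0.0), 1.0), ((0.8, 0.0, 0.0), 1.0)]
inside = field((0.4, 0.0, 0.0), balls) >= 0.5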

Fig. 2. Metaballs blending.


In practice, designing a VC highly depends on the animation requirements. For example, a character involved in a gaming environment is usually modeled with simple polygons, while in the case of an artistic movie, characters may be either modeled as complex polygonal surfaces obtained by applying subdivision surface techniques, or directly defined as NURBS surfaces. Let us now see how the animation of a 3D VC strongly depends on the modeling type.

2.2. VC animation

Animating a segmented character consists in applying affine transforms to each segment. The main advantages of such a simple technique are real-time capabilities on basic hardware configurations and intuitive motion control. However, such an animation results in seams at the joint level between two segments (Figs. 3(a) and (b)). This undesirable effect can be more or less overcome by ad hoc methods, such as introducing spheres at the joint level, or 3D objects masking the joints. This is the case when dealing with cartoon-like animation. Nevertheless, more realistic animation requires handling local deformations.

Fig. 3. Virtual humanoid animation.


Animating a seamless character consists in applying deformations at the skin level. The major 3D mesh deformation approaches can be classified into five categories:

1. Lattice-based: a lattice is a set of control points, forming a 3D grid, that the user positions to control a 3D deformation. Points falling inside the grid are mapped from the unmodified lattice to the modified one using smooth interpolation.
2. Cluster-based: grouping some vertices of the skin into clusters makes it possible to control their displacements with the same parameters.
3. Spline-based: spline and, in general, curve-based deformation allows a mesh to be deformed according to the deformation of a curve. Further details are reported in Section 4.
4. Morphing-based: the morphing technique consists in smoothly changing one shape into another. Such a technique is very popular for animating virtual human faces.
5. Skeleton-based.

The first four categories are used in specific application cases and are more or less supported by the main animation software packages. The last category, more and more often adopted in VC animation systems, involves the concept of a skeleton. The skeleton of a VC is a set of semantic information composed in a hierarchical structure of elementary entities called bones. The seamless character is deformed by applying rigid transforms to the skeleton bones. These transforms induce displacements of a subset of the skin vertices. This technique avoids seams at the joint level (Figs. 3(c) and (d)). The skeleton-based animation concept will be described in detail in Section 4.

To design the VC skeleton, an initialization stage is required: the designer has to specify the influence region of each bone of the skeleton as well as a measure of influence. This stage is mostly interactive and is repeated recursively until the desired animation effects are reached. Deforming a seamless character remains a complex and time-consuming task, requiring dedicated hardware. However, the increase in the performance capabilities of computer graphics hardware contributes to promoting the skeleton-based animation


technique as the "standard" in 3D applications [1,31,37]. The current trend is in favor of subdividing the VC into seamless subparts and deforming each one independently of the others. This partitioning strategy helps to fulfill the real-time requirements.

Within the low-level animation context, the question that arises is to identify the nature (namely, kinematic or dynamic) and the minimal number of the critical parameters which have to be specified. Kinematic parameters are naturally involved within a forward or inverse kinematics framework. Within a forward kinematics (FK) approach, the critical parameters correspond to the geometric transform applied to each bone of the skeleton. Within an inverse kinematics (IK) approach, the critical parameters correspond to the geometric location of an end effector. If, in the past, real-time animation based on IK was not possible because of the complexity of solving the IK equations, today almost all animation packages support both animation methods. Dynamic parameters refer to the physical properties of the 3D virtual object, such as mass or inertia, and to the way external or internal forces interact with the object. Such physics-based attributes have been introduced since 1985 in the case of virtual human-like models [3,59]. Extensive studies [6,11] on human-like virtual actor dynamics, and control models for dedicated motions (walking, running, jumping, etc.) [40,41,60], have been carried out. Recently, Faloutsos et al. [21] proposed a framework making it possible to exchange controllers, i.e. sets of parameters which drive a dynamic simulation of the character and which are evolved using the goals of the animation as an objective function, resulting in physically plausible motion. Even if some positive steps have been achieved for dedicated motions, dynamically simulating articulated characters able to perform a wide range of motor skills is still a challenging issue, far from a standardized solution. However, skeleton-based animation with a kinematic representation of the animation parameters is currently the state-of-the-art technology for VC modeling/animation, and is mature enough to be considered within an international standardization process.
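To make the forward kinematics reading of these parameters concrete, here is a minimal Python sketch (our own notation, not MPEG-4 syntax) that composes the per-bone local transforms down a skeleton hierarchy in order to obtain the global pose of each bone.

import numpy as np

def local_transform(rotation_3x3, translation):
    # Build a 4x4 homogeneous matrix from a bone's local rotation and translation
    m = np.eye(4)
    m[:3, :3] = rotation_3x3
    m[:3, 3] = translation
    return m

def global_transforms(parents, locals_):
    # parents[k] is the index of bone k's parent (-1 for the root); bones are
    # assumed to be ordered so that a parent always precedes its children.
    out = []
    for k, local in enumerate(locals_):
        out.append(local if parents[k] < 0 else out[parents[k]] @ local)
    return out

# Two-bone chain: the child bone inherits the transform of the root
root = local_transform(np.eye(3), [0.0, 1.0, 0.0])
child = local_transform(np.eye(3), [0.0, 0.5, 0.0])
poses = global_transforms([-1, 0], [root, child])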

2.3. Motion capture systems

Computer animation of VCs and interactive virtual environments require that a motion-generating source be available. The first animation technique was imported from cartoon animation studios: a well-trained animator roughly draws the character in the key poses and other animators "interpolate" in-between. When adopting such an animation scheme in the case of a 3D VC, the role of the "less-trained" person is played by the computer. Indeed, the key-frame-based interpolation technique is a widespread method in 3D applications, perfectly mastered by the computer. The main difficulty, however, is to find the well-trained animator. To overcome this constraint, motion capture systems were developed in the late 1970s. The characteristics of an efficient motion capture system are the accuracy of the captured movements, the ability to capture unconstrained movement, and the robustness and speed of the calibration procedure.

Motion capture technologies are generally classified into active and passive sensor-based capture, according to the nature of the sensors used. Within an active sensor-based system, the signals to be measured are transmitted by the sensors, while, within a passive sensor-based system, the signals to be measured are obtained by light reflections on the sensors. One of the earliest methods, using mechanical sensors [22] as active sensors, was the prosthetic system, which consists of a set of armatures attached all over the performer's body and connected by a series of rotational and linear encoders. Reading the status of all the encoders makes it possible to analyze and retrieve the performer's poses. The so-called acoustic method [49] is based on a set of sound transmitters pasted on the performer's body. They are sequentially triggered to output a signal, and the distances between transmitters and receivers are computed as a function of the time needed for the sound to reach the receivers. The 3D position of the transmitter, and implicitly of the performer's segment, is then computed by using a triangulation procedure or phase information.


The most popular method for motion capture is based on magnetic fields [4,44]. Such a system is made of one transmitter and several magnetic-sensitive receivers pasted on the performer's body. The magnetic field intensity is measured at the receiver side, and the location and orientation of each receiver are computed accordingly. Let us note that such a system is very sensitive to metallic environments and special care has to be taken. More complex active sensors are based on optical fibers: the principle consists in measuring the light intensity passing through the flexed fiber. Such a system is usually used to equip devices like data-gloves [58]. The last class of active sensors is based on accelerometers, small devices which measure the acceleration of the body part they are attached to. When using active sensors, the real actor must be equipped with many cables, limiting the freedom of motion. However, recent motion capture systems based on wireless communication are very promising [4,44].

The second class of motion capture techniques uses passive sensors. One camera, coupled with a system of properly oriented mirrors, or several cameras, makes 3D object reconstruction from multiple 2D views possible. To reduce the complexity of the analyzer, markers (light-reflective or LEDs) are attached to the performer's body. The markers are detected in each camera view and the 3D position of each marker is computed. However, occlusions due to the performer's motions may occur, and additional cameras are generally integrated in order to reduce the loss of information and the ambiguities. Since 1995, computer vision-based motion capture has become an increasingly challenging issue when dealing with tracking, pose estimation and gesture recognition oriented towards human motion capture. Several analysis methods have been reported [38], but the current limitations do not allow us to consider computer vision-based motion capture as a mature and accurate motion capture technique.
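As an illustration of the passive, marker-based approach, the Python sketch below triangulates a single marker from two calibrated camera views with the standard linear (DLT) method; the projection matrices and pixel coordinates are hypothetical inputs, and a real system would use more views and handle occlusions and outliers.

import numpy as np

def triangulate_marker(P1, P2, x1, x2):
    # P1, P2: 3x4 projection matrices of two calibrated cameras
    # x1, x2: (u, v) pixel positions of the same marker in the two views
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # Homogeneous least-squares solution: right singular vector of the smallest singular value
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]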

2.4. Networked virtual environments

Core technologies being available for modeling and animating characters, real advances can be made towards creating communities populated by 3D citizens and immersion experiences, as described in [36]: "Networking coupled with highly interactive technology of virtual worlds will dominate the world of computers and information technology". The first complete networked virtual environment (NVE), called the virtual environment operating shell [12], was developed by the University of Washington. Since then, an increasing number of NVEs have been proposed for special application fields or according to special architecture constraints, for example: dVS [17], created for interacting with virtual prototypes of CAD products; DIVE [14], a system which uses peer-to-peer communication; NPSNET [35], which simulates a battlefield; and MASSIVE [26], which combines graphics, audio and text interfaces. The advanced functionality offered by such environments is the capability of immersion into the virtual world. VLNET [13,42] is one of the first such environments offering realistic human representations. NVEs cover a wide range of applications: operations in dangerous environments, scientific visualization, medicine, rehabilitation and help for disabled people, psychiatry, architectural visualization, education, training, and entertainment.

NVEs most often use proprietary solutions for scene graph definition and animation. Current standardization efforts within VRML and MPEG-4 (BIFS 3D, AFX, and Multiuser Worlds) propose a unified framework ensuring interoperability. Moreover, MPEG-4 offers low-bit-rate compression methods for the scene graph components: geometry, appearance and animation. For applications dealing with populated 3D NVEs, the FBA and, more recently, the AFX working groups have created a 3D character representation and animation framework which:

* is generic enough to accept any character, from simple to complex;
* ensures realistic rendering as well as cartoon-specific effects;
* can be easily plugged into existing motion capture techniques;
* supports streaming animation compressed at a low bit-rate.


The following two sections present and discuss in detail the FBA and AFX specifications.

3. MPEG-4 face and body animation

The first efforts to standardize the animation of a VC within MPEG-4 were finalized at the beginning of 1999 and dealt with specifications for defining and animating a human VC. The first version of the standard addresses the animation of a virtual human face, while Amendment 1 contains the specifications related to virtual human body animation. In order to define and animate a virtual actor, MPEG-4 introduces the FBA object. Conceptually, the FBA object is a collection of nodes in a scene graph which are animated by the FBA object bitstream. The shape and appearance of the face are controlled by the bitstream instantiating the facial definition parameters, while facial expressions and animation are controlled by the bitstream instantiating the facial animation parameter node. The virtual body geometry and color attributes are controlled by the body definition parameters (BDPs), and the avatar motion by the body animation parameters (BAPs). Within the MPEG-4 framework, face animation can be performed at a high level, by using a standardized set of expressions and visemes, as well as at a low level, the standard defining a number of feature points on the virtual human face. Face animation within MPEG-4 has already been presented and discussed in detail [18,20,32] and is not the purpose of the present paper. Animation of the human body within the MPEG-4 standard, however, is less reported in the literature [47,51] and is addressed below.

The MPEG-4 body object is a hierarchical graph consisting of nodes associated with anatomical segments and edges defining subpart relationships. Each segment is individually specified and animated by means of two distinct bitstreams, referred to as BDPs and BAPs. BDPs control the intrinsic properties of a segment, namely its surface local topology, geometry and texture. BAPs define the extrinsic properties of a segment, i.e. its 3D pose with respect to a reference frame attached to the parent segment.

BDPs are actor-specific; hence, the overall morphology of an actor can be readily altered by overriding the current BDPs. Contrary to BDPs, BAPs are meant to be generic: if correctly interpreted, a given set of BAPs will produce perceptually reasonably similar results, in terms of motion, when applied to different actor models specified by their own BDPs. Let us show how (1) starting from a non-articulated and static VRML humanoid model, a complete segmentation into anatomical subparts is performed, (2) a hierarchical graph specifies the body as an articulated model, and (3) the BAP generation issue is addressed. The model segmentation procedure presented in Section 3.1, as well as the BAP production in Section 3.2, are not standardized by MPEG-4, but proposed by the authors as part of an authoring tool.

3.1. 3D virtual human body modeling

The MPEG-4 body object is a segmented character as defined in Section 2. The character definition, strongly based on the H-Anim V2.0 specifications [27], uses scene graph nodes, namely the Humanoid node, Joint node, Segment node and Site node. In addition, the set of human body joints (names and hierarchical topological graph) is standardized and available. Decomposing a VRML model into anatomical subparts is performed by a supervised polygonal mesh propagation algorithm. The principle is to create the vertex and face adjacency lists of the submeshes associated with the anatomical segments. The algorithm involves the following three main steps. The first one, the initialization step, aims at interactively creating an ordered list of mesh vertices, the so-called joint location control vertices (JLCVs). The designer has to select between 3 and 7 vertices on the mesh at each joint level (Figs. 4(a,b) and 5(a,b)). The second step automatically generates the related joint vertices from the JLCVs, constituting the so-called joint contour (JC). A 3D propagation-based method including angular constraints is applied to any couple of successive JLCVs, (v_i, v_j) (the last JLCV is coupled with the first one). For each vertex v belonging to the one-order neighborhood of v_i, the angle (v, v_i, v_j) is computed.


Fig. 4. Segmenting upper arm of the 3D model: selecting first JC.

The vertex v_min which minimizes the angular measure is selected. The procedure is iterated by replacing v_i with v_min, and stops as soon as there exists a vertex v (one-order neighbor of the current v_i) such that v = v_j. The set of selected vertices forms the connection line (CL) between the initial v_i and v_j. By concatenating all the CLs, we obtain the JC associated with a joint (Figs. 4(d) and 5(d)). The last stage constructs the anatomical segment related to two adjacent JCs. The same propagation procedure is applied to an arbitrary pair of vertices v_a and v_b, each one belonging to one of the JCs (Fig. 6(a)). An arbitrary vertex v_k belonging to the CL obtained with respect to v_a and v_b is selected (Fig. 6(b)). A 3D geodesic reconstruction (iterative elementary geodesic dilation) is applied to v_k with respect to the mesh surface limited by the two JCs (Fig. 6(c)). Finally, the anatomical segment is the submesh made of the two JCs and the component reconstructed from v_k (Fig. 6(d)).
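A minimal Python sketch of the propagation step between two successive JLCVs is given below. The mesh is represented by a hypothetical one-ring adjacency map and a vertex-position array, and the degenerate cases a production tool would have to handle (e.g. cycles) are ignored.

import numpy as np

def connection_line(vi, vj, one_ring, pos):
    # Greedy walk from vi towards vj: at each step pick the one-ring neighbour v
    # of the current vertex that minimizes the angle (v, current, vj).
    path = [vi]
    current = vi
    while vj not in one_ring[current]:
        target_dir = pos[vj] - pos[current]
        def angle(v):
            d = pos[v] - pos[current]
            c = np.dot(d, target_dir) / (np.linalg.norm(d) * np.linalg.norm(target_dir))
            return np.arccos(np.clip(c, -1.0, 1.0))
        current = min(one_ring[current], key=angle)
        path.append(current)
    path.append(vj)
    return path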


Fig. 5. Segmenting upper arm of the 3D model: selecting second JC.

Subpart relationships between the resulting anatomical segments are represented in a hierarchical graph and follow the H-Anim specifications [27]. The tree structure of the graph defines for each component one parent node and possibly several child nodes. How does one represent the animation parameters of such an MPEG-4 compliant human body in a compact way?

3.2. 3D virtual human body animation

The key concept for BAP representation is the orientation of any anatomical segment as the composition of elementary rotations, namely twisting, abduction and flexion. 296 angular joint values are enough to specify any 3D pose of a virtual avatar.


Fig. 7. Arm rotations with respect to standardized rotation planes.

Fig. 6. Segmenting upper arm of the 3D model: geodesic dilation.

Angular values are specified with respect to the local 3D coordinate system of the anatomical segment. The origin of the local coordinate system is defined as the gravity center of the JC common to the considered anatomical segment and its parent. The rotation planes are specified (Fig. 7) and the anatomical segment rotation axes are standardized (Fig. 8). In order to facilitate BAP manipulation, we have developed the so-called ARTEMIS avatar animation interface (3AI) (Fig. 9). The 3AI is a user-friendly C++ interface, available on X11/Motif and Windows platforms, which offers the following functionalities: (1) BAP editing, including basic and advanced instantiation techniques such as linear, spline and spherical interpolation and IK; (2) 3D compositing of objects such as images, video sequences, human body models or anatomical part models (hand, arm), and 3D scenes; (3) calibration of the 3D body model according to the anthropometric characteristics of the actor in the video sequence (dimensions of the palm, length of the fingers, etc.);

Fig. 8. (a), (b) Standardized rotation axes attached to shoulder, elbow, wrist and fingers.

(4) interactive extraction of the BAPs specifying any gesture posture or corresponding to the posture shown in the video sequence; (5) animation of the virtual human model according to a local resource (BAP file) or a remote resource (through UDP-based communication). The 3AI has been used to generate the MPEG-4 BAP data set for the alphabet letters used in American Sign Language (ASL) [48].


In order to provide realistic 3D deformations like muscle contraction or clothing folds and adjustments, as well as to avoid the seams at the joint level induced by the animation, MPEG-4 FBA includes a deformation modeling tool achieved by instantiating body deformation tables (BDTs). BDTs address non-rigid motion by specifying a list of vertices of the 3D model as well as their local displacements as functions of BAPs. Let us demonstrate the BDT concept for deforming a finger shape:

BodyDefTable {
  bodySceneGraphNodeName "l_ring_proximal"
  bapIDs [143]
  vertexIds [40, 50, 51, 52, 53, 54]
  bapCombinations [100, 200, 300, 450, 500]
  displacements [-0.11 0.12 0, 0.1 0 0, 0.1 0 0, 0.1 0 0, 0.1 0 0, 0.1 0 0,
                 -0.25 0.25 0, 0.22 0 0, 0.22 0 0, 0.22 0 0, 0.22 0 0, 0.22 0 0,
                 -0.35 0.37 0, 0.3 0 0, 0.3 0 0, 0.3 0 0, 0.3 0 0, 0.3 0 0,
                 -0.4 0.47 0, 0.41 0 0, 0.41 0 0, 0.41 0 0, 0.41 0 0, 0.41 0 0,
                 -0.52 0.6 0, 0.48 0 0, 0.48 0 0, 0.48 0 0, 0.48 0 0, 0.48 0 0]
}
BodyDefTable {
  bodySceneGraphNodeName "l_ring_middle"
  bapIDs [143]
  vertexIds [0, 1, 2, 3]
  bapCombinations [100, 200, 300, 450, 500]
  displacements [-0.1 0 0, -0.09 0 0, -0.08 0 0, -0.1 0 0,
                 -0.22 0 0, -0.2 0 0, -0.21 0 0, -0.22 0 0,
                 -0.29 0 0, -0.28 0 0, -0.3 0 0, -0.3 0 0,
                 -0.35 0 0, -0.37 0 0, -0.39 0 0, -0.4 0 0,
                 -0.4 0 0, -0.4 0 0, -0.45 0 0, -0.45 0 0]
}

Fig. 9. The ARTEMIS avatar animation interface and its main functionalities (BAP editing, interpolation and UDP-based communication).

In this case, the BDTs refer to the deformation of the segments l_ring_proximal and l_ring_middle as functions of BAP #143, l_ring_flexion2. The vertices with indices 50, 51, 52, 53 and 54 on the surface l_ring_proximal are deformed. The displacements for vertex 50 are (-0.11, 0.12, 0), (-0.25, 0.25, 0), (-0.35, 0.37, 0), (-0.4, 0.47, 0) and (-0.52, 0.6, 0) for the BAP values 100, 200, 300, 450 and 500, respectively. By controlling the shape at the vertex level, muscle-like deformations can be achieved. However, realistic deformations are possible only if the deformation tables are large enough and involve a significant number of vertices (usually up to 30). For compactness in the specification of the deformation field, the standard allows a BDT interpolation method exploiting reference BDTs associated with key frames. Fig. 10 shows the results of the BDT-based interpolation technique in the case of a simple finger flexion movement. A movie presenting the BDT deformation mechanism for the finger flexion is available at wwwartemis.int-evry.fr/~preda/2002ICJ.
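As an illustration (and not the normative interpolation rule), the Python sketch below linearly interpolates a vertex displacement for a BAP value lying between two of the bapCombinations keys above; the sample values are those quoted for vertex 50.

import numpy as np

def bdt_displacement(bap_value, bap_keys, key_displacements):
    # Piecewise-linear interpolation of the displacement between BDT key BAP values
    keys = np.asarray(bap_keys, dtype=float)
    disp = np.asarray(key_displacements, dtype=float)
    if bap_value <= keys[0]:
        return disp[0]
    if bap_value >= keys[-1]:
        return disp[-1]
    k = int(np.searchsorted(keys, bap_value))
    t = (bap_value - keys[k - 1]) / (keys[k] - keys[k - 1])
    return (1.0 - t) * disp[k - 1] + t * disp[k]

keys = [100, 200, 300, 450, 500]
disps = [[-0.11, 0.12, 0], [-0.25, 0.25, 0], [-0.35, 0.37, 0], [-0.4, 0.47, 0], [-0.52, 0.6, 0]]
d = bdt_displacement(375, keys, disps)   # displacement added on top of the rigid segment motion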

3.3. 3D virtual body animation parameter coding

On the one hand, the independence between a generic 3D model and the BAP description makes it possible to avoid transmitting the 3D model during the animation. On the other hand, BAP encoding ensures a very low bit-rate transmission. Two encoding methods (predictive and DCT-based) are included in the standard.

In the first method, BAPs are quantized and coded by a predictive coding scheme. For each parameter to be coded in frame n, the decoded value of this parameter in frame n-1 is used as a prediction. The prediction error is then encoded by arithmetic coding. This scheme prevents encoding error accumulation. Since BAPs can be assigned different precision requirements, different quantization step sizes are applied; they consist of a local (BAP-specific) step size and a global one (used for bit-rate control). The quantized values are passed to the adaptive arithmetic encoder. The coding efficiency is increased by providing the encoder with range estimates for each BAP. We have tested the MPEG-4 animation parameter encoding schemes on BAP data corresponding to the ASL alphabet. Regarding the motion complexity, we note that all the signs are performed with one hand and that the avatar body position and orientation do not change.

The DCT-based coding method splits BAP time sequences into segments made of 16 consecutive BAP frames. Encoding a BAP segment includes three steps, achieved for all BAPs: (1) determining the 16 coefficient values by using the discrete cosine transform (DCT), (2) quantizing and coding the AC coefficients, and (3) quantizing and differentially coding the DC coefficients. The global quantization step Q for the DC coefficients can be controlled, and the global quantization step for the AC coefficients is one third of Q. The DC coefficient of an intra-coded segment is encoded as such while, for an inter-coded segment, the DC coefficient of the previous segment is used as a prediction of the current DC coefficient. The prediction error and the AC coefficients (for both inter- and intra-coded segments) are coded by using Huffman tables.

Table 1 shows the results obtained by applying both encoding methods, predictive-based and DCT-based, to the BAP files associated with the signs "A" to "L". In our experiments, the animation frame rate equals 10.
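For illustration only, the Python sketch below follows the DCT-based path for a single BAP: 16-frame segments, a DC coefficient quantized with the step Q (and predicted from the previous segment for inter-coded segments), and AC coefficients quantized with a step of Q/3. The Huffman entropy-coding stage and the exact transform normalization used by the standard are deliberately left out, and the function names are ours.

import math

def dct16(samples):
    # Unnormalized DCT-II of a 16-frame BAP segment
    N = len(samples)
    return [sum(samples[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N)) for n in range(N))
            for k in range(N)]

def encode_segment(segment, Q, prev_dc=None):
    # Returns the quantized symbols that would feed the entropy coder, plus the
    # reconstructed DC coefficient used to predict the next (inter-coded) segment.
    coeffs = dct16(segment)
    dc, ac = coeffs[0], coeffs[1:]
    prediction = 0.0 if prev_dc is None else prev_dc   # intra- vs. inter-coded segment
    dc_symbol = round((dc - prediction) / Q)
    ac_symbols = [round(a / (Q / 3.0)) for a in ac]
    reconstructed_dc = prediction + dc_symbol * Q
    return dc_symbol, ac_symbols, reconstructed_dc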

Fig. 10. BDTs interpolation. Key frames (a) and (d). Interpolated frames (b) and (c).

Table 1
Bit-rates for the predictive-based (P) and DCT-based coding schemes

Sign   Q=1          Q=2          Q=4          Q=8          Q=16         Q=31
       P     DCT    P     DCT    P     DCT    P     DCT    P     DCT    P     DCT
A      1.32  2.80   1.30  2.42   1.25  1.83   1.22  1.54   1.18  1.14   1.14  0.86
B      1.41  2.61   1.35  2.22   1.34  1.64   1.32  1.36   1.32  1.08   1.29  0.81
C      1.80  3.65   1.78  3.15   1.71  2.24   1.68  1.87   1.63  1.41   1.58  1.10
D      1.45  2.66   1.42  2.30   1.38  1.69   1.35  1.40   1.32  1.01   1.31  0.74
E      1.75  3.23   1.71  2.85   1.65  2.13   1.62  1.74   1.57  1.30   1.53  1.00
F      1.37  2.04   1.33  1.83   1.31  1.36   1.28  1.12   1.26  0.88   1.25  0.67
G      1.81  2.83   1.75  2.43   1.72  1.76   1.67  1.47   1.63  1.12   1.57  0.84
H      1.78  2.83   1.74  2.39   1.70  1.70   1.65  1.40   1.63  1.08   1.58  0.76
I      1.53  2.70   1.49  2.39   1.46  1.73   1.41  1.45   1.39  1.10   1.38  0.81
J      1.37  2.20   1.33  1.89   1.31  1.34   1.27  1.11   1.24  0.82   1.21  0.59

Results for signs "A" to "L". Q denotes the global quantization value.


In order to objectively compare both coding schemes, we introduce a distortion measure between the original and the decoded sequences, defined as the mean square error of the BAP vectors:

\[ D = \frac{1}{N_{\mathrm{frames}}} \sum_{i=0}^{N_{\mathrm{frames}}} \bigl\| \mathrm{BAP}_i^{(o)} - \mathrm{BAP}_i^{(d)} \bigr\|^2, \qquad (5) \]

where BAP_i^{(o)} and BAP_i^{(d)} denote, respectively, the original and the decoded BAP vectors of frame i. Let us analyze the performance and behavior of the compression methods by studying the bit-rate as a function of the distortion. Both encoding methods allow the bit-rate to be controlled by modifying the global quantization step (Fig. 11(II)). Computing, for each quantization step, the distortion as expressed in Eq. (5) makes it possible to plot Fig. 11(I); Fig. 11(III) results from the combination of Figs. 11(I) and (II). Let us observe that both curves in Fig. 11(III) are decreasing, and that the slope of the predictive-based curve is significantly smaller than that of the DCT-based curve. When dealing with a wide range of bit-rates, the DCT-based method is more appropriate. In the case of applications requiring near-lossless


compression, the use of the predictive-based method is recommended.

In summary, the FBA specifications related to virtual human characters offer a basic framework for animation, allowing the representation of the avatar as a segmented character. Moreover, the motion parameterization is very compact and makes it possible, in compressed form, to achieve streaming animation in very low-bit-rate network environments. However, the FBA framework is limited to human-like VC animation. Therefore, to animate any articulated figure, MPEG-4 proposes the BBA specifications.

4. MPEG-4 bone-based animation

4.1. Context and objectives

An articulated figure, also called a kinematic linkage, consists of a series of rigid links connected at joints.

Fig. 11. Experimental results for sign "B". Predictive (bullet) vs. DCT BAP coding schemes. Distortion versus Q (I), bit-rate versus Q (II) and distortion versus bit-rate (III).


In order to define a static 3D pose of an articulated figure including geometry, color and texture attributes, the functionality addressed consists in considering the entire figure as one single 3D mesh, referred to as a global seamless mesh. In this case, the bone skeleton is provided together with a field of weighting vectors specifying, for each vertex of the mesh, the influence of the bones directly altering the 3D position of the vertex. Moreover, the weighting factors can be specified implicitly and more compactly by means of two influence regions defined through a set of cross sections attached to the bone. The generic method consisting in animating a VC from its skeleton is called Skin&Bones. Applications using this concept deal with the realistic animation of any kind of articulated model. The AFX Skin&Bones-related nodes allow the definition of any kind of skeleton hierarchy, with any kind of attached geometry (based on IndexedFaceSet or higher-order geometry nodes like NURBS and subdivision surfaces) and any appearance attributes. The Skin&Bones animation bitstream, simply called BBA, provides a compact representation of the motion parameters. The seamless mesh-based representation overcomes the current limitations of MPEG-4 Amendment 1 and is able to provide realistic figure animation without specifying deformation information. The AFX specifications also make it possible to define seamless parts of an articulated model and to group them together in order to achieve real-time animation.

Two aspects are addressed by the Skin&Bones framework. The first one deals with the definition of the skinned model as a static model by means of its geometry and appearance; a hierarchical skeleton is also semantically attached as part of the model definition. The second aspect deals with the animation of articulated models, and more specifically with the compressed representation of the animation parameters.

4.2. Skinned model definition parameters

4.2.1. Semantics

Defining a skinned model involves specifying its static attributes as well as its animation behavior.

From a geometric point of view, a skinned model is such that the set of vertices which belong to the "skin" of the model is defined as a unique list. All the shapes which form the skin share the same list of vertices. This representation avoids seams at the skin level during the animation stage. To ensure the possibility of defining various appearances at different levels of the skinned model, the skin is defined as a collection of shapes. For each one, it is possible to define its own set of color, texture and material attributes, and each one includes a geometry node field which refers to the skinned model vertex list.

The animation behavior of a skinned model is defined by means of a skeleton and its properties. The skeleton is a hierarchical structure constructed from bones. Three types of information are associated with a bone:

1. the relative geometrical transformation of the bone with respect to its parent in the skeleton hierarchy;
2. the influence of the bone movement on the surface of the articulated model;
3. IK-related data, specific to the considered bone.

Each bone has an influence on the skin surface. Thus, by changing one bone position or orientation, some vertices of the model skin will be affected by translation components, specified for each vertex. Here, defining the skinned model consists in specifying for each skeleton bone an influence region, i.e. the subset of affected skin vertices and the related measure of affectedness (a set of weighting coefficients). The influence region can be directly specified by the designer or can be computed just before performing the animation. In the first case, the list of affected vertices and the weighting coefficients are part of the bone definition. In the second case, the key concept relies on a family (φ_d)_d of bone-related weighting functions defined on arbitrary planes located at a distance d from the bone center and perpendicular to the bone. The support of the planar weighting function φ_d is partitioned into three specific zones (Z_int, Z_mid and Z_ext) by two concentric circles. These zones are defined by two parameters, namely the inner (r_d) and outer (R_d) radii (Fig. 12).


Fig. 12. The support partitioning of the planar weighting function φ_d(r_d, R_d).

The planar weighting function φ_d(r_d, R_d) is defined as follows:

\[ \varphi_d(r_d,R_d)(x) = \begin{cases} 1, & \forall x \in Z_{\mathrm{int}},\\ f\!\left(\dfrac{d(x,Z_{\mathrm{ext}})}{R_d - r_d}\right), & \forall x \in Z_{\mathrm{mid}},\\ 0, & \forall x \in Z_{\mathrm{ext}}, \end{cases} \qquad (6) \]

where d(x, Z_ext) denotes the Euclidean distance from x to Z_ext and f(·) is a user-specified fall-off to be chosen among the following standardized functions: x^3, x^2, x, sin(x), x^{1/2}, x^{1/3}. The number and position (distance from the bone center) of the planes are specified by the designer. Fig. 13 shows an example of a skeleton and the surface mesh representing the skin. Here, two planes have been considered relative to the first phalange of the index finger.

Fig. 13. Hand shape, skeleton and bone planes.

The bone influence zone being defined, animating the VC consists in translating mesh vertices with respect to the bone transforms. For any skin vertex v_i = (t_i^x, t_i^y, t_i^z)^T, the new position induced by the geometrical transformation of bone b_j is computed in three steps, as follows:

(1) calculate the transform matrix [55] M_j:

\[ M_j = T_{b_j}\,C_{b_j}\,R_{b_j}\,SR_{b_j}\,S_{b_j}\,(SR_{b_j})^{-1}\,(C_{b_j})^{-1}, \qquad (7) \]

where T_b, R_b, S_b, SR_b and C_b are the bone translation, rotation, scale, scaleOrientation and center matrices, respectively;

(2) compute the displacement vector:

\[ d_i^j = M_j\,v_i\,w_i^j, \qquad (8) \]

where w_i^j is the weighting coefficient related to bone b_j and associated with v_i;

(3) compute the new position of v_i:

\[ v_i \leftarrow v_i + d_i^j. \qquad (9) \]
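A compact Python sketch of Eqs. (6)-(9) is given below. The weight computation follows Eq. (6), with dist the distance from the skin vertex to the bone within the cross-sectional plane. For the displacement we adopt a common reading of Eqs. (8)-(9) in which each bone contributes the weighted difference between the bone-transformed position and the rest position, M_j being the matrix of Eq. (7) expressed relative to the registered static pose; the names are ours and the sketch is not normative.

import numpy as np

def bone_matrix(T, C, R, SR, S):
    # Eq. (7): M = T C R SR S SR^-1 C^-1, all 4x4 homogeneous matrices
    return T @ C @ R @ SR @ S @ np.linalg.inv(SR) @ np.linalg.inv(C)

def falloff_weight(dist, r_d, R_d, f=lambda x: x):
    # Eq. (6): 1 inside the inner circle, 0 outside the outer one, fall-off in between
    if dist <= r_d:
        return 1.0
    if dist >= R_d:
        return 0.0
    return f((R_d - dist) / (R_d - r_d))

def skin(rest_vertices, bones):
    # bones: list of (M, w) with M the matrix of Eq. (7) and w the per-vertex
    # weights w_i^j (zero where the bone has no influence)
    hom = np.hstack([rest_vertices, np.ones((len(rest_vertices), 1))])
    deformed = rest_vertices.copy()
    for M, w in bones:
        moved = (hom @ M.T)[:, :3]
        deformed += w[:, None] * (moved - rest_vertices)   # per-bone displacement, Eqs. (8)-(9)
    return deformed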

Let us note that a bone movement is always specified by a geometric transformation with respect to the initial registration of the VC's static position. A set of adjacent bones forms a kinematic chain. For kinematic chains with a large number of elements, it is more appropriate to animate them by using IK techniques rather than by directly specifying the transformation of each element. To support this kind of animation, the skinned model definition is enriched with IK-related data. Because of the rapid evolution of hardware capabilities, it is not appropriate for the standard to impose a specific IK solver; supporting IK is therefore reduced to defining specific rotation and translation constraints at the bone level. Since a bone is just a semantic entity, the end effector is usually defined as a 3D point on the skinned model surface. In many cases, especially for complicated skinned models, it is possible to identify regions of the skin which are influenced by only a small subset of the skeleton bones. In such cases, to address optimized animation, the model has to be considered as a collection of skinned models.


Thus, the bones belonging to one skinned model do not deform the skin of the others. A typical example is to consider the fingers, the hand and a part of the forearm as a 3D object segmented from the rest of a humanoid character: the upper part of the forearm and the rest of the body induce changes of the position and the orientation of this object in the scene, but do not induce deformation effects on the segment skin. With this optimization in mind, the definition of a skinned model within AFX allows other 3D objects to be added at any level of the skeleton, including other skinned models.

The bones which compose the skeleton are rigid objects: their transformation is possible, but their deformation is not. To address realistic skinned model animation, the "muscle" concept is introduced. A "muscle" is a curve with an influence region on the model skin, and it can be deformed. To ensure a generic representation of the "muscle" shape, a curve representation based on NURBS is used.

4.2.2. Node specification

Within a seamless-mesh-based representation, the descriptive structure relies on node definitions. The MPEG-4 standard architecture, based on VRML, allows new tools to be added by defining the related nodes and associated streams. Within this requirement, our contribution [45,46] to the standard consists in specifying the node interfaces in order to support the concepts described in the previous subsection. Let us detail the node structure by describing each field. The SBBone node specifies data related to a bone of the skeleton, while the SBSkinnedModel node is related to the character skin properties. The SBSegment node makes it possible to add any kind of standalone 3D object into the skeleton hierarchy. To address the IK issue or to define semantic points on the model surface, the SBSite node is introduced. In order to take into account local deformation effects based on curve deformation, the SBMuscle node is defined. Let us note that a similar approach was recently adopted by the H-Anim 2001 specifications [27], but exclusively within a humanoid animation framework.

4.2.2.1. SBBone node. The syntax of the SBBone node is the following:

SBBone {
  eventIn       MFNode      addChildren
  eventIn       MFNode      removeChildren
  exposedField  SFInt32     boneID            0
  exposedField  MFInt32     skinCoordIndex    []
  exposedField  MFFloat     skinCoordWeight   []
  exposedField  SFVec3f     endpoint          0 0 1
  exposedField  SFInt32     falloff           1
  exposedField  MFFloat     sectionPosition   []
  exposedField  MFFloat     sectionInner      []
  exposedField  MFFloat     sectionOuter      []
  exposedField  SFInt32     rotationOrder     0
  exposedField  MFNode      children          []
  exposedField  SFVec3f     center            0 0 0
  exposedField  SFRotation  rotation          0 0 1 0
  exposedField  SFVec3f     translation       0 0 0
  exposedField  SFVec3f     scale             0 0 0
  exposedField  SFRotation  scaleOrientation  0 0 1 0
  exposedField  SFInt32     IkchainPosition   0
  exposedField  MFFloat     IkyawLimit        []
  exposedField  MFFloat     IkpitchLimit      []
  exposedField  MFFloat     IkrollLimit       []
  exposedField  MFFloat     IKTxLimit         []
  exposedField  MFFloat     IKTyLimit         []
  exposedField  MFFloat     IKTzLimit         []
}

The SBBone node specifies four kinds of information: semantic data, bone motion, bone-skin influence and bone IK constraints. The boneID field is a unique identifier which allows the bone to be addressed at run-time. The complete rigid transform of the bone, as well as a scale factor with respect to an arbitrary direction, are specified by means of the center, translation and rotation fields and the scale and scaleOrientation fields, respectively.


Thus, the possible geometric 3D transformation consists of (in order): (1) (possibly) a non-uniform scale about an arbitrary point, (2) a rotation about an arbitrary point and axis, and (3) a translation. The rotationOrder field specifies the rotation order when dealing with the decomposition of the rotation with respect to the coordinate system axes.

Two ways of specifying the skin influence region of the bone are possible:

1. directly "painting" the bone influence on the skin vertices by instantiating the skinCoordIndex and skinCoordWeight fields. The skinCoordIndex field enumerates the indices of all the skin vertices affected by the current bone. Mostly, the skin influence region is made of vertices belonging to the 3D neighborhood of the bone; however, special influence configurations can be addressed. The skinCoordWeight field is a list of weights (one per vertex listed in skinCoordIndex) measuring the contribution of the current bone to the vertex under consideration. The sum of all the skinCoordWeight values related to a given vertex must be 1;
2. computing the influence as a measure of the distance between the skin vertices and the bone within several planes (cf. Section 4.2). The sectionInner field (respectively, the sectionOuter field) is a list of inner (respectively, outer) influence region radii for the different planes. The sectionPosition field is a list of the plane positions defined by the designer. The falloff field specifies the function relating the affectedness amplitude to the distance: -1 for x^3, 0 for x^2, 1 for x, 2 for sin(x), 3 for x^{1/2} and 4 for x^{1/3}. In order to compute the influence region as explained in Section 4.2, the localization of the bone is specified by the center and endpoint fields.

The two schemes can be used independently or in combination; in the latter case, the individual vertex weights take precedence.

The IK-related information of a bone deals with positioning the bone in a kinematic chain and defining the possible motion constraints of the bone. If the bone is the root of an IK chain, then IkchainPosition = 1; in this case, when applying the IK scheme, only the orientation of the bone is changed.


IK-related information of a bone deals with positioning the bone within a kinematics chain and defining possible motion constraints of the bone. If the bone is the root of an IK chain, then IKchainPosition = 1. In this case, when applying the IK scheme, only the orientation of the bone is changed. If the bone is the last element in the kinematics chain, IKchainPosition = 2. In this case, the animation stream has to include the desired position of the bone (X, Y and Z in world coordinates). If IKchainPosition = 3, the bone belongs to the IK chain but is neither the first nor the last one in the chain. In this case, the position and orientation of the bone are computed by the IK procedure. Finally, if the bone does not belong to any IK chain (IKchainPosition = 0), it is necessary to transmit the bone local transformation in order to animate the bone. If an animation stream contains motion information about a bone which has IKchainPosition = 1, this information will be ignored. If an animation stream contains motion information about a bone which has IKchainPosition = 3, this means that the animation producer wants to enforce the orientation of the bone, and the IK solver will use this value as a constraint.

IK constraints of a bone are related to orientation and translation information. The IKyawLimit (respectively, IKpitchLimit and IKrollLimit) field consists of a pair of min/max values which limit the bone rotation with respect to the X (respectively, Y and Z) axis. The IKTxLimit (respectively, IKTyLimit and IKTzLimit) field consists of a pair of min/max values which limit the bone translation in the X (respectively, Y and Z) direction.

The SBBone node is used as a building block to describe the hierarchy of the articulated model by attaching one or more child objects. The children field has the same semantics as in VRML. Moreover, to support dynamic changes to the structure of the skeleton, the node contains the addChildren and removeChildren fields. The absolute geometric transformation of any child of a bone is obtained through composition with the bone-parent transformation.

4.2.2.2. SBSegment node. The SBSegment node syntax is the following:

SBSegment {
  exposedField SFString name             ""
  exposedField SFVec3f  centerOfMass     0 0 0
  exposedField MFFloat  momentsOfInertia [0 0 0 0 0 0 0 0 0]
  exposedField SFFloat  mass             0
  exposedField MFNode   children         []
  eventIn      MFNode   addChildren
  eventIn      MFNode   removeChildren
}

The name field must be present, so that the SBSegment can be identified at run time. The physical properties of a segment are defined by the mass (the total mass of the segment), centerOfMass (the location within the segment of its center of mass) and momentsOfInertia (the moment of inertia matrix) fields. The children field can contain any object attached at this level to the skeleton, including an SBSkinnedModel.

An SBSegment node is a grouping node especially introduced to address two issues:
1. the requirement to separate different parts of the skinned model into deformation-independent parts. Between two deformation-independent parts, the geometrical transformation of one of them does not imply skin deformations on the other. This is essential for run-time animation optimization. The SBSegment node may contain an SBSkinnedModel node as a child: portions of the model which are not part of the seamless mesh can be attached to the skeleton hierarchy by using an SBSegment node;
2. the requirement to attach standalone 3D objects to different parts of the skeleton hierarchy. For example, a ring can be attached to a finger; the ring geometry and attributes are defined outside of the skinned model, but the ring will have the same local geometrical transformation as the bone it is attached to.

4.2.2.3. SBSite node. The syntax of the SBSite node is expressed as follows:

SBSite {
  exposedField SFVec3f    center           0 0 0
  exposedField MFNode     children         []
  exposedField SFString   name             ""
  exposedField SFRotation rotation         0 0 1 0
  exposedField SFVec3f    scale            1 1 1
  exposedField SFRotation scaleOrientation 0 0 1 0
  exposedField SFVec3f    translation      0 0 0
  eventIn      MFNode     addChildren
  eventIn      MFNode     removeChildren
}

The SBSite node indicates a precise 3D point, which may or may not belong to the skin, usually used to localize an end effector. The 3D point is obtained by composing the current transformation with the one obtained from the center, rotation, scale, scaleOrientation and translation fields. The children field is used to store any object that can be attached to the SBSegment node. The SBSite node can be used in three cases: (1) to define an "end effector", i.e. a location which can be used by an IK solver, (2) to introduce an attachment point for accessories such as clothing, and (3) to specify a location for a virtual camera in the reference frame of an SBSegment node. SBSite nodes are stored within the children field of an SBSegment node. The SBSite node is a specialized grouping node that defines a coordinate system for the nodes in its children field relative to the coordinate system of its parent node. The reason an SBSite node is considered a specialized grouping node is that it can only be defined as a child of an SBSegment node.

4.2.2.4. SBMuscle node. The syntax of the SBMuscle node is defined as follows:

SBMuscle {
  exposedField MFInt32 skinCoordIndex  []
  exposedField MFFloat skinCoordWeight []
  exposedField SFNode  muscleCurve     NULL
  exposedField SFInt32 radius          1
  exposedField SFInt32 falloff         1
}

The SBMuscle node makes it possible to add local deformation simulating muscle action at the skin level. A muscle is defined by a curve and by the area of influence of the curve. In general, a muscle is a child of an SBBone node.


The muscle influence can be defined according to two mechanisms:
1. direct "painting", by listing the affected vertices of the skin (skinCoordIndex) and the affectedness measure (skinCoordWeight);
2. influence region computation through the radius and falloff fields. The radius field specifies the maximum distance at which the muscle affects the skin. The falloff field specifies the function relating the affectedness amplitude to the distance: -1 for x^3, 0 for x^2, 1 for x, 2 for sin(x), 3 for x^{1/2} and 4 for x^{1/3}.

The muscle curve representation is based on the NurbsCurve representation as defined in [9]:

NurbsCurve {
  field        MFFloat knot         []
  field        SFInt32 order        3
  exposedField MFVec3f controlPoint []
  exposedField MFFloat weight       []
  exposedField SFInt32 tessellation
}

Performing the deformation consists in changing the form of the muscle curve by affecting (1) the position of the control points of the curve, (2) the weights of the control points and/or (3) the knot sequence. Depending on the producer, the animation stream contains one animation mechanism or a combination of (1), (2) and (3). At the modeling stage, each affected vertex v_i of the skin is assigned to the closest point v_{c_i} of the curve. During the animation, the translation of v_{c_i}, obtained from the updated values of the controlPoint, weight and/or knot fields, induces a translation on v_i:
1. if the skinCoordWeight field is specified for vertex v_i, then

   $T_{v_i} = \mathrm{skinCoordWeight}[k] \cdot T_{v_{c_i}}$,    (10)

   where k is the index of vertex v_i in the model vertex index list;
2. if the radius field is specified, then

   $T_{v_i} = f\!\left(\frac{d(v_i, v_{c_i})}{\mathrm{radius}}\right) \cdot T_{v_{c_i}}$,    (11)

   with f(.) specified by the falloff field.
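As an illustration of Eq. (11) and of the falloff codes listed above, the following Python sketch propagates a muscle-curve point translation to a nearby skin vertex. It applies Eq. (11) as written; the names and structure are assumptions for illustration, not MPEG-4 syntax.

import math

# Falloff codes as described in the text: code -> attenuation function of the
# normalized distance x = d(v_i, v_ci) / radius.
FALLOFF = {
    -1: lambda x: x ** 3,
     0: lambda x: x ** 2,
     1: lambda x: x,
     2: lambda x: math.sin(x),
     3: lambda x: x ** 0.5,
     4: lambda x: x ** (1.0 / 3.0),
}

def muscle_displacement(curve_point_translation, distance, radius, falloff_code):
    """Translation induced on a skin vertex by the translation of its closest
    muscle-curve point, following Eq. (11). Vertices beyond 'radius' are not affected,
    since radius is the maximum influence distance."""
    if distance >= radius:
        return (0.0, 0.0, 0.0)
    attenuation = FALLOFF[falloff_code](distance / radius)
    return tuple(attenuation * t for t in curve_point_translation)

# Example: the curve point moves 2 units along Y; the vertex sits at half the influence radius.
print(muscle_displacement((0.0, 2.0, 0.0), distance=0.5, radius=1.0, falloff_code=0))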


4.2.2.5. SBSkinnedModel node. The SBSkinnedModel node syntax is the following:

SBSkinnedModel {
  exposedField SFString   name                       ""
  exposedField SFVec3f    center                     0 0 0
  exposedField SFRotation rotation                   0 0 1 0
  exposedField SFVec3f    translation                0 0 0
  exposedField SFVec3f    scale                      1 1 1
  exposedField SFRotation scaleOrientation           0 0 1 0
  exposedField MFNode     skin                       []
  exposedField SFNode     skinCoord                  NULL
  exposedField SFNode     skinNormal                 NULL
  exposedField MFNode     skeleton                   []
  exposedField MFNode     bones                      []
  exposedField MFNode     muscles                    []
  exposedField MFNode     segments                   []
  exposedField MFNode     sites                      []
  exposedField SFNode     weighsComputationSkinCoord NULL
}

The SBSkinnedModel node is at the top of the hierarchy of Skin and Bones related nodes and contains the definition parameters for the entire seamless model or for a seamless part of the model. Mainly, the node contains:


1. a geometrical transformation which poses the character in the scene when no animation is performed. The transformation is generic and specified by means of the following fields: center, translation, rotation, scale and scaleOrientation;
2. a list of all the vertices of the skin. The skinCoord field contains the 3D coordinates of all the vertices of the seamless model;
3. a list of shapes which build the character skin. The skin field consists of a collection of shapes that share the same skinCoord. This mechanism allows the model to be considered as a continuous mesh and, at the same time, different attributes (like color or texture) to be attached to different parts of the model;
4. a skeleton hierarchy. The skeleton field specifies the root of the bone hierarchy;
5. a list of all the bones, segments, sites and muscles which belong to the character. The bones, segments, sites and muscles fields consist of the lists of all previously defined SBBone, SBSegment, SBSite and SBMuscle nodes, respectively;
6. the weighsComputationSkinCoord field, which describes a specific static position of the skinned model. In many cases, the static position of the articulated model defined by the skinCoord and skin fields is not appropriate for computing the influence region of a bone. In this case, the weighsComputationSkinCoord field makes it possible to specify the skinned-model vertices in a more appropriate static posture. This posture is used only during the initialization stage and ignored during the animation; all the skeleton transformations remain related to the posture defined by the skinCoord field;
7. the name field, which specifies the name of the skinned model, allowing easy identification at animation run time.
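As a rough mental model of how these fields fit together, here is a hypothetical Python sketch of the data a player could keep for one skinned model. The class and attribute names are illustrative assumptions, not the MPEG-4 node API.

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Bone:
    name: str
    children: List["Bone"] = field(default_factory=list)        # skeleton hierarchy
    skin_coord_index: List[int] = field(default_factory=list)   # painted vertices
    skin_coord_weight: List[float] = field(default_factory=list)

@dataclass
class SkinnedModel:
    name: str
    skin_coord: List[Tuple[float, float, float]]  # shared vertex pool (skinCoord)
    skin: List[List[int]]    # shapes as index lists into skin_coord, each with its own appearance
    skeleton: Bone           # root of the bone hierarchy
    bones: List[Bone]        # flat list of all bones, e.g. for lookup by boneID

# Minimal example: two vertices, one bone influencing both of them.
root = Bone("spine", skin_coord_index=[0, 1], skin_coord_weight=[1.0, 1.0])
model = SkinnedModel("avatar", [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)], [[0, 1]], root, [root])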

4.3. Skinned model animation parameters

4.3.1. Animation principle
Animating a 3D articulated model requires knowledge of the position of each model vertex at each key frame. Specifying such data is an enormous and expensive task. For this reason, the AFX animation system uses bone-based modeling of articulated models, which effectively attaches the model vertices to a bone hierarchy (skeleton). This technique avoids the need to specify the position of each vertex: only the local transformation of each bone in the skeleton is considered. The local bone transformation components (translation, rotation, scale, scaleOrientation and center) are specified at each frame and, at the vertex level, the transformation is obtained by using the bone-vertex influence region. To address streamed animation, the animation data is considered separately (independently of the model definition) and is specified for each key frame. Animating a skinned model is achieved by updating the skeleton geometric transformation components and by transforming the bones and/or the form of the muscle curve.

A general transformation of a bone, as defined by the SBBone node, involves translation in any direction, rotation with respect to any rotation axis, and scaling with respect to any direction and axis. However, many motion editing systems use the orientation decomposition according to Euler angles. In practice, when fewer than three angles are sufficient to describe a joint transformation, a Euler-angle-based representation is more appropriate. To compact the animation stream, a rotation is represented as Euler angles in the stream. To ensure the bijectivity [30] of the transformation between the Euler angle notation and the rotation matrix (or quaternion representation), the rotationOrder field was introduced in the SBBone node. A triplet of Euler angles [θ_1, θ_2, θ_3] describes how a coordinate system r rotates with respect to a static coordinate system s, here, how a bone coordinate system rotates with respect to its parent coordinate system. The triplet is interpreted as a rotation by θ_1 around an axis A_1, followed by a rotation by θ_2 around an axis A_2, and by a rotation by θ_3 around an axis A_3, with A_2 different from both A_1 and A_3. The rotation axes are restricted to the coordinate axes X, Y and Z, resulting in 12 order possibilities [53]. By considering the axes either in the bone coordinate system r or in its parent coordinate system s, there are 24 possible values for rotationOrder.
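To illustrate why the rotation order matters, the following Python sketch composes three axis rotations in a given order and shows that two different orders generally yield different rotation matrices. The composition convention used here (first axis applied first) is an assumption for illustration; the convention attached to each rotationOrder value is defined by the standard itself.

import math
import numpy as np

def axis_rotation(axis, angle):
    """3x3 rotation matrix about the X, Y or Z coordinate axis."""
    c, s = math.cos(angle), math.sin(angle)
    if axis == "X":
        return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])
    if axis == "Y":
        return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])  # "Z"

def euler_to_matrix(angles, order):
    """Compose rotations theta_1..theta_3 about the axes named in 'order',
    applying the first listed axis first (R = R3 @ R2 @ R1)."""
    r = np.eye(3)
    for axis, angle in zip(order, angles):
        r = axis_rotation(axis, angle) @ r
    return r

angles = (math.radians(30), math.radians(45), math.radians(60))
print(np.allclose(euler_to_matrix(angles, "XYZ"),
                  euler_to_matrix(angles, "ZYX")))   # False: the order changes the result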


The bone-based animation (BBA) of a skinned model is performed by updating the SBBone transformation fields (translation, rotation, scale, center and scaleOrientation) and/or by updating the SBMuscle curve control point positions, weights or knot sequence. The BBA stream contains either all the animation frames or just the data at the temporal key frames. In the latter case, the decoder computes the intermediate frames by temporal interpolation. Linear interpolation is used for the translation and scale components,

$\mathrm{linear}(t_0, t_1, t) = t_0 (1 - t) + t_1 t$,    (12)

where t_0 is the translation in the first frame, t_1 is the translation in the last frame, and t ∈ [0, 1]. Spherical linear quaternion interpolation is used for the rotation and scaleOrientation components,

$\mathrm{Slerp}(q_0, q_1, t) = \frac{q_0 \sin((1 - t)\Omega) + q_1 \sin(t\Omega)}{\sin(\Omega)}$,    (13)

where q_0 is the rotation quaternion in the first frame, q_1 is the rotation quaternion in the last frame, cos(Ω) = q_0 · q_1 and t ∈ [0, 1].
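A minimal Python sketch of the two interpolation formulas, Eqs. (12) and (13), follows. It is an illustrative decoder-side helper, not code from the standard, and it assumes unit quaternions stored as (w, x, y, z).

import math

def lerp(t0, t1, t):
    """Linear interpolation, Eq. (12), applied componentwise to 3-vectors."""
    return tuple(a * (1.0 - t) + b * t for a, b in zip(t0, t1))

def slerp(q0, q1, t, eps=1e-6):
    """Spherical linear quaternion interpolation, Eq. (13)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:                      # take the shorter arc
        q1, dot = tuple(-c for c in q1), -dot
    if dot > 1.0 - eps:                # nearly identical orientations: fall back to lerp
        return tuple(a * (1.0 - t) + b * t for a, b in zip(q0, q1))
    omega = math.acos(dot)
    s0 = math.sin((1.0 - t) * omega) / math.sin(omega)
    s1 = math.sin(t * omega) / math.sin(omega)
    return tuple(s0 * a + s1 * b for a, b in zip(q0, q1))

# Halfway between the identity and a 90-degree rotation about Z.
identity = (1.0, 0.0, 0.0, 0.0)
rot90z = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
print(slerp(identity, rot90z, 0.5))    # ~45-degree rotation about Z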

Each key frame contains:
1. a natural integer KeyFrameIndex, which indicates to the decoder the number of frames that have to be obtained by interpolation. If it is zero, the decoder interprets the received frame as a normal frame and sends it to the animation engine. If not, the decoder computes KeyFrameIndex intermediate frames and sends them to the animation engine, together with the content of the key frame received;
2. the boneID as well as the animation mask for each animated bone (cf. the description of the SBBone animation mask);
3. the muscleID as well as the animation mask for each animated muscle (cf. the description of the SBMuscle animation mask);
4. the new values of each bone transformation component which needs to be updated;
5. the new values of each muscle control point position, weight and/or knot value which needs to be updated.
Data related to items 1-3 represent the frame animation mask, while data related to items 4 and 5 yield the frame animation values.


A BBA stream contains the information related to a maximum of 1024 SBBone and 1024 SBMuscle nodes belonging to one or more skinned models. The identifier fields boneID and muscleID must be unique in the scene graph and must belong to the interval [0...1023].

4.3.2. SBBone animation mask and animation values
To achieve high compression efficiency, the generic bone transformation is represented within the animation bitstream as an animation vector of the elementary transform components, as presented in Table 2. The size of the animation mask of a bone can vary from 2 bits, if there is motion on one single component, to 21 bits if all the components of the local transformation change with respect to the previous key frame. When dealing with a complex rotation (about three axes), the animation mask has the form given in Table 3.

Table 2
Geometric transformation components in the SBBone node definition and corresponding animation vector in the bitstream

Node representation              Bitstream representation
SFVec3f translation              int Tx, Ty, Tz
SFRotation rotation              int RotAngleOnAxis1, int RotAngleOnAxis2, int RotAngleOnAxis3
SFVec3f scale                    int Sx, Sy, Sz
SFRotation scaleOrientation      int Ax1, Ax2, Ax3, RotVal
SFVec3f center                   int Cx, Cy, Cz

Table 3
Example of a bone animation mask for a generic rotation of the bone

Binary mask vector    Semantics
0                     isTranslation
1                     isRotation
1                     isRotationOnAxis1
1                     isRotationOnAxis2
1                     isRotationOnAxis3
0                     isScale
0                     isScaleOrientation
0                     isCenter



The animation values of a bone form a vector which contains the new values of all the affected components. For the above example, the animation value vector is: [rotOnAxis1, rotOnAxis2, rotOnAxis3].

4.3.3. SBMuscle animation mask and animation values
An SBMuscle node can be animated by updating the control point values of the NURBS curve, the point weights and/or the knot sequence. The number of control points and the number of elements of the knot sequence are integers between 0 and 63 and are encoded for each frame after the muscleID. The animation values of a muscle consist of a vector which contains the new values of the changed data. The vector is ordered according to the animation mask.

4.3.4. Frame mask and frame values representation
A global animation frame is able to animate a subset of the SBBone and/or SBMuscle nodes of the scene graph and refers to an animation mask field and an animation values field, which are obtained by concatenating the bone and muscle animation masks and animation values, respectively. The bone and muscle IDs are also part of the animation mask. A frame animation value contains the data changed in the current frame for all the bones and muscles of the animation mask. The compression algorithms used in the case of FBA and presented in Section 3.3, namely the predictive-based and DCT-based schemes, are also retained as part of the standard for encoding the BBA animation values.
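The following Python sketch builds the bone animation mask and value vector for the rotation-only update of Table 3. It is an illustrative encoder-side helper under assumed names and value ordering, not the normative bitstream syntax.

# Order of the mask flags as listed in Table 3.
MASK_FLAGS = ["isTranslation", "isRotation", "isRotationOnAxis1", "isRotationOnAxis2",
              "isRotationOnAxis3", "isScale", "isScaleOrientation", "isCenter"]

def bone_animation_mask(rot_angles=None, translation=None):
    """Return (mask bits, value vector) for one bone in one key frame.
    Only the rotation and translation components are handled in this sketch."""
    mask = {flag: 0 for flag in MASK_FLAGS}
    values = []
    if translation is not None:
        mask["isTranslation"] = 1
        values.extend(translation)                     # Tx, Ty, Tz
    if rot_angles is not None:
        mask["isRotation"] = 1
        for i, angle in enumerate(rot_angles, start=1):
            if angle is not None:
                mask[f"isRotationOnAxis{i}"] = 1
                values.append(angle)                   # rotOnAxis1..rotOnAxis3
    bits = [mask[flag] for flag in MASK_FLAGS]
    return bits, values

# Generic rotation about all three axes, as in Table 3.
print(bone_animation_mask(rot_angles=(0.1, 0.2, 0.3)))
# -> ([0, 1, 1, 1, 1, 0, 0, 0], [0.1, 0.2, 0.3])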

5. FBA versus BBA

Fig. 14 shows a comparative analysis of the FBA and BBA frameworks. While FBA is dedicated to the representation of the avatar as a segmented character, and is a more appropriate framework for cartoon-like applications, BBA offers a higher degree of realistic representation, based on the well-known concept of skeleton-based animation.

Fig. 14. FBA versus BBA comparison.

FBA standardizes a fixed number of animation parameters (296) by attaching one, two or three rotation angles to each anatomical segment. BBA animation data involves a number of parameters that is not fixed by the standard, attached to each bone of the VC skeleton. Moreover, the animation parameters can be rotations, translations and/or scaling factors. Both frameworks address streaming animation by offering low-bit-rate compression schemes. The two compression methods, frame-based and DCT-based, are adopted for both FBA and BBA. Moreover, BBA natively supports advanced animation techniques such as frame interpolation and IK. For both frameworks, the compression bit rate depends on the movement complexity (number of segments/joints involved in the motion) and is in the range of 5-30 kbps (for a frame rate of 25 fps). Within FBA, the animation stream contains information on one single human VC, whereas, within BBA, it is possible to animate several characters in a scene by using one or more streams.


Muscle-like deformations are present in both frameworks: FBA standardizes the number and location of the facial deformation control points, while BBA lets the designer add curve-based deformers at any level of the skin, making it possible to achieve muscle-like deformation on any part of the VC skin and thus more realistic animation.

From the point of view of transporting FBA and BBA data over networks (webcast or broadcast), the same mechanism, based on the so-called MPEG-4 BIFS-Anim stream, is used. A BIFS-Anim stream is designed to animate objects in an MPEG-4 scene. Two approaches are supported: directly encoding the updated field values, or encapsulating dedicated animation streams. For compression-performance reasons, both the FBA and BBA frameworks use the second, encapsulation-based method. This encapsulation mechanism allows VC animation data to be delivered within the MPEG-4 transport layer. Recent specifications related to transporting MPEG-4 data over an MPEG-2 transport channel make it possible to integrate avatars and generic VCs within real applications such as digital video broadcasting. Regarding terminal capabilities, the BBA framework is more complex than FBA; to achieve realistic animation, dedicated 3D hardware is well suited.

A specific application, a sign language communication system [39,50,54] using VCs developed within the ViSiCAST project [57], allows us to comparatively evaluate the FBA and BBA frameworks. ViSiCAST develops, evaluates and applies realistic virtual humans (avatars) generating European deaf sign languages.


By building applications for the signing system in television, multimedia, web and face-to-face transactions, ViSiCAST aims to improve the position of Europe's deaf citizens and their access to public services and entertainment, and to enable them to develop and consume their own multimedia content for communication, leisure and learning through:
1. systems for the generation, storage and transmission of virtual signing;
2. user-friendly methods to capture signs (where appropriate);
3. a machine-readable system to describe sign-language gestures (hand, face and body), which can be used to retrieve stored gestures or to build them from low-level motion components. This descriptive language will be used to develop translation tools from speech and text to sign.

Fig. 15 shows the functional architecture of ViSiCAST broadcast signing. Fig. 16 shows the equipment for encoding and transmitting the VC animation data within an MPEG-2 transport layer, as well as a PC-based terminal able to receive, decode and render video and VC animation.

Fig. 15. ViSiCAST functional architecture.

Fig. 16. ViSiCAST broadcasting system.



Our specific application, a communication system for the deaf community based on VCs, allows us to observe that the MPEG-4 VC animation tools (FBA and BBA) provide, at a low transmission bit rate, a complete and scalable VC animation solution in terms of realistic effects and terminal complexity.

6. Conclusion

We have presented in this paper the MPEG-4 activities related to VC animation. Our target application was a sign language communication system. We first described how Amendment 1 of the MPEG-4 standard offers an appropriate framework for virtual human animation, gesture synthesis and compression/transmission. Then, we discussed how this framework is extended within the ongoing standardization efforts by (1) allowing the animation of any kind of articulated model, and (2) addressing advanced modeling and animation concepts such as the "skin and bones"-based approach. Techniques such as frame interpolation, IK, the discrete cosine transform and arithmetic encoding, which make it possible to provide a highly compressed animation stream, have been presented and discussed. The two frameworks, FBA and BBA, have finally been evaluated in terms of realism, accuracy and transmission bandwidth within a sign language communication system.

References

[1] 3D Studio Max, Version 3.1, Autodesk, San Francisco, CA, 1999.
[2] Activision, www.activision.com.
[3] W.W. Armstrong, M. Green, The dynamics of articulated rigid bodies for purposes of animation, in: Proceedings of Graphics Interface '85, 1985, pp. 407-415.
[4] Ascension Technology MotionStar, http://www.ascension-tech.com/products/motionstar/.
[5] Attitude Studio, Eve Solal, www.evesolal.com.
[6] N. Badler, D. Metaxas, B. Webber, M. Steedman, The center for human modeling and simulation, Presence 4 (1) (1995) 81-96.
[7] N. Badler, C. Phillips, B. Webber, Simulating Humans: Computer Graphics, Animation, and Control, Oxford University Press, Oxford, 1993.

[8] blaxxun Community, VRML-3D-Avatars—Multi-User Interaction, http://www.blaxxun.com/vrml/home/ccpro.htm.
[9] blaxxun interactive, NURBS Extension for VRML97, February 2001.
[10] J. Blinn, A generalization of algebraic surface drawing, ACM Trans. Graph. 1 (3) (July 1982) 235-256.
[11] Boston Dynamics Inc., The digital biomechanics laboratory, www.bdi.com, 1998.
[12] W. Bricken, G. Coco, The VEOS Project, Presence, Vol. 3 (2), MIT Press, Cambridge, MA, Spring 1994, pp. 111-129.
[13] T. Capin, I. Pandzic, N. Magnenat-Thalmann, D. Thalmann, Virtual human representation and communication in the VLNET networked virtual environments, IEEE Comput. Graph. Appl. 17 (2) (1997) 42-53.
[14] C. Carlsson, O. Hagsand, DIVE: a multi-user virtual reality system, in: Proceedings of the IEEE Virtual Reality Annual International Symposium (VRAIS), Seattle, WA, 1993, pp. 394-400.
[15] E. Catmull, J. Clark, Recursively generated B-spline surfaces on arbitrary topological meshes, Comput. Aided Design 10 (September 1978) 350-355.
[16] Disney Animation Studios, Flubber (1997), http://disney.go.com/disneyvideos/liveaction/flubber/.
[17] Division Ltd., dVS Technical Overview, Version 2.0.4, 1st Edition, 1993.
[18] P.K. Doenges, T.K. Capin, F. Lavagetto, J. Ostermann, I.S. Pandzic, E. Petajan, MPEG-4: Audio/video and synthetic graphics/audio for mixed media, Signal Processing: Image Communication 9 (4) (1997) 433-463.
[19] Electronic Arts, www.ea.com.
[20] M. Escher, I. Pandzic, N. Magnenat-Thalmann, Facial deformations for MPEG-4, in: Computer Animation 98, Philadelphia, USA, IEEE Computer Society Press, 1998, pp. 138-145.
[21] P. Faloutsos, M. van de Panne, D. Terzopoulos, Composable controllers for physics-based character animation, in: Computer Graphics Proceedings, Annual Conference Series, ACM SIGGRAPH, 2001.
[22] Faro Technologies, www.farotechnologies.com.
[23] J.D. Foley, A. van Dam, S.K. Feiner, J.F. Hughes, Computer Graphics—Principles and Practice, 2nd Edition, Addison-Wesley, Reading, MA, 1992.
[24] J.D. Funge, AI for Computer Games and Animation: A Cognitive Modeling Approach, AK Peters, USA, August 1999.
[25] Geri's Game, Toy Story (1995), A Bug's Life (1998), Toy Story 2 (1999) and Monsters, Inc. (2001), Walt Disney Pictures and Pixar.
[26] C. Greenhalgh, S. Benford, MASSIVE: a distributed virtual reality system incorporating spatial trading, in: 15th International Conference on Distributed Computing Systems (DCS'95), Vancouver, Canada, 30 May-2 June, IEEE Computer Society Press, Silver Spring, MD, 1995.
[27] Humanoid Animation Specification, www.h-anim.org.
[28] Introducing Seonaid, our online news presenter, Scotland government web page, http://www.scotland.gov.uk/pages/news/junior/introducing seonaid.aspx.

[29] ISO/IEC JTC1/SC29/WG11, ISO/IEC 14496:1999, Coding of audio, picture, multimedia and hypermedia information, N3056, Maui, December 1999.
[30] J.J. Craig, Introduction to Robotics: Mechanics and Control, 2nd Edition, Addison-Wesley, Reading, MA, 1989.
[31] P. Kalra, N. Magnenat-Thalmann, L. Moccozet, G. Sannier, A. Aubel, D. Thalmann, Real-time animation of realistic virtual humans, IEEE Comput. Graph. Appl. 18 (5) (1998) 42-57.
[32] F. Lavagetto, R. Pockaj, The facial animation engine: toward a high-level interface for the design of MPEG-4 compliant animated faces, IEEE Trans. Circuits Systems Video Technol. 9 (2) (March 1999) 277-289.
[33] Living Actor technology, http://www.living-actor.com/.
[34] C. Loop, Smooth subdivision surfaces based on triangles, Master's Thesis, Department of Mathematics, University of Utah, August 1987.
[35] M.R. Macedonia, M.J. Zyda, D.R. Pratt, P.T. Barham, S. Zeswitz, NPSNET: A network software architecture for large scale virtual environments, Presence 3 (4) (1994) 265-287.
[36] N. Magnenat-Thalmann, D. Thalmann, Virtual Reality Software and Technology, Encyclopedia of Computer Science and Technology, Vol. 41, Marcel Dekker, New York, 1999.
[37] Maya Infinity, Alias/Wavefront Inc., 1999.
[38] T.B. Moeslund, E. Granum, A survey of computer vision-based human motion capture, Comput. Vision Image Understanding 81 (3) (2001) 231-268.
[39] G. Mozelle, F. Préteux, J.E. Viallet, Tele-sign: A compression framework for sign language distant communication, in: Proceedings of the SPIE Conference on Mathematical Modeling and Estimation Techniques in Computer Vision, San Diego, CA, July 1998, Vol. 3457, pp. 94-110.
[40] M.G. Pandy, F.C. Anderson, Three-dimensional computer simulation of jumping and walking using the same model, in: Proceedings of the VIIth International Symposium on Computer Simulation in Biomechanics, August 1999.
[41] M.G. Pandy, F.E. Zajac, E. Sim, W.S. Levine, An optimal control model for maximum-height human jumping, J. Biomech. 23 (12) (1990) 1185-1198.
[42] I. Pandzic, N. Magnenat-Thalmann, T. Capin, D. Thalmann, Virtual LifeNetwork: A body-centered networked virtual environment, Presence, Vol. 6 (6), MIT Press, Cambridge, MA, 1997, pp. 676-686.
[43] L. Piegl, W. Tiller, The NURBS Book, 2nd Edition, Springer, Berlin, 1997.
[44] Polhemus STAR TRACK motion capture system, http://www.polhemus.com.


[45] M. Preda, F. Preteux, Streamed animation within Animation Framework eXtension (AFX), ISO/IEC JTC1/SC29/WG11, MPEG01/M7588, Pattaya, Thailand, December 2001.
[46] M. Preda, F. Preteux, Deformable bones or muscle-based deformation?, ISO/IEC JTC1/SC29/WG11, MPEG01/M7757, Pattaya, Thailand, December 2001.
[47] M. Preda, T. Zaharia, F. Preteux, 3D body animation and coding within a MPEG-4 compliant framework, in: Proceedings of the International Workshop on Synthetic-Natural Hybrid Coding and Three Dimensional Imaging (IWSNHC3DI'99), Santorini, Greece, 15-17 September 1999, pp. 74-78.
[48] F. Préteux, M. Preda, G. Mozelle, Donation to ISO of hand animation software, ISO/IEC JTC1/SC29/WG11, M3590, Dublin, July 1998.
[49] S20 Sonic Digitizers, Science Accessories Corporation.
[50] Seamless Solutions Inc. demos, http://www.seamless-solutions.com.
[51] H. Seo, F. Cordier, L. Philippon, N. Magnenat-Thalmann, Interactive modelling of MPEG-4 deformable human body models, DEFORM'2000 Workshop, Geneva, 29-30 November 2000, pp. 120-131.
[52] M.B. Sévenier, et al. (Eds.), PDAM of ISO/IEC 14496-1/AMD4, ISO/IEC JTC1/SC29/WG11, N4415, Pattaya, December 2001.
[53] K. Shoemake, Euler angle conversion, in: Graphics Gems IV, Academic Press Professional, Toronto, 1994.
[54] Sign Language Web Site at University Lumière Lyon 2, http://signserver.univ-lyon2.fr.
[55] The Virtual Reality Modeling Language, International Standard ISO/IEC 14772-1:1997, www.vrml.org.
[56] Vandrea news presenter, Channel 5, British Broadcasting Television.
[57] ViSiCAST, Virtual Signer Communication, Animation, Storage and Transmission, IST European Project 1999-2002, www.visicast.org.
[58] VPL Research Inc., Dataglove Model 2 Operation Manual, January 1989.
[59] J. Wilhelms, B.A. Barsky, Using dynamic analysis to animate articulated bodies such as humans and robots, in: Graphics Interface '85, May 1985, pp. 97-104.
[60] W. Wooten, Simulation of leaping, tumbling, landing, and balancing humans, Ph.D. Thesis, Georgia Institute of Technology, March 1998.
[61] G. Wyvill, C. McPheeters, B. Wyvill, Data structure for soft objects, The Visual Comput. 2 (1986) 227-234.
[62] D. Zorin, P. Schröder, et al., Subdivision for modeling and animation, SIGGRAPH'00 Conference Course Notes, July 2000.