Journal of Parallel and Distributed Computing 62, 1001–1020 (2002) doi:10.1006/jpdc.2001.1820
A Unified Formulation for Massively Parallel Rigid Multibody Dynamics of O(log2 n) Computational Complexity

Andrés Jaramillo-Botero¹
Engineering Faculty, Pontificia Universidad Javeriana, Calle 18 #118-250 vía a Pance, Cali, Colombia
E-mail:
[email protected]
and

Alfons Crespo I. Lorente
DISCA Department, Universidad Politécnica de Valencia, Camino de Vera, 14, E46022, Valencia, Spain
E-mail:
[email protected] Received August 4, 1999; revised November 27, 2001; accepted January 11, 2002
A novel algorithm for the solution of the inverse dynamics problem is presented and combined with the solution of the equations of motion (EOM) for rigid multibody chains using explicit constraint components of force. The unified model corresponds to an optimal, strictly parallel, time, space, and processor lower-bound solution to the dynamics of accelerated rigid multibodies, i.e., computation time of O(log2 n) using O(n) processors for an n-body system. Complex topological structures are supported in the form of multiple degree-of-freedom (DOF) joints/hinges, free-floating, hyper-branched, and/or closed-chain systems, with applications ranging from multibody molecular dynamics simulations and computational molecular nanotechnology to real-time control and simulation of spatial robotic manipulators. In addition to their theoretical significance, the algorithms presented are shown to be very efficient for practical implementation on MIMD parallel architectures for large-scale systems. © 2002 Elsevier Science (USA)

Key Words: strictly parallel computations; robotics; forward dynamics; inverse dynamics; multibody dynamics; molecular dynamics; computational molecular nanotechnology.
1. INTRODUCTION

Two basic problems are tackled in considering the dynamic behavior of general systems. The first is that of determining the acceleration (motion) that bodies undergo when subject to forces (referred to here as the forward dynamics problem).
¹ To whom correspondence should be addressed.
The second problem is that of determining the forces required to produce a prescribed motion in the system (inverse dynamics). The most computationally efficient, and perhaps conventional, scheme for describing the problem of motion in serially coupled multibodies relies on the application of the Newton–Euler set of EOM, with O(n) serial complexity (for n bodies). This method is typically used in robotics applications and in general multibody dynamics formalisms [5, 6]. In order to improve on the computation time of these EOM, multiprocessing should be used [16]. Unfortunately, and particularly so for the forward dynamics solution, most of the existing O(n) algorithms are strictly sequential with bounded parallelism [1, 2, 4, 19, 22]. This means that, regardless of the number of processors employed in their computation, speedup will only improve by a constant factor and will at some point saturate, or performance will even degrade. The poor, or null, scalability of these algorithms restricts their applicability to small-scale, short-term applications. On the other hand, recent developments in this area [8–10, 13, 16, 17, 21] have led to enormous theoretical and practical improvements in the scalability of Newtonian formulations. This article takes these advances a step forward by presenting a consistent, strictly parallel formulation of the EOM, using generalized (internal) coordinates, to solve both the inverse (see Eq. (1)) and forward (see Eq. (2)) dynamics problems of complex multibody topologies while maintaining lower-bound computational complexity. For the inverse dynamics problem the total effective forces are computed as

\[ F_T = M\ddot{Q} + \xi(Q,\dot{Q}). \tag{1} \]

Solving for the effective accelerations \(\ddot{Q}\) from (1),

\[ \ddot{Q} = M^{-1}\,[\,F_T - \xi(Q,\dot{Q})\,], \tag{2} \]

where \(F_T\) is the total effective force vector, \(\ddot{Q}\) is the effective acceleration vector, \(M\) is the mass matrix operator, and \(\xi\) is the non-linear velocity-dependent force component vector.
1.1. Importance of the Intended Solutions

The proposed inverse dynamics algorithm casts Eq. (1) as a first-order linear inhomogeneous recurrence (LIR), contrary to almost all other known proposals [7, 16, 17]. As a result, it avoids the explicit pre-computation of input physical parameters in global Cartesian coordinates. This novel algorithm is integrated with the recently developed constraint force algorithm (CFA) for large-scale molecular dynamics (MD) simulations [8, 9, 13] to compute \(\xi(Q,\dot{Q})\) in Eq. (2). Furthermore, important new extensions to support free-floating, hyper-branched, and closed-chain systems are also discussed. The result is an O(n) serial scheme that is effectively implemented on parallel architectures in O(log2 n) using O(n) processors, corresponding to the time lower bound (the time limitation for all possible algorithms) for solving this problem [16].
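Since the LIR structure is what enables the parallel solution, it helps to recall the standard form of a first-order LIR (a textbook definition, with symbols chosen to echo the propagation equations of Section 3):

\[ x_i = A_i x_{i-1} + b_i, \qquad i = 1, \ldots, n, \]

whose partial solution maps compose associatively, \((A_2, b_2) \circ (A_1, b_1) = (A_2 A_1,\; A_2 b_1 + b_2)\). It is this associativity that allows recursive doubling to evaluate all \(x_i\) in \(\lceil \log_2 n \rceil\) combining steps on O(n) processors.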
All the algorithms are presented using spatial algebra operators for compactness and improved physical insight. Consequently, the next section reviews the elements of this algebra within the context of rigid multibody systems.
2. SPATIAL OPERATOR ALGEBRA AND RIGID-BODY DYNAMICS

Spatial algebra has been around for some time now [4, 19]. The spatial operators carry this particular denomination because they show how physical quantities propagate through space from one rigid body to the next. They allow concise and systematic formulations of the EOM [6] and the development of efficient computational algorithms. In spatial notation, velocity, acceleration, and force vectors are all 6×1 column vectors, each stacking the appropriate angular and translational components. The translational and angular velocities (v, w) and the translational and angular forces (f, N) at any point on a body in \(\mathbb{R}^3\) are combined into the spatial velocity V, spatial acceleration \(\dot{V}\), and spatial force F in \(\mathbb{R}^6\) as

\[ V = \begin{bmatrix} w \\ v \end{bmatrix}, \qquad \dot{V} = \begin{bmatrix} \dot{w} \\ \dot{v} \end{bmatrix}, \qquad F = \begin{bmatrix} N \\ f \end{bmatrix}. \tag{3} \]

In multibody dynamics, spatial quantities must be propagated and projected onto points or unique frames in order to be operated on. For this purpose, operators for translation and rotation are also defined in spatial form as

\[ \hat{P}_{i,i+1} = \begin{bmatrix} U & \tilde{p}_{i,i+1} \\ 0 & U \end{bmatrix} \tag{4} \]

and

\[ R_{i,i+1} = \begin{bmatrix} r_{i,i+1} & 0 \\ 0 & r_{i,i+1} \end{bmatrix}, \tag{5} \]

where \(U \in \mathbb{R}^{3\times3}\) is the identity operator, \(p_{i,i+1} \in \mathbb{R}^3\) is any vector joining two points (e.g., from point 2 to point 1 in Fig. 1), \(r_{i,i+1} \in \mathbb{R}^{3\times3}\) is the generalized rotation matrix that takes any point expressed in coordinate frame i+1 and projects it onto frame i, and \(\tilde{p}_{i,i+1}\) is the skew-symmetric matrix corresponding to the vector cross-product operator, which has the following additional properties (for any points in \(\mathbb{R}^3\)):

\[ \tilde{p}_{i,j}\,p_{k,l} = p_{i,j} \times p_{k,l}, \qquad \tilde{p}_{i,j}^{\,T} = -\tilde{p}_{i,j}. \tag{6} \]
The 6×6 spatial inertia of a body is obtained by combining its mass and its first and second moments of mass with respect to the point of interest in the body. Assuming that the body has mass \(m_i\) and moment of inertia \(J_{i,cm}\) about the body's center of mass, cm, the spatial inertia operator is then defined as

\[ I_{i,cm} = \begin{bmatrix} J_{i,cm} & 0 \\ 0 & m_i U \end{bmatrix} \in \mathbb{R}^{6\times6}. \tag{7} \]

FIG. 1. Single-body mass spatial parameters.
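To make the operator notation concrete, the following minimal C sketch (our illustration, not the paper's implementation; all names are ours) builds the skew-symmetric operator of Eq. (6) and applies the transpose of the translation operator of Eq. (4) to a spatial velocity, which is exactly the point-to-point propagation used in Section 3:

/* Minimal illustration (not the paper's code): the spatial translation
 * operator P^ of Eq. (4) acting on a 6x1 spatial velocity [w; v]. */
#include <stdio.h>

typedef double Vec3[3];
typedef double Spatial[6];               /* [w(3); v(3)] */

/* skew-symmetric cross-product matrix p~ of Eq. (6) */
static void skew(const Vec3 p, double S[3][3]) {
    S[0][0] = 0;     S[0][1] = -p[2]; S[0][2] =  p[1];
    S[1][0] =  p[2]; S[1][1] = 0;     S[1][2] = -p[0];
    S[2][0] = -p[1]; S[2][1] =  p[0]; S[2][2] = 0;
}

/* y = P^(p)^T x: since P^ = [[U, p~],[0, U]] and (p~)^T = -p~,
 * the angular part is unchanged and v' = v - p~ w = v + w x p. */
static void phat_transpose_apply(const Vec3 p, const Spatial x, Spatial y) {
    double S[3][3];
    skew(p, S);
    for (int k = 0; k < 3; k++) {
        y[k] = x[k];                                   /* w' = w */
        y[3 + k] = x[3 + k]
                 - (S[k][0]*x[0] + S[k][1]*x[1] + S[k][2]*x[2]);
    }
}

int main(void) {
    Vec3 p = {1.0, 0.0, 0.0};           /* offset between the two points */
    Spatial V = {0, 0, 2.0, 0, 0, 0};   /* pure rotation about z         */
    Spatial Vp;
    phat_transpose_apply(p, V, Vp);     /* expect v' = w x p = (0, 2, 0) */
    printf("v' = (%g, %g, %g)\n", Vp[3], Vp[4], Vp[5]);
    return 0;
}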
3. INVERSE DYNAMICS

3.1. Dynamics of a Single Body

From Fig. 1, let \(O_i\) and cm be two points on a rigid body i, where cm is the body's center of mass, and let \(s_{O_i}\) be the vector from point \(O_i\) to cm. Let \(v_{i,cm}\) and \(w_{i,cm}\) be the translational and angular velocities of body i at cm, and \(F_{i,cm}\) and \(N_{i,cm}\) the force and moment at and about cm, respectively. Then the velocities and forces at the points cm and \(O_i\) are related to each other as follows:

\[ V_{i,cm} = \hat{S}^{T}_{O_i,cm} V_{O_i}, \tag{8} \]

\[ F_{O_i} = \hat{S}_{O_i,cm} F_{i,cm}, \tag{9} \]

where \(\hat{S}_{O_i,cm}\) has the same form expressed by Eq. (4). Equation (8) simply states that the velocity at any point cm of body i can be written in terms of that at any other point \(O_i\) of body i. By differentiating Eq. (8) with respect to time, the expression for the spatial acceleration at and about point cm is obtained,

\[ \dot{V}_{i,cm} = \hat{S}^{T}_{O_i,cm} \dot{V}_{O_i} + b_{O_i}, \tag{10} \]

where \(b_{O_i} = \dot{\hat{S}}^{T}_{O_i,cm} V_{O_i}\) represents the velocity (V)-dependent centrifugal acceleration component at point \(O_i\).
From the definition of the spatial momentum, the spatial forces acting on the body about cm correspond to

\[ F_{i,cm} = \dot{L}_{i,cm} = I_{i,cm}\dot{V}_{i,cm} + \dot{I}_{i,cm} V_{i,cm}, \tag{11} \]

where the second term is the velocity (V)-dependent gyroscopic spatial force at cm (call it \(P_{i,cm} = \dot{I}_{i,cm} V_{i,cm}\)). In order to complete the description and generalize the EOM for serially connected multibody chains, these equations are developed about points other than the center of mass. Assuming from Fig. 1 that point \(O_i\) corresponds to the point about which body i is rotating, then from Eqs. (9) and (11) the spatial force at/about \(O_i\) is

\[ F_{O_i} = \hat{S}_{O_i,cm} I_{i,cm} \hat{S}^{T}_{O_i,cm} \dot{V}_{O_i} + \hat{S}_{O_i,cm} I_{i,cm} b_{O_i} + \hat{S}_{O_i,cm} P_{i,cm}. \tag{12} \]

The first term is clearly identified as an application of the parallel-axis theorem for spatial inertia, which allows the computation of the inertia about \(O_i\),

\[ I_{O_i} = \hat{S}_{O_i,cm} I_{i,cm} \hat{S}^{T}_{O_i,cm}. \tag{13} \]

The gyroscopic forces at point \(O_i\) are recognized from the second and third terms of Eq. (12) as

\[ P_{O_i} = \hat{S}_{O_i,cm}\,[\,I_{i,cm} b_{O_i} + P_{i,cm}\,]. \tag{14} \]

Expanding and simplifying,

\[ P_{O_i} = I_{O_i}\dot{\hat{S}}^{T}_{O_i,cm} V_{O_i} + \dot{I}_{O_i} V_{O_i}. \tag{15} \]

It then follows that the force at point \(O_i\) is given by

\[ F_{O_i} = I_{O_i}\dot{V}_{O_i} + P_{O_i}. \tag{16} \]
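As a concrete check of Eq. (13), multiplying out \(\hat{S} I_{cm} \hat{S}^T\) gives the familiar closed block form \(I_O = \begin{bmatrix} J_{cm} - m\tilde{s}\tilde{s} & m\tilde{s} \\ -m\tilde{s} & mU \end{bmatrix}\), which the following sketch implements (our illustration; names and test values are ours, not the paper's):

/* Illustrative sketch of the parallel-axis theorem for spatial
 * inertia, Eq. (13), in its expanded block form (not the paper's
 * implementation). s is the vector from O to cm. */
#include <stdio.h>

static void skew(const double p[3], double S[3][3]) {
    S[0][0] = 0;     S[0][1] = -p[2]; S[0][2] =  p[1];
    S[1][0] =  p[2]; S[1][1] = 0;     S[1][2] = -p[0];
    S[2][0] = -p[1]; S[2][1] =  p[0]; S[2][2] = 0;
}

static void spatial_inertia_about(double IO[6][6], const double Jcm[3][3],
                                  double m, const double s[3]) {
    double St[3][3];
    skew(s, St);
    for (int r = 0; r < 3; r++)
        for (int c = 0; c < 3; c++) {
            double ss = 0.0;                        /* (s~ s~)_{rc} */
            for (int k = 0; k < 3; k++) ss += St[r][k] * St[k][c];
            IO[r][c]     = Jcm[r][c] - m * ss;      /* J_cm - m s~ s~ */
            IO[r][3+c]   =  m * St[r][c];           /*  m s~          */
            IO[3+r][c]   = -m * St[r][c];           /* -m s~          */
            IO[3+r][3+c] = (r == c) ? m : 0.0;      /*  m U           */
        }
}

int main(void) {
    double J[3][3] = {{1,0,0},{0,1,0},{0,0,1}};     /* J_cm */
    double s[3] = {0.5, 0.0, 0.0};
    double IO[6][6];
    spatial_inertia_about(IO, J, 2.0, s);
    /* parallel-axis check about the y axis: Jyy + m*sx^2 = 1.5 */
    printf("IO[1][1] = %g (expected 1.5)\n", IO[1][1]);
    return 0;
}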
Our interest now shifts to the computation of the EOM for serially coupled rigid bodies, building on the equations obtained for single bodies (serial chains of n rigid bodies are considered). For simplicity and clarity, it is assumed that all bodies have the same number of DOF and degrees of constraint (DOC).

3.2. Serial Rigid Multibody Dynamics

Considering that each element/link of the serial system in Fig. 2 is a rigid body with a defined cm location and inertia tensor, its mass distribution is fully characterized. Also, because each element is considered rigid, the EOM can be developed about any point of the body, as shown in Section 3.1. For convenience, this point is set to be the connection point between adjacent bodies, \(O_i\).
FIG. 2. Rigid multibody serial chain.
The procedure consists of a forward, base-to-tip propagation of the kinematic parameters \((V, \dot{V})\) and a backward, tip-to-base propagation of the spatial forces (F), following d'Alembert's principle. The spatial velocity V of any link i is the sum of the motion induced by its previously connected body, i−1, and its local component, i.e.,

\[ V_i = \hat{P}^{T}_{i-1,i} V_{i-1} + H_i \dot{Q}_i, \tag{17} \]

where \(H_i\) corresponds to the projection matrix of DOF for joint i, i.e., it projects onto the axes of motion. Spatial accelerations \(\dot{V}_i\) are then obtained by differentiating Eq. (17) with respect to time,

\[ \dot{V}_i = \hat{P}^{T}_{i-1,i} \dot{V}_{i-1} + H_i \ddot{Q}_i + \dot{\hat{P}}^{T}_{i-1,i} V_{i-1} + \dot{H}_i \dot{Q}_i, \tag{18} \]

where the third and fourth terms correspond to the velocity-dependent Coriolis/centrifugal accelerations, denoted hereinafter as b \((b_i = \dot{\hat{P}}^{T}_{i-1,i} V_{i-1} + \dot{H}_i \dot{Q}_i)\). The spatial velocity and acceleration terms are computed recursively from link 1 to n. Using Eq. (15), the spatial forces are then computed (from tip to base, i.e., from n to 1) to complete the EOM,

\[ F_i = \hat{P}_{i,i+1} F_{i+1} + I_i \dot{V}_i + \dot{I}_i V_i + I_i \dot{\hat{S}}^{T}_{O_i,cm} V_i, \tag{19} \]
where the last two terms correspond to the gyroscopic forces, \(P_i\), introduced by the spatial velocity when the rotation point of the body is shifted away from the center of mass (cm). The effective force on each axis of motion is consequently given by \(F_{T_i} = H_i^T F_i\). Equations (17)–(19) constitute the entire Newton–Euler EOM for a fixed-base serial multibody chain in coordinate-free form. Using higher-level spatial operators, Eqs. (17), (18), and (19) are expressed, respectively, as \(\mathcal{P}^T V = \mathcal{H}\dot{Q}\), \(\mathcal{P}^T\dot{V} = \mathcal{H}\ddot{Q} + b\), and \(\mathcal{P}F = \mathcal{I}\dot{V} + G\), where

\[ \mathcal{P} = \begin{bmatrix} U & 0 & 0 & \cdots & 0 \\ \hat{P}_{N-1} & U & 0 & \cdots & 0 \\ 0 & \hat{P}_{N-2} & U & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & \hat{P}_{1} & U \end{bmatrix} \tag{20} \]

and the remaining terms correspond to block-diagonal matrices or block vectors of the analogous lower-level operators.

3.3. Parallel O(log2 n) Inverse Dynamics Algorithm

As stated previously, the solution to this problem involves reformulating the EOM in LIR form. This reduces to applying the Kogge and Stone [15] recursive-doubling technique, which contracts the equation set (for all i) by powers of 2 at each of a total of log2 n + 1 steps. Figure 3 depicts a graphical representation of the computation of the spatial-velocity LIR using the recursive-doubling method (n = 4); spatial accelerations and forces are represented accordingly.
FIG. 3. Graphical representation of spatial velocity LIR.
The next section details the parallel algorithm for computing the inverse dynamics in LIR form using a message-passing model. It is presented for the n = P case (i.e., one body per processor) to avoid unnecessary load-distribution details, where n denotes the number of bodies in the serially connected multibody and P the number of processors available for concurrent computation.
3.4. Detailed Parallel Algorithm

Algorithm 1 describes the message-passing scheme required for the implementation of the proposed inverse dynamics solution.

∀i: Do Parallel
    V_i^0 = H_i Q̇_i  and  P̂_i^{0,T} = P̂_i^T R_i^T
    for j = 1 .. ⌈log2 n⌉
        Send P̂_i^{j−1,T} to processor i − 2^{j−1}, iff i − 2^{j−1} ≥ 1
        Vp_i^{j−1} = P̂_i^{j−1,T} V_i^{j−1}
        Send Vp_i^{j−1} to processor i + 2^{j−1}, iff i + 2^{j−1} ≤ P
        Receive P̂_{i−2^{j−1}}^{j−1,T} V_{i−2^{j−1}}^{j−1} from processor i − 2^{j−1}
        V_i^j = V_i^{j−1} + P̂_{i−2^{j−1}}^{j−1,T} V_{i−2^{j−1}}^{j−1}
        Receive P̂_{i+2^{j−1}}^{j−1,T} from processor i + 2^{j−1}
        P̂_i^{j,T} = P̂_{i+2^{j−1}}^{j−1,T} P̂_i^{j−1,T}   (equivalently, P̂_{i−2^{j−1}}^{j,T} = P̂_i^{j−1,T} P̂_{i−2^{j−1}}^{j−1,T})
    end j
    Send V_i^{⌈log2 n⌉} to processor i + 1 (requires projection onto the local frame of i)
    Receive V_{i−1}^{⌈log2 n⌉} from processor i − 1
    V̇_i^0 = H_i Q̈_i + Ḣ_i Q̇_i + Ṗ̂_{i−1}^T V_{i−1}^{⌈log2 n⌉}
    for j = 1 .. ⌈log2 n⌉
        Send V̇_i^{j−1} to processor i + 2^{j−1}, iff i + 2^{j−1} ≤ P
        Receive V̇_{i−2^{j−1}}^{j−1} from processor i − 2^{j−1}
        V̇_i^j = V̇_i^{j−1} + P̂_{i−2^{j−1}}^{j−1,T} V̇_{i−2^{j−1}}^{j−1}
    end j
    P̂_i^0 = (P̂_i^{0,T})^T  and  F_i^0 = F_ext,i + I_i V̇_i^{⌈log2 n⌉} + P_i
    for j = 1 .. ⌈log2 n⌉
        Send F_i^{j−1} to processor i − 2^{j−1}, iff i − 2^{j−1} ≥ 1
        Receive F_{i+2^{j−1}}^{j−1} from processor i + 2^{j−1}
        F_i^j = F_i^{j−1} + P̂_i^{j−1} F_{i+2^{j−1}}^{j−1}
        P̂_i^j = (P̂_{i−2^{j−1}}^{j−1,T} P̂_i^{j−1,T})^T   (obtained by transposing the product accumulated in the first loop, so no explicit communication is needed)
    end j
    F_T,i = H_i^T F_i^{⌈log2 n⌉}

ALGORITHM 1. Parallel Inverse Dynamics Algorithm.
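To make the communication pattern of Algorithm 1 concrete, here is a self-contained MPI sketch (our illustration, not the paper's implementation) of the same recursive-doubling scan applied to a scalar first-order LIR \(x_i = a x_{i-1} + b_i\), one body per rank (n = P). The scalar coefficient a plays the role of the spatial transform \(\hat{P}^T\) and \(b_i\) that of the local joint term \(H_i\dot{Q}_i\); for simplicity, the accumulated coefficient travels downstream together with the partial result, whereas Algorithm 1 exchanges the accumulated transforms in the opposite direction.

/* Self-contained MPI sketch (ours, not the paper's code): Kogge-Stone
 * recursive doubling for the scalar LIR x_i = a*x_{i-1} + b_i. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int i, P;
    MPI_Comm_rank(MPI_COMM_WORLD, &i);
    MPI_Comm_size(MPI_COMM_WORLD, &P);

    double a = 0.5, b = (double)(i + 1);  /* made-up local coefficients */
    double x  = b;    /* partial result        x_i^0 = b_i */
    double aa = a;    /* accumulated coefficient A_i^0 = a_i */

    for (int d = 1; d < P; d <<= 1) {     /* ceil(log2 P) combining steps */
        double send[2] = { aa, x }, recv[2];
        MPI_Request rq;
        int sent = 0;
        if (i + d < P) {                  /* pass partials downstream */
            MPI_Isend(send, 2, MPI_DOUBLE, i + d, 0, MPI_COMM_WORLD, &rq);
            sent = 1;
        }
        if (i - d >= 0) {                 /* combine with upstream partials */
            MPI_Recv(recv, 2, MPI_DOUBLE, i - d, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            x  += aa * recv[1];  /* x_i^j = x_i^{j-1} + A_i^{j-1} x_{i-2^{j-1}}^{j-1} */
            aa *= recv[0];       /* A_i^j = A_i^{j-1} A_{i-2^{j-1}}^{j-1} */
        }
        if (sent) MPI_Wait(&rq, MPI_STATUS_IGNORE);
    }
    printf("rank %d: x_%d = %g\n", i, i, x);
    MPI_Finalize();
    return 0;
}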
4. FORWARD DYNAMICS

The computation of the EOM reduces to solving Eq. (2). The \(\xi\) terms do not contribute directly and linearly to the spatial accelerations; hence they are removed from the EOM by applying the Newton–Euler equations with \(\ddot{Q} = 0\) (see the sketch below). This is equivalent to "linearizing" the Newton–Euler EOM by expressing them without the velocity (V)-dependent terms, i.e.,

\[ \dot{V}_i = \hat{P}^{T}_{i-1}\dot{V}_{i-1} + H_i\ddot{Q}_i \;\Rightarrow\; \mathcal{P}^T\dot{V} = \mathcal{H}\ddot{Q}, \tag{21} \]

\[ F_i = I_i\dot{V}_i + \hat{P}_i F_{i+1} \;\Rightarrow\; \mathcal{P}F = \mathcal{I}\dot{V}. \tag{22} \]

Equations (21) and (22) are also equivalent to the EOM for systems in static equilibrium.
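The extraction of \(\xi\) can be illustrated on a single 1-DOF rigid pendulum, for which Eq. (1) reads \(F_T = m l^2 \ddot{q} + m g l \sin q\). In the sketch below (ours; the toy inverse_dynamics() stands in for Algorithm 1), evaluating the inverse dynamics at \(\ddot{Q} = 0\) yields \(\xi\), after which Eq. (2) is a plain linear solve:

/* Sketch (ours): extracting the nonlinear term xi(Q, Qdot) of Eq. (1)
 * by evaluating the inverse dynamics with Qddot = 0, illustrated on a
 * single rigid pendulum. inverse_dynamics() stands in for Algorithm 1. */
#include <math.h>
#include <stdio.h>

static const double m = 1.0, l = 0.5, g = 9.81;

/* toy stand-in for Algorithm 1: F_T = M(q)*qdd + xi(q, qd) */
static double inverse_dynamics(double q, double qd, double qdd) {
    (void)qd;   /* a single 1-DOF body has no Coriolis term */
    return m*l*l*qdd + m*g*l*sin(q);
}

int main(void) {
    double q = 0.3, qd = 1.0, FT = 2.0;
    /* xi = inverse dynamics evaluated at Qddot = 0 */
    double xi  = inverse_dynamics(q, qd, 0.0);
    /* forward dynamics, Eq. (2): qdd = M^{-1}(F_T - xi) */
    double qdd = (FT - xi) / (m*l*l);
    printf("xi = %g, qdd = %g\n", xi, qdd);
    /* consistency check: plugging qdd back reproduces F_T */
    printf("check F_T = %g\n", inverse_dynamics(q, qd, qdd));
    return 0;
}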
4.1. Constraint Force Method to Determine \(\ddot{Q}(F_T)\)

The objective of this section is to solve Eq. (2) in O(n) serial computational complexity and in O(log2 n) parallel computational complexity using O(n) processors. A high-level, abstract form of the articulated-body inertia operator required for solving the forward dynamics problem expressed in Eq. (2) (in particular \(M^{-1}\)) is represented as follows:

\[ I_i^{AR} = I_i + I_{i+1}^{AR} - \frac{f_2(I_{i+1}^{AR})}{f_1(I_{i+1}^{AR})} = I_i + I_{i+1}^{AR} - f(I_{i+1}^{AR}), \tag{23} \]

where I is the local body inertia, \(I^{AR}\) corresponds to the articulated-body inertia with respect to a particular element of the multibody, and \(f_1\) and \(f_2\) are first- and
second-order polynomials, respectively; hence max(deg f1, deg f2) = 2. This abstract form represents a non-linear recurrence of order 2. The main bottleneck in the parallel computation of the existing solutions to this problem comes from the existence of a non-linear recurrence of order > 1 [10, 13] in the articulated-body inertia operator [4] of Eq. (23), which severely limits the potential for parallelism [11]. That is, regardless of the number of processors employed, its computation can be sped up only by a constant factor. A new algorithm based on a global reformulation of the problem is therefore required to arrive at an alternative, parallelizable factorization of the mass matrix or its inverse. By decomposing the total forces of the system into active and constraint components, the EOM are reformulated in a manner that leads to a direct and parallelizable form of \(M^{-1}\) [10]:

\[ F = \mathcal{H}F_T + \mathcal{W}F_S, \tag{24} \]
where \(F_T\) is a block vector of active force vectors, \(F_S\) is a block vector of constraint force vectors, \(\mathcal{H}\) is a block matrix of projection matrices for the effective axes of motion, and \(\mathcal{W}\) is a block matrix of projection matrices (\(\in \mathbb{R}^{6\times DOC}\)) for the constrained axes of motion. Defining the following orthogonality conditions between free and constrained axes of motion,

\[ \mathcal{H}\mathcal{H}^T + \mathcal{W}\mathcal{W}^T = U, \quad \mathcal{H}^T\mathcal{H} = U, \quad \mathcal{W}^T\mathcal{W} = U, \quad \mathcal{H}^T\mathcal{W} = 0, \quad \mathcal{W}^T\mathcal{H} = 0, \tag{25} \]

and using Eqs. (21), (22), and (24), \(\ddot{Q}\) is found to be

\[ \ddot{Q} = [\,C - B^T A^{-1} B\,]\,F_T, \tag{26} \]

where \(A = \mathcal{W}^T\mathcal{P}^T\mathcal{I}^{-1}\mathcal{P}\mathcal{W} \in \mathbb{R}^{mDOC \times mDOC}\), \(B = \mathcal{W}^T\mathcal{P}^T\mathcal{I}^{-1}\mathcal{P}\mathcal{H} \in \mathbb{R}^{mDOC \times nDOF}\), \(C = \mathcal{H}^T\mathcal{P}^T\mathcal{I}^{-1}\mathcal{P}\mathcal{H} \in \mathbb{R}^{nDOF \times nDOF}\), and

\[ M^{-1} = C - B^T A^{-1} B. \tag{27} \]
An important fact about Eq. (27) is that computing the matrices A, B, and C involves only local or nearest-neighbor dependencies. Furthermore, A and C are symmetric positive definite (SPD) for serial-chain systems, which guarantees the existence of \(A^{-1}\). Another interesting fact is that the operation count for computing the inverse of A is directly proportional to the number of DOC per body. Section 4.3 shows how Eq. (26) can be computed in a strictly parallel fashion within the time lower bound for the problem.
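As an illustration (ours, following the convention of Eq. (3) of stacking angular components above translational ones), for a 1-DOF revolute joint rotating about its local z axis, H and W can be taken as complementary column selections of the 6×6 identity:

\[ H = [\,0 \;\; 0 \;\; 1 \;\; 0 \;\; 0 \;\; 0\,]^T, \qquad W = [\,e_1 \;\; e_2 \;\; e_4 \;\; e_5 \;\; e_6\,] \in \mathbb{R}^{6\times 5}, \]

where \(e_k\) is the k-th standard basis vector of \(\mathbb{R}^6\). Then \(H^TH = 1\), \(W^TW = U_{5\times5}\), \(H^TW = 0\), and \(HH^T + WW^T = U_{6\times6}\), satisfying Eq. (25) with DOF = 1 and DOC = 5 for that joint.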
4.2. Support for Free-Floating Systems

To extend these algorithms to free-floating systems, a six-DOF (three translations, three rotations) virtual link is added between the inertial system and the first body of the chain. For the inverse dynamics case, this is done in a straightforward manner by considering the kinematics of the added virtual link and including a unit H projection matrix for the base body. For the constraint force algorithm (CFA), on the other hand, this is not as straightforward, given the intrinsic use of constraints to solve the EOM; i.e., a six-DOF transformation implies no constraints, hence introducing a singularity into the inverse solution of A in (27). Since the effects of having a free-floating base are purely kinematic in nature, the problem can be solved by introducing additional terms into (21) and (22) to express them. Since \(W_0 = 0\) and \(H_0 = U\), it follows from (24) that \(F_0 = F_{T_0}\); hence \(F_0\) is given from Eqs. (21) and (22) as \(F_0 = I_0\ddot{q}_0 + \hat{P}_0 F_1\), and therefore \(\ddot{q}_0 = I_0^{-1}(F_0 - \hat{P}_0 F_1)\). Equation (21) becomes

\[ \mathcal{H}\ddot{Q} + \dot{V}_b = \mathcal{P}^T\dot{V}, \tag{28} \]

where the free-floating contribution to the overall kinematics of the system is given by

\[ \dot{V}_b = [\,0 \;\; 0 \;\; \cdots \;\; 0 \;\; \hat{P}_0^T\ddot{q}_0\,]^T. \tag{29} \]

This leads to

\[ BF_T + A'F_S = \mathcal{W}^T\dot{V}'_b \tag{30} \]

and

\[ CF_T + B^TF_S = \ddot{Q} + \mathcal{H}^T\dot{V}_b. \tag{31} \]

Consequently, it also leads to a new expression for \(\ddot{Q}\),

\[ \ddot{Q} = CF_T + B^T[\,A'^{-1}(\mathcal{W}^T\dot{V}'_b - BF_T)\,] - \mathcal{H}^T\dot{V}_b, \tag{32} \]

where \(A' = A + \mathrm{diag}(\dot{V}''_b)\), with

\[ \dot{V}''_b = [\,0 \;\; 0 \;\; \cdots \;\; 0 \;\; W_1^T\hat{P}_0^T I_0^{-1}\hat{P}_0 W_1\,]^T \]

and

\[ \dot{V}'_b = [\,0 \;\; 0 \;\; \cdots \;\; 0 \;\; \hat{P}_0^T I_0^{-1}F_0 - \hat{P}_0^T I_0^{-1}\hat{P}_0 H_1 F_{T_1}\,]^T. \]
4.3. Parallel Computations

The following macro-algorithm (Algorithm 2) condenses the overall parallel computations involved in solving the entire forward dynamics problem. Each step is listed with its computational complexity:

1. \(F_T = F_a - \xi\): O(log2 n) on O(n) processors (from Algorithm 1).
2. \(\ddot{Q}_1 = CF_T - \mathcal{H}^T\dot{V}_b\): O(1) on O(n) processors.
3. \(S = \mathcal{W}^T\dot{V}'_b - BF_T\): O(1) on O(n) processors.
4. \(F_S = A'^{-1}S\): O(log2 n) on O(n) processors.
5. \(\ddot{Q}_2 = B^TF_S\): O(1) on O(n) processors.
6. \(\ddot{Q} = \ddot{Q}_1 + \ddot{Q}_2\): O(1) on O(n) processors.
7. Integrate to \(\dot{Q}\), \(Q\): O(1) on O(n) processors (the exact cost depends on the integration algorithm used).

ALGORITHM 2. Macro parallel free-floating-base CFA.
Step 4 in Algorithm 2 is computed using a block variant of the cyclic reduction factorization scheme [12]. The stability of the intended solution to the forward dynamics problem is guaranteed by the block-tridiagonal symmetric positive definite structure of the multibody mass operator and by the stability of this factorization scheme [23, 24] used to invert it.
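For intuition, here is a compact serial sketch (ours; scalar coefficients rather than the DOC×DOC blocks of the actual scheme, and not the paper's novel data distribution) of cyclic reduction on a tridiagonal SPD system T x = d: each level eliminates every other remaining unknown, so a system of size n = 2^k − 1 is reduced in log2 n levels and solved by back substitution at halving strides.

/* Serial sketch of cyclic reduction for a tridiagonal SPD system
 * (scalar stand-in for the block scheme of step 4; ours, not the
 * paper's code). Assumes N = 2^k - 1. a: sub-, b: main, c: super. */
#include <stdio.h>

#define N 7   /* 2^3 - 1 */

int main(void) {
    double a[N], b[N], c[N], d[N], x[N];
    for (int i = 0; i < N; i++) {   /* SPD model problem: -1 2 -1 */
        a[i] = -1; b[i] = 2; c[i] = -1; d[i] = 1;
    }
    a[0] = 0; c[N-1] = 0;

    /* forward reduction: after level s, only stride-2s couplings remain */
    for (int s = 1; s < N; s *= 2) {
        for (int i = 2*s - 1; i < N; i += 2*s) {
            double al = (i - s >= 0) ? -a[i] / b[i - s] : 0.0;
            double be = (i + s <  N) ? -c[i] / b[i + s] : 0.0;
            if (i - s >= 0) { d[i] += al * d[i - s]; a[i] = al * a[i - s]; }
            if (i + s <  N) { d[i] += be * d[i + s]; c[i] = be * c[i + s]; }
            b[i] += (i - s >= 0 ? al * c[i - s] : 0.0)
                  + (i + s <  N ? be * a[i + s] : 0.0);
        }
    }
    /* back substitution at halving strides */
    for (int s = (N + 1) / 2; s >= 1; s /= 2) {
        for (int i = s - 1; i < N; i += 2*s) {
            double t = d[i];
            if (i - s >= 0) t -= a[i] * x[i - s];
            if (i + s <  N) t -= c[i] * x[i + s];
            x[i] = t / b[i];
        }
    }
    for (int i = 0; i < N; i++) printf("x[%d] = %g\n", i, x[i]);
    return 0;
}

In the forward dynamics algorithm the scalars a, b, c become the blocks of the SPD operator A', and the mutually independent updates within each level are what map onto O(n) processors in O(log2 n) time.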
4.4. Closed Chains and Tree Topologies

A tree topology is viewed as a set of serial chains (referred to as branches) coupled together via hinges at their respective base bodies. With 1...b branches in the system, the stacked vectors of spatial velocities, spatial accelerations, spatial forces, and spatial inertias are defined so as to include terms for each of the b branches (see Fig. 4). Assuming an n-body system,

\[ n = \sum_{j=1}^{b} n_j, \]
\[ \mathcal{I}' = \mathrm{diag}\{I_i\}, \quad V' = [\,V_1 \; V_2 \; \cdots \; V_b\,]^T, \quad \dot{V}' = [\,\dot{V}_1 \; \dot{V}_2 \; \cdots \; \dot{V}_b\,]^T, \quad F' = [\,F_1 \; F_2 \; \cdots \; F_b\,]^T. \tag{33} \]
In order to identify the branch connections, an index i = 1, ..., l is defined that identifies the bodies within each branch, and a block connection matrix, r, is also provided; non-zero entries define a connection between specific bodies. The EOM can then be set up in a similar manner to Eqs. (21) and (22), but using the higher-level operators just described:

\[ \mathcal{H}'\ddot{Q} = r^T\dot{V}', \qquad rF' = \mathcal{I}'\dot{V}'. \tag{34} \]
The recursive application of the CFA can be used for every branch, with the new inter-body force decomposition strategy, by simply having a single DOC for the lower attaching body of each sub-chain.
FIG. 4. Branched and closed-chain topologies based on serial chains.
Once more, this represents an alternative lower boundary condition on each chain, given by the induced spatial acceleration of the lower sub-chain. The parallelization of this scheme can be subdivided into two tasks: computations within a branch, and computations between tree-branch members. For example, consider a branched topology with n bodies, each having one or more branches composed of l bodies (thus forming a system with a total of nl bodies); the achieved computational complexity is O(log2 n + l) with O(n) processors (P).

For the closed-chain case, suppose the closed chain is cut at any given point and a base is defined at yet another point. This results in two serial chains with, say, \(n_1\) and \(n_2\) links. Causal/anticausal [20] directions are assigned to each arm, and since the tips of both are constrained, the following boundary conditions hold:

\[ f_{1,n_1} = f_{2,n_2} = f \tag{35} \]

and

\[ \dot{V}_{1,n_1} = \dot{V}_{2,n_2}. \tag{36} \]

It then follows that

\[ M_i\ddot{Q}_i + \xi_i = T_i + J_i^T f, \tag{37} \]

where \(\xi_i\) (i = 1, 2) corresponds to the respective non-linear velocity-dependent components and \(J_i\) is the Jacobian operator relating spatial velocities to effective hinge/joint velocities. From the above equations, the joint accelerations are given by

\[ \ddot{Q}_i = M_i^{-1}(T_i - \xi_i) + M_i^{-1}[\,J_i^T f\,] = \ddot{Q}_{if} + \Delta\ddot{Q}_i, \tag{38} \]

where the free joint accelerations (those that would result if the tip were unconstrained) are

\[ \ddot{Q}_{if} = M_i^{-1}(T_i - \xi_i) \tag{39} \]

and

\[ \Delta\ddot{Q}_i = M_i^{-1}J_i^T f. \tag{40} \]

This last term corresponds to the correction joint acceleration of each branch resulting from the tip constraint force, f. \(\ddot{Q}_{1f}\) and \(\ddot{Q}_{2f}\) are computed using the recursive CFA
algorithm, and \(\Delta\ddot{Q}_1\), \(\Delta\ddot{Q}_2\) are computed after finding f, as described next. Given the spatial velocity of chain 1 as \(V_{1,n_1} = J_1\dot{Q}_1\), its corresponding spatial acceleration is deduced by differentiating with respect to time,

\[ \dot{V}_{i,n_i} = J_i\ddot{Q}_i + \dot{J}_i\dot{Q}_i. \tag{41} \]

Substituting Eq. (38) into this last equation yields the expression for the tip spatial accelerations,

\[ \dot{V}_{i,n_i} = J_i[\,\ddot{Q}_{if} + \Delta\ddot{Q}_i\,] + \dot{J}_i\dot{Q}_i = \dot{V}_{if} + J_i\Delta\ddot{Q}_i, \tag{42} \]

where \(\dot{V}_{if} = J_i\ddot{Q}_{if} + \dot{J}_i\dot{Q}_i\) (i = 1, 2). Using Khatib's [14] definition of operational spaces, the inverse of the operational-space inertia matrix is given by

\[ \Lambda_i^{-1} = J_i M_i^{-1} J_i^T, \tag{43} \]

so that, from Eq. (40),

\[ J_i\Delta\ddot{Q}_i = \Lambda_i^{-1} f. \tag{44} \]

Then, Eq. (42) becomes

\[ \dot{V}_{i,n_i} = \dot{V}_{if} + \Lambda_i^{-1} f. \tag{45} \]

Given the boundary condition in (36), it follows from (45) that f is given by

\[ f = (\Lambda_1^{-1} - \Lambda_2^{-1})^{-1}\,[\,\dot{V}_{2f} - \dot{V}_{1f}\,]. \tag{46} \]
5. RESULTS

In this section, we analyze the performance of the algorithms presented in this paper in terms of their implemented solutions. Several parallel computing architectures were used, among them a Cray T3E-300 with 128 processors, an SGI Origin2000 with 16 processors, an Intel Paragon with 256 processors, and a quad-SMP PC running Red Hat Linux 4.2. The implementation was coded in ANSI C using MPICH [http://www-unix.mcs.anl.gov/mpi/mpich] v1.0 libraries (except on the Paragon, which used the NX message-passing libraries), optimized accordingly for the specific architectures (though the use of shared-memory channels in the SMP nodes of the SGI Origin2000 and the PC was not possible due to MPICH version limitations). Nonetheless, this MPI (Message Passing Interface) implementation proved highly portable between platforms. Profiling was managed with Totalview (Cray) and Upshot (others). Due to the differences in computing architectures (some of the systems were tightly coupled shared-memory machines, like the SMP nodes in the SGI or the PC box, and others distributed, like the T3E or the Paragon), performance characteristics varied significantly. Some of the differences will be duly noted in this section, primarily between the T3E and the Origin2000. A custom load-balancing scheme was implemented in order to match the algorithmic structures to the multiprocessing architectures employed. The following experiments were conducted using fixed- and floating-base serial multibody chains to find performance indicators such as speedup (fastest serial Newton–Euler dynamics compared to the parallel implementation), cost of major
functional stages in each algorithm, R/C ratio (R = run time vs. C = overhead costs, including communication and synchronization costs), and load distribution:

1. n = P systems (n = number of bodies, P = number of processors).
2. n > P systems (with communication packing where possible).

Asynchronous communication primitives (receive) in MPI were used in order to maximize the overlap of communication with execution. The results presented below (Cray T3E-300) are measured according to the following functional stages:

1. Pre-processing (per-body parameters).
2. NE [1]: calculation of spatial velocities in inverse dynamics.
3. NE [2]: calculation of spatial accelerations in inverse dynamics.
4. NE [3]: calculation of spatial forces and effective forces in inverse dynamics.
5. CFA [1]: calculation of spatial active forces in forward dynamics.
6. CFA [2]: calculation of spatial constraint forces in forward dynamics.
7. CFA [3]: calculation of effective accelerations in forward dynamics.
From Fig. 5 it is clear that, for a low number of bodies, the computation of spatial velocities has the highest cost, due to the multibody parameters that must be computed and sent, at each of the log2 n cycles, to the neighboring powers-of-two processors (see Algorithm 1). The forward dynamics problem includes the computation of the inverse dynamics solution, which makes it convenient to analyze the granularity differences between the two, evident from the results in Fig. 5. As the number of bodies increases, the computational cost shifts to the cyclic reduction factorization scheme used in the inversion of the articulated-body inertia operator, of computational complexity O(log2 n). This stage is the lower-bound limit for computing the forward dynamics, as depicted by Algorithm 2. It should be mentioned that the computation of the inverse/forward dynamics for free-floating multibodies behaves in an almost identical manner for a high number of bodies, given that the difference in computation is limited to the extra single-body-dependent terms in the expressions found in steps 2 and 3 of Algorithm 2. It should also be noted that none of the results come from exclusive access to the computing hardware platforms; hence, predicting or comparing these with the
FIG. 5. Computation times per stage in the forward dynamics algorithm: (left) n = P = 16; (right) n = P = 128 (platform: Cray T3E-300).
FIG. 6. Schematic comparison between Amdahl's law for various degrees of parallelism (0.7–0.98) and the complete parallel dynamics solution (n = P) (platform: Cray T3E-300).
theoretical results is not possible. For the same reason, the results presented are taken as the statistical average of several runs.

From Fig. 6 it is clear that the speedup increases sharply, on the particular T3E-300 used, after n = P = 16 (directly proportional to the synchronization costs in the parallel implementation). Nonetheless, a degree of parallelism better than 0.97 is only achieved after approximately n = P = 45. The decrease in performance near the largest number of processors is due to communication overheads caused by contention on common interconnect routes. As noted previously, the lower-bound limit on performance is given by the calculation of A in (26), and of A' in step 4 of Algorithm 2 for the floating-base case. In both cases the resulting matrix is SPD. We have developed a variant of the common parallel cyclic reduction (PCR) method, based on a novel data distribution scheme [12], that resulted in a significant relative performance increase (see Fig. 7).

5.1. Execution/Communication Structure

The O(log2 n) communication cycles can be clearly seen in Fig. 8. A barrier was included purely for visual purposes, in order to separate the computation of the inverse dynamics algorithm (initial) from the forward dynamics algorithm (final).
FIG. 7. Modified cyclic reduction scheme parallel computation performance: (left) with respect to the best serial implementation; (right) with respect to the conventional PCR method (platform: Intel Paragon).
Two important results are also evident: (1) the smaller granularity of the inverse dynamics solution, and (2) the higher communication costs in forward dynamics, but also its higher communication/run-time overlap (lower total wait time per processor, for this particular example with a low number of bodies).

5.2. n > P Systems

Typically, in large multibody systems (e.g., molecular systems), the number of bodies is much larger than the number of available computing elements. In order to test the behavior of the algorithms under varying granularity (which increases as n becomes larger than P, given the reduced interprocessor communication requirements), we experimented with varying ratios of n/P > 1. The results are shown in Fig. 9.
FIG. 8. Execution/communication structure for n = P = 4 (platform: SGI Origin2000, Upshot).
FIG. 9. Speedups for varying P and n/P (platform: Cray T3E-300).
From the results shown in Fig. 9, it becomes clear that with increasing ratios of n/P (increased granularity) the speedup increases slightly (a vertical shift in the speedup curve), but at the same time the degree of parallelism is reduced. This is due to the increase in communication/run-time overlap and the augmented possibility of performing data packing, hence reducing communication latency costs.

6. CONCLUSIONS

This paper presents the theoretical foundations of a novel algorithm for the solution of the EOM of a large-scale system within the time lower bound for the problem. The solutions include a strictly parallel algorithm for the inverse dynamics problem, which can be used for robotics control (by simply adding friction and gear-reduction terms without affecting the strictly parallel nature of the LIR). All the algorithms have been implemented using the MPI model and tested on several MIMD architectures for large fixed and free-floating multibody structures. The results obtained demonstrate the applicability of these algorithms to the control and simulation of large-scale (> 6 DOF) fixed/free-floating robotic manipulators (e.g., for space applications), and to other important technologies such as large-scale, long-term MD simulations and computational molecular nanotechnology (CMN) development [3, 8, 9]. The preliminary results show very good scalability for both n = P and n > P large multibody dynamics computations. On the other hand, the computational overhead introduced by the proposed parallel solutions limits their applicability to relatively large multibodies (n > 16). For n < 16 multibodies, the fastest serial O(n) algorithms can perform better on conventional MIMD architectures.

ACKNOWLEDGMENTS

Part of this research was performed in collaboration with Dr. Amir Fijany at the Jet Propulsion Laboratory (JPL-NASA) and Drs. Tahir Cagin and William A. Goddard III at the Materials and Process
Simulation Center (MSC), California Institute of Technology (Caltech), under support from the Pontificia Universidad Javeriana (Cali, Colombia), the National Aeronautics and Space Administration (U.S.), Caltech (U.S.), and the Universidad Politécnica de Valencia (Spain).
REFERENCES

1. W. Armstrong, Recursive solution to the equations of motion of an N-link manipulator, in "Proceedings of the 5th World Congress on Theory of Machines and Mechanisms," Vol. 2, American Society of Mechanical Engineers, pp. 1343–1346, 1979.
2. H. Brandl, R. Johanni, and M. Otter, A very efficient algorithm for the simulation of robots and similar multibody systems without inversion of the mass matrix, in "Proceedings of the 1st IFAC/IFIP/IMACS Symposium," Vienna, Austria, pp. 95–100, 1986.
3. T. Cagin, A. Jaramillo-Botero, G. Gao, and W. A. Goddard III, Molecular mechanics and molecular dynamics analysis of Drexler–Merkle gears and neon pump, presented at the Fifth Foresight Conference on Molecular Nanotechnology, Palo Alto, CA, 1997.
4. R. Featherstone, The calculation of robot dynamics using articulated body inertias, Int. J. Robot. Res. 2(1) (1983), 14–30.
5. S. Y. Nof, "Handbook of Industrial Robotics," 2nd ed., John Wiley & Sons, New York, 1999.
6. A. A. Shabana, "Computational Dynamics," John Wiley & Sons, New York, 1994.
7. A. Fijany and A. Bejczy, Parallel computation of manipulator inverse dynamics, J. Robot. Syst. 8(5) (1991), 599–635.
8. A. Fijany, A. Jaramillo-Botero, T. Cagin, and W. A. Goddard III, A fast algorithm for massively parallel, long-term simulation of complex molecular dynamics systems, presented at PARCO'97, Bonn, Germany, September 1997.
9. A. Fijany, A. Jaramillo-Botero, T. Cagin, and W. A. Goddard III, A massively parallel algorithm for solution of constrained equations of motion in molecular dynamics, presented at the American Physical Society Meeting, Kansas City, MO, March 1997.
10. A. Fijany, I. Sharf, and G. d'Eleuterio, Parallel O(log N) algorithms for computation of manipulator forward dynamics, IEEE Trans. Robot. Automat. 11(3) (1995), 389–400.
11. L. Hyafil and H. T. Kung, The complexity of parallel evaluation of linear recurrences, J. ACM 24(3) (1977), 513–521.
12. A. Jaramillo-Botero, "Massively Parallel Algorithms for Long-Term, Large-Scale Molecular Dynamics Simulation of Complex Structures," Ph.D. thesis, Universidad Politécnica de Valencia, 1998.
13. A. Jaramillo-Botero, A. Fijany, T. Cagin, and W. A. Goddard III, "Parallel Rigid Multibody Molecular Dynamics: Serial Chains, Closed Chains, and Hyperbranched Systems," MSC-Caltech Rep. No. 2, Group Meeting July 22, 1997.
14. O. Khatib, The operational space formulation in the analysis, design, and control of manipulators, presented at the 3rd International Symposium on Robotics Research, 1985.
15. P. M. Kogge and H. S. Stone, A parallel algorithm for the efficient solution of a general class of recurrence equations, IEEE Trans. Comput. C-22 (1973), 789–793.
16. C. S. G. Lee and P. R. Chang, Efficient parallel algorithms for robot inverse dynamics computation, IEEE Trans. Syst. Man Cybernet. 16(4) (1986), 532–542.
17. C. S. G. Lee and P. R. Chang, Efficient parallel algorithms for robot forward dynamics computation, IEEE Trans. Syst. Man Cybernet. 18(2) (1988), 238–251.
18. G. Lipmen, "Implementation of Parallel O(log N) Multibody Algorithms on Hypercube Architectures," Master's thesis, University of California, Irvine, 1994.
19. G. Rodriguez, A. Jain, and K. Kreutz-Delgado, Spatial operator algebra for manipulator modeling and control, Int. J. Robot. Res. 10(4) (1991), 371–381.
20. G. Rodriguez, A. Jain, and K. Kreutz-Delgado, Spatial operator algebra for multibody systems dynamics, J. Astronaut. Sci. 40(1) (1992), 27–50.
21. I. Sharf and G. d'Eleuterio, An iterative approach to multibody simulation dynamics suitable for parallel implementation, J. Dyn. Syst. Meas. Control 115 (1993), 730–735.
22. A. F. Vereshchagin, Computer simulation of the dynamics of complicated mechanisms of robot manipulators, Eng. Cybern. 6 (1974), 65–70.
23. P. Amodio and F. Mazzia, Backward error analysis of the cyclic reduction for the solution of tridiagonal systems, Math. Comput. 62(206) (1994), 601–617.
24. P. Yalamov and V. Pavlov, Stability of the block cyclic reduction, Linear Algebra Appl. 249 (1996), 341–358.
ANDRÉS JARAMILLO-BOTERO received his B.Sc. in electrical engineering from Boston University, Massachusetts, his M.Sc. in computer science from the State University of New York (SUNY-Binghamton), Binghamton, and his doctorate in industrial engineering from the Polytechnic University of Valencia, Spain. He is professor and dean of the engineering faculty at the Pontificia Universidad Javeriana in Cali, Colombia. He has previously held the positions of computer engineering director and electronics engineering director at the same institution. His research interests include robot perception and control, real-time dynamic control, and high-performance computer architectures for control. He has authored and co-authored over 30 technical papers on these topics. He has held visiting positions with the Materials and Process Simulation Center (MSC) at Caltech and with the Ultracomputing Group at the Jet Propulsion Laboratory (JPL-NASA) in Pasadena, California. He is a Fulbright scholar and has been the recipient of a JITA (Japan Industrial Technology Association) fellowship, during which he worked at the Robotics Department of the Mechanical Engineering Laboratory (MEL), AIST (Agency of Industrial Science and Technology), in Tsukuba, Japan.

ALFONS CRESPO I. LORENTE received his B.S. and Ph.D. in electrical engineering from the Polytechnic University of Valencia, Spain, in 1979 and 1984, respectively. He is a professor and chair of the Department of Computer Engineering and Science at the Polytechnic University of Valencia. Since 1988, he has led the Real-Time research group, heading several national and European research projects. His areas of technical interest are real-time systems, integration of intelligent components in real-time systems, and real-time operating systems. He is a member of the IEEE Computer Society. He has authored and co-authored over 15 journal papers and numerous technical articles. He has published five books on the subject of real-time programming and control.