Sequential Linear Quadratic Optimal Control for Nonlinear Switched Systems


Proceedings of the 20th World Congress
The International Federation of Automatic Control
Toulouse, France, July 9-14, 2017
Available online at www.sciencedirect.com

IFAC PapersOnLine 50-1 (2017) 1463–1469


Farbod Farshidian ∗, Maryam Kamgarpour ∗∗, Diego Pardo ∗, Jonas Buchli ∗

∗ Agile and Dexterous Robotics Lab, ETH Zürich, Switzerland (e-mail: {farbodf, depardo, buchlij}@ethz.ch)
∗∗ Automatic Control Laboratory, ETH Zürich, Switzerland (e-mail: [email protected])

Abstract: In this contribution, we introduce an efficient method for solving the optimal control problem for an unconstrained nonlinear switched system with an arbitrary cost function. We assume that the sequence of the switching modes is given but the switching times between consecutive modes remain to be optimized. The proposed method uses a two-stage approach as introduced by Xu and Antsaklis (2004), where the original optimal control problem is transcribed into an equivalent problem parametrized by the switching times and the optimal control policy is obtained based on the solution of a two-point boundary value differential equation.
The main contribution of this paper is to use a Sequential Linear Quadratic approach to synthesize the optimal controller instead of solving a boundary value problem. The proposed method is numerically more efficient and scales very well to high dimensional problems. In order to evaluate its performance, we use two numerical examples as benchmarks to compare against the baseline algorithm. In the third numerical example, we apply the proposed algorithm to the Center of Mass control problem in a quadruped robot locomotion task.

© 2017, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.

Keywords: Control design for hybrid systems, Switching stability and control, Optimal control of hybrid systems, Optimal control theory, Real-time control, Riccati equations, and Mobile robots.

This research has been supported in part by a Max-Planck ETH Center for Learning Systems Ph.D. fellowship to Farbod Farshidian, a Swiss National Science Foundation Professorship Award to Jonas Buchli, the NCCR Robotics, and Maryam Kamgarpour's ERC Starting Grant CONENE.

1. INTRODUCTION

Switched systems are a subclass of a general family known as hybrid systems. A hybrid system model consists of a finite number of dynamical subsystems subjected to discrete events which cause transitions between these subsystems. This transition is either triggered by an external input, or through the intersection of the continuous state trajectory with certain manifolds known as the switching surfaces. Switched systems are usually characterized by systems that have continuous state transitions during these switches. Switched system models are encountered in many practical applications such as automobiles and locomotives with different gears, DC-DC converters, manufacturing processes, biological systems, and robotics.

Our interest in switched systems originates from an application on a legged robot where we model the Center of Mass (CoM) as a switched system. The control goal is to synthesize a controller which, for a given gait, stabilizes the robot while it minimizes a cost function. To fulfill this task, the robot can manipulate the ground reaction forces at the stance feet and adjust the switching times between different stance leg configurations. For instance, consider the problem of controlling the walking gait of a quadruped robot. In this task, the gait is fixed, thus the sequence of mode switches is known. The control task is to modulate the contact forces of the stance legs and to determine the switching times between each mode.

The optimal control problem for switched systems involves synthesizing the optimal controller for the continuous inputs and finding a mode sequence and the switching times between the modes. In general, the procedure of synthesizing an optimal control law for a switched system can be divided into three subtasks (Giua et al., 2001; Xu and Antsaklis, 2004): (1) finding the optimal sequence of the modes, (2) finding the optimal switching times between consecutive modes, and (3) finding the optimal continuous control inputs. Given the switching sequence and times, the third subtask is a regular optimal control problem with a finite number of discontinuities in the system vector field. The necessary condition of optimality in the context of hybrid systems has been derived from Pontryagin's maximum principle (Branicky et al., 1998; Sussmann, 1999; Riedinger et al., 2003) and subsequently, various computational techniques have been developed to solve this problem (Shaikh and Caines, 2007; Soler et al., 2012; Pakniyat and Caines, 2014).

Based on Pontryagin's maximum principle, the optimal solution should satisfy a two-point boundary value problem (BVP). However, similar to the classical control problem, the difficulties related to the numerical solution of the necessary condition of optimality limit the application of this approach. In Riedinger et al. (1999), it has been shown that for a Linear-Quadratic (LQ) problem it is sufficient to solve a sequence of Riccati equations with proper transversality conditions at the switching times in order to optimize the continuous inputs, but the mode switches should still be calculated based on the enumeration of all the possible switches at each time step. In order to ease the computational burden of finding the optimal switching behavior, in Bengea and DeCarlo (2005), the switched



system is embedded in a larger family of systems defined by the convex hull of the switched subsystems. It has been shown that if the sufficient and necessary conditions for optimality in the embedded system exist, the bang-bang optimal solution of the embedded problem is also the optimal solution of the switched system; otherwise a sub-optimal solution can be derived. Borrelli et al. (2005) propose an off-line method to synthesize an optimal control law for a discrete linear hybrid system with linear inequality constraints. The proposed method is a combination of dynamic programming and multi-parametric quadratic programming which designs a feedback law for continuous and discrete inputs in the feasible regions. A simpler approach in Bemporad and Morari (1999) uses a mixed-integer linear/quadratic program to solve the optimal control problem for mixed logical dynamical systems.

Optimizing the cost function with respect to the switching times has been studied for autonomous systems by Egerstedt et al. (2003); Johnson and Murphey (2011); Wardi et al. (2012) and for non-autonomous systems by Kamgarpour and Tomlin (2012). By using the derivative of the cost function with respect to the switching times, these methods use nonlinear programming techniques to optimize the cost function. However, in general these methods do not consider the sensitivity of the continuous inputs' control law to the switching times.

While many of the aforementioned approaches are computationally demanding for real-time robotic applications, there is a class of efficient optimal control algorithms known as Sequential Linear Quadratic (SLQ) methods which can be applied to real-time, complex robotic applications (Neunert et al., 2016). An SLQ algorithm sequentially solves the extremal problem around the latest estimation of the optimal trajectories and improves these optimal trajectories using the extremal problem solution (Mayne, 1966; Todorov and Li, 2005; Sideris and Bobrow, 2005).

Motivated by their efficiency in solving regular optimal control problems, in this paper we have extended an SLQ algorithm to solve the optimal control problem for nonlinear switched systems with a predefined mode sequence. To this end, we adopt an approach introduced by Xu and Antsaklis (2004) where the primary switched problem is transcribed into an equivalent problem. We then introduce a two-stage optimization method to optimize the continuous inputs and the switching times. While Xu and Antsaklis (2004) use a computationally demanding approach based on solving a set of two-point BVPs, we propose a new SLQ algorithm to efficiently solve the optimal control problem for nonlinear switched systems.

The main contributions of this paper are: (1) it uses an efficient SLQ algorithm to synthesize the optimal control law for the continuous inputs; (2) it calculates the cost function derivative with respect to the switching times using an LQ approximation of the problem, which is obtained without any additional computation from the SLQ solution; (3) it introduces a new practical application of optimal control for switched systems in the field of motion planning of legged robots.

2. PROBLEM FORMULATION

In this section, we briefly introduce the optimal control problem based on the parameterization of switching times. We assume that the switched system dynamics consist of I subsystems, where the system dynamics of the i-th subsystem (i ∈ {1, 2, . . . , I}) are as follows

$$\dot{x}(t) = f_i\big(x(t), u(t)\big) \qquad \text{for } t_{i-1} \le t < t_i, \qquad (1)$$

where x(t) ∈ R^{n_x} is the continuous state, u(t) ∈ R^{n_u} is the piecewise continuous control input, and f_i : R^{n_x} × R^{n_u} → R^{n_x} is the vector field of subsystem i, which is continuously differentiable and Lipschitz up to the first order derivatives. t_i is the switching time between subsystems i and i + 1. t_0 and t_I are respectively the given initial time and the final time. The initial state is x_0, and x(t_i^-) = x(t_i^+) at the switching moments because of the state continuity condition. The optimal control problem for the switched system in Equation (1) is defined as

$$\min_{t \in T,\; u(\cdot)} \; \Phi\big(x(t_I)\big) + \sum_{i=1}^{I} \int_{t_{i-1}}^{t_i} L_i(x, u)\, dt, \qquad (2)$$

where Φ(·) and L_i(·, ·) are the final cost and the running cost (for subsystem i), which are continuously differentiable and Lipschitz up to the second order derivatives. T is a polytope in R^{I-1} defined as T = {(t_1, . . . , t_{I-1}) | t_0 ≤ t_1 ≤ · · · ≤ t_{I-1} ≤ t_I}. The optimal control problem based on the parameterization of switching times can be defined as the following two-stage optimization problem

$$\min_{t \in T} \; J[t, x^*, u^*] \qquad \text{s.t.} \quad \{x^*, u^*\} = \arg\min\; J[t, x, u], \qquad (3)$$

with

$$J[t, x, u] = \Phi(x_I) + \sum_{i=1}^{I} (t_i - t_{i-1}) \int_{i-1}^{i} L_i\big(x(z), u(z)\big)\, dz, \qquad (4)$$

$$\frac{dx(z)}{dz} = (t_i - t_{i-1})\, f_i\big(x(z), u(z)\big) \qquad \text{for } i-1 \le z < i, \qquad (5)$$
$$t = (t_i - t_{i-1})(z - i) + t_i, \qquad (6)$$

in which we have replaced the independent time variable t with a normalized time variable z defined by (6). With this change of variable, while the switching times are still part of the decision variables, they are fixed parameters for the bottom-level optimization in (3). Therefore, the reformulated optimal control problem does not have variable switching times and it reduces to a conventional optimal control problem parameterized over the switching times (Theorem 1 in Xu and Antsaklis (2004)). In the next section, we introduce our algorithm for calculating the optimized cost function and its gradient.

3. SOLUTION APPROACH

In this section, we use a gradient-based method to solve the top-level optimization introduced in (3) and a dynamic programming approach to synthesize the continuous inputs' control law. In each iteration of the gradient-based method for finding the optimal switching times, we first solve a continuous-time optimal control problem for the system with fixed switching times. Then, the cost function gradient with respect to the switching times is calculated in order to determine the descent direction for the switching times update. This approach is similar to Xu and Antsaklis (2004). However, instead of using a two-point BVP solver for optimizing J(t) with respect to u and calculating its gradient with respect to t, we use a more efficient approach. This facilitates implementing this algorithm in real time on complex systems such as legged robots.

Our Optimal Control for Switched Systems algorithm (OCS2 algorithm) consists of two main steps: a method which synthesizes the continuous input controller and a method which calculates the derivatives of the parameterized cost function with respect to the switching times. For synthesizing the continuous input controller, OCS2 uses the SLQ algorithm in Algorithm 1. As a dynamic programming approach, SLQ uses the Bellman


equation of optimality to locally estimate the value function and, consequently, the optimal control law. In the second step of OCS2, we use the approximated problem from the first step, which was used for calculating the value function, to efficiently compute the value function gradient with respect to the switching times.
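Both steps operate on the time-normalized dynamics in (5)-(6). As a concrete illustration, the following sketch shows how a forward rollout of these dynamics can be implemented; this is the step that produces the nominal trajectories used by SLQ. It is a minimal illustration rather than the authors' implementation: the subsystem list, the explicit Euler integration, and the step count are assumptions made for brevity.

```python
import numpy as np

def rollout_normalized(subsystems, policy, x0, switch_times, steps_per_mode=100):
    """Forward-integrate the time-normalized dynamics (5) with explicit Euler.

    subsystems   : list of functions f_i(x, u) -> dx/dt, one per mode
    policy       : function u(z, x) giving the continuous input
    x0           : initial state
    switch_times : [t_0, t_1, ..., t_I] including the initial and final times
    Returns the nominal trajectories sampled on the normalized grid z in [0, I].
    """
    x = np.asarray(x0, dtype=float)
    zs, xs, us = [], [], []
    I = len(subsystems)
    dz = 1.0 / steps_per_mode
    for i in range(1, I + 1):                       # mode i is active for z in [i-1, i)
        duration = switch_times[i] - switch_times[i - 1]
        for k in range(steps_per_mode):
            z = (i - 1) + k * dz
            u = policy(z, x)
            zs.append(z); xs.append(x.copy()); us.append(np.atleast_1d(u).copy())
            # dx/dz = (t_i - t_{i-1}) * f_i(x, u), cf. equation (5)
            x = x + dz * duration * np.asarray(subsystems[i - 1](x, u))
    return np.array(zs), np.array(xs), np.array(us)
```

Because a whole mode always occupies a unit interval in z, changing a switching time only rescales the vector field inside that interval, which is what makes the switching times ordinary parameters of the bottom-level problem.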


Using the SLQ algorithm to calculate the optimal control law for the continuous inputs has two major advantages. First, as discussed by Sideris and Rodriguez (2010), this algorithm has a linear time complexity with respect to the optimization time horizon, in contrast to many standard discretization-based algorithms which scale cubically (such as the direct collocation methods). Second, since in each iteration of SLQ an LQ problem is optimized, we can efficiently calculate the value function derivatives by obtaining the derivatives of an LQ subproblem's value function. However, in contrast to a regular time-variant LQ problem in which the system dynamics and the cost function are independent of the switching times, in this problem the cost and the system dynamics of the approximated LQ subproblem are functions of the switching times. Therefore, in order to calculate the cost function gradient, we should also consider the variations of the LQ subproblem with respect to the switching times. We tackle this problem more rigorously in Theorem 2.

Here, we first briefly introduce the SLQ algorithm. We consider an intermediate iteration of the algorithm. We assume that {x̄(z)}_{z=0}^{I} and {ū(z)}_{z=0}^{I} are the nominal state and input trajectories which are obtained by forward integrating the system dynamics in (5) (performing a rollout) using the latest estimation of the optimal control law with a fixed switching time vector t̄. For simplicity of notation, in the following we drop the dependencies of the nominal trajectories, and consequently of the approximated LQ problem, on t̄. The linearized system dynamics of the equivalent system in (5) around these nominal trajectories are defined as

$$\frac{d(\delta x)}{dz} = (t_i - t_{i-1})\big(A_i(z)\,\delta x + B_i(z)\,\delta u\big), \qquad i-1 \le z < i,$$
$$A_i(z) = \frac{\partial f_i\big(\bar{x}(z), \bar{u}(z)\big)}{\partial x}, \qquad B_i(z) = \frac{\partial f_i\big(\bar{x}(z), \bar{u}(z)\big)}{\partial u}, \qquad (7)$$

where δx(z) = x(z) − x̄(z) and δu(z) = u(z) − ū(z). The quadratic approximation of the cost function in (4) is

$$\tilde{J} = \tilde{\Phi}_f(x) + \sum_{i=1}^{I} \int_{i-1}^{i} (t_i - t_{i-1})\, \tilde{L}_i(z, x, u)\, dz$$
$$\tilde{\Phi}_f(x) = q_f + \mathbf{q}_f^\top \delta x + \tfrac{1}{2}\, \delta x^\top Q_f\, \delta x$$
$$\tilde{L}_i(z, x, u) = q_i(z) + \delta x^\top \mathbf{q}_i(z) + \delta u^\top \mathbf{r}_i(z) + \delta x^\top P_i(z)\, \delta u + \tfrac{1}{2}\, \delta x^\top Q_i(z)\, \delta x + \tfrac{1}{2}\, \delta u^\top R_i(z)\, \delta u. \qquad (8)$$

In the above, q(z), q(z), r(z), Q(z), P(z), and R(z) are the coefficients of the Taylor expansion of the cost function in (4) evaluated at the nominal trajectories. The optimal control law for this LQ extremal subproblem can be derived by solving the following Riccati equations (Bryson, 1975)

$$-\frac{dS(z)}{dz} = (t_i - t_{i-1})\, W(z), \qquad S(i^-) = S(i^+), \quad S(I) = Q_f, \qquad (9)$$
$$-\frac{d\mathbf{s}(z)}{dz} = (t_i - t_{i-1})\, \mathbf{w}(z), \qquad \mathbf{s}(i^-) = \mathbf{s}(i^+), \quad \mathbf{s}(I) = \mathbf{q}_f, \qquad (10)$$
$$-\frac{ds(z)}{dz} = (t_i - t_{i-1})\, w(z), \qquad s(i^-) = s(i^+), \quad s(I) = q_f, \qquad (11)$$

where S(z) and W(z) are in R^{n_x × n_x}, s(z) and w(z) are in R^{n_x}, and s(z) and w(z) are in R. These matrices are defined as

$$W(z) = Q(z) + A(z)^\top S(z) + S(z) A(z) - L(z)^\top R(z)\, L(z) \qquad (12)$$
$$\mathbf{w}(z) = \mathbf{q}(z) + A(z)^\top \mathbf{s}(z) - L(z)^\top R(z)\, \mathbf{l}(z) \qquad (13)$$
$$w(z) = q(z) - 0.5\,\alpha(2 - \alpha)\, \mathbf{l}(z)^\top R(z)\, \mathbf{l}(z) \qquad (14)$$
$$\mathbf{l}(z) = -R(z)^{-1}\big(\mathbf{r}(z) + B(z)^\top \mathbf{s}(z)\big) \qquad (15)$$
$$L(z) = -R(z)^{-1}\big(P(z) + B(z)^\top S(z)\big). \qquad (16)$$

The updated optimal control law, the value function, and the total cost are defined as

$$u(z, x) = \bar{u}(z) + \alpha\, \mathbf{l}(z) + L(z)\, \delta x \qquad (17)$$
$$V(z, x) = s(z) + \delta x^\top \mathbf{s}(z) + \tfrac{1}{2}\, \delta x^\top S(z)\, \delta x \qquad (18)$$
$$\tilde{J} = V(0, x_0) = s(0), \qquad (19)$$

where z is defined in (6) and α ∈ [0, 1] is the learning rate for a backtracking line search (Armijo, 1966). In each iteration of SLQ, the line-search parameter controls the maximum step to move along the feedforward component of the control law update. This parameter is chosen by starting with a full step in the direction of the update (α = 1), and then iteratively shrinking the step size until the cost associated to the updated controller's rollout is lower than the cost of the current nominal trajectories.
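As an illustration of how equations (9)-(17) can be organized in code, the sketch below integrates the Riccati equations backward over the normalized time grid and assembles the feedforward and feedback terms of the control update. It is only a schematic rendering of the backward pass under simplifying assumptions (explicit Euler steps, a single cross-term convention, precomputed LQ coefficient arrays); it is not the authors' implementation.

```python
import numpy as np

def backward_pass(lq, durations, Qf, qf_vec, qf_scalar, alpha):
    """Backward sweep of the Riccati equations (9)-(11) on the normalized grid.

    lq        : dict of per-step arrays A, B, Q, P, R, q_vec, r_vec, q_scalar from (7)-(8)
    durations : array with (t_i - t_{i-1}) for the mode active at each grid point
    Returns the gain L, feedforward l, and the value-function terms S, s_vec, s_scalar.
    """
    N, nx = lq["A"].shape[0], Qf.shape[0]
    dz = 1.0 / N                      # assume, for brevity, N uniform steps over the whole horizon
    S, s_vec, s_scalar = Qf.copy(), qf_vec.copy(), qf_scalar
    nu = lq["R"].shape[1]
    L, l = np.zeros((N, nu, nx)), np.zeros((N, nu))
    for k in reversed(range(N)):
        A, B = lq["A"][k], lq["B"][k]
        Q, P, R = lq["Q"][k], lq["P"][k], lq["R"][k]
        Rinv = np.linalg.inv(R)
        l[k] = -Rinv @ (lq["r_vec"][k] + B.T @ s_vec)                     # eq. (15)
        L[k] = -Rinv @ (P + B.T @ S)                                      # eq. (16), P stored as (nu, nx)
        W = Q + A.T @ S + S @ A - L[k].T @ R @ L[k]                       # eq. (12)
        w_vec = lq["q_vec"][k] + A.T @ s_vec - L[k].T @ R @ l[k]          # eq. (13)
        w_scalar = lq["q_scalar"][k] - 0.5 * alpha * (2 - alpha) * l[k] @ R @ l[k]  # eq. (14)
        # explicit Euler step of -dS/dz = (t_i - t_{i-1}) W, etc., cf. (9)-(11)
        S = S + dz * durations[k] * W
        s_vec = s_vec + dz * durations[k] * w_vec
        s_scalar = s_scalar + dz * durations[k] * w_scalar
    return L, l, S, s_vec, s_scalar

# The updated policy of (17) is then u(z, x) = u_bar(z) + alpha * l(z) + L(z) @ (x - x_bar(z)).
```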

Before proceeding to the main theorem and its proof, we state the following lemma, which will be used in Theorem 2.

Lemma 1. Let H : R^n → R and y : R → R^n be twice continuously differentiable functions. If |τ − τ̄| < ε_1 and |δy| < ε_2, for small enough ε_1 and ε_2, H(y(τ) + δy) can be approximated as

$$H\big(y(\tau) + \delta y\big) \simeq H(\bar{y}) + \delta y^\top \nabla H(\bar{y}) + \partial\bar{y}^\top \nabla H(\bar{y})\,(\tau - \bar{\tau}) + \tfrac{1}{2}\,\delta y^\top \nabla^2 H(\bar{y})\,\delta y + \delta y^\top \nabla^2 H(\bar{y})\,\partial\bar{y}\,(\tau - \bar{\tau}) + \tfrac{1}{2}\left(\partial\bar{y}^\top \nabla^2 H(\bar{y})\,\partial\bar{y} + \partial^2\bar{y}^\top \nabla H(\bar{y})\right)(\tau - \bar{\tau})^2,$$

where ȳ = y(τ̄), ∂ȳ = dy/dτ |_{τ̄}, and ∂²ȳ = d²y/dτ² |_{τ̄}. Furthermore, ∇H and ∇²H are the gradient and the Hessian of H, respectively.

Proof. This lemma can be easily proven using the Taylor series expansion of H and a few steps of simplification.

Theorem 2. The partial derivative of the value function in (18) and the partial derivative of the total cost with respect to the switching time t_j can be derived as

$$\partial_{t_j} V(z, x) = \partial_{t_j} s(z) + \delta x^\top \partial_{t_j}\mathbf{s}(z) + \tfrac{1}{2}\,\delta x^\top \partial_{t_j}S(z)\,\delta x - \partial_{t_j}\bar{x}^\top \mathbf{s}(z) - \tfrac{1}{2}\,\partial_{t_j}\bar{x}^\top S(z)\,\delta x - \tfrac{1}{2}\,\delta x^\top S(z)\,\partial_{t_j}\bar{x} \qquad (20)$$
$$\partial_{t_j} \tilde{J} = \partial_{t_j} s(0), \qquad (21)$$

where z is defined by (6) and ∂_{t_j} is the partial derivative operator with respect to the switching time t_j. ∂_{t_j}x̄ and ∂_{t_j}ū are respectively the state and input sensitivities to the switching time t_j, which are calculated from the following differential equation

$$\frac{d(\partial_{t_j}\bar{x})}{dz} = (\delta_{i,j} - \delta_{i-1,j})\, f_i\big(\bar{x}(z), \bar{u}(z)\big) + (t_i - t_{i-1})\big(A_i(z)\,\partial_{t_j}\bar{x} + B_i(z)\,\partial_{t_j}\bar{u}\big) \qquad (22)$$
$$\partial_{t_j}\bar{u}(z, \partial_{t_j}\bar{x}) = -L(z)\,\partial_{t_j}\bar{x} + \partial_{t_j}L(z)\,\delta\bar{x} + \partial_{t_j}\mathbf{l}(z), \qquad (23)$$

with the initial condition ∂_{t_j}x̄(0) = 0 and the transversality condition ∂_{t_j}x̄(i^-) = ∂_{t_j}x̄(i^+). δ_{i,j} is the Kronecker delta, which is one if the indices are equal and zero otherwise. Furthermore, ∂_{t_j}S(z), ∂_{t_j}s(z), and ∂_{t_j}s(z) can be calculated from the following set of linear differential equations


$$-\frac{d(\partial_{t_j}S(z))}{dz} = (\delta_{i,j} - \delta_{i-1,j})\, W(z) + (t_i - t_{i-1})\, \partial_{t_j}W(z) \qquad (24)$$
$$-\frac{d(\partial_{t_j}\mathbf{s}(z))}{dz} = (\delta_{i,j} - \delta_{i-1,j})\, \mathbf{w}(z) + (t_i - t_{i-1})\, \partial_{t_j}\mathbf{w}(z) \qquad (25)$$
$$-\frac{d(\partial_{t_j}s(z))}{dz} = (\delta_{i,j} - \delta_{i-1,j})\, w(z) + (t_i - t_{i-1})\, \partial_{t_j}w(z), \qquad (26)$$

where we have

$$\partial_{t_j}W(z) = A(z)^\top \partial_{t_j}S(z) + \partial_{t_j}S(z)\, A(z) - \partial_{t_j}L(z)^\top R(z)\, L(z) - L(z)^\top R(z)\, \partial_{t_j}L(z)$$
$$\partial_{t_j}\mathbf{w}(z) = \partial_{t_j}\mathbf{q}(z) + A(z)^\top \partial_{t_j}\mathbf{s}(z) - \partial_{t_j}L(z)^\top R(z)\, \mathbf{l}(z) - L(z)^\top R(z)\, \partial_{t_j}\mathbf{l}(z)$$
$$\partial_{t_j}w(z) = \partial_{t_j}q(z) - 0.5\,\alpha(2 - \alpha)\left(\partial_{t_j}\mathbf{l}(z)^\top R(z)\, \mathbf{l}(z) + \mathbf{l}(z)^\top R(z)\, \partial_{t_j}\mathbf{l}(z)\right)$$
$$\partial_{t_j}\mathbf{l}(z) = -R(z)^{-1}\big(\partial_{t_j}\mathbf{r}(z) + B(z)^\top \partial_{t_j}\mathbf{s}(z)\big)$$
$$\partial_{t_j}L(z) = -R(z)^{-1} B(z)^\top \partial_{t_j}S(z).$$

The derivative of the cost coefficients with respect to t_j is

$$\begin{bmatrix} \partial_{t_j}\mathbf{q}(z) \\ \partial_{t_j}\mathbf{r}(z) \\ \partial_{t_j}q(z) \end{bmatrix} = \begin{bmatrix} Q(z) & P(z) \\ P(z)^\top & R(z) \\ \mathbf{q}(z)^\top & \mathbf{r}(z)^\top \end{bmatrix} \begin{bmatrix} \partial_{t_j}x(z) \\ \partial_{t_j}u(z) \end{bmatrix}. \qquad (27)$$

Finally, the terminal conditions are defined as

$$\partial_{t_j}S(I) = 0, \qquad \partial_{t_j}\mathbf{s}(I) = Q_f\, \partial_{t_j}x(I), \qquad \partial_{t_j}s(I) = \mathbf{q}_f^\top \partial_{t_j}x(I), \qquad (28)$$

and the transversality conditions are ∂_{t_j}S(i^-) = ∂_{t_j}S(i^+), ∂_{t_j}s(i^-) = ∂_{t_j}s(i^+), and ∂_{t_j}s(i^-) = ∂_{t_j}s(i^+).

Proof. Equations (24-26) can be derived by directly differentiating the Riccati equations in (9-11), where, based on the continuity condition, the order of differentiation with respect to time, z, and t_j has been changed. For calculating ∂_{t_j}W, ∂_{t_j}w, and ∂_{t_j}w, we need to calculate the derivative of the linearized dynamics and the quadratic approximation of the cost function with respect to the switching time t_j as well. To do so, we need to examine the impact of a variation of t_j on the system dynamics and cost function of the approximated LQ subproblem.

In each approximated LQ subproblem, we use a quadratic approximation of the cost function components L and Φ around the nominal trajectories x̄(z) and ū(z) of the equivalent system (5). We can readily derive the differential equation which determines the sensitivity of the state trajectory with respect to the switching time t_j by differentiating both sides of the equivalent system dynamics in (5) with respect to t_j,

$$\partial_{t_j}\frac{d\bar{x}}{dz} = (\delta_{i,j} - \delta_{i-1,j})\, f_i(\bar{x}, \bar{u}) + (t_i - t_{i-1})\big(A_i\, \partial_{t_j}\bar{x} + B_i\, \partial_{t_j}\bar{u}\big),$$

where for simplicity of notation we drop the dependencies on time, z. Using the continuity condition of the state trajectory, the order of the derivatives is exchanged, which results in (22). Furthermore, since the control input trajectory in SLQ comprises a time-varying feedforward term and a time-varying linear state feedback, its sensitivity can be calculated as (23). Moreover, for the initial condition of this equation we have ∂_{t_j}x̄(z = 0) = 0 due to the fixed initial state.

Based on Lemma 1, the second order approximation of the intermediate cost L(x, u) around the nominal trajectories (which are in turn a function of t_j) can be written as

$$L\big(z, \bar{x}(z) + \delta x, \bar{u}(z) + \delta u\big) \simeq q(z) + \begin{bmatrix} \mathbf{q}(z) \\ \mathbf{r}(z) \end{bmatrix}^\top \begin{bmatrix} \delta x \\ \delta u \end{bmatrix} + (t_j - \bar{t}_j) \begin{bmatrix} \mathbf{q}(z) \\ \mathbf{r}(z) \end{bmatrix}^\top \begin{bmatrix} \partial_{t_j}x(z) \\ \partial_{t_j}u(z) \end{bmatrix} + \tfrac{1}{2} \begin{bmatrix} \delta x \\ \delta u \end{bmatrix}^\top \begin{bmatrix} Q(z) & P(z) \\ P(z)^\top & R(z) \end{bmatrix} \begin{bmatrix} \delta x \\ \delta u \end{bmatrix} + (t_j - \bar{t}_j) \begin{bmatrix} \partial_{t_j}x(z) \\ \partial_{t_j}u(z) \end{bmatrix}^\top \begin{bmatrix} Q(z) & P(z) \\ P(z)^\top & R(z) \end{bmatrix} \begin{bmatrix} \delta x \\ \delta u \end{bmatrix} + \frac{(t_j - \bar{t}_j)^2}{2} \begin{bmatrix} \mathbf{q}(z) \\ \mathbf{r}(z) \end{bmatrix}^\top \begin{bmatrix} \partial^2_{t_j}x(z) \\ \partial^2_{t_j}u(z) \end{bmatrix} + \frac{(t_j - \bar{t}_j)^2}{2} \begin{bmatrix} \partial_{t_j}x(z) \\ \partial_{t_j}u(z) \end{bmatrix}^\top \begin{bmatrix} Q(z) & P(z) \\ P(z)^\top & R(z) \end{bmatrix} \begin{bmatrix} \partial_{t_j}x(z) \\ \partial_{t_j}u(z) \end{bmatrix}. \qquad (29)$$

In the above, we have combined the like terms of the state and input increments. Based on this equation, the sensitivities of the cost function coefficients are

$$\partial_{t_j}\mathbf{r}(z) = P(z)^\top \partial_{t_j}x(z) + R(z)\,\partial_{t_j}u(z), \qquad \partial_{t_j}R(z) = 0,$$
$$\partial_{t_j}\mathbf{q}(z) = Q(z)\,\partial_{t_j}x(z) + P(z)\,\partial_{t_j}u(z), \qquad \partial_{t_j}Q(z) = 0,$$
$$\partial_{t_j}q(z) = \mathbf{q}(z)^\top \partial_{t_j}x(z) + \mathbf{r}(z)^\top \partial_{t_j}u(z), \qquad \partial_{t_j}P(z) = 0,$$

which can be written in matrix form as (27). By the same process, we can derive (28) for the final cost. In order to find the sensitivity of the linearized system dynamics with respect to the switching time t_j, we can use the result from


Lemma 1, where we only keep the terms up to the first order:

$$\frac{d}{dz}\left[\bar{x} + \partial_{t_j}\bar{x}\,(t_j - \bar{t}_j) + \delta x\right] \simeq (\bar{t}_i - \bar{t}_{i-1})\left[f_i(\bar{x}, \bar{u}) + A_i\,\delta x + B_i\,\delta u\right] + (t_j - \bar{t}_j)\left[(\delta_{i,j} - \delta_{i-1,j})\, f_i(\bar{x}, \bar{u}) + (\bar{t}_i - \bar{t}_{i-1})\big(A_i\,\partial_{t_j}\bar{x} + B_i\,\partial_{t_j}\bar{u}\big)\right],$$

where for simplicity of notation we drop the dependencies on time, z. By equating the coefficients, we get

$$\frac{d\bar{x}}{dz} \simeq (\bar{t}_i - \bar{t}_{i-1})\, f_i(\bar{x}, \bar{u})$$
$$\frac{d(\delta x)}{dz} \simeq (\bar{t}_i - \bar{t}_{i-1})\big(A_i\,\delta x + B_i\,\delta u\big)$$
$$\frac{d(\partial_{t_j}\bar{x})}{dz} \simeq (\delta_{i,j} - \delta_{i-1,j})\, f_i(\bar{x}, \bar{u}) + (\bar{t}_i - \bar{t}_{i-1})\big(A_i\,\partial_{t_j}\bar{x} + B_i\,\partial_{t_j}\bar{u}\big),$$

which are respectively the nominal trajectory equation, the linear approximation of the system dynamics, and the trajectory sensitivity. Based on this approximation, the linear part is not a function of t_j, so we get ∂_{t_j}A_i(z) = 0 and ∂_{t_j}B_i(z) = 0 (note that the effect of the switching time variations on the approximated system dynamics would have appeared if we had used a second order or higher order approximation).

4. OCS2 ALGORITHM

In Section 3, we have discussed the technical details behind the OCS2 algorithm. Here, we highlight the main steps of the algorithm for synthesizing the optimal continuous control law and the optimal switching times (refer to Algorithm 2). Each iteration of the OCS2 algorithm has three main steps, namely (1) using the SLQ algorithm to find the continuous inputs' optimal control, (2) calculating the cost function gradient based on the LQ approximation of the problem which has already been calculated by the SLQ algorithm, and (3) using a gradient descent method to update the switching times, where we use the Frank-Wolfe method (Jaggi, 2013). An interesting aspect of our algorithm is its linear-time computational complexity with respect to the optimization time horizon, both in the SLQ algorithm and in calculating the cost gradient. As we will show in the next section, this characteristic results in a superior performance of our algorithm in comparison to the two-point BVP approach originally introduced by Xu and Antsaklis (2004).

Algorithm 1 SLQ Algorithm
Given
- Mode switch sequence and switching times
- Cost function (4) and system dynamics (5)
Initialization
- Initialize the controller with a stable control law, u_0(·)
repeat
- Forward integrate the system dynamics: {x̄(z), ū(z)}_{z=0}^{I}
- Compute the LQ approximation of the problem along the nominal trajectory, (7) and (8)
- Solve the final value differential equations, (9-11)
- Line search for the optimal α with policy (17)
- Update the control law: u*(z, x) = ū(z) + α* l(z) + L(z) δx
until ‖l(·)‖_2 < l_min or maximum number of iterations

Algorithm 2 OCS2 Algorithm
Given
- Mode switch sequence and initial switching times, t^0
- The optimal control problem in equations (1) and (2)
Initialization
- Empty the solution bag
- Initialize the SLQ policy, u_0(·)
- Initialize the switching times, t^0
repeat
- Compute the equivalent cost function in Equation (4)
- Compute the equivalent system dynamics in Equation (5)
if the solution bag is not empty then
- Initialize the SLQ policy with the controller in the solution bag that has the most similar switching time vector
else
- Initialize the SLQ policy, u_0(z, x)
end if
- Run the SLQ algorithm
- Get the optimal control u*(z, x; t*)
- Memorize the pair (t*, u*(z, x; t*)) in the solution bag
- Calculate ∂_{t_j}S(0), ∂_{t_j}s(0), and ∂_{t_j}s(0) using (24-26)
- Calculate the cost function gradient ∇_t J = [∂_{t_j}s(0)]_j
- Use a gradient-descent method to update t*
until the gradient-descent method converges
Return the optimal t* and u(z, x; t*)

As discussed, OCS2 uses the SLQ algorithm for solving a nonlinear optimal control problem on the equivalent system with fixed switching times (refer to Algorithm 1). The SLQ algorithm is based on an iterative scheme where in each iteration it forward integrates the controlled system and then approximates the nonlinear optimal control problem with a local LQ problem. In general, the SLQ algorithm requires an initial stable controller for the first forward integration. For deriving the initial controller, we define a set of operating points in each switching mode (normally one point) and approximate the optimal control problem around these operating points with an LQ approximation. Then, we follow the same process described in the SLQ algorithm to design an initial controller. A good initialization can often increase the convergence speed of the SLQ algorithm.

One interesting characteristic of our algorithm is the warm starting scheme for the initial policy of SLQ. Here, we use a memorization scheme where we store the solutions of the different runs of SLQ in a solution bag and later initialize the policy of a new run of SLQ with the policy of the most similar switching times (refer to Algorithm 2). The similarity between two switching time sequences is measured as the sum of squared differences between the corresponding times in the two sequences.
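To make the outer loop concrete, the following sketch mirrors the structure of Algorithm 2: it alternates SLQ solves with switching-time updates and warm-starts each SLQ run from the stored solution with the most similar switching times. The helper names (run_slq, cost_gradient) and the simple sort-and-clip update are illustrative assumptions; the paper itself uses the Frank-Wolfe method of Jaggi (2013) for the constrained update over the polytope T.

```python
import numpy as np

def ocs2(run_slq, cost_gradient, t_init, t0, tI, step=0.1, max_iter=50, tol=1e-4):
    """Schematic OCS2 outer loop (cf. Algorithm 2).

    run_slq(t, warm_policy)  -> (policy, cost)   # hypothetical bottom-level SLQ solve
    cost_gradient(t, policy) -> dJ/dt            # hypothetical gradient from (21), (24)-(26)
    t_init : initial interior switching times (t_1, ..., t_{I-1})
    t0, tI : fixed initial and final times
    """
    solution_bag = []                       # pairs (switching times, SLQ policy)
    t = np.asarray(t_init, dtype=float)
    policy = None
    for _ in range(max_iter):
        # warm start: reuse the policy whose switching times are closest (sum of squared differences)
        warm = None
        if solution_bag:
            warm = min(solution_bag, key=lambda item: np.sum((item[0] - t) ** 2))[1]
        policy, cost = run_slq(t, warm)
        solution_bag.append((t.copy(), policy))
        grad = np.asarray(cost_gradient(t, policy))
        if np.linalg.norm(grad) < tol:
            break
        # take a descent step, then enforce t0 <= t_1 <= ... <= t_{I-1} <= tI by sorting and clipping
        # (a crude stand-in for the Frank-Wolfe update used in the paper)
        t = np.clip(np.sort(t - step * grad), t0, tI)
    return t, policy
```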


5. CASE STUDIES

In this section, we evaluate the performance of the proposed method on three numerical examples. The first two examples are provided as illustrative cases to compare the performance of the proposed algorithm to the baseline algorithm introduced by Xu and Antsaklis (2004), and the third one concerns the application of our method to motion control of legged robots. We here refer to the baseline algorithm as the BVP method since it is based on the solution of two-point BVPs.

Example 1: The first example is a nonlinear switched system with three modes, which are defined as follows

$$1: \;\begin{cases} \dot{x}_1 = x_1 + u_1 \sin x_1 \\ \dot{x}_2 = -x_2 + u_1 \cos x_2 \end{cases} \qquad 2: \;\begin{cases} \dot{x}_1 = x_2 + u_1 \sin x_2 \\ \dot{x}_2 = -x_1 - u_1 \cos x_1 \end{cases} \qquad 3: \;\begin{cases} \dot{x}_1 = -x_1 - u_1 \sin x_1 \\ \dot{x}_2 = x_2 + u_1 \cos x_2 \end{cases}$$

with the initial condition x_0 = [2, 3]^⊤. The optimization goal is to calculate the optimal switching times t_1 (from subsystem 1 to 2) and t_2 (from subsystem 2 to 3), and the continuous control input u_1, such that the following cost function is minimized

$$J = 0.5\,\|x(3) - x_g\|^2 + \int_{0}^{3} \left(0.5\,\|x(t) - x_g\|^2 + \|u(t)\|^2\right) dt,$$

where x_g = [1, -1]^⊤.
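For readers who want to reproduce this benchmark, the subsystems and cost of Example 1 can be encoded directly as callables and passed to a switched-system solver. The sketch below is only an illustration of that encoding; the function names and the use of plain NumPy are assumptions, not part of the paper's implementation.

```python
import numpy as np

# Example 1 subsystems, state x = [x1, x2], scalar input u = [u1]
def f1(x, u):
    return np.array([x[0] + u[0] * np.sin(x[0]), -x[1] + u[0] * np.cos(x[1])])

def f2(x, u):
    return np.array([x[1] + u[0] * np.sin(x[1]), -x[0] - u[0] * np.cos(x[0])])

def f3(x, u):
    return np.array([-x[0] - u[0] * np.sin(x[0]), x[1] + u[0] * np.cos(x[1])])

subsystems = [f1, f2, f3]
x0 = np.array([2.0, 3.0])
xg = np.array([1.0, -1.0])

def running_cost(x, u):
    return 0.5 * np.sum((x - xg) ** 2) + np.sum(np.asarray(u) ** 2)

def final_cost(x):
    return 0.5 * np.sum((x - xg) ** 2)
```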

We apply both OCS2 and BVP to this system with uniformly distributed initial switching times. The optimized switching times, the optimized cost, the number of iterations, and the number of function calls (bottom-level optimization calls) for both algorithms are presented in Table 1. As it illustrates, OCS2 and BVP converge to the same solution within a comparable number of iterations and function calls. However, the consumed CPU times are significantly different. The BVP method needs about 235 seconds on an Intel Core i7 2.7-GHz processor while the OCS2 algorithm uses only 31 seconds, which is roughly 7.5 times faster (Figure 1).

Example 2: In order to examine the scalability of both algorithms to higher dimensions, we augment each subsystem in Example 1 with two more states and one additional control input. The augmented states have the following system dynamics

$$1: \;\begin{cases} \dot{x}_3 = -x_3 + 2 x_3 u_2 \\ \dot{x}_4 = x_4 + x_4 u_2 \end{cases} \qquad 2: \;\begin{cases} \dot{x}_3 = x_3 - 3 x_3 u_2 \\ \dot{x}_4 = 2 x_4 - 2 x_4 u_2 \end{cases} \qquad 3: \;\begin{cases} \dot{x}_3 = 2 x_3 + x_3 u_2 \\ \dot{x}_4 = -x_4 + 3 x_4 u_2 \end{cases}$$

with the initial condition x_0 = [2, 3, 1, 1]^⊤. The cost function is the same as in Example 1 with x_g = [1, -1, 2, 2]^⊤. As in Example 1, the optimized solutions are comparable for both algorithms (refer to Table 1). However, the CPU times are drastically different. For the BVP method, the CPU time is about 1500 seconds while the OCS2 algorithm uses only 76 seconds, which is 19.5 times more efficient. This manifests the efficiency of the OCS2 algorithm in higher dimensional problems, where the computational time of the BVP algorithm increases prohibitively.

Example 3: Quadruped's CoM motion control. In this example, we apply OCS2 to the quadruped's CoM control problem. In a quadruped robot, the system shows different dynamical behavior based on the stance leg configuration. Furthermore, at the switching instances the non-elastic nature of contacts introduces a jump in the state trajectory. Therefore, a legged robot basically has to be modeled as a nonlinear hybrid system with a discontinuous state trajectory.


Fig. 1. Comparison of the CPU time consumption of the BVP algorithm (blue) and the OCS2 algorithm (orange). The evaluation is done on an Intel Core i7 2.7-GHz processor. The BVP method's run-time on Example 3 is missing since the algorithm failed to terminate.

The cost function in this example is defined as in (8). The control effort weighting matrix is a diagonal matrix which penalizes all continuous control inputs (contact forces) equally. Furthermore, our cost function penalizes intermediate orientation and z-offsets during the entire trajectory. In the final cost, we penalize deviations from a target point p = [3, 0, 0] which lies 3 m in front of the robot. For the following evaluation, we assume that the x direction points to the front, the y direction points to the left, and the z direction is orthogonal to the ground. Furthermore, we penalize the linear and angular velocities to ensure that the robot comes to a stop towards the end of the trajectory.
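A sketch of how such a quadratic CoM cost could be assembled is given below. The state ordering and all numerical weight values are assumptions made purely for illustration; the paper only specifies which quantities are penalized (contact forces equally, intermediate orientation and z-offset, final deviation from p = [3, 0, 0], and the linear and angular velocities), not the actual weights.

```python
import numpy as np

# Assumed state ordering: x = [theta (3), p (3), v (3), omega (3)]; u = stacked stance-leg contact forces.
n_x, n_u = 12, 6                      # 6 = two stance feet x 3 force components per trot phase

# Placeholder weights for 0.5*(x - x_ref)^T Q (x - x_ref) + 0.5*u^T R u
Q = np.diag([10, 10, 10,              # orientation penalized along the whole trajectory
             0, 0, 50,                # only the z-offset of the position is penalized along the way
             1, 1, 1,                 # linear velocity
             1, 1, 1])                # angular velocity
R = 1e-3 * np.eye(n_u)                # all contact forces penalized equally

# Final cost: deviation from the target point p = [3, 0, 0] plus residual velocities
x_ref = np.zeros(n_x)
x_ref[3:6] = [3.0, 0.0, 0.0]
Q_final = np.diag([10, 10, 10, 100, 100, 100, 10, 10, 10, 10, 10, 10])

def running_cost(x, u):
    dx = x - x_ref
    return 0.5 * dx @ Q @ dx + 0.5 * u @ R @ u

def final_cost(x):
    dx = x - x_ref
    return 0.5 * dx @ Q_final @ dx
```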

Table 1. Comparison between the performance of the BVP and OCS2 algorithms. In the table, Itr. is the number of iterations of the gradient-descent method until it converges; FC. is the number of requests for cost function and derivative evaluations in the gradient-descent algorithm.

        Alg.    J          s1        s2        Itr.   FC.
  EX1   BVP     5.4498     0.2235    1.0198    9      13
        OCS2    5.4438     0.2324    1.0236    7      14
  EX2   BVP     10.3797    0.2754    1.6076    9      13
        OCS2    10.3888    0.2973    1.5978    8      20

The computational burden of solving the optimal control problem on such a nonlinear and hybrid system has encouraged researchers to use the CoM dynamics instead of the complete system dynamics. In addition to optimizing a lower dimensional problem, this approach has another important advantage: the CoM state trajectory is smoother and the impact force does not cause a noticeable jump in the states. Therefore, the CoM dynamics can be modeled as a switched system, which facilitates solving the optimal control problem. The CoM system has in total 12 states consisting of: 3 states for the orientation, θ, 3 states for the position, p, and 6 states for the linear and angular velocities in the body frame, v and ω. The control inputs of the CoM system are an input which triggers the mode switch and the contact forces of the stance legs {λ_i(t)}_{i=1}^{4}. The simplified CoM equation can be derived from the Newton-Euler equation as

$$\dot{\theta} = R(\theta)\,\omega, \qquad \dot{\omega} = I^{-1}\left(-\,\omega \times I\omega + \sum_{i=1}^{4} \sigma_i\, J_{\omega,i}^\top \lambda_i\right),$$
$$\dot{p} = R(\theta)\,v, \qquad \dot{v} = \frac{1}{m}\left(m\,g + \sum_{i=1}^{4} \sigma_i\, J_{v,i}^\top \lambda_i\right),$$

where R(θ) is the rotation matrix, g is the gravitational acceleration in the body frame, and I and m are the moment of inertia about the CoM and the total mass, respectively. J_{ω,i} and J_{v,i} are the Jacobian matrices of the i-th foot with respect to the CoM, which depend on the foot position and the CoM's position and orientation (for the technical specifications of this quadruped robot refer to Semini et al. (2011)).

In this example, we study the trotting gait. In a quadruped trotting gait, the pairwise alternating diagonal legs are the stance legs. In each phase of the motion, only two opposing feet are on the ground. Since we assume that our robot has point feet, it loses controllability in one degree of freedom, namely the rotation around the connecting line between the two stance feet. Therefore, the subsystems in each phase of the trotting gait are not individually controllable. However, the ability to switch from one diagonal pair to the other allows the robot to gain control over that degree of freedom. Therefore, for a successful trot, the robot is required to plan and control the contact forces as well as the switching times.
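To connect this model back to the switched-system formulation of (1), each gait phase can be seen as one subsystem f_i whose contact flags σ select the active stance legs. The sketch below illustrates this mapping; the helper functions for the rotation matrix and the foot Jacobians (rotation_matrix, foot_jacobians) are hypothetical placeholders rather than code from the paper.

```python
import numpy as np

def make_com_subsystem(sigma, inertia, mass, g_body, rotation_matrix, foot_jacobians):
    """Build one switched-mode vector field f_i for the CoM model.

    sigma          : length-4 array of 0/1 contact flags for this gait phase
    rotation_matrix: function theta -> R(theta)              (hypothetical helper)
    foot_jacobians : function (theta, p) -> (J_omega, J_v)   (hypothetical helper; lists of 3x3 matrices)
    State x = [theta, p, v, omega] (12), input u = stacked contact forces of the four legs (12).
    """
    def f(x, u):
        theta, p, v, omega = x[0:3], x[3:6], x[6:9], x[9:12]
        lam = np.asarray(u).reshape(4, 3)
        R = rotation_matrix(theta)
        J_omega, J_v = foot_jacobians(theta, p)
        torque = sum(sigma[i] * J_omega[i].T @ lam[i] for i in range(4))
        force = sum(sigma[i] * J_v[i].T @ lam[i] for i in range(4))
        theta_dot = R @ omega
        omega_dot = np.linalg.solve(inertia, -np.cross(omega, inertia @ omega) + torque)
        p_dot = R @ v
        v_dot = g_body + force / mass
        return np.concatenate([theta_dot, p_dot, v_dot, omega_dot])
    return f

# A trot alternates the two diagonal leg pairs, e.g. sigma = [1, 0, 0, 1] and [0, 1, 1, 0].
```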

Figure 2 shows the difference between the initial and optimized switching times. The initial switching times have been chosen uniformly over the three-second period. During the optimization, the controller both extends and shortens different switching times. More interestingly, it sets the time periods of two phases to zero. Therefore, although we initially asked the robot to take 8 steps, the optimized trot has 6 steps.

Fig. 2. Comparison between the initial (blue) and optimized (yellow) switching time sequences. After optimization, the switching times of modes 6 and 7 are set to zero and the total number of steps is reduced to 6 from the initial 8 steps.

Figure 3 shows the optimized contact forces of the trotting experiment. The vertical contact force λ_z, which needs to compensate for gravity, is fairly equally distributed between the legs. However, we can see that the plots are not perfectly symmetric. This results from the quadruped being modeled after the hardware, which has unsymmetrical inertia. Additionally, we can see that the force profiles are non-trivial and thus would be difficult to derive manually. However, the contact forces are smooth within each step sequence, which makes tracking easy.

Figure 4 shows a few snapshots of the quadruped during the optimized trotting gait. The CoM motion is controlled by the OCS2-designed controller. The swing feet's trajectories (the ones that are not in contact) are designed based on a simple heuristic that the distance between the CoM and a swing foot at the moment of take off (breaking contact) and at the moment of touch down (establishing contact) should be the same.

6. CONCLUSIONS AND FUTURE WORK

In this paper, we have presented an efficient method to solve the optimal control problem for nonlinear switched systems with a predefined mode sequence. The proposed method is based on a two-stage optimization scheme which optimizes the continuous inputs as well as the switching times. In order to obtain an accurate estimation of the cost function derivatives, Xu and Antsaklis (2004) have introduced a set of two-point BVPs. Although there exist many numerical methods for solving two-point BVPs (e.g. collocation methods), many of them do not scale properly to high dimensional problems such as controlling quadruped locomotion. To tackle this issue, we have proposed the OCS2 algorithm, which is based on an SLQ algorithm. In order to obtain the cost function derivatives, OCS2 takes into account the sensitivity of the approximated LQ models with respect to the switching times. OCS2 obtains an approximation of the LQ


Fig. 3. Contact forces as optimized by the trotting gait controller. For visibility, the contact forces in the z-direction have been scaled by a factor of 1/100. The gray dotted vertical lines indicate the switching times. The non-symmetric pattern results from the fact that the quadruped is not perfectly symmetric and thus also not modeled as such.

model sensitivity using only the LQ approximation of the problem which has already been calculated by SLQ. Therefore, the algorithm can calculate the values of the derivatives with no further computational cost for evaluating the higher order derivatives of the system dynamics and cost function. This feature increases the proposed algorithm's efficiency in comparison to other methods such as direct collocation which rely on higher order derivatives. In order to demonstrate the computational efficiency of the algorithm, we have compared the CPU time used by the OCS2 algorithm with the baseline method introduced by Xu and Antsaklis (2004). We observe that as the dimensions of the state and input spaces increase, the difference in computational time between the algorithms becomes significant, to the point that for the quadruped CoM control task the run-time of the baseline algorithm becomes prohibitively long.

Fig. 4. A few snapshots of the modeled quadruped during the optimized trotting gait. The CoM motion is controlled by the controller synthesized by the OCS2 algorithm. The swing foot trajectories (the ones that are not in contact) are designed based on a simple heuristic over the CoM linear velocity.

REFERENCES

Armijo, L. (1966). Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics.
Bemporad, A. and Morari, M. (1999). Control of systems integrating logic, dynamics, and constraints. Automatica.
Bengea, S. and DeCarlo, R. (2005). Optimal control of switching systems. Automatica.
Borrelli, F., Baotić, M., Bemporad, A., and Morari, M. (2005). Dynamic programming for constrained optimal control of discrete-time linear hybrid systems. Automatica.
Branicky, M.S., Borkar, V.S., and Mitter, S.K. (1998). A unified framework for hybrid control: model and optimal control theory. IEEE Transactions on Automatic Control.


Bryson, A.E. (1975). Applied Optimal Control: Optimization, Estimation and Control. CRC Press.
Egerstedt, M., Wardi, Y., and Delmotte, F. (2003). Optimal control of switching times in switched dynamical systems. In 42nd IEEE Conference on Decision and Control (CDC).
Giua, A., Seatzu, C., and der Mee, C.V. (2001). Optimal control of switched autonomous linear systems. In 40th IEEE Conference on Decision and Control (CDC).
Jaggi, M. (2013). Revisiting Frank-Wolfe: Projection-free sparse convex optimization. In ICML.
Johnson, E.R. and Murphey, T.D. (2011). Second-order switching time optimization for nonlinear time-varying dynamic systems. IEEE Transactions on Automatic Control.
Kamgarpour, M. and Tomlin, C. (2012). On optimal control of non-autonomous switched systems with a fixed mode sequence. Automatica.
Mayne, D. (1966). A second-order gradient method for determining optimal trajectories of non-linear discrete-time systems. International Journal of Control.
Neunert, M., de Crousaz, C., Furrer, F., Kamel, M., Farshidian, F., Siegwart, R., and Buchli, J. (2016). Fast nonlinear model predictive control for unified trajectory optimization and tracking. In IEEE International Conference on Robotics and Automation (ICRA).
Pakniyat, A. and Caines, P.E. (2014). On the relation between the minimum principle and dynamic programming for hybrid systems. In 53rd IEEE Conference on Decision and Control (CDC).
Riedinger, P., Iung, C., and Kratz, F. (2003). An optimal control approach for hybrid systems. European Journal of Control.
Riedinger, P., Kratz, F., Iung, C., and Zanne, C. (1999). Linear quadratic optimization for hybrid systems. In 38th IEEE Conference on Decision and Control (CDC).
Semini, C., Tsagarakis, N.G., Guglielmino, E., Focchi, M., Cannella, F., and Caldwell, D.G. (2011). Design of HyQ - a hydraulically and electrically actuated quadruped robot. Proceedings of the Institution of Mechanical Engineers, Part I: Journal of Systems and Control Engineering.
Shaikh, M.S. and Caines, P.E. (2007). On the hybrid optimal control problem: theory and algorithms. IEEE Transactions on Automatic Control.
Sideris, A. and Bobrow, J. (2005). An efficient sequential linear quadratic algorithm for solving nonlinear optimal control problems. IEEE Transactions on Automatic Control.
Sideris, A. and Rodriguez, L.A. (2010). A Riccati approach to equality constrained linear quadratic optimal control. In Proceedings of the 2010 American Control Conference.
Soler, M., Kamgarpour, M., Tomlin, C., and Staffetti, E. (2012). Multiphase mixed-integer optimal control framework for aircraft conflict avoidance. In 51st IEEE Conference on Decision and Control (CDC).
Sussmann, H.J. (1999). A maximum principle for hybrid optimal control problems. In 38th IEEE Conference on Decision and Control (CDC).
Todorov, E. and Li, W. (2005). A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems. In Proceedings of the 2005 American Control Conference.
Wardi, Y., Egerstedt, M., and Twu, P. (2012). A controlled-precision algorithm for mode-switching optimization. In 51st IEEE Conference on Decision and Control (CDC).
Xu, X. and Antsaklis, P.J. (2004). Optimal control of switched systems based on parameterization of the switching instants. IEEE Transactions on Automatic Control.
