Swing up and stabilization of the Acrobot via nonlinear optimal control based on stable manifold method

Swing up and stabilization of the Acrobot via nonlinear optimal control based on stable manifold method

10th IFAC IFAC Symposium Symposium on on Nonlinear Nonlinear Control Control Systems Systems 10th 10th IFAC Symposium on Nonlinear Control Systems Aug...

631KB Sizes 0 Downloads 43 Views

10th IFAC IFAC Symposium Symposium on on Nonlinear Nonlinear Control Control Systems Systems 10th 10th IFAC Symposium on Nonlinear Control Systems August 23-25, 2016. California, USA 10th IFAC Symposium on Nonlinear Control Systems August 23-25, 2016. Monterey, Monterey, California, USA 10th IFAC Symposium on Nonlinear Control Systems August 23-25, 2016. Monterey, California, USA August 23-25, 2016. Monterey, California, USA Available August 23-25, 2016. Monterey, California, USA online at www.sciencedirect.com

ScienceDirect IFAC-PapersOnLine 49-18 (2016) 374–379

Swing up and stabilization of the Acrobot Swing up and stabilization of the Acrobot Swing up and stabilization of the Acrobot Swing up and stabilization of the Acrobot via nonlinear optimal control based on via nonlinear optimal control based on via nonlinear optimal control based on via nonlinear optimal control based on stable manifold manifold method method stable stable manifold method stable manifold method ∗∗ ∗

Takamasa Takamasa Horibe Horibe ∗ Noboru Noboru Sakamoto Sakamoto ∗∗ ∗∗ Takamasa Horibe ∗∗∗ Noboru Sakamoto ∗∗ Takamasa Horibe Noboru Sakamoto ∗∗ ∗ ∗ Department of Aerospace Engineering, Graduate School of Department of Aerospace Engineering, Graduate School of ∗ ∗ Department of Aerospace Engineering, Graduate School of Engineering, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, ∗ Engineering, Nagoya University, Furo-cho, Graduate Chikusa-ku, Nagoya, Department of Aerospace Engineering, School of Engineering, [email protected]) University, Furo-cho, Chikusa-ku, Nagoya, Japan, (e-mail: Japan, (e-mail: [email protected]) Engineering, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, ∗∗ Japan, (e-mail: [email protected]) ∗∗ Department of Faculty Department of Mechatronics, Mechatronics, Faculty of of Science Science and and Engineering, Engineering, Japan, (e-mail: [email protected]) ∗∗ ∗∗ Department of Mechatronics, Faculty of Science and Engineering, Nanzan University, Yamazato-cho 18, Showa-ku, Nagoya, Japan, ∗∗Nanzan University, Yamazato-cho 18, of Showa-ku, Nagoya, Japan, Department of Mechatronics, Faculty Science and Engineering, Nanzan University, Yamazato-cho 18, Showa-ku, Nagoya, Japan, (e-mail: [email protected]) (e-mail: [email protected]) Nanzan University, Yamazato-cho 18, Showa-ku, Nagoya, Japan, (e-mail: [email protected]) (e-mail: [email protected]) Abstract: This paper considers Abstract: This paper considers the the problem problem of of swing swing up up and and stabilization stabilization for for the the Acrobot. Acrobot. It It is is Abstract: This paper considers the which problem ofbeen swingproposed up and stabilization fornonlinear the Acrobot. It is shown that stable manifold method has for computing optimal shown that This stable manifold method hasofbeen for computing optimal Abstract: paper considers the which problem swingproposed up and stabilization fornonlinear the Acrobot. It is shown that stable manifold method which has been for proposed for computing nonlinear optimal control is of feedback controllers this An stabilization control is capable capable of designing designing feedback controllers this problem. problem. An optimal optimal stabilization shown that stable manifold method which has been for proposed for computing nonlinear optimal control is is capable of as designing feedback controllers for this solving problem. An optimal stabilization controller obtained a single feedback law by numerically a Hamilton-Jacobi equation controller obtained a single feedback feedback law by numerically a Hamilton-Jacobi equation control is is capable of as designing controllers for this solving problem. An optimal stabilization controller is obtained asmethod. a single feedback lawthat by numerically solving a Hamilton-Jacobi equation by the manifold It unlike methods for up by the stable stable manifold It is is shown shown unlike existing existing methods for Acrobot Acrobot swing swing up controller is obtained asmethod. a single feedback lawthat by numerically solving a Hamilton-Jacobi equation by theasstable manifold method. It is shown that unlike existing methods for Acrobot swing up such partial feedback linearization, the resultant control is mechanically indigenous in the such partialmanifold feedback linearization, the resultant control is mechanically indigenous in the by theasstable method. It is shown that unlike existing methods for Acrobot swing up such as partial feedback linearization, the resultant is mechanically inlow. the sense that it uses uses reactions of arms arms effectively effectively and, as ascontrol consequence, control indigenous input is is kept kept sense that it reactions of and, aa consequence, control input such as partial feedback linearization, the resultant control is mechanically indigenous inlow. the sense that it uses reactions of arms effectively and, asrobustness a consequence, control input is kept low. A number of simulations verify the effectiveness and of the controller. A number of uses simulations verify the effectively effectiveness andasrobustness of thecontrol controller. sense that it reactions of arms and, a consequence, input is kept low. A number of simulations verify the effectiveness and robustness of the controller. A number of (International simulations verify the effectiveness and robustness the controller. © 2016, IFAC Federation of Automatic Control) Hosting byofElsevier Ltd. All rights reserved. Keywords: Acrobot, Acrobot, Nonlinear Nonlinear optimal optimal control, control, Hamilton-Jacobi Hamilton-Jacobi equation, equation, stable stable manifold manifold Keywords: method Keywords: method Keywords: Acrobot, Acrobot, Nonlinear Nonlinear optimal optimal control, control, Hamilton-Jacobi Hamilton-Jacobi equation, equation, stable stable manifold manifold method method 1. mentioned 1. INTRODUCTION INTRODUCTION mentioned as as aa single single feedback feedback control control method method under under some some 1. INTRODUCTION mentioned asthat a single feedback control method under some assumptions are difficult to satisfy generally. assumptions are feedback difficult to satisfy generally. 1. INTRODUCTION mentioned asthat a single control method under some assumptions that are difficult to satisfy generally. assumptions that areshow difficult to itsatisfy generally. design In In this this paper, paper, we we show that that it is is possible possible to to design In this paper, we show that it is possible to design Underactuated systems are mechanical systems which a single (without switching) optimal feedback controller Underactuated systems are mechanical systems which aIn single (withoutweswitching) optimal controller this paper, show that it is feedback possible to design a single (without switching) optimal feedback controller have fewer control inputs than degrees of freedom. ConUnderactuated systems are mechanical systems which for swing(without up and and switching) stabilizationoptimal of the the feedback Acrobot controller using the the have fewer control inputsare than degrees of systems freedom.which Con- for Underactuated systems mechanical swing up stabilization of Acrobot using a single trol underactuated systems isdegrees currently an active topic topic have control than of freedom. Constable manifold method (Sakamoto and van der Schaft, swing up and stabilization of the Acrobot using the trol offewer underactuated systems currently active haveof fewer control inputs inputs thanis degrees of an freedom. Con- for stable manifold method (Sakamoto and van der Schaft, for swing up and stabilization of the Acrobot using the trol many of underactuated systems is currently an active topic for researchers due wide application range in 2008; Sakamoto, 2013). The method has been developed manifold method (Sakamoto and van der Schaft, for researchers systems due to to is wide application range in stable trol many of underactuated currently an active topic 2008; Sakamoto, 2013). The method has been developed stable manifold method (Sakamoto and van der Schaft, for many orresearchers wide in for 2008; Sakamoto,computing 2013). Thethe method has been developed Robotics aerospace due fieldto (Liu andapplication Yu, 2013; 2013; range Xin and and numerically derivative of for Robotics aerospace field and Yu, Xin for many orresearchers due to(Liu wide application range in for numerically derivative of solution solution for 2008; Sakamoto,computing 2013). Thethe method has been developed for numerically computing the derivative of solution Liu, 2013). The Acrobot is a 2-dimensional underactuated Robotics or aerospace field (Liu and Yu, 2013; Xin and Hamilton-Jacobi equations (HJEs). When it is applied Liu, 2013).orThe Acrobotfield is a (Liu 2-dimensional underactuated Robotics aerospace and Yu, 2013; Xin and Hamilton-Jacobi equations (HJEs). When itofissolution applied for for for numerically computing the derivative mechanical system often as problem Liu, Acrobot is underactuated the Acrobot up it directly equations (HJEs). it applied for mechanical system often used used as a a benchmark benchmark problem for for Hamilton-Jacobi Liu, 2013). 2013). The The Acrobot is aa 2-dimensional 2-dimensional underactuated the Acrobot swing swing up problem, problem, it When directly enlarges the Hamilton-Jacobi equations (HJEs). When it is isenlarges applied the for mechanical systemcontrol often used as a benchmark problem for the testing nonlinear methods. It consists of two links RoA for stabilization so that the downward position is Acrobot swing up problem, it directly enlarges the testing nonlinear methods. It consists of two links mechanical systemcontrol often used as a benchmark problem for RoA for stabilization so that the downward position is the Acrobot swing up problem, it directly enlarges the testing nonlinear control methods. It consists of two links RoA for stabilization so that the downward position is and an actuator is installed only at the second joint. A included in RoA. For a survey and other solution methods and an actuator is installed only at the second joint. A testing nonlinear control methods. It consists of two links included RoA. For a so survey solution methods RoA for in stabilization that and the other downward position is common control objective of the Acrobot is to swing it up and an actuator is installed only at the second joint. A for HJEs, we refer to Aguilar and Krener (2014); Aliyu included in RoA. For a survey and other solution methods common control objective of the to swing it up and an actuator is installed onlyAcrobot at the issecond joint. A for HJEs,inwe refer andother Krener (2014); Aliyu included RoA. Fortoa Aguilar survey and solution methods from the downward position to unstable posicommon objective of Acrobot is swing (2011); Beeler et (1969); Navasca and we Aguilar and from the control downward position to the the unstable upright posicommon control objective of the the Acrobot is to toupright swing it it up up for (2011); Beeler et al. al.to (2000); Lukes (1969); (2014); NavascaAliyu and for HJEs, HJEs, we refer refer to(2000); AguilarLukes and Krener Krener (2014); Aliyu fromand the to downward position to the unstable upright posi(2011); Beeler et al. (2000); Lukes (1969); Navasca and tion stabilize it vertically. This is a challenging task Krener (2007). tion stabilize position it vertically. This is a challenging task Krener fromand the to downward to the unstable upright posi(2011); (2007). Beeler et al. (2000); Lukes (1969); Navasca and Krener (2007). because of the movement in a large range of nonlinearity. tion and to stabilize it vertically. This is a challenging task because movement in a large of nonlinearity. tion and of to the stabilize it vertically. Thisrange is a challenging task The Krener (2007). of the paper is as follows. The Acrobot The organization organization of the paper is as follows. The Acrobot because of the movement in aa large range of nonlinearity. because of the movement in large range of nonlinearity. The organization of the is as follows. the Thetheory Acrobot Generally, the swing up control is divided into two phases, model is introduced in §§paper 2. of Generally, the swing up control is divided into two phases, model is introduced in 2. §§ 33issummarizes summarizes of The organization of the paper as follows. the Thetheory Acrobot model is introduced in § 2. § 3 summarizes the theory of first, the swing up phase in which nonlinearity is dominant, Generally, the swing up control is divided into two phases, stable manifold method for HJEs. Controller design is first, the swing up phase in whichisnonlinearity dominant, Generally, the swing up control divided intoistwo phases, the the stable manifold method Controller design of is model is introduced in § 2. §for3 HJEs. summarizes the theory the stable manifold method for HJEs. Controller design is and then, stabilization phase which estabishes autonomus first, the swing up phase in which nonlinearity is dominant, precisely explained in § 5 and simulation results are shown and stabilization which estabishesisautonomus first,then, the swing up phasephase in which nonlinearity dominant, precisely § 5 andfor simulation results are shown the stableexplained manifold in method HJEs. Controller design is precisely explained in § 5 and simulation results are shown stability in a neighborhood of the origin. There are some and then, stabilization phase which estabishes autonomus in § 6. stability a neighborhood the origin. There are some in and then,instabilization phaseofwhich estabishes autonomus § 6. precisely explained in § 5 and simulation results are shown stability in a neighborhood of the origin. There are some § 6. other effective ways swing up such other effective ways to to design design swing up controllers controllers such in stability in a neighborhood of the origin. There are some in § 6. as using partial feedback linearization (Spong, 1995), enother effective ways to design swing up controllers such as using partialways feedback linearization (Spong, 1995),such en2. other effective to design swing up controllers 2. MODELING MODELING AND AND ANALYSIS ANALYSIS OF OF THE THE ACROBOT ACROBOT ergy feedback (Xin and 2012; Xin Kaneda, as partial feedback linearization en2. MODELING AND ANALYSIS OF THE ACROBOT ergy feedback (Xin and Yamasaki, Yamasaki, 2012;(Spong, Xin and and1995), Kaneda, as using using partial feedback linearization (Spong, 1995), en2. MODELING AND ANALYSIS OF THE ACROBOT ergy feedback (Xin and Yamasaki, 2012; Xin and Kaneda, 2007), trajectory tracking (Zhang al., 2013), 2007), trajectory tracking (Zhang et et al., Xin 2013), Lyapunov ergy feedback (Xin and Yamasaki, 2012; andLyapunov Kaneda, In this section, we derive aa nonlinear model of the Acrobot. In this section, we derive nonlinear model of the Acrobot. 2007), trajectory tracking (Zhang et al., 2013), Lyapunov based control et and In this1,section, we derive a nonlinear model of the Acrobot. based control (Zergeroglu (Zergeroglu et al., al.,et1998) 1998) and intelligent intelligent 2007), trajectory tracking (Zhang al., 2013), Lyapunov Figs. 2 show the Acrobot and its schematic model. The Figs. 2 show we thederive Acrobot and its schematic model. The In this1,section, a nonlinear model of the Acrobot. control (Brown and Passino, 1997). However, switching based control (Zergeroglu et al., 1998) and intelligent Figs. 1, torque 2 show isthe Acrobot and its schematic model. The control (Brown(Zergeroglu and Passino, However, switching control based control et 1997). al., 1998) and intelligent applied only to the second joint from an control applied only to its theschematic second joint fromThe an Figs. 1, torque 2 show isthe Acrobot and model. controller has no guarantee of stability in the vicinity of control (Brown and Passino, 1997). However, switching control torque is applied only to the second joint from an controller has noand guarantee stability in the vicinity of actuator control (Brown Passino,of 1997). However, switching through a pulley and a timing belt. actuator through a pulley and a timing belt. control torque is applied only to the second joint from an controller has no guarantee of stability in the vicinity of the boundary. Researchers in (Davison and Bortoff, 1997; through a pulley and a timing belt. the boundary. Researchers Bortoff, 1997; controller has no guaranteeinof(Davison stabilityand in the vicinity of actuator actuator through a pulley and a timing belt. the boundary. Researchers in (Davison and Bortoff, 1997; For the the ith ith link link (i (i = = 1, 1, 2), 2), qqi is Xin and Kaneda, Kaneda, 2001) propose propose methods to enlarge1997; the For is the the angle, angle, m mi is is the the mass, mass, Xin and 2001) methods enlarge the the boundary. Researchers in (Davison and to Bortoff, i For the ith link (ilci=is1,the 2), qdistance miii joint is theto mass, ii is the angle, region of attraction (RoA) of controllers, stabilizalli is the length, from ith the Xin propose methods to the region of Kaneda, attraction2001) (RoA) of linear linear controllers, stabilizais the length, l is the distance from ith joint to the the ith link (i = 1, 2), q Xin and and Kaneda, 2001) propose methods to enlarge enlarge the For is the angle, m is the mass, i liii is the length, lci is the distance fromaround ithi joint to the ci tion of the Acrobot by linear control is inherently difficenter of mass (COM), J is the inertia the center region of attraction (RoA) of linear controllers, stabilizai ci tion Acrobot (RoA) by linear control is inherently diffi- center of length, mass (COM), Ji isdistance the inertia the to center regionof ofthe attraction of linear controllers, stabilizali is the lci is the fromaround ith joint the tion of the AcrobotapproachOlfati-Saber by linear control is (2000) inherently difficenter of and masslet (COM), Jiigravitational is the inertiaacceleration around the (9.801 center cult. Backstepping should be of mass, g be a cult. Backstepping should be of mass, g be aJigravitational tion of the AcrobotapproachOlfati-Saber by linear control is (2000) inherently difficenter of and masslet (COM), is the inertiaacceleration around the (9.801 center cult. Backstepping approachOlfati-Saber (2000) should be of mass, and let g be a gravitational acceleration (9.801 cult. Backstepping approachOlfati-Saber (2000) should be of mass, and let g be a gravitational acceleration (9.801

Copyright © 2016 380 2405-8963 © IFAC (International Federation of Automatic Control) Copyright © 2016, 2016 IFAC IFAC 380Hosting by Elsevier Ltd. All rights reserved. Copyright © 2016 IFAC 380 Copyright © 2016 IFAC 380 Peer review under responsibility of International Federation of Automatic Copyright © 2016 IFAC 380Control. 10.1016/j.ifacol.2016.10.194

IFAC NOLCOS 2016 August 23-25, 2016. Monterey, California,Takamasa Horibe et al. / IFAC-PapersOnLine 49-18 (2016) 374–379 USA

375

m/s2 ). The equation of motion of the Acrobot is derived as follows by the method of Lagrange. M (q2 )¨ q + C(q, q) ˙ q˙ + G(q) = τ, (1) where ] [ M11 (q2 ) M12 (q2 ) M (q2 ) = M21 (q2 ) M22 ] [ a + a2 + 2a3 cos q2 a2 + a3 cos q2 = 1 a2 + a3 cos q2 a2 [ ] −a3 q˙2 sin q2 −a3 (q˙1 + q˙2 ) sin q2 C(q, q) ˙ = , a3 q˙1 sin q2 0 ] −b1 sin q1 − b2 sin(q1 + q2 ) , G(q) = −b2 sin(q1 + q2 ) [

Fig. 2. Acrobot model Table 1. System parameters

a1 = m1 L2c1 + m2 L21 + J1 , a2 = m2 L2c2 + J2 , a3 = m2 L1 Lc2 , b1 = (m1 Lc1 + m2 L1 )g, b2 = m2 Lc2 g. τ = [τ1 , τ2 ]T are the torque on the joints. We assume that there exists resistance force proportional with angular velocity such as viscous friction or counter electromotive force of the actuator. Then τ is given as τ1 = −µ1 q˙1 , τ2 = nKDC u − µ2 q˙2 , where u is the control input voltage for the actuator, n is gear ratio of the pulley, KDC is electromotive torque constant and µ1 , µ2 is the viscous resistance coefficient of the joints. Defining x = [x1 , x2 , x3 , x4 ]T = [q1 , q2 , q˙1 , q˙2 ]T as system variables, and letting H(q, q) ˙ = [H1 (q, q), ˙ H2 (q, q)] ˙ T = T C(q, q) ˙ q˙ + G(q) + [µ1 q˙1 , µ2 q˙2 ] , the dynamic equation (1) is rewritten in the state space form as x˙ = f (x) + g(x)u, (2) where   x3 x4[  ] , f (x) =  ˙  H1 (q, q) −1 −M (q2 ) H2 (q, q) ˙   0 0[  ] g(x) =  . 0 −1 −M (q2 ) nKDC

The purpose of this paper is to design a single nonlinear optimal feedback controller for swinging up and stabiliz-

m1

Arm1 weight

l1

Arm1 length

0.850[kg] 0.154[m]

lc1

Arm1 length from joint to COM

-0.0189[m]

J1

Arm1 inertia

6.25×10−3 [kg·m2 ]

µ1

Arm1 viscous resistance coefficient

5.50×10−3 [N·m·s]

m2

Arm2 weight

0.420[kg]

l2

Arm2 length

0.210[m]

lc2

Arm2 length from joint to COM

0.0743[m]

J2

Arm2 inertia

4.48×10−3 [kg·m2 ]

µ2

Arm2 viscous resistance coefficient

0.0160[N·m·s]

g

Acceleration of gravity

9.801[m/s2 ]

n

Gear ratio

14:48

KDC

Electromotive torque constant

0.0160[N·m/V]

ing the Acrobot from the stable equilibrium point x = [π, 0, 0, 0] to the unstable equilibrium point x = [0, 0, 0, 0]. The system parameters are shown in Table.1. 3. HAMILTON-JACOBI EQUATION AND STABLE MANIFOLD In this section, we briefly review the stable manifold method proposed in Sakamoto and van der Schaft (2008); Sakamoto (2013) for numerically solving HJEs for nonlinear optimal control problem. It is first shown that a stable manifold of the Hamiltonian system associated with a HJE is equivalent to the stabilizing solution of the HJE (I). Then, the stable manifold method algorithm to compute flows on the stable manifold of the Hamiltonian system is presented (II). Finally, the optimal state feedback function is constructed from the flow data by polynomial functions that define the stable manifold (III). In what follows, the procedures (I)∼(III) will be described with some details. Let us consider the optimal regulation problem for the following nonlinear affine system and quadratic cost function J.   x˙ = f (x) + g(x)u, x(0) = x0 ∫ (3) ) 1 ∞( T  J = x Qx + uT Ru dt, 2 0 where x ∈ Rn , u ∈ U ∈ Rm and R ∈ Rn×m , Q ∈ Rn×m are positive-definite matrices. We also assume that f (·) : Rn → Rn with f (0) = 0 and g(·) : Rn → R are all C ∞ . It is then possible to write f (x) = Ax+o(|x|), g(x) = B+O(|x|) with real matrices A ∈ Rn×n , B ∈ Rn×m .

Fig. 1. Acrobot experiment devise 381

IFAC NOLCOS 2016 376 Horibe et al. / IFAC-PapersOnLine 49-18 (2016) 374–379 August 23-25, 2016. Monterey, California,Takamasa USA

¯ Denote R(x) = g(x)R−1 g(x)T . Applying dynamic programming to the above problem, an HJE is derived as 1 ¯ 1 (HJ) H(x, p) = pT f (x) − pT R(x)p + xT Qx = 0 2 2 where p1 = ∂V /∂x1 , · · · , pn = ∂V /∂xn . The stabilization solution of (HJ) is defined as follows. Definition 1. A solution V (x) of (HJ) is said to be the stabilization solution if x = 0 is an asymptotically stable equilibrium of a vector field ∂H ∂p (x, p(x)), where p(x) = T (∂V /∂x) (x).

(I) A sufficient condition for the local stabilizing solution for (HJ) is obtained in van der Schaft (1991). It is a natural condition based on a linearization argument. We recall that, for the algebraic Riccati equation ¯ (RIC) P A + AT P − P R(0)P + Q = 0, which is the linearization of (HJ), a symmetric matrix P is said to be the stabilizing solution of (RIC) if it is a solution of (RIC) and A − R(0)P is stable. Theorem 1. If the Riccati equation has the stabilizing solution P , there exists, locally around the origin, the stabilizing solution V (x) to (HJ) with (∂ 2 V /∂x2 )(0) = P . Moreover, the set {(x, p) | p = (∂V /∂x)T } is a stable manifold of ∂H ∂H x˙ = (x, p), p˙ = − (x, p). (4) ∂p ∂x Throughout the paper, we assume that the stabilizing solution P to (RIC), which is a linearization of (HJ), exists. It means, as is stated in Theorem 1, that an n-dimensional stable manifold {(x, p)|p = p(x)} exists for (4) around the origin. Note that the function p(x) is used in feedback control. By a suitable linear coordinate transformation with T ∈ Rn×n , the linear part of Hamiltonian system (4) is diagonalized to get [ ′] [ ] [ ′] ¯ A − RP 0 x˙ x = ¯ )T p′ +higher order terms. p˙ ′ 0 −(A − RP (5) Recalling that P is the stabilizing solution of (RIC), A − ¯ is an asymptotically stable n × n matrix. RP

(II) The stable manifold algorithm is to compute flows on a stable manifold of a system of ordinary differential equations of the form { x˙ = F x + nx (x, p) Σ: (6) p˙ = −F T p + np (x, p) where nx , np consist of high order nonlinear C 1 functions and F ∈ Rn×n is a stable matrix. Let us define the sequence {xk (t, ξ)} and {pk (t, ξ)} by  ∫ t  Ft  eF (t−s) nx (xk (s), pk (s)) ds xk+1 = e ξ + 0 (7) ∫ ∞   −F T (t−s) pk+1 = − e np (xk (s), pk (s)) ds t

(8) x0 = eF t ξ, p0 = 0 with k = 0, 1, 2, · · · and arbitrary ξ ∈ Rn . Then, the following theorem holds. Theorem 2. (Sakamoto and van der Schaft, 2008) The sequence xk (t, ξ) and pk (t, ξ) are convergent to zero for sufficiently small |ξ|, that is, xk (t, ξ), pk (t, ξ) → 0 as

382

t → ∞ for all k = 0, 1, 2, · · · . Furthermore, xk (t, ξ) and pk (t, ξ) are uniformly convergent to a solution of (6) on [0, ∞) as k → ∞. Let x(t, ξ) and p(t, ξ) be the limits of xk (t, ξ) and pk (t, ξ), respectively. Then, x(t, ξ) and p(t, ξ) are the solution on the stable manifold of (6), that is, x(t, ξ), p(t, ξ) → 0 as t → ∞. This theorem states that we can approximate flows on the stable manifold of (6) by the sequence (7). (III) We apply the algorithm in (7)-(8) to (5) to get x′k (t, ξ), p′k (t, ξ). The flows on the stable manifold in the original coordinates is given by [ ] [ ′ ] xk (t, ξ) x (t, ξ) = T ′k pk (t, ξ) pk (t, ξ)

and the set Ωk = {(xk (t, ξ), pk (t, ξ)) | |ξ|:sufficiently small, t ∈ R} is a parametrization of the stable manifold. The approximation accuracy can be easily improved by employing larger k. If one can rewrite Ωk with the relation p = πk (x), then πk (x) will serve as an approximation of (∂V /∂x)(x). In our application for the Acrobot control, we will approximately obtain (∂V /∂x)(x) via functional fitting using polynomials. 4. CONTROLLER DESIGN 4.1 Stable manifold computation for Acrobot swing up

We apply the stable manifold method to the Acrobot. The weight matrix Q and R in (3) are chosen as follows considering maximum input voltage and stability by the corresponding LQR in the linear domain. Q = diag (0.001, 0.1, 0.0005, 0.1) , R = 1 The closed-loop poles λ of the system linearized around the origin with this weighting matrices are λ = −13.37, −5.35, −4.92, −4.64. In this approach, the trajectory xk (t, ξ) on the stable manifold is numerically computed for a large number of ξ’s. We have successfully found a ξ from which the corresponding function xk (t, ξ) passes through the pending position (π, 0, 0, 0). We have also found a different ξ for which trajectory xk (t, ξ) reaches the pending position with a much longer and involved trajectory. These two closed loop trajectories are show in Fig. 3, blue line (trajectory 1) and red dotted line (trajectory 2). They are on the same stable manifold and they both satisfy the same HJE for the optimal control problem. This phenomena is know to be an issue of non-uniqueness of solution of a HJE (Day, 1998) and has been firstly observed in a physical system in Fujimoto and Sakamoto (2011). Note that the stable manifold as an 4-dimensional manifold is unique but when it is projected on the base space x, a multi-valued function can be obtained. To see which one of the controllers is optimal, one has to compute the values of cost function. The values of the cost function are found to be 18.50(trajectory 1) and 28.01(trajectory 2) and one can see that the blue trajectory is optimal. The swing up motions by the two controllers are shown in Figs. 4, 5. In § 5, detailed simulations will be carried out for the optimal controller (trajectory 1).

IFAC NOLCOS 2016 August 23-25, 2016. Monterey, California,Takamasa Horibe et al. / IFAC-PapersOnLine 49-18 (2016) 374–379 USA

377

by the iteration. This means that the polynomial fitting is well implemented and the stable manifold along the trajectory is obtained accurately.

5

dq1[rad/s]

0 −5 −10 −15 −1

0

1

2 q1[rad]

3

4

5

−8

−6

−4 q2[rad]

−2

0

2

30

dq2[rad/s]

20 10 0 −10 −20 −30 −10

Fig. 3. Trajectories on the stable manifold The stable manifold around the the optimal swing up trajectory is shown in Fig.6. The thick line in Fig. 6 is the optimal swing up trajectory. One can see that the Acrobot swings up from the downward position (x=[±π, 0, 0, 0]) to the upright position(x=[0, 0, 0, 0]). In this trajectory, the first arm moves in the opposite direction from the origin (q˙1 < 0), and then it crosses q˙1 = 0 and converges to zero, that is, it swings up effectively using the reaction of the arms. It is significantly different from the motion by the partial feedback method in Spong (1995), and this intricate motion avoids the diverge of the second arm. The iteration number k of (7) is decided to satisfy the required accuracy. Recall that if the approximated solution is on the stable manifold, the Hamiltonian H(x, p) in (HJ) is zero throughout the whole trajectory. Therefore, we can use the Hamiltonian value as a measure of the accuracy of approximation. We set k = 15, and the maximum value of the Hamiltonian along the swing up trajectory is calculated as 10−6 , which shows the stable manifold is computed accurately. 4.2 Polynomial form feedback controller design In the previous subsection, we obtained numerical data x(t) = xk (t, ξ) and p(t) = pk (t, ξ) and the approximate stable manifold expression p = p(x) can be obtained in various ways. In this paper, the optimal feedback controller u∗ (x) = −1/2R−1 g(x)T p(x) (9) is approximated via polynomial fitting. The order of polynomial for p(x) is chosen as 7th for x1 , x3 and 5th for x2 , x4 . 5. SIMULATION RESULT In this section, the effectiveness of the proposed controller (9) is examined with simulations. The result is shown in Figs. 7, 8. The controlled trajectory is depicted with a thick blue line in Fig. 7 and it is almost identical with optimal trajectory in Fig.6 which is obtained 383

Fig. 7. Trajectory with control and the stable manifold

Next, to see robustness, we simulate the model (2) with parameter variations (+5% for a1 , a2 , b1 , −5% for a3 , b2 and +20% for µ1 , µ2 ) and external disturbance d2 =sgn(q˙2 ) ∗ 0.1[Nm] on the second joint (simulating coulomb friction). The resultant trajectory under the optimal control is illustrated in Fig. 9, with initial state x = [π, 0, 0, 0] and the time responses are shown in Fig. 10. Although deviation from the swing up trajectory in Fig. 6 is evident, the Acrobot motion is asymptotically stabilized. This is due to that the large stable manifold is computed in order for the perturbed trajectories to be included and the inherent robustness of the optimal controller is realized. Furthermore, a number of simulations with random system parameter variations are performed. Under the conditions that system parameters a1 , a2 , a3 , b1 , b2 , µ2 are varied in a range of ±5%, the proposed controller accomplishes the swing up and stabilization with 90% success rate. Even with ±10% perturbation, more than 40% success rate is confirmed.

6. CONCLUSION We have shown that it is possible to design an optimal feedback controller for swing up and stabilization for the Acrobot. The stable manifold algorithm is applied for the HJE derived for the Acobot model and the feedback controller is obtained by polynomial form. The effectiveness of the controller with robustness against parameter perturbation or disturbance is verified by simulations. This approach based on the stable manifold method is general and applicable not only to the Acrobot but also to many other nonlinear mechanical systems.

IFAC NOLCOS 2016 August 23-25, 2016. Monterey, California, USA 378 Takamasa Horibe et al. / IFAC-PapersOnLine 49-18 (2016) 374–379

0.5

0.5

0.4

0.4

Y[m]

0.3

0.5 0.4 t=1.5

0.3

t=0.7

0.2

0.2

0.2

0.1

0.1

0.1

0

0

0

−0.1

−0.1

−0.1

−0.2

−0.2

−0.3 −0.4

−0.4

−0.5

−0.5

−0.2 −0.1

0 X[m]

0.1

0.2

t=3.0

−0.2 t=0.7

−0.3

t=0

t=1.5

0.3

−0.3 −0.4

−0.2 −0.1

0 X[m]

0.1

0.2

−0.5

0.3

−0.2 −0.1

t = 0.7∼1.5[s]

t = 0∼0.7[s]

0 X[m]

0.1

0.2

0.3

t = 1.5∼3.0[s]

Fig. 4. Swing-up motion(1) by stable manifold method 0.5

0.5

0.5

0.5

0.4

0.4

0.4

0.4

0.3

0.3

0.3

0.3

0.2

0.1

0.2

Y[m]

0

0 −0.1

−0.2

−0.2

−0.3

−0.3

−0.3

−0.4

−0.4

−0.4

−0.5 −0.5

−0.5 −0.5

−0.5 −0.5

t=1.0

t=0

0.5

0 X[m]

0.5

t = 0.5∼1.0[s]

t = 0∼0.5[s]

0.2 0.1

0

−0.1

0 X[m]

t=1.4

0.1

0.1 Y[m]

Y[m]

t=0.5

t=0.5

Y[m]

0.2

t=2.5

0

−0.1

−0.1

−0.2

−0.2

t=1.4

−0.3

t=1.0

−0.4 0 X[m]

t = 1.0∼1.4[s]

0.5

−0.5 −0.5

0 X[m]

0.5

t = 1.4∼2.5[s]

Fig. 5. Swing-up motion(2) by stable manifold method REFERENCES Aguilar, C.O. and Krener, A.J. (2014). Numerical solutions to the Bellman equation of optimal control. Journal of Optimization Theory and Applications, 160(2), 527–552. Aliyu, M. (2011). Nonlinear H∞ -Control, Hamiltonian Systems and Hamilton-Jacobi Equations. CRC Press. Beeler, S.C., Tran, H.T., and Banks, H.T. (2000). Feedback control methodologies for nonlinear systems. Journal of Optimization Theory Appl., 107(1), 1–33. Brown, S.C. and Passino, K.M. (1997). Intelligent control for an acrobot. Journal of Intelligent and Robotic Systems, 18(3), 209–248. Davison, D.E. and Bortoff, S.A. (1997). Stabilization using pseudolinearization and high-gain feedback. Automatica, 33(2), 245–251. Day, M.V. (1998). On Lagrange manifolds and viscosity solutions. Journal of Mathematical Systems, Estimation and Control, 18, 369–372. Fujimoto, R. and Sakamoto, N. (2011). The stable manifold approach for optimal swing up and stabilization of an inverted pendulum with input saturation. In Proc. of IFAC World Congress. 384

Liu, Y. and Yu, H. (2013). A survey of underactuated mechanical systems. IET Control Theory Applications, 7(7), 921–935. Lukes, D.L. (1969). Optimal regulation of nonlinear dynamical systems. SIAM J. Control Optim., 7(1), 75– 100. Navasca, C. and Krener, A.J. (2007). Patchy solutions of Hamilton-Jacobi-Bellman partial differential equations. Lecture Notes in Control and Information Sciences, 364, 251–270. Olfati-Saber, R. (2000). Control of underactuated mechanical systems with two degrees of freedom and symmetry. American control conference, 6, 4092–4096. Sakamoto, N. (2013). Case studies on the application of the stable manifold approach for nonlinear optimal control design. Automatica, 49(2), 568–576. Sakamoto, N. and van der Schaft, A.J. (2008). Analytical approximation methods for the stabilizing solution of the hamilton-jacobi equation. IEEE Trans. Automatic Control, 53(10), 2335–2350. Spong, M.W. (1995). The swing up control problem for the acrobot. IEEE control systems, 15(1), 49–55. van der Schaft, A.J. (1991). On a state space approach to nonlinear H ∞ control. Syst. Control Lett., 16(1), 1–8.

IFAC NOLCOS 2016 August 23-25, 2016. Monterey, California, USA Takamasa Horibe et al. / IFAC-PapersOnLine 49-18 (2016) 374–379

379

Fig. 6. Stable manifold (light blue) around optimal swing up trajectory (thick)

Fig. 9. Trajectory with control under perturbation and diturbance

Fig. 8. Responses of states, input and estimated disturbance

Fig. 10. Responses of states, input and estimated disturbance under perturbation and diturbance

Xin, X. and Kaneda, M. (2001). A robust control approach to the swing up control problem for the Acrobot. IEEE International Conference on Intelligent Robots and Systems, 3, 1650–1665. Xin, X. and Kaneda, M. (2007). Analysis of the energybased swing-up control of the Acrobot. Int. J. Robust Nonlinear Control, 17(16), 1503–1524. Xin, X. and Yamasaki, T. (2012). Energy-based swingup control for a remotely driven acrobot: theorical and experimental results. IEEE Trans. Control System Technology, 20(4), 1048–1056. Xin, X. and Liu, Y. (2013). Reduced-order stable controllers for two-link underactuated planar robots. Automatica, 49(7), 2176–2183.

Zergeroglu, E., Dixon, W.E., Dawson, D.M., Jaffar, S., and Hannan, M.W. (1998). Lyapunov-based set-point control of the acrobot. International Journal of Robotics and Automation, 2, 887–891. Zhang, A., J. She, X.L., and Wu, M. (2013). Motion planning and tracking control for an acrobot based on a rewinding approach. Automatica, 49(1), 278–284.

385