A Level Set Approach to Online Sensing and Trajectory Optimization with Time Delays⁎

A Level Set Approach to Online Sensing and Trajectory Optimization with Time Delays⁎

10th 10th IFAC IFAC Symposium Symposium on on Intelligent Intelligent Autonomous Autonomous Vehicles Vehicles Gdansk, Poland, July 3-5, 2019 10th IFAC...

758KB Sizes 6 Downloads 13 Views

10th 10th IFAC IFAC Symposium Symposium on on Intelligent Intelligent Autonomous Autonomous Vehicles Vehicles Gdansk, Poland, July 3-5, 2019 10th IFAC Symposium on Intelligent Autonomous Vehicles Available online at www.sciencedirect.com Gdansk, Poland, July 3-5, 2019 10th IFACPoland, Symposium on Intelligent Autonomous Vehicles Gdansk, July 3-5, 2019 Gdansk, Poland, July 3-5, 2019

ScienceDirect

IFAC PapersOnLine 52-8 (2019) 301–306

A Level Level Set Set Approach Approach to to Online Online Sensing Sensing and and A A Level Set Approach to Online Sensing and  Trajectory Time Delays A Level SetOptimization Approach to with Online Sensing and Trajectory Optimization with Time Delays Trajectory Optimization with Time Delays  Trajectory Optimization with∗,∗∗Time Delays Matthew R. Kirchner ∗,∗∗

Matthew R. Kirchner ∗,∗∗ Matthew R. Kirchner ∗,∗∗ Matthew R. Kirchner ∗,∗∗ ∗ ∗ Image and Signal Processing Branch, Research Directorate, Naval ∗ ∗ Image and Signal Processing Branch, Research Directorate, Naval Image andCenter Signal Weapons Processing Branch,China Research Directorate, Naval Air Warfare Division, Lake, CA USA Air Warfare ∗ Weapons Division, Lake, CA 93555, 93555,Naval USA andCenter Signal Processing Branch,China Research Directorate, AirImage Warfare Center Weapons Division, China Lake, CA 93555, USA (e-mail: [email protected]) (e-mail: [email protected]) ∗∗ Warfare Center Air Weapons Division, China Lake, CA 93555, USA (e-mail: [email protected]) ∗∗ and Computer Engineering Department, University ∗∗ Electrical Computer Engineering Department, University of of ∗∗ Electrical and (e-mail: [email protected]) Electrical and Computer Engineering Department, University of California, Santa Barbara, CA 93106, USA (e-mail: ∗∗ California, Santa Barbara, CA 93106, USA (e-mail: Electrical and Computer Engineering Department, University of California, [email protected]) Barbara, CA 93106, USA (e-mail: California, [email protected]) Barbara, CA 93106, USA (e-mail: [email protected]) [email protected]) Abstract: Presented Presented is is a a method method to to compute compute certain certain classes classes of of Hamilton–Jacobi Hamilton–Jacobi equations equations Abstract: Abstract: Presented is acontrol method to trajectory compute certain classes of Hamilton–Jacobi equations that result from optimal and generation problems with time delays. Many that result from optimal control and trajectory generation problems with time delays. Many Abstract: Presented is acontrol method to trajectory compute certain classes of the Hamilton–Jacobi equations that result from optimal and generation problems with time delays. Manyaa robotic control and trajectory problems have limited information of operating environment robotic control and trajectory problems have limited information of the operating environment that result from optimal control andonline trajectory generation problems withtime timeafter delays. Many robotic control and trajectory problems have limited information of the operating environment priori and must continually perform trajectory optimization in real collecting priori and mustand continually perform online optimization in real time after collectingaa robotic control trajectory problems havetrajectory limited information of the operating environment priori and must continually perform online trajectory optimization in real time after collecting measurements. The sensing and optimization can induce a significant time delay, and must measurements. The sensing perform and optimization can induce a significant delay, must priori and mustfor continually online trajectory optimization in realtime time after and collecting measurements. The sensing and optimization can induce a significant time delay, and must be accounted when computing the trajectory. This paper utilizes the generalized Hopf be accounted for when computing the trajectory. This paper utilizes the generalized Hopf measurements. Thewhen sensing and optimization can scaling induce a significant time delay, and must be accounted for computing the trajectory. This paper utilizes the generalized Hopf formula, which avoids the exponential dimensional typical of other numerical methods formula, which avoids the exponential dimensional scaling typical of other numerical methods be accounted for whentheto computing thedimensional trajectory. This paper utilizes the generalized Hopf formula, which avoids exponential scaling typical of other numerical methods for computing solutions the Hamilton–Jacobi equation. We present as an example a robot for computing the Hamilton–Jacobi equation. We present as an examplemethods a robot formula, which solutions avoids theto exponential dimensional scaling typical of other numerical for computing solutions to the Hamilton–Jacobi equation. We present as an example a robot that incrementally predicts a communication channel from measurements as it travels. As part that incrementally predicts a communication channel from measurements travels. As part for computing solutions to the Hamilton–Jacobi equation. Weofpresent as as anit example a robot that incrementally predicts a communication channel from measurements as it travels. As part of this example, we introduce a seemingly new generalization a non-parametric formulation of this example, we introduce a seemingly new generalization of a non-parametric formulation that incrementally predicts a communication channel from measurements as it travels. As part this example, we introduce a seemingly new generalization of a non-parametric formulation of robotic communication channel estimation. New communication measurements are used of robotic communication channel estimation. New communication measurements formulation are used to to of this example, we introduce a seemingly new generalization of a non-parametric of robotic communication channel estimation. New communication measurements are used to improve the channel estimate and online trajectory optimization with time-delay compensation improve the channel estimate and online trajectory optimization with time-delay compensation of roboticthe communication channel estimation. Newoptimization communication measurements are used to improve channel estimate and online trajectory with time-delay compensation is performed. is performed. improve the channel estimate and online trajectory optimization with time-delay compensation is performed. © performed. 2019, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved. is Keywords: Time Time Delay Delay Systems, Systems, Hamilton–Jacobi Hamilton–Jacobi Equation, Equation, Generalized Generalized Hopf Hopf Formula, Formula, Keywords: Keywords: Time Delay Systems, Hamilton–Jacobi Equation, Generalized Hopf Formula, Viscosity Solution, Optimal Control, Communication Seeking Robotics Viscosity Solution, Optimal Control, Communication Seeking Robotics Keywords:Solution, Time Delay Systems, Hamilton–Jacobi Equation, Viscosity Optimal Control, Communication SeekingGeneralized Robotics Hopf Formula, Viscosity Solution, Optimal Control, Communication Seeking Robotics 1. INTRODUCTION INTRODUCTION Fast online online computation computation is is critical critical for for these these approaches, approaches, 1. Fast 1. INTRODUCTION Fast online computation is becomes critical for these approaches, and delay from computation more apparent as and delay from computation becomes more apparent as the the 1. INTRODUCTION Fast online computation iscomplexity critical for these approaches, and delay from computation becomes more apparent as the dimensionality and model of the optimization Time delays are common presence in real-world instantiadimensionality and model complexity of the optimization Time delays are presence in instantia- and delay from computation becomes more apparent as the dimensionality and model complexity of the optimization increases. In to delays from compuTime delays are common common in real-world real-world tions of dynamic systems. This issues stability In addition addition to time time delays induced induced from computions of dynamic systems. presence This causes causes issues with withinstantiastability increases. dimensionality and model complexity of the optimization Time delays are common presence in real-world instantiaincreases. In addition to time delays induced from computation, many many robotics robotics problems problems have have limited limited information information tions of dynamicwhen systems. This causes issuesreal-time with stability and robustness robustness attempting to design con- tation, and when attempting to conincreases. In addition toproblems time delays induced from computionsrobustness of dynamic systems. This causes issuesreal-time with stability tation, many robotics limited information of the operating environment aahave priori. The robot and when attempting to design design real-time control for these systems, and the challenges of design and of the operating environment priori. The robot must must trol for these systems, and the challenges of design and tation, many robotics problems have limited information and robustness when attempting to design real-time conthe operating environment a priori. The necessitates robot must perform sensing in real-time and as aa result, trol for these systems, and the challenges of design and of analysis of systems with time delays have been well studperform sensing in real-time and as result, necessitates analysis of with time delays have been well studthe re-compuation operating a priori. The necessitates robot must trol see for these systems, and challenges of design and of perform sensing inenvironment real-time and as atrajectory result, online of with analysis of systems systems withHistorically, timethe delays have have been well many studied; (2003). there been online re-compuation of the the optimal optimal trajectory with this this ied; see Richard Richard (2003). Historically, there have been many perform sensing in real-time and as a result, necessitates analysis of systems with time delays have been well studonline re-compuation of the optimal trajectory with this new information. information. As As methods methods for for sensing sensing in in robotics robotics have have ied; see Richard (2003). Historically, thereinhave beentheory, many new attempts to compensate compensate for time time delays delays control attempts to for in control theory, online re-compuation of the optimal trajectory with ied; see Richard (2003). Historically, there have been many new information. As methods for sensing in robotics have become more sophisticated, the time delay induced beattempts to compensate time delays in control theory, become more sophisticated, the time delay induced this such as as the the well-known for Padé approximation in classical classical besuch well-known Padé approximation in new information. As methodsexample for time sensing in robotics have attempts to compensate for time delays in control theory, become more sophisticated, the delay induced becomes larger. An illustrative appeared in Usman such as the well-known Padé approximation in classical linear control control theory theory (Franklin (Franklin et et al., al., 2006, 2006, Sec. Sec. 5.7.3), 5.7.3), comes larger. An illustrative example appeared in Usman linear become more An sophisticated, the time delay induced besuch ascontrol the well-known Padédelay approximation in classical comes larger. illustrative example appeared in Usman et al. (2016), where a complicated non-parametric model linear theory (Franklin et al., 2006, Sec. 5.7.3), which approximates a pure as a rational transfer et al. (2016), where a complicated non-parametric model which approximates a(Franklin pure delay as a 2006, rational transfer comes larger. An illustrative example appeared in Usman linear control theory et al., Sec. 5.7.3), et al. (2016), where a complicated non-parametric model was used to estimate the quality of a communication which approximates a pure delay as a rational transfer function or state state transformations such as presented in was used to estimate the quality of a communication function or such as in al.used (2016), where a complicated non-parametric model which and approximates a pureIndelay as a setting, rational transfer was estimate theof of a communication signal from measurements aa transmitter with function orPearson state transformations transformations such as presented presented in et Kwon (1980). a modern real-time signal fromto measurements ofquality transmitter with a a known known Kwon and Pearson (1980). In a modern setting, real-time was used to estimate the quality of a communication function orPearson state(RTOC) transformations such as presented in signal from of a transmitter a known location. In that scheme developed to Kwon and (1980).[Ross In a modern setting, real-time optimal control [Ross et al. al. (2006)] and model location. In measurements that work, work, aa RTOC RTOC scheme was waswith developed to optimal control (RTOC) et (2006)] and model signal from measurements ofand a transmitter with a known Kwon and Pearson (1980).[Camacho In a modern setting, real-time location. In that work, a RTOC scheme was developed to minimize combined motion communication energy of optimal control (RTOC) [Ross et al. (2006)] and model predictive control (MPC) and Alba (2013)] have minimize combined motion and communication energy of predictive control (MPC) [Camacho and Alba (2013)] have location. In that work, a RTOC scheme was developed to optimal control (RTOC) [Ross et al. (2006)] and model minimize combined motion and communication energy of the signal. The RTOC was re-computed every 10 seconds predictive control (MPC) [Camacho andand Albatrajectory (2013)] have gained large scale acceptance in control opthe signal. The RTOC was re-computed every 10 seconds gained large scale acceptance in control and trajectory opminimize combined motion and communication energy of predictive control (MPC) [Camacho and Alba (2013)] have the signal. The RTOC was re-computed every 10 seconds and it was noted that the combined sensing and trajectory gained large scale acceptance in control trajectory op- and it was noted that the combined sensing and trajectory timization problems. These seek seek to find find aaand control sequence timization problems. These to control sequence the signal. The RTOC was re-computed every 10 seconds gained scale acceptance in control trajectory op- and it was noted the combined optimization was around 22 seconds, or of the timization problems. These seek tominimizes find aand control sequence and thelarge resulting trajectory that minimizes a pre-defined pre-defined optimization wasthat around seconds,sensing or 20% 20%and of trajectory the entire entire and the resulting trajectory that a and it wasinterval. noted that the combined sensing and trajectory timization problems. These seek findcost a control sequence optimization was around 2 seconds, or 20% of the entire compute A delay this large can have drastic conand the resulting trajectory thattominimizes a pre-defined cost functional. RTOC optimizes the functional dicompute interval. A delay this large can have drastic concost functional. RTOC optimizes the cost functional dioptimization wasaccounted around seconds, or 20% of the entire and the resulting trajectory that minimizes a pre-defined compute interval. A delay 2this large can have drastic consequences if not for. This was not addressed in cost functional. RTOC optimizes the time cost horizon functional directly, while MPC computes on a finite and is sequences if not accounted for. This was not addressed in rectly, while MPC computes on a finite time horizon and is compute interval. A delay this large can have drastic concost functional. RTOC optimizes the cost functional disequences if not accounted for. This was not addressed in Usman et al. (2016). rectly, while MPC computes on a finite time horizon and is frequently referred to as receding horizon control. Because Usman et al. (2016). frequently referred to as receding horizon control. Because if not accounted for. This was not addressed in rectly, while MPC computes onand a finite time horizon andreis sequences Usman et al. (2016). frequently referred to as receding horizon control. Because of this, MPC is sub-optimal necessitates online of this, MPC is sub-optimal and necessitates online re- As (2008), et in al. Lu (2016). As noted noted in Lu (2008), attempts attempts to to directly directly account account for for frequently referred tocurrent as receding control. Because of this, MPC sub-optimal andhorizon necessitates online re- Usman computation ofisthe the state frequently, while RTOC As noted in Lu (2008), attempts to directly account for time delays in MPC formulations are limited, with the computation of current state frequently, while RTOC time delays in MPC formulations are limited, with the of this, MPC is sub-optimal and necessitates online recomputation current state frequently, while RTOC only needs needs to toofbe bethe re-computed online if the the system system is perper- As noted in in Lu (2008), attempts toare directly account for time delays MPC formulations limited, with the most notable proposed in Kwon et al. (2004) for linear only re-computed online if is notableinproposed in Kwon etare al. limited, (2004) for linear computation ofbethe current optimal state frequently, while RTOC only needs tothe re-computed online if the system is per- most turbed from computed trajectory. time delays MPC formulations with the most notable proposed in Kwon et al. (2004) for linear systems. However However this this work work is is restricted restricted to to aa specific specific LQR LQR turbed from optimal only needs be computed re-computed onlinetrajectory. if the system is per- systems. turbed fromtothe the computed optimal trajectory. most notable proposed in Kwon etfor al.control (2004) for linear systems. However this work is restricted to a specific LQR cost functional and does not account saturation. cost functional and does not account for control saturation. turbed from the computed optimal trajectory. systems. However this work is restricted toapproach a specific LQR  cost functional and does not account for control saturation. We present in this paper a more general based research was supported by the Office of Naval Research under  This We present in and this does paper a account more general approach based This research was supported by the Office of Naval Research under cost functional not for control saturation.  Grant We present in this paper a more general approach based ThisN00014-18-WX01382. research was supported by the Office of Naval Research under Grant  ThisN00014-18-WX01382. We present in this paper a more general approach based research was supported by the Office of Naval Research under Grant N00014-18-WX01382.

Grant N00014-18-WX01382. 2405-8963 © 2019 2019, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved. Copyright © Copyright © under 2019 IFAC IFAC Peer review responsibility of International Federation of Automatic Control. Copyright © 2019 IFAC 10.1016/j.ifacol.2019.08.087 Copyright © 2019 IFAC

2019 IFAC IAV 302 Gdansk, Poland, July 3-5, 2019

Matthew R. Kirchner / IFAC PapersOnLine 52-8 (2019) 301–306

on Hamilton–Jacobi theory, which provides a natural way to deal with pure time delay in the control. Generally, solutions to optimal control and trajectory problems can be found by solving a Hamilton–Jacobi (HJ) partial differential equation (PDE) as it establishes a sufficient condition for optimality [Osmolovskii (1998)]. Traditionally, numerical solutions to HJ equations require a dense, discrete grid of the solution space as in Osher and Fedkiw (2006); Mitchell (2008). Computing the elements of this grid scales poorly with dimension and has limited use for problems with dimension greater than four. The exponential dimensional scaling in optimization is sometimes referred to as the “curse of dimensionality” [Bellman (1957)]. A new result in Darbon and Osher (2016) discovered numerical solutions based on the Hopf formula [Hopf (1965)] that do not require a grid and can be used to efficiently compute solutions to a certain class of Hamilton–Jacobi PDEs. However, that only applied to systems with timeindependent Hamiltonians of the form x˙ = f (u (t)), and has limited use for general linear control problems. Recently, the classes of systems was expanded upon and generalizations of the Hopf formula were used to solve optimal linear control problems in high-dimensions in Kirchner et al. (2018a) and differential games as applied to a multivehicle, collaborative pursuit-evasion problem in Kirchner et al. (2018b). The HJ formulation of the trajectory problem allows a simple and direct treatment of these computationally-induced time delays and there is no need to resort to approximation schemes such as Padé. The main contribution of this paper is to generalize the Hopf formula to directly account for time delays induced by online computation of the optimal control and trajectory. Motivated by robotic vehicle path planning problems where the communication is sensed and estimated online, we present as an additional contribution a seemingly new non-parametric model to estimate a communication channel where the location of the transmitter is unknown a priori. The rest of the paper is organized as follows. Section 2 reviews HJ theory as it relates to linear optimal control and presents a level set method, based on the generalized Hopf formula, for fast computation of optimal trajectories with known time delay. Section 4 presents an online trajectory problem where a robot incrementally predicts a wireless communication channel from measurements as it travels. As part of this example, we derive a non-parametric model for channel estimation. Finally, we present results from simulations of the method in Section 5. 2. SOLUTIONS TO THE HAMILTON–JACOBI EQUATION WITH THE HOPF FORMULA Before proceeding we introduce some notation and assumptions. We consider linear dynamics d x (s) = Ax (s) + Bα (s) , s ∈ [0, t] , (1) ds where x ∈ Rn is the system state and α (s) ∈ A ⊂ Rm is the control input, constrained to the convex admissible control set A. We let γ (s; x, α (·)) ∈ Rn denote a state trajectory that evolves in time, s ∈ [0, t], with control 302

input sequence α (·) ∈ A according to (1) starting from initial state x at s = 0. The trajectory γ is a solution of (1) in that it satisfies (1) almost everywhere: d γ (s; x, α (·)) = Aγ (s; x, α (·)) + Bα (·) , ds γ (0; x, α (·)) = x. We construct a cost functional for γ (s; x, α (·)), given terminal time t as  t R (t, x, α (·)) = C (s, x, α (s)) ds + J (γ (t; x, α (·))) , 0

(2) where the function C : (0, +∞) × Rn × Rm → R ∪ {+∞} is the running cost and represents the rate that cost is accrued over time. The value function v : Rn × (0, +∞) → R is defined as the minimum cost, R, among all admissible controls for a given state x with v (x, t) = inf R (t, x, α (·)) . (3) α(·)∈A

The value function in (3) satisfies the dynamic programming principle [Bryson and Ho (1975); Evans (2010)] and also satisfies the following initial value Hamilton–Jacobi (HJ) equation by defining the function ϕ : Rn × R → R as ϕ (x, s) = v (x, t − s), with ϕ being the viscosity solution of   ∂ϕ (x, s) + H (s, x, ∇x ϕ (x, s)) = 0, (4) ∂s ϕ (x, 0) = J (x) ,

where the Hamiltonian H : (0, +∞)×Rn ×Rn → R∪{+∞} is defined by H (s, x, p) = sup {−f (s, x, α) , p − C (s, x, α)} . (5) α∈Rm

The variable p in (5) denotes the costate, which in the HJ equation (4) is associated with the gradient of the value function. We denote by λ (s; x, α (·)) the costate trajectory that satisfies almost everywhere: d  λ (s; x, α (·)) = ∇x f (γ (s; x, α (·)) , s) λ (s; x, α (·)) ds λ (t; x, α (·)) = ∇x J (γ (t; x, α (·) , s)) , for ∀s ∈ [0, t] with initial costate denoted by λ (0; x, α (·)) = p. With a slight abuse of notation, we will hereafter use λ (s) to denote λ (s; x, α (·)), since the initial state and control sequence can be inferred through context with the corresponding state trajectory, γ (s; x, α (·)). 2.1 Viscosity Solutions with the Hopf Formula Consider simplified system dynamics represented as d x (s) = f (α (s)) . (6) ds The associated HJ equation no longer depends on state and is given as   ∂ϕ (x, s) + H (∇x ϕ (x, s)) = 0, (7) ∂s ϕ (x, 0) = J (x) .

When J (x) is convex and continuous in x, and H (p) is continuous in p, it was shown in Evans (2010) that an exact, point-wise viscosity solution to (7) can be found using the Hopf formula [Hopf (1965)] ϕ (x, t) = − minn {J  (p) + tH (p) − x, p} , (8) p∈R

2019 IFAC IAV Gdansk, Poland, July 3-5, 2019

Matthew R. Kirchner / IFAC PapersOnLine 52-8 (2019) 301–306

with the Fenchel–Legendre transform, denoted J  : Rn → R ∪ {+∞}, defined for a convex, proper, lower semicontinuous function J : Rn → R ∪ {+∞} [Hiriart-Urruty and Lemaréchal (2012)] as J  (p) = sup {p, x − J (x)} . (9) x∈Rn

The transform defined in (9) is also referred to in literature as the convex conjugate. Proceeding similar to Kirchner et al. (2018b), we can generalize the Hopf formula to (1) by making a change of variables (10) z (s) = e−sA x (s) , which results in the following system d z (s) = f (s, α (s)) = e−sA Bα (s) . (11) ds The terminal cost function is now defined in z with   ϕ (z, 0) = J etA z . (12) For clarity in the sections to follow, we use the notation  to refer to the Hamiltonian for systems defined by (11) H and H for systems defined by (17). Notice that the system (11) does not depend on state but is now time-varying. It was shown in (Kurzhanski and Varaiya, 2014, Section 5.3.2, p. 215) that the Hopf formula can be generalized for a time-dependent Hamiltonian of the system in (11) with     ϕ (x, t) = − minn J  e−tA p (13) p∈R

+



t

0



 (s, p) ds − x, p , H

with Hamiltonian defined by  −sA    (s, p) = sup −e Bα, p − C (s, α) . H

303

as a ball with arbitrarily small radius. As noted in Kirchner et al. (2018a) we solve for the minimum time to reach the set Ω by constructing a newton iteration, starting from an initial guess, t0 , with ϕ (x, ti ) , (16) ti+1 = ti − ∂ϕ ∂t (x, ti ) where ϕ (x, ti ) is the solution to (13) at time ti . Notice the value function mush satisfy the HJ equation ∂ϕ (x, ti ) = −H (∇x ϕ (x, ti ) , x) , ∂t where ∇x ϕ (x, ti ) is the argument of the minimizer in (13), which we will denote as p∗ . No change of variable is needed for the Newton update and we have  H (p∗ , x) = −x A p∗ + −B  p∗ ∗ . We iterate (16) until convergence at the optimal time to reach, which we denote as t∗ . The optimal control can be found directly from the necessary conditions of optimality established by Pontryagin’s principal [Pontryagin (2018)] by noting the optimal trajectory, denoted as γ ∗ , must satisfy d ∗ γ (s; x, α∗ (·)) = −∇p H (λ∗ (s)) ds = Aγ (s; x, α∗ (·))   + B∇p −B  λ∗ (s) , ∗

where λ∗ is the optimal costate trajectory and is given by 

λ∗ (s) = e−sA p∗ . This implies our optimal control is      α∗ (s) = ∇p −B  e−sA p∗  , ∗

for all time s ∈ [0, t∗ ], provided the gradient exists.

α∈Rm

For the remainder of the paper, we consider a time-optimal  is defined as formulation in which H  −sA    H (s, p) = sup −e Bα, p − IA (α) , (14) α∈Rm

where IA : R → R ∪ {+∞} is the indicator function for the set A and is defined by  0 if α ∈ A IA (α) = +∞ otherwise. Suppose A is a closed convex set such that 0 ∈ int A, where  int A denotes the interior of the set A. Then (IA ) defines a norm, which we denote with (·)A∗ , which is the dual norm [Hiriart-Urruty and Lemaréchal (2012)] to (·)A . From this we can write (14) in general as      (s, p) =  H (15) −B  e−sA p . m

A∗

2.2 Time-Optimal Control to a Goal Set

Consider a goal set Ω ⊂ Rn and a task to determine the control that drives the system into Ω in minimal time. We represent the set Ω as an implicit surface with cost function J : Rn → R such that  J (x) < 0 for any x ∈ int Ω, J (x) > 0 for any x ∈ (Rn \ Ω) ,  J (x) = 0 for any x ∈ (Ω \ int Ω) , where int Ω denotes the interior of Ω. Note that if the goal is a point in Rn , then we can represent this by choosing Ω 303

Note that in the above formulation, we compute a viscosity solution to (7) without constructing a discrete grid and this method can provide a numerical solution that is efficient to compute even when the state space is highdimensional. Additionally, no derivative approximations are needed with Hopf formula-based methods, and this eliminates the numeric dissipation introduced with the Lax–Friedrichs scheme that is necessary to maintain stability in grid-based methods. 3. ONLINE TRAJECTORY OPTIMIZATION WITH DELAY

We’ll be considering a RTOC framework for the

rest of this work with a variable time horizon s ∈ 0, tk , where each tk is the optimal time-to-go as computed by Section 2.2. This can easily be applied to MPC problems by using a fixed, finite time horizon s ∈ [0, t]. The superscript k ∈ N is used to denote the computational update of each relevant quantity, with k = 0 being the first optimization. With this notation, xk denotes our initial state when we  initiate the computation, and likewise we denote by γ ∗ s; xk , αk (·) as the k-th computed optimal trajectory. We re-compute the control online after δk seconds of traveling on the  trajectory γ ∗ s; xk , αk (·) , which gives the new initial state for the next optimization as   xk+1 = γ ∗ δ k ; xk , αk (·) .

Using xk+1 as the new initial condition, we proceed to use the methods of Section 2.2 to compute αk+1 (s). Note that

2019 IFAC IAV 304 Gdansk, Poland, July 3-5, 2019

Matthew R. Kirchner / IFAC PapersOnLine 52-8 (2019) 301–306

the time between updates, δ k need not be uniform. Now consider the following linear state space model with time delay   d x (s) = Ax (s) + Buk s − τ k , (17) ds m with delayed control input uk ∈ A ⊂ R , and τ k is the step-specific time delay. We can represent the delay dynamics in the same form as (1) by defining αk (·), for k = 0, as   k u s − τk s≥τ   αk (s) = s < τ. αk−1 s + δ k−1

We only consider causal systems and therefore assume each τ k ≥ 0 and   0 u s − τ0 s≥τ 0 α (s) = 0 s < τ. The Hamiltonian becomes      s≥τ −B  e−sA p ∗  , H (s, p) = A   −sA k−1 k−1 s+δ s<τ Bα −p e

(18)

4.1 Channel Estimation with Unknown Emitter Location It was proposed in Malmirchegini and Mostofi (2012) to model the CNR (in dB) at a particular spatial location q ∈ R2 from  observed measurements as (20) ΥdB (q) = Γ (q; qb ) + ∆ (q) , where Γ (q; qb ) is a parametric model of path loss, relating CNR to spatial distance to the (known) transmitter location, qb and is given as Γ (q; qb ) = cP L − 10nP L log (q − qb ) , where cP L and nP L are constant parameters associated with the transmitter. The quantity ∆ (q) represents the deviation of the true CNR from Γ and is modeled nonparametrically as a Gaussian process [Rasmussen (2004)] with ∆ (qi ) ∼ GP (0, k∆ (qi , qi )) , for each qi , where k∆ is referred to as the covariance function and for modeling communication channels was suggested in Malmirchegini and Mostofi (2012) to be k∆ (qi , qj ) = ξ 2 e−

qi −qj  η

+ σρ2 ,

for the Hopf formula in (13) , with p denoting the optimal for any i, j ∈ {1, . . . , } channel measurements. The idea inital costate for the update. Likewise we solve for the of constructing a model as a composition of a known optimal time-to-reach of the delayed system using the parametric model and a non-parametric deviation is not methods of Section (2.2). The Hamiltonian for the Newton uncommon and is generally formulated as Gaussian proiteration becomes cess regression with explicit basis functions (Rasmussen,      k  k    −x A p + −B p A∗ s ≥ τ 2004, Chapter 2.7). We present the model while omitting     H pk , x = . a detailed derivation as this is outside the scope of this  s < τ work. For more information, the reader is encouraged to −x A pk − pk Bαk−1 s + δ k−1 (19) review Rasmussen (2004) for a thorough and complete This implies our control becomes review of Gaussian process regression and Malmirchegini    and Mostofi (2012) for the derivation of the proposed   −sA k  ∇p −B e p  s≥τ covariance functions for communication models. αk (s) = ∗   αk−1 s + δ k−1 s < τ. The model (20) assumes a priori knowledge of the location of the transmitter, which for many robotic path planning problems may not be available at run-time. We propose 4. AN EXAMPLE OF ONLINE SENSING AND to generalize (20) by considering the case where the TRAJECTORY OPTIMIZATION transmitting location is an unknown random variable We present as an example a robotic vehicle that uses a distributed with probability density function g (qb ) and radio transmitter to communicate data to a remote base model Υ with a single Gaussian process with ΥdB (q) ∼ GP (0, k (q, qj )) , station. The goal is to plan a trajectory to deliver the robot to a location with the best possible communication where performance, in minimum time. The location of base stak (qi , qj ) = k∆ (qi , qj ) + kΓ (qi , qj ) , tion is unknown a priori and hence the communication link with k (q , q ) denoting the covariance function associated Γ i j performance needs to be spatially estimated. Initially, the to the path loss of the signal. The last equation follows robot will have only a sparse number of samples of the from the fact the the sum of two kernels, i.e. (20), results communication link, or perhaps none at all, and needs to in the sum of the respective covariance functions. To determine the channel-to-noise ratio (CNR). As the robot determine k (q , q ), we follow Neal (1996) to get Γ i j moves, more samples are collected and the estimate is (q , q k Γ i j ) = Eqb [Γ (qi ; qb ) · Γ (qj ; qb )] improved. Each time the estimate is improved, a new tra jectory needs to be generated, since the original trajectory = g (qb ) Γ (qi ; qb ) Γ (qj ; qb ) dqb . is no longer optimal under the previous estimate. R2 k

This is similar to the problem addressed in Usman et al. (2016), but that work attempted to minimize total power, was not time-optimal, and did not account for time delays. We choose this example to demonstrate the method presented since communication link performance needs to be sensed online and the computation time for the channel estimation is non-trival. In the case of Usman et al. (2016), 2 seconds of computation was required for every 10 second computation cycle.

304



Letting Kq = [k (q, q1 ) , k (q, q2 ) , . . . , k (q, q )] and K a matrix with entries defined by Kij = k (qi , qj ), the posterior estimate of the channel at an unknown location q can be found with mean   ¯ dB (q) = K  K + σ 2 I −1 y, Υ q



ρ

with y = [y1 , y2 , . . . , y ] a vector of  measurements, and I the identity matrix of the approriate size. The variance of the channel estimate is given by

2019 IFAC IAV Gdansk, Poland, July 3-5, 2019

Matthew R. Kirchner / IFAC PapersOnLine 52-8 (2019) 301–306

(a) Channel estimate with 5 measurements along shown path.

(b) Channel estimate with 30 measurements along shown path.

305

(c) Channel estimate with 75 measurements along shown path.

Fig. 1. The channel signal estimates from measurements as a vehicle traverses a known path. The vehicle path is shown in red and the ground-truth transmitter location shown as a black diamond. Best viewed in color.  −1 Σ (q) = k (q, q) − Kq  K + σρ2 I Kq . An example is shown in Figure 1, where a vehicle collects measurements and uses the above method to estimate the unkown communucation channel. 5. RESULTS The methods presented above was implemented in MATLAB R2018b on a laptop equipped with an Intel Core i7-7500 CPU running at 2.70 GHz. We used as values for the parameters for communication channels what was presented in Usman et al. (2016) with σρ = 1.64, ξ = 3.20 cP L = −41.34, η = 3.09, and nP L = 3.86. While the time to compute could vary as more measurements are gathered, we force a fixed value of time delay for consistency. We also follow Usman et al. (2016) for these values with τ k = 2 seconds and δ k = 10 seconds for all k. The newton update of (16) and (19) is stopped when ϕ(x, ti ) <= 10−3 . 5.1 Planar Motion Example 

We choose for dynamics (17) with state x ∈ [q, q] ˙ , where q ∈ R2 is spatial position of a robot and q˙ ∈ R2 is the velocity and     0010 00 0 0 0 1 0 0 A= ,B= . 0 0 0 0 1 0 0000 01 2 The control u ∈ R is constrained to lie in the set u2 ≤ 1. Since the 2-norm is self-dual, the Hamiltonian (18) for this example is     −sA  e p s≥τ  −B  (s, p) = , H 2   −sA k−1 k−1 s+δ s<τ Bα −p e and the Hamitonian (19) for the Newton update becomes     H pk , x = −x A pk + Q pk , with   −B  pk   k s≥τ   k−1 2 Q p = . k−1  k s+δ p s<τ − Bα The control is found  as  −sA k  p   −B e  αk (s) = −B  e−sA pk     αk−1 s + δ k−1 2

s≥τ

.

s<τ 305



The initial position of the robot is q0 = [45, 30] with  initial velocity q˙0 = [−10, 0] , and the transmitter is  located at qb = [25, −25] . We assume no prior channel measurements before moving, unlike in Usman et al. (2016), and use for the prior of transmitter location a uniform distribution of the operating area, with qb ∼ U ([−50, 50] × [−50, 50]). The vehicle collects an additional sample when the vehicle has displaced η meters, the scale length of k∆ kernel function. The vehicle guides to a goal set that is an ellipsoidal neighborhood around the peak of the estimated CNR by setting     

˜k , W −1 x − x ˜k ≤ 1 , Ωk = x : x − x   where x ˜k = q˜bk , 0, 0 and q˜bk is the peak of the estimated CNR at the k-ith iteration. The matrix W is positive definite and defines the shape of the neighborhood. For this example, we use   1 0 0 0 0  0 1 0 W = , 2 0 0 Vmax 0  2 0 0 0 Vmax with Vmax being the maximal allowable velocity at the goal. We consider a vehicle that comes to close to rest at the goal, so set Vmax = 0.1. This implies the follow initial value function     ˜k W −1 x − x ˜k − 1. J k (x) = x − x Figure 2 shows the estimation and associated optimal trajectory for 3 optimization cycles. The vehicle starts with only a single sample of channel at the vehicle’s initial location. With the transmitter’s location unknown and assumed equally likely at any spot in the operating area, the estimate is biased towards the center of the area. The estimate and optimal trajectory to the peak in the estimated channel is shown in Fig. 2a. After traveling along this path for 10 seconds and acquiring more channel samples, a new estimate is shown in Fig 2b. The path was in an area of low received signal strength and the new estimate reduces the estimated channel in this region, as well as shifting the estimate of the peak closer to the ground-truth location. A time delay of 2 seconds was induced from the channel estimate and is compensated in the online update to the optimal trajectory, shown in green. Figure 2c shows another cycle of more measurements and an improved estimation of the channel with corresponding optimal path.

2019 IFAC IAV Gdansk, Poland, July 3-5, 2019 306

Matthew R. Kirchner / IFAC PapersOnLine 52-8 (2019) 301–306

(a) Channel estimate and first optimal trajectory from a single measurement.

(b) After traveling along the first trajectory of δ = 10 seconds, shown in solid black, the vehicle collects the measurements marked with a ’*’. The channel estimate is updated and a new optimal trajectory is computed as shown in green.

(c) The channel estimate is updated with additional measurements and a new optimal trajectory is shown in blue.

Fig. 2. Three optimization cycles of trajectories and their corresponding channel estimates and optimal trajectories. Best viewed in color. ACKNOWLEDGEMENTS The author would like to thank Arjun Muralidharan at the University of California, Santa Barbara (now a software engineer in the Network Infrastructure Group at Google, Inc.) for the many helpful and enlightening discussions on communication models used for robotic platforms. REFERENCES Bellman, R.E. (1957). Dynamic Programming, volume 1. Princeton University Press. Bryson, A.R. and Ho, Y.C. (1975). Applied Optimal Control: Optimization, Estimation and Control. CRC Press. Camacho, E.F. and Alba, C.B. (2013). Model Predictive Control. Springer. Darbon, J. and Osher, S. (2016). Algorithms for overcoming the curse of dimensionality for certain HamiltonJacobi equations arising in control theory and elsewhere. Research in the Mathematical Sciences, 3(1), 19. Evans, L.C. (2010). Partial Differential Equations. American Mathematical Society, Providence, R.I. Franklin, G.F., Powell, J.D., and Emami-Naeini, A. (2006). Feedback Control of Dynamic Systems. Prentice Hall, 5 edition. Hiriart-Urruty, J.B. and Lemaréchal, C. (2012). Fundamentals of Convex Analysis. Springer Science & Business Media. Hopf, E. (1965). Generalized solutions of non-linear equations of first order. Journal of Mathematics and Mechanics, 14, 951–973. Kirchner, M.R., Hewer, G., Darbon, J., and Osher, S. (2018a). A primal-dual method for optimal control and trajectory generation in high-dimensional systems. In IEEE Conference on Control Technology and Applications, 1575–1582. Kirchner, M.R., Mar, R., Hewer, G., Darbon, J., Osher, S., and Chow, Y. (2018b). Time-optimal collaborative guidance using the generalized Hopf formula. IEEE Control Systems Letters, 2(2), 201–206. Kurzhanski, A.B. and Varaiya, P. (2014). Dynamics and Control of Trajectory Tubes: Theory and Computation, volume 85. Springer. 306

Kwon, W. and Pearson, A. (1980). Feedback stabilization of linear systems with delayed control. IEEE Transactions on Automatic control, 25(2), 266–269. Kwon, W.H., Lee, Y., and Han, S.H. (2004). General receding horizon control for linear time-delay systems. Automatica, 40(9), 1603–1611. Lu, M.C. (2008). Delay Identification and Model Predictive Control of Time Delayed Systems. Ph.D. thesis, McGill University. Malmirchegini, M. and Mostofi, Y. (2012). On the spatial predictability of communication channels. IEEE Transactions on Wireless Communications, 11(3), 964–978. Mitchell, I.M. (2008). The flexible, extensible and efficient toolbox of level set methods. Journal of Scientific Computing, 35(2), 300–329. Neal, R.M. (1996). Bayesian Learning for Neural Networks, volume 118 of Lecture Notes in Statistics. Springer, New York. Osher, S. and Fedkiw, R. (2006). Level Set Methods and Dynamic Implicit Surfaces, volume 153. Springer Science & Business Media. Osmolovskii, N.P. (1998). Calculus of Variations and Optimal Control, volume 180. American Mathematical Society. Pontryagin, L.S. (2018). Mathematical Theory of Optimal Processes. Routledge. Rasmussen, C.E. (2004). Gaussian Processes in Machine Learning. Springer. Richard, J.P. (2003). Time-delay systems: an overview of some recent advances and open problems. Automatica, 39(10), 1667–1694. Ross, I.M., Gong, Q., Fahroo, F., and Kang, W. (2006). Practical stabilization through real-time optimal control. In American Control Conference, 2006, 6–pp. IEEE. Usman, A., Cai, H., Mostofi, Y., and Wardi, Y. (2016). Motion and communication co-optimization with path planning and online channel prediction. In American Control Conference (ACC), 2016, 7079–7084. IEEE.