Extraction and Deployment of Human Guidance Policies

Andrew Feit and Bérénice Mettler

Department of Aerospace Engineering and Mechanics, University of Minnesota, Minneapolis, MN 55455 (e-mail: [email protected] and [email protected])

1st IFAC Conference on Cyber-Physical & Human-Systems, December 7-9, 2016, Florianopolis, Brazil. IFAC-PapersOnLine 49-32 (2016) 095–100. © 2016, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved. doi: 10.1016/j.ifacol.2016.12.196

Abstract: Robust and adaptive human motion performance depends on learning, planning, and deploying primitive elements of behavior. Previous work has shown how human motion behavior can be partitioned at subgoal points, and primitive elements extracted as trajectory segments between subgoals. An aggregate set of trajectory segments is described by a spatial cost function and guidance policy. In this paper, Gaussian process regression is used to approximate cost and policy functions extracted from human-generated trajectories. Patterns are identified in the policy function to further decompose guidance behavior into a sequence of motion primitives. A maneuver automaton model is introduced, simplifying the guidance task over a larger spatial domain. The maneuver automaton and approximated policy functions are then used to generate new trajectories, replicating the original human behavior examples.

Keywords: Motion guidance, Perception, Planning, Autonomous guidance, Transfer learning, Motion automaton

1. INTRODUCTION

1.1 Motivation

Humans and animals routinely generate a wide range of adaptive and robust motion behavior in cluttered and dynamic environments. Their behavior far exceeds the capabilities of autonomous systems, despite also having sensory and computational limitations. Understanding the processes that allow agents to learn and generate this behavior would be a significant advancement for autonomous vehicle performance and human-machine system interaction.

From a control theory perspective, guidance behavior can be defined by a value function and guidance policy. Kong and Mettler (2013) and Feit and Mettler (2015) propose that human guidance behavior is also described by value and policy functions. Lee et al. (1976) and Mettler et al. (2014) suggest that humans generate motion by deploying sensory-motor interaction patterns. Interaction patterns extend the motion-primitive automaton concept introduced by Frazzoli et al. (2005) by introducing guidance primitives. Guidance primitives are discrete agent-environment interaction elements. An agent can deploy a series of guidance primitives chosen from a library of learned elements. Each element optimizes closed-loop perception and action behavior performance.

The present work focuses on identifying, representing, and deploying human motion guidance policies. First, we approximate cost and guidance policy functions that describe an ensemble of human example behavior. Second, we use the identified guidance policy to generate new unconstrained and constrained trajectories to validate the use of subgoals and motion primitives as elements of human behavior. Modeling human motion policies that optimize a value function is a necessary precursor to modeling perception-action guidance primitives in future work.

2. RELATED WORK

2.1 Invariants in Human Behavior

Previous work investigates the guidance process by identifying patterns in human behavior. Simon (1972) introduced satisficing, which encompasses approaches that simplify or reduce the problem domain in exchange for reduced solution optimality. In addition, Simon (1990) observes patterns in agent-environment interaction that simplify the guidance task. Specifically, Simon notes that elements of human motion behavior can be described by the maintenance of invariants between perceptual and kinematic quantities. Finally, Lee et al. (1976) suggests that biological motion is composed of primitive elements, which are described by a τ parameter, the instantaneous time of a motion gap closure. Motion profiles are generated by maintaining a constant τ̇.

Human Guidance Models

Mettler and Kong (2012) integrate these concepts by introducing a hierarchical model of human guidance. This model suggests that human behavior consists of planning, perceptual guidance, and tracking. In this model, planning involves choosing a sequence of subgoal states. Perceptual guidance involves deploying primitive elements of motion and perception to reach each subgoal. Tracking involves regulating system-environment dynamics to implement a motion primitive element. Kong and Mettler (2013) and Mettler et al. (2014) show that guidance primitives consist of sensory-motor interaction patterns that achieve desired motion performance and risk. Prior work on guidance (Feit and Mettler (2016)) explored closed-loop modes between perceptions and actions provided by interaction patterns.


Motion Primitive Maneuver Automaton

Frazzoli et al. (1999) and Frazzoli et al. (2002) introduce the maneuver automaton (MA), which is a finite-state approximation of system dynamics. An MA generates complex trajectories by constructing a sequence of motion primitive elements. Motion primitives are chosen from a library of known behaviors consisting of, for example, trim and maneuver elements. Mettler et al. (2002) evaluates this approach in the application of aerobatic rotorcraft control.

To identify motion primitives in human behavior, Li and Mettler (2015) investigates the dynamic clustering of surgical motion. This provides an approach to classifying observed behavior into specific motion primitive groups. MA approaches require dynamic programming to choose an optimal sequence of motion primitives to complete a task in a constrained environment. Feit et al. (2015) proposes a human-inspired approach to constrained optimal control based on choosing a series of subgoal states. Subgoal candidates are defined by a set of necessary conditions, based on constraint geometry and a known guidance policy.
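To make the maneuver automaton concept concrete, the following is a minimal sketch of a primitive library with allowed transitions; the primitive names, state layout, and transition set are illustrative assumptions, not details from Frazzoli et al. (2005).

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical state: (x, y, heading, speed). A primitive maps a state and
# a duration to a short trajectory segment ending in a new state.
State = tuple

@dataclass
class Primitive:
    name: str                                         # e.g. "trim_straight"
    propagate: Callable[[State, float], List[State]]  # (state, dt) -> segment

class ManeuverAutomaton:
    """Finite-state approximation of the dynamics: trajectories are
    concatenations of primitives drawn from a fixed library."""
    def __init__(self, library: List[Primitive], allowed: set):
        self.library = {p.name: p for p in library}
        self.allowed = allowed        # set of (from_name, to_name) transitions

    def rollout(self, x0: State, plan: List[tuple]) -> List[State]:
        # plan is a list of (primitive_name, duration) pairs
        traj, x = [x0], x0
        names = [name for name, _ in plan]
        for prev, (name, dt) in zip([None] + names, plan):
            if prev is not None and (prev, name) not in self.allowed:
                raise ValueError(f"transition {prev} -> {name} not in automaton")
            seg = self.library[name].propagate(x, dt)
            traj.extend(seg)
            x = seg[-1]
        return traj
```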

Control Theory and Human Guidance

In control theory, a value function V(x) specifies the cost accrued by following a policy π(x) beginning at state x = [xp, xv], partitioned into configuration and velocity states xp and xv. Kong and Mettler (2011) observes that the configuration (xp) and velocity (xv) dynamics can be partitioned as ẋp = xv and ẋv = f(xv, u). In this case, the guidance policy defines the optimal velocity at a given configuration, x*v = π(xp). The value function is then defined over the configuration space, V*(xp), as the spatial cost-to-go. Kong and Mettler (2013) models the spatial cost-to-go (CTG) and guidance policy (also termed the velocity vector field, VVF) for an ensemble of third-person control tasks involving guiding a model helicopter through an obstacle field. Feit and Mettler (2015) performs a similar experiment using a first-person computer-simulated environment. Both experiments show that the ensemble of resulting trajectories can be partitioned based on subgoal properties defined by Kong and Mettler (2013). The resulting sets of trajectory segments between subgoals are modeled by consistent CTG and VVF functions, using time-to-go as the cost function.

Guidance Policy Representation

Kong and Mettler (2013) and Feit and Mettler (2015) observe that a library of human motion primitives is described by spatial CTG and VVF functions, which together form a control policy. Russell et al. (2003) summarizes a variety of approaches to learning or identifying cost and policy functions from a set of example behavior data. A utility function can be directly approximated when the value function is known along each example trajectory. A data regression method estimates V̂*(xp) ≈ V*(xp) based on a set of observed example data D = [xp, w], where w is the observed cost. This approach approximates the observed behavior.

Reinforcement learning (RL) can be used to determine an optimal guidance policy from a set of (possibly suboptimal) example data. Temporal difference (TD) learning identifies the optimal cost function if rewards are known for each state transition. This approach iterates a Bellman-like update equation to converge to the optimal cost function. Q-learning also uses an iterative equation, but determines the expected value of each action at each system state, determining both the CTG and the optimal policy.

The above approaches to utility function learning require a known reward function over state transitions, which is often not available for human behavior. Inverse reinforcement learning seeks to determine a utility function given a set of example behavior that is assumed to be optimal. Ng et al. (2000) summarizes algorithms available for this process.

Function Approximation

A regression method is required to represent the estimated CTG and guidance policy functions over the spatial domain. This method should be parameter-free and not restricted to any specific function type. In addition, the method should provide an uncertainty estimate of the function value at evaluated points, to elucidate the role of uncertainty in human decision making. Gaussian process (GP) regression is employed to meet these requirements, as described in Ebden (2008). Note that GP regression has been extended for use in a TD learning framework in Engel et al. (2003). The GPstuff toolbox for Matlab is used in this work to implement function approximation (Vanhatalo et al. (2013)).

2.2 Summary

The rest of the paper is organized as follows. Sec. 3 defines the guidance task and introduces an experimental framework used to observe human guidance behavior. Sec. 4 describes the decomposition of observed behavior into primitive elements, which are then described by a policy function and a maneuver automaton model. Sec. 5 shows the application of this model to generate new autonomous trajectories based on the observed example behavior. Finally, Sec. 6 contains concluding remarks.

3. APPROACH

Modeling guidance behavior using a policy function links human behavior with control theory. The guidance policy investigation begins with defining the guidance task.

3.1 Guidance Task

This work focuses on the perceptual guidance process within the hierarchical model of human control introduced by Mettler and Kong (2012) and Mettler et al. (2014). The perceptual guidance process generates motion between subgoal states based on perception of the environment. In a motion guidance task, an agent must determine a control sequence u(t) that drives a dynamic system, ẋ = f(x, u), from a start state x(t0) to a goal state x(tg), while satisfying constraints c(x, u) > 0. A solution control sequence depends on the agent-environment dynamics f(·, ·) and the environment structure c(·). Warren (2006) expresses the interaction between perception and guidance by coupling agent-environment dynamics with environment-perception dynamics, dx̂/dt = Ψ(x̂, z), where Ψ(·, ·) models the internal agent state x̂ with measurement information input z. Kong and Mettler (2013) expresses this closed-loop model in terms of the motion guidance task. Control inputs are generated to achieve desired actions, u = k(v). The agent receives information from the environment based on agent-environment states, i = e(x). Cognitive perceptual devices in the agent generate perceptual measurements from environment information, z = h(i). The present work focuses on how actions are chosen based on a guidance law, v = π(x̂). Fig. 1 illustrates this closed-loop system.

Fig. 1. Guidance and perception framework.
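The closed-loop structure in Fig. 1 can be summarized in a short simulation skeleton. This is a minimal sketch assuming Euler integration and placeholder component functions; the paper does not specify forms for f, e, h, Ψ, π, or k.

```python
import numpy as np

def simulate(f, e, h, Psi, pi, k, x0, xhat0, dt=0.02, steps=500):
    """Euler integration of the coupled loop:
        x'    = f(x, u)        agent-environment dynamics
        i     = e(x)           information available to the agent
        z     = h(i)           perceptual measurement
        xhat' = Psi(xhat, z)   internal (perceived) state dynamics
        v     = pi(xhat)       guidance law: desired action
        u     = k(v)           control law realizing the action
    All component functions are placeholders to be supplied by the caller."""
    x, xhat = np.asarray(x0, float), np.asarray(xhat0, float)
    history = [x.copy()]
    for _ in range(steps):
        z = h(e(x))                      # perception of the environment
        v = pi(xhat)                     # action chosen from internal state
        u = k(v)                         # control input
        xhat = xhat + dt * Psi(xhat, z)  # update internal state estimate
        x = x + dt * f(x, u)             # propagate true dynamics
        history.append(x.copy())
    return np.array(history)
```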

3.2 Experimental Approach

An experimental framework is used to investigate human guidance behavior. Subjects perform a computer-simulated first-person vehicle navigation task, as introduced in Feit and Mettler (2015). The course layout and the set of trajectory data are shown in Fig. 2a. Subjects navigate a Dubins-type vehicle through an obstacle field. Vehicle control inputs are forward acceleration and turn rate. Forward velocity is limited by a drag term, and turn rate is limited based on forward velocity in order to force subjects to perform dynamic planning. Subjects begin from a series of twenty start locations, traveling to a common goal doorway while attempting to minimize travel time. This task provides a set of example guidance behavior involving a range of obstacle-avoidance maneuvers.
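A minimal sketch of the vehicle model described above follows; the drag and turn-rate-limit forms and all constants are illustrative assumptions, since the experiment's exact parameters are not given here.

```python
import numpy as np

# One integration step of a Dubins-type vehicle matching the description:
# inputs are forward acceleration and turn rate; a drag term bounds speed,
# and the available turn rate shrinks as speed grows.

def step(state, accel_cmd, turn_cmd, dt=0.02,
         drag=0.1, max_turn=2.0, turn_speed_scale=5.0):
    x, y, heading, speed = state
    # drag opposes motion, bounding the achievable forward velocity
    speed = max(0.0, speed + dt * (accel_cmd - drag * speed**2))
    # turn-rate limit decreases with speed, forcing dynamic planning
    turn_limit = max_turn / (1.0 + speed / turn_speed_scale)
    turn_rate = np.clip(turn_cmd, -turn_limit, turn_limit)
    heading += dt * turn_rate
    return (x + dt * speed * np.cos(heading),
            y + dt * speed * np.sin(heading),
            heading, speed)
```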

Fig. 2. Uniform course trajectory data (axes x [m], y [m]). (a) Raw trajectory data. (b) Best-time trajectories with subgoals, colored by partition.

4. GUIDANCE POLICY ANALYSIS

Subgoal Identification

Kong and Mettler (2013) identify patterns in guidance behavior using equivalences between trajectories based on symmetries in system dynamics. Two equivalence relations are used: the subgoal equivalence relation, ∼s, and the guidance equivalence relation, ∼g. Two trajectories satisfy the subgoal equivalence, s1 ∼s s2, if they begin at different locations, join together at a subgoal g1,2, and remain together (i.e., follow the same path) until they reach a common goal point. The subgoal equivalence divides trajectories into primitive elements. The computational subgoal identification process, based on an approach introduced by Li and Mettler (2015), works as follows (a sketch of the pairing step appears below):

(1) Let Si = {sij} be the set of trajectories beginning at start location i ∈ [1, 20], for trials j ∈ [1, ni]. The minimum travel-time trajectory for each start location, s*i = arg min_{sij ∈ Si} tgoal(sij), is plotted in Fig. 2b.
(2) Compute pairwise distances among the best-time trajectories using dynamic time warping, D = {dmn} ∈ R^(20×20), where dmn = DTW(s*m, s*n). Hierarchically cluster the trajectories based on this distance.
(3) Sort trajectory pairs {sm, sn} in order of increasing distance and compute the average trajectory s̄mn. Place a subgoal candidate gmn at the earliest point where each pair meets: gmn = s̄mn(tsg), where tsg = min{t : ‖sm(t) − sn(t)‖ < εsg}.
(4) Trim each average trajectory that passes through a subgoal to begin at that subgoal: s′mn = s̄mn(tsg : tg). Repeat steps 1 through 4 on the set of trimmed average trajectories until a single trajectory remains.

The resulting subgoal locations and partitions are depicted in Fig. 2b.
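The following sketch illustrates the pairing step (step 3) under the simplifying assumption that the two trajectories are already aligned on a common time grid; a full implementation would first align them with dynamic time warping as in step 2 and iterate steps 1 through 4.

```python
import numpy as np

def subgoal_candidate(s_m, s_n, eps_sg=0.5):
    """s_m, s_n: (T, 2) arrays of positions sampled on a common time grid.
    Returns (index, subgoal position) at the earliest point where the two
    trajectories come within eps_sg of one another, or None if they never meet."""
    s_bar = 0.5 * (s_m + s_n)                  # average trajectory
    gaps = np.linalg.norm(s_m - s_n, axis=1)   # pairwise separation over time
    close = np.nonzero(gaps < eps_sg)[0]
    if close.size == 0:
        return None
    t_sg = close[0]                            # earliest meeting time
    return t_sg, s_bar[t_sg]
```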

Guidance Primitive Identification

The guidance equivalence s1 ∼g s2 relates two trajectory segments if they are equivalent through a symmetry transformation, s1 = T(s2), where T(·) is a transformation under which the system dynamics are invariant (i.e., translation, rotation, or reflection). To evaluate this relation, transformations are applied to trajectory segments so that subgoal positions and velocity directions coincide. Segments are reflected about the y-axis, based on the assertion that left and right turns are equivalent. The resulting set of transformed trajectory segments, plotted in Fig. 3a, are not equal but constitute a family of similar trajectories. The hypothesis is expanded to suggest that segments are part of a library of interaction patterns, si ∈ Πxg, for goal xg, which are modeled by a guidance policy xref = π(x − xg) (Mettler and Kong (2012); Mettler et al. (2014)).


4.1 Guidance Policy Learning

The guidance task is modeled as a Markov decision process (MDP) for which a utility and policy can be learned from example data (Russell et al. (2003)). For motion guidance, the utility and policy functions consist of the CTG and VVF functions. Direct utility estimation is used to identify approximate functions V̂*(xp) and π̂*(xp) that best model the actual behavior of the human subject. RL is not used here, but may be applied in future work to determine a policy that is optimal for a specified cost function based on human example behavior. In the present work, time-to-go is used as the cost function for the following reasons: 1) subjects were asked to minimize travel time during the task, 2) Lee et al. (1976) suggests that time is a quantity that humans intuitively use to plan and make decisions about motion, and 3) previous work has successfully modeled human guidance cost using time-to-go (Kong and Mettler (2013)). The training data set consists of points along each trajectory, si(k) = {xp(k), xv(k), t(k)}, for time-step k and trajectory i. The example input domain consists of xin = {xp(k)} for both the spatial CTG and guidance policy functions. The example output is yctg = {t(k)} for the spatial CTG, and yvv = {xv(k)} for the guidance policy.
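A sketch of assembling this training set is shown below, assuming each trajectory is stored as arrays of positions, velocities, and timestamps; time-to-go labels follow from the recorded final time of each trajectory.

```python
import numpy as np

def build_training_set(trajectories):
    """trajectories: list of dicts with keys "xp" (K, 2), "xv" (K, 2), "t" (K,).
    Returns stacked inputs and the CTG / VVF regression targets."""
    X, y_ctg, y_vv = [], [], []
    for traj in trajectories:
        t = np.asarray(traj["t"])
        ttg = t[-1] - t                      # time-to-go at each sample
        X.append(np.asarray(traj["xp"]))     # positions as regression inputs
        y_ctg.append(ttg)                    # CTG targets
        y_vv.append(np.asarray(traj["xv"]))  # VVF targets
    return np.vstack(X), np.concatenate(y_ctg), np.vstack(y_vv)
```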

Gaussian Process Regression

Gaussian process regression approximates a function using a set of squared exponential kernel functions, k(xi, xj) = σf² exp(−(xi − xj)²/(2l²)) + σn² δ(xi, xj). A matrix K is computed such that element Kij = k(xi, xj), a vector K* such that element K*i = k(x*, xi), and K** = k(x*, x*), with i, j ∈ [1, m] for m example input points and test input x*. The best estimate of the approximated function at x* is given by ȳ* = K* K⁻¹ y, and the variance at that point is given by var(y*) = K** − K* K⁻¹ K*ᵀ, where y is the example output set. K is invertible if all example inputs x are distinct. The resulting approximated CTG function is shown in Fig. 3c, the cost uncertainty (standard deviation) in Fig. 3b, and the VVF function in Fig. 3d.
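The GP equations above translate directly into code. The sketch below places the noise term σn² δ(xi, xj) on the diagonal of K, which is equivalent to the kernel definition for training pairs; the hyperparameter values are illustrative, whereas a toolbox such as GPstuff would optimize them.

```python
import numpy as np

def kernel(A, B, sigma_f=1.0, l=2.0):
    """Squared exponential kernel between row-wise point sets A (n, d) and B (m, d)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sigma_f**2 * np.exp(-d2 / (2 * l**2))

def gp_predict(X, y, X_star, sigma_f=1.0, l=2.0, sigma_n=0.05):
    """Predictive mean and variance at test inputs X_star given data (X, y)."""
    K = kernel(X, X, sigma_f, l) + sigma_n**2 * np.eye(len(X))  # noisy Gram matrix
    K_star = kernel(X_star, X, sigma_f, l)                      # k(x*, xi)
    K_ss = kernel(X_star, X_star, sigma_f, l)                   # k(x*, x*)
    mean = K_star @ np.linalg.solve(K, y)                       # y* = K* K^-1 y
    cov = K_ss - K_star @ np.linalg.solve(K, K_star.T)          # K** - K* K^-1 K*^T
    return mean, np.diag(cov)
```

Calling gp_predict separately with the y_ctg and y_vv targets from the training set above yields the approximated CTG and VVF functions, along with the pointwise uncertainty used for the contours in Fig. 3.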

Fig. 3. Guidance policy extraction. (a) Aggregate trajectory segments. (b) Cost standard deviation [sec]. (c) Spatial time-to-go [sec]; contour indicates stdev = 0.02 [sec]. (d) Velocity vector field; contour indicates stdev = 0.02 [sec].

4.2 Dynamic Guidance Policy Properties

Subgoal Planning

In summary, we have presented a method of breaking apart a set of constrained example trajectories by identifying subgoals. The resulting set of trajectory segments is superimposed, and a guidance policy is identified that describes the family of observed motion primitives. For an agent to generate trajectories based on this policy, two things are required. First, a series of subgoals must be selected based on environment structure, such that they can be connected by a series of unconstrained motion primitives. Feit et al. (2015) presents a set of necessary conditions that specify feasible subgoal candidates. Second, it is necessary to extrapolate the motion policy to points outside of the example data domain. To better understand how the motion policy may be extrapolated, the identified functions are further decomposed into motion primitives based on the maneuver automaton framework presented in Mettler et al. (2002) and Frazzoli et al. (2005).

Motion Primitive Identification

To investigate possible maneuver modalities in the human subject data, Fig. 4a depicts a histogram of control inputs. Fig. 4b shows two clusters that account for the majority of points within the control input data. In cluster 1, inputs are centered laterally and near the maximum forward input, indicating maximum-speed, non-turning, trim trajectories. In cluster 2, inputs are at the lateral extreme, indicating turning trajectories, representing a maneuver between trim states. To characterize the clusters, trajectory points belonging to each cluster are plotted in Figs. 4c and 4d. In addition, a TTG histogram is shown in Fig. 5b. A boundary is apparent based on TTG, tsg, such that for cluster 2, V̂*(xp) ≤ tsg. For this experiment, tsg ≈ 0.9 seconds. The maneuver cluster is also radially bounded by the available example guidance behavior. A point xB is defined such that V̂*(xB) = tsg and var(xB) = σ²max, as illustrated in Fig. 4e. θmax = ∠xB is the angle from xg to xB; it approximates the maximum angle at which the identified policy can provide an accurate guidance solution.
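A sketch of the two-cluster split is given below; k-means on (|lateral|, longitudinal) inputs is an assumption standing in for whatever clustering produced Fig. 4b, and folding the lateral axis mirrors the left/right-turn equivalence used earlier.

```python
import numpy as np
from sklearn.cluster import KMeans

def split_trim_maneuver(inputs):
    """inputs: (N, 2) array of [lateral, longitudinal] commands.
    Returns a per-sample label array: "T" (trim) or "M" (maneuver)."""
    # fold |lateral| so left and right turns land in the same cluster
    features = np.column_stack([np.abs(inputs[:, 0]), inputs[:, 1]])
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
    # call the cluster with the smaller mean |lateral| input "trim"
    trim = int(np.argmin([features[labels == c, 0].mean() for c in (0, 1)]))
    return np.where(labels == trim, "T", "M")
```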

Fig. 4. Guidance behavior dynamic clustering and extrapolation. (a) Histogram of control inputs (range compressed). (b) Control input clusters. (c) Trajectory points in cluster 1. (d) Trajectory points in cluster 2. (e) Guidance policy extrapolation.

Maneuver Automaton Identification

A maneuver automaton (MA) models motion as a sequence of primitive elements. Fig. 5a illustrates possible transitions between elements. The hypothesis is that humans deploy a sequence of elements to connect each subgoal pair in minimum time. Optimal trajectories are expected to begin with a trim element, T: πT(x0, xsg,1), followed by one or more maneuver primitives, M: {πM(xsg,1, xsg,2), ..., πM(xsg,n−1, xsg,n)}, to reach the goal (G). Fig. 5c shows the observed frequency of each primitive element sequence within the set of human-generated trajectory segments. {T, G} sequences occur with the highest frequency, indicating that the subgoal is reached directly on a trim trajectory. {M, G} sequences begin within the maneuver (direct-evaluation) region and use the VVF function to reach the goal. {T, M, G} sequences begin with a trim element and then transition to a maneuver element to reach the goal; these trajectories begin within the extrapolated region. Finally, {M, T, G} sequences occur at the lowest frequency and consist of a maneuver element first, then a trim element. These trajectories occur when the vehicle is not initially on an optimal trajectory, i.e., the first trajectory segment when the vehicle is initially stationary and does not yet have the correct heading.
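Given per-sample trim/maneuver labels for each segment, the sequence counts of Fig. 5c can be tabulated as sketched below; the example counts in the comment are illustrative only.

```python
from collections import Counter
from itertools import groupby

def sequence_frequencies(segments_labels):
    """segments_labels: list of per-sample label sequences, e.g. ['T','T','M',...].
    Collapses each segment's labels into a transition sequence ending at G."""
    counts = Counter()
    for labels in segments_labels:
        seq = tuple(k for k, _ in groupby(labels)) + ("G",)
        counts[seq] += 1
    return counts  # e.g. {('T','G'): 41, ('M','G'): 18, ('T','M','G'): 9, ...} (illustrative)
```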

Fig. 5. Maneuver automaton model and observed behavior. (a) Maneuver automaton (MA) describing motion policy behavior. (b) Cluster #2 time-to-go histogram (tsg ≈ 0.897 [sec]). (c) Observed MA transition sequence frequencies. M: maneuver, T: trim, G: goal.

5. GUIDANCE POLICY DEPLOYMENT

5.1 Cost and Guidance Policy Evaluation

The motion guidance problem consists of determining a sequence of motion primitive elements that reach the goal in minimum time. Depending on the location of the initial position xp in the guidance domain, three cases exist (a sketch of this case logic follows the list):

(1) If V̂*(xp) < tsg and ∠xp < θmax, then xp is within the direct evaluation region (region (1) in Fig. 4e). A maneuver element reaches the goal using the identified VVF function, π̂*(xp). The maneuver sequence is {M, G}.
(2) If V̂*(xp) ≥ tsg and ∠xp < θmax, then xp is in the extrapolated domain (region (2) in Fig. 4e). A trim (T), constant-velocity motion element reaches a point xsg at the boundary of the direct evaluation region. From this point, a maneuver element (M) reaches the goal. The maneuver sequence is {T, M, G}. Trim velocity is determined by π̂*(xp) = π̂*(xsg), where xsg = arg min_{x ∈ Xsg} |x − xp|.
(3) If ∠xp > θmax, then xp is outside the guidance policy domain, and xB becomes a subgoal that extends the guidance domain (regions (3) and (4) in Fig. 4e). In this case, two or more maneuver elements are executed in sequence before reaching the subgoal. The maneuver sequence is {T, M, ..., M, G}.
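The case logic above can be sketched as follows; V_hat and angle_to_goal stand in for the identified CTG function and the angle ∠xp, and the thresholds are the experiment's tsg ≈ 0.9 s and an assumed θmax.

```python
def primitive_sequence(xp, V_hat, angle_to_goal, t_sg=0.9, theta_max=1.0):
    """Select the maneuver-automaton sequence for initial position xp."""
    if angle_to_goal(xp) > theta_max:
        # case (3): outside the policy domain, insert subgoal(s) at the boundary
        return ["T", "M", "M", "G"]   # {T, M, ..., M, G}
    if V_hat(xp) < t_sg:
        # case (1): direct evaluation region, follow the VVF to the goal
        return ["M", "G"]
    # case (2): trim toward the direct-evaluation boundary, then maneuver
    return ["T", "M", "G"]
```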

Constrained Trajectory Generation

Feit et al. (2015) introduces a set of necessary conditions for subgoal placement, based on constraint geometry and the unconstrained guidance policy. A best-first graph search algorithm (Russell et al. (2003)) determines the optimal constrained solution trajectory, given a guidance policy that specifies optimal unconstrained solutions between subgoals. In Fig. 6, subgoals are placed at obstruction corners, which satisfy the necessary conditions. Optimal subgoal connectivity is determined using the graph search algorithm and the CTG and VVF functions identified above from human behavior. This guidance policy is then used to generate a series of new trajectories in this environment.
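A minimal sketch of the best-first subgoal search is given below; edge_cost (the unconstrained CTG between subgoals) and visible (the line-of-sight/constraint check) are assumed helpers standing in for the machinery of Feit et al. (2015).

```python
import heapq

def best_first(start, goal, candidates, edge_cost, visible):
    """Best-first search over subgoal candidates (e.g. obstruction corners).
    edge_cost(a, b): unconstrained cost-to-go between subgoals a and b.
    visible(a, b): True if the segment a -> b violates no constraints."""
    counter = 0                                  # tie-breaker for the heap
    frontier = [(0.0, counter, start, [start])]
    best = {start: 0.0}
    while frontier:
        cost, _, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path                    # optimal subgoal sequence
        for nxt in candidates + [goal]:
            if nxt == node or not visible(node, nxt):
                continue                         # edge blocked by an obstruction
            c = cost + edge_cost(node, nxt)
            if c < best.get(nxt, float("inf")):
                best[nxt] = c
                counter += 1
                heapq.heappush(frontier, (c, counter, nxt, path + [nxt]))
    return float("inf"), []                      # goal unreachable
```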

Fig. 6. Autonomously generated trajectories using motion primitives identified from the human subject.

6. CONCLUSION AND DISCUSSION

6.1 Results

The resulting solutions in Fig. 6 are admissible, with the following notable characteristics. First, trajectories consist primarily of linear elements, with minimal curvature and little use of maneuvering motion primitives. This likely occurs because the vehicle has high turning capability relative to the distances between obstructions. Verma and Mettler (2016) investigates a maneuver length-scale ratio that quantifies this effect. Future work will investigate tasks that require more off-trim behavior to better observe the human maneuver policy. Second, some solution trajectories appear to be inconsistent, or sub-optimal, i.e., trajectories intersect each other. This is likely because the maneuver policy identified from human example behavior is not necessarily optimal or consistent. Consistent solutions would require the identified policies to be corrected using the Bellman equation (i.e., temporal difference learning; Russell et al. (2003)). Note that while solution trajectories intersect, they are all admissible (they do not violate constraints), with near-optimal performance. The observed human behavior also contains intersecting solutions (Fig. 2a).

6.2 Future Work

This work focused on motion guidance in a known environment. Human behavior, however, consists of a closed-loop information cycle between perceptions and actions (Warren (2006)) in an uncertain environment. Prior work has investigated relationships between perceptual and kinematic quantities in human motion (Feit and Mettler (2016)). Future work will extend this by defining a guidance-automaton model consisting of perception-action guidance primitive elements.

ACKNOWLEDGEMENTS

This work is financially supported by the U.S. Office of Naval Research (2013-16, #11361538) and the National Science Foundation (CAREER 2013-18, CMMI-1254906).

REFERENCES


Ebden, M. (2008). Gaussian processes for regression: A quick introduction. Robotics Research Group, Department of Engineering Science, University of Oxford.
Engel, Y., Mannor, S., and Meir, R. (2003). Bayes meets Bellman: The Gaussian process approach to temporal difference learning. In ICML, volume 20, 154.
Feit, A. and Mettler, B. (2015). Experimental framework for investigating first-person guidance and perception. In Systems, Man, and Cybernetics (SMC), 2015 IEEE International Conference on. IEEE.
Feit, A. and Mettler, B. (2016). Information-based analysis of visual cues in human guidance. In AHS Forum, volume 72. American Helicopter Society.
Feit, A., Verma, A., and Mettler, B. (2015). A human-inspired subgoal-based approach to constrained optimal control. In Decision and Control, 2015 IEEE Conference on. IEEE.
Frazzoli, E., Dahleh, M.A., and Feron, E. (1999). A hybrid control architecture for aggressive maneuvering of autonomous helicopters. In Decision and Control, 1999 IEEE Conference on, volume 3, 2471–2476. IEEE.
Frazzoli, E., Dahleh, M.A., and Feron, E. (2002). Real-time motion planning for agile autonomous vehicles. Journal of Guidance, Control, and Dynamics, 25(1), 116–129.
Frazzoli, E., Dahleh, M.A., and Feron, E. (2005). Maneuver-based motion planning for nonlinear systems with symmetries. Robotics, IEEE Transactions on, 21(6), 1077–1091.
Kong, Z. and Mettler, B. (2011). An investigation of spatial behavior in agile guidance tasks. In Systems, Man, and Cybernetics (SMC), 2011 IEEE International Conference on, 2473–2480. IEEE.
Kong, Z. and Mettler, B. (2013). Modeling human guidance behavior based on patterns in agent-environment interactions. Human-Machine Systems, IEEE Transactions on, 43(4), 371–384.


Lee, D.N. et al. (1976). A theory of visual control of braking based on information about time-to-collision. Perception, 5(4), 437–459.
Li, B. and Mettler, B. (2015). Investigation of hierarchical architecture of human guidance behavior for skill analysis. In Systems, Man, and Cybernetics (SMC), 2015 IEEE International Conference on, 1081–1088. IEEE.
Mettler, B., Valenti, M., Schouwenaars, T., Frazzoli, E., and Feron, E. (2002). Rotorcraft motion planning for agile maneuvering. In Proceedings of the 58th Forum of the American Helicopter Society, Montreal, Canada, volume 32.
Mettler, B. and Kong, Z. (2012). Hierarchical model of human guidance performance based on interaction patterns in behavior. In Application and Theory of Automation in Command and Control Systems (ICARUS), 2nd International Conference on.
Mettler, B., Kong, Z., Li, B., and Andersh, J. (2014). Systems view on spatial planning and perception based on invariants in agent-environment dynamics. Frontiers in Neuroscience, 8.
Ng, A.Y., Russell, S.J., et al. (2000). Algorithms for inverse reinforcement learning. In ICML, 663–670.
Russell, S.J., Norvig, P., Canny, J.F., Malik, J.M., and Edwards, D.D. (2003). Artificial Intelligence: A Modern Approach, volume 2. Prentice Hall, Upper Saddle River.
Simon, H.A. (1972). Theories of bounded rationality. Decision and Organization, 1(1), 161–176.
Simon, H.A. (1990). Invariants of human behavior. Annual Review of Psychology, 41(1), 1–20.
Vanhatalo, J., Riihimäki, J., Hartikainen, J., Jylänki, P., Tolvanen, V., and Vehtari, A. (2013). GPstuff: Bayesian modeling with Gaussian processes. The Journal of Machine Learning Research, 14(1), 1175–1179.
Verma, A. and Mettler, B. (2016). Scaling effects in guidance performance in confined environments. Journal of Guidance, Control, and Dynamics, 1–12.
Warren, W.H. (2006). The dynamics of perception and action. Psychological Review, 113(2), 358.