Proceedings of the 20th World Congress of the International Federation of Automatic Control
Toulouse, France, July 9-14, 2017

IFAC PapersOnLine 50-1 (2017) 15361–15366
Available online at www.sciencedirect.com
Particle Model Predictive Control: Tractable Stochastic Nonlinear Output-Feedback MPC
Martin A. Sehr & Robert R. Bitmead

Department of Mechanical & Aerospace Engineering, University of California, San Diego, La Jolla, CA 92093-0411, USA (e-mail: {msehr, rbitmead}@ucsd.edu).

Abstract: We combine conditional state density construction with an extension of the Scenario Approach for stochastic Model Predictive Control to nonlinear systems to yield a novel particle-based formulation of stochastic nonlinear output-feedback Model Predictive Control. Conditional densities given noisy measurement data are propagated via the Particle Filter as an approximate implementation of the Bayesian Filter. This enables a particle-based representation of the conditional state density, or information state, which naturally merges with scenario generation from the current system state. This approach attempts to address the computational tractability questions of general nonlinear stochastic optimal control. The Particle Filter and the Scenario Approach are shown to be fully compatible and – based on the time- and measurement-update stages of the Particle Filter – incorporated into the optimization over future control sequences. A numerical example is presented and examined for the dependence of solution and computational burden on the sampling configurations of the densities, scenario generation and the optimization horizon.

© 2017, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.

Keywords: stochastic control, model predictive control, nonlinear control, information state, particle filtering.

1. INTRODUCTION

Model Predictive Control (MPC), in its original formulation, is a full-state feedback law (see Mayne et al. (2000); Mayne (2014); Maciejowski (2002)). This underpins two theoretical limitations of MPC: accommodation of output-feedback, and extension to include a compelling robustness theory given the state dimension is fixed. This paper addresses the first of these issues in a rather general, though practical setup.

There has been a number of approaches to output-feedback MPC, mostly hinging on the replacement of the measured true state by a state estimate, which is computed via Kalman filtering (e.g. Sehr and Bitmead (2016b); Yan and Bitmead (2005)), moving-horizon estimation (e.g. Copp and Hespanha (2014); Sui et al. (2008)), tube-based minimax estimators (e.g. Mayne et al. (2009)), etc. Apart from Copp and Hespanha (2014), these designs, often for linear systems, separate the estimator design from the control design. The control problem may be altered to accommodate the state estimation error by methods such as: constraint tightening as in Yan and Bitmead (2005), chance/probabilistic constraints as in Cannon et al. (2012) or Schwarm and Nikolaou (1999), and so forth. Likewise, for nonlinear problems, where the state estimation behavior is affected by control signal properties, the control may be modified to enhance the excitation properties of the estimator, as suggested in Chisci et al. (2001); Marafioti et al. (2014). Each of these aspects of accommodation is made in an isolated fashion.

The stochastic nonlinear output-feedback MPC algorithm presented in this paper is motivated by the structure of Stochastic Model Predictive Control (SMPC) via finite-horizon stochastic optimal control. The latter method requires propagating conditional state densities using a Bayesian Filter (BF) and solution of the Stochastic Dynamic Programming Equation (SDPE). By virtue of implementing a truly optimal finite-horizon control law in a receding horizon fashion, one can deduce a number of properties of the closed-loop dynamics, including recursive feasibility of the SMPC controller, stochastic stability and bounds characterizing closed-loop infinite-horizon performance, as discussed in Sehr and Bitmead (2016a). Unfortunately, solving for the stochastic optimal output-feedback controller, even on the finite horizon, is computationally intractable except for special cases such as linear quadratic Gaussian MPC because of the need to solve the SDPE, which incorporates the duality of the optimal control law in its effect on state observability. While the BF, required to propagate the conditional state densities, is readily approximated using a Particle Filter (PF), open-loop solution of the SDPE results in the loss of the duality of the optimal control. While not discussed in this paper, this effect can be mitigated sub-optimally by imposing excitation requirements as in Chisci et al. (2001); Marafioti et al. (2014).

Approximately propagating the conditional state densities by means of the PF naturally invites combination with the more recent advances in Scenario Model Predictive
Control (SCMPC), as discussed for instance by Blackmore et al. (2010); Calafiore and Fagiano (2013); Grammatico et al. (2016); Lee (2014); Mesbah et al. (2014); Schildbach et al. (2014). Scenario methods deal with optimization of difficult, non-convex problems in which the initial task is recast as a parametrized collection of simpler, generally convex problems. Random sampling of uncertain signals and parameters is performed and the resulting collection of deterministic problem instances is solved. The focus has been on full state feedback for systems with linear dynamics and probabilistic state constraints. The technical construction is to take a sufficient number of samples (scenarios) to provide an adequate reconstruction of future controlled state densities for design.

In contrast to solving the SDPE underlying the stochastic optimal control problem, the future controlled state densities in SCMPC are open-loop constructions. However, they present a natural fit with the particle-based conditional density approximations generated by the PF, where individual particles can be interpreted as scenarios from an estimation perspective. Moreover, while SCMPC is typically formulated in the linear case, the basic idea extends to the nonlinear case, albeit with the loss of many computation-saving features. In this paper, we propose and discuss this output-feedback version of SCMPC combined with the PF, which we call Particle Model Predictive Control (PMPC). Compared with the stochastic optimal output-feedback controller (computed via BF and SDPE), the PMPC controller is suboptimal in not accommodating future measurement updates, thereby losing both exact constraint violation probabilities along the horizon and the probing requirement inherent to stochastic optimal control. On the other hand, PMPC enables a generally applicable and, at least for small state dimensions, computationally tractable alternative for nonlinear stochastic output-feedback control.

The structure of the paper is as follows. We briefly introduce the problem setup in Section 2 and SMPC in Section 3 and proceed by introducing the PMPC control algorithm based on its individual components and parameters in Section 4. After describing the algorithm and its correspondence to SMPC, we use a challenging scalar nonlinear example to demonstrate computational tractability and dependence of the proposed PMPC closed-loop behavior on a number of parameters in Section 5. The example features nonlinear state and measurement equations and probabilistic state constraints under significant measurement noise. Finally, we conclude with Section 6.

2. STOCHASTIC OPTIMAL CONTROL – SETUP

We consider receding horizon output-feedback control for nonlinear stochastic systems of the form
$$x_{t+1} = f(x_t, u_t, w_t), \quad x_0 \in \mathbb{R}^n, \tag{1}$$
$$y_t = h(x_t, v_t), \tag{2}$$
starting from a known initial state probability density function, $\pi_{0|-1} = \mathrm{pdf}(x_0)$. To this end, we denote the data available at time $t$ by $\zeta^t \triangleq \{y_0, u_0, y_1, u_1, \dots, u_{t-1}, y_t\}$, $\zeta^0 \triangleq \{y_0\}$. The information state, denoted $\pi_t$, is the conditional density of state $x_t$ given data $\zeta^t$:
$$\pi_t \triangleq \mathrm{pdf}\left(x_t \mid \zeta^t\right). \tag{3}$$

We further impose the following standing assumption on the random variables and control inputs.

Assumption 1. The signals in (1)-(2) satisfy:
1. $\{w_t\}$ and $\{v_t\}$ are sequences of independent and identically distributed random variables.
2. $x_0$, $w_t$, $v_l$ are mutually independent for all $t, l \geq 0$.
3. The control input $u_t$ at time instant $t \geq 0$ is a function of the data $\zeta^t$ and given initial state density $\pi_{0|-1}$.

Denote by $\mathbf{E}_t[\,\cdot\,]$ and $\mathbf{P}_t[\,\cdot\,]$ the conditional expected value and probability with respect to state $x_t$ – with conditional density $\pi_t$ – and random variables $\{(w_k, v_{k+1}) : k \geq t\}$, respectively, and by $\epsilon_k$ the constraint violation level of constraint $x_k \in \mathbb{X}_k$. Our goal is to solve the finite-horizon stochastic optimal control problem (FHSOCP)

$$\mathcal{P}_N(\pi_t): \quad \begin{aligned} &\inf_{u_t, \dots, u_{t+N-1}} \mathbf{E}_t\left[\sum_{k=t}^{t+N-1} c(x_k, u_k) + c_N(x_{t+N})\right] \\ &\text{s.t.}\ \ x_{k+1} = f(x_k, u_k, w_k), \quad x_t \sim \pi_t, \\ &\qquad \mathbf{P}_{k+1}\left[x_{k+1} \in \mathbb{X}_{k+1}\right] \geq 1 - \epsilon_{k+1}, \\ &\qquad u_k \in \mathbb{U}_k, \quad k = t, \dots, t+N-1. \end{aligned}$$

In theory, solving the FHSOCP at each time $t$ and subsequently implementing the first control in a receding horizon fashion leads to a number of desirable closed-loop properties, as discussed in Sehr and Bitmead (2016a). However, solving the FHSOCP is computationally intractable in practice, a fact that has led to a number of approaches in MPC for nonlinear stochastic dynamics. We propose a novel strategy that is oriented at the structure of SMPC based on the FHSOCP, but numerically tractable at least for low state dimensions.

As a result of the Markovian state equation (1) and measurement equation (2), the optimal control inputs in the FHSOCP must inherently be separated feedback policies (e.g. Bertsekas (1995); Kumar and Varaiya (1986)). That is, control input $u_t$ depends on the available data $\zeta^t$ and initial density $\pi_{0|-1}$ solely through the current information state, $\pi_t$. Optimality thus requires propagating $\pi_t$ and policies $g_t$, where
$$u_t = g_t(\pi_t). \tag{4}$$
Motivated by this two-component separated structure of stochastic optimal output-feedback control, we propose an extension of the SCMPC approach to nonlinear systems, merged with a numerical approximation of the information state update via particle filtering. Before proceeding with this novel approach, we briefly revisit the two components of SMPC via solution of the FHSOCP.
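To fix notation for the sketches that follow, the snippet below casts the model (1)-(2) in code, instantiated with the scalar example that Section 5 introduces. This is a minimal sketch, not from the paper: all helper names (f, h, sample_x0, sample_w, sample_v, rng) are our own, and we read N(0, 5) as a variance of 5.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x, u, w):
    # State equation (1); our reading of the Section 5 example dynamics.
    return 1.5 * x + np.arctan((x - 1.0) ** 2) * u + w

def h(x, v):
    # Measurement equation (2); Section 5 example.
    return x ** 3 - x + v

def sample_x0(n):
    return rng.uniform(1.0, 2.0, size=n)          # x0 ~ U(1, 2)

def sample_w(n):
    return rng.uniform(-2.0, 2.0, size=n)         # wt ~ U(-2, 2)

def sample_v(n):
    return rng.normal(0.0, np.sqrt(5.0), size=n)  # vt ~ N(0, 5), variance 5 assumed
```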
3. STOCHASTIC MODEL PREDICTIVE CONTROL
The information state is propagated via the Bayesian Filter (see e.g. Chen (2003); Simon (2006)):
$$\pi_t = \frac{\mathrm{pdf}(y_t \mid x_t)\, \pi_{t|t-1}}{\int \mathrm{pdf}(y_t \mid x_t)\, \pi_{t|t-1}\, dx_t}, \tag{5}$$
$$\pi_{t+1|t} = \int \mathrm{pdf}(x_{t+1} \mid x_t, u_t)\, \pi_t\, dx_t, \tag{6}$$
for $t \in \{0, 1, 2, \dots\}$ and initial density $\pi_{0|-1}$. The recursion (5)-(6) has the following features:

• The measurement update (5) combines the a priori conditional density, $\pi_{t|t-1}$, and $\mathrm{pdf}(y_t \mid x_t)$, derived from (2) using knowledge of: the function $h(\cdot,\cdot)$, the density of $v_t$, and the value of $y_t$.
• The time update (6) combines $\pi_t$ and $\mathrm{pdf}(x_{t+1} \mid x_t, u_t)$, derived from (1) using knowledge of: control input $u_t$, function $f(\cdot,\cdot,\cdot)$, and the density of $w_t$.
• For linear Gaussian systems, the filter recursion (5)-(6) reduces to the well-known Kalman Filter.
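As a concrete illustration of the two updates, the following sketch runs the recursion (5)-(6) on a fixed grid for a scalar state, using the Section 5 model from the earlier snippet. The grid range and resolution are our own choices; such an exact-on-grid implementation is practical only in very low dimensions, which is precisely what motivates the Particle Filter below.

```python
import numpy as np
from scipy.stats import norm, uniform

x_grid = np.linspace(-5.0, 15.0, 801)  # state-space grid (our choice)
dx = x_grid[1] - x_grid[0]

def bf_measurement_update(prior, y):
    # (5): pointwise product of prior and likelihood pdf(y | x), renormalized.
    # For y = x^3 - x + v with v ~ N(0, 5): pdf(y | x) = N(y; x^3 - x, 5).
    lik = norm.pdf(y, loc=x_grid ** 3 - x_grid, scale=np.sqrt(5.0))
    post = lik * prior
    return post / (post.sum() * dx)

def bf_time_update(post, u):
    # (6): integrate the transition kernel pdf(x_next | x, u) against post.
    # For x_next = 1.5 x + atan((x - 1)^2) u + w with w ~ U(-2, 2), the
    # kernel is a uniform density centred on the noise-free update.
    mean_next = 1.5 * x_grid + np.arctan((x_grid - 1.0) ** 2) * u
    kernel = uniform.pdf(x_grid[:, None] - mean_next[None, :], loc=-2.0, scale=4.0)
    prior_next = kernel @ (post * dx)
    return prior_next / (prior_next.sum() * dx)
```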
Combined with solution of the FHSOCP, this leads to the following SMPC algorithm, as discussed in Sehr and Bitmead (2016a).

Algorithm 1 Stochastic Model Predictive Control
1: Offline:
2:   Solve $\mathcal{P}_N(\cdot)$ for the first optimal policy, $g_0(\cdot)$.
3: Online:
4:   for $t = 0, 1, 2, \dots$ do
5:     Measure $y_t$
6:     Compute $\pi_t$
7:     Apply first optimal control policy, $u_t = g_0(\pi_t)$
8:     Compute $\pi_{t+1|t}$
9:   end for

Notice how this algorithm differs from common practice in stochastic model predictive control in that it explicitly uses the information states $\pi_t$. Throughout the literature, these information states – conditional densities – are commonly replaced by state estimates. While this makes the problem more tractable, one no longer solves the underlying stochastic optimal control problem. The central divergence however lies in Step 2 of the algorithm, in which the SDPE is presumed solved offline for the optimal feedback policies, $g_t(\pi_t)$, from (4). This is an extraordinarily difficult proposition in many cases but captures the optimality, and hence duality, as a closed-loop feedback control law. The complexity of this step lies not only in computing a vector functional but also in the internal propagation of the information state within the SDPE.

4. TRACTABLE NONLINEAR OUTPUT-FEEDBACK MODEL PREDICTIVE CONTROL

In this section, we motivate a novel approach to output-feedback MPC that maintains the separated structure of SMPC while being numerically tractable for modest problem size.

4.1 Approximate Information State & Particle Filter

The BF (5)-(6) propagates the information state $\pi_t$ to implement a necessarily separated stochastic optimal output-feedback control law. While implementing this recursion precisely is possible only in special cases such as linear Gaussian systems, where the densities can be finitely parametrized, the BF can be implemented approximately by means of the Particle Filter, with the approximation improving with the number of particles, as described for instance in Simon (2006). In parallel with the BF, the PF consists of two parts: the forward propagation of the state density, and the resampling of the density using the next measurement. The following algorithm describes a version of the PF amenable to PMPC in the context of this paper. This is a slightly modified version of the filter design described by Simon (2006).

Algorithm 2 Particle Filter (PF)
1: Sample $N_p$ particles, $\{x^-_{0,p},\ p = 1, \dots, N_p\}$, from density $\pi_{0|-1}$.
2: for $t = 0, 1, 2, \dots$ do
3:   Measure $y_t$.
4:   Compute the relative likelihood $q_p$ of each particle $x^-_{t,p}$ conditioned on the measurement $y_t$ by evaluating $\mathrm{pdf}(y_t \mid x^-_{t,p})$ based on (2) and $\mathrm{pdf}(v_t)$.
5:   Normalize $q_p \to q_p / \sum_{p=1}^{N_p} q_p$.
6:   Sample $N_p$ particles, $x^+_{t,p}$, via resampling based on the relative likelihoods $q_p$.
7:   Given $u_t$, propagate $x^-_{t+1,p} = f(x^+_{t,p}, u_t, w_{t,p})$, where $w_{t,p}$ is generated based on $\mathrm{pdf}(w_t)$.
8: end for
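A minimal sketch of Algorithm 2 in code, assuming multinomial resampling and reusing f, sample_x0 and sample_w from the earlier model snippet; the likelihood evaluation instantiates Step 4 for the Section 5 measurement model.

```python
import numpy as np
from scipy.stats import norm

rng_pf = np.random.default_rng(1)

def pf_measurement_update(x_prior, y):
    # Steps 4-6: likelihoods q_p from pdf(y | x), normalization, resampling.
    q = norm.pdf(y, loc=x_prior ** 3 - x_prior, scale=np.sqrt(5.0))
    q = q / q.sum()
    idx = rng_pf.choice(x_prior.size, size=x_prior.size, p=q)
    return x_prior[idx]  # a posteriori particles x+_{t,p}

def pf_time_update(x_post, u):
    # Step 7: propagate through (1) with fresh process noise w_{t,p}.
    return f(x_post, u, sample_w(x_post.size))

# Usage per time step t (Step 1 initializes x_prior = sample_x0(Np)):
#   x_post  = pf_measurement_update(x_prior, y_t)   # after measuring y_t
#   x_prior = pf_time_update(x_post, u_t)           # after applying u_t
```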
While a number of variations of this basic algorithm – such as roughening of the particles and differing resampling strategies, including importance sampling – may be sensible depending on the system at hand, it suffices in presenting a numerical method of approximating the Bayesian Filter to arbitrary degree of accuracy with increasing number of particles $N_p$ (see e.g. Smith and Gelfand (1992)). For a more detailed discussion on the PF for use in state-estimate feedback control, see Rawlings and Mayne (2009).

4.2 Scenario MPC and Particle Model Predictive Control

The Scenario Approach to MPC (e.g. Calafiore and Fagiano (2013); Grammatico et al. (2016); Lee (2014); Mesbah et al. (2014); Schildbach et al. (2014)) commences from state $x_t$ or state estimate, $\hat{x}_{t|t}$. It propagates, i.e. simulates, an open-loop controlled stochastic system with sampled process noise density $\mathrm{pdf}(w_t)$. These propagated samples are then used to evaluate controls for constraint satisfaction and for open-loop optimality with probabilities tied to the sampled $w_t$ densities. In many regards, this is congruent to repeated forward propagation of the PF via (6) without measurement update (5) and commencing from a singular density at $x_t$ or $\hat{x}_{t|t}$. Particle MPC simply replaces the starting point, $\hat{x}_{t|t}$, by the collection of particles $\{x^+_{t,p},\ p = 1, \dots, N_p\}$ distributed as $\pi_t$, as illustrated in Figure 1.
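In code, the only change relative to state-estimate-based scenario generation is where the scenario initial conditions come from. A sketch, assuming the particle and scenario counts Np and Ns and the posterior particle array x_post from the filter sketch above:

```python
# Scenario initial states drawn from the posterior particle set, so
# estimation uncertainty enters the scenario program directly.
idx = rng.integers(0, Np, size=Ns)  # pick Ns seeds among the Np particles
x_scen = x_post[idx]                # scenario initial states x_{t,s}
```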
Fig. 1. State density evolution in: Scenario MPC calculations (dots and solid outlines) and Particle MPC (dashed outlines), for three steps into the future. (Horizontal axis: time, from past to future.)
Before introducing the PMPC algorithm, we define a sampled, particle version of the FHSOCP, with $N_s$ scenarios and $N_p$ available a posteriori particles at time $t$,

$$\tilde{\mathcal{P}}_N(\{x^+_{t,p},\ p = 1, \dots, N_p\}): \quad \begin{aligned} &\inf_{u_t, \dots, u_{t+N-1}} \sum_{s=1}^{N_s} \left[\sum_{k=t}^{t+N-1} c(x_{k,s}, u_k) + c_N(x_{t+N,s})\right] \\ &\text{s.t.}\ \ x_{k+1,s} = f(x_{k,s}, u_k, w_{k,s}), \\ &\qquad x_{t,s} \in \{x^+_{t,p},\ p = 1, \dots, N_p\}, \\ &\qquad \tilde{\mathbf{P}}_{k+1}\left[x_{k+1} \in \mathbb{X}_{k+1}\right] \geq 1 - \epsilon_{k+1}, \\ &\qquad u_k \in \mathbb{U}_k, \\ &\qquad s = 1, \dots, N_s, \quad k = t, \dots, t+N-1, \end{aligned}$$

where the statement $\tilde{\mathbf{P}}_{k+1}[x_{k+1} \in \mathbb{X}_{k+1}] \geq 1 - \epsilon_{k+1}$ means that $x_{k+1,s} \in \mathbb{X}_{k+1}$ for at least $(1 - \epsilon_{k+1}) N_s$ scenarios. Following the approach in Schildbach et al. (2013), one may also choose to replace this constraint by $x_{k+1} \in \mathbb{X}_{k+1}$ and select the number of scenarios $N_s$ according to the desired constraint violation levels $\epsilon_{k+1}$.
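A sketch of how one candidate control sequence can be scored against this sampled program: the cost, constraint and violation level follow the Section 5 example, f and sample_w are reused from the earlier model snippet, and the empirical-fraction test implements the scenario reading of the probabilistic constraint above.

```python
import numpy as np

def evaluate_sequence(u_seq, x_scen, eps=0.1):
    # Scenario-summed cost and empirical chance-constraint check for one
    # fixed sequence (u_t, ..., u_{t+N-1}); eps = 0.1 mirrors Section 5.
    Ns = x_scen.size
    x = x_scen.copy()
    cost, feasible = 0.0, True
    for u in u_seq:
        cost += np.sum(100.0 * x ** 2 + u ** 2)  # stage cost over scenarios
        x = f(x, u, sample_w(Ns))                # x_{k+1,s} = f(x_{k,s}, u_k, w_{k,s})
        if np.mean(x >= 1.0) < 1.0 - eps:        # hold on >= (1 - eps) Ns scenarios
            feasible = False
    cost += np.sum(100.0 * x ** 2)               # terminal cost c_N
    return cost, feasible
```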
We are now in position to formulate the PMPC algorithm following the schematic in Figure 1.

Algorithm 3 Particle Model Predictive Control (PMPC)
1: Generate $N_p$ a priori particles, $x^-_{0,p}$, based on $\pi_{0|-1}$.
2: for $t = 0, 1, 2, \dots$ do
3:   Measure $y_t$.
4:   Compute the relative likelihood $q_p$ of each particle $x^-_{t,p}$ conditioned on the measurement $y_t$ by evaluating $\mathrm{pdf}(y_t \mid x^-_{t,p})$ based on (2) and $\mathrm{pdf}(v_t)$.
5:   Normalize $q_p \to q_p / \sum_{p=1}^{N_p} q_p$.
6:   Generate $N_p$ a posteriori particles, $x^+_{t,p}$, via resampling based on the relative likelihoods $q_p$.
7:   Solve $\tilde{\mathcal{P}}_N(\{x^+_{t,p},\ p = 1, \dots, N_p\})$ for the optimal scenario control values $u_t, \dots, u_{t+N-1}$.
8:   Given $u_t$, propagate $x^-_{t+1,p} = f(x^+_{t,p}, u_t, w_{t,p})$, where $w_{t,p}$ is generated based on $\mathrm{pdf}(w_t)$.
9: end for

4.3 Computational Demand

Computational tractability of PMPC deteriorates with increasing: number of particles; number of scenarios; system dimensions; control signal grid resolution; MPC horizon. While the number of particles required for satisfactory performance of the PF grows exponentially with the state dimension (e.g. Snyder et al. (2008)), it is unclear how to select an appropriate number of scenarios in the nonlinear case. Suppose the state and input dimensions are $n$ and $m$, that the numbers of particles and scenarios are chosen as $N_p = P^n$ and $N_s = S^n$ for positive integers $P$ and $S$, respectively, and that the MPC horizon is $N$. Further assuming a grid of $U^m$ points in the control space and brute-force evaluation of all possible sequences, the order of growth for PMPC is approximately
$$O\!\left(P^n + S^n U^{mN}\right).$$

Notice that the computational demand associated with the conditional density approximation in PMPC is additive in terms of the overall computational demand. This indicates that, provided the PF is computationally tractable for given state dimensions, tractability of PMPC is roughly equivalent to tractability of standard state-feedback SCMPC. In the example below, we found that scenario optimization tends to be the computational bottleneck, at least for low system dimensions. Clearly, this observation holds only when the scenario optimization is performed by explicit enumeration of all feasible sequences over a grid in the control space, which may be avoided for particular problem instances. But the experience also confirms that in the nonlinear case the open- or closed-loop control calculation dominates the computational burden in comparison to state estimation.
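For a finite control grid, the brute-force enumeration just described can be sketched as follows, reusing rng and evaluate_sequence from the earlier snippets; the defaults anticipate the integer grid of the upcoming example, where N = 3 and eleven control values give 11³ = 1331 sequence evaluations per step, in line with the $S^n U^{mN}$ term above.

```python
import numpy as np
from itertools import product

def pmpc_control(x_post, N=3, Ns=1000, u_grid=tuple(range(-5, 6)), eps=0.1):
    # Particles -> scenario initial states, then exhaustive search over all
    # len(u_grid)^N control sequences; returns the first input u_t.
    x_scen = x_post[rng.integers(0, x_post.size, size=Ns)]
    best_u, best_cost = None, np.inf
    for u_seq in product(u_grid, repeat=N):
        cost, feasible = evaluate_sequence(u_seq, x_scen, eps)
        if feasible and cost < best_cost:
            best_u, best_cost = u_seq[0], cost
    return best_u  # None if no sequence met the empirical constraint
```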
5. NUMERICAL EXAMPLE

Consider the scalar, nominally unstable nonlinear system
$$x_{t+1} = 1.5\, x_t + \arctan\!\left((x_t - 1)^2\right) u_t + w_t,$$
$$y_t = x_t^3 - x_t + v_t,$$
where $x_0$, $w_t$ and $v_l$ are mutually independent random variables for all $t, l \geq 0$ and $x_0 \sim \mathcal{U}(1, 2)$, $w_t \sim \mathcal{U}(-2, 2)$, $v_t \sim \mathcal{N}(0, 5)$, for all $t \geq 0$. We aim to minimize the quadratic cost function
$$J_N(\pi_t, u_t, \dots, u_{t+N-1}) = \mathbf{E}_t\left[\sum_{k=t}^{t+N-1} \left(100\, x_k^2 + u_k^2\right) + 100\, x_{t+N}^2\right],$$
while satisfying the constraints
$$\mathbf{P}_{k+1}\left[x_{k+1} \geq 1\right] \geq 0.9, \quad -5 \leq u_k \leq 5,$$
along the control horizon $N$, that is $k \in \{t, \dots, t+N-1\}$ for $t \geq 0$. Notice that this system has limited observability close to the constraint and, since the input gain $\arctan((x_t - 1)^2)$ vanishes at $x_t = 1$, limited controllability there as well, while the unconstrained optimal states are infeasible. In combination with the very noisy measurements, this is a challenging control problem. To implement PMPC as described in Section 4 for this nonlinear stochastic output-feedback control problem, we further restrict the control inputs to integer values, such that $u_t \in \{-5, -4, \dots, 4, 5\}$. Figure 2 displays simulated closed-loop state trajectories, control values and measurement values for four PMPC controllers
with differing parameters, subject to the same realizations of process and measurement noise, respectively. Figure 2a displays closed-loop simulation results under PMPC with horizon N = 3, Np = 5,000 particles and Ns = 1,000 scenarios. While the poor observability properties of the system show close to the probabilistic constraint, the constraint is satisfied at all times in this simulation. This is still the case when decreasing the number of particles to Np = 100 in the simulation displayed in Figure 2b. However, we see how in this case the decreased accuracy of the PF leads to larger state values in closed loop. Similar behavior is observed in Figure 2c when reducing the number of scenarios to Ns = 50. Additionally, the controller violates the probabilistic constraint three times in this case. This trend continues when reducing the horizon to N = 2, as displayed in Figure 2d.

6. CONCLUSION

We presented PMPC as a novel approach to output-feedback control of stochastic nonlinear systems. Generating scenarios not only from the distribution of the process noise but also from the particles of the Particle Filter, PMPC combines the benefits of the Particle Filter and Scenario MPC in a natural fit, allowing for a numerically tractable version of stochastic MPC with general nonlinear dynamics, cost and probabilistic constraints. Given a particular system instance, the algorithm and its properties may be adapted to exploit specific problem structure. Such extensions include: sub-optimal probing via additional constraints; scenario removal; provable closed-loop properties such as constraint satisfaction with specified confidence levels; optimization over parametrized policies.

REFERENCES

Bertsekas, D.P. (1995). Dynamic Programming and Optimal Control. Athena Scientific, Belmont, MA.
Blackmore, L., Ono, M., Bektassov, A., and Williams, B.C. (2010). A probabilistic particle-control approximation of chance-constrained stochastic predictive control. IEEE Transactions on Robotics, 26(3), 502–517.
Calafiore, G.C. and Fagiano, L. (2013). Stochastic model predictive control of LPV systems via scenario optimization. Automatica, 49(6), 1861–1866.
Cannon, M., Cheng, Q., Kouvaritakis, B., and Raković, S.V. (2012). Stochastic tube MPC with state estimation. Automatica, 48(3), 536–541.
Chen, Z. (2003). Bayesian filtering: From Kalman filters to particle filters, and beyond. Statistics, 182(1), 1–69.
Chisci, L., Rossiter, J.A., and Zappa, G. (2001). Systems with persistent disturbances: predictive control with restricted constraints. Automatica, 37(7), 1019–1028.
Copp, D.A. and Hespanha, J.P. (2014). Nonlinear output-feedback model predictive control with moving horizon estimation. In 53rd IEEE Conference on Decision and Control, 3511–3517. Los Angeles, CA.
Grammatico, S., Zhang, X., Margellos, K., Goulart, P., and Lygeros, J. (2016). A scenario approach for non-convex control design. IEEE Transactions on Automatic Control, 61(2), 334–345.
Kumar, P.R. and Varaiya, P. (1986). Stochastic Systems: Estimation, Identification, and Adaptive Control. Prentice-Hall, Englewood Cliffs, NJ.
Lee, J.H. (2014). From robust model predictive control to stochastic optimal control and approximate dynamic programming: A perspective gained from a personal journey. Computers & Chemical Engineering, 70, 114–121.
Maciejowski, J.M. (2002). Predictive Control with Constraints. Prentice Hall, Englewood Cliffs, NJ.
Marafioti, G., Bitmead, R.R., and Hovd, M. (2014). Persistently exciting model predictive control. International Journal of Adaptive Control and Signal Processing, 28(6), 536–552.
Mayne, D.Q. (2014). Model predictive control: Recent developments and future promise. Automatica, 50(12), 2967–2986.
Mayne, D.Q., Raković, S.V., Findeisen, R., and Allgöwer, F. (2009). Robust output feedback model predictive control of constrained linear systems: Time varying case. Automatica, 45(9), 2082–2087.
Mayne, D.Q., Rawlings, J.B., Rao, C.V., and Scokaert, P.O.M. (2000). Constrained model predictive control: Stability and optimality. Automatica, 36(6), 789–814.
Mesbah, A., Streif, S., Findeisen, R., and Braatz, R.D. (2014). Stochastic nonlinear model predictive control with probabilistic constraints. In American Control Conference, 2413–2419. Portland, OR.
Rawlings, J.B. and Mayne, D.Q. (2009). Model Predictive Control: Theory and Design. Nob Hill Publishing, Madison, WI.
Schildbach, G., Fagiano, L., Frei, C., and Morari, M. (2014). The scenario approach for stochastic model predictive control with bounds on closed-loop constraint violations. Automatica, 50(12), 3009–3018.
Schildbach, G., Fagiano, L., and Morari, M. (2013). Randomized solutions to convex programs with multiple chance constraints. SIAM Journal on Optimization, 23(4), 2479–2501.
Schwarm, A.T. and Nikolaou, M. (1999). Chance-constrained model predictive control. AIChE Journal, 45(8), 1743–1752.
Sehr, M.A. and Bitmead, R.R. (2016a). Stochastic model predictive control: Output-feedback, duality and guaranteed performance. Submitted to Automatica.
Sehr, M.A. and Bitmead, R.R. (2016b). Sumptus cohiberi: The cost of constraints in MPC with state estimates. In American Control Conference, 901–906. Boston, MA.
Simon, D. (2006). Optimal State Estimation: Kalman, H∞, and Nonlinear Approaches. John Wiley & Sons, New York, NY.
Smith, A.F.M. and Gelfand, A.E. (1992). Bayesian statistics without tears: a sampling–resampling perspective. The American Statistician, 46(2), 84–88.
Snyder, C., Bengtsson, T., Bickel, P., and Anderson, J. (2008). Obstacles to high-dimensional particle filtering. Monthly Weather Review, 136(12), 4629–4640.
Sui, D., Feng, L., and Hovd, M. (2008). Robust output feedback model predictive control for linear systems via moving horizon estimation. In American Control Conference, 453–458. Seattle, WA.
Yan, J. and Bitmead, R.R. (2005). Incorporating state estimation into model predictive control and its application to network traffic control. Automatica, 41(4), 595–604.
[Figure 2: four panels of closed-loop traces ($x_t$, $u_t$, $y_t$ versus time $t$):
(a) N = 3, Np = 5,000, Ns = 1,000.
(b) N = 3, Np = 100, Ns = 1,000.
(c) N = 3, Np = 5,000, Ns = 50.
(d) N = 2, Np = 5,000, Ns = 1,000.]
Fig. 2. Simulation data for example in Section 5 over 30 samples, running PMPC with control horizon N , number of particles Np and number of scenarios Ns . State, control and measurement values (blue), probabilistic and hard constraints (red), 95% confidence interval of PF (black). All controllers are subject to the same realization of the process noise wk .