Optimization-based state estimation: Current status and some new results

Journal of Process Control 22 (2012) 1439–1444

James B. Rawlings∗, Luo Ji
Dept. of Chemical and Biological Engineering, Univ. of Wisconsin, Madison, WI, USA

Article history: Received 2 August 2011; Received in revised form 15 February 2012; Accepted 4 March 2012; Available online 13 April 2012.

Keywords: Moving horizon estimation; Nonlinear state estimation; Constraints; Incremental input/output-to-state stability

Abstract

This paper presents an overview of the fundamentals of moving horizon state estimation and full information estimation. The paper provides a new statement of robust global asymptotic stability (RGAS) for state estimation and establishes a new result using this definition, namely that full information is RGAS for the case of a nonlinear detectable system subject to β-convergent state and measurement disturbances. Two unsolved research problems are presented: (i) suboptimal moving horizon estimation and (ii) proving RGAS for the case of bounded rather than convergent disturbances.

© 2012 Elsevier Ltd. All rights reserved.

☆ This work was supported by the TWCCC members.
∗ Corresponding author. Tel.: +1 608 263 5859; fax: +1 608 265 8794. E-mail addresses: [email protected] (J.B. Rawlings), [email protected] (L. Ji).
doi:10.1016/j.jprocont.2012.03.001

1. Introduction

When I (JBR) was asked to contribute to a special issue in memoriam of Ken Muske, I knew right away that I wanted to contribute a paper on the subject of moving horizon estimation. I have several reasons for this choice. First of all, Ken also worked on this problem. In fact, he was the first graduate student in our group to work on this problem. Although Ken's papers on model predictive control received more attention, I think the early work that he did on moving horizon estimation was just as insightful. A second reason is that moving horizon estimation has received far less attention as a subject compared to model predictive control. This difference might be expected simply due to the phenomenal success of model predictive control. After all, how many different topics can simultaneously enjoy tremendous popularity in a research community with finite attention capacity? But there are other factors that contribute to the fact that moving horizon estimation has received less attention than model predictive control. Both topics, control and estimation, are studied by researchers in the control community, and control people like to design controllers. Controllers move systems around; they are inherently exciting. Estimators, on the other hand, just look at data and draw inferences about the system. They don't move the system around. They are passive. Of course, we cannot know where to drive if we don't know where we are, but most control researchers seem to prefer the actuator end rather than the sensor end of the problem. And given the much-trumpeted duality between estimation and control, one might expect that new results could be easily transferred between the two subjects anyway. However, that has not proven to be the case.

Another issue that may hinder many researchers from making quick exploratory forays into state estimation is the inherent problem complexity. The simplest control problem is regulation to the origin, and the study of that simplest problem is still far from complete. There is no serious state estimation problem without an entire sequence of measurements. And the appearance of a sequence of measurements significantly complicates the notation and basic state estimation problem statements. It's more difficult for newcomers to get a foothold in this subject. It would be as if every control problem were stated as a tracking problem with a time-varying setpoint sequence. If that were how one had to get started in control problems, then state estimation would be about the same complexity and would probably compete better for new researchers' attention.

Thirdly, much of the state estimation theoretical literature is devoted to the study of stochastic systems so that the sensor and process disturbances can be modeled as random variables (noises) with specified probability distributions. Although the stochastic setting provides the necessary framework to elucidate important statistical properties of the state estimator (bias, variance, statistical optimality), the fundamental notions of system observability and estimator stability and robustness, which are critical to successful application of state estimators, receive almost no attention in this setting. Coupled with the relatively large amount of mathematical probability theory required to treat stochastic systems rigorously, coming up to speed on this literature poses significant hurdles to potential control researchers who may not care all that much about a proposed state estimator's statistical properties but do care a great deal about its insensitivity to model errors.


The goal of this paper therefore is to bring the reader up to speed on the progress in optimization-based state estimation during the last 20 years. We focus on the fundamental problem statements and the properties we expect of state estimators so that they have a good chance of working on difficult, nonlinear applications. Good numerical software for solving the state estimation problems is also critical for success in applications, but we will not address that issue here. We would like to convey the state estimation results in a more modern and compact mathematical language than has been used in most previous papers (including ours), and we would like to make a few conjectures that may prove motivational for new researchers who are looking for good, open problems to tackle.

2. Notation and properties of K and KL functions

The symbols I≥0 and R≥0 denote the sets of nonnegative integers and reals, respectively. The symbol I_{0:N−1} denotes the set {0, 1, . . ., N − 1}. The symbol |·| denotes the Euclidean norm. The bold symbol x denotes a sequence of the vector-valued variable x, {x(0), x(1), . . .}. The notation ‖x‖ is the sup norm over a sequence, sup_{i≥0} |x(i)|, and ‖x‖_{a:b} denotes max_{a≤i≤b} |x(i)|. The definition of system detectability and statements and proofs of estimator stability are significantly streamlined using the properties of K and KL functions, so we provide a brief summary here. The interested reader may also want to consult Khalil [1, pp. 144–147] and [9, Appendix B] for further discussion.

Definition 1 (K, K∞, and KL functions). A function α : R≥0 → R≥0 belongs to class K if it is continuous, zero at zero, and strictly increasing; α : R≥0 → R≥0 belongs to class K∞ if it is a class K function and unbounded (α(s) → ∞ as s → ∞). A function β : R≥0 × I≥0 → R≥0 belongs to class KL if it is continuous and if, for each t ∈ I≥0, β(·, t) is a class K function and, for each s ≥ 0, β(s, ·) is nonincreasing and satisfies lim_{t→∞} β(s, t) = 0.

The following useful properties of these functions are established in Khalil [1, Lemma 4.2]: if α1(·) and α2(·) are K functions (K∞ functions), then α1⁻¹(·) and (α1 ∘ α2)(·)¹ are K functions (K∞ functions). Moreover, if α1(·) and α2(·) are K functions and β(·) is a KL function, then χ(r, s) = α1(β(α2(r), s)) is a KL function. We require the following basic inequalities to streamline our presentation. Brief proofs of these properties are given in Appendix A.

1. For α(·) ∈ K, the following holds for all ai ∈ R≥0, i ∈ I_{1:n}

α(a1 + a2 + · · · + an) ≤ α(na1) + α(na2) + · · · + α(nan)
α(a1 + a2 + · · · + an) ≥ (1/n)(α(a1) + α(a2) + · · · + α(an))    (1)

2. Similarly, for β(·) ∈ KL the following holds for all ai ∈ R≥0, i ∈ I_{1:n}, and all t ∈ I≥0

β(a1 + a2 + · · · + an, t) ≤ β(na1, t) + β(na2, t) + · · · + β(nan, t)
β(a1 + a2 + · · · + an, t) ≥ (1/n)(β(a1, t) + β(a2, t) + · · · + β(an, t))    (2)
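As a quick numerical illustration (not part of the paper's development), the following sketch spot-checks inequality (1) for one sample K-function, α(s) = √s, and one arbitrary choice of nonnegative values:

```python
import math

# Spot-check of inequality (1) for a sample K-function.
# alpha(s) = sqrt(s) is continuous, zero at zero, and strictly increasing,
# so it belongs to class K (indeed class K-infinity).
def alpha(s):
    return math.sqrt(s)

a = [1.0, 4.0, 9.0]   # arbitrary nonnegative values, n = 3
n = len(a)

upper = sum(alpha(n * ai) for ai in a)      # alpha(n*a1) + ... + alpha(n*an)
lower = sum(alpha(ai) for ai in a) / n      # (1/n)(alpha(a1) + ... + alpha(an))
mid = alpha(sum(a))                          # alpha(a1 + ... + an)

assert lower <= mid <= upper                 # inequality (1) holds
```

A numeric check of course proves nothing; it merely makes the direction of the two bounds concrete before they are used repeatedly in the proofs below.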

3. Basic definitions and assumptions

We assume that the system generating the measurements is given by the standard discrete time, nonlinear system

x⁺ = f(x, w)
y = h(x) + v    (3)

The state of the system is x ∈ Rⁿ, the measurement is y ∈ Rᵖ, and the notation x⁺ means x at the next sample time. A control input u may be included in the model, but it is considered a known variable, and its inclusion is irrelevant to state estimation, so we suppress it in the model under consideration here. We receive measurement y from the sensor, but the process disturbance, w ∈ Rᵍ, measurement disturbance, v ∈ Rᵖ, and system initial state, x(0), are considered unknown variables. These are often modeled as independent random variables with stationary probability densities in stochastic estimation theory, but we will avoid random variables in this discussion and consider completely deterministic systems. We therefore model w, v, x(0) as unknown, but bounded, disturbance variables. After this choice, we cannot speak about the statistical properties of the estimator, but we can discuss estimator stability, rate of convergence, and the sensitivity of the estimate error to the disturbances. We assume throughout that the functions f : Rⁿ × Rᵍ → Rⁿ and h : Rⁿ → Rᵖ are continuous.

4. Full information estimation

The most accessible theory that we can present is the theory of full information estimation. Full information estimation also has the best theoretical properties in terms of stability and optimality. Unfortunately, it is also computationally intractable except for the simplest cases, such as a linear system model. Its value therefore lies in clearly defining what is desirable in a state estimator. One method for practical estimator design therefore is to come as close as possible to the properties of full information estimation while maintaining a tractable online computation. This design philosophy leads directly to moving horizon estimation (MHE). First we define some notation necessary to distinguish the system variables from the estimator variables. We have already introduced the system variables (x, w, y, v).
In the estimator optimization problem, these have corresponding decision variables, which we denote (χ, ω, η, ν). The optimal decision variables are denoted (x̂, ŵ, ŷ, v̂), and these optimal decisions are the estimates provided by the state estimator. The relationships between these variables are

x⁺ = f(x, w)      y = h(x) + v
χ⁺ = f(χ, ω)      y = h(χ) + ν
x̂⁺ = f(x̂, ŵ)     y = h(x̂) + v̂

Notice that it is always the system measurement y that appears in the second column of equations. We begin with a reasonably general definition of the full information estimator that produces an estimator that is stable, which we also shall define subsequently. The full information objective function is

V_T(χ(0), ω) = ℓ_x(χ(0) − x̄0) + Σ_{i=0}^{T−1} ℓ_i(ω(i), ν(i))    (4)

subject to

χ⁺ = f(χ, ω)
y = h(χ) + ν

in which T is the current time, y(i) is the measurement at time i, and x̄0 is the prior information on the initial state. Because ν = y − h(χ) is the error in fitting the measurement y, ℓ_i(ω, ν) costs the model disturbance and the fitting error. These are the two error sources that we reconcile in all state estimation problems. The full information estimator is then defined as the solution to

min_{χ(0), ω} V_T(χ(0), ω)    (5)

¹ (α1 ∘ α2)(·) is the composition of the two functions α1(·) and α2(·), defined by (α1 ∘ α2)(s) := α1(α2(s)).
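To make the optimization concrete, here is a minimal full information sketch (illustrative only, not a construction from the paper): a scalar system x⁺ = a·x + w, y = x + v with quadratic stage costs ℓ_x(s) = s², ℓ_i(ω, ν) = ω² + ν², minimized over (χ(0), ω) by plain finite-difference gradient descent. The particular values a = 0.9, T = 10, and the decaying disturbance are arbitrary choices for the demonstration.

```python
# Minimal full information estimation sketch (illustrative; quadratic costs,
# scalar system x+ = a*x + w, y = x + v, solved by crude gradient descent).
a = 0.9
T = 10
w_true = [0.5 * 0.5 ** k for k in range(T)]      # a beta-convergent disturbance
x_true = [1.0]
for k in range(T):
    x_true.append(a * x_true[-1] + w_true[k])
y = x_true[:T]                                    # noise-free measurements (v = 0)
xbar0 = 0.0                                       # prior on the initial state

def cost(z):
    # z = [chi(0), omega(0), ..., omega(T-1)]; objective (4) with quadratic costs
    chi, J = z[0], (z[0] - xbar0) ** 2
    for i in range(T):
        nu = y[i] - chi                           # fitting error nu = y - h(chi)
        J += z[1 + i] ** 2 + nu ** 2
        chi = a * chi + z[1 + i]                  # chi+ = f(chi, omega)
    return J

def grad(z, h=1e-6):
    g = []
    for i in range(len(z)):
        zp, zm = list(z), list(z)
        zp[i] += h
        zm[i] -= h
        g.append((cost(zp) - cost(zm)) / (2 * h))
    return g

z = [0.0] * (T + 1)
for _ in range(4000):
    z = [zi - 0.02 * gi for zi, gi in zip(z, grad(z))]

# Optimality implies the optimal cost cannot exceed the cost of the true
# (x(0), w) candidate -- the bound used to define V_inf in the analysis below.
assert cost(z) <= cost([x_true[0]] + w_true) + 1e-6
```

The final assertion is exactly the elementary bound V_T⁰ ≤ V_T(x(0), w): the true initial state and disturbance sequence are always a feasible candidate for the estimator.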

The solution to the optimization exists for all T ∈ I≥0 because V_T(·) is continuous, due to the continuity of f(·) and h(·), and because V_T(·) is an unbounded function of its arguments, as will be clear after stage costs ℓ_x(·) and ℓ_i(·) are defined. We denote the solution as x̂(0|T), ŵ(i|T), 0 ≤ i ≤ T − 1, T ≥ 1, and the optimal cost as V_T⁰. We also use x̂(T) := x̂(T|T) to simplify the notation. The optimal solution and cost also depend on the measurement sequence y and the prior x̄0, but this dependency is made explicit only when necessary. The choice of stage costs ℓ_x(·) and ℓ_i(·) is made after we define the class of disturbances affecting the system.

It is interesting to trace the development of results for full information estimation and its approximation by moving horizon estimation. The early papers assumed some form of system observability, ignored disturbances, and focused on establishing stability given an unknown system initial condition. Both linear models with constraints [6,2] and nonlinear models were investigated [3,8]. Statistical properties were established for the linear, constrained case [11]. Of course, assuming observability is restrictive even for linear systems, and a better notion of detectability for nonlinear systems was needed. A necessary step forward was taken by Sontag and Wang with the introduction of an appropriate "detectability" definition for nonlinear systems, called incremental input/output-to-state stability (i-IOSS) [14].

Definition 2 (i-IOSS). The system x⁺ = f(x, w), y = h(x) is incrementally input/output-to-state stable (i-IOSS) if there exist functions α1(·) ∈ KL and γ1(·), γ2(·) ∈ K such that for every two initial states z1 and z2, and any two disturbance sequences w1 and w2 generating state sequences x1(z1, w1) and x2(z2, w2), the following holds for all k ∈ I≥0

|x(k; z1, w1) − x(k; z2, w2)| ≤ α1(|z1 − z2|, k) + γ1(‖w1 − w2‖_{0:k−1}) + γ2(‖h(x1) − h(x2)‖_{0:k−1})    (6)
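A simple way to build intuition for inequality (6) is a contractive scalar example (illustrative only, not from the paper): for x⁺ = a·x + w with |a| < 1 and y = x, one may take α1(s, k) = |a|ᵏ·s ∈ KL and γ1(s) = s/(1 − |a|) ∈ K, and for this system the output term γ2 is not even needed. The following sketch checks that bound along two arbitrary trajectories:

```python
# Illustrative check of an i-IOSS-type bound (6) for the contractive scalar
# system x+ = a*x + w with |a| < 1:
#   |x1(k) - x2(k)| <= |a|**k * |z1 - z2| + ||w1 - w2||_{0:k-1} / (1 - |a|)
a = 0.7

def traj(z, w):
    x = [z]
    for wk in w:
        x.append(a * x[-1] + wk)
    return x

K = 30
w1 = [0.3 * (-1) ** k for k in range(K)]   # two arbitrary disturbance sequences
w2 = [0.1] * K
x1, x2 = traj(2.0, w1), traj(-1.5, w2)     # two arbitrary initial states

ok = True
for k in range(K + 1):
    dw = max([abs(u - v) for u, v in zip(w1[:k], w2[:k])], default=0.0)
    bound = abs(a) ** k * abs(x1[0] - x2[0]) + dw / (1 - abs(a))
    ok = ok and abs(x1[k] - x2[k]) <= bound + 1e-9
assert ok
```

The bound follows directly from unrolling the difference equation: the initial-state term decays geometrically while the disturbance term is summed against the geometric series 1/(1 − |a|).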


The notation x(k; x0, w) denotes the solution to x⁺ = f(x, w) satisfying initial condition x(0) = x0 with disturbance sequence w = {w(0), w(1), . . .}. The notation h(x1) is then defined as {h(x(0; z1, w1)), h(x(1; z1, w1)), . . .}.

Given this new definition of detectability for nonlinear systems, full information and moving horizon estimation were shown to be asymptotically stable for the case of unknown initial state subject to convergent disturbances [9, Chapter 4]. We define these next.

Definition 3 (Bounded sequence; set B). A sequence w(k), k ∈ I≥0 is bounded if ‖w‖ is finite. The set of bounded sequences is denoted by B.

Definition 4 (Convergent sequence; set C). A bounded sequence w(k), k ∈ I≥0 is convergent if |w(k)| → 0 as k → ∞. Let C denote the set of convergent sequences

C := {w ∈ B | w is convergent}

The difficulty in trying to treat the entire class of convergent sequences in one go is that there is no slowest decaying convergent sequence. To circumvent this technical difficulty, we strengthen slightly the definition of convergence to β-convergence as follows.

Definition 5 (β-convergent sequence; set Cβ). Given a β(·) ∈ KL, a bounded sequence w(k), k ∈ I≥0 is termed β-convergent if

|w(k)| ≤ β(‖w‖, k)   for all k ≥ 0

The set of β-convergent sequences is denoted Cβ

Cβ := {w ∈ B | w is β-convergent}

The next proposition shows the key relationships between convergent sequences and β-convergent sequences.

Proposition 6.
1. Every convergent sequence is β-convergent for some β ∈ KL. In terms of sets, for every w ∈ C, there exists β(·) ∈ KL such that w ∈ Cβ.
2. For any given β(·) ∈ KL, there exist convergent sequences that are not β-convergent for this β(·). In terms of sets, for every β(·) ∈ KL, Cβ ⊂ C, i.e., Cβ is a strict subset of C.

A proof is given in Appendix A. We will assume β-convergence of the disturbances in this note so that we can ensure the decay rate given by the chosen KL-function. Due to the first part of Proposition 6, if one has a slowest converging disturbance in mind, one can choose an appropriate β(·) ∈ KL so that this disturbance is β-convergent and handled by the state estimator.

Assumption 7 (β-convergent disturbances). The sequences w, v ∈ Cβ for a chosen β ∈ KL.

Assumption 8 (Positive definite stage cost). The initial state cost and stage costs are continuous functions and satisfy the following inequalities for all x ∈ Rⁿ, w ∈ Rᵍ, and v ∈ Rᵖ

γ̲_x(|x|) ≤ ℓ_x(x) ≤ γ̄_x(|x|)    (7)

γ̲_w(|w|) + γ̲_v(|v|) ≤ ℓ_i(w, v) ≤ γ̄_w(|w|) + γ̄_v(|v|),   i ≥ 0    (8)

in which γ̲_x, γ̲_w, γ̲_v, γ̄_x, γ̄_w, γ̄_v ∈ K∞. Also γ̄_w, γ̄_v satisfy the requirements of Proposition 13, i.e., there exist γ̃_w, γ̃_v ∈ K such that the following holds for all w, v ∈ Cβ

Σ_{i=0}^{∞} ℓ_i(w(i), v(i)) ≤ γ̃_w(‖w‖) + γ̃_v(‖v‖)

Definition 9 (Robust global asymptotic stability (RGAS)). The estimate is based on the noisy measurement y = h(x(x0, w)) + v. The estimate is RGAS if for all x0 and x̄0, and bounded (w, v), there exist functions α(·) ∈ KL and δ_w(·), δ_v(·) ∈ K such that the following holds for all k ∈ I≥0

|x(k; x0, w) − x(k; x̂(0|k), ŵ_k)| ≤ α(|x0 − x̄0|, k) + δ_w(‖w‖_{0:k−1}) + δ_v(‖v‖_{0:k−1})    (9)

in which ŵ_k := {ŵ(0|k), ŵ(1|k), . . ., ŵ(k − 1|k)} denotes the optimal disturbance estimate using measurements through time k.

Remark 10. The main import of the RGAS definition is that the dynamic system generating the estimate error is input-to-state stable (ISS) [13] considering the disturbances (w, v) as the input.

We require a preliminary proposition concerning the behavior of the estimate error to establish the main result of interest.

Proposition 11 (Boundedness and convergence of estimate error). Consider an i-IOSS (detectable) system and measurement sequence generated by (3) with disturbances satisfying Assumption 7, and stage cost satisfying Assumption 8. Then the full information estimate error satisfies the following:

1. For all k ≥ 0

|x(k; x0, w) − x(k; x̂(0|k), ŵ_k)| ≤ π_x(|x0 − x̄0|) + π_w(‖w‖) + π_v(‖v‖)

2. As k → ∞, |x(k; x0, w) − x(k; x̂(0|k), ŵ_k)| → 0


Proof. Part 1. First we note that the optimal full information cost function is bounded above for all T ≥ 0 by V∞, defined by the following

V_T⁰ = V_T(x̂(0|T), ŵ_T) ≤ V_T(x(0), w) = ℓ_x(x0 − x̄0) + Σ_{i=0}^{T−1} ℓ_i(w(i), v(i)) ≤ ℓ_x(x0 − x̄0) + Σ_{i=0}^{∞} ℓ_i(w(i), v(i)) := V∞

From Assumption 8 we have the following upper bound for V∞

V∞ ≤ γ̄_x(|x0 − x̄0|) + γ̃_w(‖w‖) + γ̃_v(‖v‖)    (10)

Using the triangle inequality, we have for all k ≥ 0

|x0 − x̂(0|k)| ≤ |x0 − x̄0| + |x̂(0|k) − x̄0|

and since for all k ≥ 0, ℓ_x(x̂(0|k) − x̄0) ≤ V∞, the lower bound for ℓ_x(·) in Assumption 8 gives |x̂(0|k) − x̄0| ≤ γ̲_x⁻¹(V∞). Substituting in (10), noting that γ̲_x⁻¹(·) is a K-function, and using (1) for the sum gives

|x0 − x̂(0|k)| ≤ |x0 − x̄0| + γ̲_x⁻¹(3γ̄_x(|x0 − x̄0|)) + γ̲_x⁻¹(3γ̃_w(‖w‖)) + γ̲_x⁻¹(3γ̃_v(‖v‖))

Combining these terms gives

|x0 − x̂(0|k)| ≤ σ_x(|x0 − x̄0|) + σ_w(‖w‖) + σ_v(‖v‖)    (11)

for σ_x, σ_w, σ_v ∈ K. Next we develop a bound for the term ‖w − ŵ_k‖_{0:k−1}. From the triangle inequality and the definition of the sup norm we have

‖w − ŵ_k‖_{0:k−1} ≤ ‖w‖_{0:k−1} + ‖ŵ_k‖_{0:k−1}

Now we require a bound for ‖ŵ_k‖_{0:k−1}. We have from the definition of V∞ that γ̲_w(|ŵ(i|k)|) ≤ V∞ for k ≥ 0, i ≤ k. This implies ‖ŵ_k‖_{0:k−1} ≤ γ̲_w⁻¹(V∞). Substituting (10) into this result and using the previous displayed equation and (1) gives

‖w − ŵ_k‖_{0:k−1} ≤ ρ_x(|x0 − x̄0|) + ρ_w(‖w‖) + ρ_v(‖v‖)    (12)

for ρ_x, ρ_w, ρ_v ∈ K. Finally, notice that the same reasoning applies to ‖v − v̂_k‖_{0:k−1}, yielding

‖v − v̂_k‖_{0:k−1} ≤ τ_x(|x0 − x̄0|) + τ_w(‖w‖) + τ_v(‖v‖)    (13)

for τ_x, τ_w, τ_v ∈ K. We substitute (11), (12), and (13) into the definition of i-IOSS, (6), set k = 0 in the KL function α1(·, k), and use (1) to obtain the following bound, valid for all k ≥ 0

|x(k; x0, w) − x(k; x̂(0|k), ŵ_k)| ≤ π_x(|x0 − x̄0|) + π_w(‖w‖) + π_v(‖v‖)

in which π_x, π_w, π_v ∈ K. Notice from the properties of K-functions that the right-hand side is the sum of K-functions of the three disturbances |x0 − x̄0|, ‖w‖, and ‖v‖, and we have established the bound in the first part of the proposition.

Part 2. First we establish that, as k → ∞,

ŵ(j|k), v̂(j|k) → 0   for all k − M ≤ j ≤ k, M ≥ 1    (14)

If we substitute the optimal ŵ_T, v̂_T at time T into the state estimation problem at time T − M for M ≥ 1, optimality at time T − M implies

V_{T−M}⁰ ≤ V_T⁰ − Σ_{i=T−M}^{T−1} ℓ_i(ŵ(i|T), v̂(i|T))

This shows that the sequence of optimal costs {V_T⁰} is nondecreasing with time T, and since V_T⁰ is bounded above (by V∞), the sequence converges as T → ∞.² Rearranging this inequality and using (8) gives

Σ_{i=T−M}^{T−1} [γ̲_w(|ŵ(i|T)|) + γ̲_v(|v̂(i|T)|)] ≤ V_T⁰ − V_{T−M}⁰

Because the right-hand side goes to zero as T → ∞ for all M ≥ 1, using the properties of K functions then establishes (14). Next choose any ε > 0. From the i-IOSS assumption and the structure of the state model, (3), we have for all k, l ≥ 0

|x(k + l; x0, w) − x(k + l; x̂(0|k + l), ŵ_{k+l})| ≤ α1(|x(k) − x̂(k)|, l) + γ1(‖w − ŵ_{k+l}‖_{k:k+l−1}) + γ2(‖v − v̂_{k+l}‖_{k:k+l−1})

Because of Assumption 7 and (14), we can choose K ≥ 0 such that

γ1(‖w − ŵ_{k+l}‖_{k:k+l−1}) ≤ ε/3,   γ2(‖v − v̂_{k+l}‖_{k:k+l−1}) ≤ ε/3

for all k ≥ K, l ≥ 0. Next, because α1(·) ∈ KL, we can choose L ≥ 0, which depends on K, such that

α1(|x(k) − x̂(k)|, l) ≤ ε/3   for all l ≥ L(K), k ≥ K

Combining these inequalities gives

|x(j) − x̂(j)| ≤ ε   for all j ≥ K + L(K)

and we have thus shown that |x(k) − x̂(k)| → 0 as k → ∞, and part 2 is established. □

² This argument was the main new insight exploited in Ken's early papers on stability of MHE.

Finally, we present a theorem summarizing what is currently known about the theoretical properties of full information estimation.

Theorem 12 (RGAS of full information estimate for β-convergent disturbances). Consider an i-IOSS (detectable) system and measurement sequence generated by (3) with disturbances satisfying Assumption 7, and stage cost satisfying Assumption 8. Then the full information estimator is RGAS.

Proof. Because Proposition 11 applies, we know that the estimate error is bounded and converges. Therefore, from the first part of Proposition 6, there exists α(·) ∈ KL such that for all k ≥ 0

|x(k) − x̂(k)| ≤ α(π_x(|x0 − x̄0|) + π_w(‖w‖) + π_v(‖v‖), k)

From (2) we then have existence of α_x, α_w, α_v ∈ KL such that

|x(k) − x̂(k)| ≤ α_x(|x0 − x̄0|, k) + α_w(‖w‖, k) + α_v(‖v‖, k)    (15)

Note that this result is stronger than we require for RGAS. To establish RGAS, we choose the KL-function α(·) = α_x(·) and the two K functions δ_w(·) = α_w(·, 0) and δ_v(·) = α_v(·, 0) to give

|x(k) − x̂(k)| ≤ α(|x0 − x̄0|, k) + δ_w(‖w‖) + δ_v(‖v‖)

Since w(j), v(j) for j ≥ k affect neither x(k) nor x̂(k), this result also implies that

|x(k) − x̂(k)| ≤ α(|x0 − x̄0|, k) + δ_w(‖w‖_{0:k−1}) + δ_v(‖v‖_{0:k−1})

and the estimate error satisfies (9), and RGAS has been established. □

Notice that we can achieve KL functions of the sup norm of the disturbances in (15) because the disturbances are assumed β-convergent here. But if we wish to assume only bounded disturbances, then the KL functions will have to become merely K functions; notice that is all that is required to establish RGAS, as shown by (9).
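The monotonicity argument behind footnote 2 is easy to watch numerically. The following sketch (illustrative only; the scalar system, quadratic costs, and measurement record are arbitrary choices, and each V_T⁰ is computed by crude gradient descent) checks that the optimal full information costs form a nondecreasing sequence in T:

```python
# Numerical illustration of the monotonicity of the optimal costs {V_T^0}
# for a scalar example x+ = a*x + w, y = x + v with quadratic stage costs.
a = 0.9
y = [1.0, 0.7, 0.9, 0.6, 0.8, 0.5]   # an arbitrary measurement record
xbar0 = 0.0                           # prior on the initial state

def cost(z, T):
    # z = [chi(0), omega(0), ..., omega(T-1)]
    chi, J = z[0], (z[0] - xbar0) ** 2
    for i in range(T):
        J += z[1 + i] ** 2 + (y[i] - chi) ** 2
        chi = a * chi + z[1 + i]
    return J

def optimal_cost(T, iters=6000, lr=0.02, h=1e-6):
    # crude finite-difference gradient descent to (near) optimality
    z = [0.0] * (T + 1)
    for _ in range(iters):
        g = []
        for i in range(len(z)):
            zp, zm = list(z), list(z)
            zp[i] += h
            zm[i] -= h
            g.append((cost(zp, T) - cost(zm, T)) / (2 * h))
        z = [zi - lr * gi for zi, gi in zip(z, g)]
    return cost(z, T)

V = [optimal_cost(T) for T in range(1, len(y) + 1)]
ok = all(V[i] <= V[i + 1] + 1e-6 for i in range(len(V) - 1))
assert ok
```

This is precisely the inequality V_{T−M}⁰ ≤ V_T⁰ from the proof: since every stage cost is nonnegative, truncating the optimal decision sequence at time T gives a feasible candidate at time T − M with no larger cost.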


5. Some open research problems

Next we would like to discuss two issues of both practical and theoretical interest that remain unresolved in current state estimation research. The authors would like to encourage researchers looking for interesting, open problems to take a look at these.

1. Suboptimal MHE. One of the difficulties of implementing MPC or MHE on nonlinear models is the nonconvex optimization that must be solved online. Since there is no guarantee that one can find a global minimum of a nonconvex function during the sample time, researchers have explored suboptimal MPC, in which one iterates an optimization algorithm until the sample time expires, and then uses whatever solution has been found for the control. Recent research has shown that suboptimal MPC with an appropriate warm start enjoys the same stability and robustness properties as optimal MPC [7]. This approach significantly lowers the implementation barrier for nonlinear MPC. The natural question arises: is there a suboptimal MHE that has the same properties as optimal MHE? At the time of this writing, no one has successfully formulated such a suboptimal MHE.

2. MHE with bounded disturbances. In the history of the theoretical development of MHE, the field has gradually moved from simple, idealized cases to address the complexities of industrial implementation. We have seen progress from using nonlinear observability and zero disturbances to nonlinear detectability with asymptotically converging disturbances. It is natural to ask if we can now finally address the industrially important case of bounded disturbances in MHE as we do in nonlinear MPC. Consider the following conjecture.

Conjecture 13 (RGAS of full information estimate for bounded disturbances). Given an i-IOSS (detectable) system and measurement sequence generated by (3) with bounded disturbances (w, v) ∈ B, the full information estimate is RGAS.

At the time of this writing, we do not know if this conjecture is true, and, if true, we do not know the suitable stage cost assumption to replace Assumption 8, nor the appropriate analysis to establish the conjecture.

Some recollections of Ken Muske

I (JBR) met Ken when he joined the PhD program at the University of Texas in January 1990. I had joined the department as an assistant professor only a few years earlier. Ken had a great deal of industrial experience implementing advanced control and model predictive control working with Setpoint, one of the main providers of industrial MPC technology at the time. Ken was interested in working on a PhD in the area of model predictive control. I was already interested in MPC, but mainly as a new method for controlling nonlinear systems, which was the PhD topic of both John Eaton and Scott Meadows. Ken was mainly interested in constrained, linear systems. So we set out to take a look at that problem together. I enjoyed working with Ken right away. He would explain to me how he addressed various issues in industrial implementations, and I would explain to him what the theory provided that was not being exploited in the current industrial technology. Our different backgrounds gave each of us the opportunity to learn a great deal from the other. Ken quickly assimilated the control theory that was useful for MPC, and started adding to it. Then he recast his previous industrial solutions and showed how we could change them to take full advantage of the newly emerging theory. This was the basis for his first major paper, "Model Predictive Control with Linear Models," that appeared in AIChE J in 1993 [4].


Ken would later receive the AIChE CAST Division’s Ted Peterson Student Paper Award (now W. David Smith, Jr. Graduate Publication Award) for this contribution to the research literature. Ken next turned to moving horizon estimation, and contributed to this new approach to state estimation with several papers, treating both constrained, linear and nonlinear systems [6,5]. Ken and I had the good fortune to collaborate with Jay Lee and David Mayne on the state estimation problem. After he received his PhD, Ken and I saw each other often at research meetings. We also went on several canoe trips together in the Quetico Provincial Park in Ontario, Canada. Bob Bird and Dale Rudd, of chemical engineering fame at the University of Wisconsin, introduced Ken to the park on our first 2-week trip in 1996. Ken and I spent a lot of time fishing together on these trips. I had previously thought that fishing was mainly a matter of luck rather than skill, but Ken convinced me otherwise. He consistently caught significantly more fish than the rest of us. I used to enjoy even watching Ken fish, he was so good at it. I also found out on these trips that Ken was a very skillful and strong canoe paddler. On one fishing expedition I borrowed one of Bob’s prize lures and managed to snag it in a low tree overhanging the spot near shore where Ken and I were fishing. We did not want to return to camp without that lure! So Ken steadied the canoe in place by paddling in a back and forth motion while I stood on top of the yoke in the center of the canoe and tried to free the lure by reaching overhead with my paddle. Of course, we both knew that this was not recommended practice in the official canoeist’s handbook, but we desperately wanted that lure, and it was our only chance. After nearly tipping the canoe a dozen times, we finally freed that lure. As a graduate student, Ken did not always listen to me, of course. 
I recall reading the first draft of his PhD thesis and noticing that not a single figure appeared in the entire thesis. I told Ken that I thought that most people think in pictures, and treating the reader to hundreds of pages of equations without a single figure would not be highly appreciated. But Ken insisted and told me that figures would only detract from his message.³ In memory of Ken, I tried here to write a paper without a single figure. I don't recall ever trying to do this before, and I cannot recommend making a habit of it. Systems theory is a challenging subject, and pictures usually help convey concepts and ideas, even in proofs; perhaps, especially in proofs.

³ You can find Ken's thesis here: http://jbrwww.che.wisc.edu/theses/muske.ps.

Acknowledgments

The authors gratefully acknowledge the financial support of the industrial members of the Texas-Wisconsin-California Control Consortium. The first author would like to acknowledge several colleagues whose helpful collaborations over many years have taught him a lot about the subject of state estimation: Ken Muske, Scott Meadows, John Campbell, Peter Findeisen, Chris Rao, Eric Haseltine, Murali Rajamani, Tyler Soderstrom, Fernando Lima, Jay Lee, Bhavik Bakshi, Joe Qin, and David Mayne.

Appendix A. Technical lemmas

To establish (1), notice that

a1 + a2 + · · · + an ≤ n max_i ai

and since K functions are increasing,

α(a1 + a2 + · · · + an) ≤ α(n max_i ai)


Then obtain (1) by noting that since K functions are nonnegative (nmaxai ) ≤ (na1 ) + (na2 ) + · · · + (nan ) i

To establish the lower bound, use the fact that for all $i \in \mathbb{I}_{1:n}$, $\gamma(a_i) \leq \gamma(a_1 + a_2 + \cdots + a_n)$, which follows because the $a_i$ are nonnegative and $\mathcal{K}$ functions are increasing. Summing the inequality over $i$ gives the lower bound. Similar arguments apply to (2) and the displayed equation following it because $\beta(\cdot) \in \mathcal{KL}$ implies $\beta(\cdot, t) \in \mathcal{K}$ for all $t \in \mathbb{I}_{\geq 0}$.

Proof of Proposition 6.

1. Every convergent sequence is $\beta$-convergent for some $\beta \in \mathcal{KL}$. Let $\mathbf{w}$ be a convergent, vector-valued sequence $\mathbf{w} := \{w(k) \in \mathbb{R}^n\}$, $k \in \mathbb{I}_{\geq 0}$. We need to construct a $\mathcal{KL}$-function $\beta(\|\mathbf{w}\|, k)$ such that

$$|w(k)| \leq \beta(\|\mathbf{w}\|, k) \quad \text{for all } k \geq 0 \tag{A.1}$$

To accomplish this, first define the following function of time $g(\cdot) : \mathbb{I}_{\geq 0} \to \mathbb{R}_{\geq 0}$, $g(t) := \sup_{j \geq t} |w(j)| / \|\mathbf{w}\|$. The function is well defined for all $t \in \mathbb{I}_{\geq 0}$ because $\mathbf{w}$ is bounded. From its definition we see that $g(0) = 1$, $g(\cdot)$ is nonincreasing, and the convergence of $w(k)$ to zero as $k \to \infty$ implies $\lim_{t \to \infty} g(t) = 0$. Then consider the following candidate for $\beta(\cdot) : \mathbb{R}_{\geq 0} \times \mathbb{I}_{\geq 0} \to \mathbb{R}_{\geq 0}$

$$\beta(s, t) = s\, g(t)$$

From the previously stated properties of $g(\cdot)$, we can easily check that $\beta(\cdot)$ satisfies the properties of a $\mathcal{KL}$-function. To establish (A.1), assume the contrary, that for some $K \in \mathbb{I}_{\geq 0}$, $|w(K)| > \beta(\|\mathbf{w}\|, K)$. Using the definition of $\beta(\cdot)$ we have that $|w(K)| > \beta(\|\mathbf{w}\|, K) \geq \|\mathbf{w}\|\, |w(K)| / \|\mathbf{w}\| = |w(K)|$, which is a contradiction, and the proof is complete.

2. For any given $\beta(\cdot) \in \mathcal{KL}$, there exist convergent sequences that are not $\beta$-convergent for this $\beta(\cdot)$. Here we simply construct a convergent sequence that converges more slowly than the given $\beta(\cdot)$. Given $\beta(\cdot) \in \mathcal{KL}$, choose a (nonzero) sequence $\mathbf{w}$ such that $|w(0)| \geq |w(k)|$, $k \in \mathbb{I}_{\geq 0}$, so that $\|\mathbf{w}\| = |w(0)|$. Next choose $M \in \mathbb{I}_{>0}$ large enough so that $\beta(|w(0)|, M) \leq (1/2)|w(0)|$. Such a finite $M$ exists because $\beta(|w(0)|, k) \to 0$ as $k \to \infty$. Then let $|w(k)| = 2\beta(|w(0)|, k)$ for $k > M$. Such a $\mathbf{w}$ sequence is convergent (and still satisfies $\|\mathbf{w}\| = |w(0)|$), but it violates the inequality in (A.1) for all $k > M$, and therefore this convergent sequence is not $\beta$-convergent for the given $\beta(\cdot)$.

Proposition 13 ($\mathcal{K}$ function bound for sum of cost functions for $\beta$-convergent sequence). Let $\mathbf{w}$ be an arbitrary element of $C_\beta$. There exist $\mathcal{K}$-functions $\gamma_w(\cdot)$ and $\rho_w(\cdot)$, which depend on $\beta(\cdot)$, such that for any positive sequence of cost functions $\ell_i : \mathbb{R}^n \to \mathbb{R}_{\geq 0}$, $i \in \mathbb{I}_{\geq 0}$, satisfying the bound

$$\ell_i(w) \leq \rho_w(|w|) \quad \text{for all } w \in \mathbb{R}^n,\ i \in \mathbb{I}_{\geq 0}$$

the following holds

$$\sum_{i=0}^{\infty} \ell_i(w(i)) \leq \gamma_w(\|\mathbf{w}\|)$$

Proof. Since $\mathbf{w} \in C_\beta$, we have that $|w(i)| \leq \beta(\|\mathbf{w}\|, i)$, $i \geq 0$. Since $\beta(\cdot)$ is a $\mathcal{KL}$-function, we know that there exists a $\mathcal{K}$-function $\rho_w(\cdot)$ such that applying $\rho_w(\cdot)$ to the terms in the sequence $\{\beta(\|\mathbf{w}\|, i)\}$, $i \geq 0$, makes the sum convergent [12, Proposition 7]. Therefore, the following function is well defined for $\mathbf{w} \in C_\beta$

$$\tilde{\gamma}_w(\|\mathbf{w}\|) := \sum_{i=0}^{\infty} \rho_w(\beta(\|\mathbf{w}\|, i))$$

Notice that $\tilde{\gamma}_w(0) = 0$ and $\tilde{\gamma}_w(\cdot)$ is increasing. But $\tilde{\gamma}_w(\cdot)$ may not be a $\mathcal{K}$-function because it may not be continuous. Recall that an infinite series of continuous functions can converge to a discontinuous function; think of the Fourier series for a step function, for example. But we can establish that $\tilde{\gamma}_w(\cdot)$ is continuous at zero as follows. From [12, Proposition 7] we have the following bound, valid for all $a \in \mathbb{R}_{\geq 0}$ and $i \in \mathbb{I}_{\geq 0}$

$$\rho_w(\beta(a, i)) \leq \theta(a)e^{-i}$$

in which $\theta \in \mathcal{K}$. Performing the sum gives the following upper bound for all $a \in \mathbb{R}_{\geq 0}$

$$0 \leq \tilde{\gamma}_w(a) \leq \frac{e}{e-1}\,\theta(a)$$

The lower bound follows because $\tilde{\gamma}_w(0) = 0$ and $\tilde{\gamma}_w(\cdot)$ is increasing. Since $\tilde{\gamma}_w(\cdot)$ is upper and lower bounded by continuous functions that vanish at zero, $\tilde{\gamma}_w(\cdot)$ is continuous at zero. Continuity at zero is sufficient to define an upper bounding (continuous) $\mathcal{K}$-function, $\gamma_w(\cdot)$, that satisfies $\tilde{\gamma}_w(a) \leq \gamma_w(a)$ for all $a \in \mathbb{R}_{\geq 0}$. See [10, Proposition 11] for the details of this argument. We then have

$$\sum_{i=0}^{\infty} \ell_i(w(i)) \leq \sum_{i=0}^{\infty} \rho_w(|w(i)|) \leq \sum_{i=0}^{\infty} \rho_w(\beta(\|\mathbf{w}\|, i)) = \tilde{\gamma}_w(\|\mathbf{w}\|) \leq \gamma_w(\|\mathbf{w}\|)$$

and the result is established.
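The construction in part 1 of the proof of Proposition 6 is easy to check numerically. The sketch below is illustrative only (the example sequence $w(k) = (-0.9)^k$, the finite window `N`, and all variable names are our assumptions, not from the paper): it builds $g(t) = \sup_{j \geq t} |w(j)| / \|\mathbf{w}\|$ over a finite window and confirms that $\beta(s, t) = s\, g(t)$ satisfies the bound (A.1).

```python
# Numerical sketch of the KL-bound construction in Proposition 6, part 1:
# given a convergent sequence w, define g(t) = sup_{j>=t} |w(j)| / ||w||
# and beta(s, t) = s * g(t); then |w(k)| <= beta(||w||, k) for all k.
# The example sequence and window length are illustrative assumptions.

N = 200  # finite window standing in for the infinite sequence
w = [(-0.9) ** k for k in range(N)]  # convergent example sequence
abs_w = [abs(v) for v in w]
sup_norm = max(abs_w)  # ||w|| (attained at k = 0 for this example)

# g(t): sup of the tail, computed backwards; nonincreasing with g(0) = 1
g = [0.0] * N
tail = 0.0
for t in reversed(range(N)):
    tail = max(tail, abs_w[t])
    g[t] = tail / sup_norm

def beta(s, t):
    # candidate KL function from the proof: beta(s, t) = s * g(t)
    return s * g[t]

assert abs(g[0] - 1.0) < 1e-12                      # g(0) = 1
assert all(g[t] >= g[t + 1] for t in range(N - 1))  # g nonincreasing
assert all(abs_w[k] <= beta(sup_norm, k) + 1e-12 for k in range(N))  # (A.1)
print("bound (A.1) holds on the window")
```

For this monotonically decaying example the bound holds with equality, since the tail supremum at $t$ is $|w(t)|$ itself; a nonmonotone convergent sequence would make the inequality strict at some indices.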
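The continuity-at-zero step in the proof of Proposition 13 rests on the geometric sum $\sum_{i=0}^{\infty} \theta(a)e^{-i} = \theta(a)\, e/(e-1)$. A quick numeric check of that identity is sketched below; the choice $\theta(a) = \sqrt{a}$ is an arbitrary illustrative $\mathcal{K}$-function, not one appearing in the paper.

```python
import math

# Check the geometric-sum step in the proof of Proposition 13:
# sum_{i=0}^inf theta(a) * e^{-i} = theta(a) * e / (e - 1),
# which upper bounds gamma_tilde(a) by a continuous function vanishing at 0.

def theta(a):
    # illustrative K-function: continuous, strictly increasing, theta(0) = 0
    return math.sqrt(a)

a = 2.0
partial = sum(theta(a) * math.exp(-i) for i in range(200))
closed_form = theta(a) * math.e / (math.e - 1.0)
assert abs(partial - closed_form) < 1e-9  # tail beyond i = 200 is negligible
assert theta(0.0) == 0.0                  # bound vanishes at zero
print("geometric bound verified:", round(closed_form, 6))
```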

[1] H.K. Khalil, Nonlinear Systems, third ed., Prentice-Hall, Upper Saddle River, NJ, 2002.
[2] E.S. Meadows, K.R. Muske, J.B. Rawlings, Constrained state estimation and discontinuous feedback in model predictive control, in: Proceedings of the 1993 European Control Conference, 1993, pp. 2308–2312.
[3] H. Michalska, D.Q. Mayne, Moving horizon observers and observer-based control, IEEE Trans. Auto. Cont. 40 (6) (1995) 995–1006.
[4] K.R. Muske, J.B. Rawlings, Model predictive control with linear models, AIChE J. 39 (2) (1993) 262–287.
[5] K.R. Muske, J.B. Rawlings, Nonlinear moving horizon state estimation, in: R. Berber (Ed.), Methods of Model Based Process Control, NATO Advanced Study Institute Series: E Applied Sciences 293, Kluwer, Dordrecht, The Netherlands, 1995, pp. 349–365.
[6] K.R. Muske, J.B. Rawlings, J.H. Lee, Receding horizon recursive state estimation, in: Proceedings of the 1993 American Control Conference, June 1993, pp. 900–904.
[7] G. Pannocchia, J.B. Rawlings, S.J. Wright, Conditions under which suboptimal nonlinear MPC is inherently robust, Sys. Cont. Let. 60 (2011) 747–755.
[8] C.V. Rao, J.B. Rawlings, D.Q. Mayne, Constrained state estimation for nonlinear discrete-time systems: stability and moving horizon approximations, IEEE Trans. Auto. Cont. 48 (2) (2003) 246–258.
[9] J.B. Rawlings, D.Q. Mayne, Model Predictive Control: Theory and Design, Nob Hill Publishing, Madison, WI, 2009, 576 pp., ISBN 978-0-9759377-0-9.
[10] J.B. Rawlings, D.Q. Mayne, Postface to Model Predictive Control: Theory and Design, 2011, http://jbrwww.che.wisc.edu/home/jbraw/mpc/postface.pdf.
[11] D.G. Robertson, J.H. Lee, On the use of constraints in least squares estimation and control, Automatica 38 (7) (2002) 1113–1124.
[12] E.D. Sontag, Comments on integral variants of ISS, Sys. Cont. Let. 34 (1998) 93–100.
[13] E.D. Sontag, Y. Wang, On the characterization of the input to state stability property, Sys. Cont. Let. 24 (1995) 351–359.
[14] E.D. Sontag, Y. Wang, Output-to-state stability and detectability of nonlinear systems, Sys. Cont. Let. 29 (1997) 279–290.