Solution of the input-constrained LQR problem using dynamic programming

Systems & Control Letters 56 (2007) 342–348, www.elsevier.com/locate/sysconle, doi:10.1016/j.sysconle.2006.10.018

José B. Mare, José A. De Doná

Centre for Complex Dynamic Systems and Control, School of Electrical Engineering and Computer Science, The University of Newcastle, Callaghan 2308, NSW, Australia

Received 3 June 2004; received in revised form 12 February 2005; accepted 7 October 2006. Available online 28 November 2006.

Abstract

The input-constrained LQR problem is addressed in this paper; i.e., the problem of finding the optimal control law for a linear system such that a quadratic cost functional is minimised over a horizon of length N subject to the satisfaction of input constraints. A global solution (i.e., valid in the entire state space) for this problem, and for arbitrary horizon N, is derived analytically by using dynamic programming. The scalar input case is considered in this paper. Solutions to this problem (and to more general problems: state constraints, multiple inputs) have been reported recently in the literature, for example, approaches that use the geometric structure of the underlying quadratic programming problem and approaches that use multi-parametric quadratic programming techniques. The solution by dynamic programming proposed in the present paper coincides with the ones obtained by the aforementioned approaches. However, being derived using a different approach that exploits the dynamic nature of the constrained optimisation problem to obtain an analytical solution, the present result complements the previous methods and reveals additional insights into the intrinsic structure of the optimal solution.

© 2006 Elsevier B.V. All rights reserved.

Keywords: Constrained optimal control; Input constraints; Dynamic programming; Analytical solution

1. Introduction

The solution of constrained LQR problems has attracted considerable attention recently. This interest is mainly due to the fact that these problems constitute the core underlying optimisation problem that is solved, at each sampling time, by model predictive control algorithms (one of the most popular control methodologies used in industry at present). Of particular interest has been the derivation of explicit solutions that, through a characterisation of the optimal solution computed off-line, would render on-line optimisation unnecessary. Recently, two approaches have simultaneously been developed, aiming at obtaining such off-line explicit solutions. These two approaches have been reported in, for example, [2,11]. The first method provides an algorithm, based on multi-parametric quadratic programming techniques, to obtain an explicit solution to the problem. The second method uses geometric arguments to obtain a characterisation of the optimal solution of the resulting quadratic programme.

Subsequently, many interesting extensions have been reported. For example, in [9] an explicit solution to a min–max MPC problem with bounded uncertainties is obtained; a suboptimal formulation that reduces the complexity of the solution is proposed in [7]; in [6] the infinite-time solution is computed by combining multi-parametric quadratic programming with reachability analysis. In fact, there exists a growing number of publications on this topic, which reflects the interest that these problems have generated.

Starting from a different perspective to the ones mentioned above, a solution to the input-constrained case has been reported in [4]. This solution, obtained by using dynamic programming arguments, was of a local nature; i.e., valid in a region of the state space, and consisted in simply clipping the optimal unconstrained solution; i.e., $u = -\mathrm{sat}_\Delta(Kx)$, where $\mathrm{sat}_\Delta(\cdot)$ is the saturation function with bounds $\pm\Delta$. (Related work that utilises a different approach, based on KKT optimality conditions, has also been published in [8].) In [3], the region where this solution is valid was further characterised by a set of linear inequalities, and it was shown that, inside this region, the controller $u = -\mathrm{sat}_\Delta(Kx)$ effectively reaches the constraints, thus providing a nontrivial characterisation of the optimal solution. These results were used in [5] to obtain improved terminal constraint sets that guarantee closed-loop stability of model predictive control schemes.

In the present paper, the solution by dynamic programming is further extended to provide the global solution (i.e., valid in the entire state space) to the problem for arbitrary horizon $N$. The global solution is, of course, not just $u = -\mathrm{sat}_\Delta(Kx)$, and can be concisely summarised by the expression $u = -\mathrm{sat}_\Delta(\hat{L}_{N,i}\,x + \hat{h}_{N,i})$ for $x \in X_i$, where the set $X_i \subset \mathbb{R}^n$ represents a region of a state-space partition $\mathbb{R}^n = \bigcup_j X_j$. The solution presented in this paper exploits the dynamic nature of the optimisation problem to obtain an analytical characterisation of the optimal control law. Although the final result (when evaluated at any given particular problem) obviously coincides, by optimality, with those obtained by other methods, the main contribution of the solution obtained by dynamic programming lies in that all the derivations required are analytical and, hence, the solution is given by closed-form expressions (i.e., all the expressions are closed-form functions of the data of the problem). One of the motivations of this approach is that it provides an alternative methodology, based on analytical derivations, to obtain the solution. It is envisaged that this alternative methodology could be amenable to being extended to related open problems, such as constrained control of non-linear systems, closed-form solution of the constrained continuous-time LQR problem, etc.

The remainder of the paper is organised as follows. In Section 2, the input-constrained LQR problem is formulated. In Section 3 the solution is provided, which comprises the control law structure and the regions of the state space where each component of the control law is valid. The derivation of the solution is done by dynamic programming and is included in the Appendix at the end of the paper. The solution is illustrated with an example in Section 4. Finally, Section 5 presents the conclusions.

2. Problem formulation

Consider the discrete-time linear state-space model

$$x_{k+1} = A x_k + B u_k, \qquad (1)$$

where $x_k \in \mathbb{R}^n$ and $u_k \in \mathbb{R}$ are the state and control input, respectively. In (1) the pair $(A, B)$ is assumed to be stabilisable, and the control input is required to satisfy the constraint $u_k \in \mathbb{U}$, where $\mathbb{U} \triangleq [-\Delta, \Delta]$, $\Delta > 0$.

The following notation will be employed. The control sequence, for some horizon $N$, is denoted by $\mathbf{u} \triangleq \mathbf{u}_0 \triangleq \{u_0, u_1, \ldots, u_{N-1}\}$. For $r \in \{1, \ldots, N\}$, and for some initial time $N-r \in \{0, 1, \ldots, N-1\}$, let $\mathbf{u}_{N-r}$ denote the partial control sequence $\mathbf{u}_{N-r} \triangleq \{u_{N-r}, u_{N-r+1}, \ldots, u_{N-1}\}$.


By $\mathbf{u} \in \mathbb{U}^N$ ($\mathbf{u}_{N-r} \in \mathbb{U}^r$) we denote the case in which each element of the sequence satisfies $u_k \in \mathbb{U}$, $k = 0, \ldots, N-1$ ($k = N-r, \ldots, N-1$). The solution of (1) at time $k \ge N-r$, when the initial state at time $N-r$ is $x_{N-r} = x$ and the control sequence is $\mathbf{u}_{N-r}$, is denoted by $x_k^{\mathbf{u}_{N-r}}(x, N-r)$. To simplify notation, the initial time is dropped when it is zero; i.e., $x_k^{\mathbf{u}}(x) \triangleq x_k^{\mathbf{u}_0}(x, 0)$.

The fixed-horizon optimal control problem considered is

$$\mathcal{P}_N(x): \quad V_N^o(x) = \min_{\mathbf{u}} V_N(x, \mathbf{u}) \quad \text{subject to } \mathbf{u} \in \mathbb{U}^N. \qquad (2)$$

The cost $V_N(\cdot, \cdot)$ in (2) is defined by

$$V_N(x, \mathbf{u}) = \sum_{k=0}^{N-1} \big(x_k^T Q x_k + u_k^T R u_k\big) + x_N^T P x_N, \qquad (3)$$

with $x_k = x_k^{\mathbf{u}}(x)$, and where $Q$ is the state weighting matrix, assumed to be positive semidefinite, $R$ is the control weighting matrix, assumed to be positive definite, and $P$ is the terminal state weighting matrix, which is chosen as the positive definite matrix solution of the algebraic Riccati equation

$$P = A^T P A + Q - K^T \bar{R} K, \qquad (4)$$

where

$$K \triangleq \bar{R}^{-1} B^T P A, \qquad \bar{R} \triangleq R + B^T P B. \qquad (5)$$
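To make the computation of $P$, $K$ and $\bar{R}$ in (4)–(5) concrete, the following is a minimal numerical sketch (assuming Python with numpy; the function name and stopping rule are our own illustrative choices) that obtains them by fixed-point iteration of the Riccati equation:

```python
# Sketch: terminal weight P and gains K, R_bar of Eqs. (4)-(5), obtained by
# fixed-point iteration of the discrete algebraic Riccati equation.
import numpy as np

def dare_fixed_point(A, B, Q, R, tol=1e-12, max_iter=10_000):
    """Iterate P <- A'PA + Q - K' R_bar K until convergence."""
    P = Q.copy()
    for _ in range(max_iter):
        R_bar = R + B.T @ P @ B                      # Eq. (5): R_bar = R + B'PB
        K = np.linalg.solve(R_bar, B.T @ P @ A)      # Eq. (5): K = R_bar^{-1} B'PA
        P_next = A.T @ P @ A + Q - K.T @ R_bar @ K   # Eq. (4)
        if np.max(np.abs(P_next - P)) < tol:
            return P_next, K, R_bar
        P = P_next
    raise RuntimeError("Riccati iteration did not converge")
```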

It is well known (see, for example, [10]) that, with this choice of terminal weight $P$, and provided that the horizon $N$ is large enough, the resulting receding horizon implementation of the control law gives an asymptotically stable closed-loop system and possesses all the properties of infinite-horizon optimal control. By the receding horizon implementation is meant the standard technique (also known as model predictive control) in which the first control action $u_0$ in the optimal control sequence $\mathbf{u}$ that minimises (2)–(3) is applied to system (1) and, as the state evolves to a new value at the next sampling time, the optimisation process is repeated over a horizon of length $N$ (receding horizon).

3. Solution of $\mathcal{P}_N(x)$ by dynamic programming

For each $r \in \{1, \ldots, N\}$, the partial value function (or optimal cost to go) is defined by

$$V_r^o(x) = \min_{\mathbf{u}_{N-r}} V_r(x, \mathbf{u}_{N-r}) \quad \text{subject to } \mathbf{u}_{N-r} \in \mathbb{U}^r, \qquad (6)$$

where the partial cost $V_r(\cdot, \cdot)$ is defined by

$$V_r(x, \mathbf{u}_{N-r}) = \sum_{k=N-r}^{N-1} \big(x_k^T Q x_k + u_k^T R u_k\big) + x_N^T P x_N, \qquad (7)$$


with $x_k = x_k^{\mathbf{u}_{N-r}}(x, N-r)$, $k = N-r, N-r+1, \ldots, N$. We refer to $V_r^o(\cdot)$ as the partial value function (or, simply, the value function) 'at time $N-r$', meaning that the (partial) value function 'starts at time $N-r$'. We also define

$$V_0^o(x) = x^T P x. \qquad (8)$$
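Before developing the dynamic-programming solution, it is useful to have a generic numerical baseline. The sketch below (a cross-check only, assuming numpy and scipy; solve_PN_numerically is a hypothetical helper of our own, not part of the paper) solves $\mathcal{P}_N(x)$ in (2)–(3) directly as a box-constrained programme:

```python
# Sketch: P_N(x) of Eqs. (2)-(3) solved numerically as a box-constrained
# programme over the input sequence, for validating the analytical solution.
import numpy as np
from scipy.optimize import minimize

def solve_PN_numerically(A, B, Q, R, P, Delta, N, x0):
    """Return an optimal input sequence and cost for initial state x0 (n-by-1)."""
    def cost(u):
        x, J = x0.copy(), 0.0
        for k in range(N):
            J += float(x.T @ Q @ x) + R * u[k] ** 2  # stage cost (scalar input)
            x = A @ x + B * u[k]                     # dynamics, Eq. (1)
        return J + float(x.T @ P @ x)                # terminal cost x_N' P x_N
    res = minimize(cost, np.zeros(N), bounds=[(-Delta, Delta)] * N)
    return res.x, res.fun
```

Since the cost is a convex quadratic in the input sequence and the constraint set is a box, the optimiser's result can be compared directly with the closed-form expressions derived below.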

To solve problem $\mathcal{P}_N(x)$ defined in (2)–(3), dynamic programming (see [1]) will be used, which is based on the principle of optimality: an optimal policy has the property that, whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision. Applying the principle of optimality to problem $\mathcal{P}_N(x)$, we have

$$V_0^o(x) = x^T P x, \qquad V_r^o(x) = \min_{u \in \mathbb{U}}\big\{x^T Q x + u^T R u + V_{r-1}^o(Ax + Bu)\big\} \qquad (9)$$

for $r = 1, \ldots, N$, where $u = u_{N-r}$ and $x = x_{N-r}$. With this formulation, the sequence of optimal costs to go $\{V_N^o(x), V_{N-1}^o(x), \ldots, V_0^o(x)\}$ and the sequence of optimal controls $\{u_0^o(x), u_1^o(x), \ldots, u_{N-1}^o(x)\}$ are obtained.
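For small horizons, the recursion (9) can be sanity-checked by exhaustive search over a discretised input set. The sketch below (our own brute-force illustration, exponential in $N$ and therefore for validation only; it assumes numpy) evaluates $V_r^o(x)$ directly from (8)–(9):

```python
# Sketch: brute-force evaluation of the DP recursion, Eq. (9), discretising
# the scalar input over [-Delta, Delta]. Exponential in the horizon length.
import numpy as np

def V_dp(x, r, A, B, Q, R, P, Delta, n_grid=21):
    """Approximate optimal cost-to-go V_r^o(x) by exhaustive search."""
    if r == 0:
        return float(x.T @ P @ x)                # Eq. (8): V_0^o(x) = x'Px
    best = np.inf
    for u in np.linspace(-Delta, Delta, n_grid):
        stage = float(x.T @ Q @ x) + R * u ** 2  # x'Qx + u'Ru (scalar input)
        best = min(best, stage + V_dp(A @ x + B * u, r - 1,
                                      A, B, Q, R, P, Delta, n_grid))
    return best
```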

3.1. Control law

In the sequel we define all the elements that appear in the solution of $\mathcal{P}_N(x)$, which will be presented in Theorem 1 below. Let us define, for $r = 1, \ldots, N$,

$$\hat{L}_r = \frac{K + \sum_{i=1}^{r-1} \hat{K}_{r-i}\hat{A}_{r,i,0}B\,\hat{K}_{r-i}\hat{A}_{r,i,0}A}{1 + \sum_{i=1}^{r-1}\big(\hat{K}_{r-i}\hat{A}_{r,i,0}B\big)^2}, \qquad (10)$$

$$\hat{h}_r = -\frac{\sum_{i=1}^{r-1} \hat{K}_{r-i}\hat{A}_{r,i,0}B\,\big(\hat{K}_{r-i}\sum_{p=1}^{i-1}\hat{A}_{r,i,p}Bh_{r-p} + h_{r-i}\big)}{1 + \sum_{i=1}^{r-1}\big(\hat{K}_{r-i}\hat{A}_{r,i,0}B\big)^2}, \qquad (11)$$

where $K$ is computed from (4)–(5) and $\hat{K}_d$, for $d = 1, \ldots, r-1$, is defined by

$$\hat{K}_d \triangleq K - L_d, \qquad (12)$$

and $\hat{A}_{a,b,c}$, for $a = r$; $b = 1, \ldots, r-1$; $c = 0, \ldots, r-2$, is defined by

$$\hat{A}_{a,b,c} \triangleq \prod_{j=a+1-b}^{a-1-c} (A - BL_j). \qquad (13)$$

Finally, for $r = 1, \ldots, N$, define

$$L_r = \begin{cases} \hat{L}_r & \text{if } |-\hat{L}_r x_{N-r} - \hat{h}_r| \le \Delta, \\ 0 & \text{otherwise}, \end{cases} \qquad (14)$$

$$h_r = \begin{cases} \hat{h}_r & \text{if } |-\hat{L}_r x_{N-r} - \hat{h}_r| \le \Delta, \\ \Delta & \text{if } -\hat{L}_r x_{N-r} - \hat{h}_r < -\Delta, \\ -\Delta & \text{if } -\hat{L}_r x_{N-r} - \hat{h}_r > \Delta. \end{cases} \qquad (15)$$

In the summations in (10) and (11) it is to be understood that, whenever $k_2 < k_1$, $\sum_{j=k_1}^{k_2}(\cdot) = 0$, the zero matrix with the same number of rows and columns as the matrices in the summation; and in (13), whenever $k_2 < k_1$, it is to be understood that $\prod_{j=k_1}^{k_2}(\cdot) = I$, the identity matrix with the same number of rows (and columns) as the matrices in the products.

The following theorem provides the solution of problem $\mathcal{P}_N(x)$ defined in (2)–(3).

Theorem 1. The control sequence that minimises $\mathcal{P}_N(x)$ defined in (2)–(3) is given by

$$u^o_{N-r}(x_{N-r}) = -L_r x_{N-r} - h_r \qquad (16)$$

for $r = 1, \ldots, N$, where $L_r$ and $h_r$ are defined by (14) and (15), respectively, and the sequence of optimal costs to go, for $r = 1, \ldots, N$, is given by

$$V_r^o(x) = x^T P x + \bar{R}\Bigg[(\hat{K}_r x - h_r)^2 + \sum_{i=1}^{r-1}\Big(\hat{K}_{r-i}\hat{A}_{r,i,0}(A - BL_r)x - \hat{K}_{r-i}\hat{A}_{r,i,0}Bh_r - \hat{K}_{r-i}\sum_{p=1}^{i-1}\hat{A}_{r,i,p}Bh_{r-p} - h_{r-i}\Big)^2\Bigg]. \qquad (17)$$

Proof. See Section A.2 in Appendix A.

Remark 2. Alternatively, the optimal control sequence in Theorem 1 can be rewritten as

$$u^o_{N-r}(x_{N-r}) = -\mathrm{sat}_\Delta\big(\hat{L}_r x_{N-r} + \hat{h}_r\big) \qquad (18)$$

for $r = 1, \ldots, N$, where $\hat{L}_r$ and $\hat{h}_r$ are defined by (10) and (11), respectively.

Fig. 1 shows the way in which the optimal sequence is generated, starting from $(\hat{L}_1, \hat{h}_1) = (K, 0)$. In each pair $(L_r, h_r)$, $r = 1, \ldots, N$, the first element corresponds to the row vector that multiplies $x$ in the control law (16), while the second element is the constant term.

Fig. 1. Control law calculation. (Notice that $L_1, h_1, \ldots, L_N, h_N$ are computed off-line.)


The dashed lines represent the order in which the components of the control law are calculated, while the solid lines show how the system evolves through the states using these control laws. The numbers of the equations used to obtain the expressions are also included in the figure. Notice that the control law calculation (dashed lines) is pre-computed off-line, as it does not involve knowledge of the current state $x_0$.
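To make the off-line recursion of Fig. 1 concrete, the following sketch computes the pairs $(\hat{L}_r, \hat{h}_r)$ and $(L_r, h_r)$ of (10)–(15) for one assumed sequence of saturation modes, i.e., for one of the $3^N$ branches discussed in the next subsection. It assumes numpy; the mode labels ('unsat', 'lo', 'hi') are our own encoding, with 'lo' meaning $u_{N-r} = -\Delta$ (hence $h_r = \Delta$) and 'hi' meaning $u_{N-r} = \Delta$ (hence $h_r = -\Delta$):

```python
# Sketch: off-line gains (Lhat_r, hhat_r) and (L_r, h_r), r = 1..N, of
# Eqs. (10)-(15), for one assumed saturation branch.
import numpy as np

def gains_for_modes(A, B, K, Delta, modes):
    """Return lists Lhat, hhat, L, h indexed by r = 1..N (list index r-1)."""
    n = A.shape[0]
    Lhat, hhat, L, h = [], [], [], []
    for r in range(1, len(modes) + 1):
        def A_hat(i, p):
            # Eq. (13): A_hat_{r,i,p} = prod_{j=r+1-i}^{r-1-p} (A - B L_j)
            M = np.eye(n)
            for j in range(r + 1 - i, r - p):       # empty range -> identity
                M = M @ (A - B @ L[j - 1])
            return M
        num_L, num_h, den = K.copy(), 0.0, 1.0
        for i in range(1, r):
            Khat = K - L[r - i - 1]                 # Eq. (12): Khat_{r-i}
            a = float(Khat @ A_hat(i, 0) @ B)       # scalar Khat A_hat B
            s = sum(float(Khat @ A_hat(i, p) @ B) * h[r - p - 1]
                    for p in range(1, i))           # Khat sum_p A_hat B h_{r-p}
            num_L = num_L + a * (Khat @ A_hat(i, 0) @ A)
            num_h += a * (s + h[r - i - 1])
            den += a ** 2
        Lhat.append(num_L / den)                    # Eq. (10)
        hhat.append(-num_h / den)                   # Eq. (11)
        if modes[r - 1] == 'unsat':                 # cases of Eqs. (14)-(15)
            L.append(Lhat[-1])
            h.append(hhat[-1])
        else:
            L.append(np.zeros((1, n)))
            h.append(Delta if modes[r - 1] == 'lo' else -Delta)
    return Lhat, hhat, L, h
```

For instance, with modes = ['unsat'] * N the recursion is never saturated and the first pair is $(\hat{L}_1, \hat{h}_1) = (K, 0)$, as in Fig. 1.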

3.2. Regions of the state space

In the previous section, the entire fixed-horizon optimal control law was characterised. In this section, we are interested only in the first element $u_0^o$ of the optimal control sequence $\{u_0^o(x), u_1^o(x), \ldots, u_{N-1}^o(x)\}$, since this is the control that is actually applied in a receding horizon strategy.

First, note from (14)–(15) that there are 3 possibilities for $L_r$ and $h_r$ at each $r = 1, \ldots, N$. This gives a total of $3^N$ possible values for the pair $(L_N, h_N)$ that defines the first control move. However, it must be noted that, for a particular value of the initial state $x_0$, there will be only one pair $(L_N, h_N)$ that applies (since only one of the possibilities in (14)–(15) can be true for each $r$, $r = 1, \ldots, N$). In this section we characterise, by a set of linear inequalities, the regions of the state space where each of the $3^N$ different possibilities for $(L_N, h_N)$ is valid. (Note that, although the number $3^N$ may seem prohibitively large for reasonably big $N$, in practice many of these regions are empty, as we will illustrate later with an example.)

Theorem 3. The region of the state space where the control law $u_0^o(x_0) = -L_N x_0 - h_N$ (computed in Theorem 1) is optimal is determined by $N$ inequalities which, for each $r = 1, \ldots, N$, are given by one, and only one, of the following cases:

$$\begin{aligned}
\text{(i)} \quad & E_r x_0 + F_r < -\Delta && \text{if } -\hat{L}_r x_{N-r} - \hat{h}_r < -\Delta, \\
\text{(ii)} \quad & E_r x_0 + F_r > \Delta && \text{if } -\hat{L}_r x_{N-r} - \hat{h}_r > \Delta, \\
\text{(iii)} \quad & -\Delta \le E_r x_0 + F_r \le \Delta && \text{if } |-\hat{L}_r x_{N-r} - \hat{h}_r| \le \Delta,
\end{aligned} \qquad (19)$$

where

$$E_r = -\hat{L}_r \prod_{i=r+1}^{N}(A - BL_i), \qquad (20)$$

$$F_r = \hat{L}_r \sum_{i=1}^{N-r}\Bigg[\prod_{p=r+1}^{N-i}(A - BL_p)\Bigg]Bh_{N-i+1} - \hat{h}_r. \qquad (21)$$

The cases considered in (19), for each $r$, $r = 1, \ldots, N$, must correspond to the ones defining the pair $(L_N, h_N)$.

Proof. See Section A.3 in Appendix A.

Remark 4. Theorem 3 states that, for each $r = 1, \ldots, N$, the constraint on $-\hat{L}_r x_{N-r} - \hat{h}_r$ leading to $u_{N-r} = -L_r x_{N-r} - h_r$, and related to the state $x_{N-r}$, can be posed as a constraint on $E_r x_0 + F_r$, related to the initial state $x_0$.

Remark 5. Notice that in the definition of the regions in Theorem 3, there are 3 possibilities at each stage $r = 1, \ldots, N$. This gives a total of $3^N$ possibilities, and hence $3^N$ regions. However, in practice only a fraction of these regions turns out to be non-empty. (See the example in the next section.)

4. Example

Consider the model of a DC motor driving an inertial load. The output of the system is the angular rate of the load, and the input is the applied voltage. For a sampling period of 0.01 s, the zero-order-hold discretisation of the DC motor has a state-space realisation (1) with matrices

$$A = \begin{bmatrix} 0.9662 & -0.0339 \\ 0.0509 & 0.7398 \end{bmatrix}, \qquad B = \begin{bmatrix} 0.6554 \\ 0.0179 \end{bmatrix}, \qquad C = [0 \;\; 1].$$

The input constraint is taken as $\Delta = 5$ V. In the fixed-horizon cost function (3) we take $N = 7$, $Q = C^T C$ and $R = 0.1$. Theorem 1 is used to obtain all the possible values for $L_7$ and $h_7$ in the control law $u_0 = -L_7 x_0 - h_7$, considering that each control action $(u_0, u_1, \ldots, u_6)$ can either saturate at $\pm\Delta$ or not. Once a particular pair $(L_7, h_7)$ is calculated, Theorem 3 is used to define the region where such a control law is valid. As each $u_{7-r}$ can take one of three different values ($-\Delta$, $-\hat{L}_r x_{7-r} - \hat{h}_r$, $\Delta$) for $r = 1, \ldots, 7$, there are $3^7$ possible pairs $(L_7, h_7)$. However, some of these pairs lead to empty regions. The resulting partition of the state space for this example is shown in Fig. 2.

As an example, for $x_0$ in the shaded region in Fig. 2, the optimal control law for horizon $N = 7$ is given by $u_0 = -[0.4783 \;\; 0.7206]\,x_0 - 3.8600$. The inequalities that define this particular region are

$$\begin{aligned}
[-0.1502 \;\;\;\; 0.1951]\,x_0 &\le 8.0585, &
[0.1502 \;\; -0.1951]\,x_0 &\le 1.9415, \\
[0.1549 \;\; -0.1520]\,x_0 &\le -6.7129, &
[0.1858 \;\; -0.1196]\,x_0 &\le -6.6398, \\
[0.2143 \;\; -0.0553]\,x_0 &\le -6.3919, &
[0.2398 \;\;\;\; 0.0489]\,x_0 &\le -6.0175, \\
[-0.4247 \;\; -0.3298]\,x_0 &\le 9.0642, &
[0.4247 \;\;\;\; 0.3298]\,x_0 &\le 0.9358, \\
[-0.4783 \;\; -0.7206]\,x_0 &\le 8.8600, &
[0.4783 \;\;\;\; 0.7206]\,x_0 &\le 1.1400.
\end{aligned}$$

As can be seen from the figure, not all of the above inequalities in the definition of this region are active. Also, in Fig. 2, the optimal trajectory for an initial condition $x_0 = [-24 \;\; 32]^T$ is shown, where the optimal sequence has been implemented in a receding horizon fashion; i.e., at each step only the first control action $u_0^o$ in the optimal control sequence $\{u_0^o, u_1^o, \ldots, u_6^o\}$ is applied.
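The membership test of Theorem 3 can be coded directly from (19)–(21). The sketch below (illustrative, assuming numpy) reuses the hypothetical gains_for_modes helper and mode encoding sketched in Section 3.1; enumerating it over all $3^N$ branches and discarding the empty regions is what produces a partition such as the one in Fig. 2:

```python
# Sketch: test whether x0 lies in the region of Theorem 3 associated with a
# given saturation-mode sequence, using E_r, F_r of Eqs. (20)-(21).
import numpy as np

def x0_in_region(A, B, Delta, modes, Lhat, hhat, L, h, x0):
    N, n = len(modes), A.shape[0]
    for r in range(1, N + 1):
        Phi = np.eye(n)                          # prod_{i=r+1}^{N} (A - B L_i)
        for i in range(r + 1, N + 1):
            Phi = Phi @ (A - B @ L[i - 1])
        E = -Lhat[r - 1] @ Phi                   # Eq. (20)
        F = -hhat[r - 1]                         # Eq. (21), accumulated below
        for i in range(1, N - r + 1):
            Psi = np.eye(n)
            for p in range(r + 1, N - i + 1):
                Psi = Psi @ (A - B @ L[p - 1])
            F += float(Lhat[r - 1] @ Psi @ B) * h[N - i]
        v = float(E @ x0) + F                    # equals -Lhat_r x_{N-r} - hhat_r
        mode = modes[r - 1]
        ok = (v < -Delta if mode == 'lo' else
              v > Delta if mode == 'hi' else
              abs(v) <= Delta)                   # the three cases of Eq. (19)
        if not ok:
            return False
    return True
```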


Fig. 2. State-space partition and optimal trajectory.
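To close the loop on this example, the sketch below (assuming numpy and scipy, and reusing the hypothetical dare_fixed_point and solve_PN_numerically helpers from the earlier sketches) runs the receding-horizon experiment from $x_0 = [-24 \;\; 32]^T$ with $N = 7$ and $\Delta = 5$; it uses the generic numerical solver rather than the explicit partition, so it only reproduces the closed-loop trajectory of Fig. 2, not the off-line region computation:

```python
# Sketch: receding-horizon (MPC) simulation of the DC-motor example.
import numpy as np

A = np.array([[0.9662, -0.0339], [0.0509, 0.7398]])
B = np.array([[0.6554], [0.0179]])
C = np.array([[0.0, 1.0]])
Q, R, Delta, N = C.T @ C, 0.1, 5.0, 7

P, K, R_bar = dare_fixed_point(A, B, Q, R)   # terminal weight, Eqs. (4)-(5)
x = np.array([[-24.0], [32.0]])              # initial condition of Fig. 2
for _ in range(40):
    u, _ = solve_PN_numerically(A, B, Q, R, P, Delta, N, x)
    x = A @ x + B * u[0]                     # apply only the first control move
print(x.ravel())                             # state after 40 closed-loop steps
```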

5. Conclusions

The solution of the input-constrained LQR problem has been derived analytically by using dynamic programming techniques. By exploiting the dynamic nature of the optimal control problem, insights into the intrinsic structure of the optimal solution have been obtained, which complement recent results that address this problem. An example has been included, which illustrates the structure of the solution.

Appendix A

A.1. Preliminary results

The following results will be needed in the proof of the main result (Theorem 1).

Lemma 6. The matrices defined in (13) satisfy the following properties:

(i) $\hat{A}_{r-1,i,p} = \hat{A}_{r,i+1,p+1}$,
(ii) $\hat{A}_{r-1,i-1,0}(A - BL_{r-1}) = \hat{A}_{r,i,0}$,
(iii) $\hat{A}_{r,0,0} = I$.

Proof. (i) Trivial by direct substitution into (13). (ii) Trivial by direct substitution into (13). (iii) This property follows from the convention adopted: $\prod_{j=k_1}^{k_2}(\cdot) = I$ whenever $k_2 < k_1$. □

Lemma 7 (Minimum of the sum of $m$ quadratic functions). Let $f(u) = \sum_{i=1}^{m}(a_i u + b_i)^2$ be a function of $u \in \mathbb{R}$, with parameters $a_i, b_i \in \mathbb{R}$, where not all $a_i$ are equal to zero. Then $f$ reaches its minimum at $u = -\sum_{i=1}^{m} a_i b_i \big/ \sum_{i=1}^{m} a_i^2$.

Proof. Straightforward by setting $\mathrm{d}f(u)/\mathrm{d}u = 0$, since $\mathrm{d}^2 f(u)/\mathrm{d}u^2 > 0$. □

A.2. Proof of Theorem 1

Proof. The proof is done by induction.

(a) $V_1^o(\cdot)$: Using (4), (5), (8) and (9) we have

$$V_1^o(x) = \min_{u \in \mathbb{U}}\big\{x^T Q x + u^T R u + V_0^o(Ax + Bu)\big\} = \min_{u \in \mathbb{U}}\big\{x^T Q x + u^T R u + (Ax + Bu)^T P (Ax + Bu)\big\} = \min_{u \in \mathbb{U}}\big\{x^T P x + \bar{R}(Kx + u)^2\big\}.$$

The control law that minimises $V_1(\cdot, \cdot)$ is the value of $u$ that minimises $(Kx + u)^2$. It is clear that, in the unconstrained case, $u = -Kx$ (the expression given by $-\hat{L}_1 x - \hat{h}_1$) would minimise $V_1(\cdot, \cdot)$. The constraint $u \in [-\Delta, \Delta]$ and the convexity of $(Kx + u)^2$ lead to the solution $u = -\mathrm{sat}_\Delta(Kx) \equiv -L_1 x - h_1$, with $L_1$ and $h_1$ given by (14) and (15) for $r = 1$. Substituting the minimiser $u = -L_1 x - h_1$ into the above expression for $V_1^o(x)$ leads to

$$V_1^o(x) = x^T P x + \bar{R}(\hat{K}_1 x - h_1)^2,$$

which coincides with (17) for $r = 1$.

(b) $V_r^o(\cdot)$: We will now assume, as induction hypothesis, that Eq. (17) is valid for $r-1$, and will show that this assumption implies that the hypothesis also holds for $r$. This, and the validity already shown for $r = 1$, will prove that (17) holds for all $r \in \{1, \ldots, N\}$. By the induction hypothesis,

$$V_{r-1}^o(x) = x^T P x + \bar{R}\Bigg[(\hat{K}_{r-1}x - h_{r-1})^2 + \sum_{i=1}^{r-2}\Big(\hat{K}_{r-1-i}\hat{A}_{r-1,i,0}(A - BL_{r-1})x - \hat{K}_{r-1-i}\hat{A}_{r-1,i,0}Bh_{r-1} - \hat{K}_{r-1-i}\sum_{p=1}^{i-1}\hat{A}_{r-1,i,p}Bh_{r-1-p} - h_{r-1-i}\Big)^2\Bigg].$$

Using the principle of optimality, Eq. (9), we have

$$V_r^o(x) = \min_{u \in \mathbb{U}}\big\{x^T Q x + u^T R u + V_{r-1}^o(Ax + Bu)\big\}.$$

From the expression of $V_{r-1}^o(x)$ above,

$$V_r^o(x) = \min_{u \in \mathbb{U}}\Bigg\{x^T Q x + u^T R u + (Ax + Bu)^T P (Ax + Bu) + \bar{R}\Bigg[\big(\hat{K}_{r-1}(Ax + Bu) - h_{r-1}\big)^2 + \sum_{i=1}^{r-2}\Big(\hat{K}_{r-1-i}\hat{A}_{r-1,i,0}(A - BL_{r-1})(Ax + Bu) - \hat{K}_{r-1-i}\hat{A}_{r-1,i,0}Bh_{r-1} - \hat{K}_{r-1-i}\sum_{p=1}^{i-1}\hat{A}_{r-1,i,p}Bh_{r-1-p} - h_{r-1-i}\Big)^2\Bigg]\Bigg\}.$$

Using the algebraic Riccati equation (4),

$$V_r^o(x) = \min_{u \in \mathbb{U}}\Bigg\{x^T P x + \bar{R}(Kx + u)^2 + \bar{R}\Bigg[\big(\hat{K}_{r-1}(Ax + Bu) - h_{r-1}\big)^2 + \sum_{i=1}^{r-2}\Big(\hat{K}_{r-1-i}\hat{A}_{r-1,i,0}(A - BL_{r-1})(Ax + Bu) - \hat{K}_{r-1-i}\hat{A}_{r-1,i,0}Bh_{r-1} - \hat{K}_{r-1-i}\sum_{p=1}^{i-1}\hat{A}_{r-1,i,p}Bh_{r-1-p} - h_{r-1-i}\Big)^2\Bigg]\Bigg\}.$$

Using property (i) in Lemma 6,

$$V_r^o(x) = \min_{u \in \mathbb{U}}\Bigg\{x^T P x + \bar{R}(Kx + u)^2 + \bar{R}\Bigg[\big(\hat{K}_{r-1}(Ax + Bu) - h_{r-1}\big)^2 + \sum_{i=1}^{r-2}\Big(\hat{K}_{r-i-1}\hat{A}_{r-1,i,0}(A - BL_{r-1})(Ax + Bu) - \hat{K}_{r-i-1}\sum_{p=0}^{i-1}\hat{A}_{r,i+1,p+1}Bh_{r-1-p} - h_{r-1-i}\Big)^2\Bigg]\Bigg\}$$

$$= \min_{u \in \mathbb{U}}\Bigg\{x^T P x + \bar{R}(Kx + u)^2 + \bar{R}\Bigg[\big(\hat{K}_{r-1}(Ax + Bu) - h_{r-1}\big)^2 + \sum_{i=2}^{r-1}\Big(\hat{K}_{r-i}\hat{A}_{r-1,i-1,0}(A - BL_{r-1})(Ax + Bu) - \hat{K}_{r-i}\sum_{p=0}^{i-2}\hat{A}_{r,i,p+1}Bh_{r-p-1} - h_{r-i}\Big)^2\Bigg]\Bigg\}.$$

Using properties (ii) and (iii) in Lemma 6, $V_r^o(x)$ can be rewritten as

$$V_r^o(x) = \min_{u \in \mathbb{U}}\Bigg\{x^T P x + \bar{R}\Bigg[(Kx + u)^2 + \sum_{i=1}^{r-1}\Big(\hat{K}_{r-i}\hat{A}_{r,i,0}(Ax + Bu) - \hat{K}_{r-i}\sum_{p=1}^{i-1}\hat{A}_{r,i,p}Bh_{r-p} - h_{r-i}\Big)^2\Bigg]\Bigg\}.$$

Then, the control law $u(x)$ must minimise the following sum of $r$ quadratic functions:

$$(Kx + u)^2 + \sum_{i=1}^{r-1}\Big(\hat{K}_{r-i}\hat{A}_{r,i,0}Ax + \hat{K}_{r-i}\hat{A}_{r,i,0}Bu - \hat{K}_{r-i}\sum_{p=1}^{i-1}\hat{A}_{r,i,p}Bh_{r-p} - h_{r-i}\Big)^2.$$

Using Lemma 7, and rearranging in the form $u = -\hat{L}_r x - \hat{h}_r$, the minimiser, in the absence of constraints, would be given by

$$u = -\frac{K + \sum_{i=1}^{r-1}\hat{K}_{r-i}\hat{A}_{r,i,0}B\,\hat{K}_{r-i}\hat{A}_{r,i,0}A}{1 + \sum_{i=1}^{r-1}\big(\hat{K}_{r-i}\hat{A}_{r,i,0}B\big)^2}\,x - \Bigg({-}\frac{\sum_{i=1}^{r-1}\hat{K}_{r-i}\hat{A}_{r,i,0}B\big(\hat{K}_{r-i}\sum_{p=1}^{i-1}\hat{A}_{r,i,p}Bh_{r-p} + h_{r-i}\big)}{1 + \sum_{i=1}^{r-1}\big(\hat{K}_{r-i}\hat{A}_{r,i,0}B\big)^2}\Bigg),$$

from where the expressions in (10)–(11) for $\hat{L}_r$ and $\hat{h}_r$ are obtained. Again, by the convexity of the objective function, the input constraint leads to the expressions of $L_r$ and $h_r$ in (14)–(15), since by definition $\mathrm{sat}_\Delta(\hat{L}_r x_{N-r} + \hat{h}_r) \equiv L_r x_{N-r} + h_r$. Substituting the minimiser $u = -L_r x - h_r$ into the last expression obtained for $V_r^o(x)$ leads to (17), showing that the validity of the expression for $r-1$ implies the validity for $r$, and this concludes the proof. □

A.3. Proof of Theorem 3

Proof. From the system equation (1) and using the control law (16), it is straightforward to obtain that the state $x_{N-r}$ can be expressed as a function of the initial state $x_0$ as follows:

$$x_{N-r} = \Bigg[\prod_{i=r+1}^{N}(A - BL_i)\Bigg]x_0 - \sum_{i=1}^{N-r}\Bigg[\prod_{p=r+1}^{N-i}(A - BL_p)\Bigg]Bh_{N-i+1}.$$

Then, it can be readily seen from the above expression that the different cases in (19) give the linear inequality on $x_0$ corresponding to the linear inequality on $x_{N-r}$. □

References

[1] R. Bellman, Dynamic Programming, Princeton University Press, Princeton, NJ, USA, 1957.
[2] A. Bemporad, M. Morari, V. Dua, E. Pistikopoulos, The explicit linear quadratic regulator for constrained systems, Automatica 38 (2002) 3–20.

[3] J. De Doná, G. Goodwin, Elucidation of the state-space regions wherein model predictive control and anti-windup strategies achieve identical control policies, in: Proceedings of the 2000 American Control Conference, Chicago, IL, USA, 2000.
[4] J. De Doná, G. Goodwin, M. Seron, Anti-windup and model predictive control: reflections and connections, European J. Control 6 (5) (2000) 467–477.
[5] J. De Doná, M. Seron, D. Mayne, G. Goodwin, Enlarged terminal sets guaranteeing stability of receding horizon control, Systems Control Lett. 47 (2002) 57–63.
[6] P. Grieder, F. Borrelli, F. Torrisi, M. Morari, Computation of the constrained infinite time linear quadratic regulator, Automatica 40 (2004) 701–708.
[7] T.A. Johansen, I. Petersen, O. Slupphaug, Explicit sub-optimal linear quadratic regulation with state and input constraints, Automatica 38 (2002) 1099–1111.
[8] O. Marjanovic, B. Lennox, P. Goulding, D. Sandoz, Minimising conservativism in infinite-horizon LQR control, Systems Control Lett. 46 (2002) 271–279.
[9] D.R. Ramirez, E.F. Camacho, On the piecewise linear nature of min–max model predictive control with bounded uncertainties, in: Proceedings of the 40th IEEE Conference on Decision and Control, Orlando, FL, USA, 2001, pp. 4845–4850.
[10] J. Rawlings, K. Muske, Stability of constrained receding horizon control, IEEE Trans. Automat. Control 38 (10) (1993) 1512–1516.
[11] M. Seron, G. Goodwin, J. De Doná, Characterisation of receding horizon control for constrained linear systems, Asian J. Control 5 (2003) 271–286.