16th IFAC Symposium on System Identification, The International Federation of Automatic Control, Brussels, Belgium, July 11-13, 2012
On the convergence of the Prediction Error Method to its global minimum ⋆

Diego Eckhard ∗, Alexandre S. Bazanella ∗, Cristian R. Rojas ∗∗, Håkan Hjalmarsson ∗∗

∗ Department of Electrical Engineering, Universidade Federal do Rio Grande do Sul, Brazil (e-mail: diegoeck, [email protected])
∗∗ School of Electrical Engineering, KTH Royal Institute of Technology, SE-100 44 Stockholm, Sweden (e-mail: crro, [email protected])
Abstract: The Prediction Error Method (PEM) is related to an optimization problem built on input/output data collected from the system to be identified. It is often hard to find the global solution of this optimization problem because the corresponding objective function presents local minima and/or the search space is constrained to a nonconvex set. The existence of local minima, and hence the difficulty in solving the optimization, depends mainly on the experimental conditions, more specifically on the spectrum of the input/output data collected from the system. It is therefore possible to avoid the existence of local minima by properly choosing the spectrum of the input; in this paper we show how to perform this choice. We present sufficient conditions for the convergence of PEM to the global minimum and from these conditions we derive two approaches to avoid the existence of nonglobal minima. We present the application of one of these two approaches to a case study where standard identification toolboxes tend to get trapped in nonglobal minima.

Keywords: Identification; Convergence; Gradient methods

1. INTRODUCTION

The Prediction Error Method (PEM) for parameter identification uses input-output data collected from the process to form a cost function; the parameters are then estimated as the solution of the optimization of this cost function. Under mild assumptions, the global minimum of the cost function is a consistent estimate of the model's parameters and the asymptotic variance equals the limit of the Cramér-Rao bound. Therefore, identification by means of the PEM provides a consistent and otherwise statistically appealing estimate of the system's parameters and transfer function, provided that the global minimum of the cost function is achieved by the optimization procedure [Ljung, 1999].

One difficulty in applying the PEM is that in many cases achieving the global minimum may prove difficult [Ljung, 2010], for two main reasons: the cost function is usually not convex, and the problem is constrained to a nonconvex set - namely, the set of parameters which yield stable predictors. Currently adopted solutions to this problem consist mainly in searching for good initial conditions to initialize the optimization, which is performed with some standard algorithm - steepest descent, Newton-Raphson, Levenberg-Marquardt, and the like. Although there seem to be no firmly established guarantees that these solutions yield the global minimum, they have been successfully applied for many years. They achieve the global minimum of the objective function in most cases, but failure to do so is not such an uncommon occurrence either. As the model order becomes larger, the tendency to get trapped in local minima, thus providing a useless model, seems to grow, as illustrated by the case study presented in this paper.
A sufficient condition for the convergence of PEM to the global minimum when the model structure is ARMAX has been presented in [Goodwin et al., 2003]. Conditions for convergence to the global minimum were also obtained when the structure of the model is Output-Error [Zou and Heath, 2009, Söderström, 1975, Eckhard and Bazanella, 2011a] or ARMA [Åström and Söderström, 1974]. These results are restricted to the specific model structures and, more importantly, the conditions, although theoretically enlightening, are hardly enforceable a priori in a practical setting.

In this work we present sufficient conditions for the convergence of PEM to the global minimum that can be embedded in the experiment design framework, so that the resulting cost function is guaranteed not to have local minima. In a sense, our sufficient conditions extend the results presented in [Goodwin et al., 2003, Zou and Heath, 2009] to a general model structure. On the other hand, we present different ways to ensure a priori that the conditions are satisfied by properly choosing the input spectrum and/or manipulating the data. We show how the user can design both the experiment used to collect the data and the iterative algorithm to ensure that the global minimum is achieved.

The paper is organized as follows. Section 2 presents basic definitions and the problem formulation, and in Section 3 the steepest descent algorithm is introduced. In Section 4 sufficient conditions for the convergence of the steepest descent algorithm are presented. Section 5 presents two approaches to ensure that the convergence conditions are satisfied. Section 6 presents a case study, and concluding remarks are given in Section 7.
⋆ This work was supported in part by CAPES and CNPq/Brasil.
2. PRELIMINARIES
Consider the identification of a linear time-invariant discrete-time single-input single-output "real" system

S : y(t) = G_0(z)u(t) + v(t),  v(t) = H_0(z)e(t).  (1)

In (1), z is the forward-shift operator, G_0(z) is the process' transfer function, H_0(z) is the noise's transfer function, u(t) is the input signal, y(t) is the output signal and e(t) is a zero-mean white noise sequence with variance σ_e². The transfer functions G_0(z) and H_0(z) are rational and proper. To be precise, we shall define S = [G_0(z) H_0(z)].

The input and output signals are filtered by a linear filter with transfer function L(z):

y_L(t) = L(z)y(t),  u_L(t) = L(z)u(t)  (2)

and a data set Z^N = {u_L(1), y_L(1), ..., u_L(N), y_L(N)} is collected. These data are used to estimate the parameters θ ∈ R^p of a model of the following form:

y(t, θ) = G(z, θ)u(t) + H(z, θ)e(t)  (3)

where

G(z, θ) = B^T(z)θ / (1 + F^T(z)θ),  H(z, θ) = (1 + C^T(z)θ) / (1 + D^T(z)θ).  (4)

The vectors B(z), C(z), D(z), F(z) ∈ R^n are composed of fixed transfer functions. For any given θ, M(θ) = [G(z, θ) H(z, θ)] is called a model, and the collection of all models that can be thus generated is called the model set M:

M = {M(θ) : θ ∈ D}.  (5)

The true system is said to belong to this model set, S ∈ M, if there exists θ_0 ∈ D such that M(θ_0) = S. We assume in this paper that S ∈ M. Let us define Γ ⊂ D as the set of all θ for which the predictor transfer functions H^{-1}(z, θ)G(z, θ) and H^{-1}(z, θ) are stable.

In a prediction error identification framework, a model uniquely defines the one-step-ahead predictor of the output given all input/output data up to the previous time instant. With the proposed model structure we have the following one-step-ahead predictor [Ljung, 1999]:

ŷ_L(t, θ) = H^{-1}(z, θ)G(z, θ)u_L(t) + [1 − H^{-1}(z, θ)]y_L(t).  (6)

The one-step-ahead prediction error is defined as

ε(t, θ) ≜ y_L(t) − ŷ_L(t, θ) = −H^{-1}(z, θ)(G(z, θ) − G_0(z))L(z)u(t) + H^{-1}(z, θ)H_0(z)L(z)e(t)  (7)

and the PEM consists in finding the parameter θ which minimizes the following cost function: J(θ) = E[(ε(t, θ))²]. Assuming that the data are collected in open loop, using Parseval's Theorem the cost function can be rewritten as

J(θ) = (1/2π) ∫_{−π}^{π} Φ_u(ω) |L(e^{jω})H^{-1}(e^{jω}, θ)(G_0(e^{jω}) − G(e^{jω}, θ))|² dω + (1/2π) ∫_{−π}^{π} σ_e² |L(e^{jω})H^{-1}(e^{jω}, θ)H_0(e^{jω})|² dω  (8)

for all θ ∈ Γ, where Φ_u(ω) is the spectrum of the input signal [Ljung, 1999].

The gradient of the cost function with respect to θ is:

∇J(θ) = (1/2π) ∫_{−π}^{π} 2ℜ{ [H^{-1}(e^{jω}, θ)(G_0(e^{jω}) − G(e^{jω}, θ))]* ∇[H^{-1}(e^{jω}, θ)(G_0(e^{jω}) − G(e^{jω}, θ))] } |L(e^{jω})|² Φ_u(ω) dω + (1/2π) ∫_{−π}^{π} 2ℜ{ [H_0(e^{jω})(H^{-1}(e^{jω}, θ) − H_0^{-1}(e^{jω}))]* H_0(e^{jω})∇H^{-1}(e^{jω}, θ) } |L(e^{jω})|² σ_e² dω  (9)

where A*(e^{jω}) = A^T(e^{−jω}).

3. THE OPTIMIZATION PROBLEM

The Prediction Error Method consists in solving the following optimization problem:

θ̂ = arg min_{θ ∈ Γ} J(θ)

where Γ = {θ : (6) is BIBO-stable}. Generally, it is not easy to solve this problem, because neither the cost function J(θ) nor the search space Γ is convex. It is common to use iterative algorithms based on the gradient of the cost function to attempt to achieve the global minimum of the criterion. The algorithm is described by

θ_{i+1} = θ_i − γ_i ∇J(θ_i)  (10)

where γ_i > 0 is the step size of the algorithm. More general optimization algorithms can be treated in the same way; in this paper we restrict ourselves to the steepest descent algorithm (10) to save space.

We would like such an optimization algorithm to converge to the global minimum of the criterion for the largest possible set of initial conditions. A set of initial conditions for which the algorithm converges to the global minimum is called a domain of attraction (DOA) of the algorithm.

Definition 1. Let θ_0 be the global minimum of the function J(θ) : R^p → R₊. A set Ω ⊂ R^p is a domain of attraction of the algorithm (10) for the function J(θ) if lim_{i→∞} θ_i = θ_0 for every initial condition in Ω.

The algorithm must be initialized within a DOA of the global minimum, and the development of identification procedures and software mostly turns around the choice of "good" initial conditions. Several choices of initial conditions are proposed in the literature. The toolbox ident [Ljung, 2000] for Matlab uses initial conditions based on instrumental variables, while the toolbox unit [Ninness et al., 2005] applies the Steiglitz-McBride method to this end. New developments in existing algorithms tend to follow this line of thought, and proposals for improvements usually consist in new ways of finding better initializations.

In this paper we approach the problem from the other end: we try to make the DOA as large as possible. In this approach a fundamental concept is that of a Candidate Domain of Attraction (CDOA) for the function J(θ) [Bazanella et al., 2012], also known as a Decreasing Euclidean Parameter Error Norm (DEPEN) region [Goodwin et al., 2003].

Definition 2. Let θ_0 be the global minimum of the function J(θ). A set Λ is a Candidate Domain of Attraction for the function J(θ) if

(θ − θ_0)^T ∇J(θ) > 0  ∀θ ∈ Λ, θ ≠ θ_0.  (11)
This set is called a Candidate Domain of Attraction because it can be made a DOA by a proper choice of the step size sequence γ_i. The issue of actually choosing this sequence can then be treated separately, and has been treated thoroughly in [Eckhard and Bazanella, 2011b, 2012].

Such a set has a few properties worth noting. Probably the most relevant property is that inside this set the gradient of the cost function ∇J(θ) is zero only at the global minimum - there are no local minima or maxima inside it. This means that the only point inside the set Λ to which the algorithm can converge is the global minimum θ_0. Another interesting property is that the angle between ∇J(θ) and the vector (θ − θ_0) is smaller than π/2 rad for all θ inside the set. From this property, if the step size of the algorithm is small enough then at each iteration the algorithm moves closer to the global minimum.

The next lemma establishes a relation between the domain of attraction and the candidate domain of attraction.

Lemma 1. [Bazanella et al., 2008]. Let θ_0 be the global minimum of J(θ) and define a set B(θ_0) = {θ : (θ − θ_0)^T(θ − θ_0) < α}. If B(θ_0) ⊂ Γ is a candidate domain of attraction then there is a sequence γ_i, i = 1, ..., ∞ such that B(θ_0) is a DOA of the algorithm (10) for J(θ).

The above lemma shows that for all initial conditions inside the candidate domain of attraction B(θ_0) there is a step size sequence γ_i which ensures the convergence of the algorithm to the global minimum θ_0. Actual convergence also involves the appropriate choice of the sequence γ_i, an issue which we do not address in this paper.

In the next section we will explore the condition (11) using properties of the cost function's gradient. We will present sufficient conditions which ensure that a set is a candidate domain of attraction and reveal some techniques used to ensure that these conditions be satisfied.
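Before moving on, the following sketch illustrates what the iteration (10) looks like in practice for a simple Output-Error model, for which H(z, θ) = 1 and, with L(z) = 1, the prediction error (7) reduces to ε(t, θ) = y(t) − G(z, θ)u(t). The second-order example system, the fixed step size (standing in for the sequence γ_i), and the finite-difference approximation of the gradient (9) are all illustrative assumptions, not prescriptions of this paper.

```python
import numpy as np
from scipy.signal import lfilter

def prediction_error(theta, u, y):
    """Prediction error (7) for a 2nd-order Output-Error model:
    G(z,theta) = (b1*z^-1 + b2*z^-2)/(1 + f1*z^-1 + f2*z^-2), H = 1, L = 1."""
    b1, b2, f1, f2 = theta
    return y - lfilter([0.0, b1, b2], [1.0, f1, f2], u)

def J(theta, u, y):
    """Sample estimate of the cost J(theta) = E[eps(t,theta)^2]."""
    return np.mean(prediction_error(theta, u, y) ** 2)

def grad_J(theta, u, y, h=1e-6):
    """Central finite-difference approximation of the gradient (9)."""
    g = np.zeros_like(theta)
    for i in range(theta.size):
        dp, dm = theta.copy(), theta.copy()
        dp[i] += h
        dm[i] -= h
        g[i] = (J(dp, u, y) - J(dm, u, y)) / (2.0 * h)
    return g

# Hypothetical 2nd-order "true" system (not the paper's example) and one data set.
rng = np.random.default_rng(0)
N = 1000
u = rng.standard_normal(N)                       # white input, unit variance
y = lfilter([0.0, 0.5, -0.2], [1.0, -1.2, 0.5], u) + 0.1 * rng.standard_normal(N)

# Steepest descent iteration (10) with a fixed step size; the sequence gamma_i
# would normally be chosen carefully, cf. [Eckhard and Bazanella, 2011b].
theta = np.array([0.1, 0.0, -0.5, 0.1])          # initial condition, assumed in a DOA
gamma = 0.1
for _ in range(500):
    theta = theta - gamma * grad_J(theta, u, y)
print(theta)  # should approach [0.5, -0.2, -1.2, 0.5] if the iterates stay in a DOA
```

If the initial condition lies outside every DOA, or the step size is too aggressive, this loop may converge to a nonglobal minimum or diverge, which is precisely the phenomenon the rest of the paper addresses.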
4. CONVERGENCE CONDITIONS

The candidate domain of attraction is an important tool to ensure the convergence of the algorithm. If the initial condition is inside this set then it is easy to devise an algorithm that will converge to the global minimum of the cost function. It is highly desirable to understand which factors are related to the size of this set, because then it is possible to design these factors in order to ensure that the initial condition lies inside the set. The next theorem explores the condition (11) using (9).

Theorem 2. Consider that the model has the structure (4) and that S ∈ M. The set Λ is a candidate DOA if

(1/π) ∫_{−π}^{π} [ Φ_u(ω)|L(e^{jω})|² |H^{-1}(e^{jω}, θ)(G_0(e^{jω}) − G(e^{jω}, θ))|² ℜ{ (1 + C^T(e^{jω})θ_0)/(1 + C^T(e^{jω})θ) + (1 + F^T(e^{jω})θ_0)/(1 + F^T(e^{jω})θ) − (1 + D^T(e^{jω})θ_0)/(1 + D^T(e^{jω})θ) } + σ_e²|L(e^{jω})|² |H_0(e^{jω})(H^{-1}(e^{jω}, θ) − H_0^{-1}(e^{jω}))|² ℜ{ (1 + C^T(e^{jω})θ_0)/(1 + C^T(e^{jω})θ) } ] dω > 0  ∀θ ∈ Λ, θ ≠ θ_0.  (12)

Proof: In the appendix.

Theorem 2 presents sufficient conditions which ensure that a set Λ is a candidate domain of attraction. This theorem extends the results presented in [Goodwin et al., 2003, Zou and Heath, 2009] to a more general structure. The condition (12) expresses the relation between the candidate domain of attraction and the several factors that are involved in the identification: the transfer functions of the true system G_0(z) and H_0(z), the model structure G(z, θ) and H(z, θ), the input spectrum Φ_u(ω), the variance of the noise σ_e² and the filter L(z). The factors the user can design, in order to shape the candidate domain of attraction, are the input spectrum Φ_u(ω) and the filter L(z). The next section will present the design of the input spectrum and the filter in order to enlarge the candidate domain of attraction. But before that, let us explore a little more the result in Theorem 2.

Corollary 3. Consider that the model has the structure

G(z, η) = B^T(z)η / (1 + F^T(z)η),  H(z, ξ) = (1 + C^T(z)ξ) / (1 + D^T(z)ξ),  (13)

that θ^T = [η ξ]^T, θ_0^T = [η_0 ξ_0]^T and that S ∈ M. The set Λ is a candidate DOA if

(1/π) ∫_{−π}^{π} Φ_u(ω)|L(e^{jω})|² |H^{-1}(e^{jω}, ξ)(G_0(e^{jω}) − G(e^{jω}, η))|² ℜ{ (1 + F^T(e^{jω})η_0)/(1 + F^T(e^{jω})η) } dω + (1/π) ∫_{−π}^{π} σ_e² |L(e^{jω})H_0(e^{jω})(H^{-1}(e^{jω}, ξ) − H_0^{-1}(e^{jω}))|² ℜ{ (1 + C^T(e^{jω})ξ_0)/(1 + C^T(e^{jω})ξ) } dω > 0  ∀θ ∈ Λ, θ ≠ θ_0.  (14)

Proof: The proof of this corollary is similar to the proof of Theorem 2 and is omitted for reasons of space.

Corollary 3 specializes Theorem 2 to the case where the model structure has independent parameters. The most common structures which share this property are Box-Jenkins and Output-Error, but they are not the only ones.

Remark 1. If the model has an ARMAX (Auto-Regressive Moving Average with eXogenous input) structure, so that F(e^{jω}) = D(e^{jω}), the expression (12) is simplified to

(1/π) ∫_{−π}^{π} [ Φ_u(ω) |L(e^{jω})H^{-1}(e^{jω}, θ)(G_0(e^{jω}) − G(e^{jω}, θ))|² + σ_e² |L(e^{jω})H_0(e^{jω})(H^{-1}(e^{jω}, θ) − H_0^{-1}(e^{jω}))|² ] ℜ{ (1 + C^T(e^{jω})θ_0)/(1 + C^T(e^{jω})θ) } dω.  (15)

All the previous results show that a set is a candidate domain of attraction if a specific positivity condition is satisfied. Each of these conditions is presented as an integral whose integrand is composed of several terms, all of which are non-negative, with the exception of the real part of some transfer function. Note that if this real part were positive, the integrals would also be positive and the condition would be satisfied. The convergence of general algorithms is frequently studied by analyzing SPR (Strictly Positive Real) conditions such as the ones presented here. The idea is not new, and it has already been used in the context of recursive system identification, e.g. [Riedle et al., 1986].

5. HOW TO ENFORCE THE CONVERGENCE CONDITIONS
We now present two different ways of applying the convergence conditions presented in the previous section in order to obtain large enough candidate domains of attraction. The first one consists in embedding the convergence conditions into the experiment design framework, thus obtaining an input that will result in the desired CDOA when applied to the system. The second one consists in manipulating the data already obtained from a given experiment by means of a sequence of filters.

5.1 Experiment Design

Experiment Design consists in choosing the input spectrum to be applied in the identification experiment. The least costly experiment design is generally posed as an optimization problem [Lindqvist and Hjalmarsson, 2001]
min_{Φ_u(ω)}  (energy of the input signal)
s.t.  constraints,

where several constraints may apply. It is usual to include constraints related to the estimation error, the frequency distribution of the input signal and the application of the model. The optimization problem as posed is not convex and is infinite-dimensional [Jansson and Hjalmarsson, 2005]. However, it is possible to parametrize the input spectrum and to describe the constraints such that the problem becomes convex and finite-dimensional. The input spectrum is parametrized as

Φ_u(ω) = Σ_{k=1}^{M} c_k β_k(e^{jω})

where β_k(e^{jω}), k = 1, ..., M are basis functions and c_k, k = 1, ..., M are the parameters to be optimized.

Condition (12) can be used as an additional constraint in the experiment design. The user chooses a set Λ and the experiment design will select the values c_k which satisfy (12), thus ensuring that Λ will be a CDOA for the PEM. However, this condition is not convex, and some approximation is necessary if we want to use it as a constraint in the experiment design. Note that for any fixed θ the condition (12) is convex; the problem becomes nonconvex because we want to ensure the condition for all θ ∈ Λ, θ ≠ θ_0. One way to approach this issue is the Scenario Approach [Campi et al., 2009]. This technique proposes that Q randomly generated values of θ ∈ Λ be selected, and that the Q respective conditions (12) with fixed θ be used in the experiment design instead of the original condition. The number Q should be very large, such that this set of points almost represents the whole set Λ. The article [Campi et al., 2009] describes precisely this random approach, and gives the probability that the set of conditions represents the original condition (12) as a function of the number Q of conditions. So, condition (12) is replaced by

φ(θ_i) + Σ_{k=1}^{M} c_k ψ_k(θ_i) > 0,  i = 1, ..., Q  (16)

where

ψ_k(θ) = (1/π) ∫_{−π}^{π} β_k |L H^{-1}(θ)(G_0 − G(θ))|² ℜ{ (1 + C^Tθ_0)/(1 + C^Tθ) + (1 + F^Tθ_0)/(1 + F^Tθ) − (1 + D^Tθ_0)/(1 + D^Tθ) } dω,

φ(θ) = (1/π) ∫_{−π}^{π} σ_e² |L H_0 (H^{-1}(θ) − H_0^{-1})|² ℜ{ (1 + C^Tθ_0)/(1 + C^Tθ) } dω

and the θ_i are selected randomly such that θ_i ∈ Λ. Note that (16) is a convex condition which can be used as a constraint in the experiment design formulation.
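As an illustration of how the scenario constraints (16) might be assembled numerically, consider the following sketch. It assumes a first-order Output-Error problem, so that C = D = 0, φ(θ) = 0 and the ℜ{·} factor in ψ_k reduces to (1 + F^Tθ_0)/(1 + F^Tθ); it further assumes L(z) = 1, indicator-function basis functions β_k, a box-shaped candidate set Λ, Riemann-sum integration on a frequency grid, and scipy.optimize.linprog as the convex solver. All of these choices are assumptions made for the example, and the strict inequality in (16) is imposed through a small margin.

```python
import numpy as np
from scipy.optimize import linprog

# Frequency grid and basis functions beta_k (assumed: indicator functions of
# M frequency bands, a common parametrization of the input spectrum).
w = np.linspace(-np.pi, np.pi, 4001)
dw = w[1] - w[0]
M = 8
edges = np.linspace(0.0, np.pi, M + 1)
beta = [((np.abs(w) >= edges[k]) & (np.abs(w) < edges[k + 1])).astype(float)
        for k in range(M)]

# Hypothetical first-order OE problem: G(z,theta) = b*z^-1/(1 + f*z^-1), H = L = 1.
b0, f0 = 1.0, -0.9          # "true" parameters theta_0 (assumed)
e_mjw = np.exp(-1j * w)     # e^{-j omega}

def psi(k, theta):
    """psi_k(theta): Riemann-sum approximation of the integral defining psi_k,
    with the Re{...} factor reduced to (1 + f0 e^-jw)/(1 + f e^-jw)."""
    b, f = theta
    G0 = b0 * e_mjw / (1.0 + f0 * e_mjw)
    G = b * e_mjw / (1.0 + f * e_mjw)
    K = (1.0 + f0 * e_mjw) / (1.0 + f * e_mjw)
    return np.sum(beta[k] * np.abs(G0 - G) ** 2 * K.real) * dw / np.pi

# Scenario approach: Q random parameter values drawn from a chosen set Lambda.
rng = np.random.default_rng(1)
Q = 200
thetas = np.column_stack([b0 + rng.uniform(-0.8, 0.8, Q),
                          np.clip(f0 + rng.uniform(-0.5, 0.5, Q), -0.99, 0.0)])

# Minimize the input energy sum_k c_k*(1/2pi)*int(beta_k)dw subject to the Q
# scenario constraints sum_k c_k psi_k(theta_i) >= margin, c_k >= 0; the small
# margin stands in for the strict inequality in (16).
margin = 1e-3
A_ub = -np.array([[psi(k, th) for k in range(M)] for th in thetas])
b_ub = -margin * np.ones(Q)
energy = np.array([np.sum(bk) * dw / (2.0 * np.pi) for bk in beta])
res = linprog(c=energy, A_ub=A_ub, b_ub=b_ub, bounds=[(0.0, None)] * M)
print(res.x)    # spectrum coefficients c_k enforcing the CDOA condition on Lambda
```

For richer model structures the only change is the integrand of ψ_k (and a nonzero φ), so the same linear program applies; the number of scenarios Q controls the probabilistic guarantee as quantified in [Campi et al., 2009].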
5.2 The filter L(z)

The design of the filter L(z) through which the data are filtered can also be used in shaping CDOAs. In order to save space, we will present the design of the filter considering that the model has an ARMAX structure, as described in Remark 1, but the results can be easily extended to other structures. Notice that condition (15) is an integral whose integrand is composed of several terms, all of which are non-negative, with the exception of the term

ℜ{K(e^{jω}, θ)} ≜ ℜ{ (1 + C^T(e^{jω})θ_0)/(1 + C^T(e^{jω})θ) }.

Thus, whether or not a set is a candidate domain of attraction depends on the function K(e^{jω}, θ). This transfer function can be rewritten as

K(e^{jω}, θ) = [(1 + p_{10}e^{−jω})(1 + p_{20}e^{−jω}) ⋯ (1 + p_{n0}e^{−jω})] / [(1 + p_1 e^{−jω})(1 + p_2 e^{−jω}) ⋯ (1 + p_n e^{−jω})]

where the p_i are the n zeros of the polynomial (1 + C^T(z)θ) and the p_{i0} are the n zeros of the polynomial (1 + C^T(z)θ_0). The response at frequency ω = 0 is

K(e^{j0}, θ) = [(1 + p_{10})(1 + p_{20}) ⋯ (1 + p_{n0})] / [(1 + p_1)(1 + p_2) ⋯ (1 + p_n)]

and the response at frequency ω = π is

K(e^{jπ}, θ) = [(1 − p_{10})(1 − p_{20}) ⋯ (1 − p_{n0})] / [(1 − p_1)(1 − p_2) ⋯ (1 − p_n)].

Therefore, if all poles p_i and zeros p_{i0} of the transfer function K(e^{jω}, θ) are inside the unit circle, both K(e^{j0}, θ) and K(e^{jπ}, θ) are real and positive. Now we can obtain the following lemma.

Lemma 4. For all θ ∈ Γ, there exist ω_l, ω_h such that

ℜ{K(e^{jω}, θ)} > 0 for 0 ≤ ω ≤ ω_l,  ℜ{K(e^{jω}, θ)} > 0 for ω_h ≤ ω ≤ π.  (17)

This result follows from the continuity of the function K(e^{jω}, θ).

According to the previous result, the real part of the transfer function K(e^{jω}, θ) is positive at sufficiently low frequencies and at sufficiently high frequencies. Hence, if the filter L(z) has high gain at the low and high frequencies and low gain at the intermediate frequencies, then (15) is positive. This property does not depend on the parameters of the true system - G_0(z), H_0(z) and σ_e - even though the exact values of ω_l and ω_h do. The same result applies to condition (14), so it extends to the classical model classes: BJ, OE, ARMAX.

This result shows that if the filter L(z) is designed to have high gain at the high and low frequencies, then the conditions (15) or (14) are satisfied for all θ ∈ Γ. The result is surprisingly simple, but of course it does not come for free. By filtering the signals measured from the system, their total energy is reduced, thus increasing the variance of the resulting estimate. This variance increase tends to be large, since the frequencies that have been removed are the intermediate ones, which typically carry a significant part of the information about the system's behavior. To overcome this problem, we suggest the use of iterative filtering, as described in the following algorithm:

(1) Collect a set of input-output data (y(t) and u(t)) from the process and set k = 0;
(2) Compute an initial estimate θ^0 using instrumental variables or the Steiglitz-McBride method;
(3) Choose a low-pass filter L_k(z) with a very low passband;
(4) Compute u_L(t) = L_k(z)u(t) and y_L(t) = L_k(z)y(t);
(5) Perform the identification with the algorithm (10), the data u_L(t) and y_L(t) and the initial condition θ^k. Set θ^{k+1} as the solution of the identification procedure;
(6) Choose a low-pass filter L_{k+1}(z) with a passband larger than that of L_k(z);
(7) Set k = k + 1 and go to step 4.

After some iterations, the filter L_k(z) can be set to L_k(z) = 1, meaning that the signals are no longer filtered and that we are minimizing the desired original cost function J(θ). The main idea of this algorithm is that if L_k(z) is similar to L_{k+1}(z) then the global minima of the respective cost functions are also similar, so the minimizer obtained at stage k is a good initial condition for the optimization with data filtered by L_{k+1}(z). This algorithm converges to the global minimum of J(θ) if the consecutive filters L_k(z) are close enough; a sketch of this loop in code is given below.
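The loop below is a minimal sketch of steps (1)-(7), with two stand-ins that are our assumptions rather than the paper's prescriptions: the per-stage identification uses scipy.optimize.least_squares on the prediction errors instead of the plain steepest descent iteration (10), and the filters L_k(z) are Butterworth low-pass filters with growing cutoff. The initial estimate θ^0 from step (2) is assumed to be supplied by the caller, e.g. from an instrumental-variable fit.

```python
import numpy as np
from scipy.signal import butter, lfilter
from scipy.optimize import least_squares

def oe_residuals(theta, u, y):
    """Prediction errors of a hypothetical 2nd-order Output-Error model (H(z) = 1)."""
    b1, b2, f1, f2 = theta
    return y - lfilter([0.0, b1, b2], [1.0, f1, f2], u)

def identify(u, y, theta_init):
    """One PEM run on (possibly filtered) data, warm-started at theta_init.
    Stands in for step (5); the paper uses the gradient iteration (10) instead."""
    return least_squares(oe_residuals, theta_init, args=(u, y)).x

def iterative_filtering_pem(u, y, theta0, cutoffs=(0.4, 0.5, 0.6, 0.7, 0.8, 0.9)):
    """Steps (3)-(7): re-identify on data filtered by low-pass filters L_k(z)
    with increasing passband, warm-starting each stage at the previous solution."""
    theta = np.asarray(theta0, dtype=float)   # theta^0 from step (2), assumed given
    for wc in cutoffs:                        # growing cutoff, normalized to Nyquist
        bL, aL = butter(4, wc)                # L_k(z): 4th-order Butterworth (assumed)
        uL, yL = lfilter(bL, aL, u), lfilter(bL, aL, y)
        theta = identify(uL, yL, theta)       # step (5): theta^{k+1} from theta^k
    return identify(u, y, theta)              # final stage with L(z) = 1
```

The design choice that matters is the spacing of the cutoffs: consecutive filters must be close enough that the minimizer of one stage remains inside a DOA of the next stage's cost function.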
6. CASE STUDY

Consider the "real" system described by

G_0(z) = (−0.2073z^{−1} + 0.1815z^{−2} + 1.352z^{−3} − 3.356z^{−4} + 3.061z^{−5} − 1.045z^{−6} − 0.07957z^{−7} + 0.0944z^{−8}) / (1 − 6.549z^{−1} + 19.2z^{−2} − 32.83z^{−3} + 35.77z^{−4} − 25.39z^{−5} + 11.45z^{−6} − 3z^{−7} + 0.3491z^{−8}),

H_0(z) = 1,  σ_e² = 0.01.

This model represents a power system consisting of two synchronous generators connected to the grid, with data taken from [Anderson and Fouad, 1977]. The transfer function G_0(z) describes the relation between the field voltage applied to the first generator and the angular speed of the same generator at a given operating condition. Let us identify a model for this real system with the Output-Error structure

G(z, θ) = B^T(z)θ / (1 + F^T(z)θ),  H(z, θ) = 1

where

B(z) = [z^{−1} z^{−2} z^{−3} z^{−4} z^{−5} z^{−6} z^{−7} z^{−8} 0 0 0 0 0 0 0 0]^T,
F(z) = [0 0 0 0 0 0 0 0 z^{−1} z^{−2} z^{−3} z^{−4} z^{−5} z^{−6} z^{−7} z^{−8}]^T.

The real system belongs to the model set defined by this model structure, that is, G_0(z) = G(z, θ_0) with the "real parameter" θ_0 given by

θ_0 = [−0.2073 0.1815 1.352 −3.356 3.061 −1.045 −0.07957 0.0944 −6.549 19.2 −32.83 35.77 −25.39 11.45 −3 0.3491]^T.

In order to avoid the existence of local minima in the cost function, the second approach proposed in this paper has been applied. At each iteration the input/output data were filtered by a 20th-order FIR filter. Six iterations of the proposed algorithm were used, where the cut-off frequencies of the filters were 0.4, 0.5, 0.6, 0.7, 0.8 and 0.9, respectively.
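For concreteness, here is one way the six data filters of this case study could be realized; interpreting the quoted cut-off values as frequencies normalized to the Nyquist frequency and using a default (Hamming-windowed) FIR design are our assumptions, since the paper does not specify them.

```python
from scipy.signal import firwin, lfilter

# Six 20th-order (21-tap) FIR low-pass filters with increasing cut-off frequency,
# matching the case study; cut-offs taken as fractions of Nyquist (assumed).
cutoffs = [0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
filters = [firwin(numtaps=21, cutoff=wc) for wc in cutoffs]

def filtered_datasets(u, y):
    """Yield the filtered data (u_L, y_L) for each stage k of the algorithm of
    Subsection 5.2; each pair would be passed to one identification run."""
    for h in filters:
        yield lfilter(h, [1.0], u), lfilter(h, [1.0], y)
```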
For comparison, the parameter θ has also been estimated using the Matlab toolbox ident and the Matlab toolbox unit, besides the algorithm proposed in Subsection 5.2. In the three cases the same input signal u(t) has been applied to generate the data: a white noise sequence with unit variance (σ_u² = 1) and data length N = 1,000 samples. With this input the variance of the output signal is approximately σ_y² = 100. For each one of the three algorithms, 100 Monte-Carlo runs have been performed, thus providing 100 parameter estimates with each method.

The results obtained with the toolbox unit are summarized in Figure 1, where the ordinate axis presents the value of each one of the sixteen elements of the parameter vector θ and each abscissa corresponds to one Monte-Carlo run. The Monte-Carlo runs were re-ordered in terms of the smallest θ. It is seen that the first 45 points of the graph present values close to the real parameter value θ_0, indicating that in these 45 runs the algorithm converged to the global minimum of the cost function. But in the remaining 55 Monte-Carlo runs the values are close to another point in the parameter space, far away from the real value θ_0, indicating that the algorithm has yielded a local (nonglobal) minimum of the cost function. When the toolbox ident was used, only 17 out of the 100 Monte-Carlo runs converged to the global minimum of the cost function.

Fig. 1. Monte-Carlo runs with the toolbox unit (ordinate: elements θ_i of the estimated parameter vector; abscissa: Monte-Carlo run).

In Figure 2 we present the same graphic for the proposed algorithm, where it is seen that convergence to the global minimum has been achieved at every run.

Fig. 2. Monte-Carlo runs with the proposed algorithm (same axes as Figure 1).

In order to further illustrate the results, let us also compute, for each method, the mean value θ_m = (1/100) Σ_{i=1}^{100} θ̂_i of the model's parameters, where each θ̂_i represents the estimate obtained at the i-th Monte-Carlo run. The mean values of the model's parameters obtained with the toolbox unit, the toolbox ident and the proposed method are, respectively:
θ_m^unit = [−0.2071 −0.0360 1.3298 −1.9507 0.9517 −0.0396 −0.0378 −0.0276 −5.5050 13.3882 −18.5887 15.9136 −8.4405 2.6183 −0.4008 0.0164]^T

θ_m^ident = [−0.2065 −0.2952 0.8922 −0.5885 0.3352 −0.2073 −0.0484 0.0841 −4.2033 7.9007 −8.7065 6.4202 −3.5476 1.5685 −0.5030 0.0858]^T

θ_m^prop = [−0.2073 0.1794 1.3472 −3.3269 3.0156 −1.0138 −0.0910 0.0974 −6.5355 19.1230 −32.6493 35.5180 −25.1770 11.3441 −2.9679 0.3450]^T

Comparing these values with the real parameter θ_0, we see that the estimate obtained with the proposed method is, on average, much closer to the real parameter value.

7. CONCLUSION

This paper presented sufficient conditions for the convergence of the Prediction Error Method to the global minimum of the cost function. The spectra of the signals are key factors in these convergence conditions. We have proposed two different approaches to design the spectra and ensure that the convergence conditions are respected. The first approach is based on least costly experiment design, where convex constraints were imposed. The second approach proposes iterative filtering as a solution to the problem. A case study shows that this approach to eliminating local (nonglobal) minima from the PEM cost function can be very effective in guaranteeing convergence of the PEM to its global optimum.
REFERENCES

P. M. Anderson and A. A. Fouad. Power System Control and Stability. IEEE Press, 1977.
K. J. Åström and T. Söderström. Uniqueness of the maximum likelihood estimates of the parameters of an ARMA model. IEEE Transactions on Automatic Control, 18(6):769–773, 1974.
A. S. Bazanella, M. Gevers, L. Miskovic, and B. D. O. Anderson. Iterative minimization of H2 control performance criteria. Automatica, 44(10):2549–2559, 2008.
A. S. Bazanella, L. Campestrini, and D. Eckhard. Data-driven Controller Design: The H2 Approach. Springer, Netherlands, 2012.
M. C. Campi, S. Garatti, and M. Prandini. The scenario approach for systems and control design. Annual Reviews in Control, 33(2):149–157, 2009.
D. Eckhard and A. S. Bazanella. On the global convergence of identification of output error models. In Proc. 18th IFAC World Congress, Milan, Italy, 2011a.
D. Eckhard and A. S. Bazanella. Robust convergence of the steepest descent method for data-based control. International Journal of Systems Science, iFirst:1–7, 2011b.
D. Eckhard and A. S. Bazanella. Optimizing the convergence of data-based controller tuning. Proceedings of the Institution of Mechanical Engineers, Part I: Journal of Systems and Control Engineering, 226(4):563–574, 2012.
G. C. Goodwin, J. C. Agüero, and R. Skelton. Conditions for local convergence of maximum likelihood estimation for ARMAX models. In Proc. 13th IFAC Symposium on System Identification, pages 797–802, Rotterdam, The Netherlands, August 2003.
H. Jansson and H. Hjalmarsson. Input design via LMIs admitting frequency-wise model specifications in confidence regions. IEEE Transactions on Automatic Control, 50(10):1534–1549, October 2005.
K. Lindqvist and H. Hjalmarsson. Identification for control: adaptive input design using convex optimization. In Proc. 40th IEEE Conference on Decision and Control, volume 5, pages 4326–4331, 2001.
L. Ljung. System Identification: Theory for the User. Prentice Hall, Upper Saddle River, NJ, USA, 1999.
L. Ljung. System Identification Toolbox - for use with MATLAB, User's Guide. The MathWorks, 4th edition, 2000.
L. Ljung. Perspectives on system identification. Annual Reviews in Control, 34(1):1–12, 2010.
B. Ninness, A. Wills, and S. Gibson. The University of Newcastle Identification Toolbox (UNIT). In IFAC World Congress, pages 1–6, July 2005.
B. Riedle, L. Praly, and P. Kokotovic. Examination of the SPR condition in output error parameter estimation. Automatica, 22(4):495–498, 1986.
T. Söderström. On the uniqueness of maximum likelihood identification. Automatica, 11(2):193–197, 1975.
Y. Zou and W. P. Heath. Conditions for attaining global minimum in maximum likelihood system identification. In Proc. 15th IFAC Symposium on System Identification, pages 1110–1115, Saint-Malo, France, July 2009.

Appendix A

Proof of Theorem 2: First note that
∇G(e^{jω}, θ) = B(e^{jω})/(1 + F^T(e^{jω})θ) − B^T(e^{jω})θ F(e^{jω})/(1 + F^T(e^{jω})θ)²

and

∇H^{-1}(e^{jω}, θ) = D(e^{jω})/(1 + C^T(e^{jω})θ) − (1 + D^T(e^{jω})θ)C(e^{jω})/(1 + C^T(e^{jω})θ)².

So, we can write

∇J(θ) = (1/π) ∫_{−π}^{π} Φ_u(ω)|L(e^{jω})|² ℜ{ [ (1 + D^T(e^{jω})θ)/(1 + C^T(e^{jω})θ) (G_0(e^{jω}) − B^T(e^{jω})θ/(1 + F^T(e^{jω})θ)) ]* [ ( D(e^{jω})/(1 + C^T(e^{jω})θ) − (1 + D^T(e^{jω})θ)C(e^{jω})/(1 + C^T(e^{jω})θ)² ) (G_0(e^{jω}) − B^T(e^{jω})θ/(1 + F^T(e^{jω})θ)) − (1 + D^T(e^{jω})θ)/(1 + C^T(e^{jω})θ) ( B(e^{jω})/(1 + F^T(e^{jω})θ) − B^T(e^{jω})θ F(e^{jω})/(1 + F^T(e^{jω})θ)² ) ] } + σ_e²|L(e^{jω})|² ℜ{ [ H_0(e^{jω})(H^{-1}(e^{jω}, θ) − H_0^{-1}(e^{jω})) ]* H_0(e^{jω}) ( D(e^{jω})/(1 + C^T(e^{jω})θ) − (1 + D^T(e^{jω})θ)C(e^{jω})/(1 + C^T(e^{jω})θ)² ) } dω.

Using the assumption S ∈ M we can write

(θ − θ_0)^T ∇J(θ) = (1/π) ∫_{−π}^{π} [ Φ_u(ω) |L(e^{jω})H^{-1}(e^{jω}, θ)(G_0(e^{jω}) − G(e^{jω}, θ))|² ℜ{ (1 + C^T(e^{jω})θ_0)/(1 + C^T(e^{jω})θ) + (1 + F^T(e^{jω})θ_0)/(1 + F^T(e^{jω})θ) − (1 + D^T(e^{jω})θ_0)/(1 + D^T(e^{jω})θ) } + σ_e² |L(e^{jω})H_0(e^{jω})(H^{-1}(e^{jω}, θ) − H_0^{-1}(e^{jω}))|² ℜ{ (1 + C^T(e^{jω})θ_0)/(1 + C^T(e^{jω})θ) } ] dω.

Using the condition of the theorem we ensure that

(θ − θ_0)^T ∇J(θ) > 0,  ∀θ ∈ Λ, θ ≠ θ_0.