16th IFAC Symposium on System Identification, The International Federation of Automatic Control, Brussels, Belgium, July 11-13, 2012
On the convergence of the Prediction Error Method to its global minimum ⋆

Diego Eckhard ∗, Alexandre S. Bazanella ∗, Cristian R. Rojas ∗∗, Håkan Hjalmarsson ∗∗

∗ Department of Electrical Engineering, Universidade Federal do Rio Grande do Sul, Brazil (e-mail: diegoeck, [email protected])
∗∗ School of Electrical Engineering, KTH Royal Institute of Technology, SE-100 44 Stockholm, Sweden (e-mail: crro, [email protected])
Abstract: The Prediction Error Method (PEM) is related to an optimization problem built on input/output data collected from the system to be identified. It is often hard to find the global solution of this optimization problem because the corresponding objective function presents local minima and/or the search space is constrained to a nonconvex set. The existence of local minima, and hence the difficulty in solving the optimization, depends mainly on the experimental conditions, more specifically on the spectrum of the input/output data collected from the system. It is therefore possible to avoid the existence of local minima by properly choosing the spectrum of the input; in this paper we show how to perform this choice. We present sufficient conditions for the convergence of PEM to the global minimum and from these conditions we derive two approaches to avoid the existence of nonglobal minima. We present the application of one of these two approaches to a case study where standard identification toolboxes tend to get trapped in nonglobal minima.

Keywords: Identification; Convergence; Gradient methods

1. INTRODUCTION

The Prediction Error Method (PEM) for parameter identification uses input-output data collected from the process to form a cost function; the parameters are then estimated as the solution of the optimization of this cost function. Under mild assumptions, the global minimum of the cost function is a consistent estimate of the model's parameters and the asymptotic variance equals the limit of the Cramér-Rao bound. Therefore, identification by means of the PEM provides a consistent and otherwise statistically appealing estimate of the system's parameters and transfer function, provided that the global minimum of the cost function is achieved by the optimization procedure [Ljung, 1999].

One difficulty in applying the PEM is that in many cases achieving the global minimum may prove difficult [Ljung, 2010], for two main reasons: the cost function is usually not convex, and the problem is constrained to a nonconvex set - namely, the set of parameters which yield stable predictors. Currently adopted solutions to this problem consist mainly in searching for good initial conditions to initialize the optimization, which is performed with some standard algorithm - steepest descent, Newton-Raphson, Levenberg-Marquardt, and the like. Although there seem to be no firmly established guarantees that these solutions yield the global minimum, they have been successfully applied for many years. They achieve the global minimum of the objective function in most cases, but failure to do so is not such an uncommon occurrence either. As the model order becomes larger, the tendency to get trapped in local minima, thus providing a useless model, seems to grow, as illustrated by the case study presented in this paper.
A sufficient condition for the convergence of PEM to the global minimum when the model structure is ARMAX has been presented in [Goodwin et al., 2003]. Conditions for convergence to the global minimum were also obtained when the structure of the model is Output-Error [Zou and Heath, 2009, Söderström, 1975, Eckhard and Bazanella, 2011a] or ARMA [Åström and Söderström, 1974]. These results are restricted to the specific model structures and, more importantly, the conditions, although theoretically enlightening, are hardly enforceable a priori in a practical setting.

In this work we present sufficient conditions for the convergence of PEM to the global minimum that can be embedded in the experiment design framework, so that the resulting cost function is guaranteed not to have local minima. In a sense, our sufficient conditions extend the results presented in [Goodwin et al., 2003, Zou and Heath, 2009] to a general model structure. On the other hand, we present different ways to ensure a priori that the conditions are satisfied by properly choosing the input spectrum and/or manipulating the data. We show how the user can design both the experiment used to collect the data and the iterative algorithm to ensure that the global minimum is achieved.

The paper is organized as follows. Section 2 presents basic definitions and the problem formulation, and in Section 3 the steepest descent algorithm is introduced. In Section 4 sufficient conditions for the convergence of the steepest descent algorithm are presented. Section 5 presents two approaches to ensure that the convergence conditions are satisfied. Section 6 presents a case study, and concluding remarks are given in Section 7.
⋆ This work was supported in part by CAPES and CNPq/Brasil.
2. PRELIMINARIES
Consider the identification of a linear time-invariant discrete-time single-input single-output "real" system

S : y(t) = G_0(z)u(t) + v(t),  v(t) = H_0(z)e(t).  (1)

In (1), z is the forward-shift operator, G_0(z) is the process' transfer function, H_0(z) is the noise's transfer function, u(t) is the input signal, y(t) is the output signal and e(t) is a zero-mean white noise sequence with variance σ_e². The transfer functions G_0(z) and H_0(z) are rational and proper. To be precise, we shall define S = [G_0(z) H_0(z)].

The input and output signals are filtered by a linear filter with transfer function L(z):

y_L(t) = L(z)y(t),  u_L(t) = L(z)u(t)  (2)

and a data set Z^N = {u_L(1), y_L(1), ..., u_L(N), y_L(N)} is collected. These data are used to estimate the parameters θ ∈ R^p of a model of the following form:

y(t, θ) = G(z, θ)u(t) + H(z, θ)e(t)  (3)

where

G(z, θ) = B^T(z)θ / (1 + F^T(z)θ),  H(z, θ) = (1 + C^T(z)θ) / (1 + D^T(z)θ).  (4)

The vectors B(z), C(z), D(z), F(z) ∈ R^n are composed of fixed transfer functions. For any given θ, M(θ) = [G(z, θ) H(z, θ)] is called a model, and the collection of all models that can be thus generated is called the model set M:

M = {M(θ) : θ ∈ D}.  (5)

The true system is said to belong to this model set, S ∈ M, if there exists θ_0 ∈ D such that M(θ_0) = S. We assume in this paper that S ∈ M. Let us define Γ ⊂ D as the set of all θ for which the predictor transfer functions H^{-1}(z, θ)G(z, θ) and H^{-1}(z, θ) are stable.

In a prediction error identification framework, a model uniquely defines the one-step-ahead predictor of the output given all input/output data up to the previous time instant. With the proposed model structure we have the following one-step-ahead predictor [Ljung, 1999]:

ŷ_L(t, θ) = H^{-1}(z, θ)G(z, θ)u_L(t) + [1 − H^{-1}(z, θ)]y_L(t).  (6)

The one-step-ahead prediction error is defined as

ε(t, θ) ≜ y_L(t) − ŷ_L(t, θ) = −H^{-1}(z, θ)(G(z, θ) − G_0(z))L(z)u(t) + H^{-1}(z, θ)H_0(z)L(z)e(t)  (7)

and the PEM consists in finding the parameter θ which minimizes the following cost function: J(θ) = E[(ε(t, θ))²]. Assuming that the data are collected in open loop, using Parseval's Theorem the cost function can be rewritten as

J(θ) = (1/2π) ∫_{−π}^{π} Φ_u(ω) |L(e^{jω})H^{-1}(e^{jω}, θ)(G_0(e^{jω}) − G(e^{jω}, θ))|² dω + (1/2π) ∫_{−π}^{π} σ_e² |L(e^{jω})H^{-1}(e^{jω}, θ)H_0(e^{jω})|² dω  (8)

for all θ ∈ Γ, where Φ_u(ω) is the spectrum of the input signal [Ljung, 1999].

The gradient of the cost function with respect to θ is:

∇J(θ) = (1/2π) ∫_{−π}^{π} 2ℜ{ [H^{-1}(e^{jω}, θ)(G_0(e^{jω}) − G(e^{jω}, θ))]* ∇[H^{-1}(e^{jω}, θ)(G_0(e^{jω}) − G(e^{jω}, θ))] } |L(e^{jω})|² Φ_u(ω) dω + (1/2π) ∫_{−π}^{π} 2ℜ{ [H_0(e^{jω})(H^{-1}(e^{jω}, θ) − H_0^{-1}(e^{jω}))]* H_0(e^{jω})∇H^{-1}(e^{jω}, θ) } |L(e^{jω})|² σ_e² dω  (9)

where A*(e^{jω}) = A^T(e^{−jω}).

3. THE OPTIMIZATION PROBLEM

The Prediction Error Method consists in solving the following optimization problem:

θ̂ = arg min_{θ ∈ Γ} J(θ)

where Γ = {θ : (6) is BIBO-stable}. Generally, it is not easy to solve this problem, because neither the cost function J(θ) nor the search space Γ is convex. It is common to use iterative algorithms based on the gradient of the cost function to attempt to achieve the global minimum of the criterion. The algorithm is described by

θ_{i+1} = θ_i − γ_i ∇J(θ_i)  (10)

where γ_i > 0 is the step size of the algorithm. More general optimization algorithms can be treated in the same way; in this paper we restrict ourselves to the steepest descent algorithm (10) to save space.

We would like such an optimization algorithm to converge to the global minimum of the criterion for the largest possible set of initial conditions. A set of initial conditions for which the algorithm converges to the global minimum is called a domain of attraction (DOA) of the algorithm.

Definition 1. Let θ_0 be the global minimum of the function J(θ) : R^p → R₊. A set Ω ⊂ R^p is a domain of attraction of the algorithm (10) for the function J(θ) if lim_{i→∞} θ_i = θ_0 for every initial condition in Ω.

The algorithm must be initialized within a DOA of the global minimum, and the development of identification procedures and software mostly turns around the choice of "good" initial conditions. Several choices of initial conditions are proposed in the literature. The toolbox ident [Ljung, 2000] for Matlab uses initial conditions based on instrumental variables, while the toolbox unit [Ninness et al., 2005] applies the Steiglitz-McBride method to this end. New developments in existing algorithms tend to follow this line of thought, and proposals for improvements usually consist in new ways of finding better initializations.

In this paper we approach the problem from the other end: we try to make the DOA as large as possible. In this approach a fundamental concept is that of a Candidate Domain of Attraction (CDOA) for the function J(θ) [Bazanella et al., 2012], also known as a Decreasing Euclidean Parameter Error Norm (DEPEN) region [Goodwin et al., 2003].

Definition 2. Let θ_0 be the global minimum of the function J(θ). A set Λ is a Candidate Domain of Attraction for the function J(θ) if

(θ − θ_0)^T ∇J(θ) > 0  ∀θ ∈ Λ, θ ≠ θ_0.  (11)
This set is called a Candidate Domain of Attraction because it can be made a DOA by a proper choice of the step size sequence γ_i. The issue of actually choosing this sequence can then be treated separately, and has been treated thoroughly in [Eckhard and Bazanella, 2011b, 2012].

Such a set has a few properties worth noting. Probably the most relevant property is that inside this set the gradient of the cost function ∇J(θ) is zero only at the global minimum - there are no local minima or maxima inside it. This means that the only point inside the set Λ to which the algorithm can converge is the global minimum θ_0. Another interesting property is that the angle between ∇J(θ) and the vector (θ − θ_0) is smaller than π/2 rad for all θ inside the set. From this property, if the step size of the algorithm is small enough then at each iteration the algorithm moves closer to the global minimum.

The next lemma establishes a relation between the domain of attraction and the candidate domain of attraction.

Lemma 1. [Bazanella et al., 2008]. Let θ_0 be the global minimum of J(θ) and define a set B(θ_0) = {θ : (θ − θ_0)^T(θ − θ_0) < α}. If B(θ_0) ⊂ Γ is a candidate domain of attraction then there is a sequence γ_i, i = 1, ..., ∞ such that B(θ_0) is a DOA of the algorithm (10) for J(θ).

The above lemma shows that for all initial conditions inside the candidate domain of attraction B(θ_0) there is a step size sequence γ_i which ensures the convergence of the algorithm to the global minimum θ_0. Actual convergence also involves the appropriate choice of the sequence γ_i, an issue which we do not address in this paper.

In the next section we will explore the condition (11) using properties of the cost function's gradient. We will present sufficient conditions which ensure that a set is a candidate domain of attraction and reveal some techniques used to ensure that these conditions be satisfied.
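Before moving on, the following sketch illustrates what the iteration (10) looks like in practice for a simple Output-Error model, for which H(z, θ) = 1 and, with L(z) = 1, the prediction error (7) reduces to ε(t, θ) = y(t) − G(z, θ)u(t). The second-order example system, the fixed step size (standing in for the sequence γ_i), and the finite-difference approximation of the gradient (9) are all illustrative assumptions, not prescriptions of this paper.

```python
import numpy as np
from scipy.signal import lfilter

def prediction_error(theta, u, y):
    """Prediction error (7) for a 2nd-order Output-Error model:
    G(z,theta) = (b1*z^-1 + b2*z^-2)/(1 + f1*z^-1 + f2*z^-2), H = 1, L = 1."""
    b1, b2, f1, f2 = theta
    return y - lfilter([0.0, b1, b2], [1.0, f1, f2], u)

def J(theta, u, y):
    """Sample estimate of the cost J(theta) = E[eps(t,theta)^2]."""
    return np.mean(prediction_error(theta, u, y) ** 2)

def grad_J(theta, u, y, h=1e-6):
    """Central finite-difference approximation of the gradient (9)."""
    g = np.zeros_like(theta)
    for i in range(theta.size):
        dp, dm = theta.copy(), theta.copy()
        dp[i] += h
        dm[i] -= h
        g[i] = (J(dp, u, y) - J(dm, u, y)) / (2.0 * h)
    return g

# Hypothetical 2nd-order "true" system (not the paper's example) and one data set.
rng = np.random.default_rng(0)
N = 1000
u = rng.standard_normal(N)                       # white input, unit variance
y = lfilter([0.0, 0.5, -0.2], [1.0, -1.2, 0.5], u) + 0.1 * rng.standard_normal(N)

# Steepest descent iteration (10) with a fixed step size; the sequence gamma_i
# would normally be chosen carefully, cf. [Eckhard and Bazanella, 2011b].
theta = np.array([0.1, 0.0, -0.5, 0.1])          # initial condition, assumed in a DOA
gamma = 0.1
for _ in range(500):
    theta = theta - gamma * grad_J(theta, u, y)
print(theta)  # should approach [0.5, -0.2, -1.2, 0.5] if the iterates stay in a DOA
```

If the initial condition lies outside every DOA, or the step size is too aggressive, this loop may converge to a nonglobal minimum or diverge, which is precisely the phenomenon the rest of the paper addresses.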
4. CONVERGENCE CONDITIONS

The candidate domain of attraction is an important tool to ensure the convergence of the algorithm. If the initial condition is inside this set then it is easy to devise an algorithm that will converge to the global minimum of the cost function. It is highly desirable to understand which factors are related to the size of this set, because then it is possible to design these factors in order to ensure that the initial condition lies inside the set. The next theorem explores the condition (11) using (9).

Theorem 2. Consider that the model has the structure (4) and that S ∈ M. The set Λ is a candidate DOA if

(1/π) ∫_{−π}^{π} [ Φ_u(ω)|L(e^{jω})|² |H^{-1}(e^{jω}, θ)(G_0(e^{jω}) − G(e^{jω}, θ))|² ℜ{ (1 + C^T(e^{jω})θ_0)/(1 + C^T(e^{jω})θ) + (1 + F^T(e^{jω})θ_0)/(1 + F^T(e^{jω})θ) − (1 + D^T(e^{jω})θ_0)/(1 + D^T(e^{jω})θ) } + σ_e²|L(e^{jω})|² |H_0(e^{jω})(H^{-1}(e^{jω}, θ) − H_0^{-1}(e^{jω}))|² ℜ{ (1 + C^T(e^{jω})θ_0)/(1 + C^T(e^{jω})θ) } ] dω > 0  ∀θ ∈ Λ, θ ≠ θ_0.  (12)

Proof: In the appendix.

Theorem 2 presents sufficient conditions which ensure that a set Λ is a candidate domain of attraction. This theorem extends the results presented in [Goodwin et al., 2003, Zou and Heath, 2009] to a more general structure. The condition (12) expresses the relation between the candidate domain of attraction and the several factors that are involved in the identification: the transfer functions of the true system G_0(z) and H_0(z), the model structure G(z, θ) and H(z, θ), the input spectrum Φ_u(ω), the variance of the noise σ_e² and the filter L(z). The factors the user can design, in order to shape the candidate domain of attraction, are the input spectrum Φ_u(ω) and the filter L(z). The next section will present the design of the input spectrum and the filter in order to enlarge the candidate domain of attraction. But before that, let us explore a little more the result in Theorem 2.

Corollary 3. Consider that the model has the structure

G(z, η) = B^T(z)η / (1 + F^T(z)η),  H(z, ξ) = (1 + C^T(z)ξ) / (1 + D^T(z)ξ),  (13)

that θ^T = [η ξ]^T, θ_0^T = [η_0 ξ_0]^T and that S ∈ M. The set Λ is a candidate DOA if

(1/π) ∫_{−π}^{π} Φ_u(ω)|L(e^{jω})|² |H^{-1}(e^{jω}, ξ)(G_0(e^{jω}) − G(e^{jω}, η))|² ℜ{ (1 + F^T(e^{jω})η_0)/(1 + F^T(e^{jω})η) } dω + (1/π) ∫_{−π}^{π} σ_e² |L(e^{jω})H_0(e^{jω})(H^{-1}(e^{jω}, ξ) − H_0^{-1}(e^{jω}))|² ℜ{ (1 + C^T(e^{jω})ξ_0)/(1 + C^T(e^{jω})ξ) } dω > 0  ∀θ ∈ Λ, θ ≠ θ_0.  (14)

Proof: The proof of this corollary is similar to the proof of Theorem 2 and is omitted for reasons of space.

Corollary 3 specializes Theorem 2 to the case where the model structure has independent parameters. The most common structures which share this property are Box-Jenkins and Output-Error, but they are not the only ones.

Remark 1. If the model has an ARMAX (Auto-Regressive Moving Average with eXogenous input) structure, so that F(e^{jω}) = D(e^{jω}), the expression (12) is simplified to

(1/π) ∫_{−π}^{π} [ Φ_u(ω) |L(e^{jω})H^{-1}(e^{jω}, θ)(G_0(e^{jω}) − G(e^{jω}, θ))|² + σ_e² |L(e^{jω})H_0(e^{jω})(H^{-1}(e^{jω}, θ) − H_0^{-1}(e^{jω}))|² ] ℜ{ (1 + C^T(e^{jω})θ_0)/(1 + C^T(e^{jω})θ) } dω.  (15)

All the previous results show that a set is a candidate domain of attraction if a specific positivity condition is satisfied. Each of these conditions is presented as an integral whose integrand is composed of several terms, all of which are non-negative, with the exception of the real part of some transfer function. Note that if this real part were positive, the integrals would also be positive and the condition would be satisfied. The convergence of general algorithms is frequently studied by analyzing SPR (Strictly Positive Real) conditions such as the ones presented here. The idea is not new, and it has already been used in the context of recursive system identification, e.g. [Riedle et al., 1986].

5. HOW TO ENFORCE THE CONVERGENCE CONDITIONS
We now present two different ways of applying the convergence conditions presented in the previous section in order to obtain large enough candidate domains of attraction. The first one consists in embedding the convergence conditions into the experiment design framework, thus obtaining an input that will result in the desired CDOA when applied to the system. The second one consists in manipulating the data already obtained from a given experiment by means of a sequence of filters.

5.1 Experiment Design

Experiment Design consists in choosing the input spectrum to be applied in the identification experiment. The least costly experiment design is generally posed as an optimization problem [Lindqvist and Hjalmarsson, 2001]
min_{Φ_u(ω)}  (energy of the input signal)
s.t.  constraints,

where several constraints may apply. It is usual to include constraints related to the estimation error, the frequency distribution of the input signal and the application of the model. The optimization problem as posed is not convex and is infinite-dimensional [Jansson and Hjalmarsson, 2005]. However, it is possible to parametrize the input spectrum and to describe the constraints such that the problem becomes convex and finite-dimensional. The input spectrum is parametrized as

Φ_u(ω) = Σ_{k=1}^{M} c_k β_k(e^{jω})

where β_k(e^{jω}), k = 1, ..., M are basis functions and c_k, k = 1, ..., M are the parameters to be optimized.

Condition (12) can be used as an additional constraint in the experiment design. The user chooses a set Λ and the experiment design will select the values c_k which satisfy (12), thus ensuring that Λ will be a CDOA for the PEM. However, this condition is not convex, and some approximation is necessary if we want to use it as a constraint in the experiment design. Note that for any fixed θ the condition (12) is convex; the problem becomes nonconvex because we want to ensure the condition for all θ ∈ Λ, θ ≠ θ_0. One way to approach this issue is the Scenario Approach [Campi et al., 2009]. This technique proposes that Q randomly generated values of θ ∈ Λ be selected, and that the Q respective conditions (12) with fixed θ be used in the experiment design instead of the original condition. The number Q should be very large, such that this set of points almost represents the whole set Λ. The article [Campi et al., 2009] describes precisely this random approach, and gives the probability that the set of conditions represents the original condition (12) as a function of the number Q of conditions. So, condition (12) is replaced by

φ(θ_i) + Σ_{k=1}^{M} c_k ψ_k(θ_i) > 0,  i = 1, ..., Q  (16)

where

ψ_k(θ) = (1/π) ∫_{−π}^{π} β_k |L H^{-1}(θ)(G_0 − G(θ))|² ℜ{ (1 + C^Tθ_0)/(1 + C^Tθ) + (1 + F^Tθ_0)/(1 + F^Tθ) − (1 + D^Tθ_0)/(1 + D^Tθ) } dω,

φ(θ) = (1/π) ∫_{−π}^{π} σ_e² |L H_0 (H^{-1}(θ) − H_0^{-1})|² ℜ{ (1 + C^Tθ_0)/(1 + C^Tθ) } dω

and the θ_i are selected randomly such that θ_i ∈ Λ. Note that (16) is a convex condition which can be used as a constraint in the experiment design formulation.
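As an illustration of how the scenario constraints (16) might be assembled numerically, consider the following sketch. It assumes a first-order Output-Error problem, so that C = D = 0, φ(θ) = 0 and the ℜ{·} factor in ψ_k reduces to (1 + F^Tθ_0)/(1 + F^Tθ); it further assumes L(z) = 1, indicator-function basis functions β_k, a box-shaped candidate set Λ, Riemann-sum integration on a frequency grid, and scipy.optimize.linprog as the convex solver. All of these choices are assumptions made for the example, and the strict inequality in (16) is imposed through a small margin.

```python
import numpy as np
from scipy.optimize import linprog

# Frequency grid and basis functions beta_k (assumed: indicator functions of
# M frequency bands, a common parametrization of the input spectrum).
w = np.linspace(-np.pi, np.pi, 4001)
dw = w[1] - w[0]
M = 8
edges = np.linspace(0.0, np.pi, M + 1)
beta = [((np.abs(w) >= edges[k]) & (np.abs(w) < edges[k + 1])).astype(float)
        for k in range(M)]

# Hypothetical first-order OE problem: G(z,theta) = b*z^-1/(1 + f*z^-1), H = L = 1.
b0, f0 = 1.0, -0.9          # "true" parameters theta_0 (assumed)
e_mjw = np.exp(-1j * w)     # e^{-j omega}

def psi(k, theta):
    """psi_k(theta): Riemann-sum approximation of the integral defining psi_k,
    with the Re{...} factor reduced to (1 + f0 e^-jw)/(1 + f e^-jw)."""
    b, f = theta
    G0 = b0 * e_mjw / (1.0 + f0 * e_mjw)
    G = b * e_mjw / (1.0 + f * e_mjw)
    K = (1.0 + f0 * e_mjw) / (1.0 + f * e_mjw)
    return np.sum(beta[k] * np.abs(G0 - G) ** 2 * K.real) * dw / np.pi

# Scenario approach: Q random parameter values drawn from a chosen set Lambda.
rng = np.random.default_rng(1)
Q = 200
thetas = np.column_stack([b0 + rng.uniform(-0.8, 0.8, Q),
                          np.clip(f0 + rng.uniform(-0.5, 0.5, Q), -0.99, 0.0)])

# Minimize the input energy sum_k c_k*(1/2pi)*int(beta_k)dw subject to the Q
# scenario constraints sum_k c_k psi_k(theta_i) >= margin, c_k >= 0; the small
# margin stands in for the strict inequality in (16).
margin = 1e-3
A_ub = -np.array([[psi(k, th) for k in range(M)] for th in thetas])
b_ub = -margin * np.ones(Q)
energy = np.array([np.sum(bk) * dw / (2.0 * np.pi) for bk in beta])
res = linprog(c=energy, A_ub=A_ub, b_ub=b_ub, bounds=[(0.0, None)] * M)
print(res.x)    # spectrum coefficients c_k enforcing the CDOA condition on Lambda
```

For richer model structures the only change is the integrand of ψ_k (and a nonzero φ), so the same linear program applies; the number of scenarios Q controls the probabilistic guarantee as quantified in [Campi et al., 2009].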
5.2 The filter L(z)

The design of the filter L(z) through which the data are filtered can also be used in shaping CDOAs. In order to save space, we will present the design of the filter considering that the model has an ARMAX structure, as described in Remark 1, but the results can be easily extended to other structures. Notice that condition (15) is an integral whose integrand is composed of several terms, all of which are non-negative, with the exception of the term

ℜ{K(e^{jω}, θ)} ≜ ℜ{ (1 + C^T(e^{jω})θ_0)/(1 + C^T(e^{jω})θ) }.

Thus, whether or not a set is a candidate domain of attraction depends on the function K(e^{jω}, θ). This transfer function can be rewritten as

K(e^{jω}, θ) = [(1 + p_{10}e^{−jω})(1 + p_{20}e^{−jω}) ⋯ (1 + p_{n0}e^{−jω})] / [(1 + p_1 e^{−jω})(1 + p_2 e^{−jω}) ⋯ (1 + p_n e^{−jω})]

where the p_i are the n zeros of the polynomial (1 + C^T(z)θ) and the p_{i0} are the n zeros of the polynomial (1 + C^T(z)θ_0). The response at frequency ω = 0 is

K(e^{j0}, θ) = [(1 + p_{10})(1 + p_{20}) ⋯ (1 + p_{n0})] / [(1 + p_1)(1 + p_2) ⋯ (1 + p_n)]

and the response at frequency ω = π is

K(e^{jπ}, θ) = [(1 − p_{10})(1 − p_{20}) ⋯ (1 − p_{n0})] / [(1 − p_1)(1 − p_2) ⋯ (1 − p_n)].

Therefore, if all poles p_i and zeros p_{i0} of the transfer function K(e^{jω}, θ) are inside the unit circle, both K(e^{j0}, θ) and K(e^{jπ}, θ) are real and positive. Now we can obtain the following lemma.

Lemma 4. For all θ ∈ Γ, there exist ω_l, ω_h such that

ℜ{K(e^{jω}, θ)} > 0 for 0 ≤ ω ≤ ω_l,  ℜ{K(e^{jω}, θ)} > 0 for ω_h ≤ ω ≤ π.  (17)

This result follows from the continuity of the function K(e^{jω}, θ).

According to the previous result, the real part of the transfer function K(e^{jω}, θ) is positive at sufficiently low frequencies and at sufficiently high frequencies. Hence, if the filter L(z) has high gain at the low and high frequencies and low gain at the intermediate frequencies, then (15) is positive. This property does not depend on the parameters of the true system - G_0(z), H_0(z) and σ_e - even though the exact values of ω_l and ω_h do. The same result applies to condition (14), so it extends to the classical model classes: BJ, OE, ARMAX.

This result shows that if the filter L(z) is designed to have high gain at the high and low frequencies, then the conditions (15) or (14) are satisfied for all θ ∈ Γ. The result is surprisingly simple, but of course it does not come for free. By filtering the signals measured from the system, their total energy is reduced, thus increasing the variance of the resulting estimate. This variance increase tends to be large, since the frequencies that have been removed are the intermediate ones, which typically carry a significant part of the information about the system's behavior. To overcome this problem, we suggest the use of iterative filtering, as described in the following algorithm:

(1) Collect a set of input-output data (y(t) and u(t)) from the process and set k = 0;
(2) Compute an initial estimate θ^0 using instrumental variables or the Steiglitz-McBride method;
(3) Choose a low-pass filter L_k(z) with a very low passband;
(4) Compute u_L(t) = L_k(z)u(t) and y_L(t) = L_k(z)y(t);
(5) Perform the identification with the algorithm (10), the data u_L(t) and y_L(t) and the initial condition θ^k. Set θ^{k+1} as the solution of the identification procedure;
(6) Choose a low-pass filter L_{k+1}(z) with a passband larger than that of L_k(z);
(7) Set k = k + 1 and go to step 4.

After some iterations, the filter L_k(z) can be set to L_k(z) = 1, meaning that the signals are no longer filtered and that we are minimizing the desired original cost function J(θ). The main idea of this algorithm is that if L_k(z) is similar to L_{k+1}(z) then the global minima of the respective cost functions are also similar, so the minimizer obtained at stage k is a good initial condition for the optimization with data filtered by L_{k+1}(z). This algorithm converges to the global minimum of J(θ) if the consecutive filters L_k(z) are close enough; a sketch of this loop in code is given below.
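The loop below is a minimal sketch of steps (1)-(7), with two stand-ins that are our assumptions rather than the paper's prescriptions: the per-stage identification uses scipy.optimize.least_squares on the prediction errors instead of the plain steepest descent iteration (10), and the filters L_k(z) are Butterworth low-pass filters with growing cutoff. The initial estimate θ^0 from step (2) is assumed to be supplied by the caller, e.g. from an instrumental-variable fit.

```python
import numpy as np
from scipy.signal import butter, lfilter
from scipy.optimize import least_squares

def oe_residuals(theta, u, y):
    """Prediction errors of a hypothetical 2nd-order Output-Error model (H(z) = 1)."""
    b1, b2, f1, f2 = theta
    return y - lfilter([0.0, b1, b2], [1.0, f1, f2], u)

def identify(u, y, theta_init):
    """One PEM run on (possibly filtered) data, warm-started at theta_init.
    Stands in for step (5); the paper uses the gradient iteration (10) instead."""
    return least_squares(oe_residuals, theta_init, args=(u, y)).x

def iterative_filtering_pem(u, y, theta0, cutoffs=(0.4, 0.5, 0.6, 0.7, 0.8, 0.9)):
    """Steps (3)-(7): re-identify on data filtered by low-pass filters L_k(z)
    with increasing passband, warm-starting each stage at the previous solution."""
    theta = np.asarray(theta0, dtype=float)   # theta^0 from step (2), assumed given
    for wc in cutoffs:                        # growing cutoff, normalized to Nyquist
        bL, aL = butter(4, wc)                # L_k(z): 4th-order Butterworth (assumed)
        uL, yL = lfilter(bL, aL, u), lfilter(bL, aL, y)
        theta = identify(uL, yL, theta)       # step (5): theta^{k+1} from theta^k
    return identify(u, y, theta)              # final stage with L(z) = 1
```

The design choice that matters is the spacing of the cutoffs: consecutive filters must be close enough that the minimizer of one stage remains inside a DOA of the next stage's cost function.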
6. CASE STUDY

Consider the "real" system described by

G_0(z) = (−0.2073z^{−1} + 0.1815z^{−2} + 1.352z^{−3} − 3.356z^{−4} + 3.061z^{−5} − 1.045z^{−6} − 0.07957z^{−7} + 0.0944z^{−8}) / (1 − 6.549z^{−1} + 19.2z^{−2} − 32.83z^{−3} + 35.77z^{−4} − 25.39z^{−5} + 11.45z^{−6} − 3z^{−7} + 0.3491z^{−8}),

H_0(z) = 1,  σ_e² = 0.01.

This model represents a power system consisting of two synchronous generators connected to the grid, with data taken from [Anderson and Fouad, 1977]. The transfer function G_0(z) describes the relation between the field voltage applied to the first generator and the angular speed of the same generator at a given operating condition. Let us identify a model for this real system with the Output-Error structure

G(z, θ) = B^T(z)θ / (1 + F^T(z)θ),  H(z, θ) = 1

where

B(z) = [z^{−1} z^{−2} z^{−3} z^{−4} z^{−5} z^{−6} z^{−7} z^{−8} 0 0 0 0 0 0 0 0]^T,
F(z) = [0 0 0 0 0 0 0 0 z^{−1} z^{−2} z^{−3} z^{−4} z^{−5} z^{−6} z^{−7} z^{−8}]^T.

The real system belongs to the model set defined by this model structure, that is, G_0(z) = G(z, θ_0) with the "real parameter" θ_0 given by

θ_0 = [−0.2073 0.1815 1.352 −3.356 3.061 −1.045 −0.07957 0.0944 −6.549 19.2 −32.83 35.77 −25.39 11.45 −3 0.3491]^T.

In order to avoid the existence of local minima in the cost function, the second approach proposed in this paper has been applied. At each iteration the input/output data were filtered by a 20th-order FIR filter. Six iterations of the proposed algorithm were used, where the cut-off frequencies of the filters were 0.4, 0.5, 0.6, 0.7, 0.8 and 0.9, respectively.
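For concreteness, here is one way the six data filters of this case study could be realized; interpreting the quoted cut-off values as frequencies normalized to the Nyquist frequency and using a default (Hamming-windowed) FIR design are our assumptions, since the paper does not specify them.

```python
from scipy.signal import firwin, lfilter

# Six 20th-order (21-tap) FIR low-pass filters with increasing cut-off frequency,
# matching the case study; cut-offs taken as fractions of Nyquist (assumed).
cutoffs = [0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
filters = [firwin(numtaps=21, cutoff=wc) for wc in cutoffs]

def filtered_datasets(u, y):
    """Yield the filtered data (u_L, y_L) for each stage k of the algorithm of
    Subsection 5.2; each pair would be passed to one identification run."""
    for h in filters:
        yield lfilter(h, [1.0], u), lfilter(h, [1.0], y)
```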
For comparison, the parameter θ has also been estimated using the Matlab toolbox ident and the Matlab toolbox unit, besides the algorithm proposed in Subsection 5.2. In the three cases the same input signal u(t) has been applied to generate the data: a white noise sequence with unit variance (σ_u² = 1) and data length N = 1,000 samples. With this input the variance of the output signal is approximately σ_y² = 100. For each one of the three algorithms, 100 Monte-Carlo runs have been performed, thus providing 100 parameter estimates with each method.

The results obtained with the toolbox unit are summarized in Figure 1, where the ordinate axis presents the value of each one of the sixteen elements of the parameter vector θ and each abscissa corresponds to one Monte-Carlo run. The Monte-Carlo runs were re-ordered in terms of the smallest θ. It is seen that the first 45 points of the graph present values close to the real parameter value θ_0, indicating that in these 45 runs the algorithm converged to the global minimum of the cost function. But in the remaining 55 Monte-Carlo runs the values are close to another point in the parameter space, far away from the real value θ_0, indicating that the algorithm has yielded a local (nonglobal) minimum of the cost function. When the toolbox ident was used, only 17 out of the 100 Monte-Carlo runs converged to the global minimum of the cost function.

Fig. 1. Monte-Carlo runs with the toolbox unit (ordinate: elements θ_i of the estimated parameter vector; abscissa: Monte-Carlo run).

In Figure 2 we present the same graphic for the proposed algorithm, where it is seen that convergence to the global minimum has been achieved at every run.

Fig. 2. Monte-Carlo runs with the proposed algorithm (same axes as Figure 1).

In order to further illustrate the results, let us also compute, for each method, the mean value θ_m = (1/100) Σ_{i=1}^{100} θ̂_i of the model's parameters, where each θ̂_i represents the estimate obtained at the i-th Monte-Carlo run. The mean values of the model's parameters obtained with the toolbox unit, the toolbox ident and the proposed method are, respectively:
θ_m^unit = [−0.2071 −0.0360 1.3298 −1.9507 0.9517 −0.0396 −0.0378 −0.0276 −5.5050 13.3882 −18.5887 15.9136 −8.4405 2.6183 −0.4008 0.0164]^T

θ_m^ident = [−0.2065 −0.2952 0.8922 −0.5885 0.3352 −0.2073 −0.0484 0.0841 −4.2033 7.9007 −8.7065 6.4202 −3.5476 1.5685 −0.5030 0.0858]^T

θ_m^prop = [−0.2073 0.1794 1.3472 −3.3269 3.0156 −1.0138 −0.0910 0.0974 −6.5355 19.1230 −32.6493 35.5180 −25.1770 11.3441 −2.9679 0.3450]^T

Comparing these values with the real parameter θ_0, we see that the estimate obtained with the proposed method is, on average, much closer to the real parameter value.

7. CONCLUSION

This paper presented sufficient conditions for the convergence of the Prediction Error Method to the global minimum of the cost function. The spectra of the signals are key factors in these convergence conditions. We have proposed two different approaches to design the spectra and ensure that the convergence conditions are respected. The first approach is based on least costly experiment design, where convex constraints were imposed. The second approach proposes iterative filtering as a solution to the problem. A case study shows that this approach to eliminating local (nonglobal) minima from the PEM cost function can be very effective in guaranteeing convergence of the PEM to its global optimum.
REFERENCES

P. M. Anderson and A. A. Fouad. Power System Control and Stability. IEEE Press, 1977.
K. J. Åström and T. Söderström. Uniqueness of the maximum likelihood estimates of the parameters of an ARMA model. IEEE Transactions on Automatic Control, 18(6):769–773, 1974.
A. S. Bazanella, M. Gevers, L. Miskovic, and B. D. O. Anderson. Iterative minimization of H2 control performance criteria. Automatica, 44(10):2549–2559, 2008.
A. S. Bazanella, L. Campestrini, and D. Eckhard. Data-driven Controller Design: The H2 Approach. Springer, Netherlands, 2012.
M. C. Campi, S. Garatti, and M. Prandini. The scenario approach for systems and control design. Annual Reviews in Control, 33(2):149–157, 2009.
D. Eckhard and A. S. Bazanella. On the global convergence of identification of output error models. In Proc. 18th IFAC World Congress, Milan, Italy, 2011a.
D. Eckhard and A. S. Bazanella. Robust convergence of the steepest descent method for data-based control. International Journal of Systems Science, iFirst:1–7, 2011b.
D. Eckhard and A. S. Bazanella. Optimizing the convergence of data-based controller tuning. Proceedings of the Institution of Mechanical Engineers, Part I: Journal of Systems and Control Engineering, 226(4):563–574, 2012.
G. C. Goodwin, J. C. Agüero, and R. Skelton. Conditions for local convergence of maximum likelihood estimation for ARMAX models. In Proc. 13th IFAC Symposium on System Identification, pages 797–802, Rotterdam, The Netherlands, August 2003.
H. Jansson and H. Hjalmarsson. Input design via LMIs admitting frequency-wise model specifications in confidence regions. IEEE Transactions on Automatic Control, 50(10):1534–1549, October 2005.
K. Lindqvist and H. Hjalmarsson. Identification for control: adaptive input design using convex optimization. In Proc. 40th IEEE Conference on Decision and Control, volume 5, pages 4326–4331, 2001.
L. Ljung. System Identification: Theory for the User. Prentice Hall, Upper Saddle River, NJ, USA, 1999.
L. Ljung. System Identification Toolbox - for use with MATLAB, User's Guide. The MathWorks, 4th edition, 2000.
L. Ljung. Perspectives on system identification. Annual Reviews in Control, 34(1):1–12, 2010.
B. Ninness, A. Wills, and S. Gibson. The University of Newcastle Identification Toolbox (UNIT). In IFAC World Congress, pages 1–6, July 2005.
B. Riedle, L. Praly, and P. Kokotovic. Examination of the SPR condition in output error parameter estimation. Automatica, 22(4):495–498, 1986.
T. Söderström. On the uniqueness of maximum likelihood identification. Automatica, 11(2):193–197, 1975.
Y. Zou and W. P. Heath. Conditions for attaining global minimum in maximum likelihood system identification. In Proc. 15th IFAC Symposium on System Identification, pages 1110–1115, Saint-Malo, France, July 2009.

Appendix A

Proof of Theorem 2: First note that
∇G(e^{jω}, θ) = B(e^{jω})/(1 + F^T(e^{jω})θ) − B^T(e^{jω})θ F(e^{jω})/(1 + F^T(e^{jω})θ)²

and

∇H^{-1}(e^{jω}, θ) = D(e^{jω})/(1 + C^T(e^{jω})θ) − (1 + D^T(e^{jω})θ)C(e^{jω})/(1 + C^T(e^{jω})θ)².

So, we can write

∇J(θ) = (1/π) ∫_{−π}^{π} Φ_u(ω)|L(e^{jω})|² ℜ{ [ (1 + D^T(e^{jω})θ)/(1 + C^T(e^{jω})θ) (G_0(e^{jω}) − B^T(e^{jω})θ/(1 + F^T(e^{jω})θ)) ]* [ ( D(e^{jω})/(1 + C^T(e^{jω})θ) − (1 + D^T(e^{jω})θ)C(e^{jω})/(1 + C^T(e^{jω})θ)² ) (G_0(e^{jω}) − B^T(e^{jω})θ/(1 + F^T(e^{jω})θ)) − (1 + D^T(e^{jω})θ)/(1 + C^T(e^{jω})θ) ( B(e^{jω})/(1 + F^T(e^{jω})θ) − B^T(e^{jω})θ F(e^{jω})/(1 + F^T(e^{jω})θ)² ) ] } + σ_e²|L(e^{jω})|² ℜ{ [ H_0(e^{jω})(H^{-1}(e^{jω}, θ) − H_0^{-1}(e^{jω})) ]* H_0(e^{jω}) ( D(e^{jω})/(1 + C^T(e^{jω})θ) − (1 + D^T(e^{jω})θ)C(e^{jω})/(1 + C^T(e^{jω})θ)² ) } dω.

Using the assumption S ∈ M we can write

(θ − θ_0)^T ∇J(θ) = (1/π) ∫_{−π}^{π} [ Φ_u(ω) |L(e^{jω})H^{-1}(e^{jω}, θ)(G_0(e^{jω}) − G(e^{jω}, θ))|² ℜ{ (1 + C^T(e^{jω})θ_0)/(1 + C^T(e^{jω})θ) + (1 + F^T(e^{jω})θ_0)/(1 + F^T(e^{jω})θ) − (1 + D^T(e^{jω})θ_0)/(1 + D^T(e^{jω})θ) } + σ_e² |L(e^{jω})H_0(e^{jω})(H^{-1}(e^{jω}, θ) − H_0^{-1}(e^{jω}))|² ℜ{ (1 + C^T(e^{jω})θ_0)/(1 + C^T(e^{jω})θ) } ] dω.

Using the condition of the theorem we ensure that

(θ − θ_0)^T ∇J(θ) > 0,  ∀θ ∈ Λ, θ ≠ θ_0.