Signal Processing 99 (2014) 171–184
Contents lists available at ScienceDirect
Signal Processing journal homepage: www.elsevier.com/locate/sigpro
Review
Recursive hidden input estimation in nonlinear dynamic systems with varying amounts of a priori knowledge

Ulaş Güntürkün (Inverse Problems and Cognitive Systems Laboratory (IPCSL), Ozyegin University, Istanbul, Turkey), James P. Reilly, Thia Kirubarajan, Hubert deBruin (Electrical and Computer Engineering, McMaster University, Hamilton, Ontario, Canada)
Article history: Received 12 April 2013; received in revised form 24 October 2013; accepted 30 December 2013; available online 6 January 2014.

Abstract
Estimation of additive driving-forces (e.g., hidden inputs) in nonlinear dynamic systems is addressed with varying amounts of a priori knowledge on system models, exemplified by three typical scenarios: (1) there is insufficient prior knowledge to build a mathematical model of the underlying system; (2) the system is partially described by an analytic model; (3) a complete and accurate model of the underlying system is available. Three algorithms are proposed, one for each scenario, and analyzed comprehensively. The adaptive driving-force estimator (ADFE) [1,2] is used for the retrieval of driving-forces using only the system outputs in the first scenario. A variational Bayesian and a Bayesian algorithm are established for the second and the third scenarios, respectively. All three algorithms are studied in depth on a nonlinear dynamic system with equivalent computational resources, and Posterior Cramer–Rao Lower Bounds (PCRLB) are specified as performance metrics for each case. The results lead to a thorough understanding of the capabilities and limitations of the ADFE, which manifests itself as an effective technique for the estimation of rapidly varying hidden inputs unless a complete and accurate model is available. Moreover, the methods developed in this paper provide a suitable framework for the construction of new and efficient tools for various input estimation problems. In particular, the proposed algorithms constitute a readily available basis for the design of novel input residual estimators to approach the Fault Diagnosis and Isolation (FDI) problem from a new and different perspective. © 2014 Elsevier B.V. All rights reserved.
Keywords: Driving-force estimation; Input estimation; Fault diagnosis and isolation; Echo state network; Expectation-maximization; Rao-Blackwellized particle filter; Posterior Cramer–Rao lower bound
Contents

1. Introduction
   1.1. Literature review
      1.1.1. Input estimation in fully identified systems
      1.1.2. Input estimation in partially known systems
      1.1.3. Input estimation in unknown systems
   1.2. Statement of contributions
   1.3. Problem statement
2. The adaptive driving-force estimator (ADFE)
   2.1. A brief derivation
Corresponding author: U. Güntürkün. E-mail addresses: [email protected] (U. Güntürkün), [email protected] (J.P. Reilly), [email protected] (T. Kirubarajan), [email protected] (H. deBruin). URL: http://www.ipcsl.org (U. Güntürkün).
0165-1684 © 2014 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.sigpro.2013.12.032
   2.2. Remarks
3. Rao-Blackwellized particle filter with augmented state (RBPF-AR(P))
4. Expectation maximization-Kalman smoother-particle filter (EM-KS-PF)
5. Experiments
   5.1. Experimental setup
   5.2. Calculation of lower bounds
   5.3. Performance results at minimal complexity
   5.4. Effect of computational resources
   5.5. Discussion
6. Conclusions
Acknowledgments
Appendix A. The ADFE algorithm and its computational complexity
Appendix B. The RBPF-AR(P) algorithm and its computational complexity
Appendix C. The EM-KS-PF algorithm and its computational complexity
References
1. Introduction

Estimation of hidden states, retrieval of driving-forces (e.g., hidden inputs and external perturbations), or detection of hidden parameters of a dynamic system is typically an ill-posed inverse problem. The availability of sufficient prior knowledge of the underlying phenomenon can greatly facilitate solving such inverse problems: it enables one to invoke Bayesian inference to find regularized solutions by incorporating all prior knowledge of the system in a probabilistically optimal fashion. In many other cases, however (e.g., voice production, brain activity, sea dynamics, a wireless multipath link, and the evolution of seismic waves), there may be too little prior knowledge and evidence to build a specific and accurate model of the underlying system for use with Bayesian methods. The requirement in such cases is to reconstruct the quantities of interest using only a finite set of time series data, without the availability of an analytic model. As a consequence, finding regular solutions to such inverse problems is a common need in many physical, engineering, and biomedical applications.

Our interest is in the estimation of the driving-forces that perturb nonlinear dynamic systems. These perturbation signals can be viewed as unknown inputs to the unperturbed dynamics, which are hidden from the observer most of the time, if not always. A few examples of how driving-forces affect the dynamics of physical phenomena are the mobility of transmit and/or receive terminals in a wireless channel, the effect of wind or small random elements on the sea surface in the evolution of sea clutter, eye blinking during the measurement of brain activity, and the development of abrupt or incipient faults in a nonlinear control system.
The evolution of brain activity and the development of faults in nonlinear dynamic systems are good examples of how the internal complexity of some dynamic systems can make it virtually impossible to devise an accurate model of the system.

1.1. Literature review

Estimation of hidden inputs has received considerable attention in both the engineering [3–26] and physics [27–35] disciplines. In what follows, we present a brief summary of the vast amount of literature in three categories with
respect to the assumed availability of a priori knowledge of the system.

1.1.1. Input estimation in fully identified systems

Most of the engineering literature on input estimation in linear systems has focused on fault diagnosis and isolation, where abrupt or incipient faults are treated as hidden inputs and inferred via parity-check methods [3–5], or through generalizations of linear optimum Gaussian filtering (i.e., the Kalman filter) [6–8]. For the estimation of hidden inputs in fully identified nonlinear deterministic systems, a commonly employed approach is to transform the nonlinear dynamics to a new set of coordinates where linear methods can be applied [9–14]. The literature on input estimation in nonlinear stochastic systems is typically concerned with the estimation of static or slowly changing variables by augmenting the state model to incorporate hidden parameters. This allows the use of sequential Markov Chain Monte Carlo (MCMC) methods (or particle filters) that operate on an augmented system model [15–22]. In a recent work, the authors addressed the maximum a posteriori probability (MAP) estimation of the state and hidden input jointly using the Gauss–Newton method [23].

1.1.2. Input estimation in partially known systems

All the methods described in [3–23] assume that a complete and accurate analytic system model is available. However, in some cases, the unknown system can be specified only partially, and the missing parts of the model need to be inferred from data. A typical approach to the estimation of missing model parts is to parameterize the missing part and employ missing-data maximum likelihood (ML) methods. To this end, the Expectation-Maximization (EM) method has proven effective [24,36] for parameter estimation in partially known systems.
In [25,26], the authors introduce a mixture-of-Gaussians (moG) model for the unknown measurement equation in a nonlinear dynamic system, and estimate the parameters of the moG model so as to accomplish the desired state and input estimation.

1.1.3. Input estimation in unknown systems

Estimation of hidden inputs in nonlinear systems with unknown dynamics has received attention mostly from
the physics literature, where input estimation is termed "driving-force estimation" or "perturbation estimation". In such problems, the lack of sufficient prior knowledge of the system has necessitated resorting to blind approaches based on some weaker assumptions. The earliest work on the estimation of driving-forces from systems with unknown dynamics can be dated back to recurrent plots [27], followed by some refined versions [28,29], [30, Section 4.2]. In [31–35], the prediction-error idea behind the recurrent plots was solidified using nonlinear predictors such as a radial basis function (RBF) network and a feedforward multilayer perceptron (MLP). Consequently, all the contributions in [27–35] can be considered different realizations of reconstructing very slowly varying driving-forces from the prediction error. Moreover, batch processing is common to all of [27–35], where the data are presented to the processor in successive epochs to accomplish the required reconstruction performance.

In many real-world applications, however (e.g., sea clutter and wireless channels), it is essential to estimate the signal of interest in an online fashion and update these estimates as new data become available. Also, the driving signal may not change very slowly in many real-world problems. Motivated by these facts, and building on the contributions listed above, we described an adaptive sequential method for driving-force estimation in [1,2], termed the "adaptive driving-force estimator" (ADFE). It is demonstrated in [1] that the ADFE provides much better immunity against measurement noise than the offline driving-force estimation schemes presented in [31–35]. It is also shown in [1] that the ADFE is capable of reconstructing both slowly and rapidly varying forces that perturb the system in additive, multiplicative or even exponential forms, up to a similar degree of accuracy.
In [2], we illustrated the performance of the ADFE on the radar scene analysis problem for the extraction of the signature of a small random target using texture modeling of sea clutter.

1.2. Statement of contributions

In this paper, we study the hidden input estimation problem under a broader scope and develop state-of-the-art techniques tailored to three typical estimation scenarios. The contributions of this paper lie in the following aspects: First, the capabilities and shortcomings of the ADFE are investigated in depth and better understood. Second, a Bayesian and a variational Bayesian framework are constructed to address the estimation of an irregular hidden input source when the underlying system is fully or partially known, respectively. This leads to the development of two particular algorithms, namely the RBPF-AR(P) and the EM-KS-PF. In order to facilitate the selection and use of the methods on various input estimation problems, the computational complexities of all algorithms are derived in specific flop counts. Another important contribution of this paper is that the methods addressed and studied in detail herein provide an adequate basis for the design of novel and efficient tools for the fault diagnosis and isolation (FDI) problem. This follows from the fact that the discrepancy between the nominal system input and the successfully estimated
effective input source is expected to serve as an indicator of a fault, and to perform better than relying on the noisy system outputs in search of a fault.

1.3. Problem statement

In mathematical terms, we can state our problem as follows. Consider a (possibly nonlinear) dynamic system:

x_{n+1} = g(x_n, u_n) + ω_{n+1}   (1)

y_{n+1} = h(x_{n+1}) + υ_{n+1},   (2)

where the continuous and measurable mapping g: R² → R defines the state transition, and h: R → R defines the evolution of the observables, respectively. Here, x_n ∈ R denotes the state, u_n ∈ R is the unknown input signal (driving-force), y_n ∈ R denotes the observable at discrete time n ∈ [0, T−1], T ∈ N, ω_n ∈ R is the process noise, and υ_n ∈ R is the measurement noise, both with zero mean and variances σ²_ω and σ²_υ, respectively. Also, R denotes the real space. Note that (1) can be extended to cover the cases where the state is multidimensional without loss of generality.

The problem that we are posing is similar to the state estimation problem, where the requirement is to estimate the state x_n. Our problem, however, is different: we wish to estimate the driving-force u_n with varying amounts of knowledge of g(·) and h(·), given that both mappings are differentiable everywhere and that h(·) is invertible. We study the estimation of u_n in (1) under three typical observation conditions. Specifically, we raise and seek to answer the following questions: How accurately could the driving-force u_n be estimated if an analytic model of the underlying system were

1. fully available (i.e., the state-space pair as in (1) and (2), and an evolutionary model of the driving-force, are all known)?
2. partially available (i.e., only the state-space equations (1) and (2) are known)?
3. not available at all (i.e., the system cannot be described analytically, and the knowledge of (1) and (2) is not available)?

An estimator based on the Rao-Blackwellized Particle Filter (RBPF) is tailored to answer question 1, and provides a benchmark for the other techniques. Since any band-limited process can be represented as an autoregressive (AR) model of an arbitrary order P, the evolution of the driving-force can be augmented into the state of the system in a form suitable for the RBPF. The resulting algorithm is termed the RBPF-AR(P).
Question 2 is explored by interpreting the problem from a missing-data perspective and developing a variational Bayesian framework. To this end, the EM algorithm [36] is combined with a Kalman smoother (KS) and a standard particle filter (PF) [37,38] to yield the EM-KS-PF algorithm. The adaptive driving-force estimator [1,2] has been shown to be an adequate technique to address the third question given that the unknown system is governed by an
everywhere-differentiable mapping and the signal-to-noise ratio is medium to high. Herein, we test the performance of the ADFE for the estimation of an abruptly varying hidden input under the effect of undesired outliers. For the RBPF-AR(P), the EM-KS-PF and the ADFE, an equivalent amount of computational resources is made available for a fair performance assessment.

The rest of this article is organized as follows. In Section 2, the ADFE algorithm is derived with some brief remarks on the application-oriented constraints of the algorithm. In Sections 3 and 4, the RBPF-AR(P) and the EM-KS-PF algorithms are specified, respectively. Section 5 is devoted to the experiments conducted on a nonlinear dynamic system, in which the driving-force is an additive term exhibiting rapid variations. Section 6 finalizes the paper with concluding remarks and notes on the direction of future research. The pseudocodes and computational complexities of the ADFE, the RBPF-AR(P) and the EM-KS-PF algorithms are presented in the appendices.

2. The adaptive driving-force estimator (ADFE)

2.1. A brief derivation

The basic idea underlying the adaptive driving-force estimator is to predict the observables one step into the future using a bank of echo state networks (ESN) [1,2,39–41] in an online fashion, and then exploit the fact that the driving-force is not explicitly modeled by the predictor. Based on this simple idea, we relate the driving-force to the prediction error of the ESNs. A predictive model of the unknown environment can be built by transforming the evolution of the observables y_n from the unknown system in (1) and (2) into a time series model. In such a model, the requirement for knowledge of the equations in (1) and (2), and the dependence of the observables on the unknown state, are eliminated. For the derivation, let us begin by re-arranging (2), i.e.,

x_n = h^{−1}(y_n − υ_n).   (3)

Now let us substitute (3) in the state equation in (1):

x_{n+1} = g(h^{−1}(y_n − υ_n), u_n) + ω_{n+1}.   (4)

If we substitute (4) in (2), we obtain the observable as

y_{n+1} = h(g(h^{−1}(y_n − υ_n), u_n) + ω_{n+1}) + υ_{n+1}.   (5)

Let us denote the nested combination of mappings by f(α, β) ≜ h(g(h^{−1}(α), β)) with the aid of two dummy variables, α and β. Then, we can write

y_{n+1} ≈ f(y_n, u_n) + v_{n+1},   (6)

where the overall effect of the transformed and the additive noise processes is denoted by v_{n+1}. In (6), the observable y_{n+1} is expressed as a function of the previous observable y_n and the unknown driving-force u_n. Next, consider that we design an online nonlinear predictor (i.e., an ESN bank), which receives only the previous observable at its input and provides one-step approximations to the unknown system. Let z_{n+1} denote the output of the nonlinear predictor. Following from the universal approximation property of recurrent neural networks, z_{n+1} can be described as

z_{n+1} = f̃(y_n, 0),   (7)

where y_n is the input to the predictor, and z_{n+1} is obtained at its output. Note that the notation f̃(y_n, 0) implies that the driving-force is not provided to the neural network; the absence of u_n at the neural network input is represented by the 0 in the second argument. The predictor's trainable parameters are updated at each time instant upon the arrival of a new observable. The objective is then to extract the driving-force from the online prediction error e_{n+1} = y_{n+1} − z_{n+1} based on the finite-difference approximation, given that the mapping f is differentiable everywhere:

e_{n+1} = y_{n+1} − z_{n+1}   (8)

e_{n+1} = f(y_n, u_n) − f̃(y_n, 0) + v_{n+1}   (9)

e_{n+1} ≈ (∂f(y_n, u_n)/∂u_n) u_n + ε_{n+1}.   (10)

In (10), ε_{n+1} denotes the overall approximation error. The final step of the algorithm is to acquire the refined driving-force estimates using a noise smoother and a regularized adaptive filter based on (10). The resulting estimation algorithm encompasses the following three main operations:

1. Predictive modeling of the unknown system using ESNs.
2. Finite-difference approximation of the one-step prediction error as in (10).
3. Noise smoothing and regularized adaptive estimation of the hidden input from the linearized prediction error.
One iteration of the algorithm is depicted in Fig. 1. The "noise-smoothing" block in Fig. 1 is implemented as a recursive first-order low-pass filter, the purpose of which is to reduce the variance of the noise component of e_{n+1}.
Fig. 1. Block diagram of the ADFE. The task of the functional analysis block is to provide a model in which the prediction error e_{n+1} is linearly dependent on the driving-force. z^{−1} denotes the unit delay operator. "TDL" denotes a tapped-delay line of L shift registers. "LMS" stands for the least mean squares algorithm. r_n ≜ [r_n, …, r_{n−L+1}]^T.
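The smoothing-and-adaptation path of Fig. 1 can be sketched in a few lines. This is a hedged illustration, not the paper's algorithm: a generic one-step `predictor` stands in for the ESN bank, and `adfe_step`, `smooth_state`, `w`, `r_buf`, `beta`, and `mu` are illustrative names and parameters of our choosing.

```python
import numpy as np

def adfe_step(y_next, y_prev, predictor, smooth_state, w, r_buf,
              beta=0.9, mu=0.05):
    """One ADFE-style iteration (sketch): predict, difference, smooth, adapt.

    `predictor` stands in for the ESN bank; `smooth_state`, `w`, `r_buf`
    are illustrative names for the low-pass memory, LMS weights and the
    tapped-delay line (TDL) of Fig. 1.
    """
    z_next = predictor(y_prev)          # one-step prediction; u_n is unmodeled
    e_next = y_next - z_next            # prediction error, ~linear in u_n by (10)
    smooth_state = beta * smooth_state + (1.0 - beta) * e_next  # first-order LPF
    u_hat = float(w @ r_buf)            # adaptive estimate from past smoothed errors
    w = w + mu * (smooth_state - u_hat) * r_buf                 # LMS update
    r_buf = np.concatenate(([smooth_state], r_buf[:-1]))        # shift the TDL
    return u_hat, smooth_state, w, r_buf

# Toy run with a predictor that ignores the hidden input.
w, r, s = np.zeros(4), np.zeros(4), 0.0
y = np.sin(0.05 * np.arange(200))
for n in range(1, 200):
    u_hat, s, w, r = adfe_step(y[n], y[n - 1], lambda v: 0.5 * v, s, w, r)
```

The refinement performed by the actual regularized adaptive estimator in [1] is richer than this bare LMS tap, but the data flow (error, smoother, TDL, adaptation) follows the block diagram.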
The "regularized adaptive estimator" realizes an adaptive forward prediction-error filter, which adaptively tracks the desired slowly varying component of the input r_{n+1} and smoothes out the noise. The derivation of the adaptive driving-force estimator is elaborated in [1], and the particular algorithm ADFE and its computational complexity in flop counts are presented in Appendix A.

2.2. Remarks

An underlying assumption for the realization of the ADFE is the differentiability of f(·,·), which implies that the mappings g and h are differentiable too. To this end, we stress that the ADFE is not necessarily applicable to some switching dynamic systems that exhibit sharp transitions among various states, where g(·) or h(·) may be non-differentiable.

We also note that the determination of the driving-force when the system is unknown is an ill-posed problem. In our case, the resulting ambiguity is resolved by forcing a solution for which the corresponding partial derivative term in (10) is slowly varying. The slow-variation assumption is justified by virtue of experiments in [42] using a variety of datasets that were studied in depth in [43]. Our results suggest that this assumption is far from stringent. Nevertheless, this condition could still be violated in cases where the underlying dynamics of the nonlinear system exhibit an irregular behavior.

Another observation from (5) is that the noise processes can be greatly amplified by the nonlinear functions g(·), h(·), or h^{−1}(·). This may harm the measurability of f(·,·), and thus the predictive modeling of the observables may fail. To sum up, in some practical applications, the predictability of f(·,·) and the slow variation of ∂f(y_n, u_n)/∂u_n may not hold due to the effect of undesired outliers.
In order to put the algorithm to the test against these challenges, in the experiments section we will measure the algorithm's performance at varying levels of process and measurement noise that represent the undesired outliers. We will then use the Posterior Cramer–Rao Lower Bound (PCRLB) as a mathematically rigorous tool to assess the algorithm's performance under varying environmental conditions.

3. Rao-Blackwellized particle filter with augmented state (RBPF-AR(P))

In this section, we further specify the model in (1) and (2) to study the estimation of driving-forces that can be modeled as additive components in the nonlinear state equations:

x_{n+1} = g(x_n) + u_{n+1} + ω_{n+1}   (11)

y_{n+1} = h(x_{n+1}) + υ_{n+1}.   (12)
Since any band-limited discrete-time stochastic process can be described by an autoregressive (AR) model of arbitrary order P [44], a linear sub-model can be devised to describe the dynamic evolution of u_n such that

u_{n+1} = c_1 u_n + ⋯ + c_P u_{n−P+1} + ζ_{n+1},   (13)
where ζ_n is additive white Gaussian noise (AWGN) that accounts for the model input, with variance σ²_ζ. The AR(P) model in (13) can be expressed as a Markovian transition in matrix form. For this transformation, let us collect the P driving-force samples from time n to n−P+1 in a vector and define this vector as

u_n ≜ [u_n, …, u_{n−P+1}]^T ∈ R^P.   (14)

In the RBPF context, we refer to the vector u_n as the "linear state vector". Also, u_n can be regressed on its previous state by a P × P regression matrix, which holds the model coefficients {c_p | p = 1, …, P} in the following form:

B = [ c_1  c_2  ⋯  c_{P−1}  c_P ]
    [ 1    0    …  0        0   ]
    [ ⋮    ⋱    ⋱  ⋱        ⋮   ]
    [ 0    0    ⋯  1        0   ].   (15)
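Building the companion matrix B of (15) and checking that a noiseless step of the stacked recursion reproduces the scalar AR(P) model (13) takes only a few lines; the helper name and the AR(2) coefficients below are illustrative:

```python
import numpy as np

def companion(c):
    """Build the P x P regression (companion) matrix B of (15) from the
    AR coefficients c = [c_1, ..., c_P]."""
    P = len(c)
    B = np.zeros((P, P))
    B[0, :] = c                    # first row carries c_1 ... c_P
    B[1:, :-1] = np.eye(P - 1)     # subdiagonal identity shifts the stack
    return B

# A noiseless step u_{n+1} = B u_n reproduces the scalar AR(P) recursion (13):
c = np.array([0.6, 0.3])           # illustrative AR(2) coefficients
u = np.array([1.0, 0.5])           # u_n = [u_n, u_{n-1}]^T as in (14)
u_next = companion(c) @ u
```

The first entry of `u_next` equals c_1 u_n + c_2 u_{n−1}, and the second entry is the shifted copy of u_n.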
This enables us to re-arrange (11) and (12) as (16)–(18), in which both the linear and the nonlinear state variables are represented in dynamic Markovian substructures:

u_{n+1} = B u_n + ξ_{n+1}   (16)

x_{n+1} = g(x_n) + b u_n + ω_{n+1}   (17)

y_{n+1} = h(x_{n+1}) + υ_{n+1}.   (18)
The (1 × P) vector b in (17) is defined as b = [1, 0, …, 0]. The (P × 1) noise vector is given by ξ_n ≜ [ζ_n, 0, …, 0]^T.

The authors of [45–47] demonstrate that the Rao-Blackwellized particle filter is a favorable technique for the estimation of linear state variables whenever it is possible to partition the state equation into a linear and a nonlinear sub-model as in (16) and (17). The RBPF exploits the linear sub-structure by virtue of Rao-Blackwellization (or marginalization). In particular, the nonlinear state x_n is estimated with a particle filter, and the linear state u_n is then marginalized from the nonlinear state. This marginalization is operated by a Kalman filter (KF). Due to the linearity of (16) and (17) in u_n, and the Gaussianity of ξ_n and ω_n, the KF estimates of the linear states u_n are optimum. Therefore, the RBPF is an approximation to the optimum solution. This explains why the RBPF provides superior performance over the standard particle filters [45,47].

For a more formal description of the RBPF, let p(u_n, x_n | y_{1:n}) denote the joint posterior of the linear and nonlinear state, which can be written in the form

p(u_n, x_n | y_{1:n}) = p(u_n | x_n) p(x_n | y_{1:n}),   (19)

where we have made use of the fact that the measurements are independent of the linear state. Then the idea is to marginalize the posterior density of the linear state from the joint posterior in (19) as follows:

p(u_n | y_{1:n}) = ∫ p(u_n, x_n | y_{1:n}) dx_n = ∫ p(u_n | x_n) p(x_n | y_{1:n}) dx_n.   (20)

Here, p(x_n | y_{1:n}) in the second line of (20) can be approximated by a particle filter. Specifically, let p̂(x_n | y_{1:n}) denote the particle filter approximation to p(x_n | y_{1:n}), which can be
described as

p̂(x_n | y_{1:n}) = Σ_{i=1}^{M} w^{(i)} δ(x_n − x_n^{(i)}),   (21)

where M denotes the number of particles. Substituting (21) in the second line of (20), the approximate posterior density for the linear state is obtained as

p̂(u_n | y_{1:n}) = Σ_{i=1}^{M} w^{(i)} p(u_n | x_n^{(i)}).   (22)
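Taking the conditional mean of the mixture in (22) collapses the Rao-Blackwellized posterior into a weighted sum of per-particle Kalman means. A minimal numeric sketch (the weights and per-particle means below are made-up values, not output of an actual filter run):

```python
import numpy as np

def marginalized_mean(weights, kf_means):
    """Collapse the Rao-Blackwellized mixture (22) into a point estimate:
    a weight-mixture of the per-particle Kalman means E[u_n | x_n^(i)]."""
    return np.asarray(weights) @ np.asarray(kf_means)

# M = 3 particles, scalar linear state (P = 1); values are illustrative.
w = np.array([0.2, 0.5, 0.3])            # normalized importance weights
means = np.array([[1.0], [2.0], [3.0]])  # hypothetical per-particle KF means
u_est = marginalized_mean(w, means)      # 0.2*1 + 0.5*2 + 0.3*3 per component
```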
The conditional mean estimates of the linear state variable are approximated from (22) in the following form:

E[u_n | y_{1:n}] ≈ ∫ u_n p̂(u_n | y_{1:n}) du_n = Σ_{i=1}^{M} w^{(i)} E[u_n | x_n^{(i)}],   (23)

Fig. 2. Graphical representation of the EM-KS-PF method. At the beginning of each EM iteration, the particle filter generates the nonlinear state estimates x̂_1, …, x̂_T using the previously available driving-force vector estimates û_1, …, û_T. Then the EM algorithm updates the estimate of the missing parameter vector Θ using the data provided by the particle filter and generates updated driving-force estimates.
where E[·] denotes the statistical expectation operation, and E[u_n | x_n^{(i)}] is computed using a Kalman filter exploiting the linear Gaussian substructure that encompasses (16) and (17). The overall operation of the augmented RBPF technique described herein is presented as the RBPF-AR(P) algorithm in Appendix B with the associated computational complexity.

4. Expectation maximization-Kalman Smoother-particle filter (EM-KS-PF)

The Rao-Blackwellized particle filter is intended for applications where exact knowledge of the system equations (e.g., (16)–(18)) is available. However, in some applications, a model of the underlying dynamic system may be only partially available [17,24,45]. In this section, we consider such cases where the nonlinear state equation and the measurement equation are available, yet the evolution of the linear state is hidden from the observer. We propose to use the expectation-maximization (EM) algorithm [48] for such estimation scenarios, where the hidden states have to be inferred from incomplete data. The objective of the EM algorithm is to parameterize the missing information and deduce maximum likelihood (ML) estimates of those parameters. The driving-force estimator based on the EM algorithm is embodied by the following two main operations:

1. Particle filtering: the task associated with the particle filter is to estimate the nonlinear state x_n given the following pieces of information: the knowledge of (17) and (18), both of which are assumed known; the set of measurements y_{1:n}; and the estimates of u_n, which are provided by the EM algorithm.
2. EM-KS: the EM algorithm, on the other hand, produces the ML estimates of the unknown quantities in (16) exploiting: a Kalman smoother operated on the structure of (16) with unknown coefficients, and (17), which is assumed known; and the set of pseudo-measurements {m_n = x_n − g(x_{n−1})}, where the estimates of x_n are provided by the PF.
Following from this brief description, we abbreviate the resulting algorithm as the EM-KS-PF. The operation of the EM-KS-PF is illustrated in the block diagram of Fig. 2. The pseudo-measurements for the EM algorithm are defined as m_n ≜ x_n − g(x_{n−1}), such that the T × 1 vector m ≜ [m_1, …, m_T]^T represents the available data for the EM. The missing data are represented by a TP × 1 vector [u_1^T, …, u_T^T]^T. The vector of hidden parameters is defined as Θ ≜ [c_1, …, c_P, σ²_ζ]^T. Then the complete data can be defined as a T(P+1) × 1 vector d ≜ [u_1^T, …, u_T^T, m_1, …, m_T]^T, as depicted in Fig. 2.

In the E-step, the expectation of the log-likelihood of the complete data, E[log p(d|Θ) | Θ^{[k]}, m], is computed. A Kalman smoother is used for simulating the expectations in the E-step. The desired driving-force estimates (i.e., the missing data) are obtained as a by-product of the Kalman smoother. In the M-step, the missing parameter vector Θ that maximizes the expected log-likelihood is calculated. After the completion of the M-step, the E-step is reinitiated with the updated missing parameter vector and the updated available data vector m that is provided by the re-run of the particle filter. For the particle filtering operations, we use a standard particle filter where systematic resampling is used at each iteration and the importance density is set to the prior.

Now, let us elaborate on the EM method, reproducing (16) and (17), on which the EM and KS operations are performed:

u_n = B u_{n−1} + ξ_n   (24)

m_n = b u_n + ω_n.   (25)

As the unknown regression coefficients {c_p | p = 1, …, P} occupy only the first row of the matrix B, performing the M-step over the model in (24) and (25) would be wasteful of computational resources. As an alternative, we simplify (24) and (25) to obtain (26) and (27), where we deal with the 1 × P vector of unknown coefficients c ≜ [c_1, …, c_P] instead of the P × P matrix B:

u_n = c u_{n−1} + ζ_n   (26)

m_n = u_n + ω_n.   (27)
Consequently, the E-step of the EM algorithm operates on (24) and (25), whereas the M-step exploits (26) and (27).
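The companion-form structure of (24)-(25), in which the regression coefficients occupy only the first row of B, can be made concrete with a short sketch; the coefficient and noise values below are illustrative only.

```python
import numpy as np

def companion(c):
    """Companion matrix B of (24): the AR(P) coefficients occupy only the
    first row; the remaining rows shift the delayed samples down."""
    P = len(c)
    B = np.zeros((P, P))
    B[0, :] = c
    if P > 1:
        B[1:, :-1] = np.eye(P - 1)
    return B

rng = np.random.default_rng(0)
c = np.array([0.5, -0.3])    # illustrative AR(2) coefficients
B = companion(c)
b = np.array([1.0, 0.0])     # measurement row of (25): m_n = b u_n + omega_n

# Simulate (24)-(25): only the first component of xi_n is nonzero.
T, s2_zeta, s2_omega = 200, 0.5, 0.1
u = np.zeros(2)
m = np.empty(T)
for n in range(T):
    u = B @ u + np.array([rng.normal(0.0, np.sqrt(s2_zeta)), 0.0])
    m[n] = b @ u + rng.normal(0.0, np.sqrt(s2_omega))
```

Because only the first row of B and the first entry of the process-noise vector carry unknowns, the simplified scalar model (26)-(27) captures all the information the M-step needs.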
Table 1
Availability of the system equations to the estimation methods. The ✓ and − signs indicate the availability and unavailability, respectively, of a particular line of the system equations to the corresponding filters.

    Augmented system                                          RBPF-AR(2)   EM-KS-PF   ADFE
    u_{n+1} = B u_n + ξ_{n+1}                                 ✓            −          −
    x_{n+1} = 0.5 x_n + 25 x_n/(1 + x_n²) + b u_n + ω_{n+1}   ✓            ✓          −
    y_{n+1} = x_{n+1}²/20 + ν_{n+1}                           ✓            ✓          −

Let us describe the log-likelihood of the complete data as

    log p(d|Θ) = log p(u_1^T, …, u_T^T, m_1, …, m_T | Θ)    (28)
    log p(d|Θ) = log p(u_1^T, …, u_T^T | Θ) + log p(m_1, …, m_T | u_1^T, …, u_T^T, Θ).    (29)

Exploiting the linearity of our sub-model in (26) and (27) and using the whiteness and Gaussianity of ζ_n and ω_n, (29) can be expanded as

    log p(d|Θ) = log p(u_0) − (T/2) log σ_ζ² − T log 2π − (T/2) log σ_ω²
                 − (1/(2σ_ζ²)) Σ_{n=1}^{T} (u_n − c^T u_{n−1})²
                 − (1/(2σ_ω²)) Σ_{n=1}^{T} (m_n − u_n)².    (30)

Using (30), we are ready to formulate the E- and the M-steps in mathematical terms.

E-step: Let Θ^[k] be an estimate of the parameters at the kth EM iteration. Then for the E-step, we compute the expectation of the log-likelihood in (30) given the available data m and our current parameter estimate Θ^[k]:

    E(Θ|Θ^[k]) ≜ E[ log p(d|Θ) | m, Θ^[k] ]
               = o_1 − (T/2) log σ_ζ²
                 − (1/(2σ_ζ²)) Σ_{n=1}^{T} { E[(u_n)² | m, Θ^[k]] − 2 c^T E[u_n u_{n−1} | m, Θ^[k]]
                                             + c^T E[u_{n−1} u_{n−1}^T | m, Θ^[k]] c }
                 − (1/(2σ_ω²)) Σ_{n=1}^{T} E[(m_n − u_n)² | m, Θ^[k]],    (31)
where we have defined the constant term o_1 ≜ log p(u_0) − T log 2π − (T/2) log σ_ω². As noted before, the E-step operates on the model in (24) and (25), and the indicated expectations in (31) are approximated using a KS. For instance, in order to compute the term E[(u_n)² | m, Θ^[k]], we first compute E[u_n u_n^T | m, Θ^[k]] using a KS. Then E[(u_n)² | m, Θ^[k]] is retrieved from the first element of the P×P matrix E[u_n u_n^T | m, Θ^[k]]. The operation of the KS is described in detail in line 3 of Algorithm 3 in Appendix C.

M-step: We find the value of Θ that maximizes E(Θ|Θ^[k]), which becomes the next estimate of the missing parameters:

    Θ^[k+1] = arg max_Θ E(Θ|Θ^[k]),    (32)

where E(Θ|Θ^[k]) is as given in (31). The details and the outcome of the maximization operation are provided in line 4 of Algorithm 3 in Appendix C. As opposed to the ADFE and the RBPF-AR(P), the EM-KS-PF algorithm is based on batch processing due to the nature of the EM algorithm. It should also be noted that replacing the Kalman smoother by a Kalman filter yields a significant drop in the algorithm's performance. The pseudocode for the EM-KS-PF algorithm and the corresponding computational complexity are presented in Appendix C.

5. Experiments

5.1. Experimental setup

We consider the nonlinear stochastic model in (33) and (34) for the experiments, which was originally studied as a severely nonlinear setting for Bayesian sequential state estimation [37]:

    x_{n+1} = 0.5 x_n + 25 x_n / (1 + x_n²) + u_n + ω_{n+1},    (33)
    y_{n+1} = x_{n+1}² / 20 + υ_{n+1}.    (34)

Our objective in this section is to estimate u_n in (33) under varying observation conditions using the following estimators:

1. Full a priori information (FAI): for the application of the RBPF-AR(P) in this case, we construct u_n in (13) to be a 2nd-order autoregressive process (i.e., P = 2).¹ This application will be referred to as RBPF-AR(2), for which the model and all parameters are assumed known.
2. Partial a priori information (PAI): the method we propose for this estimation situation is the EM-KS-PF algorithm. The objective of the EM algorithm is to estimate the coefficient vector [c_1, …, c_P]^T.
3. No a priori information: finally, for such cases where there is no prior knowledge of the system dynamics, the ADFE is invoked without the knowledge of (33) and (34).

The availability of a priori knowledge for each filter is summarized in Table 1.

5.2. Calculation of lower bounds

Performances of all the estimators mentioned above will be evaluated with reference to the PCRLB, which is derived individually for each case listed above. In particular, for

¹ The regression coefficients [c_1, c_2]^T in (13) are determined using the covariance method so as to minimize the forward prediction error in the least-squares sense. The command arcov is readily available in Matlab for this operation. Setting P = 2 is sufficient since it leads to a negligibly small variance of the modeling error, σ_ξ². Using P > 2 does not lead to a significant decrease in σ_ξ².
Table 2
Theoretical and simulated computational complexity of the driving-force estimation methods. The EM-KS-PF is realized by 50 particles with a model order of P = 2 and 5 EM iterations. The RBPF-AR(2) is realized with 275 particles with P = 2. The ADFE has 34 ESNs in the ESN bank, each having 30% connected dynamic reservoirs with 10 neurons. The regularized LMS estimator has 100 tap-weights. The specific flop counts for the ADFE, the RBPF-AR(2), and the EM-KS-PF follow from (A.1), (B.2), and (C.1), respectively.

    Filter        Realization                  Flop count (theoretical)   Flop count (simulated)
    EM-KS-PF      C_EMKSPF(50, 2, 5)           37,125                     37,594
    RBPF-AR(2)    C_RBPF(275, 2)               36,983                     37,812
    ADFE          C_ADFE(0.3, 34, 10, 100)     36,410                     37,337
the full a priori information (FAI) case, the PCRLB is calculated in a relatively straightforward fashion following the ideas developed in [49, Section IV, Case 1] and denoted "PCRLB-FAI". For the partial a priori information (PAI) case, however, the unknown regression coefficients are further augmented into the overall state-space model as deterministic constants. This leads to the singularity of the posterior conditional probability density function p(u_n, x_n | y_{1:n}) for all n, and causes the entire system to become nonlinear, which introduces substantial intricacy into the calculation of the PCRLB. Nevertheless, following [49, Section IV, Case 2], we calculate the PCRLB for this case in an approximate fashion, and denote it "PCRLB-PAI". Finally, for completely unknown systems, the joint density p(u_n, x_n | y_{1:n}) cannot be specified analytically at all. To this end, we refer to the PCRLB-PAI also as an approximation to the lower bound for the completely unknown systems.

5.3. Performance results at minimal complexity

Our objective in the first part of the experiments is to realize all of the methods mentioned above at the minimal computational cost. To this end, we first determine the minimum complexity of the EM-KS-PF method, since it is computationally the most demanding algorithm. Specifically, using 50 particles and 5 EM iterations leads to a reasonable performance for the EM-KS-PF with a model order of P = 2. Increasing the computational resources does not lead to a remarkable increase in performance for the EM-KS-PF. With this setting, the EM-KS-PF algorithm is realized at approximately 37×10³ flops. In Table 2, this setup is represented by C_EMKSPF(50, 2, 5). We then realize the other estimators at a similar level of computational cost, as presented in Table 2, where the theoretical and simulated flop counts² are given for each estimator. The discrepancy between the theoretical and simulated flop counts in Table 2 is less than 3% for all estimators, which confirms the accuracy of the computational-complexity derivations. We proceed to the experiments with the settings given in the second column of Table 2.

² The flops are simulated using the Lightspeed toolbox [50], which leads to much more accurate flop estimates than Matlab 5.3.
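As a quick numerical cross-check, the theoretical ADFE count can be reproduced from (A.1) in Appendix A. The sketch below assumes the neutral flop estimates κ₁ = 8 and κ₂ = 20 stated there; the grouping of terms in (A.1) is reconstructed from the printed expression, and it reproduces the Table 2 value.

```python
def c_adfe(rho, Q, N, L, k1=8, k2=20):
    """Flop count (A.1) for the ADFE: Q echo state networks with N hidden
    units and connectivity rho, plus an L-tap regularized adaptive predictor.
    k1, k2 are the neutral per-division and per-exp flop estimates."""
    return ((2 * rho + 5) * N ** 2 + (2 * k1 + k2 + 12 - rho) * N + 1) * Q + 11 * L + 10 + k1

print(c_adfe(0.3, 34, 10, 100))  # -> 36410.0, matching the ADFE row of Table 2
```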
Fig. 3. An example run of the ADFE. The solid curve denotes the actual driving-force, u_n. The dashed curve is the estimate provided by the ADFE, obtained from a single example run of the ADFE algorithm. The ADFE is realized with the values and the corresponding complexity given in Table 2. σ_ω² = 1.5811 (i.e., P_u/σ_ω² = 5 dB) and σ_υ² = 0.05.
An example run of the ADFE algorithm with the complexity C_ADFE(0.3, 34, 10, 100) is illustrated in Fig. 3, which demonstrates the accuracy of the proposed method for the reconstruction of a rapidly varying, non-smooth driving-force. For the sake of clarity, only the last 50 samples are shown in Fig. 3. As stated in Section 2.2, we conduct our experiments for varying environmental conditions, which are represented by the changes in the model uncertainty and sensor noise in the model (33) and (34). Let P_u denote the power of the driving-force signal. We vary the ratio of P_u to the variance of the dynamic noise in the range P_u/σ_ω² ∈ [−20, 20] dB with 5 dB increments in order to test the performance of the estimators for varying levels of unstructured outliers or model uncertainty. We consider two sensor-noise levels. For the first case, we consider a low-noise sensor and set σ_υ² = 0.05. Then we repeat our experiments for the same range of P_u/σ_ω² with a high-noise sensor model by setting σ_υ² = 5. In Fig. 4, the performance of the RBPF-AR(2) is shown with respect to the PCRLB-FAI, where, for the particle filtering operations, we use the systematic resampling scheme [38,51] due to its ease of implementation and low complexity [52]. As expected, the RBPF exhibits an estimation variance that is close to the PCRLB, following a similar trend. In Fig. 5, the performances of the EM-KS-PF and the ADFE are compared with respect to the PCRLB-PAI. The EM algorithm has been simulated by 100 independent initializations of the regression coefficients such that [c_1, c_2]^T ∈ [−1, 1], and the average of those trials is calculated. For particle filtering, we again use the systematic resampling algorithm. Comparing the performances of the EM-KS-PF with the ADFE, it is clear that the ADFE provides considerably better performance for all noise levels. However, inspecting the curves for the ADFE with σ_υ² = 5 and σ_υ² = 0.05 reveals that the ADFE is relatively sensitive to both the effect of outliers and the
Table 3
Performance of all estimators vs. increasing computational complexity (σ_ω² = 5 and σ_υ² = 0.5). Entries are RMSE.

    Flop count   EM-KS-PF   ADFE   RBPF-AR(2)
    37×10³       0.74       0.44   0.019
    100×10³      0.73       0.25   0.019
    200×10³      0.73       0.21   0.019
    400×10³      0.73       0.17   0.019
Fig. 4. Performance of the Rao-Blackwellized particle filter with a 2nd-order augmented state (RBPF-AR(2)), assuming full knowledge of both the state-space pair and the 2nd-order AR modeling of the hidden input, with a computational cost of approximately C = 37×10³ flops. The PCRLB for this case is abbreviated as PCRLB-FAI. The y-axis is the root mean square error (RMSE), while the x-axis is the ratio of the power of the driving-force to the variance of the dynamic noise in dB.
Fig. 6. 95% confidence intervals for the standard deviation of the estimation error for the ADFE with C_ADFE ∈ {250, 500, 1000}×10³ flops, and the PCRLB calculated for the partial a priori (PAI) case with σ_υ² = 5.
Fig. 5. Performance of the Expectation-Maximization Kalman-Smoother Particle Filter (EM-KS-PF) and the Adaptive Driving-Force Estimator (ADFE) for the partial a priori and no a priori information cases, respectively, both with a computational cost of approximately C = 37×10³ flops. The PCRLB for these cases is abbreviated as PCRLB-PAI.
measurement noise. The estimation errors for all estimators in Figs. 4 and 5 are averaged over 100 independent runs of the respective algorithms. The outputs of the EM-KS-PF and the ADFE are normalized by the largest absolute values of their respective estimates so as to compensate for the effect of arbitrary scaling.

5.4. Effect of computational resources

Based on the results presented in Fig. 5, the following question can be raised: how much better could the ADFE and the other estimators perform if a larger amount of computational resources were available? To address this question, we set up another experiment in which the noise levels are fixed at σ_ω² = 5 and σ_υ² = 0.5. We then calculate the RMSE for C ∈ {37, 100, 200, 400}×10³ flops. The results are presented in Table 3. Clearly, for the ADFE, the performance is remarkably improved with the availability of larger amounts of computational power. For all the other estimators, however, no improvement is observed regardless of how much more computational power becomes available.

Our final set of experiments is designed to specify the conditions under which the ADFE can perform within some statistical tolerance limits. To this end, we note that the RMSE curves in Figs. 4 and 5 were obtained by averaging over 100 runs, since it is feasible to estimate the point variance empirically over a large number of independent trials (e.g., 100) by exploiting the law of large numbers (LLN). However, for such algorithms that require very large computational complexities, it is more practical to average the results over smaller ensembles (e.g., 10 independent trials) and then refer to some specified tolerance limits for performance assessment. Hence, an interval estimate of the error standard deviation can be obtained in terms of a 95% confidence interval. Specifically, in Fig. 6, we illustrate the 95% confidence intervals for C_ADFE ∈ {250, 500, 1000}×10³ flops for the noisy-sensor case (i.e., σ_υ² = 5). A practically meaningful way to judge the performance of the ADFE is then to ask under what conditions the PCRLB-PAI would lie in the 95% confidence interval, although the PCRLB-PAI is only an approximation to the true bound for the ADFE. Inspecting Fig. 6, we notice that P_u/σ_ω² ≥ 0 dB and C_ADFE ≥ 10⁶ flops would be required for the ADFE to provide desirable confidence limits.

5.5. Discussion

The results of the experiments studied in this section reveal some interesting observations:

(1) The essence of Bayesian inference is to incorporate all the prior knowledge on the underlying phenomena in a probabilistically optimal fashion to solve the inverse problem. Hence, Bayesian estimators rely heavily on the accuracy of the model that describes the evolution of the underlying phenomena. Consider the lowest noise levels (σ_ω² = σ_υ² = 0.05). In this case, the EM algorithm produces after the 50th iteration an estimate of the coefficient vector in (15) of θ̂ = [0.79, 0.49]^T, averaged over 100 Monte Carlo trials. Further iterations do not lead to any remarkable change in the estimate of θ. The actual coefficient vector, however, is θ = [0.72, 1]^T. This observation suggests that a slight discrepancy between the actual dynamics and the model on which the Bayesian estimator operates can lead to a substantial decline in the precision of the estimates. Note that the EM algorithm is very sensitive to the process noise. To this end, we observed that the performance of the EM algorithm improves as σ_ω² decreases. However, those results are omitted herein, as we are interested in testing the performance of the methods against the effect of undesired outliers.

(2) Comparing the results in Figs. 4 and 5, we conclude that unless full prior knowledge about both the governing dynamics of the underlying system and the evolution of the driving-force is available, the proposed ADFE is evidently an effective approach to the driving-force estimation problem.

(3) A shortcoming of the ADFE, however, is its sensitivity to increasing noise levels in the system. Note that the ADFE was shown in [1] to be considerably less sensitive to the process noise when the driving-force is a smooth signal in multiplicative and/or exponent forms.
Following from the results presented in [1] and Figs. 5 and 6 herein, it can be deduced that the performance of the ADFE for non-smooth forces can be substantially improved only for moderate levels of undesired outliers, at the expense of computational resources more than 100 times larger than the nominal working conditions. In other words, the lack of a priori information can be compensated up to a certain degree only if significant computational resources are available.

6. Conclusions

We have studied hidden input estimation in a comprehensive fashion and nominated state-of-the-art techniques for three fundamental estimation scenarios, including the adaptive driving-force estimator (ADFE) [1,2], which is designed for estimating the hidden inputs (driving-forces) of nonlinear systems with unknown dynamics. By virtue of experiments conducted on a general nonlinear system, we have demonstrated that if the underlying system is not fully specified by a mathematical model that describes both the governing system dynamics (e.g., state-space equations) and the evolution of the hidden force, then the ADFE is indeed a favorable method. It is also illustrated by a statistical analysis that if very large computational resources are available, then the ADFE provides a performance within the desirable tolerance limits. The ADFE owes its good performance to the combination of two robust recursive nonlinear filters that are capable of adapting to their environment: a bank of Echo State Networks for predictive modeling of the observables from the unknown system, and the regularized Least Mean Squares filter for the refinement of the raw driving-force estimates.

As mentioned in the introduction of this article, the results reported herein lead to the anticipation that the ADFE can be considered a good candidate for the diagnosis of abruptly or slowly developing faults in nonlinear dynamic systems in cases where the system model is not available. In typical applications of artificial intelligence to the fault diagnosis problem, the system outputs are predicted by a neural network, and unexpected deviations in the output residual are used as a fault indicator [53, Chapter 9]. However, our preliminary experiments on this problem show that significantly better performance can be obtained if the input estimation residual is used as the fault indicator rather than the output residual. The rationale behind this observation lies in the fact that the input signal in control applications is typically known and deterministic, whereas the system outputs are mostly corrupted by noise. An adequate input estimator (e.g., the ADFE, the RBPF-AR(P), or the EM-KS-PF) can then be used to estimate the control input under normal operating conditions, and the nominal values of the estimation error can be calculated. Hence, the deviation of the input estimation error from its nominal values can be used as a fault indicator.
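The input-residual fault indicator outlined above can be sketched as a simple calibrate-and-threshold rule. All names, noise levels, and the 3σ threshold below are illustrative assumptions, not part of the paper.

```python
import numpy as np

def fault_indicator(residual, mu0, sigma0, gamma=3.0):
    """Flag samples whose input-estimation residual deviates from its
    nominal statistics (mu0, sigma0) by more than gamma standard deviations.
    The residual is the known control input minus its estimate; mu0 and
    sigma0 are calibrated under healthy operation. Names are illustrative."""
    return np.abs(residual - mu0) > gamma * sigma0

# Synthetic illustration: calibrate on a healthy run, then inject a fault.
rng = np.random.default_rng(1)
healthy = rng.normal(0.0, 0.1, 1000)                   # nominal residuals
mu0, sigma0 = healthy.mean(), healthy.std()
residual = np.concatenate([rng.normal(0.0, 0.1, 50),   # still healthy
                           rng.normal(1.0, 0.1, 50)])  # fault develops
flags = fault_indicator(residual, mu0, sigma0)
```

Any of the three estimators studied here could supply the residual sequence; the thresholding step itself is estimator-agnostic.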
The final results will be reported in a future article once the experiments have been completed.
Acknowledgments

We gratefully acknowledge Dr. M. Grasselli of McMaster University for his support of and contributions to this work. Many thanks are also owed to Dr. T. Minka of Microsoft Research (Cambridge, UK) for his generous assistance with the use of the Lightspeed toolbox.

Appendix A. The ADFE algorithm and its computational complexity

The ADFE is presented in Algorithm 1, where the operations between lines 1 and 6 pertain to the ESN bank. Boldface capital letters denote N×N matrices, and boldface small letters denote N×1 column vectors, N being the number of hidden units in one ESN. ρ denotes the connectivity rate in the hidden layer of the ESN, and Q denotes the number of ESNs in the bank. Line 7 denotes the averaging of the prediction error over all ESNs. Line 8 represents the noise-smoothing operation by an exponentially decaying filter. The commands between lines 9 and 14 embody the regularized adaptive predictor, which is composed of L unit-delay elements.
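For concreteness, lines 1-6 of Algorithm 1 (one ESN of the bank together with its exponentially weighted recursive least-squares readout) can be sketched as follows. The reservoir scalings, forgetting factor, and test signal are illustrative assumptions, and the bias term φ of line 1 is omitted.

```python
import numpy as np

rng = np.random.default_rng(2)

N, rho, tau = 10, 0.3, 0.9995   # reservoir size, connectivity, forgetting factor

# Sparse random reservoir and input weights (one ESN of the bank; scalings illustrative).
W = np.where(rng.random((N, N)) < rho, rng.normal(0.0, 0.4, (N, N)), 0.0)
w_in = rng.normal(0.0, 0.5, N)
w_out = np.zeros(N)
P = np.eye(N) * 100.0           # inverse correlation matrix for the RLS recursion
s = np.zeros(N)

def adfe_esn_step(y_n, y_next):
    """Lines 1-6 of Algorithm 1 for a single ESN: state update, one-step
    prediction of the observable, and exponentially weighted RLS update
    of the output weights."""
    global s, w_out, P
    s = np.tanh(W @ s + w_in * y_n)             # line 1 (bias term omitted)
    z = w_out @ s                               # line 2: prediction of y_{n+1}
    e = y_next - z                              # line 3: prediction error
    k = (P @ s) / (tau + s @ P @ s)             # line 4: gain vector
    w_out = w_out + k * e                       # line 5: weight update
    P = (P - np.outer(k, s @ P)) / tau          # line 6: covariance update
    return e
```

In the full ADFE, Q such prediction errors are averaged (line 7), smoothed (line 8), and refined by the regularized adaptive predictor of lines 9-14.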
Algorithm 1. The ADFE.

Require: Observables from the environment; y_n, n = 1, …, T
Ensure: Driving-force estimates; r̂_n, n = 1, …, T
for (n = 0; n < T; n++) {
  for (i = 1; i ≤ Q; i++) {
    1: s_{n+1}^(i) = tanh(W^(i) s_n^(i) + w^{in,(i)} y_n + φ_{n+1}^(i))
    2: z_{n+1}^(i) = (w_n^{out,(i)})^T s_{n+1}^(i)
    3: e_{n+1}^(i) = y_{n+1} − z_{n+1}^(i)
    4: k_{n+1}^(i) = τ^{−1} P_n^(i) s_{n+1}^(i) / (1 + τ^{−1} (s_{n+1}^(i))^T P_n^(i) s_{n+1}^(i))
    5: w_{n+1}^{out,(i)} = w_n^{out,(i)} + k_{n+1}^(i) e_{n+1}^(i)
    6: P_{n+1}^(i) = τ^{−1} P_n^(i) − τ^{−1} k_{n+1}^(i) (s_{n+1}^(i))^T P_n^(i)
  }
  7: e_{n+1} = (1/Q) Σ_{i=1}^{Q} e_{n+1}^(i)
  8: r_{n+1} = (1 − Γ) e_{n+1} + Γ r_n
  9: r̂_{n+1} = a_n^T r_n, where a_n ≜ [a_n^(1), …, a_n^(L)]^T and r_n ≜ [r_n, …, r_{n−L+1}]^T
  10: α_{n+1} = r_{n+1} − r̂_{n+1}
  11: β_{n+1} = r̂_{n+1} − r̂_n
  12: a_{n+1} = a_n + μ_n [α_{n+1} − λ β_{n+1}] r_n
  13: μ_{n+1} = μ_n + γ [α_{n+1} − λ β_{n+1}] ψ_n^T r_n
  14: ψ_{n+1} = [I − (1 + λ) μ_n r_n r_n^T] ψ_n + [α_{n+1} − λ β_{n+1}] r_n
}

The corresponding computational complexity of the algorithm is calculated in terms of flops in [42]. For the sake of brevity, only the final expression is given in (A.1). A flop is defined as one addition, subtraction, multiplication or division of two floating-point numbers. In (A.1), κ_1 and κ_2 indicate the number of flops per division and per exponential operation, respectively. A neutral flop count is estimated to be κ_1 = 8 for the division⁴ and κ_2 = 20 for the exp function using the C source code [50]. The order of instructions in the derivation of Algorithm 1 is optimized in such a way that the computationally least expensive solution is obtained. Even without this optimization, the rate of growth of the algorithm's complexity would not exceed ρN² for a single ESN. We then conclude that the total complexity of the ADFE is O(ρQN²), taking into account that ρQN² ≫ L. That is,

    C_ADFE(ρ, Q, N, L) = [(2ρ + 5)N² + (2κ_1 + κ_2 + 12 − ρ)N + 1]Q + 11L + 10 + κ_1.    (A.1)

⁴ Since most processors can do an addition, comparison, or multiplication in a single cycle, those are all counted as one flop. Division, however, always takes more than one cycle.

Appendix B. The RBPF-AR(P) algorithm and its computational complexity

The overall implementation of the Rao-Blackwellized particle filter with the augmented state is provided in pseudo-code in Algorithm 2, where the nonlinear state is estimated with a standard particle filter. Hence, the importance function is set equal to the transition prior, and systematic resampling is used at each iteration. The filter's computational complexity is derived from Algorithm 2 and elaborated in the supporting material [42]. Before explaining the steps of Algorithm 2, let us introduce the following notation for the estimated quantities, which will be commonly used in the sequel:

û_{n|n}: filtered estimates of the vector u_n, and C_{n|n}: filtering covariance for û_{n|n};
û_{n+1|n}: one-step predicted estimates of the vector u_n, and C_{n+1|n}: prediction covariance for û_{n+1|n}.

Algorithm 2. The RBPF-AR(P).

Require: Observables from the environment {y_n | n = 1, …, T} and an analytic description of the underlying mechanism (e.g., (16)-(18))
Ensure: Conditional mean estimates of the linear state vector; û_{n|n}
Initialize particles, i = 1, …, M: x_{0|−1}^(i) ~ p(x_0), and set u_{0|−1}^(i) = u_0 and C_{0|−1}^(i) = C_0
for (n = 0; n ≤ T; n++) {
  for (i = 1; i ≤ M; i++) {
    1: Evaluate the importance weights w_n^(i) according to the likelihood function p(y_n | x_n^(i)) = N(h(x_n^(i)), σ_υ²) and normalize w̃_n^(i) = w_n^(i) / Σ_{j=1}^{M} w_n^(j)
    2: Resample M particles with replacement, P(x_{n|n}^(i) = x_{n|n−1}^(j)) = w̃_n^(j)
    3: Linear state prediction and linear state prediction covariance:
       û_{n+1|n}^(i) = B û_{n|n}^(i)
       C_{n+1|n} = B C_{n|n} B^T + R
    4: New particle prediction x_{n+1|n}^(i) ~ p(x_{n+1|n}^(i) | x_n^(i)) = N(g(x_n^(i)) + b û_{n|n}^(i), b C_{n|n} b^T + σ_ω²)
       Pseudo-measurement prediction m_{n+1}^(i) = x_{n+1}^(i) − g(x_n^(i))
    5: Ω_{n+1} = b C_{n+1|n} b^T + σ_ω²  {Innovation covariance}
    6: Υ_{n+1} = C_{n+1|n} b^T Ω_{n+1}^{−1}  {Kalman gain}
    7: Updated linear state estimate and updated linear state covariance:
       û_{n+1|n+1}^(i) = û_{n+1|n}^(i) + Υ_{n+1} (m_{n+1}^(i) − b û_{n+1|n}^(i))
       C_{n+1|n+1} = C_{n+1|n} − Υ_{n+1} Ω_{n+1} Υ_{n+1}^T
  }
  8: Desired estimates of the linear state: û_{n|n} = (1/M) Σ_{i=1}^{M} û_{n|n}^(i)  {Importance weights have already been normalized}
}

Now let us have a closer look at the steps of Algorithm 2. The first and second lines of step 3 are the Kalman prediction equations, where R stands for the P×P covariance matrix of the evolution uncertainty of the linear state vector, whose only nonzero entry is its first element, σ_ζ². For instance, letting P = 2, we have

    R = [ σ_ζ²  0
          0     0 ].    (B.1)

The first line of step 4 denotes the sampling of particles from the transition prior, whereas the second line of step 4 gives the derivation of the pseudo-measurements obtained from the nonlinear state equation. The Kalman gain vector at time n, denoted Υ_n in step 6, is a P×1 column vector. Step 7 describes the Kalman measurement-update equations. The estimate of the linear state vector associated with the i-th particle is denoted û_{n|n}^(i), as given in the first line of step 7. C_{n|n} in the second line of step 7 denotes the P×P covariance matrix of the estimation error of û_{n|n}^(i). Note that each particle is associated with a Kalman filter. However, the same covariance matrix is propagated for all Kalman filters, since the measurement equation is independent of the linear state vector. This greatly reduces the complexity of the RBPF-AR(P). Finally, the desired conditional mean estimates of the linear state vector are obtained by simply averaging over all
particles, as observed in line 8, since the importance weights have already been normalized. The filtered driving-force estimate û_n equals the first element of the vector û_{n|n}.

The complexity of the RBPF-AR(P) algorithm is studied in [54] and shown to be O(P³). For the sake of generality, the authors of [54] analyzed cases where both parts of the state model are linear and the nonlinearity is confined to the measurement model. Building on the ideas developed in [54], we have separately studied the complexity of the RBPF-AR(P) for the augmented model in (16)-(18). Following the detailed derivation in the supporting document [42], the final flop-count expression for the RBPF-AR(P) is given in (B.2). Once again, we made use of the neutral flop-count estimates for operations such as exp, square root, log, and random-number generation. The operations other than multiplication and addition in (B.2) are the following: ς_1: evaluation and normalization of the importance weight per particle; ς_2: resampling for all particles; ς_3: since the nonlinear state is one-dimensional, the Cholesky factorization reduces to the calculation of a square root, so ς_3 = sqrt; ς_4: random-number generation per particle; ς_5: evaluation of the nonlinear mapping g(·); ς_6: division. Then

    C_RBPF(M, P) = M(2P² + 5P + ς_1 + ς_4 + ς_5 + 4) + 4P³ + 5P² + 2P + ς_2 + ς_3 + (P + 1)ς_6 − 1.    (B.2)

Appendix C. The EM-KS-PF algorithm and its computational complexity

The following notation is introduced in Algorithm 3 for the smoothed estimates produced by the Kalman smoother, which is an integral part of the EM-KS-PF algorithm: û_{n|T}: fixed-interval smoothed estimates of the vector u_n, and C_{n|T}: smoothing covariance for û_{n|T}. Fixed-interval smoothing implies that all the data available from time 1 to T are utilized to estimate the vector u_n. The entire realization of the EM-KS-PF is given in pseudo-code in Algorithm 3. The total number of EM iterations in Algorithm 3 is denoted by K. Although there is no analytic way of finding the optimum value of K, in practice we have observed that more than 4-6 EM iterations do not lead to a significant improvement in the estimates of u_n. These results match the observations of other researchers in the literature [25,26].

Algorithm 3. The EM-KS-PF.

Require: Observables from the environment {y_n | n = 1, …, T} and an analytic description of the system (e.g., (24) and (25))
Ensure: Estimates of the missing parameters Θ and the smoothed conditional mean estimates of the driving-force; û_n for n = 1, …, T
Initialize particles, i = 1, …, M: x_{0|−1}^(i) ~ p(x_0), and set u_{0|−1}^(i) = u_0 and C_{0|−1} = C_0
Initialize the estimate of the missing parameters: Θ^[0]
for (k = 0; k < K; k++) {
  for (n = 0; n ≤ T; n++) {
    for (i = 1; i ≤ M; i++) {
      1: Evaluate the importance weights w_n^(i) = p(y_n | x_n^(i)) and normalize w̃_n^(i) = w_n^(i) / Σ_{j=1}^{M} w_n^(j)
      2: Resample M particles, P(x_{n|n}^(i) = x_{n|n−1}^(j)) = w̃_n^(j)
      3: E-STEP (Kalman smoother):
         1. Forward recursions:
            State prediction and state prediction covariance:
              û_{n+1|n}^(i) = B^[k] û_{n|n}^(i)
              C_{n+1|n} = B^[k] C_{n|n} (B^[k])^T + R^[k]
            Particle prediction x_{n+1|n}^(i) ~ p(x_{n+1|n}^(i) | x_n^(i)) = N(g(x_n^(i)) + b û_{n|n}^(i), b C_{n|n} b^T + σ_ω²)
            Measurement prediction m_{n+1}^(i) = x_{n+1}^(i) − g(x_n^(i))
            ζ_{n+1} = b C_{n+1|n} b^T + σ_ω²  {Innovation covariance}
            Ω_{n+1} = C_{n+1|n} b^T ζ_{n+1}^{−1}  {Kalman gain}
            Updated state estimate and state covariance:
              û_{n+1|n+1}^(i) = û_{n+1|n}^(i) + Ω_{n+1} (m_{n+1}^(i) − b û_{n+1|n}^(i))
              C_{n+1|n+1} = C_{n+1|n} − Ω_{n+1} ζ_{n+1} Ω_{n+1}^T
    }
  }
  2. Backward recursions:
  for (n = T; n ≥ 0; n−−) {
    for (i = 1; i ≤ M; i++) {
      P_n = C_{n|n} (B^[k])^T C_{n+1|n}^{−1}
      û_{n|T}^(i) = û_{n|n}^(i) + P_n (û_{n+1|T}^(i) − û_{n+1|n}^(i))
      C_{n|T} = C_{n|n} + P_n (C_{n+1|T} − C_{n+1|n}) P_n^T
    }
    û_{n|T} = (1/M) Σ_{i=1}^{M} û_{n|T}^(i)  {Desired estimates}
    û_{n−1|T} = (1/M) Σ_{i=1}^{M} û_{n−1|T}^(i)
    E[û_n (û_n)^T] = C_{n|T} + û_{n|T} (û_{n|T})^T
    E[û_{n−1} (û_{n−1})^T] = C_{n−1|T} + û_{n−1|T} (û_{n−1|T})^T
    E[û_n (û_{n−1})^T] = P_{n−1} C_{n|T} + û_{n|T} (û_{n−1|T})^T
  }
  4: M-STEP:
    (c^[k+1])^T = [ Σ_{n=1}^{T} E[u_n (û_{n−1})^T | m, Θ^[k]] ] [ Σ_{n=1}^{T} E[û_{n−1} (û_{n−1})^T | m, Θ^[k]] ]^{−1}
    σ_ξ^{2,[k+1]} = (1/T) Σ_{n=1}^{T} E[ (u_n)² − 2 (c^[k])^T (E[u_n (û_{n−1})^T])^T + (c^[k])^T E[û_{n−1} (û_{n−1})^T] c^[k] | m, Θ^[k] ]
    Insert (c^[k+1])^T into the first row of B^[k+1] and σ_ξ^{2,[k+1]} into the first element of R^[k+1]
}

The computational cost required for the implementation of the EM-KS-PF algorithm is described in (C.1), which is again elaborated in the supporting document [42]:

    C_EMKSPF(M, P, K) = K[10P³ + 13P² + (4P² + 6P + ς_1 + ς_4 + ς_5 + 4)M + ς_2 + ς_3 + (P + 1)ς_6 + ς_7] − 1,    (C.1)

where [ς_1, …, ς_6] are the same as in (B.2) and ς_7 denotes the flop count for the inversion of a P×P matrix.
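The interplay of the E-step smoother and the closed-form M-step in Algorithm 3 can be sketched for the scalar case P = 1 of the sub-model (26)-(27). This is a self-contained illustration (Kalman filter plus RTS smoother, with the standard lag-one smoothed covariance J_n P^s_{n+1}), not the authors' implementation; initialization and iteration counts are illustrative.

```python
import numpy as np

def em_ar1(m, s2_w, c0=0.0, s2z0=1.0, K=10):
    """EM for the scalar sub-model (26)-(27) with P = 1:
    u_n = c u_{n-1} + zeta_n,  m_n = u_n + omega_n (omega variance s2_w known).
    E-step: Kalman filter + RTS smoother moments; M-step: closed form."""
    T = len(m)
    c, s2z = c0, s2z0
    for _ in range(K):
        # ---- E-step: forward Kalman filter ----
        uf = np.zeros(T); Pf = np.zeros(T)
        up = np.zeros(T); Pp = np.zeros(T)
        u, P = 0.0, 1.0
        for n in range(T):
            up[n], Pp[n] = c * u, c * c * P + s2z        # predict
            k = Pp[n] / (Pp[n] + s2_w)                   # Kalman gain
            u = up[n] + k * (m[n] - up[n])
            P = (1.0 - k) * Pp[n]
            uf[n], Pf[n] = u, P
        # ---- RTS backward smoother ----
        us = uf.copy(); Ps = Pf.copy(); Pcross = np.zeros(T)
        for n in range(T - 2, -1, -1):
            J = Pf[n] * c / Pp[n + 1]
            us[n] = uf[n] + J * (us[n + 1] - up[n + 1])
            Ps[n] = Pf[n] + J * (Ps[n + 1] - Pp[n + 1]) * J
            Pcross[n + 1] = J * Ps[n + 1]                # lag-one smoothed covariance
        # ---- M-step: closed-form maximizers of (31) ----
        Euu = Ps + us ** 2                               # E[u_n^2 | m]
        Ecross = Pcross[1:] + us[1:] * us[:-1]           # E[u_n u_{n-1} | m]
        c = Ecross.sum() / Euu[:-1].sum()
        s2z = np.mean(Euu[1:] - 2 * c * Ecross + c * c * Euu[:-1])
    return c, s2z
```

For linear Gaussian sub-models the M-step has this closed form, which is why only a handful of EM iterations are needed in practice, consistent with the 4-6 iterations reported above.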
References

[1] U. Güntürkün, Sequential reconstruction of driving-forces from nonlinear nonstationary dynamics, Phys. D: Nonlinear Phenom. 239 (13) (2010) 1095-1107, http://dx.doi.org/10.1016/j.physd.2010.02.014.
[2] U. Güntürkün, Toward the development of radar scene analyzer for cognitive radar, IEEE J. Ocean. Eng. (Special Issue on Non-Rayleigh Reverberation and Clutter) 35 (2) (2010) 303-313, http://dx.doi.org/10.1109/JOE.2010.2043378.
[3] Y. Guan, M. Saif, A novel approach to the design of unknown input observers, IEEE Trans. Autom. Control 36 (1991) 632-635.
[4] M. Darouach, M. Zasadzinski, S. Xu, Full-order observers for linear systems with unknown inputs, IEEE Trans. Autom. Control 39 (1994) 606-609.
[5] M. Hou, R. Patton, Input observability and input reconstruction, Automatica 34 (1998) 789-794.
[6] R. Mehra, J. Peschon, An innovations approach to fault detection and diagnosis in dynamic systems, Automatica 7 (1971) 637-640.
[7] S. Gillijns, B. De Moor, Unbiased minimum-variance input and state estimation for linear discrete-time systems, Automatica 43 (2006) 111-116.
[8] M. Hou, R. Patton, Optimal filtering for systems with unknown inputs, IEEE Trans. Autom. Control 43 (1998) 445-449.
[9] A. Krener, W. Respondek, Nonlinear observers with linearizable error dynamics, SIAM J. Control Optim. 23 (1985) 197-216.
[10] X. Xia, W. Gao, Nonlinear observer design by observer error linearization, SIAM J. Control Optim. 27 (1989) 199-216.
[11] F. Zhu, Z. Han, A note on observers for Lipschitz nonlinear systems, IEEE Trans. Autom. Control 47 (2002) 1751-1754.
[12] M. Boutayeb, M. Darouach, H. Rafaralahy, Generalized state-space observers for chaotic synchronization and secure communication, IEEE Trans. Circuits Syst. 49 (2002) 345-349.
[13] M. Arcak, P. Kokotovic, Nonlinear observers: a circle criterion design and robustness analysis, Automatica 37 (2001) 1923-1930.
[14] G. Pillonetto, M. Saccomani, Input estimation in nonlinear dynamical systems using differential algebra techniques, Automatica 42 (2006) 2117-2129.
[15] G. Kitagawa, A self-organizing state-space model, J. Am. Stat. Assoc. 93 (1998) 1203-1215.
[16] G. Storvik, Particle filters for state-space models with the presence of unknown static parameters, IEEE Trans. Signal Process. 50 (2002) 281-289.
[17] J. Liu, M. West, Combined parameter and state estimation in simulation-based filtering, in: Sequential Monte Carlo Methods in Practice, Springer-Verlag, New York, 2001.
[18] B. Zhang, M. Chen, D. Zhou, Chaotic secure communication based on particle filtering, Chaos Solitons Fractals 30 (2006) 1273-1280.
[19] B. Zhang, M. Chen, D. Zhou, Z. Li, Particle-filter-based estimation and prediction of chaotic states, Chaos Solitons Fractals 32 (2007) 1491-1498.
[20] M. Orchard, G. Vachtsevanos, A particle filtering-based framework for real-time fault diagnosis and failure prognosis in a turbine engine, in: Mediterranean Conference on Control Automation, MED '07, 2007, pp. 1-6, http://dx.doi.org/10.1109/MED.2007.4433871.
[21] N. de Freitas, Rao-Blackwellised particle filtering for fault diagnosis, in: IEEE Aerospace Conference Proceedings, vol. 4, 2002, pp. 4-1767-4-1772, http://dx.doi.org/10.1109/AERO.2002.1036890.
[22] L. Yan, S. Duoqing, K. Liang, Rao-Blackwellized particle filtering for fault detection and diagnosis, in: 29th Chinese Control Conference (CCC), 2010, pp. 3870-3875.
[23] H. Fang, R.A. de Callafon, J. Cortes, Simultaneous input and state estimation for nonlinear systems with applications to flow field estimation, Automatica 49 (9) (2013) 2805-2812.
[24] C. Andrieu, A. Doucet, Online expectation-maximization type algorithms for parameter estimation in general state space models, in: Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), vol. 6, 2003, pp. 69-72.
[25] A. Zia, T. Kirubarajan, J. Reilly, D. Yee, K. Punithakumar, S. Shirani, An EM algorithm for nonlinear state estimation with model uncertainties, IEEE Trans. Signal Process. 56 (2008) 921-936.
[26] A. Zia, J. Reilly, J. Manton, S. Shirani, An information geometric approach to ML estimation with incomplete data, IEEE Trans. Signal Process. 55 (2007) 3975-3986.
[27] J. Eckmann, S. Kamphorst, D. Ruelle, Recurrence plots of dynamical systems, Europhys. Lett. 4 (1987) 973-977.
[28] M. Casdagli, Recurrence plots revisited, Physica D 108 (1997) 12-44.
[29] T. Schreiber, Detecting and analysing nonstationarity in a time series with nonlinear cross-predictions, Phys. Rev. Lett. 78 (1997) 843.
[30] T. Schreiber, Interdisciplinary application of nonlinear time series methods, Phys. Rep. 308 (1999) 1-64.
[31] P.F. Verdes, P.M. Granitto, H.D. Navone, H.A. Ceccatto, Nonstationary time-series analysis: accurate reconstruction of driving forces, Phys. Rev. Lett. 87 (2001) 124101.
[32] M. Szeliga, P. Verdes, P. Granitto, H.
Ceccatto, Extracting driving signals from non-stationary time series, in: Proceedings of Brazilian Symposium on Neural Networks (SBRN0 97), Pernambuco, Brazil, 2002, pp. 104–108. M. Szeliga, P. Verdes, P. Granitto, H. Ceccatto, Artificial neural network learning of nonstationary behavior in time series, Int. J. Neural. Syst. 13 (2003) 103–109. M. Szeliga, P. Verdes, P. Granitto, H. Ceccatto, Modeling nonstationary dynamics, Physica A 197 (2003) 190–194. P. Verdes, P. Granitto, H. Ceccatto, Overembedding method for modeling nonstationary systems, Phys. Rev. Lett. 118701 (2006) 1–4. T. Moon, The expectation-maximization algorithm, IEEE Signal Process. Mag. 13 (1996) 47–60. N. Gordon, D. Salmond, A. Smith, Novel approach to nonlinear/nonGaussian Bayesian state estimation, IEE Proc.—Radar Sonar Navig. 140 (1993) 107–113. M. Arulampalam, S. Maskell, N. Gordon, T. Clapp, A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking, IEEE Proc. Signal Process. 50 (2) (2002) 174–188. H. Jaeger, H. Haas, Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication, Science 304 (2004) 78–80. H. Jaeger, The echo state approach to analysing and training recurrent neural networks, GMD Report 148, Fraunhofer Institute for Autonomous Intelligent Systems (AIS), Schloss Birlinghoven 53757 Sankt Augustin, Germany, Dec. 2001. H. Jaeger, Adaptive nonlinear system identification with echo state networks, in: D. Derickson (Ed.), Advances in Neural Information Processing Systems, vol. 15, MIT Press, Cambridge, MA, 2003, pp. 593–600. U. Güntürkün, J.P. Reilly, T. Kirubarajan, H. de Bruin, Supporting Document, Website, 2011, URL: 〈http://www.ipcsl.org/publications/ ADFESupportingDocument.html〉. U. Güntürkün, Recursive estimation of driving-forces from nonlinear nonstationary systems with unknown dynamics (Ph.D. thesis), McMaster University, Hamilton, ON, September 2010. 
URL: 〈http:// www.ipcsl.org/dissertation/DissertationofUlasGunturkun.html〉.
[44] A. Papoulis, Predictable processes and Wold's decomposition: a review, IEEE Trans. Acoust. Speech Signal Process. 33 (1985) 933–938.
[45] C. Andrieu, A. Doucet, Particle filtering for partially observed Gaussian state space models, J. R. Stat. Soc. B 64 (2000) 827–836.
[46] G. Hendeby, R. Karlsson, F. Gustafsson, A new formulation of the Rao-Blackwellized particle filter, in: IEEE/SP 14th Workshop on Statistical Signal Processing (SSP '07), 2007, pp. 84–88, http://dx.doi.org/10.1109/SSP.2007.4301223.
[47] T. Schön, F. Gustafsson, P. Nordlund, Marginalized particle filters for mixed linear/nonlinear state-space models, IEEE Trans. Signal Process. 53 (2005) 2279–2289.
[48] A. Dempster, N. Laird, D. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc. Ser. B 39 (1977) 1–38.
[49] P. Tichavsky, C. Muravchik, A. Nehorai, Posterior Cramér–Rao lower bounds for discrete-time nonlinear filtering, IEEE Trans. Signal Process. 46 (1998) 1386–1396.
[50] T. Minka, The Lightspeed Matlab Toolbox, 〈http://research.microsoft.com/en-us/um/people/minka/software/lightspeed/〉, 2009.
[51] G. Kitagawa, Monte Carlo filter and smoother for non-Gaussian nonlinear state space models, J. Comput. Graph. Stat. 5 (1) (1996) 1–25.
[52] J. Hol, T. Schön, F. Gustafsson, On resampling algorithms for particle filters, in: Proceedings of IEEE Nonlinear Statistical Signal Processing Workshop, Cambridge, UK, 2006.
[53] J. Chen, R. Patton, Robust Model-Based Fault Diagnosis for Dynamic Systems, Kluwer Academic Publishers, Norwell, MA, USA, 1999.
[54] R. Karlsson, T. Schön, F. Gustafsson, Complexity analysis of the marginalized particle filter, IEEE Trans. Signal Process. 53 (2005) 4408–4411.