Computers and Structures 85 (2007) 1257–1263. doi:10.1016/j.compstruc.2006.11.029
Kalman filtering for neural prediction of response spectra from mining tremors

Agnieszka Krok, Zenon Waszczyszyn *

Cracow University of Technology, Institute of Computational Civil Engineering, Warszawska 24, 31-155 Kraków, Poland

Received 14 December 2005; accepted 21 November 2006; available online 30 January 2007

* Corresponding author. Tel.: +48 12 628 2546; fax: +48 12 628 2034. E-mail addresses: [email protected] (A. Krok), [email protected] (Z. Waszczyszyn).
Abstract

Acceleration response spectra (ARS) for mining tremors in the Upper Silesian Coalfield, Poland, are generated using neural networks trained by means of Kalman filtering. The target ARS were computed on the basis of measured accelerograms. It was proved that the standard feed-forward layered neural network, trained by the DEKF (decoupled extended Kalman filter) algorithm, is numerically much less efficient than the standard recurrent NN learnt by the recurrent DEKF algorithm, cf. [Haykin S, editor. Kalman filtering and neural networks. New York: John Wiley & Sons; 2001]. It is also shown that the studied KF algorithms are better than the traditional Resilient-Propagation learning method. The improvement of the training process and of the neural prediction due to the introduction of an autoregressive input is also discussed in the paper. © 2007 Elsevier Ltd. All rights reserved.

Keywords: Acceleration response spectrum; Mining tremor; Neural networks; Kalman filtering; Autoregressive input
1. Introduction

Response spectra caused by paraseismic excitations (in this paper only mining tremors are analyzed) are used for the design of buildings in mining regions, as well as for the evaluation of the damage resistance of existing buildings [2]. The monitoring of paraseismic excitations can be difficult or even impossible for a larger number of buildings. Thus, either standardized design response spectra or average response spectra, computed on the basis of accelerograms measured earlier at the buildings, are used, cf. [3]. That is why artificial neural networks (ANNs) have recently been applied in order to predict (generate) more accurately the response spectra related to individual buildings.

In recent years ANNs have been introduced to predict response spectra on the basis of existing experimental evidence [4,5] or generated pseudo-empirical accelerograms (computer simulations) [6]. An attempt to predict ARS in
two Polish mining regions was carried out in [3]. The corresponding ANNs were designed on the basis of accelerograms of surface waves, measured at the ground level at selected buildings, for known values of the epicentral distances and energies of the mining tremors. The analysis of this problem was presented in [7], where Kalman filtering (KF) was introduced as a refined learning method for the multilayer perceptron.

KF was formulated in the 1960s as an algorithm for the analysis of linear discrete stochastic processes [8]. The introduction of KF as a batch-oriented, second-order learning method opened the door to formulating efficient, mathematically well-grounded ANNs, cf. [1,9]. KF-based methods were applied in the late 1980s in automatic control as a very promising tool for the analysis of non-linear stochastic processes, and in diagnostics, prediction and identification of damage in control systems [10]. A numerical version of KF was applied in solid mechanics to the analysis of inverse problems related to the parametric identification of material constitutive equations and crack propagation, cf. also the references in [11]. In these papers the sensitivity analysis theory was used to linearize the KFs. In [12] KF was used for the learning of a multilayer perceptron, which was applied in structural dynamics to the identification of structural parameters. Modified versions of KF, used as learning methods, enable the formulation not only of ANNs with high accuracy of neural prediction, but also of new types of networks for the analysis of problems unsolvable by traditional ANNs [13].

The present paper is a development of [3,7]. Besides DEKF (decoupled extended Kalman filter), its modification RDEKF (recurrent DEKF) is applied. The latter modification is coupled with a multilayer version of the Elman NN [14]. An autoregressive (time-delayed) input is used, and the training and testing patterns are based on the records measured in the Upper Silesian Coalfield, Poland. The acceleration response spectra (ARS) obtained by means of RDEKF are compared with ARS computed by networks learnt by the DEKF algorithm. ARS computed by either the standard feed-forward or the recurrent layered neural network with the autoregressive input, trained by the standard Resilient-Propagation (Rprop) learning method, are also presented.
2. Neural Kalman filters

2.1. Network architectures

The basic equations and algorithms of Kalman filtering, used for ANN training, are modified depending on the NN architecture. Following [1], we investigate two architectures, corresponding to the standard feed-forward layered network FLNN (frequently called the multilayer perceptron [9]) and the recurrent multilayered network RLNN (an Elman-type network [14]), cf. Fig. 1a and b, respectively.
2.2. Kalman filter equations for RLNN

The extended KF is based on two equations: the process equation (1) and the measurement equation (2), modified for use in the RLNN into the following form, cf. [1]:

$$\{w_i(k+1),\ v_i(k+1)\} = \{w_i(k),\ v_i(k)\} + \omega(k), \tag{1}$$

$$y(k) = h(w(k), v(k), x(k)) + \nu(k), \tag{2}$$

where k is the discrete pseudo-time parameter; i is the number of the neuron in the ANN; $w(k) = \{w_i(k), v_i(k) \mid i = 1, 2, \ldots, n\}$ is the state vector (one-column matrix) composed of the vectors $w_i$ of synaptic weights and biases and of the neuron recurrent outputs $v_i$ for the n neurons of the NN; h is the non-linear vector function of the input/output relation; x/y are the input/output vectors; and $\omega(k), \nu(k)$ are the Gaussian process and measurement noises, with means and covariance matrices defined by

$$E[\omega(k)] = E[\nu(k)] = 0, \qquad E[\omega(k)\,\omega^T(l)] = Q(k)\,\delta_{kl}, \qquad E[\nu(k)\,\nu^T(l)] = R(k)\,\delta_{kl}. \tag{3}$$
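To make the roles of $w$, $v$ and $h$ in Eqs. (1)–(3) concrete, the sketch below shows a forward pass $y = h(w, v, x)$ for an Elman-type network as in Fig. 1b. This is a minimal Python reading of the architecture, with bipolar (tanh) hidden neurons, one time-delayed self-recurrent input per hidden neuron and an identity output, consistent with Section 4; the exact wiring of [1,14] may differ.

```python
import numpy as np

def rlnn_forward(W_h, b_h, w_rec, W_o, b_o, x, v):
    """Forward pass y = h(w, v, x) of the recurrent layered NN.

    W_h (n_h, n_x), b_h (n_h,)  -- hidden weights and biases
    w_rec (n_h,)                -- per-neuron self-recurrent weights
    W_o (1, n_h), b_o (1,)      -- linear output layer
    x (n_x,)                    -- input vector
    v (n_h,)                    -- recurrent outputs v(k) (the time delay)
    """
    z = W_h @ x + b_h + w_rec * v   # net input with time-delayed feedback
    v_new = np.tanh(z)              # bipolar activation; stored as v(k+1)
    y = W_o @ v_new + b_o           # identity (linear) output
    return y, v_new
```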
Fig. 1. Multilayer NNs with an autoregressive input $y_{k-1}$: (a) network FLNN with only feed-forward transmission of signals and (b) recurrent network RLNN with internal time-delay connections.
The name 'extended' KF (EKF) is used because of the introduction of the vector function h in the measurement equation (2), in which a non-linear output–input relation is considered. In the case of the feed-forward network FLNN the state vector is composed only of the weights/biases vector w and the non-linear vector function reduces to $h = h(w(k), x(k))$, so the vector v is omitted and the KF equations are simpler than Eqs. (1) and (2).

2.3. Algorithm RDEKF

The numerical efficiency of KF algorithms motivates decoupling them into state-vector groups [1]. From among the different possibilities, the decoupling level was related here to the individual NN neurons (nodes $i = 1, 2, \ldots, n$). On this basis the decoupled extended Kalman filter (DEKF) was formulated for the network FLNN. The algorithm RDEKF (recurrent DEKF), formulated for the recurrent network RLNN, takes the following form on the basis of Eqs. (1) and (2):

$$A(k) = \Big[ R(k) + \sum_{i=1}^{g} H_i^{\mathrm{rec}\,T}(k)\, P_i(k)\, H_i^{\mathrm{rec}}(k) \Big]^{-1},$$
$$K_i(k) = P_i(k)\, H_i^{\mathrm{rec}}(k)\, A(k), \qquad e(k) = y(k) - \hat{y}(k), \tag{4}$$
$$\{\hat{w}_i(k+1),\ \hat{v}_i(k+1)\} = \{\hat{w}_i(k),\ \hat{v}_i(k)\} + K_i(k)\, e(k),$$
$$P_i(k+1) = P_i(k) - K_i(k)\, H_i^{\mathrm{rec}\,T}(k)\, P_i(k) + Q_i(k),$$

where g = n is the number of decoupled groups (one group per neuron); $K_i(k)$ is the Kalman gain matrix; $P_i(k)$ is the approximate error covariance matrix; $e(k) = y(k) - \hat{y}(k)$ is the error vector, in which y(k) is the target vector for the kth presentation of a training pattern; and $\hat{w}_i(k)$, $\hat{y}(k)$ are the a posteriori estimates of the weight vector and of the output vector. An essential part of the RDEKF algorithm is the computation of the current linearization matrix $H_i^{\mathrm{rec}}$:

$$H_i^{\mathrm{rec}} = \partial h_i(k, w, v)/\partial\{w_i, v_i\}, \tag{5}$$
computed at $\{\hat{w}_i, \hat{v}_i\}$, where $\hat{w}_i(k), \hat{v}_i(k)$ are the a priori estimates. In the present paper the linearization was performed by the back-propagation-through-time procedure. Consequently, in the case of the algorithm DEKF used in the network FLNN, formula (5) takes the form

$$H_i(k) = \partial h_i(k, w)/\partial w_i\,\big|_{w_i = \hat{w}_i}. \tag{5a}$$
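A compact implementation view of one presentation step of Eqs. (4) may help. The sketch below is written in Python/NumPy purely for illustration (the authors' own code was a MATLAB simulator with KF procedures in C++), and all identifiers are illustrative; the per-neuron Jacobians $H_i^{\mathrm{rec}}$ are assumed to be delivered by back-propagation through time. Dropping the recurrent outputs $v_i$ from the state turns the same update into DEKF for the network FLNN.

```python
import numpy as np

def rdekf_step(groups, H_rec, e, R):
    """One RDEKF presentation step, cf. Eqs. (4).

    groups : list of dicts, one per neuron i, holding
             'theta' -- decoupled state {w_i, v_i}, shape (n_i,)
             'P'     -- error covariance P_i(k), shape (n_i, n_i)
             'Q'     -- process-noise covariance Q_i(k), shape (n_i, n_i)
    H_rec  : list of Jacobians dh/d{w_i, v_i}, shape (n_i, m),
             assumed computed by back-propagation through time
    e      : error vector y(k) - y_hat(k), shape (m,)
    R      : measurement-noise covariance R(k), shape (m, m)
    """
    # Global matrix A(k), shared by all decoupled groups.
    A = R.copy()
    for grp, H in zip(groups, H_rec):
        A += H.T @ grp['P'] @ H
    A = np.linalg.inv(A)
    # Per-neuron Kalman gain, state and covariance updates.
    for grp, H in zip(groups, H_rec):
        K = grp['P'] @ H @ A                        # gain K_i(k)
        grp['theta'] += K @ e                       # {w_i, v_i} update
        grp['P'] += -K @ H.T @ grp['P'] + grp['Q']  # covariance update
    return groups
```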
3. Surface vibrations from mining tremors in the USC region

Following [3], the acceleration response spectra (ARS) are neurally simulated for known energies E and epicentral distances r_e of mining tremors, on the basis of measurements at points on the ground level at the monitored buildings. The accelerograms of surface waves were taken from the Upper Silesian Coalfield (USC) region in Poland, and the corresponding values of E and r_e were evaluated by seismic stations placed nearby. The data were taken from [3]. They concern 145 tremors of energy $E \in [2 \times 10^4, 4 \times 10^6]$ J and epicentral distance $r_e \in [0, 1200]$ m. In Fig. 2a one of the measured accelerograms is shown. The corresponding ARS was computed for the damping parameter ξ = 2% using formulae known in structural dynamics, cf. e.g. [5]. In Fig. 2b the dimensionless ARS is shown, adopting the following definition:

$$\beta(T) = S_a(T; E, r_e)/a_{\max}, \tag{6}$$

where $S_a$ [m/s²] is the ARS computed on the basis of the accelerogram a(t) shown in Fig. 2a; $a_{\max}$ [m/s²] $= \max_t |a(t)|$ is the maximal acceleration; and $T$ [s] $= 1/f$ is the period of vibration for the natural frequency f [Hz].

Fig. 2. (a) Accelerogram recorded on February 25, 1996 for E = 2 × 10⁶ J, r_e = 740 m. (b) Dimensionless ARS.
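For completeness, the sketch below shows one standard way to compute $S_a(T)$ from a record: direct integration of a linear SDOF oscillator with ξ = 2% by the Newmark average-acceleration scheme. This is a common textbook route, not necessarily the formulae used in [3,5], and all names are illustrative.

```python
import numpy as np

def acceleration_response_spectrum(a_g, dt, periods, xi=0.02):
    """Peak absolute acceleration S_a(T) of a linear SDOF oscillator
    (unit mass) driven by the ground acceleration record a_g [m/s^2],
    integrated by the Newmark average-acceleration method."""
    Sa = np.empty(len(periods))
    for j, T in enumerate(periods):
        w = 2.0 * np.pi / T                    # natural circular frequency
        k, c = w * w, 2.0 * xi * w             # stiffness, damping (m = 1)
        kh = k + 2.0 * c / dt + 4.0 / dt**2    # effective stiffness
        u = v = 0.0
        a = -a_g[0]                            # initial relative acceleration
        peak = abs(a + a_g[0])                 # absolute acceleration
        for ag1 in a_g[1:]:
            ph = -ag1 + (4.0 / dt**2 * u + 4.0 / dt * v + a) \
                      + c * (2.0 / dt * u + v)
            u1 = ph / kh
            v1 = 2.0 / dt * (u1 - u) - v
            a1 = 4.0 / dt**2 * (u1 - u) - 4.0 / dt * v - a
            u, v, a = u1, v1, a1
            peak = max(peak, abs(a + ag1))
        Sa[j] = peak
    return Sa

# Dimensionless spectrum of Eq. (6):
# beta = acceleration_response_spectrum(a_g, dt, T) / np.max(np.abs(a_g))
```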
4. Neural analysis

4.1. Sets of patterns and design of networks

A set of 145 ARS was taken from [3] in the discretized form $\beta_k = \beta(T_k)$ for the pseudo-time parameters $k = 1, 2, \ldots, K$ with K = 198. This makes a set of P = 145 × 198 = 28,710 patterns. The same training and testing sets as those randomly selected in [3], composed of ARS_L = 113 and ARS_T = 32 spectra, were used in the present paper for the network learning and testing, respectively. This corresponds to L = ARS_L × K = 113 × 198 = 22,374 and T = ARS_T × K = 32 × 198 = 6336 patterns, respectively. The following input vector and scalar output were adopted:

$$x = \{\beta_{k-1}, E, r_e\}, \qquad y = \beta_k \equiv \beta(T_k). \tag{7}$$
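Assembling the P = 145 × 198 patterns of Eq. (7) is then mechanical. A hypothetical sketch follows (array names are assumed; the ln transform and [0, 1] scaling described below would be applied afterwards). The value seeding $\beta_{k-1}$ at k = 1 is not stated in the paper and is left as a parameter.

```python
import numpy as np

def build_patterns(betas, E, r_e, beta0=0.0):
    """Inputs x = {beta_{k-1}, E, r_e} and targets y = beta_k, cf. Eq. (7).

    betas : array (n_spectra, K) of discretized spectra beta_k
    E, r_e: arrays (n_spectra,) of tremor energies and epicentral distances
    beta0 : seed for beta_{k-1} at k = 1 (an assumption)
    """
    X, Y = [], []
    for b, e, r in zip(betas, E, r_e):
        prev = np.concatenate(([beta0], b[:-1]))   # beta_{k-1}
        for bk_prev, bk in zip(prev, b):
            X.append([bk_prev, e, r])
            Y.append(bk)
    return np.asarray(X), np.asarray(Y)
```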
The selection of the autoregressive (time-delayed) input $\beta_{k-1}$ led to a significant speed-up of the training process, unlike in [3], where the corresponding input was $x_1 = T_k$. After extensive numerical experiments the network of architecture FLNN: 3-15-1 was designed, assuming the bipolar activation function in the neurons of the hidden layer and the identity activation function for the output. The same architecture and neurons were used in the recurrent network RLNN: 3-15-1. The numbers of network parameters corresponding to the networks FLNN and RLNN equal 76 and 92, respectively. In the case of FLNN the following noise functions were adopted:

$$R_k = 7\exp(-(s-1)/50), \qquad Q_k = 0.001\exp(-(s-1)/50)\, I, \tag{8}$$

where I is the unit matrix, of dimension (4 × 4) for the neurons $i = 1, 2, \ldots, 15$ of the hidden layer and (16 × 16) for the output neuron, and s is the number of the epoch in the training process. In the case of RLNN, after extensive numerical experiments, the following noise functions were found:

$$R_k = 10\exp(-(s-1)/30), \qquad Q_k = 0.01\exp(-(s-1)/10)\, I, \tag{9}$$

and the corresponding matrices are of dimensions (5 × 5) and (17 × 17). The inputs E and r_e were transformed by the function ln, and then all inputs and outputs were scaled to the range [0, 1] (because of the linear output, the adopted scaling was made only for numerical purposes). The training was performed by our own computer simulator, written in the MATLAB language, with the KF procedures written in C++. The training started from initial values of the NN parameters randomly selected from the range [−0.5, 0.5].
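The schedules of Eqs. (8) and (9) decay exponentially with the epoch number s and can be coded directly; a minimal sketch with the group sizes quoted above (function and variable names are illustrative):

```python
import numpy as np

def noise_schedules(s, variant='RLNN'):
    """R(s) and per-group Q_i(s) = q(s) I for epoch s, cf. Eqs. (8)-(9).

    Unit-matrix sizes follow the text: FLNN hidden (4 x 4), output
    (16 x 16); RLNN hidden (5 x 5), output (17 x 17)."""
    if variant == 'FLNN':                        # Eq. (8)
        R = 7.0 * np.exp(-(s - 1) / 50.0)
        q = 1e-3 * np.exp(-(s - 1) / 50.0)
        n_h, n_o = 4, 16
    else:                                        # Eq. (9)
        R = 10.0 * np.exp(-(s - 1) / 30.0)
        q = 1e-2 * np.exp(-(s - 1) / 10.0)
        n_h, n_o = 5, 17
    return R, q * np.eye(n_h), q * np.eye(n_o)
```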
The training process was controlled by the decrease of the mean-squared error MSE_V:

$$\mathrm{MSE}_V(s) = \frac{1}{V}\sum_{p=1}^{V} (d_p - y_p)^2, \tag{10}$$

where s is the number of the epoch; V = L, T is the number of patterns in the training or testing set, respectively; and $d_p$, $y_p$ are the scaled target and computed output values for the pth pattern. The stopping criteria corresponded either to a fixed number of epochs S (early stopping) or to the number of epochs S* at which the training error satisfies $\mathrm{MSE}_L(S^*) \le \varepsilon_{adm}$.
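Both stopping rules follow directly from Eq. (10); a minimal sketch (names illustrative, with $\varepsilon_{adm} = 1 \times 10^{-3}$ as used in Section 4.2):

```python
import numpy as np

def mse(d, y):
    """Eq. (10): mean-squared error over the V patterns (scaled values)."""
    return float(np.mean((np.asarray(d) - np.asarray(y)) ** 2))

def should_stop(epoch, mse_train, S=100, eps_adm=1e-3):
    """Stop at a fixed epoch S (early stopping), or at the first
    epoch S* for which MSE_L(S*) <= eps_adm."""
    return epoch >= S or mse_train <= eps_adm
```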
Table 1
Errors and statistical parameters for the networks RLNN and FLNN learnt by the RDEKF and DEKF algorithms

Networks and       Number of       Errors MSE_V × 10³      Statistical parameters
KF algorithms      epochs S*, S    L         T             r_T        St ε_T
RLNN (RDEKF)       24              1.01      0.88          0.9861     0.2114
                   27              0.87      0.76          0.9865     0.1974
                   100             0.41      0.37          0.9874     0.1381
FLNN (DEKF)        188             1.00      1.17          0.9877     0.2445
                   300             0.94      1.10          0.9877     0.2366
                   500             0.87      1.02          0.9877     0.2282
4.2. Training and testing of networks RLNN and FLNN by RDEKF and DEKF algorithms
Fig. 3. (a) Training and testing curves for the learning algorithms RDEKF and DEKF. (b) Relation of the training errors MSE_L to the testing errors MSE_T for the different learning methods.
The preliminary training of RLNN was performed assuming the early-stopping number of epochs S = 100. The corresponding MSE errors and statistical parameters are listed in Table 1 and shown in Fig. 3a. The obtained errors are significantly smaller than those computed by the feed-forward network FLNN. In Fig. 3a it is visible that, using the RDEKF algorithm in the network RLNN, the admissible error $\varepsilon_{adm} = 1 \times 10^{-3}$ is attained after S* = 24 epochs. Applying the network FLNN (and the algorithm DEKF), S* = 188 epochs were needed to reach this error. The recurrent network speeds up the iteration significantly, since at s = 27 epochs the training error was the same as that of the feed-forward network at s = 300 epochs. Applying the stopping criterion S = 100, the errors for RDEKF are about half of those reached by DEKF at S = 500 epochs. In Fig. 3b the relation of the training errors MSE_L to the testing errors MSE_T is shown. The diagonal, marked as a broken line, determines the points of $\Delta\mathrm{MSE}(s) = \mathrm{MSE}_L(s) - \mathrm{MSE}_T(s) = 0$. In the case of the algorithm RDEKF the error difference is $\Delta\mathrm{MSE}(100) = 1 \times 10^{-5}$, and for DEKF the difference equals $\Delta\mathrm{MSE}(500) = 1.5 \times 10^{-4}$.

In Fig. 4 selected spectra are shown, computed by the recurrent network using the RDEKF algorithm and by the feed-forward network learnt by DEKF (the training spectrum No. 113 is denoted by ARS l #113 and the testing spectrum
ARS t #11). As can be seen, these spectra have shapes similar to the graphs of the target spectra (ARS computed from the measured accelerograms). It was found that nearly all the neurally computed spectra approximate the target curves from below if DEKF is applied. In the case of RDEKF this conclusion is valid for low values of the vibration period (in a very non-smooth part of the ARS graphs).

The neural predictions of ARS discussed above are completed by two new spectra, Nos. 5 and 17. They are shown in Fig. 5 for the range $T_k \in [0.02, 0.308]$ s, which corresponds to frequencies $f_k \in [3.25, 50]$ Hz. These ranges cover the spectra of the medium-height apartment buildings analyzed in [15]. On the basis of measurements carried out at thirteen five-storey buildings of various construction, the fundamental vibration periods were computed in the range [0.155, 0.294] s. In Fig. 5 a very good prediction of the ARS taken from the testing set of patterns is clearly visible. Note that these graphs were made for spectra different from those shown in Fig. 4. A similarly good fit of the computed to the target spectra was observed for all the other testing spectra.
Fig. 4. Target and neurally computed spectra for selected accelerograms from the training set (No. 113) and the testing set (No. 11).

Fig. 5. Target and neurally computed spectra for the selected testing accelerograms Nos. 5 and 17.
4.3. Application of Rprop learning method
In order to assess the numerical efficiency of Kalman filtering, the computation was also performed with a traditional learning method. Following [3], the Rprop (Resilient-Propagation) method and the MATLAB Neural Network Toolbox [16] were used for the training of the same networks FLNN and RLNN of structure 3-15-1. In the case of the recurrent network the program taken from [16] was modified by adding a procedure which enabled us to take into account the internal time-delay connections shown in Fig. 1b. This version of the learning method is called Rprop-R below. In Table 2 the errors corresponding to different numbers of stopping epochs S are shown. In the case of Rprop-R quite small errors were obtained at about s = 50 epochs, but afterwards the iteration converged very slowly.

The graphs shown in Fig. 6 correspond to those in Fig. 5, i.e. they are related to the spectra ARS t #5 and ARS t #17. Looking at Fig. 6 we can conclude that the ARS computed by Rprop can give predictions above the target spectra. In Fig. 6 the neural predictions of ARS obtained in [3] by the network 3-6-12-5-1 with 179 parameters, trained by Rprop, are also shown. The input vector of that network was adopted as $x = \{T_k, E, r_e\}$, i.e. the autoregressive input $\beta_{k-1}$ was not introduced as is done in the present paper. It is visible that the time-delayed input $\beta_{k-1}$, used in a small network (the network of architecture 3-15-1 has 76 parameters), leads to a much better approximation than the more than twice bigger network discussed in [3] without the autoregressive input $\beta_{k-1}$. It is worth mentioning that bad behaviour of networks without the autoregressive term was also observed during training by means of Kalman filtering: a significant increase of the number of epochs, or even divergence of the iteration process, took place.

It should be mentioned that Kalman filtering, used as a new learning method, is time consuming. When applying DEKF for 100 epochs, the CPU time was about 38% higher than the time needed to carry out 10,000 epochs of the Rprop learning method. The computational time needed for network training by KF algorithms might be significantly decreased if additional numerical algorithms were applied. Research investigations of this type have been undertaken by the authors.
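For reference, the core of the Rprop weight update used for this comparison is sketched below in a simplified (iRprop-like) form; only the sign of the gradient is used, with per-weight step sizes. The factors η⁺ = 1.2 and η⁻ = 0.5 are common defaults in Rprop descriptions, not values quoted from [16].

```python
import numpy as np

def rprop_update(w, grad, grad_prev, step,
                 eta_plus=1.2, eta_minus=0.5,
                 step_min=1e-6, step_max=50.0):
    """One Rprop step: grow the step size on agreeing gradient signs,
    shrink it on a sign change (common default parameters)."""
    agree = grad * grad_prev
    step = np.where(agree > 0, np.minimum(step * eta_plus, step_max),
           np.where(agree < 0, np.maximum(step * eta_minus, step_min),
                    step))
    grad = np.where(agree < 0, 0.0, grad)   # suppress update after a flip
    w = w - np.sign(grad) * step
    return w, grad, step
```

In use, the `grad` returned here is stored and passed as `grad_prev` on the next call.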
Table 2
Errors and statistical parameters for the networks RLNN and FLNN learnt by the Rprop-R and Rprop algorithms

Networks and            Number of    Errors MSE_V × 10³      Statistical parameters
Rprop algorithms        epochs S     L         T             r_T        St ε_T
RLNN (Rprop-R)          50           0.84      1.067         0.9854     0.2328
                        200          0.54      0.744         0.9864     0.1944
                        1000         0.42      0.464         0.9870     0.1535
                        10,000       0.40      0.454         0.9877     0.1519
FLNN (Rprop)            200          0.94      2.40          0.9329     0.3545
                        1000         0.57      1.20          0.9683     0.2509
                        10,000       0.42      0.56          0.9845     0.1681
Fig. 6. ARS predicted by networks trained by the Rprop learning methods using S = 10,000 epochs.
5. Final conclusions
1. The Kalman filtering method of network learning enables us to increase the accuracy of neurally predicted ARS from mining tremors.

2. The RDEKF algorithm, formulated for the learning of the recurrent layered NN, appears to be numerically much more efficient than the DEKF algorithm used in the feed-forward layered NN.

3. The introduction of the autoregressive (time-delayed) input significantly speeds up the training process.

4. An interesting feature of the neural prediction, related to the approximation of parts of the target ARS from below and from above, needs further research.

5. Despite the small number of iterations needed, the CPU time for network training by the RDEKF algorithm can be higher than in the case of the Rprop-R algorithm. That is why new investigations have been undertaken to redesign the network trained by RDEKF by means of a pruning-type approach.
The first results related to the application of a pruning-type algorithm taken from [17] are very promising, but it is too early to draw final conclusions. Finally, it is worth mentioning once more that Kalman filtering, applied to network training, is a mathematically well-grounded method. It converges faster than the standard approach based on the Rprop learning method. Moreover, when the testing results presented in Tables 1 and 2 are compared, the superiority of the RDEKF algorithm over the Rprop-R method is evident. The longer training time is not reflected in the operational time for the generation of new ARS: networks of the same size, though trained by different learning methods, consume practically the same CPU time in operation.
Acknowledgements

Financial support by the Foundation for Polish Science, Subsidy No. 13/2001 "Application of artificial neural networks to the analysis of civil engineering problems", is gratefully acknowledged.
References

[1] Haykin S, editor. Kalman filtering and neural networks. New York: John Wiley & Sons; 2001.
[2] Technical recommendations for the building construction on mining areas (in Polish). Gliwice: ITB; 2000.
[3] Kuźniar K, Maciąg E, Waszczyszyn Z. Computation of response spectra from mining tremors using neural networks. Soil Dyn Earthq Eng 2005;25:331–9.
[4] Cheung K, Popplewell N. Neural network for earthquake selection in structural time history analysis. Earthq Eng Struct Dyn 1994;23:303–96.
[5] Ghaboussi J, Lin ChJ. New method of generating spectrum compatible accelerograms using neural networks. Earthq Eng Struct Dyn 1998;27:377–96.
[6] Lee SCh, Han SW. Neural-network-based models for generating artificial earthquakes and response spectra. Comput Struct 2002;80:1627–38.
[7] Krok A, Waszczyszyn Z. Neural prediction of response spectra from mining tremors using recurrent layered networks and Kalman filtering. In: Bathe KJ, editor. Computational fluid and solid mechanics, Proc 3rd MIT Conf, June 14–17. Amsterdam: Elsevier; 2005. p. 302–5.
[8] Kalman RE. A new approach to linear filtering and prediction problems. Trans ASME Ser D J Basic Eng 1960;82:34–45.
[9] Haykin S. Neural networks – a comprehensive foundation. 2nd ed. Englewood Cliffs, NJ: MacMillan College Publ.; 1999.
[10] Korbicz J, Obuchowicz A, Uciński D. Artificial neural networks – foundations and applications (in Polish). Warsaw: Akad Ofic Wyd PLJ; 1994.
[11] Bolzon G, Fedele R, Maier G. Parameter identification of a cohesive crack model by Kalman filter. Comput Meth Appl Mech Eng 2002;191:2847–71.
[12] Perez-Ortiz A, Gers FA, Schmidhuber J. Kalman filters improve LSTM network performance in problems unsolvable by traditional recurrent nets. Neural Netw 2003;16(2):241–50.
[13] Sato T, Sato M. Structural identification using neural network and Kalman filter. JSCE 1997;14:23–32.
[14] Pham DT, Liu X. Neural networks for identification, prediction and control. London: Springer-Verlag; 1995.
[15] Ciesielski R, Kuźniar K, Maciąg E, Tatara T. Empirical formulae for fundamental natural periods of buildings with load bearing walls (in Polish). Arch Civil Eng 1992;28:291–9.
[16] Neural Network Toolbox for use with MATLAB, user's guide, version 4. Natick, MA: The MathWorks Inc.; 2001.
[17] Prechelt L. Connection pruning with static and adaptive pruning schedules. Neurocomputing 1997;16:49–61.