Response of the parameters of a neural network to pseudoperiodic time series

Physica D 268 (2014) 79–90

Yi Zhao a,∗, Tongfeng Weng a, Michael Small b

a Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, People's Republic of China
b School of Mathematics and Statistics, The University of Western Australia, Crawley, WA 6009, Australia

Highlights

• We provide a method for identifying the dynamics of pseudoperiodic time series.
• The method can directly distinguish periodic dynamics from their chaotic counterparts.
• The method shows a great advantage in robustness against strong observational noise.
• Both experimental and theoretical evidence gives positive support to our method.

Article history:
Received 9 January 2013
Received in revised form 6 August 2013
Accepted 8 November 2013
Available online 15 November 2013
Communicated by Y. Nishiura

Keywords:
Pseudoperiodic time series
Chaos
Neural network

Abstract

We propose a representation plane constructed from the parameters of a multilayer neural network, with the aim of characterizing the dynamical character of a learned time series. We find that fluctuations of this plane reveal distinct features of the time series. Specifically, a periodic representation plane corresponds to a periodic time series, even when the series is contaminated with strong observational noise or dynamical noise. We present a theoretical explanation of how the neural network training algorithm adjusts the parameters of this representation plane and thereby encodes the specific characteristics of the underlying system. This ability, which is intrinsic to the architecture of the neural network, can be employed to distinguish chaotic time series from their periodic counterparts, and it provides a new path toward identifying the dynamics of pseudoperiodic time series. Furthermore, we extract statistics from the representation plane to quantify its character. We validate this idea with numerical data generated by known periodic and chaotic dynamics, and with experimentally recorded human electrocardiogram data.

1. Introduction

Chaos in nonlinear dynamical systems has become a widely known phenomenon, and its presence has been established in various fields of science over recent decades [1–3]. Techniques for chaos identification have received substantial attention in recent years, especially with the objective of detecting chaotic dynamics in pseudoperiodic time series. Pseudoperiodic time series are widespread in nature; examples include annual sunspot numbers, laser outputs, and human biomedical signals [3]. To facilitate the analysis of such pseudoperiodic time series, it is necessary to discover their underlying dynamics [4]. The available methods include the estimation of various quantitative invariants of the attractor [5,6] and time–frequency representation analysis [7]. However, computation of these quantities for noisy experimental data is not always reliable. For example, filtered noise can mimic low-dimensional chaotic attractors [8]. Moreover, some quantitative




criteria, such as the Lyapunov exponent and the correlation dimension, rely heavily on proper phase space reconstruction and aim to quantify the geometric information of an attractor once it is confirmed to be present [9]. In addition, there is no unique way to select an accurate embedding dimension and time delay for a given time series [10]. For these reasons, Zhang et al. [11] constructed a complex network from a pseudoperiodic time series by separating cycles and treating them as nodes; from the perspective of the complex network, they were then able to identify the dynamics of the corresponding time series. However, the relationship between the dynamics of the time series and the topology of the constructed network is still open to some interpretation. Moreover, their approach to dividing cycles is based on local minima, which is quite sensitive to noise, or at least requires an appropriate criterion for separating cycles. Later, Pukenas et al. proposed a similar algorithm for detecting pseudoperiodic dynamics [12], in which the statistical criterion was the Lyapunov exponent. Cvitanović et al. summarized developments in chaos identification and considerations about cycle decomposition [13]. Surrogate data methods provide indirect evidence of nonlinear characteristics, with statistical significance, for pseudoperiodic


time series. Theiler first presented the cycle shuffling algorithm to generate surrogate data for pseudoperiodic time series [14]. However, it may give spurious results when cycles are not properly separated. Dolan et al. therefore introduced a simple recurrence method to improve cycle shuffling so as to avoid discontinuity and instability of the surrogate data [15]. Small et al. proposed the pseudoperiodic surrogate algorithm, which first obtains surrogate data by means of a phase space reconstruction of the original time series [16]. Luo et al. described a surrogate algorithm for continuous dynamical systems, which makes use of linear stochastic dependence between the cycles of pseudoperiodic orbits [17]. However, Shiro et al. pointed out that Luo's method may give incorrect results in some cases [18]. More recently, Nakamura [19] and Coelho et al. [20] proposed different algorithms for testing intercycle determinism in pseudoperiodic time series. Like all statistical testing, the surrogate data method merely excludes a given hypothesis but cannot confirm the specific dynamics of the given data. Besides these methods of nonlinear time series analysis, spectral techniques (Fourier and wavelet analysis) are also popular tools in this area [21]. However, these tools are usually used for signal processing [22] and are seldom applied directly to the dynamical identification of time series, owing to their inherent linearity.

Artificial neural networks, a biologically inspired form of computation, have been widely applied to time series classification, modeling, and prediction [23–25]. They have the useful ability to capture the dynamics hidden in data with no prior knowledge of the data structure. Nevertheless, there have been relatively few attempts to apply this modeling idea to characterizing pseudoperiodic data. The work described here is motivated by the use of the forecasting performance of a nonparametric model to detect chaotic dynamics, as proposed by Sugihara and May [26], who showed that short-term prediction is feasible and can be used to qualitatively detect the dynamical character of a system. In this paper, we show that pseudoperiodic time series can also be investigated through the parameters of a class of highly parametric models: neural networks. During the training process, the neural network updates its parameters to capture the underlying dynamics of the time series. We show how the parameters of the neural network encode the dynamics into their adjustments. A representation plane composed of certain parameters of the fitted neural network can then be used to reveal the corresponding characteristics. We apply the multilayer neural network to pseudoperiodic time series forecasting and seek to detect the underlying dynamics. By pseudoperiodic time series we mean series that are either periodic or chaotic, in the presence of noise. Of course, there are other possibilities; here we focus only on the identification of these two types of dynamics through the representation plane. Specifically, noisy periodic signals correspond to a periodic representation plane, while chaotic time series result in representation planes exhibiting an irregular character. This provides a new path toward distinguishing chaotic time series from their periodic counterparts.
2. Identification principle based on neural network architecture

The basic architecture of a neural network is a collection of input nodes, hidden neurons, and an output operator, connected via numerous pathways from input to output. The multilayer neural network we adopt comprises an input layer, a single hidden (neuron) layer, and a linear output layer. It is known that such a multilayer neural network can accurately approximate any nonlinear function [27]. Suppose there are K neurons in the hidden layer, and let g(·) denote the sigmoid activation function of each neuron. The input vector,

p = {p_1, p_2, ..., p_R}, of which each p_i is a row vector, is fed to the neural network, and the network then gives its output, denoted by y. W = {ω_ij | i = 1, 2, ..., K, j = 1, 2, ..., R} denotes the weights associated with connections between the input and hidden layers; V = {v_i | i = 1, 2, ..., K} denotes the weights associated with connections between the hidden and output layers; b = {b_i | i = 1, 2, ..., K} and b_0 are the biases assigned to the neurons and the output layer, respectively. The transfer function of the multilayer neural network is then given by

y = b_0 + \sum_{i=1}^{K} v_i \, g\!\left(\sum_{j=1}^{R} \omega_{ij} p_j + b_i\right).  (1)
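For concreteness, Eq. (1) can be evaluated by a short vectorized routine. The following is a minimal sketch (in Python with numpy; the array shapes and function names are ours, not the paper's):

```python
import numpy as np

def sigmoid(x):
    """Sigmoid activation g(.) of each hidden neuron."""
    return 1.0 / (1.0 + np.exp(-x))

def mlp_output(p, W, V, b, b0):
    """Evaluate Eq. (1): y = b0 + sum_i v_i * g(sum_j w_ij * p_j + b_i).

    p  : length-R input vector
    W  : (K, R) input-to-hidden weights (the representation plane)
    V  : length-K hidden-to-output weights
    b  : length-K hidden biases; b0 is the scalar output bias
    """
    hidden = sigmoid(W @ p + b)  # activations of the K hidden neurons
    return b0 + V @ hidden       # linear output layer
```

In the paper each p_i is itself a row vector across the training samples; the sketch treats a single sample for clarity.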

In the weight matrix W, the matrix element ω_ij reflects the connection strength between the ith neuron and the jth input. When the network adjusts its parameters to model a characteristic (such as periodicity) of the training sample, this information is expected to be encoded in these parameters. However, exactly how the parameters encode such information has never previously been addressed. In the current work, we probe this process and identify an intrinsic law related to the neural network architecture.

2.1. Training process

Here we take a common training algorithm, the Levenberg–Marquardt (LM) algorithm, as an example. The LM algorithm is widely used as an optimization algorithm for nonlinear least squares minimization problems [28], and it was introduced to feed-forward networks to improve training speed. Compared to the Gauss–Newton method, it has an extra term to avoid ill-conditioning, which ensures good performance on strongly nonlinear problems. Hence, it is generally the algorithm of choice for training feed-forward neural networks. Consider a function f(z) to be minimized, of the following form:

f(z) = \frac{1}{2}\sum_{j=1}^{m} r_j^2(z)  (2)

where z = (z_1, z_2, ..., z_n) is a vector composed of the model parameters and r_j is the model residual. Here, r_j represents the error term in the neural network training process, reflecting the training performance. The Gauss–Newton method adjusts the parameters as follows:

\Delta z = [J^T(z) J(z)]^{-1} J^T(z) r(z)  (3)

where the Jacobian matrix is defined as J(z) = \{\partial r_j / \partial z_i \mid 1 \le j \le m,\ 1 \le i \le n\} and r(z) = (r_1(z), r_2(z), ..., r_m(z)). The LM algorithm further modifies the Gauss–Newton method as follows:

\Delta z = [J^T(z) J(z) + \mu I]^{-1} J^T(z) r(z).  (4)
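A single LM update, Eq. (4), can be sketched as follows (an illustrative implementation, not the authors' code; the Jacobian and residuals are assumed to be supplied by the caller, and µ is held fixed within the step):

```python
import numpy as np

def lm_step(J, r, mu):
    """One Levenberg-Marquardt update, Eq. (4):
    delta_z = [J^T J + mu I]^{-1} J^T r.

    J  : (m, n) Jacobian of the m residuals w.r.t. the n parameters
    r  : length-m residual vector
    mu : damping parameter; mu = 0 recovers Gauss-Newton, Eq. (3)
    """
    n = J.shape[1]
    A = J.T @ J + mu * np.eye(n)        # damped normal matrix
    return np.linalg.solve(A, J.T @ r)  # parameter adjustment delta_z
```

Observe that if two columns of J are identical (as happens when two input vectors coincide), the corresponding components of the returned adjustment are identical as well; this symmetry is the mechanism analyzed in Section 2.2.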

The new parameter µ guarantees that the inverse of {J^T(z)J(z) + µI} exists. In addition, because its value varies at each training iteration, it can enhance the convergence rate of the training process.

2.2. Representation plane and its response

We now deduce the response of the defined representation plane of the neural network to the data. The treatment is equivalent for any other training algorithm, as this response is intrinsic to the architecture of the neural network. Suppose that X = {x_i, i = 1, 2, ..., N} is the training data set under consideration. We use R successive data points to predict the next, (R+1)th, point. The training data set is therefore split into a series of successive subsets,


Fig. 1. Free-run prediction of a periodic sine signal by the neural network (panel (a)): the blue line represents the original signal and the dotted red line the prediction. Panel (b) shows the corresponding representation plane.

which constructs a matrix {x_{i,j} | 1 ≤ i ≤ R, 1 ≤ j ≤ N − R}; the target output vector is O_p = {x_j | j = R + 1, ..., N}. The input vector {p_i | i = 1, 2, ..., R} is fed to the neural network, where p_i is the row vector (x_{i,1}, x_{i,2}, ..., x_{i,N−R}). For any neuron of the neural network under consideration, its associated parameter vector is Λ = [ω_1, ω_2, ..., ω_R, b_1, v_1, b_0]^T, where we neglect the subscript denoting the neuron index. After the (k−1)th training step, the training performance of the neural network can be expressed by

E_{k-1} = \frac{1}{2}\sum_{i=1}^{N-R} e_i^2 = \frac{1}{2}\left(O_p - \hat{O}_p^{k-1}\right)^2  (5)

where \hat{O}_p^{k-1} is the prediction of the target data O_p. According to Eq. (4), the adaptive weight vector ∆ω = [∆ω_1, ∆ω_2, ..., ∆ω_R] is obtained from

\Delta\omega = [J^T J + \mu I]^{-1} J^T E_{k-1}  (6)

where J = [C_{k-1} \cdot p_1, ..., C_{k-1} \cdot p_R, f_{k-1}, g_{k-1}, h_{k-1}] is the Jacobian matrix, with components C_{k-1} = \left(\partial E_{k-1} / \partial\left(\sum_{i=1}^{R} \omega_i x_{i,j}\right) \mid j = 1, ..., N-R\right), f_{k-1} = \partial E_{k-1}/\partial b_1, g_{k-1} = \partial E_{k-1}/\partial v_1, h_{k-1} = \partial E_{k-1}/\partial b_0, and µ is an adaptive parameter that can be considered constant within each iteration. By examining the components of ∆ω, one intriguing result is that if the input vectors p_m and p_n are equal, then ∆ω_m and ∆ω_n are also equal:

\Delta\omega_m = \Delta\omega_n = \frac{C_{k-1}\cdot p_m}{\sum_{i=1}^{R} C_{k-1}^2 p_i^2 + f_{k-1}^2 + g_{k-1}^2 + h_{k-1}^2 + \mu}\, E_{k-1}.  (7)

Since p_m and p_n are equal, the changes to the weights associated with p_m and p_n are the same over the whole training process. Hence, the final weight distribution for periodic data has a periodic character. As a particular case, consider a simplified situation with just five input data {a_1, a_2, a_3, a_4, a_5} fed to the neural network. According to the LM algorithm, the associated weight adjustments ∆ω_1 and ∆ω_5 after the (k−1)th training step are explicitly

\Delta\omega_1 = E_{k-1}\,\frac{c a_1}{\mu\left(c^2 a_1^2 + c^2 a_2^2 + c^2 a_3^2 + c^2 a_4^2 + c^2 a_5^2 + \mu + f_{k-1}^2 + g_{k-1}^2 + h_{k-1}^2\right)} - \frac{c^3 a_2^2 + c^3 a_3^2 + c^3 a_4^2 + c^3 a_5^2 + c f_{k-1}^2 + c h_{k-1}^2 + c g_{k-1}^2}{\mu a_1} + E_{k-1}\,\frac{c^3 a_1^3}{\mu}  (8)

\Delta\omega_5 = E_{k-1}\,\frac{c a_5}{\mu\left(c^2 a_1^2 + c^2 a_2^2 + c^2 a_3^2 + c^2 a_4^2 + c^2 a_5^2 + \mu + f_{k-1}^2 + g_{k-1}^2 + h_{k-1}^2\right)} - \frac{c^3 a_1^2 + c^3 a_2^2 + c^3 a_3^2 + c^3 a_4^2 + c f_{k-1}^2 + c h_{k-1}^2 + c g_{k-1}^2}{\mu a_5} + E_{k-1}\,\frac{c^3 a_5^3}{\mu}  (9)

where

c = \frac{\partial E_{k-1}}{\partial(\omega_1 a_1 + \omega_2 a_2 + \omega_3 a_3 + \omega_4 a_4 + \omega_5 a_5)}

and the other parameters are defined as before. We observe in Eqs. (8) and (9) that the adjustment of the weight parameters bears a linear relationship to the input data. When a_1 = a_5, ∆ω_1 is correspondingly equal to ∆ω_5; a pair of identical inputs brings about the same weight update. If this pair of inputs remains the same, their weight adjustments remain consistent. Eq. (7) can be regarded as the generalization of this case to any R input vectors. Normally, the training strategy of a neural network is to update the parameters step by step so as to minimize the mean square error. So, if we can guarantee that the final error sequence {O_p − \hat{O}_p^{k-1}} is small enough, we obtain the optimal parameters in the representation plane. In fact, the input vectors can be considered as different trajectories calculated from different initial conditions. For periodic systems, approximately identical initial values then lead to identical shapes. Based on the above analysis, periodic data theoretically lead to a periodic representation plane. For chaotic systems, by contrast, sensitivity to initial conditions means that a small deviation in initial values can trigger dramatically different shapes. Thereby, for an input vector p_i (i = 1, 2, ..., R), its behavior may differ significantly from the remaining input vectors p_j (j ≠ i). The striking dissimilarity among these input vectors disrupts the consistent weight adjustments, resulting in an irregular character on the representation plane. As a result, the structure evident on the representation plane reflects the character of the learned time series, generated from either a periodic or a chaotic system. Therefore, we define the weight matrix as a representation plane that embodies the underlying dynamics. For a fair comparison, we normalize the matrix of weight values.

3. Characterization of pseudoperiodic time series by the representation plane

We first use a simple sine signal to illustrate this approach.
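The following sketch summarizes the whole procedure: it builds the input matrix and target vector from a scalar series, fits a single-hidden-layer network, and extracts the normalized weight matrix as the representation plane. It is schematic only; we use scikit-learn's MLPRegressor as a stand-in for the paper's LM-trained network, and all names and settings are our own:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def training_matrix(x, R):
    """Windows of R successive values predict the next value:
    inputs[j] = (x_j, ..., x_{j+R-1}), target[j] = x_{j+R}."""
    X = np.array([x[j:j + R] for j in range(len(x) - R)])
    return X, x[R:]

def representation_plane(x, R=16, K=8):
    """Fit a one-hidden-layer network to the series x and return the
    normalized (K x R) input-to-hidden weight matrix."""
    X, y = training_matrix(np.asarray(x, dtype=float), R)
    net = MLPRegressor(hidden_layer_sizes=(K,), activation='logistic',
                       solver='lbfgs', max_iter=5000).fit(X, y)
    W = net.coefs_[0].T               # (K, R) weight matrix of Section 2
    return W / np.max(np.abs(W))      # normalized for fair comparison

# Example: a sine with period 16, as in the experiment below.
t = np.arange(2400)
W = representation_plane(np.sin(2 * np.pi * t / 16.0))
# For periodic data the structure of W should repeat with the period.
```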


Fig. 2. Free-run prediction on periodic (a) and chaotic (c) Rössler data. The period of the periodic data is 24. The blue line represents the original signal and the dotted red line its prediction. Panels (b) and (d) give the representation planes of the neural networks for the periodic and chaotic data, respectively.

The sine signal is generated on the interval [−150π, 150π] with step size π/8, so its period is 16 samples. We thus obtain 2400 periodic points, of which the first 1950 are selected as training data and the rest are used for testing. Fig. 1 shows the free-run prediction of the well-trained neural network and the corresponding representation plane. By free-run prediction we mean that the future is predicted iteratively from current and previously predicted values, in contrast to one-step prediction. Clearly, such a plane has a periodic character, showing the same property as the original data.
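Free-run prediction itself is a simple loop in which the model is iterated on its own outputs; a minimal sketch (with `predict_one` standing for any trained one-step predictor):

```python
import numpy as np

def free_run(predict_one, seed_window, n_steps):
    """Iterate a one-step predictor on its own outputs.

    predict_one : callable mapping an R-long window to the next value
    seed_window : the last R observed values (end of the training data)
    n_steps     : number of future values to generate
    """
    window = list(seed_window)
    predictions = []
    for _ in range(n_steps):
        nxt = float(predict_one(np.array(window)))  # one step ahead
        predictions.append(nxt)
        window = window[1:] + [nxt]  # slide the window over the prediction
    return np.array(predictions)
```

One-step prediction, by contrast, would refill the window with the observed test values after every step.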

Next, we validate this method with pseudoperiodic time series from the Rössler system and the logistic map in both periodic and chaotic regimes. The Rössler system is given by ẋ = −(y + z), ẏ = x + ay, ż = b + z(x − c), where a, b, and c are coefficients. We generate periodic and chaotic data by setting a = b = 0.1, c = 6 and a = b = 0.1, c = 9, respectively. We select 2000 successive x-component data points, of which the first 1800 are used as training data and the rest as test data. Fig. 2 presents the free-run predictions on the test data by the neural networks, together with their representation planes. From Fig. 2, it is obvious that the representation plane of the neural network for the periodic data exhibits a periodic shape with the same period as the original data, while the representation plane of the neural network for modeling the chaotic counterpart displays distinctly irregular fluctuation. The representation plane, which reflects the dynamical character hidden in the given data, therefore shows a striking distinction between periodic and chaotic data. Chaotic attractors are known to be the closure of an infinite set of unstable periodic orbits (UPOs) [13,29,30]. So each subset {x_i, x_{i+1}, ..., x_{i+R}}, i = 1, ..., N−R, intersects a diverse set of UPOs and thus contains segments from that set. It is impossible to make consistent adjustments to the weights associated with two or more input indices given these successive input vectors, and thus the accumulated weight adjustments (i.e., the final weight values) appear irregular. On the other hand, considering periodic time series, the accumulated weight adjustments are robust against stochastic disturbance (noise). As a result, the accumulated weight adjustments not only eliminate possible coincidences or disturbances in the chaotic data but also enhance the regular character of the periodic data. We now examine the logistic map, given by x_{n+1} = C x_n (1 − x_n), where C is a constant.
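For reference, the test data can be produced as in the sketch below (scipy is used to integrate the Rössler flow; the time step, initial condition, and transient length are our illustrative choices, not taken from the paper):

```python
import numpy as np
from scipy.integrate import solve_ivp

def rossler_x(a=0.1, b=0.1, c=6.0, n=2000, dt=0.1, transient=500.0):
    """x-component of the Rossler system; c = 6 periodic, c = 9 chaotic."""
    rhs = lambda t, s: [-(s[1] + s[2]),
                        s[0] + a * s[1],
                        b + s[2] * (s[0] - c)]
    t_eval = transient + dt * np.arange(n)
    sol = solve_ivp(rhs, (0.0, t_eval[-1] + dt), [1.0, 1.0, 1.0],
                    t_eval=t_eval, rtol=1e-9)
    return sol.y[0]

def logistic_series(C=3.74, n=2000, x0=0.3, transient=1000):
    """Logistic map x_{n+1} = C x_n (1 - x_n);
    C = 3.74 lies in a period-5 window, C = 3.60 is chaotic."""
    x = x0
    for _ in range(transient):  # discard the transient
        x = C * x * (1.0 - x)
    out = np.empty(n)
    for i in range(n):
        x = C * x * (1.0 - x)
        out[i] = x
    return out
```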


Fig. 3. Free-run prediction on periodic (a) and chaotic (c) logistic data. The blue line represents the original signal and the dotted red line the prediction. Panels (b) and (d) are the corresponding representation planes of the two neural networks for the periodic and chaotic data, respectively.

Here we set C = 3.74 and C = 3.60, respectively, to generate periodic and chaotic data. In the periodic case, with period 5, the data are contaminated by small observational noise; the signal-to-noise ratio (SNR) of the noisy data is 22 dB. Fig. 3 presents the free-run predictions on the test periodic and chaotic logistic data by two neural networks, together with their representation planes. As shown in Fig. 3(b), the representation plane of the neural network for the periodic time series exhibits a periodic pattern, suggesting that the representation plane can reflect noisy periodic dynamics. Since chaotic training data usually include various UPOs in phase space, the representation plane for the chaotic logistic data takes an irregular distribution, attributable to the subtle variation between any two input subsets. Finally, we test the utility of our approach when applied to nonstationary data. Here, we generate a periodic nonstationary signal consisting of 2000 data points from a sine function plus a quadratic trend. We then repeat the experiment and obtain the free-run prediction as well as the representation plane, shown in Fig. 4. The apparent periodic fluctuation in the representation plane implies that the considered data have a periodic character, as expected. This result further suggests that our approach is also applicable to detecting periodic dynamics in nonstationary signals.

4. Robustness analysis

4.1. Robustness against observational noise

The preceding simulated chaotic data can also be distinguished from their periodic counterparts by means of quantitative invariants based on phase space reconstruction. But these criteria are sensitive to noise, and thus they may misjudge the periodic dynamics. To address the robustness of the proposed method, we add white and colored random noise to these data and repeat the previous procedure. Fig. 5 illustrates the result for the periodic Rössler data with observational noise (SNR = 27 dB). Fig. 6 gives the free-run predictions of two neural networks and their representation planes for the periodic logistic data contaminated by white and colored noise with SNR = 22 dB. It indicates that, given the LM training algorithm, this new method can tolerate low-level noise. The representation plane, therefore, can identify and further highlight periodic characteristics that may not be perceived through the time series itself or through common quantitative criteria. The correlation dimensions of the previous periodic and chaotic Rössler data, calculated by the Gaussian kernel algorithm [31], are 1.209 and 1.921, respectively, and the correlation dimension of the same chaotic data contaminated with small noise is close to the value for the periodic counterpart. So a direct comparison of correlation dimension values is rarely sufficient to differentiate a periodic time series from its chaotic counterpart.

4.2. Comparison with other techniques

We now consider the application of the pseudoperiodic surrogate (PPS) method to the previous time series as a comparison.


Fig. 4. Free-run prediction on nonstationary data (a) and its related representation plane (b). The blue line represents the original signal and the dotted red line is the prediction.

Fig. 5. Representation planes of periodic Rössler data contaminated by white (a) and colored (b) noise, and chaotic Rössler data contaminated by white (c) and colored (d) noise, respectively.


Fig. 6. Free-run prediction on periodic logistic data contaminated by white (a) and colored (c) noise. The blue line represents the original signal and the dotted red line the predicted signal. Panels (b) and (d) give the corresponding representation planes.

The PPS method tests whether an observed time series is consistent with a periodic process contaminated by uncorrelated noise [16]. The algorithm generates surrogate time series that are both statistically similar to the original data and consistent with a noisy periodic or pseudoperiodic source. The method first reconstructs the time series in phase space and then randomly picks an initial point in the reconstructed phase space. Each subsequent point is then iteratively selected from the neighbors of the current point within a given radius. Hence, the deviation between the original data and its PPS surrogate is determined by this radius. As the radius increases, this deviation obliterates the fine dynamics. For example, the chaotic Rössler data can be destroyed, in terms of statistical measurements, by a small deviation, while periodic orbits are obliterated only by a larger deviation (i.e., a larger radius). The different dynamics of chaotic and periodic data thus lead to distinct trends in their surrogates as the noise radius increases; the trends of surrogates for chaotic and periodic time series are therefore distinguishable [32].
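A rough sketch of the PPS generation loop, following the verbal description above (the embedding parameters and the fixed neighbor radius are illustrative simplifications of Ref. [16]):

```python
import numpy as np

def pps_surrogate(x, dim=3, lag=1, rho=0.05, rng=None):
    """Pseudoperiodic surrogate: a random walk on near neighbors in the
    reconstructed phase space. rho sets the neighbor radius as a fraction
    of the data range; a larger radius obliterates finer dynamics."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    m = len(x) - (dim - 1) * lag                 # number of embedded points
    emb = np.column_stack([x[i * lag:i * lag + m] for i in range(dim)])
    radius = rho * (x.max() - x.min())
    idx = int(rng.integers(m - 1))               # random initial point
    surr = []
    for _ in range(len(x)):
        surr.append(x[idx + (dim - 1) * lag])    # emit current scalar value
        d = np.linalg.norm(emb - emb[idx], axis=1)
        cand = np.flatnonzero(d < radius)        # neighbors of current point
        cand = cand[cand < m - 1]                # keep those with a successor
        if cand.size == 0:                       # dead end: restart randomly
            cand = np.arange(m - 1)
        idx = int(rng.choice(cand)) + 1          # jump to a neighbor's successor
    return np.array(surr)
```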

In practical realizations, the surrogate generation algorithm defines a probability with which the next point chosen for the surrogate is not the next point of the original data; the value of the radius is deduced from this probability. A lower probability means the next surrogate point is more likely to be the original next data point (i.e., the radius is smaller), and vice versa. We apply this method to the previous periodic Rössler data with added white noise (SNR 26 dB) and colored noise (SNR 26 dB). The statistical criterion is complexity [33]. The results are presented in Fig. 7, where the hypothesis of periodic orbits is significantly rejected. That is, according to the PPS method, the periodic Rössler data with small observational noise are rejected as being periodic. In addition, another powerful tool for the analysis and visualization of dynamical characteristics hidden in time series is the recurrence plot (RP) method [34]. By transforming a time series into the RP regime, we can measure various statistical values from the RP based on recurrence quantification analysis (RQA) [35], including the percent determinism (DET), the average length of diagonal lines (L), and the maximum diagonal line length (Lmax). A number of dynamical invariants can also be derived from the RP [36], such as the correlation dimension, Shannon entropy, and mutual information. Here we employ RQA to characterize the previous periodic and chaotic logistic data. DET, L and Lmax are 0.9417, 9.0236 and 50, respectively, for


Fig. 7. The straight line is the complexity of the Rössler data contaminated by white (a) and colored (b) noise; the dots are the mean complexity of 100 surrogates at probabilities from 0.1 to 0.9; the two dashed lines denote the mean plus one standard deviation (upper line) and the mean minus one standard deviation (lower line); the two solid lines are the maximum and minimum complexity among the 100 surrogates.

the periodic data contaminated with noise (SNR 24 dB). For the chaotic data, they are 0.9249, 9.3203 and 68, respectively. The values of the two groups show no significant difference, so based on our current analysis this approach may give spurious results. Meanwhile, in the transformation from time series to RP, the selection of an optimal threshold value remains a difficult problem. We observe that in this experiment the recurrence plots of the periodic and chaotic data are obviously different, so other RQA techniques may be able to quantify them better. Essentially, the periodic response of the representation plane to a periodic time series is intrinsic to the architecture of the multilayer neural network. Hornik showed that universal approximation of a nonlinear function is due to the architecture of the neural network and not to a special choice of activation function [37]. Different training algorithms merely determine the prediction performance of the neural network, as well as its robustness. Remarkably, the momentum and adaptive gradient descent training algorithm outperforms the LM algorithm in terms of robustness against noise. We generate periodic Rössler data with a period of 30, which are then contaminated with white noise (SNR = 17 dB) and colored noise (SNR = 18 dB). We observe that, with the addition of noise, the related representation plane still exhibits periodic fluctuation reflecting the periodic character of the deterministic source, as presented in Fig. 8. We further find that, with this new training algorithm, the difference between two weights associated with the same inputs disturbed by noise is theoretically independent of the noise source. Consider a periodic time series X with N measurements and period length L, divided by a time window of size R (R > L) and step size 1 to form a matrix. Two input vectors fed to the neural network, p_1 = {x_{1,1}, x_{1,2}, ..., x_{1,N−R}} and p_{L+1} = {x_{L+1,1}, x_{L+1,2}, ..., x_{L+1,N−R}}, should be the same but have a small deviation due to noise. Given the momentum and adaptive gradient descent training algorithm [38], the final accumulated deviation of the weight adjustments associated with p_1 and p_{L+1} is given by:

\Delta^2\omega = \Delta\omega_1 - \Delta\omega_{L+1} = \Delta\left(\frac{\partial\left(\frac{1}{2}\sum_{i=1}^{N-R} e_i^2\right)}{\partial\omega_1} - \frac{\partial\left(\frac{1}{2}\sum_{i=1}^{N-R} e_i^2\right)}{\partial\omega_{L+1}}\right).  (10)

We substitute \partial e_i/\partial\omega_1 and \partial e_i/\partial\omega_{L+1} with their explicit expansions, and Eq. (10) is then expressed as

\Delta^2\omega = \sum_{i=1}^{N-R} e_i\,\Delta\!\left(\frac{\partial e_i}{\partial\left(\sum_{j=1}^{R}\omega_j x_{j,i}\right)}\right)(x_{1,i} - x_{L+1,i}) = \sum_{i=1}^{N-R} e_i\,\frac{\partial e_i}{\partial\left(\sum_{j=1}^{R}\omega_j x_{j,i}\right)}\,\varepsilon_i  (11)

where e_i is the prediction error, ω_j represents the weights between the jth input vector and one neuron, and ε_i is the deviation between the two input row vectors. When the network predicts the data well, the prediction error can be made small. The derivative with respect to the weighted sum of the inputs is approximately constant, as g(·) can be considered linear within the region of interest. The deviation between the two row vectors is assumed to be random, and hence the sum is close to zero. So the deviation introduced by noise has little influence on ∆²ω. Certainly, if the noise level becomes larger, accurate prediction becomes a serious challenge.

4.3. Robustness against driving noise

Besides observational noise, real data are often contaminated by driving (dynamical) noise [39]. It remains a great challenge to correctly detect the underlying dynamics of such data. To test whether our method can address this problem, we add a Gaussian noise term to the x-component equation of the Rössler system and generate periodic data coupled with driving noise. The noise level is 40 dB.


Fig. 8. Free-run prediction on periodic data with white noise (a) and colored noise (c). The blue line represents the original signal and the dotted red line the prediction. Panels (b) and (d) present the corresponding representation planes.

The predicted result and the associated representation plane are shown in Fig. 9; the representation plane presents an apparently periodic character, correctly reflecting the underlying dynamics of the considered signal. This finding suggests that the described method is able to identify periodic dynamics in the presence of weak dynamical noise.

4.4. Quantitative analysis of the representation plane

We follow the spirit of the RQA approach to extract statistics from the representation plane. Consider that the representation plane is denoted by a weight matrix W = {ω_ij | i = 1, 2, ..., K, j = 1, 2, ..., R}, where K represents the total number of neurons in the hidden layer and R is the size of the input vector. We define three statistical measurements to quantify the representation plane: the average weight value

AW = \frac{1}{RK}\sum_{i,j=1}^{R,K} \omega_{ij},

the average row vector similarity

AR = \frac{1}{R}\sum_{i=1}^{R} \max_{j\neq i}\left\{\operatorname{cov}[\omega_{i,1:K}, \omega_{j,1:K}]\right\}_{j=1}^{K},

and the average column vector similarity

AC = \frac{1}{K}\sum_{i=1}^{K} \max_{j\neq i}\left\{\operatorname{cov}[\omega_{1:R,i}, \omega_{1:R,j}]\right\}_{j=1}^{R},

where cov represents the cross-correlation coefficient between two different vectors. Note that, given the above definition, the AR value equals one for periodic data. We employ these statistics to characterize the previous representation planes of the logistic map in both periodic and chaotic regimes. The AW, AR and AC values are 0.22, 1 and 0.15, respectively, for the periodic data, while for the chaotic data they are 0.07, 0.88 and 0.08. These measurements provide a quantitative analysis of the representation plane and unfold the intrinsic difference between the representation planes for periodic and chaotic dynamics. In addition, we examine the robustness of the proposed statistics against noise. Fig. 10 shows the AR and AC values versus noise level for the previous logistic data. Again, AC and AR give a good quantification of the representation plane and distinguish the periodic representation plane from its chaotic counterpart under noise disturbance. These results further indicate that the representation plane captures the dynamics of the given time series. Note that we omit the corresponding results for the AW measure, since it is not robust to noise.
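Computed directly from the weight matrix, the three statistics take only a few lines (a sketch; np.corrcoef serves as the cross-correlation coefficient cov above, and the row/column labeling follows the definition of W):

```python
import numpy as np

def plane_statistics(W):
    """AW, AR, AC for a representation plane W of shape (K, R)."""
    AW = W.mean()  # average weight value

    def avg_max_similarity(vectors):
        """For each vector, the maximum correlation coefficient with any
        other vector in the set, averaged over the set."""
        C = np.corrcoef(vectors)
        np.fill_diagonal(C, -np.inf)  # exclude self-similarity
        return float(np.mean(C.max(axis=1)))

    AR = avg_max_similarity(W.T)  # R vectors w_{i,1:K} across neurons
    AC = avg_max_similarity(W)    # K vectors w_{1:R,i} across inputs
    return AW, AR, AC

# For a periodic plane the weights repeat, so AR is expected to equal 1.
```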

5. Application to human ECG data

As an example of application to practical time series, we employ the representation plane to investigate the dynamical character of a human ECG recording. The ECG recording contains abundant information reflecting the cardiac state and is considered


Fig. 9. Free-run prediction on periodic data with driving noise (a). The blue line represents the original signal and the dotted red line is the prediction. Panel (b) shows the corresponding representation plane.

Fig. 10. The statistical value vs varying SNR. The AR (blue square solid line) and AC (blue diamond dashed line) of noisy periodic data, the AR (red square solid line) and AC (red diamond dashed line) of noisy chaotic data. (a) white noise and (b) colored noise. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

a prominent tool for monitoring heart activity [40,41]. Here, the ECG recording was measured from a healthy volunteer in a relaxed state. Fig. 11(a) shows the ECG waveform, which exhibits an aperiodic character owing to the variation of the RR interval. We choose 2000 data points, of which the first 1600 are training data and the rest test data. It is well known that overfitting is a problem endemic to neural networks, especially when they are applied to real data modeling [42]. Therefore, avoiding overfitting is a primary requirement for their successful application in practice. We adopt an information theoretic criterion, minimum description length (MDL), to determine a neural network with adequate generalization [43,44]. This method uses a trade-off strategy to select the optimal model according to the equilibrium between the model parameters and the model prediction errors.

For a series of neural network candidates, the method calculates the description length of each model (see the function definition and formula in Ref. [44]), which yields the description length as a function of the number of neurons. The principle of the method is that the model with the minimum description length is selected as the optimal one. We point out that the applicable model selection techniques are not limited to MDL: since the response of the representation plane is intrinsic to the architecture of the neural network, the method should work equally well with other techniques that improve the prediction performance of neural networks. Fig. 11(b) shows the description length (DL) curve as the number of neurons varies from one to twenty-five; the minimal DL value occurs at eight neurons. The independent construction of each neural network causes the description length to fluctuate.
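Schematically, the selection procedure is a loop over candidate sizes; in the sketch below `description_length` is a hypothetical stand-in for the DL formula of Ref. [44] (a generic penalized-error proxy, purely for illustration), and `train_fn` is assumed to be supplied:

```python
import numpy as np

def description_length(mse, n_params, n_data):
    """Hypothetical DL proxy: data-misfit cost plus a parameter cost
    that grows with model size (see Ref. [44] for the actual formula)."""
    return 0.5 * n_data * np.log(mse) + 0.5 * n_params * np.log(n_data)

def select_network(train_fn, X, y, max_neurons=25):
    """Train candidates with 1..max_neurons hidden neurons and keep the
    one minimizing the description length.

    train_fn(k, X, y) -> (model, mse, n_params)
    """
    best = None
    for k in range(1, max_neurons + 1):
        model, mse, n_params = train_fn(k, X, y)
        dl = description_length(mse, n_params, len(y))
        if best is None or dl < best[0]:
            best = (dl, k, model)
    return best  # (minimum DL, optimal neuron count, selected model)
```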


Fig. 11. (a) Normal ECG data and (b) the DL curve (solid line) and its nonlinear fit (dashed line), whose minimum is highlighted by the arrow. (c) Short-term prediction of the normal sinus ECG data and (d) the representation plane of the neural network used for this prediction. The solid line represents the original signal and the dotted line the prediction.

We thereby employ a nonlinear curve-fitting technique to smooth these fluctuations and capture the true tendency as well as the minimum. We adopt the neural network with eight neurons for modeling the given ECG data. The free-run prediction obtained by this model closely follows the test data, as presented in Fig. 11(c). The irregular surface of the representation plane in Fig. 11(d) suggests that the measured normal ECG data are consistent with chaotic character. This result agrees with results in the literature [45,32]. As in the previous analysis, the result gives indirect evidence that short-term normal cardiac data are consistent with chaotic dynamics.

6. Conclusion

The presence of strong periodicity may hinder our understanding of the inherent deterministic behavior of pseudoperiodic time series. Chaotic invariants and surrogate data methods have been proposed to identify underlying periodic and chaotic dynamics, but their realization usually depends on phase space reconstruction, which is sensitive to noise and to appropriate parameter selection. In this paper, we propose a novel method based on a modeling strategy, which maps the data's inherent behavior onto the representation plane of the neural network. When the neural network is trained, the weak but distinct deterministic features of the

pseudoperiodic data are encoded into the updated weights and are clearly reflected on the representation plane of the neural network. In detail, we find that the strong (approximately) periodic character of (low-dimensional) chaotic time series overwhelms other deterministic features, yet the subtle variation associated with the unstable periodic orbits is manifest in the representation plane of the neural network. In particular, the representation plane provides intuitive and direct evidence for a truly periodic time series by highlighting its periodic behavior. Moreover, given an appropriate training algorithm, the representation plane theoretically and empirically shows a great advantage in robustness against strong observational noise, and even dynamical noise. In the general case, when the representation plane exhibits an irregular character, this is not by itself sufficient to conclude that the learned time series has chaotic dynamics, since the given time series may come from some other dynamical system rather than a chaotic one. Nevertheless, the proposed representation plane still faithfully reflects the true dynamics hidden in the time series, and it is then feasible to quantify the irregular character of the representation plane so as to further characterize the underlying dynamics. Hence, from the perspective of modeling, the representation plane of the neural network paves the way toward distinguishing between pseudoperiodic time series with specific periodic and chaotic dynamics.


Acknowledgments

This research was funded by China National Scientific Foundation Grant No. 608001014 and Scientific Foundation Grant of Guang Dong province, China, No. 9451805707002363.

References

[1] S.N. Rasband, Chaotic Dynamics of Nonlinear Systems, Wiley, New York, 1990.
[2] E. Ott, T. Sauer, J.A. Yorke, Coping with Chaos: Analysis of Chaotic Data and the Exploitation of Chaotic Systems, Wiley-Interscience, 1994.
[3] G. Nicolis, C. Nicolis, Foundations of Complex Systems, World Scientific, 2007.
[4] W.A. Sethares, Tatra Mt. Math. Publ. 23 (2001) 1–16.
[5] K. Judd, Physica D 56 (1992) 216–228.
[6] M. Small, K. Judd, A. Mees, Stat. Comput. 11 (2001) 257–268.
[7] C. Capus, K. Brown, J. Acoust. Soc. Am. 113 (2003) 3253–3263.
[8] N.J. Corron, S.T. Hayes, S.D. Pethel, J.N. Blakely, Phys. Rev. Lett. 97 (2006) 024101.
[9] H. Kantz, T. Schreiber, Nonlinear Time Series Analysis, Cambridge University Press, 2004.
[10] L.M. Pecora, L. Moniz, J. Nichols, T.L. Carroll, Chaos 17 (2007) 013110.
[11] J. Zhang, M. Small, Phys. Rev. Lett. 96 (2006) 238701.
[12] K. Pukenas, K. Muckus, Electron. Electr. Eng. 8 (2007) 53–56.
[13] P. Cvitanović, R. Artuso, R. Mainieri, G. Tanner, G. Vattay, Chaos: Classical and Quantum, Niels Bohr Institute, 2010.
[14] J. Theiler, Phys. Lett. A 196 (1995) 335–341.
[15] K. Dolan, A. Witt, M.L. Spano, A. Neiman, F. Moss, Phys. Rev. E 59 (1999) 5235–5241.
[16] M. Small, D.J. Yu, R.G. Harrison, Phys. Rev. Lett. 87 (2001) 188101.
[17] X.D. Luo, T. Nakamura, M. Small, Phys. Rev. E 71 (2005) 026230.
[18] M. Shiro, Y. Hirata, K. Aihara, Artificial Life Robot. 15 (2010) 496–499.
[19] T. Nakamura, M. Small, Phys. Rev. E 72 (2005) 056216.
[20] M.C.S. Coelho, E.M.A.M. Mendes, L.A. Aguirre, Chaos 18 (2008) 023125.
[21] J.S. Walker, Not. AMS 44 (1997) 658–670.
[22] M. Sifuzzaman, M.R. Islam, M.Z. Ali, J. Phys. Sci. 13 (2009) 121–134.
[23] K.L. Priddy, P.E. Keller, Artificial Neural Networks: An Introduction, SPIE Press, 2005.
[24] V.M. Krasnopolsky, H. Schiller, Neural Netw. 16 (2003) 321–334.
[25] A. Jain, A.M. Kumar, Appl. Soft Comput. 7 (2007) 585–592.
[26] G. Sugihara, R.M. May, Nature 344 (1990) 734–741.
[27] G. Cybenko, Math. Control Signals Systems 2 (1989) 303–314.
[28] M.T. Hagan, M. Menhaj, IEEE Trans. Neural Netw. 5 (1994) 989–993.
[29] B.R. Hunt, E. Ott, Phys. Rev. Lett. 76 (1996) 2254–2257.
[30] K. Narayanan, R.B. Govindan, M.S. Gopinathan, Phys. Rev. E 57 (1998) 4594–4603.
[31] D. Yu, M. Small, R.G. Harrison, C. Diks, Phys. Rev. E 61 (2000) 3750–3756.
[32] Y. Zhao, J.F. Sun, M. Small, Internat. J. Bifur. Chaos 18 (2008) 141–160.
[33] A. Lempel, J. Ziv, IEEE Trans. Inform. Theory 22 (1976) 75–81.
[34] N. Marwan, M.C. Romano, M. Thiel, J. Kurths, Phys. Rep. 438 (2007) 237–329.
[35] L.L. Trulla, A. Giuliani, J.P. Zbilut, C.L. Webber, Phys. Lett. A 223 (1996) 255–260.
[36] M. Thiel, M.C. Romano, P.L. Read, J. Kurths, Chaos 14 (2004) 234–243.
[37] K. Hornik, Neural Netw. 4 (1991) 251–257.
[38] M. Moreira, E. Fiesler, IDIAP Technical Report, 1995.
[39] T. Nakamura, M. Small, Physica D 223 (2006) 54–68.
[40] M. Richter, T. Schreiber, Phys. Rev. E 58 (1998) 6392–6398.
[41] U.R. Acharya, O. Faust, N. Kannathal, T.L. Chua, S. Laxminarayan, Comput. Methods Programs Biomed. 80 (2005) 37–45.
[42] D.M. Hawkins, J. Chem. Inf. Comput. Sci. 44 (2004) 1–12.
[43] M. Small, C.K. Tse, Phys. Rev. E 66 (2002) 066701.
[44] Y. Zhao, M. Small, IEEE Trans. Circuits Syst. 53 (2006) 722–732.
[45] R.B. Govindan, K. Narayanan, M.S. Gopinathan, Chaos 8 (1998) 495–502.