
Mechanical Systems and Signal Processing 24 (2010) 2947–2960


Mutual information algorithms

Ai-Hua Jiang a,b,*, Xiu-Chang Huang a,b, Zhen-Hua Zhang a,b, Jun Li a,b, Zhi-Yi Zhang a,b, Hong-Xin Hua a,b

a State Key Laboratory of Mechanical System and Vibration, Shanghai Jiao Tong University, No. 800 Dongchuan Road, Shanghai 200240, China
b Noise and Vibration Control Laboratory for Ship-board Equipment, Shanghai Jiao Tong University, Shanghai, China

Article info

Abstract

Article history: Received 30 September 2009 Received in revised form 31 March 2010 Accepted 19 May 2010 Available online 26 May 2010

Three new mutual information algorithms for determining the time delay in phase space reconstruction are presented. First, Cellucci's mutual information algorithm is analyzed by partitioning the plane constructed from a pair of equal-length Lorenz series into four and sixteen grids, with equal distribution probability in the elements on each axis. Three new mutual information algorithms are then developed, based respectively on the original probability matrix that describes the distribution of the points corresponding to the Lorenz data pairs on the plane, on the matrix obtained by excluding the last column and last row of the original one, and on a proportionally revised version of the original matrix. At the same time, an algorithm for computing the probability matrix is proposed: the two series are sorted and each numerical value is replaced by its order number within its own series, which makes it straightforward to judge the element in which each data pair is located. The optimal time delays and the computing times of the three new algorithms are compared for different series sizes. Finally, after reconstructing the phase space with the optimal time delay, the maximal Lyapunov exponent calculated from the time series by Rosenstein's algorithm is compared with that obtained from the Jacobi matrix of the Lorenz equation, to confirm the validity of the new mutual information algorithms. The results show that Cellucci's mutual information algorithm leads to a wrong optimal time delay when the series size is not a multiple of the element number. The three new algorithms, whose results are more stable when a large number of data pairs are used, not only eliminate this defect of Cellucci's algorithm but are also very fast, and the time spent on calculation by the three algorithms grows nearly linearly with the series size.
Moreover, the algorithm using the original probability distribution matrix is more accurate than the others when small series are used, and is also faster than the others regardless of series size. Besides, the small error in the maximal Lyapunov exponents obtained from the comparison shows that the three new mutual information algorithms are valid and feasible. © 2010 Elsevier Ltd. All rights reserved.

Keywords: Phase space reconstruction; Time delay; Mutual information; Maximal Lyapunov exponent

* Corresponding author at: State Key Laboratory of Mechanical System and Vibration, Shanghai Jiao Tong University, No. 800 Dongchuan Road, Shanghai 200240, China. E-mail address: [email protected] (A.-H. Jiang).

0888-3270/$ - see front matter © 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.ymssp.2010.05.015


1. Introduction

1.1. Theory of mutual information

Mutual information is widely used to determine one of the most important parameters in reconstructing phase space from a nonlinear time series: the time delay. Suppose {s(t_i)} (i = 1, 2, ..., N) is a nonlinear series obtained from an experiment with data acquisition interval Δt. The series can be reconstructed as

(s(t_1), s(t_1 + τ), ..., s(t_1 + (d − 1)τ)),
(s(t_2), s(t_2 + τ), ..., s(t_2 + (d − 1)τ)),
......
(s(t_N), s(t_N + τ), ..., s(t_N + (d − 1)τ)),

where τ represents the time delay. The mutual information between S = {s(t_1), s(t_2), ..., s(t_N)} and Q = {s(t_1 + τ), s(t_2 + τ), ..., s(t_N + τ)}, written I(S,Q), is the average number of bits of S that can be predicted by the measurement of Q. It can be expressed as

I(S,Q) = H(S) + H(Q) − H(S,Q)    (1)

where H(Q) and H(S) are the entropies of Q and S, respectively, and H(S,Q) is the joint entropy of S and Q. Normally, the lag of the first minimal mutual information is taken as the optimal time delay for phase space reconstruction.

To compute the mutual information between two time series, the numerical values of S and Q are assigned to two perpendicular axes, so that the data pairs of S and Q correspond to points in a plane, SQ. The interval between the maximal and minimal values of Q is partitioned into a certain number of elements, {q_j} (j = 1, 2, ...), according to one of several criteria; a sketch map of the partition of the SQ plane is shown in Fig. 1. The probability distribution P(q_j) of the points corresponding to data pairs in q_j is obtained by dividing their number by the size of the data series, N. The axis S is partitioned in the same way. The intersection of s_i and q_j is denoted (s_i, q_j), and the probability P(s_i, q_j) is the ratio of the number of points in (s_i, q_j) to N. The mutual information between S and Q can then be calculated by Eq. (1), with H(S), H(Q) and H(S,Q) given by

H(Q) = −Σ_j P(q_j) log₂ P(q_j)

H(S) = −Σ_i P(s_i) log₂ P(s_i)

H(Q,S) = H(S,Q) = −Σ_{i,j} P(s_i, q_j) log₂ P(s_i, q_j)

where P(s_i) is the probability distribution of element s_i. Eq. (1) is then equivalent to

I(Q,S) = Σ_i Σ_j P(s_i, q_j) log₂ [P(s_i, q_j) / (P(s_i) P(q_j))]    (2)

Fig. 1. Sketch map of partitioning SQ plane. (a) Equal probability distribution in every element and (b) equal distance in every element.
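As a minimal sketch of Eqs. (1)–(2), the following function (illustrative, not the authors' code) estimates I(S,Q) from a joint histogram using equal-distance elements on each axis:

```python
import math

def mutual_information(s, q, n_bins):
    """I(S,Q) per Eq. (2): sum_ij P(si,qj) * log2[P(si,qj) / (P(si) P(qj))],
    with the SQ plane partitioned into n_bins equal-width elements per axis.
    Assumes the series are non-constant."""
    n = len(s)

    def bin_index(x, lo, hi):
        # map x into 0..n_bins-1; the maximal value falls in the last element
        i = int((x - lo) / (hi - lo) * n_bins)
        return min(i, n_bins - 1)

    s_lo, s_hi = min(s), max(s)
    q_lo, q_hi = min(q), max(q)
    joint = {}                    # occupancy of each grid (si, qj)
    ps = [0] * n_bins             # marginal occupancies of S elements
    pq = [0] * n_bins             # marginal occupancies of Q elements
    for x, y in zip(s, q):
        i, j = bin_index(x, s_lo, s_hi), bin_index(y, q_lo, q_hi)
        joint[(i, j)] = joint.get((i, j), 0) + 1
        ps[i] += 1
        pq[j] += 1
    mi = 0.0
    for (i, j), c in joint.items():
        pij = c / n
        # pij / (P(si) P(qj)) = c * n / (ps[i] * pq[j])
        mi += pij * math.log2(c * n / (ps[i] * pq[j]))
    return mi
```

For two identical series with a perfectly uniform partition, the result reduces to H(S) itself (e.g. log₂ 4 = 2 bits for four elements), as expected from Eq. (1).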


1.2. Criteria of partitioning SQ plane

There are mainly two criteria used for partitioning the SQ plane: equal probability distribution in every s_i or q_j, and equal distance between the upper and lower limits of each s_i or q_j. Fig. 1(a) shows the partition of the SQ plane based on the first standard: the number of points located in each element, s_i or q_j, is the same, but the widths of the elements may differ. Fig. 1(b) displays the partition based on the second criterion: no matter how many points lie in s_i or q_j, the width of each element is the same.

Fraser's algorithm, the first mutual information algorithm used for ascertaining the optimal time delay, is based on the first standard [1]. It first partitions the SQ plane into four grids with equal probability in each element, s_i and q_j, while the numbers of points in the grids may differ from each other. It then judges whether substructures exist in these grids and whether the data points are distributed sparsely in them. If no grid contains substructure, or the points in the grids are sparse, the mutual information of the two series can be found from the current partition of the SQ plane. Otherwise, each grid that contains substructure is partitioned into another four sub-level grids, and this process continues until no substructure exists in any grid. The guideline for judging the existence of substructure is whether the points are distributed uniformly in the grid. Fig. 2 shows an example of Fraser's algorithm, in which points fall uniformly into R1(2) and R1(3), so it is not necessary to partition these grids further, whereas R1(1), R1(4) and R2(4,2) must be partitioned again because of their substructures. Fraser's algorithm is described in the appendix.

Fraser's algorithm is time- and memory-consuming. To ascertain the existence of substructures, a two-level partition of each grid must be performed, and in this process a number of probability distribution matrices, growing as a power of four, must be stored; the calculation of these matrices is also time-consuming. So another approach, partitioning the SQ plane with equal distance, was advanced. Compared with Fraser's algorithm, computing mutual information by partitioning the SQ plane with equal distance is fast and easy to program, but the result is very sensitive to the number of elements, and an unsuitable number of elements will lead to a wrong result. With too many elements, many grids contain only a few points, or even none, so the probability distribution is not reflected exactly and a wrong mutual information is deduced. In contrast, if very few elements are used to partition the plane, substructures remain inside the grids and the acquired mutual information is inaccurate. Therefore, a large amount of work has focused on how to choose the number of elements in mutual information algorithms with equal-distance elements. For a series of total length N, Tukey suggested that the number of elements on each axis of the SQ plane should be √N [2], while Bendat recommended 1.87(N − 1)^0.4 as the element number [3]. Rissanen theoretically deduced the optimal number of elements from the theory of stochastic complexity: the element number that leads to the minimal stochastic complexity is the best [4]. The stochastic complexity is

F(m) = N log₂(R/m) + log₂ C(N; N₁, ..., N_m) + log₂ C(N + m − 1, N)    (3)

where F(m) represents the stochastic complexity, m is the element number, R is the difference between the maximal and minimal values of the series, and N_i (i = 1, 2, ...) is the occupancy of each element after partitioning the SQ plane. The multinomial and binomial coefficients in Eq. (3) are given in Eqs. (4) and (5):

C(N; N₁, N₂, ..., N_m) = N! / (N₁! N₂! ... N_m!)    (4)

C(N + m − 1, N) = (N + m − 1)! / (N! (m − 1)!)    (5)

Fig. 2. Example of Fraser's mutual information algorithm.


It might seem that a complete, usable mutual information algorithm is obtained with Rissanen's method. Unfortunately, the series length N is a large number in most cases, and the factorial of N in Eqs. (4) and (5) exceeds the computing range of ordinary machines. Although the factorials can be rewritten as sums of logarithms, as shown below, the time needed to work out the complexity is so long that the benefit of partitioning the SQ plane with equal distance is cancelled out. So far there is, in general, no acknowledged feasible method to ascertain the optimal element number.

log₂ [N! / (N₁! N₂! ... N_m!)] = Σ_{i=0}^{N−1} log₂(N − i) − Σ_{i=1}^{m} Σ_{j=0}^{N_i−1} log₂(N_i − j)

log₂ [(N + m − 1)! / (N! (m − 1)!)] = Σ_{i=0}^{N+m−2} log₂(N + m − 1 − i) − Σ_{i=0}^{N−1} log₂(N − i) − Σ_{i=0}^{m−2} log₂(m − 1 − i)
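The log-sum forms above can be sketched directly; the following helpers (illustrative names, and assuming the reading of Eq. (3) given above with the N log₂(R/m) term) evaluate Eqs. (3)–(5) without ever forming a factorial:

```python
import math

def log2_factorial(n):
    # log2 n! = sum_{i=1}^{n} log2 i, avoiding overflow of n! itself
    return sum(math.log2(i) for i in range(1, n + 1))

def log2_multinomial(counts):
    # log2 [ N! / (N1! N2! ... Nm!) ] -- Eq. (4) in logarithmic form
    n = sum(counts)
    return log2_factorial(n) - sum(log2_factorial(c) for c in counts)

def log2_binomial_term(n, m):
    # log2 [ (N+m-1)! / (N! (m-1)!) ] -- Eq. (5) in logarithmic form
    return log2_factorial(n + m - 1) - log2_factorial(n) - log2_factorial(m - 1)

def stochastic_complexity(counts, data_range):
    # Eq. (3): F(m) = N log2(R/m) + log2(multinomial) + log2(binomial term),
    # where counts are the element occupancies N_i and data_range is R
    n, m = sum(counts), len(counts)
    return (n * math.log2(data_range / m)
            + log2_multinomial(counts)
            + log2_binomial_term(n, m))
```

The quadratic cost of these sums for large N is exactly the drawback the text describes; `math.lgamma` would be a faster alternative for the factorial logarithms.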

Moreover, the algorithm with equal-distance elements is not suitable for multi-variable time series reconstruction, which demands the same number of elements for different variables. The Rossler time series, whose numerical values are distributed non-uniformly in phase space, is taken as an example. The Rossler system is given in Eq. (6), where d = 0.2, e = 0.4 and f = 5.7. Ten thousand data sets, obtained by discretizing the Rossler equation with the fourth-order Runge–Kutta algorithm, are shown in Fig. 3(a); the initial point of the data sets is (−1, 0, 1), and the time step is 0.05 s. Fig. 3(b) is a bar chart showing the distribution of the numerical values of the three variables of the Rossler equation. Most of the values in the lowest chart, representing z, congregate between 0 and 0.4, in contrast to the situations of x and y displayed in the top and middle charts, respectively. Fig. 3(c) shows the mutual information of the three variables when the number of elements used to partition each axis of the SQ plane is 40. It can be seen from Fig. 3(c) that the optimal time delay from variable z is inaccurate. So, the algorithm with equal-distance

Fig. 3. Mutual information of time series from the three variables of Rossler with equal-distance elements. (a) Ten thousand data sets of the Rossler attractor, (b) bar charts of the numerical value distribution of the three variables, and (c) mutual information of the series of x, y and z with 40 equal-distance elements.


elements is unsuitable for systems in the same situation as the Rossler system.

dx/dt = −y − z
dy/dt = x + d·y    (6)
dz/dt = e + x·z − f·z
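A fourth-order Runge–Kutta discretization of Eq. (6), as used to produce the data sets of Fig. 3(a), can be sketched as follows (a minimal illustration with the coefficients and initial point stated above, not the authors' implementation):

```python
def rossler_rhs(state, d=0.2, e=0.4, f=5.7):
    # Eq. (6): dx/dt = -y - z, dy/dt = x + d*y, dz/dt = e + x*z - f*z
    x, y, z = state
    return (-y - z, x + d * y, e + z * (x - f))

def rk4_step(rhs, state, h):
    # one fourth-order Runge-Kutta step of size h
    k1 = rhs(state)
    k2 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
    k3 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
    k4 = rhs(tuple(s + h * k for s, k in zip(state, k3)))
    return tuple(s + h / 6.0 * (a + 2 * b + 2 * c + d_)
                 for s, a, b, c, d_ in zip(state, k1, k2, k3, k4))

def integrate(rhs, state, h, n_steps):
    # return the trajectory including the initial state
    out = [state]
    for _ in range(n_steps):
        state = rk4_step(rhs, state, h)
        out.append(state)
    return out

# e.g. ten thousand data sets from (-1, 0, 1) with a 0.05 s time step:
# trajectory = integrate(rossler_rhs, (-1.0, 0.0, 1.0), 0.05, 10000)
```

The same integrator applies to the Lorenz equation used later by swapping the right-hand side.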

Under these circumstances, Cellucci put forward a mutual information algorithm based on statistics. This algorithm, which partitions the SQ plane by equal probability distribution, can not only work out the mutual information in much less time than Fraser's algorithm but also meets the demand of multi-variable reconstruction from time series. At the same time, however, Cellucci's method has an important defect, which is introduced in the next section. The purpose of this article is therefore to advance a new algorithm that eliminates the defect of Cellucci's method and elicits the optimal time delay accurately.

2. Cellucci's algorithm

Cellucci's algorithm is constructed on the null hypothesis that the series S and Q are statistically independent [5]. If the hypothesis is true, the points of the data pairs from S and Q are distributed uniformly in the SQ plane; the mutual information computed by Eq. (2) is then zero, and the number of points in each grid is known. Supposing that O(s_i) is the number of points in the ith element on axis S and O(q_j) is the number of points in the jth element on axis Q, the number of points expected in (s_i, q_j) is the known quantity E(s_i, q_j) shown in Eq. (7):

E(s_i, q_j) = N P(s_i) P(q_j) = N (O(s_i)/N)(O(q_j)/N) = O(s_i) O(q_j) / N    (7)

When the standard of equal probability distribution is used to partition the SQ plane and the element numbers of both S and Q are NE, then P(s_i), the probability of s_i, and P(q_j), the probability of q_j, are P(s_i) = P(q_j) = 1/NE, and Eq. (7) becomes

E(s_i, q_j) = N P(s_i) P(q_j) = N / NE²    (8)

Here, if E(s_i, q_j) is a known number, the element number NE is also determined by Eq. (8). In Cellucci's algorithm, E(s_i, q_j) ≥ 5 was taken, because under this condition there is little probability that substructures exist in a grid. Eq. (8) then changes to

NE ≤ (N/5)^{1/2}    (9)

Normally, NE is taken as the maximal integer satisfying Eq. (9) when partitioning the SQ plane. When N is a multiple of NE, each element has the same number of points, and Eq. (2) becomes

I(Q,S) = Σ_{i=1}^{NE} Σ_{j=1}^{NE} P(s_i, q_j) log₂ [NE² P(s_i, q_j)]    (10)
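Under the same equal-probability partition, Eq. (10) can be sketched in Python. Ranking each series (the device the paper develops further in Section 4.2) yields the equal-probability elements directly; the names below are illustrative:

```python
import math

def equal_prob_mi(s, q, ne):
    """Eq. (10): I(Q,S) = sum_ij P(si,qj) log2(NE^2 P(si,qj)) for an
    equal-probability partition with ne elements per axis.
    Assumes len(s) is a multiple of ne and distinct values in each series."""
    n = len(s)
    assert n % ne == 0, "Eq. (10) holds only when N is a multiple of NE"
    n1 = n // ne  # points per element on each axis
    # rank each value within its own series (0 .. n-1)
    rank_s = {i: r for r, i in enumerate(sorted(range(n), key=lambda i: s[i]))}
    rank_q = {i: r for r, i in enumerate(sorted(range(n), key=lambda i: q[i]))}
    counts = [[0] * ne for _ in range(ne)]
    for i in range(n):
        counts[rank_s[i] // n1][rank_q[i] // n1] += 1
    mi = 0.0
    for row in counts:
        for c in row:
            if c:
                p = c / n
                mi += p * math.log2(ne * ne * p)
    return mi
```

For fully dependent series the result is log₂ NE, the maximum attainable with NE elements per axis.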

However, when N is not a multiple of NE (the maximal integer of Eq. (9)), the numbers of points in the elements are not all the same, and another process to determine NE must be brought in. For this case, Cellucci put forward two qualifications that NE should fulfil. One is that each grid should have no less than one point, i.e. O(s_i, q_j) ≥ 1 in all (s_i, q_j); the other is that 80% of the grids should have no less than five points, i.e. O(s_i, q_j) ≥ 5 in at least 80% of the (s_i, q_j). To estimate whether NE meets the second demand, a χ² test for uniform distribution on the SQ plane is used:

χ² = Σ_{i=1}^{NE} Σ_{j=1}^{NE} [O(s_i, q_j) − E(s_i, q_j)]² / E(s_i, q_j)    (11)

The number of degrees of freedom of χ² is ν = (NE − 1)², and the probability that the points representing the data pairs are distributed uniformly on the SQ plane is

P = Γ(ν/2, χ²/2) = 1 − (1/Γ(ν/2)) ∫₀^{χ²/2} e^{−t} t^{ν/2−1} dt = (1/Γ(ν/2)) ∫_{χ²/2}^{∞} e^{−t} t^{ν/2−1} dt = 0.8    (12)

where Γ(x) = ∫₀^{∞} e^{−t} t^{x−1} dt. So NE can be deduced from Eqs. (11) and (12).
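The test statistic of Eq. (11) is straightforward to sketch for an equal-probability partition, where every expected occupancy is E(s_i, q_j) = N/NE² (function name illustrative):

```python
def chi_square_uniformity(counts, n):
    """Eq. (11): chi^2 = sum_ij [O(si,qj) - E(si,qj)]^2 / E(si,qj),
    with E = N/NE^2 for an equal-probability partition of NE elements
    per axis. Returns the statistic and the degrees of freedom (NE-1)^2."""
    ne = len(counts)
    expected = n / (ne * ne)
    chi2 = sum((obs - expected) ** 2 / expected
               for row in counts for obs in row)
    dof = (ne - 1) ** 2  # nu = (NE - 1)^2 as in Eq. (12)
    return chi2, dof
```

The probability P of Eq. (12) would then come from the χ² survival function, i.e. the regularized upper incomplete gamma function (available, for instance, as `scipy.stats.chi2.sf`); that step is omitted here.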


Fig. 4. Three-dimensional diagram of 4096 Lorenz data pairs with initial point (8.331, 13.291, 18.063).

3. Defect of Cellucci's algorithm

Cellucci's algorithm works out mutual information effectively when N is a multiple of NE, and its time consumption is much less than that of Fraser's algorithm; however, when N is not a multiple of NE, the process to ascertain NE in Cellucci's algorithm is a null effort. Take the series of the Lorenz equation, shown in Eq. (13) with σ = 10, b = 8/3 and r = 28, as an example of this shortcoming. One hundred thousand data pairs, originating from (8.331, 13.291, 18.063), are obtained by the fourth-order Runge–Kutta algorithm with a 0.01 s time step; the whole process of generating the Lorenz data pairs can be carried out with TISEAN 3.0.0 [6]. Fig. 4 shows a three-dimensional diagram of the first 4096 data pairs. The series of x in these 4096 data pairs, whose length is not a multiple of NE (the maximal integer satisfying Eq. (9)), is taken as S in the mutual information calculation, and the series lagging S by 1, 10, 100 and 200 time steps are used as Q. Partitioning the SQ plane into two and four elements per axis (namely four and sixteen grids) by equal probability distribution gives the probability distribution matrices shown in Figs. 5 and 6.

dx/dt = σ(y − x)
dy/dt = −xz + rx − y    (13)
dz/dt = xy − bz

Fig. 5 shows the matrices when S and Q are partitioned into two elements with equal probability; all the occupancies in these matrices are more than five, so the qualifications of Cellucci's algorithm for N not a multiple of NE are fulfilled. Fig. 6 shows the matrices when four elements are used to partition the SQ plane. In contrast to the condition of Fig. 5, there is at least one zero in each matrix, and more zeros appear for smaller lags. When the number of elements increases quadratically, the zero occupancies in the matrices of Fig. 6 will produce still more zero occupancies in the new probability distribution matrix, so the partitions do not meet the demands of the algorithm. In other words, when the total length of the data pairs, N, is not a multiple of the element number NE, Cellucci's algorithm can divide the SQ plane into only four grids, and merely partitioning the SQ plane into four grids with equal probability distribution leads to a false result. Fig. 7 shows the mutual information from Cellucci's algorithm for two series lengths: 4096, which is not a multiple of NE, and 4500, which is. As can be seen, the mutual information keeps decreasing with increasing lag when N equals 4096, and the first minimal mutual information is reached at lag 81, far from the optimal time delay, 18, shown in Section 5; in the case of N = 4500, the first minimal mutual information is located at lag 19, quite close to the optimal time delay. So new algorithms need to be advanced, because most numbers are not multiples of NE; only limited lengths, such as 4500, 8000, 12,500, 24,500 and 50,000, can give the right mutual information with Cellucci's algorithm. Besides, the process of ascertaining NE from the χ² test of O(s_i, q_j) ≥ 5 in at least 80% of the (s_i, q_j) is complicated.

4. New algorithms

Three new algorithms based on Cellucci's algorithm are discussed in this section. Though Cellucci's algorithm is feasible only in certain cases, the idea of deducing the element number from an assumed number of distribution points in the SQ plane is worthy of adoption.
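The element-count rule that the new algorithms retain from Eqs. (8)–(9), together with the per-element point count N1 introduced in the next subsection, can be sketched as (names illustrative):

```python
import math

def element_parameters(n, e_sq=5):
    """NE is the maximal integer with NE <= sqrt(N/E(si,qj)) per Eq. (9),
    so the expected occupancy E = N/NE^2 of Eq. (8) is at least e_sq;
    N1 = floor(N/NE) is the number of data points per element."""
    ne = math.isqrt(n // e_sq)
    n1 = n // ne
    return ne, n1, n % n1 == 0  # last flag: is N/N1 an integer?
```

For example, N = 4096 gives NE = 28 and N1 = 146 (with N/N1 non-integral), and N = 50,000 gives NE = 100 and N1 = 500, matching the parameters reported in Tables 1 and 2.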

Fig. 5. Matrices of probability distribution when S and Q are partitioned into two elements with equal probability. (a) lagging 1 time step: [2013, 35; 35, 2013]; (b) lagging 10 time steps: [1702, 346; 346, 1702]; (c) lagging 100 time steps: [1391, 657; 657, 1391]; (d) lagging 200 time steps: [730, 1381; 1381, 730].

Fig. 6. Matrices of probability distribution when S and Q are partitioned into four elements with equal probability. (a) lagging 1 time step, (b) lagging 10 time steps, (c) lagging 100 time steps, and (d) lagging 200 time steps.

Fig. 7. Mutual information from Cellucci’s algorithm when series lengths are 4096 and 4500.

4.1. Process of new algorithms

The processes used to calculate mutual information in the new algorithms are described below. First, the element number NE is again taken as the maximal integer satisfying Eq. (9). Then the maximal integer N1 equal to or less than N/NE is taken as the number of data points in each s_i or q_j, and axes S and Q are partitioned accordingly from their minimal to their maximal values. If N/N1 is an integer, all partitioned elements contain the same number of data points; if N/N1 is not an integer, the last elements of axes S and Q contain fewer than N1 points. In the first case the mutual information can be computed directly from the original probability matrix; in the second case there are three ways to treat the probability matrix.

The first method is to calculate the mutual information from the matrix obtained by removing the last row and the last column of the original probability matrix when N/N1 is not an integer. Because Cellucci's algorithm is built on the statistics of a great number of data pairs, ignoring a limited number of them scarcely affects the result, and the maximal number of neglected data points is only N1 − 1. The defect of Cellucci's algorithm also vanishes once these changes are made to the original probability matrix. The new matrix used to elicit the mutual information is denoted C1.

Another method is to calculate the mutual information from the original probability matrix, no matter whether N/N1 is an integer or not. Five was taken as the anticipated number of data points in (s_i, q_j) because no substructures exist in the grids in that case, and the grids of the last row and the last column contain fewer than five points only when N/N1 is not an integer; thus acquiring the mutual information directly from the original probability matrix is also rational. The original probability matrix is denoted C2.

The last method is to modify the last row and the last column of C2 according to the ratio between the number of data points in the last element and that in the other elements; the detailed approach is as follows. Provided N2 is the remainder when N is divided by N1, a new matrix, C3, is obtained by multiplying the occupancies of the last row and the last column of C2 by


N1/N2. What needs to be pointed out is that the occupancy at the overlap of the last row and the last column of C2 should be multiplied by N1/N2 once only.

4.2. Probability distribution matrix calculation

To carry out the mutual information calculation in a short time even when large numbers of data pairs are involved, a rapid algorithm for the probability distribution matrix is also presented here. From the algorithm described above, the key step in the mutual information calculation is obtaining the probability distribution matrix; the time and space consumed in that step dominate the whole mutual information algorithm. When the SQ plane is partitioned with equal probability on both axes, the matrix depends only on the order of each numerical value in its series, not on the numerical value itself, so any change to the numerical values of a series is acceptable as long as the order of each number is kept constant. This trait gives rise to a rapid and easily implemented algorithm, and the original probability distribution matrix is estimated as follows. First, the two series S and Q are sorted by a quick sorting program, and the numerical values of S and Q are replaced by their order numbers. As a result, two new data vectors, A and B, are obtained from S and Q, respectively; no matter how large the numerical ranges of S and Q are, the values in A and B are integers from 1 to N, so eliciting the probability matrix from A and B is much easier than from S and Q. Since the element size N1 is a known number, the grid into which each data set (A(i), B(i)) (i = 1, 2, ..., N) falls is also clear: at each step the occupancy C(m,n) of the probability matrix is incremented by one, where m is the maximal integer less than 1 + A(i)/N1 and n is the maximal integer less than 1 + B(i)/N1. Following this course, the original probability matrix, C2, can be obtained.

After C2 is acquired, Eq. (2) changes to Eq. (14) for the first method of Section 4.1:

I(Q,S) = Σ_{m=1}^{k} Σ_{n=1}^{k} [C(m,n)/N*] log₂ [C(m,n) N* / N1²]    (14)

where N* equals N1·k, and k is the maximal integer equal to or less than N/N1. If the second method described in Section 4.1 is used, Eq. (2) changes to Eq. (15):

I(Q,S) = Σ_{m=1}^{k+1} Σ_{n=1}^{k+1} [C(m,n)/N] log₂ [C(m,n) N / N(m,n)]    (15)

where

N(m,n) = N1², for 0 < m, n ≤ k; N1 N2, for m = k + 1 or n = k + 1 (but not both); N2², for m = n = k + 1.

If the third method described in Section 4.1 is used, Eq. (2) changes to Eq. (16):

I(Q,S) = Σ_{m=1}^{k+1} Σ_{n=1}^{k+1} [C*(m,n)/((k + 1)N1)] log₂ [C*(m,n)(k + 1)/N1]    (16)

where

C*(m,n) = C(m,n), for 0 < m, n ≤ k; (N1/N2) C(m,n), otherwise.

4.3. Optimal time delay from new algorithms

According to the new mutual information algorithms formulated in Section 4, time series of different lengths from the Lorenz system are used to calculate the optimal time delay. With the same coefficients, initial point and time step introduced in Section 3, one hundred thousand data pairs of the Lorenz system are produced. The first 1024, 2048, 4096, 8192, 16,384, 24,576, 32,768 and 65,536 data points of variable x are used as the first series, S, and series lagging S by 1 to 200 steps within the data of x are used as the second series, Q. So for each length of time series, 200 mutual information values are acquired. With the three different methods of calculating the probability distribution matrix, there are in total 24 groups of mutual information curves, shown in Fig. 8(a)–(c). Table 1 lists the parameters appearing in the mutual information calculations for the series lengths above, where t represents the time spent on the calculation and the other symbols are as defined earlier. It can be seen from Table 1 that, with increasing length of the data series, the optimal time delays τ, i.e. the lags of the first minimal mutual information, are stable no matter which probability distribution matrix is employed, and they are fixed at 17 when the length of the time series is equal to or greater than 16,384. However, the lags of


Fig. 8. Mutual information from three different algorithms of probability distribution matrix. (a) Mutual information from C1, (b) Mutual information from C2, and (c) Mutual information from C3.

Table 1
Parameters appearing in the process of mutual information calculation by series with different lengths.

N        E(si,qj)  NE    N1    N/NE integer   C1: τ   C1: t (s)   C2: τ   C2: t (s)   C3: τ   C3: t (s)
1024     5         14    73    No             22      0.188       17      0.172       16      0.188
2048     5         20    102   No             20      0.375       17      0.375       17      0.39
4096     5         28    146   No             19      0.781       19      0.765       19      0.797
8192     5         40    204   No             18      1.609       18      1.594       18      1.64
16,384   5         57    287   No             17      3.375       17      3.344       17      3.438
24,576   5         70    351   No             17      5.219       17      5.187       17      5.297
32,768   5         80    409   No             17      7.063       17      7.062       17      7.187
65,536   5         114   574   No             17      15.593      17      15.578      17      15.86

the first minimal mutual information deviate from the steady result when short series are used. The deviation of C1 is the largest among the results from the three probability distribution matrices, whereas that of C2 is the smallest. It can also be seen that, with the increase in N, the time spent on the calculations grows nearly linearly; the relationship between the length of the data sets and the calculation time is shown in Fig. 9. Furthermore, the calculation times with C1, C2 and C3 are very close, that with C2 being slightly less than the others. But the new algorithms, which cost no more than 15.86 s for 65,536 data pairs, are much faster than Fraser's algorithm, which


Fig. 9. Relationship between length of time series and time consumption.

Fig. 10. Partial enlarged detail of the mutual information calculated by the three algorithms with different data pair lengths. (a) 50,000 data pairs, (b) 50,001 data pairs, (c) 50,250 data pairs, and (d) 50,499 data pairs.


Table 2
Parameters appearing in mutual information calculation with integral and non-integral N/NE.

N        E(si,qj)  NE    N1    N/NE integer   C1: τ   C1: t (s)   C2: τ   C2: t (s)   C3: τ   C3: t (s)
50,000   5         100   500   Yes            17      11.516      17      11.281      17      11.578
50,001   5         100   500   No             17      11.516      17      11.281      17      11.578
50,250   5         100   502   No             17      11.521      17      11.312      17      11.625
50,499   5         100   504   No             17      11.594      17      11.360      17      11.797

costs more than 200 s when the length of the series is 4096. All the algorithms mentioned above were implemented in Matlab 6.5 on a PC with a 3 GHz CPU and 2 GB of memory.

4.4. Comparison of the three new algorithms

To further compare the differences among the three new algorithms when N, the length of the time series, is a multiple of NE, the element amount, and when it is not, time series of lengths 50,000, 50,001, 50,250 and 50,499 are used in this section. When E(si,qj) equals five, 50,000 is a multiple of the element amount given by Eq. (9), and each element contains 500 data points, so the three probability matrixes are then identical. The cases differ, however, when the lengths of the data pairs are 50,001, 50,250 and 50,499: with 50,001 data pairs, C1 truncates the fewest points on the SQ plane; with 50,499 it discards the most; and with 50,250 an intermediate number of points is eliminated. Partly amplified curves of the results from the three new algorithms at these lengths are shown in Fig. 10(a)–(d), and Table 2 lists the parameters appearing in the calculation process. As can be seen from Fig. 10, although there are evident differences among the mutual information values obtained from the three probability distribution matrixes, the lags of the first minimum of the mutual information are the same. Fig. 10(a) shows the results when N equals 50,000, where the three curves overlap completely. When the lengths of the data pairs are 50,001, 50,250 and 50,499, C3's curves, shown in Fig. 10(b)–(d) respectively, lie above C1's and C2's, and Fig. 10(b) shows this trend more clearly than the other two pictures. When N equals 50,001, the curves of C1 and C2 in Fig. 10(b) nearly overlap; when N equals 50,250 or 50,499, the mutual information curve from C2 is partly higher than that from C1, and comparing individual numerical values for N of 50,001, 50,250 and 50,499 confirms that the mutual information from C2 is higher than that from C1.

5. Validation of the new algorithms

To validate the new algorithms, the lag of the optimal time delay obtained from them must be checked; estimating the precision of invariant parameters such as Lyapunov exponents, correlation dimensions and Kolmogorov entropy [7,8] after reconstructing the phase space with the optimal time delay is an effective approach. Because of the importance of the largest Lyapunov exponent in characterizing a nonlinear system, its precision is taken here as the criterion for the optimal time delay. There are mainly two approaches to obtaining the largest Lyapunov exponent: the first calculates it from the system differential equation when that equation is known, and the second estimates it from time series when measurements of the dynamic system are available. A common approach when the system equation is known is to compute the average eigenvalues of the Jacobi matrix of the system differential equation over a period of time [9]; computing the Lyapunov exponent spectrum directly from the definition of Lyapunov exponents is also possible. Likewise, a number of algorithms compute the largest Lyapunov exponent from time series, such as Rosenstein's algorithm and Wolf's algorithm [10,11]. As an extended method, Brown's algorithm can even be used to compute the whole Lyapunov exponent spectrum from time series [12].
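All of the time-series methods just mentioned operate on a delay-embedded phase space rather than on the raw series. A minimal sketch of delay-coordinate reconstruction (the function name and the toy ramp series are our own illustration, not the paper's code):

```python
import numpy as np

def delay_embed(x, dim, tau):
    """Reconstruct phase space: row i is [x[i], x[i+tau], ..., x[i+(dim-1)*tau]]."""
    n = len(x) - (dim - 1) * tau          # number of complete delay vectors
    return np.column_stack([x[k * tau : k * tau + n] for k in range(dim)])

x = np.arange(100.0)                      # toy series standing in for Lorenz data
X = delay_embed(x, dim=5, tau=17)         # dimension 5, delay 17, as in Section 5
print(X.shape)                            # (32, 5)
```

With a ramp input each delay vector is simply [i, i+17, i+34, i+51, i+68], which makes the indexing easy to verify before real data are used.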
Since the differential equation of the Lorenz system is known and its time series are also obtained by the fourth-order Runge–Kutta algorithm, both approaches to calculating the largest Lyapunov exponent are available in this case. The largest exponent from the first method is taken as the standard value for the Lorenz system, because it comes directly from the differential equation, and the closeness of the largest Lyapunov exponent from the second method to this standard reflects the precision of the reconstruction, namely the precision of the time delay and embedding dimension. The Jacobi matrix of the Lorenz equation is

\[
J = \begin{pmatrix} -\sigma & \sigma & 0 \\ r - z & -1 & -x \\ y & x & -b \end{pmatrix} \tag{17}
\]
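As a sketch of the first approach, the code below (our own illustration with our choice of step size and run length, not the authors' implementation) integrates the Lorenz equations together with their variational equation by fourth-order Runge–Kutta, evolving a set of tangent vectors with the Jacobi matrix and re-orthonormalizing them by QR decomposition; the running average of log|diag R| estimates the Lyapunov spectrum, whose largest entry should be near 0.906.

```python
import numpy as np

sigma, r, b = 10.0, 28.0, 8.0 / 3.0       # standard Lorenz parameters

def lorenz(v):
    x, y, z = v
    return np.array([sigma * (y - x), r * x - y - x * z, x * y - b * z])

def jacobian(v):                          # the Jacobi matrix of Eq. (17)
    x, y, z = v
    return np.array([[-sigma, sigma, 0.0],
                     [r - z,  -1.0,  -x],
                     [y,       x,    -b]])

def rk4(v, Q, dt):
    """One fourth-order Runge-Kutta step of the state and its tangent vectors."""
    f = lambda v, Q: (lorenz(v), jacobian(v) @ Q)
    k1v, k1Q = f(v, Q)
    k2v, k2Q = f(v + 0.5 * dt * k1v, Q + 0.5 * dt * k1Q)
    k3v, k3Q = f(v + 0.5 * dt * k2v, Q + 0.5 * dt * k2Q)
    k4v, k4Q = f(v + dt * k3v, Q + dt * k3Q)
    return (v + dt / 6.0 * (k1v + 2 * k2v + 2 * k3v + k4v),
            Q + dt / 6.0 * (k1Q + 2 * k2Q + 2 * k3Q + k4Q))

dt, n_steps = 0.01, 50000
v, Q = np.array([1.0, 1.0, 1.0]), np.eye(3)
for _ in range(1000):                     # discard the transient
    v, Q = rk4(v, Q, dt)
Q = np.eye(3)                             # restart tangent vectors on the attractor

log_r = np.zeros(3)
for _ in range(n_steps):
    v, Q = rk4(v, Q, dt)
    Q, R = np.linalg.qr(Q)                # re-orthonormalize the tangent vectors
    log_r += np.log(np.abs(np.diag(R)))

exponents = log_r / (n_steps * dt)
print(exponents[0])                       # largest exponent, close to 0.906
```

The QR step keeps the tangent vectors from collapsing onto the most expanding direction; the sum of the three estimates should also be close to the trace of the Jacobi matrix, -(sigma + 1 + b).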


A.-H. Jiang et al. / Mechanical Systems and Signal Processing 24 (2010) 2947–2960

Table 3
Maximal Lyapunov exponents obtained from the Jacobi matrix and from phase space reconstructed from time series with different time delays.

QR decomposition of Jacobi matrix: 0.9061

Rosenstein's algorithm:
Time delay t | Maximal Lyapunov exponent | Relative error (%)
16           | 0.9006                    | 0.607
17           | 0.9036                    | 0.276
18           | 0.9065                    | 0.044
19           | 0.9110                    | 0.541

Based on the time delay acquired in Section 4.3, the phase space can be reconstructed once the right embedding dimension is known. Takens' theorem indicates that the phase space reconstructed from a time series has the same nonlinear invariants as the original nonlinear system when the embedding dimension is larger than twice the fractal dimension of the original system [13]. So for the Lorenz system, whose fractal dimension is 2.0627160 [14], taking five as the embedding dimension is feasible. With the phase space reconstructed from ten thousand data points of the variable produced in Section 4.3, the maximal Lyapunov exponent is calculated by Rosenstein's algorithm for time delays of 16, 17, 18 and 19 with embedding dimension 5; the calculation is carried out with Tisean 3.0.1. At the same time, the maximal Lyapunov exponent is computed from the Jacobi matrix of the Lorenz differential equation by QR decomposition. The results of the two methods are shown in Table 3. It can be seen from Table 3 that the relative error between Rosenstein's algorithm and the QR decomposition of the Jacobi matrix is only 0.276% when the time delay is 17, quite close to the minimum error obtained at a delay of 18. So the new algorithms are feasible and effective.

6. Conclusion

This paper has put forward three new algorithms to calculate mutual information for the time delay in phase space reconstruction from time series. The following results are obtained:

1) Cellucci's mutual information algorithm is feasible when the length of the data pairs used to calculate mutual information is a multiple of the element amount used in partitioning the plane built from the two series of data pairs. However, a wrong result is acquired by this algorithm when the length of the data pairs is not a multiple of the element amount.
2) The probability distribution matrix can be obtained by sorting the two series separately and replacing each numerical value with its order number in its own series, so as to determine the element in which each data pair is located.

3) When a small number of data pairs is used to compute mutual information, the optimal time delay is unstable; among the three new algorithms, the one using the original probability matrix is the most stable and the one using the truncated original probability matrix is the least stable. When a large number of data pairs is used, the optimal time delays from the three new algorithms are the same and remain stable as the length of the data pairs increases.

4) The time spent on the calculations by the three algorithms, which are much faster than Fraser's algorithm, grows nearly linearly with the length of the data pairs, and the computing times of the three algorithms are close to one another; the calculation with the original probability matrix is slightly faster than the others, while that with the modified probability matrix takes a little more time.

5) When the data length is a multiple of the element amount, the mutual information from the three algorithms is identical; when it is not, the mutual information from the modified probability distribution matrix is the largest and that from the truncated original matrix is the smallest, but the optimal time delays from the three algorithms are still the same.

6) The maximal Lyapunov exponent obtained with the optimal time delay given by the three new algorithms differs only slightly from that obtained by QR decomposition of the Jacobi matrix of the system differential equation. This small error shows that the three new mutual information algorithms are feasible and credible.
7. Discussion

Though the three algorithms introduced above can obtain an approximation of the optimal time delay for phase space reconstruction from time series, several issues remain to be resolved. Above all, further research on how to choose the optimal occupancy E(si,qj) is necessary. Although it is assumed that when E(si,qj) equals five an approximation of the optimal time delay very close to the accurate value can be obtained, whether five is the best choice of E(si,qj) and how to justify the chosen value are still


not clarified. If a better expected occupancy E(si,qj) were introduced, the optimal time delay from the new algorithms might agree exactly with the accurate value. Another concern is that substructures may still exist in grids on the SQ plane even when E(si,qj)=5 is taken as the occupancy of each grid, because partitioning the SQ plane with equal probability only regulates the number of data pairs in the elements along axis S and axis Q, and the data pairs may cluster in some grids. To correct the mutual information in these grids, introducing the substructure test used in Fraser's algorithm may be advantageous; owing to the pre-partition of the SQ plane, the number of substructures would also be much smaller than in Fraser's algorithm, so fast mutual information calculation would still be achievable. The third issue worthy of attention is the validity of the three algorithms under the influence of various noises, and the noise level within which the algorithms remain feasible, since all data series obtained from experiments contain noise of various kinds, which may lead to a wrong optimal time delay. Initial verification has shown that the optimal time delay calculated from Eq. (16) for 16,384-point Lorenz series with added uniform white noise containing ten percent of the energy of the original series is the same as that calculated from the noise-free series, and that the optimal time delay computed from Eqs. (14) and (15) for Lorenz series of the same length with uniform white noise containing five percent of the original energy also matches the noise-free result. The Lorenz series with uniform white noise can be generated with Tisean 3.0.0. Further validation with other types of noise signals has not yet been carried out.
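The noise level quoted above is defined by an energy ratio. One way such a contaminated series can be produced is sketched below (the scaling rule, function name and stand-in signal are our own assumption, not taken from TISEAN or the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def add_uniform_noise(x, energy_ratio):
    """Add zero-mean uniform white noise whose energy (sum of squares)
    is `energy_ratio` times the energy of the original series x."""
    noise = rng.uniform(-1.0, 1.0, size=x.shape)
    # scale so that sum((scale*noise)**2) == energy_ratio * sum(x**2)
    scale = np.sqrt(energy_ratio * np.sum(x ** 2) / np.sum(noise ** 2))
    return x + scale * noise

x = np.sin(0.02 * np.arange(16384))       # stand-in for a 16,384-point Lorenz series
y = add_uniform_noise(x, 0.10)            # noise carrying 10% of the original energy
```

By construction the added component carries exactly the requested fraction of the energy of x, so the contamination level is controlled independently of the noise distribution.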

Acknowledgements

We would like to thank Ph.D. C.J. Cellucci for his advice on Rissanen's stochastic complexity theory. The discussions on the usage of Tisean 3.0.1 with Lei Min, assistant professor at Shanghai Jiao Tong University, are also acknowledged with gratitude. Finally, particular thanks go to Li Jin-Hua, postgraduate in the foreign language department of East China Normal University, for her revision of this paper. This work was supported by special funding for Ph.D. candidates of Shanghai Jiao Tong University.

Appendix: Independent coordinates for strange attractors from mutual information (Fraser's algorithm)

Given two time series {X}={x1, x2, ..., xn} and {Y}={y1, y2, ..., yn}, their mutual information I(X,Y) is the average number of bits by which the probability of xi (i=1,2,...,n) appearing in {X} can be predicted by measuring the probability of yi appearing in {Y}; the relationship is symmetric, I(X,Y)=I(Y,X). (xi, yi) is a point in the XY plane. A schematic of two steps of partitioning the plane built from the two series is shown in Fig. 11(a)–(c). Fig. 11(a) shows the step of partitioning the XY plane into four grids with equal probability along axis X and axis Y, represented by R1(1), R1(2), R1(3) and R1(4), respectively. Provided there are substructures in R1(1), that grid is divided again into smaller grids by equal-probability partition. Then the number of data points between x0 and x3, N_{x0-x3}, is the same as that between x3 and x1, N_{x3-x1}, but each is only half of N_{x1-x2}. So the probability of the element between x0 and x3 is N_{x0-x3}/N0, where N0 is the total number of data pairs in the XY plane; the partition along axis Y is analogous. The substructure of R1(1) is then composed of R2(1,1), R2(1,2), R2(1,3) and R2(1,4). The second step of partition is shown in Fig. 11(b). Supposing there are substructures in R2(1,3), the partition of R2(1,3) is similar to that of R1(1).
The difference is that the sub-partitions of R2(1,3) are formed by the subgrids R3(1,3,1), R3(1,3,2),

Fig. 11. Steps to partition the XY plane: (a) the original plane partitioned into four grids, (b) the first step to partition R1(1), and (c) the first step to partition R2(1,3).


R3(1,3,3) and R3(1,3,4). Following this process, a tree of substructures is obtained, and the calculation of mutual information can be carried out. Based on the partition of the XY plane shown above, the mutual information between {X} and {Y} can be calculated by the following relation:

\[
I(X,Y) = \frac{1}{N_0} F(R_0(Z_0)) - \log_2 N_0
\]

where N_0 denotes the number of data pairs in the XY plane and F(R_0(Z_0)) is the mutual information in the grid R_0(Z_0). If there is no substructure in R_k(Z_k), then F(R_k(Z_k)) can be computed with the equation below:

\[
F(R_k(Z_k)) = N(R_k(Z_k)) \log_2 [N(R_k(Z_k))]
\]

where N(R_k(Z_k)) is the number of data points in grid R_k(Z_k) and k is the layer of the XY-plane partition. If there are still substructures in R_k(Z_k), the substructure must be divided further, and F(R_k(Z_k)) becomes

\[
F(R_k(Z_k)) = N(R_k(Z_k)) \log_2 4 + \sum_{i=1}^{4} F(R_{k+1}(Z_k, i))
\]

where R_{k+1}(Z_k, i) is a subgrid of R_k(Z_k). The remaining issue is how to test for substructure in R_k(Z_k). By the Fraser and Swinney criterion, a \(\chi^2\) test is used to examine substructures on both the (k+1)-th and (k+2)-th layers of R_k(Z_k). The data points are deemed to be distributed uniformly on R_k(Z_k) when both \(\chi_3^2 < 1.547\) and \(\chi_{15}^2 < 1.287\) are satisfied, where

\[
\chi_3^2 = \frac{16}{9} \frac{1}{N(R_k(Z_k))} \sum_{i=1}^{4} \left[ N(R_{k+1}(Z_k, i)) - \frac{N(R_k(Z_k))}{4} \right]^2,
\]

\[
\chi_{15}^2 = \frac{256}{225} \frac{1}{N(R_k(Z_k))} \sum_{i=1}^{4} \sum_{j=1}^{4} \left[ N(R_{k+2}(Z_k, i, j)) - \frac{N(R_k(Z_k))}{16} \right]^2,
\]

and N(R_{k+1}(Z_k, i)) and N(R_{k+2}(Z_k, i, j)) are the numbers of data points in R_{k+1}(Z_k, i) and R_{k+2}(Z_k, i, j), respectively. When both conditions hold, the partitioning process is terminated and the calculation of the mutual information between X and Y is finished.

References

[1] A.M. Fraser, H.L. Swinney, Independent coordinates for strange attractors from mutual information, Physical Review A 33 (1986) 1134–1140.
[2] F. Mosteller, J.W. Tukey, Data Analysis and Regression, Addison-Wesley, 1977.
[3] J.S. Bendat, A.G. Piersol, Measurement and Analysis of Random Data, John Wiley, New York, 1966.
[4] J. Rissanen, Stochastic Complexity in Statistical Inquiry, World Scientific, Singapore, 1992.
[5] C.J. Cellucci, A.M. Albano, P.E. Rapp, Statistical validation of mutual information calculations: comparisons of alternative numerical algorithms, Physical Review E 71 (2005) 1–14.
[6] R. Hegger, H. Kantz, T. Schreiber, Practical implementation of nonlinear time series methods: the TISEAN package, Chaos 9 (1999) 413–435.
[7] H. Kantz, T. Schreiber, Nonlinear Time Series Analysis, Cambridge University Press, Cambridge, 2004.
[8] E. Ott, Chaos in Dynamical Systems, Cambridge University Press, Cambridge, 2002.
[9] T. Kawabe, Indicator of chaos based on the Riemannian geometric approach, Physical Review E 71 (2005) 1–4.
[10] M.T. Rosenstein, J.J. Collins, C.J. De Luca, A practical method for calculating largest Lyapunov exponents from small data sets, Physica D 65 (1993) 117–134.
[11] A. Wolf, J.B. Swift, H.L. Swinney, J.A. Vastano, Determining Lyapunov exponents from a time series, Physica D 16 (1985) 285–317.
[12] R. Brown, P. Bryant, H.D.I. Abarbanel, Computing the Lyapunov spectrum of a dynamical system from an observed time series, Physical Review A 43 (1991) 2787–2806.
[13] F. Takens, Detecting strange attractors in turbulence, Lecture Notes in Mathematics 898 (1981) 361–381.
[14] D. Viswanath, The fractal property of the Lorenz attractor, Physica D 190 (2004) 115–128.