Journal Pre-proof Analyzing dynamic association of multivariate time series based on method of directed limited penetrable visibility graph Xuan Yu, Suixiang Shi, Lingyu Xu, Jie Yu, Yaya Liu
PII: S0378-4371(19)31890-4
DOI: https://doi.org/10.1016/j.physa.2019.123381
Reference: PHYSA 123381
To appear in: Physica A
Received date: 9 September 2018
Revised date: 18 September 2019
Please cite this article as: X. Yu, S. Shi, L. Xu et al., Analyzing dynamic association of multivariate time series based on method of directed limited penetrable visibility graph, Physica A (2019), doi: https://doi.org/10.1016/j.physa.2019.123381.
This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
© 2019 Published by Elsevier B.V.
Highlights
This paper reveals the associated relationships among multivariate time series.
This paper constructs the Multivariate Time Series-Dynamic Association Network.
We use the network to analyze the long-term and short-term evolution of the associations in the data.
The association trends of multivariate time series are related to actual events.
Analyzing Dynamic Association of Multivariate Time Series based on Method of Directed Limited Penetrable Visibility Graph
Xuan Yu (a), Suixiang Shi (a, b), Lingyu Xu (a, c), Jie Yu (a, 1) and Yaya Liu (a)
(a) School of Computer Engineering and Science, Shanghai University, Shanghai, 200444, China.
(b) Key Laboratory of Digital Ocean, National Marine Data and Information Service, Tianjin, 300171, China.
(c) Shanghai Institute for Advanced Communication and Data Science, Shanghai University, Shanghai, 200444, China.
Abstract
To study how the relationships among multivariate time series evolve, this paper proposes a method for constructing a Multivariate Time Series-Dynamic Association Network (MTS-DAN), which represents the associated relationships among multivariate time series within specific periods. First, we use the transfer entropy algorithm to measure the associated relationships among the time series. Second, we map the temporal behavior of these relationships into a complex network using the directed limited penetrable visibility graph (DLPVG) method. Third, we explore the latent patterns of the multivariate time series through the physical characteristics of the network. Artificially generated data, sea surface temperature (SST) data and financial time series data are used as samples. The experimental results provide statistical evidence that the associated relationships among multivariate time series evolve dynamically. Association patterns exist among the time series, a few types of patterns play a significant role in the evolution, and a clustering effect appears in the long-term evolution process. Furthermore, the results show that multivariate time series are closely related to actual events, which suggests that the method is useful for studying and predicting such events.
Keywords
Multivariate time series; complex network; associated relationship; directed limited penetrable visibility graph
1. Introduction
Constructing complex networks from multivariate time series and studying their dynamic behavior has been widely applied [1-4]. Time series with the same attributes are often associated. For example, the movement of sea water leads to interactions between the Sea Surface Temperature (SST) of different ocean regions, and fluctuations in the oil market cause interactions between different oil prices. Previous studies have demonstrated the effectiveness of complex networks for analyzing multivariate time series in many fields, such as engineering [5, 6, 47], marine science [7, 8], economics [9, 10] and medicine [45]. Gao et al. constructed a complex network from multivariate time series by taking the distances between the series as the relationship set [47]. Du et al. detected red tides by measuring the relationships between different water quality factors and constructing a complex network [7]. Koutlis et al. used an improved Granger causality method to measure the interdependence of multivariate time series and formed a causality network that identifies epileptic seizures in electroencephalogram recordings [45]. Studying the evolution of the interactions among multivariate time series helps researchers uncover hidden interaction information [11-14, 45].
Despite these contributions, existing studies have two limitations. First, they do not quantify the influence among multivariate time series and therefore overlook the patterns in which the associated relationships change. Second, no study has examined how the evolution of the associated relationships relates to actual events. Motivated by these issues, this paper works on three aspects. First, we apply the transfer entropy method to measure the associated relationships among multivariate time series. Second, we propose the Multivariate Time Series-Dynamic Association Network (MTS-DAN), which maps the temporal behavior of these relationships into a complex network using the directed limited penetrable visibility graph (DLPVG). Finally, by analyzing the physical characteristics of the network, we explore the patterns in which the associated relationships change and how they relate to actual events.
The remainder of this paper is organized as follows. Section 2 reviews the relevant literature. Section 3 describes the construction of the MTS-DAN in detail. Section 4 reports and discusses the experiments. Finally, Section 5 concludes the paper.
(1) Corresponding author at: School of Computer Engineering and Science, Shanghai University, Shanghai, 200444, China. E-mail address: [email protected] (J. Yu)
2. Related Work
This section reviews methods for measuring the associated relationships among multivariate time series and methods for mapping time series into complex networks.
The Granger causality test [15, 16], mutual information [17, 18] and transfer entropy [19, 20] are commonly used to measure associated relationships among multivariate time series. The Granger causality test is model-based, and mutual information is neither directional nor dynamic. Transfer entropy [21, 22], in contrast, is effective for both linear and nonlinear time series, does not depend on a model, and yields directional relationships. It therefore overcomes the shortcomings of both the Granger causality test and mutual information.
Correlation and distance among multivariate time series capture only limited information, so a more appropriate and rigorous way is needed to describe their associated relationships and to mine hidden patterns. The visibility graph (VG) method, proposed by Lacasa et al. [23], maps a time series into a complex network. The VG preserves the structural characteristics of the original time series, and deeper information can be mined from the constructed network [46]. With large-scale data the associated relationships between series are complex, and the low complexity of the VG algorithm makes it suitable for handling the relationships among multivariate time series. Luque et al. [24] later presented the horizontal visibility graph (HVG), which is geometrically simpler and computationally more efficient than the VG. Zhou et al. [25] then proposed an improved method named the limited penetrable visibility graph (LPVG). Compared with the VG and the HVG, the LPVG produces networks whose nodes have a larger average degree and are more closely connected [26-29]. The degree distribution of the network changes markedly when the time series changes, so the degree distribution discriminates well between different series. In our research, the conversion between associated relationships has a time attribute: one associated relationship converts into another over time. Therefore, we propose a directed limited penetrable visibility graph (DLPVG) method based on the LPVG and use it to map the time series to the MTS-DAN.
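To make the visibility criteria concrete, the following brute-force Python sketch builds the natural VG of Lacasa et al. [23] and the HVG of Luque et al. [24] for a short series. It assumes unit-spaced time stamps and is written for clarity rather than speed (cubic in the series length); the function names are illustrative and not taken from the paper.

```python
def natural_visibility_edges(y):
    """Undirected edges (a, b), a < b, of the natural visibility graph of y."""
    n = len(y)
    edges = []
    for a in range(n):
        for b in range(a + 1, n):
            # (a, b) are visible if every point between them lies strictly
            # below the straight line joining (a, y[a]) and (b, y[b])
            if all(y[c] < y[b] + (y[a] - y[b]) * (b - c) / (b - a)
                   for c in range(a + 1, b)):
                edges.append((a, b))
    return edges


def horizontal_visibility_edges(y):
    """Undirected edges (a, b), a < b, of the horizontal visibility graph of y."""
    n = len(y)
    edges = []
    for a in range(n):
        for b in range(a + 1, n):
            # every point between a and b must lie strictly below both endpoints
            if all(y[c] < min(y[a], y[b]) for c in range(a + 1, b)):
                edges.append((a, b))
    return edges


series = [1.0, 2.0, 4.0]
print(natural_visibility_edges(series))     # [(0, 1), (0, 2), (1, 2)]
print(horizontal_visibility_edges(series))  # [(0, 1), (1, 2)]
```

For the series [1, 2, 4], the VG links the first and last points because the middle point lies below the line joining them, whereas the HVG does not, since the middle point is not lower than both endpoints. Every HVG edge is also a VG edge.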
3. Multivariate Time Series-Dynamic Association Network (MTS-DAN)
The MTS-DAN is a complex network that captures how the associated relationships among multivariate time series change over time. Each node of the MTS-DAN represents the associated relationship among the multivariate time series in a specific period. We express this relationship as an association pattern, which fuses the information of the multivariate time series. Each directed edge represents a conversion between association patterns: the association pattern of the initial node converts into the association pattern of the end node. The network therefore gives both the associated relationship of the multivariate time series in each period and the dynamic characteristics of that relationship. In this paper, $T = (T_1, T_2, \ldots, T_x)$ denotes the multivariate time series, where $x$ is the number of series.
To construct the MTS-DAN, we first use a sliding window to divide the multivariate time series into sub-sequences that represent the attributes of the series in specific periods. Second, we measure the directed associated relationships among the sub-sequences with the transfer entropy algorithm; the results are expressed as association patterns and projected into a high-dimensional space. Third, we apply principal component analysis (PCA) to reduce the dimension of the high-dimensional data and obtain the one-dimensional associated relationship vector group. Finally, the temporal behavior of the association patterns is mapped to the MTS-DAN by the DLPVG.
In summary, the key steps of constructing the MTS-DAN are:
(1) Segment the multivariate time series with a sliding window.
(2) Apply the transfer entropy method to represent the associated relationships and express them as association patterns.
(3) Obtain the one-dimensional associated relationship vector group with the PCA algorithm.
(4) Construct the complex network with the DLPVG algorithm.
The construction process of the MTS-DAN is shown in Figure 1.
3.1 Preprocessing of Multivariate Time Series
The first step of constructing the MTS-DAN is to preprocess the data. Let the lengths of the time series and of the sliding window be $L$ and $w$, respectively. In this paper, an appropriate sliding window length is chosen by a sensitivity analysis experiment [30, 31]. Taking day $t$ as the start point gives sub-period$_t$, which runs from day $t$ to day $t + (w - 1)$. Taking day $t + 1$ as the start point then gives sub-period$_{t+1}$, which runs from day $t + 1$ to day $t + w$. Continuing in this way yields a series of $q = L - w + 1$ sub-periods.
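The segmentation step can be sketched as follows, assuming the multivariate series is stored as an (x, L) NumPy array with one row per series and a step of one day between consecutive sub-periods; `sliding_windows` is a hypothetical helper name.

```python
import numpy as np


def sliding_windows(series, w):
    """Split an (x, L) array into q = L - w + 1 sub-periods of length w (step of one day)."""
    series = np.asarray(series, dtype=float)
    _, L = series.shape
    if not 1 <= w <= L:
        raise ValueError("the window length w must satisfy 1 <= w <= L")
    # sub-period t covers days t, t + 1, ..., t + (w - 1)
    return [series[:, t:t + w] for t in range(L - w + 1)]


data = np.arange(20).reshape(4, 5)   # x = 4 series, L = 5 days
windows = sliding_windows(data, w=3)
print(len(windows))                  # 3, i.e. L - w + 1
print(windows[1][0])                 # [1. 2. 3.]
```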
Figure 1: Schematic illustration of constructing Multivariate Time Series-Dynamic Association Network.
3.2 Definition and Representation of Association Pattern Based on Transfer Entropy
This paper quantitatively assesses the associated relationships between multivariate time series with the transfer entropy method. The associated relationship is directional. Within a sub-period, the associated relationship between any two time series is calculated as follows. We first define the Markov property of time series $N$ as:
$$p(N_{t+1} \mid N_t, \ldots, N_{t-l+1}) = p(N_{t+1} \mid N_t, \ldots, N_{t-l}) \quad (1)$$
The associated relationship from time series $N$ to time series $M$ is given by:
$$T_{N \to M} = \sum_{u_t} p(u_t) \log \frac{p(M_{t+1} \mid M_t^{(k)}, N_t^{(l)})}{p(M_{t+1} \mid M_t^{(k)})}, \qquad u_t = (M_{t+1}, M_t^{(k)}, N_t^{(l)}) \quad (2)$$
where $t$ is the discrete time index and the joint probability $p(u_t)$ is the probability that $M_{t+1}$, $M_t^{(k)}$ and $N_t^{(l)}$ take particular values in combination. $p(M_{t+1} \mid M_t^{(k)}, N_t^{(l)})$ and $p(M_{t+1} \mid M_t^{(k)})$ are the conditional probabilities that $M_{t+1}$ takes a certain value given the previous values $M_t^{(k)}$ and $N_t^{(l)}$, and given $M_t^{(k)}$ alone, respectively. In this paper, $k = l = 1$.
Formula (2) shows that when the state of $M$ at time $t+1$ is completely determined by its own history, there is no associated relationship from $N$ to $M$: if the state of $N$ has no effect on the transition probability of $M$, the two conditional probabilities in (2) are equal and $T_{N \to M} = 0$. After the associated relationship $T_{N \to M}$ between every pair of time series has been determined, the relationships of all pairs form an $x \times x$ matrix:
$$T_t = \begin{pmatrix} T_{11}(t) & T_{12}(t) & \cdots & T_{1x}(t) \\ T_{21}(t) & T_{22}(t) & \cdots & T_{2x}(t) \\ \vdots & \vdots & \ddots & \vdots \\ T_{x1}(t) & T_{x2}(t) & \cdots & T_{xx}(t) \end{pmatrix} \quad (3)$$
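A minimal sketch of Eqs. (2) and (3) with k = l = 1, as used in this paper. The estimator is an assumption, since this section does not specify one: each series in a window is discretized into 3 equal-width bins, probabilities are plug-in frequencies, and the logarithm is taken in base 2. The sketch also assumes that entry [i, j] of the matrix holds T_{i→j}. All function names are hypothetical.

```python
from collections import Counter

import numpy as np


def discretize(s, bins):
    """Equal-width binning of a real series into integer symbols (assumed estimator)."""
    s = np.asarray(s, dtype=float)
    inner_edges = np.linspace(s.min(), s.max(), bins + 1)[1:-1]
    return np.digitize(s, inner_edges)


def transfer_entropy(src, dst, bins=3):
    """Plug-in estimate of T_{src -> dst} in bits, following Eq. (2) with k = l = 1."""
    n = discretize(src, bins)
    m = discretize(dst, bins)
    total = len(m) - 1
    joint = Counter(zip(m[1:], m[:-1], n[:-1]))   # counts of u_t = (M_{t+1}, M_t, N_t)
    next_self = Counter(zip(m[1:], m[:-1]))       # (M_{t+1}, M_t)
    past_both = Counter(zip(m[:-1], n[:-1]))      # (M_t, N_t)
    past_self = Counter(m[:-1])                   # M_t
    te = 0.0
    for (m1, m0, n0), c in joint.items():
        p_full = c / past_both[(m0, n0)]              # p(M_{t+1} | M_t, N_t)
        p_self = next_self[(m1, m0)] / past_self[m0]  # p(M_{t+1} | M_t)
        te += (c / total) * np.log2(p_full / p_self)
    return te


def te_matrix(window, bins=3):
    """x-by-x matrix of Eq. (3) with entry [i, j] = T_{i -> j}; the diagonal is left at 0."""
    x = window.shape[0]
    T = np.zeros((x, x))
    for i in range(x):
        for j in range(x):
            if i != j:
                T[i, j] = transfer_entropy(window[i], window[j], bins)
    return T


# Usage: Y copies X with a one-step lag, so T_{X -> Y} should exceed T_{Y -> X}.
rng = np.random.default_rng(0)
xs = rng.normal(size=2000)
ys = np.concatenate([[0.0], xs[:-1] + 0.1 * rng.normal(size=1999)])
T = te_matrix(np.vstack([xs, ys]))
print(T[0, 1] > T[1, 0])  # True
```

In the usage example, the estimated T_{X→Y} is large while T_{Y→X} stays near zero, which illustrates the directionality that distinguishes transfer entropy from mutual information.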
To describe the associated relationships more conveniently, we express the matrix symbolically. $T_{N \to M} > 0$ indicates that time series $N$ has an associated relationship to time series $M$; this relationship is denoted $N(M)$, and its value is $T_{N \to M}$. Suppose Sa, Sb, Sc and Sd denote four time series and $P$ is the association pattern. If, in a sub-period, the associated relationships among the four series are Sa(Sb,Sd), Sb(Sa,Sc), Sc(Sa,Sb,Sd) and Sd(Sb,Sc), the association pattern is expressed as P(Sa(Sb,Sd), Sb(Sa,Sc), Sc(Sa,Sb,Sd), Sd(Sb,Sc)) (Figure 2).
Figure 2: The definition and representation of the association pattern.
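The notation above can be generated mechanically from a TE matrix. The sketch assumes that entry [i, j] holds T_{i→j}, treats any value above a hypothetical `threshold` (default 0) as an association, and omits series with no outgoing association; with a hand-made matrix it reproduces the example pattern of Figure 2.

```python
import numpy as np


def association_pattern(T, names, threshold=0.0):
    """Write a TE matrix (T[i, j] = T_{i -> j}) in the P(N(M, ...), ...) notation."""
    parts = []
    for i, src in enumerate(names):
        targets = [names[j] for j in range(len(names))
                   if j != i and T[i, j] > threshold]
        if targets:  # series with no outgoing association are omitted
            parts.append(f"{src}({','.join(targets)})")
    return "P(" + ", ".join(parts) + ")"


names = ["Sa", "Sb", "Sc", "Sd"]
T = np.zeros((4, 4))
for i, j in [(0, 1), (0, 3), (1, 0), (1, 2), (2, 0), (2, 1), (2, 3), (3, 1), (3, 2)]:
    T[i, j] = 0.1
print(association_pattern(T, names))
# P(Sa(Sb,Sd), Sb(Sa,Sc), Sc(Sa,Sb,Sd), Sd(Sb,Sc))
```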
3.3 Construction of One-Dimensional Associated Relationship Vector Group among Multivariate Time Series
Following the PCA method [32-35], the matrices are turned into a one-dimensional associated relationship vector group as follows. The matrix at time $t$ is described by the state vector of its off-diagonal entries, $TE_t = (T_{12}, T_{13}, \ldots, T_{1x}, \ldots, T_{x1}, T_{x2}, \ldots, T_{x(x-1)})$. The high-dimensional associated relationship vector group is $\{TE_1, TE_2, \ldots, TE_q\}$. The number $x$ of time series determines the dimension $d$ of vector $TE$: $d = A(x, 2) = \frac{x!}{(x-2)!} = x(x-1)$, where $A$ denotes the number of arrangements (permutations). Next, we apply PCA to reduce the dimension of the high-dimensional associated relationship vector group. The advantage of PCA is that it retains as much of the character of the original data as a linear projection can: the first principal component preserves the largest possible share of the variance of the original vectors. After dimension reduction, the one-dimensional associated relationship vector group is $TEP = \{TEP_1, TEP_2, \ldots, TEP_q\}$.
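Step (3) can be sketched with NumPy as follows: each TE matrix is flattened to its x(x−1) off-diagonal entries in row-major order, and the q state vectors are projected onto their first principal component, computed by an SVD of the centred data. Random matrices stand in for real TE matrices here, and the helper names are hypothetical.

```python
import numpy as np


def state_vector(T):
    """Off-diagonal entries of an x-by-x TE matrix in row-major order:
    (T_12, T_13, ..., T_1x, ..., T_x1, ..., T_x(x-1)), so d = x(x - 1) = A(x, 2)."""
    x = T.shape[0]
    return T[~np.eye(x, dtype=bool)]


def first_principal_component_scores(vectors):
    """Project q vectors of dimension d onto their first principal component."""
    X = np.asarray(vectors, dtype=float)
    Xc = X - X.mean(axis=0)                          # centre every dimension
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[0]                                # TEP_1, ..., TEP_q


rng = np.random.default_rng(1)
matrices = [rng.random((4, 4)) for _ in range(10)]  # stand-ins for TE matrices, x = 4
TE = np.array([state_vector(T) for T in matrices])  # shape (10, 12): q = 10, d = 12
TEP = first_principal_component_scores(TE)          # shape (10,)
```

One design point worth noting: the sign of a principal component is arbitrary, so an implementation that reads larger TEP values as closer association may need to fix the sign, for example so that the loadings in `Vt[0]` sum to a positive number.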
3.4 Construction of the MTS-DAN by the DLPVG Algorithm
In our research, the LPVG algorithm is extended to the DLPVG algorithm to reflect the direction of the associated relationships among multivariate time series. Each vector in the one-dimensional associated relationship vector group corresponds to an association pattern, and a larger value indicates a closer associated relationship. The DLPVG algorithm is therefore adopted to map the one-dimensional associated relationship vector group into a complex network. Its advantage is that it represents the conversion ability among association patterns: the stronger the conversion ability of an association pattern (node), the more easily it converts into other patterns or is converted from them, and the larger the degree of the corresponding node. This ability depends on the vector values of the node and of its adjacent nodes.
First, as the sliding window moves, we obtain a series of association patterns, and one association pattern converts into another over time: pattern$_1$ → pattern$_2$ → pattern$_3$ → ... → pattern$_q$, with $q = L - w + 1$. In the DLPVG algorithm, every vector in the one-dimensional associated relationship vector group is a node of the network, and every pair of vectors satisfying the visibility criterion is connected by an edge. The visibility criterion is as follows.
Consider the one-dimensional associated relationship vector group $TEP = \{TEP_1, TEP_2, \ldots, TEP_q\}$ of length $q$. Two arbitrary vectors $(t_a, TEP_a)$ and $(t_b, TEP_b)$ with $t_a < t_b$, separated by $m$ vectors, are visible to each other if at most $N$ of the vectors between them block the line of sight, that is, if $s \le N$ vectors $(t_i, TEP_i)$ with $t_a < t_i < t_b$ satisfy
$$TEP_i \ge TEP_b + (TEP_a - TEP_b)\frac{t_b - t_i}{t_b - t_a},$$
and the other $m - s$ vectors $(t_d, TEP_d)$ with $t_a < t_d < t_b$ satisfy
$$TEP_d < TEP_b + (TEP_a - TEP_b)\frac{t_b - t_d}{t_b - t_a}.$$
Here $m$ is the distance between $(t_a, TEP_a)$ and $(t_b, TEP_b)$, $s$ is the number of blocking vectors, and $N$ is the limited penetrable distance. An example with $N = 1$ is shown in Figure 3.
Figure 3: Directed limited penetrable visibility graph method.
In Figure 3, each vertical bar represents the association pattern among the multivariate time series in a specific period, that is, one vector of the one-dimensional associated relationship vector group, and each vector is a node of the network. The height of the bar is the vector value. The same association pattern may recur during the evolution process. Nodes that satisfy the visibility criterion above are connected to form the directed limited penetrable visibility network: the solid lines are the edges obtained by the VG, and the dashed lines are the additional edges introduced by the DLPVG. All edges have unit weight. An edge from node A to node B means that the association pattern at time A can convert into the association pattern at time B. The DLPVG inherits the dynamic characteristics of the one-dimensional associated relationship vector group, and its visibility relations are unchanged when the axes are rescaled. The connectivity of the network is stable: each node, except those near the two ends of the series, is connected to at least $2(N+1)$ nodes, which reflects the conversion trend of the association patterns. Applying the DLPVG visibility criterion to the vectors of $TEP = \{TEP_1, TEP_2, \ldots, TEP_q\}$ yields the MTS-DAN.
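A direct, unoptimized sketch of the visibility criterion above, for illustration only. It assumes the time stamps are the positions 0, 1, ..., q−1 and that each edge points from the earlier association pattern to the later one; `dlpvg_edges` is a hypothetical name. With N = 0 it reduces to a directed natural visibility graph.

```python
import numpy as np


def dlpvg_edges(tep, N=1):
    """Directed edges (a, b), t_a < t_b, under the limited penetrable visibility criterion.

    Time stamps are taken as positions 0, 1, ..., q - 1 (an assumption), and each edge
    points forward in time, from the earlier pattern to the later one (also an assumption).
    """
    tep = np.asarray(tep, dtype=float)
    q = len(tep)
    edges = []
    for a in range(q):
        for b in range(a + 1, q):
            i = np.arange(a + 1, b)
            line = tep[b] + (tep[a] - tep[b]) * (b - i) / (b - a)
            s = np.count_nonzero(tep[a + 1:b] >= line)  # vectors blocking the view
            if s <= N:
                edges.append((a, b))
    return edges


tep = [3.0, 1.0, 2.0, 0.5, 4.0]
print(dlpvg_edges(tep, N=0))       # [(0, 1), (0, 2), (0, 4), (1, 2), (2, 3), (2, 4), (3, 4)]
print(len(dlpvg_edges(tep, N=1)))  # 10: with N = 1 every pair is connected here
```

For this five-point series, N = 0 gives seven edges and N = 1 connects all ten pairs, which illustrates how the penetrable distance adds the dashed edges of Figure 3. An interior node always has degree at least 2(N+1), because its N+1 nearest neighbours on each side are separated from it by at most N vectors.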
4. Experiments and Discussion
In this paper, we focus on the characteristics of the evolution of the relationships among multivariate time series. Our work presupposes that the multivariate time series are related, both theoretically and in reality. We experiment with two kinds of data. First, we analyze how well the MTS-DAN method adapts to artificially generated data and to their noise-added versions. Second, we apply the MTS-DAN to real-life data, namely SST data and financial data.
4.1 Artificially Generated Data
We first construct MTS-DANs from artificially generated time series (periodic, fractal and random data). The adaptability of the networks is measured from two aspects: recognition ability and anti-noise ability. We characterize the behavior of these synthetic series through the statistical characteristics of the network: the number of nodes and edges, the density, the average path length, the degree distribution and the average clustering coefficient. The numbers of nodes and edges reflect the density of the MTS-DAN. The average path length is the average distance between all pairs of nodes and describes how separated the nodes of the network are. We analyze the degree distribution through degree centrality, an index that evaluates the importance of a node by its number of adjacent nodes: the more adjacent nodes a node has, the more important it is. The average clustering coefficient describes the aggregation tendency of the nodes in the MTS-DAN and is obtained by averaging the clustering coefficients of all nodes. In addition, we add Gaussian white noise to the artificially generated data to investigate the recognition ability and anti-noise ability of the MTS-DAN.
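The indexes listed above can be computed with NetworkX. The conventions are assumptions, since the text does not fix them: density uses the directed definition E/(N(N−1)), degree centrality counts in-degree plus out-degree divided by N−1, and the clustering coefficient and average path length are computed on the undirected version of the network. The function name is hypothetical.

```python
import networkx as nx


def network_statistics(edges, num_nodes):
    """Structural indexes of a directed, unweighted network (assumed conventions below)."""
    G = nx.DiGraph()
    G.add_nodes_from(range(num_nodes))
    G.add_edges_from(edges)
    U = G.to_undirected()  # assumption: clustering and path length use the undirected view
    return {
        "nodes": G.number_of_nodes(),
        "edges": G.number_of_edges(),
        "density": nx.density(G),                      # E / (N (N - 1)) for a DiGraph
        "avg_clustering": nx.average_clustering(U),
        "avg_path_length": (nx.average_shortest_path_length(U)
                            if nx.is_connected(U) else float("nan")),
        "degree_centrality": nx.degree_centrality(G),  # (in + out degree) / (N - 1)
    }


stats = network_statistics([(0, 1), (1, 2), (0, 2)], num_nodes=3)
print(stats["density"], stats["avg_clustering"], stats["avg_path_length"])  # 0.5 1.0 1.0
```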
4.1.1 Description of Periodic Data, Fractal Data and Random Data
To analyze how well the MTS-DAN method adapts to periodic, fractal and random data, we use four time series of each type in the experiments.
The sampling interval of all series is 0.01.
Periodic data: The sinusoidal data are used to construct the MTS-DAN, which are y s i n(x ) y s i n2(x ) y s i n3(x ) y s i n2(x ) 2
(4)
Fractal data: we use fractional Brownian motion (fBm) data to construct the MTS-DAN. FBm is a typical fractal motion, defined by
$$P\left(\frac{f(x + \Delta x) - f(x)}{\|\Delta x\|^{H}} < t\right) = F(t) \quad (5)$$
where $x$ is a point in the $E$-dimensional Euclidean space $\mathbb{R}^E$, $F(t)$ is the distribution function of the random variable $t$, which follows a zero-mean normal distribution $N(0, \sigma^2)$, $\|\Delta x\|$ is the sampling interval, and $H \in (0, 1)$ is the Hurst parameter. In this paper, $H$ takes the values 0.2, 0.3, 0.4 and 0.5 to construct the MTS-DAN.
Random data: the random time series are generated by a random walk process.
The sinusoidal, fBm and random sequences all have length 1000. For comparison, Gaussian white noise with an intensity of 10 dB is added to every sequence.
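The fractal, random and noise-added inputs can be generated as sketched below. Several choices are assumptions not stated in the text: fBm is simulated exactly through a Cholesky factorization of the fractional Gaussian noise covariance with unit spacing (rescaling to a sampling interval of 0.01 only multiplies each path by 0.01^H), the random walk uses ±1 steps, and "10 dB" is read as a per-series signal-to-noise ratio. The sinusoidal series of Eq. (4) are not reproduced here, and all function names are hypothetical.

```python
import numpy as np


def fbm(n, H, rng):
    """Exact fBm path of n samples with unit spacing: the Cholesky factor of the
    fractional Gaussian noise covariance applied to white noise, then summed."""
    k = np.arange(n)
    # autocovariance of fractional Gaussian noise with Hurst exponent H
    gamma = 0.5 * ((k + 1.0) ** (2 * H) - 2 * k ** (2 * H) + np.abs(k - 1.0) ** (2 * H))
    cov = gamma[np.abs(k[:, None] - k[None, :])]
    increments = np.linalg.cholesky(cov) @ rng.standard_normal(n)
    return np.cumsum(increments)


def random_walk(n, rng):
    """Simple random walk with +1/-1 steps (the step distribution is an assumption)."""
    return np.cumsum(rng.choice([-1.0, 1.0], size=n))


def add_white_noise(signals, snr_db, rng):
    """Add Gaussian white noise to each row at the given signal-to-noise ratio in dB."""
    signals = np.asarray(signals, dtype=float)
    p_signal = np.mean(signals ** 2, axis=-1, keepdims=True)
    p_noise = p_signal / 10 ** (snr_db / 10)
    return signals + np.sqrt(p_noise) * rng.standard_normal(signals.shape)


rng = np.random.default_rng(42)
fractal = np.vstack([fbm(1000, H, rng) for H in (0.2, 0.3, 0.4, 0.5)])
walks = np.vstack([random_walk(1000, rng) for _ in range(4)])
noisy_fractal = add_white_noise(fractal, 10.0, rng)
```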
4.1.2 Statistical Characteristics of the MTS-DAN
With a sliding window of length 50, the degree distributions of the MTS-DANs constructed from the artificially generated data are shown in Figure 4. In all cases the degree centrality of the nodes follows a power-law distribution $p(w) \sim w^{-\gamma}$, which indicates that the method converts all three kinds of data into scale-free networks and demonstrates that the MTS-DAN can recognize artificially generated time series. The MTS-DAN method also reveals a few key association patterns among the multivariate periodic, fractal and random time series. In addition, the degree distributions of the noise-added data networks differ only slightly from those of the original data networks, indicating that the method is robust to noise for all three kinds of artificially generated data.
Figure 4: The degree centrality of the nodes of the MTS-DAN constructed by the artificially generated data. (a) Periodic data network and noise-added periodic data network. (b) Fractal data network and noise-added fractal data network. (c) Random data network and noise-added random data network.
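The text does not say how the power-law form was established. One common check, shown here as an assumption rather than as the authors' procedure, is the approximate discrete maximum-likelihood estimator of Clauset, Shalizi and Newman (2009), γ ≈ 1 + n / Σ ln(k_i / (k_min − 1/2)), applied to the node degrees k_i ≥ k_min; since degree centrality is the degree divided by N−1, the exponent is the same. The sketch validates the estimator on synthetic degrees with γ = 2.5; the function name is hypothetical.

```python
import numpy as np


def powerlaw_exponent(degrees, k_min):
    """Approximate MLE of gamma in p(k) ~ k^(-gamma) for integer degrees k >= k_min."""
    k = np.asarray([d for d in degrees if d >= k_min], dtype=float)
    return 1.0 + len(k) / np.sum(np.log(k / (k_min - 0.5)))


# Synthetic check: draw integer degrees with gamma = 2.5 and re-estimate the exponent.
rng = np.random.default_rng(0)
u = rng.random(20000)
k_min, gamma = 6, 2.5
degrees = np.floor((k_min - 0.5) * (1.0 - u) ** (-1.0 / (gamma - 1.0)) + 0.5)
print(round(powerlaw_exponent(degrees, k_min), 2))
```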
All analyses of the artificially generated data are based on the physical structure indexes of the MTS-DAN (Table 1). Compared with the corresponding original data network, each noise-added data network has fewer edges. At the same time, the density, the average clustering coefficient and the average path length of the noise-added data networks change only slightly, which shows that the MTS-DANs are little affected by noise. These experiments demonstrate the anti-noise ability of the MTS-DAN on artificially generated data. In summary, the MTS-DAN method adapts well to artificially generated time series.
Table 1: The analysis of artificially generated data based on physical structure indexes of the MTS-DAN.
| Indexes | Periodic data network | Noise-added periodic data network | Fractal data network | Noise-added fractal data network | Random data network | Noise-added random data network |
| The number of nodes | 950 | 950 | 950 | 950 | 950 | 950 |
| The number of edges | 5903 | 5807 | 6088 | 5789 | 6951 | 5823 |
| Density | 0.007 | 0.006 | 0.007 | 0.006 | 0.008 | 0.006 |
| The average clustering coefficient | 0.367 | 0.367 | 0.367 | 0.368 | 0.366 | 0.368 |
| The average path length | 4.617 | 4.714 | 4.512 | 4.402 | 4.336 | 4.617 |
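As a quick consistency check, the densities in Table 1 match the directed-graph definition $\rho = E/(N(N-1))$ with $N = 950$ nodes. For the periodic data network and the random data network:
$$\rho_{\text{periodic}} = \frac{5903}{950 \times 949} = \frac{5903}{901550} \approx 0.0065 \approx 0.007, \qquad \rho_{\text{random}} = \frac{6951}{901550} \approx 0.0077 \approx 0.008$$
The undirected definition $2E/(N(N-1))$ would give about 0.013 and 0.015 instead, so the reported values appear to count each directed edge once.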
4.2 Description of Real Life Data
4.2.1 SST Time Series Data
The SST data used in this experiment are the Optimum Interpolation Sea Surface Temperature (OISST) product of the National Oceanic and Atmospheric Administration (NOAA). The data are global gridded data with a horizontal resolution of 1°×1° and a daily time resolution, covering January 1, 1993 to July 16, 2017 (8,963 days). The research objects are five points in the Yellow Sea, visualized in Figure 5.
Figure 5: SST data description.
4.2.2 Financial Time Series Data
The financial time series data come from the international oil market: the West Texas Intermediate spot price (WTI), the Brent spot price (BRT), the New York Harbor conventional gasoline regular spot price (NYH) and the U.S. Gulf Coast conventional gasoline regular spot price (USG). They cover January 4, 2000 to December 31, 2015 (4,018 days).
4.3 Statistical Characteristics of the MTS-DAN
For different period lengths, the numbers of nodes and edges, the average path length and the average clustering coefficient of the MTS-DAN reveal different patterns. We select appropriate sliding window lengths by sensitivity analysis and verify the selected lengths through degree centrality.
4.3.1 SST Time Series Data
By constructing the MTS-DAN of the SST time series with different periods, we obtain the numbers of nodes and edges, the average clustering coefficient and the average path length (Figure 6). As Figure 6 (a) and (b) show, the numbers of nodes and edges in the MTS-DAN both decrease as the sliding window length increases, and the average clustering coefficient and the average path length change fastest when the window length is between 100 and 500 days. Based on the number of edges and the average clustering coefficient, we set the sliding window to 100, 300, 365, 500, 1000 or 1500 days and obtain the degree distribution for each window size. The degree centrality of the nodes in the MTS-DAN follows a power-law distribution $p(w) \sim w^{-\gamma}$ (Figure 6 (c)). A few nodes carry high weights, which means that a few types of association patterns play an essential role in the evolution of the relationships among the multivariate time series. The degree distributions for all six window lengths indicate that the resulting MTS-DANs are scale-free networks. The overall trend of the degree distribution for $w = 100$ days differs from those of the other five window lengths. These statistical characteristics show that the physical structure of the network changes most markedly when the period is between 100 and 500 days.
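The sensitivity analysis described above, rebuilding the network for several window lengths and comparing its indexes, can be sketched end to end. This is an illustrative pipeline of steps (1) to (4), not the authors' implementation: it assumes a plug-in transfer entropy with k = l = 1 and 3 equal-width bins, first-principal-component scores obtained by SVD, forward-in-time DLPVG edges with N = 1, and indexes computed on the undirected network; synthetic random walks replace the SST data. All function names are hypothetical.

```python
from collections import Counter

import networkx as nx
import numpy as np


def transfer_entropy(src, dst, bins=3):
    """Plug-in T_{src -> dst} in bits, k = l = 1, equal-width bins (assumed estimator)."""
    def symbols(s):
        s = np.asarray(s, dtype=float)
        return np.digitize(s, np.linspace(s.min(), s.max(), bins + 1)[1:-1])
    n, m = symbols(src), symbols(dst)
    joint = Counter(zip(m[1:], m[:-1], n[:-1]))
    next_self, past_both = Counter(zip(m[1:], m[:-1])), Counter(zip(m[:-1], n[:-1]))
    past_self, total = Counter(m[:-1]), len(m) - 1
    return sum((c / total) * np.log2((c / past_both[(m0, n0)])
                                     / (next_self[(m1, m0)] / past_self[m0]))
               for (m1, m0, n0), c in joint.items())


def build_mts_dan(series, w, N=1, bins=3):
    """Steps (1) to (4): sliding windows, TE state vectors, 1-D PCA scores, DLPVG digraph."""
    x, L = series.shape
    vectors = []
    for t in range(L - w + 1):                                # q = L - w + 1 sub-periods
        win = series[:, t:t + w]
        vectors.append([transfer_entropy(win[i], win[j], bins)
                        for i in range(x) for j in range(x) if i != j])
    Xc = np.asarray(vectors) - np.mean(vectors, axis=0)
    tep = Xc @ np.linalg.svd(Xc, full_matrices=False)[2][0]  # first-PC scores
    G = nx.DiGraph()
    G.add_nodes_from(range(len(tep)))
    for a in range(len(tep)):
        for b in range(a + 1, len(tep)):
            i = np.arange(a + 1, b)
            line = tep[b] + (tep[a] - tep[b]) * (b - i) / (b - a)
            if np.count_nonzero(tep[a + 1:b] >= line) <= N:  # limited penetrable criterion
                G.add_edge(a, b)
    return G


def window_sensitivity(series, window_lengths, N=1):
    """Rebuild the network for each window length and collect the Sec. 4.3 indexes."""
    rows = []
    for w in window_lengths:
        U = build_mts_dan(series, w, N).to_undirected()
        rows.append((w, U.number_of_nodes(), U.number_of_edges(),
                     nx.average_clustering(U), nx.average_shortest_path_length(U)))
    return rows


rng = np.random.default_rng(0)
demo = np.cumsum(rng.normal(size=(4, 200)), axis=1)  # four synthetic random walks, L = 200
for w, n_nodes, n_edges, cc, apl in window_sensitivity(demo, [20, 50, 100]):
    print(f"w={w:3d}  nodes={n_nodes:3d}  edges={n_edges:4d}  C={cc:.3f}  path={apl:.3f}")
```

Because the DLPVG always links each node to its next N+1 successors, every network in the sweep is connected, so the average path length is well defined, and the number of nodes is always L − w + 1.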
Figure 6: Statistical characteristics of the MTS-DAN obtained by SST time series data with the different periods. (a) The number of nodes and edges. (b) The average clustering coefficient and the average path length of the networks. (c) The distribution of the degree. ‘W’ represents the length of sliding window.
4.3.2 Financial Time Series Data
The statistical characteristics of the MTS-DANs obtained from the spot price time series with different periods are shown in Figure 7. The numbers of nodes and edges both decrease as the sliding window length increases (Figure 7 (a)). The number of edges, the average clustering coefficient and the average path length all change most markedly when the window length is between 10 and 200 days (Figure 7 (a) and (b)). We therefore set the sliding window to 10, 100, 200, 300, 500 or 1000 days and obtain the degree distributions. In all cases the degree centrality of the nodes follows a power-law distribution $p(w) \sim w^{-\gamma}$ (Figure 7 (c)). According to these statistics, the network structure fluctuates most when the window length is between 10 and 200 days. The network is most sensitive in this range, indicating that the relationships among the multivariate spot price time series change sharply over these periods.
Figure 7: Statistical characteristics of the MTS-DAN obtained by spot price time series data with the different periods. (a) The number of nodes and edges. (b) The average clustering coefficient and the average path length of the networks. (c) The distribution of the degree. ‘W’ represents the length of sliding window.
4.4 Long-Term Trends and Clustering Effect of Associated Relationship among Multivariate Time Series
Over the full period, we analyze the long-term trends in the evolution of the relationships among multivariate time series. We apply a community partitioning algorithm [36, 37] to analyze the clustering effect of the MTS-DAN: densely connected nodes are grouped into the same community by making the modularity as large as possible.
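The section cites [36, 37] without naming the algorithm here, so the sketch below substitutes NetworkX's greedy modularity maximization (Clauset-Newman-Moore) as one modularity-based option, ignores edge directions, and reports the communities holding more than a chosen share of the nodes (10% mirrors the threshold used in Section 4.4.1). The function name is hypothetical.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity


def main_communities(G, min_share=0.10):
    """Partition a network by greedy modularity maximization and return all communities,
    those holding more than min_share of the nodes, their joint share of the nodes,
    and the modularity of the partition."""
    U = G.to_undirected()  # assumption: edge directions are ignored for partitioning
    communities = sorted(greedy_modularity_communities(U), key=len, reverse=True)
    n = U.number_of_nodes()
    main = [c for c in communities if len(c) / n > min_share]
    share = sum(len(c) for c in main) / n
    return communities, main, share, modularity(U, communities)


# Two dense groups joined by a single edge should fall into two communities.
G = nx.DiGraph(nx.barbell_graph(5, 0))
communities, main, share, Q = main_communities(G)
print(len(communities), share)  # 2 1.0
```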
4.4.1 SST Time Series Data
For the SST time series, a single association pattern describes the five series over the full period (Figure 8 (a)). The associated relationships from Point A to Point D, from Point B to Point A and from Point E to Point A are the stronger ones, so the pattern is denoted P(Point A(Point D), Point B(Point A), Point E(Point A)). This pattern describes the long-term associated relationship among the five SST time series in the Yellow Sea. With a period of 300 days, the MTS-DAN constructed from the SST data has ten communities (Figure 8 (c)). Each of three communities contains more than 10% of the nodes, and together these three contain 69.12% of the nodes, so they are the main communities representing the entire network. Figure 8 (b) shows the evolution of the associated relationships in the long-term process. The community partitioning algorithm assigns every association pattern to one community, and a clustering effect appears in the long-term evolution. This community analysis provides important information for studying and predicting the evolution of the relationships among SST time series.
Figure 8: (a) Association pattern of the MTS-DAN constructed by SST data during the entire period. (b) The evolution behavior of the associated relationship in the long-term process (W = 300 days). Labels 0 to 9 denote the ten communities. (c) The result of community division of the MTS-DAN. There are three main communities in the network when the period is 300 days.
4.4.2 Financial Time Series Data
The long-term trends and clustering effect of the MTS-DAN constructed from the spot price time series are shown in Figure 9. Over the entire period, the association pattern among the four spot price series is P(WTI(BRT), BRT(USG), NYH(WTI, BRT)) (Figure 9 (a)). With a period of 200 days, the MTS-DAN is divided into seven communities (Figure 9 (c)). Each of three communities contains more than 17% of the nodes, and together they contain 71.48% of the nodes, so these three are the main communities representing the entire network. Figure 9 (b) shows that a clustering effect appears in the long-term evolution of the associated relationships among the spot price time series, which provides a basis for analyzing the international oil market.
p ro
Figure 9: (a) Association pattern of the MTS-DAN constructed from oil price data over the entire period. (b) Evolution behavior of the associated relationship in the long-term process (W = 200 days); labels 0 to 6 denote the seven communities. (c) Community division of the MTS-DAN; the network contains three main communities when the period is 200 days.
4.5 Analysis between the MTS-DAN and Actual Events
In the MTS-DAN, the value of transfer entropy represents the strength of the associated relationship (as described in Section 3.2): the larger the value, the closer the relationship. The associated relationships among the multivariate time series are expressed as high-dimensional data and then mapped into a one-dimensional space. Following the DLPVG algorithm, the number of connections between a node and other nodes depends on the values of that node and of its neighboring nodes. To find the key nodes, we use betweenness centrality (BC), which measures the importance of a node within the MTS-DAN by the fraction of shortest paths passing through it. Nodes with high betweenness centrality strongly influence the conversion between associated relationships in the MTS-DAN, so analyzing the relation between these key association patterns and actual events helps connect the evolution behavior of the relationship to the events. The betweenness centrality of node $a$ is calculated as follows:

$$C_B(a) = \frac{2}{(N-1)(N-2)} \sum_{s \neq a \neq t} \frac{g_{st}(a)}{g_{st}} \qquad (6)$$

where $g_{st}$ is the total number of shortest paths between nodes $s$ and $t$, $g_{st}(a)$ is the number of those paths that pass through node $a$, and $N$ is the number of nodes in the network.

4.5.1 SST Time Series Data
This paper analyzes the relation between the evolution behavior of the relationship among the multivariate SST data and El Niño phenomena [38-40]. El Niño indexes [41, 42] differ internationally; following the National Centers for Environmental Prediction (NCEP), we use the Niño3 index to determine El Niño events. Table 2 lists the years and durations of the El Niño events from 1993 to 2017. An edge between two nodes in the MTS-DAN reflects a conversion between two association patterns, so mining the physical characteristics of the MTS-DAN is useful for analyzing SST fluctuations. Based on the sensitivity analysis of the SST data, we set the sliding-window length to 100, 300, 365 or 500 days to construct the respective MTS-DANs. The betweenness centrality results show a few distinct peaks, and a peak indicates that the association pattern corresponding to that node is significant. Motivated by the scale-free character of the network, we take the top 20% of nodes by betweenness centrality as key nodes. Figure 10 shows the key nodes of the MTS-DANs. The time corresponding to a key node mostly falls in the second half of an El Niño event.

Table 2: Year and duration of the El Niño events from 1993 to 2017
Year        Duration (months)
1997/1998   14
2002/2003   9
2006/2007   6
2009/2010   11
2014/2016   14
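Equation (6) and the top-20% key-node rule can be combined in a short sketch. It assumes the MTS-DAN is treated as an unweighted, undirected graph given as an adjacency dictionary (which matches the $2/((N-1)(N-2))$ normalization), computes $C_B(a)$ with Brandes' accumulation algorithm rather than by enumerating shortest paths, and rounds 20% of $N$ up. The names `betweenness_centrality` and `key_nodes` are hypothetical; this is not the authors' implementation.

```python
from collections import deque
import math


def betweenness_centrality(adj):
    """Normalized betweenness centrality C_B(a) as in Equation (6).

    adj: dict mapping each node to a list of neighbours; assumed undirected
    (symmetric) and unweighted. For each source s, a BFS counts shortest
    paths, then pair dependencies g_st(a) / g_st are accumulated backwards.
    """
    nodes = list(adj)
    n = len(nodes)
    bc = {v: 0.0 for v in nodes}
    for s in nodes:
        sigma = {v: 0 for v in nodes}   # number of shortest s-v paths
        dist = {v: -1 for v in nodes}   # BFS distance from s (-1 = unseen)
        preds = {v: [] for v in nodes}  # predecessors on shortest paths
        sigma[s], dist[s] = 1, 0
        order = []                      # nodes in non-decreasing distance
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in nodes}  # dependency of s on each node
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # bc now sums over ordered pairs (s, t), i.e. each unordered pair twice,
    # so 2 / ((N-1)(N-2)) on the unordered sum equals 1 / ((N-1)(N-2)) here.
    if n > 2:
        for v in bc:
            bc[v] /= (n - 1) * (n - 2)
    return bc


def key_nodes(bc, fraction=0.2):
    """Top `fraction` of nodes ranked by betweenness centrality (rounded up)."""
    k = max(1, math.ceil(fraction * len(bc)))
    return sorted(bc, key=bc.get, reverse=True)[:k]
```

For a five-node star, the centre lies on every leaf-to-leaf shortest path, so its normalized value is 1 and it is the single key node: `key_nodes(betweenness_centrality({0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}))` returns `[0]`.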
Figure 10: Relation between the key nodes of the MTS-DAN and the El Niño events. Yellow dashed boxes mark key nodes and solid red boxes mark events. Panels (a), (b), (c) and (d) show the results for periods of 100, 300, 365 and 500 days, respectively.
The key nodes partly coincide with the events, and the experiment exhibits two types of error. In the first, a key node appears but no event occurs; in the second, an event occurs but no key node appears. We call these two kinds of nodes noise nodes and error nodes, respectively. We evaluate the results with precision, recall and the F1 value: precision reflects the accuracy and recall the completeness of detecting event occurrences. Because precision and recall typically trade off against each other, we combine them into the F1 value. Precision, recall and F1 are all 100% when the period is 300 days (Table 3).
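The evaluation above can be sketched as follows. The text does not state how a key node is matched to an event, so this sketch assumes that each key node and each event is represented by a closed time interval (for a key node, hypothetically, the span of its sliding window) and that a match means the intervals overlap. Key nodes without a matching event count as noise nodes (false positives) and events without a matching key node as error nodes (false negatives); the function names are hypothetical.

```python
def overlaps(x, y):
    """True if closed intervals x = (start, end) and y = (start, end) intersect."""
    return x[0] <= y[1] and y[0] <= x[1]


def evaluate(key_intervals, event_intervals):
    """Precision, recall and F1 of key nodes against events.

    Precision: share of key nodes that overlap some event (others are noise nodes).
    Recall: share of events that overlap some key node (others give error nodes).
    """
    hit_keys = sum(any(overlaps(k, e) for e in event_intervals) for k in key_intervals)
    hit_events = sum(any(overlaps(e, k) for k in key_intervals) for e in event_intervals)
    precision = hit_keys / len(key_intervals) if key_intervals else 0.0
    recall = hit_events / len(event_intervals) if event_intervals else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Counting precision over key nodes and recall over events keeps the two error types separate, as in the definitions of noise nodes and error nodes above.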
Table 3: Precision, recall and F1 value at the different periods (SST data)

Period (days)   100    300   365    500
Precision       0.83   1     0.80   0.80
Recall          0.83   1     0.80   0.80
F1              0.83   1     0.80   0.80
4.5.2 Financial time series data
According to the literature [43, 44], four events related to the crude oil market occurred between 2000 and 2015, as listed in Table 4.

Table 4: Events related to the crude oil market from 2000 to 2015

Event                                   Date (year.month)
Iraq War                                2003.3
Hurricane Katrina (Gulf of Mexico)      2005.8
Global financial crisis                 2008.9
Unrest in Syria and the Middle East     2010-2014
Analyzing the MTS-DAN constructed from the oil price time series is useful for studying the crude oil market. Based on the sensitivity analysis of the oil price data, we set the sliding-window length to 30, 100, 150 or 200 days to construct the respective MTS-DANs. Figure 11 shows the key nodes of the MTS-DANs.
Figure 11: Relation between the key nodes of the MTS-DAN and the events in the crude oil market. Yellow dashed boxes mark key nodes and solid red boxes mark events. Panels (a), (b), (c) and (d) show the results for periods of 30, 100, 150 and 200 days, respectively.
Table 5 shows the precision, recall and F1 values for periods of 30, 100, 150 and 200 days. All three measures reach 100% when the period is 150 days, which indicates that fluctuations in the crude oil market are most closely related to the events at this period. This conclusion is consistent with the strategic petroleum reserves (SPRs) maintained by some developed countries. SPRs reduce the risks to the world economy that arise from reliance on geographically concentrated crude oil supplies and are of great significance to many countries [48].

Table 5: Precision, recall and F1 value at the different periods (oil price data)

Period (days)   30     100    150   200
Precision       0.75   0.75   1     0.75
Recall          0.75   0.75   1     0.75
F1              0.75   0.75   1     0.75

4.6 Performance analysis
For comparison, we also construct networks with the Granger causality test, mutual information and LPVG. Jiang et al. [43] used the Granger causality test to reconstruct a complex network from multivariate time series: they treated the causal patterns among the time series as nodes and the succession between consecutive patterns as edges, and used this network to study the relationships among four oil stock time series. We apply the SST and financial data to construct networks with the Granger causality test, mutual information and LPVG, and compare the betweenness centrality results of each network with the events to assess the performance of the MTS-DAN. Figure 12 shows the precision, recall and F1 value at the different periods obtained by the three comparison methods and by the MTS-DAN. The MTS-DAN achieves higher precision, recall and F1 values than the Granger causality test, mutual information and LPVG.
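Both the LPVG baseline and the DLPVG used by the MTS-DAN rest on the limited penetrable visibility criterion (Zhou [25]): two points are linked if at most $L$ intermediate points block the straight line between them, and $L = 0$ gives the natural visibility graph of Lacasa et al. [23]. A minimal sketch of the undirected LPVG follows, assuming unit-spaced time stamps and treating a point lying exactly on the sight line as blocking; it does not implement the directed variant, and `lpvg_edges` is a hypothetical name.

```python
def lpvg_edges(series, limit):
    """Edges (a, b), a < b, of a limited penetrable visibility graph.

    Points a and b are linked when at most `limit` intermediate points k
    (a < k < b) reach or exceed the straight sight line between
    (a, series[a]) and (b, series[b]). limit = 0 is the natural visibility graph.
    """
    edges = []
    n = len(series)
    for a in range(n - 1):
        for b in range(a + 1, n):
            ya, yb = series[a], series[b]
            blockers = 0
            for k in range(a + 1, b):
                # Height of the sight line at time k.
                line = yb + (ya - yb) * (b - k) / (b - a)
                if series[k] >= line:
                    blockers += 1
                    if blockers > limit:
                        break  # too many blocking points, no need to continue
            if blockers <= limit:
                edges.append((a, b))
    return edges
```

For the series [1, 3, 2, 4], `limit=0` keeps four edges, while `limit=1` lets the sight lines from the first point penetrate the peak at index 1 and links all six pairs.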
Figure 12: (a) Results for the SST data. (b) Results for the financial data. MP, MR and MF are the precision, recall and F1 value obtained with mutual information; GP, GR and GF are those obtained with the Granger causality test; LP, LR and LF are those obtained with LPVG; and P, R and F are those obtained with the MTS-DAN.
5. Conclusions

In this paper, we propose a method for constructing the Multivariate Time Series-Dynamic Association Network (MTS-DAN). We map the associated relationships into a complex network to explore the characteristics of the evolution behavior of the relationship among multivariate time series. Experiments on artificially generated data, SST data and financial data show that the MTS-DAN is of practical value for analyzing the associated relationship among multivariate time series. First, the node degrees of the network follow a power-law distribution, which indicates that key association patterns exist among the multivariate time series; the results on artificially generated data also show that the MTS-DAN has good recognition and anti-noise ability. Second, the key association patterns identified by betweenness centrality are related to actual events, which suggests that studying the evolution behavior of the relationship among multivariate time series is useful for predicting such events. Third, the community division of the network shows that the association patterns cluster during the long-term evolution process. Finally, the MTS-DAN method is not restricted to a particular type or number of time series: it can handle not only four or five time series, as in our experiments, but also fewer or more, which is meaningful for data mining research. Future work could explore the upper and lower bounds on the number of time series the method can handle.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Funding Statement

This work was supported by National Program on Key Research Project of China [2016YFC1401900] and Open fund of the Key Laboratory of Digital Ocean, SOA [B201801030].
References

[1] An S, Gao X, Jiang M, et al. Multivariate financial time series in the light of complex network analysis. Physica A: Statistical Mechanics and its Applications, 2018, 503: 1241-1255.
[2] Xu M, Han M, Lin H. Wavelet-denoising multivariate echo state networks for multivariate time series prediction. Information Sciences, 2018.
[3] Yan Y, Zhang S, Tang J, et al. Understanding characteristics in multivariate traffic flow time series from complex network structure. Physica A: Statistical Mechanics and its Applications, 2017, 477.
[4] Huang Q, Zhao C, Wang X, et al. Predicting the structural evolution of networks by applying multivariate time series. Physica A: Statistical Mechanics and its Applications, 2015, 428: 470-480.
[5] Lu S, Zhang H, Li X, et al. Modeling the global ionospheric variations based on complex network. Journal of Atmospheric and Solar-Terrestrial Physics, 2018.
[6] Óskarsdóttir M, Calster T V, Baesens B, et al. Time series for early churn detection: using similarity based classification for dynamic networks. Expert Systems with Applications, 2018, 106.
[7] Du X, Shao F, Wu S, et al. Complex network modeling for mechanisms of red tide occurrence: A case study in Bohai Sea and North Yellow Sea of China. Ecological Modelling, 2017, 361: 41-48.
[8] Deza J I, Ihshaish H. The construction of complex networks from linear and nonlinear measures – Climate networks. Procedia Computer Science, 2015, 51: 404-412.
[9] Long W, Guan L, Shen J, et al. A complex network for studying the transmission mechanisms in stock market. Physica A: Statistical Mechanics and its Applications, 2017, 484: 345-357.
[10] An Y, Sun M, Gao C, et al. Analysis of the impact of crude oil price fluctuations on China's stock market in different periods: Based on time series network model. Physica A: Statistical Mechanics and its Applications, 2018, 492.
[11] Tanizawa T, Nakamura T, Taya F. Directed networks with underlying time structures from multivariate time series. Physics, 2014.
[12] Tuncel K S, Baydogan M G. Autoregressive forests for multivariate time series modeling. Pattern Recognition, 2017.
[13] Li J, Pedrycz W, Jamal I. Multivariate time series anomaly detection: A framework of hidden Markov models. Applied Soft Computing, 2017.
[14] Gao W, Ye W C W. Directed information graphs for the Granger causality of multivariate time series. Physica A: Statistical Mechanics and its Applications, 2017, 486.
[15] Charakopoulos A K, Katsouli G A, Karakasidis T E. Dynamics and causalities of atmospheric and oceanic data identified by complex networks and Granger causality analysis. Physica A: Statistical Mechanics and its Applications, 2018, 495.
[16] Gao W, Ye W C W. Directed information graphs for the Granger causality of multivariate time series. Physica A: Statistical Mechanics and its Applications, 2017, 486.
[17] Xue C, Song W, Qin L, et al. A mutual-information-based mining method for marine abnormal association rules. Computers & Geosciences, 2015, 76: 121-129.
[18] Escolano F, Hancock E R, Lozano M A, et al. The mutual information between graphs. Pattern Recognition Letters, 2016, 87(C): 12-19.
[19] He J, Shang P. Comparison of transfer entropy methods for financial time series. Physica A: Statistical Mechanics and its Applications, 2017, 482.
[20] Mao X, Shang P. Transfer entropy between multivariate time series. Communications in Nonlinear Science and Numerical Simulation, 2017, 47: 338-347.
[21] Choi H. Localization and regularization of normalized transfer entropy. Neurocomputing, 2014, 139: 408-414.
[22] Guo C, Yang F, Yu W. A causality capturing method for diagnosis based on transfer entropy by analyzing trends of time series. IFAC-PapersOnLine, 2015, 48(21): 778-783.
[23] Lacasa L, Luque B, Ballesteros F, et al. From time series to complex networks: the visibility graph. Proceedings of the National Academy of Sciences of the United States of America, 2008, 105(13): 4972-4975.
[24] Luque B, Lacasa L, Ballesteros F, et al. Horizontal visibility graphs: exact results for random time series. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, 2009, 80(2): 046103.
[25] Zhou T T. Limited penetrable visibility graph for establishing complex network from time series. Acta Physica Sinica, 2012, 61(3): 355-367.
[26] Wang J, Yang C, Wang R, et al. Functional brain networks in Alzheimer's disease: EEG analysis based on limited penetrable visibility graph and phase space method. Physica A: Statistical Mechanics and its Applications, 2016, 460: 174-187.
[27] Gao Z K, Cai Q, Yang Y X, et al. Time-dependent limited penetrable visibility graph analysis of nonstationary time series. Physica A: Statistical Mechanics and its Applications, 2017, 476: 43-48.
[28] Li X, Sun M, Gao C, et al. The parametric modified limited penetrable visibility graph for constructing complex networks from time series. Physica A: Statistical Mechanics and its Applications, 2018, 492.
[29] Wang R F, Liu J, Wang J. Epileptic EEG analysis algorithm based on power spectrum and limited penetrable visibility graph. Journal of Computer Applications, 2016.
[30] Borgonovo E, Buzzard G T, Wendell R E. A global tolerance approach to sensitivity analysis in linear programming. European Journal of Operational Research, 2017, 267.
[31] Hamdia K M, Ghasemi H, Zhuang X, et al. Sensitivity and uncertainty analysis for flexoelectric nanostructures. Computer Methods in Applied Mechanics and Engineering, 2018, 337.
[32] Bouhlel J, Jouanrimbaud D B, Abouelkaram S, et al. Comparison of common components analysis with principal components analysis and independent components analysis: Application to SPME-GC-MS volatolomic signatures. Talanta, 2018, 178: 854.
[33] Alharbi T. Energy resolution improvement of CdTe detectors by using the principal component analysis technique. Nuclear Instruments and Methods in Physics Research, 2018, 882: 114-116.
[34] Schimit P H T, Pereira F H. Disease spreading in complex networks: A numerical study with principal component analysis. Expert Systems with Applications, 2018, 97: 41-50.
[35] Gupta A, Barbu A. Parameterized principal component analysis. Pattern Recognition, 2018, 78.
[36] Blondel V D, Guillaume J L, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10): P10008.
[37] Yang L, Ying S, Shan J, et al. The research of weighted community partition based on SimHash. Procedia Computer Science, 2013, 17: 797-802.
[38] Alizadehchoobari O. Contrasting global teleconnection features of the eastern Pacific and central Pacific El Niño events. Dynamics of Atmospheres and Oceans, 2017, 80: 139-154.
[39] Trenberth K E. El Niño Southern Oscillation (ENSO). Journal of Hydrologic Engineering, 2013(5).
[40] Wang L, Li Q, Mao X Z, et al. Interannual sea level variability in the Pearl River Estuary and its response to El Niño–Southern Oscillation. Global and Planetary Change, 2018, 162: 163-174.
[41] Gong H, Wang Z G, Zhao C H. A new index for El Niño. Marine Forecasts, 2017, 34(3): 17-25.
[42] Liu S, Peng X, Chen Q, et al. The 1997–1998 El Niño event recorded by a stalagmite from central China. Quaternary International, 2018, 487: 71-77.
[43] Jiang M, Gao X, An H, et al. Reconstructing complex network for characterizing the time-varying causality evolution behavior of multivariate time series. Scientific Reports, 2017, 7(1): 10486.
[44] Huang X, An H, Gao X, et al. Multiresolution transmission of the correlation modes between bivariate time series based on complex network theory. Physica A: Statistical Mechanics and its Applications, 2015, 428: 493-506.
[45] Koutlis C, Kugiumtzis D. Discrimination of coupling structures using causality networks from multivariate time series. Chaos, 2016, 26(9): 424.
[46] Gao Z K, Small M, Kurths J. Complex network analysis of time series. EPL (Europhysics Letters), 2016, 116(5): 50001.
[47] Gao Z K, Fang P C, Ding M S, et al. Multivariate weighted complex network analysis for characterizing nonlinear dynamic behavior in two-phase flow. Experimental Thermal and Fluid Science, 2015, 60: 157-164.
[48] Weimer D L. Strategic petroleum reserves. Encyclopedia of Energy, 2004: 739-748.
Author Contributions

X.Y., S.S. and L.X. designed the structure of the research; X.Y. performed the research; X.Y. and J.Y. designed the model; X.Y. and Y.L. analyzed the data; X.Y. wrote the paper. All co-authors of this paper reviewed the manuscript.