Chaos, Solitons and Fractals 123 (2019) 294–303
Chaos, Solitons and Fractals Nonlinear Science, and Nonequilibrium and Complex Phenomena journal homepage: www.elsevier.com/locate/chaos
Applying correlation dimension to the analysis of the evolution of network structure

Chun-Xiao Nie

School of Statistics and Mathematics, Zhejiang Gongshang University, Hangzhou 310018, China
Article info
Article history: Received 20 November 2018 Revised 2 March 2019 Accepted 12 April 2019
MSC: 00-01, 99-00
Keywords: Correlation dimension; Financial market; Threshold network; Jaccard distance
Abstract

We propose a network model that includes critical events, at which the network structure changes drastically. We show that the correlation dimension can characterize changes in the network structure globally. Using the model, we find a relationship between the stability of the network structure and the dimension. The structural evolution of the network is divided into phases, with a critical event separating consecutive phases; the less the network changes within each phase, the smaller the dimension of the resulting Jaccard distance matrix. We also validate the conclusions drawn from the model with real data. We construct threshold networks from stock data and find that the dimension obtained from surrogate time series is larger than that obtained from the original data, which implies that the one factor model does not fully capture the changes in the network structure. Finally, by comparison with a network model based on the Erdös-Rényi random graph, we find that the correlation dimension can capture hidden temporal features of a collection of networks.
https://doi.org/10.1016/j.chaos.2019.04.022
0960-0779/© 2019 Elsevier Ltd. All rights reserved.

1. Introduction

In the last two decades, complex networks have been widely studied as a modern tool for analyzing natural and social systems [1–5]. In particular, non-trivial structures such as motifs and communities have important practical meaning and have attracted wide attention [6–9]. As the number of constructed networks grows, mining patterns in collections of networks has emerged as an important topic. For example, researchers have used a spectral approach to characterize the similarity between graphs and studied datasets consisting of large numbers of graphs, where each data point is an Erdös-Rényi graph, a Chung-Lu graph, and so on [10]. In this article, we focus on collections of networks whose structure evolves.

First, we propose a toy model that generates a collection of networks with evolving structure. The first feature of the model is that the structure evolves over time. Evolving network models have previously been constructed for purposes such as evaluating community detection algorithms: Granell et al. proposed a model with community structure [11], and Sengupta et al. proposed a model with overlapping community structure [12]. Unlike these models, the proposed model focuses not on changes in community structure but on sudden changes in the global structure. Thus, the second feature of the model is that it contains critical events, near which the network structure changes drastically. We find that the evolution of financial threshold networks resembles that of the model presented here.

To describe this global structural change, we use the correlation dimension as a basic indicator and study the relationship between changes in network structure and the dimension. The correlation dimension is an important concept in nonlinear dynamics and is often used to estimate the dimension of attractors [13,14]. Beyond nonlinear systems, correlation dimensions have been used in financial data analysis [15–17] and in machine learning [18,19]. Recently, the correlation dimension has been used to analyze stocks in the financial market [20]. Furthermore, a model has been used to study the relationship between the correlation dimension and cluster structure [21], and random matrix theory has been applied to characterize cluster structure and compared with the method based on the correlation dimension [22].

The evolution of different types of networks has attracted wide attention. For example, some studies focus on how networks built from correlation matrices change over time, a setting common in the analysis of financial data [23], and previous research has examined the evolution of community structure in Blogosphere-based networks [24]. In this article, to apply the correlation dimension to changes in network structure, we must first transform the evolution of the network structure into a distance matrix. Many metrics have been proposed to measure the distance between networks; many of them are based on the maximum common subgraph (MCS), but finding the MCS is NP-hard [25–34]. Here, we use the Jaccard distance to describe the structural difference between networks, so that the Jaccard distance matrix directly expresses the evolution of the structure. In network analysis, the Jaccard similarity (distance) defined on edge sets characterizes structural similarity [35,36], and the Jaccard distance can be proved to be a metric [37]. In many fields, the Jaccard similarity or distance is a basic measure of similarity between sets [38–42]. The Jaccard distance has also been used to analyze network dynamics: Arnaboldi, Passarella et al. used the Jaccard similarity to analyze the stability of ego networks [43]. It has also been used to study similarities between networks in the natural sciences, such as metabolic networks [44]. Besides the Jaccard distance, measures such as the Hamming distance and spanning tree similarity can characterize differences between network structures [35]. Furthermore, spectral methods have been used to characterize networks; for example, eigenvalues of adjacency matrices and the Frobenius norm have been applied to quantify how far networks deviate from normality [45]. To study the proposed model in real data analysis, we construct a series of networks from time series and find that they share some features with the model.
Because financial time series contain dynamics, a series of networks can be constructed from them. For example, by specifying a time window and sliding it along the series to compute correlation coefficient matrices, we obtain a series of networks based on these matrices. A distance defined on networks then characterizes the relationships among them, so the dynamics of the network can be studied by analyzing the distance matrix. We focus on the distance matrix of networks generated from correlation matrices in the financial market. In recent years, such networks have received increasing attention, and many papers have studied financial networks in different markets [46–49]. In econophysics, researchers often use the MST (Minimum Spanning Tree) and PMFG (Planar Maximally Filtered Graph) algorithms to construct networks (graphs) [46,47], or directly apply threshold methods to extract network structure from correlation matrices [48,49]. Given n financial time series, the MST and PMFG contain n − 1 and 3(n − 2) edges, respectively, and both effectively extract the structure of the correlation coefficient matrix. There are several ways to construct threshold networks: for example, Yang and Yang keep correlation coefficients whose absolute values exceed a specified value [48], while other methods keep correlation coefficients greater than a specified threshold [49]. Using these methods, researchers have systematically analyzed financial data from different markets, such as the stock market and the foreign exchange market [50–53]. Recently, researchers have also applied network metrics to analyze the dynamics of financial networks.
For example, in [54], the author studied the Jaccard distance matrix of financial threshold networks, and in [55], the researchers analyzed financial PMFGs using the edit distance matrix. Here, we use the correlation dimension to analyze the Jaccard distance matrix of financial threshold networks and find that the correlation dimension effectively captures the fine structure of the Jaccard matrix globally.

This paper is organized as follows. First, we describe the methods used in this paper, including the correlation dimension, the threshold network, a network model with community structure, and the model proposed in this article. Second, in the Results section, we use the model to compute correlation dimensions under different parameters and study the relationship between the dimension and the stability of network changes. Then, we construct an example from financial data and use the correlation dimension to analyze the difference between the structure extracted by the financial model and the hidden structure in the original data. Finally, we show that the correlation dimension can serve as a global indicator that captures the evolutionary structure in a collection of networks.

2. Data and methods

2.1. Data

We use price data for the constituents of the S&P100 index, removing stocks with missing data. We select 93 stocks and extract a daily closing price series for each stock from 2006/9/13 to 2010/9/15 (757 trading days in total). All closing price series used in this article are publicly available from Yahoo Finance (https://finance.yahoo.com/).

2.2. Threshold network

Assume there are $n$ stocks in total. To construct a threshold network, we first convert the price series $\{P_i(t)\}$ ($i = 1, \ldots, n$) of stock $i$ into a return series $\{r_i(t)\}$, where $r_i(t) = \log(P_i(t + 1)) - \log(P_i(t))$. Second, we compute the correlation coefficients in Eq. (1), which yields an $n \times n$ correlation coefficient matrix $\rho = [\rho(i, j)]$. Third, we generate a threshold network from this correlation matrix, where each node corresponds to a stock.
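$$\rho(i,j) = \frac{\langle r_i(t) r_j(t)\rangle - \langle r_i(t)\rangle \langle r_j(t)\rangle}{\left[\left(\langle r_i(t)^2\rangle - \langle r_i(t)\rangle^2\right)\left(\langle r_j(t)^2\rangle - \langle r_j(t)\rangle^2\right)\right]^{1/2}} \qquad (1)$$

where $\langle \cdot \rangle$ denotes the average over the time window.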
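The three steps of this subsection (log returns, the correlation matrix of Eq. (1), and the threshold network), together with the fixed-edge-density threshold and the sliding window described in the next paragraph, can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions, not the author's code: prices are assumed to be a `(T, n)` array, and instead of solving for $a_t$ explicitly, the sketch keeps the $\mathrm{round}(p_t \cdot \mathrm{Card}(\rho_{tri}))$ largest upper-triangular correlations, which matches the $p_t$ constraint up to ties. All function names are hypothetical.

```python
import numpy as np

def log_returns(prices: np.ndarray) -> np.ndarray:
    """prices: shape (T, n), one column per stock.
    Returns r_i(t) = log P_i(t + 1) - log P_i(t), shape (T - 1, n)."""
    return np.diff(np.log(prices), axis=0)

def correlation_matrix(returns: np.ndarray) -> np.ndarray:
    """Pearson correlation coefficients of Eq. (1); columns are stocks."""
    return np.corrcoef(returns, rowvar=False)

def threshold_edges(rho: np.ndarray, p_t: float) -> set:
    """Edge set of the threshold network: the round(p_t * Card(rho_tri)) node pairs
    with the largest correlations, i.e. pairs with rho(i, j) > a_t (ties broken arbitrarily)."""
    iu, ju = np.triu_indices(rho.shape[0], k=1)     # rho_tri = {rho(i, j), j > i}
    values = rho[iu, ju]
    m = int(round(p_t * values.size))               # fixed number of edges
    top = np.argsort(values)[::-1][:m]              # indices of the m largest values
    return {(int(iu[k]), int(ju[k])) for k in top}

def sliding_threshold_networks(returns: np.ndarray, window: int, p_t: float) -> list:
    """Slide a window of length L1 = window one step at a time: T - window + 1 networks."""
    T = returns.shape[0]
    return [threshold_edges(correlation_matrix(returns[s:s + window]), p_t)
            for s in range(T - window + 1)]
```

Because every network produced this way has the same number of edges, Jaccard distances between networks from different windows are directly comparable, which is the purpose of the $p_t$ constraint.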
Here, we use the threshold method common in existing research: a threshold is specified and the correlation coefficient matrix is converted into an adjacency matrix [48,49]. Because the elements of the correlation coefficient matrix may change dramatically between periods, the thresholds must be set so that different threshold networks can be properly compared. Furthermore, to allow comparison with the model in Section 2.7, and as in previous studies, we use a threshold constraint under which the number of edges in the threshold network does not change over time [56]. The threshold is determined as follows. First, we collect the upper triangular elements of $\rho$ into the set $\rho_{tri} = \{\rho(i,j), j > i\}$. Then the threshold constraint is defined by

$$p_t = \frac{\mathrm{Card}(\{g > a_t,\ g \in \rho_{tri}\})}{\mathrm{Card}(\rho_{tri})},$$

where $a_t$ is the selected threshold and Card denotes the cardinality of a set. Thus, when the number of stocks remains constant, the number of edges in the threshold network does not change, although the corresponding threshold may vary over time.

Studying the dynamics of the network naturally involves a sequence of networks. We construct this sequence by moving a time window of length $L_1$. If the length of the time series is $L$ and the window slides by one step at a time, the sequence contains $L - L_1 + 1$ networks. In our study, $L_1$ is set to 252 days, so that a total of 506 networks are computed (506 = 757 − 252 + 1).

2.3. Surrogate time series based on the one factor model

The one factor model is a financial model widely used in investment analysis, in both theory and practice [57]. The model uses the market return $r_{Market}(t)$ as a factor and estimates $\alpha_i$ and $\beta_i$ for each asset, as shown in Eq. (2). The term $\epsilon_i(t)$ is a random perturbation whose mean and variance are assumed to be 0 and $\sigma_i^2$, respectively. The model is also often used to generate time series for comparative analysis, as in [58] and [59].
$$r_i(t) = \alpha_i + \beta_i r_{Market}(t) + \epsilon_i(t) \qquad (2)$$
As in previous studies [58,59], we generate surrogate time series as follows. First, we use the average return of all considered stocks as the market return, $r_{Market}(t) = \frac{1}{n}\sum_i r_i(t)$. Second, we estimate $\alpha_i$ and $\beta_i$ for each stock and compute the standard deviation $\sigma_i$ of $\epsilon_i(t)$. Third, we use the estimated $\alpha_i$, $\beta_i$, and $\sigma_i$ to compute the surrogate time series (Eq. (3)), where $\epsilon_{i,surr}(t) = \sigma_i w_{Gauss}$ and $w_{Gauss}$ is a Gaussian variable with mean 0 and standard deviation 1. In this way, we extract the impact of the market on each stock and reconstruct the return series. We use these surrogate series to generate a baseline for comparison.
$$r_{i,surr}(t) = \alpha_i + \beta_i r_{Market}(t) + \epsilon_{i,surr}(t) \qquad (3)$$
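A minimal sketch of the surrogate construction in Eqs. (2) and (3) is given below. It assumes ordinary least squares for $\alpha_i$ and $\beta_i$ (the text does not name the estimator) and a residual standard deviation with two degrees of freedom removed; the function name `one_factor_surrogate` is hypothetical.

```python
import numpy as np

def one_factor_surrogate(returns: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """returns: shape (T, n), one column per stock. Returns surrogate returns (Eq. (3))."""
    T, n = returns.shape
    r_market = returns.mean(axis=1)                  # r_Market(t) = (1/n) sum_i r_i(t)
    X = np.column_stack([np.ones(T), r_market])      # regressors [1, r_Market(t)]
    coef, *_ = np.linalg.lstsq(X, returns, rcond=None)   # OLS fit of Eq. (2), all stocks at once
    alpha, beta = coef[0], coef[1]                   # alpha_i, beta_i, each of shape (n,)
    resid = returns - X @ coef                       # fitted epsilon_i(t)
    sigma = resid.std(axis=0, ddof=2)                # sigma_i (assumption: two fitted parameters)
    w_gauss = rng.standard_normal((T, n))            # w_Gauss ~ N(0, 1), independent per stock
    return alpha + np.outer(r_market, beta) + sigma * w_gauss
```

The surrogate returns can then be passed through the same sliding-window threshold construction as the original returns to obtain the baseline network sequence. Because the noise terms are independent across stocks, any co-movement in the surrogate series comes only from the market factor.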
2.4. Jaccard distance

Here, we use the Jaccard distance defined on networks as a basic tool. It takes values in the interval [0,1], and the larger the value, the smaller the similarity [35,36]. The Jaccard distance between networks $W_1(V, E_1)$ and $W_2(V, E_2)$ is defined in Eq. (4), where $E_1 \cap E_2$ denotes the intersection of the edge sets $E_1$ and $E_2$, and $E_1 \cup E_2$ denotes their union. Note that both networks share the node set $V$.

$$J(W_1, W_2) = 1 - \frac{\mathrm{Card}(E_1 \cap E_2)}{\mathrm{Card}(E_1 \cup E_2)} = 1 - \frac{\mathrm{Card}(E_1 \cap E_2)}{\mathrm{Card}(E_1) + \mathrm{Card}(E_2) - \mathrm{Card}(E_1 \cap E_2)} \qquad (4)$$

For a set of $N$ networks, we can compute an $N \times N$ Jaccard distance matrix, which describes the similarity between each network and every other network. In this way, the relationships among the networks are expressed directly by the Jaccard distance matrix. An important issue is therefore to construct indicators that characterize the global structure of the entire Jaccard distance matrix. Next, we apply the correlation dimension for this purpose.

2.5. Correlation dimension

Assume a set of networks $W = \{W_i(V_i, T_i), i = 1, \ldots, N\}$, where $V_i$ is the node set of network $W_i$ and $T_i$ is its adjacency matrix. If a distance $d$ between networks is defined, then $(W, d)$ is a metric space [60]; when $W$ is finite, it is a discrete metric space. For the set $W$, we compute the distance matrix $D = [d_{ij}]$, where $d_{ij} = d(W_i, W_j)$. Following the original definition of the correlation dimension [13,14], we compute the correlation function in Eq. (5).

$$C_n(r_k) = \frac{1}{n}\sum_{j} \frac{1}{n}\sum_{i} \Theta(r_k - d_{ij}) \qquad (5)$$

In Eq. (5), $r_k$ is the threshold and $\Theta$ is the Heaviside step function. If $C_n(r_k)$ and $r_k$ satisfy the relation in Eq. (6), then $Dim_{corr}$ is called the correlation dimension of the space $(W, d)$.

$$C_n(r) = C_0 r^{Dim_{corr}} \qquad (6)$$

In estimating the dimension, one question is how to choose the thresholds $\{r_k\}$. Here we use the method proposed in [20,21]. We set a proportion $p_1$, and the corresponding $r(1)$ satisfies $p_1 = \mathrm{Card}(\{l < r(1),\ l \in L_{tri}\})/\mathrm{Card}(L_{tri})$, where $L_{tri} = \{d_{ij}, j > i\}$. Similarly, $p_2$ is set and $r(2)$ is selected. We then take values $\{r_k\}$ in the interval $[r(1), r(2)]$ as thresholds. In our study, $p_1$ and $p_2$ must be set so that Eq. (6) fits well. To estimate $Dim_{corr}$, we use the least squares method in log-log coordinates to estimate the coefficient of the linear term in Eq. (7).

$$\log C_n(r) = \log C_0 + Dim_{corr} \log r \qquad (7)$$

In general, the Jaccard distance matrix changes as the network set changes, so the distributions of distances can differ significantly. To compare the dimensions of different network sets properly, we usually keep $p_1$ and $p_2$ unchanged in a comparative analysis. In the following sections, we use only the Jaccard distance; if another distance is defined, the method of calculating the dimension does not change.

2.6. An evolutionary network model with some communities

To study the application of the dimension to network structure more broadly, we use the benchmark model proposed by Granell et al. [11]. In this model, communities evolve over time, each time $t$ corresponds to a network $W_t$, and the number of nodes does not change, which makes it possible to compute the Jaccard distance and analyze the evolutionary structure. Granell et al. proposed three benchmark models [11]; here we use the grow-shrink benchmark to generate a network set. This model is built on the stochastic block model, in which two nodes in the same community are connected with probability $p_{in}$ and two nodes in different communities with probability $p_{out}$. It includes two communities, A and B, each with $n$ nodes ($N_A = n$, $N_B = n$), for a total of $N = 2n$ nodes. In addition, a fraction $f$ of the nodes in A is reassigned to B during the evolution. The total number of nodes is unchanged, and the number of nodes in community A is $N_A = n - nf[2x(t + \tau/4) - 1]$, where $x(t) = 2t'$ for $0 \le t' < 1/2$ and $x(t) = 2 - 2t'$ for $1/2 \le t' < 1$, with $t' \equiv (t/\tau + \phi) \bmod 1$. The constant $\phi$ is 0 when there are two communities and takes other values when there are more than two. The community sizes $(N_A, N_B)$ change as $t/\tau \bmod 1$ changes; for example, $(N_A, N_B) = (n - nf, n + nf)$ when $t/\tau \bmod 1 = 1/4$. When a node in community A is reassigned to community B, all of its links are removed, and the node is then connected to each node in B with probability $p_{in}$ and to each node in A with probability $p_{out}$. Details of this model and of other evolution models can be found in [11]; our parameter settings are consistent with that paper.

2.7. An evolutionary network model with some critical events

Since Erdös-Rényi random graphs are widely used in the study of complex networks [61], in this article we apply them to construct a model with structural
evolution, that is, a model in which the network structure changes over time. We assume a total of $n_{phase}$ phases $L = \{L_i, i = 1, \ldots, n_{phase}\}$, where each phase $L_k$ includes $n_k$ networks $\{W_k^j(V, T_k^j), j = 1, \ldots, n_k\}$, and each network $W_k^j(V, T_k^j)$ is generated from the previous one. Thus, the total number of networks is $N = \sum_{k=1}^{n_{phase}} n_k$. The node set $V$ never changes, $T_k^j$ is an adjacency matrix, and the adjacency matrices of different networks differ because of random reconnection (rewiring) operations.

To construct a model with critical events, we need further parameters. First, we set a parameter set $p_c = \{p_c^{i,i+1}, i = 1, \ldots, n_{phase} - 1\}$, where each element $p_c^{i,i+1}$ describes the sudden change in network structure from the $i$th phase $L_i$ to the $(i+1)$th phase $L_{i+1}$. Second, we set $n_{phase}$ intervals $q_{phase} = \{[q_1^k, q_2^k], k = 1, \ldots, n_{phase}\}$, where the interval $[q_1^k, q_2^k]$ corresponds to phase $L_k$ and is used to generate uniformly distributed random numbers. Thus, each phase $L_k$ corresponds to $n_k - 1$ random numbers $\{q_{L_k}^{i,i+1}, i = 1, \ldots, n_k - 1\}$ drawn uniformly from $[q_1^k, q_2^k]$; these numbers characterize the stability of the network within each phase. To distinguish networks in different phases, the elements of $p_c$ must be significantly larger than $\{q_2^k, k = 1, \ldots, n_{phase}\}$. Consequently, there are $n_{phase} - 1$ critical events between the $n_{phase}$ phases, and the network structure changes drastically at each critical event. The method takes a random graph $G(N_V, p)$ as input, where $N_V$ is the number of nodes and $p$ is the connection probability. Each network $W_k^j(V, T_k^j)$ in phase $L_k$ is generated from the previous network according to the following rules:

• In phase $L_1$, $W_1^1(V, T_1^1)$ is an Erdös-Rényi random graph $G(N_V, p)$.
• The first network $W_k^1(V, T_k^1)$ of phase $L_k$ ($k \neq 1$) is generated from the last network $W_{k-1}^{n_{k-1}}(V, T_{k-1}^{n_{k-1}})$ of the previous phase $L_{k-1}$: edges of that network are randomly reconnected, and the ratio of the number of reconnected edges to the number of all edges is $p_c^{k-1,k}$.
• If $j \neq 1$, $W_k^j(V, T_k^j)$ is generated from $W_k^{j-1}(V, T_k^{j-1})$: edges of $W_k^{j-1}(V, T_k^{j-1})$ are randomly reconnected, and the ratio of the number of reconnected edges to the number of all edges is $q_{L_k}^{j-1,j}$.
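The generation rules above can be sketched in plain Python, representing each network as a set of node pairs $(i, j)$ with $i < j$. The text does not specify how an edge is "randomly reconnected"; this sketch assumes that each selected edge is deleted and replaced by an edge between a uniformly chosen pair of nodes that was not connected before, which avoids multiple edges and keeps the number of edges fixed. All function names are hypothetical.

```python
import random

def er_graph(n_nodes: int, p: float, rng: random.Random) -> set:
    """Erdos-Renyi G(N_V, p): each pair (i, j), i < j, is an edge with probability p."""
    return {(i, j) for i in range(n_nodes) for j in range(i + 1, n_nodes) if rng.random() < p}

def rewire(edges: set, ratio: float, n_nodes: int, rng: random.Random) -> set:
    """Move round(ratio * |E|) randomly chosen edges to node pairs that were not edges
    before this step (assumed reconnection rule); no multi-edges, edge count preserved."""
    k = round(ratio * len(edges))
    removed = set(rng.sample(sorted(edges), k))
    new_edges = set()
    while len(new_edges) < k:                        # assumes the graph is far from complete
        i, j = sorted(rng.sample(range(n_nodes), 2))
        if (i, j) not in edges and (i, j) not in new_edges:
            new_edges.add((i, j))
    return (edges - removed) | new_edges

def critical_event_model(sizes, p_c, q_intervals, n_nodes, p, seed=0):
    """sizes: n_k per phase; p_c: rewiring ratios of the n_phase - 1 critical events;
    q_intervals: (q1, q2) per phase; within-phase ratios are drawn from U[q1, q2]."""
    rng = random.Random(seed)
    networks = [er_graph(n_nodes, p, rng)]           # W_1^1
    for k, (n_k, (q1, q2)) in enumerate(zip(sizes, q_intervals)):
        if k > 0:                                    # critical event between L_{k-1} and L_k
            networks.append(rewire(networks[-1], p_c[k - 1], n_nodes, rng))
        for _ in range(n_k - 1):                     # ordinary changes inside phase L_k
            networks.append(rewire(networks[-1], rng.uniform(q1, q2), n_nodes, rng))
    return networks
```

For instance, the setting described in Section 3.1 would correspond to `critical_event_model([200, 50, 100, 150, 100], [0.4, 0.2, 0.8, 0.5], [(0.01, 0.02)] * 5, n_nodes=200, p=0.15)`, which returns 600 edge sets.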
Note that multiple edges must be avoided when edges are randomly reconnected. Moreover, in the Erdös-Rényi random graph $G(N_V, p)$, each pair of nodes is connected at random with probability $p$; therefore, all networks generated by the above steps are random graphs. In this paper, $p$ is set to 0.15. For comparison with the random graph, we also set $p_t = 0.15$ when constructing the financial threshold networks, so that they can be compared with the Erdös-Rényi random graph $G(N_V, p)$ with $p = 0.15$.

3. Results

3.1. Analysis of the toy model based on the correlation dimension

In this section, we use the model to analyze possible causes of differences in the correlation dimension. To analyze the impact of heterogeneous critical events on the network structure, we set 5 phases and 4 critical events, with a different $p_c^{i,i+1}$ value for each event. First, the five phases contain 200, 50, 100, 150, and 100 networks, respectively, for a total of N = 600 networks. Second, we set all the intervals $\{[q_1^k, q_2^k], k = 1, \ldots, 5\}$ to the same interval $q_{phase} = [q_1^{phase}, q_2^{phase}]$, which means that, statistically, the level of network change is the same in every phase. Finally, we set $p_c = \{0.4, 0.2, 0.8, 0.5\}$, where the minimum is $p_c^{2,3} = 0.2$ and the maximum is $p_c^{3,4} = 0.8$, so that the critical events differ significantly. We vary $q_{phase}$ over 12 parameter sets, listed in the second column of Table 1. In addition, we need a random graph $G(N_V, p)$ as the first network $W_1^1(V, T_1^1)$. There are no strict constraints on the parameters of the initial network; here, we set $N_V = 200$ and $p = 0.15$. When constructing the threshold networks for the financial data in Section 3.2, we set $p_t = 0.15$ so that the results of this section can be compared with those of Section 3.2.

Below, we study the dimensions of the Jaccard matrices corresponding to different parameters. Fig. 1 shows three of the twelve Jaccard matrices, with $q_{phase} = [0.01, 0.02]$, $[0.035, 0.045]$, and $[0.065, 0.075]$. In Fig. 1(a), the changes in Jaccard distance caused by the critical events are clearly visible and produce five distinct modules. Furthermore, because $p_c^{2,3}$ is significantly smaller than $p_c^{3,4}$, the critical event between phases 2 and 3 causes a smaller change; that is, the difference between these two phases is smaller, which can be clearly observed in Fig. 1(a). Comparing the three matrices, the distances between networks in different phases in Fig. 1(a) are large relative to those within a phase, so the phases are clearly separated and their boundaries are easy to distinguish.

Table 1
Dimension estimation results for different Jaccard matrices.

Label   q_phase          Dim_corr   Ave_Jaccard   Std_Jaccard
No.1    [0.01, 0.02]     1.491      0.8254        0.1851
No.2    [0.015, 0.025]   1.718      0.8442        0.1664
No.3    [0.02, 0.03]     1.969      0.8570        0.1508
No.4    [0.025, 0.035]   2.215      0.8658        0.1395
No.5    [0.03, 0.04]     2.453      0.8721        0.1307
No.6    [0.035, 0.045]   2.748      0.8774        0.1223
No.7    [0.04, 0.05]     3.093      0.8819        0.1156
No.8    [0.045, 0.055]   3.447      0.8853        0.1094
No.9    [0.05, 0.06]     3.813      0.8883        0.1046
No.10   [0.055, 0.065]   4.194      0.8905        0.1001
No.11   [0.06, 0.07]     4.613      0.8926        0.0959
No.12   [0.065, 0.075]   5.012      0.8943        0.0921

Note: $Ave_{Jaccard} = \sum_{j>i} J(i,j) / \mathrm{Card}(\{J(i,j) \mid j > i\})$; $Std_{Jaccard} = \left[\sum_{j>i} (J(i,j) - Ave_{Jaccard})^2 / \mathrm{Card}(\{J(i,j) \mid j > i\})\right]^{1/2}$.
As $q_{phase}$ increases, the boundaries between phases in the Jaccard distance matrix become more blurred, and at $q_{phase} = [0.065, 0.075]$ only the differences near the critical events remain visible. To describe the Jaccard distance matrix globally, we extract its upper triangular elements and compute their mean ($Ave_{Jaccard}$) and standard deviation ($Std_{Jaccard}$). As $q_{phase}$ increases, the level of network change within each phase increases, the average distance between networks increases, and the standard deviation decreases. Consistent with the visual impression, Fig. 1(a) has a smaller $Ave_{Jaccard}$ and a larger $Std_{Jaccard}$ than Fig. 1(b) and Fig. 1(c) (Table 1).

Below, we study the evolution of the network from the perspective of the correlation dimension. In Sections 3.1 and 3.2, we always set $p_1 = 0.05$ and $p_2 = 0.1$ to estimate the dimensions. The calculations show that the correlation dimension captures the differences between the distance matrices in Fig. 1: the dimension of the matrix in Fig. 1(a) is significantly smaller than those of Fig. 1(b) and (c) (Table 1). Fig. 2 shows the estimation results for Fig. 1(a) and (c), where the dimensions are well estimated; the remaining results are given in the third column of Table 1. In our study, $q_2^{phase} - q_1^{phase}$ always equals 0.01, so the uniform distributions on all intervals $q_{phase}$ have the same variance, $\frac{1}{12}(q_2^{phase} - q_1^{phase})^2 \approx 8.33 \times 10^{-6}$. Therefore, we use the mean $m = (q_1^{phase} + q_2^{phase})/2$ as a single variable characterizing the interval $q_{phase}$; the smaller the mean, the more stable the network within each phase. Fig. 3 shows the relationship between the mean and the dimension: a smaller mean corresponds to a smaller dimension.

Fig. 1. The three subgraphs correspond to the parameters $q_{phase} = [0.01, 0.02]$ (a), $q_{phase} = [0.035, 0.045]$ (b), and $q_{phase} = [0.065, 0.075]$ (c) in Table 1, respectively.

3.2. Dimension estimation and comparison based on real data and the one factor model

Below, we analyze an example based on real-world data, in which each network in the set is a financial threshold network. We find that the dimension of the Jaccard distance matrix based on real data is close to the dimensions produced by our toy model, and that this matrix shows richer detail and has a significantly smaller dimension than the Jaccard distance matrix based on the one factor model.

First, we select the price series of the constituents of the S&P100 and compute a series of threshold networks with $p_t = 0.15$. For the selected period, 2006/9/13–2010/9/15, a total of 506 ($L - L_1 + 1 = 506$) threshold networks are generated. We then compute the Jaccard distance matrix (Fig. 4(a)) and its correlation dimension (Fig. 4(b)). Similarly, we use the one factor model to generate surrogate time series, construct a series of threshold networks with the same parameter $p_t$, and compute the Jaccard distance matrix shown in Fig. 5(a). Its correlation dimension is 1.96 (Fig. 5(b)), which is larger than the value shown in Fig. 4(b). Finally, we divide the data into two periods: the first from 2006/9/13 to 2008/9/12, and the second from 2007/9/17 to 2009/9/15. In our study, the collapse of Lehman Brothers is the boundary between the two phases: the last network of phase 1 covers 2007/9/14–2008/9/12, and the first network of phase 2 covers 2007/9/17–2008/9/15 (September 13 and 14 were a weekend). We then estimate the one factor model separately on the return series of the two periods. Note that the two periods overlap because of the time window used in the calculation. The first time
period and the second period each correspond to 253 threshold networks; the Jaccard distance matrix of all 506 networks is shown in Fig. 6(a), and its dimension is 2.222 (Fig. 6(b)).

We find a significant change in the network structure near the time of the Lehman Brothers collapse in Fig. 4, and a similar but less pronounced change in Fig. 5. In addition, Figs. 4 and 5 show that the threshold network changes over time and changes suddenly at certain points; these points are the critical events for dramatic changes in the network structure. There are still some differences among these Jaccard distance matrices. First, in Fig. 4, the differences between networks in the same phase are smaller, which makes the differences between phases more pronounced. Second, unlike the matrices in Figs. 4 and 5, the matrix in Fig. 6 contains only two modules: because the one factor model is estimated separately for the two phases, the similarity between networks in different phases is reduced. Finally, the correlation dimensions of both matrices are less than 3, similar to the toy-model results, for example when $q_{phase}$ is [0.01, 0.02]. However, the matrix in Fig. 4(a) has a smaller dimension, which, combined with the calculations in the previous section, suggests a higher level of stability within phases. Visually, Fig. 4(a) also shows more detail: large modules that themselves contain smaller modules. Such fine structure is not found in Fig. 5(a) or Fig. 6(a). Since, under the one factor model, the covariance between two assets ($i \neq j$) can be expressed directly as $\mathrm{cov}(r_{i,surr}, r_{j,surr}) = \beta_i \beta_j \mathrm{cov}(r_{Market}, r_{Market})$, our results show that the dynamics of the threshold network depend only partially on the market component, and that much of the remaining detail may be related to other factors, such as the industry to which a stock belongs.

Fig. 2. The two subgraphs show the dimension estimation results for $q_{phase} = [0.01, 0.02]$ (a) and $q_{phase} = [0.065, 0.075]$ (b), respectively.

Fig. 3. There is a linear relationship between $m$ and the dimension; a smaller dimension corresponds to a smaller $m$ value.

3.3. The time variation characteristics of the network collection
The network structure in the previously constructed model changes over time and includes some critical events. We have found that the Jaccard distance matrix generated by this model has a small dimension (Table 1). An issue to consider is whether the dimension changes significantly once the effects of time are eliminated. This section shows that the dimension can indeed be used to characterize whether a collection of networks is organized in time. First, we apply the Erdös-Rényi random graph model to construct a collection of networks with no evolutionary structure and calculate its Jaccard distance matrix. To allow comparison with the results of Section 3.1, we set the number of networks to $N = 600$. The parameters $N_V$ and $p$ of the random graph $G(N_V, p)$ are 200 and 0.15, respectively. For one generated example, the AveJaccard and StdJaccard of the Jaccard distance matrix are 0.9189 and 0.0037, respectively: AveJaccard is significantly larger than the values in Table 1, and StdJaccard is significantly smaller. Because the standard deviation of the elements is small, we use $[p_1, p_2] = [0.1, 0.25]$ when estimating the correlation dimension (Fig. 7(a)). In this example, the dimension is 405.6, which is significantly larger than the results in Sections 3.1 and 3.2. Following the same steps, we repeatedly generate Jaccard distance matrices from the Erdös-Rényi random graph model and estimate their dimensions, using the same parameters as in the example of Fig. 7(a). In total, we generated 300 Jaccard distance matrices; the frequency histogram of their dimensions is shown in Fig. 7(b) (note: the histograms in this article are normalized). The dimensions based on the random graph are clearly much larger than those in Sections 3.1 and 3.2.
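The AveJaccard value above can be checked with a short calculation. Treating each network as its edge set over the same $N_V$ nodes, the Jaccard distance is $d_J(A, B) = 1 - |E_A \cap E_B| / |E_A \cup E_B|$. For two independent $G(N_V, p)$ graphs, each of the $N_V(N_V - 1)/2$ possible edges lies in both graphs with probability $p^2$ and in at least one with probability $2p - p^2$, so for large $N_V$ the distance concentrates near $1 - p/(2 - p)$; for $p = 0.15$ this is $1 - 0.15/1.85 \approx 0.9189$. The following is a minimal NumPy sketch, not the paper's code: it assumes the Jaccard distance is taken over edge sets, the function names are hypothetical, and AveJaccard and StdJaccard are assumed to be the mean and standard deviation of the off-diagonal entries.

```python
import numpy as np


def sample_er_edge_sets(num_graphs, n_nodes, p, rng):
    """Sample num_graphs independent G(n_nodes, p) graphs.

    Each graph is a boolean row with one entry per unordered node pair
    (i < j); column k refers to the same node pair in every graph, so the
    rows are directly comparable as edge sets.
    """
    n_pairs = n_nodes * (n_nodes - 1) // 2
    return rng.random((num_graphs, n_pairs)) < p


def jaccard_distance_matrix(edges):
    """Pairwise Jaccard distances 1 - |A & B| / |A | B| between edge sets."""
    x = edges.astype(np.float32)
    # Entry (a, b) counts the edges shared by graphs a and b; the counts are
    # integers far below 2**24, so float32 represents them exactly.
    inter = (x @ x.T).astype(np.float64)
    sizes = edges.sum(axis=1).astype(np.float64)
    union = sizes[:, None] + sizes[None, :] - inter
    # Two empty graphs are treated as identical (distance 0).
    safe_union = np.where(union > 0, union, 1.0)
    sim = np.where(union > 0, inter / safe_union, 1.0)
    return 1.0 - sim


def off_diagonal_stats(dist):
    """Mean and standard deviation of the entries above the diagonal."""
    vals = dist[np.triu_indices_from(dist, k=1)]
    return vals.mean(), vals.std()


N, N_V, P = 600, 200, 0.15  # parameter values quoted in the text
rng = np.random.default_rng(0)
dist = jaccard_distance_matrix(sample_er_edge_sets(N, N_V, P, rng))
ave, std = off_diagonal_stats(dist)
print(f"AveJaccard ~ {ave:.4f}, StdJaccard ~ {std:.4f}")
print(f"large-N_V approximation: {1 - P / (2 - P):.4f}")
```

Because every column refers to the same node pair in every graph, all intersection sizes come from a single matrix product, and the union sizes follow from $|E_A \cup E_B| = |E_A| + |E_B| - |E_A \cap E_B|$.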
The dimensions calculated in the previous two sections are all small positive numbers, whereas those obtained here all exceed 400. The analysis in this section shows that the dimension of a network set without evolutionary structure is far larger than a small positive number such as 5, which suggests that the correlation dimension can be used as an indicator of evolutionary structure in a network set.
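The dimensions compared in this section are slopes of a correlation integral. The paper's own estimation details are given earlier and are not reproduced here; the following is a generic Grassberger-Procaccia estimator [13] that works directly on a distance matrix, so it also applies to non-Euclidean distances such as the Jaccard distance. It assumes that $[p_1, p_2]$ are quantiles of the pairwise distances that fix the range of radii used in the fit; that reading, the function name `correlation_dimension`, and the number of radii are illustrative assumptions, not the paper's definitions.

```python
import numpy as np


def correlation_dimension(dist, p1, p2, n_radii=20):
    """Grassberger-Procaccia estimate of the correlation dimension.

    dist   : symmetric (N, N) distance matrix; any metric works.
    p1, p2 : ASSUMED to be quantiles of the pairwise distances that
             bound the scaling region [r1, r2] used for the fit.
    Returns the least-squares slope of log C(r) against log r.
    """
    iu = np.triu_indices_from(dist, k=1)
    d = np.sort(dist[iu])                      # each unordered pair once
    r1, r2 = np.quantile(d, [p1, p2])
    radii = np.geomspace(r1, r2, n_radii)      # log-spaced radii in the region
    # Correlation integral C(r): fraction of pairs at distance <= r.
    # r1 >= min(d), so every C(r) is positive and its log is finite.
    c = np.searchsorted(d, radii, side="right") / d.size
    slope, _intercept = np.polyfit(np.log(radii), np.log(c), 1)
    return slope


def euclidean_distances(points):
    """Dense Euclidean distance matrix, used only for the sanity check."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Sanity check on sets of known dimension: a filled unit square (about 2,
# slightly below because of edge effects) and a line segment (about 1).
rng = np.random.default_rng(1)
square = euclidean_distances(rng.random((1000, 2)))
segment = euclidean_distances(np.column_stack([rng.random(1000), np.zeros(1000)]))
print(correlation_dimension(square, 0.01, 0.05))
print(correlation_dimension(segment, 0.01, 0.05))
```

Under this quantile reading, the very large values for the Erdös-Rényi sets follow from the concentration of the distances. If the distances were roughly normal with standard deviation 0.0037 around 0.919, the 10% and 25% quantiles would differ by about 0.6 standard deviations, so $\log r$ changes by only about 0.0025 across the fitting range while $\log C$ changes by $\ln 2.5 \approx 0.92$, giving a slope of roughly 370, the same order as the values above 400 reported here.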
Fig. 4. The figure shows the Jaccard distance matrix (a) and dimension estimation result (b) based on the real return series.
Fig. 5. The figure shows the Jaccard distance matrix (a) and dimension estimation result (b) based on the one factor model.
Fig. 6. The Jaccard distance matrix (a) contains two time periods, each corresponding to a clear module. Figure (b) is the result of dimension estimation.
Fig. 7. (a) The dimension estimation result for the Jaccard distance matrix of the Erdös-Rényi random graphs. (b) Frequency histogram of the correlation dimensions based on the Erdös-Rényi random graphs (300 experiments in total).
Fig. 8. Figure (a) shows the Jaccard distance matrix generated by the evolution model with community structure. Figure (b) shows the results of the correlation dimension estimation for figure (a).
Fig. 9. Figure (a) shows the dimension estimation result for an example in which the Jaccard distance matrix is generated from Erdös-Rényi random graphs. Figure (b) shows the frequency histogram of the dimension based on Erdös-Rényi random graphs.
3.4. The dimension of the evolution model with communities
In this section, we consider the evolving network model with community structure proposed by Granell et al. [11]. A total of 101 networks $\{W_t, t = 1, \ldots, 101\}$ are generated. Each network has 64 nodes, so the numbers of nodes in A and B always sum to 64. We calculate the Jaccard distance matrix of the 101 networks (Fig. 8(a)) and estimate its dimension (Fig. 8(b)), with $p_1 = 0.01$ and $p_2 = 0.05$. For comparison, we calculate the connection density $p(W_t)$ of each network $W_t$ and generate the Erdös-Rényi random graphs $\{G(64, p(W_t)), t = 1, \ldots, 101\}$. We then calculate the Jaccard distance matrix of these networks; the estimated dimension is shown in Fig. 9(a), again with $p_1 = 0.01$ and $p_2 = 0.05$. This dimension (128.3382) is significantly larger than the dimension (4.0748) shown in Fig. 8(b). We then repeat the process, calculating the dimension each time; the frequency histogram is shown in Fig. 9(b). The results confirm that the dimension of the Jaccard distance matrix without evolutionary structure is larger than that of the network evolution model with community structure.
4. Discussion and conclusions
4.1. Discussion
In previous studies, the correlation dimension has been used to estimate the intrinsic dimension of high-dimensional data [18]. Here, we apply the correlation dimension to analyze distance matrices based on networks. One difference from previous research is that network-based distances are usually not Euclidean. Nevertheless, we find that the correlation dimension effectively captures the time-dependent characteristics of a collection of networks, which appear as a small dimension. In addition, the Jaccard distance used in this paper is a metric, so defining the Jaccard distance on a collection of networks yields a discrete metric space.
Therefore, our results show that the correlation dimension can be used as an indicator to characterize this discrete metric space. Network collections equipped with other types of metrics could also be studied in detail, and the method used in this paper would remain a meaningful choice for them.
4.2. Conclusions
This paper proposes a model for generating network collections with critical events and applies the correlation dimension to analyze it. By adjusting the parameters, we found that higher levels of stability correspond to smaller dimensions. In addition, the dimension of the comparison benchmark generated by the Erdös-Rényi random graph model is significantly larger than 6, whereas the dimension of the Jaccard distance matrices obtained from the model and from the correlation-based networks is less than 6. This comparative analysis shows that the correlation dimension can be used as an indicator to characterize a network set equipped with a metric structure. Further, our empirical analysis of stock market data finds that the time series generated by the one factor model from finance theory do not fully capture the changes in the network structure, so their dimension is greater than the dimension based on real data. To study the relationship between the evolutionary structure of a network and the dimension more broadly, we also use an existing evolution model with community structure to generate a network set and calculate the dimension of its Jaccard distance matrix. The calculations show that this dimension is smaller than that of Jaccard distance matrices without evolutionary structure. In summary, we found that the correlation dimension can be used as a tool to analyze changes in the structure of complex networks,
which are widely present in different complex systems, such as financial markets.
Declaration of interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
References
[1] Costa LF, Rodrigues FA, Travieso G, Boas P. Characterization of complex networks: a survey of measurements. Adv Phys 2007;56(1):167–242.
[2] Newman MEJ. The structure and function of complex networks. SIAM Rev 2003;45(2):167–256.
[3] Boccaletti S, Latora V, Moreno Y, Chavez M, Hwang D-U. Complex networks: structure and dynamics. Phys Rep 2006;424:175–308.
[4] Zanin M, Papo D, Sousa PA, Menasalvas E, Nicchi A, Kubik E, et al. Combining complex networks and data mining: why and how. Phys Rep 2016;635:1–44.
[5] Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U. Network motifs: simple building blocks of complex networks. Science 2002;298:824–7.
[6] Fortunato S. Community detection in graphs. Phys Rep 2010;486:75–174.
[7] Malliaros FD, Vazirgiannis M. Clustering and community detection in directed networks: a survey. Phys Rep 2013;533:95–142.
[8] Danon L, Díaz-Guilera A, Duch J, Arenas A. Comparing community structure identification. J Stat Mech 2005;09:P09008.
[9] Wernicke S, Rasche F. FANMOD: a tool for fast network motif detection. Bioinformatics 2006;22(9):1152–3.
[10] Rajendran K, Kattis A, Holiday A, Kondor R, Kevrekidis IG. Data mining when each data point is a network. In: International Conference on Patterns of Dynamics; 2016. p. 289–317.
[11] Granell C, Darst RK, Arenas A, Fortunato S, Gómez S. Benchmark model to assess community structure in evolving networks. Phys Rev E 2015;92:012805.
[12] Sengupta N, Hamann M, Wagner D. Benchmark generator for dynamic overlapping communities in networks. IEEE; 2017.
[13] Grassberger P, Procaccia I. Characterization of strange attractors. Phys Rev Lett 1983;50(5):346.
[14] Grassberger P. Generalized dimensions of strange attractors. Phys Lett A 1983;97(6):227–30.
[15] Mayfield E, Mizrach B. On determining the dimension of real-time stock price data. J Bus Econ Stat 1992;10(3):367–74.
[16] Soofi AS, Galka A. Measuring the complexity of currency markets by fractal dimension analysis. Int J Theor Appl Finance 2003;6(6):553–63.
[17] Wang H, Chen G, Lü J. Complex dynamical behaviors of daily data series in stock exchange. Phys Lett A 2004;333(3):246–55.
[18] Camastra F, Vinciarelli A. Estimating the intrinsic dimension of data with a fractal-based method. IEEE Trans Pattern Anal Mach Intell 2002;24(10):1404–7.
[19] Kégl B. Intrinsic dimension estimation using packing numbers. Adv Neural Inf Process Syst 2002:681–8.
[20] Nie C-X. Correlation dimension of financial market. Physica A 2017;473:632–9.
[21] Nie C-X. Dynamics of cluster structure in financial correlation matrix. Chaos Solitons Fractals 2017;104:835–40.
[22] Nie C-X. Cluster structure in the correlation coefficient matrix can be characterized by abnormal eigenvalues. Physica A 2018;491:574–81.
[23] Cui T, Caravelli F, Ududec C. Correlations and clustering in wholesale electricity markets. Physica A 2018;492:1507–22.
[24] Chi Y, Zhu S, Song X, Tatemura J, Tseng BL. Structural and temporal analysis of the blogosphere through community factorization. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM; 2007. p. 163–72.
[25] Efatmaneshnik M, Ryan MJ. A general framework for measuring system complexity. Complexity 2016;21(S1):533–46.
[26] Bunke H, Shearer K. A graph distance metric based on the maximal common subgraph. Pattern Recognit Lett 1998;19:255–9.
[27] Fernández M-L, Valiente G. A graph distance metric combining maximum common subgraph and minimum common supergraph. Pattern Recognit Lett 2001;22:753–8.
[28] Levi G. A note on the derivation of maximal common subgraphs of two directed or undirected graphs. Calcolo 1973;9(4):341–52.
[29] Wallis WD, Shoubridge P, Kraetz M, Ray D. Graph distances using graph union. Pattern Recognit Lett 2001;22:701–4.
[30] Bunke H. On a relation between graph edit distance and maximum common subgraph. Pattern Recognit Lett 1997;18:689–94.
[31] Bunke H, Jiang X, Kandel A. On the minimum common supergraph of two graphs. Computing 2000;65(1):13–25.
[32] Xiao Y, Dong H, Wu W, Xiong M, Wang W, Shi B. Structure-based graph distance measures of high degree of precision. Pattern Recognit 2008;41:3547–61.
[33] Barrow HG, Burstall RM. Subgraph isomorphism, matching relational structures and maximal cliques. Inf Process Lett 1976;4(4):83–4.
[34] Lipkus AH. A proof of the triangle inequality for the Tanimoto distance. J Math Chem 1999;26:263–5.
[35] Donnat C, Holmes S. Tracking network dynamics: a survey of distances and similarity metrics. arXiv:1801.07351; 2018.
[36] Rawashdeh A, Ralescu AL. Similarity measure for social networks: a brief survey. MAICS 2015:153–9.
[37] Levandowsky M, Winter D. Distances between sets. Nature 1971;234(5):34–5.
[38] Jousselme A-L, Maupin P. Distances in evidence theory: comprehensive survey and generalizations. Int J Approx Reasoning 2012;53:118–45.
[39] Reina DG, Toral SL, Johnson P, Barrero F. Improving discovery phase of reactive ad hoc routing protocols using Jaccard distance. J Supercomput 2014;67(1):131–52.
[40] Pradhan N, Gyanchandani M, Wadhvani R. A review on text similarity technique used in IR and its application. Int J Comput Appl 2015;120(9):29–34.
[41] Cha S-H. Comprehensive survey on distance/similarity measures between probability density functions. Int J Math Models Methods Appl Sci 2007;4(1):300–7.
[42] Hamers L, Hemeryck Y, Herweyers G, Janssen M, Keters H, Rousseau R, et al. Similarity measures in scientometric research: the Jaccard index versus Salton's cosine formula. Inf Process Manage 1989;25(3):315–18.
[43] Arnaboldi V, Passarella A, Conti M, Dunbar RI. The structure of ego networks in Twitter. In: Online Social Networks: Human Cognitive Constraints in Facebook and Twitter Personal Graphs. Elsevier; 2013. p. 75–92.
[44] Duan G, Christian N, Schwachtje J, Walther D, Ebenhöh O. The metabolic interplay between plants and phytopathogens. Metabolites 2013;3(1):1–23.
[45] Asllani M, Lambiotte R, Carletti T. Structure and dynamical behavior of non-normal networks. Sci Adv 2018;4(12):eaau9403.
[46] Mantegna RN. Hierarchical structure in financial markets. Eur Phys J B 1999;11(1):193–7.
[47] Tumminello M, Aste T, Di Matteo T, Mantegna RN. A tool for filtering information in complex systems. PNAS 2005;102(30):10421–6.
[48] Yang Y, Yang H. Complex network-based time series analysis. Physica A 2008;387:1381–6.
[49] Tse CK, Liu J, Lau FC. A network perspective of the stock market. J Empir Finance 2010;17:659–67.
[50] Tumminello M, Di Matteo T, Aste T, Mantegna RN. Correlation based networks of equity returns sampled at different time horizons. Eur Phys J B 2007;55:209–17.
[51] Buccheri G, Marmi S, Mantegna RN. Evolution of correlation structure of industrial indices of U.S. equity markets. Phys Rev E 2013;88:012806.
[52] Song D-M, Tumminello M, Zhou W-X, Mantegna RN. Evolution of worldwide stock markets, correlation structure, and correlation-based graphs. Phys Rev E 2011;84:026108.
[53] Wang G-J, Xie C, Chen Y-J, Chen S. Statistical properties of the foreign exchange network at different time scales: evidence from detrended cross-correlation coefficient and minimum spanning tree. Entropy 2013;15(5):1643–62.
[54] Nobi A, Lee S, Kim DH, Lee JW. Correlation and network topologies in global and local stock indices. Phys Lett A 2014;378:2482–9.
[55] Zhao L, Li W, Cai X. Structure and dynamics of stock market in times of crisis. Phys Lett A 2016;380:654–66.
[56] Nie C-X, Song F-T. Constructing financial network based on PMFG and threshold method. Physica A 2018;495:104–13.
[57] Campbell JY, Lo AW, MacKinlay A. The econometrics of financial markets. Princeton University Press; 1997. p. 219–52.
[58] Bonanno G, Caldarelli G, Lillo F, Mantegna RN. Topology of correlation-based minimal spanning trees in real and model markets. Phys Rev E 2003;68(4):046130.
[59] Bonanno G, Caldarelli G, Lillo F, Miccichè S, Vandewalle N, Mantegna RN. Networks of equities in financial markets. Eur Phys J B 2004;38:363–71.
[60] Lang S. Real and functional analysis. Springer; 1993. p. 19.
[61] Erdös P, Rényi A. On the evolution of random graphs. Publ Math Inst Hung Acad Sci 1960;5:17–61.