Journal of Network and Computer Applications 57 (2015) 12–20
Contents lists available at ScienceDirect
Journal of Network and Computer Applications journal homepage: www.elsevier.com/locate/jnca
A PCA based optimization approach for IP traffic matrix estimation Erdun Zhao, Liansheng Tan n School of Computer Science, Central China Normal University, Wuhan 430079, PR China
art ic l e i nf o
a b s t r a c t
Article history: Received 18 October 2014 Received in revised form 11 May 2015 Accepted 8 July 2015 Available online 22 July 2015
Inferring traffic matrix (TM) from link measurements and routing information has important applications including capacity planning, traffic engineering and network reliability analysis. The challenge comes from that there are more unknowns than data. To face this challenge, this paper describes the inference problem as an optimization problem, where the objective is to minimize the Mahalanobis distance between the solution and a certain prior distribution, subject to the routing and link measurement constraints. This optimization problem is then solved by the Moore–Penrose inverse of the routing matrix. To reduce the computing complexity, a principal component analysis (PCA) approach is further applied in solving the optimization problem. We obtain the explicit formulas by using the Moore–Penrose inverse and the PCA theory. On the basis of the generalized inverse of routing matrix and the PCA theory, we propose an interesting generalized Tomogravity approach, which is subsequently termed as PCAOM. We present the complete mathematical solution and the algorithm of the described TM estimation problem. By introducing a weight parameter, a generalized algorithm is presented, which can be applied flexibly by adjusting the importance of the prior according to the accuracy of the prior or even no prior is required when the prior is unavailable. Numerical results are provided to demonstrate the accuracy of our method with the dataset of Abilene network through the comparison with the famous Tomogravity method. Given that we have proposed two algorithms for the optimization problem of TM estimation, we also provide a guideline on how to choose the proper algorithm according to the availability of the prior information. & 2015 Elsevier Ltd. All rights reserved.
Keywords: Traffic matrix Mahalanobis distance Moore–Penrose inverse Principal component analysis (PCA) Prior distribution On-line estimation
1. Introduction As an important branch of Internet engineering, traffic engineering plays an increasingly significant role in the next generation Internet architecture. Traffic matrix (TM), which records the traffic volume of flows between origin–destination (OD) Internet node pairs, is important for many Internet engineering tasks, such as traffic engineering, network planning and dimensioning, load balancing and fault diagnosis and so on (Roughan et al., 2003). Recently, TM has also been applied to anomaly detection (Lakhina et al., 2004a; Wang et al., 2012; Tian et al., 2014). However, it is very difficult, if it is impossible, to obtain an accurate TM directly by using the existing Internet hardware or software. One reason is the bursty nature of Internet traffic, and the other insurmountable difficulty is due to the large number of OD pairs. Although it is difficult to record the OD pair traffic flow volumes directly, the number of links is usually relatively small,
n
Corresponding author. E-mail addresses:
[email protected] (E. Zhao),
[email protected] (L. Tan). http://dx.doi.org/10.1016/j.jnca.2015.07.006 1084-8045/& 2015 Elsevier Ltd. All rights reserved.
and subsequently the link traffic counts can be obtained without difficulty by using the simple network management protocol (SNMP) or the information in the router which connects to that link. With consideration to the above fact, many research efforts have focused on TM estimation by utilizing the information of measurements on link loads, using linear programming (LP) or statistical inference techniques (see, e.g. Medina et al., 2002 and the references therein). Being organized into an n-dimension vector for convenience, let X denote the unknown TM. By denoting the m-dimension vector of link counts as Y and the m n routing matrix as A, we formulate the linear equations Y¼AX to relate the three components X, A and Y. Because the number of OD pairs is almost always much larger than the number of the links, to find an unique solution from the equations Y¼ AX is obviously an ill-posed problem. For this reason, most methods on TM estimation use the TMs prior distribution or a given theoretical distribution as the extra information. Recent attempts to the above TM estimation problem include the following. Goldschmidt (2000) modeled the TM estimation problem as a linear programming (LP) optimization problem with the assumption that the link counts Y are accurate. The aim therein is to find a
E. Zhao, L. Tan / Journal of Network and Computer Applications 57 (2015) 12–20
vector as the TM estimation from the feasible solution space of equations Y¼AX which has the maximum sum of all elements. Cao et al. (2000) utilize a time-varying statistical approach to estimate TM by using link counts at router interfaces. A statistical algorithm called the expectation-maximization (EM) method is applied by Medina et al. (2002) to compute TM under the assumption that X obeys Gaussian distribution. Under the assumption that X is Poisson distributed, Vardi (1996) infers the TMs by a moment estimator which used the estimation of the first moment and the second moment (covariance matrix) of link counts. The Bayesian method by Tebaldi and West (1998) models the TM X as a Poisson distribution and then X is estimated by the conditional expectation EðX=YÞ. With the OD flows independence assumption, a method called the gravity model by Roughan et al. (2002) was applied to estimate TMs in IP network, in which the traffic volume of OD pairs nodes (origin or destination) is proportional to the total traffic volume of that node. Through the analysis of various objective functions in the references, a new objective function with information entropy has been proposed by Zhang et al. (2003a, 2005), which is also based on the similar OD independence assumption and is a generalized version of the gravity models by Roughan et al. (2002). By resorting to the special structure of the route matrix in an IP-based virtual private networks (VPN), Shioda and Ohtani (2006) proposed a computationefficient TM estimator without computing matrix inverse. It should be pointed out that the efficiency comes from the simple structures of routing matrix and the simple assumption of TM covariance matrix, which unfortunately limits its application scope. Some TM estimation algorithms for new applications have also been explored recently (Luigi Conti et al., 2010; Jiang et al., 2011b; Hua et al., 2015; Nie et al., 2015). By considering the long-range dependent nature of Internet traffic, Luigi Conti et al. (2010) have explored an expectationmaximization (EM) based algorithm to estimate the TM and its Hurst parameter which is related to the long-range dependent model. Jiang et al. (2011b) divide traffic matrix into tendency terms and fluctuation terms, and explored a time–frequency model to characterize the traffic matrix time–frequency nature. By analyzing the two terms carefully and using different algorithms respectively, a comprehensive algorithm has been designed. This method is originated from analysis of the structure of the TM process, and can deal with more complex situation. Apart from the methods which take the IP OD flows as study targets, the authors of the most recent references (Hua et al., 2015) and Nie et al. (2015) applied the TM estimation techniques in data center networks (DCNs) and IP-over-WDM backbone network respectively. Both references have designed new algorithms to adapt their special environments. Most of the afore-mentioned methods are based on the assumption that the TM obeys a certain distribution, which unfortunately limits their application scopes. To overcome this drawback, recent progress has been achieved by Zhang et al. (2003b) and Tan and Wang (2007a,b), with new objective functions and subsequently new models in nonlinear optimal problem for TM estimation being proposed, which finds the desired solution (a vector) from the feasible space. In particular, the Tomogravity method proposed by Zhang et al. (2003b) is the early efficient approach, in which the prior is set to be the output of gravity method and TM is solved using a weighted quadratic programming method. The method gained the efficiency for its simplicity. But its accuracy is heavily dependent on the accuracy of the prior information and the weights used in the optimization. A novel method based on Lagrangian multiplier is proposed by Tan and Wang (2007a) and an algorithm based on a kind of generalized inverse of matrix, namely {1}-inverse is given by Tan and Wang (2007b). Furthermore, by discussing the advantages and disadvantages of the Fanout model deeply, we recently presented an improved Fanout estimator called Tomofanout by Tan and Zhou (2015), which can grasp more information than the original Fanout
13
model and then results in a more accurate TM estimation performance. Recently, a Mahalanobis distance based regressive inference (MDRI) method is used to investigate TM estimation problem by Jiang et al. (2010). The method includes two steps. It firstly resorts to the autoregressive moving average (ARMA) model to compute an initial TM X0. Secondly, an optimal model based on the Mahalanobis distance is established, which aims to minimize an objective function: ðY AXÞT V Y 1 ðY AXÞ þ λðX X 0 ÞT V X 1 ðX X 0 Þ subject to the constraint Y AX ¼ 0, where V Y 1 , V X 1 are the covariance matrices of Y and X respectively, λ is a weight parameter. The model was then solved by an iterative algorithm. By considering the variance matrices of Y and X simultaneously, the paper has obtained a good TM estimator. However, the iterative algorithm has to deal with the high-dimension vector X at every step and so made the computation complexity unexpectedly. Another problem is that the constraint Y AX ¼ 0 would make the first part ðY AXÞT V Y 1 ðY AXÞ of the objective function useless. The principal component analysis (PCA) can reduce the dimensionality of possibly correlated variables by using orthogonal transformation into a set of values of linearly uncorrelated variables called principal components. The PCA method can be used to TM estimation if the elements in TM are correlated. Lakhina et al. (2004b) applied the PCA method to reduce the dimensionality of the set of all OD flows into a set of low dimension flow: eigenflows. By this way, a new approach for traffic matrix estimation is developed by Soule et al. (2005), by only estimating the k most important eigenflows. When k⪡N, the problem of estimating the eigenflows from link traffic is a wellposed problem and the OD flow can be recovered from these estimated eigenflows. However it is argued that the PCA-based method can obtain good performance but behaves unstable since the top k principal components that best describe the underlying dimension can change with time. More importantly, it would lose too much information to make the problem well posed. In this paper, we first describe the inference problem as two optimization problems, where the objective is either to minimize the Euclidean distance or the Mahalanobis distance between the solution and a given prior distribution, subject to the routing and link measurement constraints. Both optimization problems are then solved by applying the Moore–Penrose inverse. Then, we formulate the problem to be a PCA based one which gives rise to a method termed PCA based optimization method (PCAOM). By the method, we reduce the problem's dimensions and obtain the PCA based solution explicitly, on which two algorithms are subsequently developed. At last, we evaluate our algorithms by the Abilene dataset (Zhang). Being different from the method by Jiang et al. (2010), our methods have the capacity to combine the PCA into the optimal model in order to reduce the problem complexity by working with the explicit formula of the solution. Secondly, our method does not need an iterative algorithm which may result unpredicted computation complexity. Moreover, we do not need special prior estimation procedure or even no need for prior, though accurate prior can be helpful. Compared to the PCA method by Soule et al. (2005), our method can benefit from the prior information to gain a more accurate and computation-efficient solution. The rest of the paper is organized as follows. In Section 2, we model the TM estimation problem as two optimization problems, present the solutions for the optimization problems, and discuss their computational complexity. In Section 3, we firstly give a PCA based model with its solution, and then provide an algorithm to calculate the TM. We then extend the algorithm to a generalized case, where the weight on the prior is considered. In Section 4, we compare the PCAOM approach with the state-of-the-art method
14
E. Zhao, L. Tan / Journal of Network and Computer Applications 57 (2015) 12–20
Tomogravity by numerical results using the Abilene dataset. The numerical results demonstrate the advantages of PCAOM. Finally, the paper is concluded in Section 5.
2. Two optimal models and solutions In this section, we describe two optimization problems for TM estimation: one is the optimization problem by Tan and Wang (2007a,b), and the other is a problem based on the concept of Mahalanobis distance. We firstly solve both problems by optimization theory and obtain solutions based on the Moore–Penrose inverse. Then the complexities of both optimal problems are analyzed. 2.1. The generalized inverse of a matrix Some preliminary results related to the generalized inverse of a matrix are necessary for what we shall present in the sequel. For every matrix A A RðsÞnm , a unique matrix A† A RðsÞnm , which is called the Moore–Penrose inverse, exists (Ben-Israel and Greville, 2003) satisfying ðiÞ ðiiÞ
AA† A ¼ A;
following matrix equation: Y ¼ AX:
ð1Þ
From the ill-posed nature of the TM estimation problem, additional information is required to estimate an unique solution for the TM from the solution space of the linear system (1). In this paper, we assume the available additional information to be the prior of TM X or its prior distribution. Given the prior X ð0Þ of TM, one's task should be to search a vector from the feasible space which is “closest” to the prior X ð0Þ . A proper metric is the Euclidean distance. The Euclidean ð0Þ T distance between X ¼ ðx1 ; …; xn ÞT and X ð0Þ ¼ ðxð0Þ 1 ; ⋯; xn Þ in Eucliq ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi E dean space Rn, can be expressed as follows d ¼ ðX X ð0Þ ÞT ðX X ð0Þ Þ;
where ðÞT means the transpose of ðÞ. When the prior TM is given by the form of a random distribution, the Euclidean distance cannot achieve the metric role between a vector to the prior distribution. To define the distance between a vector and a random distribution, the Mahalanobis distance is chosen. Given an n dimension distribution F, the Mahalanobis distance of point X ¼ ðx1 ; …; xn ÞT in Rn to F is defined as qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi M d ¼ ðX X ð0Þ ÞT V 1 ðX X ð0Þ Þ, where X ð0Þ A Rn and V A Rnn are the mean vector and covariance matrix of prior distribution F, respectively.
A† AA† ¼ A†
ðiiiÞ
ðAA† ÞT ¼ AA† ;
ðivÞ
ðA† AÞT ¼ A† A; T
where A denotes the conjugate transpose of A. In the special case that A is a square nonsingular matrix, the Moore–Penrose inverse of A is simply its inverse, i.e., A† ¼ A 1 . In the case a matrix A satisfies only the first condition, it is called a {1}-inverse. The {1}inverse is not unique but play an important role in the following development. A matrix equation Mz ¼ f , with M A RðsÞmn , is consistent (can be solved) for z if and only if ðI MM Þf ¼ 0 for any { 1}-inverse M . If the equation is consistent, its general solution can be written (Ben-Israel and Greville, 2003) in the form z ¼ M f þ ðI n M MÞh, where In is the n n identity matrix and h is an arbitrary n 1 vector. The general form of any {1}-inverse of a matrix M is (Ben-Israel and Greville, 2003) M G ¼ M þK M M MK M MM ; where M is a particular {1}-inverse of M such as its Moore– Penrose inverse and KM is an arbitrary matrix with appropriate dimension. 2.2. Problem statement Considering a network with p nodes described by a directed graph, assuming all pairs of nodes have traffic demands, then it is easy to see that there are n ¼ p ðp 1Þ OD pairs. These n traffic demands can be organized by a matrix, with the (i,j) element X i;j to be the traffic demands from the source node i ð1 ri r pÞ to the destination node j ð1 r j r p; ja iÞ. For convenience, we orderly place the traffic demands to form a vector, that is, we number the OD pairs from 1 to n and let X j ð1 r j rnÞ denote the traffic demands of the jth OD pair. Assuming that there are m links all together, all the link counts form a vector Y ¼ ðY 1 ; Y 2 ; …; Y m Þ with Yl being the link count for link l. Denoting A ¼ ½ai;j mn to be the m n routing matrix with element ai;j ¼ 1 to mean that the path of the OD pair j includes the link i, and ai;j ¼ 0, otherwise. Therefore, it is straightforward to see that the OD traffic demands X, the link counts vector Y and the routing matrix A can be described by the
2.3. Solution of the Euclidean distance based optimal problem (OP1) For the case of Euclidean distance, the TM estimation problem can be formulated as follows: OP1 : minðX X ð0Þ ÞT ðX X ð0Þ Þ subject to : AX ¼ Y where X
ð0Þ
ð2Þ
is the TM prior of X.
Theorem 1. The OP1 (2) has the following optimized solution: X n ¼ X ð0Þ þ A þ ðY AXÞ
ð3Þ
X n ¼ A þ Y þ ðI A þ AÞX ð0Þ ;
ð4Þ
where A
þ
is the Moore–Penrose inverse of A.
The proof can been found in Ben-Israel and Greville (2003). Remark 1. In (3), Xn can be explained as a corrected version of the prior X ð0Þ , by adding a vector A þ ðY AX ð0Þ Þ, in which ðY AX ð0Þ Þ is the difference between Y and AX ð0Þ , which represents the deviation of the constraint (1) by X ð0Þ . In (4), Xn is the sum of another two parts. The first part A þ Y is the least square solution of equation AX¼ Y, which is also an approximation of TM. The second part is the vector ðI A þ AÞX ð0Þ , a correction information from the prior X ð0Þ , which can be considered as a correction for the first part. 2.4. Solution of the Mahalanobis distance based optimal problem (OP2) Similar to the OP1, the TM estimation problem can be formulated as the following optimization problem in terms of the Mahalanobis distance: OP2 : minðX X ð0Þ ÞT V 1 ðX X ð0Þ Þ subject to : AX ¼ Y; ð0Þ
ð5Þ
where X is the mean vector of initial TM distribution, and V A Rnn is the positive covariance matrix of prior distribution of X. By noting the covariance matrix and its inverse to be positive, we decompose V 1 by the Choleskey decomposition, and obtain the following conclusion.
E. Zhao, L. Tan / Journal of Network and Computer Applications 57 (2015) 12–20
Theorem 2. The OP2 (5) can be solved by þ
n
þ
X ¼ RðARÞ Y þ ðI RðARÞ AÞX
ð0Þ
ð6Þ
whereR is a nonsingular lower triangle matrix, which is the Choleskey decomposition matrix of V that satisfied V ¼ RRT :
ð7Þ
Proof. By substituting Eq. (7) into OP2 and by simple deduction, we get the following form of OP2: minðR 1 X R 1 X ð0Þ ÞT ðR 1 X R 1 X ð0Þ Þ subject to : ARR 1 X ¼ Y;
ð8Þ ð0Þ
Denote A ¼ AR, X ¼ R 1 X, X ¼ R 1 X ð0Þ , the model has the same form of (2) as in the following: OP20
minðX X
ð0Þ T
Þ ðX X
ð0Þ
Þ
subject to : AX ¼ Y: Then we can easily get the solution of OP20 from (4): n
þ
þ
X ¼ A Y þ ðI A AÞX
ð0Þ
¼ ðARÞ Y þ ðI ðARÞ ARÞR 1 X ð0Þ ; þ
þ
Correspondingly, from X ¼ R tion of OP2 as
1
X we obtain the optimal solu-
X n ¼ RðARÞ þ Y þ RðI ðARÞ þ ARÞR 1 X ð0Þ ; ¼ RðARÞ þ Y þ ðI RðARÞ þ AÞX ð0Þ :□ In the following, we analyze the computational performance of the solution of OP1 and OP2. Theorem 3. The OP1 solution (3) requires a computational complexity of Oðm2 nÞ, the OP2 solution (6) requires a computational complexity of Oðn3 Þ when m o n, where m is the number of links, n is the number of OD pairs. Proof. From Eq. (3), it is not difficult to see that the computation requires computing a Moore–Penrose inversion, two multiplication between a matrix and a vector, and one addition between two vectors. The complexity is mainly due to matrix multiplication and the calculating of the Moore–Penrose generalized inverse A þ . For a rectangle matrix A, the algorithm given by Chen and Ji (2011) is proposed for computing its Moore–Penrose inverse with the complexity Oðm2 nnÞ. And the matrices multiplication operations also require a complexity OðmnnÞ. Thus to compute (3), a total computation of Oðmn þ m2 nnÞ ¼ Oðm2 nnÞ is required. For the solution of (6), an extra Choleskey decomposition of V and one extra multiplication between two matrices are needed, the main computation is then from Choleskey decomposition which needs Oðn3 Þ. Since m o n, similar analysis shows the total complexity of (6) is Oðn3 Þ.□ Remark 2. In fact, given the routing matrix A and the covariance matrix V, the Moore–Penrose generalized inverse A þ and the Choleskey decomposition need be calculated only once. Because it is rare to change the network routes, and the covariance matrix only needs to be updated periodically. So when estimating TM online, the coefficient matrix in (3) and (6) can be calculated in advance. In this case, the estimations in (3) and (6) only require a complexity of O(mn) and Oðn2 Þ respectively. 3. PCA based optimization method (PCAOM) From Theorem 3, the computation complexity of the algorithm is highly dependent on the dimension of X which is very large in
15
real network. The principal component analysis (PCA) is an efficient method for dimensionality reduction. In this section, we present a PCA based optimization model termed as PCAOM. In the following, we differentiate the covariance matrix between X and Y, i.e., we denote VX as covariance matrix of X and VY as that of Y. 3.1. PCAOM We denote X~ as the principal component (PC) vector of X. That is, given the covariance matrix V X ¼ covðX; XÞ, we have the orthogonal transformation V~ X ¼ Z TX V X Z X ; where ZX being an orthogonal matrix whose columns are normal eigenvectors of VX, V~ X ¼ diagðσ 1 ; σ 2 ; …; σ n Þ is a diagonal matrix, σ i ði ¼ 1; …; nÞ are the eigenvalues of VX. We arrange the eigenvalues by the decreasing order, that means, the element σ1 is the largest, σ2 is the second one, and so on. The corresponding eigenvectors are arranged suitably in ZX. By the theory of PCA, we have X~ ¼ Z TX X:
ð9Þ
Since the link volume Y is a linear combination of TM X, it elements also are correlated. By the same way as for X, we define Y~ as the PC vector of Y and have Y~ ¼ Z TY Y; where ZY is an orthogonal transformation matrix, whose columns are eigenvectors of VY arranged in the same way as of ZX. By the defined notations above, we have X ¼ Z X X~ and Y ¼ Z Y Y~ since Z X ; Z Y are orthogonal matrices and Z X 1 ¼ Z TX , Z Y 1 ¼ Z TY . Then, we can transform OP2 into the following optimal problem: ð0Þ 1 ð0Þ minðX~ X~ ÞT V~ X ðX~ X~ Þ subject to : AZ X X~ ¼ Z Y Y~ ;
OP3 :
ð10Þ
ð0Þ where X~ ¼ Z TX X ð0Þ , and the diagonal matrix V~ X is the covariance matrix of PC vector X~ . The problem OP3 tries to find the minimal Mahalanobis distance between the PC vector X~ of X with the ð0Þ vector X~ under the constraints AZ X X~ ¼ Z Y Y~ , which is equivalent to the constraints AX ¼Y. Now the OP3 has the form of OP2, so it can be solved by (6). According to the results of PCA theory, in general the most important components are concentrated on the first some components of X~ , Y~ , we can reduce the dimensions of the problem OP3 by discarding the less important components. For X~ , the cumulative energy content c(k) is defined as the sum of the energy content across all of the eigenvalues from 1 through k:
cðkÞ ¼
k X
σi:
i¼1
In order to preserve most of the information while reducing the dimensions as small as possible, we can choose a value of kn as n small as possible so that the cumulative energy cðk Þ proportion is above a certain threshold L, like 90 percent. That is, given the threshold L, cðkÞ n ZL ; ð11Þ k ¼ arg min k∣ cðnÞ where c(n) is the total energy of all the elements. Similarly we can do the same reduction for Y~ . Through this way, we can keep proper information while reducing the dimensionality largely. Now given a threshold L, assume through the dimensionality reduction of PCA, there are left
16
E. Zhao, L. Tan / Journal of Network and Computer Applications 57 (2015) 12–20
^ elements of Y~ , then the the first n^ elements of X~ and the first m problem becomes ð0Þ 1 ð0Þ minðX^ X^ ÞT V^ X ðX^ X^ Þ subject to : A^ Z^ X X^ ¼ Z^ Y Y^ ;
Based on (15), we get a generalized PCAOM algorithm in Algorithm 2.
OP30 :
ð12Þ
where V^ X is the principal n^ n^ sub-matrix of V~ , A^ is the first sub^ n^ rows, Z^ X ; Z^ Y are the n^ n^ principal matrix of A with the m ^ m ^ principal sub-matrix of ZX and ZY sub-matrix and the m respectively. Again, the OP30 has the form of OP2, from (6) the solution is n ^ A^ Z^ X RÞ ^ þ Z^ Y Y^ þ ðI Rð ^ A^ Z^ X RÞ ^ þ A^ Z^ X ÞX^ ð0Þ ; X^ ¼ Rð
ð13Þ
where R^ is the Choleskey decomposition matrix of V^ X satisfied T V^ X ¼ R^ R^ , it is noted that the V^ X is a diagonal matrix and so there is no need to do Choleskey decomposition really. Now omitting the less important components of X~ and from (9), the OP2 problem can be solved approximately by X n ¼ Z X X^
n
Require The routing matrix A; the link counts Y. the prior covariance matrices VX of X and VY of Y; the prior TM X ð0Þ ; the energy threshold L; the weight factor γ; Ensure VX and VY are positive; 0 o L r 1; 0 r γ r 1; 1: Performing PCA for VX and VY, and getting Z X ; V~ X ; Z Y ; V~ Y ;
^ according to (11); 2: Find n^ and m 3: Extracting V^ X from V~ X , and calculate R^ from V^ X ; 4: Extracting Z^ X ; Z^ Y from Z X ; Z Y and extracting A^ from A; n 5: Find X^ from (15); 6: Extracting Z X from ZX and get Xn from (14); 7: Performing the IPF procedure to get the last TM X return X;
ð14Þ
where Z X is the first n n^ principal sub-matrix of Z X . The complexity of (13) is the same as that of (6) since it adds an extra PCA transformation and omits the Choleskey decomposition with the same computation requirements. For the online estima2 tion, the complexity can reduced to n^ from Remark 2. 3.2. The whole algorithm of PCAOM From the subsection above, we summarized the whole estimation procedure as in Algorithm 1. Algorithm 1. The algorithm of PCAOM. Require : The routing matrix A; the link counts Y; the prior covariance matrices VX of X and VY of Y; the prior TM X ð0Þ ; the threshold L; Ensure VX and VY are positive; 0 o L r 1; 1: Performing PCA for VX and VY, and getting Z X ; V~ X ; Z Y ; V~ Y ;
^ according to (11); 2: Find n^ and m 3: Extracting V^ X from V~ X , and calculate R^ from V^ X ; 4: Extracting Z^ X ; Z^ Y from Z X ; Z Y and extracting A^ from A; n 5: Find X^ from (13);
4. Performance evaluation In this section, we evaluate our method by comparing the estimating errors of our method PCAOM with that of the Tomogravity. The reason we choose Tomogravity as the reference model is that Tomogravity is the most similar method to our PCAOM, and is used for the practical network engineering with both accuracy and efficiency. The results are all based on the real data from the Abilene network (Zhang). The topology of the Abilene network is shown in Fig. 1 which is comprised of 12 nodes, and 54 links with circles denoting routers and directed lines as links. According to the dataset, there are 144 traffic OD flow altogether, and 24 weeks' data are collected and stored in files, named from X01 to X24 respectively. For all simulation results, we set the threshold parameter L ¼ 99:0%.
4.1. The efficiency of PCA for dimensionality reduction
6: Extracting Z X from ZX and get Xn from (14); 7: Performing the iterative proportional fitting (IPF) algorithm to get the last TM X 8: return X;
3.3. A generalized PCAOM algorithm As the explanation about the formula (6) in Remark 1, the n ^ þ Z^ Y Y^ ^ A^ Z^ X RÞ estimated X^ can also be considered as the sum of Rð ð0Þ ^ and a transformation of X . Therefore when we have less ð0Þ confidence about the prior of X ð0Þ (X^ ), we have reasons to believe that we should give less weight on it. With this regard, we introduce a weight factor γ to reflect this confidence, and generalize Eq. (13) into the following equation: n ^ ^ þ Z^ Y Y^ þ γ ðI Rð ^ þ A^ Z^ X ÞX ð0Þ ^ A^ Z^ X RÞ ^ A^ Z^ X RÞ X^ ¼ Rð
Algorithm 2. The generalized algorithm of PCAOM.
To illustrate the efficiency of PCA for dimensionality reduction in PCAOM, we experiment with all the 24 weeks' Abilene dataset. Figures 2 and 3 give the numbers of PCs required for TM X and link counts Y respectively to reserve various energy levels (threshold L). From Figs. 2 and 3, we can see that the dimensions of both X and Y have been reduced to a half or below for all weeks, since the number of flows are 144 and the link number is 54 in the network. Even for some weeks, the reduced dimensions reach 95% of its original dimensions, for example the week 4th, 6th, 7th, 8th and 23rd, 24th. Also we can see from the figures when L is large, the reduction is small.
ð15Þ
ð0Þ where γ A ½0; 1. With a small γ, the prior X^ plays less role in the n X^ , this should be applied in the case of the inaccurate prior. On the contrary, with a rather accurate prior, a large γ is expected to n let the prior play a more important effect on the X^ . When γ ¼ 0, the estimation X^ has no relationship with the prior, and when γ ¼ 1, Eq. (15) is just (13).
Fig. 1. The Abilene network topology.
E. Zhao, L. Tan / Journal of Network and Computer Applications 57 (2015) 12–20
where X n ðtÞ and X(t) is respectively the estimated n elements TM and the real n elements TM of the time t. The TRE is the mean estimation error of all the flows at of each time t, and so it represents the estimation errors of all flows as a whole at the studied time. The other performance metrics are the cumulative distribution function (CDF) of the mean relative error. For SRE and TRE, we denote CSRE and CTRE as the CDF of SRE and TRE respectively. From CSRE and CTRE, one can observe how the errors are distributed.
80 L = 90% L = 95% L = 99%
60 50 40 30 20
4.3. Numerical results
10 0 0
5
10
15
20
25
Week Fig. 2. The reserved dimension knof TM X by PCA with different thresholds L.
25 L = 90% L = 95% L = 99%
15
10
5
0
0
5
10
15
20
25
Week Fig. 3. The reserved dimension kn of link count Y by PCA with different thresholds L.
4.2. Performance metrics To evaluate the presented method, we introduce some performance measures. One popular performance measure used in TM estimation is the mean relative error. We consider this measure as the main performance metrics, but in different aspects. We differ the mean relative error as the spatial mean relative error (SRE) and temporal mean relative error (TRE) as in the references. Both of the mean relative error are in the sense of the L2 norm of the error vectors. The spatial mean relative error (SRE) of the kth OD flow is defined as follows: n
J X T ðkÞ X T ðkÞ J 2 SREðkÞ ¼ J X T ðkÞ J 2
J X n ðtÞ XðtÞ J 2 J XðtÞ J 2
40 Algo2(γ = 0.2) Algo1(γ = 1.0) TomoG
30 20 10 0
20
40
60 Flow ID
ð16Þ
where k ¼1,…,n, X nT ðkÞ and XT(k) are respectively the estimated and the real T elements in the traffic vector of the kth OD flow during the estimated interval 1…T, and T is the number of the estimated time points. The SRE of a flow is a metric for measuring the mean estimation error for the studied time interval, and so it reflects the spatial estimation error of each OD flow. The temporal mean relative error (TRE) of the time t is defined as follows: TREðtÞ ¼
4.3.1. Estimation errors Figures 4 and 5 show the estimation errors by two measures, SRE and TRE. We simulate all the 24 weeks' data and get similar results. Here we select the result from the 5th week data to illustrate the performance of the presented algorithm in both figures. In the figures, TomoG denotes the Tomogravity methods plotted in black color, with Algo1 and Algo2 referring to Algorithms 1 and 2, plotted in red and in blue respectively. For Algorithm 2, we set the weight parameter γ ¼ 0:2 in the simulation. Figure 4(a) plots the SREs for the three methods, and Fig. 4(b) gives their corresponding CSRE. The x-axis denotes OD flow ID and y-axis indicates SRE of TM estimations. In Fig. 4(b), x-axis represents the SRE , whereas y-axis indicates the CSRE. From Fig. 4(a) we can see that most
SRE
Reserved Dimension
20
Based on the performance metrics defined in the last subsection, we evaluate the performance of our estimation methods by using the data of the Abilene dataset to demonstrate the results. We only give the results of PCAOM method, i.e., the Algorithms 1 and 2 here for simply, since the performance of OP2 is not or the same accurate as that of the model PCAOM and only performs a little better than Tomogravity from our experiments. We estimated the whole data of the 24 files and show the results for some weeks. When γ ¼ 1:0, Algorithm 2 becomes Algorithm 1, so when we denote γ ¼ 1:0 in the figures, we also refer to Algorithm 1. We consider the online estimation and use the data of the last week to compute the covariance matrix and then estimate the TM of the current day. In all the simulation, we take TM prior X ð0Þ to be the solution of simple gravity method as the simple Tomogravity method does if not especially indicated.
ð17Þ
80
100
120
1 0.8 SRE CDF
Reserved Dimension
70
17
0.6 0.4
Algo2(γ = 0.2) Algo1(γ = 1.0) TomoG
0.2 0 0
1
2
3
4
5
SRE
Fig. 4. (a) SRE. (b) CSRE. Tomogravity in black, Algorithm 1 in red, and Algorithm 2 in blue. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)
18
E. Zhao, L. Tan / Journal of Network and Computer Applications 57 (2015) 12–20
1
0.4 TomoG Algo1(γ = 1.0) Algo2(γ = 0.2)
TRE CDF
TRE
0.3 0.2 0.1 0
TomoG Algo1(γ = 1.0) Algo2(γ = 0.4)
0.8 0.6 0.4 0.2
0
500
1000
1500
0 0.1
2000
0.15
0.2
Time 1
0.35
0.4
0.3
0.35
0.4
TomoG Algo1(γ = 1.0) Algo2(γ = 0.4)
0.8 TRE CDF
TRE CDF
0.3
1
0.8 0.6 0.4
TomoG Algo1(γ = 1.0) Algo2(γ = 0.2)
0.2 0 0.05
0.25 TRE
0.1
0.15
0.2
0.25
0.3
0.35
0.6 0.4 0.2 0 0.1
0.4
TRE
0.15
0.2
0.25 TRE
4.3.2. TM tracking performance Figure 7 shows TM tracking samples. Here we give three OD flows, i.e. flow 16th, 66th, 93rd of the 5th week data, with true value in black, estimation with Tomogravity in red, and estimation with Algorithm 2 with a parameter γ ¼ 0:2 in blue. We omit the results of Algorithm 1 since it is not good as that of Algorithm 2. From the figure, we can see that for most of the time points, the estimated TM of our Algorithm 2 are obviously more close to the true TM than that of the Tomogravity for flow 16th and 93th.
OD #16
10
5
0
0
200
400
600
800
1000
1200
1400
1600
1800
2000
Time 6 OD #66
of the flows' SREs of Algorithms 1 and 2 are lower than that of Tomogravity method. This can also be observed from the CSRE curve of Fig. 4(b), which shows that the CSRE curve of Algorithm 2 is above that of the other two algorithms, and the Tomogravity method has the lowest CSRE. We get the SREs and their empirical distributions for the three algorithms firstly. And then, we obtain the empirical expectations of the SREs as 2.1897, 1.6290 and 1.5815 for the three algorithms: Tomogravity, Algorithms 1 and 2. That is, the PCAOM Algorithm 2 has the smallest SRE mean and the Tomogravity has the largest one. Figure 5(a) plots the TRE of the three methods, and Fig. 5(b) shows the corresponding CTRE. The x-axis denotes time points of the week, which is distributed as a points for each 5 min, so there are 2016 points for a week. The y-axis indicates TRE. From Fig. 5(a) we can clearly see that the TREs of Algorithm 2 (in blue) are lower than that of the other two cases in all. This indicates that Algorithm 2 holds the highest accuracy over the time. And Algorithm 1 also behaves better than Tomogravity method obviously. The average of the TREs are 0.2322, 0.2045, and 0.1789 respectively for Tomogravity method, Algorithms 1 and 2. This is also can be verified from the CTRE curve in Fig. 5(b) which shows that the percent of the TREs lower than 0.2 in Algorithm 2 is 74.9%, for Algorithm 1 the value is 47.8%, and for Tomogravity it is only 11.6%. For TREs below 0.25, the numbers are 97.3%, 90.4% and 74.9% correspondingly. Therefore the TREs of our Algorithms 2 and 1, are better than those of Tomogravity methods as a whole. We have also given the CTRE curves in Fig. 6 for two extra weeks data, 13th week and 21st week. From the figure, we can draw the same conclusion as that in Fig. 5. The mean TREs of the 13th week are 0.2371, 0.2263, 0.2127, for Tomogravity method, Algorithm 1 and 2 respectively with γ ¼ 0:4. For the 21st week, the numbers are 0.2850, 0.2757, 0.2585 correspondingly.
Fig. 6. (a) CTRE of the 13th week TM. (b) CTRE for the 21st week TM. Tomogravity in black, Algorithm 1 in red, and Algorithm 2 in blue. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)
4 2 0
0
200
400
600
800
1000 Time
1200
1400
1600
1800
2000
0
200
400
600
800
1000
1200
1400
1600
1800
2000
10 OD #93
Fig. 5. (a) TRE. (b) CTRE. Tomogravity in black, Algorithm 1 in red, and Algorithm 2 in blue. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)
5
0
Time
Fig. 7. Estimation results about OD flows. True in black, Tomogravity in red, and Algorithm 2 in blue. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)
For flow 66th, the difference between the estimated TM of our Algorithm 2 with the true TM is as good as for that of Tomogravity method, and both methods approximate to the true TM rather well. Through studying the other flows, we find that this phenomenon is a general case, that is to say, our PCAOM method is obviously better than Tomogravity method does.
4.3.3. The influence of the parameters γ and the prior X ð0Þ For Algorithm 2, the weight parameter γ is important. From the formula (15), if the prior TM X ð0Þ is accurate, the γ should be large and to make X ð0Þ play more important role on the estimated TM. On other hand, if the prior TM X ð0Þ is inaccurate, γ should be small to make the X ð0Þ less important. Since we know that the estimation from Gravity method is less accurate than that of Tomogravity
E. Zhao, L. Tan / Journal of Network and Computer Applications 57 (2015) 12–20
Gravity method), we should turn small γ (0.3) to reduce the important weight of prior, while the prior are accurate (from Tomogravity method), we should turn large γ (0.7) to increase the important weight of prior. However, for both cases, when the optimal γ used, the PCAOM is better than the case of no prior (γ ¼ 0). We can also clearly see from the figure that with the same γ, the mean TRE using the rather accurate prior X ð0Þ (from Tomogravity method) is less than that of the inaccurate prior (from Gravity model). With the inaccurate prior, it is possible to get worse estimation results if γ is not properly selected. This also means, sometimes, no prior maybe better than an inaccurate prior. The CTRE of the estimations with different prior and different γ is also displayed in Fig. 9. From the figure we can also obtain the same conclusion.
0.21 0.205
Gravity Prior Tomogravity Prior
0.2
Mean of TRE
0.195 0.19 0.185 0.18 0.175 0.17 0.165
0
0.2
0.4 0.6 Parameter γ
0.8
1
Fig. 8. The mean of TRE with various γ and TM prior. 1 0.9 0.8
TRE CDF
0.7 0.6 0.5 0.4 0.3
TomoG Gravity Prior: γ = 0.3 Gravity Prior: γ = 0.7 TomoG Prior: γ = 0.3 TomoG Prior: γ = 0.7
0.2 0.1 0 0.1
0.15
0.2
0.25
0.3
0.35
TRE
1
0.8
TomoG TomoG: δ =0.05 TomoG: δ =0.08) Algo2(γ = 0.7): δ =0.05 Algo2(γ = 0.7): δ =0.08
TRE CDF
0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0.1
0.15
0.2
0.25
0.3
4.3.4. Robustness performance This subsection discusses the robustness of the two methods: Algorithm 2 and Tomogravity method. As Zhang et al. (2003b) and Jiang et al. (2011a), we introduce the error ϵ ¼ Y nNð0; δÞ to the link load Y with Nð0; δÞ denote the normal distribution with mean 0 and variance δ. We discuss the robustness of two methods in two cases of δ ¼ 0:05 and δ ¼ 0:08 for the 5th week data. Again, we use the CTRE as the criteria and plot the CTRE curves in Fig. 10. Here, we take the parameter γ ¼ 0:7 with a gravity prior in Algorithm 2. It is found from the figure that the introduction of the error can affect the estimation accuracy obviously. However, the estimation from Algorithm 2 with noise is even better than that of Tomogravity method with free noise. This indicates that our PCAOM method is more robust than Tomogravity method.
5. Conclusion
Fig. 9. The CTRE with various γ and TM prior.
0.9
19
0.35
TRE
Fig. 10. The CTRE with noise.
method, we take these two estimations as prior and see what is the influence of X ð0Þ . Figure 8 shows the mean TREs with γ varying from 0.0 to 1.0 for the two kinds of priors for the 5th week data. From the figure, we can see that γ has an important effect on the TM estimation accuracy and there exist different optimal γ values for the two priors. Reading from the figure, the optimal γ for Tomogravity prior is about 0.7 and the optimal γ for Gravity prior is about 0.3. This means that when the prior are inaccurate (from
We have provided a novel approach, termed as PCAOM, which is based on the generalized inverse of the routing matrix and PCA theory. The approach improves the accuracy of the solution of an important and well-known teletraffic problem of evaluating TM. We have formulated two optimal problems for estimating TMs, which are subsequently solved by the theory of generalized Moore–Penrose inverse of routing matrix explicitly. In particular, we model the problem with reduced dimensionality by PCA method and give the computing algorithms. Unlike most of the existing approaches, PCAOM gives explicit formulas to the TM estimation, in which the relationship between TM with link counts and prior TM is disclosed by using the generalized inverse of the routing matrix. Based on the explicit expression, the solution can therefore be further generalized and make it possible to adapt one's confidence about the TM prior by a weight parameter. Using the PCAOM approach, no assumption on TM distribution type is required and subsequently it is suitable to any kind of TM distribution. The performance analysis and numerical results with comparison to Tomogravity method have demonstrated that the TMs estimations derived from our PCAOM approach are closer to the original TMs. In particular, our PCAOM approach can be adaptively adjusted to the prior accuracy and it is more efficient by the PCA dimensionality reduction technology, which is particularly important for on-line estimation. Both theoretical analysis and numerical studies show that our methods have advantages over the existing representative methods and can be applied in real Internet engineering.
Acknowledgments This research is supported by the National Natural Science Foundation of China (Nos. 61370107 and 61202470) and by self-
20
E. Zhao, L. Tan / Journal of Network and Computer Applications 57 (2015) 12–20
determined research funds of CCNU from the colleges’ basic research and operation of MOE (No. CCNU14A05018). References Ben-Israel A, Greville TNE. Generalized inverses: theory and applications. New York: Wiley; 1974 [2003]. Cao Jin, Davis Drew, Vander Wiel Scott, Yu Bin. Time-varying network tomography: router link data. J Am Stat Assoc 2000;95(452):1063–75. Chen Xuzhou, Ji Jun. Computing the Moore–Penrose inverse of a matrix through symmetric rank-one updates. Am J Comput Math 2011;1:147. Goldschmidt O. Isp backbone traffic inference methods to support traffic engineering. In: Internet statistics and metrics analysis (ISMA) workshop. 2000. p. 1063– 75. Hua Zhiming, Qiao Yan, Luo Jun. Coarse-grained traffic matrix estimation for data center networks. Comput Commun 2015;56:25–34. Jiang Dingde, Wang Xingwei, Guo Lei. Mahalanobis distance-based traffic matrix estimation. Eur Trans Telecommun 2010;21(3):195–201. Jiang Dingde, Wang Xingwei, Guo Lei, Ni Haizhuan, Chen Zhenhua. Accurate estimation of large-scale ip traffic matrix. AEU—Int J Electr Commun 2011a;65(1):75–86. Jiang Dingde, Xu Zhengzheng, Chen Zhenhua, Han Yang, Xu Hongwei. Joint time– frequency sparse estimation of large-scale network traffic. Comput Netw 2011;55:3533–47. Lakhina Anukool, Crovella Mark, Diot Christophe. Diagnosing network-wide traffic anomalies. In: ACM SIGCOMM computer communication review, vol. 34. ACM; 2004a. p. 219–30. Lakhina Anukool, Papagiannaki Konstantina, Crovella Mark, Diot Christophe, Kolaczyk Eric D, Taft Nina. Structural analysis of network traffic flows, vol. 32. ACM; 2004b. Luigi Conti Pier, De Giovanni Livia, Naldi Maurizio. Blind maximum likelihood estimation of traffic matrices under long-range dependent traffic. Comput Netw 2010;54(15):2626–39. Medina Alberto, Taft Nina, Salamatian Kave, Bhattacharyya Supratik, Diot Christophe. Traffic matrix estimation: existing techniques and new directions. In: ACM SIGCOMM computer communication review, vol. 32. ACM; 2002. p. 161– 74. Nie Laisen, Jiang Dingde, Guo Lei. A convex optimization-based traffic matrix estimation approach in IP-over-WDM backbone networks. J Netw Comput Appl 2015;50:32–8.
Roughan Matthew, Greenberg Albert, Kalmanek Charles, Rumsewicz Michael, Yates Jennifer, Zhang Yin. Experience in measuring backbone traffic variability: models, metrics, measurements and meaning. In: Proceedings of the 2nd ACM SIGCOMM workshop on internet measurement. ACM; 2002. p. 91–2. Roughan Matthew, Thorup Mikkel, Zhang Yin. Traffic engineering with estimated traffic matrices. In: Proceedings of the 3rd ACM SIGCOMM conference on internet measurement. ACM; 2003. p. 248–58. Shioda Shigeo, Ohtani Kazuya. Estimating the source–destination traffic matrix of a VPN from access-link loads. Comput Commun 2006;29(18):3663–78. Soule Augustin, Lakhina Anukool, Taft Nina, Papagiannaki Konstantina, Salamatian Kave, Nucci Antonio, et al. Traffic matrices: balancing measurements, inference and modeling. In: ACM SIGMETRICS performance evaluation review, vol. 33. ACM; 2005. p. 362–73. Tan Liansheng, Wang Xiangjun. On IP traffic matrix estimation. In: Proceedings of 16th international conference on computer communications and networks, 2007. ICCCN 2007. IEEE; 2007. p. 617–24. Tan Liansheng, Wang Xiangjun. A novel method to estimate ip traffic matrix. IEEE Commun Lett 2007b;11(11):907–9. Tan Liansheng, Zhou Haifeng. Tomofanout: a novel approach for large-scale IP traffic matrix estimation with excellent accuracy. Ann Telecommun 2015;70(3– 4):149–58. Tebaldi Claudia, West Mike. Bayesian inference on network traffic using link count data. J Am Stat Assoc 1998;93(442):557–73. Tian Hui, Zhong Binze, Shen Hong. Diffusion wavelet-based analysis on traffic matrices by different diffusion operators. Comput Electr Eng 2014;40:1874–82. Vardi Yehuda. Network tomography: estimating source–destination traffic intensities from link data. J Am Stat Assoc 1996;91(433):365–77. Wang Zhe, Hu Kai, Xu Ke, Yin Baolin Kai, Dong Xiaowen. Structural analysis of network traffic matrix via relaxed principal component pursuit. Comput Netw 2012;56(7):2049–67. Zhang Yin, Roughan Matthew, Lund Carsten, Donoho David. An informationtheoretic approach to traffic matrix estimation. In: Proceedings of the 2003 conference on applications, technologies, architectures, and protocols for computer communications. ACM; 2003a. p. 301–12. Zhang Yin, Roughan Matthew, Nick Duffield, Greenberg Albert. Fast accurate computation of large-scale ip traffic matrices from link loads. In: ACM SIGMETRICS performance evaluation review, vol. 31. ACM; 2003b; p. 206–17. Zhang Yin, Roughan Matthew, Lund Carsten, Donoho David L. Estimating point-topoint and point-to-multipoint traffic matrices: an information-theoretic approach. IEEE/ACM Trans Netw (TON) 2005;13(5):947–60. Zhang Yin. Abilene dataset. URL 〈http://www.cs.utexas.edu/yzhang/research/Abile neTM/〉.