Linear Algebra and its Applications 471 (2015) 369–382
Parameter-invariant models for load balancing on heterogeneous networks ✩
Chenggui Zhao
School of Information, Yunnan University of Finance and Economics, Kunming, 650221, China
Article info
Article history: Received 19 July 2014; Accepted 4 January 2015; Available online 28 January 2015
Submitted by A. Frommer
MSC: 05C50, 05C90, 68M07
Keywords: Parameter-invariant; Iteration; LB; Laplacian
Abstract
Several load balancing (LB) schemes for homogeneous networks have been generalized to heterogeneous networks. In the known generalizations, however, the LB parameters must be recalculated whenever the distribution of weights among the network nodes changes. Compared with the original schemes, in which the LB parameters are determined by the communication topology alone, this increases the computational cost and weakens numerical stability. We redesign the known load balancing strategies around a diffusion matrix $M^*$ whose parameters are independent of network heterogeneity. In the proposed strategies, the LB parameters are determined by the network topology only, exactly as in the homogeneous case. This lowers the overhead of computing LB parameters and requires fewer LB steps than the known schemes. Furthermore, the number of LB steps of the proposed schemes varies little across different load distributions, which indicates enhanced numerical stability. Theoretical and experimental results verify these improvements.
© 2015 Elsevier Inc. All rights reserved.
✩ This work was supported by the Applied Basic Research Project of Yunnan Province, China, under Grant No. 2010ZC095, and by the Natural Science Foundation of the Education Department of Yunnan Province, China, under Grant No. 2012Z064.
E-mail address: [email protected].
http://dx.doi.org/10.1016/j.laa.2015.01.002 0024-3795/© 2015 Elsevier Inc. All rights reserved.
1. Introduction
Load balancing (LB) is an important technique for improving the performance of parallel and distributed systems when the load distribution becomes highly uneven and system performance degrades noticeably. LB algorithms make nodes migrate load from heavily loaded nodes to lightly loaded ones. Generally, an LB algorithm is executed in two steps: first, it calculates the amount of load that each node should exchange with each of its neighbors to reach the balanced status; second, it selects the load items and transfers them.
Several significant LB algorithms have been developed for general homogeneous networks. When the loads of all nodes are globally known, Hu et al. [4] solved the load balancing problem by solving a linear system of equations. If nodes only know the loads of their direct neighbors, the network must exchange load among linked nodes step by step until the balanced status is reached, using local iterative LB algorithms. These fall into two types: diffusion, in which a node communicates with all of its neighbors in an LB step (all-port mode), and dimension exchange, in which a node communicates with only one neighbor per step (single-port mode). Diffusion LB algorithms were developed by Diekmann et al. [3], who also gave a rigorous theoretical analysis of their accuracy and efficiency. Elsässer et al. [7] presented a diffusion algorithm called the optimal diffusion scheme (OPT), which uses all the nonzero Laplacian eigenvalues of the communication graph to construct the iteration polynomial that describes the LB process. The eigenvalues of the graph Laplacian are computed to build the polynomial iteration and are reordered to avoid a possible numerical instability. OPT performs well when the graph spectrum is easy to calculate, as it is for some small graphs and for regular graphs.
The LB algorithms of [3] apply only to homogeneous networks (all nodes have equal computing capability), since they balance each node's load to the average load of the whole network. However, many load balancing problems arise in heterogeneous environments, where network nodes have different processing capabilities, and these require LB algorithms for heterogeneous networks. Elsässer et al. [8] generalized several algorithms (FOS [1], SOS [5], CHEBY [6] and OPT [7]) to the LB problem on heterogeneous networks. Their schemes balance the node loads proportionally to the node weights, and the resulting balancing flow is minimal with respect to the $l_2$-norm. The main change to the original algorithms is to replace the Laplacian matrix $L$ with the generalized Laplacian $LC^{-1}$, where $C$ is the diagonal matrix whose diagonal entries are the node weights. We therefore refer to their generalization of an existing scheme X as the X-C scheme. The quality of these LB algorithms was evaluated in [8] by two measures: the number of iterations and the amount of load migrated over the links to reach the balanced status. The convergence of the iteration is governed by a key parameter $\alpha$, the diffusion parameter, which is determined by two values: the second smallest and the largest eigenvalue of $LC^{-1}$. Both values change when the network heterogeneity changes. Hence the eigenvalues of $LC^{-1}$ must be recalculated whenever a node weight changes (even the weight of a single node), which increases the computational overhead and weakens numerical stability compared with the original X scheme. As for most LB schemes, the quality of the load flow is measured by the $l_2$-norm, and the balancing flows of the schemes in [8] are $l_2$-minimal.
Other significant work on load balancing has been done in recent years. Rotaru and Nägeli [9] proposed a variant of generalized diffusion for heterogeneous environments with different computing powers and uniform communication, and showed that their algorithm theoretically converges faster than the hydrodynamic algorithm [2]. Dimitrakopoulou and Missirlis [12] proposed an optimal diffusion policy for heterogeneous networks and developed the convergence theory of the local Extrapolated Diffusion (EDF) method for heterogeneous torus networks. Sharma and Kanungo [11] presented a dynamic load balancing algorithm for heterogeneous multi-core processor clusters. Devine et al. [10] surveyed dynamic load balancing technologies.
This paper develops several strategies whose LB parameters depend only on the network topology, described in a framework similar to that of [8]. The proposed schemes improve on the iterative efficiency and numerical stability of the X-C schemes in [8]. Simulation results on several common network topologies, under various initial conditions, demonstrate the improvement. We first present basic definitions and notation for describing diffusion load balancing algorithms and then extend several diffusion algorithms to heterogeneous networks. Finally, we analyze the performance of these schemes and present simulation results that show the viability of our approach.
2. Problem formulation and related work
Let $G(V, E)$ be an undirected connected graph with $n$ nodes and $m$ edges. Node $v_i \in V$ has weight $c_i$ and load $w_i$. Let $w^k = (w^k_1, w^k_2, \ldots, w^k_n)^T$ be the vector of node loads after the $k$-th LB step, and let $c = (c_1, c_2, \ldots, c_n)^T$ be the vector of node weights. Let $u$ be the all-ones vector. The balanced load vector $w^*$ must satisfy
$$w^* = \frac{u^T w^0}{u^T c}\, c = \frac{1}{u^T c}\, c\, u^T w^0. \qquad (1)$$
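As a small illustration of Eq. (1), the following Python sketch computes the balanced load vector: each node receives a share of the total load proportional to its weight. The function name `balanced_load` and the example data are ours, chosen only for illustration.

```python
def balanced_load(w0, c):
    """Balanced load vector of Eq. (1): w* = (u^T w0 / u^T c) * c.

    w0 -- initial loads, one entry per node
    c  -- positive node weights, one entry per node
    """
    share = sum(w0) / sum(c)  # u^T w0 / u^T c: load per unit of weight
    return [share * ci for ci in c]


# Four nodes, all load initially on node 0, weights 1, 1, 2, 4.
print(balanced_load([800.0, 0.0, 0.0, 0.0], [1.0, 1.0, 2.0, 4.0]))
# [100.0, 100.0, 200.0, 400.0]
```

With all weights equal ($C = I$) the same function returns the average load of the homogeneous case.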
Let $A$ be the adjacency matrix and $B$ the node–edge incidence matrix of $G$. The Laplacian of $G$ is $L = BB^T$, and its distinct eigenvalues are sorted as $0 = \lambda_1 < \lambda_2 < \cdots < \lambda_s$. To handle different communication speeds on the network links (edge weights of $G$), the weighted Laplacian is defined as $L_w = B_w B_w^T$, where $B_w = BF^{-1/2}$ and $F$ is the $m \times m$ diagonal matrix with the edge weights of $G$ on its diagonal (see [3] for details of the weighted Laplacian). Let $x = (x_1, x_2, \ldots, x_m)^T$ be a vector whose $i$-th entry $x_i$ is the flow on the $i$-th edge of $G$. Then $x$ is a balancing flow on $G$ if and only if $Bx = w^0 - w^*$. This equation states that, at each node, the difference between the initial load and the balanced load is carried by the flows on the edges incident to that node, according to the flow directions. The balanced status is reached globally once the flow $x_e$ has been sent along every edge $e$ of $G$. The $l_2$-norm of $x$ is denoted by $\|x\|_2$.
Since load information is not globally known, local diffusion LB algorithms use only the load information of neighboring nodes, and load is migrated only between adjacent nodes. In terms of its iteration polynomial, the simplest of these algorithms is the first order scheme (FOS) presented by Cybenko [1]. Let $y^k$ and $x^k$ be the flow vectors whose entries are, respectively, the flow in the $k$-th iteration and the total flow up to the $k$-th iteration along the edges of $G$. FOS can be written in matrix form as $w^k = M w^{k-1}$, where $M = I - \alpha L$ is called the diffusion matrix of the homogeneous network. The parameter $\alpha$ determines how fast the iteration approaches the balanced load $w^*$; its optimal value $\alpha = 2/(\lambda_2(L) + \lambda_s(L))$ gives the fastest convergence to the balanced status.
Algorithm 1 describes the generalized FOS of [8] for heterogeneous networks.
Algorithm 1 [FOS-C] (iteration $k$; $f_e$ is the weight of edge $e$, and $y_{(v_j, v_i)} = -y_{(v_i, v_j)}$)
1: for all edges $e = (v_i, v_j)$ of $G$ do
2:   $y_e^{k-1} = (\alpha/f_e)\,(w_i^{k-1}/c_i - w_j^{k-1}/c_j)$;
3:   $x_e^k = x_e^{k-1} + y_e^{k-1}$;
4: end for
5: for all nodes $v_i$ of $G$ do
6:   $w_i^k = w_i^{k-1} - \sum_{e = (v_i, v_j) \in E} y_e^{k-1}$;
7: end for
The main change in FOS-C is the diffusion matrix $M = I - \alpha LC^{-1}$, whose optimal parameter is $\alpha = 2/(\lambda_2(LC^{-1}) + \lambda_s(LC^{-1}))$. Consequently, whenever the node weights change, the eigenvalues of $LC^{-1}$ must be recalculated to obtain the new value of $\alpha$; in a homogeneous network this is never necessary. By replacing $L$ with $LC^{-1}$, the schemes SOS, CHEBY and OPT were generalized in [8]; we denote them SOS-C, CHEBY-C and OPT-C, respectively.
A polynomial based load balancing scheme is any scheme for which the load $w^k$ in step $k$ can be written as $w^k = p_k(M)w^0$, where $p_k(x)$ is a polynomial of degree at most $k$ with $p_k(1) = 1$.
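The following Python sketch implements one reading of Algorithm 1 under simplifying assumptions: unit edge weights ($f_e = 1$), a fixed and conservatively chosen $\alpha$ instead of the optimal value derived from the eigenvalues of $LC^{-1}$, and a max-norm stopping test. The function name `fos_c`, the tolerance and the example graph are illustrative. All edge flows of an iteration are computed from the loads of the previous iteration before any node load changes.

```python
def fos_c(edges, w0, c, alpha, tol=1e-9, max_iter=100_000):
    """First order scheme for heterogeneous networks (FOS-C), unit edge weights.

    edges -- list of node pairs (i, j); flow is counted positive from i to j
    Returns (loads, total edge flows, number of iterations performed).
    """
    w = [float(v) for v in w0]
    x = [0.0] * len(edges)
    share = sum(w0) / sum(c)
    target = [share * ci for ci in c]  # balanced load w* of Eq. (1)
    for k in range(1, max_iter + 1):
        # Flow on every edge, computed from the loads of the previous iteration.
        y = [alpha * (w[i] / c[i] - w[j] / c[j]) for i, j in edges]
        # Accumulate the total flows and move the load across each edge.
        for e, (i, j) in enumerate(edges):
            x[e] += y[e]
            w[i] -= y[e]
            w[j] += y[e]
        if max(abs(a - b) for a, b in zip(w, target)) < tol:
            return w, x, k
    return w, x, max_iter


# Path v0 - v1 - v2 - v3 with weights 1, 1, 2, 4 and all load on v0.
path_edges = [(0, 1), (1, 2), (2, 3)]
loads, flows, steps = fos_c(path_edges, [800, 0, 0, 0], [1, 1, 2, 4], alpha=0.25)
print([round(v, 6) for v in loads], [round(f, 6) for f in flows], steps)
# loads approach [100, 100, 200, 400], flows approach [700, 600, 400]
```

On a tree such as this path the balancing flow is unique, so the accumulated flows must approach 700, 600 and 400, the load that has to cross each edge.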
For a polynomial based LB scheme to be algorithmically practical, $p_k(x)$ must satisfy a short recurrence, so that $w^k$ can be computed from $w^{k-1}$ (and possibly $w^{k-2}$). The performance of such a scheme depends on how fast the error vector $e^k = w^k - w^*$ tends to zero. This paper reconstructs the polynomial based LB schemes FOS, SOS, CHEBY and OPT by replacing the diffusion matrix $M$ with $M^*$, which leads to better convergence performance.
3. Parameter-invariant load balancing for heterogeneous networks
In this section we first construct the diffusion matrix $M^*$, whose LB parameter $\alpha$ is independent of the weight matrix $C$. We then generalize the existing algorithms FOS, SOS, CHEBY and OPT to heterogeneous networks; we refer to these $M^*$-based schemes as X-R schemes, since they rely on a key matrix $R$ contained in $L^*$.
3.1. Parameter-invariant diffusion matrix $M^*$
Let $R = I - \frac{1}{u^T c}\, c u^T$, where $I$ is the identity matrix. We define the generalized Laplacian of $G$ as $L^* = LR$ and, based on $L^*$, the parameter-invariant diffusion matrix $M^* = I - \alpha L^*$. Below we show that $M^*$ is suitable for constructing polynomial LB schemes and that the diffusion parameter $\alpha$ does not vary with the node weights.
Lemma 1. The matrices $L$ and $L^*$ have the same eigenvalues and the same eigenvectors, except for the eigenvectors belonging to the eigenvalue 0.
Proof. Since $Rc = 0$, $c$ is an eigenvector of $L^*$ with eigenvalue 0. If $z$ is an eigenvector of $L$ with a nonzero eigenvalue $b$, then $z$ is orthogonal to $u$, because $L$ is symmetric and $Lu = 0$. Hence $Rz = z - \frac{u^T z}{u^T c}\, c = z$, so $L^* z = Lz = bz$; that is, $z$ is also an eigenvector of $L^*$ with the same eigenvalue $b$. □
Lemma 2. Let $\{z_i \mid 1 \le i \le n\}$ be linearly independent eigenvectors of $L^*$ belonging to the increasingly ordered eigenvalues $0 = \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$. Then $\{z_i \mid 1 \le i \le n\}$ is a basis of $\mathbb{R}^n$. Let $w^0 = \sum_{i=1}^n a_i z_i$ for some constants $a_i$. Then the error vectors satisfy $e^0 = a_2 z_2 + a_3 z_3 + \cdots + a_n z_n$ and $e^k = p_k(M^*)e^0$, $k = 1, 2, \ldots$.
Proof. $z_1$ is an eigenvector of $L^*$ with eigenvalue 0. For $i \ge 2$, $L^* z_i = \lambda_i z_i$ with $\lambda_i \neq 0$ gives $\lambda_i u^T z_i = u^T L R z_i = 0$, so $z_i$ is orthogonal to $u$, and by Lemma 1 it is also an eigenvector of $L$. From $L^* z_1 = L\left(I - \frac{1}{u^T c} c u^T\right) z_1 = 0$ and $\ker L = \mathrm{span}\{u\}$ we get $\left(I - \frac{1}{u^T c} c u^T\right) z_1 = \kappa u$ for some $\kappa \in \mathbb{R}$. Since $u^T\left(I - \frac{1}{u^T c} c u^T\right) z_1 = 0 = \kappa\, u^T u$, we have $\kappa = 0$ and thus $\frac{1}{u^T c} c u^T z_1 = z_1$. As $z_i$ is orthogonal to $u$ for $i \ge 2$,
$$w^* = \frac{1}{u^T c} c u^T w^0 = \frac{1}{u^T c} c u^T (a_1 z_1 + a_2 z_2 + \cdots + a_n z_n) = a_1 \frac{1}{u^T c} c u^T z_1 = a_1 z_1.$$
The first claim now follows from $e^0 = w^0 - w^*$.
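Lemma 1 can be checked numerically without an eigensolver on the path graph, whose Laplacian eigenpairs are known in closed form: $z_k(j) = \cos(\pi k (j + 1/2)/n)$, $j = 0, \ldots, n - 1$, with eigenvalue $2 - 2\cos(\pi k/n)$. The Python sketch below verifies that $L^* c = 0$ and that every $z_k$ with $k \ge 1$ is also an eigenvector of $L^* = LR$ with the same eigenvalue. The helper names and the weight vector are illustrative assumptions.

```python
import math


def path_laplacian(n):
    """Dense Laplacian L = D - A of the path v0 - v1 - ... - v(n-1)."""
    L = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        L[i][i] += 1.0
        L[i + 1][i + 1] += 1.0
        L[i][i + 1] -= 1.0
        L[i + 1][i] -= 1.0
    return L


def apply_L_star(L, c, v):
    """L* v = L (R v), where R v = v - c (u^T v) / (u^T c)."""
    scale = sum(v) / sum(c)
    rv = [vi - scale * ci for vi, ci in zip(v, c)]
    return [sum(lij * rj for lij, rj in zip(row, rv)) for row in L]


def lemma1_residuals(c):
    """Return (max |L* c|, largest eigen-residual of L* over z_1, ..., z_{n-1})."""
    n = len(c)
    L = path_laplacian(n)
    kernel_res = max(abs(t) for t in apply_L_star(L, c, c))
    eig_res = 0.0
    for k in range(1, n):
        lam = 2.0 - 2.0 * math.cos(math.pi * k / n)
        z = [math.cos(math.pi * k * (j + 0.5) / n) for j in range(n)]
        lz = apply_L_star(L, c, z)
        eig_res = max(eig_res, max(abs(a - lam * b) for a, b in zip(lz, z)))
    return kernel_res, eig_res


print(lemma1_residuals([1.0, 2.0, 1.0, 3.0, 1.0, 5.0, 2.0, 1.0]))
# both values are at rounding-error level
```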
Since $Rz_1 = 0$, we have $L^* w^* = a_1 L R z_1 = 0$, so $w^*$ is an eigenvector of $M^*$ with eigenvalue 1, and hence of $p_k(M^*)$ with eigenvalue $p_k(1) = 1$. Therefore $e^k = w^k - w^* = p_k(M^*)(w^0 - w^*) = p_k(M^*)e^0$. □
Besides these properties, $L^*$ and $L$ have nearly the same cost for a matrix–vector product: with a dense $L$, $L^* w = L(Rw)$ requires $O(n^2 + n)$ operations, whereas $Lw$ requires $O(n^2)$. The main theoretical result, stated in the following theorem, is that LB schemes based on $L^*$ converge at the same rate as those based on $L$.
Theorem 3. Let an LB scheme be given by $w^k = p_k(M^*)w^0$. Then its error satisfies $e^k = \sum_{i=2}^n p_k(1 - \alpha\lambda_i)\, a_i z_i$; that is, along each common eigenvector $z_i$ ($2 \le i \le n$) of $L$ and $L^*$ the error is damped by the same factor $p_k(1 - \alpha\lambda_i)$ as in the scheme $w^k = p_k(M)w^0$. Hence both schemes converge at the same rate.
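The weight-independence behind Theorem 3 can also be seen directly: every error vector $e$ satisfies $u^T e = 0$, hence $Re = e$ and $M^* e = Me$, so the FOS-R error evolves exactly as $e^k = M^k e^0$ under the homogeneous diffusion matrix $M = I - \alpha L$. The Python sketch below checks this on an 8-node path; the graph, the weights, the loads and all helper names are illustrative assumptions.

```python
def path_laplacian(n):
    """Dense Laplacian of the path v0 - v1 - ... - v(n-1)."""
    L = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        L[i][i] += 1.0
        L[i + 1][i + 1] += 1.0
        L[i][i + 1] -= 1.0
        L[i + 1][i] -= 1.0
    return L


def matvec(A, v):
    """Dense matrix-vector product A v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def diffusion_step(L, alpha, v, c=None):
    """Return v - alpha * L v (FOS, c is None) or v - alpha * L R v (FOS-R)."""
    if c is None:
        rv = v
    else:
        # R v = v - c (u^T v) / (u^T c): needs the global sums u^T v and u^T c.
        scale = sum(v) / sum(c)
        rv = [vi - scale * ci for vi, ci in zip(v, c)]
    return [vi - alpha * t for vi, t in zip(v, matvec(L, rv))]


n = 8
L = path_laplacian(n)
alpha = 0.5                              # 2/(lambda_2 + lambda_s) = 2/4 on every path
c = [float(n + 1)] + [1.0] * (n - 1)     # one heavy node (CS-like weights)
w = [100.0 * n] + [0.0] * (n - 1)        # all load on one node (PEAK-like)
w_star = [sum(w) / sum(c) * ci for ci in c]
e = [a - b for a, b in zip(w, w_star)]   # e^0

max_gap = 0.0
for _ in range(60):
    w = diffusion_step(L, alpha, w, c)   # FOS-R on the loads
    e = diffusion_step(L, alpha, e)      # homogeneous FOS applied to the error
    gap = max(abs((a - b) - d) for a, b, d in zip(w, w_star, e))
    max_gap = max(max_gap, gap)
print(max_gap)  # at rounding-error level: the FOS-R error equals M^k e^0
```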
Proof. By Lemma 2, the scheme $w^k = p_k(M^*)w^0$ produces the error
$$e^k = p_k(M^*)e^0 = p_k(M^*)\sum_{i=2}^n a_i z_i = \sum_{i=2}^n p_k(1 - \alpha\lambda_i)\, a_i z_i.$$
Since $z_2, \ldots, z_n$ are eigenvectors of both $L$ and $L^*$ with the same eigenvalues, $p_k(M)$ multiplies each $z_i$ by the same factor $p_k(1 - \alpha\lambda_i)$ as $p_k(M^*)$. □
3.2. Redesigning several existing load balancing schemes for heterogeneous environments
The matrix $M^*$ has the eigenvalues $\mu_i = 1 - \alpha\lambda_i$ ($1 \le i \le s$), which depend on the key parameter $\alpha$. Define
$$\gamma = \max\{|\mu_2|, |\mu_s|\} = \max\{|1 - \alpha\lambda_2|, |1 - \alpha\lambda_s|\}.$$
It was proved in [3] that the LB procedure converges fastest when $\gamma$ is minimal; the minimum $\gamma = (\lambda_s - \lambda_2)/(\lambda_s + \lambda_2)$ is attained at $\alpha = 2/(\lambda_2 + \lambda_s)$. By Lemma 1, $M^*$ has the same eigenvalues as $M$, so $M^*$ and $M$ have the same optimal LB parameter $\alpha$. We redesign existing LB strategies by replacing the diffusion matrix $M$ of the original schemes with $M^*$ while keeping the parameters of the original $M$-based schemes. We call these redesigned strategies X-R, and those presented in [8] X-C, where X is the original scheme. We also show that the redesigned schemes have the same convergence behavior as the original ones. By Theorem 3, $e^k = \sum_{i=2}^s p_k(\mu_i)\, a_i z_i$ (grouping the eigenvectors by distinct eigenvalue), and since the eigenvectors $z_2, \ldots, z_n$ of the symmetric matrix $L$ can be chosen mutually orthogonal,
$$\|e^k\|_2 \le \max_{2 \le i \le s} |p_k(\mu_i)|\; \|e^0\|_2. \qquad (2)$$
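The choice $\alpha = 2/(\lambda_2 + \lambda_s)$ balances the two extreme factors $|1 - \alpha\lambda_2|$ and $|1 - \alpha\lambda_s|$. A brute-force scan in Python confirms the closed form; the Cube(6) values $\lambda_2 = 2$ and $\lambda_s = 12$ are taken from Table 1, while the function names and the scan grid are illustrative choices.

```python
def gamma(alpha, lam2, lam_s):
    """Convergence factor gamma(alpha) = max(|1 - alpha*lam2|, |1 - alpha*lam_s|)."""
    return max(abs(1.0 - alpha * lam2), abs(1.0 - alpha * lam_s))


def optimal_alpha(lam2, lam_s):
    """Closed form: alpha_o = 2/(lam2 + lam_s), gamma_o = (lam_s - lam2)/(lam_s + lam2)."""
    return 2.0 / (lam2 + lam_s), (lam_s - lam2) / (lam_s + lam2)


lam2, lam_s = 2.0, 12.0                        # Cube(6), see Table 1
alpha_o, gamma_o = optimal_alpha(lam2, lam_s)
grid = [i / 10_000 for i in range(1, 3_000)]   # candidate alphas in (0, 0.3)
alpha_scan = min(grid, key=lambda a: gamma(a, lam2, lam_s))
print(alpha_o, gamma_o, alpha_scan)            # scan lands within 1e-4 of 1/7
```

Note that at $\alpha_o$ both extreme factors equal $\gamma_o$, which is why any other $\alpha$ makes one of them larger.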
3.2.1. First order scheme (FOS-R)
The earliest local LB algorithm is the first order scheme (FOS) [1,5], whose iteration polynomial is $p_k(t) = t^k$. Replacing the original diffusion matrix $M$ with $M^* = I - \alpha L^*$ turns FOS into FOS-R: $w^k = M^* w^{k-1}$ ($k = 1, 2, \ldots$). Since $|p_k(\mu_i)| = |\mu_i|^k \le \gamma^k$ for $2 \le i \le s$, Eq. (2) shows that the error estimate
$$\|e^k\|_2 \le \gamma^k \|e^0\|_2 \qquad (3)$$
holds.
3.2.2. Second order scheme (SOS-R)
The second order scheme (SOS) was introduced in [5]. It uses an over-relaxation to compute $w^k$:
$$w^1 = M^* w^0, \qquad w^k = \beta M^* w^{k-1} + (1 - \beta) w^{k-2}, \quad k \ge 2.$$
The iteration polynomials are defined by $p_0(t) = 1$, $p_1(t) = t$ and $p_k(t) = \beta t\, p_{k-1}(t) + (1 - \beta) p_{k-2}(t)$. It is known [6] that $w^k$ converges to $w^*$ whenever $0 < \beta < 2$, and that the optimal value is $\beta_o = 2/(1 + \sqrt{1 - \gamma^2})$. Combining the known fact that, for $\beta = \beta_o$, $\max\{|p_k(\mu_i)| : 2 \le i \le s\} = (\beta_o - 1)^{k/2}\left(1 + k\sqrt{1 - \gamma^2}\right)$ (see [6] and [3]) with Eq. (2), we obtain
$$\|e^k\|_2 \le (\beta_o - 1)^{k/2}\left(1 + k\sqrt{1 - \gamma^2}\right)\|e^0\|_2. \qquad (4)$$
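The following Python sketch runs FOS-R and SOS-R side by side and checks the estimates (3) and (4) at every step. The 16-node path, the CS-type weights and the PEAK-type initial load are illustrative assumptions. Only topology-determined parameters are used: on a path $\lambda_2 + \lambda_s = 4$, so $\alpha = 1/2$ and $\gamma = \cos(\pi/n)$ whatever the node weights, and $\beta_o = 2/(1 + \sqrt{1 - \gamma^2})$.

```python
import math


def apply_m_star(w, c, alpha):
    """M* w = w - alpha * L R w for the Laplacian L of a path graph."""
    n = len(w)
    scale = sum(w) / sum(c)
    r = [wi - scale * ci for wi, ci in zip(w, c)]                     # R w
    lr = [(r[i] - r[i - 1] if i > 0 else 0.0) + (r[i] - r[i + 1] if i < n - 1 else 0.0)
          for i in range(n)]                                           # L (R w)
    return [wi - alpha * t for wi, t in zip(w, lr)]


def fos_r(w0, c, alpha, steps):
    """FOS-R: w^k = M* w^{k-1}; returns the iterates w^0, ..., w^steps."""
    its = [[float(v) for v in w0]]
    for _ in range(steps):
        its.append(apply_m_star(its[-1], c, alpha))
    return its


def sos_r(w0, c, alpha, beta, steps):
    """SOS-R: w^1 = M* w^0, w^k = beta M* w^{k-1} + (1 - beta) w^{k-2}."""
    its = [[float(v) for v in w0]]
    its.append(apply_m_star(its[0], c, alpha))
    while len(its) <= steps:
        mw = apply_m_star(its[-1], c, alpha)
        its.append([beta * a + (1.0 - beta) * b for a, b in zip(mw, its[-2])])
    return its


n = 16
c = [n + 1.0] + [1.0] * (n - 1)       # CS-type weights
w0 = [100.0 * n] + [0.0] * (n - 1)    # PEAK-type load
alpha = 0.5                            # 2/(lambda_2 + lambda_s) on a path
gam = math.cos(math.pi / n)            # (lambda_s - lambda_2)/(lambda_s + lambda_2) on a path
beta = 2.0 / (1.0 + math.sqrt(1.0 - gam ** 2))
w_star = [sum(w0) / sum(c) * ci for ci in c]


def err(w):
    """Euclidean norm of the error w - w*."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(w, w_star)))


steps = 150
fos_err = [err(w) for w in fos_r(w0, c, alpha, steps)]
sos_err = [err(w) for w in sos_r(w0, c, alpha, beta, steps)]
bound3 = [gam ** k * fos_err[0] for k in range(steps + 1)]
bound4 = [(beta - 1.0) ** (k / 2) * (1.0 + k * math.sqrt(1.0 - gam ** 2)) * sos_err[0]
          for k in range(steps + 1)]
print(fos_err[-1], sos_err[-1])  # SOS-R ends many orders of magnitude closer to w*
```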
Algorithm 2 [OPT-R]
1: for $k$ from 1 to $s - 1$ do
2:   $w^k = \left(I - \frac{1}{\lambda_{k+1}} L^*\right) w^{k-1}$;
3:   $y^k = \frac{1}{\lambda_{k+1}} B^T R w^{k-1}$;
4:   $x^k = x^{k-1} + y^k$;
5: end for
3.2.3. Optimal polynomial scheme (OPT-R)
OPT [3,7] needs at most $s - 1$ iterations to reach the balanced status, which makes it optimal relative to the diffusion schemes above, which usually need more than $s - 1$ iterations. However, OPT requires all distinct eigenvalues of $L$, and it may suffer from numerical instability. OPT is therefore best suited to networks whose Laplacian eigenvalues are easy to calculate, such as regular networks; [7] showed how to avoid the numerical instability. OPT can be redesigned as OPT-R, given in Algorithm 2. Note that OPT-R uses the same LB parameters $\lambda_{k+1}$ as OPT. By Lemma 2, each factor $I - L^*/\lambda_{k+1}$ annihilates the error components along the eigenvectors belonging to $\lambda_{k+1}$, so the error satisfies $e^{s-1} = 0$. Since $By^k = \frac{1}{\lambda_{k+1}} L^* w^{k-1} = w^{k-1} - w^k$, the flow $x^{s-1}$ satisfies $Bx^{s-1} = w^0 - w^*$, i.e., it is a balancing flow. Clearly, if the Laplacian eigenvalues are readily available, OPT has an advantage over SOS, CHEBY and FOS.
4. Flow analysis
The flows produced by the usual LB schemes for homogeneous networks are minimal with respect to the $l_2$-norm; the basic results on this topic were shown in [3]. In this section we analyze the flow $x$ produced by the X-R strategies and show that it is also $l_2$-minimal.
Lemma 4. (See [3].) Let $G$ be connected and $L$ be the Laplacian of $G$. The equation $Lz = b$ has a solution if and only if $b$ belongs to the orthogonal complement of the space spanned by $u$.
Lemma 5. (See [3].) If $b$ is orthogonal to $u$, the minimizer of $\|x\|_2$ over all $x$ with $Bx = b$ is $x = B^T z$, where $z$ satisfies $Lz = b$.
Since $w^0 - w^*$ is orthogonal to $u$, Lemma 4 shows that $Lz = w^0 - w^*$ has a solution $z$, and Lemma 5 shows that the flow $x = B^T z$ is an $l_2$-minimal flow satisfying $Bx = w^0 - w^*$.
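The following Python sketch runs Algorithm 2 on the 4-cycle, whose distinct nonzero Laplacian eigenvalues are 2 and 4, so $s - 1 = 2$ iterations suffice. The cycle, the weights and the load are illustrative assumptions, and the eigenvalues are applied largest first. The sketch checks that $w^{s-1} = w^*$, that $Bx^{s-1} = w^0 - w^*$, and that the flow around the cycle sums to zero, as it must for a flow of the form $x = B^T z$ (Lemma 5).

```python
def opt_r(edges, w0, c, eigs):
    """OPT-R (Algorithm 2): one step per distinct nonzero Laplacian eigenvalue.

    edges -- oriented edges (i, j); column e of B has +1 at i and -1 at j
    eigs  -- distinct nonzero eigenvalues of L, in the order they are applied
    """
    w = [float(v) for v in w0]
    x = [0.0] * len(edges)
    for lam in eigs:
        scale = sum(w) / sum(c)
        rw = [wi - scale * ci for wi, ci in zip(w, c)]   # R w^{k-1}
        y = [(rw[i] - rw[j]) / lam for i, j in edges]    # y^k = B^T R w^{k-1} / lam
        for e, (i, j) in enumerate(edges):
            x[e] += y[e]                                 # x^k = x^{k-1} + y^k
            w[i] -= y[e]                                 # w^k = w^{k-1} - B y^k
            w[j] += y[e]
    return w, x


cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
w, x = opt_r(cycle, [80.0, 0.0, 0.0, 0.0], [1.0, 2.0, 1.0, 4.0], eigs=[4.0, 2.0])
print(w, x)  # [10.0, 20.0, 10.0, 40.0] [30.0, 10.0, 0.0, -40.0]
```

Intermediate loads may become negative (here $w^1_2 = -10$), which is one reason why the ordering of the eigenvalues matters in practice.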
The next lemma shows that if a sequence of loads converges to the balanced load and each load has an $l_2$-minimal flow, then these minimal flows converge to the minimal flow for the balanced load.
Lemma 6. (See [3].) Let $w^k$ be a sequence of loads converging to the balanced load $w^*$, and let $x^k$ be the minimal flow (see Lemma 5) defined by $Bx^k = w^k - w^0$, i.e., $x^k = B^T z^k$ with $Lz^k = w^k - w^0$. Then the sequence $x^k$ has a limit $x^*$. The flow $x^*$ is $l_2$-minimal and satisfies $Bx^* = w^* - w^0$, so it is a balancing flow.
Theorem 7 shows that our schemes produce an $l_2$-minimal flow.
Theorem 7. Let the polynomials $p_k(t)$ be defined by the recurrence $p_k(t) = (\pi_k t - \tau_k)p_{k-1}(t) + \theta_k p_{k-2}(t)$, with $\theta_1 = 0$ and $\pi_k - \tau_k + \theta_k = 1$ (so that $p_k(1) = 1$). For $k = 1$, let
$$b^0 = -\alpha\pi_1 R w^0, \quad x^1 = y^0 = B^T b^0, \quad w^1 = w^0 + By^0,$$
and for $k \ge 2$, let
$$b^{k-1} = -\alpha\pi_k R w^{k-1} - \theta_k b^{k-2}, \quad y^{k-1} = B^T b^{k-1}, \quad x^k = x^{k-1} + y^{k-1}, \quad w^k = w^{k-1} + By^{k-1}$$
be the update process for $x^k$ and $w^k$. Then for $w^k = p_k(I - \alpha LR)w^0$ we have $\lim_{k\to\infty} w^k = w^*$ and $\lim_{k\to\infty} x^k = x^*$, where $x^*$ is the $l_2$-minimal flow satisfying $Bx^* = w^* - w^0$.
Proof. The theorem follows by substituting the matrix $R$ for $C^{-1}$ in the proof of Theorem 1 in [8], without further changes. □
5. Simulation results
This section presents simulation results that illustrate the performance of the X-R schemes on some common communication architectures: the one-dimensional path with 64 nodes, the two-dimensional 8 × 8 mesh and the six-dimensional hypercube Cube(6) (denoted Path, Mesh and Cube, respectively, in the sequel). Each of them has 64 nodes. First, we evaluate the efficiency and stability of the X-R schemes on each topology under various combinations of node loads and weights. Second, we compare X-R with X-C on Mesh 8 × 8 and Cube(6). For both purposes we built a simulator that generates the topologies and the configurations described below.
5.1. Load balancing results with X-R schemes
To emphasize the heterogeneity of the networks, weights are assigned to the nodes in three different ways. To account for the influence of the initial loads, we ran experiments with highly unbalanced as well as randomly distributed initial loads. In addition, the Laplacian eigenvalues used in the OPT-X schemes are reordered as Leja points to improve the efficiency and stability of the numerical simulation.
The initial conditions are as follows.
Node weight assignment:
HOMO: all nodes have weight 1. This simulates a homogeneous network, the special case of a heterogeneous network with $C = I$.
SEMI: half of the nodes have weight 1 and the others weight 2. This simulates a heterogeneous network consisting of two kinds of subnetworks.
CS: one node has the high weight $n + 1$ and the others weight 1. This simulates a heterogeneous network following the traditional client–server model, an important network model.
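Table 1
The LB parameters and the number $s$ of distinct Laplacian eigenvalues. The values are identical for the weight models HOMO, SEMI and CS.

Network | Weight | $s$ | $\lambda_2$ | $\lambda_s$ | $\alpha_o$ | $\beta_o$ | $\gamma$
Path | HOMO, SEMI, CS | 64 | 0.0024 | 3.9976 | 0.5000 | 1.9065 | 0.9988
Mesh | HOMO, SEMI, CS | 33 | 0.1522 | 7.6955 | 0.2549 | 1.5676 | 0.9612
Cube | HOMO, SEMI, CS | 7 | 2.0000 | 12.0000 | 0.1429 | 1.1766 | 0.7143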
Node initial load distribution:
PEAK: one node has the massive load $100n$ and all the others have load 0.
RAN: the total load $100n$ is distributed uniformly at random among the $n$ nodes.
Eigenvalue order: all distinct Laplacian eigenvalues are first sorted increasingly as $0 = \lambda_1 < \lambda_2 < \cdots < \lambda_s$ and then rearranged in Leja order [13,14]:
$$\lambda'_1 = \lambda_1 = 0, \quad \lambda'_2 = \lambda_s, \quad \lambda'_i = \arg\max_{\lambda \in K_i} \Big\{ \lambda \prod_{j=2}^{i-1} \big|1 - \lambda/\lambda'_j\big| \Big\} \quad (3 \le i \le s),$$
where $K_3 = \{\lambda_i \mid 2 \le i \le s - 1\}$ and $K_i = K_{i-1} \setminus \{\lambda'_{i-1}\}$ ($4 \le i \le s$). Unless otherwise specified, the eigenvalues in the OPT-X schemes are always preprocessed in this way.
Table 1 shows that our schemes compute the LB parameters independently of the network heterogeneity, so only the eigenvalues of the ordinary Laplacian $L$ are needed to run the iteration. In contrast, in [8] the LB parameters are obtained from the distinct eigenvalues of $LC^{-1}$, which vary with the network heterogeneity; in general, the distinct eigenvalues of $LC^{-1}$ for the weight models SEMI and CS are harder to obtain than those for HOMO. For each network we ran experiments with FOS-R, SOS-R and OPT-R, with the Laplacian eigenvalues in Leja order to protect OPT from numerical instability. Edge weights are not considered, since current networks generally have large bandwidth, so the flows on the links are not limited.
Table 2 shows that our schemes produce the same $l_2$-minimal flows as the schemes in [8], although the flows in individual iterations may differ. The last six columns list the numbers of iterations needed to reach the balanced status. Table 2 also shows that, under the heterogeneous weight models SEMI and CS, our schemes need considerably fewer iterations than the generalizations in [8] on every network (OPT-R and OPT-C coincide on Path), and that the number of iterations of the X-R schemes is insensitive to the node weights and the initial loads.
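Because the X-R parameters depend only on the topology, the entries of Table 1 and the Leja order can be recomputed from closed-form Laplacian spectra. The Python sketch below assumes the standard facts that the path with $n$ nodes has Laplacian eigenvalues $2 - 2\cos(k\pi/n)$, $k = 0, \ldots, n - 1$, and that the Laplacian eigenvalues of a Cartesian product graph (Mesh 8 × 8 as a product of two 8-node paths, Cube(6) as $K_2^6$) are the sums of the eigenvalues of its factors. It uses $\beta_o = 2/(1 + \sqrt{1 - \gamma^2})$, and all function names are ours. The recomputed values agree with Table 1 to within one unit in the fourth decimal (for Mesh, $2/(\lambda_2 + \lambda_s) \approx 0.25485$).

```python
import itertools
import math


def path_spectrum(n):
    """Laplacian eigenvalues 2 - 2 cos(k pi / n), k = 0..n-1, of the path with n nodes."""
    return [2.0 - 2.0 * math.cos(k * math.pi / n) for k in range(n)]


def product_spectrum(*factors):
    """Laplacian eigenvalues of a Cartesian product graph: all sums of factor eigenvalues."""
    return [sum(combo) for combo in itertools.product(*factors)]


def distinct_eigenvalues(spectrum, digits=9):
    """Sorted distinct eigenvalues; values equal after rounding are merged."""
    return sorted({round(v, digits) for v in spectrum})


def lb_parameters(distinct):
    """(s, lambda_2, lambda_s, alpha_o, beta_o, gamma) as listed in Table 1."""
    lam2, lam_s = distinct[1], distinct[-1]
    gamma = (lam_s - lam2) / (lam_s + lam2)
    alpha = 2.0 / (lam2 + lam_s)
    beta = 2.0 / (1.0 + math.sqrt(1.0 - gamma ** 2))
    return len(distinct), lam2, lam_s, alpha, beta, gamma


def leja_score(lam, chosen):
    """lam * prod |1 - lam / chosen_j| over the chosen points except the first one (0)."""
    score = lam
    for ref in chosen[1:]:
        score *= abs(1.0 - lam / ref)
    return score


def leja_order(distinct):
    """Greedy Leja reordering: 0 first, then lambda_s, then maximize leja_score."""
    if len(distinct) <= 2:
        return list(distinct)
    order = [distinct[0], distinct[-1]]
    candidates = list(distinct[1:-1])          # K_3
    while candidates:
        best = max(candidates, key=lambda x: leja_score(x, order))
        order.append(best)
        candidates.remove(best)                # next K excludes the chosen point
    return order


networks = {
    "Path": distinct_eigenvalues(path_spectrum(64)),
    "Mesh": distinct_eigenvalues(product_spectrum(path_spectrum(8), path_spectrum(8))),
    "Cube": distinct_eigenvalues(product_spectrum(*([[0.0, 2.0]] * 6))),
}
for name, eigs in networks.items():
    s, lam2, lam_s, alpha, beta, gamma = lb_parameters(eigs)
    print(f"{name}: s={s} l2={lam2:.4f} ls={lam_s:.4f} "
          f"alpha={alpha:.4f} beta={beta:.4f} gamma={gamma:.4f}")
print(leja_order(networks["Cube"])[:3])  # [0.0, 12.0, 6.0]
```

Ties in the Leja score (they occur for the equally spaced Cube spectrum) are broken here by floating-point rounding and list order; any tie-breaking rule yields a valid Leja sequence.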
For instance, with PEAK load and CS weights, FOS-C needs 24 153 iterations on Path and 1945 on Mesh to reach the balanced status, whereas FOS-R needs only 9077 and 285 iterations, respectively. The convergence speed of the X-R schemes does not vary noticeably with the weight and load distributions. This differs from [8], where dramatically unbalanced initial loads and weights make the algorithms take considerably longer to balance than relatively even initial states.

Table 2
The flow and the number of iterations of the X-R and X-C schemes under different combinations of load and weight distributions. The last six columns give the numbers of iterations.

Network | Load | Weight | Flow | FOS-C | FOS-R | SOS-C | SOS-R | OPT-C | OPT-R
Path | PEAK | HOMO | 29 213 | 9655 | 9655 | 294 | 294 | 63 | 63
Path | PEAK | SEMI | 25 676 | 13 092 | 9459 | 340 | 289 | 63 | 63
Path | PEAK | CS | 14 606 | 24 153 | 9077 | 476 | 279 | 63 | 63
Path | RAN | HOMO | 27 516 | 5831 | 7632 | 194 | 237 | 63 | 63
Path | RAN | SEMI | 23 579 | 11 559 | 8577 | 305 | 263 | 63 | 63
Path | RAN | CS | 13 796 | 24 174 | 9147 | 473 | 281 | 63 | 63
Mesh | PEAK | HOMO | 6849 | 303 | 303 | 52 | 52 | 27 | 27
Mesh | PEAK | SEMI | 6624 | 470 | 300 | 65 | 52 | 40 | 27
Mesh | PEAK | CS | 3424 | 1945 | 285 | 136 | 50 | 35 | 26
Mesh | RAN | HOMO | 326 | 189 | 225 | 35 | 45 | 27 | 27
Mesh | RAN | SEMI | 778 | 339 | 261 | 49 | 47 | 39 | 26
Mesh | RAN | CS | 2388 | 1946 | 287 | 136 | 50 | 38 | 26
Cube | PEAK | HOMO | 2844 | 37 | 37 | 18 | 18 | 6 | 6
Cube | PEAK | SEMI | 2812 | 56 | 37 | 22 | 18 | 11 | 6
Cube | PEAK | CS | 1422 | 497 | 35 | 69 | 17 | 11 | 6
Cube | RAN | HOMO | 195 | 23 | 30 | 12 | 15 | 6 | 6
Cube | RAN | SEMI | 276 | 38 | 32 | 16 | 15 | 11 | 6
Cube | RAN | CS | 1196 | 497 | 35 | 69 | 17 | 11 | 6

Fig. 1 and Fig. 2 show the error versus the number of iterations for OPT-R and FOS-R on the Mesh network with the weight models HOMO, SEMI and CS, with the PEAK load distribution in Fig. 1 and the RAN distribution in Fig. 2. On the Mesh network, FOS-R reaches the balanced state after at most about 300 iterations for every combination of initial loads and weights, although it needs considerably more iterations than OPT-R.
Fig. 1. Load PEAK is balanced on the Mesh 8 × 8 network by FOS-R (a) and OPT-R (b), with three distributions of node weights.
Fig. 2. Load RAN is balanced on the Mesh 8 × 8 network by FOS-R (a) and OPT-R (b), with three distributions of node weights.
5.2. Comparison of simulation results for X-R and X-C
We compare the X-R schemes with the X-C schemes of [8] in terms of the number of iterations until convergence and stability. Only two communication topologies, Mesh 8 × 8 and Cube(6), are considered. In Fig. 3, the PEAK and RAN loads are balanced on Mesh 8 × 8 under the weight models SEMI and CS with FOS-R and FOS-C. Fig. 3 clearly shows that FOS-R converges faster than FOS-C, with a rapid and monotone decrease of the error. For example, to balance the RAN load under CS weights, FOS-R requires 287 iterations with maximal error 2081, whereas FOS-C requires 1946 iterations with maximal error 3173. In Fig. 4, the PEAK and RAN loads are balanced on Mesh 8 × 8 under the weight models SEMI and CS with OPT-R and OPT-C. OPT-R needs fewer iterations than OPT-C in most cases. For instance, with SEMI weights, OPT-R takes 27 iterations with maximal error 6321 but OPT-C takes 40 iterations with maximal error 14 051 to bring the PEAK load to the balanced status, and OPT-R takes 26 iterations with maximal error 4387 but OPT-C takes 39 iterations with maximal error 5318 for the RAN load. Likewise, with CS weights, OPT-R needs fewer iterations than OPT-C for either initial load while showing similar stability: for PEAK, OPT-R requires 26 iterations and OPT-C 35 iterations, both with maximal error 3174.
Fig. 3. Load PEAK (a) and RAN (b) are balanced on Mesh 8 × 8 by FOS-R and FOS-C with weight models SEMI and CS.
Fig. 4. Load PEAK (a) and RAN (b) are balanced on Mesh 8 × 8 by OPT-R and OPT-C with weight models SEMI and CS.
For the Cube(6) topology, we also ran experiments with all combinations of load and weight distributions using the X-R and X-C policies. FOS-R and OPT-R perform better than FOS-C and OPT-C in almost all settings, as Fig. 5 and Fig. 6 illustrate. The difference is remarkable between FOS-R and FOS-C and slight between OPT-R and OPT-C. Note that OPT-R sometimes performs even worse than OPT-C on Mesh but always better on Cube, which suggests that a complex spectral structure of the communication topology may weaken the advantage of the proposed OPT-R scheme.
Fig. 5. Load PEAK (a) and RAN (b) are balanced on Cube(6) by FOS-R and FOS-C with weight models SEMI and CS.
Fig. 6. Load PEAK (a) and RAN (b) are balanced on Cube(6) by OPT-R and OPT-C with weight models SEMI and CS.
6. Conclusion
In this paper, we contribute several diffusion schemes with a new diffusion matrix for load balancing on heterogeneous networks. Theoretical and experimental results show that the proposed schemes clearly improve iteration efficiency and numerical stability. In the future, these schemes should be applied to more distributed environments and extended to large-scale environments such as multistage and OTIS interconnection networks.
References
[1] G. Cybenko, Load balancing for distributed memory multiprocessors, J. Parallel Distrib. Comput. 7 (1989) 279–301.
[2] C.C. Hui, S.T. Chanson, Theoretical analysis of the heterogeneous dynamic load balancing problem using a hydro-dynamic approach, J. Parallel Distrib. Comput. 43 (1997) 139–146.
[3] R. Diekmann, A. Frommer, B. Monien, Efficient schemes for nearest neighbor load balancing, Parallel Comput. 25 (7) (1999) 789–812.
[4] Y.F. Hu, R.J. Blake, D.R. Emerson, An optimal migration algorithm for dynamic load balancing, Concurr. Pract. Exp. 10 (6) (1998) 467–483.
[5] B. Ghosh, S. Muthukrishnan, M.H. Schultz, First- and second-order diffusive methods for rapid, coarse, distributed load balancing, Theory Comput. Syst. 31 (4) (1998) 331–354.
[6] G. Golub, R. Varga, Chebyshev semi-iterative methods, successive overrelaxation iterative methods, and second order Richardson iterative methods, Numer. Math. 3 (1961) 147–156.
[7] R. Elsässer, A. Frommer, B. Monien, R. Preis, Optimal and alternating direction load balancing schemes, in: Amestoy et al. (Eds.), Euro-Par'99, Lecture Notes in Comput. Sci., vol. 1685, 1999, pp. 280–290.
[8] R. Elsässer, B. Monien, R. Preis, Diffusion schemes for load balancing on heterogeneous networks, Theory Comput. Syst. 35 (2002) 305–320.
[9] T. Rotaru, H.H. Nägeli, Dynamic load balancing by diffusion in heterogeneous systems, J. Parallel Distrib. Comput. 64 (4) (2004) 481–497.
[10] K.D. Devine, E.G. Boman, R.T. Heaphy, et al., New challenges in dynamic load balancing, Appl. Numer. Math. 52 (2) (2005) 133–152.
[11] R. Sharma, P. Kanungo, Dynamic load balancing algorithm for heterogeneous multi-core processors cluster, in: Proceedings of the 2014 Fourth International Conference on Communication Systems and Network Technologies, CSNT 2014, IEEE Computer Society, 2014, pp. 288–292.
[12] K.A. Dimitrakopoulou, N.M. Missirlis, Optimal diffusion for load balancing in heterogeneous networks, in: Proceedings of the Parallel Processing and Applied Mathematics, Springer, Berlin, Heidelberg, 2014, pp. 214–223.
[13] L. Reichel, The application of Leja points to Richardson iteration and polynomial preconditioning, Linear Algebra Appl. 154–156 (1991) 389–414.
[14] A. Martínez, L. Bergamaschi, M. Caliari, M. Vianello, A massively parallel exponential integrator for advection–diffusion models, J. Comput. Appl. Math. 231 (1) (2009) 82–91.