Parameter-invariant models for load balancing on heterogeneous networks

Linear Algebra and its Applications 471 (2015) 369–382

Linear Algebra and its Applications www.elsevier.com/locate/laa

Parameter-invariant models for load balancing on heterogeneous networks ✩

Chenggui Zhao
School of Information, Yunnan University of Finance and Economics, Kunming, 650221, China

Article info

Article history: Received 19 July 2014; Accepted 4 January 2015; Available online 28 January 2015
Submitted by A. Frommer
MSC: 05C50, 05C90, 68M07
Keywords: Parameter-invariant; Iteration; LB; Laplacian

Abstract

Several schemes for homogeneous networks have been generalized for load balancing (LB) on heterogeneous networks. However, in the known schemes the LB parameters must be recalculated whenever the distribution of weights among the network nodes changes. Compared with the original schemes, in which the LB parameters are determined by the communication topology alone, this increases the computational complexity and weakens numerical stability. We redesign the known load balancing strategies around a diffusion matrix $M^*$ whose parameters are independent of network heterogeneity. In the proposed strategies, the LB parameters are determined only by the network topology, exactly as in the homogeneous case. This leads to lower overhead for computing the LB parameters and fewer LB steps than in the known schemes. Furthermore, the number of LB steps of the proposed schemes varies little across different load distributions, so the schemes have enhanced numerical stability. Theoretical and experimental results verify these improvements.
© 2015 Elsevier Inc. All rights reserved.

✩ This paper is supported by the Applied Basic Research Project of Yunnan Province, China, under Grant No. 2010ZC095, and the Natural Science Foundation of Education Department of Yunnan Province, China, under Grant No. 2012Z064. E-mail address: [email protected].

http://dx.doi.org/10.1016/j.laa.2015.01.002 0024-3795/© 2015 Elsevier Inc. All rights reserved.

1. Introduction

Load balancing (LB) is an important technique for improving the performance of parallel and distributed systems when a highly uneven load distribution arises and system performance degrades noticeably. LB algorithms make nodes migrate load from heavily loaded nodes to lightly loaded ones. Generally, an LB algorithm is executed in two steps: first, calculate the amount of load that must be migrated between each node and its neighbors to reach the balanced state; second, select the load items and transfer them. Several significant LB algorithms have been developed for general homogeneous networks. When all node loads are globally known, Y.F. Hu et al. [4] solved the load balancing problem by solving a linear system of equations. If nodes only know the load information of their direct neighbors, the network must exchange loads among linked nodes step by step to reach the balanced state, using local iterative LB algorithms. These fall into two types according to the communication mode between a node and its neighbors in an LB step: diffusion, with all-port communication, and dimension exchange, with single-port communication. Diffusion LB algorithms were developed by R. Diekmann et al. [3], together with a rigorous theoretical analysis of their accuracy and efficiency. R. Elsässer et al. [7] presented a diffusion algorithm called the optimal diffusion scheme (OPT), which uses all the nonzero Laplacian eigenvalues of the communication graph to construct the iteration polynomial describing the LB process. The eigenvalues of the graph Laplacian are computed to build a polynomial iteration method and are ordered so as to avoid possible numerical instability. OPT is fast when the graph spectrum is easy to calculate, as it is for some small graphs and for regular graphs.
The LB algorithms in [3] can only be applied to homogeneous networks (that is, networks whose nodes all have equal computing capability), since each node load is balanced to the average of the loads in the whole network. However, many load balancing problems arise in heterogeneous environments, where network nodes have different processing capabilities, and these require LB algorithms for heterogeneous networks. R. Elsässer et al. [8] generalized several algorithms (including FOS [1], SOS [5], CHEBY [6] and OPT [7]) to the LB problem on heterogeneous networks. Their schemes balance the node loads in proportion to the node weights, and the balancing flow is minimal with respect to the $l_2$-norm. The major change to the original algorithms is to replace the Laplacian matrix $L$ with a generalized Laplacian $LC^{-1}$, where $C$ is the diagonal matrix whose diagonal entries are the node weights. We therefore refer to their generalization of an existing scheme X as the X-C scheme. The quality of LB algorithms was explored in [8] and evaluated by two factors: the number of iterations and the amount of load migrated over the links to reach the balanced state. The convergence of the iteration is governed by a key parameter $\alpha$, called the diffusion parameter, which is determined by two values: the second smallest and the largest eigenvalue of $LC^{-1}$. Both values change when the network heterogeneity changes. Hence the eigenvalues of $LC^{-1}$ must be recalculated whenever node weights change (even for a single node), which increases the computational overhead and weakens numerical stability compared with the original X scheme. As for most LB schemes, the quality of the load flow is measured by the $l_2$-norm, and the balancing flow of the schemes in [8] is minimal with respect to the $l_2$-norm.

Other significant work on load balancing has been done in recent years. Rotaru and Nägeli [9] proposed a variant of generalized diffusion for heterogeneous environments characterized by different computing powers and uniform communication, and showed that their algorithm theoretically converges faster than the hydrodynamic algorithm [2]. Dimitrakopoulou and Missirlis [12] also proposed an optimal diffusion policy for heterogeneous networks and developed the convergence theory of the local Extrapolated Diffusion (EDF) method for heterogeneous torus networks. Sharma and Kanungo [11] presented a dynamic load balancing algorithm for heterogeneous multi-core processor clusters. Devine et al. [10] provided a survey of dynamic load balancing technologies.

This paper develops several strategies whose LB parameters depend only on the network topology, described in a framework similar to that of [8]. The proposed schemes improve iterative efficiency and numerical stability over the X-C schemes of [8]. Simulation results on some familiar network topologies, under various initial conditions, demonstrate this improvement. We first present some basic definitions and notation for describing diffusion load balancing algorithms and then extend several diffusion algorithms to heterogeneous networks. Finally, we analyze the performance of these schemes and present simulation results showing the viability of our approach.

2. Problem formulation and related work

Let $G(V, E)$ denote an undirected, connected graph with $n$ nodes and $m$ edges. Node $v_i \in V$ has a weight $c_i$ and a load $w_i$. Let $w^k = (w_1^k, w_2^k, \ldots, w_n^k)^T$ be the vector of node loads after the $k$-th LB step, and let $c = (c_1, c_2, \ldots, c_n)^T$ denote the node weight vector. Let $u$ be the all-ones vector. The balanced load vector $w^*$ satisfies

$$w^* = \frac{u^T w^0}{u^T c}\, c = \frac{1}{u^T c}\, c\, u^T w^0. \tag{1}$$
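Eq. (1) is simple enough to state directly in code. The following is a minimal Python sketch, assuming plain lists for $w^0$ and $c$; the function name `balanced_load` is hypothetical and not part of the paper.

```python
def balanced_load(w0, c):
    """Balanced load vector of Eq. (1): w* = (u^T w0 / u^T c) c.

    w0: initial node loads; c: positive node weights (same length).
    Every node receives a share of the total load proportional to its weight.
    """
    if len(w0) != len(c):
        raise ValueError("w0 and c must have the same length")
    scale = sum(w0) / sum(c)  # u^T w0 / u^T c
    return [scale * ci for ci in c]


# Total load 15 and total weight 5, so each unit of weight receives 3 units of load.
print(balanced_load([10, 0, 5], [1, 2, 2]))  # [3.0, 6.0, 6.0]
```

With all weights equal ($C = I$) the same function returns the plain average, which is the homogeneous target of [3].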

Let $A$ be the adjacency matrix and $B$ the node–edge incidence matrix of $G$. The Laplacian of $G$ is defined as $L = BB^T$, and its distinct eigenvalues are sorted as $0 = \lambda_1 < \lambda_2 < \cdots < \lambda_s$. To handle different communication speeds on the network links (weights on the edges of $G$), the weighted Laplacian is defined as $L_w = B_w B_w^T$, where $B_w = BF^{-1/2}$ and $F$ is the $m \times m$ diagonal matrix with the edge weights of $G$ as its diagonal entries (see [3] for more details on the weighted Laplacian). Let $x = (x_1, x_2, \ldots, x_m)^T$ be a vector whose $i$-th entry $x_i$ is the flow on the $i$-th edge of $G$. Then $x$ is a balancing flow on $G$ if and only if $Bx = w^0 - w^*$. This equation means that, at each node, the difference between the initial load and the balanced load is carried away by the flows on the edges incident to that node, according to the flow directions. The balanced state is reached globally once the flow $x_e$ has been sent along every edge $e$ of $G$. The $l_2$-norm of $x$ is denoted by $\|x\|_2$.

Local diffusion LB algorithms use only the load information of neighboring nodes, because global load information is unavailable, and load is migrated only between adjacent nodes. Judged by the form of its iteration polynomial, the simplest of these algorithms is the first order scheme (FOS) presented by Cybenko [1]. Let $y^k$ and $x^k$ be the flow vectors whose entries are, respectively, the flow in the $k$-th iteration and the total flow up to the $k$-th iteration along the edges of $G$. FOS can be written in the matrix iterative form $w^k = Mw^{k-1}$, where the matrix $M = I - \alpha L$ is called the diffusion matrix of the homogeneous network. The parameter $\alpha$ determines how fast the iteration approaches the balanced load $w^*$; its optimal value $\alpha = 2/(\lambda_2(L) + \lambda_s(L))$ yields the fastest convergence to the balanced state. Algorithm 1 describes the generalized FOS of [8] for heterogeneous networks, where $f_e$ denotes the weight of edge $e$.

Algorithm 1 [FOS-C]
1: for all edges $e = (v_i, v_j)$ of $G$ do
2:   $y_e^{k-1} = (\alpha / f_e)\,(w_i^{k-1}/c_i - w_j^{k-1}/c_j)$;
3:   $x_e^k = x_e^{k-1} + y_e^{k-1}$;
4:   $w_i^k = w_i^{k-1} - \sum_{e = (v_i, v_j) \in E} y_e^{k-1}$;
5: end for

The main change in FOS-C is to define the diffusion matrix as $M = I - \alpha LC^{-1}$, for which the optimal parameter is $\alpha = 2/(\lambda_2(LC^{-1}) + \lambda_s(LC^{-1}))$. Consequently, if the node weights in the network change, the eigenvalues of $LC^{-1}$ must be recalculated to obtain a new value of $\alpha$; this is unnecessary when the network is homogeneous. By replacing $L$ with $LC^{-1}$, the LB schemes SOS, CHEBY and OPT were also generalized in [8]; in this article we denote them SOS-C, CHEBY-C and OPT-C, respectively. A polynomial-based load balancing scheme is any scheme for which the load $w^k$ in step $k$ can be expressed as $w^k = p_k(M)w^0$, where $p_k(x)$ is a polynomial of degree at most $k$ satisfying $p_k(1) = 1$.
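Algorithm 1 can be turned into a small runnable sketch. The pure-Python version below is illustrative only: the names `fos_c_step` and `run_fos_c` are hypothetical, each edge $(i, j)$ is oriented so that a positive $y_e$ moves load from $v_i$ to $v_j$, all edge flows of one step are computed from $w^{k-1}$ (the simultaneous diffusion update), and $\alpha$ is supplied by the caller rather than computed from the spectrum of $LC^{-1}$.

```python
def fos_c_step(w, c, edges, alpha, f=None):
    """One FOS-C step (Algorithm 1); all flows are computed from w = w^{k-1}.

    edges: list of (i, j) node-index pairs; f: optional edge weights f_e (default 1).
    Returns (new_loads, y), where y[e] > 0 means load moves from i to j on edge e.
    """
    new_w = list(w)
    y = []
    for e, (i, j) in enumerate(edges):
        fe = 1.0 if f is None else f[e]
        ye = alpha / fe * (w[i] / c[i] - w[j] / c[j])  # line 2 of Algorithm 1
        new_w[i] -= ye  # line 4: the sender loses y_e ...
        new_w[j] += ye  # ... and the receiver gains it
        y.append(ye)
    return new_w, y


def run_fos_c(w0, c, edges, alpha, steps):
    """Iterate FOS-C and accumulate the total edge flow x (line 3 of Algorithm 1)."""
    w, x = list(w0), [0.0] * len(edges)
    for _ in range(steps):
        w, y = fos_c_step(w, c, edges, alpha)
        x = [xe + ye for xe, ye in zip(x, y)]
    return w, x


# 3-node path with weights (1, 2, 1); all load starts on node 0.
w, x = run_fos_c([40.0, 0.0, 0.0], [1.0, 2.0, 1.0], [(0, 1), (1, 2)], alpha=0.3, steps=200)
print([round(v, 6) for v in w])  # approaches w* = [10, 20, 10]
print([round(v, 6) for v in x])  # approaches the balancing flow [30, 10]
```

For this small graph $LC^{-1}$ has eigenvalues 0, 1 and 2, so the FOS-C optimum given above would be $\alpha = 2/(1 + 2) = 2/3$, and any $0 < \alpha < 1$ converges; 0.3 is simply a safe illustrative choice. Changing the weights changes these eigenvalues, and hence $\alpha$, which is the recomputation the X-R schemes avoid.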
For a polynomial-based LB scheme to be algorithmically feasible, the polynomial $p_k(x)$ must satisfy a short recurrence relation so that $w^k$ can be computed from $w^{k-1}$. The performance of such a scheme depends on how fast the error vector $e^k = w^k - w^*$ tends to zero. This paper reconstructs the polynomial-based LB schemes FOS, SOS, CHEBY and OPT by replacing the diffusion matrix $M$ with $M^*$, which leads to better convergence performance.

3. Parameter-invariant load balancing for heterogeneous networks

In this section, we first construct the diffusion matrix $M^*$, whose LB parameter $\alpha$ is independent of the weight matrix $C$. Then the existing algorithms FOS, SOS, CHEBY and OPT are generalized to heterogeneous networks, and we refer to these schemes based on $M^*$ as X-R schemes, since they involve a key matrix $R$ contained in $L^*$.

3.1. Parameter-invariant diffusion matrix $M^*$

Let $R = I - (1/u^T c)\,cu^T$, where $I$ is the identity matrix. We construct a generalized Laplacian of $G$ as $L^* = LR$ and, based on $L^*$, a parameter-invariant diffusion matrix $M^* = I - \alpha L^*$. In the following, we show that $M^*$ is suitable for constructing a polynomial LB scheme and that the diffusion parameter $\alpha$ does not vary with the node weights.

Lemma 1. The matrices $L$ and $L^*$ have the same eigenvalues and eigenvectors, except for the eigenvectors corresponding to the eigenvalue 0.

Proof. First, since $Rc = 0$, the vector $c$ is an eigenvector of $L^*$ with eigenvalue 0. If $z$ is an eigenvector of $L$ other than $u$, belonging to some eigenvalue $b$, then $z$ is orthogonal to $u$. Hence $Rz = z$, so that $L^*z = Lz = bz$, which shows that $z$ is also an eigenvector of $L^*$ with the same eigenvalue $b$. ∎

Lemma 2. Let $\{z_i \mid 1 \le i \le n\}$ be linearly independent eigenvectors of $L^*$ corresponding to its increasingly ordered eigenvalues $\{\lambda_i \mid 0 = \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n\}$. Then $\{z_i \mid 1 \le i \le n\}$ is a basis of $\mathbb{R}^n$. Let $w^0 = \sum_{i=1}^{n} a_i z_i$ for some constants $a_i$. Then the error vectors satisfy $e^0 = a_2 z_2 + a_3 z_3 + \cdots + a_n z_n$ and $e^k = p_k(M^*)e^0$, $k = 1, 2, \ldots$.

Proof. Lemma 1 shows that $z_1$ is an eigenvector of $L^*$ corresponding to the eigenvalue 0 and that the $z_i$ ($2 \le i \le n$) are also eigenvectors of $L$, so each $z_i$ with $i \ge 2$ is orthogonal to $u$. From $L^* z_1 = L(I - (1/u^T c)cu^T)z_1 = 0$, and since the null space of $L$ is spanned by $u$ for a connected graph, we have $(I - (1/u^T c)cu^T)z_1 = \kappa u$ for some $\kappa \in \mathbb{R}$. Since $u^T(I - (1/u^T c)cu^T)z_1 = 0 = \kappa\, u^T u$, we get $\kappa = 0$, and thus $(1/u^T c)cu^T z_1 = z_1$. As $z_i$ is perpendicular to $u$ for $i \ge 2$, we have $w^* = (1/u^T c)cu^T w^0 = (1/u^T c)cu^T(a_1 z_1 + a_2 z_2 + \cdots + a_n z_n) = a_1 (1/u^T c)cu^T z_1 = a_1 z_1$. The first equation now follows from $e^0 = w^0 - w^*$.
Since $p_k(1) = 1$ and $w^*$ is an eigenvector of $p_k(M^*)$ with eigenvalue 1, we have $e^k = w^k - w^* = p_k(M^*)(w^0 - w^*) = p_k(M^*)e^0$. ∎ Besides the conclusions stated above, $L^*$ and $L$ have nearly equal cost for matrix–vector multiplication: for any vector $w$, computing $L^*w$ takes $O(n^2 + n)$ operations, while $Lw$ takes $O(n^2)$. As the main theoretical result, LB schemes based on $L^*$ have the same rate of convergence as those based on $L$, as stated in the following theorem.

Theorem 3. Any LB scheme with iteration polynomials $w^k = p_k(M^*)w^0$ produces the same error $e^k$ as the scheme $w^k = p_k(M)w^0$.

Proof. An LB scheme with $w^k = p_k(M^*)w^0$ produces the error $e^k = p_k(M^*)e^0 = p_k(M^*)\sum_{i=2}^{n} a_i z_i = \sum_{i=2}^{n} p_k(1 - \alpha\lambda_i)\,a_i z_i$, which equals the error of the iteration $w^k = p_k(M)w^0$ because $L$ and $L^*$ share the eigenvectors $\{z_i \mid 2 \le i \le n\}$. ∎

3.2. Redesigning several existing load balancing schemes for heterogeneous environments

The matrix $M^*$ has eigenvalues $\mu_i = 1 - \alpha\lambda_i$ ($1 \le i \le s$), which depend on the key parameter $\alpha$. Define $\gamma = \max\{|\mu_2|, |\mu_m|\} = \max\{|1 - \alpha\lambda_2|, |1 - \alpha\lambda_m|\}$, where $\lambda_m = \lambda_s$ denotes the largest eigenvalue. It was proved in [3] that the LB procedure converges optimally when $\gamma$ is minimal; the minimum $\gamma = (\lambda_m - \lambda_2)/(\lambda_m + \lambda_2)$ is attained at $\alpha = 2/(\lambda_2 + \lambda_m)$. By Lemma 1, $M^*$ has the same eigenvalues as $M$, so $M^*$ and $M$ have the same optimal LB parameter $\alpha$. We redesign existing LB strategies by replacing the diffusion matrix $M$ of the original schemes with $M^*$ while keeping the parameters of the original schemes based on $M$. We refer to these redesigned strategies as X-R, and to those presented in [8] as X-C, where X is the original scheme. We also show that the redesigned schemes have the same convergence performance as the original ones. By Theorem 3, $e^k = \sum_{i=2}^{n} p_k(1 - \alpha\lambda_i)\,a_i z_i$, and choosing the $z_i$ ($i \ge 2$) orthonormal, which is possible because $L$ is symmetric, gives

$$\|e^k\|_2 \le \max\{|p_k(\mu_i)| : 2 \le i \le s\}\,\|e^0\|_2. \tag{2}$$
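For the first order polynomial $p_k(t) = t^k$, i.e. $w^k = M^* w^{k-1}$, the parameter invariance can also be seen in one line. The short derivation below is a sketch that uses only the definitions above: because $u^T L = 0$, each step conserves the total load, so $u^T w^{k-1} = u^T w^0$ and $R$ simply subtracts the balanced load $w^*$ of Eq. (1).

$$
\begin{aligned}
R w^{k-1} &= w^{k-1} - \frac{u^T w^{k-1}}{u^T c}\,c = w^{k-1} - w^*,
\qquad L^* w^{k-1} = L R w^{k-1} = L\,e^{k-1},\\
e^k &= w^{k-1} - \alpha L^* w^{k-1} - w^* = (I - \alpha L)\,e^{k-1} = M e^{k-1}
\quad\Longrightarrow\quad e^k = M^k e^0 .
\end{aligned}
$$

The error therefore evolves under the homogeneous diffusion matrix $M = I - \alpha L$; the node weights enter only through the starting error $e^0 = w^0 - w^*$, which is why $\alpha$ can be taken from the spectrum of $L$ alone.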

3.2.1. First order scheme (FOS-R)

The earliest local LB algorithm is the first order scheme (FOS) [1,5], whose iteration polynomial is $p_k(t) = t^k$. Replacing the original diffusion matrix $M$ with $M^* = I - \alpha L^*$ turns FOS into FOS-R, with $w^k = M^* w^{k-1}$ ($k = 1, 2, \ldots$). Since $|p_k(\mu_i)| = |\mu_i|^k \le \gamma^k$ for $2 \le i \le s$, Eq. (2) gives the error estimate

$$\|e^k\|_2 \le \gamma^k\,\|e^0\|_2. \tag{3}$$

3.2.2. Second order scheme (SOS-R)

The second order scheme (SOS) was introduced in [5]; it uses an overrelaxed iteration to obtain $w^k$:

$$w^1 = M^* w^0, \qquad w^k = \beta M^* w^{k-1} + (1 - \beta)\,w^{k-2}, \quad k \ge 2.$$

The iteration polynomials are defined by $p_0(t) = 1$, $p_1(t) = t$ and $p_k(t) = \beta t\,p_{k-1}(t) + (1 - \beta)\,p_{k-2}(t)$. It is known [6] that $w^k$ converges to $w^*$ whenever $0 < \beta < 2$, and that the optimal $\beta$ is $\beta_o = 2/(1 + \sqrt{1 - \gamma^2})$. Combining the known bound $\max\{|p_k(\mu_i)| : 2 \le i \le s\} \le (\beta_o - 1)^{k/2}\,(1 + k\sqrt{1 - \gamma^2})$ (see [6] and [3]) with Eq. (2), we obtain

$$\|e^k\|_2 \le (\beta_o - 1)^{k/2}\left(1 + k\sqrt{1 - \gamma^2}\right)\|e^0\|_2. \tag{4}$$
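FOS-R and SOS-R are short recurrences, so they can be sketched directly. The pure-Python sketch below is illustrative only (all helper names are hypothetical). It assumes an unweighted edge list ($f_e = 1$) and never forms $L^*$ as a matrix: $Rw$ costs $O(n)$ and applying $L$ over the edge list costs $O(m)$. The parameters come from the well-known closed-form path spectrum $\lambda_k = 2 - 2\cos(k\pi/n)$ together with $\alpha = 2/(\lambda_2 + \lambda_m)$, $\gamma = (\lambda_m - \lambda_2)/(\lambda_m + \lambda_2)$ and the standard SOS optimum $\beta_o = 2/(1 + \sqrt{1 - \gamma^2})$; none of them depends on $c$.

```python
import math


def laplacian_apply(v, edges):
    """Return L v for the Laplacian L = B B^T of an unweighted undirected edge list."""
    out = [0.0] * len(v)
    for i, j in edges:
        d = v[i] - v[j]
        out[i] += d
        out[j] -= d
    return out


def m_star_apply(w, c, edges, alpha):
    """Return M* w = w - alpha L R w, where R w = w - (u^T w / u^T c) c."""
    scale = sum(w) / sum(c)
    rw = [wi - scale * ci for wi, ci in zip(w, c)]
    lrw = laplacian_apply(rw, edges)
    return [wi - alpha * li for wi, li in zip(w, lrw)]


def fos_r(w0, c, edges, alpha, steps):
    """FOS-R: w^k = M* w^{k-1}."""
    w = list(w0)
    for _ in range(steps):
        w = m_star_apply(w, c, edges, alpha)
    return w


def sos_r(w0, c, edges, alpha, beta, steps):
    """SOS-R: w^1 = M* w^0 and w^k = beta M* w^{k-1} + (1 - beta) w^{k-2} for k >= 2."""
    prev, cur = list(w0), m_star_apply(w0, c, edges, alpha)
    for _ in range(steps - 1):
        mw = m_star_apply(cur, c, edges, alpha)
        prev, cur = cur, [beta * a + (1 - beta) * b for a, b in zip(mw, prev)]
    return cur


# Hypothetical small instance: 8-node path, SEMI-like weights, PEAK-like load.
n = 8
edges = [(i, i + 1) for i in range(n - 1)]
c = [1.0] * (n // 2) + [2.0] * (n // 2)
w0 = [100.0 * n] + [0.0] * (n - 1)

# LB parameters from the ordinary path spectrum only.
lam2 = 2 - 2 * math.cos(math.pi / n)
lam_m = 2 - 2 * math.cos((n - 1) * math.pi / n)
alpha = 2 / (lam2 + lam_m)
gamma = (lam_m - lam2) / (lam_m + lam2)
beta = 2 / (1 + math.sqrt(1 - gamma ** 2))

target = [sum(w0) / sum(c) * ci for ci in c]  # w* from Eq. (1)
w_fos = fos_r(w0, c, edges, alpha, steps=600)
w_sos = sos_r(w0, c, edges, alpha, beta, steps=200)
print(max(abs(a - b) for a, b in zip(w_fos, target)))
print(max(abs(a - b) for a, b in zip(w_sos, target)))
```

Changing `c` (for example to a CS-style vector) changes only the target $w^*$; `alpha` and `beta` stay the same, which is the parameter invariance of the X-R schemes. SOS-R reaches the same accuracy in far fewer steps, consistent with bounds (3) and (4).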

Algorithm 2 [OPT-R]
for $k$ from 1 to $s - 1$ do
  $w^k = (I - \lambda_{k+1}^{-1} L^*)\,w^{k-1}$;
  $y^k = \lambda_{k+1}^{-1} B^T R\, w^{k-1}$;
  $x^k = x^{k-1} + y^k$;
end for

3.2.3. Optimal polynomial scheme (OPT-R)

OPT [3,7] reaches the balanced state within at most $s - 1$ iterations, which is optimal compared with the diffusion schemes above, which usually need more than $s - 1$ iterations. However, OPT requires all distinct eigenvalues of $L$ and may suffer from numerical instability, so it is best suited to regular networks, whose Laplacian eigenvalues are easy to calculate. It was shown in [7] how to avoid this numerical instability. OPT can be redesigned as OPT-R, given in Algorithm 2. Note that OPT-R uses the same LB parameters $\lambda_{k+1}$ as OPT. By Theorem 3, the error satisfies $e^{s-1} = 0$, and the flow $x^{s-1}$ satisfies $Bx^{s-1} = w^0 - w^*$, i.e. it is a balancing flow. Clearly, if the Laplacian eigenvalues are easily available, OPT has an advantage over SOS, CHEBY and FOS.

4. Flow analysis

The flows produced by the general LB schemes for homogeneous networks are minimal with respect to the $l_2$-norm; the basic results on this topic were shown in [3]. In this section, we analyze the flow $x$ produced by the X-R strategies and show that it is also $l_2$-minimal.

Lemma 4. (See [3].) Let $G$ be connected and let $L$ be the Laplacian of $G$. The equation $Lz = b$ has a solution if and only if $b$ belongs to the orthogonal complement of the space spanned by $u$.

Lemma 5. (See [3].) Provided that $b$ is orthogonal to $u$, the minimizer of $\|x\|_2$ over all $x$ with $Bx = b$ is $x = B^T z$, where $z$ satisfies $Lz = b$.

As the vector $w^0 - w^*$ is orthogonal to $u$, Lemma 4 shows that the equation $Lz = w^0 - w^*$ has a solution $z$, and by Lemma 5 the flow $x = B^T z$ is $l_2$-minimal and satisfies $Bx = w^0 - w^*$.
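Algorithm 2 and the balancing-flow identity $Bx^{s-1} = w^0 - w^*$ can be checked together in a small experiment. The sketch below is a pure-Python illustration under stated assumptions: the helper names are hypothetical, the edges are unweighted, and the graph is a path, whose Laplacian eigenvalues $2 - 2\cos(k\pi/n)$ are known in closed form and all distinct. It uses the eigenvalues in increasing order, which is adequate for this tiny example but is not the Leja ordering used for larger networks.

```python
import math


def laplacian_apply(v, edges):
    """Return L v = B B^T v; edge (i, j) is oriented so that (B^T v)_e = v_i - v_j."""
    out = [0.0] * len(v)
    for i, j in edges:
        d = v[i] - v[j]
        out[i] += d
        out[j] -= d
    return out


def opt_r(w0, c, edges, eigs):
    """OPT-R (Algorithm 2): one step per nonzero distinct Laplacian eigenvalue.

    eigs: lambda_2, ..., lambda_s in the order in which they are used.
    Returns (w, x): final loads and accumulated edge flow (x_e > 0 means i -> j).
    """
    w = list(w0)
    x = [0.0] * len(edges)
    for lam in eigs:
        scale = sum(w) / sum(c)
        rw = [wi - scale * ci for wi, ci in zip(w, c)]  # R w^{k-1}
        y = [(rw[i] - rw[j]) / lam for i, j in edges]   # y^k = B^T R w^{k-1} / lambda
        lrw = laplacian_apply(rw, edges)                # L* w^{k-1} = L R w^{k-1}
        w = [wi - li / lam for wi, li in zip(w, lrw)]   # w^k = (I - L*/lambda) w^{k-1}
        x = [xe + ye for xe, ye in zip(x, y)]           # x^k = x^{k-1} + y^k
    return w, x


# 6-node path: its n - 1 = 5 nonzero eigenvalues are distinct, so s - 1 = 5 steps.
n = 6
edges = [(i, i + 1) for i in range(n - 1)]
c = [1.0, 1.0, 1.0, 2.0, 2.0, 2.0]
w0 = [600.0, 0.0, 0.0, 0.0, 0.0, 0.0]
eigs = [2 - 2 * math.cos(k * math.pi / n) for k in range(1, n)]

w, x = opt_r(w0, c, edges, eigs)
target = [sum(w0) / sum(c) * ci for ci in c]

# Net outflow (B x)_i at each node; it should equal w0_i - w_i.
net = [0.0] * n
for (i, j), xe in zip(edges, x):
    net[i] += xe
    net[j] -= xe
print(max(abs(a - b) for a, b in zip(w, target)))
print(max(abs(nv - (a - b)) for nv, a, b in zip(net, w0, w)))
```

After exactly $s - 1$ steps both printed residuals should be at rounding level. Since every $y^k$ is $B^T$ applied to a vector, $x^{s-1}$ lies in the range of $B^T$, which is the form of the $l_2$-minimal flow in Lemma 5.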
The next lemma shows that if a sequence of loads converges to the balanced load, and each load in the sequence has an $l_2$-minimal flow, then these minimal flows converge to the minimal flow for the balanced load.

Lemma 6. (See [3].) Let $w^k$ be a sequence of loads converging to the balanced load $w^*$, and let $x^k$ be the minimal flow (see Lemma 5) defined by $Bx^k = w^k - w^0$, which has the form $x^k = B^T z^k$ with $Lz^k = w^k - w^0$. Then there is a flow $x^*$ that is the limit of the sequence $x^k$. The flow $x^*$ is $l_2$-minimal and satisfies $Bx^* = w^* - w^0$, so it is a balancing flow for $w^*$.

Theorem 7 shows that our schemes produce an $l_2$-minimal flow.

Theorem 7. Let the polynomials $p_k(t)$ be defined by the recurrence $p_k(t) = (\pi_k t - \tau_k)\,p_{k-1}(t) + \theta_k\,p_{k-2}(t)$, with $\theta_0 = 0$ and $\pi_k - \tau_k + \theta_k = 1$. For $k = 1$, let
$b^0 = -\alpha\pi_1 R w^0$; $x^1 = y^0 = B^T b^0$; $w^1 = w^0 + By^0$.
For $k \ge 2$, let
$b^{k-1} = -\alpha\pi_k R w^{k-1} - \theta_k b^{k-2}$; $y^{k-1} = B^T b^{k-1}$; $x^k = x^{k-1} + y^{k-1}$; $w^k = w^{k-1} + By^{k-1}$
be the update process for $x^k$ and $w^k$. Then for $w^k = p_k(I - \alpha LR)\,w^0$ we have $\lim_{k\to\infty} w^k = w^*$ and $\lim_{k\to\infty} x^k = x^*$, where $x^*$ is the $l_2$-minimal flow satisfying $Bx^* = w^* - w^0$.

Proof. The theorem follows by substituting the matrix $R$ for $C^{-1}$ in Theorem 1 of [8], without further changes. ∎

5. Simulation results

This section presents simulation results illustrating the performance of the X-R schemes on some familiar communication architectures: the one-dimensional path with 64 nodes, the two-dimensional $8 \times 8$ mesh and the six-dimensional hypercube Cube(6) (denoted Path, Mesh and Cube, respectively, in the sequel). Each of them has 64 nodes. First, we evaluate the efficiency and stability of X-R on various combinations of node loads and weights on each topology. Second, we compare X-R with X-C on Mesh $8 \times 8$ and Cube(6). For these two purposes we built a simulator that generates the topologies and configurations described below.

5.1. Load balancing results with X-R schemes

To emphasize the heterogeneity of networks, weights are assigned to the network nodes in three different ways. To account for the influence of the initial loads, we ran experiments with highly unbalanced as well as randomly distributed initial loads. In addition, the Laplacian eigenvalues used in the OPT-X schemes are reordered as Leja points to improve the efficiency and stability of the numerical simulation.
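The simulator itself is not described in detail. As a hypothetical sketch, the three 64-node test topologies can be generated as edge lists as follows; the function names are illustrative, and the $8 \times 8$ mesh is assumed to have no wrap-around links, which agrees with the largest Laplacian eigenvalue 7.6955 reported for Mesh in Table 1.

```python
def path_edges(n):
    """Path with nodes 0..n-1."""
    return [(i, i + 1) for i in range(n - 1)]


def mesh_edges(rows, cols):
    """rows x cols mesh without wrap-around; node (r, q) has index r * cols + q."""
    edges = []
    for r in range(rows):
        for q in range(cols):
            v = r * cols + q
            if q + 1 < cols:
                edges.append((v, v + 1))     # horizontal neighbor
            if r + 1 < rows:
                edges.append((v, v + cols))  # vertical neighbor
    return edges


def cube_edges(d):
    """d-dimensional hypercube: nodes 0..2^d - 1 are adjacent iff they differ in one bit."""
    return [(v, v | (1 << b)) for v in range(1 << d) for b in range(d) if not v & (1 << b)]


for name, edges in [("Path", path_edges(64)), ("Mesh", mesh_edges(8, 8)), ("Cube", cube_edges(6))]:
    nodes = {v for e in edges for v in e}
    print(name, len(nodes), len(edges))
# Expected: Path 64 63, Mesh 64 112, Cube 64 192
```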
The initial conditions are as follows.

Node weight assignment:
HOMO: all nodes have weight 1. This simulates a homogeneous network, the special case of a heterogeneous network with $C = I$.
SEMI: half of the nodes have weight 1 and the others weight 2. This simulates a heterogeneous network consisting of two kinds of subnetwork.
CS: one node has the high weight $n + 1$ and the others weight 1. This simulates a heterogeneous network following the traditional client–server model, an important network model.
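A hypothetical sketch of how a simulator might generate these weight vectors, together with the PEAK and RAN load models defined under "Node initial load distribution" below. The text does not say which nodes receive weight 2 (SEMI), which node is the server (CS) or holds the peak (PEAK), or how "uniformly randomly distributed" is realized, so those choices are assumptions, flagged in the comments.

```python
import random


def node_weights(model, n):
    """Node weight vector c for the HOMO, SEMI and CS models."""
    if model == "HOMO":
        return [1] * n
    if model == "SEMI":
        # Assumption: the first half of the nodes has weight 1, the second half weight 2.
        return [1] * (n // 2) + [2] * (n - n // 2)
    if model == "CS":
        # Assumption: node 0 plays the server with weight n + 1.
        return [n + 1] + [1] * (n - 1)
    raise ValueError(f"unknown weight model: {model}")


def initial_loads(model, n, seed=0):
    """Initial load vector w^0 for the PEAK and RAN models; the total load is 100 n."""
    total = 100 * n
    if model == "PEAK":
        # Assumption: the peak sits on node 0.
        return [total] + [0] * (n - 1)
    if model == "RAN":
        # Assumption: each of the 100 n load units goes to a uniformly random node.
        rng = random.Random(seed)
        w = [0] * n
        for _ in range(total):
            w[rng.randrange(n)] += 1
        return w
    raise ValueError(f"unknown load model: {model}")


print(sum(node_weights("SEMI", 64)), sum(node_weights("CS", 64)))  # 96 128
print(sum(initial_loads("RAN", 64)))  # 6400
```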

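Table 1
The LB parameters and the number $s$ of distinct eigenvalues (the values are identical for the weight models HOMO, SEMI and CS).

Network | Weight         | s  | $\lambda_2$ | $\lambda_m$ | $\alpha_o$ | $\beta_o$ | $\gamma$
Path    | HOMO, SEMI, CS | 64 | 0.0024      | 3.9976      | 0.5000     | 1.9065    | 0.9988
Mesh    | HOMO, SEMI, CS | 33 | 0.1522      | 7.6955      | 0.2549     | 1.5676    | 0.9612
Cube    | HOMO, SEMI, CS | 7  | 2.0000      | 12.0000     | 0.1429     | 1.1766    | 0.7143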
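Because the X-R parameters depend only on $L$, Table 1 can be reproduced without any node weights. The sketch below is illustrative only (function names hypothetical). It uses the well-known closed-form Laplacian spectra of the path, of the mesh (a Cartesian product of two paths, assumed without wrap-around) and of the hypercube, the formulas $\alpha_o = 2/(\lambda_2 + \lambda_m)$ and $\gamma = (\lambda_m - \lambda_2)/(\lambda_m + \lambda_2)$, and the standard SOS optimum $\beta_o = 2/(1 + \sqrt{1 - \gamma^2})$, which reproduces the tabulated $\beta_o$ values.

```python
import math


def path_spectrum(n):
    """Laplacian eigenvalues of the n-node path: 2 - 2cos(k*pi/n), k = 0..n-1."""
    return [2 - 2 * math.cos(k * math.pi / n) for k in range(n)]


def mesh_spectrum(a, b):
    """Laplacian eigenvalues of the a x b mesh: all sums of the two path spectra."""
    return [p + q for p in path_spectrum(a) for q in path_spectrum(b)]


def cube_spectrum(d):
    """Distinct Laplacian eigenvalues of the d-dimensional hypercube: 2k, k = 0..d."""
    return [2.0 * k for k in range(d + 1)]


def lb_parameters(eigs):
    """Return (s, lambda_2, lambda_m, alpha_o, beta_o, gamma) in the column order of Table 1."""
    distinct = sorted({round(x, 9) for x in eigs})  # merge numerically equal eigenvalues
    lam2, lam_m = distinct[1], distinct[-1]
    alpha = 2 / (lam2 + lam_m)
    gamma = (lam_m - lam2) / (lam_m + lam2)
    beta = 2 / (1 + math.sqrt(1 - gamma ** 2))
    return len(distinct), lam2, lam_m, alpha, beta, gamma


for name, eigs in [("Path", path_spectrum(64)), ("Mesh", mesh_spectrum(8, 8)), ("Cube", cube_spectrum(6))]:
    s, *rest = lb_parameters(eigs)
    print(name, s, " ".join(f"{v:.4f}" for v in rest))
```

The computed values agree with Table 1 to within one unit in the fourth decimal. The mesh count $s = 33$ arises because the four sums $\mu_1 + \mu_7$, $\mu_2 + \mu_6$, $\mu_3 + \mu_5$ and $\mu_4 + \mu_4$ of 8-node path eigenvalues $\mu_k = 2 - 2\cos(k\pi/8)$ all equal 4, which merges 36 unordered pairs into 33 distinct values.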

Node initial load distribution:
PEAK: one node has a large load of $100n$ and all the other nodes have load 0.
RAN: a total load of $100n$ is distributed uniformly at random among the $n$ nodes.

Eigenvalue order: all distinct Laplacian eigenvalues are first sorted increasingly as $0 = \lambda_1 < \lambda_2 < \cdots < \lambda_s$ and then rearranged in Leja order [13,14] as $\lambda'_1 = \lambda_1 = 0$, $\lambda'_2 = \lambda_s$ and

$$\lambda'_i = \arg\max_{\lambda \in K_i} \Big\{ \lambda \prod_{j=2}^{i-1} \big|1 - \lambda/\lambda'_j\big| \Big\}, \quad 3 \le i \le s,$$

where $K_3 = \{\lambda_i \mid 2 \le i \le s - 1\}$ and $K_i = K_{i-1} \setminus \{\lambda'_{i-1}\}$ for $4 \le i \le s$. Unless otherwise specified, the eigenvalues in the OPT-X schemes are always preprocessed in this way.

Table 1 shows that our schemes compute the LB parameters independently of the network heterogeneity, so only the eigenvalues of the ordinary Laplacian $L$ are needed to run the iteration. This contrasts with [8], where the LB parameters are obtained from the distinct eigenvalues of $LC^{-1}$, which vary with the network heterogeneity. In general, the distinct eigenvalues of $LC^{-1}$ for the SEMI and CS weights are harder to obtain than those for HOMO. For each network, we ran experiments with FOS-R, SOS-R and OPT-R. To protect OPT from numerical instability, the Laplacian eigenvalues are ordered as Leja points. Edge weights are not considered, since current networks generally have high bandwidth, so the flows on the links are not limited. Table 2 shows that our schemes have the same $l_2$-minimal flow property as the schemes in [8], although the flows in individual iterations may differ. The last six columns list the number of iterations needed to reach the balanced state. Table 2 also shows that our schemes reduce the number of iterations considerably on every experimental network compared with the generalizations in [8], and that this number is insensitive to the node weights and initial loads.
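The Leja reordering just described can be sketched as a greedy loop. This is an illustrative implementation of the stated rule, assuming the input holds the distinct eigenvalues with $\lambda_1 = 0$ (the function names are hypothetical). It compares sums of logarithms instead of products, which selects the same maximizer because the logarithm is monotone, while avoiding overflow when $s$ is large.

```python
import math


def _log_score(x, chosen):
    """log of x * prod |1 - x / lj| over the already chosen nonzero points lj."""
    return math.log(x) + sum(math.log(abs(1 - x / lj)) for lj in chosen)


def leja_order(eigs):
    """Reorder distinct Laplacian eigenvalues (smallest one 0) into the Leja order above.

    lambda'_1 = 0 and lambda'_2 = lambda_s; each further lambda'_i maximizes
    lambda * prod_{j=2}^{i-1} |1 - lambda / lambda'_j| over the unused eigenvalues.
    """
    lam = sorted(eigs)
    order = [lam[0], lam[-1]]
    remaining = lam[1:-1]  # K_3 = {lambda_i | 2 <= i <= s - 1}
    while remaining:
        # On exact ties, max() keeps the first (smallest) candidate.
        best = max(remaining, key=lambda x: _log_score(x, order[1:]))
        order.append(best)
        remaining.remove(best)  # K_{i+1} = K_i \ {lambda'_i}
    return order


print(leja_order([0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0]))
```

For the Cube(6) spectrum the first three points are 0, 12 and 6: after 0 and 12, the midpoint 6 maximizes $\lambda\,|1 - \lambda/12|$.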
For instance, with the PEAK load and CS weights, FOS-C needs about 24 153 iterations on Path and 1945 on Mesh to converge to the balanced state, whereas FOS-R requires only 9077 and 285 iterations, respectively, in the same case. The convergence speed shows no evident variation across the weight and load distributions, unlike in [8], where dramatically unbalanced initial loads and weights make the algorithms take much longer to balance than relatively evenly distributed initial states do.

Table 2
The flow and the number of iterations of the X-R and X-C schemes under different combinations of load and weight distributions. The columns FOS-C to OPT-R give the number of iterations.

Network | Load | Weight | Flow   | FOS-C  | FOS-R | SOS-C | SOS-R | OPT-C | OPT-R
Path    | PEAK | HOMO   | 29 213 | 9655   | 9655  | 294   | 294   | 63    | 63
Path    | PEAK | SEMI   | 25 676 | 13 092 | 9459  | 340   | 289   | 63    | 63
Path    | PEAK | CS     | 14 606 | 24 153 | 9077  | 476   | 279   | 63    | 63
Path    | RAN  | HOMO   | 27 516 | 5831   | 7632  | 194   | 237   | 63    | 63
Path    | RAN  | SEMI   | 23 579 | 11 559 | 8577  | 305   | 263   | 63    | 63
Path    | RAN  | CS     | 13 796 | 24 174 | 9147  | 473   | 281   | 63    | 63
Mesh    | PEAK | HOMO   | 6849   | 303    | 303   | 52    | 52    | 27    | 27
Mesh    | PEAK | SEMI   | 6624   | 470    | 300   | 65    | 52    | 40    | 27
Mesh    | PEAK | CS     | 3424   | 1945   | 285   | 136   | 50    | 35    | 26
Mesh    | RAN  | HOMO   | 326    | 189    | 225   | 35    | 45    | 27    | 27
Mesh    | RAN  | SEMI   | 778    | 339    | 261   | 49    | 47    | 39    | 26
Mesh    | RAN  | CS     | 2388   | 1946   | 287   | 136   | 50    | 38    | 26
Cube    | PEAK | HOMO   | 2844   | 37     | 37    | 18    | 18    | 6     | 6
Cube    | PEAK | SEMI   | 2812   | 56     | 37    | 22    | 18    | 11    | 6
Cube    | PEAK | CS     | 1422   | 497    | 35    | 69    | 17    | 11    | 6
Cube    | RAN  | HOMO   | 195    | 23     | 30    | 12    | 15    | 6     | 6
Cube    | RAN  | SEMI   | 276    | 38     | 32    | 16    | 15    | 11    | 6
Cube    | RAN  | CS     | 1196   | 497    | 35    | 69    | 17    | 11    | 6

Fig. 1 and Fig. 2 illustrate the relation between the error and the number of iterations of the FOS-R and OPT-R schemes on the Mesh network with the weight models HOMO, SEMI and CS: Fig. 1 with the PEAK load distribution and Fig. 2 with RAN. On the Mesh network, FOS-R reaches the balanced state after only about 300 iterations for every combination of initial loads and weights. On the other hand, FOS-R needs noticeably more iterations than OPT-R.

Fig. 1. Load PEAK is balanced on the Mesh 8 × 8 network by FOS-R (a) and OPT-R (b), respectively, with three distributions of node weights.

Fig. 2. Load RAN is balanced on the Mesh 8 × 8 network by FOS-R (a) and OPT-R (b), respectively, with three distributions of node weights.

5.2. Comparison of simulation results for X-R with X-C

We compare the X-R schemes with the X-C schemes of [8] in terms of the number of iterations until convergence and stability. Only two communication topologies, Mesh $8 \times 8$ and Cube(6), are considered. In Fig. 3, the PEAK and RAN load models are balanced on Mesh $8 \times 8$ under the weight models SEMI and CS with FOS-R and FOS-C. Fig. 3 clearly illustrates that FOS-R converges faster than FOS-C, with a rapid and monotone decrease of the error. For example, to balance the RAN load under the CS weights, FOS-R requires 287 iterations with a maximal error of 2081, whereas FOS-C requires 1946 iterations with a maximal error of 3173. In Fig. 4, the PEAK and RAN loads are balanced on Mesh $8 \times 8$ under the weight models SEMI and CS with OPT-R and OPT-C. OPT-R clearly needs fewer iterations than OPT-C in most cases. For instance, with SEMI weights, OPT-R takes 27 iterations with a maximal error of 6321, whereas OPT-C takes 40 iterations with a maximal error of 14 051, to drive the PEAK load to the balanced state; and OPT-R takes 26 iterations with a maximal error of 4387, whereas OPT-C takes 39 iterations with a maximal error of 5318, to drive the RAN load to the balanced state. Likewise, with CS weights, OPT-R needs fewer iterations than OPT-C whatever the initial load, with similar stability. For the PEAK load, OPT-R requires 26 iterations and OPT-C 35 iterations to reach the balanced state, both with a maximal error of 3174.

Fig. 3. Load PEAK (a) and RAN (b) are balanced on Mesh 8 × 8 by FOS-R and FOS-C with weight models SEMI and CS.

Fig. 4. Load PEAK (a) and RAN (b) are balanced on Mesh 8 × 8 by OPT-R and OPT-C with weight models SEMI and CS.

For the Cube(6) communication topology, we also ran the experiments with all combinations of load and weight distributions using the X-R and X-C policies. As Fig. 5 and Fig. 6 illustrate, FOS-R and OPT-R perform better than FOS-C and OPT-C in almost all settings. The difference is remarkable between FOS-R and FOS-C and slight between OPT-R and OPT-C. Note that OPT-R sometimes performs even worse than OPT-C on Mesh but always better on Cube, which suggests that a complex spectral structure of the communication topology may weaken the advantage of the proposed OPT-R scheme.

Fig. 5. Load PEAK (a) and RAN (b) are balanced on Cube(6) by FOS-R and FOS-C with weight models SEMI and CS.

Fig. 6. Load PEAK (a) and RAN (b) are balanced on Cube(6) by OPT-R and OPT-C with weight models SEMI and CS.

6. Conclusion

In this paper, we contribute several diffusion schemes with a new diffusion matrix for load balancing on heterogeneous networks. Theoretical and experimental results show that the proposed schemes clearly improve iteration efficiency and numerical stability. In future work, these schemes should be applied to more distributed environments and extended to large-scale environments such as multistage and OTIS interconnection networks.

References

[1] G. Cybenko, Load balancing for distributed memory multiprocessors, J. Parallel Distrib. Comput. 7 (1989) 279–301.
[2] C.C. Hui, S.T. Chanson, Theoretical analysis of the heterogeneous dynamic load balancing problem using a hydro-dynamic approach, J. Parallel Distrib. Comput. 43 (1997) 139–146.
[3] R. Diekmann, A. Frommer, B. Monien, Efficient schemes for nearest neighbor load balancing, Parallel Comput. 25 (7) (1999) 789–812.
[4] Y.F. Hu, R.J. Blake, D.R. Emerson, An optimal migration algorithm for dynamic load balancing, Concurr. Pract. Exp. 10 (6) (1998) 467–483.
[5] B. Ghosh, S. Muthukrishnan, M.H. Schultz, First- and second-order diffusive methods for rapid, coarse, distributed load balancing, Theory Comput. Syst. 31 (4) (1998) 331–354.
[6] G. Golub, R. Varga, Chebyshev semi-iterative methods, successive overrelaxation iterative methods, and second order Richardson iterative methods, Numer. Math. 3 (1961) 147–156.
[7] R. Elsässer, A. Frommer, B. Monien, R. Preis, Optimal and alternating direction load balancing schemes, in: Amestoy, et al. (Eds.), Euro-Par’99, Lecture Notes in Comput. Sci., vol. 1685, 1999, pp. 280–290.
[8] R. Elsässer, B. Monien, R. Preis, Diffusion schemes for load balancing on heterogeneous networks, Theory Comput. Syst. 35 (2002) 305–320.
[9] T. Rotaru, H.H. Nägeli, Dynamic load balancing by diffusion in heterogeneous systems, J. Parallel Distrib. Comput. 64 (4) (2004) 481–497.
[10] K.D. Devine, E.G. Boman, R.T. Heaphy, et al., New challenges in dynamic load balancing, Appl. Numer. Math. 52 (2) (2005) 133–152.
[11] R. Sharma, P. Kanungo, Dynamic load balancing algorithm for heterogeneous multi-core processors cluster, in: Proceedings of the 2014 Fourth International Conference on Communication Systems and Network Technologies, CSNT2014, IEEE Computer Society, 2014, pp. 288–292.
[12] K.A. Dimitrakopoulou, N.M. Missirlis, Optimal diffusion for load balancing in heterogeneous networks, in: Proceedings of the Parallel Processing and Applied Mathematics, Springer, Berlin, Heidelberg, 2014, pp. 214–223.
[13] L. Reichel, The application of Leja points to Richardson iteration and polynomial preconditioning, Linear Algebra Appl. 154–156 (1991) 389–414.
[14] A. Martínez, L. Bergamaschi, M. Caliari, M. Vianello, A massively parallel exponential integrator for advection–diffusion models, J. Comput. Appl. Math. 231 (1) (2009) 82–91.
