European Journal of Operational Research 147 (2003) 430–447 www.elsevier.com/locate/dsw
Computing, Artificial Intelligence and Information Technology
Optimal task allocation and hardware redundancy policies in distributed computing systems

Chung-Chi Hsieh *

Department of Industrial Management Science, National Cheng Kung University, 1 University Road, Tainan 70101, Taiwan, ROC

Received 5 June 2001; accepted 27 May 2002
Abstract

A distributed computing system (DCS) in general consists of processing nodes, communication channels, and tasks. Achieving a reliable DCS thus comprises three parts: the realization of reliable task processing, reliable communication among processing nodes, and a good task allocation strategy. In this study, we examine the relationship between system cost and system reliability in a cycle-free hardware-redundant DCS where multiple processors are available at each processing node and multiple communication links are available at each communication channel. Intuitively, higher hardware redundancy leads to higher system reliability, which results in the reduction of communication cost. Such an endowment of hardware redundancy, however, incurs higher hardware operating cost. A unified model of system cost is therefore developed in this study that is a complex function of task allocation and hardware redundancy policies, and a hybrid genetic algorithm (HGA) based on genetic algorithms and a local search procedure is proposed to seek the optimal task allocation and hardware redundancy policies. The proposed algorithm is tested on randomly generated DCSs and compared with a simple genetic algorithm (SGA). The simulation results show that the HGA gives higher solution quality in less computational time than the SGA. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Distributed computing system; Hardware redundancy; Genetic algorithms; System reliability
1. Introduction

Distributed computing systems (DCSs) have played an important role in computer networking [17,25]. A DCS consists of processing nodes, communication channels, each connecting a pair of processing nodes, and tasks, such as programs and software, allocated in the DCS. A DCS is said to be hardware redundant if at least one processing node has multiple processors or at least one communication channel has multiple communication links.
* Fax: +886-6-2362162. E-mail address: [email protected] (C.-C. Hsieh).

0377-2217/03/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved.
PII: S0377-2217(02)00456-3
For a DCS to be reliable, a task, consisting of several modules, must run successfully on the DCS during the mission of task execution. Thus, the system reliability of a DCS can be defined as the probability of the successful execution of a task assigned to the processing nodes of the DCS by some task allocation strategy during the mission [22]. As such, achieving a reliable DCS comprises three parts: the realization of reliable task processing, reliable communication channels, and a good task allocation strategy. Optimal allocation of tasks to the processing nodes of a DCS such that the system reliability is maximized is investigated in [19,23]. When redundancy is taken into account, higher system reliability can be achieved by deploying a good task-allocation strategy and increasing either software redundancy [4,13,20,24] or hardware redundancy [11,12,21,22,26]. Among the studies considering software redundancy, multiple copies of a single resource are considered in [4], where the reliability problem of a DCS is formulated with the aim of reducing the total communication cost; an algorithm based on the Lagrangian relaxation and subgradient methods is then deployed to obtain the optimal resource allocation policy. Schloss and Stonebraker [20] consider data redundancy in a distributed database system; the objective is to seek the minimal data replication while retaining high system reliability. In [24], replicated objects are optimally allocated to achieve high system reliability in distributed database systems with ring topology. A heuristic approach based on genetic algorithms to allocate files optimally in distributed systems is proposed in [13]. Due to the complexity of the reliability problem for general software-redundant DCSs, reliability evaluation itself is often computationally expensive; thus, many studies have proposed efficient algorithms for evaluating the reliability of DCSs of different topologies [1-3,15,16,27].
In the context of hardware redundancy, the studies [21,22] consider hardware redundancy for maximizing the system reliability, and present task-allocation models under different hardware redundancy levels. Due to the exponential nature of the computations, the problem is solved using a state-space search-tree algorithm. An improved approach based on a branch-and-bound algorithm is proposed in [11,12]. Verma and Tamhankar [26] adopt the task-allocation models in [22] and propose a branch-and-bound algorithm for solving the multiple-join problem in distributed database systems. In these studies considering hardware redundancy, however, the hardware redundancy levels are given constants rather than decision variables; the question of how much hardware redundancy to provide at the processing nodes and communication channels thus remains unanswered. In this paper, we focus on acyclic hardware-redundant DCSs and examine the scenario in which the decision maker, based upon his/her expertise, is able to provide ex ante the ranges of hardware redundancy levels at processing nodes and communication channels that can be incorporated into task execution. We aim to develop the optimal task allocation and hardware redundancy policies so as to make the best of the available processors and communication links. Clearly, the case of no hardware redundancy and the case of fixed hardware redundancy levels in the foregoing studies are special cases of our study. By allowing various hardware redundancy levels at processing nodes and communication channels, the DCS becomes more flexible and more cost-effective. Intuitively, higher hardware redundancy levels lead to higher system reliability, which results in the reduction of communication cost. However, such an endowment of hardware redundancy incurs higher hardware operating cost. Therefore, a unified model of system cost is proposed which takes into account the system reliability as well as the hardware redundancy levels.
Our objective is to determine the optimal task allocation strategy and the optimal hardware redundancy level for each processing node and communication channel, subject to the availability of hardware redundancy levels, such that the system cost is minimized. As this cost minimization problem is highly combinatorial, a hybrid genetic algorithm (HGA) based on genetic algorithms and a local search procedure is developed to seek the optimal task allocation and hardware redundancy policies. Enabled by the convexity of the system cost with respect to the hardware redundancy levels, the local search procedure incorporated in the HGA searches for the local optimum for a given task allocation, and the HGA performs genetic search over the subspace of task allocations. The HGA is tested on 40 randomly generated DCSs of different sizes and compared with a simple genetic algorithm (SGA), which directly searches over
the space of task allocation and hardware redundancy levels. The simulation results show that the HGA gives higher solution quality with less computational time than the SGA. The remainder of the paper is organized as follows. Section 2 presents the reliability analysis for an acyclic hardware-redundant distributed computing system. In Section 3, a unified cost model for a hardware-redundant DCS is developed. In Sections 4 and 5, a simple genetic algorithm and the proposed hybrid genetic algorithm are developed respectively to obtain the optimal task allocation and hardware redundancy policies. Simulation results for both algorithms are illustrated, compared, and discussed in Section 6. Section 7 concludes with a brief summary.
2. Reliability analysis

In this section, we discuss the reliability of a distributed computing system in which a given task is performed. We assume a cycle-free DCS and constant failure rates for processors and communication links. The system reliability of a DCS for a given task is defined as the probability that the task, assigned to the processing nodes of the DCS by some task allocation, runs successfully during the mission of task execution [22]. That is, the system reliability of a DCS is the product of the probability that each processing node designated to process some module(s) of the task is operational during the period of module execution, and the probability that each communication path between two module-assigned processing nodes is operational during the period of intermodule communication [5]. In other words, this is a series system. In what follows, we present the notation used in this paper and the system reliability model.

Acronyms
DCS   distributed computing system
IMC   intermodule communication
AET   accumulative execution time
GA    genetic algorithm

Notation
P          set of processing nodes in a DCS
N          number of processing nodes (equivalent to the DCS size)
P_k        kth processing node in P
r_k^p      redundancy level (or number of processors) at P_k
p_k        processor at P_k
λ_k^p      failure rate of p_k
c_k^p      unit deployment cost of p_k
L          set of communication channels in a DCS
L          number of communication channels
L_k        kth communication channel in L
r_k^l      redundancy level (or number of communication links) at L_k
l_k        communication link at L_k
λ_k^l      failure rate of l_k
w_k        transmission rate of l_k
c_k^l      unit deployment cost of l_k
μ_k        communication cost per unit time of l_k
T          task to be processed on a DCS
M          number of modules of T
m_i        ith module in T
P_ij       communication path between P_i and P_j
δ(k,i,j)   1, if l_k is on P_ij; 0, otherwise
s(k,i,j)   data required to be transmitted between m_i and m_j through l_k
r          hardware redundancy vector [r_1 … r_{L+N}], where r_k = r_k^l for k = 1..L and r_k = r_{k-L}^p for k = (L+1)..(L+N)
r̲          vector [r̲_1 … r̲_{L+N}] denoting the minimal hardware redundancy levels for processing nodes and communication channels
r̄          vector [r̄_1 … r̄_{L+N}] denoting the maximal hardware redundancy levels for processing nodes and communication channels
c          fixed cost
X          M × N task allocation matrix [x_ik]
a          index set {a_i, i = 1..M} where x_{i,a_i} = 1
E          M × N AET matrix [e_ik]
W          M × M IMC matrix [w_ij]
t_k^p      total execution time of processor p_k during the mission
R_k^p(X)   reliability of p_k during the mission
t_k^l      total execution time of link l_k during the mission
R_k^l(X)   reliability of l_k during the mission
R(X)       system reliability of a DCS with no hardware redundancy
R(X, r)    system reliability of a DCS with hardware redundancy r
z(X, r)    system cost of a DCS with hardware redundancy r
C_i        ith chromosome in the current population
pop_size   population size
max_gen    maximal number of generations
n_c        number of crossover points
p_c        crossover probability
p_m        mutation probability
A cycle-free DCS, in the form of a tree, consists of processing nodes and bi-directional communication links. Denote by P = {P_k, k = 1..N} the set of processing nodes in a DCS, where N is the number of processing nodes, which also represents the size of the DCS. Each processing node P_k ∈ P has r_k^p ≥ 1 identical processors.¹ Let p_k denote a processor at the processing node P_k. A simple DCS with eight processing nodes is shown in Fig. 1. Let L = {L_k, k = 1..L} be the set of communication channels among the processing nodes, where L is the number of communication channels.² Note that L = N − 1 because the DCS is cycle-free. Each communication channel L_k consists of r_k^l ≥ 1 identical communication links. Let l_k be a communication link of L_k with the transmission rate w_k. Further, we let P_ij be the communication path between two arbitrary processing nodes P_i and P_j, and define an indicator function δ(k, i, j) = 1 if l_k is on P_ij; 0, otherwise. Finally, we define a hardware redundancy vector r = [r_1 … r_k … r_{L+N}] of size (L + N) to represent the hardware redundancy levels for communication channels and processing nodes, where r_k = r_k^l for k = 1..L, and r_k = r_{k-L}^p for k = (L+1)..(L+N). Consider a task of M modules T = {m_i, i = 1..M} to be executed on a DCS, where m_i is the ith module of T. The task T is assigned to the processing nodes P according to an M × N task allocation matrix
¹ We assume active redundancy. That is, if a module of a task is assigned to a processing node, every processor at this processing node will process the module simultaneously [7].
² If there is a communication channel between two processing nodes, these two processing nodes are said to be adjacent to each other.
Fig. 1. A cycle-free distributed computing system of 8 processing nodes.
X = [x_ik], where entry x_ik = 1 if module m_i ∈ T is assigned to the processing node P_k ∈ P; x_ik = 0, otherwise.³ Because we do not consider software redundancy, there are exactly M entries of 1's in X. We therefore define a set a to represent the assigned processing nodes: a = {a_1, …, a_M}, where a_i denotes the index of the processing node to which the ith module is assigned, i.e. x_{i,a_i} = 1. The notation a, together with the hardware redundancy vector r, will facilitate the discussion later in Sections 4 and 5, when determining the optimal task allocation and hardware redundancy policies. Since processors at different processing nodes may have different processing speeds in general, the accumulative execution times (AETs) of a module running at different processing nodes may differ [5]. Let E = [e_ik] denote the AET matrix, in which entry e_ik is the AET that module m_i ∈ T takes to run at processor p_k. If m_i cannot be executed by p_k, e_ik is ∞; thus, in the simulation we let such an AET take on a large integral value. Once T is assigned to the DCS, intermodule communication, in terms of data quantity, may be required between two modules during the mission. Let W = [w_ij] denote the M × M IMC matrix among modules, where entry w_ij represents the data amount of IMC between modules m_i and m_j.

2.1. Reliability evaluation of processors and communication links

When the task T is assigned to a DCS according to some task allocation X, the total execution time, t_k^p, of processor p_k of P_k during the mission is

t_k^p = Σ_{i=1}^{M} x_ik e_ik

Thus, the reliability, R_k^p(X), of p_k during the mission is

R_k^p(X) = exp(−λ_k^p t_k^p) = exp(−λ_k^p Σ_{i=1}^{M} x_ik e_ik)   (1)

where λ_k^p is the constant failure rate of p_k. Notice that if no module is assigned to the processing node P_k, i.e. P_k is idle, then x_ik = 0 for i = 1..M, and thus R_k^p(X) = e^0 = 1. The communication time, t_k^l, on communication link l_k is

t_k^l = Σ_{i=1}^{M−1} Σ_{j=i+1}^{M} s(k, i, j) / w_k
where s(k, i, j) is the amount of data required to be transmitted between modules m_i and m_j through l_k, i.e.

s(k, i, j) = Σ_{a=1}^{N} Σ_{b≠a} δ(k, a, b) x_ia x_jb w_ij   (2)

³ The processing nodes are called active if they process some modules; idle, otherwise.
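Since the DCS is a tree, the path P_ij between any two processing nodes is unique, so the indicator δ(k, i, j) of (2) can be computed by a single breadth-first search. A minimal sketch (the adjacency-list representation and function names are our own, not the paper's):

```python
from collections import deque

def path_edges(adj, i, j):
    """Return the set of channel indices on the unique tree path P_ij
    between nodes i and j. `adj` maps each node to a list of
    (neighbor, channel_index) pairs."""
    parent = {i: (None, None)}
    q = deque([i])
    while q:
        u = q.popleft()
        if u == j:
            break
        for v, k in adj[u]:
            if v not in parent:
                parent[v] = (u, k)
                q.append(v)
    edges = set()
    u = j
    while parent[u][0] is not None:   # walk back from j to i
        edges.add(parent[u][1])
        u = parent[u][0]
    return edges

def delta(adj, k, i, j):
    """Indicator of Eq. (2): 1 if link k lies on the path between i and j."""
    return 1 if k in path_edges(adj, i, j) else 0
```

For repeated evaluations of (2) over all module pairs, the path sets would normally be precomputed once per topology rather than recomputed per call.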
The reliability, R_k^l(X), of l_k during the mission is given by

R_k^l(X) = exp(−λ_k^l t_k^l) = exp(−λ_k^l Σ_{i=1}^{M−1} Σ_{j=i+1}^{M} s(k, i, j) / w_k)   (3)

where λ_k^l is the constant failure rate of l_k.

2.2. Reliability evaluation of a DCS

For a distributed computing system with no hardware redundancy, i.e. r_k^l = 1 at each communication channel L_k and r_k^p = 1 at each processing node P_k, the system reliability for a given task T under task allocation X, following (1) and (3), is

R(X) = ∏_{k=1}^{L} R_k^l(X) ∏_{k=1}^{N} R_k^p(X)   (4)

For a DCS with hardware redundancy, on the other hand, the system reliability is a generalization of (4):

R(X, r) = ∏_{k=1}^{L} [1 − (1 − R_k^l(X))^{r_k}] ∏_{k=1}^{N} [1 − (1 − R_k^p(X))^{r_{L+k}}]   (5)
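Eqs. (1)-(5) can be evaluated directly. The following sketch (the data layout and names are illustrative, not from the paper) computes R(X, r) for a small tree DCS, with the path sets precomputed per ordered node pair:

```python
import math

def system_reliability(X, E, W, paths, lam_p, lam_l, w_rate, r):
    """Compute R(X, r) of Eq. (5).

    X      : M x N 0/1 task allocation matrix
    E      : M x N AET matrix
    W      : M x M IMC matrix
    paths  : paths[(a, b)] = set of channel indices on P_ab (0-based nodes,
             both orders present)
    lam_p, lam_l : failure rates of processors / links
    w_rate : transmission rates of links
    r      : redundancy vector; r[k] for channel k < L, r[L + k] for node k
    """
    M, N = len(X), len(X[0])
    L = len(lam_l)
    # Eq. (1): processor reliabilities
    Rp = [math.exp(-lam_p[k] * sum(X[i][k] * E[i][k] for i in range(M)))
          for k in range(N)]
    # Eq. (2): data s(k, i, j) carried by channel k for a module pair
    def s(k, i, j):
        return sum(X[i][a] * X[j][b] * W[i][j]
                   for a in range(N) for b in range(N)
                   if a != b and k in paths[(a, b)])
    # Eq. (3): link reliabilities
    Rl = [math.exp(-lam_l[k] * sum(s(k, i, j) / w_rate[k]
                                   for i in range(M - 1)
                                   for j in range(i + 1, M)))
          for k in range(L)]
    # Eq. (5): series system of parallel-redundant channels and nodes
    R = 1.0
    for k in range(L):
        R *= 1.0 - (1.0 - Rl[k]) ** r[k]
    for k in range(N):
        R *= 1.0 - (1.0 - Rp[k]) ** r[L + k]
    return R
```

With all redundancy levels equal to 1, the parallel terms collapse to R_k^l(X) and R_k^p(X), recovering Eq. (4).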
3. Unified cost model

In this section, we develop a cost model for a DCS in which the task allocation and hardware redundancy policies are to be determined. For a given hardware redundancy policy, the system reliability can be maximized by finding an optimal task allocation strategy. With additional hardware redundancy at active processing nodes, the system reliability increases, hence reducing the average communication cost in the long run. Nevertheless, additional hardware redundancy inevitably increases the system operating cost. In this study we assume that the system cost of a DCS can be represented by the cost of hardware deployment and the cost of communication during the task execution. The cost, f(r), due to the hardware redundancy of processing nodes and communication channels takes a linear form

f(r) = Σ_{k=1}^{L} r_k c_k^l + Σ_{k=L+1}^{L+N} r_k c_{k-L}^p + c   (6)

where c_k^l is the unit deployment cost of each communication link l_k, c_k^p the unit deployment cost of each processor p_k, and c is a fixed cost. During the mission of processing the task, communication among the processing nodes takes place, which incurs communication time and hence cost. When the task is successfully executed, the communication cost is linearly proportional to the total communication time:

g(X) = Σ_{k=1}^{L} μ_k [ Σ_{i=1}^{M−1} Σ_{j=i+1}^{M} s(k, i, j) / w_k ]

where μ_k is the communication cost per unit time of link l_k and s(k, i, j) follows (2). Yet, because the DCS is unreliable, the number of attempts until a successful execution follows a geometric distribution, so the average communication cost in the long run is

h(X, r) = g(X) / R(X, r) = Σ_{k=1}^{L} μ_k [ Σ_{i=1}^{M−1} Σ_{j=i+1}^{M} s(k, i, j) / (w_k R(X, r)) ]   (7)

Therefore, the system cost, z(X, r), a function of the task allocation policy and the hardware redundancy policy, is given by

z(X, r) = f(r) + h(X, r)   (8)
where f(r) and h(X, r) follow (6) and (7) respectively. Thus, the objective is to find the optimal task allocation and hardware redundancy policies such that the system cost z(X, r) in (8) is minimized:

min_{X,r}  z(X, r)   (9)

s.t.
Σ_{k=1}^{N} x_ik = 1,  i = 1, 2, …, M   (10)
Σ_{i=1}^{M} x_ik ≤ X̄,  k = 1, 2, …, N   (11)
r̲ ≤ r ≤ r̄   (12)

where X̄ is the maximal number of modules allowed to be executed on a processing node, and r̲ and r̄ are the minimal and maximal hardware redundancy levels respectively, each of size (L + N). As the problem of minimizing (9) is a complicated combinatorial optimization problem, heuristic approaches based on genetic algorithms are developed in the next two sections to obtain the optimal task allocation and hardware redundancy policies.
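Given g(X) and R(X, r), the system cost of (8) is a direct combination of (6) and (7). A hedged sketch, taking g(X) and R(X, r) as precomputed inputs (names are illustrative, not the paper's):

```python
def system_cost(r, c_l, c_p, fixed_c, g_comm, R):
    """z(X, r) = f(r) + g(X) / R(X, r), following Eqs. (6)-(8).

    r       : redundancy vector [r_1 .. r_{L+N}]
    c_l     : unit deployment costs of the L communication links
    c_p     : unit deployment costs of the N processors
    fixed_c : fixed cost c
    g_comm  : one-shot communication cost g(X), precomputed
    R       : system reliability R(X, r), precomputed
    """
    L, N = len(c_l), len(c_p)
    # Eq. (6): linear hardware deployment cost
    f = (sum(r[k] * c_l[k] for k in range(L))
         + sum(r[L + k] * c_p[k] for k in range(N)) + fixed_c)
    # Eq. (8): hardware cost plus long-run communication cost of Eq. (7)
    return f + g_comm / R
```

The trade-off of (9) is visible here: raising an entry of r increases f(r) linearly while increasing R and hence shrinking the g/R term.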
4. Simple genetic algorithm

Genetic algorithms (GAs), pioneered by Holland [9], are global search heuristics which, based on the mechanics of natural selection and natural genetics, are aimed at solving complex optimization and combinatorial optimization problems [6,8,9,18]. A GA initializes and maintains a population of potential solutions, called chromosomes. These chromosomes are selected according to their fitness values and made to evolve, using genetic operators such as crossover and mutation, to form a new population. This process is repeated, in the hope that desirable characteristics are passed from generation to generation, until the stopping criteria are met. The general procedure of a simple genetic algorithm (SGA) is as follows:

Step 1. Initialize a population of chromosomes.
Step 2. Evaluate and select chromosomes in terms of their fitness values.
Step 3. If the maximal number of iterations, max_gen, is reached or the chromosomes converge, terminate the iteration and return the best chromosome; otherwise, go to Step 4.
Step 4. Generate a new population by deploying genetic operators to the current chromosomes, and repeat Step 2.

We now describe the procedure of using the SGA to solve (9).
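Steps 1-4 can be sketched as a minimal GA loop. The pieces below are simplified stand-ins rather than the exact choices of this paper (roulette-wheel selection instead of deterministic sampling, 1-point instead of multi-point crossover, a crude fitness mapping instead of linear mapping and scaling):

```python
import random

def sga(evaluate, n_bits, pop_size=20, max_gen=50, pc=0.8, pm=0.01, seed=0):
    """Minimal SGA skeleton: binary chromosomes, fitness-proportional
    selection, 1-point crossover, bit-flip mutation. `evaluate` maps a
    bit list to an objective value to be minimized."""
    rng = random.Random(seed)
    # Step 1: random initial population
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = min(pop, key=evaluate)
    for _ in range(max_gen):                  # Step 3: fixed iteration budget
        # Step 2: map objective to fitness (smaller cost -> larger fitness)
        costs = [evaluate(c) for c in pop]
        worst = max(costs)
        fit = [worst - c + 1e-9 for c in costs]
        total = sum(fit)

        def pick():                           # roulette-wheel selection
            x, acc = rng.uniform(0, total), 0.0
            for c, f in zip(pop, fit):
                acc += f
                if acc >= x:
                    return c
            return pop[-1]

        # Step 4: crossover and mutation
        nxt = []
        while len(nxt) < pop_size:
            a, b = pick()[:], pick()[:]
            if rng.random() < pc:
                cut = rng.randrange(1, n_bits)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for c in (a, b):
                for i in range(n_bits):
                    if rng.random() < pm:
                        c[i] ^= 1
                nxt.append(c)
        pop = nxt[:pop_size]
        best = min(pop + [best], key=evaluate)  # keep best-so-far
    return best
```

For (9), `evaluate` would decode the chromosome into (a, r) and return z(X, r), rejecting or penalizing allocations that violate (10)-(12).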
4.1. A genetic representation of potential solutions

For the problem formulation (9), a potential solution is composed of (M + L + N) integral variables: the allocation of the M modules of a task, i.e. a = [a_1 … a_M], and the (L + N) hardware redundancy levels to be determined, i.e. r = [r_1 … r_{L+N}]. These variables are encoded in GAs using binary strings. The size of the binary string for each module depends upon the number of processing nodes in a DCS. The conversion between the binary string for m_i and the index, a_i, of the processing node is given by

a_i = 1 + Σ_{j=0}^{d−1} b_j 2^j,  b_j ∈ {0, 1}   (13)

where d = ⌈log₂ N⌉ is the number of bits, and ⟨b_{d−1} … b_j … b_0⟩ denotes the binary string. The size of the binary string for hardware redundancy levels, on the other hand, is subject to the ranges of hardware redundancy levels that the decision maker provides. That is, given the minimal hardware redundancy vector r̲ and the maximal hardware redundancy vector r̄, the conversion between the binary string and the hardware redundancy level r_i ∈ r is

r_i = r̲_i + Σ_{j=0}^{d_i−1} b_j 2^j,  b_j ∈ {0, 1}   (14)

where [r̲_i, r̄_i], r̲_i ∈ r̲, r̄_i ∈ r̄, is the predetermined interval on which r_i can take values, and d_i = ⌈log₂(r̄_i − r̲_i + 1)⌉ is the number of bits. Therefore, a chromosome, representing a potential solution of (9), is the concatenation of the binary strings of the module allocation a and the hardware redundancy levels r. An example of the chromosome thus formed is shown below:

Variables:   a_1 … a_M   r_1 … r_L   r_{L+1} … r_{L+N}
Chromosome:  0110 … 0110   0011 … 1001   0011 … 1001
where each variable takes on a 4-bit binary string.

4.2. Evaluation of chromosomes

In GAs, the higher the fitness value of a chromosome, the more likely it is to be selected for the next generation. The fitness value of a chromosome could be evaluated by the objective function value, but such an assessment is not always plausible in practice. Therefore, two procedures are often employed to transform the objective function value into the fitness value: (i) fitness mapping, and (ii) fitness scaling. Since the problem concerned is a minimization problem, a fitness mapping is necessary that maps the objective function values to fitness values so that solutions with smaller objective function values have higher fitness values. A linear mapping function is considered in this study. It is important to note that, in order to compute the objective function value, the task allocation and hardware redundancy variables have to be decoded from the chromosome using (13) and (14) respectively. The purpose of fitness scaling is to maintain diversity in the population by controlling the number of expected copies allowed for the best chromosomes. A linear scaling function is assumed.

4.3. Selection

The selection scheme adopted in this study is the deterministic sampling method [14], in which the probability, p_i, of selecting a chromosome, C_i, is given by its fitness value divided by the sum of the fitness values of all chromosomes. As such, the expected number, e_i, of offspring for C_i is e_i = pop_size × p_i, where
Fig. 2. 2-point crossover of two chromosomes.
pop_size is the population size. Since the actual number of copies for each chromosome takes on an integral value, it is determined as follows: the number of copies of C_i to be placed in the next population is first given by the integral part of e_i; the remaining chromosomes needed to fill the next population are then drawn from the list of the fractional parts of the e_i sorted in decreasing order.

4.4. Genetic operators

The genetic operators used to generate a new population are multi-point crossover and mutation. Crossover exchanges the genetic material of two chromosomes with probability p_c. As shown in Fig. 2(a), for instance, 2-point crossover, which randomly selects two cutting points, is deployed on chromosomes C_i and C_j. The resulting chromosomes C′_i and C′_j, obtained by gluing the alternate parts, are illustrated in Fig. 2(b), where the exchanged bits are represented by the dotted lines. Mutation, on the other hand, alters a bit in a chromosome with a small probability p_m.
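The decoding of (13) and (14) can be sketched as follows; the clipping of decoded values to at most N and r̄_i is a practical guard we add for bit patterns outside the valid range, not part of the paper's encoding:

```python
import math

def decode_chromosome(bits, M, N, r_lo, r_hi):
    """Decode a flat 0/1 list into (a, r) per Eqs. (13) and (14).

    M, N        : task size and DCS size
    r_lo, r_hi  : minimal / maximal redundancy vectors of length L + N
    """
    d = max(1, math.ceil(math.log2(N)))            # bits per module index
    a, pos = [], 0
    for _ in range(M):
        chunk = bits[pos:pos + d]; pos += d
        # Eq. (13): a_i = 1 + sum b_j 2^j, with b_0 the least significant bit
        val = sum(b << j for j, b in enumerate(reversed(chunk)))
        a.append(min(N, 1 + val))                  # clip: guard, not in paper
    r = []
    for lo, hi in zip(r_lo, r_hi):
        di = max(1, math.ceil(math.log2(hi - lo + 1)))
        chunk = bits[pos:pos + di]; pos += di
        # Eq. (14): r_i = r_lo_i + sum b_j 2^j
        val = sum(b << j for j, b in enumerate(reversed(chunk)))
        r.append(min(hi, lo + val))                # clip: guard, not in paper
    return a, r
```

For example, with N = 8 each a_i uses d = 3 bits, and a redundancy range [1, 4] uses d_i = 2 bits, matching the chromosome layout of Section 4.1.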
5. Hybrid genetic algorithm

As discussed in the previous section, there are (M + 2N − 1) variables to be determined; that is, the number of variables grows roughly twice as fast as the number of processing nodes. When the task size is less than the number of processing nodes, some processing nodes and communication channels are in fact idle, resulting in inefficiency when searching over the part of the solution space involving these idle processing nodes and communication channels. Further, even if we are able to exclude these idle processing nodes and communication channels, the dimension of the resulting solution space still grows linearly with the task size and the number of communication channels. Therefore, we develop a more efficient hybrid genetic algorithm (HGA) that combines a genetic algorithm with a local search procedure. The local search procedure incorporated in the HGA searches for the locally optimal hardware redundancy levels for a given task allocation, and the HGA performs genetic search over the subspace of task allocations. Letting z(X, r_X) be the locally optimal system cost and r_X the locally optimal hardware redundancy levels for a given X, i.e. z(X, r_X) = min_r z(X, r), the problem formulation in (9) is transformed into

min_X  z(X, r_X)

and the HGA consists of the following steps:

Step 1. Initialize a population of chromosomes, representing potential task allocations.
Step 2. Evaluate the current chromosomes using the local search procedure to obtain the locally optimal hardware redundancy levels and system costs. Select chromosomes in terms of their fitness values.
Step 3. If the maximal number of iterations, max_gen, is reached or the chromosomes converge, terminate the iteration and return the best chromosome; otherwise, go to Step 4.
Step 4. Generate a new population by deploying genetic operators to the current chromosomes, and repeat Step 2.

The HGA differs from the SGA in Step 1, where each chromosome represents a potential task allocation, and in Step 2, where the evaluation of a chromosome now involves computing the locally optimal hardware redundancy levels such that z(X, r) is minimized for the task allocation X it represents. In the following discussion, we derive the structural property of the system cost with respect to the hardware redundancy levels, which enables finding the optimal hardware redundancy levels for a given task allocation.

5.1. Local search

To derive the structural property of the system cost with respect to the hardware redundancy levels, the hardware redundancy levels are generalized to take on real values. It is clear from (6) that f(r) is linear in r. For any idle processing node P_k, k ∈ [1, N], the reliability R_k^p(X) = 1 and thus ∂h(X, r)/∂r_{L+k} = 0; likewise, for any idle communication channel L_k, k ∈ [1, L], ∂h(X, r)/∂r_k = 0. Further, for any active processing node P_k, k ∈ [1, N],

∂h(X, r)/∂r_{L+k} = U (1 − R_k^p(X))^{r_{L+k}} ln(1 − R_k^p(X))

where U = h(X, r)/[1 − (1 − R_k^p(X))^{r_{L+k}}] > 0. Since ln(1 − R_k^p(X)) < 0, we have ∂h(X, r)/∂r_{L+k} < 0. It can also be shown that ∂²h(X, r)/∂r_{L+k}² > 0. Similarly, for any active communication channel L_k, k ∈ [1, L], ∂h(X, r)/∂r_k < 0 and ∂²h(X, r)/∂r_k² > 0. It is therefore concluded that h(X, r) is continuous and monotonically decreasing in r. It is further shown in Appendix A that h(X, r) is convex with respect to r. Since f(r) in (6) is linear in r, it follows from Lemma 1 that z(X, r) is convex with respect to r; hence the optimal real-valued hardware redundancy levels are unique, and a local search method suffices to find them. However, since our objective is to find optimal hardware redundancy levels of integral values, we then search for the integral hardware redundancy vector in the neighborhood of the relaxed optimal solution that gives the best system cost. A gradient-based local search procedure is as follows:
Step 1. Set the iteration index s = 0; initialize a real hardware redundancy vector r^(s), where every r_k^(s) ∈ r^(s) takes on a real value.
Step 2. Compute the gradient ∇z(X, r^(s)) for the current hardware redundancy vector r^(s):

∇z(X, r^(s)) = [ ∂z(X, r)/∂r_1 |_{r_1 = r_1^(s)}, …, ∂z(X, r)/∂r_{L+N} |_{r_{L+N} = r_{L+N}^(s)} ]^T

and calculate the next potential hardware redundancy vector r′ by

r′ = r^(s) − α ∇z(X, r^(s))

where α < 1 is a small constant. r′ is then thresholded, using r̲ and r̄, to obtain the next hardware redundancy vector r^(s+1):

r_k^(s+1) = r_k′, if r̲_k ≤ r_k′ ≤ r̄_k;  r̲_k, if r_k′ < r̲_k;  r̄_k, if r_k′ > r̄_k
Step 3. If the norm, ‖r^(s) − r^(s+1)‖, of two successive hardware redundancy vectors r^(s) and r^(s+1) is greater than some predetermined value τ, increase s by 1 and repeat Step 2; otherwise, go to Step 4.
Step 4. Compute the optimal integral hardware redundancy levels based upon r^(s) by enumerating the neighboring integers of each r_k^(s) ∈ r^(s), r_k^(s) ≠ r̲_k, r̄_k.
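Steps 1-4 can be sketched as follows, assuming a gradient oracle for z(X, r) with the allocation X held fixed (function names and defaults are illustrative). The exhaustive integer-neighbour enumeration in Step 4 is practical because L + N is small:

```python
from itertools import product

def local_search(grad_z, r_lo, r_hi, r0, alpha=0.05, tol=1e-4, max_iter=10000):
    """Steps 1-3: gradient descent on the relaxed (real-valued) redundancy
    vector, thresholded to [r_lo, r_hi] after each step."""
    r = list(r0)
    for _ in range(max_iter):
        g = grad_z(r)
        nxt = [min(hi, max(lo, x - alpha * gx))        # descend, then clip
               for x, gx, lo, hi in zip(r, g, r_lo, r_hi)]
        if sum((a - b) ** 2 for a, b in zip(r, nxt)) ** 0.5 <= tol:
            return nxt                                  # ||r(s) - r(s+1)|| <= tol
        r = nxt
    return r

def round_neighborhood(r_real, cost, r_lo, r_hi):
    """Step 4: enumerate the integer neighbours of the relaxed optimum and
    keep the cheapest one; feasible since z(X, r) is convex in r."""
    choices = [sorted({max(lo, min(hi, int(x))),
                       max(lo, min(hi, int(x) + 1))})
               for x, lo, hi in zip(r_real, r_lo, r_hi)]
    return min(product(*choices), key=lambda r: cost(list(r)))
```

Because z(X, r) is convex in r, any starting point r0 in the box converges to the same relaxed optimum, so the rounding step needs to inspect only the 2^(L+N) (at most) adjacent integer vectors.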
6. Experimental results

Four sets of randomly generated problems with DCS sizes N = 4, 8, 16, and 32 are used to compare the performance of the proposed HGA with that of the SGA. Because the approach of encoding the task allocation is the same for both algorithms, the task size is fixed at M = 4 in the experiment; the M × N AET matrix and the M × M IMC matrix of a task are nevertheless generated randomly for all problems. For each problem set, ten problems of the same DCS size are generated randomly, totaling 40 different problems in the simulation. For each of the 40 problems, 20 simulation runs are conducted for each algorithm, and system costs and computational times are averaged over these runs. The simulation results are summarized in Table 1. A random DCS of a given size refers to its topology being random, the failure rates and unit deployment costs of processors and communication links being random, and the transmission rates of communication links being random. Since DCSs are cycle-free, the topology of a random DCS is constructed in the form of a tree as follows. A list of a random permutation of the processing nodes is first obtained. The first processing node in the permutation list is assigned to the root of the tree, the level of which is set to 1, and the second processing node is assigned to the second level of the tree and connected to the first processing node to form a communication channel. Starting from level two, the rest of the processing nodes in the permutation list are drawn sequentially and allocated to the levels of the tree, with probability p_n of being at the current level and probability (1 − p_n) of being at the next level. Once allocated, the processing node is connected to one of the processing nodes at the previous level with equal probability to form a communication channel. The node-allocation procedure is repeated until the permutation list is empty.
Note that after a processing node is allocated to the next level, the counter of the current level is increased by one. For each problem in the experiment, the probability p_n is drawn uniformly from the open interval (0, 1). The failure rates of processors at the processing nodes and those of communication links at the communication channels are random variables uniformly distributed over predetermined intervals, which are chosen experimentally such that no trivial solution takes place; the unit deployment costs of processors and the unit deployment costs and transmission rates of communication links are also uniform random variables on predetermined intervals. As the topologies of the DCSs are randomly generated in the form of a tree, various shapes are possible: in the simulation, for example, 7 of the 10 DCSs of N = 4 are star-shaped and 3 of them are series-shaped. Similarly, to generate a random task of a given size, the entries of the M × N AET matrix and those of the M × M IMC matrix are drawn from a uniform distribution on prespecified intervals. Yet, it is possible in practical situations that some modules cannot be executed by some processing nodes. To incorporate this possibility in the experiment, each entry in the AET matrix takes on a large number, signifying the infeasible allocation of a module to the corresponding processing node, with a small probability p_a ≪ 1, and a random value drawn from a uniform distribution on some predetermined interval with probability (1 − p_a). The simulation results obtained by deploying the algorithms SGA and HGA to solve the 40 randomly generated problems are summarized in Table 1. The first column of Table 1 indicates the problem sets, where M is the task size, N is the DCS size, and S (in parentheses) is the problem index within each problem set. The second and fourth columns of Table 1 represent the averaged best system costs over 20 simulation runs obtained by deploying the SGA and the HGA respectively.
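The tree-construction procedure above can be sketched as follows; the level bookkeeping is our reading of the description, not the paper's exact code, and the function name is illustrative:

```python
import random

def random_tree_dcs(N, seed=None):
    """Random tree topology of Section 6 (assumes N >= 2): shuffle the
    nodes, root the first, place each remaining node at the current level
    with probability p_n (else open the next level), and attach it to a
    uniformly chosen node of the previous level."""
    rng = random.Random(seed)
    p_n = rng.uniform(0.0, 1.0)        # drawn once per problem, as in the paper
    nodes = list(range(N))
    rng.shuffle(nodes)                 # random permutation of processing nodes
    levels = [[nodes[0]], [nodes[1]]]  # root at level 1, second node at level 2
    edges = [(nodes[0], nodes[1])]     # each edge is one communication channel
    cur = 1                            # 0-based index of the current level
    for v in nodes[2:]:
        if rng.random() >= p_n:        # with prob 1 - p_n, open the next level
            cur += 1
            levels.append([])
        levels[cur].append(v)
        edges.append((rng.choice(levels[cur - 1]), v))
    return edges                       # L = N - 1 channels, acyclic by construction
```

Small p_n tends to produce deep, series-shaped trees, while p_n close to 1 tends to produce shallow, star-shaped ones, matching the variety of shapes reported for N = 4.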
The averaged computing times of the two algorithms are listed in the third and fifth columns of Table 1, respectively. The last column of Table 1 gives the difference Δz, in percent, between the averaged best system costs of the two algorithms, i.e., Δz = (z_SGA − z_HGA)/z_HGA × 100, where z_SGA and z_HGA are the averaged best system costs of the SGA and the HGA, respectively.

Table 1
Results for randomly generated problems

M   N   (S)    z_SGA       time_SGA   z_HGA       time_HGA   Δz (%)
4   4   (1)     65.0562      6.10      60.6616      1.65       6.8
4   4   (2)     38.6528      5.75      35.8897      0.85       7.1
4   4   (3)     50.5662      7.20      45.8349      1.35       9.4
4   4   (4)     39.3549      5.80      36.1238      1.85       8.2
4   4   (5)     43.9343     20.90      41.6249      1.90       5.3
4   4   (6)     41.8500     21.55      36.7834      1.35      12.1
4   4   (7)     63.7546     12.90      61.0213      1.90       4.3
4   4   (8)     37.4001     30.60      35.4857      1.20       5.1
4   4   (9)     62.1640     10.80      59.8798      1.65       3.7
4   4   (10)    58.1898     10.80      56.0850      1.55       3.6
4   8   (1)     65.7361     43.50      57.2992      8.10      12.8
4   8   (2)     53.2433     60.40      49.9804      7.15       6.1
4   8   (3)     62.3976     60.05      57.5199     26.40       7.8
4   8   (4)     71.7255     45.80      64.3263     14.55      10.3
4   8   (5)     78.2295     64.65      69.3723      8.50      11.3
4   8   (6)     68.8431     45.35      64.2946     24.50       6.6
4   8   (7)     72.6442     45.80      65.5657      7.60       9.7
4   8   (8)     43.1410     46.25      39.3699     13.10       8.7
4   8   (9)     73.1547     41.00      67.8593     25.15       7.2
4   8   (10)    79.7159     70.10      70.6188     21.85      11.4
4   16  (1)     99.3634    130.95      92.6052     38.25       6.8
4   16  (2)     85.1631    111.70      80.7162     96.65       5.2
4   16  (3)    100.8902    107.70      90.6737     23.75      10.1
4   16  (4)    102.4747    164.90      90.8785     28.80      11.3
4   16  (5)     98.6694    112.40      89.7003     89.20       9.1
4   16  (6)    114.8582    112.60      95.6994     15.35      16.7
4   16  (7)    118.0839    112.50     105.9161     21.35      10.3
4   16  (8)     89.4155    117.20      84.8213     83.75       5.1
4   16  (9)     86.6875    125.10      77.7774     17.05      10.3
4   16  (10)   116.6028    171.95     109.7598     10.55       5.9
4   32  (1)    174.1888    323.15     163.5346     48.70       6.1
4   32  (2)    166.7614    303.95     153.6121     65.90       7.9
4   32  (3)    179.2091    285.00     166.6132     25.05       7.0
4   32  (4)    170.0490    306.00     149.2341     23.60      12.2
4   32  (5)    167.4432    301.55     149.2474     90.05      10.9
4   32  (6)    149.6656    284.15     138.5555    107.45       7.4
4   32  (7)    184.7781    283.80     163.7780     62.15      11.4
4   32  (8)    141.2580    302.50     133.0566    104.30       5.8
4   32  (9)    148.4086    338.65     137.2859     51.75       7.5
4   32  (10)   165.6908    338.25     149.2564     11.80       9.9

Table 2 summarizes, for the HGA, the number of optimum solutions found and the maximum deviation among the obtained solutions. For N = 4 and 8, the solutions are compared with the global optimum obtained by exhaustive search, and the maximum deviation in percent, d_max, in the fourth column of Table 2 is defined as d_max = (z_max − z_opt)/z_opt × 100, where z_max is the maximum system cost over 20 simulation runs
Table 2
Results for HGA

M   N   (S)    # of simulation runs   # of optimal solutions obtained   Maximum deviation (%)
4   4   (1)    20                     20                                0.0000
4   4   (2)    20                     20                                0.0000
4   4   (3)    20                     20                                0.0000
4   4   (4)    20                     20                                0.0000
4   4   (5)    20                     20                                0.0000
4   4   (6)    20                     20                                0.0000
4   4   (7)    20                     19                                0.3634
4   4   (8)    20                     20                                0.0000
4   4   (9)    20                     20                                0.0000
4   4   (10)   20                     20                                0.0000
4   8   (1)    20                     18                                0.5955
4   8   (2)    20                     13                                1.8673
4   8   (3)    20                     16                                0.7548
4   8   (4)    20                     19                                2.0562
4   8   (5)    20                     15                                1.5585
4   8   (6)    20                     14                                1.6628
4   8   (7)    20                     16                                0.3969
4   8   (8)    20                     15                                0.6723
4   8   (9)    20                     17                                1.5874
4   8   (10)   20                     16                                0.4584
4   16  (1)    20                     –                                 2.3098
4   16  (2)    20                     –                                 2.7643
4   16  (3)    20                     –                                 3.7284
4   16  (4)    20                     –                                 1.3981
4   16  (5)    20                     –                                 1.2851
4   16  (6)    20                     –                                 2.9680
4   16  (7)    20                     –                                 1.3466
4   16  (8)    20                     –                                 2.1015
4   16  (9)    20                     –                                 0.9531
4   16  (10)   20                     –                                 5.9342
4   32  (1)    20                     –                                 4.2332
4   32  (2)    20                     –                                 3.1445
4   32  (3)    20                     –                                 4.9212
4   32  (4)    20                     –                                 5.1235
4   32  (5)    20                     –                                 3.6790
4   32  (6)    20                     –                                 2.8826
4   32  (7)    20                     –                                 4.5788
4   32  (8)    20                     –                                 3.0450
4   32  (9)    20                     –                                 3.6430
4   32  (10)   20                     –                                 5.7108
and z_opt is the global optimum system cost. For the problem sets with N = 16 and 32, exhaustively searching for the optimum solution would require an extremely long time; the third column for these two problem sets is therefore left blank, and the maximum deviation in the fourth column of Table 2 is instead defined as d_max = (z_max − z_min)/z_min × 100, where z_max and z_min are the maximum and minimum system costs over 20 simulation runs. For all simulation runs, the maximal and minimal redundancy levels for each processing node and each communication channel are 16 and 1, respectively; the fixed cost in (6) is c = 0;
and a processing node cannot process more than one module, i.e., X = 1 in (11). The GA parameters for both the SGA and the HGA in these four problem sets are listed below:

            M = 4, N = 4    M = 4, N = 8    M = 4, N = 16   M = 4, N = 32
            SGA     HGA     SGA     HGA     SGA     HGA     SGA     HGA
pop_size     80      20     150      30     200      50     250      60
max_gen     800     250    1200     300    1500     500    1800     600
nc            3       2       6       2       6       2       6       3
pc          0.9     0.9     0.9     0.9     0.9     0.9     0.9     0.9
pm          0.2     0.2     0.2     0.2     0.2     0.2     0.2     0.2
where pop_size is the population size, max_gen the maximal number of generations, nc the number of crossover points, pc the crossover probability, and pm the mutation probability.

Consider first the problem set with M = 4 and N = 4. The simulation results, shown in the first ten rows of Table 1, reveal that the HGA outperforms the SGA in both solution quality and computational time; that is, with a smaller population and fewer generations, the HGA produces better solutions more efficiently than the SGA. The number of optimum solutions found is reported in Table 2. For these ten problems, only one simulation run, for S = 7, fails to yield the optimum solution. A close look at that particular run reveals that chromosomes with system cost 61.2309 outperformed the other chromosomes and quickly dominated the population, resulting in premature convergence of the evolution process. Even so, the solution quality is satisfactory, with a cost deviation of 0.3634%, as shown in Table 2.

For the second problem set, with M = 4 and N = 8, the HGA again outperforms the SGA, with an average improvement of 9.19% in system cost. The execution times of the HGA vary significantly across problems, likely because the convergence speed of the local search is not constant across the task allocations arising in different problems. Further, the number of optimum solutions obtained for N = 8 decreases compared with that for N = 4; the maximal deviations in the fourth column of Table 2 nonetheless indicate that the solution quality remains satisfactory. Similarly, for N = 16 and 32, the HGA achieves 9.1% and 8.6% improvement in system cost over the SGA, and the HGA in general takes less computing time than the SGA.
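The comparison measures Δz and d_max used in Tables 1 and 2 follow directly from their definitions in the text; a minimal sketch (function names are illustrative):

```python
def delta_z(z_sga, z_hga):
    """Percentage cost improvement of the HGA over the SGA,
    computed as (z_SGA - z_HGA) / z_HGA * 100 per the text."""
    return (z_sga - z_hga) / z_hga * 100.0

def d_max(costs, z_opt=None):
    """Maximum percentage deviation over the simulation runs: measured
    against the global optimum z_opt when it is known (N = 4, 8), and
    against the best run otherwise (N = 16, 32)."""
    ref = z_opt if z_opt is not None else min(costs)
    return (max(costs) - ref) / ref * 100.0
```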
Of note, in earlier simulations the HGA required more computing time than the SGA for the problems M = 4, N = 16, S = 8 and M = 4, N = 8, S = 4, because the threshold value s in the local search procedure was set too small, so that many iterations were needed before the local search terminated. The experiments thus indicate that the threshold value must be set properly for the HGA to be efficient. To study the influence of hardware redundancy on system cost and system reliability, we use the problem with N = 8, M = 4, S = 4 for illustration. (The corresponding DCS is shown in Fig. 3.) For this problem, the
Fig. 3. DCS for the problem of N = 8, M = 4, S = 4.
optimal task allocation strategy without hardware redundancy is identical to that with hardware redundancy: m1, m2, m3, and m4 are allocated to P6, P4, P2, and P1, respectively; and the optimal system cost and system reliability without hardware redundancy are 97.8097 and 0.257689, respectively. Although the failure rates of processors and communication links are small in the simulation, long communication times on the communication channels result in an unreliable DCS. Deploying various hardware redundancy levels with the same task allocation strategy gives the following system costs and system reliabilities:

P1   P2   P4   P6   L1   L2   L3   System cost   System reliability
1    1    1    1    1    1    1    97.8097       0.257689
1    2    1    1    1    1    1    93.5941       0.278451
2    2    1    1    1    1    1    94.5256       0.288297
2    2    2    2    1    1    1    93.8531       0.306662
1    1    1    1    2    2    2    65.4866       0.591570
1    1    1    1    2    2    3    65.1920       0.618354
1    2    1    1    2    2    3    64.2602       0.668174
1    1    1    1    3    3    3    65.7743       0.743660
1    2    1    1    3    3    3    65.2017       0.803576
2    2    2    2    3    3    3    68.9777       0.884988
where the hardware redundancy levels yielding system cost 64.2602 are the optimal hardware redundancy levels. Since the system unreliability stems from long communication times on the communication channels, increasing the hardware redundancy levels of the processing nodes does not significantly improve system cost or system reliability; increasing the hardware redundancy levels of the communication channels, however, improves both significantly. It can also be observed that higher hardware redundancy levels lead to a more reliable DCS but do not necessarily render a more cost-effective one.
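The pattern in the table reflects the series-parallel reliability structure of the model: the DCS works only if every processing node and communication channel has at least one working unit. A small numerical sketch with illustrative failure probabilities (not the paper's values) shows why adding links to unreliable channels pays off more than adding processors to already reliable nodes:

```python
def system_reliability(fail_probs, redundancy):
    """Reliability of a series system of parallel-redundant components:
    R = prod_k (1 - F_k ** r_k), where F_k is the failure probability of
    one unit of component k and r_k its redundancy level; this matches
    the product form used in Appendix A."""
    r = 1.0
    for f_k, r_k in zip(fail_probs, redundancy):
        r *= 1.0 - f_k ** r_k
    return r

# Two reliable 'processors' (F = 0.05) and two unreliable 'channels' (F = 0.3):
base = [0.05, 0.05, 0.3, 0.3]
low  = system_reliability(base, [1, 1, 1, 1])   # no redundancy
proc = system_reliability(base, [2, 2, 1, 1])   # redundancy on processors only
chan = system_reliability(base, [1, 1, 2, 2])   # redundancy on channels only
```

With these numbers, duplicating the channels lifts system reliability far more than duplicating the processors, mirroring the behavior observed in the table.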
7. Summary

A unified model of system cost in an acyclic distributed computing system has been proposed that accounts for the sources of cost due to hardware redundancy and system reliability. An efficient hybrid genetic algorithm that combines a genetic algorithm with a local search procedure has been developed to find the optimal hardware redundancy levels and the optimal task allocation that minimize system cost. Exploiting the convexity of system cost with respect to the hardware redundancy levels, the local search procedure finds the locally optimal hardware redundancy levels for a given task allocation, while the genetic algorithm searches over the subspace of local optima across task allocations. The simulation results show that the HGA renders better solutions in less computational time than the simple genetic algorithm.

Different problem formulations for achieving economic task allocation are possible. One alternative, for instance, is to maximize the DCS reliability subject to cost constraints. In any such formulation, including ours, the cost coefficients must be estimated accurately, which calls for experienced personnel or experts. An interesting problem related to our work is the static load balancing problem, which determines the optimal processing rates of the processing nodes, subject to random task arrivals, such that the mean job response time is minimized [10]. In the static load balancing problem, task allocations are given and tasks arrive at the DCS following some stochastic process, whereas in our model task allocations
are decision variables. Further, the processing speeds of the processing nodes are decision variables in the static load balancing problem, while in our model the processing speeds of the processing nodes and the communication among tasks are given. Though the static load balancing problem differs from ours in formulation, it points to possible extensions of the current study. First, the current work handles the load of a processing node by imposing a capability, i.e., an upper limit on the number of modules a processing node can execute; in the future, the load could instead be handled through processing time, in order to improve the total execution time of the task. Second, the current study considers tasks executed on the DCS one at a time; if tasks arrive at the DCS following some stochastic process, it would be worthwhile to explore how to dynamically allocate the modules of each arriving task to the processing nodes so as to maximize average performance in the long run.
Acknowledgement

This research was supported by the National Science Council, Taiwan, ROC, under grant #NSC-91-2213-E006-072.
Appendix A. Convexity of h(X, r) with respect to r

Lemma 1. Given a task allocation X, h(X, r) is convex with respect to r.

Proof. Suppose, on the contrary, that some region of h(X, r) is concave with respect to r. Then there exist two hardware redundancy vectors, u and v, in that region such that h(X, (u + v)/2) > (h(X, u) + h(X, v))/2. Let

$$F_k = \begin{cases} 1 - R_{l_k}(X), & k = 1, \ldots, L \\ 1 - R_{p_k}(X), & k = L+1, \ldots, L+N \end{cases}$$

Then, for every index $k \in [1, L+N]$,

$$\left(F_k^{u_k/2} - F_k^{v_k/2}\right)^2 \ge 0 \iff F_k^{u_k} + F_k^{v_k} \ge 2\,F_k^{(u_k+v_k)/2}$$
$$\iff 1 - \left(F_k^{u_k} + F_k^{v_k}\right) + F_k^{u_k+v_k} \le 1 - 2\,F_k^{(u_k+v_k)/2} + F_k^{u_k+v_k}$$
$$\iff \left(1 - F_k^{u_k}\right)\left(1 - F_k^{v_k}\right) \le \left(1 - F_k^{(u_k+v_k)/2}\right)^2$$

where $u_k \in u$ and $v_k \in v$. Taking the product over all $k$'s,

$$\prod_{k=1}^{L+N} \left(1 - F_k^{u_k}\right)\left(1 - F_k^{v_k}\right) \le \prod_{k=1}^{L+N} \left(1 - F_k^{(u_k+v_k)/2}\right)^2 \iff \frac{2}{\prod_{k=1}^{L+N} \left(1 - F_k^{(u_k+v_k)/2}\right)} \le \frac{2}{\left[\prod_{k=1}^{L+N}\left(1 - F_k^{u_k}\right) \prod_{k=1}^{L+N}\left(1 - F_k^{v_k}\right)\right]^{1/2}}$$

Since

$$\frac{1}{\prod_{k=1}^{L+N}\left(1 - F_k^{u_k}\right)} + \frac{1}{\prod_{k=1}^{L+N}\left(1 - F_k^{v_k}\right)} \ge \frac{2}{\left[\prod_{k=1}^{L+N}\left(1 - F_k^{u_k}\right) \prod_{k=1}^{L+N}\left(1 - F_k^{v_k}\right)\right]^{1/2}}$$

we have

$$\frac{1}{\prod_{k=1}^{L+N}\left(1 - F_k^{u_k}\right)} + \frac{1}{\prod_{k=1}^{L+N}\left(1 - F_k^{v_k}\right)} \ge \frac{2}{\prod_{k=1}^{L+N}\left(1 - F_k^{(u_k+v_k)/2}\right)}$$

Therefore,

$$h(X, u) + h(X, v) = g(X)\left[\frac{1}{\prod_{k=1}^{L+N}\left(1 - F_k^{u_k}\right)} + \frac{1}{\prod_{k=1}^{L+N}\left(1 - F_k^{v_k}\right)}\right] \ge g(X)\,\frac{2}{\prod_{k=1}^{L+N}\left(1 - F_k^{(u_k+v_k)/2}\right)} = 2\,h(X, (u + v)/2)$$

contradicting h(X, (u + v)/2) > (h(X, u) + h(X, v))/2. It is therefore concluded that h(X, r) is convex with respect to r.
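Lemma 1 can also be checked numerically. The sketch below evaluates the midpoint-convexity inequality for h(X, r) = g(X) / ∏_k (1 − F_k^{r_k}) on random real-valued redundancy vectors, with g = 1 and arbitrary F_k values standing in for the allocation-dependent terms:

```python
import random

def h(g, fail_probs, r):
    """System cost term h(X, r) = g(X) / prod_k (1 - F_k ** r_k), where
    g and the F_k depend only on the fixed task allocation X."""
    denom = 1.0
    for f_k, r_k in zip(fail_probs, r):
        denom *= 1.0 - f_k ** r_k
    return g / denom

# Midpoint-convexity check on random pairs of redundancy vectors
random.seed(2)
F = [random.uniform(0.05, 0.5) for _ in range(5)]
for _ in range(100):
    u = [random.uniform(1, 16) for _ in range(5)]
    v = [random.uniform(1, 16) for _ in range(5)]
    mid = [(a + b) / 2 for a, b in zip(u, v)]
    # h(mid) <= (h(u) + h(v)) / 2 must hold by Lemma 1
    assert h(1.0, F, mid) <= (h(1.0, F, u) + h(1.0, F, v)) / 2 + 1e-12
```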
References

[1] M.S. Chang, D.J. Chen, M.S. Lin, K.L. Ku, The distributed program reliability analysis on star topologies, Computers and Operations Research 27 (2000) 129–142.
[2] P.Y. Chang, D.J. Chen, Optimal routing for distributed computing systems with data replication, in: Proceedings of the IEEE International Computer Performance and Dependability Symposium, 1996, pp. 42–51.
[3] C.M. Chen, J.D. Ortiz, Reliability issues with multiprocessor distributed database systems: A case study, IEEE Transactions on Reliability 38 (1989) 153–158.
[4] G.M. Chiu, C.S. Raghavendra, A model for optimal resource allocation in distributed computing systems, in: Proceedings of the Seventh Annual Joint Conference of the IEEE Computer and Communications Societies on Networks: Evolution or Revolution, IEEE, 1988, pp. 1032–1039.
[5] W.W. Chu, M.T. Lan, J. Hellerstein, Estimation of intermodule communication (IMC) and its applications in distributed processing systems, IEEE Transactions on Computers C-33 (1984) 691–699.
[6] L. Davis, Handbook of Genetic Algorithms, Van Nostrand Reinhold, New York, 1991.
[7] E.A. Elsayed, Reliability Engineering, Addison Wesley, New York, 1996.
[8] M. Gen, R. Cheng, Genetic Algorithms and Engineering Design, John Wiley & Sons, New York, 1997.
[9] J.H. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, MI, 1975.
[10] H. Kameda, J. Li, C. Kim, Y. Zhang, Optimal Load Balancing in Distributed Computer Systems, Springer, New York, 1997.
[11] S. Kartik, C.S.R. Murthy, Models and algorithms for reliability-oriented task-allocation in redundant distributed-computer systems, IEEE Transactions on Reliability 44 (4) (1995) 575–586.
[12] S. Kartik, C.S.R. Murthy, Task allocation algorithms for maximizing reliability of distributed computing systems, IEEE Transactions on Computers 46 (6) (1997) 719–724.
[13] A. Kumar, R.M. Pathak, Y.P. Gupta, Genetic algorithm based approach for file allocation on distributed systems, Computers and Operations Research 22 (1995) 41–54.
[14] C.T. Lin, C.S.G. Lee, Neural Fuzzy Systems: A Neuro-Fuzzy Synergism to Intelligent Systems, Prentice Hall, Singapore, 1999.
[15] M.S. Lin, D.J. Chang, The reliability problem in distributed database systems, in: Proceedings of the 1997 International Conference on Information, Communications and Signal Processing, vol. 2, 1997, pp. 795–799.
[16] M.S. Lin, M.S. Chang, D.J. Chen, Distributed-program reliability analysis: Complexity and efficient algorithms, IEEE Transactions on Reliability 48 (1999) 87–95.
[17] C.S. Raghavendra, S. Hariri, Reliability optimization in the design of distributed systems, IEEE Transactions on Software Engineering SE-11 (1985) 1184–1193.
[18] C.R. Reeves, Modern Heuristic Techniques for Combinatorial Problems, John Wiley & Sons, New York, 1993.
[19] P.Y. Richard, E.Y.S. Lee, M. Tsuchiya, A task allocation model for distributed computing systems, IEEE Transactions on Computers C-31 (1982) 41–47.
[20] G.A. Schloss, M. Stonebraker, Highly redundant management of distributed data, in: Proceedings of the Workshop on the Management of Replicated Data, 1990, pp. 91–95.
[21] S.M. Shatz, J.P. Wang, M. Goto, Task allocation for maximizing reliability of distributed computer systems, IEEE Transactions on Computers 41 (9) (1992) 1156–1168.
[22] S.M. Shatz, J.P. Wang, Models and algorithms for reliability-oriented task-allocation in redundant distributed-computer systems, IEEE Transactions on Reliability 38 (1989) 16–27.
[23] C.C. Shen, W.H. Tsai, A graph matching approach to optimal task assignment in distributed computing systems under a minimax criterion, IEEE Transactions on Computers C-34 (1985) 197–203.
[24] A.B. Stephens, Y. Yesha, K.E. Humenik, Optimal allocation for partially replicated database systems on ring networks, IEEE Transactions on Knowledge and Data Engineering 6 (1994) 975–982.
[25] A.S. Tanenbaum, Distributed Operating Systems, Prentice Hall, Englewood Cliffs, NJ, 1995.
[26] A.K. Verma, M.T. Tamhankar, Reliability-based optimal task-allocation in distributed-database management systems, IEEE Transactions on Reliability 46 (1997) 452–459.
[27] R.K. Wood, Factoring algorithms for computing k-terminal network reliability, IEEE Transactions on Reliability 35 (1986) 269–278.