JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING 10, 85-89 (1990)
Sensitivity Study of the Load Balancing Algorithm in a Distributed System

ANNA HAC
AT&T Bell Laboratories, Naperville, Illinois 60566

AND

THEODORE J. JOHNSON
Courant Institute of Mathematical Sciences, New York, New York 10012
This paper examines the sensitivity of load balancing performance in a distributed system. The focus of this study is load balancing and its relation to optimal process and read site placement. The system model is based on the LOCUS distributed file system, which allows replicated files. Process migration is included in the simulation system model. The synchronization policy is enforced by the CSS (Centralized Synchronization Site). All requests to open a file for access must be sent to the file's CSS, which checks for access conflicts. An algorithm that increases system performance through load balancing is provided. This algorithm bases its decisions on data collected by the system. The sensitivity of the algorithm's characteristics and performance is analyzed and discussed. © 1990 Academic Press, Inc.
1. INTRODUCTION

Factors to consider when selecting a machine for process execution include resource availability and optimum use of resources. The sender-initiated and receiver-initiated strategies for adaptive load sharing with distributed control are compared in [1, 4]. An improvement in system performance through file replication, file migration, and process migration is presented in [2].

This paper introduces a sensitivity study of a dynamic load balancing policy in a distributed system. The distributed system consists of a number of hosts connected by a local area network. The file system in the study is modeled on the LOCUS distributed system [3, 5]. This file system allows replicated files and provides a synchronization policy to update remote copies. The simulation model allows process migration to different sites depending on the loads on the hosts.

The implemented algorithm selects the site for process execution and decides on the read site placement. It bases its decisions on information collected in the system: a token is periodically entered into the system to collect information about resource usage, and this information is then used in the load balancing strategy. The algorithm bases its decisions on various workload and system parameters. It also allows for out-of-date information, since collecting the information is time consuming and can overload the system. The algorithm for dynamic load balancing attempts to maximize performance in a distributed system by selecting the site for process execution and deciding on the read site placement. The sensitivity study of the token period and of the weights used by the algorithm provides an incentive for using this method in distributed systems.

2. MODEL OF A DISTRIBUTED SYSTEM
The system is modeled by an open queuing network, as shown in Fig. 1. An open queuing network consists of a number of interconnected queued servers. Each CPU has a round-robin service discipline, and the disks and the network have an FCFS (First Come First Served) discipline. The service time distribution for the CPU and the disks is uniform. The service time distribution for the network is deterministic but depends on the size of the message transmitted. A job enters the system independently of the number of jobs already in the system. Jobs are submitted at a rate of three per second at each host. After visiting a sequence of servers, the job terminates. Because the submitted jobs have different demands on the host and the network resources, load balancing is needed to achieve the most efficient use of the system resources. This model is implemented as an event-driven simulator.

The main area of concentration in this study is the distributed file system. The synchronization policy of the distributed file system is the multiple reader, single writer policy. This policy is enforced by the use of the CSS (as in the LOCUS operating system) and is defined in [5]. It requires that every file have a unique CSS associated with it. To open a file for access, a request is sent to the file's CSS. If the request does not conflict with any current accesses, it is granted; otherwise, it is refused. Once the request has been granted, the requesting process may access the file. After a process has finished the file access, the CSS must be notified so that it can update its file tables. The file system supports replicated files (that is, files that may have copies on a number of hosts). The update servers use the high priority queues, while all other jobs use the low priority queues.

FIG. 1. The models (a) of a host and (b) of a distributed system. [Figure not reproduced. Recoverable labels: job departure, network request, high priority and low priority CPU queues, terminals 1 to n.]

0743-7315/90 $3.00 Copyright © 1990 by Academic Press, Inc. All rights of reproduction in any form reserved.

3. A WORKLOAD MODEL
The requests for job execution are scheduled for each host independently of the scheduling of requests at other hosts. After a job execution request is sent to a host, the next execution request is scheduled to occur after a period determined by a sample from a uniform random variable. The uniform distribution for interarrival times was chosen arbitrarily. Eight different job types are specified, each with different service requirements. These job types cover a wide range of possible jobs in a real system. The ratio of read to write disk accesses is 1:1, based on experience with measurement data collected in an industry local area network environment. The graphs in Fig. 2 illustrate the paths taken by each job type. The system workload is specified by the probability of each job type.

4. ALGORITHM FOR LOAD BALANCING
The algorithm that chooses the execution site and the read site uses vectors of workloads and host characteristics for
each host. A vector is constructed for each of the possible selection sites. The vectors are computed so that the longest vector indicates the worst selection choice. Therefore, the host with the shortest vector is chosen. When an execution site is selected, the selection sites include all hosts. When a read site is selected, the selection sites include all hosts with a copy of the file to be read. The algorithm does not check whether the read site contains the most current version of the file.

The workload characteristics considered for process placement are CPU queue length, CPU utilization, and number of jobs active at a host. Disk characteristics are not considered in this study because they only affect file access. The work minimization characteristics for process placement are: (1) whether the host being considered for job placement is the host requesting the job; (2) whether the job accesses a file and the file is stored at the host being considered; and (3) whether the job is interactive (that is, accesses a terminal) and the terminal is at the host being considered.

The workload characteristics for read site placement are disk queue length, disk utilization, and number of jobs accessing a file on the disk. Because accessing a file takes disk time, CPU characteristics are not considered, for simplicity. The work minimization characteristic for read site placement checks whether the file is stored locally.

Let h denote the host being considered. Let w(h) be the length of the vector of load to be assigned to h. The host selection algorithm is as follows:

1) for every host h being considered as a placement choice
2)     for every workload characteristic being considered
3)         w(h) = w(h) + ((weight for workload characteristic) × (workload characteristic))^2
4)     for every work minimization characteristic being considered
5)         if the host being considered does not meet the work minimization condition
6)             w(h) = w(h) + (weight for work minimization condition)^2
7) choose the host k such that k ∈ {k : w(k) = min_h(w(h))}

Steps 2 and 3 consider workload characteristics. The length of the vector is increased by the square of the weighted value of the workload measurement. This makes hosts with lower workload dimensions more likely to be chosen. Steps 4 through 6 consider work minimization characteristics. A host that does not meet a work minimization condition (for example, the condition that the file is local) has the length of its vector increased. These hosts are less likely to be chosen.

FIG. 2. The graphs of the workload model. (a) Job type 1, 5; (b) job type 2, 3, 6, 7; and (c) job type 4, 8. [Figure not reproduced. Recoverable labels: user, CPU, remote file access, network, job termination.]

Sensitivity analysis of the algorithm considers weights for the workload characteristics and for the work minimization characteristics. The following parameters will be used in the sensitivity analysis: $H$, the number of hosts in the distributed system; $N$, the number of workload characteristics; $L$, the number of work minimization characteristics; $C_i$, the ith workload characteristic, $i \in \{1, \ldots, N\}$; $D_i$, the weight for the ith workload characteristic, $i \in \{1, \ldots, N\}$; $M_j$, the weight for the jth work minimization characteristic, $j \in \{1, \ldots, L\}$; and $T$, the token interval. By using the algorithm, the length of the vector of load to be assigned to host h is calculated as

\[ w(h) = \sum_{i=1}^{N} (D_i \times C_i)^2 + \sum_{j=1}^{L} M_j^2. \tag{1} \]

For every workload characteristic i, the difference between this characteristic on the remote and the local hosts is greater than the weight for this characteristic:

\[ (C_i^{\mathrm{remote}} - C_i^{\mathrm{local}}) > D_i, \quad i \in \{1, \ldots, N\}. \tag{2} \]

For every work minimization characteristic j, the weight for this characteristic is greater than the value inversely proportional to the maximum workload characteristic on this host:

\[ M_j > \frac{1}{\max_i (C_i^{\mathrm{local}})}, \quad i \in \{1, \ldots, N\}, \; j \in \{1, \ldots, L\}. \tag{3} \]

The token interval depends on the load on various hosts. If the load is balanced, the token can collect the information less often. If the load changes, the token has to collect current information about the system load. The token interval $T$ is inversely proportional to the average sum of differences between workload characteristics on the remote and the local hosts for all remote hosts. This requirement on $T$ is referred to below as condition (4).

System performance will be improved by using the load balancing algorithm if the weights for workload characteristics and the work minimization characteristics satisfy conditions (2) and (3) and the token interval satisfies condition (4). By substituting (2) in (1) we have

\[ w(h) < \sum_{i=1}^{N} \left[ (C_i^{\mathrm{remote}} - C_i^{\mathrm{local}}) \times C_i \right]^2 + \sum_{j=1}^{L} M_j^2. \tag{5} \]

By substituting (3) in (1) we have

\[ \sum_{i=1}^{N} (D_i \times C_i)^2 + \frac{1}{\left[ \max_i (C_i^{\mathrm{local}}) \right]^2} < w(h). \tag{6} \]
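To make the selection procedure concrete, the following Python sketch implements steps 1 through 7 of the host selection algorithm, with the score of each host accumulated as in (1). It is an illustration under stated assumptions, not the simulator used in this study: the names `Candidate`, `vector_length`, and `choose_host`, the three example hosts, and all weight and workload values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """A candidate site (hypothetical structure, not from the paper)."""
    name: str
    workload: List[float]       # C_i: CPU queue length, CPU utilization, active jobs
    conditions_met: List[bool]  # one flag per work minimization condition


def vector_length(c: Candidate, d: List[float], m: List[float]) -> float:
    """Accumulate w(h) for one host, following steps 2 through 6."""
    w = 0.0
    for weight, value in zip(d, c.workload):       # steps 2-3
        w += (weight * value) ** 2
    for weight, met in zip(m, c.conditions_met):   # steps 4-6
        if not met:                                # penalize only unmet conditions
            w += weight ** 2
    return w


def choose_host(candidates: List[Candidate], d: List[float], m: List[float]) -> str:
    """Step 7: return the name of the host with the smallest w(h)."""
    return min(candidates, key=lambda c: vector_length(c, d, m)).name


# Hypothetical weights D_i and M_j, chosen only for illustration.
D = [1.0, 2.0, 0.5]
M = [3.0, 1.0, 1.0]

# Conditions: host is the requester, file is stored here, terminal is here.
HOSTS = [
    Candidate("A", [4.0, 0.9, 5.0], [True, True, True]),     # loaded requesting host
    Candidate("B", [1.0, 0.3, 2.0], [False, False, False]),  # lightly loaded remote host
    Candidate("C", [3.0, 0.8, 4.0], [False, True, False]),   # remote host holding the file
]

print(choose_host(HOSTS, D, M))
```

With these hypothetical values, host B is selected even though it meets none of the work minimization conditions, because its workload terms are much smaller than those of the other hosts. Raising the weights in `M` shifts the choice back toward the requesting host A, which is the trade-off that the sensitivity analysis examines.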
By adding left and right sides of (5) and (6), respectively, we have

\[ w(h) + \sum_{i=1}^{N} (D_i \times C_i)^2 + \frac{1}{\left[ \max_i (C_i^{\mathrm{local}}) \right]^2} < \sum_{i=1}^{N} \left[ (C_i^{\mathrm{remote}} - C_i^{\mathrm{local}}) \times C_i \right]^2 + \sum_{j=1}^{L} M_j^2 + w(h). \tag{7} \]

Hence,

\[ \sum_{i=1}^{N} (D_i \times C_i)^2 + \frac{1}{\left[ \max_i (C_i^{\mathrm{local}}) \right]^2} < \sum_{i=1}^{N} \left[ (C_i^{\mathrm{remote}} - C_i^{\mathrm{local}}) \times C_i \right]^2 + \sum_{j=1}^{L} M_j^2. \tag{8} \]

Finally, we have

\[ \sum_{i=1}^{N} (D_i \times C_i)^2 < \sum_{i=1}^{N} \left[ (C_i^{\mathrm{remote}} - C_i^{\mathrm{local}}) \times C_i \right]^2 + \sum_{j=1}^{L} M_j^2 - \frac{1}{\left[ \max_i (C_i^{\mathrm{local}}) \right]^2}. \tag{9} \]

From (9), we can see that the weights for the workload characteristics allow for placement of processes at the hosts on which workload characteristics are small in comparison with the workload characteristics on the local host. Because the elapsed time of a process is smaller for smaller workload characteristics (i.e., a lighter load on the host), the system performance will be improved. Since the token interval is inversely proportional to the average sum of differences between workload characteristics on the remote and the local hosts for all remote hosts, the information about workload characteristics is updated in a timely manner.

System load will be balanced by using the load balancing algorithm if the weights for workload characteristics and the work minimization characteristics satisfy conditions (2) and (3) and the token interval satisfies condition (4). The weights for the workload characteristics allow for placement of processes at the hosts on which workload characteristics are small in comparison with the workload characteristics on the local host. Also, the information about workload characteristics is updated in a timely manner. By using condition (9), we can see that processes are transferred to the lightly loaded host subject to the condition on the work minimization characteristics, which prevents flooding of these hosts by the processes. This causes the system load to be balanced.

5. SOME RESULTS

To produce the best tuning strategy for the placement algorithm, experiments were executed using the load balancing for both process placement and read site placement. The two most favorable results for each workload type (C, CPU-intensive; D, I/O-intensive; E, mixed jobs) are presented in Table I.

TABLE I
Turnaround Time and Its Improvement Using Load Balancing for Different Workload Types

                 No load balancing         With load balancing       Improvement
                 Turnaround time [ms]      Turnaround time [ms]      [%]
Workload type    File access   CPU job     File access   CPU job     File access   CPU job
1C               1578          8508        1236          7258        21.7          14.7
2C               1578          8508        1410          7238        10.6          14.9
3D               2936          7669        2476          6163        15.7          19.6
4D               2936          7669        2544          7466        13.4           2.6
5E               1796          9408        1589          8008        11.5          14.8
6E               1796          9408        1671          7736         6.9          17.7
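The improvement figures reported in Table I are consistent with the relative reduction in turnaround time, that is, 100 × (time without load balancing − time with load balancing) / (time without load balancing). The formula is not stated in the text, so it is treated here as an assumption. The short Python sketch below recomputes the file access improvements from the turnaround times in Table I; the function name `improvement_percent` and the variable names are hypothetical.

```python
from typing import List

# File access turnaround times [ms] from Table I, workload types 1C to 6E.
TYPES: List[str] = ["1C", "2C", "3D", "4D", "5E", "6E"]
NO_LB: List[float] = [1578, 1578, 2936, 2936, 1796, 1796]
WITH_LB: List[float] = [1236, 1410, 2476, 2544, 1589, 1671]
REPORTED: List[float] = [21.7, 10.6, 15.7, 13.4, 11.5, 6.9]


def improvement_percent(before: float, after: float) -> float:
    """Relative reduction in turnaround time, in percent (assumed formula)."""
    return 100.0 * (before - after) / before


computed = [improvement_percent(b, a) for b, a in zip(NO_LB, WITH_LB)]

for kind, c, r in zip(TYPES, computed, REPORTED):
    print(f"{kind}: computed {c:.2f}%, reported {r:.1f}%")
```

Under this assumed formula, the recomputed values agree with the reported file access improvements to within 0.1 percentage point, with the small differences attributable to rounding.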
6. CONCLUSION

The algorithm presented in this paper allows dynamic load balancing in a distributed system. This algorithm bases its decisions on the work minimization characteristics and the workload characteristics in the system. The algorithm uses vectors of loads on the hosts to choose the best host for process placement or read site placement. The sensitivity study of the algorithm makes it possible to choose the characteristics used to balance the load.

REFERENCES
1. Eager, D. L., Lazowska, E. D., and Zahorjan, J. A comparison of receiver-initiated and sender-initiated adaptive load sharing. Performance Evaluation 6, 1 (March 1986), 53-68.
2. Hac, A. A distributed algorithm for performance improvement through file replication, file migration and process migration. IEEE Trans. Software Engrg. 15, 11 (Nov. 1989), 1459-1470.
3. Walker, B., Popek, G., English, R., Kline, C., and Thiel, G. The LOCUS distributed operating system. Proc. Ninth Symposium on Operating Systems Principles. ACM, New York, 1983, pp. 49-70.
4. Wang, Y.-T., and Morris, R. J. T. Load sharing in distributed systems. IEEE Trans. Comput. C-34, 3 (Mar. 1985), 204-217.
5. The LOCUS Distributed System Architecture. LOCUS Computing Corporation, Santa Monica, CA, June 1984, Edition 3.1.
Received April 29, 1988; revised April 22, 1989; accepted October 24, 1989

ANNA HAC is a member of the technical staff in the Switching System Evolution Department at AT&T Bell Laboratories in Naperville, Illinois. She received the M.S. and Ph.D. degrees in computer science from the Technical University of Warsaw, Poland, in 1977 and 1982, respectively. She was a visiting scientist at the University of London, England, a postdoctoral fellow at the University of California at Berkeley, and an assistant professor of computer science at the Johns Hopkins University. Her research is in system and workload modeling, performance analysis, reliability, modeling process synchronization mechanisms for distributed systems, distributed file systems, and distributed algorithms. Currently, she works on reliable software architecture for switching systems. She is a member of the ACM and a senior member of the IEEE.

THEODORE J. JOHNSON is an assistant professor of computer science at the University of Florida in Gainesville. He received the B.A. degree in mathematics from the Johns Hopkins University in 1986 and the Ph.D. degree in computer science from the Courant Institute of Mathematical Sciences of New York University in 1990. His research interests are concurrency control; parallel, distributed, and concurrent algorithms; and performance analysis. He is a member of the ACM and Eta Kappa Nu.