Operations Research Letters 38 (2010) 502–504
Grid scheduling with NBU service times

Yusik Kim a, Rhonda Righter b,*, Ronald Wolff b

a INRIA Bordeaux - Sud-Ouest, 351 cours de la Libération, 33405 Talence Cedex, France
b Department of IEOR, University of California, 4141 Etcheverry Hall, Berkeley, CA 94720, USA
* Corresponding author. E-mail addresses: [email protected] (Y. Kim), [email protected] (R. Righter), [email protected] (R. Wolff).
Article history: Received 31 January 2009; Accepted 16 August 2010; Available online 16 September 2010.
Abstract

In the highly unpredictable environment of grid computing, it often makes sense to replicate the same job on several of the available servers. We prove that never replicating is optimal for a heterogeneous multi-server system in a random environment when service times are New Better than Used.
Keywords: Multi-server queues; Stochastic scheduling; Grid computing; NBU processing times
1. Introduction

Grid computing [2] is a new distributed computing paradigm where Internet-connected servers are integrated to form a fast virtual computer. Examples of existing projects include SETI@Home and Folding@Home. Servers participating in the grid have different owners, a system load that varies over time, and unknown local scheduling policies. The service time (the time between submission and completion) of a job on a server depends on the server rather than on properties of the job, and is highly variable. In this situation, it may make sense to assign replicas of a job to several servers simultaneously and use the output of whichever finishes first. While this is clearly not optimal when processing times are deterministic, various amounts of replication can be shown to be optimal depending on the variability of the processing times, where generally more variability implies more replication [3]. By replicating, we complete individual jobs more quickly at the expense of occupying servers that could otherwise be used to work on different jobs. Whether or not it is a worthwhile trade-off depends on the service time distribution.

We model the computing grid as a heterogeneous multi-server queueing system, with an infinite input queue. The infinite queue is reasonable for large-scale computing applications such as SETI@Home, in which there are a vast number of similar computations to be done. We also assume that there is a single job class, and that all randomness and heterogeneity in processing is due to the server.
This is the case when all jobs perform the same subroutine on different data, as in Single-Program-Multiple-Data (SPMD) parallel programs, and for many grid applications. There is an environmental state that may affect server speeds and availabilities and that evolves independently of the scheduling actions. We assume that, given the environmental state, the processing times of (replicas of) the same job on different servers are independent, which is reasonable in a grid environment in which different servers are at different locations and belong to different entities. We assume that communication delays can either be ignored, which is realistic when computation times are significantly longer than communication times, and is generally the case for applications in which grid computing makes sense, or they can be incorporated into the processing times. We also assume that if a job is replicated and finishes on one server, there is no cost for deleting, or overwriting, (replicas of) that job from other servers. This assumption also is reasonable in grid environments in which the reason one server takes longer than another to process the same job is because it is either unavailable or busy with its own, higher priority, work.

Koole and Righter [4] also modeled the grid as a multi-server queueing system, but with general job arrival processes. They proved that when service times are New Worse than Used (NWU, defined later), max-replication, i.e., replicating each job so that all servers are occupied by a single job and its replicas, stochastically maximizes the number of job completions by any time. They also proved an analogous result for New Better than Used (NBU, defined later) service times: no-replication, i.e., not replicating jobs at all and processing distinct jobs in parallel, stochastically maximizes the number of jobs completed up to time t for all t ≥ 0 when there are two servers and at least two jobs. Borst et al. [1] showed that minimal replication is optimal for geometric service times.
Under minimal replication, no job is replicated when the number of jobs is at least as large as the number of servers, and otherwise the difference in the number of replicas for each job is minimized while keeping all servers busy. Clearly, in the deterministic case, never replicating is optimal even when there are fewer jobs than servers. In the exponential case, any replication policy is optimal, as long as all servers are kept busy.

In this paper, we complement Koole and Righter's [4] result for NBU service times by generalizing the number of servers to n ≥ 2 while requiring an infinite number of jobs to be available at time 0. We prove that for NBU service times, the policy that never idles or replicates jobs, among all scheduling policies that allow both idling and replication, stochastically maximizes the job completion process. Note that this policy has the additional practically attractive feature of decoupling the servers: we can assume that each server has its own infinite input queue of jobs, and that the servers all operate independently of each other. Also note that the non-idling, non-replicating policy will still be optimal even if we relax our assumption that replication can be done at no cost.

Note that when there are not an infinite number of jobs (jobs arrive over time, say) and there are more than two servers, the optimal policy is unclear. In particular, if there are fewer jobs than servers, and there is no penalty for preempting and replacing a replica (when a new job arrives), it is clear that jobs should be replicated. However, it is unclear which jobs should be replicated, and which, if any, should be preempted when a new job arrives.
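To make the trade-off described above concrete, the following minimal simulation sketch (an added illustration, not part of the original article) estimates throughput under an infinite backlog on two servers, comparing no replication (NINR) with full replication. The two-server setting, the particular service-time distributions (a uniform distribution as an NBU example and a hyperexponential distribution as a highly variable, NWU-type example), and the function name throughput are illustrative assumptions.

```python
import random

def throughput(n_jobs, draw_service, n_servers, replicate):
    """Rough throughput estimate (jobs per unit time) with an infinite backlog.

    replicate=False: NINR; each server works through its own queue of distinct jobs.
    replicate=True:  every job is copied to all servers, the earliest copy to
                     finish counts, and the remaining copies are discarded.
    """
    if replicate:
        total_time = sum(min(draw_service() for _ in range(n_servers))
                         for _ in range(n_jobs))
        return n_jobs / total_time
    per_server = n_jobs // n_servers
    finish_times = [sum(draw_service() for _ in range(per_server))
                    for _ in range(n_servers)]
    return (per_server * n_servers) / max(finish_times)

random.seed(0)
# Illustrative service-time distributions, both with mean 1:
uniform_nbu = lambda: random.uniform(0.5, 1.5)        # NBU (bounded, low variability)
hyperexp_nwu = lambda: (random.expovariate(10.0) if random.random() < 0.9
                        else random.expovariate(1.0 / 9.1))   # NWU (highly variable)

for name, dist in [("uniform (NBU)", uniform_nbu), ("hyperexp (NWU)", hyperexp_nwu)]:
    for rep in (False, True):
        label = "replicate" if rep else "NINR"
        print(name, label, round(throughput(200_000, dist, 2, rep), 2))
```

With these choices, one should see replication lower the throughput for the uniform (NBU) distribution and raise it substantially for the hyperexponential distribution, consistent with the discussion above and with [3,4].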
2. Results

Let X be a non-negative random variable. We say that X is New Better (Worse) than Used, or NBU (NWU), if X ≥_st (≤_st) [X − t | X > t] for all t ≥ 0. We assume that nominal service times on a server, given the environmental state, are NBU, and independent of the other servers. This is weaker than assuming IFR (increasing failure rate) service times, where X is IFR if [X − t | X > t] is stochastically decreasing in t for all t ≥ 0. Let N_t (N′_t) denote the number of successful job completions under policy π (π′) up to time t. We define the preference relation π′ ≽ π if {N′_t}_{t≥0} ≥_st {N_t}_{t≥0}; that is, we can couple the departure processes such that every departure is earlier under π′ than under π with probability 1.
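As an added illustration of the definition (not from the original text), deterministic service times are NBU, exponential service times are both NBU and NWU by memorylessness, and hyperexponential service times are NWU:

```latex
% Added illustration, not from the original text.
% Deterministic: if P(X = c) = 1, then for 0 <= t < c,
%   [X - t | X > t] = c - t <= c, so X >=_st [X - t | X > t]  (NBU).
% Exponential: memorylessness gives, for all t, x >= 0,
\[
  P(X - t > x \mid X > t) \;=\; e^{-\mu x} \;=\; P(X > x),
\]
% so X is both NBU and NWU. A hyperexponential X (a mixture of exponentials)
% has a decreasing failure rate and is therefore NWU.
```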
The NINR (Non-Idling, No-Replication) policy is the policy that never idles a server and never replicates a job. In other words, each server independently serves its own infinite supply of jobs.

For a single server, say server 1, with an infinite number of jobs and NBU service times, it is not hard to show that any non-idling, non-preemptive policy stochastically maximizes the departure process from that server. When server 1 is part of a multi-server system with the option of replication, it is also easy to see intuitively that the effect of replication is to introduce the possibility, from server 1's point of view, that some of its jobs may be preempted and discarded. Hence, intuitively, replication can be seen to be non-optimal, and indeed these observations basically prove that not replicating stochastically maximizes {N(i)_t}_{t≥0} for each i, where N(i)_t is the number of departures from server i by time t. However, we want to show the optimality of never replicating in the sense of maximizing the joint departure process when the servers' policies have an effect on each other, which is more difficult.

A careful coupling argument is necessary to relate the standard multi-server model without server interaction to the model where server interaction is induced by job replication. The standard approach is to consider an arbitrary policy that replicates, at time t on server 1 say, and couple its sample path with that of a policy that does not replicate at that time on that server, so that the latter sample path has more total departures from all servers across all time. However, changing the replication decision at server 1 will change the future sample path in terms of job completions on the other servers, so we need to be a bit careful.

Call our original model Model A, and consider a relaxed Model B, in which we can decide the identity of a job on any server at any time, and, in particular, when there is a job completion. Clearly this gives us more freedom than having to determine job identities upon the start of service. When a job completes, if we suppose that some other jobs in service are its replicas, then we essentially induce a preemption at the servers of those replicas. Hence, a further relaxation of our model is Model C, in which we assume that each server has its own independent set of jobs, and that we can preempt any server at any time based on the histories of all the servers; preempted jobs are assumed to be discarded. For any policy and any sample path in Model B, there is a corresponding policy and sample path in Model C, and similarly, for any policy and any sample path in Model A, there is a corresponding policy and sample path in Model B. Therefore, if the optimal policy for the relaxed Model C can be achieved by a policy that is permitted in Model A, this policy must also be optimal for Model A. Thus, since a non-idling, non-preemptive policy for Model C can be achieved by the NINR policy in Model A, we need only show the following.
Theorem 2.1. Consider Model C: a heterogeneous multi-server system in a random environment, in which idling and preemption are permitted at any time on any server and may depend on the history of the process. If there are an infinite number of jobs and service times are NBU, then for any idling and/or preemptive policy π there exists a non-idling, non-preemptive policy π′ such that π′ ≽ π.

Proof. I. Non-idling: Suppose an idling policy π idles some server j during some time interval [a, b]. Since there are an infinite number of jobs in the queue, construct a policy π′, with coupled environment and service times, that agrees with π before time a and agrees with π on the other servers during [a, b], but assigns a job that is distinct from all other assigned jobs to server j at time a. If that job finishes before time b, assign another distinct job, and repeat until time b. At time b, preempt and discard the job currently running on server j and let π′ agree with π from then on. Then π′ has at least as many departures as π up to any time t. The result follows by repeating the argument for all servers and all idle intervals.

II. Non-preemptive: Now suppose that under π some job is preempted and discarded, and suppose the first preemption happens at time t_1, on server j, say. We argue that we can improve π by not preempting server j at time t_1. By repeating the argument, and using our non-idling result above, we can conclude that a policy that never preempts nor idles is optimal. Consider a policy π′ that agrees with π before t_1 but does not preempt on server j at t_1. Because of our NBU assumption, we can construct coupled nominal potential remaining service times Y, Y′ on the same probability space such that P(Y ≥ Y′) = 1, where Y is the nominal service time for a new job started at time t_1 under π, and Y′ is the remaining nominal service time of the job in process at time t_1 under π′. Let us also couple the future environmental state under both policies, so that the speeds and availabilities of the servers are identical. Then, assuming the job on server j at time t_1+ is not preempted, its completion times under policies π and π′ will be C(Y) ≥ C(Y′) respectively, where C is an increasing function that depends on the server speed over the course of processing. Let π′ agree with π on the other servers from time t_1 until either server j is preempted again under π or the job completes on server j under π′ (at time t_1 + C(Y′) ≤ t_1 + C(Y)). If the preemption under π occurs first, let π′ also preempt, and let it agree with π from then on for all servers, so π′ ≽ π in the non-strict sense, since all job completion times under π′ and π are identical. Otherwise, if the job completes first under π′, let π′ idle server j and agree with π for the other servers from time t_1 + C(Y′) until there is either a job completion
or a preemption on server j under π, whichever comes first. Let us call this time t_2. If t_2 is induced by a job completion, the numbers of jobs completed under π and π′ are equal at t_2. If t_2 is induced by a preemption, there is exactly one more job completed under π′ than under π. In either case, the number of jobs completed under π′ is at least as large as that under π at time t_2. Finally, let π′ agree with π for all servers after time t_2 and couple the sample paths under the two policies thereafter. Now we have π′ ≽ π. By our non-idling argument above, we can find a non-idling π′′ such that π′′ ≽ π′. We apply the same reasoning to π′′ until there are no more preemptions, arriving at a non-idling, non-preemptive policy π∗.

The following corollaries for our original Model A are immediate.

Corollary 2.2. The NINR policy stochastically maximizes {N_t}_{t≥0}.

Corollary 2.3. The NINR policy maximizes the long-run throughput, lim_{t→∞} N_t/t. Hence the maximal throughput is n/EX, where X is a generic service time.

Corollary 2.4. For a system with a finite number of initial jobs and an arbitrary arrival process, the NINR policy maximizes the stability region; i.e., the system will be stable under NINR as long as the arrival rate, λ, is smaller than n/EX.

Corollary 2.5. For a system with a finite number m of initial jobs, where m > n, and an arbitrary arrival process with arrival rate smaller than n/EX, the NINR policy stochastically minimizes the time until the system empties its queue, i.e., until only n jobs are left in the system.
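The following added sketch (not from the original article) gives a numerical check of Corollary 2.3; the choice of n = 4 servers and uniform service times on [1, 3], so that EX = 2 and n/EX = 2, is an illustrative assumption. A similar experiment with an arrival stream of rate λ < n/EX would illustrate the stability claim of Corollary 2.4.

```python
import random

# Added sketch, not from the original article: a numerical check of Corollary 2.3.
# Under NINR the servers are decoupled, so each behaves as an independent renewal
# process with rate 1/EX, and the total long-run throughput is n/EX.
# Illustrative assumptions: n = 4 servers, service times uniform on [1, 3] (EX = 2).
random.seed(1)
n, EX, horizon = 4, 2.0, 100_000.0

completions = 0
for _ in range(n):                        # servers run independently under NINR
    t = 0.0
    while True:
        t += random.uniform(1.0, 3.0)     # NBU service time with mean EX = 2
        if t > horizon:
            break
        completions += 1

print("simulated throughput:", round(completions / horizon, 3))   # close to 2.0
print("n / EX:", n / EX)
```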
Acknowledgements

We would like to thank the referee for helpful feedback that improved our presentation.

References

[1] S. Borst, O. Boxma, J.F. Groote, S. Mauw, Task allocation in a multi-server system, Journal of Scheduling 6 (2003) 423–436.
[2] I. Foster, C. Kesselman, S. Tuecke, The anatomy of the grid: enabling scalable virtual organizations, International Journal of High Performance Computing Applications 15 (3) (2001) 200–222.
[3] Y. Kim, Resource management for large scale unreliable distributed systems, Ph.D. Dissertation, University of California, Berkeley, 2009.
[4] G. Koole, R. Righter, Resource allocation in grid computing, Journal of Scheduling 11 (2008) 163–173.