Discrete Optimization 20 (2016) 11–22
Approximation algorithms for scheduling on multi-core processor with shared speedup resources

Xufeng Chen (Department of Mathematics, School of Science, Hangzhou Dianzi University, Hangzhou 310018, China), Deshi Ye (College of Computer Science, Zhejiang University, Hangzhou 310027, China)
Article history: Received 9 September 2013; Received in revised form 5 February 2016; Accepted 9 February 2016.
Keywords: Multi-core scheduling; Approximation algorithm; Resource allocation.
Abstract. We consider a joint resource partition and scheduling problem. We are given m identical cores and discrete resources of total size k. We need to partition the resources among these cores. A set of jobs must be processed non-preemptively on the cores after the resource partition. The processing time of a job on a core depends on the size of the resources allocated to that core. The resource allocation scheme is static, i.e., we cannot change the amount of resources allocated to a core during the whole schedule. Hassidim et al. (2013) investigated this problem with a general processing time function, i.e., the processing time of a job is an arbitrary function of the level of resources allocated to its core. They provided an algorithm with an approximation ratio of 36. In this paper, we improve the approximation ratio to 8 by presenting a new resource partition scheme. We then consider a special model where a core's speed is proportional to its allocated resources, and present two algorithms with improved approximation ratios. © 2016 Elsevier B.V. All rights reserved.
1. Introduction

In this paper, we consider the problem of scheduling n jobs J = {t_1, ..., t_n} on m identical cores of a multi-core processor with shared resources of size k. The shared resources are partitioned among the cores. Each job t_j is associated with a non-increasing processing time function T_j(x) on a core with resource size x (0 ≤ x ≤ k). For each core i, we need to assign a resource of size p(i) ≥ 0 such that the total size of the resources used by all cores is bounded by k, i.e., ∑_{i=1}^m p(i) ≤ k. Moreover, the partition of resources among the cores is fixed for all time; in other words, the partition is static. A solution thus consists of a resource partition p, assigning p(i) resources to core i, and a job assignment S, mapping each job j to a core S(j).
∗ Corresponding author. E-mail addresses: [email protected] (X. Chen), [email protected] (D. Ye).
http://dx.doi.org/10.1016/j.disopt.2016.02.002
The goal is to find a resource partition and a job assignment such that the longest completion time of a job, i.e., the makespan, is minimized. This problem generalizes many machine scheduling problems. Given a resource partition, our problem becomes related machine scheduling [1] if T_j(x) = w_j/x, where x can be regarded as the speed of a machine and w_j is the processing time of job t_j. Even in the very special case T_j(x) = w_j, where the processing time of job t_j is independent of the size of the allocated resources, our problem becomes parallel identical machine scheduling, which is NP-hard [2]. Given a partition, our problem is a special case of the unrelated machine scheduling problem. A well-known 2-approximation algorithm for unrelated machine scheduling was given by Lenstra et al. [3]. Later, Shchepin and Vakhania [4] improved the approximation ratio to 2 − 1/m, where m is the number of machines. However, our problem is more difficult than a pure scheduling problem, because different partitions of the resource affect the job scheduling. This problem has many applications; the most obvious is multi-core scheduling, in which the resource is a shared cache. Besides a shared cache, the size k can also be regarded as k units of a discrete resource (e.g., a budget). For example, one may rent virtual machines in a cloud like Amazon's Elastic Cloud, in which virtual machines have different computing abilities depending on the price one pays. One rents a total of m virtual machines with budget k, and the goal is to minimize the longest completion time among these machines. Another example arises in human resource management. Suppose that we have m teams to execute n projects and k new employees; the manager needs to assign projects to the m teams and allocate the new employees to some of the teams.
It is reasonable to assume that each employee joins only one team, since moving an employee from one group to another may incur additional costs, and employees may not be happy to experience frequent changes of their working environment.

Liu et al. [5,6] were the first to study this problem, providing experimental heuristics. Their results suggested that jointly solving the cache partition among the cores and the job assignment to cores leads to significant improvements over combining separate algorithms for the two problems. Recently, Hassidim et al. [7] considered this problem theoretically and gave an algorithm with an approximation ratio of at most 36. Their main idea is an 18-approximation algorithm that uses a cache of size at most (1 + 5ϵ/2)k for any 0 < ϵ < 1/2, which is then extended to the general case with at most k resources; the disadvantage is that the approximation ratio doubles.

Our contribution. In this work, we present an improved algorithm with an approximation ratio of at most 8 for a general non-increasing processing time function T_j(x) of job t_j with x units of resource. The key idea is a new resource partition scheme. We then consider a special case in which the processing time of a job is linear with respect to the amount of resources. The approximation ratio is further improved to min{4 + ϵ, 2 + 2α}, where α is the maximum speed ratio (a detailed definition is given in Section 3) and ϵ is any given positive number.

Related work. Hassidim et al. [7] also investigated the variant of the joint cache partition and job assignment problem that allows dynamic cache partitions and dynamic job assignments, in which the size of the cache on a core can change over time and a job can be processed on different cores. This means that the scheduler is allowed to interrupt the processing of a job at any point in time.
Regarding the optimal schedule, it was shown that the dynamic variant improves the makespan compared to the static variant. A closely related problem is scheduling with an additional renewable speedup resource, in which the processing time of a job depends on the resources it requires. In contrast to our problem, this problem allows a dynamic partition of resources but keeps the job assignment static and non-preemptive. Grigoriev et al. [8] considered the variant where the processing time depends both on the size of the shared resources and on the cores, and presented a 6.83-approximation algorithm. Later, this ratio was improved to 4 by Kumar et al. [9,10], and the current best ratio is 3.75 [11]. In [12] the approximation ratio was improved to 3 + ε for any small number ε > 0 for parallel machine scheduling. Kellerer [13] studied a linear model of the above
problem, in which each job j is associated with a value π_j, and the actual processing time is p_j^s = p_j − s·π_j, where 0 ≤ s ≤ k is the size of the resources allocated to job j. The current best upper bound for the linear model is 3 + ε [14,15]. Edis and Oguz [16] gave a nice review of the above problems and provided an extensive computational study.

Other related work is the paging problem for multi-core shared caches. López-Ortiz and Salinger [17] introduced a model for a paging strategy for multi-core shared caches. They showed that the problem is APX-hard even if the sequence of requests is known in advance. Hassidim [18] gave a competitive analysis of cache replacement policies for multiple cores, in which there exist dependencies between requests, each request has a fetching time, and the goal is to minimize the makespan. Feuerstein et al. [19] introduced an online model for multi-threaded paging, with many threads of requests, in which the algorithms have the capability to schedule requests. In contrast to our problem, paging problems usually focus on minimizing the number of faults without considering scheduling issues. Additionally, unlike our problem, the sizes of the requests in paging problems are usually identical.

In the remaining part of this paper, we first consider the general case in Section 2. Then we study the linear processing time model in Section 3. Conclusions and future work are given in Section 4.

2. Algorithm for the general case

In this section, we propose an 8-approximation algorithm for our problem with a general processing time function. The idea is to design a two-level algorithm that consists of a resource partition and a job assignment. The main approach is to enumerate over all resource partitions and apply a known unrelated machine scheduling algorithm for each resource partition. The challenge of this approach is that there are too many partitions.
We tackle this problem by showing that some partitions are similar to others and can therefore be ignored at only a small loss in the approximation ratio. To this end, it is useful to find a suitable subset of resource partitions, which we call a characteristic resource partition set, as given in Definition 3.

Let ϵ be a real constant with 0 < ϵ < 1. For each positive integer x, there exists exactly one integer l such that (1+ϵ)^(l−1) < x ≤ (1+ϵ)^l. It is easy to see that l = ⌈log_{1+ϵ} x⌉, and we have (1+ϵ)^(⌈log_{1+ϵ} x⌉−1) < x ≤ (1+ϵ)^⌈log_{1+ϵ} x⌉.

Definition 1. Given a real constant 0 < ϵ < 1 and a positive integer k, for any integer x with 0 ≤ x ≤ k, the non-negative function f : [0, k] → Z+ is defined as follows:

    f(x) = x                                          if x = 0 or x = k,
    f(x) = ⌊(1+ϵ)^⌈log_{1+ϵ} x⌉⌋                      if 0 < x ≤ k/2,
    f(x) = k − ⌈(1+ϵ)^(⌈log_{1+ϵ}(k−x)⌉−1)⌉           if k/2 < x < k.

Definition 2. Given a real constant 0 < ϵ < 1 and a positive integer k, for any integer x with 0 ≤ x ≤ k, we define the positive function g : [0, k] → Z+ as follows:

    g(x) = ϵk/m      if 0 < x ≤ ϵk/m,
    g(x) = f(x)      if ϵk/m < x ≤ k − ϵk/m,
    g(x) = k         if k − ϵk/m < x ≤ k.
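The two rounding functions can be sketched in Python as follows. This is our own illustration, not from the paper: the integer loop in ceil_log avoids floating-point logarithm issues, and we take ⌊ϵk/m⌋ in the first branch of g (an assumption on our part) so that g is integer-valued, consistent with the set Kϵ(k, m) below containing ⌊ϵk/m⌋.

```python
import math

def ceil_log(x, base):
    """Smallest integer l with base**l >= x, i.e. ceil(log_base(x)) for x >= 1."""
    l, v = 0, 1.0
    while v < x:
        v *= base
        l += 1
    return l

def f(x, k, eps):
    """Definition 1: round the integer level x (0 <= x <= k) to a characteristic level."""
    if x == 0 or x == k:
        return x
    if 2 * x <= k:
        return math.floor((1 + eps) ** ceil_log(x, 1 + eps))
    return k - math.ceil((1 + eps) ** (ceil_log(k - x, 1 + eps) - 1))

def g(x, k, m, eps):
    """Definition 2: like f, but snaps very small / very large x to fixed levels.
    The floor in the first branch is our assumption, so g stays integer-valued."""
    if x == 0:
        return 0
    if x <= eps * k / m:
        return math.floor(eps * k / m)
    if x > k - eps * k / m:
        return k
    return f(x, k, eps)
```

For k = 100 and ϵ = 1/4, one can check numerically that f satisfies the properties of Lemma 1 below: f(x) ≥ x everywhere, f(x) ≤ (1+ϵ)x for x ≤ k/2, and f(x) ≤ x + ϵ(k−x) for k/2 < x < k.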
Lemma 1. Given f(x) in Definition 1, we have two properties: (i) f(x) ≥ x for 0 ≤ x ≤ k, and (ii) f(x) ≤ (1+ϵ)x if 0 < x ≤ k/2, while f(x) ≤ x + ϵ(k−x) if k/2 < x < k.

Proof. (i) For each integer x > 0, x = (1+ϵ)^log_{1+ϵ} x ≤ (1+ϵ)^⌈log_{1+ϵ} x⌉. Hence x = ⌊x⌋ ≤ ⌊(1+ϵ)^⌈log_{1+ϵ} x⌉⌋. On the other hand, if x < k then k − x = (1+ϵ)^log_{1+ϵ}(k−x) > (1+ϵ)^(⌈log_{1+ϵ}(k−x)⌉−1). This leads to x = ⌊x⌋ ≤ ⌊k − (1+ϵ)^(⌈log_{1+ϵ}(k−x)⌉−1)⌋ = k − ⌈(1+ϵ)^(⌈log_{1+ϵ}(k−x)⌉−1)⌉. It follows that f(x) ≥ x for 0 ≤ x ≤ k by the definition of f(x).

(ii) If 0 < x ≤ k/2, then f(x) = ⌊(1+ϵ)^⌈log_{1+ϵ} x⌉⌋ ≤ (1+ϵ)^⌈log_{1+ϵ} x⌉ < (1+ϵ)^(1+log_{1+ϵ} x) = (1+ϵ)x. If k/2 < x < k, then f(x) = k − ⌈(1+ϵ)^(⌈log_{1+ϵ}(k−x)⌉−1)⌉ ≤ k − (1+ϵ)^(⌈log_{1+ϵ}(k−x)⌉−1). Since k − x = (1+ϵ)^log_{1+ϵ}(k−x) ≤ (1+ϵ)^⌈log_{1+ϵ}(k−x)⌉, adding the above two inequalities for f(x) and k − x gives f(x) + k − x ≤ k + ϵ(1+ϵ)^(⌈log_{1+ϵ}(k−x)⌉−1) < k + ϵ(k−x). This shows that f(x) ≤ x + ϵ(k−x) for k/2 < x < k. Note that if k/2 < x < k then k − x < x.

The following two corollaries follow immediately from Lemma 1.
Corollary 1. It is valid that x ≤ f(x) ≤ (1+ϵ)x for all integers 0 ≤ x ≤ k.

Corollary 2. It is valid that x ≤ g(x) ≤ ϵk/m + (1+ϵ)x for all integers 0 ≤ x ≤ k.
Let Kϵ(k) = {⌊(1+ϵ)^l⌋, k − ⌈(1+ϵ)^(l−1)⌉ | 0 < (1+ϵ)^l ≤ k/2, l ∈ N} and Kϵ(k, m) = {0, ⌊ϵk/m⌋, k} ∪ {⌊(1+ϵ)^l⌋, k − ⌈(1+ϵ)^(l−1)⌉ | ϵk/m < (1+ϵ)^l ≤ k/2, l ∈ N}. It is easy to check that f(x) ∈ Kϵ(k) and g(x) ∈ Kϵ(k, m).

Let M = {1, ..., m} be the index set of the m cores. Let p(j) be the size of the resource allocated to core j, and let p(k, m) be a resource partition with m cores and k shared resources such that ∑_{j=1}^m p(j) ≤ k. We denote by b the number of different resource sizes (resource levels, in other words) in a partition, by r_i the size of the i-th resource level, and by c_i the number of cores with allocated resource of at least r_i. It is clear that there are c_1 cores at resource level one and (c_i − c_{i−1}) cores at resource level i for 2 ≤ i ≤ b. Let G(m) = {1, 2, 2^2, ..., 2^⌊log_2 m⌋, m}. Let r = (r_1, ..., r_b) and c = (c_1, ..., c_b). For fixed m and k, the following definition gives a characteristic partition set, whose size, as we will show, depends only on m.
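The candidate level sets can be enumerated directly from their definitions; a sketch (our own illustration, with function names of our choosing):

```python
import math

def K_eps(k, eps):
    """K_eps(k): levels floor((1+eps)^l) and k - ceil((1+eps)^(l-1))
    for all natural l with 0 < (1+eps)^l <= k/2."""
    levels, l = set(), 0
    while (1 + eps) ** l <= k / 2:
        levels.add(math.floor((1 + eps) ** l))
        levels.add(k - math.ceil((1 + eps) ** (l - 1)))
        l += 1
    return levels

def K_eps_m(k, m, eps):
    """K_eps(k, m): as K_eps(k) but restricted to eps*k/m < (1+eps)^l <= k/2,
    together with the fixed levels {0, floor(eps*k/m), k}."""
    levels, l = {0, math.floor(eps * k / m), k}, 0
    while (1 + eps) ** l <= k / 2:
        if (1 + eps) ** l > eps * k / m:
            levels.add(math.floor((1 + eps) ** l))
            levels.add(k - math.ceil((1 + eps) ** (l - 1)))
        l += 1
    return levels

def G(m):
    """Powers of two up to 2^floor(log2 m), together with m itself."""
    return {2 ** i for i in range(m.bit_length())} | {m}
```

Lemma 2 below asserts that |Kϵ(k, m)| is independent of k; e.g. with m = 4 and ϵ = 1/4, the set has the same bounded size for k = 100 and k = 10^6.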
Definition 3. The characteristic partition set with k resources and m cores, CPSϵ(k, m), is defined as

    CPSϵ(k, m) = { (b, r, c) | r_1 c_1 + ∑_{i=2}^b r_i (c_i − c_{i−1}) ≤ (1+2ϵ)k },    (1)

where

    b ∈ {1, 2, ..., ⌊log_2 m⌋ + 1},                    (2)
    r_1 > ··· > r_b and r_i ∈ Kϵ(k, m),                (3)
    c_1 < ··· < c_b and c_i ∈ G(m), 1 ≤ i ≤ b.         (4)
Lemma 2. The size of the characteristic partition set CPSϵ(k, m) is polynomial in m and is independent of k.

Proof. Let (b, r, c) ∈ CPSϵ(k, m). By condition (3) in the definition of CPSϵ(k, m), the values of the sequence r_1, ..., r_b are strictly decreasing and r_i ∈ Kϵ(k, m). Therefore the number of sequences r_1, ..., r_b is bounded by the number of subsets of Kϵ(k, m), which is O(2^|Kϵ(k,m)|). Similarly, the number of sequences c_1, ..., c_b is bounded by the number of subsets of G(m), which is O(2^|G(m)|). Hence the number of triples (b, r, c) is bounded by O(2^|Kϵ(k,m)|) · O(2^|G(m)|).

Since l is restricted to ϵk/m < (1+ϵ)^l ≤ k/2 in Kϵ(k, m), the candidate range of l is log_{1+ϵ}(ϵk/m) < l ≤ log_{1+ϵ}(k/2). The number of values of l is at most log_{1+ϵ}(k/2) − log_{1+ϵ}(ϵk/m), which equals log_{1+ϵ}(m/(2ϵ)). We can see that the size of the set Kϵ(k, m) is at most 2 log_{1+ϵ}(m/(2ϵ)) + 3, that is, |Kϵ(k, m)| ≤ 2 log_{1+ϵ}(m/(2ϵ)) + 3. Clearly, |G(m)| = O(log_2 m). It follows that |CPSϵ(k, m)| = O((m/(2ϵ))^(2/log_2(1+ϵ))) · O(m). Since ϵ is a constant, the size of CPSϵ(k, m) is polynomial in m. The independence of k follows directly from this evaluation of |CPSϵ(k, m)|.
Without loss of generality, we assume that a resource partition p(k, m) is indexed in non-increasing order of resource allocation, that is, p(1) ≥ ··· ≥ p(m). In the following, Algorithm P2CPA transforms a resource partition p(k, m) into a characteristic partition.

Algorithm 1 P2CPA (Mapping a partition into a characteristic partition).
Input: A resource partition p(k, m) in non-increasing order such that ∑_{i=1}^m p(i) ≤ k.
Output: A new resource partition p′(k′, m) such that k′ ≥ k and p′(j) ∈ Kϵ(k, m).
/*Step 1.*/
for each core j do
    Calculate ϕ(j) = g(p(j)) using function g from Definition 2;
end for
/*Step 2. Calculate the sum of ϕ(j).*/
Set k′ = ∑_{j=1}^m ϕ(j);
/*Step 3. Construct a new resource allocation p′(k′, m).*/
p′(1) = ϕ(1);
for j = 2 to 2^⌊log_2 m⌋ do
    Find l such that 2^(l−1) < j ≤ 2^l;
    p′(j) = ϕ(2^l);
end for
for j = 2^⌊log_2 m⌋ + 1 to m do
    p′(j) = 0;
end for
return p′(k′, m).
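Step 3's level-replication can be sketched as follows. This is our own illustration: the rounding function g is passed in as a parameter, and the usage below substitutes the identity for the g of Definition 2.

```python
def p2cpa(p, m, g):
    """Sketch of Algorithm P2CPA: map a non-increasing partition p
    (1-based core j stored at p[j-1]) to a partition p' that has at most
    floor(log2 m) + 1 distinct resource levels."""
    phi = [g(x) for x in p]                 # Step 1: round every level with g
    k_prime = sum(phi)                      # Step 2: total rounded resources
    p_prime = [0] * m                       # Step 3: cores (2^(l-1), 2^l] all
    p_prime[0] = phi[0]                     #         get the level of core 2^l
    top = 2 ** (m.bit_length() - 1)         # 2^floor(log2 m)
    for j in range(2, top + 1):
        l = (j - 1).bit_length()            # smallest l with 2^(l-1) < j <= 2^l
        p_prime[j - 1] = phi[2 ** l - 1]    # phi(2^l) in 1-based indexing
    return p_prime, k_prime
```

With the identity in place of g one can already observe the dominance property p(j) ≤ p′(⌈j/2⌉) claimed in Lemma 3 below.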
Lemma 3. Given any non-increasingly ordered resource partition p(k, m) and m ≥ 3, Algorithm P2CPA (Algorithm 1) returns a new resource partition p′ with total resources at most (1+2ϵ)k such that p(j) ≤ p′(⌈j/2⌉) for 1 ≤ j ≤ m. Moreover, p′((1+2ϵ)k, m) belongs to some characteristic partition (b′, r′, c′) in CPSϵ(k, m).

Proof. Let p(k, m) be a given non-increasingly ordered partition with total resources at most k. To prove that p′(k′, m) belongs to some characteristic partition in CPSϵ(k, m), we must show that k′ ≤ (1+2ϵ)k and that b′, r′, c′ satisfy the constraint conditions (2), (3), (4). Constraint condition (4) follows immediately from the construction of p′(k′, m) in Step 3 of Algorithm P2CPA.

In Step 2 of Algorithm P2CPA, k′ = ∑_{j=1}^m g(p(j)), where the function g is given by Definition 2. Corollary 2 shows that g(p(j)) ≤ ϵk/m + (1+ϵ)p(j) for 1 ≤ j ≤ m. Since ∑_{j=1}^m p(j) ≤ k, we obtain k′ ≤ ∑_{j=1}^m (ϵk/m + (1+ϵ)p(j)) ≤ ϵk + (1+ϵ)k. It follows that k′ ≤ (1+2ϵ)k.

Notice that b′ is the number of different resource sizes in the partition p′(k′, m). From the construction of p′(k′, m) in Step 3 of Algorithm P2CPA, it is clear that b′ ≤ ⌊log_2 m⌋ + 1, so condition (2) is satisfied. By the definitions of f and g, we obtain the following two results: (i) g(p(j)) ∈ Kϵ(k, m), which leads to p′(j) ∈ Kϵ(k, m); (ii) if p(i) ≤ p(j) then g(p(i)) ≤ g(p(j)). Since p(1) ≥ p(2) ≥ ··· ≥ p(m) by the
assumption, and r′_1, ..., r′_b′ denote the different resource sizes in p′(k′, m), we have r′_1 > ··· > r′_b′. So condition (3) is proven.

Finally, we show that p(j) ≤ p′(⌈j/2⌉) holds for 1 ≤ j ≤ m. By Corollary 2, we have p(j) ≤ ϕ(j), where ϕ(j) = g(p(j)). By the construction of p′, we have p′(2^l) = ϕ(2^l) for l = 0, 1, ..., ⌊log_2 m⌋. This leads to p′(2^l) ≥ p(2^l). It follows that p′(i) ≥ p(j) for all 2^(l−1) ≤ i ≤ 2^l and all 2^l ≤ j ≤ 2^(l+1). Letting i = ⌈j/2⌉ for each 2^l ≤ j ≤ 2^(l+1), we obtain p(j) ≤ p′(⌈j/2⌉).

When the parameters are clear from the context, we use p to denote p(k, m). Let S be any job assignment and S(i) be the set of jobs assigned to core i. Let C(p, S) denote the makespan generated by the resource partition p(k, m) and the job assignment S. Given p and S, the following algorithm MPA adjusts them to a characteristic partition p′ and a new job assignment S′.

Algorithm 2 MPA (Modified resource partition and job assignment).
Input: A resource partition and a job assignment (p(k, m), S) with core set M and job set J.
Output: A new resource partition and job assignment (p′(k′, m), S′).
/*Step 1.*/
Call Algorithm P2CPA to get the new resource partition p′(k′, m);
/*Step 2. Construct a new job assignment S′ by changing the assignment of the jobs assigned by S.*/
if m is odd then
    S(m + 1) = ∅;
end if
for j = 1 to ⌈m/2⌉ do
    S′(j) = S(2j − 1) ∪ S(2j);
end for
return p′(k′, m) and S′.
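Step 2 of MPA simply merges the job sets of cores 2j−1 and 2j onto core j; a sketch (our own, with job sets represented as Python lists):

```python
def mpa_merge(S, m):
    """Step 2 of Algorithm MPA: pair up cores and merge their job sets.
    S is a list of m job lists; returns a list of m job lists where only
    the first ceil(m/2) entries are non-empty."""
    if m % 2 == 1:
        S = S + [[]]                      # S(m+1) = empty set for odd m
    half = (m + 1) // 2                   # ceil(m/2)
    merged = [S[2 * j] + S[2 * j + 1] for j in range(half)]
    return merged + [[] for _ in range(m - half)]
```

With per-core loads, the merged makespan is at most twice the original, matching Lemma 4 below: e.g. loads [5, 4, 3, 2, 1] merge into [9, 5, 1].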
Lemma 4. Let (p, S) be any resource partition and job assignment with makespan C(p, S) using m cores and k resources. Then Algorithm MPA (Algorithm 2) returns a characteristic partition p′ and a job assignment S′ with makespan at most 2C(p, S) using ⌈m/2⌉ cores and (1+2ϵ)k resources, such that p′((1+2ϵ)k, m) ∈ CPSϵ(k, m).

Proof. Let (p, S) be a resource partition and a job assignment that uses m cores and k resources with makespan C(p, S). By Lemma 3, we immediately obtain that p′((1+2ϵ)k, m) ∈ CPSϵ(k, m). We next prove that S′(i) = ∅ for ⌈m/2⌉ < i ≤ m and that C(p′, S′) ≤ 2C(p, S).

In Step 2 of Algorithm MPA, for i = 1, 2, ..., ⌈m/2⌉, we move the jobs on cores 2i − 1 and 2i to core i, that is, S′(i) = S(2i−1) ∪ S(2i). Since 2⌈m/2⌉ ≥ m, we have S(2i−1) = S(2i) = ∅ for ⌈m/2⌉ + 1 ≤ i ≤ m. Hence S′(i) = ∅ for ⌈m/2⌉ + 1 ≤ i ≤ m, which means that all jobs are assigned to the first ⌈m/2⌉ cores.

By Lemma 3, p(j) ≤ p′(⌈j/2⌉), which implies that p(2j − 1) ≤ p′(j) and p(2j) ≤ p′(j). Therefore the load of core j in (p′, S′) is at most the load on core 2j − 1 plus the load on core 2j in (p, S). Hence we conclude that C(p′, S′) ≤ 2C(p(k, m), S).

Theorem 3. Let (p, S) be a resource partition and a job assignment with makespan C(p, S) using m cores and k resources. Then Algorithm UPA (Algorithm 3) returns a new characteristic partition p′ and a new job assignment S′ with makespan at most 4C(p, S) using ⌈m/4⌉ + 1 cores and k resources.
Algorithm 3 UPA (Upgraded resource partition and job assignment).
Input: Resource partition p(k, m) and job assignment S with core set M, job set J, m ≥ 3.
Output: A new resource partition p′(k′, m) and a job assignment S′.
/*Step 1.*/
Set ϵ = 1/4;
Calculate p′(1) = f(p(1)) using Definition 1;
S′(1) = S(1) ∪ S(2);
/*Step 2.*/
if m is odd then
    S(m + 1) = ∅;
end if
for j = 2 to ⌈m/2⌉ do
    S_1(j) = S(2j − 1) ∪ S(2j);
    p_1(j) = ϕ(2j − 1);
end for
/*Step 3.*/
Set new core collection M_1 = {2, 3, ..., ⌈m/2⌉};
Set new job collection J_1 = J \ S′(1);
Apply Algorithm MPA to job set J_1 and core set M_1 to get the resource partition p′_1 and the job assignment S′_1;
for j = 2 to ⌈m/2⌉ do
    S′(j) = S′_1(j);
    p′(j) = p′_1(j);
end for
/*Step 4.*/
for j = ⌈m/2⌉ + 1 to m do
    S′(j) = ∅, p′(j) = 0;
end for
return the resource partition p′ and the job assignment S′.

Proof. Let (p(k, m), S) be a resource partition and a job assignment with core set M and job set J such that m ≥ 3. Let (p′_1, S′_1) be the result of Algorithm MPA (Algorithm 2) with job set J_1, core set M_1 and k_1 resources, where J_1 = J \ (S(1) ∪ S(2)), M_1 = {2, 3, ..., ⌈m/2⌉} and k_1 ≤ (k − p(1))/2. By Lemma 4, Algorithm MPA uses at most (1+2ϵ)(k − p(1))/2 resources and the makespan C(p′, S′) is at most 2C(p_1, S_1). Since p is non-increasing, we have C(p_1, S_1) ≤ 2C(p, S). So C(p′, S′) ≤ 4C(p, S).

By the definition of f and Lemma 1, we obtain that p′(1) ≤ (1+ϵ)p(1) if 0 < p(1) ≤ k/2, and p′(1) ≤ p(1) + ϵ(k − p(1)) if k/2 < p(1) < k. Notice that k ≥ p(1) + ∑_{odd i ≥ 3} (p(i−1) + p(i)) ≥ p(1) + 2 ∑_{odd i ≥ 3} p(i), since p is non-increasing. Therefore ∑_{odd i ≥ 3} p(i) ≤ (k − p(1))/2.

If 0 < p(1) ≤ k/2, then k − ∑_{j=1}^m p′(j) ≥ k − [(1+ϵ)p(1) + (1+2ϵ)(k − p(1))/2] = (1/2 − ϵ)k − p(1)/2 ≥ (1/2 − ϵ)k − k/4 = (1/4 − ϵ)k. If k/2 < p(1) < k, then k − ∑_{j=1}^m p′(j) ≥ k − [p(1) + ϵ(k − p(1)) + (1+2ϵ)(k − p(1))/2] = (1 − ϵ)(k − p(1)) − (1/2 + ϵ)(k − p(1)) = (1/2 − 2ϵ)(k − p(1)) ≥ 0. Therefore, if we choose ϵ ≤ 1/4, the total amount of resources used by Algorithm UPA does not exceed k. It is clear from Steps 3 and 4 that Algorithm UPA uses at most ⌈m/4⌉ + 1 cores.
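The case analysis at the end of the proof can be checked numerically. The helper below (our own arithmetic sketch, not part of the algorithm) computes the slack k − ∑ p′(j) using the proof's worst-case upper bounds on p′(1) and on the resources handed to MPA:

```python
def upa_resource_slack(p1, k, eps=0.25):
    """Worst-case slack k - (resources used by UPA), following the proof of
    Theorem 3. p1 is the resource level of the largest core in the original
    partition; the two branches mirror the cases p1 <= k/2 and p1 > k/2."""
    if p1 <= k / 2:
        used = (1 + eps) * p1 + (1 + 2 * eps) * (k - p1) / 2
    else:
        used = p1 + eps * (k - p1) + (1 + 2 * eps) * (k - p1) / 2
    return k - used
```

With ϵ = 1/4 the slack is non-negative for every value of p(1), and with any ϵ < 1/4 it is strictly positive whenever p(1) < k, which is why UPA fixes ϵ = 1/4.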
Theorem 4. The approximation ratio of Algorithm EPA (Algorithm 4) is at most 8. The running time of the algorithm is polynomial in log_2 k, m and n.

Proof. Let OPT(p*, S*) be the makespan generated by an optimal algorithm, where (p*, S*) is the optimal resource partition and job assignment. Applying Algorithm UPA to (p*, S*), by Theorem 3 we get a new partition p′ and a new job assignment S′ such that C(p′, S′) ≤ 4 OPT(p*, S*). Since we apply Lenstra's 2-approximation algorithm [3] to schedule the jobs for the partition p′, the makespan generated by Algorithm EPA is at most 2C(p′, S′). Hence the makespan of Algorithm EPA is at most 8 OPT(p*, S*), and the approximation ratio is at most 8.

Since Kϵ(k) = {⌊(1+ϵ)^l⌋, k − ⌈(1+ϵ)^(l−1)⌉ | 0 < (1+ϵ)^l ≤ k/2, l ∈ N}, we have |Kϵ(k)| = O(log_{1+ϵ}(k/2)) by the same analysis as for |Kϵ(k, m)| in Lemma 2. Moreover, the number of characteristic partitions in CPSϵ(k, m) is polynomial in m by Lemma 2. The number of iterations in Algorithm EPA is O(|Kϵ(k)| · |CPSϵ(k, m)|), which is polynomial in log_2 k and m. Lenstra's 2-approximation algorithm runs in time polynomial in m and n. Therefore, the running time of Algorithm EPA is polynomial in log_2 k, m and n.

Algorithm 4 EPA (Enhanced resource partition and job assignment).
Input: Job set J with n jobs, core set M with m cores, shared resource k.
Output: A resource partition p, a job assignment S, and the makespan C(p, S).
/*Step 1. Initialization.*/
Set ϵ = 1/4;
Calculate Kϵ(k) = {⌊(1+ϵ)^l⌋, k − ⌈(1+ϵ)^(l−1)⌉ | 0 < (1+ϵ)^l ≤ k/2, l ∈ N};
Set MinC = ∞;
/*Step 2. Enumerate the resource allocation on the first core.*/
for each r in Kϵ(k) ∪ {k} do
    Set k_1 = (k − r)/2;
    Calculate CPSϵ(k_1, m) using Definition 3;
    /*Enumerate the other cores' resources.*/
    for each resource partition p in CPSϵ(k_1, m) do
        for each job j in job set J do
            /*Calculate the processing time of job j on core i.*/
            T_ij = T_j(p(i));
        end for
        Use Lenstra's 2-approximation algorithm to assign all jobs;
        Get the job assignment S_1 and makespan C_1;
        if MinC > C_1 then
            MinC = C_1, MinS = S_1, MinP = p;
        end if
    end for
end for
Set p = MinP, S = MinS, C(p, S) = MinC;
return the resource partition p, the job assignment S and the makespan C(p, S).
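The enumeration skeleton of EPA can be sketched with the scheduling subroutine abstracted away. This is our own illustration: greedy_assign below is a simple stand-in for Lenstra's 2-approximation algorithm, not that algorithm itself.

```python
def epa_sketch(T, partitions, assign):
    """For each candidate partition p, build the processing-time matrix
    proc[i][j] = T_j(p[i]) and schedule it with `assign`; keep the best makespan."""
    best = (float("inf"), None, None)
    for p in partitions:
        proc = [[Tj(x) for Tj in T] for x in p]   # rows = cores, cols = jobs
        C, S = assign(proc)
        if C < best[0]:
            best = (C, list(p), S)
    return best

def greedy_assign(proc):
    """Stand-in scheduler: place each job on the core minimizing its completion time."""
    m, n = len(proc), len(proc[0])
    load, S = [0.0] * m, [0] * n
    for j in range(n):
        i = min(range(m), key=lambda c: load[c] + proc[c][j])
        load[i] += proc[i][j]
        S[j] = i
    return max(load), S
```

For three jobs with loads 4, 4, 2 under T_j(x) = L_j/(x+1) and two candidate partitions of k = 3 resources over two cores, the sketch returns makespan 2.0.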
3. Linear speedup with resource allocation

In this section, we study a special case of the general model in which the processing time of a job is linear in the amount of resources allocated to the core it is assigned to. Equivalently, in the linear speedup model, we suppose that the speed of a core depends linearly on the amount of resources. Specifically, let s_i be the amount of resources allocated to core i; then the speed of the core is α_i s_i + 1, where α_i is the speed rate of core i. For each job j ∈ J with non-negative workload L_j, the actual processing time T_ij of job j on core i is defined to be

    T_ij = T_j(s_i) = L_j / (α_i s_i + 1).
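A minimal sketch of this model (our own illustration): once the partition is fixed, the instance is exactly related machine scheduling with speeds α_i s_i + 1.

```python
def processing_time(L, alpha, s):
    """Linear speedup model: a job of load L on a core with speed rate alpha
    holding s resources runs at speed alpha*s + 1."""
    return L / (alpha * s + 1)

def core_speeds(alphas, partition):
    """With the partition fixed, these are the machine speeds of the resulting
    related machine scheduling instance."""
    return [a * s + 1 for a, s in zip(alphas, partition)]
```

For example, a job of load 10 on a core with α = 1 and 4 resources takes 2 time units, while a core with no resources runs the same job in 10 time units.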
We can adopt a related machine scheduling algorithm to solve this problem when the resource partition is fixed, since T_ij depends on s_i uniformly for every job j (one can regard α_i s_i + 1 as the speed of core i). There exists a PTAS for the related machine scheduling problem by Hochbaum and Shmoys [1]. We modify Algorithm 4 by replacing Lenstra's 2-approximation algorithm with this PTAS for related machine scheduling. We then obtain the following corollary, which can be deduced directly from Theorem 4.

Corollary 5. For any given ϵ_1 > 0, the modified version of Algorithm EPA (Algorithm 4) achieves an approximation ratio of at most 4(1 + ϵ_1) for the linear speedup model.

In the following, we present another algorithm with an approximation ratio that depends on α, where α is the maximal speed rate over all cores, i.e., α = max_i α_i. We formulate this linear model as an integer linear program. We first guess a number T and then check whether there exists a feasible solution with makespan at most T; such a T is found by binary search. Now assume that we are given the guess T. Let x_ij be indicator variables whose value is 1 if job j is assigned to core i, and 0 otherwise. The following integer program, denoted IP(T), has a solution if there is a feasible resource partition and job assignment with makespan at most T:

    ∑_{i=1}^m x_ij = 1,                       ∀ j = 1, ..., n,    (5)
    ∑_{i=1}^m s_i ≤ k,                                            (6)
    ∑_{j=1}^n x_ij L_j / (α_i s_i + 1) ≤ T,   ∀ i = 1, ..., m,    (7)
    x_ij ∈ {0, 1},  s_i ∈ {0, 1, ..., k},     ∀ i = 1, ..., m and j = 1, ..., n.    (8)
Constraints (5) ensure that each job j is assigned to exactly one core. Constraint (6) guarantees that the total amount of resources allocated to all cores is bounded by the capacity k. Constraints (7) ensure that the makespan is at most T. Clearly, the expression (7) can be rewritten as

    ∑_{j=1}^n x_ij L_j − T α_i s_i ≤ T,    ∀ i = 1, ..., m.    (9)

We relax the integrality constraints as follows:

    x_ij ≥ 0,  0 ≤ s_i ≤ k,    ∀ i = 1, ..., m and j = 1, ..., n.    (10)
Let LP(T ) denote the above relaxed linear program with constraints (5), (6) and (9), (10). The objective is to minimize T such that the linear program LP(T ) has a feasible solution. To deal with the program LP(T ), we extend the rounding technique for the unrelated machine scheduling by Lenstra et al. [3]. Let pij be the processing time of job j on machine i. Let Ji (t) denote the set of jobs that
require no more than t time units on machine i, and let M_j(t) be the set of machines that can process job j in no more than t units of time. The following theorem is the key result used to obtain an approximation ratio of 2 for the unrelated machine scheduling problem.

Theorem 6 (Lenstra's Rounding Theorem [3]). Let P = (p_ij) ∈ Z_+^{m×n}, d = (d_1, ..., d_m) ∈ Z_+^m and t ∈ Z_+. If the linear program LP(P, d, t), given by

    ∑_{i ∈ M_j(t)} x_ij = 1,         for j = 1, ..., n,
    ∑_{j ∈ J_i(t)} p_ij x_ij ≤ d_i,  for i = 1, ..., m,
    x_ij ≥ 0,                        for j ∈ J_i(t), i = 1, ..., m,

has a feasible solution, then any vertex x of this polytope can be rounded to a feasible solution x̄ of the integer program IP(P, d, t), given by

    ∑_{i ∈ M_j(t)} x̄_ij = 1,             for j = 1, ..., n,
    ∑_{j ∈ J_i(t)} p_ij x̄_ij ≤ d_i + t,  for i = 1, ..., m,
    x̄_ij ∈ {0, 1},                       for j ∈ J_i(t), i = 1, ..., m,
and this rounding can be done in polynomial time.

This result cannot be applied directly to our problem, since we must take the resource partition into account. However, we can adopt the rounding idea from Lenstra's Rounding Theorem to solve the job assignment and the resource partition jointly.

Algorithm 5 PAA (Linear-programming-based approximation algorithm).
Input: Job set J with n jobs, core set M with m cores, shared resource k.
Output: A resource partition p and a job assignment S with minimal makespan C(p, S).
Step 1. Set UB = ∑_{j=1}^n L_j / (αk + 1) and LB = UB/m. Find the smallest integer value T* such that LP(T*) has a feasible solution, by binary search in the interval [LB, UB].
Step 2. Find an extreme point solution (x, s) of LP(T*).
Step 3. Assign all jobs with integral values to the cores as indicated by x.
Step 4. Round the jobs with fractional values to integral values x̄ using Lenstra's matching algorithm. /*To be self-contained: define a bipartite graph on the job set J and the core set M. Let F be the set of jobs with fractional values, and let H be the subgraph of the bipartite graph J ∪ M induced on F ∪ M, with an edge (i, j) if and only if 0 < x_ij < 1. Then assign these fractional jobs according to a perfect matching of H.*/
Step 5. Round each variable s_i in s down to the nearest integer s̄_i = ⌊s_i⌋.
return the resource partition p, the job assignment S and the makespan C(p, S).
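Step 1 can be sketched as an integer binary search over T. This is our own illustration: lp_feasible is a hypothetical callback standing in for an LP solver that decides feasibility of LP(T); it is not implemented here.

```python
import math

def paa_search(loads, alpha, k, m, lp_feasible):
    """Find the smallest integer T with lp_feasible(T) true, searching in
    [LB, UB] with UB = sum(L_j)/(alpha*k + 1) and LB = UB/m, as in Step 1 of
    Algorithm PAA. Assumes lp_feasible is monotone: once feasible for some T,
    it stays feasible for every larger T."""
    ub = math.ceil(sum(loads) / (alpha * k + 1))
    lb = max(1, ub // m)
    while lb < ub:
        mid = (lb + ub) // 2
        if lp_feasible(mid):
            ub = mid          # mid works, try smaller T
        else:
            lb = mid + 1      # mid infeasible, T* is larger
    return lb
```

UB is a valid upper bound on the optimum because assigning every job to a single core holding all k resources yields makespan ∑ L_j/(αk + 1), and LB = UB/m is the corresponding averaging lower bound.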
Theorem 7. Algorithm PAA (Algorithm 5) achieves an approximation ratio of 2(1 + α) for the joint resource partition and job assignment problem, where α is the maximal speed rate of the cores. Moreover, for any given ϵ > 0, the integrality gap between LP and IP is at least 2 + α − αϵ − 2/m.
Table 1. Comparing the LP solution with the IP solution (only non-zero variables are listed).

Feasible solution x:
    LP(T): x_ij = 1 if i = ⌊j/(m−1)⌋ + 1 and x_ij = 0 otherwise, for 1 ≤ i ≤ m−1, 1 ≤ j ≤ l−3;
           x_{1,l−2} = ··· = x_{m−2,l−2} = 1/m, x_{m−1,l−2} = 2/m, x_{m−1,l−1} = 1, x_{m,l} = 1.
    IP(T): x̄_ij = x_ij for 1 ≤ j ≤ l−3; x̄_{m−1,l−2} = 1, x̄_{m−1,l−1} = 1, x̄_{m,l} = 1.

Feasible solution s:
    LP(T): s_1 = ··· = s_{m−2} = 0, s_{m−1} = 1 − ϵ, s_m = k + ϵ − 1.
    IP(T): s̄_1 = ··· = s̄_{m−2} = 0, s̄_{m−1} = 0, s̄_m = k − 1.

Makespan C:           LP(T): m.    IP(T): 2m + αm(1 − ϵ) − 2.
Approximation ratio r: LP(T): 1.   IP(T): 2 + α − αϵ − 2/m.
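The makespan and ratio entries of Table 1 can be reproduced arithmetically; the helper below is a sketch of the claimed formulas, not a new result:

```python
def gap_ratio(m, alpha, eps):
    """LP and IP makespans of the Table 1 instance and the resulting ratio
    (2 + alpha - alpha*eps - 2/m)."""
    lp_makespan = m
    ip_makespan = 2 * m + alpha * m * (1 - eps) - 2
    return lp_makespan, ip_makespan, ip_makespan / lp_makespan
```

For m = 10, α = 1 and ϵ = 0.1 the ratio is 2.7 = 2 + α − αϵ − 2/m, which stays below the 2(1 + α) = 4 upper bound of Theorem 7.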
Proof. Conditions (6) and (9) of the resource partition problem in LP(T) are linear. If we append these linear constraints to Lenstra's rounding theorem and fix the variables s_1, ..., s_m, then Lenstra's rounding theorem still applies. Moreover, the number of fractional jobs in the extreme point (x, s) is at most m. As defined in Algorithm PAA, T* is the smallest integer value such that LP(T*) has a feasible solution, found by binary search in the interval [LB, UB]. Now let t = T*; by applying Theorem 6 (Lenstra's rounding theorem), the completion time of core i in Algorithm PAA (Algorithm 5) is at most 2T*, and since T* ≤ OPT, it follows that the total load on core i is at most 2(1 + α_i s_i)OPT. Since s_i ≤ ⌊s_i⌋ + 1, we have (1 + α_i s_i) ≤ (1 + α)(1 + α_i ⌊s_i⌋), where α = max_i α_i. Hence, after rounding s_i down to ⌊s_i⌋, the completion time of core i is at most 2(1 + α)OPT. Therefore the makespan of Algorithm PAA is at most 2(1 + α)OPT.

To show the lower bound for our algorithm, we examine the integrality gap. Consider the following instance with l = m² − 2m + 3 jobs. Each of the first m² − 2m jobs has one unit of load, and the remaining three jobs have loads m, αm(1 − ϵ) and m + αm(k + ϵ − 1), respectively. Let T* = m. Table 1 lists a feasible LP solution and a corresponding result that Algorithm PAA (Algorithm 5) could produce; only non-zero variables are listed. Clearly, the optimal makespan OPT is at most m: allocate all k resources to the last core, assign the last two jobs to the last core, assign the third-from-last job to the second-to-last core, and assign m of the remaining m² − 2m jobs to each of the remaining m − 2 cores. From Table 1, the makespan generated by the integer program is 2m + αm(1 − ϵ) − 2. Hence the integrality gap, and the lower bound for our algorithm, is at least 2 + α − αϵ − 2/m.

4.
4. Conclusions & future work
In this paper, we investigated the joint resource partition and job scheduling problem. The size of the resources allocated to each core is fixed, and the total amount of resources is bounded by a given capacity. The processing time of a job depends on the size of the resources allocated to the core it is assigned to. The objective is to find a resource partition scheme and a job assignment algorithm such that the makespan is minimized. Our proposed algorithm improves the previous approximation ratio of 36 [7] to 8. Moreover, we studied a special case in which the processing time function is linear in the size of the resources.
Several interesting problems have arisen from this work. Our algorithm uses about one quarter of the total cores, and it is not clear whether using more cores would help. Both our algorithm and the algorithm proposed in [7] are two-level algorithms. For the linear model, a joint algorithm based on integer linear programming improved the approximation ratio; it remains open whether a joint algorithm can achieve the same for the general model. Our proposed algorithms are valid for identical cores, but it would be a challenge to design algorithms when the processing time of a job also depends on which core it is assigned to. Previous work [11] showed that the approximation ratio for our problem is smaller if dynamic
resource partitioning is allowed. So, it would be interesting to study in the future whether the result can be further improved by considering both dynamic resource partitioning and dynamic job assignment.

Acknowledgments
The authors thank the anonymous referees, the editor, and Joseph Timoney for their helpful comments, which improved the presentation of this paper. This research was supported in part by NSFC (11071215, 11201105).

References
[1] D. Hochbaum, D. Shmoys, A polynomial approximation scheme for scheduling on uniform processors: Using the dual approximation approach, SIAM J. Comput. 17 (3) (1988) 539–551.
[2] M.R. Garey, D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W.H. Freeman, San Francisco, 1979.
[3] J. Lenstra, D.B. Shmoys, E. Tardos, Approximation algorithms for scheduling unrelated parallel machines, Math. Program. 46 (1–3) (1990) 259–271.
[4] E.V. Shchepin, N. Vakhania, An optimal rounding gives a better approximation for scheduling unrelated machines, Oper. Res. Lett. 33 (2) (2005) 127–133.
[5] T. Liu, Y. Zhao, M. Li, C. Xue, Joint task assignment and cache partitioning with cache locking for WCET minimization on MPSoC, J. Parallel Distrib. Comput. 71 (11) (2011) 1473–1483.
[6] T. Liu, M. Li, C. Xue, Instruction cache locking for multi-task real-time embedded systems, Real-Time Syst. 48 (2) (2012) 166–197.
[7] A. Hassidim, H. Kaplan, O. Tuval, Joint cache partition and job assignment on multi-core processors, in: Proceedings of the 13th Algorithms and Data Structures Symposium, WADS, 2013, pp. 378–389.
[8] A. Grigoriev, M. Sviridenko, M. Uetz, Unrelated parallel machine scheduling with resource dependent processing times, in: Proceedings of the 11th International Conference on Integer Programming and Combinatorial Optimization, IPCO, 2005, pp. 182–195.
[9] V.A. Kumar, M.V. Marathe, Approximation algorithms for scheduling on multiple machines, in: Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science, FOCS, 2005, pp. 254–263.
[10] A. Grigoriev, M. Sviridenko, M. Uetz, LP rounding and an almost harmonic algorithm for scheduling with resource dependent processing times, in: Proceedings of the 9th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems, APPROX, 2006, pp. 140–151.
[11] A. Grigoriev, M. Sviridenko, M. Uetz, Machine scheduling with resource dependent processing times, Math. Program. 110 (1) (2007) 209–228.
[12] A. Grigoriev, M. Uetz, Scheduling jobs with time-resource tradeoff via nonlinear programming, Discrete Optim. 6 (4) (2009) 414–419.
[13] H. Kellerer, An approximation algorithm for identical parallel machine scheduling with resource dependent processing times, Oper. Res. Lett. 36 (2) (2008) 157–159.
[14] A. Grigoriev, M. Uetz, Scheduling parallel jobs with linear speedup, in: Proceedings of the 3rd International Conference on Approximation and Online Algorithms, WAOA, 2005, pp. 203–215.
[15] H. Kellerer, V.A. Strusevich, Scheduling parallel dedicated machines with the speeding-up resource, Nav. Res. Logist. 55 (5) (2008) 377–389.
[16] E. Edis, C. Oguz, Parallel machine scheduling with flexible resources, Comput. Ind. Eng. 63 (2) (2012) 433–447.
[17] A. López-Ortiz, A. Salinger, Paging for multi-core shared caches, in: Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, ITCS, 2012, pp. 113–127.
[18] A. Hassidim, Cache replacement policies for multicore processors, in: Proceedings of the 1st Symposium on Innovations in Computer Science, Tsinghua University Press, 2010.
[19] E. Feuerstein, A. Strejilevich de Loma, On-line multi-threaded paging, Algorithmica 32 (1) (2002) 36–60.