Power aware scheduling of preemptive jobs on multi-core processors

Rafał Różycki

Poznań University of Technology, Institute of Computing Science, ul. Piotrowo 2, 60-965 Poznań, Poland (Tel: 48-61-8790-790; e-mail: [email protected]).

Abstract: The problem of power aware scheduling of preemptive jobs on a multi-core processor is considered. The jobs may be performed at different processing rates using different amounts of power. The cores of a processor, possibly with different characteristics, are driven by a common power source with a limited capacity. The problem is to find both an assignment of jobs to cores and, simultaneously, a power allocation that together minimize the schedule length. We describe a general method for solving the problem and propose an approach for finding the solution in the case where the number of cores is equal to the number of jobs.

Keywords: deterministic scheduling problem, preemptive jobs, uniform machines

1. INTRODUCTION

In the past few years the idea of green computing, understood as the efficient utilization of computer resources, has attracted growing interest. Since energy is the main resource consumed during computation, appropriate power management is a basic technique for applying green computing principles. In this paper we consider the problem of scheduling computational jobs on a multi-core processor driven by a power source with a limited capacity. We assume modest processor architectures in which heterogeneous cores are devoted to performing specialized tasks. An example of such a processor is the IBM Cell processor (Riley et al. (2007)), which combines a general-purpose core of modest performance with streamlined coprocessing elements that greatly accelerate multimedia and vector processing applications, as well as many other forms of dedicated computation. We assume that the processing rate and the power used for processing are strongly related. This relation may be formalized using different models of job processing. Up to now, the power vs. processing speed model has been commonly used.
Various scheduling problems have been formulated on the assumption that resource (energy) consumption depends on the processing rate of a processor. Exact and heuristic algorithms have been proposed to solve both the deterministic and non-deterministic cases under this assumption (see the surveys by Bansal et al. (2005) and Irani and Pruhs (2005)). We use a different model of job processing, which is more general than the classical one. The considered problem may be classified in deterministic scheduling theory as a generalization of the classical scheduling problem with parallel uniform machines. A similar problem with non-preemptable jobs was studied in Józefowska et al. (2004).
In Section 2 the problem is formulated. Sections 3 and 4 describe the general solution approaches for different special cases of the problem. Section 5 summarizes the paper.

2. PROBLEM FORMULATION

Consider a set of n independent, preemptive jobs and m parallel machines. All jobs are ready to be processed at the same moment. Each job requires a machine and an amount of energy for its processing. We assume that energy is a renewable resource: only power is limited, and it is available in a constant amount over time. Each job is performed by at most one machine at a time, and a machine is able to process at most one job at a time. We use a general model in which the temporal processing rate of a job depends on both the amount of power and the machine allotted to this job at a given moment. Since power is a continuously divisible resource, this relation is expressed by the processing speed function $s_{ij}(\cdot)$, i = 1, 2, ..., n, j = 1, 2, ..., m, in the following way:

$$\dot{x}_{ij}(t) = \frac{dx_{ij}(t)}{dt} = y_{ij}(t) \cdot s_{ij}(p_i(t)), \quad x_{ij}(0) = 0 \tag{1}$$

where
• $y_{ij}(t)$ is a binary variable indicating whether machine j is assigned to job i at time t:
$$y_{ij}(t) = \begin{cases} 1 & \text{if machine } j \text{ is assigned to job } i \text{ at time } t \\ 0 & \text{otherwise} \end{cases}$$
• $x_{ij}(t)$ is the state of job i on machine j at time t ($x_{ij}(0) = 0$, i = 1, 2, ..., n, j = 1, 2, ..., m),
• $s_{ij}(\cdot)$ is an increasing (positive), continuous speed function of job i on machine j, with $s_{ij}(0) = 0$,
• $p_i(t)$ is the amount of power allotted to job i at time t.
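As an illustration of model (1), the following minimal sketch integrates the state of a single job over a piecewise-constant schedule. It is not part of the original formulation: the names `job_state`, `speed_functions` and `schedule`, the cube-root speed functions, and the restriction of power to the interval [0, 1] are all assumptions made for the example. Here `speed[j]` plays the role of $s_{ij}$ for the fixed job i.

```python
from typing import Callable, Optional, Sequence, Tuple

# One segment of a job's schedule: (duration, machine index or None, power).
# machine=None models an interval in which the job is preempted (y_ij = 0 for all j).
Segment = Tuple[float, Optional[int], float]


def job_state(segments: Sequence[Segment],
              speed: Sequence[Callable[[float], float]]) -> float:
    """Integrate model (1) for one job over piecewise-constant segments.

    With constant power p on machine j during an interval of length d,
    the state grows by d * s_j(p), so the integral reduces to a sum.
    """
    work = 0.0
    for duration, machine, power in segments:
        # Assumption of this sketch: total power is normalized to one.
        if duration < 0 or not 0.0 <= power <= 1.0:
            raise ValueError("durations must be >= 0 and power must lie in [0, 1]")
        if machine is not None:
            work += duration * speed[machine](power)
    return work


# Hypothetical speed functions of the form a * p**(1/3), one per machine.
speed_functions = [lambda p: 2.0 * p ** (1.0 / 3.0),
                   lambda p: 1.0 * p ** (1.0 / 3.0)]

# The job runs on machine 0, is preempted, and then resumes on machine 1.
schedule = [(1.0, 0, 0.125), (0.5, None, 0.0), (2.0, 1, 1.0)]

if __name__ == "__main__":
    print(job_state(schedule, speed_functions))
```

The sketch relies on the power being constant within each segment; for a time-varying $p_i(t)$ the sum would have to be replaced by numerical integration.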
We assume that the processing speed function $s_{ij}(\cdot)$ of job i on machine j in model (1) is strictly concave. Such an assumption is suitable for expressing the real relation between temporal power usage and processing rate in contemporary microprocessor systems. The value of $x_{ij}(t)$ is an objective measure of the work related to the processing of a part of job i on machine j. This can be, for example, the number of CPU cycles already consumed in executing part of a computer program on a particular core of a multi-core processor. The state $x_i(t)$ of job i at time t is given as the sum of the states of this job on all machines at time t:

$$x_i(t) = \sum_{j=1}^{m} x_{ij}(t)$$

Each job is characterized by its processing demand $w_i$ ($w_i \ge 0$). The processing demand $w_i$ of job i is accomplished if the following condition is satisfied:

$$w_i = \sum_{j=1}^{m} \int_{0}^{C_i} y_{ij}(t)\, s_{ij}[p_i(t)]\, dt \tag{2}$$

where $C_i$ denotes the completion time of job i. The processing demand $w_i$ can be measured as a required number of CPU cycles. Without loss of generality we can assume that the total available power is equal to one ($p_i(t) \in [0,1]$, i = 1, 2, ..., n), so that in a feasible schedule:

$$\sum_{i=1}^{n} p_i(t) \le 1. \tag{3}$$

The objective is to find the vector function $p(t) = [p_1(t), p_2(t), \dots, p_n(t)]$, $p_i(t) \ge 0$, i = 1, 2, ..., n, which respects the limited amount of power as well as the restricted number of available machines, and minimizes the schedule length $T = \max_{i=1,\dots,n} \{C_i\}$.

Optimal values of $p(t)$ and $T$ will be denoted by $p^*(t)$ and $T^*$, respectively. It is important to notice that the above model of job processing is general enough to cover practical situations in which the allocation of power to a job is allowed to fluctuate during its processing. In particular, the model makes it possible to consider preemption of jobs. Nevertheless, in this paper we consider a special case of the problem in which the speed function does not depend on the job. In this case machine j is characterized by the machine speed function $s_j(\cdot)$, j = 1, 2, ..., m. Let us denote by $J_j$ the set of jobs performed on machine j. Thus the processing rate of each job performed on machine j (j = 1, 2, ..., m) is defined by the same machine speed function $s_j(\cdot)$ ($s_j(\cdot) = s_{ij}(\cdot)$, $i \in J_j$). This special case of the general problem is called power aware scheduling on parallel uniform machines.

3. BASIC TIME OPTIMAL PROPERTIES FOR n ≤ m

Assume now that the number of jobs n is less than or equal to the number m of machines (n ≤ m). Notice that even in this simple situation the allocation of power to a job depends on the speed function of the machine to which the job is assigned. In other words, the allocation of power is related to the assignment of jobs to machines. In consequence, even for n ≤ m the problem has a mixed discrete-continuous nature and can be divided into two strongly interrelated subproblems: i) the problem of assigning jobs to machines (discrete resource allocation), and simultaneously ii) the problem of allocating power to the jobs already assigned.

3.1 Assignment of jobs to machines

Let us denote by $A_k$ the m-element vector describing the k-th (k = 1, 2, ..., $K_A$) possible assignment of jobs to machines. The value of $A_k[j]$ ($0 \le A_k[j] \le n$) denotes the index of the job assigned to machine j in $A_k$. The j-th element of $A_k$ is equal to zero when machine j is idle. The assignment $A_k$ of jobs to machines is called potentially optimal if each machine performs at most one job and each job is assigned to a single machine ($A_k[j] \ne A_k[j']$ for each $j \ne j'$). The total number $K_A$ of such potentially optimal assignments grows exponentially with n and m. It equals

$$K_A = \binom{m}{m-n} \cdot n! \quad \text{for } n \le m.$$

For example, the set A of all potentially optimal assignments for m = 3, n = 2 is as follows:

$$A = \left\{ \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix} \right\}.$$
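The set of potentially optimal assignments $A_k$ described in Section 3.1 can be enumerated mechanically. The sketch below is one possible way to do so and is not taken from the paper; the function names `potentially_optimal_assignments` and `count_assignments` are hypothetical. Each assignment is a tuple of length m in which entry j holds the index of the job on machine j, or 0 if that machine is idle.

```python
from itertools import permutations
from math import comb, factorial
from typing import List, Tuple


def potentially_optimal_assignments(n: int, m: int) -> List[Tuple[int, ...]]:
    """List all vectors A_k for n <= m (jobs are numbered 1..n, 0 means idle)."""
    if not 0 <= n <= m:
        raise ValueError("this enumeration assumes 0 <= n <= m")
    assignments = []
    # Choose, in order, the machine that receives job 1, job 2, ..., job n.
    for machines in permutations(range(m), n):
        a = [0] * m
        for job, machine in enumerate(machines, start=1):
            a[machine] = job
        assignments.append(tuple(a))
    return assignments


def count_assignments(n: int, m: int) -> int:
    """K_A = C(m, m - n) * n!, the count stated in Section 3.1."""
    return comb(m, m - n) * factorial(n)


if __name__ == "__main__":
    for a in potentially_optimal_assignments(2, 3):
        print(a)
```

For m = 3 and n = 2 the enumeration yields the same six vectors as the example set A, in a different order.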
3.2 Optimal power allocation

In this section we consider only the problem of optimal power allocation; thus we assume that the assignment of jobs to machines is already known. The problem for machine-independent speed functions has been studied in a number of papers (see Józefowska (1997) for a survey). It is worth recalling that if the processing rates of jobs are machine independent, the problem with n ≤ m has a purely continuous nature, since the machines are indistinguishable. Below we recall some basic properties of time-optimal power allocations which are useful for our subsequent considerations. Following Józefowska and Węglarz (1998), let us denote by P the set of feasible power allocations, i.e. the set of all points $p = (p_1, p_2, \dots, p_n) \in R^n$, $p_i \ge 0$, i = 1, 2, ..., n, satisfying relation (3). Further, we denote by $V_k$, k = 1, 2, ..., $\binom{m}{m-n} \cdot n!$, the set defined as follows:

$$v^k = (v_{1k}, v_{2k}, \dots, v_{nk}) \in V_k \ \text{ if and only if } \ p \in P \ \text{ and } \ v_{ik} = s_{j(i,k)}(p_i), \quad i = 1, 2, \dots, n \tag{4}$$

where $s_{j(i,k)}(\cdot)$ is the speed function of the machine to which job i is assigned in $A_k$. Points $v^k$ are called $A_k$-transformed power allocations.

According to (4), the shapes of the sets $V_k$ and $coV_k$ (the convex hull of $V_k$) are related to the characteristics of the functions $s_j(\cdot)$, j = 1, 2, ..., m, and to the given potentially optimal assignment $A_k$ of jobs to machines.

Fig. 1. The example sets of: a) the feasible power allocations P, b) the $A_1$-transformed power allocations $V_1$, c) the $A_2$-transformed power allocations $V_2$.

Fig. 2. The example set of all ($V_1 \cup V_2$) possible transformed power allocations for n = m = 2.

Węglarz (1981) proved that in the considered case the optimal power allocation for a potentially optimal assignment $A_k$ is always reached at the intersection point of the straight line given by the parametric equations:

$$v_i = w_i / T, \quad i = 1, 2, \dots, n \tag{5}$$

and the boundary of the convex hull of $V_k$.

Corollary 1

The optimal power allocation for strictly concave processing speed functions has the following form:

$$p_i^*(t) = p_i^* = s_{j(i)}^{-1}(w_i / T'), \quad i = 1, \dots, n, \quad t \in \langle 0, T' \rangle$$

where $T'$ is the positive root of the equation:

$$\sum_{i=1}^{n} s_{j(i)}^{-1}(w_i / T) = 1 \tag{6}$$

Corollary 1 is a consequence of the fact that for concave processing speed functions $s_j(\cdot)$, j = 1, 2, ..., m, the set $V_k$ is always equal to $coV_k$, and thus jobs are processed fully in parallel using constant power amounts $p_i^*$, i = 1, 2, ..., n.

Moreover, the optimal vector of power allocation $p^*$ calculated using (6) guarantees that all jobs are finished at the same moment. In general, the minimal T for a given feasible assignment can be found from (6) numerically. Nevertheless, in some practically important cases, where the functions have the form

$$s_j(p) = a_j\, p^{1/\alpha_j}, \quad a_j > 0, \quad \alpha_j \in \{1, 2, 3, 4\}, \quad j = 1, 2, \dots, m, \tag{7}$$

the solution of equation (6) may be found analytically. In particular, it is commonly assumed that for CMOS microprocessor technology $\alpha_j = \alpha = 3$ in processing speed functions of the form (7).

In consequence, in the general case, the optimal power allocation for a given $A_k$ and strictly concave speed functions is calculated as:

$$p_i^* = s_{j(i,k)}^{-1}(w_i / T), \quad i = 1, 2, \dots, n. \tag{8}$$
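To make equations (6) to (8) concrete, the sketch below solves (6) by bisection for speed functions of the form (7) and then evaluates the power allocation (8). Bisection is only one possible numerical method and is chosen here for simplicity; the function names and the list-based interface are assumptions of this example. For a common exponent α, substituting $s^{-1}(v) = (v/a)^{\alpha}$ into (6) gives the closed form $T' = \left( \sum_i (w_i / a_{j(i)})^{\alpha} \right)^{1/\alpha}$, which the sketch also provides for comparison.

```python
from typing import List, Sequence


def used_power(T: float, w: Sequence[float], a: Sequence[float],
               alpha: Sequence[float]) -> float:
    """Left-hand side of (6) for s(p) = a * p**(1/alpha), i.e. s^-1(v) = (v/a)**alpha."""
    return sum((wi / (ai * T)) ** al for wi, ai, al in zip(w, a, alpha))


def schedule_length(w: Sequence[float], a: Sequence[float],
                    alpha: Sequence[float], tol: float = 1e-12) -> float:
    """Positive root T' of (6), found by bisection (used_power decreases in T)."""
    if any(wi < 0 for wi in w) or any(ai <= 0 for ai in a):
        raise ValueError("demands must be >= 0 and coefficients a must be > 0")
    if max(w) == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    while used_power(hi, w, a, alpha) > 1.0:   # grow hi until the power limit holds
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if used_power(mid, w, a, alpha) > 1.0:
            lo = mid                            # too short: power limit violated
        else:
            hi = mid
    return hi


def power_allocation(w: Sequence[float], a: Sequence[float],
                     alpha: Sequence[float], T: float) -> List[float]:
    """Constant power amounts p_i* from (8) for a given schedule length T."""
    return [(wi / (ai * T)) ** al for wi, ai, al in zip(w, a, alpha)]


def closed_form_length(w: Sequence[float], a: Sequence[float], alpha: float) -> float:
    """Analytic root of (6) when all machines share the same exponent alpha."""
    return sum((wi / ai) ** alpha for wi, ai in zip(w, a)) ** (1.0 / alpha)


if __name__ == "__main__":
    w, a, alpha = [2.0, 1.0], [1.0, 1.0], [3, 3]   # illustrative data only
    T = schedule_length(w, a, alpha)
    print(T, closed_form_length(w, a, 3), power_allocation(w, a, alpha, T))
```

In this sketch `a[i]` and `alpha[i]` are the parameters of the machine to which job i is assigned, so the assignment $A_k$ is encoded implicitly by the order of the lists.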
3.3 Solving the n ≤ m case

In order to find an optimal solution to the problem, an optimal power allocation has to be calculated for jobs that are optimally assigned to machines. In general, such an optimal assignment of jobs to machines is unknown. Nevertheless, one can solve the problem using the concept of demand division. Demand division allows the size of a preemptive job to be partitioned among its different assignments to the machines. Then the following convex mathematical programming problem can be formulated and solved in order to find an optimal solution:

minimize

$$T = \sum_{k=1}^{K_A} M_k^*(w_{A_k[1],k}, w_{A_k[2],k}, \dots, w_{A_k[m],k}) \tag{9}$$

subject to

$$\sum_{k \in K_i} w_{ik} = w_i, \quad i = 1, \dots, n, \tag{10}$$

$$w_{ik} \ge 0, \quad i = 1, \dots, n, \quad k \in K_i \tag{11}$$

where $M_k^*$ is the unique positive root of equation (12), and $K_A$ is the number of all potentially optimal assignments ($K_A = \binom{m}{m-n} \cdot n!$):

$$\sum_{j = 1, \dots, m:\ A_k[j] \ne 0} s_j^{-1}(w_{A_k[j],k} / M_k) = 1 \tag{12}$$

where $w_{A_k[j],k}$ is the part of the processing demand of the job performed on machine j in the k-th potentially optimal assignment. Since the number of variables of the mathematical programming problem (9)-(11) grows exponentially with the number of jobs, the above approach is computationally intractable even for small problem instances. Thus, it is advisable to identify special cases of the problem which can be solved more efficiently.

Let us start with the simplest situation, where n = m. It is reasonable to focus first on the shape of the set $V = \bigcup V_k$. Recall that the processing rate of a job performed on a machine does not depend on the job; it is related to the index of the assigned machine as well as to the allocated amount of power. In consequence, each transformed resource allocation $v^k$ (the processing rates of jobs) for a potentially optimal assignment $A_k$ is related to the set of other transformed resource allocations $\{v^1, \dots, v^{k-1}, v^{k+1}, \dots, v^{n!}\}$, which differ only in the assignment of jobs to machines: the allocation of power to the machines (and consequently the vector of processing rates of the machines) is exactly the same. This leads to the observation that the elements of the sets $V_1, V_2, \dots, V_K$ are permutations of the values related to particular power allocations. Therefore, one can conclude that the set $V = \bigcup V_k$ is always symmetric with respect to the straight line linking the points (0, 0, 0, ..., 0) and (1, 1, 1, ..., 1) in n-dimensional vector space.

Let us analyze the example from Figure 3. In particular, for n = m = 2, for each point $(v_{11}, v_{21}) \in V$ (representing e.g. the processing rates of jobs for a potentially optimal assignment $A_1$ in which machine 1 performs job 1 and machine 2 executes job 2), there exists a point $(v_{12}, v_{22}) = (v_{21}, v_{11}) \in V$ (representing the processing rates of jobs for an assignment $A_2$ in which machine 2 performs job 1 and machine 1 executes job 2). Thus, the convex hull of the set $V = V_1 \cup V_2$ contains the line segment between points $v^1$ and $v^2$. The optimal processing rates represented by point $v^x$ may be derived from two different assignments of jobs to machines. Two different assignments lead to a situation in which a job may be executed on two different machines in the schedule. Of course, for preemptable jobs this is possible, provided that a job is performed on at most one machine at a time.

In consequence, as we have already noticed, the optimal processing rates represented by point $v^x$ must be derived as a convex combination of the transformed resource allocations $v^1$ and $v^2$.

Fig. 3. The optimal processing rates (point $v^x$) of jobs for the instance of the problem (n = m = 2).

The convex hull coV of set V ($V = \bigcup V_k$) for n = m contains a polygon P with the following property.

Corollary 2

For n = m and concave functions $s_j(\cdot)$, j = 1, 2, ..., m, the convex hull coV of set V contains a polygon P spanned on n! vertices, where vertex k (k = 1, 2, ..., n!) represents the maximum sum of the processing rates of n jobs in the $A_k$-transformed resource allocation.

Proof. The proof is based on a geometric interpretation of the points related to the maximum sum of the processing rates of n jobs. Assume that the points $v^1, v^2, \dots, v^{n!}$ related to the maximum sum of the processing rates are known and that the straight line (5) intersects the polygon P at point $v^x$. Then point $v^x$ can be expressed as a convex combination of points $v^1, v^2, \dots, v^{n!}$:

$$v^x = \sum_{k=1}^{n!} \lambda_k v^k, \quad \text{where } \sum_{k=1}^{n!} \lambda_k = 1, \quad \lambda_k \ge 0, \quad k = 1, 2, \dots, n! \tag{13}$$

Since there exists $\mu > 0$ such that $w = \mu \cdot v^x$, then:

$$w = \mu \cdot \sum_{k=1}^{n!} \lambda_k v^k$$

If there exist $\lambda_k \ge 0$, k = 1, 2, ..., n!, such that (13) is fulfilled, the interpretation of the above equation is the following: $\mu \cdot \lambda_k$ is the length of the part of the schedule ($M_k^*$), whereas $v^k$ is the vector of processing rates of machines related to the assignment $A_k$ in the optimal schedule. Thus, the minimal makespan $T^*$ can be calculated as:

$$T^* = \sum_{k=1}^{n!} M_k^* = \mu \cdot \sum_{k=1}^{n!} \lambda_k^*.$$

It is easy to observe that in order to find the optimal $M_k^*$, k = 1, 2, ..., n!, one can solve the following system of n linear equations:

$$w = \sum_{k=1}^{n!} M_k v^k \tag{14}$$
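For n = m = 2, system (14) has two equations and two unknowns, so it can be solved directly. The sketch below does this with Cramer's rule. It assumes that the machine processing rates at the vertex, here called `r1` and `r2`, are already known, so that $v^1 = (r_1, r_2)$ and $v^2 = (r_2, r_1)$; these names, the function `solve_two_jobs`, and the numbers used are illustrative and do not come from the paper.

```python
from typing import Optional, Tuple


def solve_two_jobs(w: Tuple[float, float], r: Tuple[float, float],
                   eps: float = 1e-12) -> Optional[Tuple[float, float]]:
    """Solve w = M1 * v1 + M2 * v2 with v1 = (r1, r2) and v2 = (r2, r1).

    Returns (M1, M2), the lengths of the schedule parts that use assignments
    A_1 and A_2, or None when no solution with M1, M2 >= 0 exists.
    """
    w1, w2 = w
    r1, r2 = r
    det = r1 * r1 - r2 * r2          # determinant of [[r1, r2], [r2, r1]]
    if abs(det) < eps:
        raise ValueError("v1 and v2 coincide, so system (14) is singular")
    m1 = (w1 * r1 - w2 * r2) / det   # Cramer's rule
    m2 = (w2 * r1 - w1 * r2) / det
    if m1 < -eps or m2 < -eps:
        return None                  # w is not a non-negative combination of v1, v2
    return max(m1, 0.0), max(m2, 0.0)


if __name__ == "__main__":
    parts = solve_two_jobs(w=(5.0, 3.0), r=(3.0, 1.0))
    if parts is not None:
        print(parts, "makespan:", sum(parts))
```

When a solution exists, the makespan $M_1 + M_2$ equals $(w_1 + w_2) / (r_1 + r_2)$, which follows from adding the two equations of the system.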
The common primal simplex procedure can be applied to determine whether there exist $M_k \ge 0$, k = 1, 2, ..., n!, satisfying (14) for the given instance of the problem. Moreover, this effective procedure directly calculates the values of $M_k$, k = 1, 2, ..., n!.

4. PROPERTIES OF OPTIMAL SOLUTIONS FOR n > m

It has been proved that for concave functions $s_j(\cdot)$, j = 1, 2, ..., m, after the start of the schedule exactly m jobs are processed fully in parallel at each moment, and the last m jobs are completed at the same time in the makespan-optimal solution. No machine may be idle in a potentially optimal assignment. Therefore, each potentially optimal assignment $Z_k$ for n > m is represented by an m-permutation of job indices. The number $K_Z$ of all m-permutations of n job indices is equal to $\binom{n}{m} \cdot m!$. For example, the set Z of all potentially optimal assignments for m = 2, n = 3 is as follows:

$$Z = \left\{ \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 3 \end{bmatrix}, \begin{bmatrix} 3 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 3 \end{bmatrix}, \begin{bmatrix} 3 \\ 2 \end{bmatrix} \right\}.$$

A general solution method for the case n > m is based on the assumption that every feasible assignment of jobs to m machines may appear in the optimal schedule of the problem with preemptive jobs. It is worth noting that for each m-permutation $Z_k$, k = 1, 2, ..., $K_Z$, the optimal allocation of power among machines depends on the amounts $w_{ik}$ of each $w_i$ assigned to this m-permutation and is constant in time. Thus for a given feasible sequence S, an optimal power allocation may be found by solving the following non-linear (convex) mathematical programming problem:

minimize

$$T = \sum_{k=1}^{K_Z} M_k^*(w_{Z_k[1],k}, w_{Z_k[2],k}, \dots, w_{Z_k[m],k}) \tag{15}$$

subject to

$$\sum_{k \in K_i} w_{ik} = w_i, \quad i = 1, \dots, n, \tag{16}$$

$$w_{ik} \ge 0, \quad i = 1, \dots, n, \quad k \in K_i \tag{17}$$

where $M_k^*$ is the unique positive root of the equation:

$$\sum_{j=1}^{m} s_j^{-1}(w_{Z_k[j],k} / M_k) = 1 \tag{18}$$
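The objective (15) can be evaluated for any given demand division once the roots $M_k$ of (18) are available. The sketch below assumes speed functions of the form (7) with a common exponent α, for which (18) has the closed-form root $M_k = \left( \sum_j (w_{Z_k[j],k} / a_j)^{\alpha} \right)^{1/\alpha}$. It only evaluates a candidate division and does not optimize it; the names `part_length` and `evaluate_division` and the data layout are hypothetical.

```python
from typing import Dict, Sequence, Tuple

# One term of (15): the m-permutation Z_k and the demand parts w_{Z_k[j],k}
# processed on machines j = 1..m while Z_k is active.
Term = Tuple[Tuple[int, ...], Tuple[float, ...]]


def part_length(parts: Sequence[float], a: Sequence[float], alpha: float) -> float:
    """Closed-form root M_k of (18) for s_j(p) = a_j * p**(1/alpha)."""
    return sum((wp / aj) ** alpha for wp, aj in zip(parts, a)) ** (1.0 / alpha)


def evaluate_division(terms: Sequence[Term], a: Sequence[float],
                      alpha: float) -> Tuple[float, Dict[int, float]]:
    """Return (T, processed): the value of (15) and the demand covered per job.

    processed[i] is the sum of w_ik over the listed terms, i.e. the
    left-hand side of constraint (16) for job i.
    """
    total = 0.0
    processed: Dict[int, float] = {}
    for assignment, parts in terms:
        if len(assignment) != len(a) or len(parts) != len(a):
            raise ValueError("each term needs one job and one demand part per machine")
        if len(set(assignment)) != len(assignment):
            raise ValueError("a job may occupy at most one machine at a time")
        total += part_length(parts, a, alpha)
        for job, wp in zip(assignment, parts):
            processed[job] = processed.get(job, 0.0) + wp
    return total, processed


if __name__ == "__main__":
    # Illustrative instance: m = 2 machines, n = 3 jobs, alpha = 2.
    terms = [((1, 2), (3.0, 4.0)), ((3, 1), (6.0, 8.0))]
    print(evaluate_division(terms, a=(1.0, 1.0), alpha=2))
```

A solver for (15)-(17) would search over the demand parts; comparing `processed` with the demands $w_i$ is a simple way to check constraint (16) for a candidate division.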
Recall that the mathematical programming problem (15)-(17) can be solved numerically to a given accuracy using available methods for convex mathematical programming problems. In some cases the optimal power allocation can even be found analytically. Since the number of variables of the mathematical programming problems (9)-(11) and (15)-(17) grows exponentially with the number of jobs, the above approach is computationally intractable even for small problem instances. Thus, it is justified to implement a heuristic approach to find suboptimal solutions of practical problems in a relatively short time.

5. SUMMARY

We propose a general approach for solving various cases of the considered problem. Although the methods are computationally inefficient, they may be the only way of finding optimal solutions, even for small instances of the problem. The optimal solutions obtained by the methods described in the paper may then be used as a reference base in the experimental analysis of heuristic approaches.
Acknowledgments. The research was partially supported by a grant from the State Committee for Scientific Research, Poland.

REFERENCES

Bansal N., Kimbrel T., Pruhs K. (2005). Dynamic Speed Scaling to Manage Energy and Temperature, Lecture Notes in Computer Science, 3404, 460-471.
Irani S., Pruhs K. (2005). Algorithmic Problems in Power Management, ACM SIGACT News, 36/2, 63-76.
Józefowska J. (1997). Dyskretno-ciągłe problemy szeregowania zadań, Wydawnictwo Politechniki Poznańskiej, seria Rozprawy Nr 318, Poznań (in Polish).
Józefowska J., Różycki R., Węglarz J. (2004). Scheduling non-preemptable jobs on resource driven uniform parallel machines. Materials of 10th IEEE Intern. Conference MMAR'04, 1221-1225.
Józefowska J., Węglarz J. (1998). On a methodology for discrete-continuous scheduling problems. European Journal of Operational Research, 107/2, 338-353.
Riley M.W., Warnock J.D., Wendel D.F. (2007). Cell Broadband Engine processor: Design and implementation, IBM Journal of Research and Development, 51(5).
Węglarz J. (1981). Project scheduling with continuously divisible, doubly constrained resources, Management Sci., 27, 1040-1052.