RUN: Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor
Paul Regnier†, George Lima†, Ernesto Massa†, Greg Levin‡, Scott Brandt‡
†Federal University of Bahia, Brazil ‡University of California, Santa Cruz
Multiprocessors
• Most high-end computers today have multiple processors
• In a busy computational environment, multiple processes compete for processor time
• More processors means more scheduling complexity
Real-Time Multiprocessor Scheduling
• Real-time tasks have workload deadlines
• Hard real-time = “Meet all deadlines!”
• Problem: Schedule a set of periodic, hard real-time tasks on a multiprocessor system so that all deadlines are met
Example: EDF on One Processor
• On a single processor, Earliest Deadline First (EDF) is optimal (it can schedule any feasible task set)
• Task 1: 2 units of work for every 10 units of time
• Task 2: 6 units of work for every 15 units of time
• Task 3: 10 units of work for every 25 units of time
[Figure: EDF schedule on one CPU, time = 0 to 30]
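The EDF rule above can be sketched as a discrete-time simulation. This is an illustrative sketch, not the paper's implementation: tasks are (period, workload) pairs, and each job's deadline is the end of its period.

```python
# Discrete-time EDF on one processor: illustrative sketch only.
# Each task is (period, workload); a job released at time t has
# deadline t + period, and EDF always runs the pending job with
# the earliest deadline.

def edf_schedule(tasks, horizon):
    """Return schedule[t] = index of the task run in slot t (or None if idle)."""
    remaining = [e for (p, e) in tasks]   # work left in each task's current job
    schedule = []
    for t in range(horizon):
        for i, (p, e) in enumerate(tasks):
            if t > 0 and t % p == 0:      # a new job is released
                remaining[i] = e
        pending = [i for i in range(len(tasks)) if remaining[i] > 0]
        if pending:
            # current job of task i has deadline (t // p + 1) * p
            i = min(pending, key=lambda i: (t // tasks[i][0] + 1) * tasks[i][0])
            remaining[i] -= 1
            schedule.append(i)
        else:
            schedule.append(None)
    return schedule

# The three tasks from the slide: (10, 2), (15, 6), (25, 10)
s = edf_schedule([(10, 2), (15, 6), (25, 10)], 30)
```

With total utilization exactly 1.0, the processor never idles in this window and each job completes within its period.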
Scheduling Three Tasks
Example: 2 processors; 3 tasks, each with 2 units of work required every 3 time units
[Figure: job releases and deadlines of Tasks 1–3, time = 0 to 3]
Global Schedule
Example: 2 processors; 3 tasks, each with 2 units of work required every 3 time units
• Task 1 migrates between processors
[Figure: schedule on CPU 1 and CPU 2, time = 0 to 3]
Taxonomy of Multiprocessor Scheduling Algorithms
[Diagram: uniprocessor algorithms (EDF, LLF); multiprocessor algorithms split into partitioned (Partitioned EDF), global / globalized uniprocessor (Global EDF), and optimal algorithms (pfair, LLREF, EKG, DP-Wrap)]
Problem Model
• n tasks running on m processors
• A periodic task T = (p, e) requires a workload e be completed within each period of length p
• T’s utilization u = e / p is the fraction of each period that the task must execute
[Figure: job releases at times 0, p, 2p, 3p; each job’s workload e is due by the end of its period]
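The utilization defined above can be computed with exact rationals, which keeps later sums of utilizations precise. A small helper for illustration, not from the paper:

```python
from fractions import Fraction

# A periodic task T = (p, e) has utilization u = e / p.
# Exact rationals avoid floating-point drift when utilizations
# are summed (e.g. for feasibility checks).
def utilization(p, e):
    return Fraction(e, p)

# The three tasks from the earlier EDF example sum to full
# utilization of one processor: 2/10 + 6/15 + 10/25 = 1.
total = sum(utilization(p, e) for (p, e) in [(10, 2), (15, 6), (25, 10)])
```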
Assumptions
• Processor Identity: All processors are equivalent
• Task Independence: Tasks are independent
• Task Unity: Tasks run on only one processor at a time
• Task Migration: Tasks may run on different processors at different times
• No Overhead: free context switches and migrations (in practice: built into WCET estimates)
The Big Goal (Version 1)
Design an optimal scheduling algorithm for periodic task sets on multiprocessors
• A task set is feasible if there exists a schedule that meets all deadlines
• A scheduling algorithm is optimal if it can always schedule any feasible task set
Necessary and Sufficient Conditions
Any set of tasks needing at most
• 1 processor for each task (ui ≤ 1 for all i), and
• m processors for all tasks (Σui ≤ m)
is feasible.
Status: Solved. pfair (1996) was the first optimal algorithm.
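This feasibility condition can be checked directly. A sketch, assuming tasks are given as (period, workload) pairs:

```python
from fractions import Fraction

def feasible(tasks, m):
    """Necessary and sufficient feasibility test for periodic tasks on
    m identical processors: every u_i <= 1 and the sum of all u_i <= m."""
    utils = [Fraction(e, p) for (p, e) in tasks]
    return all(u <= 1 for u in utils) and sum(utils) <= m

# Three tasks each needing 2 units of work every 3 time units
# (total utilization 2): feasible on 2 processors, not on 1.
two_ok = feasible([(3, 2), (3, 2), (3, 2)], 2)
one_ok = feasible([(3, 2), (3, 2), (3, 2)], 1)
```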
The Big Goal (Version 2)
Design an optimal scheduling algorithm with fewer context switches and migrations
(Finding a feasible schedule with the fewest migrations is NP-complete)
The Big Goal (Version 2) Design an optimal scheduling algorithm with fewer context switches and migrations Status: Ongoing… All existing improvements over pfair use some form of deadline partitioning…
Deadline Partitioning
[Figure: deadlines of Tasks 1–4 along a common timeline]
Deadline Partitioning
[Figure: the resulting schedule on CPU 1 and CPU 2]
The Big Goal (Version 2)
Design an optimal scheduling algorithm with fewer context switches and migrations
Status: Ongoing… All existing improvements over pfair use some form of deadline partitioning…
… but all the activity in each time window still leads to a large number of preemptions and migrations
Our Contribution: the first optimal algorithm which does not depend on deadline partitioning, and which therefore has much lower overhead
The RUN Scheduling Algorithm
• RUN uses a sequence of reduction operations to transform a multiprocessor problem into a collection of uniprocessor problems
• Uniprocessor scheduling is solved optimally by Earliest Deadline First (EDF)
• Uniprocessor schedules are transformed back into a single multiprocessor schedule
• A reduction operation is composed of two steps: packing and dual
The Packing Operation
A collection of tasks with total utilization at most one can be packed into a single fixed-utilization task:
• Task 1: u = 0.1
• Task 2: u = 0.3
• Task 3: u = 0.4
• Packed Task: u = 0.8
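One simple way to realize the packing step is first-fit bin packing into unit-capacity "servers". This is an illustrative choice; the paper's packing heuristic may differ.

```python
from fractions import Fraction

def pack(utils):
    """First-fit packing of utilizations into packed tasks ("servers"),
    each with total utilization at most 1. Illustrative sketch."""
    servers = []
    for u in utils:
        for s in servers:
            if sum(s) + u <= 1:   # fits in an existing server
                s.append(u)
                break
        else:
            servers.append([u])   # open a new server
    return servers

# The slide's example: u = 0.1, 0.3, 0.4 pack into one server with u = 0.8.
servers = pack([Fraction(1, 10), Fraction(3, 10), Fraction(2, 5)])
```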
Scheduling a Packed Task’s Clients
We divide time at the packed tasks’ deadlines and schedule them with EDF. In each segment, the packed task consumes processor time according to its clients’ total utilization.
[Figure: EDF schedule of Task 1 (p=5, u=.1), Task 2 (p=3, u=.3), Task 3 (p=2, u=.4)]
Defining a Packed Task
The packed task temporarily replaces (acts as a proxy for) its clients. It may not be periodic, but it has a fixed utilization in each segment.
[Figure: EDF schedule of Tasks 1–3 and the resulting packed task]
The Dual of a Task
The dual of a task T is a task T* with the same deadlines but complementary utilization/workload:
• T: u = 0.4, p = 3
• T*: u = 0.6, p = 3
The Dual of a System
• Given a system of n tasks and m processors, assume full utilization (Σui = m)
• The dual system consists of n dual tasks running on n − m processors
• Note that the dual utilizations sum to Σ(1 − ui) = n − Σui = n − m
• If n < 2m, the dual system has fewer processors
• The dual task T* represents the idle time of T: T* executes in the dual system precisely when T is idle, and vice versa
The Dual of a System
[Figure: Tasks 1–3 in the original system; their schedule in the dual system; and the two side by side, time = 0 to 3]
• Because each dual task represents the idle time of its primal task, scheduling the dual system is equivalent to scheduling the original system.
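The dual construction is just complementation of utilizations, and for a fully utilized system the dual utilizations necessarily sum to n − m. A minimal sketch:

```python
from fractions import Fraction

def dual(utils):
    """Dual system: each task with utilization u becomes a task with
    utilization 1 - u and the same deadlines."""
    return [1 - u for u in utils]

# Three tasks of utilization 2/3 fully load m = 2 processors;
# their duals sum to n - m = 3 - 2 = 1, i.e. one dual processor.
primal = [Fraction(2, 3)] * 3
duals = dual(primal)
```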
The RUN Algorithm
• Reduction = Packing + Dual
• Keep packing until the remaining tasks satisfy ui + uj > 1 for all tasks Ti, Tj
• Convert the schedule of the dual system into a schedule for the “primal” system
• Replace packed tasks in the schedule with their clients, ordered by EDF
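Putting the two steps together: one reduction packs, then dualizes, and reductions repeat until the total utilization fits on one virtual processor. This sketch uses in-order first-fit packing, which happens to reproduce the example on the following slides; the paper's packing rule may differ.

```python
from fractions import Fraction

def pack(utils):
    """In-order first-fit packing into servers of total utilization <= 1."""
    servers = []
    for u in utils:
        for s in servers:
            if sum(s) + u <= 1:
                s.append(u)
                break
        else:
            servers.append([u])
    return servers

def reduce_once(utils):
    """One RUN reduction: pack, then take the dual of each server."""
    return [1 - sum(s) for s in pack(utils)]

def reductions_to_uniprocessor(utils):
    """Count reductions until EDF on one virtual processor suffices."""
    levels = 0
    while sum(utils) > 1:
        utils = reduce_once(utils)
        levels += 1
    return levels, utils

# The example: u = .2, .6, .3, .4, .5 on 2 processors packs into servers
# of utilization .8, .7, .5, whose duals .2, .3, .5 fill one processor.
utils = [Fraction(n, 10) for n in (2, 6, 3, 4, 5)]
levels, duals = reductions_to_uniprocessor(utils)
```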
RUN: A Simple Example
• Step 1: Pack tasks until every pair has combined utilization > 1
• Task 1 (u=.2) and Task 2 (u=.6) pack into Task 12 (u=.8); Task 3 (u=.3), Task 4 (u=.4), and Task 5 (u=.5) remain
RUN: A Simple Example
• Step 1 (continued): Task 3 (u=.3) and Task 4 (u=.4) pack into Task 34 (u=.7)
• Remaining packed tasks: Task 12 (u=.8), Task 34 (u=.7), Task 5 (u=.5)
RUN: A Simple Example
• Step 2: Find the dual system on n − m = 3 − 2 = 1 processor
• Packed tasks: Task 12 (u=.8), Task 34 (u=.7), Task 5 (u=.5)
RUN: A Simple Example
• Step 2 (continued): the duals are Task 12* (u=.2), Task 34* (u=.3), Task 5* (u=.5)
RUN: A Simple Example
• Step 3: Schedule the uniprocessor dual with EDF
[Figure: EDF schedule of Task 12* (u=.2), Task 34* (u=.3), Task 5* (u=.5) on the dual CPU, time = 0 to 6]
RUN: A Simple Example
• Step 4: Schedule packed tasks from the dual schedule (each packed task runs exactly when its dual does not)
[Figure: CPU 1 and CPU 2 schedules derived from the dual CPU schedule, time = 0 to 6]
RUN: A Simple Example
• Step 5: Packed tasks schedule their clients with EDF
[Figure: CPU 1, CPU 2, and dual CPU schedules, time = 0 to 6]
RUN: A Simple Example
• The original task set has been successfully scheduled!
[Figure: final schedule of Tasks 1–5 (u = .2, .6, .3, .4, .5) on CPU 1 and CPU 2, time = 0 to 6]
RUN: A Few Details
• Several reductions may be necessary to produce a uniprocessor system
• Each reduction reduces the number of processors by at least half
• On random task sets with 100 processors and hundreds of tasks, fewer than 1 in 600 task sets required more than 2 reductions
• RUN requires Σui = m. When this is not the case, dummy tasks may be added during the first packing step to create a partially or entirely partitioned system
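Padding to full utilization can be sketched as adding idle "dummy" utilization until Σui = m. This helper is hypothetical; how the paper distributes the slack during the first packing step may differ.

```python
from fractions import Fraction

def pad_to_full(utils, m):
    """Add dummy (idle) tasks so total utilization equals m.
    Each dummy has utilization at most 1, as required of any task."""
    slack = m - sum(utils)
    padded = list(utils)
    while slack > 1:
        padded.append(Fraction(1))  # a dummy occupying a whole processor
        slack -= 1
    if slack > 0:
        padded.append(slack)        # remaining fractional slack
    return padded

# Two light tasks on 3 processors: slack 9/4 becomes dummies 1, 1, 1/4.
padded = pad_to_full([Fraction(1, 2), Fraction(1, 4)], 3)
```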
Proven Theoretical Performance
• RUN is optimal
• Reduction is done off-line, prior to execution, and takes O(n log n) time
• Each scheduler invocation is O(n), with total scheduling overhead O(jn log m) when j jobs are scheduled
• RUN suffers at most (3r+1)/2 preemptions per job on task sets requiring r reductions. Since r ≤ 2 for most task sets, this gives a theoretical upper bound of 4 preemptions per job.
• In practice, we never observed more than 3 preemptions per job on our randomly generated task sets.
Simulation and Comparison
• We evaluated RUN in side-by-side simulations with the existing optimal algorithms LLREF, DP-Wrap, and EKG
• Each data point is the average of 1000 task sets, generated uniformly at random with utilizations in the range [0.1, 0.99] and integer periods in the range [5, 100]
• Simulations were run for 1000 time units each
• Values shown are average migrations and preemptions per job
Comparison: Varying Processors
• Number of processors varies from m = 2 to 32, with 2m tasks and 100% utilization
Comparison: Varying Utilization
• Total utilization varies from 55% to 100%, with 24 tasks running on 16 processors
Ongoing Work
• Heuristic improvements to RUN
• Extend RUN to broader problems: sporadic arrivals, arbitrary deadlines, non-identical multiprocessors, etc.
• Develop general theory and new examples of non-deadline-partitioning algorithms
Summary: The RUN Algorithm
• is the first optimal algorithm without deadline partitioning
• outperforms existing optimal algorithms by a factor of 5 on large systems
• scales well to larger systems
• reduces gracefully to the very efficient Partitioned EDF on any task set that Partitioned EDF can schedule
Thanks for Listening Questions?