Multi-core Real-Time Scheduling for Generalized Parallel Task Models
Abusayeed Saifullah, Kunal Agrawal, Chenyang Lu, Christopher Gill
Real-Time Systems on Multi-core
• Traditional multiprocessor scheduling
  • Focuses on inter-task parallelism
  • Mostly restricted to sequential task models
• Computation-intensive, complex real-time tasks are becoming more common
  • Video surveillance
  • Radar tracking
  • Hybrid real-time structural testing
• Multi-core processors provide an opportunity to schedule computation-intensive tasks in real time
  • Most of these tasks exhibit intra-task parallelism
• Real-time systems need to be developed to exploit intra-task parallelism
Parallel Task Model
• Synchronous task model
  • Parallel threads form a segment
  • Threads of each segment synchronize at the end of the segment
[Figure: a task with five segments (Segment 1 to Segment 5). Each horizontal bar indicates a thread of execution (a sequence of instructions); a marker shows where the threads of Segment 1 synchronize.]
• Lakshmanan et al. (RTSS ’10) have addressed a restricted synchronous model where
  • A task is an alternating sequence of parallel and sequential segments
  • All parallel segments have an equal number of threads
  • The total number of threads in each segment ≤ number of cores
Our Contributions
• We address a general synchronous parallel task model
  • Different segments may have different numbers of threads
  • Each segment can have an arbitrary number of threads
• Example: such tasks are generated by
  • Parallel for loops in OpenMP, CilkPlus
  • Barrier primitives in thread libraries
• This model is more portable
  • The same program can execute on machines with different numbers of cores
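A Task Example
    /* parallel_for is pseudocode for a parallel loop construct
       (for example, those of OpenMP or CilkPlus). */
    void parallel_task(float *a, float *b, float *c, float *d)
    {
        int n = 7;
        int i = 0;
        /* First segment: 7 parallel iterations. */
        parallel_for (; i < n; i++)
            c[i] = a[i] + b[i];

        n = 4;
        i = 0;
        /* Second segment: 4 parallel iterations. */
        parallel_for (; i < n; i++)
            d[i] = a[i] - b[i];
    }
[Figure: the resulting task drawn from start to end.]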
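One way the pseudocode above might be written with a real parallel-loop construct is sketched here with OpenMP, which the deck names as one source of such tasks. This is an illustration under that assumption, not the authors' code; the loop bounds 7 and 4 are taken from the slide, and the null-pointer check is an addition.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch only: OpenMP stands in for the slide's parallel_for pseudocode.
   Build with an OpenMP flag (for example -fopenmp); without it the pragmas
   are ignored and the loops run sequentially with the same result. */
void parallel_task(const float *a, const float *b, float *c, float *d)
{
    assert(a != NULL && b != NULL && c != NULL && d != NULL);

    /* First segment: 7 independent iterations. The implicit barrier at the
       end of the loop is the segment's synchronization point. */
    #pragma omp parallel for
    for (int i = 0; i < 7; i++)
        c[i] = a[i] + b[i];

    /* Second segment: 4 independent iterations, started only after every
       iteration of the first segment has finished. */
    #pragma omp parallel for
    for (int i = 0; i < 4; i++)
        d[i] = a[i] - b[i];
}
```

Each loop corresponds to one segment of the synchronous model: its iterations are the segment's threads, and the barrier that ends the loop is where they synchronize.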
Our Contributions (contd.)
• We propose a task decomposition for the general synchronous parallel task model
  • Decomposes each parallel task into a set of sequential subtasks
  • Subtasks are scheduled like traditional tasks
• Why decomposition?
  • We can exploit the rich literature of multiprocessor scheduling
  • The proposed decomposition ensures that if the decomposed tasks are schedulable, the original task set is also schedulable
Our Contributions (contd.)
• We analyze schedulability in terms of a processor speed augmentation bound
  • Speed augmentation bound ν for an algorithm A: if an optimal algorithm can schedule a synchronous parallel task set on unit-speed processor cores, then A can schedule the decomposed tasks on ν-speed processor cores
• We prove that the proposed decomposition requires a speed augmentation of at most
  • 4 for Global Earliest Deadline First (G-EDF) scheduling
  • 5 for Partitioned Deadline Monotonic (P-DM) scheduling
Overview of a Task Decomposition
• Each thread of the task becomes an individual task with
  • An intermediate subdeadline
  • A release offset to retain precedence relations in the original task
• Deadlines are assigned by distributing slack among segments
  • Deadline of a thread = execution requirement + assigned slack
Slack Distribution
• How much slack a segment demands depends on
  • Available slack of the task
  • Execution requirement of the segment
• Execution requirement of a segment is the product of
  • Total number of parallel threads in the segment and
  • Execution requirement of each thread in the segment
• Larger execution requirement implies more demand for slack
  • In the figure, Segment 1 requires more slack than Segment 2
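The two quantities named above can be sketched as follows. The `segment_t` type and both function names are hypothetical, introduced only for illustration. The sketch also assumes that the task's available slack is its deadline minus the sum of the per-thread execution requirements of its segments, which is what the relation "deadline of a thread = execution requirement + assigned slack" implies once the subdeadlines add up to the task deadline.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical representation of one segment of a synchronous parallel task. */
typedef struct {
    int threads;   /* number of parallel threads in the segment */
    double exec;   /* execution requirement of each thread */
} segment_t;

/* Execution requirement of a segment: thread count times the
   execution requirement of each thread. */
double segment_work(segment_t s)
{
    return s.threads * s.exec;
}

/* Available slack of the task (assumption: task deadline minus the sum of
   the per-thread execution requirements of all segments). */
double task_slack(const segment_t *segs, size_t n, double deadline)
{
    assert(segs != NULL);
    double path = 0.0;
    for (size_t i = 0; i < n; i++)
        path += segs[i].exec;
    return deadline - path;
}
```

A segment with a larger `segment_work` value is the one that demands more of the slack returned by `task_slack`.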
Slack Distribution (contd.)
• We use the following principle to distribute slack
  • All segments that receive slack will achieve an equal density
• Reasons to equalize the density among segments
  • Fairness: the deadline of each segment becomes proportional to its execution requirement
  • We can bound the density of the decomposed tasks
  • We can exploit existing density-based analyses for multiprocessor scheduling
Slack Distribution (contd.)
• Slack of each segment is determined by solving the equalities
  • Sum of subdeadlines = task deadline (total assigned slack = task slack)
  • Density of Segment 1 = density of Segment 2 = and so on
• All threads in a segment have the same deadline and offset
  • Deadline = execution requirement of the thread + segment slack
  • Release offset = sum of deadlines of preceding segments …
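A minimal sketch of solving these equalities follows, under a simplifying assumption: every segment receives slack. In that case the common density equals the task's total execution requirement divided by its deadline, and each subdeadline is the segment's execution requirement divided by that density. The types and the `decompose` function are hypothetical names, and the sketch does not handle a segment whose proportional share would be shorter than its thread execution requirement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical input: one segment of the parallel task. */
typedef struct {
    int threads;   /* number of parallel threads in the segment */
    double exec;   /* execution requirement of each thread */
} segment_t;

/* Hypothetical output: parameters shared by all threads of a segment. */
typedef struct {
    double offset;    /* release offset relative to the task release */
    double deadline;  /* relative subdeadline of each thread */
    double slack;     /* deadline minus thread execution requirement */
} subtask_t;

/* Assigns subdeadlines so that all segments have the same density and the
   subdeadlines sum to task_deadline. Returns the common density.
   Assumption: every segment receives slack; callers should check that no
   resulting slack is negative. */
double decompose(const segment_t *segs, size_t n, double task_deadline,
                 subtask_t *out)
{
    assert(segs != NULL && out != NULL && n > 0 && task_deadline > 0.0);

    double total_work = 0.0;
    for (size_t i = 0; i < n; i++)
        total_work += segs[i].threads * segs[i].exec;

    /* Equal density d with sum(work_i / d) = task_deadline
       gives d = total_work / task_deadline. */
    double density = total_work / task_deadline;

    double offset = 0.0;
    for (size_t i = 0; i < n; i++) {
        double work = segs[i].threads * segs[i].exec;
        out[i].deadline = work / density;
        out[i].slack = out[i].deadline - segs[i].exec;
        out[i].offset = offset;       /* sum of preceding subdeadlines */
        offset += out[i].deadline;
    }
    return density;
}
```

The release offsets accumulate the subdeadlines of earlier segments, which is what preserves the precedence relations of the original task.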
An Example of Task Decomposition
• Segment 1: deadline = 20, density = (5*4)/20 = 1
• Segment 2: deadline = 4, density = (2*2)/4 = 1
• Segment 3: deadline = 9, density = (3*3)/9 = 1
• Segment 4: deadline = 16, density = (4*4)/16 = 1
• Segment 5: deadline = 3, density = (1*3)/3 = 1
All segments have an equal density!
Global EDF (G-EDF) Schedulability
• A sufficient condition for G-EDF scheduling on m unit-speed cores [Baruah RTSS ’07]
• A necessary condition for any task set for any scheduler
[Formulas not recovered from the slide; their annotations mark the max density and total density terms.]
• Using the density bounds for decomposed tasks: if the original task set is schedulable by some scheduler on m unit-speed cores, the decomposed tasks are schedulable under G-EDF on 4-speed cores
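The slide's own formulas are missing from this text, so the following is only a sketch of one widely used density-based sufficient test for global EDF: total density ≤ m − (m − 1) × max density. Whether this is exactly the condition the authors cite is an assumption; the `seq_task_t` type and the function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sequential task: execution requirement and relative deadline
   (the deadline is assumed to be no larger than the period). */
typedef struct {
    double exec;
    double deadline;
} seq_task_t;

/* Returns 1 if the task set passes the assumed density-based test
   total_density <= m - (m - 1) * max_density on m unit-speed cores,
   and 0 otherwise. Passing is sufficient, not necessary. */
int gedf_density_test(const seq_task_t *tasks, size_t n, int m)
{
    assert(tasks != NULL && m >= 1);

    double total_density = 0.0;
    double max_density = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(tasks[i].deadline > 0.0);
        double density = tasks[i].exec / tasks[i].deadline;
        total_density += density;
        if (density > max_density)
            max_density = density;
    }
    return total_density <= m - (m - 1) * max_density;
}
```

On cores of speed ν, every execution requirement (and therefore every density) shrinks by a factor of ν, which is how a speed augmentation bound enters a test of this shape.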
Partitioned DM (P-DM) Schedulability
• FBB-FFD (Fisher Baruah Baker – First-Fit Decreasing) is a well-known P-DM scheduler [ECRTS ’06]
• A sufficient condition for FBB-FFD scheduling on m unit-speed cores
• A necessary condition for any scheduler
[Formulas not recovered from the slide; an annotation marks the load term: max cumulative exe. req. of tasks divided by time length.]
• Using load and density bounds for decomposed tasks: if the original task set is schedulable by some scheduler on m unit-speed cores, the decomposed tasks are FBB-FFD schedulable on 5-speed cores
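The annotation "max cumulative exe. req. of tasks divided by time length" appears to describe the load of a task set. A minimal sketch of that quantity follows, using the standard demand bound function for sporadic tasks. It assumes integer time and searches only up to a caller-supplied `horizon`, a hypothetical parameter, so it approximates the true maximum over all interval lengths. All names are illustrative, and the FBB-FFD condition itself is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sporadic task with integer parameters. */
typedef struct {
    long exec;      /* execution requirement */
    long deadline;  /* relative deadline */
    long period;    /* minimum inter-arrival time */
} sporadic_t;

/* Demand bound function: the largest execution requirement of jobs of t
   that are both released and due within any interval of length len. */
long dbf(sporadic_t t, long len)
{
    if (len < t.deadline)
        return 0;
    return ((len - t.deadline) / t.period + 1) * t.exec;
}

/* Load over interval lengths 1..horizon: the maximum of
   (cumulative demand of all tasks) / (interval length). */
double load(const sporadic_t *tasks, size_t n, long horizon)
{
    assert(tasks != NULL && horizon >= 1);

    double best = 0.0;
    for (long len = 1; len <= horizon; len++) {
        long demand = 0;
        for (size_t i = 0; i < n; i++)
            demand += dbf(tasks[i], len);
        double ratio = (double)demand / (double)len;
        if (ratio > best)
            best = ratio;
    }
    return best;
}
```

For example, two tasks with (exec, deadline, period) of (1, 2, 4) and (2, 4, 4) first reach their peak ratio at interval length 4, where the cumulative demand is 3.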
Conclusion
• Multi-core processors provide opportunities to schedule computation-intensive tasks in real time
• Real-time systems need to exploit intra-task parallelism
• We have addressed real-time scheduling for a generalized synchronous parallel task model
  • Different segments may have different numbers of threads
  • Each segment can have an arbitrary number of threads
• We have proposed a task decomposition that achieves
  • A processor-speed augmentation bound of 4 for Global EDF
  • A processor-speed augmentation bound of 5 for Partitioned DM