CS519: Lecture 7 Uniprocessor and Multiprocessor Scheduling
What and Why? • What is processor scheduling? • Why? • At first to share an expensive resource – multiprogramming • Now to perform concurrent tasks because processor is so powerful • Future looks like past + now • Rent-a-computer approach – large data/processing centers use multiprogramming to maximize resource utilization • Systems still powerful enough for each user to run multiple concurrent tasks
Assumptions • Pool of jobs contending for the CPU • CPU is a scarce resource • Jobs are independent and compete for resources (this assumption is not always used) • Scheduler mediates between jobs to optimize some performance criteria
Types of Scheduling • We’re mostly concerned with short-term scheduling
What Do We Optimize? • System-oriented metrics: • Processor utilization: percentage of time the processor is busy • Throughput: number of processes completed per unit of time • User-oriented metrics: • Turnaround time: interval of time between submission and termination (including any waiting time). Appropriate for batch jobs • Response time: for interactive jobs, time from the submission of a request until the response begins to be received • Deadlines: when process completion deadlines are specified, the percentage of deadlines met should be maximized
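To make the user-oriented metrics concrete, here is a minimal sketch; the timestamps and the field names (submit, first_run, finish) are invented for the example, not taken from the lecture.

```python
# Minimal sketch: computing the user-oriented metrics for one job.
from dataclasses import dataclass

@dataclass
class Job:
    submit: float     # time the job entered the system
    first_run: float  # time it first got the CPU / began producing a response
    finish: float     # time it terminated

def turnaround(job: Job) -> float:
    # submission to termination, including all waiting time
    return job.finish - job.submit

def response(job: Job) -> float:
    # submission until the response begins to be received
    return job.first_run - job.submit

j = Job(submit=0, first_run=4, finish=9)
print(turnaround(j), response(j))   # 9, 4
```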
Design Space • Two dimensions • Selection function • Which of the ready jobs should be run next? • Preemption • Preemptive: currently running job may be interrupted and moved to Ready state • Non-preemptive: once a process is in Running state, it continues to execute until it terminates or it blocks for I/O or system service
Job Behavior • I/O-bound jobs • Jobs that perform lots of I/O • Tend to have short CPU bursts • CPU-bound jobs • Jobs that perform very little I/O • Tend to have very long CPU bursts • Distribution tends to be hyper-exponential • Very large number of very short CPU bursts • A small number of very long CPU bursts
Scheduling Algorithms • FIFO: non-preemptive • Round-Robin: preemptive • Shortest Job Next: non-preemptive • Shortest Remaining Time: preemptive at arrival • Highest Response Ratio (turnaround/service time) Next: non-preemptive • Priority with feedback: preemptive at time quantum
Example Job Set

Process   Arrival Time   Service Time
1         0              3
2         2              6
3         4              4
4         6              5
5         8              2
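A minimal sketch that replays this job set under non-preemptive FIFO and Shortest Job Next and reports per-process turnaround times; the code structure, tie-breaking, and output format are illustrative assumptions, not from the lecture.

```python
# (arrival, service) pairs match the example job set above.
jobs = {1: (0, 3), 2: (2, 6), 3: (4, 4), 4: (6, 5), 5: (8, 2)}

def run(policy):
    time, turnaround, order = 0, {}, []
    pending = dict(jobs)
    while pending:
        ready = {p: v for p, v in pending.items() if v[0] <= time}
        if not ready:                        # CPU idles until the next arrival
            time = min(v[0] for v in pending.values())
            continue
        p = policy(ready)
        arrival, service = pending.pop(p)
        time += service                      # non-preemptive: run to completion
        turnaround[p] = time - arrival
        order.append(p)
    return order, turnaround

fifo = lambda ready: min(ready, key=lambda p: ready[p][0])   # earliest arrival
sjn  = lambda ready: min(ready, key=lambda p: ready[p][1])   # shortest service time

print(run(fifo))   # ([1, 2, 3, 4, 5], {1: 3, 2: 7, 3: 9, 4: 12, 5: 12})
print(run(sjn))    # ([1, 2, 5, 3, 4], {1: 3, 2: 7, 5: 3, 3: 11, 4: 14})
```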
Behavior of Scheduling Policies
Priority with Feedback Scheduling • After each preemption, the process moves to a lower-priority queue
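A minimal sketch of priority-with-feedback (multilevel feedback queues), under the simplifying assumptions that all processes are ready at time 0, the quantum is the same at every level, and the lowest queue is served round-robin; process names and service times are made up.

```python
from collections import deque

def feedback(procs, levels=3, quantum=1):
    # procs: {name: remaining service time}, all assumed ready at time 0
    queues = [deque() for _ in range(levels)]
    for name in procs:
        queues[0].append(name)               # new processes enter the top queue
    time, trace = 0, []
    while any(queues):
        level = next(i for i, q in enumerate(queues) if q)   # highest non-empty queue
        name = queues[level].popleft()
        slice_ = min(quantum, procs[name])
        time += slice_
        procs[name] -= slice_
        trace.append((name, time))
        if procs[name] > 0:
            # demote after each preemption; the lowest queue cycles round-robin
            queues[min(level + 1, levels - 1)].append(name)
    return trace

print(feedback({"A": 3, "B": 6, "C": 4}))
```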
Scheduling Algorithms • FIFO is simple but leads to poor average response times. Short processes are delayed by long processes that arrive before them • RR eliminates this problem, but favors CPU-bound jobs, which have longer CPU bursts than I/O-bound jobs • SJN, SRT, and HRRN alleviate the problem with FIFO, but require information on the length of each process. This information is not always available (although it can sometimes be approximated from past history or user input) • Feedback is a way of alleviating the problem with FIFO without information on process length
It’s a Changing World • Assumption about bi-modal workload no longer holds • Interactive continuous media applications are sometimes processor-bound but require good response times • New computing model requires more flexibility • How to match priorities of cooperative jobs, such as client/server jobs? • How to balance execution between multiple threads of a single process?
Lottery Scheduling • Randomized resource allocation mechanism • Resource rights are represented by lottery tickets • Hold repeated rounds of a lottery • In each round, the winning ticket (and therefore the winner) is chosen at random • Your chances of winning depend directly on the number of tickets you hold • P[winning] = t/T, where t = your number of tickets and T = total number of tickets
Lottery Scheduling • After n rounds, your expected number of wins is • E[wins] = n × P[winning] • The expected number of lotteries a client must wait before its first win is • E[wait] = 1/P[winning] • Lottery scheduling implements proportional-share resource management • Ticket currencies allow isolation between users, processes, and threads • OK, so how do we actually schedule the processor using lottery scheduling?
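A tiny worked example of the two expectations, with made-up ticket counts:

```python
t, T, n = 100, 400, 20       # your tickets, total tickets, number of lottery rounds
p = t / T                    # P[winning] = 0.25
print(n * p)                 # E[wins] after 20 rounds: 5.0
print(1 / p)                 # E[rounds until first win]: 4.0
```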
Implementation
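One plausible implementation, sketched below with invented client names and ticket counts: draw a winning ticket number uniformly at random, then walk the list of runnable clients summing ticket counts until the running sum covers the winning number; that client receives the next quantum. This is a sketch of the general idea, not the exact data structures from the lottery scheduling paper.

```python
import random

clients = {"A": 100, "B": 200, "C": 300}      # tickets held by each runnable client

def draw(clients):
    total = sum(clients.values())
    winner = random.randrange(total)          # winning ticket number in [0, total)
    running = 0
    for name, tickets in clients.items():
        running += tickets
        if winner < running:
            return name                       # this client gets the next quantum

wins = {name: 0 for name in clients}
for _ in range(6000):
    wins[draw(clients)] += 1
print(wins)   # roughly proportional to 100 : 200 : 300
```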
Performance • Allocated and observed execution ratios between two tasks running the Dhrystone benchmark. With the exception of the 10:1 allocation ratio, all observed ratios are close to their allocations
Short-term Allocation Ratio
Isolation • Five tasks running the Dhrystone benchmark. Let amount.currency denote a ticket allocation of amount denominated in currency. Tasks A1 and A2 have allocations 100.A and 200.A, respectively. Tasks B1 and B2 have allocations 100.B and 200.B, respectively. Halfway through the experiment, B3 is started with allocation 300.B. This inflates the number of tickets in B from 300 to 600. There is no effect on tasks in currency A or on the aggregate iteration ratio of A tasks to B tasks. Tasks B1 and B2 slow to half their original rates, corresponding to the factor-of-2 inflation caused by B3.
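A minimal sketch of how currencies provide this isolation, under the assumption that each currency is backed by a fixed amount of base funding (the backing amounts below are invented for the example); issuing more tickets in a currency dilutes only that currency.

```python
# Assumed backing: each currency is funded by a fixed number of base tickets.
base_funding = {"A": 300, "B": 300}

def base_value(amount, currency, issued):
    # issued: total tickets issued in that currency; inflation dilutes each ticket
    return amount * base_funding[currency] / issued[currency]

issued = {"A": 300, "B": 300}                # before B3 starts
print(base_value(100, "B", issued))          # B1's share: 100.0 base tickets

issued["B"] += 300                           # B3 starts with allocation 300.B
print(base_value(100, "B", issued))          # B1 drops to 50.0 base tickets
print(base_value(100, "A", issued))          # A1 is unaffected: still 100.0
```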
Thread Scheduling for Cache Locality • Traditionally, each resource (CPU, memory, I/O) has been managed separately • Resources are not independent, however • Policy for one resource can affect how another resource is used. For instance, the order in which threads are scheduled can affect the performance of the memory subsystem • Neat paper that uses a very simple scheduling idea to enhance memory performance
Main Idea • When working with a large array, want to tile (block) for efficient use of the cache • What is tiling? Restructuring loops for data re-use • Tiling by hand is a pain and is error-prone • Compilers can tile automatically, but not always. For instance, not when the program contains dynamically allocated or indirectly accessed data • So, use threads and hints to improve cache utilization
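For concreteness, a minimal sketch of tiling applied to a matrix transpose; the array size N and block size B are arbitrary, and the point is only the loop restructuring, not this particular computation.

```python
N, B = 8, 4
a = [[i * N + j for j in range(N)] for i in range(N)]

# Untiled transpose: b is written column-wise, striding through memory
b = [[0] * N for _ in range(N)]
for i in range(N):
    for j in range(N):
        b[j][i] = a[i][j]

# Tiled transpose: the outer loops pick a B x B block, the inner loops stay
# inside it, so both a and b accesses are reused while the block is in cache
bt = [[0] * N for _ in range(N)]
for ii in range(0, N, B):
    for jj in range(0, N, B):
        for i in range(ii, min(ii + B, N)):
            for j in range(jj, min(jj + B, N)):
                bt[j][i] = a[i][j]

assert b == bt
```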
Example • Thread t_i is denoted by t_i(a_i1, …, a_ik), where a_ij is the address of the jth piece of data referenced by thread t_i. Simplify by using just 2 or 3 addresses or hints. Hints might be elements of rows or columns of a matrix, for example.
Algorithm • Hash algorithm should assign threads to bins so that threads with similar hints fall in the same bin • Threads in each bin are scheduled together • 2-D plane of bins with 2 hints • Key insight: the sum of the two dimensions of a bin is less than the cache size C • Easy to extend to k hints
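A minimal sketch of the binning idea with two hints; the cache size, bin dimensions, and hint values are made up, and the real algorithm's data structures may well differ.

```python
from collections import defaultdict

C = 1 << 15                 # assumed cache size in bytes
DIM = C // 2                # per-hint bin width, so the two dimensions sum to C

def bin_of(hint1, hint2):
    # threads whose two hint addresses fall in the same DIM-sized ranges share a bin
    return (hint1 // DIM, hint2 // DIM)

bins = defaultdict(list)

def add_thread(thread, hint1, hint2):
    bins[bin_of(hint1, hint2)].append(thread)

# t1 and t2 touch nearby data, so they land in the same bin and run back-to-back
add_thread("t1", 0x0000, 0x8000)
add_thread("t2", 0x0100, 0x8200)
add_thread("t3", 0x40000, 0xC0000)

for key, threads in bins.items():
    print(key, threads)      # each bin's threads are scheduled together
```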
Performance • Fork = create and schedule a null thread • Run = execute and terminate a null thread
More Complex Examples: partial differential equation solver
Multiprocessor Scheduling • Load sharing: single ready queue; a processor dequeues the thread at the front of the queue when idle; preempted threads are placed at the end of the queue • Gang scheduling: all threads belonging to an application run at the same time • Hardware partitions: a chunk of the machine is dedicated to each application • Advantages and disadvantages?
Multiprocessor Scheduling • Load sharing: poor locality; poor synchronization behavior; simple; good processor utilization. Affinity or per-processor queues can improve locality • Gang scheduling: central control; fragmentation (unnecessary processor idle time, e.g., two applications with P/2+1 threads each); good synchronization behavior; if careful, good locality • Hardware partitions: poor utilization for I/O-intensive applications; fragmentation (unnecessary processor idle time when the remaining partitions are small); good locality and synchronization behavior
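A tiny sketch of the gang-scheduling fragmentation example above: with P processors and two applications of P/2+1 threads each, the two gangs cannot share a time slot, so every slot leaves P/2-1 processors idle. The processor count and slot layout below are illustrative.

```python
P = 16
app_threads = P // 2 + 1                     # 9 threads per application

slots = []                                   # one gang per time slot
for app in ("A", "B"):
    slots.append([app] * app_threads + [None] * (P - app_threads))

for i, slot in enumerate(slots):
    idle = slot.count(None)
    print(f"slot {i}: {idle} of {P} processors idle")   # 7 of 16 idle in each slot
```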