340 likes | 550 Views
Scheduling in Linux and Windows 2000. CS 431 Sanjay Kumar (01D07041) Sanjay Kumar Ram (01D07042). Linux kernel Linux process Scheduling policy Data Structures and Scheduling Algorithm in Uniprocessor System Scheduling algorithm for multiprocessor Scheduling in windows 2000.
E N D
Scheduling in Linux and Windows 2000 CS 431 Sanjay Kumar (01D07041) Sanjay Kumar Ram (01D07042)
Linux kernel • Linux process • Scheduling policy • Data Structures and Scheduling Algorithm in Uniprocessor System • Scheduling algorithm for multiprocessor • Scheduling in windows 2000
Linux Kernel Kernel subsystem overview
Linux Processes • Three class of processes • Interactive processes • Batch processes • Real-time processes • Linux processes have the following states: • Running • Waiting • Stopped • Zombie
Scheduling Policy • Linux kernel is non-preemptive but processes are preemptive • The set of rules used to determine when and how selecting a new process to run is called scheduling policy • Linux scheduling is based on thetime-sharingtechnique -several processes are allowed to run "concurrently“ • CPU time is roughly divided into "slices“ • scheduling policy is also based on ranking processes according to their priority • Static priority - assigned by the users to real-time processes and ranges from 1 to 99, and is never changed by the scheduler • Dynamic priority - the sum of the base time quantum (the base priority of the process) and of the number of ticks of CPU time left to the process before its quantum expires in the current epoch
Scheduling Policy (contd.) • Process Preemption • When the dynamic priority of the currently running process is lower than the process waiting in the runqueue • A process may also be preempted when its time quantum expires • How Long Must a Quantum Last? • Quantum duration too short - system overhead caused by task switching becomes excessively high • Quantum duration too long - processes no longer appear to be executed concurrently, degrades the response time of interactive applications, and degrades the responsiveness of the system • The rule of thumb adopted by Linux is: choose a duration as long as possible, while keeping good system response time
Data Structures and Scheduling Algorithm in Uniprocessor System • The Linux scheduling algorithm works by dividing the CPU time into epochs • In a single epoch, every process has a specified time quantum • epoch ends when all runnable processes have exhausted their quantum • Each process has a base time quantum • the time-quantum value assigned by the scheduler to the process if it has exhausted its quantum in the previous epoch • users can change the base time quantum of their processes by using the nice( ) and setpriority( ) system calls • The INIT_TASK macro sets the value of the base time quantum of process 0 (swapper) to DEF_PRIORITY, which is approx. 200ms
Data Structures and Scheduling Algorithm in Uniprocessor System (contd.) • The first entry in the process table is the special init process • Task_struct - represents the states of all tasks running in the systems • schedule policy - SCHED_OTHER, SCHED_FIFO, SCHED_RR • field for process state – running, waiting, stopped, zombie • a field that indicates the processes priority • counter - a field which holds the number of clock ticks • a doubly linked list (next_task and prev_task) - to keep track of all executing processes • mm_struct - contains a process's memory management information • process ID and group ID are also stored • fs_struct - File specific process data • fields that hold timing information
Data Structures and Scheduling Algorithm in Uniprocessor System (contd.) • Each time the scheduler is run it does the following: • Kernel work - scheduler runs the bottom half handlers and processes the scheduler task queue • Current process • current process must be processed before another process can be selected to run • If the scheduling policy of the current processes is round robin then it is put onto the back of the run queue • If the task is INTERRUPTIBLE and it has received a signal since the last time it was scheduled then its state becomes RUNNING • If the current process has timed out, then its state becomes RUNNING • If the current process is RUNNING then it will remain in that state • Processes that were neither RUNNING nor INTERRUPTIBLE are removed from the run queue
Data Structures and Scheduling Algorithm in Uniprocessor System (contd.) • Process Selection • most deserving process is selected by the scheduler • real time processes are given higher priority than ordinary processes • when several processes have the same priority, the one nearest the front of the run queue is chosen • when a new process is created the number of ticks left to the parent is split in two halves, one for the parent and one for the child • priority and counter fields are used both to implement time-sharing and to compute the process dynamic priority
Data Structures and Scheduling Algorithm in Uniprocessor System (contd.) • schedule( ) Function • implements the scheduler. • finds a process in the runqueue list and then assigns the CPU to it • Direct invocation 1. Inserts current in the proper wait queue 2. Changes the state of current either to TASK_INTERRUPTIBLE or to TASK_UNINTERRUPTIBLE 3. Invokes schedule( ) 4. Checks if the resource is available; if not, goes to step 2 5. Once the resource is available, removes current from the wait queue • Lazy invocation • When current has used up its quantum of CPU time; this is done by the update_process_times( ) function • When a process is woken up and its priority is higher than that of the current process; performed by the reschedule_idle( ) function, which is invoked by the wake_up_process( ) function • When a sched_setscheduler( ) or sched_ yield( ) system call is issued
Data Structures and Scheduling Algorithm in Uniprocessor System (contd.) • Actions performed by schedule( ) • Before actually scheduling a process, the schedule( ) function starts by running the functions left by other kernel control paths in various queues • The function then executes all active unmasked bottom halves • Scheduling • value of current is saved in the prev local variable and the need_resched field of prev is set to 0 • a check is made to determine whether prev is a Round Robin real-time process. If so, schedule( ) assigns a new quantum to prev and puts it at the bottom of the runqueue list • if state is TASK_INTERRUPTIBLE, the function wakes up the process • schedule( ) repeatedly invokes the goodness( ) function on the runnable processes to determine the best candidate • when counter field becomes zero, schedule( ) assigns to all existing processes a fresh quantum, whose duration is the sum of the priority value plus half the counter value
Data Structures and Scheduling Algorithm in Uniprocessor System (contd.) • goodness( ) function • identify the best candidate among all processes in the runqueue list. • It receives as input parameters prev (the descriptor pointer of the previously running process) and p (the descriptor pointer of the process to evaluate) • The integer value c returned by goodness( ) measures the "goodness" of p and has the following meanings: • c = -1000, p must never be selected; this value is returned when the runqueue list contains only init_task • c =0, p has exhausted its quantum. Unless p is the first process in the runqueue list and all runnable processes have also exhausted their quantum, it will not be selected for execution. • 0 < c < 1000, p is a conventional process that has not exhausted its quantum; a higher value of c denotes a higher level of goodness. • c >= 1000, p is a real-time process; a higher value of c denotes a higher level of goodness.
Scheduling algorithm for multiprocessor • The Linux scheduler is defined in kernel/sched.c. • The new scheduler have been designed to accomplish specific goals: • Implement fully O(1) scheduling, Every algorithm in the new scheduler completes in constant-time, regardless of the number of running processes or any other input. • Implement perfect SMP scalability, Each processor has its own locking and individual runqueue. • Implement improved SMP affinity, Naturally attempt to group tasks to a specific CPU and continue to run them there. Only migrate tasks from one CPU to another to resolve imbalances in runqueue sizes.
Scheduling algorithm for multiprocessor (contd.) • Provide good interactive performance,Even during considerable system load, the system should react and schedule interactive tasks immediately. • Provide fairness, No process should find itself starved of timeslice for any reasonable amount of time. Likewise, no process should receive an unfairly high amount of timeslice. • Optimize for the common case of only 1-2 runnable processes, yet scale well to multiple processors each with many processes • Runqueues • It is the basic data structure in the scheduler. • It is defined in kernel/sched.c as struct runqueue. • It is the list of runnable processes on a given processor; there is one runqueue per processor. Each runnable process is on exactly one runqueue. • It additionally contains per-processor scheduling information.
Scheduling algorithm for multiprocessor (contd.) • A group of macros is used to obtain specific runqueues. • cpu_rq(processor) returns a pointer to the runqueue associated with the given processor. • this_rq()returns the runqueue of the current processor. • task_rq(task)returns a pointer to the runqueue on which the given task is queued. • this_rq_lock()locks the current runqueue • rq_unlock(struct runqueue *rq)unlocks the given runqueue. • double_rq_lock() and double_rq_unlock(), to avoid deadlock
Scheduling algorithm for multiprocessor (contd.) • The Priority Arrays • Each runqueue contains two priority arrays, the active and the expired array. • Each priority array contains one queue of runnable processors per priority level. • Priority arrays are the data structure that provide O(1) scheduling. • It also contain a priority bitmap used to efficiently discover the highest priority runnable task in the system. • Each priority array contains a bitmap field that has at least one bit for every priority on the system.
Scheduling algorithm for multiprocessor (contd.) • Finding the highest priority task on the system is therefore only a matter of finding the first set bit in the bitmap. • Each priority array also contains an array called queue of struct list_head queues, one queue for each priority. • The Linux O(1) scheduler algorithm.
Scheduling algorithm for multiprocessor (contd.) • Recalculating Timeslices • The new Linux scheduler alleviates the need for a recalculate loop. Instead, it maintains two priority arrays for each processor: 1. active array, contains all the tasks in the associated runqueue that have timeslice left. 2. expired array , contains all the tasks in the associated runqueue that have exhausted their timeslice. • When each task's timeslice reaches zero, its timeslice is recalculated before it is moved to the expired array.
Scheduling algorithm for multiprocessor (contd.) • Recalculating all the timeslices is then as simple as just switching the active and expired arrays. Because the arrays are accessed only via pointer, switching them is as fast as swapping two pointers. • This swap is a key feature of the new O(1) scheduler. Instead of recalculating each process's priority and timeslice all the time, the O(1) scheduler performs a simple two-step array swap.
Scheduling algorithm for multiprocessor (contd.) • Sleeping and waking up
Scheduling algorithm for multiprocessor (contd.) • The load balancer
Scheduling in windows 2000 • Windows 2000 scheduler: • Concern is with threads not processes • i.e. threads are scheduled, not processes • Threads have priorities 0 through 31 31 16 “real-time” levels 16 15 15 variable levels Used by zero page thread 1 0 Used by idle thread(s) i
Thread Scheduling • Strictly priority driven • 32 queues (FIFO lists) of ready threads • One queue for each priority level • Queues are common to all CPUs • When thread becomes ready, it: • either runs immediately, or • is inserted at the tail end of the Ready queue for its current priority • On a uniprocessor, highest priority Ready thread always runs • Time-sliced, round-robin within a priority level
Thread Scheduling Multiprocessor Issues • On a multiprocessor, highest-priority n threads will always run (subject to Affinity, explained later) • No attempt is made to share processors “fairly” among processes, only among threads • Tries to keep threads on the same CPU
Scheduling Scenarios • Preemption • A thread becomes ready at a higher priority than the running thread • Lower-priority thread is preempted • Preempted thread goes to the tail of its Ready queue
Scheduling Scenarios (contd.) • Voluntary switch • When the running thread gives up the CPU • Waiting on a dispatcher object • Termination • Explicit lowering of priority • Schedule the thread at the head of the next non-empty Ready queue • Running thread experiences quantum end • Priority is decremented unless at thread base priority • Thread goes to tail of Ready queue for its new priority • May continue running if no equal or higher-priority threads are Ready – i.e. it “gets” the new quantum
Priority Adjustments • Threads in “dynamic” classes can have priority adjustments applied to them (Boost or Decay) • Idle, Below Normal, Normal, Above Normal and High • Carried out upon wait completion • Used to avoid CPU starvation • No automatic adjustments in “real-time” class • Priority 16 and above • Scheduling is therefore “predictable” with respect to the “real-time” threads • Note though that this does not mean that there are absolute latency guarantees
Priority Boosting • Priority boost takes place after a wait • Occurs when a wait (usually I/O) is resolved • Slow devices / long waits = big boost • Fast devices / short waits = small boost • Boost is applied to thread’s base priority • Does not go over priority 15 • Keeps I/O devices busy
CPU Starvation • Balance Set Manager looks for “starved” threads • BSM is a thread running at priority 16, waking up once per second • Looks at threads that have been Ready for 4 seconds or more • Boosts up to 10 Ready threads per pass • Special boost applied • Priority 15 • Quantum is doubled • Does not apply in real-time range
Multiprocessor Support • By default, threads can run on any available processor • Soft affinity (introduced in NT 4.0) • Every thread has an “ideal processor” • When thread becomes ready: • if “ideal” is idle, it runs there • else, if previous processor is idle, it runs there • else, may look at next thread, to run on its ideal processor
Multiprocessor Support (contd.) • Hard affinity • Restricts thread to a subset of the available CPUs • Can lead to: • threads getting less CPU time that they normally would
References • http://iamexwiwww.unibe.ch/studenten/schlpbch/linuxScheduling/LinuxScheduling.html • http://oopweb.com/OS/Documents/tlk/VolumeFrames.html • http://www.samspublishing.com/articles/article.asp?p=101760&rl=1 • http://www.oreilly.com/catalog/linuxkernel/chapter/ch10.html • G. M. Candea and M. B. Jones, “ Vassal: Loadable Scheduler Support for Multi-Policy Scheduling,” Proceedings of the 2nd USENIX Windows NT Symposium Seattle, Washington, August 3–4, 1998