400 likes | 500 Views
実践システムソフトウェア Practical System Software. 石川 裕 (Yutaka Ishikawa) 情報理工学系研究科コンピュータ科学専攻. Schedule. Oct. 6 Introduction Oct. 13 Linux and Windows Basic Architecture Oct. 20 Lab Experiment 1 Setting up and Introducing the WRK environment Oct. 27 Scheduler Nov. 10 Lab Experiment 2
E N D
実践システムソフトウェアPracticalSystem Software 石川 裕 (Yutaka Ishikawa) 情報理工学系研究科コンピュータ科学専攻 part 4
Schedule • Oct. 6 Introduction • Oct. 13 Linux and Windows Basic Architecture • Oct. 20 Lab Experiment 1 • Setting up and Introducing the WRK environment • Oct. 27 Scheduler • Nov. 10 Lab Experiment 2 • Adding a new system call for counting some kernel events (context switch, page fault, process creation/deletion, etc..) • Assignment 1 • Nov. 17 No class • Nov. 24 Synchronizations • Dec. 1 Lab Experiment 3 • User space synchronization and priority inversion • Assignment 2 • Dec. 8 Virtual Memory • Dec. 15 Lab Experiment 4 • Experiment on working set • Assignment 3 • Dec. 22 I/O & File System • Jan. 12 Lab Experiment 5 • Final Assignment • Jan. 19 Lab Experiment 6 • Jan. 26 Lab Experiment 7 Grade: based on the number of attendances and the evaluation of four reports part 4
Process Management Comparison • Linux • Process is called a Task • Basic address space, file descriptor table, statistics • Parent/child relationship • Basic scheduling unit • Threads • No threads per-se • Tasks can act like Windows threads by sharing file descriptor table, PID and address space • PThreads – cooperative user-mode threads • Windows • Process • Address space, handle table, statistics and at least one thread • No inherent parent/child relationship • Threads • Basic scheduling unit • Fibers - cooperative user-mode threads http://www.gnu.org/software/pth/pth-manual.html PThreads does not refer to pthread of glibc. The implementation of glibc’s pthread is based on kernel thread. part 2 3
Outline • Important notions • Windows NT scheduler principles • Windows scheduler architecture • Windows thread states • Linux kernel scheduler principles • Linux scheduler architecture • Linux thread states • Linux completely fair scheduler • Comparing the schedulers part 4
Important notions in scheduling • Priority: the order of privilege for occupying the CPU • Quantum: the amount of time a thread gets to run before another same priority thread is executed • Preemption: interrupting a thread without its cooperation • Processor affinity: mask of processors on which a thread is allowed to run (in SMP environment) part 4
Windows NT Scheduling Principles • Quantum based, priority driven preemptive scheduling • 32 priority levels • Highest-priority runnable thread always runs • Threads within same priority are scheduled following the Round-Robin policy • Thread runs for time amount of quantum part 4
Windows NT Scheduling Principles • Priority can be: • Non-Realtime • Realtime (details later..) • Non-Realtime Priorities are adjusted dynamically • Priority elevation as response to certain I/O and dispatch events and quantum stretching to optimize responsiveness • Realtime priorities are assigned statically • Note: not hard-realtime! (scheduler tick is around 10ms) part 4
Windows NT Scheduler • Event-based scheduling, code spread across the kernel • Dispatcher routines triggered by events like: • Thread becomes ready for execution • Thread leaves running state (quantum expires, wait state) • Thread‘s priority changes (system call/NT activity) • Processor affinity of a running thread changes part 4
Windows Priority Classes • From Windows API point of view: • Processes are given a priority class upon creation • Idle, Below normal, Normal, Above normal, High, Realtime • Threads have a relative priority within the class • Idle, Lowest, Below_Normal, Normal, Above_Normal, Highest, and Time_Critical • From the kernel’s view: • Threads have priorities 0 through 31 • Threads are scheduled, not processes! • Process priority class is not used to make scheduling decisions part 4
Windows Kernel Priority Levels 31 16 16 “real-time” levels 15 1 15 variable levels Used by zero page thread: Zeros memory pages 0 i Used by idle thread(s): Each CPU has an idle thread, which “runs” when there is no real thread to be run part 4
Special Thread Priorities • Idle threads -- one per CPU • When no threads want to run, Idle thread “runs” • Not a real priority level - appears to have priority zero, but actually runs “below” priority 0 • Provides CPU idle time accounting (unused clock ticks are charged to the idle thread) • Loop: • Calls HAL to allow for power management • Processes DPC list • Dispatches to a thread if selected • Server 2003: in certain cases, scans per-CPU ready queues for next thread • Zero page thread -- one per NT system • Zeroes pages of memory in anticipation of “demand zero” page faults • Runs at priority zero (lower than any reachable from Windows) • Part of the “System” process (not a complete process) part 4
Thread Scheduling Priorities vs. Interrupt Request Levels (IRQLs) IRQLs (x86) 31 High 30 Power fail 29 Interprocessor Interrupt 28 Clock Hardware interrupts Device n . . . Device 1 Software interrupts 2 Dispatch/DPC Thread priorities 0-31 APC 1 Passive_Level 0 part 4
Windows Scheduler Architecture (called the Dispatcher database) Process Process thread thread thread thread Base priority Current priority Processor affinity Quantum Ready Queues 31 Bitmask for non-emptyready queues Bitmask for idle CPUs 0 Ready summary Idle summary 31 0 31 0 part 4
Scheduling Scenarios • Preemption • A thread becomes Ready at a higher priority than the running thread • Lower-priority Running thread is preempted • Preempted thread goes back to head of its Ready queue • action: pick lowest priority thread to preempt • Voluntary switch • Waiting on a dispatcher object • Termination • Explicit lowering of priority • action: scan for next Ready thread (starting at your priority & down) • Running thread experiences quantum end • Priority is decremented unless already at thread base priority • Thread goes to tail of ready queue for its new priority • May continue running if no equal or higher-priority threads are Ready • action: pick next thread at same priority level part 4
Scheduling ScenariosPreemption • Preemption is strictly event-driven • does not wait for the next clock tick • no guaranteed execution period before preemption • threads in kernel mode may be preempted (unless they raise IRQL to >= 2) • A preempted thread goes back to the head of its ready queue Running Ready from Wait state 18 17 16 15 14 13 part 4
Scheduling ScenariosReady after Wait Resolution • If newly-ready thread is not of higher priority than the running thread… • …it is put at the tail of the ready queue for its current priority • If priority >=14 quantum is reset (t.b.d.) • If priority <14 and you’re about to be boosted and didn’t already have a boost, quantum is set to process quantum - 1 Running Ready 18 17 16 15 14 13 from Wait state part 4
Scheduling ScenariosVoluntary Switch • When the running thread gives up the CPU… • …Schedule the thread at the head of the next non-empty “ready” queue Running Ready 18 17 16 15 14 13 to Waiting state part 4
Scheduling ScenariosQuantum End (“time-slicing”) • When the running thread exhausts its CPU quantum, it goes to the end of its ready queue • Applies to both real-time and dynamic priority threads, user and kernel mode • Quantums can be disabled for a thread by a kernel function • Default quantum on Professional is 2 clock ticks, 12 on Server • standard clock tick is 10 msec; might be 15 msec on some MP Pentium systems • if no other ready threads at that priority, same thread continues running (just gets new quantum) • if running at boosted priority, priority decays by one at quantum end (described later) Running Ready 18 17 16 15 14 13 part 4
Basic Thread Scheduling States preemption, quantum end Ready (1) Running (2) voluntary switch Waiting (5) part 4
Priority Adjustments • Dynamic priority adjustments (boost and decay) are applied to threads in “dynamic” classes • Threads with base priorities 1-15 (technically, 1 through 14) • Disable if desired with SetThreadPriorityBoost or SetProcessPriorityBoost • Five types: • I/O completion • Wait completion on events or semaphores • When threads in the foreground process complete a wait • When GUI threads wake up for windows input • For CPU starvation avoidance • No automatic adjustments in “real-time” class (16 or above) • “Real time” here really means “system won’t change the relative priorities of your real-time threads” • Hence, scheduling is predictable with respect to other “real-time” threads (but not for absolute latency) part 4
Priority Boosting To favor I/O intense threads: • After an I/O: specified by device driver • IoCompleteRequest( Irp, PriorityBoost ) Other cases discussed in the Windows Scheduling Internals Section • After a wait on executive event or semaphore • After any wait on a dispatcher object by a thread in the foreground process • GUI threads that wake up to process windowing input (e.g. windows messages) get a boost of 2 Common boost values (see NTDDK.H) 1: disk, CD-ROM, parallel, Video 2: serial, network, named pipe, mailslot6: keyboard or mouse8: sound part 4
Time Thread Priority Boost and Decay • Behavior of these boosts: • Applied to thread’s base priority • will not take you above priority 15 • After a boost, you get one quantum • Then decays 1 level, runs another quantum quantum Priority decay at quantum end Boost upon wait complete Round-robin at base priority Priority Preempt (before quantum end) Base Priority Run Wait Run Run part 4
Windows Scheduler Architecture (SMP improvements) Process Process Thread 3 Thread 4 Thread 1 Thread 2 CPU 1 Ready Queues CPU 0 Ready Queues 31 31 0 0 Ready summary Per-CPU ready queues! Per-CPU deferred ready queue Ready summary 31 0 31 0 Deferred Ready Queue Deferred Ready Queue part 4
Windows Thread States • Init: thread is being created • Ready: waiting for execution (on ready queue) • Standby: has been selected to run next, waiting for context switch • Running:being executed • Waiting: waiting for synchronization event, I/O, etc. • Transition: waiting because kernel stack is paged out • Terminated: finished executions • + Deferred Ready: not yet readied (for fine-grained locking in multiprocessor environments, per-CPU deferred ready queue) part 4
Windows Thread States and Transitions Ready (1) Init (0) Standby (3) preempt preemption, quantum end Deferred Ready (7) Running (2) voluntary switch Transition (6) Waiting (5) Terminate (4) part 4
Linux scheduler based on kernel version 2.6, the Completely Fair Scheduler part 4
Linux Scheduling Principles, the Completely Fair Scheduler (CFS) • NOT quantum based, priority driven preemptive scheduling • Tries to run the task with the ‘gravest need’ for more CPU time • ‘Need’ is expressed in terms of waiting time based on a virtual fair-clock • Waiting time computation takes priority into account • Further details later… part 4
Linux Scheduler • Two main entry points: • periodic (scheduler_tick() function, tick: 10ms) • main scheduler (schedule() function) • Queries scheduling classes to get the next task to be executed • Scheduling classes: an extensible hierarchy of scheduler modules • Flat hierarchy (currently 3 classes): • Real-time (RT) class • Completely Fair Scheduler (CFS) class • Idle class part 4
Linux Scheduling policies • Handled by CFS class: • SCHED_NORMAL: normal processes • SCHED_BATCH: CPU intensive batch processes that are not interactive • SCHED_IDLE: less important processes, with minimal relative weight (not the idle task!) • Handled by RT class: • SCHED_RR: real-time tasks with Round-Robin • SCHED_FIFO: real-time tasks with FIFO part 4
Linux Kernel Priority Levels Higher Priority Real-time levels Normal levels 0 99 100 139 -20 +19 Unix nice levels part 4
Linux Scheduler Architecture CPU Main scheduler Periodic scheduler Context switch Select task RT CFS Idle Run queue: embeds scheduler classes task task task task Tasks part 4
Linux Scheduler Architecture (notes on SMP improvements) • per-CPU run-queues • Processor affinity mask in task structure • Load-balancing among run-queues (i.e. CPUs) • Initiated by scheduler_tick() • Support for scheduling domains • Run-queue groups part 4
Linux Task States • TASK_RUNNING:being executed or waiting on a run-queue (ready to run, not waiting for external event) • TASK_INTERRUPTIBLE: waiting for external event, but signals can be delivered to the task • TASK_UNINTERRUPTIBLE: waiting for external event and signals cannot be delivered to the task, can be waken only by the kernel • TASK_STOPPED: stopped (received SIGSTOP) • TASK_TRACED: being traced(by ptrace) • Exit states: • EXIT_DEAD: finished execution and parent called wait() • EXIT_ZOMBIE: finished execution without wait() part 4
Linux Task States and Transitions EXIT_DEAD EXIT_ZOMBIE TASK_RUNNING TASK_UNINTERRUPTIBLE TASK_INTERRUPTIBLE TASK_STOPPED part 4
Completely Fair Scheduler (CFS):the idea • Consider an ideal computer that can run an arbitrary number of tasks in parallel • For example 5 tasks (each runs for 10 sec): • Each would get 20% of the CPU power and run simultaneously => their execution speed is 1/5 of the wall-clock • They would finish in 50 sec, at the same time • CPU can’t do this, how to emulate? part 4
Completely Fair Scheduler (CFS):the idea • Switch the tasks periodically (not quantum!) • When a task runs, the rest of the tasks accumulate wait_time (i.e. the amount of unfairness) based on a virtual clock • Virtual clock advances inverse proportionally with the load (nr. of tasks in the system) • Waiting tasks are ordered based on wait_time (maintained in a red-black tree) • The scheduler runs the task that has the largest wait_time part 4
Completely Fair Scheduler (CFS):details • Each task maintains wait_runtime, virtual clock is called fair_clock • Let’s consider two subsequent points in time, t2 and t1 • fair_clockela = (t2 – t1) / load • Fair clock is updated as: • fair_clock = fair_clock + fair_clockela • Waiting tasks’ wait_time: • wait_time = wait_time + fair_clockela • Running task‘s wait_time: • wait_time = wait_time – (t2 – t1) + fair_clockela (the running process would have got its fair share as well) part 4
Completely Fair Scheduler (CFS): implementation notes • Processes maintain vruntime, not wait_time • Minimum min_vruntime is maintained per CPU • Processes are ordered on a red-black tree • With the key: vruntime – min_vruntime • Scheduler picks the left-most element, the one which has the smallest key (i.e. has run for the least amount of time) part 4
Scheduler comparison • Windows • Basic scheduling unit: Thread • Priority driven preemptive • Quantum based • Event driven, code is spread throughout the source • 2 scheduling classes: • Realtime • Non-realtime • Not extendable • per-CPU ready queues: • Separate queues for each priority level • Round-Robin on queues • Linux • Basic scheduling unit: Task • Priority driven preemptive • Not quantum based (CFS) • Event driven, code is spread throughout the source • 2 scheduling classes: • Realtime • CFS • Alternative scheduling disciplines can be added through scheduling classes • per-CPU run-queues: • Sorted on a red-black tree • Priority embedded in wait_runtime computation part 4 40