1 / 40

実践システムソフトウェア Practical System Software

実践システムソフトウェア Practical System Software. 石川 裕 (Yutaka Ishikawa) 情報理工学系研究科コンピュータ科学専攻. Schedule. Oct. 6 Introduction Oct. 13 Linux and Windows Basic Architecture Oct. 20 Lab Experiment 1 Setting up and Introducing the WRK environment Oct. 27 Scheduler Nov. 10 Lab Experiment 2

Download Presentation

実践システムソフトウェア Practical System Software

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. 実践システムソフトウェアPracticalSystem Software 石川 裕 (Yutaka Ishikawa) 情報理工学系研究科コンピュータ科学専攻 part 4

  2. Schedule • Oct. 6 Introduction • Oct. 13 Linux and Windows Basic Architecture • Oct. 20 Lab Experiment 1 • Setting up and Introducing the WRK environment • Oct. 27 Scheduler • Nov. 10 Lab Experiment 2 • Adding a new system call for counting some kernel events (context switch, page fault, process creation/deletion, etc..) • Assignment 1 • Nov. 17 No class • Nov. 24 Synchronizations • Dec. 1 Lab Experiment 3 • User space synchronization and priority inversion • Assignment 2 • Dec. 8 Virtual Memory • Dec. 15 Lab Experiment 4 • Experiment on working set • Assignment 3 • Dec. 22 I/O & File System • Jan. 12 Lab Experiment 5 • Final Assignment • Jan. 19 Lab Experiment 6 • Jan. 26 Lab Experiment 7 Grade: based on the number of attendances and the evaluation of four reports part 4

  3. Process Management Comparison • Linux • Process is called a Task • Basic address space, file descriptor table, statistics • Parent/child relationship • Basic scheduling unit • Threads • No threads per-se • Tasks can act like Windows threads by sharing file descriptor table, PID and address space • PThreads – cooperative user-mode threads • Windows • Process • Address space, handle table, statistics and at least one thread • No inherent parent/child relationship • Threads • Basic scheduling unit • Fibers - cooperative user-mode threads http://www.gnu.org/software/pth/pth-manual.html PThreads does not refer to pthread of glibc. The implementation of glibc’s pthread is based on kernel thread. part 2 3

  4. Outline • Important notions • Windows NT scheduler principles • Windows scheduler architecture • Windows thread states • Linux kernel scheduler principles • Linux scheduler architecture • Linux thread states • Linux completely fair scheduler • Comparing the schedulers part 4

  5. Important notions in scheduling • Priority: the order of privilege for occupying the CPU • Quantum: the amount of time a thread gets to run before another same priority thread is executed • Preemption: interrupting a thread without its cooperation • Processor affinity: mask of processors on which a thread is allowed to run (in SMP environment) part 4

  6. Windows NT scheduler based on Windows Server 2003 part 4

  7. Windows NT Scheduling Principles • Quantum based, priority driven preemptive scheduling • 32 priority levels • Highest-priority runnable thread always runs • Threads within same priority are scheduled following the Round-Robin policy • Thread runs for time amount of quantum part 4

  8. Windows NT Scheduling Principles • Priority can be: • Non-Realtime • Realtime (details later..) • Non-Realtime Priorities are adjusted dynamically • Priority elevation as response to certain I/O and dispatch events and quantum stretching to optimize responsiveness • Realtime priorities are assigned statically • Note: not hard-realtime! (scheduler tick is around 10ms) part 4

  9. Windows NT Scheduler • Event-based scheduling, code spread across the kernel • Dispatcher routines triggered by events like: • Thread becomes ready for execution • Thread leaves running state (quantum expires, wait state) • Thread‘s priority changes (system call/NT activity) • Processor affinity of a running thread changes part 4

  10. Windows Priority Classes • From Windows API point of view: • Processes are given a priority class upon creation • Idle, Below normal, Normal, Above normal, High, Realtime • Threads have a relative priority within the class • Idle, Lowest, Below_Normal, Normal, Above_Normal, Highest, and Time_Critical • From the kernel’s view: • Threads have priorities 0 through 31 • Threads are scheduled, not processes! • Process priority class is not used to make scheduling decisions part 4

  11. Windows Kernel Priority Levels 31 16 16 “real-time” levels 15 1 15 variable levels Used by zero page thread: Zeros memory pages 0 i Used by idle thread(s): Each CPU has an idle thread, which “runs” when there is no real thread to be run part 4

  12. Special Thread Priorities • Idle threads -- one per CPU • When no threads want to run, Idle thread “runs” • Not a real priority level - appears to have priority zero, but actually runs “below” priority 0 • Provides CPU idle time accounting (unused clock ticks are charged to the idle thread) • Loop: • Calls HAL to allow for power management • Processes DPC list • Dispatches to a thread if selected • Server 2003: in certain cases, scans per-CPU ready queues for next thread • Zero page thread -- one per NT system • Zeroes pages of memory in anticipation of “demand zero” page faults • Runs at priority zero (lower than any reachable from Windows) • Part of the “System” process (not a complete process) part 4

  13. Thread Scheduling Priorities vs. Interrupt Request Levels (IRQLs) IRQLs (x86) 31 High 30 Power fail 29 Interprocessor Interrupt 28 Clock Hardware interrupts Device n . . . Device 1 Software interrupts 2 Dispatch/DPC Thread priorities 0-31 APC 1 Passive_Level 0 part 4

  14. Windows Scheduler Architecture (called the Dispatcher database) Process Process thread thread thread thread Base priority Current priority Processor affinity Quantum Ready Queues 31 Bitmask for non-emptyready queues Bitmask for idle CPUs 0 Ready summary Idle summary 31 0 31 0 part 4

  15. Scheduling Scenarios • Preemption • A thread becomes Ready at a higher priority than the running thread • Lower-priority Running thread is preempted • Preempted thread goes back to head of its Ready queue • action: pick lowest priority thread to preempt • Voluntary switch • Waiting on a dispatcher object • Termination • Explicit lowering of priority • action: scan for next Ready thread (starting at your priority & down) • Running thread experiences quantum end • Priority is decremented unless already at thread base priority • Thread goes to tail of ready queue for its new priority • May continue running if no equal or higher-priority threads are Ready • action: pick next thread at same priority level part 4

  16. Scheduling ScenariosPreemption • Preemption is strictly event-driven • does not wait for the next clock tick • no guaranteed execution period before preemption • threads in kernel mode may be preempted (unless they raise IRQL to >= 2) • A preempted thread goes back to the head of its ready queue Running Ready from Wait state 18 17 16 15 14 13 part 4

  17. Scheduling ScenariosReady after Wait Resolution • If newly-ready thread is not of higher priority than the running thread… • …it is put at the tail of the ready queue for its current priority • If priority >=14 quantum is reset (t.b.d.) • If priority <14 and you’re about to be boosted and didn’t already have a boost, quantum is set to process quantum - 1 Running Ready 18 17 16 15 14 13 from Wait state part 4

  18. Scheduling ScenariosVoluntary Switch • When the running thread gives up the CPU… • …Schedule the thread at the head of the next non-empty “ready” queue Running Ready 18 17 16 15 14 13 to Waiting state part 4

  19. Scheduling ScenariosQuantum End (“time-slicing”) • When the running thread exhausts its CPU quantum, it goes to the end of its ready queue • Applies to both real-time and dynamic priority threads, user and kernel mode • Quantums can be disabled for a thread by a kernel function • Default quantum on Professional is 2 clock ticks, 12 on Server • standard clock tick is 10 msec; might be 15 msec on some MP Pentium systems • if no other ready threads at that priority, same thread continues running (just gets new quantum) • if running at boosted priority, priority decays by one at quantum end (described later) Running Ready 18 17 16 15 14 13 part 4

  20. Basic Thread Scheduling States preemption, quantum end Ready (1) Running (2) voluntary switch Waiting (5) part 4

  21. Priority Adjustments • Dynamic priority adjustments (boost and decay) are applied to threads in “dynamic” classes • Threads with base priorities 1-15 (technically, 1 through 14) • Disable if desired with SetThreadPriorityBoost or SetProcessPriorityBoost • Five types: • I/O completion • Wait completion on events or semaphores • When threads in the foreground process complete a wait • When GUI threads wake up for windows input • For CPU starvation avoidance • No automatic adjustments in “real-time” class (16 or above) • “Real time” here really means “system won’t change the relative priorities of your real-time threads” • Hence, scheduling is predictable with respect to other “real-time” threads (but not for absolute latency) part 4

  22. Priority Boosting To favor I/O intense threads: • After an I/O: specified by device driver • IoCompleteRequest( Irp, PriorityBoost ) Other cases discussed in the Windows Scheduling Internals Section • After a wait on executive event or semaphore • After any wait on a dispatcher object by a thread in the foreground process • GUI threads that wake up to process windowing input (e.g. windows messages) get a boost of 2 Common boost values (see NTDDK.H) 1: disk, CD-ROM, parallel, Video 2: serial, network, named pipe, mailslot6: keyboard or mouse8: sound part 4

  23. Time Thread Priority Boost and Decay • Behavior of these boosts: • Applied to thread’s base priority • will not take you above priority 15 • After a boost, you get one quantum • Then decays 1 level, runs another quantum quantum Priority decay at quantum end Boost upon wait complete Round-robin at base priority Priority Preempt (before quantum end) Base Priority Run Wait Run Run part 4

  24. Windows Scheduler Architecture (SMP improvements) Process Process Thread 3 Thread 4 Thread 1 Thread 2 CPU 1 Ready Queues CPU 0 Ready Queues 31 31 0 0 Ready summary Per-CPU ready queues! Per-CPU deferred ready queue Ready summary 31 0 31 0 Deferred Ready Queue Deferred Ready Queue part 4

  25. Windows Thread States • Init: thread is being created • Ready: waiting for execution (on ready queue) • Standby: has been selected to run next, waiting for context switch • Running:being executed • Waiting: waiting for synchronization event, I/O, etc. • Transition: waiting because kernel stack is paged out • Terminated: finished executions • + Deferred Ready: not yet readied (for fine-grained locking in multiprocessor environments, per-CPU deferred ready queue) part 4

  26. Windows Thread States and Transitions Ready (1) Init (0) Standby (3) preempt preemption, quantum end Deferred Ready (7) Running (2) voluntary switch Transition (6) Waiting (5) Terminate (4) part 4

  27. Linux scheduler based on kernel version 2.6, the Completely Fair Scheduler part 4

  28. Linux Scheduling Principles, the Completely Fair Scheduler (CFS) • NOT quantum based, priority driven preemptive scheduling • Tries to run the task with the ‘gravest need’ for more CPU time • ‘Need’ is expressed in terms of waiting time based on a virtual fair-clock • Waiting time computation takes priority into account • Further details later… part 4

  29. Linux Scheduler • Two main entry points: • periodic (scheduler_tick() function, tick: 10ms) • main scheduler (schedule() function) • Queries scheduling classes to get the next task to be executed • Scheduling classes: an extensible hierarchy of scheduler modules • Flat hierarchy (currently 3 classes): • Real-time (RT) class • Completely Fair Scheduler (CFS) class • Idle class part 4

  30. Linux Scheduling policies • Handled by CFS class: • SCHED_NORMAL: normal processes • SCHED_BATCH: CPU intensive batch processes that are not interactive • SCHED_IDLE: less important processes, with minimal relative weight (not the idle task!) • Handled by RT class: • SCHED_RR: real-time tasks with Round-Robin • SCHED_FIFO: real-time tasks with FIFO part 4

  31. Linux Kernel Priority Levels Higher Priority Real-time levels Normal levels 0 99 100 139 -20 +19 Unix nice levels part 4

  32. Linux Scheduler Architecture CPU Main scheduler Periodic scheduler Context switch Select task RT CFS Idle Run queue: embeds scheduler classes task task task task Tasks part 4

  33. Linux Scheduler Architecture (notes on SMP improvements) • per-CPU run-queues • Processor affinity mask in task structure • Load-balancing among run-queues (i.e. CPUs) • Initiated by scheduler_tick() • Support for scheduling domains • Run-queue groups part 4

  34. Linux Task States • TASK_RUNNING:being executed or waiting on a run-queue (ready to run, not waiting for external event) • TASK_INTERRUPTIBLE: waiting for external event, but signals can be delivered to the task • TASK_UNINTERRUPTIBLE: waiting for external event and signals cannot be delivered to the task, can be waken only by the kernel • TASK_STOPPED: stopped (received SIGSTOP) • TASK_TRACED: being traced(by ptrace) • Exit states: • EXIT_DEAD: finished execution and parent called wait() • EXIT_ZOMBIE: finished execution without wait() part 4

  35. Linux Task States and Transitions EXIT_DEAD EXIT_ZOMBIE TASK_RUNNING TASK_UNINTERRUPTIBLE TASK_INTERRUPTIBLE TASK_STOPPED part 4

  36. Completely Fair Scheduler (CFS):the idea • Consider an ideal computer that can run an arbitrary number of tasks in parallel • For example 5 tasks (each runs for 10 sec): • Each would get 20% of the CPU power and run simultaneously => their execution speed is 1/5 of the wall-clock • They would finish in 50 sec, at the same time • CPU can’t do this, how to emulate? part 4

  37. Completely Fair Scheduler (CFS):the idea • Switch the tasks periodically (not quantum!) • When a task runs, the rest of the tasks accumulate wait_time (i.e. the amount of unfairness) based on a virtual clock • Virtual clock advances inverse proportionally with the load (nr. of tasks in the system) • Waiting tasks are ordered based on wait_time (maintained in a red-black tree) • The scheduler runs the task that has the largest wait_time part 4

  38. Completely Fair Scheduler (CFS):details • Each task maintains wait_runtime, virtual clock is called fair_clock • Let’s consider two subsequent points in time, t2 and t1 • fair_clockela = (t2 – t1) / load • Fair clock is updated as: • fair_clock = fair_clock + fair_clockela • Waiting tasks’ wait_time: • wait_time = wait_time + fair_clockela • Running task‘s wait_time: • wait_time = wait_time – (t2 – t1) + fair_clockela (the running process would have got its fair share as well) part 4

  39. Completely Fair Scheduler (CFS): implementation notes • Processes maintain vruntime, not wait_time • Minimum min_vruntime is maintained per CPU • Processes are ordered on a red-black tree • With the key: vruntime – min_vruntime • Scheduler picks the left-most element, the one which has the smallest key (i.e. has run for the least amount of time) part 4

  40. Scheduler comparison • Windows • Basic scheduling unit: Thread • Priority driven preemptive • Quantum based • Event driven, code is spread throughout the source • 2 scheduling classes: • Realtime • Non-realtime • Not extendable • per-CPU ready queues: • Separate queues for each priority level • Round-Robin on queues • Linux • Basic scheduling unit: Task • Priority driven preemptive • Not quantum based (CFS) • Event driven, code is spread throughout the source • 2 scheduling classes: • Realtime • CFS • Alternative scheduling disciplines can be added through scheduling classes • per-CPU run-queues: • Sorted on a red-black tree • Priority embedded in wait_runtime computation part 4 40

More Related