Group Scheduling in System Software Michael Frisbie, Douglas Niehaus University of Kansas niehaus@ittc.ku.edu Venkita Subramonian, Christopher Gill Computer Science and Engineering Washington University cdgill@cse.wustl.edu
Motivation • Real-time and embedded systems have widely varying computation semantics • Varying policies for controlling specific computations • Varying levels of focus for systems • Single purpose embedded systems • General purpose systems with a RT application • Multi-media, machine control and user-interface • No single policy is likely to be appropriate for all
Motivation • Computation in computer systems is not exclusively done under the publicly exposed scheduling model • OS computational components • Interrupts, SoftIRQs, Tasklets, and Bottom halves • Execute outside exposed scheduling policies • Manifest as noise in the scheduling model • Active middleware • Linux PThreads library, TAO • Manifest as competing load
Goals • Highly configurable framework within which a wide range of policies can be specified • Selection of predefined scheduling semantics • Implementation of customized schedulers • Application computation oriented representation • Representation of all computation components on system under scheduling framework • Current semantics available as default policies • Requires some new types of information
Platform • Linux • Growing popularity for real-time and embedded • Middleware version for portability and range of mechanism control • KU Real-Time Linux (KURT-Linux) • OS computation components integration • Interrupt handling modifications • Number of related projects • Data Streams Performance Evaluation Framework • Ability to gather detailed scenario-oriented data
Group Scheduling • Application computation centric scheduling view • Computations are implemented by a group of one or more computation components • Threads, IRQ handlers, SoftIRQ, Tasklets, BH's • Flexible framework for composing and configuring the system scheduling decision function (SSDF) • SSDF chooses the computation component using the CPU at any given time • Framework explicitly supports description of both computations and relations among computations
Group Structure • Group: a set of computation components with an associated Scheduling Decision Function (SDF) • Elements within a group can be threads, other groups, or other computation components • Elements can belong to more than one group • Scheduling decision tree (SDT) composed of one or more groups • Control semantics for computation components • SDT for computations are composed to form the System Scheduling Decision Tree (SSDT)
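The group structure described above can be sketched in a few lines of Python. This is only an illustrative model, not the KURT-Linux data structures: the names `Component`, `Group`, `resolve`, and `sequential_sdf` are hypothetical, and the `runnable` flag stands in for whatever state the real framework tracks.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Union


@dataclass
class Component:
    """Hypothetical leaf: a thread, IRQ handler, SoftIRQ, tasklet, or BH."""
    name: str
    runnable: bool = True


@dataclass
class Group:
    """A set of members plus the SDF that picks among them."""
    name: str
    sdf: Callable[["Group"], Optional[Component]]
    members: List[Union[Component, "Group"]] = field(default_factory=list)

    def choose(self) -> Optional[Component]:
        # Delegate the decision to this group's scheduling decision function.
        return self.sdf(self)


def resolve(member: Union[Component, Group]) -> Optional[Component]:
    """Turn a member into a runnable component, descending into nested groups."""
    if isinstance(member, Group):
        return member.choose()
    return member if member.runnable else None


def sequential_sdf(group: Group) -> Optional[Component]:
    """Assumed example SDF: the first member that yields a runnable component wins."""
    for member in group.members:
        chosen = resolve(member)
        if chosen is not None:
            return chosen
    return None


# One component may belong to more than one group.
irq = Component("irq_handler", runnable=False)
shared = Component("shared_worker")
thread_1 = Component("thread_1")
video = Group("video", sequential_sdf, [irq, shared])
root = Group("root", sequential_sdf, [video, thread_1, shared])
```

Calling `root.choose()` walks the tree from the top-level group down, which is the role the slides assign to the SSDT.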
System Scheduling Decision Tree (SSDT) • Controls the system's computation components • Explicitly or implicitly • Ultimate goal is to make all of it explicit and easily configurable across a wide semantic range • Can co-exist with the default system scheduler • Semantic hooks • SSDT invocation before default scheduler (DS) • Method of making DS skip components under SSDT control
First-Refusal (FR) SSDT • FR-SSDT uses a sequential SDF at the top level • SDT controlling components under group scheduling model has first refusal • Linux SDF (default scheduler) makes the decision if no component under the group model should run • Exclusion of components from DS ensures precise control as needed
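A minimal sketch of the first-refusal decision, assuming the run queue is a plain list and the exclusion notation is a set of names. The function and parameter names (`first_refusal`, `gs_choose`, `default_choose`, `gs_controlled`) are hypothetical and do not come from the kernel patch.

```python
from typing import Callable, List, Optional, Set


def first_refusal(
    gs_choose: Callable[[], Optional[str]],
    default_choose: Callable[[List[str]], Optional[str]],
    run_queue: List[str],
    gs_controlled: Set[str],
) -> Optional[str]:
    """Sequential top-level SDF: group scheduling first, default scheduler second."""
    # The group scheduling tree gets the first chance to pick a component.
    choice = gs_choose()
    if choice is not None:
        return choice
    # Otherwise the default scheduler decides, skipping components under
    # group scheduling control. The run queue itself is left untouched.
    eligible = [task for task in run_queue if task not in gs_controlled]
    return default_choose(eligible)


def pick_first(eligible: List[str]) -> Optional[str]:
    """Stand-in for the default scheduler: take the head of the eligible list."""
    return eligible[0] if eligible else None


run_queue = ["rt_a", "shell", "editor"]
controlled = {"rt_a"}
```

Filtering at decision time, rather than removing entries from the queue, mirrors the approach of consulting an exclusion notation instead of changing the base data structure.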
MLFQ SDT Example • Top level priority SDF maintains the priority equivalence class view • Each priority class is a group using a round-robin policy to share the CPU among members • Dynamic priority adjustment of processes can move them among priority classes
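The MLFQ arrangement can be illustrated with a small model: a priority SDF over classes, each class a round-robin group. The `MLFQ` class and its methods are hypothetical names, level 0 is assumed to be the highest priority, and every member is assumed to be runnable.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class MLFQ:
    """Top-level priority SDF over round-robin groups (illustrative only)."""

    def __init__(self, levels: int) -> None:
        # One round-robin group per priority class; index 0 is highest.
        self.classes: List[Deque[str]] = [deque() for _ in range(levels)]
        self.level_of: Dict[str, int] = {}

    def add(self, name: str, level: int) -> None:
        self.classes[level].append(name)
        self.level_of[name] = level

    def move(self, name: str, new_level: int) -> None:
        """Dynamic priority adjustment: move a process to another class."""
        self.classes[self.level_of[name]].remove(name)
        self.add(name, new_level)

    def choose(self) -> Optional[str]:
        # Priority SDF: the highest non-empty class wins.
        for queue in self.classes:
            if queue:
                name = queue[0]
                queue.rotate(-1)  # round-robin: chosen member goes to the back
                return name
        return None


mlfq = MLFQ(levels=3)
mlfq.add("a", 0)
mlfq.add("b", 0)
mlfq.add("c", 1)
```

Repeated calls to `mlfq.choose()` alternate between `a` and `b`; `c` runs only once both have been moved to a lower class.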
Related Work • Hierarchical Scheduling • Regehr and Stankovic (RTSS 2001) • Likely computationally equivalent (capability) • Distinguished by which abstractions are emphasized • CPU Inheritance Scheduling • Ford and Susarla (Flux Project) • Group scheduling emphasizes • Application structure reflected in groups • Integration of all computation components • Interrupts, tasklets, etc.
Kernel Implementation • Modifies the default Linux scheduler to permit the GS framework to have a chance to choose • Hook to make default scheduler exclude a component is the most subtle change • Changes to existing code to consult the exclusion notation, rather than trying to remove the component from the base data structure • Control for components other than threads is the most significant feature for real-time systems
Middleware Implementation • Currently controls only threads at the user level • Part of DARPA PCES2 project • Layering on top of supplied Linux scheduler requires indirect control through available mechanisms • Separates managing and managed threads into equivalence classes to determine CPU use • Uses Fixed Priority POSIX scheduling model as implemented by Linux SCHED_FIFO
Middleware Implementation • Scheduler has two threads • SSDT thread selects current thread • API thread processes group operation requests • Block Catcher detects when current thread blocks • Signals SSDT thread • Uses SIGSTOP and SIGCONT to control availability of thread for execution • Model is incomplete because it cannot know when a previously blocked thread becomes unblocked
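The SIGSTOP/SIGCONT mechanism can be sketched as a single switch step. This is not the middleware's actual code: `switch_current` is a hypothetical helper, the identifiers are treated as process IDs because `os.kill` targets processes, and the `send` parameter exists only so the logic can be exercised without signalling real processes.

```python
import os
import signal
from typing import Callable, Optional


def switch_current(
    current: Optional[int],
    chosen: Optional[int],
    send: Callable[[int, int], None] = os.kill,
) -> Optional[int]:
    """Stop the old current thread and continue the newly chosen one."""
    if current == chosen:
        return current  # nothing to switch
    if current is not None:
        send(current, signal.SIGSTOP)  # make the old thread unavailable
    if chosen is not None:
        send(chosen, signal.SIGCONT)  # make the chosen thread available
    return chosen


# Dry run with a recording sender instead of real signals.
sent = []
current = switch_current(None, 101, send=lambda pid, sig: sent.append((pid, sig)))
current = switch_current(current, 202, send=lambda pid, sig: sent.append((pid, sig)))
```

A step like this only controls availability. It cannot tell the scheduler when a blocked thread becomes runnable again, which is the incompleteness noted on the slide.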
Thread Priority Classes • Reaper spawns scheduler and then blocks • Scheduler SSDT thread chooses current thread • API thread processes group operation requests • Block Catcher detects when current thread blocks, signals SSDT thread • Non-current threads are both at lower priority and stopped with SIGSTOP • Linux threads at level 0
Context Switch Event Sequence • Thread A is current thread • Timer or other event blocks or pre-empts Thread A • Scheduling Thread runs and selects Thread B, blocks in nanosleep • Context switch to Thread B, which begins its execution
Middleware Implementation Tradeoffs • Portable standards based implementation • POSIX fixed priority scheduling • Socket based group API access • Significantly greater context switch delay compared to existing kernel based implementation • SSDT thread context switch and Block Catcher as well if current thread blocks • Most significant need is SSDT thread notification that a thread unblocks • Scheduler Activations
Performance Evaluation • Metric • Scheduling overhead • Context switch latency (A to B) • Parameters • Number of Processes • CPU bound or I/O bound • User/Kernel Implementation • Others • Signal delivery details and semantics
Kernel Performance • Constant with respect to CPU or I/O bound • Considerably lower than MW version • Simple SSDT • Does not require signal delivery
Middleware Performance • Different with respect to CPU or I/O bound • Requires signal delivery • Block Catcher mechanism adds latency • Considerably higher than kernel version • Simple SSDT • Some extension to existing system semantics required for completeness • Unblocking notification upcall
Group Scheduling – Summary • Provides a flexible control framework within which resource control and distributed end-to-end scheduling constraints can be expressed and enforced • Portable middleware version • Limited by lack of unblocking notification upcall • Implementation under KURT-Linux is simple • ACE system call wrappers • VxWorks threads state change notification
Current Status • Integration of all KURT-Linux OS computational components under group scheduling framework • Recently completed • Michael Frisbie’s Master’s Thesis topic • We are currently working on Group Scheduling control of service classes in • Event Channel • TAO based computations • Includes control of middleware threads, queues, etc.
Future Work • Middleware use of group scheduling to provide support for service classes in Event Channel and TAO • Concurrency constraint representation in KURT-Linux to permit fine grain computation component control under group scheduling • Experimentation with application aware scheduling decision functions • Integrated DSKI/DSUI instrumentation to diagnose/deduce scheduling-related optimizations and fine-grain points of inefficiency (cruft sleuthing)