
Scheduling as a Learned Art*




  1. Scheduling as a Learned Art*
  Christopher Gill, William D. Smart, Terry Tidwell, and Robert Glaubius
  {cdgill, wds, ttidwell, rlg1}@cse.wustl.edu
  Department of Computer Science and Engineering, Washington University, St. Louis, MO, USA
  Fourth International Workshop on Operating Systems Platforms for Embedded Real-Time Applications (OSPERT 2008)
  July 1, 2008, Prague, Czech Republic
  *Research supported in part by NSF awards CNS-0716764 (Cybertrust) and CCF-0448562 (CAREER)

  2. Motivation: Systems with (some) Autonomy
  • Interact with a variable environment
    • Varying degrees of autonomy
    • Performance is deadline sensitive
  • Many activities must run at once
    • Device interrupt handling, computation
    • Communication with other systems/operators
  • Need reliable activity execution
    • Scheduling with shared resources and competing, variable execution times
    • How to guarantee utilizations?
  [Figure: Lewis, Media and Machines Lab, Washington University, St. Louis, MO, USA; Remote Operator Station (for all but full autonomy); Wireless Communication]

  3. More Generally, Open Soft Real-Time Systems
  • Questions of interest are relevant well beyond mobile robotics
    • Robotics is a good touchstone, though
  • In many systems, platform features interact with the physical environment
    • Especially with increased embedding of OS/RTOS platforms everywhere ;-)
  • Abstract view of the problem
    • Diverse concurrent application tasks
    • Task execution times are variable
    • (Soft) deadlines on application tasks
    • Resources shared among tasks
  • Need methods to design and verify scheduling policies accordingly
  What other kinds of embedded systems have similar platform constraints?

  4. Current System Model
  • Threads of execution depend on a shared resource
    • Require mutually exclusive access (e.g., to a CPU) to run
  • Each thread binds the resource when it runs
    • A thread binds the resource for a duration, then releases it
    • Model duration with integer variables: count time quanta
  • Variable execution times with known distributions
    • We assume that each thread’s run-time distribution is known and bounded, and independent of the others
  • Non-preemptive scheduler (repeats perpetually)
    • Scheduler chooses which thread to run (based on policy)
    • Scheduler dispatches the thread, which runs until it yields
    • Scheduler waits until the thread releases the resource
  (A minimal simulation sketch of this model follows below.)
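The following is a minimal, illustrative Python sketch of this system model, not the authors' implementation: the two run-time distributions and the "least-used-first" policy are placeholder assumptions. It shows the core loop the slide describes, in which a non-preemptive scheduler picks a thread, the thread binds the resource for a sampled integer number of quanta, and per-thread usage accumulates.

```python
import random

# Hypothetical bounded run-time distributions (quanta -> probability), one per thread.
RUN_TIME_DISTS = [
    {1: 0.5, 2: 0.5},           # thread 0
    {2: 0.25, 3: 0.5, 4: 0.25}  # thread 1
]

def sample_run_time(dist):
    """Draw a duration (in integer quanta) from a discrete distribution."""
    r, acc = random.random(), 0.0
    for duration, p in dist.items():
        acc += p
        if r <= acc:
            return duration
    return max(dist)  # guard against floating-point round-off

def simulate(policy, steps=1000):
    """Non-preemptive scheduler loop: pick a thread, run it to completion, update usage."""
    usage = [0] * len(RUN_TIME_DISTS)  # per-thread cumulative quanta (the utilization state)
    for _ in range(steps):
        thread = policy(usage)  # scheduling decision based on the current state
        usage[thread] += sample_run_time(RUN_TIME_DISTS[thread])  # thread binds, runs, releases
    return usage

# Placeholder policy: always run the thread with the least accumulated usage.
least_used_first = lambda usage: min(range(len(usage)), key=lambda i: usage[i])

print(simulate(least_used_first))
```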

  5. Uncertainty (but with Observability Post-Hoc)
  • We summarize system state as a vector of integers
    • Represents thread utilizations
    • Threads’ run times come from known, bounded distributions
  • Scheduling a thread changes the system’s (utilization) state
    • Utilization is observed after the thread runs, based on its run time
    • State transition probabilities are based on the run-time distributions
  • This forms a basis for policy design and optimization (see the transition sketch below)
  [Figure: thread run-time distributions (probability vs. time), from Tidwell et al., ATC 2008]
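As an illustration only (reusing the toy distributions assumed in the previous sketch), the transition probabilities over utilization states can be enumerated directly from a thread's run-time distribution: scheduling a thread in state x leads to x with d added to that thread's component, with the probability that its run time is d.

```python
# Hypothetical run-time distributions (quanta -> probability), one per thread.
RUN_TIME_DISTS = [
    {1: 0.5, 2: 0.5},
    {2: 0.25, 3: 0.5, 4: 0.25},
]

def transitions(state, thread):
    """Successor utilization states and their probabilities when `thread` is scheduled."""
    result = {}
    for duration, prob in RUN_TIME_DISTS[thread].items():
        successor = list(state)
        successor[thread] += duration  # the scheduled thread's utilization grows by its run time
        result[tuple(successor)] = result.get(tuple(successor), 0.0) + prob
    return result

# Example: from utilization state (3, 4), schedule thread 1.
print(transitions((3, 4), 1))
# -> {(3, 6): 0.25, (3, 7): 0.5, (3, 8): 0.25}
```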

  6. From Thread Run Times to a Scheduling Policy
  • We model thread scheduling decisions as a Markov Decision Process (MDP) based on thread run times
  • (From ATC ’08) The MDP is given by a 4-tuple (X, A, R, T)
    • X: set of process states (i.e., thread utilization states)
    • A: set of actions (i.e., scheduling a particular thread)
    • R: reward function for taking an action in a state
      • Expected utility of taking that action
      • Distance of the next state(s) from a desired utilization (vector)
    • T: transition function
      • For each action, encodes the probability of moving from a given state to another state
  • Solve the MDP: a policy that is optimal with respect to accumulated reward
  • Fold periodic states: smaller space (recent advance)
  (A value-iteration sketch over a truncated version of such an MDP follows below.)
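Below is a hedged sketch of solving such an MDP by value iteration over a truncated utilization-state space. The truncation bound, discount factor, target utilization vector, and negative-distance reward are illustrative assumptions, not the construction from the ATC '08 paper (which also exploits periodic state-space structure rather than simple truncation).

```python
from itertools import product

RUN_TIME_DISTS = [
    {1: 0.5, 2: 0.5},
    {2: 0.25, 3: 0.5, 4: 0.25},
]
TARGET = (0.5, 0.5)   # desired utilization share per thread (assumption)
BOUND = 12            # truncate each utilization component (assumption)
GAMMA = 0.9           # discount factor (assumption)

def reward(state):
    """Negative distance between the state's utilization shares and the target vector."""
    total = sum(state) or 1
    return -sum(abs(u / total - t) for u, t in zip(state, TARGET))

def successors(state, action):
    """(next_state, probability) pairs for scheduling `action`, clipped to the bound."""
    out = []
    for duration, p in RUN_TIME_DISTS[action].items():
        nxt = list(state)
        nxt[action] = min(nxt[action] + duration, BOUND)
        out.append((tuple(nxt), p))
    return out

states = list(product(range(BOUND + 1), repeat=len(RUN_TIME_DISTS)))
V = {s: 0.0 for s in states}

# Value iteration: back up the expected reward-to-go for the best action in each state.
for _ in range(200):
    V = {s: max(sum(p * (reward(n) + GAMMA * V[n]) for n, p in successors(s, a))
                for a in range(len(RUN_TIME_DISTS)))
         for s in states}

# Greedy policy: in each state, pick the action with the best expected backed-up value.
policy = {s: max(range(len(RUN_TIME_DISTS)),
                 key=lambda a: sum(p * (reward(n) + GAMMA * V[n]) for n, p in successors(s, a)))
          for s in states}
print(policy[(0, 0)])
```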

  7. Partial Observability
  • Local CPU usage is pretty easy to observe exactly
    • E.g., using the Pentium tick counter, or another good time source
  • However, other key properties are noisier
    • E.g., robot location indoors
    • No GPS “position sensor”; wheel slip etc. adds noise during motion
  • How does this relate to scheduling?
    • What if we consider the robot’s progress along a navigation path …
    • … as an activity which must compete for resources with others?
    • Then the robot’s position becomes part of the scheduling state
    • Similar issues may arise in other scheduling cases (e.g., in CPS)
  • Noise in observation produces partial observability
    • E.g., multiple different positions can be equally likely
  • Possible approach: Partially Observable MDPs (POMDPs)
    • Reason on belief states to get an MDP transition function (a big space)
    • (A minimal belief-update sketch follows below.)
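The core of the POMDP idea, a Bayesian update of a belief over hidden states after each action and observation, can be sketched in a few lines. The two-position robot, its transition model, and its sensor model below are hypothetical placeholders, not part of the talk.

```python
def belief_update(belief, T, O, action, observation):
    """Bayes-filter update of a belief (dict: state -> probability).

    T[action][s][s2] is the probability of moving from s to s2 under `action`;
    O[s2][observation] is the probability of seeing `observation` in s2.
    """
    new_belief = {}
    for s_next in {s2 for s in belief for s2 in T[action][s]}:
        # Predict: marginalize the transition model over the current belief,
        # then correct: weight by the likelihood of the observation.
        pred = sum(belief[s] * T[action][s].get(s_next, 0.0) for s in belief)
        new_belief[s_next] = O[s_next].get(observation, 0.0) * pred
    norm = sum(new_belief.values()) or 1.0
    return {s: p / norm for s, p in new_belief.items()}

# Hypothetical example: a robot that is "near" or "far" from a wall,
# with a noisy "move" action and a noisy range sensor.
T = {"move": {"near": {"near": 0.2, "far": 0.8}, "far": {"far": 1.0}}}
O = {"near": {"short": 0.9, "long": 0.1}, "far": {"short": 0.3, "long": 0.7}}
print(belief_update({"near": 0.5, "far": 0.5}, T, O, "move", "long"))
```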

  8. Observation Lag
  • State observations also may incur temporal lag
    • E.g., a detailed scan of an area with a range-finding laser
    • However, time passes while the scan is performed
    • The robot or environment may move while the scan is being done
  • As with partial observability, we need an extension of the basic MDP model to address observation lag
    • In Semi-MDPs (SMDPs), each action still causes a single state change, but may take a variable amount of time
    • SMDP extensions of MDP methods exist for finding an optimal policy (see the sketch below)
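As an illustrative sketch only, an SMDP-style Bellman backup differs from the MDP case mainly in that the discount is raised to the power of the action's (variable) duration. The duration distributions, reward, and successor functions below are hypothetical stand-ins.

```python
GAMMA = 0.9

# Hypothetical duration distributions (duration in quanta -> probability) per action.
DURATIONS = {"scan": {3: 0.5, 4: 0.5}, "move": {1: 1.0}}

def smdp_backup(state, action, reward, successors, V):
    """One SMDP Bellman backup: discount each outcome by gamma**duration.

    `successors(state, action, duration)` yields (next_state, probability) pairs;
    `reward(state, action, duration)` is the (possibly duration-dependent) reward.
    """
    value = 0.0
    for duration, p_d in DURATIONS[action].items():
        for nxt, p_n in successors(state, action, duration):
            value += p_d * p_n * (reward(state, action, duration)
                                  + GAMMA ** duration * V[nxt])
    return value

# Toy usage: two abstract states; every action leads back to state "s0".
V = {"s0": 0.0, "s1": 0.0}
succ = lambda s, a, d: [("s0", 1.0)]
rew = lambda s, a, d: -float(d)   # e.g., penalize time spent scanning
print(smdp_backup("s1", "scan", rew, succ, V))
```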

  9. Neglect Tolerance
  • Need to schedule more than one entire-system behavior at once
    • Can transform into scheduling interim sub-tasks, as before
    • However, a behavior has its own (possibly dynamic) structure
      • E.g., navigation to cover a room, while mapping its boundary
      • Resource contention, control/data dependence
  • Scheduling becomes a multi-criteria optimization
    • Sub-tasks may have (potentially hard) deadlines
    • E.g., decide to turn or stop before hitting a wall
  • Spectrum: remote control to complete autonomy
    • Higher neglect tolerance needs more on-board scheduling
    • Uncertainty, observability, and temporal-lag issues as before
  • Open problem: formalize tractably, model parametrically
    • A multi-disciplinary (RT/ML) approach is still needed

  10. Learning (aka “Good Scheduler, Bad Scheduler”)
  • We base scheduling decisions on a value function
    • Captures a state-action notion of long-term utility
    • Based on expected rewards from current and future actions
  • But knowing complete distributions is daunting in practice
  • Reinforcement learning appears promising for this
    • A stochastic variant of dynamic programming
    • Control decisions learned from direct observation
  • Start by dividing time into discrete steps
    • At each step, the system is in one of a discrete set of states
    • The scheduler observes the state and chooses an action from a finite set
    • Running the action changes the system state at the next time step
    • The scheduler receives a reward for the immediate effect of the action
    • It estimates the value function; the resulting model is exactly an MDP
  (A minimal learning sketch of this loop follows below.)
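A minimal, hedged sketch of that loop is shown below. The slide does not name a specific reinforcement learning algorithm, so tabular Q-learning here is one illustrative choice; the learning rate, exploration rate, capped state encoding, and toy run-time distributions are all assumptions reused from the earlier sketches.

```python
import random
from collections import defaultdict

RUN_TIME_DISTS = [{1: 0.5, 2: 0.5}, {2: 0.25, 3: 0.5, 4: 0.25}]
TARGET = (0.5, 0.5)          # desired utilization shares (assumption)
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
CAP = 12                     # cap utilization counts to keep the state space finite (assumption)

def reward(state):
    """Reward for the immediate effect of an action: closeness to the target utilization."""
    total = sum(state) or 1
    return -sum(abs(u / total - t) for u, t in zip(state, TARGET))

def step(state, action):
    """Run the chosen thread; its sampled run time moves the utilization state."""
    duration = random.choices(list(RUN_TIME_DISTS[action]),
                              weights=RUN_TIME_DISTS[action].values())[0]
    nxt = list(state)
    nxt[action] = min(nxt[action] + duration, CAP)
    return tuple(nxt)

Q = defaultdict(float)       # Q[(state, action)] -> estimated long-term utility
actions = range(len(RUN_TIME_DISTS))
state = (0, 0)
for _ in range(50_000):
    # Epsilon-greedy: mostly exploit the current value estimates, sometimes explore.
    a = (random.choice(list(actions)) if random.random() < EPSILON
         else max(actions, key=lambda x: Q[(state, x)]))
    nxt = step(state, a)
    # Standard Q-learning temporal-difference update of the value function estimate.
    Q[(state, a)] += ALPHA * (reward(nxt) + GAMMA * max(Q[(nxt, x)] for x in actions)
                              - Q[(state, a)])
    state = nxt

print(max(actions, key=lambda x: Q[((0, 0), x)]))  # learned action for the initial state
```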

  11. Related Work
  • Reference monitor approaches
    • Interposition architectures
      • E.g., Ostia: user/kernel-level (Garfinkel et al.)
    • Separation kernels
      • E.g., ARINC-653, MILS (Vanfleet et al.)
  • Scheduling policy design
    • Hierarchical scheduling
      • E.g., HLS and its extensions (Regehr et al.)
      • E.g., Group scheduling (Niehaus et al.)
  • State space construction and verification
    • (Timed automata) model checking
      • E.g., IF (Sifakis et al.)
    • Quasi-cyclic state space reduction
      • E.g., Bogor (Robby et al.)

  12. Concluding Remarks
  • The MDP approach maintains rational scheduling control
    • Even when thread run times vary stochastically
    • Encodes rather than presupposes utilizations
    • Allows policy verification (e.g., over utilization states)
  • Ongoing and future work
    • State space reduction via quasi-cyclic structure
    • Verification over continuous/discrete states
    • Kernel-level, non-bypassable policy enforcement
    • Automated learning to discover scheduling policies
      • E.g., via RL for MDPs, POMDPs, SMDPs
  • Project web page
    • Supported by NSF grant CNS-0716764
    • http://www.cse.wustl.edu/~cdgill/Cybertrust/
