Scheduling as a Learned Art*
Christopher Gill, William D. Smart, Terry Tidwell, and Robert Glaubius
{cdgill, wds, ttidwell, rlg1}@cse.wustl.edu
Department of Computer Science and Engineering, Washington University, St. Louis, MO, USA
Fourth International Workshop on Operating Systems Platforms for Embedded Real-Time Applications (OSPERT 2008)
July 1, 2008, Prague, Czech Republic
*Research supported in part by NSF awards CNS-0716764 (Cybertrust) and CCF-0448562 (CAREER)
Motivation: Systems with (some) Autonomy
• Interact with a variable environment, with varying degrees of autonomy
• Performance is deadline sensitive
• Many activities must run at once
  • Device interrupt handling, computation
  • Communication with other systems/operators
• Need reliable activity execution
  • Scheduling with shared resources and competing, variable execution times
  • How to guarantee utilizations?
[Figure: robot platform, Lewis Media and Machines Lab, Washington University, St. Louis, MO, USA; remote operator station (for all but full autonomy) connected via wireless communication]
More Generally, Open Soft Real-Time Systems
• Questions of interest are relevant well beyond mobile robotics
  • Robotics is a good touchstone, though
• In many systems, platform features interact with the physical environment
  • Especially with increased embedding of OS/RTOS platforms everywhere ;-)
• Abstract view of the problem
  • Diverse concurrent application tasks
  • Task execution times are variable
  • (Soft) deadlines on application tasks
  • Resources shared among tasks
  • Need methods to design and verify scheduling policies accordingly
What other kinds of embedded systems have similar platform constraints?
Current System Model
• Threads of execution depend on a shared resource
  • Require mutually exclusive access (e.g., to a CPU) to run
• Each thread binds the resource when it runs
  • A thread binds the resource for a duration, then releases it
  • Model duration with integer variables: count time quanta
• Variable execution times with known distributions
  • We assume that each thread's run-time distribution is known and bounded, and independent of the others
• Non-preemptive scheduler (repeats perpetually)
  • Scheduler chooses which thread to run (based on policy)
  • Scheduler dispatches the thread, which runs until it yields
  • Scheduler waits until the thread releases the resource
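A minimal simulation sketch of this system model, under the assumptions stated above (integer time quanta, known bounded run-time distributions, non-preemptive dispatch). The class and function names, the example distributions, and the toy policy are illustrative assumptions, not part of the original work.

```python
# Sketch of the non-preemptive system model: a scheduler repeatedly picks a
# thread, which binds the resource for a stochastic number of quanta, then
# releases it. All names and distributions here are illustrative assumptions.
import random

class Thread:
    def __init__(self, name, duration_dist):
        # duration_dist: {run_time_in_quanta: probability}, known and bounded
        self.name = name
        self.duration_dist = duration_dist

    def run(self):
        # Sample a run time (in integer quanta) from the known distribution.
        times, probs = zip(*self.duration_dist.items())
        return random.choices(times, weights=probs)[0]

def simulate(threads, policy, steps):
    """Non-preemptive loop: choose a thread per the policy, let it hold the
    resource until it yields, then account for the elapsed quanta."""
    usage = [0] * len(threads)          # per-thread quanta consumed so far
    for _ in range(steps):
        i = policy(usage)               # scheduling decision based on state
        usage[i] += threads[i].run()    # resource is held until release
    return usage

if __name__ == "__main__":
    threads = [Thread("A", {1: 0.5, 2: 0.5}), Thread("B", {2: 0.25, 3: 0.75})]
    # Toy policy (assumption): pick the thread furthest below its fair share.
    fair = [0.5, 0.5]
    policy = lambda u: min(range(len(u)),
                           key=lambda i: u[i] - fair[i] * (sum(u) + 1))
    print(simulate(threads, policy, 20))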
Uncertainty (but with Observability Post-Hoc)
• We summarize system state as a vector of integers
  • Represents thread utilizations
• Threads' run times come from known, bounded distributions
• Scheduling a thread changes the system's (utilization) state
  • Utilization is observed after the thread runs, based on its run time
  • State transition probabilities are based on the run-time distributions
• This forms a basis for policy design and optimization
[Figure: example thread run-time distributions (probability vs. time), from Tidwell et al., ATC 2008]
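A small sketch of how transition probabilities over utilization states follow directly from a thread's run-time distribution. The function name and state representation are illustrative assumptions.

```python
# Given a utilization state (tuple of per-thread quanta) and the thread chosen
# by the scheduler, the known run-time distribution induces the distribution
# over successor states. Names here are illustrative assumptions.
def transition_probs(state, thread_index, duration_dist):
    """Return {next_state: probability} for dispatching thread_index in state."""
    result = {}
    for run_time, prob in duration_dist.items():
        nxt = list(state)
        nxt[thread_index] += run_time   # only the dispatched thread's usage grows
        result[tuple(nxt)] = result.get(tuple(nxt), 0.0) + prob
    return result

# Example: dispatching thread 0 from state (0, 0) with run times of 1 or 2 quanta.
print(transition_probs((0, 0), 0, {1: 0.5, 2: 0.5}))
# {(1, 0): 0.5, (2, 0): 0.5}
```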
From Thread Run Times to a Scheduling Policy
• We model thread scheduling decisions as a Markov Decision Process (MDP) based on thread run times
• (From ATC '08) The MDP is given by a 4-tuple (X, A, R, T):
  • X: set of process states (i.e., thread utilization states)
  • A: set of actions (i.e., scheduling a particular thread)
  • R: reward function for taking an action in a state
    • Expected utility of taking that action
    • Based on the distance of the next state(s) from a desired utilization (vector)
  • T: transition function
    • For each action, encodes the probability of moving from a given state to another state
• Solving the MDP yields a policy that is optimal with respect to accumulated reward
• Folding periodic states yields a smaller state space (recent advance)
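A minimal value-iteration sketch for an MDP (X, A, R, T) of this kind, over a small explicitly enumerated state set. The truncation of the (in principle unbounded) utilization state space and all names are simplifying assumptions; this is not the authors' exact construction, which folds periodic states.

```python
# Value iteration over an enumerated state set (assumption: states, actions,
# T, and R are supplied as dictionaries). T[(x, a)] maps next states to
# probabilities; R[(x, a)] is the expected reward for taking a in x.
def value_iteration(states, actions, T, R, gamma=0.9, iters=200):
    V = {x: 0.0 for x in states}
    for _ in range(iters):
        # Bellman backup: best action value under the current estimate V.
        V = {x: max(R[(x, a)] +
                    gamma * sum(p * V.get(x2, 0.0)
                                for x2, p in T[(x, a)].items())
                    for a in actions)
             for x in states}
    # Greedy policy with respect to the converged value function.
    policy = {x: max(actions,
                     key=lambda a: R[(x, a)] +
                     gamma * sum(p * V.get(x2, 0.0)
                                 for x2, p in T[(x, a)].items()))
              for x in states}
    return V, policy
```

Here T can be built from the per-thread run-time distributions (as in the transition sketch above), and R can penalize the distance of successor states from the desired utilization vector.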
Partial Observability
• Local CPU usage is fairly easy to observe exactly
  • E.g., using the Pentium tick counter or another good time source
• However, other key properties are noisier
  • E.g., robot location indoors: no GPS "position sensor", and wheel slip etc. adds noise during motion
• How does this relate to scheduling?
  • What if we consider the robot's progress along a navigation path ...
  • ... as an activity that must compete for resources with others?
  • Then the robot's position becomes part of the scheduling state
  • Similar issues may arise in other scheduling cases (e.g., in CPS)
• Noise in observation produces partial observability
  • E.g., multiple different positions can be equally likely
• Possible approach: Partially Observable MDPs (POMDPs)
  • Reason over belief states to get an MDP transition function (a large space)
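A sketch of the basic step behind reasoning over belief states: a Bayesian belief update after taking an action and receiving a noisy observation. The observation model O and the function name are illustrative assumptions.

```python
# Discrete POMDP belief update (sketch). belief: {state: prob};
# T[(x, a)] -> {x': prob}; O[(x', z)] -> probability of observing z in x'.
# The model dictionaries and names are illustrative assumptions.
def belief_update(belief, action, observation, T, O):
    """Return the posterior belief after taking `action` and seeing `observation`."""
    new_belief = {}
    for x, b in belief.items():
        for x2, p in T[(x, action)].items():
            new_belief[x2] = (new_belief.get(x2, 0.0)
                              + b * p * O.get((x2, observation), 0.0))
    total = sum(new_belief.values())
    # Normalize so the posterior sums to one (if the observation was possible).
    return {x: v / total for x, v in new_belief.items()} if total > 0 else new_belief
```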
Observation Lag
• State observations also may incur temporal lag
  • E.g., a detailed scan of an area with a range-finding laser
  • However, time passes while the scan is performed
  • The robot or its environment may move while the scan is being done
• As with partial observability, we need a new extension to the basic MDP model to address observation lag
  • In Semi-MDPs (SMDPs), an action causes one state change, but the time it takes may vary
  • SMDP extensions to MDPs exist for finding an optimal policy
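A sketch of the SMDP-style value backup, assuming a joint model over successor states and elapsed durations: because an action (e.g., dispatching a thread, or completing a scan) takes a variable number of quanta, the discount is applied per elapsed quantum. The model representation and names are illustrative assumptions.

```python
# SMDP backup (sketch): model[(x, a)] -> {(x', duration): prob}.
# Returns the backed-up value of taking `action` in `state` given estimate V.
# All names and the model layout are illustrative assumptions.
def smdp_backup(V, state, action, model, R, gamma=0.9):
    return R[(state, action)] + sum(
        prob * (gamma ** duration) * V.get(next_state, 0.0)
        for (next_state, duration), prob in model[(state, action)].items())
```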
Neglect Tolerance
• Need to schedule more than one entire-system behavior at once
  • Can transform this into scheduling interim sub-tasks, as before
  • However, a behavior has its own (possibly dynamic) structure
    • E.g., navigating to cover a room while mapping its boundary
  • Resource contention, control/data dependence
• Scheduling becomes a multi-criteria optimization
  • Sub-tasks may have (potentially hard) deadlines
    • E.g., deciding to turn or stop before hitting a wall
• Spectrum: from remote control to complete autonomy
  • Higher neglect tolerance needs more on-board scheduling
  • Uncertainty, observability, and temporal lag issues arise as before
• Open problem: formalize tractably, model parametrically
  • A multi-disciplinary (real-time/machine-learning) approach is still needed
Learning (aka "Good Scheduler, Bad Scheduler")
• We base scheduling decisions on a value function
  • Captures a state-action notion of long-term utility
  • Based on expected rewards from current and future actions
  • But knowing complete distributions is daunting in practice
• Reinforcement learning appears promising for this
  • A stochastic variant of dynamic programming
  • Control decisions are learned from direct observation
• Start by dividing time into discrete steps
  • At each step, the system is in one of a discrete set of states
  • The scheduler observes the state and chooses an action from a finite set
  • Running the action changes the system state at the next time step
  • The scheduler receives a reward for the immediate effect of the action
  • It estimates the value function; the resulting model is exactly an MDP
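A minimal tabular Q-learning sketch, one standard way to estimate a state-action value function from direct observation as described above. The environment interface (env.reset/env.step returning state, reward, done) and all hyperparameters are illustrative assumptions.

```python
# Tabular Q-learning (sketch): learn Q[(state, action)] from observed
# transitions and rewards, without knowing the run-time distributions.
# The env interface and hyperparameters are illustrative assumptions.
import random
from collections import defaultdict

def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    Q = defaultdict(float)
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy: mostly exploit the current estimate, sometimes explore.
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)   # observe effect + reward
            best_next = max(Q[(next_state, a)] for a in actions)
            # Temporal-difference update toward the observed reward plus the
            # discounted estimate of the next state's value.
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```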
Related Work
• Reference monitor approaches
  • Interposition architectures
    • E.g., Ostia: user/kernel-level (Garfinkel et al.)
  • Separation kernels
    • E.g., ARINC-653, MILS (Vanfleet et al.)
• Scheduling policy design
  • Hierarchical scheduling
    • E.g., HLS and its extensions (Regehr et al.)
    • E.g., Group scheduling (Niehaus et al.)
• State space construction and verification
  • (Timed automata) model checking
    • E.g., IF (Sifakis et al.)
  • Quasi-cyclic state space reduction
    • E.g., Bogor (Robby et al.)
Concluding Remarks
• The MDP approach maintains rational scheduling control
  • Even when thread run times vary stochastically
  • Encodes, rather than presupposes, utilizations
  • Allows policy verification (e.g., over utilization states)
• Ongoing and future work
  • State space reduction via quasi-cyclic structure
  • Verification over continuous/discrete states
  • Kernel-level non-bypassable policy enforcement
  • Automated learning to discover scheduling policies
    • E.g., via RL for MDPs, POMDPs, SMDPs
• Project web page
  • Supported by NSF grant CNS-0716764
  • http://www.cse.wustl.edu/~cdgill/Cybertrust/