Scheduling Policy Design using Stochastic Dynamic Programming
Robert Glaubius, Dissertation Defense, November 13, 2009
Sensing on a Mobile Robot
• Camera for mission objectives (e.g., finding people).
• Laser range-finder for obstacle detection.
• Some common obstacles may escape laser detection.
• Use the camera to supplement obstacle detection.
Resource Contention
• Now two tasks need the camera.
• We need a rational policy for allocating the camera to each task.
• Assign each task a resource time share (e.g., a 67%/33% split).
Task Scheduling Model
• Multiple repeating tasks use a mutually exclusive shared resource.
• Each task has a utilization target specifying its share, u = (u1, …, un).
• Each task instance has stochastic duration.
• Tasks may not be preempted.
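A minimal sketch of this task model in Python. The `Task` class, its field names, and the camera/laser example shares are illustrative assumptions, not the dissertation's notation:

```python
import random
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    target: float        # utilization target u_i; the targets sum to 1
    durations: list      # possible (non-preemptible) run lengths
    weights: list        # their probabilities

    def sample_duration(self):
        # Each dispatched instance runs for a stochastic duration.
        return random.choices(self.durations, weights=self.weights)[0]

tasks = [Task("camera", 1/3, [1, 2], [0.5, 0.5]),
         Task("laser", 2/3, [1], [1.0])]
assert abs(sum(t.target for t in tasks) - 1.0) < 1e-9
assert tasks[0].sample_duration() in (1, 2)
```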
The Main Contribution
• Scheduling policy design techniques for non-preemptive, non-deterministic systems that are share aware, scalable, and adaptive.
Share Aware Scheduling
• System state: the cumulative resource usage of each task (e.g., (8,17) when task 1 has used 8 time units and task 2 has used 17).
• Dispatching a task moves the system stochastically through the state space according to that task's duration.
Share Aware Scheduling
• The utilization target u induces a ray {αu : α ≥ 0} through the state space (e.g., u = (1/3, 2/3)).
• Encode "goodness" relative to the share as a cost.
• Require that costs grow with distance from the utilization ray.
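One concrete cost with this property, as a sketch: the slides only require that cost grow with distance from the ray, so the exact form here (squared deviation from the on-target state at the same total elapsed time) is an assumption:

```python
def ray_cost(x, u):
    # Compare x against the point t*u on the utilization ray with the
    # same total elapsed time t = sum(x); sum(u) == 1 is assumed.
    t = sum(x)
    return sum((xi - t * ui) ** 2 for xi, ui in zip(x, u))

u = (1/3, 2/3)
assert ray_cost((1, 2), u) < 1e-12          # on the ray: zero cost
assert ray_cost((3, 0), u) > ray_cost((2, 1), u) > ray_cost((1, 2), u)
```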
Task Scheduling MDP
• States are the cumulative resource utilization of each task.
• Actions correspond to dispatching a task.
• Transitions are dictated by task duration distributions.
• Costs grow with deviation from the share target.
• Goal: find a policy that minimizes long-term cost.
Transition Structure
• Transitions are state-independent: the relative distribution over successor states is the same in each state.
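State independence can be made concrete as follows. In this sketch (the `(run_length, probability)` encoding is an illustrative assumption), the successor is always the current state plus a duration-sized step along the dispatched task's axis, with the step distribution never depending on the state:

```python
import random

def step(x, a, durations):
    # durations[a]: list of (run_length, probability) pairs for task a.
    lengths = [d for d, _ in durations[a]]
    probs = [p for _, p in durations[a]]
    d = random.choices(lengths, weights=probs)[0]
    x = list(x)
    x[a] += d      # same relative successor distribution in every state
    return tuple(x)

durations = {0: [(1, 0.5), (2, 0.5)], 1: [(1, 1.0)]}
assert step((8, 17), 0, durations) in {(9, 17), (10, 17)}
assert step((8, 17), 1, durations) == (8, 18)
```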
Cost Structure
• States are equivalent under costs: states along lines parallel to the utilization ray have equal cost.
Equivalence Classes
• The transition and cost structure together induce a state equivalence.
• Equivalent states have the same optimal long-term cost and policy!
Periodicity
• Periodic structure allows us to remove all but one exemplar from each equivalence class.
Wrapped state model
• Remove all but one exemplar from each equivalence class.
• Actions and costs remain unchanged.
• Remap transitions into removed states onto the corresponding exemplars.
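For a rational target the equivalence classes repeat along an integer period vector p on the utilization ray (p = (1, 2) for u = (1/3, 2/3)). A sketch of the exemplar mapping, with the specific wrapping rule assumed for illustration:

```python
def wrap(x, p):
    # Subtract the period vector while every coordinate stays
    # non-negative; the result is the class exemplar nearest the origin.
    x = list(x)
    while all(xi >= pi for xi, pi in zip(x, p)):
        x = [xi - pi for xi, pi in zip(x, p)]
    return tuple(x)

p = (1, 2)                       # period for u = (1/3, 2/3)
assert wrap((9, 18), p) == (0, 0)
assert wrap((8, 17), p) == wrap((9, 19), p)   # same equivalence class
```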
Bounded state model
• Inexpensive states are near the utilization target.
• Good policies should keep costs small.
• We can therefore truncate the state space by bounding costs.
Bounded state model
• Mapping "dangling" transitions to a high-cost absorbing state guarantees that we find bounded-cost policies when they exist.
• Bounded costs guarantee bounded deviation from the resource share.
Scheduling Policy Design
• Iteratively increase the bounds and re-solve the problem.
• As the bounds increase, the bounded-model solution converges to the optimal wrapped-model policy.
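The bounded model can then be solved with standard dynamic programming. A minimal value-iteration sketch, under several simplifying assumptions not taken from the slides: unit task durations, squared-deviation costs, a discounted rather than average-cost criterion, and truncation by total elapsed time instead of by cost:

```python
import itertools

SINK = "sink"   # absorbing high-cost state for dangling transitions

def bounded_vi(u, horizon, gamma=0.9, iters=300):
    n = len(u)
    states = [s for s in itertools.product(range(horizon + 1), repeat=n)
              if sum(s) <= horizon]
    in_model = set(states)

    def cost(s):
        t = sum(s)
        return sum((si - t * ui) ** 2 for si, ui in zip(s, u))

    V = dict.fromkeys(states, 0.0)
    V[SINK] = 0.0
    for _ in range(iters):
        Vn = {SINK: 100.0 + gamma * V[SINK]}   # absorbing, large cost
        for s in states:
            q = []
            for a in range(n):   # dispatch task a for one time unit
                s2 = tuple(si + (1 if i == a else 0)
                           for i, si in enumerate(s))
                q.append(V[s2] if s2 in in_model else V[SINK])
            Vn[s] = cost(s) + gamma * min(q)
        V = Vn
    return V

V = bounded_vi((0.5, 0.5), horizon=6)
assert V[(1, 1)] < V[(2, 0)]   # staying near the share is cheaper
```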
Automating Model Discovery
• ESPI: Expanding State Policy Iteration.
1. Start with a policy that reaches only finitely many states from (0, …, 0), e.g., always run the most underutilized task.
2. Enumerate enough states to evaluate and improve that policy.
3. If the policy cannot be improved, stop; otherwise, return to step 2 with the improved policy.
Policy Evaluation Envelope
• Enumerate the states reachable from the initial state: breadth-first state space exploration under the current policy, starting from the initial state.
Policy Improvement Envelope
• Consider alternative actions at each state.
• Close under the current policy using breadth-first expansion.
• Evaluate and improve the policy within this envelope.
ESPI Termination
• As long as the initial policy has finite closure, each ESPI iteration terminates.
• This is satisfied by the policy that always runs the most underutilized task.
• The policy strictly improves at each iteration.
• Empirically, ESPI terminates on task scheduling MDPs.
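The envelope construction at the heart of ESPI is ordinary breadth-first reachability under the current policy. A sketch, where the `successors` interface and the wrap-at-the-period toy example are assumptions:

```python
from collections import deque

def reachable_envelope(start, policy, successors):
    # All states reachable from `start` when actions follow `policy`.
    seen = {start}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for s2 in successors(s, policy(s)):
            if s2 not in seen:
                seen.add(s2)
                queue.append(s2)
    return seen

def successors(s, a):
    # Deterministic unit durations, wrapped along the period (1, 1).
    s2 = list(s)
    s2[a] += 1
    while all(x >= 1 for x in s2):
        s2 = [x - 1 for x in s2]
    return [tuple(s2)]

underutilized = lambda s: 0 if s[0] <= s[1] else 1   # fair 50/50 share
assert reachable_envelope((0, 0), underutilized, successors) == {(0, 0), (1, 0)}
```

Note that the most-underutilized policy has finite closure here, which is exactly the property ESPI needs to terminate each iteration.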
Comparing Design Methods
• Policy performance is normalized and centered on the ESPI solution.
• Larger bounded state models yield the ESPI solution.
Share Aware Scheduling
• The MDP representation allows consistent approximation of the optimal scheduling policy.
• Empirically, the bounded-model and ESPI solutions appear optimal.
• However, the approach scales exponentially in the number of tasks.
Addressing the Curse of Dimensionality
• Focus attention on a restricted class of appropriate scheduling policies.
• How do we choose and parameterize these policies?
Two-task MDP Policy
• Scheduling policies induce a partition of the state space with a boundary parallel to the share target.
• Establish a decision offset to identify the partition boundary.
• This is sufficient in two dimensions, but what about higher dimensions?
Time Horizons
• The time horizon Ht = {x : x1 + x2 + … + xn = t} collects all states with total elapsed time t.
(Figure: horizons H0 through H4 and the utilization ray u, for two tasks and for three tasks with simplex vertices (2,0,0), (0,2,0), (0,0,2).)
Three-task MDP Policy
• Action partitions meet along a decision ray that is parallel to the utilization ray.
• Action partitions are roughly cone-shaped.
(Figure: partitions at time horizons t = 10, 20, 30.)
Parameterizing the Partition
• Specify a decision offset at the intersection of the partitions.
• Anchor action vectors at the decision offset to approximate the partitions.
• The conic policy selects the action vector best aligned with the displacement between the query state and the decision offset.
(Figure: action vectors a1, a2, a3 around a query state x.)
Conic Policy Parameterization
• Decision offset d.
• Action vectors a1, a2, …, an.
• This is sufficient to partition each time horizon into n regions.
• Tune policies through local search.
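Action selection for a conic policy then reduces to comparing the displacement from the decision offset against each action vector. A sketch; the dot-product alignment score is an assumption consistent with "best aligned":

```python
def conic_action(x, d, action_vectors):
    # Choose the action whose vector best aligns with x - d.
    disp = [xi - di for xi, di in zip(x, d)]

    def alignment(v):
        norm = sum(vi * vi for vi in v) ** 0.5
        return sum(vi * wi for vi, wi in zip(v, disp)) / (norm or 1.0)

    return max(range(len(action_vectors)),
               key=lambda a: alignment(action_vectors[a]))

# Toy example: with offset (2, 2) and axis-aligned action vectors, a
# state displaced along the first coordinate selects action 0.
assert conic_action((5, 1), (2, 2), [(1, 0), (0, 1)]) == 0
```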
Four Tasks (results figure)
Ten Tasks (results figure)
Varying Numbers of Tasks (results figure)
Addressing the Curse of Dimensionality
• The conic policy approximates the geometry of the policies found using ESPI.
• The number of parameters grows only quadratically with the number of tasks.
• The class contains cost-bounded, stable policies.
• Performance is competitive with ESPI and improves on heuristic policies.
Conclusions
• We have addressed a novel set of scheduling concerns present in many cyber-physical systems: non-preemptive resource semantics, stochastic task execution times, and enforcement of a user-selected resource share.
• Our model-based solution methods provide strong approximations to optimal policies.
• Our conic policies allow us to scale model-based techniques to larger problems.
Further Contributions
• Adaptive scheduling (online learning): the sample complexity of learning is similar to optimal control of single-state MDPs, and the domain enforces rational exploration without explicit exploration mechanisms.
• Formal guarantees: existence of optimal scheduling policies, periodicity of optimal scheduling policies, existence of cost-bounded policies, and existence of stable conic policies.
Publications
• R. Glaubius, T. Tidwell, C. Gill, and W.D. Smart, "Scheduling Policy Design for Autonomic Systems", International Journal on Autonomous and Adaptive Communications Systems, 2(3):276-296, 2009.
• R. Glaubius, T. Tidwell, C. Gill, and W.D. Smart, "Scheduling Design and Verification for Open Soft Real-Time Systems", RTSS 2008.
• R. Glaubius, T. Tidwell, B. Sidoti, D. Pilla, J. Meden, C. Gill, and W.D. Smart, "Scalable Scheduling Policy Design for Open Soft Real-Time Systems", Tech. Report WUCSE-2009-71, 2009 (under review for RTAS 2010).
• R. Glaubius, T. Tidwell, C. Gill, and W.D. Smart, "Scheduling Design with Unknown Execution Time Distributions or Modes", Tech. Report WUCSE-2009-15, 2009.
• T. Tidwell, R. Glaubius, C. Gill, and W.D. Smart, "Scheduling for Reliable Execution in Autonomic Systems", ATC 2008.
• C. Gill, W.D. Smart, T. Tidwell, and R. Glaubius, "Scheduling as a Learned Art", OSPERT 2008.
Acknowledgements
• Bill Smart
• Chris Gill
• Terry Tidwell
• David Pilla, Braden Sidoti, and Justin Meden
Questions?
Comparison to Real-Time Scheduling
• Earliest-Deadline-First (EDF) scheduling enforces timeliness by meeting task deadlines, but it is not share aware.
• We introduce deadlines as a function of worst-case execution time.
• The miss rate is a function of deadline tightness.
Varying Temporal Resolution (results figure)
Stable Conic Policies
• Stable conic policies are guaranteed to exist.
• For example, set each action vector to point opposite its corresponding vertex of the time horizon.
• This induces a vector field that stochastically orbits the decision ray.
(Figure: horizon simplex with vertices (t,0,0), (0,t,0), (0,0,t).)
More Tasks, Higher Cost
• Simple problem: fair-share scheduling of n deterministic tasks with unit duration.
• Trajectories under round-robin scheduling:
• 2 tasks: E{c(x)} = 1/2. Trajectory: (0,0) → (1,0) → (1,1) → (0,0); costs c(0,0) = 0, c(1,0) = 1.
• 3 tasks: E{c(x)} = 8/9. Trajectory: (0,0,0) → (1,0,0) → (1,1,0) → (1,1,1) → (0,0,0); costs c(0,0,0) = 0, c(1,0,0) = c(1,1,0) = 4/3.
• n tasks: E{c(x)} = (n+1)(n-1)/(3n).
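These numbers can be checked mechanically. The cost form below (twice the squared distance to the fair-share point at the same elapsed time) is inferred from c(1,0) = 1 and c(1,0,0) = 4/3, so the scaling factor is an assumption; the averages it produces match the slide exactly:

```python
from fractions import Fraction as F

def c(x):
    # Twice the squared deviation from the fair-share point (t/n, ..., t/n).
    n, t = len(x), sum(x)
    return 2 * sum((F(xi) - F(t, n)) ** 2 for xi in x)

def round_robin_avg_cost(n):
    # One round-robin cycle visits the states where the first k tasks
    # have run once, for k = 0 .. n-1 (k = n wraps back to the origin).
    cycle = [tuple([1] * k + [0] * (n - k)) for k in range(n)]
    return sum(c(s) for s in cycle) / n

assert c((1, 0)) == 1 and c((1, 0, 0)) == F(4, 3)
assert round_robin_avg_cost(2) == F(1, 2)
assert round_robin_avg_cost(3) == F(8, 9)
assert all(round_robin_avg_cost(n) == F((n + 1) * (n - 1), 3 * n)
           for n in range(2, 12))
```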
Share Complexity (results figure)