Scheduling Policy Design using Stochastic Dynamic Programming
Robert Glaubius, Dissertation Defense, November 13, 2009
Sensing on a Mobile Robot
• Camera for mission objectives (e.g., finding people).
• Laser range-finder for obstacle detection.
• Some common obstacles may escape laser detection.
• Use the camera to supplement obstacle detection.
Resource Contention
• Now two tasks need the camera.
• We need a rational policy for allocating the camera to each task.
• Assign each task a resource time share (e.g., a 67%/33% split).
Task Scheduling Model
• Multiple repeating tasks use a mutually exclusive shared resource.
• Each task has a utilization target specifying its share, u = (u1, …, un).
• Each task instance has stochastic duration.
• Tasks may not be preempted.
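A minimal sketch of this task model in Python. The `Task` class, its field names, and the camera/laser example shares are illustrative assumptions, not the dissertation's notation:

```python
import random
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    target: float        # utilization target u_i; the targets sum to 1
    durations: list      # possible (non-preemptible) run lengths
    weights: list        # their probabilities

    def sample_duration(self):
        # Each dispatched instance runs for a stochastic duration.
        return random.choices(self.durations, weights=self.weights)[0]

tasks = [Task("camera", 1/3, [1, 2], [0.5, 0.5]),
         Task("laser", 2/3, [1], [1.0])]
assert abs(sum(t.target for t in tasks) - 1.0) < 1e-9
assert tasks[0].sample_duration() in (1, 2)
```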
The Main Contribution
• Scheduling policy design techniques for non-preemptive, non-deterministic systems that are share aware, scalable, and adaptive.
Share Aware Scheduling
• System state: the cumulative resource usage of each task (e.g., (8,17) when task 1 has used 8 time units and task 2 has used 17).
• Dispatching a task moves the system stochastically through the state space according to that task's duration.
Share Aware Scheduling
• The utilization target u induces a ray {αu : α ≥ 0} through the state space (e.g., u = (1/3, 2/3)).
• Encode "goodness" relative to the share as a cost.
• Require that costs grow with distance from the utilization ray.
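One concrete cost with this property, as a sketch: the slides only require that cost grow with distance from the ray, so the exact form here (squared deviation from the on-target state at the same total elapsed time) is an assumption:

```python
def ray_cost(x, u):
    # Compare x against the point t*u on the utilization ray with the
    # same total elapsed time t = sum(x); sum(u) == 1 is assumed.
    t = sum(x)
    return sum((xi - t * ui) ** 2 for xi, ui in zip(x, u))

u = (1/3, 2/3)
assert ray_cost((1, 2), u) < 1e-12          # on the ray: zero cost
assert ray_cost((3, 0), u) > ray_cost((2, 1), u) > ray_cost((1, 2), u)
```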
Task Scheduling MDP
• States are the cumulative resource utilization of each task.
• Actions correspond to dispatching a task.
• Transitions are dictated by task duration distributions.
• Costs grow with deviation from the share target.
• Goal: find a policy that minimizes long-term cost.
Transition Structure
• Transitions are state-independent: the relative distribution over successor states is the same in each state.
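State independence can be made concrete as follows. In this sketch (the `(run_length, probability)` encoding is an illustrative assumption), the successor is always the current state plus a duration-sized step along the dispatched task's axis, with the step distribution never depending on the state:

```python
import random

def step(x, a, durations):
    # durations[a]: list of (run_length, probability) pairs for task a.
    lengths = [d for d, _ in durations[a]]
    probs = [p for _, p in durations[a]]
    d = random.choices(lengths, weights=probs)[0]
    x = list(x)
    x[a] += d      # same relative successor distribution in every state
    return tuple(x)

durations = {0: [(1, 0.5), (2, 0.5)], 1: [(1, 1.0)]}
assert step((8, 17), 0, durations) in {(9, 17), (10, 17)}
assert step((8, 17), 1, durations) == (8, 18)
```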
Cost Structure
• States are equivalent under costs: states along lines parallel to the utilization ray have equal cost.
Equivalence Classes
• The transition and cost structure together induce a state equivalence.
• Equivalent states have the same optimal long-term cost and policy!
Periodicity
• Periodic structure allows us to remove all but one exemplar from each equivalence class.
Wrapped state model
• Remove all but one exemplar from each equivalence class.
• Actions and costs remain unchanged.
• Remap transitions into removed states onto the corresponding exemplars.
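For a rational target the equivalence classes repeat along an integer period vector p on the utilization ray (p = (1, 2) for u = (1/3, 2/3)). A sketch of the exemplar mapping, with the specific wrapping rule assumed for illustration:

```python
def wrap(x, p):
    # Subtract the period vector while every coordinate stays
    # non-negative; the result is the class exemplar nearest the origin.
    x = list(x)
    while all(xi >= pi for xi, pi in zip(x, p)):
        x = [xi - pi for xi, pi in zip(x, p)]
    return tuple(x)

p = (1, 2)                       # period for u = (1/3, 2/3)
assert wrap((9, 18), p) == (0, 0)
assert wrap((8, 17), p) == wrap((9, 19), p)   # same equivalence class
```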
Bounded state model
• Inexpensive states are near the utilization target.
• Good policies should keep costs small.
• We can therefore truncate the state space by bounding costs.
Bounded state model
• Mapping "dangling" transitions to a high-cost absorbing state guarantees that we find bounded-cost policies when they exist.
• Bounded costs guarantee bounded deviation from the resource share.
Scheduling Policy Design
• Iteratively increase the bounds and re-solve the problem.
• As the bounds increase, the bounded-model solution converges to the optimal wrapped-model policy.
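The bounded model can then be solved with standard dynamic programming. A minimal value-iteration sketch, under several simplifying assumptions not taken from the slides: unit task durations, squared-deviation costs, a discounted rather than average-cost criterion, and truncation by total elapsed time instead of by cost:

```python
import itertools

SINK = "sink"   # absorbing high-cost state for dangling transitions

def bounded_vi(u, horizon, gamma=0.9, iters=300):
    n = len(u)
    states = [s for s in itertools.product(range(horizon + 1), repeat=n)
              if sum(s) <= horizon]
    in_model = set(states)

    def cost(s):
        t = sum(s)
        return sum((si - t * ui) ** 2 for si, ui in zip(s, u))

    V = dict.fromkeys(states, 0.0)
    V[SINK] = 0.0
    for _ in range(iters):
        Vn = {SINK: 100.0 + gamma * V[SINK]}   # absorbing, large cost
        for s in states:
            q = []
            for a in range(n):   # dispatch task a for one time unit
                s2 = tuple(si + (1 if i == a else 0)
                           for i, si in enumerate(s))
                q.append(V[s2] if s2 in in_model else V[SINK])
            Vn[s] = cost(s) + gamma * min(q)
        V = Vn
    return V

V = bounded_vi((0.5, 0.5), horizon=6)
assert V[(1, 1)] < V[(2, 0)]   # staying near the share is cheaper
```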
Automating Model Discovery
• ESPI: Expanding State Policy Iteration.
1. Start with a policy that reaches only finitely many states from (0, …, 0), e.g., always run the most underutilized task.
2. Enumerate enough states to evaluate and improve that policy.
3. If the policy cannot be improved, stop; otherwise, return to step 2 with the improved policy.
Policy Evaluation Envelope
• Enumerate the states reachable from the initial state: breadth-first state space exploration under the current policy, starting from the initial state.
Policy Improvement Envelope
• Consider alternative actions at each state.
• Close under the current policy using breadth-first expansion.
• Evaluate and improve the policy within this envelope.
ESPI Termination
• As long as the initial policy has finite closure, each ESPI iteration terminates.
• This is satisfied by the policy that always runs the most underutilized task.
• The policy strictly improves at each iteration.
• Empirically, ESPI terminates on task scheduling MDPs.
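The envelope construction at the heart of ESPI is ordinary breadth-first reachability under the current policy. A sketch, where the `successors` interface and the wrap-at-the-period toy example are assumptions:

```python
from collections import deque

def reachable_envelope(start, policy, successors):
    # All states reachable from `start` when actions follow `policy`.
    seen = {start}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for s2 in successors(s, policy(s)):
            if s2 not in seen:
                seen.add(s2)
                queue.append(s2)
    return seen

def successors(s, a):
    # Deterministic unit durations, wrapped along the period (1, 1).
    s2 = list(s)
    s2[a] += 1
    while all(x >= 1 for x in s2):
        s2 = [x - 1 for x in s2]
    return [tuple(s2)]

underutilized = lambda s: 0 if s[0] <= s[1] else 1   # fair 50/50 share
assert reachable_envelope((0, 0), underutilized, successors) == {(0, 0), (1, 0)}
```

Note that the most-underutilized policy has finite closure here, which is exactly the property ESPI needs to terminate each iteration.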
Comparing Design Methods
• Policy performance is normalized and centered on the ESPI solution.
• Larger bounded state models yield the ESPI solution.
Share Aware Scheduling
• The MDP representation allows consistent approximation of the optimal scheduling policy.
• Empirically, the bounded-model and ESPI solutions appear optimal.
• However, the approach scales exponentially in the number of tasks.
Addressing the Curse of Dimensionality
• Focus attention on a restricted class of appropriate scheduling policies.
• How do we choose and parameterize these policies?
Two-task MDP Policy
• Scheduling policies induce a partition of the state space with a boundary parallel to the share target.
• Establish a decision offset to identify the partition boundary.
• This is sufficient in two dimensions, but what about higher dimensions?
Time Horizons
• The time horizon Ht = {x : x1 + x2 + … + xn = t} collects all states with total elapsed time t.
(Figure: horizons H0 through H4 and the utilization ray u, for two tasks and for three tasks with simplex vertices (2,0,0), (0,2,0), (0,0,2).)
Three-task MDP Policy
• Action partitions meet along a decision ray that is parallel to the utilization ray.
• Action partitions are roughly cone-shaped.
(Figure: partitions at time horizons t = 10, 20, 30.)
Parameterizing the Partition
• Specify a decision offset at the intersection of the partitions.
• Anchor action vectors at the decision offset to approximate the partitions.
• The conic policy selects the action vector best aligned with the displacement between the query state and the decision offset.
(Figure: action vectors a1, a2, a3 around a query state x.)
Conic Policy Parameterization
• Decision offset d.
• Action vectors a1, a2, …, an.
• This is sufficient to partition each time horizon into n regions.
• Tune policies through local search.
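Action selection for a conic policy then reduces to comparing the displacement from the decision offset against each action vector. A sketch; the dot-product alignment score is an assumption consistent with "best aligned":

```python
def conic_action(x, d, action_vectors):
    # Choose the action whose vector best aligns with x - d.
    disp = [xi - di for xi, di in zip(x, d)]

    def alignment(v):
        norm = sum(vi * vi for vi in v) ** 0.5
        return sum(vi * wi for vi, wi in zip(v, disp)) / (norm or 1.0)

    return max(range(len(action_vectors)),
               key=lambda a: alignment(action_vectors[a]))

# Toy example: with offset (2, 2) and axis-aligned action vectors, a
# state displaced along the first coordinate selects action 0.
assert conic_action((5, 1), (2, 2), [(1, 0), (0, 1)]) == 0
```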
Four Tasks (results figure)
Ten Tasks (results figure)
Varying Numbers of Tasks (results figure)
Addressing the Curse of Dimensionality
• The conic policy approximates the geometry of the policies found using ESPI.
• The number of parameters grows only quadratically with the number of tasks.
• The class contains cost-bounded, stable policies.
• Performance is competitive with ESPI and improves on heuristic policies.
Conclusions
• We have addressed a novel set of scheduling concerns present in many cyber-physical systems: non-preemptive resource semantics, stochastic task execution times, and enforcement of a user-selected resource share.
• Our model-based solution methods provide strong approximations to optimal policies.
• Our conic policies allow us to scale model-based techniques to larger problems.
Further Contributions
• Adaptive scheduling (online learning): the sample complexity of learning is similar to optimal control of single-state MDPs, and the domain enforces rational exploration without explicit exploration mechanisms.
• Formal guarantees: existence of optimal scheduling policies, periodicity of optimal scheduling policies, existence of cost-bounded policies, and existence of stable conic policies.
Publications
• R. Glaubius, T. Tidwell, C. Gill, and W.D. Smart, "Scheduling Policy Design for Autonomic Systems", International Journal on Autonomous and Adaptive Communications Systems, 2(3):276-296, 2009.
• R. Glaubius, T. Tidwell, C. Gill, and W.D. Smart, "Scheduling Design and Verification for Open Soft Real-Time Systems", RTSS 2008.
• R. Glaubius, T. Tidwell, B. Sidoti, D. Pilla, J. Meden, C. Gill, and W.D. Smart, "Scalable Scheduling Policy Design for Open Soft Real-Time Systems", Tech. Report WUCSE-2009-71, 2009 (under review for RTAS 2010).
• R. Glaubius, T. Tidwell, C. Gill, and W.D. Smart, "Scheduling Design with Unknown Execution Time Distributions or Modes", Tech. Report WUCSE-2009-15, 2009.
• T. Tidwell, R. Glaubius, C. Gill, and W.D. Smart, "Scheduling for Reliable Execution in Autonomic Systems", ATC 2008.
• C. Gill, W.D. Smart, T. Tidwell, and R. Glaubius, "Scheduling as a Learned Art", OSPERT 2008.
Acknowledgements
• Bill Smart
• Chris Gill
• Terry Tidwell
• David Pilla, Braden Sidoti, and Justin Meden
Questions?
Comparison to Real-Time Scheduling
• Earliest-Deadline-First (EDF) scheduling enforces timeliness by meeting task deadlines, but it is not share aware.
• We introduce deadlines as a function of worst-case execution time.
• The miss rate is a function of deadline tightness.
Varying Temporal Resolution (results figure)
Stable Conic Policies
• Stable conic policies are guaranteed to exist.
• For example, set each action vector to point opposite its corresponding vertex of the time horizon.
• This induces a vector field that stochastically orbits the decision ray.
(Figure: horizon simplex with vertices (t,0,0), (0,t,0), (0,0,t).)
More Tasks, Higher Cost
• Simple problem: fair-share scheduling of n deterministic tasks with unit duration.
• Trajectories under round-robin scheduling:
• 2 tasks: E{c(x)} = 1/2. Trajectory: (0,0) → (1,0) → (1,1) → (0,0); costs c(0,0) = 0, c(1,0) = 1.
• 3 tasks: E{c(x)} = 8/9. Trajectory: (0,0,0) → (1,0,0) → (1,1,0) → (1,1,1) → (0,0,0); costs c(0,0,0) = 0, c(1,0,0) = c(1,1,0) = 4/3.
• n tasks: E{c(x)} = (n+1)(n-1)/(3n).
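These numbers can be checked mechanically. The cost form below (twice the squared distance to the fair-share point at the same elapsed time) is inferred from c(1,0) = 1 and c(1,0,0) = 4/3, so the scaling factor is an assumption; the averages it produces match the slide exactly:

```python
from fractions import Fraction as F

def c(x):
    # Twice the squared deviation from the fair-share point (t/n, ..., t/n).
    n, t = len(x), sum(x)
    return 2 * sum((F(xi) - F(t, n)) ** 2 for xi in x)

def round_robin_avg_cost(n):
    # One round-robin cycle visits the states where the first k tasks
    # have run once, for k = 0 .. n-1 (k = n wraps back to the origin).
    cycle = [tuple([1] * k + [0] * (n - k)) for k in range(n)]
    return sum(c(s) for s in cycle) / n

assert c((1, 0)) == 1 and c((1, 0, 0)) == F(4, 3)
assert round_robin_avg_cost(2) == F(1, 2)
assert round_robin_avg_cost(3) == F(8, 9)
assert all(round_robin_avg_cost(n) == F((n + 1) * (n - 1), 3 * n)
           for n in range(2, 12))
```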
Share Complexity (results figure)