210 likes | 367 Views
Planning and Execution with Phase Transitions. H åkan L. S. Younes Carnegie Mellon University. Follow-up paper to Younes & Simmons’ “Solving Generalized Semi-Markov Processes using Continuous Phase-Type Distributions” ( AAAI’04 ). Planning with Time and Probability. Temporal uncertainty
E N D
Planning and Executionwith Phase Transitions Håkan L. S. Younes Carnegie Mellon University Follow-up paper to Younes & Simmons’“Solving Generalized Semi-Markov Processes usingContinuous Phase-Type Distributions” (AAAI’04)
Planning withTime and Probability • Temporal uncertainty • When to schedule service? Service F1 Fail F3 Serviced Working Failed Return F2 Memoryless distributions MDP Planning and Execution with Phase Transitions
The Role of Phases • Memoryless property of exponential distribution makes MDPs tractable • Many phenomena are not memoryless • Lifetime of product or computer process • Phases introduce memory into state space • Modeling tool—not part of real world (phases are hidden variables) Planning and Execution with Phase Transitions
Exponential Two-phase Coxian p1 2 1 1 2 (1 – p)1 n-phase Erlang 1 2 n Phase-Type Distributions • Time to absorption in Markov chain with n transient states and one absorbing state Planning and Execution with Phase Transitions
Phase-Type Approximation F(x) Weibull(8,4.5) 1 Erlang(7.3/8,8) Erlang(7.3/4,4) Exponential(1/7.3) 0 x 0 2 4 6 8 10 12 Planning and Execution with Phase Transitions
Solving MDPs withPhase Transitions • Solve MDP as if phases were observable • Maintain belief distribution over phase configurations during execution • AAAI’04 paper: Simulate phase transitions • Use QMDP value method to select actions • OK, because there are typically no actions that are useful for observing phases Planning and Execution with Phase Transitions
Factored Event Model • Asynchronous events and actions • Each event/action has: • Enabling condition e • (Phase-type) distribution Ge governing the time e must remain enabled before it triggers • A distribution pe(s′; s) determining the probability that the next state is s′ if e triggers in state s Planning and Execution with Phase Transitions
Event Example • Computer cluster with crash event and reboot action for each computer • Crash event for computer i: • i = upi; p(s′; s) = 1 iff s upi and s′upi • Reboot action for computer i: • i = upi; p(s′; s) = 1 iff supi and s′ upi Planning and Execution with Phase Transitions
Event Example • Cluster with two computers up1up2 up1up2 up1up2 crash2 crash1 t = 0 t = 0.6 t = 0.9 Planning and Execution with Phase Transitions
Exploiting Structure • Value iteration with ADDs: • Factored state representation means that some variable assignments may be invalid • Phase is one for disabled events • Binary encoding of phases with n≠ 2k • Value iteration with state filter f: Planning and Execution with Phase Transitions
Phase Tracking • Infer phase configurations from observable features of the environment • Physical state of process • Current time • Transient analysis for Markov chains • The probability of being in state s at time t Planning and Execution with Phase Transitions
Phase Tracking • Belief distribution changes continuously • Use fixed update interval • Independent phase tracking for each event • Q matrix is n n for n-phase distribution • Phase tracking is O(n) for Erlang distribution Planning and Execution with Phase Transitions
The Foreman’s Dilemma • When to enable “Service” action in “Working” state? Service Exp(10) Fail G Servicedc = –0.1 Workingc = 1 Failedc = 0 Return Exp(1) Planning and Execution with Phase Transitions
Foreman’s Dilemma:Policy Performance Failure-time distribution: W(1.6x,4.5) 100 90 80 Percent of optimal n = 8, = 0.5 n = 4, = 0.5 70 n = 8, simulate 60 n = 4, simulate SMDP 50 x 0 5 10 15 20 25 30 35 40 Planning and Execution with Phase Transitions
Foreman’s Dilemma:Enabling Time for Action 30 20 Service enabling time optimal 10 n = 8, = 0.5 n = 8, simulate SMDP 0 x 0 5 10 15 20 25 30 35 40 Planning and Execution with Phase Transitions
System Administration • Network of n machines • Reward rate c(s) = k in states where k machines are up • One crash event and one reboot action per machine • At most one action enabled at any time (single agent) Planning and Execution with Phase Transitions
System Administration:Policy Performance Reboot-time distribution: U(0,1) 45 40 35 Expected discounted reward n = 4, = n = 2, = 30 n = 4, simulate n = 2, simulate 25 n = 1 20 2 3 4 5 6 7 8 9 10 Number of machines Planning and Execution with Phase Transitions
System Administration: Planning Time 104 103 no filter, n = 5 102 Planning time no filter, n = 4 pre-filter, n = 5 101 pre-filter, n = 4 filter, n = 5 100 filter, n = 4 10–1 2 3 4 5 6 7 8 9 10 Number of machines Planning and Execution with Phase Transitions
Discussion • Phase-type approximations add state variables, but structure can be exploited • Use approximate solution methods (e.g., Guestrin et al., JAIR19) • Improved phase tracking using transient analysis instead of phase simulation • Recent CTBN work with phase-type dist.: • modeling—not planning Planning and Execution with Phase Transitions
Tempastic-DTP • A tool for GSMDP planning: http://sweden.autonomy.ri.cmu.edu/tempastic/ Planning and Execution with Phase Transitions