Decision-Theoretic Planning with Asynchronous Events
Håkan L. S. Younes, Carnegie Mellon University
Introduction
• Asynchronous processes are abundant in the real world
• Discrete-time models are inappropriate for systems with asynchronous events
• Generalized semi-Markov (decision) processes are great for this!
Stochastic Processes with Asynchronous Events
[Timeline figure, built up over four slides: two machines, m1 and m2]
• t = 0: m1 up, m2 up
• t = 2.5: m2 crashes → m1 up, m2 down
• t = 3.1: m1 crashes → m1 down, m2 down
• t = 4.9: m2 rebooted → m1 down, m2 up
A Model of Stochastic Discrete Event Systems
• Generalized semi-Markov process (GSMP) [Matthes 1962]
  • A set of events E
  • A set of states S
Events
• In a state s, the events E_s ⊆ E are enabled
• With each event e is associated:
  • A distribution G_e governing the time e must remain enabled before it triggers
  • A next-state probability distribution p_e(s′|s)
Semantics of GSMP Model
• Associate a real-valued clock t_e with each event e
• For each e ∈ E_s, sample t_e from G_e
• Let e* = argmin_{e ∈ E_s} t_e and t* = min_{e ∈ E_s} t_e
• Sample s′ from p_e*(s′|s)
• For each e ∈ E_s′:
  • t_e′ = t_e − t* if e ∈ E_s \ {e*}
  • sample t_e′ from G_e otherwise
• Repeat with s = s′ and t_e = t_e′
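A minimal sketch of this simulation loop in Python; the model interface (enabled_events, sample_duration, sample_next_state) is a hypothetical stand-in for E_s, G_e, and p_e(s′|s), not anything from the talk:

```python
import random

def simulate_gsmp(model, s, horizon):
    """Generate a trajectory [(time, state), ...] up to the given horizon."""
    t = 0.0
    clocks = {e: model.sample_duration(e) for e in model.enabled_events(s)}
    trajectory = [(t, s)]
    while clocks and t < horizon:
        e_star = min(clocks, key=clocks.get)   # e* = argmin t_e
        t_star = clocks[e_star]                # t* = min t_e
        t += t_star
        s_next = model.sample_next_state(e_star, s)
        new_clocks = {}
        for e in model.enabled_events(s_next):
            if e in clocks and e != e_star:
                # Enabled before, still enabled, did not trigger:
                # the clock keeps running -- no rescheduling.
                new_clocks[e] = clocks[e] - t_star
            else:
                # Newly enabled, or the event that just triggered: fresh sample.
                new_clocks[e] = model.sample_duration(e)
        s, clocks = s_next, new_clocks
        trajectory.append((t, s))
    return trajectory
```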
Semantics: Example
• In state "m1 up, m2 up", the sampled clocks are m2 crashes: 2.5 and m1 crashes: 3.1
• m2 crashes triggers first, moving the system to "m1 up, m2 down"
• There, m1 crashes keeps its running clock, now 3.1 − 2.5 = 0.6, while the newly enabled reboot m2 gets a fresh sample: 2.4
Notes on Semantics
• Events that remain enabled across state transitions without triggering are not rescheduled
  • Asynchronous events!
• Differs from a semi-Markov process in this respect
• Continuous-time Markov chain if all G_e are exponential distributions
General State-Space Markov Chain (GSSMC)
• The model is Markovian if we include the clocks in the state space
  • Extended state space X
  • Next-state distribution f(x′|x) is well-defined
• Clock values are not known to an observer
• The time each event has been enabled is known
Observation Model
• An observation o is a state s together with a real value u_e for each event, representing the time e has currently been enabled
• f(x|o) is well-defined
Observations: Example
• Actual model (hidden clock values): in "m1 up, m2 up", m2 crashes: 2.5, m1 crashes: 3.1; after m2 crashes, in "m1 up, m2 down", m1 crashes: 0.6, reboot m2: 2.4
• Observed model (times enabled): in "m1 up, m2 up", m2 crashes: 0.0, m1 crashes: 0.0; after m2 crashes, in "m1 up, m2 down", m1 crashes: 2.5, reboot m2: 0.0
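As a concrete sketch (names hypothetical), an observation is just the state plus the enabled-time u_e of each enabled event; the observer never sees the remaining clock values:

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class Observation:
    state: str
    enabled_time: Dict[str, float]   # event name -> u_e

# Just after m2 crashes at t = 2.5 in the example above:
obs = Observation(
    state="m1 up, m2 down",
    enabled_time={"m1 crashes": 2.5,   # enabled since t = 0
                  "reboot m2": 0.0})   # only just enabled
```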
Actions and Policies (GSMDPs)
• Identify a set A ⊆ E of controllable events (actions)
• A policy is a mapping from observations to sets of actions
• The action choice can change at any time while in a state s
Rewards and Discounts
• Lump-sum reward k(s, e, s′) associated with the transition from s to s′ caused by e
• Continuous reward rate r(s, a) associated with a being enabled in s
• Discount factor α
• A unit reward earned at time t counts as e^(−αt)
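Combined, these presumably define the usual expected discounted reward criterion; the slide does not show a formula, so the following is a reconstruction, with t_i the trigger time of the i-th transition and e_i its triggering event:

```latex
v^\pi(x) = \mathrm{E}_x^\pi\!\left[\,\sum_{i=1}^{\infty} e^{-\alpha t_i}\,
           k(s_{i-1}, e_i, s_i) \;+\; \int_0^{\infty} e^{-\alpha t}\,
           r(s_t, a_t)\,\mathrm{d}t\right]
```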
GSMDP Solution Method [Younes & Simmons 2004]
GSMDP → continuous-time MDP (via phase-type distributions) → discrete-time MDP (via uniformization [Jensen 1953])
Continuous Phase-Type Distributions [Neuts 1981]
• Time to absorption in a continuous-time Markov chain with n transient states
Two-Phase Coxian Distribution
[Diagram: from phase 0, move to phase 1 with rate pλ1 or absorb directly with rate (1 − p)λ1; from phase 1, absorb with rate λ2]
Generalized Erlang Distribution
[Diagram: from phase 0 (rate λ), continue through phases 1, …, n − 1 with probability p or absorb directly with probability 1 − p]
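Both families are easy to sample, which makes the diagrams concrete. A sketch; parameter names follow the diagrams, and the generalized-Erlang reading (one phase, then branch) is my interpretation of the figure:

```python
import random

def sample_two_phase_coxian(p, lam1, lam2):
    """Exp(lam1) first phase; with probability p, add an Exp(lam2) phase."""
    x = random.expovariate(lam1)
    if random.random() < p:
        x += random.expovariate(lam2)
    return x

def sample_generalized_erlang(p, lam, n):
    """One Exp(lam) phase; with probability p, continue through the
    remaining n - 1 phases (all at rate lam)."""
    phases = n if random.random() < p else 1
    return sum(random.expovariate(lam) for _ in range(phases))
```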
Method of Moments
• Approximate a general distribution G with a phase-type distribution PH by matching the first n moments
Moments of a Distribution
• The ith moment: μ_i = E[X^i]
• Mean: μ_1
• Variance: σ² = μ_2 − μ_1²
• Coefficient of variation: cv = σ/μ_1
Matching One Moment
• Exponential distribution with rate λ = 1/μ_1
Matching Two Moments
[Figure, built up over three slides: the cv² axis, marked at 0 and 1]
• Exponential distribution: matches only cv² = 1
• Generalized Erlang distribution: covers cv² ≤ 1
• Two-phase Coxian distribution: covers cv² ≥ 1
Matching Moments: Example 1
• Weibull distribution W(1,1/2): μ_1 = 2, cv² = 5
• Matched by a two-phase Coxian: rate 1/10 from phase 0 to phase 1, rate 9/10 from phase 0 to absorption, rate 1/10 from phase 1 to absorption (i.e., λ1 = 1, p = 1/10, λ2 = 1/10)
[Figure: CDF F(t) of W(1,1/2) against the matching phase-type distribution, 0 ≤ t ≤ 8]
Matching Moments: Example 2
• Uniform distribution U(0,1): μ_1 = 1/2, cv² = 1/3
• Matched by a three-phase Erlang: phases 0, 1, 2, each with rate 6
[Figure: CDF F(t) of U(0,1) against the matching phase-type distribution, 0 ≤ t ≤ 2]
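A two-moment fit covering all three regions can be written in a few lines. This is a sketch under stated assumptions: the cv² > 1 branch uses a standard two-phase Coxian fit (it reproduces Example 1 exactly), and the cv² < 1 branch uses Tijms' mixture-of-Erlangs fit, a slightly different parameterization from the generalized Erlang drawn earlier (it reproduces Example 2):

```python
import math

def match_two_moments(mu1, cv2):
    """Phase-type parameters matching mean mu1 > 0 and squared
    coefficient of variation cv2 > 0."""
    if abs(cv2 - 1.0) < 1e-9:
        # cv^2 = 1: a plain exponential matches both moments.
        return ("exponential", {"rate": 1.0 / mu1})
    if cv2 > 1.0:
        # Two-phase Coxian: Exp(lam1), then Exp(lam2) with probability p.
        return ("two-phase coxian", {"lam1": 2.0 / mu1,
                                     "p": 1.0 / (2.0 * cv2),
                                     "lam2": 1.0 / (mu1 * cv2)})
    # cv^2 < 1: Erlang(n-1) w.p. p, else Erlang(n), all phases at rate lam,
    # with n chosen so that 1/n <= cv2 <= 1/(n-1) (Tijms' fit).
    n = max(2, math.ceil(1.0 / cv2 - 1e-9))
    p = (cv2 * n - math.sqrt(max(0.0, n * (1.0 + cv2) - cv2 * n * n))) / (1.0 + cv2)
    lam = (n - p) / mu1
    return ("erlang mixture", {"n": n, "p": p, "lam": lam})

print(match_two_moments(2.0, 5.0))      # Example 1: lam1 = 1, p = 0.1, lam2 = 0.1
print(match_two_moments(0.5, 1.0 / 3))  # Example 2: p ~ 0, i.e. Erlang(3) at rate 6
```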
Matching More Moments
• Closed-form solution for matching three moments of positive distributions [Osogami & Harchol-Balter 2003]
• Combination of an Erlang distribution and a two-phase Coxian distribution
Approximating a GSMDP with a Continuous-time MDP
• Each event with a non-exponential distribution is approximated by a set of events with exponential distributions
• Phases become part of the state description
Policy Execution
• Phases represent a discretization, into random-length intervals, of the time events have been enabled
• Phases are not part of the real model
• Simulate phase transitions during execution (see the sketch below)
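A sketch of what simulating phase transitions at execution time might look like, for an event approximated by an n-phase Erlang; the conditioning-by-rejection step is my reading, since the talk does not spell it out:

```python
import random

def erlang_phase_after(u, n, lam):
    """Sample the current phase (0 .. n-1) of an n-phase Erlang
    approximation, given that the event has been enabled for u time
    units without triggering.  Rejection sampling keeps the simulated
    phases consistent with the event not having triggered yet."""
    while True:
        phase, t = 0, 0.0
        while phase < n:
            t += random.expovariate(lam)
            if t >= u:
                return phase   # the interval reaches u before absorption
            phase += 1
        # The simulated chain absorbed before u, but the real event has
        # not triggered: inconsistent run, so reject and retry.
```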
The Foreman’s Dilemma
• When to enable the “Service” action in the “Working” state?
[Model diagram:]
• States and reward rates: Working (c = 1), Serviced (c = 0.5), Failed (c = 0)
• Events: Service ~ Exp(10), Working → Serviced (action); Fail ~ G, Working → Failed; Return ~ Exp(1), Serviced → Working; Replace ~ Exp(1/100), Failed → Working
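Written out as plain data (a hypothetical encoding, not the tool's actual input format), the model is:

```python
# The Foreman's Dilemma as data.  "G" stands for the general failure-time
# distribution (U(5,x) or a Weibull in the experiments below).
foremans_dilemma = {
    "reward_rates": {"Working": 1.0, "Serviced": 0.5, "Failed": 0.0},
    "events": {
        # name: (source state, target state, duration distribution, action?)
        "Service": ("Working",  "Serviced", "Exp(10)",    True),
        "Fail":    ("Working",  "Failed",   "G",          False),
        "Return":  ("Serviced", "Working",  "Exp(1)",     False),
        "Replace": ("Failed",   "Working",  "Exp(1/100)", False),
    },
}
```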
The Foreman’s Dilemma: Optimal Solution
• Find the delay t0 before enabling “Service” that maximizes the value v0
• Y is the time to failure in the “Working” state
The Foreman’s Dilemma: SMDP Solution
• Same formulas, but with a restricted choice:
  • The action is immediately enabled (t0 = 0), or
  • The action is never enabled (t0 = ∞)
The Foreman’s Dilemma: Performance
[Plot: percent of optimal (50–100) vs. x, failure-time distribution U(5,x), 5 ≤ x ≤ 50]
[Plot: percent of optimal (50–100) vs. x, failure-time distribution W(1.6x,4.5), 0 ≤ x ≤ 40]
System Administration
• Network of n machines
• Reward rate c(s) = k in states where k machines are up
• One crash event and one reboot action per machine
• At most one action enabled at any time
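In the same hypothetical encoding as the Foreman's Dilemma above; the crash-time distribution is not given on this slide, so it is left as a parameter:

```python
def system_admin_model(n, crash_dist, reboot_dist="U(0,1)"):
    """n-machine network: states are tuples of up-flags, the reward
    rate is the number of machines that are up, and each machine has
    one crash event and one reboot action."""
    events = {}
    for i in range(n):
        events[f"crash {i}"] = {"dist": crash_dist, "action": False}
        events[f"reboot {i}"] = {"dist": reboot_dist, "action": True}
    return {
        "reward_rate": lambda s: sum(s),   # c(s) = number of machines up
        "events": events,
        # Policies are restricted to enable at most one reboot action
        # at any time.
    }
```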
System Administration: Performance
[Plot: reward (15–50) vs. number of machines n (1–13), reboot-time distribution U(0,1)]
The Role of Phases
• Foreman’s Dilemma: phases permit a delay in the enabling of actions
• System administration: phases allow us to take into account the time an action has already been enabled
Summary
• Generalized semi-Markov (decision) processes allow asynchronous events
• Phase-type distributions can be used to approximate a GSMDP with an MDP
  • Allows us to approximately solve GSMDPs using existing MDP techniques
• Phase does matter!
Future Work
• Discrete phase-type distributions
  • Handle deterministic distributions
  • Avoid the uniformization step
• Value function approximation
  • Take advantage of GSMDP structure
• Other optimization criteria
  • Finite horizon, etc.
Tempastic-DTP
• A tool for GSMDP planning: http://www.cs.cmu.edu/~lorens/tempastic-dtp.html