
Markov Decision Processes: Approximate Equivalence

Markov Decision Processes: Approximate Equivalence. Michel de Rougemont, Université Paris II & LRI, http://www.lri.fr/~mdr/. The world of MDPs. Follow-up of: On the complexity of partially observed Markov decision processes, 1996, D. Burago, A. Slissenko, M. de Rougemont.




Presentation Transcript


  1. Markov Decision Processes: Approximate Equivalence Michel de Rougemont Université Paris II & LRI http://www.lri.fr/~mdr/

  2. The world of MDPs
     • Follow-up of: On the complexity of partially observed Markov decision processes, 1996, D. Burago, A. Slissenko, M. de Rougemont
     • What is robustness? Deviation model in the 1990s.
     • Distance on runs in the 2000s
     • Efficient distance of a run to an MDP
     • Approximate comparison of MDPs: Statistics Analysis of Probabilistic Processes (LICS 2009, with Mathieu Tracol)

  3. M.D.P.
     • S: states: s, t, u, v
     • Σ: actions: a, b, c
     • P(u | t, b) = 0.
     • Policy σ resolves the non-determinism. Example: σ(t) = b, σ(v) = c
     • Run: s,t,a,u,a,v
     • Trace: aba
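A minimal sketch of how a policy resolves the non-determinism and produces a run and a trace. Only the states, the actions, P(u | t, b) = 0, σ(t) = b and σ(v) = c come from the slide; the transition table `P`, the remaining policy entries and the function name `simulate` are hypothetical, chosen only for illustration.

```python
import random

# Hypothetical transition table: P[state][action] = {next_state: probability}.
# Only P(u | t, b) = 0 is taken from the slide; every number here is assumed.
P = {
    "s": {"a": {"t": 1.0}},
    "t": {"a": {"u": 1.0}, "b": {"s": 0.5, "v": 0.5}},  # u is not reachable from t via b
    "u": {"a": {"v": 1.0}},
    "v": {"a": {"u": 1.0}, "c": {"s": 1.0}},
}

# sigma(t) = b and sigma(v) = c are from the slide; sigma(s) and sigma(u) are assumed.
sigma = {"s": "a", "t": "b", "u": "a", "v": "c"}


def simulate(P, sigma, start, steps, rng):
    """Follow the stationary deterministic policy sigma for a number of steps.

    Returns the run (alternating states and actions) and the trace (actions only).
    """
    state = start
    run = [start]
    trace = []
    for _ in range(steps):
        action = sigma[state]                 # the policy picks the action
        successors = P[state][action]         # distribution over next states
        state = rng.choices(list(successors), weights=list(successors.values()))[0]
        trace.append(action)
        run.extend([action, state])
    return run, "".join(trace)


run, trace = simulate(P, sigma, "s", 6, random.Random(0))
```

Once σ is fixed, the only remaining randomness is in the transition probabilities, so the process behaves as a Markov chain.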

  4. This talk
     • Approximation of decision problems: Property Testing
     • Nondeterministic automata: tester for membership and equivalence
     • Markov Decision Processes: tester for the existence of strategies, and equivalence

  5. 1. Testers on a class K
     Let F be a property on a class K of structures U. An ε-tester for F is a probabilistic algorithm A such that:
     • If U |= F, A accepts
     • If U is ε-far from F, A rejects with high probability
     F is testable if there is a probabilistic algorithm A such that:
     • A is an ε-tester for all ε
     • Time(A) is independent of n = size(U).
     Robust characterizations of polynomials, R. Rubinfeld, M. Sudan, 1994.
     Property Testing and its connection to Learning and Approximation, O. Goldreich, S. Goldwasser, D. Ron, 1996.
     A tester usually implies a linear-time corrector. (ε1, ε2)-tolerant tester.

  6. Edit Distances with Moves on Strings
     1. Classical edit distance: insertions, deletions, modifications
     2. Edit distance with moves: dist(w,w’)
        0111000011110011001
        0111011110000011001
     3. Edit distance with moves generalizes to ordered trees

  7. Uniform statistics: k-grams (shingles). For a word W of length n, e.g. W=001010101110, u.stat(W) is the vector of frequencies of the subwords of length k, taken over the n-k+1 blocks of W.
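As a small illustration, the uniform statistics of the slide's word can be computed exactly by sliding a window of length k over it. The function name `ustat` and the choice k = 2 are assumptions made for this sketch; the slide does not fix k.

```python
from collections import Counter
from fractions import Fraction


def ustat(w, k):
    """Frequency of each subword of length k among the n-k+1 blocks of w."""
    n = len(w)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(w)")
    blocks = n - k + 1
    counts = Counter(w[i:i + k] for i in range(blocks))
    # Exact rational frequencies; subwords that never occur are simply absent.
    return {u: Fraction(c, blocks) for u, c in counts.items()}


stats = ustat("001010101110", 2)
# stats maps "00", "01", "10", "11" to 1/11, 4/11, 4/11, 2/11 respectively.
```

The result is a probability vector indexed by subwords of length k, which is what the later testers compare.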

  8. Tester for equality
     Edit distance with moves: NP-complete problem, but approximable in constant time with additive error.
     Uniform statistics: W=001010101110
     Theorem 1. |u.stat(w)-u.stat(w’)| approximates dist(w,w’)/n.
     Sample N subwords of length k, compute Y(w) and Y(w’):
     Lemma (Chernoff). Y(w) approximates u.stat(w).
     Corollary. |Y(w)-Y(w’)| approximates dist(w,w’)/n.
     Tester 1: If |Y(w)-Y(w’)| < ε, accept, else reject.
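Tester 1 can be sketched as follows, assuming that |·| denotes the L1 norm on frequency vectors. The function names and the parameters `k`, `n_samples` and `seed` are hypothetical; the slide does not say how k and N depend on ε, so they are left to the caller.

```python
import random
from collections import Counter


def sample_stat(w, k, n_samples, rng):
    """Y(w): empirical frequencies of n_samples uniformly sampled subwords of length k."""
    blocks = len(w) - k + 1
    counts = Counter()
    for _ in range(n_samples):
        i = rng.randrange(blocks)          # uniform position among the n-k+1 blocks
        counts[w[i:i + k]] += 1
    return {u: c / n_samples for u, c in counts.items()}


def l1_distance(p, q):
    """L1 distance between two sparse frequency vectors (assumed norm)."""
    return sum(abs(p.get(u, 0.0) - q.get(u, 0.0)) for u in set(p) | set(q))


def equality_tester(w1, w2, k, n_samples, eps, seed=0):
    """Accept (True) if the sampled statistics are closer than eps, else reject."""
    rng = random.Random(seed)
    y1 = sample_stat(w1, k, n_samples, rng)
    y2 = sample_stat(w2, k, n_samples, rng)
    return l1_distance(y1, y2) < eps


accept_same = equality_tester("0" * 100, "0" * 100, k=2, n_samples=50, eps=0.5)
accept_far = equality_tester("0" * 100, "1" * 100, k=2, n_samples=50, eps=0.5)
```

The running time depends only on k and the number of samples, not on the length of the words. The two words from the edit-distance slide, 0111000011110011001 and 0111011110000011001, can be passed in the same way.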

  9. Tester for W ∈ r (regular language)
     H = {u.stat(W) : W ∈ r} is a union of polytopes.
     [Figure: 2 polytopes for r, with the sample point Y(w)]
     Membership Tester:

  10. 2. Equivalence Tester for regular properties
      Time polynomial in m = max(|A|, |B|). The exact equivalence is PSPACE-complete.

  11. 3. Markov Decision Processes
      SD: σ(t)=b, σ(v)=c
      Trace 1: abac ab abac ab …
      Trace 2: ab abac ab abac …
      Policies σ:
      • HR: History dependent and Randomized
      • MR(k): Memory k, Randomized
      • SD: Stationary Deterministic
      Communicating MDP

  12. Classical results: k=1 State-action frequencies: For a class K of strategies: Theorem (Puterman, Derman, Tsitsiklis) For a communicating MDP,
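The formulas on this slide did not survive extraction. As a hedged illustration only, the sketch below uses the usual empirical notion of state-action frequency: the fraction of steps of a run spent in state s while taking action a. The function name and the sample run are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def state_action_frequencies(pairs):
    """Empirical frequency of each (state, action) pair along a finite run."""
    if not pairs:
        raise ValueError("the run must contain at least one step")
    counts = Counter(pairs)
    total = len(pairs)
    return {sa: Fraction(c, total) for sa, c in counts.items()}


# Hypothetical run, given as the (state, action) pair taken at each step.
steps = [("s", "a"), ("t", "b"), ("v", "c"), ("s", "a")]
freq = state_action_frequencies(steps)
# freq[("s", "a")] is 1/2; the other two pairs each have frequency 1/4.
```

Each run therefore yields one point in a space indexed by state-action pairs, and it is points of this kind that the polytopes H on the following slides are meant to contain.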

  13. Generalization
      Theorem: For a communicating MDP
      [Figure: polytope H and a point x]

  14. Existence of a strategy
      Input: MDP, w_n, δ, λ
      Theorem: Existence of a strategy is PSPACE-hard but testable.
      Tester: Sample w_n; estimate the distance to H (linear program).
      [Figure: polytope H and a point x]

  15. General MDPs
      Union of polytopes: each H can be computed by a linear program.
      Threshold value for each component. H1: 0.4, H2: 0.6

  16. Equivalence of MDPs
      Decide if the polytopes are identical with identical threshold values.
      Equivalence Tester: discretize the polytopes with an ε-grid and check mutual inclusion.
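A toy sketch of the grid idea, under strong assumptions that are not in the slides: the polytopes are two-dimensional convex polygons given by their vertices in counter-clockwise order, and threshold values are ignored. All function names are hypothetical. In the general setting, membership of a grid point would be decided by a linear program rather than by the cross-product test used here.

```python
import math


def inside(poly, x, y, tol=1e-12):
    """True if (x, y) lies in the convex polygon poly (vertices counter-clockwise)."""
    m = len(poly)
    for i in range(m):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % m]
        # A negative cross product means the point is to the right of this edge.
        if (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1) < -tol:
            return False
    return True


def grid_points(poly, eps):
    """Indices (i, j) of the eps-grid points (i*eps, j*eps) that lie in poly."""
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    i_lo, i_hi = math.floor(min(xs) / eps), math.ceil(max(xs) / eps)
    j_lo, j_hi = math.floor(min(ys) / eps), math.ceil(max(ys) / eps)
    return {
        (i, j)
        for i in range(i_lo, i_hi + 1)
        for j in range(j_lo, j_hi + 1)
        if inside(poly, i * eps, j * eps)
    }


def approx_equivalent(p, q, eps):
    """Check mutual inclusion of the two discretized polygons."""
    a, b = grid_points(p, eps), grid_points(q, eps)
    return a <= b and b <= a


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
triangle = [(0, 0), (1, 0), (0, 1)]
same = approx_equivalent(square, square, 0.25)
different = approx_equivalent(square, triangle, 0.25)
```

A finer grid (smaller ε) separates polygons that differ by less, at the cost of more grid points to check.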

  17. Conclusion
      1. Testers for MDPs. Verify properties such as:
         « Almost surely there are less than 10% a »
         « After an a, there is a b » …
      2. Testers for probabilistic systems
         • Approximate Probabilistic Membership
         • Approximate Equivalence
      3. VERAP: http://www.lri.fr/~mdr/verap/
