Markov Decision Processes: Approximate Equivalence
Michel de Rougemont, Université Paris II & LRI
http://www.lri.fr/~mdr/
The world of MDPs
• Follow-up of: On the complexity of partially observed Markov decision processes, 1996, D. Burago, A. Slissenko, M. de Rougemont
• What is robustness? Deviation model in the 1990s
• Distance on runs in the 2000s
• Efficient distance of a run to an MDP
• Approximate comparison of MDPs
Statistics Analysis of Probabilistic Processes (LICS 2009, with Mathieu Tracol)
MDP
S: states: s, t, u, v
Σ: actions: a, b, c
P(u | t, b) = 0.
Policy σ resolves the non-determinism. Example: σ(t) = b, σ(v) = c
Run: s, t, a, u, a, v
Trace: aba
This talk
• Approximation of decision problems: Property Testing
• Nondeterministic automata: tester for membership and equivalence
• Markov Decision Processes: tester for the existence of strategies, and equivalence
1. Testers on a class K
Let F be a property on a class K of structures U. An ε-tester for F is a probabilistic algorithm A such that:
• If U |= F, A accepts
• If U is ε-far from F, A rejects with high probability
F is testable if there is a probabilistic algorithm A such that:
• A is an ε-tester for all ε
• Time(A) is independent of n = size(U)
Robust characterizations of polynomials, R. Rubinfeld, M. Sudan, 1994
Property Testing and its connection to Learning and Approximation, O. Goldreich, S. Goldwasser, D. Ron, 1996
A tester usually implies a linear-time corrector.
(ε1, ε2)-tolerant tester
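The two conditions defining an ε-tester on this slide can be written as probability bounds. The following is only a sketch: the constant 2/3 is a conventional stand-in for "with high probability", and normalizing the distance by n is an assumption borrowed from the later slides, where dist(w,w’)/n is used.

```latex
\begin{aligned}
U \models F \ &\Longrightarrow\ \Pr[\,A \text{ accepts } U\,] = 1,\\
\mathrm{dist}(U, F) > \varepsilon\, n \ &\Longrightarrow\ \Pr[\,A \text{ rejects } U\,] \ge 2/3,
\end{aligned}
\qquad \text{where } \mathrm{dist}(U, F) = \min_{U' \models F} \mathrm{dist}(U, U').
```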
Edit Distances with Moves on Strings
1. Classical edit distance: insertions, deletions, modifications
2. Edit distance with moves: dist(w,w’)
   w  = 0111000011110011001
   w’ = 0111011110000011001
3. Edit distance with moves generalizes to ordered trees
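The two strings on this slide appear to differ by a single block move. The sketch below checks that reading; the helper `move_block` and the unit cost per move are assumptions made for illustration, not definitions taken from the talk.

```python
def move_block(w, start, length, dest):
    """Cut w[start:start+length] and reinsert it at index dest of the remaining string."""
    block = w[start:start + length]
    rest = w[:start] + w[start + length:]
    return rest[:dest] + block + rest[dest:]


w = "0111000011110011001"
w_prime = "0111011110000011001"

# Moving the block "1111" (positions 8..11 of w) to just after the prefix "01110"
# turns w into w_prime, so one move suffices.
assert move_block(w, 8, 4, 5) == w_prime
```

Under the assumed unit cost, dist(w,w’) would be 1 here, whereas the classical edit distance needs several insertions and deletions to achieve the same transformation.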
Uniform statistics: k-gram
W = 001010101110, of length n
u.stat(W): frequency vector of the subwords of length k of W, taken over the n-k+1 blocks (shingles)
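A minimal sketch of the uniform statistics for the word W of this slide. The function name `ustat` and the choice k = 2 are assumptions for illustration; the slide does not fix k.

```python
from collections import Counter


def ustat(w, k):
    """Frequency of each length-k subword among the n-k+1 blocks of w."""
    total = len(w) - k + 1
    counts = Counter(w[i:i + k] for i in range(total))
    return {u: c / total for u, c in counts.items()}


W = "001010101110"
stats = ustat(W, 2)

# With k = 2 there are 11 blocks: "00" once, "01" and "10" four times each, "11" twice.
for subword in sorted(stats):
    print(subword, stats[subword])
```

The resulting vector has one coordinate per subword of length k and its entries sum to 1, which is what lets it be treated as a point in a polytope later in the talk.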
Tester for equality
Edit distance with moves: an NP-complete problem, but approximable in constant time with additive error.
Uniform statistics: W = 001010101110
Theorem 1. |u.stat(w) - u.stat(w’)| approximates dist(w,w’)/n.
Sample N subwords of length k, compute Y(w) and Y(w’):
Lemma (Chernoff). Y(w) approximates u.stat(w).
Corollary. |Y(w) - Y(w’)| approximates dist(w,w’)/n.
Tester 1: if |Y(w) - Y(w’)| < ε, accept; else reject.
Tester for W ∈ r (regular language)
H = {u.stat(W) : W ∈ r} is a union of polytopes.
[Figure: Membership Tester, showing the 2 polytopes for r and the sample point Y(w)]
2. Equivalence Tester for regular properties
Time polynomial in m = max(|A|, |B|). The exact equivalence problem is PSPACE-complete.
3. Markov Decision Processes
SD: σ(t) = b, σ(v) = c
Trace 1: abac ab abac ab ...
Trace 2: ab abac ab abac ...
Policies σ:
• HR: History dependent and Randomized
• MR(k): Memory k, Randomized
• SD: Stationary Deterministic
Communicating MDP
Classical results: k = 1
State-action frequencies:
For a class K of strategies:
Theorem (Puterman, Derman, Tsitsiklis). For a communicating MDP,
Generalization
Theorem: For a communicating MDP
[Figure: polytope H and point x]
Existence of a strategy
Input: MDP, w_n, δ, λ
Theorem: Existence of a strategy is PSPACE-hard but testable.
Tester: sample w_n, then estimate the distance to H (linear program).
[Figure: polytope H and point x]
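One possible form of the linear program behind "estimate the distance to H" is sketched below. It rests on assumptions not stated on the slide: x denotes the statistics vector obtained from the sample of w_n, the distance is the L1 distance, and H is described by linear constraints H = {y ≥ 0 : Ay = b}, where the matrix A and vector b are hypothetical placeholders for the constraints defining the polytope.

```latex
\begin{aligned}
\mathrm{dist}_1(x, H) \;=\; \min_{y,\,t}\quad & \sum_{i} t_i \\
\text{subject to}\quad & -t_i \le x_i - y_i \le t_i \quad \text{for all } i,\\
& A y = b, \qquad y \ge 0 .
\end{aligned}
```

The auxiliary variables t_i bound each coordinate of |x - y|, so the objective equals the L1 distance from x to the closest point y of H; a tester would then compare this value with a threshold.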
General MDPs
Union of polytopes: each H can be computed by a linear program.
Threshold value for each component.
[Figure: H1: 0.4, H2: 0.6]
Equivalence of MDPs
Decide if the polytopes are identical, with identical threshold values.
Equivalence Tester: discretize the polytopes with an ε-grid, then check mutual inclusion.
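The grid idea can be sketched in low dimension. This is a simplified illustration, not the tester from the talk: each polytope is assumed to be given as a list of half-spaces (a, b) meaning a·y ≤ b inside the unit cube, threshold values are ignored, inclusion is checked up to one grid step in each coordinate, and all function names are hypothetical.

```python
from itertools import product


def grid_points(halfspaces, dim, steps):
    """Grid indices in {0..steps}^dim whose point idx/steps satisfies every a.y <= b."""
    points = set()
    for idx in product(range(steps + 1), repeat=dim):
        y = [i / steps for i in idx]
        if all(sum(aj * yj for aj, yj in zip(a, y)) <= b + 1e-12 for a, b in halfspaces):
            points.add(idx)
    return points


def included(p, q):
    """True if every grid point of p has a grid point of q within one step per coordinate."""
    return all(
        any(tuple(i + d for i, d in zip(idx, delta)) in q
            for delta in product((-1, 0, 1), repeat=len(idx)))
        for idx in p
    )


def approx_equal(h1, h2, dim, eps):
    """Discretize both polytopes on an eps-grid and check mutual inclusion."""
    steps = round(1 / eps)
    p = grid_points(h1, dim, steps)
    q = grid_points(h2, dim, steps)
    return included(p, q) and included(q, p)


big_triangle = [((1.0, 1.0), 1.0)]    # y1 + y2 <= 1 inside the unit square
small_triangle = [((1.0, 1.0), 0.5)]  # y1 + y2 <= 0.5 inside the unit square
print(approx_equal(big_triangle, big_triangle, dim=2, eps=0.1))
print(approx_equal(big_triangle, small_triangle, dim=2, eps=0.1))
```

The grid has (1/ε + 1)^dim points, so this brute-force enumeration is only practical when the dimension of the statistics vector is small.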
Conclusion
1. Testers for MDPs. Verify properties such as:
   « Almost surely there are fewer than 10% a »
   « After an a, there is a b »
   ...
2. Testers for probabilistic systems
   • Approximate Probabilistic Membership
   • Approximate Equivalence
3. VERAP: http://www.lri.fr/~mdr/verap/