Exact Mode Estimation for POMDPs based on Constraint Decomposition and Symbolic Encoding Martin Sachenbacher July 1, 2003
Exact vs. Approximate ME
• Problems of ME with incomplete belief state
  • Dead ends (no solutions)
  • Incorrect leading solutions
  • Incorrect probabilities of solutions
• Usefulness of ME with complete belief state
  • As accuracy reference
  • As performance reference
  • As a starting point for approximations
• Key: Compact representation of belief state
  • Map to semiring-based CSP
  • Decompose Hypergraph into Hypertree
  • Encode Tree Nodes symbolically as ADDs
Outline
• SCSPs (Semiring-based CSPs)
• Mapping State Constraints to SCSPs
• Mapping Transition Constraints to SCSPs
• ADDs (Algebraic Decision Diagrams)
• Hypertree Decompositions of SCSPs
• Solving Tree-structured SCSPs
• Exact Mode Estimation for POMDPs as Decomposition/ADD-based SCSP Solving
• Demonstration: Two Switches Example
SCSPs (Semiring-based CSPs)
• Generalization of CSPs [Bistarelli et al. 97]
• Domain $D$, Variables $V$, Set $S$, Type $T \subseteq V$
• Constraints are mappings $D^k \to S$
• Operations $\otimes$ (for join) and $\oplus$ (for projection) on $S$
• $(S, \oplus, \otimes, 0, 1)$ must form a c-semiring
• Dynamic Programming applicable to all SCSPs
• Examples
  • $(\{0,1\}, \lor, \land, 0, 1)$: Classical CSPs
  • $(\mathbb{R}^+, \min, +, +\infty, 0)$: Weighted CSPs
  • $([0,1], \max, *, 0, 1)$: Probabilistic CSPs
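The three example semirings can be tried out on small tables. The following is a minimal sketch, not the representation used in the talk: constraints are plain Python dictionaries rather than ADDs, and the names `Semiring`, `join`, `project` and the example constraints over `x` and `y` are assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import product
from typing import Any, Callable
import math


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]   # additive operation, used for projection
    times: Callable[[Any, Any], Any]  # multiplicative operation, used for join
    zero: Any
    one: Any


CLASSICAL = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)
WEIGHTED = Semiring(min, lambda a, b: a + b, math.inf, 0)
PROBABILISTIC = Semiring(max, lambda a, b: a * b, 0.0, 1.0)


def join(sr, f, g, domain=(0, 1)):
    """Combine two constraints; a constraint is a pair (scope, table)."""
    (fs, ft), (gs, gt) = f, g
    scope = fs + tuple(v for v in gs if v not in fs)
    table = {}
    for vals in product(domain, repeat=len(scope)):
        env = dict(zip(scope, vals))
        a = ft.get(tuple(env[v] for v in fs), sr.zero)
        b = gt.get(tuple(env[v] for v in gs), sr.zero)
        table[vals] = sr.times(a, b)
    return scope, table


def project(sr, f, keep):
    """Eliminate every variable of f that is not in keep."""
    fs, ft = f
    scope = tuple(v for v in fs if v in keep)
    table = {}
    for vals, w in ft.items():
        key = tuple(x for v, x in zip(fs, vals) if v in keep)
        table[key] = sr.plus(table.get(key, sr.zero), w)
    return scope, table


# Hypothetical constraints over Boolean variables x and y.
f_prob = (("x", "y"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8})
g_prob = (("y",), {(0,): 0.3, (1,): 0.7})
best_prob = project(PROBABILISTIC, join(PROBABILISTIC, f_prob, g_prob), {"x"})

f_cost = (("x", "y"), {(0, 0): 1, (0, 1): 4, (1, 0): 3, (1, 1): 0})
g_cost = (("y",), {(0,): 2, (1,): 5})
best_cost = project(WEIGHTED, join(WEIGHTED, f_cost, g_cost), {"x"})
```

The same `join` and `project` code serves both problems; only the semiring argument changes, which is the point of the generalization.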
Encoding States as SCSPs
• Example: Or-Gate
• P(Or=ok) = 99%, P(Or=fty) = 1%

x_t   in1   in2   out   f
ok    lo    lo    lo    0.99
ok    lo    hi    hi    0.99
ok    hi    lo    hi    0.99
ok    hi    hi    hi    0.99
fty   *     *     *     0.01
Encoding Observations as SCSPs
• Example: (Probabilistic) Observation
• Distribution over values for x_i

x_i   f
0     0.6
1     0.9
2     0.3
3     0.0

[Figure: bar chart of P over x_i = 0, 1, 2, 3]
Encoding Transitions as SCSPs
• Example: (Probabilistic) CCA Transition Function

x_t   cmd   x_{t+1}   f
0     off   0         0.9
0     on    0         0.1
0     off   1         0.1
0     on    1         0.9
1     off   0         0.9
1     on    0         0.1
1     off   1         0.1
1     on    1         0.9

[Figure: transition diagram over states 0 and 1 with arcs labeled cmd=on and cmd=off, probability 0.9]
Algebraic Decision Diagrams
• ADDs: Symbolic (graph-based) representation of functions $\{0,1\}^n \to \mathbb{R}$
• Generalization of BDDs (functions $\{0,1\}^n \to \{0,1\}$)
• Canonicity of representation (as for BDDs)
• Efficient package: CUDD
[Figure: example ADD over variables A, B, C with leaves 0, 1, 2, 3]
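To make the idea of a canonical, shared graph concrete, here is a minimal sketch of a reduced ordered ADD built by Shannon expansion with a unique table. It is an illustration only and not how CUDD works internally: it enumerates all $2^n$ assignments, and the names `build_add`, `evaluate` and `size` are assumptions of this sketch.

```python
def build_add(f, variables):
    """Build a reduced ordered ADD for f over Boolean variables (fixed order)."""
    unique = {}  # unique table: structurally equal nodes are stored once

    def mk(node):
        return unique.setdefault(node, node)

    def rec(i, env):
        if i == len(variables):
            return mk(("leaf", f(*env)))
        lo = rec(i + 1, env + (0,))
        hi = rec(i + 1, env + (1,))
        if lo == hi:          # redundant test: skip the node
            return lo
        return mk((variables[i], lo, hi))

    return rec(0, ()), unique


def evaluate(node, env):
    """Follow one path from the root to a leaf."""
    while node[0] != "leaf":
        var, lo, hi = node
        node = hi if env[var] else lo
    return node[1]


def size(unique):
    """Return (number of internal nodes, number of leaves)."""
    leaves = sum(1 for n in unique if n[0] == "leaf")
    return len(unique) - leaves, leaves


# f(A, B, C) = A + B + C and the threshold function f > 1.
f_root, f_nodes = build_add(lambda a, b, c: a + b + c, ("A", "B", "C"))
t_root, t_nodes = build_add(lambda a, b, c: int(a + b + c > 1), ("A", "B", "C"))
```

Sharing pays off more for the threshold function (4 internal nodes, 2 leaves) than for the sum (6 internal nodes, 4 leaves), since fewer distinct values mean more identical subgraphs.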
ADD Join Operations
• Multiplication, addition, maximum, …
• Generalization of BDD operations

ABC   f   g   f+g   f*g   5*f   f>1   max(f,g)
000   0   3   3     0     0     0     3
001   1   2   3     2     5     0     2
010   1   0   1     0     5     0     1
011   2   1   3     2     10    1     2
100   1   0   1     0     5     0     1
101   2   0   2     0     10    1     2
110   2   0   2     0     10    1     2
111   3   1   4     3     15    1     3
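The join operations are pointwise: the result at each assignment of A, B, C depends only on the operand values at that assignment. A minimal sketch on explicit tables follows (dictionaries instead of ADDs; CUDD offers a comparable generic mechanism through `Cudd_addApply`). The helper names `apply2`, `apply1` and `column` are assumptions of this sketch.

```python
from itertools import product
import operator

# Assignments of (A, B, C) in the order 000, 001, ..., 111.
ASSIGNMENTS = list(product((0, 1), repeat=3))
f = dict(zip(ASSIGNMENTS, (0, 1, 1, 2, 1, 2, 2, 3)))
g = dict(zip(ASSIGNMENTS, (3, 2, 0, 1, 0, 0, 0, 1)))


def apply2(op, f, g):
    """Pointwise binary operation on two functions over the same variables."""
    return {a: op(f[a], g[a]) for a in f}


def apply1(op, f):
    """Pointwise unary operation."""
    return {a: op(f[a]) for a in f}


def column(h):
    """Values of h in table order 000 ... 111."""
    return [h[a] for a in ASSIGNMENTS]


f_plus_g = column(apply2(operator.add, f, g))
f_times_g = column(apply2(operator.mul, f, g))
five_f = column(apply1(lambda x: 5 * x, f))
f_gt_1 = column(apply1(lambda x: int(x > 1), f))
max_f_g = column(apply2(max, f, g))
```

On real ADDs the same operations run on the shared graph with memoization, so the cost depends on the number of nodes rather than on the number of assignments.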
Example
• Summation of ADD f, ADD g
[Figure: ADD f + ADD g = ADD f+g, each drawn over variables A, B, C]
ADD Projection Operations
• $\Sigma(f,X)$ (and $\Pi(f,X)$) obtained by summing (multiplying) values of tuples that differ only w.r.t. $X$

ABC   f
000   0
001   1
010   1
011   2
100   1
101   2
110   2
111   3

AB   $\Sigma(f,\{C\})$   $\Pi(f,\{C\})$
00   1                   0
01   3                   2
10   3                   2
11   5                   6
ADD Projection Operations
• For optimization, we require operation $\max(f,X)$ that yields maximum value of tuples differing only w.r.t. $X$
• Not part of CUDD, but easy to implement as variant of $\Sigma(f,X)$ / $\Pi(f,X)$

ABC   f
000   0
001   1
010   1
011   2
100   1
101   2
110   2
111   3

AB   $\Sigma(f,\{C\})$   $\Pi(f,\{C\})$   $\max(f,\{C\})$
00   1                   0                1
01   3                   2                2
10   3                   2                2
11   5                   6                3
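All three projections follow one pattern: group the tuples that agree on the remaining variables, then reduce each group. A minimal sketch on an explicit table (a dictionary instead of an ADD; the name `abstract` is an assumption of this sketch):

```python
from itertools import product
from math import prod

VARS = ("A", "B", "C")
f = dict(zip(product((0, 1), repeat=3), (0, 1, 1, 2, 1, 2, 2, 3)))


def abstract(reduce_fn, f, variables, elim):
    """Eliminate the variables in elim, reducing each group with reduce_fn."""
    groups = {}
    for assignment, value in f.items():
        key = tuple(x for v, x in zip(variables, assignment) if v not in elim)
        groups.setdefault(key, []).append(value)
    return {key: reduce_fn(values) for key, values in groups.items()}


sum_c = abstract(sum, f, VARS, {"C"})    # summation over C
prod_c = abstract(prod, f, VARS, {"C"})  # product over C
max_c = abstract(max, f, VARS, {"C"})    # maximum over C
```

Swapping the reduction function is the only change needed to obtain the maximum variant, which mirrors the remark that it is a simple variant of the sum and product projections.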
Solving SCSPs using Decomposition
• Transform SCSPs into Hypertree $H = (T, \chi, \lambda)$
• Compute constraint $\phi(v)$ for each node $v$
• Bottom-up phase for computing values
• Top-down phase for extracting solutions
Pseudocode for Bottom-Up Phase
• Generalization of (Semi-)Join Operation

Function solve($v$)
  For Each $child \in children(v)$
    $\phi(v) \leftarrow \phi(v) \otimes \max(\phi(child), \chi(child) \setminus \chi(v))$
  Next child
  Return $\phi(v)$
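The bottom-up phase can be sketched on the Boolean polycell used in the following slides. This is an illustration under stated assumptions, not the talk's implementation: tables are dictionaries instead of ADDs, the child is solved recursively before its message is computed, the priors P(ok) = .99 for or-gates and .995 for and-gates are read off the U values in the example, and all helper names (`table`, `gate`, `join`, `max_out`, `solve`) are hypothetical.

```python
from itertools import product

P_OK = {"O1": .99, "O2": .99, "O3": .99, "A1": .995, "A2": .995}
OBS = {"A": 1, "B": 1, "C": 1, "D": 1, "E": 0, "F": 0, "G": 1}


def dom(var):
    return ("ok", "fty") if var in P_OK else (0, 1)


def table(scope, weight):
    """Enumerate all assignments of scope; keep those with non-zero weight."""
    entries = {}
    for vals in product(*(dom(v) for v in scope)):
        w = weight(dict(zip(scope, vals)))
        if w > 0:
            entries[vals] = w
    return scope, entries


def gate(mode, op, a, b, out):
    """Gate constraint: mode prior if consistent with model and observations."""
    def weight(env):
        if any(env[v] != val for v, val in OBS.items() if v in env):
            return 0.0
        if env[mode] == "ok" and env[out] != op(env[a], env[b]):
            return 0.0
        return P_OK[mode] if env[mode] == "ok" else 1 - P_OK[mode]
    return table((mode, a, b, out), weight)


def join(f, g):
    (fs, ft), (gs, gt) = f, g
    scope = fs + tuple(v for v in gs if v not in fs)
    return table(scope, lambda env: ft.get(tuple(env[v] for v in fs), 0.0)
                 * gt.get(tuple(env[v] for v in gs), 0.0))


def max_out(f, elim):
    """Max-projection: eliminate the variables in elim."""
    fs, ft = f
    result = {}
    for vals, w in ft.items():
        key = tuple(x for v, x in zip(fs, vals) if v not in elim)
        result[key] = max(result.get(key, 0.0), w)
    return tuple(v for v in fs if v not in elim), result


def OR(a, b): return a | b
def AND(a, b): return a & b


phi = {
    "v0": join(gate("O3", OR, "C", "E", "Z"), gate("A1", AND, "X", "Y", "F")),
    "v1": gate("A2", AND, "Y", "Z", "G"),
    "v2": gate("O2", OR, "B", "D", "Y"),
    "v3": gate("O1", OR, "A", "C", "X"),
}
children = {"v0": ["v1", "v2", "v3"], "v1": [], "v2": [], "v3": []}


def solve(v):
    for child in children[v]:
        solve(child)  # assumption: children are made consistent first
        elim = set(phi[child][0]) - set(phi[v][0])
        phi[v] = join(phi[v], max_out(phi[child], elim))
    return phi[v]


root_scope, root_table = solve("v0")
best = max(root_table, key=root_table.get)
u_max = root_table[best]
best_env = dict(zip(root_scope, best))
```

After the call, the root table holds for each of its tuples the value of the best extension into the subtrees, so the maximum entry is the value of the best overall solution.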
Example
• Boolean Polycell
[Figure: circuit with or-gates Or1, Or2, Or3 and and-gates And1, And2; observed values A = 1, B = 1, C = 1, D = 1, E = 0, F = 0, G = 1; internal variables X, Y, Z]
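Example
• Hypertree Decomposition of Boolean Polycell
• Root v0 over (O3, A1, C, E, F, X, Y, Z)
  • U=.98505: ok ok 1 0 0 0 0 1; ok ok 1 0 0 0 1 1; ok ok 1 0 0 1 0 1; …
• Child v1 over (A2, G, Y, Z), shares Y, Z with v0
  • U=.995: ok 1 1 1
  • U=.005: fty 1 1 1; fty 1 0 1; fty 1 1 0; fty 1 0 0
• Child v2 over (O2, B, D, Y), shares Y with v0
  • U=.99: ok 1 1 1
  • U=.01: fty 1 1 1; fty 1 1 0
• Child v3 over (O1, A, C, X), shares C, X with v0
  • U=.99: ok 1 1 1
  • U=.01: fty 1 1 1; fty 1 1 0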
Example
• Initial $\phi(v_0)$ over (O3, A1, C, E, F, X, Y, Z): ADD with 20 nodes, 5 leaves
  • U=.98505: ok ok 1 0 0 0 0 1; ok ok 1 0 0 0 1 1; ok ok 1 0 0 1 0 1
  • U=.00995: fty ok 1 0 0 0 1 1; fty ok 1 0 0 0 1 0; fty ok 1 0 0 1 0 0; fty ok 1 0 0 1 0 1; fty ok 1 0 0 0 1 1; fty ok 1 0 0 1 0 1
  • U=.00495: …
  • U=.00005: …
Example
• After multiplication with $\max(\phi(v_1), \{A2, G\})$: $\phi(v_0)$ over (O3, A1, C, E, F, X, Y, Z) is an ADD with 28 nodes, 7 leaves
  • U=.98012: ok ok 1 0 0 0 1 1
  • U=.00990: fty ok 1 0 0 0 1 1
  • U=.00492: ok ok 1 0 0 0 0 1; ok ok 1 0 0 1 0 1; …
  • U=2.4E-5: …
  • U=4.9E-5: …
  • U=2.5E-7: …
Example
• After multiplication with $\max(\phi(v_2), \{O2, B, D\})$: $\phi(v_0)$ over (O3, A1, C, E, F, X, Y, Z) is an ADD with 30 nodes, 8 leaves
  • U=.97032: ok ok 1 0 0 0 1 1
  • U=.00980: fty ok 1 0 0 0 1 1
  • U=.00487: ok fty 1 0 0 0 1 1
  • U=4.9E-5: …
  • U=4.9E-7: …
  • U=2.4E-7: …
  • U=2.5E-9: …
Example
• After multiplication with $\max(\phi(v_3), \{O1, A\})$: $\phi(v_0)$ over (O3, A1, C, E, F, X, Y, Z) is an ADD with 35 nodes, 10 leaves
  • U=.00970: ok ok 1 0 0 0 1 1
  • U=.00482: ok fty 1 0 0 1 1 1
  • U=9.8E-5: fty ok 1 0 0 0 1 1
  • U=4.8E-5: …
  • U=4.9E-7: …
  • U=2.4E-7: …
  • U=4.9E-9: …
  • U=2.4E-9: …
  • U=2.5E-11: …
• Best Solution: Umax = .0097
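Pseudocode for Top-Down Phase
• No search queue necessary

Function extractSolutions($v_{root}$)
  $E \leftarrow edges(v_{root})$
  $\sigma \leftarrow \phi(v_{root})$
  $\sigma \leftarrow \max(\sigma, vars(\sigma) \setminus (decvars(\sigma) \cup vars(E)))$   (restrict to decision and shared variables)
  While $E \neq \emptyset$ Do
    $e \leftarrow choose(E)$
    $v \leftarrow$ son-node($e$)
    $E \leftarrow (E \setminus e) \cup edges(v)$
    $\sigma \leftarrow \sigma \otimes (\phi(v) \ \mathrm{div} \ \max(\phi(v), vars(\sigma)))$   (“divisor”)
    $\sigma \leftarrow \max(\sigma, vars(\sigma) \setminus (decvars(\sigma) \cup vars(E)))$   (restrict to decision and shared variables)
  End While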
Example
• Initial $\sigma = \max(\phi(v_{root}), \{E, F\})$ over (O3, A1, C, X, Y, Z): ADD with 21 tuples, 33 nodes, 10 leaves
  • U=.00970: ok ok 1 0 1 1
  • U=.00482: ok fty 1 1 1 1
  • U=9.8E-5: fty ok 1 0 1 1
  • U=4.8E-5: …
  • U=4.9E-7: …
  • U=2.4E-7: …
  • U=4.9E-9: …
  • U=2.4E-9: …
  • U=2.5E-11: …
Example
• After processing edge(v0,v3), over (O1, O3, A1, Y, Z): ADD with 21 tuples, 32 nodes, 10 leaves
  • U=.00970: fty ok ok 1 1
  • U=.00482: ok ok fty 1 1
  • U=9.8E-5: fty fty ok 1 1
  • U=4.8E-5: …
  • U=4.9E-7: …
  • U=2.4E-7: …
  • U=4.9E-9: …
  • U=2.4E-9: …
  • U=2.5E-11: …
Example
• After processing edge(v0,v2), over (O1, O2, O3, A1, Y, Z): ADD with 30 tuples, 47 nodes, 11 leaves
  • U=.00970: fty ok ok ok 1 1
  • U=.00482: ok ok ok fty 1 1
  • U=9.8E-5: fty fty ok ok 1 1; fty ok fty ok 1 1
  • U=4.8E-5: …
  • U=9.9E-7: …
  • U=4.9E-7: …
  • …
  • U=2.5E-11: …
Example
• After processing edge(v0,v1), over (O1, O2, O3, A1, A2): ADD with 26 tuples, 35 nodes, 12 leaves
  • U=.00970: fty ok ok ok ok
  • U=.00482: ok ok ok fty ok
  • U=9.8E-5: fty fty ok ok ok; fty ok fty ok ok
  • U=4.8E-5: …
  • U=2.4E-5: …
  • U=9.9E-7: …
  • …
  • U=2.5E-11: …
• #Solutions = 26
• Easy to focus on leading solutions.
Application: Exact ME for POMDPs
• Given: POMDP (Feasible States, Observables, Control Actions, Transitions), Observations
• Approach: Complete representation of belief state (through decomposition and symbolic encoding)
• Benefit: Allows for exploiting Markov property
[Figure: belief state over states S0, S1, …, Sn at time t and at time t+1]
Algorithm: Exact ME for POMDPs
• Construct Hypertree (offline)
• Construct State-ADDs for each node (offline)
• Construct Transition-ADDs for each node (offline)
• Repeat for each time step:
  • Multiply nodes with Obs-ADDs (“Condition on Observations”)
  • Establish consistency in the tree (Bottom-up)
  • Extract leading solution(s) from the tree (Top-down)
  • Multiply nodes with Transition-ADDs, project on $x_{t+1}$, set $x_t = x_{t+1}$, multiply with State-ADDs (“Transition Expansion”)
• Complexity: Polynomial in width of Hypertree
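The per-time-step loop can be illustrated on a single two-state component. The sketch below is deliberately flat: it keeps the belief state as one dictionary instead of a tree of ADDs, so it shows only the order of the steps, not the decomposition. It reuses the 0.9 / 0.1 values of the CCA transition example; the observation likelihoods, the uniform initial belief, the use of summation for the projection onto the next state, and the function names are assumptions of this sketch.

```python
# Transition table (x_t, cmd, x_t+1) -> probability, as in the CCA example.
TRANS = {
    (0, "off", 0): 0.9, (0, "on", 0): 0.1,
    (0, "off", 1): 0.1, (0, "on", 1): 0.9,
    (1, "off", 0): 0.9, (1, "on", 0): 0.1,
    (1, "off", 1): 0.1, (1, "on", 1): 0.9,
}


def condition(belief, likelihood):
    """Condition on observations: multiply the belief with the observation constraint."""
    return {x: p * likelihood[x] for x, p in belief.items()}


def leading(belief, k=1):
    """Extract the k most likely states."""
    return sorted(belief, key=belief.get, reverse=True)[:k]


def expand(belief, cmd):
    """Transition expansion: multiply with the transition table, sum out x_t."""
    nxt = {}
    for (x, c, x_next), p in TRANS.items():
        if c == cmd:
            nxt[x_next] = nxt.get(x_next, 0.0) + belief.get(x, 0.0) * p
    return nxt


belief = {0: 0.5, 1: 0.5}       # hypothetical initial belief state
likelihood = {0: 0.2, 1: 0.8}   # hypothetical observation likelihoods

conditioned = condition(belief, likelihood)
estimate = leading(conditioned)
next_belief = expand(conditioned, "off")
```

The belief is left unnormalized here; its total mass after conditioning is the probability of the observation, and normalizing does not change which solutions lead.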
Example
• Adapted from Jim Kurien’s thesis
• t0: Sw1.cmd = on
• t1: Or.out = lo, Sw1.cmd = idl, Sw2.cmd = on
• t2: Or.out = lo
• Switches more likely to fail than Or-Gate
[Figure: switches Sw1 and Sw2, each with input hi, connected to the Or-gate]
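Example
• Switch Model
[Figure: state diagram of the switch with modes on, off and fty; arcs labeled cmd=on,idl, cmd=off,idl, cmd=off and cmd=on carry probability 0.95, arcs into fty carry probability 0.05, and the self-loop of fty (condition true) carries probability 1.0; the modes on and off are annotated with their allowed lo/hi value pairs for t1 and t2]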
Example
• Switch Model

Transitions:
x_t   cmd   x_{t+1}   f
on    on    on        0.95
on    off   off       0.95
on    idl   on        0.95
on    *     fty       0.05
off   on    on        0.95
off   off   off       0.95
off   idl   off       0.95
off   *     fty       0.05
fty   *     fty       1.0

States:
x_t   t1   t2   f
on    lo   lo   1.0
on    hi   hi   1.0
off   *    *    1.0
fty   *    *    1.0
Example
• Or-Gate Model

States:
x_t   in1   in2   out   f
ok    lo    lo    lo    1.0
ok    lo    hi    hi    1.0
ok    hi    lo    hi    1.0
ok    hi    hi    hi    1.0
fty   *     *     *     1.0

Transitions:
x_t   x_{t+1}   f
ok    ok        0.99
ok    fty       0.01
fty   fty       1.0

[Figure: state diagram of the Or-gate with modes ok and fty; ok to ok 0.99, ok to fty 0.01, self-loop of fty (condition true) 1.0]
Example
• Initial belief state (chosen):
  • p(Sw=on) = p(Sw=off) = 0.475, p(Sw=fty) = 0.05
  • p(Or=ok) = 0.99, p(Or=fty) = 0.01
• Observations/Commands:
  • t0: Sw1.cmd=on
  • t1: Or.out=lo, Sw1.cmd=idl, Sw2.cmd=on
  • t2: Or.out=lo
• Leading Solutions:
  • t0: Sw1=on/off, Sw2=on/off, Or=ok
  • t1: Sw1=fty, Sw2=off, Or=ok
  • t2: Sw1=on, Sw2=on, Or=fty
Conclusion
• SCSPs are an elegant and general representation
• ADD encoding of SCSPs is efficient in the average case, exponential in the number of variables in the worst case
• Decomposition factors the problem into a set of ADDs, each confined to a small number of variables
• The two methods complement each other well
• How far can we get with this combination?