Exact Mode Estimation for POMDPs based on Constraint Decomposition and Symbolic Encoding

Martin Sachenbacher, July 1, 2003

Exact vs. Approximate ME (Mode Estimation) • Problems of ME with an incomplete belief state: • Dead ends (no solutions) • Incorrect leading solutions • Incorrect probabilities of solutions • Uses of ME with a complete belief state: • As an accuracy reference • As a performance reference • As a starting point for approximations • Key: compact representation of the belief state • Map to a semiring-based CSP • Decompose the hypergraph into a hypertree • Encode tree nodes symbolically as ADDs

Outline • SCSPs (Semiring-based CSPs) • Mapping State Constraints to SCSPs • Mapping Transition Constraints to SCSPs • ADDs (Algebraic Decision Diagrams) • Hypertree Decompositions of SCSPs • Solving Tree-structured SCSPs • Exact Mode Estimation for POMDPs as Decomposition/ADD-based SCSP Solving • Demonstration: Two Switches Example

SCSPs (Semiring-based CSPs) • Generalization of CSPs [Bistarelli et al. 97] • Domain $D$, variables $V$, set $S$, type $T \subseteq V$ • Constraints are mappings $D^k \to S$ • Operations $\otimes$ (for join) and $\oplus$ (for projection) on $S$ • $(S, \oplus, \otimes, 0, 1)$ must form a c-semiring • Dynamic programming applicable to all SCSPs • Examples • $(\{0,1\}, \lor, \land, 0, 1)$: classical CSPs • $(\mathbb{R}^+, \min, +, +\infty, 0)$: weighted CSPs • $([0,1], \max, \cdot, 0, 1)$: probabilistic CSPs
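To make the c-semiring idea concrete, here is a minimal Python sketch of the three example semirings and a brute-force evaluator that ⊗-combines all constraints for each candidate tuple and ⊕-aggregates over tuples. The names `CSemiring` and `best_value` are hypothetical, and the evaluator enumerates tuples explicitly; it is not the dynamic-programming scheme the slides refer to.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable, Iterable


@dataclass(frozen=True)
class CSemiring:
    plus: Callable[[Any, Any], Any]   # ⊕: aggregates alternatives (projection)
    times: Callable[[Any, Any], Any]  # ⊗: combines constraints (join)
    zero: Any                         # neutral for ⊕, absorbing for ⊗
    one: Any                          # neutral for ⊗


# On {0, 1}, logical or/and coincide with max/min.
CLASSICAL = CSemiring(max, min, 0, 1)
WEIGHTED = CSemiring(min, lambda a, b: a + b, math.inf, 0)
PROBABILISTIC = CSemiring(max, lambda a, b: a * b, 0.0, 1.0)


def best_value(s: CSemiring, constraints: Iterable[Callable[[Any], Any]], tuples: Iterable[Any]) -> Any:
    """⊕ over all candidate tuples of the ⊗-combination of every constraint (brute force)."""
    constraints = list(constraints)
    result = s.zero
    for t in tuples:
        value = s.one
        for c in constraints:
            value = s.times(value, c(t))
        result = s.plus(result, value)
    return result


# Two hypothetical unary constraints over x in {0, 1, 2}.
c1 = [0.2, 0.5, 0.9]
c2 = [0.9, 0.8, 0.1]
print(best_value(PROBABILISTIC, [c1.__getitem__, c2.__getitem__], range(3)))
```

Swapping only the semiring object changes the problem class (satisfiability, minimum cost, maximum probability) while the evaluation code stays the same, which is the point of the SCSP abstraction.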

Encoding States as SCSPs • Example: Or-gate with P(Or=ok) = 99%, P(Or=fty) = 1% (omitted tuples map to 0):
x_t  in1  in2  out  f
ok   lo   lo   lo   0.99
ok   lo   hi   hi   0.99
ok   hi   lo   hi   0.99
ok   hi   hi   hi   0.99
fty  *    *    *    0.01
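A small Python sketch of the same state constraint as a function, assuming that tuples not listed in the table (an ok gate with the wrong output) get the semiring zero. The names `or_gate`, `P_OR` and `TABLE` are hypothetical.

```python
from itertools import product

P_OR = {"ok": 0.99, "fty": 0.01}  # mode probabilities from the slide


def or_gate(mode: str, in1: str, in2: str, out: str) -> float:
    """Value of the Or-gate state constraint for one tuple (x_t, in1, in2, out)."""
    if mode == "fty":
        return P_OR["fty"]  # faulty: no constraint on the signals
    expected = "hi" if "hi" in (in1, in2) else "lo"
    return P_OR["ok"] if out == expected else 0.0  # inconsistent tuples get the semiring zero


# Non-zero tuples, i.e. the rows of the table with the wildcards expanded.
TABLE = {
    t: or_gate(*t)
    for t in product(("ok", "fty"), ("lo", "hi"), ("lo", "hi"), ("lo", "hi"))
    if or_gate(*t) > 0
}
print(len(TABLE))
```

The single `fty` row with wildcards expands to eight tuples, which is one reason a symbolic encoding (ADDs) pays off compared with listing tuples.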

Encoding Observations as SCSPs • Example: (probabilistic) observation function over the values of $x_i$ (a likelihood, so the values need not sum to 1):
x_i  f
0    0.6
1    0.9
2    0.3
3    0.0
• [Figure: bar chart of P over x_i = 0, 1, 2, 3]
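As a sketch of how such an observation constraint is used, the Python snippet below combines a belief over $x_i$ with the observation function by pointwise multiplication (⊗ in the probabilistic semiring). The uniform prior and the final normalization are assumptions added for illustration; the slide itself does not specify either, and `condition`/`normalize` are hypothetical names.

```python
OBS = {0: 0.6, 1: 0.9, 2: 0.3, 3: 0.0}  # observation function f(x_i) from the table


def condition(belief: dict, obs: dict) -> dict:
    """Combine a belief over x_i with the observation constraint using ⊗ = *."""
    return {x: p * obs.get(x, 0.0) for x, p in belief.items()}


def normalize(weights: dict) -> dict:
    """Optional: rescale so the weights sum to 1 (an extra step, not from the slide)."""
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}


prior = {x: 0.25 for x in range(4)}  # hypothetical uniform prior
posterior = normalize(condition(prior, OBS))
print(posterior)
```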

Encoding Transitions as SCSPs • Example: (probabilistic) CCA transition function:
x_t  cmd  x_{t+1}  f
0    off  0        0.9
0    on   0        0.1
0    off  1        0.1
0    on   1        0.9
1    off  0        0.9
1    on   0        0.1
1    off  1        0.1
1    on   1        0.9
• [Figure: two-state automaton (0, 1); cmd=on leads to 1 and cmd=off leads to 0, each with probability 0.9]
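A minimal Python sketch of applying this transition constraint to a belief over $x_t$: join the belief with the table for the given command, then project onto $x_{t+1}$. It assumes projection by summation (Σ), as in a standard belief-state prediction; `T` and `predict` are hypothetical names.

```python
T = {  # (x_t, cmd, x_{t+1}) -> f, the transition table above
    (0, "off", 0): 0.9, (0, "on", 0): 0.1, (0, "off", 1): 0.1, (0, "on", 1): 0.9,
    (1, "off", 0): 0.9, (1, "on", 0): 0.1, (1, "off", 1): 0.1, (1, "on", 1): 0.9,
}


def predict(belief: dict, cmd: str) -> dict:
    """Join belief(x_t) with T for the given cmd, then sum out x_t."""
    nxt: dict = {}
    for (x, c, y), p in T.items():
        if c == cmd:
            nxt[y] = nxt.get(y, 0.0) + belief.get(x, 0.0) * p
    return nxt


print(predict({0: 1.0, 1: 0.0}, "on"))
```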

Algebraic Decision Diagrams • ADDs: symbolic (graph-based) representation of functions $\{0,1\}^n \to \mathbb{R}$ • Generalization of BDDs (functions $\{0,1\}^n \to \{0,1\}$) • Canonical representation for a fixed variable order (as for BDDs) • Efficient package: CUDD • [Figure: example ADD over variables A, B, C with leaves 0, 1, 2, 3]
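The canonicity claim can be illustrated with a toy ADD store in Python: a unique table shares structurally identical nodes, and a node whose two branches coincide is dropped. This is a hypothetical teaching sketch (class `ADD` and its methods are not CUDD's API), and it builds diagrams by full Shannon expansion, which is exponential in the number of variables.

```python
class ADD:
    """Minimal reduced, ordered ADD store with a unique table (hash-consing)."""

    def __init__(self, order):
        self.order = list(order)  # fixed variable order, e.g. ["A", "B", "C"]
        self.unique = {}          # node key -> node id
        self.nodes = []           # node id -> key: ("leaf", value) or (var, lo_id, hi_id)

    def _intern(self, key):
        if key not in self.unique:  # reuse structurally identical nodes
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def leaf(self, value):
        return self._intern(("leaf", value))

    def node(self, var, lo, hi):
        return lo if lo == hi else self._intern((var, lo, hi))  # drop redundant tests

    def from_function(self, fn, i=0, env=()):
        """Shannon expansion of fn: {0,1}^n -> R (exponential; for illustration only)."""
        if i == len(self.order):
            return self.leaf(fn(*env))
        lo = self.from_function(fn, i + 1, env + (0,))
        hi = self.from_function(fn, i + 1, env + (1,))
        return self.node(self.order[i], lo, hi)

    def evaluate(self, root, env):
        key = self.nodes[root]
        while key[0] != "leaf":
            var, lo, hi = key
            key = self.nodes[hi if env[var] else lo]
        return key[1]

    def size(self, root):
        """Number of nodes (decision nodes + leaves) reachable from root."""
        seen, stack = set(), [root]
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                key = self.nodes[n]
                if key[0] != "leaf":
                    stack.extend(key[1:])
        return len(seen)


add = ADD(["A", "B", "C"])
f = add.from_function(lambda a, b, c: a + b + c)
print(add.size(f))  # 6 decision nodes + 4 leaves
```

Building the same function a second time (for example as `c + b + a`) returns the same node id, which is canonicity in action.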

ADD Join Operations • Multiplication, addition, maximum, … • Generalization of BDD operations
ABC  f  g  f+g  f*g  5*f  f>1  max(f,g)
000  0  3  3    0    0    0    3
001  1  2  3    2    5    0    2
010  1  0  1    0    5    0    1
011  2  1  3    2    10   1    2
100  1  0  1    0    5    0    1
101  2  0  2    0    10   1    2
110  2  0  2    0    10   1    2
111  3  1  4    3    15   1    3
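The join operations in the table all follow one recursive "apply" pattern: split both diagrams on the topmost variable, recurse on the cofactors, and combine leaves with the operator. The Python sketch below represents an ADD as nested tuples `(var, lo, hi)` with float leaves under an assumed order A < B < C; `mk`, `build`, `apply` and `evaluate` are hypothetical helpers, not CUDD functions. It reproduces the table's columns, noting that f happens to equal A + B + C.

```python
import operator
from functools import lru_cache
from itertools import product

ORDER = ("A", "B", "C")  # fixed variable order, assumed for this sketch


def mk(var, lo, hi):
    """Create a decision node, skipping it if both branches are identical."""
    return lo if lo == hi else (var, lo, hi)


def build(fn, i=0, env=()):
    """Build an ADD (nested tuples, float leaves) from a Python function."""
    if i == len(ORDER):
        return float(fn(*env))
    return mk(ORDER[i], build(fn, i + 1, env + (0,)), build(fn, i + 1, env + (1,)))


def apply(op, f, g):
    """Combine two ADDs pointwise with a binary operator (memoized recursion)."""
    @lru_cache(maxsize=None)
    def rec(f, g):
        if not isinstance(f, tuple) and not isinstance(g, tuple):
            return float(op(f, g))
        var = min((d[0] for d in (f, g) if isinstance(d, tuple)), key=ORDER.index)
        f0, f1 = (f[1], f[2]) if isinstance(f, tuple) and f[0] == var else (f, f)
        g0, g1 = (g[1], g[2]) if isinstance(g, tuple) and g[0] == var else (g, g)
        return mk(var, rec(f0, g0), rec(f1, g1))
    return rec(f, g)


def evaluate(d, env):
    while isinstance(d, tuple):
        d = d[2] if env[d[0]] else d[1]
    return d


G = [3, 2, 0, 1, 0, 0, 0, 1]  # g from the table, in ABC order
f = build(lambda a, b, c: a + b + c)
g = build(lambda a, b, c: G[4 * a + 2 * b + c])
ops = {
    "f+g": apply(operator.add, f, g),
    "f*g": apply(operator.mul, f, g),
    "5*f": apply(operator.mul, 5.0, f),  # a constant is just a leaf
    "f>1": apply(operator.gt, f, 1.0),   # comparison yields a 0/1 ADD
    "max(f,g)": apply(max, f, g),
}
for name, d in ops.items():
    print(name, [evaluate(d, dict(zip(ORDER, t))) for t in product((0, 1), repeat=3)])
```

Memoizing on pairs of sub-diagrams is what keeps apply proportional to the product of the two diagram sizes rather than to $2^n$.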

Example • Summation of ADD f and ADD g • [Figure: ADD f (variables A, B, C; leaves 3, 2, 1, 0) + ADD g (leaves 0, 1, 2, 3) = ADD f+g (leaves 4, 3, 2, 1)]

ADD Projection Operations • $\Sigma(f,X)$ (and $\Pi(f,X)$) is obtained by summing (multiplying) the values of tuples that differ only w.r.t. $X$
ABC  f
000  0
001  1
010  1
011  2
100  1
101  2
110  2
111  3
AB  Σ(f,{C})  Π(f,{C})
00  1         0
01  3         2
10  3         2
11  5         6
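A minimal Python sketch of projection on an explicit table rather than an ADD: group the tuples that agree on the kept variables and combine their values with the chosen operator. `project`, `F` and `VARS` are hypothetical names. On a reduced ADD the same idea needs care, because a variable that is skipped on some path still counts under Σ and Π (for example, its two equal cofactors must be added, i.e. doubled, under Σ).

```python
import operator
from functools import reduce
from itertools import product

VARS = ("A", "B", "C")
F = dict(zip(product((0, 1), repeat=3), [0, 1, 1, 2, 1, 2, 2, 3]))  # f from the table


def project(f, vars_, eliminate, op):
    """Combine, with op, the values of tuples that differ only in the eliminated variables."""
    keep = [i for i, v in enumerate(vars_) if v not in eliminate]
    groups = {}
    for t, value in f.items():
        groups.setdefault(tuple(t[i] for i in keep), []).append(value)
    return {k: reduce(op, vs) for k, vs in groups.items()}


print(project(F, VARS, {"C"}, operator.add))  # Σ(f, {C})
print(project(F, VARS, {"C"}, operator.mul))  # Π(f, {C})
```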

ADD Projection Operations • For optimization, we require an operation $\max(f,X)$ that yields the maximum value of tuples differing only w.r.t. $X$
ABC  f
000  0
001  1
010  1
011  2
100  1
101  2
110  2
111  3
AB  Σ(f,{C})  Π(f,{C})  max(f,{C})
00  1         0         1
01  3         2         2
10  3         2         2
11  5         6         3
• Not part of CUDD, but easy to implement as a variant of $\Sigma(f,X)$ / $\Pi(f,X)$
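The "variant" remark can be sketched in Python on nested-tuple ADDs (a hypothetical representation, not CUDD's): recurse to the leaves and, at every node labeled with an eliminated variable, combine the two abstracted cofactors with a pointwise max via apply. Because max is idempotent, a variable that is skipped on some path needs no special treatment, unlike under Σ or Π. The helper names are assumptions.

```python
from itertools import product

ORDER = ("A", "B", "C")  # fixed variable order, assumed for this sketch


def mk(var, lo, hi):
    return lo if lo == hi else (var, lo, hi)


def build(fn, i=0, env=()):
    if i == len(ORDER):
        return float(fn(*env))
    return mk(ORDER[i], build(fn, i + 1, env + (0,)), build(fn, i + 1, env + (1,)))


def cofactors(d, var):
    return (d[1], d[2]) if isinstance(d, tuple) and d[0] == var else (d, d)


def apply_op(op, f, g):
    """Pointwise combination of two ordered ADDs."""
    if not isinstance(f, tuple) and not isinstance(g, tuple):
        return op(f, g)
    var = min((x[0] for x in (f, g) if isinstance(x, tuple)), key=ORDER.index)
    (f0, f1), (g0, g1) = cofactors(f, var), cofactors(g, var)
    return mk(var, apply_op(op, f0, g0), apply_op(op, f1, g1))


def max_abstract(d, xs):
    """max(d, xs): eliminate the variables xs by maximizing over them."""
    if not isinstance(d, tuple):
        return d
    var, lo, hi = d
    lo, hi = max_abstract(lo, xs), max_abstract(hi, xs)
    return apply_op(max, lo, hi) if var in xs else mk(var, lo, hi)


def evaluate(d, env):
    while isinstance(d, tuple):
        d = d[2] if env[d[0]] else d[1]
    return d


f = build(lambda a, b, c: a + b + c)  # f from the table equals A + B + C
m = max_abstract(f, {"C"})
print([evaluate(m, {"A": a, "B": b}) for a, b in product((0, 1), repeat=2)])
```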

Solving SCSPs using Decomposition • Transform the SCSP into a hypertree $H = (T, \chi, \lambda)$ • Compute constraint $R(v)$ for each node $v$ • Bottom-up phase for computing values • Top-down phase for extracting solutions

Pseudocode for Bottom-Up Phase • Generalization of the (semi-)join operation
Function solve(v)
  For Each child ∈ children(v)
    R(v) ← R(v) ⊗ max(R(child), χ(child) \ χ(v))   // R(child) must already be solved (nodes are processed bottom-up)
  Next child
  Return R(v)
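A minimal Python sketch of the bottom-up phase, assuming explicit tables (dicts) instead of ADDs, ⊗ = * and max-projection as in the probabilistic semiring. The factor representation, the helper `full_table` and the three-node tree (mode variables M1 to M3, internal variables X, Y, Z) are hypothetical and only exercise the recursion; this is not the polycell decomposition.

```python
from itertools import product


# A factor is (vars, table): a tuple of variable names and a dict from value tuples to weights.
def multiply(f, g):
    """Join two factors (⊗ = *) on their shared variables."""
    fv, ft = f
    gv, gt = g
    shared = [(fv.index(x), j) for j, x in enumerate(gv) if x in fv]
    extra = [j for j, x in enumerate(gv) if x not in fv]
    out = {}
    for a, p in ft.items():
        for b, q in gt.items():
            if all(a[i] == b[j] for i, j in shared):
                out[a + tuple(b[j] for j in extra)] = p * q
    return fv + tuple(gv[j] for j in extra), out


def max_out(f, eliminate):
    """max-projection: keep the best value over the eliminated variables."""
    fv, ft = f
    keep = [i for i, x in enumerate(fv) if x not in eliminate]
    out = {}
    for a, p in ft.items():
        k = tuple(a[i] for i in keep)
        out[k] = max(out.get(k, p), p)
    return tuple(fv[i] for i in keep), out


def solve(v, R, children):
    """Bottom-up phase: fold each solved child's max-projected constraint into R[v]."""
    for c in children.get(v, []):
        solve(c, R, children)  # children first (post-order)
        R[v] = multiply(R[v], max_out(R[c], set(R[c][0]) - set(R[v][0])))
    return R[v]


def full_table(vs, fn):
    """Hypothetical helper: a full table over Boolean variables."""
    return tuple(vs), {a: fn(*a) for a in product((0, 1), repeat=len(vs))}


R = {
    "r": full_table(("M1", "X", "Y"), lambda m, x, y: (0.9 if m == 0 else 0.1) * (0.8 if x == y else 0.2)),
    "a": full_table(("M2", "X", "Z"), lambda m, x, z: (0.95 if m == 0 else 0.05) * (0.7 if x == z else 0.3)),
    "b": full_table(("M3", "Y"), lambda m, y: (0.99 if m == 0 else 0.01) * (0.6 if y else 0.4)),
}
children = {"r": ["a", "b"]}
root = solve("r", dict(R), children)  # copy, so the original constraints stay untouched
print(max(root[1].values()))
```

The largest entry of the solved root equals the best value of the whole problem, which corresponds to reading off $U_{max}$ in the example that follows.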

Example • Boolean Polycell • [Figure: circuit with gates Or1, Or2, Or3, And1, And2 and internal wires X, Y, Z; observations A = 1, B = 1, C = 1, D = 1, E = 0, F = 0, G = 1]

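Example • Hypertree Decomposition of Boolean Polycell
• Root $v_0$: variables O3, A1, C, E, F, X, Y, Z; best tuples, e.g. (ok, ok, 1, 0, 0, 0, 0, 1), (ok, ok, 1, 0, 0, 0, 1, 1), (ok, ok, 1, 0, 0, 1, 0, 1), …, with U = .98505
• Child $v_1$: A2, G, Y, Z (shares Y, Z with $v_0$)
• Child $v_2$: O2, B, D, Y (shares Y with $v_0$)
• Child $v_3$: O1, A, C, X (shares C, X with $v_0$)
• [Figure: node tables of $v_1$, $v_2$, $v_3$ with U = .995, .99, .99 for ok modes and U = .005, .01, .01 for fty modes]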

Example • Initial $R(v_0)$ over (O3, A1, C, E, F, X, Y, Z): ADD with 20 nodes, 5 leaves
• U = .98505: (ok, ok, 1, 0, 0, 0, 0, 1), (ok, ok, 1, 0, 0, 0, 1, 1), (ok, ok, 1, 0, 0, 1, 0, 1), …
• U = .00995: (fty, ok, 1, 0, 0, 0, 1, 1), (fty, ok, 1, 0, 0, 0, 1, 0), (fty, ok, 1, 0, 0, 1, 0, 0), (fty, ok, 1, 0, 0, 1, 0, 1), …
• U = .00495: …
• U = .00005: …

Example • After multiplication with max((v1),{A2,G}) ok ok 1 0 0 0 1 1 U=.98012 ADD with28 nodes,7 leaves fty ok 1 0 0 0 1 1 U=.00990 ok ok 1 0 0 0 0 1ok ok 1 0 0 1 0 1… U=.00492 O3A1CEFXYZ v0 … U=2.4E-5 … U=4.9E-5 … U=2.5E-7

Example • After multiplication with max((v2),{O2,B,D}) ADD with30 nodes,8 leaves ok ok 1 0 0 0 1 1 U=.97032 fty ok 1 0 0 0 1 1 U=.00980 ok fty 1 0 0 0 1 1 U=.00487 … U=4.9E-5 … O3A1CEFXYZ U=4.9E-7 v0 … U=2.4E-7 … U=2.5E-9

Example • After multiplication with max((v3),{O1,A}) ADD with35 nodes,10 leaves ok ok 1 0 0 0 1 1 U=.00970 ok fty 1 0 0 1 1 1 U=.00482 fty ok 1 0 0 0 1 1 U=9.8E-5 Best Solution:Umax = .0097 … U=4.8E-5 … U=4.9E-7 O3A1CEFXYZ … v0 U=2.4E-7 … U=4.9E-9 … U=2.4E-9 … U=2.5E-11

Pseudocode for Top-Down Phase • No search queue necessary
Function extractSolutions(v_root)
  E ← edges(v_root)
  ψ ← R(v_root)
  ψ ← max(ψ, vars(ψ) \ (decvars(ψ) ∪ vars(E)))     // restrict to decision and shared variables
  While E ≠ ∅ Do
    e ← choose(E)
    v ← son-node(e)
    E ← (E \ {e}) ∪ edges(v)
    ψ_{0-1} ← (ψ > 0)
    div ← max(ψ_{0-1} ⊗ R(v), vars(R(v)) \ vars(ψ))   // "divisor"
    ψ ← (ψ ⊗ R(v)) ⊗⁻¹ div
    ψ ← max(ψ, vars(ψ) \ (decvars(ψ) ∪ vars(E)))   // restrict to decision and shared variables
  End While
  Return ψ
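A minimal Python sketch of the top-down phase on explicit tables, assuming ⊗ = *, ⊗⁻¹ = division (with 0/0 taken as 0), and a divisor that maximizes ψ_{0-1} ⊗ R(v) over the variables of R(v) not in ψ. That divisor reading, the tree, and all names (`extract_solutions`, `divide`, `full_table`, modes M1 to M3) are assumptions for illustration. The bottom-up helpers are repeated so the block runs on its own; on this toy tree the returned ψ equals the brute-force max-marginal over the decision variables.

```python
from itertools import product


def multiply(f, g):
    fv, ft = f
    gv, gt = g
    shared = [(fv.index(x), j) for j, x in enumerate(gv) if x in fv]
    extra = [j for j, x in enumerate(gv) if x not in fv]
    out = {}
    for a, p in ft.items():
        for b, q in gt.items():
            if all(a[i] == b[j] for i, j in shared):
                out[a + tuple(b[j] for j in extra)] = p * q
    return fv + tuple(gv[j] for j in extra), out


def max_out(f, eliminate):
    fv, ft = f
    keep = [i for i, x in enumerate(fv) if x not in eliminate]
    out = {}
    for a, p in ft.items():
        k = tuple(a[i] for i in keep)
        out[k] = max(out.get(k, p), p)
    return tuple(fv[i] for i in keep), out


def divide(f, g):
    """f ⊗⁻¹ g, where g's variables are a subset of f's; 0/0 is taken as 0."""
    fv, ft = f
    gv, gt = g
    idx = [fv.index(x) for x in gv]
    out = {}
    for a, p in ft.items():
        d = gt.get(tuple(a[i] for i in idx), 0.0)
        out[a] = p / d if d else 0.0
    return fv, out


def solve(v, R, children):
    for c in children.get(v, []):
        solve(c, R, children)
        R[v] = multiply(R[v], max_out(R[c], set(R[c][0]) - set(R[v][0])))
    return R[v]


def extract_solutions(root, R, children, decision):
    """Top-down phase over a solved tree R; returns ψ over the decision variables."""
    def edge_vars(E):
        return {x for u, w in E for x in set(R[u][0]) & set(R[w][0])}

    def restrict(psi, E):  # keep decision variables and variables shared with open edges
        return max_out(psi, set(psi[0]) - decision - edge_vars(E))

    E = [(root, c) for c in children.get(root, [])]
    psi = restrict(R[root], E)
    while E:
        _, v = E.pop()  # choose(E): any order works here
        E += [(v, c) for c in children.get(v, [])]
        psi01 = (psi[0], {a: 1.0 if p > 0 else 0.0 for a, p in psi[1].items()})
        div = max_out(multiply(psi01, R[v]), set(R[v][0]) - set(psi[0]))
        psi = restrict(divide(multiply(psi, R[v]), div), E)
    return psi


def full_table(vs, fn):
    return tuple(vs), {a: fn(*a) for a in product((0, 1), repeat=len(vs))}


R = {  # hypothetical tree: root r with children a and b
    "r": full_table(("M1", "X", "Y"), lambda m, x, y: (0.9 if m == 0 else 0.1) * (0.8 if x == y else 0.2)),
    "a": full_table(("M2", "X", "Z"), lambda m, x, z: (0.95 if m == 0 else 0.05) * (0.7 if x == z else 0.3)),
    "b": full_table(("M3", "Y"), lambda m, y: (0.99 if m == 0 else 0.01) * (0.6 if y else 0.4)),
}
children = {"r": ["a", "b"]}
solved = dict(R)
solve("r", solved, children)
psi = extract_solutions("r", solved, children, {"M1", "M2", "M3"})
best = max(psi[1], key=psi[1].get)
print(dict(zip(psi[0], best)), psi[1][best])
```

Because ψ keeps every decision-variable combination with its exact value, the leading solutions can be read off by sorting ψ, with no search queue.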

Example • Initial $\psi = \max(R(v_{root}), \{E, F\})$ over (O3, A1, C, X, Y, Z): ADD with 21 tuples, 33 nodes, 10 leaves
• U = .00970: (ok, ok, 1, 0, 1, 1)
• U = .00482: (ok, fty, 1, 1, 1, 1)
• U = 9.8E-5: (fty, ok, 1, 0, 1, 1)
• Further tuples: U = 4.8E-5, 4.9E-7, 2.4E-7, 4.9E-9, 2.4E-9, 2.5E-11

Example • After processing edge $(v_0, v_3)$: tuples over (O1, O3, A1, Y, Z), ADD with 21 tuples, 32 nodes, 10 leaves
• U = .00970: (fty, ok, ok, 1, 1)
• U = .00482: (ok, ok, fty, 1, 1)
• U = 9.8E-5: (fty, fty, ok, 1, 1)
• Further tuples: U = 4.8E-5, 4.9E-7, 2.4E-7, 4.9E-9, 2.4E-9, 2.5E-11

Example • After processing edge $(v_0, v_2)$: tuples over (O1, O2, O3, A1, Y, Z), ADD with 30 tuples, 47 nodes, 11 leaves
• U = .00970: (fty, ok, ok, ok, 1, 1)
• U = .00482: (ok, ok, ok, fty, 1, 1)
• U = 9.8E-5: (fty, fty, ok, ok, 1, 1), (fty, ok, fty, ok, 1, 1)
• Further tuples: U = 4.8E-5, 9.9E-7, 4.9E-7, …, 2.5E-11

Example • After processing edge $(v_0, v_1)$: tuples over (O1, O2, O3, A1, A2), ADD with 26 tuples, 35 nodes, 12 leaves
• U = .00970: (fty, ok, ok, ok, ok)
• U = .00482: (ok, ok, ok, fty, ok)
• U = 9.8E-5: (fty, fty, ok, ok, ok), (fty, ok, fty, ok, ok)
• Further tuples: U = 4.8E-5, 2.4E-5, 9.9E-7, …, 2.5E-11
• #Solutions = 26 • Easy to focus on leading solutions

Application: Exact ME for POMDPs • Given: POMDP (feasible states, observables, control actions, transitions), observations • Approach: complete representation of the belief state (through decomposition and symbolic encoding) • Benefit: allows exploiting the Markov property • [Figure: state variables S0, S1, …, Sn at time t and at time t+1]

Algorithm: Exact ME for POMDPs • Construct hypertree (offline) • Construct State-ADDs for each node (offline) • Construct Transition-ADDs for each node (offline) • Repeat for each time step: • Multiply nodes with Obs-ADDs (“condition on observations”) • Establish consistency in the tree (bottom-up) • Extract leading solution(s) from the tree (top-down) • Multiply nodes with Transition-ADDs, project on $x_{t+1}$, set $x_t = x_{t+1}$, multiply with State-ADDs (“transition expansion”) • Complexity: exponential in the width of the hypertree, i.e., polynomial for bounded width
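One iteration of the per-time-step loop can be sketched in Python on a flat, explicit belief, leaving out the hypertree, the ADDs and the multiplication with State-ADDs. The sketch assumes projection onto $x_{t+1}$ by summation and adds a normalization step; `me_step` and its parameters are hypothetical.

```python
def me_step(belief, obs, transition, cmd):
    """One flat ME step: condition on the observation, report the mode, expand.

    belief:     state -> weight
    obs:        state -> likelihood of the current observation
    transition: (state, cmd, next_state) -> probability
    """
    # Condition on observations (⊗ = *), then normalize (an added step).
    weighted = {s: p * obs.get(s, 0.0) for s, p in belief.items()}
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("observation inconsistent with the belief state")
    posterior = {s: w / total for s, w in weighted.items()}
    # Leading solution: the most probable state.
    leading = max(posterior, key=posterior.get)
    # Transition expansion: join with the transition constraint, sum out x_t.
    nxt = {}
    for (s, c, s2), p in transition.items():
        if c == cmd:
            nxt[s2] = nxt.get(s2, 0.0) + posterior.get(s, 0.0) * p
    return leading, posterior, nxt
```

For example, with the two-state CCA transition table from the SCSP encoding slide, an observation that favors state 1 yields 1 as the leading mode, and the command then moves the belief for the next step.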

Example • Adapted from Jim Kurien’s thesis • t0: Sw1.cmd = on • t1: Or.out = lo, Sw1.cmd = idl, Sw2.cmd = on • t2: Or.out = lo • Switches more likely to fail than Or-gate • [Figure: switches Sw1 and Sw2, each with input hi, feeding an Or-gate (≥ 1)]

Example • Switch Model • [Figure: switch automaton with modes on (t1 = t2), off (t1, t2 unconstrained) and fty (true); commanded and idle transitions have probability 0.95, failure to fty has probability 0.05, and fty is absorbing with probability 1.0]

Example • Switch Model
• Transition constraint:
x_t  cmd  x_{t+1}  f
on   on   on       0.95
on   off  off      0.95
on   idl  on       0.95
on   *    fty      0.05
off  on   on       0.95
off  off  off      0.95
off  idl  off      0.95
off  *    fty      0.05
fty  *    fty      1.0
• State constraint:
x_t  t1  t2  f
on   lo  lo  1.0
on   hi  hi  1.0
off  *   *   1.0
fty  *   *   1.0
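The switch tables can be written compactly as Python functions. This sketch assumes that tuples not listed in the tables have value 0; the function names are hypothetical. A quick sanity check is that, for every mode and command, the outgoing transition probabilities sum to 1.

```python
MODES = ("on", "off", "fty")
CMDS = ("on", "off", "idl")


def switch_transition(x, cmd, x_next):
    """f(x_t, cmd, x_{t+1}) from the switch transition table."""
    if x == "fty":
        return 1.0 if x_next == "fty" else 0.0  # fty is absorbing
    if x_next == "fty":
        return 0.05                             # failure from on/off under any command
    target = x if cmd == "idl" else cmd         # idl keeps the mode; on/off sets it
    return 0.95 if x_next == target else 0.0


def switch_state(x, t1, t2):
    """f(x_t, t1, t2): an 'on' switch passes its input; 'off' and 'fty' leave t1, t2 free."""
    if x == "on":
        return 1.0 if t1 == t2 else 0.0
    return 1.0


for x in MODES:
    for c in CMDS:
        print(x, c, sum(switch_transition(x, c, y) for y in MODES))
```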

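Example • Or-Gate Model
• State constraint:
x_t  in1  in2  out  f
ok   lo   lo   lo   1.0
ok   lo   hi   hi   1.0
ok   hi   lo   hi   1.0
ok   hi   hi   hi   1.0
fty  *    *    *    1.0
• Transition constraint:
x_t  x_{t+1}  f
ok   ok       0.99
ok   fty      0.01
fty  fty      1.0
• [Figure: automaton with mode ok (Or truth table) staying ok with probability 0.99 and failing to fty with probability 0.01; fty (true) absorbing with probability 1.0]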

Example • Initial belief state (chosen): • p(Sw=on) = p(Sw=off) = 0.475, p(Sw=fty) = 0.05 • p(Or=ok) = 0.99, p(Or=fty) = 0.01 • Observations/Commands: • t0: Sw1.cmd=on • t1: Or.out=lo, Sw1.cmd=idl, Sw2.cmd=on • t2: Or.out=lo • Leading Solutions: • t0: Sw1=on/off, Sw2=on/off, Or=ok • t1: Sw1=fty, Sw2=off, Or=ok • t2: Sw1=on, Sw2=on, Or=fty

Conclusion • SCSPs: elegant and general representation • ADD encoding of SCSPs: often compact in practice, but exponential in the number of variables in the worst case • Decomposition factors the problem into a set of ADDs, each confined to a small number of variables • The two methods complement each other well • Open question: how far can we get with this combination?
