Finding Approximate POMDP Solutions through Belief Compression
Based on slides by Nicholas Roy, MIT
Reliable Navigation
Conventional trajectories may not be robust to localisation error.
[Figure: estimated robot position, robot position distribution, true robot position, goal position]
Perception and Control
• Control algorithms
[Diagram: world state → perception → control]
Perception and Control
• Act on argmax P(x) of the probabilistic perception model: assumes full observability, and is brittle.
• Plan over the full distribution P(x): exact POMDP planning, which is intractable.
• Middle ground: plan over a compressed P(x).
Main Insight
Good policies for real-world POMDPs can be found by planning over low-dimensional representations of the belief space.
[Diagram: probabilistic perception model P(x) → low-dimensional P(x) → control]
Belief Space Structure
The controller may be globally uncertain... but not usually.
Coastal Navigation
• Represent beliefs using a low-dimensional parameterisation
• Discretise into a low-dimensional belief-space MDP
A Hard Navigation Problem
[Plot: average distance to goal (m)]
Dimensionality Reduction
• Principal Components Analysis: original beliefs ≈ weights × characteristic beliefs
Principal Components Analysis
• Given a belief b ∈ ℝⁿ, we want b̃ ∈ ℝᵐ with m ≪ n.
[Plot: collection of beliefs drawn from a 200-state problem; probability of being in state vs. state]
Principal Components Analysis
• m = 9 gives this representation for one sample distribution.
[Plot: one sample distribution; probability of being in state vs. state]
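The PCA compression above can be sketched as follows. This is a minimal sketch, assuming beliefs are collected as rows of a matrix; the 200-state size matches the slide, but the sampled beliefs themselves are random placeholders.

```python
import numpy as np

# Random stand-in for a collection of sampled beliefs.
rng = np.random.default_rng(0)
B = rng.random((50, 200))
B /= B.sum(axis=1, keepdims=True)       # 50 beliefs over 200 states

def pca_compress(B, m):
    """Project beliefs onto the top-m principal components."""
    mean = B.mean(axis=0)
    _, _, Vt = np.linalg.svd(B - mean, full_matrices=False)
    W = (B - mean) @ Vt[:m].T           # weights: the m-dimensional beliefs
    B_hat = W @ Vt[:m] + mean           # reconstruction in the full space
    return W, B_hat

W9, B9 = pca_compress(B, 9)             # m = 9, as on the slide
W20, B20 = pca_compress(B, 20)
err9 = np.linalg.norm(B - B9)
err20 = np.linalg.norm(B - B20)         # more bases, smaller error
```

Here `Vt[:m]` plays the role of the "characteristic beliefs" and `W` the "weights" from the earlier slide.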
Principal Components Analysis
Many real-world POMDP distributions are characterised by large regions of low probability.
Idea: create a fitting criterion that is (exponentially) stronger in low-probability regions (E-PCA).
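One way to realise this criterion is exponential-family PCA with an exponential link, reconstructing beliefs as B ≈ exp(UV). The sketch below is illustrative, not the thesis implementation: it minimises the loss Σ(exp(UV) − B ∘ UV) by plain gradient descent, with sizes, step count, and step size all assumed.

```python
import numpy as np

# Random stand-in for sampled beliefs (50 beliefs, 200 states).
rng = np.random.default_rng(1)
B = rng.random((50, 200))
B /= B.sum(axis=1, keepdims=True)

m = 4                                    # assumed number of E-PCA bases
U = 0.01 * rng.standard_normal((50, m))  # weights
V = 0.01 * rng.standard_normal((m, 200)) # characteristic beliefs

def loss(U, V):
    # Penalises errors in low-probability regions far more strongly
    # than squared error does.
    Z = U @ V
    return (np.exp(Z) - B * Z).sum()

loss_start = loss(U, V)
lr = 1e-3                                # assumed step size
for _ in range(200):                     # plain gradient descent
    E = np.exp(U @ V)
    U, V = U - lr * (E - B) @ V.T, V - lr * U.T @ (E - B)
loss_end = loss(U, V)                    # should be below loss_start
```

In practice an alternating Newton-style fit converges much faster; gradient descent is used here only to keep the sketch short.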
Example E-PCA
[Plots: reconstructions using 1, 2, 3, and 4 bases; probability of being in state vs. state]
Finding Dimensionality
• E-PCA will indicate the appropriate number of bases, depending on the beliefs encountered.
Planning
[Diagram: original POMDP → (E-PCA) → low-dimensional belief space B̃ → (discretise) → discrete belief-space MDP with states s1, s2, s3]
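Once the discrete belief-space MDP is built, it can be solved with standard value iteration. A toy sketch, where the sizes, discount, and the random model parameters are all illustrative placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
n_b, n_a, gamma = 20, 3, 0.95           # grid beliefs, actions, discount
T = rng.random((n_a, n_b, n_b))
T /= T.sum(axis=2, keepdims=True)       # T[a, i, j] = T~(b~i, a, b~j)
R = rng.random((n_b, n_a))              # R[i, a]: expected reward

V = np.zeros(n_b)
for _ in range(1000):
    Q = R + gamma * (T @ V).T           # Q[i, a]; (T @ V) has shape (a, i)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break                           # converged to the fixed point
    V = V_new
policy = Q.argmax(axis=1)               # greedy action at each grid belief
```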
Model Parameters
• Reward function: back-project the compressed belief b̃ to the high-dimensional belief b over states s1, s2, s3, ..., then compute the expected reward R̃(b̃) = Σ_s R(s) b(s).
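The reward computation might be sketched as follows, with a hypothetical 3-state problem and a toy linear decoder standing in for the learned back-projection:

```python
import numpy as np

R = np.array([0.0, 0.0, 10.0])           # per-state reward (goal state s3)

def reward_tilde(b_tilde, decode, R=R):
    """Expected reward of a compressed belief: back-project, then average."""
    b = decode(b_tilde)                  # back-project to the full belief
    b = np.clip(b, 0.0, None)            # reconstruction may dip below 0
    b /= b.sum()                         # renormalise to a valid belief
    return R @ b                         # sum_s R(s) b(s)

# Illustrative linear decoder in place of the (E-)PCA reconstruction.
V = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.5, 0.5]])
r = reward_tilde(np.array([0.2, 0.8]), lambda w: w @ V)   # ≈ 4.0
```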
Model Parameters
• Transition function (moving between the low and full dimensions):
1. For each compressed belief b̃i and action a:
2. Recover the full belief bi.
3. Propagate bi according to the action.
4. Propagate according to the observation.
5. Recover the compressed belief b̃j.
6. Set T̃(b̃i, a, b̃j) to the probability of the observation.
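The action/observation propagation in steps 3 and 4 is a Bayes-filter update. One entry of the transition model might be sketched like this; the 3-state motion and observation models are invented for illustration.

```python
import numpy as np

T = np.array([[0.9, 0.1, 0.0],           # T[s, s'] motion model for one action
              [0.0, 0.9, 0.1],
              [0.0, 0.0, 1.0]])
O = np.array([[0.8, 0.1, 0.1],           # O[s', z] observation model
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])

def propagate(b, z, T=T, O=O):
    """Push a full belief through the action, then the observation z."""
    pred = T.T @ b                       # step 3: propagate through the action
    unnorm = O[:, z] * pred              # step 4: weight by observation likelihood
    p_z = unnorm.sum()                   # probability of seeing z from b
    return unnorm / p_z, p_z             # posterior bj, and the T~ weight

b_next, p_z = propagate(np.array([1.0, 0.0, 0.0]), z=0)
```

The entry T̃(b̃i, a, b̃j) is then set to `p_z`, after compressing `b_next` and snapping it to the nearest grid belief b̃j.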
Robot Navigation Example
[Figure: initial distribution, goal state, true (hidden) robot position, goal position]
Robot Navigation Example
[Figure: true robot position, goal position]
Policy Comparison
[Plot: average distance to goal (m); E-PCA with 6 bases]
People Finding as a POMDP
• Robot position: fully observable
• Position of person: unknown
[Figure: robot position, true person position]
Finding and Tracking People
[Figure: robot position, true person position]
People Finding as a POMDP
• Factored belief space:
  • 2 dimensions: fully-observable robot position
  • 6 dimensions: distribution over person positions
• A regular grid gives ≈ 10¹⁶ states.
Variable Resolution
• Non-regular grid using sampled beliefs b̃1, ..., b̃5, with transitions such as T̃(b̃1, a1, b̃2) and T̃(b̃1, a2, b̃5).
• Compute model parameters using nearest-neighbour.
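Snapping a propagated belief to the nearest grid sample might look like this; the grid points are illustrative 2-D compressed beliefs, not data from the slides.

```python
import numpy as np

# Hypothetical non-regular grid of sampled compressed beliefs.
grid = np.array([[0.0, 0.0],
                 [1.0, 0.0],
                 [0.0, 1.0],
                 [1.0, 1.0]])

def nearest(b_tilde, grid=grid):
    """Index of the grid belief closest to b_tilde (Euclidean distance)."""
    return int(np.argmin(np.linalg.norm(grid - b_tilde, axis=1)))

idx = nearest(np.array([0.9, 0.2]))     # snaps to grid point 1
```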
Refining the Grid
1. Sample beliefs according to the current policy.
2. Construct a new model.
3. Keep a new belief b̃′ if V(b̃′) > V(b̃).
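A sketch of the keep-if-better rule, with hypothetical names; the value estimates here are just numbers passed in, not computed from a model.

```python
import numpy as np

grid = [np.zeros(2)]                    # existing low-dimensional beliefs
values = [0.0]                          # value estimate at each grid point

def maybe_add(b_new, v_new):
    """Keep a sampled belief only if it improves on its nearest neighbour."""
    d = [np.linalg.norm(g - b_new) for g in grid]
    i = int(np.argmin(d))
    if v_new > values[i]:               # V(b~') > V(b~): refine the grid
        grid.append(b_new)
        values.append(v_new)
        return True
    return False

added = maybe_add(np.array([0.5, 0.5]), 1.0)   # kept: beats V = 0.0
```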
The Optimal Policy
[Figure: original distribution vs. reconstruction using E-PCA with 6 bases; robot position, true person position]
E-PCA Policy Comparison
[Plot: average number of actions to find the person, for the fully observable MDP, E-PCA (72 states), and refined E-PCA (260 states)]
Nick's Thesis Contributions
• Good policies for real-world POMDPs can be found by planning over a low-dimensional representation of the belief space, using E-PCA.
• POMDPs can scale to bigger, more complicated real-world problems.
• POMDPs can be used on real, deployed robots.