Finding Approximate POMDP Solutions through Belief Compression
Based on slides by Nicholas Roy, MIT
Reliable Navigation
Conventional trajectories may not be robust to localisation error.
[Figure: estimated robot position, robot position distribution, true robot position, goal position]
Perception and Control
• Control algorithms
[Diagram: world state → perception → control]
Perception and Control
• Act on argmax P(x) of the probabilistic perception model: assumes full observability, and is brittle.
• Plan over the full distribution P(x): exact POMDP planning, which is intractable.
• Middle ground: plan over a compressed P(x).
Main Insight
Good policies for real-world POMDPs can be found by planning over low-dimensional representations of the belief space.
[Diagram: probabilistic perception model P(x) → low-dimensional P(x) → control]
Belief Space Structure
The controller may be globally uncertain... but not usually.
Coastal Navigation
• Represent beliefs using a low-dimensional parameterisation
• Discretise into a low-dimensional belief-space MDP
A Hard Navigation Problem
[Plot: average distance to goal (m)]
Dimensionality Reduction
• Principal Components Analysis: original beliefs ≈ weights × characteristic beliefs
Principal Components Analysis
• Given a belief b ∈ ℝⁿ, we want b̃ ∈ ℝᵐ with m ≪ n.
[Plot: collection of beliefs drawn from a 200-state problem; probability of being in state vs. state]
Principal Components Analysis
• m = 9 gives this representation for one sample distribution.
[Plot: one sample distribution; probability of being in state vs. state]
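The PCA compression above can be sketched as follows. This is a minimal sketch, assuming beliefs are collected as rows of a matrix; the 200-state size matches the slide, but the sampled beliefs themselves are random placeholders.

```python
import numpy as np

# Random stand-in for a collection of sampled beliefs.
rng = np.random.default_rng(0)
B = rng.random((50, 200))
B /= B.sum(axis=1, keepdims=True)       # 50 beliefs over 200 states

def pca_compress(B, m):
    """Project beliefs onto the top-m principal components."""
    mean = B.mean(axis=0)
    _, _, Vt = np.linalg.svd(B - mean, full_matrices=False)
    W = (B - mean) @ Vt[:m].T           # weights: the m-dimensional beliefs
    B_hat = W @ Vt[:m] + mean           # reconstruction in the full space
    return W, B_hat

W9, B9 = pca_compress(B, 9)             # m = 9, as on the slide
W20, B20 = pca_compress(B, 20)
err9 = np.linalg.norm(B - B9)
err20 = np.linalg.norm(B - B20)         # more bases, smaller error
```

Here `Vt[:m]` plays the role of the "characteristic beliefs" and `W` the "weights" from the earlier slide.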
Principal Components Analysis
Many real-world POMDP distributions are characterised by large regions of low probability.
Idea: create a fitting criterion that is (exponentially) stronger in low-probability regions (E-PCA).
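One way to realise this criterion is exponential-family PCA with an exponential link, reconstructing beliefs as B ≈ exp(UV). The sketch below is illustrative, not the thesis implementation: it minimises the loss Σ(exp(UV) − B ∘ UV) by plain gradient descent, with sizes, step count, and step size all assumed.

```python
import numpy as np

# Random stand-in for sampled beliefs (50 beliefs, 200 states).
rng = np.random.default_rng(1)
B = rng.random((50, 200))
B /= B.sum(axis=1, keepdims=True)

m = 4                                    # assumed number of E-PCA bases
U = 0.01 * rng.standard_normal((50, m))  # weights
V = 0.01 * rng.standard_normal((m, 200)) # characteristic beliefs

def loss(U, V):
    # Penalises errors in low-probability regions far more strongly
    # than squared error does.
    Z = U @ V
    return (np.exp(Z) - B * Z).sum()

loss_start = loss(U, V)
lr = 1e-3                                # assumed step size
for _ in range(200):                     # plain gradient descent
    E = np.exp(U @ V)
    U, V = U - lr * (E - B) @ V.T, V - lr * U.T @ (E - B)
loss_end = loss(U, V)                    # should be below loss_start
```

In practice an alternating Newton-style fit converges much faster; gradient descent is used here only to keep the sketch short.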
Example E-PCA
[Plots: reconstructions using 1, 2, 3, and 4 bases; probability of being in state vs. state]
Finding Dimensionality
• E-PCA will indicate the appropriate number of bases, depending on the beliefs encountered.
Planning
[Diagram: original POMDP → (E-PCA) → low-dimensional belief space B̃ → (discretise) → discrete belief-space MDP with states s1, s2, s3]
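Once the discrete belief-space MDP is built, it can be solved with standard value iteration. A toy sketch, where the sizes, discount, and the random model parameters are all illustrative placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
n_b, n_a, gamma = 20, 3, 0.95           # grid beliefs, actions, discount
T = rng.random((n_a, n_b, n_b))
T /= T.sum(axis=2, keepdims=True)       # T[a, i, j] = T~(b~i, a, b~j)
R = rng.random((n_b, n_a))              # R[i, a]: expected reward

V = np.zeros(n_b)
for _ in range(1000):
    Q = R + gamma * (T @ V).T           # Q[i, a]; (T @ V) has shape (a, i)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break                           # converged to the fixed point
    V = V_new
policy = Q.argmax(axis=1)               # greedy action at each grid belief
```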
Model Parameters
• Reward function: back-project the compressed belief b̃ to the high-dimensional belief b over states s1, s2, s3, ..., then compute the expected reward R̃(b̃) = Σ_s R(s) b(s).
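The reward computation might be sketched as follows, with a hypothetical 3-state problem and a toy linear decoder standing in for the learned back-projection:

```python
import numpy as np

R = np.array([0.0, 0.0, 10.0])           # per-state reward (goal state s3)

def reward_tilde(b_tilde, decode, R=R):
    """Expected reward of a compressed belief: back-project, then average."""
    b = decode(b_tilde)                  # back-project to the full belief
    b = np.clip(b, 0.0, None)            # reconstruction may dip below 0
    b /= b.sum()                         # renormalise to a valid belief
    return R @ b                         # sum_s R(s) b(s)

# Illustrative linear decoder in place of the (E-)PCA reconstruction.
V = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.5, 0.5]])
r = reward_tilde(np.array([0.2, 0.8]), lambda w: w @ V)   # ≈ 4.0
```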
Model Parameters
• Transition function (moving between the low and full dimensions):
1. For each compressed belief b̃i and action a:
2. Recover the full belief bi.
3. Propagate bi according to the action.
4. Propagate according to the observation.
5. Recover the compressed belief b̃j.
6. Set T̃(b̃i, a, b̃j) to the probability of the observation.
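The action/observation propagation in steps 3 and 4 is a Bayes-filter update. One entry of the transition model might be sketched like this; the 3-state motion and observation models are invented for illustration.

```python
import numpy as np

T = np.array([[0.9, 0.1, 0.0],           # T[s, s'] motion model for one action
              [0.0, 0.9, 0.1],
              [0.0, 0.0, 1.0]])
O = np.array([[0.8, 0.1, 0.1],           # O[s', z] observation model
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])

def propagate(b, z, T=T, O=O):
    """Push a full belief through the action, then the observation z."""
    pred = T.T @ b                       # step 3: propagate through the action
    unnorm = O[:, z] * pred              # step 4: weight by observation likelihood
    p_z = unnorm.sum()                   # probability of seeing z from b
    return unnorm / p_z, p_z             # posterior bj, and the T~ weight

b_next, p_z = propagate(np.array([1.0, 0.0, 0.0]), z=0)
```

The entry T̃(b̃i, a, b̃j) is then set to `p_z`, after compressing `b_next` and snapping it to the nearest grid belief b̃j.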
Robot Navigation Example
[Figure: initial distribution, goal state, true (hidden) robot position, goal position]
Robot Navigation Example
[Figure: true robot position, goal position]
Policy Comparison
[Plot: average distance to goal (m); E-PCA with 6 bases]
People Finding as a POMDP
• Robot position: fully observable
• Position of person: unknown
[Figure: robot position, true person position]
Finding and Tracking People
[Figure: robot position, true person position]
People Finding as a POMDP
• Factored belief space:
  • 2 dimensions: fully-observable robot position
  • 6 dimensions: distribution over person positions
• A regular grid gives ≈ 10¹⁶ states.
Variable Resolution
• Non-regular grid using sampled beliefs b̃1, ..., b̃5, with transitions such as T̃(b̃1, a1, b̃2) and T̃(b̃1, a2, b̃5).
• Compute model parameters using nearest-neighbour.
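Snapping a propagated belief to the nearest grid sample might look like this; the grid points are illustrative 2-D compressed beliefs, not data from the slides.

```python
import numpy as np

# Hypothetical non-regular grid of sampled compressed beliefs.
grid = np.array([[0.0, 0.0],
                 [1.0, 0.0],
                 [0.0, 1.0],
                 [1.0, 1.0]])

def nearest(b_tilde, grid=grid):
    """Index of the grid belief closest to b_tilde (Euclidean distance)."""
    return int(np.argmin(np.linalg.norm(grid - b_tilde, axis=1)))

idx = nearest(np.array([0.9, 0.2]))     # snaps to grid point 1
```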
Refining the Grid
1. Sample beliefs according to the current policy.
2. Construct a new model.
3. Keep a new belief b̃′ if V(b̃′) > V(b̃).
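A sketch of the keep-if-better rule, with hypothetical names; the value estimates here are just numbers passed in, not computed from a model.

```python
import numpy as np

grid = [np.zeros(2)]                    # existing low-dimensional beliefs
values = [0.0]                          # value estimate at each grid point

def maybe_add(b_new, v_new):
    """Keep a sampled belief only if it improves on its nearest neighbour."""
    d = [np.linalg.norm(g - b_new) for g in grid]
    i = int(np.argmin(d))
    if v_new > values[i]:               # V(b~') > V(b~): refine the grid
        grid.append(b_new)
        values.append(v_new)
        return True
    return False

added = maybe_add(np.array([0.5, 0.5]), 1.0)   # kept: beats V = 0.0
```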
The Optimal Policy
[Figure: original distribution vs. reconstruction using E-PCA with 6 bases; robot position, true person position]
E-PCA Policy Comparison
[Plot: average number of actions to find the person, for the fully observable MDP, E-PCA (72 states), and refined E-PCA (260 states)]
Nick's Thesis Contributions
• Good policies for real-world POMDPs can be found by planning over a low-dimensional representation of the belief space, using E-PCA.
• POMDPs can scale to bigger, more complicated real-world problems.
• POMDPs can be used on real, deployed robots.