Algorithms for POMDP Presented by Alp Sardağ
Monahan Enumeration Phase • Generate all vectors: number of generated vectors = $|A|\,M^{|\Omega|}$, where $M$ is the number of vectors from the previous step and $\Omega$ is the set of observations
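The enumeration step can be sketched as follows. This is a minimal illustration, not the presenter's code: the array layouts `T[a][s][s2]`, `O[a][s2][o]`, `R[a][s]`, the discount `gamma`, and the tiny two-state model are all assumptions made for the example. It picks one previous vector per observation for every action, which is where the count of |A| times M to the power of the number of observations comes from.

```python
from itertools import product

def enumerate_vectors(prev_vectors, T, O, R, gamma):
    """Return every candidate vector for the next step (exhaustive enumeration).

    Assumed layouts (hypothetical, chosen for this sketch):
      T[a][s][s2]  transition probability
      O[a][s2][o]  observation probability
      R[a][s]      immediate reward
    """
    n_actions = len(R)
    n_states = len(R[0])
    n_obs = len(O[0][0])
    new_vectors = []
    for a in range(n_actions):
        # One previous vector is chosen independently for every observation.
        for choice in product(prev_vectors, repeat=n_obs):
            vec = []
            for s in range(n_states):
                future = sum(
                    T[a][s][s2] * O[a][s2][o] * choice[o][s2]
                    for o in range(n_obs)
                    for s2 in range(n_states)
                )
                vec.append(R[a][s] + gamma * future)
            new_vectors.append((a, tuple(vec)))
    return new_vectors

# Made-up model: 2 states, 2 actions, 2 observations.
T = [[[0.9, 0.1], [0.1, 0.9]], [[0.5, 0.5], [0.5, 0.5]]]
O = [[[0.8, 0.2], [0.2, 0.8]], [[0.5, 0.5], [0.5, 0.5]]]
R = [[1.0, 0.0], [0.0, 1.0]]
prev = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # M = 3 previous vectors

candidates = enumerate_vectors(prev, T, O, R, gamma=0.9)
print(len(candidates))  # 2 actions * 3**2 choices = 18
```

Even in this toy case the set grows from 3 to 18 vectors in one step, which is why a reduction phase is needed.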
Monahan Reduction Phase • All vectors can be kept: • Each time, maximize over all vectors. • A lot of excess baggage. • The number of vectors in the next step will be even larger. • An LP is used to trim away useless vectors.
Monahan Reduction Phase • For a vector to be useful, there must be at least one belief point at which it gives a larger value than all the others:
Monahan’s LP Complication Formulate LP and check for :
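The LP on the slide did not survive extraction. A common formulation, assumed here, maximizes a margin delta over belief points such that the candidate vector beats every other vector by at least delta, and keeps the vector only if the optimum is positive. The sketch below restricts itself to two states, where the belief is a single number p and that LP can be solved by checking the two endpoints and the crossing points of the difference lines; a general implementation would call an LP solver instead. The names `margin`, `is_useful` and `prune` are hypothetical.

```python
def margin(alpha, others):
    """Largest delta such that some p in [0, 1] satisfies
    value(alpha, p) >= value(beta, p) + delta for every beta in others.
    Two-state case only: belief is (p, 1 - p)."""
    # alpha - beta as a line c + m * p
    lines = []
    for beta in others:
        c = alpha[1] - beta[1]
        m = (alpha[0] - beta[0]) - c
        lines.append((c, m))
    if not lines:
        return float("inf")
    # The optimum of a 1-D LP lies at an endpoint or where two lines cross.
    points = [0.0, 1.0]
    for i, (c1, m1) in enumerate(lines):
        for c2, m2 in lines[i + 1:]:
            if m1 != m2:
                p = (c2 - c1) / (m1 - m2)
                if 0.0 <= p <= 1.0:
                    points.append(p)
    return max(min(c + m * p for c, m in lines) for p in points)

def is_useful(alpha, others, eps=1e-9):
    return margin(alpha, others) > eps

def prune(vectors):
    """Drop useless vectors one at a time so duplicates keep one copy."""
    kept = list(vectors)
    i = 0
    while i < len(kept):
        if is_useful(kept[i], kept[:i] + kept[i + 1:]):
            i += 1
        else:
            kept.pop(i)
    return kept

vectors = [(1.0, 0.0), (0.0, 1.0), (0.4, 0.4)]
print(prune(vectors))  # (0.4, 0.4) is never strictly best, so it is dropped
```

Note that (0.4, 0.4) is not dominated component by component by either of the other two vectors, yet it is still useless; this is the case that requires the LP rather than a simple comparison.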
Eagle’s Variant of Monahan • The optimization occurs in the enumeration phase. • If, in the enumeration process, a vector’s components are completely dominated by another vector’s components, discard it. • Generate ji(t); if the following condition holds: • Discard ji(t). • Can also be applied to check whether a new vector dominates any previously enumerated vector.
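The domination test described above can be sketched in a few lines. The condition on the slide is missing, so the code assumes the usual component-wise reading: a vector is discarded when every one of its components is less than or equal to the matching component of some other vector. The function names are hypothetical.

```python
def dominated(v, w):
    """True if every component of v is <= the matching component of w."""
    return all(x <= y for x, y in zip(v, w))

def add_with_domination_check(kept, new):
    """Insert `new` into `kept`, applying the check in both directions."""
    # Discard the new vector if an already enumerated vector dominates it.
    if any(dominated(new, old) for old in kept):
        return kept
    # Otherwise drop every previously enumerated vector that it dominates.
    return [old for old in kept if not dominated(old, new)] + [new]

kept = []
for vec in [(1.0, 0.0), (0.5, -0.2), (0.0, 1.0), (0.4, 0.4)]:
    kept = add_with_domination_check(kept, vec)
print(kept)  # (0.5, -0.2) is gone; (0.4, 0.4) survives this cheap test

kept = add_with_domination_check(kept, (2.0, 0.5))
print(kept)  # the new vector removes (1.0, 0.0) and (0.4, 0.4)
```

The check is cheap but incomplete: a vector such as (0.4, 0.4) can pass it and still never be the maximizer at any belief point, so an LP-based reduction may still be needed afterwards.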
Sondik’s One-Pass Algorithm • Find the proper set of belief states to plug into the formula below to get all necessary vectors: • The algorithm is guaranteed to visit a finite number of regions. • The union of these regions is the entire belief space.
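The formula referred to on this slide is not present in the extracted text. The sketch below assumes the standard construction for generating the vector that is optimal at one given belief state: for each action and observation, pick the previous vector that looks best from that belief, combine the picks into one vector per action, and keep the best action. The model layout (`T[a][s][s2]`, `O[a][s2][o]`, `R[a][s]`), the discount `gamma` and all function names are assumptions for illustration.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def project(alpha, a, o, T, O):
    """Back-project a previous vector through action a and observation o."""
    n = len(alpha)
    return [
        sum(T[a][s][s2] * O[a][s2][o] * alpha[s2] for s2 in range(n))
        for s in range(n)
    ]

def vector_at(belief, prev_vectors, T, O, R, gamma):
    """Return (action, vector) that is best at `belief` for the next step."""
    n_obs = len(O[0][0])
    best = None
    for a in range(len(R)):
        vec = list(R[a])
        for o in range(n_obs):
            # For this observation, keep the projection that is best at belief.
            g = max(
                (project(alpha, a, o, T, O) for alpha in prev_vectors),
                key=lambda proj: dot(belief, proj),
            )
            vec = [v + gamma * x for v, x in zip(vec, g)]
        if best is None or dot(belief, vec) > dot(belief, best[1]):
            best = (a, tuple(vec))
    return best

# Made-up model: 2 states, 2 actions, 2 observations.
T = [[[0.9, 0.1], [0.1, 0.9]], [[0.5, 0.5], [0.5, 0.5]]]
O = [[[0.8, 0.2], [0.2, 0.8]], [[0.5, 0.5], [0.5, 0.5]]]
R = [[1.0, 0.0], [0.0, 1.0]]
prev = [(1.0, 0.0), (0.0, 1.0)]

action, vec = vector_at((0.5, 0.5), prev, T, O, R, gamma=0.9)
print(action, vec)
```

Unlike exhaustive enumeration, this produces exactly one vector per belief state, so the remaining problem is choosing which belief states to evaluate.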
Sondik’s One-Pass Algorithm • Simplified version of Sondik’s algorithm:
Sondik’s One-Pass Algorithm • How to define a region around this belief state where that vector is guaranteed to be the true linear portion of the value function? • Construct a set of constraints; the region is the set of belief states that satisfy all of them. • Then go to step (5).
Sondik’s One-Pass Algorithm • The condition: $\alpha^*(t)$, generated at a belief state, must stay larger than all other $\alpha^a(t)$ as the belief state varies: • Variations in the belief state can cause changes in $\alpha^a(t)$. • A new constraint is needed to ensure the components of $\alpha^a(t)$ stay the same.
Sondik’s One-Pass Algorithm • What affects $\alpha^*(t)$ and $\alpha^a(t)$? • To ensure that no part of the function changes, these constraints exist for every combination of a and observation.
Sondik’s One-Pass Algorithm • Constraints restrict belief states to lie on the belief state space simplex:
Sondik’s One-Pass Algorithm • Each constraint defines a region consisting of all the points on one side of a line:
Sondik’s One-Pass Algorithm • The LP constraints at step (4):
Sondik’s One-Pass Algorithm • In step (5), find belief states guaranteed not to be in the region defined in step (4). • With the new point, proceed exactly as in step (4). • The algorithm continues until a complete partition of the belief space is found.
Sondik’s One-Pass Algorithm • To find points in the neighboring regions, points lying on the edge of the region defined by the constraints are used:
Sondik’s One-Pass Algorithm • Which constraints are binding: • For each constraint, change its inequality into an equality. • Solve this LP. • If the LP has a solution, the constraint is binding; a non-binding constraint cannot pass through the region defined by all the other constraints.
Cheng’s Relaxed Region • Same as Sondik’s One-Pass algorithm, except each region is specified with fewer constraints. • Defines regions that will typically be larger than the actual vectors’ regions.
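The binding-constraint test can be illustrated in the two-state case, where a belief is a single number p and every constraint is a one-variable inequality. Under that assumption, turning a constraint into an equality fixes p, and the LP feasibility question reduces to checking that this p satisfies the remaining constraints. The `(m, c)` encoding, meaning `m * p + c >= 0`, and the function name are hypothetical; with more states a real LP solver would be needed.

```python
def binding_constraints(constraints, eps=1e-9):
    """Return indices of binding constraints.

    Each constraint is a pair (m, c) meaning m * p + c >= 0 (assumed encoding).
    """
    binding = []
    for i, (m, c) in enumerate(constraints):
        if m == 0:
            continue  # a constant constraint has no boundary point in p
        p = -c / m  # the only point where this constraint holds with equality
        feasible = all(
            m2 * p + c2 >= -eps
            for j, (m2, c2) in enumerate(constraints)
            if j != i
        )
        if feasible:
            binding.append(i)
    return binding

constraints = [
    (1.0, 0.0),    # p >= 0        (simplex)
    (-1.0, 1.0),   # p <= 1        (simplex)
    (1.0, -0.2),   # p >= 0.2
    (-1.0, 0.7),   # p <= 0.7
    (-1.0, 0.9),   # p <= 0.9      (redundant given p <= 0.7)
]
print(binding_constraints(constraints))  # [2, 3]
```

Here the region is the interval from 0.2 to 0.7, so only the two constraints that form its edges are binding, and those edges are where a search for neighboring regions would start.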
Cheng’s Relaxed Region • Set of constraints for the relaxed regions of Cheng:
Cheng’s Relaxed Region • Corners found with interior algorithm:
Cheng’s Linear Support • The algorithm defines an approximate value function over the entire belief space. • Refine this approximation until it reaches the optimal value function.
Cheng’s Linear Support • Difference between two algorithms:
Cheng’s Linear Support • Initialize a search list with the extreme points of the belief simplex (e.g. [1,0,0,...], [0,1,0,0,...]) and an empty set of vectors. • For each of these points, the true $\alpha(t)$ vector is calculated and added to the set of vectors.
Cheng’s Linear Support • Since both the true value function and the approximation are PWLC, the largest difference must occur at a corner point. • Cheng then finds all the corner points of the regions induced by the approximation. • Disregard the corner points seen before and add those not seen before to the search list. • Pick a point from the search list and generate its vector. If it differs from all vectors in the approximation, add it to the approximation set. • Repeat the whole procedure with the new approximation.
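The loop above can be sketched for two states, where corner points are simply the beliefs at which two lines of the current approximation cross on its upper surface. Several things here are assumptions for illustration: the `true_vector_at` oracle stands in for the real vector-generation step and just picks the best vector from a hidden list, the function names are hypothetical, and a vector is added only when it improves the value at the tested point, which is a slightly stricter test than "differs from the others".

```python
def value(vec, p):
    """Value of a two-state vector at belief (p, 1 - p)."""
    return p * vec[0] + (1.0 - p) * vec[1]

def corners(vectors, eps=1e-9):
    """Interior beliefs where two vectors cross on the upper surface."""
    found = []
    for i, u in enumerate(vectors):
        for w in vectors[i + 1:]:
            denom = (u[0] - u[1]) - (w[0] - w[1])
            if abs(denom) < eps:
                continue  # parallel lines never cross
            p = (w[1] - u[1]) / denom
            top = max(value(v, p) for v in vectors)
            if 0.0 < p < 1.0 and value(u, p) >= top - eps:
                found.append(p)
    return found

def linear_support(true_vector_at, eps=1e-9):
    search = [0.0, 1.0]  # extreme points of the belief simplex
    seen = {0.0, 1.0}
    approx = []          # current lower-bound approximation
    while search:
        p = search.pop()
        vec = true_vector_at(p)
        current = max((value(v, p) for v in approx), default=float("-inf"))
        if value(vec, p) > current + eps:
            approx.append(vec)
            # The approximation changed, so look for corners not yet visited.
            for c in corners(approx):
                key = round(c, 9)
                if key not in seen:
                    seen.add(key)
                    search.append(c)
    return approx

# Hypothetical stand-in for the true value function.
HIDDEN = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6), (0.4, 0.4), (0.9, 0.2)]

def oracle(p):
    return max(HIDDEN, key=lambda v: value(v, p))

result = linear_support(oracle)
print(sorted(result))  # the four useful vectors; (0.4, 0.4) is never returned
```

The search stops once no corner of the approximation yields an improvement, which in this example happens after visiting seven belief points.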