Witness Algorithm
18th November, 2010
By: Swapnil Godambe
CS 5368: Intelligent Systems (Fall 2010)
POMDP
A POMDP model contains:
• A set of states S
• A set of actions A
• A state transition function T
• A reward function R(s, a)
• A finite set of observations Ω
• An observation function O: S × A → Π(Ω), where O(s', a, o) is the probability of making observation o given that the agent took action a and landed in state s'.
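The components above can be collected into a small data structure. This is an illustrative sketch, not from the slides; the two-state model and all its names ("s0", "stay", "o0", etc.) are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class POMDP:
    states: List[str]
    actions: List[str]
    observations: List[str]
    T: Dict[Tuple[str, str, str], float]  # T(s, a, s'): transition probability
    R: Dict[Tuple[str, str], float]       # R(s, a): immediate reward
    O: Dict[Tuple[str, str, str], float]  # O(s', a, o): observation probability

    def check(self):
        # Transition and observation probabilities must each sum to 1.
        for s in self.states:
            for a in self.actions:
                assert abs(sum(self.T[(s, a, s2)] for s2 in self.states) - 1.0) < 1e-9
                assert abs(sum(self.O[(s, a, o)] for o in self.observations) - 1.0) < 1e-9

# A tiny hypothetical two-state, one-action model.
model = POMDP(
    states=["s0", "s1"],
    actions=["stay"],
    observations=["o0", "o1"],
    T={("s0", "stay", "s0"): 0.9, ("s0", "stay", "s1"): 0.1,
       ("s1", "stay", "s0"): 0.2, ("s1", "stay", "s1"): 0.8},
    R={("s0", "stay"): 1.0, ("s1", "stay"): 0.0},
    O={("s0", "stay", "o0"): 0.7, ("s0", "stay", "o1"): 0.3,
       ("s1", "stay", "o0"): 0.4, ("s1", "stay", "o1"): 0.6},
)
model.check()
```

Note that O is indexed by the *landed-in* state s', matching the definition above.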
Policy Tree
A policy tree can be represented as follows:
• Each node in the tree specifies an action, and each outgoing edge is labelled with an observation leading to a subtree.
• The root node determines the first action to be taken.
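A policy tree of this shape can be sketched as a recursive structure: each node fixes an action, and the observation received selects which subtree to follow next. The tiger-style action and observation names below are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class PolicyTree:
    action: str                                   # the action this node prescribes
    subtrees: Dict[str, "PolicyTree"] = field(default_factory=dict)

def execute(tree: PolicyTree, observations: List[str]) -> List[str]:
    """Follow an observation sequence down the tree, collecting actions."""
    actions = [tree.action]        # the root node determines the first action
    for o in observations:
        tree = tree.subtrees[o]    # each edge is labelled with an observation
        actions.append(tree.action)
    return actions

# A 2-step tree: listen first, then act depending on what was heard.
tree = PolicyTree("listen", {
    "hear-left": PolicyTree("open-right"),
    "hear-right": PolicyTree("open-left"),
})
print(execute(tree, ["hear-left"]))  # -> ['listen', 'open-right']
```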
Witness Algorithm
• The value function V_t is computed by taking the union of the Q-functions Q_t^a over all actions a and then pruning the dominated vectors.
• At iteration t, the algorithm has a representation of the t-step value function.
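The union-then-prune step can be sketched with value functions represented as sets of alpha-vectors (one per policy tree). This sketch only removes vectors that are pointwise dominated by a single other vector; full pruning also removes vectors dominated over the whole belief simplex, which requires a linear program. The example vectors are hypothetical.

```python
from typing import List, Tuple

Alpha = Tuple[float, ...]  # one entry per state

def prune_pointwise(vectors: List[Alpha]) -> List[Alpha]:
    """Keep only vectors not pointwise dominated by some other vector."""
    kept = []
    for v in vectors:
        dominated = any(
            w != v and all(w[i] >= v[i] for i in range(len(v)))
            for w in vectors
        )
        if not dominated:
            kept.append(v)
    return kept

# Hypothetical per-action Q-function vector sets over two states.
Q_a1 = [(1.0, 0.0)]
Q_a2 = [(0.0, 1.0), (0.0, 0.5)]

# V_t: union over actions, then prune.
V = prune_pointwise(Q_a1 + Q_a2)
print(V)  # -> [(1.0, 0.0), (0.0, 1.0)]
```

Here (0.0, 0.5) is dropped because (0.0, 1.0) is at least as good in every state.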
Witness Inner Loop:
• The set U is initialized with a single policy tree that is best for some arbitrary belief state.
• At each iteration we ask whether there is some belief state b (a witness point) for which the true value Q_t^a(b) differs from the estimated value Q̂_t^a(b) computed using the set U.
• The process continues until we can prove that no more witness points exist, and therefore that the current Q-function representation is exact.
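The loop above can be sketched with alpha-vectors standing in for policy trees. One simplification to flag: the real witness algorithm finds witness points with a linear program over the belief simplex, whereas this sketch just samples belief points for a two-state problem; the vectors themselves are hypothetical.

```python
from typing import List, Optional, Tuple

Alpha = Tuple[float, float]  # two-state problem: Q(b) = max_alpha b . alpha

def value(b: Tuple[float, float], vectors: List[Alpha]) -> float:
    return max(sum(bs * av for bs, av in zip(b, alpha)) for alpha in vectors)

def find_witness(U: List[Alpha], all_vectors: List[Alpha],
                 samples: int = 101) -> Optional[Tuple[float, float]]:
    """Search sampled beliefs for a point where the true value beats U's estimate."""
    for i in range(samples):
        b = (i / (samples - 1), 1 - i / (samples - 1))
        if value(b, all_vectors) > value(b, U) + 1e-9:
            return b  # witness point: the estimate from U is wrong here
    return None       # no witness found, so U represents the Q-function

# Hypothetical alpha-vectors standing in for the full set of policy trees.
all_vectors = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6)]
U = [all_vectors[0]]  # single tree that is best for the belief (1, 0)

while (b := find_witness(U, all_vectors)) is not None:
    # Add the vector that is best at the witness point b.
    best = max(all_vectors, key=lambda a: sum(x * y for x, y in zip(b, a)))
    U.append(best)

print(len(U))  # -> 3: every useful vector has been recovered
```

The loop terminates because each witness point adds a vector that strictly improves the estimate there, and the vector set is finite.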
References
• L. P. Kaelbling, M. L. Littman and A. R. Cassandra, "Planning and Acting in Partially Observable Stochastic Domains", Artificial Intelligence, 1998.
• Tony Cassandra's POMDP page - POMDPs for Dummies (http://www.cs.brown.edu/research/ai/pomdp/tutorial/index.html)