Policy Improvement for POMDPs using gradient ascent Gaurav Marwah
Introduction • POMDP stands for partially observable Markov decision process. • A framework for planning under uncertainty in both actions and observations. • Optimal planning requires conditioning on the entire history of actions and observations (equivalently, on a belief state over the hidden states). • Existing approaches: policy iteration, value iteration, gradient ascent, sampling, etc.
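The belief state mentioned above is a sufficient statistic for the full history: after each action and observation it can be updated by Bayes' rule. A minimal sketch of that update, using a hypothetical 2-state, 2-action, 2-observation toy POMDP (the transition and observation tables below are illustrative, not from the talk):

```python
import numpy as np

# Hypothetical toy POMDP (illustrative numbers only).
# T[a, s, s'] : P(s' | s, a)    O[a, s', o] : P(o | s', a)
T = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.5, 0.5], [0.5, 0.5]]])
O = np.array([[[0.85, 0.15], [0.15, 0.85]],
              [[0.5, 0.5], [0.5, 0.5]]])

def belief_update(b, a, o):
    """Bayes-filter update: b'(s') ∝ O(o | s', a) * sum_s T(s' | s, a) b(s)."""
    b_pred = b @ T[a]             # predict the next-state distribution
    b_new = O[a][:, o] * b_pred   # weight by the observation likelihood
    return b_new / b_new.sum()    # renormalise to a probability vector

b = np.array([0.5, 0.5])          # start maximally uncertain
b = belief_update(b, a=0, o=0)    # observation 0 shifts mass toward state 0
```

Tracking this single probability vector replaces storing the whole event history, which is what makes belief-state planning (and the controller-based approach below) tractable.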
Approach • Represent the policy as a probabilistic finite-state controller. • Two sets of probabilities: • Node-transition probabilities • Action-selection probabilities • Adjust these probabilities by gradient ascent to maximize the value function. • Analogous to the back-propagation method for training neural networks.
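The approach above can be sketched end to end: parameterize the controller's action probabilities ψ(a | n) and node-transition probabilities η(n' | n, o) with softmax logits, evaluate the controller exactly by solving the linear value equations on the (node, state) cross-product chain, and climb the value by gradient ascent. The toy POMDP, the uniform start distribution, and the finite-difference gradient are all illustrative assumptions (the talk does not specify how the gradient is computed):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative toy POMDP: T[a,s,s'], O[a,s',o], R[s,a], discount gamma.
T = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.2, 0.8], [0.8, 0.2]]])
O = np.array([[[0.8, 0.2], [0.2, 0.8]],
              [[0.5, 0.5], [0.5, 0.5]]])
R = np.array([[1.0, 0.0], [0.0, 1.0]])
gamma, nS, nA, nO, nN = 0.9, 2, 2, 2, 2   # nN = controller nodes

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def controller_value(theta_act, theta_tr):
    """Exact value of the FSC: solve V = r + gamma * M V over (node, state) pairs."""
    psi = softmax(theta_act)              # psi[n, a]     action probabilities
    eta = softmax(theta_tr)               # eta[n, o, n'] node-transition probabilities
    idx = lambda n, s: n * nS + s
    M = np.zeros((nN * nS, nN * nS))
    r = np.zeros(nN * nS)
    for n in range(nN):
        for s in range(nS):
            for a in range(nA):
                r[idx(n, s)] += psi[n, a] * R[s, a]
                for s2 in range(nS):
                    for o in range(nO):
                        for n2 in range(nN):
                            M[idx(n, s), idx(n2, s2)] += (psi[n, a] * T[a, s, s2]
                                                          * O[a, s2, o] * eta[n, o, n2])
    V = np.linalg.solve(np.eye(nN * nS) - gamma * M, r)
    return V.mean()                       # uniform start over nodes and states (assumed)

# Gradient ascent on the softmax logits; a central finite-difference gradient
# stands in for whatever analytic gradient the actual method uses.
theta_act = rng.normal(size=(nN, nA))     # logits for psi
theta_tr = rng.normal(size=(nN, nO, nN))  # logits for eta
lr, eps = 0.5, 1e-5
v0 = controller_value(theta_act, theta_tr)
for _ in range(50):
    for theta in (theta_act, theta_tr):
        g = np.zeros_like(theta)
        for i in np.ndindex(theta.shape):
            old = theta[i]
            theta[i] = old + eps
            vp = controller_value(theta_act, theta_tr)
            theta[i] = old - eps
            vm = controller_value(theta_act, theta_tr)
            theta[i] = old
            g[i] = (vp - vm) / (2 * eps)
        theta += lr * g                   # ascend the value function
v1 = controller_value(theta_act, theta_tr)
```

Working in logit space keeps every ψ and η a valid probability distribution throughout the ascent, which is one common way to honor the "probabilistic controller" constraint without projecting back onto the simplex.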