Presentation Transcript


  1. Basis Construction from Power Series Expansions of Value Functions
  Sridhar Mahadevan and Bo Liu, University of Massachusetts Amherst

  • Abstract:
  • Power series expansions of the value function provide a useful framework for analyzing the properties of existing bases, as well as insight into constructing more effective ones.
  • Value-function approximation based on the Laurent series expansion relates the discounted and average-reward formulations; it explains convergence behavior under different discount factors and suggests a way to construct more efficient basis representations.
  • An incremental variant of the Drazin bases, called Bellman average-reward bases (BARBs), is described; it provides the same benefits at lower computational cost.

  • Introduction:
  • Reward-sensitive bases are constructed by dilating the (sampled) reward function by geometric powers of the (sampled) transition matrix of a policy.
  • Reward-sensitive bases tend to have low reward error, whereas the feature prediction error of Proto-Value Functions (PVFs) can be quite low, since their eigenvectors capture invariant subspaces of the transition matrix.
  • The gain is used to capture the long-term behavior of the invariant subspace of the transition matrix, linking basis construction to power series expansions of the value function.

  • MDP and Power Series Expansions:
  • (1) Value function of a fixed policy: V = T(V) = R + γPV
  • (2) Value function and geometric dilating expansions (illustrative numerical sketches follow at the end of this transcript):
  • Neumann expansion: V = (I − γP)^{-1}R = R + γPR + γ²P²R + …, whose terms span the Krylov/BEBF bases {R, PR, P²R, …}.
  • Laurent expansion: V = Σ_{n≥−1} ((1−γ)/γ)^n y_n, where y_{−1} = P*R is the gain and the higher-order terms are built from powers of the Drazin inverse of (I − P); these terms span the Drazin bases.
  • BARB (approximate Drazin) bases: an incremental approximation to the Drazin bases.

  • MDP and Its Approximation Error:
  • The Bellman error can be decomposed into two terms: a reward error term and a weighted feature prediction error term (see the decomposition sketch at the end of this transcript).

  • Experimental Results (figures omitted from transcript): 50-state Chain; Gridworld; Performance Assessment.

  • Conclusions and Future Work:
  • The Krylov bases tend to converge slowly as γ → 1, whereas the Drazin bases derived from the Laurent series overcome this shortcoming.
  • BARB is investigated as an incremental variant with lower computational cost.
  • Other power series, such as the Schultz expansion, which provides the underpinnings of multiscale bases, merit further investigation.
  • A unified error-bound analysis under different power series expansions is an interesting topic for further study.

  Proceedings of the 24th International Conference on Neural Information Processing Systems (NIPS), December 6-8, 2010.
  For more information, please contact: Prof. Sridhar Mahadevan, Dept. of Computer Science, University of Massachusetts Amherst. Email: Mahadeva@cs.umass.edu
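Neumann/Krylov sketch: a minimal NumPy illustration of the reward-dilation idea above, assuming the policy's transition matrix P and reward R are available explicitly. The toy 50-state chain, its reward placement, and the helper names krylov_basis / exact_value are illustrative assumptions, not the poster's actual benchmark. It compares the exact discounted value V = (I − γP)^{-1}R with its best least-squares fit in a small Krylov basis.

```python
import numpy as np

def krylov_basis(P, R, k):
    """Reward-sensitive (Krylov/BEBF-style) basis: dilate R by powers of P."""
    vecs = [R]
    for _ in range(k - 1):
        vecs.append(P @ vecs[-1])
    Phi, _ = np.linalg.qr(np.column_stack(vecs))  # orthonormalize for numerical stability
    return Phi

def exact_value(P, R, gamma):
    """Discounted value V = (I - gamma P)^{-1} R, the limit of the Neumann series."""
    n = len(R)
    return np.linalg.solve(np.eye(n) - gamma * P, R)

# Toy 50-state chain: move right with prob. 0.9, stay with prob. 0.1; reward at the last state.
n, gamma = 50, 0.99
P = np.zeros((n, n))
for s in range(n):
    P[s, min(s + 1, n - 1)] += 0.9
    P[s, s] += 0.1
R = np.zeros(n); R[-1] = 1.0

V = exact_value(P, R, gamma)
Phi = krylov_basis(P, R, k=10)
w = np.linalg.lstsq(Phi, V, rcond=None)[0]        # best fit of V within the basis
print("Krylov-basis approximation error:", np.linalg.norm(V - Phi @ w))
```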
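Drazin-basis sketch: the Laurent-series terms behind the Drazin bases can be sketched similarly. This snippet assumes the chain has a unique stationary distribution (unichain) and a known transition matrix; stationary_limit and drazin_basis are hypothetical helper names, and the code illustrates the Laurent terms rather than the authors' exact Drazin/BARB algorithm. It stacks the gain term P*R with repeated dilations by the deviation matrix, i.e., the Drazin inverse of (I − P).

```python
import numpy as np

def stationary_limit(P):
    """Limiting matrix P*: every row is the stationary distribution (unichain assumed)."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])   # pi P = pi  and  sum(pi) = 1
    b = np.zeros(n + 1); b[-1] = 1.0
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    return np.tile(pi, (n, 1))

def drazin_basis(P, R, k):
    """Laurent-series-style basis {P*R, DR, D^2 R, ...}, D = Drazin inverse of (I - P)."""
    n = P.shape[0]
    Pstar = stationary_limit(P)
    D = np.linalg.inv(np.eye(n) - P + Pstar) @ (np.eye(n) - Pstar)  # deviation matrix
    vecs, v = [Pstar @ R], R.copy()
    for _ in range(k - 1):
        v = D @ v
        vecs.append(v)
    Phi, _ = np.linalg.qr(np.column_stack(vecs))
    return Phi

# Tiny ergodic example: an irreducible, aperiodic 3-state chain.
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.3, 0.5]])
R = np.array([1.0, 0.0, 0.0])
print(np.round(drazin_basis(P, R, k=3), 3))
```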
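Bellman-error decomposition sketch: a sketch of the reward-error / feature-error split described in the approximation-error section, assuming a known P and R, an unweighted least-squares projection onto the basis, and linear fixed-point (LSTD-style) weights; the helper names are illustrative. The assert checks that the two pieces sum back to the Bellman error.

```python
import numpy as np

def lstd_weights(Phi, P, R, gamma):
    """Linear fixed-point (LSTD-style) weights for the basis Phi."""
    return np.linalg.solve(Phi.T @ (Phi - gamma * P @ Phi), Phi.T @ R)

def bellman_error_split(Phi, P, R, gamma):
    """Decompose the Bellman error into reward error + gamma * (feature error @ w)."""
    proj = lambda X: Phi @ np.linalg.lstsq(Phi, X, rcond=None)[0]   # projection onto span(Phi)
    w = lstd_weights(Phi, P, R, gamma)
    reward_error = R - proj(R)                 # component of R outside the basis span
    feature_error = P @ Phi - proj(P @ Phi)    # component of P @ Phi outside the basis span
    bellman_error = R + gamma * (P @ (Phi @ w)) - Phi @ w
    assert np.allclose(bellman_error, reward_error + gamma * (feature_error @ w))
    return reward_error, gamma * (feature_error @ w)

# Reward-sensitive bases contain R itself, so the reward-error term vanishes and the
# Bellman error is carried entirely by the weighted feature-error term.
n, gamma = 5, 0.9
rng = np.random.default_rng(0)
P = rng.random((n, n)); P /= P.sum(axis=1, keepdims=True)   # random stochastic matrix
R = rng.random(n)
Phi = np.column_stack([R, P @ R, P @ P @ R])                # small Krylov basis
r_err, f_err = bellman_error_split(Phi, P, R, gamma)
print(np.linalg.norm(r_err), np.linalg.norm(f_err))
```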
