Basis Construction from Power Series Expansions of Value Functions
Sridhar Mahadevan and Bo Liu, University of Massachusetts Amherst

MDP and Power Series Expansions:
(1) Value function of a fixed policy: V = T(V) = R + γPV
(2) Value function and geometric dilating expansions:
• Neumann expansion: V = (I − γP)⁻¹R = Σ_{t=0}^∞ γ^t P^t R
• Laurent expansion (around γ = 1, relating discounted and average-reward formulations)
• Krylov/BEBF bases: span{R, PR, P²R, …}
• Drazin bases (derived from the Laurent expansion)
• BARB (approximate Drazin) bases

Abstract:
• Power series expansions provide a useful framework to analyze properties of existing bases, as well as insight into constructing more effective ones.
• Value function approximation based on the Laurent series expansion relates the discounted and average-reward formulations; it both explains convergence behavior under different discount factors and suggests a way to construct more efficient basis representations.
• An incremental variant of Drazin bases, called Bellman average-reward bases (BARBs), is described, which provides the same benefits at lower computational cost.

Experimental Result (50-state Chain): [figure]
Performance Assessment: [figure]

Introduction:
• Reward-sensitive bases are constructed by dilating the (sampled) reward function by geometric powers of the (sampled) transition matrix of a policy.
• Reward-sensitive bases tend to have low reward error, whereas the feature prediction error of proto-value functions (PVFs) can be quite low, since the eigenvectors capture invariant subspaces of the transition matrix.
• The gain captures the long-term nature of the invariant subspaces of the transition matrix, linking basis construction and power series expansions of the value function.
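The Neumann expansion and the Krylov/BEBF dilation above can be sketched numerically. The following is a minimal illustration, not the authors' code: the 5-state chain MDP, its transition probabilities, and the basis size are all hypothetical choices made for the example.

```python
import numpy as np

# Hypothetical 5-state chain MDP under a fixed policy (illustrative only)
n, gamma = 5, 0.9
P = np.zeros((n, n))
for s in range(n):
    P[s, min(s + 1, n - 1)] += 0.9   # mostly move right
    P[s, max(s - 1, 0)] += 0.1       # sometimes slip left
R = np.zeros(n)
R[-1] = 1.0                          # reward only in the last state

# Exact value function of the fixed policy: V = (I - gamma * P)^{-1} R
V = np.linalg.solve(np.eye(n) - gamma * P, R)

def neumann_value(P, R, gamma, T):
    """Truncated Neumann expansion: sum_{t<T} (gamma * P)^t R."""
    V_hat, term = np.zeros_like(R), R.copy()
    for _ in range(T):
        V_hat += term
        term = gamma * (P @ term)
    return V_hat

def krylov_basis(P, R, k):
    """Krylov/BEBF bases: dilate R by powers of P, i.e. {R, PR, P^2 R, ...}."""
    cols = [R]
    for _ in range(k - 1):
        cols.append(P @ cols[-1])
    return np.column_stack(cols)

Phi = krylov_basis(P, R, 3)
# Least-squares fit of V in the Krylov basis
w, *_ = np.linalg.lstsq(Phi, V, rcond=None)
print(np.max(np.abs(neumann_value(P, R, gamma, 200) - V)))  # tiny: series converged
```

Since each new Krylov direction is just one matrix-vector product with the (sampled) transition matrix, the dilation is cheap to grow incrementally, which is the property the reward-sensitive bases above exploit.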
Experimental Result (Gridworld): [figure]

Conclusions and Future Work:
• The Krylov bases tend to converge slowly as the discount factor γ → 1, whereas the Drazin bases derived from the Laurent series overcome this shortcoming.
• BARB is investigated as an incremental variant with lower computational cost.
• Other power series, such as the Schultz expansion, which provides the underpinnings of multiscale bases, merit further investigation.
• A unified error-bound analysis under different power series expansions is an interesting topic worth further study.

MDP and Its Approximation Error:
• The Bellman error can be decomposed into two terms, a reward error term Δ_R and a weighted feature error term γΔ_Φ w, so that BE = Δ_R + γΔ_Φ w.

Proceedings of the 24th International Conference on Neural Information Processing Systems (NIPS), December 6-8, 2010.
For more information, please contact: Prof. Sridhar Mahadevan, Dept. of Computer Science, University of Massachusetts Amherst, Email: Mahadeva@cs.umass.edu
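The reward-error/feature-error decomposition of the Bellman error can be checked numerically. The sketch below assumes the standard linear fixed-point definitions (Δ_R = R − ΠR, Δ_Φ = PΦ − ΠPΦ, with Π the orthogonal projection onto the basis span); the chain MDP and the two-column Krylov basis are hypothetical choices for illustration, not taken from the poster.

```python
import numpy as np

# Hypothetical 5-state chain MDP (illustrative only)
n, gamma = 5, 0.9
P = np.zeros((n, n))
for s in range(n):
    P[s, min(s + 1, n - 1)] += 0.9
    P[s, max(s - 1, 0)] += 0.1
R = np.zeros(n)
R[-1] = 1.0

# Small Krylov basis {R, PR}, deliberately too small to represent V exactly
Phi = np.column_stack([R, P @ R])
Pi = Phi @ np.linalg.pinv(Phi)           # orthogonal projection onto span(Phi)

# Fixed point of the projected Bellman equation: Phi w = Pi (R + gamma P Phi w)
A = Phi - gamma * Pi @ P @ Phi
w = np.linalg.lstsq(A, Pi @ R, rcond=None)[0]

# Decomposition at the fixed point: BE = Delta_R + gamma * Delta_Phi @ w
BE = R + gamma * P @ (Phi @ w) - Phi @ w  # Bellman error of the fitted values
Delta_R = R - Pi @ R                      # reward error (part of R outside the span)
Delta_Phi = P @ Phi - Pi @ P @ Phi        # feature prediction error
print(np.max(np.abs(BE - (Delta_R + gamma * Delta_Phi @ w))))  # ~0
```

Under these definitions the identity is exact: substituting the fixed-point condition Φw = Π(R + γPΦw) into BE = R + γPΦw − Φw leaves exactly Δ_R + γΔ_Φ w, which is why reward-sensitive bases (low Δ_R) and PVF-style spectral bases (low Δ_Φ) attack different terms of the same error.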