Sparse Q-learning with Mirror Descent
Sridhar Mahadevan and Bo Liu, University of Massachusetts Amherst, Autonomous Learning Laboratory, {mahadeva, boliu}@cs.umass.edu

ABSTRACT:
• This paper explores a new framework for reinforcement learning (RL) based on online convex optimization, in particular mirror descent and related algorithms.
• A new class of proximal-gradient-based temporal difference (TD) methods is presented, built on different Bregman divergences; these are more powerful than regular TD learning.
• A new family of first-order sparse RL methods is proposed, able to find the sparse fixed point of an L1-regularized Bellman equation at significantly lower computational cost than previous second-order methods.

BACKGROUND:
• Mirror descent is an enhanced gradient method that can be viewed as a proximal algorithm in which the distance-generating function is a Bregman divergence (an illustrative update sketch appears at the end of this page).

MOTIVATION:
• Finding this sparse fixed point is a two-step nested optimization problem:
• Projection step: project the Bellman backup onto the feature subspace spanned by the basis Φ, with L1 regularization to induce sparsity.
• Fixed-point step: find weights whose value estimate coincides with its own projected Bellman backup, i.e., the sparse fixed point of the L1-regularized Bellman equation (a reconstruction of the two equations appears at the end of this page).

ALGORITHMS:
• Mirror-descent Q-learning with a decaying p-norm link function.
• Iterative soft-thresholding for sparsity (sketched at the end of this page).

ERROR BOUND ANALYSIS:
• The error bound is controlled by: (1) the expressiveness of the Φ-subspace, (2) the sparsity parameter, and (3) the quality of the empirical ℓ1 solver.

EXPERIMENTAL RESULTS:
• Convergence comparison with LARS-TD (figure): less difference between successive weights and less running time at each iteration.
• Variance comparison with Q-learning (figure): less variance compared with Q-learning.
• Control learning (figure).

DISCUSSIONS AND FUTURE WORK:
• Comparison of the p-norm link function with Exponentiated Gradient (EG): EG is unable to generate sparse solutions, and EG-based methods are prone to overflow in the coefficients.
• The p-norm link function interpolates between additive and multiplicative gradient updates and is thus more flexible and robust across basis functions.
• The regret bound with respect to different link functions in the RL setting remains to be established.
• Introducing mirror descent into off-policy TD learning and policy gradient algorithms.
• Scaling to large MDPs, including hierarchical mirror-descent RL, in particular extending to semi-MDP Q-learning.

Proceedings of the Conference on Uncertainty in AI (UAI), August 15-17, 2012, Catalina Island, CA
For more information, please contact: Prof. Sridhar Mahadevan, Dept. of Computer Science, University of Massachusetts Amherst, Email: mahadeva@cs.umass.edu
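The projection-step and fixed-point-step equations in the MOTIVATION box were images in the original poster. A plausible reconstruction, following the standard L1-regularized Bellman fixed-point formulation that the abstract describes (the symbols Φ, Φ', R, γ, λ are assumed here, not copied from the poster):

```latex
% Projection step: L1-regularized least-squares projection of the
% Bellman backup onto the subspace spanned by the basis \Phi
w(\theta) = \arg\min_{w}\,
  \bigl\| \Phi w - (R + \gamma \Phi' \theta) \bigr\|_2^2
  + \lambda \, \| w \|_1

% Fixed-point step: seek weights reproduced by their own projection,
% i.e., the sparse fixed point of the L1-regularized Bellman equation
\theta^{*} = w(\theta^{*})
```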
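To make the mirror-descent bullet under BACKGROUND concrete, here is a minimal NumPy sketch of one step with the p-norm distance-generating function ψ(θ) = ½‖θ‖_q², 1/p + 1/q = 1; the function names and this particular choice of ψ are assumptions consistent with standard p-norm mirror descent, not the paper's verbatim pseudocode.

```python
import numpy as np

def pnorm_link(v, r):
    """Gradient of (1/2)||v||_r^2, the r-norm link function.

    Maps between the primal and dual spaces of mirror descent; its
    inverse is the same map with the conjugate exponent r/(r-1).
    """
    norm = np.linalg.norm(v, ord=r)
    if norm == 0.0:
        return np.zeros_like(v)
    return np.sign(v) * np.abs(v) ** (r - 1) / norm ** (r - 2)

def mirror_descent_step(theta, grad, alpha, p):
    """One mirror-descent step with distance-generating function
    (1/2)||.||_q^2, where 1/p + 1/q = 1."""
    q = p / (p - 1.0)
    dual = pnorm_link(theta, q)   # map primal weights into the dual space
    dual -= alpha * grad          # ordinary gradient step, taken in the dual
    return pnorm_link(dual, p)    # map back with the conjugate link function
```

With p = 2 both links are the identity and the step reduces to plain gradient descent; larger p makes the update increasingly multiplicative, which is the interpolation between additive and multiplicative updates that the DISCUSSIONS box refers to.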
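The iterative soft-thresholding named under ALGORITHMS is, in standard proximal-gradient treatments, the proximal operator of the L1 penalty; a common implementation is below (the name soft_threshold and the threshold value are ours, for illustration).

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1: shrink every coefficient
    toward zero by tau and clip those that cross zero. The exact zeros
    this produces are what make the learned weights sparse."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)
```

Applied after each gradient step (typically with tau = alpha * lambda), it drives small coefficients exactly to zero, which is how a first-order method can reach sparse solutions without the second-order machinery of LARS-TD.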
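Combining the two helpers above, one plausible shape of the full sparse mirror-descent Q-learning update is sketched below; the semi-gradient TD-error form, the placement of the soft-threshold in the dual space, and all names are illustrative assumptions rather than the paper's verbatim algorithm.

```python
def sparse_mirror_q_step(theta, phi, phi_next, reward, gamma, alpha, lam, p):
    """One illustrative sparse mirror-descent Q-learning step.

    phi: features of the current state-action pair; phi_next: features of
    the greedy next state-action pair; lam: L1 regularization strength.
    Reuses pnorm_link and soft_threshold from the sketches above.
    """
    q = p / (p - 1.0)
    td_error = reward + gamma * phi_next.dot(theta) - phi.dot(theta)
    dual = pnorm_link(theta, q)                # primal -> dual space
    dual += alpha * td_error * phi             # semi-gradient TD step
    dual = soft_threshold(dual, alpha * lam)   # L1 proximal step (sparsity)
    return pnorm_link(dual, p)                 # dual -> primal space
```

A decaying p-norm, per the ALGORITHMS annotation, would shrink p across iterations so the update gradually approaches an additive gradient step; the exact schedule is not recoverable from the poster.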