
Relating Reinforcement Learning Performance to Classification Performance

This presentation by Hui Li on September 11, 2006, explores the connection between reinforcement learning and classifier learning. It discusses the motivation behind reducing reinforcement learning to classification learning, presents results, and draws conclusions. The talk focuses on the definition and goal of reinforcement learning, highlighting how to reduce a reinforcement learning problem to a cost-sensitive classification problem. Through illustrations and examples, the presentation demonstrates the process of finding good policies for a T-step MDP through weighted classification problems. It showcases the value functions and paths taken by the algorithm in solving a two-step MDP problem in a continuous state space. The overall goal is to understand the implications of linking reinforcement learning and classification performance.


Presentation Transcript


  1. Relating Reinforcement Learning Performance to Classification Performance Presenter: Hui Li Sept. 11, 2006

  2. Outline • Motivation • Reduction from reinforcement learning to classifier learning • Results • Conclusion

  3. Motivation A Simple Relationship: The goal of reinforcement learning: maximize the expected sum of rewards over a horizon T, max_π E[Σ_{t=1}^{T} r_t]. The goal of (binary) classifier learning: minimize the expected misclassification error, min_h Pr_{(x,y)~D}[h(x) ≠ y].

  4. Motivation Question: • The problem of classification has been intensively investigated • The problem of reinforcement learning is still under active investigation • Is it possible to reduce reinforcement learning to classifier learning?

  5. Reduction Definition: 1. What is a reinforcement learning problem? A reinforcement learning problem D is defined as a conditional probability table D(o', r | (o, a, r)*, o, a) on a set of observations O and rewards r ∈ [0, ∞), given any history (o, a, r)* of past observations, actions (from an action set A), and rewards.

  6. Reduction 2. What is the reinforcement learning goal? Given some horizon T, find a policy π maximizing the expected sum of rewards: max_π E[Σ_{t=1}^{T} r_t].

  7. Reduction How to reduce a reinforcement learning problem to a cost-sensitive classification problem: • How to obtain training examples • How to obtain training labels • How to define the cost of misclassification

  8. Reduction An illustration of a trajectory tree • MDP M = {S, A, D, P_{s,a}} • Two actions {0, 1} • Non-stationary policy π = (π_0, π_1, …, π_{T-1})

  9. Reduction The value of the policy for a single step is estimated by the sample average V̂(π) = (1/n) Σ_{i=1}^{n} r^i(s_0^i, π(s_0^i)), i.e., the average, over the n sampled initial states, of the realized reward of the action the policy selects; r^i denotes the i-th realization of the reward. The goal is to find the policy maximizing this estimate.
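
In code, this single-step estimate is just a sample average over the trajectory-tree data. A minimal sketch (hypothetical names; it assumes the realized reward of every action was recorded for each sampled initial state, as the trajectory tree provides):

    import numpy as np

    def estimate_policy_value(states, rewards, policy):
        """Monte Carlo estimate of a one-step policy's value.

        states  : length-n array of sampled initial states s_0^i
        rewards : (n, L) array; rewards[i, a] is the i-th realized
                  reward of taking action a from states[i]
        policy  : function mapping a state to an action index
        """
        n = len(states)
        chosen = np.array([policy(s) for s in states])
        return rewards[np.arange(n), chosen].mean()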

  10. Reduction Value of the policy for a single step. [Figure: trajectory trees — sampled initial states S_0 = s_0^1, s_0^2, …, s_0^n, each branching over actions a = 0, 1, …, L−1 into successor states S_1|0, S_1|1, …, S_1|L−1; the i-th realization is one such tree.]

  11. Reduction One-step reduction: the one-step reinforcement learning problem becomes a cost-sensitive classifier learning problem with training set {(s_0^i, y_i, w_i)}, i = 1, …, n,

  12. Reduction where • s_0^i: the i-th sample (the observed initial state) • y_i: the label (the action with the largest realized reward) • w_i: the costs of classifying example i to each of the possible labels
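
A sketch of how such a training set could be assembled from the trajectory-tree samples (hypothetical helper; the regret-style cost used here is an assumption consistent with the properties listed on the next slide):

    import numpy as np

    def build_cost_sensitive_examples(states, rewards):
        """Turn one-step samples into a cost-sensitive training set.

        states  : length-n array of sampled initial states s_0^i
        rewards : (n, L) array of realized per-action rewards
        Returns (states, labels, costs): labels[i] is the best
        observed action y_i, and costs[i, a] is the regret of
        predicting action a on example i (the weight w_i).
        """
        labels = rewards.argmax(axis=1)
        costs = rewards.max(axis=1, keepdims=True) - rewards
        return states, labels, costs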

  13. Reduction Properties of cost • The cost for misclassification is always positive • The cost for correct classification is zero • The larger the difference between the possible actions in terms of future reward, the larger the cost (or weight)
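
One concrete cost consistent with all three properties (an assumption here, since the slide's formula did not survive the transcript) is the per-action regret

    w_i(a) = max_{a'} r^i_{a'} − r^i_a,

which is zero for the best action, positive for any other action, and grows with the reward gap between actions.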

  14. Reduction T-step MDP reduction How to find good policies for a T-step MDP by solving a sequence of weighted classification problems T-step policy π = (π_0, π_1, …, π_{T-1}) • When updating π_t, hold the rest of the policy constant • When updating π_t, the trees are pruned from the root to stage t by keeping only the branch that agrees with the controls π_0, π_1, …, π_{t-1} (a code sketch follows slide 15)

  15. Reduction The stage-t weight combines the immediate reward (the realization of the reward following the action taken at stage t) with the rewards accumulated along the branch that agrees with the controls π_{t+1}, π_{t+2}, …, π_{T-1}.
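
Putting slides 14 and 15 together, a high-level sketch of the stage-wise update (hypothetical interface throughout: train_cost_sensitive stands in for any cost-sensitive classifier learner, and the trajectory trees are assumed to expose the realized reward along each branch):

    def update_stage_policies(trees, policies, train_cost_sensitive, n_sweeps=5):
        """Coordinate descent over the T stage policies pi_0, ..., pi_{T-1}."""
        T = len(policies)
        for _ in range(n_sweeps):
            for t in range(T):
                states, labels, costs = [], [], []
                for tree in trees:
                    # Prune from the root to stage t: keep only the
                    # branch agreeing with pi_0, ..., pi_{t-1}.
                    node = tree.root
                    for u in range(t):
                        node = node.child(policies[u](node.state))
                    # Weight each stage-t action by its immediate reward
                    # plus the rewards accumulated along the branch
                    # agreeing with pi_{t+1}, ..., pi_{T-1}.
                    returns = [node.reward(a)
                               + node.child(a).future_return(policies[t + 1:])
                               for a in node.actions]
                    best = max(returns)
                    states.append(node.state)
                    labels.append(returns.index(best))
                    costs.append([best - r for r in returns])
                # Re-fit pi_t as a cost-sensitive classifier while
                # holding the other stage policies constant.
                policies[t] = train_cost_sensitive(states, labels, costs)
        return policies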

  16. Illustrative Example Two-step MDP problem: • Continuous state space S = [0, 1] • Binary action space A = {0, 1} • Uniform distribution over the initial state
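
The transcript does not preserve this example's dynamics or rewards, so the following only sets up the sampling scaffold; the reward table is a placeholder assumption, not the one from the talk:

    import numpy as np

    rng = np.random.default_rng(0)
    n, L = 1000, 2                       # samples; binary action space
    s0 = rng.uniform(0.0, 1.0, size=n)   # uniform initial states on [0, 1]

    # Placeholder rewards (NOT the talk's): action 0 pays s, action 1 pays 1 - s.
    rewards = np.stack([s0, 1.0 - s0], axis=1)             # shape (n, L)

    labels = rewards.argmax(axis=1)                        # y_i
    costs = rewards.max(axis=1, keepdims=True) - rewards   # w_i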

  17. Illustrative Example Value function

  18. Illustrative Example Path taken by the algorithm
