Active & Reinforcement Learning

Explore active learning techniques that empower classifiers to learn efficiently with fewer training samples. Understand how data labeling can be expensive and how active learning can enhance classifier training. Discover key ideas and examples in pedestrian detection and hierarchical sampling. Delve into the realm of reinforcement learning, where agents learn to maximize long-term rewards through trial and error. Grasp the fundamentals of Q-learning and its application in decision-making scenarios. Enhance your understanding of training agents through rewarding experiences and exploring environments.

hazelj

Presentation Transcript


  1. Active & Reinforcement Learning • Lecturer: Dr. Bo Yuan • E-mail: yuanb@sz.tsinghua.edu.cn

  2. Students • Active • Passive • Lazy

  3. Active Learning • What is AL? • Techniques that help classifiers learn better with fewer training samples. • Why: • Data are cheap but labeling can be expensive. • For example: Speech Recognition, Information Extraction … • Key idea: • Let the classifier choose the samples from which it learns. • [Figure panels: Original Data, Supervised Learning, Active Learning]

  4. Pedestrian Detection

  5. Example 1 • Supervised Learning: N labels for ε ≤ 1/N • Active Learning: log(N) labels for ε ≤ 1/N • [Figure labels: w, a, b]
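
The gap between N and log(N) labels can be illustrated with a binary search. The sketch below assumes, as the figure labels w, a and b suggest, that the data are one-dimensional and separated by a single threshold w; the names `learn_threshold` and `oracle` are hypothetical and not from the slides.

```python
def learn_threshold(points, oracle):
    """Locate the first positive point in a sorted 1-D pool by binary search.

    points: values sorted in ascending order.
    oracle: labeling function (the "teacher"); assumed to return 0 for points
            left of the unknown threshold and 1 for points at or right of it.
    Returns (index of the first positive point, number of labels requested).
    """
    lo, hi = 0, len(points)  # the first positive index lies in [lo, hi]
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1  # one label is requested per iteration
        if oracle(points[mid]) == 1:
            hi = mid  # boundary is at mid or to its left
        else:
            lo = mid + 1  # boundary is strictly to the right of mid
    return lo, queries


if __name__ == "__main__":
    pool = [i / 1000 for i in range(1000)]
    index, used = learn_threshold(pool, lambda x: int(x >= 0.3715))
    print(index, used)
```

A passive learner would need labels for roughly all N pool points to pin the boundary down to the same resolution, while this learner asks for about log2(N) of them.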

  6. Example 2

  7. Framework

  8. Query Points • Many samples are redundant or irrelevant to the decision boundary. • AL typically works by: • Randomly querying a few samples. • Heuristically querying additional samples. • Synthesized Query Points • The learner may request labels for any point in the space. • Just as students may pop up with all kinds of questions. • Could be awkward. • Pool-Based Sampling • Samples are cheap and are collected all at once. • An informativeness measure is used to select among them. • The key question: which samples should be labeled?

  9. Beauty and the Beast

  10. Uncertainty Sampling • Find out what we are not sure about. • Create an initial classifier. • While the teacher is willing to label samples: • (a) Apply the current classifier to each sample. • (b) Find the samples whose membership the classifier is least certain about. • (c) Have the teacher label the selected samples. • (d) Train a new classifier on all labeled samples. • The classifier needs to output both membership and certainty. • KNN, NB, NN … • Extensions for Multi-Class Problems • Margin • Entropy
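
Step (b) depends on how "least certain" is scored. The following is a minimal sketch of three common scores (least confident, margin, entropy), assuming the classifier outputs a list of class probabilities per sample. The function names are illustrative, and each score is oriented so that a higher value means more uncertainty.

```python
import math


def least_confident(probs):
    """1 minus the probability of the most likely class."""
    return 1.0 - max(probs)


def margin_uncertainty(probs):
    """1 minus the gap between the two most likely classes."""
    top, second = sorted(probs, reverse=True)[:2]
    return 1.0 - (top - second)


def entropy(probs):
    """Shannon entropy (natural log) of the predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_queries(pool_probs, k, score=entropy):
    """Return the indices of the k pool samples with the highest score."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: score(pool_probs[i]),
                    reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    pool_probs = [
        [0.90, 0.05, 0.05],
        [0.40, 0.35, 0.25],
        [0.50, 0.50, 0.00],
        [0.34, 0.33, 0.33],
    ]
    print(select_queries(pool_probs, 2, entropy))
    print(select_queries(pool_probs, 2, margin_uncertainty))
```

The scores can disagree: on this toy pool, margin ranks the exact two-way tie first, whereas entropy prefers the samples whose probability mass is spread over all three classes.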

  11. Least Confident

  12. Query-By-Committee

  13. Query-By-Bagging

  14. Co-Testing

  15. Co-Testing

  16. Sampling Bias • [Figure labels: w*, w] • S. Dasgupta and D. Hsu: Hierarchical Sampling for Active Learning. ICML 2008

  17. Exploiting Clustering Structure • Find a clustering of the data. • Sample a few points from each cluster randomly. • Assign each cluster its majority label. • Use this fully labeled data to build a classifier.
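
The four steps above can be sketched as follows, assuming the clustering (step 1) has already been computed by some other method and is passed in as a dictionary. `label_by_cluster`, `oracle` and `per_cluster` are hypothetical names introduced for illustration.

```python
import random
from collections import Counter


def label_by_cluster(clusters, oracle, per_cluster=3, seed=0):
    """Label a whole pool from a few queried points per cluster.

    clusters: dict mapping a cluster id to the list of samples in it.
    oracle:   labeling function (the "teacher"), called only on sampled points.
    Returns (list of (sample, assigned_label) pairs, number of labels requested).
    """
    rng = random.Random(seed)
    labeled = []
    queries = 0
    for members in clusters.values():
        # Step 2: sample a few points from the cluster at random.
        picked = rng.sample(members, min(per_cluster, len(members)))
        votes = Counter(oracle(x) for x in picked)
        queries += len(picked)
        # Step 3: assign the majority label to every member of the cluster.
        majority = votes.most_common(1)[0][0]
        labeled.extend((x, majority) for x in members)
    # Step 4 would train any supervised classifier on `labeled`.
    return labeled, queries


if __name__ == "__main__":
    clusters = {0: [0.10, 0.20, 0.25, 0.30], 1: [0.80, 0.85, 0.90, 0.95]}
    labeled, queries = label_by_cluster(clusters, lambda x: int(x > 0.5))
    print(queries, labeled)
```

The result is only as good as the clustering: if a cluster mixes classes, the majority label mislabels its minority members, which is why the granularity of the clustering matters.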

  18. Finding the Right Granularity

  19. Hierarchical Clustering

  20. Summary • Data Cheap, Labeling Expensive. • AL improves the efficiency of training by selectively querying the most informative samples. • Two Flavors: • Explore the hypothesis space efficiently. • Exploit the clustering structure. • Related Areas: • Semi-Supervised Learning • Design of Experiments • Optimization

  21. 10 Minutes …

  22. Journey of Life

  23. Reinforcement Learning • An agent takes a series of actions and receives rewards from the environment. • For example: a robot walking through a maze • Delayed Reward • Usually the reward is given only after a number of actions. • Many intermediate actions go unrewarded. • For example: winning or losing a game • Goal • To learn to choose actions that maximize long-term rewards. • Supervised? Unsupervised? • Learning Based on Experience • Credit Assignment • Apportion credit and blame to each action.

  24. Prison Break

  25. Graph Representation

  26. Rewards

  27. Example

  28. Terminology • State (S): Room • Action (A): Moving from one room to another • Reward Table (R):

  29. Matrix Q • The agent learns from experience, or training, by exploring the environment. • Q is the agent's memory of the environment. • Given a state diagram, find the shortest path from any initial state to the goal state.

  30. Q Learning • Set the learning rate and R. • Initialize Q as a zero matrix. • For each trial: • (a) Select a random initial state. • (b) While the goal state is not reached: • Select one of the possible actions A for the current state S. • Get the maximum Q value of the new state S′. • Update Q(S, A). • Set the new state as the current state. • (c) End While • End For • After learning, the agent moves by selecting the action with the maximum value in Q.
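
The procedure above can be sketched in a few lines of Python. The transcript does not preserve the reward table or the update formula, so the sketch assumes the common deterministic form Q(S, A) = R(S, A) + γ · max Q(S′, ·), with γ = 0.8 and a reward of 100 for stepping into goal room F. The moves for rooms B, D and F follow the example slides; the moves for A, C and E are assumptions made only to complete the graph.

```python
import random

STATES = "ABCDEF"
GOAL = "F"
GAMMA = 0.8  # assumed value; the slides do not state it

# Moving to a neighboring room is the action, so actions are named by target.
# B, D and F follow the example slides; A, C and E are assumed.
NEIGHBORS = {
    "A": ["E"],
    "B": ["D", "F"],
    "C": ["D"],
    "D": ["B", "C", "E"],
    "E": ["A", "D", "F"],
    "F": ["B", "E", "F"],
}


def reward(state, action):
    """Assumed reward table: 100 for entering the goal room, 0 otherwise."""
    return 100 if action == GOAL else 0


def q_learning(episodes=500, seed=0):
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in NEIGHBORS[s]} for s in STATES}  # zero matrix
    for _ in range(episodes):
        state = rng.choice(STATES)  # random initial state
        while state != GOAL:
            action = rng.choice(NEIGHBORS[state])  # one possible action
            new_state = action
            best_next = max(q[new_state].values())  # max Q of the new state
            q[state][action] = reward(state, action) + GAMMA * best_next
            state = new_state
    return q


def greedy_path(q, start, max_steps=10):
    """After learning, follow the action with the maximum Q value."""
    path = [start]
    while path[-1] != GOAL and len(path) <= max_steps:
        state = path[-1]
        path.append(max(q[state], key=q[state].get))
    return path


if __name__ == "__main__":
    q = q_learning()
    print(greedy_path(q, "C"))
    print(greedy_path(q, "A"))
```

Because each trial ends as soon as F is reached, the entries for actions taken from F are never updated in this sketch and stay at zero. Rewards therefore propagate backward from the goal with a decay of γ per step.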

  31. Examples • Initial State: B • Next Possible States: D & F • From F, the agent can go to B, E & F • Stop!

  32. Examples • Initial State: D • Next State: B, C & E • From B, the agent can go to D & F • Continue …
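
The two example episodes can be traced by hand. This is a sketch under assumptions that the transcript does not confirm: the update is Q(S, A) = R(S, A) + γ · max Q(S′, ·), γ = 0.8, and entering goal room F earns a reward of 100. Only the transitions listed on the two example slides are used.

```python
GAMMA = 0.8        # assumed value; not given on the slides
GOAL_REWARD = 100  # assumed reward for moving into goal room F

# Transitions listed on the example slides (actions are named by target room).
NEXT = {"B": ["D", "F"], "D": ["B", "C", "E"], "F": ["B", "E", "F"]}
R = {(s, a): (GOAL_REWARD if a == "F" else 0) for s in NEXT for a in NEXT[s]}
Q = {(s, a): 0.0 for s in NEXT for a in NEXT[s]}  # Q starts as all zeros


def update(state, action):
    """Apply the assumed update for moving from `state` into room `action`."""
    best_next = max(Q[(action, a2)] for a2 in NEXT[action])
    Q[(state, action)] = R[(state, action)] + GAMMA * best_next
    return Q[(state, action)]


first = update("B", "F")   # episode 1: B -> F, goal reached, stop
second = update("D", "B")  # episode 2: D -> B, then continue from B

if __name__ == "__main__":
    print(first, second)
```

Under these assumptions, Q(B, F) becomes 100 + 0.8 · 0 = 100 in the first episode, and Q(D, B) becomes 0 + 0.8 · 100 = 80 in the second. The reward earned at F is thus credited one step further back, with decay.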

  33. Convergence of Q

  34. Tower of Hanoi

  35. All States

  36. Graph Representation

  37. Solution

  38. Board Games

  39. Game Trees

  40. Temporal Difference Learning • [Figure labels: Agent’s Move, Opponent’s Move, Agent’s Move]

  41. Summary • RL trains an agent, through trial and error, to take a series of appropriate actions that achieve long-term goals. • Credit assignment is implemented largely by propagating rewards backward with decay. • Which choices matter most in leading you to success? • RL takes into account the details of the interaction. • By contrast: how would GAs + ANNs be used to learn chess? • There is a trade-off between exploration and exploitation. • Humans still perform better by truly understanding problems through analysis and reasoning.
