Explore active learning techniques that empower classifiers to learn efficiently with fewer training samples. Understand how data labeling can be expensive and how active learning can enhance classifier training. Discover key ideas and examples in pedestrian detection and hierarchical sampling. Delve into the realm of reinforcement learning, where agents learn to maximize long-term rewards through trial and error. Grasp the fundamentals of Q-learning and its application in decision-making scenarios. Enhance your understanding of training agents through rewarding experiences and exploring environments.
Active & Reinforcement Learning Lecturer: Dr. Bo Yuan E-mail: yuanb@sz.tsinghua.edu.cn
Students: Active, Passive, Lazy
Active Learning • What is AL? • Techniques that help classifiers learn better with fewer training samples. • Why? • Data are cheap, but labeling can be expensive. • For example: Speech Recognition, Information Extraction … • Key idea: • Let the classifier choose the samples from which it learns. (Figure: the same original data handled by supervised learning vs. active learning.)
Example 1: Learning a threshold classifier w on an interval [a, b] containing N points. • Supervised Learning: needs N labels to guarantee error ε ≤ 1/N. • Active Learning: a binary search over the sorted unlabeled points locates w with only log(N) labels for the same guarantee ε ≤ 1/N. (Figure: the threshold w on the interval [a, b].)
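The log(N) claim can be made concrete with a short sketch. The Python code below is not from the slides; the point count and the true threshold are illustrative assumptions. It learns a 1-D threshold by binary search over the sorted unlabeled points, so it needs only about log2(N) label queries instead of N.

# Binary-search active learning for a 1-D threshold classifier (sketch).
# A passive learner would need all N labels; this needs ~log2(N) queries.
def active_threshold(points, oracle):
    """points: sorted x values; oracle(x) returns the label 0 or 1."""
    lo, hi = 0, len(points) - 1
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if oracle(points[mid]) == 0:   # mid is still left of the threshold
            lo = mid + 1
        else:                          # mid is at or right of the threshold
            hi = mid
    return points[lo], queries

if __name__ == "__main__":
    N = 1024
    true_w = 0.37                      # hypothetical ground-truth threshold
    xs = [i / N for i in range(N)]
    w_hat, used = active_threshold(xs, lambda x: int(x >= true_w))
    print(f"estimated w ~ {w_hat:.3f} using {used} labels (N = {N})")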
Query Points • The key question: which samples should be labeled? • Many samples are redundant or irrelevant to the decision boundary. • AL typically works by: • Randomly querying a few samples. • Heuristically querying additional samples. • Synthesized Query Points • The learner may request a label for any point in the input space. • Just like students may pop up all kinds of questions. • Could be awkward: a synthesized point may be meaningless to the human labeler. • Pool-Based Sampling • Unlabeled samples are cheap and are collected all at once. • An informativeness measure is used to select which pooled samples to label.
Uncertainty Sampling • Find out what we are not sure about. • Create an initial classifier. • While the teacher is willing to label samples: • (a) Apply the current classifier to each unlabeled sample. • (b) Find the samples whose class membership the classifier is least certain about. • (c) Have the teacher label the selected samples. • (d) Train a new classifier on all labeled samples. • The classifier must output both a class membership and a measure of certainty: KNN, NB, NN … • Extensions for multi-class problems: Margin, Entropy. (A minimal sketch of this loop follows.)
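The sketch below implements the loop above with scikit-learn; the dataset, the initial labeled set, the query budget, and the choice of logistic regression are illustrative assumptions, not from the lecture.

# Pool-based uncertainty sampling (sketch).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Start with a few labeled samples from each class; the rest form the pool.
labeled = [int(i) for i in np.r_[np.where(y == 0)[0][:5], np.where(y == 1)[0][:5]]]
pool = [i for i in range(len(X)) if i not in labeled]

clf = LogisticRegression(max_iter=1000)
for _ in range(20):                                # "while the teacher is willing"
    clf.fit(X[labeled], y[labeled])                # (d) retrain on all labeled data
    proba = clf.predict_proba(X[pool])             # (a) apply to the unlabeled pool
    uncertainty = 1.0 - proba.max(axis=1)          # (b) least-confidence score
    # Multi-class alternatives: margin (gap between the top two probabilities)
    # or entropy: -(proba * np.log(proba + 1e-12)).sum(axis=1)
    pick = pool.pop(int(np.argmax(uncertainty)))   # most uncertain sample
    labeled.append(pick)                           # (c) the teacher labels it
clf.fit(X[labeled], y[labeled])

print("samples labeled:", len(labeled), "accuracy on all data:", clf.score(X, y))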
Sampling Bias (Figure: sampling bias can lead the learner to a suboptimal boundary w instead of the optimal boundary w*.) S. Dasgupta and D. Hsu: Hierarchical Sampling for Active Learning. ICML 2008.
Exploiting Clustering Structure • Find a clustering of the data. • Sample a few points from each cluster randomly. • Assign each cluster its majority label. • Use this fully labeled data to build a classifier.
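A minimal sketch of this cluster-then-label procedure: the clustering algorithm (k-means), the number of clusters, the per-cluster label budget, and the final classifier are illustrative choices, not prescribed by the lecture.

# Exploiting clustering structure for active labeling (sketch).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.tree import DecisionTreeClassifier

X, y_true = make_blobs(n_samples=600, centers=4, random_state=1)

# 1. Find a clustering of the data.
clusters = KMeans(n_clusters=4, n_init=10, random_state=1).fit_predict(X)

# 2-3. Query a few random labels per cluster; assign each cluster its majority label.
rng = np.random.default_rng(1)
pseudo_labels = np.empty(len(X), dtype=int)
for c in np.unique(clusters):
    members = np.where(clusters == c)[0]
    queried = rng.choice(members, size=min(5, len(members)), replace=False)
    pseudo_labels[members] = np.bincount(y_true[queried]).argmax()  # majority vote

# 4. Train an ordinary classifier on the fully (pseudo-)labeled data.
clf = DecisionTreeClassifier(random_state=1).fit(X, pseudo_labels)
print("agreement with true labels:", (clf.predict(X) == y_true).mean())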
Summary • Data Cheap, Labeling Expensive. • AL improves the efficiency of training by selectively querying the most informative samples. • Two Flavors: • Explore the hypothesis space efficiently. • Exploit the clustering structure. • Related Areas: • Semi-Supervised Learning • Design of Experiments • Optimization
Reinforcement Learning • An agent makes a series of actions and receives rewards from the environment. • For example: a robot walking through a maze. • Delayed Reward • Usually the reward is given only after a number of actions. • Lots of unrewarded intermediate actions. • For example: the win or loss of a game. • Goal • To learn to choose actions that maximize long-term rewards. • Supervised? Unsupervised? • Learning Based on Experience • Credit Assignment • Apportion credit and blame to each action.
Terminology • State (S): a room • Action (A): moving from one room to another • Reward Table (R): the immediate reward for each (state, action) pair (a hedged sketch follows).
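The reward table itself did not survive in this text, so the sketch below reconstructs a plausible version for the six-room example used later (rooms A-F, with F the goal): 0 for a usable door, 100 for a door into the goal room, and -1 where no door exists. The exact values are an assumption based on the standard form of this example.

# Hedged reconstruction of the reward table R for the room example.
# Assumed values: 0 = door exists, 100 = door into the goal room F, -1 = no door.
import numpy as np

rooms = ["A", "B", "C", "D", "E", "F"]       # F is the goal state
R = np.array([
    #  A    B    C    D    E    F
    [-1,  -1,  -1,  -1,   0,  -1],   # from A: E
    [-1,  -1,  -1,   0,  -1, 100],   # from B: D, F (goal)
    [-1,  -1,  -1,   0,  -1,  -1],   # from C: D
    [-1,   0,   0,  -1,   0,  -1],   # from D: B, C, E
    [ 0,  -1,  -1,   0,  -1, 100],   # from E: A, D, F (goal)
    [-1,   0,  -1,  -1,   0, 100],   # from F: B, E, F (stay at the goal)
])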
Matrix Q • The agent learns from experience or training by exploring the environment. • Q is the memory of the agent about the environment. • Given a state diagram, find the minimum path from any initial state to the goal state.
Q Learning • Set the learning rate and the reward matrix R. • Initialize Q as a zero matrix. • For each trial • Select a random initial state. • While the goal state is not reached • Select one possible action A for the current state S. • Get the maximum Q value of the new state S′. • Update Q(S, A), e.g. Q(S, A) ← Q(S, A) + α [R(S, A) + γ·max over A′ of Q(S′, A′) − Q(S, A)]. • Set the new state as the current state. • End While • End For • After learning, the agent moves by selecting, in each state, the action with the maximum value in Q. (A minimal sketch follows.)
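A minimal, self-contained sketch of the procedure above for the room example. The door layout matches the states used in the examples that follow, but the reward values, the discount γ, the learning rate α, and the number of trials are illustrative assumptions.

# Tabular Q-learning on the room-navigation example (sketch).
import numpy as np

# Doors between rooms A(0)..F(5); F is the goal. Assumed reward: 100 for entering F, else 0.
doors = {0: [4], 1: [3, 5], 2: [3], 3: [1, 2, 4], 4: [0, 3, 5], 5: [1, 4, 5]}
R = np.full((6, 6), -1.0)
for s, successors in doors.items():
    for s2 in successors:
        R[s, s2] = 100.0 if s2 == 5 else 0.0

alpha, gamma = 0.9, 0.8            # learning rate and discount (illustrative)
Q = np.zeros((6, 6))               # initialize Q as a zero matrix
rng = np.random.default_rng(0)

for _ in range(1000):              # trials
    s = int(rng.integers(0, 6))    # select a random initial state
    while s != 5:                  # while the goal state is not reached
        a = int(rng.choice(doors[s]))                 # one possible action
        target = R[s, a] + gamma * Q[a].max()         # uses the max Q of the new state
        Q[s, a] += alpha * (target - Q[s, a])         # update Q(S, A)
        s = a                                         # new state becomes current state

# After learning, move greedily: in each room pick the action with the largest Q value.
print(np.round(Q))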
Examples • Initial State: B • Next Possible States: D & F • From F, the agent can go to B, E & F • Stop!
Examples • Initial State: D • Next Possible States: B, C & E • From B, the agent can go to D & F • Continue …
Temporal Difference Learning (Figure: a game as an alternating sequence of the agent's moves and the opponent's moves.)
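As a hedged illustration of the temporal-difference idea (the toy chain task and the parameters below are assumptions, not from the slides): after each move, the value of the previous state is nudged toward the reward received plus the discounted value of the state actually reached, so credit propagates backwards through the sequence of positions.

# TD(0) value learning on a toy 5-state chain (illustrative task and parameters).
import random

n_states = 5                    # states 0..4; reaching state 4 yields reward 1
V = [0.0] * n_states
alpha, gamma = 0.1, 0.9

random.seed(0)
for _ in range(2000):
    s = 0
    while s != n_states - 1:
        s_next = min(s + random.choice([1, 2]), n_states - 1)   # environment step
        r = 1.0 if s_next == n_states - 1 else 0.0
        V[s] += alpha * (r + gamma * V[s_next] - V[s])          # TD(0) update
        s = s_next

print([round(v, 2) for v in V])   # values grow as states get closer to the goal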
Summary • RL trains an agent to make a series of appropriate actions to achieve long-term goals based on trial and error. • Credit assignment is implemented largely through the backward propagation of rewards with decay. • What are the most important choices that lead you to success? • RL takes into account the details of the interaction. • By contrast: how would you use GAs + ANNs to learn chess? • There is a trade-off between exploration and exploitation. • Humans still perform better by truly understanding problems through analysis and reasoning.