Classification based on online learning • Keywords: reinforcement learning and inverted indexes
Classification learning at different levels • Supervised • Given appropriate “targets” for the data, learn to map from data to target • Unsupervised • Given data, identify “targets” embedded in the data without hints or clues • Reinforcement • Given data and a policy, guess a target, receive a reward or punishment, and adjust the policy
A motivating example: SPAM • Imagine you wish to create a system to reduce spam • You cannot always predict who will send you mail; only by considering the content (e.g., the subject) can you determine whether a mail is spam or not
SPAM detection using a classifier • Hence, the goal is to “learn” to distinguish good subjects from bad ones • This is a classic classification problem
Learning a mapping function • The mapping assigns to each item in the “Mailbox” a relevance value in the range [0, 1]
(Relevance) Learning based on RL • [Figure: agent-user interaction loop; the agent in situation St chooses action At and receives reward Rt+1 from the user in the new situation St+1] • The goal is to learn the policy: πt(s, a) = Pr{At = a | St = s} • At time t, given situation s, the policy provides the probability that the agent’s action will be a • Credit: Based on a model by Gillian Hayes
SPAM prediction • Given that at time t there are n emails (situation S), predict for each email its relevance (i.e., the action, a value between 0 and 1) so as to maximize the reward R from the user • Given a reward (a thumbs up or down, so to speak), adjust the policy and the expectation about future rewards
Several Key Aspects of RL • The policy π(s, a) helps deduce actions • The value function predicts future (expected) rewards • What we actually need is a rank-ordered relevance vector over the items (i.e., emails) • The learning step λ: the rate at which the policy is updated (associated with the policy function)
A High-Level RL Algorithm for spam classification • Initialize the policy and reward representations • Look at which emails have arrived • Predict the relevance of each email based on the policy function • Collect rewards from the user • Check whether the policy has converged; if not, update the policy at rate λ • Update the expected reward
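Below is a minimal Python sketch of this loop, assuming made-up names (rl_spam_loop, simulated_user_reward), an illustrative learning step lam, and a simple convergence tolerance; the running-average and unit-vector update details anticipate the later slides, and none of these specifics come from the slides themselves.

```python
import numpy as np

# Illustrative sketch of the high-level loop; the simulated user feedback and
# all parameter values are assumptions, not taken from the slides.

def simulated_user_reward(true_class, chosen_class):
    """Stand-in for the user's thumbs up/down: 1.0 if the agent picked the
    class the user would have picked, 0.0 otherwise."""
    return 1.0 if chosen_class == true_class else 0.0

def rl_spam_loop(email_true_classes, n_classes, lam=0.1, tol=1e-3, max_rounds=200):
    policy = np.full(n_classes, 1.0 / n_classes)   # P: uniform at the start
    reward_est = np.zeros(n_classes)               # Y-hat: no reward knowledge yet
    counts = np.zeros(n_classes)                   # bookkeeping for running averages
    rng = np.random.default_rng(0)
    for _ in range(max_rounds):
        old_policy = policy.copy()
        for true_class in email_true_classes:                 # emails that arrived
            chosen = rng.choice(n_classes, p=policy)          # predict via the policy
            r = simulated_user_reward(true_class, chosen)     # collect the reward
            counts[chosen] += 1
            reward_est[chosen] += (r - reward_est[chosen]) / counts[chosen]  # running average
            E = np.zeros(n_classes)                           # unit vector E(k)
            E[np.argmax(reward_est)] = 1.0
            policy += lam * (E - policy)                      # update P at rate lam
        if np.abs(policy - old_policy).max() < tol:           # convergence check
            break
    return policy, reward_est
```

Calling rl_spam_loop([0, 2, 1, 0], n_classes=3), for instance, would run four simulated emails per round until the policy settles.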
An Implementation of RL: The Actual RL Environment in Spam Classification • [Figure: the mapping from emails to relevance R is split into two stages, F1 and F2] • Basically, the mapping function is decomposed into two functions: F1, semantic classification, and F2, relevance classification
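As a rough illustration of that split, the sketch below invents a keyword-based F1 and reads F2 straight out of the estimated relevance vector; the keyword lists, class names, and function names are placeholders, not the actual classifiers used.

```python
# Hypothetical decomposition sketch; keyword lists and class names are invented.

def f1_semantic_class(subject, class_keywords):
    """F1 (semantic classification): map an email subject to a class index."""
    subject = subject.lower()
    for idx, keywords in enumerate(class_keywords):
        if any(word in subject for word in keywords):
            return idx
    return len(class_keywords) - 1        # fall back to the last (catch-all) class

def f2_relevance(class_idx, reward_est):
    """F2 (relevance classification): map a class to its estimated relevance in [0, 1]."""
    return reward_est[class_idx]

# Example: three classes -- work, newsletters, and a catch-all likely-spam class.
classes = [["meeting", "report"], ["newsletter", "digest"], []]
idx = f1_semantic_class("Weekly digest: new offers", classes)   # -> 1
relevance = f2_relevance(idx, [0.9, 0.4, 0.1])                  # -> 0.4
```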
RL Implementation • From the agent’s perspective the environment is stochastic • The agent has to choose an action, i.e., pick the best class and present it to the user, based on its policy and reward representations
The Policy and Reward Functions • The policy (action prediction) function is a vector of action probabilities over the set of classes Ci, where Pr(Ci) represents the probability that class Ci should be selected as the best (first) class • The reward function is based on an estimated relevance vector over the set of classes Ci, where Pr(Ci) represents the probability that the user will provide the maximum reward for class Ci
Details on the key RL functions I • The reward (estimated relevance probability) vector is Ŷ = (Ŷ1, …, Ŷn), where n is the number of classes • In the absence of knowledge about expected rewards, all elements of Ŷ are initialized to zero • Recall that each element Ŷi represents the probability that the corresponding class Ci will receive the maximum reward
Details on the key RL functions II • The policy (action probability) vector is P = (P1, …, Pn), where n is the number of classes • In the absence of information, all elements of P are initially set equal to each other (i.e., 1/n) • Pi is the probability that the corresponding class Ci will be selected as the best (first) class
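A short sketch of these two initializations (the class count n = 5 is just an example value):

```python
import numpy as np

n = 5                          # number of classes; example value only
reward_est = np.zeros(n)       # Y-hat: no knowledge of expected rewards yet
policy = np.full(n, 1.0 / n)   # P: every class equally likely at the start
```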
The RL Spam Detection Algorithm • Classify incoming emails into classes Ci (i = 1, …, n) • Select the best class (the one most likely not to be spam) by sampling the distribution in the policy representation P • Generate a pseudo-random number (PRN) and select from P accordingly (explained next) • Create a ranked list of classes Ci based on the policy and the expected reward representations (explained later) • Collect the rewards from the user and update the reward representation Ŷ • Update the policy vector P
How to select an action? • By sampling the action probability distribution (the policy function) • Assume a simple P representation of [0.2, 0.4, 0.1, 0.3] • A PRN between 0 and 1 is generated; class 1 is selected if the PRN lies between 0 and 0.2, class 2 if it lies between 0.2 and 0.6, class 3 if it lies between 0.6 and 0.7, and class 4 otherwise • As learning progresses, the action probability vector converges to a unit vector, so a single class emerges as the dominant one
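The sketch below reproduces this interval logic for the slide's example distribution, using np.searchsorted over the cumulative sums; classes are 0-indexed here, so the slide's class 1 is index 0.

```python
import numpy as np

def select_action(policy, rng=None):
    """Sample a class index from the action probability vector P using a PRN."""
    rng = rng or np.random.default_rng()
    prn = rng.random()                                   # PRN in [0, 1)
    return int(np.searchsorted(np.cumsum(policy), prn, side="right"))

P = np.array([0.2, 0.4, 0.1, 0.3])
# PRN in [0, 0.2) -> class 0, [0.2, 0.6) -> class 1, [0.6, 0.7) -> class 2, else class 3
print(select_action(P))
```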
Generating a ranked list of relevance values • Choose the class with the highest value in the action probability vector as the top class α(k) at instance k • Then generate a vector q(k) as follows: qi(k) = 1 + Ŷi(k) if α(k) = Ci, and qi(k) = Ŷi(k) otherwise, where Ŷi(k) is the ith element of the vector Ŷ at instance k • Since Ŷi(k) is bounded by 1, the top element of q(k) always corresponds to the class α(k); the remaining elements of q(k) follow the order in the estimated relevance probability vector Ŷ(k)
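A small sketch of this ranking rule, where top is the index of the class chosen as best (for example, by the sampling step above):

```python
import numpy as np

def ranked_classes(top, reward_est):
    """Rank classes: q_i = 1 + Yhat_i for the top class, q_i = Yhat_i otherwise."""
    q = np.asarray(reward_est, dtype=float).copy()
    q[top] += 1.0              # Y-hat is bounded by 1, so the top class stays first
    return np.argsort(-q)      # class indices ordered by q, largest first

print(ranked_classes(top=1, reward_est=[0.4, 0.2, 0.7]))   # -> [1 2 0]
```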
Updating the estimated relevance / reward and the action / policy vectors • After collecting the reward (a value between 0 and 1), each element of the Ŷ vector (representing the corresponding class Ci) is updated as a running average of the rewards received for that class • Then a vector E(k) is generated at instance k such that all elements of E are set to zero except the element corresponding to the top class in the freshly updated Ŷ vector, which is set to 1 • The P vector is subsequently updated as P(k+1) = P(k) + λ(E(k) - P(k)), where 0 < λ < 1 is the learning step • Hence, an optimal unit vector (over actions) is computed at every instance based on the current estimate of the relevance probability vector (rewards)
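A sketch of this update step; the counts array used for the running average and the name lam for the learning step λ are assumed bookkeeping details, while the P update follows the formula above.

```python
import numpy as np

def update(policy, reward_est, counts, chosen_class, reward, lam=0.1):
    """One update of Y-hat (running average) and P (toward the unit vector E)."""
    counts[chosen_class] += 1
    # running average of the rewards received for the chosen class
    reward_est[chosen_class] += (reward - reward_est[chosen_class]) / counts[chosen_class]
    # E(k): unit vector with a 1 at the class with the highest estimated relevance
    E = np.zeros_like(policy)
    E[np.argmax(reward_est)] = 1.0
    # P(k+1) = P(k) + lam * (E(k) - P(k)),  0 < lam < 1
    policy += lam * (E - policy)
    return policy, reward_est, counts
```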
Some known limitations of RL • Estimating λ, the learning step • Convergence that is too fast or too slow • Initial latency of learning • Scarcity of rewards • Radical shifts in reward patterns • Re-learning may be slow
Acknowledgement • Snehasis Mukhopadhyay – IUPUI, Indianapolis • Gillian Hayes – Univ. of Edinburgh