In the Name of God
Machine Learning
Classification and Linear Classifiers
Mohammad Ali Keyvanrad
Thanks to: M. Soleymani (Sharif University of Technology), R. Zemel (University of Toronto), P. Smyth (University of California, Irvine)
Fall 1392
Outline • Classification • Linear classifiers • Perceptron • Multi-class classification • Generative approach • Naïve Bayes classifier
Classification problem • Given: Training set $\mathcal{D} = \{(\mathbf{x}^{(i)}, y^{(i)})\}_{i=1}^{N}$ • a labeled set of $N$ input-output pairs • Goal: Given an input $\mathbf{x}$, assign it to one of $K$ classes • Examples: • Spam filter • Handwritten digit recognition
Linear classifiers • Linear classifiers: • Decision boundaries are linear functions: a $(d-1)$-dimensional hyperplane within the $d$-dimensional input space. • Examples • Perceptron • Support vector machine (with a linear kernel) • Naïve Bayes classifier (linear for some likelihood families) • Linear Discriminant Analysis (or Fisher's linear discriminant) • Note: decision trees and KNN do not yield a single linear boundary; their boundaries are piecewise linear (see the following figures) and they appear here for contrast.
Linear classifiers • Linearly separable data • Data points can be exactly classified by a linear decision surface. • Binary classification • Target variable $y \in \{-1, +1\}$
Decision boundary • Discriminant function: $f(\mathbf{x}) = \mathbf{w}^{T}\mathbf{x} + w_0$ • $w_0$: bias • if $f(\mathbf{x}) \geq 0$ then $\hat{y} = +1$, else $\hat{y} = -1$ • Decision boundary: $f(\mathbf{x}) = \mathbf{w}^{T}\mathbf{x} + w_0 = 0$ • The sign of $f(\mathbf{x})$ predicts binary class labels
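A minimal sketch of this discriminant in code (the weight values are illustrative assumptions, not from the slides):

```python
import numpy as np

def predict(x, w, w0):
    """Linear discriminant: return +1 if w^T x + w0 >= 0, else -1."""
    return 1 if np.dot(w, x) + w0 >= 0 else -1

# Assumed weights: the boundary is the line x1 + x2 - 1 = 0.
w, w0 = np.array([1.0, 1.0]), -1.0
print(predict(np.array([2.0, 2.0]), w, w0))  # +1: above the boundary
print(predict(np.array([0.0, 0.0]), w, w0))  # -1: below the boundary
```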
Linear Decision boundary (Decision Tree) • [Figure: axis-aligned, piecewise-linear decision regions produced by a decision tree; one axis is Income, with split thresholds t1, t2, t3]
Linear Decision boundary (K Nearest Neighbor) • [Figure: two classes of points ('o' and 'x') plotted against Feature 1 and Feature 2; KNN separates them with a piecewise-linear boundary]
Non-Linear Decision boundary • [Figure: a curved decision boundary separating Decision Region 1 from Decision Region 2]
Decision boundary • Linear classifier: $\hat{y} = \operatorname{sign}(\mathbf{w}^{T}\mathbf{x} + w_0)$
Non-linear decision boundary • Choose non-linear features, e.g. $\boldsymbol{\phi}(\mathbf{x}) = (x_1, x_2, x_1 x_2, x_1^2, x_2^2, \ldots)$ • The classifier $f(\mathbf{x}) = \mathbf{w}^{T}\boldsymbol{\phi}(\mathbf{x}) + w_0$ is still linear in the parameters $\mathbf{w}$
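A short sketch of this idea, assuming a hand-picked quadratic feature map (the map and the data are illustrative, not from the slides). A classifier linear in $\mathbf{w}$ can then separate data that is not linearly separable in the original space:

```python
import numpy as np

def phi(x):
    """Quadratic feature map: (x1, x2) -> (x1, x2, x1*x2, x1^2, x2^2)."""
    x1, x2 = x
    return np.array([x1, x2, x1 * x2, x1**2, x2**2])

# Points inside vs. outside the unit circle are not linearly separable
# in (x1, x2), but they are in phi-space: w picks out x1^2 + x2^2.
w, w0 = np.array([0.0, 0.0, 0.0, 1.0, 1.0]), -1.0

def predict(x):
    return 1 if np.dot(w, phi(x)) + w0 >= 0 else -1

print(predict((0.1, 0.2)))  # -1: inside the circle
print(predict((1.5, 1.0)))  # +1: outside the circle
```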
Linear boundary: geometry • $\mathbf{w}$ is orthogonal to the decision boundary: for any two points $\mathbf{x}_A, \mathbf{x}_B$ on the boundary, $\mathbf{w}^{T}(\mathbf{x}_A - \mathbf{x}_B) = 0$ • The distance of the boundary from the origin is $-w_0 / \|\mathbf{w}\|$ • The signed distance of a point $\mathbf{x}$ from the boundary is $f(\mathbf{x}) / \|\mathbf{w}\|$
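The signed-distance claim follows from a short derivation (a standard argument, reconstructed here since the slide body was lost):

```latex
% Decompose x into its projection x_perp onto the boundary plus a step
% of length r along the unit normal w/||w||, then apply f and use
% f(x_perp) = 0:
\begin{align*}
f(\mathbf{x}) &= \mathbf{w}^{T}\Big(\mathbf{x}_{\perp} + r\,\tfrac{\mathbf{w}}{\|\mathbf{w}\|}\Big) + w_0
  = \underbrace{\mathbf{w}^{T}\mathbf{x}_{\perp} + w_0}_{=\,0} + r\,\tfrac{\mathbf{w}^{T}\mathbf{w}}{\|\mathbf{w}\|}
  = r\,\|\mathbf{w}\| \\
\Rightarrow\quad r &= \frac{f(\mathbf{x})}{\|\mathbf{w}\|}
\end{align*}
```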
SSE cost function for classification • The sum-of-squared-errors (SSE) cost function is not suitable for classification • SSE penalizes predictions that are "too correct": for a target $y = 1$, an output $f(\mathbf{x}) = 10$ lies confidently on the right side of the boundary, yet it incurs a squared error of $(10-1)^2 = 81$ • SSE also lacks robustness to noise: outliers far from the boundary dominate the cost
SSE cost function for classification • Is it more suitable if we set $h(\mathbf{x}) = \operatorname{sign}(\mathbf{w}^{T}\mathbf{x} + w_0)$ inside the cost? • No: the cost then becomes piecewise constant in $\mathbf{w}$, so its gradient is zero almost everywhere and gradient-based optimization cannot make progress
Perceptron algorithm • Linear classifier: $\hat{y} = \operatorname{sign}(\mathbf{w}^{T}\mathbf{x})$, with $x_0 = 1$ absorbing the bias • Two-class: $y = +1$ for $\mathcal{C}_1$, $y = -1$ for $\mathcal{C}_2$ • Goal: find $\mathbf{w}$ such that $y^{(i)}\,\mathbf{w}^{T}\mathbf{x}^{(i)} > 0$ for all training samples $i$
Perceptron criterion • Penalize only misclassified samples: $J_P(\mathbf{w}) = -\sum_{i \in \mathcal{M}} y^{(i)}\,\mathbf{w}^{T}\mathbf{x}^{(i)}$, where $\mathcal{M}$ is the set of samples misclassified by $\mathbf{w}$ • A sample is misclassified when $y^{(i)}\,\mathbf{w}^{T}\mathbf{x}^{(i)} \leq 0$
Batch gradient descent for Perceptron • "Gradient descent" on the perceptron criterion: $\mathbf{w} \leftarrow \mathbf{w} + \eta \sum_{i \in \mathcal{M}} y^{(i)}\,\mathbf{x}^{(i)}$ • Batch Perceptron converges in a finite number of steps for linearly separable data (a sketch follows below)
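A minimal sketch of the batch update, assuming a small toy dataset and a fixed learning rate (both illustrative, not from the slides):

```python
import numpy as np

def batch_perceptron(X, y, eta=1.0, max_epochs=100):
    """Batch perceptron: sum the updates over all misclassified samples.

    X: (n, d) array with a constant-1 first column absorbing the bias.
    y: (n,) array of labels in {-1, +1}.
    """
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        margins = y * (X @ w)
        M = margins <= 0                       # misclassified samples
        if not M.any():                        # all correct: converged
            return w
        w = w + eta * (y[M][:, None] * X[M]).sum(axis=0)
    return w

# Toy linearly separable data: label = sign of x1 + x2 - 1.
X = np.array([[1, 2, 2], [1, 2, 0], [1, 0, 0], [1, -1, 1]], dtype=float)
y = np.array([1, 1, -1, -1])
w = batch_perceptron(X, y)
print(w, np.sign(X @ w))  # predictions match y on this toy set
```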
Stochastic gradient descent for Perceptron • Single-sample perceptron: visit the samples one at a time • If $\mathbf{x}^{(i)}$ is misclassified ($y^{(i)}\,\mathbf{w}^{T}\mathbf{x}^{(i)} \leq 0$), update $\mathbf{w} \leftarrow \mathbf{w} + \eta\, y^{(i)}\,\mathbf{x}^{(i)}$ • Perceptron convergence theorem (for linearly separable data) • If the training data are linearly separable, the single-sample perceptron is also guaranteed to find a solution in a finite number of steps.
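The single-sample variant differs from the batch version only in applying the update immediately after each misclassified example; a sketch under the same illustrative assumptions as above:

```python
import numpy as np

def single_sample_perceptron(X, y, eta=1.0, max_epochs=100):
    """Update w immediately after each misclassified sample."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ w) <= 0:         # misclassified (or on boundary)
                w = w + eta * yi * xi      # nudge w toward this sample
                errors += 1
        if errors == 0:                    # one clean pass: converged
            return w
    return w
```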
Convergence of Perceptron • Each update changes $\mathbf{w}$ in a direction that corrects the error: after $\mathbf{w}' = \mathbf{w} + \eta\, y^{(i)}\,\mathbf{x}^{(i)}$, the margin on the offending sample grows, since $y^{(i)}\,\mathbf{w}'^{T}\mathbf{x}^{(i)} = y^{(i)}\,\mathbf{w}^{T}\mathbf{x}^{(i)} + \eta\,\|\mathbf{x}^{(i)}\|^2$ (using $y^{(i)2} = 1$)
Multi-class classification • Solutions to multi-category problems • Converting the problem to a set of two-class problems • "one versus rest" or "one against all" • For each class $\mathcal{C}_k$, a linear discriminant function that separates samples of $\mathcal{C}_k$ from all the other samples is found ($K$ classifiers in total; see the sketch below). • one versus one • $K(K-1)/2$ linear discriminant functions are used, one to separate the samples of each pair of classes.
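A minimal one-vs-rest sketch built on the single-sample perceptron above (the helper names and data layout are assumptions for illustration):

```python
import numpy as np

def one_vs_rest_train(X, y, K, train):
    """Train K binary classifiers, class k vs. all other classes.

    train: a binary trainer such as single_sample_perceptron above.
    """
    ws = []
    for k in range(K):
        yk = np.where(y == k, 1, -1)   # relabel: class k -> +1, rest -> -1
        ws.append(train(X, yk))
    return np.stack(ws)                # (K, d) weight matrix

def one_vs_rest_predict(X, W):
    """Assign each sample to the class with the largest discriminant."""
    return np.argmax(X @ W.T, axis=1)
```

Taking the argmax over all $K$ discriminants, rather than thresholding each one independently, also sidesteps the ambiguous regions discussed below.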
Multi-class classification • One-vs-all (one-vs-rest)
Multi-class classification • One-vs-one
Multi-class classification: ambiguity • Converting the multi-class problem to a set of independent two-class problems can lead to regions of the input space in which the classification is undefined (claimed by several classifiers, or by none) • Using $K$ discriminant functions and assigning $\mathbf{x}$ to $\arg\max_k f_k(\mathbf{x})$ avoids this ambiguity
Probabilistic approach • Bayes' theorem: $p(\mathcal{C}_k \mid \mathbf{x}) = \dfrac{p(\mathbf{x} \mid \mathcal{C}_k)\, p(\mathcal{C}_k)}{p(\mathbf{x})}$
Bayes decision theory • Bayes decision: choose the class with the highest posterior probability $p(\mathcal{C}_k \mid \mathbf{x})$
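A worked toy example of the Bayes decision (the numbers are illustrative assumptions, not from the slides):

```latex
% Assume priors p(C1) = 0.6, p(C2) = 0.4 and likelihoods at a given x of
% p(x|C1) = 0.2, p(x|C2) = 0.5. Then
\begin{align*}
p(\mathcal{C}_1 \mid \mathbf{x}) &= \frac{0.2 \times 0.6}{0.2 \times 0.6 + 0.5 \times 0.4} = \frac{0.12}{0.32} = 0.375 \\
p(\mathcal{C}_2 \mid \mathbf{x}) &= \frac{0.5 \times 0.4}{0.32} = 0.625
\end{align*}
% Despite the larger prior on C1, the Bayes decision here is C2.
```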
Probabilistic classifiers • Probabilistic classification approaches can be divided in two main categories • Generative • Discriminative
Generative approach • Learning stage • Determine the class-conditional densities $p(\mathbf{x} \mid \mathcal{C}_k)$ for each class individually • Determine the prior probabilities $p(\mathcal{C}_k)$ • Use Bayes' theorem to find the posteriors $p(\mathcal{C}_k \mid \mathbf{x})$ • Decision stage • After learning the model, make the optimal class assignment for a new input $\mathbf{x}$ • if $p(\mathcal{C}_1 \mid \mathbf{x}) > p(\mathcal{C}_2 \mid \mathbf{x})$ then decide $\mathcal{C}_1$, otherwise $\mathcal{C}_2$
Discriminative approach • Learning stage • Determine the posterior class probabilities $p(\mathcal{C}_k \mid \mathbf{x})$ directly • Decision stage • After learning the model (inference stage), make the optimal class assignment for a new input $\mathbf{x}$ • if $p(\mathcal{C}_1 \mid \mathbf{x}) > p(\mathcal{C}_2 \mid \mathbf{x})$ then decide $\mathcal{C}_1$, otherwise $\mathcal{C}_2$
Naïve Bayes classifier • Conditional independence assumption: $p(\mathbf{x} \mid \mathcal{C}_k) = \prod_{j=1}^{d} p(x_j \mid \mathcal{C}_k)$ • For each class $\mathcal{C}_k$, it fits $d$ univariate distributions instead of one $d$-dimensional multivariate distribution
Naïve Bayes classifier • It first estimates the class-conditional densities $p(x_j \mid \mathcal{C}_k)$ and the prior probability $p(\mathcal{C}_k)$ for each class based on the training set. • In the decision phase, it finds the label of $\mathbf{x}$ according to: $\hat{y} = \arg\max_{k}\; p(\mathcal{C}_k) \prod_{j=1}^{d} p(x_j \mid \mathcal{C}_k)$
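A minimal Gaussian naïve Bayes sketch, assuming continuous features modeled with one univariate Gaussian per feature per class (a common instantiation of the recipe above, not necessarily the one the slides used):

```python
import numpy as np

def fit_gaussian_nb(X, y, K):
    """Per class: a prior, plus a univariate Gaussian per feature."""
    priors, means, vars_ = [], [], []
    for k in range(K):
        Xk = X[y == k]
        priors.append(len(Xk) / len(X))
        means.append(Xk.mean(axis=0))
        vars_.append(Xk.var(axis=0) + 1e-9)   # guard against zero variance
    return np.array(priors), np.array(means), np.array(vars_)

def predict_gaussian_nb(X, priors, means, vars_):
    """argmax_k [ log p(C_k) + sum_j log p(x_j | C_k) ]."""
    # Log-density of each feature under each class, summed over features:
    # shapes broadcast to (n, K, d), then reduce over d.
    log_lik = -0.5 * (np.log(2 * np.pi * vars_)[None, :, :]
                      + (X[:, None, :] - means[None, :, :]) ** 2
                      / vars_[None, :, :]).sum(axis=2)
    return np.argmax(np.log(priors)[None, :] + log_lik, axis=1)
```

Working in log space replaces the product over features with a sum, which avoids numerical underflow when many small probabilities are multiplied.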