In the Name of God
Machine Learning
Classification and Linear Classifiers
Mohammad Ali Keyvanrad
Thanks to: M. Soleymani (Sharif University of Technology), R. Zemel (University of Toronto), P. Smyth (University of California, Irvine)
Fall 1392
Outline • Classification • Linear classifiers • Perceptron • Multi-class classification • Generative approach • Naïve Bayes classifier
Classification problem • Given: Training set $\mathcal{D} = \{(\mathbf{x}^{(i)}, y^{(i)})\}_{i=1}^{N}$ • a labeled set of $N$ input-output pairs • Goal: Given an input $\mathbf{x}$, assign it to one of $K$ classes • Examples: • Spam filter • Handwritten digit recognition
Linear classifiers • Linear classifiers: • Decision boundaries are linear functions: a $(d-1)$-dimensional hyperplane within the $d$-dimensional input space. • Examples • Perceptron • Support vector machine (with a linear kernel) • Naïve Bayes classifier (linear for some likelihood families) • Linear Discriminant Analysis (or Fisher's linear discriminant) • Note: decision trees and KNN do not yield a single linear boundary; their boundaries are piecewise linear (see the following figures) and they appear here for contrast.
Linear classifiers • Linearly separable data • Data points can be exactly classified by a linear decision surface. • Binary classification • Target variable $y \in \{-1, +1\}$
Decision boundary • Discriminant function: $f(\mathbf{x}) = \mathbf{w}^{T}\mathbf{x} + w_0$ • $w_0$: bias • if $f(\mathbf{x}) \geq 0$ then $\hat{y} = +1$, else $\hat{y} = -1$ • Decision boundary: $f(\mathbf{x}) = \mathbf{w}^{T}\mathbf{x} + w_0 = 0$ • The sign of $f(\mathbf{x})$ predicts binary class labels
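A minimal sketch of this discriminant in code (the weight values are illustrative assumptions, not from the slides):

```python
import numpy as np

def predict(x, w, w0):
    """Linear discriminant: return +1 if w^T x + w0 >= 0, else -1."""
    return 1 if np.dot(w, x) + w0 >= 0 else -1

# Assumed weights: the boundary is the line x1 + x2 - 1 = 0.
w, w0 = np.array([1.0, 1.0]), -1.0
print(predict(np.array([2.0, 2.0]), w, w0))  # +1: above the boundary
print(predict(np.array([0.0, 0.0]), w, w0))  # -1: below the boundary
```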
Linear Decision boundary (Decision Tree) • [Figure: axis-aligned, piecewise-linear decision regions produced by a decision tree; one axis is Income, with split thresholds t1, t2, t3]
Linear Decision boundary (K Nearest Neighbor) • [Figure: two classes of points ('o' and 'x') plotted against Feature 1 and Feature 2; KNN separates them with a piecewise-linear boundary]
Non-Linear Decision boundary • [Figure: a curved decision boundary separating Decision Region 1 from Decision Region 2]
Decision boundary • Linear classifier: $\hat{y} = \operatorname{sign}(\mathbf{w}^{T}\mathbf{x} + w_0)$
Non-linear decision boundary • Choose non-linear features, e.g. $\boldsymbol{\phi}(\mathbf{x}) = (x_1, x_2, x_1 x_2, x_1^2, x_2^2, \ldots)$ • The classifier $f(\mathbf{x}) = \mathbf{w}^{T}\boldsymbol{\phi}(\mathbf{x}) + w_0$ is still linear in the parameters $\mathbf{w}$
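A short sketch of this idea, assuming a hand-picked quadratic feature map (the map and the data are illustrative, not from the slides). A classifier linear in $\mathbf{w}$ can then separate data that is not linearly separable in the original space:

```python
import numpy as np

def phi(x):
    """Quadratic feature map: (x1, x2) -> (x1, x2, x1*x2, x1^2, x2^2)."""
    x1, x2 = x
    return np.array([x1, x2, x1 * x2, x1**2, x2**2])

# Points inside vs. outside the unit circle are not linearly separable
# in (x1, x2), but they are in phi-space: w picks out x1^2 + x2^2.
w, w0 = np.array([0.0, 0.0, 0.0, 1.0, 1.0]), -1.0

def predict(x):
    return 1 if np.dot(w, phi(x)) + w0 >= 0 else -1

print(predict((0.1, 0.2)))  # -1: inside the circle
print(predict((1.5, 1.0)))  # +1: outside the circle
```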
Linear boundary: geometry • $\mathbf{w}$ is orthogonal to the decision boundary: for any two points $\mathbf{x}_A, \mathbf{x}_B$ on the boundary, $\mathbf{w}^{T}(\mathbf{x}_A - \mathbf{x}_B) = 0$ • The distance of the boundary from the origin is $-w_0 / \|\mathbf{w}\|$ • The signed distance of a point $\mathbf{x}$ from the boundary is $f(\mathbf{x}) / \|\mathbf{w}\|$
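The signed-distance claim follows from a short derivation (a standard argument, reconstructed here since the slide body was lost):

```latex
% Decompose x into its projection x_perp onto the boundary plus a step
% of length r along the unit normal w/||w||, then apply f and use
% f(x_perp) = 0:
\begin{align*}
f(\mathbf{x}) &= \mathbf{w}^{T}\Big(\mathbf{x}_{\perp} + r\,\tfrac{\mathbf{w}}{\|\mathbf{w}\|}\Big) + w_0
  = \underbrace{\mathbf{w}^{T}\mathbf{x}_{\perp} + w_0}_{=\,0} + r\,\tfrac{\mathbf{w}^{T}\mathbf{w}}{\|\mathbf{w}\|}
  = r\,\|\mathbf{w}\| \\
\Rightarrow\quad r &= \frac{f(\mathbf{x})}{\|\mathbf{w}\|}
\end{align*}
```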
SSE cost function for classification • The sum-of-squared-errors (SSE) cost function is not suitable for classification • SSE penalizes predictions that are "too correct": for a target $y = 1$, an output $f(\mathbf{x}) = 10$ lies confidently on the right side of the boundary, yet it incurs a squared error of $(10-1)^2 = 81$ • SSE also lacks robustness to noise: outliers far from the boundary dominate the cost
SSE cost function for classification • Is it more suitable if we set $h(\mathbf{x}) = \operatorname{sign}(\mathbf{w}^{T}\mathbf{x} + w_0)$ inside the cost? • No: the cost then becomes piecewise constant in $\mathbf{w}$, so its gradient is zero almost everywhere and gradient-based optimization cannot make progress
Perceptron algorithm • Linear classifier: $\hat{y} = \operatorname{sign}(\mathbf{w}^{T}\mathbf{x})$, with $x_0 = 1$ absorbing the bias • Two-class: $y = +1$ for $\mathcal{C}_1$, $y = -1$ for $\mathcal{C}_2$ • Goal: find $\mathbf{w}$ such that $y^{(i)}\,\mathbf{w}^{T}\mathbf{x}^{(i)} > 0$ for all training samples $i$
Perceptron criterion • Penalize only misclassified samples: $J_P(\mathbf{w}) = -\sum_{i \in \mathcal{M}} y^{(i)}\,\mathbf{w}^{T}\mathbf{x}^{(i)}$, where $\mathcal{M}$ is the set of samples misclassified by $\mathbf{w}$ • A sample is misclassified when $y^{(i)}\,\mathbf{w}^{T}\mathbf{x}^{(i)} \leq 0$
Batch gradient descent for Perceptron • "Gradient descent" on the perceptron criterion: $\mathbf{w} \leftarrow \mathbf{w} + \eta \sum_{i \in \mathcal{M}} y^{(i)}\,\mathbf{x}^{(i)}$ • Batch Perceptron converges in a finite number of steps for linearly separable data (a sketch follows below)
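A minimal sketch of the batch update, assuming a small toy dataset and a fixed learning rate (both illustrative, not from the slides):

```python
import numpy as np

def batch_perceptron(X, y, eta=1.0, max_epochs=100):
    """Batch perceptron: sum the updates over all misclassified samples.

    X: (n, d) array with a constant-1 first column absorbing the bias.
    y: (n,) array of labels in {-1, +1}.
    """
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        margins = y * (X @ w)
        M = margins <= 0                       # misclassified samples
        if not M.any():                        # all correct: converged
            return w
        w = w + eta * (y[M][:, None] * X[M]).sum(axis=0)
    return w

# Toy linearly separable data: label = sign of x1 + x2 - 1.
X = np.array([[1, 2, 2], [1, 2, 0], [1, 0, 0], [1, -1, 1]], dtype=float)
y = np.array([1, 1, -1, -1])
w = batch_perceptron(X, y)
print(w, np.sign(X @ w))  # predictions match y on this toy set
```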
Stochastic gradient descent for Perceptron • Single-sample perceptron: visit the samples one at a time • If $\mathbf{x}^{(i)}$ is misclassified ($y^{(i)}\,\mathbf{w}^{T}\mathbf{x}^{(i)} \leq 0$), update $\mathbf{w} \leftarrow \mathbf{w} + \eta\, y^{(i)}\,\mathbf{x}^{(i)}$ • Perceptron convergence theorem (for linearly separable data) • If the training data are linearly separable, the single-sample perceptron is also guaranteed to find a solution in a finite number of steps.
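The single-sample variant differs from the batch version only in applying the update immediately after each misclassified example; a sketch under the same illustrative assumptions as above:

```python
import numpy as np

def single_sample_perceptron(X, y, eta=1.0, max_epochs=100):
    """Update w immediately after each misclassified sample."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ w) <= 0:         # misclassified (or on boundary)
                w = w + eta * yi * xi      # nudge w toward this sample
                errors += 1
        if errors == 0:                    # one clean pass: converged
            return w
    return w
```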
Convergence of Perceptron • Each update changes $\mathbf{w}$ in a direction that corrects the error: after $\mathbf{w}' = \mathbf{w} + \eta\, y^{(i)}\,\mathbf{x}^{(i)}$, the margin on the offending sample grows, since $y^{(i)}\,\mathbf{w}'^{T}\mathbf{x}^{(i)} = y^{(i)}\,\mathbf{w}^{T}\mathbf{x}^{(i)} + \eta\,\|\mathbf{x}^{(i)}\|^2$ (using $y^{(i)2} = 1$)
Multi-class classification • Solutions to multi-category problems • Converting the problem to a set of two-class problems • "one versus rest" or "one against all" • For each class $\mathcal{C}_k$, a linear discriminant function that separates samples of $\mathcal{C}_k$ from all the other samples is found ($K$ classifiers in total; see the sketch below). • one versus one • $K(K-1)/2$ linear discriminant functions are used, one to separate the samples of each pair of classes.
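A minimal one-vs-rest sketch built on the single-sample perceptron above (the helper names and data layout are assumptions for illustration):

```python
import numpy as np

def one_vs_rest_train(X, y, K, train):
    """Train K binary classifiers, class k vs. all other classes.

    train: a binary trainer such as single_sample_perceptron above.
    """
    ws = []
    for k in range(K):
        yk = np.where(y == k, 1, -1)   # relabel: class k -> +1, rest -> -1
        ws.append(train(X, yk))
    return np.stack(ws)                # (K, d) weight matrix

def one_vs_rest_predict(X, W):
    """Assign each sample to the class with the largest discriminant."""
    return np.argmax(X @ W.T, axis=1)
```

Taking the argmax over all $K$ discriminants, rather than thresholding each one independently, also sidesteps the ambiguous regions discussed below.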
Multi-class classification • One-vs-all (one-vs-rest)
Multi-class classification • One-vs-one
Multi-class classification: ambiguity • Converting the multi-class problem to a set of independent two-class problems can lead to regions of the input space in which the classification is undefined (claimed by several classifiers, or by none) • Using $K$ discriminant functions and assigning $\mathbf{x}$ to $\arg\max_k f_k(\mathbf{x})$ avoids this ambiguity
Probabilistic approach • Bayes' theorem: $p(\mathcal{C}_k \mid \mathbf{x}) = \dfrac{p(\mathbf{x} \mid \mathcal{C}_k)\, p(\mathcal{C}_k)}{p(\mathbf{x})}$
Bayes decision theory • Bayes decision: choose the class with the highest posterior probability $p(\mathcal{C}_k \mid \mathbf{x})$
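A worked toy example of the Bayes decision (the numbers are illustrative assumptions, not from the slides):

```latex
% Assume priors p(C1) = 0.6, p(C2) = 0.4 and likelihoods at a given x of
% p(x|C1) = 0.2, p(x|C2) = 0.5. Then
\begin{align*}
p(\mathcal{C}_1 \mid \mathbf{x}) &= \frac{0.2 \times 0.6}{0.2 \times 0.6 + 0.5 \times 0.4} = \frac{0.12}{0.32} = 0.375 \\
p(\mathcal{C}_2 \mid \mathbf{x}) &= \frac{0.5 \times 0.4}{0.32} = 0.625
\end{align*}
% Despite the larger prior on C1, the Bayes decision here is C2.
```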
Probabilistic classifiers • Probabilistic classification approaches can be divided in two main categories • Generative • Discriminative
Generative approach • Learning stage • Determine the class-conditional densities $p(\mathbf{x} \mid \mathcal{C}_k)$ for each class individually • Determine the prior probabilities $p(\mathcal{C}_k)$ • Use Bayes' theorem to find the posteriors $p(\mathcal{C}_k \mid \mathbf{x})$ • Decision stage • After learning the model, make the optimal class assignment for a new input $\mathbf{x}$ • if $p(\mathcal{C}_1 \mid \mathbf{x}) > p(\mathcal{C}_2 \mid \mathbf{x})$ then decide $\mathcal{C}_1$, otherwise $\mathcal{C}_2$
Discriminative approach • Learning stage • Determine the posterior class probabilities $p(\mathcal{C}_k \mid \mathbf{x})$ directly • Decision stage • After learning the model (inference stage), make the optimal class assignment for a new input $\mathbf{x}$ • if $p(\mathcal{C}_1 \mid \mathbf{x}) > p(\mathcal{C}_2 \mid \mathbf{x})$ then decide $\mathcal{C}_1$, otherwise $\mathcal{C}_2$
Naïve Bayes classifier • Conditional independence assumption: $p(\mathbf{x} \mid \mathcal{C}_k) = \prod_{j=1}^{d} p(x_j \mid \mathcal{C}_k)$ • For each class $\mathcal{C}_k$, it fits $d$ univariate distributions instead of one $d$-dimensional multivariate distribution
Naïve Bayes classifier • It first estimates the class-conditional densities $p(x_j \mid \mathcal{C}_k)$ and the prior probability $p(\mathcal{C}_k)$ for each class based on the training set. • In the decision phase, it finds the label of $\mathbf{x}$ according to: $\hat{y} = \arg\max_{k}\; p(\mathcal{C}_k) \prod_{j=1}^{d} p(x_j \mid \mathcal{C}_k)$
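A minimal Gaussian naïve Bayes sketch, assuming continuous features modeled with one univariate Gaussian per feature per class (a common instantiation of the recipe above, not necessarily the one the slides used):

```python
import numpy as np

def fit_gaussian_nb(X, y, K):
    """Per class: a prior, plus a univariate Gaussian per feature."""
    priors, means, vars_ = [], [], []
    for k in range(K):
        Xk = X[y == k]
        priors.append(len(Xk) / len(X))
        means.append(Xk.mean(axis=0))
        vars_.append(Xk.var(axis=0) + 1e-9)   # guard against zero variance
    return np.array(priors), np.array(means), np.array(vars_)

def predict_gaussian_nb(X, priors, means, vars_):
    """argmax_k [ log p(C_k) + sum_j log p(x_j | C_k) ]."""
    # Log-density of each feature under each class, summed over features:
    # shapes broadcast to (n, K, d), then reduce over d.
    log_lik = -0.5 * (np.log(2 * np.pi * vars_)[None, :, :]
                      + (X[:, None, :] - means[None, :, :]) ** 2
                      / vars_[None, :, :]).sum(axis=2)
    return np.argmax(np.log(priors)[None, :] + log_lik, axis=1)
```

Working in log space replaces the product over features with a sum, which avoids numerical underflow when many small probabilities are multiplied.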