Linear Models for Classification Chapter 4
Decision Theory • Inference step: use training data to learn a model for the posterior probabilities p(Ck|x). • Decision step: for a given x, use these posterior probabilities to make optimal class assignments.
Decision Theory • Nonprobabilistic method: discriminant functions • Probabilistic generative models • Probabilistic discriminative models
4.1 Discriminant Function • A linear discriminant takes the form y(x) = wᵀx + w0. • w determines the orientation of the decision surface. • The bias parameter w0 determines the location of the decision surface.
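A minimal sketch (not from the slides) of such a two-class linear discriminant, showing how w sets the orientation of the decision surface and w0 shifts its location; the particular weights and test points are invented for illustration.

```python
import numpy as np

w = np.array([2.0, -1.0])   # normal vector: orientation of the decision surface
w0 = 0.5                    # bias: shifts the surface away from the origin

def linear_discriminant(x, w, w0):
    """Return y(x) = w^T x + w0; assign class C1 if y >= 0, else C2."""
    return w @ x + w0

for x in [np.array([1.0, 1.0]), np.array([-1.0, 2.0])]:
    y = linear_discriminant(x, w, w0)
    print(x, "->", "C1" if y >= 0 else "C2", f"(y = {y:+.2f})")
```

The signed distance of the surface from the origin is -w0 / ||w||, which is why w0 is described as controlling the location.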
Multiple classes • One-versus-the-rest classifier: if yk(x) > 0, assign x to class Ck; otherwise x is not in class Ck.
Multiple classes • A single K-class discriminant comprising K linear functions: assign a point x to class Ck if yk(x) > yj(x) for all j ≠ k. • The decision regions of such a discriminant are singly connected and convex.
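A small sketch of this single K-class discriminant: K linear functions yk(x) = wkᵀx + wk0 and an argmax decision. The weight values are arbitrary placeholders, not values from the chapter.

```python
import numpy as np

W = np.array([[ 1.0,  0.0],    # w_1
              [ 0.0,  1.0],    # w_2
              [-1.0, -1.0]])   # w_3
w0 = np.array([0.0, 0.2, -0.1])

def classify(x, W, w0):
    y = W @ x + w0            # y_k(x) for k = 1..K
    return int(np.argmax(y))  # class with the largest discriminant value

print(classify(np.array([2.0, 0.5]), W, w0))   # -> 0, i.e. class C1
```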
Least squares for classification • The sum-of-squares error function: E_D(W) = ½ Tr{(XW − T)ᵀ(XW − T)}, where each row of X is augmented by a leading 1 for the bias and T holds the 1-of-K target vectors. • The minimum is at W = (XᵀX)⁻¹XᵀT, the pseudo-inverse solution.
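A hedged sketch of that closed-form least-squares classifier on invented toy data: one-hot targets, an augmented input matrix, and the pseudo-inverse solution.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 0.5, (20, 2)),       # toy class-0 points
               rng.normal([3, 3], 0.5, (20, 2))])      # toy class-1 points
labels = np.array([0] * 20 + [1] * 20)

T = np.eye(2)[labels]                         # 1-of-K (one-hot) target coding
X_aug = np.hstack([np.ones((len(X), 1)), X])  # prepend a bias column of ones

W = np.linalg.pinv(X_aug) @ T                 # minimises the sum-of-squares error

pred = np.argmax(X_aug @ W, axis=1)           # decision: largest model output
print("training accuracy:", np.mean(pred == labels))
```

This makes the point of the slide concrete: classification is being treated as regression onto target values, which is exactly why the method is sensitive to outliers and to points far from the boundary.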
Fisher’s linear discriminant • Adjust w to maximize the class separation. • Fisher criterion: J(w) = (wᵀS_B w) / (wᵀS_W w) • S_B: between-class covariance matrix • S_W: total within-class covariance matrix
Fisher’s linear discriminant • Maximizing J(w) gives w ∝ S_W⁻¹(m2 − m1). • If the within-class covariance is isotropic, w is proportional to the difference of the class means.
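A minimal sketch of the two-class Fisher direction w ∝ S_W⁻¹(m2 − m1) on invented toy data; when S_W is isotropic this reduces to the difference of the class means, as noted above.

```python
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.normal([0, 0], [1.0, 0.3], (50, 2))   # class-1 samples (toy)
X2 = rng.normal([2, 1], [1.0, 0.3], (50, 2))   # class-2 samples (toy)

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)

# Total within-class covariance (unnormalised scatter matrices)
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)

w = np.linalg.solve(S_W, m2 - m1)   # Fisher direction (defined up to scale)
w /= np.linalg.norm(w)

print("Fisher direction:", w)
print("projected mean separation:", (m2 - m1) @ w)
```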
Relation to least squares • Least squares: make the model predictions as close as possible to a set of target values. • Fisher criterion: require maximum class separation in the output space. • For the two-class problem, the Fisher criterion can be obtained as a special case of least squares with a particular choice of target values.
Fisher’s discriminant for multiple classes • We first note from (4.46) that SB is composed of the sum of K matrices, each of which is an outer product of two vectors and therefore of rank 1. • In addition, only (K −1) of these matrices are independent as a result of the constraint (4.44). Thus, SB has rank at most equal to (K −1) and so there are at most (K −1) nonzero eigenvalues. • This shows that the projection onto the (K − 1)-dimensional subspace spanned by the eigenvectors of SB does not alter the value of J(w), and so we are therefore unable to find more than (K − 1) linear ‘features’ by this means.
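A sketch of the multi-class construction described above: form S_W and S_B and project onto the leading eigenvectors of S_W⁻¹S_B. Because rank(S_B) ≤ K − 1, at most K − 1 eigenvalues are nonzero, so at most K − 1 linear 'features' are obtained. The data dimensions and values are toy choices.

```python
import numpy as np

rng = np.random.default_rng(2)
K, D, N = 3, 4, 60
centres = rng.normal(0, 3, (K, D))
X = np.vstack([rng.normal(centres[k], 1.0, (N, D)) for k in range(K)])
y = np.repeat(np.arange(K), N)

m = X.mean(axis=0)                                    # overall mean
S_W = np.zeros((D, D))
S_B = np.zeros((D, D))
for k in range(K):
    Xk = X[y == k]
    mk = Xk.mean(axis=0)
    S_W += (Xk - mk).T @ (Xk - mk)                    # within-class scatter
    S_B += len(Xk) * np.outer(mk - m, mk - m)         # between-class scatter (rank-1 terms)

# Eigen-decomposition of S_W^{-1} S_B; keep the K-1 leading eigenvectors.
evals, evecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
order = np.argsort(evals.real)[::-1]
W_fisher = evecs.real[:, order[:K - 1]]               # projection matrix, shape (D, K-1)

print("eigenvalues:", np.round(evals.real[order], 3))  # only K-1 are (numerically) nonzero
```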
The perceptron algorithm • The perceptron function: y(x) = f(wᵀφ(x)) • Nonlinear activation function: f(a) = +1 if a ≥ 0, and −1 otherwise • The perceptron criterion (error function): E_P(w) = −Σ_{n∈M} wᵀφ(x_n) t_n, summed over the set M of misclassified points
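A sketch of the learning rule that follows from this criterion: stochastic gradient steps w ← w + η φ(x_n) t_n applied to misclassified points only, with targets t ∈ {−1, +1}. The toy data are linearly separable by construction.

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.vstack([rng.normal([0, 0], 0.4, (30, 2)),
               rng.normal([3, 3], 0.4, (30, 2))])
t = np.array([-1] * 30 + [+1] * 30)

Phi = np.hstack([np.ones((len(X), 1)), X])   # phi(x) = (1, x1, x2): fixed feature map with bias
w = np.zeros(Phi.shape[1])
eta = 1.0                                    # learning rate (any eta > 0 works for separable data)

for _ in range(100):                         # passes over the data
    any_errors = False
    for phi_n, t_n in zip(Phi, t):
        if np.sign(phi_n @ w) != t_n:        # point is misclassified
            w += eta * phi_n * t_n           # gradient step on the perceptron criterion
            any_errors = True
    if not any_errors:                       # converged: every point on the correct side
        break

print("weights:", w, "remaining errors:", np.sum(np.sign(Phi @ w) != t))
```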
4.2 Probabilistic Generative Models • Model the class-conditional densities p(x|Ck) and the class priors p(Ck). • Use Bayes’ theorem to compute the posterior probabilities p(Ck|x).
4.2 Probabilistic Generative Models • Bayes’ theorem: p(Ck|x) = p(x|Ck)p(Ck) / Σ_j p(x|Cj)p(Cj) = exp(ak) / Σ_j exp(aj), with ak = ln p(x|Ck)p(Ck) (the normalized exponential, or softmax). • If ak ≫ aj for all j ≠ k, then p(Ck|x) ≈ 1, and p(Cj|x) ≈ 0.
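A small sketch of this generative route to posteriors: given activations ak = ln p(x|Ck)p(Ck), apply the softmax. When one ak dominates, the posterior saturates towards a hard 0/1 assignment. The log-density values below are placeholder numbers.

```python
import numpy as np

def softmax(a):
    a = a - np.max(a)                 # subtract the max for numerical stability
    e = np.exp(a)
    return e / e.sum()

a = np.array([-3.1, -7.4, -6.9])      # a_k = ln p(x|C_k) + ln p(C_k), toy values
print(np.round(softmax(a), 4))        # posterior dominated by class 1

a_extreme = np.array([10.0, -10.0, -12.0])
print(np.round(softmax(a_extreme), 6))  # ~[1, 0, 0]: the a_k >> a_j regime
```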
Continuous inputs • Model the class-conditional densities as Gaussians, p(x|Ck) = N(x|μk, Σ); here, all classes share the same covariance matrix Σ. • Two-class problem: the posterior p(C1|x) = σ(wᵀx + w0) is a logistic sigmoid of a linear function of x.
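A sketch of the two-class shared-covariance case: w = Σ⁻¹(μ1 − μ2) and w0 = −½μ1ᵀΣ⁻¹μ1 + ½μ2ᵀΣ⁻¹μ2 + ln(p(C1)/p(C2)), with the posterior given by a logistic sigmoid. The means, covariance, and priors are invented for illustration.

```python
import numpy as np

mu1, mu2 = np.array([1.0, 1.0]), np.array([-1.0, 0.0])
Sigma = np.array([[1.0, 0.3], [0.3, 0.8]])     # shared covariance matrix
prior1, prior2 = 0.6, 0.4

Sigma_inv = np.linalg.inv(Sigma)
w = Sigma_inv @ (mu1 - mu2)                    # linear weight vector
w0 = (-0.5 * mu1 @ Sigma_inv @ mu1
      + 0.5 * mu2 @ Sigma_inv @ mu2
      + np.log(prior1 / prior2))               # bias term

def posterior_c1(x):
    return 1.0 / (1.0 + np.exp(-(w @ x + w0)))  # logistic sigmoid of a linear function

print(posterior_c1(np.array([0.0, 0.5])))       # a point between the two means
```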
K Classes • With a shared covariance matrix the decision boundaries are linear in x; if each class has its own covariance matrix, they become quadratic.
Maximum likelihood solution • The likelihood function (two classes, shared covariance): p(t, X | π, μ1, μ2, Σ) = Π_n [π N(x_n|μ1, Σ)]^{t_n} [(1 − π) N(x_n|μ2, Σ)]^{1 − t_n}. • Note that the approach of fitting Gaussian distributions to the classes is not robust to outliers, because the maximum likelihood estimation of a Gaussian is not robust.
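A sketch of the maximum likelihood estimates for this shared-covariance model on toy data: the prior is the fraction of points in each class, the means are the class sample means, and the shared covariance is the weighted average of the per-class covariances.

```python
import numpy as np

rng = np.random.default_rng(4)
X1 = rng.multivariate_normal([0, 0], [[1, 0.2], [0.2, 1]], 100)   # class-1 samples (toy)
X2 = rng.multivariate_normal([2, 1], [[1, 0.2], [0.2, 1]], 60)    # class-2 samples (toy)

N1, N2 = len(X1), len(X2)
pi = N1 / (N1 + N2)                      # ML estimate of the prior p(C1)
mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)

S1 = (X1 - mu1).T @ (X1 - mu1) / N1      # per-class covariance estimates
S2 = (X2 - mu2).T @ (X2 - mu2) / N2
Sigma = (N1 * S1 + N2 * S2) / (N1 + N2)  # shared covariance: weighted average

print("pi:", round(pi, 3))
print("mu1:", mu1, "mu2:", mu2)
print("Sigma:\n", Sigma)
```

Because every point contributes to the means and covariance, a single gross outlier can shift these estimates substantially, which is the lack of robustness noted above.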
Discrete features • When the inputs are binary, xi ∈ {0, 1}, and a naive Bayes assumption is made, the class-conditional distributions factorize: p(x|Ck) = Π_i μ_ki^{x_i} (1 − μ_ki)^{1 − x_i}. • The resulting activations ak(x) are then linear functions of the input values xi.
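A sketch of that binary-feature case: ak(x) = Σ_i [x_i ln μ_ki + (1 − x_i) ln(1 − μ_ki)] + ln p(Ck), which is linear in the x_i because the log terms enter as fixed weights. The parameter values are made up for illustration.

```python
import numpy as np

mu = np.array([[0.9, 0.2, 0.7],       # mu_1i = P(x_i = 1 | C_1), toy values
               [0.1, 0.8, 0.4]])      # mu_2i = P(x_i = 1 | C_2), toy values
log_prior = np.log([0.5, 0.5])

def activations(x):
    # a_k(x): linear in x, with weights ln(mu_ki) and ln(1 - mu_ki)
    return x @ np.log(mu).T + (1 - x) @ np.log(1 - mu).T + log_prior

x = np.array([1, 0, 1])
a = activations(x)
posterior = np.exp(a - np.max(a))
posterior /= posterior.sum()          # softmax over the two activations
print("a_k:", np.round(a, 3), "posterior:", np.round(posterior, 3))
```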
Exponential family • More generally, if the class-conditional densities are members of the exponential family, the activations ak(x) are again linear functions of x, and the two-class posterior is a logistic sigmoid of a linear function.
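A short derivation sketch in standard exponential-family notation (with a scale parameter s shared by all classes), showing why the activations come out linear in x:

```latex
\[
  p(\mathbf{x}\mid\boldsymbol{\lambda}_k, s)
    = \frac{1}{s}\, h\!\Big(\frac{\mathbf{x}}{s}\Big)\, g(\boldsymbol{\lambda}_k)\,
      \exp\!\Big(\frac{1}{s}\,\boldsymbol{\lambda}_k^{\mathsf T}\mathbf{x}\Big)
\]
% Substituting into a_k(x) = ln p(x|C_k) + ln p(C_k), the h(x/s) factor is common
% to all classes and cancels in the normalised exponential, leaving
\[
  a_k(\mathbf{x})
    = \frac{1}{s}\,\boldsymbol{\lambda}_k^{\mathsf T}\mathbf{x}
      + \ln g(\boldsymbol{\lambda}_k) + \ln p(C_k),
\]
% which is a linear function of x.
```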