Prototype Classification Methods Fu Chang Institute of Information Science Academia Sinica 2788-3799 ext. 1819 fchang@iis.sinica.edu.tw
Types of Prototype Methods • Crisp model (K-means, KM) • Prototypes are centers of non-overlapping clusters • Fuzzy model (Fuzzy c-means, FCM) • Prototypes are weighted averages of all samples • Gaussian mixture model (GM) • Prototypes are the components of a mixture of distributions • Linear Discriminant Analysis (LDA) • Prototypes are projected sample means • K-nearest neighbor classifier (K-NN) • Learning vector quantization (LVQ)
Prototypes thru Clustering • Given the number k of prototypes, find k clusters whose centers are the prototypes • Commonality: • Use an iterative algorithm aimed at decreasing an objective function • May converge to local minima • The number k, as well as an initial solution, must be specified
Clustering Objectives • The aim of the iterative algorithm is to decrease the value of an objective function • Notations: • Samples: x_1, x_2, …, x_n • Prototypes: p_1, p_2, …, p_k • L2-distance: d(x, p) = ||x − p||, the Euclidean distance between a sample and a prototype
Objectives (cnt’d) • Crisp objective: J_KM = Σ_{i=1..k} Σ_{x_j ∈ cluster i} ||x_j − p_i||² • Fuzzy objective: J_FCM = Σ_{i=1..c} Σ_{j=1..n} (u_ij)^m ||x_j − p_i||², with fuzzifier m > 1 • Gaussian mixture objective: the log-likelihood Σ_{j=1..n} log Σ_{i=1..k} α_i p_i(x_j | θ_i), which is to be maximized (equivalently, its negative is decreased)
The Algorithm • Initialize k seeds of prototypes p1, p2, …, pk • Grouping: • Assign samples to their nearest prototypes • Form non-overlapping clusters out of these samples • Centering: • Centers of clusters become the new prototypes • Repeat the grouping and centering steps until convergence, as in the sketch below
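A minimal Python/NumPy sketch of this grouping/centering loop; the function name k_means, the random seeding from k samples, and the convergence test are illustrative assumptions rather than details from the slides.

```python
import numpy as np

def k_means(X, k, n_iter=100, seed=0):
    """Alternate grouping and centering until the prototypes stop moving.

    X : (n, d) array of samples; k : number of prototypes.
    """
    rng = np.random.default_rng(seed)
    # Initialize k seeds by picking k distinct samples as prototypes.
    prototypes = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Grouping: assign every sample to its nearest prototype.
        dists = np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Centering: the center of each cluster becomes the new prototype.
        new_prototypes = np.array([
            X[labels == i].mean(axis=0) if np.any(labels == i) else prototypes[i]
            for i in range(k)
        ])
        if np.allclose(new_prototypes, prototypes):
            break  # converged
        prototypes = new_prototypes
    return prototypes, labels
```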
Justification • Grouping: • Assigning samples to their nearest prototypes helps to decrease the objective • Centering: • Also helps to decrease the objective, because for any group of vectors y_1, …, y_r with mean ȳ = (1/r) Σ_i y_i and any vector z, Σ_{i=1..r} ||y_i − ȳ||² ≤ Σ_{i=1..r} ||y_i − z||², and equality holds only if z = ȳ
Exercise: • Prove that for any group of vectors y_i with mean ȳ and for every vector z, the inequality Σ_i ||y_i − ȳ||² ≤ Σ_i ||y_i − z||² always holds • Prove that the equality holds only when z = ȳ • Use this fact to prove that the centering step helps to decrease the objective function
Crisp vs. Fuzzy Membership • Membership matrix: U of size c × n • u_ij is the grade of membership of sample j with respect to prototype i • Crisp membership: u_ij ∈ {0, 1} and Σ_{i=1..c} u_ij = 1, i.e., each sample belongs to exactly one cluster • Fuzzy membership: u_ij ∈ [0, 1] and Σ_{i=1..c} u_ij = 1
Fuzzy c-means (FCM) • The objective function of FCM is J = Σ_{i=1..c} Σ_{j=1..n} (u_ij)^m ||x_j − p_i||², where m > 1 is the fuzzifier, subject to the constraints Σ_{i=1..c} u_ij = 1 for j = 1, 2, …, n
FCM (Cnt’d) • Introducing a Lagrange multiplier λ_j with respect to each constraint Σ_{i=1..c} u_ij = 1, we rewrite the objective function as J = Σ_{i=1..c} Σ_{j=1..n} (u_ij)^m ||x_j − p_i||² + Σ_{j=1..n} λ_j ( Σ_{i=1..c} u_ij − 1 )
FCM (Cnt’d) • Setting the partial derivatives to zero, we obtain ∂J/∂u_ij = m (u_ij)^(m−1) ||x_j − p_i||² + λ_j = 0 (1st equation) and ∂J/∂λ_j = Σ_{i=1..c} u_ij − 1 = 0 (2nd equation)
FCM (Cnt’d) • From the 1st equation, we obtain u_ij = ( −λ_j / (m ||x_j − p_i||²) )^(1/(m−1)) • Substituting this into the 2nd equation, we obtain Σ_{l=1..c} ( −λ_j / (m ||x_j − p_l||²) )^(1/(m−1)) = 1
FCM (Cnt’d) • Therefore, ( −λ_j / m )^(1/(m−1)) = 1 / Σ_{l=1..c} ( 1 / ||x_j − p_l||² )^(1/(m−1))
FCM (Cnt’d) • Together with the expression for u_ij above, we obtain the updating rule for u_ij: u_ij = 1 / Σ_{l=1..c} ( ||x_j − p_i||² / ||x_j − p_l||² )^(1/(m−1))
FCM (Cnt’d) • On the other hand, setting the derivative of J with respect to p_i to zero, we obtain ∂J/∂p_i = −2 Σ_{j=1..n} (u_ij)^m (x_j − p_i) = 0
FCM (Cnt’d) • It follows that Σ_{j=1..n} (u_ij)^m p_i = Σ_{j=1..n} (u_ij)^m x_j • Finally, we obtain the update rule for p_i: p_i = Σ_{j=1..n} (u_ij)^m x_j / Σ_{j=1..n} (u_ij)^m
FCM (Cnt’d) • To summarize, each iteration alternates the two updates: u_ij = 1 / Σ_{l=1..c} ( ||x_j − p_i||² / ||x_j − p_l||² )^(1/(m−1)) and p_i = Σ_{j=1..n} (u_ij)^m x_j / Σ_{j=1..n} (u_ij)^m
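A minimal Python/NumPy sketch of alternating these two updates; the function name fuzzy_c_means, the random initialization of U, the default fuzzifier m = 2, and the small eps guarding against zero distances are illustrative assumptions.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, eps=1e-9, seed=0):
    """Alternate the membership update u_ij and the prototype update p_i."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # Random fuzzy memberships, normalized so each column sums to 1.
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        # Prototype update: weighted average of all samples with weights u_ij^m.
        W = U ** m
        P = (W @ X) / W.sum(axis=1, keepdims=True)
        # Membership update: u_ij = 1 / sum_l (d_ij / d_lj)^(2/(m-1)).
        D = np.linalg.norm(X[None, :, :] - P[:, None, :], axis=2) + eps
        U = 1.0 / ((D[:, None, :] / D[None, :, :]) ** (2.0 / (m - 1.0))).sum(axis=1)
    return P, U
```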
K-means vs. Fuzzy c-means • [Figures: a set of sample points, and the clusters obtained on them by K-means and by fuzzy c-means]
What Is Given • Observed data: X = {x1, x2, …, xn}, each drawn independently from a mixture of probability distributions with the density p(x | Θ) = Σ_{i=1..k} α_i p_i(x | θ_i), where Σ_{i=1..k} α_i = 1 and Θ = (α_1, …, α_k, θ_1, …, θ_k)
Incomplete vs. Complete Data • The incomplete-data log-likelihood is given by log L(Θ | X) = Σ_{j=1..n} log Σ_{i=1..k} α_i p_i(x_j | θ_i), which is difficult to optimize • The complete-data log-likelihood log L(Θ | X, H) can be handled much more easily, where H is the set of hidden random variables indicating which component generated each sample • How do we compute the distribution of H?
EM Algorithm • E-Step: first find the expected value Q(Θ, Θ^g) = E[ log L(Θ | X, H) | X, Θ^g ], where Θ^g is the current estimate of Θ • M-Step: update the estimate to Θ^g ← argmax_Θ Q(Θ, Θ^g) • Repeat the process until convergence
Justification • The expected value is a lower bound of the log-likelihood: for any distribution q(h) over the hidden variables, Jensen’s inequality gives log p(X | Θ) = log Σ_h q(h) p(X, h | Θ) / q(h) ≥ Σ_h q(h) log ( p(X, h | Θ) / q(h) ) (1)
Justification (Cnt’d) • The maximum of the lower bound equals the log-likelihood • The right-hand side of (1) can be written as −D( q(h) || p(h | X, Θ) ) + log p(X | Θ); the first term is the negative relative entropy of q(h) with respect to p(h | X, Θ), and the second term does not depend on h • We obtain the maximum of (1) by making the relative entropy zero, i.e., by choosing q(h) = p(h | X, Θ) • With this choice the first term vanishes and (1) achieves its upper bound, which is log p(X | Θ)
Details of EM Algorithm • Let Θ^g = (α_1^g, …, α_k^g, θ_1^g, …, θ_k^g) be the guessed values of Θ • For the given Θ^g, we can compute the posterior probability that sample x_j came from the i-th component: p(i | x_j, Θ^g) = α_i^g p_i(x_j | θ_i^g) / Σ_{l=1..k} α_l^g p_l(x_j | θ_l^g)
Details (Cnt’d) • We then consider the expected value: Q(Θ, Θ^g) = Σ_{j=1..n} Σ_{i=1..k} p(i | x_j, Θ^g) ( log α_i + log p_i(x_j | θ_i) )
Details (Cnt’d) • Lagrangian and partial derivative equation: maximizing Q over the α_i subject to Σ_{i=1..k} α_i = 1, we form L = Q + λ ( Σ_{i=1..k} α_i − 1 ) and set ∂L/∂α_i = Σ_{j=1..n} p(i | x_j, Θ^g) / α_i + λ = 0 (2)
Details (Cnt’d) • From (2), we derive that λ = −n and α_i = (1/n) Σ_{j=1..n} p(i | x_j, Θ^g) • Based on these values, we can derive the optimal θ_i by maximizing Q, of which only the following part involves θ_i: Σ_{j=1..n} Σ_{i=1..k} p(i | x_j, Θ^g) log p_i(x_j | θ_i)
Exercise: • Deduce from (2) that λ = −n and α_i = (1/n) Σ_{j=1..n} p(i | x_j, Θ^g)
Gaussian Mixtures • The Gaussian distribution is given by p_i(x | μ_i, Σ_i) = (2π)^(−d/2) |Σ_i|^(−1/2) exp( −(1/2) (x − μ_i)^T Σ_i^(−1) (x − μ_i) ) • For Gaussian mixtures, the component parameters are θ_i = (μ_i, Σ_i)
Gaussian Mixtures (Cnt’d) • Partial derivative with respect to μ_i: ∂Q/∂μ_i = Σ_{j=1..n} p(i | x_j, Θ^g) Σ_i^(−1) (x_j − μ_i) • Setting this to zero, we obtain μ_i = Σ_{j=1..n} p(i | x_j, Θ^g) x_j / Σ_{j=1..n} p(i | x_j, Θ^g)
Gaussian Mixtures (Cnt’d) • Taking the derivative of Q with respect to Σ_i and setting it to zero, we get (many details are omitted) Σ_i = Σ_{j=1..n} p(i | x_j, Θ^g) (x_j − μ_i)(x_j − μ_i)^T / Σ_{j=1..n} p(i | x_j, Θ^g)
Gaussian Mixtures (Cnt’d) • To summarize, each EM iteration updates α_i ← (1/n) Σ_j p(i | x_j, Θ^g), μ_i ← Σ_j p(i | x_j, Θ^g) x_j / Σ_j p(i | x_j, Θ^g), and Σ_i ← Σ_j p(i | x_j, Θ^g) (x_j − μ_i)(x_j − μ_i)^T / Σ_j p(i | x_j, Θ^g)
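A minimal Python sketch of these EM updates for a Gaussian mixture, using NumPy and scipy.stats.multivariate_normal; the initialization scheme and the small ridge added to the covariances for numerical stability are illustrative assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gaussian_mixture(X, k, n_iter=100, ridge=1e-6, seed=0):
    n, d = X.shape
    rng = np.random.default_rng(seed)
    # Initial guess: uniform mixing weights, random means, identity covariances.
    alpha = np.full(k, 1.0 / k)
    mu = X[rng.choice(n, size=k, replace=False)]
    sigma = np.array([np.eye(d) for _ in range(k)])
    for _ in range(n_iter):
        # E-step: posterior p(i | x_j, Theta^g) for every sample and component.
        post = np.array([alpha[i] * multivariate_normal.pdf(X, mu[i], sigma[i])
                         for i in range(k)])
        post /= post.sum(axis=0, keepdims=True)
        # M-step: the update rules summarized above.
        Nk = post.sum(axis=1)                      # effective counts per component
        alpha = Nk / n
        mu = (post @ X) / Nk[:, None]
        for i in range(k):
            diff = X - mu[i]
            sigma[i] = (post[i, :, None] * diff).T @ diff / Nk[i] + ridge * np.eye(d)
    return alpha, mu, sigma
```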
Definitions • Given: • Samples x1, x2, …, xn • Classes: ni of them are of class i, i = 1, 2, …, c • Definition: • Sample mean for class i: m_i = (1/n_i) Σ_{x of class i} x • Scatter matrix for class i: S_i = Σ_{x of class i} (x − m_i)(x − m_i)^T
Scatter Matrices • Total scatter matrix: S_T = Σ_{j=1..n} (x_j − m)(x_j − m)^T, where m is the mean of all samples • Within-class scatter matrix: S_W = Σ_{i=1..c} S_i • Between-class scatter matrix: S_B = Σ_{i=1..c} n_i (m_i − m)(m_i − m)^T, so that S_T = S_W + S_B
Multiple Discriminant Analysis • We seek vectors wi, i = 1, 2, …, c−1 • And project the samples x to the (c−1)-dimensional space via y = W^T x • The criterion for W = (w1, w2, …, w_{c−1}) is to maximize J(W) = |W^T S_B W| / |W^T S_W W|
Multiple Discriminant Analysis (Cnt’d) • Consider the Lagrangian L = w^T S_B w − λ ( w^T S_W w − 1 ) • Take the partial derivative ∂L/∂w = 2 S_B w − 2 λ S_W w • Setting the derivative to zero, we obtain the generalized eigenvalue problem S_B w = λ S_W w
Multiple Discriminant Analysis (Cnt’d) • Find the roots of the characteristic function det(S_B − λ S_W) = 0 as eigenvalues, and then solve (S_B − λ_i S_W) w_i = 0 for the w_i corresponding to the largest c−1 eigenvalues
LDA Prototypes • The prototype of each class is the mean of the projected samples of that class, where the projection is through the matrix W • In the testing phase: • All test samples are projected through the same optimal W • The nearest prototype is the winner
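A minimal Python/NumPy sketch of building W from the scatter matrices, forming the projected class-mean prototypes, and classifying by nearest prototype; the function names, the use of scipy.linalg.eigh for the generalized eigenproblem, and the small ridge added to S_W are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def lda_prototypes(X, y, n_classes):
    n, d = X.shape
    m = X.mean(axis=0)
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for i in range(n_classes):
        Xi = X[y == i]
        mi = Xi.mean(axis=0)
        S_W += (Xi - mi).T @ (Xi - mi)               # within-class scatter
        S_B += len(Xi) * np.outer(mi - m, mi - m)    # between-class scatter
    # Generalized eigenproblem S_B w = lambda S_W w; keep the c-1 largest eigenvectors.
    eigvals, eigvecs = eigh(S_B, S_W + 1e-6 * np.eye(d))
    W = eigvecs[:, np.argsort(eigvals)[::-1][: n_classes - 1]]
    # Prototype of each class: mean of its projected samples.
    prototypes = np.array([(X[y == i] @ W).mean(axis=0) for i in range(n_classes)])
    return W, prototypes

def lda_classify(X_test, W, prototypes):
    Y = X_test @ W  # project test samples through the same optimal W
    dists = np.linalg.norm(Y[:, None, :] - prototypes[None, :, :], axis=2)
    return dists.argmin(axis=1)  # nearest prototype wins
```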
K-NN Classifier • For each test sample x, find the nearest K training samples and classify x according to the vote among the K neighbors • Asymptotically (as n → ∞), the error rate E of the nearest-neighbor rule (K = 1) satisfies E ≤ E* ( 2 − (c / (c−1)) E* ), where E* is the Bayes error and c is the number of classes • This shows that the error rate is at most twice the Bayes error
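A minimal Python/NumPy sketch of the K-NN vote; the function name and the tie-breaking choice are illustrative assumptions, and integer class labels 0..c−1 are assumed.

```python
import numpy as np

def knn_classify(X_train, y_train, X_test, K=5):
    """Classify each test sample by a majority vote among its K nearest neighbors."""
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)
        neighbors = y_train[np.argsort(dists)[:K]]
        # Majority vote; ties are broken in favor of the smallest class label.
        preds.append(np.bincount(neighbors).argmax())
    return np.array(preds)
```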
LVQ Algorithm • Initialize R prototypes for each class: m1(k), m2(k), …, mR(k), where k = 1, 2, …, K • Draw a training sample x and find the nearest prototype mj(k) to x • If x and mj(k) match in class type, move the prototype toward x: mj(k) ← mj(k) + ε ( x − mj(k) ) • Otherwise, move it away from x: mj(k) ← mj(k) − ε ( x − mj(k) ) • Repeat step 2, decreasing ε at each iteration
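A minimal Python/NumPy sketch of the LVQ updates; the per-class initialization from training samples, the epoch structure, and the linear decay of ε are illustrative assumptions.

```python
import numpy as np

def lvq_train(X, y, n_classes, R=3, epochs=20, eps0=0.1, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: initialize R prototypes per class from that class's own samples.
    protos, proto_labels = [], []
    for k in range(n_classes):
        Xk = X[y == k]
        protos.extend(Xk[rng.choice(len(Xk), size=R, replace=False)])
        proto_labels.extend([k] * R)
    protos = np.array(protos, dtype=float)
    proto_labels = np.array(proto_labels)
    eps = eps0
    for epoch in range(epochs):
        for idx in rng.permutation(len(X)):
            x, label = X[idx], y[idx]
            # Step 2: find the nearest prototype to x.
            j = np.linalg.norm(protos - x, axis=1).argmin()
            if proto_labels[j] == label:
                protos[j] += eps * (x - protos[j])   # same class: move toward x
            else:
                protos[j] -= eps * (x - protos[j])   # different class: move away from x
        eps = eps0 * (1 - (epoch + 1) / epochs)       # decrease the learning rate
    return protos, proto_labels
```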