
Prototype Classification Methods


Presentation Transcript


  1. Prototype Classification Methods • Fu Chang, Institute of Information Science, Academia Sinica • 2788-3799 ext. 1819 • fchang@iis.sinica.edu.tw

  2. Types of Prototype Methods • Crisp model (K-means, KM) • Prototypes are centers of non-overlapping clusters • Fuzzy model (Fuzzy c-means, FCM) • Prototypes are weighted averages of all samples • Gaussian Mixture model (GM) • Prototypes are components of a mixture of distributions • Linear Discriminant Analysis (LDA) • Prototypes are projected sample means • K-nearest neighbor classifier (K-NN) • Learning vector quantization (LVQ)

  3. Prototypes thru Clustering • Given the number k of prototypes, find k clusters whose centers are prototypes • Commonality: • Use an iterative algorithm aimed at decreasing an objective function • May converge to local minima • The number k as well as an initial solution must be specified

  4. Clustering Objectives • The aim of the iterative algorithm is to decrease the value of an objective function • Notations: • Samples: $x_1, x_2, \dots, x_N$ • Prototypes: $p_1, p_2, \dots, p_P$ • L2-distance: $d(x_j, p_i) = \|x_j - p_i\|$

  5. Objectives (cnt’d) • Crisp objective: $J_{KM} = \sum_{i=1}^{P} \sum_{x_j \in C_i} \|x_j - p_i\|^2$, where $C_i$ is the cluster of samples nearest to $p_i$ • Fuzzy objective: $J_{FCM} = \sum_{i=1}^{P} \sum_{j=1}^{N} u_{ij}^m \|x_j - p_i\|^2$, with fuzzifier $m > 1$ • Gaussian mixture objective: maximize the log-likelihood $\sum_{j=1}^{N} \log \sum_{i=1}^{P} \alpha_i\, p_i(x_j \mid \theta_i)$

  6. K-Means (KM) Clustering

  7. The KM Algorithm • Initialize P seeds of prototypes p1, p2, …, pP • Grouping: • Assign samples to their nearest prototypes • Form non-overlapping clusters out of these samples • Centering: • Centers of clusters become new prototypes • Repeat the grouping and centering steps until convergence is reached
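The grouping/centering loop above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the author's code: the function name kmeans, the random seeding, and the np.allclose convergence test are assumptions.

```python
import numpy as np

def kmeans(X, P, n_iter=100, seed=0):
    """Crisp K-means: X is (N, d), P is the number of prototypes."""
    rng = np.random.default_rng(seed)
    prototypes = X[rng.choice(len(X), size=P, replace=False)]  # initial seeds (assumed: random samples)
    for _ in range(n_iter):
        # Grouping: assign each sample to its nearest prototype
        dist = np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Centering: cluster centers become the new prototypes
        new_prototypes = np.array([
            X[labels == i].mean(axis=0) if np.any(labels == i) else prototypes[i]
            for i in range(P)
        ])
        if np.allclose(new_prototypes, prototypes):
            break  # convergence reached
        prototypes = new_prototypes
    return prototypes, labels
```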

  8. Justification • Grouping: • Assigning samples to their nearest prototypes helps to decrease the objective • Centering: • Also helps to decrease the above objective, because for any vectors $y_1, y_2, \dots, y_n$ with mean $\bar{y} = \frac{1}{n}\sum_{j=1}^{n} y_j$, $\sum_{j=1}^{n} \|y_j - \bar{y}\|^2 \le \sum_{j=1}^{n} \|y_j - z\|^2$ for every $z$, and equality holds only if $z = \bar{y}$

  9. Exercise I: • Prove that for any group of vectors yi with mean $\bar{y}$, the following inequality is always true for every vector $z$: $\sum_{i} \|y_i - \bar{y}\|^2 \le \sum_{i} \|y_i - z\|^2$ • Prove that the equality holds only when $z = \bar{y}$ • Use this fact to prove that the centering step helps to decrease the objective function

  10. Fuzzy c-Means (FCM) Clustering

  11. Crisp vs. Fuzzy Membership • Membership matrix: $U_{P \times N}$ • $u_{ij}$ is the grade of membership of sample j with respect to prototype i • Crisp membership: $u_{ij} \in \{0, 1\}$ and $\sum_{i=1}^{P} u_{ij} = 1$ • Fuzzy membership: $u_{ij} \in [0, 1]$ and $\sum_{i=1}^{P} u_{ij} = 1$

  12. Fuzzy c-means (FCM) • The objective function of FCM is $J = \sum_{i=1}^{P} \sum_{j=1}^{N} u_{ij}^m \|x_j - p_i\|^2$, $m > 1$ • Subject to the constraints $\sum_{i=1}^{P} u_{ij} = 1$ for $j = 1, 2, \dots, N$, and $u_{ij} \ge 0$

  13. Supplementary Subject • Lagrange Multipliers • Goal: maximize or minimize f(x), x = (x1, x2, …, xd) • Constraints: gi(x) = 0, i = 1, 2, …, n • Solution method: • The solution of the above problem satisfies the equations $\nabla f(x) + \sum_{i=1}^{n} \lambda_i \nabla g_i(x) = 0$ and $g_i(x) = 0$, where $\nabla = (\partial/\partial x_1, \dots, \partial/\partial x_d)$ and $\lambda_1, \dots, \lambda_n$ are the Lagrange multipliers

  14. Supplementary Subject (cnt’d) • Geometric interpretation: • Assume there is only one constraint g(x) = 0 • $\nabla f$ is perpendicular to the contour {x : f(x) = c} • $\nabla g$ is perpendicular to the contour g(x) = 0 • $\nabla f + \lambda \nabla g = 0$ means that at the extremum point the two perpendicular directions are aligned with each other or, equivalently, the two contours are tangent to each other

  15. Tangent to the contour of g • Maximize f(x, y) under the constraint that g(x, y) = 1, where $f(x, y) = 4x$ and $g(x, y) = x^2 + y^2$
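As a quick check, this example can be worked out with the conditions from slide 13 (this short solution is added for clarity and is not part of the original slide): $\nabla f = (4, 0)$ and $\nabla g = (2x, 2y)$, so $\nabla f + \lambda \nabla g = 0$ and $g = 1$ give $4 + 2\lambda x = 0$, $2\lambda y = 0$, and $x^2 + y^2 = 1$. Since $\lambda \neq 0$, $y = 0$ and $x = \pm 1$; the maximum of $f$ is at $(x, y) = (1, 0)$ with $f = 4$ and $\lambda = -2$, where the level line $4x = 4$ is tangent to the unit circle.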

  16. FCM (Cnt’d) • Introducing the Lagrange multiplier $\lambda_j$ with respect to the constraint $\sum_{i=1}^{P} u_{ij} = 1$, we rewrite the objective function as: $J = \sum_{i=1}^{P} \sum_{j=1}^{N} u_{ij}^m \|x_j - p_i\|^2 + \sum_{j=1}^{N} \lambda_j \left(1 - \sum_{i=1}^{P} u_{ij}\right)$

  17. FCM (Cnt’d) • Setting the partial derivatives with respect to $u_{ij}$ to zero, we obtain $m\, u_{ij}^{m-1} \|x_j - p_i\|^2 - \lambda_j = 0$ • It follows that $u_{ij} = \left(\frac{\lambda_j}{m \|x_j - p_i\|^2}\right)^{1/(m-1)}$ • and $\sum_{i=1}^{P} u_{ij} = 1$

  18. FCM (Cnt’d) • Therefore, $\left(\frac{\lambda_j}{m}\right)^{1/(m-1)} = \frac{1}{\sum_{k=1}^{P} \left(1 / \|x_j - p_k\|^2\right)^{1/(m-1)}}$ • and $u_{ij} = \frac{1}{\sum_{k=1}^{P} \left(\|x_j - p_i\|^2 / \|x_j - p_k\|^2\right)^{1/(m-1)}}$

  19. FCM (Cnt’d) • On the other hand, setting the derivative of J with respect to pi to zero, we obtain $\partial J / \partial p_i = -2 \sum_{j=1}^{N} u_{ij}^m (x_j - p_i) = 0$

  20. FCM (Cnt’d) • It follows that $\sum_{j=1}^{N} u_{ij}^m x_j = p_i \sum_{j=1}^{N} u_{ij}^m$ • Finally, we can obtain the update rule of pi: $p_i = \frac{\sum_{j=1}^{N} u_{ij}^m x_j}{\sum_{j=1}^{N} u_{ij}^m}$

  21. FCM (Cnt’d) • To summarize: $u_{ij} = \frac{1}{\sum_{k=1}^{P} \left(\|x_j - p_i\| / \|x_j - p_k\|\right)^{2/(m-1)}}$ and $p_i = \frac{\sum_{j=1}^{N} u_{ij}^m x_j}{\sum_{j=1}^{N} u_{ij}^m}$

  22. Exercise II • Show that if we only allow one cluster for a set of samples x1, x2, …, xn, then both the KM cluster center and the FCM cluster center must be the sample mean $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

  23. The FCM Algorithm • Using a set of seeds as the initial solution for pi, FCM computes uij and pi iteratively, until convergence is reached, for i = 1, 2, …, P and j = 1, 2, …, N
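A sketch of this iteration, using the membership and prototype update rules from slides 17–21. The default fuzzifier m = 2, the tolerance test, and the small constant guarding zero distances are implementation choices, not taken from the slides.

```python
import numpy as np

def fuzzy_c_means(X, P, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Fuzzy c-means: returns prototypes (P, d) and memberships U (P, N)."""
    rng = np.random.default_rng(seed)
    prototypes = X[rng.choice(len(X), size=P, replace=False)]  # initial seeds (assumed: random samples)
    for _ in range(n_iter):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1))
        dist = np.linalg.norm(prototypes[:, None, :] - X[None, :, :], axis=2) + 1e-12  # guard zero distances
        ratio = (dist[:, None, :] / dist[None, :, :]) ** (2.0 / (m - 1.0))  # ratio[i, k, j] = d_ij / d_kj
        U = 1.0 / ratio.sum(axis=1)                                         # (P, N)
        # Prototype update: p_i = sum_j u_ij^m x_j / sum_j u_ij^m
        W = U ** m
        new_prototypes = (W @ X) / W.sum(axis=1, keepdims=True)
        if np.linalg.norm(new_prototypes - prototypes) < tol:
            prototypes = new_prototypes
            break  # convergence reached
        prototypes = new_prototypes
    return prototypes, U
```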

  24. K-means vs. Fuzzy c-means • Positions of 3 cluster centers (figure comparing the K-means and Fuzzy c-means centers)

  25. Gaussian Mixture Model

  26. Given • Observed data: {xi : i = 1, 2, …, N}, each of which is drawn independently from a mixture of probability distributions with the density $p(x) = \sum_{k=1}^{P} \alpha_k\, p_k(x \mid \theta_k)$, where $\alpha_1, \dots, \alpha_P$ are mixture coefficients with $\sum_{k=1}^{P} \alpha_k = 1$ and each component density $p_k(x \mid \theta_k)$ is Gaussian with parameters $\theta_k = (\mu_k, \Sigma_k)$

  27. Solution • Repeat the following estimations, until convergence is reached: • Compute the responsibilities $w_{ki} = \dfrac{\alpha_k\, p_k(x_i \mid \theta_k)}{\sum_{l=1}^{P} \alpha_l\, p_l(x_i \mid \theta_l)}$ • Re-estimate $\alpha_k = \frac{1}{N} \sum_{i=1}^{N} w_{ki}$, $\mu_k = \frac{\sum_{i=1}^{N} w_{ki}\, x_i}{\sum_{i=1}^{N} w_{ki}}$, $\Sigma_k = \frac{\sum_{i=1}^{N} w_{ki} (x_i - \mu_k)(x_i - \mu_k)^T}{\sum_{i=1}^{N} w_{ki}}$
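A compact EM-style sketch of these re-estimation steps, assuming Gaussian components with full covariance matrices. The scipy.stats.multivariate_normal density and the small covariance regularizer are implementation choices, not from the slides.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_em(X, P, n_iter=100, seed=0):
    """EM for a Gaussian mixture with P components; X is (N, d)."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    alphas = np.full(P, 1.0 / P)                     # mixture coefficients
    means = X[rng.choice(N, size=P, replace=False)]  # initial component means (assumed: random samples)
    covs = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(P)])
    for _ in range(n_iter):
        # Responsibilities: w_ki = alpha_k p_k(x_i) / sum_l alpha_l p_l(x_i)
        dens = np.array([alphas[k] * multivariate_normal.pdf(X, means[k], covs[k])
                         for k in range(P)])          # (P, N)
        resp = dens / dens.sum(axis=0, keepdims=True)
        # Re-estimate coefficients, means, and covariances
        Nk = resp.sum(axis=1)
        alphas = Nk / N
        means = (resp @ X) / Nk[:, None]
        for k in range(P):
            diff = X - means[k]
            # small regularizer keeps covariances invertible (implementation choice)
            covs[k] = (resp[k, :, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)
    return alphas, means, covs
```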

  28. Linear Discriminant Analysis (LDA)

  29. Illustration

  30. Definitions • Given: • Samples x1, x2, …, xn • Classes: ni of them are of class i, i = 1, 2, …, C • Definition: • Sample mean for class i: $m_i = \frac{1}{n_i} \sum_{x \in \mathcal{X}_i} x$, where $\mathcal{X}_i$ is the set of class-i samples • Scatter matrix for class i: $S_i = \sum_{x \in \mathcal{X}_i} (x - m_i)(x - m_i)^T$

  31. Scatter Matrices • Total scatter matrix: $S_T = \sum_{j=1}^{n} (x_j - m)(x_j - m)^T$, where $m$ is the mean of all samples • Within-class scatter matrix: $S_W = \sum_{i=1}^{C} S_i$ • Between-class scatter matrix: $S_B = \sum_{i=1}^{C} n_i (m_i - m)(m_i - m)^T$, so that $S_T = S_W + S_B$

  32. Multiple Discriminant Analysis • We seek vectors wi, i = 1, 2, …, C−1 • And project the samples x to the (C−1)-dimensional space: $y = W^T x$ • The criterion for W = (w1, w2, …, wC−1) is to maximize $J(W) = \dfrac{|W^T S_B W|}{|W^T S_W W|}$

  33. Multiple Discriminant Analysis (Cnt’d) • Consider the Lagrangian $L(w_i, \lambda_i) = w_i^T S_B w_i - \lambda_i (w_i^T S_W w_i - 1)$ • Take the partial derivative $\partial L / \partial w_i = 2 S_B w_i - 2 \lambda_i S_W w_i$ • Setting the derivative to zero, we obtain $S_B w_i = \lambda_i S_W w_i$

  34. Multiple Discriminant Analysis (Cnt’d) • Find the roots of the characteristic function $\det(S_B - \lambda S_W) = 0$ as eigenvalues and then solve $S_B w_i = \lambda_i S_W w_i$ for the wi with the largest C−1 eigenvalues

  35. LDA Prototypes • The prototype of each class is the mean of the projected samples of that class; the projection is thru the matrix W • In the testing phase • All test samples are projected thru the same optimal W • The nearest prototype is the winner
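One way to realize the LDA prototype classifier of slides 30–35: build S_W and S_B, take the top C−1 generalized eigenvectors as W, project the class means to get prototypes, and classify test points by the nearest projected prototype. The use of scipy.linalg.eigh and the small ridge term on S_W are assumptions, not the author's stated implementation.

```python
import numpy as np
from scipy.linalg import eigh

def lda_prototypes(X, y):
    """X is (n, d); y holds C class labels. Returns projection W, prototypes, classes."""
    classes = np.unique(y)
    d = X.shape[1]
    m = X.mean(axis=0)
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)               # within-class scatter
        Sb += len(Xc) * np.outer(mc - m, mc - m)    # between-class scatter
    # Generalized eigenproblem S_B w = lambda S_W w; keep the top C-1 eigenvectors
    evals, evecs = eigh(Sb, Sw + 1e-6 * np.eye(d))  # ridge term is an implementation choice
    W = evecs[:, np.argsort(evals)[::-1][:len(classes) - 1]]
    prototypes = np.array([X[y == c].mean(axis=0) @ W for c in classes])
    return W, prototypes, classes

def lda_predict(x, W, prototypes, classes):
    """Project a test sample thru W and return the class of the nearest prototype."""
    z = x @ W
    return classes[np.linalg.norm(prototypes - z, axis=1).argmin()]
```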

  36. K-Nearest Neighbor (K-NN) Classifier

  37. K-NN Classifier • For each test sample x, find the nearest K training samples and classify x according to the vote among the K neighbors • Asymptotically, the error rate of the nearest-neighbor rule is $P = E_x\!\left[1 - \sum_{c=1}^{C} p(c \mid x)^2\right] \le 2P^*$, where $P^*$ is the Bayes error rate • This shows that the error rate is at most twice the Bayes error
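A brute-force sketch of the voting rule above; it assumes integer class labels (for np.bincount) and breaks ties by the smallest label, neither of which is specified on the slide.

```python
import numpy as np

def knn_predict(X_train, y_train, x, K=3):
    """Classify x by majority vote among its K nearest training samples."""
    dist = np.linalg.norm(X_train - x, axis=1)
    neighbors = np.argsort(dist)[:K]          # indices of the K nearest samples
    votes = np.bincount(y_train[neighbors])   # assumes non-negative integer labels
    return votes.argmax()                     # ties go to the smallest label
```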

  38. Condensed Nearest Neighbor (CNN) Rule • K-NN is very powerful, but may take too much time when the number of training samples is huge • CNN serves as a method to condense the set of samples • We can then perform K-NN on the condensed set of samples

  39. CNN: The Algorithm • For each class type c, randomly add a c-sample to the condensed set Pc • Check whether all c-samples are absorbed, where a c-sample x is said to be absorbed if $\|x - p\| < \|x - q\|$, where p is the prototype in Pc nearest to x and q is the nearest prototype of any other class • If there are still unabsorbed c-samples, randomly add one of them to Pc • Go to step 2, until Pc no longer changes for all c
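A sketch of the condensation loop with absorption as defined above. It sweeps over all samples each round and adds every unabsorbed one it meets, a common variant of the slide's add-one-random-unabsorbed-sample-per-round step; the function name and seeding are illustrative.

```python
import numpy as np

def cnn_condense(X, y, seed=0):
    """Condensed nearest neighbor: returns a condensed subset of (X, y)."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    # Step 1: seed each class set P_c with one random c-sample
    keep = [rng.choice(np.flatnonzero(y == c)) for c in classes]
    changed = True
    while changed:
        changed = False
        P_X, P_y = X[keep], y[keep]
        for j in rng.permutation(len(X)):
            if j in keep:
                continue
            dist = np.linalg.norm(P_X - X[j], axis=1)
            same = dist[P_y == y[j]].min()
            other = dist[P_y != y[j]].min() if np.any(P_y != y[j]) else np.inf
            if same >= other:                # not absorbed: add it to the condensed set
                keep.append(j)
                P_X, P_y = X[keep], y[keep]
                changed = True
    return X[keep], y[keep]
```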

  40. Learning Vector Quantization (LVQ)

  41. LVQ Algorithm • Initialize R prototypes for each class: m1(k), m2(k), …, mR(k), where k = 1, 2, …, K • Sample a training sample x and find the nearest prototype mj(k) to x • If x and mj(k) match in class type, $m_j(k) \leftarrow m_j(k) + \varepsilon\,(x - m_j(k))$ • Otherwise, $m_j(k) \leftarrow m_j(k) - \varepsilon\,(x - m_j(k))$ • Repeat step 2, decreasing ε at each iteration
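An LVQ1-style sketch of the update rule above. The linearly decaying learning-rate schedule and the choice of R random samples per class as initial prototypes are assumptions; the slides do not specify either.

```python
import numpy as np

def lvq_train(X, y, R=2, epochs=20, eps0=0.1, seed=0):
    """LVQ: R prototypes per class, attract on class match, repel on mismatch."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    classes = np.unique(y)
    protos, proto_y = [], []
    for c in classes:                       # step 1: R initial prototypes per class (assumed: random samples)
        idx = rng.choice(np.flatnonzero(y == c), size=R, replace=False)
        protos.append(X[idx]); proto_y.append(np.full(R, c))
    protos, proto_y = np.vstack(protos), np.concatenate(proto_y)
    for epoch in range(epochs):
        eps = eps0 * (1.0 - epoch / epochs)                     # decrease epsilon each iteration (assumed schedule)
        for j in rng.permutation(len(X)):
            i = np.linalg.norm(protos - X[j], axis=1).argmin()  # nearest prototype to x
            if proto_y[i] == y[j]:
                protos[i] += eps * (X[j] - protos[i])           # move toward the sample
            else:
                protos[i] -= eps * (X[j] - protos[i])           # move away from the sample
    return protos, proto_y
```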

  42. References • R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd Ed., Wiley Interscience, 2001. • T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, Springer-Verlag, 2001. • F. Höppner, F. Klawonn, R. Kruse, and T. Runkler, Fuzzy Cluster Analysis: Methods for Classification, Data Analysis and Image Recognition, John Wiley & Sons, 1999. • S. Theodoridis and K. Koutroumbas, Pattern Recognition, Academic Press, 1999.
