
Self-Organizing Maps for Data Clustering and Evaluation

This text discusses self-organizing maps (SOM) as a constrained version of k-means clustering, the evaluation of clustering results, and dimension reduction using principal component analysis (PCA) and factor analysis. It also covers techniques for evaluating clustering results, such as Monte Carlo methods and the cophenetic correlation coefficient.

streeter

Presentation Transcript


  1. Clustering (3): Self-organizing maps; Evaluation of clustering results. Figures and equations from Data Clustering by Gan et al.

  2. Self-organizing maps: A constrained version of K-means clustering. The prototypes are encouraged to lie in a one- or two-dimensional manifold in the feature space (“a constrained topological map”).

  3. Self-organizing maps: Set up a two-dimensional rectangular grid of K prototypes m_j ∈ R^p (usually on the two-dimensional principal component plane). Loop over the observations x_i: (a) find the closest prototype m_j to x_i in Euclidean distance; (b) for all neighbors m_k of m_j (within distance r in the 2D grid), move m_k toward x_i via the update m_k ← m_k + α(x_i − m_k). Once the model is fit, the observations are mapped down onto the two-dimensional grid.
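The loop on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: it assumes the usual update m_k ← m_k + α(x_i − m_k), initialises the prototypes at random observations rather than on the principal component plane, and uses linear decay schedules for α and r. The function names, the schedules, and the simulated data are all hypothetical choices made for the example.

```python
import math
import random


def train_som(data, rows=5, cols=5, n_iter=2000, alpha0=0.5, r0=2.0, seed=0):
    """Online SOM sketch: returns prototypes keyed by grid coordinate."""
    rng = random.Random(seed)
    p = len(data[0])
    # Assumption: start each prototype m_j at a randomly chosen observation.
    protos = {(a, b): list(rng.choice(data))
              for a in range(rows) for b in range(cols)}
    for t in range(n_iter):
        frac = t / n_iter
        alpha = alpha0 * (1.0 - frac)        # learning rate decays (assumed schedule)
        r = 1.0 + (r0 - 1.0) * (1.0 - frac)  # radius shrinks from r0 to 1 (assumed schedule)
        x = rng.choice(data)
        # Closest prototype m_j to x in Euclidean distance (feature space).
        best = min(protos, key=lambda g: math.dist(protos[g], x))
        # Move every grid neighbour m_k of m_j (within distance r) toward x.
        for g, m in protos.items():
            if math.dist(g, best) <= r:
                for d in range(p):
                    m[d] += alpha * (x[d] - m[d])
    return protos


def map_to_grid(protos, x):
    """Map an observation to the grid cell of its closest prototype."""
    return min(protos, key=lambda g: math.dist(protos[g], x))


# Simulated three-dimensional data, purely for illustration.
rng = random.Random(1)
data = [(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
protos = train_som(data)
cell = map_to_grid(protos, data[0])
```

Because neighbours are defined on the grid while distances to the data are measured in feature space, nearby grid cells end up with similar prototypes, which is the constraint the slides describe.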

  4. Self-organizing maps 5 × 5 grid of prototypes

  5. Self-organizing maps: SOM moves the prototypes closer to the data while also maintaining a smooth two-dimensional spatial relationship between the prototypes, which makes it a constrained version of K-means clustering. If r is small enough that each neighborhood contains only one prototype, SOM becomes an online version of K-means, training on one data point at a time. Both r and α decrease over iterations.

  6. Self-organizing maps

  7. Self-organizing maps

  8. Self-organizing maps: Is the constraint reasonable?

  9. Evaluation of clustering results

  10. Evaluation

  11. Evaluation. External criteria approach: compare the clustering result (C) with a pre-specified partition (P). Counting over all pairs of samples, M = a + b + c + d is the total number of pairs.
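The definitions of a, b, c and d did not survive on this slide. The sketch below assumes the conventional ones: a counts pairs placed together in both C and P, b pairs together in C but apart in P, c pairs apart in C but together in P, and d pairs apart in both. The Rand and Jaccard indices are shown as two common summaries built from these counts; whether the slide used exactly these indices is an assumption, and the label vectors are made up.

```python
from itertools import combinations


def pair_counts(c_labels, p_labels):
    """Count sample pairs by agreement between clustering C and partition P."""
    a = b = c = d = 0
    for i, j in combinations(range(len(c_labels)), 2):
        same_c = c_labels[i] == c_labels[j]
        same_p = p_labels[i] == p_labels[j]
        if same_c and same_p:
            a += 1      # together in C, together in P
        elif same_c:
            b += 1      # together in C, apart in P
        elif same_p:
            c += 1      # apart in C, together in P
        else:
            d += 1      # apart in both
    return a, b, c, d


def rand_index(a, b, c, d):
    """Fraction of pairs on which C and P agree."""
    return (a + d) / (a + b + c + d)


def jaccard_index(a, b, c, d):
    """Like the Rand index, but ignoring pairs that are apart in both."""
    return a / (a + b + c)


# Made-up labels for six samples.
C = [0, 0, 1, 1, 2, 2]
P = [0, 0, 0, 1, 1, 1]
a, b, c, d = pair_counts(C, P)
```

For these labels the counts are a = 2, b = 1, c = 4, d = 8, so M = 15, the Rand index is 10/15 and the Jaccard index is 2/7.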

  12. Evaluation: Monte Carlo methods based on H0 (random generation), or the bootstrap, are needed to assess significance.
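One way to carry out such a Monte Carlo assessment is sketched below. It assumes a specific null hypothesis, namely that the cluster labels are a random relabeling of the samples with the same cluster sizes, and it uses the Rand index as the agreement statistic. Other null models (for example, regenerating the data at random positions) are equally legitimate, and the function names and labels here are hypothetical.

```python
import random
from itertools import combinations


def rand_index(u, v):
    """Fraction of sample pairs on which two labelings agree."""
    pairs = list(combinations(range(len(u)), 2))
    agree = sum((u[i] == u[j]) == (v[i] == v[j]) for i, j in pairs)
    return agree / len(pairs)


def monte_carlo_p_value(c_labels, p_labels, n_sim=999, seed=0):
    """Estimate P(index >= observed) under H0 by shuffling the cluster labels."""
    rng = random.Random(seed)
    observed = rand_index(c_labels, p_labels)
    shuffled = list(c_labels)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)  # random labeling with the same cluster sizes
        if rand_index(shuffled, p_labels) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_sim + 1)


# Made-up example: 20 samples, one of them assigned to the wrong group.
P = [0] * 10 + [1] * 10
C = [0] * 9 + [1] * 11
observed, p_value = monte_carlo_p_value(C, P)
```

A small p-value says the observed agreement would rarely arise from a random labeling; it does not by itself say the clustering is useful.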

  13. Evaluation External criteria: An alternative is to compare the proximity matrix Q with the given partition P. Define matrix Y based on P:

  14. Evaluation. Internal criteria: evaluate the clustering structure using features of the dataset itself (mostly the proximity matrix of the data). Example: for hierarchical clustering, P_c is the cophenetic matrix, whose (i, j)th element represents the proximity level at which two data points x_i and x_j are first joined into the same cluster; P is the proximity matrix.

  15. Evaluation: The cophenetic correlation coefficient (CPCC) lies in [-1, 1]. A higher value indicates better agreement between the cophenetic matrix and the proximity matrix.
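The CPCC formula itself is missing from the slide. The sketch below assumes the usual definition: the Pearson correlation between the upper-triangular entries of the proximity matrix and those of the cophenetic matrix. To stay self-contained it uses single-linkage clustering (my choice, not the slide's) and obtains the cophenetic matrix through a shortcut rather than by building the dendrogram: for single linkage, the level at which two points first join equals the smallest achievable "largest step" over all paths between them.

```python
import math
from itertools import combinations


def single_linkage_cophenetic(dist):
    """Cophenetic matrix of single-linkage clustering via minimax path distances."""
    n = len(dist)
    coph = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Going through k costs the larger of the two legs.
                coph[i][j] = min(coph[i][j], max(coph[i][k], coph[k][j]))
    return coph


def cpcc(dist, coph):
    """Pearson correlation between the upper triangles of the two matrices."""
    pairs = list(combinations(range(len(dist)), 2))
    p = [dist[i][j] for i, j in pairs]
    c = [coph[i][j] for i, j in pairs]
    mp, mc = sum(p) / len(p), sum(c) / len(c)
    cov = sum((x - mp) * (y - mc) for x, y in zip(p, c))
    var_p = sum((x - mp) ** 2 for x in p)
    var_c = sum((y - mc) ** 2 for y in c)
    return cov / math.sqrt(var_p * var_c)


# Three points on a line, chosen so the result can be checked by hand.
points = [0.0, 1.0, 3.0]
P = [[abs(u - v) for v in points] for u in points]
Pc = single_linkage_cophenetic(P)
value = cpcc(P, Pc)
```

Here points 0 and 1 merge at level 1 and the third point joins at level 2, so the cophenetic distances are (1, 2, 2) against original distances (1, 3, 2), giving a CPCC of √3/2 ≈ 0.866.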

  16. Evaluation. Relative criteria: choose the best result out of a set according to a predefined criterion. Example: the modified Hubert’s Γ statistic, where P is the proximity matrix of the data. A high value indicates compact clusters.
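The statistic's formula is not in the transcript. A commonly used form, assumed here, is Γ = (1/M) Σ_{i<j} P(i, j) Q(i, j), where M is the number of pairs and Q(i, j) is the distance between the representatives of the clusters containing x_i and x_j. The sketch uses Euclidean distance and cluster centroids as representatives, both of which are assumptions, and the four data points are made up.

```python
import math
from itertools import combinations


def centroids(data, labels):
    """Mean vector of each cluster, keyed by label."""
    groups = {}
    for x, lab in zip(data, labels):
        groups.setdefault(lab, []).append(x)
    return {lab: [sum(col) / len(pts) for col in zip(*pts)]
            for lab, pts in groups.items()}


def modified_hubert_gamma(data, labels):
    """Average over all pairs of P(i, j) * Q(i, j)."""
    cents = centroids(data, labels)
    pairs = list(combinations(range(len(data)), 2))
    total = 0.0
    for i, j in pairs:
        p_ij = math.dist(data[i], data[j])                    # proximity P(i, j)
        q_ij = math.dist(cents[labels[i]], cents[labels[j]])  # centroid distance Q(i, j)
        total += p_ij * q_ij
    return total / len(pairs)


# Two well-separated pairs of points.
data = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = modified_hubert_gamma(data, [0, 0, 1, 1])  # clusters match the geometry
bad = modified_hubert_gamma(data, [0, 1, 0, 1])   # clusters cut across it
```

The partition that follows the geometry scores far higher than the one that cuts across it, which is the sense in which a high value indicates compact clusters.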

  17. Dimension reduction (1) Overview PCA Factor Analysis

  18. Overview: The purposes of dimension reduction are data simplification; data visualization; noise reduction (if we can assume only the dominating dimensions are signals); and variable selection for prediction.

  19. Overview

  20. PCA: Explain the variance-covariance structure among a set of random variables by a few linear combinations of the variables. Does not require normality!

  21. PCA

  22. PCA

  23. Reminder of some results for random vectors

  24. Reminder of some results for random vectors: Proof of the first (and second) point of the previous slide.

  25. PCA: The eigenvalues are the variance components. Proportion of total variance explained by the kth PC:
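The formulas on this slide did not survive extraction. Under the usual notation, which is an assumption here (Σ is the covariance matrix of X, with eigenvalue-eigenvector pairs (λ_k, e_k) ordered so that λ_1 ≥ … ≥ λ_p ≥ 0, and Y_k = e_k^T X is the kth principal component), the standard statements read:

```latex
\begin{align*}
\operatorname{Var}(Y_k) &= e_k^{\top} \Sigma\, e_k = \lambda_k, \\
\sum_{i=1}^{p} \operatorname{Var}(X_i) &= \operatorname{tr}(\Sigma) = \lambda_1 + \cdots + \lambda_p, \\
\text{proportion explained by the $k$th PC} &= \frac{\lambda_k}{\lambda_1 + \cdots + \lambda_p}.
\end{align*}
```

The second line is why the eigenvalues can be read as variance components: together they account for exactly the total variance of the original variables.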

  26. PCA

  27. PCA: The geometrical interpretation of PCA:

  28. PCA: What about PCA using the correlation matrix instead of the covariance matrix? This is equivalent to first standardizing all X variables.

  29. PCA: Using the correlation matrix prevents one X variable from dominating merely because of its scale (a change of units), for example, measuring in inches instead of feet. Example:
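The slide's own example is not in the transcript, so here is a small stand-in with made-up height and weight values. For two variables the largest eigenvalue of the 2 × 2 covariance (or correlation) matrix has a closed form, which keeps the sketch free of external libraries. The function name and the data are hypothetical.

```python
import math


def first_pc_share(xs, ys, standardize=False):
    """Share of total variance carried by the first PC of two variables."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    if standardize:
        # Correlation matrix: unit variances, correlation off the diagonal.
        sxy /= math.sqrt(sxx * syy)
        sxx = syy = 1.0
    # Largest eigenvalue of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    lam1 = (sxx + syy) / 2 + math.hypot((sxx - syy) / 2, sxy)
    return lam1 / (sxx + syy)


# Made-up measurements; the same heights expressed in feet and in inches.
height_ft = [5.0, 5.5, 6.0, 6.2, 5.8]
weight_kg = [60.0, 72.0, 80.0, 79.0, 70.0]
height_in = [12.0 * h for h in height_ft]

cov_ft = first_pc_share(height_ft, weight_kg)
cov_in = first_pc_share(height_in, weight_kg)
cor_ft = first_pc_share(height_ft, weight_kg, standardize=True)
cor_in = first_pc_share(height_in, weight_kg, standardize=True)
```

With the covariance matrix, the share explained by the first PC changes when height is re-expressed in inches; with the correlation matrix, `cor_ft` and `cor_in` are identical because standardizing removes the units.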

  30. PCA: How to select the number of components? Based on the eigenvalues (% of variation explained). Assumption: the small amount of variation explained by the low-ranked (small-eigenvalue) PCs is noise.

  31. Sparse PCA: In high-dimensional data, the loadings of a single PC on 10,000 genes don’t make much sense. The goal is to obtain sparse loadings, which makes the interpretation easier and the model more robust. One approach: SCoTLASS.

  32. Sparse PCA: Zou, Hastie, and Tibshirani’s SPCA by regression (using the regression/reconstruction property of PCA):

  33. Factor Analysis: If we take the first several PCs that explain most of the variation in the data, we have one form of factor model. L: loading matrix; F: unobserved random vector (latent variables); ε: unobserved random vector (noise).

  34. Factor Analysis: The orthogonal factor model assumes no correlation between the factor RVs. The covariance matrix of the noise ε is a diagonal matrix.
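The model equations are missing from the transcript. In the conventional notation, assumed here rather than taken from the slides (X is p × 1 with mean μ, L is p × m, F is m × 1, and Ψ denotes the diagonal noise covariance), the orthogonal factor model and the covariance structure it implies are:

```latex
\begin{align*}
X - \mu &= L F + \varepsilon, \\
E(F) &= 0, \quad \operatorname{Cov}(F) = I, \quad
E(\varepsilon) = 0, \quad \operatorname{Cov}(\varepsilon) = \Psi \ (\text{diagonal}), \quad
\operatorname{Cov}(\varepsilon, F) = 0, \\
\Sigma = \operatorname{Cov}(X) &= L \operatorname{Cov}(F) L^{\top} + \Psi = L L^{\top} + \Psi .
\end{align*}
```

The last line follows because the cross terms vanish when F and ε are uncorrelated. It shows that the factors account for all covariances between variables, while Ψ only adds to the variances.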

  35. Factor Analysis

  36. Factor Analysis: Rotations in the m-dimensional subspace defined by the factors make the solution non-unique. PCA gives one particular solution, as the vectors are sequentially selected. The maximum likelihood estimator is another solution:

  37. Factor Analysis: As we said, rotations within the m-dimensional subspace don’t change the overall amount of variation explained. Rotate to make the results more interpretable:
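The invariance can be checked numerically. The sketch below takes a made-up 4 × 2 loading matrix L, post-multiplies it by an orthogonal rotation T to get L* = LT, and compares L L^T with L* L*^T. Since T T^T = I, the two agree, so the rotated loadings reproduce the same covariance structure and the same communalities (the diagonal entries). The helper names are hypothetical.

```python
import math


def rotate_loadings(L, theta):
    """Post-multiply a p x 2 loading matrix by the rotation T(theta)."""
    c, s = math.cos(theta), math.sin(theta)
    # Row [a, b] times T = [[c, s], [-s, c]].
    return [[a * c - b * s, a * s + b * c] for a, b in L]


def gram(L):
    """L L^T: the part of the covariance explained by the common factors."""
    return [[sum(u * v for u, v in zip(ri, rj)) for rj in L] for ri in L]


# Made-up loadings of four variables on two factors.
L = [[0.8, 0.3], [0.7, 0.4], [0.2, 0.9], [0.1, 0.8]]
L_star = rotate_loadings(L, math.radians(30))

before = gram(L)
after = gram(L_star)
max_diff = max(abs(x - y)
               for row_b, row_a in zip(before, after)
               for x, y in zip(row_b, row_a))
```

`max_diff` is zero up to floating-point error even though the individual loadings in `L_star` differ from those in `L`, which is exactly the freedom a rotation criterion exploits.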

  38. Factor Analysis. Varimax criterion: find T such that V is maximized. V is proportional to the sum of the variances of the squared loadings. Maximizing V makes the squared loadings as spread out as possible: some are very small, and some are very large.
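The expression for V is missing from the transcript. The sketch below assumes the raw form of the criterion, the sum over factors of the variance of the squared loadings in that column, and skips the row scaling by communalities that the normalized version applies. For two factors, T is a single rotation angle, so a plain grid search stands in for a proper optimizer. The loadings and function names are made up.

```python
import math


def varimax_criterion(L):
    """Raw varimax V: sum over factors of the variance of squared loadings."""
    p = len(L)
    total = 0.0
    for col in zip(*L):
        sq = [l * l for l in col]
        total += sum(s * s for s in sq) / p - (sum(sq) / p) ** 2
    return total


def rotate(L, theta):
    """Post-multiply a p x 2 loading matrix by the rotation T(theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[a * c - b * s, a * s + b * c] for a, b in L]


def best_rotation(L, steps=900):
    """Grid search over [0, pi/2) for the angle that maximizes V (two factors only)."""
    angles = [k * (math.pi / 2) / steps for k in range(steps)]
    return max(angles, key=lambda t: varimax_criterion(rotate(L, t)))


# Hypothetical loadings: a clean two-cluster structure, hidden by a 30-degree rotation.
simple = [[0.9, 0.0], [0.8, 0.0], [0.0, 0.9], [0.0, 0.7]]
L = rotate(simple, math.radians(30))

theta = best_rotation(L)
L_rot = rotate(L, theta)
```

The search recovers the hidden structure up to the order and sign of the factors: each row of `L_rot` has one loading near zero and one large loading, and V is higher than for the unrotated `L`.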

  39. Factor Analysis. Orthogonal simple factor rotation: rotate the orthogonal factors around the origin until the system is maximally aligned with the separate clusters of variables. Oblique simple structure rotation: allow the factors to become correlated; each factor is rotated individually to fit a cluster.
