Unsupervised Feature Selection for Multi-Cluster Data, Deng Cai et al., KDD 2010


  1. Unsupervised Feature Selection for Multi-Cluster Data, Deng Cai et al., KDD 2010. Presenter: Yunchao Gong, Dept. of Computer Science, UNC Chapel Hill

  2. Introduction

  3. The problem
  • Feature selection for high-dimensional data
  • Assume a high-dimensional n x d data matrix X
  • When d is too high (d > 10000), it causes problems:
    • Storing the data is expensive
    • Computational challenges

  4. A solution to this problem
  • Select the most informative features
  • Assume a high-dimensional n x d data matrix X
  [figure: the n x d data matrix X]

  5. After feature selection
  • More scalable and can be much more efficient
  [figure: the reduced n x d' matrix, with d' << d]

  6. Previous approaches
  • Supervised feature selection
    • Fisher score
    • Information-theoretic feature selection
  • Unsupervised feature selection
    • Laplacian Score
    • Max Variance
    • Random selection

  7. Motivation of the paper
  • Improve clustering (unsupervised) using feature selection
  • Automatically select the most useful features by discovering the manifold structure of the data

  8. Motivation and novelty
  • Previous approaches: greedy selection, scoring each feature independently
  • This approach: considers all features jointly

  9. A toy example

  10. Some background: manifold learning
  • Tries to discover the manifold structure of the data

  11. Manifolds in vision
  [figure: appearance variation traces out a manifold in image space]

  12. Some background: manifold learning
  • Discovers smooth manifold structure
  • Maps different manifolds (classes) as far apart as possible

  13. The proposed method

  14. Outline of the approach
  • Step 1: manifold learning for cluster analysis
  • Step 2: sparse coefficient regression
  • Step 3: sparse feature selection

  15. Outline of the approach
  [figure: pipeline of the approach; step 1 generates the response Y]

  16. 1. Manifold learning for clustering
  • Use manifold learning to find clusters in the data
  [figure: four clusters labeled 1-4]

  17. 1. Manifold learning for clustering
  • Observation: points in the same class are clustered together

  18. 1. Manifold learning for clustering
  • Ideas? How do we discover the manifold structure?

  19. 1. Manifold learning for clustering
  • Map similar points closer together; map dissimilar points far apart
  • Minimize sum_ij (f_i - f_j)^2 K_ij, where K_ij is the similarity between data points i and j

  20. 1. Manifold learning for clustering
  • Map similar points closer together; map dissimilar points far apart
  • min_f (1/2) sum_ij (f_i - f_j)^2 K_ij = f^T L f, where L = D - K and D = diag(sum(K))
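The identity on this slide follows by expanding the square, then using D = diag(sum(K)) and L = D - K (a one-line derivation):

```latex
\frac{1}{2}\sum_{i,j}(f_i - f_j)^2 K_{ij}
  = \frac{1}{2}\sum_{i,j}\bigl(f_i^2 + f_j^2 - 2 f_i f_j\bigr) K_{ij}
  = f^\top D f - f^\top K f
  = f^\top L f
```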

  21. 1. Manifold learning for clustering
  • Constrain f to unit scale under D (f^T D f = 1) to eliminate the free scaling
  • So we have the minimization problem: min f^T L f subject to f^T D f = 1, whose solutions are the bottom eigenvectors of the generalized eigenproblem L f = lambda D f

  22. Summary of manifold discovery
  • Construct the nearest-neighbor graph K
  • Choose a weighting scheme for K (e.g., the heat kernel K_ij = exp(-||x_i - x_j||^2 / t))
  • Solve the generalized eigenproblem L f = lambda D f
  • Use the bottom eigenvectors f as the response vectors Y (a sketch follows this list)
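A minimal Python sketch of this step, assuming a k-nearest-neighbor graph with heat-kernel weights; the function name, the SciPy/scikit-learn helpers chosen, and the default parameters are illustrative, not the authors' code:

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import kneighbors_graph

def spectral_embedding(X, n_clusters, n_neighbors=5, t=1.0):
    """Step 1: embed the n x d matrix X into response vectors Y (n x c)."""
    # k-NN graph with heat-kernel weights K_ij = exp(-||x_i - x_j||^2 / t)
    dist = kneighbors_graph(X, n_neighbors, mode='distance').toarray()
    K = np.where(dist > 0, np.exp(-dist ** 2 / t), 0.0)
    K = np.maximum(K, K.T)              # symmetrize the neighborhood graph
    D = np.diag(K.sum(axis=1))          # degree matrix, D = diag(sum(K))
    L = D - K                           # graph Laplacian
    # generalized eigenproblem L f = lambda D f, eigenvalues in ascending order
    _, f = eigh(L, D)
    return f[:, 1:n_clusters + 1]       # drop the trivial constant eigenvector
```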

  23. Response vectors Y
  • Y reveals the manifold structure!
  [figure: the response Y takes distinct values on clusters 1-4]

  24. 2. Sparse coefficient regression
  • Once we have Y, how do we use it to perform feature selection?
  • Y reveals the cluster structure, so use Y as the response in a sparse regression

  25. Sparse regression
  • The "lasso": min_a ||y_k - X a_k||^2 + beta * |a_k|_1
  • The L1 penalty drives many entries of a_k to exactly zero, i.e., the solution is sparse

  26. Steps to perform sparse regression
  • Generate Y from step 1; take the data matrix X as input
  • Denote the k-th column of Y as y_k
  • For each y_k, estimate the sparse coefficient vector a_k by solving the lasso above (a sketch follows this list)
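A sketch of this step in the same vein. The paper solves the problem with the LARs algorithm under a cardinality constraint; here scikit-learn's plain L1-penalized Lasso stands in as a close substitute, and the beta value is a placeholder:

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_coefficients(X, Y, beta=0.01):
    """Step 2: for each column y_k of Y, solve
    min_a ||y_k - X a||^2 + beta * |a|_1 and keep the sparse a_k."""
    A = []
    for k in range(Y.shape[1]):
        # L1-penalized least squares; many coefficients shrink to exactly zero
        a_k = Lasso(alpha=beta).fit(X, Y[:, k]).coef_
        A.append(a_k)
    return np.column_stack(A)   # d x c matrix, one coefficient vector per column
```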

  27. Illustration
  [figure: y_k = X a_k; the nonzero entries of the sparse a_k mark the selected features]

  28. 3. Sparse feature selection
  • But for Y with c columns, we obtain c different coefficient vectors a_k
  • How do we finally combine them?
  • A simple heuristic: score feature j by max_k |a_kj| and keep the top-scoring features (see the sketch below)
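A sketch of this combination heuristic, scoring each feature by its largest absolute coefficient across the c regressions; the function name is illustrative:

```python
import numpy as np

def select_features(A, n_select):
    """Step 3: score feature j by max_k |a_kj|, keep the n_select best."""
    scores = np.abs(A).max(axis=1)              # one score per feature (row of A)
    return np.argsort(scores)[::-1][:n_select]  # indices of top-scoring features
```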

  29. Illustration
  [figure: the c coefficient vectors a_1, ..., a_c, combined feature-wise]

  30. The final algorithm
  • Step 1: manifold learning to obtain Y (Y = f, the bottom eigenvectors)
  • Step 2: sparse regression on each column of Y
  • Step 3: final combination into one feature ranking (see the driver sketch below)
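A hypothetical end-to-end driver tying the three sketches above together; the n_clusters, beta, and n_select values are placeholders:

```python
# X is an n x d data matrix (rows = samples, columns = features)
Y = spectral_embedding(X, n_clusters=4)        # step 1: response vectors Y
A = sparse_coefficients(X, Y, beta=0.01)       # step 2: sparse a_k per column of Y
selected = select_features(A, n_select=200)    # step 3: combine and rank
X_reduced = X[:, selected]                     # keep only the selected features
```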

  31. Step 1
  [figure: recap of step 1; the spectral response Y separates clusters 1-4]

  32. Step 2
  [figure: recap of step 2; regressing each column of Y on X yields a sparse a]

  33. Step 3
  [figure: recap of step 3; the coefficient vectors a_1, ..., a_c are merged into one score per feature]

  34. Discussion

  35. Novelty of this approach
  • Considers all features jointly
  • Uses sparse regression (the "lasso") to perform feature selection
  • Combines manifold learning with feature selection

  36. Results

  37. Face image clustering

  38. Digit recognition

  39. Summary
  • The method is technically new and interesting
  • But not groundbreaking

  40. Summary
  • Suggested future work: test on more challenging vision datasets
    • Caltech
    • ImageNet
