Visualizing Data using t-SNE. Presenter: Wei-Hao Huang. Authors: Laurens van der Maaten and Geoffrey Hinton. JMLR 2008.
Outline • Motivation • Objectives • Methodology • Experiments • Conclusions • Comments
Motivation • Visualization of high-dimensional data is an important problem, and it involves data of widely varying dimensionality. • Linear vs. nonlinear dimensionality reduction techniques. • Many techniques perform strongly on artificial data sets but are much less successful at visualizing real high-dimensional data.
Objectives • To convert a high-dimensional data set into a matrix of pairwise similarities. • To introduce a new technique, called “t-SNE”, for visualizing the resulting similarity data.
Methodology • Stochastic Neighbor Embedding • t-Distributed Stochastic Neighbor Embedding • Symmetric SNE • Mismatched Tails Can Compensate for Mismatched Dimensionalities
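Stochastic Neighbor Embedding • Data space: conditional similarities p_{j|i} from a Gaussian centered on each data point x_i. • Map space: conditional similarities q_{j|i} from a Gaussian centered on each map point y_i. • Cost function: sum over all points of the Kullback-Leibler divergence between P_i and Q_i. • Perplexity: user-chosen value that sets the Gaussian bandwidth σ_i of each data point. • Gradient descent method.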
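The data-space step and the role of perplexity can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `conditional_probabilities`, the bisection search over `beta = 1 / (2 * sigma_i**2)`, the tolerance, and the default perplexity are all choices made for this sketch.

```python
import numpy as np

def conditional_probabilities(X, perplexity=30.0, tol=1e-5, max_iter=100):
    """Return an (n, n) matrix whose row i holds p_{j|i}.

    Each row is a Gaussian similarity around x_i. The bandwidth is found by
    bisection so that 2 ** entropy(row) is close to the requested perplexity.
    """
    n = X.shape[0]
    sq_norms = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances between all pairs of points.
    D = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)
    P = np.zeros((n, n))
    target_entropy = np.log2(perplexity)
    for i in range(n):
        d = np.delete(D[i], i)              # distances to the other points
        beta, lo, hi = 1.0, 0.0, np.inf     # beta = 1 / (2 * sigma_i ** 2)
        for _ in range(max_iter):
            w = np.exp(-(d - d.min()) * beta)   # shift avoids underflow
            p = w / w.sum()
            entropy = -np.sum(p * np.log2(p + 1e-12))
            if abs(entropy - target_entropy) < tol:
                break
            if entropy > target_entropy:    # too flat: narrow the Gaussian
                lo = beta
                beta = beta * 2.0 if hi == np.inf else (beta + hi) / 2.0
            else:                           # too peaked: widen the Gaussian
                hi = beta
                beta = (beta + lo) / 2.0
        P[i, np.arange(n) != i] = p         # p_{i|i} stays 0
    return P
```

Calling `conditional_probabilities(X, perplexity=10.0)` on an `(n, d)` array gives rows that each sum to 1 and have an effective number of neighbors close to 10.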
Symmetric SNE (t-SNE) • t-SNE uses a Student-t distribution in the map space to improve performance. • The SNE cost function is difficult to optimize: use a symmetrized cost function. • Crowding problem: use a heavy-tailed distribution in the map space. • Cost function • Map space • Data space • Gradient descent method
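A minimal NumPy sketch of the symmetrized similarities and the resulting cost, assuming the conditional similarities are already available as a row-stochastic matrix with a zero diagonal. The helper names `joint_p`, `joint_q`, and `kl_cost` are illustrative, not taken from the slides.

```python
import numpy as np

def joint_p(P_cond):
    """Data space: p_ij = (p_{j|i} + p_{i|j}) / (2n), a symmetric joint distribution."""
    n = P_cond.shape[0]
    return (P_cond + P_cond.T) / (2.0 * n)

def joint_q(Y):
    """Map space: heavy-tailed Student-t similarities (one degree of freedom)."""
    sq = np.sum(Y ** 2, axis=1)
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Y @ Y.T, 0.0)
    W = 1.0 / (1.0 + D)          # (1 + ||y_i - y_j||^2)^-1
    np.fill_diagonal(W, 0.0)     # q_ii is defined as 0
    return W / W.sum()

def kl_cost(P, Q, eps=1e-12):
    """Single KL divergence between the joint distributions P and Q."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))
```

Because both `P` and `Q` are symmetric and each sums to 1 over all pairs, one KL divergence replaces the per-point sum used in SNE.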
Mismatched Tails Can Compensate for Mismatched Dimensionalities (t-SNE) • Map space: joint similarities q_ij from a Student t-distribution with one degree of freedom. • Gradient descent method
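The gradient step for the Student-t map space can be sketched as below. The gradient expression, 4 Σ_j (p_ij − q_ij)(y_i − y_j)(1 + ||y_i − y_j||²)⁻¹, is the standard t-SNE gradient. The plain descent loop is a simplification assumed for this sketch (no momentum or other optimization tricks), and the names `tsne_cost`, `tsne_gradient`, and `gradient_descent`, along with the default step size, are illustrative.

```python
import numpy as np

def _student_t(Y):
    """Unnormalized weights W and normalized similarities Q in the map space."""
    sq = np.sum(Y ** 2, axis=1)
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Y @ Y.T, 0.0)
    W = 1.0 / (1.0 + D)
    np.fill_diagonal(W, 0.0)
    return W, W / W.sum()

def tsne_cost(P, Y):
    """KL(P || Q) for a symmetric joint P that sums to 1."""
    _, Q = _student_t(Y)
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))

def tsne_gradient(P, Y):
    """dC/dy_i = 4 * sum_j (p_ij - q_ij) * w_ij * (y_i - y_j)."""
    W, Q = _student_t(Y)
    A = (P - Q) * W
    # Row i of (diag(rowsum(A)) - A) @ Y equals sum_j A_ij * (y_i - y_j).
    return 4.0 * (np.diag(A.sum(axis=1)) - A) @ Y

def gradient_descent(P, n_iter=200, learning_rate=1.0, dim=2, seed=0):
    """Plain gradient descent from a small random start (assumed settings)."""
    rng = np.random.default_rng(seed)
    Y = 1e-4 * rng.normal(size=(P.shape[0], dim))
    for _ in range(n_iter):
        Y = Y - learning_rate * tsne_gradient(P, Y)
    return Y
```

The factor `w_ij` in the gradient is what the heavy tail contributes: it damps the pull between map points that are already far apart.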
Experiments • Data Sets: MNIST data set, Olivetti faces data set, COIL-20 data set, word-features data set, and Netflix data set. • Experimental Setup: PCA is used first to reduce the dimensionality of the data; cost function parameter settings.
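The PCA preprocessing step in the experimental setup could look like the following NumPy sketch. The slide does not state the target dimensionality, so the default of 30 components is an assumption here, and `pca_reduce` is an illustrative name.

```python
import numpy as np

def pca_reduce(X, n_components=30):
    """Project centered data onto its leading principal components via SVD."""
    Xc = X - X.mean(axis=0)                      # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = min(n_components, Vt.shape[0])           # cannot exceed the data rank bound
    return Xc @ Vt[:k].T                         # (n_samples, k) scores
```

Reducing the dimensionality first makes the pairwise distance computation cheaper and suppresses some noise before the similarities are computed.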
Visualizations of 6,000 handwritten digits from the MNIST data set (panels: t-SNE, Sammon mapping, Isomap, LLE)
Visualizations of the Olivetti faces data set (panels: t-SNE, Sammon mapping, Isomap, LLE)
Visualizations of the COIL-20 data set (panels: t-SNE, Sammon mapping, Isomap, LLE)
Applying t-SNE to Large Data Sets • Neighborhood graph with K = 20 nearest neighbors
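Building the K = 20 neighborhood graph might be sketched as follows. This only covers graph construction plus a row-normalized transition matrix. The uniform step probabilities are a simplifying assumption of this sketch, and `knn_graph` and `random_walk_matrix` are illustrative names.

```python
import numpy as np

def knn_graph(X, k=20):
    """Directed k-nearest-neighbor graph as a boolean adjacency matrix.

    A[i, j] is True when x_j is one of the k nearest neighbors of x_i.
    Requires k <= n - 1.
    """
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(D, np.inf)                      # a point is not its own neighbor
    idx = np.argpartition(D, k - 1, axis=1)[:, :k]   # k smallest distances per row
    A = np.zeros((n, n), dtype=bool)
    A[np.repeat(np.arange(n), k), idx.ravel()] = True
    return A

def random_walk_matrix(A):
    """Assumed simplification: each step moves to a uniformly chosen neighbor."""
    A = A.astype(float)
    return A / A.sum(axis=1, keepdims=True)
```

A dense distance matrix is used for brevity; for truly large data sets a sparse or tree-based neighbor search would be needed instead.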
Weaknesses • It is unclear how t-SNE performs when dimensionality reduction is used for purposes other than visualization. • Curse of intrinsic dimensionality. • Non-convexity of the t-SNE cost function.
Conclusions • t-SNE is capable of retaining the local structure of the data while also revealing some important global structure. • A landmark approach makes it possible to successfully visualize large real-world data sets.
Comments • Advantages: visualizes high-dimensional data very well; open source. • Applications: visualization of data.