Stochastic Neighbor Embedding and Its Variants

Xiaohong Chen, 2011/03/17

Outline
• SNE: Stochastic Neighbor Embedding
• SSNE: Symmetric SNE
• t-SNE: t-distributed SNE
• HSSNE: Heavy-tailed SSNE
• m-SNE: Multiview SNE

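Stochastic Neighbor Embedding
SNE starts by converting the Euclidean distances between high-dimensional data points into conditional probabilities that represent similarities. In the standard formulation, with a Gaussian of bandwidth \sigma_i centered on x_i:
$$p_{j|i} = \frac{\exp(-\|x_i - x_j\|^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\|x_i - x_k\|^2 / 2\sigma_i^2)}$$
These conditional probabilities need not be symmetric: in general p_{j|i} \neq p_{i|j}.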
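
The conversion from distances to conditional probabilities can be sketched in a few lines of NumPy. This is only an illustration under a simplifying assumption: a single shared bandwidth `sigma` is used for every point, whereas SNE chooses a separate bandwidth per point. The function name `conditional_probabilities` and the toy data are hypothetical.

```python
import numpy as np

def conditional_probabilities(X, sigma=1.0):
    """Return P with P[i, j] = p_{j|i} for the rows of X.

    Assumption: one shared bandwidth `sigma` for all points; SNE itself
    selects a separate sigma_i for each point.
    """
    # Squared Euclidean distances between all pairs of rows.
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    logits = -sq_dist / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)             # a point is not its own neighbor
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)       # each row sums to 1

# Toy data: two close points and one distant point on a line.
X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])
P = conditional_probabilities(X)
```

In this toy example the distant point assigns most of its probability to its nearest neighbor, while that neighbor assigns almost none back, which is why `P` differs from its transpose.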

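Stochastic Neighbor Embedding
SNE aims to find a low-dimensional data representation that minimizes the mismatch between the high-dimensional conditional probabilities p_{j|i} and their low-dimensional counterparts q_{j|i}.
The Kullback-Leibler divergence (relative entropy) measures how similar two positive functions are; for two identical functions, the relative entropy is zero.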
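
A small sketch of the Kullback-Leibler cost may help make the "zero for identical functions" property concrete. The helper name `kl_divergence`, the `eps` guard, and the two example matrices are assumptions for illustration only.

```python
import numpy as np

def kl_divergence(P, Q, eps=1e-12):
    """Sum over rows i of KL(P_i || Q_i); entries with P == 0 contribute nothing."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    mask = P > 0
    # eps guards against division by zero where Q vanishes but P does not.
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))

# Hypothetical row-stochastic matrices (zero diagonal, rows sum to 1).
P = np.array([[0.0, 0.9, 0.1],
              [0.5, 0.0, 0.5],
              [0.2, 0.8, 0.0]])
Q = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])

cost_same = kl_divergence(P, P)      # identical distributions
cost_diff = kl_divergence(P, Q)      # mismatched distributions
cost_reversed = kl_divergence(Q, P)  # arguments swapped
```

Swapping the arguments changes the value, which reflects the asymmetry of the divergence.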

Stochastic Neighbor Embedding
A momentum term is added to the gradient update in order to speed up the optimization and to avoid poor local minima.
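
A minimal sketch of gradient descent with a momentum term, applied to a toy quadratic rather than the SNE cost. The function name `momentum_descent`, the learning rate, and the momentum value are illustrative assumptions, not settings from the slides.

```python
import numpy as np

def momentum_descent(grad_fn, y0, learning_rate=0.1, momentum=0.8, n_iter=200):
    """Gradient descent where each step also reuses part of the previous step."""
    y = np.array(y0, dtype=float)
    velocity = np.zeros_like(y)
    for _ in range(n_iter):
        # New step = decayed previous step minus the scaled gradient.
        velocity = momentum * velocity - learning_rate * grad_fn(y)
        y = y + velocity
    return y

# Toy objective f(y) = ||y - target||^2, whose gradient is 2 * (y - target).
target = np.array([1.0, -2.0])
y_final = momentum_descent(lambda y: 2.0 * (y - target), y0=[5.0, 5.0])
```

In SNE the same update would be applied to the map points, with `grad_fn` returning the gradient of the cost with respect to them.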

Stochastic Neighbor Embedding
• Advantages
• (1) Keeps the images of nearby objects (original data points) nearby;
• (2) Keeps the images of widely separated objects (original data points) relatively far apart.
• Disadvantages
• (1) The asymmetric cost function is difficult to optimize efficiently;
• (2) It is hampered by the "crowding problem".

SSNE (Symmetric SNE)
The main advantage of SSNE is the simpler form of its gradient, which is faster to compute. In experiments, SSNE gives results comparable to, or even better than, those of SNE.
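
The simpler gradient can be sketched as follows, assuming the usual symmetric formulation: joint probabilities p_ij = (p_{j|i} + p_{i|j}) / (2n), Gaussian map similarities q_ij normalized over all pairs, and gradient 4 * sum_j (p_ij - q_ij)(y_i - y_j). All function names and the toy input are hypothetical.

```python
import numpy as np

def symmetrize(P_cond):
    """Joint probabilities p_ij = (p_{j|i} + p_{i|j}) / (2n); they sum to 1."""
    n = P_cond.shape[0]
    return (P_cond + P_cond.T) / (2.0 * n)

def gaussian_q(Y):
    """Map similarities q_ij with a Gaussian kernel, normalized over all pairs."""
    diff = Y[:, None, :] - Y[None, :, :]
    W = np.exp(-np.sum(diff ** 2, axis=-1))
    np.fill_diagonal(W, 0.0)
    return W / W.sum()

def ssne_cost(P, Y):
    """KL(P || Q) over all pairs i != j."""
    Q = gaussian_q(Y)
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))

def ssne_gradient(P, Y):
    """dC/dy_i = 4 * sum_j (p_ij - q_ij) * (y_i - y_j)."""
    Q = gaussian_q(Y)
    diff = Y[:, None, :] - Y[None, :, :]
    return 4.0 * np.sum((P - Q)[:, :, None] * diff, axis=1)

# Toy input: random conditional probabilities and a random 2-D map.
rng = np.random.default_rng(0)
A = rng.random((5, 5))
np.fill_diagonal(A, 0.0)
P = symmetrize(A / A.sum(axis=1, keepdims=True))
Y = rng.normal(size=(5, 2))
grad = ssne_gradient(P, Y)
```

The gradient needs only the difference P - Q and the pairwise displacements, which is what makes it cheap to evaluate.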

t-distribution: a heavy-tailed function

t-SNE
(1) It uses a symmetrized version of the SNE cost function;
(2) It uses a t-distribution rather than a Gaussian to compute the similarity between two points in the low-dimensional space.
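
To illustrate point (2), the sketch below compares Gaussian map similarities with Student-t similarities (one degree of freedom, as in the t-SNE paper) on a toy one-dimensional map. The helper names and the example coordinates are assumptions for illustration.

```python
import numpy as np

def pairwise_sq_dist(Y):
    """Squared Euclidean distances between all rows of Y."""
    diff = Y[:, None, :] - Y[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def normalize_off_diagonal(W):
    """Zero the diagonal and normalize so all pairs together sum to 1."""
    W = W.copy()
    np.fill_diagonal(W, 0.0)
    return W / W.sum()

def gaussian_q(Y):
    """Gaussian map similarities, as in symmetric SNE."""
    return normalize_off_diagonal(np.exp(-pairwise_sq_dist(Y)))

def student_t_q(Y):
    """Student-t map similarities: q_ij proportional to 1 / (1 + ||y_i - y_j||^2)."""
    return normalize_off_diagonal(1.0 / (1.0 + pairwise_sq_dist(Y)))

# Toy map: points at 0, 1 and 4 on a line.
Y = np.array([[0.0], [1.0], [4.0]])
Q_gauss = gaussian_q(Y)
Q_t = student_t_q(Y)

# Similarity of the far pair relative to the near pair under each kernel.
far_vs_near_gauss = Q_gauss[0, 2] / Q_gauss[0, 1]
far_vs_near_t = Q_t[0, 2] / Q_t[0, 1]
```

The far pair keeps a much larger share of similarity under the t-distribution, so moderately distant points can sit farther apart in the map without a large penalty.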

Experimental results

Heavy-tailed SSNE
• Contributions:
• (1) Point out that various heavy-tailed embedding similarities can be characterized by their negative score functions, and present a parameterized subset of similarity functions for choosing the best tail-heaviness for HSSNE.
• (2) Present a fixed-point optimization algorithm that can be applied to all heavy-tailed functions and is free of parameter choices.

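Heavy-tailed SSNE
The low-dimensional similarities are
$$q_{ij} = \frac{H(\|y_i - y_j\|^2)}{\sum_{a \neq b} H(\|y_a - y_b\|^2)}$$
where the embedding similarity function H(\tau) can be any function that is monotonically decreasing with respect to \tau for \tau > 0.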

Heavy-tailed SSNE
$$S(\tau) = -\frac{\partial \log H(\tau)}{\partial \tau}$$
is the negative score function of H.

Embedding similarity function
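
As an illustration of a parameterized embedding similarity function, the sketch below uses the power family H(tau) = (alpha * tau + 1)^(-1/alpha). Treat this exact parameterization as an assumption about the HSSNE paper rather than something stated on the slide. The function names are hypothetical. In this family alpha = 1 gives the Student-t kernel of t-SNE and alpha -> 0 recovers the Gaussian kernel exp(-tau).

```python
import numpy as np

def heavy_tailed_similarity(tau, alpha):
    """H(tau) = (alpha * tau + 1) ** (-1 / alpha); alpha == 0 is the limit exp(-tau)."""
    tau = np.asarray(tau, dtype=float)
    if alpha == 0:
        return np.exp(-tau)
    return (alpha * tau + 1.0) ** (-1.0 / alpha)

def negative_score(tau, alpha):
    """S(tau) = -d log H(tau) / d tau = 1 / (alpha * tau + 1) for this family."""
    return 1.0 / (alpha * np.asarray(tau, dtype=float) + 1.0)

tau = np.array([0.0, 1.0, 4.0, 100.0])   # squared map distances
H_gauss = heavy_tailed_similarity(tau, alpha=0.0)
H_student = heavy_tailed_similarity(tau, alpha=1.0)
H_heavier = heavy_tailed_similarity(tau, alpha=2.0)
```

Larger `alpha` makes the tail heavier: at large `tau` the similarity decays more slowly, and the negative score function shrinks toward zero faster.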

Multiview SNE
• Simply concatenating multiview data into a long vector has three problems:
• Different statistical properties are not duly considered;
• The complementary information of different features is not well explored;
• The performance of concatenation will easily deteriorate if one or more views are corrupted by noise.

Multiview SNE
An l2-norm regularization term is added to balance the coefficients over all views, giving a new objective function for learning the optimal combination coefficients.
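
A rough sketch of how such a regularized objective might look. It assumes that the per-view joint probability matrices are combined linearly with coefficients that are non-negative and sum to 1, and that the penalty is `lam` times the squared l2 norm of the coefficients. The names `combine_views`, `regularized_objective`, and `lam`, and the toy matrices, are hypothetical; the exact objective in the m-SNE paper may differ.

```python
import numpy as np

def combine_views(P_views, alpha):
    """Combined similarities p_ij = sum_v alpha[v] * P_views[v][i, j]."""
    alpha = np.asarray(alpha, dtype=float)
    return np.tensordot(alpha, np.asarray(P_views, dtype=float), axes=1)

def regularized_objective(P_views, alpha, Q, lam=0.1, eps=1e-12):
    """KL(P || Q) for the combined P, plus lam * ||alpha||^2 (assumed penalty form)."""
    alpha = np.asarray(alpha, dtype=float)
    P = combine_views(P_views, alpha)
    mask = P > 0
    kl = np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps)))
    return float(kl + lam * np.sum(alpha ** 2))

# Two hypothetical views, each a symmetric joint-probability matrix summing to 1.
P1 = np.array([[0.0, 0.2, 0.1], [0.2, 0.0, 0.2], [0.1, 0.2, 0.0]])
P2 = np.array([[0.0, 0.1, 0.3], [0.1, 0.0, 0.1], [0.3, 0.1, 0.0]])
Q = np.full((3, 3), 1.0 / 6.0)
np.fill_diagonal(Q, 0.0)

P_mixed = combine_views([P1, P2], [0.5, 0.5])
penalty_uniform = regularized_objective([P1, P1], [0.5, 0.5], Q)
penalty_one_view = regularized_objective([P1, P1], [1.0, 0.0], Q)
```

With two identical views the KL term is the same for any valid coefficients, so the last two values differ only through the penalty, which favors spreading weight evenly across views.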

Reference
G. Hinton and S. Roweis. Stochastic neighbor embedding. NIPS03(15): 833-840.
J. Cook, I. Sutskever, A. Mnih and G. Hinton. Visualizing similarity data with a mixture of maps. In Proceedings of the 11th International Conference on Artificial Intelligence and Statistics, 2007(2): 67-74.
L. van der Maaten and G. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 2008(9): 2579-2605.
Z. Yang, I. King, Z. Xu and E. Oja. Heavy-tailed symmetric stochastic neighbor embedding. NIPS09: 2169-2177.
B. Xie, Y. Mu and D. Tao. m-SNE: Multiview stochastic neighbor embedding. ICONIP 2010, Part I, LNCS 6443: 338-346.

Thank you, teachers and fellow students!
