Word2Vec Explained Jun Xu Harbin Institute of Technology China
Word Similarity & Relatedness • How similar is pizza to pasta? • How related is pizza to Italy? • Representing words as vectors allows easy computation of similarity • Measure the semantic similarity between words • As features for various supervised NLP tasks such as document classification, named entity recognition, and sentiment analysis
What is word2vec? • word2vec is not a single algorithm • word2vec is not deep learning • It is a software package for representing words as vectors, containing: • Two distinct models • CBoW • Skip-Gram (SG) • Various training methods • Negative Sampling (NS) • Hierarchical Softmax
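As a concrete illustration of "two models, two training methods", the following sketch selects between them using the gensim package; gensim is an assumption here (the slides do not name a toolkit), and the parameter names follow gensim 4.x, where sg chooses Skip-Gram vs. CBoW and hs/negative choose the training method.

```python
# Minimal sketch using gensim (an assumption; not part of the original slides).
from gensim.models import Word2Vec

# Tiny toy corpus: a list of tokenized sentences.
sentences = [["pizza", "is", "similar", "to", "pasta"],
             ["pizza", "is", "related", "to", "italy"]]

# Skip-Gram with Negative Sampling (SGNS): sg=1, hs=0, negative=5
sgns = Word2Vec(sentences, vector_size=100, sg=1, hs=0, negative=5, min_count=1)

# CBoW with Hierarchical Softmax: sg=0, hs=1, negative=0
cbow_hs = Word2Vec(sentences, vector_size=100, sg=0, hs=1, negative=0, min_count=1)

# Word vectors make similarity a simple cosine computation.
print(sgns.wv.similarity("pizza", "pasta"))
```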
Why Hierarchical Softmax? • Turns one multinomial classification problem over the whole vocabulary into multiple binomial classification problems
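A minimal sketch of that idea: instead of one softmax over the vocabulary, the probability of a word is a product of binary (sigmoid) decisions along its path in a binary tree, one decision per inner node. The function and variable names below are illustrative assumptions, not the original word2vec implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hs_word_probability(h, path_nodes, path_codes, node_vectors):
    """P(word | context) as a product of binary decisions along the tree path.

    h            : hidden/context vector
    path_nodes   : indices of the inner nodes on the root-to-word path
    path_codes   : 0/1 branch decisions for this word (its Huffman code)
    node_vectors : one parameter vector per inner node
    """
    prob = 1.0
    for node, code in zip(path_nodes, path_codes):
        p_branch = sigmoid(np.dot(h, node_vectors[node]))
        prob *= p_branch if code == 0 else (1.0 - p_branch)
    return prob
```

Each factor is a binomial classification ("go left or right at this node"), so computing one word's probability costs O(log V) sigmoid evaluations instead of a V-way softmax.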
Why Negative Sampling? • Increases the positive samples' probability while decreasing the negative samples' probability • Hidden assumption: decreasing the negative samples' probability also increases the positive samples' probability • Right? • Maybe not! • The objective function has already changed! • Vectors: a word vector and a parameter vector, not w_in and w_out
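A minimal sketch of the SGNS objective for a single (word, context) pair, keeping the word vector and the parameter (context) vector as separate arrays, as the slide emphasizes. The names below are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_pair_objective(w_vec, c_pos, c_negs):
    """log sigma(w . c_pos) + sum_k log sigma(-w . c_neg_k)

    w_vec  : vector of the centre word (word vector)
    c_pos  : vector of the observed context word (parameter vector)
    c_negs : vectors of k sampled negative context words
    """
    obj = np.log(sigmoid(np.dot(w_vec, c_pos)))
    for c_neg in c_negs:
        obj += np.log(sigmoid(-np.dot(w_vec, c_neg)))
    return obj  # training maximizes the sum of this over all observed pairs
```

Note that this is no longer the softmax likelihood of the original model: it only contrasts the observed pair against k sampled negatives, which is exactly why the slide says the objective function has changed.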
Putting it all together • Goal: learn word vectors • Similar semantics means similar word vectors • Maximum likelihood estimation: • MLE on words • multinomial classification -> multiple binomial classifications • Hierarchical softmax • MLE on word-context pairs • Negative sampling
Rethinking Hierarchical Softmax • Huffman tree and hidden layers • Use a tree structure other than the Huffman tree • Change the way the Huffman tree is built • What is frequency?
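For reference, a minimal sketch of how the Huffman tree is built from word frequencies (the "frequency" the slide asks about is simply each word's corpus count); swapping in a different tree, or a different notion of frequency, only changes this construction step. The helper below is illustrative, not the original implementation.

```python
import heapq
from itertools import count

def build_huffman_codes(word_freqs):
    """Return {word: code}, where code is the list of 0/1 branch decisions
    from the root; word_freqs maps each word to its corpus count."""
    tiebreak = count()
    heap = [(freq, next(tiebreak), word) for word, freq in word_freqs.items()]
    heapq.heapify(heap)
    codes = {word: [] for word in word_freqs}
    members = {word: [word] for word in word_freqs}  # leaves under each node

    while len(heap) > 1:
        f1, _, n1 = heapq.heappop(heap)  # two least frequent subtrees
        f2, _, n2 = heapq.heappop(heap)
        for w in members[n1]:            # prepend this merge's branch bit
            codes[w].insert(0, 1)
        for w in members[n2]:
            codes[w].insert(0, 0)
        merged = (n1, n2)
        members[merged] = members[n1] + members[n2]
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return codes
```

Frequent words end up with short codes, i.e. short paths and fewer binomial decisions per training step.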
What is SGNS learning? “Neural Word Embeddings as Implicit Matrix Factorization” Levy & Goldberg, NIPS 2014
What is SGNS learning? • SGNS is doing something very similar to the older count-based approaches • SGNS implicitly factorizes the traditional word-context PMI matrix, shifted by log k (the number of negative samples) • So does SVD! • GloVe factorizes a similar word-context matrix
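A minimal sketch of the Levy & Goldberg observation: build the word-context PMI matrix from co-occurrence counts, shift it by log k, and factorize it. The truncated SVD below is the explicit counterpart they compare SGNS against; the function name and the use of positive (clipped) shifted PMI are illustrative assumptions.

```python
import numpy as np

def shifted_pmi_svd(cooc, k=5, dim=50):
    """cooc: word-context co-occurrence count matrix (#words x #contexts).
    Returns word vectors from an SVD of the shifted positive PMI matrix."""
    total = cooc.sum()
    p_w = cooc.sum(axis=1, keepdims=True) / total   # P(w)
    p_c = cooc.sum(axis=0, keepdims=True) / total   # P(c)
    p_wc = cooc / total                             # P(w, c)
    with np.errstate(divide="ignore"):
        pmi = np.log(p_wc / (p_w * p_c))
    spmi = np.maximum(pmi - np.log(k), 0)           # shift by log k, clip at 0
    U, S, Vt = np.linalg.svd(spmi, full_matrices=False)
    return U[:, :dim] * np.sqrt(S[:dim])            # word vectors
```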