
Dependency-Based Word Embeddings


Presentation Transcript


  1. Dependency-Based Word Embeddings Omer Levy Yoav Goldberg Bar-Ilan University Israel

  2. Neural Embeddings • Dense vectors • Each dimension is a latent feature • word2vec (Mikolov et al., 2013) • State-of-the-Art: Skip-Gram with Negative Sampling • “Linguistic Regularities”: king − man + woman ≈ queen (see “Linguistic Regularities in Sparse and Explicit Word Representations”, Friday, 2:00 PM, CoNLL 2014)
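For readers unfamiliar with the objective, here is a minimal sketch of a single SGNS update in Python, assuming two embedding matrices W (words) and C (contexts) and pre-sampled negative contexts; it illustrates the training rule, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_update(W, C, w, c_pos, c_negs, lr=0.025):
    """One stochastic update of Skip-Gram with Negative Sampling.
    W: (V, d) word matrix; C: (V, d) context matrix;
    w: word index; c_pos: observed context index;
    c_negs: indices of sampled negative contexts."""
    w_grad = np.zeros_like(W[w])
    for c, label in [(c_pos, 1.0)] + [(c, 0.0) for c in c_negs]:
        g = lr * (label - sigmoid(W[w].dot(C[c])))  # gradient scale
        w_grad += g * C[c]   # accumulate the word-vector gradient
        C[c] += g * W[w]     # update the context vector in place
    W[w] += w_grad
```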

  3. Our Main Contribution: Generalizing Skip-Gram with Negative Sampling

  4. Skip-Gram with Negative Sampling v2.0 • Original implementation assumes bag-of-words contexts • We generalize to arbitrary contexts • Dependency contexts create qualitatively different word embeddings • Provide a new tool for linguistically analyzing embeddings
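The key observation is that SGNS only ever consumes (word, context) pairs, so the pair extractor is a pluggable component. A minimal sketch of that interface (the names PairExtractor and extract_pairs are ours, for illustration):

```python
from typing import Callable, Iterable, Iterator, Tuple

# Any function mapping a sentence to (word, context) pairs will do:
# window-based, dependency-based, or anything else.
PairExtractor = Callable[[str], Iterable[Tuple[str, str]]]

def extract_pairs(corpus: Iterable[str],
                  extractor: PairExtractor) -> Iterator[Tuple[str, str]]:
    """Feed any pair extractor into the same SGNS learner."""
    for sentence in corpus:
        yield from extractor(sentence)
```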

  5. Context Types

  6. Australian scientist discovers star with telescope Example

  7. Australian scientist discovers star with telescope Target Word

  8. Australian scientist discovers star with telescope Bag of Words (BoW) Context

  9. Australian scientist discovers star with telescope Bag of Words (BoW) Context

  10. Australian scientist discovers star with telescope Bag of Words (BoW) Context
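A bag-of-words context is just the k tokens on either side of the target. A small sketch on the running example, assuming a window of k = 2:

```python
def bow_contexts(tokens, k=2):
    """Yield (word, context) pairs using a symmetric window of size k."""
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - k), min(len(tokens), i + k + 1)
        for j in range(lo, hi):
            if j != i:
                yield (word, tokens[j])

sentence = "Australian scientist discovers star with telescope".split()
pairs = list(bow_contexts(sentence, k=2))
# pairs for the target "star":
# ('star', 'scientist'), ('star', 'discovers'), ('star', 'with'), ('star', 'telescope')
```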

  11. Australian scientist discovers star with telescope Syntactic Dependency Context

  12. Australian scientist discovers star with telescope Syntactic Dependency Context: nsubj(discovers, scientist), dobj(discovers, star), prep_with(discovers, telescope)

  13. Australian scientist discovers star with telescope Syntactic Dependency Context: nsubj(discovers, scientist), dobj(discovers, star), prep_with(discovers, telescope)
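The paper extracts these contexts from Stanford dependency parses with prepositions collapsed (hence prep_with). A rough sketch of the same idea using spaCy instead, whose label set differs and which omits the collapsing step:

```python
import spacy  # assumes an English model, e.g. en_core_web_sm, is installed

nlp = spacy.load("en_core_web_sm")

def dep_contexts(sentence):
    """Yield (word, context) pairs from dependency arcs. Each arc
    head --rel--> modifier contributes (head, modifier/rel) and the
    inverse (modifier, head/rel-1), as in Levy & Goldberg (2014)."""
    for tok in nlp(sentence):
        if tok.dep_ == "ROOT":
            continue
        yield (tok.head.text, f"{tok.text}/{tok.dep_}")
        yield (tok.text, f"{tok.head.text}/{tok.dep_}-1")

pairs = list(dep_contexts("Australian scientist discovers star with telescope"))
# e.g. ('discovers', 'star/dobj') and ('star', 'discovers/dobj-1')
```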

  14. Generalizing Skip-Gram with Negative Sampling

  15. How does Skip-Gram work? • Skip-gram represents each word as a vector • Skip-gram represents each context word as a different vector • Same word has 2 different embeddings (as “word”, as “context”)

  16. How does Skip-Gram work? Text → Bag of Words Context → Word-Context Pairs → Learning

  17. How does Skip-Gram work? Text → Bag of Words Contexts → Word-Context Pairs → Learning

  18. Our Modification Text → Arbitrary Contexts → Word-Context Pairs → Learning

  19. Our Modification Modified word2vec publicly available! Text → Arbitrary Contexts → Word-Context Pairs → Learning

  20. Our Modification: Example Text → Syntactic Contexts → Word-Context Pairs → Learning

  21. Our Modification: Example Text (Wikipedia) → Syntactic Contexts → Word-Context Pairs → Learning

  22. Our Modification: Example Text (Wikipedia) → Syntactic Contexts (Stanford Dependencies) → Word-Context Pairs → Learning
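Once contexts are extracted, training reduces to feeding (word, context) pairs into the modified word2vec (released by the authors as word2vecf). A sketch of producing such input with the dep_contexts sketch above, assuming a plain-text "word context" one-pair-per-line format (an assumption worth verifying against the released tool):

```python
# Assumption: the trainer reads plain-text lines of "word context".
with open("pairs.txt", "w", encoding="utf-8") as f:
    for word, ctx in dep_contexts("Australian scientist discovers star with telescope"):
        f.write(f"{word} {ctx}\n")
```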

  23. What is the effect of different context types?

  24. What is the effect of different context types? • Thoroughly studied in explicit (distributional) representations • Lin (1998), Padó and Lapata (2007), and many others… General conclusion: • Bag-of-words contexts induce topical similarities • Dependency contexts induce functional similarities: words that share the same semantic type (cohyponyms) • Does this hold for embeddings as well?

  25. Embedding Similarity with Different Contexts • BoW neighbors: related to Harry Potter • Dependency neighbors: schools

  26. Embedding Similarity with Different Contexts • BoW neighbors: related to computability • Dependency neighbors: scientists

  27. Embedding Similarity with Different Contexts (Online Demo!) • BoW neighbors: related to dance • Dependency neighbors: gerunds

  28. Embedding Similarity with Different Contexts • Dependency-based embeddings have more functional similarities • This phenomenon goes beyond these examples • Quantitative Analysis (in the paper)
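One way to try this comparison yourself, assuming gensim is installed and two pre-trained vector files in word2vec text format; the filenames below are hypothetical placeholders:

```python
from gensim.models import KeyedVectors

bow = KeyedVectors.load_word2vec_format("bow5.words", binary=False)   # hypothetical file
deps = KeyedVectors.load_word2vec_format("deps.words", binary=False)  # hypothetical file

# BoW neighbors tend to be topical; dependency neighbors functional.
print(bow.most_similar("hogwarts", topn=5))
print(deps.most_similar("hogwarts", topn=5))
```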

  29. Quantitative Analysis Dependency-based embeddings have more functional similarities (chart comparing Dependencies, BoW (k=2), and BoW (k=5))

  30. Why do dependencies induce functional similarities?

  31. Dependency Contexts & Functional Similarity • Thoroughly studied in explicit representations (distributional) • Lin (1998), Padó and Lapata (2007), and many others… • In explicit representations, we can look at the features and analyze • But embeddings are a black box! • Dimensions are latent and don’t necessarily have any meaning

  32. Analyzing Embeddings

  33. Peeking into Skip-Gram’s Black Box • Skip-Gram allows a peek… • Contexts are embedded in the same space! • Given a word w, find the contexts c it “activates” most, i.e. those maximizing the dot product w · c
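A minimal sketch of that peek, assuming the learned word and context matrices are available after training:

```python
import numpy as np

def top_contexts(word_vec, C, context_vocab, k=10):
    """Return the k contexts with the highest activation (dot product)
    against word_vec. C is the (num_contexts, d) context matrix;
    context_vocab maps row index -> context string."""
    scores = C @ word_vec
    best = np.argsort(-scores)[:k]
    return [(context_vocab[i], float(scores[i])) for i in best]
```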

  34.–36. Associated Contexts (tables of the highest-activation contexts for example words, shown across three slides)

  37. Analyzing Embeddings • We found a way to linguistically analyze embeddings • Together with the ability to engineer contexts… • …we now have the tools to create task-tailored embeddings!

  38. Conclusion

  39. Conclusion • Generalized Skip-Gram with Negative Sampling to arbitrary contexts • Different contexts induce different similarities • Suggest a way to peek inside the black box of embeddings • Code, demo, and word vectors available from our websites • Make linguistically-motivated, task-tailored embeddings today! Thank you for listening :)
