Morphological Smoothing and Extrapolation of Word Embeddings
Ryan Cotterell, Hinrich Schütze, Jason Eisner
Words in the Lexicon are Related!
Word embeddings are already good at this:
• Relatedness: running – shoes
• Similarity: running – sprinting
• Morphology: running – ran (PAST TENSE)
Goal: Put morphology into word embeddings
Inflectional Morphology is Highly Structured
The paradigm of RUN (LEMMA): run (PRES), running (GERUND), runs (3rd PRES SG), ran (PAST TENSE)
The paradigm of SPRINT (LEMMA): sprint (PRES), sprinting (GERUND), sprints (3rd PRES SG), sprinted (PAST TENSE)
Same Structure Across Paradigms!
Even a nonce verb fits: the paradigm of WUG (LEMMA): wug (PRES), wugging (GERUND), wugs (3rd PRES SG), wugged (PAST TENSE)
Research Question
How do we exploit structured morphological knowledge in word embedding models?
A Morphological Paradigm – Strings
Verb stems × tense suffixes; blank cells were not observed in the corpus:

                 /Ø/ Pres   /s/ 3P Pres Sg   /ed/ Past    /ing/ Gerund
RUN    /run/     [run]      [runs]           [ran]        [running]
SPRINT /sprint/  [sprint]   [sprints]        [sprinted]   [sprinting]
WUG    /wug/     [wug]      [wugs]           [wugged]     [wugging]
CODE   /code/               [codes]                       [coding]
LOVE   /love/    [love]                      [loved]
BAT    /bat/                [bats]                        [bating]
PLAY   /play/                                [played]
Why is "running" written like that?
run + ing → (concatenate) → run#ing → (stochastic orthography) → running
Orthographic change: doubled n!
Modeling word forms using latent underlying morphs and phonology. Cotterell et al., TACL 2015
A Morphological Paradigm – Strings: Prediction!
The string model fills in the unobserved cells (* = predicted):

                 /Ø/ Pres   /s/ 3P Pres Sg   /ed/ Past    /ing/ Gerund
RUN    /run/     [run]      [runs]           [ran]        [running]
SPRINT /sprint/  [sprint]   [sprints]        [sprinted]   [sprinting]
WUG    /wug/     [wug]      [wugs]           [wugged]     [wugging]
CODE   /code/    [code]*    [codes]          [coded]*     [coding]
LOVE   /love/    [love]     [loves]*         [loved]      [loving]*
BAT    /bat/     [bat]*     [bats]           [bated]*     [bating]
PLAY   /play/    [play]*    [plays]*         [played]     [playing]*
Matrix Completion: Collaborative Filtering
A users × movies rating matrix; some entries are observed (e.g. -37, 29, 67, 74), others are missing.
Matrix Completion: Collaborative Filtering
Factor the matrix: each user row and each movie column is explained by a 3-dimensional latent vector (e.g. a user vector [4, 1, -5]).
Matrix Completion: Collaborative Filtering
A rating is modeled as a dot product plus Gaussian noise:
[1, -4, 3] · [-5, 2, 1] = -10, observed as -11 after noise.
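As a concrete check, here is a minimal numpy sketch of this step. The two latent vectors come from the slide; the noise scale is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

user = np.array([1, -4, 3])    # latent user vector (from the slide)
movie = np.array([-5, 2, 1])   # latent movie vector (from the slide)

true_rating = user @ movie     # 1*(-5) + (-4)*2 + 3*1 = -10
observed = true_rating + rng.normal(scale=1.0)  # noisy rating, e.g. -11

print(true_rating, round(observed, 2))
```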
Matrix Completion: Collaborative Filtering
Prediction! With the latent vectors learned, the missing entries of the rating matrix can be filled in (e.g. 59, -80, 6, 46).
Morphological Paradigm – Vectors (New: This Work)
The same stems × tenses grid (RUN, SPRINT, WUG, CODE, LOVE, BAT, PLAY × Pres, 3P Pres Sg, Past, Gerund), but each cell now holds a word2vec embedding instead of a string.
Things word2vec doesn't know…
• Words with the same stem, like "running" and "ran", are related
• Words with the same inflection, like "running" and "sprinting", are related
• Our goal: put this information into the word embeddings to improve them!
Morphological Paradigm – Vectors
Zooming in on a 2×2 corner of the grid: stems RUN and LOVE, tenses Pres and Past, one embedding per cell.
Why does "running" mean that?
Sum its morpheme vectors, then add Gaussian noise.
Morphological Paradigm – Vectors
Same offset! Within the 2×2 grid (RUN, LOVE × Pres, Past), moving from Pres to Past applies the same vector offset for every stem.
Additive Model
v(running) = v(RUN) + v(GERUND)
v(ran) = v(RUN) + v(PAST)
Relation to the Vector Offset Method
The analogy v(running) − v(sprinting) + v(sprinted) ≈ v(ran) falls out of the additive model:
1) v(running) − v(sprinting) = (RUN + GERUND) − (SPRINT + GERUND) = RUN − SPRINT
2) Adding v(sprinted) = SPRINT + PAST gives RUN + PAST
3) RUN + PAST = v(ran)
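A minimal numpy sketch of the additive model and the offset identity above; the morpheme vectors here are hypothetical draws from a Gaussian prior, not learned values:

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 200  # same dimensionality as the talk's embeddings

# Hypothetical morpheme vectors; in the model these are latent and learned.
RUN, SPRINT = rng.normal(size=(2, dim))
GERUND, PAST = rng.normal(size=(2, dim))

# Additive model: a word vector is the sum of its morpheme vectors.
running = RUN + GERUND
ran = RUN + PAST
sprinting = SPRINT + GERUND
sprinted = SPRINT + PAST

# The vector offset method recovers "ran" exactly in the noise-free case.
assert np.allclose(running - sprinting + sprinted, ran)
```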
Generating a Type Vector in the Lexicon
Step 1: Sample morpheme vectors (e.g. RUN, GERUND) from the prior
Step 2: Sample the true word vector for "running" around their sum
Step 3: Sample the observed (corpus) vector around the true one
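A sketch of the three sampling steps, assuming isotropic Gaussians; the variances are illustrative, not the paper's learned covariances:

```python
import numpy as np

rng = np.random.default_rng(2)
dim = 200

# Step 1: sample morpheme vectors from the prior.
RUN = rng.normal(scale=1.0, size=dim)
GERUND = rng.normal(scale=1.0, size=dim)

# Step 2: sample the true word vector around the sum of its morphemes.
true_running = rng.normal(loc=RUN + GERUND, scale=0.1)

# Step 3: sample the observed (corpus-estimated) vector around the true one.
observed_running = rng.normal(loc=true_running, scale=0.3)
```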
Directed Graphical Model
Morpheme vectors (RUN, SPRINT, GERUND, PAST) are parents of latent word vectors (running, ran, sprinting, sprinted), each of which generates an observed embedding.
Smoothing and Extrapolation
• All word embeddings are noisy!
• Optimization during training is incomplete
• Only a few tokens of each word are observed
• Our model smooths all of the word embeddings jointly, based on morphological information
• Note: extrapolation is extreme smoothing, for when you've never seen the word at all!
Gaussian Graphical Model
• All conditionals (probability of child given parents) are Gaussian: each vector is normally distributed around the sum of its parents
• Exact inference is always tractable (through matrix inversion)
• General framework for reasoning over word embeddings! (Not limited to morphology)
• Post-processing model = lightning fast (10 seconds for 1-best joint inference of embeddings)
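For a single word with one observation, the posterior update has a simple closed form. A minimal sketch, assuming isotropic variances (the paper's full model inverts a joint covariance over the whole graph):

```python
import numpy as np

def smooth(morpheme_sum, observed=None, var_prior=0.1, var_obs=0.3):
    """Posterior mean of a latent word vector when
    true ~ N(morpheme_sum, var_prior * I) and
    observed ~ N(true, var_obs * I).
    The variances here are illustrative assumptions."""
    if observed is None:
        # Extrapolation: no corpus vector, fall back on the morphemes.
        return morpheme_sum
    w = var_obs / (var_prior + var_obs)   # weight on the prior mean
    return w * morpheme_sum + (1 - w) * observed
```

When the word was never observed, the update degenerates to the morpheme sum, which is exactly the extrapolation case.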
Where did we get the graph structure? Answer: from morphological lexicons
Where did we get the graph structure?
The lexicon lists each form's stem and inflection (e.g. ran = RUN + PAST), and those entries give the edges of the graphical model.
Why you should care!
• You aren't going to see all the words!
• Too many, thanks to Zipf's law
• But we know some words must exist:
• Every English verb has a gerund, even if you didn't see it in a corpus
• Can we guess its meaning?
• Open-vocabulary word embeddings
• Simple to implement, to train, and to extend!
How do we learn the embeddings?
• Learn the model parameters with Viterbi EM
• E-step: simple coordinate descent (10 sec.)
• M-step: update covariance matrix
See paper for more details!
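A simplified sketch of the E-step as coordinate descent, collapsing the two noise layers into one and assuming unit isotropic variances; the `paradigms` structure and the MAP updates are illustrative, not the paper's exact formulation:

```python
import numpy as np

def viterbi_e_step(obs, paradigms, dim, n_iters=50):
    """MAP estimates of stem and tense vectors by coordinate descent.
    obs:       word -> observed embedding (numpy array)
    paradigms: word -> (stem, tense) labels from the lexicon"""
    stems = {s: np.zeros(dim) for s, _ in paradigms.values()}
    tenses = {t: np.zeros(dim) for _, t in paradigms.values()}
    for _ in range(n_iters):
        for s in stems:  # each stem: summed residual, shrunk by the prior
            r = [obs[w] - tenses[t] for w, (s2, t) in paradigms.items() if s2 == s]
            stems[s] = np.sum(r, axis=0) / (len(r) + 1)
        for t in tenses:  # symmetric update for each tense
            r = [obs[w] - stems[s] for w, (s, t2) in paradigms.items() if t2 == t]
            tenses[t] = np.sum(r, axis=0) / (len(r) + 1)
    return stems, tenses
```

The M-step would then re-estimate the covariances from the residuals; it is omitted here.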
Training Set-up
• Experimented on 5 languages: Czech, English, German, Spanish and Turkish
• Varying degrees of morphology: English → German → Spanish → Czech → Turkish
• Initial embeddings are trained on Wikipedia
• tokenized text
• skip-gram with negative sampling
• 200 dimensions
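Reproducing the initial embeddings might look like this with gensim (an assumption; the talk does not name a toolkit). Only skip-gram, negative sampling, and 200 dimensions come from the slide:

```python
from gensim.models import Word2Vec  # gensim >= 4.0 API assumed

# Toy corpus standing in for tokenized Wikipedia text.
sentences = [["she", "was", "running", "fast"],
             ["he", "ran", "home"]]

# sg=1 selects skip-gram; negative sampling and 200 dims match the talk.
# window and min_count are illustrative defaults.
model = Word2Vec(sentences, vector_size=200, sg=1, negative=5,
                 window=5, min_count=1)
vec_running = model.wv["running"]
```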
Experiment 1: Vector Prediction
Task: hold out one vector in the paradigm (e.g. ran) and predict it from the rest (running, sprinting, sprinted).
Experiment 1: Vector Prediction
• Choose the closest vector in the space under cosine distance
• Baseline: standard analogies, which also predict forms: v(running) − v(sprinting) + v(sprinted) predicts "ran"
• All details in the paper!
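A minimal sketch of the evaluation step: pick the vocabulary word whose vector is closest to the predicted vector under cosine distance. The names are illustrative:

```python
import numpy as np

def nearest_word(predicted, vocab_vecs, vocab_words):
    """Vocabulary word with the smallest cosine distance
    (largest cosine similarity) to the predicted vector."""
    v = predicted / np.linalg.norm(predicted)
    M = vocab_vecs / np.linalg.norm(vocab_vecs, axis=1, keepdims=True)
    return vocab_words[int(np.argmax(M @ v))]
```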
Experiment 2: Perplexity
• How perplexed is skip-gram on held-out data?
• Just like a standard language-model evaluation
• Question: do our smoothed and extrapolated word embeddings improve prediction?
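The axis label on the next slide suggests the metric is reported in bits, i.e. the average negative log2 probability of held-out skip-gram predictions. A minimal sketch under that reading:

```python
import numpy as np

def perplexity_bits(probs):
    """Average negative log2 probability of held-out
    (word, context) predictions, reported in bits."""
    return -np.mean(np.log2(probs))
```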
Experiment 2: Perplexity
[Plot: perplexity (bits) vs. # observed tokens, unsmoothed vs. smoothed curves]
Take-away: smoothing helps! (More with fewer tokens.) See paper for more details.
Experiment 3: Word Similarity
• Task: Spearman's ρ between human judgements and cosine between vectors
• Similarity is about lemmata, not inflected forms
• So use the latent lemma embedding!
Directed Graphical Model
Same model as before; the latent stem nodes (RUN, SPRINT) are the lemma embeddings, linked to the word embeddings (running, ran, sprinting, sprinted).
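A sketch of the evaluation, using scipy's spearmanr and a hypothetical `lemma_vec` lookup for the latent lemma embeddings:

```python
import numpy as np
from scipy.stats import spearmanr

def word_similarity_score(pairs, human_scores, lemma_vec):
    """Spearman's rho between human judgements and cosine similarity
    of the latent lemma embeddings for each word pair."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    model_scores = [cos(lemma_vec(w1), lemma_vec(w2)) for w1, w2 in pairs]
    return spearmanr(human_scores, model_scores).correlation
```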
Future Work
• Integrate morphological information with character-level models!
• Research questions:
• Are character-level models enough, or do we need structured morphological information?
• Can morphology help character-level neural networks?
Fin
Thanks for your attention!