Effectiveness of Indirect Dependency for Automatic Synonym Acquisition
Masato Hagiwara, Yasuhiro Ogawa, Katsuhiko Toyama
Graduate School of Information Science, Nagoya University
Outline • Introduction • Comparison of contextual information • Sentence co-occurrence, proximity, dependency • Indirect dependency • Formalization • Context extraction • Synonym acquisition method • Evaluation • Experiment • Conclusion
Introduction • Automatic synonym acquisition • Basic technique for automatic thesaurus/ontology construction and various NLP tasks • Based on Distributional Hypothesis [Harris 1985] • “Semantically similar words share similar contexts” • Extracts and uses various contexts of words
Contextual Information • Common approach for synonym acquisition: extract word-context co-occurrences (w, c) from a corpus, e.g. (breakfast, dobj:have:*:_), (lunch, iobj:for:go:*), (tea, dobj:have:*:_), then calculate similarity between words (using similarity measures or language models) • Similarity calculation has been studied extensively, but context extraction and selection have received little attention • [Figure: pipeline from corpus, through context extraction and selection (surrounding words (prox), sentence co-occurrence (sent), dependency (dep: subj, obj, mod, etc.)), to similarity calculation] • Effective contextual information needs to be investigated
Contextual Information • First goal: investigation of effective contexts for automatic synonym acquisition • Sentence co-occurrence (sent) • Proximity (prox) • Dependency (dep) • [Figure: the same pipeline, highlighting the context extraction and selection step]
Outline • Introduction • Comparison of contextual information • Sentence co-occurrence, proximity, dependency • Indirect dependency • Formalization • Context extraction • Synonym acquisition method • Evaluation • Experiment • Conclusion
Category(1) – Sentence co-occurrence (sent) • Sentences in which words appear are used as contexts • Assume that words that commonly appear in similar sentences are semantically similar
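A minimal sketch of sent-type context extraction in Python (the function name and input format are illustrative, not from the paper): each word is associated with the IDs of the sentences it occurs in, and each sentence ID serves as one context.

```python
from collections import defaultdict

def sentence_contexts(sentences):
    """Map each word to the set of IDs of the sentences it appears in;
    each sentence ID acts as one sent-type context of the word."""
    contexts = defaultdict(set)
    for sent_id, tokens in enumerate(sentences):
        for word in tokens:
            contexts[word].add(sent_id)
    return contexts
```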
Category(2) – Proximity (prox) • Words that appear in the vicinity of the target word • Consider a window centered at the target, and extract the words located within it • Example (window: 3 tokens on both sides of the target): Shipments have been relatively level since January, the Commerce Department noted. • Target: January → contexts L1:since, L2:level, L3:relatively, R1:,, R2:the, R3:Commerce
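A minimal sketch of the window extraction in Python, assuming tokenized input (the function name is illustrative); it reproduces the L/R-labeled contexts from the example above.

```python
def proximity_contexts(tokens, target_index, window=3):
    """Extract the words within `window` tokens of the target,
    labeled with their relative position (L1..Ln, R1..Rn)."""
    contexts = []
    for offset in range(1, window + 1):
        if target_index - offset >= 0:
            contexts.append(f"L{offset}:{tokens[target_index - offset]}")
        if target_index + offset < len(tokens):
            contexts.append(f"R{offset}:{tokens[target_index + offset]}")
    return contexts

tokens = ["Shipments", "have", "been", "relatively", "level", "since",
          "January", ",", "the", "Commerce", "Department", "noted", "."]
print(proximity_contexts(tokens, tokens.index("January")))
# ['L1:since', 'R1:,', 'L2:level', 'R2:the', 'L3:relatively', 'R3:Commerce']
```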
Category(3) – Dependency (dep) • Dependency relations between words • Example: Shipments have been relatively level since January, the Commerce Department noted. • Extracted (word, dependency) pairs include: since - (dobj * January), since - (ncmod _ note *), January - (dobj since *), the - (det Department *), Commerce - (ncmod _ Department *), Department - (det * the), Department - (ncmod _ * Commerce)
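A sketch of dep-type context extraction in Python (illustrative; the GR subtype slot shown as "_" on the slide is omitted for brevity). Each dependency triple yields one context for its head and one for its dependent, with the target word's own slot replaced by "*":

```python
def dependency_contexts(relations):
    """Turn (label, head, dependent) triples into (word, context) pairs,
    replacing the target word's own slot with '*'."""
    pairs = []
    for label, head, dep in relations:
        pairs.append((head, f"({label} * {dep})"))   # context of the head
        pairs.append((dep, f"({label} {head} *)"))   # context of the dependent
    return pairs

rels = [("dobj", "since", "January"), ("det", "Department", "the")]
for word, ctx in dependency_contexts(rels):
    print(word, ctx)
# since (dobj * January), January (dobj since *),
# Department (det * the), the (det Department *)
```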
Performance evaluation experiment • Which category of context (sent, prox, dep) is effective for synonym acquisition? • Result: Proximity (prox) performed better than the widely used Dependency (dep)
Difference of prox and dep • Why is prox better than dep? • In the example sentence, most dependency relations are contained in prox (distance ≤ 3): prox subsumes most of dep • Surrounding words with no dependency on the target may cause the performance difference • [Figure: dependency parse of "Shipments have been relatively level since January, the Commerce Department noted." with the prox-based and dep-based context regions overlaid]
Difference of prox and dep • Why is prox better than dep? • Target: Commerce in "Shipments have been relatively level since January, the Commerce Department noted." • dep yields only the direct neighbor Department, while prox also yields January, <comma>, <period>, noted, and the • What kind of syntactic relationship do these extra contexts have? • Several of them (e.g. noted, the) are reachable via two consecutive dependency relations through Department: indirect dependency • Does indirect dependency cause the performance increase? • [Figure: the prox window around Commerce, with the direct dependency (Department) and the indirect dependencies (noted, the) marked]
Objectives • Propose indirect dependency as a way to enhance the contextual information • Indirect dependency = multiple steps of dependency • Direct dependency = a single dependency step • Investigate the effectiveness of indirect dependency for automatic synonym acquisition
Outline • Introduction • Comparison of contextual information • Sentence co-occurrence, proximity, dependency • Indirect dependency • Formalization • Context extraction • Synonym acquisition method • Evaluation • Experiment • Conclusion
Indirect dependency • Direct dependency → a binary relation D over the words of a sentence • The corresponding inverse relations (aux-of, ncmod-of, dobj-of, ncsubj-of, det-of, xcomp-of, ccomp-of, ...) are also included in D • Indirect dependency → the composition D² = D ∘ D • Labels are composed as well, e.g. ncmod∘ncsubj, ncsubj∘xcomp-of • Example: Shipments have been relatively level since January, the Commerce Department noted.
Indirect dependency • Another example: The driver on the truck was looking for me. • Composed relations such as dobj∘ncmod (linking driver and truck through on) and iobj∘dobj (linking looking and me through for) capture relationships across prepositions
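A minimal sketch of the composition D² in Python, under stated assumptions: relations are (label, head, dependent) triples, and composed labels are joined in traversal order with "∘" (the paper's own label-order notation may differ). The relation set and helper names are illustrative:

```python
def add_inverses(edges):
    """Augment labeled edges (label, head, dep) with their inverses
    (label-of, dep, head), as the formalization requires."""
    return edges + [(label + "-of", dep, head) for label, head, dep in edges]

def compose(edges):
    """D2 = D o D: connect word a to word c whenever both share an
    intermediate word b, composing the two labels. a == c loops are skipped."""
    return [(l1 + "\u2218" + l2, a, c)
            for l1, a, b1 in edges
            for l2, b2, c in edges
            if b1 == b2 and a != c]

# "The driver on the truck was looking for me." (simplified relations)
direct = [("ncmod", "driver", "on"), ("dobj", "on", "truck"),
          ("iobj", "look", "for"), ("dobj", "for", "me")]
for label, a, c in compose(add_inverses(direct)):
    print(label, a, c)   # e.g. iobj∘dobj look me
```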
Outline • Introduction • Comparison of contextual information • Sentence co-occurrence, proximity, dependency • Indirect dependency • Formalization • Context extraction • Synonym acquisition method • Evaluation • Experiment • Conclusion
Context extraction for indirect dependency • Example sentence: Shipments have been relatively level since January, the Commerce Department noted. • Analyzed by RASP2 into n-ary GRs: (ncsubj be Shipment _), (aux be have), (xcomp _ be level), (ncmod _ be relatively), (ccomp _ level note), (ncmod _ note since), (ncsubj note Department _), ... • Direct (word, context C1) pairs: Shipment - (ncsubj be * _), be - (ncsubj * Shipment _), have - (aux be *), be - (aux * have), be - (xcomp _ * level), level - (xcomp _ be *), ...
Context extraction for indirect dependency • An indirect context C2 is built by substitution: in a C1 context, replace the intermediate word with one of that word's own C1 contexts • e.g. ncsubj∘xcomp-of: substituting be's context (xcomp _ * level) into Shipment's context (ncsubj be * _) gives Shipment - (ncsubj (xcomp _ * level) * _) • Likewise, xcomp∘ncsubj-of gives level - (xcomp _ (ncsubj * Shipment) *) • Cn (n ≥ 3) is generated similarly
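The substitution step can be sketched in Python as a naive string replacement (illustrative only; a real implementation would operate on the parsed GR structures rather than strings):

```python
def compose_contexts(c1_context, intermediate, intermediate_context):
    """Build a C2 context by replacing the intermediate word inside a
    C1 context with one of that word's own C1 contexts."""
    return c1_context.replace(intermediate, intermediate_context, 1)

# Shipment's direct context + be's direct context -> Shipment's indirect context
print(compose_contexts("(ncsubj be * _)", "be", "(xcomp _ * level)"))
# (ncsubj (xcomp _ * level) * _)
```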
Outline • Introduction • Comparison of contextual information • Sentence co-occurrence, proximity, dependency • Indirect dependency • Formalization • Context extraction • Synonym acquisition method • Evaluation • Experiment • Conclusion
Synonym acquisition • Used the commonly used combination of vector space model, tf.idf weighting, and cosine similarity • Vector construction: each word is represented by a vector of its context co-occurrences, weighted with standard tf.idf, $\mathrm{tfidf}(w, c) = \mathrm{tf}(w, c) \cdot \log\frac{N}{\mathrm{df}(c)}$ • Similarity calculation: cosine similarity, $\mathrm{sim}(\mathbf{v}_1, \mathbf{v}_2) = \frac{\mathbf{v}_1 \cdot \mathbf{v}_2}{\|\mathbf{v}_1\|\,\|\mathbf{v}_2\|}$ • Language models and similarity measures are not within the scope of this study
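A compact sketch of this pipeline in Python, assuming the input is a list of (word, context) co-occurrence pairs (the function names and the exact idf variant are assumptions; the slide only states that tf.idf and cosine similarity are used):

```python
import math
from collections import Counter

def tfidf_vectors(pairs):
    """Build tf.idf-weighted sparse context vectors from a list of
    (word, context) co-occurrence pairs."""
    tf = Counter(pairs)                      # tf(w, c): raw co-occurrence count
    df = Counter(c for _, c in set(pairs))   # df(c): distinct words taking c
    n_words = len({w for w, _ in pairs})
    vectors = {}
    for (w, c), freq in tf.items():
        vectors.setdefault(w, {})[c] = freq * math.log(n_words / df[c])
    return vectors

def cosine(v1, v2):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v2.get(c, 0.0) for c, x in v1.items())
    norm = math.hypot(*v1.values()) * math.hypot(*v2.values())
    return dot / norm if norm else 0.0
```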
Outline • Introduction • Comparison of contextual information • Sentence co-occurrence, proximity, dependency • Indirect dependency • Formalization • Context extraction • Synonym acquisition method • Evaluation • Experiment • Conclusion
Evaluation measures • Measure(1) – Average precision (AP) • Precision values averaged over 11 recall points • Based on the "reference set" created from three existing thesauri: WordNet, Roget's, and the COBUILD thesaurus • Measure(2) – Correlation coefficient (CC) • Correlation between the "reference similarity" and the cosine similarity • Reference similarity … calculated based on the depth of word nodes in the WordNet tree structure
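A sketch of the 11-point average precision in Python, assuming the standard interpolation (precision at each recall point is the maximum precision observed at any recall at or above it); the input format is illustrative:

```python
def average_precision_11pt(ranked, reference):
    """11-point interpolated average precision.

    ranked:    candidate synonyms sorted by similarity, best first
    reference: set of correct synonyms from the reference set
    """
    hits, recall_precision = 0, []
    for rank, word in enumerate(ranked, 1):
        if word in reference:
            hits += 1
            recall_precision.append((hits / len(reference), hits / rank))
    points = []
    for r in (i / 10 for i in range(11)):        # recall = 0.0, 0.1, ..., 1.0
        at_or_above = [p for rec, p in recall_precision if rec >= r]
        points.append(max(at_or_above) if at_or_above else 0.0)
    return sum(points) / 11
```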
Outline • Introduction • Comparison of contextual information • Sentence co-occurrence, proximity, dependency • Indirect dependency • Formalization • Context extraction • Synonym acquisition method • Evaluation • Experiment • Conclusion
Experiment – Conditions • Corpora: (1) Brown Corpus (BROWN) (approx. 60,000 sentences), (2) Wall Street Journal (WSJ) (approx. 68,000 sentences), (3) WordBank (WB) (approx. 190,000 sentences) • Limited to noun synonyms • Frequency cutoff … removed words and contexts appearing fewer than θf times • θf = 5 (BROWN, WSJ), θf = 15 (WB)
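The frequency cutoff can be sketched as a single filtering pass in Python (illustrative; whether the cutoff is applied jointly or iteratively is not stated on the slide):

```python
from collections import Counter

def frequency_cutoff(pairs, theta_f):
    """Keep only (word, context) pairs whose word and context each
    appear at least theta_f times in the co-occurrence data."""
    word_freq = Counter(w for w, _ in pairs)
    ctx_freq = Counter(c for _, c in pairs)
    return [(w, c) for w, c in pairs
            if word_freq[w] >= theta_f and ctx_freq[c] >= theta_f]
```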
Experiment – Result • Compared the performance of prox, dep1, dep2, dep12, and dep123 • Drastic improvement by indirect dependency • Significantly better than prox • dep3 affects the result little • [Figure: AP and CC scores for each context set]
Experiment – Result • Compared the performance of prox, dep1, dep2, dep12, and dep123 • Consistent results across all the corpora • → Effectiveness of indirect dependency for synonym acquisition
Experiment – Comparison of data size • The number of non-zero elements of the word-context co-occurrence matrix ≈ computational complexity • dep12 … a context of good quality achieves higher performance at lower cost
Conclusion • Effectiveness of indirect dependency for automatic synonym acquisition • Indirect dependency = composition of direct dependencies • Performance improvement over direct dependency • Achieves higher performance at lower cost than prox • Future work • Confirmation using other parsers and similarity measures • Other kinds of contexts and their performance