Multiword expressions extraction Hung_H3
Multiword Expressions (MWEs)
• MWEs
• MWE extraction
• Pavel’s work
• Various tests on MWEs’ properties
Properties of MWEs
• MWEs:
  • Consist of multiple simplex words
  • Idiosyncratic
    • Lexically: ad hoc
    • Syntactically: by and large (prep conj adj?)
    • Semantically: spill the beans
    • Statistically: good morning
  • Obstacles to language understanding, translation, generation, etc.
MWE Extraction
• MWE extraction
  • Formalize the identifying properties of MWEs
  • Association measures
• Association measures of MWEs
  • Measures of statistically marked co-occurrence
  • Measures of semantic non-compositionality
  • ?
Pavel Pecina (2005, 2006, 2008): Combination of 82 association measures
• Statistical tests:
  • Mutual information
  • Statistical independence
  • Likelihood measures
• Semantic tests:
  • Entropy of immediate context
    • Immediate context: the immediately preceding/following words
  • Diversity of empirical context
    • Empirical context: words within a specified window
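As a rough illustration of two measures of the kind listed above, the sketch below computes pointwise mutual information for a word pair and the entropy of the words that immediately follow it. The function names, the toy token list, and the choice to look only at the right-hand context are assumptions made for illustration; they are not taken from Pecina's implementation.

```python
import math
from collections import Counter


def pmi(pair_count, w1_count, w2_count, total):
    """Pointwise mutual information in bits: log2(P(w1,w2) / (P(w1) * P(w2))).

    All counts are assumed to come from the same sample of `total` bigrams.
    """
    p_pair = pair_count / total
    p_w1 = w1_count / total
    p_w2 = w2_count / total
    return math.log2(p_pair / (p_w1 * p_w2))


def right_context_entropy(tokens, bigram):
    """Entropy (bits) of the words that immediately follow `bigram` in `tokens`."""
    followers = Counter(
        tokens[i + 2]
        for i in range(len(tokens) - 2)
        if (tokens[i], tokens[i + 1]) == bigram
    )
    total = sum(followers.values())
    if total == 0:
        return 0.0  # the bigram never occurs with a following word
    return -sum((c / total) * math.log2(c / total) for c in followers.values())


# Toy data, invented for this example only.
toy_tokens = "good morning all good morning sir good morning all".split()
toy_entropy = right_context_entropy(toy_tokens, ("good", "morning"))
toy_pmi = pmi(pair_count=10, w1_count=10, w2_count=10, total=1000)
```

A high PMI means the pair co-occurs far more often than independence would predict. Context entropy describes how varied the pair's neighbours are; the slides list it as a semantic test but do not say how its value should be interpreted.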
Pavel Pecina (2005, 2006, 2008): Combination of 82 association measures
• Result:
  • MAP (mean average precision): 80.81%
  • “Equivalent” measures 17 measures
• Issues:
  • Possibly conflicting predictions
  • Is combining all measures best?
  • Unclear linguistic and/or statistical significance
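The slides do not say how the MAP figure was computed. For orientation, here is a minimal sketch of the standard definition of mean average precision over ranked candidate lists. The function names and the toy rankings are assumptions, and the evaluation protocol behind the 80.81% result may differ in its details.

```python
def average_precision(ranked_labels):
    """Average precision of one ranking.

    `ranked_labels` lists the candidates from best to worst score,
    with True marking a genuine MWE.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_mwe in enumerate(ranked_labels, start=1):
        if is_mwe:
            hits += 1
            precision_sum += hits / rank  # precision at this true positive
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings):
    """Mean of the average precision over several rankings (e.g. data folds)."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Toy rankings, invented for this example only.
toy_rankings = [[True, False, True], [False, True]]
toy_map = mean_average_precision(toy_rankings)
```

Because each true MWE contributes the precision at its own rank, a measure that pushes true MWEs towards the top of the list scores higher than one that merely includes them somewhere.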
Other linguistics-driven tests
• Are MWEs lexically fixed?
  • Substitutability test (Lin, 1999)
• Are MWEs order-specific?
  • Permutation entropy (PE) (Zhang et al., 2006)
  • Entropy of Permutation and Insertion (EPI) (Villavicencio et al., 2008)
    • Not all permutations are valid
    • “Permutation” “Syntactic variants”
• Others…?
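The idea behind permutation entropy can be sketched as follows: count how often each reordering of the candidate's words occurs, normalize the counts into a distribution, and take its entropy. This is a simplified reading of the test, not the exact formulation of the cited paper; `count_fn` is a hypothetical frequency lookup and the counts below are invented.

```python
import math
from itertools import permutations


def permutation_entropy(words, count_fn):
    """Entropy (bits) of the frequency distribution over orderings of `words`.

    `count_fn` maps a tuple of words to its corpus frequency (hypothetical).
    Values near 0 suggest that one word order dominates.
    """
    orderings = set(permutations(words))  # the set drops duplicate orderings
    counts = [count_fn(order) for order in orderings]
    total = sum(counts)
    if total == 0:
        return 0.0  # no ordering was observed at all
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# Invented counts: a fixed-order expression and a freely ordered word pair.
toy_counts = {
    ("by", "and", "large"): 100,
    ("red", "car"): 50,
    ("car", "red"): 50,
}


def toy_count_fn(order):
    return toy_counts.get(order, 0)


fixed_pe = permutation_entropy(("by", "and", "large"), toy_count_fn)
free_pe = permutation_entropy(("red", "car"), toy_count_fn)
```

In this toy setup the fixed expression gets an entropy of zero because only one ordering is attested, while the evenly split pair gets one bit. As the slide notes, not every permutation is a valid syntactic variant, which is a limitation of counting raw reorderings like this.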
References
• T. Baldwin. 2004. Multiword Expressions tutorial.
• D. Lin. 1999. Automatic identification of non-compositional phrases. In ACL 1999.
• P. Pecina and P. Schlesinger. 2006. Combining association measures for collocation extraction. In ACL 2006.
• P. Pecina. 2008. A machine learning approach to Multiword Expression extraction. In MWE Shared task 2008.
• A. Villavicencio et al. 2008. An Evaluation of Methods for the Extraction of Multiword Expressions. In MWE Shared task 2008.
• Y. Zhang et al. 2006. Automated multiword expression prediction for grammar engineering. In ACL 2006.
• Wikipedia: http://www.wikipedia.org/
Hung_H3 Thank you!