Automatic Idiom Recognition
Ajit Paul Singh
Dept. of Computing Science, University of Alberta
Motivation
• Goal: automatically tag idioms in English-language text
• Why?
  • Example: “break the ice”
  • Machine translation (MT): a literal translation loses the meaning
  • Information retrieval: the phrase has nothing to do with ice
  • Malapropisms: idioms resemble them but are valid phrases (Hirst)
  • Word-sense disambiguation
Reference: [1]
Approaches
• Statistical
  • e.g. mutual information
• Grammatical (rules to detect idioms)
  • HPSG encodings (Erbach 1992; Riehemann 1997, 2001)
  • Link grammars (Sleator & Temperley 1991)
  • Probabilistic CFGs
References: [2, 3, 4, 5]
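To make the statistical approach concrete, here is a minimal sketch of pointwise mutual information (PMI) over adjacent word pairs. It is an illustration only: the function name `bigram_pmi` is hypothetical, the probabilities are plain relative frequencies with no smoothing, and this is not necessarily the measure used in [1].

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Return {(w1, w2): PMI in bits} for every adjacent word pair.

    Assumption: probabilities are unsmoothed relative frequencies,
    which is only reasonable for a toy illustration.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi           # joint probability of the pair
        p_x = unigrams[w1] / n_uni    # marginal probability of w1
        p_y = unigrams[w2] / n_uni    # marginal probability of w2
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Toy usage: pairs that co-occur more often than chance score higher.
text = "they tried to break the ice but the ice would not break".split()
for pair, score in sorted(bigram_pmi(text).items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

A high PMI only signals that two words co-occur more often than chance. It does not by itself show that the phrase is non-compositional, which is part of why the grammatical approaches are also considered.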
Proposal
• Examine supervised learning of grammatical models from tagged corpora
  • North American News Text Corpora (415 million words of newspaper articles)
  • Penn Treebank
• Evaluate the different grammatical models on idiom detection
Process & Evaluation
• How to learn and validate grammatical rules
  • Input: a set of idioms and examples
  • Output: a grammar-based description of the idioms
  • Validation:
    • For a given corpus, find all instances of idiom I and its variants.
    • Parse the corpus and mark instances of idiom I and its variants.
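The validation step can be sketched as a comparison between two sets of idiom instances. The slides do not name a metric, so precision and recall are an assumption here. The surface matcher `find_instances` is a hypothetical stand-in for the reference annotation, and the "parser output" is invented toy data, not a result.

```python
def find_instances(sentences, verb_forms, noun, max_gap=3):
    """Naive surface matcher (hypothetical): a verb form followed by the
    noun within max_gap tokens. Returns {(sentence_idx, verb_idx, noun_idx)}.
    It misses variants such as the passive "the ice was broken".
    """
    hits = set()
    for s_idx, tokens in enumerate(sentences):
        for i, tok in enumerate(tokens):
            if tok.lower() in verb_forms:
                window = tokens[i + 1 : i + 1 + max_gap]
                for j, nxt in enumerate(window, start=i + 1):
                    if nxt.lower() == noun:
                        hits.add((s_idx, i, j))
                        break
    return hits


def precision_recall(gold, predicted):
    """Compare predicted idiom instances against reference instances."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy usage with idiom I = "break the ice".
corpus = [
    "She broke the ice quickly".split(),
    "The ice melted".split(),
    "He breaks the proverbial ice".split(),
]
break_forms = {"break", "breaks", "broke", "broken", "breaking"}
reference = find_instances(corpus, break_forms, "ice")

# Hypothetical instances marked by a grammar-based parser (made-up data).
marked_by_parser = {(0, 1, 3), (1, 0, 1)}
print(precision_recall(reference, marked_by_parser))
```

Representing each instance as a (sentence, verb position, noun position) tuple keeps the comparison to a set intersection. A real evaluation would need a matching criterion that also covers discontinuous and passive variants.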
References
[1] D. Lin. Automatic Identification of Non-compositional Phrases. Proceedings of ACL-99, pp. 317-324.
[2] G. Erbach. Head-Driven Lexical Representation of Idioms in HPSG. Proceedings of the Intl. Conference on Idioms, Tilburg (NL), 1992.
[3] S. Riehemann. Idiomatic Constructions in HPSG. 1997.
[4] S. Riehemann. A Constructional Approach to Idioms and Word Formation. Thesis, Stanford CSLI, 2001.
[5] D.D.K. Sleator and D. Temperley. Parsing English with a Link Grammar. Technical Report CMU-CS-91-196.