Emad Soliman Mohamed Nawfal
Department of Linguistics
Multiword Expressions: A Pain in the Neck for NLP
Words jargon grudge hours in computer linguistics. Without a solution to this problem, the area was radically comfort and does not stop hunger.
This is the Google translation of the Arabic sentence:
العبارات الاصطلاحية غصة في حلق اللغويات الحاسوبية, وبدون حل هذه المشكلة حلا جذريا فإن المجال لا يسمن ولا يغني من جوع.
It should have been translated as: "Multiword expressions are a pain in the neck for Computational Linguistics. Without a radical solution, the field is simply useless."
No, this is not how I speak English.
Multiword expressions ---> words jargon
Pain in the neck ---> grudge hours
Useless ---> radically comfort and does not stop hunger
The literal meaning of the Arabic for the last phrase is: "The field will neither fatten you nor even drive away your hunger."
Everyone is invited to think about what the problems are. So, what's going on?
What are MWEs and why are they important?
• MWEs are expressions with idiosyncratic meanings that cross word boundaries. The English idiom “kick the bucket” means to die; its meaning has nothing to do with kicking or with buckets. Because the meaning of such an expression cannot be computed from the meanings of its parts, MWEs pose a threat to the principle of compositionality.
• In the lexical database WordNet 1.7, 41% of the entries are multiword.
• Even this is an underestimate, since specialized domain vocabulary overwhelmingly consists of MWEs.
What kinds of problems might be involved?
• If MWEs are treated by general compositional methods of linguistic analysis, two problems arise: overgeneration and the idiomaticity problem.
• Overgeneration: The system will correctly generate telephone booth or telephone box, but it might also generate perfectly compositional but unacceptable examples such as telephone cabinet, telephone closet, etc.
• The idiomaticity problem: How can we predict that an expression like kick the bucket, which appears to conform to the grammar of English VPs, has a meaning unrelated to the meanings of kick, the, and bucket?
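To make the two problems concrete, here is a toy sketch in Python. The noun list, the set of attested phrases, and the idiom table are hypothetical; the lexicon lookups illustrate the simplest remedy (listing MWEs explicitly), not a method proposed in the sources.

```python
# Toy illustration of overgeneration and idiomaticity under a purely
# compositional treatment. All word lists below are hypothetical.

# Nouns a compositional rule might treat as interchangeable heads of
# "telephone + N" (hypothetical set of small-enclosure nouns).
ENCLOSURE_NOUNS = ["booth", "box", "cabinet", "closet"]

# Institutionalized phrases that speakers actually use (hypothetical lexicon).
ATTESTED_MWES = {("telephone", "booth"), ("telephone", "box")}

# Idiomatic meanings must be listed; they cannot be computed from the parts.
IDIOM_MEANINGS = {("kick", "the", "bucket"): "die"}


def compositional_generate(modifier: str, heads: list[str]) -> list[tuple[str, str]]:
    """Generate every modifier + head combination the grammar allows."""
    return [(modifier, head) for head in heads]


def filter_with_lexicon(candidates, lexicon):
    """Keep only the combinations listed in the MWE lexicon, in order."""
    return [c for c in candidates if c in lexicon]


def interpret_vp(words: tuple[str, ...]) -> str:
    """Return a listed idiomatic meaning if one exists, else a naive literal gloss."""
    if words in IDIOM_MEANINGS:
        return IDIOM_MEANINGS[words]
    return "+".join(words)  # naive compositional meaning


if __name__ == "__main__":
    generated = compositional_generate("telephone", ENCLOSURE_NOUNS)
    print(generated)  # includes the unacceptable telephone cabinet / closet
    print(filter_with_lexicon(generated, ATTESTED_MWES))
    print(interpret_vp(("kick", "the", "bucket")))  # die
    print(interpret_vp(("kick", "the", "ball")))    # kick+the+ball
```

The generator alone overgenerates; only the explicit list rules out telephone cabinet, and only the idiom table knows that kick the bucket means "die".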
Is there a way out?
• Both statistical and linguistic models are being explored.
• The LinGO project at Stanford University is employing a linguistic technique within the HPSG (Head-driven Phrase Structure Grammar) formalism.
• Expressions such as kick the bucket and part of speech, in which only one word inflects (kicked the bucket, parts of speech), can be listed as single lexical entries. The entry for part of speech looks like this:
    part_of_speech_1 := intr_noun_l &
      [ STEM < "part", "of", "speech" >,
        INFL-POS "1",
        SEMANTICS [ KEY part_of_speech_rel ] ].
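The entry above lists part of speech as a single "word with spaces" and, as we read it, uses INFL-POS "1" to mark the first word as the one that inflects. The Python sketch below mimics that idea outside any grammar formalism. The MultiwordEntry class, the matches and find_mwes functions, and the tiny inflection table are hypothetical illustrations, not part of the LinGO grammar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MultiwordEntry:
    """Hypothetical words-with-spaces entry: a fixed sequence of stems plus
    the 1-based position of the single word allowed to inflect (our reading
    of INFL-POS)."""
    name: str
    stem: tuple[str, ...]
    infl_pos: int


# Hypothetical inflection table: surface form -> base form.
# A real system would use a morphological analyser instead.
INFLECTED_FORMS = {"parts": "part", "kicked": "kick", "kicks": "kick", "kicking": "kick"}

LEXICON = [
    MultiwordEntry("part_of_speech_1", ("part", "of", "speech"), infl_pos=1),
    MultiwordEntry("kick_the_bucket_1", ("kick", "the", "bucket"), infl_pos=1),
]


def matches(entry: MultiwordEntry, tokens: list[str]) -> bool:
    """True if tokens spell out the entry, allowing inflection only at infl_pos."""
    if len(tokens) != len(entry.stem):
        return False
    for i, (token, stem) in enumerate(zip(tokens, entry.stem), start=1):
        token = token.lower()
        if i == entry.infl_pos:
            # The inflecting slot may appear in any listed inflected form.
            if INFLECTED_FORMS.get(token, token) != stem:
                return False
        elif token != stem:
            # Every other word must appear exactly as listed.
            return False
    return True


def find_mwes(sentence: str) -> list[tuple[int, str]]:
    """Scan a whitespace-tokenized sentence; return (start index, entry name) pairs."""
    tokens = sentence.split()
    hits = []
    for start in range(len(tokens)):
        for entry in LEXICON:
            window = tokens[start:start + len(entry.stem)]
            if matches(entry, window):
                hits.append((start, entry.name))
    return hits


if __name__ == "__main__":
    print(find_mwes("Nouns and verbs are parts of speech"))
    print(find_mwes("The old man kicked the bucket"))
    print(find_mwes("The old man kicked the buckets"))  # no match: only 'kick' may inflect
```

Treating the whole expression as one entry blocks overgeneration for free, at the cost of not handling variants such as kick the proverbial bucket.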
Is there a way out?
• Sag suggests that linguistic rules should be used in combination with frequency information about both semantic relations and construction rules, insofar as they contribute to semantic interpretation.
• Sag also mentions a potentially viable approach (Johnson et al. 1999) to developing probabilistic grammars based on feature structures, e.g. Head-driven Phrase Structure Grammar and Lexical Functional Grammar.
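Johnson et al. (1999) attach probabilities to the analyses produced by a feature-structure grammar with a log-linear model: each analysis with feature vector f gets the score exp(w · f), normalized over all analyses of the same sentence. The sketch below shows only that scoring step, in the spirit of Sag's suggestion of combining linguistic choices (idiom entry vs. literal reading) with frequency-derived weights. The feature names and weights are invented for illustration, and estimating the weights from data is not shown.

```python
import math

# Hypothetical feature weights. A trained model would estimate these from a
# treebank; here they are made up for illustration.
WEIGHTS = {
    "uses_idiom_entry": 1.5,          # assumed frequency evidence for the idiom
    "rel_kick_bucket_literal": -0.5,  # assumed rarity of literal bucket-kicking
    "num_rules": -0.2,                # mild preference for smaller derivations
}


def score(features: dict[str, float]) -> float:
    """Linear score w . f for one candidate analysis."""
    return sum(WEIGHTS.get(name, 0.0) * value for name, value in features.items())


def log_linear_probs(candidates: dict[str, dict[str, float]]) -> dict[str, float]:
    """Normalize exp(w . f) over the candidate analyses of one sentence."""
    scores = {name: score(f) for name, f in candidates.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


if __name__ == "__main__":
    # Two hypothetical analyses of "he kicked the bucket".
    analyses = {
        "idiomatic (die)": {"uses_idiom_entry": 1, "num_rules": 2},
        "literal (kick a pail)": {"rel_kick_bucket_literal": 1, "num_rules": 4},
    }
    for name, p in log_linear_probs(analyses).items():
        print(f"{name}: {p:.3f}")
```

The grammar still decides which analyses exist; the weights only rank them, which is the division of labour between rules and frequency information described above.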
What else?
• Some researchers are using techniques from bioinformatics to solve the idiom and collocation problems.
• Instead of searching amino acid sequences, the algorithms can be used to find related words and phrases, even when they are separated by other linguistic units.
• This is especially useful because bioinformatics algorithms take mutations into consideration.
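The slide does not say which bioinformatics algorithms are meant. Local sequence alignment (Smith-Waterman), used to compare protein and DNA sequences while tolerating insertions, deletions, and substitutions, is one standard technique that fits the description; the Python sketch below applies it to words instead of amino acids. This choice of algorithm and the scoring parameters (match, mismatch, gap) are assumptions for illustration.

```python
def local_alignment(pattern: list[str], text: list[str],
                    match: int = 2, mismatch: int = -1,
                    gap: int = -1) -> tuple[int, int, int]:
    """Smith-Waterman local alignment of a word pattern against a word text.

    Returns (best score, start index in text, end index in text, exclusive).
    Gaps let the pattern match even when extra words are inserted, the
    analogue of a 'mutation' in a biological sequence.
    """
    rows, cols = len(pattern) + 1, len(text) + 1
    h = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if pattern[i - 1] == text[j - 1] else mismatch
            diag = h[i - 1][j - 1] + s   # align pattern word with text word
            up = h[i - 1][j] + gap       # pattern word skipped
            left = h[i][j - 1] + gap     # extra text word inserted
            h[i][j] = max(0, diag, up, left)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    end = j
    while i > 0 and j > 0 and h[i][j] > 0:
        s = match if pattern[i - 1] == text[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + s:
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, j, end


if __name__ == "__main__":
    idiom = "kick the bucket".split()
    sentence = "they say he will kick the proverbial bucket soon".split()
    score, start, end = local_alignment(idiom, sentence)
    print(score, " ".join(sentence[start:end]))  # 5 kick the proverbial bucket
```

The inserted word proverbial costs one gap penalty instead of breaking the match, which is exactly what an exact-string lexicon lookup cannot do.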
Finally
• Thank you for listening.
• This presentation is based mainly on information from:
  • http://citeseer.ist.psu.edu/sag01multiword.html
  • www.sci.wsu.edu/math/faculty/bkrishna/FilesMath574/Projects/AndyNLP.pdf
  • translate.google.com