Machine Translation of Persian Complex Predicates
Jan W. Amtrup, Kofax Image Products
Karine Megerdoomian, MITRE
Complex Predicates
• Predicates composed of more than one grammatical element but behaving as a simple predicate
• Persian Verbal Predicates
• Consist of preverbal element(s) and a light verb behaving as a single semantic unit
  غصّه خوردن
  worry eat: ‘to worry’
  خجالت کشیدن
  shame pull: ‘to be ashamed’
  شانه زدن
  comb hit: ‘to comb’
  از دست دادن
  from hand give: ‘to lose’
[Figure: approaches to MT between Source Language and Target Language: Direct Translation, Transfer, Interlingua]
Why word-for-word translation won’t work
• Compositional meaning
  دست مانی شروع کرد درد گرفتن
  hand of Mani start did pain catching
  ‘Mani’s hand started hurting’
• Ambiguity
  [بگفته آسوشيتد پرس] [شمار بيکاران افزايش] يافته است
  [To saying Associated Press] [number unemployed increase] found is
  ‘According to the Associated Press, the number of the unemployed has increased.’
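The first example above can be reproduced with a toy glossary lookup. This is only an illustrative sketch: the glossary entries, the informal romanization of the Persian words, and the function name `word_for_word` are assumptions made for this example, not part of the system described in the slides.

```python
# Toy word-for-word glossary. Entries and the informal romanization are
# assumptions for illustration; they mirror the gloss line on the slide.
GLOSSARY = {
    "dast-e": "hand of",
    "Mani": "Mani",
    "shoru": "start",
    "kard": "did",
    "dard": "pain",
    "gereftan": "catching",
}


def word_for_word(tokens):
    """Translate each token in isolation, keeping the source word order."""
    return " ".join(GLOSSARY.get(token, f"<{token}>") for token in tokens)


sentence = ["dast-e", "Mani", "shoru", "kard", "dard", "gereftan"]
print(word_for_word(sentence))
```

The output is the gloss "hand of Mani start did pain catching" rather than ‘Mani’s hand started hurting’: nothing in a token-by-token lookup knows that "start did" and "pain catching" each form a single predicate.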
Solutions for Machine Translation
• List each light verb in the lexicon as an atomic unit
Problems with the atomic approach
• Intervening elements
  کشورهای اسلامی خواستار نقش فزاينده سازمان ملل در عراق شدند
  countries-ez islamic requester role increasing United Nations in Iraq became
  ‘The Islamic countries requested an increasing United Nations role in Iraq.’
• Internal modification
  قیمت نفت افزایش شدیدی یافت.
  price oil increase intense-Indef found
  ‘The price of oil increased intensely.’
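A minimal sketch of why the atomic approach breaks on internal modification, assuming the atomic lexicon is matched as a contiguous token sequence. The lexicon entry, the romanized tokens, and the unmodified sentence `plain` are constructed for contrast and are not taken from the slides.

```python
# Atomic lexicon: each complex predicate is stored as one fixed token
# sequence (hypothetical entry, informal romanization).
ATOMIC_LEXICON = {
    ("afzayesh", "yaft"): "increased",
}


def find_atomic(tokens):
    """Return the translation of the first contiguous lexicon match, else None."""
    for key, translation in ATOMIC_LEXICON.items():
        width = len(key)
        for start in range(len(tokens) - width + 1):
            if tuple(tokens[start:start + width]) == key:
                return translation
    return None


# Constructed unmodified variant: preverbal noun and light verb are adjacent.
plain = ["gheymat", "naft", "afzayesh", "yaft"]
# The slide's example: the modifier "shadidi" splits the predicate.
modified = ["gheymat", "naft", "afzayesh", "shadidi", "yaft"]

print(find_atomic(plain))     # a match is found
print(find_atomic(modified))  # no match: the unit is no longer contiguous
```

The same failure applies to the first example on the slide, where a whole noun phrase stands between the preverbal element and the light verb.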
Solutions for Machine Translation
• List each light verb in the lexicon as an atomic unit
• Treat light verbs as a special case of subcategorization
Issues with subcategorization approach
• Productivity: even though the individual parts of a construction are present in the lexicon, the semi-productive creation of novel verbs is missed
  دانلود کردن
  download do
  هک کردن
  hack do
  فیلتر شدن
  filter become
  پارازيت زدن
  parasite hit
  لینک دادن
  link give
• Lexicon size: each entry has to be represented
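The productivity problem can be sketched with a static, subcategorization-style lexicon in which each light verb lists the preverbal elements it is known to select. All names and entries here (`NOUNS`, `LIGHT_VERB_FRAMES`, `translate_static`, the romanized forms) are hypothetical and chosen only to illustrate the point.

```python
# Hypothetical static lexicon (informal romanization). The nouns are listed,
# and each light verb records the preverbal elements it selects.
NOUNS = {"shane": "comb", "danlod": "download", "hak": "hack"}
LIGHT_VERB_FRAMES = {
    "zadan": {"shane": "to comb"},
    "kardan": {},  # no frames recorded yet for the newer borrowings
}


def translate_static(preverb, light_verb):
    """Succeed only if this exact combination was listed in advance."""
    return LIGHT_VERB_FRAMES.get(light_verb, {}).get(preverb)


print(translate_static("shane", "zadan"))    # listed combination
print(translate_static("danlod", "kardan"))  # novel verb: None
```

Both "danlod" and "kardan" are in the lexicon, yet the novel combination is not translated until someone adds a frame for it, and every such combination needs its own entry.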
Solutions for Machine Translation
• List each light verb in the lexicon as an atomic unit
• Treat light verbs as a special case of subcategorization
These are static approaches to lexicon architecture.
How about using a constructionist approach to the lexicon?
Constructionist Lexicon
• Dynamic view of word formation
• Surface words are not atomic units but have internal structure
• Meaning is composed by combining components of words
• This view is predominant in theoretical linguistics, but also attracts attention in computational paradigms (e.g. Fong et al. 2001, Fujita et al. 2004)
Inchoative
[Figure: structure tree in which the template Become<> is filled as Become<dry>, with "clothes" as its argument]
  لباسها خشک شدند
  clothes dry became
  ‘The clothes dried.’
Causative
[Figure: structure tree in which "cause" embeds become<> filled as become<dry>, with "the heat of the sun" and "the clothes" as arguments]
  گرمای آفتاب لباسها را خشک کرد
  heat-ez sun clothes OM dry made
  ‘The heat of the sun dried the clothes.’
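The relation between the Inchoative and Causative structures can be sketched as nested templates. The class names `Become` and `Cause`, the `fill` method, and the printed notation are assumptions made for this sketch; the slides only show the trees, not an implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Become:
    """Template with one open slot, written Become<> until it is filled."""
    state: Optional[str] = None

    def fill(self, state):
        return Become(state)

    def __str__(self):
        return f"Become<{self.state or ''}>"


@dataclass(frozen=True)
class Cause:
    """Causative layer wrapped around a Become structure."""
    result: Become

    def __str__(self):
        return f"Cause({self.result})"


inchoative = Become().fill("dry")  # 'The clothes dried'
causative = Cause(inchoative)      # 'The heat of the sun dried the clothes'

print(inchoative)
print(causative)
```

The causative reuses the inchoative structure unchanged, so the two readings share one entry for "dry" and differ only in the layer contributed by the light verb.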
Activity
[Figure: structure tree in which Act combines with an <EventiveNoun> slot filled by "flight", with "the plane" as argument; Adj: to New York]
  هواپیما به نيويورک پرواز کرد
  plane to New York flight did
  ‘The plane flew to New York.’
Repetitive Activity
[Figure: structure tree in which Act<repetitive> combines with With<> filled as With<comb>, with "Hossein" and "his/her dog" as arguments]
  حسین سگش را شانه زد
  Hossein dog-his/her OM comb hit
  ‘Hossein combed his/her dog.’
The compositional approach
• Lexicon lists only the necessary parts, not compositions
• Theoretically motivated
• Accounts for novel usage
• Allows modeling of closely related verbs: "the window broke" vs. "John broke the window"
• Ability to handle separable light verb constructions
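A minimal sketch of the compositional idea, assuming a parts-only lexicon plus a small rule table keyed on the light verb and the category of the preverbal element. The entries, the category assigned to "danlod", the rule table, the output notation, and the placeholder token "MOD" are all assumptions for illustration and not the authors' formalization.

```python
# Parts-only lexicon (hypothetical entries, informal romanization).
PREVERBS = {
    "khoshk": ("Adj", "dry"),
    "parvaz": ("EventiveNoun", "flight"),
    "danlod": ("EventiveNoun", "download"),  # category is an assumption
}
LIGHT_VERBS = {"shod", "kard"}

# Composition rules: (light verb, preverb category) -> structure template.
RULES = {
    ("shod", "Adj"): "Become<{}>",
    ("kard", "Adj"): "Cause(Become<{}>)",
    ("kard", "EventiveNoun"): "Act<{}>",
}


def compose(tokens):
    """Pair the last light verb with the nearest compatible preverbal
    element to its left, skipping over any intervening tokens."""
    for v in range(len(tokens) - 1, -1, -1):
        if tokens[v] not in LIGHT_VERBS:
            continue
        for p in range(v - 1, -1, -1):
            entry = PREVERBS.get(tokens[p])
            if entry and (tokens[v], entry[0]) in RULES:
                category, meaning = entry
                return RULES[(tokens[v], category)].format(meaning)
    return None


print(compose(["lebas-ha", "khoshk", "shod"]))  # inchoative
print(compose(["lebas-ha", "khoshk", "kard"]))  # causative, same parts
print(compose(["danlod", "kard"]))              # novel verb, no new entry
print(compose(["khoshk", "MOD", "shod"]))       # separated by a modifier
```

Here "MOD" stands for any intervening material; this sketch simply skips it, whereas a real system would also have to attach the modifier to the resulting structure.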
Conclusion
• Discussed some lexical issues in Persian MT
• Presented a structure of the lexicon based on linguistic theory
• Presented computational formalization and implementation of the lexical structure of certain Persian light verbs
• Advantages:
  • Correct translation without listing each complex verb in the lexicon
  • Smaller vocabulary size
  • Facilitates multilingual translation
  • Interlingua – but based on linguistic theory
  • Easy handling of separable light verbs