Hindi - English Synset Linkage
Resource Center for Indian Language Technology Solutions, http://www.cfilt.iitb.ac.in
Computer Science and Engineering Department, IIT Bombay
Outline
• Introduction
• Issues in Linkage
• Linkage Problem due to POS Mismatch
• Linkage Problem due to Hindi Constructions like Causative Verbs
• Linkage Problem due to Idiomaticity
• Linkage Problem due to Culture-Specific Words
• Linkage Problem due to Multiwords (MW)
Introduction
• Linkage between Hindi WordNet (HWN) 1.2 and English WordNet (EWN) 2.1
• Two types of linkage: i) direct linkage, ii) hypernymy linkage
• Direct linkage: synsets that have exact equivalents in the English WordNet are linked directly. For example, आम (aama), आम वृक्ष (aama vriksha) is linked to the English synset mango, mango tree - (large evergreen tropical tree cultivated for its large oval fruit).
Hypernymy Linkage
• Synsets that cannot be linked directly to an English concept are linked through hypernymy linkage. For example, the Hindi synsets of चाचा (chaachaa, paternal uncle) and मामा (maamaa, maternal uncle) are linked to the English synset of uncle through hypernymy linkage.
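The two linkage types can be sketched as a small data structure. This is only an illustration built from the examples on the slides: the names `LinkType`, `SynsetLink` and `link_synset` are hypothetical and are not part of the Hindi WordNet, the English WordNet, or the linking tool.

```python
from dataclasses import dataclass
from enum import Enum


class LinkType(Enum):
    DIRECT = "direct"        # exact English equivalent exists
    HYPERNYMY = "hypernymy"  # linked to a more general English synset


@dataclass(frozen=True)
class SynsetLink:
    hindi_words: tuple
    english_words: tuple
    link_type: LinkType


def link_synset(hindi_words, exact_match=None, hypernym=None):
    """Prefer a direct link, fall back to a hypernymy link, else return None."""
    if exact_match is not None:
        return SynsetLink(tuple(hindi_words), tuple(exact_match), LinkType.DIRECT)
    if hypernym is not None:
        return SynsetLink(tuple(hindi_words), tuple(hypernym), LinkType.HYPERNYMY)
    return None  # no usable English synset: the synset stays unlinked


# Examples taken from the slides.
mango = link_synset(["आम", "आम वृक्ष"], exact_match=["mango", "mango tree"])
chaachaa = link_synset(["चाचा"], hypernym=["uncle"])
maamaa = link_synset(["मामा"], hypernym=["uncle"])
unlinked = link_synset(["आल आउट"])  # no corresponding English synset
```

Both चाचा and मामा end up pointing at the same English synset, which is exactly the information loss that a hypernymy link accepts in exchange for coverage.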
Issues in Linkage
• Some synsets cannot be linked because no corresponding English synset is available in EWN.
• Nouns, for example: आल आउट, ऑल आउट, आल आऊट, ऑल आऊट - एक पक्ष के सभी खिलाड़ियों के आउट होने की क्रिया "आज भारतीय क्रिकेट टीम 195 पर ही आल आउट हो गई"
  aala aauta - eka paksha ke sabhii khilaaDiyon ke aauta hone kii kriyaa "aaja bhaaratiiya kriketa tiima 195 para hii aala aauta ho gaii"
  All out: the act of all players of a side being out. "Today the Indian cricket team was all out for 195."
Issues in Verb Linkage
• Verbs: synsets such as the following cannot be linked. For example: स्तब्ध होना, निश्चेष्ट होना, जड़ होना, बधिर होना, बहरा होना, बहिरा होना - संवेदनाशून्य होना "सब कुछ समाप्त हो चुका है यह खबर सुनकर वह पूर्ण रूप से स्तब्ध हो गया"
  stabdha honaa, nishchesta honaa, jaDa honaa, badhira honaa, baharaa honaa, bahiraa honaa - sanvedanaashoonya honaa "saba kuchha samaapta ho chukaa hai yaha khabara sunakara vaha poorNa roopa se stabdha ho gayaa"
  To be shocked: to be numbed. "On hearing the news that everything was finished, he was completely shocked."
Issues in Adjective Linkage
• Adjectives: hypernymy linkage is not possible. For example: एकल - जिसमें एक पक्ष में केवल एक खिलाड़ी हो "नडाल ने एकल स्पर्धा के फाइनल में प्रवेश किया"
  ekala - jisamen eka paksha men kevala eka khilaaDii ho "nadaala ne ekala spardha ke phainala men pravesh kiya"
  Single: that which has one player on each side. "Nadal entered the final of the singles competition."
Issues in Adverb Linkage
• Adverbs: these also cannot be linked through hypernymy. For example: के कारण, की वजह से, के चलते, के मारे - किसी कारण से "तेज़ बारिश के कारण मैं भीग गया"
  ke kaaraNa, kii vajaha se, ke chalate, ke maare - kisii kaaraNa se "teja baarisha ke kaaraNa main bhiiga gayaa"
  Due to, because of, on account of (a conjunction or preposition in English, so it cannot be linked). "Due to heavy rain I got wet."
Linking Problem due to POS Mismatch 1/4
• बरखास्त, बरख़ास्त, बरख़्वास्त, बरख्वास्त, बर्खास्त, विसर्जित, समाप्त - (अधिवेशन, बैठक, सभा आदि के संबंध में) समाप्त किया हुआ या जिसका विसर्जन हो चुका हो "बरखास्त सभा कल सुबह दस बजे पुनः प्रारंभ होगी" (Adjective)
  barakhaasta, visarjita, samaapta - (adhiveshana, baithaka, sabhaa aadi ke sambandha men) samaapta kiyaa huaa yaa jisakaa visarjana ho chukaa ho "barakhaasta sabhaa kala subaha dasa baje punah praarambha hogii"
  Adjourned: that which has ended (not given as an adjective in English dictionaries; given as the past participle of the verb). "The adjourned meeting will restart tomorrow at ten o'clock."
Linking Problem due to POS Mismatch 2/4
• बनाम, के विरुद्ध, के ख़िलाफ़, के खिलाफ़, के खिलाफ - किसी के प्रति या विरुद्ध "यह दावा माधवसिंह बनाम बेनीसिंह दायर हुआ है" (Adverb)
  banaama, ke viruddha, ke khilaafa - kisii ke prati yaa viruddha "yaha daavaa maadhavasinha banaama beniisinha daayara huaa hai"
  Versus, against: towards or against someone (translates as 'versus', a preposition in English). "This legal action has been lodged against Benisinha by Madhavasinha."
Linking Problem due to POS Mismatch 3/4
• के माध्यम से, के द्वारा, के ज़रिए, के जरिए, के मार्फ़त, के मारफ़त, के मार्फत, के मारफत - किसी के द्वारा या किसी से "मैं आपको अपने मित्र के माध्यम से कुछ रुपए भेज देता हूँ" (Adverb)
  ke maadhyama se, ke dvaaraa, ke jarie, ke marfata - kisii ke dvaaraa yaa kisii se "mai aapako apane mitra ke maadhyama se kuchha rupae bheja detaa hoon"
  Through someone (translates as 'through' in English, which is a preposition). "I will send you some money through my friend."
Linking Problem due to POS Mismatch 4/4
• के लिहाज से - के आधार पर या को देखते हुए "जनसंख्या के लिहाज से विश्व में भारत का दूसरा स्थान है" (Adverb)
  ke lihaaja se - ke aadhaara para yaa ko dekhate hue "janasankhyaa ke lihaaja se vishva men bhaarata ka doosaraa sthaana hai"
  In terms of, as per. "In terms of population, India is second in the world." 'In terms of' is not found in EWN.
• ओर से, तरफ से - किसी की ओर या तरफ से "हरभजन मुंबई की ओर से खेलेंगे" (Adverb)
  ora se, tarafa se - kisii kii ora yaa tarafa se "harabhajana mumbaii kii ora se khelenge"
  On behalf of. "Harbhajan will play for Mumbai." 'On behalf of' is not found in EWN.
Linking Problem due to Hindi Constructions like Causative Verbs
• बनवाना - दाढ़ी या बाल कटवाना या पूरी तरह से निकलवा देना "मैंने नाई से दाढ़ी बनवाई" (Verb)
  banavaanaa - daadhii yaa baala katavaanaa yaa poorii taraha se nikalavaa denaa "mainne naaii se daadhii banavaaii"
  To get a trim or shave: to get the hair or beard cut, or to get it shaved off completely. "I got my beard shaved by the barber."
• तुड़वाना, तोड़वाना, तुड़ाना, टोरवाना, तोरवाना - कोई वस्तु आदि को या उसका भाग तोड़ने का काम दूसरे से कराना "माँ राधे से लकड़ियाँ तुड़वा रही है" (Verb)
  tuDavaanaa, toDavaanaa, tuDaanaa, toravaanaa - koii vastu aadi ko yaa usakaa bhaaga toDane kaa kaama doosare se karavaanaa "maan raadhe se lakaDiyaan tuDavaa rahii hai"
  Cause to break: to get an object or a part of it broken by someone else. "Mother is getting the wood broken by Radhe."
Linking Problem due to Idiomaticity
• हाथ पर हाथ धरे बैठना, हाथ पर हाथ रखकर बैठना, हाथ पर हाथ रखे बैठना, खाली बैठना - कुछ न करना, ऐसे ही पड़े रहना "आप हाथ पर हाथ धरे बैठे हैं इससे कुछ नहीं होनेवाला" (Verb)
  haatha para haatha dhare baithanaa, khaalii baithanaa - kuchha na karanaa, aise hii paDe rahanaa "aap haatha para haatha dhare baithe hain isase kuchha nahiin honevaalaa"
  Literally: hand on hand hold sit, empty sit
  To sit idle: not do anything (does not form a single concept in English, therefore cannot be linked). "You are sitting idle; that is not going to help."
• पेट पालना - जैसे-तैसे गुजर-बसर करना "वह किसी तरह अपना पेट पाल रहा है" (Verb)
  peta paalanaa - jaise taise gujara-basara karanaa "vaha kisii taraha apanaa peta paala rahaa hai"
  Literally: stomach nourish/nurture
  To make ends meet: to survive somehow. "He is somehow making ends meet."
Linking Problem due to Culture-Specific Words
• केंद्र, केन्द्र - जन्मकुंडली में ग्रहों का पहला, चौथा, सातवाँ और दसवाँ स्थान "ज्योतिषी जी केंद्र को शुभ फलदायी बता रहे हैं" (Noun)
  kendra - janmakundalii mai grahon kaa pahalaa, chauthaa, saatavaan aura dasavaan sthaana "jyotishii jii kendra ko shubha faladaayii bataa rahe hain"
  Centre: the first, fourth, seventh and tenth positions of the planets in the horoscope. "The astrologer says that the centre is auspicious and fruitful."
• नागपुरी - नागपुर का या नागपुर से संबंधित "नागपुरी संतरे प्रसिद्ध हैं" (Adjective)
  naagapurii - naagapura kaa yaa naagapura se sambandhita "naagapurii santare prasiddha hain"
  Nagpuri: of or related to Nagpur. "Nagpuri oranges are famous."
Linking Problem due to MW
• खातेदार, खाता धारक, अकाउंट होल्डर, एकाउंट होल्डर, अकाउन्ट होल्डर, एकाउन्ट होल्डर - खाता खोलने वाला व्यक्ति "खातेदार के खाते में कम से कम एक हज़ार रुपए अवश्य होने चाहिए" (Noun)
  khaatedaara, khaataa dhaaraka - khaataa kholane vaalaa vyaktii "khaatedaara ke khaate men kama se kama eka hajaara rupae avashya hone chaahie"
  Account holder: the person who opens an account. "The account holder must have at least one thousand rupees in his/her account."
Linking Problem due to MW (contd.)
• अविश्वास प्रस्ताव - सरकार को पराजित या कमजोर करने की उम्मीद से विपक्ष के द्वारा या शायद ही कभी तत्कालीन समर्थकों द्वारा संसद के सामने पारंपरिक रूप से रखा गया एक संसदीय प्रस्ताव "विपक्षियों ने सरकार के सामने अविश्वास प्रस्ताव रखा है" (Noun)
  avishvaasa prastaava - sarakaara ko paraajita yaa kamajora karane kii ummiida se vipaksha ke dvaaraa yaa shaayada hii kabhii tatkaaliina samarthakon dvaaraa sansada ke saamane paarampaarika roopa se rakhaa gayaa eka sansadiiya prastaava "vipakshiyon ne sarakaara ke saamane avishvaasa prastaava rakhaa hai"
  No-confidence motion: a motion of non-confidence (alternatively vote of non-confidence, censure motion, no-confidence motion, or confidence motion) is a parliamentary motion traditionally put before a parliament by the opposition in the hope of defeating or weakening a government or, rarely, by an erstwhile supporter who has lost confidence in the government. The motion is passed or rejected by means of a new parliamentary vote (a vote of non-confidence). "The opposition has placed a no-confidence motion before the government."
Migrating from PWN 2.1 to PWN 3.0
• Issue:
  • PWN 3.0 has a larger pool of concepts than PWN 2.1
  • Hence, in several cases, linking to PWN 3.0 will be easier
• Proposed Solution:
  • Migration of linked synsets from PWN 2.1 to PWN 3.0
  • Upgrading the PWN database in the Hindi-English WordNet Linking tool from 2.1 to 3.0
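One way the migration step could look, assuming a mapping from PWN 2.1 synset identifiers to PWN 3.0 identifiers is available. The slides do not say how such a mapping is obtained, and every identifier below is a made-up placeholder; `migrate_links` is a hypothetical helper, not part of the linking tool.

```python
def migrate_links(links_21, id_map_21_to_30):
    """Re-point existing Hindi -> PWN 2.1 links at PWN 3.0 synsets.

    links_21:        dict of Hindi synset id -> PWN 2.1 synset id
    id_map_21_to_30: dict of PWN 2.1 synset id -> PWN 3.0 synset id
    Returns (migrated links, Hindi ids that need manual review).
    """
    migrated = {}
    needs_review = []
    for hindi_id, pwn21_id in links_21.items():
        pwn30_id = id_map_21_to_30.get(pwn21_id)
        if pwn30_id is None:
            # No known 3.0 counterpart: leave it for a lexicographer.
            needs_review.append(hindi_id)
        else:
            migrated[hindi_id] = pwn30_id
    return migrated, needs_review


# Placeholder identifiers, invented for this example only.
links_21 = {"hwn-0001": "pwn21-a", "hwn-0002": "pwn21-b", "hwn-0003": "pwn21-c"}
id_map_21_to_30 = {"pwn21-a": "pwn30-x", "pwn21-b": "pwn30-y"}

migrated, needs_review = migrate_links(links_21, id_map_21_to_30)
```

Keeping the unmapped synsets in a separate review list, rather than dropping them, means that links made against 2.1 are never silently lost during the upgrade.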
Problems due to POS Mismatch and Some Idiomatic Cases
• Issue:
  • POS mismatch: the corresponding sense in English may have a different POS
  • Idioms: in particular cases the corresponding sense in English may be available, but with a different POS
• Proposed Solution:
  • If linking is allowed across POSs, the tool can be adjusted accordingly to link across POSs
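A rough sketch of what cross-POS linking could mean for candidate lookup, using the बरखास्त (adjective) versus adjourn (verb) case from the POS-mismatch slides. The index layout, the labels in it, and the function `find_candidates` are assumptions made for illustration; they do not describe how the linking tool is actually implemented.

```python
def find_candidates(lemma, hindi_pos, english_index, allow_cross_pos=False):
    """Return English candidates as (pos, label) pairs for a lemma.

    Same-POS candidates are always preferred. Candidates with a different
    POS are returned only when allow_cross_pos is True and no same-POS
    candidate exists.
    """
    entries = english_index.get(lemma, [])
    same_pos = [entry for entry in entries if entry[0] == hindi_pos]
    if same_pos or not allow_cross_pos:
        return same_pos
    return entries


# Toy index: English lemma -> list of (pos, label). Labels are illustrative.
english_index = {"adjourn": [("verb", "adjourn (verb sense)")]}

strict = find_candidates("adjourn", "adjective", english_index)
relaxed = find_candidates("adjourn", "adjective", english_index, allow_cross_pos=True)
```

With the strict setting the Hindi adjective finds no candidate at all; with cross-POS linking enabled, the verb sense is offered, and the returned POS makes the mismatch visible to whoever confirms the link.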