Hindi to English Wordnet Linkage: Challenges and Solutions • Jaya Saraswati, Rajita Shukla, Ripple P. Goyal, Pushpak Bhattacharyya
Introduction • Linking of the Hindi wordnet (version 1.2) to the Princeton WordNet (version 2.1) • Scenario in a multilingual country like India • 22 official languages and hundreds of dialects • Several language families: Indo-European (Indo-Aryan), Dravidian, Austro-Asiatic, Tibeto-Burman
Roadmap • 1. Need for Linkage • 2. Challenges in Linkage • 3. Solutions • 4. WN Synset Linkage Tool • 5. Statistics
Need for Linkage • Creation of bilingual dictionaries • NLP tasks such as Machine Translation and Cross-Lingual Information Retrieval • Word Sense Disambiguation even in the absence of sense-tagged corpora in the target language • Creation of a wide wordnet grid of shared concepts
Challenges in Linkage • Kinship Relations • Musical Instruments • Kitchen Utensils • Tools • Species • Grains • Castes • Occupations • Wages • Women denoting Caste and Occupation
Kitchen Utensils • Specific utensils: डोंगा (dongaa - bowl); कटोरदान (katoradaan - container) • Size difference: कलछा (kalachhaa - big ladle); कलछी (kalachhii - small ladle)
Tools • No exact matches in English for very specific kinds of tools: कनखोदनी (kanakhodanii); अंकुसी (ankusii) • Size difference: खुर्पा (khurpaa - big spud); खुर्पी (khurpii - small spud)
Species • English WordNet does not always have synsets for the male and female of a species: मेंढक (meⁿḍhaka - male frog); मेंढकी (meⁿḍhakii - female frog) • Conversely, Hindi has separate synsets for the species and the male of the species where English has only one: शेर (śera - denoting the species tiger); शेर (śera - denoting the male tiger)
Grains • Millet: ज्वार (jwaara); बाजरा (baajaraa); मँड़ुआ
Castes • लुहार (luhaara – a member of the caste of the ironsmiths) • धोबी (dhobi - a member of the caste of people who wash clothes)
Occupations • लुहारी (luhaarii - occupation/work of an ironsmith) • सुनारी (sunaarii - occupation/work of a goldsmith)
Wages • ढुलाई (dhulaaii - wages for carrying/transporting) • पुताई (putaaii - wages for house painting)
Women denoting Caste and Occupation • Women of various castes: धोबिन (dhobina - a woman belonging to the caste of the washermen) • Wives of men from a certain caste or profession: धोबिन (dhobina - wife of a washerman)
Solutions • Two kinds of linkages: • Direct Linkage for synsets having exact equivalents in English • Hypernymy Linkage for synsets which cannot be linked directly to English concepts
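The two kinds of linkage can be pictured as a small data structure. The following is only a minimal sketch: the names `LinkType`, `SynsetLink`, and `make_link` are hypothetical and are not taken from the authors' linkage tool. It assumes the linker already knows whether an exact English equivalent exists.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LinkType(Enum):
    DIRECT = "direct"        # an exact English equivalent exists
    HYPERNYMY = "hypernymy"  # linked to a more general English concept


@dataclass(frozen=True)
class SynsetLink:
    hindi: str           # Hindi synset head word
    english: str         # English synset head word it is linked to
    link_type: LinkType


def make_link(hindi: str, exact: Optional[str],
              hypernym: Optional[str]) -> Optional[SynsetLink]:
    """Prefer a direct link; otherwise fall back to a hypernymy link.

    Returns None when neither an equivalent nor a hypernym is supplied,
    i.e. the synset stays unlinked (hypothetical behaviour).
    """
    if exact is not None:
        return SynsetLink(hindi, exact, LinkType.DIRECT)
    if hypernym is not None:
        return SynsetLink(hindi, hypernym, LinkType.HYPERNYMY)
    return None


# Examples drawn from the slides: the species sense of शेर links directly
# to tiger, while चाचा has no exact equivalent and links to uncle.
direct = make_link("शेर", exact="tiger", hypernym=None)
general = make_link("चाचा", exact=None, hypernym="uncle")
```

Keeping the link type on each record means downstream tasks can treat hypernymy links as approximate rather than exact translations.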
Solutions (contd.) • Examples of hypernymy linkage: • चाचा (caacaa) and मामा (maamaa) - to be linked to uncle • तबला (tabalaa) etc. - to be linked to drum • डोंगा (dongaa) - to be linked to tableware • कनखोदनी (kanakhodanii) - to be linked to tool • ज्वार (jwaara), बाजरा (baajaraa) - to be linked to millet
Solutions (contd.) • Terms denoting caste – to be linked to jati • Terms denoting professions – to be linked to occupation • Terms denoting remunerations – to be linked to wage • Terms for women of various castes – to be linked to jati • Terms for wives of men belonging to various castes and occupations - to be linked to wife
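The category rules above amount to a lookup from a term's semantic category to the English concept that receives the hypernymy link. This is a hedged sketch only: the category labels and the names `CATEGORY_TARGETS` and `hypernym_target` are illustrative assumptions, while the target concepts (jati, occupation, wage, wife) come from the slide.

```python
# Hypothetical category labels mapped to the English targets named on the slide.
CATEGORY_TARGETS = {
    "caste": "jati",
    "profession": "occupation",
    "remuneration": "wage",
    "woman_of_caste": "jati",
    "wife_of": "wife",
}


def hypernym_target(category: str) -> str:
    """Return the English concept a term of this category is linked to."""
    try:
        return CATEGORY_TARGETS[category]
    except KeyError:
        raise ValueError(f"no linkage rule for category: {category!r}")


# Terms from the earlier slides, tagged by hand with the assumed categories.
# धोबिन appears twice because it has two senses with different targets.
examples = [
    ("लुहार", "caste"),
    ("लुहारी", "profession"),
    ("ढुलाई", "remuneration"),
    ("धोबिन", "woman_of_caste"),
    ("धोबिन", "wife_of"),
]
links = [(term, hypernym_target(cat)) for term, cat in examples]
```

Because the same word can fall under two categories, the lookup is keyed on the sense's category rather than on the word itself.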
Solutions (contd.) • Size differentiation in tools and utensils • Direct linkage for the more popular term (as in खुर्पी khurpii) • Hypernymy linkage to be used for the other (as in खुर्पा khurpaa) • Species and the male of the species • Direct linkage for the term denoting the species (शेर śera - linked to tiger) • Hypernymy linkage to be used to denote the male (शेर śera - again linked to tiger)
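The size-differentiation rule can be sketched as follows. The slide does not say how "more popular" is measured, so this example assumes a usage count per term. The counts below are invented placeholders, `link_size_pair` is a hypothetical name, and the English target "spud" is inferred from the glosses on the Tools slide.

```python
from typing import Dict, Tuple


def link_size_pair(usage_counts: Dict[str, int],
                   english: str) -> Dict[str, Tuple[str, str]]:
    """Give the most used term a direct link and the rest hypernymy links.

    usage_counts maps each Hindi size variant to an assumed popularity score.
    All variants point at the same English concept; only the link kind differs.
    """
    if not usage_counts:
        return {}
    most_popular = max(usage_counts, key=usage_counts.get)
    return {
        term: ("direct" if term == most_popular else "hypernymy", english)
        for term in usage_counts
    }


# Placeholder counts, chosen only so that खुर्पी comes out as the more popular term.
result = link_size_pair({"खुर्पी": 10, "खुर्पा": 3}, "spud")
```

The species case follows the same pattern: both senses of शेर point at tiger, with the species sense linked directly and the male sense linked by hypernymy.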
Conclusions • Linking of the Hindi wordnet to the English wordnet • The challenges therein • The solutions: a strategy of using direct and hypernymy linkages • This strategy helps in maximizing linkages