Linkage of Language Specific Synset Resource Center for Indian Language Technology Solutions http://www.cfilt.iitb.ac.in Computer Science and Engineering Department, IIT Bombay
Outline • What is a language-specific synset (LSS)? • Why LSSs are needed • Linkage of LSSs • Problems related to linkage • Solutions to the problems
What is a language-specific synset? • A language-specific synset (LSS) is a synset built on a concept that exists only in a particular language and has no conceptual match in other languages. e.g., सेल रोटी (selarotii) in Nepali: a ring-shaped, deep-fried sweet roti made of rice flour.
Why LSSs are needed • LSSs are needed to capture the following types of lexical items in a particular language, so that the uniqueness of the language is retained. • Lexical Uniqueness • Lexical Gap • Cultural Gap • Pragmatic Gap • Lexical Mismatch
Lexical Uniqueness • Every language possesses unique lexical items that refer to concepts and ideas for which no conceptual equivalents are available in other languages. e.g., भुत्या (Bhutyaa) in Marathi: a devotee of Bhavaanidevii.
Lexical Gap • This refers to the lack of lexical equivalence between two or more languages: the meanings of words in one language do not exactly fit the meanings of words in the other. e.g., "challenge" (in English): there is no word, phrase, or multiword expression in Bangla that conveys its meaning.
Cultural Gap • A cultural gap may originate from socio-cultural differences between language communities. A particular language community may observe socio-cultural rites, rituals, festivals, practices, etc. that are not known to the members of another language community. e.g., राजा (raajaa): a unique socio-cultural ritual practiced by Oriya language groups.
Pragmatic Gap • This is caused by differences in lexicalization between languages. The basic concept is known to both languages but is not expressed in the same manner: one language expresses it in a single lexicalized form, while the other expresses it as a multiword expression (a phrase, an idiom, etc.). e.g., भानवस, भाणवस (Bhaanavasa) in Marathi, expressed in Hindi by the multiword चूल्हे का पाट: a platform behind the village cooking stove.
Lexical Mismatch • This is a linguistic phenomenon in which a lexical item refers to a particular concept in one language, while the same lexical item refers to a different concept in another language. e.g., शिक्षा (Shikshaa) means "punishment" in Marathi, while शिक्षा (Shikshaa) means "education, preachment, moral", etc. in Hindi.
Linkage of LSS • Words expressing language-specific concepts are selected by a particular language group. The group creates synsets for these concepts in its own language and, in parallel, creates Hindi synsets for the same concepts. • LSSs created in this manner are sent to IITB. The HWN group verifies the Hindi synsets and corrects grammatical and other errors. Duplicate synsets are deleted. • After verification and correction, the synsets are sent back to the language group, which checks whether the corrected Hindi synsets are right. • If the language group gives the green signal, the synsets are loaded into the repository along with their relations.
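The linkage steps above can be sketched as a small status workflow. This is only an illustrative sketch, not the actual IITB tooling: the class `LSS`, the `Status` values, and the functions `verify` and `approve_and_load` are hypothetical names, and treating two synsets as duplicates when their Hindi glosses match exactly is a simplifying assumption (the slides do not say how duplicates are detected).

```python
from dataclasses import dataclass
from enum import Enum, auto


class Status(Enum):
    SUBMITTED = auto()  # created by the language group and sent for checking
    VERIFIED = auto()   # Hindi synset checked and corrected by the HWN group
    DUPLICATE = auto()  # dropped during verification
    LOADED = auto()     # approved by the language group and stored


@dataclass
class LSS:
    """Hypothetical record: a language-specific synset plus its Hindi synset."""
    language: str
    words: list[str]
    gloss: str
    hindi_words: list[str]
    hindi_gloss: str
    status: Status = Status.SUBMITTED


def verify(batch, known_hindi_glosses):
    """HWN-side step: mark duplicates, mark everything else as verified.

    Assumption: an exact match on the Hindi gloss counts as a duplicate.
    """
    seen = set(known_hindi_glosses)
    for lss in batch:
        key = lss.hindi_gloss.strip()
        if key in seen:
            lss.status = Status.DUPLICATE
        else:
            seen.add(key)
            lss.status = Status.VERIFIED
    return batch


def approve_and_load(batch, repository, green_signal):
    """Load verified synsets that the language group has approved.

    `green_signal` is a callable standing in for the language group's review.
    """
    for lss in batch:
        if lss.status is Status.VERIFIED and green_signal(lss):
            repository.append(lss)
            lss.status = Status.LOADED
    return repository
```

A synset that fails the language group's review simply stays in the `VERIFIED` state, so it can be corrected and reviewed again before loading.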
Problems in linkage • Duplicate synsets may occur, since a concept may also exist in other languages without the lexicographer being familiar with it. • Linking lexical relations, e.g., the antonymy relation. • Linking LSSs through the hypernymy-hyponymy relation.
Solutions • Duplicate synsets will be nulled, as is already being done. • An interface will be created to link lexical relations such as antonymy. • Brijesh will suggest a solution for the third problem.
Linking of WordNets • Language-specific synsets: culture-specific concepts (Food Items, Places, Traditions). Same concept in different languages? • Lexical gap: Kashmiri does not have a lexeme for "water"; however, there is a lexeme for "drinking water". Modification in hierarchy?
Creating Language Specific Synsets • Use the hypernym to describe the concept in the gloss • Try to distinguish between co-hyponyms • Define the domain (Food Items, Places, etc.) • Translate the gloss into Hindi and English
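The guidelines above can be illustrated with a small record type. This is a sketch under stated assumptions: `LSSEntry`, its fields, the gloss template, and `undistinguished_cohyponyms` are hypothetical names chosen for illustration, not part of any existing WordNet tool. The gloss is built from the hypernym plus a distinguishing description, and a helper flags co-hyponyms whose descriptions do not tell them apart.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LSSEntry:
    """Hypothetical entry following the four guidelines on the slide."""
    language: str
    lemma: str
    hypernym: str       # parent concept that anchors the gloss
    differentia: str    # what separates this concept from its co-hyponyms
    domain: str         # e.g. "Food Items", "Places"
    gloss_hi: str = ""  # Hindi translation of the gloss, supplied by the lexicographer

    def gloss(self) -> str:
        # Assumed template: name the hypernym first, then the distinguishing part.
        return f"a kind of {self.hypernym}: {self.differentia}"


def undistinguished_cohyponyms(entries):
    """Return groups of lemmas that share a hypernym and an identical description."""
    groups = defaultdict(list)
    for entry in entries:
        key = (entry.hypernym, entry.differentia.strip().lower())
        groups[key].append(entry.lemma)
    return [lemmas for lemmas in groups.values() if len(lemmas) > 1]


# Example from the slides: the Nepali सेल रोटी (selarotii).
sel_roti = LSSEntry(
    language="Nepali",
    lemma="सेल रोटी",
    hypernym="roti",
    differentia="ring-shaped, deep-fried, sweet, made of rice flour",
    domain="Food Items",
)
print(sel_roti.gloss())
```

Keeping the hypernym as a separate field, rather than only inside the gloss text, is one way to make the later hypernymy-hyponymy linking step mechanical.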
Common Concept Hierarchy • Uncle (common concept), with the language-specific kinship terms beneath it: Kaka, Mama, Doddappa, Chikkappa
Common Index: creating a common concept hierarchy for all languages • Use the concept hierarchy of Hindi as the starting point • Add concepts and modify the hierarchy for each language • Translate glosses into Hindi and English to compare synsets of two different languages.
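One possible shape for such a common index is sketched below. The class `ConceptHierarchy` and its methods are hypothetical, and the concept identifiers are made up for illustration. The sketch stores one shared tree of concepts and records, per concept, which languages have a lexeme for it, so that a lexical gap (such as the Kashmiri "water" example from the slides) shows up as a missing entry rather than a missing node.

```python
class ConceptHierarchy:
    """Hypothetical shared concept tree with per-language lexemes."""

    def __init__(self):
        self.parent = {}   # concept id -> parent concept id (None for a root)
        self.lexemes = {}  # concept id -> {language: [words]}

    def add_concept(self, cid, parent=None):
        """Add a concept under an existing parent (or as a root)."""
        if parent is not None and parent not in self.parent:
            raise KeyError(f"unknown parent concept: {parent}")
        self.parent.setdefault(cid, parent)
        self.lexemes.setdefault(cid, {})

    def add_lexeme(self, cid, language, word):
        """Record that `language` lexicalizes concept `cid` as `word`."""
        self.lexemes[cid].setdefault(language, []).append(word)

    def children(self, cid):
        return sorted(c for c, p in self.parent.items() if p == cid)

    def has_lexical_gap(self, cid, language):
        """True if the concept exists in the index but the language has no word for it."""
        return language not in self.lexemes[cid]


index = ConceptHierarchy()

# Kinship example from the slides: language-specific terms under a common concept.
index.add_concept("uncle")
for term in ("kaka", "mama", "doddappa", "chikkappa"):
    index.add_concept(term, parent="uncle")

# Lexical-gap example from the slides: Kashmiri has a lexeme for
# "drinking water" but not for "water". The word itself is left as a
# placeholder because the slides do not give it.
index.add_concept("water")
index.add_concept("drinking_water", parent="water")
index.add_lexeme("drinking_water", "Kashmiri", "<Kashmiri lexeme>")
```

With this layout, `index.has_lexical_gap("water", "Kashmiri")` reports the gap without requiring any change to the hierarchy itself, which is one possible answer to the "modification in hierarchy?" question raised earlier.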