MTP I Stage Project Presentation: Interlingual word mapping
Guided by: Prof. Pushpak Bhattacharyya
Presented by: Abhijeet Padhye
Department of Computer Science and Engineering, Indian Institute of Technology, Bombay
Presentation pathway
• Motivation
• Introduction
• Introduction to transliteration
• Syllables and their structure types
• Sonority theory
• Relation between sonority and syllables
• What is schwa?
• A sonority-theory-based syllabification module
• Results obtained
• References
Motivation
• Language is an integral part of society.
• Each language has its own structure and rules.
• Some basic concepts are common to all languages.
• These shared concepts help in processes such as transliteration, ultimately leading to better cross-lingual information retrieval (CLIR).
• We aim to exploit them for syllabification.
Problem statement
“To study phonological similarities between English, Hindi and Marathi and exploit them to achieve high-accuracy transliteration, so as to tackle problems such as out-of-vocabulary (OOV) words in Cross-Lingual Information Retrieval.”
Introduction
Concepts emphasized:
• Transliteration
• Theory of syllables
• Sonority theory
• The relation between syllables and sonority
• Theory of schwa and schwa deletion
These concepts are based mainly on the properties of sound, the driving force behind word pronunciation in any language.
Introduction to transliteration
• Transliteration is the process of phonetically “translating” named entities, such as proper nouns, from a source language to a target language.[1]
• It should be as accurate as possible.
• It faces the problem of multiple valid variants of the same word.
Basics of syllables
• “A syllable is a unit of spoken language consisting of a single uninterrupted sound, generally formed by a vowel and preceded or followed by one or more consonants.”
• Vowels are the heart of a syllable (the most sonorous element).
• Consonants act as sounds attached to vowels.
Syllable structure
A syllable consists of three major parts:
• Onset (C)
• Nucleus (V)
• Coda (C)
Vowels sit in the nucleus of a syllable; consonants may attach as the onset or the coda. The basic structure is CV.
Possible syllable structures
• The nucleus is always present; the onset and coda may be absent.
• Possible structures: V, CV, VC, CVC
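To make the C/V notation concrete, the sketch below maps a written syllable to its consonant/vowel skeleton and checks it against the four structures listed above. It is a simplification under stated assumptions: each letter is treated as one segment, only a, e, i, o, u count as vowels, and the class and method names are hypothetical.

```java
import java.util.Set;

class SyllableShape {
    // Assumption: only these five letters are vowels; every other letter is a consonant.
    private static final String VOWELS = "aeiou";

    // The four structures listed on the slide.
    private static final Set<String> BASIC_SHAPES = Set.of("V", "CV", "VC", "CVC");

    // Maps each letter to C or V, e.g. "mat" -> "CVC".
    static String skeleton(String syllable) {
        StringBuilder sb = new StringBuilder();
        for (char ch : syllable.toLowerCase().toCharArray()) {
            sb.append(VOWELS.indexOf(ch) >= 0 ? 'V' : 'C');
        }
        return sb.toString();
    }

    // True if the syllable has one of the four basic shapes.
    static boolean isBasicShape(String syllable) {
        return BASIC_SHAPES.contains(skeleton(syllable));
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a", "ma", "am", "mat", "plo"}) {
            System.out.println(s + " -> " + skeleton(s) + (isBasicShape(s) ? " (basic)" : " (other)"));
        }
    }
}
```

A syllable such as "plo" yields CCV: it still has a vowel nucleus but a two-consonant onset, so it falls outside the four basic shapes.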
Syllable theories
• Prominence theory
  • E.g. entertaining /entəteɪnɪŋ/
  • Peaks of prominence: the vowels /e ə eɪ ɪ/
  • Number of syllables: 4
• Chest pulse theory
  • Based on muscular activity
• Sonority theory
  • Based on the relative sonority of segments within words
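The prominence count above can be sketched as counting maximal runs of vowel symbols in a phonemic transcription, so that a diphthong such as /eɪ/ forms a single peak. The IPA vowel inventory below is an assumption, and the symbols are written as Unicode escapes so the file compiles under any source encoding. One limitation: two adjacent vowels in separate syllables (hiatus) would be merged into one run.

```java
class ProminencePeaks {
    // Assumed inventory of IPA vowel symbols (Unicode escapes): a e i o u, schwa,
    // small-capital I, upsilon, open e, ash, turned alpha, turned v, open o, alpha.
    private static final String IPA_VOWELS =
        "aeiou\u0259\u026A\u028A\u025B\u00E6\u0252\u028C\u0254\u0251";

    // Counts maximal runs of vowel symbols; each run is one peak of prominence.
    static int countPeaks(String transcription) {
        int peaks = 0;
        boolean inVowelRun = false;
        for (char ch : transcription.toCharArray()) {
            boolean isVowel = IPA_VOWELS.indexOf(ch) >= 0;
            if (isVowel && !inVowelRun) {
                peaks++; // a new peak starts here
            }
            inVowelRun = isVowel;
        }
        return peaks;
    }

    public static void main(String[] args) {
        // The slide's transcription of "entertaining"
        String entertaining = "ent\u0259te\u026An\u026A\u014B";
        System.out.println(countPeaks(entertaining)); // prints 4
    }
}
```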
Introduction to sonority theory
• “The sonority of a sound is its loudness relative to other sounds with the same length, stress and pitch.”
• Every language has its own set of sounds, and some sounds are more sonorous than others.
• Words in a language can be divided into syllables.
• Sonority theory identifies syllables on the basis of the sonority of their sounds.
Sonority hierarchy
Defined by the amount of sonority each sound carries, from most to least sonorous:
• Vowels (a, e, i, o, u)
• Liquids (y, r, l, v)
• Nasals (n, m)
• Fricatives (s, z, f, sh, th, etc.)
• Affricates (ch, j)
• Stops (b, d, g, p, t, k)
Sonority scale
Obstruents can be further classified into:
• Fricatives
• Affricates
• Stops
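One way to store this hierarchy in a Java HashMap, as the module described later does, is to give each class a numeric level and look letters up to build a word's sonority profile. The sketch below is an assumption on several points: the levels 6 (vowels) down to 1 (stops) are an illustrative encoding, "h" is placed with the fricatives although the slide does not list it, digraphs such as "sh", "th" and "ch" are not handled, and unlisted letters get 0.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class SonorityScale {
    // Assumed numeric encoding of the slide's hierarchy: higher value = more sonorous.
    static final Map<Character, Integer> SONORITY = new HashMap<>();

    static {
        put("aeiou", 6);  // vowels
        put("yrlv", 5);   // liquids, grouped as on the slide
        put("nm", 4);     // nasals
        put("szfh", 3);   // fricatives ("h" is an assumption; not listed on the slide)
        put("j", 2);      // affricates (the digraph "ch" is not handled)
        put("bdgptk", 1); // stops
    }

    private static void put(String letters, int level) {
        for (char ch : letters.toCharArray()) {
            SONORITY.put(ch, level);
        }
    }

    // Sonority profile of a word, one value per letter; 0 marks letters outside the map.
    static int[] profile(String word) {
        String w = word.toLowerCase();
        int[] values = new int[w.length()];
        for (int i = 0; i < w.length(); i++) {
            values[i] = SONORITY.getOrDefault(w.charAt(i), 0);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(profile("ABHIJEET"))); // [6, 1, 3, 6, 2, 6, 6, 1]
    }
}
```

The profile for ABHIJEET peaks at A, I and EE, the three vowel nuclei of the name.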
Sonority theory and syllables
• “A syllable is a cluster of sonority, defined by a sonority peak acting as a structural magnet to the surrounding lower-sonority elements.”
• It can be represented as a wave of sonority, called the sonority profile of the syllable.
(Figure: sonority profile with the onset, nucleus and coda labelled.)
Sonority sequencing principle
• “The sonority profile of a syllable must rise until its peak (nucleus), and then fall.”
(Figure: sonority profile with the peak (nucleus), onset and coda labelled.)
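The principle can be checked mechanically on a syllable's sonority profile. In this sketch the profile is an array of numeric sonority levels (an assumed encoding: vowels 6, liquids 5, nasals 4, fricatives 3, affricates 2, stops 1). It requires a strict rise to the peak and a strict fall after it, and, as an extra assumption, allows level values only at the peak so that a long vowel such as "ee" can form the nucleus.

```java
class SonoritySequencing {
    // True if the profile rises strictly to a peak, optionally stays level at the
    // peak (assumed to cover long vowels), and then falls strictly.
    static boolean obeysSsp(int[] profile) {
        if (profile.length == 0) return false;
        int i = 0;
        // rising part (onset)
        while (i + 1 < profile.length && profile[i] < profile[i + 1]) i++;
        // level stretch at the peak (nucleus)
        while (i + 1 < profile.length && profile[i] == profile[i + 1]) i++;
        // falling part (coda)
        while (i + 1 < profile.length && profile[i] > profile[i + 1]) i++;
        return i == profile.length - 1;
    }

    public static void main(String[] args) {
        System.out.println(obeysSsp(new int[] {1, 5, 6, 4})); // "plan": p l a n -> true
        System.out.println(obeysSsp(new int[] {5, 1, 6}));    // "lpa": falls before the peak -> false
        System.out.println(obeysSsp(new int[] {1, 6, 6, 1})); // "teet": level peak -> true
    }
}
```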
Examples
ABHIJEET: sonority profiles 1 and 2 (figures; segments ranked by sonority: A, I, E, E > H > J > B, T)
Maximal onset principle
• “Intervocalic consonants are maximally assigned to the onsets of syllables in conformity with universal and language-specific conditions.”
• It determines the underlying syllable division.
• Example: DIPLOMA could be divided as DIP-LO-MA or DI-PLO-MA; the principle selects DI-PLO-MA, because PL is a legal onset in English.
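The principle can be sketched as follows: for each consonant cluster between two vowels, give the longest suffix that is a legal onset to the following syllable and leave the rest as the previous syllable's coda. The onset inventory below is a small hypothetical sample for illustration, not a complete list of English onsets, and vowels are assumed to be the letters a, e, i, o, u.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class MaximalOnset {
    private static final String VOWELS = "aeiou";

    // Hypothetical sample of legal English onsets, for illustration only.
    private static final Set<String> LEGAL_ONSETS = Set.of(
        "p", "b", "t", "d", "k", "g", "m", "n", "l", "r", "s", "f",
        "pl", "pr", "bl", "br", "tr", "dr", "kl", "kr", "st", "sp");

    private static boolean isVowel(char ch) { return VOWELS.indexOf(ch) >= 0; }

    static List<String> syllabify(String word) {
        String w = word.toLowerCase();
        List<String> syllables = new ArrayList<>();
        int start = 0; // start of the syllable being built
        int i = 0;
        while (i < w.length() && !isVowel(w.charAt(i))) i++; // word-initial onset
        while (i < w.length()) {
            while (i < w.length() && isVowel(w.charAt(i))) i++; // nucleus
            int clusterStart = i;
            while (i < w.length() && !isVowel(w.charAt(i))) i++; // consonant cluster
            if (i == w.length()) break; // word-final cluster is the last coda
            // Longest legal suffix of the cluster becomes the next onset.
            int split = i;
            for (int k = clusterStart; k < i; k++) {
                if (LEGAL_ONSETS.contains(w.substring(k, i))) { split = k; break; }
            }
            syllables.add(w.substring(start, split));
            start = split;
        }
        syllables.add(w.substring(start));
        return syllables;
    }

    public static void main(String[] args) {
        System.out.println(syllabify("DIPLOMA")); // [di, plo, ma]
    }
}
```

Because "pl" is in the sample onset set, both consonants go to the second syllable, giving DI-PLO-MA rather than DIP-LO-MA.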
The concept of schwa
• Schwa is the first letter of the IAL alphabet: {a}.
• It is an unstressed and toneless neutral vowel.
• Sanskrit is phonetically perfect: it has no neutral vowels.
• Hindi, Bengali, etc. allow schwa to be neutral.
• Some schwas are deleted and some are not.
• Schwa deletion is an important issue for grapheme-to-phoneme conversion.
Schwa deletion contexts
• Saphalya and Amantrana
• Priya and Tritiya
• Kavya and Ashva
• Badhai
• Samuha and Chehara
• Badara and Kalama
• Kalama and Banda
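As a minimal illustration of one such context, the sketch applies a single simplified rule, word-final schwa deletion: a final schwa after a single consonant is dropped when a vowel precedes that consonant, so "kalama" becomes "kalam". This is an assumed rule for illustration, not the presentation's Delete_schwa() and not a full account of the contexts listed above. It also assumes a hypothetical romanization in which a lone "a" is schwa and "aa" is the long vowel.

```java
class FinalSchwaDeletion {
    private static final String VOWELS = "aeiou";

    private static boolean isVowel(char ch) { return VOWELS.indexOf(ch) >= 0; }

    // Assumed romanization: a lone "a" is schwa, "aa" is the long vowel.
    // Simplified rule: drop a word-final schwa that follows a single consonant
    // preceded by a vowel; otherwise return the word unchanged.
    static String deleteFinalSchwa(String word) {
        String w = word.toLowerCase();
        int n = w.length();
        if (n < 3 || w.charAt(n - 1) != 'a') return w;
        if (isVowel(w.charAt(n - 2))) return w;  // "aa" or another vowel: keep
        if (!isVowel(w.charAt(n - 3))) return w; // consonant cluster before schwa: keep
        return w.substring(0, n - 1);
    }

    public static void main(String[] args) {
        System.out.println(deleteFinalSchwa("kalama")); // kalam
        System.out.println(deleteFinalSchwa("bandaa")); // bandaa (final long vowel kept)
    }
}
```

Real Hindi schwa deletion also removes word-medial schwas and has lexical exceptions, which a single final-position rule does not capture.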
A sonority-theoretic model
• Developed entirely in Java, and therefore platform independent
• Performs syllabification of words
• Builds on sonority theory, mainly the sonority sequencing principle
• Uses Java’s HashMap to store the sonority hierarchy, saving execution time
Technical overview
The module consists of three major functions, plus one under development:
• SonorityHierarchy(): stores and references the sonority hierarchy from the HashMap
• syllabify(String word): finds the syllable boundaries according to the word’s sonority profile
• accuracy()
• Delete_schwa() [under development]: deletes the schwas present in the input
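A syllabify(String word) function along these lines could place each boundary at the sonority trough inside every consonant cluster between two vowel nuclei. The sketch below does this by starting the next syllable at the leftmost least-sonorous consonant in the cluster. It is a hypothetical reconstruction, not the module's code: the numeric levels (vowels 6 down to stops 1), the placement of "h" among the fricatives, and the single-letter treatment of digraphs are all assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SonoritySyllabifier {
    private static final Map<Character, Integer> SONORITY = new HashMap<>();

    static {
        // Assumed levels from least to most sonorous: stops, affricates, fricatives
        // ("h" assumed), nasals, liquids, vowels. Single letters only.
        String[] groups = {"bdgptk", "j", "szfh", "nm", "yrlv", "aeiou"};
        for (int level = 0; level < groups.length; level++) {
            for (char ch : groups[level].toCharArray()) {
                SONORITY.put(ch, level + 1);
            }
        }
    }

    private static int sonority(char ch) { return SONORITY.getOrDefault(ch, 0); }

    private static boolean isVowel(char ch) { return sonority(ch) == 6; }

    // Splits a word at the sonority trough of each intervocalic consonant cluster.
    static List<String> syllabify(String word) {
        String w = word.toLowerCase();
        List<String> syllables = new ArrayList<>();
        int start = 0;
        int i = 0;
        while (i < w.length() && !isVowel(w.charAt(i))) i++; // initial onset
        while (i < w.length()) {
            while (i < w.length() && isVowel(w.charAt(i))) i++; // nucleus
            int clusterStart = i;
            while (i < w.length() && !isVowel(w.charAt(i))) i++;
            if (i == w.length()) break; // final coda
            // Leftmost consonant with minimum sonority begins the next syllable.
            int boundary = clusterStart;
            for (int k = clusterStart + 1; k < i; k++) {
                if (sonority(w.charAt(k)) < sonority(w.charAt(boundary))) boundary = k;
            }
            syllables.add(w.substring(start, boundary));
            start = boundary;
        }
        syllables.add(w.substring(start));
        return syllables;
    }

    public static void main(String[] args) {
        System.out.println(syllabify("ABHIJEET"));     // [a, bhi, jeet]
        System.out.println(syllabify("entertaining")); // [en, ter, tai, ning]
    }
}
```

Splitting at the trough keeps each syllable's profile rising into its nucleus. A real implementation would still need digraph handling and language-specific onset constraints, which matches the slides' note that some clusters remain problematic.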
Results
• Syllabification and PRR generation modules implemented
• Number of manually syllabified words: 27614
• Number of words fed as input: 27614
• Number of words correctly syllabified: 26253
• Accuracy obtained: 95.86% for English and about 70% for Hindi
• Accuracy of schwa deletion in English: 77%
• Schwa deletion for Hindi is under development
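Word-level accuracy of this kind is the percentage of input words whose automatic syllabification exactly matches the manual one. The sketch assumes syllabifications are compared as hyphenated strings such as "di-plo-ma"; the method names are hypothetical. Applied to the counts listed above, 26253 / 27614 gives about 95.07%, which is close to but not the same as the reported 95.86% for English; the slide does not state which word set that figure was measured on.

```java
import java.util.List;
import java.util.Locale;

class SyllabificationAccuracy {
    // Percentage of words whose predicted syllabification equals the manual one.
    // Assumption: syllabifications are hyphenated strings, e.g. "di-plo-ma".
    static double accuracy(List<String> predicted, List<String> gold) {
        if (predicted.size() != gold.size() || gold.isEmpty()) {
            throw new IllegalArgumentException("lists must be non-empty and equally long");
        }
        int correct = 0;
        for (int i = 0; i < gold.size(); i++) {
            if (predicted.get(i).equals(gold.get(i))) correct++;
        }
        return percentage(correct, gold.size());
    }

    static double percentage(int correct, int total) {
        return 100.0 * correct / total;
    }

    public static void main(String[] args) {
        // Counts from the results slide
        System.out.println(String.format(Locale.ROOT, "%.2f%%", percentage(26253, 27614))); // 95.07%
    }
}
```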
Problems and future work
Problems faced
• The first rule-based implementation failed.
• Some specific consonant and vowel clusters still result in erroneous syllabification.
Future work
• Schwa deletion for Hindi and Marathi
• Implementation of the Maximal Onset First principle
• Packaging the above implementation in a stable transliteration module for further use in CLIR
References
• Giegerich, H. J. 1992. English Phonology: An Introduction.
• Kahn, Daniel. 1976. Syllable-based generalizations in English phonology.
• Lass, Roger. 1984. Phonology: An Introduction to Basic Concepts. Cambridge University Press.