Building Non Native Pronunciation Lexicon for English using a Rule Based Approach. Rohit Kumar * , Amit Kataria, Sanjeev Sofat Department of Computer Science and Engineering Punjab Engineering College, Chandigarh * Language Technologies Research Center IIIT Hyderabad.
Outline of the Presentation
• Introduction
• Need, Problems, Suggestion
• Overview of the Approach
• Grapheme to Phoneme Alignment
• Applying the Rules
• Approaches for Building the Rules
• Conclusion
Introduction: Need for Pronunciation Lexicons
• TTS: text processing modules for non-phonetic languages like English need pronunciation lookups
• Building letter-to-sound rules by learning approaches
• Building language models for recognition systems
Introduction: Appearance of Pronunciation Variations
• Pronunciation variations arise when a foreign language is spoken by non-native speakers
• Speakers try to speak the foreign language under the pronunciation constraints of their native language
Introduction: Problem & Suggestion
• Building a pronunciation lexicon is a time-consuming manual process
• Since lexicons are needed, how do we build them rapidly and semi-automatically with minimal effort?
• Pronunciation lexicons giving the native pronunciation of foreign languages are easily available (particularly for English)
• A rule based approach that uses these native lexicons to build non-native lexicons is proposed
• An example based approach for rule building is also suggested
Overview of the Approach: 3 Step Process
Step 1: Word Lookup: lieutenant > l e f t e n @ n t
Step 2: Grapheme – Phoneme Alignment
Step 3: Rule Application
Overview of the Approach: Issues
• How is Grapheme – Phoneme Alignment done? (Step 2)
• How are the rules triggered and how do they work? (Step 3)
• How do we come up with the rules?
Grapheme – Phoneme Alignment
• The problem is that a phoneme (grapheme) may match zero, one or many graphemes (phonemes)
• The algorithm matches each phoneme with a grapheme from its list of possible graphemes
• A consonant phoneme matches a consonant grapheme, and a vowel phoneme matches a possible vowel grapheme (a, e, i, o, u, y)
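The matching idea above can be sketched as a greedy left-to-right pass. This is only an illustration under stated assumptions, not the algorithm from the presentation: the table POSSIBLE_GRAPHEMES, the function name align, the longest-match-first preference and the letter-skipping strategy are all hypothetical, and the example word is chosen for convenience.

```python
from typing import Dict, List, Tuple

# Hypothetical table: each phoneme lists the graphemes it may be spelled with.
# The entries are illustrative and are not taken from the slides.
POSSIBLE_GRAPHEMES: Dict[str, List[str]] = {
    "f": ["ph", "ff", "f"],
    "ou": ["oa", "ow", "o"],
    "n": ["nn", "n"],
    "l": ["ll", "l"],
    "e": ["ea", "e"],
    "t": ["tt", "t"],
}


def align(word: str, phonemes: List[str]) -> List[Tuple[str, str]]:
    """Greedily pair each phoneme with a grapheme of `word`.

    Returns (phoneme, grapheme) pairs. An empty phoneme marks letters that
    matched no phoneme; an empty grapheme marks a phoneme with no letters.
    """
    pairs: List[Tuple[str, str]] = []
    pos = 0  # index of the first letter not yet consumed
    for ph in phonemes:
        # Assumption: try longer graphemes first.
        candidates = sorted(POSSIBLE_GRAPHEMES.get(ph, []), key=len, reverse=True)
        match = None
        for start in range(pos, len(word)):
            for gr in candidates:
                if word.startswith(gr, start):
                    match = (start, gr)
                    break
            if match is not None:
                break
        if match is None:
            pairs.append((ph, ""))  # phoneme matched zero graphemes
            continue
        start, gr = match
        if start > pos:
            pairs.append(("", word[pos:start]))  # skipped (silent) letters
        pairs.append((ph, gr))
        pos = start + len(gr)
    if pos < len(word):
        pairs.append(("", word[pos:]))  # trailing letters with no phoneme
    return pairs


print(align("phone", ["f", "ou", "n"]))
```

A greedy pass like this can mis-align words where an early phoneme grabs a letter that belongs to a later one, so a fuller implementation would likely need backtracking or the consonant/vowel constraint from the slide.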
Format of Rules
5 Fields:
• Left Context: LC
• Source Phoneme: PH
• Right Context: RC
• Corresponding Grapheme: GR
• Target Phoneme: OPH
Wild Cards (*) are allowed
Phonemes are represented by a bit pattern (and hence a number) describing their phonetic properties
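One possible in-memory form of this rule format is sketched here. The five field names follow the slide, but everything else is an assumption made for illustration: the Feature flags and their bit values, the PHONEME_BITS table, and the field_matches helper (which also accepts "|"-separated alternatives, as the example rules in this deck use).

```python
from dataclasses import dataclass
from enum import IntFlag


class Feature(IntFlag):
    """Hypothetical phonetic properties; the real bit layout is not given."""
    VOWEL = 1
    CONSONANT = 2
    VOICED = 4
    NASAL = 8


# Hypothetical encoding: each phoneme is a bit pattern, and hence a number.
PHONEME_BITS = {
    "m": Feature.CONSONANT | Feature.VOICED | Feature.NASAL,
    "b": Feature.CONSONANT | Feature.VOICED,
    "aa": Feature.VOWEL | Feature.VOICED,
}

WILDCARD = "*"


@dataclass(frozen=True)
class Rule:
    lc: str   # left context
    ph: str   # source phoneme
    rc: str   # right context
    gr: str   # corresponding grapheme(s), "|"-separated
    oph: str  # target phoneme


def field_matches(pattern: str, value: str) -> bool:
    """A field matches if it is the wildcard or lists the value as an alternative."""
    return pattern == WILDCARD or value in pattern.split("|")


rule = Rule(lc="*", ph="@", rc="*", gr="o", oph="au")
print(rule, int(PHONEME_BITS["m"]))
```

Encoding phonemes as bit patterns would let a context field test a whole class (for example, any nasal) with a single mask, although the slides do not say whether the rules use it that way.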
Applying the Rules
LC = * ; PH = @ ; RC = * ; GR = “o” ; OPH = “au” ;
LC = * ; PH = % ; RC = * ; GR = “re” | “r” ; OPH = “r” ;
Issue: What should be the shape of the window used for rule application?
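A minimal sketch of rule application over an aligned word follows. The two rules are the ones listed on this slide; the rest is assumed for illustration: a window of one phoneme on each side (the slide leaves the window shape open), a "#" word-boundary symbol, first-match-wins ordering, and the made-up aligned inputs.

```python
from typing import List, Tuple

# (LC, PH, RC, GR, OPH); "*" is a wildcard and "|" separates alternatives.
RULES: List[Tuple[str, str, str, str, str]] = [
    ("*", "@", "*", "o", "au"),
    ("*", "%", "*", "re|r", "r"),
]


def matches(pattern: str, value: str) -> bool:
    return pattern == "*" or value in pattern.split("|")


def apply_rules(aligned: List[Tuple[str, str]], rules=RULES) -> List[str]:
    """Rewrite each source phoneme using the first rule that fires.

    `aligned` is a list of (phoneme, grapheme) pairs. Contexts are read from
    the source phonemes, one position to the left and right (an assumption).
    """
    phones = [ph for ph, _ in aligned]
    output: List[str] = []
    for i, (ph, gr) in enumerate(aligned):
        left = phones[i - 1] if i > 0 else "#"            # "#" = word boundary
        right = phones[i + 1] if i + 1 < len(phones) else "#"
        target = ph                                        # default: unchanged
        for lc, src, rc, g, oph in rules:
            if src == ph and matches(lc, left) and matches(rc, right) and matches(g, gr):
                target = oph
                break
        output.append(target)
    return output


# Made-up aligned sequences, used only to exercise the two rules.
print(apply_rules([("l", "l"), ("@", "o"), ("t", "t")]))
print(apply_rules([("k", "c"), ("aa", "a"), ("%", "r")]))
```

Because the grapheme field is part of the match, "@" spelled with any letter other than "o" is left unchanged by the first rule.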
Building the Rules
• Manually (observe and write)
• Example based approach
  • Map each source phoneme to the nearest target phoneme
  • Find English words with a native pronunciation different from that obtained by the above mapping
  • Automatically form rules to model the difference that is observed
  • Merge the newly formed rules with existing rules
Building the Rules: Example Based Approach
LC = aa ; PH = % ; RC = m ; GR = “l” ; OPH = “l” ;
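The example based procedure can be sketched as a comparison between the phoneme-by-phoneme mapping and an observed pronunciation. This is a guess at the mechanics rather than the authors' implementation: the NEAREST table, the function derive_rules, the requirement that both sequences are already position-aligned, and the sample input (picked so that it reproduces the rule shown on this slide) are all assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical nearest-phoneme mapping from source to target phone set.
NEAREST: Dict[str, str] = {"k": "k", "aa": "aa", "%": "%", "m": "m"}


def derive_rules(
    aligned: List[Tuple[str, str]],
    observed: List[str],
    nearest: Dict[str, str] = NEAREST,
) -> List[Tuple[str, str, str, str, str]]:
    """Emit one (LC, PH, RC, GR, OPH) rule per position where the observed
    phoneme differs from the nearest-phoneme mapping of the source phoneme."""
    phones = [ph for ph, _ in aligned]
    if len(observed) != len(phones):
        raise ValueError("observed pronunciation must be position-aligned")
    rules = []
    for i, (ph, gr) in enumerate(aligned):
        mapped = nearest.get(ph, ph)  # unknown phonemes map to themselves
        if observed[i] != mapped:
            left = phones[i - 1] if i > 0 else "#"
            right = phones[i + 1] if i + 1 < len(phones) else "#"
            rules.append((left, ph, right, gr, observed[i]))
    return rules


# Illustrative input: the source pronunciation has "%" aligned to the letter
# "l", while the observed pronunciation has "l" in that position.
aligned = [("k", "c"), ("aa", "a"), ("%", "l"), ("m", "m")]
print(derive_rules(aligned, ["k", "aa", "l", "m"]))
```

Rules produced this way are as specific as the single example they came from, which is why the next step merges them with existing rules.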
Building the Rules: Example Based Approach (contd.)
MERGING OF RULES
New Rule: LC = aa ; PH = % ; RC = m ; GR = “l” ; OPH = “l” ;
Existing Rule: LC = e ; PH = % ; RC = b ; GR = “ll” ; OPH = “l” ;
Merged Rule: LC = aa|e ; PH = % ; RC = m|b ; GR = “l”|“ll” ; OPH = “l” ;
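The merge shown above amounts to taking the union of alternatives field by field. A minimal sketch, assuming that two rules are mergeable only when they share the same source phoneme (PH) and target phoneme (OPH); that condition, the helper names, and the treatment of wildcards are assumptions, not something the slide states.

```python
from typing import Optional, Tuple

Rule = Tuple[str, str, str, str, str]  # (LC, PH, RC, GR, OPH)


def merge_field(a: str, b: str) -> str:
    """Union of "|"-separated alternatives, keeping first-seen order."""
    if a == "*" or b == "*":
        return "*"  # a wildcard already covers every alternative
    merged = []
    for alt in a.split("|") + b.split("|"):
        if alt not in merged:
            merged.append(alt)
    return "|".join(merged)


def merge_rules(new: Rule, existing: Rule) -> Optional[Rule]:
    """Merge two rules, or return None if PH or OPH differ (assumed condition)."""
    lc1, ph1, rc1, gr1, oph1 = new
    lc2, ph2, rc2, gr2, oph2 = existing
    if ph1 != ph2 or oph1 != oph2:
        return None
    return (merge_field(lc1, lc2), ph1, merge_field(rc1, rc2),
            merge_field(gr1, gr2), oph1)


new_rule = ("aa", "%", "m", "l", "l")
existing_rule = ("e", "%", "b", "ll", "l")
print(merge_rules(new_rule, existing_rule))
```

Note that the merged rule is more general than the two it came from: it also fires for combinations such as LC = aa with RC = b, which neither original rule covered.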
Conclusion
• Reusing the information available in existing lexicons, along with rules, can help in the rapid creation of non-native lexicons
• An algorithm for grapheme to phoneme alignment is presented
• The issue of window shape for rule application needs further experimentation
• An example based approach can be used for rule building