Position Paper for W3C Workshop on Internationalizing SSML
The Usage of Part-Of-Speech for Resolving Multiple Pronunciations in SSML
2005. 11. 3.
Myoung-Wan Koo†‡ and Du-Seong Chang†
KT†/KAIT‡
Introduction
• Multiple pronunciation problem
  • Same word but different pronunciations
    • Newton: /nju:tən/ vs. /nu:tən/
  • Same spelling but different pronunciations (homograph)
    • refuse: /rɪ'fju:z/ vs. /'refju:s/

  <?xml version="1.0" encoding="UTF-8"?>
  <lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" alphabet="ipa" xml:lang="en-GB">
    <lexeme>
      <grapheme>Newton</grapheme>
      <phoneme>nju:tən</phoneme>
      <phoneme>nu:tən</phoneme>
    </lexeme>
    <lexeme>
      <grapheme>refuse</grapheme>
      <phoneme>rɪ'fju:z</phoneme>
      <phoneme>'refju:s</phoneme>
    </lexeme>
  </lexicon>
Multiple pronunciation in SSML & PLS
• SSML
  • The Speech Synthesis Markup Language Specification Version 1.0
  • Pronunciation information in SSML
    • Phoneme element
    • Lexicon element
• PLS
  • Pronunciation Lexicon Specification Version 1.0
  • Pronunciation information in PLS
    • Phoneme element
    • Prefer attribute
• Neither specification fully supports a pronunciation lexicon for multiple pronunciations or for agglutinative languages.
• Part-Of-Speech information is needed
Pronunciation information in PLS (1/2)
• Pronunciation Lexicon Specification
  • Version 1.0 / Feb 2005 / W3C Voice Browser Working Group
  • It allows interoperable specification of pronunciation information for both ASR and TTS engines within voice browsing applications.
  • It is expected to handle multiple pronunciations.
• Example of PLS

  <?xml version="1.0" encoding="UTF-8"?>
  <lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" alphabet="ipa" xml:lang="en-US">
    <lexeme>
      <grapheme>tomato</grapheme>
      <phoneme>təmei̥ɾou</phoneme>
    </lexeme>
  </lexicon>
Pronunciation information in PLS (2/2)
• Prefer attribute of phoneme element
  • Gives one pronunciation high priority among the pronunciation candidates.
  • Effective in speech synthesis
  • Applies only to multiple pronunciations of the same orthography
  • Does not address the homograph problem
    • refuse: verb /rɪ'fju:z/ vs. noun /'refju:s/
  • Provides no information for ASR systems.

  <?xml version="1.0" encoding="UTF-8"?>
  <lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" alphabet="ipa" xml:lang="en-GB">
    <lexeme>
      <grapheme>Newton</grapheme>
      <phoneme prefer="true">nju:tən</phoneme>
      <phoneme>nu:tən</phoneme>
    </lexeme>
  </lexicon>
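The limitation of the prefer attribute can be illustrated with a minimal Python sketch. It assumes the lexicon layout shown in the slides; the function name preferred_pronunciation and the fallback to the first listed phoneme when no candidate is marked are illustrative assumptions, not behaviour defined by PLS.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"

# Lexicon combining the two slide examples: Newton (with prefer) and refuse (homograph).
LEXICON = """\
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-GB">
  <lexeme>
    <grapheme>Newton</grapheme>
    <phoneme prefer="true">nju:tən</phoneme>
    <phoneme>nu:tən</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme>rɪ'fju:z</phoneme>
    <phoneme>'refju:s</phoneme>
  </lexeme>
</lexicon>
"""


def q(tag):
    """Qualify a tag name with the PLS namespace for ElementTree lookups."""
    return f"{{{PLS_NS}}}{tag}"


def preferred_pronunciation(root, word):
    """Hypothetical helper: return the phoneme marked prefer="true".

    Assumption: if no phoneme is marked, fall back to the first one listed.
    """
    for lexeme in root.iter(q("lexeme")):
        if lexeme.findtext(q("grapheme"), default="").strip() != word:
            continue
        phonemes = lexeme.findall(q("phoneme"))
        for phoneme in phonemes:
            if phoneme.get("prefer") == "true":
                return (phoneme.text or "").strip()
        if phonemes:
            return (phonemes[0].text or "").strip()
    return None


root = ET.fromstring(LEXICON)
newton = preferred_pronunciation(root, "Newton")
refuse = preferred_pronunciation(root, "refuse")
print(newton, refuse)
```

For "refuse" the lookup returns the same pronunciation whether the word is used as a verb or as a noun, which is the homograph gap the slide points out.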
Typical Korean TTS system structure
[Figure: processing pipeline. Text → Morphological Analyzer → Grapheme-to-Phoneme → Prosody Analysis → Waveform production → Speech. Labels in the figure: Structural Information; Morphemes, POS; Phonemes, POS; Phonemes, Prosody.]
POS for resolving multiple pronunciations
• POS information can reduce the overhead of resolving multiple pronunciations in ASR and TTS systems.
• The word “refuse” can have two different pronunciations depending on its POS.
• Proposal: POS attribute

  <?xml version="1.0" encoding="UTF-8"?>
  <lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" alphabet="ipa" xml:lang="en-US">
    <lexeme>
      <grapheme>refuse</grapheme>
      <phoneme pos="verb">rɪ'fju:z</phoneme>
    </lexeme>
    <lexeme>
      <grapheme>refuse</grapheme>
      <phoneme pos="noun">'refju:s</phoneme>
    </lexeme>
  </lexicon>
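A consumer of the proposed pos attribute could look roughly like the following Python sketch. The pos attribute is the proposal from this slide and is not part of PLS 1.0; the helper name pronunciation_for is hypothetical, and the POS tag is assumed to come from an upstream tagger or morphological analyzer.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"

# Lexicon using the proposed (non-standard) pos attribute on phoneme.
LEXICON = """\
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme pos="verb">rɪ'fju:z</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme pos="noun">'refju:s</phoneme>
  </lexeme>
</lexicon>
"""


def q(tag):
    """Qualify a tag name with the PLS namespace for ElementTree lookups."""
    return f"{{{PLS_NS}}}{tag}"


def pronunciation_for(root, word, pos):
    """Hypothetical helper: return the phoneme of `word` whose pos attribute matches."""
    for lexeme in root.iter(q("lexeme")):
        if lexeme.findtext(q("grapheme"), default="").strip() != word:
            continue
        for phoneme in lexeme.findall(q("phoneme")):
            if phoneme.get("pos") == pos:
                return (phoneme.text or "").strip()
    return None  # no entry for this word/POS pair


root = ET.fromstring(LEXICON)
verb_form = pronunciation_for(root, "refuse", "verb")
noun_form = pronunciation_for(root, "refuse", "noun")
print(verb_form, noun_form)
```

With the POS supplied, the two readings of "refuse" resolve to different pronunciations without any further contextual analysis.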
POS information for LVCSR
• Large vocabulary continuous speech recognition of agglutinative languages
  • The basic unit is the morpheme (pseudo-morpheme), which reduces the vocabulary size.
  • The recognition dictionary contains many homographs.
• POS information helps the system select the proper pronunciation from the dictionary and resolve multiple pronunciations of some words.
• It reduces the search time, since POS information can cut wrong word connections in the first stage rather than in the semantic interpretation stage.
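The idea of cutting wrong word connections early can be sketched as follows. This is a toy Python illustration only: it uses the English "refuse" example instead of Korean morphemes, and the dictionary, the Penn Treebank style tags, the ALLOWED_BIGRAMS table, and the expand_paths function are all hypothetical, not taken from any real recognizer.

```python
# Toy pronunciation dictionary: word -> list of (POS tag, pronunciation).
# Entries and tags are illustrative assumptions.
DICTIONARY = {
    "the": [("DT", "ðə")],
    "they": [("PRP", "ðeɪ")],
    "refuse": [("VB", "rɪ'fju:z"), ("NN", "'refju:s")],
}

# Hypothetical table of POS pairs that are allowed to connect.
ALLOWED_BIGRAMS = {("DT", "NN"), ("PRP", "VB")}


def expand_paths(words):
    """Expand a word sequence into (word, pos, pronunciation) paths.

    A candidate is dropped as soon as its POS cannot follow the previous
    POS, so invalid connections never reach later processing stages.
    """
    paths = [[]]
    for word in words:
        next_paths = []
        for path in paths:
            for pos, pron in DICTIONARY[word]:
                if path and (path[-1][1], pos) not in ALLOWED_BIGRAMS:
                    continue  # prune the wrong word connection here
                next_paths.append(path + [(word, pos, pron)])
        paths = next_paths
    return paths


noun_reading = expand_paths(["the", "refuse"])
verb_reading = expand_paths(["they", "refuse"])
print(noun_reading)
print(verb_reading)
```

In each case only one path survives, so the homograph is resolved during path expansion rather than in a later interpretation step.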
Proposals
• Proposal 1: POS attribute of phoneme element
  • Optional attribute
• Proposal 2: POS element
  • The lexeme element contains optional POS elements.
• POS values: language-specific
  • Type: allow vendor-specific POS type?
  • Prominent POS tag sets: Penn Treebank, Sejong project (Korean)

  <?xml version="1.0" encoding="UTF-8"?>
  <lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" alphabet="ipa" xml:lang="en-US">
    <lexeme>
      <grapheme>refuse</grapheme>
      <phoneme>rɪ'fju:z</phoneme>
      <pos>verb</pos>
    </lexeme>
  </lexicon>
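Proposal 2 could be consumed along the lines of the Python sketch below, which indexes pronunciations by (grapheme, POS). The pos element is the proposal from this slide and is not part of PLS 1.0; placing it in the PLS namespace, the helper name index_by_pos, and filing lexemes without a pos element under None are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"

# Lexicon using the proposed (non-standard) pos child element of lexeme.
LEXICON = """\
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme>rɪ'fju:z</phoneme>
    <pos>verb</pos>
  </lexeme>
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme>'refju:s</phoneme>
    <pos>noun</pos>
  </lexeme>
  <lexeme>
    <grapheme>Newton</grapheme>
    <phoneme>nju:tən</phoneme>
    <phoneme>nu:tən</phoneme>
  </lexeme>
</lexicon>
"""


def q(tag):
    """Qualify a tag name with the PLS namespace for ElementTree lookups."""
    return f"{{{PLS_NS}}}{tag}"


def index_by_pos(root):
    """Hypothetical helper: map (grapheme, pos) to a list of pronunciations.

    Lexemes without a pos element are stored under pos=None (an assumption).
    """
    index = {}
    for lexeme in root.iter(q("lexeme")):
        grapheme = lexeme.findtext(q("grapheme"), default="").strip()
        phonemes = [p.text.strip() for p in lexeme.findall(q("phoneme")) if p.text]
        tags = [t.text.strip() for t in lexeme.findall(q("pos")) if t.text] or [None]
        for tag in tags:
            index.setdefault((grapheme, tag), []).extend(phonemes)
    return index


index = index_by_pos(ET.fromstring(LEXICON))
print(index)
```

Because the POS sits on the lexeme rather than on each phoneme, one tag can cover several pronunciation variants, while untagged entries such as "Newton" remain usable as before.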
Conclusion
• No element or attribute for resolving multiple pronunciations
  • In current SSML, PLS
• POS information
  • Can reduce the overhead of resolving multiple pronunciations in ASR and TTS systems.
  • Can reduce the search time in a large vocabulary recognition system.
  • Can be effective in agglutinative languages.
• Proposals
  • POS element
  • POS attribute