Position Paper for W3C Workshop on Internationalizing SSML

The Usage of Part-Of-Speech for Resolving Multiple Pronunciations in SSML

2005. 11. 3. Myoung-Wan Koo†‡ and Du-Seong Chang† KT†/KAIT‡

Introduction
• Multiple pronunciation problem
  • Same word, different pronunciations
    • Newton: /nju:tən/ vs. /nu:tən/
  • Same spelling, different pronunciations (homograph)
    • refuse: /rɪ'fju:z/ vs. /'refju:s/

    <?xml version="1.0" encoding="UTF-8"?>
    <lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
             alphabet="ipa" xml:lang="en-GB">
      <lexeme>
        <grapheme>Newton</grapheme>
        <phoneme>nju:tən</phoneme>
        <phoneme>nu:tən</phoneme>
      </lexeme>
      <lexeme>
        <grapheme>refuse</grapheme>
        <phoneme>rɪ'fju:z</phoneme>
        <phoneme>'refju:s</phoneme>
      </lexeme>
    </lexicon>
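The sketch below illustrates the problem. Read as a plain PLS document, the lexicon above only yields a list of candidate pronunciations per grapheme, and nothing in it says which one to use. It is a minimal Python sketch that assumes the namespace URI used in these slides. The function name `candidates` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URI as used in the PLS examples on these slides.
PLS_NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}

LEXICON = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-GB">
  <lexeme>
    <grapheme>Newton</grapheme>
    <phoneme>nju:tən</phoneme>
    <phoneme>nu:tən</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme>rɪ'fju:z</phoneme>
    <phoneme>'refju:s</phoneme>
  </lexeme>
</lexicon>"""


def candidates(lexicon_xml):
    """Map each grapheme to every pronunciation listed for it (hypothetical helper)."""
    # Parse bytes so the UTF-8 encoding declaration is honoured.
    root = ET.fromstring(lexicon_xml.encode("utf-8"))
    result = {}
    for lexeme in root.findall("pls:lexeme", PLS_NS):
        phonemes = [p.text.strip() for p in lexeme.findall("pls:phoneme", PLS_NS)]
        for grapheme in lexeme.findall("pls:grapheme", PLS_NS):
            result.setdefault(grapheme.text.strip(), []).extend(phonemes)
    return result


if __name__ == "__main__":
    for word, prons in candidates(LEXICON).items():
        # Each word has two candidates and no information on how to choose.
        print(word, prons)
```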

Multiple Pronunciations in SSML and PLS
• SSML
  • The Speech Synthesis Markup Language Specification Version 1.0
  • Pronunciation information in SSML: phoneme element, lexicon element
• PLS
  • Pronunciation Lexicon Specification Version 1.0
  • Pronunciation information in PLS: phoneme element, prefer attribute
• Neither fully supports pronunciation lexicons with multiple pronunciations or agglutinative languages.
→ Part-of-speech (POS) information is needed.

Pronunciation information in PLS (1/2)
• Pronunciation Lexicon Specification
  • Version 1.0 / Feb 2005 / W3C Voice Browser Working Group
  • Allows interoperable specification of pronunciation information for both ASR and TTS engines within voice browsing applications.
  • Is expected to handle multiple pronunciations.
• Example of PLS

    <?xml version="1.0" encoding="UTF-8"?>
    <lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
             alphabet="ipa" xml:lang="en-US">
      <lexeme>
        <grapheme>tomato</grapheme>
        <phoneme>təmei̥ɾou</phoneme>
      </lexeme>
    </lexicon>

Pronunciation information in PLS (2/2)
• prefer attribute of the phoneme element
  • Gives one pronunciation priority over the other candidates.
  • Effective in speech synthesis.
  • Works only when one orthography has several pronunciations.
  • Does not solve the homograph problem.
    • refuse: verb /rɪ'fju:z/ vs. noun /'refju:s/
  • Provides no information for ASR systems.

    <?xml version="1.0" encoding="UTF-8"?>
    <lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
             alphabet="ipa" xml:lang="en-GB">
      <lexeme>
        <grapheme>Newton</grapheme>
        <phoneme prefer="true">nju:tən</phoneme>
        <phoneme>nu:tən</phoneme>
      </lexeme>
    </lexicon>
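The sketch below shows why prefer cannot resolve homographs. It assumes a TTS selection rule of "first phoneme with prefer="true", otherwise the first phoneme in document order", and the function name `tts_pronunciation` is hypothetical. Under that rule the choice is a fixed property of the lexeme. The function has no way to receive sentence context.

```python
import xml.etree.ElementTree as ET

PLS_NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}

LEXICON = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-GB">
  <lexeme>
    <grapheme>Newton</grapheme>
    <phoneme prefer="true">nju:tən</phoneme>
    <phoneme>nu:tən</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme prefer="true">rɪ'fju:z</phoneme>
    <phoneme>'refju:s</phoneme>
  </lexeme>
</lexicon>"""


def tts_pronunciation(lexicon_xml, word):
    """Return the pronunciation a TTS engine would pick for `word`.

    Assumed rule: the first phoneme with prefer="true" wins, otherwise the
    first phoneme in document order. Note: there is no context/POS argument.
    """
    root = ET.fromstring(lexicon_xml.encode("utf-8"))
    for lexeme in root.findall("pls:lexeme", PLS_NS):
        graphemes = [g.text.strip() for g in lexeme.findall("pls:grapheme", PLS_NS)]
        if word not in graphemes:
            continue
        phonemes = lexeme.findall("pls:phoneme", PLS_NS)
        for p in phonemes:
            if p.get("prefer") == "true":
                return p.text.strip()
        return phonemes[0].text.strip()
    return None  # word not in lexicon
```

Because the selection ignores context, "I refuse" and "the refuse" both receive /rɪ'fju:z/. This is the homograph limitation listed on this slide.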

Typical Korean TTS system structure
Text → Morphological Analyzer → [Morphemes, POS] → Grapheme-to-Phoneme → [Phonemes, POS] → Prosody Analysis → [Phonemes, Prosody] → Waveform Production → Speech
(Bracketed items: structural information passed between stages.)
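The data flow above can be sketched as a chain of stages. The point of the sketch is that the POS produced by the morphological analyzer reaches grapheme-to-phoneme conversion, where it selects the pronunciation. This is a toy Python sketch. The English lexicon, the "word after a determiner is a noun" rule, the prosody labels and all function names are hypothetical stand-ins, and waveform production is omitted.

```python
from dataclasses import dataclass

# Hypothetical toy data; a real Korean TTS system would use a full
# morphological analyzer and a full pronunciation lexicon.
DETERMINERS = {"the", "a", "an"}
LEXICON = {  # (morpheme, POS) -> phonemes
    ("i", "pron"): "aɪ",
    ("the", "det"): "ðə",
    ("refuse", "verb"): "rɪ'fju:z",
    ("refuse", "noun"): "'refju:s",
}


@dataclass
class Unit:
    text: str
    pos: str
    phonemes: str = ""
    prosody: str = ""


def morphological_analyzer(text):
    """Text -> [morphemes, POS]. Toy rule: a word after a determiner is a noun."""
    units, prev = [], None
    for word in text.lower().split():
        if word in DETERMINERS:
            pos = "det"
        elif word == "i":
            pos = "pron"
        elif prev in DETERMINERS:
            pos = "noun"
        else:
            pos = "verb"
        units.append(Unit(word, pos))
        prev = word
    return units


def grapheme_to_phoneme(units):
    """[morphemes, POS] -> [phonemes, POS]: the POS picks the lexicon entry."""
    for u in units:
        u.phonemes = LEXICON[(u.text, u.pos)]
    return units


def prosody_analysis(units):
    """[phonemes, POS] -> [phonemes, prosody]. Placeholder phrase-boundary label."""
    for i, u in enumerate(units):
        u.prosody = "final" if i == len(units) - 1 else "non-final"
    return units


def run_pipeline(text):
    # Waveform production is out of scope for this sketch.
    return prosody_analysis(grapheme_to_phoneme(morphological_analyzer(text)))
```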

POS for resolving multiple pronunciations
• POS information can reduce the overhead of resolving multiple pronunciations in ASR and TTS systems.
• The word “refuse” has two different pronunciations depending on its POS.
• Proposal: POS attribute

    <?xml version="1.0" encoding="UTF-8"?>
    <lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
             alphabet="ipa" xml:lang="en-US">
      <lexeme>
        <grapheme>refuse</grapheme>
        <phoneme pos="verb">rɪ'fju:z</phoneme>
      </lexeme>
      <lexeme>
        <grapheme>refuse</grapheme>
        <phoneme pos="noun">'refju:s</phoneme>
      </lexeme>
    </lexicon>
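With the proposed pos attribute, a processor can index pronunciations by (grapheme, POS) and resolve the homograph directly. The sketch below shows this lookup. It assumes a POS tag has already been assigned by a tagger or morphological analyzer. The fallback to POS-less entries and the function names are hypothetical design choices, not part of the proposal. Note that pos is not part of PLS 1.0, and ElementTree parses it only because it does not validate against a schema.

```python
import xml.etree.ElementTree as ET

PLS_NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}

# Lexicon using the *proposed* pos attribute on phoneme.
LEXICON = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme pos="verb">rɪ'fju:z</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme pos="noun">'refju:s</phoneme>
  </lexeme>
</lexicon>"""


def build_pos_index(lexicon_xml):
    """Map (grapheme, pos) -> pronunciations; pos is None when the attribute is absent."""
    root = ET.fromstring(lexicon_xml.encode("utf-8"))
    index = {}
    for lexeme in root.findall("pls:lexeme", PLS_NS):
        graphemes = [g.text.strip() for g in lexeme.findall("pls:grapheme", PLS_NS)]
        for phoneme in lexeme.findall("pls:phoneme", PLS_NS):
            pos = phoneme.get("pos")  # unprefixed attribute, so no namespace
            for g in graphemes:
                index.setdefault((g, pos), []).append(phoneme.text.strip())
    return index


def lookup(index, word, pos):
    """Prefer the POS-specific entry; otherwise fall back to entries without pos."""
    return index.get((word, pos)) or index.get((word, None), [])
```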

POS information for LVCSR
• Large vocabulary continuous speech recognition (LVCSR) of agglutinative languages
  • The basic unit is the morpheme (or pseudo-morpheme), which reduces the vocabulary size.
  • As a result, the recognition dictionary contains many homographs.
• POS information helps the system find the proper pronunciation in the dictionary and resolve multiple pronunciations for some words.
• It reduces search time, because POS information can prune invalid word connections in the first search stage rather than in the semantic interpretation stage.
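The pruning argument can be illustrated with a small search over homograph readings. Each recognized unit can have several POS readings, and a POS connection table discards illegal sequences as soon as they are extended, before any later stage sees them. This is a minimal Python sketch. The unit names X/Y/Z/W, the generic POS labels and the connection table are all hypothetical. A real system would use a language-specific tag set, such as the Sejong tag set for Korean.

```python
# Hypothetical POS readings for recognized units (homographs have several).
READINGS = {
    "X": ["noun", "verb_stem"],
    "Y": ["particle", "ending"],
    "Z": ["verb_stem"],
    "W": ["ending"],
}

# Hypothetical connection table: (previous POS, next POS) pairs that may occur.
ALLOWED = {
    ("<s>", "noun"), ("<s>", "verb_stem"),
    ("noun", "particle"),
    ("particle", "noun"), ("particle", "verb_stem"),
    ("verb_stem", "ending"),
    ("ending", "</s>"),
}


def decode(units):
    """Expand POS readings left to right, pruning illegal connections immediately.

    Returns the surviving complete hypotheses and the number of extensions tried.
    """
    hyps = [[]]
    tried = 0
    for unit in units:
        next_hyps = []
        for hyp in hyps:
            prev = hyp[-1][1] if hyp else "<s>"
            for pos in READINGS[unit]:
                tried += 1
                if (prev, pos) in ALLOWED:  # first-stage pruning
                    next_hyps.append(hyp + [(unit, pos)])
        hyps = next_hyps
    complete = [h for h in hyps if (h[-1][1] if h else "<s>", "</s>") in ALLOWED]
    return complete, tried
```

For the input `["X", "Y", "Z", "W"]`, two of the four readings of X Y are cut when Y is attached. The remaining illegal path dies at Z, leaving a single noun/particle/verb_stem/ending hypothesis.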

Proposals
• Proposal 1: POS attribute of the phoneme element
  • Optional attribute
• Proposal 2: POS element
  • The lexeme element contains optional POS elements.
• POS values: language-specific
  • Type: allow vendor-specific POS types?
  • Well-known POS tag sets: Penn Treebank, Sejong project (Korean)

    <?xml version="1.0" encoding="UTF-8"?>
    <lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
             alphabet="ipa" xml:lang="en-US">
      <lexeme>
        <grapheme>refuse</grapheme>
        <phoneme>rɪ'fju:z</phoneme>
        <pos>verb</pos>
      </lexeme>
    </lexicon>
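Proposal 2 attaches POS to the whole lexeme rather than to an individual phoneme. The sketch below indexes such a lexicon. It assumes the pos element lives in the PLS namespace, as it does when it is written unprefixed inside the lexicon. It also treats a lexeme without pos as valid for any POS, under the key None. Both are assumptions, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

PLS_NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}

# Lexicon using the *proposed* pos element inside lexeme.
LEXICON = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme>rɪ'fju:z</phoneme>
    <pos>verb</pos>
  </lexeme>
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme>'refju:s</phoneme>
    <pos>noun</pos>
  </lexeme>
</lexicon>"""


def build_pos_element_index(lexicon_xml):
    """Map (grapheme, pos) -> pronunciations using <pos> children of each lexeme."""
    root = ET.fromstring(lexicon_xml.encode("utf-8"))
    index = {}
    for lexeme in root.findall("pls:lexeme", PLS_NS):
        graphemes = [g.text.strip() for g in lexeme.findall("pls:grapheme", PLS_NS)]
        phonemes = [p.text.strip() for p in lexeme.findall("pls:phoneme", PLS_NS)]
        # Assumption: a lexeme with no <pos> applies to any POS (key None).
        tags = [t.text.strip() for t in lexeme.findall("pls:pos", PLS_NS)] or [None]
        for g in graphemes:
            for tag in tags:
                index.setdefault((g, tag), []).extend(phonemes)
    return index
```

One difference from the attribute form follows from the structure. Because the element belongs to the lexeme, several pos children could share one set of pronunciations. The attribute form instead tags each phoneme separately.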

Conclusion
• Current SSML and PLS have no element or attribute for resolving multiple pronunciations.
• POS information
  • Can reduce the overhead of resolving multiple pronunciations in ASR and TTS systems.
  • Can reduce search time in a large vocabulary recognition system.
  • Can be effective for agglutinative languages.
• Proposals
  • POS element
  • POS attribute
