
Japanese SSML Standards in Voice Synthesis Technology Workshop

Explore JEITA’s role in setting Japanese SSML standards for accurate voice synthesis in technology. Learn about notations for pronunciation, speech rate, and ruby elements in Japanese text. Dive into the future of voice synthesis technology.

andrewse

Presentation Transcript


  1. W3C Workshop on SSML, Nov. 2-3, 2005, Beijing. Issues of SSML in Japanese. Wataru IMATAKE (ANIMO LIMITED), Makoto AKABANE (Sony Computer Entertainment Inc.), Kazuyo TANAKA (Tsukuba University). JEITA Technical Standardization Group on Speech Input/Output Systems (JEITA Speech Group)

  2. 1-1 About JEITA
     • JEITA (Japan Electronics and Information Technology Industries Association) is an industry association covering information systems, personal information devices, digital appliances, industrial and social system equipment, and electronic components.
     • JEITA was established in November 2000 through the merger of the Japan Electronic Industry Development Association (JEIDA) and the Electronic Industries Association of Japan (EIAJ).

  3. 1-2 JEITA Speech Group, Activities
     • The Expert Committee on Speech Input/Output Systems (JEIDA Speech Group) established "JEIDA-62-2000 Standard of Symbols for Japanese Text-to-Speech Synthesizer" as a JEIDA standard in March 2000.
     • A revised version of JEIDA-62-2000 was published in March 2005 as "JEITA-IT-4002".
     • JEIDA-62-2000 included control tags for synthesizers, defined in XML.
     • However, the control tags were removed in "JEITA-IT-4002".

  4. 2-1 How to specify Japanese pronunciation in phoneme element
     "JEITA IT-4002: Symbols for Japanese Text-to-Speech Synthesizer"
     • The standard defines two levels of notation: a kana level written in Japanese katakana, and a phonemic level written in IPA or SAMPA.
     • We suggest identifying these notations in the alphabet attribute as "x-JEITA-IT-4002-kana", "x-JEITA-IT-4002-ipa", and "x-JEITA-IT-4002-sampa".

  5. 2-2 How to specify Japanese pronunciation in phoneme element
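The example on this slide is not preserved in the transcript. As a minimal sketch of the suggestion in 2-1, the Python snippet below builds an SSML phoneme element whose alphabet attribute carries the proposed "x-JEITA-IT-4002-kana" value. The helper name `phoneme`, the sample word, and the plain katakana string in `ph` are illustrative assumptions; any accent or pause symbols that JEITA IT-4002 defines are left out.

```python
import xml.etree.ElementTree as ET


def phoneme(text, ph, alphabet="x-JEITA-IT-4002-kana"):
    """Build an SSML <phoneme> element.

    'alphabet' and 'ph' are the attribute names used by SSML 1.0;
    the default alphabet value is the one suggested on slide 2-1.
    """
    el = ET.Element("phoneme", {"alphabet": alphabet, "ph": ph})
    el.text = text
    return el


# Hypothetical document: the written form is kanji, the reading is given in katakana.
speak = ET.Element("speak", {"version": "1.0", "xml:lang": "ja-JP"})
speak.append(phoneme("二十日", "ハツカ"))

ssml = ET.tostring(speak, encoding="unicode")
print(ssml)
```

Passing `alphabet="x-JEITA-IT-4002-ipa"` or `alphabet="x-JEITA-IT-4002-sampa"` would select the phonemic level instead, with `ph` written in the corresponding notation.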

  6. 3-1 How to specify speaking rate in Japanese
     • The basic unit of Japanese rhythm is the mora.
     • A mora is called "拍" (haku) in Japanese. For example, a haiku is composed of 5-7-5 haku.
       "こんにちは" /ko N ni chi wa/ → 5 moras
       "しゃしん" /sya si n/ → 3 moras when counted as in Japanese; /sya sin/ → 2 syllables when counted as in English
     • Therefore, it is natural to specify speaking rate and Japanese phoneme length as a number of moras.
     • To specify speaking rate in the rate attribute of the prosody element, use mora/sec as the unit.
     • By the same token, to specify pause length in the time attribute of the break element, use the mora as the unit.

  7. 3-2 How to specify speaking rate in Japanese
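The content of this slide is not preserved in the transcript. To make the mora arithmetic from 3-1 concrete, here is a rough Python sketch that counts moras in a kana string and converts a mora count to seconds at a given mora/sec rate. It assumes the input contains only kana, that small glide and vowel kana (such as ゃ or ョ) merge with the preceding character, and that ん, っ, and ー each count as one mora. The function names and the rate of 8 mora/sec are hypothetical.

```python
# Small kana that combine with the preceding kana into a single mora.
# The small "tsu" (っ/ッ) is deliberately absent: it counts as its own mora.
SMALL_KANA = set("ぁぃぅぇぉゃゅょゎァィゥェォャュョヮ")


def count_moras(kana: str) -> int:
    """Count moras in a string assumed to contain only hiragana/katakana."""
    return sum(1 for ch in kana if ch not in SMALL_KANA)


def duration_seconds(moras: float, rate_mora_per_sec: float) -> float:
    """Time needed to speak (or pause for) `moras` at the given rate."""
    if rate_mora_per_sec <= 0:
        raise ValueError("rate must be positive")
    return moras / rate_mora_per_sec


print(count_moras("こんにちは"))  # 5
print(count_moras("しゃしん"))    # 3
# At an assumed rate of 8 mora/sec, a 5-mora word takes 0.625 s,
# and a pause of 2 moras lasts 0.25 s.
print(duration_seconds(count_moras("こんにちは"), 8.0))
print(duration_seconds(2, 8.0))
```

Expressing pauses in moras ties their length to the current speaking rate, so a pause scales with the speech around it instead of staying fixed in milliseconds.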

  8. 4-1 ruby element
     • Japanese has many kanji words that are written identically but have different readings and meanings.
     • Newspaper and magazine publishers have long used ruby to make kanji words easier for readers to understand.
     • In addition, the word processors widely used in Japan provide a function for writing ruby (e.g. Microsoft Word, Justsystem ICHITARO, OpenOffice Writer).
     • Therefore, a large amount of text content in Japan includes ruby.
     • Japanese voice synthesis engines can reduce misreadings by making active use of ruby.
     • Ruby is usually written in katakana or hiragana. Therefore, ruby does not fit the phoneme element.

  9. 4-2 ruby element
     • We are aware of "Ruby Annotation - W3C Recommendation 31 May 2001" (http://www.w3.org/TR/ruby/), but it is over-specified for voice synthesis: layout information is unnecessary for voice synthesis.
     • The simplest form of ruby is sufficient for voice synthesis.
     • Therefore, we propose that a new ruby element be defined.

  10. 4-3 ruby element
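The syntax proposed on this slide is not preserved in the transcript. As an illustration only, the sketch below borrows the `ruby`, `rb`, `rt`, and `rp` element names from the simple markup of the W3C Ruby Annotation recommendation and shows how a synthesis front end might speak the ruby reading in place of the kanji base text. The function name `reading_text` and the sample sentences are assumptions.

```python
import xml.etree.ElementTree as ET


def reading_text(fragment: str) -> str:
    """Return the text to be spoken, using ruby readings where present."""
    root = ET.fromstring(fragment)
    parts = []

    def walk(node):
        if node.tag == "ruby":
            # Speak the reading (rt); skip the base text (rb) and the
            # fallback parentheses (rp), which only matter for layout.
            parts.append("".join(rt.text or "" for rt in node.iter("rt")))
        else:
            parts.append(node.text or "")
            for child in node:
                walk(child)
                parts.append(child.tail or "")

    walk(root)
    return "".join(parts)


print(reading_text("<p>今日は<ruby><rb>二十日</rb><rt>はつか</rt></ruby>です。</p>"))
# 今日ははつかです。
```

Because the reading is plain hiragana or katakana supplied by the author, no phonemic alphabet is involved, which matches the point in 4-1 that ruby does not fit the phoneme element.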

  11. 5-1 Expansion of the say-as element
     • In Japanese, the same notation with the same meaning can have several readings, all of them correct.
     • For example, 「二十日」 can be read as 「ニジュウニチ」 (ni-jyu-ni-chi) or 「ハツカ」 (ha-tsu-ka). Both mean the 20th of the month.
     • SSML should therefore let a content author choose whether a voice synthesis engine reads "10/20" as "ジューガツハツカ" (jyu-gatsu-ha-tsu-ka) or as "ジューガツニジューニチ" (jyu-gatsu-ni-jyu-ni-chi).
     • Therefore, we propose an attribute on the say-as element that selects the Japanese reading of a date.
     • We are still examining this issue.

  12. 5-2 Expansion of the say-as element
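The content of this slide is not preserved in the transcript, and 5-1 states that the proposal was still under examination. The sketch below is therefore only one possible shape for it: a hypothetical `x-reading` attribute on say-as, with hypothetical values `native` and `sino`, selects between the two readings of "10/20" quoted in 5-1. The lookup table covers only that one date, and the SSML namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Readings quoted on slide 5-1; only the date "10/20" is covered here.
READINGS = {
    ("10/20", "native"): "ジューガツハツカ",
    ("10/20", "sino"): "ジューガツニジューニチ",
}


def render(ssml: str) -> str:
    """Expand date say-as elements into kana using the requested reading."""
    root = ET.fromstring(ssml)
    out = [root.text or ""]
    for el in root:
        if el.tag == "say-as" and el.get("interpret-as") == "date":
            # "x-reading" is a hypothetical attribute, not part of SSML 1.0.
            style = el.get("x-reading", "native")
            out.append(READINGS[((el.text or "").strip(), style)])
        else:
            out.append(el.text or "")
        out.append(el.tail or "")
    return "".join(out)


doc = (
    '<speak>今日は'
    '<say-as interpret-as="date" format="md" x-reading="sino">10/20</say-as>'
    'です。</speak>'
)
print(render(doc))  # 今日はジューガツニジューニチです。
```

Leaving the attribute off falls back to the `native` reading in this sketch; which reading should be the default is a design decision the slides do not settle.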
