Initiation of Standardization on Network-based Speech-to-speech Translation at ITU-T SG16

Satoshi Nakamura, Chiori Hori
National Institute of Information and Communications Technology, Japan

Many Languages All Over the World http://en.wikipedia.org/wiki/List_of_language_families

Breaking Language Boundaries
• Language boundaries are one of the barriers to mutual understanding.
• Speech-to-Speech Translation (S2ST) technologies are an effective means of removing language boundaries between people who speak different languages.
• S2ST technologies have been studied.

Speech-to-Speech Translation (S2ST)
Pipeline: Speech Recognition (ASR) → Machine Translation (MT) → Speech Synthesis (TTS). Underlying resource: corpora.
Example: Japanese 「私は学校に行く」 to English “I go to school”
• ASR: convert the speech to a Japanese phoneme sequence (“w a t a shi w a g a xtu k o o n i…”), then to a word sequence (私は学校に行く) using the lexicon and grammar.
• MT: convert to an English word sequence (「私は」⇒ “I”, 「学校に」⇒ “to school”, 「行く」⇒ “go”), then reorder the word sequence according to English grammar (“I” “to school” “go” ⇒ “I” “go” “to school”).
• TTS: select the appropriate waveform for the English text.
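The three stages on this slide can be sketched as a toy pipeline. This is only an illustration under stated assumptions: the lexicon tables, the role labels, and the function names (`recognize`, `translate`, `synthesize`) are hypothetical, the phonemes for 「行く」 are assumed because the slide truncates the sequence, and real ASR, MT, and TTS modules are statistical systems trained on corpora rather than lookup tables.

```python
from typing import Dict, List, Tuple

# Hypothetical toy pronunciation lexicon: phoneme sequence -> Japanese phrase.
ASR_LEXICON: Dict[Tuple[str, ...], str] = {
    ("w", "a", "t", "a", "shi", "w", "a"): "私は",
    ("g", "a", "xtu", "k", "o", "o", "n", "i"): "学校に",
    ("i", "k", "u"): "行く",  # assumed phonemes; the slide elides them
}

# Hypothetical toy translation lexicon: Japanese phrase -> (English, role).
MT_LEXICON: Dict[str, Tuple[str, str]] = {
    "私は": ("I", "subject"),
    "学校に": ("to school", "object"),
    "行く": ("go", "verb"),
}

# English puts the verb before the object; the Japanese input has it last.
ENGLISH_ORDER = ["subject", "verb", "object"]


def recognize(phonemes: List[str]) -> List[str]:
    """ASR stand-in: segment phonemes into phrases by greedy longest match."""
    words: List[str] = []
    i = 0
    while i < len(phonemes):
        for length in range(len(phonemes) - i, 0, -1):
            key = tuple(phonemes[i:i + length])
            if key in ASR_LEXICON:
                words.append(ASR_LEXICON[key])
                i += length
                break
        else:
            raise ValueError(f"no lexicon entry at phoneme index {i}")
    return words


def translate(words: List[str]) -> List[str]:
    """MT stand-in: look up each phrase, then reorder to English word order."""
    tagged = [MT_LEXICON[w] for w in words]
    tagged.sort(key=lambda pair: ENGLISH_ORDER.index(pair[1]))
    return [english for english, _role in tagged]


def synthesize(words: List[str]) -> str:
    """TTS stand-in: return the text a synthesizer would render as speech."""
    return " ".join(words)


phonemes = "w a t a shi w a g a xtu k o o n i i k u".split()
japanese = recognize(phonemes)
english = translate(japanese)
print(japanese)             # ['私は', '学校に', '行く']
print(synthesize(english))  # I go to school
```

The reordering step is the part that mirrors the slide most directly: the phrase-by-phrase lookup yields “I”, “to school”, “go”, and sorting by role moves the verb ahead of the object.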

Stand Alone and Client-server S2ST Systems
• Stand-alone system: packages the entire set of speech translation functions into a handheld PC.
• Client-server system.
Figure example: Japanese speech “おはようございます．” is translated into English speech “Good morning.” Languages shown in the figure: Indonesian, Japanese, English, Chinese.

Why Network-based?
• Stand-alone systems have limited resources and support only a limited set of language pairs.
• ASR/MT/TTS systems are available for many languages and need to be maintained by each country.
• Broadband networks are available.

Standardization on Network-based S2ST
Figure: two S2ST clients exchange speech of Language A and speech of Language B; each client receives synthesized speech produced by ASR, MT, and TTS modules.
Module resources:
• ASR (Language A, Language B): lexicon, speech data
• MT (Language A → Language B, Language B → Language A): parallel corpus, lexicon
• TTS (Language A, Language B): lexicon, speech data
Items for standardization:
• Parallel corpus, speech data, lexicon
• Data format for ASR and MT results
• Communication protocol among modules
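To make the idea of a shared data format for ASR and MT results concrete, here is a minimal sketch of a message that modules could exchange. The slides do not define a format, so the class name `ModuleResult`, its fields, and the JSON encoding are all assumptions chosen for illustration, not the format under discussion at ITU-T.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ModuleResult:
    """Hypothetical result message passed between S2ST modules."""
    module: str       # which module produced the result: "ASR" or "MT"
    source_lang: str  # language of the module's input, e.g. "ja"
    target_lang: str  # language of the module's output, e.g. "en"
    text: str         # recognized or translated text


def encode(result: ModuleResult) -> str:
    """Serialize a result to JSON, keeping non-ASCII text readable."""
    return json.dumps(asdict(result), ensure_ascii=False)


def decode(payload: str) -> ModuleResult:
    """Parse a JSON payload back into a ModuleResult."""
    return ModuleResult(**json.loads(payload))


asr_result = ModuleResult("ASR", "ja", "ja", "私は学校に行く")
payload = encode(asr_result)
print(payload)
print(decode(payload) == asr_result)  # True
```

A common format of this kind is what would let an ASR module maintained in one country hand its output to an MT module maintained in another without either side knowing the other's internals.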

Lexicon for overall S2ST systems
• Maintaining a reliable lexicon for overall S2ST systems requires a global standard for the lexicon format and a system to collect and provide lexicons for all languages.
Figure: An example of a lexicon for overall modules in S2ST systems

Asian Network-Based S2ST System by A-STAR Consortium
1. National Institute of Information and Communications Technology (NICT), Japan
2. Electronics and Telecommunications Research Institute (ETRI), Korea
3. Chinese Academy of Sciences (CASIA), China
4. National Electronics and Computer Technology Center (NECTEC), Thailand
5. Agency for the Assessment and Application of Technology (BPPT), Indonesia
6. Center for Development of Advanced Computing (CDAC), India
7. Institute of Information Technology (IOIT), Vietnam
8. Institute for Infocomm Research (I2R), Singapore

Server Location for Network-based S2ST

Speech Translation using Distributed Service Servers
Example: Korean-to-Thai speech translation. The speech translation service client calls three servers in turn:
① Speech recognition (Korean) on the ASR server: speech (Korean) ⇒ text (Korean)
② Language translation (Korean→Thai) on the MT server: text (Korean) ⇒ translated text (Thai)
③ Speech synthesis (Thai) on the TTS server: translated text (Thai) ⇒ synthesized speech (Thai)
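The Korean-to-Thai flow can be sketched as a client that looks up the right server for each step and chains the calls. Everything here is a stand-in: the `REGISTRY` table, the service keys, and the in-process lambdas are hypothetical replacements for real network servers, and the speech placeholders are plain strings rather than audio.

```python
from typing import Callable, Dict, Tuple

# A "server" is modeled as a function from input payload to output payload.
Server = Callable[[str], str]

# Hypothetical registry: (service, language or language pair) -> server.
REGISTRY: Dict[Tuple[str, str], Server] = {
    ("ASR", "ko"): lambda speech: {"<ko-speech:hello>": "안녕하세요"}[speech],
    ("MT", "ko-th"): lambda text: {"안녕하세요": "สวัสดี"}[text],
    ("TTS", "th"): lambda text: f"<th-speech:{text}>",
}


def translate_speech(speech: str, src: str, tgt: str) -> str:
    """Client-side chain: ASR in the source language, MT, then TTS."""
    text = REGISTRY[("ASR", src)](speech)                # step 1: recognition
    translated = REGISTRY[("MT", f"{src}-{tgt}")](text)  # step 2: translation
    return REGISTRY[("TTS", tgt)](translated)            # step 3: synthesis


result = translate_speech("<ko-speech:hello>", "ko", "th")
print(result)  # <th-speech:สวัสดี>
```

Because the client only depends on the registry keys, adding a language pair in this sketch means registering new servers rather than changing the client.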

S2ST Client and Server

Scope of Standardization
Table: Draft roadmap to develop standards for network-based S2ST

Conclusion
• We would like to invite more people to join standardization activities on network-based S2ST systems.
• With standardization, network-based S2ST systems can cover more languages.
