Speech-to-Speech Infrastructure Based on UIMA

Jan Kleindienst, Ph.D. (on behalf of TC_STAR partners)
Manager, Conversational Interactions and Architectures, IBM Prague

Overview
• Challenges
• Approach
• The Resulting Infrastructure
• Use Cases
• Conclusion

What is a speech-to-speech system?
• An S2S system translates spoken input from a source language to a target language
• Speech-to-speech systems typically consist of three main processing blocks:
  • Transcription (ASR)
  • Translation (MT)
  • Synthesis (TTS)

Challenges
TC_STAR Project, 2004-2007, www.tc-star.org
• Create an open technological infrastructure to support effective delivery of scientific results from the speech-to-speech research community
• Online distributed speech-to-speech infrastructure for automatic performance evaluation of end-to-end systems as well as individual components
• Open technological framework based on the open-source Unstructured Information Management Architecture (UIMA)

Key Challenge: Support Online System Combinations and Automatic Evaluations
[Diagram: partner sites RWTH, IBM, ELDA, LIMSI, UKA, ITC-Irst, and UPC, with a "?" for the infrastructure connecting them]

Approach: Pick such an infrastructure, which…
• …specifies a common data format understood by all speech-to-speech components
• …has well-defined APIs that let the engines pass data in and read it out
• …transparently takes care of network and local connectivity options
• …requires only minimal coding to plug the proprietary engines into the infrastructure
UIMA Component Model:
• Common MUMA Type System
• initialize(), process(), destroy(), …
• Java/C++/… local calls or SOAP and Vinci
• Concept of UIMA Annotators
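The component model above can be illustrated with a minimal Python sketch. It does not use the UIMA SDK: `SharedRecord`, `Annotator`, `UppercaseAnnotator`, and `run_chain` are hypothetical names invented for illustration, and only the method names initialize(), process(), and destroy() come from the slide. The sketch shows one shared data format handed from component to component through a fixed lifecycle interface.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class SharedRecord:
    """Hypothetical stand-in for the common data format shared by all engines."""
    audio_pcm: bytes = b""
    views: dict = field(default_factory=dict)  # e.g. "source_text", "target_text"


class Annotator(ABC):
    """Hypothetical lifecycle interface: initialize(), process(), destroy()."""

    def initialize(self, config: dict) -> None:
        # Store configuration once, before any data is processed.
        self.config = config

    @abstractmethod
    def process(self, record: SharedRecord) -> None:
        """Read from the shared record and write results back into it."""

    def destroy(self) -> None:
        # Release engine resources; nothing to release in this sketch.
        pass


class UppercaseAnnotator(Annotator):
    """Toy component: reads one view and writes another."""

    def process(self, record: SharedRecord) -> None:
        source = record.views[self.config["input_view"]]
        record.views[self.config["output_view"]] = source.upper()


def run_chain(annotators, record):
    """Pass the same record through each annotator in order."""
    for annotator in annotators:
        annotator.process(record)
    return record


annotator = UppercaseAnnotator()
annotator.initialize({"input_view": "source_text", "output_view": "target_text"})
record = SharedRecord(views={"source_text": "hello"})
run_chain([annotator], record)
annotator.destroy()
print(record.views["target_text"])
```

Because every component reads and writes the same record type, the chain runner needs no knowledge of what any individual engine does.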

Unstructured Information Management Architecture (UIMA)
[Diagram: Unstructured Information (inefficient search) → Analysis Bridge → Structured Information (efficient search)]
• What is UIMA?
  • In business terms: the analysis bridge between unstructured and structured information
  • In technical terms: an infrastructure for integrating, processing, and managing all kinds of data-driven engines, including monitoring support
• Key features
  • UIMA is an emerging standard for text and media processing
  • The UIMA SDK is open source under the Apache license
  • The UIMA infrastructure supports interoperability between platforms and component interfacing via Java, C++, Python, Perl, and remote/networked services
  • Offers simple XML-based integration with UIMA APIs
  • Distributed data exchange that supports complex data structures

How to make components UIMA-pluggable?
• Step 1: Implement the required Annotator interface => initialize() & process() methods
• Step 2: Specify the Component Descriptor XML file for configuration and lifecycle
• Step 3: Define the input and output data structures of the Type System
[Diagram: a UIMA Annotator wraps the proprietary engine via wrapper code and a component descriptor; CAS objects carrying data and meta-data pass in and out]
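The three steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the UIMA API: the descriptor is a plain dict standing in for the XML file, and `TYPE_SYSTEM`, `DESCRIPTOR`, `Cas`, `EngineWrapper`, and `proprietary_recognize` are all hypothetical names. The point is that the wrapper code is thin: it checks declared inputs, calls the site's own engine, and stores the declared output.

```python
from dataclasses import dataclass, field

# Step 3 (hypothetical type system): named data slots that components may exchange.
TYPE_SYSTEM = {"pcm", "source_text", "target_text", "target_audio"}

# Step 2 (hypothetical descriptor): a dict standing in for the XML descriptor file.
DESCRIPTOR = {
    "name": "toy-asr",
    "inputs": ["pcm"],
    "outputs": ["source_text"],
    "parameters": {"language": "en"},
}


def proprietary_recognize(pcm: bytes, language: str) -> str:
    """Placeholder for a site's own engine; returns a fake transcript."""
    return f"[{language}] {len(pcm)} bytes of audio"


@dataclass
class Cas:
    """Hypothetical container for the data handed to an annotator."""
    data: dict = field(default_factory=dict)


class EngineWrapper:
    """Step 1: wrapper code exposing the engine via initialize() and process()."""

    def __init__(self, descriptor, engine):
        declared = set(descriptor["inputs"]) | set(descriptor["outputs"])
        unknown = declared - TYPE_SYSTEM
        if unknown:
            raise ValueError(f"types not in type system: {sorted(unknown)}")
        self.descriptor = descriptor
        self.engine = engine
        self.params = {}

    def initialize(self) -> None:
        # Copy configuration parameters from the descriptor.
        self.params = dict(self.descriptor["parameters"])

    def process(self, cas: Cas) -> None:
        missing = [t for t in self.descriptor["inputs"] if t not in cas.data]
        if missing:
            raise KeyError(f"missing inputs: {missing}")
        # This sketch supports exactly one input and one output type.
        (input_type,) = self.descriptor["inputs"]
        (output_type,) = self.descriptor["outputs"]
        cas.data[output_type] = self.engine(cas.data[input_type], **self.params)


wrapper = EngineWrapper(DESCRIPTOR, proprietary_recognize)
wrapper.initialize()
cas = Cas(data={"pcm": b"\x00\x01\x02\x03"})
wrapper.process(cas)
print(cas.data["source_text"])
```

Validating the descriptor against the type system at construction time catches a mismatched component before any data is sent to it.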

TC_STAR Speech-to-Speech Evaluation Infrastructure
[Diagram: a Collection Processing Engine passes CAS objects through the ASR, SLT, TTS, and Evaluation engines, each attached through the Annotator API and wrapper code, with a Vinci Name Service. The data progresses from pcm audio to source text, target text, target audio, and evaluation. Evaluation input data is uploaded, and evaluation results and reports are downloaded, over http.]

TC_STAR Speech-2-Speech pan-European deployment
[Diagram: annotators (UIMA/other) hosted at the partner sites RWTH, IBM, ELDA, LIMSI, UKA, ITC-Irst, and UPC, covering ASR, SLT, TTS, Eval, ASR Rover, and Punctuator, together with a Data Web server (upload/download), a Control Web Server, the CPE, and a Vinci name server]
• Profile 1: ASR->SLT->TTS->EVAL (with ASR ROVER)
• Profile 2: ASR->SLT->TTS->EVAL in a different setup
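A profile of the first kind can be sketched in Python as a lookup of named services followed by a combination step. Everything here is an assumption made for illustration: the service names, the fake engines, the toy lexicon, and the `SERVICES` dict that stands in for a name server. The voting function is a deliberately simplified stand-in for ROVER: it assumes the ASR hypotheses are already aligned word by word, whereas real ROVER performs that alignment first. The EVAL stage is omitted.

```python
from collections import Counter

TOY_LEXICON = {"the": "la", "house": "casa", "is": "es", "red": "roja"}


def fake_asr(transcript):
    """Build a fake ASR service that always returns the given transcript."""
    return lambda audio: transcript.split()


# Hypothetical registry standing in for a name server: service name -> callable.
SERVICES = {
    "asr.site1": fake_asr("the house is red"),
    "asr.site2": fake_asr("the mouse is red"),
    "asr.site3": fake_asr("the house is bed"),
    "slt.site1": lambda words: [TOY_LEXICON.get(w, w) for w in words],
    "tts.site1": lambda words: " ".join(words).encode("utf-8"),
}

# Hypothetical profile: which services take part in one experiment.
PROFILE_1 = {
    "asr": ["asr.site1", "asr.site2", "asr.site3"],
    "slt": "slt.site1",
    "tts": "tts.site1",
}


def rover_vote(hypotheses):
    """Majority vote per word position over pre-aligned hypotheses.

    Simplification: real ROVER aligns the hypotheses before voting.
    """
    if len({len(h) for h in hypotheses}) != 1:
        raise ValueError("this sketch needs equal-length, pre-aligned hypotheses")
    return [Counter(words).most_common(1)[0][0] for words in zip(*hypotheses)]


def run_profile(profile, audio):
    """Run ASR (with voting if several engines), then SLT, then TTS."""
    hypotheses = [SERVICES[name](audio) for name in profile["asr"]]
    source_words = rover_vote(hypotheses) if len(hypotheses) > 1 else hypotheses[0]
    target_words = SERVICES[profile["slt"]](source_words)
    return SERVICES[profile["tts"]](target_words)


result = run_profile(PROFILE_1, b"\x00\x00")
print(result.decode("utf-8"))
```

Switching to a different setup then amounts to supplying another profile dict that names other services, without touching the engines themselves.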

UIMA Web Control Console
• Current user and status
• Annotator combination in use for the experiment
• Experiment ID and the set of input data
• Distributed logging and monitoring AJAX infrastructure
• Links to graphical speech-to-speech evaluation results

UIMA Web Control Console
• Processing engine
• Indication of the active engine
• Path of completed processing
• Engine where the data are currently processed

Lessons learned…
• Pain in placing machines on public IPs
  • Firewall configuration for all participating machines, local IT people ;-)
• Need to support a variety of Linux distributions to host UIMA …
  • Partially eliminated by UIMA school development warm up
• Variety of programming languages for writing Annotators
  • Java, C++, Perl, Python, …
• Broad requirements on the Common Type System
  • Punctuation, Casing, Lattices
• Support for individual secure data download/upload of data server
  • Authentication, HTTPS, Firewall rules
• Web console for controlling the evaluation lifecycle
  • Concept of profiles, experiment ids, monitoring
• Remote Logging and Debugging
  • Distributed logging capabilities, Logging to Web console
• Reliability of components and networks

Speech-to-Speech Showcases
• UIMA S2S Evaluation Web Portal
  • The video demonstrates how S2S portal users (e.g. S2S researchers) set up, test, and evaluate speech-to-speech chains consisting of individual text and media processing components such as ASR, machine translation, and TTS. These components, called Annotators in UIMA jargon, are exported as Web services on the public Internet and glued together by UIMA. More than 15 annotators are currently exported by IBM and EU institutes and universities.
  • http://www.tc-star.org/Demo/ibm/web_console_batch.swf
• UIMA S2S Translation Video Console
  • The individual Web service components can be assembled online into remote services that provide direct value to citizens. We show a video console that translates from English to Spanish (EU parliamentary domain). Note that the three Web services involved (ASR, MT, and TTS) are hosted at three different sites hundreds of kilometers apart and glued together by UIMA.
  • http://www.tc-star.org/Demo/ibm/video_console_near_real_time.swf

Conclusion
• First-of-a-kind online multi-partner speech-to-speech system demonstrated on UIMA (Jun 06-May 07)
• Remote speech-to-speech components dynamically combined via the UIMA infrastructure to support different combinations, e.g. ROVER
• Annotators hosted on public IPs at the partners' sites
• The framework is controlled via the UIMA Web AJAX infrastructure
• The open infrastructure is used to automatically set up and evaluate individual components as well as end-to-end systems
• Designed to support various use cases, from research experiments to technology showcasing
