Speech-to-Speech Infrastructure Based on UIMA. Jan Kleindienst, Ph.D. (on behalf of TC_STAR partners), Manager, Conversational Interactions and Architectures, IBM Prague
Overview • Challenges • Approach • The Resulting Infrastructure • Use Cases • Conclusion
What is a speech-to-speech system? • An S2S system translates spoken input from a source language into a target language • Speech-to-speech systems typically consist of three main processing blocks: • Transcription (ASR) • Translation (MT) • Synthesis (TTS)
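The three blocks can be pictured as stages that each read what the previous stage produced and add their own result. Below is a minimal Python sketch using toy stand-ins: the "audio" is already a list of spoken words, the MT step is a two-entry word lookup, and the TTS step returns encoded text instead of a waveform. All function names (`transcribe`, `translate`, `synthesize`, `run_pipeline`) are hypothetical and do not come from any TC_STAR engine.

```python
from typing import Callable, Dict, List

CAS = Dict[str, object]       # toy stand-in for the shared analysis structure
Stage = Callable[[CAS], None]  # a stage reads earlier results and adds its own

def transcribe(cas: CAS) -> None:
    # Hypothetical ASR stub: the "audio" is already a list of spoken words.
    cas["source_text"] = " ".join(cas["audio"])

TOY_LEXICON = {"good": "buenos", "morning": "días"}  # toy English-Spanish lookup

def translate(cas: CAS) -> None:
    # Hypothetical MT stub: word-by-word lookup, unknown words copied through.
    words = str(cas["source_text"]).split()
    cas["target_text"] = " ".join(TOY_LEXICON.get(w, w) for w in words)

def synthesize(cas: CAS) -> None:
    # Hypothetical TTS stub: encoded text stands in for a waveform.
    cas["target_audio"] = str(cas["target_text"]).encode("utf-8")

def run_pipeline(cas: CAS, stages: List[Stage]) -> CAS:
    # Run the stages in order on the same shared structure.
    for stage in stages:
        stage(cas)
    return cas

result = run_pipeline({"audio": ["good", "morning"]}, [transcribe, translate, synthesize])
print(result["target_text"])  # buenos días
```

Because every stage writes into the same structure, later stages (and an evaluation step) can see all earlier results, not just the immediately preceding one.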
Challenges: TC_STAR Project, 2004-2007, www.tc-star.org • Create an open technological infrastructure to support effective delivery of scientific results from the speech-to-speech research community • An online, distributed speech-to-speech infrastructure for automatic performance evaluation of end-to-end systems as well as individual components • An open technological framework based on the open-source Unstructured Information Management Architecture (UIMA)
Key Challenge: Support Online System Combinations and Automatic Evaluations across the partner sites (RWTH, IBM, ELDA, LIMSI, UKA, ITC-Irst, UPC)
Approach: Pick an infrastructure that… (requirement => what the UIMA Component Model provides) • …specifies a common data format understood by all speech-to-speech components => Common MUMA Type System • …has well-defined APIs that let the engines pass the data in and read them out => initialize(), process(), destroy(), … • …transparently takes care of network and local connectivity options => Java/C++/… local calls or SOAP and Vinci • …requires just minimal coding to plug proprietary engines into the infrastructure => Concept of UIMA Annotators
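The slide's combination of a common type system and a fixed component lifecycle is what allows a chain of components to be checked before it runs. The following Python sketch is a toy model, not the UIMA SDK API: each annotator declares which type names it reads and writes and exposes `initialize()`, `process()` and `destroy()`, and a hypothetical helper `check_chain` reports any step whose inputs no earlier step produced. The type names (`pcm`, `source_text`, `target_text`, `target_audio`) mirror the CAS contents on the evaluation-infrastructure slide; the classes `Asr`, `Slt` and `Tts` are illustrative placeholders.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Set

CAS = Dict[str, object]  # toy stand-in for a CAS; keys are type names

class Annotator(ABC):
    # Declared capabilities: type names read from and written to the CAS.
    inputs: Set[str] = set()
    outputs: Set[str] = set()

    def initialize(self, config: Dict[str, str]) -> None:
        # Called once before any data is processed (e.g. to load models).
        self.config = dict(config)

    @abstractmethod
    def process(self, cas: CAS) -> None:
        """Read the declared inputs from the CAS and add the declared outputs."""

    def destroy(self) -> None:
        # Called once when the component is shut down (e.g. to free resources).
        pass

class Asr(Annotator):
    inputs, outputs = {"pcm"}, {"source_text"}
    def process(self, cas: CAS) -> None:
        cas["source_text"] = "<transcript>"  # placeholder result

class Slt(Annotator):
    inputs, outputs = {"source_text"}, {"target_text"}
    def process(self, cas: CAS) -> None:
        cas["target_text"] = "<translation>"  # placeholder result

class Tts(Annotator):
    inputs, outputs = {"target_text"}, {"target_audio"}
    def process(self, cas: CAS) -> None:
        cas["target_audio"] = b""  # placeholder result

def check_chain(chain: List[Annotator], available: Set[str]) -> List[str]:
    """Report annotators whose inputs are not produced by any earlier step."""
    problems: List[str] = []
    have = set(available)
    for ann in chain:
        missing = ann.inputs - have
        if missing:
            problems.append(f"{type(ann).__name__} needs {sorted(missing)}")
        have |= ann.outputs
    return problems

print(check_chain([Asr(), Slt(), Tts()], {"pcm"}))  # []
print(check_chain([Slt(), Tts()], {"pcm"}))         # ["Slt needs ['source_text']"]
```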
[Diagram: UIMA as the analysis bridge from unstructured information (inefficient search) to structured information (efficient search)] Unstructured Information Management Architecture (UIMA) • What is UIMA? In business terms => the analysis bridge between unstructured and structured information. In technical terms => an infrastructure for integrating and running all kinds of data-driven engines and for managing their data, including monitoring support • Key features: • UIMA is an emerging standard for text and media processing • The UIMA SDK is open source under the Apache license • The UIMA infrastructure supports interoperability between platforms, component interfacing via Java, C++, Python and Perl, and remote/networked services • Offers simple XML-based integration with the UIMA APIs • Distributed data exchange that supports complex data structures
How to make components UIMA-pluggable? [Diagram: the proprietary engine sits inside wrapper code within a UIMA Annotator; CASes carrying data and meta-data flow in and out; the Annotator is configured by a component descriptor] • Step 1: Implement the required Annotator interface => initialize() & process() methods • Step 2: Specify the Component Descriptor XML file for configuration and lifecycle • Step 3: Define the input and output data structures of the Type System
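The three steps can be illustrated with a small Python sketch. Everything in it is hypothetical: `ProprietaryRecognizer` stands for a vendor engine with its own, non-UIMA API; the `DESCRIPTOR` dict only mimics what a component descriptor declares (configuration parameters plus input and output types) and is not the UIMA descriptor XML schema; and `AsrWrapperAnnotator` plays the role of the wrapper code that adapts the engine to the initialize()/process()/destroy() lifecycle.

```python
from typing import Dict, List

class ProprietaryRecognizer:
    """Hypothetical vendor engine with its own, non-UIMA API."""

    def __init__(self, model_path: str) -> None:
        self.model_path = model_path
        self.loaded = True

    def decode(self, samples: List[int]) -> str:
        # Placeholder "recognition": report how many samples were decoded.
        return f"decoded {len(samples)} samples"

    def close(self) -> None:
        self.loaded = False

# Steps 2 and 3 stand-in: what a descriptor declares (illustrative dict only).
DESCRIPTOR = {
    "name": "ExampleAsrAnnotator",
    "parameters": {"model_path": "/models/en"},  # hypothetical path
    "inputs": ["pcm"],
    "outputs": ["source_text"],
}

class AsrWrapperAnnotator:
    """Step 1 stand-in: wrapper code exposing initialize()/process()/destroy()."""

    def initialize(self, descriptor: Dict) -> None:
        # Read configuration and type declarations, then start the engine.
        self.inputs = descriptor["inputs"]
        self.outputs = descriptor["outputs"]
        self.engine = ProprietaryRecognizer(descriptor["parameters"]["model_path"])

    def process(self, cas: Dict[str, object]) -> None:
        # Translate between the CAS and the engine's own call convention.
        samples = cas[self.inputs[0]]
        cas[self.outputs[0]] = self.engine.decode(samples)

    def destroy(self) -> None:
        self.engine.close()

ann = AsrWrapperAnnotator()
ann.initialize(DESCRIPTOR)
cas = {"pcm": [0, 12, -7, 3]}
ann.process(cas)
print(cas["source_text"])  # decoded 4 samples
ann.destroy()
```

The engine itself is untouched; only the thin wrapper knows both the engine's API and the shared data format, which is what keeps the integration effort per partner small.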
TC_STAR Speech-to-Speech Evaluation Infrastructure [Diagram: a Collection Processing Engine passes a CAS through the ASR, SLT, TTS and Evaluation annotators, each of which is wrapper code around an engine behind the Annotator API. The CAS accumulates content at each step: pcm -> + source text -> + target text -> + target audio -> + evaluation. Components are located through the Vinci Name Service; input data and evaluation data are uploaded and downloaded over HTTP, and the evaluation results go into the Evaluation Reports.]
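The flow through the Collection Processing Engine, where each component is located by name and adds one more layer to the same CAS, can be modelled as follows. This is a plain-Python sketch under stated assumptions: `NameService` is a toy in-process registry standing in for the Vinci Name Service (no Vinci API is used), the endpoints are local stub functions rather than remote hosts, and the `exact_match` score is a placeholder, not a TC_STAR evaluation metric.

```python
from typing import Callable, Dict, List

CAS = Dict[str, object]          # toy stand-in for a CAS
Service = Callable[[CAS], None]  # a component adds its results to the CAS

class NameService:
    """Toy registry mapping a service name to an endpoint. Here the endpoint
    is a local callable; in a deployment it would be a remote host and port."""

    def __init__(self) -> None:
        self._services: Dict[str, Service] = {}

    def register(self, name: str, endpoint: Service) -> None:
        self._services[name] = endpoint

    def lookup(self, name: str) -> Service:
        if name not in self._services:
            raise KeyError(f"no service registered under {name!r}")
        return self._services[name]

def run_cpe(ns: NameService, chain: List[str], cas: CAS) -> List[List[str]]:
    """Pass one CAS through the named services in order and record which
    entries the CAS holds after each step (insertion order is preserved)."""
    history = [list(cas)]
    for name in chain:
        ns.lookup(name)(cas)
        history.append(list(cas))
    return history

# Hypothetical stub services; real ones would wrap ASR, SLT, TTS and scoring.
ns = NameService()
ns.register("asr", lambda c: c.update(source_text="good morning"))
ns.register("slt", lambda c: c.update(target_text="buenos días"))
ns.register("tts", lambda c: c.update(target_audio=b"\x00\x01"))
ns.register("eval", lambda c: c.update(
    evaluation={"exact_match": c["target_text"] == c["reference"]}))

cas: CAS = {"pcm": b"\x00\x00", "reference": "buenos días"}
for step in run_cpe(ns, ["asr", "slt", "tts", "eval"], cas):
    print(step)
print(cas["evaluation"])  # {'exact_match': True}
```

Swapping a component (say, a different partner's ASR) then only means registering a different endpoint under the name the chain uses.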
TC_STAR Speech-to-Speech Pan-European Deployment [Diagram: the sites RWTH, IBM, ELDA, LIMSI, UKA, ITC-Irst and UPC host UIMA (or other) annotators, including several ASR engines, SLT, TTS, a punctuator, an ASR ROVER and evaluation, together with the CPE, a data web server with upload and download, a control web server and the Vinci name server.] • Profile 1: ASR->SLT->TTS->EVAL (with ASR ROVER) • Profile 2: ASR->SLT->TTS->EVAL in a different setup
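Profile 1 combines the outputs of several ASR engines with ROVER before translation. NIST's ROVER first aligns the system outputs into a word transition network and then votes, optionally weighting by confidence scores. The sketch below keeps only the voting step and assumes the hypotheses are already aligned slot by slot, with `None` marking a slot where a system output no word; it is a simplification for illustration, not the ROVER component used in the deployment, and the function name `vote` is hypothetical.

```python
from collections import Counter
from typing import List, Optional

DELETION = None  # an empty slot: this system output no word here

def vote(aligned: List[List[Optional[str]]]) -> List[str]:
    """Frequency vote per slot over pre-aligned hypotheses (one list per
    system). Ties go to the candidate proposed by the earliest-listed system."""
    if len({len(h) for h in aligned}) != 1:
        raise ValueError("hypotheses must be non-empty and aligned to equal length")
    output: List[str] = []
    for slot in zip(*aligned):
        counts = Counter(slot)
        # max() returns the first maximal item, i.e. the earliest system wins ties.
        best = max(slot, key=lambda w: counts[w])
        if best is not DELETION:
            output.append(best)
    return output

hyps = [
    ["the", "session", "is", "open"],
    ["the", "session", None, "open"],
    ["a", "session", "is", "opened"],
]
print(" ".join(vote(hyps)))  # the session is open
```

Even this crude vote fixes errors that only one engine makes, which is the intuition behind combining several partners' recognizers.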
UIMA Web Control Console (screenshot callouts): • Current user and status • Annotator combination in use for the experiment • Experiment ID and the set of input data • Distributed logging and monitoring (AJAX infrastructure) • Links to graphical speech-to-speech evaluation results
UIMA Web Control Console, processing engine view: • Indication of the active engine • Path of completed processing • Engine where the data are currently being processed
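The console view tracks, per experiment, which engines have finished and which engine currently holds the data. Below is a minimal Python sketch of that bookkeeping, assuming a profile is simply an ordered list of engine names (as in "ASR->SLT->TTS->EVAL"). The class `ExperimentStatus` and the ID `exp-001` are hypothetical; the slides do not describe the real console's data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ExperimentStatus:
    experiment_id: str
    profile: List[str]  # ordered engine names, e.g. ["ASR", "SLT", "TTS", "EVAL"]
    completed: List[str] = field(default_factory=list)

    @property
    def active_engine(self) -> Optional[str]:
        # The engine currently holding the data: the first one not yet completed.
        if len(self.completed) < len(self.profile):
            return self.profile[len(self.completed)]
        return None

    def mark_done(self, engine: str) -> None:
        # Only the active engine can finish; this keeps the path consistent.
        if engine != self.active_engine:
            raise ValueError(f"{engine} is not the active engine ({self.active_engine})")
        self.completed.append(engine)

    def render(self) -> str:
        # One-line view: completed engines, [active engine], pending engines.
        parts = []
        for i, name in enumerate(self.profile):
            if i < len(self.completed):
                parts.append(f"{name}(done)")
            elif i == len(self.completed):
                parts.append(f"[{name}]")
            else:
                parts.append(name)
        return f"{self.experiment_id}: " + " -> ".join(parts)

status = ExperimentStatus("exp-001", ["ASR", "SLT", "TTS", "EVAL"])
status.mark_done("ASR")
print(status.render())  # exp-001: ASR(done) -> [SLT] -> TTS -> EVAL
```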
Lessons learned… • Pain of placing machines on public IPs: firewall configuration for all participating machines, local IT people ;-) • Need to support a variety of Linux distributions to host UIMA… (partially eliminated by the UIMA school development warm-up) • Variety of programming languages for writing Annotators: Java, C++, Perl, Python, … • Broad requirements on the Common Type System: punctuation, casing, lattices • Support for individual, secure data upload/download to and from the data server: authentication, HTTPS, firewall rules • Web console for controlling the evaluation lifecycle: concept of profiles, experiment IDs, monitoring • Remote logging and debugging: distributed logging capabilities, logging to the Web console • Reliability of components and networks
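One way to get the distributed logging described in the lessons above is to tag every log record with the experiment ID and the annotator name, so that records from different sites can be merged and shown per experiment. The sketch uses Python's standard `logging` module (`LoggerAdapter` plus a custom `Handler`); `ConsoleFeedHandler`, `annotator_logger` and the ID `exp-001` are hypothetical, and the handler keeps records in memory where a real deployment would forward them to the Web console.

```python
import logging
from typing import List

class ConsoleFeedHandler(logging.Handler):
    """Collects formatted records in memory; a stand-in for forwarding them
    to a Web console."""

    def __init__(self) -> None:
        super().__init__()
        self.lines: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(self.format(record))

def annotator_logger(experiment_id: str, annotator: str,
                     handler: logging.Handler) -> logging.LoggerAdapter:
    # One logger per annotator; the adapter stamps every record with context.
    logger = logging.getLogger(f"s2s.{annotator}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output via the root logger
    if handler not in logger.handlers:
        logger.addHandler(handler)
    return logging.LoggerAdapter(logger, {"experiment_id": experiment_id,
                                          "annotator": annotator})

feed = ConsoleFeedHandler()
feed.setFormatter(logging.Formatter(
    "%(experiment_id)s %(annotator)s %(levelname)s %(message)s"))

log = annotator_logger("exp-001", "ASR", feed)
log.info("processing started")
log.warning("remote engine slow to respond")
for line in feed.lines:
    print(line)
```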
Speech-to-Speech Showcases • UIMA S2S Evaluation Web Portal • The video demonstrates how S2S portal users (e.g. S2S researchers) set up, test, and evaluate speech-to-speech chains consisting of individual text and media processing components such as ASR, machine translation, and TTS. These components, called Annotators in UIMA jargon, are exported as Web services on the public Internet and glued together by UIMA. More than 15 annotators are currently exported by IBM and by EU institutes and universities. • http://www.tc-star.org/Demo/ibm/web_console_batch.swf • UIMA S2S Translation Video Console • The individual Web service components can be assembled online into remote services that provide direct value to citizens. We show a video console that translates from English to Spanish (EU parliamentary domain). Note that the three Web services involved (ASR, MT, TTS) are hosted at three different sites hundreds of kilometers apart and glued together by UIMA. • http://www.tc-star.org/Demo/ibm/video_console_near_real_time.swf
Conclusion • First-of-a-kind online multi-partner speech-to-speech system demonstrated on UIMA (June 2006 to May 2007) • Remote speech-to-speech components dynamically combined via the UIMA infrastructure to support different combinations, e.g. ROVER • Annotators hosted on public IPs at the partners' sites • The framework is controlled via the UIMA Web AJAX infrastructure • The open infrastructure is used to automatically set up and evaluate individual components as well as end-to-end systems • Designed to support various use cases, from research experiments to technology showcasing