Machine Translation in the Spoken Domain. Mobile Speech-to-Speech Translation: The Verbmobil System. Natalie Bruckwilder, Jana Friedrich, Katharina Klassen, Nina Klusmeyer, Vanessa Kroeger
Overview 1. Introduction 2. Challenges tackled by Verbmobil 3. Verbmobil’s Massive Data Collection Effort 4. The main Components of Verbmobil 5. Use of Prosodic Information 6. The Multi-Blackboard Architecture 7. Verbmobil’s Multi-Engine Approach 8. Summarizing Dialogs 9. Conclusion
1. Introduction: What is the Verbmobil System? • an “intelligent” software system that provides mobile phone users with simultaneous dialog interpretation services for restricted topics • a long-term interdisciplinary language technology research project (1993-2000) aimed at developing a system that recognizes, translates and produces natural utterances, and thus translates spontaneous speech robustly and bidirectionally for German/English and German/Japanese • funded with roughly 60 million euros by Germany's Federal Ministry of Research and Technology (“Bundesministerium für Forschung und Technologie”)
Context-sensitive Translations • previous dialog translation systems translated sentence by sentence, whereas Verbmobil provides context-sensitive translations • it recognizes spoken input, analyses and translates it, and finally utters the translation
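The recognize, analyse, translate, utter chain above can be sketched as a pipeline in which every stage also sees the dialog history. This is a minimal sketch, assuming string-valued stages; the class name `DialogPipeline` and the toy stages are hypothetical and only show where context enters, not how Verbmobil's modules actually work.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stage signature: each stage gets its input plus the dialog
# history, so a translation may depend on earlier turns, not only on the
# current sentence.
Stage = Callable[[str, List[str]], str]


@dataclass
class DialogPipeline:
    recognize: Stage   # speech -> source-language text
    analyze: Stage     # text -> analysis (here simply a string)
    translate: Stage   # analysis -> target-language text
    synthesize: Stage  # target text -> spoken output (here a string)
    history: List[str] = field(default_factory=list)

    def process_turn(self, audio: str) -> str:
        text = self.recognize(audio, self.history)
        analysis = self.analyze(text, self.history)
        target = self.translate(analysis, self.history)
        self.history.append(analysis)  # store this turn as context for later turns
        return self.synthesize(target, self.history)


# Toy stages: the "translation" just tags each turn with its position in the dialog.
pipeline = DialogPipeline(
    recognize=lambda audio, hist: audio,
    analyze=lambda text, hist: text.lower(),
    translate=lambda analysis, hist: f"[{len(hist)}] {analysis}",
    synthesize=lambda target, hist: target,
)
print(pipeline.process_turn("Hallo"))  # [0] hallo
print(pipeline.process_turn("Ja"))     # [1] ja
```

The same utterance would yield a different output at a different point in the dialog, which is the property that distinguishes this design from sentence-by-sentence translation.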
2. Challenges tackled by Verbmobil • the only dialog translation system that works under open-microphone conditions • the signal can come from a close-speaking microphone, but it can also be of mobile phone quality • deals with spontaneous speech: • changes of tack in mid-sentence or mid-word • “ums” and “ers” • words that are left out in rapid speech • recognizes a speaker’s mistakes and translates what the speaker is trying to say
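To make the “ums”, “ers” and self-correction problem concrete, here is a deliberately naive rule-based filter. It is an assumption for illustration only: Verbmobil combines statistical, linguistic and prosodic analysis, whereas this sketch uses a fixed filler list and a single heuristic (a filler between two words with the same part-of-speech tag marks the first word as corrected). The function name `clean` and the tag labels are hypothetical.

```python
from typing import List, Tuple

# Assumed filler inventory (English and German filled pauses).
FILLERS = {"um", "uh", "er", "äh", "ähm"}


def clean(tokens: List[Tuple[str, str]]) -> List[str]:
    """Remove filled pauses and naive self-corrections.

    tokens: (word, part-of-speech tag) pairs; the tags are assumed to come
    from an earlier tagging step. If a filler sits between two words with
    the same tag, the word before the filler is treated as the corrected
    part (the reparandum) and dropped as well.
    """
    drop = set()
    for i, (word, _tag) in enumerate(tokens):
        if word.lower() in FILLERS:
            drop.add(i)
            if 0 < i < len(tokens) - 1 and tokens[i - 1][1] == tokens[i + 1][1]:
                drop.add(i - 1)
    return [word for i, (word, _tag) in enumerate(tokens) if i not in drop]


utterance = [("we", "PRON"), ("could", "AUX"), ("um", "FIL"), ("meet", "VERB"),
             ("on", "ADP"), ("the", "DET"), ("er", "FIL"), ("sixth", "NUM"),
             ("uh", "FIL"), ("seventh", "NUM")]
print(" ".join(clean(utterance)))  # we could meet on the seventh
```

A single heuristic like this misfires easily, which is one reason a real system needs prosodic cues and statistical models rather than surface rules alone.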
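Prosodic level: detecting self-corrections • Example: • „Ja, ich weil also würde mal sagen äh vorschlagen, wir könnten uns am äh 7. treffen so im Mai“ (literally: “Yes, I because well would say uh suggest, we could meet on the uh 7th around May”) • after statistical and linguistic analysis, the translation is based on the date: • “How about the seventh of May?” • all the disfluencies have been filtered out • Semantic analysis: the same utterance is translated differently depending on the dialog context • Example: • “Geht es bei Ihnen?” • after “Wo treffen wir uns?” (“Where shall we meet?”): “Do we meet at your place?” • after “Sollen wir uns im April treffen?” (“Shall we meet in April?”): “Is it possible for you?”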
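The “Geht es bei Ihnen?” example can be reduced to a lookup keyed by the utterance and the type of the preceding turn. This is a toy sketch under strong assumptions: the keyword classifier, the labels `ASK_LOCATION` and `SUGGEST_DATE`, and the translation table are all hypothetical stand-ins for Verbmobil's dialog-act classification and semantic analysis.

```python
# Hypothetical dialog-act labels and a tiny context table; Verbmobil's real
# classifier and transfer rules are far richer.
MONTHS = {"januar", "februar", "märz", "april", "mai", "juni", "juli",
          "august", "september", "oktober", "november", "dezember"}

TRANSLATIONS = {
    ("Geht es bei Ihnen?", "ASK_LOCATION"): "Do we meet at your place?",
    ("Geht es bei Ihnen?", "SUGGEST_DATE"): "Is it possible for you?",
}


def classify_previous(turn: str) -> str:
    """Crude keyword classifier for the preceding turn (an assumption)."""
    words = turn.lower().rstrip("?.!").split()
    if words and words[0] == "wo":
        return "ASK_LOCATION"
    if MONTHS & set(words):
        return "SUGGEST_DATE"
    return "OTHER"


def translate_in_context(utterance: str, previous_turn: str) -> str:
    key = (utterance, classify_previous(previous_turn))
    if key not in TRANSLATIONS:
        raise ValueError(f"no translation for {utterance!r} in context {key[1]}")
    return TRANSLATIONS[key]


print(translate_in_context("Geht es bei Ihnen?", "Wo treffen wir uns?"))
print(translate_in_context("Geht es bei Ihnen?", "Sollen wir uns im April treffen?"))
```

The first call returns “Do we meet at your place?”, the second “Is it possible for you?”, mirroring the two readings listed above.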
Speaker-adaptive system: • works in speaker-independent mode and improves recognition by adapting to each speaker • several methods are used to adjust to the acoustic characteristics of the speaker’s voice, the speaking rate and pronunciation variants caused by the dialectal diversity of the user community. Multiparty negotiation: • clear interaction goal in a negotiation task • sometimes argumentative dialogs • has to deal with a much richer set of dialog acts
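The slide does not say which adaptation methods Verbmobil used. As one widely used example of adjusting to a speaker's voice and channel, cepstral mean normalization subtracts the per-utterance average of each feature coefficient, removing stationary offsets. This is a swapped-in standard technique, not confirmed as one of Verbmobil's methods; the function name is hypothetical.

```python
from typing import List

Frame = List[float]  # one feature vector (e.g. cepstral coefficients) per frame


def cepstral_mean_normalize(frames: List[Frame]) -> List[Frame]:
    """Subtract the utterance-level mean of every coefficient from each frame."""
    if not frames:
        return []
    n, dim = len(frames), len(frames[0])
    means = [sum(frame[d] for frame in frames) / n for d in range(dim)]
    return [[frame[d] - means[d] for d in range(dim)] for frame in frames]


print(cepstral_mean_normalize([[1.0, 2.0], [3.0, 6.0]]))  # [[-1.0, -2.0], [1.0, 2.0]]
```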
Verbmobil = a hybrid system • deep and shallow processing schemes • corpus-based and rule-based methods • combination of machine learning results from large corpora with linguists’ hand-crafted knowledge sources
3. Verbmobil’s Massive Data Collection Effort • Multi-channel recording: the speech recognizers were trained on data sets with various audio signal qualities: • various speakers were recorded over different signal channels • Verbmobil speech corpora • the Partitur format uses 15 strata of annotations: • 2 transliteration variants • lexical orthography • canonical pronunciation
• automatic phonological segmentation • word segmentation • prosodic segmentation • dialog acts • noises • superimposed speech • syntactic category • word category • syntactic function • prosodic boundaries
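A multi-strata annotation like the one listed above can be modeled as a set of tiers that all point back to the same word sequence, so every stratum for one word can be read off together, like a column in a musical score. This is a toy data-structure sketch; the class name and tier names are illustrative and are not the official keys of the Partitur format.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple


class Partitur:
    """Toy multi-tier annotation in the spirit of the Partitur format.

    Every tier maps labels to word indices, so all strata for one word can be
    read off like a column in a musical score. The tier names used below are
    illustrative, not the official format keys.
    """

    def __init__(self, words: List[str]) -> None:
        self.words = words
        self.tiers: DefaultDict[str, List[Tuple[int, str]]] = defaultdict(list)

    def annotate(self, tier: str, word_index: int, label: str) -> None:
        if not 0 <= word_index < len(self.words):
            raise IndexError(f"no word at index {word_index}")
        self.tiers[tier].append((word_index, label))

    def labels_for(self, word_index: int) -> Dict[str, List[str]]:
        """Collect the labels of every tier that annotates the given word."""
        column: Dict[str, List[str]] = {}
        for tier, entries in self.tiers.items():
            labels = [label for i, label in entries if i == word_index]
            if labels:
                column[tier] = labels
        return column


p = Partitur(["treffen", "im", "Mai"])
p.annotate("canonical_pronunciation", 0, "trEf@n")
p.annotate("word_category", 0, "verb")
p.annotate("prosodic_boundary", 2, "clause boundary")
print(p.labels_for(0))
```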
Multilingual Verbmobil corpus • includes bilingual dialogs and aligned bilingual transliterations • annotations on morpho-syntax, phrase structure and predicate-argument structure • end-to-end evaluations: the robustness, coverage and accuracy of a speech-to-speech translation system for spontaneous dialogs depend critically on the quantity and quality of the training corpora
4. The main Components of Verbmobil • three speech recognizers and three speech synthesizers for German, English and Japanese • multi-engine parsing architecture (three parsers) • fragmentary analyses are produced and combined in a chart of VIT (Verbmobil Interface Term) structures
The main Components of Verbmobil • the “deep analysis” module is based on a wide-coverage unification grammar • theoretical clarity and elegance of linguistic analyses • the statistical translation module uses prosodic information about phrase boundaries
Two components for case-based translation: • substring-based translation: • a method for synchronous interpretation • combined with patterns for word order switching and word cluster information • date, time and naming expressions are recognized by definite clause grammars (DCGs) • dialog-act based translation: • includes statistical classification of 19 dialog acts and a cascade of more than 300 finite-state transducers
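Definite clause grammars are a Prolog formalism in which each rule consumes part of the input list and passes the rest on. The sketch below emulates that style in Python for a toy English date grammar; the rules, word lists and function names are hypothetical and far smaller than anything a real system would need.

```python
from typing import List, Optional, Tuple

ORDINALS = {"first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5,
            "sixth": 6, "seventh": 7, "eighth": 8, "ninth": 9, "tenth": 10}
MONTHS = {name: number for number, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# Each nonterminal takes (tokens, position) and returns (value, next position)
# or None, mimicking how a DCG threads the remaining input through its rules.
Result = Optional[Tuple[int, int]]


def terminal(toks: List[str], i: int, word: str) -> Optional[int]:
    return i + 1 if i < len(toks) and toks[i] == word else None


def lexical(toks: List[str], i: int, table: dict) -> Result:
    return (table[toks[i]], i + 1) if i < len(toks) and toks[i] in table else None


def date(toks: List[str], i: int) -> Optional[Tuple[Tuple[int, int], int]]:
    """date --> ([the]), ordinal, [of], month  |  month, [the], ordinal."""
    j = terminal(toks, i, "the")
    o = lexical(toks, i if j is None else j, ORDINALS)
    if o is not None:
        k = terminal(toks, o[1], "of")
        m = lexical(toks, k, MONTHS) if k is not None else None
        if m is not None:
            return (o[0], m[0]), m[1]
    m = lexical(toks, i, MONTHS)
    if m is not None:
        k = terminal(toks, m[1], "the")
        o = lexical(toks, k, ORDINALS) if k is not None else None
        if o is not None:
            return (o[0], m[0]), o[1]
    return None


def find_dates(text: str) -> List[Tuple[int, int]]:
    """Return (day, month) pairs for every date expression found in text."""
    toks = text.lower().replace("?", " ").replace(",", " ").split()
    found, i = [], 0
    while i < len(toks):
        parsed = date(toks, i)
        if parsed is None:
            i += 1
        else:
            found.append(parsed[0])
            i = parsed[1]
    return found


print(find_dates("How about the seventh of May?"))       # [(7, 5)]
print(find_dates("You may come, May the third works"))  # [(3, 5)]
```

The second call shows why a grammar helps: the modal verb “may” is only accepted as a month when the surrounding rule also matches.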
dialog-act based translation: • the statistical dialog classifier takes the previous dialog history into account • the dialog memory stores a shallow interlingual representation of each utterance, topic and focus information, and a deep semantic representation
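One simple way for a dialog-act classifier to take the history into account is to combine word likelihoods with the probability of the current act given the previous one (a naive Bayes classifier with a dialog-act bigram prior). This sketch uses three hypothetical acts and hand-set probabilities instead of Verbmobil's 19 acts and corpus-trained models.

```python
import math
from typing import Dict, List

# Toy, hand-set parameters (hypothetical); a real classifier estimates them
# from an annotated corpus.
TRANSITION: Dict[str, Dict[str, float]] = {  # P(act | previous act)
    "SUGGEST": {"ACCEPT": 0.5, "REJECT": 0.3, "SUGGEST": 0.2},
    "GREET": {"ACCEPT": 0.1, "REJECT": 0.1, "SUGGEST": 0.8},
}
WORD_PROB: Dict[str, Dict[str, float]] = {  # P(word | act)
    "ACCEPT": {"fine": 0.3, "yes": 0.3, "good": 0.3},
    "REJECT": {"no": 0.4, "busy": 0.4},
    "SUGGEST": {"how": 0.2, "about": 0.2, "monday": 0.2},
}
UNSEEN = 0.01  # smoothing probability for words not listed for an act


def classify(words: List[str], previous_act: str) -> str:
    """Return the act maximizing log P(act | previous) + sum log P(word | act)."""
    def score(act: str) -> float:
        s = math.log(TRANSITION[previous_act][act])
        for w in words:
            s += math.log(WORD_PROB[act].get(w, UNSEEN))
        return s
    return max(TRANSITION[previous_act], key=score)


print(classify(["yes", "fine"], "SUGGEST"))  # ACCEPT
print(classify(["hmm"], "SUGGEST"))          # ACCEPT
print(classify(["hmm"], "GREET"))            # SUGGEST
```

The last two calls use an uninformative word, so only the previous act decides the outcome, which is exactly the contribution of the dialog history.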
• the plan processor structures an ongoing dialog into dialog phases, games and moves • dialog acts are the terminal nodes of the tree that represents the dialog structure • inference services for the resolution of anaphora and ellipsis • temporal reasoning transforms expressions of time into fully specified times and dates
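Temporal reasoning of this kind can be pictured as filling the unspecified fields of a time expression from the dialog's current temporal focus, as in “am 7. ... so im Mai”. A minimal sketch, assuming that the dialog memory keeps a focus date; the `TimeExpr` class, the `resolve` function and the year 2000 in the example are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class TimeExpr:
    """Possibly underspecified time expression extracted from one utterance."""
    day: Optional[int] = None
    month: Optional[int] = None
    year: Optional[int] = None


def resolve(expr: TimeExpr, focus: TimeExpr) -> date:
    """Fill every missing field from the dialog's temporal focus."""
    day = expr.day if expr.day is not None else focus.day
    month = expr.month if expr.month is not None else focus.month
    year = expr.year if expr.year is not None else focus.year
    if None in (day, month, year):
        raise ValueError("time expression is still underspecified")
    return date(year, month, day)


# "the 7th", with May 2000 assumed to be in focus from earlier turns
print(resolve(TimeExpr(day=7), TimeExpr(month=5, year=2000)))  # 2000-05-07
```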
Other important functions: • contextual reasoning • the final Verbmobil system contains more than 20,000 transfer rules • semantic-based transfer is extremely fast • the microplanner handles subordination, aggregation, focus and theme control as well as anaphora generation • the synthesizer selects the best available synthesis segments, exploiting the syntactic, prosodic and discourse information provided by previous processing stages
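Semantic transfer operates on semantic terms rather than on surface strings. The sketch below is a strongly simplified, hypothetical version: a flat list of (predicate, arguments) terms, with one rule per source predicate looked up in a hash table. Because each term costs one lookup, transfer time grows only linearly with the size of the representation, which illustrates (but does not prove) why this style of transfer can be fast. Predicate names and rules are invented for the example.

```python
from typing import Dict, List, Tuple

Term = Tuple[str, Tuple[str, ...]]  # (predicate, arguments)

# A few hypothetical German-to-English transfer rules; the final Verbmobil
# system had more than 20,000, over a far richer representation.
RULES: Dict[str, str] = {
    "treffen_v": "meet_v",
    "vorschlagen_v": "suggest_v",
    "mai_n": "may_n",
}


def transfer(source: List[Term]) -> List[Term]:
    """Rewrite each predicate via a hash lookup; arguments (variables) are kept."""
    target: List[Term] = []
    for pred, args in source:
        if pred not in RULES:
            raise KeyError(f"no transfer rule for {pred}")
        target.append((RULES[pred], args))
    return target


print(transfer([("treffen_v", ("e1", "x1")), ("mai_n", ("x2",))]))
```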
5. Use of Prosodic Information • Verbmobil uses detailed prosodic information at all stages • other recent speech recognition systems can only distinguish questions from declaratives • prosodic information is passed through the whole translation process, from the source utterance to the target utterance
Verbmobil’s Multilingual Prosody Module (2) • Input: speech signal and the Word Hypotheses Graph (WHG) • Output: WHG annotated with prosodic information for each recognized word • Classification of phrase boundaries, accented words and sentence mood • Using probabilistic prosodic information dramatically reduces the search space of syntactic analysis
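Why boundary probabilities shrink the parser's search can be shown with a little arithmetic. A chart parser over n words considers n(n+1)/2 spans; if likely prosodic boundaries split the input into segments that are parsed separately, only the spans inside each segment remain. The sketch simplifies heavily: it works on a single word chain instead of a full Word Hypotheses Graph, and the threshold of 0.5 and the probabilities are hypothetical.

```python
from typing import List


def segment(words: List[str], boundary_prob: List[float],
            threshold: float = 0.5) -> List[List[str]]:
    """Split after every word whose boundary probability exceeds the threshold."""
    segments: List[List[str]] = []
    current: List[str] = []
    for word, p in zip(words, boundary_prob):
        current.append(word)
        if p > threshold:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


def candidate_spans(segments: List[List[str]]) -> int:
    """Spans a chart parser would consider: n(n+1)/2 per segment."""
    return sum(len(s) * (len(s) + 1) // 2 for s in segments)


words = "ja also wir könnten uns am siebten treffen".split()
probs = [0.9, 0.8, 0.1, 0.1, 0.05, 0.1, 0.2, 0.95]  # hypothetical module output
segs = segment(words, probs)
print(candidate_spans([words]), candidate_spans(segs))  # 36 23
```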
Verbmobil’s Multilingual Prosody Module (3) • Prosodic clause boundaries play the role that punctuation marks play in written language • The prosody module also provides information for dialog act segmentation • Prosodic features concerning sentence mood are also used by the translation module when there is not enough syntactic or semantic evidence • The extracted prosodic features are also used for speaker adaptation in general
6. The Multi-Blackboard Architecture of Verbmobil • The final Verbmobil system consists of 69 interacting modules • All modules communicate while the speech input (source language) is transformed into the target language • Parallel processing schemes are used to translate under real-time conditions
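The idea of modules communicating through several shared blackboards can be sketched as named message boards to which modules subscribe. This is a hypothetical, single-threaded toy API: Verbmobil's modules run in parallel, while here each posted message is dispatched synchronously to the modules that read that board.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Tuple

Handler = Callable[[str], str]


class MultiBlackboard:
    """Toy multi-blackboard (hypothetical API): each blackboard is a named list
    of messages, and each module reads from one board and writes to another."""

    def __init__(self) -> None:
        self.boards: DefaultDict[str, List[str]] = defaultdict(list)
        self.readers: DefaultDict[str, List[Tuple[str, Handler]]] = defaultdict(list)

    def register(self, reads: str, writes: str, module: Handler) -> None:
        self.readers[reads].append((writes, module))

    def post(self, board: str, message: str) -> None:
        self.boards[board].append(message)
        # Notify every module reading this board; its result goes to the board
        # it writes to. Dispatch here is sequential, not parallel.
        for writes, module in self.readers[board]:
            self.post(writes, module(message))


bb = MultiBlackboard()
bb.register("speech", "words_de", lambda signal: signal.lower())
bb.register("words_de", "translation_en", lambda words: f"EN({words})")
bb.post("speech", "HALLO")
print(bb.boards["translation_en"])  # ['EN(hallo)']
```

Because every intermediate result stays on its board, other modules (for example, selection modules) can inspect results from several stages, not only the final output.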
Modules can have several instances (not only one) • Example: 2 German speakers in a multiparty conversation • 2 instances of the German speech recognition module are needed • Intermediate results are available at each stage • Selection modules can choose the most promising result at each processing stage
Architecture is based on packed representations • Charts with underspecified representations reduce uncertainties • Use of confidence values, constraints, probabilities and alternate hypotheses
7. Verbmobil’s Multi-Engine Approach • Verbmobil performs language identification, parsing and translation with several engines concurrently • Multi-engine parsing results are combined and merged into a single chart • A statistical selection module chooses between the alternate results • Only a single translation is used for generating the system’s output
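A statistical selection step that keeps only one of several alternative translations can be sketched as choosing the candidate with the highest weighted confidence. The weighting scheme, the engine labels and the scores below are assumptions for illustration; the slides do not specify how Verbmobil's selection module scores its alternatives.

```python
from typing import Dict, List, NamedTuple


class Candidate(NamedTuple):
    engine: str
    translation: str
    confidence: float  # engine-internal score, assumed to lie in [0, 1]


# Hypothetical reliability weights per engine, e.g. estimated on held-out data.
ENGINE_WEIGHT: Dict[str, float] = {"statistical": 0.9, "substring": 0.7, "dialog_act": 0.6}


def select(candidates: List[Candidate]) -> Candidate:
    """Pick the single candidate with the highest weighted confidence."""
    if not candidates:
        raise ValueError("no engine produced a translation")
    return max(candidates, key=lambda c: ENGINE_WEIGHT.get(c.engine, 0.5) * c.confidence)


candidates = [
    Candidate("statistical", "How about the seventh of May?", 0.7),  # 0.63
    Candidate("substring", "How about the seventh in May?", 0.8),    # 0.56
    Candidate("dialog_act", "Suggest: 7 May.", 0.9),                 # 0.54
]
print(select(candidates).translation)  # How about the seventh of May?
```

Weighting by engine lets a high raw score from a generally less reliable engine lose against a moderate score from a more reliable one.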
Verbmobil’s Multi-Engine Approach • Verbmobil uses three parallel parsing threads • An incremental chunk parser • A statistical parser • An HPSG parser • Each parser uses a semantic component to transform its analysis results into a semantic representation term
Verbmobil’s Multi-Engine Approach • Verbmobil uses five translation engines that cover a spectrum of translation methods • The language identification component of Verbmobil also uses a multi-engine approach to identify each user’s language (German, English, Japanese) → its error rate is only 7.3%
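The slides do not say how the language identification engines' decisions are combined. Majority voting is one simple possibility, sketched here as an assumption rather than as Verbmobil's method; the function name is hypothetical.

```python
from collections import Counter
from typing import List

LANGUAGES = {"German", "English", "Japanese"}


def identify_language(engine_votes: List[str]) -> str:
    """Majority vote over the decisions of several language-ID engines.

    On a tie, Counter.most_common returns the language voted for first.
    """
    if not engine_votes:
        raise ValueError("no engine produced a decision")
    unknown = set(engine_votes) - LANGUAGES
    if unknown:
        raise ValueError(f"unsupported languages: {unknown}")
    return Counter(engine_votes).most_common(1)[0][0]


print(identify_language(["German", "English", "German"]))  # German
```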
8. Summarizing Dialogs • Verbmobil has the ability to generate dialog summaries • The dialog summary can be produced on demand after the end of a conversation • The summaries are based on the semantic representation of all dialog turns • Each participant can ask for a written summary of the dialog in their own language
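A summary based on the semantic representation of the turns can be approximated by tracking proposals and keeping only those the other participant accepted. This sketch is hypothetical throughout: the `Turn` fields, the act labels and the rule “latest suggestion accepted by the other speaker wins” are assumptions. It produces language-neutral records that a generator would then verbalize in each participant's language.

```python
from typing import Dict, List, NamedTuple


class Turn(NamedTuple):
    speaker: str
    act: str    # hypothetical labels: "SUGGEST", "ACCEPT", "REJECT"
    topic: str  # e.g. "date", "place"
    value: str  # e.g. "7 May"; empty for ACCEPT/REJECT


def summarize(turns: List[Turn]) -> List[str]:
    """Keep, per topic, the latest suggestion accepted by the other speaker."""
    pending: Dict[str, Turn] = {}
    agreed: Dict[str, str] = {}
    for t in turns:
        if t.act == "SUGGEST":
            pending[t.topic] = t
        elif t.act == "ACCEPT" and t.topic in pending and pending[t.topic].speaker != t.speaker:
            agreed[t.topic] = pending.pop(t.topic).value
        elif t.act == "REJECT":
            pending.pop(t.topic, None)
    return [f"{topic}: {value}" for topic, value in agreed.items()]


dialog = [
    Turn("A", "SUGGEST", "date", "5 May"),
    Turn("B", "REJECT", "date", ""),
    Turn("B", "SUGGEST", "date", "7 May"),
    Turn("A", "ACCEPT", "date", ""),
    Turn("A", "SUGGEST", "place", "Hamburg"),
    Turn("B", "ACCEPT", "place", ""),
]
print(summarize(dialog))  # ['date: 7 May', 'place: Hamburg']
```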
9. Conclusion • Challenges (project goals): • a speaker-independent and bidirectional speech-to-speech translation system for spontaneous dialogs in mobile situations • should work in open-microphone mode and cope with speech over GSM mobile phones • for three language pairs (German, Japanese and English) • vocabulary size: more than 10,000 word forms • average processing time of four times the input signal duration • word recognition rate of more than 75% for spontaneous speech • more than 80% of the translations should be approximately correct • 90% of the dialog tasks should end successfully • Verbmobil has successfully met all the project goals and even surpassed some of them
Success through competition • various teams within the project developed rival solutions to particular tasks, and formal evaluations were used to find the most successful solution or to combine it with the next-best solutions to improve the overall performance of the system
Solution of the problem • the problem can only be cracked by the combined muscle of deep and shallow processing approaches: • deep processing can be used for merging, completing and repairing the results of shallow processing strategies • shallow methods can be used to guide the search in deep processing • statistical methods must be augmented by symbolic models to achieve higher accuracy and broader coverage • statistical methods are also useful for learning operators or selection strategies for symbolic processes
The research: • more than 900 young researchers from disparate disciplines worked on the project • the project brought researchers in Germany together across the language/speech and the academic/industrial divides • professional software engineers with no particular language or speech background were responsible for ensuring that the software is robust and maintainable • various patents and inventions resulting from Verbmobil will be useful for building, improving or evaluating natural language and speech algorithms or systems in the coming years