Speech-to-Speech MT Design and Engineering. Alon Lavie and Lori Levin MT Class April 16 2001. Outline. Design and Engineering of the JANUS speech-to-speech MT system The Travel & Medical Domain Interlingua (IF) Portability to new domains: ML approaches Evaluation and User Studies
Speech-to-Speech MT Design and Engineering Alon Lavie and Lori Levin MT Class April 16, 2001
Outline • Design and Engineering of the JANUS speech-to-speech MT system • The Travel & Medical Domain Interlingua (IF) • Portability to new domains: ML approaches • Evaluation and User Studies • Open Problems, Current and Future Research
Overview • Fundamentals of our approach • System overview • Engineering a multi-domain system • Evaluations and user studies • Alternative translation approaches • Current and future research
JANUS Speech Translation • Translation via an interlingua representation • Main translation engine is rule-based • Semantic grammars • Modular grammar design • System engineered for multiple domains • Recent focus on domain portability • using machine learning for rapid extension to a new domain
The C-STAR Travel Planning Domain General Scenario: • Dialogue between one traveler and one or more travel agents • Focus on making travel arrangements for a personal leisure trip (not business) • Free spontaneous speech
The C-STAR Travel Planning Domain Natural breakdown into several sub-domains: • Hotel Information and Reservation • Transportation Information and Reservation • Information about Sights and Events • General Travel Information • Cross Domain
Semantic Grammars • Describe structure of semantic concepts instead of syntactic constituency of phrases • Well suited for task-oriented dialogue containing many fixed expressions • Appropriate for spoken language - often disfluent and syntactically ill-formed • Faster to develop reasonable coverage for limited domains
Semantic Grammars Hotel Reservation Example: Input: we have two hotels available Parse Tree: [give-information+availability+hotel] (we have [hotel-type] ([quantity=] (two) [hotel] (hotels)) available)
The SOUP Parser • Specifically designed to parse spoken language using domain-specific semantic grammars • Robust - can skip over disfluencies in input • Stochastic - probabilistic CFG encoded as a collection of RTNs with arc probabilities • Top-Down - parses from top-level concepts of the grammar down to matching of terminals • Chart-based - dynamic matrix of parse DAGs indexed by start and end positions and head category
The SOUP Parser • Supports parsing with large multiple domain grammars • Produces a lattice of parse analyses headed by top-level concepts • Disambiguation heuristics rank the analyses in the parse lattice and select a single best path through the lattice • Graphical grammar editor
SOUP Disambiguation Heuristics • Maximize coverage (of input) • Minimize number of parse trees (fragmentation) • Minimize number of parse tree nodes • Minimize the number of wild-card matches • Maximize the probability of parse trees • Find sequence of domain tags with maximal probability given the input words: P(T|W), where T= t1,t2,…,tn is a sequence of domain tags
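The first five heuristics above can be read as a ranking over candidate analyses. The following is a minimal sketch, not SOUP's implementation: the slide does not say how the heuristics are combined, so this assumes they are applied in the listed order as tie-breakers (lexicographic comparison). The `Analysis` fields and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    """Hypothetical summary of one path through the parse lattice."""
    covered_words: int   # input words covered by the analysis
    num_trees: int       # number of parse trees (fragmentation)
    num_nodes: int       # total number of parse tree nodes
    num_wildcards: int   # number of wild-card matches
    log_prob: float      # log probability of the parse trees


def rank_key(a: Analysis):
    # Assumption: heuristics act as successive tie-breakers in slide order.
    # Coverage and probability are maximized; the other counts are minimized,
    # so they are negated before comparison.
    return (a.covered_words, -a.num_trees, -a.num_nodes,
            -a.num_wildcards, a.log_prob)


def best_analysis(analyses):
    """Return the highest-ranked analysis under the assumed ordering."""
    return max(analyses, key=rank_key)
```

A weighted combination of the same quantities would be an equally plausible reading; the lexicographic form is used here only because it is the simplest to state.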
JANUS Generation Modules Two alternative generation modules: • Top-Down context-free based generator - fast, used for English and Japanese • GenKit - unification-based generator augmented with Morphe morphology module - used for German
Modular Grammar Design • Grammar development separated into modules corresponding to sub-domains (Hotel, Transportation, Sights, General Travel, Cross Domain) • Shared core grammar for lower-level concepts that are common to the various sub-domains (e.g. times, prices) • Grammars can be developed independently (using shared core grammar) • Shared and Cross-Domain grammars significantly reduce effort in expanding to new domains • Separate grammar modules facilitate associating parses with domain tags - useful for multi-domain integration within the parser
Translation with Multiple Domain Grammars • Parser is loaded with all domain grammars • Domain tag attached to grammar rules of each domain • Previously developed grammars for other domains can also be incorporated • Parser creates a parse lattice consisting of multiple analyses of the input into sequences of top-level domain concepts • Parser disambiguation heuristics rank the analyses in the parse lattice and select a single best sequence of concepts
Domain Portability: Travel to Medical • Knowledge-Based Methods: re-usability of knowledge sources for translation and speech recognition • Corpus-Based Methods: reduce the amount of new training data for translation and speech recognition
Background • New domain: Medical • Doctor-patient diagnostic conversations • Global importance in emergencies and in machine translation for remote health care • Synergy with Lincoln Lab • Joint evaluation • Joint interlingua • Test case for portability
Portability • Advantage: Interlingua • Problem: Writing semantic grammars • Domain dependent • Requires time, effort, and expertise • Approach: • Grammar modularity • Domain action learning • Automatic/Interactive semantic grammar induction
Hybrid Stat/Rule-based Analysis • Developing large-coverage semantic analysis grammars is time consuming, which makes it difficult to port the analysis system to new domains • “low-level” argument grammars are more domain-independent: they contain many concepts that are used across domains: time, location, prices, etc. • “high-level” domain-actions are domain-specific and must be redeveloped for each new domain: give-info+onset+symptom • Tagging data sets with interlingua representations is less time consuming and is needed anyway for system development
Hybrid Rule/Stat Approach • Combines grammar-based and statistical approaches to analysis: • Develop semantic grammars for phrase-level arguments that are more portable to new domains • Use statistical machine learning techniques for classifying into domain-actions • Porting to a new domain requires: • developing argument parse rules for new domain • tagging training set with domain-actions for new domain • training the classifiers for domain-actions on the tagged data
The Hybrid Analysis Process • Parse an utterance for arguments • Segment the utterance into sentences • Extract features from the utterance and the single best parse output • Use a learned classifier to identify the speech act • Use a learned classifier to identify the concept sequence • Combine into a full parse
Argument Parsing • The SOUP parser produces a forest of parse trees that cover as much of the input as possible • The parse forest can be a mixture of trees allowed by any of the grammars • Only the best parse is used for further processing
Argument Parse Example We have a double room available for you at twenty-three thousand five hundred yen [=availability=]::PSD ( we have [super_room-type=] ( [room-type=] ( a [room:double] ( double room ) ) ) available ) [arg-party:for-whom=]::ARG ( for [you] ( you ) ) [arg:time=]::ARG ( [point=] ( at [hour-minute=] ( [big:hour=] ( [big:23] ( twenty-three ) ) ) ) ) [arg:super_price=]::ARG ( [price=] ( [one-price:main-quantity=] ( [n-1000=] ( thousand ) [price:n-100=] ( five hundred ) ) [currency=] ( [yen] ( yen ) ) ) )
Automatic Classification of Domain Actions • Train classifiers for speech acts and concepts • Training data: Utterances labeled with speech act, concepts, and best argument parse • Input features • n most common words • Arguments and pseudo-arguments in best parse • Speaker • Predicted speech act (for concept classifier)
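The input features listed above can be assembled into one feature set per utterance. This is a minimal sketch under assumptions: the feature names, the dictionary encoding, and the function `extract_features` are hypothetical, and whitespace tokenization stands in for whatever the system actually uses.

```python
def extract_features(utterance, arg_labels, speaker, vocab, speech_act=None):
    """Build a feature dict for one utterance.

    utterance  -- the input words as a string
    arg_labels -- labels of arguments and pseudo-arguments in the best parse
    speaker    -- speaker role, e.g. "agent" or "client" (assumed values)
    vocab      -- the n most common words, chosen beforehand from training data
    speech_act -- predicted speech act; passed only for the concept classifier
    """
    tokens = set(utterance.lower().split())
    # One binary feature per vocabulary word.
    feats = {f"word:{w}": int(w in tokens) for w in vocab}
    # One binary feature per argument label found in the best parse.
    for label in arg_labels:
        feats[f"arg:{label}"] = 1
    feats["speaker"] = speaker
    if speech_act is not None:
        feats["speech_act"] = speech_act
    return feats
```

The speech act classifier would be called without `speech_act`, and the concept classifier with the speech act predicted in the previous step.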
Full Parse Example We have a double room available for you at twenty-three thousand five hundred yen give-information+availability+room ( [=availability=]::PSD ( we have [super_room-type=] ( [room-type=] ( a [room:double] ( double room ) ) ) available ) [arg-party:for-whom=]::ARG ( for [you] ( you ) ) [arg:time=]::ARG ( [point=] ( at [hour-minute=] ( [big:hour=] ( [big:23] ( twenty-three ) ) ) ) ) [arg:super_price=]::ARG ( [price=] ( [one-price:main-quantity=] ( [n-1000=] ( thousand ) [price:n-100=] ( five hundred ) ) [currency=] ( [yen] ( yen ) ) ) ) )
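In the example above, the full parse is the argument parse wrapped under a domain-action label made of the speech act and the concepts joined with "+". A minimal sketch of that combination step, assuming the classifiers return plain strings; the function name `combine` and the string layout are illustrative only.

```python
def combine(speech_act, concepts, argument_trees):
    """Wrap argument parse trees under a domain-action label.

    speech_act     -- e.g. "give-information"
    concepts       -- e.g. ["availability", "room"]
    argument_trees -- bracketed argument parses, one string per tree
    """
    domain_action = "+".join([speech_act, *concepts])
    return f"{domain_action} ( {' '.join(argument_trees)} )"


full = combine(
    "give-information",
    ["availability", "room"],
    ["[arg-party:for-whom=]::ARG ( for [you] ( you ) )"],
)
```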
Classification Results Using Memory-based (TiMBL) Classifiers
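Memory-based classification stores the training instances and labels a new instance by its nearest stored neighbors. The sketch below illustrates the general idea with a simple overlap distance over feature dictionaries; it is not TiMBL's interface or its exact algorithm, and the function names are hypothetical.

```python
from collections import Counter


def overlap_distance(a, b):
    """Count the features on which two feature dicts disagree."""
    keys = set(a) | set(b)
    return sum(1 for k in keys if a.get(k) != b.get(k))


def classify(instance, memory, k=1):
    """Label an instance by majority vote among its k nearest neighbors.

    memory -- list of (feature_dict, label) pairs kept from training
    """
    nearest = sorted(memory, key=lambda ex: overlap_distance(instance, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


memory = [
    ({"word:room": 1, "speaker": "agent"}, "give-information"),
    ({"word:room": 0, "speaker": "client"}, "request-information"),
]
label = classify({"word:room": 1, "speaker": "agent"}, memory)
```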
Status and Open Research • Preliminary analysis engine implemented, currently used for travel domain in NESPOLE! • Areas for further research and development: • Explore a variety of classifiers • Explore features for domain-action classification • Classification compositionality – how to classify the components of the domain-action separately and combine them? • Taking advantage of additional knowledge sources: the interlingua specification, dialogue context • Better address segmentation of utterance into DAs
Automatic Induction of Semantic Grammars • Seed grammar for a new domain has very limited coverage • Corpus of development data tagged with interlingua representations available • Expand the seed grammar by learning new rules for covering the same domain-actions • First step: how well can we do with no human intervention?
Outline of Semantic Grammar Induction [Diagram. Labeled components: IF Parser, Tree Matching, Linearization, Hypotheses Generation, Rules Management, Seed Grammar, Rules Induction, Learned Grammar, Knowledge. Example rule shown: s[gi+onset+sym] ( [manner=] [sym-loc=] *+became [adj:sym-name=] )]
Human vs Machine Experiment • Seed grammar • Extended by a human • Extended by automatic semantic grammar induction
Seed Grammar Medical I have a burning sensation in my foot. Cross Domain Hello. My name is Sam. Medical Around 200 rules Around 600 rules and growing Shared Around 100 rules and 6000 lexical items
A Parse Tree [request-information+existence+body-state]::MED ( WH-PHRASES::XDM ( [q:duration=]::XDM ( [dur:question]::XDM ( how long ) ) ) HAVE-GET-FEEL::MED ( GET ( have ) ) you HAVE-GET-FEEL::MED ( HAS ( had ) ) [super_body-state-spec=]::MED ( [body-state-spec=]::MED ( ID-WHOSE::MED ( [identifiability=] ( [id:non-distant] ( this ) ) ) BODY-STATE::MED ( [pain]::MED ( pain ) ) ) ) )
Manual Grammar Development About five additional days of development after the seed grammar was finalized Focusing on medical rules only Domain-independent rules remain untouched
Development and evaluation sets • Development set: 133 sentences • from one dialog • Evaluation set: 83 sentences • from two dialogs • unseen speakers • Only SDUs that could be manually tagged with a full IF according to the current specification were included.
Grading Procedure: Recall and Precision of IF Components • Example IF: c:give-information+existence+body-state (body-state-spec=(pain, identifiability=no), body-location=(inside=head)) • Speech act: give-information • Concepts: existence+body-state • Top-level arguments: body-state-spec=, body-location= • Sub-arguments: identifiability=no, inside=head • Recall: ignored if number of items is 0 • Precision: ignored if 0 out of 0
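The recall and precision rules above can be sketched for one IF component type as follows. This is an illustration under assumptions: items are compared as sets, "number of items is 0" is read as an empty reference, "0 out of 0" is read as an empty system output, and an ignored score is represented as `None`. The function name is hypothetical.

```python
def recall_precision(reference, hypothesis):
    """Score one IF component type (e.g. concepts) for one SDU.

    reference  -- items in the manually tagged IF
    hypothesis -- items in the IF produced by the system
    Returns (recall, precision); a value of None means the score is ignored.
    """
    ref, hyp = set(reference), set(hypothesis)
    correct = len(ref & hyp)
    recall = correct / len(ref) if ref else None      # ignored: no reference items
    precision = correct / len(hyp) if hyp else None   # ignored: 0 out of 0
    return recall, precision


scores = recall_precision(["existence", "body-state"], ["body-state"])
```

Corpus-level figures would then average only over the SDUs whose score is not `None`.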
Human vs. Machine: Evaluation Results
                               Seed   Extended   Learned
Speech Act        Recall       43.3   48.2       49.3
                  Precision    71.0   75.0       45.8
Concept List      Recall        2.2   10.1       32.5
                  Precision    12.5   42.2       25.1
Top-Level Args    Recall        0.0    7.2       29.6
                  Precision     0.0   42.2       34.4
Top-Level Values  Recall        0.0    8.3       29.8
                  Precision     0.0   50.0       39.2
Sub-Level Args    Recall        0.0   28.3       14.1
                  Precision     0.0   48.2       12.6
Sub-Level Values  Recall        1.2   28.3       14.1
                  Precision     6.2   48.2       12.9
User Studies • We conducted three sets of user tests • Travel agent played by experienced system user • Traveler is played by a novice and given five minutes of instruction • Traveler is given a general scenario - e.g., plan a trip to Heidelberg • Communication only via ST system, multi-modal interface and muted video connection • Data collected used for system evaluation, error analysis and then grammar development
System Evaluation Methodology • End-to-end evaluations conducted at the SDU (sentence) level • Multiple bilingual graders compare the input with translated output and assign a grade of: Perfect, OK or Bad • OK = meaning of SDU comes across • Perfect = OK + fluent output • Bad = translation incomplete or incorrect
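Because Perfect is defined as OK plus fluent output, the share of acceptable translations is the share graded Perfect or OK. A minimal sketch of that tally for one grader's grades; the lowercase grade strings and the function name are assumptions, and the slide does not say how grades from multiple graders are combined.

```python
from collections import Counter


def summarize(grades):
    """Return the fraction of SDUs per grade plus the acceptable fraction.

    grades -- list of "perfect", "ok" or "bad", one per SDU
    """
    if not grades:
        raise ValueError("no grades to summarize")
    counts = Counter(grades)
    total = len(grades)
    summary = {g: counts[g] / total for g in ("perfect", "ok", "bad")}
    # Acceptable = meaning comes across, whether or not the output is fluent.
    summary["acceptable"] = (counts["perfect"] + counts["ok"]) / total
    return summary


result = summarize(["perfect", "ok", "bad", "ok"])
```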
August-99 Evaluation • Data from latest user study - traveler planning a trip to Japan • 132 utterances containing one or more SDUs, from six different users • SR word error rate 14.7% • 40.2% of utterances contain recognition error(s)
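Word error rate is conventionally the number of word substitutions, deletions and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The sketch below computes it with a standard edit-distance recurrence; it assumes whitespace tokenization and is not the scoring tool used in the evaluation.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


wer = word_error_rate("we have two hotels available", "we have to hotels")
```

In this example one word is substituted and one is deleted out of five reference words.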
Current and Future Work • Expanding the interlingua: covering descriptive as well as task-oriented sentences • Developing the new portable approaches • Developing the server-based architecture for supporting multiple applications: • NESPOLE!: speech-MT for advanced e-commerce • C-STAR: speech-to-speech MT over mobile phones • LingWear: MT and language assistance on wearable devices
Students Working on the Project • Chad Langley: Hybrid Rule/Stat Analysis, Speech MT architecture • Ben Han: Automatic Grammar Induction • Alicia Tribble: Interlingua and grammar development for Medical Domain • Joy Zhang, Erik Peterson: Chinese EBMT for LingWear
The JANUS Speech-MT Team • Project Leaders: Lori Levin, Alon Lavie, Tanja Schultz, Alex Waibel • Grammar and Component Developers: Donna Gates, Dorcas Wallace, Kay Peterson, Alicia Tribble, Chad Langley, Ben Han, Celine Morel, Susie Burger, Vicky MacLaren, Kornel Laskowski, Erik Peterson