Parsing and Analysis for Speech-to-Speech MT
Discussion Leader: Alon Lavie, Carnegie Mellon University
Main Issues
• Target Representation
• Robustness Issues:
  • Speech disfluencies and spontaneity
  • Dealing with output from speech recognizers
• Portability and Adaptation Issues:
  • Domain portability
  • Language portability
• Language Diversity Issues
Babylon Workshop
What is the Target Representation?
• Different MT approaches require different analysis representations:
  • Interlingua: deep and detailed vs. shallow
  • Transfer approaches: syntactic representation
• Some MT approaches don’t require “real” parsing at all:
  • Statistical MT
  • Example-based MT
Robustness Issues
• Speech disfluencies and spontaneity:
  • Spoken language is very different from written text: short turns, back-channels, a looser notion of grammar, fixed expressions/constructions, elided words and constituents
  • (“wanna go eat?” => “Do you want to go to eat?”)
  • Disfluencies are common:
    • Filled pauses: “uh”, “um”, “you know”
    • False starts and repairs: “I got to… I think I have to be there at…”
• Dealing with output of speech recognizers:
  • High word error rates
  • Lack of punctuation and “sentence” boundaries
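As a rough illustration of the filled-pause and repair phenomena listed above, the following Python sketch strips single-word filled pauses and collapses immediate word repetitions in a recognizer transcript. The filler inventory `FILLED_PAUSES` and the function name `remove_disfluencies` are hypothetical. Multi-word fillers such as “you know” and genuine false starts need context (or a trained detector), so this sketch deliberately leaves them alone.

```python
from typing import List

# Hypothetical inventory of single-word filled pauses; a real system would tune or learn it.
FILLED_PAUSES = {"uh", "um", "er", "ah"}


def remove_disfluencies(tokens: List[str]) -> List[str]:
    """Drop filled pauses and collapse immediate repetitions such as "I I think"."""
    cleaned: List[str] = []
    for token in tokens:
        lowered = token.lower()
        if lowered in FILLED_PAUSES:
            continue  # filled pause: carries no content for translation
        if cleaned and cleaned[-1].lower() == lowered:
            continue  # immediate repetition: keep only the first copy
        cleaned.append(token)
    return cleaned


print(remove_disfluencies("uh I I think um we should go".split()))
```

The repetition rule is deliberately naive: it would also collapse a legitimate “that that”, which is one reason the slide frames disfluency removal as a detection problem rather than a fixed filter.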
Robustness Issues
• How do we build parsers/analyzers that can deal with such language?
  • Focus on semantics rather than syntax
  • Focus on language properties of the domain (fixed expressions, task-oriented language)
  • “Clean up” the language:
    • Detect and remove disfluencies
    • Detect SDUs (semantic dialogue units)
    • Use prosodic information: pauses, intonation
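One simple way to use pause information for SDU detection is to cut the word stream wherever the recognizer reports a long enough silence. The sketch below assumes each word arrives paired with the pause duration (in seconds) that follows it, and uses a hypothetical 0.5 s threshold. A fuller detector would combine this cue with intonation and lexical evidence, as the slide suggests.

```python
from typing import List, Tuple


def segment_by_pauses(words: List[Tuple[str, float]], threshold: float = 0.5) -> List[List[str]]:
    """Split (word, pause_after_in_seconds) pairs into candidate SDUs at long pauses."""
    segments: List[List[str]] = []
    current: List[str] = []
    for word, pause_after in words:
        current.append(word)
        if pause_after >= threshold:  # long silence: close the current unit
            segments.append(current)
            current = []
    if current:  # trailing words with no long pause after them
        segments.append(current)
    return segments


stream = [("okay", 0.7), ("I", 0.0), ("want", 0.1), ("to", 0.0),
          ("leave", 0.2), ("on", 0.0), ("Monday", 0.9)]
print(segment_by_pauses(stream))
```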
Portability Issues
• Can we develop parsing/analysis methods that can be rapidly adapted to new languages and new domains?
• Some trends:
  • Focus on machine learning and trainable approaches: data-driven MT approaches, grammar induction, transfer-rule learning
  • Multi-engine systems that can adapt to the resources that are available
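The multi-engine idea can be sketched as follows: every available engine proposes a translation with a confidence score, engines lacking the needed resources abstain, and the most confident proposal wins. The engine names, the `(translation, score)` interface and the assumption that scores are comparable across engines are all hypothetical; a real combination scheme would likely need score normalization or a trained selector.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical engine interface: source sentence -> (translation, confidence),
# or None when the engine lacks the resources (e.g. no bilingual corpus) to translate.
Engine = Callable[[str], Optional[Tuple[str, float]]]


def combine(source: str, engines: Dict[str, Engine]) -> Optional[Tuple[str, str]]:
    """Return (engine_name, translation) from the most confident engine that answered."""
    best: Optional[Tuple[str, str, float]] = None
    for name, engine in engines.items():
        result = engine(source)
        if result is None:
            continue  # engine not usable for this language pair or domain
        translation, score = result
        if best is None or score > best[2]:
            best = (name, translation, score)
    return None if best is None else (best[0], best[1])


engines: Dict[str, Engine] = {
    "ebmt": lambda s: None,           # no example base available
    "smt": lambda s: ("hello", 0.4),
    "rbmt": lambda s: ("hi", 0.7),
}
print(combine("你好", engines))
```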
Language Diversity Issues
• Different languages have very different characteristics: how do we deal with the diversity of phenomena?
  • Morphology: synthetic to analytic
  • Word order: free to fixed
  • Diverse dialects within languages
• Some approaches are more portable by nature and less sensitive to these differences
• Related to language and domain portability
Example: CMU’s JANUS
• Target Representation: shallow, task-oriented interlingua representation:
  “I would like to take a vacation in val-di-fiemme”
  c:give-information+disposition+trip (disposition=(who=i, desire), visit-spec=(identifiability=no,vacation), location=(place-name=val_di_fiemme))
• Hybrid statistical/rule-based analysis approach:
  • Semantic phrases are parsed using a robust parser
  • A trainable classifier is used to select the domain action
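To show how the shallow interlingua above decomposes, here is a Python sketch that splits the domain action and reads the nested argument list. It assumes that `c:` marks the speaker (the client), that the first `+`-separated element is the speech act and the rest are concepts, and that the input is well formed. Bare values such as `desire` come back paired with `None`. All function names are hypothetical.

```python
import re
from typing import List, Tuple


def parse_domain_action(da: str) -> Tuple[str, str, List[str]]:
    """Split 'c:give-information+disposition+trip' into speaker, speech act, concepts."""
    speaker, _, rest = da.partition(":")
    speech_act, *concepts = rest.split("+")
    return speaker, speech_act, concepts


def tokenize(text: str) -> List[str]:
    # Punctuation tokens are '(', ')', ',' and '='; everything else is an atom.
    return re.findall(r"[(),=]|[^\s(),=]+", text)


def parse_args(tokens: List[str], pos: int = 0):
    """Parse 'key=value, key=(...), atom' up to a ')' or the end; return (items, next_pos)."""
    items = []
    while pos < len(tokens) and tokens[pos] != ")":
        if tokens[pos] == ",":
            pos += 1
            continue
        key = tokens[pos]
        pos += 1
        if pos < len(tokens) and tokens[pos] == "=":
            if tokens[pos + 1] == "(":
                value, pos = parse_args(tokens, pos + 2)  # nested argument list
                pos += 1  # skip the closing ')'
            else:
                value, pos = tokens[pos + 1], pos + 2
            items.append((key, value))
        else:
            items.append((key, None))  # bare atom such as 'desire'
    return items, pos


def parse_if(text: str):
    da, _, arg_text = text.partition(" ")
    arg_text = arg_text.strip()
    if arg_text.startswith("(") and arg_text.endswith(")"):
        arg_text = arg_text[1:-1]  # drop the outer parentheses
    args, _ = parse_args(tokenize(arg_text))
    return parse_domain_action(da), args


example = ("c:give-information+disposition+trip (disposition=(who=i, desire), "
           "visit-spec=(identifiability=no,vacation), location=(place-name=val_di_fiemme))")
print(parse_if(example))
```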
Example: CMU’s MilliRADD: Rapidly Adaptable Data-Driven MT with Limited Resources
• Multi-Engine Integration of EBMT, SMT and iRBMT engines
• Example translation pair:
  The reporter interviewed the ambassador in America.
  记者采访在美国里的大使
• Automatic Learning of Xfer Rules, e.g.:
  NB::NB : [PP "DE" NB] -> [NB PP]
  ((X3::Y1)            ; Alignments
   (X1::Y2)
   ((X0 ppadj) = X1)   ; X-side constraints
   (X0 = X3)
   (Y0 = X0)           ; Transfer constraints
   (x2 == (x0 ppadj))  ; Generation constraints
   (x1 = x0))
  PP::PP : ["ZAI" NP PREP] -> [PREP NP]
  ((X3::Y1)            ; Alignments
   (X2::Y2)
   ((X3 loc) =c +)     ; X-side constraints
   ((X0 obj) = X2)
   (X0 = X3)
   (Y0 = X0)           ; Transfer constraints
   (X2 == (x0 obj))    ; Generation constraints
   (x1 = x0))
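The two rules above reorder constituents according to their alignments (Xi::Yj links source constituent i to target constituent j). The Python sketch below applies only that alignment-driven reordering to a flat list of already-analyzed constituents. It ignores the X-side, transfer and generation constraints and treats literals such as "DE" and "ZAI" as one-word categories. The `TransferRule` class is a hypothetical representation, not the system's actual rule format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# A constituent is (category, surface text); literal words such as 的 get the category "DE".
Constituent = Tuple[str, str]


@dataclass
class TransferRule:
    source: List[str]           # source-side pattern, e.g. ["PP", '"DE"', "NB"]
    target: List[str]           # target-side pattern, e.g. ["NB", "PP"]
    alignments: Dict[int, int]  # 1-based target index j -> 1-based source index i (Xi::Yj)


def apply_rule(rule: TransferRule, constituents: List[Constituent]) -> Optional[List[Constituent]]:
    """Reorder constituents per the rule's alignments, or return None if the pattern does not match."""
    if len(constituents) != len(rule.source):
        return None
    for pattern, (category, _) in zip(rule.source, constituents):
        if pattern.strip('"') != category:  # quoted literals compared as one-word categories
            return None
    # Build the target side from aligned source items; unaligned ones (the literal) are dropped.
    return [constituents[rule.alignments[j] - 1] for j in range(1, len(rule.target) + 1)]


# NB::NB : [PP "DE" NB] -> [NB PP] with (X3::Y1) (X1::Y2)
np_rule = TransferRule(["PP", '"DE"', "NB"], ["NB", "PP"], {1: 3, 2: 1})
# PP::PP : ["ZAI" NP PREP] -> [PREP NP] with (X3::Y1) (X2::Y2)
pp_rule = TransferRule(['"ZAI"', "NP", "PREP"], ["PREP", "NP"], {1: 3, 2: 2})

print(apply_rule(pp_rule, [("ZAI", "在"), ("NP", "美国"), ("PREP", "里")]))
print(apply_rule(np_rule, [("PP", "在美国里"), ("DE", "的"), ("NB", "大使")]))
```

Lexical transfer (大使 to “ambassador”, 美国 to “America”) would happen separately; the sketch only shows how the rules yield English order, with the head noun before its locative modifier.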
Example: Parsing the CHILDES Database
• Goal: syntactic annotation of transcribed conversations between children and their parents
• Target Representation: complete syntactic feature structures:
  *MOT: you kicked it .
  %mor: pro|you v|kick-PAST pro|it .
  %fst: ((mood *declarative)
         (tense *past)
         (index 2)
         (subject ((cat pro) (num *sg) (pers 2) (case *nom) (index 1) (root *you)))
         (object ((cat pro) (sum sg) (pers 3) (case acc) (index 3) (root *it)))
         (root *kick)
         (cat v))
  %cst: (sentence (decl (np (pro you)) (vp (vbar (v kicked) (np (pro it))))) (period .))
• Analysis Approach: multi-pass robust parser (LCFlex) that gradually relaxes constraints
• [Sagae, Lavie & MacWhinney, IWPT-01]
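The %mor tier in the example pairs each word with a part-of-speech tag and a stem plus suffixes (category|stem-SUFFIX). A minimal Python reader for just that pattern might look like this. Full CHAT %mor notation has further markers that this sketch does not handle, and the `MorWord` type and its `"punct"` label for bare punctuation are hypothetical choices.

```python
from typing import List, NamedTuple


class MorWord(NamedTuple):
    pos: str             # part-of-speech tag, e.g. "v"
    stem: str            # stem, e.g. "kick"
    suffixes: List[str]  # suffix tags, e.g. ["PAST"]


def parse_mor(tier: str) -> List[MorWord]:
    """Read a simple %mor tier of 'pos|stem-SUFFIX' items plus bare punctuation."""
    words: List[MorWord] = []
    for item in tier.split():
        if "|" not in item:
            words.append(MorWord("punct", item, []))  # e.g. the final "."
            continue
        pos, _, rest = item.partition("|")
        stem, *suffixes = rest.split("-")
        words.append(MorWord(pos, stem, suffixes))
    return words


print(parse_mor("pro|you v|kick-PAST pro|it ."))
```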
Discussion Topics
• Sharing of available data resources and components:
  • Robust parsers
  • Annotated data and bilingual corpora
  • Coding schemes: interlingua, other,…
Discussion Topics
• Approaches to Parsing/Analysis:
  • Focus on specific approaches or support a wide variety of different approaches?
  • Common target representations?
    • Example: the C-STAR model: a common interlingua representation, but independent analysis approaches
  • Multi-engine systems: how do we put together an effective combined system?
• Models of collaboration
Discussion Topics
• Language and Domain Portability:
  • How do we encourage approaches that are better suited to fast adaptation to new languages and domains?