A Finite-State Approach to Machine Translation Srinivas Bangalore Giuseppe Riccardi AT&T Labs-Research NAACL 2001, Pittsburgh June 6, 2001
Overview • Motivation • Stochastic Finite State Machines • Learning Machine Translation Models • Case study • MT for Human-Machine Spoken Dialog • Experiments and Results June 6, 2001
Speech Understanding Case • (Figure: speech recognizer cascade with a semantic model C scored by P(C), a syntactic model L scored by P(W|C), and an acoustic model A scored by P(A|W); decoding maximizes P(A, W, C) over the minimized composition of A, L and C) • ATIS 1994 DARPA Evaluation (G. Riccardi et al., ICASSP 1995; E. Bocchieri et al., SLT Workshop 1995; Levin et al., SLT Workshop 1995) June 6, 2001
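The equation fragments on this slide suggest the usual decomposition of speech understanding; the rendering below is my reconstruction from the recoverable labels (semantics P(C), syntax P(W|C), acoustics P(A|W)), not the slide's exact notation.

```latex
\[
(\hat{W}, \hat{C}) \;=\; \arg\max_{W, C} P(A, W, C)
\;=\; \arg\max_{W, C}\,
\underbrace{P(A \mid W)}_{\text{acoustic}}\,
\underbrace{P(W \mid C)}_{\text{syntax}}\,
\underbrace{P(C)}_{\text{semantics}}
\]
```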
Motivation • Finite State Transducers (FST) • Unified formalism to represent symbolic transductions • Learnability • Automatically train transductions from (parallel) corpora • Speech-to-Speech Machine Translation chain (source spoken language → target spoken language) • Combining speech and language sciences June 6, 2001
Overview • Motivation • Stochastic Finite State Machines • Learning Machine Translation Models • Case study • MT for Human-Machine Spoken Dialog • Experiments and Results June 6, 2001
Finite State Transducers • Weighted Finite State Transducers (FST) • Algebraic operations (T1 + T2, T1 ∘ T2, …) • Composable: E-S = E-J ∘ J-S • Minimization: min(T1) • Stochastic transductions (E-S : E* × S* → [0, 1]) • Joint probability decomposition: P(X1, X2, …, XN) ≈ P(X1) P(X2|X1) … P(XN|XN-1) • …and computation: T1 ∘ T2 ∘ … ∘ TN June 6, 2001
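To make the composition bullet concrete (E-S = E-J ∘ J-S), here is a minimal sketch of composing two weighted word-level relations. The dictionaries, entries, and probabilities are invented for illustration; a real system would use weighted FSTs over full strings rather than single symbols.

```python
from collections import defaultdict

def compose(r1, r2):
    """Compose two weighted relations given as {(x, z): p} and {(z, y): p}.

    Probabilities of matching intermediate symbols multiply, and multiple
    paths to the same (x, y) pair are summed, which is the same algebra a
    weighted FST composition uses, restricted here to single symbols.
    """
    out = defaultdict(float)
    for (x, z1), p1 in r1.items():
        for (z2, y), p2 in r2.items():
            if z1 == z2:
                out[(x, y)] += p1 * p2
    return dict(out)

# Toy English->Japanese and Japanese->Spanish relations (illustrative only).
e_j = {("phone", "電話"): 0.9, ("phone", "電話機"): 0.1}
j_s = {("電話", "teléfono"): 1.0, ("電話機", "teléfono"): 1.0}

e_s = compose(e_j, j_s)
print(e_s)  # {('phone', 'teléfono'): 1.0}
```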
Stochastic Transducers • (Figure: two small example weighted machines; the second has arcs cool,&lt;adv&gt;/0.3, cool,&lt;adj&gt;/0.3, cool,&lt;noun&gt;/0.2 and cool,&lt;verb&gt;/0.2 from state 1 to state 2) June 6, 2001
Learning FSTs • Data-driven learning of FSTs from large corpora • Learnability • Finite history context (N-gram) • Generalization • Unseen event modeling (back-off) • Class-based n-grams • Phrase grammar (long-distance dependency) • Context-free grammar approximation • Large-scale transductions • Efficient state and transition function model • Variable Ngram Stochastic Machine (VNSM) June 6, 2001
VNSM: the state and transition space • Bottom-up approach • Each state is associated with an n-tuple in the corpus • Each transition is associated with adjacent strings • Parametrization: #states ≈ #n-tuples in the corpus; #transitions ≈ #n-tuples in the corpus; #ε-transitions ≈ #(n-1)-tuples in the corpus • VNSM recognizes W ∈ V* (V is the dictionary) June 6, 2001
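A rough sketch of the bottom-up parametrization described on this slide: states correspond to the word histories (tuples of up to n-1 words) observed in a corpus, and transitions link each history to the word that follows it. The corpus and names are invented for illustration and the sketch only counts the state space, it does not build the automaton.

```python
def ngram_state_space(corpus, n=3):
    """Collect histories and (history -> next word) transitions seen in a corpus.

    len(states) mirrors the slide's '#states ~ #n-tuples in the corpus',
    len(transitions) the corresponding transition count.
    """
    states, transitions = set(), set()
    for sentence in corpus:
        words = sentence.split()
        for i, w in enumerate(words):
            history = tuple(words[max(0, i - (n - 1)):i])
            states.add(history)
            transitions.add((history, w))
    return states, transitions

corpus = ["I want to make a call", "I want to make a collect call"]
states, transitions = ngram_state_space(corpus, n=3)
print(len(states), len(transitions))
```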
VNSM: Unseen Event Modeling (the power of amnesia/reminiscence) • (Figure: predicting w4 by backing off from History = "w2, w3" to History = "w3" to History = "") June 6, 2001
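A minimal sketch of the back-off ("amnesia") idea in this figure: when the full history w2 w3 was never seen with the predicted word, drop the oldest word and try the shorter history, down to the empty history. Counts and the lack of discounting are simplifications; a real VNSM uses properly weighted back-off (epsilon) transitions.

```python
from collections import defaultdict

class BackoffLM:
    def __init__(self, corpus, n=3):
        # counts[history][word] for histories of length 0 .. n-1
        self.counts = defaultdict(lambda: defaultdict(int))
        for sentence in corpus:
            words = sentence.split()
            for i, w in enumerate(words):
                for k in range(min(n, i + 1)):
                    hist = tuple(words[i - k:i])
                    self.counts[hist][w] += 1

    def prob(self, history, word):
        """Back off from the longest history to shorter ones until the word was seen."""
        history = tuple(history)
        while True:
            seen = self.counts.get(history)
            if seen and word in seen:
                return seen[word] / sum(seen.values())
            if not history:            # empty history and still unseen
                return 0.0
            history = history[1:]      # forget the oldest word (amnesia)

lm = BackoffLM(["w1 w2 w3 w4", "w0 w3 w4"], n=3)
print(lm.prob(("w2", "w3"), "w4"))   # full history seen
print(lm.prob(("w9", "w3"), "w4"))   # backs off to history ('w3',)
```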
VNSM: probability distributions • Probability distribution over W ∈ V* • Parameter tying • Probability training June 6, 2001
Overview • Motivation • Stochastic Finite State Machines • Learning Machine Translation Models • Case study • MT for Human-Machine Spoken Dialog • Experiments and Results June 6, 2001
Stochastic FST Machine Translation • Decompose language translation into two independent processes: • Lexical Choice: searching for the target language words • Word Reordering: searching for the correct word order • Model the two processes as stochastic finite-state transductions • Learn the transductions from bilingual corpora • (Figure: speech and language finite-state transduction chain, source spoken language → target spoken language) June 6, 2001
Stochastic Machine Translation (MT) • Noisy-channel paradigm (IBM) • Stochastic Finite State Transducer model June 6, 2001
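The equations on this slide were lost in extraction; below is a standard way to write the two formulations the bullets name (my reconstruction, not the slide's exact notation): the IBM noisy-channel model, and a joint model realized as a stochastic finite-state transducer, with the joint probability approximated by an n-gram model over bilanguage tokens as described on the later slides.

```latex
\[
\text{Noisy channel:}\qquad
\hat{T} \;=\; \arg\max_{T} P(T \mid S)
        \;=\; \arg\max_{T} P(S \mid T)\, P(T)
\]
\[
\text{Stochastic FST model:}\qquad
\hat{T} \;=\; \arg\max_{T} P(S, T), \qquad
P(S, T) \;\approx\; \prod_{i} P\big((s,t)_i \,\big|\, (s,t)_{i-1}\big)
\]
```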
Learning Stochastic Transducers • Given the input-output pair training set • Align the input and output language sequences: • Estimate the joint probability via VNSMs • Local reordering • Sentence-level reordering June 6, 2001
Pairing and Aligning (1) • Source-target language pairs • Sentence alignment • Automatic algorithm (Alshawi, Bangalore and Douglas, 1998) • Spanish: ajá quiero usar mi tarjeta de crédito • English: yeah I wanna use my credit card • Alignment: 1 3 4 5 7 0 6 June 6, 2001
Learning SFST from Bi-language • Bi-language: each token consists of a source language word with its target language word. • Ordering of tokens: source language order or target language order • ajá quiero usar mi tarjeta de crédito • yeah I wanna use my credit card • (ajá,yeah) (e,I) (quiero,wanna) (usar,use) (mi,my) (tarjeta,card) (de, e) (crédito,credit) June 6, 2001
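A sketch of how the bilanguage tokens above can be derived from the sentence pair and alignment on the previous slide. The rule for unaligned target words (emitting them as (e, word) just before the next aligned pair) is my reading of this example, not a statement of the authors' exact algorithm.

```python
def to_bilanguage(source, target, alignment):
    """Pair source words with their aligned target words, in source order.

    alignment[i] is the 1-based target position for source word i, or 0 if
    the source word has no target counterpart.  Target words that no source
    word points to are emitted as ('e', word) before the next aligned pair.
    """
    covered = set(a for a in alignment if a > 0)
    emitted = set()
    tokens = []
    for s_word, a in zip(source, alignment):
        # Emit any not-yet-emitted unaligned target words that come earlier.
        for j in range(1, a):
            if j not in covered and j not in emitted:
                tokens.append(("e", target[j - 1]))
                emitted.add(j)
        tokens.append((s_word, target[a - 1] if a > 0 else "e"))
    return tokens

source = "ajá quiero usar mi tarjeta de crédito".split()
target = "yeah I wanna use my credit card".split()
alignment = [1, 3, 4, 5, 7, 0, 6]
print(to_bilanguage(source, target, alignment))
# [('ajá', 'yeah'), ('e', 'I'), ('quiero', 'wanna'), ('usar', 'use'),
#  ('mi', 'my'), ('tarjeta', 'card'), ('de', 'e'), ('crédito', 'credit')]
```

On this example the sketch reproduces exactly the bilanguage token sequence shown on the slide.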
Learning Bilingual Phrases • Effective translation of text chunks (e.g. collocations) • Learn bilingual phrases • Joint entropy minimization on the bi-language corpus • Weighted mutual information to rank bilingual phrases • Phrase-based VNST • Local reordering of phrases • Example: una llamada de larga distancia → (VNST) a call long distance → (local reordering) a long distance call June 6, 2001
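An illustrative sketch of ranking adjacent bilanguage tokens as phrase candidates. The slide names weighted mutual information as the ranking criterion; the formula used below, P(x,y) log [P(x,y) / (P(x) P(y))], is one common definition and stands in as an assumption about the exact statistic, as does the toy corpus.

```python
import math
from collections import Counter

def ranked_bigram_phrases(token_sequences):
    """Rank adjacent token pairs by weighted (pointwise) mutual information.

    token_sequences: bilanguage sentences, each a list of tokens such as
    'llamada_call'.  Higher-scoring pairs are better phrase candidates.
    """
    unigrams, bigrams, total = Counter(), Counter(), 0
    for tokens in token_sequences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
        total += len(tokens)

    slots = total - len(token_sequences)          # number of bigram positions
    scores = {}
    for (x, y), c in bigrams.items():
        p_xy = c / slots
        p_x, p_y = unigrams[x] / total, unigrams[y] / total
        scores[(x, y)] = p_xy * math.log(p_xy / (p_x * p_y))
    return sorted(scores.items(), key=lambda kv: -kv[1])

corpus = [["una_a", "llamada_call", "de_e", "larga_long", "distancia_distance"],
          ["una_a", "llamada_call", "por_by", "cobrar_collect"]]
print(ranked_bigram_phrases(corpus)[:3])
```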
Local Reordering • Reordered phrase = min(S ∘ TLM) • A full word-permutation machine is expensive • S is the “sausage” machine • TLM is the target language model June 6, 2001
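A brute-force stand-in for the min(S ∘ TLM) composition: instead of building the "sausage" lattice as an FST and composing it with the target language model, this sketch enumerates permutations of a short phrase and scores them with a toy bigram model. That is only feasible for the few-word phrases local reordering applies to, and the probabilities are invented for illustration.

```python
import itertools

def best_local_order(words, bigram_logprob):
    """Pick the permutation of `words` with the highest bigram score.

    bigram_logprob(prev, word) returns a log-probability; unknown bigrams
    get a large penalty.  This imitates composing the permutation lattice
    with a target language model and taking the best path.
    """
    def score(seq):
        return sum(bigram_logprob(p, w) for p, w in zip(("<s>",) + seq, seq))
    return max(itertools.permutations(words), key=score)

# Toy target-LM scores (assumed values, for illustration only).
table = {("<s>", "a"): -0.5, ("a", "long"): -0.7,
         ("long", "distance"): -0.3, ("distance", "call"): -0.4}

def bigram_logprob(prev, word):
    return table.get((prev, word), -5.0)

print(best_local_order(("a", "call", "long", "distance"), bigram_logprob))
# ('a', 'long', 'distance', 'call')
```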
Lexical Reordering • Output of the lexical choice transducer: a sequence of target language phrases • Example: like to make | I'd | call | a calling card | please • Words within each phrase are already in target language word order • However, the phrases themselves must be reordered into target language order • Reordered: I'd like to make a calling card call please June 6, 2001
Lexical Reordering Models • Tree-based model • Impose a tree structure on a sentence (Alshawi et al., ACL 1998) • English: I'd like to charge this to my home phone June 6, 2001
Lexical Reordering Models • Reordering using tree-local reordering rules • (Figure: dependency trees before and after reordering, with nodes 私は (I), したいのです (like), チャージ (charge), これを (this), 私の (my), 家の (home), 電話に (phone)) • Eng-Jap order: 私は したいのです チャージ これを 私の 家の 電話に • Japanese order: 私は これを 私の 家の 電話に チャージ したいのです June 6, 2001
Lexical Reordering Models (contd.) • Dependency tree represented as a bracketed string (bounded) with reordering instructions: e:[ したいのです:したいのです e:-1 e:[ チャージ:チャージ e:] e:] • Training VNSTs from the bracketed corpus • Output of the lexical reordering VNST: strings with reordering instructions • Instructions are composed with an “interpreter” FST to form the target language sentence June 6, 2001
Tree Reordering • Sentence-level reordering • Mapping sentence tree structures • Example: English: my card credit (Spanish order) → English: my credit card (English order) • (Figure: three transductions over the tree, with reordering offsets such as "card -1 +1 my credit" and "card -2 -1 my credit", the latter learned from alignment statistics) June 6, 2001
ASR-based Speech Translation • (Block diagram: Acoustic Model Training, Alignment, VNST Learning, Lexicon FSM, Bi-Phrase Learning, Speech Recognizer, Tree Reordering) June 6, 2001
Overview • Motivation • Stochastic Finite State Machines • Learning Machine Translation Models • Case study • MT for Human-Machine Spoken Dialog • Experiments and Results June 6, 2001
MT Evaluation • Lexical Accuracy (LA) • Bag of words. • Translation Accuracy (TA) • Based on string alignment • Application-driven evaluation • “How May I Help You?” • Spoken dialog for call routing • Classification based on salient phrase detection June 6, 2001
Automated Services and Customer Care via Natural Spoken Dialog • Prompt is “AT&T. How may I help you?” • User responds with unconstrained fluent speech • Spoken dialog system for call routing • (Figure: call types such as DA, rate, area code, billing credit, help, …) June 6, 2001
Examples • Yes I like to make this long distance call area code x x x x x x x x x x • Yeah I need the area code for rockmart georgia • Yeah I’m wondering if you could place this call for me I can’t seem to dial it it don’t seem to want to go through for me June 6, 2001
Call-Classification Performance • False Rejection Rate: probability of rejecting a call, given that the call-type is one of the 14 call-type set • Probability Correct: probability of correctly classifying a call, given that the call is not rejected June 6, 2001
MT evaluation on HMIHY June 6, 2001
DEMO June 6, 2001
Conclusion • The stochastic finite-state approach is viable and effective for limited-domain MT • Finite-state model chain for complex speech and language constraints • Multilingual speech applications enabled by MT • Coupling of ASR and MT http://www.research.att.com/~srini/Projects/Anuvaad/home.html June 6, 2001
Biblio • J. Berstel, "Transductions and Context-Free Languages", Teubner Studienbücher. • G. Riccardi, R. Pieraccini and E. Bocchieri, "Stochastic Automata for Language Modeling", Computer Speech and Language, 10, pp. 265-293, 1996. • F. C. N. Pereira and M. Riley, "Speech Recognition by Composition of Weighted Finite Automata", in Finite-State Language Processing, MIT Press, Cambridge, Massachusetts, 1997. • S. Bangalore and G. Riccardi, "Stochastic Finite-State Models for Spoken Language Machine Translation", Workshop on Embedded Machine Translation Systems, NAACL, pp. 52-59, Seattle, May 2000. • More references at http://www.research.att.com/info/dsp3 June 6, 2001
Stochastic Finite State Models: from concepts to speech (1993-1994) • Variable Ngram Stochastic Automata (VNSA) • Concept modeling for NLU • Word sequence modeling for ASR • Phonotactic transducers (context-to-phone) • Tree-structured transducers (phone-to-word) • Stochastic-FSM based ASR (context-to-concept) • ATIS Evaluation (1994): it actually worked! June 6, 2001
Why did it work? • Symbolic representation (SFSM) for probabilistic sequence modeling (words, concepts, …) • Learning algorithms • Cascade (phrase grammar → {phrases, word classes} → words) • Machine combination • Context-to-Phone, Phone-to-Word, Context-to-Grammar (CLG) • Decoding is very simple and fast (Viterbi and beam-width search) June 6, 2001
Multilingual Speech Processing • (Diagram: "tarjeta de credito" → ASR-MT Engine → "credit card") • The finite-state chain allows for: • Speech and language coupling (e.g. prosody, recognition errors) • Integrated multilingual processing June 6, 2001
Speech Translation • Previous approaches to Speech Translation • Source language ASR • Translation Model • Finite-state Model based Speech Translation • Source Language Acoustic Model • Lexical Choice Model • Lexical Reordering Model June 6, 2001
Learning the state space and state transition function (revised) • For each suffix in the corpus, we create two states (one for string recognition and the other for back-off, via an epsilon transition) • The size of the automaton is still linear in the corpus size • The stochastic automaton can compute a probability for any string in X* June 6, 2001
Stochastic Finite State Automata/Transducers: Word Prediction • Example: "… the President of United ??? …" • (Figure: candidate next words with state transition probabilities, e.g. States/p1, Airlines/p2, …/p3, conditioned on History(4) = "the president of United", PrevClass = Adj, PrevPrevClass = Function Word, Trigger(10) = "elections") June 6, 2001
Learning Lexical Choice Models • English utterances recorded from customer calls • Manually translated into Japanese/Spanish • “Bunsetsu”-like tokenization for Japanese • Alignment: English: I'd like to charge this to my home phone Japanese: 私は これを 私の 家の 電話に チャージ したいのです Alignment: 1 7 0 6 2 0 3 4 5 • Bilanguage: I'd_私は like_したいのです to_e charge_チャージ this_これを to_e my_私の home_家の phone_電話に June 6, 2001
Learning Bilingual Stochastic Transducers • Learn stochastic transducers from the bilanguage (Embedded MT 2000) • Automatically learn bilingual phrases • Reordering within phrases • Examples: エイ ティー アンド ティー / A T and T; 私の 家の 電話に / to my home phone; 私は コレクト コールを かける 必要があります / I need to make a collect call; tarjeta de credito / credit card; una llamada de larga distancia / a long distance call June 6, 2001
Lexical Choice Transducer • Language model: N-gram model built on the phrase-chunked bilanguage • A combination of phrases and words maximizes predictive power and minimizes the number of parameters • The resulting finite-state automaton over the bilanguage vocabulary is converted into a finite-state transducer June 6, 2001
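To illustrate what the conversion to a transducer buys: once decoding yields a sequence of bilanguage tokens, the target side can simply be read off. The sketch below assumes tokens are (source, target) pairs as in the earlier bilanguage sketch, with 'e' as the empty (epsilon) side; this is an illustration, not the authors' decoder.

```python
def target_side(bilanguage_tokens):
    """Read the target translation off a decoded bilanguage token sequence.

    Each token is a (source_phrase, target_phrase) pair; 'e' marks an empty
    (epsilon) side and is dropped.  The result is the lexical-choice output,
    still in source phrase order; reordering happens in the next step.
    """
    return " ".join(t for _, t in bilanguage_tokens if t != "e")

tokens = [("ajá", "yeah"), ("e", "I"), ("quiero", "wanna"), ("usar", "use"),
          ("mi", "my"), ("tarjeta", "card"), ("de", "e"), ("crédito", "credit")]
print(target_side(tokens))  # yeah I wanna use my card credit
```

Note that the output ends in "card credit", target words in source order, which is exactly what the local and lexical reordering steps then fix to "credit card".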
Lexical Reordering • Output of the lexical choice transducer: a sequence of target language phrases • Example: like to make | I'd | call | a calling card | please • Words within each phrase are already in target language word order • However, the phrases themselves must be reordered into target language order • Reordered: I'd like to make a calling card call please June 6, 2001
Lexical Reordering Models • Tree-based model • Impose a tree structure on a sentence (Alshawi et al., ACL 1998) • English: I'd like to charge this to my home phone June 6, 2001
Lexical Reordering Models • Reordering using tree-local reordering rules • (Figure: dependency trees before and after reordering, with nodes 私は (I), したいのです (like), チャージ (charge), これを (this), 私の (my), 家の (home), 電話に (phone)) • Eng-Jap order: 私は したいのです チャージ これを 私の 家の 電話に • Japanese order: 私は これを 私の 家の 電話に チャージ したいのです June 6, 2001
Lexical Reordering Models (contd.) • Dependency tree represented as a bracketed string with reordering instructions: e:[ したいのです:したいのです e:-1 e:[ チャージ:チャージ e:] e:] • Lexical reordering FST: the result of training a stochastic finite-state transducer on the corpus of bracketed strings • Output of the lexical reordering FST: strings with reordering instructions • Instructions are interpreted to form the target language sentence June 6, 2001
Translation using stochastic FSTs • Sequence of finite-state transductions • English: I'd like to charge this to my home phone • Eng-Jap: 私は したいのです チャージ これを 私の 家の 電話に • Japanese: 私は これを 私の 家の 電話に チャージ したいのです • (Figure: dependency trees before and after reordering) June 6, 2001
Spoken Language Corpora • Prompt: How may I help you? • Examples: • Yeah I need the area code for rockmart georgia • Yes I'd like to make this long distance call area code x x x x x x x x x x • Yeah I'm wondering if you could place this call for me I can't seem to dial it it don't seem to want to go through for me • Parallel corpora: English, Japanese, Spanish June 6, 2001
Evaluation Metric • Evaluation metric for MT is a complex issue • String edit distance between reference string and result string (length in words: R) • Insertions (I), Deletions (D), Substitutions (S) • Moves = pairs of deletions and insertions (M) • Remaining insertions (I') and deletions (D') • Simple String Accuracy = 1 – (I + D + S) / R • Generation String Accuracy = 1 – (M + I' + D' + S) / R June 6, 2001
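A sketch of the Simple String Accuracy computation: insertions, deletions, and substitutions come from a standard Levenshtein alignment between reference and hypothesis (unit costs assumed). The move-based Generation String Accuracy needs an extra pass that pairs off deletions with insertions of the same word and is not shown here.

```python
def edit_ops(ref, hyp):
    """Count insertions, deletions, substitutions via Levenshtein DP (unit costs)."""
    m, n = len(ref), len(hyp)
    # dp[i][j] = (cost, ins, dele, sub) for ref[:i] vs hyp[:j]
    dp = [[(0, 0, 0, 0)] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = (i, 0, i, 0)
    for j in range(1, n + 1):
        dp[0][j] = (j, j, 0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if ref[i - 1] == hyp[j - 1]:
                match = dp[i - 1][j - 1]
            else:
                c, I, D, S = dp[i - 1][j - 1]
                match = (c + 1, I, D, S + 1)
            c, I, D, S = dp[i][j - 1]
            ins = (c + 1, I + 1, D, S)
            c, I, D, S = dp[i - 1][j]
            dele = (c + 1, I, D + 1, S)
            dp[i][j] = min(match, ins, dele)
    _, I, D, S = dp[m][n]
    return I, D, S

def simple_string_accuracy(ref, hyp):
    """Simple String Accuracy = 1 - (I + D + S) / R, with R the reference length."""
    I, D, S = edit_ops(ref.split(), hyp.split())
    return 1 - (I + D + S) / len(ref.split())

print(simple_string_accuracy("I'd like to make a calling card call please",
                             "like to make I'd call a calling card please"))
```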