A Finite-State Approach to Machine Translation Srinivas Bangalore Giuseppe Riccardi AT&T Labs-Research NAACL 2001, Pittsburgh June 6, 2001
Overview • Motivation • Stochastic Finite State Machines • Learning Machine Translation Models • Case study • MT for Human-Machine Spoken Dialog • Experiments and Results June 6, 2001
Speech Understanding Case • (Figure: speech recognizer cascade with a semantic model C scored by P(C), a syntactic model L scored by P(W|C), and an acoustic model A scored by P(A|W); decoding maximizes P(A, W, C) over the minimized composition of A, L and C) • ATIS 1994 DARPA Evaluation (G. Riccardi et al., ICASSP 1995; E. Bocchieri et al., SLT Workshop 1995; Levin et al., SLT Workshop 1995) June 6, 2001
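The equation fragments on this slide suggest the usual decomposition of speech understanding; the rendering below is my reconstruction from the recoverable labels (semantics P(C), syntax P(W|C), acoustics P(A|W)), not the slide's exact notation.

```latex
\[
(\hat{W}, \hat{C}) \;=\; \arg\max_{W, C} P(A, W, C)
\;=\; \arg\max_{W, C}\,
\underbrace{P(A \mid W)}_{\text{acoustic}}\,
\underbrace{P(W \mid C)}_{\text{syntax}}\,
\underbrace{P(C)}_{\text{semantics}}
\]
```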
Motivation • Finite State Transducers (FST) • Unified formalism to represent symbolic transductions • Learnability • Automatically train transductions from (parallel) corpora • Speech-to-Speech Machine Translation chain (source spoken language → target spoken language) • Combining speech and language sciences June 6, 2001
Overview • Motivation • Stochastic Finite State Machines • Learning Machine Translation Models • Case study • MT for Human-Machine Spoken Dialog • Experiments and Results June 6, 2001
Finite State Transducers • Weighted Finite State Transducers (FST) • Algebraic operations (T1 + T2, T1 ∘ T2, …) • Composable: E-S = E-J ∘ J-S • Minimization: min(T1) • Stochastic transductions (E-S : E* × S* → [0, 1]) • Joint probability decomposition: P(X1, X2, …, XN) ≈ P(X1) P(X2|X1) … P(XN|XN-1) • …and computation: T1 ∘ T2 ∘ … ∘ TN June 6, 2001
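To make the composition bullet concrete (E-S = E-J ∘ J-S), here is a minimal sketch of composing two weighted word-level relations. The dictionaries, entries, and probabilities are invented for illustration; a real system would use weighted FSTs over full strings rather than single symbols.

```python
from collections import defaultdict

def compose(r1, r2):
    """Compose two weighted relations given as {(x, z): p} and {(z, y): p}.

    Probabilities of matching intermediate symbols multiply, and multiple
    paths to the same (x, y) pair are summed, which is the same algebra a
    weighted FST composition uses, restricted here to single symbols.
    """
    out = defaultdict(float)
    for (x, z1), p1 in r1.items():
        for (z2, y), p2 in r2.items():
            if z1 == z2:
                out[(x, y)] += p1 * p2
    return dict(out)

# Toy English->Japanese and Japanese->Spanish relations (illustrative only).
e_j = {("phone", "電話"): 0.9, ("phone", "電話機"): 0.1}
j_s = {("電話", "teléfono"): 1.0, ("電話機", "teléfono"): 1.0}

e_s = compose(e_j, j_s)
print(e_s)  # {('phone', 'teléfono'): 1.0}
```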
Stochastic Transducers • (Figure: two small example weighted machines; the second has arcs cool,&lt;adv&gt;/0.3, cool,&lt;adj&gt;/0.3, cool,&lt;noun&gt;/0.2 and cool,&lt;verb&gt;/0.2 from state 1 to state 2) June 6, 2001
Learning FSTs • Data-driven learning of FSTs from large corpora • Learnability • Finite history context (N-gram) • Generalization • Unseen event modeling (back-off) • Class-based n-grams • Phrase grammar (long-distance dependency) • Context-free grammar approximation • Large-scale transductions • Efficient state and transition function model • Variable Ngram Stochastic Machine (VNSM) June 6, 2001
VNSM: the state and transition space • Bottom-up approach • Each state is associated with an n-tuple in the corpus • Each transition is associated with adjacent strings • Parametrization: #states ≈ #n-tuples in the corpus; #transitions ≈ #n-tuples in the corpus; #ε-transitions ≈ #(n-1)-tuples in the corpus • VNSM recognizes W ∈ V* (V is the dictionary) June 6, 2001
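A rough sketch of the bottom-up parametrization described on this slide: states correspond to the word histories (tuples of up to n-1 words) observed in a corpus, and transitions link each history to the word that follows it. The corpus and names are invented for illustration and the sketch only counts the state space, it does not build the automaton.

```python
def ngram_state_space(corpus, n=3):
    """Collect histories and (history -> next word) transitions seen in a corpus.

    len(states) mirrors the slide's '#states ~ #n-tuples in the corpus',
    len(transitions) the corresponding transition count.
    """
    states, transitions = set(), set()
    for sentence in corpus:
        words = sentence.split()
        for i, w in enumerate(words):
            history = tuple(words[max(0, i - (n - 1)):i])
            states.add(history)
            transitions.add((history, w))
    return states, transitions

corpus = ["I want to make a call", "I want to make a collect call"]
states, transitions = ngram_state_space(corpus, n=3)
print(len(states), len(transitions))
```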
VNSM: Unseen Event Modeling (the power of amnesia/reminiscence) • (Figure: predicting w4 by backing off from History = "w2, w3" to History = "w3" to History = "") June 6, 2001
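A minimal sketch of the back-off ("amnesia") idea in this figure: when the full history w2 w3 was never seen with the predicted word, drop the oldest word and try the shorter history, down to the empty history. Counts and the lack of discounting are simplifications; a real VNSM uses properly weighted back-off (epsilon) transitions.

```python
from collections import defaultdict

class BackoffLM:
    def __init__(self, corpus, n=3):
        # counts[history][word] for histories of length 0 .. n-1
        self.counts = defaultdict(lambda: defaultdict(int))
        for sentence in corpus:
            words = sentence.split()
            for i, w in enumerate(words):
                for k in range(min(n, i + 1)):
                    hist = tuple(words[i - k:i])
                    self.counts[hist][w] += 1

    def prob(self, history, word):
        """Back off from the longest history to shorter ones until the word was seen."""
        history = tuple(history)
        while True:
            seen = self.counts.get(history)
            if seen and word in seen:
                return seen[word] / sum(seen.values())
            if not history:            # empty history and still unseen
                return 0.0
            history = history[1:]      # forget the oldest word (amnesia)

lm = BackoffLM(["w1 w2 w3 w4", "w0 w3 w4"], n=3)
print(lm.prob(("w2", "w3"), "w4"))   # full history seen
print(lm.prob(("w9", "w3"), "w4"))   # backs off to history ('w3',)
```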
VNSM: probability distributions • Probability distribution over W ∈ V* • Parameter tying • Probability training June 6, 2001
Overview • Motivation • Stochastic Finite State Machines • Learning Machine Translation Models • Case study • MT for Human-Machine Spoken Dialog • Experiments and Results June 6, 2001
Stochastic FST Machine Translation • Decompose language translation into two independent processes: • Lexical Choice: searching for the target language words • Word Reordering: searching for the correct word order • Model the two processes as stochastic finite-state transductions • Learn the transductions from bilingual corpora • (Figure: speech and language finite-state transduction chain, source spoken language → target spoken language) June 6, 2001
Stochastic Machine Translation (MT) • Noisy-channel paradigm (IBM) • Stochastic Finite State Transducer model June 6, 2001
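The equations on this slide were lost in extraction; below is a standard way to write the two formulations the bullets name (my reconstruction, not the slide's exact notation): the IBM noisy-channel model, and a joint model realized as a stochastic finite-state transducer, with the joint probability approximated by an n-gram model over bilanguage tokens as described on the later slides.

```latex
\[
\text{Noisy channel:}\qquad
\hat{T} \;=\; \arg\max_{T} P(T \mid S)
        \;=\; \arg\max_{T} P(S \mid T)\, P(T)
\]
\[
\text{Stochastic FST model:}\qquad
\hat{T} \;=\; \arg\max_{T} P(S, T), \qquad
P(S, T) \;\approx\; \prod_{i} P\big((s,t)_i \,\big|\, (s,t)_{i-1}\big)
\]
```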
Learning Stochastic Transducers • Given the input-output pair training set • Align the input and output language sequences: • Estimate the joint probability via VNSMs • Local reordering • Sentence-level reordering June 6, 2001
Pairing and Aligning (1) • Source-target language pairs • Sentence alignment • Automatic algorithm (Alshawi, Bangalore and Douglas, 1998) • Spanish: ajá quiero usar mi tarjeta de crédito • English: yeah I wanna use my credit card • Alignment: 1 3 4 5 7 0 6 June 6, 2001
Learning SFST from Bi-language • Bi-language: each token consists of a source language word with its target language word. • Ordering of tokens: source language order or target language order • ajá quiero usar mi tarjeta de crédito • yeah I wanna use my credit card • (ajá,yeah) (e,I) (quiero,wanna) (usar,use) (mi,my) (tarjeta,card) (de, e) (crédito,credit) June 6, 2001
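A sketch of how the bilanguage tokens above can be derived from the sentence pair and alignment on the previous slide. The rule for unaligned target words (emitting them as (e, word) just before the next aligned pair) is my reading of this example, not a statement of the authors' exact algorithm.

```python
def to_bilanguage(source, target, alignment):
    """Pair source words with their aligned target words, in source order.

    alignment[i] is the 1-based target position for source word i, or 0 if
    the source word has no target counterpart.  Target words that no source
    word points to are emitted as ('e', word) before the next aligned pair.
    """
    covered = set(a for a in alignment if a > 0)
    emitted = set()
    tokens = []
    for s_word, a in zip(source, alignment):
        # Emit any not-yet-emitted unaligned target words that come earlier.
        for j in range(1, a):
            if j not in covered and j not in emitted:
                tokens.append(("e", target[j - 1]))
                emitted.add(j)
        tokens.append((s_word, target[a - 1] if a > 0 else "e"))
    return tokens

source = "ajá quiero usar mi tarjeta de crédito".split()
target = "yeah I wanna use my credit card".split()
alignment = [1, 3, 4, 5, 7, 0, 6]
print(to_bilanguage(source, target, alignment))
# [('ajá', 'yeah'), ('e', 'I'), ('quiero', 'wanna'), ('usar', 'use'),
#  ('mi', 'my'), ('tarjeta', 'card'), ('de', 'e'), ('crédito', 'credit')]
```

On this example the sketch reproduces exactly the bilanguage token sequence shown on the slide.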
Learning Bilingual Phrases • Effective translation of text chunks (e.g. collocations) • Learn bilingual phrases • Joint entropy minimization on the bi-language corpus • Weighted mutual information to rank bilingual phrases • Phrase-based VNST • Local reordering of phrases • Example: una llamada de larga distancia → (VNST) a call long distance → (local reordering) a long distance call June 6, 2001
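An illustrative sketch of ranking adjacent bilanguage tokens as phrase candidates. The slide names weighted mutual information as the ranking criterion; the formula used below, P(x,y) log [P(x,y) / (P(x) P(y))], is one common definition and stands in as an assumption about the exact statistic, as does the toy corpus.

```python
import math
from collections import Counter

def ranked_bigram_phrases(token_sequences):
    """Rank adjacent token pairs by weighted (pointwise) mutual information.

    token_sequences: bilanguage sentences, each a list of tokens such as
    'llamada_call'.  Higher-scoring pairs are better phrase candidates.
    """
    unigrams, bigrams, total = Counter(), Counter(), 0
    for tokens in token_sequences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
        total += len(tokens)

    slots = total - len(token_sequences)          # number of bigram positions
    scores = {}
    for (x, y), c in bigrams.items():
        p_xy = c / slots
        p_x, p_y = unigrams[x] / total, unigrams[y] / total
        scores[(x, y)] = p_xy * math.log(p_xy / (p_x * p_y))
    return sorted(scores.items(), key=lambda kv: -kv[1])

corpus = [["una_a", "llamada_call", "de_e", "larga_long", "distancia_distance"],
          ["una_a", "llamada_call", "por_by", "cobrar_collect"]]
print(ranked_bigram_phrases(corpus)[:3])
```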
Local Reordering • Reordered phrase = min(S ∘ TLM) • A full word-permutation machine is expensive • S is the “sausage” machine • TLM is the target language model June 6, 2001
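A brute-force stand-in for the min(S ∘ TLM) composition: instead of building the "sausage" lattice as an FST and composing it with the target language model, this sketch enumerates permutations of a short phrase and scores them with a toy bigram model. That is only feasible for the few-word phrases local reordering applies to, and the probabilities are invented for illustration.

```python
import itertools

def best_local_order(words, bigram_logprob):
    """Pick the permutation of `words` with the highest bigram score.

    bigram_logprob(prev, word) returns a log-probability; unknown bigrams
    get a large penalty.  This imitates composing the permutation lattice
    with a target language model and taking the best path.
    """
    def score(seq):
        return sum(bigram_logprob(p, w) for p, w in zip(("<s>",) + seq, seq))
    return max(itertools.permutations(words), key=score)

# Toy target-LM scores (assumed values, for illustration only).
table = {("<s>", "a"): -0.5, ("a", "long"): -0.7,
         ("long", "distance"): -0.3, ("distance", "call"): -0.4}

def bigram_logprob(prev, word):
    return table.get((prev, word), -5.0)

print(best_local_order(("a", "call", "long", "distance"), bigram_logprob))
# ('a', 'long', 'distance', 'call')
```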
Lexical Reordering • Output of the lexical choice transducer: a sequence of target language phrases • Example: like to make | I'd | call | a calling card | please • Words within each phrase are already in target language word order • However, the phrases themselves must be reordered into target language order • Reordered: I'd like to make a calling card call please June 6, 2001
Lexical Reordering Models • Tree-based model • Impose a tree structure on a sentence (Alshawi et al., ACL 1998) • English: I'd like to charge this to my home phone June 6, 2001
Lexical Reordering Models • Reordering using tree-local reordering rules • (Figure: dependency trees before and after reordering, with nodes 私は (I), したいのです (like), チャージ (charge), これを (this), 私の (my), 家の (home), 電話に (phone)) • Eng-Jap order: 私は したいのです チャージ これを 私の 家の 電話に • Japanese order: 私は これを 私の 家の 電話に チャージ したいのです June 6, 2001
Lexical Reordering Models (contd.) • Dependency tree represented as a bracketed string (bounded) with reordering instructions: e:[ したいのです:したいのです e:-1 e:[ チャージ:チャージ e:] e:] • Training VNSTs from the bracketed corpus • Output of the lexical reordering VNST: strings with reordering instructions • Instructions are composed with an “interpreter” FST to form the target language sentence June 6, 2001
Tree Reordering • Sentence-level reordering • Mapping sentence tree structures • Example: English: my card credit (Spanish order) → English: my credit card (English order) • (Figure: three transductions over the tree, with reordering offsets such as "card -1 +1 my credit" and "card -2 -1 my credit", the latter learned from alignment statistics) June 6, 2001
ASR-based Speech Translation • (Block diagram: Acoustic Model Training, Alignment, VNST Learning, Lexicon FSM, Bi-Phrase Learning, Speech Recognizer, Tree Reordering) June 6, 2001
Overview • Motivation • Stochastic Finite State Machines • Learning Machine Translation Models • Case study • MT for Human-Machine Spoken Dialog • Experiments and Results June 6, 2001
MT Evaluation • Lexical Accuracy (LA) • Bag of words. • Translation Accuracy (TA) • Based on string alignment • Application-driven evaluation • “How May I Help You?” • Spoken dialog for call routing • Classification based on salient phrase detection June 6, 2001
Automated Services and Customer Care via Natural Spoken Dialog • Prompt is “AT&T. How may I help you?” • User responds with unconstrained fluent speech • Spoken dialog system for call routing • (Figure: call types such as DA, rate, area code, billing credit, help, …) June 6, 2001
Examples • Yes I like to make this long distance call area code x x x x x x x x x x • Yeah I need the area code for rockmart georgia • Yeah I’m wondering if you could place this call for me I can’t seem to dial it it don’t seem to want to go through for me June 6, 2001
Call-Classification Performance • False Rejection Rate: probability of rejecting a call, given that the call-type is one of the 14 call-type set • Probability Correct: probability of correctly classifying a call, given that the call is not rejected June 6, 2001
MT evaluation on HMIHY June 6, 2001
DEMO June 6, 2001
Conclusion • The stochastic finite-state approach is viable and effective for limited-domain MT • Finite-state model chain for complex speech and language constraints • Multilingual speech applications enabled by MT • Coupling of ASR and MT http://www.research.att.com/~srini/Projects/Anuvaad/home.html June 6, 2001
Biblio • J. Berstel, "Transductions and Context-Free Languages", Teubner Studienbücher. • G. Riccardi, R. Pieraccini and E. Bocchieri, "Stochastic Automata for Language Modeling", Computer Speech and Language, 10, pp. 265-293, 1996. • F. C. N. Pereira and M. Riley, "Speech Recognition by Composition of Weighted Finite Automata", in Finite-State Language Processing, MIT Press, Cambridge, Massachusetts, 1997. • S. Bangalore and G. Riccardi, "Stochastic Finite-State Models for Spoken Language Machine Translation", Workshop on Embedded Machine Translation Systems, NAACL, pp. 52-59, Seattle, May 2000. • More references at http://www.research.att.com/info/dsp3 June 6, 2001
Stochastic Finite State Models: from concepts to speech (1993-1994) • Variable Ngram Stochastic Automata (VNSA) • Concept modeling for NLU • Word sequence modeling for ASR • Phonotactic transducers (context-to-phone) • Tree-structured transducers (phone-to-word) • Stochastic-FSM based ASR (context-to-concept) • ATIS Evaluation (1994): it actually worked! June 6, 2001
Why did it work? • Symbolic representation (SFSM) for probabilistic sequence modeling (words, concepts, …) • Learning algorithms • Cascade (phrase grammar → {phrases, word classes} → words) • Machine combination • Context-to-Phone, Phone-to-Word, Context-to-Grammar (CLG) • Decoding is very simple and fast (Viterbi and beam-width search) June 6, 2001
Multilingual Speech Processing • (Diagram: "tarjeta de credito" → ASR-MT Engine → "credit card") • The finite-state chain allows for: • Speech and language coupling (e.g. prosody, recognition errors) • Integrated multilingual processing June 6, 2001
Speech Translation • Previous approaches to Speech Translation • Source language ASR • Translation Model • Finite-state Model based Speech Translation • Source Language Acoustic Model • Lexical Choice Model • Lexical Reordering Model June 6, 2001
Learning the state space and state transition function (revised) • For each suffix in the corpus, we create two states (one for string recognition and the other for back-off, via an epsilon transition) • The size of the automaton is still linear in the corpus size • The stochastic automaton can compute a probability for any string in X* June 6, 2001
Stochastic Finite State Automata/Transducers: Word Prediction • Example: "… the President of United ??? …" • (Figure: candidate next words with state transition probabilities, e.g. States/p1, Airlines/p2, …/p3, conditioned on History(4) = "the president of United", PrevClass = Adj, PrevPrevClass = Function Word, Trigger(10) = "elections") June 6, 2001
Learning Lexical Choice Models • English utterances recorded from customer calls • Manually translated into Japanese/Spanish • “Bunsetsu”-like tokenization for Japanese • Alignment: English: I'd like to charge this to my home phone Japanese: 私は これを 私の 家の 電話に チャージ したいのです Alignment: 1 7 0 6 2 0 3 4 5 • Bilanguage: I'd_私は like_したいのです to_e charge_チャージ this_これを to_e my_私の home_家の phone_電話に June 6, 2001
Learning Bilingual Stochastic Transducers • Learn stochastic transducers from the bilanguage (Embedded MT 2000) • Automatically learn bilingual phrases • Reordering within phrases • Examples: エイ ティー アンド ティー / A T and T; 私の 家の 電話に / to my home phone; 私は コレクト コールを かける 必要があります / I need to make a collect call; tarjeta de credito / credit card; una llamada de larga distancia / a long distance call June 6, 2001
Lexical Choice Transducer • Language model: N-gram model built on the phrase-chunked bilanguage • A combination of phrases and words maximizes predictive power and minimizes the number of parameters • The resulting finite-state automaton over the bilanguage vocabulary is converted into a finite-state transducer June 6, 2001
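To illustrate what the conversion to a transducer buys: once decoding yields a sequence of bilanguage tokens, the target side can simply be read off. The sketch below assumes tokens are (source, target) pairs as in the earlier bilanguage sketch, with 'e' as the empty (epsilon) side; this is an illustration, not the authors' decoder.

```python
def target_side(bilanguage_tokens):
    """Read the target translation off a decoded bilanguage token sequence.

    Each token is a (source_phrase, target_phrase) pair; 'e' marks an empty
    (epsilon) side and is dropped.  The result is the lexical-choice output,
    still in source phrase order; reordering happens in the next step.
    """
    return " ".join(t for _, t in bilanguage_tokens if t != "e")

tokens = [("ajá", "yeah"), ("e", "I"), ("quiero", "wanna"), ("usar", "use"),
          ("mi", "my"), ("tarjeta", "card"), ("de", "e"), ("crédito", "credit")]
print(target_side(tokens))  # yeah I wanna use my card credit
```

Note that the output ends in "card credit", target words in source order, which is exactly what the local and lexical reordering steps then fix to "credit card".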
Lexical Reordering • Output of the lexical choice transducer: a sequence of target language phrases • Example: like to make | I'd | call | a calling card | please • Words within each phrase are already in target language word order • However, the phrases themselves must be reordered into target language order • Reordered: I'd like to make a calling card call please June 6, 2001
Lexical Reordering Models • Tree-based model • Impose a tree structure on a sentence (Alshawi et al., ACL 1998) • English: I'd like to charge this to my home phone June 6, 2001
Lexical Reordering Models • Reordering using tree-local reordering rules • (Figure: dependency trees before and after reordering, with nodes 私は (I), したいのです (like), チャージ (charge), これを (this), 私の (my), 家の (home), 電話に (phone)) • Eng-Jap order: 私は したいのです チャージ これを 私の 家の 電話に • Japanese order: 私は これを 私の 家の 電話に チャージ したいのです June 6, 2001
Lexical Reordering Models (contd.) • Dependency tree represented as a bracketed string with reordering instructions: e:[ したいのです:したいのです e:-1 e:[ チャージ:チャージ e:] e:] • Lexical reordering FST: the result of training a stochastic finite-state transducer on the corpus of bracketed strings • Output of the lexical reordering FST: strings with reordering instructions • Instructions are interpreted to form the target language sentence June 6, 2001
Translation using stochastic FSTs • Sequence of finite-state transductions • English: I'd like to charge this to my home phone • Eng-Jap: 私は したいのです チャージ これを 私の 家の 電話に • Japanese: 私は これを 私の 家の 電話に チャージ したいのです • (Figure: dependency trees before and after reordering) June 6, 2001
Spoken Language Corpora • Prompt: How may I help you? • Examples: • Yeah I need the area code for rockmart georgia • Yes I'd like to make this long distance call area code x x x x x x x x x x • Yeah I'm wondering if you could place this call for me I can't seem to dial it it don't seem to want to go through for me • Parallel corpora: English, Japanese, Spanish June 6, 2001
Evaluation Metric • Evaluation metric for MT is a complex issue • String edit distance between reference string and result string (length in words: R) • Insertions (I), Deletions (D), Substitutions (S) • Moves = pairs of deletions and insertions (M) • Remaining insertions (I') and deletions (D') • Simple String Accuracy = 1 – (I + D + S) / R • Generation String Accuracy = 1 – (M + I' + D' + S) / R June 6, 2001
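A sketch of the Simple String Accuracy computation: insertions, deletions, and substitutions come from a standard Levenshtein alignment between reference and hypothesis (unit costs assumed). The move-based Generation String Accuracy needs an extra pass that pairs off deletions with insertions of the same word and is not shown here.

```python
def edit_ops(ref, hyp):
    """Count insertions, deletions, substitutions via Levenshtein DP (unit costs)."""
    m, n = len(ref), len(hyp)
    # dp[i][j] = (cost, ins, dele, sub) for ref[:i] vs hyp[:j]
    dp = [[(0, 0, 0, 0)] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = (i, 0, i, 0)
    for j in range(1, n + 1):
        dp[0][j] = (j, j, 0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if ref[i - 1] == hyp[j - 1]:
                match = dp[i - 1][j - 1]
            else:
                c, I, D, S = dp[i - 1][j - 1]
                match = (c + 1, I, D, S + 1)
            c, I, D, S = dp[i][j - 1]
            ins = (c + 1, I + 1, D, S)
            c, I, D, S = dp[i - 1][j]
            dele = (c + 1, I, D + 1, S)
            dp[i][j] = min(match, ins, dele)
    _, I, D, S = dp[m][n]
    return I, D, S

def simple_string_accuracy(ref, hyp):
    """Simple String Accuracy = 1 - (I + D + S) / R, with R the reference length."""
    I, D, S = edit_ops(ref.split(), hyp.split())
    return 1 - (I + D + S) / len(ref.split())

print(simple_string_accuracy("I'd like to make a calling card call please",
                             "like to make I'd call a calling card please"))
```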