
Stochastic Transductions for Machine Translation*




  1. Stochastic Transductions for Machine Translation* Giuseppe Riccardi AT&T Labs-Research *Joint work with Srinivas Bangalore and Enrico Bocchieri January 26, 2000

  2. Overview • Motivation • Stochastic Finite State Machines • Learning Machine Translation Models • Case study • MT for Human-Machine Spoken Dialog • Experiments and Results

  3. Speech Understanding Case • Decomposition: P(A, W, C) = P(A|W) · P(W|C) · P(C), with semantic model P(C), syntactic model P(W|C), and acoustic model P(A|W) • The speech recognizer searches max P(A, W, C) via composition of the acoustic, language, and concept machines • ATIS 1994 DARPA Evaluation (G. Riccardi et al., ICASSP 1995; E. Bocchieri et al., SLT Workshop 1995; Levin et al., SLT Workshop 1995)

  4. Motivation • Finite State Transducers (FST): a unified formalism to represent symbolic transductions • Learnability: automatically train transductions from (parallel) corpora • Speech-to-Speech Machine Translation chain: source spoken language → target spoken language • Combining speech and language sciences

  5. Overview • Motivation • Stochastic Finite State Machines • Learning Machine Translation Models • Case study • MT for Human-Machine Spoken Dialog • Experiments and Results

  6. Finite State Transducers • Weighted Finite State Transducers (FST) • Algebraic operations (τ1 + τ2, τ1 ∘ τ2, …) • Composable: τE-S = τE-J ∘ τJ-S • Minimization: min(τ1) • Stochastic transductions: τE-S : E* × S* → [0,1] • Joint probability decomposition: P(X1, X2, …, XN) ≈ P(X1) P(X2|X1) … P(XN|XN−1) • …and computation: τ1 ∘ τ2 ∘ … ∘ τN
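The composition identity above can be sketched with a toy, dict-based stand-in for weighted FST composition; the representation and names here are illustrative assumptions, not the AT&T FSM library API:

```python
def compose(t1, t2):
    """Weighted relational composition of two transductions, each stored
    as {input: {output: prob}}: probabilities multiply along the chain
    and sum over shared intermediate strings, mirroring
    tau_E-S = tau_E-J o tau_J-S on the slide."""
    composed = {}
    for x, mids in t1.items():
        acc = {}
        for y, p in mids.items():
            for z, q in t2.get(y, {}).items():
                acc[z] = acc.get(z, 0.0) + p * q
        if acc:
            composed[x] = acc
    return composed
```

For example, an English→Japanese map composed with a Japanese→Spanish map yields an English→Spanish map without ever materializing the intermediate language at run time.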

  7. Stochastic Transducers • [Figure: two example weighted transducers. The first maps inputs to outputs with weights along states 1–5 (e.g. I:ε, I:3, I:1, V/4). The second is a two-state machine over the ambiguous word “cool”, with arcs cool:<adv>/0.3, cool:<adj>/0.3, cool:<noun>/0.2, cool:<verb>/0.2]
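The two-state machine over the ambiguous word “cool” shown on this slide can be written down directly; a minimal sketch, assuming a simple dict encoding of weighted arcs:

```python
# A two-state weighted transducer like the slide's "cool" machine,
# stored as {state: {input: [(output, prob, next_state)]}}.
cool_fst = {1: {"cool": [("<adv>", 0.3, 2), ("<adj>", 0.3, 2),
                         ("<noun>", 0.2, 2), ("<verb>", 0.2, 2)]}}

def transduce(fst, state, symbol):
    """Return the (output, prob) pairs a state offers for one input symbol."""
    return [(out, p) for out, p, _ in fst.get(state, {}).get(symbol, [])]
```

The arc weights form a probability distribution over the possible part-of-speech outputs for the input word.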

  8. Learning FSTs • Data-driven: learning FSTs from large corpora • Learnability • Finite history context (N-gram) • Generalization • Unseen event modeling (back-off) • Class-based n-grams • Phrase grammar (long-distance dependency) • Context-free grammar approximation • Large-scale transductions • Efficient state and transition function model • Variable Ngram Stochastic Machine (VNSM)

  9. VNSM: the state and transition space • Bottom-up approach • Each state is associated with an n-tuple in the corpus • Each transition is associated with adjacent strings • Parametrization: #states ≈ #n-tuples in the corpus, #transitions ≈ #n-tuples in the corpus, #ε-transitions ≈ #(n−1)-tuples in the corpus • The VNSM recognizes any W ∈ V* (V is the dictionary)

  10. VNSM: Unseen Event Modeling (the power of amnesia/reminiscence) • [Figure: predicting w4 by backing off through successively shorter histories: History = w2, w3 → History = w3 → History = “”]
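The back-off chain on this slide (full history, then a shorter history, then the empty history) can be sketched as a recursive lookup; the probability table, back-off weights, and unseen-word floor below are illustrative assumptions, not trained values:

```python
def backoff_prob(word, history, ngram_p, bow):
    """Back-off probability lookup: try the full history first, then
    successively shorter ones, discounting by a back-off weight at each
    step. `ngram_p` maps (history_tuple, word) -> prob; `bow` maps a
    history tuple to its back-off weight."""
    p = ngram_p.get((history, word))
    if p is not None:
        return p
    if not history:
        return 1e-7                      # floor for truly unseen words
    # forget the oldest word and pay the back-off penalty
    return bow.get(history, 0.4) * backoff_prob(word, history[1:], ngram_p, bow)
```

A seen trigram is returned directly; an unseen one falls through to the bigram and unigram estimates, each step multiplied by the weight of the history being abandoned.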

  11. VNSM: probability distributions • Probability distribution over W ∈ V* • Parameter tying • Probability training

  12. Overview • Motivation • Stochastic Finite State Machines • Learning Machine Translation Models • Case study • MT for Human-Machine Spoken Dialog • Experiments and Results

  13. Stochastic FST Machine Translation • Decompose language translation into two independent processes: • Lexical choice: searching the target language words • Word reordering: searching the correct word order • Model the two processes as stochastic finite-state transductions • Learn the transductions from bilingual corpora • Speech and language finite-state transduction chain: source spoken language → target spoken language

  14. Stochastic Machine Translation • Noisy-channel paradigm (IBM) • Stochastic Finite State Transducer model

  15. Learning Stochastic Transducers • Given the input-output pair training set • Align the input and output language sequences: • Estimate the joint probability via VNSMs • Local reordering • Sentence-level reordering

  16. Pairing and Aligning (1) • Source-target language pairs • Sentence alignment • Automatic algorithm (Alshawi, Bangalore and Douglas, 1998) Spanish: ajá quiero usar mi tarjeta de crédito English: yeah I wanna use my credit card Alignment: 1 3 4 5 7 0 6

  17. Learning SFST from Bi-language • Bi-language: each token pairs a source language word with its target language word (e denotes the empty word). • Ordering of tokens: source language order or target language order • ajá quiero usar mi tarjeta de crédito • yeah I wanna use my credit card • (ajá,yeah) (e,I) (quiero,wanna) (usar,use) (mi,my) (tarjeta,card) (de,e) (crédito,credit)
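The construction of the bi-language from a source sentence, a target sentence, and an alignment (0 = unaligned) can be sketched as follows; the placement rule for epsilon-source tokens is an assumption chosen to reproduce the slide's example, not the published algorithm:

```python
def bilanguage(src, tgt, align):
    """Build source-order bi-language tokens from a word alignment.
    align[i] is the 1-based target index for src[i]; 0 means unaligned
    (the source word pairs with the empty word "e")."""
    aligned = set(align)
    out, emitted = [], set()
    for s, a in zip(src, align):
        if a == 0:
            out.append((s, "e"))          # source word with no translation
            continue
        # target words never aligned to any source word are inserted,
        # with epsilon source, before the first pair that passes them
        for j in range(1, a):
            if j not in aligned and j not in emitted:
                out.append(("e", tgt[j - 1]))
                emitted.add(j)
        out.append((s, tgt[a - 1]))
    return out
```

Running it on the slide's Spanish/English pair reproduces the token sequence shown above, including the (e,I) and (de,e) epsilon tokens.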

  18. Learning Bilingual Phrases • Effective translation of text chunks (e.g. collocations) • Learn bilingual phrases • Joint entropy minimization on the bi-language corpus • Weighted mutual information to rank bilingual phrases • Phrase-based VNST • Local reordering of phrases • Example: una llamada de larga distancia → (VNST) a call long distance → (local reordering) a long distance call
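The ranking criterion can be sketched as scoring adjacent token pairs by weighted (pointwise) mutual information. Tokens are shown as plain strings for brevity, whereas in the real system each token is a (source, target) pair; this is a simplified stand-in for the actual phrase-acquisition procedure:

```python
import math
from collections import Counter

def ranked_phrases(token_seqs):
    """Rank adjacent token pairs by weighted mutual information,
    P(x,y) * log(P(x,y) / (P(x) P(y))): pairs that co-occur far more
    often than chance, and often in absolute terms, rank highest."""
    uni, bi = Counter(), Counter()
    for seq in token_seqs:
        uni.update(seq)
        bi.update(zip(seq, seq[1:]))
    n_uni, n_bi = sum(uni.values()), sum(bi.values())
    scores = {}
    for (x, y), c in bi.items():
        pxy = c / n_bi
        px, py = uni[x] / n_uni, uni[y] / n_uni
        scores[(x, y)] = pxy * math.log(pxy / (px * py))
    return sorted(scores, key=scores.get, reverse=True)
```

Top-ranked pairs can be merged into single phrase tokens and the procedure iterated to grow longer phrases.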

  19. Local Reordering • Reordered phrase = min(S ∘ TLM) • A full word-permutation machine is expensive • S is the “sausage” machine over the phrase's words • TLM is the target language model
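For short phrases, composing the permutation “sausage” S with the target language model TLM amounts to scoring every ordering with the LM and keeping the best. A brute-force sketch, with an assumed bigram log-probability table standing in for the TLM:

```python
from itertools import permutations

def local_reorder(words, bigram_logp, max_len=6):
    """Pick the permutation of a short phrase that a target-language
    bigram model scores best. Unseen bigrams get a large penalty;
    phrases longer than max_len are left alone, since enumerating
    permutations is exponential (hence the slide's cost warning)."""
    if len(words) > max_len:
        return list(words)
    def score(seq):
        return sum(bigram_logp.get((a, b), -10.0) for a, b in zip(seq, seq[1:]))
    return list(max(permutations(words), key=score))
```

With a target LM that favors “a long distance call”, the out-of-order phrase “a call long distance” is repaired as on slide 18.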

  20. Lexical Reordering • Output of the lexical choice transducer: a sequence of target language phrases • Example: [like to make] [I'd] [call] [a calling card] [please] • Words within each phrase are in target language word order. • However, the phrases themselves need to be reordered into target language word order. • Reordered: • I'd like to make a calling card call please

  21. Lexical Reordering Models • Tree-based model • Impose a tree structure on a sentence (Alshawi et al., ACL 1998) • English: I'd like to charge this to my home phone

  22. Lexical Reordering Models • Reordering using tree-local reordering rules. • [Figure: dependency trees before and after reordering, over 私は (I), これを (this), 私の (my), 家の (home), 電話に (phone), チャージ (charge), したいのです (like)] • Eng-Jap: 私は したいのです チャージ これを 私の 家の 電話に • Japanese: 私は これを 私の 家の 電話に チャージ したいのです

  23. Lexical Reordering Models (contd.) • Dependency tree represented as a bracketed string (bounded) with reordering instructions: e:[ したいのです:したいのです e:-1 e:[ チャージ:チャージ e:] e:] • Training VNSTs from the bracketed corpus • Output of the lexical reordering VNST: strings with reordering instructions • Instructions are composed with an “interpreter” FST to form the target language sentence.
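The “interpreter” step can be sketched as a small stack machine over bracket and offset tokens. The exact instruction semantics below (an integer offsets where the next closed constituent is inserted in its parent) are an assumption chosen to reproduce the slide's example, not the published definition:

```python
def interpret(tokens):
    """Toy interpreter for bracketed reordering strings: '[' opens a
    constituent, ']' closes it and splices it into its parent, and an
    integer shifts the splice point of the next closed constituent."""
    stack = [[]]
    pending = 0
    for t in tokens:
        if t == "[":
            stack.append([])
        elif t == "]":
            const = stack.pop()
            parent = stack[-1]
            pos = max(0, min(len(parent), len(parent) + pending))
            parent[pos:pos] = const     # splice constituent into parent
            pending = 0
        elif isinstance(t, int):
            pending = t                 # offset for the next splice
        else:
            stack[-1].append(t)
    return stack[0]
```

With this reading, the -1 instruction moves the チャージ (charge) constituent in front of したいのです (like), matching the Japanese verb order above.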

  24. Tree Reordering • Sentence-level reordering • Mapping sentence tree structures • English: my card credit (Spanish order) → English: my credit card (English order) • [Figure: candidate transductions with reordering offsets learned from alignment statistics, e.g. Transduction 1: card −1 +1 my credit; Transduction 2: card −2 −1 my credit; Transduction 3]

  25. ASR-based Speech Translation • [Figure: pipeline combining acoustic model training, sentence alignment, VNST learning, a lexicon FSM, bi-phrase learning, the speech recognizer, and tree reordering]

  26. Overview • Motivation • Stochastic Finite State Machines • Learning Machine Translation Models • Case study • MT for Human-Machine Spoken Dialog • Experiments and Results

  27. MT Evaluation • Lexical Accuracy (LA) • Bag of words. • Translation Accuracy (TA) • Based on string alignment • Application-driven evaluation • “How May I Help You?” • Spoken dialog for call routing • Classification based on salient phrase detection

  28. Automated Services and Customer Care via Natural Spoken Dialog • Prompt is “AT&T. How may I help you?” • User responds with unconstrained fluent speech • Spoken dialog system for call routing • [Figure: router mapping calls to destinations such as area code, billing, credit, DA rate, HELP]

  29. Examples • Yes I like to make this long distance call area code x x x x x x x x x x • Yeah I need the area code for rockmart georgia • Yeah I’m wondering if you could place this call for me I can’t seem to dial it it don’t seem to want to go through for me

  30. Call-Classification Performance • False rejection rate: the probability of rejecting a call, given that the call-type is one of the 14 call-type set • Probability correct: the probability of correctly classifying a call, given that the call is not rejected

  31. MT evaluation on HMIHY

  32. DEMO

  33. Conclusion • The stochastic finite-state approach is viable and effective for limited-domain MT. • Finite-state model chain for complex speech and language constraints. • Multilingual speech applications enabled by MT • Coupling of ASR and MT http://www.research.att.com/~srini/Projects/Anuvaad/home.html

  34. Biblio • J. Berstel, “Transductions and Context-Free Languages”, Teubner Studienbücher. • G. Riccardi, R. Pieraccini and E. Bocchieri, “Stochastic Automata for Language Modeling”, Computer Speech and Language, 10, pp. 265-293, 1996. • F. C. N. Pereira and M. Riley, “Speech Recognition by Composition of Weighted Finite Automata”, in Finite-State Language Processing, MIT Press, Cambridge, Massachusetts, 1997. • S. Bangalore and G. Riccardi, “Stochastic Finite-State Models for Spoken Language Machine Translation”, Workshop on Embedded Machine Translation Systems, NAACL, pp. 52-59, Seattle, May 2000. • More references at http://www.research.att.com/info/dsp3

  35. Stochastic Finite State Models: from concepts to speech • 1993: Variable Ngram Stochastic Automata (VNSA) • Concept modeling for NLU • Word sequence modeling for ASR • Phonotactic transducers (context-to-phone) • Tree-structured transducers (phone-to-word) • Stochastic-FSM based ASR (context-to-concept) • 1994: ATIS Evaluation: it actually worked!

  36. Why did it work? • Symbolic representation (SFSM) for probabilistic sequence modeling (words, concepts, …) • Learning algorithms • Cascade (phrase grammar -> {phrases, word classes} -> words) • Machine combination • Context-to-Phone, Phone-to-Word, Context-to-Grammar (CLG) • Decoding very simple and fast (Viterbi and beam-width search)

  37. Multilingual Speech Processing • [Figure: “tarjeta de crédito” → ASR-MT engine → “credit card”] • The finite-state chain allows for: • Speech and language coupling (e.g. prosody, recognition errors) • Integrated multilingual processing

  38. Speech Translation • Previous approaches to Speech Translation • Source language ASR • Translation Model • Finite-state Model based Speech Translation • Source Language Acoustic Model • Lexical Choice Model • Lexical Reordering Model

  39. Learning the state space and state transition function (revised) • For each suffix in the corpus, we create two states (one for string recognition, the other for back-off via an epsilon transition). • The size of the automaton is still linear in the corpus size • The stochastic automaton can compute a word probability for all strings in V*!

  40. Stochastic Finite State Automata/Transducers: Word Prediction • Example: “… the President of United ???” • [Figure: a state with History(4) = “the president of United”, PrevClass = Adj, PrevPrevClass = Function Word, Trigger(10) = “elections”, and outgoing state transition probabilities for candidate words: States/p1, Airlines/p2, ε/p3]

  41. Learning Lexical Choice Models • English utterances recorded from customer calls. • Manually translated into Japanese/Spanish. • ``Bunsetsu''-like tokenization for Japanese. • Alignment English: I'd like to charge this to my home phone Japanese: 私は これを 私の 家の 電話に チャージ したいのです Alignment: 1 7 0 6 2 0 3 4 5 • Bilanguage: I'd_私は like_したいのです to_e charge_チャージ this_これを to_e my_私の home_家の phone_電話に

  42. Learning Bilingual Stochastic Transducers • Learn stochastic transducers from the bilanguage (Embedded MT 2000) • Automatically learn bilingual phrases • Reordering within phrases • Examples: エイ ティー アンド ティー ↔ A T and T; 私の 家の 電話に ↔ to my home phone; 私は コレクト コールを かける 必要があります ↔ I need to make a collect call; tarjeta de crédito ↔ credit card; una llamada de larga distancia ↔ a long distance call

  43. Lexical Choice Transducer • Language model: N-gram model built on the phrase-chunked bilanguage. • A combination of phrases and words maximizes predictive power and minimizes the number of parameters • The resulting finite-state automaton on the bilanguage vocabulary is converted into a finite-state transducer.

  44. Lexical Reordering • Output of the lexical choice transducer: a sequence of target language phrases • Example: [like to make] [I'd] [call] [a calling card] [please] • Words within each phrase are in target language word order. • However, the phrases themselves need to be reordered into target language word order. • Reordered: • I'd like to make a calling card call please

  45. Lexical Reordering Models • Tree-based model • Impose a tree structure on a sentence (Alshawi et al., ACL 1998) • English: I'd like to charge this to my home phone

  46. Lexical Reordering Models • Reordering using tree-local reordering rules. • [Figure: dependency trees before and after reordering, over 私は (I), これを (this), 私の (my), 家の (home), 電話に (phone), チャージ (charge), したいのです (like)] • Eng-Jap: 私は したいのです チャージ これを 私の 家の 電話に • Japanese: 私は これを 私の 家の 電話に チャージ したいのです

  47. Lexical Reordering Models (contd.) • Dependency tree represented as a bracketed string with reordering instructions: e:[ したいのです:したいのです e:-1 e:[ チャージ:チャージ e:] e:] • Lexical reordering FST: the result of training a stochastic finite-state transducer on the corpus of bracketed strings. • Output of the lexical reordering FST: strings with reordering instructions. • Instructions are interpreted to form the target language sentence.

  48. Translation using stochastic FSTs • Sequence of finite-state transductions • English: I'd like to charge this to my home phone • Eng-Jap: 私は したいのです チャージ これを 私の 家の 電話に • Japanese: 私は これを 私の 家の 電話に チャージ したいのです • [Figure: dependency trees before and after reordering]

  49. Spoken Language Corpora • Prompt: How may I help you? • Examples Yeah I need the area code for rockmart georgia Yes I'd like to make this long distance call area code x x x x x x x x x x Yeah I'm wondering if you could place this call for me I can't seem to dial it it don't seem to want to go through for me • Parallel corpora: English, Japanese, Spanish.

  50. Evaluation Metric • Evaluation metric for MT is a complex issue. • String edit distance between the reference string and the result string (length in words: R) • Insertions (I) • Deletions (D) • Substitutions (S) • Moves = pairs of deletions and insertions (M) • Remaining insertions (I') and deletions (D') • Simple String Accuracy = 1 – (I + D + S) / R • Generation String Accuracy = 1 – (M + I' + D' + S) / R
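Simple String Accuracy follows directly from a standard Levenshtein alignment over words; a minimal sketch (the Moves/Generation variant needs the full alignment trace and is omitted here):

```python
def simple_string_accuracy(ref, hyp):
    """Simple String Accuracy = 1 - (I + D + S) / R, where I, D, S come
    from a word-level Levenshtein alignment of hypothesis to reference
    and R is the reference length in words."""
    R = len(ref)
    # dp[i][j] = minimum edits turning hyp[:j] into ref[:i]
    dp = [[0] * (len(hyp) + 1) for _ in range(R + 1)]
    for i in range(R + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, R + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution/match
    return 1 - dp[R][len(hyp)] / R
```

Note the metric can go negative when the hypothesis needs more edits than the reference has words, which is one reason MT evaluation is called a complex issue above.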
