Speech-to-Speech MT in the JANUS System Lori Levin and Alon Lavie Language Technologies Institute Carnegie Mellon University
Outline • Design and Engineering of the JANUS/C-STAR speech-to-speech MT system • Fundamentals of our approach • System overview • Engineering a multi-domain system • The C-STAR Travel Domain Interlingua (IF) • Evaluation and User Studies • Conclusions, Current and Future Research UMD Seminar
JANUS Speech Translation • Translation via an interlingua representation • Main translation engine is rule-based • Semantic grammars • Modular grammar design • System engineered for multiple domains • Incorporate alternative translation engines
Multilingual Interlingual Machine Translation • Analysis (into the Interchange Format) from: German, Korean, English, French, Japanese, Italian • Interchange Format • Generation (from the Interchange Format) into: Japanese, Italian, English, French, German, Korean
Advantages of Interlingua • Avoid the n-squared problem for all-ways translation. • Monolingual grammar development teams. • Add a new language easily and automatically get all-ways translation to and from all previous languages.
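The n-squared problem can be made concrete with a little arithmetic. The sketch below is only an illustration, not part of the JANUS system; the function names are hypothetical. It assumes one translator per ordered language pair in the direct case, and one analyzer plus one generator per language in the interlingua case.

```python
def direct_systems(n: int) -> int:
    """One-directional translators needed for all-ways translation
    among n languages when every pair is handled directly."""
    return n * (n - 1)


def interlingua_modules(n: int) -> int:
    """Modules needed with an interlingua: one analyzer and one
    generator per language."""
    return 2 * n


if __name__ == "__main__":
    for n in (2, 6, 10):
        print(n, direct_systems(n), interlingua_modules(n))
```

For the six languages shown on the previous slide, this gives 30 direct translators versus 12 interlingua modules, and adding a seventh language costs two new modules rather than twelve new translators.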
The C-STAR Travel Planning Domain General Scenario: • Dialogue between one traveler and one or more travel agents • Focus on making travel arrangements for a personal leisure trip (not business) • Free spontaneous speech
The C-STAR Travel Planning Domain Natural breakdown into several sub-domains: • Hotel Information and Reservation • Transportation Information and Reservation • Information about Sights and Events • General Travel Information • Cross Domain
A Travel Dialogue Translated from Italian A: Albergo Gabbia D’Oro. Good evening. B: My name is Anna Maria DeGasperi. I’m calling from Rome. I wish to book two single rooms. A: Yes. B: From Monday to Friday the 18th, I’m sorry, to Monday the 21st. A: Friday the 18th of June. B: The 18th of July. I’m sorry. A: Friday the 18th of July to, you were saying, Sunday. B: No. Through Monday the 21st.
A Travel Dialogue (Continued) B: So with departure on Tuesday the 22nd. A: Then leaving on the 22nd. Yes. We have two singles certainly. B: Yes. A: Would you like breakfast? B: Is it possible to have all meals? A: No. We serve meals only in the evening. B: Ok. If you can do breakfast and dinner. A: Ok. B: Do you need a deposit?
A Travel Dialogue (Continued) A: You can give me your credit card number. B: Ok. Just a moment. Ok. My name is Anna Maria DeGasperi. The card is 005792005792. A: Good. B: Expiration 2002. A: 2002. Good. Thank you. We need a confirmation on the 18th of July before 6pm. B: Goodbye. A: Thanks. Goodbye. B: Thanks. Goodbye.
A Non-Task-Oriented Dialogue (We can’t translate this.) A: Are you cooking? B: My father is cooking. I’m cleaning. I just finished cleaning the bathroom. A: Look. What do you know about Monica? B: I don’t know anything. Look. I don’t know anything. A: You don’t know anything? I wrote her three weeks ago, but if she hasn’t received the letter, they would have returned it. I hope she received it. B: Because Celia told me that the address that Monica had given us was wrong. She said that if I was going to write to her, well, ….
Semantic Grammars • Describe structure of semantic concepts instead of syntactic constituency of phrases • Well suited for task-oriented dialogue containing many fixed expressions • Appropriate for spoken language - often disfluent and syntactically ill-formed • Faster to develop reasonable coverage for limited domains
Semantic Grammars Hotel Reservation Example: • Input: we have two hotels available • Parse Tree: [give-information+availability+hotel] (we have [hotel-type] ([quantity=] (two) [hotel] (hotels)) available)
The JANUS-III Translation System
The SOUP Parser • Specifically designed to parse spoken language using domain-specific semantic grammars • Robust - can skip over disfluencies in input • Stochastic - probabilistic CFG encoded as a collection of RTNs with arc probabilities • Top-Down - parses from top-level concepts of the grammar down to matching of terminals • Chart-based - dynamic matrix of parse DAGs indexed by start and end positions and head category
The SOUP Parser • Supports parsing with large multiple domain grammars • Produces a lattice of parse analyses headed by top-level concepts • Disambiguation heuristics rank the analyses in the parse lattice and select a single best path through the lattice • Graphical grammar editor
SOUP Disambiguation Heuristics • Maximize coverage (of input) • Minimize number of parse trees (fragmentation) • Minimize number of parse tree nodes • Minimize the number of wild-card matches • Maximize the probability of parse trees • Find sequence of domain tags with maximal probability given the input words: P(T|W), where T = t1, t2, …, tn is a sequence of domain tags
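The slide lists the heuristics but does not say how SOUP combines them. As a minimal sketch, assuming a strict lexicographic priority in the order listed (an assumption made here for illustration, not a documented property of SOUP), candidate analyses could be ranked as below. The `Analysis` fields and function names are hypothetical, and the domain-tag probability P(T|W) is left out.

```python
from dataclasses import dataclass


@dataclass
class Analysis:
    """Hypothetical summary of one path through the parse lattice."""
    words_covered: int   # input words accounted for by the parse
    num_trees: int       # number of parse trees (fragmentation)
    num_nodes: int       # total parse tree nodes
    num_wildcards: int   # wild-card matches used
    log_prob: float      # log probability of the parse trees


def rank_key(a: Analysis) -> tuple:
    # Assumed lexicographic order: quantities to maximize are negated
    # so that the smallest key is the preferred analysis.
    return (-a.words_covered, a.num_trees, a.num_nodes,
            a.num_wildcards, -a.log_prob)


def best_analysis(candidates: list[Analysis]) -> Analysis:
    """Return the preferred analysis under the assumed ordering."""
    return min(candidates, key=rank_key)
```

A weighted combination of the same features would be an equally plausible reading of the slide; the lexicographic form is used here only because it is the simplest to state.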
Modular Grammar Design • Grammar development separated into modules corresponding to sub-domains (Hotel, Transportation, Sights, General Travel, Cross Domain) • Shared core grammar for lower-level concepts that are common to the various sub-domains (e.g. times, prices) • Grammars can be developed independently (using shared core grammar) • Shared and Cross-Domain grammars significantly reduce effort in expanding to new domains • Separate grammar modules facilitate associating parses with domain tags - useful for multi-domain integration within the parser
Translation with Multiple Domain Grammars
Analysis with Multiple Domain Grammars • Parser is loaded with all domain grammars • Domain tag attached to grammar rules of each domain • Previously developed grammars for other domains can also be incorporated • Parser creates a parse lattice consisting of multiple analyses of the input into sequences of top-level domain concepts • Parser disambiguation heuristics rank the analyses in the parse lattice and select a single best sequence of concepts
A SOUP Parse Lattice
Alternative Analysis Approach: SALT SALT - Statistical Analyzer for Language Translation • Combines ML-trainable and rule-based analysis methods for robustness and portability • Rule-based parsing restricted to a well-defined set of argument-level phrases and fragments • Trainable classifiers (NN, Decision Trees, etc.) used to derive the DA (speech act and concepts) from the sequence of argument concepts • Phrase-level grammars are more robust and portable to new domains
SALT Approach • Example: • Input: we have two hotels available • Arg-SOUP: [exist] [hotel-type] [available] • SA-Predictor: give-information • Concept-Predictor: availability+hotel • Predictors use SOUP argument concepts and input words • Preliminary results are encouraging
Design Criteria of the Interchange Format • Suitable for task-oriented dialogue • Based on speaker’s intent, not literal meaning • Domain-independent framework with domain-specific parts • Simple and reliable enough to use: • at multiple research sites • with widely varying types of parsers and generators
Domain Actions: Extended, Domain-Specific Speech Acts Examples: • c:request-information+availability+room • a:give-information+personal-data • c:give-information+temporal+arrival
Task Oriented Sentences • Perform an action in the domain. • Are not descriptive. • Contain fixed expressions that cannot be translated literally.
Components of the Interchange Format • speaker: a: (agent) • speech act: give-information • concept*: +availability+room • argument*: (room-type=(single & double), time=md12)
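The four components can be pulled apart from an IF string with a few lines of code. This is a minimal sketch based only on the surface shape of the examples in these slides (speaker tag, colon, speech act, `+`-joined concepts, optional parenthesized arguments); `parse_if` is a hypothetical helper, not part of the C-STAR tooling, and it leaves the argument list as an unparsed string.

```python
def parse_if(text: str) -> dict:
    """Split an IF string into speaker, speech act, concepts, arguments.

    Assumes the form  speaker:speech-act+concept+... (arguments)
    as seen in the slide examples; arguments are returned verbatim.
    """
    speaker, rest = text.split(":", 1)
    paren = rest.find("(")
    if paren == -1:
        domain_action, arguments = rest, ""
    else:
        domain_action, arguments = rest[:paren], rest[paren:]
    # The first '+'-separated item is the speech act; the rest are concepts.
    speech_act, *concepts = domain_action.strip().split("+")
    return {
        "speaker": speaker.strip(),
        "speech_act": speech_act,
        "concepts": concepts,
        "arguments": arguments.strip(),
    }


if __name__ == "__main__":
    print(parse_if("a:give-information+availability+room "
                   "(room-type=(single & double), time=md12)"))
    print(parse_if("c:negate"))
```

Keeping the arguments as a raw string avoids guessing at the nesting rules for values such as `(single & double)`, which the slides do not spell out.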
Examples • no that’s not necessary • c:negate • yes I am • c:affirm • and I was wondering what you have in the way of rooms available during that time • c:request-information+availability+room • my name is alex waibel • c:give-information+personal-data(person-name=(given-name=alex, family-name=waibel)) • and how will you be paying for this • a:request-information+payment(method=question) • I have a mastercard • c:give-information+payment(method=mastercard)
Speaker Tag • Client says: Do you take credit cards? • c:request-information+payment (method=credit-card) • Agent says: Will you be paying with a credit card? • a:request-information+payment (method=credit-card)
Size of IF (May 1999) • Speech acts: 54 • Concepts: 84 • Arguments: 118
Speech Acts • accept: I’ll take that, Sounds good • acknowledge: Okay, Sure, Yeah • acknowledge-action: Here you go, This is it • affirm: Yes, That is correct • affirm-action: Yes, please do, Go ahead • apologize: Sorry, I’m sorry • closing: Bye, See you next week • delay-action: I’ll get back to you on that • end-action: That’s all for now • give-certainty: I’m sure • give-information: I have 2 singles available
Speech Acts • greeting: Hello, Good morning • greeting-nice-meet: Nice to meet you • greeting-request: How are you • greeting-response-bad: I’m not good • greeting-response-good: I’m fine • greeting-welcome: Welcome to Pittsburgh • introduce-self: This is Brian, Best Western • introduce-topic: About that flight… • negate: No • negate-action: No, don’t • not-understand: I don’t understand
Speech Acts • offer: How about it? • offer-information: Let me get you the information • offer-repeat: Let me repeat that • please-wait: Just a minute, Let me see • reject (e.g., offer): No, I don’t want that • request-action: Can you reserve that for me? • request-affirmation: Is that correct? • request-certainty: Are you sure? • request-delay-action: Can I get back to you later? • request-information: Do you accept visa? • request-introduce-self: Who am I speaking with?
Speech Acts • request-knowledge: Do you know? • request-neg-affirmation: Is that bad? • request-repeat: Could you repeat that? • request-suggestion: Which hotel should I get? • request-verification: Right?, That was 40 dollars? • return-from-delay: I’m back • suggest: How about a single? • thank: Thank you very much • verify: Yes, that is 40 dollars. • welcome: You’re welcome • x-exclamation: That is beautiful! (ETRI only)
Meta-Demo Speech Acts • testing: Testing 1 2 3, This is a test • testing-problem: We have a problem • testing-start: Let’s start • testing-stop: Let’s stop • testing-proceed: Go ahead! • testing-request-proceed: Would you go first • testing-ready: Ready here • testing-present: We are here, CMU is on line • testing-request-present: Are you there?
Some Concepts • Actions: change, reservation, confirmation, cancellation, help, purchase, view, display, preference • Attributes: availability, price, temporal, location, size, features, etc. • Objects: room, hotel, flight, tour, event, attraction, web-page, etc. • Other: arrival, departure, numeral, expiration-date, payment
Using Concepts to Represent Information Focus • Is there a hotel in Pittsburgh? • c:request-information+availability+hotel (location=pittsburgh) • Is the hotel in Pittsburgh? • c:request-information+location+hotel (location=pittsburgh)
Topic vs Focus • The Hilton Hotel is in Verona. • a:give-information+location+hotel (hotel-name=hilton, location=verona) • The hotel in Verona is the Hilton Hotel. • a:give-information+location+hotel (hotel-name=hilton, location=verona)
The Interchange Format Database • Record format: • d.u.sdu olang X lang Y Prv Z “sdu-in-language-Y on one line” • d.u.sdu olang X lang E Prv Z “sdu-in-English on one line” • d.u.sdu IF Prv Z dialogue-act-on-one-line • d.u.asdu comments: your comments • d.u.asdu comments: go here • Example: • 61.2.3 olang I lang I Prv IRST “telefono per prenotare delle stanze per quattro colleghi” • 61.2.3 olang I lang E Prv IRST “I’m calling to book some rooms for four colleagues” • 61.2.3 IF Prv IRST c:request-action+reservation+features+room (for-whom=(associate, quantity=4)) • 61.2.3 comments: dial-oo5-spkB-roca0-02-3
The Interchange Format Database • English Dialogues: 36 • English Sentences: 2466 • Korean Dialogues: 70 • Korean Sentences: 1142 • Italian Dialogues: 5 • Italian Sentences: 233 • Japanese Dialogues: 124 • Japanese Utterances: 5887 • Distinct Dialogue Acts: 554 (310 agent, 244 client)
Phenomena Not Covered • Anaphora • Comparative Constructions • Scope (negation and modifiers) • Relative Clauses • Plurality • Descriptive Sentences
Expressivity vs Simplicity • If it is not expressive enough, components of meaning will be lost. • If it is not simple enough, it can’t be used reliably across sites.
Coverage • The database includes about 550 distinct dialogue acts. • About 60 dialogue acts cover about 70% of the data. • About 5% of unseen data was not covered (as judged by human experts).
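A figure such as "about 60 dialogue acts cover about 70% of the data" is a cumulative-frequency statistic. The sketch below shows one way such a number could be computed from a list of dialogue-act labels; it is not the authors' procedure, and `acts_needed_for_coverage` is a hypothetical helper name.

```python
from collections import Counter


def acts_needed_for_coverage(acts: list[str], target: float) -> int:
    """Smallest number of most-frequent dialogue acts whose combined
    frequency reaches `target` (a fraction between 0 and 1)."""
    counts = Counter(acts)
    total = sum(counts.values())
    covered = 0
    for k, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered / total >= target:
            return k
    return len(counts)
```

Applied to the IF field of every record in the database, a call with `target=0.7` would give the kind of figure quoted above.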
Consistency of Use Across Sites • Successful international demo. • After testing English-Italian and English-Korean, Italian-Korean worked without extra effort. • Inter-coder agreement for each component of IF individually (speech acts, concepts, arguments) around 85% • Cross-site evaluation same as intra-site evaluation: 60% spoken; 75% transcribed.
User Studies • We conducted three sets of user tests • Travel agent played by experienced system user • Traveler is played by a novice and given five minutes of instruction • Traveler is given a general scenario - e.g., plan a trip to Heidelberg • Communication only via ST system, multi-modal interface and muted video connection • Data collected used for system evaluation, error analysis and then grammar development
Evaluations • Accuracy Based Evaluation • Translation preserves original meaning • Task Based Evaluation • goal success or failure • user effort: how many attempts before succeeding or giving up
Accuracy Based Evaluation • End-to-end evaluations conducted at the SDU (sentence) level • Multiple bilingual graders compare the input with translated output and assign a grade of: Perfect, OK or Bad • OK = meaning of SDU comes across • Perfect = OK + fluent output • Bad = translation incomplete or incorrect
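The grading scheme above lends itself to a simple tally. The sketch below assumes that an "acceptable" translation is one graded Perfect or OK, which follows from the definitions on the slide but is not stated there as a reported metric; how grades from multiple graders are combined is also not specified, so the function takes a flat list of grades. `summarize_grades` is a hypothetical name.

```python
from collections import Counter

GRADES = ("Perfect", "OK", "Bad")


def summarize_grades(grades: list[str]) -> dict[str, float]:
    """Fraction of SDUs per grade, plus an assumed 'Acceptable'
    figure defined here as Perfect + OK."""
    if not grades:
        raise ValueError("no grades to summarize")
    counts = Counter(grades)
    total = len(grades)
    summary = {g: counts[g] / total for g in GRADES}
    summary["Acceptable"] = summary["Perfect"] + summary["OK"]
    return summary
```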
Task Based Evaluation I would like to reserve #1s a single room #2f request-action+reservation+hotel (room-type=single) Translation: I would like to reserve a seating room.
Task Based Evaluation Scoring Scheme: • For goals that succeed: 1/n • For goals that fail: -(1-1/n) • where n is the number of attempts • Overall score: average for all goals
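The scoring scheme translates directly into code. The sketch below implements the two formulas and the average as stated on the slide; the function names and the `(succeeded, attempts)` tuple representation of a goal are assumptions made for illustration.

```python
def goal_score(succeeded: bool, attempts: int) -> float:
    """Score one goal: 1/n on success, -(1 - 1/n) on failure,
    where n is the number of attempts."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    if succeeded:
        return 1 / attempts
    return -(1 - 1 / attempts)


def overall_score(goals: list[tuple[bool, int]]) -> float:
    """Average goal score over all goals in a dialogue."""
    if not goals:
        raise ValueError("no goals to score")
    return sum(goal_score(s, n) for s, n in goals) / len(goals)
```

Under this scheme a goal achieved on the first attempt scores 1, success becomes less valuable with every retry, and a user who gives up after many attempts is penalized more heavily than one who gives up at once.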