Multilingual Conversational Interfaces: An NTT-MIT Collaboration

Multilingual Conversational Interfaces: An NTT-MIT Collaboration. Stephanie Seneff, Spoken Language Systems Group, MIT Laboratory for Computer Science, January 13, 2000. Collaborators: James Glass (Co-PI, MIT-LCS), T.J. Hazen (MIT-LCS), Yasuhiro Minami (NTT Cyberspace Labs)


Presentation Transcript


  1. Multilingual Conversational Interfaces: An NTT-MIT Collaboration
     Stephanie Seneff, Spoken Language Systems Group, MIT Laboratory for Computer Science
     January 13, 2000

  2. Collaborators
     • James Glass (Co-PI, MIT-LCS)
     • T.J. Hazen (MIT-LCS)
     • Yasuhiro Minami (NTT Cyberspace Labs)
     • Joseph Polifroni (MIT-LCS)
     • Victor Zue (MIT-LCS)

  3. What are Conversational Interfaces?
     • Can communicate with users through conversation
     • Can understand verbal input
       • Speech recognition
       • Language understanding (in context)
     • Can retrieve information from on-line sources
     • Can verbalize response
       • Language generation
       • Speech synthesis
     • Can engage in dialogue with a user during the interaction
     Introduction || Approach || Mokusei || Summary

  4. Components of Conversational Interfaces
     [Diagram: speech recognition, language understanding, discourse context, dialogue management, database, language generation, and speech synthesis; the links between them are labeled speech, words, meaning, meaning representation, sentence, and graphs & tables]

  5. System Architecture: Galaxy
     [Diagram: a central hub connected to servers for speech recognition, language understanding, discourse resolution, dialogue management, language generation, text-to-speech conversion, audio, I/O, and application back-ends]
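The hub-and-server arrangement on this slide can be illustrated with a toy router. This is only a sketch of the general idea, assuming that servers exchange a shared frame through the hub and never call each other directly; the `Hub` class, the server functions, and the frame keys are all hypothetical and are not Galaxy's actual interfaces.

```python
from typing import Callable, Dict, List

Frame = Dict[str, object]
Server = Callable[[Frame], Frame]


class Hub:
    """Toy hub: servers only exchange frames through the hub."""

    def __init__(self) -> None:
        self._servers: Dict[str, Server] = {}

    def register(self, name: str, server: Server) -> None:
        self._servers[name] = server

    def run(self, route: List[str], frame: Frame) -> Frame:
        # Hand a copy of the frame to each server in turn and keep its reply.
        for name in route:
            frame = self._servers[name](dict(frame))
        return frame


def recognize(frame: Frame) -> Frame:
    frame["words"] = "weather in boston"  # stand-in for a real recognizer
    return frame


def understand(frame: Frame) -> Frame:
    words = str(frame["words"]).split()
    frame["meaning"] = {"topic": words[0], "city": words[-1]}
    return frame


def generate(frame: Frame) -> Frame:
    meaning = frame["meaning"]
    frame["reply"] = f"Here is the {meaning['topic']} for {meaning['city']}."
    return frame


hub = Hub()
hub.register("recognition", recognize)
hub.register("understanding", understand)
hub.register("generation", generate)
result = hub.run(["recognition", "understanding", "generation"], {"audio": b""})
print(result["reply"])
```

Because every server sees only frames, a Japanese recognizer or generator could in principle be registered under the same names without touching the other servers, which is the property the later slides rely on.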

  6. Application Development at MIT
     • Jupiter: Weather reports (1997)
       • 500 cities worldwide
       • Information updated three times daily from four web sites, plus a satellite feed
     • Pegasus: Flight status (1998)
       • ~4,000 flights in US airspace for 55 major cities
       • Information updated every three minutes
       • Also uses flight schedule information, updated daily
     • Voyager: (Greater Boston) traffic and navigation (1998)
       • Traffic information updated every three minutes
       • Also uses maps and navigation information
     • Mercury: Travel planning (1999)
       • Flight information and reservation for ~250 cities worldwide
       • Flight schedule information and pricing
     • Demonstration: Jupiter in English

  7. NTT-MIT Collaborative Research: Mokusei
     [Diagram: two speech understanding components and two speech generation components around a common meaning representation and a database]
     • Explore language-independent approaches to speech understanding and generation
     • Develop necessary human-language technologies to enable porting of conversational interfaces from English to Japanese
     • Use existing Jupiter weather-information domain as test case
       • It is the most mature English system
       • It allows us to explore language technology for interface and content

  8. Multilingual Conversational Systems: Our Approach
     [Diagram: speech recognition, language understanding, discourse context, dialogue manager, database, language generation, and speech synthesis, with attached models, rules, and graphs & tables; components are marked as language independent, language transparent, or language dependent]

  9. Mokusei: Speech Recognition
     • Lexicon: >2,000 words
     • Phonological modeling:
       • Japanese-specific phonological rules, e.g.,
         • Deletion of /i/ and /u/: desu ka → /d e s k a/
     • Acoustic modeling:
       • Used English models to generate transcriptions for Japanese (read and spontaneous) utterances
       • Retrained acoustic models to create hybrid models from a mixture of English and Japanese utterances
     • Language modeling:
       • Class n-gram using 60 word classes
       • Also exploring a class n-gram derived automatically from TINA
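The /i/ and /u/ deletion rule on this slide can be sketched as a rewrite over a phone sequence. The slide does not state the conditioning context, so this sketch assumes the common description of Japanese high-vowel devoicing: the vowel is dropped only between two voiceless consonants. The phone inventory and the function name are illustrative assumptions, not the system's actual rule set.

```python
# Assumed set of voiceless consonants (romanized phone labels).
VOICELESS = {"k", "s", "sh", "t", "ch", "ts", "h", "f", "p"}
HIGH_VOWELS = {"i", "u"}


def delete_high_vowels(phones):
    """Drop /i/ and /u/ when both neighbours are voiceless consonants."""
    reduced = []
    for idx, phone in enumerate(phones):
        prev_phone = phones[idx - 1] if idx > 0 else None
        next_phone = phones[idx + 1] if idx + 1 < len(phones) else None
        if (
            phone in HIGH_VOWELS
            and prev_phone in VOICELESS
            and next_phone in VOICELESS
        ):
            continue  # vowel deleted in this context
        reduced.append(phone)
    return reduced


# "desu ka" as a phone sequence: the /u/ sits between /s/ and /k/.
print(delete_high_vowels(["d", "e", "s", "u", "k", "a"]))
```

In a recognizer such a rule would typically add the reduced form as an alternative pronunciation rather than replace the full form, so both variants remain recognizable.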

  10. Mokusei: Language Understanding
      • Parse query into meaning representation
      • Uses same NL system (TINA) as for English
        • Top-down parsing strategy with trace mechanism
        • Probability model automatically trained
        • Chooses best hypothesis from proposed word graph
      • Japanese grammar contains
        • >900 unique nonterminals
        • Nearly 2,500 vocabulary items
      • Translation file maps Japanese words to English equivalents
      • Produces same semantic frame (i.e., meaning representation) as for English inputs
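The role of the translation file can be illustrated with a deliberately small sketch: Japanese words are mapped to English keys before the frame is filled, so both languages end in the same frame. This is keyword spotting, not TINA's parser, and the table contents, slot names, and `to_frame` function are hypothetical.

```python
# Hypothetical translation table: romanized Japanese word -> English equivalent.
TRANSLATION = {"tenki": "weather", "ashita": "tomorrow", "nihon": "japan"}

# Hypothetical slot table shared by both languages, keyed by English term.
SLOT_OF = {
    "weather": "topic",
    "tomorrow": "date",
    "tokyo": "city",
    "japan": "country",
}


def to_frame(words, translation=None):
    """Fill a flat frame from the words that have a known slot."""
    frame = {}
    for word in words:
        key = word.lower()
        if translation is not None:
            key = translation.get(key, key)  # map to the English equivalent
        slot = SLOT_OF.get(key)
        if slot is not None:
            frame[slot] = key
    return frame


english = to_frame("weather in Tokyo tomorrow".split())
japanese = to_frame("ashita no Tokyo no tenki".split(), TRANSLATION)
print(english == japanese)
```

Once the two frames are identical, everything downstream of understanding (discourse, dialogue management, database access) can stay unchanged across languages.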

  11. Mokusei: Language Understanding (cont’d)
      • Problem: Left-recursive structure of Japanese requires look-ahead to resolve role of content words
        • Nihon wa . . .
        • Nihon no tenki wa . . .
        • Nihon no Tokyo no tenki wa . . .
      • Solution: Use trace mechanism
        • Parse each content word into structure labeled “object”
        • Drop off “object” after next particle, which defines role and position in hierarchy
      [Diagram: parse of "Nihon no Tokyo no tenki wa doo desu ka", with node labels object-1, object-2, object-3, no, wa, subject, and predicate]
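The look-ahead problem and the particle-driven fix can be illustrated with a toy pass over romanized tokens. This is not TINA's trace mechanism, only a sketch of the idea stated on the slide: a content word is held as an unresolved "object" until the following particle fixes its role. It assumes that "no" marks a modifier of the next noun and "wa" marks the topic; the function name and output shape are hypothetical.

```python
def parse_topic(tokens):
    """Return (topic_frame, remaining_tokens) for a 'X no Y no Z wa ...' prefix."""
    modifiers = []   # content words resolved as modifiers by a following "no"
    pending = None   # the current unresolved "object"
    for idx, token in enumerate(tokens):
        if token == "no" and pending is not None:
            modifiers.append(pending)
            pending = None
        elif token == "wa" and pending is not None:
            frame = {"topic": pending}
            node = frame
            # The nearest modifier attaches first, the earliest one ends up deepest.
            for name in reversed(modifiers):
                node["of"] = {"name": name}
                node = node["of"]
            return frame, tokens[idx + 1:]
        else:
            pending = token  # role unknown until the next particle is seen
    return None, tokens


frame, rest = parse_topic("Nihon no Tokyo no tenki wa doo desu ka".split())
print(frame)
print(rest)
```

The three prefixes on the slide show why the role cannot be assigned early: "Nihon" is the topic in the first but a modifier in the other two, and only the particle after it tells them apart.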

  12. Mokusei: Content Processing
      • Update sources from Web sites and satellite feeds at frequent intervals
        • Now harvesting weather reports for ~50 additional Japanese cities
      • Use the same representation for English and Japanese
        • Parse all linguistic data into semantic frames to capture meaning
        • Scan frames for semantic content and prepare new relational database table entries
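The last step, turning a parsed frame into relational table entries, might look roughly like the following. The slide does not describe the schema, so the table name, its columns, and both helper functions are assumptions made for illustration; the sketch uses an in-memory SQLite database from the Python standard library.

```python
import json
import sqlite3


def store_frame(conn, city, keys, frame):
    """Insert one row per index key, each carrying the serialized frame."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reports (city TEXT, topic TEXT, frame TEXT)"
    )
    payload = json.dumps(frame, sort_keys=True)
    conn.executemany(
        "INSERT INTO reports (city, topic, frame) VALUES (?, ?, ?)",
        [(city, key, payload) for key in keys],
    )
    conn.commit()


def lookup(conn, city, topic):
    """Return all frames stored for a city under a given topic key."""
    rows = conn.execute(
        "SELECT frame FROM reports WHERE city = ? AND topic = ?", (city, topic)
    ).fetchall()
    return [json.loads(row[0]) for row in rows]


conn = sqlite3.connect(":memory:")
report = {"clause": "weather_event", "topic": "wind", "pred": "gusty"}
store_frame(conn, "Tokyo", ["weather", "wind"], report)
print(lookup(conn, "Tokyo", "wind"))
```

Storing language-neutral frames rather than sentences is what lets one table serve both the English and the Japanese system.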

  13. Mokusei: Example of Content Processing
      English: Some thunderstorms may be accompanied by gusty winds and hail
      Semantic frame:
        clause: weather_event
        topic: precip_act, name: thunderstorm, num: pl
        quantifier: some
        pred: accompanied_by
        adverb: possibly
        topic: wind, num: pl, pred: gusty
        and: precip_act, name: hail
      Frame indexed under weather, wind, rain, storm, and hail
      Japanese: [not preserved in the transcript]
      Spanish: Algunas tormentas posiblemente acompanadas por vientos racheados y granizo
      German: Einige Gewitter moeglicherweise begleitet von boeigem Wind und Hagel
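The indexing step on this slide can be sketched by walking the frame and collecting index keys. The nesting of the frame below is a guess, since the transcript flattens the slide's layout, and the `KEYS_FOR` table, which maps frame values to index keys, is a hypothetical stand-in for however the real system derives them.

```python
# Frame from the slide; the nesting shown here is an assumption.
frame = {
    "clause": "weather_event",
    "topic": {
        "type": "precip_act",
        "name": "thunderstorm",
        "num": "pl",
        "quantifier": "some",
        "pred": {
            "name": "accompanied_by",
            "adverb": "possibly",
            "topic": {
                "type": "wind",
                "num": "pl",
                "pred": "gusty",
                "and": {"type": "precip_act", "name": "hail"},
            },
        },
    },
}

# Hypothetical table: frame value -> index keys it contributes.
KEYS_FOR = {
    "weather_event": {"weather"},
    "thunderstorm": {"rain", "storm"},
    "wind": {"wind"},
    "hail": {"hail"},
}


def index_keys(node):
    """Collect index keys from every string value in a nested frame."""
    keys = set()
    if isinstance(node, dict):
        for value in node.values():
            keys |= index_keys(value)
    elif isinstance(node, str):
        keys |= KEYS_FOR.get(node, set())
    return keys


print(sorted(index_keys(frame)))
```

With this table the frame yields the five keys named on the slide, so a query about wind, hail, rain, storms, or weather in general would all retrieve this one report.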

  14. Mokusei: Language Generation Using GENESIS
      • Used English language generation tables as template
        • Modified ordering of constituents
        • Provided translation lexicon for >4,000 words
      • Challenges:
        • Prepositions had to be marked for role: in_loc, in_time
        • Multiple meanings for some other words: e.g., “well inland”
        • Complex sentences presented difficulties for constituent ordering
      • A new version of GENESIS is being developed to support finer control of constituent ordering
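Why prepositions need role marks, and why constituent order matters, can be seen in a small sketch. English "in" covers both location and time, while Japanese typically uses different particles for the two and places them after the noun. The lexicon entries, the ordering flag, and the `render` function below are illustrative assumptions, not the contents of the GENESIS tables.

```python
# Illustrative lexicons keyed by role-marked and content entries.
LEXICON = {
    "en": {"in_loc": "in", "in_time": "in", "tokyo": "Tokyo", "morning": "the morning"},
    "ja": {"in_loc": "de", "in_time": "ni", "tokyo": "Tokyo", "morning": "asa"},
}

# Whether the role marker follows the noun (postposition) or precedes it.
MARKER_FOLLOWS_NOUN = {"en": False, "ja": True}


def render(phrases, lang):
    """Render (role, noun) pairs with language-specific words and ordering."""
    lexicon = LEXICON[lang]
    parts = []
    for role, noun in phrases:
        pair = [lexicon[role], lexicon[noun]]
        if MARKER_FOLLOWS_NOUN[lang]:
            pair.reverse()  # noun first, then the particle
        parts.append(" ".join(pair))
    return " ".join(parts)


phrases = [("in_loc", "tokyo"), ("in_time", "morning")]
print(render(phrases, "en"))
print(render(phrases, "ja"))
```

If the frame carried only an unmarked "in", the Japanese side would have no way to choose between the two particles, which is the problem the role marks solve.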

  15. Mokusei: Speech Synthesis
      • Currently use the NTT Fluet text-to-speech system
        • Fully integrated into the system
        • Runs as a server communicating with the Galaxy hub

  16. Mokusei Demonstration
      • Entire system running at MIT
      • Access via international telephone call
      • Scenario: inquiring about weather conditions in Japan and worldwide
      • Potential problems:
        • The system is VERY new!
        • System reliability
        • 14-hour time difference
        • Transmission conditions and environmental noise

  17. Lessons Learned
      • Our approach to developing multilingual interfaces appears feasible
      • Performance is similar to the English system two years ago
      • A top-down approach to parsing can be made effective for left-recursive languages
      • Word order divergence between English and Japanese motivated a redesign of our language generation component
      • Novel technique of generating a class n-gram language model using the NL component appears promising
      • Involvement of a Japanese researcher is essential

  18. Future Work
      • Additional data collection from native Japanese speakers
        • Nearly 2,000 sentences were collected in December and January
      • Improvement of individual components
        • Vocabulary coverage, acoustic and language models
        • Parse coverage
      • Continued development of a more sophisticated language generation component
      • Expansion of weather content for Japan
