ICSLP Info: Building the Interspeech 2006 Information Line • Rohit Kumar, Arthur Chan, Lingyun Gu • Spring 2006 • 11754: Dialog Lab Project
Layout • Motivation • Use Cases • System Architecture • Backend • Language Components • Grammar, NLG Templates, Language Models • Dialog Component • Telephony Component • Conclusions & Future Work • Distribution of work among team members
Motivation • To provide schedule information to attendees of Interspeech 2006, Pittsburgh • Users can ask about • Time / Location of Papers & Sessions • Papers & Sessions, searched by title & author • Other information • Panel Discussions, Keynotes, Breaks • To study the development of spoken dialog systems by putting the principles involved into practice
Use Case • One typical use case is shown below • System: Welcome to Interspeech 2006 Information Line. What information about the conference schedule do you need ? I can talk about the various papers and sessions in the conference. • User: When is the session on Language modeling on Thursday ? • System: Looking for Language modeling session. • User: Yes • System: I found one session matching your query. There is a session on Language Modeling at 12.30pm on Thursday in Siaca Hall on 2nd Floor. Do you need any more information ? • User: No • System: Thank you for using the Interspeech 2006 Information Line. Have a nice day.
Intended Dialog Flow Chart • [Figure: flow chart of the intended dialog. Node labels: Welcome Prompt; Natural Language Prompt: "How may I help you?"; Got Something; No understanding: "Sorry. I didn't get that."; Menu: "What type of information do you want?" (Papers, Sessions); Date; Author and/or Title; Area / Topic; Title / Keywords; Time of Day; Confirmation; Decide if we have enough info to look up a few definitive records; Lookup; Paper Found / No Paper Found; Present Results (max. 3); Summary Prompt; More Info?]
System Architecture • Based on the RavenClaw dialog system architecture, with the VeraOut system used for telephony through Skype • Sphinx2 for the telephony setup • Sphinx3 for the desktop setup
System Components • The following components were specifically engineered for this project • Backend: Database & Robust Querying • DM: Dialog Task Specification • Parser: Grammar • ASR: Language Model & Vocabulary • NLG: Templates • Telephony: Skyper
Backend • Database: • ICSLP 2005 records crawled from the conference website to build the database currently in use • Information crawled • Sessions (Chair, Title, Time, Date, Location) • Papers (Title, Session, Type, Time, Date, Location, Chair, Authors) • Other information, such as keynote addresses, panel discussions, and special sessions, entered into the database manually • Database structured as a single table with all relevant information available for each record • Total: 875 records
Backend • Querying System • Statistical matching of queries • Queries are treated as bags of words and matched against the appropriate fields of all records • Records with the highest normalized match confidence are reported back • At most 10 records are returned • If more than 10 records match, 0 records are returned • Query pre-processing: expanding abbreviations (e.g., TTS to "text to speech") • Robust: unlike fixed-expression matching, a query such as "Multilingual Speech" can match records titled "Multilingual Text to Speech"
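The matching scheme on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual code: the slides do not say how the confidence is normalized, so the sketch assumes it is the fraction of query words found in the field. The names `ABBREVIATIONS`, `tokenize`, `match_confidence`, and `search`, and the `title` field, are hypothetical.

```python
import re

# Hypothetical abbreviation table; the slides mention only TTS.
ABBREVIATIONS = {"tts": "text to speech"}
MAX_RESULTS = 10  # limit stated on the slide


def tokenize(text):
    """Lowercase the text, expand known abbreviations, return a set of words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    expanded = []
    for word in words:
        expanded.extend(ABBREVIATIONS.get(word, word).split())
    return set(expanded)


def match_confidence(query, field):
    """Assumed normalization: fraction of query words present in the field."""
    query_words = tokenize(query)
    if not query_words:
        return 0.0
    return len(query_words & tokenize(field)) / len(query_words)


def search(query, records, field="title"):
    """Return the records that share the highest non-zero confidence."""
    scored = [(match_confidence(query, rec[field]), rec) for rec in records]
    best = max((score for score, _ in scored), default=0.0)
    if best == 0.0:
        return []
    hits = [rec for score, rec in scored if score == best]
    # Per the slide: if more than 10 records match, nothing is returned.
    return hits if len(hits) <= MAX_RESULTS else []


records = [
    {"title": "Multilingual Text to Speech"},
    {"title": "Language Modeling"},
]
print(search("multilingual speech", records))
```

Because the query is treated as an unordered set of words, "multilingual speech" matches the first record even though the words are not adjacent in its title, which is the robustness the slide describes.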
Grammar • 126 nets, including generic & task-specific non-terminals • Task-specific concepts extracted • Session Names (110 with variations) • Author Names (92) • Query Type (5: Paper, Session, Keynote, Panel, Lunch) • Request Type (5: When, Where, Which, How Many, About) • Date & Time Specifications (day of week is the only one used in the dialog)
NLG: Templates • Number of Task Specific Templates authored (70) • Establish Context: 3 • Inform: 37 • Request: 13 • Implicit Confirmation: 7 • Explicit Confirmation: 10
Language Models & Vocabulary • Training Corpus • Created for a strict version of the grammar • ~120,000 utterances in the training corpus • Created by generating fake sentences from various nets of the grammar and concatenating the individual sets • ~100,000 utterances made up of author names, session titles, and full and partial valid queries • Utterances appearing more than 25 times capped at 25 copies • Utterances appearing between 12 and 24 times reduced in number
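The 25-copy cap described above could be applied with a simple counting pass. The following Python sketch is an assumption about how such a step might look, not the script actually used; the function name `cap_duplicates` is hypothetical. The slide does not say by how much the utterances occurring 12 to 24 times were reduced, so that step is left out.

```python
from collections import Counter

MAX_COPIES = 25  # cap stated on the slide


def cap_duplicates(utterances, max_copies=MAX_COPIES):
    """Keep at most max_copies occurrences of each utterance, preserving order."""
    seen = Counter()
    kept = []
    for utterance in utterances:
        if seen[utterance] < max_copies:
            seen[utterance] += 1
            kept.append(utterance)
    return kept


corpus = ["when is the session"] * 40 + ["where is the keynote"] * 3
capped = cap_duplicates(corpus)
print(Counter(capped))
```

Capping repeated utterances keeps very frequent generated sentences from dominating the n-gram counts while leaving rarer utterances untouched.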
Language Models & Vocabulary • Vocabulary size: • 497 unique words + <s>, </s>, <UNK> in LM • 497 words with pronunciation variations • Trigram models trained using the CMU-Cambridge LM Toolkit • Good-Turing discounting applied • Number of unique trigrams: 37333 • Number of unique bigrams: 5177 • WER (on 22 utterances, 8k Sphinx3 models): 23.529%
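For reference, word error rate is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between the recognizer output and the reference transcript, divided by the number of reference words. The Python sketch below shows that standard definition; it is not the scoring tool used for the figure above, and the function name `word_error_rate` and the example sentences are made up for illustration.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] is the edit distance between ref[:i] and hyp[:j].
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # deletion
                dist[i][j - 1] + 1,                 # insertion
                dist[i - 1][j - 1] + substitution,  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


# Made-up example: one deleted word and one substituted word out of seven.
print(word_error_rate(
    "when is the session on language modeling",
    "when is session on language modelling",
))
```

Over a test set, the errors and reference word counts are usually summed across all utterances before dividing, rather than averaging per-utterance rates.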
Final Dialog Task Specification • 55 agents in total
Telephony: Skyper • SkypeIn used to receive calls • Skype Id: interspeech06 • Phone: 412 567 2683 • Additional Setup • Automatically receive calls • Non-default audio devices • No ringing • Hardware dependency eliminated • Virtual Audio Cable software • Created 2 virtual cables • Virtual Cable 1 IN for Sphinx • Virtual Cable 1 OUT for Skype • Virtual Cable 2 IN for Skype • Virtual Cable 2 OUT for TTS • No sound cards required at all • Better audio quality, since there is no physical coupling between sound cards
Conclusions & Future Work • Improving the LM for better ASR performance • Extended vocabulary with pronunciation corrections & variations • State-specific LMs • Useful when asking only for day, session name, author name, etc. • Incorporation of Interspeech 2006 schedule information when available • Bug fixes in Audio Server and Sphinx3 for better functioning with Skyper-based telephony • Extending dialog functionality to support follow-up questions on the results of a query • Session management in the backend
Task Distribution • Besides participation in group meetings, each member's individual contributions to the project are listed below • Rohit Kumar • Initial project proposal • Backend: crawling from the ICSLP05 site, Querying System, Galaxy Agent Wrapper • Dialog Management: complete first and final iterations of Dialog Task Specification & implementation • Natural Language Generation: authoring of all final templates (in proper "British" English) • Grammar for the parser • Telephony integration • Building the final (non-class-based) LM and vocabulary, and integrating them with the final system after implementing the recommended tuning • Project presentation & other documentation • Arthur Chan* • Project proposal refinement • Initial version management setup • Intermediate iteration of Dialog Task implementation • First and intermediate iterations of language models • Intermediate iteration of some of the NLG templates • Several fixes in Sphinx3 and Audio Server for improved performance • Lingyun Gu • Worked on LM fine-tuning as per recommendations • Intermediate authoring of 1 NLG template • Intermediate authoring of grammar • * dropped course