CSETalk: A Spoken Dialogue System for CSE Course Information

Preethi Jyothi, Rohit Prabhavalkar, Thomas Lynch, Deepak Bal, Prateeti Mohapatra

Outline • Introduction • Automatic Speech Recognition • Language Understanding • Galaxy Overview • Backend • Ravenclaw Architecture • Speech Synthesis • Evaluation Metrics • Challenges Faced • Conclusions and Future Work

Introduction • Dialog systems seek to provide a natural conversational interaction between the user and the computer system • Spoken input from the human user -> meaning of the utterance -> results of the operation to the user • Two-way flow of information • User-to-system • System-to-user

Introduction • Types of Dialogue Systems • System-initiative • User-initiative • Mixed-initiative

Motivation • A spoken dialogue system (SDS) integrates all the components of a speech and language processing system • Building an SDS is challenging because of the interactions among the components involved • Goal: To implement an end-to-end spoken dialog system for course information using the RavenClaw dialog management framework

Other Dialogue Systems • Saplen: Food ordering system [López-Cózar et al. Eurospeech 1997] • Let’s Go! Bus Information System [Raux et al. Eurospeech 2000] • ITSpoke: Intelligent Tutoring SDS [Litman and Silliman HLT-NAACL 2004]

Overall Architecture • Recognition: SPHINX • Language Understanding: PHOENIX/HELIOS • Dialog Management: RAVENCLAW • Back-end: (various) • Language Generation: ROSETTA • Synthesis: THETA

CSETalk: Course Information System • Information-based • Mixed-initiative dialogue system • The user can seek information about course numbers, credits, instructors, call numbers, etc. • Typical queries • What course is Prof. X offering? • What is the call number for CSE-333?

ASR – Sphinx II • Uses the Sphinx II decoding engine • Uses semi-continuous hidden Markov models • Used an off-the-shelf acoustic model (male and female voices) • Pronunciation model and language model were built using the Sphinx Knowledge Base Tool • Used an n-gram language model, built from a corpus of sentences generated randomly from the system grammar
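The last bullet can be illustrated with a small sketch: sample sentences at random from a grammar, then estimate bigram probabilities from the samples. The toy grammar, the function names, and the unsmoothed maximum-likelihood estimate are all assumptions made for illustration; this is not the CSETalk grammar or the Sphinx Knowledge Base Tool.

```python
import random
from collections import Counter

# Hypothetical toy grammar: non-terminals are upper-case keys, everything else is a word.
GRAMMAR = {
    "QUERY": [["what", "is", "the", "call", "number", "for", "COURSE"],
              ["who", "is", "teaching", "COURSE"]],
    "COURSE": [["cse", "NUMBER"], ["NUMBER"]],
    "NUMBER": [["seven", "thirty"], ["three", "thirty", "three"]],
}

def generate(symbol, rng):
    """Expand a symbol into a list of words by picking random alternatives."""
    if symbol not in GRAMMAR:
        return [symbol]
    words = []
    for sym in rng.choice(GRAMMAR[symbol]):
        words.extend(generate(sym, rng))
    return words

def bigram_model(sentences):
    """Unsmoothed maximum-likelihood bigram probabilities P(w2 | w1)."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + words + ["</s>"]
        unigrams.update(padded[:-1])            # histories only
        bigrams.update(zip(padded, padded[1:]))
    return {pair: count / unigrams[pair[0]] for pair, count in bigrams.items()}

rng = random.Random(0)
corpus = [generate("QUERY", rng) for _ in range(200)]
model = bigram_model(corpus)
```

Here `model[("call", "number")]` is 1.0 because "number" always follows "call" in this grammar. A real language model would also apply smoothing, which this sketch leaves out.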

Helios – Confidence Annotation • Generates a confidence score for the utterance based on information from ASR, Parser and Dialog manager • ASR • Number of words, 'unconfident' words • Parser • Uncovered words, transitions between parsed fragments, unparsed fragments, etc. • Dialog manager • State of dialog, concepts expected at current state, number of turns at current state, etc.
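One plausible way to turn such features into a single score is a logistic model. The sketch below assumes that form; the feature names and weights are invented for illustration and do not describe how Helios is actually configured.

```python
import math

# Hypothetical weights: negative values lower confidence, positive values raise it.
WEIGHTS = {
    "num_words": -0.05,          # ASR: longer utterances are harder
    "unconfident_words": -0.8,   # ASR: words the recognizer was unsure about
    "uncovered_words": -0.6,     # Parser: words not covered by any parse
    "parse_fragments": -0.4,     # Parser: number of separate parsed fragments
    "concept_expected": 1.0,     # Dialog manager: input matches an expected concept
}
BIAS = 2.0  # hypothetical intercept

def confidence(features):
    """Map a feature dictionary to a score in (0, 1) with a logistic function."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

clean = confidence({"num_words": 4, "unconfident_words": 0, "uncovered_words": 0,
                    "parse_fragments": 1, "concept_expected": 1})
noisy = confidence({"num_words": 9, "unconfident_words": 3, "uncovered_words": 4,
                    "parse_fragments": 3, "concept_expected": 0})
```

With these made-up weights the clean utterance scores higher than the noisy one; in practice the weights would be learned from labeled data.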

Helios – Confidence Annotation • Allows detection of, and recovery from, misunderstandings • Unfortunately, due to lack of time, we were not able to explore it fully

Phoenix - Parser • The parser maps the word sequence onto a set of semantic frames • A frame is a set of named 'slots' • Frame: [CSETalk] • Nets: [QueryCallNum] [QueryInstructor] ... • Nets are compiled into recursive transition networks (RTNs)

Phoenix - Parser • [Figure: RTN for [QueryCallNum]]
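To make the idea of a net compiled into a recursive transition network more concrete, here is a minimal sketch of a recursive matcher. The grammar rules, the `[CourseId]` and `[Number]` sub-nets, and the function names are hypothetical; this is not Phoenix grammar syntax or its matching algorithm.

```python
# Hypothetical nets: each net has alternative word sequences; bracketed names call sub-nets.
NETS = {
    "[QueryCallNum]": [["what", "is", "the", "call", "number", "for", "[CourseId]"],
                       ["call", "number", "for", "[CourseId]"]],
    "[QueryInstructor]": [["who", "is", "teaching", "[CourseId]"]],
    "[CourseId]": [["cse", "[Number]"], ["[Number]"]],
    "[Number]": [["seven", "thirty"], ["three", "thirty", "three"]],
}
TOP_LEVEL = ["[QueryCallNum]", "[QueryInstructor]"]

def match(symbol, tokens, pos):
    """Return every position where `symbol` can end when it starts at `pos`."""
    if symbol not in NETS:  # terminal word
        return [pos + 1] if pos < len(tokens) and tokens[pos] == symbol else []
    ends = set()
    for alternative in NETS[symbol]:
        positions = [pos]
        for sym in alternative:
            positions = [e for p in positions for e in match(sym, tokens, p)]
            if not positions:
                break
        ends.update(positions)
    return sorted(ends)

def parse(utterance):
    """Return the first top-level net that covers the whole utterance, or None."""
    tokens = utterance.lower().split()
    for net in TOP_LEVEL:
        if len(tokens) in match(net, tokens, 0):
            return net
    return None
```

For example, `parse("What is the call number for CSE seven thirty")` returns `"[QueryCallNum]"`. Unlike this sketch, a robust parser would also skip uncovered words rather than require full coverage.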


Galaxy Communicator http://communicator.sourceforge.net/sites/MITRE/distributions/GalaxyCommunicator/docs/manual/index.html

What is Galaxy Communicator? • Per the Galaxy website: • Distributed • Message-based • Hub-and-spoke infrastructure • Optimized for constructing spoken dialogue systems • Origins: • Based on the MIT Galaxy system • Later maintained by the MITRE Corporation for the DARPA Communicator program (now concluded) • Available at SourceForge http://communicator.sourceforge.net/sites/MITRE/distributions/GalaxyCommunicator/docs/manual/index.html

Galaxy Overview • Modular, with Galaxy as the communications controller • Everything is sent using sockets and frames • Servers/modules can be located anywhere on the network or internet • Galaxy transforms the information from text strings to sockets and back again • Easy to interface with C++ or any other language with a sockets library • Handles storage and forwarding for each independent module • Monitors the status of the other modules • Logs information pertaining to the other servers specified (very flexible) • No need to negotiate communications parameters with the other modules • Simple to add a different package to handle some aspect of the system • Write an interface from the new system to Galaxy • For MySQL, we changed the original backend server call to load and call MySQL • The same strategy would work for other modules • Downside: Galaxy is a single point of failure. If Galaxy goes down, everything is down and no one is notified.
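The hub-and-spoke idea can be sketched in a few lines: modules register with a central hub, and every frame passes through it, where it can be logged and forwarded. This is an in-process illustration only; the `Hub` class and its methods are hypothetical, it uses no sockets, and it is not the Galaxy Communicator API.

```python
class Hub:
    """Toy stand-in for a hub: routes frames to named servers and logs them."""

    def __init__(self):
        self.servers = {}   # server name -> handler function
        self.log = []       # (target, frame) pairs, in the order they were sent

    def register(self, name, handler):
        self.servers[name] = handler

    def send(self, target, frame):
        self.log.append((target, dict(frame)))
        if target not in self.servers:
            raise KeyError(f"no server registered as {target!r}")
        return self.servers[target](frame)

def backend(frame):
    # Hypothetical backend: echoes the course id with a fixed result count.
    return {"query": "refine_results", "results": 1, "courseId": frame["courseId"]}

hub = Hub()
hub.register("backend", backend)
reply = hub.send("backend", {"query": "course_query", "courseId": "730"})
```

Swapping in a different backend only requires registering a new handler under the same name, which mirrors the "write an interface to Galaxy" bullet. It also shows the downside noted above: every message depends on the one hub object.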

The Details • What it's not: • Not an end-to-end dialogue system • No run-time semantic standards • No configuration-time semantic standards • But compatible with many standards, such as the W3C Voice Browser group's proposed specifications • Knowledge requirements: • You need to know C programming • Some background in distributed processing • RPC, CORBA, Java RMI • Reasonable command of English, to read the documentation • Platforms: Sparc Solaris, Intel Linux and Win32 • Compiler: GNU gcc and make

The Exchange • Query frame sent to the backend:
{ query course_query courseId seven thirty three credits callNum instructor fosler-lussier )
• Resulting SQL query:
SELECT * FROM courses WHERE courseId = "730" AND instructor LIKE "fosler-lussier"
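The step from a query frame to a SQL statement might look like the sketch below. It uses Python's built-in `sqlite3` module and an in-memory table instead of MySQL, assumes the frame has already been converted to a dictionary with the course number normalized to "730", and wraps the instructor name in `%` wildcards. The `build_query` helper and the reduced table schema are hypothetical.

```python
import sqlite3

def build_query(frame):
    """Build a parameterized SELECT from the filled slots of a query frame."""
    clauses, params = [], []
    if frame.get("courseId"):
        clauses.append("courseId = ?")
        params.append(frame["courseId"])
    if frame.get("instructor"):
        clauses.append("instructor LIKE ?")
        params.append(f"%{frame['instructor']}%")  # assumption: substring match
    sql = "SELECT * FROM courses"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params

# In-memory stand-in for the course database, with a reduced set of columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE courses (courseId TEXT, credits TEXT, callNum TEXT, instructor TEXT)")
conn.execute("INSERT INTO courses VALUES (?, ?, ?, ?)",
             ("730", "3", "04581-7", "Eric Fosler-Lussier"))

frame = {"query": "course_query", "courseId": "730", "instructor": "fosler-lussier"}
sql, params = build_query(frame)
rows = conn.execute(sql, params).fetchall()
```

Passing values as parameters rather than pasting them into the SQL string avoids quoting problems when recognized text contains unexpected characters.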

The Exchange • Result frame returned by the backend:
{ query refine_results results : 1 { { courseId "730" credits "3" callNum "04581-7" callNumM "04581-7" dow "T R" starttime "1130" endtime "1248" room "DL 0305" instructor "Eric Fosler-Lussier" title "Survey of Artificial Intelligence II: Advanced Topics U G 3" description "A survey of advanced concepts, techniques, and applications of artificial intelligence, including knowledge-based systems, learning, natural language understanding, and vision." ) } }
• Galaxy server code around the backend call:
aStr = Gal_GetObject(f, ":inframe");
inframe = Gal_StringValue(aStr);
// Our stuff: next line
GetCourseInfoC( inframe, &outframe );
aStr = Gal_StringObject(outframe);
Gal_SetProp(f, ":outframe", aStr);

Backend Server • MySQL is the database • Possibly a different choice will be made next time, maybe something simple like SQLite • The original RoomLine system calls Perl scripts for data retrieval • Many of the components also use Perl scripts for various functions

Galaxy Summary • Makes Life Easy • It Glues Everything Together (Seamlessly?) • Do Not Start From Scratch! You’ll be sorry. • Modify an Existing System. • Great Piece of Software to Allow Experimenting with various components. • However, too many single points of failure!

RavenClaw Architecture • Dialog Task Specification (domain-specific) • Specifies the domain-specific dialog control logic • Most of the effort goes into creating this specification • Dialog Engine (domain-independent) • Runs the Dialog Task Specification to control the dialog at runtime • Provides a large number of domain-independent conversational strategies

Dialog Task Specification • Hierarchical plan for the dialog • Tree of dialog agents • Non-terminals - Dialog Agencies • Terminals - Fundamental Dialog Agents • Inform • Request • Expect • Execute

Dialog Agents • Concepts – associated with agents • Concepts have pre-defined types • Set of value/confidence pairs • Structure of an agent • Execute routine – dependent on the agent type • Preconditions • Success/Failure criteria • Trigger conditions/ Trigger commands

RavenClaw Dialog Engine • Functions using two data structures • Dialog Stack • Expectation Agenda • Functions in two phases • Execution Phase – Dialog agents executed from Dialog Stack • Input Phase – Uses Expectation Agenda to map user inputs to concept values
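The two phases can be sketched with a small stack-based loop. The classes below are a simplified illustration written for this summary, not RavenClaw code: agents have only three kinds, the expectation agenda is a flat dictionary from grammar slots to concept names, and confidence values are ignored.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    kind: str                                      # "agency", "inform", or "request"
    prompt: str = ""
    expects: dict = field(default_factory=dict)    # grammar slot -> concept name
    children: list = field(default_factory=list)
    done: bool = False

class Engine:
    def __init__(self, root):
        self.stack = [root]    # dialog stack; the last element is the top
        self.concepts = {}     # concept name -> value
        self.output = []       # system prompts produced so far

    def execution_phase(self):
        """Execute agents from the top of the stack until one needs user input."""
        while self.stack:
            top = self.stack[-1]
            if top.kind == "agency":
                pending = [c for c in top.children if not c.done]
                if pending:
                    self.stack.append(pending[0])
                else:
                    top.done = True
                    self.stack.pop()
            elif top.kind == "inform":
                self.output.append(top.prompt)
                top.done = True
                self.stack.pop()
            else:  # a request agent prompts, then waits for the input phase
                self.output.append(top.prompt)
                return

    def expectation_agenda(self):
        """Collect expectations from the top of the stack downwards."""
        agenda = {}
        for agent in reversed(self.stack):
            for slot, concept in agent.expects.items():
                agenda.setdefault(slot, concept)
        return agenda

    def input_phase(self, parsed_slots):
        """Bind parsed slots to concepts through the expectation agenda."""
        agenda = self.expectation_agenda()
        for slot, value in parsed_slots.items():
            if slot in agenda:
                self.concepts[agenda[slot]] = value
        if self.stack:
            top = self.stack[-1]
            if any(c in self.concepts for c in top.expects.values()):
                top.done = True
                self.stack.pop()

welcome = Agent("Welcome", "inform", prompt="Welcome to CSE Talk ...")
how = Agent("HowMayIHelpYou", "request", prompt="How can I help you?",
            expects={"[courseNumber]": "Course_Id", "[professor]": "Course_Inst"})
task = Agent("Task", "agency", children=[how])
root = Agent("CSETalk", "agency", children=[welcome, task])

engine = Engine(root)
engine.execution_phase()
stack_before_input = [a.name for a in engine.stack]
engine.input_phase({"[courseNumber]": "seven thirty"})
```

After the execution phase the stack holds CSETalk, Task, and HowMayIHelpYou (top last), and the input phase stores "seven thirty" under the `Course_Id` concept.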

RavenClaw Dialog Engine: Example • [Figure sequence: Dialog Stack, Expectation Agenda, and Dialog Task Specification tree at each step] • Dialog Task Specification: captures domain-specific dialog control logic • Dialog Engine: domain-independent reusable component • Execution Phase • Dialog Stack (top first): CSETalk • Dialog Stack: Welcome, CSETalk • S: Welcome to CSE Talk ... • Dialog Stack: CSETalk • Dialog Stack: Task, CSETalk • Dialog Stack: How May I Help You, Task, CSETalk • S: How can I help you? • Expectation Agenda: Course_Id: [courseNumber], Course_Inst: [professor], … • Input Phase • U: Who’s teaching seven thirty? • Parse: [courseNumber](seven thirty)

Language Generation - Rosetta • Template-based • Attribute-value pairs -> Text for synthesis • Input from Dialog Manager, output to TTS • Input frame: { act inform object welcome } • Output: Welcome to CSETalk Automated Course Information….
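Template-based generation can be sketched as a lookup keyed on the frame's act and object, followed by filling in attribute values. The template texts, the `instructor` object, and the `generate` function are hypothetical and are not Rosetta's template format.

```python
# Hypothetical templates keyed by (act, object); {placeholders} are filled from the frame.
TEMPLATES = {
    ("inform", "welcome"): "Welcome to CSETalk.",
    ("inform", "instructor"): "{instructor} is teaching CSE {courseId}.",
    ("request", "how_may_i_help_you"): "How can I help you?",
}

def generate(frame):
    """Turn an attribute-value frame into a sentence for the synthesizer."""
    template = TEMPLATES.get((frame.get("act"), frame.get("object")))
    if template is None:
        raise KeyError(f"no template for {frame.get('act')!r} / {frame.get('object')!r}")
    return template.format(**frame)

greeting = generate({"act": "inform", "object": "welcome"})
answer = generate({"act": "inform", "object": "instructor",
                   "instructor": "Eric Fosler-Lussier", "courseId": "730"})
```

Keeping the wording in a table separate from the dialog logic means prompts can be edited without touching the dialog manager.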

Speech Synthesis • Within Galaxy, Kalliope manages the speech synthesis • It is the link between Festival and the Galaxy Hub • Submits synthesis requests to Festival, holds a synthesis queue, etc. • Designed to work with multiple synthesis systems (Festival, Swift, Theta)

Speech Synthesis • Actual speech is synthesized by Festival • Open-source speech synthesis software • Developed at the University of Edinburgh and worked on extensively at CMU • Currently includes many types of voices • Diphone (what we use) • HMM-based (referred to as HTS) • Unit selection

Speech Synthesis • CMU’s Festvox project aims to make the building of new voices more systematic and better documented • Extensive instructions and tools on the web to help with the creation of new voices • Limited-domain voices could be useful for our project • These use unit selection methods

Performance Evaluation • Many facets to measuring performance • Efficiency • Processing time • How much time is wasted on corrections, etc. • Quality • How many times did the system misinterpret the user? • How many times did the user have to correct the system? • User Satisfaction • Is the user happy with the interaction? • Have the user fill out a survey • Task Success • Did the user leave with the information they came for?

Performance Evaluation • One framework that tries to incorporate all of these facets is PARADISE (PARAdigm for DIalogue System Evaluation) • PARADISE proposes to compute a performance measure as a function of both task success and dialogue costs • If design changes were made, PARADISE could be used to evaluate the effectiveness of the changes.
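In PARADISE, performance is commonly written as a weighted task-success term minus a weighted sum of dialogue costs, with each quantity z-score normalized: performance = alpha * N(kappa) - sum_i w_i * N(c_i). The sketch below computes that quantity per dialogue. The sample numbers and weights are invented for illustration; in the framework itself the weights come from a regression against user satisfaction ratings.

```python
from statistics import mean, pstdev

def z_normalize(values):
    """Z-score normalization across dialogues; a constant column maps to zeros."""
    mu, sigma = mean(values), pstdev(values)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]

def paradise_performance(kappas, costs, alpha, weights):
    """Per-dialogue performance: alpha * N(kappa) - sum_i w_i * N(c_i)."""
    n_kappa = z_normalize(kappas)
    n_costs = {name: z_normalize(values) for name, values in costs.items()}
    return [
        alpha * n_kappa[d] - sum(weights[name] * n_costs[name][d] for name in costs)
        for d in range(len(kappas))
    ]

# Invented example: two dialogues, one cost measure (number of turns).
scores = paradise_performance(
    kappas=[1.0, 0.5],
    costs={"turns": [4, 8]},
    alpha=1.0,
    weights={"turns": 1.0},
)
```

In this example the first dialogue has higher task success and fewer turns, so it receives the higher score.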

Challenges Faced • NO DOCUMENTATION!! • Hard to localize faults since ASR/DM/NLG are tightly coupled • Festival runs best on UNIX whereas the rest of the system runs on Windows • Limited time

Conclusions and Future Work • Using more classes in the language model • Using other corpora to obtain the acoustic model and the language model information • Creating or using a better TTS voice • Actually evaluating performance as discussed
