Developing Spoken Dialogue Systems in the Communicator / RavenClaw Framework
Sphinx Lunch Talk, Carnegie Mellon University, October 2004
Presented by: Dan Bohus
Special appearances: Antoine Raux, Jahanzeb Sherwani, Thomas Harris

Examples
• RoomLine: conference room reservations within SCS; the system can access the schedules of 13 conference rooms in Wean Hall and NSH
• Let’s Go! Bus Information System: bus schedule information for Port Authority buses in Oakland and Squirrel Hill [Let’s Go! Project]
• Sublime: personalized information management system
• TeamTalk: an investigation into human and multi-robot spoken language communication in unstructured environments

More Systems
• LARRI: multimodal system that assists F/A-18 aircraft maintenance personnel throughout the execution of procedural tasks [Symphony]
• Madeleine: text-based prototype for a medical diagnosis system [MITRE workshop]
• Eureka: dialogue interface to the Vivisimo web search engine

The Communicator / RavenClaw Spoken Dialogue Systems Framework
• Examples
• Overall Architecture
• System Development
• Components & Resources
• Miscellaneous
• Current Research
examples : architecture : development : components : miscellaneous : research

Overall Architecture
• Classical pipeline architecture
[Diagram: Recognition (SPHINX), Lang. Understand. (PHOENIX/HELIOS), Dialog Manag. (RAVENCLAW), Back-end (various), Lang. Generation (ROSETTA), Synthesis (THETA)]

Galaxy HUB
• Generic, centralized, message-passing communication architecture
• Developed at MIT, used in the Communicator program
• Competitor: OAA
[Diagram: Recognition (SPHINX), Lang. Understand. (PHOENIX/HELIOS), Dialog Manag. (RAVENCLAW), Back-end (various), Lang. Generation (ROSETTA), and Synthesis (THETA), all connected to the central Galaxy HUB]

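The centralized message-passing idea can be illustrated with a toy hub. This is a minimal sketch, not the Galaxy API: the `Hub` class, the frame layout (a flat dictionary), and the three stand-in components are all assumptions made for illustration. The point it shows is that components never call each other directly; the hub routes every message.

```python
from typing import Callable, Dict, List, Tuple

Frame = Dict[str, str]  # hypothetical message format: a flat key/value frame


class Hub:
    """Toy centralized hub: every component is registered with, and called by, the hub."""

    def __init__(self) -> None:
        self.servers: Dict[str, Callable[[Frame], Frame]] = {}
        self.log: List[Tuple[str, Frame]] = []

    def register(self, name: str, handler: Callable[[Frame], Frame]) -> None:
        self.servers[name] = handler

    def run(self, route: List[str], frame: Frame) -> Frame:
        # Pass the frame through each named server in turn, logging every hop.
        for name in route:
            frame = self.servers[name](frame)
            self.log.append((name, dict(frame)))
        return frame


# Stand-in components (hypothetical; real ones would wrap Sphinx, Phoenix, RavenClaw).
def recognizer(frame: Frame) -> Frame:
    return {**frame, "hyp": frame["audio"].lower()}


def parser(frame: Frame) -> Frame:
    slot = "[NeedRoom]" if "room" in frame["hyp"] else "[Unknown]"
    return {**frame, "parse": slot}


def dialog_manager(frame: Frame) -> Frame:
    act = "request date" if frame["parse"] == "[NeedRoom]" else "request rephrase"
    return {**frame, "act": act}


hub = Hub()
hub.register("recognition", recognizer)
hub.register("understanding", parser)
hub.register("dialog", dialog_manager)

result = hub.run(["recognition", "understanding", "dialog"], {"audio": "I NEED A ROOM"})
print(result["act"])
```

Because routing lives in one place, swapping a component (for example a different recognizer) only changes what is registered under that name.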
Getting Even Closer
[Diagram: the hub architecture in more detail]
• Recognition: Recognition Server with multiple, parallel SPHINX decoders
• Lang. Understand. (PHOENIX/HELIOS): DateTime parsing (PHOENIX), confidence (HELIOS)
• Dialog Manag.: RAVENCLAW
• Back-end (perl): Back-end Galaxy Stub, actual Perl back-end
• Lang. Generation (ROSETTA): Lang. Generation Galaxy Stub, ROSETTA (Perl)
• Synthesis: THETA
• Text I/O: TTYServer
• PROCESSMONITOR
• Inputs from other modalities
• Other domain agents

Building a Spoken Dialogue System
[Diagram: the resources to author for each component]
• Recognition (SPHINX): language, acoustic, and lexical models
• Lang. Understand. (PHOENIX/HELIOS): grammar
• Dialog Manag. (RAVENCLAW): RavenClaw dialog task specification
• Back-end (perl): the back-end itself
• Lang. Generation (ROSETTA): templates
• Synthesis (THETA): (limited-domain) voice

So How Long Will It Take?
• MITRE Workshop on Dialogue Management (Fall 2003)
• Develop a text-based SDS for medical diagnosis (back-end provided)
• Madeleine (22 hours)

Okay, How Long Will It Really Take?
• To get a system running with reasonable performance [poll among 3 RavenClaw developers]
  • 1 month to get a working system up and running
  • 1 month to fine-tune performance
  • Further iterative improvements will continue as more data accumulates

Components & Resources
[Diagram: Recognition (SPHINX) with language and acoustic models; Lang. Understand. (PHOENIX/HELIOS) with grammar; Dialog Manag. (RAVENCLAW) with the RavenClaw dialog task specification; Back-end (perl); Lang. Generation (ROSETTA) with templates; Synthesis (THETA) with a limited-domain voice]

SPHINX II
• Semi-continuous acoustic models
  • Off-the-shelf 8 kHz, 11.025 kHz, 16 kHz models
  • Scripts for building your own
  • PLSA adapted models perform better
• Language models
  • 2-gram & 3-gram models
  • CMU-Cambridge SLM Toolkit
  • Generate from Phoenix grammar
  • Finite state grammar
  • Sphinx supports state-specific LMs
• Dictionary (lexical models)
  • CMU Dictionary

Sphinx II - continued
• Multiple parallel decoders [e.g., male + female]
  • Multiple hypotheses forwarded, selection done later
• Typical WER: 15-30%
  • With pronounced differences between native and non-native speakers
  • Lowered by retuning acoustic and language models to the domain
• Migration to SPHINX 3.x in the near future
  • Expected: big improvement in WER
  • Concern: real-time performance

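For readers unfamiliar with the WER figure quoted above, here is a small sketch of how word error rate is conventionally computed: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name and the example sentences are illustrative assumptions, not taken from any of the systems described.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one deleted word ("a") and one substitution ("ten" -> "two").
wer = word_error_rate(
    "i need a room for tomorrow at ten",
    "i need room for tomorrow at two",
)
print(f"WER = {wer:.0%}")
```

With 2 errors against 8 reference words, this example lands at 25%, inside the typical range quoted on the slide.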
Phoenix Parser / Grammar
• Phoenix: robust parser
• CFG grammar
  • Manually generated, domain-specific grammar rules
  • Reusable, generic sub-grammars
    • [Yes], [No], [Number], [DateTime], [Help], [Repeat], [Suspend], etc.
• Parses all incoming hypotheses and passes all parses along

Grammar excerpt:

    [room_size_spec]
        ([rss_large])
        ([rss_small])
        ([rss_larger])
        ([rss_smaller])
        ([rss_smallest])
        ([rss_largest])
    ;
    [rss_large]
        (large)
        (big)
        (huge)
    ;
    [rss_larger]
        (*the larger)
        (*the bigger)
        (too small)
    ;
    [rss_largest]
        (*the largest)
        (*the biggest)
    ;
    [rss_small]
        (small)
        (little)
    ;

Example parse:

    DO YOU HAVE SOMETHING A BIT LARGER?
    [NeedRoom] ( [_i_want] (DO YOU HAVE SOMETHING) )
    [RoomSizeSpec] ( [room_size_spec] ( [rss_larger] (LARGER)))

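To make the "robust parsing" idea concrete, here is a toy phrase spotter over the [rss_*] patterns from the slide. It is a sketch under stated assumptions, not Phoenix itself: the real parser works with nested nets and frames, whereas this flat version only scans left to right, keeps the longest pattern that matches at each position, and skips words no pattern covers. The leading `*` is read here as "optional word"; the `GRAMMAR` table and function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Patterns copied from the [rss_*] nets on the slide; "*" marks an optional word.
GRAMMAR: Dict[str, List[List[str]]] = {
    "rss_large": [["large"], ["big"], ["huge"]],
    "rss_larger": [["*the", "larger"], ["*the", "bigger"], ["too", "small"]],
    "rss_largest": [["*the", "largest"], ["*the", "biggest"]],
    "rss_small": [["small"], ["little"]],
}


def match_pattern(pattern: List[str], tokens: List[str], start: int) -> int:
    """Return how many tokens the pattern consumes at `start` (0 means no match)."""
    pos = start
    for item in pattern:
        optional = item.startswith("*")
        word = item.lstrip("*")
        if pos < len(tokens) and tokens[pos] == word:
            pos += 1
        elif not optional:
            return 0
    return pos - start


def spot_phrases(utterance: str) -> List[Tuple[str, List[str]]]:
    """Scan left to right, keep the longest match at each position, skip the rest."""
    tokens = re.findall(r"[a-z']+", utterance.lower())
    found: List[Tuple[str, List[str]]] = []
    i = 0
    while i < len(tokens):
        best_len, best_net = 0, ""
        for net, patterns in GRAMMAR.items():
            for pattern in patterns:
                n = match_pattern(pattern, tokens, i)
                if n > best_len:
                    best_len, best_net = n, net
        if best_len:
            found.append((best_net, tokens[i:i + best_len]))
            i += best_len
        else:
            i += 1  # robustness: an uncovered word is skipped, not an error
    return found


print(spot_phrases("Do you have something a bit larger?"))
```

Skipping uncovered words is what lets the utterance on the slide yield a size specification even though "A BIT" appears in no rule.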
Helios / Confidence Annotation
• Builds accurate confidence scores using features from 3 sources of knowledge:
  • Speech recognition
  • Language understanding
  • Dialogue management
• Selects the hypothesis with the maximum confidence score
• Research in progress on hypothesis selection and transferability across domains

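The selection step described above can be sketched as "score every hypothesis, keep the best". The slide does not say which model or features Helios uses, so everything concrete below is an assumption: the logistic form, the three features (one per knowledge source), and the weights are invented purely to make the example runnable.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    text: str
    acoustic_score: float      # speech recognition feature (hypothetical, 0..1)
    parse_coverage: float      # language understanding: fraction of words parsed
    matches_expectation: bool  # dialogue management: fits what the system expects


# Hypothetical weights; a real annotator would learn these from labeled data.
BIAS = -2.0
W_ACOUSTIC = 1.5
W_COVERAGE = 2.0
W_EXPECTATION = 1.0


def confidence(h: Hypothesis) -> float:
    """Combine the three feature sources into a score between 0 and 1."""
    z = (
        BIAS
        + W_ACOUSTIC * h.acoustic_score
        + W_COVERAGE * h.parse_coverage
        + W_EXPECTATION * (1.0 if h.matches_expectation else 0.0)
    )
    return 1.0 / (1.0 + math.exp(-z))


def select(hypotheses: List[Hypothesis]) -> Hypothesis:
    """Pick the hypothesis with the maximum confidence score."""
    return max(hypotheses, key=confidence)


candidates = [
    Hypothesis("i knee the rum", 0.70, 0.25, False),
    Hypothesis("i need a room", 0.60, 1.00, True),
]
best = select(candidates)
print(best.text, round(confidence(best), 2))
```

Note that the second hypothesis wins despite its lower acoustic score, which is the motivation for drawing on understanding and dialogue features rather than recognition scores alone.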
RavenClaw Architecture
Dialog Task (Specification)
• Captures all domain-specific dialog (task) logic using a hierarchical description
• The authoring effort is focused entirely here
Domain-independent Dialog Engine
• Manages the dialog by executing the dialog task specification
• Provides a large number of domain-independent conversational strategies

RavenClaw: Dialogue Task Specification
• Tree of dialog agents
  • Terminals: Inform, Request, Expect, Execute
  • Non-terminals / dialog agencies: plan the execution of child nodes
• Basically a hierarchical task execution network; each agent has:
  • Preconditions & effects
  • Success & failure criteria
  • Trigger (focus) criteria

[Diagram: example task tree, with concepts general_feeling, have_fever, diagnostic]

    Madeleine
        I:Welcome
        E:LoadSymptoms
        GeneralFeel
            R:HowAreYou?
            I:Glad
            I:Sorry
        Diagnose
            Fever
                R:AskFever
                E:MeasureTemp
                I:InformFever
            Travel

Sample DTS Code
[Subtree shown: GeneralFeel with children R:HowAreYou?, I:Glad, I:Sorry; concept: general_feeling]

    // /Madeleine/GeneralFeel
    // Agency: succeeds once either inform agent below has completed.
    DEFINE_AGENCY(CGeneralFeel,
        DEFINE_CONCEPTS(
            STRING_USER_CONCEPT(general_feeling, none))
        DEFINE_SUBAGENTS(
            SUBAGENT(HowAreYou, CHowAreYou)
            SUBAGENT(Glad, CGlad)
            SUBAGENT(Sorry, CSorry))
        SUCCEEDS_WHEN(COMPLETED(Glad) || COMPLETED(Sorry)))

    // /Madeleine/GeneralFeel/HowAreYou
    // Request agent: fills general_feeling from the listed grammar slots.
    DEFINE_REQUEST_AGENT(CHowAreYou,
        REQUEST_CONCEPT(general_feeling)
        GRAMMAR_MAPPING("![Yes]>good, ![FeelingGood]>good, "
                        "![FeelingSoSo]>soso, ![FeelingBad]>bad"))

    // /Madeleine/GeneralFeel/Glad
    // Runs only when general_feeling is "good".
    DEFINE_INFORM_AGENT(CGlad,
        PRECONDITION(C("general_feeling") == CString("good"))
        PROMPT("inform glad_youre_good")
        ON_COMPLETION(FINISH(/Madeleine)))

    // /Madeleine/GeneralFeel/Sorry
    // Runs for any other value of general_feeling.
    DEFINE_INFORM_AGENT(CSorry,
        PRECONDITION(C("general_feeling") != CString("good"))
        PROMPT("inform sorry_youre_bad"))

RavenClaw Execution
[Animation: the Madeleine dialog task tree, the concepts (general_feeling, have_fever, headache, diagnostic, chart), the Dialog Stack, and the Expectation Agenda. Successive frames follow, with the stack listed top first.]
• Frame 1. Dialog Stack: empty
• Frame 2. Dialog Stack: Madeleine
• Frame 3. Dialog Stack: Welcome, Madeleine
• Frame 4. Dialog Stack: Madeleine. System: "Hi, this is Madeleine, the automated…"
• Frame 5. Dialog Stack: LoadSymptoms, Madeleine. From this frame on, the tree also shows R:Headache and three further Request agents, plus the concept headache.
• Frame 6. Dialog Stack: Madeleine
• Frame 7. Dialog Stack: GeneralFeel, Madeleine
• Frame 8 (RavenClaw Execution / Input Pass). Dialog Stack: HowAreYou, GeneralFeel, Madeleine
  • System: "How are you feeling today?"
  • Expectation Agenda (three levels, in the order shown):
    • general_feeling: [good], [bad], [soso]
    • general_feeling: [good], [bad], [soso]
    • general_feeling: [good], [bad], [soso]; have_fever: [fever], ![yes], ![no]; headache: [headache], ![yes], ![no]; cough: [cough], ![yes], ![no]; …
  • User: "Not so good, I think I have a fever"
  • Parse: [soso](not so good) [fever](I think I have a fever)
• Frame 9. Dialog Stack: GeneralFeel, Madeleine
• Frame 10. Dialog Stack: Sorry, GeneralFeel, Madeleine. System: "Oh, I’m sorry to hear that…" followed by "Let me take your temperature…"

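The execution walkthrough above (a dialog stack plus an expectation agenda built during the input pass) can be approximated in a few dozen lines. This is a simplified sketch, not the RavenClaw engine: the `Agent` class, the agent kinds, and the binding rule are assumptions; the tree is trimmed to a few agents; MeasureTemp is modeled as an Inform agent rather than an Execute agent; and the prompts "Glad to hear that." and "Do you have a fever?" are invented.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple

Concepts = Dict[str, str]


@dataclass
class Agent:
    name: str
    kind: str = "agency"  # "agency", "inform", or "request"
    prompt: str = ""
    concept: str = ""     # concept a request agent asks for
    precondition: Callable[[Concepts], bool] = lambda c: True
    children: List["Agent"] = field(default_factory=list)
    done: bool = False


def expected(agent: Agent) -> Set[str]:
    """Concepts requested anywhere in this agent's subtree."""
    own = {agent.concept} if agent.kind == "request" else set()
    return own.union(*(expected(child) for child in agent.children))


def next_child(agency: Agent, concepts: Concepts) -> Optional[Agent]:
    for child in agency.children:
        if child.done or not child.precondition(concepts):
            continue
        if child.kind == "request" and child.concept in concepts:
            continue  # already answered in an earlier turn
        return child
    return None


def run(root: Agent, turns: List[Tuple[str, Concepts]]) -> Tuple[Concepts, List[str]]:
    concepts: Concepts = {}
    transcript: List[str] = []
    pending = iter(turns)
    stack = [root]
    while stack:
        agent = stack[-1]
        if agent.kind == "agency":
            child = next_child(agent, concepts)
            if child is None:
                agent.done = True
                stack.pop()
            else:
                stack.append(child)
            continue
        transcript.append("S: " + agent.prompt)
        if agent.kind == "request":
            text, parse = next(pending)
            transcript.append("U: " + text)
            # Input pass: one agenda level per stack entry, top of stack first.
            agenda = [expected(a) for a in reversed(stack)]
            for concept, value in parse.items():
                if any(concept in level for level in agenda):
                    concepts[concept] = value
        agent.done = True  # a real engine would re-ask on a non-understanding
        stack.pop()
    return concepts, transcript


root = Agent("Madeleine", children=[
    Agent("Welcome", "inform", prompt="Hi, this is Madeleine..."),
    Agent("GeneralFeel", children=[
        Agent("HowAreYou", "request", prompt="How are you feeling today?",
              concept="general_feeling"),
        Agent("Glad", "inform", prompt="Glad to hear that.",
              precondition=lambda c: c.get("general_feeling") == "good"),
        Agent("Sorry", "inform", prompt="Oh, I'm sorry to hear that...",
              precondition=lambda c: c.get("general_feeling") != "good"),
    ]),
    Agent("Fever", children=[
        Agent("AskFever", "request", prompt="Do you have a fever?", concept="have_fever"),
        Agent("MeasureTemp", "inform", prompt="Let me take your temperature..."),
    ]),
])

concepts, transcript = run(root, [
    ("Not so good, I think I have a fever",
     {"general_feeling": "soso", "have_fever": "yes"}),
])
print("\n".join(transcript))
```

Because the root level of the agenda also expects have_fever, the user's over-answer is bound immediately, and the sketch never asks the fever question; that mirrors the jump from "sorry to hear that" to "let me take your temperature" in the walkthrough.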
RavenClaw – Other features
• The Dialogue Engine transparently provides a set of conversational skills
  • Universal dialogue mechanisms:
    • Repeat, Suspend / Resume, Quit
  • Help:
    • Help!, Where are we?, What can I say?
  • Error handling:
    • Explicit and implicit confirmations
    • Strategies for recovering from non-understandings
• Dynamic dialogue task generation
• Dynamic dialogue control policy

Backend & Domain Agents
• Various problem-specific solutions
  • RoomLine: connects to a static Perl database or to the CMU CorporateTime server
  • Let’s Go! Bus Information System: connects to a PostgreSQL database
  • Sublime: connects to a MySQL database; also functions as a web server; DTW search domain agent
• Basically, build your own; we provide a stub for interfacing with the Galaxy Hub

Rosetta Language Generation
• Template- and stochastic-based language generation
• Input: (act, object, {slot=value})
• Output: text (tagged with concepts)

Template excerpt (Perl):

    # welcome to the system
    "welcome" => "Welcome to RoomLine, the automated conference room ".
                 "reservation system.",

    # greet user
    "greet_user" => ("Hi, <user_name>.",
                     "Hi, <user_name>, good to hear from you again."),

    # inform the user that the system has misunderstood the times (order)
    "wrong_time_order" => sub {
        my %args = @_;
        my $time_interval_as_string =
            get_wrong_time_interval_as_string(\%args, "room_query.date_time.time");
        my $answer = "I'm sorry, I must have misunderstood the ".
                     "time you needed the room. ";
        $answer .= "I heard $time_interval_as_string. ";
        return ["$answer So, let's see ... ",
                "$answer So, let's try this again ... ",
                "$answer So, let's try this once more ... "];
    },

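The three template styles in the excerpt (a fixed string, a list of variants, and a function that builds variants from the slots) can be mirrored in a short sketch. This is an illustration of template-based generation in general, not Rosetta's code: the `TEMPLATES` layout, the `generate` function, the choice of "inform" as the act, and the `time_interval` slot name are all assumptions.

```python
import random
import re
from typing import Callable, Dict, List, Union

Slots = Dict[str, str]
Template = Union[str, List[str], Callable[[Slots], List[str]]]


def wrong_time_order(slots: Slots) -> List[str]:
    """Build several variants that share the same opening, as in the Perl sub."""
    answer = (
        "I'm sorry, I must have misunderstood the time you needed the room. "
        f"I heard {slots['time_interval']}. "
    )
    return [answer + "So, let's see ...", answer + "So, let's try this again ..."]


# Hypothetical layout: templates indexed by act, then by object.
TEMPLATES: Dict[str, Dict[str, Template]] = {
    "inform": {
        "welcome": "Welcome to RoomLine, the automated conference room "
                   "reservation system.",
        "greet_user": ["Hi, <user_name>.",
                       "Hi, <user_name>, good to hear from you again."],
        "wrong_time_order": wrong_time_order,
    }
}


def generate(act: str, obj: str, slots: Slots, rng: random.Random) -> str:
    template = TEMPLATES[act][obj]
    if callable(template):
        template = template(slots)       # function templates return a list of variants
    if isinstance(template, list):
        template = rng.choice(template)  # pick one variant at random
    # Fill <slot_name> placeholders from the slot dictionary.
    return re.sub(r"<(\w+)>", lambda m: slots[m.group(1)], template)


rng = random.Random(0)
greeting = generate("inform", "greet_user", {"user_name": "Dan"}, rng)
apology = generate("inform", "wrong_time_order", {"time_interval": "from 5 pm to 3 pm"}, rng)
print(greeting)
print(apology)
```

Choosing randomly among variants is what keeps repeated prompts, such as successive error messages, from sounding identical.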
Synthesis
• Cepstral Theta synthesis
  • Open-domain unit-selection synthesis
  • SSML tags
  • [Currently working on barge-in location]
• Festival synthesis
  • Diphone synthesis; open-domain and limited-domain unit-selection synthesis
  • SABLE tags
  • Server running separately on a Linux box

Miscellaneous – Documentation
• Transmitted largely by oral tradition :)
• A bit of documentation is available
  • Research papers, slides
  • Wiki: http://hap.speech.cs.cmu.edu/commwiki
    • Mostly for developers: postings of updates and recent developments
    • Hopefully more introductory materials soon
• More in the works
  • Tutorials: 2 available, but a bit outdated

Miscellaneous – Portability
• Current systems work on PC Windows platforms
  • Galaxy has a Linux version
  • Components are written in C, C++ (Visual Studio 6.0, Visual Studio .NET), and Perl
• How about using different input / output components?
  • Modify the RavenClaw DMInterface class
  • Has been done for the Gemini parser / language generator

Miscellaneous – Research Platform
• The Communicator / RavenClaw framework is a research platform!
  • Constantly evolving
  • Modular
  • Easy to change, develop, and test new technologies
• Research on a variety of topics in a real-world, full-blown system:
  • Recognition, language understanding, dialogue management, language generation, synthesis
• Your work can be evaluated / reused easily across multiple existing systems

Miscellaneous - Download
• www.cs.cmu.edu/~dbohus/RavenClaw
• Download a version of RoomLine
• An installation script can seed your own project from this RoomLine version