
Developing Spoken Dialogue Systems in the Communicator / RavenClaw Framework

Sphinx Lunch Talk, Carnegie Mellon University, October 2004. Presented by: Dan Bohus. Special appearances: Antoine Raux, Jahanzeb Sherwani, Thomas Harris.


Presentation Transcript


  1. Developing Spoken Dialogue Systems in the Communicator / RavenClaw Framework
     Sphinx Lunch Talk, Carnegie Mellon University, October 2004
     Presented by: Dan Bohus
     Special appearances: Antoine Raux, Jahanzeb Sherwani, Thomas Harris

  2. Examples
     • RoomLine: conference room reservations within SCS; the system can access the schedules of 13 conference rooms in Wean Hall and NSH
     • Let’s Go! Bus Information System: bus schedule information system for Port Authority buses in Oakland and Squirrel Hill [Let’s Go! Project]
     • Sublime: personalized information management system
     • TeamTalk: an investigation into human and multi-robot spoken language communication in unstructured environments

  6. More Systems
     • LARRI: multimodal system that assists F/A-18 aircraft maintenance personnel throughout the execution of procedural tasks [Symphony]
     • Madeleine: text-based prototype for a medical diagnosis system [MITRE workshop]
     • Eureka: dialogue interface to the Vivisimo web search engine

  7. The Communicator / RavenClaw Spoken Dialogue Systems Framework
     • Examples
     • Overall Architecture
     • System Development
     • Components & Resources
     • Miscellaneous
     • Current Research
     [Navigation bar shown on most slides: examples : architecture : development : components : miscellaneous : research]

  8. Overall Architecture
     • Classical pipeline architecture
     [Diagram: Recognition (SPHINX), Language Understanding (PHOENIX/HELIOS), Dialog Management (RAVENCLAW), Back-end (various), Language Generation (ROSETTA), Synthesis (THETA)]

  9. Galaxy HUB
     • Generic centralized, message-passing communication architecture
     • Developed at MIT, used in the Communicator program
     • Competitor: OAA
     [Diagram: Galaxy HUB connected to Recognition (SPHINX), Language Understanding (PHOENIX/HELIOS), Dialog Management (RAVENCLAW), Back-end (various), Language Generation (ROSETTA), and Synthesis (THETA)]
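The centralized message-passing idea on this slide can be sketched in a few lines of Python. This is an illustrative toy and not the Galaxy API: the `Hub` class, its `register` and `send` methods, the message format, and the two stand-in components are all assumptions made for this example.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, str]
Handler = Callable[[Message], Message]


class Hub:
    """Toy central hub: components never call each other directly."""

    def __init__(self) -> None:
        self._servers: Dict[str, Handler] = {}
        self.log: List[Tuple[str, Message]] = []

    def register(self, name: str, handler: Handler) -> None:
        self._servers[name] = handler

    def send(self, target: str, message: Message) -> Message:
        # Every message passes through the hub, so it can be logged here.
        self.log.append((target, dict(message)))
        return self._servers[target](message)


def recognizer(msg: Message) -> Message:
    # Hypothetical stand-in for a recognizer: the "audio" is already text.
    return {"hypothesis": msg["audio"].lower()}


def parser(msg: Message) -> Message:
    # Hypothetical stand-in for a parser: tags utterances that mention a room.
    concept = "[NeedRoom]" if "room" in msg["hypothesis"] else "[Unknown]"
    return {"parse": concept}


hub = Hub()
hub.register("recognition", recognizer)
hub.register("understanding", parser)

hyp = hub.send("recognition", {"audio": "I NEED A ROOM"})
result = hub.send("understanding", hyp)
print(result)
```

Because all traffic goes through one place, swapping a component only requires registering a different handler under the same name.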

  10. Getting Even Closer
      [Diagram: HUB connected to Recognition (SPHINX), Language Understanding (PHOENIX/HELIOS), Dialog Management (RAVENCLAW), Back-end (Perl), Language Generation (ROSETTA), and Synthesis (THETA)]

  11. Getting Even Closer
      [Diagram: the same hub architecture in more detail. Labels shown: Recognition Server with multiple, parallel SPHINX decoders; inputs from other modalities; Text I/O (TTYServer); Language Understanding (PHOENIX/HELIOS) with DateTime, Parsing PHOENIX, Confidence HELIOS; Dialog Management (RAVENCLAW); Back-end Galaxy stub and actual Perl back-end; other domain agents; Language Generation Galaxy stub and ROSETTA (Perl); Synthesis (THETA); PROCESSMONITOR]

  13. Building a Spoken Dialogue System
      [Diagram: the resources to author for each component]
      • Recognition (SPHINX): language, acoustic, and lexical models
      • Language Understanding (PHOENIX/HELIOS): grammar
      • Dialog Management (RAVENCLAW): RavenClaw dialog task specification
      • Back-end (Perl)
      • Language Generation (ROSETTA): templates
      • Synthesis (THETA): (limited-domain) voice

  14. So How Long Will It Take?
      • MITRE Workshop on Dialogue Management (Fall 2003)
        • Develop a text-based SDS for medical diagnosis (back-end provided)
        • Madeleine (22 hours)
      [Diagram: same components and resources as on the previous slide]

  15. Okay, How Long Will It Really Take?
      • To get a system running with reasonable performance [poll among 3 RavenClaw developers]:
        • 1 month to get a working system up and running
        • 1 month to fine-tune performance
      • Further iterative improvements continue as more data accumulates

  17. Components & Resources
      [Diagram: Recognition (SPHINX) with language and acoustic models; Language Understanding (PHOENIX/HELIOS) with grammar; Dialog Management (RAVENCLAW) with RavenClaw dialog task specification; Back-end (Perl); Language Generation (ROSETTA) with templates; Synthesis (THETA) with limited-domain voice]

  19. SPHINX II
      • Semi-continuous acoustic models
        • Off-the-shelf 8 kHz, 11.025 kHz, and 16 kHz models
        • Scripts for building your own
        • PLSA-adapted models perform better
      • Language models
        • 2-gram and 3-gram models
        • CMU-Cambridge SLM Toolkit
        • Generate from Phoenix grammar
        • Finite state grammar
        • Sphinx supports state-specific LMs
      • Dictionary (lexical models)
        • CMU Dictionary

  20. SPHINX II (continued)
      • Multiple parallel decoders [e.g., male + female]
        • Multiple hypotheses forwarded, selection done later
      • Typical WER: 15-30%
        • With pronounced differences between native and non-native speakers
        • Lowered by retuning acoustic and language models to the domain
      • Migration to SPHINX 3.x in the near future
        • Expected: big improvement in WER
        • Concern: real-time performance
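For readers unfamiliar with the WER figures quoted above: word error rate is conventionally computed as (substitutions + deletions + insertions) divided by the number of reference words, using a word-level edit distance. The sketch below shows that standard calculation; the function name and the example sentences are made up for illustration and are not taken from the talk's evaluation tools.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution (larger -> large) and one deletion (please) over 4 words.
wer = word_error_rate("the larger room please", "the large room")
print(f"WER = {wer:.0%}")
```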

  22. Phoenix Parser / Grammar
      • Phoenix: robust parser
      • CFG grammar
        • Manually generated, domain-specific grammar rules
        • Reusable, generic sub-grammars: [Yes], [No], [Number], [DateTime], [Help], [Repeat], [Suspend], etc.
      • Parses all incoming hypotheses and passes all parses along
      Grammar excerpt:
          [room_size_spec]
              ([rss_large])
              ([rss_small])
              ([rss_larger])
              ([rss_smaller])
              ([rss_smallest])
              ([rss_largest])
          ;
          [rss_large]
              (large)
              (big)
              (huge)
          ;
          [rss_larger]
              (*the larger)
              (*the bigger)
              (too small)
          ;
          [rss_largest]
              (*the largest)
              (*the biggest)
          ;
          [rss_small]
              (small)
              (little)
          ;
      Sample parse of "DO YOU HAVE SOMETHING A BIT LARGER?":
          [NeedRoom] ( [_i_want] ( DO YOU HAVE SOMETHING ) )
          [RoomSizeSpec] ( [room_size_spec] ( [rss_larger] ( LARGER ) ) )
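To make the grammar excerpt more concrete, here is a rough Python sketch of phrase spotting over the same `rss_*` nets. It is not how Phoenix is implemented; it only assumes that a leading `*` marks an optional word, as in `(*the larger)`, and that words outside any pattern can be skipped. The `expand`, `contains`, and `spot_slots` helpers are hypothetical names.

```python
from typing import Dict, List

# Nets copied from the slide; only the matching logic below is invented.
GRAMMAR: Dict[str, List[str]] = {
    "rss_large": ["large", "big", "huge"],
    "rss_larger": ["*the larger", "*the bigger", "too small"],
    "rss_largest": ["*the largest", "*the biggest"],
    "rss_small": ["small", "little"],
}


def expand(pattern: str) -> List[List[str]]:
    """Expand optional '*word' tokens into every concrete word sequence."""
    variants: List[List[str]] = [[]]
    for token in pattern.split():
        if token.startswith("*"):
            variants = variants + [v + [token[1:]] for v in variants]
        else:
            variants = [v + [token] for v in variants]
    return variants


def contains(words: List[str], phrase: List[str]) -> bool:
    """True if phrase occurs as a contiguous run inside words."""
    n = len(phrase)
    return any(words[i:i + n] == phrase for i in range(len(words) - n + 1))


def spot_slots(utterance: str) -> List[str]:
    """Return every net with a pattern found somewhere in the utterance."""
    words = utterance.lower().replace("?", "").split()
    return [
        net
        for net, patterns in GRAMMAR.items()
        if any(contains(words, v) for p in patterns for v in expand(p))
    ]


slots = spot_slots("Do you have something a bit larger?")
print(slots)
```

Unmatched words such as "a bit" are simply ignored, which is the sense in which this toy is robust to out-of-grammar input.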

  23. Helios / Confidence Annotation
      • Builds accurate confidence scores using features from 3 sources of knowledge:
        • Speech recognition
        • Language understanding
        • Dialogue management
      • Selects the hypothesis with the maximum confidence score
      • Research in progress on hypothesis selection and on transferability across domains
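The selection step described on this slide amounts to taking the maximum over scored hypotheses. A minimal sketch follows, assuming each hypothesis already carries a confidence in [0, 1]; the `Hypothesis` class, the `select_best` function, and the scores are invented for illustration, and how Helios computes the scores is not shown here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    text: str
    confidence: float  # assumed to lie in [0, 1]


def select_best(hypotheses: List[Hypothesis]) -> Hypothesis:
    """Pick the hypothesis with the highest confidence score."""
    if not hypotheses:
        raise ValueError("need at least one hypothesis")
    return max(hypotheses, key=lambda h: h.confidence)


# Made-up scores for two decoder outputs of the same utterance.
candidates = [
    Hypothesis("i need a broom", 0.41),
    Hypothesis("i need a room", 0.82),
]
best = select_best(candidates)
print(best.text)
```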

  25. RavenClaw Architecture
      • Dialog Task (Specification)
        • Captures all domain-specific dialog (task) logic using a hierarchical description
        • The authoring effort is focused entirely here
      • Domain-independent Dialog Engine
        • Manages the dialog by executing the dialog task specification
        • Provides a large number of domain-independent conversational strategies

  27. RavenClaw: Dialogue Task Specification
      • Tree of dialog agents
        • Terminals: Inform, Request, Expect, Execute
        • Non-terminals / dialog agencies: plan the execution of their child nodes
      • Basically a hierarchical task execution network; each agent has:
        • Preconditions & effects
        • Success & failure criteria
        • Trigger (focus) criteria
      [Diagram: example task tree for Madeleine with nodes I:Welcome, E:LoadSymptoms, GeneralFeel (R:HowAreYou?, I:Glad, I:Sorry), Diagnose, Fever, Travel, R:AskFever, E:MeasureTemp, I:InformFever; concepts: diagnostic, have_fever, general_feeling]

  28. Sample DTS Code
      [Diagram: GeneralFeel agency with subagents R:HowAreYou?, I:Glad, I:Sorry; concept general_feeling]

          // /Madeleine/GeneralFeel
          DEFINE_AGENCY(CGeneralFeel,
              DEFINE_CONCEPTS(
                  STRING_USER_CONCEPT(general_feeling, none))
              DEFINE_SUBAGENTS(
                  SUBAGENT(HowAreYou, CHowAreYou)
                  SUBAGENT(Glad, CGlad)
                  SUBAGENT(Sorry, CSorry))
              SUCCEEDS_WHEN(COMPLETED(Glad) || COMPLETED(Sorry)))

          // /Madeleine/GeneralFeel/HowAreYou
          DEFINE_REQUEST_AGENT(CHowAreYou,
              REQUEST_CONCEPT(general_feeling)
              GRAMMAR_MAPPING("![Yes]>good, ![FeelingGood]>good, "
                              "![FeelingSoSo]>soso, ![FeelingBad]>bad"))

          // /Madeleine/GeneralFeel/Glad
          DEFINE_INFORM_AGENT(CGlad,
              PRECONDITION(C("general_feeling") == CString("good"))
              PROMPT("inform glad_youre_good")
              ON_COMPLETION(FINISH(/Madeleine)))

          // /Madeleine/GeneralFeel/Sorry
          DEFINE_INFORM_AGENT(CSorry,
              PRECONDITION(C("general_feeling") != CString("good"))
              PROMPT("inform sorry_youre_bad"))

  29. RavenClaw Execution
      [Diagram, repeated on slides 29-38: Madeleine task tree (I:Welcome, E:LoadSymptoms, GeneralFeel, Diagnose, R:HowAreYou?, I:Glad, I:Sorry, Fever, Travel, R:AskFever, E:MeasureTemp, I:InformFever) with concepts have_fever, diagnostic, chart, general_feeling, next to a Dialog Stack and an Expectation Agenda]
      • Dialog Stack: (empty)

  30. RavenClaw Execution
      • Dialog Stack (top first): Madeleine

  31. RavenClaw Execution
      • Dialog Stack (top first): Welcome, Madeleine

  32. RavenClaw Execution
      • System: "Hi, this is Madeleine, the automated…"
      • Dialog Stack (top first): Madeleine

  33. RavenClaw Execution
      • The tree now also shows the concept headache and the request agents R:Headache, R:, R:, R:
      • Dialog Stack (top first): LoadSymptoms, Madeleine

  34. RavenClaw Execution
      • Dialog Stack (top first): Madeleine

  35. RavenClaw Execution
      • Dialog Stack (top first): GeneralFeel, Madeleine

  36. RavenClaw Execution / Input Pass
      • Dialog Stack (top first): HowAreYou, GeneralFeel, Madeleine
      • System: "How are you feeling today?"
      • User: "Not so good, I think I have a fever"
      • Expectation Agenda levels:
        • general_feeling: [good], [bad], [soso]
        • general_feeling: [good], [bad], [soso]
        • general_feeling: [good], [bad], [soso]; have_fever: [fever], ![yes], ![no]; headache: [headache], ![yes], ![no]; cough: [cough], ![yes], ![no]; …
      • Parse: [soso](not so good) [fever](I think I have a fever)

  37. RavenClaw Execution
      • Parse: [soso](not so good) [fever](I think I have a fever)
      • Dialog Stack (top first): GeneralFeel, Madeleine

  38. RavenClaw Execution
      • Dialog Stack (top first): Sorry, GeneralFeel, Madeleine
      • System: "Oh, I’m sorry to hear that…"
      • System: "Let me take your temperature…"
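The input pass in the walkthrough above can be approximated with a short Python sketch. It assumes one expectation-agenda level per dialog-stack entry, visited from the focused agent downwards, with a concept bound at the first level that expects a matching grammar slot. The `build_agenda` and `bind` functions and the data layout are hypothetical, and the `!` prefix seen in the slide's grammar mappings is not modelled.

```python
from typing import Dict, List, Tuple

# One stack entry: (agent name, {concept: grammar slots that can fill it}).
Level = Tuple[str, Dict[str, List[str]]]

FEELING = ["[good]", "[bad]", "[soso]"]

# Dialog stack at the moment of the input pass, bottom first.
stack: List[Level] = [
    ("Madeleine", {
        "general_feeling": FEELING,
        "have_fever": ["[fever]"],
        "headache": ["[headache]"],
    }),
    ("GeneralFeel", {"general_feeling": FEELING}),
    ("HowAreYou", {"general_feeling": FEELING}),
]


def build_agenda(dialog_stack: List[Level]) -> List[Level]:
    """Agenda levels follow the stack from the focused agent downwards."""
    return list(reversed(dialog_stack))


def bind(agenda: List[Level], parse: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Bind each concept at the first agenda level that expects a parsed slot."""
    bound: Dict[str, Tuple[str, str]] = {}
    for agent, expectations in agenda:
        for concept, slots in expectations.items():
            if concept in bound:
                continue  # already bound closer to the focus
            for slot in slots:
                if slot in parse:
                    bound[concept] = (slot.strip("[]"), agent)
                    break
    return bound


# Parse of "Not so good, I think I have a fever", as shown on the slide.
parse = {"[soso]": "not so good", "[fever]": "I think I have a fever"}
bindings = bind(build_agenda(stack), parse)
print(bindings)
```

In this toy run the focused request agent captures general_feeling, while have_fever is picked up further down the agenda even though the system did not ask about it, which mirrors the mixed-initiative behaviour in the walkthrough.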

  39. RavenClaw – Other features
      • The Dialogue Engine transparently provides a set of conversational skills
        • Universal dialogue mechanisms: Repeat, Suspend / Resume, Quit
        • Help: "Help!", "Where are we?", "What can I say?"
        • Error handling:
          • Explicit and implicit confirmations
          • Strategies for recovering from non-understandings
      • Dynamic dialogue task generation
      • Dynamic dialogue control policy

  41. Back-end & Domain Agents
      • Various problem-specific solutions
        • RoomLine: connects to a static Perl database or to the CMU CorporateTime server
        • Let’s Go! Bus Information System: connects to a PostgreSQL database
        • Sublime: connects to a MySQL database; also functions as a web server; DTW search domain agent
      • Basically, build your own; we provide a stub for interfacing with the Galaxy Hub

  43. Rosetta Language Generation
      • Template-based and stochastic language generation
      • Input: (act, object, {slot=value})
      • Output: text (tagged with concepts)

          # welcome to the system
          "welcome" => "Welcome to RoomLine, the automated conference room ".
                       "reservation system.",

          # greet user
          "greet_user" => ("Hi, <user_name>.",
                           "Hi, <user_name>, good to hear from you again."),

          # inform the user that the system has misunderstood the times (order)
          "wrong_time_order" => sub {
              my %args = @_;
              my $time_interval_as_string =
                  get_wrong_time_interval_as_string(\%args, "room_query.date_time.time");
              my $answer = "I'm sorry, I must have misunderstood the ".
                           "time you needed the room. ";
              $answer .= "I heard $time_interval_as_string. ";
              return ["$answer So, let's see ... ",
                      "$answer So, let's try this again ... ",
                      "$answer So, let's try this once more ... "];
          },
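The three template shapes in the Perl excerpt (a fixed string, a list of alternatives, and a function of the slots) can be mirrored in a small Python sketch. This is not Rosetta's code: the `(act, object)` keying, the `generate` function, the `variant` parameter, and the `time` slot name are assumptions chosen to match the slide's "(act, object, {slot=value})" input description.

```python
import re
from typing import Callable, Dict, List, Tuple, Union

Slots = Dict[str, str]
Template = Union[str, List[str], Callable[[Slots], str]]

# Keys are (act, object); prompt texts are borrowed from the slide.
TEMPLATES: Dict[Tuple[str, str], Template] = {
    ("inform", "welcome"):
        "Welcome to RoomLine, the automated conference room reservation system.",
    ("inform", "greet_user"): [
        "Hi, <user_name>.",
        "Hi, <user_name>, good to hear from you again.",
    ],
    ("inform", "wrong_time_order"): lambda slots: (
        "I'm sorry, I must have misunderstood the time you needed the room. "
        "I heard " + slots["time"] + ". So, let's try this again ..."
    ),
}


def generate(act: str, obj: str, slots: Slots, variant: int = 0) -> str:
    """Render the template for (act, obj), filling <slot> placeholders."""
    template = TEMPLATES[(act, obj)]
    if callable(template):
        return template(slots)
    if isinstance(template, list):
        template = template[variant % len(template)]
    return re.sub(r"<(\w+)>", lambda m: slots[m.group(1)], template)


greeting = generate("inform", "greet_user", {"user_name": "Dan"})
print(greeting)
```

Passing a different `variant` selects another alternative from a list template; a stochastic generator would pick the index at random instead.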

  45. Synthesis
      • Cepstral Theta synthesis
        • Open-domain unit-selection synthesis
        • SSML tags
        • [Currently working on barge-in location]
      • Festival synthesis
        • Diphone synthesis; open-domain and limited-domain unit-selection synthesis
        • SABLE tags
        • Server running separately on a Linux box

  47. Miscellaneous – Documentation
      • Transmitted largely by oral tradition :)
      • A bit of documentation is available
        • Research papers, slides
        • Wiki: http://hap.speech.cs.cmu.edu/commwiki
          • Mostly for developers: postings of updates and recent developments
          • Hopefully more introductory materials soon
        • More in the works
      • Tutorials: 2 available, but a bit outdated

  48. Miscellaneous – Portability
      • Current systems work on PC Windows platforms
        • Galaxy has a Linux version
        • Components are written in C, C++ (Visual Studio 6.0, Visual Studio .NET), and Perl
      • How about using different input / output components?
        • Modify the RavenClaw DMInterface class
        • Has been done for the Gemini parser / language generator

  49. Miscellaneous – Research Platform
      • The Communicator / RavenClaw framework is a research platform!
        • Constantly evolving
        • Modular
        • Easy to change, develop, and test new technologies
      • Research on a variety of topics in a real-world, full-blown system:
        • Recognition, language understanding, dialogue management, language generation, synthesis
      • Your work can be evaluated / reused easily across multiple existing systems

  50. Miscellaneous – Download
      • www.cs.cmu.edu/~dbohus/RavenClaw
      • Download a version of RoomLine
      • An installation script can seed your own project from this RoomLine version
