Speech Application Architectures. Markku Turunen Tampere Unit for Human-Computer Interaction University of Tampere MUMIN PhD course, Tampere, 18.-22.11.2002. Outline. Topics Background Architecture types Example architectures Topics for research Jaspis architecture.
Speech Application Architectures Markku Turunen Tampere Unit for Human-Computer Interaction, University of Tampere MUMIN PhD course, Tampere, 18.-22.11.2002
Outline Topics • Background • Architecture types • Example architectures • Topics for research • Jaspis architecture
Software architectures 1 Definitions • “software architecture defines the system in terms of components and interactions between them. Connectors are used to mediate interaction between the components” [Garlan & Shaw, 1994] • several views can be used to describe different aspects of software architectures: design view, run-time view, module view, logical view, control view, class view, … • human-computer interaction viewpoint: support for interaction methods and techniques
Software architectures 2 Software development tools • support and tools for the construction of practical applications • core architecture: basic infrastructure (hub/facilitator, communication libraries, blackboard) • complete architecture: technology components (ASR, TTS), dialogue manager, database, … • toolkit: dialogue editor, ASR grammar builder, corpus collection tool, annotation editor, …
Speech system components • user • telephone interface • speech recognition • natural language processing • dialogue management • database • natural language generation • speech synthesis
Architecture types 1 Pipelines and dialogue management architectures • pipeline (batch-sequence) architectures: data flow; one-way interfaces; fixed processing order (diagram: ASR, NLU, DM, NLG, TTS in sequence) • dialogue manager architectures: function calls; dialogue manager as controller; relaxed processing order (diagram components: ASR, NLU, DM, NLG, TTS, DB, UM)
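The difference between the two styles can be sketched in a few lines. This is a minimal illustration, not code from any of the systems discussed: the component functions (`asr`, `nlu`, `dm`, `nlg`, `tts`) are hypothetical stand-ins that only manipulate strings, and the "relaxed processing order" is reduced to one shortcut taken by the controller.

```python
from typing import Callable, Dict, List

Data = Dict[str, str]
Stage = Callable[[Data], Data]

# Hypothetical toy components; real ASR/TTS engines are replaced by string operations.
def asr(data: Data) -> Data:
    return {**data, "text": data["audio"].lower()}

def nlu(data: Data) -> Data:
    return {**data, "intent": "greet" if "hello" in data["text"] else "unknown"}

def dm(data: Data) -> Data:
    return {**data, "act": "greet_back" if data["intent"] == "greet" else "ask_repeat"}

def nlg(data: Data) -> Data:
    replies = {"greet_back": "Hello!", "ask_repeat": "Could you repeat that?"}
    return {**data, "reply": replies[data["act"]]}

def tts(data: Data) -> Data:
    return {**data, "speech": f"<audio:{data['reply']}>"}

# Pipeline: data flows one way through a fixed sequence of stages.
PIPELINE: List[Stage] = [asr, nlu, dm, nlg, tts]

def run_pipeline(audio: str) -> str:
    data: Data = {"audio": audio}
    for stage in PIPELINE:
        data = stage(data)
    return data["speech"]

# Dialogue manager as controller: it calls the components itself
# and may skip steps depending on intermediate results.
def run_dialogue_manager(audio: str) -> str:
    data = nlu(asr({"audio": audio}))
    if data["intent"] == "unknown":
        # Shortcut: bypass dm() and nlg() and speak a fixed re-prompt.
        return tts({**data, "reply": "Sorry, I did not understand."})["speech"]
    return tts(nlg(dm(data)))["speech"]
```

In the pipeline version the order is encoded in the `PIPELINE` list and no stage can influence it; in the controller version the order lives in the dialogue manager's own code.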
Architecture types 2 Client-server and blackboard architectures • client-server architectures: two-way messages; hub as coordinator (star topology); free processing order (diagram components: HUB, ASR, NLU, DM, NLG, TTS, DB, UM) • blackboard (DB) architectures: data events / db operations; shared information; free processing order (diagram components: IS, ASR, NLU, DM, NLG, TTS, UM)
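A rough sketch of the two coordination styles, under simplifying assumptions: the `Hub` and `Blackboard` classes and the toy handlers below are invented for illustration and do not correspond to any particular system's API.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

class Hub:
    """Star topology: every message passes through the hub, replies come back the same way."""
    def __init__(self) -> None:
        self.servers: Dict[str, Callable[[Any], Any]] = {}
        self.log: List[Tuple[str, str]] = []

    def register(self, name: str, handler: Callable[[Any], Any]) -> None:
        self.servers[name] = handler

    def send(self, sender: str, receiver: str, message: Any) -> Any:
        self.log.append((sender, receiver))      # the hub sees all traffic
        return self.servers[receiver](message)   # two-way: the reply is returned

class Blackboard:
    """Shared information: components react to data events instead of addressing each other."""
    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}
        self.subscribers: Dict[str, List[Callable[["Blackboard", Any], None]]] = defaultdict(list)

    def subscribe(self, key: str, callback: Callable[["Blackboard", Any], None]) -> None:
        self.subscribers[key].append(callback)

    def write(self, key: str, value: Any) -> None:
        self.data[key] = value
        for callback in list(self.subscribers[key]):
            callback(self, value)                # notify components watching this key

def demo_hub():
    hub = Hub()
    hub.register("nlu", lambda text: "greet" if "hello" in text else "unknown")
    hub.register("nlg", lambda intent: "Hello!" if intent == "greet" else "Pardon?")
    intent = hub.send("asr", "nlu", "hello there")
    reply = hub.send("dm", "nlg", intent)
    return reply, hub.log

def demo_blackboard():
    board = Blackboard()
    board.subscribe("text", lambda b, text: b.write("intent", "greet" if "hello" in text else "unknown"))
    board.subscribe("intent", lambda b, intent: b.write("reply", "Hello!" if intent == "greet" else "Pardon?"))
    board.write("text", "hello there")           # one write triggers the rest
    return board.data
```

In the hub version each sender names its receiver; in the blackboard version no component knows who consumes its output, only which key it writes.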
Architecture types 3 Agent architectures • independent agents: independent agents; facilitator; collaborative processing (diagram components: Facilitator, ASR, NLU, DM, NLG, TTS, DB, UM) • compact agents: compact agents; shared knowledge; distributed processing (diagram components: Facilitator, IS, ASR, NLU, NLG, TTS, and compact agents labelled DA, DE, IA, PA, PE, UA)
Example architectures 1 GALAXY-II • MIT / MITRE • DARPA Communicator reference architecture • freely available • HUB and servers • frames (messages) • hub scripts route messages [Seneff et al., 1998]
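The idea of a hub that routes frames according to script rules can be sketched as follows. This is only an illustration of the routing principle: the rule format (`needs`, `produces`, `server`), the server names and the frame keys are all hypothetical and are not GALAXY-II hub script syntax.

```python
from typing import Callable, Dict, List, Tuple

Frame = Dict[str, str]

# Hypothetical servers: each reads the frame and returns one new value.
SERVERS: Dict[str, Callable[[Frame], str]] = {
    "recognizer": lambda f: f["audio"].strip().lower(),
    "parser": lambda f: "weather_query" if "weather" in f["text"] else "unknown",
    "dialogue": lambda f: "tell_weather" if f["meaning"] == "weather_query" else "ask_again",
    "generator": lambda f: {"tell_weather": "It is sunny.", "ask_again": "Please rephrase."}[f["action"]],
}

# Hypothetical rules: if the frame has `needs` but not yet `produces`, call `server`.
RULES: List[Dict[str, str]] = [
    {"needs": "audio", "produces": "text", "server": "recognizer"},
    {"needs": "text", "produces": "meaning", "server": "parser"},
    {"needs": "meaning", "produces": "action", "server": "dialogue"},
    {"needs": "action", "produces": "reply", "server": "generator"},
]

def run_hub(frame: Frame, rules=RULES, servers=SERVERS, max_steps: int = 20) -> Tuple[Frame, List[str]]:
    """Apply the first matching rule repeatedly until no rule matches."""
    frame = dict(frame)
    trace: List[str] = []
    for _ in range(max_steps):
        rule = next((r for r in rules
                     if r["needs"] in frame and r["produces"] not in frame), None)
        if rule is None:
            break
        frame[rule["produces"]] = servers[rule["server"]](frame)
        trace.append(rule["server"])
    return frame, trace
```

Because routing depends only on the frame's contents, a frame that already contains `text` (for example from a typed input) skips the recognizer without any change to the servers.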
Example architectures 2 Open Agent Architecture • general agent architecture • Facilitator as coordinator • requesters (tasks) • services (solutions) • Interagent Communication Language (ICL) • freely available • used in speech applications [Martin et al., 1999]
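The facilitator pattern (services declare what they can solve, requesters post tasks, the facilitator delegates) might be sketched like this. The `Facilitator` class, the capability names and the plain function calls are assumptions made for illustration; this is not the OAA library and does not use ICL.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

class Facilitator:
    """Matches posted tasks against capabilities that service agents have declared."""
    def __init__(self) -> None:
        self._providers: Dict[str, List[Tuple[str, Callable[..., Any]]]] = defaultdict(list)

    def register(self, agent_name: str, capability: str, solver: Callable[..., Any]) -> None:
        # A service agent announces that it can solve tasks of this kind.
        self._providers[capability].append((agent_name, solver))

    def solve(self, capability: str, **params: Any) -> List[Tuple[str, Any]]:
        # A requester posts a task; every agent that declared the capability is asked.
        return [(name, solver(**params)) for name, solver in self._providers.get(capability, [])]

def build_demo() -> Facilitator:
    fac = Facilitator()
    fac.register("timetable_db", "lookup", lambda city: {"Tampere": "08:15"}.get(city))
    fac.register("backup_db", "lookup", lambda city: {"Tampere": "08:15", "Helsinki": "09:00"}.get(city))
    fac.register("synthesizer", "speak", lambda text: f"<audio:{text}>")
    return fac
```

The requester never names a service agent, only the capability it needs, so agents can be added or replaced without touching the requesters.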
Example architectures 3 WITAS • dialogue manager agent reacts to events sent by other agents • dialogue manager acts as blackboard • multimodal inputs are coordinated by DM • based on OAA [Lemon et al., 2001]
Example architectures 4 MITRE architecture • dialogue manager as controller • default processing order • dialogue manager monitors other components • dialogue manager is a kind of blackboard • based on OAA [Luperfoy et al., 1998]
Example architectures 5 TRIPS • agents, managers and shared databases • loosely coupled components • no dialogue manager • KQML messages • facilitator does not contain control logic [Allen et al., 2001]
Adaptive systems Need for adaptive applications • different users: speech-based communication can differ greatly between individual users and situations • speech is language and culture dependent • differences in preferences and needs between user groups can be large • different approaches: people from different backgrounds have different solutions for the same problems • we need interaction methods and architectures that adapt to different users and situations and support multiple approaches
Topics for research Topics for speech systems • adaptivity: how to support adaptive methods? how to make systems adaptive? • reusability: components, interaction methods, … • distributed systems: communication protocols, resource sharing, ubiquitous applications • distributed interaction management: a centralized dialogue manager is not suitable for many tasks • shared knowledge: dialogue, user etc. • development and evaluation tools: WOZ, corpora, …
Jaspis architecture speech application development framework • implementation of core architecture with extensions • designed especially for multilingual and distributed applications • overall focus on system level adaptivity • current focus on ubiquitous and multimodal applications • Java and XML, freely available • used in several projects and applications
Jaspis architecture overview (diagram; surviving labels: NLG, NLU, DB, UM)
Jaspis components Agents, evaluators and managers • agents handle various interaction situations, such as speech input interpretations, dialogue decisions and speech output presentations • evaluators measure how well agents can handle the current interaction situation • managers are used to coordinate agents and evaluators, especially to try to choose the best possible agents to handle each interaction situation
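The agent / evaluator / manager division can be illustrated with a small sketch. It is not Jaspis code (Jaspis itself is written in Java): the class names, the situation keys and the scoring scheme, in which each evaluator returns a value between 0 and 1 and the manager multiplies the values, are assumptions chosen to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Situation = Dict[str, str]

@dataclass
class Agent:
    name: str
    handles: str      # hypothetical: the task this agent is written for
    language: str     # hypothetical: the language it is tuned for

    def act(self, situation: Situation) -> str:
        return f"{self.name} handles {situation['task']}"

Evaluator = Callable[[Agent, Situation], float]

def task_evaluator(agent: Agent, situation: Situation) -> float:
    # An agent that cannot handle the task at all is ruled out with a zero score.
    return 1.0 if agent.handles == situation["task"] else 0.0

def language_evaluator(agent: Agent, situation: Situation) -> float:
    # A language mismatch is tolerated but scored lower.
    return 1.0 if agent.language == situation["language"] else 0.3

class Manager:
    """Coordinates agents and evaluators and picks the best-scoring agent."""
    def __init__(self, agents: List[Agent], evaluators: List[Evaluator]) -> None:
        self.agents = agents
        self.evaluators = evaluators

    def score(self, agent: Agent, situation: Situation) -> float:
        total = 1.0
        for evaluate in self.evaluators:
            total *= evaluate(agent, situation)
        return total

    def select(self, situation: Situation) -> Optional[Agent]:
        best = max(self.agents, key=lambda a: self.score(a, situation))
        return best if self.score(best, situation) > 0 else None

AGENTS = [
    Agent("confirm_en", "confirm", "en"),
    Agent("confirm_fi", "confirm", "fi"),
    Agent("greet_en", "greet", "en"),
]
manager = Manager(AGENTS, [task_evaluator, language_evaluator])
```

For example, `manager.select({"task": "confirm", "language": "fi"})` picks `confirm_fi`, while a Finnish greeting falls back to `greet_en` with a reduced score. New agents or evaluators can be added to the lists without changing the manager.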
Information management in Jaspis • information storing method is not fixed (XML, DB) • information access protocol is defined (DTD) • Information Managers are used to access the Information Storage – these can be implemented in any language and they can use TCP/IP, XML-RPC or method calls
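The separation between a fixed access protocol and a free choice of storage method can be sketched as below. The interface (`put`, `get`), the class names and the flat key/value model are hypothetical simplifications and do not reproduce the actual Jaspis protocol or its DTD.

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod
from typing import Dict, Optional

class InformationStorage(ABC):
    """The fixed part: what every storage backend must offer."""
    @abstractmethod
    def put(self, key: str, value: str) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[str]: ...

class DictStorage(InformationStorage):
    """In-memory backend, standing in for a database."""
    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

class XmlStorage(InformationStorage):
    """XML backend; keys must be valid XML element names."""
    def __init__(self) -> None:
        self._root = ET.Element("information")

    def put(self, key: str, value: str) -> None:
        element = self._root.find(key)
        if element is None:
            element = ET.SubElement(self._root, key)
        element.text = value

    def get(self, key: str) -> Optional[str]:
        element = self._root.find(key)
        return None if element is None else element.text

    def to_xml(self) -> str:
        return ET.tostring(self._root, encoding="unicode")

class InformationManager:
    """What other components talk to; it hides which backend is in use."""
    def __init__(self, storage: InformationStorage) -> None:
        self._storage = storage

    def remember(self, key: str, value: str) -> None:
        self._storage.put(key, value)

    def recall(self, key: str) -> Optional[str]:
        return self._storage.get(key)
```

Components that use an `InformationManager` behave the same whether it wraps `DictStorage` or `XmlStorage`, which is the point of fixing the protocol rather than the storage.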
Presentation management in Jaspis • presentation agents convert conceptual messages to speech outputs • for every output the most suitable agent is selected by presentation evaluators • multiple presentation management modules for different phases
Dialogue management in Jaspis • different dialogue agents for different dialogue tasks • alternative dialogue agents for the same dialogue tasks • dialogue evaluators select dialogue agents • no single controller (the dialogue manager) • multiple dialogue management modules
Communication (I/O) management in Jaspis • i/o-agents and evaluators handle, combine and coordinate different input streams • devices – clients – servers – engines • run-time interpretation and multimodal fusion • separate module for selection of input modalities
Jaspis extensions Beyond core infrastructure • XML-based linguistic information (Annotation Graphs) and log formats (corpus collection, usability tests) • visualization components (blackboard, interaction) • speech technology interfaces for common telephony cards, synthesizers and recognizers • reusable components: error handling, general tasks • SMS interface, graphical components • Wizard of Oz tools
Jaspis Future improvements • concurrent dialogues and multiple users • event-based interaction management
Tampere Unit for Computer-Human Interaction Department of Computer and Information Sciences http://www.cs.uta.fi/hci/spi/ spi@cs.uta.fi mturunen@cs.uta.fi