Speech Service Creation NY / NJ Chapter December, 2006 An Overview of Speech Service Creation Tools K. W. (Bill) Scholz
Agenda • Speech Applications – where we were and where we are • Building speech applications today • Methodologies and Tools • Reusable components & packaged applications • Summary of today’s Leading VUI creation tools • Highlight / compare / contrast industry’s leading tools
What does it take to build a speech app? • Requirements, Use Cases, Project Plan • Dialog Design & Test • Call flow, Implementation, & Test • Prompts, Grammars, & Test • Data / Back-end Integration, & Test • Unit Test, Integration Test, System Test • Pilot, Limited Deployment, Analysis • Full Deployment, Analysis
Where We’ve Come From: Building Speech Apps • Development toolkits designed for building DTMF applications were extended to support speech • Call flows had the sound-and-feel of DTMF apps • Grammars were constructed by hand • Back-end integration coded by hand, often targeting closed-architecture information stores • Screen scraping – ‘row 12, column 37, 9 characters’ • Proprietary closed databases • Separate natural language processors driven by recognizer output required separate ‘NL’ grammars • Poor TTS quality generated need for recorded prompts
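To make the screen-scraping point concrete, here is a minimal Python sketch that assumes a hypothetical host screen captured as a list of text rows. The field position (row 12, column 37, 9 characters) is the one quoted on the slide; the function name `scrape_field` and the sample screen are illustrative only.

```python
from __future__ import annotations


def scrape_field(screen: list[str], row: int, column: int, length: int) -> str:
    """Return the text at a fixed, 1-based screen position, minus padding."""
    line = screen[row - 1]
    # Pad short rows so a field near the right margin still slices cleanly.
    line = line.ljust(column - 1 + length)
    return line[column - 1 : column - 1 + length].strip()


# Hypothetical 24-row host screen; row 12 holds a balance field at column 37.
screen = [""] * 24
screen[11] = "BALANCE DUE".ljust(36) + "   125.40"

print(scrape_field(screen, row=12, column=37, length=9))  # 125.40
```

The fragility is the point: if the host application shifts that field by even one column, the speech application silently reads the wrong data.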
Where We Are: Building speech apps today • Methodologies and Tools • Methodology: problem statement, use cases, dialog design, project management • Data / Back-end integration • Reusable components • OpenSpeech Dialog Modules • Reusable Dialog Components • Packaged applications • Testing & Analytics
Current Practice • Most applications use state-based dialogs • Easiest to design, debug, and test for today's simple applications • A natural fit with the directed dialogs that are easiest for novice users • Speech recognizer grammars are simpler to construct and therefore less error-prone • As developers and users become exposed to more sophisticated dialog approaches, they will become less satisfied with state-based dialogs • Goal-directed • Conversational • Rule-based
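A state-based directed dialog can be sketched as a small finite-state machine. The Python below is illustrative and assumes a hypothetical bank-balance dialog; each state's grammar is simply the set of responses it accepts, which is why such grammars are easy to build and test.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class State:
    prompt: str
    # Recognized response -> name of the next state; the keys double as the
    # state's (tiny) recognition grammar. Empty means a terminal state.
    transitions: dict[str, str] = field(default_factory=dict)


# Hypothetical directed dialog for a bank-balance application.
DIALOG = {
    "main_menu": State(
        "Would you like your balance or an agent?",
        {"balance": "confirm_account", "agent": "agent"},
    ),
    "confirm_account": State(
        "Is this for your checking account? Say yes or no.",
        {"yes": "read_balance", "no": "main_menu"},
    ),
    "read_balance": State("Your balance is 125 dollars. Goodbye."),
    "agent": State("Transferring you to an agent."),
}


def run_dialog(dialog: dict[str, State], start: str, responses: list[str]) -> list[str]:
    """Drive the dialog with scripted caller responses; return prompts played."""
    played = []
    current = start
    replies = iter(responses)
    while True:
        state = dialog[current]
        played.append(state.prompt)
        if not state.transitions:
            return played  # terminal state reached
        reply = next(replies, None)
        if reply is None:
            return played  # script exhausted (caller hung up)
        # Out-of-grammar replies keep the dialog in the same state (re-prompt).
        current = state.transitions.get(reply, current)
```

For example, `run_dialog(DIALOG, "main_menu", ["balance", "maybe", "yes"])` plays four prompts: the menu, the confirmation twice (because "maybe" is out of grammar), and then the balance.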
Tools for Building Speech Applications • Dialog design, evaluation, call flow development, back-end integration, prototyping, deployment, tuning, and life cycle support • Vendors • Active: • Audium: the ‘Audium Builder’ • DBscape: ‘Vocabase’ • Fluency: ‘Voice Runner’ • OpenMethods: ‘OpenVXML’ • TuVox: ‘CVR’ (‘Producer’ + management & analytics) • Vicorp: ‘xMP’ • VoiceObjects: ‘VoiceObjects X6’ • Inactive: • Unisys: the ‘NL Speech Assistant’ • Unveil: ‘Conversation Manager’ • Vocalocity: ‘AppCenter’ • Support: • Eclipse – Back-end integration • Microsoft: ‘Visio’ for call flow representation • Nuance: OSI – Tuning • And others…: Avaya Dialog Designer, IBM WebSphere, Intervoice InVision, Microsoft Speech .NET, NetByTel (TuVox), Nortel MPS Developer (was PeriProducer), Nuance OSD, Orange Nextfire OAVS
SCE Tools: what to look for • Manipulable element – what the SCE assembles • Element detailing – how each is tailored for use • Business rule / back-end integration • Architectural model – underlying design pattern • Life cycle support – pre- and post-deployment management and testing
Visio to Represent Dialog Call Flow (Source: Unisys ‘FFA’ design specification)
Audium (Purchased by Cisco) • Audium Builder: a GUI that permits users to create and manage multiple applications • Visual elements include functions for managing databases, menus, dates and times, or phone transfers, as well as credit card or email processing • Application creation is done by dragging elements to the workspace to construct the call flow • As elements are added, their properties can be configured to load pre-recorded audio or TTS prompts and to play naturally to callers • Elements are interconnected using the GUI by assigning ‘exit states’ that lead to an end goal Source: Joe Oh, Audium (private communication)
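The element-and-exit-state model can be approximated in a few lines. This is a hypothetical sketch, not Audium's actual data model: the `Element` class, the element names, and the exit-state names are assumptions. It shows one useful check a builder tool can run, namely listing exit states that are not yet connected to a next element.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Element:
    name: str
    exit_states: tuple[str, ...]
    # Exit state -> name of the element it leads to.
    wiring: dict[str, str] = field(default_factory=dict)


def connect(src: Element, exit_state: str, dst: Element) -> None:
    """Wire one exit state of `src` to `dst`, rejecting unknown exit states."""
    if exit_state not in src.exit_states:
        raise ValueError(f"{src.name} has no exit state {exit_state!r}")
    src.wiring[exit_state] = dst.name


def unconnected_exits(elements: list[Element]) -> list[tuple[str, str]]:
    """List (element, exit state) pairs that do not lead anywhere yet."""
    return [
        (e.name, s) for e in elements for s in e.exit_states if s not in e.wiring
    ]


# Hypothetical payment call flow.
get_card = Element("GetCreditCard", ("done", "max_nomatch"))
authorize = Element("Authorize", ("approved", "declined"))
goodbye = Element("Goodbye", ())

connect(get_card, "done", authorize)
connect(authorize, "approved", goodbye)
print(unconnected_exits([get_card, authorize, goodbye]))
# [('GetCreditCard', 'max_nomatch'), ('Authorize', 'declined')]
```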
Audium Builder screenshot (panels: application tree view, tools, object properties)
DBscape Vocabase The VocaBase “Dialog Map” represents the sequence of modules, sub-modules, and steps. Clicking on any element permits access to its detailed configuration.
Fluency ‘Voice Runner’ Key features of this tool are: • Visual component assembly • Integrated component assembly analysis & testing • One click assembly deployment • Library of process and rule components: • Address Collection • Credit Card Verification
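The slide does not say how Fluency's ‘Credit Card Verification’ component works. A common first step in such a component, assumed here, is the Luhn checksum, which rejects many mis-recognized or mistyped card numbers before any back-end authorization request is made. A minimal Python sketch:

```python
def luhn_valid(card_number: str) -> bool:
    """Return True if the digits in `card_number` pass the Luhn checksum."""
    # Ignore spaces or dashes that a recognizer or formatter may insert.
    digits = [int(c) for c in card_number if c in "0123456789"]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one;
    # a doubled value above 9 contributes its digit sum (value - 9).
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # True (a common test number)
print(luhn_valid("4111 1111 1111 1112"))  # False
```

A check like this only confirms that the number is well formed; whether the card is valid for payment still requires the back-end integration the component wraps.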
VoiceObjects 6 Desktop • Tree structure to represent dialog design • Point-and-click authoring • Layering includes system layers and user-built layers • A single click packages an application for deployment • Back-end integration: ‘connectors’ support both server-side scripting and J2EE code execution • Uses object-oriented concepts Source: http://www.voiceobjects.com/
VoiceObjects Desktop – At a glance (screenshot callouts: list of all available voice objects; individual editor for each voice object; object categories Components, Resources, Logic, Actions) Source: Tiemo Winterkamp, VoiceObjects (private communication)
VoiceObjects Desktop - Control Center Source: Tiemo Winterkamp, VoiceObjects (private communication)
Vocalocity AppCenter Source: Ken Rehor - 2005
Back-end Integration • Java, JSP, C# • Scripting languages • Perl • JSP / ASP • PHP • … • Databases • Oracle • Microsoft SQL Server • MySQL / PostgreSQL • Web Services • AJAX (Asynchronous JavaScript and XML)
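As a minimal illustration of database integration, the sketch below uses Python's built-in `sqlite3` module as a stand-in for Oracle, SQL Server, MySQL, or PostgreSQL; the `accounts` table and its columns are hypothetical. The dialog step passes the recognized account number as a bound parameter and turns the result row into prompt text.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Hypothetical in-memory stand-in for a production account database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (number TEXT PRIMARY KEY, balance_cents INTEGER)")
    conn.execute("INSERT INTO accounts VALUES (?, ?)", ("12345", 12540))
    return conn


def balance_prompt(conn: sqlite3.Connection, account_number: str) -> str:
    """Look up a recognized account number and build the prompt text."""
    # Bind the recognizer output as a parameter; never splice it into SQL.
    row = conn.execute(
        "SELECT balance_cents FROM accounts WHERE number = ?", (account_number,)
    ).fetchone()
    if row is None:
        return "I could not find that account."
    dollars, cents = divmod(row[0], 100)
    return f"Your balance is {dollars} dollars and {cents} cents."


conn = make_demo_db()
print(balance_prompt(conn, "12345"))  # Your balance is 125 dollars and 40 cents.
```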
Testing • Unit – emulation • Call flow – Wizard of Oz (WoZ) or live • Usability – WoZ or live • Post-deployment analytics
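Call flow testing can be partly automated by enumerating paths through the flow and replaying each one as a scripted caller in an emulator. The Python sketch below assumes a hypothetical call flow stored as a dictionary of states, each mapping a caller response to the next state; responses that loop back to a state already on the current path are skipped so that enumeration terminates.

```python
from __future__ import annotations

# Hypothetical call flow: state -> {caller response: next state}.
FLOW = {
    "main_menu": {"balance": "confirm_account", "agent": "agent"},
    "confirm_account": {"yes": "read_balance", "no": "main_menu"},
    "read_balance": {},
    "agent": {},
}


def enumerate_paths(flow: dict[str, dict[str, str]], start: str) -> list[list[tuple[str, str]]]:
    """Return every loop-free path from `start` as (state, response) steps."""
    paths: list[list[tuple[str, str]]] = []

    def walk(state: str, path: list[tuple[str, str]], on_path: frozenset[str]) -> None:
        extended = False
        for response, nxt in flow.get(state, {}).items():
            if nxt in on_path:
                continue  # skip loops such as "no" -> back to main_menu
            extended = True
            walk(nxt, path + [(state, response)], on_path | {nxt})
        if not extended:
            paths.append(path)  # terminal state, or only looping options left

    walk(start, [], frozenset({start}))
    return paths


for script in enumerate_paths(FLOW, "main_menu"):
    print([response for _, response in script])
# ['balance', 'yes']
# ['agent']
```

Each printed list is a test script: the sequence of responses an emulated caller would speak to exercise one branch of the flow.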
Modules and packaged applications • Modules: components and templates • Application: a software program designed to perform a specific set of functions • Component: a piece of software that can be combined with other pieces to construct a program • Template: a pattern used to replicate objects Source: Steve Erlich, Apptera (private communication)
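The component and template definitions can be illustrated with a hypothetical yes/no confirmation component; all names here are assumptions, not taken from any vendor's library. The template is a function that stamps out configured components, and an application would combine several such components into a call flow.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class YesNoComponent:
    """Component: asks one question and maps the reply to True/False/None."""

    prompt: str
    yes_words: frozenset[str] = frozenset({"yes", "yeah", "correct"})
    no_words: frozenset[str] = frozenset({"no", "nope", "wrong"})

    def interpret(self, utterance: str) -> Optional[bool]:
        word = utterance.strip().lower()
        if word in self.yes_words:
            return True
        if word in self.no_words:
            return False
        return None  # no-match: the caller should be re-prompted


def confirm_template(item: str) -> YesNoComponent:
    """Template: a pattern that replicates a configured confirmation component."""
    return YesNoComponent(prompt=f"You said {item}. Is that right?")


# Two components stamped from the same template.
confirm_account = confirm_template("checking")
confirm_amount = confirm_template("50 dollars")
print(confirm_account.prompt)              # You said checking. Is that right?
print(confirm_amount.interpret(" Yeah "))  # True
```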
SCE Analysis and Evaluation • Manipulable element – what the SCE assembles • Dialog state • Object module • Conversation step • Element detailing • Properties and values • Element attributes • Prompt and grammar management • Business rule / back-end integration • Built-in primitives • Integration with Java, Web Services, Databases • Architectural model • OO? FSM? SOA? MVC? Design patterns? • Visible dialog metalanguage? • Life cycle: Deployment and post-deployment support • Reuse: create, package, and integrate reusable components • Test capability; test script generation; WoZ capability • Analytics
Audium • Application Development assets • GUI is implemented using Eclipse, with a Visio-like view • Inline grammars can be generated directly by the Studio • Centralized prompt management capability; recording scripts generated • OSDM integration supported (but RDCs are not) • XML dialog meta-language documented and the DTD provided • Multiple ‘Form’ elements can be combined to generate mixed-initiative dialog • Multi-user collaboration is well supported and demonstrated at customer sites • Runtime assets • Applications published as XML; interpreted by a Java runtime engine • SNMP queries are generated • Liabilities • Layering is not distinct – common database and external component references • No 3rd party application support • No automatic test script generation • No dedicated form for mixed initiative • No runtime cluster or server management • No speaker verification or video service generation capability • Elements oriented towards programmers, not towards VUI designers
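Mixed-initiative dialog lets a caller fill several fields in one utterance while the system prompts only for what is still missing, roughly in the spirit of the VoiceXML form interpretation algorithm. The Python below is a simplified sketch, not Audium's implementation: the slot names, the regular-expression "grammars", and the prompts are hypothetical.

```python
from __future__ import annotations

import re

# Hypothetical slot "grammars": each pattern may match anywhere in an utterance.
SLOT_PATTERNS = {
    "origin": re.compile(r"\bfrom (\w+)"),
    "destination": re.compile(r"\bto (\w+)"),
    "date": re.compile(r"\bon (\w+)"),
}
PROMPTS = {
    "origin": "Where are you leaving from?",
    "destination": "Where are you going?",
    "date": "What day do you want to travel?",
}


def fill_slots(form: dict[str, str | None], utterance: str) -> None:
    """Fill every still-empty slot that the utterance happens to mention."""
    text = utterance.lower()
    for slot, pattern in SLOT_PATTERNS.items():
        if form[slot] is None:
            match = pattern.search(text)
            if match:
                form[slot] = match.group(1)


def next_prompt(form: dict[str, str | None]) -> str | None:
    """Prompt for the first empty slot, or return None when the form is full."""
    for slot in SLOT_PATTERNS:
        if form[slot] is None:
            return PROMPTS[slot]
    return None


form: dict[str, str | None] = {slot: None for slot in SLOT_PATTERNS}
fill_slots(form, "From Boston to Denver, please")  # caller volunteers two slots
print(next_prompt(form))  # What day do you want to travel?
fill_slots(form, "On Friday")
print(form)  # {'origin': 'boston', 'destination': 'denver', 'date': 'friday'}
```

A production system would rely on recognizer grammars with semantic interpretation rather than regular expressions; these patterns are easily fooled (for example, "want to fly" would match the destination pattern).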
Vicorp • Application Development assets • Explicit separation of presentation layer from business objects layer • Visio-like presentation of application call flow • Inline grammars with confidence levels generated from item lists • Prompt categories facilitate multiple persona and language management • Invokes 3rd party applications by URI with arguments • Directed dialog, mixed initiative, and sub dialogs are supported • Runtime assets • Applications published as EAR files for execution on a J2EE application server • Service Management Console provided to manage server clusters • Liabilities • No support for the generation of SSML for TTS • Internal XML dialog meta-language not exposed for use • No automatic testing of applications; no post-deployment analytics • No support for multi-user management or collaboration • Speaker verification and video service generation not shown • It is not possible to open multiple simultaneous projects and cut and paste between them
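Generating an inline grammar from an item list can be sketched with Python's `xml.etree.ElementTree`, emitting a W3C SRGS 1.0 XML grammar with a single `one-of` rule. How Vicorp attaches confidence levels is not described; the `accept` helper assumes a simple per-grammar threshold on a 0.0 to 1.0 recognizer confidence score, similar in spirit to VoiceXML's `confidencelevel` property (default 0.5).

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"


def item_list_grammar(items: list[str], rule_id: str = "choice") -> str:
    """Build an SRGS 1.0 XML grammar that accepts exactly one of `items`."""
    grammar = ET.Element(
        "grammar",
        {"xmlns": SRGS_NS, "version": "1.0", "mode": "voice",
         "root": rule_id, "xml:lang": "en-US"},
    )
    rule = ET.SubElement(grammar, "rule", {"id": rule_id, "scope": "public"})
    one_of = ET.SubElement(rule, "one-of")
    for text in items:
        ET.SubElement(one_of, "item").text = text
    return ET.tostring(grammar, encoding="unicode")


def accept(utterance: str, confidence: float, items: list[str], threshold: float = 0.5) -> bool:
    """Assumed policy: accept an in-grammar result only at or above the threshold."""
    return utterance in items and confidence >= threshold


print(item_list_grammar(["checking", "savings"]))
print(accept("savings", 0.42, ["checking", "savings"]))  # False: re-prompt or confirm
```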
VoiceObjects • Application Development assets • Layering facilitates runtime prompt and persona remapping • Java extensions easily integrated as external resources • OSDM integration supported • Invokes 3rd party applications by URI with arguments. • XML dialog meta-language documented, DTD provided • Recording script generation by DB query • Multi-user collaboration supported: user logons with specific privileges • Runtime assets • Single runtime engine accesses all applications as data • Runtime data collection through ‘InfoStore’ and a mature Analytics package. • Extensive server cluster management, including SNMP • Support for multi-tenancy: separate JVMs launched for each tenant • Liabilities • Reusable Dialog Components are not supported • No explicit prompt management • Eclipse integration is incomplete • Confidence values not supported • No generation of SSML or recording scripts • No built-in application testing capability or test script generation capability • Natural language apps only supported by reference to external SLMs • External resources such as Java jar files are not managed by app dev environment.
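Layered prompt remapping can be illustrated with Python's `collections.ChainMap`, where lookups fall through from a persona layer to the system layer. This is only an analogy for how runtime layering behaves; VoiceObjects' internals are not described here, and the prompt keys, texts, and persona names are hypothetical.

```python
from __future__ import annotations

from collections import ChainMap

# Hypothetical system layer: default prompt text for every prompt key.
SYSTEM_LAYER = {
    "greeting": "Welcome to Example Bank.",
    "main_menu": "Say balance or agent.",
    "goodbye": "Thank you for calling.",
}

# Hypothetical persona layers that override only some prompts.
PERSONA_LAYERS = {
    "formal": {"greeting": "Good afternoon, and welcome to Example Bank."},
    "casual": {"greeting": "Hi there!", "goodbye": "Bye for now!"},
}


def prompts_for(persona: str | None = None) -> ChainMap:
    """Resolve prompts through the persona layer first, then the system layer."""
    layers = [PERSONA_LAYERS[persona]] if persona else []
    return ChainMap(*layers, SYSTEM_LAYER)


casual = prompts_for("casual")
print(casual["greeting"])   # Hi there!
print(casual["main_menu"])  # Say balance or agent.
```

Because the application refers only to prompt keys, switching persona at runtime changes which layer answers a lookup without touching the call flow itself.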
Conclusion • Building speech applications today is a bit like a marriage! Something old, something new, something borrowed, … (diagram labels: Dialog modules, Packaged apps; VUI built with tools; ASR and TTS subsystems; Supported by multiple leading vendors)
Summary • Overview of speech application creation process • Building speech applications today • Methodologies and Tools • Reusable components • Packaged applications • Where the field is going • Dialog description languages and tools: MI, Personalization, automatic call flow generation • SLMs, ASR & TTS improvements, Rule-Based and Case-Based Reasoning
Thank You. K. W. (Bill) Scholz, Ph.D. Home: +1 610.989.0989 Mobile: +1 610.212.8016 bill.scholz@comcast.net