Mike Phillips Technology Development at SpeechWorks
Intro to SpeechWorks • Founded 1994, Public since Aug 2000 (SPWX) • Ongoing MIT relationship • 275+ people • Boston, NY, SF, Montreal, Mexico, Singapore, UK • Network-based speech products, tools and services
Representative Customers • Telco: AT&T, BellSouth, Cellular One, Singapore Telecom • Banking: One of top three, CIBC, First Union • Brokerage: E*TRADE, One of top three, Discover (via Fiserv), Three other discount brokers, Singapore Stock Exchange • Portals: AOL, HeyAnita, AudioPoint • Travel: United and Continental Airlines, Amtrak / MTRC (HongKong), Payless Car Rental • Distribution: FedEx (via NextLink), Roberts • Insurance: Guardian Life, Manulife Financial • Pharmaceuticals / Health: McKessonHBOC, Boston Medical Ctr • Manufacturing / IT: Apple Computer, Hewlett Packard
SpeechWorks Architecture • Application • Application Development Environment • DialogModules: Application Building Blocks • Core Speech Technologies: ASR, TTS, Verification
SpeechWorks Recognizer • Continuous & speaker-independent • Phonetic -- type in words • Barge-in -- all deployed systems • Large vocabulary -- over 50,000 words • Natural language through BNF grammars • Dynamic vocabularies and grammars • Multilingual
Why Do We Need DialogModules? Each recognition step must handle • Silence, or no speech detected • Cannot understand the utterance (rejection) • Confidence interpretation: High (OK) & Low (needs confirmation) • Touch-tone inputs (DTMF) Even after a confident recognition • Disambiguation ("We have two Bills, which one do you mean?") • Make use of previous information
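The cases listed on this slide can be sketched as a single decision function. This is a minimal illustration in Python, not the DialogModules API: the `RecognitionResult` fields, the `Outcome` names, and the two confidence thresholds are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Outcome(Enum):
    SILENCE = auto()    # no speech detected
    REJECTED = auto()   # speech heard but not understood
    DTMF = auto()       # caller used touch-tone keys
    CONFIRM = auto()    # low confidence: ask the caller to confirm
    ACCEPT = auto()     # high confidence: use the result


@dataclass
class RecognitionResult:
    # Hypothetical result record; a real recognizer returns more detail.
    text: Optional[str] = None    # recognized words, None if nothing was heard
    confidence: float = 0.0       # 0.0 .. 1.0
    dtmf: Optional[str] = None    # touch-tone digits, if any


# Illustrative thresholds only; real values are tuned per application.
REJECT_BELOW = 0.30
CONFIRM_BELOW = 0.70


def classify_step(result: RecognitionResult) -> Outcome:
    """Map one recognition attempt to the case the dialog must handle."""
    if result.dtmf:
        return Outcome.DTMF
    if result.text is None:
        return Outcome.SILENCE
    if result.confidence < REJECT_BELOW:
        return Outcome.REJECTED
    if result.confidence < CONFIRM_BELOW:
        return Outcome.CONFIRM
    return Outcome.ACCEPT
```

Every recognition step in a call flow needs this same branching, which is the argument for packaging it once in a reusable building block rather than rewriting it per prompt.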
Specification • Preliminary UI Design • Requirements Analysis • Project Plan • High Level System Design
Development • Vocabularies (“Recognition Contexts”) • Accuracy Testing • System Integration • Functional Testing • Usability Testing • User Interface (“Call Flow”) • Database / Transaction Interface • Telephony / CTI Interface • Platform / Operations Interface
Deployment • Performance & Operability Test • Pilot User Test • Partial Deploy • Full Deploy • UI & Context Tuning • UI & Context Tuning • Monitor & Tune • Pronunciations • Grammar / Vocabulary • Language Model • Acoustic Model
Reporting and Analysis Tools • Tuning tools • Logged Data • VRU • Recognition Events • DialogModule Events • Waveforms • Call Playback • DM Success Summary • Performance Summary • Call Flow Summary
Using Web Infrastructure • Web model of application development • Applications on web servers • Markup (VoiceXML) for controlling telephony resources • Shares Infrastructure with Web • Need content and application management tools for multi-modal interfaces (Web, WAP, Speech) • Share application logic • Different user interfaces
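As a rough illustration of the web model, a server-side handler can return VoiceXML much as it would return HTML. The sketch below builds a one-field VoiceXML page with Python's standard `xml.etree.ElementTree`. The form id, field name, prompt wording, and submit URL are assumptions made for the example, and the page is kept to basic VoiceXML elements (`vxml`, `form`, `field`, `prompt`, `filled`, `submit`).

```python
import xml.etree.ElementTree as ET


def render_account_form(submit_url: str) -> str:
    """Return a small VoiceXML page that collects an account number.

    The telephony platform (the VoiceXML client) fetches this page over
    HTTP, plays the prompt, and posts the collected value to submit_url.
    """
    vxml = ET.Element("vxml", version="1.0")
    form = ET.SubElement(vxml, "form", id="get_account")

    # type="digits" asks the platform to use its built-in digit grammar.
    field = ET.SubElement(form, "field", name="account", type="digits")
    prompt = ET.SubElement(field, "prompt")
    prompt.text = "Please say or key in your account number."

    # Once the field is filled, send its value back to the application.
    filled = ET.SubElement(field, "filled")
    ET.SubElement(filled, "submit", next=submit_url, namelist="account")

    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical URL, for illustration only.
    print(render_account_form("http://example.com/lookup"))
```

The same application logic behind `submit_url` could serve a Web or WAP front end, which is the sharing the slide refers to.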
Web-based Deployment Model • Public Telephone Network • Telephony Platform (VoiceXML client) • Internet • Application Servers (VoiceXML Servers)
ASR • Very Large Vocabularies • Support for VoiceXML/Portals/ASPs • Robustness • Accuracy
Very Large Vocabulary • Based on FST technology from AT&T • Vocabulary sizes >1 million words • Current practical limit 50 – 100K words • Less computation and memory • Enables new generation of functionality • Demo • Accessing directory-based information • Dialog technologies used to disambiguate answers from a database
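The last bullet, using dialog to disambiguate answers from a database, can be illustrated with a toy directory lookup. This is a sketch under stated assumptions: the directory entries, field names, and prompt wording are invented, and a real system would query a database from the recognizer's output rather than a Python list.

```python
from typing import Dict, List

# Hypothetical directory data, for illustration only.
DIRECTORY: List[Dict[str, str]] = [
    {"name": "Bill Smith", "department": "Sales"},
    {"name": "Bill Jones", "department": "Engineering"},
    {"name": "Ann Lee", "department": "Sales"},
]


def lookup(first_name: str, directory: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return every entry whose first name matches the recognized word."""
    wanted = first_name.lower()
    return [e for e in directory if e["name"].split()[0].lower() == wanted]


def next_prompt(matches: List[Dict[str, str]]) -> str:
    """Choose the next system prompt from the number of database matches."""
    if not matches:
        return "Sorry, I could not find that name."
    if len(matches) == 1:
        return f"Connecting you to {matches[0]['name']}."
    options = " or ".join(m["name"] for m in matches)
    return f"We have {len(matches)} matches: {options}. Which one do you mean?"
```

With a very large vocabulary, ambiguous matches like this become more common, so the follow-up question is part of the normal flow rather than an error path.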
Support for VoiceXML/Portals/ASPs • Fully dynamic grammars • Parallel grammars • Dramatic memory footprint reduction • JavaScript in grammars
Robustness • Overall performance of ASR is very high • But, various situations can result in reduced performance • Identify and improve significant cases • Wireless environment becoming more important • Significant performance improvements for Wireless • Wide variety of wireless conditions - environment, coding, network
Natural Language • Tools for easier NL application development • Grammar Import/Export • Wildcards in grammars • NL grammars/actions in parallel with DialogModules • High-level Application Framework • Reusable higher-level Dialog Components • Constructs for commonly used dialogs • Customizable for particular application • Common User Interface Constructs • “How May I Help You” (HMIHY) Technology
“How May I Help You?” • What is “How May I Help You?”? • Automated handling of highly unconstrained customer input via interactive dialogue • More flexible than today’s NL grammars • Leverages years of AT&T call center experience • Enables a new class of speech applications • Call Routing • Help Desk • Customer Care
How May I Help You? STEP 1: Training • Labeled examples: “I was trying to call my sister in Italy but I got a wrong number” (CREDIT); “How do I dial direct to Tokyo?” (DIAL HELP); “I need to make a long distance call and charge to my home number” (CHARGE) • LEARNING: Salient Fragments, Conceptual Relevance
How May I Help You? STEP 2: Classification • Input: “I need to make a long distance call and charge to my home number” • Scores shown on slide: 5.7 0.5 0.5 0.5 0.1 0.4 • Call types: CREDIT, DIAL HELP, CHARGE
How May I Help You? STEP 3: Dialog • Disambiguation/Completion questions • Input: “I need to make a long distance call and charge to my home number” classified as CHARGE • Needed: # to Call, and either To Home Number (Home #) OR To Credit Card (Credit Card #) • System questions: “Which number do you want to call?” “Your home phone number?”
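The three steps (training, classification, dialog) can be mimicked with a deliberately simple stand-in. The sketch below is not the AT&T HMIHY algorithm: it scores one- and two-word fragments by how often they co-occur with each call type in the training examples, sums those scores for a new utterance, and then asks for whatever information is still missing. The pairing of example sentences with labels, the slot names, and the scoring rule are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Step 1 data: example utterances and call types, paired as an assumption.
TRAINING: List[Tuple[str, str]] = [
    ("i was trying to call my sister in italy but i got a wrong number", "CREDIT"),
    ("how do i dial direct to tokyo", "DIAL HELP"),
    ("i need to make a long distance call and charge to my home number", "CHARGE"),
]


def fragments(text: str, max_len: int = 2) -> Set[str]:
    """Return the distinct word n-grams (n = 1..max_len) in the text."""
    words = text.lower().split()
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(words) - n + 1)
    }


def train(examples: List[Tuple[str, str]]) -> Dict[str, Counter]:
    """Step 1: count, per fragment, how many examples of each call type contain it."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for text, label in examples:
        for frag in fragments(text):
            counts[frag][label] += 1
    return counts


def classify(text: str, counts: Dict[str, Counter]) -> Counter:
    """Step 2: sum the relative frequency of each call type over known fragments."""
    scores: Counter = Counter()
    for frag in fragments(text):
        seen = counts.get(frag)
        if not seen:
            continue
        total = sum(seen.values())
        for label, n in seen.items():
            scores[label] += n / total
    return scores


# Step 3: hypothetical slots each call type needs before it can be completed.
REQUIRED = {"CHARGE": ["number_to_call", "billing_number"]}
PROMPTS = {
    "number_to_call": "Which number do you want to call?",
    "billing_number": "Your home phone number?",
}


def next_question(call_type: str, filled: Set[str]) -> Optional[str]:
    """Return the prompt for the first missing slot, or None when complete."""
    for slot in REQUIRED.get(call_type, []):
        if slot not in filled:
            return PROMPTS[slot]
    return None
```

For instance, `classify("please charge this call to my home number", train(TRAINING))` ranks CHARGE highest, because fragments such as "charge" and "my home" occur only in the CHARGE example, while common words like "to" spread their weight across all call types.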
TTS • TTS becoming more important • Dynamic information • Voice Portals • Large Vocabulary tasks • Quality now acceptable
Comparing TTS Systems • Lucent • AcuVoice • Festival • L&H RealSpeak • Speechify female • Speechify male
TTS • Product development • Increased densities • More Platforms • New voices • SpeechWorks standard voices • Custom voice development • Application-specific improvements • Increased quality for application • Mix of TTS and recordings • New languages
Speech Technology • Continued gains on raw technology (30-50% error rate reductions per year) • Supports more and more difficult tasks • Supports richer User Interfaces
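To get a feel for what the quoted 30-50% yearly reductions would mean if they compounded, the short calculation below applies a fixed relative reduction for several years. The 10% starting error rate and the three-year horizon are assumptions for illustration, not figures from the talk.

```python
def remaining_error(initial_rate: float, yearly_reduction: float, years: int) -> float:
    """Error rate left after applying the same relative reduction each year."""
    return initial_rate * (1.0 - yearly_reduction) ** years


if __name__ == "__main__":
    start = 0.10  # hypothetical 10% error rate today
    for reduction in (0.30, 0.50):
        after = remaining_error(start, reduction, years=3)
        print(f"{reduction:.0%} per year for 3 years: {after:.2%}")
```

Under these assumptions the error rate would fall to roughly a third of its starting value at 30% per year and to one eighth at 50% per year, which is why tasks that are impractical today can become feasible within a few years.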
Natural Language • Drive NL tools and capabilities by User Interface • Current State-of-the-Art User Interface is Directed Dialog + NL Shortcuts + Personalization • As users gain experience, evolve User Interface • SpeechWorks evolving tools to match • SpeechWorks participating in DARPA Speech Program
TTS • Continued quality increases • With application-specific tuning, should approach human-quality • Hardware costs fall significantly • Reduced cost of custom voices • Broad language support
SpeechWeb: The Next Year • High performance VoiceXML platforms and applications • Open Source Browser/SpeechLinks • Server-Side tools for easier application development
SpeechWeb: Ongoing Development • Standardization is essential! • Open Source efforts • Critical mass of applications • Critical mass of users • Tools for combined Web/WAP/Speech development
Networks: The Next Year • Voice Over IP (VOIP) • Working with platform providers • Optimizing for this environment • Cellular • Main focus of ASR robustness work • Obtaining performance similar to landline in reasonable conditions • Increased mobile use • Environmental noise • Hands-free
Changing the Network • Public Telephone Network • Enterprise Server(s) • DialogModules™ • SpeechWorks™ • Database Connectivity • Recognition Engine • Application Building Blocks • IVR Platform
Changing the Network • VOIP • TCP/IP • Enterprise Server(s) • DialogModules™ • SpeechWorks™ • Database Connectivity • Recognition Engine • Application Building Blocks • IVR Platform
Changing the Network • VOIP • TCP/IP • Enterprise Server(s) • DialogModules™ • SpeechWorks™ • Database Connectivity • Recognition Engine • Application Building Blocks • IVR Platform • VOIP phones • PDAs • Wireless • Etc.
Networks: Next Generations • VOIP • Network-based • Premise-based • DSR - Distributed Speech Recognition • Front-end processing in handset/mobile device/gateway • Better performance at reduced bandwidth • Wide range of devices
Mobile Devices • Increasing use of Mobile Devices • With High Quality Displays • With Wireless Networking • Too Small for Keyboard • Speech + Pointing In • Speech + Display out
Summary • Speech will play an increasing role as UI of choice • Especially in mobile environments • Advances to make this possible include • Continued progress on core technology • Evolution of Speech User Interfaces • Platforms designed and optimized for speech interface • Evolution of standards