VISIONS, TECHNOLOGY, AND BUSINESS OF CONVERSATIONAL MACHINES
Mazin Gilbert, Director, AT&T
August 7-10, 2006
Joint work with Roberto Pieraccini, Tell-Eureka
A brief history of spoken language technology
Talking Machines: First Steps into Spoken Language Technology
• Von Kempelen (1791)
• Joseph Faber (1835)
• Homer Dudley, Bell Labs (1939)
The 60's to 90's: Technology Evolution
• Isolated Words
• Speaker Dependent
• Connected Words
• Speaker Independent
• Context Dependent Sub-Word Units
• Stochastic Language Models
Approaches: Template Matching; Acoustic/Phonetic; Hidden Markov Models
[Figure: Hidden Markov Model with states S1, S2, S3 and transition probabilities a11, a12, a22, a23, a33]
The statistical approach becomes ubiquitous
The 90’s: the Birth of the Spoken Dialog Industry
• Hosting
• Application Developers
• Tools
• Platform Integrators
• Technology Vendors
• Standards (repeated across the layers)
Modern speech technology
The Speech Technology Chain
Speech -> ASR (Automatic Speech Recognition) -> Words -> SLU (Spoken Language Understanding) -> Meaning -> DM (Dialog Management) -> Action -> SLG (Spoken Language Generation) -> Words -> TTS (Text-to-Speech Synthesis) -> Speech
Shared resources: Data, Rules
The Speech Technology Chain: ASR (Automatic Speech Recognition)
Accurately and efficiently convert a speech signal into a text message independent of the device, speaker or the environment.
ASR - The Big Picture!
Input Speech -> Feature Extraction -> Decoder/Pattern Classification -> Confidence Scoring -> “Hello World” with confidence scores (0.9) (0.8)
Knowledge sources feeding the decoder:
• Acoustic Model P(X|W): change AM for each new language
• Word Lexicon and Language Model P(W): change LM and Lexicon for each new language & app.
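The decoder on this slide combines the acoustic model P(X|W) with the language model P(W). A minimal sketch of that decision rule, assuming a small closed set of candidate word strings and made-up probabilities (the function name `decode` and all numbers are illustrative, not taken from the talk): pick the W that maximizes P(X|W)·P(W), and report its normalized posterior as a crude confidence score.

```python
import math


def decode(acoustic_logp, lm_logp):
    """Return (best_hypothesis, confidence) over a closed set of word strings.

    acoustic_logp: dict mapping hypothesis W -> log P(X|W) for the observed speech X
    lm_logp:       dict mapping hypothesis W -> log P(W)
    """
    # Joint log score log P(X|W) + log P(W) for hypotheses known to both models.
    scores = {w: acoustic_logp[w] + lm_logp[w] for w in acoustic_logp if w in lm_logp}
    if not scores:
        raise ValueError("no hypothesis is covered by both models")
    best = max(scores, key=scores.get)
    # Confidence: posterior of the best hypothesis, normalized over the candidate set.
    top = scores[best]
    total = sum(math.exp(s - top) for s in scores.values())
    return best, 1.0 / total


# Made-up scores: the acoustics slightly prefer "yellow world",
# but the language model makes "hello world" the overall winner.
acoustic = {
    "hello world": math.log(0.020),
    "yellow world": math.log(0.025),
    "hello word": math.log(0.015),
}
language = {
    "hello world": math.log(0.60),
    "yellow world": math.log(0.05),
    "hello word": math.log(0.35),
}
best, confidence = decode(acoustic, language)
```

With these toy numbers `best` is "hello world" with a confidence of about 0.65. A real decoder searches a very large hypothesis space through the lexicon rather than enumerating a handful of strings.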
Human Speech Recognition vs. ASR
[Figure: humans and machines compared on Accuracy, Efficiency, Operational Performance, On-line Learning, and Robustness, on a scale of x1, x10, x100, with a region labeled "Machines Outperform Humans"]
Machines are 5-50 times worse than humans on virtually any recognition task.
The Speech Technology Chain: SLU (Spoken Language Understanding)
Extract the meaning from recognized speech and interpret a user’s request.
Why is SLU a Difficult Problem? Ways to say “question about my bill”
Knowledge Sources for SLU
• Acoustic/Phonetic
• Lexical
• Syntactic
• Semantic
• Pragmatic
Enabling Applications
• Call routing
• Customer care
• Problem solving
• Speech Translation
• Speech Data Mining
SLU - The Big Picture!
From ASR/DM (text, lattices, n-best, history)
-> Text Normalization: morphology, synonyms
-> Parsing/Decoding: extracting named entities, semantic concepts, syntactic tree
-> Interpretation: slot filling, reasoning, task knowledge representation
-> To DM (concepts, entities, parse tree)
Supporting resource: Database Access
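The three SLU stages named on this slide (text normalization, parsing/decoding, interpretation) can be illustrated with a toy call-routing pipeline. This is only a sketch under stated assumptions: the synonym table, the concept lexicon, the route names, and every function name below are hypothetical, and deployed systems typically use statistical classifiers and parsers rather than keyword lookups.

```python
import re

# Hypothetical synonym table used during text normalization.
SYNONYMS = {"invoice": "bill", "statement": "bill", "charges": "charge", "charged": "charge"}

# Hypothetical concept lexicon: concept name -> trigger words.
CONCEPTS = {
    "billing": {"bill", "charge", "payment"},
    "question": {"question", "ask", "why", "explain"},
}


def normalize(text):
    """Text normalization: lowercase, tokenize, and map synonyms to a canonical word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [SYNONYMS.get(token, token) for token in tokens]


def extract_concepts(tokens):
    """Parsing/decoding stand-in: spot semantic concepts by keyword lookup."""
    words = set(tokens)
    return {name for name, triggers in CONCEPTS.items() if triggers & words}


def interpret(concepts):
    """Interpretation: fill the slots that would be handed to the dialog manager."""
    if "billing" in concepts:
        intent = "ask" if "question" in concepts else "other"
        return {"route": "billing", "intent": intent}
    return {"route": "operator", "intent": "unknown"}


def understand(text):
    return interpret(extract_concepts(normalize(text)))


# Two different ways to say "question about my bill" reach the same meaning.
first = understand("I have a question about my bill")
second = understand("Why was I charged twice on my statement?")
```

Both utterances map to the same route and intent, which is the point of the normalization and concept-spotting steps.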
The Speech Technology Chain: DM (Dialog Management)
Manage elaborate exchanges with the user, providing access to information.
The Dialog Flow
• Observation
• Context Interpretation
• Dialog State Transition
• Dialog Strategies
• Action
• Backend
Mixed-Initiative Dialog
Who manages the dialog? The initiative ranges from User to System.
System: How may I help you?
User: I need to travel from Chicago to Newark tomorrow night.
System: Please say just your departure city.
User: Chicago
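The exchange on this slide can be reproduced with a small slot-filling dialog manager. The sketch below is an illustration only, assuming the SLU has already turned each user turn into slot values; the slot names, the prompts other than the one quoted on the slide, and the function `next_action` are hypothetical. With `mixed_initiative=True` the manager accepts any slot the user volunteers; with `mixed_initiative=False` it accepts only the slot it just asked for, which yields the system-driven behavior shown in the example.

```python
REQUIRED_SLOTS = ["departure_city", "arrival_city", "date"]

PROMPTS = {
    "departure_city": "Please say just your departure city.",
    "arrival_city": "Please say just your arrival city.",
    "date": "Please say just your travel date.",
}


def next_action(state, observation, mixed_initiative=True):
    """Update the dialog state in place and return the next system utterance.

    state:       dict of slots filled so far, plus the slot the system last asked for
    observation: dict of slot values the SLU extracted from the latest user turn
    """
    expected = state.get("expected_slot")
    for slot, value in observation.items():
        # System initiative only trusts the answer to its own question.
        if slot in REQUIRED_SLOTS and (mixed_initiative or slot == expected):
            state[slot] = value
    missing = [slot for slot in REQUIRED_SLOTS if slot not in state]
    state["expected_slot"] = missing[0] if missing else None
    if missing:
        return PROMPTS[missing[0]]
    return "Booking a trip from {departure_city} to {arrival_city}, {date}.".format(**state)


user_turn = {"departure_city": "Chicago", "arrival_city": "Newark", "date": "tomorrow night"}

# System initiative: the open-ended answer is not used, so the system narrows the question.
system_state = {}
first_prompt = next_action(system_state, user_turn, mixed_initiative=False)
second_prompt = next_action(system_state, {"departure_city": "Chicago"}, mixed_initiative=False)

# Mixed initiative: the same user turn fills every slot at once.
mixed_state = {}
mixed_reply = next_action(mixed_state, user_turn)
```

Falling back from mixed to system initiative is one common way to recover when recognition or understanding of a long utterance is unreliable.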
The Speech Technology Chain: SLG (Spoken Language Generation)
Translate the action of the DM into a textual representation.
The Speech Technology Chain: TTS (Text-to-Speech Synthesis)
Provide completely natural, high intelligibility speech from text for any talker, language or accent.
Concatenative Synthesis
Text (Alphabetic Characters) -> Text Analysis, Letter-to-Sound, Prosody -> Phonetic Symbols, Prosody Targets -> Assemble Units that Match Input Targets -> Speech Waveform Modification and Synthesis -> Speech
• Dictionary and Rules: change Front-End for each new language
• Store of Sound Units: change Sound Store for each new voice and/or language
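The "Assemble Units that Match Input Targets" step is commonly formulated as a search that trades off how well each stored unit matches its target against how smoothly neighboring units join. The sketch below uses that common formulation, not necessarily the system described in the talk. It makes strong simplifying assumptions: each unit and target is described by a single pitch value, the costs are absolute pitch differences, and the names `select_units`, `inventory`, and `join_weight` are hypothetical.

```python
def select_units(targets, inventory, join_weight=0.5):
    """Pick one stored unit per target with minimal total target + join cost.

    targets:   list of (phone, target_pitch_hz)
    inventory: dict mapping phone -> list of (unit_id, pitch_hz)
    """
    if not targets:
        return []
    # layers[i][k] = (cheapest cost of a path ending in candidate k at position i, backpointer)
    layers = []
    for i, (phone, wanted_pitch) in enumerate(targets):
        layer = []
        for _unit_id, pitch in inventory[phone]:
            target_cost = abs(pitch - wanted_pitch)
            if i == 0:
                layer.append((target_cost, None))
                continue
            previous = inventory[targets[i - 1][0]]
            # Cost of reaching this unit from each candidate at the previous position.
            path_costs = [
                layers[i - 1][j][0] + join_weight * abs(previous[j][1] - pitch)
                for j in range(len(previous))
            ]
            best_prev = min(range(len(path_costs)), key=path_costs.__getitem__)
            layer.append((path_costs[best_prev] + target_cost, best_prev))
        layers.append(layer)
    # Trace back from the cheapest final candidate.
    k = min(range(len(layers[-1])), key=lambda j: layers[-1][j][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(inventory[targets[i][0]][k][0])
        k = layers[i][k][1]
    return path[::-1]


# Toy sound store with two recorded variants of each phone.
inventory = {
    "h": [("h_1", 100), ("h_2", 130)],
    "ay": [("ay_1", 104), ("ay_2", 128)],
}
targets = [("h", 100), ("ay", 128)]
without_join_cost = select_units(targets, inventory, join_weight=0.0)
with_join_cost = select_units(targets, inventory, join_weight=5.0)
```

With no join cost the search simply picks the best-matching unit at each position; with a heavy join cost it prefers `h_2`, which matches its target less well but joins smoothly with `ay_2`.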
Conversational Machines: Lessons Learned
• "There is no data like more data", but data is expensive to collect and label, and is typically unavailable in large quantities for every speaker, language and environment.
• Significant resources and expertise are necessary for creating, maintaining and customizing conversational machines.
• Speech input/output alone is insufficient for accommodating system failures and for creating complex automated applications for anyone, anywhere.
Multimodal Technology Components
The speech technology chain (ASR, SLU, DM, SLG, TTS, with shared Data and Rules) extended with Pen, Gesture, and Visual modalities alongside Speech.
Commercial Spoken Dialog Systems
The Speech Application Lifecycle
[Figure: lifecycle diagram with steps numbered 1 to 10. Stages labeled: requirements, high level system design, system engineering, integration, VUI design, VUI development, usability testing, partial deployment, speech science, full deployment. Roles labeled: Analyst, VUI Designer, Architect, App Developer, Engineer, Speech Scientist.]
The Voice Web
Components: Telephone, Telephony Platform, Voice Browser, ASR, TTS, Internet, Web Server
Standards: VoiceXML/SALT, MRCP, SSML, SRGS, EMMA
The Speech Technology Market
[Figure: market segments. Labels: Speech to Speech Translation; Entertainment; Server-based; Telephony; Conversational; Desktop; Dictation; Security; Embedded; Car; Cell; Call Center Automation; Multimodal/Multimedia]
Business in Conversational Technology
• Return on Investment (ROI)
  • Reduce cost
  • Enable self service options
  • New revenue opportunities
• Customer Retention
  • Better user interface
  • Reduce waiting time for callers
  • Reduce misrouting
• Branding
  • Project a new image and brand awareness
  • Use of persona
Industry layers: Hosting, Application Developers, Tools, Platform Integrators, Technology Vendors