Speech in .NET Sphinx CMU November 2002
Presenter • casey chesnut • brains-N-brawn.com • Web Services • Mobile / Wireless • Speech
Audience • Java / C++ / VB / C# ? • VoiceXml ? • SALT / Speech .NET ?
Outline • MS Technologies • VoiceXml • Demo • Speech .NET • Demo • Future • Questions (throughout) • ~25 slides
MS Technologies • Tools • Devices • Phone • Desktop PC • Pocket PC • Tablet PC
Tools • MS Agents • SAPI / Speech SDK 5.1 (.NET wrappable) • Office • AutoPC ??? • ASP .NET (VoiceXml) • (beta) Speech .NET / IE Speech Add-In • … SALT Telephony gateway (early 2003) • … Pocket IE Speech Add-In (mid 2003)
Devices • Phone • billions of devices that people are comfortable speaking to • Desktop PC • large market, but speech input is slower and uncomfortable • Pocket PC • small market, opportunities for speech (device limitations) • Tablet PC • new market, speech friendly (slate models don’t have keyboards)
Phone • ASP .NET w/ VoiceXml 2.0 • Production quality now • Multiple vendor support • Speech .NET VoiceOnly • Currently no way to deploy and test over a phone • Speech .NET Beta 2 has telephony simulation • MS target market for Speech .NET
Desktop PC • Web • Speech .NET MultiModal • Beta 2 IE Speech Add-In • Embedded control w/SAPI • MS Agents • Fat • SAPI • MS Agents
Pocket PC • Web • SALT Pocket IE Speech Add-Ins (mid 2003) • Fat • 3rd parties only • MS Reader does not support TTS
Tablet PC - TODAY! • Web • … same as desktop PC • Beta 2 has added support for Tablet PC • Virtual keyboard has speech control • Fat • … same as desktop PC • Virtual keyboard has speech control • MS Reader should be able to support TTS • Digital Ink is currently more compelling to MS
VoiceXml • XML-based language • Declarative – XML tags, grammars • Procedural – Javascript • Telephony Gateway is the client • Event driven – Bargein, Goodbye • Object oriented – Properties
Usage • Input • Speech Recognition (Command and Control) • DTMF • Voice recording and posting to a server • Output • Text-To-Speech • Prerecorded audio files • Telephony control • Hang-up, Transfers, …
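As a rough illustration of the points on the two slides above, the sketch below generates a minimal VoiceXml 2.0 document: declarative tags, a barge-in property, a field that accepts speech or DTMF, a TTS prompt, and an ECMAScript expression in the confirmation. It is written in Python purely for illustration (the talk itself targets ASP .NET); the function name `build_menu_vxml` and the page `handler.aspx` are hypothetical, and a real gateway may expect more than this.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML 2.0 namespace


def _q(tag):
    """Qualify a tag name with the VoiceXML namespace."""
    return f"{{{VXML_NS}}}{tag}"


def build_menu_vxml(choices, submit_url):
    """Return a one-field VoiceXML 2.0 menu as a string.

    choices: spoken options; each one also gets a DTMF key (1, 2, ...).
    submit_url: hypothetical server page that receives the result.
    """
    if not 1 <= len(choices) <= 9:
        raise ValueError("this sketch maps choices to DTMF keys 1-9")
    ET.register_namespace("", VXML_NS)
    vxml = ET.Element(_q("vxml"), {"version": "2.0"})
    # Event-driven behaviour: let the caller interrupt prompts (barge-in).
    ET.SubElement(vxml, _q("property"), {"name": "bargein", "value": "true"})
    form = ET.SubElement(vxml, _q("form"), {"id": "menu"})
    field = ET.SubElement(form, _q("field"), {"name": "choice"})
    # Output: a text-to-speech prompt.
    prompt = ET.SubElement(field, _q("prompt"))
    prompt.text = "Please say " + ", or ".join(choices) + "."
    # Input: each <option> accepts the spoken word or its DTMF key.
    for key, choice in enumerate(choices, start=1):
        option = ET.SubElement(
            field, _q("option"), {"dtmf": str(key), "value": choice}
        )
        option.text = choice
    # Runs once the field is filled; "choice" is an ECMAScript expression.
    filled = ET.SubElement(field, _q("filled"))
    confirm = ET.SubElement(filled, _q("prompt"))
    confirm.text = "You chose "
    value = ET.SubElement(confirm, _q("value"), {"expr": "choice"})
    value.tail = "."
    ET.SubElement(
        filled, _q("submit"), {"next": submit_url, "namelist": "choice"}
    )
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_menu_vxml(["weather", "sports"], "handler.aspx"))
```

The telephony gateway, acting as the client, would fetch a document like this over HTTP, play the prompt, and post the recognized value back to the server page.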
VoiceXml • DEMO • /vxml (VS.NET) • Mobile ADK (menu1.aspx) • BeVocal
VoiceXml - SALT • VoiceXml : ??? :: SALT : Speech .NET • Nuance has some WYSIWYG • SALT is considered lightweight compared to VoiceXml • SALT was submitted to W3C August 2002 • VoiceXml is v2.0 in W3C • Mandatory W3C grammar spec • Beta 2 Speech .NET has moved to W3C SRGS • VoiceXml has complementary specs (ccXml) • VoiceXml is moving to MultiModal as well
VoiceXml - SALT • VoiceXml = AT&T, Motorola, TellMe, (IBM) • SALT = MS, SpeechWorks, Intel, (BeVocal) • VoiceXml has multiple vendor support with venture capital from before the bubble burst • Most vendors will support both specs • VoiceXml has ~ 15,000 developers • SALT has potentially millions
SALT • I have not read the new spec • I remember mentally mapping it to VoiceXml when reading an early spec • Why • Common spec for MultiModal operation • Multiple modes of interaction with the same syntax • Speech enabling existing sites • Why not VoiceXml • A MultiModal retrofit is harder than a redo
Speech .NET • MS implementation of SALT • (VoiceWebSolutions + DreamWeaver MX) • Some Beta 1 Speech .NET apps still work because SALT has not changed much, although the Speech .NET Beta 2 controls have • VoiceXml is not as portable between vendors as it should be; the Speech .NET controls could help mitigate this for SALT • i.e. a layer of abstraction for the voice browser wars
Code • Creating static grammars and prompts • Very little server-side code • Only dynamic grammars / prompts • Server-side code mods to better support speech • Mainly setting properties on Speech controls and tying to client-side javascript • Tie javascript to mouse-click events to avoid redundant code
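The dynamic grammars mentioned above are the one place where server-side code is unavoidable: the list of recognizable phrases is only known at request time. The sketch below shows one way such a grammar could be emitted in the W3C SRGS XML form. It is plain Python for illustration, not the Speech .NET API; the function name `build_srgs_grammar` and the sample phrases are assumptions.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"  # W3C SRGS namespace
XML_NS = "http://www.w3.org/XML/1998/namespace"  # for xml:lang


def build_srgs_grammar(rule_id, phrases, lang="en-US"):
    """Return an SRGS XML grammar with one public rule listing `phrases`."""
    # Drop blank entries; a <one-of> needs at least one alternative.
    items = [p.strip() for p in phrases if p and p.strip()]
    if not items:
        raise ValueError("a grammar needs at least one phrase")
    ET.register_namespace("", SRGS_NS)
    grammar = ET.Element(
        f"{{{SRGS_NS}}}grammar",
        {
            "version": "1.0",
            "mode": "voice",
            "root": rule_id,
            f"{{{XML_NS}}}lang": lang,
        },
    )
    rule = ET.SubElement(
        grammar, f"{{{SRGS_NS}}}rule", {"id": rule_id, "scope": "public"}
    )
    one_of = ET.SubElement(rule, f"{{{SRGS_NS}}}one-of")
    for phrase in items:
        item = ET.SubElement(one_of, f"{{{SRGS_NS}}}item")
        item.text = phrase
    return ET.tostring(grammar, encoding="unicode")


if __name__ == "__main__":
    # e.g. phrases pulled from a database at request time (hypothetical data)
    print(build_srgs_grammar("city", ["Pittsburgh", "Kansas City"]))
```

Static grammars, by contrast, can be authored once with the grammar tools and served as files, which is why so little server-side code is needed for them.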
Impression • Separate app layers to reduce complexity • Voice UI will be less functional, so design is key • Learning low level SALT might be easier than high level Speech .NET controls • Application controls change this in Beta 2 • Speech .NET has a great debugger (now server side too), plus grammar and prompt tools • Speech Control Editor was needed for dev • IE Audio meter was needed for MultiModal • MultiModal has some time to grow
Speech .NET • DEMO • Speech .NET Beta 2 (VS .NET) • /noHands (VoiceOnly web app)
Industry • Wrote 1st VoiceXml article a year ago • Received 1st proposal request last month • 1 other proposal request since then • Wrote 1st Speech .NET article 5 months ago • Request for an article from MSDN magazine
Voice Recognition • PSTN is less secure than Internet! • More accessible and easier to automate a hack • Traditionally spoken password OR DTMF pin, also # • Clients always confuse it with speech recognition • Not a part of VoiceXml or SALT specs • Telephony gateways have proprietary implementations • Not useful for identifying somebody • Useful for confirming somebody is who they say they are • Voiceprints have to change when the device changes
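The distinction between identifying somebody and confirming a claimed identity can be sketched as a one-to-one comparison: the caller first claims an identity (for example with a DTMF pin), and the sample is compared only against that person's enrolled voiceprint. The code below is a toy illustration, assuming voiceprints are plain feature vectors compared by cosine similarity; real gateways use proprietary features and scoring, and the names `verify` and `enrolled` and the threshold value are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify(claimed_id, sample, enrolled, threshold=0.9):
    """Confirm a claimed identity (1:1), not search for one (1:N).

    enrolled: dict mapping a user id to that user's stored voiceprint.
    threshold: assumed acceptance score; a real system would tune this.
    """
    reference = enrolled.get(claimed_id)
    if reference is None:
        return False  # nobody enrolled under the claimed identity
    return cosine_similarity(sample, reference) >= threshold


if __name__ == "__main__":
    enrolled = {"alice": [1.0, 0.0, 0.5]}  # made-up voiceprint
    print(verify("alice", [1.0, 0.1, 0.5], enrolled))
    print(verify("alice", [0.0, 1.0, 0.0], enrolled))
```

Because the stored voiceprint is tied to the channel it was recorded on, a caller who enrolls on one device and calls from another may score below the threshold, which is the re-enrollment problem noted in the last bullet.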
Future (MS Speech) • SALT Telephony gateways • Speech .NET (VoiceOnly then MultiModal) • Pocket IE Speech Add-In • .NET Fat-client Speech APIs • Desktop / Tablet / PPC • MS or 3rd party VS .NET VoiceXml controls • Possibility for Speech .NET controls to render both SALT and VoiceXml
Future • Lots of W3C Voice specs … • VoiceXml MultiModal browser • Auto (hands-free, navigation, radio) • 3G (bridge voice and wireless web) • offload Speech processing • VOIP or PSTN • Pocket PC Phone Edition / SmartPhones • IBM recently announced a chip for Speech on mobile devices