VoiceXML
Nuance Speech Analysis • 92% of customer service takes place over the phone. • 84% of industrialists believe speech is better than the web.
History of VoiceXML • AT&T (’95): PML • Bell/Lucent (’98): PML • IBM (’98): SpeechML • HP (’98): TalkML • Motorola (’98): VoxML • VoiceXML Forum (’00): VoiceXML 1.0 • W3C (’02): VoiceXML 2.0
VoiceXML • Open, standard language for serving voice/audio documents. • VoiceXML is designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed-initiative conversations.
VoiceXML (Cont’d) • VoiceXML documents can work with scripts, CGI programs, etc. • Can take input from the listener via speech (filling out forms, as in HTML). • Used extensively for automated call handling. • Makes information accessible over (cell) phones. • The next revolution on the Web.
Goals of VoiceXML • Bring web-based development and content delivery to voice response applications. • Minimize client/server interactions. • Separate user-interaction code from service logic. • Shield application authors from platform-specific details.
Voice Browser • Software platform running on a network server. • It supports the following features: ASR (automatic speech recognition), DTMF (touch-tone) input, recognition grammars, mixed-initiative dialog, and TTS (text-to-speech). • Analogy: a voice browser is to VoiceXML what a web browser is to HTML.
Sample VoiceXML Code
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <!-- Boolean field: filled by a spoken or keyed yes/no answer -->
    <field name="rich" type="boolean">
      <prompt>Would you like to get rich quick?</prompt>
      <filled>
        Gotcha.
        <if cond="rich">
          You want to be rich!
          <goto next="rich.vxml" />
        <else />
          You don't want to be rich.
          <goto next="poor.vxml" />
        </if>
      </filled>
    </field>
  </form>
</vxml>
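Since VoiceXML documents can be produced by server-side scripts, the sample above could also be generated on the fly. The following is a minimal sketch in Python using only the standard library; the function name `build_yes_no_form` and its parameters are hypothetical and chosen for illustration, and the spoken feedback text inside `<filled>` is left out for brevity.

```python
import xml.etree.ElementTree as ET

VXML_NAMESPACE = "http://www.w3.org/2001/vxml"


def build_yes_no_form(field_name, question, yes_target, no_target):
    """Return a one-field VoiceXML 2.0 document as a string (illustrative helper)."""
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NAMESPACE})
    form = ET.SubElement(vxml, "form")
    field = ET.SubElement(form, "field", {"name": field_name, "type": "boolean"})
    ET.SubElement(field, "prompt").text = question

    # <filled> runs once the field has a value; branch on that value.
    filled = ET.SubElement(field, "filled")
    branch = ET.SubElement(filled, "if", {"cond": field_name})
    ET.SubElement(branch, "goto", {"next": yes_target})
    ET.SubElement(branch, "else")
    ET.SubElement(branch, "goto", {"next": no_target})
    return ET.tostring(vxml, encoding="unicode")


document = build_yes_no_form(
    "rich", "Would you like to get rich quick?", "rich.vxml", "poor.vxml"
)
print(document)
```

A CGI-style script would send this string to the voice browser as the response body, in the same way a web application returns generated HTML.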
Problem with VoiceXML • Navigation of the voice document. • The author has to ask where the listener would like to go next. • The listener has absolutely no control over navigation. • Tedious; advanced applications are not possible. • Analogy: scroll vs. book.
Voice Anchors • Speech labels that listeners can place on a dialog. • The listener can return to that dialog later by uttering that label. • Hard to implement, as free-form speech recognition is not possible. • Must be incorporated into the voice browser.
Voice Anchors (Cont’d) • We developed a number of methods for attaching voice anchors. • Most practical method: spelling the label. • Anchor as a whole word. • Default anchors. • Default navigation strategies.
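The bookkeeping behind a spelled voice anchor can be sketched as follows. This is an illustration only, not the implementation described in the slides: the class `AnchorTable`, its method names, and the dialog identifiers are all hypothetical, and the recognizer is assumed to deliver the label as a list of letters.

```python
class AnchorTable:
    """Maps spelled labels to the dialog on which the listener placed them."""

    def __init__(self):
        self._anchors = {}

    @staticmethod
    def label_from_spelling(letters):
        # Join the recognized letters into one normalized label.
        return "".join(letter.strip().lower() for letter in letters)

    def place(self, letters, dialog_id):
        """Attach a spelled label to the dialog currently being played."""
        label = self.label_from_spelling(letters)
        self._anchors[label] = dialog_id
        return label

    def recall(self, letters):
        """Return the dialog marked with this label, or None if unknown."""
        return self._anchors.get(self.label_from_spelling(letters))


table = AnchorTable()
table.place(["N", "e", "w", "s"], "headlines_dialog")
print(table.recall(["n", "e", "w", "s"]))
```

Spelling sidesteps free-form recognition because the recognizer only needs a grammar of letters rather than an open vocabulary.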
Cumulative Anchors • Different dialogs can be marked with the same label. • Recalling the label reads out the corresponding dialogs. • Multiple cumulative anchors in a single document.
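One way to picture cumulative anchors is a mapping from each label to a list of dialogs rather than to a single dialog. The sketch below assumes hypothetical names (`CumulativeAnchors`, `mark`, `recall`) and made-up dialog identifiers; it is not taken from the original system.

```python
from collections import defaultdict


class CumulativeAnchors:
    """Lets several dialogs share one label; recall returns them in marking order."""

    def __init__(self):
        self._dialogs_by_label = defaultdict(list)

    def mark(self, label, dialog_id):
        dialogs = self._dialogs_by_label[label]
        if dialog_id not in dialogs:  # avoid reading the same dialog twice
            dialogs.append(dialog_id)

    def recall(self, label):
        # Use .get so that recalling an unknown label does not create an entry.
        return list(self._dialogs_by_label.get(label, []))


anchors = CumulativeAnchors()
anchors.mark("todo", "chapter_2_summary")
anchors.mark("todo", "chapter_5_exercise")
anchors.mark("review", "chapter_1_intro")
print(anchors.recall("todo"))
```

On recall, the voice browser would read out each returned dialog in turn.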
Grammar • Set of valid expressions. • Each dialog references one or more grammars. • Nuance Grammar Specification Language (GSL). • Inline grammar and offline grammar. • An offline grammar provides the following advantages: it can be generated dynamically (via CGI or ASP scripts), reused by multiple dialogs or applications, and updated or modified without changing the source code. • Subgrammars and form-level grammar.
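Dynamic generation of an offline grammar can be sketched as a small function that turns a table of commands into grammar text. The output shape is modeled on the sample grammar in this deck and is not a complete treatment of GSL; the function name `build_gsl_grammar` and the default slot name are assumptions made for illustration.

```python
def build_gsl_grammar(options, slot="option"):
    """Render {slot value: [phrases]} as a GSL-style list of alternatives."""
    lines = ["["]
    for value, phrases in options.items():
        # Each phrase becomes a parenthesized word sequence; the bracket
        # groups the alternatives that all fill the slot with one value.
        alternatives = " ".join(f"({phrase})" for phrase in phrases)
        lines.append(f'  [{alternatives}]{{<{slot} "{value}">}}')
    lines.append("]")
    return "\n".join(lines)


commands = {
    "skip": ["skip"],
    "mark": ["place anchor", "call mark", "begin mark"],
}
print(build_gsl_grammar(commands))
```

Because the grammar is built from data, a script can add or rename commands without touching the VoiceXML documents that reference it.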
Sample Grammar Code
<grammar type="application/x-gsl" mode="voice">
<![CDATA[
[
  [(skip)]{<option "skip">}
  [(previous)]{<option "previous">}
  [(place anchor) (call mark) (begin mark)]{<option "mark">}
  [(recall mark) (recall anchor) (recall)]{<option "recall">}
]
]]>
</grammar>
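The effect of this grammar is to map several spoken phrases onto one value of the `option` slot. The sketch below mirrors that mapping by hand in Python to show what the dialog receives after recognition; it does not parse GSL, and the names `PHRASES_BY_OPTION` and `interpret` are hypothetical.

```python
# Hand-written mirror of the sample grammar: slot value -> accepted phrases.
PHRASES_BY_OPTION = {
    "skip": ["skip"],
    "previous": ["previous"],
    "mark": ["place anchor", "call mark", "begin mark"],
    "recall": ["recall mark", "recall anchor", "recall"],
}

# Invert the table once so that lookup by phrase is direct.
OPTION_BY_PHRASE = {
    phrase: option
    for option, phrases in PHRASES_BY_OPTION.items()
    for phrase in phrases
}


def interpret(utterance):
    """Return the slot value for a recognized utterance, or None if no match."""
    normalized = " ".join(utterance.lower().split())
    return OPTION_BY_PHRASE.get(normalized)


print(interpret("Place  anchor"))
print(interpret("recall"))
print(interpret("hello"))
```

An utterance outside the grammar yields no slot value, which corresponds to the voice browser reporting a no-match.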
Applications • The Voice Web. • Talking books. • Mathematics for the visually impaired. • Hazardous material emergency response.