Listener-Control Navigation of VoiceXML
Nuance Speech Analysis • 92% of customer service is conducted by phone. • 84% of industrialists believe speech is better than the web.
History of VoiceXML • AT&T (’95): PML • Bell/Lucent (’98): PML • IBM (’98): SpeechML • HP (’98): TalkML • Motorola (’98): VoxML • VoiceXML Forum (’00): VoiceXML 1.0 • W3C (’02): VoiceXML 2.0
VoiceXML • Open standard language for serving voice/audio documents. • Designed for creating audio dialogs that feature: synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed-initiative conversations.
VoiceXML (Cont’d) • Allows scripts, CGI programs, etc. • Can take input from the listener via speech (filling out forms as in HTML). • Used extensively for automated call handling. • Makes information accessible over (cell) phones. • The next revolution on the Web.
Goals of VoiceXML • Bring web-based development and content delivery to voice response applications. • Minimize client/server interactions. • Separate user-interaction code from service logic. • Shield application authors from platform-specific details.
Voice Browser • Software platform running on a network server. • It supports the following features: • ASR (automatic speech recognition) • DTMF (touch-tone) input • Recognition grammars • Mixed-initiative dialog • TTS (text-to-speech) • Analogy: a voice browser is to VoiceXML what a web browser is to HTML.
Sample VoiceXML Code
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <field name="rich">
      <!-- Inline GSL grammar: accepts "yes" or "no" -->
      <grammar type="application/x-gsl" mode="voice">
        <![CDATA[
          [
            [(yes)]{<option "yes">}
            [(no)]{<option "no">}
          ]
        ]]>
      </grammar>
      <prompt>Would you like to get rich quick?</prompt>
      <!-- Runs once the field has been filled by a recognized answer -->
      <filled>
        Gotcha.
        <if cond="rich=='yes'">
          You want to be rich!
          <goto next="rich.vxml"/>
        <else/>
          You don't want to be rich.
          <goto next="poor.vxml"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>
Problem with VoiceXML • Navigation of the voice document. • The author has to ask where the listener would like to go next. • The listener has no control over navigation. • Tedious; advanced applications are not possible. • Analogy: scroll vs. book.
Solution • Allow users to control navigation interactively. • Using Voice Anchors.
Voice Anchors • Speech labels that listeners can place on a dialog. • The listener can return to that dialog later by uttering the label. • Hard to implement, as free-form speech recognition is not possible. • Need to be incorporated into the voice browser.
Voice Anchors (Cont’d) • We developed a number of methods for attaching voice anchors. • Most practical method: spelling the label. • Alternative: anchor as a whole word. • Default anchors. • Default navigation strategies.
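The place/recall idea with spelled labels can be sketched roughly as follows. This is a minimal illustration, not the implementation described in the slides: the class `AnchorStore`, its method names, and the dialog id `dialog_chlorine` are all hypothetical, and the recognizer is assumed to return one letter per utterance.

```python
class AnchorStore:
    """Maps spelled labels to dialog ids (hypothetical helper for illustration)."""

    def __init__(self):
        self._anchors = {}

    @staticmethod
    def label_from_spelling(letters):
        # Assumption: the recognizer yields one letter per utterance, e.g. ["C", "L"].
        return "".join(letter.strip().lower() for letter in letters)

    def place(self, letters, dialog_id):
        # Attach the spelled label to the dialog the listener is currently hearing.
        label = self.label_from_spelling(letters)
        self._anchors[label] = dialog_id
        return label

    def recall(self, letters):
        # Return the anchored dialog id, or None if the label was never placed.
        return self._anchors.get(self.label_from_spelling(letters))


store = AnchorStore()
store.place(["C", "L"], "dialog_chlorine")
print(store.recall(["c", "l"]))
```

Spelling keeps the recognition problem small: the grammar only has to cover the letters of the alphabet rather than free-form speech.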
[Diagram: Initial VXML → Converter → Augmented VXML → Voice browser. Other labels: Place Anchors (creates a DB file), Recall Anchor, DB file, New VXML.]
Cumulative Anchors • Different dialogs can be marked with the same label. • Recalling the label reads out the corresponding dialogs. • A single document can contain multiple cumulative anchors. • Allows creation of subdocuments. • A hierarchy of subdocuments can be created.
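A cumulative anchor can be pictured as a label that maps to an ordered list of dialogs rather than a single one. The sketch below assumes that reading; the class `CumulativeAnchors` and the dialog ids are made up for illustration and are not taken from the system in the slides.

```python
from collections import defaultdict


class CumulativeAnchors:
    """One label may mark many dialogs; recalling it yields them in marking order."""

    def __init__(self):
        self._dialogs = defaultdict(list)

    def mark(self, label, dialog_id):
        # Marking the same dialog twice under one label is ignored.
        if dialog_id not in self._dialogs[label]:
            self._dialogs[label].append(dialog_id)

    def recall(self, label):
        # The returned list is the "subdocument" read out for this label.
        return list(self._dialogs.get(label, []))


anchors = CumulativeAnchors()
anchors.mark("halogens", "dialog_3")
anchors.mark("halogens", "dialog_7")
anchors.mark("metals", "dialog_5")
print(anchors.recall("halogens"))
```

Each label then acts as a subdocument made of the dialogs marked with it.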
Grammar • A set of valid expressions. • Each dialog references one or more grammars. • Written in the Nuance Grammar Specification Language (GSL). • Grammars can be inline or offline. • Offline grammars provide the following advantages: • Can be generated dynamically (via CGI scripts or ASP pages). • Can be reused by multiple dialogs or applications. • Can be updated and modified without changing the source code. • Subgrammars and form-level grammars.
Sample Grammar Code
<grammar type="application/x-gsl" mode="voice">
  <![CDATA[
    [
      [(skip)]{<option "skip">}
      [(previous)]{<option "previous">}
      [(place anchor) (call mark) (begin mark)]{<option "mark">}
      [(recall mark) (recall anchor) (recall)]{<option "recall">}
    ]
  ]]>
</grammar>
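Since an offline grammar can be generated dynamically, the body of a grammar like the sample above could be produced from a table of commands. The following is a sketch only: `build_gsl_grammar` is a hypothetical helper, and the output simply mirrors the rule layout used in the sample rather than covering GSL in general.

```python
def build_gsl_grammar(commands):
    """Build a GSL grammar body from {option value: [spoken phrases]}.

    Each entry becomes one rule of the form
    [(phrase one) (phrase two)]{<option "value">}
    """
    rules = []
    for option, phrases in commands.items():
        alternatives = " ".join(f"({phrase})" for phrase in phrases)
        rules.append(f'[{alternatives}]{{<option "{option}">}}')
    return "[\n  " + "\n  ".join(rules) + "\n]"


navigation = {
    "skip": ["skip"],
    "mark": ["place anchor", "call mark", "begin mark"],
}
print(build_gsl_grammar(navigation))
```

A server-side script could return this text inside a CDATA section, so new navigation commands can be added without touching the VoiceXML source.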
[Diagram: Initial HTML → Translator → Initial VXML → Converter → Augmented VXML → Voice browser. Other labels: Get the HTML page; Reference to another link in Augmented VXML.]
Applications: Web access through voice • This involves the following sequence of steps: • HTML -> VXML (a translator written in Java had already been developed). • Navigation of VXML.
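To make the HTML -> VXML step concrete, here is a deliberately small sketch. It is not the Java translator mentioned above: it is a Python stand-in that only handles `<p>` elements, and the names `ParagraphCollector` and `html_to_vxml` as well as the `p1`, `p2`, ... block names are assumptions made for illustration.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape


class ParagraphCollector(HTMLParser):
    """Collects the text content of each <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            # Collapse runs of whitespace so the prompt reads cleanly.
            text = " ".join("".join(self._chunks).split())
            if text:
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._chunks.append(data)


def html_to_vxml(html):
    """Emit one form whose blocks read the paragraphs in document order."""
    collector = ParagraphCollector()
    collector.feed(html)
    collector.close()
    blocks = [
        f'    <block name="p{i}"><prompt>{escape(text)}</prompt></block>'
        for i, text in enumerate(collector.paragraphs, start=1)
    ]
    return (
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">\n'
        "  <form>\n" + "\n".join(blocks) + "\n  </form>\n</vxml>"
    )


print(html_to_vxml("<p>Hello &amp; welcome</p><p>Goodbye</p>"))
```

Giving each paragraph its own named block leaves a natural unit that a navigation layer could later skip over or anchor.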
Applications: Mathematics for the visually impaired • This involves the following steps: • MathML -> VXML (a translator was developed to convert MathML documents to VXML documents using XSLT). • Navigation of VXML.
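The MathML -> VXML step amounts to walking the MathML tree and emitting speakable text. The translator in the slides uses XSLT; the sketch below swaps in a plain Python tree walk to show the idea. The spoken wording ("over", "to the power") and the function names are assumptions, and only a handful of presentation elements are handled.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Assumed spoken forms for a few operators; anything else is read as written.
OPERATOR_WORDS = {"+": "plus", "-": "minus", "=": "equals"}


def local_name(tag):
    # Drop an XML namespace prefix such as "{http://www.w3.org/1998/Math/MathML}".
    return tag.rsplit("}", 1)[-1]


def speak(node):
    name = local_name(node.tag)
    children = list(node)
    if name in ("mi", "mn"):
        return (node.text or "").strip()
    if name == "mo":
        symbol = (node.text or "").strip()
        return OPERATOR_WORDS.get(symbol, symbol)
    if name == "mfrac" and len(children) == 2:
        return f"{speak(children[0])} over {speak(children[1])}"
    if name == "msup" and len(children) == 2:
        return f"{speak(children[0])} to the power {speak(children[1])}"
    # Containers such as math and mrow (and any unhandled element):
    # read the children in order.
    return " ".join(part for part in map(speak, children) if part)


def mathml_to_prompt(mathml):
    return f"<prompt>{escape(speak(ET.fromstring(mathml)))}</prompt>"


sample = "<math><mrow><msup><mi>x</mi><mn>2</mn></msup><mo>+</mo><mn>1</mn></mrow></math>"
print(mathml_to_prompt(sample))
```

Because each MathML element maps to its own spoken fragment, subexpressions are natural units for navigation.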
Conclusion & Future work • Designing default navigation strategies. • Unit of division for navigation. • Voice Scripting Languages. • Example: “repeat chlorine until exit”.