Introduction to VoiceXML 2.0 • Rob Marchand, Director of Product Management, VoiceGenie Technologies Inc.
Introduction to VoiceXML • Audience • Managers and programmers with little experience with VoiceXML • Attendees will learn • The basic principles of VoiceXML, • Just enough syntax to design and code simple speech applications requiring voice menus and voice forms.
VoiceXML in the Marketplace • VoiceXML 2.0 is now ratified as a Recommendation (i.e., an official standard) by the W3C • Hundreds of millions of VoiceXML calls are answered every day • VoiceXML is the standard for building speech-enabled applications
W3C and VoiceXML Forum • W3C manages the technical evolution and development of the VoiceXML language • VoiceXML Forum focuses on providing best practices, certification testing, resources and tools • Together the W3C and VoiceXML Forum accelerate the adoption of VoiceXML-based speech applications
Outline • Motivation for VoiceXML • W3C Speech Interface Framework Languages • Dialog—VoiceXML 2.0 • Speech Synthesis—SSML • Grammars—SRGS • Semantic Interpretation—SI • Call Control
Motivation for Speech Applications • Users access Web sites from any telephone, anywhere, any time. • Speaking and listening are the natural usage modes for phones.
Speech-enabled Applications Are Possible Now • Increased computing power at less expense • Due to improved chip design and manufacturing techniques • Improved speech recognition • Due to refinements to basic speech recognition algorithms • Improved dialog design using voice • Minimizes the number of words and phrases that the speech recognizer must process at any point during the dialog
Strength of VoiceXML Applications • Traditional system-directed dialogs for novice users • Mixed initiative dialogs for experienced users • Novice users smoothly become experienced users at their own pace
Limitations of VoiceXML Applications • No special analysis of speech input • Not suitable for training speech skills—Reading, ESL, singing, etc. • VUI conversational bandwidth is slower than GUI conversational bandwidth • Using a VUI is like drinking from Lake Superior with a straw
Exercise 1 • Name or describe a speech application you could use at work. • Name or describe a speech application you or a family member could use at home.
XML
• XML = eXtensible Markup Language
• Elements are surrounded by tags
  <prompt> Welcome to the voice system </prompt>
• Elements may be nested
  <prompt> Welcome to Ajax Travel <break/> we have the cheapest fares </prompt>
• Elements may have attributes
  <choice next="#boat">
  <grammar type="application/srgs+xml" version="1.0" root="by_boat" src="boat.grxml">
• Because "<", ">", and "&" have special meanings in XML, write
  • "&lt;" in place of "<"
  • "&gt;" in place of ">"
  • "&amp;" in place of "&"
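The escaping rule above can be sketched in a small VoiceXML document. This is only an illustration: the company name and the logged text are made up, and `<log>` is used simply as a place to put text that contains special characters.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <!-- A literal ampersand in text must be written as an entity -->
      <prompt>Welcome to Smith &amp; Jones Travel.</prompt>
      <!-- Literal less-than and greater-than signs are escaped the same way -->
      <log>fare &lt; 100 and miles &gt; 500</log>
    </block>
  </form>
</vxml>
```

An XML parser hands the application the unescaped text, so the caller hears "Smith and Jones" style output rather than the entity name.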
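[Figure: architecture for web and voice access to the same back end. Server side: Web Server with Documents, Multimedia Files, HTML Scripts, VoiceXML Scripts, Grammars, Audio Files, and a Database Server (DB). Client side: Web Browser; Voice Browser on a Speech Server/Gateway with ASR, DTMF, TTS, Capture Voice, and Replay Audio.]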
W3C Speech Interface Framework [Figure: framework components: VoiceXML 2.0, Speech Synthesis, Grammar, Semantic Interpretation, Call Control, Other.]
Status of W3C Speech Interface Languages [Figure: W3C maturity ladder. Stages, from highest to lowest: Recommendation, Proposed Recommendation, Candidate Recommendation, Last Call Working Draft, Working Draft, Requirements. Languages plotted: VoiceXML 2.0, Grammar, Synthesis, Semantic Interpretation, Call Control, VoiceXML 2.1, V3, PLS.]
VoiceXML 2.0 Fragment
  <?xml version="1.0"?>
  <vxml version="2.0">
    <form>
      …
      <field>
        <prompt>
          Which account <break/>
          <emphasis> savings </emphasis> or <emphasis> checking </emphasis>
        </prompt>
        <grammar type="application/srgs+xml" root="account_type" mode="voice">
          <rule id="account_type">
            <one-of>
              <item> savings </item>
              <item> checking </item>
              <item> CD </item>
              <item> certificate of deposit <tag>$ = "CD"</tag> </item>
            </one-of>
          </rule>
        </grammar>
      </field>
      …
    </form>
    …
  </vxml>
The fragment combines four languages:
• Dialog Language (VoiceXML 2.0): <vxml>, <form>, <field>, <prompt>
• Speech Synthesis Markup Language (SSML): <break/>, <emphasis>
• Speech Recognition Grammar Specification (SRGS): <grammar>, <rule>, <one-of>, <item>
• Semantic Interpretation (SI): <tag>
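For reference, here is a sketch of the same idea as a complete document rather than a fragment. The form id, the field name `account`, and the `<filled>` read-back are illustrative additions, not part of the slide. The `<tag>` content copies the slide's syntax, which may differ between platforms and versions of the Semantic Interpretation draft.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="choose_account">
    <!-- VoiceXML: a field collects one value; "account" is a made-up name -->
    <field name="account">
      <!-- SSML: break and emphasis shape how the prompt is spoken -->
      <prompt>
        Which account <break/>
        <emphasis>savings</emphasis> or <emphasis>checking</emphasis>
      </prompt>
      <!-- SRGS: the phrases the recognizer accepts at this point -->
      <grammar type="application/srgs+xml" version="1.0"
               root="account_type" mode="voice">
        <rule id="account_type">
          <one-of>
            <item>savings</item>
            <item>checking</item>
            <item>CD</item>
            <!-- SI: map the long phrase onto the same value as "CD" -->
            <item>certificate of deposit <tag>$ = "CD"</tag></item>
          </one-of>
        </rule>
      </grammar>
      <!-- VoiceXML: runs once the field has a value -->
      <filled>
        <prompt>You chose <value expr="account"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```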
VoiceXML 2.0 features
• Menus, forms, sub-dialogs: <menu>, <form>, <subdialog>
• Inputs
  • Speech recognition: <grammar>
  • Recording: <record>
  • Keypad: <grammar mode="dtmf">
• Output
  • Audio files: <audio>
  • Text-to-speech: <prompt>
• Variables: <var>, <script>, <assign>
• Events: <nomatch>, <noinput>, <help>, <catch>, <throw>
• Transition and submission: <goto>, <submit>
• Telephony
  • Connection control: <transfer>, <disconnect>
  • Telephony information
• Platform
  • Objects
• Performance
  • Fetch
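Several of the features listed above can be seen together in one short document. The following is a minimal sketch, not taken from the slides: the form id, the variable name `attempts`, and the timing values are assumptions chosen for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Variables: a document-scoped counter (hypothetical name) -->
  <var name="attempts" expr="0"/>
  <form id="leave_message">
    <!-- Input: record the caller instead of recognizing speech -->
    <record name="message" beep="true" maxtime="30s" finalsilence="3s">
      <prompt>Please leave a message after the beep.</prompt>
      <!-- Events: the caller said nothing -->
      <noinput>
        <assign name="attempts" expr="attempts + 1"/>
        <prompt>I did not hear anything.</prompt>
        <reprompt/>
      </noinput>
    </record>
    <block>
      <!-- Output: play the recording back -->
      <prompt>Your message is <audio expr="message"/></prompt>
      <!-- Telephony: end the call -->
      <disconnect/>
    </block>
  </form>
</vxml>
```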
A Typical Voice Menu
  <menu>
    <prompt>
      <audio src="http://www.ajax.com/three_blind_mice.wav"/>
      Do you want to listen, next, prior, buy, or exit?
    </prompt>
    <choice next="http://www.ajax.com/listen.vxml"> listen </choice>
    <choice next="http://www.ajax.com/next.vxml"> next </choice>
    <choice next="http://www.ajax.com/prior.vxml"> prior </choice>
    <choice next="http://www.ajax.com/buy.vxml"> buy </choice>
    <choice next="http://www.ajax.com/exit.vxml"> exit </choice>
  </menu>
Exercise 2: Write a menu that asks the user a "yes/no" question to confirm that the user wants to buy the audio "three blind mice".
Answer to Exercise 2: A "yes/no" menu
  <menu>
    <prompt> Do you want to buy three blind mice now? </prompt>
    <choice next="http://www.ajax.com/yes.vxml"> yes </choice>
    <choice next="http://www.ajax.com/no.vxml"> no </choice>
  </menu>
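A menu can also accept keypad input. The sketch below extends the yes/no menu with `dtmf="true"`, which assigns the keys 1, 2, and so on to the choices in order, and uses `<enumerate>` to speak the choices automatically. The menu id and the wording of the prompts are assumptions for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- dtmf="true": choices without an explicit key get 1, 2, ... in order -->
  <menu id="confirm_purchase" dtmf="true">
    <prompt>
      Do you want to buy three blind mice now?
      <!-- _prompt is the choice text, _dtmf is its assigned key -->
      <enumerate>
        For <value expr="_prompt"/>, press <value expr="_dtmf"/>.
      </enumerate>
    </prompt>
    <choice next="http://www.ajax.com/yes.vxml"> yes </choice>
    <choice next="http://www.ajax.com/no.vxml"> no </choice>
    <noinput>
      <prompt>Please say yes or no, or press 1 or 2.</prompt>
    </noinput>
  </menu>
</vxml>
```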
Typical Form Fill-In
  <form>
    <block>
      <prompt> Welcome to the electronic payment system. </prompt>
    </block>
    <field name="card_number">
      <prompt> Please enter your credit card number. </prompt>
      <grammar src="http://www.ajax.com/credit_card_number.grxml"/>
    </field>
    <field name="date">
      <prompt> Please enter your expiration date. </prompt>
      <grammar src="http://www.ajax.com/credit_card_date.grxml"/>
    </field>
  </form>
Exercise 3: Write a form that solicits the month, day, and year for the user's birth date.
Answer to Exercise 3
  <form>
    <block>
      <prompt> When were you born? </prompt>
    </block>
    <field name="month">
      <prompt> What month? </prompt>
      <grammar src="http://www.ajax.com/month.grxml"/>
    </field>
    <field name="day">
      <prompt> What day of the month? </prompt>
      <grammar src="http://www.ajax.com/day.grxml"/>
    </field>
    <field name="year">
      <prompt> What year? </prompt>
      <grammar src="http://www.ajax.com/year.grxml"/>
    </field>
  </form>
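A form usually does something with the values it collects. As a sketch under stated assumptions, the document below adds a form-level `<filled>` that runs once all three fields have values, reads them back, and submits them. The form id and the submit URL `http://www.ajax.com/birthdate` are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="birth_date">
    <block>
      <prompt>When were you born?</prompt>
    </block>
    <field name="month">
      <prompt>What month?</prompt>
      <grammar src="http://www.ajax.com/month.grxml" type="application/srgs+xml"/>
    </field>
    <field name="day">
      <prompt>What day of the month?</prompt>
      <grammar src="http://www.ajax.com/day.grxml" type="application/srgs+xml"/>
    </field>
    <field name="year">
      <prompt>What year?</prompt>
      <grammar src="http://www.ajax.com/year.grxml" type="application/srgs+xml"/>
    </field>
    <!-- mode="all": run only after every field in namelist is filled -->
    <filled mode="all" namelist="month day year">
      <prompt>
        You said <value expr="month"/> <value expr="day"/>, <value expr="year"/>.
      </prompt>
      <!-- Hypothetical server URL; the three values are sent as form data -->
      <submit next="http://www.ajax.com/birthdate" method="post"
              namelist="month day year"/>
    </filled>
  </form>
</vxml>
```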
Event Handlers
• Deal with exceptional or error conditions
• Control mechanism for dialog turn retries
  • <catch event="noinput"> … </catch>
  • <catch event="nomatch"> … </catch>
  • <catch event="help"> … </catch>
• Shorthand notation available
  • <noinput> … </noinput>, etc.
• Scoped according to where they occur
  • <form>, <field>, etc.
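Retry control is typically done with the `count` attribute, which selects a handler according to how many times the event has occurred in that field. The following is a minimal sketch using the shorthand notation. The prompt wording and the decision to exit after three failures are assumptions for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="birth_month">
    <field name="month">
      <prompt>What month were you born in?</prompt>
      <grammar src="http://www.ajax.com/month.grxml" type="application/srgs+xml"/>
      <!-- First silence: short apology, then replay the field prompt -->
      <noinput count="1">
        <prompt>Sorry, I did not hear you.</prompt>
        <reprompt/>
      </noinput>
      <!-- Second and later silences: give more guidance instead -->
      <noinput count="2">
        <prompt>Please say the name of a month, for example March.</prompt>
      </noinput>
      <!-- Unrecognized input: first and second occurrences -->
      <nomatch count="1">
        <prompt>I did not understand. Which month?</prompt>
      </nomatch>
      <!-- Third unrecognized input: give up politely -->
      <nomatch count="3">
        <prompt>Sorry, I am having trouble understanding. Goodbye.</prompt>
        <exit/>
      </nomatch>
      <help>
        <prompt>Say the month you were born in, such as January.</prompt>
      </help>
    </field>
  </form>
</vxml>
```

A handler with a given `count` stays in effect for higher counts until a handler with a larger `count` takes over, so `count="2"` above also covers the third and later silences.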
Adding Event Handlers
  <form>
    <block>
      <prompt> When were you born? </prompt>
    </block>
    <field name="month">
      <catch event="noinput"> … </catch>
      <catch event="nomatch"> … </catch>
      <prompt> What month? </prompt>
      <grammar src="http://www.ajax.com/month.grxml"/>
    </field>
    …
  </form>
Default Event Handlers
  <catch event="nomatch">
    <prompt> I did not understand, please try again. </prompt>
  </catch>
  <catch event="help">
    <prompt> Sorry, no help is available. </prompt>
  </catch>
  <catch event="noinput">
    <prompt> I did not hear anything, please speak again. </prompt>
  </catch>
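Because handlers are scoped by where they occur, an application can define its own defaults once at document level and override them in individual fields. The sketch below assumes made-up prompt wording and form and field names. The grammar URLs follow the pattern used in the exercises.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Document scope: applies to every field below unless overridden -->
  <catch event="nomatch noinput">
    <prompt>Sorry, I missed that.</prompt>
    <reprompt/>
  </catch>
  <form id="birth_date">
    <field name="month">
      <prompt>What month?</prompt>
      <grammar src="http://www.ajax.com/month.grxml" type="application/srgs+xml"/>
      <!-- Field scope: takes precedence for nomatch in this field only -->
      <nomatch>
        <prompt>Please say a month, such as January.</prompt>
      </nomatch>
    </field>
    <field name="day">
      <!-- No handlers here, so the document-level catch is used -->
      <prompt>What day of the month?</prompt>
      <grammar src="http://www.ajax.com/day.grxml" type="application/srgs+xml"/>
    </field>
  </form>
</vxml>
```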
Exercise 4: Write event handlers for the month field
  <catch event="nomatch">
    <prompt> __________________________ </prompt>
  </catch>
  <catch event="help">
    <prompt> ____________________ </prompt>
  </catch>
  <catch event="noinput">
    <prompt> ___________________________________ </prompt>
  </catch>
Answer to Exercise 4: Event handlers for the month field
  <catch event="nomatch">
    <prompt> Which month, for example, January, February, or March? </prompt>
  </catch>
  <catch event="help">
    <prompt> In what month were you born? </prompt>
  </catch>
  <catch event="noinput">
    <prompt> Say the name of the month you were born in. </prompt>
  </catch>
Speech Synthesis ML
• Processing stages: Structure Analysis → Text Normalization → Text-to-Phoneme Conversion → Prosody Analysis → Waveform Production
• Structure Analysis
  • Markup support: p, s
  • Non-markup behavior: infer structure by automated text analysis
Before and After Structure Analysis
• Before structure analysis
  Dr. Smith lives at 214 Elm Dr. He weighs 214 lb. He plays bass guitar. He also likes to fish; last week he caught a 19 lb. bass.
• After structure analysis
  <p>
    <s> Dr. Smith lives at 214 Elm Dr. </s>
    <s> He weighs 214 lb. </s>
    <s> He plays bass guitar. </s>
    <s> He also likes to fish; last week he caught a 19 lb. bass. </s>
  </p>
Speech Synthesis ML
• Processing stages: Structure Analysis → Text Normalization → Text-to-Phoneme Conversion → Prosody Analysis → Waveform Production
• Text Normalization
  • Markup support: say-as for dates, times, etc.; sub for aliasing
  • Non-markup behavior: automatically identify and convert constructs
After Text Normalization
  <p>
    <s> <sub alias="doctor">Dr.</sub> Smith lives at 214 Elm <sub alias="drive">Dr.</sub> </s>
    <s> He weighs 214 <sub alias="pounds">lb.</sub> </s>
    <s> He plays bass guitar. </s>
    <s> He also likes to fish; last week he caught a 19 <sub alias="pound">lb.</sub> bass. </s>
  </p>
After Text Normalization (adding say-as)
  <p>
    <s> <sub alias="doctor">Dr.</sub> Smith lives at <say-as interpret-as="address">214</say-as> Elm <sub alias="drive">Dr.</sub> </s>
    <s> He weighs <say-as interpret-as="number">214</say-as> <sub alias="pounds">lb.</sub> </s>
    <s> He plays bass guitar. </s>
    <s> He also likes to fish; last week he caught a <say-as interpret-as="number">19</say-as> <sub alias="pound">lb.</sub> bass. </s>
  </p>
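The same normalization hints can be used directly inside a VoiceXML prompt. The sketch below uses a made-up sentence. The `interpret-as` and `format` values shown are common conventions, but SSML 1.0 does not define a fixed list of them, so treat the values as assumptions and check what the target platform supports.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form>
    <block>
      <prompt>
        <!-- sub: speak the alias instead of the written abbreviation -->
        Your appointment with <sub alias="Doctor">Dr.</sub> Lee is on
        <!-- say-as: tell the synthesizer how to read ambiguous text -->
        <say-as interpret-as="date" format="mdy">3/15/2005</say-as>
        at <say-as interpret-as="time">2:30pm</say-as>.
      </prompt>
    </block>
  </form>
</vxml>
```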
Speech Synthesis ML
• Processing stages: Structure Analysis → Text Normalization → Text-to-Phoneme Conversion → Prosody Analysis → Waveform Production
• Text-to-Phoneme Conversion
  • Markup support: phoneme, say-as
  • Non-markup behavior: look up in pronunciation dictionary
After Text-to-Phoneme Conversion
  <p>
    <s> <sub alias="doctor">Dr.</sub> Smith lives at <say-as interpret-as="address">214</say-as> Elm <sub alias="drive">Dr.</sub> </s>
    <s> He weighs <say-as interpret-as="number">214</say-as> <sub alias="pounds">lb.</sub> </s>
    <s> He plays <phoneme alphabet="ipa" ph="beɪs">bass</phoneme> guitar. </s>
    <s> He also likes to fish; last week he caught a <say-as interpret-as="number">19</say-as> <sub alias="pound">lb.</sub> <phoneme alphabet="ipa" ph="bæs">bass</phoneme>. </s>
  </p>
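The `phoneme` element is useful for any word whose spelling does not determine its pronunciation. Here is a short sketch with a different homograph. The sentences are made up, and the IPA strings are an approximation for general American English.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form>
    <block>
      <prompt>
        <!-- Past tense: rhymes with "bed" -->
        Yesterday I <phoneme alphabet="ipa" ph="rɛd">read</phoneme> the manual.
        <!-- Present tense: rhymes with "need" -->
        Today I will <phoneme alphabet="ipa" ph="riːd">read</phoneme> it again.
      </prompt>
    </block>
  </form>
</vxml>
```

The element content ("read") is what a text display or a synthesizer without phoneme support would fall back to.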
Speech Synthesis ML
• Processing stages: Structure Analysis → Text Normalization → Text-to-Phoneme Conversion → Prosody Analysis → Waveform Production
• Prosody Analysis
  • Markup support: emphasis, break, prosody
  • Non-markup behavior: automatically generate prosody through analysis of document structure and sentence syntax
Prosody Analysis (Initial text)
  <prompt>
    Environmental control menu. Do you want to adjust the lighting or temperature?
  </prompt>
Prosody Analysis (Add pause at phrase boundaries)
  <prompt>
    Environmental control menu <break strength="medium"/>
    Do you want to adjust the lighting or temperature?
  </prompt>
Prosody Analysis (De-emphasize familiar words)
  <prompt>
    Environmental control menu <break strength="medium"/>
    <emphasis level="reduced"> Do you want to adjust </emphasis>
    the lighting or temperature?
  </prompt>
Prosody Analysis (Pause to let the listener catch up)
  <prompt>
    Environmental control menu <break/>
    <emphasis level="reduced"> do you want to adjust </emphasis>
    the lighting <break/> or temperature?
  </prompt>
Prosody Analysis (Add emphasis to focus listener's attention)
  <prompt>
    Environmental control menu <break/>
    <emphasis level="reduced"> do you want to adjust the </emphasis>
    <emphasis level="strong"> lighting </emphasis> <break/>
    or <emphasis level="strong"> temperature? </emphasis>
  </prompt>
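The `prosody` element, listed as markup support for this stage, gives finer control than `emphasis` and `break` alone. A minimal sketch follows. The specific rate, volume, and pause values are arbitrary choices for illustration, and how strongly a synthesizer honors them varies by platform.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form>
    <block>
      <prompt>
        Environmental control menu
        <!-- An explicit pause length instead of a relative strength -->
        <break time="500ms"/>
        Do you want to adjust the
        <!-- Slow down and raise the volume for the words that matter -->
        <prosody rate="slow" volume="loud">
          lighting <break strength="weak"/> or temperature?
        </prosody>
      </prompt>
    </block>
  </form>
</vxml>
```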
Speech Synthesis ML
• Processing stages: Structure Analysis → Text Normalization → Text-to-Phoneme Conversion → Prosody Analysis → Waveform Production
• Structure Analysis
  • Markup support: p (paragraph), s (sentence)
  • Non-markup behavior: infer structure by automated text analysis
• Text Normalization
  • Markup support: say-as for dates, times, etc.; sub for aliasing
  • Non-markup behavior: automatically identify and convert constructs
• Text-to-Phoneme Conversion
  • Markup support: phoneme, say-as
  • Non-markup behavior: look up in pronunciation dictionary
• Prosody Analysis
  • Markup support: emphasis, break, prosody
  • Non-markup behavior: automatically generate prosody through analysis of document structure and sentence syntax
• Waveform Production
  • Markup support: voice, audio (audio is used for audio icons, branding, advertising)
Waveform Production
  <prompt>
    <audio src="http://www.example.com/adjust.wav">
      Environmental control menu. Do you want to adjust the lighting or temperature?
    </audio>
  </prompt>
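The text inside `<audio>` is a fallback: it is synthesized only if the recording cannot be played. The sketch below combines that with the `voice` element so the fallback is spoken in a requested voice. The jingle URL is hypothetical, and whether a female voice is actually available depends on the platform.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form>
    <block>
      <prompt>
        <!-- Audio icon for branding; hypothetical URL, no fallback text -->
        <audio src="http://www.example.com/jingle.wav"/>
        <!-- voice: request a voice by characteristic, not by vendor name -->
        <voice gender="female">
          <audio src="http://www.example.com/adjust.wav">
            <!-- Spoken by text-to-speech only if adjust.wav cannot be fetched -->
            Environmental control menu.
            Do you want to adjust the lighting or temperature?
          </audio>
        </voice>
      </prompt>
    </block>
  </form>
</vxml>
```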
Exercise 5 (insert SSML commands)
  <prompt>
    Welcome to Ajax Bank do you want to withdraw or deposit funds?
  </prompt>
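One possible answer, following the same steps as the prosody slides (pause at the phrase boundary, de-emphasize the familiar words, stress the two choices). This is a sketch rather than the only correct markup. Other break and emphasis placements are equally reasonable.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form>
    <block>
      <prompt>
        Welcome to Ajax Bank
        <!-- Pause at the phrase boundary -->
        <break strength="medium"/>
        <!-- Familiar carrier phrase: play it down -->
        <emphasis level="reduced"> do you want to </emphasis>
        <!-- The two choices: make them stand out -->
        <emphasis level="strong"> withdraw </emphasis>
        <break/>
        or
        <emphasis level="strong"> deposit </emphasis>
        funds?
      </prompt>
    </block>
  </form>
</vxml>
```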