VoiceXML Overview, Opportunities & Challenges

VoiceXML Overview, Opportunities & Challenges. Hitesh Kr. Seth Chief Technology Evangelist SeraNova, Inc hitesh.seth@seranova.com O’Reilly Conference on Enterprise Java, 2001. Agenda. Introduction History Elements Developing Voice Portals Applications Vendor Landscape Challenges

Presentation Transcript


  1. VoiceXML Overview, Opportunities & Challenges Hitesh Kr. Seth Chief Technology Evangelist SeraNova, Inc hitesh.seth@seranova.com O’Reilly Conference on Enterprise Java, 2001

  2. Agenda • Introduction • History • Elements • Developing Voice Portals • Applications • Vendor Landscape • Challenges • Resources

  3. Introduction

  4. The Web is Ubiquitous • Key Highlights • HTTP Protocol • HTML for Content • Static, Dynamically Generated • Usage Model • Create Content/Scripts • Publish on the Web Server • Access it through a web browser

  5. What about Voice? • Call Center, IVR based products have been around • IVR Applications usually are “DTMF” oriented • Interaction through the key pad rather than Voice • Complex Infrastructure • Involve huge investments in proprietary solutions • Lack of integration with the Internet • ASP model for deployment wasn’t established • Emergence of sophisticated Text-to-Speech/Voice Recognition solutions

  6. VoiceXML • What is VoiceXML? • XML based markup language which describes voice/touch-tone based interactions for development of interactive voice based applications

  7. Application Model

  8. Technical Highlights • Based on XML 1.0 • Supports • DTMF (touch tone keys) and Voice Input • Press 1 for Email; Please say your name • TTS (Text-to-Speech) and Pre-Recorded Audio Output • Recording of User Input • Telephony Integration • e.g. Connect to a Live Operator • Form & field level grammars • Direct and (near) natural dialogs • Direct: "Which city would you like to go to?" "San Jose" • Near natural: "What can I do for you today?" "I would like to travel from San Jose, CA to Newark, NJ on 15 Nov"

  9. Key Benefits • Brings the ubiquity of the Web to the ubiquitous access device: an ordinary phone • Reaches billions of landline and mobile phones • Hands-free communication for automobiles • Single Platform for developing Web & Voice Applications • Opens up the web to reach billions of ordinary phones worldwide • Automated Customer Service • Can enhance customer satisfaction (immediate response) • Lower costs (fewer customer service reps and lower customer waiting costs!) • Can be used even in flight!

  10. Hello VoiceXML
      <?xml version="1.0"?>
      <vxml version="1.0">
        <form>
          <block>Hello World!</block>
        </form>
      </vxml>

  11. Demo

  12. History

  13. History • 3/2/1999: AT&T, Lucent & Motorola create the VXML Forum (No. of Members: 17) • 8/25/1999: VoiceXML 0.9 Preliminary Spec released (No. of Members: 61) • 3/7/2000: VoiceXML 1.0 Spec released (No. of Members: 79) • 5/22/2000: VoiceXML 1.0 submitted to W3C (No. of Members: 150) • As of 10/5/2000, the VoiceXML Forum has 281 members

  14. Earlier Works • SpeechML by IBM • VoxML by Motorola • PhoneWeb/PML by Lucent/AT&T

  15. Elements

  16. Elements • Root • <vxml> • Form/Interaction • <field>, <filled>, <initial>, <param>, <option> • Grammar • <dtmf>, <grammar> • Events • <error>, <exit>, <noinput>, <help>, <nomatch> • Platform Specific • <meta>, <property>, <object> • Telephony Integration • <disconnect>, <record>, <transfer>

  17. Elements • Language • <if>, <else>, <elseif>, <assign>, <value>, <var>, <script>, <return>, <clear>, <throw>, <catch>, <subdialog>, <block> • Prompt/Audio • <break>, <sayas>, <audio>, <block>, <enumerate>, <emp>, <prompt>, <pros>, <div>, <reprompt> • Navigation • <choice>, <menu>, <link>, <goto>, <submit>

  18. Prompts • TTS (Text-to-Speech) • <prompt>What can I do for you?</prompt> • <prompt>Did you say <sayas class="phone">732-362-2187</sayas></prompt> • Heard as: Did you say Area Code (732) 362-2187 • Pre-Recorded Prompts • <prompt><audio src="initial_greetings.wav"/>, Hitesh</prompt> • Rule of Thumb • Use TTS sparingly (only for dynamic information) • <prompt bargein="false"> can be used for ads or any other special announcements that the caller should not interrupt.

  19. Navigation
      <?xml version="1.0"?>
      <vxml version="1.0">
        <menu>
          <prompt>Welcome to your Personal Portal. <enumerate/></prompt>
          <choice dtmf="1" caching="safe" next="Email.jsp">Email</choice>
          <choice dtmf="2" caching="safe" next="Calendar.jsp">Calendar</choice>
          <choice dtmf="3" caching="safe" next="EmployeeDirectory.jsp">Employee Directory</choice>
        </menu>
      </vxml>

  20. Grammars • Specify utterances that a user may speak to provide corresponding string value or set of attribute-value pairs • Can define a form grammar or field grammar • Spec. doesn’t require an implementation to support a particular format • Common Grammar Formats • Java Speech API Grammar Spec (JSGF) • Nuance GSL • Speech Recognition Grammar Spec for W3C Speech Interface Framework (Working Draft) • Can be specified inline with the VoiceXML document or referenced externally using the <grammar> tag

  21. Grammars
      • Inline
      ...
      <field name="emplId">
        <prompt>Say the name of the person</prompt>
        <grammar type="application/x-jsgf"> hitesh seth {1} | ... </grammar>
        ...
      </field>
      ...
      • External
      ...
      <field name="emplId">
        <prompt>Say the name of the person</prompt>
        <grammar type="application/x-jsgf" src="mycompany.gram#employee" caching="safe"/>
        ...
      </field>
      ...
      • mycompany.gram
      #JSGF V1.0;
      grammar mycompany;
      public <employee> = (hitesh seth) {1} ...
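When the employee list lives in a directory or database, the external grammar file can be generated rather than written by hand. The following is a minimal sketch, assuming a map from employee id to spoken name; the class `EmployeeGrammar`, the method `toJsgf`, and the sample name "Jane Doe" are hypothetical and not part of the presentation. It emits the same JSGF shape as the `mycompany.gram` fragment above and does not escape characters that are special in JSGF.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper (not from the slides): builds a JSGF grammar from an employee list. */
final class EmployeeGrammar {

    /**
     * Emits one alternative per employee: the lower-cased name in parentheses,
     * followed by the employee id as a JSGF tag in braces.
     */
    static String toJsgf(String grammarName, Map<String, String> idToName) {
        if (idToName.isEmpty()) {
            throw new IllegalArgumentException("a grammar rule needs at least one alternative");
        }
        StringBuilder sb = new StringBuilder();
        sb.append("#JSGF V1.0;\n");
        sb.append("grammar ").append(grammarName).append(";\n");
        sb.append("public <employee> = ");
        boolean first = true;
        for (Map.Entry<String, String> e : idToName.entrySet()) {
            if (!first) {
                sb.append(" | ");
            }
            sb.append('(').append(e.getValue().toLowerCase(Locale.ROOT)).append(')');
            sb.append(" {").append(e.getKey()).append('}');
            first = false;
        }
        sb.append(";\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> employees = new LinkedHashMap<>();
        employees.put("1", "Hitesh Seth");
        employees.put("2", "Jane Doe"); // hypothetical second entry
        System.out.print(toJsgf("mycompany", employees));
    }
}
```

A `LinkedHashMap` keeps the alternatives in insertion order, so the generated file is stable between runs and caches well.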

  22. Interaction
      <?xml version="1.0"?>
      <vxml version="1.0">
        <form id="Main">
          <field name="emplId">
            <prompt>Say the name of the person</prompt>
            <grammar type="application/x-jsgf"> (hitesh seth) {1} | ... </grammar>
            <filled>
              <if cond="emplId=='1'">
                <goto next="#Employee1"/>
              <elseif cond="emplId=='2'"/>
                ...
              </if>
            </filled>
          </field>
        </form>

  23. Interaction
        <form id="Employee1">
          <block>
            <prompt>Hitesh Seth. Direct Phone: <sayas class="phone">732-362-2187</sayas>.</prompt>
          </block>
        </form>
        ...
      </vxml>

  24. Telephony Integration • <transfer> element • Connect the user to another phone • Applications • Assisted dialing • Online Employee Directory! • I would like to call Hitesh on his cellular phone. • Connecting to (732) 433-5603 …. • Switching to a human Operator • Welcome to XYZ Voice Portal. At any point of time say Operator to connect to a customer service agent. Please say your name. ….

  25. Telephony Integration
      <?xml version="1.0"?>
      <vxml version="1.0">
        <form ...>
          ...
          <field name="cmd">
            <prompt>Hitesh’s direct phone is (732) 362-2187, Cellular ... </prompt>
            <grammar type="application/x-jsgf"> home | direct | cellular </grammar>
            <filled>
              <if cond="cmd=='direct'">
                <assign name="phone_no" expr="'7323622187'"/>
                <goto next="#CallTransfer"/>
              <elseif cond="cmd=='cellular'"/>
                ...
              </if>
            </filled>
          </field>
          ...
        </form>

  26. Telephony Integration
        <form id="CallTransfer">
          <block><prompt><audio src="transfer.wav"/></prompt></block>
          <transfer dest="{phone_no}"/>
        </form>
      </vxml>

  27. Extensions • <object> & <property> Tags • <property> • Implementation Specific Properties • e.g. • TTS Engine Parameters (gender, tone etc) • <object> • Implementation Specific Components and Value Add Services • e.g. • Integration with the components built for the underlying ASR Engine (e.g. Nuance SpeechObjects) • e.g. Component for getting an address • Caller-Id Information Service • Cellular Phone Location Service

  28. Demo

  29. Developing Voice Portals

  30. Developing • What do you need? • Development Tool • To develop/test the application • IBM WebSphere Voice Server SDK, Motorola Mobile ADK, Nuance V-Builder, Tellme Studio, … • Web Server • To execute the scripts/serve VoiceXML content • Apache, Microsoft, Netscape, … • JSP, Servlets • XML Parser, XSLT Processor • VoiceXML Interpreter/Implementation Platform • Ordinary Touch Tone Phone • PC with a good Sound Card and microphone • For Creating/Testing Applications using Simulators/SDKs

  31. Static/Dynamic • Serving Up VoiceXML • Static v/s Dynamic Content • Dynamic • Server Scripting technologies such as JSP, Servlets to generate VoiceXML • Dynamic Presentation using XML/XSLT • XML represents content • XSLT represents transformation of the content into presentation • Use Apache Cocoon!
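Generating VoiceXML from server-side code can be sketched in plain Java. The example below is an illustration only, assuming the menu entries arrive as a map of spoken label to target URL; the class `VoiceMenuRenderer` and its methods are hypothetical names, and in a real deployment the same logic would sit inside a JSP or Servlet that writes to the HTTP response.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not from the slides): renders a VoiceXML 1.0 menu as a string. */
final class VoiceMenuRenderer {

    /** Escapes characters that are special in XML text and attribute values. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /**
     * Builds one <choice> per map entry, in iteration order, assigning DTMF keys 1..9.
     * Map keys are the spoken labels; map values are the target URLs.
     */
    static String render(String greeting, Map<String, String> choices) {
        if (choices.isEmpty() || choices.size() > 9) {
            throw new IllegalArgumentException("expected between 1 and 9 choices");
        }
        StringBuilder out = new StringBuilder();
        out.append("<?xml version=\"1.0\"?>\n");
        out.append("<vxml version=\"1.0\">\n");
        out.append("  <menu>\n");
        out.append("    <prompt>").append(escapeXml(greeting)).append(" <enumerate/></prompt>\n");
        int key = 1;
        for (Map.Entry<String, String> choice : choices.entrySet()) {
            out.append("    <choice dtmf=\"").append(key++).append("\" next=\"")
               .append(escapeXml(choice.getValue())).append("\">")
               .append(escapeXml(choice.getKey())).append("</choice>\n");
        }
        out.append("  </menu>\n");
        out.append("</vxml>\n");
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> choices = new LinkedHashMap<>();
        choices.put("Email", "Email.jsp");
        choices.put("Calendar", "Calendar.jsp");
        choices.put("Employee Directory", "EmployeeDirectory.jsp");
        System.out.print(render("Welcome to your Personal Portal.", choices));
    }
}
```

Escaping every dynamic value before it is written keeps the generated document well-formed even when labels contain characters such as `&`.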

  32. XML/XSLT • XML • Represents Data • Static XML or • Dynamically Generated using Server Scripts • XSLT • Represents Formatting • Write it yourself or • Create through a tool

  33. Processing XML/XSLT • JSP
      <%@page import="org.apache.xalan.xslt.*"%>
      <%
        // Apply AddressBook.xsl to AddressBook.xml and write the result to the JSP output
        String xmlFile = "AddressBook.xml";
        String xslFile = "AddressBook.xsl";
        XSLTProcessor processor = XSLTProcessorFactory.getProcessor();
        processor.process(new XSLTInputSource(xmlFile), new XSLTInputSource(xslFile), new XSLTResultTarget(out));
      %>
      • Use Sophisticated Content Management Systems • Create different Style Sheets for different interfaces - VoiceXML, HTML, WML, etc.
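The same XML-to-VoiceXML transformation can also be sketched with the standard `javax.xml.transform` (JAXP) API that ships with the JDK, instead of the Xalan-specific classes used on the slide. This is a swapped-in technique for illustration: the class `AddressBookToVoiceXml`, the address book structure, and the stylesheet are all assumptions, with the XML and XSLT held in strings so the example is self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical example (not from the slides): XML content plus XSLT yields VoiceXML. */
final class AddressBookToVoiceXml {

    // Assumed content format; the real AddressBook.xml is not shown in the slides.
    static final String XML =
        "<addressBook><entry><name>Hitesh Seth</name>"
      + "<phone>732-362-2187</phone></entry></addressBook>";

    // Assumed stylesheet: one spoken prompt per address book entry.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"/addressBook\">"
      + "<vxml version=\"1.0\"><form><block>"
      + "<xsl:for-each select=\"entry\">"
      + "<prompt><xsl:value-of select=\"name\"/>. Direct phone: "
      + "<sayas class=\"phone\"><xsl:value-of select=\"phone\"/></sayas>.</prompt>"
      + "</xsl:for-each>"
      + "</block></form></vxml>"
      + "</xsl:template></xsl:stylesheet>";

    /** Applies the stylesheet to the XML and returns the result as a string. */
    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, XSL));
    }
}
```

Swapping in a second stylesheet that emits HTML or WML from the same `XML` string is how one content source can feed several interfaces.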

  34. Deployment • Infrastructure Required • In Addition to Web Application Server serving VoiceXML pages, you need • Telephony Interface Boards • ASR Engine • TTS Engine • VoiceXML Interpreter • Bandwidth/Incoming Lines • Deployment Options • Pre-packaged VoiceXML Server (all-in-one) • Pick and choose VoiceXML Solution components • ASR, TTS, VoiceXML Interpreter, Hardware Ports, Bandwidth • Hosted Voice ASP Solutions

  35. Applications

  36. Applications • Utilizing Web Content/Information • Stock Quotes, Weather Information, News • Customer Service • Order Status, Address Change, Automated Call Center, etc. • Commerce • Banking, Stock Trading, Voice Enabled Commerce • Corporate Portals • Employee Directory, Employee Self Service - Human Resources, Email, Calendar, Unified Messaging • Alerts [Push Model] • Server Initiated Transactions (Call me when the stock price of any company in my portfolio goes up by $10)

  37. Corporate Portal Scenario • 1 (800) – XXXXXXX • Welcome to Your Corporate Portal. Please say your name. • Hitesh Seth • Please enter your access code • **** • Good Morning, Hitesh. What can I do for you? • Check my mail • You have 34 new messages. • Are there any new messages from my boss? • Yes, there are two messages from …

  38. Corporate Portal (contd.) • First message. Subject: Help Needed in XYZ Project. • Hitesh, could you please call …? • Reply • I am in San Jose till 15th of November. I could come to Phoenix on 16th November. [#] • [used <record>] • Mail Sent • When am I meeting with John today? • You have a meeting with John at 2:00 PM. • Connect me to his office, please. • Connecting to John’s direct number, (732) ... • [used <transfer>]

  39. Vendor Landscape

  40. Vendor Landscape • All-in-one VoiceXML Gateways/Servers • Combine ASR, TTS, VoiceXML Interpreter, Hardware Ports • Lucent Speech Server, Motorola Voice Developer Gateway, VoiceGenie VoiceXML Gateway, … • ASR (Automatic Speech Recognition) Engines • AT&T, IBM, Nuance, Philips, SpeechWorks, … • Development Tools • IBM WebSphere Voice Server SDK, Motorola Mobile ADK, Nuance V-Builder, Tellme Studio, … • Recording & Developing Prompts • Microsoft Sound Recorder, Sonic Foundry Sound Forge, Syntrillium Software Cool Edit, ...

  41. Vendor Landscape • Text-to-Speech Engines • AT&T, Fonix TTS, L&H RealSpeak, Lucent TTS Engine, Nuance Vocalizer, SpeechWorks Speechify, … • Telephony Interface Boards • Dialogic, Lucent, ... • Voice ASP Solutions • BeVocal, Interactive Telesis, Tellme, VoiceGenie Technologies, Voxeo.net, ...

  42. Challenges

  43. Challenges • Need Sophisticated Infrastructure • Voice Recognition Quality • Need to build Sophisticated Grammars for near natural language speech recognition. • Your Application is as good as its grammar. • TTS Quality & Customization • Server Initiated VoiceXML Interactions! (Push Model) • VoiceXML Application Development Tools are still maturing

  44. Authentication • Possible Approaches • User-Ids/Passwords • Too cryptic for ASR Engines to recognize • Usually need to be spelled out, which is hard • Names/Access-Codes • Names may not be unique; may be good for intranets • Telephone No/Access Codes • Telephone numbers are unique: (0017323622187) for an International Portal, (7323622187) for a US Portal (or redirected to a US only area) • Easy to key in and/or say aloud • If available, use Caller-Id, similar to a “persistent cookie” • Voice Based Authentication • Voice Print/Pattern
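If telephone numbers serve as user ids, the spoken, keyed-in, and Caller-Id forms of a number all need to map to the same account key. The sketch below shows one possible normalization for a US-only portal, assuming the two formats named on the slide plus the common 11-digit form with a leading 1; the class `CallerIdNormalizer` and method `toUsNumber` are hypothetical names.

```java
/** Hypothetical helper (not from the slides): reduces a US phone number to 10 digits. */
final class CallerIdNormalizer {

    /**
     * Strips punctuation and an optional "001" or "1" country prefix.
     * Throws IllegalArgumentException if the result is not exactly 10 digits.
     */
    static String toUsNumber(String raw) {
        String digits = raw.replaceAll("\\D", ""); // keep digits only
        if (digits.length() == 13 && digits.startsWith("001")) {
            digits = digits.substring(3);          // international form, e.g. 0017323622187
        } else if (digits.length() == 11 && digits.startsWith("1")) {
            digits = digits.substring(1);          // national form with leading 1
        }
        if (digits.length() != 10) {
            throw new IllegalArgumentException("not a 10-digit US number: " + raw);
        }
        return digits;
    }

    public static void main(String[] args) {
        System.out.println(toUsNumber("(732) 362-2187"));
        System.out.println(toUsNumber("0017323622187"));
    }
}
```

The normalized string can then be used as the lookup key, with the access code checked separately.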

  45. Performance • Grammars • Inline v/s External • Caching! • VoiceXML Documents • Caching! • Multiple interactions per document • Audio • TTS v/s Recorded Prompts • Quality v/s Size

  46. Getting Started • Take Small Steps • Use DTMF • Enter your 10 digit account number • Press 1 for Email, 2 for calendar, 3 for employee directory • Use Directed Dialogs • Say the name of the person • Move towards natural language conversations • What can I do for you? • Use TTS Sparingly for quality of voice interaction • If your application incorporates ads, keep them short and crisp • Start Small, grow big (try regional betas/limited trials and move towards a larger audience)

  47. Opportunities • According to Kelsey Group • By 2005, • Advertising and transactions from Voice Portals will produce $5 billion in revenues and $6 billion for associated hardware, software and Net service provider companies. (Adapted from "Voice portal companies overshooting demand", http://news.cnet.com/news/0-1004-200-1844967.html, May 9, 2000)

  48. Resources

  49. Resources • Organizations • VoiceXML Forum http://www.voicexml.org • W3C Voice Browser Activity http://www.w3c.org/Voice • Specs • VoiceXML Specification http://www.voicexml.org/spec.html • Java Speech API Grammar Spec (JSGF) http://java.sun.com/products/java-media/speech/forDevelopers/JSGF.pdf

  50. Resources • Vendors • AT&T http://www.att.com/aspg/ • BeVocal http://www.BeVocal.com • Dialogic http://www.dialogic.com • IBM http://www.ibm.com/software/speech • Lucent http://www.lucent.com/speech • Motorola http://www.motorola.com • Nuance http://www.nuance.com • Tellme http://www.tellme.com • SpeechWorks http://www.speechworks.com • VoiceGenie Technologies http://www.voicegenie.com • Voxeo http://www.voxeo.com
