VoiceXML: Overview, Opportunities & Challenges • Hitesh Kr. Seth, Chief Technology Evangelist, SeraNova, Inc • hitesh.seth@seranova.com • O’Reilly Conference on Enterprise Java, 2001
Agenda • Introduction • History • Elements • Developing Voice Portals • Applications • Vendor Landscape • Challenges • Resources
The Web is Ubiquitous • Key Highlights • HTTP Protocol • HTML for Content • Static, Dynamically Generated • Usage Model • Create Content/Scripts • Publish on the Web Server • Access it through a web browser
What about Voice? • Call Center, IVR based products have been around • IVR Applications usually are “DTMF” oriented • Interaction through the key pad rather than Voice • Complex Infrastructure • Involve huge investments in proprietary solutions • Lack of integration with the Internet • ASP model for deployment wasn’t established • Emergence of sophisticated Text-to-Speech/Voice Recognition solutions
VoiceXML • What is VoiceXML? • XML based markup language which describes voice/touch-tone based interactions for development of interactive voice based applications
Technical Highlights • Based on XML 1.0 • Supports • DTMF (touch tone keys) and Voice Input • "Press 1 for Email"; "Please say your name" • TTS (Text-to-Speech) and Pre-Recorded Audio Output • Recording of User Input • Telephony Integration • e.g. Connect to a Live Operator • Form & field level grammars • Direct and (near) natural dialogs • Direct: "Which city would you like to go to?" / "San Jose" • Near natural: "What can I do for you today?" / "I would like to travel from San Jose, CA to Newark, NJ on 15 Nov"
Key Benefits • Brings the ubiquity of the Web to the most ubiquitous access device, an ordinary phone • Reaches billions of landline and mobile phones worldwide • Hands-free communication for automobiles • Single platform for developing Web & Voice applications • Automated Customer Service • Can enhance customer satisfaction (immediate response) • Lowers costs (fewer customer service reps and less customer waiting time) • Can be used even in flight!
Hello VoiceXML
<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <block>Hello World!</block>
  </form>
</vxml>
History • 3/2/1999: AT&T, Lucent & Motorola create the VXML Forum (members: 17) • 8/25/1999: VoiceXML 0.9 Preliminary Spec released (members: 61) • 3/7/2000: VoiceXML 1.0 Spec released (members: 79) • 5/22/2000: VoiceXML 1.0 submitted to W3C (members: 150) • As of 10/5/2000, the VoiceXML Forum has 281 members
Earlier Works • SpeechML by IBM • VoxML by Motorola • PhoneWeb/PML by Lucent/AT&T
Elements • Root • <vxml> • Form/Interaction • <field>, <filled>, <initial>, <param>, <option> • Grammar • <dtmf>, <grammar> • Events • <error>, <exit>, <noinput>, <help>, <nomatch> • Platform Specific • <meta>, <property>, <object> • Telephony Integration • <disconnect>, <record>, <transfer>
Elements • Language • <if>, <else>, <elseif>, <assign>, <value>, <var>, <script>, <return>, <clear>, <throw>, <catch>, <subdialog>, <block> • Prompt/Audio • <break>, <sayas>, <audio>, <block>, <enumerate>, <emp>, <prompt>, <pros>, <div>, <reprompt> • Navigation • <choice>, <menu>, <link>, <goto>, <submit>
Prompts • TTS (Text-to-Speech) • <prompt>What can I do for you?</prompt> • <prompt>Did you say <sayas class="phone">732-362-2187</sayas></prompt> • Spoken as: "Did you say Area Code (732) 362-2187" • Pre-Recorded Prompts • <prompt><audio src="initial_greetings.wav"/>, Hitesh</prompt> • Rule of Thumb • Use TTS sparingly (only for dynamic information) • <prompt bargein="false"> can be used for ads or other special announcements that the caller should not be able to interrupt.
Navigation
<?xml version="1.0"?>
<vxml version="1.0">
  <menu>
    <prompt>Welcome to your Personal Portal. <enumerate/></prompt>
    <choice dtmf="1" caching="safe" next="Email.jsp">Email</choice>
    <choice dtmf="2" caching="safe" next="Calendar.jsp">Calendar</choice>
    <choice dtmf="3" caching="safe" next="EmployeeDirectory.jsp">Employee Directory</choice>
  </menu>
</vxml>
Grammars • Specify utterances that a user may speak to provide corresponding string value or set of attribute-value pairs • Can define a form grammar or field grammar • Spec. doesn’t require an implementation to support a particular format • Common Grammar Formats • Java Speech API Grammar Spec (JSGF) • Nuance GSL • Speech Recognition Grammar Spec for W3C Speech Interface Framework (Working Draft) • Can be specified inline with the VoiceXML document or referenced externally using the <grammar> tag
Grammars
Inline
...
<field name="emplId">
  <prompt>Say the name of the person</prompt>
  <grammar type="application/x-jsgf"> hitesh seth {1} | ... </grammar>
  ...
</field>
...
External
...
<field name="emplId">
  <prompt>Say the name of the person</prompt>
  <grammar type="application/x-jsgf" src="mycompany.gram#employee" caching="safe"/>
  ...
</field>
...
mycompany.gram
#JSGF V1.0;
grammar mycompany;
public <employee> = (hitesh seth) {1} ...
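When the employee directory lives in a database, an external grammar file like mycompany.gram could be generated rather than written by hand. The following is a minimal sketch in plain Java, assuming a simple name-to-id map; the class name EmployeeGrammarBuilder and the second sample employee are hypothetical and not part of the original slides.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds a JSGF grammar shaped like mycompany.gram
// from a map of spoken name -> employee id (the id becomes the {tag}).
class EmployeeGrammarBuilder {

    static String build(String grammarName, String ruleName, Map<String, String> employees) {
        if (employees.isEmpty()) {
            throw new IllegalArgumentException("At least one employee is required");
        }
        StringBuilder sb = new StringBuilder();
        sb.append("#JSGF V1.0;\n");
        sb.append("grammar ").append(grammarName).append(";\n");
        sb.append("public <").append(ruleName).append("> =\n");
        boolean first = true;
        for (Map.Entry<String, String> e : employees.entrySet()) {
            // Alternatives are separated by '|'; each carries its id as a tag.
            sb.append(first ? "    " : "  | ");
            sb.append("(").append(e.getKey().toLowerCase()).append(") {")
              .append(e.getValue()).append("}\n");
            first = false;
        }
        sb.append("  ;\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> employees = new LinkedHashMap<>();
        employees.put("Hitesh Seth", "1");
        employees.put("Jane Doe", "2"); // made-up entry for illustration
        System.out.print(build("mycompany", "employee", employees));
    }
}
```

The generated text can be served at the URL referenced by the src attribute of <grammar>, so the field definition itself does not change when the directory does.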
Interaction
<?xml version="1.0"?>
<vxml version="1.0">
  <form id="Main">
    <field name="emplId">
      <prompt>Say the name of the person</prompt>
      <grammar type="application/x-jsgf"> (hitesh seth) {1} | ... </grammar>
      <filled>
        <if cond="emplId=='1'">
          <goto next="#Employee1"/>
        <elseif cond="emplId=='2'"/>
          ...
        </if>
      </filled>
    </field>
  </form>
Interaction
  <form id="Employee1">
    <block>
      <prompt>Hitesh Seth. Direct Phone: <sayas class="phone">732-362-2187</sayas>.</prompt>
    </block>
  </form>
  ...
</vxml>
Telephony Integration • <transfer> element • Connect the user to another phone • Applications • Assisted dialing • Online Employee Directory! • I would like to call Hitesh on his cellular phone. • Connecting to (732) 433-5603 …. • Switching to a human Operator • Welcome to XYZ Voice Portal. At any point of time say Operator to connect to a customer service agent. Please say your name. ….
Telephony Integration
<?xml version="1.0"?>
<vxml version="1.0">
  <form ...>
    ...
    <field name="cmd">
      <prompt>Hitesh’s direct phone is (732) 362-2187, Cellular ... </prompt>
      <grammar type="application/x-jsgf"> home | direct | cellular </grammar>
      <filled>
        <if cond="cmd=='direct'">
          <assign name="phone_no" expr="'7323622187'"/>
          <goto next="#CallTransfer"/>
        <elseif cond="cmd=='cellular'"/>
          ...
        </if>
      </filled>
    </field>
    ...
  </form>
Telephony Integration
  <form id="CallTransfer">
    <block><prompt><audio src="transfer.wav"/></prompt></block>
    <transfer dest="{phone_no}"/>
  </form>
</vxml>
Extensions • <object> & <property> Tags • <property> • Implementation Specific Properties • e.g. • TTS Engine Parameters (gender, tone etc) • <object> • Implementation Specific Components and Value Add Services • e.g. • Integration with the components built for the underlying ASR Engine (e.g. Nuance SpeechObjects) • e.g. Component for getting an address • Caller-Id Information Service • Cellular Phone Location Service
Developing • What do you need? • Development Tool • To develop/test the application • IBM WebSphere Voice Server SDK, Motorola Mobile ADK, Nuance V-Builder, Tellme Studio, … • Web Server • To execute the scripts and serve VoiceXML content • Apache, Microsoft, Netscape, … • JSP, Servlets • XML Parser, XSLT Processor • VoiceXML Interpreter/Implementation Platform • Ordinary Touch Tone Phone • PC with a good Sound Card and microphone • For Creating/Testing Applications using Simulators/SDKs
Static/Dynamic • Serving Up VoiceXML! • Static vs. Dynamic Content • Dynamic • Server scripting technologies such as JSP and Servlets generate VoiceXML • Dynamic Presentation using XML/XSLT • XML represents content • XSLT represents transformation of the content into presentation • Use Apache Cocoon!
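To illustrate the dynamic case, here is a minimal sketch of the kind of logic a servlet or JSP might run to emit a VoiceXML menu from application data. It is plain Java with no servlet API so it can run standalone; the class name VoiceMenuGenerator and its methods are hypothetical, and the output mirrors the <menu>/<choice> structure from the Navigation slide.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical generator: turns a label -> target-URL map into a VoiceXML 1.0 menu.
class VoiceMenuGenerator {

    // Escape characters that are special in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    static String menu(String greeting, Map<String, String> choices) {
        if (choices.size() > 9) {
            // One DTMF key (1-9) per choice in this sketch.
            throw new IllegalArgumentException("At most 9 choices are supported");
        }
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\"?>\n");
        sb.append("<vxml version=\"1.0\">\n");
        sb.append("  <menu>\n");
        sb.append("    <prompt>").append(escape(greeting)).append(" <enumerate/></prompt>\n");
        int key = 1;
        for (Map.Entry<String, String> c : choices.entrySet()) {
            sb.append("    <choice dtmf=\"").append(key++)
              .append("\" next=\"").append(escape(c.getValue())).append("\">")
              .append(escape(c.getKey())).append("</choice>\n");
        }
        sb.append("  </menu>\n");
        sb.append("</vxml>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> choices = new LinkedHashMap<>();
        choices.put("Email", "Email.jsp");
        choices.put("Calendar", "Calendar.jsp");
        choices.put("Employee Directory", "EmployeeDirectory.jsp");
        System.out.print(menu("Welcome to your Personal Portal.", choices));
    }
}
```

In a real servlet the returned string would be written to the response; the content type to use depends on the VoiceXML platform and is not covered here.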
XML/XSLT • XML • Represents Data • Static XML, or • Dynamically Generated using Server Scripts • XSLT • Represents Formatting • Write it yourself, or • Create through a tool
Processing XML/XSLT • JSP
<%@page import="org.apache.xalan.xslt.*"%>
<%
  // Source document and the stylesheet that turns it into VoiceXML
  String xmlFile = "AddressBook.xml";
  String xslFile = "AddressBook.xsl";
  XSLTProcessor processor = XSLTProcessorFactory.getProcessor();
  // Transform the XML and write the result to the JSP output stream
  processor.process(new XSLTInputSource(xmlFile),
                    new XSLTInputSource(xslFile),
                    new XSLTResultTarget(out));
%>
• Use Sophisticated Content Management Systems • Create different Style Sheets for different interfaces - VoiceXML, HTML, WML, etc.
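The same XML-to-VoiceXML step can also be sketched with the standard JAXP API (javax.xml.transform) instead of the Xalan-specific classes used on the slide. This is an illustrative sketch only: the slides do not show AddressBook.xml or AddressBook.xsl, so the element names (addressBook, entry, name, phone), the inline stylesheet, and the class name AddressBookToVoiceXml are all assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example: transforms an assumed address-book format into VoiceXML.
class AddressBookToVoiceXml {

    // Assumed input format; the real AddressBook.xml is not shown in the slides.
    static final String XML =
        "<addressBook><entry><name>Hitesh Seth</name>"
        + "<phone>732-362-2187</phone></entry></addressBook>";

    // Assumed stylesheet: one spoken prompt per address-book entry.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" indent=\"no\"/>"
        + "<xsl:template match=\"/addressBook\">"
        + "<vxml version=\"1.0\"><form><block>"
        + "<xsl:for-each select=\"entry\">"
        + "<prompt><xsl:value-of select=\"name\"/>. Direct phone: "
        + "<sayas class=\"phone\"><xsl:value-of select=\"phone\"/></sayas>.</prompt>"
        + "</xsl:for-each>"
        + "</block></form></vxml>"
        + "</xsl:template></xsl:stylesheet>";

    static String transform(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Wrapped so callers in this sketch need no checked-exception handling.
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, XSL));
    }
}
```

Swapping in a different stylesheet for HTML or WML leaves the Java code untouched, which is the point of keeping content (XML) and presentation (XSLT) separate.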
Deployment • Infrastructure Required • In Addition to Web Application Server serving VoiceXML pages, you need • Telephony Interface Boards • ASR Engine • TTS Engine • VoiceXML Interpreter • Bandwidth/Incoming Lines • Deployment Options • Pre-packaged VoiceXML Server (all-in-one) • Pick and choose VoiceXML Solution components • ASR, TTS, VoiceXML Interpreter, Hardware Ports, Bandwidth • Hosted Voice ASP Solutions
Applications • Utilized Web Content/Information • Stock Quotes, Weather Information, News • Customer Service • Order Status, Address Change, Automated Call Center, etc • Commerce • Banking, Stock Trading, Voice Enabled Commerce • Corporate Portals • Employee Directory, Employee Self Service - Human Resources, Email, Calendar, Unified Messaging • Alerts [Push Model] • Server Initiated Transactions (Call me when the stock price of any company in my portfolio goes up by $10)
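The alert scenario above ("call me when a stock in my portfolio goes up by $10") needs a server-side check that decides when to place an outbound call. A minimal sketch of that check follows; the class name PriceAlertCheck, the ticker symbols, and the prices are made up, and how the outbound call is actually initiated is platform-specific and not shown.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical check: which portfolio symbols rose by at least the threshold?
class PriceAlertCheck {

    static List<String> triggered(Map<String, BigDecimal> baseline,
                                  Map<String, BigDecimal> current,
                                  BigDecimal threshold) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, BigDecimal> e : baseline.entrySet()) {
            BigDecimal now = current.get(e.getKey());
            // Skip symbols without a current quote; compare the rise to the threshold.
            if (now != null && now.subtract(e.getValue()).compareTo(threshold) >= 0) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, BigDecimal> baseline = new LinkedHashMap<>();
        baseline.put("AAA", new BigDecimal("50.00")); // made-up symbols and prices
        baseline.put("BBB", new BigDecimal("20.00"));
        Map<String, BigDecimal> current = new LinkedHashMap<>();
        current.put("AAA", new BigDecimal("61.25"));
        current.put("BBB", new BigDecimal("24.00"));
        // Each returned symbol would trigger a server-initiated call.
        System.out.println(triggered(baseline, current, new BigDecimal("10")));
    }
}
```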
Corporate Portal Scenario • 1 (800) – XXXXXXX • Welcome to Your Corporate Portal. Please say your name. • Hitesh Seth • Please enter your access code • **** • Good Morning, Hitesh. What can I do for you? • Check my mail • You have 34 new messages. • Are there any new messages from my boss? • Yes, there are two messages from …
Corporate Portal (contd.) • First message. Subject: Help Needed in XYZ Project. • Hitesh, could you please call …? • Reply • I am in San Jose till 15th of November. I could come to Phoenix on 16th November. [#] • [used <record>] • Mail Sent • When am I meeting with John today? • You have a meeting with John at 2:00 PM. • Connect me to his office, please. • Connecting to John’s direct number, (732) ... • [used <transfer>]
Vendor Landscape • All-in-one VoiceXML Gateways/Servers • Combine ASR, TTS, VoiceXML Interpreter, Hardware Ports • Lucent Speech Server, Motorola Voice Developer Gateway, VoiceGenie VoiceXML Gateway, … • ASR (Automatic Speech Recognition) Engines • AT&T, IBM, Nuance, Philips, SpeechWorks, … • Development Tools • IBM WebSphere Voice Server SDK, Motorola Mobile ADK, Nuance V-Builder, Tellme Studio, … • Recording & Developing Prompts • Microsoft Sound Recorder, Sonic Foundry Sound Forge, Syntrillium Software Cool Edit, ...
Vendor Landscape • Text-to-Speech Engines • AT&T, Fonix TTS, L&H RealSpeak, Lucent TTS Engine, Nuance Vocalizer, SpeechWorks Speechify, … • Telephony Interface Boards • Dialogic, Lucent, ... • Voice ASP Solutions • BeVocal, Interactive Telesis, Tellme, VoiceGenie Technologies, Voxeo.net, ...
Challenges • Need Sophisticated Infrastructure • Voice Recognition Quality • Need to build Sophisticated Grammars for near natural language speech recognition. • Your Application is as good as its grammar. • TTS Quality & Customization • Server Initiated VoiceXML Interactions! (Push Model) • VoiceXML Application Development Tools are still maturing
Authentication • Possible Approaches • User-Ids/Passwords • Too cryptic for ASR Engines to recognize • Usually need to be spelled out, which is hard • Names/Access Codes • Names may not be unique; may be good for intranets • Telephone Numbers/Access Codes • Telephone numbers are unique: (0017323622187) for an International Portal, (7323622187) for a US Portal (or redirected to a US only area) • Easy to key in and/or say aloud • If available, use Caller-Id, similar to a “persistent cookie” • Voice Based Authentication • Voice Print/Pattern
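If telephone numbers serve as user ids, the number a caller keys in, says aloud, or presents through Caller-Id has to be reduced to one canonical form before lookup. A minimal sketch follows, using the two forms from the slide (7323622187 and 0017323622187); the class name CallerIdNormalizer, the "00" international prefix, and the US-only handling of national numbers are assumptions made for illustration.

```java
// Hypothetical normalizer: maps US-style input to the international form used as the account key.
class CallerIdNormalizer {

    static String toInternational(String raw) {
        // Keep digits only, dropping spaces, dashes, parentheses and '+'.
        String digits = raw.replaceAll("\\D", "");
        if (digits.startsWith("00")) {
            return digits;                 // already in international form
        }
        if (digits.length() == 10) {
            return "001" + digits;         // US national number (assumption)
        }
        if (digits.length() == 11 && digits.startsWith("1")) {
            return "00" + digits;          // US number with country code 1
        }
        throw new IllegalArgumentException("Unrecognized phone number: " + raw);
    }

    public static void main(String[] args) {
        System.out.println(toInternational("(732) 362-2187"));
        System.out.println(toInternational("0017323622187"));
    }
}
```

Both calls in main yield the same key, so a US portal and an international portal could share one account store.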
Performance • Grammars • Inline v/s External • Caching! • VoiceXML Documents • Caching! • Multiple interactions per document • Audio • TTS v/s Recorded Prompts • Quality v/s Size
Getting Started • Take Small Steps • Use DTMF • Enter your 10 digit account number • Press 1 for Email, 2 for Calendar, 3 for Employee Directory • Use Directed Dialogs • Say the name of the person • Move towards natural language conversations • What can I do for you? • Use TTS sparingly to preserve the quality of the voice interaction • If your application incorporates ads, keep them short and crisp • Start small, grow big (try regional betas/limited trials and move towards a larger audience)
Opportunities • According to the Kelsey Group • By 2005, advertising and transactions from Voice Portals will produce $5 billion in revenues, and $6 billion for associated hardware, software and Net service provider companies. (Adapted from "Voice portal companies overshooting demand", http://news.cnet.com/news/0-1004-200-1844967.html, May 9, 2000)
Resources • Organizations • VoiceXML Forum: http://www.voicexml.org • W3C Voice Browser Activity: http://www.w3c.org/Voice • Specs • VoiceXML Specification: http://www.voicexml.org/spec.html • Java Speech API Grammar Spec (JSGF): http://java.sun.com/products/java-media/speech/forDevelopers/JSGF.pdf
Resources • Vendors • AT&T: http://www.att.com/aspg/ • BeVocal: http://www.BeVocal.com • Dialogic: http://www.dialogic.com • IBM: http://www.ibm.com/software/speech • Lucent: http://www.lucent.com/speech • Motorola: http://www.motorola.com • Nuance: http://www.nuance.com • Tellme: http://www.tellme.com • SpeechWorks: http://www.speechworks.com • VoiceGenie Technologies: http://www.voicegenie.com • Voxeo: http://www.voxeo.com