1 / 61

LING 138/238 SYMBSYS 138 Intro to Computer Speech and Language Processing

LING 138/238 SYMBSYS 138 Intro to Computer Speech and Language Processing. Lecture 3: October 5, 2004 Dan Jurafsky. Week 2: Dialogue and Conversational Agents. Examples of spoken language systems Components of a dialogue system, focus on these 3: ASR NLU Dialogue management VoiceXML

sylviabrown
Download Presentation

LING 138/238 SYMBSYS 138 Intro to Computer Speech and Language Processing

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. LING 138/238 SYMBSYS 138Intro to Computer Speech and Language Processing Lecture 3: October 5, 2004 Dan Jurafsky LING 138/238 Autumn 2004

  2. Week 2: Dialogue and Conversational Agents • Examples of spoken language systems • Components of a dialogue system, focus on these 3: • ASR • NLU • Dialogue management • VoiceXML • Grounding and Confirmation LING 138/238 Autumn 2004

  3. Conversational Agents • AKA: • Spoken Language Systems • Dialogue Systems • Speech Dialogue Systems • Applications: • Travel arrangements (Amtrak, United airlines) • Telephone call routing • Tutoring • Communicating with robots • Anything with limited screen/keyboard LING 138/238 Autumn 2004

  4. A travel dialog: Communicator LING 138/238 Autumn 2004

  5. Call routing: ATT HMIHY LING 138/238 Autumn 2004

  6. A tutorial dialogue: ITSPOKE LING 138/238 Autumn 2004

  7. Dialogue System Architecture • Simplest possible architecture: ELIZA • Read-search/replace-print loop • We’ll need something with more sophisticated dialogue control • And speech LING 138/238 Autumn 2004

  8. Dialogue System Architecture LING 138/238 Autumn 2004

  9. ASR engine • ASR = Automatic Speech Recognition • Job of ASR system is to go from speech (telephone or microphone) to words • We will be studying this in a few weeks LING 138/238 Autumn 2004

  10. ASR Overview (pic from Yook 2003) LING 138/238 Autumn 2004

  11. ASR in Dialogue Systems • ASR systems work better if can constrain what words the speaker is likely to say. • A dialogue system often has these constraints: • System: What city are you departing from? • Can expect sentences of the form • I want to (leave|depart) from [CITYNAME] • From [CITYNAME] • [CITYNAME] • etc LING 138/238 Autumn 2004

  12. ASR in Dialogue Systems • Also, can adapt to speaker • But!! ASR is errorful • So unlike ELIZA, can’t count on the words being correct • As we will see, this fact about error plays a huge role in dialogue system design LING 138/238 Autumn 2004

  13. Natural Language Understanding • Also called NLU • We will discuss this later in the quarter • There are many ways to represent the meaning of sentences • For speech dialogue systems, perhaps the most common is a simple one called “Frame and slot semantics”. • Semantics = meaning LING 138/238 Autumn 2004

  14. An example of a frame • Show me morning flights from Boston to SF on Tuesday. SHOW: FLIGHTS: ORIGIN: CITY: Boston DATE: Tuesday TIME: morning DEST: CITY: San Francisco LING 138/238 Autumn 2004

  15. How to generate this semantics? • Many methods, as we will see in week 9 • Simplest: semantic grammars • LIST -> show me | I want | can I see|… • DEPARTTIME -> (after|around|before) HOUR | morning | afternoon | evening • HOUR -> one|two|three…|twelve (am|pm) • FLIGHTS -> (a) flight|flights • ORIGIN -> from CITY • DESTINATION -> to CITY • CITY -> Boston | San Francisco | Denver | Washington LING 138/238 Autumn 2004

  16. Semantics for a sentence • LIST FLIGHTS ORIGIN • Show me flights from Boston • DESTINATION DEPARTDATE • to San Francisco on Tuesday • DEPARTTIME • morning LING 138/238 Autumn 2004

  17. Frame-filling • We use a parser (week 10) to take these rules and apply them to the sentence. • Resulting in a semantics for the sentence • We can then write some simple code • That takes the semantically labeled sentence • And fills in the frame. LING 138/238 Autumn 2004

  18. Other NLU Approaches • Cascade of Finite-State-Transducers • Instead of a parser, we could use FSTs, which are very fast, to create the semantics. • Or we could use “Syntactic rules with semantic attachments” • This latter is what is done in VoiceXML, so we will see that today. LING 138/238 Autumn 2004

  19. Generation and TTS • Won’t say much about this today • TTS next week! • Generation: two main approaches • Simple templates (prescripted sentences) • Unification: use similar grammar rules as for parsing, but run them backwards! LING 138/238 Autumn 2004

  20. Dialogue Manager • Eliza was simplest dialogue manager • Read-search/replace-print loop • No state was kept; system did the same thing on every sentence • A real dialogue manager needs to keep state • We can’t keep asking the same question over and over! LING 138/238 Autumn 2004

  21. Three architectures for dialogue management • Finite State • Frame-based • Planning Agents LING 138/238 Autumn 2004

  22. Finite State Dialogue Manager LING 138/238 Autumn 2004

  23. Finite-state dialogue managers • System completely controls the conversation with the user. • It asks the user a series of question • Ignoring (or misinterpreting) anything the user says that is not a direct answer to the system’s questions LING 138/238 Autumn 2004

  24. Dialogue Initiative • “Initiative” means who has control of the conversation at any point • Single initiative • System • User • Mixed initative LING 138/238 Autumn 2004

  25. System Initiative • Systems which completely control the conversation at all times are called system initiative. • Advantages: • Simple to build • User always knows what they can say next • System always knows what user can say next • Known words: Better performance from ASR • Known topic: Better performance from NLU • Disadvantage: • Too limited LING 138/238 Autumn 2004

  26. User Initiative • User directs the system • Generally, user asks a single question, system answers • System can’t ask questions back, engage in clarification dialogue, confirmation dialogue • Used for simple database queries • User asks question, system gives answer • Web search is user initiative dialogue. LING 138/238 Autumn 2004

  27. Problems with System Initiative • Real dialogue involves give and take! • In travel planning, users might want to say something that is not the direct answer to the question. • For example answering more than one question in a sentence: • Hi, I’d like to fly from Seattle Tuesday morning • I want a flight from Milwaukee to Orlando one way leaving after 5 p.m. on Wednesday. LING 138/238 Autumn 2004

  28. Single initiative + universals • We can give users a little more flexibility by adding universal commands • Universals: commands you can say anywhere • As if we augmented every state of FSA with these • Help • Correct • This describes many implemented systems • But still doesn’t deal with mixed initiative LING 138/238 Autumn 2004

  29. Mixed Initiative • Conversational initiative can shift between system and user • Simplest kind of mixed initiative: use the structure of the frame itself to guide dialogue • Slot Question • ORIGIN What city are you leaving from? • DEST Where are you going? • DEPT DATE What day would you like to leave? • DEPT TIME What time would you like to leave? • AIRLINE What is your preferred airline? LING 138/238 Autumn 2004

  30. Frames are mixed-initiative • User can answer multiple questions at once. • System asks questions of user, filling any slots that user specifies • When frame is filled, do database query • If user answers 3 questions at once, system has to fill slots and not ask these questions again! • Anyhow, we avoid the strict constraints on order of the finite-state architecture. LING 138/238 Autumn 2004

  31. Multiple frames • flights, hotels, rental cars • Flight legs: Each flight can have multiple legs, which might need to be discussed separately • Presenting the flights (If there are multiple flights meeting users constraints) • It has slots like 1ST_FLIGHT or 2ND_FLIGHT so use can ask “how much is the second one” • General route information: • Which airlines fly from Boston to San Francisco • Airfare practices: • Do I have to stay over Saturday to get a decent airfare? LING 138/238 Autumn 2004

  32. Multiple Frames • Need to be able to switch from frame to frame • Based on what user says. • Disambiguate which slot of which frame an input is supposed to fill, then switch dialogue control to that frame. LING 138/238 Autumn 2004

  33. VoiceXML • Voice eXtensible Markup Language • An XML-based dialogue design language • Makes use of ASR and TTS • Deals well with simple, frame-based mixed initiative dialogue. • Most common in commercial world (too limited for research systems) • But useful to get a handle on the concepts. LING 138/238 Autumn 2004

  34. Voice XML • Each dialogue is a <form>. (Form is the VoiceXML word for frame) • Each <form> generally consists of a sequence of <field>s, with other commands LING 138/238 Autumn 2004

  35. Sample vxml doc <form> <field name="transporttype"> <prompt> Please choose airline, hotel, or rental car. </prompt> <grammar type="application/x=nuance-gsl"> [airline hotel "rental car"] </grammar> </field> <block> <prompt> You have chosen <value expr="transporttype">. </prompt> </block> </form> LING 138/238 Autumn 2004

  36. VoiceXML interpreter • Walks through a VXML form in document order • Iteratively selecting each item • If multiple fields, visit each one in order. • Special commands for events LING 138/238 Autumn 2004

  37. Another vxml doc (1) noinput> I'm sorry, I didn't hear you. <reprompt/> </noinput> <nomatch> I'm sorry, I didn't understand that. <reprompt/> </nomatch> LING 138/238 Autumn 2004

  38. Another vxml doc (2) <form> <block> Welcome to the air travel consultant. </block> <field name="origin"> <prompt> Which city do you want to leave from? </prompt> <grammar type="application/x=nuance-gsl"> [(san francisco) denver (new york) barcelona] </grammar> <filled> <prompt> OK, from <value expr="origin"> </prompt> </filled> </field> LING 138/238 Autumn 2004

  39. Another vxml doc (3) <field name="destination"> <prompt> And which city do you want to go to? </prompt> <grammar type="application/x=nuance-gsl"> [(san francisco) denver (new york) barcelona] </grammar> <filled> <prompt> OK, to <value expr="destination"> </prompt> </filled> </field> <field name="departdate" type="date"> <prompt> And what date do you want to leave? </prompt> <filled> <prompt> OK, on <value expr="departdate"> </prompt> </filled> </field> LING 138/238 Autumn 2004

  40. Another vxml doc (4) <block> <prompt> OK, I have you are departing from <value expr="origin”> to <value expr="destination”> on <value expr="departdate"> </prompt> send the info to book a flight... </block> </form> LING 138/238 Autumn 2004

  41. A mixed initiative VXML doc • Mixed initiative: user might answer a different question • So VoiceXML interpreter can’t just evaluate each field of form in order • User might answer field2 when system asked field1 • So need grammar which can handle all sorts of input: • Field1 • Field2 • Field 1 and field 2 • etc LING 138/238 Autumn 2004

  42. VXML Nuance-style grammars • Rewrite rules • Wantsentence -> I want to (fly|go) • Nuance VXML format is: • () for concatenation, [] for disjunction • Each rule has a name: • Wantsentence (I want to [fly go]) • Airports [(san francisco) denver] LING 138/238 Autumn 2004

  43. Mixed-init VXML example (3) <noinput> I'm sorry, I didn't hear you. <reprompt/> </noinput> <nomatch> I'm sorry, I didn't understand that. <reprompt/> </nomatch> <form> <grammar type="application/x=nuance-gsl"> <![ CDATA[ LING 138/238 Autumn 2004

  44. Grammar Flight ( ?[ (i [wanna (want to)] [fly go]) (i'd like to [fly go]) ([(i wanna)(i'd like a)] flight) ] [ ( [from leaving departing] City:x) {<origin $x>} ( [(?going to)(arriving in)] City:x) {<dest $x>} ( [from leaving departing] City:x [(?going to)(arriving in)] City:y) {<origin $x> <dest $y>} ] ?please ) LING 138/238 Autumn 2004

  45. Grammar City [ [(san francisco) (s f o)] {return( "san francisco, california")} [(denver) (d e n)] {return( "denver, colorado")} [(seattle) (s t x)] {return( "seattle, washington")} ] ]]> </grammar> LING 138/238 Autumn 2004

  46. Grammar <initial name="init"> <prompt> Welcome to the air travel consultant. What are your travel plans? </prompt> </initial> <field name="origin"> <prompt> Which city do you want to leave from? </prompt> <filled> <prompt> OK, from <value expr="origin"> </prompt> </filled> </field> LING 138/238 Autumn 2004

  47. Grammar <field name="dest"> <prompt> And which city do you want to go to? </prompt> <filled> <prompt> OK, to <value expr="dest"> </prompt> </filled> </field> <block> <prompt> OK, I have you are departing from <value expr="origin"> to <value expr="dest">. </prompt> send the info to book a flight... </block> </form> LING 138/238 Autumn 2004

  48. Grounding and Confirmation • Dialogue is a collective act performed by speaker and hearer • Common ground: set of things mutually believed by both speaker and hearer • Need to achieve common ground, so hearer must ground or acknowledge speakers utterance. • Clark (1996): • Principle of closure. Agents performing an action require evidence, sufficient for current purposes, that they have succeeded in performing it LING 138/238 Autumn 2004

  49. Clark and Schaefer: Grounding • Continued attention: B continues attending to A • Relevant next contribution: B starts in on next relevant contribution • Acknowledgement: B nods or says continuer like uh-huh, yeah, assessment (great!) • Demonstration: B demonstrates understanding A by paraphrasing or reformulating A’s contribution, or by collaboratively completing A’s utterance • Display: B displays verbatim all or part of A’s presentation LING 138/238 Autumn 2004

  50. LING 138/238 Autumn 2004

More Related