Structure of Dialogue - Holtgraves Source: Holtgraves, T., Language as Social Action: Social Psychology and Language Use, Lawrence Erlbaum Associates, Mahwah, New Jersey, pp. 89 – 119 • Findings • Failure to respond to a request implies a lack of understanding • The first part of an adjacency pair constrains the second and makes it conditionally relevant • The “preferred” response in a question-answer pair is usually immediate and may overlap the question • A “dis-preferred” response in a question-answer pair is usually delayed, indirect, involves prefaces (well, but, uh), and includes some kind of explanation for the response • Ending a conversation abruptly can pose a threat to the positive face of the other person Ben Koh
Structure of Dialogue - Holtgraves cont’d • Design Implications • A SUI could use a lack of response as a cue to provide help • SUIs can elicit targeted responses from users if the first part of an adjacency pair is sufficiently constraining • Allowing utterances to occur in the middle of conversation creates a more natural experience • SUIs can use cues from conversation to detect dis-preferred responses and propose other options Ben Koh
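The first implication above — treating silence as a cue to provide help — can be sketched as a simple turn loop. This is an illustrative sketch, not from Holtgraves; the names (`run_prompt`, `MAX_SILENT_TURNS`) and the two-silence threshold are assumptions.

```python
MAX_SILENT_TURNS = 2  # silences tolerated before the prompt escalates to help

def run_prompt(prompt, get_response, help_prompt):
    """Ask `prompt`; treat repeated silence as a lack of understanding.

    `get_response(text)` speaks `text` and returns the user's reply,
    or "" on silence. Returns (final_reply, prompts_spoken).
    """
    spoken = []
    silences = 0
    current = prompt
    while True:
        spoken.append(current)
        reply = get_response(current)
        if reply:
            return reply, spoken
        silences += 1
        if silences >= MAX_SILENT_TURNS:
            current = help_prompt  # escalate: the user may not understand
```

With a scripted user who is silent twice, the third prompt spoken is the help prompt rather than a bare repeat.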
Structure of Dialogue - Beveridge Beveridge, M., Milward, D., Ontologies and the Structure of Dialogue, CATALOG ‘04 • Findings • During a conversation, responses sometimes include insufficient detail or too much detail • Conversations are more natural when questions on related topics follow one another • Design Implications • Prompt the user when more detail is required, but do not ask redundant questions when detailed information was already provided by the user • A system that is flexible in the order of questioning will produce conversation that flows more naturally from topic to topic Ben Koh
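Beveridge and Milward's two implications — don't ask redundant questions, and stay flexible about question order — amount to slot filling that absorbs whatever the user volunteers. A minimal sketch under assumed names (`SLOTS`, `absorb`, `next_question`):

```python
SLOTS = ("origin", "destination", "date")  # illustrative travel-booking slots

def absorb(reply_slots, state):
    """Accept any recognized slots the user volunteered, even out of order."""
    state.update({k: v for k, v in reply_slots.items() if k in SLOTS})
    return state

def next_question(state):
    """Ask only for what is still missing - never a redundant question."""
    for slot in SLOTS:
        if slot not in state:
            return f"What is the {slot}?"
    return None  # all slots filled; move on
```

If the user answers the date question with a destination as well, the system simply never asks about the destination.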
Case studies - Emacspeak • Emacspeak • Audio desktop for the visually impaired using a text-to-speech AUI • Features • Unlike screen readers that speak the contents of a visual display, Emacspeak speaks the underlying information. • Key Findings • Pros • Intelligent Audio Formatting & Audio Icons • Cons • Errors • Design Implications • Provides a groundwork for speech-enabling conversational interfaces: e.g., accessing the wealth of information on the Internet via a mobile telephone or while driving. • A more "human" computer, one the user can talk to, may make educational and entertainment applications seem more friendly and realistic. • Other Speech Navigation Tools • VoiceXML • SALT Lily Cho
Case studies - Evaluating a Spoken Language Interface to Email • Case Study: • Background: stigmas attached to SLIs • Limited for delivering information • Require the user to learn the language the system can understand • Hide available command options • Lead to unrealistic expectations of system capabilities • Experiment • 2 dialogue strategies for an SLI for accessing email (ELVIS) by phone: • Mixed-initiative dialogue style, in which users can flexibly control the dialogue • System-initiative dialogue style, in which the system controls the dialogue • Results • The mixed-initiative system is more efficient (measured by number of turns or elapsed time to complete a set of email tasks) • Surprisingly, users preferred the system-initiative interface: • Easier to learn • More predictable • Conclusions • Perhaps, if the study had continued and users had become experts, they would have preferred the mixed-initiative system. Lily Cho
Common Grounding • Clark, H., Brennan, S. (1991) “Grounding in Communication” • Common ground - mutual knowledge, mutual beliefs, and mutual assumptions. • In asking a question, it must be established that the respondent has understood what the questioner meant. • There are two phases to establishing grounding: • Presentation Phase: A presents an utterance to B. If B gives evidence of understanding, then A can believe that B understands what he means. • Acceptance Phase: B accepts the utterance from A by giving evidence that she believes she understands what A meant. B will also believe that once A registers this evidence, he will believe that B understands. • The system in a SUI should make sure that the user understands its utterances and give the user a way of getting clarification if they do not understand. Melissa Ludowise
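Clark and Brennan's two phases can be mirrored in a SUI dialogue loop: present the question, then seek explicit acceptance whenever the evidence of understanding is weak. A hypothetical sketch — the 0.6 threshold and the function names are illustrative, and `confidence` stands in for an ASR confidence score:

```python
def grounded_ask(question, get_reply, confidence):
    """Ask a question and ground the answer before accepting it.

    `get_reply(text)` speaks `text` and returns the user's reply;
    `confidence(reply)` is a stand-in ASR confidence score in [0, 1].
    """
    reply = get_reply(question)                       # presentation phase
    if confidence(reply) < 0.6:                       # weak evidence of grounding
        check = get_reply(f"Did you say '{reply}'?")  # seek explicit acceptance
        if check.lower() not in ("yes", "right", "correct"):
            reply = get_reply("Sorry, please say that again.")
    return reply
```

High-confidence replies pass through without a confirmation turn, so grounding costs are paid only when the evidence is weak.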
Common Grounding • Kiesler, S. (2005) “Fostering Common Ground in Human-Robot Interaction” • People attribute knowledge to robotic systems. • If a robot has many human-like qualities, people will expect it to act as a real person. Robots may need to be more explicit than a human would be (example: a robotic security guard giving directions). • Stereotypes can also be applied to robotic systems. For example, when participants were told a robot originated from China, they believed it to have more knowledge of Chinese landmarks. • SUI systems should not be explicit unless necessary. • Users will apply stereotypes to characters used in SUI systems. Melissa Ludowise
Common Grounding • Patterson, E. S., Watts-Perotti, J., Woods, D. D. "Voice loops as coordination aids in space shuttle mission control." Computer Supported Cooperative Work: The Journal of Collaborative Computing, 8(4), 353-371 (1999). • Finding: constant monitoring of multiple auditory channels allows mission control operators to attend to many stimuli in their operational periphery. Subsequently, when tasks that require cooperation between heterogeneous groups of specialists arise, those specialists require less time to coordinate the task. This is due to their shared grounding: they each understand the issues that other team members and groups have been struggling with. • Design Implication: When it is possible to output a constant, low-level stream of relevant background contextual information to the user, users will require less time and information to make a decision. Jason Cornwel
Common Grounding • Monk, A. “Common ground in electronically mediated communication: Clark’s theory of language use.” Chapter 10 in J. Carroll (ed.), HCI Models, Theories and Frameworks, Morgan Kaufmann, pp. 265-290, San Francisco, 2003. • Finding: When conversational systems ignore one or more of Clark’s constraints on grounding (included in the notes for this slide), they incur additional costs in orienting their participants. Examples from the paper included Cognoter, a shared electronic whiteboard system that allowed meeting participants to create and annotate “items” on the board either in parallel or in advance of the meeting. The system was unsuccessful because people did not have enough of a shared understanding at the outset of a meeting to make independent ideation efficient. The other example cited involved a doctor, patient, and observer communicating via videoconference. When the observer knew that the others were aware of him/her (copresence), the observer was more willing to interrupt to ask questions. • Design Implication: Design for as many of Clark’s constraints as possible to increase conversational efficiency. Jason Cornwel
Emotion and conversation - Emotional reactions to system interaction • SUI malfunction may cause changes to a person’s emotional state, altering both acoustic properties and choice of words. • Initial strategy: prevent an angry reaction. • Contingencies should be in place to detect and de-escalate emotional situations. • Train for and adapt to emotional changes. • Initiate clarification dialogues • Explain or apologize for system shortcomings. • Sum up the current state of the system. K. Fischer (1999) “Repeats, reformulations, and emotional speech: Evidence for the design of human-computer speech interfaces”. Human-Computer Interaction: Ergonomics and User Interfaces, Volume 1 of the Proceedings of the 8th International Conference on Human-Computer Interaction, Munich, Germany. Lawrence Erlbaum Ass., London, pp. 560-565 Simon King
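Fischer identifies repeats and reformulations as evidence of emotional speech. One crude, illustrative way to detect a frustrated repeat is string similarity between consecutive utterances; the 0.8 threshold and the wording of the de-escalation message are assumptions, not from the paper:

```python
from difflib import SequenceMatcher

def is_frustrated_repeat(prev_utterance, utterance, threshold=0.8):
    """Heuristic: a near-verbatim repeat of the previous utterance often
    signals that the system misrecognized it and the user is getting annoyed."""
    if prev_utterance is None:
        return False
    ratio = SequenceMatcher(None, prev_utterance.lower(),
                            utterance.lower()).ratio()
    return ratio >= threshold

def deescalate(state_summary):
    """Apologize, explain, and sum up the current system state, per Fischer."""
    return ("Sorry, I'm having trouble understanding. "
            f"So far I have: {state_summary}. Could you rephrase?")
```

A production system would combine this with acoustic cues (pitch, rate, intensity) rather than text similarity alone.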
Emotion and conversation- Small talk and social intelligence • Small talk is an important mechanism for managing both the channel of communication and interpersonal distance between a user and an interface agent. • It can help maintain an open channel of communication and establish a social bond. • Simple acts can keep the conversation flowing: • Always respond with more information than was explicitly asked for. • Use idling behavior such as “Mhm”, or “Yea”. Timothy Bickmore (1999) “A Computational Model of Small Talk” MAS 962 Discourse and Dialog for Interactive Systems http://web.media.mit.edu/~bickmore/Mas962b/ Simon King
Emotion and conversation - Matching Emotional States and Attention/Performance • “Matching” emotional states between the user and the SUI has the following positive effects: • Performance increase – fewer errors in tasks requiring high attention • Quicker response times • Greater attention spans • Increased communication – encouraging continued communication could help task completion • The research identified “matching” as: • Use an energetic voice when the user is happy • Use a subdued voice when the user is upset • (The implications of this paper are suited to the design of systems where a voice-based interface operates in a multi-tasking environment where safety is important.) Ray Su
Emotion and conversation - Augmenting Conversation with Dual-purpose Speech • A dual–purpose speech interaction is one where the speech serves two roles. • First, it is socially appropriate and meaningful in the context of a human–to–human conversation. • Second, the speech provides useful input to a computer. • Much of office work involves interpersonal conversation. Dual-purpose speech allows users to interact with SUI systems without interrupting work and normal conversations with others. • Application of dual-purpose speech: • Use keywords in normal human-to-human conversation as commands to the speech user interface. • "Lemme check the email sent by Donna last week" • "email", "Donna", and "last week" are action prompts for the SUI to navigate through the email app. • Current limitations in speech recognition technology require the user to "push to prompt" the input of keywords to the speech user interface. Future designs could remove this limitation to allow smoother interaction with the system Ray Su
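The email example above can be approximated with simple keyword spotting over the transcript of a human-to-human conversation. This is a toy sketch — the keyword sets and the sender regex are illustrative; a real system would use a proper grammar and ASR confidence scores:

```python
import re

# Hypothetical keyword grammar for an email SUI; names are assumptions.
APP_KEYWORDS = {"email", "calendar"}
TIME_KEYWORDS = {"today", "yesterday", "last week"}

def spot_command(utterance):
    """Extract an (app, sender, timeframe) command from conversational speech."""
    text = utterance.lower()
    app = next((k for k in APP_KEYWORDS if k in text), None)
    timeframe = next((t for t in TIME_KEYWORDS if t in text), None)
    m = re.search(r"(?:by|from)\s+(\w+)", text)  # crude sender extraction
    sender = m.group(1) if m else None
    return app, sender, timeframe
```

On the paper's example phrase, this yields the app, sender, and timeframe the SUI needs to navigate to the message.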
Cognition and Auditory Working Memory • D. E. Kieras, et al.; An EPIC computational model of verbal working memory; University of Michigan • Experiments on word recall • Phonologically distinct words are recalled better than similar words. • Lists of short words are recalled significantly better than lists of long words. • Design implications: short lists, short words, phonological distinctiveness. Annie Ha
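The design advice — phonologically distinct words — can be partially automated by flagging confusable pairs in a prompt vocabulary. The sketch below uses orthographic similarity as a crude proxy for phonological similarity (a real check would compare phoneme strings); the 0.75 threshold is an assumption:

```python
from difflib import SequenceMatcher

def confusable_pairs(words, threshold=0.75):
    """Flag pairs of menu words that look (and likely sound) too similar."""
    flagged = []
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if SequenceMatcher(None, a, b).ratio() >= threshold:
                flagged.append((a, b))
    return flagged
```

A SUI vocabulary that trips this check ("delete"/"deplete") is a candidate for renaming before user testing.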
Phonological Loop • A. Baddeley; The episodic buffer: a new component of working memory? (Nov. 2000); Trends in Cognitive Sciences; vol. 4, no. 11 • Phonological similarity effect: words or letters with similar pronunciations are harder to remember • Word-length effect • Articulatory suppression • Irrelevant sound will usually decrease performance • When words are not related, people start to make errors once the number of words exceeds 6; when the words are related, a span of 16 words is possible Annie Ha
Talkback: a conversational answering machine Source: Vidya Lakshmipathy, Chris Schmandt, and Natalia Marmasse. TalkBack: a conversational answering machine, UIST 2003, Vancouver, BC, Canada, pages 41-50. • Good things • Users found it easier to provide responses compared with regular email. • Replies are like inline email replies • Users said it would be nice to get an overview of the message before replying (a summary?) • No touching required to begin recording • Bad things • Pauses that stopped reply recording were annoying. • Users pause while talking to think. • Users suggested that pressing something to stop recording might be better • Users wanted to be able to jump between segments • Make email responses more conversational • Messages are segmented by detected speech gaps or topic shifts • The system stops for a moment between gaps to allow for a response. • Users stop recording, or continue, by not speaking • Users can record anytime by interrupting playback with speech Jeff Wong
Talkback: a conversational answering machine • Talkback Design Implications • Inline replying reduces memory load when replying • Automatic segmentation can enable some within-message navigation • Interruption by speech is easier than interruption by button press • (compared with an early prototype) • Don’t use pauses to end a recording Jeff Wong
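TalkBack's gap-based segmentation can be sketched as a pass over per-frame energies: a run of quiet frames longer than a minimum gap ends a segment. The frame-energy representation, silence threshold, and gap length here are all illustrative assumptions, not the paper's parameters:

```python
def segment_by_gaps(energies, silence_threshold=0.02, min_gap=10):
    """Split a list of per-frame energies into (start, end) frame ranges,
    breaking wherever at least `min_gap` consecutive frames fall below
    `silence_threshold`. A sketch of TalkBack-style gap segmentation."""
    segments, start, silent = [], None, 0
    for i, e in enumerate(energies):
        if e >= silence_threshold:
            if start is None:
                start = i          # speech begins a new segment
            silent = 0
        elif start is not None:
            silent += 1
            if silent >= min_gap:  # long silence closes the segment
                segments.append((start, i - silent + 1))
                start, silent = None, 0
    if start is not None:          # close a trailing segment
        segments.append((start, len(energies) - silent))
    return segments
```

The resulting segment boundaries are where the system can pause playback to invite an inline reply.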
A 3D audio only interactive web browser • Using spatial audio to indicate location in a document is BAD • Uncomfortable to hear in one ear for a long time. • You need surround sound or headphones to make this work • Take-off and landing sounds indicate navigation skipping. • Voices of different genders and different spatial locations announce position in the document. Source: Goose, S. and Müller, C. 1999. A 3D audio only interactive Web browser: using spatialization to convey hypermedia document structure. MULTIMEDIA '99. ACM Press, New York, NY, 363-371. Jeff Wong
Resnick, Paul and Robert A. Virzi (1992) "Skip and Scan: Cleaning Up Telephone Interfaces." Proceedings of CHI '92. New York: ACM Press, pp. 419-426. http://portal.acm.org/citation.cfm?id=142881&coll=portal&dl=ACM&CFID=396449&CFTOKEN=63284630 Skip and Scan • The Findings • The Skip and Scan paper described an automated interface that people interact with over the phone. It compared current systems of menus and number prompts (e.g., "For accounting, press 1...") to a new way of interacting with the system. • This new interaction had the user navigate through the prompts using the 7 key to go backwards, the 9 key to go forward, and the 1 key to select the current prompt. This allows for larger menus. • This system allows users to browse at their own pace. • The new interaction took some time to get used to; the younger the group, the faster they got it. Once users got the hang of it, it was quicker than listening to all the prompts on a menu. James Soracco
Resnick, Paul and Robert A. Virzi (1992) "Skip and Scan: Cleaning Up Telephone Interfaces." Proceedings of CHI '92. New York: ACM Press, pp. 419-426. http://portal.acm.org/citation.cfm?id=142881&coll=portal&dl=ACM&CFID=396449&CFTOKEN=63284630 Skip and Scan • The very old do not want to interrupt the system, so they are not much faster with this method. • The vast majority of users from every group said that they liked this new method of interaction. One of the reasons they gave was that it put them in control of the system instead of making them wait for the system. • Design implications • Most users prefer to have control over navigation through their data and can navigate it faster than if it were read off to them. It is important to create a system where the user can navigate the data quickly (both forwards and backwards) and interrupt the system when they recognize that the message being read is not what they wanted, so they can move on to the next item in the list. James Soracco
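The 7/9/1 navigation scheme is easy to model as a small state machine — a sketch, with the key bindings taken from the paper and the class and menu names assumed:

```python
class SkipAndScan:
    """Minimal sketch of Resnick & Virzi's skip-and-scan navigation:
    7 = back, 9 = forward, 1 = select the current prompt."""

    def __init__(self, prompts):
        self.prompts = prompts
        self.pos = 0

    def press(self, key):
        if key == "9":
            self.pos = min(self.pos + 1, len(self.prompts) - 1)
        elif key == "7":
            self.pos = max(self.pos - 1, 0)
        elif key == "1":
            return self.prompts[self.pos]   # selection made
        return None                          # still browsing

    @property
    def current(self):
        return self.prompts[self.pos]
```

Because the user drives the position directly, the system never has to read every prompt aloud, which is where the speed advantage comes from.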
Machine with human-like qualities- Julie and other voice systems • Julie is AMTRAK’s phone answering system (1-800-USA-RAIL) • She has received strong positive reviews. • Surveys have found that callers give Julie a 90 percent approval rating. • She handles 25% of AMTRAK’s calls, about five million, at a savings of $13 million. • The interface is informal, with quotes such as: “OK Let’s get started” ; “You’ll want paper handy” ; “Got it!” ; “Sorry, I didn’t get that.” She apologizes when wrong. Julie Stinneford, the voice of Amtrak’s Julie. Craig Borchardt
Machine with human-like qualities - Other noted voice systems • Tom, United Airlines • Jenni McDermott, Yahoo – Comes with a photo and a four-page biography, which says she graduated from Berkeley with an art history degree in 2001. Quotes are “Got it!”, “Cool”, and “Wow, you’re popular” (for callers with crowded e-mail boxes). • Claire, Sprint – Failed system no longer used by Sprint. Online blogs still carry comments about the system. Claire could not recover when words were mispronounced and could not deal with background noise. One blog complained that she could not be interrupted; another said that she was the reason for switching to AT&T. She was attractive and friendly, but navigation was a maze. She sounded happy even when customers were angry and frustrated. • Mercedes-Benz – Had to change the on-board software in some cars because men complained that they did not want to take orders from a female voice. • Brokerage Houses – Found that people calling in responded favorably to female voices but want to deal with a man when making a trade. Craig Borchardt
Machine with human-like qualities- Voice systems and interaction • Design Implications • A virtual operator should be able to sense if you’re flustered by the length of pauses and number of “uhs” as well as from fluctuations in voice inflections. • Matching voice, gender, and persona to user is important. Source: New York Times, November 24, 2004; Chicago Tribune, March 28, 2005. Craig Borchardt
Machine with human-like qualities - Clifford Nass on Clippy the Microsoft Paper Clip: • “I was involved on the dancing paper clip…I think they are right to despise him for many reasons. • The single best reason is a fundamentally social one. When you ask people why they hate that paper clip so much, one of the first things they say is, ‘Well, every time I write “Dear Fred,” the thing pops up and says, “Oh, I see you’re writing a letter.”’ And they dismiss it. The first time it was okay, it was helpful. • The second time it was at least trying. But the forty-seventh time, it was clearly being at best passive-aggressive and at worst downright hostile, implying that I couldn’t make a decision about the right thing to do.” Well, we know what we do with people like that – we hate them. In fact, the first rule in the Dale Carnegie course “How to Win Friends and Influence People” is to remember things about people. Craig Borchardt
Machine with human-like qualities- Nass and Microsoft Paperclip • The paper clip doesn’t do that. Also it manifests a particular personality style that’s not very popular; it’s a rather dominant, unfriendly personality style. • There are characters in there that lots and lots of people like – unfortunately the interface was designed so that you couldn’t discover them…research shows that almost everyone finds a character that they like.” • Image from Joe Tullio Craig Borchardt
Machine with human-like qualities - Nass and Microsoft Paperclip • Design Implications • The interface should learn about the user’s actions. • Match the personality of the interface to the user. • Let the user choose which interface character to interact with. • Source: Conversations with Clement Mok and Jakob Nielsen, and with Bill Buxton and Clifford Nass. Interactions, Volume 7, Issue 1. January 2000 Craig Borchardt
Machine with human-like qualities- Can computer personalities be human personalities? • Clifford Nass, et al. CHI, 1995. • Study examined theory that computer personalities can be created with a small set of cues and that people respond to personalities the same way that they would to human personalities. • In this experiment, dominant and submissive computer personalities were created and paired with people who were determined to have dominant or submissive personalities. • The experiment concluded that when a person was paired with a computer with a similar personality, higher affiliation and competence ratings resulted. Craig Borchardt
Machine with human-like qualities- Can computer personalities be human personalities? • Design Implication: Create multiple interface characters so that users can choose a match for themselves. • Source: Nass, C. et al. (May 1995). Can computer personalities be human personalities? Paper presented to Chi’95 conference of the ACM/SIGCHI, Denver, CO. Craig Borchardt
Machine with human-like qualities - Can computer-generated speech have gender? An experimental test of gender stereotype. • The study looked at how gender in computer speech affected the user’s perception of the computer. • Found that a male voice exerted greater influence on the user’s decision than a female voice, and was seen as more socially attractive and trustworthy. • Gendered synthesized speech triggered social identification processes, with female subjects conforming more to female-voiced computers and male subjects conforming more to male-voiced computers. Craig Borchardt
Machine with human-like qualities - Can computer-generated speech have gender? An experimental test of gender stereotype. • Speech interfaces should consider the gender of the voice and consider presenting the user with the option to choose a voice. • Source: Lee, E.-J., et al. (April 2000). Can computer-generated speech have gender? An experimental test of gender stereotype. Paper presented to the CHI ’00 conference of the ACM/SIGCHI. Craig Borchardt
Machine with human-like qualities - Kismet • Source • Sociable Machines Project homepage at MIT: http://www.ai.mit.edu/projects/humanoid-robotics-group/kismet/kismet.html • Overview • Autonomous robot designed for social interactions with humans. • Perceives natural social cues from visual and auditory channels, and delivers social signals to the human through gaze direction, facial expression, and vocal babbles. Jaewon Kang
Machine with human-like qualities - Kismet • Findings • Instead of trying to achieve realism, the project team focuses on sharing emotions through communication. • To recognize and respond affectively to intents such as praise, prohibition, attention, and comfort, Kismet identifies differences in speech rate, pitch, intensity, etc. • Kismet is designed to look like a very young child, so that people naturally exaggerate the way they speak, which delivers a very characteristic tone of voice. Jaewon Kang
Machine with human-like qualities - Kismet • Findings • Using a voice synthesizer, Kismet generates sound with pitch accents in response to the speaker’s communicative intent. • Even though there is no grammatical structure, its manner of vocal expression gives understandable responses and contributes to Kismet’s personality. • Implications • Responses/feedback should be intuitively understandable • Simplifying human emotions into subsets is useful for understanding the speaker’s intent • Create conditions in which people naturally exaggerate their voice, yielding more recognizable input • A strong system personality can offset somewhat unnatural and slow machine responses in communication. Related movie clip: http://www.ai.mit.edu/projects/sociable/movies/expression-examples.mov Jaewon Kang
Machine with human-like qualities - Vocollect • Source • From the article “Machine to human: can we talk?” Ward's Auto World, May 1992, by Stephen E. Plumb • Vocollect homepage: http://www.vocollect.com/global/web.php/en/ • Overview • A portable voice-interactive device that bridges the gap between human observations and computer databases by allowing inspectors to input findings directly into a computer system. • Consists of two parts: a central processing unit with battery pack (3 lb.), and a speech-recognition headset. • Works in environments of up to 100 decibels; sound variances won't affect voice recognition. • Can be tailored to specific jobs, so in many cases the program looks totally different from the generic format Wearable computer Headset Jaewon Kang
Machine with human-like qualities - Vocollect • Findings • Process • Host coordinates operational data and sends assignments • Assignments are converted into speech and completed • Vocollect voice software sends real-time status on assignments • Host updates data • Benefits • Boosts productivity: 30% faster • Improves accuracy • Cuts training time • Lowers operating costs Jaewon Kang
Machine with human-like qualities - Vocollect • Findings • Application • Material handling and shipping and receiving verification • Order selection, replenishments, put-aways, and transfers • Linked with a bar-code scanner so inspectors don’t have to read 17-digit identification numbers • Implications • Give users the ability to tailor the device to meet individual needs • Voice recognition is effective for real-time data updates • Try to reduce the time needed to train the user • Vocollect keeps the structure of the voice messages consistent Jaewon Kang
Non-verbal voice command • Non-verbal features in speech • Continuous voice as on/off button • Increasing pitch as accelerator • Tonguing as discrete controller • Different vowel qualities as direction indicator • Pros: Immediate, continuous control • Cons: Unnatural way of using the voice • Can complement other speech recognition interfaces and visual interfaces • For example, to adjust system parameters
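The non-verbal mappings above (continuous voicing as an on/off button, rising pitch as an accelerator) can be sketched as two small functions; the pitch range and energy threshold are illustrative assumptions, not from the source:

```python
def pitch_to_speed(pitch_hz, base_hz=120.0, max_hz=400.0):
    """Map vocal pitch to a 0..1 accelerator value: higher pitch, faster.
    The base and ceiling frequencies here are illustrative choices."""
    if pitch_hz <= base_hz:
        return 0.0
    return min((pitch_hz - base_hz) / (max_hz - base_hz), 1.0)

def voicing_to_switch(energy, threshold=0.05):
    """Continuous voicing acts as an on/off button: voiced = on."""
    return energy >= threshold
```

Because both mappings are continuous, the user gets immediate, proportional control — the "pro" the slide notes — at the cost of an unnatural use of the voice.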
Auditory Icons • Tested 83 students for sound recognition • Less than 15% correct identification of exact sound • 80% partial identification of material or function • Sounds identified as either objects or actions • Tearing, ripping, winding vs. camera, door, zipper • Sounds can represent objects and actions but their use should not require user to interpret them too specifically Source: Mynatt, Elizabeth. (1994) “Designing With Auditory Icons: How Well do we Identify Auditory Cues?” Conference on Human Factors in Computer Systems. Boston, MA: ACM Press, pp. 269-270.
Adaptivity in SUIs (Speech User Interfaces) • Adaptivity is typically used to make an interaction more efficient or more comfortable for users. • Adaptive systems need to be aware of User differences (characteristics/behaviour/preferences). • Adaptive systems must be aware of Context differences (and that context may change during the interaction). Context also plays a key role in User behaviour. • Adaptive systems must be self-aware enough to let Users know if something has changed within the system. • Primary notes of interest: • Error/contingency planning • Adaptive system (information) architecture • System awareness (User/Context/Self)
Language and Culture (1) • Findings • Conversational interaction that includes expanding and giving synonyms, when a second-language speaker is involved in the conversation, helps the NNS's (non-native speaker's) accuracy while executing the intended task (or conversation) • Recasting — repeating a phrase while correcting the structural mistakes the NNS made while formulating a statement — improves the NNS's understanding of the overall context of meaning and the syntax of the conversation. (The big trade-off is the emotional response to hearing the interlocutor correct what you say) • It is possible to find patterns in speech (such as the length of sounds and silences) across languages. For instance, when an NNS does not understand a word in a given conversation, he or she can infer at least the basic gist of the answer based on emotion and tone, but also on the structure of sound and silence.
Language and Culture (1) • Design Implications • Synonyms and other equivalences for the actual terms used throughout the SUI — or in the help section — can aid the non-native-speaker user in completing his or her task successfully. • Keeping commands and answers to really short phrases is the tradition and the logical structure for SUIs; however, a little more feedback or context helps the user immensely if he or she does not understand the meaning of the words in the feedback • Repeating terms when necessary can help the NNS user associate these words with their actual meanings, providing language-performance tools for NNS power users • Based on • Mackey, A., Philp, J. Conversational Interaction and Second Language Development: Recasts, Responses, and Red Herrings?. The Modern Language Journal, Vol. 82, No. 3, Special Issue: The Role of Input and Interaction in Second Language Acquisition (Autumn, 1998), pp. 338-356.
Language and Culture (2) • Findings • It is hard for people over 12 years old to reach a native speaker's performance in a second language, regardless of how long they have practiced it • The elements that constitute a "foreign" accent can be broken down into measurable equivalents of pitch, tone, and speed of speech • Once the foreign language has been identified, elements such as rhythm and speed can be measured and compared with the same elements of the first language in which the conversation takes place.
Language and Culture (2) • Design Implications • Machines with speech recognition capabilities can learn to identify "foreign accents" and correct their interpretation of words based on the analyzed sample of speech. • Depending on the elements of speech analyzed, common speech patterns can be used not only to structure the menus of a SUI but also to structure the feedback given to the NNS user. • Based on • Van Els, T., De Bot, K. The Role of Intonation in Foreign Accent. The Modern Language Journal, Vol. 71, No. 2 (1987), pp. 147-155
3D Virtual Interactive Acoustic Environments • It is possible not only to place a virtual sound in a precise three dimensional space, but to modify the size and shape of the room in which the virtual sound is contained. Source: Noisternig, M., Musil, T., Sontacchi, A., Holdrich, R. (2003) A 3D Real Time Rendering Engine For Binaural Sound Reproduction Boston, Massachusetts Proceedings of the 2003 International Conference on Auditory Display
3D Virtual Interactive Acoustic Environments • In comparing sighted versus blind children: while blind children could successfully map out a fully three-dimensional space using only sound cues, sighted children had much more trouble; their mental maps included many incorrect elements and left out many other subtle elements. Source: Sanchez, J., Lumbreras, M. (2000) Usability and Cognitive Impact of the Interaction with 3D Virtual Interactive Acoustic Environments by Blind Children, Alghero, Italy, Proc. 3rd Intl Conf. Disability, Virtual Reality & Assoc. Tech.