
Multimodal Dialogue Analysis

INOUE, Masashi, Yamagata University. 29-Nov-09 @FIU, Dr. Tao Li’s Group.


Presentation Transcript


  1. Multimodal Dialogue Analysis INOUE, Masashi Yamagata University 29-Nov-09 @FIU Dr. Tao Li’s Group

  2. Name of the discipline • Computational Social Linguistics • Society influences language use • Conversation Analysis (CA) • Discourse Analysis (DA)

  3. Overview (1/5)

  4. Layers of investigation

  5. Major Conferences and Journals • ICMI-MLMI • ICMI (User Interface) and MLMI (Dialogue Analysis) merged in 2009 • Some in multimedia or NLP conferences • ACM Multimedia • ACL • etc.

  6. Research Initiatives In Europe • CHIL Corpus • AMI Corpus • Augmented multi-party interaction • http://corpus.amiproject.org/ • SSPNET • A European network of excellence in social signal processing • http://sspnet.eu/

  7. Example 1 (2/5)

  8. Paper 1 (ICMI-MLMI 2009) • “Discovering group nonverbal conversational patterns with topics” by Dinesh Babu Jayagopi, Daniel Gatica-Perez (IDIAP) • Goal: Understand group dynamics (= leadership) from conversational video

  9. Method • Feature descriptor • Time slices of conversation (documents) • Different time scales show different patterns • 1 min scale: monologue vs. 5 min scale: a lot of interaction • Speaking energy / speaking status • Bag of non-verbal patterns (NVP) • Speech length, # of turns, successful interruptions • Method (what’s new) • Unsupervised • Topic model (LDA): shows which feature is prominent
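
The two simplest cues named on this slide, speech length and number of turns, can be sketched as follows. This is only an illustration: it assumes a binary per-frame speaking-status sequence for one participant, and the function names `speaking_stats` and `per_slice_stats` are hypothetical, not taken from the paper. Successful interruptions and speaking energy are not covered.

```python
def speaking_stats(status):
    """Return (speaking length in frames, number of turns) for one speaker.

    `status` is a list of 0/1 flags, one per frame (assumed input format).
    A turn is counted at every 0 -> 1 transition, including frame 0.
    """
    length = sum(status)
    previous = [0] + status[:-1]
    turns = sum(1 for prev, cur in zip(previous, status) if cur and not prev)
    return length, turns


def per_slice_stats(status, slice_len):
    """Compute the same statistics independently for each time slice."""
    return [
        speaking_stats(status[i:i + slice_len])
        for i in range(0, len(status), slice_len)
    ]


status = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]
print(speaking_stats(status))
print(per_slice_stats(status, 5))
```

Changing `slice_len` is the knob that corresponds to the 1 min vs. 5 min time scales; a turn that spans a slice boundary is counted once in each slice under this simplification.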

  10. Feature categories • Generic group patterns: group as a whole • silence, one-speaker, two-speaker, other, evenly • Leadership patterns: • proposed in social psychology field • position of designated leader (‘L’) or someone else (‘NL’): taking maximum values • 21 dimensional feature vectors (vocabulary) • 6 tokens per slice (words)
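
A rough sketch of how time slices could be turned into "documents" of pattern "words" is given below. It assumes each frame is represented as the set of participants speaking in it, which is an illustrative choice. Only four of the generic group patterns are implemented; the "evenly" pattern, the leadership patterns, the 21-dimensional vocabulary and the 6-tokens-per-slice rule are not reproduced because the slide does not define them.

```python
from collections import Counter


def group_pattern(speakers):
    """Map the set of active speakers in one frame to a generic group pattern."""
    n = len(speakers)
    if n == 0:
        return "silence"
    if n == 1:
        return "one-speaker"
    if n == 2:
        return "two-speaker"
    return "other"


def bag_of_patterns(frames, slice_len):
    """Build one bag (Counter) of pattern tokens per time slice."""
    return [
        Counter(group_pattern(f) for f in frames[i:i + slice_len])
        for i in range(0, len(frames), slice_len)
    ]


frames = [set(), {"A"}, {"A", "B"}, {"A", "B", "C"}, {"B"}, set()]
for doc in bag_of_patterns(frames, 3):
    print(dict(doc))
```

Each resulting bag plays the role of a document's word counts and could then be passed to a topic model.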

  11. Data • AMI Corpus • Meeting for product design • 17 meetings (17 hours) • 4 participants / group: • ‘Project Manager’, ‘User Interface specialist’, ‘Marketing Expert’, and ‘Industrial Designer’.

  12. Result (3 topics)

  13. Result (visual) • 3 topics • Can be used to characterize groups

  14. Validation • Comparison with ground truth (GT): • 5 min scale, 8 top docs per class • 3 annotators / meeting • GT is the majority-agreed label • Accuracy: 62%, 100%, 75% for each class • Autocratic, Participative, Free rein
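
The validation procedure can be illustrated with a small sketch: a majority vote over annotator labels, followed by accuracy computed separately for each predicted class. The data below are invented for illustration and the function names are hypothetical; the slide does not say how ties or disagreements were handled, so returning `None` for them is an assumption.

```python
from collections import Counter, defaultdict


def majority_label(labels):
    """Return the label chosen by a strict majority of annotators, else None."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None


def per_class_accuracy(predicted, truth):
    """Fraction of documents in each predicted class whose ground truth matches."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for p, t in zip(predicted, truth):
        total[p] += 1
        correct[p] += int(p == t)
    return {cls: correct[cls] / total[cls] for cls in total}


# Invented toy annotations: three annotators per document.
annotations = [
    ["Autocratic", "Autocratic", "Free rein"],
    ["Participative", "Participative", "Participative"],
    ["Autocratic", "Participative", "Free rein"],
]
print([majority_label(a) for a in annotations])
```

With 8 top documents per class, each per-class accuracy is a multiple of 1/8 (for example, 6/8 = 75%).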

  15. Questions • Feature representation (Are they good?) • Some magic numbers (e.g., 6 words/slice) • Balancing # of vocabulary and # of words • Modeling technique (Is LDA a valid one?) • Can we regard the NVC as words and group dynamics as topics? • Arbitrary number of topics, different interpretations

  16. Example 2 (3/5)

  17. Paper 2 (MSSSC 2009) • “Sensor-Based Organizational Engineering” by Daniel Olguin-Olguin, Alex (Sandy) Pentland (MIT Media Lab) • [16] Olguin-Olguin, D., & Pentland, A. (2008). Social Sensors for Automatic Data Collection. 14th Americas Conference on Information • Social signals/Reality mining/Sensible organizations • Introduction to their research projects • Use of sensors to collect data in groups • Combination of textual and survey data • Business communication domain (organizational behavior)

  18. Method • Sensor data • Face/body/vocal behavior/space and environment/affective behavior • Cameras, infrared sensors, accelerometers, gyroscopes, inclinometers, pressure sensors, microphones, vibration sensors, ... • Pattern recognition • Social network analysis • Who talks to whom • How well they are communicating
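
The "who talks to whom" part of the social network analysis can be sketched as a weighted, undirected interaction graph. The input format (a list of detected face-to-face encounters between two badge wearers) and the function names are assumptions made for illustration, not the authors' pipeline.

```python
from collections import Counter


def interaction_counts(encounters):
    """Count encounters per unordered pair of people (edge weights)."""
    edges = Counter()
    for a, b in encounters:
        if a != b:  # ignore self-detections
            edges[tuple(sorted((a, b)))] += 1
    return edges


def weighted_degree(edges):
    """Total number of encounters each person took part in."""
    totals = Counter()
    for (a, b), weight in edges.items():
        totals[a] += weight
        totals[b] += weight
    return totals


encounters = [("ann", "bob"), ("bob", "ann"), ("ann", "cy"), ("cy", "cy")]
edges = interaction_counts(encounters)
print(dict(edges))
print(dict(weighted_degree(edges)))
```

Edge weights answer "who talks to whom", while the weighted degree gives a crude per-person measure of how much interaction someone has.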

  19. Case 1 • Communication in a call center • Wearable sensor devices (sociometric badge) • Completion time difference (productivity) • 2,200 hours of data (100 hours per employee) and 880 reciprocal e-mails • Findings • More interaction implies lower productivity • Higher variance in physical activity implies lower productivity

  20. Case 2 • Communication in a marketing division • Face-to-face vs. e-mails • Questionnaire (satisfaction) • Findings: • Total comm = e-mail + face-to-face • Total comm correlates negatively with satisfaction
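
The relationship on this slide can be made concrete with a Pearson correlation between total communication and satisfaction. The numbers below are invented purely to exercise the code and are not data from the study; the slide also does not state which correlation measure the authors used, so Pearson's r is an assumption.

```python
from math import sqrt


def total_comm(email, face_to_face):
    """Total communication per person: e-mail count plus face-to-face count."""
    return [e + f for e, f in zip(email, face_to_face)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented toy values for four people.
email = [1, 2, 3, 4]
face_to_face = [2, 2, 4, 4]
satisfaction = [8, 7, 4, 3]
print(pearson(total_comm(email, face_to_face), satisfaction))
```

A value close to -1 indicates that people with more total communication report lower satisfaction; it says nothing about causation.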

  21. Questions • Evaluation • Some domains do not have a clear definition of good/bad conversation • Interestingness • High proximity -> low e-mail usage • Implementation • Management practices for productivity improvements, customer satisfaction, and a better competitive position

  22. Overview of our project (3/5)

  23. Pattern discovery from dialogue • Goal: Finding recurring events or event sequences in human face-to-face dialogues. • Why?: Human communication skills are often experience- or assumption-based. • Enable smooth communication • Prevent problematic communication • Task: Have machines identify plausible hypotheses that humans cannot notice by observation

  24. Target dialogue • Psychotherapeutic Interview (Counseling) • Counseling at schools • Counseling at hospitals • Increasing demand for therapists • Shortage of qualified teachers • Lack of effective training methods • Therapist training setting (non-experimental)

  25. Our Corpus (Private) • Psychotherapeutic interview (counseling) • Training opportunity for students • 25 dialogues (approx. 2 hrs each, 21 hrs in total) • Adding more dialogues (3/year)

  26. Recording and data format • Priority: minimize disturbance for participants • Single camera • Two microphones • AVI -> MPEG • Video data • Transcript • Annotation

  27. Multimodality • Verbal cues are dominant in defining meanings (textual information) • What is the impact of non-verbal cues such as gestures, eye gaze, styles, timing, or context including social background?

  28. Can gestures indicate misunderstandings? • “Prediction of Misunderstanding from Gesture Patterns in Psychotherapy”, M. Inoue, R. Hanada, N. Furuyama, NII-2009-001E, Feb. 2009 • Negative result • We should rely on verbal content

  29. Gestural Feature for Th & Cl • Before/During/After the misunderstanding • 5/10/50 sec. windows • Frequency (x1; x2; x3) • Frequency Difference (x4; x5) • Duration (Mean & Max & Min) (x6; x7; x8) • Mean Interval (x9)
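
A sketch of the window-based gesture features follows. It assumes gestures are annotated as (start, end) pairs in seconds and that a gesture belongs to a window if it starts inside it. The mapping of these values to the slide's x1..x9 is not given, so the dictionary keys are descriptive names chosen here, and "interval" is taken to mean the gap between the end of one gesture and the start of the next.

```python
def window_features(gestures, start, end):
    """Frequency, duration statistics and mean interval of gestures in a window."""
    inside = sorted(g for g in gestures if start <= g[0] < end)
    durations = [g_end - g_start for g_start, g_end in inside]
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(inside, inside[1:])]
    return {
        "frequency": len(inside),
        "mean_duration": sum(durations) / len(durations) if durations else 0.0,
        "max_duration": max(durations) if durations else 0.0,
        "min_duration": min(durations) if durations else 0.0,
        "mean_interval": sum(gaps) / len(gaps) if gaps else 0.0,
    }


# Invented annotations: three gestures, 10-second window starting at 0.
gestures = [(1.0, 2.0), (4.0, 7.0), (12.0, 13.0)]
print(window_features(gestures, 0.0, 10.0))
```

Calling the function with windows placed before, during and after a misunderstanding, and with lengths of 5, 10 or 50 seconds, would give one feature set per condition; frequency differences can then be taken between two such windows.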

  30. Predictability by gestural cues • Classification by linear discriminant analysis • Is there any feature that has a similar precision/recall tendency over different dialogues? • [Figure: precision (P) vs. recall (R) plots for Dialogue 1 and Dialogue 2, with points labeled 1, 2, 3]
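
For reference, precision and recall for a binary "misunderstanding" prediction can be computed as in the sketch below. The label encoding (1 = misunderstanding) and the toy values are assumptions for illustration; the classifier itself is not reproduced.

```python
def precision_recall(predicted, actual):
    """Precision and recall for binary labels, where 1 marks the positive class."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented toy predictions for five segments of one dialogue.
predicted = [1, 1, 0, 1, 0]
actual = [1, 0, 0, 1, 1]
print(precision_recall(predicted, actual))
```

Computing this pair per feature and per dialogue gives the points that the slide compares across dialogues.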

  31. Speech-gesture interaction (4/5)

  32. Analysis of speech type patterns • Understand how therapists speak to their clients based on speech type transition patterns • A taxonomy used in the counseling domain: • Closed question, e.g., “Do you mean ~?” • Open question, e.g., “Can you elaborate that?” • Encouragement/Repeat, e.g., “Go on.” “I see.” • Rephrase, e.g., “So, you are thinking ~.” • Reflection of emotion • Reflection of meaning • Other

  33. Relationship between speech and gesture • Frequencies of speech types • At the beginning or the end of dialogues • How do speech patterns look different when gestures are taken into consideration? • Speeches that co-occur with gestures vs. speeches without gestures • Does the above division lead to any changes in the speech type transition patterns?
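
One way to picture the comparison is to count speech type transitions separately for utterances with and without accompanying gestures. In this sketch a transition is assigned to the co-gesture group when its first utterance co-occurs with a gesture; that rule, the input format and the function name are assumptions made here, not the procedure used in the study.

```python
from collections import Counter


def transition_counts(utterances):
    """Count (type, next_type) transitions, split by gesture co-occurrence.

    `utterances` is a list of (speech_type, has_gesture) tuples in dialogue order.
    Returns {True: Counter, False: Counter} keyed by the first utterance's flag.
    """
    counts = {True: Counter(), False: Counter()}
    for (cur_type, cur_gesture), (next_type, _) in zip(utterances, utterances[1:]):
        counts[cur_gesture][(cur_type, next_type)] += 1
    return counts


# Invented toy sequence of therapist utterances.
utterances = [
    ("open question", True),
    ("encouragement", False),
    ("rephrase", True),
    ("encouragement", False),
    ("rephrase", False),
]
result = transition_counts(utterances)
print(dict(result[True]))
print(dict(result[False]))
```

Restricting `utterances` to the beginning or ending part of a dialogue before counting would correspond to the two settings compared on the following slides.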

  34. Speech type transition in the beginning of the dialogue Sequences beginning from questioning Co-gesture Generic encouragement Non-Co-Gesture

  35. Speech type transition in the ending part of the dialogue Sequence beginning from question Question and rephrase Co-Gesture Sequence beginning from encouragement Non Co-Gesture

  36. Speech type transition in the ending part of the dialogue (Beginner therapist) Co-Gesture Reflection of therapists’ skill? Non Co-Gesture

  37. Summary • Various speech sequence patterns can be interpreted as techniques used in dialogues. • Patterns could be better understood when multimodality is taken into account. • Discovered patterns could be used to assess the proficiency of therapists.

  38. Verbal content mismatch (5/5)

  39. Mismatch between intention and perception over an utterance • Therapists (Th) want to empower clients (Cl) by compliments. • Clients want to be empowered by Th through their compliments. • They share the same goal, but this process does not go well in reality. • Th tried a compliment but Cl did not notice it • Some complimentary expressions are uncomfortable to Cls • Th cannot figure out how Cls feel praised

  40. Compliment as a counseling technique • Therapists learn the concept and necessity of compliments through lectures, but • There is not enough analysis of failures. • Concrete examples of expressions are scarce. • As a result, inexperienced Th often cannot succeed in using compliment techniques in actual interviews.

  41. Analysis approach • How mismatches happen in terms of vocabulary. • The focus is on what Ths say rather than how they say it. • How intention and perception differ over word usage • Timing of the utterances is ignored. • To understand the generic tendency, multiple dialogues are mixed together into a word pool.

  42. Data preparation • Transcripts based on the videos of psychotherapeutic interviews (13 pairs, 27 participants) • They are assigned to the participants. • Both Th and Cl highlight Th’s speech where Th used a compliment (Th) or Cl was empowered (Cl). • Highlighted speeches are extracted and put into the word pool.

  43. Degree of discrepancy • Number of speeches highlighted by therapists: • 114 (M = 8.1) • Number of speeches highlighted by clients: • 69 (M = 4.6) • Agreement: • 6% (11/183) • Both marked: 11
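
The agreement figure can be reproduced with simple arithmetic: 114 + 69 = 183 highlights in total, of which 11 were marked by both sides. The sketch below computes the same ratio from two sets of utterance ids; the set-based representation and the function name are illustrative assumptions.

```python
def agreement(th_ids, cl_ids):
    """Return (count marked by both, that count over all highlights by Th and Cl)."""
    both = th_ids & cl_ids
    return len(both), len(both) / (len(th_ids) + len(cl_ids))


# Invented toy ids of highlighted utterances.
therapist_marked = {1, 2, 3, 4}
client_marked = {3, 4, 5, 6}
print(agreement(therapist_marked, client_marked))

# The ratio reported on the slide, from its own counts.
reported = 11 / (114 + 69)
print(f"{reported:.1%}")
```

Note that the denominator follows the slide in summing both counts, so an utterance marked by both sides contributes to it twice.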

  44. Pre-processing • Morphological analysis • Replacement of words (spelling fluctuation, removal of proper nouns for anonymity) • Number of tokens: 4250 • Removal of low-frequency (tf < 2) or single-document (df < 2) words, to focus on generic (cross-dialogue) expressions • Vocabulary size: 476 -> 113
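
The frequency-based filtering step can be sketched as below, assuming each dialogue has already been tokenized into a list of words by the morphological analyzer. The function name and toy tokens are invented for illustration.

```python
from collections import Counter


def filter_vocabulary(docs, min_tf=2, min_df=2):
    """Keep words with term frequency >= min_tf and document frequency >= min_df."""
    tf = Counter(word for doc in docs for word in doc)
    df = Counter(word for doc in docs for word in set(doc))
    return {word for word in tf if tf[word] >= min_tf and df[word] >= min_df}


# Invented toy token lists, one per dialogue.
docs = [
    ["good", "job", "hard"],
    ["good", "hard", "hard"],
    ["job", "rare", "very", "very"],
]
print(sorted(filter_vocabulary(docs)))
```

With these thresholds, "rare" is dropped for low term frequency and "very" for appearing in a single dialogue. Any word found in two documents necessarily occurs at least twice, so with both thresholds at 2 the df condition alone already decides the result.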

  45. Frequent words

  46. Eliminate high frequency words • [Figure: word frequency plotted against word id, with a frequency threshold]

  47. Mid frequency words

  48. Summary • Problem: Compliments used by therapists (Th) during counseling are not well accepted by clients (Cl). • Data: 13 dialogue transcripts; utterances where Th intended the compliment technique and where Cl felt empowered by a compliment are marked. • Analysis: To understand the mismatch at the vocabulary level, differences in usage are explored in terms of frequency. • Th tend to use the compliment technique to focus on the difficulties of the problem. • Cl may be empowered by words referring to internal mental states. • Future direction: Understanding the process of resolving mismatches, taking differences in therapist proficiency and dialogue topics into account.
