Multimodal Dialogue Analysis INOUE, Masashi Yamagata University 29-Nov-09 @FIU Dr. Tao Li’s Group
Name of the discipline • Computational Social Linguistics • Society influences language use • Conversation Analysis (CA) • Discourse Analysis (DA)
Major Conferences and Journals • ICMI-MLMI • ICMI (User Interface) and MLMI (Dialogue Analysis) merged in 2009 • Some in multimedia or NLP conferences • ACM Multimedia • ACL • etc.
Research Initiatives In Europe • CHIL Corpus • AMI Corpus • Augmented multi-party interaction • http://corpus.amiproject.org/ • SSPNET • A European network of excellence in social signal processing • http://sspnet.eu/
Paper 1 (ICMI-MLMI 2009) • “Discovering group nonverbal conversational patterns with topics” by Dinesh Babu Jayagopi, Daniel Gatica-Perez (IDIAP) • Goal: Understand group dynamics (= leadership) from conversational video
Method • Feature descriptor • Time slices of conversation (documents) • different time scales show different patterns • 1 min scale: monologue vs. 5 min scale: a lot of interaction • Speaking energy/Speaking status • Bag of non-verbal patterns (NVP) • speech length, # of turns, successful interruptions • Method (what’s new) • Unsupervised • Topic model (LDA): shows which features are prominent
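The three NVP cues named above (speech length, number of turns, successful interruptions) can be sketched from binary speaking status. This is only an illustration under stated assumptions: the frame-based input format, the function name `nonverbal_features`, and the definition of a "successful interruption" (a speaker starts while another is talking, and the other falls silent first) are hypothetical and not taken from the paper.

```python
def nonverbal_features(status):
    """status maps each speaker to a list of 0/1 speaking flags, one per frame."""
    n_frames = len(next(iter(status.values())))
    features = {}
    for name, own in status.items():
        turns = 0
        interruptions = 0
        for i in range(n_frames):
            # A turn starts when the speaker goes from silent to speaking.
            starts_turn = own[i] == 1 and (i == 0 or own[i - 1] == 0)
            if not starts_turn:
                continue
            turns += 1
            for other, frames in status.items():
                # Only consider others who were already talking when this turn began.
                if other == name or i == 0 or frames[i - 1] == 0 or frames[i] == 0:
                    continue
                # Walk forward while both keep talking.
                j = i
                while j < n_frames and own[j] == 1 and frames[j] == 1:
                    j += 1
                # Assumed definition of success: the other speaker stops first.
                if j < n_frames and own[j] == 1 and frames[j] == 0:
                    interruptions += 1
        features[name] = {
            "speech_length": sum(own),
            "turns": turns,
            "successful_interruptions": interruptions,
        }
    return features


# Hypothetical 8-frame slice with three speakers.
example_status = {
    "A": [1, 1, 1, 1, 0, 0, 0, 0],
    "B": [0, 0, 1, 1, 1, 1, 0, 0],
    "C": [0, 0, 0, 0, 0, 0, 1, 1],
}
example_features = nonverbal_features(example_status)
```

In this toy slice B starts talking over A and A stops first, so B is credited with one successful interruption. Turning such counts into discrete tokens for a topic model would need a further quantization step that is not shown here.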
Feature categories • Generic group patterns: group as a whole • silence, one-speaker, two-speaker, other, evenly • Leadership patterns: • proposed in social psychology field • position of designated leader (‘L’) or someone else (‘NL’): taking maximum values • 21 dimensional feature vectors (vocabulary) • 6 tokens per slice (words)
Data • AMI Corpus • Meeting for product design • 17 meetings (17 hours) • 4 participants / group: • ‘Project Manager’, ‘User Interface specialist’, ‘Marketing Expert’, and ‘Industrial Designer’.
Result (visual) • 3 topics • Can be used to characterize groups
Validation • Comparison with ground truth (GT): • 5 min scale, 8 top docs per class • 3 annotators / meeting • GT is the label agreed on by the majority • Accuracy: 62%, 100%, 75% for each class • Autocratic, Participative, Free rein
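The validation steps above can be sketched as a majority vote over annotators followed by a per-class hit rate over the top documents. The function names and the example labels are hypothetical. The reading that accuracy is computed over the 8 top documents of each class is an assumption, though the reported 62%, 100%, and 75% are consistent with 5/8, 8/8, and 6/8.

```python
from collections import Counter


def majority_label(annotations):
    """Return the label chosen by more than half of the annotators, else None."""
    label, votes = Counter(annotations).most_common(1)[0]
    return label if votes > len(annotations) / 2 else None


def top_doc_accuracy(class_name, ground_truth_labels):
    """Fraction of a class's top documents whose ground truth matches the class."""
    hits = sum(1 for gt in ground_truth_labels if gt == class_name)
    return hits / len(ground_truth_labels)


# Hypothetical votes from 3 annotators for two time slices.
agreed = majority_label(["autocratic", "autocratic", "participative"])
no_majority = majority_label(["autocratic", "participative", "free rein"])

# Hypothetical ground truth for the 8 top documents of the "autocratic" class.
top_docs_gt = ["autocratic"] * 5 + ["participative"] * 2 + ["free rein"]
autocratic_accuracy = top_doc_accuracy("autocratic", top_docs_gt)
```

With 5 matching documents out of 8, the accuracy is 0.625, which corresponds to the first figure on the slide if percentages were truncated.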
Questions • Feature representation (Is it good?) • Some magic numbers (e.g., 6 words/slice) • Balancing vocabulary size and # of words • Modeling technique (Is LDA a valid one?) • Can we regard the NVC as words and Group Dynamics as topics? • Arbitrary number of topics, different interpretations
Paper 2 (MSSSC 2009) • “Sensor-Based Organizational Engineering” by Daniel Olguin-Olguin, Alex (Sandy) Pentland (MIT Media Lab) • [16] Olguin-Olguin, D., & Pentland, A. (2008). Social Sensors for Automatic Data Collection. 14th Americas Conference on Information • Social signals/Reality mining/Sensible organizations • Introduction to their research projects • Use of sensors to collect data in groups • Combination of textual and survey data • Business communication domain (organizational behavior)
Method • Sensor data • Face/body/vocal behavior/space and environment/affective behavior • cameras, infrared sensors, accelerometers, gyroscopes, inclinometers, pressure sensors, microphones, vibration sensors, ... • Pattern recognition • Social network analysis • Who talks to whom • How well they are communicating
Case 1 • Communication in a call center • wearable sensor devices (sociometric badges) • completion time difference (productivity) • 2,200 hours of data (100 hours per employee) and 880 reciprocal e-mails • Findings • more interaction implies lower productivity • higher variance in physical activity implies lower productivity
Case 2 • Communication in a marketing division • face-to-face vs. emails • questionnaire (satisfaction) • Findings: • Total comm = email + face-to-face • Total comm correlates negatively with satisfaction
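The finding above amounts to a negative correlation coefficient between total communication and satisfaction. The sketch below computes a Pearson correlation on made-up numbers. The data values are hypothetical and only illustrate the calculation, and the choice of Pearson's r is an assumption, since the slide does not say which statistic the authors used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical per-employee counts and questionnaire scores (not from the paper).
email = [10, 25, 40, 5, 30]
face_to_face = [4, 6, 9, 2, 8]
satisfaction = [4.5, 3.5, 2.0, 5.0, 3.0]

# Total comm = email + face-to-face, as defined on the slide.
total_comm = [e + f for e, f in zip(email, face_to_face)]
r = pearson(total_comm, satisfaction)
```

A value of `r` below zero means that employees with more total communication tend to report lower satisfaction in this toy data.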
Questions • Evaluation • Some domains do not have a clear definition of good/bad conversation • Interestingness • High proximity -> low email usage • Implementation • management practices for productivity improvements, customer satisfaction, and a better competitive position
Pattern discovery from dialogue • Goal: Finding recurring events or event sequences in human face-to-face dialogues. • Why?: Human communication skills are often experience- or assumption-based. • Enable smooth communication • Prevent problematic communication • Task: Use machines to identify plausible hypotheses that humans cannot notice by observation
Target dialogue • Psychotherapeutic Interview (Counseling) • Counseling at schools • Counseling at hospitals • Increasing demand for therapists • Shortage of qualified teachers • Lack of effective training methods • Therapist training setting (non-experimental)
Our Corpus (Private) • Psychotherapeutic interview (counseling) • Training opportunity for students • 25 dialogues (approx. 2 hrs each, 21 hrs in total) • Adding more dialogues (3/year)
Recording and data format • Priority: minimize disturbance for participants • Single camera • Two microphones • AVI -> MPEG • Video data • Transcript • Annotation
Multimodality • Verbal cues are dominant in defining meaning (textual information) • What is the impact of non-verbal cues such as gestures, eye gaze, style, timing, or context including social background?
Can gestures indicate misunderstandings? • “Prediction of Misunderstanding from Gesture Patterns in Psychotherapy”, M. Inoue, R. Hanada, N. Furuyama, NII-2009-001E, Feb. 2009 • Negative result • We should rely on verbal content
Gestural Feature for Th & Cl • Before/During/After the misunderstanding • 5/10/50 sec. windows • Frequency (x1; x2; x3) • Frequency Difference (x4; x5) • Duration (Mean & Max & Min) (x6; x7; x8) • Mean Interval (x9)
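The window-based features listed above can be sketched as follows. The input format (a list of gesture start/end times in seconds), the function name `gesture_features`, and the rule that a gesture must lie fully inside the window are assumptions for illustration. The mapping of these values onto x1 to x9 is not reproduced, because the slide does not spell it out.

```python
def gesture_features(gestures, window_start, window_end):
    """gestures: list of (start, end) times in seconds, sorted by start time."""
    # Assumption: only gestures lying fully inside the window are counted.
    inside = [(s, e) for s, e in gestures if s >= window_start and e <= window_end]
    durations = [e - s for s, e in inside]
    # Interval = gap between the end of one gesture and the start of the next.
    intervals = [inside[i + 1][0] - inside[i][1] for i in range(len(inside) - 1)]
    return {
        "frequency": len(inside),
        "mean_duration": sum(durations) / len(durations) if durations else 0.0,
        "max_duration": max(durations, default=0.0),
        "min_duration": min(durations, default=0.0),
        "mean_interval": sum(intervals) / len(intervals) if intervals else 0.0,
    }


# Hypothetical gesture annotations and a 10-second window.
example_gestures = [(1.0, 2.0), (4.0, 7.0), (9.0, 9.5), (20.0, 22.0)]
window_features = gesture_features(example_gestures, 0.0, 10.0)
```

Calling the function once per window (before, during, and after the misunderstanding) and once per speaker (Th and Cl) would give the raw numbers from which frequency differences could then be derived.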
Predictability by gestural cues • Classification by linear discriminant analysis • Is there any feature that has a similar precision/recall tendency across different dialogues? • [Figure: precision (P) vs. recall (R) plots for Dialogue 1 and Dialogue 2, with points labeled 1, 2, 3]
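Precision and recall, the two measures compared across dialogues above, can be computed as in this minimal sketch. The binary labels are hypothetical (1 = misunderstanding, 0 = no misunderstanding), and the classifier itself is not shown.

```python
def precision_recall(predicted, actual):
    """Precision and recall for binary labels (1 = positive class)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical predictions for six windows of one dialogue.
predicted_labels = [1, 1, 0, 1, 0, 0]
actual_labels = [1, 0, 0, 1, 1, 0]
precision, recall = precision_recall(predicted_labels, actual_labels)
```

Here two of the three predicted misunderstandings are real, and two of the three real ones are found, so both values are 2/3. Repeating this per feature and per dialogue gives the points that such a precision/recall plot would compare.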
Analysis of speech type patterns • Understand how therapists speak to their clients, based on speech type transition patterns • A taxonomy used in the counseling domain: • Closed question, e.g., “Do you mean ~?” • Open question, e.g., “Can you elaborate on that?” • Encouragement/Repeat, e.g., “Go on.” “I see.” • Rephrase, e.g., “So, you are thinking ~.” • Reflection of emotion • Reflection of meaning • Other
Relationship between speech and gesture • Frequencies of speech types • At the beginning or the end of dialogues • How do speech patterns look different when gestures are taken into consideration? • Speech that co-occurs with gestures vs. speech without gestures • Does the above division lead to any changes in the speech type transition patterns?
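The comparison described above can be sketched by counting speech type transitions separately for co-gesture and non-co-gesture speech. The input format and the function name `transition_counts` are hypothetical. Assigning each transition to a group by the gesture flag of its first utterance is an assumption, not necessarily how the original analysis split the data.

```python
from collections import Counter


def transition_counts(utterances):
    """utterances: ordered list of (speech_type, with_gesture) pairs."""
    counts = {True: Counter(), False: Counter()}
    for (prev_type, prev_gesture), (next_type, _) in zip(utterances, utterances[1:]):
        # Assumption: a transition belongs to the group of its first utterance.
        counts[prev_gesture][(prev_type, next_type)] += 1
    return counts


# Hypothetical annotated therapist utterances.
example_utterances = [
    ("open question", True),
    ("encouragement", False),
    ("rephrase", True),
    ("closed question", True),
    ("encouragement", False),
]
example_counts = transition_counts(example_utterances)
co_gesture_transitions = example_counts[True]
non_co_gesture_transitions = example_counts[False]
```

Normalizing each counter by its total would give transition frequencies that can be compared between the two groups, or between the beginning and the end of a dialogue.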
Speech type transition in the beginning of the dialogue • [Figure: transition diagrams for co-gesture and non-co-gesture speech] • Annotations: sequences beginning from questioning; generic encouragement
Speech type transition in the ending part of the dialogue • [Figure: transition diagrams for co-gesture and non-co-gesture speech] • Annotations: sequence beginning from question; question and rephrase; sequence beginning from encouragement
Speech type transition in the ending part of the dialogue (Beginner therapist) • [Figure: transition diagrams for co-gesture and non-co-gesture speech] • Reflection of therapists’ skill?
Summary • Various speech sequence patterns can be interpreted as the techniques in dialogues. • Patterns could be better understood when multimodality is taken into account. • Discovered patterns could be used to assess the proficiency of therapist.
Mismatch between intention and perception over an utterance • Therapists (Th) want to empower clients (Cl) through compliments. • Clients want to be empowered by Th through their compliments. • They share the same goal, but this process does not go well in reality. • Th tries a compliment but Cl does not notice it • Some complimentary expressions are uncomfortable for Cl • Th cannot figure out what makes Cl feel praised
Compliment as a counseling technique • Therapists learn the concept and necessity of compliments through lectures, but • There is not enough analysis of failures. • Concrete examples of expressions are scarce. • As a result, inexperienced Th often fail to use compliment techniques successfully in actual interviews.
Analysis approach • How mismatches arise in terms of vocabulary. • The focus is on what Th say rather than how they say it. • How intention and perception differ in word usage • Timing of the utterances is ignored. • To understand the generic tendency, multiple dialogues are mixed together into a word pool.
Data preparation • Transcripts based on the videos of psychotherapeutic interviews (13 pairs, 27 participants) • The transcripts are given to the participants. • Both Th and Cl highlight Th’s speech where Th gave a compliment (Th) or Cl felt empowered (Cl). • Highlighted speech is extracted and put into the word pool.
Degree of discrepancy • Number of speech segments highlighted by therapists: • 114 (M = 8.1) • Number of speech segments highlighted by clients: • 69 (M = 4.6) • Agreement: • 6% (11/183) • Both marked: 11
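The agreement figure above can be reproduced with simple arithmetic. The variable names are illustrative. The second ratio, which counts each doubly marked segment once, is an alternative shown only for comparison and assumes the 11 shared segments are included in both the 114 and the 69.

```python
th_marked = 114    # segments highlighted by therapists
cl_marked = 69     # segments highlighted by clients
both_marked = 11   # segments highlighted by both

# Denominator used on the slide: all marks from both sides.
total_marks = th_marked + cl_marked
agreement = both_marked / total_marks

# Alternative (assumption): count each doubly marked segment only once.
distinct_segments = th_marked + cl_marked - both_marked
overlap_ratio = both_marked / distinct_segments
```

Both ratios round to 6%, so the low agreement does not depend on which denominator is chosen.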
Pre-processing • Morphological analysis • Replacement of words (normalizing spelling variation, removing proper nouns for anonymity) • Number of tokens: 4250 • Removal of low-frequency (tf<2) or single-document (df<2) words, to focus on generic (cross-dialogue) expressions • Vocabulary size: 476 -> 113
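The frequency-based filtering step above can be sketched as follows. The function name `filter_vocabulary` and the toy documents are hypothetical, and the input is assumed to be already tokenized (the morphological analysis is not shown).

```python
from collections import Counter


def filter_vocabulary(documents, min_tf=2, min_df=2):
    """Keep words with term frequency >= min_tf and document frequency >= min_df."""
    # tf: total occurrences across all documents.
    tf = Counter(word for doc in documents for word in doc)
    # df: number of documents that contain the word at least once.
    df = Counter(word for doc in documents for word in set(doc))
    return {word for word in tf if tf[word] >= min_tf and df[word] >= min_df}


# Hypothetical tokenized highlights from three dialogues.
example_documents = [
    ["hard", "work", "hard"],
    ["work", "feel"],
    ["feel", "tired", "tired"],
]
kept_words = filter_vocabulary(example_documents)
```

In this toy input, "hard" and "tired" each occur twice but only within a single dialogue, so they are dropped, while "work" and "feel" appear in two dialogues and are kept.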
Eliminate high-frequency words • [Figure: word frequency plotted against word id, with a cutoff threshold]
Summary • Problem: Compliments used by therapists (Th) during counseling are not well accepted by clients (Cl). • Data: 13 dialogue transcripts; utterances where Th intended a compliment technique and where Cl felt empowered by a compliment are marked. • Analysis: To understand the mismatch at the vocabulary level, differences in usage are explored in terms of frequency. • Th tend to use the compliment technique to focus on the difficulty of the problem. • Cl may be empowered by words referring to internal mental states. • Future direction: Understanding how mismatches are resolved, taking into account differences in therapist proficiency and dialogue topics.