Towards an integrated scheme for semantic annotation of multimodal dialogue data
Volha Petukhova and Harry Bunt
Motivation Several corpora provide multimodal data transcriptions: AMI meeting corpus (http://www.amiproject.org); IFA Dialog Video corpus (http://www.fon.hum.uva.nl/IFA-SpokenLanguageCorpora/IFADVcorpus); ISL meeting corpus (Burger et al., 2002)
Motivation • coding schemes for the analysis of nonverbal actions in terms of low-level behavioural features: • Facial Action Coding System (FACS); • HamNoSys • coding schemes for semantic and pragmatic information in visual expressions: • SmartKom coding scheme (Steininger, 2001) • DIME-DAMSL (Pineda et al., 2005) • MUMIN annotation scheme (Allwood et al., 2004)
Motivation • the majority of these schemes are designed for a particular purpose and are used solely by their creators (Dybkjær and Bernsen, 2002) • the AAMAS workshop ‘Towards a Standard Markup Language for Embodied Dialogue Acts’ in 2008 and 2009 • ISO project 24617-2 “Semantic annotation framework, Part 2: Dialogue acts”
Exploratory annotation study • DIT++ dialogue act annotation scheme (http://dit.uvt.nl/) • incorporates theoretical and empirical findings from other approaches (Petukhova & Bunt, 2009c; Bunt & Schiffrin, 2007) • describes not only task-oriented communicative actions, but also actions related to other communicative dimensions: Task, Auto-Feedback, Allo-Feedback, Turn Management, Time Management, Contact Management, Discourse Structuring, Social Obligation Management, Own Communication Management, Partner Communication Management • contains open classes, allowing suitable additions of communicative functions specific to a certain modality • offers flexible segmentation strategies
Exploratory annotation study • Corpus material and annotations: two scenario-based dialogues with a total duration of 51 minutes from the AMI corpus • Tool: ANVIL (http://www.dfki.de/~kipp/anvil) • Two annotation studies: (1) using only the speech transcription and sound; (2) using the speech transcription, sound and video, provided with transcriptions of nonverbal signals (gaze, head, facial expression, posture orientation and hand movements).
Exploratory annotation study Transcriptions: • Verbal elements: manually produced orthographic transcriptions for each speaker, including word-level timings • Non-verbal elements: gaze direction; head movements; hand and arm gestures; eyebrow, eye and lip movements; posture shifts; annotated features: • form of movement (head: nod, shake, jerk; hands: pointing, shoulder shrug, etc.; eyes: narrow, widen; lips: pout, compress, purse, flatten, (half-)open, random moves); • direction (up, down, left, right, backward, forward); • trajectory (line, circle, arch); • size (small, medium, large, extra large); • speed (slow, medium, fast); • number of repetitions (up to 20); • FTO: the difference between the time a turn starts and the moment the previous turn ends; • duration
Overall inter-annotator agreement: kappa = .76
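The reported agreement figure can be computed with Cohen's kappa, which corrects raw agreement for chance. A minimal sketch over toy head-movement labels (invented for illustration, not the actual AMI annotations), assuming two annotators labelled the same segments:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # chance agreement from each annotator's label distribution
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# toy example: two annotators labelling six head movements
a = ["nod", "nod", "shake", "nod", "jerk", "nod"]
b = ["nod", "nod", "shake", "shake", "jerk", "nod"]
print(round(cohens_kappa(a, b), 2))  # 0.71
```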
Exploratory annotation study We compared the annotations with respect to the number and nature of: (1) functional segments identified; (2) communicative functions altered; (3) communicative functions specified; and (4) communicative functions assigned to single functional segments.
Results Nonverbal communicative behaviour may serve four purposes: • emphasizing or articulating the semantic content of dialogue acts; • emphasizing or supporting the communicative functions of synchronous verbal behaviour; • performing separate dialogue acts in parallel to what is contributed by the partner; • expressing a separate communicative function in parallel to what the same speaker is expressing verbally.
Results • Full-fledged dialogue acts (20% new segments): • Feedback acts (68.5%): positive (65.3%), negative (3.2%) • Time Management (24.8%) • Turn Management (4.7%) • Discourse Structuring (2%)
Results These functions are reflected in a significant majority of annotation schemes. We analyzed 18 well-known dialogue act annotation schemes: DAMSL, SWBD-DAMSL, LIRICS, DIT++, MRDA, Coconut, Verbmobil, HCRC MapTask, Linlin, TRAINS, AMI, SLSA, Alparon, C-Star, Primula, Maltus, Chiba and SPAAC. Feedback is undefined only in Linlin and Primula; turn management acts are not defined in HCRC MapTask, Verbmobil, Linlin, Alparon and C-Star; discourse structuring is not defined in TRAINS and Alparon; and time management is not defined in MRDA, HCRC MapTask, Linlin, Maltus, Primula and Chiba.
Results • Communicative function alteration and specification; nonverbal behaviour can: • adjust the level of feedback (understanding vs agreement); • express the degree of certainty about the validity of the proposition; • reveal the speaker’s attitude towards the addressee(s), towards the content of what he is saying, or towards the actions he is considering performing; • signal the speaker’s emotional or cognitive state (Pavelin (2002): ‘modalizers’)
Results • Communicative function alteration and specification: no existing dialogue act annotation scheme deals with this type of information. Proposal: a set of qualifiers that can be attached to communicative functions in order to describe the speaker’s behaviour more accurately.
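The qualifier proposal can be sketched as a data structure in which a communicative function carries optional qualifiers instead of multiplying out labels such as ‘Uncertain Answer’. All class and field names here are hypothetical illustrations, not part of any existing scheme:

```python
from dataclasses import dataclass, field

# sketch of the proposal: qualifiers attached to a communicative function
# rather than baked into the function label itself
@dataclass
class DialogueAct:
    dimension: str
    function: str
    qualifiers: dict = field(default_factory=dict)  # e.g. certainty, sentiment

# an 'uncertain answer' becomes Answer + a certainty qualifier,
# not a new communicative function
act = DialogueAct(dimension="Task", function="Answer",
                  qualifiers={"certainty": "uncertain", "sentiment": "amused"})
print(act.function, act.qualifiers["certainty"])  # Answer uncertain
```

The advantage of this encoding is that the inventory of functions stays small while qualifier values can combine freely.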
Results • Multifunctionality in multimodal utterances: • a verbal functional segment has on average 1.3 communicative functions (also confirmed in Bunt, 2009) • a multimodal segment has 1.4 functions on average • multifunctionality very often involves feedback and other interaction management dimensions, such as Own Communication Management and Time Management; Task and Turn Management; Task and Discourse Structuring; Task and Allo-Feedback • dialogue act taxonomies that take the multifunctionality of utterances into account, such as DIT++, LIRICS, DAMSL, MRDA and Coconut, are known as multidimensional dialogue act annotation schemes
Results Articulating semantic content (about 39%): these signals relate to the propositional or referential meaning of an utterance. For example, a deictic gesture aligned with speech:
wording: Press this little presentation
hand: ........point.................
Pure semantic acts, as a rule, do not have a communicative function of their own.
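Aligning such a gesture with the speech it articulates amounts to intersecting time intervals, using the word-level timings from the transcription. The timestamps below are invented for illustration:

```python
# hypothetical word/gesture alignment: find which words a pointing
# gesture overlaps in time; (word, start, end) in seconds
words = [("Press", 0.00, 0.30), ("this", 0.30, 0.50),
         ("little", 0.50, 0.80), ("presentation", 0.80, 1.40)]
gesture = ("point", 0.35, 0.95)  # (form, start, end)

def overlapping_words(words, gesture):
    """Return the words whose time span intersects the gesture's span."""
    _, g_start, g_end = gesture
    return [w for w, s, e in words if s < g_end and e > g_start]

print(overlapping_words(words, gesture))  # ['this', 'little', 'presentation']
```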
Conclusions • Multidimensional schemes (such as DIT++, LIRICS, DAMSL, MRDA and Coconut) could be used for the annotation of multimodal data • An extension is needed with respect to uncertainty, the speaker’s attitude and the speaker’s emotions • Proposal: • combined functions such as ‘Amused Suggestion’ and ‘Uncertain Answer’ are undesirable; • it is better to have qualifiers that can be attached to communicative functions