To Beat or Not To Beat? Beat Gestures in Direction Giving. Chris Brandhorst & Mariët Theune, University of Twente
Overview • Beats and other gesture types • Research context: The Virtual Guide • A small direction giving corpus • How to recognize beats? The Beat Filter • When are beats used? Concept categories • A (very) simple beat usage model • Conclusions and future work
Beats and other gestures Gesture types • Deictic: pointing at an object’s location • Iconic: representing the shape of a concrete object • Metaphoric: depicting an abstract object using metaphor • Beat: marking discourse structure or emphasis McNeill (1992), “Hand and Mind”, p. 93: beats made up 44.7% of the gestures in a cartoon narration corpus.
Context: The Virtual Guide • An embodied direction giving agent in a 3D environment
Video (link to video clip here) Try out the Virtual Guide “live” here: http://wwwhome.ewi.utwente.nl/~hofs/dialogue/
Gesture generation • Keyword-based • Turns (“left”, “right”, etc.): fixed pointing gesture in turn direction • Objects (“the coffee counter”, etc.): • pointing gesture to absolute 3D object location (gesture is computed dynamically) • pointing gesture to relative object location, from viewpoint along the route (fixed gesture; like Turns) • iconic gesture reflecting object shape (fixed gesture from “gestionary”)
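The keyword-based scheme above can be sketched as a lookup from keywords to gestures. This is a minimal illustration, not the Virtual Guide's implementation: the keyword tables, the object coordinates, gesture names such as `point_left`, and the `object_mode` parameter are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical lexicon tables; the actual Virtual Guide data are not given on the slide.
TURN_KEYWORDS = {"left", "right"}
OBJECT_LOCATIONS = {"the coffee counter": (4.0, 0.0, 12.5)}  # absolute 3D positions
RELATIVE_DIRECTION = {"the coffee counter": "left"}           # as seen from the route
GESTIONARY = {"the coffee counter": "iconic_counter"}         # fixed iconic gestures


@dataclass
class Gesture:
    kind: str                                            # "point" or "iconic"
    name: str                                            # fixed gesture id or "point_absolute"
    target: Optional[Tuple[float, float, float]] = None  # only for computed pointing


def select_gesture(keyword: str, object_mode: str = "absolute") -> Optional[Gesture]:
    """Return a gesture for one keyword, or None if the keyword licenses no gesture.

    object_mode (hypothetical) picks one of the three object strategies:
    "absolute", "relative" or "iconic". Unavailable options fall back to
    absolute pointing when the object's 3D location is known.
    """
    if keyword in TURN_KEYWORDS:
        # Turns: fixed pointing gesture in the turn direction.
        return Gesture("point", f"point_{keyword}")
    if object_mode == "iconic" and keyword in GESTIONARY:
        # Object shape: fixed gesture looked up in the gestionary.
        return Gesture("iconic", GESTIONARY[keyword])
    if object_mode == "relative" and keyword in RELATIVE_DIRECTION:
        # Relative location: reuse the fixed, turn-style pointing gestures.
        return Gesture("point", f"point_{RELATIVE_DIRECTION[keyword]}")
    if keyword in OBJECT_LOCATIONS:
        # Absolute location: pointing target computed from the 3D position.
        return Gesture("point", "point_absolute", OBJECT_LOCATIONS[keyword])
    return None
```

For example, `select_gesture("the coffee counter", "iconic")` yields the fixed iconic gesture, while the default mode yields a pointing gesture aimed at the stored coordinates.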
When to use beat gestures? • In the BEAT system (Cassell et al., 2001), gestures are used to mark new information and to contrast items • Beats are given low priority: they are only used when no other gesture type is available • In the Virtual Guide, a pointing or iconic gesture is available in almost all cases, so is there no “need” for beats? • No: human direction givers often do use beat gestures, as our small video corpus shows.
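The low-priority treatment of beats amounts to a fallback rule. The sketch below is a simplified reading of the policy as summarized on the slide, not code from BEAT; the `is_new_info` flag and the preference order among the non-beat gesture types are assumptions.

```python
from typing import Iterable, Optional

# Assumed preference order among non-beat gesture types.
NON_BEAT_TYPES = ("deictic", "iconic", "metaphoric")


def choose_gesture(is_new_info: bool, available: Iterable[str]) -> Optional[str]:
    """Pick a gesture for a word, using a beat only when nothing else is available."""
    if not is_new_info:
        return None  # only new (or contrasted) information is marked with a gesture
    options = set(available)
    for gesture_type in NON_BEAT_TYPES:
        if gesture_type in options:
            return gesture_type
    return "beat"  # fallback: no other gesture type is available
```

Since the Virtual Guide almost always has a pointing (deictic) or iconic gesture available, a rule like this would rarely produce a beat at all.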
Direction giving video corpus Used for this study: • 15 short video clips (± 45 sec. each) • 4 different Dutch speakers, 3-4 clips each • 2 different destinations in our building • 2 versions of each clip (except one): with a listener present or addressed to the camera • 133 gestures; 124 annotated (the other 9 were not clearly visible)
How to recognize beats? Beat characteristics: • “A simple flick of the hand” • Short and quick • Only 2 gesture phases: preparation and retraction (no stroke) • No “tensed stasis” • Formless hand shape Formal coding, based on shape only: the Beat Filter (McNeill, 1992) “filters out” beats from other gestures.
The Beat Filter • Q1: Does the gesture have other than 2 movement phases? • Q2: How often does tensed stasis or finger movement appear? • Q3: If the first movement is in non-center space, is any other movement in center space? • Q4: If there are exactly 2 movement phases, are they in different spaces? • Add 1 point for each “yes” answer to Q1, Q3 and Q4 to the number given in answer to Q2. • The lower the score, the more likely the gesture is a beat.
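The scoring rule can be written out directly. In this sketch the field names are hypothetical labels for the four questions; following the question wording, Q3 is answered "no" when the first movement is in center space, and Q4 is answered "no" unless there are exactly 2 movement phases.

```python
from dataclasses import dataclass


@dataclass
class BeatFilterAnswers:
    q1_not_two_phases: bool          # Q1: other than 2 movement phases?
    q2_stasis_or_finger_count: int   # Q2: occurrences of tensed stasis or finger movement
    q3_center_after_noncenter: bool  # Q3: first movement non-center, another in center?
    q4_two_phases_diff_spaces: bool  # Q4: exactly 2 phases, in different spaces?


def beat_filter_score(a: BeatFilterAnswers) -> int:
    """Q2 count plus 1 per 'yes' to Q1, Q3 and Q4; lower scores suggest a beat."""
    if a.q2_stasis_or_finger_count < 0:
        raise ValueError("Q2 must be a non-negative count")
    if a.q1_not_two_phases and a.q4_two_phases_diff_spaces:
        # Q4 presupposes exactly 2 phases, so it cannot be 'yes' when Q1 is 'yes'.
        raise ValueError("inconsistent answers to Q1 and Q4")
    yes_answers = sum([a.q1_not_two_phases, a.q3_center_after_noncenter,
                       a.q4_two_phases_diff_spaces])
    return a.q2_stasis_or_finger_count + yes_answers
```

A prototypical beat (2 phases in one space, no stasis, no finger movement) scores 0.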
Using the Beat Filter (1) • 109 gestures were scored with the Beat Filter* • 95 of those were annotated by two annotators (the other 14 were used as test items) • Annotator agreement on the Beat Filter questions was very low: • Question 1: K = 0.43 • Question 2: K = 0.31 • Question 3: K = 0.18 • Question 4: answer is dependent on Q1, so computing reliability makes no sense *15 gestures were considered “obvious” (clearly gesture types other than beats) and were not “filtered” by annotator A …!
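The slide reports K values without naming the statistic. Assuming it is Cohen's kappa for two annotators, it corrects the observed agreement $p_o$ for the agreement $p_e$ expected by chance: $\kappa = (p_o - p_e) / (1 - p_e)$. A minimal sketch:

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: products of the two annotators' marginal label proportions.
    p_e = sum(count_a[k] * count_b[k] for k in count_a.keys() | count_b.keys()) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both annotators use one identical label")
    return (p_o - p_e) / (1 - p_e)
```

For Q2, whose answers are counts, this sketch treats each count as a separate nominal category.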
Using the Beat Filter (2) Agreement on total Beat Filter scores: • 44.2% same score (but possibly on different grounds!) • 36.8% difference of 1 • 16.8% difference of 2 • 2.1% difference of 3 In the end, only the scores of annotator A were used.
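The distribution of score differences is a simple tally of absolute differences per gesture. With 95 double-annotated gestures, the reported percentages correspond to 42, 35, 16 and 2 gestures (for example, 100 × 42 / 95 ≈ 44.2%). The function name below is hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def score_difference_distribution(scores_a: Sequence[int],
                                  scores_b: Sequence[int]) -> Dict[int, float]:
    """Percentage of items per absolute difference between two annotators' scores."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = Counter(abs(x - y) for x, y in zip(scores_a, scores_b))
    n = len(scores_a)
    return {d: 100 * count / n for d, count in sorted(diffs.items())}
```

Note that identical totals (difference 0) can still hide different answers to the individual questions.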
Annotating gesture types • Gesture types were annotated based on global shape information: resemblance to the mentioned object, finger pointing, directional component, etc. (in combination with speech) Annotator agreement: • Agreed on 82.3% of gesture types (102 of 124), K=0.73 • Of these, 33.3% are beats (34 of 102) • Disagreed on 17.7% of gesture types (22 of 124) • Most confused were point and iconic (45.5% of disagreements) • Next most confused were beat and point (13.6%)
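These agreement figures follow from simple counts over the two annotators' labels; for example, 102 agreed items out of 124 give 100 × 102 / 124 ≈ 82.3%. The sketch below, with a hypothetical function name, computes overall agreement, the share of beats among agreed items, and the share of each unordered label pair among the disagreements.

```python
from collections import Counter
from typing import Dict, Sequence


def agreement_report(labels_a: Sequence[str], labels_b: Sequence[str]) -> Dict[str, object]:
    """Summarize agreement between two annotators' gesture-type labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    pairs = list(zip(labels_a, labels_b))
    agreed = [x for x, y in pairs if x == y]
    # Unordered pairs, so (point, iconic) and (iconic, point) count as one confusion.
    confusions = Counter(tuple(sorted((x, y))) for x, y in pairs if x != y)
    n_disagreed = len(pairs) - len(agreed)
    return {
        "agreement_pct": 100 * len(agreed) / len(pairs),
        "beat_pct_of_agreed": 100 * agreed.count("beat") / len(agreed) if agreed else 0.0,
        "confusion_pct": {pair: 100 * c / n_disagreed for pair, c in confusions.items()},
    }
```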
Gesture types and beat score • Beats do have lower Beat Filter scores • Many pointing gestures have low scores too *NF (Not Filtered) gesture types were not entirely obvious after all…!
When are beat gestures used? Some direction giving concept categories were defined: • Directions (up, down, left, right, …) • (Other) Spatial Information (through, in, on, at, across, …) • Duration & Timing (all the way, continue, immediately, …) • Landmarks: Nouns (windows, a square, the hallway, …) and Pronouns (that, the same, this, they, it, …) • Points in Time or Space (now, then, here, there, …) • Hesitations (uh, uhm, I would say, something like that, maybe, …)
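A rough first approximation of assigning these categories is keyword matching. The sketch below uses only the example words listed on the slide, so its lexicon is far from complete (landmark nouns in particular form an open class), and the shortened category labels are for illustration only.

```python
import re
from typing import Dict, List, Set

# Example words from the slide; a usable lexicon would need to be much larger.
CATEGORY_KEYWORDS: Dict[str, List[str]] = {
    "Directions": ["up", "down", "left", "right"],
    "Spatial Information": ["through", "in", "on", "at", "across"],
    "Duration & Timing": ["all the way", "continue", "immediately"],
    "Landmark noun": ["windows", "a square", "the hallway"],
    "Landmark pronoun": ["that", "the same", "this", "they", "it"],
    "Points in Time or Space": ["now", "then", "here", "there"],
    "Hesitations": ["uh", "uhm", "i would say", "something like that", "maybe"],
}


def tag_concepts(utterance: str) -> Set[str]:
    """Return every concept category with at least one whole-word keyword match."""
    text = utterance.lower()
    found = set()
    for category, phrases in CATEGORY_KEYWORDS.items():
        # \b anchors prevent matches inside longer words ("in" inside "continue").
        if any(re.search(r"\b" + re.escape(p) + r"\b", text) for p in phrases):
            found.add(category)
    return found
```

Plain keyword matching cannot tell, for instance, a landmark pronoun "that" from other uses of the word, so its output should be treated as a rough first pass.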
Concept Categories Relative frequency of concept categories: • Landmarks are most frequently mentioned • Directions are only in fourth place
Concepts and gestures • Not fitting into these categories: 4 beats, 10 “other gestures” • Chart legend: SI = Spatial Information; H = Hesitations; DT = Duration & Timing; L = Landmarks; PTS = Points in Time or Space; D = Directions
Landmarks: pronoun or noun • Landmarks as pronouns: fewer gestures, relatively more beats
A simple beat usage model The probability that a beat gesture B is generated to accompany an utterance u (when modelling speaker s): $P(B \mid u) = P(B \mid C_u) \times m_s$ where • $C_u$ is the concept category of u • $P(B \mid C_u)$ is the probability of B accompanying $C_u$, based on corpus data • $m_s$ is an optional multiplier (weight factor) for speaker s
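A minimal sketch of this model: $P(B \mid C)$ is estimated as the fraction of corpus mentions of category $C$ that were accompanied by a beat, and the speaker multiplier $m_s$ scales it. The input format, the function names, and the clipping of the product to 1 (a multiplier above 1 could otherwise yield a "probability" above 1) are assumptions not stated on the slide.

```python
import random
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def estimate_beat_probs(observations: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, float]:
    """Estimate P(B|C) from (category, gesture_type) pairs.

    gesture_type is None for a mention without a gesture, so the denominator
    counts all mentions of the category, not only gestured ones.
    """
    totals: Counter = Counter()
    beats: Counter = Counter()
    for category, gesture_type in observations:
        totals[category] += 1
        if gesture_type == "beat":
            beats[category] += 1
    return {c: beats[c] / totals[c] for c in totals}


def p_beat(category: str, probs: Dict[str, float], m_s: float = 1.0) -> float:
    """P(B|u) = P(B|C_u) * m_s, clipped to 1; unseen categories get 0."""
    return min(probs.get(category, 0.0) * m_s, 1.0)


def generate_beat(category: str, probs: Dict[str, float], m_s: float = 1.0,
                  rng: Optional[random.Random] = None) -> bool:
    """Sample whether to generate a beat for an utterance of this category."""
    if rng is None:
        rng = random.Random()
    return rng.random() < p_beat(category, probs, m_s)
```

Passing a seeded `random.Random` as `rng` makes the generated beats reproducible across runs.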
Toward a better model Other factors than just corpus frequency should be taken into account. For example: • First or second time the same directions are given? • Listener present or not? • Context: influence of preceding and following concepts / gestures • Etc. And of course, more (and more reliable!) corpus data are needed.
Conclusions How can we recognize beats? • Applying the Beat Filter to recognize beat gestures may not give reliable results • “Impressionistic” gesture type annotation was more reliable • Add “directionality” and “hand shape” to the Beat Filter? When are beats used? • “Other” gestures don’t always take precedence over beats • Beats mark spatial information, hesitations, duration and timing more often than other gestures do
Future work • More data • More reliable annotation • Investigate when beats or other gestures are used given a concept category • Better / more general concept categories? → Implement in the Virtual Guide
The End Questions?