Attention and Perception of Attentive Behaviours for Emotive Interaction • Christopher Peters • LINC, University of Paris 8
Virtual Humans • Computer models of people • Can be used as… • substitutes for the real thing in ergonomic evaluations • Conversational agents • Display and Animation: • Two layers: skeletal layer and skin layer • Skeleton is hierarchy of positions and orientations • Skin layer provides the visual appearance
Animating Characters • Animation Methods • Low level: rotate leye01 by 0.3 degrees around axis (0.707,0.707,0.0) at 0.056 seconds into the animation • High level: ‘walk to the shops’ • Character must know where the shop is, avoid obstacles on the way there, etc. Must also be able to walk … • Autonomy • Direct animation vs. automatic generation • Autonomy requires the character to animate itself based on simulation models • Models should result in plausible behaviour
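To make the low-level/high-level contrast concrete, here is a minimal sketch in Python with NumPy of what the quoted low-level command amounts to; the keyframe tuple is a hypothetical representation, not the system's actual format:

```python
# Minimal sketch of the quoted low-level command: rotate joint 'leye01'
# by 0.3 degrees around (0.707, 0.707, 0.0) at t = 0.056 s.
import math
import numpy as np

def axis_angle_to_quaternion(axis, degrees):
    """Convert an axis-angle rotation to a unit quaternion (w, x, y, z)."""
    axis = np.asarray(axis, dtype=float)
    axis /= np.linalg.norm(axis)
    half = math.radians(degrees) / 2.0
    return np.concatenate(([math.cos(half)], math.sin(half) * axis))

# One keyframe: (time in seconds, joint name, rotation quaternion).
keyframe = (0.056, "leye01", axis_angle_to_quaternion((0.707, 0.707, 0.0), 0.3))
print(keyframe)
```

A high-level command like ‘walk to the shops’ would ultimately be compiled down to long sequences of keyframes of exactly this kind, which is why autonomy needs simulation models sitting above this layer.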
Our Focus • Attention and related behaviours • (a) Where to look • (b) How to generate gaze behaviour • Perception of attentive behaviours and emotional significance • How to interpret attention behaviours of others • Conversation initialisation in Virtual Environments…
Why VE? • Cheap! • No need for expensive equipment (facilities, robots, etc) • Duplication at the click of a mouse • Quick • Changes to environment can be made quickly and easily, at no extra cost • But… • Things we take for granted in RL need to be programmed into the virtual environment • Physics • And will only ever be approximations of reality
1. Attention and Gaze • Our character is walking down a virtual street • Where should the character look and how should the looking behaviours be generated?
Foundation • Humans need to look around • An ecological approach to visual perception, J.J. Gibson, 1979. • Eyes in the front of our head • Poor acuity over most of visual field • Even for places where we have been before, memory is far from perfect • Virtual humans should look around too! (Image: iLab, University of Southern California)
Significance to Virtual Humans • Viewer perception • The impact of eye gaze on communication using humanoid avatars, Garau et al., 2001. • Plausibility • “If they don’t look around, then how do they know where things are?”
Significance to Virtual Humans • Functional purposes • Navigation for digital actors based on synthetic vision, memory and learning, Noser et al., 1995. • Autonomy • If they don’t look around, then they won’t know where things are
Our Focus • Gaze shifts versus saccadic eye movements • General looking behaviours • Where to Look? Automating Certain Visual Attending Behaviors of Human Characters, Chopra-Khullar, 1999. • Practical Behavioural Animation Based On Vision and Attention, Gillies, 2001. • Two problems: • Where to look • How to look
Approach • Use appropriately simplified models from areas such as psychology, neuroscience, artificial intelligence … • Appropriate = fast, allowing real-time operation • Capture the high-level salient aspects of such models without the intricate detail • Components • Sensing, Attention and Memory (where to look) • Gaze Generator (how to look)
System Overview • Input environment through synthetic vision component • Process visual field using spatial attention model • Modulate attended object details using memory component • Generate gaze behaviours towards target locations
Visual Sensing • Three renderings taken per visual update • One full-scene rendering (to attention module) • Two false-colour renderings (to memory module)
False-colour Renderings • Approximate acuity of the eye with two renderings • Fovea • Periphery
Renderings • Renderings allow both spatial/image and object based operations to take place
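As a minimal sketch of the object-based side, assuming the false-colour renderings arrive as small uint8 RGB arrays, one can collect the set of object colours visible in each region:

```python
# Collect the object colours visible in the fovea and periphery renderings;
# image-space pixels map straight to object identities via their colours.
import numpy as np

def visible_colours(false_colour_image):
    """Return the unique (r, g, b) triples present in a false-colour render."""
    pixels = false_colour_image.reshape(-1, 3)
    return {tuple(p) for p in np.unique(pixels, axis=0)}

# Toy 2x2 renderings standing in for the real fovea/periphery buffers.
fovea = np.array([[[1, 1, 0], [1, 1, 0]],
                  [[2, 1, 0], [0, 0, 0]]], dtype=np.uint8)
periphery = np.array([[[3, 1, 0], [0, 0, 0]],
                      [[1, 1, 0], [2, 2, 0]]], dtype=np.uint8)

foveated = visible_colours(fovea)                   # seen at high acuity
peripheral = visible_colours(periphery) - foveated  # seen only coarsely
print(foveated, peripheral)
```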
1(a) Where to look • Model of Visual Attention • Two-component theory of attention • “Bottom-up” • Exogenous • Environment appears to ‘grab’ our attention • Colour, intensity, orientation, motion, texture, sudden onset, etc • “Top-down” • Endogenous • Voluntary, task driven • ‘Look for the coke can’
Bottom-up Attention Orientation, intensity and colour contrast
Bottom-up Attention • Model • Cognitive engineering • Itti et al. 2000 • http://ilab.usc.edu/bu/ • Biologically inspired • Inputs an image, outputs encoding of attention allocation • Peters and O’Sullivan 2003
(Figure: input image decomposed into its intensity, red-green (RG) and blue-yellow (BY) colour channels)
Gaussian Pyramid • Each channel acts as the first level in a Gaussian or Gabor pyramid • Each subsequent level is a blurred and decimated version of the previous level • Image processing techniques simulate early visual processing
Center-Surround Processing • Early visual processing • Ganglion cells • Respond to light in a center-surround pattern • Contrast a central area with its neighbours • Simulated by comparing different levels in image pyramids • Contrast is important, not amplitude: the response depends on context
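A minimal sketch of these two steps in Python with SciPy, in the spirit of Itti et al.'s model; the sigma, level count and level choices are illustrative assumptions:

```python
# Build a Gaussian pyramid by blur-and-decimate, then take an across-scale
# difference so that contrast, not raw amplitude, drives the feature map.
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def gaussian_pyramid(channel, levels=5):
    """Each level is a blurred, half-resolution copy of the previous one."""
    pyramid = [channel]
    for _ in range(levels - 1):
        blurred = gaussian_filter(pyramid[-1], sigma=1.0)
        pyramid.append(blurred[::2, ::2])   # decimate by 2 in each axis
    return pyramid

def center_surround(pyramid, center=1, surround=3):
    """Contrast a fine 'center' level against a coarse 'surround' level."""
    c = pyramid[center]
    # Upsample the coarse level back to the center level's resolution.
    factor = 2 ** (surround - center)
    s = zoom(pyramid[surround], factor, order=1)[:c.shape[0], :c.shape[1]]
    return np.abs(c - s)                     # responds to contrast only

intensity = np.random.rand(128, 128)         # stand-in intensity channel
feature_map = center_surround(gaussian_pyramid(intensity))
```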
Saliency Map • Conspicuity Maps • Result of center-surround calculations for each feature type • Define the ‘pop-out’ for each feature type • Integrated into saliency map • Attention directed preferentially to lighter areas (Figure: input image, intensity, colour and orientation conspicuity maps, and the combined saliency map)
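A minimal sketch of the integration step; Itti et al. use a more elaborate normalisation operator, so simple peak normalisation stands in for it here:

```python
# Normalise each conspicuity map to a common range, then average them
# into a single saliency map so no feature dominates by raw amplitude.
import numpy as np

def normalise(m):
    """Rescale a map to [0, 1]."""
    rng = m.max() - m.min()
    return (m - m.min()) / rng if rng > 0 else np.zeros_like(m)

def saliency_map(conspicuity_maps):
    """Combine the intensity, colour and orientation conspicuity maps."""
    return np.mean([normalise(m) for m in conspicuity_maps], axis=0)
```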
Memory • Differentiate between what an agent has and hasn’t observed • Agents should only know about objects that they have witnessed • Agents won’t have exact knowledge about the world • Used to modulate output of attention module (saliency map) • Object-based, taking input from synthetic vision module
Stage Theory • The further information goes, the longer it is retained • Attention acts as a filter
Stimulus Representations • Two levels of detail representation for objects • Proximal stimuli • Early representation of the stimulus • Data discernible only from the retinal image • Observations • Later representation of stimuli after resolution with the world database
Stage Theory • Short-term Sensory Storage (STSS) • From distal to proximal stimuli • Objects have not yet been resolved with world database
Stage Theory • Short-term memory (STM) and Long-Term Memory (LTM) • Object-based • Contains resolved object information • From proximal stimuli to observations • Observations store information for attended objects • Object pointer • World-space transform • Timestamp • Virtual humans are not completely autonomous from the world database
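A minimal sketch of these representations as plain data structures; the observation fields follow the slide (object pointer, world-space transform, timestamp), but the class layout itself is an assumption:

```python
# Proximal stimuli hold only retina-derived data; observations hold object
# information resolved against the world database, as in the stage theory.
import time
from dataclasses import dataclass, field

@dataclass
class ProximalStimulus:
    """Early representation: data discernible from the retinal image alone."""
    false_colour: tuple     # (r, g, b) identifier seen on the retina
    image_position: tuple   # (x, y) pixel location in the rendering

@dataclass
class Observation:
    """Later representation, stored for attended objects in STM/LTM."""
    object_ref: object      # pointer into the scene database
    world_transform: list   # world-space position/orientation when attended
    timestamp: float = field(default_factory=time.time)

short_term_memory: list[Observation] = []   # STM; promoted to LTM over time
```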
Memory Uncertainty Map • Can now create a memory uncertainty map for any part of the scene the agent is looking at • The agent is uncertain of parts of the scene it has not looked at before • Depends on scene object ‘granularity’
Attention Map • Determines where attention will be allocated • Bottom-up components • Top-down (see 2) • Memory • Modulating the saliency map by the uncertainty map • Here, sky and road have low uncertainty levels
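A minimal sketch of the modulation step, assuming the saliency and uncertainty maps are same-sized arrays:

```python
# The attention map is the saliency map weighted by per-pixel memory
# uncertainty, so well-known regions (e.g. sky and road) attract little
# attention even if they are visually salient.
import numpy as np

def attention_map(saliency, uncertainty):
    """Element-wise product: salient AND uncertain regions win attention."""
    assert saliency.shape == uncertainty.shape
    return saliency * uncertainty
```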
Human Scanpaths Eye movements and fixations
Inhibition of Return • Focus of attention must change • Inhibit attended parts of the scene from being revisited soon • Image-based IOR • Problem: Moving viewer or dynamic scene • Solution: Object based memory • Object-based IOR • Store uncertainty level with each object • Modulate saliency map by uncertainty levels
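A minimal sketch of object-based IOR; the recovery rate is an illustrative assumption:

```python
# Each object carries an uncertainty level that drops to zero when attended
# and creeps back over time, so recently attended objects are suppressed
# when the saliency map is modulated by these levels.
class ObjectIOR:
    def __init__(self):
        self.uncertainty = {}            # object id -> level in [0, 1]

    def attend(self, obj_id):
        """Attending an object makes it certain, inhibiting a quick return."""
        self.uncertainty[obj_id] = 0.0

    def tick(self, dt, recovery_rate=0.1):
        """Uncertainty (and thus eligibility for attention) recovers."""
        for obj_id, u in self.uncertainty.items():
            self.uncertainty[obj_id] = min(1.0, u + recovery_rate * dt)
```

Because the levels are stored per object rather than per pixel, the inhibition survives viewer motion and dynamic scenes, which is exactly where image-based IOR breaks down.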
Artificial Regions of Interest • Attention map at lower resolution than visual field • Generate AROIs from highest values of current attention map to create scanpath • Assume simple one-to-one mapping from attention map to overt attention
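A minimal sketch of AROI selection by repeated maximum-picking with local suppression; the ROI count and suppression radius are illustrative assumptions:

```python
# Repeatedly take the attention map's global maximum as the next region
# of interest, zeroing a small neighbourhood around each pick so the
# resulting scanpath moves on to new locations.
import numpy as np

def artificial_rois(attention, count=5, suppress_radius=4):
    rois, a = [], attention.copy()
    for _ in range(count):
        y, x = np.unravel_index(np.argmax(a), a.shape)
        rois.append((y, x))                        # next fixation target
        y0, y1 = max(0, y - suppress_radius), y + suppress_radius + 1
        x0, x1 = max(0, x - suppress_radius), x + suppress_radius + 1
        a[y0:y1, x0:x1] = 0.0                      # local suppression
    return rois
```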
1(b) How to look • Generate gaze animation given a target location • Gaze shifts • Combined eye-head gaze shifts to visual and auditory targets in humans, Goldring et al., 1996. • Targets beyond oculomotor range
Gaze Shifts • Contribution of head movements • Head Movement Propensity, J. Fuller, 1992. • ‘Head movers’ vs. ‘eye movers’ • ±40 degree orbital threshold • Innate behavioural tendency for subthreshold head moving • Midline-attraction and Resetting
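A minimal sketch of splitting a gaze shift between eyes and head around the orbital threshold; the propensity parameter stands in for the head-mover/eye-mover tendency and is an illustrative model, not Fuller's actual formulation:

```python
# Targets within the ~40-degree orbital range can be reached by the eyes
# alone; beyond it the head must contribute. 'Head movers' recruit the
# head even for subthreshold targets.
def split_gaze_shift(target_angle, orbital_limit=40.0, propensity=0.3):
    """Return (eye_rotation, head_rotation) in degrees for one gaze shift."""
    magnitude, sign = abs(target_angle), 1 if target_angle >= 0 else -1
    if magnitude <= orbital_limit:
        # Subthreshold: head movers still rotate the head a little.
        head = propensity * magnitude
    else:
        # The eyes cover the orbital limit; the head must do the rest.
        head = magnitude - orbital_limit + propensity * orbital_limit
    head = min(head, magnitude)
    return sign * (magnitude - head), sign * head

print(split_gaze_shift(25.0))   # mostly eyes
print(split_gaze_shift(70.0))   # large head contribution
```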
Blinking • Subtle and often overlooked • Not looking while leaping: the linkage of blinking and saccadic gaze shifts, Evinger et al., 1994. • Gaze-evoked blinking • Amplitude of gaze shift influences blink probability and magnitude
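A minimal sketch of gaze-evoked blinking; the saturating probability curve is an illustrative assumption, not Evinger et al.'s fitted model:

```python
# Larger gaze shifts make an accompanying blink more likely, so blink
# probability is modelled as a simple saturating function of amplitude.
import random

def blink_probability(amplitude_deg, saturation=60.0):
    """Probability of a blink accompanying a gaze shift of this amplitude."""
    return min(1.0, amplitude_deg / saturation)

def maybe_blink(amplitude_deg):
    return random.random() < blink_probability(amplitude_deg)
```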
2. Perception of Attention • Attention behaviours may elicit attention from others • Predator-prey • Gaze-following • Goals • Intentions
Gaze in Infants • Infants • Notice gaze direction as early as 3 months • Gaze-following • Infants are faster at looking at targets that are being looked at by a central face • Respond even to circles that look like eyes (Image: www.mayoclinic.org)
Theory of Mind • Baron-Cohen (1994) • Eye Direction and Intentionality Detectors • Theory of Mind Module • Perrett and Emery (1994) • More general Direction of Attention Detector • Mutual Attention Mechanism
Our Model • ToM for conversation initiation • Based on attention behaviours • Key metrics in our system are Attention Levels and Level of Interest • Metrics represent the amount of attention perceived to have been paid by another • Based primarily on gaze • Also body direction, locomotion, directed gestures and facial expressions • Emotional significance of gaze
Implementation (in progress) • Torque game engine • http://www.garagegames.com • Proven engine used for a number of AAA titles • Useful basis providing fundamental functionality • Graphics exporters • In-simulation editing • Basic animation • Scripting • Terrain rendering • Special effects
Synthetic Vision • Approximated human vision for computer agents • Why? • Inexpensive – no special hardware required • Bypasses many computer vision complexities • Segmentation of images, recognition • Enables characters to receive visual information in a way analogous to humans • How? • Updated in a snapshot manner • Small, simplified images rendered from the agent’s perspective • Textures, lighting and sfx disabled • False-colouring
False-colours provide a look-up scheme for acquiring objects from the database • False colour defined as (r,g,b), where • Red is the object type identifier • Green is the object instance identifier • Blue is the sub-object identifier • Allows quick retrieval of objects
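A minimal sketch of the encode/decode round trip; the example database is hypothetical:

```python
# Red encodes the object type, green the instance, blue the sub-object,
# so any rendered pixel maps straight back to a database entry.
def encode_false_colour(type_id, instance_id, sub_object_id):
    assert all(0 <= v < 256 for v in (type_id, instance_id, sub_object_id))
    return (type_id, instance_id, sub_object_id)

def decode_false_colour(pixel, database):
    """Map one rendered pixel back to the object it came from."""
    type_id, instance_id, sub_object_id = pixel
    return database[(type_id, instance_id)], sub_object_id

# Hypothetical database keyed by (type, instance).
database = {(1, 1): "pedestrian#1", (1, 2): "pedestrian#2", (2, 1): "car#1"}
print(decode_false_colour((1, 2, 0), database))
```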
Intentionality Detector (ID) • Represents behaviour in terms of volitional states (goal and desire) • Based on visual, auditory and tactile cues • Our version is based on vision only • Attributes an intentionality characteristic to objects based on the presence of certain cues • Implemented as a filter on objects from the visual system • Only “agent” objects can pass the filter
Direction of Attention • Direction of Attention Detector (DAD) • More useful than the Eye Direction Detector (EDD) alone • Eye, head, body and locomotion direction read from database after false-colour lookup • Used to derive the Attention Level metric from filtered stimuli
Direction of Attention • What happens when eyes aren’t visible? • Hierarchy of other cues • Head direction > Body direction > Locomotion direction
Mutual Attention • Comparison between: • Eye direction read from other agent • Focus of attention of this agent • See 1. Generating Attention Behaviours • If agents are the focus of each other’s attention, the Mutual Attention Mechanism (MAM) is activated
Attention Levels • Perception of attention paid by another • At instant of time • Based on orientation of body parts • Eyes, head, body, locomotion direction
Attention Levels • Direction is weighted for each segment • Eyes provide the largest contribution (Figure: cues ordered from most to least attention)
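A minimal sketch of the metric, assuming per-segment unit direction vectors; the weights and the alignment measure are illustrative assumptions, with eyes weighted highest and the cue hierarchy handling segments that are not visible:

```python
# Each visible segment's direction is compared with the direction towards
# the observer; the per-segment alignments combine into one Attention Level.
import numpy as np

WEIGHTS = {"eyes": 0.5, "head": 0.25, "body": 0.15, "locomotion": 0.10}

def attention_level(directions, towards_observer):
    """directions: segment name -> unit vector; missing segments skipped."""
    to_obs = np.asarray(towards_observer, dtype=float)
    to_obs /= np.linalg.norm(to_obs)
    score, total_weight = 0.0, 0.0
    for segment, weight in WEIGHTS.items():
        if segment not in directions:       # e.g. eyes not visible
            continue
        d = np.asarray(directions[segment], dtype=float)
        alignment = max(0.0, float(np.dot(d / np.linalg.norm(d), to_obs)))
        score += weight * alignment
        total_weight += weight
    return score / total_weight if total_weight else 0.0

# Other agent faces us with eyes and head; body is angled away.
print(attention_level({"eyes": (0, 0, 1), "head": (0, 0, 1),
                       "body": (1, 0, 1)}, towards_observer=(0, 0, 1)))
```

Renormalising by the visible weights means that when the eyes cannot be seen, head, body and locomotion directions take over in exactly the fallback order given above.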