Deducing the Visual Focus of Attention from Head Pose Estimation in Dynamic Multi-View Meeting Scenarios
Michael Voit, Interactive Analysis and Diagnosis, Fraunhofer IITB, Karlsruhe, Germany
Rainer Stiefelhagen, Interactive Systems Labs, Universität Karlsruhe, Germany
Motivation
• Determine the target a person is looking at
• Important cue to recognize a person's occupation and actions
Deducing vFoA from Head Pose in Dynamic Meeting Scenarios
Difficulties
• Ideal method: track eye gaze
• Recent developments in HCI / CV:
  • Unobtrusive sensor setup ("Smart Rooms")
  • Complex environments (many interaction objects) & "everyday situations"
  • Surveillance
• Pupils are not always visible in natural environments!
Current State of the Art
• vFoA using head orientation:
  • Stiefelhagen et al. Tracking Focus of Attention in Meetings. 2002.
  • Ba et al. Multi-Party Focus of Attention Recognition in Meetings from Head Pose and Multimodal Interfaces. 2006.
  • Ba et al. A Cognitive and Unsupervised Map Adaptation Approach to the Recognition of Focus of Attention from Head Pose. 2007.
  • …
• Drawbacks:
  • vFoA has only been tracked for very controlled, restricted scenarios:
    • Meetings with a fixed set of participants
    • No movement, no dynamics
  • Datasets only include a small number of allowed targets (mostly the meeting participants)
This Work
• Introduce dynamic scenes
  • Moving people and objects
  • Variable number of participants
• Use an unobtrusive sensor setup
• Collect the first dataset including dynamic scenes
• Present our first system to estimate vFoA under these new conditions
  • Adaptive mapping from head pose to likely gaze
Data Collection: Setup
• 10 meetings
  • Each ~10 minutes long
• Unobtrusive sensor setup
  • 4 wide-angle cameras in the room's corners
  • Fisheye camera on the ceiling
  • T-shaped microphone arrays
  • Table-top microphones
Data Collection: Dynamics
• Each meeting consists of 4-5 participants:
  • 3 actors
  • 1-2 "unaware" colleagues
• Predefined script
• All objects are possible focus targets
Data Collection: Annotations (1)
• Head orientation:
  • Magnetic pose sensor
• Annotations:
  • All objects in the room
  • Observed focus targets
  • Head bounding box
• Currently in progress:
  • Upper body orientation
  • Speaker activity
Data Collection: Annotations (2)
Multi-View Head Pose Estimation
From Head Pose to Gaze
• Neurophysiology (e.g. Freedman and Sparks): the mapping is not constant!
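The slide's point is that the head performs only a varying fraction of each gaze shift. A common simplification is a linear model, gaze = head pan / k, with a constant factor k; the rest of the talk argues that no single k fits all situations. A minimal sketch of that baseline, with k = 0.7 as a purely hypothetical placeholder value:

```python
def gaze_from_head(head_pan_deg: float, k: float = 0.7) -> float:
    """Estimate gaze pan from head pan under a linear model.

    Assumes the head covers only a fraction k of the full gaze
    shift. k = 0.7 is a hypothetical placeholder, not a value from
    the slides (which argue that no constant k is adequate).
    """
    return head_pan_deg / k

# A 35-degree head pan corresponds to a 50-degree gaze shift
# under k = 0.7.
print(gaze_from_head(35.0))  # → 50.0
```

This constant-k baseline is what the adaptive mapping later in the talk improves upon.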
Reasons for Quick Focus Changes
• Interaction:
  • Speech, discussion
  • Presentation
  • Introduction of new targets (gestures, …)
• Occupation change:
  • Focus changes from whiteboard to notebook while writing down notes
• Curiosity / movement:
  • A target passes by, focus follows that person
  • A person enters the scene
  • Noise coming from a target (general disturbance)
Idea: Adapt the Mapping Factor
• Build a histogram over a discretized set of mapping factors k
• Adapt the histogram bins according to the targets' posterior likelihoods
• Model focus shifts in the priors
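The equations shown on the original slide are not preserved here, so the following is only a hedged sketch of the idea as stated in the bullets: keep a histogram over a discretized set of mapping factors k and reweight its bins by how well each k explains a plausible focus target. The Gaussian observation model, the bin values, and all names are illustrative assumptions, not the authors' exact formulation.

```python
import math

K_BINS = [0.5, 0.6, 0.7, 0.8, 0.9]             # discretized mapping factors
hist = {k: 1.0 / len(K_BINS) for k in K_BINS}  # uniform prior over k

def likelihood(gaze_deg, target_deg, sigma=10.0):
    """Gaussian likelihood that a gaze angle points at a target angle
    (sigma is an assumed angular tolerance)."""
    return math.exp(-0.5 * ((gaze_deg - target_deg) / sigma) ** 2)

def update(head_pan_deg, target_angles, hist):
    """Reweight the k-histogram by the best-explained target per bin."""
    new_hist = {}
    for k, w in hist.items():
        gaze = head_pan_deg / k  # candidate gaze under this mapping factor
        best = max(likelihood(gaze, t) for t in target_angles)
        new_hist[k] = w * best
    norm = sum(new_hist.values())
    return {k: w / norm for k, w in new_hist.items()}

def classify(head_pan_deg, target_angles, hist):
    """Pick the target with the highest posterior, marginalized over k."""
    scores = []
    for t in target_angles:
        s = sum(w * likelihood(head_pan_deg / k, t) for k, w in hist.items())
        scores.append((s, t))
    return max(scores)[1]

# Toy usage: two targets at 0 and 45 degrees, a head pan of 30 degrees.
targets = [0.0, 45.0]
hist = update(30.0, targets, hist)
print(classify(30.0, targets, hist))  # → 45.0 (best explains the head pose)
```

The point of adapting the histogram rather than fixing one k is that the same head pose can then be interpreted differently as the scene and the set of visible targets change, which is exactly the dynamic-scenario problem this work targets.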
Experiments (so far…)
• Video data only
• Current experiments on dominant targets only:
  • Persons (~60%)
  • Whiteboard (~12%)
  • Table (incl. notebooks, ~12%)
• Experiments on the person wearing the sensor only
  • Upper body orientation fixed
  • Body orientation of the remaining participants is currently being annotated manually
Results
Conclusions
• vFoA is an important cue to determine people's occupations
• First dataset for dynamic meeting scenes:
  • Variable number of targets
  • Moving targets
  • Large number of potential targets
• A constant mapping from head pose to gaze is suboptimal
  • Adaptive mapping enhances results (10%+)
• vFoA is VERY ambiguous without regarding further modalities:
  • Gestures, speech, …
  • Defining the necessary "level of detail"
• Current work includes estimating upper body orientation:
  • Extend the system to all participants
  • Compute joint viewing areas
• Future work:
  • Completing annotations
  • Include more modalities
Thank You!
From Head Pose to Gaze (2)