Selective Perception Policies for Guiding Sensing and Computation in Multimodal Systems
Brief presentation of the ICMI’03 paper by N. Oliver & E. Horvitz
Nikolaos Mavridis, Feb ‘02
Introduction
• The menu for today:
  • An application that served as testbed & excuse
  • The architecture of the recognition engines used
  • Two varieties of selective perception
  • Results
  • Big ideas
  • An intro to RESOLVER
• The main big idea: NO NEED TO NOTICE AND PROCESS EVERYTHING ALWAYS!
The Application
• SEER: a multimodal system for recognizing office activity
• General setting:
  • A basic requirement for visual surveillance and multimodal HCI is the provision of rich, human-centric notions of context in a tractable manner…
  • Prior work: mainly particular scenarios (waving the hand etc.), HMMs, dynamic Bayesian networks
• Output categories:
  • PC = Phone Conversation
  • FFC = Face-to-Face Conversation
  • P = Presentation
  • O = Other Activity
  • NP = Nobody Present
  • DC = Distant Conversation (out of field of view)
• Input:
  • Audio: PCA of LPC coefficients, energy, μ and σ of ω0, zero-crossing rate
  • Audio localisation: Time Delay of Arrival (TDOA)
  • Video: skin color, motion, foreground and face densities
  • Mouse & keyboard: history of 1, 5 and 60 sec of activity
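Two of the audio features listed above are easy to make concrete. The sketch below computes short-time energy and zero-crossing rate for a single frame; the frame values and exact definitions are generic illustrations, not SEER's implementation.

```python
def short_time_energy(frame):
    """Mean squared amplitude of one audio frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of consecutive sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

frame = [0.1, -0.2, 0.3, -0.1, 0.2, -0.3]   # toy 6-sample frame
print(round(short_time_energy(frame), 4))    # average power of the frame
print(zero_crossing_rate(frame))             # every pair alternates sign -> 1.0
```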
Recognition Engine
• Recognition engine: LHMM (Layered HMM!)
• First level:
  • Parallel discriminative HMMs per modality:
    • Audio: human speech, music, silence, noise, ring, keyboard
    • Video: nobody, static person, moving person, multiperson
• Second level:
  • Input: outputs of the above + derivative of sound localisation + keyboard/mouse histories
  • Output: PC, FFC, P, O, NP, DC – longer temporal extent!
• Selective perception strategies usable at both levels:
  • Selecting which features to use at the input of the HMMs!
  • Example: motion & skin density for one active person; skin density & face detection for multiple people
  • Also for the second stage: selecting which first-stage HMMs to run…
• HMMs vs. LHMMs:
  • Compared to CP HMMs (Cartesian product, one long feature vector), prior knowledge about the problem is encoded in the structure of LHMMs
  • I.e. decomposition into smaller subproblems → less training required, more filtered output for the second stage, and only the first level needs retraining!
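The layering idea above can be sketched minimally: each first-level model turns raw features into a posterior over its own classes, and the second level treats those posteriors, not the raw features, as its observation vector. All class names and numbers below are toy stand-ins, not the paper's actual HMMs.

```python
def normalize(scores):
    """Turn per-class likelihood scores into a posterior."""
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()}

def level1(likelihoods):
    """Stand-in for one first-level HMM bank: P(class | current frame)."""
    return normalize(likelihoods)

def level2_observation(audio_post, video_post):
    """Second-level input: concatenated first-level posteriors."""
    return list(audio_post.values()) + list(video_post.values())

audio_post = level1({"speech": 3.0, "silence": 1.0})
video_post = level1({"one person": 4.0, "nobody": 1.0})
obs = level2_observation(audio_post, video_post)
print(obs)  # [0.75, 0.25, 0.8, 0.2]
```

This is the "more filtered output for the second stage" point: the second level sees a short, already-summarized vector instead of the full raw feature vector.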
Selective Perception Strategies
Why sense everything and compute everything, always?!?
• Two approaches:
  • EVI: Expected Value of Information (à la RESOLVER)
    • Decision theory and uncertainty reduction
    • EVI computed for different overlapping subsets, in real time, every frame
    • Greedy, one-step lookahead approach for computing the next best set of observations to evaluate
  • Rate-based perception (somewhat similar to RIP BEHAVIOR)
    • Policies defined heuristically, specifying observation frequencies and duty cycles for each computed feature
• Two baselines for comparison:
  • Compute everything!
  • Randomly select feature subsets
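The greedy one-step lookahead reduces to an argmax per frame: score each candidate feature subset by its expected utility gain minus its computational cost, and evaluate only the winner. The subsets, gains and costs below are invented for illustration.

```python
# feature subset -> expected utility gain and computational cost
# (both already mapped to the same "currency"); numbers are made up
candidate_subsets = {
    ("motion", "skin"):         {"gain": 0.30, "cost": 0.10},
    ("skin", "face"):           {"gain": 0.45, "cost": 0.30},
    ("motion", "skin", "face"): {"gain": 0.50, "cost": 0.45},
    ():                         {"gain": 0.00, "cost": 0.00},  # sense nothing
}

def best_subset(subsets):
    """One-step lookahead: argmax of net expected value of information."""
    return max(subsets, key=lambda s: subsets[s]["gain"] - subsets[s]["cost"])

print(best_subset(candidate_subsets))  # ('motion', 'skin'): net 0.20 wins
```

Note that the empty subset is a legitimate candidate: when no feature pays for itself, the best action is to sense nothing this frame.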
Expected Value of Information
Endowing the perceptual system with knowledge of the value of action in the world…
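The equations on this slide did not survive the text export. A hedged reconstruction, using a standard decision-theoretic formulation of (net) expected value of information rather than necessarily the paper's exact notation:

```latex
% Expected utility after observing a candidate feature set f_k,
% given current evidence E (u = utility matrix, a = action, s = state):
EU(f_k) = \int p(f_k \mid E)\, \max_{a} \sum_{s} u(a,s)\, p(s \mid E, f_k)\, df_k

% Net EVI: subtract the best utility attainable with no new sensing,
% and the cost of computing f_k (mapped into the same currency):
NEVI(f_k) = EU(f_k) \;-\; \max_{a} \sum_{s} u(a,s)\, p(s \mid E) \;-\; C(f_k)
```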
Expected Value of Information
But what we are really interested in is what we have to gain! Thus:
Where we also account for:
• What we would do given no sensing at all
• Cost of sensing – but we have to map cost and utility to the same currency!
• HMM-ised implementation used!
• Richer cost models:
  • Non-identity U matrix
  • Constant vs. activity-dependent costs (what else is running?) – successful results! (no significant decrease in accuracy ;-))
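Cost-aware EVI can be sketched numerically: expected best-action utility if we sense, minus best-action utility under the no-sensing prior, minus the sensing cost, all in the same currency. States, actions, and all numbers below are illustrative, not the paper's.

```python
def best_utility(posterior, utility):
    """Max over actions of expected utility under a state posterior."""
    return max(
        sum(utility[a][s] * p for s, p in posterior.items())
        for a in utility
    )

utility = {  # u[action][state]; identity-like here, but any matrix works
    "act_as_PC":  {"PC": 1.0, "FFC": 0.0},
    "act_as_FFC": {"PC": 0.0, "FFC": 1.0},
}
prior = {"PC": 0.5, "FFC": 0.5}           # belief with no sensing at all
outcomes = [                              # (probability, posterior after sensing)
    (0.5, {"PC": 0.9, "FFC": 0.1}),
    (0.5, {"PC": 0.1, "FFC": 0.9}),
]
cost = 0.1                                # sensing cost in utility units

value_if_sense = sum(p * best_utility(post, utility) for p, post in outcomes)
net_evi = value_if_sense - best_utility(prior, utility) - cost
print(round(net_evi, 2))  # 0.9 - 0.5 - 0.1 = 0.3: worth sensing
```

An activity-dependent cost model would simply make `cost` a function of the current system load instead of a constant.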
Rate-based Perception
• Simple idea: fixed observation frequencies and duty cycles per feature
• In this case, no online tuning of rates…
• Doesn't capture sequential prerequisites etc.
Results
• EVI: no significant performance decrease, with much less computational cost!
• Also effective in activity-dependent mode
• And even more to be gained!
Take-home Message: Big Ideas
• No need to sense & compute everything always!
• In essence we have a planner: a planner for goal-based sensing and cognition!
• Not only useful for AI: the approach might also be useful for computational modeling of human performance…
• Simple satisficing works: no need for fully optimised planning; with some precautions, one step ahead with many approximations is sufficient – ALSO more plausible for humans! (ref: Ullman)
• Easy co-existence with other goal-based modules: we just need a method for distributing time-varying costs of sensing and cognitive actions (centralised stock market?)
• As a future direction: time-decreasing confidence is mentioned