Dynamic Visual Scene Analysis System

General architecture
CS 664, Session 19

Minimal Subscene
• Working definition: the smallest set of objects, actors and actions in a dynamic visual scene that are relevant to present behavior
For now we will assume:
• Bottom-up: objects/actors/actions must be visible
• Top-down: relevance to present behavior is explicitly specified, e.g., by specifying a question or task
• Knowledge base: the system may supplement explicit knowledge with long-term acquired knowledge

Motivation: Humans
• 1) Free examination
• 2) Estimate material circumstances of family
• 3) Give ages of the people
• 4) Surmise what family has been doing before arrival of “unexpected visitor”
• 5) Remember clothes worn by the people
• 6) Remember position of people and objects
• 7) Estimate how long the “unexpected visitor” has been away from family
Yarbus, 1967

“Beobot”

Visual Attention
See http://iLab.usc.edu

Object Recognition
Riesenhuber & Poggio, Nat Neurosci, 1999 (MIT)

Action Recognition
Oztop & Arbib, 2001

Start:
• Issue question
• Parse question
• Extract keywords
• Expand to related concepts, using ontology/KB
• Fill initial “task list”
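The start-up steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the system's actual parser: the toy ontology, the stop-word list, and the function names `extract_keywords`, `expand_concepts` and `initial_task_list` are all assumptions made for the example.

```python
import re

# Hypothetical toy ontology/KB: each concept maps to directly related concepts.
TOY_ONTOLOGY = {
    "who": ["person"],
    "catch": ["hand", "ball", "throw"],
    "ball": ["round object"],
}

# Hypothetical stop-word list; a real question parser would be far richer.
STOP_WORDS = {"is", "the", "a", "an", "of", "to", "in", "will", "what"}


def extract_keywords(question):
    """Lowercase the question, split it into words, drop stop words and repeats."""
    keywords = []
    for word in re.findall(r"[a-z]+", question.lower()):
        if word not in STOP_WORDS and word not in keywords:
            keywords.append(word)
    return keywords


def expand_concepts(keywords, ontology, depth=1):
    """Expand keywords breadth-first through the ontology, up to `depth` hops."""
    task_list = list(keywords)
    frontier = list(keywords)
    for _ in range(depth):
        next_frontier = []
        for concept in frontier:
            for related in ontology.get(concept, []):
                if related not in task_list:
                    task_list.append(related)
                    next_frontier.append(related)
        frontier = next_frontier
    return task_list


def initial_task_list(question, ontology=TOY_ONTOLOGY, depth=1):
    """Issue question -> extract keywords -> expand -> initial task list."""
    return expand_concepts(extract_keywords(question), ontology, depth)


print(initial_task_list("Who will catch the ball?"))
```

The `depth` parameter is one possible way to limit how far the expansion reaches into the knowledge base; the slides do not say how the real system bounds it.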

Task list
Working list of currently relevant objects/actors/actions
• Initially empty
• Question/task specification provides initial filling-in
• As the scene is scanned and objects/actors/actions are recognized, contents of task list are updated
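One possible way to represent such a task list is a mapping from concepts to relevance weights. The sketch below is hypothetical: the class name `TaskList`, the weights, the `decay` factor and the update rule (add concepts related to a recognized, relevant entity at reduced weight) are assumptions for illustration, since the slides do not specify how updates are computed.

```python
class TaskList:
    """Working list of currently relevant objects/actors/actions (illustrative)."""

    def __init__(self):
        # concept -> relevance weight; the list starts out empty
        self.relevance = {}

    def fill_from_task(self, concepts, weight=1.0):
        """Initial filling-in from the question/task specification."""
        for concept in concepts:
            self.relevance[concept] = max(self.relevance.get(concept, 0.0), weight)

    def update(self, recognized, related, decay=0.5):
        """Update after an entity is recognized in the scene.

        If the entity is relevant, concepts related to it (per the `related`
        mapping, a stand-in for the knowledge base) are added at a decayed
        weight. Returns True if the recognized entity was relevant.
        """
        if recognized not in self.relevance:
            return False
        weight = self.relevance[recognized] * decay
        for concept in related.get(recognized, []):
            self.relevance[concept] = max(self.relevance.get(concept, 0.0), weight)
        return True


tasks = TaskList()
tasks.fill_from_task(["ball", "catch"])
tasks.update("ball", {"ball": ["hand"]})   # "hand" becomes relevant at weight 0.5
tasks.update("tree", {"tree": ["leaf"]})   # irrelevant entity: list unchanged
```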

“Where”: attention, saliency map and task map
• Input: video stream
• Low-level vision: massively parallel extraction of simple visual features from video input
• Saliency map: localizes conspicuous (potentially interesting) objects irrespective of why they are salient
• Task map: acts as a spatial filter on the saliency map; only locations in the current minimal subscene can easily pass through. Other locations need to be exceptionally salient to pass through.
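The filtering role of the task map can be illustrated with a small sketch. It assumes, purely for illustration, that both maps are 2D grids of non-negative numbers and that the filter is a pointwise product with a small `leak` gain for task-irrelevant locations; the function names and the `leak` parameter are hypothetical and not taken from the slides.

```python
def gate_saliency(saliency, task_map, leak=0.2):
    """Scale each saliency value by task relevance.

    Locations with task relevance below `leak` are attenuated by `leak`
    rather than zeroed, so an exceptionally salient but task-irrelevant
    location can still pass through.
    """
    return [
        [s * max(t, leak) for s, t in zip(s_row, t_row)]
        for s_row, t_row in zip(saliency, task_map)
    ]


def next_attended_location(saliency, task_map, leak=0.2):
    """Return (row, col) of the maximum of the task-gated saliency map."""
    gated = gate_saliency(saliency, task_map, leak)
    best = (0, 0)
    for r, row in enumerate(gated):
        for c, value in enumerate(row):
            if value > gated[best[0]][best[1]]:
                best = (r, c)
    return best


task_map = [[0.0, 0.0],
            [1.0, 0.0]]          # only the lower-left location is task-relevant

ordinary = [[0.2, 0.9],
            [0.5, 0.1]]          # upper-right is salient, but not exceptionally so
extreme = [[0.2, 3.0],
           [0.5, 0.1]]           # upper-right is exceptionally salient

print(next_attended_location(ordinary, task_map))  # task-relevant location wins
print(next_attended_location(extreme, task_map))   # exceptional salience wins
```

With these numbers the first call selects the task-relevant location (1, 0), while the second selects (0, 1) because its saliency is high enough to survive the attenuation.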

“What” memory
• Relates concepts to visual properties
• Bridge between visual and semantic knowledge
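As a rough illustration of such a bridge, the “what” memory could be thought of as a two-way lookup between concepts and visual properties. The entries, property names and function names below are invented for the example and are not taken from the system.

```python
# Hypothetical "what" memory: concept -> expected visual properties.
WHAT_MEMORY = {
    "ball": {"shape": "round", "motion": "ballistic"},
    "hand": {"shape": "elongated", "color": "skin tone"},
}


def visual_properties(concept, memory=WHAT_MEMORY):
    """Semantic -> visual: what should this concept look like?"""
    return memory.get(concept, {})


def concepts_matching(observed, memory=WHAT_MEMORY):
    """Visual -> semantic: concepts whose stored properties all appear in `observed`."""
    return [
        concept
        for concept, props in memory.items()
        if props and all(observed.get(key) == value for key, value in props.items())
    ]


print(visual_properties("ball"))
print(concepts_matching({"shape": "round", "motion": "ballistic", "color": "red"}))
```

The first lookup could be used to bias attention toward features of task-relevant concepts, and the second to name what has been found at an attended location.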

General architecture

Examples / experiments
• Examine video clips
• For each scene, please write down:
  • Most salient object
  • Most salient action
  • Minimal subscene
  • Who is doing what to whom

Scene 001

Scene 001 – Attentional Trajectory
