Explore the selection process of minimal subscenes in visual scenes based on bottom-up salience, top-down relevance, and cultural influences. Learn how factors like setting, salience, and personal preferences shape minimal subscene formation.
Minimal Subscene
• Working definition: the smallest set of objects, actors and actions in a dynamic visual scene that are relevant to present behavior
For now we will assume:
• Bottom-up: objects/actors/actions must be visible
• Top-down: relevance to present behavior is explicitly specified, e.g., by specifying a question or task
• Knowledge base: the system may supplement explicit knowledge with long-term acquired knowledge
CS 664, Session 20
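The working definition above can be read as a filter over scene entities. The following is a minimal sketch, not the course's implementation: the `Entity` record, the numeric `relevance` field, and the `threshold` parameter are all hypothetical names introduced here for illustration. It assumes that top-down relevance to the task has already been reduced to a number between 0 and 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical record for one object, actor, or action in a scene."""
    name: str
    kind: str         # "object", "actor", or "action"
    visible: bool     # bottom-up assumption: must be visible
    relevance: float  # assumed 0..1 relevance to the stated task


def minimal_subscene(entities, threshold=0.5):
    """Return the names of visible entities relevant enough to the task.

    The threshold is an assumption of this sketch; the slides do not
    specify how relevance is quantified.
    """
    return {e.name for e in entities if e.visible and e.relevance >= threshold}


scene = [
    Entity("boy", "actor", True, 0.9),
    Entity("scooter", "object", True, 0.8),
    Entity("tree", "object", True, 0.1),
    Entity("dog", "actor", False, 0.9),  # relevant but not visible
]
selected = minimal_subscene(scene)
```

Here `selected` contains only the boy and the scooter: the tree is visible but irrelevant, and the dog is relevant but not visible.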
General architecture
Factors influencing selection of minimal subscene
At least include:
• Setting/gist/layout
• Bottom-up salience
• Cultural/learned
• Top-down
Factors influencing selection of minimal subscene (1)
• Setting/gist/layout: selected objects/actors/actions tended to be:
- at the center of the field of view
- in the foreground / occluding other objects/actors/actions
- followed by the camera if the camera moved
- present throughout the video clip
- often getting closer / growing larger
• E.g., boy playing with scooter; bare-chested man standing & drinking
Caveats:
- lack of stereo increases foreground/background interference
- having everything in focus is unnatural
- ambiguity in selection of minimal subscene if actors pass by
- selected minimal subscene may disintegrate due to occlusions
Factors influencing selection of minimal subscene (2)
• Bottom-up salience: introspection as well as the model suggest that selected objects/actors/actions were fairly salient
• E.g., boy riding scooter; bare-chested man
• Note: motion cues were widely agreed to be the strongest
Caveats:
- low-quality video makes details difficult to perceive
- salient distracting actors may disengage attention from the current minimal subscene
Factors influencing selection of minimal subscene (3)
• Cultural/learned: some actors/objects/actions may carry strong cultural meaning that makes them likely to belong to the minimal subscene
• E.g., finger-pointing movement; facial expressions; alpha male
Caveats:
- culture-specific (? – not tested)
- experience-specific (? – not tested)
Factors influencing selection of minimal subscene (4)
• Top-down: behavioral priorities and personal preferences influence selection of components of the minimal subscene
• E.g., nerd playing with electronic gadget; handsome man; pretty girl; groups more interesting than isolated people
Caveats:
- somewhat linked to cultural factors
- gender-specific differences
- most probably influenced by the nature of the task, but we have not explicitly tested for that
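One simple way to picture how the four factors might jointly rank a candidate is a normalized weighted sum. This is only an illustrative assumption: the slides do not state how the factors combine, and the factor keys, the `combined_score` function, and the weights below are hypothetical.

```python
FACTORS = ("setting", "salience", "cultural", "top_down")


def combined_score(scores, weights=None):
    """Weighted average of per-factor scores (each assumed to lie in 0..1).

    A linear combination is an assumption of this sketch, not a claim
    about how human observers or the course model integrate the factors.
    Missing factor scores are treated as 0.
    """
    if weights is None:
        weights = {f: 1.0 for f in FACTORS}
    total = sum(weights[f] for f in FACTORS)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[f] * scores.get(f, 0.0) for f in FACTORS) / total


# A centered, salient actor with no cultural or task-related pull.
boy_on_scooter = {"setting": 1.0, "salience": 1.0, "cultural": 0.0, "top_down": 0.0}
score = combined_score(boy_on_scooter)
```

With equal weights, `score` is the plain mean of the four factor scores. Changing the weights is one way to mimic the caveat that task and personal preference shift what gets selected.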
Nature of minimal subscene
• Exploration mode: initial selection seems guided by setting/gist/layout as well as salience
• E.g., focus on salient actors at center & foreground
• Analysis mode: once locked onto a minimal subscene, all background activity becomes distracting
• Disengagement: if a background distractor is strong enough, it may break the current minimal subscene and trigger analysis of another minimal subscene
• E.g., nice girl passing by
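The exploration / analysis / disengagement cycle described above can be sketched as a small state machine. This is a hedged illustration only: the per-frame strength dictionaries, the function name `run_attention`, and both thresholds are hypothetical and not taken from the course material.

```python
def run_attention(frames, lock_threshold=0.5, distract_threshold=0.8):
    """Track which candidate subscene is attended across frames.

    frames: list of dicts mapping a candidate name to an assumed
    strength in 0..1 for that frame. Returns a list of (mode, current)
    tuples, one per frame.
    """
    mode, current, history = "exploration", None, []
    for candidates in frames:
        if mode == "exploration":
            # Lock onto the strongest candidate if it is strong enough.
            if candidates:
                best = max(candidates, key=candidates.get)
                if candidates[best] >= lock_threshold:
                    mode, current = "analysis", best
        else:
            # In analysis mode, only a strong distractor breaks the lock.
            distractors = {n: s for n, s in candidates.items() if n != current}
            if distractors:
                strongest = max(distractors, key=distractors.get)
                if distractors[strongest] >= distract_threshold:
                    current = strongest  # disengage, analyze the new subscene
        history.append((mode, current))
    return history


trace = run_attention([
    {"boy": 0.3, "girl": 0.1},  # nothing strong enough yet
    {"boy": 0.6, "girl": 0.2},  # lock onto the boy
    {"boy": 0.6, "girl": 0.7},  # distractor too weak to break the lock
    {"boy": 0.6, "girl": 0.9},  # strong distractor takes over
])
```

Using a higher threshold for disengagement than for the initial lock is one way to express that an established subscene resists weak distractors.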
Additional caveats
• If a minimal subscene is too boring, it will easily disintegrate
• E.g., second clip with boy & dad playing with scooter: “pffff, him again, I know what he will be doing, so let’s check out what else is happening”
• The cameraman may have a strong influence on which minimal subscene is selected
• E.g., by determining centering, or by following some actors (or not following them, which may be perceived as unnatural and be distracting)
• In extended video clips, several minimal subscenes may be selected in sequence
• E.g., first boy with scooter, then pretty girl, then man with dog, etc.
Can we deal with all that?
• Setting/gist/layout: in principle, yes; some limited models exist
• Bottom-up salience: should be fine based on previous modeling
• Cultural/learned: very difficult for a computer system! Cues are often very subtle (e.g., facial expressions) or involve complex spatial transformations (e.g., pointing to a location in 3D space)
• Top-down: should be fine based on previous modeling work
General architecture
Note that the general architecture seems to support all of the functions just described.
More video clips?
• Multi-threaded events / interactions
• Influence of task
• Foreground/background ambiguities
• Cross-clip continuity
• Effects of scale
• Multiple simultaneous subscenes
• Etc.
Examples / experiments
• Examine video clips
• For each scene, please write down:
- Most salient object
- Most salient action
- Minimal subscene
- Who is doing what to whom
Scene 018
Scene 018 – Attentional Trajectory
Scene 019
Scene 019 – Attentional Trajectory
Scene 020
Scene 020 – Attentional Trajectory
Scene 021
Scene 021 – Attentional Trajectory
Scene 022
Scene 022 – Attentional Trajectory
Scene 023
Scene 023 – Attentional Trajectory
Scene 024
Scene 024 – Attentional Trajectory
Scene 025
Scene 025 – Attentional Trajectory
Scene 026
Scene 026 – Attentional Trajectory
Scene 027
Scene 027 – Attentional Trajectory
Scene 028
Scene 028 – Attentional Trajectory
Scene 029
Scene 029 – Attentional Trajectory
Scene 030
Scene 030 – Attentional Trajectory
Scene 031
Scene 031 – Attentional Trajectory
Scene 032
Scene 032 – Attentional Trajectory