CS 764 Seminar in Computer Vision Ramin Zabih Fall 1998
Course mechanics, revised • Meeting time will be Tue/Thu 11-12, here • No meeting next week; next meeting is Sept 24 • RDZ will send email reminder • Home page is now up at: www/CS764 • Check out the suggested readings
What is the visual system’s “contract”? • Most of the field is low-level vision or model-based recognition • Well-defined problems • Not what you want for almost any task • Key question: how to avoid brittleness? • Can make the visual system compute just what we need for our task (e.g., berries) • But how to handle the unexpected (e.g., lions)?
Our path for 764 • No good computational work to read • We will examine papers along these lines: • Computational approaches that failed • Psychological data that is highly suggestive • Neurologically inspired architectures • Cognitive scientists and philosophers • Their goal is argument, not algorithm! • They’ve thought the most about these issues
Today: active vision and attention • Faster computers made some tasks attainable • Especially, robotics • Basic observations: • Robots desperately need vision • No one needs what the vision community is trying to provide! • One can get by with a lot less • And build robots that work now
The pendulum swings back “In the active vision paradigm, the basic components of the visual system are visual behaviors tightly integrated with the actions they support; these behaviors may not require elaborate categorical representations of the 3-D world… The cost of generating and updating a complete, detailed model of our everyday world is too high” “Active vision encompasses attention, selective sensing in space, resolution and time” -Swain & Stricker, Promising directions in active vision, 1991
Attention • Emphasize the top-down selection of what to compute • But is there a cost in brittleness? • Some very nice experimental work • Treisman’s spotlight model • Extensive psychological studies • But, very simple stimuli! • A natural approach to implement
Visual attention • Psychophysicists study this area a lot • Some models seem fairly computational • Issues include “pop-outs” • The most famous studies measure response time as a function of the number of distractors • Response time that grows linearly with display size implies serial search • Top-down driven! • Constant response time implies pop-out (parallel, preattentive search) • Both predictions are simulated in the sketch below
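A minimal sketch of the two predictions, assuming a serial self-terminating search model; all timing constants are arbitrary, made-up values:

```python
# Illustrative simulation (not from the readings) of the response-time pattern
# above. Under serial, self-terminating search, items are checked one at a time,
# so mean RT grows linearly with display size; under parallel "pop-out" search,
# RT is roughly flat. The per-item and base times are made-up constants.

def serial_rt(n_distractors, per_item=0.05, base=0.4):
    """Mean RT when the target is found, on average, halfway through the display."""
    display_size = n_distractors + 1
    return base + per_item * (display_size + 1) / 2.0

def parallel_rt(n_distractors, base=0.45):
    """Mean RT when the target pops out regardless of display size."""
    return base

for n in (1, 5, 10, 20, 40):
    print(f"{n:2d} distractors: serial {serial_rt(n):.2f}s, parallel {parallel_rt(n):.2f}s")
```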
Experimental results • A Q will pop out in a field of O’s • An O will not pop out in a field of Q’s • A different color will also pop out • Red T in a field of green T’s • But: a conjunction of color and shape will not! • Find the red Q among green Q’s, red O’s and green O’s
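These results can be summarized by a simple (and deliberately simplified) rule: a target pops out when it carries some basic feature that no distractor carries. A toy predicate along those lines, with made-up feature labels:

```python
# Toy sketch (my simplification, not from the readings): predict pop-out by
# asking whether the target has a basic feature that appears in no distractor.
# Items are modelled as sets of feature labels.

def pops_out(target, distractors):
    """True if some feature of the target appears in no distractor."""
    return any(all(f not in d for d in distractors) for f in target)

O = {"circle"}
Q = {"circle", "bar"}                                   # a Q is an O plus a bar
print(pops_out(Q, [O]))                                 # True:  Q among O's pops out
print(pops_out(O, [Q]))                                 # False: O among Q's does not
print(pops_out({"red", "T"}, [{"green", "T"}]))         # True:  unique color pops out
print(pops_out({"red", "Q"},                            # False: conjunction target --
               [{"green", "Q"}, {"red", "O"}, {"green", "O"}]))  # no unique feature
```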
Spotlight model • Some properties are computed “bottom up” • Everywhere, all the time • Examples: color, motion • A field can signal an outlier, but only one field at a time • We use an attentional “spotlight” in order to compute, e.g., conjunctions of properties • Not the same as the fovea! • A toy formulation is sketched below
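A toy formulation of this picture, assuming parallel bottom-up feature maps and a serial spotlight that binds conjunctions one location at a time (the scene and feature labels are made up):

```python
# Minimal sketch of the spotlight idea described above (my own toy formulation).
# Color and shape maps are computed "bottom up" at every location in parallel;
# a conjunction (red AND Q) is only found by sweeping an attentional spotlight
# over locations one at a time.

display = {                       # location -> (color, shape); made-up scene
    (0, 0): ("green", "Q"),
    (0, 1): ("red",   "O"),
    (1, 0): ("green", "O"),
    (1, 1): ("red",   "Q"),       # the conjunction target
}

# Bottom-up feature maps: available everywhere, all the time.
color_map = {loc: c for loc, (c, s) in display.items()}
shape_map = {loc: s for loc, (c, s) in display.items()}

def spotlight_search(color, shape):
    """Serially bind color and shape at one attended location at a time."""
    for loc in display:                       # the spotlight visits locations
        if color_map[loc] == color and shape_map[loc] == shape:
            return loc
    return None

print(spotlight_search("red", "Q"))           # (1, 1)
```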
Computational proposals • Ullman’s influential Visual Routines paper • Attentional in flavor, though not in the details (one elemental operation is sketched below) • Pengi and “deictic” representations • Moved away from object-based representations • Ullman’s more recent work on sequence seeking • A mixture of neural and computational modeling
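Ullman’s paper proposes elemental operations such as bounded activation (“coloring”). A rough, self-contained sketch of how coloring can answer an inside/outside question; this is an illustrative implementation, not code from the paper:

```python
# Rough illustration of one elemental visual routine, "coloring" (bounded
# activation), applied to an inside/outside judgment. My own toy version:
# spread activation from a point and see whether it leaks to the image border.

from collections import deque

def inside_curve(grid, start):
    """grid: list of strings, '#' = curve, '.' = empty.
    Flood-fill from start; if the fill reaches the border, start is outside."""
    rows, cols = len(grid), len(grid[0])
    seen, frontier = {start}, deque([start])
    while frontier:
        r, c = frontier.popleft()
        if r in (0, rows - 1) or c in (0, cols - 1):
            return False                      # activation escaped: outside
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if grid[nr][nc] == "." and (nr, nc) not in seen:
                seen.add((nr, nc))
                frontier.append((nr, nc))
    return True                               # activation stayed bounded: inside

curve = ["........",
         "..####..",
         "..#..#..",
         "..####..",
         "........"]
print(inside_curve(curve, (2, 3)))            # True  (inside the closed curve)
print(inside_curve(curve, (0, 0)))            # False (outside)
```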
Some suggested intuitions • Good ideas that lack computational proposals • Shortcuts for recognition • Mentioned in Ullman’s recent work • Part of Vera’s thesis • Choice of properties to compute • What best distinguishes the thing you want from the other things out there? • How do you know what else will appear?