Computational Vision

Jitendra Malik, University of California, Berkeley

What is in an image? The input is just an array of brightness values; humans perceive structure in it.

From Pixels to Perception
[Figure labels: water, grass, sand, tiger, tiger; back, head, eye, legs, tail, mouse, shadow; outdoor, wildlife]

If visual processing were purely feedforward… (it isn’t)
• Low-level (image processing): pixels, local neighborhoods
• Mid-level (grouping, figure/ground, surface attributes): contours, surfaces
• High-level (recognition): objects, scenes
[Figure labels: water, grass, tiger, sand]

Boundaries of image regions are defined by a number of attributes:
• Brightness/color
• Texture
• Motion
• Binocular disparity
• Familiar configuration
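As a minimal sketch of the first cue only (brightness), the following marks pixels where the brightness gradient is large. The function name `brightness_boundaries`, the central-difference gradient, the border handling, and the threshold are all illustrative assumptions rather than a method taken from these slides. Texture, motion, binocular disparity, and familiar configuration are not modeled.

```python
from math import hypot

def brightness_boundaries(image, threshold):
    """Return a same-sized grid of booleans marking strong brightness changes.

    `image` is a list of equal-length rows of brightness values.
    Gradients use central differences; border pixels reuse the nearest
    in-bounds neighbor (an assumption made for this sketch).
    """
    rows, cols = len(image), len(image[0])
    out = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            left = image[r][max(c - 1, 0)]
            right = image[r][min(c + 1, cols - 1)]
            up = image[max(r - 1, 0)][c]
            down = image[min(r + 1, rows - 1)][c]
            gx = (right - left) / 2.0   # horizontal brightness change
            gy = (down - up) / 2.0      # vertical brightness change
            out[r][c] = hypot(gx, gy) > threshold
    return out

# Toy image: a dark region (10) next to a bright region (200).
toy = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
edges = brightness_boundaries(toy, threshold=50)
```

On the toy image, only the two columns that straddle the dark/bright transition are marked. A single threshold on brightness is far weaker than the combination of cues listed above; it is shown only to make the idea of an attribute-defined boundary concrete.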

Grouping is hierarchical
[Figure: three segmentations of the same image, labeled A, B, and C]
Perceptual organization forms a tree:
[Figure: segmentation tree with nodes Image, BG, L-bird, R-bird, bush, far, grass, body, beak, eye, head]
Two segmentations are consistent when they can be explained by the same segmentation tree:
• A, C are refinements of B
• A, C are mutual refinements
• A, B, C represent the same percept
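The consistency notion above can be sketched in code. This is one possible reading, stated as an assumption: a segmentation is a grid of integer labels, "refinement" means every region of the finer segmentation lies inside one region of the coarser one, and two segmentations are "consistent" when every pair of their regions is either disjoint or nested. The names `regions`, `is_refinement`, and `are_consistent`, and the toy label maps, are hypothetical.

```python
def regions(label_map):
    """Group pixel coordinates by label: {label: frozenset of (row, col)}."""
    groups = {}
    for r, row in enumerate(label_map):
        for c, label in enumerate(row):
            groups.setdefault(label, set()).add((r, c))
    return {label: frozenset(px) for label, px in groups.items()}

def is_refinement(fine, coarse):
    """True if every region of `fine` lies inside a single region of `coarse`."""
    coarse_regions = list(regions(coarse).values())
    return all(
        any(f <= c for c in coarse_regions)
        for f in regions(fine).values()
    )

def are_consistent(seg_a, seg_b):
    """True if every pair of regions is either disjoint or nested,
    so both segmentations could be cuts through one segmentation tree."""
    for a in regions(seg_a).values():
        for b in regions(seg_b).values():
            if a & b and not (a <= b or b <= a):
                return False
    return True

# Toy 1x4 label maps (assumed for illustration, not the slide's images).
B = [[0, 0, 1, 1]]   # coarse: two regions
A = [[0, 1, 2, 2]]   # splits B's left region
C = [[0, 0, 1, 2]]   # splits B's right region
D = [[0, 1, 1, 2]]   # middle region straddles B's boundary
```

Here `A` and `C` each refine `B`, neither refines the other, yet they are consistent with each other: each is finer in a different part of the image. `D` is not consistent with `B`, because its middle region overlaps both of `B`'s regions without containing or being contained in either.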

Humans assign a depth ordering to surfaces across a contour
• R1 appears in front of R2
• R2 appears in front of R3
This can be done for images of natural scenes …
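The slide states only pairwise judgments across a contour. As a sketch under the added assumption that these judgments can be combined into one global nearest-to-farthest order, the following applies a standard topological sort to (near, far) pairs. The function name `depth_order` is hypothetical, and real scenes can contain cyclic occlusions for which no such global order exists; the sketch raises an error in that case.

```python
def depth_order(in_front_of):
    """Order regions from nearest to farthest.

    `in_front_of` is a list of (near, far) pairs, one per contour.
    Raises ValueError if the pairs are cyclic and admit no global order.
    """
    behind = {}     # region -> set of regions directly behind it
    blockers = {}   # region -> number of regions directly in front of it
    for near, far in in_front_of:
        behind.setdefault(near, set())
        behind.setdefault(far, set())
        blockers.setdefault(near, 0)
        blockers.setdefault(far, 0)
        if far not in behind[near]:
            behind[near].add(far)
            blockers[far] += 1
    # Start with regions that nothing occludes.
    ready = sorted(r for r, n in blockers.items() if n == 0)
    order = []
    while ready:
        region = ready.pop(0)
        order.append(region)
        for far in sorted(behind[region]):
            blockers[far] -= 1
            if blockers[far] == 0:
                ready.append(far)
    if len(order) != len(blockers):
        raise ValueError("occlusion relations are cyclic")
    return order

# The two judgments from the slide.
order = depth_order([("R1", "R2"), ("R2", "R3")])
```

For the two relations on the slide, the result places R1 nearest and R3 farthest.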

Figure-Ground Labeling
• Red is near; blue is far

Figure/Ground Organization
[Figure labels: Figure (face), Ground (shapeless); Figure (goblet), Ground (shapeless)]
• A contour belongs to one of the two (but not both) abutting regions.
• Important for the perception of shape

Some other aspects of perceptual organization
• Good continuation
• Amodal completion
• Modal completion

What do we see here?

And here?

Some Pictorial Cues

Support, Size
[Figure labels: 1 ?, 2 ?, 3 ?]

Cast Shadows

Shading

Measuring Surface Orientation

Binocular Stereopsis

Optical flow for a pilot

Object Category Recognition

Shape variation within a category
• D’Arcy Thompson: On Growth and Form, 1917
• Studied transformations between shapes of organisms

Attneave’s Cat (1954)
Line drawings convey most of the information

Objects are in Scenes

Human stick figure from single image
[Figure panels: Input image, Stick figure, Support masks]

This is hard…
• Variety of poses
• Clothing
• Missing parts
• Small support for parts
• Background clutter

Taxonomy and Partonomy
• Taxonomy: e.g., cats are in the family Felidae, which in turn is in the class Mammalia.
• Recognition can be at multiple levels of categorization, or be identification at the level of specific individuals, as in faces.
• Partonomy: objects have parts, which have subparts, and so on. The human body contains the head, which in turn contains the eyes.
• These notions apply equally well to scenes and to activities.
• Psychologists have argued that there is a “basic level” at which categorization is fastest (Eleanor Rosch et al.).
• In a partonomy, each level contributes useful information for recognition.
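The two hierarchies can be sketched as simple lookup tables, using only the examples named above. The dictionary names `IS_A` and `HAS_PART` and the helper functions are hypothetical, chosen for illustration; a taxonomy is walked upward through "is-a" links, while a partonomy is walked downward through "has-part" links.

```python
# Taxonomy: each category points to its parent ("is-a" links).
IS_A = {
    "cat": "Felidae",
    "Felidae": "Mammalia",
}

# Partonomy: each whole lists its parts ("has-part" links).
HAS_PART = {
    "human body": ["head"],
    "head": ["eyes"],
}

def ancestors(category, is_a=IS_A):
    """Walk up the taxonomy, returning categories from specific to general."""
    chain = []
    while category in is_a:
        category = is_a[category]
        chain.append(category)
    return chain

def all_parts(whole, has_part=HAS_PART):
    """Collect parts and subparts of `whole`, depth first."""
    found = []
    for part in has_part.get(whole, []):
        found.append(part)
        found.extend(all_parts(part, has_part))
    return found

cat_levels = ancestors("cat")
body_parts = all_parts("human body")
```

Recognizing "cat" at different taxonomic levels corresponds to stopping at different points in `cat_levels`, while `body_parts` lists the levels of the partonomy that can each contribute evidence for recognition.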

Visual Control of Action
• Locomotion
• Navigation/Way-finding
• Obstacle Avoidance
• Manipulation
• Grasping
• Pick and Place
• Tool use

Camera Obscura (Reinerus Gemma-Frisius, 1544)

Camera Obscura (Angelo Sala, 1576-1637)
