Object and Face Recognition: Lecture Notes

Object and Face Recognition: Lecture Notes Introduction • These notes are in 2 parts, roughly corresponding to (1) object recognition; and (2) face recognition. • We'll actually make a smooth transition by discussing whether face recognition is a special kind of recognition, or whether it can be described in terms of more general abilities.

1. Early Models of Recognition • Template-matching. Early models of recognition were based on matching the stimulus pattern to a set of stored, pre-defined 'templates'. These models can be expanded by incorporating some pre-processing • Feature analysis. Instead of looking for a match to a standard shape, feature analysis emphasizes that what makes an 'A' an A is the letter's unique set of defining features

2. Failures of Matching-Models There are several key weaknesses of these early approaches to recognition: • 3 dimensions. The template-matching and feature analysis models were designed mostly to recognize alphanumeric characters • However, most real-world human recognition occurs on 3D objects, not 2D symbols • Superficial differences. It's hard for these models to 'extract' the basic similarity of repeat instances of an item while also representing superficial differences (consider handwriting as an example)

Where to start? • These models don't have a clear means for breaking up complex input into a series of objects-to-be-identified; instead, they seem to rely on being 'fed' items (e.g., letters) to recognize • That seems utterly unlike human perception

3. Building Blocks of Recognition? • In order to simulate human recognition in a complex 3D world, more recent models of object recognition have relied on defining constraints (or "primitives") and basic strategies that the visual system might use

Marr and Nishihara • Marr and Nishihara present a model of recognition, restricted to the set of objects that can be described as generalized cones-- objects with a clear main axis and a constant-shape cross section

Recognition occurs in the following levels: • Single-model axis. The first step in the model is identification of the main axis of the object • Component axes. Then, the axes of each of the smaller sub-portions of the object are identified.[see steps 1 and 2] • 3D model match. Finally, a match between the arrangement of components and a stored 3D model description is performed to identify the object. [see step 3-- the catalog]

Steps 1 & 2 of Marr and Nishihara’s theory:

Step 3: The catalog (Marr & Nishihara)

Problem • Although object comparisons are fastest if the main axis of an object is the same as the object it is being judged against, there is not strong data supporting the 'psychological reality' of the Marr & Nishihara model (although it might be a very good way to get computers to identify objects)

Geons (Biederman) • According to geon theory, complex objects are made up of arrangements of basic, component parts ('geons') • Some evidence for the psychological reality of geons comes from experiments that show that recognition is impaired when we hide the intersections of geons • However, occluding these intersections also changes the picture in many other ways

Object with geon intersections occluded

Object with geon intersections revealed

4. View-Dependent Recognition? • An alternate theory of recognition is simply that you store a set of characteristic views (mental images) of each object • You recognize each object by finding the closest match • Note that this type of model does not require you to infer/recover 3D shape in order to identify objects

Experimental Evidence • Bulthoff & Edelman trained people to learn the shapes of weird objects that they made up • They tested them on identification of the objects shown from (1) the same views as during training; and (2) different views • Performance was much better for the training (familiar) views

5. Perceptual Organization • We’ve switched from talking about how the visual system extracts important features of the environment-- edges, colors, motion, depth, etc-- to how the visual system identifies objects (which are made up of these key features) • The goal of understanding "perceptual organization" is to understand how we put together the basic features to see a coherent, organized world of things and surfaces

Gestalt • The basic motto of the Gestaltists-- the first real "school" of perceptual organization theory-- was that the whole is greater than the sum of its parts • Examples: a melody made a various notes; or see an example from the pointillist painter Paul Signac

Figure & Ground • The first step in organizing the perceptual world is to divide figure and ground; ambiguous figures demonstrate this • Notice that you can perceive either the white or dark portion of the picture as figure or ground, but that it's very difficult to perceive both as figure or both as ground simultaneously

Figure & Ground

Gestalt laws of organization: • Proximity. Elements that are close are often grouped together • Similarity. Similar elements tend to be grouped together • Common fate. Things that move together tend to group • Good continuation. Perceptual mechanisms tend to preserve smooth continuity in favor of abrupt edges • Closure. Of several possible perceptual organizations, ones yielding "closed" figures are more likely than those yielding "open" ones. • Orientation & symmetry. Objects oriented with horizontal and vetical axes, or ones that are symmetric, are more often perceived as figures

Demonstration of proximity & similarity

Demonstration of good continuation (your perception of the objects on the left) and not good continuation (right)

Figure/ground changes with orientation and closure

The law of Pragnanz: • "Of several geometrically possible organizations that one will actually occur that possesses the best, simplest, and most stable shape" (Koffka, 1935) • But what's the "best, simplest, most stable"? • Do you think that the Gestalt laws of organization provide a sufficient answer?

Illusions • Sometimes, your visual system must group/segment an object, not based on edges in the retinal image, but by inferring parts of the missing boundaries • Consider illusory contours; and also note that such edgeless contours can occur in the real world, too

Illusory contour. Note that there is no physical contour for most of the perceived "edge"

Illusory contour in real life

Impossible Figures • Certain figures can be drawn so that there is no consistent interpretation, due to the fact that there is no obvious real-world physical object that could correspond to the image • However, "impossible objects" can be built so that they look like an impossible figure from a certain vantage point

Impossible figure

Impossible object

Impossible object, revealed

Failures of Grouping • Some figures do not yield a consistent grouping, and so the visual system struggles to arrive at a solution, and ends up switching back and forth between several possible groupings

Multiple possible groupings

Object and Face Recognition: Lecture Notes