THE PROBLEM OF VISUAL RECOGNITION (Ch. 3, Farah) • Why is it difficult to identify real world objects from the retinal image? • How many shape representations of each distinguishable object do we need in memory? • What is the nature of shape representations in memory? • How do we evaluate a proposed representation? • What are the fundamental dimensions of a useful shape representation?
From image to object: A hard problem • Why is it difficult to identify a distal object using only our 2D retinal image? • The 2D retinal image is only partly determined by the shape of a distal object
Shape of the 2D retinal image … • The shape of a 2D retinal image of an object varies with the spatial relation between the viewer (e.g., you) and the object. • In a 2D image, the shape of a CUBE or SPHERE is rarely square or round; more often it appears as a parallelogram or an oval, respectively. • What else varies in a 2D retinal image? • Position and size of the object in the picture plane • Which surfaces are visible, foreshortened, or occluded • The presence or absence of shadows.
How do we identify the object shape from a 2D retinal image? • Do we infer (compute) the shape from the image? • Or, do we learn a separate association between each view of an object and its identity?
Object recognition in normal humans: Two hypotheses • Human shape perception is: H1: Viewpoint dependent (Rock, Tarr) H2: Viewpoint independent (Marr, Biederman)
H1: Human shape perception is tied to viewing conditions. • Novel perspectives of wire figures can be hard to identify (accuracy: 75%–39% correct; Rock et al., 1981). • But the same shapes with clay surfaces can be recognized from different perspectives (Farah et al., 1994).
H2: Human shape perception is independent of viewing conditions. • Mental rotation in the picture plane: • We can name highly familiar letters/numbers equally quickly and accurately at any orientation (Corballis, 1988). • But orientation is important on first encounter (Jolicoeur, 1985; see figure).
Variation from canonical perspective • Rotating an object in depth produces foreshortening and moves the view away from the canonical perspective. • Time to name objects increases as they are rotated away from a canonical perspective (Palmer et al., 1981).
Multiple Views Theory (Tarr, 1995) • Shape representations in memory combine shape and viewpoint information (a la Rock). • We transform perceptual representations to match shape representations across changes in viewpoint.
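To make the transform-to-match idea concrete, here is a minimal, hypothetical sketch of view-based recognition (not Tarr's actual model): each object is stored in memory as several 2D views, and an input view is matched by trying a set of candidate picture-plane rotations and choosing the stored view with the smallest mismatch. The function names and the contour-point encoding are assumptions for illustration only.

```python
import numpy as np

def rotate_points(points, angle_deg):
    """Rotate an N x 2 array of contour points in the picture plane."""
    a = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(a), -np.sin(a)],
                    [np.sin(a),  np.cos(a)]])
    return points @ rot.T

def view_distance(view_a, view_b):
    """Mismatch between two views (assumes corresponding points, same count)."""
    return np.sum((view_a - view_b) ** 2)

def recognize(input_view, stored_views, angles=range(0, 360, 15)):
    """Match the input against every stored view under candidate rotations."""
    best = (None, None, np.inf)
    for obj, views in stored_views.items():        # memory: multiple views per object
        for view in views:
            for ang in angles:                      # transform input to match memory
                d = view_distance(rotate_points(input_view, ang), view)
                if d < best[2]:
                    best = (obj, ang, d)
    return best[0], best[1]                         # best-matching object and rotation
```

A fuller model of this kind would also normalize for the other image variations listed earlier (position, size, occlusion) before matching.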
General conclusions • Our ability to identify “a familiar object from a novel image may depend strongly on the type, complexity, and familiarity of the object” (Farah, p. 68). • Most likely, we have more than one shape representation in memory per distinguishable object. • Two potential ways to identify an image: • Transform it to correspond to a familiar shape • Factor it into true object shape + viewing condition
Shape representation: a computational framework • Some information about an object is EXPLICIT in the 2D retinal image (e.g., location in visual field, distance of parts from viewer). • But, much of the information important for visual recognition is only IMPLICIT in the retinal image (e.g., 3D shape, presence and shape of component parts).
What is the nature of the shape representation in memory? • What are the criteria by which we can evaluate proposed shape representations? • What goes into creating a shape representation?
Criteria for evaluating the usefulness of the internal shape representation for object recognition (Marr & Nishihara, 1978) • Accessibility • Scope • Uniqueness • Stability • Sensitivity
Accessibility: Ease of deriving (recovering) shape information about an object from a 2D retinal image • Human object perception is typically fast, effortless, and accurate. • Hence, the relevant information should be recoverable from the 2D image with minimal demand on resources.
Scope: Range of stimuli over which a shape representation is effective • Most machine vision representations are special-purpose systems that can only recognize stimuli in a limited domain (e.g., bank numbers, blocks world). • In contrast, the human object recognition system is often viewed as a general-purpose system, capable of representing all types of stimuli (objects, faces, printed letters, handwriting). (Figure from Palmer, 1999.)
Uniqueness: Assigning the same shape description to a given image of an object • To describe an image of an object the same way on different occasions requires that the image is always coded using the same coordinate system. • For example: Assigning the same shape representation to a particular chair on different occasions requires that the chair be coded using the same coordinates on each occasion.
Stability: Assigning the same shape representation to images of the same object under different viewing conditions • A stable representation captures the intrinsic shape of an object regardless of changes in image appearance due to shifts in location, perspective, lighting, position of moving parts (e.g., a cat in many positions). • Stability also captures the similarity relations that exist between images of similar objects (e.g., seeing a polar bear and a black bear as bears or seeing different black bears in different locations or on different occasions as bears).
Stability: Cats • Cats have movable parts and can be in different positions, colors, etc. • A stable shape representation will capture the intrinsic shape of a cat, regardless of variation in the 2D retinal image. (Figure from Kosslyn, 1994.)
Sensitivity: The degree to which the shape representation codes (subtle) differences between similar shapes and different images of the same shape • Making within category discriminations: Being able to distinguish between the shape representations of different bears (black bears, polar bears, grizzly bears), chairs (wooden chair, folding chair) and faces (your face, my face, your friend’s face).
Four fundamental aspects of shape representation • Marr: Three dimensions of shape representation that must be specified in any computational model: • Coordinate system • Primitives • Organization. • Plaut & Farah: How the shape representation is implemented.
Coordinate system: A fundamental aspect of shape representation. • “… shape is nothing more than a set of locations occupied by an object” (Farah, 2000, p. 71) and hence, representing these locations has to be relative to some coordinate system. • Accessibility and stability trade-off. Highly accessible coordinate systems have low stability and vice versa.
Three types of coordinate systems • Viewer centered • Environment centered • Object centered
Viewer-centered Coordinate System • Locations are specified relative to the viewer – retina, head, hand, etc. • Visual stimuli are initially represented in a retinotopic coordinate system (a 2D space with its origin fixed with respect to the retina). If either the eyes or the object moves, the retinotopic representation changes. • Very accessible, poor stability.
Environment-centered Coordinate System • Locations of objects are specified relative to other objects in the environment. • Stable over movements of viewer, but not over movements of objects. • Requires the viewer to continually update the spatial relationship of the environment to the viewer as the viewer moves about the environment. Accessibility is reduced.
Object-Centered Coordinate System • Locations occupied by different parts of an object are represented in a coordinate system intrinsic to, or fixed relative to, the object. Example: a mug’s handle is on the outside wall of a cylinder, and this spatial relation stays the same regardless of viewing perspective. • Position and orientation invariance yields perfect stability, but reduced accessibility. • Interesting difficulty: How do you assign relations between parts before you recognize the object?
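A minimal sketch, not from Farah, of what an object-centered description buys: if the object's pose in viewer coordinates could be estimated (a rotation R and translation t — obtaining them is exactly the "interesting difficulty" above), viewer-centered part locations can be re-expressed in an object-centered frame, where they stay constant across viewpoints.

```python
import numpy as np

def to_object_centered(points_viewer, R, t):
    """Map N x 3 viewer-centered part locations into the object's own frame.

    R (3 x 3) and t (3,) are the object's estimated orientation and position
    in viewer coordinates; the inverse rigid transform is R^T (p - t).
    """
    return (points_viewer - t) @ R

# e.g., the mug handle's location relative to the mug body comes out the same
# no matter where the viewer stands, because R and t absorb the viewpoint change.
```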
Primitives: What is localized in space? Contours, surfaces, or 3D shapes? • Contour-based primitives? • Edges are extracted from the visual image early in cortical processing. They are relatively accessible, but have limited scope and are not stable across viewing conditions, especially rotation in depth.
Primitives cont. • Surface-based primitives? Evidence suggests that simple cells in V1 actually code surfaces. Surfaces provide broader scope and better stability (Marr’s 2½-D sketch).
Primitives cont… • Volume-based primitives: Although it is computationally difficult to derive them from a 2D image, volume-based primitives seem ideal for object recognition. • Marr’s cylinders (upper figure) • Biederman’s geons (lower figure)
Biederman’s GEON model • Figure: some example geons
Organization: Degree and type of relation among elements of the shape representation • Are the elements all at the same scale, as in Biederman’s geon model, or related hierarchically, as in Marr’s model?
Recapping … • Have examined: • Need for multiple shape representations in memory • Criteria for evaluating shape representations • Three coordinate systems • Nature of the primitive elements • Taken together, the evidence suggests that object recognition may use an object-centered coordinate system, where volume-based primitive parts combine to represent objects.
Implementation • Neural net modeling blurs the distinction between the algorithmic level (the computational processes involved in perception) and the implementation level (brain, machine); hence, we consider two aspects here: • The nature of the computations underlying memory search, which differs between symbolic and neural net models • Local vs. distributed representations
Models in Cognitive Psychology • Function: • Help to organize what we know • Help to identify gaps in our knowledge • Are the source of testable hypotheses • When implemented as a computer model, allow us to test the adequacy of the model
Symbolic Models • Parallel processing vs. serial processing • Transformation of symbolic information from stage to stage
Nature of the computations underlying memory search • Symbolic model: • The perceptual representation is separate from the stored shape representations in memory. • The comparison process is separate from knowledge. • The model explicitly compares the input (perceptual representation) to memory (shape representations in memory).
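A toy illustration of that symbolic picture, assuming shape descriptions can be written as sets of discrete features (the feature labels below are invented for the example): the percept and the stored descriptions are separate data structures, and a distinct comparison process explicitly matches one against the other.

```python
def match_score(percept, stored_description):
    """Count the symbolic features shared by the percept and a stored description."""
    return len(percept & stored_description)

def identify(percept, memory):
    """Explicitly compare the input representation to every stored shape description."""
    return max(memory, key=lambda name: match_score(percept, memory[name]))

memory = {
    "mug": {"cylinder", "handle", "open top"},
    "cat": {"four legs", "tail", "whiskers"},
}
print(identify({"cylinder", "handle"}, memory))   # -> "mug"
```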
Neural Net Models • Simple units: nodes organized in layers (input, hidden, output) • Activation level of each unit • Connections between units • Connection weights
Computations underlying “memory search” in neural net models • The pattern of activation across units corresponds to the recognized object and is jointly determined by the input activation and the weights of the network (the system’s knowledge). • It is difficult to distinguish structure from process, or perception from memory.
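A minimal sketch with arbitrary, untrained weights (illustrative only, not a specific published model): in a neural net model the “memory search” is just forward propagation, so the output pattern is jointly determined by the input activations and the connection weights, with no separate comparison step or separately stored templates.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(inputs, w_hidden, w_output):
    hidden = sigmoid(inputs @ w_hidden)     # input layer -> hidden layer
    return sigmoid(hidden @ w_output)       # hidden layer -> output layer

rng = np.random.default_rng(0)
w_hidden = rng.normal(size=(4, 3))          # the network's "knowledge" lives in the weights
w_output = rng.normal(size=(3, 2))
output_pattern = forward(np.array([1.0, 0.0, 1.0, 0.0]), w_hidden, w_output)
print(output_pattern)                       # the pattern of activation is the "recognition"
```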
Local vs distributed representations • Local: A one-to-one mapping of the things doing the representing onto the things being represented (e.g., “grandmother cells”). • Distributed: A many-to-many mapping of the things doing the representing onto the things being represented; a pattern of activation over many units.
Distributed representations… • Represent and retrieve information efficiently in a network of highly interconnected representational units (like neurons in the brain) • Allow a greater number of entities to be represented within a given number of units • Degrade gracefully • Automatically generalize (but this can cause interference)
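A small numerical illustration, not from the text, of the capacity point: with N units a local code can represent at most N entities (one unit each), whereas a distributed code over the same units can in principle represent up to 2^N binary patterns; overlapping patterns for similar entities (the two bears below) are what give automatic generalization and, potentially, interference.

```python
import numpy as np

n_units = 8

# Local code: one-to-one mapping, one "grandmother cell" per entity.
local_codes = np.eye(n_units)               # at most 8 distinct entities

# Distributed code: each entity is a pattern of activation over many units.
distributed_codes = {
    "polar bear": np.array([1, 1, 0, 1, 0, 0, 1, 0]),
    "black bear": np.array([1, 1, 0, 1, 0, 1, 0, 0]),   # overlaps with "polar bear"
    "chair":      np.array([0, 0, 1, 0, 1, 0, 0, 1]),
}
print(local_codes.shape[0], "local entities vs. up to", 2 ** n_units, "distributed patterns")
```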
Onward to Object Recognition • Chapter 4: Object recognition • Chapter 5: Face Recognition • Chapter 6: Word Recognition