Using Analogy to Discover the Meaning of Pictures Melanie Mitchell Computer Science Department Portland State University and External Professor Santa Fe Institute
An image-understanding task:

[Figure: a processing hierarchy, from low-level vision (color, shape, texture; simple segmentation) through object recognition and pattern recognition up to high-level perception ("meaning"). The unbridged step at the top, marked "?", is the "semantic gap".]
The HMAX model for object recognition (Serre, Wolf, Bileschi, Riesenhuber, and Poggio, 2006)
Gabor Filters

A Gabor filter is essentially a localized Fourier transform in the image. Each filter has an associated frequency ω, scale s, and orientation θ. Its response measures the extent to which frequency ω is present at orientation θ and scale s, centered about pixel (x, y).
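The idea above can be sketched in a few lines of NumPy. This is an illustrative sketch only: the parameter names (`theta` = orientation, `scale` = width of the Gaussian envelope, `freq` = spatial frequency) follow the slide, not any particular library's API, and the filter size is an assumption.

```python
import numpy as np

def gabor_response(image, x, y, theta, scale, freq, size=11):
    """Response of one Gabor filter centered at pixel (x, y)."""
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate the coordinate frame to the filter's orientation.
    xr = xs * np.cos(theta) + ys * np.sin(theta)
    # Gaussian envelope times an oriented sinusoid: a "localized
    # Fourier transform" probing frequency `freq` at angle `theta`.
    envelope = np.exp(-(xs**2 + ys**2) / (2 * scale**2))
    carrier = np.cos(2 * np.pi * freq * xr)
    patch = image[y - half:y + half + 1, x - half:x + half + 1]
    return float(np.sum(patch * envelope * carrier))
```

A filter whose orientation and frequency match the local image structure (say, vertical stripes probed with θ = 0 at the stripes' frequency) gives a large-magnitude response; a mismatched or structureless patch gives a small one.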
S1 units: Gabor filters (one per pixel) 16 scales / frequencies, 4 orientations
C1 unit: maximum value of a group of S1 units, pooled over slightly different positions and scales; 8 scales / frequencies, 4 orientations
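The pooling step can be sketched as follows: take the elementwise maximum over a pair of adjacent-scale S1 maps (which is how 16 scale bands become 8), then max-pool over small spatial neighborhoods. The window size here is an illustrative assumption.

```python
import numpy as np

def c1_pool(s1_pair, pool=4):
    """Max-pool two adjacent-scale S1 response maps into one C1 map.

    s1_pair: two arrays of identical shape (adjacent scales).
    """
    stacked = np.maximum(s1_pair[0], s1_pair[1])   # pool over scale
    h, w = stacked.shape
    h, w = h - h % pool, w - w % pool               # trim to a multiple
    blocks = stacked[:h, :w].reshape(h // pool, pool, w // pool, pool)
    return blocks.max(axis=(1, 3))                  # pool over position
```

This max pooling is what gives C1 units their modest invariance to position and scale: a strong S1 response anywhere inside the window survives to the C1 layer.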
S2 units: Radial Basis Functions over "Natural Image Patches" • Idea: natural images contain universal, low-level features that are useful in classifying objects. • Randomly sample small "crops" from natural images and feed them through the S1 and C1 layers. • Collect a set of N patches, {P_i | i = 1, ..., N}, of the C1 layer from this random sample. • Now, given a new image, the unit S2_i corresponding to P_i gets input X from the C1 layer and computes a radial basis function of the distance between X and P_i. • This gives the "degree" to which feature P_i is present in the input X.
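A minimal sketch of that tuning function, assuming the usual Gaussian radial basis form (the sharpness parameter `beta` is an illustrative assumption):

```python
import numpy as np

def s2_response(X, Pi, beta=1.0):
    """RBF tuning of an S2 unit to its stored prototype patch Pi.

    Response is 1 when the C1 input X exactly matches Pi and decays
    exponentially with the squared distance between them.
    """
    return float(np.exp(-beta * np.sum((np.asarray(X) - np.asarray(Pi)) ** 2)))
```

So each S2 unit acts as a soft template matcher: its output is the "degree" to which its natural-image feature is present in the input.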
The resulting feature vector representing the image is fed to a Support Vector Machine for classification.
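The deck's model uses a support vector machine on these feature vectors; as a dependency-free stand-in, here is a plain perceptron-trained linear classifier (not the actual SVM, but it shows the same final step: a linear decision over the feature vector).

```python
import numpy as np

def train_linear_classifier(features, labels, epochs=20, lr=0.1):
    """Perceptron stand-in for the SVM stage. labels are +1 / -1."""
    w = np.zeros(features.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            if y * (w @ x + b) <= 0:   # misclassified: nudge the boundary
                w += lr * y * x
                b += lr * y
    return w, b

def predict(w, b, x):
    return 1 if w @ x + b > 0 else -1
```

An actual SVM would instead maximize the margin of this linear boundary, but the interface is the same: feature vector in, class label out.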
Object detection (here, “car”) with HMAX model (Bileschi, 2006)
Sample of results from Poggio model (Serre et al., 2006) (Bileschi, 2006)
Can we use a simple ontology to answer this question?

[Diagram, built up over four slides: a "dog walking" ontology in which a Person holds a leash attached to a Dog, with action "walking" for the person and "running" for the dogs. Successive slides stretch the concepts: multiple Dogs; a Cat or an Iguana in place of the dog; a Helicopter, Bicycle, or Car in place of the walking person.]
Why is image-understanding hard for computers? • It is vastly open-ended.
[Slide: a scatter of concepts that might be relevant to such images: dog grooming, fanny pack, dog walking, gasoline, lawn mower, sidewalk, beach, stick, inside, runway, sky, helicopter, leash, army, grass, airplane, dog, outside, person, ground, holding, attached to, tree, backpack, car, far from, close to, standing, running, above, left of, walking, track.]
Why is image-understanding hard for computers? • It is vastly open-ended. • Can't solve it by feeding the image's feature vector to all known "object classifiers"; in general there are too many such classifiers, and they are too imperfect! (Compare with the StreetScenes system.) • In general we can't even construct a high-level "feature vector" ahead of time, since there are too many possible features and we don't know which features are relevant. • Need dynamics! Need to construct a "probable", coherent, consistent representation of the picture at "recognition time". The construction process must allow different parts of the representation to influence one another dynamically.
• In constructing the representation, we need to limit exploration of features to the most promising possibilities, but how do we know which ones are promising without exploring them? • Need prior, higher-level knowledge to interact with lower-level vision in both directions (bottom-up and top-down). • Need to allow prior knowledge to be "fluid": allow concepts to "slip". Need to perceive essential similarity in the face of superficial differences (analogy-making). • In short, need "active symbols": concepts with a dynamic activation (relevance) that can be activated by other active symbols, spread activation to their conceptual neighbors, and push to be instantiated in the perception of a situation.
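The "active symbols" idea can be sketched as spreading activation over a tiny concept network. Everything here (the decay and spreading rates, the example network) is an illustrative assumption, not Copycat's actual mechanism.

```python
def spread_activation(graph, activation, decay=0.9, rate=0.2, steps=5):
    """Each step, every concept's activation decays, and a fraction of
    it spreads evenly to its conceptual neighbors.

    graph: {concept: [neighbor concepts]}; activation: {concept: float}.
    """
    for _ in range(steps):
        incoming = {c: 0.0 for c in graph}
        for c, neighbors in graph.items():
            share = rate * activation[c]
            for n in neighbors:
                incoming[n] += share / max(len(neighbors), 1)
        for c in graph:
            activation[c] = min(1.0, decay * activation[c] + incoming[c])
    return activation

# Noticing a leash activates "leash"; activation then spreads to "dog"
# and "dog walking", making those concepts more likely to be explored.
net = {"leash": ["dog", "dog walking"],
       "dog": ["leash", "dog walking"],
       "dog walking": ["leash", "dog"]}
act = spread_activation(net, {"leash": 1.0, "dog": 0.0, "dog walking": 0.0})
```

The point of the sketch: relevance is dynamic. Concepts become active because related concepts are active, which is what lets higher-level knowledge steer which low-level features get explored.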
Active Symbol Architectures (Hofstadter et al.)

[Diagram: a concept network, a workspace acted on by "top-down" and "bottom-up" perceptual agents (codelets), and a temperature variable.]
Architecture of Copycat

[Diagram: a concept network (the Slipnet); a workspace containing the letter strings "a b c ---> a b d" and "i i j j k k --> ?"; perceptual and structure-building agents (codelets); and a temperature variable.]
Idealizing analogy-making

abc ---> abd
ijk ---> ?

Possible answers:
ijl (replace rightmost letter by successor)
ijd (replace rightmost letter by 'd')
ijk (replace all 'c's by 'd's)
abd (replace any string by 'abd')
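The contrast between an abstract rule and a literal one can be made concrete in code. This is a hypothetical sketch of two candidate rules, not Copycat's implementation (and it ignores edge cases such as the successor of 'z'):

```python
def successor(ch):
    # Next letter in the alphabet; 'z' has no successor in Copycat,
    # which this sketch deliberately ignores.
    return chr(ord(ch) + 1)

def rule_successor(s):
    """Abstract rule: replace the rightmost letter by its successor."""
    return s[:-1] + successor(s[-1])

def rule_literal_d(s):
    """Literal rule: replace the rightmost letter by 'd'."""
    return s[:-1] + "d"
```

Both rules account for abc ---> abd, yet on ijk they diverge: rule_successor gives ijl while rule_literal_d gives ijd. Which answer is "better" depends on which description of the original change you find more fitting, which is exactly the judgment analogy-making requires.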
Idealizing analogy-making

abc ---> abd
iijjkk ---> ?

Possible answers:
iijjkl (replace rightmost letter by successor)
iijjll (replace rightmost "letter" by successor)
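The iijjkl/iijjll split hinges on letting the concept "letter" slip to "group of letters". A hypothetical sketch of the slipped rule, which treats runs of identical letters as single "letters":

```python
from itertools import groupby

def rule_successor_group(s):
    """Replace the rightmost group of identical letters by its
    successor group of the same length ("letter" slipped to "group")."""
    groups = ["".join(g) for _, g in groupby(s)]
    last = groups[-1]
    groups[-1] = chr(ord(last[0]) + 1) * len(last)
    return "".join(groups)
```

On a string with no repeats, such as abc, the slipped rule coincides with the plain one (abc ---> abd); on iijjkk the slippage is what yields iijjll rather than iijjkl.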
Idealizing analogy-making

abc ---> abd
kji ---> ?

Possible answers:
kjj (replace rightmost letter by successor)
lji (replace "rightmost" letter by successor)
kjh (replace rightmost letter by "successor")