Object-Graphs for Context-Aware Visual Category Discovery

Object-Graphs for Context-Aware Visual Category Discovery Cheng-Ming Chiang Advisor: Sheng-Jyh Wang 2012/7/9 Reference: Y. J. Lee and K. Grauman, "Object-Graphs for Context-Aware Visual Category Discovery," PAMI, 2012.

Outline • Introduction • Related Work • Approach • Results • Conclusion and Future Work • Reference

Introduction • How can unfamiliar objects be discovered in unlabeled images? • Unsupervised visual category discovery • Existing unsupervised techniques usually rely on appearance alone to detect visual themes, so they may suffer from • Occluded objects • Large intra-category variations • Low-resolution data

Introduction • A new idea: How could visual discovery benefit from familiar objects? • Model the interaction between a set of detected known categories and the unknown to-be-discovered categories • Object-level context cues + Appearance descriptors • Introduce a novel object-graph descriptor to encode the 2D and 3D spatial layout

Related Work • State-of-the-art discovery method, appearance alone • B. C. Russell, W. T. Freeman, A. A. Efros, J. Sivic, and A. Zisserman, "Using Multiple Segmentations to Discover Objects and their Extent in Image Collections," CVPR, 2006

Related Work • Is it possible to learn visual object classes and their segmentations simply from looking at images? • Challenges: • How to recognize visually similar objects? • How to segment them from their background? • In fact, both object recognition and image segmentation can be thought of as parts of one large grouping problem • Projecting groups onto a particular image gives segmentation • Projecting groups onto the image index gives recognition

Related Work • Algorithm: given a large, unlabeled collection of images • For each image in the collection, compute multiple candidate segmentations • For each segment in each segmentation, compute a histogram of “visual words” • Perform topic discovery on the set of all segments in the image collection (using Latent Dirichlet Allocation) • For each discovered topic, sort all segments by how well they are explained by this topic

Related Work • Generating multiple segmentations • Produce enough segmentations to have a high chance of obtaining “good” segments that contain potential objects • Obtaining visual words • SIFT descriptors are computed for each image and quantized into 2000 visual words • Each image segment is represented by a histogram of the visual words contained within the segment
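The visual-word histogram step can be sketched in a few lines. This is only an illustration under stated assumptions: the descriptors are toy 2-D tuples rather than SIFT vectors, the codebook has 3 entries rather than 2000, and the names `nearest_word` and `segment_histogram` are hypothetical helpers, not functions from Russell et al.

```python
def nearest_word(descriptor, codebook):
    """Return the index of the codebook entry closest to the descriptor
    (squared Euclidean distance)."""
    best, best_d = 0, float("inf")
    for k, word in enumerate(codebook):
        d = sum((a - b) ** 2 for a, b in zip(descriptor, word))
        if d < best_d:
            best, best_d = k, d
    return best


def segment_histogram(descriptors, codebook):
    """Count how often each visual word occurs among the descriptors
    that fall inside one segment."""
    hist = [0] * len(codebook)
    for desc in descriptors:
        hist[nearest_word(desc, codebook)] += 1
    return hist


# Toy data (assumed): three "visual words" and four descriptors in one segment.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (0.0, 0.8)]
hist = segment_histogram(descriptors, codebook)
print(hist)
```

One such histogram per segment is what the topic-discovery step would take as its "document".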

Related Work • The topic discovery models • To analyze the collection of segments and discover ‘topics’ • Sorting the soup of segments • Find good segments within each topic

Outline: Approach • Identifying unknown objects • Object graphs: modeling the topology of category predictions • Three-dimensional object graphs • Category discovery amid familiar objects

Approach • Goal • Discover categories in unlabeled image collections using appearance and object-level semantic context cues • Generate multiple segmentations for each image and classify each region as known or unknown • Model the contextual information surrounding each unknown region with an object graph • Group the unknown regions based on their appearance similarity and their relationship to the surrounding known regions

Identifying Unknown Objects • Predict which regions are likely instances of the previously learned categories • Learn classifiers for N known categories • Generate multiple segmentations per image • Given a region s, compute the posterior for each class • A “known” object will have a single peak among its posteriors ⇒ lower entropy • An “unknown” object will have multiple peaks among its posteriors ⇒ higher entropy
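The entropy argument above can be checked numerically. The sketch below assumes Shannon entropy normalized by log N so that values lie in [0, 1]; the slide does not state the normalization, and the function name `posterior_entropy` and the toy posteriors are illustrative only.

```python
import math


def posterior_entropy(posteriors):
    """Shannon entropy of a class-posterior vector, divided by log(N)
    so that a uniform posterior scores 1 and a one-hot posterior scores 0.
    The normalization is an assumption of this sketch."""
    n = len(posteriors)
    h = -sum(p * math.log(p) for p in posteriors if p > 0)
    return h / math.log(n)


# Toy posteriors over N = 4 known categories (assumed values).
known_like = [0.94, 0.02, 0.02, 0.02]    # one clear peak
unknown_like = [0.25, 0.25, 0.25, 0.25]  # no clear peak

e_known = posterior_entropy(known_like)
e_unknown = posterior_entropy(unknown_like)
print(e_known < e_unknown)
```

A region whose posterior mass sits on one class gets a low score, while a region the classifiers cannot agree on gets a score near 1.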

Identifying Unknown Objects • Select a cutoff threshold equal to the midpoint of the entropy range • Lighter/darker colors indicate higher/lower entropy
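A minimal sketch of the midpoint cutoff follows. It assumes that "entropy range" means the observed minimum and maximum over the regions at hand; if the theoretical range of a normalized entropy is meant instead, the cutoff is simply 0.5. The helper name `split_known_unknown` and the entropy values are hypothetical.

```python
def split_known_unknown(entropies):
    """Split region indices at the midpoint of the observed entropy range.
    Regions below the cutoff are treated as known, the rest as unknown."""
    lo, hi = min(entropies), max(entropies)
    cutoff = (lo + hi) / 2.0
    known = [i for i, e in enumerate(entropies) if e < cutoff]
    unknown = [i for i, e in enumerate(entropies) if e >= cutoff]
    return cutoff, known, unknown


# Toy per-region entropies (assumed values).
entropies = [0.10, 0.85, 0.30, 0.90, 0.20]
cutoff, known, unknown = split_known_unknown(entropies)
print(cutoff, known, unknown)
```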

Object Graphs: Modeling the Topology of Category Predictions • Model the contextual information surrounding each unknown region as a graph • Regions with similar surrounding context will have similar graphs • Generate superpixels for each image, excluding the unknown region • Roughly 50 superpixels per image

Object Graphs: Modeling the Topology of Category Predictions • From stage 1, we have the posteriors for each segment • Then map the per-region posteriors to per-pixel posteriors • Compute the posteriors for each superpixel region
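The region-to-pixel-to-superpixel mapping could look roughly like this. The slide does not say how overlapping regions from different segmentations are combined, so plain averaging is an assumption here, as are the flat pixel indexing and the function names.

```python
def pixel_posteriors(segmentations, region_posteriors, n_pixels, n_classes):
    """At each pixel, average the posteriors of every region covering it.

    segmentations[m][p]     -> region id of pixel p in segmentation m
    region_posteriors[m][r] -> class posteriors of region r in segmentation m
    """
    out = []
    for p in range(n_pixels):
        acc = [0.0] * n_classes
        for seg, post in zip(segmentations, region_posteriors):
            for c, v in enumerate(post[seg[p]]):
                acc[c] += v
        out.append([v / len(segmentations) for v in acc])
    return out


def superpixel_posteriors(superpixels, pix_post, n_classes):
    """Average the per-pixel posteriors inside each superpixel."""
    sums, counts = {}, {}
    for p, sp in enumerate(superpixels):
        acc = sums.setdefault(sp, [0.0] * n_classes)
        for c, v in enumerate(pix_post[p]):
            acc[c] += v
        counts[sp] = counts.get(sp, 0) + 1
    return {sp: [v / counts[sp] for v in acc] for sp, acc in sums.items()}


# Toy image (assumed): 4 pixels, 2 segmentations, 2 classes, 2 superpixels.
segmentations = [[0, 0, 1, 1], [0, 1, 1, 1]]
region_posteriors = [
    {0: [1.0, 0.0], 1: [0.0, 1.0]},
    {0: [1.0, 0.0], 1: [0.5, 0.5]},
]
superpixels = [0, 0, 1, 1]

pix = pixel_posteriors(segmentations, region_posteriors, 4, 2)
sp_post = superpixel_posteriors(superpixels, pix, 2)
print(sp_post)
```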

Object Graphs: Modeling the Topology of Category Predictions • For each unknown segment s, compute a series of histograms using the posteriors computed within its neighboring superpixels • Each histogram records the posteriors within s’s r spatially nearest segments (r = 1, …, R), for each of two orientations: above and below the segment

Object Graphs: Modeling the Topology of Category Predictions • Concatenate the component histograms for r = 1, …, R to produce the final object-graph descriptor • R = 20 is used in the example • The result is a single fixed-length vector
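The descriptor construction described on the last two slides can be sketched as follows. Several details are assumptions of this sketch, not statements from the slides: superpixels are represented by their centroids, "above" and "below" are decided by comparing centroid y-coordinates (image y grows downward), posteriors of the r nearest superpixels are averaged, and a side with fewer than r superpixels uses all it has. Under these assumptions the vector has 2 · N · R entries.

```python
def object_graph(seg_center, superpixels, R):
    """Build an object-graph-style descriptor for one unknown segment.

    seg_center  -> (x, y) centroid of the unknown segment
    superpixels -> list of ((x, y), posterior) pairs for known surroundings
    R           -> number of neighborhood sizes r = 1..R
    """
    sx, sy = seg_center

    def dist2(item):
        (x, y), _ = item
        return (x - sx) ** 2 + (y - sy) ** 2

    # Split the surroundings by orientation and order each side by proximity.
    above = sorted((sp for sp in superpixels if sp[0][1] < sy), key=dist2)
    below = sorted((sp for sp in superpixels if sp[0][1] >= sy), key=dist2)
    n_classes = len(superpixels[0][1])

    descriptor = []
    for r in range(1, R + 1):
        for side in (above, below):
            nearest = side[:r]
            hist = [0.0] * n_classes
            for _, post in nearest:
                for c, v in enumerate(post):
                    hist[c] += v
            if nearest:
                hist = [v / len(nearest) for v in hist]
            descriptor.extend(hist)
    return descriptor


# Toy scene (assumed): two superpixels above the segment, one below, N = 2.
superpixels = [((5, 2), [1.0, 0.0]), ((5, 0), [0.0, 1.0]), ((5, 8), [0.0, 1.0])]
g = object_graph((5, 5), superpixels, R=2)
print(g)
```

Two unknown regions with similar surroundings produce similar vectors, which is what makes the descriptor usable for grouping.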

Object Graphs: Modeling the Topology of Category Predictions Similar object graphs for the unknown regions

Three-Dimensional Object Graphs • Is the 2D object graph a reliable descriptor? • Relationship between a car and the road • Introduce a 3D variant of the object graph • Use depth information to estimate the proximity and relative orientations of surrounding familiar objects • Use regions rather than superpixels as the 3D object-graph nodes • Employ the method of Hoiem et al. to estimate depth • D. Hoiem, A. N. Stein, A. A. Efros, and M. Hebert, “Recovering Occlusion Boundaries from a Single Image,” ICCV, 2007.

Three-Dimensional Object Graphs More robust to camera pose variations

Category Discovery amid Familiar Objects • Combine object-level context with region-based appearance to form groups from unknown regions • Object-level context: 2D or 3D object-graph descriptors • Appearance descriptors: • Texton Histograms (TH): edge filters + Gaussian filter + Laplacian-of-Gaussian filter • Color Histograms (CH): Lab color space • Pyramid HOG (pHOG): three pyramid levels with eight bins

Category Discovery amid Familiar Objects • Similarity measure • Compute the affinities between all pairs of unknown regions to generate an affinity matrix • Use spectral clustering to group the regions • The affinity is built from a kernel function defined on two histogram inputs
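The slide's kernel formula is not preserved here, so the sketch below substitutes a commonly used choice for histograms: an exponentiated chi-square kernel, averaged over the object-graph and appearance descriptors. Both the kernel form and the equal weighting are assumptions of this example, and `chi2_kernel` and `affinity_matrix` are hypothetical names.

```python
import math


def chi2_kernel(h1, h2):
    """exp(-0.5 * chi-square distance) between two non-negative histograms.
    Assumed kernel form; empty bin pairs are skipped to avoid 0/0."""
    d = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:
            d += (a - b) ** 2 / (a + b)
    return math.exp(-0.5 * d)


def affinity_matrix(object_graphs, appearances):
    """Pairwise affinities: mean of the context kernel and the appearance kernel."""
    n = len(object_graphs)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            k_ctx = chi2_kernel(object_graphs[i], object_graphs[j])
            k_app = chi2_kernel(appearances[i], appearances[j])
            K[i][j] = 0.5 * (k_ctx + k_app)
    return K


# Toy descriptors (assumed): regions 0 and 1 share context, region 2 does not.
object_graphs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
appearances = [[0.5, 0.5], [0.4, 0.6], [0.5, 0.5]]
K = affinity_matrix(object_graphs, appearances)
print(K[0][1] > K[0][2])
```

The resulting matrix is symmetric with ones on the diagonal, so it could be handed to any spectral clustering routine that accepts a precomputed affinity.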

Algorithm Summary • Offline training • Unlabeled novel images as the input

Algorithm Summary • Generate multiple segmentations • Compute the posteriors and classify each segment as either known or unknown

Algorithm Summary • Generate superpixel regions and compute the posteriors

Algorithm Summary • Build an object-graph descriptor for each unknown region

Algorithm Summary • Compute affinities between all pairs of unknown regions • Cluster using those affinities to group the objects

Outline: Results • Unsupervised discovery accuracy • Comparison to the state of the art • Discovered categories: qualitative results

Unsupervised Discovery Accuracy • Appearance + object graph vs. appearance alone • Evaluated with different sets of known objects • As the number of unknown categories increases, the accuracy of the object-graph decreases

Unsupervised Discovery Accuracy • Greater improvement for categories with high appearance variance

Comparison to the State of the Art • Compare to "Using Multiple Segmentations to Discover Objects and their Extent in Image Collections" • Use a bag-of-features representation with SIFT features

Reference [1] Y. J. Lee and K. Grauman, "Object-Graphs for Context-Aware Visual Category Discovery," PAMI, 2012. [2] D. Hoiem, A. N. Stein, A. A. Efros, and M. Hebert, "Recovering Occlusion Boundaries from a Single Image," ICCV, 2007. [3] B. C. Russell, W. T. Freeman, A. A. Efros, J. Sivic, and A. Zisserman, "Using Multiple Segmentations to Discover Objects and their Extent in Image Collections," CVPR, 2006. [4] http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html