Group Sparse Coding
Samy Bengio, Fernando Pereira, Yoram Singer, Dennis Strelow (Google, Mountain View, CA), NIPS 2009
Presented by Miao Liu, July 23, 2010
*Figures and formulae are directly copied from the original paper
Outline • Introduction • Group Coding • Dictionary Learning • Results and Discussion
Introduction • Bag-of-words document representations • Encode each document as a vector of descriptor (word) counts (sketch below) • Widely used in text, image, and video processing • For text documents, a suitable word dictionary is easy to determine • For images and videos • There is no simple mapping from the raw document to descriptor counts • Visual descriptors (color, texture, angles, shapes) must be extracted • Descriptors must be measured at appropriate locations (regular grids, special interest points, multiple scales) • More careful design of the dictionary is needed
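A minimal illustration of the count-vector idea (toy data, not from the paper):

```python
# Bag-of-words sketch (hypothetical toy data): a document is represented by a
# vector of counts of dictionary entries ("words") occurring in it.
from collections import Counter

dictionary = ["cat", "dog", "car"]           # word dictionary (easy to fix for text)
document = ["cat", "car", "cat", "dog"]      # tokenized document

counts = Counter(document)
bow = [counts.get(word, 0) for word in dictionary]  # count vector over the dictionary
print(bow)  # [2, 1, 1]
```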
Dictionary Construction • Unsupervised vector quantization (VQ), often k-means clustering • Pro: maximally sparse per descriptor occurrence • Cons: • does not guarantee sparse coding of the whole image • not robust w.r.t. descriptor variability • Regularized optimization (sketch below) • Encode each visual descriptor as a weighted sum of dictionary elements • Mixed-norm regularizers • take into account the structure of bags of visual descriptors in images • and of sets of images representing a given category
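To make the contrast concrete, a small sketch (toy data; non-negative least squares stands in here for the paper's regularized encoder):

```python
# VQ hard assignment vs. encoding a descriptor as a weighted sum of dictionary
# elements (toy data; NNLS is only a stand-in for the paper's mixed-norm encoder).
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
D = rng.standard_normal((5, 8))     # 5 dictionary words, 8-dimensional descriptors
x = rng.standard_normal(8)          # one visual descriptor

# Vector quantization: assign x to its nearest dictionary word (a single nonzero count).
vq_code = np.zeros(len(D))
vq_code[np.argmin(np.linalg.norm(D - x, axis=1))] = 1.0

# Regularized encoding: reconstruct x as a non-negative weighted sum of dictionary words.
alpha, _ = nnls(D.T, x)             # weights over dictionary elements
print(vq_code)
print(alpha)
```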
Problem Statement • Main goal: encode groups of instances (e.g., image patches) in terms of dictionary code words (roughly, "average" patches) • Notation (summary below) • The m'th group of instances; the subscript m is dropped when operating on a single group • Sub-goals • Encoding: computing the reconstruction coefficients of a group with respect to a fixed dictionary • Learning a good dictionary from a set of training groups
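A notation summary, reconstructed from the paper's setup (the symbol names below are this summary's choice and may differ slightly from the original slides):

```latex
% Dictionary: D = \{d_1, \dots, d_{|D|}\}, each d_j a visual code word.
% m'th group: G_m = \{x_1, \dots, x_{|G_m|}\}, a set of descriptors (e.g., from one image).
% Coefficients: \alpha^i_j is the weight of code word d_j in the reconstruction of
% instance x_i; \alpha_j = (\alpha^1_j, \dots, \alpha^{|G|}_j) collects word j's
% weights across the whole group.
$$ x_i \approx \sum_{j=1}^{|D|} \alpha^i_j \, d_j , \qquad \alpha^i_j \ge 0 $$
```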
Group Coding • Given a group G and a dictionary D, group coding is achieved by solving a regularized reconstruction problem (objective sketched below), where • α_j is the vector collecting code word j's coefficients across all instances of the group, and • λ balances reconstruction fidelity and coding complexity. • Coordinate descent is applied to solve this problem. • Finally, the group is compressed into a single vector by taking the p-norm of each α_j.
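The group coding objective, sketched from the paper (constants and indexing may differ slightly from the original slide):

```latex
% Mixed-norm regularized group reconstruction
$$ Q(\alpha, G, D) \;=\; \frac{1}{2} \sum_{i \in G} \Big\| x_i - \sum_{j=1}^{|D|} \alpha^i_j \, d_j \Big\|_2^2
   \;+\; \lambda \sum_{j=1}^{|D|} \|\alpha_j\|_p , \qquad \alpha^i_j \ge 0 $$
```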
Group Coding • For coordinate descent, all code words but one are held fixed, and the residuals of the group with respect to the remaining words are defined (sketch below). • Optimum for p = 1: decouples across instances; each coefficient is obtained by a per-instance soft threshold. • Optimum for p = 2: a group (block) soft threshold; the whole vector α_j is either set to zero or shrunk toward zero.
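The coordinate updates have the standard (group) soft-thresholding form; the expressions below are a reconstruction and may differ in detail from the paper's exact formulas:

```latex
% Residual of instance i excluding word j, and its (non-negative) correlation with d_j
$$ r^i_j = x_i - \sum_{k \neq j} \alpha^i_k \, d_k , \qquad \mu^i_j = \big[\, d_j^\top r^i_j \,\big]_+ $$
% p = 1: per-instance soft threshold
$$ \alpha^i_j = \frac{\big[\, d_j^\top r^i_j - \lambda \,\big]_+}{\|d_j\|_2^2} $$
% p = 2: block soft threshold on \mu_j = (\mu^1_j, \dots, \mu^{|G|}_j)
$$ \alpha_j = \begin{cases} 0 & \text{if } \|\mu_j\|_2 \le \lambda \\[4pt]
   \dfrac{\|\mu_j\|_2 - \lambda}{\|d_j\|_2^2 \, \|\mu_j\|_2} \; \mu_j & \text{otherwise} \end{cases} $$
```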
Dictionary Learning • A good dictionary should balance • reconstruction error, • reconstruction complexity, and • the overall complexity of the dictionary relative to the given training set. • We seek a learning method that facilitates both • induction of new dictionary words, and • removal of dictionary words that have low predictive power. • This is achieved by adding a penalty on the dictionary words themselves to the sum of group coding objectives over the training groups (objective sketched below).
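The combined objective, sketched (the exact form of the dictionary penalty is an assumption of this summary):

```latex
% Sum of group coding costs plus a penalty that drives unused dictionary words to zero
$$ \min_{D, \, \{\alpha_m\}} \;\; \sum_{m} Q(\alpha_m, G_m, D) \;+\; \gamma \sum_{j=1}^{|D|} \|d_j\| ,
   \qquad \alpha \ge 0 $$
```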
Dictionary Learning • In this paper, p = 2 is used. • Auxiliary variables are defined from the current coefficients, together with the vector that appears in the gradient of the objective with respect to a single dictionary word. • By an argument similar to the one used for group coding, a closed-form update for each dictionary word is obtained: a word is shrunk toward zero, and dropped entirely when the penalty dominates (sketch below).
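A toy implementation of the alternating scheme (illustrative only, not the paper's code; variable names, iteration counts, and numerical details are this sketch's):

```python
# Alternating scheme: group coding with a fixed dictionary (block soft-threshold
# coordinate descent, p = 2), then a penalized update of each dictionary word.
import numpy as np

def encode_group(X, D, lam, n_iter=50):
    """X: (n_instances, dim) descriptors of one group; D: (n_words, dim) dictionary."""
    A = np.zeros((X.shape[0], D.shape[0]))            # A[i, j] = alpha^i_j
    for _ in range(n_iter):
        for j in range(D.shape[0]):
            R = X - A @ D + np.outer(A[:, j], D[j])   # residuals excluding word j
            mu = np.maximum(R @ D[j], 0.0)            # non-negative correlations with d_j
            norm_mu = np.linalg.norm(mu)
            if norm_mu <= lam:
                A[:, j] = 0.0                         # word j unused by this group
            else:
                A[:, j] = (norm_mu - lam) / (D[j] @ D[j] * norm_mu) * mu
    return A

def update_dictionary(groups, codes, D, gamma):
    """One pass of penalized dictionary updates; unused words shrink to zero."""
    for j in range(D.shape[0]):
        a = sum((A[:, j] ** 2).sum() for A in codes)
        b = sum(((X - A @ D + np.outer(A[:, j], D[j])) * A[:, [j]]).sum(axis=0)
                for X, A in zip(groups, codes))
        norm_b = np.linalg.norm(b)
        D[j] = 0.0 if norm_b <= gamma else (norm_b - gamma) / (a * norm_b) * b
    return D
```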
Experimental Setting • Compare with a previous sparse coding method by measuring the impact on classification on the PASCAL VOC (Visual Object Classes) 2007 dataset • Images from 20 classes, including people, animals, vehicles, and indoor objects • Around 2,500 images each for training and validation; about 5,000 images for testing • Local descriptors are extracted from Gabor wavelet responses (sketch below) at • four orientations, and • 27 (spatial scale, offset) combinations • The 27 (scale, offset) pairs were chosen by optimizing a previous image recognition task, unrelated to this paper.
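For illustration, a hypothetical descriptor-extraction sketch using scikit-image's Gabor filter (the actual orientations, scales, and offsets used in the paper are not reproduced here):

```python
# Gabor response maps at several orientations and frequencies (illustrative values).
import numpy as np
from skimage import data
from skimage.filters import gabor

image = data.camera().astype(float)                        # any grayscale image
orientations = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]    # four orientations (assumed values)
frequencies = [0.1, 0.2, 0.4]                              # stand-ins for the paper's scale/offset grid

responses = []
for theta in orientations:
    for freq in frequencies:
        real, imag = gabor(image, frequency=freq, theta=theta)
        responses.append(np.sqrt(real ** 2 + imag ** 2))   # magnitude response

# Sampling these response maps at grid locations would yield the local descriptors.
descriptors = np.stack(responses, axis=-1)                 # (H, W, n_filters)
print(descriptors.shape)
```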