Visual Dictionaries. George Papandreou, Toyota Technological Institute at Chicago (http://ttic.uchicago.edu/~gpapan). CVPR 2014 Tutorial on BASIS.
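Additive Image Patch Modeling • The patch-based image modeling approach. • How to span the space of all 8x8 image patches? • Each patch $x \in \mathbb{R}^D$ is approximated as a weighted sum of K dictionary atoms: $x \approx \sum_{k=1}^{K} \alpha_k w_k = W\alpha$, with $W = [w_1, \dots, w_K] \in \mathbb{R}^{D \times K}$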
Two Modeling Goals Image reconstruction • Use dictionary to build image prior • Tasks: Compression, denoising, deblurring, inpainting,… Image interpretation • Use dictionary for feature extraction • Tasks: Classification, recognition,…
Three Modeling Regimes • Two inter-related properties: • How big is the dictionary? (over-completeness) • How many non-zero components? (sparsity) • The three regimes: clustering, PCA, sparse coding
Where Does the Dictionary Come From? (1) • The dictionary is fixed, e.g., a basis or a union of bases • Examples: DCT (JPEG image compression), wavelets
Where Does the Dictionary Come From? (2) • Learn a generic dictionary from a collection of images • Many algorithms possible (see later)
Where Does the Dictionary Come From? (3) • Learn an image-specific (image-adapted) dictionary • Many algorithms possible (see later)
Where Does the Dictionary Come From? (4) • Non-parametric: the dictionary is the set of all overlapping image patches (from one or many images) • Examples: non-local means, patch transform, etc.
Beyond Bases: Hierarchical Dictionaries (1) Multi-scale image modeling • Apply same dictionary to image at different scales • Gaussian+Laplacian pyramids, wavelets, … (2) Recursive hierarchical models • Build recursive dictionaries • Deep learning
Key Problems • Coding: find the expansion coefficients given the dictionary • Dictionary learning: given data, learn a dictionary • Hierarchical modeling
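Image Coding Problem: Least Squares • Least-squares criterion with Tikhonov regularization: $\min_\alpha \tfrac{1}{2}\|x - W\alpha\|_2^2 + \tfrac{\lambda}{2}\|\alpha\|_2^2$, equivalently MAP estimation under a Gaussian prior on $\alpha$ • Solution (Tikhonov regularization, Wiener filtering): $\alpha = (W^\top W + \lambda I)^{-1} W^\top x = V^\top x$, with $V = W (W^\top W + \lambda I)^{-1}$ • Columns of V are the dual filters (dual dictionary). • Fast processing (inner products). Yields a dense code.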
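A minimal NumPy sketch of the regularized least-squares coder, assuming atoms are stored as the columns of a D x K array `W` and λ > 0 (the function name `ls_code` is hypothetical). It forms the dual dictionary V explicitly, so coding a patch reduces to K inner products:

```python
import numpy as np

def ls_code(W, x, lam=0.1):
    """Code x by argmin_a 0.5*||x - W a||^2 + 0.5*lam*||a||^2 (hypothetical helper)."""
    K = W.shape[1]
    G = W.T @ W + lam * np.eye(K)          # regularized Gram matrix, invertible for lam > 0
    # Dual dictionary V = W G^{-1} (G is symmetric); its columns are the dual filters.
    V = np.linalg.solve(G, W.T).T
    # Coding is a set of inner products with the dual filters: a = V^T x.
    return V.T @ x, V

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 100))   # D = 64 (8x8 patches), K = 100 atoms
x = rng.standard_normal(64)
a, V = ls_code(W, x)
print(a.shape, V.shape)              # (100,) (64, 100)
```

Since V depends only on W and λ, it can be computed once and applied to every patch; nothing in the solution drives coefficients to exactly zero, which is why the code is dense.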
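Image Coding Problem: Vector Quantization • Equivalent formulations: $\min_\alpha \|x - W\alpha\|_2^2$ s.t. $\alpha \in \{e_1, \dots, e_K\}$ (one-hot code), i.e., $\min_k \|x - w_k\|_2^2$ • Solution: $k^* = \arg\max_k \left( w_k^\top x - \tfrac{1}{2}\|w_k\|_2^2 \right)$ • Exact, O(DK): one inner product per dictionary atom • Approximate, O(D log K): approximate nearest-neighbor (ANN) search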
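The exact O(DK) search can be written as a single matrix-vector product. A minimal sketch, assuming atoms are the columns of `W` (the function name `vq_code` is hypothetical):

```python
import numpy as np

def vq_code(W, x):
    """Exact VQ: return the index of the closest atom (column of W) and its one-hot code."""
    # ||x - w_k||^2 = ||x||^2 - 2 w_k^T x + ||w_k||^2, so the nearest atom maximizes
    # w_k^T x - 0.5 ||w_k||^2: one inner product per atom, O(DK) in total.
    scores = W.T @ x - 0.5 * np.sum(W ** 2, axis=0)
    k = int(np.argmax(scores))
    alpha = np.zeros(W.shape[1])
    alpha[k] = 1.0
    return k, alpha

W = np.array([[1.0, 0.0, -1.0],
              [0.0, 1.0,  0.0]])    # three 2-D atoms as columns
k, alpha = vq_code(W, np.array([0.9, 0.2]))
print(k, alpha)                     # 0 [1. 0. 0.]
```

The term $\tfrac{1}{2}\|w_k\|_2^2$ can be precomputed once per dictionary, so the per-patch cost is dominated by `W.T @ x`.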
Sparse Coding Problem • Assume only L non-zero coefficients: $\min_\alpha \|x - W\alpha\|_2^2$ s.t. $\|\alpha\|_0 \le L$ • This is a much harder combinatorial problem: in the worst case there are $\binom{K}{L}$ possible active sets. • If we knew the active set of coefficients, it would be a least-squares problem. • Two very effective families of approximate algorithms: • Greedy algorithms • Relaxation algorithms
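To make the combinatorial structure concrete, here is a brute-force sketch (hypothetical function name, usable only for tiny K) that enumerates every active set of size L and solves a least-squares problem on each. Supports smaller than L are covered because least squares may set coefficients to zero:

```python
import itertools
import numpy as np

def exhaustive_sparse_code(W, x, L):
    """Exact solution of min ||x - W a||^2 s.t. ||a||_0 <= L by enumerating active sets.

    Only feasible for tiny K: the loop visits C(K, L) supports.
    """
    K = W.shape[1]
    best_err, best_alpha = np.inf, np.zeros(K)
    for support in itertools.combinations(range(K), L):
        cols = list(support)
        # With the active set fixed, the problem is ordinary least squares.
        coef, *_ = np.linalg.lstsq(W[:, cols], x, rcond=None)
        err = float(np.sum((x - W[:, cols] @ coef) ** 2))
        if err < best_err:
            best_err = err
            best_alpha = np.zeros(K)
            best_alpha[cols] = coef
    return best_alpha, best_err

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 12))
x = 2.0 * W[:, 3] - 1.0 * W[:, 7]        # exactly 2-sparse in W
alpha, err = exhaustive_sparse_code(W, x, L=2)
print(np.flatnonzero(alpha), err)
```

Already for K = 256 and L = 8 the number of supports exceeds $10^{14}$, which is what motivates the greedy and relaxation families.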
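Greedy Sparse Coding: Matching Pursuit • Greedily add T terms one by one • Algorithm (basic matching pursuit, unit-norm atoms): 1. Initialize the residual r = x and α = 0 2. Find the atom that best explains the residual: $k^* = \arg\max_k |w_k^\top r|$ 3. Update the coefficient and the residual: $\alpha_{k^*} \leftarrow \alpha_{k^*} + w_{k^*}^\top r$, $r \leftarrow r - (w_{k^*}^\top r)\, w_{k^*}$ 4. Return if the stopping criterion is met, otherwise go to 2. • Step 2 is a VQ problem at each iteration • Many variants (e.g., orthogonal matching pursuit, OMP). Efficient implementations. Mallat (2009) SPAMS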
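Steps 1 to 4 translate almost line by line into NumPy. A minimal sketch with a hypothetical function name, assuming unit-norm atoms and using an iteration budget T plus a relative residual tolerance as the stopping criterion (OMP would instead re-fit all selected coefficients by least squares after each step):

```python
import numpy as np

def matching_pursuit(W, x, T, tol=1e-6):
    """Basic matching pursuit with at most T iterations; assumes unit-norm columns in W."""
    alpha = np.zeros(W.shape[1])
    r = np.array(x, dtype=float)                 # 1. initialize the residual r = x (a copy)
    x_norm = np.linalg.norm(r)
    for _ in range(T):
        corr = W.T @ r
        k = int(np.argmax(np.abs(corr)))         # 2. atom that best explains the residual
        alpha[k] += corr[k]                      # 3. update the coefficient (an atom may recur)
        r -= corr[k] * W[:, k]                   #    and the residual
        if np.linalg.norm(r) <= tol * x_norm:    # 4. stop early once the residual is tiny
            break
    return alpha, r

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 256))
W /= np.linalg.norm(W, axis=0)                   # normalize atoms
x = rng.standard_normal(64)
alpha, r = matching_pursuit(W, x, T=10)
print(np.count_nonzero(alpha), np.linalg.norm(r) / np.linalg.norm(x))
```

By construction $x = W\alpha + r$ holds after every iteration, and each step removes $|w_{k^*}^\top r|^2$ from the squared residual norm.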
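Basic Matching Pursuit Convergence Analysis • Exponential convergence (recall VQ analysis): $\|r_T\|_2^2 \le (1 - \mu^2)^T \|x\|_2^2$ • Dictionary coherence: $\mu(W) = \inf_{x \neq 0} \max_k \frac{|w_k^\top x|}{\|x\|_2}$ • Note that $\mu > 0$ if $\{w_k\}$ spans $\mathbb{R}^D$. • Basic matching pursuit costs T times more than VQ.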
Relaxed Sparse Coding • Continuous relaxation of the combinatorial problem: $\min_\alpha \tfrac{1}{2}\|x - W\alpha\|_2^2 + \lambda\|\alpha\|_p^p$ • Prominent case: p = 1 (convex L1 optimization)
Basis Pursuit Coding • L1-penalized problem (a.k.a. basis pursuit, LASSO): $\min_\alpha \tfrac{1}{2}\|x - W\alpha\|_2^2 + \lambda\|\alpha\|_1$ • Global optimum (convex optimization) • Huge literature: • Algorithms for large-scale problems • Recovery guarantees: compressed sensing • Extensions: TV minimization, ADMM • Further extensions: • Re-weighted L1 • Non-convex relaxations: 0 < p < 1 SPAMS Mallat (2009), Elad (2010)
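The slide leaves the solver open. As one concrete example (chosen here for brevity, not the algorithm of any particular library), the iterative shrinkage-thresholding algorithm (ISTA) alternates a gradient step on the quadratic term with soft thresholding. A minimal sketch with hypothetical function names and a fixed iteration count:

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft thresholding, the proximal operator of t*||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(W, x, lam, n_iter=500):
    """ISTA for min_a 0.5*||x - W a||^2 + lam*||a||_1 (proximal gradient descent)."""
    step = 1.0 / np.linalg.norm(W, 2) ** 2       # 1 / Lipschitz constant of the gradient
    a = np.zeros(W.shape[1])
    for _ in range(n_iter):
        grad = W.T @ (W @ a - x)                 # gradient of the quadratic data term
        a = soft_threshold(a - step * grad, step * lam)
    return a

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 256))
W /= np.linalg.norm(W, axis=0)
x = rng.standard_normal(64)
a = ista(W, x, lam=0.5)
print(np.count_nonzero(a), "of", a.size, "coefficients are non-zero")
```

Larger λ yields sparser codes; accelerated variants (e.g. FISTA) or homotopy methods typically converge faster on large problems.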
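Thresholding Algorithms • Lp optimization with an orthonormal basis W • Decompose into a separable problem: the L2 norm is invariant to rotation, $\|x - W\alpha\|_2^2 = \|W^\top x - \alpha\|_2^2$, and the Lp norm is separable, so with $\beta = W^\top x$ the problem splits into K independent 1-D problems $\min_{\alpha_k} \tfrac{1}{2}(\beta_k - \alpha_k)^2 + \lambda|\alpha_k|^p$ • Look-up table for the 1-D optimization: • L0 / L1: hard/soft thresholding • L2: linear shrinkage Elad (2010)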
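The look-up table can be written down in closed form. A minimal sketch (hypothetical function names), assuming the 1-D objective $\tfrac{1}{2}(\beta - a)^2 + \lambda|a|^p$ with $|a|^0$ read as the indicator of $a \neq 0$; under this scaling the hard threshold is $\sqrt{2\lambda}$, the soft threshold is $\lambda$, and linear shrinkage divides by $1 + 2\lambda$:

```python
import numpy as np

def shrink(beta, lam, p):
    """Elementwise minimizer of 0.5*(beta - a)^2 + lam*|a|^p for p in {0, 1, 2}.

    For p = 0, |a|^0 is read as the indicator [a != 0] (the L0 "norm").
    """
    if p == 0:   # hard thresholding: keeping a = beta costs lam, zeroing it costs 0.5*beta^2
        return np.where(np.abs(beta) > np.sqrt(2.0 * lam), beta, 0.0)
    if p == 1:   # soft thresholding
        return np.sign(beta) * np.maximum(np.abs(beta) - lam, 0.0)
    if p == 2:   # linear shrinkage: setting the derivative (a - beta) + 2*lam*a to zero
        return beta / (1.0 + 2.0 * lam)
    raise ValueError("p must be 0, 1 or 2")

def orthonormal_code(Q, x, lam, p):
    """Exact Lp coding in an orthonormal basis Q: rotate, then shrink each coefficient."""
    return shrink(Q.T @ x, lam, p)

beta = np.array([3.0, -0.5, 1.2])
for p in (0, 1, 2):
    print(p, shrink(beta, lam=1.0, p=p))
```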
Recap: (Sparse) Coding • Problem: • Find the expansion coefficients given the dictionary • Exact methods: • p = 2 (Fourier, PCA, etc.): linear system • p = 0 with a single non-zero coefficient equal to 1 (VQ): fast search • Orthonormal dictionary: separable 1-D optimization • Approximate methods for sparse coding: • p = 0: greedy matching pursuit • p = 1: convex relaxation
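Dictionary Learning • Find a dictionary W that best fits a dataset $\{x_n\}_{n=1}^{N}$: $\min_{W, \{\alpha_n\}} \sum_n \tfrac{1}{2}\|x_n - W\alpha_n\|_2^2 + \lambda\|\alpha_n\|_p^p$ (atoms typically constrained to unit norm) • Exact solution for the L2 norm via the SVD (PCA) • For sparse norms this is a hard non-convex problem, even if the coding problem is convex • Main approach: alternating minimization • Recent advances in theory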
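For the L2 case the optimum is given by the truncated SVD (Eckart-Young theorem). A minimal sketch, assuming patches are stored as the columns of X and mean-subtracted as in standard PCA (the function name is hypothetical):

```python
import numpy as np

def pca_dictionary(X, K):
    """L2-optimal K-atom dictionary for the patches in the columns of X (D x N).

    Follows the usual PCA convention of subtracting the mean patch first.
    """
    mu = X.mean(axis=1, keepdims=True)
    U, s, _ = np.linalg.svd(X - mu, full_matrices=False)
    W = U[:, :K]                       # top-K left singular vectors, orthonormal columns
    A = W.T @ (X - mu)                 # codes are plain inner products
    return W, A, mu

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 1000))    # stand-in for 1000 vectorized 8x8 patches
W, A, mu = pca_dictionary(X, K=16)
err = np.sum((X - mu - W @ A) ** 2)
print(W.shape, A.shape, err)
```

The residual equals the sum of the squared discarded singular values, so no other rank-K dictionary can do better in the L2 sense.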
Alternating Minimization Methods • Update codes $\alpha_n$ given dictionary W • Use any greedy/relaxation sparse coding algorithm • Update dictionary W given codes: least squares, $W = X A^\top (A A^\top)^{-1}$ with $X = [x_1, \dots, x_N]$ and $A = [\alpha_1, \dots, \alpha_N]$ • Method converges to a local minimum • K-SVD: updates the dictionary atoms sequentially, one at a time • Online version much faster for large datasets Olshausen & Field (1996); Engan+ (1999); Aharon+ (2006); Mairal+ (2010)
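A minimal sketch of the alternating scheme, assuming OMP as the coding step and the batch least-squares dictionary update followed by atom renormalization (MOD-style, as in Engan+); K-SVD would instead update atoms one at a time. All function names are hypothetical:

```python
import numpy as np

def omp(W, x, L):
    """Orthogonal matching pursuit: greedy support selection plus least-squares refit."""
    support, coef, r = [], np.zeros(0), np.array(x, dtype=float)
    for _ in range(L):
        k = int(np.argmax(np.abs(W.T @ r)))
        if k in support:                       # residual already orthogonal to selected atoms
            break
        support.append(k)
        coef, *_ = np.linalg.lstsq(W[:, support], x, rcond=None)
        r = x - W[:, support] @ coef
    a = np.zeros(W.shape[1])
    a[support] = coef
    return a

def learn_dictionary(X, K, L, n_iter=20, seed=0):
    """Alternating minimization: OMP coding step + least-squares (MOD-style) dictionary step."""
    rng = np.random.default_rng(seed)
    N = X.shape[1]
    W = X[:, rng.choice(N, K, replace=False)].astype(float)   # initialize atoms from data
    W /= np.linalg.norm(W, axis=0) + 1e-12
    for _ in range(n_iter):
        A = np.column_stack([omp(W, X[:, n], L) for n in range(N)])   # code step
        # Dictionary step: W = argmin ||X - W A||_F^2, solved as A^T W^T = X^T.
        W_new = np.linalg.lstsq(A.T, X.T, rcond=None)[0].T
        norms = np.linalg.norm(W_new, axis=0)
        used = norms > 1e-12                   # atoms that no patch uses keep their old value
        W[:, used] = W_new[:, used] / norms[used]
    A = np.column_stack([omp(W, X[:, n], L) for n in range(N)])       # codes for the final W
    return W, A

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 500))             # stand-in for vectorized 8x8 patches
W, A = learn_dictionary(X, K=128, L=4, n_iter=5)
print(np.linalg.norm(X - W @ A) / np.linalg.norm(X))
```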
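K-Means as a Dictionary Learning Method • Update codes given dictionary: $\alpha_n = e_{k_n}$, with $k_n = \arg\min_k \|x_n - w_k\|_2^2$ • Update dictionary given codes: each atom $w_k$ becomes the mean of the patches assigned to it • Special case of K-SVD using OMP-1 for coding • Extremely fast Aharon+ (2006); Coates, Lee, Ng (2011)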
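The two K-means steps in NumPy, as a minimal sketch with a hypothetical function name. It assumes plain Euclidean K-means with one-hot codes and initialization from random data patches (or from an explicit `W0`); empty clusters simply keep their previous atom:

```python
import numpy as np

def kmeans_dictionary(X, K, n_iter=50, seed=0, W0=None):
    """K-means dictionary learning on the columns of X (D x N).

    Code step: one-hot codes (nearest atom). Dictionary step: each atom becomes the mean
    of its assigned patches. W0 optionally fixes the initial atoms (D x K).
    """
    rng = np.random.default_rng(seed)
    N = X.shape[1]
    if W0 is None:
        W = X[:, rng.choice(N, K, replace=False)].astype(float)
    else:
        W = np.array(W0, dtype=float)
    labels = np.zeros(N, dtype=int)
    for _ in range(n_iter):
        # Squared distances between every atom and every patch, shape K x N.
        d2 = np.sum(W ** 2, axis=0)[:, None] - 2.0 * W.T @ X + np.sum(X ** 2, axis=0)[None, :]
        labels = np.argmin(d2, axis=0)          # code step: alpha_n = e_{labels[n]}
        for k in range(K):                      # dictionary step: cluster means
            members = X[:, labels == k]
            if members.shape[1] > 0:            # empty clusters keep their previous atom
                W[:, k] = members.mean(axis=1)
    return W, labels                            # labels come from the last code step

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 2000))             # stand-in for vectorized 8x8 patches
W, labels = kmeans_dictionary(X, K=64, n_iter=10)
print(W.shape, np.bincount(labels, minlength=64).min())
```

Each iteration costs one D x K by D x N matrix product plus K averages, which is what makes the method fast compared with iterative sparse coding.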
Learned Dictionaries [Figure: K-SVD dictionary learned on the Barbara image vs. generic K-SVD dictionary] Aharon+ (2006)
Learned Dictionaries [Figure: generic K-Means dictionary vs. generic K-SVD dictionary] Aharon+ (2006), Coates+ (2011), Papandreou+ (2014)
Image Denoising with Learned Dictionaries [Figure: noisy image, 22.1 dB; K-SVD denoised, 30.8 dB] Aharon+ (2006)
Image Inpainting with Learned Dictionaries • Joint dictionary learning and image inpainting Mairal+ (2010)
K-SVD vs K-Means Dictionaries in Denoising • Replace K-SVD with K-Means in the dictionary learning step of the denoising algorithm. • Noisy: 22.12 dB • K-Means (OMP-1): 32.25 dB, 22 sec • K-SVD (OMP-32): 32.43 dB, 84 sec
Recap: Dictionary Learning • Find a dictionary W that best fits a dataset • Non-convex problem • Greedy alternating optimization methods • The K-means algorithm is very fast and works well for small image patches
Image Patch Dictionaries in Visual Recognition • SIFT-based bag-of-words classification pipeline: SIFT patches → dictionary (>10K words) → classifier
Patch Dictionaries in Image Classification • Image classification without SIFT • Key insights: • K-means works well • Whitening is crucial • Using larger dictionaries boosts recognition rate • Encoding has a huge effect on performance • Promising results on CIFAR but not on large image datasets Varma, Zisserman (2003); Coates+ (2011)
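The slide does not say which whitening transform is meant; ZCA whitening is assumed in this minimal sketch (a common choice in this line of work). The function name and the regularizer `eps` value are illustrative assumptions:

```python
import numpy as np

def zca_whiten(X, eps=0.1):
    """ZCA whitening of patches stored as columns of X (D x N).

    eps regularizes small eigenvalues; its value here is an illustrative assumption.
    """
    mu = X.mean(axis=1, keepdims=True)
    Xc = X - mu
    C = Xc @ Xc.T / X.shape[1]                       # D x D patch covariance
    evals, evecs = np.linalg.eigh(C)
    M = evecs @ np.diag(1.0 / np.sqrt(evals + eps)) @ evecs.T   # symmetric whitening matrix
    return M @ Xc, M, mu

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 5000))                  # stand-in for vectorized 8x8 patches
X[1:] += 0.8 * X[:-1]                                # correlate neighbouring pixels
Z, M, mu = zca_whiten(X)
print(np.round(np.diag(Z @ Z.T / Z.shape[1])[:4], 3))
```

Unlike PCA whitening, ZCA rotates back to pixel coordinates, so whitened patches still look like (edge-enhanced) image patches; the same `M` and `mu` must be applied to test patches before encoding.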
Histograms of Sparse Codes for Object Detection • Key idea: build a HOG-like descriptor on top of a K-SVD-learned patch dictionary instead of gradients, then use it in a deformable part model (DPM) detector Ren, Ramanan (2013); Also see Dikmen, Hoiem, Huang (2012)
Hierarchical Modeling and Dictionary Learning • So far: Modeling the appearance of small image patches, say 8x8 pixels. • How about dictionaries of larger visual patterns? • Multiscale modeling • Work with image pyramids • Hierarchical modeling • Model higher order statistics of feature responses • Recursively compose complex visual patterns • Use unsupervised or supervised objectives
Hierarchical Models of Objects Fidler & Leonardis (2007); Zhu+ (2010)
Hierarchical Matching Pursuit (K-SVD) Bo, Ren, Fox (2013)
Deep Convolutional Networks LeCun+ (1998); Krizhevsky+ (2012)
Transformation Aware Dictionaries • How to span the space of all 8x8 image patches? [Figure: a patch as a weighted sum α1, α2, α3 of dictionary atoms]
Sources of Redundancy in Patch Dictionaries • Same pattern, different position • Same pattern, opposite polarity (x2 redundancy) • Same pattern, different contrast • How to build less redundant dictionaries?
The Epitome Data Structure [Figure: a patch and the epitome it is drawn from] Epitomes: Jojic, Frey, Kannan, ICCV-03
Generating Patches from an Epitome • A single epitome is essentially a large collection of translated copies of a visual pattern.
Epitomic Image Matching Epitomes: Jojic, Frey, Kannan, ICCV-03
Dictionary of Mini-Epitomes Papandreou, Chen, Yuille, CVPR-14
Coding and Learning with Epitomic Dictionaries • Patch coding in epitomic dictionaries: an epitomic dictionary is equivalent to a standard dictionary containing the patches at all possible positions in each epitome, $\{T_p e_k\}_{k,\,p}$, where $T_p$ extracts the patch at position p from epitome $e_k$ • Dictionary learning: • Variational inference on a GMM model (Jojic+ '01) • Sparse dictionary learning (Aharon, Elad '08; Mairal+ '11) • Epitomic K-Means (Papandreou+ '14)
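The equivalence can be made explicit by expanding each epitome into the flat dictionary of all its windows. A minimal brute-force sketch of hard (VQ-style) epitomic matching, assuming square epitomes and square patches; the function names are hypothetical, and sparse coding over the expanded dictionary would work the same way with a different coder:

```python
import numpy as np

def epitome_patches(E, p):
    """Flatten every p x p window of a square epitome E into a column (all positions)."""
    n = E.shape[0] - p + 1
    return np.stack([E[i:i + p, j:j + p].ravel() for i in range(n) for j in range(n)], axis=1)

def epitomic_match(epitomes, x, p):
    """Hard epitomic matching: closest (epitome index, position) for a flattened p x p patch x."""
    best_d2, best_k, best_pos = np.inf, None, None
    for k, E in enumerate(epitomes):
        P = epitome_patches(E, p)                    # equivalent standard dictionary for E
        d2 = np.sum((P - x[:, None]) ** 2, axis=0)
        idx = int(np.argmin(d2))
        if d2[idx] < best_d2:
            n = E.shape[0] - p + 1
            best_d2, best_k, best_pos = float(d2[idx]), k, divmod(idx, n)   # (row, col)
    return best_k, best_pos, best_d2

rng = np.random.default_rng(0)
epitomes = [rng.standard_normal((16, 16)) for _ in range(4)]   # 16x16 mini-epitomes
print(epitome_patches(epitomes[0], 8).shape)                   # (64, 81): 81 positions
x = epitomes[2][3:11, 5:13].ravel()                            # an 8x8 patch cut from epitome 2
print(epitomic_match(epitomes, x, 8))
```

With 16x16 mini-epitomes and 8x8 patches, each epitome stands for 81 dictionary atoms while storing only 256 pixels.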
K-Means for the Mini-Epitome Model • Generative model: • Select mini-epitome k with probability $\pi_k$ • Select position p within the epitome uniformly • Generate the patch • Epitomic K-means (hard EM): • Epitomic matching (hard assignment) • Epitome update • Diverse initialization with K-means++ (optional) Papandreou, Chen, Yuille, CVPR-14
K-Means for the Mini-Epitome Model • Generative model: 1. Select mini-epitome k with probability $\pi_k$ 2. Select position p within the epitome uniformly 3. Generate the patch • Maximum likelihood with hard EM: essentially an epitomic adaptation of K-Means. • Faster convergence by diversely initializing the mini-epitomes with an epitomic adaptation of K-Means++.
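Of the two hard-EM steps, the epitome update is the less familiar one. With hard assignments fixed, the squared reconstruction error is minimized by setting each epitome pixel to the mean of all assigned patch pixels that cover it. A minimal sketch, assuming square epitomes, flattened square patches and assignments given as (epitome, row, column) triples; all names are hypothetical, and the mixing weights and the K-means++ style initialization are omitted:

```python
import numpy as np

def update_epitomes(epitomes, patches, assignments, p):
    """Hard-EM epitome update for epitomic K-means.

    patches: N x p*p array of flattened patches; assignments: N tuples (k, i, j) giving the
    epitome index and top-left position chosen in the matching step.
    """
    sums = [np.zeros(E.shape) for E in epitomes]
    counts = [np.zeros(E.shape) for E in epitomes]
    for x, (k, i, j) in zip(patches, assignments):
        sums[k][i:i + p, j:j + p] += x.reshape(p, p)
        counts[k][i:i + p, j:j + p] += 1.0
    # Each covered pixel becomes the mean of the patch pixels that cover it;
    # pixels covered by no assigned patch keep their previous value.
    return [np.where(c > 0, s / np.maximum(c, 1.0), E) for E, s, c in zip(epitomes, sums, counts)]

E0 = [np.zeros((4, 4))]
patches = np.array([np.full(4, 1.0), np.full(4, 3.0)])   # two flattened 2x2 patches
new = update_epitomes(E0, patches, [(0, 0, 0), (0, 1, 1)], p=2)
print(new[0])
```

Alternating this update with epitomic matching gives the hard-EM loop; overlapping windows are what couple neighbouring positions within an epitome, which is the difference from plain K-means.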
A Generic Mini-Epitome Dictionary • Epitomic dictionary: 256 mini-epitomes (16x16) • Non-epitomic dictionary: 1024 elements (8x8) • Both trained on 10,000 Pascal images
Evaluation on Image Reconstruction [Figure: original image vs. epitome reconstruction, PSNR 29.2 dB] • Improvement over non-epitome