Problems in large-scale computer vision David Crandall School of Informatics and Computing Indiana University
Research questions
• Given huge collections of images online, how can we analyze images and non-visual metadata to:
  • Help users organize, browse, search?
  • Mine information about the state of the world and human behavior?
Common computational problems
1. Image-by-image (e.g. recognition)
   • Large-scale, but easily parallelizable
2. Iterative algorithms (e.g. learning)
   • Sometimes few but long-running iterations
   • Sometimes many lightweight iterations
3. Inference on graphs (e.g. reconstruction, learning)
   • Small graphs with huge label spaces
   • Large graphs with small label spaces
   • Large graphs with large label spaces
Scene classification
• E.g.: find images containing snow in a collection of ~100 million images
• Typical approach: extract features and run a classifier (typically an SVM) on each image
• We use Hadoop with a trivial reducer, images stored in giant HDFS sequence files, and C++ Map-Reduce bindings
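The per-image classification step above is embarrassingly parallel: each mapper scores one image independently, and the reducer just collates output. The deck's actual pipeline uses C++ Map-Reduce bindings over HDFS sequence files; the sketch below is a simplified Hadoop-Streaming-style analogue in Python, with a hypothetical pre-trained linear SVM (the weights, input format, and label names are illustrative assumptions, not the real models).

```python
# Sketch of a streaming-style mapper for per-image scene classification.
# Assumption: each input record is "<image_id>\t<comma-separated features>",
# and the SVM parameters below are placeholders for a model learned offline.
WEIGHTS = [0.8, -0.3, 0.5]
BIAS = -0.2

def classify(features):
    """Linear SVM decision function: sign of w.x + b."""
    score = sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS
    return "snow" if score > 0 else "no_snow"

def mapper(lines):
    """Score each image record independently; the reducer is trivial."""
    for line in lines:
        image_id, feat_str = line.rstrip("\n").split("\t")
        features = [float(v) for v in feat_str.split(",")]
        yield image_id, classify(features)

if __name__ == "__main__":
    # A real deployment would read sys.stdin; a tiny in-memory demo here.
    demo = ["img001\t1.0,0.0,0.0", "img002\t0.0,1.0,0.0"]
    for image_id, label in mapper(demo):
        print(f"{image_id}\t{label}")
```

Because the map stage does all the work, the same pattern scales to ~100 million images by simply adding mappers.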
Geolocation
• Given an image, where on Earth was it taken?
• Match against thousands of place models, or against hundreds of attribute classifiers (e.g. indoor vs. outdoor, city vs. rural)
• Again we use Hadoop, with a trivial reducer
Learning these models
• Many recognition approaches use "bags-of-words": a vector space model over "visual words"
• To learn, we need to:
  1. Generate a vocabulary of visual words (e.g. with k-means)
  2. Extract features from training images
  3. Learn a classifier
• Our computational approach:
  1. For k-means, use iterative Map-Reduce (Twister – J. Qiu)
  2. For feature extraction, Map-Reduce with a trivial reducer
  3. For learning classifiers, use off-the-shelf packages (can be quite slow)
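Steps 1 and 2 above can be sketched compactly. The real vocabulary step runs as iterative Map-Reduce on Twister; the single-machine Lloyd's k-means below is just a stand-in to show what each iteration computes, and the toy 2-d "descriptors" are illustrative (real visual words cluster high-dimensional local descriptors such as SIFT).

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means: cluster descriptors into k 'visual words'.
    Each iteration = one assignment step plus one mean-update step;
    the Twister version distributes exactly these two steps."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:  # assignment: nearest center
            clusters[min(range(k), key=lambda c: dist2(p, centers[c]))].append(p)
        for i, members in enumerate(clusters):  # update: cluster mean
            if members:
                centers[i] = [sum(xs) / len(members) for xs in zip(*members)]
    return centers

def bag_of_words(descriptors, centers):
    """Quantize each descriptor to its nearest visual word and return a
    normalized histogram -- the image's feature vector for the classifier."""
    hist = [0] * len(centers)
    for d in descriptors:
        hist[min(range(len(centers)), key=lambda c: dist2(d, centers[c]))] += 1
    total = sum(hist) or 1
    return [h / total for h in hist]
```

The resulting histograms are what the off-the-shelf classifier packages in step 3 consume.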
Inference on graphical models
• Statistical graphical models are widely used in vision
• Basic idea: vertices are variables (some known, some unknown); edges are probabilistic relationships
• Inference is NP-hard in general
• Many approximation algorithms are based on message passing, e.g. loopy discrete belief propagation:
  • Number of messages is proportional to the number of edges in the graph
  • Messages can be large – size depends on the variable label space
  • Number of iterations depends (roughly) on the diameter of the graph
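A minimal sketch of the message-passing scheme described above: sum-product loopy BP on a small discrete graph. The node/edge potentials and the shared Potts-style pairwise term are illustrative assumptions; the point is the cost structure (one message per directed edge, each of length |labels|, each costing |labels|² to compute).

```python
import math

def loopy_bp(nodes, edges, unary, pairwise, num_labels, iters=20):
    """Sum-product loopy belief propagation (sketch).
    unary[n]: list of node potentials; pairwise(xi, xj): shared edge
    potential. Per-iteration work ~ |edges| * num_labels**2, matching
    the costs described in the slide."""
    nbrs = {n: [] for n in nodes}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    # One message per directed edge, initialized uniform.
    msgs = {(i, j): [1.0 / num_labels] * num_labels
            for i in nodes for j in nbrs[i]}
    for _ in range(iters):
        new = {}
        for (i, j) in msgs:
            m = [sum(unary[i][xi] * pairwise(xi, xj)
                     * math.prod(msgs[(k, i)][xi] for k in nbrs[i] if k != j)
                     for xi in range(num_labels))
                 for xj in range(num_labels)]
            z = sum(m)
            new[(i, j)] = [v / z for v in m]  # normalize for stability
        msgs = new
    # Approximate marginals (beliefs): unary times incoming messages.
    beliefs = {}
    for n in nodes:
        b = [unary[n][x] * math.prod(msgs[(k, n)][x] for k in nbrs[n])
             for x in range(num_labels)]
        z = sum(b)
        beliefs[n] = [v / z for v in b]
    return beliefs
```

On a loopy graph (e.g. a triangle) with one confident node and smoothness-favoring pairwise terms, the other nodes' beliefs get pulled toward the confident node's label after a few iterations.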
Pose and part-based recognition
• Represent objects in terms of parts
• Can be posed as a graphical model inference problem
• Small number of variables (vertices) and constraints (edges), but large label space (millions++)
• We use a single-node multi-threaded implementation, with barriers between iterations
Fine-grained object recognition
• Classify amongst similar objects (e.g. species of birds)
• How can we learn discriminative properties of these objects automatically?
• Model each training image as a node, with edges between all pairs; the goal is to label each image with a feature that is found in all positive examples and no negative examples
• We use an off-the-shelf solver, with some additional multi-threading on a single node; still very slow
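The labeling criterion above ("found in all positive examples and no negative examples") has a simple brute-force reading when each image is summarized by the set of features detected in it. The sketch below is that simplification, not the off-the-shelf solver the deck actually uses, and the bird features in the usage example are invented for illustration.

```python
def discriminative_features(positives, negatives):
    """Brute-force stand-in for the solver: return every candidate
    feature detected in all positive images and in no negative image.
    Each image is represented as a set of detected features
    (a simplifying assumption for this sketch)."""
    shared = set.intersection(*positives) if positives else set()
    seen_in_negatives = set().union(*negatives) if negatives else set()
    return shared - seen_in_negatives

# Hypothetical bird example: "red_crown" appears in every positive
# image and in no negative image, so it is the discriminative feature.
pos = [{"red_crown", "long_beak"}, {"red_crown", "short_tail"}]
neg = [{"long_beak", "short_tail"}]
```

The real problem is much harder because candidate features are continuous image patches rather than a fixed symbolic set, which is why the solver-based formulation over the full image graph is slow.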
Pose as inference problem
• View reconstruction as statistical inference over a graphical model
• Vertices are cameras and points
• Edges are relative camera/point correspondences (estimated through point matching)
• Inference: label each image with a camera pose and each point with a 3-d position, such that the constraints are satisfied
Computation
• Our graphs have ~100,000 vertices, ~1,000,000 edges, and ~100,000 possible discrete labels
• Reduce computation using exact algorithmic tricks (min-convolutions), from O(|E| |L|²) to O(|E| |L|)
• Huge amount of data: total message size >1 TB per iteration
• Parallelize using iterative MapReduce:
  • Hadoop plus shell scripts for iteration
  • Mappers take in messages from the last iteration and compute outgoing messages
  • Reducers collate and route messages
  • Messages live on HDFS between iterations
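The min-convolution trick mentioned above can be illustrated for the simplest case, a linear (L1) pairwise cost, where a classic forward/backward pass (a 1-d distance transform) computes the message exactly in O(|L|) instead of the naive O(|L|²) double loop. The choice of a linear cost here is a simplifying assumption for the sketch; analogous envelope-based transforms handle quadratic costs.

```python
def min_convolution_linear(h, c=1.0):
    """Exact O(|L|) min-convolution with a linear pairwise cost:
    m[x] = min over y of (h[y] + c * |x - y|),
    computed with a forward pass then a backward pass. This is the
    kind of distance-transform trick that drops the per-edge message
    cost from O(|L|^2) to O(|L|)."""
    m = list(h)
    for x in range(1, len(m)):            # forward pass
        m[x] = min(m[x], m[x - 1] + c)
    for x in range(len(m) - 2, -1, -1):   # backward pass
        m[x] = min(m[x], m[x + 1] + c)
    return m

def min_convolution_naive(h, c=1.0):
    """Reference O(|L|^2) version, for checking the fast one."""
    return [min(h[y] + c * abs(x - y) for y in range(len(h)))
            for x in range(len(h))]
```

With |L| ≈ 100,000 labels per message, this is the difference between ~10¹⁰ and ~10⁵ operations per edge, which is what makes belief propagation tractable on graphs of this size.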
Common computational problems (recap)
1. Image-by-image (e.g. recognition)
   • Large-scale, but easily parallelizable
2. Iterative algorithms (e.g. learning)
   • Sometimes few but long-running iterations
   • Sometimes many lightweight iterations
3. Inference on graphs (e.g. reconstruction, learning)
   • Small graphs with huge label spaces
   • Large graphs with small label spaces
   • Large graphs with large label spaces