Problems in large-scale computer vision

David Crandall, School of Informatics and Computing, Indiana University. Research questions: given huge collections of images online, how can we analyze images and non-visual metadata to help users organize, browse, and search?

Presentation Transcript


  1. Problems in large-scale computer vision
     David Crandall, School of Informatics and Computing, Indiana University

  2. Research questions
     • Given huge collections of images online, how can we analyze images and non-visual metadata to:
        • Help users organize, browse, and search?
        • Mine information about the state of the world and human behavior?

  3. Common computational problems
     1. Image-by-image (e.g., recognition)
        • Large-scale, but easily parallelizable
     2. Iterative algorithms (e.g., learning)
        • Sometimes few but long-running iterations
        • Sometimes many lightweight iterations
     3. Inference on graphs (e.g., reconstruction, learning)
        • Small graphs with huge label spaces
        • Large graphs with small label spaces
        • Large graphs with large label spaces

  4. Scene classification
     • E.g., find images containing snow in a collection of ~100 million images
     • Typical approach: extract features and run a classifier (typically an SVM) on each image
     • We use Hadoop, typically with a trivial reducer, images stored in giant HDFS sequence files, and C++ MapReduce bindings
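
A minimal Python sketch of the per-image pattern above, assuming features have already been extracted into fixed-length vectors and the classifier is a linear SVM with made-up parameters (`WEIGHTS`, `BIAS`, and all function names are hypothetical). In the real Hadoop job the mapper would read image bytes from the sequence files and compute features itself; here the map and reduce phases are simulated with plain functions.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical linear-SVM parameters; a real model would be trained offline.
WEIGHTS: List[float] = [0.8, -0.5, 1.2]
BIAS: float = -0.3


def svm_score(features: List[float]) -> float:
    """Linear SVM decision function w.x + b; positive means 'snow'."""
    return sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS


def mapper(records: Iterable[Tuple[str, List[float]]]) -> Iterator[Tuple[str, bool]]:
    """Classify each (image_id, features) record independently of all others."""
    for image_id, features in records:
        yield image_id, svm_score(features) > 0.0


def identity_reducer(pairs: Iterable[Tuple[str, bool]]) -> Dict[str, bool]:
    """The 'trivial' reducer: collect mapper output without combining anything."""
    return dict(pairs)


records = [("img_001", [1.0, 0.2, 0.5]), ("img_002", [0.0, 1.0, 0.1])]
labels = identity_reducer(mapper(records))
print(labels)  # {'img_001': True, 'img_002': False}
```

Because no image's label depends on any other image, all of the work happens in the map phase, and the job scales by adding mappers; the reducer does no real work.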

  5. Geolocation
     • Given an image, where on Earth was it taken?
     • Match against thousands of place models, or against hundreds of attribute classifiers (e.g., indoor vs. outdoor, city vs. rural)
     • Again, we use Hadoop with a trivial mapper

  6. Learning these models
     • Many recognition approaches use "bags of words"
        • A vector space model over "visual words"
     • To learn, we need to:
        1. Generate a vocabulary of visual words (e.g., with k-means)
        2. Extract features from training images
        3. Learn a classifier
     • Our computational approach:
        1. For k-means, use iterative MapReduce (Twister, J. Qiu)
        2. For feature extraction, use MapReduce with a trivial reducer
        3. For learning classifiers, use off-the-shelf packages (can be quite slow)
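
The vocabulary step and the resulting bag-of-words representation can be sketched in Python as follows, with each k-means iteration split into a map phase (assign each descriptor to its nearest centroid) and a reduce phase (average the descriptors assigned to each centroid). This is plain Python, not Twister's API: the driver loop stands in for the iteration support an iterative MapReduce runtime provides, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

Vector = Tuple[float, ...]


def nearest(point: Vector, centroids: Sequence[Vector]) -> int:
    """Index of the closest centroid under squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])))


def kmeans_map(points: Iterable[Vector],
               centroids: Sequence[Vector]) -> Iterator[Tuple[int, Vector]]:
    """Map phase: emit (cluster_id, descriptor) for every descriptor."""
    for p in points:
        yield nearest(p, centroids), p


def kmeans_reduce(pairs: Iterable[Tuple[int, Vector]],
                  centroids: Sequence[Vector]) -> List[Vector]:
    """Reduce phase: new centroid = mean of the descriptors assigned to it."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for k, p in pairs:
        acc = sums.setdefault(k, [0.0] * len(p))
        for i, x in enumerate(p):
            acc[i] += x
        counts[k] += 1
    # A cluster that received no descriptors keeps its previous centroid.
    return [tuple(s / counts[k] for s in sums[k]) if counts[k] else centroids[k]
            for k in range(len(centroids))]


def kmeans(points: List[Vector], centroids: List[Vector], iterations: int = 10) -> List[Vector]:
    """Driver loop; an iterative MapReduce runtime would run these rounds."""
    for _ in range(iterations):
        centroids = kmeans_reduce(kmeans_map(points, centroids), centroids)
    return centroids


def bag_of_words(descriptors: Iterable[Vector], vocabulary: Sequence[Vector]) -> List[int]:
    """Histogram of visual-word occurrences for one image."""
    hist = [0] * len(vocabulary)
    for d in descriptors:
        hist[nearest(d, vocabulary)] += 1
    return hist


descriptors = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
vocab = kmeans(descriptors, [(0.0, 0.0), (10.0, 10.0)])
print(vocab)  # [(0.0, 0.5), (10.0, 10.5)]
print(bag_of_words([(0.1, 0.2), (9.0, 9.0), (10.0, 10.0)], vocab))  # [1, 2]
```

In practice the descriptors would be high-dimensional (e.g., SIFT) and the vocabulary large; the toy 2-D data only shows the mechanics. The histogram returned by `bag_of_words` is the per-image vector that a classifier would then be trained on.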

  7. Inference on graphical models
     • Statistical graphical models are widely used in vision
     • Basic idea: vertices are variables, some known and some unknown; edges are probabilistic relationships
     • Inference is NP-hard in general
     • Many approximation algorithms are based on message passing, e.g., loopy discrete belief propagation
        • The number of messages is proportional to the number of edges in the graph
        • Messages can be large: their size depends on the variable's label space
        • The number of iterations depends (roughly) on the diameter of the graph
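
To make the message-passing cost concrete, here is a minimal Python sketch of synchronous min-sum loopy belief propagation (the cost-minimization form of max-product). Each directed edge carries one message vector with one entry per label, and computing it takes O(|L|^2) work, which is why both the edge count and the label-space size drive the cost. The function and parameter names are hypothetical, and the slides do not specify which BP variant was used.

```python
from typing import Callable, Dict, List, Tuple

Edge = Tuple[int, int]


def loopy_min_sum(unary: List[List[float]], edges: List[Edge],
                  pairwise: Callable[[int, int], float],
                  iterations: int = 10) -> List[int]:
    """Synchronous min-sum BP; returns a label index for each vertex."""
    n, num_labels = len(unary), len(unary[0])
    neighbors: Dict[int, List[int]] = {v: [] for v in range(n)}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    # One message per directed edge, each a vector over the receiver's labels.
    msgs = {(u, v): [0.0] * num_labels for u in range(n) for v in neighbors[u]}
    for _ in range(iterations):
        new_msgs = {}
        for (u, v) in msgs:
            # Cost of each label at u, ignoring what v previously told u.
            h = [unary[u][lu] + sum(msgs[(w, u)][lu] for w in neighbors[u] if w != v)
                 for lu in range(num_labels)]
            # O(|L|^2): minimize over u's labels for every label of v.
            m = [min(h[lu] + pairwise(lu, lv) for lu in range(num_labels))
                 for lv in range(num_labels)]
            offset = min(m)
            new_msgs[(u, v)] = [x - offset for x in m]  # normalize so values stay bounded
        msgs = new_msgs
    beliefs = [[unary[v][l] + sum(msgs[(w, v)][l] for w in neighbors[v])
                for l in range(num_labels)] for v in range(n)]
    return [min(range(num_labels), key=lambda l: b[l]) for b in beliefs]


def potts(a: int, b: int) -> float:
    """Potts pairwise cost: penalize neighbors that disagree."""
    return 0.0 if a == b else 1.0


# A 3-vertex chain (a tree, where min-sum BP is exact) with 2 labels.
labels = loopy_min_sum([[0.0, 2.0], [0.5, 0.0], [0.0, 0.4]], [(0, 1), (1, 2)], potts)
print(labels)  # [0, 0, 0]
```

On a tree the result is the exact minimum-cost labeling; on graphs with cycles the same code runs unchanged but is only an approximation, and information needs roughly as many iterations as the graph's diameter to propagate end to end.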

  8. Pose and part-based recognition
     • Represent objects in terms of parts
     • Can be posed as a graphical model inference problem
     • Small number of variables (vertices) and constraints (edges), but a large label space (millions or more)
     • We use a single-node multi-threaded implementation, with barriers between iterations
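
The barrier-synchronized structure can be sketched in Python with `threading.Barrier`. Each worker updates its own slice of the variables from the previous iteration's buffer, then waits at the barrier so that no thread starts iteration t+1 until every write for iteration t has finished; two buffers alternate roles so that readers and writers never share an array within one round. The `update` callback and the function names are hypothetical stand-ins for a per-variable message or label update. CPython's global interpreter lock keeps pure-Python threads from running this arithmetic in parallel, so the sketch shows only the synchronization pattern, not the speedup a native implementation would get.

```python
import threading
from typing import Callable, List

Update = Callable[[List[float], int], float]


def parallel_iterate(values: List[float], update: Update,
                     num_threads: int = 4, iterations: int = 10) -> List[float]:
    """Run synchronous (Jacobi-style) updates with a barrier between iterations."""
    n = len(values)
    buffers = [list(values), [0.0] * n]  # read from one, write to the other
    barrier = threading.Barrier(num_threads)

    def worker(tid: int) -> None:
        for it in range(iterations):
            src, dst = buffers[it % 2], buffers[(it + 1) % 2]
            # Strided partition: thread tid owns indices tid, tid + num_threads, ...
            for i in range(tid, n, num_threads):
                dst[i] = update(src, i)
            barrier.wait()  # all writes for this iteration finish before anyone reads them

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return buffers[iterations % 2]


def smooth(src: List[float], i: int) -> float:
    """Example update: average each value with its neighbors (clamped at the ends)."""
    lo, hi = max(i - 1, 0), min(i + 1, len(src) - 1)
    return (src[lo] + src[i] + src[hi]) / 3.0


result = parallel_iterate([0.0, 0.0, 9.0, 0.0, 0.0], smooth, num_threads=2, iterations=1)
print(result)  # [0.0, 3.0, 3.0, 3.0, 0.0]
```

Because every update in an iteration reads only the previous iteration's values, the result does not depend on how the variables are split across threads; a single barrier per iteration is enough to guarantee that.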

  9. Fine-grained object recognition
     • Classify among similar objects (e.g., species of birds)
     • How can we learn the discriminative properties of these objects automatically?
     • Model each training image as a node, with edges between all pairs; the goal is to label each image with a feature that is found in all positive examples and in no negative examples
     • We use an off-the-shelf solver
        • With some additional multi-threading on a single node; still very slow
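
Stripped of the graphical model, the labeling goal has a simple set-theoretic core, sketched below in Python under a strong simplifying assumption: each image is reduced to a set of discrete, exactly matchable feature IDs (the IDs and the function name are hypothetical). The real problem needs a solver precisely because candidate features are image regions that match only approximately across images, which is what the pairwise edges between images capture.

```python
from typing import Hashable, List, Set


def discriminative_features(positives: List[Set[Hashable]],
                            negatives: List[Set[Hashable]]) -> Set[Hashable]:
    """Features present in every positive image and absent from every negative one."""
    if not positives:
        return set()
    in_all_positives = set.intersection(*positives)
    in_any_negative = set().union(*negatives)
    return in_all_positives - in_any_negative


positives = [{"red_crown", "long_beak", "white_belly"}, {"red_crown", "white_belly"}]
negatives = [{"white_belly", "short_beak"}]
print(discriminative_features(positives, negatives))  # {'red_crown'}
```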

  10. Large-scale 3D reconstruction

  11. Pose as an inference problem
      • View reconstruction as statistical inference over a graphical model
      • Vertices are cameras and points
      • Edges are relative camera/point correspondences (estimated through point matching)
      • Inference: label each image with a camera pose and each point with a 3-D position, such that the constraints are satisfied

  12. Computation
      • Our graphs have ~100,000 vertices, ~1,000,000 edges, and ~100,000 possible discrete labels
      • Reduce computation using exact algorithmic tricks (min-convolutions) from O(|E| |L|^2) to O(|E| |L|)
      • Huge amount of data: total message size >1 TB per iteration
      • Parallelize using iterative MapReduce
         • Hadoop plus shell scripts for iteration
         • Mappers take in messages from the last iteration and compute outgoing messages
         • Reducers collate and route messages
         • Messages live on HDFS between iterations
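
The min-convolution trick can be illustrated for one common case, assuming a one-dimensional label space and a linear pairwise cost c * |l_u - l_v|; the slides do not say which cost function was used. The two-pass distance transform described by Felzenszwalb and Huttenlocher for efficient belief propagation computes every entry of m(l_v) = min over l_u of [h(l_u) + c * |l_u - l_v|] in O(|L|) time instead of the O(|L|^2) brute-force minimization. Both versions are shown so they can be compared; the function names are hypothetical.

```python
from typing import List


def min_convolution_brute(h: List[float], c: float) -> List[float]:
    """O(|L|^2): m[v] = min over u of h[u] + c * |u - v|."""
    n = len(h)
    return [min(h[u] + c * abs(u - v) for u in range(n)) for v in range(n)]


def min_convolution_linear(h: List[float], c: float) -> List[float]:
    """O(|L|) two-pass distance transform for a linear cost with c >= 0."""
    m = list(h)
    for i in range(1, len(m)):           # forward pass: best source at or left of i
        m[i] = min(m[i], m[i - 1] + c)
    for i in range(len(m) - 2, -1, -1):  # backward pass: also consider sources right of i
        m[i] = min(m[i], m[i + 1] + c)
    return m


h = [3.0, 0.0, 5.0, 1.0]
print(min_convolution_linear(h, 2.0))  # [2.0, 0.0, 2.0, 1.0]
print(min_convolution_brute(h, 2.0))   # [2.0, 0.0, 2.0, 1.0]
```

Here `h` plays the role of the sending vertex's accumulated per-label cost (its own term plus incoming messages), so each outgoing message drops from O(|L|^2) to O(|L|) work, which yields the O(|E| |L|) total on the slide. For multi-dimensional labels with an L1 cost, the same transform can be applied one dimension at a time.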

  13. Common computational problems
      1. Image-by-image (e.g., recognition)
         • Large-scale, but easily parallelizable
      2. Iterative algorithms (e.g., learning)
         • Sometimes few but long-running iterations
         • Sometimes many lightweight iterations
      3. Inference on graphs (e.g., reconstruction, learning)
         • Small graphs with huge label spaces
         • Large graphs with small label spaces
         • Large graphs with large label spaces
