Visual Element Discovery as Discriminative Mode Seeking

Carl Doersch (CMU), Abhinav Gupta (CMU), Alexei A. Efros (UCB)

The need for mid-level representations • 6 billion images • 70 billion images • 1 billion images served daily • 10 billion images • 60 hours uploaded per minute • Almost 90% of web traffic is visual!

Discriminative patches • Visual words are too simple • Objects are too difficult • Something in the middle? (Felzenszwalb et al. 2008) (Singh et al. 2012)

Mid-level “Visual Elements” • Simple enough to be detected easily • Complex enough to be meaningful • “Meaningful” as measured by weak labels (Singh et al. 2012) (Doersch et al. 2012)

Mid-level “Visual Elements” • Doersch et al. 2012 • Singh et al. 2012 • Jain et al. 2013 • Endres et al. 2013 • Juneja et al. 2013 • Li et al. 2013 • Sun et al. 2013 • Wang et al. 2013 • Fouhey et al. 2013 • Lee et al. 2013

Our goal • Provide a mathematical optimization for visual elements • Improve performance of mid-level representations.

Elements as Patch Classifiers

What if the labels are weak? • E.g. image has horse/no-horse • (Or even weaker, like Paris/not-Paris) • Idea: Label these all as “horse” • Problem: 10,000 patches per image, most of which are unclassifiable.

The weaker the label, the bigger the problem. • Task: Learn to classify Paris from Not-Paris [figure labels: Paris; Also Paris]

Other approaches • Latent SVM: assumes we have one instance per positive image • Multiple instance learning: not clear how to define the bags

What if the labels are weak? • Negatives are negatives, positives might not be positive • Most of our data can be ignored • First: how to cluster without clustering everything (Singh et al. 2012) (Doersch et al. 2012)

Mean shift

Patch distances Input Nearest neighbor Min distance: 2.59e-4 Max distance: 1.22e-4

Mean shift

Negative Set [figure legend: Paris; Not Paris]

Density Ratios [figure legend: Paris; Not Paris]

Adaptive Bandwidth [figure legend: Positive; Negative; Bandwidth]

Discriminative Mode Seeking • Find local optima of an estimate of the density ratio • Allow an adaptive bandwidth • Be extremely fast • Minimize the number of passes through the data
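As a rough illustration of what an estimate of the density ratio could look like, the sketch below sums a triangular kernel max(b - d, 0) over positive and negative patches around a candidate center and divides the two sums. The kernel shape, the Euclidean distance, the `eps` guard, and all function names are assumptions made for this example, not details taken from the slides.

```python
import numpy as np

def triangular_density(points, center, bandwidth):
    """Unnormalized density at `center`: sum of max(bandwidth - distance, 0)."""
    dists = np.linalg.norm(points - center, axis=1)
    return np.maximum(bandwidth - dists, 0.0).sum()

def density_ratio(pos, neg, center, bandwidth, eps=1e-8):
    """Positive density divided by negative density around `center`.
    `eps` (an assumption) avoids division by zero when no negatives are near."""
    pos_density = triangular_density(pos, center, bandwidth)
    neg_density = triangular_density(neg, center, bandwidth)
    return pos_density / (neg_density + eps)

# Toy 2-D "patch features" (hypothetical data).
pos = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]])
neg = np.array([[0.5, 0.0], [2.0, 2.0]])
ratio = density_ratio(pos, neg, center=np.array([0.0, 0.0]), bandwidth=1.0)
print(ratio)
```

A center with many nearby positives and few nearby negatives gets a high ratio, which is the kind of point a discriminative mode should sit on.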

Discriminative Mode Seeking • Mean shift: maximize (w.r.t. w) [equation not preserved; annotations: w = centroid, b = bandwidth, patch feature, distance]
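For reference, textbook mean shift with a flat kernel can be sketched as follows: starting from an initial centroid w, repeatedly replace w with the mean of the points that lie within bandwidth b. This is the standard variant, shown as a hedged baseline rather than the exact objective on the slide; the function name, stopping rule, and toy data are hypothetical.

```python
import numpy as np

def mean_shift_mode(points, start, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift: move the centroid to the mean of its neighbors
    until it stops moving (or no neighbors remain)."""
    w = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        dists = np.linalg.norm(points - w, axis=1)
        inside = points[dists < bandwidth]
        if len(inside) == 0:
            break  # no points within the bandwidth; nothing to shift toward
        new_w = inside.mean(axis=0)
        shift = np.linalg.norm(new_w - w)
        w = new_w
        if shift < tol:
            break
    return w

# Three nearby points form a cluster; the fourth is a distant outlier.
points = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [5.0, 5.0]])
mode = mean_shift_mode(points, start=[0.5, 0.5], bandwidth=1.0)
print(mode)
```

Note that every iteration needs distances from w to all points, which is why the number of passes through the data matters at scale.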

Discriminative Mode Seeking B(w) is the value of b satisfying:
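The defining equation did not survive on this slide. Assuming the condition has the form "the sum over negative patches of max(score - b, 0) equals a fixed constant beta > 0" (an assumption chosen to match the w^T x - b score used later in the deck), b can be found by bisection, because that sum is continuous and non-increasing in b. The name `solve_bandwidth` and the constant `beta` are hypothetical.

```python
import numpy as np

def solve_bandwidth(neg_scores, beta, tol=1e-10, max_iter=200):
    """Find b such that sum(max(s - b, 0)) over negative scores equals beta.
    Assumes beta > 0 and at least one negative score."""
    neg_scores = np.asarray(neg_scores, dtype=float)

    def excess(b):
        return np.maximum(neg_scores - b, 0.0).sum()

    hi = neg_scores.max()   # excess(hi) == 0, which is below beta
    lo = hi - beta          # the largest score alone already contributes beta
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if excess(mid) > beta:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

neg_scores = np.array([0.2, 0.5, 0.9])
b_star = solve_bandwidth(neg_scores, beta=0.3)
print(b_star)
```

Tying b to a fixed amount of negative mass is one way to make the bandwidth adaptive: it shrinks where negatives are dense and grows where they are sparse.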

Discriminative Mode Seeking • Distance metric: Normalized Correlation [equation not preserved: optimize ... s.t. ...]
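Normalized correlation between two feature vectors can be computed by mean-centering each vector, scaling it to unit length, and taking a dot product. The minimal sketch below shows this with NumPy; the helper names are hypothetical, and it assumes neither vector is constant (a constant vector has zero norm after centering).

```python
import numpy as np

def normalize(v):
    """Mean-center and scale to unit Euclidean norm."""
    v = np.asarray(v, dtype=float)
    v = v - v.mean()
    return v / np.linalg.norm(v)

def normalized_correlation(x, w):
    """Dot product of the normalized vectors; lies in [-1, 1]."""
    return float(normalize(x) @ normalize(w))

x = np.array([1.0, 2.0, 3.0, 4.0])
same_shape = 2.0 * x + 5.0      # affine copy of x: correlation should be 1
reversed_x = x[::-1]            # reversed trend: correlation should be -1
print(normalized_correlation(x, same_shape), normalized_correlation(x, reversed_x))
```

Once patches are normalized this way, similarity to an element w reduces to the linear score w^T x, which is what makes the thresholded form w^T x - b convenient.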

Discriminative Mode Seeking [figure legend: Positive; Negative; w] [equation not preserved: optimize ... s.t. ...]
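The equation on this slide did not survive extraction. One formulation that would be consistent with the surrounding slides (a linear score w^T x - b, a bandwidth b tied to the negative set, and an objective that is piecewise quadratic) is sketched below. The regularization weight \lambda, the constant \beta, and the notation x_i^{+}, x_i^{-} for positive and negative patch features are assumptions, not symbols recovered from the slide.

```latex
\begin{aligned}
\max_{w,\,b}\quad & \sum_{i=1}^{n_{\mathrm{pos}}} \max\!\left(w^{\top} x_i^{+} - b,\; 0\right) \;-\; \lambda \lVert w \rVert^{2} \\
\text{s.t.}\quad & \sum_{i=1}^{n_{\mathrm{neg}}} \max\!\left(w^{\top} x_i^{-} - b,\; 0\right) \;=\; \beta
\end{aligned}
```

Under this reading, only patches with w^T x - b > 0 contribute to either sum, so all other patches can be ignored during an update.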

Optimization • Initialization is straightforward • For each element, keep only the ~500 patches where w^T x - b > 0 • Trivially parallelizable in MapReduce • Optimization is piecewise quadratic
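The bookkeeping for "keep only the patches with a positive score" can be sketched as below: score every patch against an element (w, b), keep those with w^T x - b > 0, and cap the result at the highest-scoring `keep` patches. The cap, the sort order, and the function name are assumptions for illustration; the slides only state that roughly 500 patches are kept per element.

```python
import numpy as np

def active_patches(features, w, b, keep=500):
    """Return indices and scores of patches with w^T x - b > 0,
    highest score first, limited to at most `keep` patches."""
    scores = features @ w - b
    active = np.flatnonzero(scores > 0)      # patches inside the element's region
    order = np.argsort(-scores[active])      # descending by score
    chosen = active[order[:keep]]
    return chosen, scores[chosen]

# Four toy patch features (hypothetical data) scored against one element.
features = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]])
w = np.array([1.0, 0.2])
idx, vals = active_patches(features, w, b=0.3)
print(idx, vals)
```

Because each element touches only its own small active set, elements can be updated independently, which fits the map-style parallelism mentioned on the slide.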

Evaluation via Purity-Coverage Plot • Analogous to Precision-Recall Plot

Low Purity [figure: Element 1 through Element 5]

High Purity, Low Coverage [figure: Element 1 through Element 5]

Purity-Coverage Curve [figure legend: Paris; Not Paris; axes: Purity, Coverage (x1e4 pixels)]

Purity-Coverage Curve • Coverage for multiple elements is simply the union.
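One way to trace such a curve, assuming each detection carries a score, a flag for whether it came from the positive set, and the set of pixels it covers: sweep detections from highest to lowest score, report purity as the fraction of detections so far that are positive, and report coverage as the size of the union of positive pixels covered so far. The data layout and function name are hypothetical.

```python
def purity_coverage(detections, total_positive_pixels):
    """detections: iterable of (score, is_positive, pixel_ids).
    Returns a list of (coverage, purity) points, one per score threshold."""
    curve = []
    covered = set()       # union of covered pixels in the positive set
    num_positive = 0
    ordered = sorted(detections, key=lambda d: d[0], reverse=True)
    for rank, (score, is_positive, pixels) in enumerate(ordered, start=1):
        if is_positive:
            num_positive += 1
            covered |= set(pixels)
        coverage = len(covered) / total_positive_pixels
        purity = num_positive / rank
        curve.append((coverage, purity))
    return curve

# Detections pooled from several elements; overlapping pixels count once.
detections = [
    (0.9, True, {1, 2}),
    (0.8, True, {2, 3}),
    (0.5, False, {9}),
    (0.4, True, {4}),
]
curve = purity_coverage(detections, total_positive_pixels=10)
print(curve)
```

Using a set union means two elements firing on the same region do not inflate coverage, matching the "simply the union" rule on the slide.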

Purity-Coverage [figure: two plots, Top 25 Elements and Top 200 Elements; axes: Purity vs. Coverage (fraction of positive dataset); methods: This work • This work, no inter-element • SVM Retrained 5x (Doersch et al. 2012) • LDA Retrained 5x • LDA Retrained • Exemplar LDA (Hariharan et al. 2012)]

Results on Indoor 67 Scenes • Kitchen • Grocery • Bowling • Bakery • Bathroom • Elevator

Results on Indoor 67 Scenes

Qualitative Indoor67 Results

Indoor67: Error Analysis • Guesses: staircase, grocery store, garage, closet • Ground Truth (GT): corridor, deli, laundromat, museum

Thank you! More results at http://graphics.cs.cmu.edu/projects/discriminativeModeSeeking/ • Paris Elements • Indoor 67 Elements • Indoor 67 Heatmaps • Source code (soon)

Some New Paris Elements
