Depth Estimation via Scene Classification

Vladimir Nedović (vnedovic@science.uva.nl)
with: Arnold Smeulders & Jan-Mark Geusebroek (UvA), André Redert (Philips Research)
28-05-2008

Order in Pollock's Chaos
• Jackson Pollock, Blue Poles: Number 1, 1952
• seems chaotic, but there is structure - same as in natural image statistics
R.P. Taylor, A.P. Micolich and D. Jonas, Fractal Analysis Of Pollock's Drip Paintings, Nature, vol. 399, p.422 (1999)

Pre-perspective (Gothic art, before 1430)
• Simone Martini (1285-1344)
• Know any tilted buildings?

Post-perspective (Quattrocento, after 1430)
• Sandro Botticelli, Annunciation, 1489-90
• viewpoint constraints understood, influence on film art
• ‘modal’ scene configurations: structures orthogonal to each other
W. Richards, A. Jepson and J. Feldman, Priors, Preferences and Categorical Percepts, in Perception as Bayesian Inference, pp. 80-111, 1996.

Outline
• Introduction
• Related work
• Our approach
• Preliminary classification
• Conclusions

Introduction
The context: fully automatic 2D to 3D conversion of video data for 3DTV
• We know about stereo, structure from motion, etc., but can we also derive depth from a single image?
  • humans can, right?
• Can we exploit some constraints?
  • is the data really chaotic?
  • what about perceptual limitations of viewers?
GOAL: quickly obtain an approximate but visually pleasing 3D model from a single image

Related work
• Related work (1): Torralba & Oliva
  • showed that depth can be derived from structure, itself derived from natural image statistics (IEEE PAMI 2001)
• Related work (2): Hoiem (Carnegie Mellon Univ.)
  • obtained 3D orientation of scene surfaces using machine learning (ICCV 2005)
  • improved object detection (CVPR 2006 best paper) + accounted for occlusions to derive relative ordering of elements (ICCV 2007)
  • BUT:
    • outdoor images only + assumes sky & ground are always present
    • i.e. accounts for less than half of all possibilities
• Related work (3): Saxena (Stanford Univ.)
  • 3D mesh from ML on low-level features (no classes)

Our approach
Our approach: depth estimation via geometric scene classification
• i.e. holistic, not pixel-based
• Separate a visual scene into its two constituent elements: stage and object
  • consider objects separately from the stage on which they act
Determine the 3D stage model first
• Stage ≈ first approximation of global depth
• reduces subsequent (finer) depth processing tasks
• can guide other processes, e.g. object localization & recognition
V. Nedović et al. ICCV2007

Our approach - stage models -
For the stage, a rough depth model is sufficient
• regularities arise from:
  • natural image statistics -> texture gradients
  • viewpoint constraints -> perspective
  • modal configurations & film rules -> orthogonality
Exploit geometric structure of images, which reduces the number of possible configurations
Only a few configurations are prominent => the first step in depth estimation can be stage classification

Our approach - stage hierarchy -
• Structure of the visual world leads to only 15 geometric scene types
• Influence of structure is identical indoors & outdoors => such a distinction is unnecessary
• Three-level hierarchy
  • perform classification in steps: first determine the geometric neighbourhood, then proceed further
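The stepwise classification described on this slide can be sketched roughly as below. This is a minimal illustration, not the method from the talk: the group and stage names in `HIERARCHY`, the two-level depth, and the nearest-centroid rule are all assumptions made up for the example.

```python
# Hypothetical hierarchy: these names are placeholders, not the talk's 15 stages.
HIERARCHY = {
    "open": ["ground_sky", "ground_only"],
    "enclosed": ["corridor", "corner", "flat_wall"],
}


def nearest_label(x, centroids):
    """Return the label whose centroid is closest to x (squared Euclidean)."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )


def classify_stage(x, group_centroids, stage_centroids):
    """Pick the geometric neighbourhood first, then a stage inside it only."""
    group = nearest_label(x, group_centroids)
    candidates = {s: stage_centroids[s] for s in HIERARCHY[group]}
    return group, nearest_label(x, candidates)


# Toy 2-D feature space, purely illustrative.
GROUP_CENTROIDS = {"open": [0.0, 0.0], "enclosed": [10.0, 10.0]}
STAGE_CENTROIDS = {
    "ground_sky": [0.0, 1.0],
    "ground_only": [1.0, 0.0],
    "corridor": [9.0, 10.0],
    "corner": [10.0, 9.0],
    "flat_wall": [11.0, 11.0],
}

if __name__ == "__main__":
    print(classify_stage([0.9, 0.1], GROUP_CENTROIDS, STAGE_CENTROIDS))
```

The second step only ever compares stages inside the chosen group, which is what shrinks the problem at each level.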

Our approach - three-level hierarchy -
• i.e. 2-3 sub-stages per stage, accounting for variability in parameters
• geometry at the bottom is so constrained that pre-defined crude depth maps are already possible
  • i.e. no parameter estimation needed!
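To illustrate what "pre-defined crude depth maps" could mean in practice, here is a small sketch in which each leaf label maps directly to a fixed depth template. The three stage names, the template shapes, and the depth convention (0 = near, 1 = far) are assumptions for illustration only; the talk does not specify them.

```python
def ground_plane(rows, cols):
    """Depth grows from the bottom row (near, 0.0) to the top row (far, 1.0)."""
    return [[1.0 - r / (rows - 1) for _ in range(cols)] for r in range(rows)]


def flat_wall(rows, cols):
    """A fronto-parallel surface: constant mid-range depth."""
    return [[0.5 for _ in range(cols)] for _ in range(rows)]


def corridor(rows, cols):
    """Depth grows from the image border (near) towards the centre (far)."""
    cr, cc = (rows - 1) / 2.0, (cols - 1) / 2.0
    return [
        [1.0 - max(abs(r - cr) / cr, abs(c - cc) / cc) for c in range(cols)]
        for r in range(rows)
    ]


# Hypothetical leaf labels mapped to fixed templates: a lookup, no fitting.
DEPTH_TEMPLATES = {
    "ground_only": ground_plane,
    "flat_wall": flat_wall,
    "corridor": corridor,
}


def depth_for_stage(label, rows, cols):
    """Return the crude depth map for a classified stage (rows, cols > 1)."""
    return DEPTH_TEMPLATES[label](rows, cols)


if __name__ == "__main__":
    for row in depth_for_stage("corridor", 5, 7):
        print(" ".join(f"{d:.2f}" for d in row))
```

Once the stage label is known, the depth map is a table lookup, which is why no per-image parameter estimation is needed in this sketch.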

Preliminary classification (1)
• Proof of concept with a single feature type
  • natural image statistics-based Weibull features (i.e. texture gradients)
• TRECVID dataset of TV news used for evaluation
• Features extracted based on a 4x4 region grid over the image
  • two features per region => 64 features in total
A.F. Smeaton et al. “Evaluation campaigns and TRECVid”, 8th ACM Int’l Workshop on Multimedia Info. Retrieval, 2006.
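A rough sketch of grid-based Weibull features follows. The slide does not say exactly how the 64 values are composed; the code assumes one reading that matches that total: for each of the 16 regions, a Weibull (shape, scale) pair is fitted to the absolute horizontal differences and another to the absolute vertical differences. The function names, the simple difference filter, and the bisection-based maximum-likelihood fit are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def fit_weibull(samples, lo=0.05, hi=50.0, iters=60):
    """Maximum-likelihood Weibull fit by bisection; returns (shape, scale)."""
    xs = [x for x in samples if x > 0]
    if len(xs) < 2 or max(xs) == min(xs):
        return 0.0, 0.0  # degenerate region: no usable contrast
    top = max(xs)
    zs = [x / top for x in xs]  # rescale to (0, 1] so powers cannot overflow
    logs = [math.log(z) for z in zs]
    mean_log = sum(logs) / len(logs)

    def g(k):
        # Likelihood equation for the shape k; increasing in k.
        pw = [z ** k for z in zs]
        return sum(p * l for p, l in zip(pw, logs)) / sum(pw) - 1.0 / k - mean_log

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    k = 0.5 * (lo + hi)
    scale = top * (sum(z ** k for z in zs) / len(zs)) ** (1.0 / k)
    return k, scale


def grid_weibull_features(image, grid=4):
    """Concatenate per-region Weibull fits of x- and y-differences."""
    rows, cols = len(image), len(image[0])
    features = []
    for gr in range(grid):
        for gc in range(grid):
            r0, r1 = gr * rows // grid, (gr + 1) * rows // grid
            c0, c1 = gc * cols // grid, (gc + 1) * cols // grid
            dx = [abs(image[r][c + 1] - image[r][c])
                  for r in range(r0, r1) for c in range(c0, min(c1, cols - 1))]
            dy = [abs(image[r + 1][c] - image[r][c])
                  for r in range(r0, min(r1, rows - 1)) for c in range(c0, c1)]
            features.extend(fit_weibull(dx))
            features.extend(fit_weibull(dy))
    return features


if __name__ == "__main__":
    random.seed(0)
    img = [[random.random() for _ in range(32)] for _ in range(32)]
    print(len(grid_weibull_features(img)))
```

With a 4x4 grid this yields 16 regions x 2 directions x 2 parameters = 64 values. A flat region with no contrast falls back to (0.0, 0.0) rather than failing.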

Preliminary classification (2)
• Support Vector Machines (SVM) classifier based on a 1 vs. 1 multi-class approach
• two-step classification, average within group (assuming super-stage is known)
• Results shown for: stage groups; individual stages (results of symmetrical variants combined)
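The 1 vs. 1 multi-class scheme mentioned here can be sketched as follows: one binary classifier is trained for every pair of classes, and the final label is decided by majority vote. To keep the example dependency-free, each pairwise SVM is replaced by a nearest-mean rule; that substitution, the function names, and the toy data are assumptions for illustration only.

```python
from collections import Counter
from itertools import combinations


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_one_vs_one(samples):
    """samples: dict label -> list of feature vectors.

    Returns one pairwise model per unordered label pair. Each model is a
    nearest-mean rule standing in for a binary SVM.
    """
    means = {lab: mean_vector(vs) for lab, vs in samples.items()}
    return {(a, b): (means[a], means[b]) for a, b in combinations(sorted(samples), 2)}


def predict(models, x):
    """Let every pairwise model vote; return the label with the most votes."""
    votes = Counter()
    for (a, b), (mean_a, mean_b) in models.items():
        votes[a if dist2(x, mean_a) <= dist2(x, mean_b) else b] += 1
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    toy = {
        "a": [[0.0, 0.0], [0.0, 1.0]],
        "b": [[5.0, 5.0], [6.0, 5.0]],
        "c": [[0.0, 9.0], [1.0, 10.0]],
    }
    models = train_one_vs_one(toy)
    print(len(models), predict(models, [5.5, 4.8]))
```

For K classes this scheme trains K(K-1)/2 pairwise models, so 15 stages would need 105 of them.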

Conclusions (1)
• We need a fast & approximate solution:
  • do only what is necessary, viewers may not perceive it anyway
  • generalize where possible, to reduce the problem at every step
• Separate a scene into a stage and the objects
• Determine the stage 3D model first
  • rough model is sufficient
  • plus, structure greatly reduces the number of possible configurations
  • and, stage will help us to locate and process objects

Conclusions (2)
• Due to structure, we can create simple models that fit TV data
  • 15 stages are sufficient
  • no need to distinguish between indoor & outdoor
• Therefore, we can use scene classification as the first step in depth estimation

Conclusions (3)
• Our approach: three-step classification
  • geometry at the bottom is constrained enough, so we can already assign pre-defined depth maps
  • no parameter estimation necessary
• Proof of concept demonstrated with a single feature type
  • performance much better than chance
  • but enhancements needed (more features etc.)

Questions?
