Urban Scene Analysis

Urban Scene Analysis James Elder & Patrick Denis York University

Phase IV Objectives • Single-View 3D Reconstruction • Scene Dynamics • Scene Segmentation and Labelling

Single-View Reconstruction

Streetscape

Ultimate Goal • Our ultimate goal is to automate this process!

Immediate Goal • Automatic estimation of the three vanishing points corresponding to the “Manhattan directions”.

Ground Truth Database

Manhattan Frame Geometry • An edge is aligned with a vanishing point if the normal of its interpretation plane is orthogonal to the vanishing-point vector on the Gauss sphere
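The orthogonality test on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes a pinhole camera with a known focal length in pixels and image coordinates measured from the principal point; the function names and the parameters `focal` and `centre` are hypothetical.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))

def normalize(a):
    """Scale a vector to unit length."""
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)

def interpretation_plane_normal(p1, p2, focal, centre=(0.0, 0.0)):
    """Unit normal of the plane through the camera centre and an image edge.

    p1, p2: edge endpoints in pixels; focal: focal length in pixels
    (assumed known); centre: principal point (assumed known).
    """
    r1 = (p1[0] - centre[0], p1[1] - centre[1], focal)
    r2 = (p2[0] - centre[0], p2[1] - centre[1], focal)
    return normalize(cross(r1, r2))

def alignment_error_deg(normal, vp_dir):
    """Angle between the vanishing direction and the interpretation plane.

    Zero when the normal is exactly orthogonal to the vanishing direction.
    """
    c = abs(dot(normal, normalize(vp_dir)))
    return math.degrees(math.asin(min(1.0, c)))

# A horizontal image edge is consistent with the x direction, not with y.
n = interpretation_plane_normal((-10.0, 5.0), (10.0, 5.0), focal=100.0)
err_x = alignment_error_deg(n, (1.0, 0.0, 0.0))
err_y = alignment_error_deg(n, (0.0, 1.0, 0.0))
```

In practice an edge would be treated as aligned when this error falls below a noise-dependent tolerance rather than when it is exactly zero.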

Mixture Model • Each edge Eij in the image is generated by one of four possible kinds of scene structure: • m1-3: a line in one of the three Manhattan directions • m4: non-Manhattan structure • The observable properties of each edge Eij are: • position • angle • The likelihoods of these observations are co-determined by: • The causal process (m1-4) • The rotation Ψ of the Manhattan frame relative to the camera • [Figure: diagram relating the rotation Ψ, the causes mi and the image edges E11, E12, E21, E22]

Mixture Model • Our goal is to estimate the Manhattan frame Ψ from the observable data Eij. • [Figure: diagram relating the rotation Ψ, the causes mi and the image edges E11, E12, E21, E22]
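To make the mixture model concrete, the sketch below computes the posterior probability of each cause (m1 to m4) for a single edge, given its angular deviation from each of the three Manhattan directions under a candidate Ψ. The slides do not specify the likelihood functions, so everything numeric here is an assumption for illustration: a Gaussian on angular deviation for m1-3, a uniform density over orientation for m4, and the values of `sigma_deg` and `priors`.

```python
import math

def responsibilities(errors_deg, sigma_deg=2.0, priors=(0.3, 0.3, 0.3, 0.1)):
    """Posterior over the four causes m1..m4 for one edge.

    errors_deg: angular deviation (degrees) of the edge from each of the
    three Manhattan directions under a candidate rotation.
    sigma_deg, priors: hypothetical values chosen for illustration only.
    """
    norm = sigma_deg * math.sqrt(2.0 * math.pi)
    # Assumed Gaussian likelihood for the three Manhattan causes.
    lik = [math.exp(-0.5 * (e / sigma_deg) ** 2) / norm for e in errors_deg]
    # Assumed uniform likelihood over 180 degrees of orientation for m4.
    lik.append(1.0 / 180.0)
    joint = [p * l for p, l in zip(priors, lik)]
    total = sum(joint)
    return [j / total for j in joint]

# An edge within half a degree of the first Manhattan direction.
aligned = responsibilities([0.5, 40.0, 85.0])
# An edge far from all three directions is explained by m4.
clutter = responsibilities([40.0, 50.0, 60.0])
```

Summing the log of each edge's total mixture likelihood over all edges gives an objective in Ψ that a search method can maximise; the responsibilities above correspond to the quantities an EM-style search would compute in its E-step.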

Design Criteria • Accuracy • Speed

Design Decisions • Features • Dense gradient map • Sparse sub-pixel localized edges • Measurement Space • Image • Gauss Sphere • Search Method • Coarse-to-Fine (Coughlan & Yuille 2001) • Quasi-Newton • EM • Quasi-EM

Accuracy • [Figure: chart of angular error (deg) by error type (Horizontal VPs, Vertical VP) for seven methods: MW; Edge-Based Coarse-to-Fine MW Params; Edge-Based Newton MW Params; Edge-Based Newton; Edge-Based EM; Edge-Based Quasi-EM; Edge-Based Quasi-EM GS]

Speed • [Figure: chart of run time (sec) by method for seven methods: MW; Edge-Based Coarse-to-Fine MW Params; Edge-Based Newton MW Params; Edge-Based Newton; Edge-Based EM; Edge-Based Quasi-EM; Edge-Based Quasi-EM GS]

Conclusions • We have developed an algorithm for automatically estimating the Manhattan frame from a single camera. • This algorithm is 40% more accurate and roughly 3 times faster than the leading prior method. • This algorithm will be used as a basis for single-view reconstruction of urban scenes.

Single-View Reconstruction • Potential Research Objectives for Phase IV • Recover connected Manhattan cuboids • Connected, labelled line segments • Connected, labelled rectangular facets • Estimate scale factor • From pedestrian, vehicle traffic • From building features whose size is approximately known (e.g., doors) • Integrate with other data sources • Existing 3D models on coarser scale • 3D models from cameras with overlapping fields of view

Projects: Pre-Attentive and Attentive Sensing • [Figure labels: foveal image, wide-field image, pan, tilt]

Statistical Integration of Weak Cues • [Figure: four panels titled Motion Region, Foreground Region, Skin Region and Joint Region Log Likelihood Ratio, each on a scale from -4 to 4]
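One common way to combine weak cues of this kind is to add their log likelihood ratios, which is exact only if the cues are conditionally independent given the class. The slide does not say how the joint map is computed, so the sketch below is an assumption-laden illustration rather than the authors' method; the function name and the toy values are hypothetical.

```python
def joint_llr(cue_maps):
    """Sum per-cue log likelihood ratio maps element by element.

    cue_maps: list of equally sized 2D lists, one per cue.
    Assumes the cues are conditionally independent (naive Bayes).
    """
    rows, cols = len(cue_maps[0]), len(cue_maps[0][0])
    return [[sum(m[r][c] for m in cue_maps) for c in range(cols)]
            for r in range(rows)]

# Toy 1x2 maps: the first region is weakly supported by every cue,
# the second is opposed by motion and foreground despite a weak skin cue.
motion = [[1.0, -2.0]]
foreground = [[0.5, -1.0]]
skin = [[2.0, 0.5]]
joint = joint_llr([motion, foreground, skin])
```

A positive joint value favours the person hypothesis for that region and a negative value favours the background, so several individually weak cues can still produce a confident decision.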

Wide-Field Person Detection

Attentive Feedback Loop • [Figure: block diagram with blocks labelled confirmed face location, mean body indicator, motion kernel, spatial prior, prior, likelihood, posterior, random sampler, gaze command, gaze control, attentive sensor, high-resolution face detection and non-max suppression]
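The diagram's labels (prior, likelihood, posterior, random sampler, motion kernel) suggest a recursive Bayesian loop. The sketch below shows one plausible reading on a 1D grid of candidate gaze locations: combine prior and likelihood into a posterior, sample a gaze target from it, then blur the posterior with a motion kernel to form the next prior. The exact wiring of the authors' system is not recoverable from the slide, so the structure, function names and numbers here are all assumptions.

```python
import random

def normalise(weights):
    """Scale non-negative weights so they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]

def bayes_update(prior, likelihood):
    """Posterior over locations: proportional to prior times likelihood."""
    return normalise([p * l for p, l in zip(prior, likelihood)])

def diffuse(belief, kernel):
    """Spread belief with an odd-length motion kernel, clamping at the ends."""
    half = len(kernel) // 2
    out = [0.0] * len(belief)
    for i, b in enumerate(belief):
        for k, w in enumerate(kernel):
            j = min(max(i + k - half, 0), len(belief) - 1)
            out[j] += b * w
    return normalise(out)

def sample_gaze(posterior, rng):
    """Draw one location index with probability given by the posterior."""
    return rng.choices(range(len(posterior)), weights=posterior, k=1)[0]

rng = random.Random(0)                # fixed seed for repeatability
prior = [0.25, 0.25, 0.25, 0.25]      # hypothetical flat spatial prior
likelihood = [0.1, 0.1, 0.7, 0.1]     # hypothetical wide-field evidence
posterior = bayes_update(prior, likelihood)
gaze = sample_gaze(posterior, rng)    # location to point the sensor at
next_prior = diffuse(posterior, [0.25, 0.5, 0.25])
```

Sampling from the posterior instead of always choosing its maximum lets the sensor occasionally visit less likely locations, which is one reason a random sampler might appear in such a loop.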

Attentive High-Res Video Surveillance

Pose-Invariant Face Recognition (with Simon Prince, UCL)

Projects: 3D Facial Estimation and Modelling

Scene Dynamics • Potential Research Objectives for Phase IV • Person re-identification • Individuation (counting) in crowds

Scene Segmentation

Using Prior Knowledge: Example

Experimental Results

Scene Segmentation • Potential Research Objectives for Phase IV • Application to urban scenes • Scene layout • Ground plane • Buildings • Vegetation • Sky • Material recognition • Integrated text recognition
