Urban Scene Analysis

Urban Scene Analysis James Elder & Patrick Denis York University

Phase IV Objectives • Single-View 3D Reconstruction • Scene Dynamics • Scene Segmentation and Labelling

Single-View Reconstruction

Ultimate Goal • Our ultimate goal is to automate the single-view reconstruction process!

Immediate Goal • Automatic estimation of the three vanishing points corresponding to the “Manhattan directions”.

Manhattan Frame Geometry • An edge is aligned with a vanishing point if the normal of its interpretation plane is orthogonal to the vanishing point vector on the Gaussian sphere (i.e., their dot product is 0)
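
The orthogonality test above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a pinhole camera with a known focal length in pixels and a known principal point, and the function names (interpretation_plane_normal, alignment_error) are hypothetical.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(a):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)

def interpretation_plane_normal(p1, p2, focal, center=(0.0, 0.0)):
    """Unit normal of the plane through the camera centre and an image edge.

    p1, p2: edge endpoints in pixels; focal and center are assumed known.
    """
    r1 = (p1[0] - center[0], p1[1] - center[1], focal)  # ray through p1
    r2 = (p2[0] - center[0], p2[1] - center[1], focal)  # ray through p2
    return normalize(cross(r1, r2))

def alignment_error(normal, vp_direction):
    """Absolute dot product; 0 means the edge is exactly aligned."""
    return abs(dot(normal, normalize(vp_direction)))

# A vertical image edge and two candidate vanishing directions.
n = interpretation_plane_normal((100.0, -50.0), (100.0, 50.0), focal=500.0)
err_vertical = alignment_error(n, (0.0, 1.0, 0.0))
err_horizontal = alignment_error(n, (1.0, 0.0, 0.0))
```

Here the vertical edge gives a zero error against the vertical direction and a large error against the horizontal one. In practice a tolerance on the error would replace the exact zero test, since measured edges are noisy.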

Mixture Model • Each edge Eij in the image is generated by one of four possible kinds of scene structure: m1-3, a line in one of the three Manhattan directions, or m4, non-Manhattan structure • The observable properties of each edge Eij are its position and angle • The likelihoods of these observations are co-determined by the causal process (m1-4) and the rotation Ψ of the Manhattan frame relative to the camera • [Figure: image edges E11, E12, E21, E22, each with a cause mi, all depending on Ψ]
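
To make the mixture concrete, the sketch below computes the posterior probability of each cause (m1 to m4) for a single edge, given a candidate Manhattan frame. The slides do not specify the likelihood functions, so everything numeric here is an assumption: a Gaussian on the dot product between the edge's interpretation plane normal and each Manhattan direction, a constant density for non-Manhattan structure, and equal priors. The function name cause_posteriors is hypothetical.

```python
import math

def cause_posteriors(normal, frame_axes, sigma=0.05, outlier_density=0.5,
                     priors=(0.25, 0.25, 0.25, 0.25)):
    """Posterior over causes (m1, m2, m3, m4) for one edge.

    normal: unit interpretation-plane normal of the edge.
    frame_axes: three unit vectors, the Manhattan directions expressed in
        camera coordinates (the frame rotation applied to the axes).
    sigma, outlier_density, priors: illustrative values, not from the slides.
    """
    likelihoods = []
    for axis in frame_axes:
        d = sum(n * a for n, a in zip(normal, axis))  # 0 when aligned
        likelihoods.append(math.exp(-0.5 * (d / sigma) ** 2)
                           / (sigma * math.sqrt(2.0 * math.pi)))
    likelihoods.append(outlier_density)  # m4: non-Manhattan structure
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

# An edge whose normal is orthogonal only to the third Manhattan direction.
s = 1.0 / math.sqrt(2.0)
axes = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
posterior = cause_posteriors((s, s, 0.0), axes)
```

With these assumed parameters, most of the posterior mass falls on the third direction and a small remainder on the non-Manhattan cause.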

Mixture Model • Our goal is to estimate the Manhattan frame Ψ from the observable data Eij.

E-M Algorithm • M Step: given estimates of the mixture probabilities for each edge, update our estimate of the Manhattan coordinate frame
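
A simplified version of this update can be sketched for a single Manhattan direction. This is an illustration under stated assumptions, not the method used in the slides: it takes the mixture probabilities for one direction as weights and finds the unit vector v that minimises the weighted sum of squared dot products with the edge normals. It does not enforce orthogonality between the three directions; a full M step would optimise over the rotation Ψ. The function name estimate_direction is hypothetical, and power iteration is used only to keep the example free of external libraries.

```python
import math

def estimate_direction(normals, weights, iterations=200):
    """Weighted least-squares direction v minimising sum_i w_i (n_i . v)^2."""
    # Scatter matrix S = sum_i w_i n_i n_i^T (3x3, symmetric, PSD).
    S = [[0.0] * 3 for _ in range(3)]
    for n, w in zip(normals, weights):
        for r in range(3):
            for c in range(3):
                S[r][c] += w * n[r] * n[c]
    trace = S[0][0] + S[1][1] + S[2][2]
    # A = trace*I - S shares eigenvectors with S but reverses the eigenvalue
    # order, so power iteration on A converges to S's smallest eigenvector.
    A = [[(trace if r == c else 0.0) - S[r][c] for c in range(3)]
         for r in range(3)]
    v = [1.0 / math.sqrt(3.0)] * 3  # arbitrary start; assumed not orthogonal to the answer
    for _ in range(iterations):
        u = [sum(A[r][c] * v[c] for c in range(3)) for r in range(3)]
        norm = math.sqrt(sum(x * x for x in u))
        v = [x / norm for x in u]
    return v

# Three edges consistent with direction (0, 0, 1), plus one edge that the
# mixture probabilities have assigned zero weight for this direction.
normals = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.6, 0.8, 0.0), (0.0, 0.0, 1.0)]
weights = [1.0, 1.0, 1.0, 0.0]
direction = estimate_direction(normals, weights)
```

Alternating this kind of update with a recomputation of the per-edge mixture probabilities gives the usual EM iteration. A library eigen-solver would normally replace the power iteration.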

Design Criteria • Accuracy • Speed

Design Decisions • Features: dense gradient map; sparse sub-pixel localized edges • Measurement Space: image; Gauss sphere • Search Method: Coarse-to-Fine (Coughlan & Yuille 2001); Quasi-Newton; EM; Quasi-EM

Accuracy [Figure: bar chart of angular error (deg), axis 0 to 12, by error type (horizontal VPs, vertical VP). Methods: MW; Edge-Based Coarse-to-Fine MW Params; Edge-Based Newton MW Params; Edge-Based Newton; Edge-Based EM; Edge-Based Quasi-EM; Edge-Based Quasi-EM GS]

Speed [Figure: bar chart of time (sec) by method, on a broken axis covering 0 to 25 and 140 to 160 sec. Methods: MW; Edge-Based Coarse-to-Fine MW Params; Edge-Based Newton MW Params; Edge-Based Newton; Edge-Based EM; Edge-Based Quasi-EM; Edge-Based Quasi-EM GS]

Single-View Reconstruction • Potential Research Objectives for Phase IV • Recover connected Manhattan cuboids: connected, labelled line segments; connected, labelled rectangular facets • Estimate scale factor: from pedestrian and vehicle traffic; from building features whose size is approximately known (e.g., doors) • Integrate with other data sources: existing 3D models on a coarser scale; 3D models from cameras with overlapping fields of view

Projects: Pre-Attentive and Attentive Sensing [Figure labels: foveal image, tilt, pan, wide-field image]

Statistical Integration of Weak Cues [Figure: four log likelihood ratio maps, each on a scale of -4 to 4: motion region, foreground region, skin region, joint region]

Attentive Feedback Loop [Figure: block diagram with labels: attentive sensor, motion kernel, mean body indicator, likelihood, spatial prior, prior, posterior, random sampler, non-max suppression, gaze command, gaze control, high-resolution face detection, confirmed face location]

Wide-Field Person Detection

Attentive High-Res Video Surveillance

Attentive Snapshots

Automatically Confirmed High-Resolution Faces

Pose-Invariant Face Recognition (with Simon Prince, UCL)

Projects: 3D Facial Estimation and Modelling

Scene Dynamics • Potential Research Objectives for Phase IV • Person re-identification • Individuation (counting) in crowds

Using Prior Knowledge: Example

Experimental Results

Experimental Results [Figure: bar chart of mean relative errors, axis 0 to 0.45, for SS vs. MS, RC vs. MS, and EJ vs. MS]

Scene Segmentation • Potential Research Objectives for Phase IV • Application to urban scenes • Scene layout: ground plane, buildings, vegetation, sky • Material recognition • Integrated text recognition
