380 likes | 575 Views
Geo-Spatial Aerial Processing for Scene Understanding and Object Tracking. Jiangjian Xiao, Hui Cheng, Feng Han, Harpreet Sawhney. Problem. Given Aerial Video Understand the Scene Find buildings Trees Roads Cars Use understanding Object Detection Tracking Cool Idea
E N D
Geo-Spatial Aerial Processing for Scene Understanding and Object Tracking Jiangjian Xiao, Hui Cheng, Feng Han, Harpreet Sawhney
Problem • Given Aerial Video • Understand the Scene • Find buildings • Trees • Roads • Cars • Use understanding • Object Detection • Tracking • Cool Idea • Trees and buildings are in 3D
Related Work CVPR 2006 Hui Cheng, Darren Butler and Chumki Basu ViTex: Video To Tex and Its Applications in Aerial Video Survellance.
Related Work CVPR2008 Jake Porway, Kristy Wang, Benjamin Yao, Song Chun Zhu A Hierarchical and Contextual Model for Aerial Image Understanding
System Overview Stage 1 Initial camera location Geo-reference image Input Frames Pose estimation Depth estimation Geo-registration Non-ground object detection Planar + depth extension for structure detection Road Detection GIS Stage 2 Scene segmentation output
Stage1 Stage 1 Initial camera location Geo-reference image Input Frames Pose estimation Depth estimation Geo-registration
GeoRegistration GPS Aircraft Parameters Camera Parameters Geo-reference image Meta Data Input Frames Geo-registration
GeoRegistration Frame To Frame transformations SIFT matching GPS Aircraft Parameters Camera Parameters Bundle Adjustment
Stage1 Stage 1 Initial camera location Geo-reference image Input Frames Pose estimation Depth estimation Geo-registration
Adjusting camera position • Metadata Gives camera position • Along with many other parameters • Metadata has error • In all parameters • Georegistration overcomes error • Returns a 3x3 homography matrix • Want to figure out the exact camera position
Adjusting camera position Ground Point Image Point Project Ground point to image
Adjusting camera position Alternatively the point can be projected using homography obtained from georegistration Get rid of translation parameters
Adjusting camera position Extract rotation and calibration parameters using SVD smooth and Using Kalman filter Use refined and to estimate translation parameters
Stage1 Stage 1 Initial camera location Geo-reference image Input Frames Pose estimation Depth estimation Geo-registration
Depth Estimation • Use graphcuts to estimate depth • A difficult task due to poor image quality, and unconstrained motion • Solution • Fuse depthmaps • Project several depthmaps unto the DOQ • Take their average • Smooth out the average map • Depth is quantized along Z direction
Stage 2 Non-ground object detection Planar + depth extension for structure detection Road Detection GIS Stage 2 Scene segmentation output
Detect Non-Ground Regions Threshold Depth Map
Stage 2 Non-ground object detection Planar + depth extension for structure detection Road Detection GIS Stage 2 Scene segmentation output
Detect Roofs Threshold Depth Map Fit Plane Remove Trees
“Roof” Refinement Fit a plane to the detected “roofs”. We have a set of x,y,z points Want to fit
“Roof” refinement v u z Z Depth Along u Must be invariant Z Y
Building Detection Extend Roof To Ground Gives Building height
Tree Detector Classify each pixel as tree non-tree 9D Gaussian Mixture Color, Depth, Texture Supervised offline training
Stage 2 Non-ground object detection Planar + depth extension for structure detection Road Detection GIS Stage 2 Scene segmentation output
GIS constrained Road Detection Want to determine Precise road center Road Width Road Information Provided by GIS
Training • Sample Patches along roads • Align patches along road direction • Extract Features • Color • Gradient • Feature Vector = histogram of color and gradients • Model: Gaussian Mixture model • Offline Training
Detection Align the Road Extract patches Feed patches into MOG model Response of the model Gradient Histogram Gives Road center Peaks Give Road bounds
Object Detection • Stabilization • Optical flow warping • Depth warping
Tracking with/without depth without depth with depth
Tracking with/without depth without depth with depth
Quantitative Results Multiple object racking accuracy False acceptance count False rejection count False identity switches Ground truth object count
Quantitative Results MOTA improvement: 0.740 to 0.851 (15% improvement) FAR improvement: 0.190 to 0.072 (62% improvement)