Aerial Video Surveillance and Exploitation Roland Miezianko CIS 750 - Video Processing and Mining Prof. Latecki
Agenda • Aerial Surveillance Comparisons • Technical Challenges and the Mission • Framework Ideas for Video Surveillance • Alignment and Change Detection • Mosaicing • Tracking Moving Objects • Geo-location • Enhanced Visualization • Image Mosaics
Types of Aerial Surveillance • Using film and framing cameras • High-resolution still images • Examined by a human or machine • Video captures dynamic events • Used to detect and geo-locate moving objects in real time • Follow detected motion • Constantly monitor a site
Technical Challenges, 1 • Video cameras have lower resolution than framing cameras • Video uses a telephoto lens to gain the resolution needed to identify objects • A telephoto lens means a narrow field of view • Provides a “soda straw” view of the scene [2]
Technical Challenges, 2 • Camera must scan the region of interest to get the “full-picture” • Objects of interest move in and out of the field of view • Difficulty in perceiving object relative locations
Technical Challenges, 3 • Challenge in manually tracking an object due to the camera’s small field of view • Video contains much more data than film frames; storage is expensive
The Mission • The new aerial surveillance systems must provide a framework for spatio-temporal aerial video analysis
Video Analysis Framework, 1 • Frame-to-frame alignment and decomposition of video frames into motion layers • Mosaicing static background layers to form panoramas as compact representations of the static scene
Video Analysis Framework, 2 • Detecting and tracking independently moving objects in the presence of background clutter • Geo-locating the video and tracked objects by registering it to controlled reference imagery, digital terrain maps, and models
Video Analysis Framework, 3 • Enhanced visualization of the video by re-projecting and merging it with reference imagery, terrain, and maps to provide a larger context
Alignment and Change Detection, 1 • Displacement of pixels between video frames may occur due to the following: • Motion of the video sensor • Independent motion of objects in the field of view • Motion of the source of illumination
Alignment and Change Detection, 2 • Global motion estimation • Displacement of pixels due to the motion of the sensor is computed • Alignment of video frames • Pyramid processing • Lock onto the motion of the background scene • Warp images into a common coordinate frame
Alignment and Change Detection, 3 • Moving objects are detected by aligning video frames and detecting pixels with poor correlation across the temporal domain
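The chapter [2] describes pyramid-based direct alignment; as a rough illustration only, the sketch below substitutes ORB feature matching with a RANSAC homography for the global motion estimate, then flags poorly correlated pixels after warping. The function name and threshold are illustrative, not from the source.

import cv2
import numpy as np

def align_and_detect_changes(prev_gray, curr_gray, diff_thresh=30):
    # Global (background) motion estimate from sparse feature matches;
    # RANSAC keeps the dominant motion even with some independently moving objects.
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(curr_gray, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)

    # Warp the previous frame into the current frame's coordinates and mark
    # pixels whose intensities still disagree as change (motion) candidates.
    h, w = curr_gray.shape
    warped_prev = cv2.warpPerspective(prev_gray, H, (w, h))
    change_mask = cv2.absdiff(curr_gray, warped_prev) > diff_thresh
    return H, change_mask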
Mosaicing, 1 • Images are accumulated into the mosaic as the camera pans • Construction of a 2D mosaic requires computation of alignment parameters that relate all of the images in the collection to a common world coordinate system
Mosaicing, 2 • Transformation parameters are used to warp the images into the mosaic coordinate system • Warped images are then combined to form a mosaic • To avoid seams, warped frames are merged in the Laplacian pyramid domain
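Burt and Adelson's multiresolution spline [3] is the seam-hiding merge referred to above. A minimal sketch of that pyramid-domain blend, assuming single-channel float frames already warped into the mosaic coordinate system (function names are illustrative, not from [3]):

import cv2
import numpy as np

def gaussian_pyramid(img, levels):
    # Gaussian pyramid with `levels` downsampling steps.
    pyr = [img.astype(np.float32)]
    for _ in range(levels):
        pyr.append(cv2.pyrDown(pyr[-1]))
    return pyr

def laplacian_pyramid(img, levels):
    # Band-pass residuals plus the coarsest Gaussian level.
    gp = gaussian_pyramid(img, levels)
    lp = []
    for i in range(levels):
        up = cv2.pyrUp(gp[i + 1], dstsize=(gp[i].shape[1], gp[i].shape[0]))
        lp.append(gp[i] - up)
    lp.append(gp[-1])
    return lp

def blend_frames(frame_a, frame_b, mask, levels=4):
    # Merge two warped frames in the Laplacian pyramid domain to hide seams.
    # `mask` is 1.0 where frame_a should dominate and 0.0 where frame_b should.
    la, lb = laplacian_pyramid(frame_a, levels), laplacian_pyramid(frame_b, levels)
    gm = gaussian_pyramid(mask.astype(np.float32), levels)
    blended = [gm[i] * la[i] + (1.0 - gm[i]) * lb[i] for i in range(levels + 1)]
    # Collapse the blended pyramid back into a single image.
    out = blended[-1]
    for i in range(levels - 1, -1, -1):
        out = cv2.pyrUp(out, dstsize=(blended[i].shape[1], blended[i].shape[0])) + blended[i]
    return np.clip(out, 0, 255).astype(np.uint8)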
Tracking Moving Objects, 1 • Scene analysis includes operations that interpret the source video in terms of objects and activities in the scene • Moving objects are detected and tracked over the cluttered scene
Tracking Moving Objects, 2 • State of each moving object is represented by its: • Motion • Appearance • Shape • The state is updated at each instant of time using the Expectation-Maximization (EM) algorithm
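The slides do not spell out the exact state model, so as one hedged illustration, the sketch below runs a few EM iterations on a two-component Gaussian appearance model (object vs. background intensities); the real tracker jointly updates motion, appearance, and shape.

import numpy as np

def em_appearance_update(pixels, mu, sigma, pi, n_iter=10):
    # pixels: 1-D numpy array of intensities from the object's image region.
    # mu, sigma, pi: length-2 arrays (e.g. object and background components).
    for _ in range(n_iter):
        # E-step: responsibility of each component for each pixel.
        lik = np.stack([
            pi[k] * np.exp(-0.5 * ((pixels - mu[k]) / sigma[k]) ** 2)
            / (sigma[k] * np.sqrt(2.0 * np.pi))
            for k in range(2)
        ])
        resp = lik / (lik.sum(axis=0, keepdims=True) + 1e-12)
        # M-step: re-estimate weights, means, and spreads from responsibilities.
        nk = resp.sum(axis=1) + 1e-12
        pi = nk / len(pixels)
        mu = (resp @ pixels) / nk
        sigma = np.sqrt((resp * (pixels - mu[:, None]) ** 2).sum(axis=1) / nk) + 1e-6
    return mu, sigma, pi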
Geo-location • The video surveillance system must also determine the geodetic coordinates of objects within the camera’s field of view • More precise geo-locations can be estimated by aligning video frames to calibrated reference images
Enhanced Visualization • Challenging aspect of aerial video surveillance is formatting video imagery for effective presentation to an operator • The “soda straw” view makes direct observation tedious and disorienting
Mosaic-Based Display • The mosaic-based display de-couples the observer’s view from the camera • Operator may scroll or zoom to examine one region of the mosaic even as the camera is updating another region of the mosaic
Elements of Mosaic Display (block diagram): camera, estimate displacement (ED), warp, pyramid merge, image accumulating memory, update window, operator’s display
Pseudocode of the main algorithm [5]
read(base_image);
read(unregistered_image);
base_image = expand(base_image);
confirm three pairs of matched points between base_image and unregistered_image;
calculate initial matrix M;
apply Levenberg-Marquardt minimization to update M;
M = inverse(M);
resample and apply blending function to render the mosaic;
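A rough Python/OpenCV rendering of the same steps, with hypothetical file names and point coordinates, and with the Levenberg-Marquardt refinement omitted (a sketch of that step appears with the registration slides below):

import numpy as np
import cv2

# Hypothetical file names; the original project [5] uses its own images
# and hand-confirmed correspondences.
base = cv2.imread("base_image.png")
unreg = cv2.imread("unregistered_image.png")

# Three hand-confirmed point pairs give the initial matrix M (an affine
# transform; a full projective M would need four pairs).
pts_unreg = np.float32([[25, 35], [190, 60], [115, 230]])
pts_base = np.float32([[30, 40], [200, 50], [120, 220]])
M = cv2.getAffineTransform(pts_unreg, pts_base)

# Because the correspondences already map unregistered -> base, no explicit
# matrix inversion is needed before warping.
h, w = base.shape[:2]
warped = cv2.warpAffine(unreg, M, (w, h))

# A simple 50/50 blend stands in for the blending function of the pseudocode.
mosaic = cv2.addWeighted(base, 0.5, warped, 0.5, 0)
cv2.imwrite("mosaic.png", mosaic)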
Homogeneous Coordinates Using homogeneous coordinates, we can describe the class of 2D planar projective transformations with a single matrix multiplication [4]:
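In the notation used later for the motion parameters m_0 … m_8, a standard form consistent with [4] is:

\begin{bmatrix} x' \\ y' \\ w' \end{bmatrix} \sim
\begin{bmatrix} m_0 & m_1 & m_2 \\ m_3 & m_4 & m_5 \\ m_6 & m_7 & m_8 \end{bmatrix}
\begin{bmatrix} x \\ y \\ w \end{bmatrix}

with actual pixel coordinates recovered as (x'/w', y'/w').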
Rigid Transformation The same hierarchy of transformations exists in 3D. A rigid (Euclidean) transformation combines a rotation and a translation, where R is a 3 × 3 orthonormal rotation matrix and t is a 3D translation vector.
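Written in homogeneous coordinates, a standard form is:

\begin{bmatrix} \mathbf{p}' \\ 1 \end{bmatrix} =
\begin{bmatrix} R & \mathbf{t} \\ \mathbf{0}^{T} & 1 \end{bmatrix}
\begin{bmatrix} \mathbf{p} \\ 1 \end{bmatrix}

i.e. p' = R p + t.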
Viewing Matrix The 3 × 4 viewing matrix V projects 3D points through the origin onto a 2D projection plane a distance f along the z axis.
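One common form, assuming unit scaling in x and y and focal length f, is:

V = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1/f & 0 \end{bmatrix}

so that a point (x, y, z, 1) maps to (x, y, z/f), i.e. screen coordinates (f x / z, f y / z).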
Combined Equations The combined equations projecting a 3D world coordinate p = (x, y, z, w) onto a 2D screen location u = (x', y', w') can thus be written as u ~ P p, where P is a 3 × 4 camera matrix. This equation is valid even if the camera calibration parameters and/or the camera orientation are unknown.
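Composing the rigid transformation with the viewing matrix gives one standard factorization of the camera matrix:

\mathbf{u} \sim P \mathbf{p}, \qquad
P = V \begin{bmatrix} R & \mathbf{t} \\ \mathbf{0}^{T} & 1 \end{bmatrix}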
Local Image Registration, 1 • How do we compute the transformations relating the various scene pieces so that we can paste them together? • We could manually identify four or more corresponding points between the two views • Manual approaches are too tedious to be useful
Local Image Registration, 2 • Alternatively, directly minimize the intensity discrepancy between the two images • This has the advantages of not requiring any easily identifiable feature points and of being statistically optimal, that is, giving the maximum likelihood estimate once we are in the vicinity of the true solution • Rewrite our 2D transformations in terms of the individual motion parameters:
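Fixing m_8 = 1 (the normalization implied by the eight parameters m_0 … m_7 used on the next slides), the per-pixel mapping becomes:

x' = \frac{m_0 x + m_1 y + m_2}{m_6 x + m_7 y + 1}, \qquad
y' = \frac{m_3 x + m_4 y + m_5}{m_6 x + m_7 y + 1}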
Minimizes Intensity Errors The technique minimizes the sum of the squared intensity errors over all corresponding pairs of pixels i that fall inside both images I(x, y) and I’(x’, y’). Pixels that are mapped outside the image boundaries do not contribute.
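The objective is the sum of squared per-pixel residuals:

E = \sum_i \left[ I'(x'_i, y'_i) - I(x_i, y_i) \right]^2 = \sum_i e_i^2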
Minimization To perform the minimization, we use the Levenberg-Marquardt iterative nonlinear minimization algorithm. This algorithm requires computation of the partial derivatives of e_i with respect to the unknown motion parameters {m_0, …, m_7}.
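A minimal sketch of this direct, Levenberg-Marquardt-based registration, assuming two grayscale float images of the same size and using SciPy's LM solver (function names below are illustrative, not from the references):

import numpy as np
from scipy.optimize import least_squares
from scipy.ndimage import map_coordinates

def params_to_matrix(m):
    # Assemble the 3x3 projective matrix from the eight free parameters m0..m7.
    return np.array([[m[0], m[1], m[2]],
                     [m[3], m[4], m[5]],
                     [m[6], m[7], 1.0]])

def residuals(m, I, I2):
    # e_i = I'(x'_i, y'_i) - I(x_i, y_i) over every pixel of I.
    H = params_to_matrix(m)
    h, w = I.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)]).astype(float)
    xp, yp, wp = H @ pts
    xp, yp = xp / wp, yp / wp
    # Pixels mapped outside the second image do not contribute.
    inside = (xp >= 0) & (xp <= w - 1) & (yp >= 0) & (yp <= h - 1)
    sampled = map_coordinates(I2, [yp, xp], order=1, mode='constant', cval=0.0)
    return (sampled - I.ravel()) * inside

def register(I, I2):
    # Start from the identity mapping and refine with Levenberg-Marquardt.
    m0 = np.array([1, 0, 0, 0, 1, 0, 0, 0], dtype=float)
    fit = least_squares(residuals, m0, args=(I, I2), method='lm')
    return params_to_matrix(fit.x)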
Conclusion • The techniques presented here automatically register video frames into 2D and partial 3D scene models. • Video mosaics and related techniques will enable an even more exciting range of interactive computer graphics, telepresence, and virtual reality applications.
References
[1] Yap-Peng Tan, Sanjeev R. Kulkarni, and Peter J. Ramadge. Automatic Panoramic Image Construction. Princeton University, Department of Electrical Engineering.
[2] Rakesh Kumar. Aerial Video Surveillance and Exploitation, Chapter 2.
[3] Peter J. Burt and Edward H. Adelson. A Multiresolution Spline with Application to Image Mosaics. RCA David Sarnoff Research Center.
[4] Richard Szeliski. Video Mosaics for Virtual Environments. IEEE Computer Graphics and Applications, 16(2):22–30, March 1996.
[5] Jingbin Wang. Project 1: Image Mosaics. CS580: Advanced Graphics, Boston University.