MULTI-TARGET TRACKING THROUGH OPPORTUNISTIC CAMERA CONTROL IN A RESOURCE CONSTRAINED MULTIMODAL SENSOR NETWORK. Jayanth Nayak1, Luis Gonzalez-Argueta2, Bi Song2, Amit Roy-Chowdhury2, Ertem Tuncel2. Department of Electrical Engineering, University of California, Riverside.
MULTI-TARGET TRACKING THROUGH OPPORTUNISTIC CAMERA CONTROL IN A RESOURCE CONSTRAINED MULTIMODAL SENSOR NETWORK
Jayanth Nayak1, Luis Gonzalez-Argueta2, Bi Song2, Amit Roy-Chowdhury2, Ertem Tuncel2
Department of Electrical Engineering, University of California, Riverside
Bourns College of Engineering, Information Processing Laboratory
www.ipl.ee.ucr.edu
ICDSC'08
Overview
• Introduction
• Problem Formulation
• Audio and Video Processing
• Camera Control Strategy
• Computing Final Tracks of All Targets
• Experimental Results
• Conclusion
• Acknowledgements
Motivation
• Obtaining multi-resolution video from a highly active environment requires a large number of cameras.
• Disadvantages of using many cameras:
  • Cost of buying, installing, and maintaining them
  • Bandwidth limitations
  • Processing and storage
  • Privacy
• Our goal: minimize the number of cameras by using a control mechanism that directs the cameras' attention to the interesting parts of the scene.
Proposed Strategy
• Audio sensors direct the pan/tilt/zoom of the camera to the location of the event.
• Audio data intelligently turns the camera on, and video data turns the camera off.
• Audio and video data are fused to obtain tracks of all targets in the scene.
Example Scenario
An example scenario where audio can be used to efficiently control two video cameras. Four tracks need to be inferred. The time instants of interest are indicated directly on the tracks: the initiation and end of each track, mergings, splittings, and crossovers. The mergings and crossovers are further emphasized by X. The two innermost tracks coincide throughout the time interval (t2, t3). The cameras C1 and C2 must be panned, tilted, and zoomed based on their own output and that of the audio sensors a1, ..., aM.
Relation To Previous Work
• Prior work: fusion of simultaneous audio and video data.
  • Our audio and video data are captured over disjoint time intervals.
• Prior work: dense networks of vision sensors.
  • To cover a large field, we focus on controlling a reduced set of vision sensors.
• We analyze video and audio data from dynamic scenes.
Problem Formulation
• Audio sensors A = {a1, ..., aM} are distributed across the ground plane R.
• R is also observable from a set of controllable cameras C = {c1, ..., cL}.
• However, the entire region R may not be covered by a single set of camera settings.
• p-tracks: tracks belonging to targets
• a-tracks: tracks obtained by clustering audio
• Resolving p-track ambiguity
  • Camera control
  • Person matching
Tracking System Overview
Overall camera control system. Audio sensors A = {a1, ..., aM} are distributed across regions Ri. The set of audio clusters is denoted by Bt, and Kt− represents the set of confirmed a-tracks estimated from observations before time t. P/T/Z cameras are denoted by C = {c1, ..., cL}. Ground plane positions are denoted by Otk.
Processing Audio and Video
• a-tracks are clusters of audio data that are above an amplitude threshold.
• a-tracks are tracked using a Kalman filter.
• In video, people are detected using histograms of oriented gradients and tracked using an auxiliary particle filter.
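The Kalman filtering step for a-tracks can be illustrated with a minimal sketch. It assumes a one-dimensional ground-plane position (as in the location-versus-time plots on later slides) and a constant-velocity motion model; the slides do not state the state model or noise values, so the class name `ATrackFilter` and every parameter below are hypothetical.

```python
class ATrackFilter:
    """1-D constant-velocity Kalman filter (assumed model, assumed noise values)."""

    def __init__(self, first_pos, meas_var=0.25, q_pos=0.01, q_vel=0.01,
                 init_vel_var=100.0):
        # State: position and velocity; velocity is unknown at start.
        self.pos, self.vel = first_pos, 0.0
        # 2x2 state covariance, stored as nested lists.
        self.p = [[meas_var, 0.0], [0.0, init_vel_var]]
        self.meas_var, self.q_pos, self.q_vel = meas_var, q_pos, q_vel

    def predict(self, dt):
        """Propagate state and covariance: x <- F x, P <- F P F^T + Q."""
        (p00, p01), (p10, p11) = self.p
        self.pos += dt * self.vel
        self.p = [
            [p00 + dt * (p01 + p10) + dt * dt * p11 + self.q_pos, p01 + dt * p11],
            [p10 + dt * p11, p11 + self.q_vel],
        ]

    def update(self, z):
        """Correct the state with a position measurement z (H = [1, 0])."""
        (p00, p01), (p10, p11) = self.p
        innovation = z - self.pos
        s = p00 + self.meas_var          # innovation variance
        k0, k1 = p00 / s, p10 / s        # Kalman gain
        self.pos += k0 * innovation
        self.vel += k1 * innovation
        # P <- (I - K H) P
        self.p = [
            [(1 - k0) * p00, (1 - k0) * p01],
            [p10 - k1 * p00, p11 - k1 * p01],
        ]


def track(positions, dt=1.0):
    """Run the filter over a sequence of audio-cluster positions."""
    kf = ATrackFilter(positions[0])
    for z in positions[1:]:
        kf.predict(dt)
        kf.update(z)
    return kf.pos, kf.vel
```

For example, `track([float(t) for t in range(21)])` follows a target moving at one meter per second and returns a position and velocity estimate for the last time step.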
Mapping From Image Plane to Ground Plane
• Learned parameters are used to transform tracks from the image plane to the ground plane.
• The projective transformation matrix H is estimated during a calibration phase.
• H is precomputed for each PTZ setting of each camera.
[Figure label: vanishing line]
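Applying a precomputed H to a track can be sketched as follows. This is a minimal illustration, assuming H is a 3x3 matrix acting on homogeneous image coordinates and that the precomputed matrices are stored in a lookup table keyed by camera and PTZ setting. The table `HOMOGRAPHIES`, its keys, and its numeric values are made up for the example and are not calibration results from the slides.

```python
from typing import Dict, List, Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]


def image_to_ground(h: Matrix3, u: float, v: float,
                    eps: float = 1e-12) -> Tuple[float, float]:
    """Map image point (u, v) to the ground plane with homography h."""
    x = h[0][0] * u + h[0][1] * v + h[0][2]
    y = h[1][0] * u + h[1][1] * v + h[1][2]
    w = h[2][0] * u + h[2][1] * v + h[2][2]
    if abs(w) < eps:
        # Image points on the vanishing line of the ground plane map to infinity.
        raise ValueError("point lies on the vanishing line")
    return x / w, y / w


# Hypothetical precomputed table: (camera id, PTZ setting id) -> H.
HOMOGRAPHIES: Dict[Tuple[int, int], Matrix3] = {
    (0, 0): [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    (0, 1): [[2.0, 0.0, 1.0], [0.0, 2.0, 3.0], [0.0, 0.0, 2.0]],
}


def track_to_ground(camera: int, setting: int,
                    track: Sequence[Tuple[float, float]]
                    ) -> List[Tuple[float, float]]:
    """Transform an image-plane track using the H stored for this PTZ setting."""
    h = HOMOGRAPHIES[(camera, setting)]
    return [image_to_ground(h, u, v) for u, v in track]
```

Keying the table by PTZ setting reflects the slide's point that H changes whenever the camera is re-aimed, so a lookup replaces recalibration at run time.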
Tracking System Overview
Camera Control
• Goal: avoid ambiguity, or disambiguate, when tracks
  • are created or deleted
  • intersect
  • merge
• Set pan/tilt/zoom parameters
Setting Camera Parameters
• Heuristic algorithm
• Cover the ground plane by regions Ril
  • Ril lies in the field of view of camera cl
  • Camera parameters
• The tracking algorithm specifies a point of interest x from the last known a-track
• If no camera is on, find an Ril containing x
• Reassign a camera and set its parameters if x approaches the boundary of the current Ril
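The region-selection heuristic above can be sketched in a few lines. The sketch assumes each region is an axis-aligned rectangle on the ground plane tied to one camera and one PTZ setting, and that "approaches the boundary" means coming within a fixed margin of an edge. The names `Region` and `choose_region`, the rectangle shape, and the margin value are all assumptions for illustration and do not come from the slides.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Region:
    """A ground-plane region covered by one camera at one PTZ setting."""
    camera: int
    setting: int
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def clearance(self, p: Tuple[float, float]) -> float:
        """Smallest signed gap between p and the edges; negative if p is outside."""
        x, y = p
        return min(x - self.x_min, self.x_max - x, y - self.y_min, self.y_max - y)


def choose_region(p: Tuple[float, float], regions: Sequence[Region],
                  current: Optional[Region] = None,
                  margin: float = 0.5) -> Optional[Region]:
    """Keep the current region while p is well inside it; otherwise reassign."""
    if current is not None and current.clearance(p) >= margin:
        return current
    # No camera on, or p is near the boundary: pick the region where p is
    # farthest from any edge.
    best = max(regions, key=lambda r: r.clearance(p), default=None)
    if best is None or best.clearance(p) < 0:
        return None  # no region covers the point of interest
    return best
```

With two overlapping regions, a point of interest drifting toward the shared edge triggers a handover to the camera whose region contains it more comfortably, which avoids switching back and forth at the boundary.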
Camera Control Based on Track Trajectories
[Figure: plots of Location (Meters) versus Time (Seconds). Labels: Separation, Switch to video, Intersection, Merger, Undetected Disappearance, Sudden Disappearance, Sudden Appearance.]
Creating Final Tracks Of All Targets
• Bipartite graph matching over a set of color histograms
• We collect features as the target enters and exits the scene in video.
• For every new a-track, features are collected from a small set of frames.
• The weight of an edge is the distance between the observed video features.
• Additionally, constraints from the audio data are enforced on the weights.
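The matching step can be illustrated with a small sketch. It assumes normalized color histograms, an L1 distance as the edge weight, and an audio constraint expressed as a set of allowed (exit, entry) pairs, with every other edge given infinite weight. The slides do not name the distance or the matching algorithm, so these choices are assumptions, and the exhaustive search over permutations used here is only practical for a handful of tracks.

```python
from itertools import permutations
from math import inf
from typing import List, Optional, Sequence, Set, Tuple

Histogram = Sequence[float]


def l1_distance(h1: Histogram, h2: Histogram) -> float:
    """L1 distance between two histograms (an assumed choice of metric)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def match_tracks(exit_hists: Sequence[Histogram],
                 entry_hists: Sequence[Histogram],
                 allowed: Optional[Set[Tuple[int, int]]] = None
                 ) -> Tuple[Optional[List[int]], float]:
    """Return (assignment, cost), where assignment[i] is the entry matched to exit i.

    `allowed` lists the (exit, entry) pairs consistent with the audio tracks;
    None means no audio constraint is applied.
    """
    n = len(exit_hists)
    if n != len(entry_hists):
        raise ValueError("sketch assumes equally many exits and entries")
    cost = [
        [l1_distance(e, s) if allowed is None or (i, j) in allowed else inf
         for j, s in enumerate(entry_hists)]
        for i, e in enumerate(exit_hists)
    ]
    best_perm, best_cost = None, inf
    # Exhaustive search over all one-to-one assignments.
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return (list(best_perm) if best_perm is not None else None), best_cost
```

Passing `allowed` removes edges that the audio tracks rule out, so a visually similar but physically impossible pairing cannot be selected.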
Creating Final Tracks Using Bipartite Matching
[Figure: plots of Location (Meters) versus Time (Seconds) with panels "Tracking in Audio and Video" and "Tracking in Audio Only", and bipartite graphs over nodes a to g titled "Bipartite Graph Matching" and "Bipartite Graph Matching Without Audio Constraint". Track segments are marked as audio or video, and nodes carry + and - labels, e.g. [a+, a-], [c+], [c-].]
• Audio cannot disambiguate independence once the clusters have merged.
• Three tracks are recovered by matching every node (entry and exit from the scene) where video was captured.
• Two tracks are recovered. However, red and green show the wrong path.
Experimental Results
[Figures: Inter P-Track Distance at a Merge Event; Inter P-Track Distance at a Crossover Event]
Experimental Results (Cont.)
Conclusion
• Goal: minimize camera usage in a surveillance system
  • Save power, bandwidth, storage, and money
  • Alleviate privacy concerns
• Proposed a probabilistic scheme for opportunistically deploying cameras in a multimodal network.
• Showed detailed experimental results on real data collected in multimodal networks.
• The final set of tracks is computed by bipartite matching.
Acknowledgements
This work was supported by Aware Building: ONR-N00014-07-C-0311 and NSF CNS 0551719. Bi Song2 and Amit Roy-Chowdhury2 were additionally supported by NSF-ECCS 0622176 and ARO-W911NF-07-1-0485.
Thank You.
• Questions?
Jayanth Nayak1: nayak@mayachitra.com
Luis Gonzalez-Argueta2, Bi Song2, Amit Roy-Chowdhury2, Ertem Tuncel2: {largueta,bsong,amitrc,ertem}@ee.ucr.edu
Bourns College of Engineering, Information Processing Laboratory
www.ipl.ee.ucr.edu