Multi-Scale Video Cropping. Hazem El-Alfy, David Jacobs and Larry Davis, Department of Computer Science, University of Maryland, College Park. Sep 25th 2007, ACM MM ’07. Modern Surveillance Systems. Networks of surveillance cameras. Control Room: Fewer monitors than cameras.
Multi-Scale Video Cropping Hazem El-Alfy, David Jacobs and Larry Davis Department of Computer Science University of Maryland, College Park Sep 25th 2007, ACM MM ’07
Modern Surveillance Systems • Networks of surveillance cameras. • Control Room: • Fewer monitors than cameras. • Far fewer operators than monitors. • Cameras cycle through monitors.
Modern Surveillance Systems Typical Control Rooms: airports, subways, metropolitan areas, seaports, crowd control.
“Future” Control Rooms • “Continuous” display wall versus a fixed set of discrete monitors. • Algorithms to control: • where to display videos, • how much area to assign to them, • how to display them. Barco Control Room, Vienna, Austria
Video Cropping Munich Airport – Courtesy Siemens, NJ
Why Cropping? • Resize video to save bandwidth or to fit the display area. • Crop before resizing to focus operator attention on important areas.
Problem Definition Determine trajectories of cropping windows through the video volume (axes x, y, t): • variable-size window • maximize captured saliency • smooth trajectory • occasional jumps (cuts) between trajectories.
Problem Definition • Each frame $t$ is covered by variable-size, overlapping windows $W_{i,t}$. • Saliency measure: $S(W_{i,t})$. • Objective: $\arg\max_Q \sum_t S(W_{i,t})$ over all window sequences $Q$, • subject to constraints for smooth window motion and size change.
Our Approach: Overview • Extract motion energy. • Model video as a graph. • Find trajectories as shortest paths in graph. • Merge trajectories. • Repeat for other segments of long videos.
Extracting Motion Energy • Motion energy as a saliency measure. • Frame differences are smoothed using morphological operations.
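A minimal sketch of this step in plain Python. The slide does not say which morphological operations are used, so the morphological opening (erosion followed by dilation, 3x3 neighbourhood) and the names `frame_difference`, `erode`, `dilate`, `motion_energy` and `threshold` are all assumptions made for illustration.

```python
def frame_difference(prev, curr, threshold=10):
    """Binary motion mask: 1 where the intensity change exceeds `threshold`."""
    return [[1 if abs(c - p) > threshold else 0 for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]


def _neighbourhood(mask, y, x):
    """Values in the 3x3 neighbourhood of (y, x), clipped at the frame border."""
    h, w = len(mask), len(mask[0])
    return [mask[j][i]
            for j in range(max(0, y - 1), min(h, y + 2))
            for i in range(max(0, x - 1), min(w, x + 2))]


def erode(mask):
    """A pixel survives only if its whole neighbourhood is set."""
    return [[min(_neighbourhood(mask, y, x)) for x in range(len(mask[0]))]
            for y in range(len(mask))]


def dilate(mask):
    """A pixel is set if any pixel in its neighbourhood is set."""
    return [[max(_neighbourhood(mask, y, x)) for x in range(len(mask[0]))]
            for y in range(len(mask))]


def motion_energy(prev, curr, threshold=10):
    """Frame difference smoothed by a morphological opening (assumed choice)."""
    return dilate(erode(frame_difference(prev, curr, threshold)))
```

In this sketch the opening removes isolated changed pixels (likely noise) while leaving compact moving regions intact.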
Modeling Graph • Nodes: cropping windows in each frame. • Add dummy source and target nodes. • Edges: allowable window changes (location and size) between consecutive frames. Figure labels: dummy source node (w=0), windows of first frame, windows of i-th frame, windows of last frame, dummy target node (w=0).
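The node and edge definitions above can be sketched as follows. The slide does not specify how candidate windows are sampled or what counts as an allowable change, so the sampling grid (`sizes`, `stride`) and the limits `max_shift` and `max_resize` are hypothetical parameters chosen for illustration.

```python
def candidate_windows(frame_w, frame_h, sizes, stride):
    """All (x, y, w, h) windows of the given sizes on a regular grid."""
    return [(x, y, w, h)
            for (w, h) in sizes
            for y in range(0, frame_h - h + 1, stride)
            for x in range(0, frame_w - w + 1, stride)]


def allowed(a, b, max_shift, max_resize):
    """True if window b in frame t+1 may follow window a in frame t.

    Centres are compared in doubled integer coordinates (2*x + w) so that
    no floating-point arithmetic is needed.
    """
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    shift_ok = (abs((2 * bx + bw) - (2 * ax + aw)) <= 2 * max_shift and
                abs((2 * by + bh) - (2 * ay + ah)) <= 2 * max_shift)
    resize_ok = abs(bw - aw) <= max_resize and abs(bh - ah) <= max_resize
    return shift_ok and resize_ok


def build_edges(windows, max_shift, max_resize):
    """Index pairs (i, j): window i in one frame may lead to window j in the next."""
    return [(i, j)
            for i, a in enumerate(windows)
            for j, b in enumerate(windows)
            if allowed(a, b, max_shift, max_resize)]
```

The same edge list can be reused between every pair of consecutive frames when the candidate windows are identical in each frame, which is the assumption made here.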
Modeling Graph • Multi-scale energy function for window $W$: • $E(W) = S(W)$: always favors large windows • $E(W) = S(W)/A(W)$: favors small (dense) windows • $E(W) = S(W_{in})/A(W_{in}) - S_{belt}/K$ • Edge weight: $w_{ij} = 1 - E_{Norm}(W_j)$
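A rough sketch of the third energy function and the edge weight. The slide does not define the inner window, the belt, K, or the normalization, so the following are assumptions: the inner window is the window itself, the belt is a band of `belt` pixels around it, `K` is a user-chosen constant, and normalization is min-max scaling over the candidate windows.

```python
def region_sum(sal, x0, y0, x1, y1):
    """Saliency summed over the half-open box [x0, x1) x [y0, y1), clipped to the frame."""
    h, w = len(sal), len(sal[0])
    x0, y0 = max(0, x0), max(0, y0)
    x1, y1 = min(w, x1), min(h, y1)
    return sum(sal[y][x] for y in range(y0, y1) for x in range(x0, x1))


def window_energy(sal, window, belt=1, K=1.0):
    """E(W) = S(W_in)/A(W_in) - S_belt/K for a window lying inside the frame."""
    x0, y0, x1, y1 = window
    s_in = region_sum(sal, x0, y0, x1, y1)
    area = (x1 - x0) * (y1 - y0)
    s_outer = region_sum(sal, x0 - belt, y0 - belt, x1 + belt, y1 + belt)
    s_belt = s_outer - s_in  # saliency in the band just outside the window
    return s_in / area - s_belt / K


def edge_weights(energies):
    """w_ij = 1 - E_norm(W_j), using min-max normalization (an assumption)."""
    lo, hi = min(energies), max(energies)
    span = (hi - lo) or 1.0  # avoid division by zero when all energies are equal
    return [1.0 - (e - lo) / span for e in energies]
```

Under these assumptions, a window that fits an active region tightly scores highest: a larger window is penalized by the density term, and a window that cuts through activity is penalized by the belt term.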
Modeling Graph • Energy function computed for all windows in all frames. • Efficiently computed using integral images [Viola & Jones ’01]: • $ii(x,y) = \sum_{x'<x,\, y'<y} i(x',y')$ • $E(W) = ii(x_3) - ii(x_2) - ii(x_4) + ii(x_1)$, where $x_1, \dots, x_4$ are the corners of cropping window $W$ in the video frame.
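The integral-image lookup can be sketched in a few lines of plain Python. The function names and the half-open window convention `[x0, x1) x [y0, y1)` are choices made for this example; the table follows the strict-inequality definition of `ii` given on the slide.

```python
def integral_image(img):
    """ii[y][x] = sum of img over all rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def window_sum(ii, x0, y0, x1, y1):
    """Sum over [x0, x1) x [y0, y1) using four table lookups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(window_sum(ii, 1, 1, 3, 3))  # 5 + 6 + 8 + 9 = 28
```

After one pass over the frame, the saliency sum of any window costs four lookups regardless of window size, which is what makes evaluating every window in every frame practical.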
Shortest Path • Dial’s implementation of Dijkstra’s algorithm: linear in the number of graph nodes. • Smoothing: low-pass filter and cubic Hermite interpolation.
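A minimal sketch of Dial's bucket-based variant of Dijkstra's algorithm. It assumes edge weights have been quantized to integers in `0..max_weight` (the slides do not say how the real-valued weights are discretized), and for simplicity it allocates one bucket per possible distance rather than the circular array of `max_weight + 1` buckets used in the classic formulation. The function name and graph encoding are illustrative.

```python
def dial_shortest_path(num_nodes, edges, source, target, max_weight):
    """Shortest path with integer weights in 0..max_weight.

    edges: dict mapping node -> list of (neighbour, weight).
    Returns (distance, path as a list of nodes).
    """
    INF = float("inf")
    dist = [INF] * num_nodes
    parent = [None] * num_nodes
    dist[source] = 0
    # A simple path has at most num_nodes - 1 edges, which bounds the distance.
    buckets = [[] for _ in range(max_weight * (num_nodes - 1) + 1)]
    buckets[0].append(source)
    for d, bucket in enumerate(buckets):
        while bucket:  # zero-weight edges may add nodes to the current bucket
            u = bucket.pop()
            if dist[u] != d:
                continue  # stale entry: u was already reached more cheaply
            for v, w in edges.get(u, ()):
                if d + w < dist[v]:
                    dist[v] = d + w
                    parent[v] = u
                    buckets[d + w].append(v)
    if dist[target] == INF:
        return INF, []
    path, node = [], target
    while node is not None:
        path.append(node)
        node = parent[node]
    return dist[target], path[::-1]


# Node 0 is the dummy source, 1-2 and 3-4 are windows of two frames, 5 is the dummy target.
graph = {0: [(1, 0), (2, 0)], 1: [(3, 5)], 2: [(3, 5), (4, 1)], 3: [(5, 0)], 4: [(5, 0)]}
print(dial_shortest_path(6, graph, 0, 5, max_weight=10))  # (1, [0, 2, 4, 5])
```

Buckets replace the priority queue, so no comparisons or heap operations are needed; the cost grows with the number of edges plus the number of buckets scanned.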
Merging Trajectories • More cropping windows needed to capture simultaneous activity. • Wipe captured activity from motion frames and repeat earlier process on remaining motion. • Merge trajectories: find shortest path through a graph of trajectories.
Processing Long Videos • Problems: • Graph gets too big if video is long. • Latencies must be short in surveillance systems. • Solution: • Break long videos into segments with overlap. • Process each segment, then stitch the results together.
Processing Long Videos • Issues: • How short can segments be? • Are there preferable locations to break the video? • How much overlap is needed for smooth transitions? • Findings from many experiments with a fixed-size crop window: • Shortest paths converge quickly, so segments can be as short as 40 frames. • Avoid periods of low activity when breaking the video. • Overlap intervals of 20 frames are sufficient.
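The segmentation scheme can be illustrated with a short sketch using the figures quoted above (40-frame segments, 20-frame overlap). It assumes the overlap is counted as part of each segment's length and that break points are placed at fixed intervals; choosing break points that avoid low-activity periods is not attempted here. The function name is hypothetical.

```python
def split_into_segments(num_frames, segment_len=40, overlap=20):
    """Half-open frame ranges (start, end) covering the video with overlap."""
    if not 0 <= overlap < segment_len:
        raise ValueError("overlap must be non-negative and smaller than segment_len")
    if num_frames <= 0:
        return []
    step = segment_len - overlap
    segments = []
    start = 0
    while True:
        end = min(start + segment_len, num_frames)
        segments.append((start, end))
        if end == num_frames:
            break
        start += step
    return segments


print(split_into_segments(100))  # [(0, 40), (20, 60), (40, 80), (60, 100)]
```

Each segment is processed on its own, and the shared frames between neighbouring segments give the stitching step room to join the two window trajectories.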
Results Munich Airport: variable size single window.
Results Munich Airport: video-in-video display.
Results Traffic at a stop sign on campus (2 windows).
Contributions • Variable-size, smooth cropping window. • Simultaneous multiple cropping windows. • Relatively short video segments are processed instead of the entire video (online). • Segment-wise results are empirically shown to be identical to those from processing the largest video that can be processed as a whole.