E-V: Efficient Visual Surveillance with Electronic Footprints

Jin Teng, Junda Zhu, Boying Zhang, Dong Xuan and Yuan F. Zheng
IEEE INFOCOM 2012

Outline
• Deficiency of Visual Surveillance Systems
• A Brief Overview of Our E-V System
• A Case Study
• A Broader View of Our E-V System
• Final Remarks
2014/11/16

Visual Surveillance

Failure Examples
• Chicago police installed 10,000 surveillance cameras in the city, yet only 1 in 200 crimes is captured by visual surveillance [2].
• One of the bombers in the London bombing (July 2005) was not identified by the surveillance system and escaped [3].

Why Do They Fail?
Visual technologies are not yet efficient or accurate enough for automatic localization and tracking, so a great deal of human effort is needed.
• Large volume of video data
  • Temporal: 2.07 × 10^6 frames per camera per day
  • Spatial: a very large number of surveillance cameras in a city, e.g., New York has 4,176 video cameras in the lower Manhattan area [1]
• Monitored objects may be visually occluded or have multiple inconsistent appearances
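The temporal figure is consistent with a camera recording at 24 frames per second. The frame rate is an assumption here, since the slide does not state it:

```latex
\[
24\ \tfrac{\text{frames}}{\text{s}} \times 86{,}400\ \tfrac{\text{s}}{\text{day}}
  = 2{,}073{,}600 \approx 2.07 \times 10^{6}\ \text{frames per camera per day}
\]
```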

Our Methodology: E-V Integration
• Combine electronic and visual signals for efficient surveillance
• E-V integration makes it possible to efficiently and accurately localize and identify objects in a large volume of video data

Electronic Signals
• Wireless channels:
  • Wireless address, such as a WiFi MAC address
  • Content, etc.

Pervasiveness of Electronic Signals
• Electronic signals are emitted by many mobile devices
• The popularity of mobile devices is increasing
  • Smartphones as an example: 302.6 million shipped in 2010

Our E-V System: A Bird’s-Eye View

Our E-V System: Layers
[Figure: layered architecture]
• Specific Applications: Surveillance, Training, Health
• Technologies: Localization, Identification, Other Technologies
• Sensing Methods: Electronic, Visual and Other Signals

Related Work on E-V Integration
• Fusing multiple sensors for tracking [4]
• Visual camera + RFID for monitoring [5]
• Existing work cannot achieve accuracy and efficiency for visual surveillance at the same time

A Typical Surveillance Scenario
• Find a specific person given some vague visual information, i.e., retrieve his appearances in videos covering a long period of time
• If we depend on videos alone, we may need to
  • Extract all human figures in each frame, which may number in the thousands, and compare them with a designated vague picture
  • Spend a large amount of human effort watching the videos, which may last several hours or even days and come from a number of cameras
• With E-V integration, what can we do?

Problem Formulation: Notations
• V-sensing: V-ID and V frame
  • V-ID: visual identity, such as a human figure
  • VID*: our target V-ID
  • V frame: a set of V-IDs, with some background, captured by visual sensors (cameras) in a certain area and time
• E-sensing: E-ID and E frame
  • E-ID: electronic identity, such as a MAC address
  • EID*: our target E-ID
  • E frame: a set of E-IDs captured by electronic sensors in a certain area and time
• Vagueness and completeness
  • Vagueness: reflects how clearly a V-ID/E-ID can be identified
  • Completeness: reflects whether all V-IDs/E-IDs are present in a V/E frame

Problem Formulation: Cases
• General case:
  • Input: EID* (and VID*), and a set of E frames and corresponding V frames
  • Output: VID* in video frames
• Baseline case:
  • Input: a clear EID* (and a vague VID*), a set of E frames with clear and complete EIDs, and V frames with vague and complete VIDs
  • Output: VID* in video frames (the VID* found may differ from the given vague VID*)

A Naïve Solution to the Baseline Case
[Figure: three example E frames containing EID*, EID1, EID2 and EID3]
Suppose we have three E/V frames. We go through them one by one.
• Two steps:
  • Step 1: Find all E frames that include EID* (example)
  • Step 2: Identify VID* in their corresponding V frames
• Comments: fewer V frames need to be processed, because V frames without VID* are filtered out, but many V frames may still remain
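Step 1 of the naive approach can be sketched as a simple membership filter. The frame names and their contents below are hypothetical, chosen only for illustration; the figure's exact frame contents are not reproduced here.

```python
from typing import Dict, List, Set


def naive_filter(e_frames: Dict[str, Set[str]], target: str) -> List[str]:
    """Return the ids of all E frames that contain the target E-ID.

    The V frames with the same ids are the ones left for visual processing.
    """
    return [frame_id for frame_id, eids in e_frames.items() if target in eids]


# Hypothetical example data (not taken from the slides).
e_frames = {
    "frame1": {"EID*", "EID1"},
    "frame2": {"EID2", "EID3"},
    "frame3": {"EID*", "EID2"},
}
candidates = naive_filter(e_frames, "EID*")
print(candidates)
```

Every frame that contains EID* survives this filter, which is why many V frames may still remain after it.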

Our Solution
• E-Filtering
  • Find the minimum number of E frames whose intersection is exactly the given E-ID, i.e., EID*
  • Far fewer frames are left for further V-side processing
  • We formulate it as the Element Distinguishing Problem (EDP)
• V-Retrieval
  • Retrieve the V-IDs from the filtered frames and intersect them to determine VID*
  • We formulate it as the n-partite Best Matching Problem (nBM)

E-Filtering Overview
[Figure: example E frames containing EID*, EID1, EID2 and EID3, before and after filtering]
Two E frames are enough to identify EID* through intersection.

Nature of E-Filtering
• Finding the minimum number of frames whose intersection is EID*
  • NP-complete: equivalent to the set cover problem
• Whether each E-ID appears in each E frame is summarized in a matrix, with 1 meaning ‘appears’ and 0 meaning ‘does not appear’
  • There must be at least one 0 in each non-EID* column
  • Use these 0s to ‘cover’ all non-EID* columns
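The matrix view can be illustrated with a short sketch. It assumes rows are E frames and columns are E-IDs; the helper names and the example data are hypothetical, not taken from the slides.

```python
from typing import List, Sequence, Set


def presence_matrix(e_frames: Sequence[Set[str]], eids: Sequence[str]) -> List[List[int]]:
    """Rows are E frames, columns are E-IDs; 1 = appears, 0 = does not appear."""
    return [[1 if eid in frame else 0 for eid in eids] for frame in e_frames]


def distinguishes(matrix: List[List[int]], target_col: int, rows: Sequence[int]) -> bool:
    """Check whether the selected rows (frames) single out the target column.

    Every selected frame must contain the target, and every other column
    must have at least one 0 among the selected rows.
    """
    if not rows:
        return False
    if any(matrix[r][target_col] == 0 for r in rows):
        return False
    n_cols = len(matrix[0])
    return all(
        any(matrix[r][c] == 0 for r in rows)
        for c in range(n_cols)
        if c != target_col
    )


# Hypothetical example data.
eids = ["EID*", "EID1", "EID2", "EID3"]
frames = [{"EID*", "EID1"}, {"EID*", "EID2", "EID3"}, {"EID*", "EID2"}]
m = presence_matrix(frames, eids)
print(m)
print(distinguishes(m, 0, [0, 1]))  # frames 0 and 1 together
print(distinguishes(m, 0, [1, 2]))  # frames 1 and 2 together
```

In this toy data, frames 0 and 1 cover every non-target column with a 0, while frames 1 and 2 both contain EID2 and therefore cannot rule it out.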

Solution: EDP Algorithm
• Element Distinguishing Problem (EDP)
  • The element to be distinguished is EID*
• Greedily select the E frame in which the largest number of E-IDs can be told apart from EID*
  • In the example, the greedy algorithm selects e1 or e3 first, because each of them tells us that two E-IDs are not EID*
• Repeat the greedy selection until EID* is distinguishable

EDP (cont’d)
• Approximation results carry over from the greedy heuristic algorithm for the set cover problem
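The greedy selection described for EDP can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: the function name, the tie-breaking rule (first frame in iteration order) and the example frames are all hypothetical.

```python
from typing import Dict, List, Optional, Set


def greedy_edp(e_frames: Dict[str, Set[str]], target: str) -> Optional[List[str]]:
    """Greedily pick E frames until their intersection is exactly {target}.

    Returns the chosen frame ids, or None if the target cannot be
    distinguished with the given frames.
    """
    # Only frames that contain the target can be part of the intersection.
    candidates = {fid: eids for fid, eids in e_frames.items() if target in eids}
    # E-IDs that have not yet been told apart from the target.
    remaining = set().union(*e_frames.values()) - {target}
    chosen: List[str] = []
    while remaining:
        # Pick the frame that excludes the most remaining E-IDs.
        best = max(candidates, key=lambda f: len(remaining - candidates[f]), default=None)
        if best is None or not (remaining - candidates[best]):
            return None  # no frame makes progress
        chosen.append(best)
        remaining &= candidates.pop(best)
    return chosen


# Hypothetical example: e1 and e3 each exclude two E-IDs, e2 excludes one.
frames = {
    "e1": {"EID*", "EID1"},
    "e2": {"EID*", "EID2", "EID3"},
    "e3": {"EID*", "EID3"},
}
selection = greedy_edp(frames, "EID*")
print(selection)
```

As with greedy set cover, the result is not guaranteed to be minimal; it trades optimality for a selection that is cheap to compute.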

V-Retrieval
• General problem
  • Find the corresponding VID* from the frames selected by E-Filtering
  • VID* is the only V-ID that should appear in all the frames left after E-Filtering, so an intersection operation yields VID*
• Main challenge
  • Indistinct V-IDs: we do not know for sure which person is which across different frames
• Solution
  • nBM algorithm: find the VID with the largest probability of appearing in all V frames

The nBM Algorithm
• Decide whether a VID appears in each V frame based on similarity scores
• Use the maximum likelihood criterion to choose the VID whose appearance/disappearance agrees best with that of EID*
• n-partite Best Match Problem (nBM)
  • Find the VID* that best matches the visual appearance of EID*
  • Put the VIDs of the n different frames in n different circles
  • n-partite graph (right)
[Figure: n-partite graph of VIDs linked by similarity; a dummy VID indicates that VID1 is not similar to any VID in that frame]
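The idea of picking the VID most likely to appear in every selected frame can be illustrated with a deliberately simplified stand-in. This is not the nBM algorithm itself: it assumes similarity scores in [0, 1] can be read as presence probabilities, treats frames as independent, and models the dummy VID as a fixed floor score. All names and numbers are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def best_candidate(
    similarity: Dict[str, List[List[float]]], dummy_score: float = 0.1
) -> Tuple[str, Dict[str, float]]:
    """Pick the candidate VID with the highest log-likelihood of being in all frames.

    similarity[c] holds, for each other V frame, the similarity scores
    between candidate c and every VID detected in that frame.
    dummy_score stands in for the dummy VID (no good match in the frame).
    """
    if dummy_score <= 0:
        raise ValueError("dummy_score must be positive")
    log_like: Dict[str, float] = {}
    for cand, per_frame in similarity.items():
        total = 0.0
        for frame_scores in per_frame:
            # Best match in this frame, or the dummy VID if nothing is similar.
            total += math.log(max(frame_scores + [dummy_score]))
        log_like[cand] = total
    return max(log_like, key=log_like.get), log_like


# Hypothetical similarity scores for two candidates across two other frames.
similarity = {
    "VID1": [[0.9, 0.2], [0.8, 0.3]],
    "VID2": [[0.3, 0.7], [0.05, 0.02]],
}
best, scores = best_candidate(similarity)
print(best)
```

Here VID2 falls back to the dummy score in the second frame, so its likelihood of appearing in all frames is much lower than that of VID1.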

Practical Considerations
[Figure: case table marking solved cases (√), the baseline case we have studied, and the practical case of our focus (□)]
• In the baseline case, we assumed that the information on E-IDs and V-IDs is complete
• However, in realistic cases, we may have
  • Ghost V-IDs or missing V-IDs
  • Missing E-IDs

Solutions to Practical Problems
[Figure: presence of EIDi (0/1) over time, before and after smoothing]
• Careful deployment
  • Make sure that the coverage of the camera and that of the wireless detectors are roughly the same
• nBM is probability based, so it is naturally resistant to noise
  • Select an appropriate threshold in nBM for a better tradeoff between noise resistance and performance
• Generalized EDP
  • Handles missing/ghost E-IDs
  • Introduces fuzzy logic to improve the robustness of EDP
  • Uses RSSI for estimation and smoothing
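One simple way to smooth a 0/1 presence sequence over time is a sliding-window majority vote. The slides do not specify the smoothing method, so this is only an assumed illustration; the function name, window size and sample sequence are hypothetical, and RSSI is not modeled.

```python
from typing import List, Sequence


def smooth_presence(seq: Sequence[int], window: int = 3) -> List[int]:
    """Majority-vote smoothing of a binary presence sequence.

    Each sample is replaced by the majority value inside a centered window,
    which fills short gaps (missing E-ID) and drops isolated hits (ghost E-ID).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(seq)):
        win = seq[max(0, i - half): i + half + 1]  # truncated at the edges
        smoothed.append(1 if 2 * sum(win) > len(win) else 0)
    return smoothed


# Hypothetical detections of one E-ID in consecutive E frames.
raw = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]
print(smooth_presence(raw))
```

A larger window removes longer glitches but also delays the moment a real arrival or departure shows up in the smoothed sequence.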

A Quick Recap of Our Solutions

Implementation
• Real-world implementation
  • One camera viewing from above to collect V frames
  • 1-3 laptops nearby sniffing the WiFi traffic to collect E frames
• Tested on campus
  • Gymnasium
  • Library

Experimental Evaluations
• Scenario 1: Gymnasium, 6 people, 28 frames
• Scenario 2: Library, 8 people, 40 frames
• Real-world experiments
  • VID* is found successfully
  • The minimum number of frames needed for Scenario 1 is 3, and we achieve 3
  • The minimum number of frames needed for Scenario 2 is 3, and we achieve 4

Large-Scale Simulation-Based Evaluations
• Evaluation settings
  • Networks of cameras and wireless detectors at three locations
  • ~120 people moving randomly
• Far fewer video frames to process (left)
• High accuracy (right)

E-V Surveillance: Problem Space
[Figure: problem space with the labels Uncooperative, Cooperative, Tracking, Onsite and Offline]

Final Remarks
• Existing visual surveillance systems are not efficient
• Our E-V system
  • Integrates E signals and V signals for efficient visual surveillance
  • Has been implemented in the real world
• Many open issues remain; there is still a long way to go

References
[1] Big Apple is Watching You: http://www.slate.com/articles/news_and_politics/explainer/2010/05/big_apple_is_watching_you.html
[2] http://articles.chicagotribune.com/2010-05-06/news/ct-oped-0506-chapman-20100506_1_surveillance-cameras-vandalism-effect-on-violent-crime
[3] http://news.bbc.co.uk/2/hi/4659093.stm
[4] D. Smith, et al., “Approaches to Multisensor Data Fusion in Target Tracking: A Survey”, IEEE Transactions on Knowledge and Data Engineering, 2006.
[5] S. Cho, et al., “Association and Identification in Heterogeneous Sensors Environment with Coverage Uncertainty”, IEEE Advanced Video and Signal Based Surveillance, 2009.

Backup Slides

A Case Study
• A typical surveillance scenario
• Problem formulation in E-V integration
• Our solution
• Implementation and evaluations

GEDP Algorithm
• Clearly NP-hard
  • EDP can be reduced to GEDP
• Heuristic algorithm based on the subset sum approximation algorithm

The nBM Algorithm
• Similarity matrix for all V-IDs that have appeared
• n-partite Best Match Problem (nBM)
  • Find the VID* that best matches the visual appearance of EID*
  • Put the VIDs of the n different frames in n different circles
  • n-partite graph (right)

nBM (cont’d)
• Maximum likelihood matching
  • Given the observed VID1 … VIDm
  • Which VID is the best candidate?
  • Calculate the probability of each VIDi across all V frames
  • Select the VID with the largest probability
[Figure: two cases for frame v2: VID1 is in v2 and appears as VID2; VID1 is not in v2]
