Clarifying Sensor Anomalies using Social Network feeds
Prasanna Giridhar*, Tanvir Amin*, Lance Kaplan+, Jemin George+, Raghu Ganti++, Tarek Abdelzaher*
* University of Illinois at Urbana-Champaign
+ U.S. Army Research Lab
++ IBM Research, USA
INTRODUCTION
• Explosive growth in the deployment of physical sensors.
• Activities recorded by these sensors often deviate from the norm, for example:
  • Closure of a freeway due to a forest fire.
  • Change in building occupancy due to a shutdown.
• Unusual behavior tends to attract human attention and is often reported on social media as well.
MOTIVATION
• Much prior research has addressed event detection in both the physical and the social domain.
• Can social media be used to explain the underlying causes of sensor anomalies?
• We build a system that identifies discriminative social feeds that can be correlated with sensor anomalies.
• The more unusual the event, the higher the probability that it is reported on social media.
• The system is evaluated on real-time traffic data.
System Workflow
STEP 1: Initialization of the system
• Continuous stream of tweets, filtered by:
  • Keywords
  • Location
• Continuous stream of data from physical sensors
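The slides do not say how the keyword and location parameters are combined. A minimal sketch of such a stream filter, assuming a tweet must contain at least one tracked keyword and also fall within a fixed radius of a city center (30 miles, as in the experiments). `Tweet`, `matches_stream_filter`, and the AND-combination of the two parameters are hypothetical, not the authors' code.

```python
import math
import re
from dataclasses import dataclass

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius in miles

@dataclass
class Tweet:
    text: str
    lat: float  # degrees
    lon: float  # degrees

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))

def matches_stream_filter(tweet, keywords, center, radius_miles=30.0):
    """Hypothetical filter: keep a tweet if it contains at least one tracked
    keyword AND was posted within radius_miles of center, a (lat, lon) tuple."""
    tokens = set(re.findall(r"[a-z0-9]+", tweet.text.lower()))
    has_keyword = not tokens.isdisjoint(keywords)
    is_near = haversine_miles(tweet.lat, tweet.lon, *center) <= radius_miles
    return has_keyword and is_near
```

For example, a tweet mentioning "traffic" posted in downtown San Diego passes a filter centered on San Diego, while the same tweet posted in Los Angeles (more than 100 miles away) does not.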
Detecting Events in Sensors
STEP 2: Identification of sensor anomalies
• Run a black-box anomaly detection algorithm.
• Store the attributes of sensors that the algorithm flags as anomalous.
• Cluster sensors that provide redundant data.
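The detector in STEP 2 is a black box, so any anomaly classifier can be plugged in. A minimal sketch, assuming a simple z-score test on speed as a stand-in for that detector, and treating flagged sensors on the same highway and direction that lie within a small distance of each other as redundant. The `postmile` attribute, the thresholds, and all function names are hypothetical; the time dimension is omitted for brevity.

```python
from dataclasses import dataclass
from statistics import mean, stdev

@dataclass
class SensorReading:
    sensor_id: int
    highway: str       # e.g. "I-15"
    direction: str     # e.g. "N"
    postmile: float    # position along the highway (hypothetical attribute)
    history: list      # past speeds for the same time slot (needs >= 2 values)
    current: float     # current speed

def is_anomalous(r, z_threshold=3.0):
    """Stand-in for the black-box detector: flag a reading whose z-score
    against its own history exceeds z_threshold."""
    mu, sigma = mean(r.history), stdev(r.history)
    if sigma == 0:
        return r.current != mu
    return abs(r.current - mu) / sigma > z_threshold

def cluster_redundant(flagged, max_gap_miles=1.0):
    """Group flagged sensors on the same highway and direction when each is
    within max_gap_miles of the previous sensor in its group."""
    clusters = []
    for r in sorted(flagged, key=lambda s: (s.highway, s.direction, s.postmile)):
        prev = clusters[-1][-1] if clusters else None
        if (prev is not None
                and (prev.highway, prev.direction) == (r.highway, r.direction)
                and r.postmile - prev.postmile <= max_gap_miles):
            clusters[-1].append(r)
        else:
            clusters.append([r])
    return clusters
```

Each resulting cluster can then be treated as a single anomaly, so that several adjacent sensors reporting the same slowdown are explained once.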
Discriminative Social Feeds
STEP 3: Identification of discriminative social feeds
• Social feeds often contain keywords that describe an event.
• Example keywords: malaysian, airlines, 370
Keyword Signatures
Single keyword? Airlines
Keyword Signatures
Keyword pair? (Malaysian, Airlines)
Keyword Signatures
Keyword triplet? (Malaysia, Airlines, 370) and (Malaysia, Airlines, Satellite)
Keyword Signatures
• Signature profile computed on the collected Twitter data.
• Keyword pairs come closest to the ideal 1-to-1 mapping between signatures and events.
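A minimal sketch of extracting keyword-pair signatures from a tweet. The slides do not specify the preprocessing, so this assumes lowercase alphanumeric tokenization and a small hypothetical stopword list.

```python
import re
from itertools import combinations

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "on", "in", "to", "of", "and", "is", "at", "for"}

def keyword_pairs(text):
    """Return the set of unordered keyword pairs (signatures) in a tweet."""
    tokens = {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}
    # Sorting each pair makes (a, b) and (b, a) the same signature.
    return {tuple(sorted(p)) for p in combinations(tokens, 2)}
```

Because pairs are normalized, counts of (airlines, malaysian) can be aggregated across tweets regardless of word order.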
Possible Approaches
Problem: Given the keyword pairs for the current and past windows, how do we find the most discriminating subset?
• Difference in rate of occurrences:
  • (traffic, jam): 50 times today vs. a past average of 35 (+15)
  • (drunk, kills): 12 times today vs. a past average of 0 (+12)
  • The routine (traffic, jam) outranks the far more unusual (drunk, kills).
• Increase in percentage:
  • (traffic, jam): 1 time today vs. a past average of 0 (unbounded increase)
  • (drunk, kills): 12 times today vs. a past average of 2 (+500%)
  • A single stray mention of a previously unseen pair outranks a sixfold rise.
• Both disadvantages are overcome using information gain.
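The two baseline scores can be written down directly. This sketch (function names hypothetical) reproduces the arithmetic of the examples above.

```python
def rate_difference(today, past_avg):
    """Baseline: absolute increase over the past average."""
    return today - past_avg

def percent_increase(today, past_avg):
    """Baseline: relative increase in percent; unbounded when the past average is 0."""
    if past_avg == 0:
        return float("inf") if today > 0 else 0.0
    return 100.0 * (today - past_avg) / past_avg
```

With these scores, the rate difference ranks (traffic, jam) at +15 above (drunk, kills) at +12, and the percentage increase ranks one new mention of (traffic, jam) (infinite increase) above (drunk, kills) at +500%.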
Information Gain Theory and Entropy
• Entropy measures the uncertainty (randomness) of a random variable.
• Using conditional entropy, we determine how much information a keyword pair provides about an event:
  Information Gain = H(Y) − H(Y|X)
• Y: variable associated with the event; y = 0 (normal), y = 1 (anomalous)
• X: variable associated with the keyword pair; x = 0 (absent), x = 1 (present)
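A minimal sketch of the information-gain computation from a 2×2 table of counts, with entropies in bits. How samples are defined is not stated on the slide; this sketch assumes each sample is a tweet, with y = 1 if the tweet falls in the anomalous window and x = 1 if it contains the keyword pair. Function names are hypothetical.

```python
from math import log2

def entropy(counts):
    """Shannon entropy, in bits, of the distribution given by raw counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def information_gain(n):
    """n[x][y]: number of samples with pair presence x (0 absent, 1 present)
    and event label y (0 normal, 1 anomalous).
    Returns (H(Y), H(Y|X), H(Y) - H(Y|X))."""
    total = sum(n[0]) + sum(n[1])
    h_y = entropy([n[0][0] + n[1][0], n[0][1] + n[1][1]])
    # H(Y|X) = sum over x of p(x) * H(Y | X = x)
    h_y_given_x = sum(sum(n[x]) / total * entropy(n[x]) for x in (0, 1) if sum(n[x]) > 0)
    return h_y, h_y_given_x, h_y - h_y_given_x
```

A pair present in every anomalous-window tweet and in no normal one, n = [[10, 0], [0, 10]], gives H(Y) = 1, H(Y|X) = 0 and an information gain of 1 bit; a pair spread evenly over both windows, n = [[5, 5], [5, 5]], gives an information gain of 0.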
Rank the Unusual Events
STEP 4: Ranking discriminative events
• Identify tweets that contain the discriminative pairs.
• Score each pair by its conditional entropy H(Y|X).
• The lower the conditional entropy, the higher the discriminating power (for a fixed H(Y), a lower H(Y|X) means a higher information gain).
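STEP 4 as a minimal sketch: pairs are sorted by ascending conditional entropy, and the tweets containing the top pairs are returned. It assumes every table is built over the same set of samples, so H(Y) is shared and ascending H(Y|X) gives the same order as descending information gain. `rank_pairs`, `tables`, `tweets_by_pair`, and `top_k` are hypothetical names.

```python
from math import log2

def entropy(counts):
    """Shannon entropy, in bits, of the distribution given by raw counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def conditional_entropy(n):
    """H(Y|X) from a 2x2 table n[x][y] (x: pair absent/present, y: normal/anomalous)."""
    total = sum(map(sum, n))
    return sum(sum(row) / total * entropy(row) for row in n if sum(row) > 0)

def rank_pairs(tables, tweets_by_pair, top_k=2):
    """Sort pairs by ascending H(Y|X), most discriminative first, and return
    the top_k pairs together with the tweets that contain them."""
    ranked = sorted(tables, key=lambda pair: conditional_entropy(tables[pair]))
    return [(pair, tweets_by_pair.get(pair, [])) for pair in ranked[:top_k]]
```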
Mapping Both Events
STEP 5: Matching tweets with sensor anomalies
We align the two data sources using the spatiotemporal properties of each event. For example:
• Sensor ID 40456 on I-15 Northbound shows unusual activity.
• Unusual tweet: “SFvSD game tonight, stuck @15N traffic!!!”
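A minimal sketch of STEP 5, assuming an anomaly is described by a route number, a direction, and a time interval, and that a tweet matches if it names the same route and direction and is posted within that interval (plus a one-hour slack). The regular expression only handles explicit route mentions such as "@15N" or "I-80 Eastbound"; it and all names here are hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class SensorAnomaly:
    sensor_id: int
    highway: str      # route number, e.g. "15"
    direction: str    # "N", "S", "E" or "W"
    start: datetime
    end: datetime

# Hypothetical pattern for mentions like "15N", "I-15 N", "I-80 Eastbound".
HIGHWAY_RE = re.compile(
    r"\b(?:i-?|us-?)?(\d{1,3})\s*(north|south|east|west|[nsew])(?:bound)?\b",
    re.IGNORECASE,
)

def mentioned_highways(text):
    """Set of (route, direction) pairs mentioned in a tweet, e.g. {("15", "N")}."""
    return {(m.group(1), m.group(2)[0].upper()) for m in HIGHWAY_RE.finditer(text)}

def matches(anomaly, tweet_text, tweet_time, slack=timedelta(hours=1)):
    """True if the tweet names the anomaly's route and direction and was
    posted during the anomaly, widened by slack on both sides."""
    in_time = anomaly.start - slack <= tweet_time <= anomaly.end + slack
    return in_time and (anomaly.highway, anomaly.direction) in mentioned_highways(tweet_text)
```

With this sketch, the example tweet yields the mention ("15", "N"), so it matches an anomaly at a sensor on I-15 Northbound whenever it is posted during that anomaly.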
Output Explanations
STEP 6: Output the matched explanations
• The final step presents the explanations to the user.
• A user interface for tracking unusual events on a per-day basis.
EXPERIMENTAL RESULTS
• Twitter feeds collected over 2 weeks (Aug 19 to Sep 1, 2013) within a 30-mile radius of three cities in CA:
  • Los Angeles
  • San Francisco
  • San Diego
• Physical sensor data retrieved from PeMS (Caltrans Performance Measurement System, http://pems.dot.ca.gov/): 5-minute reports of flow, speed, occupancy, and delay
EXPERIMENTAL RESULTS
Performance is measured using precision and mean average rank, comparing our information gain approach against two baselines: B1 (difference in rate of occurrences) and B2 (increase in percentage).
Table: Precision using different methods
Table: Average position of tweets from the top
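The slides do not define the two metrics precisely. One plausible reading, as a sketch: precision is the fraction of returned explanations judged correct, and the average rank is the mean 1-based position of the first correct tweet in each ranked list, skipping lists that contain no correct tweet. Both definitions and the function names are assumptions.

```python
def precision(judgments):
    """Fraction of returned explanations judged correct (a list of bools)."""
    return sum(judgments) / len(judgments) if judgments else 0.0

def mean_rank_of_first_correct(ranked_lists):
    """Average 1-based position of the first correct tweet in each ranked list
    (each list holds bools in rank order); lists with no correct tweet are skipped."""
    ranks = [next((i + 1 for i, ok in enumerate(lst) if ok), None) for lst in ranked_lists]
    ranks = [r for r in ranks if r is not None]
    return sum(ranks) / len(ranks) if ranks else float("nan")
```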
INTERESTING EVENTS
Sensor anomaly detected:
• Highway I-80 Eastbound in SF
• Landmark: Bay Bridge
• Duration: 4 days
INTERESTING EVENTS
US 101 blockage due to bomb squad activity in LA
INTERESTING EVENTS
Traffic on I-15 Northbound due to a game in SD
CONCLUSION
• Abnormal behavior is recorded in social media.
• We provide a tool to explain such abnormalities.
• Major activities are explained with high precision.
• Explanations are ranked among the top two tweets.
Future Work
• Scalability
• Credibility of social feeds
• Geolocalization of tweets
THANK YOU Q+A