Social Networks and Surveillance: Evaluating Suspicion by Association Ryan P. Layfield Dr. Bhavani Thuraisingham Dr. Latifur Khan Dr. Murat Kantarcioglu The University of Texas at Dallas {layfield, bxt043000, lkhan, muratk}@utdallas.edu
Overview • Introduction • Our Goal • System Design • Social Networks • Threat Detection • Correlation Analysis • The Experiment • Setup • Current Results • Issues • Future Work
Introduction • Automated message surveillance is essential to communication monitoring • Widespread use of electronic communication • Exponential data growth • Impossible to sift through all ‘by hand’ • Going beyond basic surveillance • Identifying groups rather than individuals • Monitoring conversations rather than messages
Our Goal • Design new techniques and apply existing algorithms to… • Create a machine-understandable model of existing social networks • Identify abnormal conversations and behavior • Monitor a given communications system in real-time • Continuously learn and adapt to a dynamic environment
System Design • Three major components: • Social Network Modeler • Initial Activity Detector • Correlated Activity Investigator
Social Networks • Individuals engaged in suspicious or undesirable behavior rarely act alone • We can infer that those associated with a person positively identified as suspicious have a high probability of being either: • Accomplices (participants in suspicious activity) • Witnesses (observers of suspicious activity) • Making these assumptions, we create a context of association between users of a communication network
Social Networks • Within our model: • Every node is a unique user • Every message creates or strengthens a link between nodes • Over time, the network changes • Frequent communication leads to stronger links • Intermittent messaging implies weakening social ties • The strength of a link indicates how strongly two individuals are associated • From this data, we can theoretically identify • Hubs • Groups • Liaisons
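The link model above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `SocialNetwork`, the per-step `decay` multiplier, and the `prune_below` cut-off are all hypothetical, and the slides do not say how link weakening is actually computed.

```python
from collections import defaultdict


class SocialNetwork:
    """Undirected weighted graph: nodes are users, links gain weight per message."""

    def __init__(self, decay=0.9):
        # Hypothetical per-step multiplier (0 < decay < 1) that weakens idle links.
        self.decay = decay
        self.weights = defaultdict(float)  # frozenset({a, b}) -> link strength

    def record_message(self, sender, recipients):
        """Create or strengthen the link between the sender and each recipient."""
        for recipient in recipients:
            if recipient != sender:
                self.weights[frozenset((sender, recipient))] += 1.0

    def step(self, prune_below=0.01):
        """Advance one time step: weaken every link and drop negligible ones."""
        for link in list(self.weights):
            self.weights[link] *= self.decay
            if self.weights[link] < prune_below:
                del self.weights[link]

    def strength(self, a, b):
        """Current strength of the link between two users (0.0 if none)."""
        return self.weights.get(frozenset((a, b)), 0.0)

    def degree(self, user):
        """Total link strength at a node; a high value suggests a hub."""
        return sum(w for link, w in self.weights.items() if user in link)


net = SocialNetwork(decay=0.5)
net.record_message("alice", ["bob", "carol"])
net.record_message("alice", ["bob"])
net.step()
```

Here `alice` ends up with a stronger link to `bob` than to `carol`, and her summed link strength is what a hub detector could rank on. Group and liaison detection would need a clustering step that this sketch leaves out.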
Threat Detection • Every message sent is scrutinized in the interest of identifying suspicious communication • Keyword analysis • Prior context (i.e. previous message content) • When a detection algorithm yields a strong result, a token is created • The token is created at the origin and passed to the recipient(s) • Existing tokens, if any, are cloned instead • The result is a web that potentially reflects the dissemination of suspicious information
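One possible reading of the token mechanism is sketched below. Everything named here is an assumption made for illustration: the keyword watch list, the `Token` fields, and the interpretation that a sender who already holds tokens passes clones of those instead of creating a new one. The real detector also uses prior context, which this stand-in ignores.

```python
import itertools
from dataclasses import dataclass

SUSPICIOUS_KEYWORDS = {"detonator", "payload"}  # hypothetical watch list
_token_ids = itertools.count(1)


@dataclass(frozen=True)
class Token:
    token_id: int  # shared by every clone of the same token
    origin: str    # user at whom the token was first created
    text: str      # message content that triggered the detection


def is_suspicious(text):
    """Stand-in detector: flags a message containing any watch-list keyword."""
    return bool(SUSPICIOUS_KEYWORDS & set(text.lower().split()))


def deliver(holdings, sender, recipients, text):
    """Update token holdings for one message; holdings maps user -> set of Tokens."""
    if not is_suspicious(text):
        return
    sender_tokens = holdings.setdefault(sender, set())
    if not sender_tokens:
        # No existing token at the sender: create one at the origin.
        sender_tokens.add(Token(next(_token_ids), sender, text))
    # Pass a clone of every token the sender holds to each recipient.
    for recipient in recipients:
        holdings.setdefault(recipient, set()).update(sender_tokens)


holdings = {}
deliver(holdings, "alice", ["bob"], "the payload is ready")
deliver(holdings, "bob", ["carol"], "payload moved")
deliver(holdings, "dave", ["erin"], "lunch at noon")
```

After these three messages the token created at `alice` has reached `bob` and `carol`, while `dave` and `erin` hold nothing. The set of users holding a given `token_id` is the "web" the slide refers to.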
Correlation Analysis • Future messages with similar suspicious topics are not always identifiable with the same ‘initial’ techniques • Quick replies • Pronoun use • Assumption that recipient is aware of topic • If a token is present at the sender when a message is sent: • The message the token is associated with and the new message are analyzed together • If analysis yields a strong match, the token is further cloned and passed to the recipient
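The follow-up check can be sketched as below. The similarity measure is an assumption: plain Jaccard overlap of word sets stands in for whatever analysis the system actually runs, and the `threshold` value and the `holdings` layout are hypothetical.

```python
def words(text):
    """Lower-cased set of whitespace-separated words, stripped of punctuation."""
    return {w.strip(".,!?;:'\"") for w in text.lower().split()} - {""}


def overlap(text_a, text_b):
    """Jaccard similarity of the two word sets (an assumed stand-in measure)."""
    a, b = words(text_a), words(text_b)
    return len(a & b) / len(a | b) if a | b else 0.0


def correlate(holdings, sender, recipients, text, threshold=0.25):
    """Clone each token held by the sender whose source message matches the new one.

    holdings maps user -> {token_id: text of the message the token came from}.
    Returns the ids of the tokens that were passed on.
    """
    passed = []
    for token_id, token_text in list(holdings.get(sender, {}).items()):
        if overlap(token_text, text) >= threshold:
            for recipient in recipients:
                holdings.setdefault(recipient, {})[token_id] = token_text
            passed.append(token_id)
    return passed


holdings = {"bob": {1: "the shipment arrives at the north dock friday"}}
sent = correlate(holdings, "bob", ["carol"], "ok, friday at the dock then")
ignored = correlate(holdings, "bob", ["dave"], "see you at lunch")
```

The quick reply to `carol` shares enough words with the tokened message to pass the token on even though it would not trigger a keyword detector by itself. The unrelated message to `dave` does not.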
The Experiment • Rare words shared between two or more messages are candidates for keyword analysis, but they are not always easily sifted from ‘noise’ • Noise within text-based messages comes in a variety of forms • Misspelled words • Unusual word choice • Incompatible variations of the same language (e.g. British vs. American English) • Unexpected language • However, we do not want to eliminate potential keywords • Document names • Terminology specific to a subject • ‘Buzz’ words
The Experiment • We proposed an experiment that attempts to eliminate false positives due to noisy data while strengthening and expanding our correlation techniques
Setup • Tools • Running word ‘rank’ database • Implementation of word set theory infrastructure • JAMA Matrix Library • Singular Value Decomposition • Our Approach • Apply SVD noise filtering based on 100 messages • Analyze word frequency correlation between current message and prior suspicious messages • Generate a score based on the results
Setup • Construct a matrix based on the last 100 messages [Figure: matrix with axes labeled ‘messages’ and ‘words’, the words ordered from more common to less common]
Setup • Decompose the matrix ($A = U \Sigma V^{T}$) and rebuild it • Eliminate ‘weak’ singular values
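For reference, the standard rank-$k$ truncation that this kind of SVD filtering usually relies on is written out below. The cut-off $k$ (how many singular values count as ‘strong’) is an assumption here; the slides do not state how it is chosen.

```latex
\begin{align}
A   &= U \Sigma V^{T}, &
\Sigma   &= \operatorname{diag}(\sigma_1, \dots, \sigma_r), \quad
            \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0 \\
A_k &= U \Sigma_k V^{T} = \sum_{i=1}^{k} \sigma_i \, u_i v_i^{T}, &
\Sigma_k &= \operatorname{diag}(\sigma_1, \dots, \sigma_k, 0, \dots, 0)
\end{align}
```

Setting the weakest singular values to zero and rebuilding gives $A_k$, the best rank-$k$ approximation of $A$ in the least-squares sense. The discarded components are the ones treated as noise.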
Setup • [Scoring formula not recovered from the slide; its annotations read:] • Pulled from messages j and k • ‘Raw’ total score for word w_i • Pulled from ‘running’ word database • Counts only intersection of words • Predefined fixed threshold
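The scoring formula itself is not legible on this slide, so the sketch below is not the authors' formula. It is one hypothetical way to combine the ingredients the annotations name: a running word database, the intersection of words between two messages, and a fixed threshold. The log-based rarity weight, the class `WordRanks`, and the threshold value are all assumptions.

```python
import math
from collections import Counter


class WordRanks:
    """Running word-frequency store, updated as each message arrives."""

    def __init__(self):
        self.counts = Counter()
        self.messages_seen = 0

    def update(self, text):
        self.messages_seen += 1
        self.counts.update(set(text.lower().split()))  # count once per message

    def weight(self, word):
        """Assumed rarity weight: words seen in fewer messages score higher."""
        return math.log((1 + self.messages_seen) / (1 + self.counts[word]))


def correlation_score(ranks, message_j, message_k):
    """Sum rarity weights over only the words the two messages share."""
    shared = set(message_j.lower().split()) & set(message_k.lower().split())
    return sum(ranks.weight(w) for w in shared)


THRESHOLD = 1.0  # hypothetical predefined fixed threshold

ranks = WordRanks()
for seed in ["the meeting is at noon", "the report is late", "send the file at once"]:
    ranks.update(seed)

common = correlation_score(ranks, "the cat is here", "the dog is there")
rare = correlation_score(ranks, "bring the zebra manifest", "zebra manifest received")
```

With this weighting, two messages that share only everyday words (`common`) stay below the threshold, while two that share unseen words (`rare`) exceed it. A weighting along these lines is one way to keep common words from dominating the score.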
Current Results • Method is not currently accurate • Large fluctuations • Correlation easily swayed by plethora of common words • Uncommon words not given enough weight
Current Results 1000 messages evaluated, first 100 used to seed word ranks.
Issues • Word frequencies fluctuate wildly at the beginning of the experiment (0.0 – 10.0+) • Extreme cost for current construction methods and computation • Filtering context limited to recent global history • Affected by large bodies of text
Future Work • Tap potential of existing matrix for further analysis • Adaptive filtering feedback algorithms • Speed improvements to accommodate real-time streams • Flexible communication platform monitoring • Addition of pipe architecture for modular threat detection and correlation