Research Overview • Xintao Wu • Aug 25, 2014
Outline • Introduction • Privacy Preserving Social Network Analysis • Input perturbation • Output perturbation • Fraud Detection in Social Networks • Spectral analysis of graph topology • Detecting Random Link Attacks • Detecting weak anomalies • Sample Projects • Conclusions and Future work
Trustworthy Computing • Trustworthy = reliability, security, privacy, usability • Sample research challenges • Understand and capture emergent behaviors/interactions among regular users, fraudsters, and victims • Design secure, survivable, persistent systems when under attack • Enable privacy protection in collecting/analyzing/sharing personal data
Privacy Breach Cases • Nydia Velázquez (1994) • Her medical record of a suicide attempt was disclosed • AOL Search Log (2006) • An anonymized release of 650K users’ search histories stayed online for less than 24 hours • Netflix Contest (2009) • The $1M contest was cancelled due to a privacy lawsuit • 23andMe (2013) • The FDA ordered it to discontinue genetic testing due to genetic privacy concerns
Acxiom • Privacy • In 2003, EPIC alleged that Acxiom provided consumer information to the US Army "to determine how information from public and private records might be analyzed to help defend military bases from attack." • In 2013, Acxiom was among nine companies that the FTC investigated to see how they collect and use consumer data. • Security • In 2003, more than 1.6 billion customer records were stolen during the transmission of information to and from Acxiom's clients.
Privacy Regulation -- Forrester • [Map legend: Most restricted • Restricted • Some restrictions • Minimal restrictions • Effectively no restrictions • No legislation or no information]
Privacy Protection Laws • USA • HIPAA for health care • Gramm-Leach-Bliley Act of 1999 for financial institutions • COPPA for children's online privacy • State regulations, e.g., California Senate Bill 1386 • Canada • PIPEDA 2000 - Personal Information Protection and Electronic Documents Act • European Union • Directive 95/46/EC - Provides guidelines for member state legislation and forbids sharing data with states that do not protect privacy • Contractual obligations • Individuals should have notice about how their data is used and have opt-out choices
Privacy Preserving Data Mining • 69% unique on zip code and birth date • 87% unique on zip code, birth date, and gender • Generalization (k-anonymity, l-diversity, t-closeness) • Randomization
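To make the uniqueness figures and generalization concrete, here is a minimal Python sketch on a hypothetical toy table. The records, the helper names (`k_anonymity`, `unique_fraction`, `generalize`), and the 3-digit-zip/birth-year generalization rule are illustrative assumptions, not data or code from the study. It measures how many records are unique on a set of quasi-identifiers and how generalization raises k.

```python
from collections import Counter

# Hypothetical toy records (zip code, birth date, gender); not real data.
RECORDS = [
    ("28223", "1980-05-01", "F"),
    ("28223", "1980-05-01", "M"),
    ("28224", "1980-09-17", "F"),
    ("28224", "1980-09-17", "M"),
]

def _groups(rows, qi):
    """Count how many rows share each combination of quasi-identifier columns."""
    return Counter(tuple(row[i] for i in qi) for row in rows)

def k_anonymity(rows, qi):
    """k = size of the smallest equivalence class on the quasi-identifiers."""
    return min(_groups(rows, qi).values())

def unique_fraction(rows, qi):
    """Fraction of rows whose quasi-identifier combination occurs exactly once."""
    groups = _groups(rows, qi)
    return sum(1 for row in rows if groups[tuple(row[i] for i in qi)] == 1) / len(rows)

def generalize(row):
    """Illustrative generalization: keep a 3-digit zip prefix and the birth year only."""
    zip_code, birth_date, gender = row
    return (zip_code[:3] + "**", birth_date[:4], gender)

ZIP_BIRTH = (0, 1)
ZIP_BIRTH_GENDER = (0, 1, 2)
print(unique_fraction(RECORDS, ZIP_BIRTH))          # 0.0
print(unique_fraction(RECORDS, ZIP_BIRTH_GENDER))   # 1.0
print(k_anonymity([generalize(r) for r in RECORDS], ZIP_BIRTH_GENDER))  # 2
```

On this toy table, zip code plus birth date leaves no one unique, adding gender makes every record unique (k = 1), and coarsening zip code and birth date restores 2-anonymity.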
Social Network Data • The data owner releases the network data to a data miner
Threat of Re-identification • An attacker may attack the released network, causing privacy breaches: • Identity disclosure • Link disclosure • Attribute disclosure
Privacy Preservation in Social Network Analysis • Input Perturbation • K-anonymity • Generalization • Randomization • Output Perturbation • Background on differential privacy • Differential privacy preserving social network mining
Our Work • Feature-preserving randomization • Spectrum-preserving randomization (SDM08) • Markov chain based feature-preserving randomization (SDM09) • Reconstruction from randomized graphs (SDM10) • Link privacy (from the attacker's perspective) • Exploiting the node similarity feature (PAKDD09 Best Student Paper Runner-up Award) • Exploiting the graph space via Markov chains (SDM09)
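The slide names the randomization schemes without spelling them out. As an illustration of the general idea only, the sketch below implements plain degree-preserving edge switching (often called Rand Switch), a common baseline; it is not the spectrum-preserving or Markov chain based method from the listed papers. A spectrum-preserving variant would, under our assumption, additionally accept a switch only if a chosen spectral feature (for example, the largest adjacency eigenvalue) stays within a tolerance; that check is omitted. The function names are hypothetical.

```python
import random

def rand_switch(edges, num_switches, seed=0):
    """Degree-preserving randomization by edge switching (generic baseline,
    not the spectrum-preserving scheme itself).

    Picks two edges (a, b), (c, d) and rewires them to (a, d), (c, b) when the
    result has no self-loop and no duplicate edge, so every degree is unchanged.
    """
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    done = attempts = 0
    while done < num_switches and attempts < 100 * num_switches:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if len({a, b, c, d}) < 4:
            continue  # shared endpoint: switching could create a loop or multi-edge
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in edge_set or new2 in edge_set:
            continue  # would duplicate an existing edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list

def degrees(edges):
    """Degree of every node in an undirected edge list."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg
```

Because each accepted switch only exchanges partners, the degree sequence is preserved exactly while the specific links an attacker might look for are scrambled.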
Output Perturbation • The data miner sends a query f to the data owner • The data owner returns the query result plus noise • The noisy result cannot be used to infer whether any individual is included in the database
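Differential Guarantee [Dwork, TCC06] • Query f = count(#cancer) on database x: mechanism K returns f(x) + noise = 3 + noise • The same query on a neighboring database x' that differs from x in one individual: K returns f(x') + noise = 2 + noise • The two noisy answers are nearly indistinguishable, effectively achieving opt-out for every individual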
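The guarantee can be sketched with the standard Laplace mechanism: K(x) = f(x) + Lap(Δf/ε), where Δf is the global sensitivity of f, which gives Pr[K(x) ∈ S] ≤ e^ε · Pr[K(x') ∈ S] for neighboring databases x and x'. For a count such as count(#cancer), Δf = 1. The Python below is a minimal sketch; the function names and toy records are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng=None):
    """epsilon-DP count via the Laplace mechanism.

    Adding or removing one individual changes a count by at most 1
    (global sensitivity 1), so Laplace noise with scale 1/epsilon suffices.
    """
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Hypothetical neighboring databases: x' drops one cancer patient from x.
x = ["cancer", "cancer", "cancer", "flu"]
x_prime = ["cancer", "cancer", "flu"]
rng = random.Random(7)
answer_x = private_count(x, lambda r: r == "cancer", epsilon=1.0, rng=rng)
answer_x_prime = private_count(x_prime, lambda r: r == "cancer", epsilon=1.0, rng=rng)
```

With ε = 1 the noise has scale 1, so the released answers for x (true count 3) and x' (true count 2) overlap heavily, which is what hides whether one individual is present.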
Our Work • DP-preserving clustering coefficient (ASONAM12) • Divide and conquer • Smooth sensitivity • DP-preserving spectral graph analysis (PAKDD13) • LNPP: based on Laplace noise perturbation • SBMF: based on the exponential mechanism and the MBF density • Linear refinement of DP-preserving query answering (PAKDD13 Best Application Paper) • DP-preserving graph generation based on degree correlation (TDP13)
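SBMF is listed as building on the exponential mechanism. The sketch below shows only the generic exponential mechanism over a finite candidate set (McSherry and Talwar), selecting r with probability proportional to exp(ε·u(r)/(2Δu)). It is not SBMF, which samples from a matrix-valued density; the function name and any candidates or utility scores used with it are hypothetical.

```python
import math
import random

def exponential_mechanism(candidates, utility, sensitivity, epsilon, rng=None):
    """Pick r with probability proportional to exp(epsilon * u(r) / (2 * sensitivity)).

    `utility` maps a candidate to a score; `sensitivity` bounds how much any
    score can change when one individual's data changes.
    """
    if rng is None:
        rng = random.Random()
    scores = [epsilon * utility(c) / (2.0 * sensitivity) for c in candidates]
    top = max(scores)  # shift by the max so math.exp cannot overflow
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

A larger ε concentrates the choice on high-utility candidates; as ε approaches 0 the choice approaches uniform, trading utility for privacy.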
Cyber Fraud • Cyber crime costs the US economy $400 billion annually • OSN fraud and attacks • Sybil attacks, spam, viral marketing, fraudulent auctions, brand jacking, denial of service, etc. • Fake Twitter followers (used in viral marketing) are worth $360 million annually on the black market.
Topology-based Detection • Fraud characterization • Individual vs. collusive • Robot vs. money-motivated regular user • Random vs. selective target • Static vs. dynamic • Traditional topology-based detection methods • incur high computational cost • have difficulty detecting collaborative attacks or subtle anomalies
Random Link Attack [Shrivastava ICDE08] • An abstraction of collaborative attacks, including spam, viral marketing, etc. • The attacker creates some fake nodes and uses them to attack a large set of randomly selected regular nodes • The fake nodes also mimic the real graph structure among themselves to evade detection.
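A toy generator for the attack pattern described above can help when experimenting with detectors. This is a hypothetical sketch: the function `inject_rla`, its parameters, and the use of a simple random graph among the fake nodes (a crude stand-in for mimicking real structure) are assumptions, not the attack model from [Shrivastava ICDE08].

```python
import random

def inject_rla(regular_nodes, regular_edges, num_fake, victims_per_fake,
               p_internal, seed=0):
    """Inject a hypothetical random link attack into an undirected graph.

    Creates `num_fake` new nodes, links each one to `victims_per_fake`
    randomly chosen regular nodes, and connects each pair of fake nodes
    with probability `p_internal`. Returns (fake_nodes, edge_list).
    """
    rng = random.Random(seed)
    start = max(regular_nodes) + 1
    fake = list(range(start, start + num_fake))
    edges = {frozenset(e) for e in regular_edges}
    for f in fake:
        for v in rng.sample(regular_nodes, victims_per_fake):
            edges.add(frozenset((f, v)))      # attack edge to a random victim
    for i, f in enumerate(fake):
        for g in fake[i + 1:]:
            if rng.random() < p_internal:
                edges.add(frozenset((f, g)))  # structure among the fake nodes
    return fake, [tuple(sorted(e)) for e in edges]
```

Varying `num_fake`, `victims_per_fake`, and `p_internal` gives attacks of different sizes and connection patterns to test a detector against.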
Spectral Graph Analysis based Fraud Detection • Examine the spectral space of the graph topology • A network with n nodes and m edges that is undirected and unweighted, without considering link/node attribute information • Adjacency matrix A (symmetric) • Adjacency eigenspace: A = Σ_{i=1}^{n} λ_i x_i x_i^T, with eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_n and orthonormal eigenvectors x_1, ..., x_n
Eigenspace Principal Minor
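Projecting Nodes in Spectral Space [SDM09] • Spectral coordinate of node u: α_u = (x_{1u}, x_{2u}, ..., x_{ku}) • k-orthogonal line pattern: when nodes u, v are from the same community, α_u and α_v lie along the same line; when nodes u, v are from different communities, α_u and α_v lie along (nearly) orthogonal lines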
Example: Spectral Coordinates of the Polbook Network
Evaluation on Web Spam Challenge data [ICDE11] • A snapshot of websites in the .UK domain (2007), with 114K nodes and 1.8M links, into which a mix of 8 RLAs with varied sizes and connection patterns was added • SPCTRA: based on the spectral space • GREEDY: based on outer triangles [Shrivastava, ICDE08] • SPCTRA is much faster: 36 s vs. 26 h
Genetic Privacy (NSF SCH pending) BIBM13 Best Paper Award
Manipulation in E-Commerce (NSF III pending) • Targets: reviews, ratings, ranks • Bot-committed vs. money-motivated • Approaches: spectral bipartite graph analysis, structured topic analysis, D-S (Dempster-Shafer) based evidence fusion
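D-S evidence fusion refers to Dempster-Shafer theory. As a minimal sketch of how two detectors' outputs could be fused, the code implements Dempster's rule of combination over the frame {fraud, normal}. The mass values attributed to a spectral detector and a topic detector are invented for illustration and do not come from the project.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses (a focal element) to its
    mass. Mass on empty intersections is treated as conflict and renormalized away.
    """
    combined = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb  # the two sources contradict each other here
    if conflict >= 1.0:
        raise ValueError("total conflict: evidence cannot be combined")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

# Hypothetical frame of discernment and mass functions (illustrative numbers only).
FRAUD = frozenset({"fraud"})
NORMAL = frozenset({"normal"})
THETA = FRAUD | NORMAL  # mass on the whole frame means "undecided"

m_spectral = {FRAUD: 0.6, NORMAL: 0.1, THETA: 0.3}
m_topic = {FRAUD: 0.5, NORMAL: 0.2, THETA: 0.3}
fused = dempster_combine(m_spectral, m_topic)
```

Both sources lean toward fraud, and after discarding 0.17 of conflicting mass the fused belief in fraud (about 0.76) exceeds either input.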
Privacy Preserving Database Application Testing (NSF 0310974) • [Figure: framework components: ER model, DDL, production DB (catalog, data), schema & domain filter, user rules (R, NR, S), rule analyzer, conflict resolution, disclosure assessment, derived R', NR', S', Schema', Domain', data generator, mock DB]
Data Generation for Testing DB Applications (NSF 0915059) • How can we generate data that covers paths?
Big Data Computing • Drowning in data • Volume, Velocity, Variety, and Veracity • 2.5 exabytes of data every day • Web data, healthcare, e-commerce, social networks • Advancing technology • Cheap storage and processing power • Growth in huge data centers • Data is in the "cloud": Amazon AWS, Hadoop, Azure • Computing is in the "cloud"
Social Media Customer Analytics • Belk and Lowe's; Chancellor's special fund • Data: unstructured text (e.g., blogs, tweets), products and reviews, transaction database, structured profiles, network topology (friendship, followship, interaction), retweet sequences • Issues: entity resolution, patterns, temporal/spatial analysis, scalability, visualization, sentiment, privacy • Velocity and variety: 10 GB of tweets per day
Samsung AVC Denial Log Analysis • Volume and velocity: 1 million log files per day, each with thousands of entries • S3, Hive, and EMR
Drivers of Data Computing • Reliability, Security, Privacy, Usability • 6A's: Anytime, Anywhere Access to Anything by Anyone Authorized • 4V's: Volume, Velocity, Variety, Veracity
Thank You! Questions? Collaborators: Aidong Lu, Xinghua Shi, Jun Li (Oregon), Dejing Dou (Oregon), Tao Xie (UIUC) Doctoral graduates: Songtao Guo, Ling Guo, Kai Pan, Leting Wu, Xiaowei Ying Doctoral students: Yue Wang, Yuemeng Li, Zhilin Luo (visiting)