
Influence propagation in large graphs - theorems, algorithms, and case studies



Presentation Transcript


  1. Influence propagation in large graphs - theorems, algorithms, and case studies Christos Faloutsos CMU

  2. Thank you! • V.S. Subrahmanian • Weiru Liu • Jef Wijsen C. Faloutsos (CMU)

  3. Outline • Part 1: anomaly detection • OddBall (anomaly detection) • Belief Propagation • Conclusions • Part 2: influence propagation C. Faloutsos (CMU)

  4. OddBall: Spotting Anomalies in Weighted Graphs Leman Akoglu, Mary McGlohon, Christos Faloutsos Carnegie Mellon University School of Computer Science PAKDD 2010, Hyderabad, India

  5. Main idea For each node, • extract its ‘ego-net’ (= the node plus its 1-step-away neighbors) • extract features (# edges, total weight, etc.) • compare with the rest of the population C. Faloutsos (CMU)

  6. What is an egonet? [Figure: the ego node and its surrounding egonet] C. Faloutsos (CMU)

  7. Selected Features • Ni: number of neighbors (degree) of ego i • Ei: number of edges in egonet i • Wi: total weight of egonet i • λw,i: principal eigenvalue of the weighted adjacency matrix of egonet i C. Faloutsos (CMU)
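These features are straightforward to compute per node. A minimal sketch (my own illustration, not the authors' OddBall code), representing the weighted graph as an edge dict:

```python
# Sketch of OddBall-style egonet feature extraction (illustrative only).
# The graph is a dict mapping undirected edges to weights.
import numpy as np

G = {("a", "b"): 1.0, ("a", "c"): 2.0, ("b", "c"): 1.0, ("c", "d"): 3.0}

def neighbors(G, n):
    return {v for (u, v) in G if u == n} | {u for (u, v) in G if v == n}

def egonet_features(G, ego):
    """Return (N_i, E_i, W_i, lambda_w) for node `ego`."""
    nodes = sorted(neighbors(G, ego) | {ego})   # ego + 1-step neighbors
    idx = {n: i for i, n in enumerate(nodes)}
    A = np.zeros((len(nodes), len(nodes)))
    for (u, v), w in G.items():                 # keep only egonet-internal edges
        if u in idx and v in idx:
            A[idx[u], idx[v]] = A[idx[v], idx[u]] = w
    N = len(nodes) - 1                          # degree of ego
    E = int(np.count_nonzero(A) / 2)            # edges in egonet
    W = A.sum() / 2                             # total egonet weight
    lam = max(np.linalg.eigvalsh(A))            # principal eigenvalue
    return N, E, W, lam

print(egonet_features(G, "a"))
```

OddBall then compares each node's feature pairs (e.g., Ei vs. Ni) against the rest of the population and flags the nodes that deviate from the bulk.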

  8. Near-Clique/Star C. Faloutsos (CMU)

  9. Near-Clique/Star C. Faloutsos (CMU)

  10. Near-Clique/Star C. Faloutsos (CMU)

  11. Near-Clique/Star Andrew Lewis (director) C. Faloutsos (CMU)

  12. Outline • Part 1: anomaly detection • OddBall (anomaly detection) • Belief Propagation • Conclusions • Part 2: influence propagation C. Faloutsos (CMU)

  13. eBay Fraud detection w/ Polo Chau & Shashank Pandit, CMU [WWW’07] C. Faloutsos (CMU)

  14. eBay Fraud detection C. Faloutsos (CMU)

  15. eBay Fraud detection C. Faloutsos (CMU)

  16. eBay Fraud detection - NetProbe C. Faloutsos (CMU)

  17. Popular press And less desirable attention: • E-mail from ‘Belgium police’ (‘copy of your code?’) C. Faloutsos (CMU)

  18. Outline • OddBall (anomaly detection) • Belief Propagation • eBay fraud • Symantec malware detection • Unification results • Conclusions C. Faloutsos (CMU)

  19. PATENT PENDING SDM 2011, Mesa, Arizona Polonium: Tera-Scale Graph Mining and Inference for Malware Detection Polo Chau Machine Learning Dept Carey Nachenberg Vice President & Fellow Jeffrey Wilhelm Principal Software Engineer Adam Wright Software Engineer Prof. Christos Faloutsos Computer Science Dept

  20. Polonium: The Data 60+ terabytes of data anonymously contributed by participants of the worldwide Norton Community Watch program: 50+ million machines, 900+ million executable files. Constructed a machine-file bipartite graph (0.2+ TB): 1 billion nodes (machines and files), 37 billion edges C. Faloutsos (CMU)

  21. Polonium: Key Ideas • Use Belief Propagation to propagate domain knowledge in machine-file graph to detect malware • Use “guilt-by-association” (i.e., homophily) • E.g., files that appear on machines with many bad files are more likely to be bad • Scalability: handles 37 billion-edge graph C. Faloutsos (CMU)
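The "guilt-by-association" idea can be demonstrated with a tiny loopy belief propagation run on a toy machine-file graph. This is a sketch of the mechanism only, not Symantec's Polonium implementation; the graph, priors, and the 0.51 homophily strength are my own choices:

```python
# Toy belief propagation on a machine-file graph: f1 is known malware,
# f3 is known good; BP propagates "badness" through shared machines.
edges = [("m1", "f1"), ("m1", "f2"), ("m2", "f2"), ("m2", "f3")]
nodes = sorted({n for e in edges for n in e})
neighbors = {n: [] for n in nodes}
for u, v in edges:
    neighbors[u].append(v)
    neighbors[v].append(u)

prior = {n: {"good": 0.5, "bad": 0.5} for n in nodes}   # unknown nodes
prior["f1"] = {"good": 0.05, "bad": 0.95}               # known malware
prior["f3"] = {"good": 0.95, "bad": 0.05}               # known good

h = 0.51  # mild homophily: matching labels slightly preferred
psi = {("good", "good"): h, ("bad", "bad"): h,
       ("good", "bad"): 1 - h, ("bad", "good"): 1 - h}

# msg[(u, v)][s]: message from u to v about v being in state s
msg = {(u, v): {"good": 1.0, "bad": 1.0}
       for u in nodes for v in neighbors[u]}

for _ in range(20):  # iterate message updates until (near) convergence
    new = {}
    for (u, v) in msg:
        out = {}
        for sv in ("good", "bad"):
            total = 0.0
            for su in ("good", "bad"):
                prod = prior[u][su]
                for w in neighbors[u]:
                    if w != v:
                        prod *= msg[(w, u)][su]
                total += psi[(su, sv)] * prod
            out[sv] = total
        z = out["good"] + out["bad"]
        new[(u, v)] = {s: out[s] / z for s in out}
    msg = new

belief = {}
for n in nodes:  # final belief = prior times all incoming messages
    b = dict(prior[n])
    for w in neighbors[n]:
        for s in b:
            b[s] *= msg[(w, n)][s]
    z = sum(b.values())
    belief[n] = {s: b[s] / z for s in b}

print({n: round(belief[n]["bad"], 3) for n in nodes})
```

Machine m1, which carries the known-bad file f1, ends up with a higher "bad" belief than m2, which carries the known-good f3: guilt (and innocence) by association.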

  22. Polonium: One-Iteration Results • 84.9% True Positive Rate at 1% False Positive Rate (True Positive Rate = % of malware correctly identified; False Positive Rate = % of non-malware wrongly labeled as malware) C. Faloutsos (CMU)

  23. Outline • Part 1: anomaly detection • OddBall (anomaly detection) • Belief Propagation • eBay fraud • Symantec malware detection • Unification results • Conclusions • Part 2: influence propagation C. Faloutsos (CMU)

  24. Unifying Guilt-by-Association Approaches: Theorems and Fast Algorithms Danai Koutra U Kang Hsing-Kuo Kenneth Pao Tai-You Ke Duen Horng (Polo) Chau Christos Faloutsos ECML PKDD, 5-9 September 2011, Athens, Greece

  25. Problem Definition: GBA techniques Given: a graph & a few labeled nodes Find: labels of the rest (assuming network effects) C. Faloutsos (CMU)

  26. Homophily and Heterophily Step 1: all methods handle homophily. Step 2: NOT all methods handle heterophily, BUT the proposed method does! C. Faloutsos (CMU)

  27. Are they related? • RWR (Random Walk with Restarts) • google’s pageRank (‘if my friends are important, I’m important, too’) • SSL (Semi-supervised learning) • minimize the differences among neighbors • BP (Belief propagation) • send messages to neighbors, on what you believe about them C. Faloutsos (CMU)

  28. Are they related? YES! • RWR (Random Walk with Restarts) • google’s pageRank (‘if my friends are important, I’m important, too’) • SSL (Semi-supervised learning) • minimize the differences among neighbors • BP (Belief propagation) • send messages to neighbors, on what you believe about them C. Faloutsos (CMU)
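Of the three methods, RWR is the quickest to sketch. A toy personalized-PageRank iteration (my own illustration on a made-up 4-node graph, not code from the talk):

```python
# Random Walk with Restarts: power iteration r = c*e + (1-c) W r,
# where e is the restart vector concentrated on the query node.
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
W = A / A.sum(axis=0)          # column-stochastic transition matrix
c = 0.15                       # restart probability
e = np.array([1.0, 0, 0, 0])   # restart at node 0 (the query node)

r = e.copy()
for _ in range(100):           # contraction with factor (1-c): converges fast
    r = c * e + (1 - c) * W @ r

print(np.round(r, 3))          # relevance scores w.r.t. node 0
```

The restart node itself gets the highest score, and scores decay with distance from it, which is exactly the 'if my friends are important, I'm important' recursion of PageRank, personalized to one node.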

  29. Correspondence of Methods [Figure: adjacency matrix combined with prior labels/beliefs yields final labels/beliefs] C. Faloutsos (CMU)

  30. Results: Scalability [Plot: runtime (min) vs. # of edges, on Kronecker graphs] FABP is linear on the number of edges. C. Faloutsos (CMU)
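The linearity comes from FaBP replacing iterative message passing with a single sparse linear system of the form (I + aD - c'A) b = phi. A sketch with the coefficient formulas as I recall them from the ECML PKDD paper; treat the exact constants, and the toy graph, priors, and hh value, as my assumptions:

```python
# FaBP as one linear solve: b = final beliefs, phi = priors, both
# centered around 0; hh is the (small) "about-half" homophily strength.
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))                 # degree matrix
hh = 0.002                                 # assumed homophily strength
a = 4 * hh**2 / (1 - 4 * hh**2)            # coefficients as I recall them
c = 2 * hh / (1 - 4 * hh**2)

phi = np.array([0.01, 0.0, -0.01, 0.0])    # node 0 positive, node 2 negative
b = np.linalg.solve(np.eye(4) + a * D - c * A, phi)
print(np.sign(b))                          # final class of each node
```

For a sparse matrix this solve (or an equivalent iteration) costs time proportional to the number of edges, matching the linear scaling in the plot; the unlabeled nodes 1 and 3 inherit the signs of their labeled neighbors.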

  31. Results: Parallelism [Plot: % accuracy vs. runtime (min)] FABP ~2x faster & wins/ties on accuracy. C. Faloutsos (CMU)

  32. Conclusions • Anomaly detection: hand-in-hand with pattern discovery (‘anomalies’ == ‘rare patterns’) • ‘OddBall’ for large graphs • ‘NetProbe’ and belief propagation: exploit network effects. • FaBP: fast & accurate C. Faloutsos (CMU)

  33. Outline • Part 1: anomaly detection • OddBall (anomaly detection) • Belief Propagation • Conclusions • Part 2: influence propagation C. Faloutsos (CMU)

  34. Influence propagation in large graphs - theorems and algorithms B. Aditya Prakash http://www.cs.cmu.edu/~badityap Christos Faloutsos http://www.cs.cmu.edu/~christos Carnegie Mellon University

  35. Networks are everywhere! Facebook Network [2010] Gene Regulatory Network [Decourty 2008] Human Disease Network [Barabasi 2007] The Internet [2005] C. Faloutsos (CMU)

  36. Dynamical Processes over networks are also everywhere! C. Faloutsos (CMU)

  37. Why do we care? • Information Diffusion • Viral Marketing • Epidemiology and Public Health • Cyber Security • Human mobility • Games and Virtual Worlds • Ecology • Social Collaboration ........ C. Faloutsos (CMU)

  38. Why do we care? (1: Epidemiology) • Dynamical Processes over networks [AJPH 2007] CDC data: Visualization of the first 35 tuberculosis (TB) patients and their 1039 contacts Diseases over contact networks C. Faloutsos (CMU)

  39. Why do we care? (1: Epidemiology) • Dynamical Processes over networks • Each circle is a hospital • ~3000 hospitals • More than 30,000 patients transferred [US-MEDICARE NETWORK 2005] Problem: Given k units of disinfectant, whom to immunize? C. Faloutsos (CMU)

  40. Why do we care? (1: Epidemiology) [Figure: US-MEDICARE NETWORK 2005, current practice vs. our method: ~6x fewer infections] Hospital-acquired infections took 99K+ lives and cost $5B+ (all per year) C. Faloutsos (CMU)

  41. Why do we care? (2: Online Diffusion) > 800m users, ~$1B revenue [WSJ 2010] ~100m active users > 50m users C. Faloutsos (CMU)

  42. Why do we care? (2: Online Diffusion) • Dynamical Processes over networks Buy Versace™! Followers Celebrity Social Media Marketing C. Faloutsos (CMU)

  43. High Impact – Multiple Settings: epidemic outbreaks, products/viruses, transmitting s/w patches Q. How do opinions spread? Q. How to market better? Q. How to squash rumors faster? C. Faloutsos (CMU)

  44. Research Theme [Diagram: DATA (large real-world networks & processes) → ANALYSIS (understanding) → POLICY/ACTION (managing)] C. Faloutsos (CMU)

  45. In this talk Given propagation models: Q1: Will an epidemic happen? ANALYSIS Understanding C. Faloutsos (CMU)

  46. In this talk Q2: How to immunize and control out-breaks better? POLICY/ ACTION Managing C. Faloutsos (CMU)

  47. Outline • Part 1: anomaly detection • Part 2: influence propagation • Motivation • Epidemics: what happens? (Theory) • Action: Who to immunize? (Algorithms) C. Faloutsos (CMU)

  48. A fundamental question Strong Virus Epidemic? C. Faloutsos (CMU)

  49. example (static graph) Weak Virus Epidemic? C. Faloutsos (CMU)

  50. Problem Statement [Plot: # infected vs. time; above threshold: epidemic, below: extinction] Separate the regimes? Find a condition under which • the virus dies out exponentially quickly • regardless of the initial infection condition C. Faloutsos (CMU)
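The condition the talk builds toward involves the largest eigenvalue lambda_1 of the adjacency matrix: for an SIS (flu-like) model with infection rate beta and cure rate delta, the virus dies out when the effective strength s = lambda_1 * beta / delta is below 1. A numerical check of that condition on a toy graph (the graph and the rates are my own choices):

```python
# Epidemic-threshold check: compute lambda_1 of the adjacency matrix
# and compare the virus strength s = lambda_1 * beta / delta against 1.
import numpy as np

A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]], dtype=float)
lam1 = max(np.linalg.eigvalsh(A))   # largest adjacency eigenvalue

beta, delta = 0.1, 0.5              # infection / cure rates (assumed)
s = lam1 * beta / delta             # effective strength of the virus
print(f"lambda_1 = {lam1:.3f}, s = {s:.3f} ->",
      "dies out" if s < 1 else "epidemic possible")
```

Note that the condition separates the regimes using only one number about the graph (lambda_1) and one number about the virus (beta/delta), regardless of which nodes are initially infected.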
