
Mining Large Graphs: Spectral Methods, Tensors and Influence propagation


Presentation Transcript


  1. Mining Large Graphs: Spectral Methods, Tensors and Influence propagation Christos Faloutsos CMU

  2. Thanks • Alex Smola • Jia Yu (Tim) Pan C. Faloutsos (CMU)

  3. Roadmap • Graph problems: • G1: Fraud detection – BP • G2: Botnet detection – spectral • G3: Beyond graphs: tensors and “NELL” • Influence propagation and spike modeling • C1: spikeM model • Conclusions

  4. eBay fraud detection, w/ Polo Chau & Shashank Pandit, CMU [WWW’07]

  5. eBay fraud detection

  6. eBay fraud detection

  7. eBay fraud detection - NetProbe

  8. eBay fraud detection - NetProbe (details) • Compatibility matrix: heterophily

  9. Background 1: Belief Propagation • Equations for the beliefs b_i(x_i) [Pearl ‘82][Yedidia+ ‘02] … [Pandit+ ‘07][Gonzalez+ ‘09][Chechetka+ ‘10]
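
For reference, the standard sum-product belief propagation updates on a pairwise Markov random field are usually written as below. The notation is an assumption, since the slide's own equations did not survive transcription: $\phi_i$ is the prior (node potential), $\psi_{ij}$ the edge potential (the role played by the compatibility matrix on the NetProbe slide), $m_{ij}$ the message from node $i$ to node $j$, $N(i)$ the neighbors of $i$, and $\eta$ a normalizing constant.

```latex
\begin{align}
m_{ij}(x_j) &\leftarrow \sum_{x_i} \phi_i(x_i)\, \psi_{ij}(x_i, x_j)
               \prod_{n \in N(i) \setminus j} m_{ni}(x_i) \\
b_i(x_i)    &= \eta\, \phi_i(x_i) \prod_{j \in N(i)} m_{ji}(x_i)
\end{align}
```

On graphs with cycles these updates are iterated ("loopy" BP), and the resulting beliefs are approximations rather than exact marginals.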

  10. Popular press • And less desirable attention: • E-mail from ‘Belgium police’ (‘copy of your code?’)

  11. Roadmap • Graph problems: • G1: Fraud detection – BP • eBay • Symantec • Unification • G2: Botnet detection – spectral • G3: Beyond graphs: tensors and “NELL” • Influence propagation and spike modeling • Conclusions

  12. Polonium: Tera-Scale Graph Mining and Inference for Malware Detection • SDM 2011, Mesa, Arizona • PATENT PENDING • Polo Chau (Machine Learning Dept) • Carey Nachenberg (Vice President & Fellow) • Jeffrey Wilhelm (Principal Software Engineer) • Adam Wright (Software Engineer) • Prof. Christos Faloutsos (Computer Science Dept)

  13. Polonium: The Data • 60+ terabytes of data anonymously contributed by participants of the worldwide Norton Community Watch program • 50+ million machines • 900+ million executable files • Constructed a machine-file bipartite graph (0.2 TB+) • 1 billion nodes (machines and files) • 37 billion edges

  14. Polonium: Key Ideas • Use “guilt-by-association” (i.e., homophily) • E.g., files that appear on machines with many bad files are more likely to be bad • Scalability: handles a 37-billion-edge graph

  15. Polonium: One-Interaction Results • 84.9% True Positive Rate at 1% False Positive Rate • True Positive Rate: % of malware correctly identified • False Positive Rate: % of non-malware wrongly labeled as malware • [plot annotation: ‘Ideal’]
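
The two rates defined on the slide can be computed from a list of true labels and a list of predicted labels. The sketch below is only an illustration of the definitions; the function name `rates` and the toy label lists are hypothetical and have nothing to do with the Polonium data.

```python
def rates(y_true, y_pred):
    """Return (true positive rate, false positive rate) for binary labels.

    Convention assumed here: 1 = malware, 0 = non-malware.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malware caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malware missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign passed
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical toy data: 4 malware files, 6 benign files.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

tpr, fpr = rates(y_true, y_pred)
print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")
```

Reporting a true positive rate at a fixed false positive rate, as the slide does, amounts to picking the score threshold that yields that false positive rate and then reading off the true positive rate.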

  16. Roadmap • Graph problems: • G1: Fraud detection – BP • eBay • Symantec • Unification • G2: Botnet detection – spectral • G3: Beyond graphs: tensors and “NELL” • Influence propagation and spike modeling • Conclusions

  17. Unifying Guilt-by-Association Approaches: Theorems and Fast Algorithms • Danai Koutra, U Kang, Hsing-Kuo Kenneth Pao, Tai-You Ke, Duen Horng (Polo) Chau, Christos Faloutsos • ECML PKDD, 5-9 September 2011, Athens, Greece

  18. Problem Definition: GBA techniques • Given: graph & a few labeled nodes • Find: labels of the rest (assuming network effects)

  19. Homophily and Heterophily • All methods handle homophily • NOT all methods handle heterophily, BUT the proposed method does! • [figure labels: Step 1, Step 2]

  20. Are they related? • RWR (Random Walk with Restarts) • Google’s PageRank (‘if my friends are important, I’m important, too’) • SSL (Semi-supervised learning) • minimize the differences among neighbors • BP (Belief propagation) • send messages to neighbors, on what you believe about them

  21. Are they related? YES! • RWR (Random Walk with Restarts) • Google’s PageRank (‘if my friends are important, I’m important, too’) • SSL (Semi-supervised learning) • minimize the differences among neighbors • BP (Belief propagation) • send messages to neighbors, on what you believe about them
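
To make the first of the three methods concrete, here is a minimal sketch of Random Walk with Restarts on a toy graph. It is a plain power-iteration version written for illustration, not the algorithm from the talk; the function name `rwr`, the restart probability 0.15, and the six-node graph are all assumptions.

```python
def rwr(adj, seed, restart=0.15, iters=100):
    """Random-walk-with-restart scores relative to `seed`.

    adj: dict mapping node -> list of neighbors (undirected graph,
         every node is assumed to have at least one neighbor).
    """
    nodes = sorted(adj)
    score = {v: 0.0 for v in nodes}
    score[seed] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in nodes}
        for u in nodes:
            # With probability (1 - restart), step to a uniformly chosen neighbor.
            share = (1.0 - restart) * score[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        # With probability `restart`, jump back to the seed node.
        nxt[seed] += restart
        score = nxt
    return score


# Hypothetical graph: two triangles (0,1,2) and (3,4,5) joined by edge 2-3.
adj = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
scores = rwr(adj, seed=0)
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 3))
```

Nodes in the seed's own triangle end up with higher scores than nodes in the other triangle, which is the guilt-by-association effect: a labeled seed passes more of its label to closely connected nodes.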

  22. Background 1: Belief Propagation • Equations [Pearl ‘82][Yedidia+ ‘02] … [Pandit+ ‘07][Gonzalez+ ‘09][Chechetka+ ‘10]

  23. Correspondence of Methods • [figure: matrix equation on a small example with degrees d1, d2, d3, relating prior labels/beliefs, the adjacency matrix, and final labels/beliefs]

  24. Correspondence of Methods • [figure: matrix equation on a small example with degrees d1, d2, d3, relating prior labels/beliefs, the adjacency matrix, and final labels/beliefs] • We know when it converges!

  25. Results: Scalability • FaBP is linear in the number of edges • [plot: runtime (min) vs. # of edges (Kronecker graphs)]

  26. Results: Parallelism • FaBP ~2x faster & wins/ties on accuracy • [plot: % accuracy vs. runtime (min)]

  27. Conclusions for BP • ‘NetProbe’, ‘Polonium’, and belief propagation: exploit network effects • FaBP: fast & accurate (and gives convergence conditions)

  28. Roadmap • Graph problems: • G1: Fraud detection – BP • eBay • Symantec • Unification • G2: Botnet detection – spectral • G3: Beyond graphs: tensors and “NELL” • Influence propagation and spike modeling • Conclusions

  29. EigenSpokes • B. Aditya Prakash, Mukund Seshadri, Ashwin Sridharan, Sridhar Machiraju and Christos Faloutsos: EigenSpokes: Surprising Patterns and Scalable Community Chipping in Large Graphs, PAKDD 2010, Hyderabad, India, 21-24 June 2010.

  30. EigenSpokes • Eigenvectors of adjacency matrix • equivalent to singular vectors (symmetric, undirected graph)
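
The equivalence stated on the slide can be checked numerically on a small undirected graph. The sketch below is an illustration under assumptions: it uses plain power iteration (not whatever solver the authors used), the helper names are hypothetical, and the four-node graph is made up. It compares the leading eigenvector of the symmetric adjacency matrix A with the leading right singular vector of A, obtained as the leading eigenvector of AᵀA.

```python
import math


def matvec(A, x):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def normalize(x):
    """Scale a vector to unit Euclidean length."""
    n = math.sqrt(sum(v * v for v in x))
    return [v / n for v in x]


def power_iteration(A, iters=500):
    """Leading eigenvector of A, assuming a strictly dominant eigenvalue."""
    x = normalize([1.0 + i for i in range(len(A))])  # arbitrary positive start
    for _ in range(iters):
        x = normalize(matvec(A, x))
    return x


# Hypothetical undirected graph: triangle 0-1-2 plus node 3 attached to node 2.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
n = len(A)
# A^T A; its leading eigenvector is the leading right singular vector of A.
AtA = [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(n)]
       for i in range(n)]

u = power_iteration(A)    # leading eigenvector of A
v = power_iteration(AtA)  # leading right singular vector of A

print([round(t, 4) for t in u])
print([round(t, 4) for t in v])
```

In general the match holds only up to sign, and the singular values are the absolute values of the eigenvalues; the positive starting vector and the non-bipartite graph were chosen here so that both iterations converge to the same positive vector.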

  31. EigenSpokes (details) • Eigenvectors of adjacency matrix • equivalent to singular vectors (symmetric, undirected graph) • [figure: N × N adjacency matrix]

  32. EigenSpokes (details) • Eigenvectors of adjacency matrix • equivalent to singular vectors (symmetric, undirected graph) • [figure: N × N adjacency matrix]

  33. EigenSpokes (details) • Eigenvectors of adjacency matrix • equivalent to singular vectors (symmetric, undirected graph) • [figure: N × N adjacency matrix]

  34. EigenSpokes (details) • Eigenvectors of adjacency matrix • equivalent to singular vectors (symmetric, undirected graph) • [figure: N × N adjacency matrix]

  35. EigenSpokes • EE plot: • Scatter plot of scores of u1 vs u2 • One would expect • Many points @ origin • A few scattered ~randomly • [plot axes: u1 (1st principal component), u2 (2nd principal component)]

  36. EigenSpokes • EE plot: • Scatter plot of scores of u1 vs u2 • One would expect • Many points @ origin • A few scattered ~randomly • [plot axes: u1, u2; annotation: 90°]

  37. EigenSpokes - pervasiveness • Present in mobile social graph • across time and space • Patent citation graph

  38. EigenSpokes - explanation • Near-cliques, or near-bipartite-cores, loosely connected

  39. EigenSpokes - explanation • Near-cliques, or near-bipartite-cores, loosely connected

  40. EigenSpokes - explanation • Near-cliques, or near-bipartite-cores, loosely connected

  41. EigenSpokes - explanation • Near-cliques, or near-bipartite-cores, loosely connected • So what? • Extract nodes with high scores • high connectivity • Good “communities” • [figure: spy plot of top 20 nodes]
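
The "extract nodes with high scores" step can be sketched on a toy graph that contains one clique loosely attached to a sparse tail. This is a simplified illustration, not the procedure from the EigenSpokes paper: it uses only the first eigenvector, the cutoff of half the maximum score is an arbitrary assumption, and the ten-node graph and all names are hypothetical.

```python
import math


def top_eigenvector(adj, n, iters=300):
    """Leading eigenvector of the adjacency matrix via power iteration."""
    x = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        y = [sum(x[v] for v in adj[u]) for u in range(n)]  # y = A x
        norm = math.sqrt(sum(t * t for t in y))
        x = [t / norm for t in y]
    return x


# Hypothetical graph: a 5-clique on nodes 0..4, plus a path 4-5-6-7-8-9.
n = 10
edges = [(i, j) for i in range(5) for j in range(i + 1, 5)]
edges += [(i, i + 1) for i in range(4, 9)]
adj = {u: set() for u in range(n)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

u1 = top_eigenvector(adj, n)

# Keep nodes whose score is at least half the largest score (assumed cutoff).
cutoff = 0.5 * max(abs(s) for s in u1)
community = sorted(v for v in range(n) if abs(u1[v]) >= cutoff)

# Edge density of the subgraph induced by the extracted nodes.
inside = sum(1 for a, b in edges if a in community and b in community)
density = inside / (len(community) * (len(community) - 1) / 2)

print("extracted nodes:", community)
print("induced edge density:", density)
```

The high-score nodes are exactly the clique, and the induced subgraph is fully connected, which is the "high connectivity, good communities" observation in miniature.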

  42. Bipartite Communities! • patents from same inventor(s) • ‘cut-and-paste’ bibliography! • [figure: magnified bipartite community]

  43. (maybe, botnets?) • Victim IPs? • Botnet members? • Exploring it with Dr. Eric Mao (III-Taiwan)

  44. Roadmap • Graph problems: • G1: Fraud detection – BP • G2: Botnet detection – spectral • G3: Beyond graphs: tensors and “NELL” • Influence propagation and spike modeling • Conclusions

  45. GigaTensor: Scaling Tensor Analysis Up By 100 Times – Algorithms and Discoveries • U Kang, Abhay Harpale, Evangelos Papalexakis, Christos Faloutsos • KDD’12

  46. Background: Tensors • Tensors (= multi-dimensional arrays) are everywhere • Hyperlinks & anchor text [Kolda+,05] • [figure: URL × URL × anchor-text tensor; example anchor texts: C#, C++, Java]

  47. Background: Tensors • Tensors (= multi-dimensional arrays) are everywhere • Sensor stream (time, location, type) • Predicates (subject, verb, object) in knowledge base • NELL (Never Ending Language Learner) data: mode sizes 26M, 26M and 48M; nonzeros = 144M • “Eric Clapton plays guitar” • “Barack Obama is president of U.S.”

  48. Background: Tensors • Tensors (= multi-dimensional arrays) are everywhere • Sensor stream (time, location, type) • Predicates (subject, verb, object) in knowledge base • Anomaly detection in computer networks (IP-source, IP-destination, time-stamp)
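
A three-mode tensor such as (subject, verb, object) is almost entirely zeros at these sizes, so it is natural to store only the nonzero cells. The sketch below shows one simple coordinate-style representation; it is an assumption for illustration, not the storage format used by GigaTensor, and the repeated triple is made up to show counting.

```python
from collections import defaultdict

# Hypothetical input: the two example predicates from the slides,
# with one of them observed twice.
triples = [
    ("Eric Clapton", "plays", "guitar"),
    ("Barack Obama", "is president of", "U.S."),
    ("Eric Clapton", "plays", "guitar"),
]

# Sparse tensor: (subject, verb, object) -> count. Absent keys are zeros.
tensor = defaultdict(int)
for subj, verb, obj in triples:
    tensor[(subj, verb, obj)] += 1

# Size of each mode = number of distinct values seen along that mode.
shape = tuple(len({key[m] for key in tensor}) for m in range(3))
nnz = len(tensor)
dense_cells = shape[0] * shape[1] * shape[2]

print("shape:", shape)
print("nonzeros:", nnz, "out of", dense_cells, "cells")
```

Memory grows with the number of nonzeros rather than with the product of the mode sizes, which is what makes tensors with tens of millions of entries per mode storable at all.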

  49. Problem Definition • How to decompose a billion-scale tensor? • Corresponds to SVD in 2D case

  50. Problem Definition • How to decompose a billion-scale tensor? • Corresponds to SVD in 2D case • [figure labels: ‘Artists’, ‘Politicians’]
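
One common way to generalize the SVD to tensors is the CP/PARAFAC form, which approximates a tensor by a sum of rank-1 terms, each an outer product of one vector per mode. The sketch below fits a single rank-1 term to a tiny sparse tensor with alternating updates. It is a toy under stated assumptions: the function names and data are hypothetical, only one component is fitted, and it is not the distributed algorithm described in the GigaTensor paper.

```python
import math


def unit(x):
    """Return (x scaled to unit length, original length)."""
    n = math.sqrt(sum(t * t for t in x))
    return [t / n for t in x], n


def rank1_cp(tensor, shape, iters=50):
    """Fit X[i, j, k] ~ lam * a[i] * b[j] * c[k] by alternating updates.

    tensor: dict mapping (i, j, k) -> value, nonzero cells only.
    """
    factors = [[1.0] * size for size in shape]
    lam = 0.0
    for _ in range(iters):
        for mode in range(3):
            others = [m for m in range(3) if m != mode]
            new = [0.0] * shape[mode]
            # Contract the tensor with the current vectors of the other modes.
            for idx, val in tensor.items():
                w = val
                for m in others:
                    w *= factors[m][idx[m]]
                new[idx[mode]] += w
            factors[mode], lam = unit(new)
    return lam, factors


# Hypothetical tensor that is exactly rank 1: X = a0 (outer) b0 (outer) c0.
a0, b0, c0 = [1.0, 2.0], [1.0, 0.0, 1.0], [3.0, 1.0]
shape = (2, 3, 2)
tensor = {
    (i, j, k): a0[i] * b0[j] * c0[k]
    for i in range(2) for j in range(3) for k in range(2)
    if a0[i] * b0[j] * c0[k] != 0.0
}

lam, (a, b, c) = rank1_cp(tensor, shape)
print("weight:", round(lam, 4))
print("a:", [round(t, 4) for t in a])
```

Each fitted vector plays the role a singular vector plays for a matrix: the entries with large magnitude in one component pick out a group of related subjects, verbs and objects, which is presumably how groupings like the ‘Artists’ and ‘Politicians’ labels on the slide arise.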
