
OddBall: Spotting Anomalies in Weighted Graphs

OddBall: Spotting Anomalies in Weighted Graphs. Leman Akoglu, Mary McGlohon, Christos Faloutsos. Carnegie Mellon University, School of Computer Science, Pittsburgh, Pennsylvania, USA.


Presentation Transcript


  1. OddBall: Spotting Anomalies in Weighted Graphs. Leman Akoglu, Mary McGlohon, Christos Faloutsos. Carnegie Mellon University, School of Computer Science, Pittsburgh, Pennsylvania, USA

  2. Motivation • Anomaly detection in networks (graph data) has important applications: • Computer networks → spammers, port scanners • Phone-call networks → telemarketers, misbehaving customers, faulty equipment • Social networks → ‘popularity contests’ • Account networks → scammers, transfer fraud • Terrorist networks → tight groups of people Akoglu, McGlohon, Faloutsos

  3. Problem • Q1. Given a weighted and unlabeled graph, how can we spot strange, abnormal, extreme nodes? • Q2. Can we explain why the spotted nodes are anomalous?

  4. Preliminaries I – What is an anomaly? • No clear and unique definition! • “An observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism.” [Hawkins, 80]

  5. Preliminaries II – Weights (figure: weighted edges in a bipartite graph and a unipartite graph; example edge weights $15K, $5K, $10K, 3, 1)

  6. Preliminaries III – Power Laws • Pr[X ≥ x] ~ c·x^(−α), so ln Pr[X ≥ x] ~ ln c − α ln x, with c > 0, α ≥ 0 (figure panels: lin-lin plot; log-log plot with slope = −α)

  7. ‘Power Law’ Example: DBLP Keyword-to-Conference Network (figure: Densification Power Law [Leskovec ‘05] and Weight Power Law [McGlohon ‘08]; axis labels: # Source nodes, # Destination nodes, # Edges, Total weight)

  8. ‘Power Law’ Example: 2004 US FEC Committees to Candidates network • Snapshot Power Law [McGlohon et al. ‘08] (figure axes: In-degree (# donors), In-weights ($)) • e.g. John Kerry, $10M received, from 1K donors

  9. Preliminaries IV – How to fit • Least Squares fit to medians!
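
The slide says only "least squares fit to medians". One plausible reading, sketched below, is to group the points into logarithmic bins on x, take the median of each bin, and fit a straight line to those medians in log-log space. The function name `fit_power_law_to_medians` and the `n_bins` parameter are assumptions made for illustration, not details from the slides.

```python
import numpy as np

def fit_power_law_to_medians(x, y, n_bins=10):
    """Fit y ~ C * x**theta by least squares on per-bin medians in log-log space."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = (x > 0) & (y > 0)               # logarithms need positive values
    lx, ly = np.log(x[keep]), np.log(y[keep])

    # Equal-width bins on log(x), i.e. logarithmic bins on x (an assumption).
    edges = np.linspace(lx.min(), lx.max(), n_bins + 1)
    which = np.clip(np.digitize(lx, edges) - 1, 0, n_bins - 1)

    mx, my = [], []
    for b in range(n_bins):
        in_bin = which == b
        if in_bin.any():                   # skip empty bins
            mx.append(np.median(lx[in_bin]))
            my.append(np.median(ly[in_bin]))

    theta, log_c = np.polyfit(mx, my, 1)   # slope and intercept of the line
    return float(theta), float(np.exp(log_c))

# Synthetic data that follows y = 3 * x**1.5 exactly.
xs = np.arange(1, 201)
ys = 3.0 * xs ** 1.5
theta, c = fit_power_law_to_medians(xs, ys)
print(round(theta, 3), round(c, 3))
```

Fitting to medians rather than to every point keeps a few extreme nodes from pulling the line toward themselves, which matters when those nodes are exactly what the method is trying to flag.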

  10. Problem revisited • Q1. Given a weighted and unlabeled graph, how can we spot strange, abnormal, extreme nodes? • Q2. Can we explain why the spotted nodes are anomalous?

  11. Problem sketch

  12. Main idea • For each node: • P.1) extract ‘ego-net’ (= 1-step-away neighbors) • P.2) extract features (# edges, total weight, etc.) • P.3) extract patterns (norms) • P.4) anomaly detection: compare with the rest of the population

  13. Outline • 1. Motivation • 2. Preliminaries and Problem Definition • 3. Proposed Method (Study of ego-nets; Laws and Observations; Anomaly detection) • 4. Datasets • 5. Experiments • 6. Discussion & Conclusion

  14. P.1 What is an egonet? (figure: an ego node and its ego-net)

  15. What is odd?

  16. What is “anomalous”? • Near-star: telemarketer, port scanner, people adding friends indiscriminately, etc. • Near-clique: tightly connected people, terrorist groups?, discussion group, etc.

  17. What is “anomalous”? • Heavy vicinity: too much money relative to the number of accounts, high donation relative to the number of donors, etc. • Dominant heavy link: single-minded, tight company

  18. P.2 What features should we extract to project nodes into a low-dimensional space? → features that could yield “laws” → features that are easy to compute and interpret

  19. Selected Features • N_i: number of neighbors (degree) of ego i • E_i: number of edges in egonet i • W_i: total weight of egonet i • λ_w,i: principal eigenvalue of the weighted adjacency matrix of egonet i
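
The four selected features can be computed directly from a weighted adjacency matrix. The sketch below assumes an undirected graph without self-loops, stored as a dense symmetric NumPy array; the function name `egonet_features` and the toy graph are hypothetical, and a dense matrix is only practical for small graphs.

```python
import numpy as np

def egonet_features(adj, ego):
    """Return (N, E, W, lambda_w) for the 1-step egonet of `ego`.

    `adj` is a symmetric weighted adjacency matrix; 0 means "no edge".
    """
    adj = np.asarray(adj, dtype=float)
    neighbors = np.flatnonzero(adj[ego])
    neighbors = neighbors[neighbors != ego]        # ignore a self-loop, if any
    members = np.concatenate(([ego], neighbors))

    sub = adj[np.ix_(members, members)]            # subgraph induced by the egonet
    upper = np.triu(sub, k=1)                      # each undirected edge once

    n = len(neighbors)                             # N_i: number of neighbors
    e = int(np.count_nonzero(upper))               # E_i: edges in the egonet
    w = float(upper.sum())                         # W_i: total edge weight
    lam = float(np.linalg.eigvalsh(sub).max())     # principal eigenvalue
    return n, e, w, lam

# Toy graph: ego 0 has neighbors 1, 2, 3; node 4 lies outside its egonet.
adj = np.zeros((5, 5))
for u, v, weight in [(0, 1, 2.0), (0, 2, 1.0), (0, 3, 1.0), (1, 2, 3.0), (3, 4, 5.0)]:
    adj[u, v] = adj[v, u] = weight

n, e, w, lam = egonet_features(adj, ego=0)
print(n, e, w, round(lam, 3))
```

The edge between nodes 3 and 4 is excluded because node 4 is two steps from the ego, so only the four edges among nodes 0 to 3 count toward E and W.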

  20. Details: λ_w,i for example egonets (N: # neighbors, W: total weight) • λ_w,i = √N = √E = √W • λ_w,i > √N ~ √E, √W • λ_w,i √W • λ_w,i = N ≈ √W • λ_w,i = W • λ_w,i ≈ W
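
Some of these relations can be checked numerically on idealized egonets. The sketch below builds a unit-weight star, a unit-weight clique, and a single weighted edge; matching each case to a particular relation on the slide is an assumption, since the original figure is not in the transcript.

```python
import numpy as np

def principal_eigenvalue(a):
    """Largest eigenvalue of a symmetric matrix."""
    return float(np.linalg.eigvalsh(a).max())

n = 9  # number of neighbors of the ego

# Star egonet: the ego (index 0) linked to n neighbors, unit weights.
star = np.zeros((n + 1, n + 1))
star[0, 1:] = 1.0
star[1:, 0] = 1.0

# Clique egonet: the ego plus n neighbors, every pair linked, unit weights.
clique = np.ones((n + 1, n + 1)) - np.eye(n + 1)

# A single edge of weight 7 between the ego and one neighbor.
pair = np.array([[0.0, 7.0], [7.0, 0.0]])

lam_star = principal_eigenvalue(star)      # compare with sqrt(N)
lam_clique = principal_eigenvalue(clique)  # compare with N
lam_pair = principal_eigenvalue(pair)      # compare with W
print(lam_star, np.sqrt(n))
print(lam_clique, n)
print(lam_pair, 7.0)
```

For the unit-weight star, N, E, and W all equal 9, so the eigenvalue agrees with √N = √E = √W. For the clique it equals N, and for the single edge it equals the total weight W.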

  21. Other Features • S_i: number of singleton neighbors of ego i (neighbors with degree 1) • max(W_i): maximum edge weight in egonet i • max(W_i, d=1): maximum edge weight to/from a degree-1 neighbor of ego i • max(d_i): maximum degree of the neighbors of ego i • 2-step neighborhood features

  22. Outline • 1. Motivation • 2. Preliminaries • 3. Proposed Method (Study of egonets; Laws and Observations; Anomaly detection) • 4. Datasets • 5. Experiments • 6. Discussion & Conclusion

  23. Observation 1: Egonet Density Power Law (EDPL) • P.3 What patterns? • Q1: How does the number of neighbors N of the egonet relate to the number of edges E?

  24. Observation 1: Egonet Density Power Law (EDPL) • E_i ∝ N_i^α, 1 ≤ α ≤ 2

  25. Observation 2: Egonet Weight Power Law (EWPL) • P.3 What patterns? • Q2: How does the total weight W of the egonet relate to the number of edges E?

  26. Observation 2: Egonet Weight Power Law (EWPL) • W_i ∝ E_i^β, β ≥ 1

  27. Observation 3: Egonet λ_w Power Law (ELWPL) • P.3 What patterns? • Q3: How does the largest eigenvalue λ_w of the weighted adjacency matrix of the egonet relate to the total weight W?

  28. Observation 3: Egonet λ_w Power Law (ELWPL) • λ_w,i ∝ W_i^γ, 0.5 ≤ γ ≤ 1

  29. Outline • 1. Motivation • 2. Preliminaries • 3. Proposed Method (Study of egonets; Laws and Observations; Anomaly detection) • 4. Datasets • 5. Experiments • 6. Discussion & Conclusion

  30. P.4 Anomaly detection • Anomaly ≈ • violates our “laws” • too far away from the rest of the points

  31. Scoring • score_dist = distance to fitting line • score_outl = outlierness score • score = func(score_dist, score_outl) • can tell what kind of anomaly a node is • can sort nodes by their outlierness scores
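
The slide names the two scores but gives no formulas, so the sketch below fills them in with assumptions that are not taken from the slides. The distance is the ratio between observed and expected value times the log of their gap, the outlierness is a simple stand-in (distance to the k-th nearest point in log-log space), and `func` is the sum of the two scores after each is scaled by its maximum. All function names are hypothetical.

```python
import math

def score_dist(x, y, c, theta):
    """Deviation of (x, y) from a fitted law y = c * x**theta (assumed form)."""
    expected = c * x ** theta
    ratio = max(y, expected) / min(y, expected)    # how many times off
    return ratio * math.log(abs(y - expected) + 1.0)

def score_outl(points, k=2):
    """Stand-in outlierness: distance to the k-th nearest point in log-log space."""
    logs = [(math.log(x), math.log(y)) for x, y in points]
    scores = []
    for i, p in enumerate(logs):
        dists = sorted(math.dist(p, q) for j, q in enumerate(logs) if j != i)
        scores.append(dists[k - 1])
    return scores

def combined_scores(points, c, theta, k=2):
    """Sum of the two scores, each scaled to [0, 1] by its maximum."""
    d = [score_dist(x, y, c, theta) for x, y in points]
    o = score_outl(points, k)
    d_max, o_max = max(d) or 1.0, max(o) or 1.0    # avoid dividing by zero
    return [di / d_max + oi / o_max for di, oi in zip(d, o)]

# (N_i, E_i) pairs: four star-like egonets and one much denser egonet.
points = [(10, 10), (11, 11), (12, 12), (13, 13), (12, 60)]
scores = combined_scores(points, c=1.0, theta=1.0)
ranking = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
print(ranking[0])
```

Here the last point lies far above the fitted line and far from every other point, so it ranks first under both scores. Knowing which plot a node is flagged on, and on which side of the line it falls, is what lets the kind of anomaly be read off.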

  32. Outline • 1. Motivation • 2. Preliminaries • 3. Proposed Method (Study of egonets; Laws and Observations; Anomaly detection) • 4. Datasets • 5. Experiments • 6. Discussion & Conclusion

  33. Datasets
      Bipartite networks    |N|     |E|
      1. Don2Com            1.6M    2M
      2. Com2Cand           6K      125K
      3. Auth2Conf          421K    1M
      Unipartite networks   |N|     |E|
      5. BlogNet            27K     126K
      6. PostNet            223K    217K
      7. Enron              36K     183K
      8. Oregon             11K     38K

  34. Outline • 1. Motivation • 2. Preliminaries • 3. Proposed Method (Study of egonets; Laws and Observations; Anomaly detection) • 4. Datasets • 5. Experiments • 6. Discussion & Conclusion

  35. Experimental Results

  36. Near-Clique/Star

  37. Near-Clique/Star

  38. Experimental Results

  39. Heavy Vicinity

  40. Heavy Vicinity

  41. Experimental Results

  42. Dominant Heavy Link • $87M - DNC • $25M - RNC

  43. Dominant Heavy Link

  44. Experimental Results

  45. Outline • 1. Motivation • 2. Preliminaries • 3. Proposed Method (Study of egonets; Laws and Observations; Anomaly detection) • 4. Datasets • 5. Experiments • 6. Discussion & Conclusion

  46. Scalability • Counting the number of edges in egonets for ALL nodes is expensive: it requires scanning connections for all pairs of neighbors! • Can be reworded as counting local triangles • A fast method [Tsourakakis, 08] exists! IDEA: • # triangles at node i = (# closed paths of length 3 through i) / 2 • # closed paths of length 3 for node i = (A³)_ii • Computing A³ is still expensive! • Low-rank approximation!
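
The rewording works because, in an undirected graph without self-loops, the egonet of node i contains its d_i edges to neighbors plus one neighbor-to-neighbor edge per triangle through i, so E_i = d_i + (A³)_ii / 2. A minimal exact sketch follows, using a dense matrix for clarity only; the function name `egonet_edge_counts` is hypothetical.

```python
import numpy as np

def egonet_edge_counts(adj):
    """E_i and triangle counts for every node of an undirected simple graph."""
    a = (np.asarray(adj) != 0).astype(np.int64)    # drop weights, keep structure
    degree = a.sum(axis=1)
    closed_paths = np.diag(a @ a @ a)              # (A^3)_ii
    triangles = closed_paths // 2                  # each triangle is traversed twice
    return degree + triangles, triangles

# Toy graph: triangle 0-1-2 plus a pendant edge 0-3.
adj = np.zeros((4, 4), dtype=int)
for u, v in [(0, 1), (0, 2), (1, 2), (0, 3)]:
    adj[u, v] = adj[v, u] = 1

edges, triangles = egonet_edge_counts(adj)
print(edges.tolist(), triangles.tolist())
```

Node 0 has degree 3 and sits in one triangle, so its egonet has 4 edges; node 3 has a single neighbor and no triangles, so its egonet has 1 edge.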

  47. Details • A ≈ U S Uᵀ and A³ ≈ U S³ Uᵀ, with A: n×n, U: n×k, S: k×k, Uᵀ: k×n • Cost of A³: O(n³) → ~O(nk²) • Prune d=1 nodes • Prune d=2 as well as d=1 nodes → smaller & sparser A matrix
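
With A ≈ U S Uᵀ, the diagonal of A³ becomes a weighted sum of squared eigenvector entries: (A³)_ii ≈ Σ_j λ_j³ u_ij². The sketch below illustrates that formula on a small dense matrix. For simplicity it computes the full eigendecomposition and then keeps the k eigenvalues of largest magnitude, which by itself saves nothing; a real speed-up would need a solver that returns only the top k eigenpairs. The function name `approx_local_triangles` is hypothetical.

```python
import numpy as np

def approx_local_triangles(adj, k):
    """Approximate per-node triangle counts from the top-k eigenpairs of A."""
    a = (np.asarray(adj) != 0).astype(float)
    vals, vecs = np.linalg.eigh(a)                 # all eigenpairs (illustration only)
    top = np.argsort(np.abs(vals))[::-1][:k]       # k largest in magnitude
    # diag(U S^3 U^T)_i = sum_j lambda_j^3 * u_ij^2, halved to get triangles
    return (vecs[:, top] ** 2) @ (vals[top] ** 3) / 2.0

# Complete graph on 5 nodes: every node is in exactly 6 triangles.
k5 = np.ones((5, 5)) - np.eye(5)
exact = approx_local_triangles(k5, k=5)    # all eigenpairs: exact counts
rank1 = approx_local_triangles(k5, k=1)    # rank-1 approximation
print(exact, rank1)
```

On this graph the rank-1 estimate overshoots slightly because the discarded eigenvalues are all negative; how many eigenpairs are needed in practice depends on the spectrum of the graph.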

  48. Scalability – time vs. size • Time vs. number of edges. • Effect of pruning on computation time. Solid (–): no pruning, Dashed (−−): pruning nodes w/ d ≤ 1, Dotted (…): pruning nodes w/ d ≤ 2 • Computation time increases linearly with the number of edges and decreases with pruning.

  49. Scalability – accuracy vs. time • Time vs. accuracy. • Effect of pruning on the accuracy of finding the top anomalies, measured against the original ranking before pruning. • New rankings are scored using Normalized Discounted Cumulative Gain (NDCG). • Pruning reduces time for both Node-Iterator and Eigen-Triangle while keeping accuracy as high as ~1 and ~0.9, respectively.
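
The slide does not say how relevance is assigned when a new ranking is scored against the original one. The sketch below makes one assumption: each node's relevance is its anomaly score in the original (unpruned) ranking, and the ideal ordering is the original ordering. The names `dcg` and `ndcg` and the example scores are illustrative.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of relevances listed in ranked order."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))

def ndcg(new_ranking, original_scores, top=None):
    """NDCG of `new_ranking` (node ids, best first) against original scores."""
    top = len(new_ranking) if top is None else top
    gained = [original_scores.get(node, 0.0) for node in new_ranking[:top]]
    ideal = sorted(original_scores.values(), reverse=True)[:top]
    best = dcg(ideal)
    return dcg(gained) / best if best > 0 else 0.0

# Original anomaly scores before pruning (made-up values).
original = {"a": 3.0, "b": 2.0, "c": 1.0}

same_order = ndcg(["a", "b", "c"], original)
reversed_order = ndcg(["c", "b", "a"], original)
print(same_order, round(reversed_order, 3))
```

A ranking identical to the original scores 1, and the score drops as highly anomalous nodes slide down the list, which matches the reading of "accuracy" as agreement with the original top anomalies.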

  50. Conclusion • OddBall: a fast, unsupervised method to detect abnormal nodes in weighted graphs • Study of egonets; list of numerical features • Discovery of new patterns in density (Obs. 1: EDPL), weights (Obs. 2: EWPL), and principal eigenvalues (Obs. 3: ELWPL) • Speed-up in feature extraction, with accuracy ~0.9 • Experiments on real graphs of over 1M nodes that reveal strange/extreme nodes from many different domains • Software available online: http://www.cs.cmu.edu/~lakoglu/#tools
