1.05k likes | 1.22k Views
Large Graph Mining. Christos Faloutsos CMU. Thank you!. Dr. Yan Liu Dr. Jimeng Sun. Outline. Introduction – Motivation Problem#1: Patterns in graphs Problem#2: Generators Problem#3: Scalability Conclusions. Graphs - why should we care?. Internet Map [lumeta.com].
E N D
Large Graph Mining Christos Faloutsos CMU
Thank you! • Dr. Yan Liu • Dr. Jimeng Sun C. Faloutsos
Outline • Introduction – Motivation • Problem#1: Patterns in graphs • Problem#2: Generators • Problem#3: Scalability • Conclusions C. Faloutsos
Graphs - why should we care? Internet Map [lumeta.com] Food Web [Martinez ’91] Protein Interactions [genomebiology.com] Friendship Network [Moody ’01] C. Faloutsos
T1 D1 ... ... DN TM Graphs - why should we care? • IR: bi-partite graphs (doc-terms) • web: hyper-text graph • ... and more: C. Faloutsos
Graphs - why should we care? • network of companies & board-of-directors members • ‘viral’ marketing • web-log (‘blog’) news propagation • computer network security: email/IP traffic and anomaly detection • .... C. Faloutsos
Outline • Introduction – Motivation • Problem#1: Patterns in graphs • Static graphs • Weighted graphs • Time evolving graphs • Problem#2: Generators • Problem#3: Scalability • Conclusions C. Faloutsos
Problem #1 - network and graph mining • How does the Internet look like? • How does the web look like? • What is ‘normal’/‘abnormal’? • which patterns/laws hold? C. Faloutsos
Graph mining • Are real graphs random? C. Faloutsos
Laws and patterns • Are real graphs random? • A: NO!! • Diameter • in- and out- degree distributions • other (surprising) patterns • So, let’s look at the data • (‘it is amazing what you hear, when you listen’) C. Faloutsos
-0.82 Solution# S.1 • Power law in the degree distribution [SIGCOMM99] internet domains att.com log(degree) ibm.com log(rank) C. Faloutsos
Solution# S.2: Eigen Exponent E • A2: power law in the eigenvalues of the adjacency matrix Eigenvalue Exponent = slope E = -0.48 May 2001 Rank of decreasing eigenvalue C. Faloutsos
Solution# S.2: Eigen Exponent E • [Mihail, Papadimitriou ’02]: slope is ½ of rank exponent Eigenvalue Exponent = slope E = -0.48 May 2001 Rank of decreasing eigenvalue C. Faloutsos
But: How about graphs from other domains? C. Faloutsos
users sites More power laws: • web hit counts [w/ A. Montgomery] Web Site Traffic log(count) Zipf ``ebay’’ log(in-degree) C. Faloutsos
epinions.com • who-trusts-whom [Richardson + Domingos, KDD 2001] count trusts-2000-people user (out) degree C. Faloutsos
Outline • Introduction – Motivation • Problem#1: Patterns in graphs • Static graphs • degree, diameter, eigen*, • triangles • cliques • Weighted graphs • Time evolving graphs • Problem#2: Generators C. Faloutsos
Solution# S.3: Triangle ‘Laws’ Real social networks have a lot of triangles C. Faloutsos
Triangle ‘Laws’ Real social networks have a lot of triangles Friends of friends are friends Any patterns? C. Faloutsos
Triangle Law: #S.3 [Tsourakakis ICDM 2008] HEP-TH ASN X-axis: # of Triangles a node participates in Y-axis: count of such nodes Epinions C. Faloutsos
Triangle Law: #S.4 [Tsourakakis ICDM 2008] Reuters SN X-axis: degree Y-axis: mean # triangles Notice: slope ~ degree exponent (insets) Epinions C. Faloutsos
Triangle Law: Computations [Tsourakakis ICDM 2008] details But: triangles are expensive to compute (3-way join; several approx. algos) Q: Can we do that quickly? C. Faloutsos
Triangle Law: Computations [Tsourakakis ICDM 2008] details But: triangles are expensive to compute (3-way join; several approx. algos) Q: Can we do that quickly? A: Yes! #triangles = 1/6 Sum ( li3 ) (and, because of skewness, we only need the top few eigenvalues! C. Faloutsos
Triangle Law: Computations [Tsourakakis ICDM 2008] details 1000x+ speed-up, high accuracy C. Faloutsos
Outline • Introduction – Motivation • Problem#1: Patterns in graphs • Static graphs • degree, diameter, eigen*, • triangles • cliques • Weighted graphs • Time evolving graphs • Problem#2: Generators C. Faloutsos
Large Human Communication NetworksPatterns and a Utility-Driven Generator Nan Du, Christos Faloutsos, Bai Wang, Leman Akoglu KDD 2009
0 2 4 1 3 Cliques • Clique is a complete subgraph. • If a clique can not be contained by any largerclique, it is called the maximal clique. C. Faloutsos
0 2 4 1 3 Clique • Clique is a complete subgraph. • If a clique can not be contained by any largerclique, it is called the maximal clique. C. Faloutsos
0 2 4 1 3 Clique • Clique is a complete subgraph. • If a clique can not be contained by any largerclique, it is called the maximal clique. C. Faloutsos
0 2 4 1 3 Clique • Clique is a complete subgraph. • If a clique can not be contained by any largerclique, it is called the maximal clique. • {0,1,2}, {0,1,3}, {1,2,3}{2,3,4}, {0,1,2,3} are cliques; • {0,1,2,3} and {2,3,4} are the maximal cliques. C. Faloutsos
S.5: Clique-Degree Power-Law • Power law: degree of node i # maximal cliques of node i Dataset: who-calls-whom anonymized, Over several time units More friends, even more social circles ! C. Faloutsos
S.5 Clique-Degree Power-Law • Outlier Detection C. Faloutsos
1.5 Clique-Degree Power-Law • Outlier Detection C. Faloutsos
Outline • Introduction – Motivation • Problem#1: Patterns in graphs • Static graphs • degree, diameter, eigen*, • triangles • cliques • Weighted graphs • Time evolving graphs • Problem#2: Generators C. Faloutsos
Observation W.1 Question : Nodes in a triangle are topologically equivalent. Will they also give equal number of phone calls to each other ? Max Weight Min Weight Mid Weight C. Faloutsos
Observation W.1:Triangle Weight Law Periods S1 – S3 C. Faloutsos
Other observations on weighted graphs? • A: yes - even more ‘laws’! M. McGlohon, L. Akoglu, and C. Faloutsos Weighted Graphs and Disconnected Components: Patterns and a Generator. SIG-KDD 2008 C. Faloutsos
Observation W.2: fortification Q: How do the weights of nodes relate to degree? C. Faloutsos
Observation W.2: fortification:Snapshot Power Law weight of a node is super-linear on the in-degree with PL exponent ‘iw’: i.e. 1.01 < iw < 1.26, super-linear Orgs-Candidates More donors, even more $ e.g. John Kerry, $10M received, from 1K donors In-weights ($) $10 $5 Edges (# donors) C. Faloutsos
Are there deviations from P.L.? • A: yes – but they also correspond to very skewed distributions (‘black/gray swans’) C. Faloutsos
“Mobile Call Graphs: Beyond Power Law and Lognormal Distributions” K D D ’ 0 8 Mukund Seshadri , Sridhar Machiraju, Ashwin Sridharan, Jean Bolot Christos Faloutsos, Jure Leskovec
+ Observed Data .... LogNormal Fit Dataset • Who-calls-whom (anonymized) • Degree distribution, in green .... Observed Data --- Power Law Fit Area S1 Time period T1 1 month Count Degree of Mobile-phone call graph C. Faloutsos
+ Observed Data .... LogNormal Fit A poor fit? • Power Laws: p(x) ~ x^(-a) • Lognormal: log(x) is Normal .... Observed Data --- Power Law Fit Area S1 Time period T1 1 month Count Degree of Mobile-phone call graph C. Faloutsos
Solution: DPLN • Double Pareto Log Normal (Reed, 2003) • 4 parameters: [α,β,ν,τ] • Linear head and tail. Area S1, Time Period T1 count “Rich get Richer” BUT Population lifetimes NOT identical β α C. Faloutsos Degree
Datasets • Anonymized monthly aggregates of call metrics per user • Collected at 4 coverage areas S1,S2,S3,S4 • Collected for two month-long periods T1 and T2, separated by 6 months • Total coverage: • >1 million users • >10 million calls • ~7000 square miles. C. Faloutsos
Metrics • Degree (No. of Call Partners) • Calls • Talk Time Total over a month, per user Area S1 Time-period T1 Degree Distribution C. Faloutsos
Persistence –Across Metric, Time, Space Calls Area S1, Time T1 Partners S1, T2 Partners S2, T2 C. Faloutsos
Applications • Outlier detection • Many un-answered calls => multiples of 27 sec! Per-user distribution of total talk time (@ T1, S1) C. Faloutsos
Outline • Introduction – Motivation • Problem#1: Patterns in graphs • Static graphs • Weighted graphs • Time evolving graphs • Problem#2: Generators • … C. Faloutsos
Problem: Time evolution • with Jure Leskovec (CMU -> Stanford) • and Jon Kleinberg (Cornell – sabb. @ CMU) C. Faloutsos