CS728 Lecture 5: Generative Graph Models and the Web
Importance of Generative Models
Generative models give insight into the graph formation process, which helps with:
• Anomaly detection – abnormal behavior, evolution
• Predictions – predicting the future from the past
• Simulations and evaluation of new algorithms
• Graph sampling – many real-world graphs, such as the web, are too large and complex to work with directly
• Goal: generating graphs with the small-world property, clustering, power laws, and other naturally occurring structures
Graph Models: Waxman Models
• Used to model clustering in Internet-like topologies and networks with both long and short edges
• The vertices are distributed at random in a plane.
• An edge is added between each pair of vertices u, v with probability
  $p(u,v) = \alpha \exp\left(-\frac{d}{\beta L}\right), \quad 0 < \alpha, \beta \le 1$
• L is the maximum distance between any two nodes.
• Increasing α increases the number of edges in the graph.
• Increasing β increases the number of long edges relative to short edges.
• In Waxman-1, d is the Euclidean distance from u to v.
• In Waxman-2, d is a random number in [0, L].
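A minimal Python sketch of Waxman-1 and Waxman-2 generation, assuming the points are placed uniformly in the unit square (the slide only says "in a plane"). The function name `waxman_graph` and its parameters are illustrative, not taken from a particular library.

```python
import math
import random

def waxman_graph(n, alpha, beta, variant=1, seed=None):
    """Sketch of a Waxman random graph on n points in the unit square (assumed domain).

    Returns (positions, edges); each pair (u, v) with u < v becomes an edge with
    probability alpha * exp(-d / (beta * L)).
    """
    rng = random.Random(seed)
    pos = [(rng.random(), rng.random()) for _ in range(n)]
    # L: maximum distance between any two nodes
    L = max((math.dist(pos[u], pos[v]) for u in range(n) for v in range(u + 1, n)),
            default=0.0)
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            if variant == 1:
                d = math.dist(pos[u], pos[v])  # Waxman-1: Euclidean distance
            else:
                d = rng.uniform(0.0, L)        # Waxman-2: random value in [0, L]
            p = alpha * math.exp(-d / (beta * L)) if L > 0 else alpha
            if rng.random() < p:
                edges.append((u, v))
    return pos, edges
```

For example, `waxman_graph(100, 0.4, 0.1, seed=1)` gives a sparse graph in which short edges dominate; raising `beta` toward 1 lets more long edges through, and raising `alpha` adds edges at every distance.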
Graph Models: Configuration Model
• Random graph from a given degree sequence
• Problem: given a degree sequence d1, d2, d3, …, dn, generate a random graph with that degree sequence.
• Solution: place di "stubs" (half-edges) on vertex i, then join pairs of stubs chosen at random.
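A sketch of stub matching: listing every stub, shuffling the list uniformly, and pairing consecutive entries yields a uniformly random pairing of stubs. The name `configuration_model` is illustrative; the output can contain self-loops and multi-edges.

```python
import random

def configuration_model(degrees, seed=None):
    """Pair stubs uniformly at random; returns a list of (u, v) edges (loops/multi-edges allowed)."""
    if sum(degrees) % 2:
        raise ValueError("the degree sum must be even")
    rng = random.Random(seed)
    # one stub per unit of degree: vertex i appears degrees[i] times
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)  # uniform shuffle, then pair neighbours = uniform random matching
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]
```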
Problem: this may construct graphs with self-loops and multi-edges.
• To prevent this, there must be enough "absorbing" residual degree capacity.
• Algorithm:
  • Maintain a list of nodes sorted by residual degree d(v)
  • Repeat until all nodes have been chosen:
    • pick an arbitrary vertex v
    • add edges from v to the d(v) vertices of highest residual degree
    • update the residual degrees
• To randomize further, start with one realization and repeatedly 2-swap pairs of edges: (u,v), (s,t) become (u,t), (s,v).
• This works, but is there a more "natural" generative model?
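The residual-degree algorithm and the 2-swap randomization above can be sketched as follows. This is a Havel-Hakimi-style construction in which index order stands in for "pick an arbitrary vertex"; both function names are hypothetical. Each swap keeps all four endpoint degrees unchanged and is rejected if it would create a loop or a duplicate edge.

```python
import random

def havel_hakimi_realization(degrees):
    """Return a simple graph (set of (u, v) with u < v) realizing `degrees`,
    or None if the sequence is not graphical."""
    residual = dict(enumerate(degrees))
    edges = set()
    for v in range(len(degrees)):  # "pick an arbitrary vertex": here, index order
        d = residual.pop(v)
        # connect v to the d remaining vertices of highest residual degree
        targets = sorted(residual, key=residual.get, reverse=True)[:d]
        if len(targets) < d or any(residual[u] == 0 for u in targets):
            return None  # not enough absorbing capacity left
        for u in targets:
            residual[u] -= 1
            edges.add((min(u, v), max(u, v)))
    return edges

def double_edge_swaps(edges, n_swaps, seed=None):
    """Randomize a simple graph with 2-swaps (u,v),(s,t) -> (u,t),(s,v); degrees are preserved."""
    rng = random.Random(seed)
    edges = set(edges)
    if len(edges) < 2:
        return edges
    for _ in range(n_swaps):
        e1, e2 = rng.sample(sorted(edges), 2)
        (u, v), (s, t) = e1, e2
        if rng.random() < 0.5:
            s, t = t, s  # try both ways of re-pairing the endpoints
        new1, new2 = (min(u, t), max(u, t)), (min(s, v), max(s, v))
        # reject swaps that would create a self-loop or a multi-edge
        if u == t or s == v or new1 in edges or new2 in edges:
            continue
        edges -= {e1, e2}
        edges |= {new1, new2}
    return edges
```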
Generative Graph Models: Preferential Attachment
• Price's model [65]: physics citations, "cumulative advantage"
• Herb Simon [1950s]: Nobel and Turing Award winner, political scientist; "rich get richer" (Pareto)
• Matthew effect / Matilda effect: sociology
• Barabasi and Albert [99]: preferential attachment
  • Add a new node and create d out-links
  • The probability of linking to a node is proportional to its current degree
  • A simple explanation of power-law degree distributions
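A sketch of Barabasi-Albert growth, assuming a small star as the seed graph (the slide does not specify one) and using m for the number of links per new node. Keeping a list in which each node appears once per incident edge turns degree-proportional sampling into a uniform pick; repeated targets are rejected so the graph stays simple.

```python
import random

def barabasi_albert(n, m, seed=None):
    """Sketch: each new node links to m distinct existing nodes, chosen with
    probability proportional to their current degree."""
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    # assumed seed graph: a star centred on node m, so nodes 0..m all have degree >= 1
    edges = [(m, v) for v in range(m)]
    # every node appears here once per incident edge, so a uniform pick is degree-proportional
    ends = [x for e in edges for x in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:  # reject repeats to keep the graph simple
            targets.add(rng.choice(ends))
        for t in targets:
            edges.append((new, t))
            ends.extend((new, t))
    return edges
```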
Issues with Preferential Attachment and Power Laws
• Barabasi model: a fixed constant m of out-links per new node
• Price's model: directed, with mean out-degree m
• The probability that a new edge attaches to a node is proportional to its (in-)degree k
  • Problem at the start: a new node has degree 0, so it would never receive a link
  • Price's model: probability proportional to k + 1
• Analysis: probability p_k that a node has degree k
  • $p_k \sim k^{-3}$ (Barabasi model)
  • $p_k \sim k^{-(2 + 1/m)}$, a power law with exponent between 2 and 3 (Price's model)
• Exercise: give pseudocode that generates such a graph in linear time.
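One possible answer to the exercise, as a sketch: a node's weight is its in-degree plus one, so the total weight is E + N. With probability E/(E+N) the sketch takes the target of a uniformly random existing edge (proportional to in-degree); otherwise it picks a uniformly random existing node (the "+1"). Each pick is O(1) expected time, so the whole graph is built in time roughly linear in its size. As a simplification, every new node cites exactly min(m, number of existing nodes) distinct earlier nodes rather than m on average; `price_model` is a hypothetical name.

```python
import random

def price_model(n, m, seed=None):
    """Directed citation graph sketch: edge (u, v) means u cites the earlier node v;
    targets are chosen with probability proportional to in-degree + 1."""
    rng = random.Random(seed)
    cited = []  # target of every edge so far: node v appears in_degree(v) times
    edges = []
    for new in range(1, n):
        k = min(m, new)
        targets = set()
        while len(targets) < k:
            # total weight = sum(in_degree + 1) = len(cited) + new
            if rng.random() < len(cited) / (len(cited) + new):
                targets.add(rng.choice(cited))   # proportional to in-degree
            else:
                targets.add(rng.randrange(new))  # the "+1": uniform over existing nodes
        for t in targets:
            edges.append((new, t))
            cited.append(t)
    return edges
```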
Variations on the PA Theme
• Clustering, small world, and ageing
• Copying model
• Alpha and beta models
• Temporal evolution
• Densification
Graph Models: Copying Model
• Copying model [Kleinberg, Kumar, Raghavan, Rajagopalan and Tomkins, 99]:
  • Add a node and choose the number of edges to add
  • Choose a random vertex and "copy" its links (neighbors)
• Also generates power-law degree distributions
• Generates communities (clustering)
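A sketch of one common formulation of the copying model. The mixing parameter `beta` (probability of a uniformly random link instead of a copied one), the fixed out-degree d, and the complete seed graph are assumptions rather than details from the slide; the output may contain repeated links.

```python
import random

def copying_model(n, d, beta=0.1, seed=None):
    """Sketch of a directed copying model; returns out-link lists, out[v] = targets of v."""
    if n < d + 1:
        raise ValueError("need n >= d + 1")
    rng = random.Random(seed)
    # assumed seed graph: d + 1 nodes, each linking to the other d
    out = [[u for u in range(d + 1) if u != v] for v in range(d + 1)]
    for new in range(d + 1, n):
        proto = rng.randrange(new)  # random existing vertex whose links are copied
        links = []
        for i in range(d):
            if rng.random() < beta:
                links.append(rng.randrange(new))  # uniformly random existing node
            else:
                links.append(out[proto][i])       # copy the prototype's i-th link
        out.append(links)
    return out
```

Copying a prototype's links makes popular targets more likely to be copied again, which is how the model produces heavy-tailed in-degrees without computing degrees explicitly.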
Graph Models: The Alpha Model
• Watts (1999), α model: add edges between nodes as in random graphs, but make a link more likely when the two nodes have a common friend.
• For a range of α values:
  • the world is small (the average path length is short), and
  • groups tend to form (the clustering coefficient is high).
[Figure: probability of linkage as a function of the number of mutual friends; α = 0 in the upper-left curve, α = 1 on the diagonal, α = ∞ in the bottom-right curve.]
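A sketch of the α model. The propensity function below is an assumption chosen to match the curve shapes in the caption: with m mutual friends and target degree k, a pair links with baseline probability p when m = 0, with certainty when m ≥ k, and with p + (1 − p)(m/k)^α in between. So α = 0 jumps straight to certainty after one mutual friend, α = 1 is a linear ramp, and α → ∞ stays at p until the threshold. Function names and the baseline p are hypothetical.

```python
import random

def propensity(m, k, alpha, p):
    """Assumed linking propensity for a pair with m mutual friends (0 < p <= 1)."""
    if m >= k:
        return 1.0
    if m == 0:
        return p  # strangers link only at the small baseline rate
    return p + (1.0 - p) * (m / k) ** alpha

def alpha_model(n, k, alpha, p=1e-4, seed=None):
    """Add edges one at a time until the average degree reaches k."""
    if not (0 < p <= 1 and 0 < k < n):
        raise ValueError("need 0 < p <= 1 and 0 < k < n")
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    n_edges, target = 0, n * k // 2
    while n_edges < target:
        i = rng.randrange(n)
        candidates = [j for j in range(n) if j != i and j not in adj[i]]
        if not candidates:
            continue  # i is already linked to everyone
        # pick a partner with probability proportional to its propensity
        weights = [propensity(len(adj[i] & adj[j]), k, alpha, p) for j in candidates]
        j = rng.choices(candidates, weights=weights)[0]
        adj[i].add(j)
        adj[j].add(i)
        n_edges += 1
    return adj
```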
Graph Models: The Beta Model
Watts and Strogatz (1998): "link rewiring"
• β = 0: people know their neighbors. Clustered, but not a "small world".
• β = 0.125: people know their neighbors and a few distant people. Clustered and a "small world".
• β = 1: people know others at random. Not clustered, but a "small world".
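A sketch of link rewiring on a ring lattice: each lattice edge is visited once and, with probability β, one endpoint is kept while the other is moved to a uniformly random node, skipping moves that would create a self-loop or a duplicate edge. The function names are illustrative, and the lattice assumes an even k with n > k.

```python
import random

def ring_lattice(n, k):
    """Edges (u, v), u < v, linking each node to its k // 2 nearest neighbours on each side."""
    return {(min(i, (i + j) % n), max(i, (i + j) % n))
            for i in range(n) for j in range(1, k // 2 + 1)}

def watts_strogatz(n, k, beta, seed=None):
    rng = random.Random(seed)
    edges = ring_lattice(n, k)
    for u, v in sorted(edges):  # visit each original lattice edge once
        if rng.random() < beta:
            # keep endpoint u, move the other end to a random node (no loops or duplicates)
            choices = [w for w in range(n)
                       if w != u and (min(u, w), max(u, w)) not in edges]
            if choices:
                w = rng.choice(choices)
                edges.remove((u, v))
                edges.add((min(u, w), max(u, w)))
    return edges
```

β = 0 returns the lattice unchanged, β = 1 rewires every edge, and the edge count n·k/2 is the same for every β.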
Graph Models: The Beta Model
Watts and Strogatz (1998)
• The first five random links reduce the average path length of the network by half, regardless of N!
• Both the α and β models reproduce the short-path results of random graphs, but also allow for clustering.
• Small-world phenomena occur at the threshold between order and chaos.
[Figure: normalized clustering coefficient (C) and average path length (L) plotted against β]
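Tracing the C and L curves requires the two measurements themselves. The sketch below computes the average local clustering coefficient and the average shortest-path length (breadth-first search from every node) on an adjacency dict, and checks them on an unrewired ring lattice. Nodes with fewer than two neighbors are counted as having clustering 0, which is one of several conventions; the function names are illustrative.

```python
from collections import deque

def ring_lattice_adj(n, k):
    """Adjacency dict of a ring lattice: each node linked to k // 2 neighbours per side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(1, k // 2 + 1):
            adj[i].add((i + j) % n)
            adj[(i + j) % n].add(i)
    return adj

def clustering_coefficient(adj):
    """Average over nodes of (links among neighbours) / (possible links among neighbours)."""
    total = 0.0
    for v, nbrs in adj.items():
        d = len(nbrs)
        if d < 2:
            continue  # counted as 0
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += links / (d * (d - 1) / 2)
    return total / len(adj)

def average_path_length(adj):
    """Mean BFS distance over all ordered pairs of distinct, mutually reachable nodes."""
    dist_sum, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        dist_sum += sum(dist.values())
        pairs += len(dist) - 1
    return dist_sum / pairs
```

For a 20-node lattice with k = 4, C = 0.5 and L = 55/19 ≈ 2.89; dividing the values measured on rewired graphs by these gives normalized curves like those in the figure.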
Other Related Work
• Hybrid models: Beta + Waxman on a grid
• Huberman and Adamic, 1999: Growth dynamics of the World Wide Web
  • Argue against the Barabasi model because of its age dependence
• Kumar, Raghavan, Rajagopalan, Sivakumar and Tomkins, 1999: Stochastic models for the web graph
• Watts, Dodds, Newman, 2002: Identity and search in social networks
• Medina, Lakhina, Matta, and Byers, 2001: BRITE: An Approach to Universal Topology Generation
• …
Statistics
• Statistics of common networks: does a large k mean a large c? Does a small c mean a large d?
Modeling Ageing and Temporal Evolution
• N(t): nodes at time t
• E(t): edges at time t
• Suppose that N(t+1) = 2 · N(t)
• Q: what is a good guess for E(t+1)? Is it 2 · E(t)?
• A: no, the number of edges more than doubles!
Temporal Evolution of Graphs
• Densification Power Law:
  • networks become denser over time
  • the number of edges grows faster than the number of nodes, so the average degree increases
  • $E(t) \propto N(t)^{a}$, where a is the densification exponent, or equivalently $\log E(t) = a \log N(t) + \text{const}$
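Because the exponent a is the slope of log E(t) against log N(t), a least-squares line on log-log data estimates it. A minimal sketch (the function name is hypothetical), checked on synthetic values generated from a known exponent rather than on real data:

```python
import math

def densification_exponent(nodes, edges):
    """Least-squares slope of log E(t) against log N(t)."""
    xs = [math.log(x) for x in nodes]
    ys = [math.log(y) for y in edges]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# synthetic check: E = 3 * N^1.5 should give a slope of 1.5
N = [100, 200, 400, 800, 1600]
E = [3 * n ** 1.5 for n in N]
```

Fitting all snapshots, rather than only the first and last, is what makes the estimate robust to noise in any single time point.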
Graph Densification
• Densification Power Law
• Densification exponent: 1 ≤ a ≤ 2
  • a = 1: linear growth – constant out-degree (assumed in the literature so far)
  • a = 2: quadratic growth – clique
• Let's see the real graphs!
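Two consequences of $E(t) \propto N(t)^{a}$ explain this range and answer the doubling question: the average degree scales as $N^{a-1}$, and doubling the node count multiplies the edge count by $2^{a}$ (here c is the proportionality constant):

```latex
\begin{aligned}
E(t) &= c\,N(t)^{a} \\
\bar{k}(t) &= \frac{2E(t)}{N(t)} = 2c\,N(t)^{a-1} \\
\frac{E(t+1)}{E(t)} &= \left(\frac{N(t+1)}{N(t)}\right)^{a} = 2^{a}
  \quad \text{when } N(t+1) = 2N(t)
\end{aligned}
```

So a = 1 keeps the average degree constant and exactly doubles the edges, a = 2 quadruples them, and an intermediate value such as a = 1.5 multiplies them by $2^{1.5} \approx 2.83$.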
Densification – ArXiv citation graph in Physics
• Citations among physics papers
• 1992: 1,293 papers, 2,717 citations
• 2003: 29,555 papers, 352,807 citations
• For each month M, create a graph of all citations up to month M
[Plot: E(t) against N(t); fit $E(t) \propto N(t)^{1.69}$]
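The monthly snapshot construction can be sketched as one sorted pass over timestamped citations. The `(month, citing, cited)` triple format and the function name are assumptions, and papers are counted only once they appear in some citation, which may differ from how the original dataset counts nodes.

```python
from itertools import groupby

def cumulative_snapshots(citations):
    """citations: (month, citing, cited) triples; month is any sortable key.
    Returns [(month, N, E)] for the graph of all citations up to each month."""
    nodes, n_edges, out = set(), 0, []
    for month, group in groupby(sorted(citations), key=lambda c: c[0]):
        for _, citing, cited in group:
            nodes.update((citing, cited))
            n_edges += 1
        out.append((month, len(nodes), n_edges))
    return out

# toy, made-up citations (not real data)
toy = [(1, "a", "b"), (1, "c", "b"), (2, "d", "a"), (2, "d", "c"), (3, "e", "d")]
```

The (N, E) columns of the result are the points plotted on the densification slides, one per month.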
Densification – Patent Citations
• Citations among granted patents
• 1975: 334,000 nodes, 676,000 edges
• 1999: 2.9 million nodes, 16.5 million edges
• Each year is a data point
[Plot: E(t) against N(t); fit $E(t) \propto N(t)^{1.66}$]
Densification – Internet Autonomous Systems
• Graph of the Internet
• 1997: 3,000 nodes, 10,000 edges
• 2000: 6,000 nodes, 26,000 edges
• One graph per day
[Plot: E(t) against N(t); fit $E(t) \propto N(t)^{1.18}$]
Evolution of the Diameter
• Prior work on power-law graphs hints at a slowly growing diameter:
  • diameter ~ O(log N)
  • diameter ~ O(log log N)
• What is happening in real data?
• The diameter shrinks over time
  • As the network grows, the distances between nodes slowly decrease
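A sketch of measuring the diameter of one snapshot: run breadth-first search from every node and take the longest shortest-path distance among reachable pairs, treating edges as undirected. This is an exact O(N·E) computation; the function name is illustrative, and the diameter definition behind the plots may differ.

```python
from collections import deque

def diameter(edges):
    """Longest shortest-path distance over all reachable pairs (edges treated as undirected)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    best = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        best = max(best, max(dist.values()))
    return best
```

Adding a single edge can only shorten paths: a five-node path has diameter 4, and closing it into a cycle drops the diameter to 2, a tiny version of the shrinking-diameter effect.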
Diameter – ArXiv citation graph
• Citations among physics papers
• 1992–2003
• One graph per year
[Plot: diameter against time (years)]
Diameter – "Patents"
• Patent citation network
• 25 years of data
[Plot: diameter against time (years)]
Diameter – Autonomous Systems
• Graph of the Internet
• One graph per day
• 1997–2000
[Plot: diameter against number of nodes]
Next Time: Densification – Possible Explanations
• Generative models that capture the Densification Power Law and shrinking diameters
• Two proposed models:
  • Community Guided Attachment – obeys densification
  • Forest Fire model – obeys densification and shrinking diameter (and gives a power-law degree distribution)