A Random-Surfer Web-Graph Model

Mugizi Rwebangira (joint work with Avrim Blum & Hubert Chan)

The Web as a Graph
Consider the World Wide Web as a graph, with web pages as nodes and hyperlinks between pages as edges.
[Figure: example pages index.html, links.html, resume.html and http://cnn.com drawn as nodes joined by hyperlinks.]

Studying the Web
Since the Web emerged there has been a lot of interest in:
• Empirically studying properties of the Web Graph.
• Modeling the Web Graph mathematically.
Benefits of generative models:
• Simulation: when real data is scarce.
• Extrapolation: how will the graph change?
• Understanding: inspire further research on real data.

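Power Law
$f(x) \sim g(x)$ if $\lim_{x \to \infty} f(x)/g(x) = 1$, e.g. $(x+1) \sim (x+2)$.
The distribution of a random variable X follows a power law if $\Pr[X = k] \sim C k^{-\alpha}$.
Example: $\Pr[X = k] = k^{-2}$ (strictly, $\Pr[X = k] = (6/\pi^2)\,k^{-2}$, since $\sum_{k \ge 1} k^{-2} = \pi^2/6$).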

Power Law: $\Pr[X = k] = k^{-2}$

Power Law
$\Pr[X = k] \sim C k^{-\alpha} \implies \log \Pr[X = k] \sim \log C - \alpha \log k$
For $\Pr[X = k] = k^{-2}$: $\log \Pr[X = k] = -2 \log k$

Power Law: Log-Log plot
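A minimal Python sketch of why the log-log plot is a straight line: taking logarithms of $\Pr[X = k] = C k^{-2}$ and fitting a least-squares line recovers slope $-\alpha = -2$ and intercept $\log C$. The helper name `loglog_fit` and the choice $C = 6/\pi^2$ (the constant that makes the probabilities sum to 1) are illustrative assumptions, not taken from the talk.

```python
import math

# Illustrative sketch; `loglog_fit` is a hypothetical helper name.
def loglog_fit(ks, probs):
    """Least-squares line through the points (log k, log Pr[X = k]).

    Returns (slope, intercept); for a pure power law C * k**-alpha these are
    -alpha and log C.
    """
    xs = [math.log(k) for k in ks]
    ys = [math.log(q) for q in probs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

C = 6 / math.pi ** 2  # assumed normalising constant for the k**-2 example
ks = range(1, 101)
slope, intercept = loglog_fit(ks, [C * k ** -2 for k in ks])
print(slope, math.exp(intercept))  # slope close to -2, exp(intercept) close to C
```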

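Power Law (continued)
More general definition: $\Pr[X \ge k] \sim C k^{-\alpha}$. This is particularly useful if X takes on real values. Note that the exponent shifts between the two forms: if $\Pr[X = k] \sim C k^{-\alpha}$ with $\alpha > 1$, then $\Pr[X \ge k] \sim \frac{C}{\alpha - 1} k^{-(\alpha - 1)}$.
Power laws are sometimes referred to as “heavy-tailed” or “scale-free.”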
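For the example $\Pr[X = k] = C k^{-2}$, summing the point probabilities from k onwards lowers the exponent by one, so the tail behaves like $k^{-1}$. The sketch below (assuming $C = 6/\pi^2$ so the distribution is normalised; `tail` is a hypothetical name) computes $\Pr[X \ge k]$ and shows that $k \cdot \Pr[X \ge k]$ settles near $C/(\alpha - 1) = C$.

```python
import math

# Illustrative sketch; the normalisation and the name `tail` are assumptions.
C = 6 / math.pi ** 2  # normalises Pr[X = j] = C / j**2 over j = 1, 2, 3, ...

def tail(k):
    """Pr[X >= k], computed as 1 - Pr[X < k]."""
    return 1.0 - sum(C / j ** 2 for j in range(1, k))

for k in (10, 100, 1000):
    # k * Pr[X >= k] tends to a constant, so Pr[X >= k] decays like k**-1
    print(k, k * tail(k))
```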

Power Laws in Degree Distribution
Let G be a graph, and let $X_k$ be the proportion of nodes with degree k in G. If $X_k \sim C k^{-\alpha}$, we say that G has a power-law degree distribution.
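A small sketch of computing $X_k$ from an edge list. It counts degree as in-degree plus out-degree (the convention the talk uses for preferential attachment), so a self-loop adds 2 to its node's degree; `degree_distribution` is a hypothetical name.

```python
from collections import Counter

# Illustrative sketch; `degree_distribution` is a hypothetical helper.
def degree_distribution(num_nodes, edges):
    """Return {k: X_k}, where X_k is the proportion of nodes with degree k.

    Every edge (u, v) adds 1 to the degree of both endpoints
    (in-degree + out-degree), so a self-loop (v, v) adds 2.
    """
    degree = [0] * num_nodes
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree)
    return {k: counts[k] / num_nodes for k in sorted(counts)}

# A star with centre 0 and leaves 1..4: four nodes of degree 1, one of degree 4.
print(degree_distribution(5, [(1, 0), (2, 0), (3, 0), (4, 0)]))  # {1: 0.8, 4: 0.2}
```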

Properties of the Web Graph
A power-law degree distribution has been observed in a wide variety of graphs, including citation networks, social networks and protein-protein interaction networks. It has also been observed in the Web Graph. [Barabási & Albert]

Outline
• Background/Previous Work
• Motivation
• Models
• Theoretical results
• Experimental results
• Conclusions

Classic Random Graph Models
In the G(n,p) random graph model:
• There are n nodes.
• Each pair of nodes is joined by an edge independently with probability p.
• The model was introduced by Erdős and Rényi around 1960.
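A direct Python sketch of sampling G(n,p): test each of the $\binom{n}{2}$ node pairs independently. This takes quadratic time, which is fine for illustration; `gnp` is a hypothetical helper name.

```python
import itertools
import random

# Illustrative sketch; `gnp` is a hypothetical helper name.
def gnp(n, p, seed=None):
    """Sample G(n, p): each of the n*(n-1)/2 node pairs becomes an edge
    independently with probability p. Returns the list of edges (u, v), u < v."""
    rng = random.Random(seed)
    return [(u, v) for u, v in itertools.combinations(range(n), 2) if rng.random() < p]

edges = gnp(1000, 0.01, seed=1)
print(len(edges))  # should be near the expected p * n * (n - 1) / 2 = 4995
```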

Online G(n,p)
In this model each new node makes k connections to existing nodes chosen uniformly at random. For this talk we will focus on k = 1, so the graph will be a tree.

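[Figure: Online G(n,p) from T=1 to T=4. The node arriving at T=3 links to each of the 2 existing nodes with probability ½; the node arriving at T=4 links to each of the 3 existing nodes with probability ⅓.]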

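Properties of Online G(n,p)
• $E[\text{degree of first node}] = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots + \frac{1}{n} = \Theta(\log n)$
• $E[\text{max degree}] = \Theta(\log n)$
• Let $X_k$ be the proportion of nodes with degree k. Then $E[X_k] = \Theta(2^{-k})$.
NOT A POWER LAW!!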
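A minimal simulation sketch of Online G(n,p) with k = 1, assuming the first node has no self-loop and that degree counts in-edges plus out-edges. With n = 100,000 the proportion of degree-1 nodes should come out close to 1/2 and of degree-2 nodes close to 1/4, the geometric decay $\Theta(2^{-k})$ rather than a power law, and the largest degree stays small. The function names are hypothetical.

```python
import random
from collections import Counter

# Illustrative sketch; function names are hypothetical.
def online_gnp_tree(n, seed=None):
    """Online G(n,p) with k = 1: node t links to a uniformly random earlier node.

    Returns degree[v] = in-degree + out-degree of node v (node 0 has no out-edge).
    """
    rng = random.Random(seed)
    degree = [0] * n
    for t in range(1, n):
        target = rng.randrange(t)  # uniform over existing nodes 0..t-1
        degree[t] += 1             # out-edge of the new node
        degree[target] += 1        # in-edge of the chosen node
    return degree

def proportions(degree):
    """Map k -> X_k, the proportion of nodes with degree k."""
    counts = Counter(degree)
    return {k: counts[k] / len(degree) for k in sorted(counts)}

deg = online_gnp_tree(100_000, seed=1)
x = proportions(deg)
print(x[1], x[2], x[3], max(deg))  # roughly 1/2, 1/4, 1/8, and a small maximum degree
```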

Online G(n,p) (n = 100,000, average of 100 runs)

Preferential Attachment
In the Preferential Attachment model, each new node connects to an existing node with probability proportional to that node's degree. [Barabási & Albert]

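[Figure: Preferential Attachment from T=1 to T=4, with Degree = in-degree + out-degree. After T=2 the two nodes have degrees 3 and 1, so the node arriving at T=3 links to them with probabilities ¾ and ¼; at T=4 the labelled degrees are 4, 1 and 1.]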

Preferential Attachment
$E[\text{degree of 1st node}] = \Theta(\sqrt{n})$
Preferential Attachment gives a power-law degree distribution. [Mitzenmacher, Cooper & Frieze 03, KRRSTU00]
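A sketch of preferential attachment with k = 1, assuming the same start as the figure (a single node with a self-loop). Keeping a list in which every node appears once per unit of degree makes a uniform pick from that list a degree-proportional pick. For this variant the proportion of degree-1 nodes tends to 2/3, and degrees far larger than in Online G(n,p) appear; the helper name is hypothetical.

```python
import random
from collections import Counter

# Illustrative sketch; `preferential_attachment` is a hypothetical helper.
def preferential_attachment(n, seed=None):
    """Grow a tree of n nodes by preferential attachment (k = 1).

    Node 0 starts with a self-loop; each later node links to one existing node
    chosen with probability proportional to its current degree (in + out).
    Returns a Counter mapping node -> degree.
    """
    rng = random.Random(seed)
    # Each edge contributes both endpoints, so picking uniformly from this
    # list picks a node with probability proportional to its degree.
    endpoints = [0, 0]  # the initial self-loop gives node 0 degree 2
    for t in range(1, n):
        target = rng.choice(endpoints)
        endpoints.extend((t, target))
    return Counter(endpoints)

n = 100_000
deg = preferential_attachment(n, seed=1)
frac1 = sum(1 for d in deg.values() if d == 1) / n
print(frac1, deg[0], max(deg.values()))  # frac1 near 2/3; first node's degree grows like sqrt(n)
```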

Preferential Attachment

Other Models
Kumar et al. proposed the “copying model.” [KRRSTU00]
Leskovec et al. proposed a “forest fire” model, which has some similarities to this work. [LKF05]

Motivating Questions
• Why would a new node connect to nodes of high degree?
• Are high-degree nodes more attractive?
• Or are there other explanations? How does a new node find out which nodes have high degree?

Motivating Questions (continued)
Motivating observation:
• Suppose each page has a small probability p of being interesting.
• Suppose a user does an (undirected) random walk until they find an interesting page.
• If p is small, this is approximately the same as preferential attachment: a long random walk on an undirected graph ends at a node with probability roughly proportional to its degree.
• What about other processes, and directed graphs?
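A minimal Monte Carlo sketch of this observation, assuming a small hypothetical graph (a triangle 0-1-2 with an extra node 3 attached to node 2) and p = 0.02. Walks start at a uniformly random node and stop on heads; for small p the landing frequencies come out close to degree/(sum of degrees), which is the preferential-attachment rule. The graph is connected and not bipartite, so the walk's long-run distribution is proportional to degree.

```python
import random
from collections import Counter

# Illustrative sketch; `surf` and the example graph are hypothetical.
def surf(adj, p, rng):
    """Random surfer on an undirected graph given as adjacency lists.

    Start at a uniformly random node; flip a coin of bias p at each step:
    heads means stop here, tails means move to a uniformly random neighbour.
    """
    node = rng.randrange(len(adj))
    while rng.random() >= p:  # tails, probability 1 - p
        node = rng.choice(adj[node])
    return node

adj = [[1, 2], [0, 2], [0, 1, 3], [2]]  # hypothetical example graph
total_degree = sum(len(nbrs) for nbrs in adj)
rng = random.Random(7)
trials = 10_000
hits = Counter(surf(adj, 0.02, rng) for _ in range(trials))
landing = [hits[v] / trials for v in range(len(adj))]
target = [len(nbrs) / total_degree for nbrs in adj]  # [0.25, 0.25, 0.375, 0.125]
print(landing)
print(target)
```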

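Directed 1-step Random Surfer, p = .5
Start with a single node with a self-loop. Each new node:
• Chooses a node uniformly at random.
• With probability p, connects to that node.
• With probability (1-p), connects to that node's neighbor (the node it links to).
[Figure: T=1 to T=3 with p = ½. At T=3 the new node links to the first node with probability (½)(½) + (½)(½) + (½)(½) = ¾ and to the second node with probability ¼.]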

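Directed 1-step Random Surfer
It turns out this model is a mixture of connecting to nodes uniformly at random and preferential attachment: when t nodes exist, the new node links to node u with probability $p/t + (1-p)\,\mathrm{indeg}(u)/t$, where indeg(u) counts the links into u. It has a power-law degree distribution.
But taking only one step is not very natural. What about doing a real random walk?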
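The mixture can be checked exactly with a short sketch. Because every node has out-degree 1, the graph is stored as a list `out` (a hypothetical representation) in which `out[v]` is the node v links to, and the first node links to itself. `one_step_attach_probs` enumerates the uniformly chosen start node and the coin outcome; the result agrees with $p/t + (1-p)\,\mathrm{indeg}(u)/t$, a uniform part plus an in-degree-proportional part.

```python
import math

# Illustrative sketch; the `out` representation and names are hypothetical.
def one_step_attach_probs(out, p):
    """Exact attachment probabilities for the directed 1-step random surfer.

    out[v] is the single node that v links to (out[0] == 0 is the self-loop).
    """
    t = len(out)
    probs = [0.0] * t
    for u in range(t):                # start node, chosen with probability 1/t
        probs[u] += p / t             # probability p: connect to u itself
        probs[out[u]] += (1 - p) / t  # probability 1 - p: connect to u's neighbour
    return probs

def mixture(out, p):
    """Uniform part p/t plus preferential part (1 - p) * indeg(u) / t."""
    t = len(out)
    indeg = [0] * t
    for v in out:
        indeg[v] += 1
    return [p / t + (1 - p) * indeg[u] / t for u in range(t)]

print(one_step_attach_probs([0, 0], 0.5))  # [0.75, 0.25], the T=3 step in the figure
out = [0, 0, 0, 1, 1, 2]  # a hypothetical 6-node graph
print(all(math.isclose(a, b) for a, b in zip(one_step_attach_probs(out, 0.3), mixture(out, 0.3))))
```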

Directed Coin Flipping Model
1. Pick a node uniformly at random.
2. Flip a coin of bias p. If HEADS, connect to the current node; else walk to its neighbor and flip again.
[Figure: example run starting at RANDOM STARTING NODE A. Coin toss 1 is TAIL (at node A), so the walk moves to B; coin toss 2 is TAIL (at node B), so it moves to C; coin toss 3 is HEAD (at node C), so the NEW NODE links to C.]

Directed Coin Flipping Model
1. At time 1, we start with a single node with a self-loop.
2. At time t, we choose a node u uniformly at random.
3. We then flip a coin of bias p.
4. If the coin comes up heads, we connect to the current node.
5. Else we walk to a random neighbor (following an out-link) and go to step 3.
“Each page has equal probability p of being interesting to us.”
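A simulation sketch of the directed coin-flipping model with k = 1, assuming a hypothetical list representation in which `out[v]` is the node v links to and the first node's self-loop is `out[0] == 0`. Since each node has exactly one out-link, "walk to a random neighbor" means following that link, so a walk only ever moves toward the first node. p must be positive: with p = 0 a walk that reaches the self-loop node would never stop.

```python
import random

# Illustrative sketch; the representation and names are hypothetical.
def coin_flipping_graph(n, p, seed=None):
    """Grow a directed coin-flipping graph with n nodes, each of out-degree 1.

    Returns out, where out[v] is the node that v links to; out[0] == 0 is the
    self-loop of the starting node.
    """
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    rng = random.Random(seed)
    out = [0]                     # time 1: a single node with a self-loop
    for t in range(1, n):
        u = rng.randrange(t)      # choose an existing node uniformly at random
        while rng.random() >= p:  # tails (probability 1 - p): keep walking
            u = out[u]            # the unique out-neighbour of the current node
        out.append(u)             # heads: the new node t links to u
    return out

def in_degrees(out):
    """in_degrees(out)[v] = number of links into v (the self-loop counts once)."""
    indeg = [0] * len(out)
    for target in out:
        indeg[target] += 1
    return indeg

out = coin_flipping_graph(100_000, 0.5, seed=1)
indeg = in_degrees(out)
print(sorted(indeg, reverse=True)[:10])  # the ten largest in-degrees
```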

Is Directed Coin Flipping Power-Law Distributed?
We don’t know… but we do have some partial results.

Virtual Degree
Definitions:
• Let $l_i(u)$ be the number of level-i descendants of node u.
• $l_1(u)$ = number of children, $l_2(u)$ = number of grandchildren, etc.
Let $\beta = (\beta_0, \beta_1, \beta_2, \ldots)$ be a sequence of real numbers with $\beta_0 = 1$. Then
$v(u) = 1 + \beta_1 l_1(u) + \beta_2 l_2(u) + \beta_3 l_3(u) + \cdots$
We’ll call v(u) the “virtual degree of u with respect to β.”

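Virtual Degree: Example
[Figure: a node u with 2 children and 4 grandchildren, and no deeper descendants.]
$v(u) = 1 + \beta_1(2) + \beta_2(4) + \beta_3(0) + \beta_4(0) + \cdots = 1 + 2\beta_1 + 4\beta_2$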
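A short sketch of computing $l_i(u)$ and $v(u)$ by breadth-first search over a children map. The coefficients are passed as a function `beta(i)` returning $\beta_i$; the example tree (one tree with the figure's shape) and all names are hypothetical. A tree built from a graph with a self-loop must leave the self-loop out of the children map, otherwise the levels never end.

```python
from collections import defaultdict

# Illustrative sketch; all names and the example tree are hypothetical.
def children_from_out(out):
    """Children map for a graph where out[v] is the node v links to.

    The self-loop (v == out[v]) is skipped so that descendant levels stay finite.
    """
    children = defaultdict(list)
    for v, parent in enumerate(out):
        if v != parent:
            children[parent].append(v)
    return children

def level_counts(children, u):
    """[l_1(u), l_2(u), ...]: number of descendants of u at each level."""
    counts = []
    frontier = children.get(u, [])
    while frontier:
        counts.append(len(frontier))
        frontier = [c for w in frontier for c in children.get(w, [])]
    return counts

def virtual_degree(children, u, beta):
    """v(u) = 1 + beta(1) * l_1(u) + beta(2) * l_2(u) + ..."""
    return 1 + sum(beta(i) * l for i, l in enumerate(level_counts(children, u), start=1))

# A tree with the figure's shape: u has 2 children and 4 grandchildren.
children = {"u": ["a", "b"], "a": ["c", "d"], "b": ["e", "f"]}
print(level_counts(children, "u"))                        # [2, 4]
print(virtual_degree(children, "u", lambda i: 0.5 ** i))  # 1 + 2(0.5) + 4(0.25) = 3.0
```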

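Virtual Degree
Easy observation: if we set $\beta_i = (1-p)^i$, then the expected increase in deg(u) is proportional to v(u). When t nodes exist, the walk starts at each node with probability 1/t, and a walk starting at a level-i descendant of u attaches to u exactly when it gets i tails followed by a head, which has probability $(1-p)^i p$. Hence, for u other than the initial self-loop node (which also absorbs every walk that reaches it):
$E[\text{increase in } \deg(u)] = \frac{p}{t} + \frac{(1-p)\,p\,l_1(u)}{t} + \frac{(1-p)^2\,p\,l_2(u)}{t} + \cdots = \frac{p}{t}\,v(u)$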
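A sketch that checks this observation on a small hypothetical tree (node 0 has the self-loop; `out[v]` is the node v links to). `attach_prob` evaluates $(p/t)\,v(u)$ with $\beta_i = (1-p)^i$, and `simulate` runs the coin-flipping walk many times from uniformly random start nodes. Only nodes other than node 0 are compared, because node 0 also collects every walk that reaches it. The names and the example are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict

# Illustrative sketch; names and the example tree are hypothetical.
def attach_prob(out, u, p):
    """(p/t) * v(u) with beta_i = (1 - p)**i; valid for u other than the self-loop node."""
    children = defaultdict(list)
    for v, parent in enumerate(out):
        if v != parent:  # leave out the self-loop
            children[parent].append(v)
    v_u, i, frontier = 1.0, 1, children[u]
    while frontier:
        v_u += (1 - p) ** i * len(frontier)  # beta_i * l_i(u)
        frontier = [c for w in frontier for c in children[w]]
        i += 1
    return p / len(out) * v_u

def simulate(out, p, trials, seed=None):
    """Empirical attachment frequencies of the directed coin-flipping walk."""
    rng = random.Random(seed)
    hits = Counter()
    for _ in range(trials):
        node = rng.randrange(len(out))
        while rng.random() >= p:  # tails: follow the out-link
            node = out[node]
        hits[node] += 1
    return [hits[v] / trials for v in range(len(out))]

out = [0, 0, 0, 1, 1, 2]  # hypothetical tree: 1 and 2 link to 0, 3 and 4 to 1, 5 to 2
p = 0.5
freq = simulate(out, p, 100_000, seed=3)
for u in range(1, len(out)):
    print(u, round(attach_prob(out, u, p), 4), round(freq[u], 4))
```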

Virtual Degree
Theorem: There always exist $\beta_i$ such that:
• For $i \ge 1$, $|\beta_i| \le 1$.
• As $i \to \infty$, $\beta_i \to 0$ exponentially.
• The expected increase in v(u) is proportional to v(u).
Recurrence: $\beta_1 = 1$, $\beta_2 = p$, $\beta_{i+1} = \beta_i - (1-p)\,\beta_{i-1}$.
E.g., for p = ¾: $\beta_i$ = 1, 3/4, 1/2, 5/16, 3/16, 7/64, …
For p = ½: $\beta_i$ = 1, 1/2, 0, -1/4, -1/4, -1/8, 0, 1/16, …
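A sketch that generates the $\beta_i$ from the recurrence with exact fractions (reproducing the two example sequences) and then checks the proportionality on a small hypothetical tree. It assumes, as in the easy observation, that a node w other than the self-loop node receives the new node with probability $(p/t)\sum_{i \ge 0}(1-p)^i l_i(w)$, and uses the fact that the new node becomes a level-j descendant of u exactly when it attaches to a level-(j-1) descendant of u. On this example the ratio of the expected increase in v(u) to v(u) comes out as exactly p/t. All function names are hypothetical.

```python
from fractions import Fraction

# Illustrative sketch; function names and the example tree are hypothetical.
def beta_sequence(p, count):
    """[beta_1, ..., beta_count]: beta_1 = 1, beta_2 = p, beta_{i+1} = beta_i - (1-p) beta_{i-1}."""
    betas = [Fraction(1), Fraction(p)]
    while len(betas) < count:
        betas.append(betas[-1] - (1 - p) * betas[-2])
    return betas[:count]

def levels(children, u):
    """[[u], children of u, grandchildren of u, ...]"""
    result = [[u]]
    while True:
        nxt = [c for w in result[-1] for c in children.get(w, [])]
        if not nxt:
            return result
        result.append(nxt)

def attach_prob(children, w, p, t):
    """(p/t) * sum_i (1-p)^i l_i(w): chance that the next node links to w."""
    return Fraction(p) / t * sum((1 - p) ** i * len(lv) for i, lv in enumerate(levels(children, w)))

def virtual_degree(children, u, betas):
    lv = levels(children, u)
    return 1 + sum(betas[i - 1] * len(lv[i]) for i in range(1, len(lv)))

def expected_increase(children, u, p, t, betas):
    """E[increase in v(u)]: beta_j times the chance of joining level j of u, summed over j."""
    lv = levels(children, u)
    return sum(betas[j - 1] * sum(attach_prob(children, w, p, t) for w in lv[j - 1])
               for j in range(1, len(lv) + 1))

print(beta_sequence(Fraction(3, 4), 6))  # 1, 3/4, 1/2, 5/16, 3/16, 7/64 as Fractions
print(beta_sequence(Fraction(1, 2), 8))  # 1, 1/2, 0, -1/4, -1/4, -1/8, 0, 1/16

children = {"u": ["a", "b"], "a": ["c", "d"], "b": ["e", "f"]}  # hypothetical tree
for p in (Fraction(1, 2), Fraction(3, 4)):
    t = 100
    betas = beta_sequence(p, 10)
    ratio = expected_increase(children, "u", p, t, betas) / virtual_degree(children, "u", betas)
    print(p, ratio == p / t)  # True: the expected increase is (p/t) * v(u)
```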

Virtual Degree (continued)
Let $v_t(u)$ be the virtual degree of node u at time t, and let $t_u$ be the time when node u first appears.
Theorem: For any node u and time $t \ge t_u$, $E[v_t(u)] = \Theta((t/t_u)^p)$.
So the expected virtual degrees follow a power law.
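A numerical sketch of where the exponent p comes from, under the assumption that each arrival multiplies the expected virtual degree by $(1 + p/s)$ when s nodes are present (that is, that the constant of proportionality in the previous theorem is p/s), starting from $v_{t_u}(u) = 1$ at birth. Multiplying these factors from $t_u$ to t gives a quantity whose ratio to $(t/t_u)^p$ levels off at a constant, consistent with $\Theta((t/t_u)^p)$. The function name is hypothetical.

```python
# Illustrative sketch; `growth_factor` and the (1 + p/s) step are assumptions.
def growth_factor(t_u, t, p):
    """prod_{s = t_u}^{t - 1} (1 + p / s): the assumed E[v_t(u)] for a node born at time t_u."""
    factor = 1.0
    for s in range(t_u, t):
        factor *= 1 + p / s
    return factor

t_u, p = 10, 0.5
for t in (10**3, 10**4, 10**5, 10**6):
    # the ratio settles at a constant as t grows
    print(t, growth_factor(t_u, t, p) / (t / t_u) ** p)
```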

Actual Degree
We can also obtain lower bounds on the expected values of the actual degrees:
Theorem: For any node u and time $t \ge t_u$, $E[\deg_t(u)] = \Omega((t/t_u)^{p(1-p)})$.

Experiments
• Random graphs of n = 100,000 nodes.
• Statistics averaged over 100 runs.
• k = 1 (every node has out-degree 1).

Online Erdős-Rényi

Directed 1-Step Random Surfer, p = 3/4

Directed Coin Flipping, p = 1/2

Directed Coin Flipping, p = 1/4

Undirected Coin Flipping, p = 1/2

Undirected Coin Flipping, p = 0.05

Conclusions
Directed random-walk models appear to generate power laws (and we have partial theoretical results).
Power laws can emerge naturally, even if all nodes have the same intrinsic “attractiveness”.

Open Questions
• Can we prove that the degrees in the directed coin-flipping model do indeed follow a power law?
• Can we analyze the degree distribution of the undirected coin-flipping model with p = 1/2?
• Suppose page i has “interestingness” $p_i$. Can we analyze its degree as a function of t, i and $p_i$?
