Traffic-driven model of the World-Wide-Web Graph • A. Barrat, LPT, Orsay, France • M. Barthélemy, CEA, France • A. Vespignani, LPT, Orsay, France
Outline • The WebGraph • Some empirical characteristics • Various models • Weights and strengths • Our model: • Definition • Analysis: analytics+numerics • Conclusions
The Web as a directed graph • Nodes i: web pages • Directed links: hyperlinks • In- and out-degrees: kin (number of hyperlinks pointing to a page) and kout (number of hyperlinks leaving it)
Empirical facts • Small world: captured by Erdős-Rényi graphs • An edge is established between each pair of vertices with probability p, so <k> = p N • Poisson degree distribution
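The relation <k> = p N can be checked numerically. The following is a minimal sketch, not taken from the slides: the function names, the graph size and the seed are illustrative assumptions, and the exact mean degree is p (N - 1), which is indistinguishable from p N for large N.

```python
import random

def erdos_renyi(n, p, seed=0):
    """Edge list of an undirected G(n, p) graph: each pair is linked with probability p."""
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]

def mean_degree(n, edges):
    # Each undirected edge adds one to the degree of two vertices.
    return 2 * len(edges) / n

n, p = 1000, 0.01            # illustrative values, not from the slides
edges = erdos_renyi(n, p)
print(mean_degree(n, edges), "vs p*N =", p * n)
```

The measured mean degree fluctuates around p N from one seed to the next, with fluctuations that shrink as N grows.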
Empirical facts • Small world • Large clustering: different neighbours of a node have a higher probability of being connected to each other => graph models with large clustering, e.g. Watts-Strogatz 1998
Empirical facts • Small world • Large clustering • Dynamical network • Broad connectivity distributions, also observed in many other contexts (from biological to social networks) • Intense modelling activity (Barabási-Albert 1999; Broder et al. 2000; Kumar et al. 2000; Adamic-Huberman 2001; Laura et al. 2003)
Various growing network models • Barabási-Albert (1999): preferential attachment • Many variations on the BA model: rewiring (Tadic 2001, Krapivsky et al. 2001), addition of edges, directed model (Dorogovtsev-Mendes 2000, Cooper-Frieze 2001), fitness (Bianconi-Barabási 2001), ... • Kumar et al. (2000): copying mechanism • Pandurangan et al. (2002): PageRank + preferential attachment • Laura et al. (2002): multi-layer model • Menczer (2002): textual content of web pages
The Web as a directed graph • Nodes i: web pages • Directed links: hyperlinks • Broad P(kin); cut-off for P(kout) (Broder et al. 2000; Kumar et al. 2000; Adamic-Huberman 2001; Laura et al. 2003)
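Additional level of complexity: weights and strengths • Links carry weights/traffic: wij is the traffic on the link from i to j • In- and out-strengths: sin (total traffic on the incoming links of a page) and sout (total traffic on its outgoing links) • Adamic-Huberman 2001: broad distribution of sin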
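As an illustration of the in- and out-strengths, the sketch below sums link weights per page. The dictionary layout, the function name and the toy weights are assumptions made for the example, not data from the talk.

```python
from collections import defaultdict

def strengths(weights):
    """weights maps a directed link (i, j) to its traffic w_ij (assumed layout)."""
    s_in, s_out = defaultdict(float), defaultdict(float)
    for (i, j), w in weights.items():
        s_out[i] += w   # traffic leaving page i
        s_in[j] += w    # traffic arriving at page j
    return dict(s_in), dict(s_out)

# Toy example: three pages, three weighted hyperlinks.
weights = {("a", "b"): 2.0, ("a", "c"): 1.0, ("c", "b"): 0.5}
s_in, s_out = strengths(weights)
print(s_in, s_out)
```

With unit weights the strengths reduce to the in- and out-degrees, which is why they are the natural weighted generalisation of kin and kout.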
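Model: directed network • (i) Growth: new nodes n are added one at a time • (ii) Strength-driven preferential attachment: each new node n creates kout = m outlinks to existing nodes i, chosen preferentially according to their strength • “Busy gets busier” AND...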
Weights reinforcement mechanism • The new traffic n-i increases the traffic i-j on the outlinks of i • “Busy gets busier”
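The growth and reinforcement rules can be put together in a small simulation. This is only a sketch under stated assumptions, since the slide formulas did not survive extraction: attachment is taken proportional to the in-strength plus a constant offset `a` (hypothetical, added so that pages with no in-links can still be chosen), each new link carries unit weight, and a total extra traffic `delta` is shared among the outlinks of the chosen page in proportion to their current weights. The seed graph (a directed ring of m + 1 pages) is also an assumption.

```python
import random

def grow_traffic_network(n_nodes, m=2, delta=0.5, a=1.0, seed=0):
    rng = random.Random(seed)
    n0 = m + 1                                        # assumed seed: directed ring
    w = {(i, (i + 1) % n0): 1.0 for i in range(n0)}   # w[(i, j)]: traffic on link i -> j
    out_links = {i: [(i + 1) % n0] for i in range(n0)}
    s_in = {i: 1.0 for i in range(n0)}
    s_out = {i: 1.0 for i in range(n0)}
    k_in = {i: 1 for i in range(n0)}
    for n in range(n0, n_nodes):
        nodes = list(s_in)
        attach = [s_in[i] + a for i in nodes]         # assumed attachment weights
        targets = []
        while len(targets) < m:                       # m distinct targets
            i = rng.choices(nodes, weights=attach)[0]
            if i not in targets:
                targets.append(i)
        for i in targets:
            total = s_out[i]
            for j in out_links[i]:                    # reinforcement of i's outlinks
                dw = delta * w[(i, j)] / total
                w[(i, j)] += dw
                s_in[j] += dw
            s_out[i] += delta
            w[(n, i)] = 1.0                           # new link n -> i, unit weight
            s_in[i] += 1.0
            k_in[i] += 1
        out_links[n] = targets
        s_in[n], s_out[n], k_in[n] = 0.0, float(m), 0
    return w, s_in, s_out, k_in

w, s_in, s_out, k_in = grow_traffic_network(500)
print(sum(s_in.values()) / sum(k_in.values()))        # average weight per link
```

Under these assumptions every page keeps exactly m outlinks, yet its out-strength grows by delta each time it receives a new in-link, so popular pages end up with large outgoing traffic as well.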
Evolution equations (continuous approximation) • Coupling term
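The equations themselves are not recoverable from the slide text. One plausible form, written here as a hedged reconstruction rather than as the authors' exact expressions, follows if one assumes that a new node attaches to i with probability proportional to the in-strength of i, that each new link carries unit weight, and that a total extra traffic \delta is shared among the outlinks of the chosen page in proportion to their weights. V^{in}(i) denotes the set of pages pointing to i (notation assumed).

```latex
\frac{dk^{in}_i}{dt} = m\,\frac{s^{in}_i}{\sum_l s^{in}_l},
\qquad
\frac{ds^{in}_i}{dt} = m\,\frac{s^{in}_i}{\sum_l s^{in}_l}
  \;+\; \sum_{j \in V^{in}(i)} m\,\frac{s^{in}_j}{\sum_l s^{in}_l}\;
        \delta\,\frac{w_{ji}}{s^{out}_j}
```

The first term of the strength equation is the direct gain when the new node links to i. The second term is the coupling: i also gains traffic whenever the new node links to a page j that points to i, which is what ties the evolution of i to that of its in-neighbours.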
Resolution Ansatz supported by numerics:
Approximation • Total in-weight $\sum_i s^{in}_i$: approximately proportional to the total number of in-links $\sum_i k^{in}_i$, times the average weight $\langle w \rangle = 1+\delta$ • Then: $A = 1+\delta$, $\gamma_{s^{in}} \in [2;\, 2+1/m]$
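The value of the average weight can be recovered by simple bookkeeping. The sketch below assumes that each new link enters with unit weight and triggers a total reinforcement \delta on existing links, and it neglects the initial seed graph; t counts the nodes added so far.

```latex
\sum_i k^{in}_i(t) \simeq m\,t,
\qquad
\sum_i s^{in}_i(t) \simeq m\,(1+\delta)\,t
\quad\Longrightarrow\quad
\langle w \rangle = \frac{\sum_i s^{in}_i}{\sum_i k^{in}_i} \simeq 1+\delta
```

Each of the m links created per time step adds 1 to the total weight directly and \delta through reinforcement, which gives the factor m(1+\delta).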
Numerical simulations • Measure of A • Prediction of γ • Approximation of γ
Numerical simulations NB: broad P(sout) even if kout=m
Clustering spectrum • Clustering of node i: fraction of connected pairs of neighbours of i • Spectrum: average clustering of nodes as a function of their degree k
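A minimal sketch of how such a spectrum can be measured. It assumes that link directions are ignored, which the slide does not specify, and that the graph is stored as a dictionary from each node to the set of its neighbours; the function names and the toy graph are illustrative.

```python
from itertools import combinations

def local_clustering(adj, i):
    """Fraction of pairs of neighbours of i that are themselves linked."""
    nbrs = adj[i]
    k = len(nbrs)
    if k < 2:
        return 0.0                      # convention: no pairs, no clustering
    linked = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2 * linked / (k * (k - 1))

def clustering_spectrum(adj):
    """Average local clustering of the nodes of each degree k."""
    by_degree = {}
    for i in adj:
        by_degree.setdefault(len(adj[i]), []).append(local_clustering(adj, i))
    return {k: sum(cs) / len(cs) for k, cs in by_degree.items()}

# Toy graph: a triangle 0-1-2 plus a pendant node 3 attached to 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(clustering_spectrum(adj))
```

In the toy graph the degree-2 nodes have clustering 1 while the degree-3 hub has clustering 1/3, a small-scale instance of clustering decreasing with degree.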
Clustering spectrum • δ increases => clustering increases • New pages: point to various well-known pages, often connected together => large clustering for small nodes • Old, popular pages with large k: many in-links from many less popular pages which are not connected together => smaller clustering for large nodes
Clustering and weighted clustering • Weighted clustering takes into account the relevance of triangles in the global traffic
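The slide formula is not preserved. As an illustration, the sketch below uses the weighted clustering coefficient of Barrat et al. (2004) for undirected weighted graphs, in which each triangle through node i counts with the weights of the two links joining i to it. Treating the Web graph as undirected, and the symmetric dictionary layout, are assumptions of this example.

```python
from itertools import combinations

def weighted_clustering(w, i):
    """w[u][v] is the (symmetric) weight of link u-v; returns c^w of node i."""
    nbrs = w[i]
    k = len(nbrs)
    if k < 2:
        return 0.0
    s = sum(nbrs.values())              # strength of node i
    # Each connected pair (j, h) of neighbours contributes w_ij + w_ih.
    total = sum(nbrs[j] + nbrs[h] for j, h in combinations(nbrs, 2) if h in w[j])
    return total / (s * (k - 1))

# Toy graph: triangle 0-1-2 with heavy links from 0, plus a light pendant link 0-3.
w = {
    0: {1: 3.0, 2: 3.0, 3: 1.0},
    1: {0: 3.0, 2: 1.0},
    2: {0: 3.0, 1: 1.0},
    3: {0: 1.0},
}
print(weighted_clustering(w, 0))        # topological clustering of node 0 is 1/3
```

When all weights are equal the expression reduces to the topological clustering coefficient. Here node 0 gets 3/7 instead of 1/3 because the links that belong to its triangle carry most of its traffic.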
Clustering and weighted clustering • Weighted clustering is larger than topological clustering: triangles carry a large part of the traffic
Assortativity • knn: average connectivity of the nearest neighbours of node i
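A minimal sketch of the measurement, assuming link directions are ignored and the graph is stored as a dictionary from each node to the set of its neighbours (both assumptions of this example, as are the function names).

```python
def nearest_neighbour_degree(adj):
    """k_nn of each node: mean degree of its neighbours (isolated nodes skipped)."""
    return {
        i: sum(len(adj[j]) for j in adj[i]) / len(adj[i])
        for i in adj if adj[i]
    }

def knn_spectrum(adj):
    """Average k_nn over all nodes of the same degree k."""
    by_degree = {}
    for i, knn in nearest_neighbour_degree(adj).items():
        by_degree.setdefault(len(adj[i]), []).append(knn)
    return {k: sum(v) / len(v) for k, v in by_degree.items()}

# Toy graph: a star, the simplest disassortative structure.
adj = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(knn_spectrum(adj))
```

A spectrum that decreases with k, as in the star where the hub sees only degree-1 neighbours, signals disassortative mixing; an increasing one signals assortative mixing.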
Assortativity • knn: disassortative behaviour, as usual in growing network models, and typical of technological networks • Lack of correlations in popularity as measured by the in-degree
Summary • Web: heterogeneous topology and traffic • Mechanism taking into account the interplay between topology and traffic • Simple mechanism => complex behaviour, scale-free distributions for connectivity and traffic • Analytical study possible • Study of correlations: non-trivial hierarchical behaviour • Possibility to add features (fitnesses, rewiring, addition of edges, etc.) and to modify the redistribution rule • Empirical studies of traffic and correlations?