1 / 28

I. Alvarez-Hamelin (LPT, France) L. Dall’Asta (LPT, France)

Traceroute-like exploration of unknown networks: a statistical analysis A. Barrat, LPT, Université Paris-Sud, France. I. Alvarez-Hamelin (LPT, France) L. Dall’Asta (LPT, France) A. V á zquez (Notre-Dame University, USA) A. Vespignani (LPT, France). http://www.th.u-psud.fr/page_perso/Barrat.

debbie
Download Presentation

I. Alvarez-Hamelin (LPT, France) L. Dall’Asta (LPT, France)

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Traceroute-like exploration of unknown networks: a statistical analysisA. Barrat, LPT, Université Paris-Sud, France I. Alvarez-Hamelin (LPT, France) L. Dall’Asta (LPT, France) A. Vázquez (Notre-Dame University, USA) A. Vespignani (LPT, France) http://www.th.u-psud.fr/page_perso/Barrat cond-mat/0406404

  2. Plan of the talk • Apparent complexity of Internet’s structure • Problem of sampling biases • Model for traceroute • Theoretical approach of the traceroute mapping process • Numerical results • Conclusions

  3. Internet representation • Multi-probe reconstruction (router-level) • Use of BGP tables for the Autonomous System level (domains) Many projects (CAIDA, NLANR, RIPE, IPM, PingER...) Graph representation different granularities

  4. =>Large-scale visualizations TOPOLOGICAL ANALYSIS by STATISTICAL TOOLS and GRAPH THEORY !

  5. Main topological characteristics • Small world Distribution of Shortest paths (# hops) between two nodes

  6. Poisson distribution Main topological characteristics • Small world : captured by Erdös-Renyi graphs With probability p an edge is established among couple of vertices <k> = p N

  7. n 3 Higher probability to be connected 2 1 Main topological characteristics • Small world • Large clustering: different neighbours of a node • will likely know each other =>graph models with large clustering, e.g. Watts-Strogatz 1998

  8. Main topological characteristics • Small world • Large clustering • Dynamical network • Broad connectivity distributions • also observed in many other contexts • (from biological to social networks) • huge activity of modeling

  9. Result of a sampling: is this reliable ? Main topological characteristics Broad connectivity distributions: obtained from mapping projects Govindan et al 2002

  10. Sampling biases Internet mapping: => spanning tree Sampling is incomplete Lateral connectivity is missed (edges are underestimated) Finite size sample

  11. Bad sampling ? Sampling biases • Vertices and edges best sampled in the proximity of sources • Bad estimation of some topological properties Statistical properties of the sampled graph may sharply differ from the original one Lakhina et al. 2002 Clauset & Moore 2004 De Los Rios & Petermann 2004 Guillaume & Latapy 2004

  12. Evaluating sampling biases Real graph G=(V,E) (known, with given properties ) simulated sampling Sampled graph G’=(V’,E’) Analysis of G’, comparison with G

  13. Model for traceroute First approximation: union of shortest paths G=(V, E) Targets Sources NB: Unique Shortest Path

  14. Model for traceroute First approximation: union of shortest paths G’=(V’, E’) Very simple model, but: allows for some analytical and numerical understanding

  15. More formally... • G = (V, E): sparse undirected graph with a set of • NSsourcesS= { i1, i2, …, iNS } • NT targetsT = {j1, j2, …, jNT } • randomly placed. • The sampled graph G’=(V’,E’) is obtained by considering the union of all the traceroute-like paths connecting source-target pairs. PARAMETERS: (probing effort) Usually NS=O(1), rT=O(1)

  16. Analysis of the mapping process For each set = { S,T }, the indicator function that a given edge(i, j) belongs to the sampled graph is with if (i,j)  path between l,m otherwise,

  17. usuallyST<< 1 Mean-field statistical analysis Averaging over all the possible realizations of the set = { S,T }, WE HAVE NEGLECTED CORRELATIONS !! <pij > >1 – exp(- rSrT bij ) betweenness

  18. Betweenness centrality b for each pair of nodes (l,m) in the graph, there are nlm shortest paths between l and m nijlm shortest paths going through ij bij is the sum of nijlm/ nlm over all pairs (l,m) k i ij: large betweenness j jk: smallbetweenness Also: flow of information if each individual sends a message to all other individuals Similar concept: node betweenness bi

  19. Consequences of the analysis <pij > >1 – exp(- rSrT bij ) • Smallest betweenness • bij=2 => <pij >> 2 rSrT i j (i.e. i and j have to be source and target to discover i-j) • Largest betweenness • bij=O(N2) => <pij >> 1

  20. Results for the vertices <pij > >1 – exp(- e bij/N) <pi >> 1 – (1-rT) exp( - e bi/N ) (discovery probability) N*k/Nk> 1 – exp( - e b(k)/N ) (discovery frequency) <k*>/k >e ( 1 + b(k)/N )/k (discovered connectivity) • discovery probability strongly related with the centrality ; • vertex discovery favored by a finite density of targets ; • accuracy increased by increasing probing effort  . Summary:

  21. Numerical simulations 1. Homogeneous graphs: (ex: ER random graphs) • peaked distributions of k and b • narrow range of betweenness => Good sampling expected only for high probing effort

  22. Numerical simulations • Homogeneous graphs • Heavy-tailed graphs (ex: Scale-free BA model) • broad distributions of k and b • P(k) ~ k-3 • large range of available values => Expected: Hubs well-sampled independently of  (b(k) ~kb )

  23. Numerical simulations Homogeneous graphs Homogeneously pretty badly sampled

  24. Numerical simulations Scale-free graphs Hubs are well discovered

  25. Numerical simulations Homogeneous graphs NS=1 No heavy-tail behaviour except.... • heavy-tailed P*(k) only for NS = 1 (cf Clauset and Moore 2004) • cut-off around <k> => large, unrealistic <k> needed • bad sampling of P(k)

  26. Numerical simulations Scale-free graphs • good sampling, especially of the heavy-tail; • almost independent of NS ; • slight bending for low degree (less central) nodes => bad evaluation of the exponents.

  27. Summary • Analytical approach to a traceroute-like sampling process • Link with the topological properties, in particular the betweenness • Usual random graphs more “difficult” to sample than heavy-tails • Heavy-tails well sampled • Bias yielding a sampled scale-free network from a homogeneous network: only in few cases <k> has to be unrealistically large Heavy tails properties are a genuine feature of the Internet however Quantitative analysis might be strongly biased (wrong exponents...)

  28. Perspectives • Optimizedstrategies: • separate influence of rT, rS • location of sources, targets • investigation of other networks • Results on redundancy issues • Massive deployment traceroute@home cf alsoGuillaume and Latapy 2004 and... • The internet is aweighted networks • bandwidth, traffic, efficiency, routers capacity • Data are scarse and on limited scale • Interaction among topology and traffic

More Related