
Network Awareness & Failure Resilience in Self-Organising Overlay Networks

Network Awareness & Failure Resilience in Self-Organising Overlay Networks. L. Massoulié, A.-M. Kermarrec, A.J. Ganesh, Microsoft Research. Context: unstructured overlays. Peer machines are connected to the Internet; each maintains only the IP addresses of its neighbours.


Presentation Transcript


  1. Network Awareness & Failure Resilience in Self-Organising Overlay Networks L. Massoulié, A.-M. Kermarrec, A.J. Ganesh Microsoft Research

  2. Context: unstructured overlays • Peer machines connected to the Internet; • Each maintains only the IP addresses of its neighbours. • Applications: • Low-level: message dissemination (flooding) • Higher-level: tree construction, content search, … • E.g., Gnutella, ScaMP, … • Overlay structure = graph of “who knows whom” relations; the choice of neighbours is flexible.

  3. Objectives Adapt overlay graph structure, for: • Improving resilience to failures, • Reducing network load, • Reducing network impact on application performance.

  4. Network Awareness Cost of overlay connection (i,j): • Communication cost = number of network hops n(i,j); • Application cost = propagation delay from i to j (both measured by ping). → Assumption: some function c(i,j), easily measured, captures the network cost to be minimised.

  5. Failure Resilience • i.e., connectivity in the presence of link / node failures. • Benchmark: connectivity of random graphs (relevant for existing systems, like ScaMP) → A random graph on N nodes with mean degree c·log(N) supports node or link failure rates up to 1-1/c. • Disconnections are due to isolated nodes, so the degree distribution matters.
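The 1-1/c benchmark can be read off the classical connectivity threshold for uniform random graphs, which are connected with high probability once the mean degree exceeds log(N). A short derivation under that assumption, where q is a symbol introduced here for an independent link failure rate (it does not appear on the slide):

```latex
\begin{align*}
\text{mean degree before failures} &= c \log N \\
\text{mean degree after failures}  &= (1-q)\, c \log N \\
\text{connected w.h.p.} &\iff (1-q)\, c \log N > \log N \\
&\iff q < 1 - \frac{1}{c}
\end{align*}
```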

  6. Formal problem statement • Adapt the graph in a distributed way, keeping the number of edges fixed, so as to reduce an objective function (the energy). • Parameter w controls the trade-off between the objectives. • d_i = degree of node i; the degree term forces degree balancing. • c(i,j) = cost of maintaining connection (i,j), to both the network and the overlay application.
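One objective consistent with the definitions on this slide is E(G) = w·Σ_i d_i² + Σ_(i,j) c(i,j): the squared-degree term penalises unbalanced degrees and the second term sums connection costs. That exact form is an assumption here, and the function name `energy` and the edge-list representation are illustrative rather than taken from the presentation.

```python
from collections import Counter


def energy(edges, cost, w):
    """Assumed objective: w * sum_i d_i**2 + sum over edges of c(i, j).

    edges: iterable of undirected edges (i, j), each listed once.
    cost:  function (i, j) -> non-negative connection cost.
    w:     weight of the degree-balancing term.
    """
    edges = list(edges)
    degree = Counter()
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    balance_term = sum(d * d for d in degree.values())
    cost_term = sum(cost(i, j) for i, j in edges)
    return w * balance_term + cost_term
```

With unit costs and w = 1, a 4-node star scores higher than a 4-node path with the same number of edges, which is the degree-balancing effect the slide describes.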

  7. Solution: a Metropolis algorithm • Periodically each node i picks two current neighbours j, k • Candidate rewiring: • Local evaluation of impact on energy: • Rewiring accepted with probability:
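A minimal sketch of one such step, under several assumptions that are not stated on the slide: the candidate move replaces edge (i,j) with edge (j,k), the energy is w·Σ d_i² + Σ c(i,j), and acceptance follows the standard Metropolis rule min(1, exp(-ΔE/T)) with a temperature parameter T. The names `delta_energy` and `metropolis_step` are hypothetical.

```python
import math
import random


def delta_energy(adj, cost, w, i, j, k):
    """Energy change if edge (i, j) is replaced by edge (j, k).

    Only d_i (minus 1) and d_k (plus 1) change, so the squared-degree
    term changes by 2 * (d_k - d_i + 1); d_j is unchanged.
    """
    d_i, d_k = len(adj[i]), len(adj[k])
    return w * 2 * (d_k - d_i + 1) + cost(j, k) - cost(i, j)


def metropolis_step(adj, cost, w, temperature, i, rng=random):
    """One rewiring attempt by node i; returns True if the graph changed.

    adj maps each node to the set of its neighbours (undirected graph).
    Node identifiers are assumed to be sortable.
    """
    if len(adj[i]) < 2:
        return False
    j, k = rng.sample(sorted(adj[i]), 2)
    if k in adj[j]:
        return False  # (j, k) already exists; rewiring would lose an edge
    delta = delta_energy(adj, cost, w, i, j, k)
    if delta <= 0 or rng.random() < math.exp(-delta / temperature):
        adj[i].remove(j)
        adj[j].remove(i)
        adj[j].add(k)
        adj[k].add(j)
        return True
    return False
```

Because j stays reachable from i through k, a move of this shape keeps the graph connected and the edge count fixed, and node i needs only the degrees and costs of its own neighbourhood to evaluate it.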

  8. Metropolis algorithm (2) • Defines a Markov chain on the set of connected graphs with the initial number of edges E, whose stationary distribution concentrates on low-energy configurations.
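For a Metropolis chain with symmetric proposals and acceptance probability min(1, exp(-ΔE/T)), the stationary distribution takes the usual Gibbs form, which follows from detailed balance. The temperature T and the symmetry of proposals are assumptions of this sketch; E(G) denotes the energy of graph G (distinct from the edge count) and P the transition probability of the chain.

```latex
\[
\pi(G) \;\propto\; \exp\!\left(-\frac{E(G)}{T}\right),
\qquad
\pi(G)\, P(G \to G') \;=\; \pi(G')\, P(G' \to G).
\]
```

The smaller T is, the more strongly this distribution favours low-energy graphs.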

  9. Analysis of failure resilience • Key result: • For an average degree of c·log(N), c>0, the resulting graph remains connected for link failure rates up to exp(-1/c). • Improves upon the failure resilience of uniform random graphs (cf. Erdős–Rényi law); • Essentially optimal failure resilience for uniform random link failures.
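A heuristic reading of the exp(-1/c) threshold: if every node has degree close to c·log(N) and links fail independently at rate q, a node is isolated with probability about q^(c·log N) = N^(c·log q), so the expected number of isolated nodes, N^(1 + c·log q), vanishes when q < exp(-1/c). The sketch below simply compares this threshold with the 1-1/c benchmark for uniform random graphs; the function names are illustrative.

```python
import math


def random_graph_threshold(c):
    """Tolerated failure rate for a uniform random graph: 1 - 1/c."""
    return max(0.0, 1.0 - 1.0 / c)


def balanced_graph_threshold(c):
    """Tolerated link failure rate quoted for the adapted graph: exp(-1/c)."""
    return math.exp(-1.0 / c)


def compare(cs=(1.0, 2.0, 4.0)):
    """Return (c, random-graph threshold, balanced threshold) for each c."""
    return [(c, random_graph_threshold(c), balanced_graph_threshold(c))
            for c in cs]
```

For c = 2, the value used for the initial overlay in the experiments, the two thresholds are 0.5 and roughly 0.607; since exp(-x) ≥ 1 - x, the balanced threshold is never lower.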

  10. Experimental results • Topologies: • Georgia Tech transit-stub model, 5050 core nodes, 100-node LANs attached to core nodes. • (Subgraph of) Microsoft’s corporate network, with 298 core nodes. • Initial overlay: random, with average degree of 2·log(N) (based on the “ScaMP” system) • Costs: delays (1 ms per local hop, 40 ms per non-local hop). • N = 50,000 peers.

  11. Connectivity (Corp) [Figure: number of disconnections (0 to 180) versus number of faulty nodes (0 to 25,000), for parameter settings (0,0,0), (10,1,100) and (10,1,1000).]

  12. Degree (Corp) [Figure: number of nodes (0 to 50,000) versus number of neighbours (0 to 70), for parameter settings (0,0,0), (10,1,100), (10,1,1000), (50,1,100) and (50,1,1000); mean value = 19.39.]

  13. Network delays • Reduced by a factor of 4 on Corp and >2 on GT-5050. • Similar reductions on standard deviation of distances between neighbours.

  14. Outlook • Trade-off between network locality and degree balancing; • Study application-related costs for diverse applications: • For fast dissemination, aggregation of network delays is not enough → “small-world” topologies; • Other distances for other applications: “semantic distance” between peers for content searching.
