Understanding Positive Feedback Loops in DHTs

Explore the concept of positive feedback loops in Distributed Hash Tables (DHTs) through a detailed analysis of Pastry's behavior under stress conditions and the implications for network scalability and reliability.

Presentation Transcript


  1. Positive Feedback Loops in DHTs, or Be Careful How You Simulate
     January 13, 2004
     Sean Rhea, Dennis Geels, Timothy Roscoe, and John Kubiatowicz
     From “Handling Churn in a DHT”, available at http://bamboo-dht.org/pubs.html (and in the back of the room)

  2. Background
     • A year ago, started benchmarking DHTs
     • Usual goals:
       • Improve the state of the art
       • Provide a metric of success
     • Interested in real implementations
       • Not simulations
       • Want to use the DHTs in real applications
     • Need a solid experimental framework

  3. PlanetLab
     • Our first testbed
       • A “real” network
       • Some machines bandwidth- and CPU-limited
       • Lots of cross traffic
     • But problems:
       • Too hard to get reproducible results
       • Too little scale (~250 machines)

  4. ModelNet
     • Run several virtual hosts per CPU
       • Override system calls like sendto, recvfrom
     • Route all packets through a single host
       • Applies delay, queuing, loss
       • Uses a 10,000-node AS-level topology
     • Allows for reasonable scale
       • Have run with up to 4,000 DHT nodes
     • Reproducible results
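
ModelNet's core idea, routing every packet through an emulator host that imposes delay, queuing, and loss, can be illustrated with a toy model of a single bottleneck link. This is a minimal sketch, not ModelNet's implementation: the `EmulatedLink` class and its parameters are hypothetical, and it assumes a tail-drop FIFO queue, a fixed propagation delay, and calls made in non-decreasing time order.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EmulatedLink:
    """Toy bottleneck link (hypothetical): serialization at a fixed bandwidth,
    a bounded FIFO queue with tail drop, and a fixed propagation delay."""

    bandwidth_bps: float
    delay_s: float
    queue_limit: int
    # Transmit-finish times of packets still queued or being serialized.
    _finish_times: list[float] = field(default_factory=list, repr=False)

    def send(self, now: float, size_bytes: int) -> float | None:
        """Return when a packet sent at `now` is delivered, or None if dropped.

        Assumes successive calls use non-decreasing `now`.
        """
        # Packets that finished transmitting before `now` have left the queue.
        self._finish_times = [t for t in self._finish_times if t > now]
        if len(self._finish_times) >= self.queue_limit:
            return None  # queue full: tail drop, which the sender sees as loss
        start = self._finish_times[-1] if self._finish_times else now
        finish = start + size_bytes * 8 / self.bandwidth_bps  # queuing + serialization
        self._finish_times.append(finish)
        return finish + self.delay_s  # plus propagation delay
```

A 1,250-byte packet takes 10 ms to serialize on a 1 Mbit/s link, so with `queue_limit=2` and a 50 ms delay, two back-to-back packets arrive at about 60 ms and 70 ms and a third is dropped.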

  5. A Simple Experiment
     • Start 1000 nodes in a DHT (FreePastry)
     • Let network stabilize
     • Start 200 more
     • What happens?

  6. FreePastry under Massive Join
     • Does the bandwidth explosion have something to do with the DHT’s collapse?

  7. Talk Overview
     • Background
     • Teaser
     • Pastry review
     • Pastry’s problem and a fix
     • Conclusions and future work

  8. Pastry Review
     [Diagram: ring of nodes labeled by ID prefix: 111…, 0…, 110…, 10…]
     • Each DHT node has
       • An identifier in [0, 2^160)
       • Leaf set
         • Predecessors
         • Successors
       • Routing table
         • Nodes with similar prefixes
         • Choose node for each prefix by proximity (in network latency)
     • Each node responsible for keys closest to its ID
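
The two ID rules on this slide, prefix-based routing-table placement and "closest ID owns the key", can be shown in a minimal sketch. Assumptions: IDs are 160-bit values written as 40 hex digits (base-16 digits, i.e. b = 4), generated here with SHA-1 purely for illustration; all function names are hypothetical and not FreePastry's API.

```python
from __future__ import annotations

import hashlib

ID_BITS = 160                       # identifiers lie in [0, 2^160)
DIGIT_BITS = 4                      # assumption: hex digits (b = 4)
NUM_DIGITS = ID_BITS // DIGIT_BITS  # 40 digits per ID
RING = 1 << ID_BITS


def make_id(name: str) -> str:
    """Hash a name to a 160-bit ID as 40 hex digits (SHA-1 chosen for illustration)."""
    return hashlib.sha1(name.encode()).hexdigest()


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading hex digits two IDs have in common."""
    n = 0
    while n < NUM_DIGITS and a[n] == b[n]:
        n += 1
    return n


def routing_slot(own: str, other: str) -> tuple[int, int] | None:
    """(row, column) where `own` could store `other` in its routing table.

    Row = shared prefix length; column = `other`'s next digit after that prefix.
    """
    row = shared_prefix_len(own, other)
    if row == NUM_DIGITS:
        return None  # identical IDs: no slot
    return row, int(other[row], 16)


def ring_distance(a: int, b: int) -> int:
    """Distance between two IDs on the circular ID space."""
    d = abs(a - b)
    return min(d, RING - d)


def responsible_node(key: str, node_ids: list[str]) -> str:
    """The node whose ID is numerically closest to the key is responsible for it."""
    k = int(key, 16)
    return min(node_ids, key=lambda n: ring_distance(int(n, 16), k))
```

Distance wraps around the ring: with nodes at 0 and 8·2^156, a key at 15·2^156 is owned by the node at 0.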

  9. Pastry Join Algorithm
       function join (A, G) =
         G’ = nearest_neighbor (A, G);
         (B, P) = lookup (G’, ID_A);
         L_A = get_leaf_set (B);
         for i from 0 to |P| - 1 do
           k = len_longest_matching_pfx (ID_A, ID_{P_i});
           R_i = get_routing_table_level (P_i, k);
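
The join pseudocode can be turned into a runnable sketch on a toy, static overlay. Everything here is a hypothetical stand-in: IDs are 4 hex digits instead of 160 bits, a 1-D coordinate plays the role of network latency, and `nearest_neighbor`, `lookup`, and each routing-table row are computed from global knowledge of all nodes rather than by probing and message passing, so the gateway G is not needed. Copied entries are stored under A's row k, the shared-prefix length.

```python
from __future__ import annotations

NUM_DIGITS = 4   # assumption: toy 4-hex-digit IDs instead of 160-bit ones
RING = 16 ** NUM_DIGITS
LEAF_HALF = 2    # assumption: leaf set = 2 predecessors + 2 successors


def len_longest_matching_pfx(a: str, b: str) -> int:
    """Number of leading hex digits that IDs a and b share."""
    n = 0
    while n < NUM_DIGITS and a[n] == b[n]:
        n += 1
    return n


def ring_dist(a: str, b: str) -> int:
    """Numeric distance between two IDs on the circular ID space."""
    d = abs(int(a, 16) - int(b, 16))
    return min(d, RING - d)


class ToyNetwork:
    """Static overlay computed from global knowledge (a simplification)."""

    def __init__(self, coords: dict[str, float]) -> None:
        # Node ID -> hypothetical 1-D position; |difference| stands in for latency.
        self.coords = coords

    def latency(self, a: str, b: str) -> float:
        return abs(self.coords[a] - self.coords[b])

    def nearest_neighbor(self, coord_a: float) -> str:
        # Real Pastry finds this by probing outward from gateway G; we take the minimum.
        return min(self.coords, key=lambda n: abs(self.coords[n] - coord_a))

    def lookup(self, start: str, key: str) -> tuple[str, list[str]]:
        """Prefix routing from `start`; returns (owner of key, path P)."""
        path, cur = [start], start
        while True:
            p = len_longest_matching_pfx(cur, key)
            better = [n for n in self.coords if len_longest_matching_pfx(n, key) > p]
            if not better:
                break
            # Each hop matches at least one more digit; pick the lowest-latency such node.
            cur = min(better, key=lambda n: self.latency(cur, n))
            path.append(cur)
        owner = min(self.coords, key=lambda n: ring_dist(n, key))
        if owner != cur:
            path.append(owner)  # last hop through the leaf set
        return owner, path

    def get_leaf_set(self, b: str) -> set[str]:
        """B's ring neighbors (LEAF_HALF on each side) plus B itself."""
        ring = sorted(self.coords, key=lambda n: int(n, 16))
        i = ring.index(b)
        return {ring[(i + j) % len(ring)] for j in range(-LEAF_HALF, LEAF_HALF + 1)}

    def get_routing_table_level(self, p: str, k: int) -> dict[int, str]:
        """Row k of p's routing table: per next digit, the lowest-latency match."""
        row: dict[int, str] = {}
        for n in self.coords:
            if len_longest_matching_pfx(n, p) == k:  # same first k digits, differs at k
                col = int(n[k], 16)
                if col not in row or self.latency(p, n) < self.latency(p, row[col]):
                    row[col] = n
        return row


def join(
    net: ToyNetwork, id_a: str, coord_a: float
) -> tuple[set[str], dict[tuple[int, int], str]]:
    """Initial leaf set and routing table for joining node A, following the slide."""
    g_prime = net.nearest_neighbor(coord_a)
    b, path = net.lookup(g_prime, id_a)
    leaf_set = net.get_leaf_set(b)
    table: dict[tuple[int, int], str] = {}
    for p_i in path:
        k = len_longest_matching_pfx(id_a, p_i)
        for col, node in net.get_routing_table_level(p_i, k).items():
            if col != int(id_a[k], 16):  # that cell corresponds to A's own digit
                table[(k, col)] = node   # copied entries go in A's row k
    return leaf_set, table
```

Each node on the path shares a longer prefix with A than the one before it, which is why successive path nodes supply successively deeper rows of A's table.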

  10. Probes in Pastry’s Join
     • To compute nearest_neighbor, must probe
       • Looking for nearest node in some set
     • Existing nodes also probe joining node
     • Castro et al. estimate ~150 probes/join
     • Independent of congestion for correctness
     • On failure, must probe to find replacement
       • May need many probes to find closest one
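
Why lost probes matter can be made concrete with a minimal sketch of a probe-based closest-node search. The `probe_nearest` function and its `probe` callback are hypothetical; a real implementation would send network pings with timeouts, while this version accepts any function that returns an RTT or `None`.

```python
from __future__ import annotations

from collections.abc import Callable


def probe_nearest(
    candidates: list[str], probe: Callable[[str], float | None]
) -> tuple[str | None, int, list[str]]:
    """Probe each candidate once; return (closest responder, probes sent, presumed-dead nodes).

    `probe(node)` returns a round-trip time in seconds, or None on timeout.
    A timed-out probe is indistinguishable from a crashed node, so such
    candidates are reported as failed even if the probe was only dropped.
    """
    best: str | None = None
    best_rtt = float("inf")
    presumed_dead: list[str] = []
    for node in candidates:
        rtt = probe(node)
        if rtt is None:
            presumed_dead.append(node)  # dropped or dead: the caller cannot tell
        elif rtt < best_rtt:
            best, best_rtt = node, rtt
    return best, len(candidates), presumed_dead
```

Under congestion some `None` results are dropped probes rather than dead nodes, yet the caller treats them as failures, which is the hook for the feedback loop on the next slides.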

  11. Talk Overview
     • Background
     • Teaser
     • Pastry review
     • Pastry’s problem and a fix
     • Conclusions and future work

  12. Teaser Explanation
     • In a network under stress, many probes
       • If bandwidth-limited, they interfere with each other
       • Lots of dropped probes look like a failure
     • Pastry responds to failure, sending more probes
       • Probability of drop goes up
       • We have a positive feedback cycle (squelch)
     • Easy to confirm
       • Increasing available bandwidth solves problem
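
The feedback cycle can be reproduced with a toy fluid model. It is a sketch under loud assumptions: `base_rate`, `capacity`, and `probes_per_failure` are hypothetical numbers, all traffic beyond capacity is assumed dropped, and every drop is assumed to trigger a fixed number of replacement probes in the next round. When `probes_per_failure` exceeds 1 and the base load already exceeds capacity, the load grows every round.

```python
from __future__ import annotations


def reactive_load(
    base_rate: float, capacity: float, probes_per_failure: float, steps: int
) -> list[float]:
    """Toy fluid model of reactive recovery; all parameters are hypothetical.

    Each round, the offered probe load is the base (join) traffic plus the
    recovery probes triggered by the previous round's drops. Traffic above
    capacity is dropped, and each drop looks like a failed neighbor that
    triggers `probes_per_failure` replacement probes in the next round.
    """
    load = base_rate
    history = [load]
    for _ in range(steps):
        dropped = max(0.0, load - capacity)              # excess traffic is lost
        load = base_rate + probes_per_failure * dropped  # recovery feeds back
        history.append(load)
    return history


if __name__ == "__main__":
    # Bandwidth-limited: [120.0, 180.0, 360.0, 900.0, 2520.0, 7380.0]
    print(reactive_load(120.0, 100.0, 3.0, 5))
    # More bandwidth, same workload: no drops, load stays at 120.0
    print(reactive_load(120.0, 200.0, 3.0, 5))
```

Raising only `capacity` makes the explosion disappear, mirroring the observation that increasing available bandwidth solves the problem.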

  13. What Went Wrong?
     • Pastry publications show it working fine
     • Existing Pastry results are of two types:
       • Simulations of 10,000-100,000 nodes
         • Don’t model queuing, delay, or cross traffic
       • PlanetLab tests using 10s of nodes
         • Low scale, ample bandwidth on chosen hosts

  14. A Simple Fix
     • Idea: fix broken links periodically
       • Instead of recovering in reaction to failure
       • Breaks feedback loop
     • Also, back off the period (probe less often) in response to loss
       • Now it’s a negative feedback cycle (damping)
     • Still have a probe problem:
       • How to probe independently of congestion?
       • Good probes important for neighbor proximity
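
A minimal sketch of periodic repair with backoff shows the damping effect. Assumptions, none of them from the talk: all nodes share one repair period, each node sends one repair probe per period, observed loss doubles the period (up to a cap), and a loss-free round shortens it by a fixed step.

```python
from __future__ import annotations


def periodic_repair_load(
    num_nodes: int,
    capacity: float,
    rounds: int,
    min_period: float = 1.0,
    max_period: float = 60.0,
    step: float = 1.0,
) -> list[float]:
    """Aggregate repair-probe rate per round under periodic repair with backoff.

    Each node sends one probe per `period` seconds, so the offered load is
    num_nodes / period. Load above capacity means loss, which doubles the
    period; otherwise the period shrinks by `step`. Parameters are hypothetical.
    """
    period = min_period
    loads = []
    for _ in range(rounds):
        load = num_nodes / period
        loads.append(load)
        if load > capacity:
            period = min(max_period, period * 2)     # loss: back off, send less
        else:
            period = max(min_period, period - step)  # no loss: probe a bit faster
    return loads
```

With 1,200 nodes and a capacity of 100 probes/s, the load starts at 1,200 probes/s, backs off within a few rounds, and then oscillates between roughly 55 and 110 probes/s, because loss now reduces traffic instead of adding to it.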

  15. Restoring Proximity
     • Finding the closest neighbor takes time
       • Meanwhile, routing is no longer O(log n)
     • Fix: fill holes with first appropriate node
       • Can find such a node using a lookup
       • Immediately restores O(log n) routing
     • Later, can look for close nodes
       • Again, periodically, with backoff on failure
       • Use several techniques not covered here
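
A minimal sketch of hole filling: for each empty routing-table slot, look up a key carrying the slot's prefix and accept the returned node if its ID actually has that prefix, ignoring proximity. Assumptions: toy 4-hex-digit IDs, the lowest ID with the prefix used as the lookup key, and a hypothetical `lookup` that returns the numerically closest node from global knowledge instead of routing.

```python
from __future__ import annotations

NUM_DIGITS = 4            # assumption: toy 4-hex-digit IDs (the slides use 160-bit IDs)
RING = 16 ** NUM_DIGITS


def ring_dist(a: int, b: int) -> int:
    """Distance between two IDs on the circular ID space."""
    d = abs(a - b)
    return min(d, RING - d)


def lookup(key: str, node_ids: list[str]) -> str:
    """Hypothetical stand-in for a DHT lookup: the node numerically closest to `key`."""
    k = int(key, 16)
    return min(node_ids, key=lambda n: ring_dist(int(n, 16), k))


def fill_holes(
    own: str, table: dict[tuple[int, int], str], node_ids: list[str]
) -> dict[tuple[int, int], str]:
    """Fill each empty (row, col) slot with the first qualifying node a lookup finds.

    Slot (row, col) needs a node whose ID starts with own[:row] + hex(col).
    Proximity is ignored; closer replacements can be found later.
    """
    filled = dict(table)
    for row in range(NUM_DIGITS):
        for col in range(16):
            digit = format(col, "x")
            if digit == own[row] or (row, col) in filled:
                continue  # our own digit, or the slot is already occupied
            prefix = own[:row] + digit
            key = prefix + "0" * (NUM_DIGITS - len(prefix))  # lowest ID with that prefix
            candidate = lookup(key, node_ids)
            if candidate.startswith(prefix):
                filled[(row, col)] = candidate  # first appropriate node, not the closest
            # Otherwise leave the hole; a later periodic pass can retry.
    return filled
```

Because the key is the lowest ID with the wanted prefix, a node just below that prefix can be numerically closer than any qualifying node; the sketch then leaves the slot empty rather than filling it with a wrong entry.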

  16. Related Work
     • Chord’s stabilization is proactive, periodic
       • Not clear what motivated this decision
     • Mahajan et al.
       • Simulation-based study of Pastry under churn
       • Automatic tuning of maintenance rate
       • Suggest increasing rate on failures!
     • Liben-Nowell et al.
       • Analytical lower bound on maintenance costs

  17. Conclusions
     • Simplifying network model dangerous
       • May lead to bad design, false sense of correctness
     • Separate concerns in DHT routing
       1. Correctness – comes from leaf set
       2. Efficiency – comes from filled routing table
       3. Proximity – only a concern after 1 and 2
     • Can we do better in simulation?
       • And still scale to 10,000s of nodes?
       • ModelNet requires a whole cluster…

  18. Thanks for Listening! More information available at http://bamboo-dht.org
