Positive Feedback Loops in DHTs
or, Be Careful How You Simulate

January 13, 2004
Sean Rhea, Dennis Geels, Timothy Roscoe, and John Kubiatowicz

From "Handling Churn in a DHT", available at http://bamboo-dht.org/pubs.html (and in the back of the room)
Background
• A year ago, we started benchmarking DHTs
• Usual goals:
  • Improve the state of the art
  • Provide a metric of success
• Interested in real implementations, not simulations
  • We want to use the DHTs in real applications
• Need a solid experimental framework
PlanetLab
• Our first testbed
• A "real" network
  • Some machines are bandwidth- and CPU-limited
  • Lots of cross traffic
• But there were problems:
  • Too hard to get reproducible results
  • Too little scale (~250 machines)
ModelNet
• Runs several virtual hosts per CPU
• Overrides system calls like sendto and recvfrom
• Routes all packets through a single emulator host (sketched below)
  • Applies delay, queuing, and loss
  • Uses a 10,000-node AS-level topology
• Allows for reasonable scale
  • We have run with up to 4,000 DHT nodes
• Reproducible results
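The emulator core's per-packet treatment can be pictured with a small sketch. This is not ModelNet's actual code or API, just a minimal Python illustration of the behavior the slide describes: every send is checked against a link's loss rate and queue bound, then scheduled for delivery after the link's delay. All names and constants here are ours.

```python
import heapq
import itertools
import random

class Link:
    """One emulated link: one-way delay, bounded queue, random loss.
    Parameter names are illustrative, not ModelNet's."""
    def __init__(self, delay_ms, capacity, loss_rate):
        self.delay_ms = delay_ms
        self.capacity = capacity      # max packets queued at once
        self.loss_rate = loss_rate
        self.queued = 0

class EmulatorCore:
    """Single process that all virtual hosts route through; it
    schedules each packet's delivery after its link's delay."""
    def __init__(self):
        self._events = []              # min-heap of (due_ms, seq, link, dst, pkt)
        self._seq = itertools.count()  # tiebreaker so the heap never compares packets

    def send(self, now_ms, link, dst, pkt):
        if random.random() < link.loss_rate or link.queued >= link.capacity:
            return                     # dropped: random loss or queue overflow
        link.queued += 1
        heapq.heappush(self._events,
                       (now_ms + link.delay_ms, next(self._seq), link, dst, pkt))

    def deliver_due(self, now_ms):
        """Pop every packet whose delivery time has arrived."""
        out = []
        while self._events and self._events[0][0] <= now_ms:
            _, _, link, dst, pkt = heapq.heappop(self._events)
            link.queued -= 1
            out.append((dst, pkt))
        return out
```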
A Simple Experiment
• Start 1,000 nodes in a DHT (FreePastry)
• Let the network stabilize
• Start 200 more
• What happens?
FreePastry under Massive Join
• Bandwidth use explodes, and the DHT collapses
• Does the bandwidth explosion have something to do with the DHT's collapse?
Talk Overview
• Background
• Teaser
• Pastry review
• Pastry's problem and a fix
• Conclusions and future work
Pastry Review
• [Ring diagram: the identifier space divided by prefix — 0…, 10…, 110…, 111…]
• Each DHT node has
  • An identifier in [0, 2^160)
  • A leaf set: its predecessors and successors on the ring
  • A routing table: nodes with similar prefixes (see the routing sketch below)
    • The node for each prefix is chosen by proximity (in network latency)
• Each node is responsible for the keys closest to its ID
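To make the routing structure concrete, here is a minimal Python sketch of one Pastry-style routing step. Pastry actually matches base-2^b digits (typically b = 4); this sketch uses single bits (b = 1) for brevity, and all names (`next_hop`, `routing_table`, `leaf_set`) are illustrative rather than FreePastry's API.

```python
B = 160  # bits in a Pastry identifier

def shared_prefix_len(a, b):
    """Number of leading bits two IDs share (b = 1 digits for brevity)."""
    n = 0
    for i in range(B - 1, -1, -1):
        if (a >> i) & 1 != (b >> i) & 1:
            break
        n += 1
    return n

def next_hop(key, my_id, routing_table, leaf_set):
    """One routing step: prefer a routing-table entry that shares a
    strictly longer prefix with the key; fall back to the leaf set."""
    k = shared_prefix_len(key, my_id)
    if k == B:
        return my_id                     # the key is our own ID
    row = routing_table[k]               # row k: nodes sharing exactly k bits with us
    wanted = (key >> (B - 1 - k)) & 1    # the key's next bit after the shared prefix
    if row.get(wanted) is not None:
        return row[wanted]               # longer-prefix match: the normal case
    # fallback: numerically closest leaf-set member (ring wraparound ignored here)
    return min(leaf_set, key=lambda n: abs(n - key))
```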
Pastry Join Algorithm

function join(A, G) =
    // A: the joining node; G: a gateway node already in the DHT
    G' = nearest_neighbor(A, G)        // probe to find the node closest to A in latency
    (B, P) = lookup(G', ID_A)          // route toward A's ID; B owns it, P is the path taken
    L_A = get_leaf_set(B)              // A takes its leaf set from its new neighbor B
    for i from 0 to |P| - 1 do         // one routing-table row from each hop on the path
        k = len_longest_matching_pfx(ID_A, ID_Pi)
        R_i = get_routing_table_level(P_i, k)
Probes in Pastry's Join
• To compute nearest_neighbor, a node must probe others (sketched below)
  • Looking for the nearest node in some set
• Existing nodes also probe the joining node
• Castro et al. estimate ~150 probes per join
• These probes happen independent of congestion; they are needed for correctness
• On a failure, a node must probe to find a replacement
  • May need many probes to find the closest one
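The nearest_neighbor step can be pictured as a probe-driven greedy search. This is an illustration of why joins cost so many probes, not Castro et al.'s exact algorithm; `probe_rtt` and `get_candidates` are assumed helpers.

```python
def nearest_neighbor(seed, probe_rtt, get_candidates):
    """Greedy latency search: probe a candidate set, move to the
    closest responder, ask it for new candidates, and repeat until no
    probe finds anyone closer. Every probe_rtt call is one network
    probe, so each join issues on the order of 150 of them."""
    best, best_rtt = seed, probe_rtt(seed)
    improved = True
    while improved:
        improved = False
        for cand in get_candidates(best):   # e.g. best's leaf set / routing table rows
            rtt = probe_rtt(cand)           # one probe per candidate
            if rtt < best_rtt:
                best, best_rtt = cand, rtt
                improved = True
    return best
```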
Teaser Explanation
• In a network under stress, there are many probes
  • If bandwidth is limited, they interfere with each other
• Lots of dropped probes looks like a failure
  • Pastry responds to the failure by sending more probes
  • The probability of a drop goes up
• We have a positive feedback cycle, a squelch (see the toy model below)
• Easy to confirm: increasing the available bandwidth solves the problem
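The squelch is easy to reproduce in a toy model. The sketch below assumes a single bottleneck link and made-up constants: a join burst pushes offered load past capacity, every dropped probe triggers recovery probes in the next step, and load and loss ratchet each other upward. Raising `capacity` (more bandwidth) or shrinking `probes_per_drop` makes the system settle instead, matching the observation above.

```python
def simulate(capacity, base_rate, join_burst, probes_per_drop, steps=8):
    """Toy model of the positive feedback cycle. A join burst pushes
    offered load past link capacity; each dropped probe spawns more
    recovery probes next step, so drops beget load beget drops.
    All constants are invented for illustration."""
    offered = base_rate + join_burst
    for t in range(steps):
        drop_p = max(0.0, 1.0 - capacity / offered)  # overload -> drops
        dropped = offered * drop_p
        print(f"t={t}  offered={offered:8.0f} pkts  drop_p={drop_p:.2f}")
        offered = base_rate + dropped * probes_per_drop

simulate(capacity=1000, base_rate=900, join_burst=300, probes_per_drop=3.0)
```

With these numbers the offered load and drop probability grow every step; with ample capacity (or a small probes_per_drop) the same loop converges, which is why adding bandwidth makes the problem disappear.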
What Went Wrong?
• The Pastry publications show it working fine
• Existing Pastry results are of two types:
  • Simulations of 10,000-100,000 nodes
    • Don't model queuing, delay, or cross traffic
  • PlanetLab tests using tens of nodes
    • Low scale, ample bandwidth on the chosen hosts
A Simple Fix
• Idea: fix broken links periodically
  • Instead of recovering in reaction to failure
  • Breaks the feedback loop
• Also, scale back the period in response to loss
  • Now it's a negative feedback cycle, i.e. damping (see the sketch below)
• Still have a probe problem:
  • How do we probe independently of congestion?
  • Good probes are important for neighbor proximity
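A minimal sketch of the fix, with illustrative names and made-up bounds: repairs happen on a timer rather than in reaction to failures, and the period backs off multiplicatively when probes are lost, so loss reduces maintenance traffic instead of amplifying it.

```python
class PeriodicRecovery:
    """Periodic (not reactive) neighbor maintenance with backoff.
    One probe per period regardless of how many failures occur; a
    lost probe lengthens the period, a successful one resets it."""
    MIN_PERIOD, MAX_PERIOD = 1.0, 60.0   # seconds; bounds are invented

    def __init__(self):
        self.period = self.MIN_PERIOD

    def on_timer(self, probe_neighbor):
        ok = probe_neighbor()            # one probe this period, win or lose
        if ok:
            self.period = self.MIN_PERIOD                     # calm: probe often
        else:
            self.period = min(self.period * 2, self.MAX_PERIOD)  # loss: back off
        return self.period               # caller reschedules this far in the future
```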
Restoring Proximity
• Finding the closest neighbor takes time
  • Meanwhile, routing is no longer O(log n)
• Fix: fill holes with the first appropriate node (sketched below)
  • Can find such a node using an ordinary lookup
  • Immediately restores O(log n) routing
• Later, look for close nodes to replace it
  • Again periodically, with backoff on failure
  • Uses several techniques not covered here
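A sketch of the hole-filling idea, reusing `B` and `shared_prefix_len` from the Pastry-review sketch above; `lookup(key) -> node_id` is an assumed helper. The point is that one ordinary lookup finds some node with the needed prefix right away; proximity can be optimized afterward.

```python
def fill_hole(my_id, level, digit, lookup):
    """Synthesize a key whose first `level` bits match my_id and whose
    next bit is `digit`, then route a plain lookup to it. Whichever
    node owns that key shares the needed prefix (if any node does),
    so it can fill the routing-table slot immediately; a closer
    (lower-latency) node can be swapped in later."""
    prefix = my_id >> (B - level)                    # my first `level` bits
    key = ((prefix << 1) | digit) << (B - level - 1)
    node = lookup(key)
    # accept only if the owner really shares level+1 leading bits with the key
    return node if shared_prefix_len(node, key) > level else None
```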
Related Work
• Chord's stabilization is proactive and periodic
  • Not clear what motivated this decision
• Mahajan et al.
  • Simulation-based study of Pastry under churn
  • Automatic tuning of the maintenance rate
  • Suggest increasing the rate on failures!
• Liben-Nowell et al.
  • Analytical lower bound on maintenance costs
Conclusions
• Simplifying the network model is dangerous
  • May lead to bad designs and a false sense of correctness
• Separate concerns in DHT routing:
  • Correctness comes from the leaf set
  • Efficiency comes from a filled routing table
  • Proximity is only a concern after the first two
• Can we do better in simulation?
  • And still scale to 10,000s of nodes?
  • ModelNet requires a whole cluster…
Thanks for Listening!
More information available at http://bamboo-dht.org