
A World of (Im)Possibilities Nancy Lynch Celebration: Sixty and Beyond

A World of (Im)Possibilities. Nancy Lynch Celebration: Sixty and Beyond. Hagit Attiya, Technion; Jennifer Welch, Texas A&M University. Introduction. One of the main themes of Nancy's work has been proving lower bounds and impossibility results for problems that arise in distributed computing.


Presentation Transcript


  1. A World of (Im)Possibilities. Nancy Lynch Celebration: Sixty and Beyond. Hagit Attiya, Technion; Jennifer Welch, Texas A&M University

  2. Introduction • One of the main themes of Nancy's work has been proving lower bounds and impossibility results for problems that arise in distributed computing. • Overview of some of Nancy's results • Lesser-known results, hidden gems closer to our hearts • Emphasize their meaning and implications • How they influenced the development of the field and of distributed systems • Concentrating on their positive impact World of (Im)Possibilities

  3. Best-Known Example: FLP • Impossibility of asynchronous fault-tolerant consensus [Fischer, Lynch, Paterson] • Motivated work on strengthening models of computation: partially synchronous models [Dwork, Lynch, Stockmeyer]; unreliable failure detectors [Chandra, Toueg] • Motivated work on weakening the problem definition: k-set agreement [Chaudhuri]; renaming [Attiya et al.]; condition-based approaches [Raynal, Rajsbaum et al.]

  4. FLP: Impact • Related practical problems: transaction commit; leader election; atomic broadcast; maintaining consistent replicated data • The wait-free hierarchy (classify concurrent abstract data types) [Herlihy] • Attempts to solve k-set agreement and renaming led to the application of topology in distributed computing. [Chaudhuri] [Borowsky, Gafni] [Saks, Zaharoglou] [Herlihy, Shavit]

  5. 2nd Example: Brewer's Conjecture [Brewer, PODC 2000 invited talk] A web service cannot provide all three guarantees: • Consistency • Availability • Partition-tolerance

  6. What Does This Mean? [Gilbert, Lynch, SIGACT News 2002] A web service cannot provide all three guarantees: • Consistency: atomicity of (read / write) operations • Availability: every request by a nonfaulty client gets a response • Partition-tolerance: the guarantees hold even when lost messages create two partitioned components in the network

  7. Proof Idea (adapted from [Attiya, Bar-Noy, Dolev]) [Figure: three executions (Exec 1, Exec 2, Exec 3) of processes p0 and p1, with X marks on the timelines. Labels: "p0 writes 1", "p1 reads 0", "look same to p1", "contradiction".]
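One way to read this proof idea is as an indistinguishability argument. The sketch below is a toy model under stated assumptions, not the construction from the paper: `p1_view` and `p1_read` are hypothetical names, the register is assumed to start at 0, and a partition is modeled as dropping every message from p0 to p1.

```python
def p1_view(p0_writes, partitioned):
    """Messages p1 has received by the time it must answer a read (toy model)."""
    sent = [("write", 1)] if p0_writes else []
    # A partition drops every message from p0 to p1.
    return [] if partitioned else sent


def p1_read(view):
    """An available p1 must answer using only what it has seen locally."""
    value = 0  # assumed initial value of the register
    for kind, written in view:
        if kind == "write":
            value = written
    return value


# p0 writes 1, but the partition hides the write from p1.
view_partitioned = p1_view(p0_writes=True, partitioned=True)
# p0 never writes at all.
view_no_write = p1_view(p0_writes=False, partitioned=False)

# The two executions look the same to p1, so it returns 0 in both,
# which violates atomicity in the execution where the write completed.
print(view_partitioned == view_no_write, p1_read(view_partitioned))
```

In this model p1 can stay consistent only by refusing to answer during the partition, which gives up availability.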

  8. Brewer's Conjecture: Implications • Traditional database services maintain consistency and fail to provide availability in the face of partitions • Relax the consistency guarantees of the web service • Sometimes miss values or return stale data (Internet queries) [PIER: Huebsch, Hellerstein, Lanham, Loo, Shenker, Stoica] • Allow partitions to evolve separately, and build mechanisms to cope when this happens (stream processing) [Medusa: Balazinska, Balakrishnan, Stonebraker] • Sacrifice availability, but not often (stream processing)… [BOREALIS: Balazinska, Balakrishnan, Madden, Stonebraker] • Assume a mechanism to guard against partitions… [CQ: Shah, Hellerstein, Brewer]

  9. 3rd Example: Best-Case Cost of Fault-Tolerant Algorithms Does making an algorithm fault-tolerant incur a cost even when the system is well-behaved? • Previous investigation focused on the synchronous case • early stopping algorithms for consensus: 2 rounds vs. 1 round for a non-fault-tolerant algorithm [Dolev, Reischuk, Strong] [Dwork, Moses] [Moses, Tuttle] • non-blocking commit: twice as many rounds as for blocking commit [Dwork, Skeen] • What about the asynchronous case?

  10. Are Wait-Free Algorithms Fast? [Attiya, Lynch, Shavit] • Studies the best-case complexity of an algorithm • When there are no failures, although the algorithm can tolerate any number of crashes (is wait-free) • When the execution is synchronized, although the algorithm also works in asynchronous executions • Complexity measure of interest is running time • Time is measured by synchronized rounds • Problem of interest is approximate agreement [Figure: example with n = 6.]

  11. Wait-Free Algorithms are not Fast • A non-fault-tolerant algorithm takes O(1) time • one process writes its input and the rest read it • achieves perfect agreement (ε = 0) • Prove an Ω(log n) time lower bound for wait-free approximate agreement • So there are problems for which being wait-free in the asynchronous model imposes more than constant additional cost even when failures do not occur.
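The O(1) non-fault-tolerant algorithm described above can be sketched in a few lines. This is a sequential illustration only, assuming a single shared register; the names `one_writer_agreement` and `spread` are hypothetical and do not come from the talk.

```python
def one_writer_agreement(inputs):
    """Process 0 writes its input to a shared register; every process then reads it."""
    register = inputs[0]  # the single write, by the designated process
    return [register for _ in inputs]  # one read per process


def spread(outputs):
    """Largest disagreement between any two outputs (the epsilon achieved)."""
    return max(outputs) - min(outputs)


outputs = one_writer_agreement([0.2, 0.9, 0.5])
print(outputs, spread(outputs))
```

The sketch is not wait-free: if process 0 crashes before writing, the readers have nothing to decide on. Avoiding that dependence on one process is the source of the extra cost.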

  12. Proof Idea [Figure: processes all with value 0, one deciding 0. Labels: "< log n", "< n", "this process cannot influence the decision".]

  13. Proof Idea [Figure: processes with value 0 and one with value 1. Labels: "< log n", "< n", "< 1", "decide 1", "decide 0".]

  14. The Best-Case Cost of Fault-Tolerance • Formalize the idea of "designing for the normal / common case" and show its cost [Lampson, "Hints for computer system design"] • The idea of accommodating the worst case & measuring the best / normal / common case has become standard. • message cost of consensus in failure-free runs [Halpern, Hadzilacos] • contention-free step complexity [Alur, Taubenfeld] • obstruction-free step complexity [Ellen, Luchangco, Moir, Shavit]

  15. Interleaving Algorithms • The paper also gives an approximate agreement algorithm matching the Ω(log n) time lower bound • Interleaves two algorithms: • One guarantees fault-tolerance • Another guarantees best-case time complexity • Need to coordinate results… • Using a “virtual” two-process approximate agreement algorithm • Similar applications of interleaving, especially in randomized consensus [Saks, Shavit, Woll] • E.g., this morning's session [Aspnes, Attiya, Censor]

  16. Application: Replicated Storage [Yu and Vahdat] • Emulates a shared memory • Replication-based implementation of wide-area data access services • need automatic regeneration of failed replicas and reconfiguration of groups • Probabilistic guarantee: reads may return stale values with a small probability • Optimizes for best case: • Failure-free reconfiguration is quick and cheap • Failure-induced reconfiguration calls a consensus protocol [Saks, Shavit, Woll] for replicas to agree on the next configuration

  17. 4th Example: Clock Synchronization • In a distributed system with n nodes that experiences variable message delays, how closely can the nodes' clocks be synchronized?

  18. Clock Synchronization Lower Bound [Lundelius, Lynch] • No algorithm can synchronize n clocks closer than (1-1/n)u • For a clique with the same message delay uncertainty u on all links (u = max delay - min delay) • Even if no failures and no clock drift • Proof introduced the shifting technique [Figure: two executions of p0 and p1 with message delays d-u and d in opposite directions; shift p0 backwards by u to swap the two delays.]
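A small numeric sketch of the bound and of one shift step may help. The functions `clique_lower_bound` and `shift_p0_backwards` are hypothetical names, and the direction convention is an assumption: moving p0's events earlier in real time by s lengthens the delay of every message p0 sends by s and shortens the delay of every message it receives by s.

```python
from fractions import Fraction


def clique_lower_bound(n, u):
    """The (1 - 1/n) * u bound for n clocks on a clique with uncertainty u."""
    return (1 - Fraction(1, n)) * u


def shift_p0_backwards(delay_p0_to_p1, delay_p1_to_p0, s):
    """Move all of p0's events s earlier in real time and return the new delays."""
    return delay_p0_to_p1 + s, delay_p1_to_p0 - s


d, u = 10, 3  # illustrative values: delays must stay within [d - u, d]
before = (d - u, d)
after = shift_p0_backwards(*before, s=u)

print(before, after)                  # both executions use only legal delays
print(clique_lower_bound(2, u))       # bound for two nodes
print(clique_lower_bound(10, u))      # bound approaches u as n grows
```

Both executions are legal and no process can tell them apart locally, yet p0's clock differs by u in real-time terms between them, which is what forces the skew.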

  19. What About Other Topologies? [Halpern, Megiddo, Munshi] • Arbitrary topologies and nonuniform uncertainties • Adversary's optimal strategy is to maximize a certain quantity • involving neighboring nodes' initial clock values and the delays between them • subject to constraints on message uncertainty • Bound is expressed as a system of equations, and this linear program is solved using optimization techniques • Shifting notion is captured in the linear program • Not in closed form except for a few special cases • Bound is tight

  20. What About Closed Form Bounds? [Biaz, Welch] • If uncertainties are symmetric (same in both directions of a link), then the lower bound is diam/2, where diam is the diameter of the graph w.r.t. uncertainties [Figure: example graph on nodes a to f with edge uncertainties; diam = 9.]

  21. • Arbitrary topology G with arbitrary uncertainties is equivalent to clique G' with the same nodes, where the uncertainty between any two nodes is the length of the shortest path between them in G (w.r.t. uncertainties) [Halpern, Megiddo, Munshi] • Shift a carefully chosen execution on the clique, for 2 nodes diam apart, to get the diam/2 lower bound. [Figure: example graph on nodes a to f next to its equivalent clique, with uncertainties as edge labels. Labels: "Shifting", "Equivalent Clique".]

  22. What About Upper Bounds? • For arbitrary topology and arbitrary uncertainties, the radius is an upper bound [Halpern, Megiddo, Munshi] • Since radius ≤ diam, the upper and lower bounds are within a factor of 2 • Tight & almost tight closed form upper bounds for some specific common topologies with uniform uncertainties [Biaz, Welch] [Figure: example graph on nodes a to f with edge uncertainties; diam = 9, radius = 5.]
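The diam/2 lower bound and the radius upper bound can be computed from all-pairs shortest paths, which is also the equivalent-clique construction. The sketch below uses Floyd-Warshall on a small hypothetical graph (not the one in the slide figure, whose edges are not recoverable here) and assumes symmetric uncertainties and the usual definition of radius as the minimum eccentricity.

```python
from itertools import product


def shortest_paths(nodes, edges):
    """All-pairs shortest path lengths: the uncertainties of the equivalent clique."""
    inf = float("inf")
    dist = {(a, b): (0 if a == b else inf) for a, b in product(nodes, repeat=2)}
    for a, b, w in edges:  # symmetric uncertainties assumed
        dist[a, b] = min(dist[a, b], w)
        dist[b, a] = min(dist[b, a], w)
    # Floyd-Warshall: product() varies k slowest, so k is the outer loop.
    for k, i, j in product(nodes, repeat=3):
        if dist[i, k] + dist[k, j] < dist[i, j]:
            dist[i, j] = dist[i, k] + dist[k, j]
    return dist


def diameter_and_radius(nodes, edges):
    dist = shortest_paths(nodes, edges)
    eccentricity = {a: max(dist[a, b] for b in nodes) for a in nodes}
    return max(eccentricity.values()), min(eccentricity.values())


# Hypothetical example graph: (node, node, uncertainty).
nodes = ["a", "b", "c", "d"]
edges = [("a", "b", 1), ("b", "c", 2), ("c", "d", 3), ("a", "d", 5)]
diam, radius = diameter_and_radius(nodes, edges)
print("lower bound diam/2 =", diam / 2, "upper bound radius =", radius)
```

For any connected graph with symmetric uncertainties, diam/2 ≤ radius ≤ diam, so the two bounds never differ by more than a factor of 2.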

  23. External Clock Synchronization • What about external synchronization, when some clocks have outside time sources? • Previous results were for internal synchronization • The tight bound on how close a node's clock can get to the source time is half the shortest path distance (w.r.t. uncertainties) from the node to a source [Attiya, Hay, Welch] [Figure: example graph on nodes a to f with two sources and edge uncertainties; bounds are: b: 3/2, c: 1/2, e: 3/2, f: 5/2.]
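The "half the shortest path distance to a source" rule can be evaluated with a multi-source Dijkstra search. This is a sketch on a hypothetical graph, assuming symmetric uncertainties; `external_sync_bounds` is an illustrative name, not something from the cited paper.

```python
import heapq


def external_sync_bounds(edges, sources):
    """Half the shortest-path distance (w.r.t. uncertainties) to the nearest source."""
    adjacency = {}
    for a, b, w in edges:  # symmetric uncertainties assumed
        adjacency.setdefault(a, []).append((b, w))
        adjacency.setdefault(b, []).append((a, w))

    dist = {s: 0 for s in sources}
    heap = [(0, s) for s in sources]  # start the search from every source at once
    heapq.heapify(heap)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, w in adjacency.get(node, []):
            candidate = d + w
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return {node: d / 2 for node, d in dist.items()}


# Hypothetical chain a - b - c - d with sources at both ends.
edges = [("a", "b", 3), ("b", "c", 1), ("c", "d", 2)]
print(external_sync_bounds(edges, sources={"a", "d"}))
```

Sources themselves get a bound of 0, and each other node is limited by whichever source is closest in total uncertainty.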

  24. Optimal Synchronization Per Execution • Given the information collected in a specific execution, by some algorithm strategy, find the tightest possible synchronization • internal synchronization, offline algorithm [Attiya, Herzberg, Rajsbaum] • external synchronization, online algorithm [Patt-Shamir, Rajsbaum] • extended to handle clock drift [Ostrovsky, Patt-Shamir]

  25. Gradient Clock Synchronization • The clock skew between any pair of nodes should be a function of the distance between them [Fan, Lynch] [Figure: graph on nodes a to f; clocks of a and d need not be as tightly synch'ed as those of a and b.]

  26. Gradient Clock Synchronization • motivated by problems in sensor networks, or more generally, large scale networks, where nodes in the same locality need to be more tightly synchronized • data fusion • target tracking http://www.mikalac.com/mis/missile.html

  27. Gradient Clock Synch Lower Bound • Closest that two nodes' clocks can get (in worst case) is Ω(log D / log log D) • D is diameter of network ⇒ global influence • Algorithms requiring a fixed maximum skew for nearby nodes may not scale well • E.g., TDMA http://www.dsna-dti.aviation-civile.gouv.fr/actualities /revuesgb/revue64gb/64pgarticle2gb/telecom_c2gb.html

  28. Gradient Clock Synch Lower Bound: Assumption 1 • Nonzero clock drift: (hardware) clocks can run fast or slow, within known bounds [Figure: hardware clock time vs. real time; max slope 1+ρ, min slope (1+ρ)^-1.]

  29. Gradient Clock Synch Lower Bound: Assumption 2 • Algorithm must ensure that (logical) clocks always increase at some minimum positive rate [Figure: logical clock time vs. real time, with the slope bounded below by the minimum rate.]

  30. Gradient Clock Synch LB: Simple Case • Consider a simple algorithm in which the clock value of p1 is periodically propagated down the chain • Can construct an execution in which pn-1's new clock value is larger than pn's old clock value by an amount depending on D • carefully choose message delays • manipulate clock drift rates • cause nodes to suddenly jump to higher values without synchronizing with their neighbors • Insight in the paper is generalizing this to any algorithm [Figure: chain of nodes p1, p2, p3, ..., pn.]

  31. Is the Lower Bound Tight? • Recall lower bound is Ω(log D / log log D) • Several pre-existing algorithms have O(D) • Then upper bound improved to O(√D) [Locher, Wattenhofer] • Recently upper bound improved to O(log D) [Lenzen, Locher, Wattenhofer] • Still a small gap; can the lower bound be improved?
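To get a feel for how small the remaining gap is, the growth rates above can be tabulated for a few diameters. The sketch ignores all hidden constants and uses base-2 logarithms, both of which are assumptions made only for illustration; `skew_growth` is a hypothetical helper name.

```python
import math


def skew_growth(diameter):
    """Growth terms of the bounds above, with all constant factors dropped."""
    log_d = math.log2(diameter)
    return {
        "lower: log D / log log D": log_d / math.log2(log_d),
        "upper: log D": log_d,
        "upper: sqrt D": math.sqrt(diameter),
        "upper: D": float(diameter),
    }


for diameter in (2 ** 4, 2 ** 16, 2 ** 32):
    print(diameter, skew_growth(diameter))
```

The gap between the lower bound and the O(log D) upper bound is only a log log D factor, which stays tiny for any realistic network diameter.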

  32. How Long Can Large Difference Last? • In the simple diffusion algorithm on the chain, the large difference between pn-1 and pn only lasts while the message is in transit • Perhaps difficulties could be avoided by keeping track of the “generation” of each clock value and only comparing apples with apples (clocks of the same generation)? • but this could be complicated

  33. And There’s a Lot More… • Lower bounds on space for mutual exclusion [Burns, Lynch] • Lower bound on number of messages for leader election in synchronous rings [Frederickson, Lynch] • Impossibility results for data link layer and connection management [Fekete, Lynch, Mansour, Spinelli] [Kleinberg, Attiya, Lynch] • Lower bound on time for consensus in partially synchronous models [Attiya, Dwork, Lynch, Stockmeyer] • Lower bound on time for synchronous k-set agreement [Chaudhuri, Herlihy, Lynch, Tuttle] • Tradeoff between safety and liveness for randomized coordinated attack [Varghese, Lynch] • Impossibility of boosting fault tolerance [Attie, Guerraoui, Kouznetsov, Lynch, Rajsbaum] • …

  34. Final Observations • Strive to make the results relevant • Natural problems • Practical architectural assumptions • Realistic performance measures (for lower bounds) • Crisp arguments (ingenious but clear) • Easy to understand and verify • Simple to extend and lead to follow-ups

  35. Take-Home Message • Impossibility results help the development of the area • Understanding inherent limits guides efforts in the appropriate directions • And setting boundaries is good for everyone…

  36. Thanks for your attention Thank you, Nancy!
