Two Techniques for Proving Lower Bounds Hagit Attiya Technion
Goal of this Presentation • Describe two common techniques for proving lower bounds in distributed computing: • Information theory arguments • Covering • Variations • Applications
My always-first slide • [Figure: problem, nicer system architecture, algorithm, implementation, real system architecture]
Part I Information Theory Arguments
Overview • Bound the flow of information among processes (and memory) • Show that information takes a long time to acquire • Argue that solving a particular problem requires information about many processes • Usually applies to: • Shared memory systems • Synchronous executions (which imply lower bounds for asynchronous executions as well) • Details depend on the primitives used
Single-writer registers: A possible argument • Need to read from each process • The state of a process can be found only in its own register • Hence, the first process must read n registers
Not really • When processes take steps together, the first process can double its information in the 2nd step • But it cannot do better than that
More Refined Argument • Consider synchronized executions • Processes take steps in rounds • All reads appear before all writes • INF(p_i, t-1): the set of inputs influencing process p_i at the start of round t • INF(p_i, 0) = {p_i} • For t ≥ 1, if p_i reads a value written by p_j, INF(p_i, t) = INF(p_i, t-1) ∪ INF(p_j, t-1) • For t ≥ 1, if p_i writes, INF(p_i, t) = INF(p_i, t-1)
INF determines the state • Lemma: If the states of the processes in INF(p_i, t-1) are the same in configurations C and C', then p_i takes the same steps in a t-round execution from C and from C' • Proof by case analysis
Size of INF • I(t) = max_i |INF(p_i, t)| • Lemma: I(0) = 1, and I(t) ≤ 2·I(t-1) • Hence I(t) ≤ 2^t
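A minimal Python sketch of the INF recurrence, assuming a hypothetical random schedule in which each process, in every round, either reads the register of some p_j or writes its own register. The names `simulate_inf` and `influence_sizes` and the 50/50 schedule are our own choices; the sketch only illustrates that |INF| can at most double per round.

```python
import random


def simulate_inf(n, rounds, seed=0):
    """Return [INF(., 0), INF(., 1), ..., INF(., rounds)] for a random schedule.

    Hypothetical schedule: in each round, each process either reads the
    register of a randomly chosen p_j or writes its own register.
    """
    rng = random.Random(seed)
    history = [[{i} for i in range(n)]]  # INF(p_i, 0) = {p_i}
    for _ in range(rounds):
        prev = history[-1]
        cur = []
        for i in range(n):
            if rng.random() < 0.5:
                j = rng.randrange(n)
                cur.append(prev[i] | prev[j])  # read: INF(p_i,t-1) ∪ INF(p_j,t-1)
            else:
                cur.append(set(prev[i]))       # write: nothing new is learned
        history.append(cur)
    return history


def influence_sizes(history):
    """I(t) = max_i |INF(p_i, t)| for every t."""
    return [max(len(s) for s in sets) for sets in history]


sizes = influence_sizes(simulate_inf(n=64, rounds=8, seed=1))
# I(0) = 1 and I(t) <= 2 * I(t-1), hence I(t) <= 2^t
```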
Simple application: Computing OR • Consider the input configuration C0 = (0, 0, …, 0, …, 0) • In every round t < log n, the influence set of every process has size I(t) ≤ 2^t < n • So some process p_i is not in INF(p_1, log n − 1) • By the lemma, p_1 returns the same value in C0 and in C1 = (0, 0, …, 1, …, 0), which differs from C0 only in the input of p_i • But OR(C0) = 0 and OR(C1) = 1: a contradiction
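To make the indistinguishability step concrete, here is a Python sketch of a full-information protocol over single-writer registers: each process remembers every (process, input) pair it has learned. The fixed random schedule and all function names are hypothetical. With T < log n rounds, some p_i lies outside INF(p_1, T), and p_1 (index 0 below) ends in exactly the same state from C0 and from C1.

```python
import random


def make_schedule(n, rounds, seed=0):
    """Hypothetical oblivious schedule: per round, each process reads some p_j or writes."""
    rng = random.Random(seed)
    return [[("read", rng.randrange(n)) if rng.random() < 0.5 else ("write", None)
             for _ in range(n)]
            for _ in range(rounds)]


def inf_sets(n, schedule):
    """INF(p_i, T) computed with the recurrence from the slides."""
    inf = [{i} for i in range(n)]
    for actions in schedule:
        inf = [inf[i] | inf[j] if kind == "read" else set(inf[i])
               for i, (kind, j) in enumerate(actions)]
    return inf


def full_information_run(inputs, schedule):
    """Each process's state is the set of (process, input) pairs it knows."""
    n = len(inputs)
    state = [frozenset({(i, inputs[i])}) for i in range(n)]
    registers = [frozenset() for _ in range(n)]  # registers[i] is written only by p_i
    for actions in schedule:
        # All reads come first and see the registers as of the previous round.
        new_state = [state[i] | registers[j] if kind == "read" else state[i]
                     for i, (kind, j) in enumerate(actions)]
        # Then all writes: a writer publishes everything it knows.
        for i, (kind, _) in enumerate(actions):
            if kind == "write":
                registers[i] = state[i]
        state = new_state
    return state


n, T = 16, 3                       # T < log2(16) = 4 rounds
sched = make_schedule(n, T, seed=1)
inf_p1 = inf_sets(n, sched)[0]     # p_1 is index 0 here
i = min(set(range(n)) - inf_p1)    # some p_i outside INF(p_1, T)
c0 = [0] * n
c1 = [1 if k == i else 0 for k in range(n)]
s0 = full_information_run(c0, sched)[0]
s1 = full_information_run(c1, sched)[0]
# s0 == s1: p_1 answers identically although OR(c0) = 0 and OR(c1) = 1
```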
Application: Approximate agreement • For a small ε > 0: • Processes start with inputs in [0,1] • Each must decide on an output in [0,1] such that • All outputs are within ε of each other (agreement) • If all inputs are v, the output is v (validity) • The system is asynchronous, and a process must decide even if it runs by itself (solo termination)
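The two correctness conditions can be written as a small checker for a single execution. This Python sketch is a hypothetical helper (the name and the choice to check only the values actually decided are ours); it does not capture solo termination, which is a liveness property.

```python
def satisfies_approx_agreement(inputs, outputs, eps):
    """Check one execution against the agreement and validity conditions.

    inputs:  the processes' inputs in [0, 1]
    outputs: the values decided in this execution
    """
    in_range = all(0.0 <= x <= 1.0 for x in [*inputs, *outputs])
    agreement = max(outputs) - min(outputs) <= eps  # all outputs within eps
    # Validity only constrains executions in which all inputs are equal.
    validity = len(set(inputs)) > 1 or all(y == inputs[0] for y in outputs)
    return in_range and agreement and validity


print(satisfies_approx_agreement([0.0, 1.0, 1.0], [0.5, 0.5, 0.52], eps=0.1))  # True
```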
Application: Approximate agreement [Attiya, Shavit, Lynch] • Consider the input configuration C0 = (0, 0, …, 0) • Run all processes to completion from C0: by validity, they must decide 0 • If the number of rounds T < log n, then • I(T) < n • ∃ process p_i ∉ INF(p_1, T)
Approximate agreement (cont.) • Consider two input configurations C0 = (0, …, 0) and C1 = (0, …, 1, …, 0), where only p_i has input 1 • Run p_i to completion by itself from C1: this is indistinguishable from a solo run in which all inputs are 1, so p_i must decide 1 • Since p_i ∉ INF(p_1, T), p_1 still decides 0 when running from this configuration, contradicting agreement • Theorem: Solo-terminating approximate agreement requires Ω(log n) rounds in a synchronous failure-free run
Approximate agreement (cont.) • This is the overhead of solo termination in "nice" runs: without solo termination, a synchronous algorithm can solve the problem in one round
With multi-writer registers • The previous theorem does not hold • There is a wait-free approximate agreement algorithm that takes O(1) rounds in "nice" executions [Schenk] • Even simpler: an O(1)-round OR algorithm. Can you find it? • There are only a few initial configurations to distinguish between • This is the overhead of single-writer registers: the bound separates single-writer and multi-writer registers
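One possible answer to the question above, as a Python sketch (our assumption, not necessarily the intended algorithm): with a multi-writer register R initialized to 0, every process with input 1 writes 1 in round 1, and every process reads R in round 2. It is only claimed for synchronous failure-free runs, the setting of the single-writer OR argument.

```python
def or_with_multiwriter_register(inputs):
    """Two-round OR for a synchronous failure-free run, using one multi-writer register R.

    Round 1: every process whose input is 1 writes 1 to R (R starts at 0).
    Round 2: every process reads R and outputs the value it read.
    """
    R = 0
    for x in inputs:  # round 1: writes only; any number of processes may write R
        if x == 1:
            R = 1
    return [R for _ in inputs]  # round 2: each read sees R from the end of round 1


print(or_with_multiwriter_register([0, 0, 1, 0]))  # [1, 1, 1, 1]
```

A single-writer register could not play the role of R here, because every process with input 1 has to be able to write it.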
Information flow with multi-writer registers • The previous argument does not hold • Instead, consider how learning more information allows a process to differentiate between input configurations • Capture this as a partitioning of process states and memory values [Beame] • [Figure: example input configurations (0, …, 0), (0, …, 0, …, 1), (0, …, 1, …, 0), (1, …, 1, …, 0)]
Multi-writer registers: Ordering events within each round • Put all reads first, then • Put all writes • Reads obtain the value written at the end of the previous round
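The ordering can be sketched as a one-round executor in Python. The step format and the function name are hypothetical, and we assume that when several processes write the same register in a round, the last write in the given order wins (any fixed tie-breaking order serves the argument).

```python
def run_round(memory, steps):
    """Execute one synchronous round on multi-writer registers.

    memory: dict register -> value at the end of the previous round.
    steps:  list of (pid, "read", reg) or (pid, "write", reg, value).
    Returns (memory at the end of the round, dict pid -> value read).
    """
    # All reads first: each returns the value from the end of the previous round.
    results = {}
    for step in steps:
        if step[1] == "read":
            pid, _, reg = step
            results[pid] = memory.get(reg, 0)
    # Then all writes, applied in list order (our tie-breaking assumption).
    new_memory = dict(memory)
    for step in steps:
        if step[1] == "write":
            _, _, reg, value = step
            new_memory[reg] = value
    return new_memory, results


mem, reads = run_round({"R": 5}, [(1, "write", "R", 7), (2, "read", "R"), (3, "write", "R", 9)])
# p2 reads 5 even though it appears after p1's write in the list
```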
Partitioning into equivalence classes • For process p and round t, two input configurations are in the same equivalence class of P(p, t) if p is in the same state after t rounds from both (in a synchronous failure-free execution) • P(t): the number of classes after t rounds (max over p) • V(R, t) and V(t) are defined similarly for locations R • P(t), V(t) ≤ (4n+2)^{2^{t-2}} • Lemma: P(t) ≤ P(t-1)·V(t-1) and V(t) ≤ n·P(t-1) + V(t-1)
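The recurrences in the lemma can be iterated directly. In this Python sketch we assume binary inputs, so P(0) = 2, and fixed initial register contents, so V(0) = 1; these base values and the function name are our assumptions. Under them, the iterated bounds stay below (4n+2)^{2^{t-2}} for t ≥ 2.

```python
def class_bounds(n, rounds, p0=2, v0=1):
    """Iterate P(t) <= P(t-1) V(t-1) and V(t) <= n P(t-1) + V(t-1).

    Assumed base case: p0 = 2 (binary inputs), v0 = 1 (fixed initial memory).
    Returns the upper bounds [P(0), ..., P(rounds)] and [V(0), ..., V(rounds)].
    """
    P, V = [p0], [v0]
    for t in range(1, rounds + 1):
        P.append(P[t - 1] * V[t - 1])      # a process combines its state with a value it reads
        V.append(n * P[t - 1] + V[t - 1])  # a location keeps its value or gets one of n writes
    return P, V


P, V = class_bounds(n=8, rounds=6)
```

Starting from P(0) = 2 and V(0) = 1 gives P(2) ≤ 4n+2 and V(2) ≤ 4n+1, after which squaring dominates.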
Application: The collect problem • update(v) stores v as the latest value of a process • collect() returns a set of values (one per process) • When each process initially stores one of two values: • There are 2^n possible input configurations, each leading to a different output • The previous lemma implies (4n+2)^{2^{t-2}} ≥ P(t) ≥ 2^n • Hence Ω(log n) rounds are needed
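Spelling out the arithmetic, assuming the bound reads P(t) ≤ (4n+2)^{2^{t-2}} (logarithms base 2):

$$
2^n \le P(t) \le (4n+2)^{2^{t-2}}
\;\Longrightarrow\;
n \le 2^{t-2}\log_2(4n+2)
\;\Longrightarrow\;
t \ge 2 + \log_2 n - \log_2\log_2(4n+2) = \Omega(\log n).
$$

The subtracted log log term is of lower order, so the number of rounds grows logarithmically in n.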
Also for other primitives (CAS) • Non-reading CAS • Reading CAS returns the old value (can be handled, but we won't do that) • Can also extend to non-reading kCAS
CAS(R, old, new) {
  if R == old then
    R = new
    return success
  else return fail
}
Careful with CAS • More information can flow in a sequence of steps • Initially, R == 0 • In the order cas(R,0,1), cas(R,1,2), …, cas(R,n−1,n), every CAS succeeds • On the other hand, in the order cas(R,n−1,n), cas(R,n−2,n−1), …, cas(R,0,1), only the last CAS succeeds
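The two orders can be compared in a short Python sketch of non-reading CAS (the helper names are ours). The ascending chain lets every step build on the previous one; the descending chain lets only one step through.

```python
def cas(memory, reg, old, new):
    """Non-reading CAS: reports success or failure, never the old value."""
    if memory[reg] == old:
        memory[reg] = new
        return True
    return False


def run_chain(n, ascending=True):
    """Apply cas(R,k,k+1) for k = 0..n-1 in ascending or descending order, from R == 0."""
    memory = {"R": 0}
    ks = range(n) if ascending else reversed(range(n))
    outcomes = [cas(memory, "R", k, k + 1) for k in ks]
    return outcomes, memory["R"]


up, final_up = run_chain(5, ascending=True)       # every CAS succeeds; R ends at 5
down, final_down = run_chain(5, ascending=False)  # only the last CAS succeeds; R ends at 1
```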
Ordering events within a round • Put all reads first • Put all writes last • For every register R whose current value is v, consider all CAS events on R: • Put all events with old ≠ v first: they all fail • Then put all events with old == v: only the first succeeds (assuming operations are non-degenerate, i.e., old ≠ new) • This allows us to prove a lemma analogous to the one for multi-writer registers (with different constants)
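The CAS part of this ordering can be sketched in Python (reads and writes are omitted, since they simply go first and last). The step format (pid, reg, old, new) and the function name are hypothetical; steps are assumed non-degenerate.

```python
def order_cas_round(memory, cas_steps):
    """Order and apply the CAS steps of one round; mutates memory.

    memory:    dict register -> current value v.
    cas_steps: list of (pid, reg, old, new) with old != new.
    Returns (ordered steps, dict pid -> success).
    """
    ordered = []
    for reg in sorted({s[1] for s in cas_steps}):
        v = memory[reg]
        on_reg = [s for s in cas_steps if s[1] == reg]
        ordered += [s for s in on_reg if s[2] != v]  # old != v: all fail
        ordered += [s for s in on_reg if s[2] == v]  # old == v: only the first succeeds
    success = {}
    for pid, reg, old, new in ordered:
        if memory[reg] == old:
            memory[reg] = new
            success[pid] = True
        else:
            success[pid] = False
    return ordered, success


memory = {"R": 0}
steps = [(1, "R", 3, 4), (2, "R", 0, 1), (3, "R", 0, 2), (4, "R", 1, 5)]
order, success = order_cas_round(memory, steps)
# p4's CAS would have succeeded had it been placed right after p2's
```

Placing the failing CAS steps first prevents chains like the ascending one on the previous slide from forming within a single round.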
Information Flow with Bounded Fan-In • Arbitrary objects, but bounded contention: • Not too many processes access the same base object simultaneously • Isolate processes in a Q-independent execution: • Only processes in Q take steps • Each accesses only objects not modified by other processes in Q • For a process p ∈ Q, a Q-independent execution is indistinguishable from a p-solo execution
Constructing independent executions • Lemma: For any algorithm using only objects with contention ≤ w and every t ≥ 0, there is a t-round Q_t-independent execution with |Q_t| ≥ n/(w+2)^t • Proof by induction, with a trivial base case • Induction step: consider a Q_t-independent execution, look at the next steps the processes in Q_t are about to perform, and construct an undirected graph (V, E) • We use the following result from graph theory: • Turán's theorem: any graph (V, E) has an independent set of size at least |V|^2/(|V| + 2|E|)
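Turán's bound is attained by the classic min-degree greedy algorithm (it achieves the Caro-Wei bound Σ 1/(d(v)+1), which is at least |V|^2/(|V| + 2|E|)). A Python sketch on an illustrative random graph; the graph and the names are ours.

```python
import random


def greedy_independent_set(n, edges):
    """Min-degree greedy independent set on vertices 0..n-1.

    Repeatedly pick a vertex of minimum degree among the remaining vertices
    and delete it together with its neighbours.
    """
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    alive = set(range(n))
    chosen = []
    while alive:
        v = min(alive, key=lambda x: len(adj[x] & alive))
        chosen.append(v)
        alive -= adj[v] | {v}
    return chosen


rng = random.Random(0)
n = 30
edges = [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < 0.2]
independent = greedy_independent_set(n, edges)
turan_bound = n * n / (n + 2 * len(edges))
```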
Induction step: The graph • V = Q_t • E contains an edge {p_i, p_j} if • p_i and p_j are about to access the same object, or • p_i is about to read an object modified by p_j, or • p_j is about to read an object modified by p_i • |E| ≤ |Q_t|(w+1)/2 • By Turán's theorem and the inductive hypothesis, there is an independent set Q_{t+1} of size ≥ |Q_t|/(w+2) ≥ n/(w+2)^{t+1} • Omit all steps of Q_t − Q_{t+1} from the execution to get a Q_{t+1}-independent execution
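A Python sketch of this graph under a hypothetical encoding of the execution: each process in Q_t has a next step (object, is_read), and each object records the single process of Q_t, if any, that has modified it so far. It also checks the edge bound |E| ≤ |Q_t|(w+1)/2.

```python
import random
from collections import defaultdict


def conflict_graph(next_step, modified_by):
    """Edges of the induction-step graph.

    next_step:   process -> (object, is_read), the next step of each process in Q_t.
    modified_by: object -> the process of Q_t that modified it so far
                 (Q_t-independence leaves at most one such process per object).
    Returns the edge set and w, the largest number of processes about to
    access the same object.
    """
    edges = set()
    by_object = defaultdict(list)
    for p, (obj, _) in next_step.items():
        by_object[obj].append(p)
    for procs in by_object.values():  # p_i and p_j access the same object
        for a in range(len(procs)):
            for b in range(a + 1, len(procs)):
                edges.add(frozenset((procs[a], procs[b])))
    for p, (obj, is_read) in next_step.items():
        q = modified_by.get(obj)  # p is about to read an object modified by q
        if is_read and q is not None and q != p and q in next_step:
            edges.add(frozenset((p, q)))
    w = max(len(procs) for procs in by_object.values())
    return edges, w


rng = random.Random(7)
Q = list(range(40))
next_step = {p: (rng.randrange(15), rng.random() < 0.5) for p in Q}
modified_by = {obj: rng.choice(Q) for obj in range(15) if rng.random() < 0.6}
edges, w = conflict_graph(next_step, modified_by)
# Each process has at most w-1 same-object neighbours and adds at most one
# "reads what q modified" edge as a reader, so |E| <= |Q|(w+1)/2.
```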
Application: Weak Test&Set • Weak test&set: like test&set, but with at most one success (a process running solo must succeed) • Take t such that (w+2)^t < n • The lemma gives a t-round {p_i, p_j}-independent execution • Each of p_i and p_j seems to be running solo • So each must succeed • Contradiction • Theorem: The solo step complexity of weak test&set is Ω(log n / log w)
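The choice of t can be spelled out as follows (a sketch; we use log(w+2) = Θ(log w) for w ≥ 2):

$$
|Q_t| \ge \frac{n}{(w+2)^t}, \qquad \frac{n}{(w+2)^t} > 1 \iff t < \frac{\log n}{\log(w+2)}.
$$

So with t = ⌈log n / log(w+2)⌉ − 1, the integer |Q_t| is at least 2, and Q_t contains two processes p_i and p_j. If every solo run finished within t steps, both would succeed in the same execution; hence the solo step complexity exceeds t, which is Ω(log n / log w).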
Part II Covering
Covering: The basic idea • Several processes write to the same location • Writes by earlier processes are lost if there is no read in between • So processes must write to distinct locations • Another process must read these locations
Max Register • WriteMax(v, R) operation • ReadMax operation op returns the maximal value written by a WriteMax operation that • completed before op started, or • overlaps op • Special case of a linearizable object
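For intuition, a sequential Python sketch of one simple construction from n single-writer registers (our assumption, not taken from the slides): process i keeps the largest value it has written in regs[i], and ReadMax reads all n registers. It only illustrates the sequential behavior and the n reads that match the bound for ReadMax [Jayanti, Tan, Toueg]; it does not establish correctness under concurrency.

```python
class MaxRegister:
    """Sequential sketch of a max register built from n single-writer registers."""

    def __init__(self, n, initial=0):
        self.regs = [initial] * n
        self.reads = 0  # number of register reads performed by read_max

    def write_max(self, i, v):
        # Only process i writes regs[i]; it never lowers its own value.
        if v > self.regs[i]:
            self.regs[i] = v

    def read_max(self):
        best = None
        for r in self.regs:  # reads all n registers
            self.reads += 1
            best = r if best is None else max(best, r)
        return best


m = MaxRegister(4)
m.write_max(0, 3)
m.write_max(2, 7)
m.write_max(2, 5)  # ignored: smaller than p2's earlier value
```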
Lower bound for the ReadMax operation [Jayanti, Tan, Toueg] • Theorem: ReadMax must read n different registers • The proof is constructive
Construction for the lower bound • α_k: p_1 … p_k perform WriteMax operations • β_k: writes by p_1 … p_k to R_1 … R_k • γ_k: p_n performs a ReadMax operation, which reads R_1 … R_k • Proof by induction on k = 0, …, n • The base case is simple • Taking k = n yields the result
Inductive Step • α_k: p_1 … p_k perform WriteMax operations • Then p_{k+1} performs WriteMax operations • β_k: writes by p_1 … p_k to R_1 … R_k • γ_k: p_n performs a ReadMax operation • If p_{k+1} writes only to R_1 … R_k, then p_n does not observe p_{k+1} • Hence p_{k+1} must write to some R ∉ {R_1, …, R_k}
Inductive Step • α_k: p_1 … p_k perform WriteMax operations; p_{k+1} must write to some R ∉ {R_1, …, R_k} • π_k: p_{k+1} performs WriteMax operations • β_k: writes by p_1 … p_k to R_1 … R_k • γ_k: p_n performs a ReadMax operation, which must read some R ∉ {R_1, …, R_k}
Inductive Step • α_k, then π_k (p_{k+1} performs WriteMax operations, up to its write to R_{k+1}), then β_k (writes by p_1 … p_k to R_1 … R_k) and γ_k (p_n performs a ReadMax operation) • The claim follows with R_1 … R_k R_{k+1} and α_{k+1} = α_k π_k
Swap objects • The theorem holds for other primitives and objects, e.g., (register-to-memory) swap • Need some care in constructing π_k and γ_k
swap(R, v) {
  tmp = R
  R = v
  return tmp
}
The result also holds for other objects • E.g., counters • The constructed execution contains many increment operations • Better algorithms exist when • There are few increment operations • The max register holds bounded values [Aspnes, Attiya, Censor-Hillel]
Counters with CAS • Counters can be implemented with a single location R and a single CAS per operation: • To increment: • read the previous value v from R • CAS v+1 into R • To read the counter, simply read R • Lots of contention on R! This is inevitable
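A Python sketch of this counter. Python has no hardware CAS, so we emulate an atomic register with a lock (an assumption of the sketch), and we retry when the CAS fails because another increment got in between; without interference an increment is exactly one read and one CAS. All class and function names are ours.

```python
import threading


class CASRegister:
    """Register with atomic read and non-reading CAS, emulated with a lock."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value

    def cas(self, old, new):
        with self._lock:
            if self._value == old:
                self._value = new
                return True
            return False


class CASCounter:
    """Counter stored in a single location R."""

    def __init__(self):
        self.R = CASRegister(0)

    def increment(self):
        while True:
            v = self.R.read()          # read the previous value from R
            if self.R.cas(v, v + 1):   # CAS v+1 into R; retry if R changed
                return

    def read_counter(self):
        return self.R.read()


def worker(counter, times):
    for _ in range(times):
        counter.increment()


counter = CASCounter()
threads = [threading.Thread(target=worker, args=(counter, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every increment and every read goes through the same location R, which is exactly where the contention comes from.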
The memory stalls measure [Dwork, Herlihy, Waarts] • If k processes access (or modify) the same location at the same configuration: • The first process incurs one step and no stalls • The second process incurs one step and one stall • … • The k-th process incurs one step and k−1 stalls
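The accounting above, as a small Python sketch (the function name is ours): with k processes on one location, the total cost is k steps plus 0 + 1 + … + (k−1) stalls, i.e., k(k+1)/2.

```python
def stall_costs(k):
    """Per-process (steps, stalls) when k processes access one location at the same configuration.

    The j-th process (1-based) incurs one step and j - 1 stalls.
    """
    return [(1, j - 1) for j in range(1, k + 1)]


costs = stall_costs(4)
total = sum(steps + stalls for steps, stalls in costs)  # k + k(k-1)/2 = k(k+1)/2
```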
Lower bound on number of stalls • Theorem: ReadCounter must incur n stalls + steps • Similar construction as in the previous theorem: • p_1 … p_k perform Increment operations • p_1 … p_k are poised on R_1 … R_m, m ≤ k • p_n performs a ReadCounter operation, which accesses R_1 … R_m and incurs k stalls + steps
Wrap-up • There are many lower bound results, but fewer techniques… • Some results and techniques are relevant to questions asked in Transform • The material is based on a monograph-in-writing with Faith Ellen • Let me know if you want to proofread it!