Efficient Dependency Tracking for Relevant Events in Shared Memory Systems

  1. Efficient Dependency Tracking for Relevant Events in Shared Memory Systems Anurag Agarwal (anurag@cs.utexas.edu) Vijay K. Garg (garg@ece.utexas.edu) PDS Lab University of Texas at Austin

  2. Outline • Motivation • Background • Chain Clock • Instances of Chain Clock • Experimental Results • Conclusion

  3. Motivation • Dependency between events is required for global state information • Applications such as monitoring and debugging • Vector clocks [Fidge 88, Mattern 89] have two drawbacks: • O(N) operations per event for a system with N processes • Poor support for dynamic creation of processes

  4. Outline • Motivation • Background • Chain Clock • Instances of Chain Clock • Experimental Results • Conclusion

  5. Relevant Events • Events “useful” to the application • Example: predicate detection • “There are no messages in the channel” [Figure: message-passing computation over processes p1–p4]

  6. Vector Clocks [Fidge 88, Mattern 89] • Assigns N-tuple (V) to every relevant event • e → f iff e.V < f.V (clock condition) • Process Pi : • V = (0, … , 0) • On an event e • If e is receive of message m: V = max (V, m.V) • If e is a relevant event: V[i] = V[i] + 1 • If e is a send of message m: m.V = V
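
A minimal sketch of these rules in Java, assuming a fixed process count N and that messages carry a copy of the sender's vector (the class and method names are illustrative, not from the paper):

    // Vector clock rules from the slide above.
    class VectorClock {
        final int[] V;   // V = (0, ..., 0) initially
        final int i;     // index of the local process Pi
        VectorClock(int n, int i) { this.V = new int[n]; this.i = i; }

        // Receive of message m: V = max(V, m.V), componentwise
        void onReceive(int[] mV) {
            for (int j = 0; j < V.length; j++) V[j] = Math.max(V[j], mV[j]);
        }

        // Relevant event: V[i] = V[i] + 1
        void onRelevantEvent() { V[i]++; }

        // Send of message m: m.V = V (a copy, so later ticks don't leak)
        int[] onSend() { return V.clone(); }
    }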

  7. Outline • Motivation • Background • Chain Clock • Instances of Chain Clock • Experimental Results • Conclusion

  8. Key Idea • Any chain in the computation poset can function as a process [Figure: computation on processes p1–p4 with events a–h, shown decomposed into chains]

  9. Chain Clocks • A component in the timestamp corresponds to a chain rather than a process • Change “Rule II” of the vector clock algorithm • If e is a relevant event: V[e.c] = V[e.c] + 1, where e.c is the chain to which e is assigned • Theorem: Chain clocks guarantee the “clock condition” • Goal: Online decomposition of the poset into as few chains as possible
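
As an abstract sketch, the only change from the vector clock is that Rule II ticks the component of the event's chain; the chain-assignment step is the pluggable part that DCC, ACC, and VCC instantiate differently (the interface below is illustrative):

    // Abstract chain clock: Rule II ticks component e.c instead of the
    // process index. assignChain is the online chain-decomposition step
    // that each instance (DCC, ACC, VCC) provides.
    interface ChainAssigner {
        int assignChain(int[] V);  // returns e.c for the current event
    }

    class ChainClock {
        int[] V = new int[0];                  // grows as chains appear
        final ChainAssigner assigner;
        ChainClock(ChainAssigner a) { this.assigner = a; }

        void onRelevantEvent() {
            int c = assigner.assignChain(V);   // which chain does e extend?
            if (c >= V.length) V = java.util.Arrays.copyOf(V, c + 1);
            V[c]++;                            // Rule II: V[e.c] = V[e.c] + 1
        }
    }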

  10. Outline • Motivation • Background • Chain Clock • Instances of Chain Clock • DCC • ACC • VCC • Experimental Results • Conclusion

  11. Dynamic Chain Clocks (DCC) • A shared vector Z maintains up-to-date values of all components • Each process starts with an empty vector • Rule II • e.c = j such that Z[j] = e.V[j] • Give preference to the component last updated by Pi • V[e.c] = V[e.c] + 1 and Z[e.c] = Z[e.c] + 1
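
A sketch of this rule in Java, assuming a single coarse lock protects the shared vector Z (the synchronization scheme and the growable-vector representation are assumptions of this sketch; only Rule II is shown):

    import java.util.ArrayList;
    import java.util.List;

    // DCC Rule II: pick a component j whose shared value Z[j] matches the
    // local view V[j], preferring the component this process ticked last,
    // then increment both V[j] and Z[j]. Components beyond V's length can
    // be skipped: every chain in Z has been ticked at least once.
    class DynamicChainClock {
        static final List<Integer> Z = new ArrayList<>();  // shared components
        final List<Integer> V = new ArrayList<>();          // local, starts empty
        int lastUsed = -1;                                  // component Pi ticked last

        void onRelevantEvent() {
            synchronized (Z) {
                int c = -1;
                if (lastUsed >= 0 && Z.get(lastUsed).equals(V.get(lastUsed)))
                    c = lastUsed;                           // preferred component
                else
                    for (int j = 0; j < V.size(); j++)
                        if (Z.get(j).equals(V.get(j))) { c = j; break; }
                if (c == -1) { c = Z.size(); Z.add(0); }    // start a new chain
                while (V.size() <= c) V.add(0);             // grow local vector
                V.set(c, V.get(c) + 1);
                Z.set(c, Z.get(c) + 1);
                lastUsed = c;
            }
        }
    }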

  12. DCC: Example • If e is receive of message m: V = max (V, m.V) • If e is a relevant event: e.c = i s.t. Z[i] = V[i]; V[e.c] = V[e.c] + 1; Z[e.c] = Z[e.c] + 1 • If e is a send of message m: m.V = V [Figure: three-process run; p1's events are timestamped (1), (1,1) = max{(1),(0,1)}, (2,1), (3,1); p2's event (0,1); p3's events (3,1), (3,2); a table tracking V1, V2, V3 and the shared vector Z is omitted]

  13. Problem • The number of processes can be much larger than the minimal number of chains [Figure: four processes p1–p4 whose timestamps grow to four components, e.g. (0,1,1,1) and (1,2,2,2) on p4]

  14. Optimal Chain Decomposition • Antichain: Set of pairwise concurrent elements • Width: Maximum size of an antichain • Dilworth’s Theorem [1950] : A poset of width k can be partitioned into k chains and no fewer. • Requires knowledge of complete poset
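
Restated in symbols (a direct transcription of the bullet above, with width(P) the maximum antichain size):

    \min\{\, k : P = C_1 \cup \dots \cup C_k,\ \text{each } C_i \text{ a chain} \,\} \;=\; \mathrm{width}(P)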

  15. Online Chain Decomposition • Elements of poset presented in a total order consistent with the poset • Assign elements to chains as they arrive • Can be modeled as a game between • Bob : Presents elements • Alice : Assigns them to chains • Felsner [1997] : For a poset of width k, Bob can force Alice to use k(k+1)/2 chains

  16. Chain Partitioning Algorithm (ACC) • Felsner gave an algorithm that meets the k(k+1)/2 bound • Our algorithm is simpler and more efficient • Queues are grouped into sets B1 … Bk with |Bi| = i • For an element z: • Insert z into the first queue q in Bi whose head is < z • Swap the queues of Bi and Bi-1, leaving q in its place [Figure: element z being placed into the queue sets B1, B2, B3]
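
A sketch of this queue-based partition in Java. The partial order is supplied as a predicate, each queue holds one chain with its most recent element at the front, and an empty queue is treated as accepting any element (one way to make fresh queues usable; this detail and all names are assumptions of the sketch):

    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.Deque;
    import java.util.List;
    import java.util.function.BiPredicate;

    // Online chain partition in the style of the slide: level i (0-based)
    // holds i+1 queues; each queue is one chain, head = most recent element.
    class ChainPartitioner<E> {
        final BiPredicate<E, E> lessThan;            // the poset order
        final List<List<Deque<E>>> B = new ArrayList<>();
        ChainPartitioner(BiPredicate<E, E> lt) { this.lessThan = lt; }

        void insert(E z) {
            for (int i = 0; i < B.size(); i++) {
                for (Deque<E> q : B.get(i)) {
                    if (q.isEmpty() || lessThan.test(q.peekFirst(), z)) {
                        q.addFirst(z);               // z extends chain q
                        if (i > 0) {                 // swap B_{i-1} and B_i \ {q}
                            List<Deque<E>> down = new ArrayList<>(B.get(i));
                            down.remove(q);          // B_i minus q moves down
                            List<Deque<E>> up = new ArrayList<>(B.get(i - 1));
                            up.add(q);               // q stays at its level
                            B.set(i - 1, down);
                            B.set(i, up);
                        }
                        return;
                    }
                }
            }
            // z fits no existing queue: open a new level with one more queue
            List<Deque<E>> fresh = new ArrayList<>();
            for (int j = 0; j <= B.size(); j++) fresh.add(new ArrayDeque<>());
            fresh.get(0).addFirst(z);
            B.add(fresh);
        }
    }

The swap keeps the size invariant |Bi| = i: q plus the i-1 queues from Bi-1 form the new Bi, and the remaining i-1 queues of Bi form the new Bi-1, so at most 1 + 2 + … + k = k(k+1)/2 queues are ever created for a poset of width k.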

  17. Drawback of DCC and ACC • Require a shared data structure • Monitoring applications generally need a central server • Hybrid clocks • Multiple servers, each responsible for a subset of processes • Finds chains within a process group

  18. Shared Memory System • Accesses to shared variables induce dependencies • Observation: Access events for a shared variable form a chain • Variable-based Chain Clocks (VCC) • Associate a component with every variable
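
A minimal sketch of this idea, assuming each tracked variable owns one clock component and carries its own timestamp, merged into the accessing thread's vector under the variable's lock (this representation is an assumption, not the paper's exact data structure):

    // VCC sketch: component c(x) is owned by shared variable x, so the
    // chain that ticks component c(x) is exactly x's access events.
    class VariableChainClock {
        final int c;      // index of the component owned by this variable
        int[] V;          // timestamp carried along with the variable
        VariableChainClock(int numComponents, int c) {
            this.V = new int[numComponents];
            this.c = c;
        }

        // Called with the accessing thread's vector, holding x's lock.
        void onRelevantAccess(int[] threadV) {
            for (int j = 0; j < threadV.length; j++)   // merge views (max)
                threadV[j] = Math.max(threadV[j], V[j]);
            threadV[c]++;                              // tick x's chain
            V = threadV.clone();                       // variable keeps a copy
        }
    }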

  19. VCC Application: Predicate Detection • Predicate: (x = 1) and (y = 1) • Only events changing x and y are relevant • Associate one component of the VCC with x and the other with y [Figure: execution with initially x = 0, y = 0 and writes x = 1, x = 2, y = 1, y = 2]

  20. Outline • Motivation • Background • Chain Clock • Instances of Chain Clock • Experimental Results • Conclusion

  21. Experiments • Setup • A multithreaded application • Each thread generates a sequence of events • Parameters: • Number of processes • Number of events • Probability of relevant event: α • Metrics • Number of components used • Execution time

  22. Components Used [Plot omitted; Events = 100, α = 1%]

  23. Execution Time [Plot omitted; Events = 100, α = 1%]

  24. Effect of Relevancy [Plot omitted; Threads = 100, Events = 100]

  25. Conclusion • Generalized vector clocks to a class of algorithms called chain clocks • Dynamic Chain Clock (DCC) can provide a significant speedup and reduce the memory requirement of applications • Antichain-based Chain Clock (ACC) meets the k(k+1)/2 lower bound for online chain decomposition

  26. Questions?

  27. Example: Poset of width 2 • For a poset of width 2, Bob can force Alice to use 3 chains [Figure: elements labeled with chain assignments 1, 1, 2, 3]

  31. Happened Before Relation (→) [Lamport 78] • Distributed computation with N processes • Every process executes a series of events • Internal, send, or receive events [Figure: two-process computation (p1, p2)] • e → f if there is a path from e to f • e ║ f if there is no path between e and f

  32. Future work • Lower bound for online chain decomposition when a decomposition into N chains is already known • Other chain decomposition strategies

  33. Distributed System: Time vs Threads [Plot omitted; Events = 100, α = 1%]

  34. Distributed System: Events vs Time [Plot omitted; Threads = 100, α = 1%]

  35. Effect of Number of Events [Plot omitted; Threads = 100, α = 1%]

  37. Summary • Chain clocks • Generalize vector clocks • Reduce the time and memory overhead • Elegantly handle dynamic process creation
