1 / 37

Maintaining SP Relationships Efficiently, on-the-fly

Maintaining SP Relationships Efficiently, on-the-fly. Jeremy Fineman. The Problem. Fork-Join (e.g. Cilk) programs have threads that operate either in series or logically parallel. Want to query relationship between two threads as the program runs.

hanzila
Download Presentation

Maintaining SP Relationships Efficiently, on-the-fly

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Maintaining SP Relationships Efficiently, on-the-fly Jeremy Fineman

  2. The Problem • Fork-Join (e.g. Cilk) programs have threads that operate either in series or logically parallel. • Want to query relationship between two threads as the program runs. • For example, Nondeterminator uses relationship between two threads as basis for determinacy race.

  3. Parse Tree S • Represent SP-DAG as a parse tree • S nodes show series relationships • P nodes are parallel relationships a c d S d b a P b c

  4. Least Common Ancestor S • SP-Bags uses LCA lookup. • LCA of b and d is an S-node • So b and d are in series • Cost is (v,v) per query (in Nondeterminator) S d a P b c

  5. Two Complementary Walks S • At S-node, always walk left then right • At P-node, can go left then right, or right then left S d a P b c

  6. Two Complementary Walks S • Produce two orders of threads: • ab c d • a c b d • Notice b || c, and orders differ between b and c. S d a P b c

  7. Two Complementary Walks S • Claim: If e1 precedes e2 in one walk, and e2 precedes e1 in the other, then e1 || e2. S d a P b c

  8. Maintaining both orders in a single tree walk • Walk of tree represents execution of program. • Can execute program twice, but execution could be nondeterministic. • Instead, maintain both thread orderings on-the-fly, in a single pass.

  9. Order Maintenance Problem • We need a data structure which supports the following operations: • Insert(X,Y): Place Y after X in the list. • Precedes(X,Y): Does X come before Y?

  10. Naïve Order Maintenance Structure • Naïve Implementation is just a linked list

  11. Naïve Order Maintenance Insert • Insert(X,Y) does standard linked list insert X Y

  12. Naïve Order Maintenance Insert • Insert(X,Y) does standard linked list insert X Y

  13. Naïve Order Maintenance Insert • Insert(X,Y) does standard linked list insert X Y

  14. Naïve Order Maintenance Query • Precedes(X,Z) looks forward in list. X Z

  15. Naïve Order Maintenance Query • Precedes(X,Z) looks forward in list. X Z

  16. Naïve Order Maintenance Query • Precedes(X,Z) looks forward in list. X Z

  17. Naïve Order Maintenance Query • Precedes(X,Z) looks forward in list. X Z

  18. The algorithm • Recall, we are thinking in terms of parse tree. • Maintain two order structures. • When executing node x: • Insert children of x after x in the lists. • Ordering of children depends on whether x is an S or P node.

  19. Example S1 S2 d a P b c Order 1: Order 2:

  20. Example S1 S2 d a P b c S1 Order 1: S1 Order 2:

  21. Example S1 S2 d a P b c S1 S2 d Order 1: S1 S2 d Order 2:

  22. Example S1 S2 d a P b c S1 S2 d Order 1: S1 S2 d Order 2:

  23. Example S1 S2 d a P b c S1 S2 a P d Order 1: S1 S2 a P d Order 2:

  24. Example S1 S2 d a P b c S1 S2 a P d Order 1: S1 S2 a P d Order 2:

  25. Example S1 S2 d a P b c S1 S2 a P d Order 1: S1 S2 a P d Order 2:

  26. Example S1 S2 d a P b c S1 S2 a P b c d Order 1: S1 S2 a P c b d Order 2:

  27. Example S1 S2 d a P b c S1 S2 a P b c d Order 1: S1 S2 a P c b d Order 2:

  28. Example S1 S2 d a P b c S1 S2 a P b c d Order 1: S1 S2 a P c b d Order 2:

  29. Example S1 S2 d a P b c S1 S2 a P b c d Order 1: S1 S2 a P c b d Order 2:

  30. Analysis • Correctness does not depend on execution • Any valid serial or parallel execution produces correct results. • Inserts after x in orders only happen when x executes. • Only one processor will ever insert after x. • Running time depends on implementation of order maintenance data structure.

  31. Serial Running Time • Current Nondeterminator does serial execution. • Can have O(T1) queries and inserts. • Naïve implementation is • O(n) time for query of n-element list. • O(1) time for insert. • Total time is very poor: O(T12)

  32. Use Dietz and Sleator Order Maintenance Structure • Essentially a linked list with labels. • Queries are O(1): just compare the labels. • Inserts are O(1) amortized cost. • On some inserts, need to perform relabel. • O(T1) operations only takes O(T1) time. • Gives us linear time Nondeterminator. Better than SP-bags algorithm.

  33. Parallel Problem • Dietz and Sleator relabels on inserts • Does not work concurrently. • Lock entire structure on insert? • Query is still O(1). • Single relabel can cost O(T1) operations. • Critical path increases to O(T1) • Running time is O(T1/p + T1).

  34. Parallel Problem Solution • Leverage the Cilk scheduler: • There are only O(pT) steals • There is no contention on subcomputations done by single processor between steals. • We do not need to lock every insert.

  35. Parallel Problem Solution Global Structure Size: O(pT) • Top level is global ordering of subcomputations. • Bottom level is local ordering of subcomputation performed on single processor. • On a steal, insert O(1) nodes into global structure.

  36. Parallel Running Time • An insert into a local order structure is still O(1). • An insert into the global structure involves locking the entire global structure. • May need to wait for p processors to insert serially. • Amortized cost is O(p) per insert. • Only O(pT) inserts into global structure. • Total work and waiting time is O(T1 + p2T) • Running time is O(T1/p + pT)

  37. Questions?

More Related