
Synthesizing Concurrent Collectors: A Structured Approach

Designing practical and efficient concurrent algorithms is hard. This work aims to synthesize correct and optimal concurrent garbage collectors from coarse-grained specifications, focusing on a specific family of concurrent, tracing, non-moving collection algorithms that generalizes Dijkstra’s algorithm. It presents a unifying framework in which collection algorithms share a common skeleton with parametric functions.

byronj




Presentation Transcript


  1. CGCExplorer: A Semi-Automated Search Procedure for Provably Correct Concurrent Collectors • Martin Vechev, Eran Yahav, David Bacon, Noam Rinetzky • University of Cambridge, IBM T.J. Watson Research Center, Tel Aviv University

  2. Synthesizing Concurrent Algorithms • Designing practical and efficient concurrent algorithms is hard • trading off simplicity for performance • fine-grained coordination • Result: sub-optimal, buggy algorithms • Need a more structured approach to synthesize correct and optimal implementations out of coarse-grained specifications • “Some tasks are best done by machine, while others are best done by human insight; and a properly designed system will find the right balance.” – D. Knuth

  3. Synthesizing Concurrent Collectors • Concurrent garbage collectors • Widely used • Must be correct, but also fast and scalable • Many algorithms, not many formal proofs • A challenge problem for verification and synthesis • Concurrency • Heap with no a priori bound • Focus on a specific family of collection algorithms • A generalization of Dijkstra’s algorithm • Concurrent, Tracing, Non-moving • Single mutator, single collector (non-parallel)

  4. Contributions • Unifying framework – collection algorithms as a common skeleton with parametric functions [Figure labels: Mutator Step (Mutator); Trace Step, Expose (Collector)]

  5. Contributions

  6. Contributions • Explored 1,600,000 collection algorithms • Found 6 correct algorithms • Hundreds of variations • Specified various sets of blocks in 10 cycles

  7. Overview Generation Verification

  8. Algorithm Space - Counting Algorithms • Track collector’s progress (wavefront) • Count pointer installations from behind wavefront • Increment on install, decrement on delete • Up to a predetermined counting threshold • Expose objects with count > 0 when finished tracing [Figure labels: collector, wavefront, root, 1, object header, scanned field]

  9. Counting Algorithms: High Level View • Mutator Step (mutator): update source field to target obj; check wavefront; if source field behind wavefront: update new target object count, update old target object count • Trace Step (collector): read field value; update wavefront (collector progress); mark target object • Expose (collector): select objects with count > 0; produce new roots
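The three steps above can be sketched as a small single-threaded Python simulation in which every step runs atomically. This is only an illustration under stated assumptions: the `Obj` class, the per-object `scanned` set standing in for the wavefront, and the example heap are all hypothetical, and the counting threshold is left out.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # eq=False keeps identity hashing, so objects can live in sets
class Obj:
    name: str
    fields: dict = field(default_factory=dict)   # field name -> Obj or None
    scanned: set = field(default_factory=set)    # field names behind the wavefront
    marked: bool = False
    mc: int = 0                                  # count of installs behind the wavefront

def trace_step(source, fld):
    dst = source.fields.get(fld)   # read field value
    source.scanned.add(fld)        # update wavefront (collector progress)
    if dst is not None:
        dst.marked = True          # mark target object
    return dst

def mutator_step(source, fld, new, log):
    old = source.fields.get(fld)   # remember the old target
    w = fld in source.scanned      # check wavefront
    if w and new is not None:
        new.mc += 1                # update new target object count
        log.add(new)
    if w and old is not None:
        old.mc -= 1                # update old target object count
    source.fields[fld] = new       # update source field to target object

def expose(log):
    exposed = set()
    while log:
        o = log.pop()
        if o.mc > 0:               # select objects with count > 0
            o.marked = True
            exposed.add(o)         # produce new roots
    return exposed

# Hypothetical scenario: the mutator hides b from the collector.
b = Obj("b")
a = Obj("a", fields={"g": b})
root = Obj("root", fields={"f1": a, "f2": None})
root.marked = True
log = set()

trace_step(root, "f1")             # collector marks a
trace_step(root, "f2")             # collector scans the (still empty) field
mutator_step(root, "f2", b, log)   # mutator installs b behind the wavefront
mutator_step(a, "g", None, log)    # mutator deletes the only unscanned path to b
trace_step(a, "g")                 # collector finds nothing here
new_roots = expose(log)            # b has count 1, so it is marked and returned
```

Without the count, `b` would stay unmarked even though it is reachable from `root`; here `expose` recovers it.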

  10. Coarse-Grained to Fine-Grained Synchronization

      atomic Trace Step (source, field) {
        C1: dst = source.field
        C2: source.field.WF = true
        C3: mark dst
      }

      atomic Mutator Step (source, field, new) {
        M1: old = source.field
        M2: w = source.field.WF
        M3: w → new.MC++
        M4: w → log = log U {new}
        M5: w → old.MC--
        M6: source.fld = new
      }

      atomic Set Expose (log) {
        E1: o = remove element from log
        E2: mc = o.MC
        E3: (mc > 0) → mark o
        E4: (mc > 0) → V = V U {o}
        return V
      }

      What now? Can we remove atomics? Result is incorrect, may lose objects!

  11. Coarse-Grained to Fine-Grained Synchronization

      Trace Step (source, field) {
        C1: dst = source.field
        C2: source.field.WF = true
        C3: mark dst
      }

      Mutator Step (source, field, new) {
        M1: old = source.field
        M2: w = source.field.WF
        M3: w → new.MC++
        M4: w → log = log U {new}
        M5: w → old.MC--
        M6: source.fld = new
      }

      Set Expose (log) {
        E1: o = remove element from log
        E2: mc = o.MC
        E3: (mc > 0) → mark o
        E4: (mc > 0) → V = V U {o}
        return V
      }

      What now? Can we remove atomics?

  12. Coarse-Grained to Fine-Grained Synchronization

      Trace Step (source, field) {
        C1: dst = source.field
        C2: source.field.WF = true
        C3: mark dst
      }

      Mutator Step (source, field, new) {
        M1: old = source.field
        M2: w = source.field.WF
        M5: w → old.MC--
        M3: w → new.MC++
        M4: w → log = log U {new}
        M6: source.fld = new
      }

      Set Expose (log) {
        E1: o = remove element from log
        E2: mc = o.MC
        E3: (mc > 0) → mark o
        E4: (mc > 0) → V = V U {o}
        return V
      }

      What now? Can we remove atomics?

      “When in doubt, use brute force.” -- Ken Thompson

  13. System Input – Building Blocks

      Tracing Step Building Blocks
        C1: dst = source.field
        C3: mark dst
        C2: source.field.WF = true

      Mutator Building Blocks
        M1: old = source.field
        M2: w = source.field.WF
        M3: w → new.MC++
        M4: w → log = log U {new}
        M5: w → old.MC--
        M6: source.fld = new

      Expose Building Blocks
        E1: o = remove element from log
        E2: mc = o.MC
        E3: (mc > 0) → mark o
        E4: (mc > 0) → V = V U {o}

      Input Constraints
        • Mutator blocks: [M3, M4]
        • Tracing blocks: [C1, C3]
        • Expose blocks: [E1, E2, E3, E4]
        • Dataflow, e.g. M2 < M3
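To make the role of the input constraints concrete, the following sketch enumerates candidate orderings of the mutator blocks. Only `M2 < M3` and the bracketed group `[M3, M4]` come from the slide. The other ordering pairs are assumptions inferred from which block defines the local (`old`, `w`) that another block reads, and the bracket is read here as "M3 immediately followed by M4". The function names are hypothetical, and this is not the CGCExplorer search procedure itself.

```python
from itertools import permutations

BLOCKS = ["M1", "M2", "M3", "M4", "M5", "M6"]

# (a, b) means block a must come before block b.
# Only ("M2", "M3") is stated on the slide; the rest are assumed:
#   M5 reads `old` (set by M1) and `w` (set by M2), M4 reads `w`,
#   and M1 must read the old field value before M6 overwrites it.
BEFORE = [("M2", "M3"), ("M2", "M4"), ("M2", "M5"), ("M1", "M5"), ("M1", "M6")]

# Assumed reading of the input constraint [M3, M4]: M4 directly follows M3.
ADJACENT = [("M3", "M4")]

def valid(order):
    """Check one candidate ordering against all constraints."""
    pos = {block: i for i, block in enumerate(order)}
    ordered = all(pos[a] < pos[b] for a, b in BEFORE)
    adjacent = all(pos[b] == pos[a] + 1 for a, b in ADJACENT)
    return ordered and adjacent

def candidate_orders():
    """Brute-force every permutation and keep the ones that satisfy the constraints."""
    return [order for order in permutations(BLOCKS) if valid(order)]

orders = candidate_orders()
for order in orders:
    print(" ".join(order))
```

Even a handful of constraints prunes the 720 raw permutations sharply, which is why designer-supplied constraints matter before any candidate is handed to a verifier.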

  14. System Output – (Verified) Algorithms • Explored 306 variations in around 2 mins • Least atomic (verified) algorithm with given blocks

      Mutator Step (source, field, new) {
        M1: old = source.field
        M6: source.fld = new
        M2: w = source.field.WF
        M3: w → new.MC++
        M4: w → log = log U {new}
        M5: w → old.MC--
      }

      Trace Step (source, field) {
        C1: dst = source.field
        C3: mark dst
        C2: source.field.WF = true
      }

      Set Expose (log) {
        E1: o = remove element from log
        E2: mc = o.MC
        E3: (mc > 0) → mark o
        E4: (mc > 0) → V = V U {o}
      }
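For a fixed block order, "less atomic" can be pictured as splitting the sequence into more, smaller atomic sections. The sketch below enumerates every such split for the mutator order shown above. It is an assumption-laden illustration: `atomic_groupings` is a hypothetical helper, `[M3, M4]` is assumed to stay inside one section, and the finest split is simply the least atomic candidate, not necessarily one that would pass verification.

```python
from itertools import product

def atomic_groupings(order, keep_together=()):
    """Yield each split of a non-empty `order` into contiguous atomic sections.

    A split is a list of tuples; blocks in one tuple execute atomically.
    Pairs in `keep_together` must end up in the same section.
    """
    for cuts in product([False, True], repeat=len(order) - 1):
        groups, current = [], [order[0]]
        for block, cut in zip(order[1:], cuts):
            if cut:                      # start a new atomic section here
                groups.append(tuple(current))
                current = [block]
            else:                        # extend the current atomic section
                current.append(block)
        groups.append(tuple(current))
        if all(any(set(pair) <= set(g) for g in groups) for pair in keep_together):
            yield groups

order = ["M1", "M6", "M2", "M3", "M4", "M5"]
groupings = list(atomic_groupings(order, keep_together=[("M3", "M4")]))

most_atomic = min(groupings, key=len)    # one section containing every block
least_atomic = max(groupings, key=len)   # as many sections as the constraint allows
print(len(groupings), most_atomic, least_atomic)
```

A search over both block orders and atomic sections multiplies the two spaces together, which gives a feel for why the number of candidate algorithms grows so quickly.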

  15. But What Now? • How do we get further improvement? • Need more insights • Need new building blocks • Example: start and end of collector reading a field [Figure labels: Coordination Meta-data, Atomicity, Ordering]

  16. Continuing the Search… • We derived a non-atomic algorithm (at the granularity of blocks) • Non-atomic write-barrier, collector step and expose • System explored over 1,600,000 algorithms (took ~34 hours) • All experiments took ~41 machine hours and ~3 human hours

  17. CGC: Challenge for Automatic Verification • Unbounded heap and sequence of mutations • Checking a global invariant is hard • State space too big even for partial checking • 3 nodes can quickly consume several GB in the SPIN model checker • Solution • Manually boil down to a local invariant • Automatically prove local invariant • Use abstraction - unbounded number of concrete nodes conservatively represented by small, bounded number of abstract nodes

  18. What Do We Prove? • Want to prove collector safety • Retaining all live objects • Local invariant: for every object • If an object is referenced from a scanned field at time of expose, it is either marked, or its count > 0 • Show for any arbitrary object, under any arbitrary sequence of mutations
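The local invariant can be phrased as a check over a heap snapshot taken at the time of expose. The sketch below is a hypothetical illustration: the dictionary-based heap encoding and the function name are assumptions, and it checks a single concrete state rather than proving the property over all mutation sequences, which is what the verification step has to do.

```python
def unprotected_objects(heap, scanned):
    """Return the names of objects that violate the local invariant.

    heap:    object name -> {"fields": {field: target name or None},
                             "marked": bool, "mc": int}
    scanned: set of (object name, field) pairs already behind the wavefront

    An object is a violation if some scanned field references it while it
    is neither marked nor has count > 0.
    """
    bad = set()
    for src, fld in scanned:
        target = heap[src]["fields"].get(fld)
        if target is None:
            continue
        obj = heap[target]
        if not obj["marked"] and obj["mc"] <= 0:
            bad.add(target)
    return bad

# Hypothetical snapshot: root.f was scanned and now points to b.
heap = {
    "root": {"fields": {"f": "b"}, "marked": True, "mc": 0},
    "b": {"fields": {}, "marked": False, "mc": 1},
}
scanned = {("root", "f")}

ok_state = unprotected_objects(heap, scanned)      # b is protected by its count
heap["b"]["mc"] = 0
broken_state = unprotected_objects(heap, scanned)  # b would be lost
```

Because the condition mentions only one object and the scanned fields that reference it, it can be checked per object, which is what makes a local abstraction possible.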

  19. Abstraction Intuition • Select tracked representative object • Track reference count only for the selected object [Figure labels: wavefront, root, hidden, hidden, 2, object header, scanned field]

  20. Abstraction Intuition • Only up to a fixed number of pointers matter – up to counting threshold • Track these precisely • Forget the rest [Figure labels: wavefront, root, hidden, hidden, 2, object header, scanned field]
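A toy version of the "track a few pointers precisely, forget the rest" idea is sketched below. It is not the abstraction used in the work, only a simplified illustration under assumptions: the heap encoding, the `abstract_view` function, and the single `more_refs` flag standing in for everything that is forgotten are all hypothetical.

```python
def abstract_view(heap, scanned, tracked, threshold):
    """Summarize a concrete heap from the point of view of one tracked object.

    heap:      object name -> {field: target name or None}
    scanned:   set of (object name, field) pairs behind the wavefront
    tracked:   name of the selected representative object
    threshold: how many scanned references to the tracked object are kept precisely
    """
    refs = sorted((src, fld) for src, fld in scanned
                  if heap[src].get(fld) == tracked)
    return {
        "tracked": tracked,
        "precise_refs": tuple(refs[:threshold]),  # tracked precisely
        "more_refs": len(refs) > threshold,       # everything else is forgotten
    }

# Two different concrete heaps (hypothetical) ...
heap1 = {"r": {"f": "t", "g": "x"}, "x": {"h": "t"}, "t": {}}
scanned1 = {("r", "f"), ("r", "g"), ("x", "h")}

heap2 = {"r": {"f": "t", "g": "x"}, "x": {"h": "t"}, "y": {"k": "t"}, "t": {}}
scanned2 = scanned1 | {("y", "k")}

# ... collapse to the same abstract view once the threshold is reached.
view1 = abstract_view(heap1, scanned1, "t", threshold=1)
view2 = abstract_view(heap2, scanned2, "t", threshold=1)
```

With `threshold=1` both heaps yield the same view, so one abstract state stands for many concrete heaps; raising the threshold to 2 tells them apart again.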

  21. Recap Generation Verification Find proof outline Find proof building blocks

  22. What’s next? • Concurrent Collector Synthesis • Get real algorithms • Mapping to real machine instructions • Yet another level of search • Synthesis of other concurrent algorithms • In the pipeline – concurrent set algorithms • Local abstractions for concurrent programs

  23. The End

  24. Invited Questions • Are your algorithms practical? • What are the limitations of this approach? • Would it work for my problem? • How do you prove that your algorithms terminate? • Can you show another algorithm? • How do you reduce the number of calls to the model-checker? • You didn’t mention any related work • Can you give more details on experimental results?

  25. ANSWERS FOLLOW

  26. Where Do Building Blocks Come From? • Read/write of heap location, and • Collector coordination meta-data • e.g., collector progress, state flags

  27. Progress Coordination Metadata [Figure: object layouts with a header holding count and marked, and fields fld_1, fld_2 with per-field start_i / end_i flags; variants labelled 6 bits … 5 bits … 1 bit, 0 bits]

  28. Refined Input – Finer Building Blocks

      Collector Building Blocks
        C1: dst = source.field
        C3: mark dst
        C2s: source.field.WFs = true
        C2e: source.field.WFe = true

      Mutator Building Blocks
        M1: old = source.field
        M2s: ws = source.field.WFs
        M2e: we = source.field.WFe
        M3s: ws → new.MC++
        M4s: ws → log = log U {new}
        M5e: we → old.MC--
        M6: source.fld = new

      Expose Building Blocks
        E1: o = remove element from log
        E2: mc = o.MC
        E3: (mc > 0) → mark o
        E4: (mc > 0) → V = V U {o}

      Input Constraints
        • Mutator: [M3s, M4s]
        • Tracing: [C1, C3], C2s < [C1, C3] < C2e
        • Expose: [E1, E2, E3, E4]
        • Dataflow: e.g. M2s < M3s

  29. System Output • Constraints = Insights, e.g. M2e < M6 < M2s and C2s < [C1, C3] < C2e

      Mutator Step (source, field, new) {
        M1: old = source.field
        M2e: we = source.field.WFe
        M6: source.fld = new
        M2s: ws = source.field.WFs
        M3s: ws → new.MC++
        M4s: ws → log = log U {new}
        M5e: we → old.MC--
      }

      Trace Step (source, field) {
        C2s: source.field.WFs = true
        C1: dst = source.field
        C3: mark dst
        C2e: source.field.WFe = true
      }

      Set expose (log) {
        E1: o = remove element from log
        E2: mc = o.MC
        E3: (mc > 0) → mark o
        E4: (mc > 0) → V = V U {o}
      }

  30. (Some) Related Work • Superoptimizer: a look at the smallest program, Massalin, ASPLOS’87 • Finite state, limited length of instruction sequences • Programming by Sketching, Solar-Lezama et al., PLDI’05 • Finite state • Sketching with Stencils, Solar-Lezama et al., PLDI’07 • Automatic discovery of mutual exclusion algorithms, Bar-David and Taubenfeld, PODC’03 • Finite state • Correctness-Preserving Derivation of Concurrent Garbage Collection Algorithms, PLDI’06 • CheckFence, Sebastian Burckhardt, Rajeev Alur and Milo M. K. Martin, PLDI’07 • …

  31. Algorithm Exploration [Figure axes: less atomic, more atomic, different orders]

  32. Algorithm Exploration [Figure: one exploration space each for Mutator Step, Trace Step and Expose; axes: less atomic, more atomic, different orders]

  33. Limitations • Need algorithm designer insights • Designer needs to understand results of each phase • Abstraction is tailor-made • Designing an abstraction for the next collector? • Pushing the limits of current model-checkers • Multiple mutators? Unbounded number of mutators? • Better partial-order reduction may help

  34. Are Your Algorithms Practical? • Are your algorithms correct? • Honest answer: not yet • So far focused on correctness more than on performance • However, counting algorithms are of practical interest • “The moral is that for the design of multiprocessor installations we cannot rely on the traditional approach of the optimistic engineer, who, when the design looks reasonable, puts it together to see if it works.” -- Edsger W. Dijkstra

  35. Experimental Results • Plus about 180 minutes of human work with the system • Machine: 3.8 GHz Xeon processor and 8 GB memory, running version 4 of RedHat Linux

  36. Why Does it Work? • Ingredients • Relentless optimism • Limited setting • Limited Setting • single collector, single mutator • counting threshold is known • algorithm skeleton is fixed • algorithm uses a barrier before moving to the sweep phase • … (see paper)

  37. Algorithm Space - Counting Algorithms • Concurrent • Single mutator, single collector (not parallel) • Tracing • Computes transitive reachability from roots • Non-Moving • Collector does not relocate objects

  38. How Do You Prove Termination? Manually

  39. DEMONS START HERE IF NOT EARLIER

  40. Synthesizing Concurrent Algorithms • “it seems unavoidable that multiprocessor installations will be built… it seems equally unavoidable that many of them will be put together by aforementioned optimistic engineer. I shudder at the thought of all the new bugs: they will only delight the Devil. Am I too pessimistic? Nobody knows the trouble I have seen...” -- Edsger W. Dijkstra • “Some tasks are best done by machine, while others are best done by human insight; and a properly designed system will find the right balance.” – D. Knuth
