
ParaLog: Enabling and Accelerating Online Parallel Monitoring of Multithreaded Applications



Presentation Transcript


  1. ParaLog: Enabling and Accelerating Online Parallel Monitoring of Multithreaded Applications
     Evangelos Vlachos, Michelle L. Goodstein, Michael A. Kozuch, Shimin Chen, Phillip B. Gibbons, Babak Falsafi and Todd C. Mowry

  2. Software Errors & Analysis Tools
     • Errors abundant in parallel software
       • Program crashes/vulnerabilities, limited performance
     • Three main categories of analysis tools
       • Checking before, during or after program execution
     • Instruction-grain Lifeguards
       • Online detailed analysis, but with high overhead
       • Several tools available, but most support only single-threaded code
     ParaLog: a framework for efficient analysis of parallel applications
     ASPLOS '10 - ParaLog

  3. Lifeguards and Parallel Applications
     [Figure: application threads over time under three monitoring approaches]
     • Timesliced Execution & Analysis (DBI tools available today)
       • High overhead due to serialization
     • Parallel Execution & Analysis: Butterfly Analysis (previous talk)
       • Windows of uncertainty
       - Some false positives
       + Software-based
     • Parallel Execution & Analysis: ParaLog (this talk)
       • Precise application order
       - New hardware required
       + No false positives
       + Even better performance

  4. Low-Overhead Instruction-level Analysis [Chen et al., ISCA '08]
     [Figure: online monitoring platform. An application thread on the application core produces an event stream (event capturing, then event delivery) that is consumed by a lifeguard thread on the lifeguard core, which maintains the metadata. Accelerators: IT, IF, M-TLB.]
     Example: the application instruction "add r1 ← r2, r4" becomes the event "add, r1, r2, r4", handled by:
       add_handler() {
         i = load_state(r2);
         j = load_state(r4);
         if (check(i, j)) upd_state(r1);
         else error();
       }
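
The handler on this slide can be sketched in runnable form. This is only an illustration, not the authors' implementation: the `Lifeguard` class, the one-bit-per-register metadata, and the `check` policy (both sources must be marked valid) are assumptions introduced here; only the names `add_handler`, `load_state`, and `upd_state` come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Lifeguard:
    """Toy lifeguard: one metadata bit per register (assumed layout)."""
    state: dict = field(default_factory=dict)    # register name -> bool
    errors: list = field(default_factory=list)   # events that failed the check

    def load_state(self, reg):
        # Registers with no metadata yet are treated as not valid.
        return self.state.get(reg, False)

    def upd_state(self, reg, value=True):
        self.state[reg] = value

    def add_handler(self, dst, src1, src2):
        i = self.load_state(src1)
        j = self.load_state(src2)
        if i and j:                 # hypothetical check(i, j) policy
            self.upd_state(dst)
        else:                       # stands in for error() on the slide
            self.errors.append(("add", dst, src1, src2))

    def deliver(self, event):
        # Event delivery: dispatch a logged event to the matching handler.
        op, *operands = event
        getattr(self, op + "_handler")(*operands)
```

For example, delivering `("add", "r1", "r2", "r4")` to a lifeguard whose state marks `r2` and `r4` as valid marks `r1` valid as well; an add with an unknown source register is recorded in `errors` instead.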

  5. Challenges in Parallel Monitoring [ParaLog]
     [Figure: online parallel monitoring platform. Each application thread 1..k has its own event stream (event capturing, event delivery) feeding its own lifeguard thread 1..k; each lifeguard thread has its own accelerators (IT, IF, M-TLB), and all lifeguard threads share global metadata.]

  6. Addressing the Challenges [ParaLog]
     • Application event ordering
     • Ensuring metadata access atomicity efficiently
     • Parallelizing hardware accelerators
     [Figure: online parallel monitoring platform with per-thread event streams, global metadata, and per-thread accelerators (IT, IF, M-TLB), extended with application-only order capturing on the application side, order enforcing on the lifeguard side, and dependence arcs between the event streams.]

  7. Outline
     • Introduction
     • Addressing the Challenges of Parallel Monitoring
       • Capturing & enforcing application event ordering
       • Ensuring metadata access atomicity
       • Parallelizing hardware accelerators
     • Evaluation
     • Conclusions

  8. Event Ordering: the Problem
     • Case study: information flow analysis (e.g., TaintCheck)
     [Figure: application timeline and lifeguard timeline for threads j and k. Thread j executes store(A) and thread k executes load(A); the corresponding lifeguard handlers are st_handler(A) and ld_handler(A).]
     Expose happens-before information to lifeguards

  9. Event Ordering: the Solution (1/2)
     • Coherence-based ordering of application events
     • Similar to FDR, but online, focusing on application-only events
     [Figure: thread j executes store(A) at time tj; thread k executes load(A) at time tk, and the dependence arc {thread j, tj} travels with the load event. Lifeguard k waits while progress_j < tj before running ld_handler(A); progress_j reaches tj once lifeguard j has run st_handler(A).]
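
The "wait while progress_j < t_j" rule can be sketched with per-lifeguard progress counters. This is a minimal software illustration under stated assumptions, not the hardware mechanism of the paper: the `Progress` class, the event tuple layout `(op, addr, arc)`, and the toy taint rule (every store taints its address) are all hypothetical.

```python
import threading


class Progress:
    """Per-lifeguard progress counters shared by all lifeguard threads."""

    def __init__(self, n_threads):
        self._done = [-1] * n_threads   # last timestamp fully processed
        self._cv = threading.Condition()

    def advance(self, tid, ts):
        with self._cv:
            self._done[tid] = ts
            self._cv.notify_all()

    def wait_for(self, tid, ts):
        # Blocks while progress[tid] < ts, as in the slide's wait condition.
        with self._cv:
            self._cv.wait_for(lambda: self._done[tid] >= ts)


def run_lifeguard(tid, stream, progress, taint, trace):
    for ts, (op, addr, arc) in enumerate(stream):
        if arc is not None:             # arc = (source thread, timestamp)
            progress.wait_for(*arc)
        if op == "store":
            taint[addr] = True          # toy st_handler: mark A tainted
        elif op == "load":
            trace.append((tid, addr, taint.get(addr, False)))  # toy ld_handler
        progress.advance(tid, ts)


def demo():
    progress = Progress(2)
    taint, trace = {}, []
    stream_j = [("nop", None, None), ("store", "A", None)]   # store at t_j = 1
    stream_k = [("load", "A", (0, 1))]                       # arc {thread j, 1}
    tk = threading.Thread(target=run_lifeguard,
                          args=(1, stream_k, progress, taint, trace))
    tj = threading.Thread(target=run_lifeguard,
                          args=(0, stream_j, progress, taint, trace))
    tk.start()   # lifeguard k starts first and must still wait for j
    tj.start()
    tk.join()
    tj.join()
    return trace
```

Even though lifeguard k is started first, its load handler observes the taint written by lifeguard j's store handler, because the arc forces it to wait until `progress[0]` reaches 1.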

  10. Event Ordering: the Solution (2/2)
     Is monitoring coherence enough?
     • Previous work has not solved the problem of Logical Races
     • Both logical races and system calls resolved with Conflict Alert messages
     [Figure: application timeline and lifeguard timeline for threads j and k. Thread j calls free(A), which changes Metadata(A) between free(A) start and free(A) end, while thread k executes load(A) and its lifeguard runs ld_handler(A). This is a logical race; a Conflict Alert message creates the dependence.]
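
One way to picture Conflict Alerts is as broadcast dependence arcs attached to high-level events such as free(). The sketch below is an assumption-heavy illustration: the log format, the `record` and `replay` helpers, treating malloc() as a high-level event too, and the toy AddrCheck rule are all introduced here and are not taken from the paper.

```python
HIGH_LEVEL = {"malloc", "free"}   # events assumed to raise a Conflict Alert


def record(streams, clocks, tid, kind, addr):
    """Log an application event; high-level events broadcast an arc."""
    clocks[tid] += 1
    streams[tid].append((kind, addr, None))
    if kind in HIGH_LEVEL:
        for other in range(len(streams)):
            if other != tid:
                streams[other].append(("arc", tid, clocks[tid]))


def replay(streams):
    """Run the lifeguards, always preferring the highest-numbered one."""
    n = len(streams)
    progress = [-1] * n           # last application timestamp processed
    pos = [0] * n
    allocated, reports = set(), []
    while any(pos[t] < len(streams[t]) for t in range(n)):
        for tid in reversed(range(n)):
            if pos[tid] == len(streams[tid]):
                continue          # this lifeguard has finished its log
            kind, a, b = streams[tid][pos[tid]]
            if kind == "arc" and progress[a] < b:
                continue          # blocked until lifeguard a catches up
            pos[tid] += 1
            if kind != "arc":
                progress[tid] += 1
                if kind == "malloc":
                    allocated.add(a)
                elif kind == "free":
                    allocated.discard(a)
                elif kind == "load" and a not in allocated:
                    reports.append((tid, a))   # access to unallocated memory
            break
        else:
            raise RuntimeError("no lifeguard can make progress")
    return reports


def demo_streams():
    streams, clocks = [[], []], [-1, -1]
    record(streams, clocks, 0, "malloc", "A")
    record(streams, clocks, 1, "load", "A")    # legal access
    record(streams, clocks, 0, "free", "A")
    record(streams, clocks, 1, "load", "A")    # use after free
    return streams
```

With the arcs in place, `replay(demo_streams())` reports only the second load, even though the scheduler in `replay` always tries to run lifeguard 1 ahead of lifeguard 0.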

  11. Metadata Atomicity
     • Frequent use of locking too expensive
       • Number of instructions added & synchronization cost
     • Dependence arcs handle the majority of the cases
       • Sufficient conditions:
         • One-to-one data-to-metadata mapping
         • Application reads don't become metadata writes
       • Enforcing dependence arcs ⇒ race-free operation
     • Rest of the cases handled by acquiring a lock
       • Lock used only in the load_handler(); other handlers safe
     (more details in the paper)

  12. Parallel Hardware Accelerators
     • Speed up frequent lifeguard actions
       • Metadata-TLB: fast metadata address calculation
       • Idempotent Filters: filter out redundant checking
       • Inheritance Tracking: fast tracking of dataflow paths
     • Accelerators have only a local view of the analysis
       • Cache analysis information locally (e.g., frequent events)
       • Important events have application-wide effects (e.g., free())
     • Coherence-like issues with accelerators' local state
       • Important events accompanied by Conflict Alerts
       • Use Conflict Alerts to flush accelerators' state

  13. Outline
     • Introduction
     • Addressing the Challenges of Parallel Monitoring
       • Capturing & enforcing application event ordering
       • Ensuring metadata access atomicity
       • Parallelizing hardware accelerators
     • Evaluation
     • Conclusions

  14. Experimental Framework
     • Log-Based Architectures framework
       • Simics full-system simulation
       • CMP system with {2, 4, 8, 16} cores
       • {1, 2, 4, 8} application and lifeguard threads
       • Sequentially Consistent memory model
     • Benchmarks and multithreaded Lifeguards used
       • SPLASH-2 and PARSEC
       • TaintCheck: information flow tracking; accelerated by M-TLB, IT
       • AddrCheck: memory access checking; accelerated by M-TLB, IF
     • Comparison with Timesliced Monitoring

  15. Performance Results: AddrCheck
     [Chart: normalized to sequential, unmonitored execution; 8 application/lifeguard threads, 16 cores total]

  16. Performance Results: AddrCheck

  17. Performance Results: AddrCheck
     [Chart; data labels: 15.4, 1.9, 9.5, 6.1, 6.7, 2.9, 2.3, 2.1, 6.2, 1.9, 2.4, 1.7]
     • Timesliced Monitoring is not scalable
       • On average 15x slowdown over No Monitoring (8 threads)

  18. Performance Results: AddrCheck
     • Highest overhead with 8 threads: SWAPTIONS 6x
     • Lowest overhead with 8 threads: < 5%
     • Average overhead with 8 threads: 26%

  19. Performance Results: TaintCheck

  20. Performance Results: TaintCheck
     [Chart; data labels: 10, 4.6, 1.7, 2.9, 2.1, 12.9, 11.5, 15.7, 1.9, 6.6, 1.9, 2.4, 2.8, 1.7]
     • Timesliced Monitoring is not scalable
       • On average 23x slowdown over No Monitoring (8 threads)

  21. Performance Results: TaintCheck
     • Highest overhead with 8 threads: BARNES 2.6x
     • Lowest overhead with 8 threads: LU  5%
     • Average overhead with 8 threads: 48%

  22. Other Results in the Paper
     • Order capturing and order enforcing under TSO
     • Performance Impact of Lifeguard Accelerators
       • AddrCheck: [1.13x – 3.4x], TaintCheck: [2x – 9x]
     • A less expensive order capturing mechanism gets similar performance results
       • 1 timestamp per core vs. 1 timestamp per cache block

  23. Conclusions
     • ParaLog: Fast and precise parallel monitoring
     • Components of event ordering
       • Normal memory accesses: monitor coherence activity
       • Logical Races: use of Conflict Alert messages
     • Metadata Atomicity
       • Enforcing dependence arcs ensures atomicity (most cases)
     • Parallel Hardware Accelerators
       • Flush local state on remote events (Conflict Alert)
     • Average overhead is relatively low
       • AddrCheck: 26% and TaintCheck: 48% (8 threads)

  24. Questions?

  25. Backup Slides

  26. Metadata Atomicity
     • Synchronization-free fast path vs. slow path
     • Concurrent application reads: no ordering available!
       • Concurrent metadata reads: follow the fast path
       • Concurrent metadata writes: follow the slow path, acquiring a lock
       • Concurrent metadata read and write: read may get either value
     • In any other case dependence arcs are available
     [Figure labels: LockSet; AddrCheck, TaintCheck, MemCheck]

  27. Parallel Hardware Accelerators
     • Accelerators have only a local view of the analysis
     • Important events have system-wide effects
     • Case study: Idempotent Filters and AddrCheck
     [Figure: event streams of lifeguards LG 0 and LG 1 (reads such as R(A), R(B), R(C), and free(A)) passing through their Idempotent Filters (IF). Legend: ✔ delivered to lifeguard; ✖ redundant, discarded. A free(A) flushes the local and remote IF filters. Builds on remote conflict messages.]
     • Details for parallel M-TLB and IT can be found in the paper
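
The filter-and-flush behaviour of this case study can be sketched as follows. The trace used here is illustrative and is not the one from the slide's figure; the `IdempotentFilter` class and the `monitor` function are hypothetical names, and flushing every filter on free() stands in for the Conflict Alert delivery.

```python
class IdempotentFilter:
    """Per-lifeguard filter that drops events it has already passed on."""

    def __init__(self):
        self._seen = set()

    def passes(self, event):
        if event in self._seen:
            return False          # redundant: same check, same result
        self._seen.add(event)
        return True

    def flush(self):
        self._seen.clear()


def monitor(trace, n_lifeguards):
    """Return the events actually delivered to the lifeguards."""
    filters = [IdempotentFilter() for _ in range(n_lifeguards)]
    delivered = []
    for lg, op, addr in trace:
        if op == "free":
            # Important event: flush local and remote filters (assumed to be
            # triggered by the accompanying Conflict Alert), then deliver it.
            for f in filters:
                f.flush()
            delivered.append((lg, op, addr))
        elif filters[lg].passes((op, addr)):
            delivered.append((lg, op, addr))
    return delivered


TRACE = [
    (0, "R", "A"),      # delivered
    (0, "R", "B"),      # delivered
    (0, "R", "A"),      # redundant on LG 0, discarded
    (1, "R", "A"),      # first time on LG 1, delivered
    (1, "free", "A"),   # delivered; flushes both filters
    (0, "R", "A"),      # delivered again, so AddrCheck can see the bad access
]
```

Without the flush, the last read on LG 0 would be discarded as redundant and the access to freed memory would never reach the lifeguard.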

  28. Performance Impact of Lifeguard Accelerators
     • Accelerators provide a major speedup [2x – 9x]
     [Chart; data labels: 11.3, 6.8, 7.3, 9.4]

  29. Performance Impact of Lifeguard Accelerators
     • Accelerators provide a major speedup [1.13x – 3.4x]

  30. Transitive Reduction Sensitivity Study
     • Limited transitive reduction
       • No major performance impact; savings in chip area

  31. Supporting Total Store Order (TSO)
     • Cycle of dependencies in relaxed memory models
       • TSO relaxes the RAW ordering
     • Previous work (RTR): maintain versions of data
       • Identify SC-offending instructions; save loaded value
     • This paper: maintain versions of metadata
     [Figure: memory order vs. commit order for two threads, with their logs and the actions of Lifeguard 0]
       Step  Thread 0  Thread 1  Log 0      Log 1      Lifeguard 0
                                 P(v1, A)   P(v0, B)   produce_version(v1, A)
       0     Wr(A)     Wr(B)     Wr(A)      Wr(B)      store_handler(A)
       1                         C(v0, B)   C(v1, A)   wait_until_available(v0, B)
       2     Rd(B)     Rd(A)     Rd(B, v0)  Rd(A, v1)  load_handler(B, v0)
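
The produce/consume records on this slide can be read as follows: each lifeguard snapshots the metadata of an address before its store handler overwrites it, and the other lifeguard's load handler consumes that snapshot instead of the current metadata. The sketch below follows that reading, which is an interpretation of the slide and not a confirmed description of the paper's mechanism; the `VersionedMetadata` class, the record layouts, and the assumption that both stores write tainted data are hypothetical.

```python
import threading


class VersionedMetadata:
    """Current metadata plus named snapshots (versions) of single entries."""

    def __init__(self):
        self.current = {}                 # addr -> tainted?
        self._versions = {}               # (version, addr) -> tainted?
        self._cv = threading.Condition()

    def produce_version(self, version, addr):
        with self._cv:
            self._versions[(version, addr)] = self.current.get(addr, False)
            self._cv.notify_all()

    def wait_until_available(self, version, addr):
        with self._cv:
            self._cv.wait_for(lambda: (version, addr) in self._versions)
            return self._versions[(version, addr)]

    def store_handler(self, addr, tainted):
        with self._cv:
            self.current[addr] = tainted


def run_lifeguard(tid, log, meta, seen):
    held = {}
    for kind, x, y in log:
        if kind == "P":                   # ("P", version, addr)
            meta.produce_version(x, y)
        elif kind == "Wr":                # ("Wr", addr, tainted)
            meta.store_handler(x, y)
        elif kind == "C":                 # ("C", version, addr)
            held[(x, y)] = meta.wait_until_available(x, y)
        elif kind == "Rd":                # ("Rd", addr, version)
            seen[tid] = (x, held[(y, x)])  # toy load_handler


def run_demo():
    meta, seen = VersionedMetadata(), {}
    log0 = [("P", "v1", "A"), ("Wr", "A", True), ("C", "v0", "B"), ("Rd", "B", "v0")]
    log1 = [("P", "v0", "B"), ("Wr", "B", True), ("C", "v1", "A"), ("Rd", "A", "v1")]
    threads = [threading.Thread(target=run_lifeguard, args=(i, log, meta, seen))
               for i, log in enumerate((log0, log1))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

In this toy run both loads see untainted metadata, matching an execution in which each application load read the value from before the other thread's store. Because each lifeguard produces its version before it waits for the other's, the cycle of dependencies does not turn into a deadlock.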

  32. Parallel Hardware Accelerators
     • Speed up frequent lifeguard actions
       • Fast metadata address calculation: Metadata-TLB
       • Fast tracking of data-flow paths: Inheritance Tracking
       • Filter out redundant checking: Idempotent Filters
         • Per-instruction checking gives the same result; cache event
     • Accelerators have only a local view of the analysis
       • Important events have system-wide effects (e.g., free())
     • Coherence-like issues with accelerators' local state
       • Important events accompanied by Conflict Alerts
       • Use Conflict Alerts to flush state and deliver pending events

  33. Experimental Framework

  34. Relative Slowdown - TaintCheck

  35. Relative Slowdown - AddrCheck
     [Chart; data labels: 3.0, 6.0]

  36. Performance Results - AddrCheck
     [Chart; data labels: 15.4, 1.9, 9.5, 6.1, 6.7, 2.9, 2.3, 2.1, 6.2, 1.9, 2.4, 1.7]

  37. Performance Results - TaintCheck
     [Chart; data labels: 10, 4.6, 1.7, 2.9, 2.1, 12.9, 11.5, 15.7, 1.9, 6.6, 1.9, 2.4, 2.8, 1.7]

  38. Parallel Hardware Accelerators
     • Speed up frequent lifeguard actions
       • Metadata-TLB & Inheritance Tracking (discussed in the paper)
       • Idempotent Filters: identify and filter out redundant checking
         • Per-instruction checking gives the same result
         • Cache incoming event and local state to identify redundancy
     • Accelerators have only a local view of the analysis
       • Important events have application-wide effects (e.g., free())
     • Coherence-like issues with accelerators' local state
       • Important events accompanied by Conflict Alerts
       • Use Conflict Alerts to flush accelerators' state
