
Dynamic Feedback: An Effective Technique for Adaptive Computing



Presentation Transcript


  1. Dynamic Feedback: An Effective Technique for Adaptive Computing Pedro Diniz and Martin Rinard Department of Computer Science University of California, Santa Barbara http://www.cs.ucsb.edu/~{pedro,martin}

  2. Basic Issue: Efficient Implementation of Atomic Operations in Object-Based Languages Approach: Reduce Lock Overhead by Coarsening Lock Granularity Problem: Coarsening Lock Granularity May Reduce Available Concurrency

  3. Solution: Dynamic Feedback • Multiple Lock Coarsening Policies • Dynamic Feedback • Generate Multiple Versions of Code • Measure Dynamic Overhead of Each Policy • Dynamically Select Best Version • Context • Parallelizing Compiler • Irregular Object-Based Programs • Pointer-Based Data Structures • Commutativity Analysis

  4. Talk Outline • Lock Coarsening • Dynamic Feedback • Experimental Results • Related Work • Conclusions

  5. Model of Computation • Parallel Programs: Serial Phases and Parallel Phases • Atomic Operations on Shared Objects • Mutual Exclusion Locks: Acquire Constructs and Release Constructs [Figure: alternating serial and parallel phases; a mutual exclusion region bracketed by L.acquire() and L.release()]

  6. Problem: Lock Overhead [Figure: code that repeatedly executes L.acquire()/L.release() pairs, each pair contributing lock overhead]

  7. Solution: Lock Coarsening [Figure: original code with several L.acquire()/L.release() pairs; after lock coarsening a single pair encloses the combined region] Reference: Diniz and Rinard, "Synchronization Transformations for Parallel Computing", POPL'97
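
As a rough illustration of the transformation (not code from the paper; the Lock and Node types below are hypothetical stand-ins for the compiler-generated constructs), lock coarsening replaces repeated acquire/release pairs on the same lock with a single pair around the combined region:

```cpp
#include <mutex>

// Hypothetical lock with the acquire/release interface used on the slides.
struct Lock {
  std::mutex m;
  void acquire() { m.lock(); }
  void release() { m.unlock(); }
};

// Hypothetical shared object protected by its own lock.
struct Node {
  Lock L;
  double value = 0.0;
  int count = 0;
};

// Original: each atomic operation acquires and releases the lock separately.
void update_original(Node& n, double v) {
  n.L.acquire();
  n.value += v;          // first mutual exclusion region
  n.L.release();

  n.L.acquire();
  n.count += 1;          // second mutual exclusion region
  n.L.release();
}

// After lock coarsening: one acquire/release pair covers both updates.
// This reduces lock overhead but enlarges the region, which may cause false exclusion.
void update_coarsened(Node& n, double v) {
  n.L.acquire();
  n.value += v;
  n.count += 1;
  n.L.release();
}
```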

  8. Lock Coarsening Trade-Off • Advantage: • Reduces the Number of Executed Acquires and Releases • Reduces Acquire and Release Overhead • Disadvantage: May Introduce False Exclusion • Multiple Processors Attempt to Acquire the Same Lock • Processor Holding the Lock Is Executing Code That Was Originally Outside Any Mutual Exclusion Region

  9. False Exclusion [Figure: original code versus code after lock coarsening; in the coarsened version, other processors attempting to acquire the same lock wait (false exclusion) while the holder executes code that was originally outside any mutual exclusion region]

  10. Lock Coarsening Policy Goal: Limit Potential Severity of False Exclusion Mechanism: Multiple Lock Coarsening Policies • Original: Never Coarsen Granularity • Bounded: Coarsen Granularity Only Within Cycle-Free Subgraphs of the Interprocedural Control Flow Graph (ICFG) • Aggressive: Always Coarsen Granularity

  11. Choosing Best Policy • Best Lock Coarsening Policy May Depend On • Topology of Data Structures • Dynamic Schedule Of Computation • Information Required to Choose Best Policy Unavailable at Compile Time • Complications • Different Phases May Have Different Best Policy • In Same Phase, Best Policy May Change Over Time

  12. Solution: Dynamic Feedback • Generated Code Executes: • Sampling Phases: Measure Performance of Different Policies • Production Phases: Use Best Policy From Sampling Phase • Periodically Resample to Discover Best Policy Changes [Figure: code version and overhead over time; a sampling phase measures the Original, Bounded, and Aggressive versions in turn, a production phase runs the selected version, then sampling repeats]
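
A minimal sketch of the sampling/production alternation, assuming three statically generated versions of a parallel phase and a runtime driver that picks whichever one showed the lowest sampled overhead; the function names and the stand-in overhead values are hypothetical:

```cpp
#include <array>

// Hypothetical code versions, one per lock coarsening policy. Each runs the parallel
// phase for the given interval and returns its measured overhead in [0, 1]. Constant
// return values stand in here so the sketch compiles on its own.
double run_original(double /*seconds*/)   { return 0.30; }
double run_bounded(double /*seconds*/)    { return 0.15; }
double run_aggressive(double /*seconds*/) { return 0.40; }

using Version = double (*)(double);

// Alternate sampling and production phases until the computation finishes.
void dynamic_feedback(bool (*finished)(), double sampling_interval, double production_interval) {
  const std::array<Version, 3> versions = {run_original, run_bounded, run_aggressive};
  while (!finished()) {
    // Sampling phase: run each version briefly and remember the one with least overhead.
    Version best = versions[0];
    double best_overhead = 2.0;                  // larger than any possible overhead
    for (Version v : versions) {
      double overhead = v(sampling_interval);
      if (overhead < best_overhead) { best_overhead = overhead; best = v; }
    }
    // Production phase: run the best version, then resample to detect policy changes.
    best(production_interval);
  }
}
```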

  13. Guaranteed Performance Bounds • Assumptions: • Overhead Changes Bounded by Exponential Decay Functions • Worst Case Scenario: • No Useful Work During Sampling Phase • Sampled Overheads Are Same For All Versions • Overhead of Selected Version Increases at Maximum Rate • Overhead of Other Versions Decreases at Maximum Rate [Figure: worst-case overhead of the selected version V0 over time; one sampling interval of length S per version followed by a production interval of length P]

  14. Guaranteed Performance Bound
  Definition 1. Policy $p_i$ is at most $\epsilon$ worse than policy $p_j$ over a time interval $T$ if $\mathrm{Work}_j^T - \mathrm{Work}_i^T \le \epsilon\, T$, where $\mathrm{Work}_i^T = \int_0^T (1 - o_i(t))\, dt$.
  Definition 2. Dynamic feedback is at most $\epsilon$ worse than the optimal if $\mathrm{Work}_{\mathrm{opt}}^{P+SN} - \mathrm{Work}_{0}^{P+SN} \le \epsilon\,(P+SN)$, where $\mathrm{Work}_{0}^{P+SN} = \int_0^{P+SN} (1 - o_0(t))\, dt$ is the work of the selected version.
  Result 1. To guarantee this bound: $(1 - \epsilon)\, P + \frac{1}{\alpha}\, e^{-\alpha P} \le (\epsilon - 1)\, SN + \frac{1}{\alpha}$.
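
As a hedged numeric illustration of the inequality in Result 1 as reconstructed above (the values of S, N, epsilon, and alpha below are arbitrary, not taken from the paper), one can check which production interval lengths P satisfy the bound:

```cpp
#include <cmath>
#include <cstdio>

// Reconstructed bound: (1 - eps)*P + (1/alpha)*exp(-alpha*P) <= (eps - 1)*S*N + 1/alpha.
bool bound_holds(double P, double S, double N, double eps, double alpha) {
  double lhs = (1.0 - eps) * P + std::exp(-alpha * P) / alpha;
  double rhs = (eps - 1.0) * S * N + 1.0 / alpha;
  return lhs <= rhs;
}

int main() {
  // Arbitrary example values: 10 ms sampling interval, 3 versions, eps = 0.1, alpha = 0.01.
  const double S = 0.01, N = 3.0, eps = 0.1, alpha = 0.01;
  for (double P : {0.01, 1.0, 10.0, 100.0})
    std::printf("P = %7.2f s: bound %s\n", P, bound_holds(P, S, N, eps, alpha) ? "holds" : "fails");
  return 0;
}
```

With these example values the bound fails for the shortest and longest production intervals and holds for the intermediate ones, matching the feasible region sketched on the next slide.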

  15. Guaranteed Performance Bounds
  [Figure: the two sides of the constraint, $(1 - \epsilon)\, P + \frac{1}{\alpha}\, e^{-\alpha P}$ and $(\epsilon - 1)\, SN + \frac{1}{\alpha}$, plotted against the production interval P, with the feasible region marked]
  Production Interval Too Short: Unable to Amortize Sampling Overhead
  Production Interval Too Long: May Execute a Suboptimal Policy for a Long Time
  Basic Constraint: Decay Rate $\alpha$ Must Be Small Enough

  16. Dynamic Feedback: Implementation • Code Generation • Measuring Policy Overhead • Interval Selection • Interval Expiration • Policy Switch

  17. Code Generation • Statically Generate Different Code Versions for Each Policy • Alternative: Dynamic Code Generation • Advantages of Static Code Generation: • Simplicity of Implementation • Fast Policy Switching • Potential Drawback of Static Code Generation • Code Size (In Practice Not a Problem)

  18. Measuring Policy Overhead • Sources of Overhead • Locking Overhead • Waiting Overhead • Compute Locking Overhead • Count Number of Executed Acquire/Release Constructs • Estimate Waiting Overhead • Count Number of Spins on Locks Waiting to be Released
  Sampled Overhead = ((Number of Acquire/Release × Acquire/Release Execution Time) + (Number of Spins × Spin Time)) / Sampling Time
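
A minimal sketch of the sampled-overhead computation on this slide; the counter structure and the per-event cost constants are hypothetical (a real system would measure these costs for the target machine):

```cpp
// Counters accumulated by a processor during one sampling interval (illustrative names).
struct SamplingCounters {
  long acquire_release_count = 0;   // executed acquire/release constructs
  long spin_count = 0;              // spins on locks waiting to be released
};

// Assumed per-event costs in seconds; placeholders, not measured values.
constexpr double kAcquireReleaseTime = 2.0e-6;
constexpr double kSpinTime           = 0.5e-6;

// Sampled overhead = (locking overhead + estimated waiting overhead) / sampling time.
double sampled_overhead(const SamplingCounters& c, double sampling_time_seconds) {
  double locking = c.acquire_release_count * kAcquireReleaseTime;
  double waiting = c.spin_count * kSpinTime;
  return (locking + waiting) / sampling_time_seconds;
}
```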

  19. Interval Selection and Expiration • Fixed Interval Values • Sampling Interval: 10 milliseconds • Production Interval: 10 seconds • Good Results for Wide Range of Interval Values • Polling Code for Expiration Detection • Location: Back Edges of Parallel Loop • Advantage: Low Overhead • Disadvantage: Potential Interaction with Iteration Size [Figure: parallel loop with atomic operations in the body and polling points on the back edges]
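
A hedged sketch of the back-edge polling described above; the iteration body, the deadline mechanism, and the expiration flag are illustrative placeholders rather than the compiler's actual runtime interface:

```cpp
#include <atomic>
#include <chrono>

// Placeholder for one parallel loop iteration containing atomic operations on shared objects.
void iteration(int /*i*/) { /* ... */ }

// Set once the current sampling or production interval has expired.
std::atomic<bool> interval_expired{false};

// The polling code sits on the loop back edge, so its cost is one timer check per
// iteration; very long iterations delay expiration detection accordingly.
void parallel_loop_chunk(int begin, int end,
                         std::chrono::steady_clock::time_point deadline) {
  for (int i = begin; i < end; ++i) {
    iteration(i);
    // Polling point on the back edge: detect interval expiration.
    if (std::chrono::steady_clock::now() >= deadline) {
      interval_expired.store(true, std::memory_order_relaxed);
      break;   // processors then meet at the barrier described on the next slide
    }
  }
}
```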

  20. Policy Switch • Synchronous • Processors Poll Timer to Detect Interval Expiration • Barrier At End of Each Interval • Advantages: • Consistent Transitions • Clean Overhead Measurements • Disadvantages: • Need to Synchronize All Processors • Potential Idle Time At Barrier
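
A hedged sketch of the synchronous switch, using a C++20 std::barrier as a stand-in for the runtime's barrier: every processor runs until its interval expires, arrives at the barrier, and a single completion step selects the next policy, so all processors transition consistently. The policy-selection logic here is a placeholder:

```cpp
#include <atomic>
#include <barrier>
#include <cstdio>
#include <thread>
#include <vector>

constexpr int kNumProcessors = 4;
std::atomic<int> current_policy{0};            // 0 = Original, 1 = Bounded, 2 = Aggressive

// Placeholder selection step: the real system compares sampled overheads here.
void select_next_policy() noexcept {
  current_policy.store((current_policy.load() + 1) % 3);
}

// The completion function runs exactly once per interval, after all processors have
// arrived and before any is released, keeping transitions and measurements clean.
std::barrier sync_point(kNumProcessors, select_next_policy);

void processor(int id) {
  for (int interval = 0; interval < 3; ++interval) {
    int policy = current_policy.load();
    std::printf("processor %d: interval %d with policy %d\n", id, interval, policy);
    // ... run the selected code version until the timer expires ...
    sync_point.arrive_and_wait();              // barrier at the end of each interval
  }
}

int main() {
  std::vector<std::thread> threads;
  for (int id = 0; id < kNumProcessors; ++id) threads.emplace_back(processor, id);
  for (auto& t : threads) t.join();
  return 0;
}
```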

  21. Experimental Results • Parallelizing Compiler Based on Commutativity Analysis [PLDI’96] • Set of Complete Scientific Applications • Barnes-Hut N-Body Solver (1500 lines of C++) • Liquid Water Simulation Code (1850 lines of C++) • Seismic Modeling String Code (2050 lines of C++) • Different Lock Coarsening Policies • Dynamic Feedback • Performance on Stanford DASH Multiprocessor

  22. Code Sizes [Figure: text segment size (Kbytes, 0-60 scale) of the Serial, Original, and Dynamic versions for Barnes-Hut, Water, and String]

  23. Lock Overhead Percentage of Time that the Single Processor Execution Spends Acquiring and Releasing Mutual Exclusion Locks [Figure: percentage lock overhead (0-60 scale) of the Original, Bounded, and Aggressive versions for Barnes-Hut (16K Particles), Water (512 Molecules), and String (Big Well Model)]

  24. Contention Overhead Percentage of Time that Processors Spend Waiting to Acquire Locks Held by Other Processors [Figure: contention percentage (0-100) versus number of processors (0-16) for the Aggressive, Bounded, and Original versions on Barnes-Hut (16K Particles), Water (512 Molecules), and String (Big Well Model)]

  25. Performance Results: Barnes-Hut [Figure: speedup versus number of processors (0-16) for Barnes-Hut on DASH (16K Particles); curves for Ideal, Aggressive, Dynamic Feedback, Bounded, and Original]

  26. Performance Results: Water [Figure: speedup versus number of processors (0-16) for Water on DASH (512 Molecules); curves for Ideal, Bounded, Dynamic Feedback, Original, and Aggressive]

  27. Performance Results: String [Figure: speedup versus number of processors (0-16) for String on DASH (Big Well Model); curves for Ideal, Original, Dynamic Feedback, and Aggressive]

  28. Summary • Code Size Is Not An Issue • Lock Coarsening Has Significant Performance Impact • Best Lock Coarsening Policy Varies With Application • Dynamic Feedback Delivers Code With Performance Comparable to The Best Static Lock Coarsening Policy

  29. Related Work • Adaptive Execution Techniques (Saavedra Park:PACT96) • Dynamic Dispatch Optimizations (Hölzle Ungar:PLDI94) • Dynamic Code Generation (Engler:PLDI96) • Profiling (Brewer:PPoPP95) • Synchronization Optimizations (Plevyak et al:POPL95)

  30. Conclusions • Dynamic Feedback • Generated Code Adapts to Different Execution Environments • Integration with Parallelizing Compiler • Irregular Object-Based Programs • Pointer-Based Linked Data Structures • Commutativity Analysis • Evaluation with Three Complete Applications • Performance Comparable to Best Hand-Tuned Optimization

  31. BACKUP SLIDES

  32. Performance Results: Barnes-Hut [Figure: speedup versus number of processors (0-16) for Barnes-Hut (16K Particles); curves for Ideal, Aggressive, Bounded, and Original]

  33. Performance Results: Water [Figure: speedup versus number of processors (0-16) for Water (512 Molecules); curves for Ideal, Bounded, Original, and Aggressive]

  34. Performance Results: String [Figure: speedup versus number of processors (0-16) for String (Big Well Model); curves for Ideal, Original, and Aggressive]

  35. Policy Switch [Figure: timeline in which Policy 1 runs until the timer expires, then Policy 2 runs until the next timer expiration]

  36. Motivation Challenges: • Match Best Implementation to Environment • Heterogeneous and Mobile Systems Goal: • Develop Mechanisms to Support Code that Adapts to Environment Characteristics Technique: • Dynamic Feedback

  37. Overhead for Barnes-Hut [Figure: sampled overhead (0-0.5) versus execution time (0-25 seconds) for the Original, Bounded, and Aggressive versions; Barnes-Hut FORCES loop on DASH, 8 processors, 16K particles data set]

  38. Overhead for Water [Figure: sampled overhead (0-0.5) versus execution time (0-60 seconds) for the Original and Bounded versions; Water INTERF loop on DASH, 8 processors, 512 molecules data set]

  39. Overhead for Water [Figure: sampled overhead (0-1) versus execution time (0-60 seconds) for the Original and Aggressive versions; Water POTENG loop on DASH, 8 processors, 512 molecules data set]

  40. Overhead for String [Figure: sampled overhead (0-1) versus execution time (0-500 seconds) for the Original and Aggressive versions; String PROJFWD loop on DASH, 8 processors, Big Well data set]

  41. Dynamic Feedback [Figure: code version and overhead over time; a sampling phase runs the Aggressive, Bounded, and Original versions in turn, a production phase runs the selected version, then sampling repeats]
