
Synchronization Transformations for Parallel Computing

Pedro Diniz and Martin Rinard, Department of Computer Science, University of California, Santa Barbara. http://www.cs.ucsb.edu/~{pedro,martin}


Presentation Transcript


  1. Synchronization Transformations for Parallel Computing. Pedro Diniz and Martin Rinard, Department of Computer Science, University of California, Santa Barbara. http://www.cs.ucsb.edu/~{pedro,martin}

  2. Motivation
     • Parallel Computing Becomes Dominant Form of Computation
     • Parallel Machines Require Parallel Software
     • Parallel Constructs Require New Analysis and Optimization Techniques
     • Our Goal: Eliminate Synchronization Overhead

  3. Talk Outline
     • Motivation
     • Model of Computation
     • Synchronization Optimization Algorithm
     • Applications Experience
     • Dynamic Feedback
     • Related Work
     • Conclusions

  4. Model of Computation
     • Parallel Programs
     • Serial Phases
     • Parallel Phases
     • Single Address Space
     • Atomic Operations on Shared Data
     • Mutual Exclusion Locks
     • Acquire Constructs
     • Release Constructs
     [Figure: a Mutual Exclusion Region consisting of Acq, S1, Rel]

  5. Reducing Synchronization Overhead [Figure: statements S1, S2, S3 with Acq and Rel constructs]

  6. [Figure: Rel and Acq constructs]

  7. Synchronization Optimization
     • Idea: Replace Computations that Repeatedly Acquire and Release the Same Lock with a Computation that Acquires and Releases the Lock Only Once
     • Result: Reduction in the Number of Executed Acquire and Release Constructs
     • Mechanism: Lock Movement Transformations and Lock Cancellation Transformations
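
The idea on this slide can be illustrated with a small sketch. `CountingLock`, `Accumulator`, and the two `add_all_*` functions are hypothetical names introduced here, not code from the presentation; the sketch only counts how many acquire and release constructs execute before and after a hand-applied version of the transformation.

```python
import threading


class CountingLock:
    """Mutual exclusion lock that counts executed acquire/release constructs."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquires = 0
        self.releases = 0

    def acquire(self):
        self._lock.acquire()
        self.acquires += 1  # updated while holding the lock

    def release(self):
        self.releases += 1  # updated while still holding the lock
        self._lock.release()


class Accumulator:
    """Shared object whose updates must execute atomically."""

    def __init__(self):
        self.lock = CountingLock()
        self.total = 0


def add_all_original(acc, values):
    # Every update is its own mutual exclusion region.
    for v in values:
        acc.lock.acquire()
        acc.total += v
        acc.lock.release()


def add_all_optimized(acc, values):
    # One larger region: the lock is acquired and released only once.
    acc.lock.acquire()
    for v in values:
        acc.total += v
    acc.lock.release()


before, after = Accumulator(), Accumulator()
add_all_original(before, range(100))
add_all_optimized(after, range(100))
print(before.lock.acquires, after.lock.acquires)
```

Both versions compute the same total, but the second holds the lock across the whole loop, so other threads wanting the same lock would wait longer.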

  8. Lock Cancellation

  9. Acquire Lock Movement

  10. Release Lock Movement

  11. Synchronization Optimization Algorithm Overview:
     • Find Two Mutual Exclusion Regions With the Same Lock
     • Expand Mutual Exclusion Regions Using Lock Movement Transformations Until They Are Adjacent
     • Coalesce Using Lock Cancellation Transformation to Form a Single Larger Mutual Exclusion Region
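
The three steps above can be sketched for the simplest case, straight-line code. This is an illustrative toy and not the authors' algorithm, which works on an interprocedural control flow graph and handles branches; the tuple encoding (`"acq"`, `"rel"`, `"stmt"`) and the function name `coalesce_regions` are assumptions made for this example.

```python
def coalesce_regions(program):
    """Merge mutual exclusion regions on the same lock in straight-line code.

    program is a list of (kind, name) tuples, where kind is "acq", "rel",
    or "stmt" and name is a lock name or a statement label.
    """
    prog = list(program)
    i = 0
    while i < len(prog):
        kind, name = prog[i]
        if kind == "rel":
            # Scan forward over plain statements to the next lock construct.
            j = i + 1
            while j < len(prog) and prog[j][0] == "stmt":
                j += 1
            if j < len(prog) and prog[j] == ("acq", name):
                # Release lock movement: slide the release past the
                # statements so it sits directly before the acquire.
                prog[i:j] = prog[i + 1:j] + [prog[i]]
                # Lock cancellation: drop the adjacent release/acquire pair.
                del prog[j - 1:j + 1]
                continue  # re-examine position i in the shortened program
        i += 1
    return prog


original = [
    ("acq", "L"), ("stmt", "S1"), ("rel", "L"),
    ("stmt", "S2"),
    ("acq", "L"), ("stmt", "S3"), ("rel", "L"),
]
merged = coalesce_regions(original)
print(merged)
```

In the result, S2 executes inside the merged region even though it was originally outside any mutual exclusion region. Regions that use different locks are left untouched.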

  12. Interprocedural Control Flow Graph

  13. Acquire Movement Paths

  14. Release Movement Paths

  15. Migration Paths and Meeting Edge

  16. Intersection of Paths

  17. Compensation Nodes

  18. Final Result

  19. Synchronization Optimization Trade-Off
     • Advantage:
        • Reduces Number of Executed Acquires and Releases
        • Reduces Acquire and Release Overhead
     • Disadvantage: May Introduce False Exclusion
        • Multiple Processors Attempt to Acquire Same Lock
        • Processor Holding the Lock is Executing Code that was Originally in No Mutual Exclusion Region

  20. False Exclusion Policy
     • Goal: Limit Potential Severity of False Exclusion
     • Mechanism: Constrain the Application of Basic Transformations
        • Original: Never Apply Transformations
        • Bounded: Apply Transformations Only on Cycle-Free Subgraphs of ICFG
        • Aggressive: Always Apply Transformations
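
A minimal sketch of how the three policies might gate a transformation, assuming the candidate subgraph is given as a dictionary mapping each node to its list of successors. The function names `has_cycle` and `may_apply` and the graph encoding are hypothetical; the slides do not show how the compiler implements this check.

```python
def has_cycle(graph):
    """Return True if the directed graph (node -> successors) has a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GRAY
        for succ in graph.get(node, []):
            state = color.get(succ, WHITE)
            if state == GRAY:
                return True  # back edge to a node on the stack: a cycle
            if state == WHITE and visit(succ):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in graph)


def may_apply(policy, subgraph):
    """Decide whether a transformation may be applied under a policy."""
    if policy == "original":
        return False  # never apply transformations
    if policy == "bounded":
        return not has_cycle(subgraph)  # only on cycle-free subgraphs
    if policy == "aggressive":
        return True  # always apply transformations
    raise ValueError(f"unknown policy: {policy}")


straight_line = {"S1": ["S2"], "S2": ["S3"], "S3": []}
loop = {"head": ["body"], "body": ["head"]}
for policy in ("original", "bounded", "aggressive"):
    print(policy, may_apply(policy, straight_line), may_apply(policy, loop))
```

Under this reading, the bounded policy refuses to stretch a region around a loop, which limits how long the lock can be held as a result of the transformation.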

  21. Experimental Results
     • Automatic Parallelizing Compiler Based on Commutativity Analysis [PLDI’96]
     • Set of Complete Scientific Applications (C++ subset)
        • Barnes-Hut N-Body Solver (1500 lines of Code)
        • Liquid Water Simulation Code (1850 lines of Code)
        • Seismic Modeling String Code (2050 lines of Code)
     • Different False Exclusion Policies
     • Performance of Generated Parallel Code on Stanford DASH Shared-Memory Multiprocessor

  22. Lock Overhead: Percentage of Time that the Single Processor Execution Spends Acquiring and Releasing Mutual Exclusion Locks. [Figure: bar charts of percentage lock overhead (0 to 60) for Barnes-Hut (16K Particles), String (Big Well Model), and Water (512 Molecules); bars labeled Original, Bounded, and Aggressive]

  23. Contention Overhead: Percentage of Time that Processors Spend Waiting to Acquire Locks Held by Other Processors. [Figure: contention percentage (0 to 100) versus number of processors (0 to 16) for Barnes-Hut (16K Bodies), Water (512 Molecules), and String (Big Well Model); curves labeled Aggressive, Bounded, and Original]

  24. Performance Results: Barnes-Hut. [Figure: speedup versus number of processors (0 to 16) for Barnes-Hut (16384 bodies); curves labeled Ideal, Aggressive, Bounded, and Original]

  25. Performance Results: Water. [Figure: speedup versus number of processors (0 to 16) for Water (512 Molecules); curves labeled Ideal, Bounded, Original, and Aggressive]

  26. Performance Results: String. [Figure: speedup versus number of processors (0 to 16) for String (Big Well Model); curves labeled Ideal, Original, and Aggressive]

  27. Choosing Best Policy
     • Best False Exclusion Policy May Depend On
        • Topology of Data Structures
        • Dynamic Schedule of Computation
     • Information Required to Choose Best Policy Unavailable at Compile Time
     • Complications
        • Different Phases May Have Different Best Policy
        • In Same Phase, Best Policy May Change Over Time

  28. Solution: Dynamic Feedback
     • Generated Code Consists of
        • Sampling Phases: Measure Performance of Different Policies
        • Production Phases: Use Best Policy From Sampling Phase
     • Periodically Resample to Discover Changes in Best Policy
     • Guaranteed Performance Bounds
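
The alternation of sampling and production phases can be sketched as follows. This is a simplified illustration under stated assumptions, not the generated code described in the talk: each code version is modeled as a callable that processes one work item and returns the overhead it observed, and `run_with_dynamic_feedback`, `sample_size`, and `production_size` are hypothetical names. Real generated code would measure time spent acquiring and waiting for locks instead of returning a number.

```python
def run_with_dynamic_feedback(versions, items, sample_size, production_size):
    """Alternate sampling and production phases over a list of work items.

    versions maps a policy name to a callable(item) -> observed overhead.
    Returns the policy chosen for each production phase.
    """
    chosen = []
    pos = 0
    while pos < len(items):
        # Sampling phase: run each version briefly and record its overhead.
        overhead = {}
        for name, run in versions.items():
            batch = items[pos:pos + sample_size]
            if not batch:
                break
            overhead[name] = sum(run(item) for item in batch) / len(batch)
            pos += len(batch)
        best = min(overhead, key=overhead.get)

        # Production phase: use the best sampled version for a longer run.
        batch = items[pos:pos + production_size]
        for item in batch:
            versions[best](item)
        pos += len(batch)
        if batch:
            chosen.append(best)
    return chosen


# Made-up overheads: the aggressive version is cheapest early on, then
# becomes the most expensive once item numbers reach 50.
versions = {
    "original": lambda item: 5,
    "bounded": lambda item: 3,
    "aggressive": lambda item: 1 if item < 50 else 10,
}
chosen = run_with_dynamic_feedback(versions, list(range(100)), 2, 20)
print(chosen)
```

Because the sketch resamples after every production phase, it switches away from the aggressive version once its measured overhead rises. The length of the two phases controls how much work is spent running versions that turn out not to be the best one.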

  29. Dynamic Feedback. [Figure: overhead over time by code version (Aggressive, Bounded, Original); timeline showing Sampling Phase, Production Phase, Sampling Phase]

  30. Dynamic Feedback: Barnes-Hut. [Figure: speedup versus number of processors (0 to 16) for Barnes-Hut (16384 bodies); curves labeled Ideal, Aggressive, Dynamic Feedback, Bounded, and Original]

  31. Dynamic Feedback: Water. [Figure: speedup versus number of processors (0 to 16) for Water (512 Molecules); curves labeled Ideal, Bounded, Dynamic Feedback, Original, and Aggressive]

  32. Dynamic Feedback: String. [Figure: speedup versus number of processors (0 to 16) for String (Big Well Model); curves labeled Ideal, Original, Dynamic Feedback, and Aggressive]

  33. Related Work
     • Parallel Loop Optimizations (e.g. [Tseng:PPoPP95])
        • Array-based Scientific Computations
        • Barriers vs. Cheaper Mechanisms
     • Concurrent Object-Oriented Programs (e.g. [PZC:POPL95])
        • Merge Access Regions for Invocations of Exclusive Methods
     • Concurrent Constraint Programming
        • Bring Together Ask and Tell Constructs
     • Efficient Synchronization Algorithms
        • Efficient Implementations of Synchronization Primitives

  34. Conclusions
     • Synchronization Optimizations
        • Basic Synchronization Transformations for Locks
        • Synchronization Optimization Algorithm
     • Integrated into Prototype Parallelizing Compiler
        • Object-Based Programs with Dynamic Data Structures
        • Commutativity Analysis
     • Experimental Results
        • Optimizations Have a Significant Performance Impact
        • With Optimizations, Applications Perform Well
     • Dynamic Feedback
