Optimistic Synchronization for Efficient Parallel Programs

Learn about the advantages and limitations of optimistic synchronization for automatic parallelization. Understand the implementation of atomic operations and the synchronization selection algorithm for fine-grain synchronization. Explore the benefits of optimistic synchronization in modern processors.

Presentation Transcript


  1. Effective Fine-Grain Synchronization For Automatically Parallelized Programs Using Optimistic Synchronization Primitives. Martin Rinard, University of California, Santa Barbara

  2. Problem: Efficiently Implementing Atomic Operations On Objects
     Key Issue: Mutual Exclusion Locks Versus Optimistic Synchronization Primitives
     Context: Parallelizing Compiler For Irregular Object-Based Programs
       • Linked Data Structures
       • Commutativity Analysis

  3. Talk Outline • Histogram Example • Advantages and Limitations of Optimistic Synchronization • Synchronization Selection Algorithm • Experimental Results

  4. Histogram Example
     class histogram {
       private:
         int counts[N];
       public:
         void update(int i) { counts[i]++; }
     };
     parallel for (i = 0; i < iterations; i++) {
       int c = f(i);
       h->update(c);
     }
     [Figure: histogram with bucket counts 3 7 4 1 2 0 5 8]

  5. Cloud Of Parallel Histogram Updates
     [Figure: iterations 0 through 8 concurrently updating the histogram with bucket counts 3 7 4 1 2 0 5 8]
     Updates Must Execute Atomically

  6. One Lock Per Object
     class histogram {
       private:
         int counts[N];
         lock mutex;
       public:
         void update(int i) {
           mutex.acquire();
           counts[i]++;
           mutex.release();
         }
     };
     Problem: False Exclusion

  7. One Lock Per Item
     class histogram {
       private:
         int counts[N];
         lock mutex[N];
       public:
         void update(int i) {
           mutex[i].acquire();
           counts[i]++;
           mutex[i].release();
         }
     };
     Problem: Memory Consumption
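The memory cost named on the slide above can be made concrete with a small C++ sketch. It assumes `std::mutex` as a stand-in for the talk's `lock` type and an arbitrary bucket count `N`; the helper functions `object_lock_overhead` and `item_lock_overhead` are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

constexpr std::size_t N = 1024;  // assumed bucket count, not from the talk

// One lock per object: a single mutex guards every counter.
struct ObjectLockHistogram {
    int counts[N] = {};
    std::mutex mutex;
    void update(std::size_t i) {
        assert(i < N);
        std::lock_guard<std::mutex> guard(mutex);
        counts[i]++;
    }
};

// One lock per item: every counter carries its own mutex.
struct ItemLockHistogram {
    int counts[N] = {};
    std::mutex mutex[N];
    void update(std::size_t i) {
        assert(i < N);
        std::lock_guard<std::mutex> guard(mutex[i]);
        counts[i]++;
    }
};

// Bytes spent on locks (and padding) rather than on the counters.
constexpr std::size_t object_lock_overhead() {
    return sizeof(ObjectLockHistogram) - sizeof(int) * N;
}
constexpr std::size_t item_lock_overhead() {
    return sizeof(ItemLockHistogram) - sizeof(int) * N;
}
```

With per-item locks the overhead grows linearly in `N`, while the per-object layout pays for one lock regardless of `N`. The exact byte counts depend on the platform's `std::mutex` size.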

  8. Optimistic Synchronization
     [Figure: histogram with bucket counts 3 7 4 1 2 0 5 8]
     • Load Old Value
     • Compute New Value Into Local Storage
     • Commit Point
       • No Write Between Load and Commit: Commit Succeeds, Write New Value
       • Write Between Load and Commit: Commit Fails, Retry Update

  9. Parallel Updates With Optimistic Synchronization
     [Figure: two concurrent updates to the histogram with bucket counts 3 7 4 1 2 0 5 8]
     • One Update: Load Old Value, Compute New Value Into Local Storage, Commit Succeeds, Write New Value
     • Other Update: Load Old Value, Compute New Value Into Local Storage, Commit Fails, Retry Update

  10. Optimistic Synchronization In Modern Processors
      • Load Linked (LL) - Used To Load Old Value
      • Store Conditional (SC) - Used To Commit New Value
      Atomic Increment Using Optimistic Synchronization Primitives
        retry: LL    $2,0($4)      # Load Old Value
               addiu $3,$2,1       # Compute New Value Into Local Storage
               SC    $3,0($4)      # Attempt To Store New Value
               beq   $3,0,retry    # Retry If Failure

  11. Optimistically Synchronized Histogram
      class histogram {
        private:
          int counts[N];
        public:
          void update(int i) {
            int new_count;
            do {
              new_count = LL(counts[i]);
              new_count++;
            } while (!SC(new_count, counts[i]));
          }
      };
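Standard C++ does not expose LL and SC directly, so the following is only a sketch of the same load, compute, commit, retry pattern using `std::atomic<int>::compare_exchange_weak` as a portable substitute. The class name `Histogram`, the bucket count `N`, and the driver `run_parallel_updates` (with `k % N` standing in for the talk's `f(i)`) are assumptions made for this example.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

constexpr std::size_t N = 8;  // assumed bucket count

class Histogram {
public:
    // Load the old value, compute the new value locally, then try to commit.
    void update(std::size_t i) {
        assert(i < N);
        int old_count = counts_[i].load(std::memory_order_relaxed);
        // compare_exchange_weak fails if another thread wrote counts_[i]
        // (or spuriously); on failure it reloads old_count, so the next
        // attempt recomputes old_count + 1 from the fresh value.
        while (!counts_[i].compare_exchange_weak(
                   old_count, old_count + 1, std::memory_order_relaxed)) {
        }
    }
    int count(std::size_t i) const { return counts_[i].load(); }

private:
    std::atomic<int> counts_[N] = {};
};

// Hypothetical driver: every thread performs per_thread updates.
// Returns the sum of all buckets after the threads have joined.
int run_parallel_updates(Histogram& h, int num_threads, int per_thread) {
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&h, per_thread] {
            for (int k = 0; k < per_thread; ++k) {
                h.update(static_cast<std::size_t>(k) % N);
            }
        });
    }
    for (auto& w : workers) w.join();
    int total = 0;
    for (std::size_t i = 0; i < N; ++i) total += h.count(i);
    return total;
}
```

No lock is stored next to the counters, which matches the "No Memory Overhead" claim; on architectures with LL/SC a compiler may lower the compare-exchange loop to those instructions, though that is an implementation detail.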

  12. Aspects of Optimistic Synchronization
      • Advantages
        • Slightly More Efficient Than Locked Updates
        • No Memory Overhead
        • No Data Cache Overhead
        • Potentially Fewer Memory Consistency Requirements
      • Advantages In Other Contexts
        • No Deadlock, No Priority Inversions, No Lock Convoys
      • Limitations
        • Existing Primitives Support Only Single Word Updates
        • Each Update Must Be Synchronized Individually
        • Lack of Fairness

  13. Synchronization In Automatically Parallelized Programs
      • Serial Program (Assumption: Operations Execute Atomically)
      • Commutativity Analysis
      • Unsynchronized Parallel Program (Requirement: Correctly Synchronize Atomic Operations)
      • Synchronization Selection (Goal: Choose An Efficient Synchronization Mechanism For Each Operation)
      • Synchronized Parallel Program

  14. Atomicity Issues In Generated Code
      • Serial Program (Assumption: Operations Execute Atomically)
      • Commutativity Analysis
      • Unsynchronized Parallel Program (Requirement: Correctly Synchronize Atomic Operations)
      • Synchronization Selection (Goal: Choose An Efficient Synchronization Mechanism For Each Operation)
      • Synchronized Parallel Program

  15. Use Optimistic Synchronization Whenever Possible

  16. Model Of Computation
      • Objects With Instance Variables
          class histogram {
            private:
              int counts[N];
          };
      • Operations Update Objects By Modifying Instance Variables
          void histogram::update(int i) { counts[i]++; }
      [Figure: counts 4 2 5 before h->update(1); counts 4 3 5 after]

  17. Commutativity Analysis
      • Compiler Computes Extent Of Computation
        • Representation of All Operations in Computation
        • In Example: { histogram::update }
      • Do All Pairs Of Operations Commute?
        • No - Generate Serial Code
        • Yes - Automatically Generate Parallel Code
        • In Example: h->update(i) and h->update(j) commute for all i, j
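The commutativity claim for the example can be illustrated with a small runtime check. This is not the compiler's static analysis, only a sketch that applies two updates in both orders to a copy of the state and compares the results. `Counts`, `updates_commute`, `all_pairs_commute`, and the contrasting operation `set_first` are hypothetical names introduced for this illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t N = 4;  // assumed bucket count
using Counts = std::array<int, N>;

// Pure version of histogram::update: returns the state after the update.
Counts update(Counts counts, std::size_t i) {
    assert(i < N);
    counts[i]++;
    return counts;
}

// Hypothetical non-commuting operation, for contrast: overwrite bucket 0.
Counts set_first(Counts counts, int value) {
    counts[0] = value;
    return counts;
}

// Two updates commute if both execution orders yield the same state.
bool updates_commute(const Counts& start, std::size_t i, std::size_t j) {
    return update(update(start, i), j) == update(update(start, j), i);
}

// Check every pair (i, j) of bucket indices from the given start state.
bool all_pairs_commute(const Counts& start) {
    for (std::size_t i = 0; i < N; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            if (!updates_commute(start, i, j)) return false;
        }
    }
    return true;
}
```

Increments commute because addition is order-independent, whereas two `set_first` calls with different values leave different final states depending on which runs last.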

  18. Synchronization Requirements
      • Traditional Parallelizing Compilers
        • Parallelize Loops With Independent Iterations
        • Barrier Synchronization
      • Commutativity Analysis
        • Parallel Operations May Update Same Object
        • For Generated Code To Execute Correctly, Operations Must Execute Atomically
        • Code Generation Algorithm Must Insert Synchronization

  19. Default Synchronization Algorithm
      class histogram {
        private:
          int counts[N];
          lock mutex;              // One Lock Per Object
        public:
          void update(int i) {
            mutex.acquire();       // Operations Acquire and Release Lock
            counts[i]++;
            mutex.release();
          }
      };

  20. Synchronization Constraints
      • Operation: counts[i] = counts[i]+1;
        Synchronization Constraint: Can Use Optimistic Synchronization - Read/Compute/Write Update To A Single Instance Variable
      • Operation: temp = counts[i]; counts[i] = counts[j]; counts[j] = temp;
        Synchronization Constraint: Must Use Lock Synchronization - Updates Involve Multiple Interdependent Instance Variables

  21. Synchronization Selection Constraints
      • Can Use Optimistic Synchronization Only For Single Word Updates That
        • Read An Instance Variable
        • Compute A New Value That Depends On No Other Updated Instance Variable
        • Write New Value Back Into Instance Variable
      • All Updates To Same Instance Variable Must Use Same Synchronization Mechanism
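A sketch of what these constraints imply for a class that contains both kinds of update. It assumes `std::mutex` in place of the talk's `lock`, and the class name `LockedHistogram` and its `swap` operation are hypothetical. Because `swap` touches two interdependent words, it needs the lock, and the rule that all updates to the same instance variable use the same mechanism then forces the plain increment to take the lock as well.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

constexpr std::size_t N = 8;  // assumed bucket count

class LockedHistogram {
public:
    // Single-word read/compute/write. On its own this could be optimistic,
    // but counts_ is also updated by swap(), so it must use the lock too.
    void update(std::size_t i) {
        assert(i < N);
        std::lock_guard<std::mutex> guard(mutex_);
        counts_[i]++;
    }

    // Multi-word update: both writes must appear atomic together,
    // which a single-word optimistic primitive cannot provide.
    void swap(std::size_t i, std::size_t j) {
        assert(i < N && j < N);
        std::lock_guard<std::mutex> guard(mutex_);
        int temp = counts_[i];
        counts_[i] = counts_[j];
        counts_[j] = temp;
    }

    int count(std::size_t i) const {
        std::lock_guard<std::mutex> guard(mutex_);
        return counts_[i];
    }

private:
    int counts_[N] = {};
    mutable std::mutex mutex_;  // mutable so count() can lock in a const method
};
```

If `update` used an optimistic primitive while `swap` held the lock, an increment could land between the two writes of a swap and be lost, which is why the mechanisms are not mixed on one variable.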

  22. Synchronization Selection Algorithm
      • Operates At Granularity Of Instance Variables
      • Compiler Scans All Updates To Each Instance Variable
        • If All Updates Can Use Optimistic Synchronization, Instance Variable Is Marked Optimistically Synchronized
        • If At Least One Update Must Use Lock Synchronization, Instance Variable Is Marked Lock Synchronized
      • If A Class Has A Lock Synchronized Variable, Class Is Marked Lock Synchronized
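The marking rules above can be sketched as a single pass over a list of update sites. The data model here is an assumption made for illustration, not the compiler's actual representation: `Update`, `Selection`, `select_synchronization`, and the flag `single_variable_rmw` (true when the update reads, computes, and writes only that one single-word variable) are all hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

enum class Sync { Optimistic, Lock };

// Hypothetical summary of one update site found while scanning the program.
struct Update {
    std::string cls;           // class declaring the instance variable
    std::string variable;      // instance variable being updated
    bool single_variable_rmw;  // eligible for optimistic synchronization?
};

struct Selection {
    std::map<std::string, Sync> variables;         // key: "class::variable"
    std::map<std::string, bool> class_needs_lock;  // key: class name
};

Selection select_synchronization(const std::vector<Update>& updates) {
    Selection result;
    for (const Update& u : updates) {
        const std::string key = u.cls + "::" + u.variable;
        // A variable starts out optimistic; a single update that needs a
        // lock marks it lock synchronized, and it never reverts.
        auto it = result.variables.emplace(key, Sync::Optimistic).first;
        if (!u.single_variable_rmw) it->second = Sync::Lock;
        // A class is lock synchronized if any of its variables is.
        bool& needs_lock = result.class_needs_lock[u.cls];
        needs_lock = needs_lock || (it->second == Sync::Lock);
    }
    return result;
}
```

For the histogram example, the only update to `counts` is the increment, so the variable stays optimistic and the class receives no lock.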

  23. Synchronization Selection In Example
      class histogram {
        private:
          int counts[N];           // Optimistically Synchronized Instance Variable
        public:
          void update(int i) {
            counts[i]++;
          }
      };
      histogram NOT Marked As Lock Synchronized Class

  24. Code Generation Algorithm
      • All Lock Synchronized Classes Augmented With Locks
      • Operations That Update Lock Synchronized Variables Acquire and Release the Lock in the Object
      • Operations That Update Optimistically Synchronized Variables Use Optimistic Synchronization Primitives

  25. Optimistically Synchronized Histogram
      class histogram {
        private:
          int counts[N];
        public:
          void update(int i) {
            int new_count;
            do {
              new_count = LL(counts[i]);
              new_count++;
            } while (!SC(new_count, counts[i]));
          }
      };

  26. Experimental Results

  27. Methodology
      • Implemented Parallelizing Compiler
      • Implemented Synchronization Selection Algorithm
      • Parallelized Three Complete Scientific Applications
        • Barnes-Hut, String, Water
      • Produced Four Versions
        • Optimistic (All Updates Optimistically Synchronized)
        • Item Lock (Produced By Hand)
        • Object Lock
        • Coarse Lock
      • Used Inline Intrinsic Locks With Exponential Backoff
      • Measured Performance On SGI Challenge XL
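The talk does not show how its locks with exponential backoff were implemented. As a rough illustration of the general technique only, here is a test-and-set spin lock in C++ that doubles its wait after each failed attempt. The class name `BackoffLock`, the delay bounds, and the use of `std::this_thread::yield` as the delay are all assumptions.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

class BackoffLock {
public:
    // Spin until the flag is acquired, waiting longer after each failure
    // so that contending threads hammer the lock word less often.
    void acquire() {
        int delay = kMinDelay;
        while (flag_.test_and_set(std::memory_order_acquire)) {
            for (int k = 0; k < delay; ++k) std::this_thread::yield();
            if (delay < kMaxDelay) delay *= 2;  // exponential growth, capped
        }
    }

    // Single attempt; returns true if the lock was acquired.
    bool try_acquire() {
        return !flag_.test_and_set(std::memory_order_acquire);
    }

    void release() { flag_.clear(std::memory_order_release); }

private:
    static constexpr int kMinDelay = 1;     // assumed initial backoff
    static constexpr int kMaxDelay = 1024;  // assumed backoff cap
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};
```

Unlike the optimistic update, this lock occupies its own word of memory for each protected object or item, which is the overhead the memory consumption measurements compare.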

  28. Time For One Update [Figure: two bar charts, "Time for One Cached Update On Challenge XL" and "Time for One Uncached Update On Challenge XL"; y-axis: Update Time (microseconds); bars: Locked, Optimistic, Unsynchronized; annotation: Data And Lock On Different Cache Lines]

  29. Synchronization Frequency [Figure: bar chart; x-axis: Microseconds Per Synchronization, 0 to 15; bars for Barnes-Hut (Optimistic and Item Lock, Object Lock, Coarse Lock), String (Optimistic and Item Lock, Object Lock), and Water (Optimistic and Item Lock, Object Lock, Coarse Lock); two off-scale bars are labeled 661 and 25]

  30. Memory Consumption For Barnes-Hut [Figure: bar chart of Total Memory Used To Store Objects; y-axis: Memory Consumption (MBytes), 0 to 50; bars: Optimistic, Item Lock, Object Lock, Coarse Lock]

  31. Memory Consumption For String [Figure: bar chart of Total Memory Used To Store Objects; y-axis: Memory Consumption (MBytes), 0 to 5; bars: Optimistic, Item Lock, Object Lock]

  32. Memory Consumption For Water [Figure: bar chart of Total Memory Used To Store Objects; y-axis: Memory Consumption (MBytes), 0 to 1.5; bars: Optimistic, Item Lock, Object Lock, Coarse Lock]

  33. Speedups For Barnes-Hut [Figure: four plots, one each for Optimistic, Item Lock, Object Lock, Coarse Lock; x-axis: Processors, 0 to 24; y-axis: Speedup, 0 to 24]

  34. Speedups For String [Figure: three plots, one each for Optimistic, Item Lock, Object Lock; x-axis: Processors, 0 to 24; y-axis: Speedup, 0 to 24]

  35. Speedups For Water [Figure: four plots, one each for Optimistic, Item Lock, Object Lock, Coarse Lock; x-axis: Processors, 0 to 24; y-axis: Speedup, 0 to 24]

  36. Acknowledgements
      • Pedro Diniz
        • Parallelizing Compiler
      • Silicon Graphics
        • Challenge XL Multiprocessor
      • Rohit Chandra, T.K. Lakshman, Robert Kennedy, Alex Poulos
        • Technical Assistance With SGI Hardware and Software

  37. Bottom Line
      • Optimistic Synchronization Offers
        • No Memory Overhead
        • No Data Cache Overhead
        • Reasonably Small Execution Time Overhead
        • Good Performance On All Applications
      • Good Choice For Parallelizing Compiler
        • Minimal Impact On Parallel Program
        • Simple, Robust, Works Well In Range Of Situations
      • Major Drawback
        • Current Primitives Support Only Single Word Updates
      • Use Optimistic Synchronization Whenever Applicable

  38. Future
      The Efficient Implementation Of Atomic Operations On Objects Will Become A Crucial Issue For Mainstream Software
      • Small-Scale Shared-Memory Multiprocessors
      • Multithreaded Applications and Libraries
      • Popularity of Object-Oriented Programming
      • Specific Example: Java Standard Library
      Optimistic Synchronization Primitives Will Play An Important Role