Behavior of Synchronization Methods in Commonly Used Languages and Systems

Distributed Computing and Systems, Chalmers University of Technology, Gothenburg, Sweden
Yiannis Nikolakopoulos, ioaniko@chalmers.se
Joint work with: D. Cederman, B. Chatterjee, N. Nguyen, M. Papatriantafilou, P. Tsigas

Developing a multithreaded application…
• The boss wants .NET
• Java is nice
• Multicores everywhere
• The client wants speed… (C++?)

Developing a multithreaded application…
• The worker threads need to access data: Concurrent Data Structures
• Then we need Synchronization.

Implementing Concurrent Data Structures
• Performance Bottleneck

Implementing Concurrent Data Structures
• Runtime System
• Hardware platform
Which is the fastest/most scalable?

Implementing Concurrent Data Structures

Problem Statement
• How does the interplay of the above parameters and the different synchronization methods affect the performance and the behavior of concurrent data structures?

Outline
• Introduction
• Experiment Setup
• Highlights of Study and Results
• Conclusion

Which data structures to study?
Represent different levels of contention:
• Queue: 1 or 2 contention points
• Hash table: multiple contention points
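One way to read "1 or 2 contention points" for a queue is that a lock-based FIFO either serializes everything behind one lock or splits the work between its head and its tail. The sketch below illustrates the second case with a classic two-lock queue. The class name `TwoLockQueue` and its interface are hypothetical, and this is not claimed to be one of the implementations measured in the study.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Two-lock FIFO queue of ints with a dummy head node (illustrative only).
// Enqueuers contend on tail_lock_, dequeuers contend on head_lock_.
class TwoLockQueue {
private:
    struct Node {
        explicit Node(int v) : value(v), next(nullptr) {}
        int value;
        std::atomic<Node*> next;  // read by dequeuers, written by enqueuers
    };

public:
    TwoLockQueue() : head_(new Node(0)), tail_(head_) {
        assert(head_ == tail_);  // the queue starts with only the dummy node
    }

    ~TwoLockQueue() {
        while (head_ != nullptr) {
            Node* next = head_->next.load(std::memory_order_relaxed);
            delete head_;
            head_ = next;
        }
    }

    TwoLockQueue(const TwoLockQueue&) = delete;
    TwoLockQueue& operator=(const TwoLockQueue&) = delete;

    void enqueue(int value) {
        Node* node = new Node(value);
        std::lock_guard<std::mutex> guard(tail_lock_);  // contention point: tail
        tail_->next.store(node, std::memory_order_release);
        tail_ = node;
    }

    // Returns false if the queue is empty, otherwise writes the value to `out`.
    bool dequeue(int& out) {
        std::lock_guard<std::mutex> guard(head_lock_);  // contention point: head
        Node* first = head_->next.load(std::memory_order_acquire);
        if (first == nullptr) return false;
        out = first->value;
        delete head_;   // retire the old dummy
        head_ = first;  // the dequeued node becomes the new dummy
        return true;
    }

private:
    std::mutex head_lock_;
    std::mutex tail_lock_;
    Node* head_;
    Node* tail_;
};
```

Replacing the two mutexes with a single one gives the one-contention-point variant, at the cost of enqueuers and dequeuers blocking each other.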

How do we choose an implementation?
Possible criteria:
• Framework dependencies
• Programmability
• “Good” performance

Interpreting “good”
• Throughput: the more operations completed per time unit, the better.
• Is this enough?

Non-fairness

What to measure?
• Throughput: data structure operations completed per time unit.
• Fairness: [formula lost in extraction; its legible terms are the average operations per thread and the operations by thread i]
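The fairness formula on this slide did not survive extraction; only its two ingredients remain legible. The sketch below shows one plausible way to combine them, as an assumption rather than the authors' exact definition: it takes the smaller of min/average and average/max over the per-thread operation counts, so 1.0 means every thread completed the same number of operations and values near 0 mean some thread was starved or dominated. The function names `throughput_per_ms` and `fairness` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Throughput: operations completed per millisecond over one measured interval.
double throughput_per_ms(const std::vector<std::uint64_t>& ops_per_thread,
                         double interval_ms) {
    assert(interval_ms > 0.0);
    const std::uint64_t total = std::accumulate(
        ops_per_thread.begin(), ops_per_thread.end(), std::uint64_t{0});
    return static_cast<double>(total) / interval_ms;
}

// Assumed fairness measure (not recovered from the slide):
//   min( min_i(n_i) / avg , avg / max_i(n_i) ), a value in [0, 1],
// where n_i is the number of operations completed by thread i.
double fairness(const std::vector<std::uint64_t>& ops_per_thread) {
    assert(!ops_per_thread.empty());
    const auto mm = std::minmax_element(ops_per_thread.begin(),
                                        ops_per_thread.end());
    const double lo = static_cast<double>(*mm.first);
    const double hi = static_cast<double>(*mm.second);
    if (hi == 0.0) return 1.0;  // no thread made progress: trivially equal
    const std::uint64_t total = std::accumulate(
        ops_per_thread.begin(), ops_per_thread.end(), std::uint64_t{0});
    const double avg =
        static_cast<double>(total) / static_cast<double>(ops_per_thread.size());
    return std::min(lo / avg, avg / hi);
}
```

With counts of 5, 10 and 15 operations, the average is 10, so this measure gives min(5/10, 10/15) = 0.5.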

Implementation Parameters
• Programming environments: C++, Java, C# (.NET, Mono)
• Synchronization methods, all environments: TAS, TTAS, Lock-free, Array lock
• Synchronization methods, C++: PMutex, Lock-free memory management
• Synchronization methods, Java: Reentrant, synchronized
• Synchronization methods, C#: lock construct, Mutex
• NUMA architectures: Intel Nehalem, 2 x 6 core (24 HW threads); AMD Bulldozer, 4 x 12 core (48 HW threads)
Do they influence fairness?
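To make two of the listed methods concrete, here is a minimal C++ sketch of a test-and-test-and-set (TTAS) spinlock. A plain TAS lock would instead loop directly on the atomic exchange. The class `TTASLock` and the helper `count_with_ttas` are hypothetical names chosen for illustration; the memory orderings and the `yield` in the spin loop are design choices of this sketch, not details taken from the presentation.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Test-and-test-and-set spinlock (illustrative sketch).
class TTASLock {
public:
    void lock() {
        for (;;) {
            // "Test": spin on a read-only load so waiting threads do not
            // keep writing to the shared cache line.
            while (locked_.load(std::memory_order_relaxed)) {
                std::this_thread::yield();  // be polite when oversubscribed
            }
            // "Test-and-set": a single atomic exchange; we own the lock
            // only if the previous value was false.
            if (!locked_.exchange(true, std::memory_order_acquire)) return;
        }
    }

    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// Hypothetical helper: `threads` workers each increment a shared counter
// `iters` times while holding the lock, and the final count is returned.
long count_with_ttas(int threads, int iters) {
    assert(threads > 0 && iters >= 0);
    TTASLock lock;
    long counter = 0;  // protected by `lock`
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

Neither TAS nor TTAS gives any ordering guarantee among waiting threads, which is one reason such locks are natural candidates when asking whether a synchronization method influences fairness.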

Experiment Parameters
• Different levels of contention
• Number of threads
• Measured time intervals
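A measurement loop built around these parameters might look like the following sketch: a chosen number of threads repeatedly apply an operation to a shared structure for a fixed time interval, and the per-thread operation counts are returned for later analysis. The function `run_interval` and its signature are assumptions made for illustration, not the harness used in the study. Contention could be lowered, for example, by adding thread-local work between calls; that knob is left out here.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Runs `threads` workers that call `op` in a loop until `interval` has
// elapsed, and returns how many operations each worker completed.
std::vector<std::uint64_t> run_interval(int threads,
                                        std::chrono::milliseconds interval,
                                        const std::function<void()>& op) {
    assert(threads > 0);
    std::vector<std::uint64_t> counts(threads, 0);
    std::atomic<bool> start{false};
    std::atomic<bool> stop{false};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&, t] {
            // Wait until all workers have been created.
            while (!start.load(std::memory_order_acquire)) {
                std::this_thread::yield();
            }
            std::uint64_t local = 0;  // private counter, published once at the end
            while (!stop.load(std::memory_order_relaxed)) {
                op();
                ++local;
            }
            counts[t] = local;  // each worker writes only its own slot
        });
    }
    start.store(true, std::memory_order_release);
    std::this_thread::sleep_for(interval);
    stop.store(true, std::memory_order_relaxed);
    for (auto& w : workers) w.join();
    return counts;
}
```

Calling it several times with different intervals and thread counts yields the raw per-thread counts from which throughput and fairness can be derived.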

Outline
• Introduction
• Experiment Setup
• Highlights of Study and Results
  • Queue: Fairness, Intel vs AMD, Throughput vs Fairness
  • Hash Table: Intel vs AMD, Scalability
• Conclusion

Observations: Queue
• Fairness can change across different time intervals
[Figure: 24 threads, high contention]

Observations: Queue
• Significantly different fairness behavior on different architectures
• Lock-free is less affected in this case
[Figure: fairness, 24 threads, high contention]

Queue: Throughput vs Fairness
[Figure: two C++ panels on Intel, each plotted against the number of threads (2 to 48) for TTAS, Lock-free and PMutex. One panel shows fairness over 0.6 s intervals (0 to 1); the other shows throughput in operations per ms (thousands).]

Observations: Hash table
• Operations are distributed across different buckets
• Things get interesting when #threads > #buckets
• Tradeoff between throughput and fairness
• Different winners and losers
• Contention is lowered in the linked list components
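The bucket structure behind these observations can be sketched as a hash table in which every bucket is a linked list with its own lock, so operations on different buckets do not contend. Once there are more threads than buckets, some threads must share a bucket lock. The class `BucketLockedMap` is a hypothetical, lock-based illustration using `std::mutex`; it is not one of the implementations compared in the study.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <mutex>
#include <utility>
#include <vector>

// Hash table with one lock per bucket; each bucket is a linked list.
class BucketLockedMap {
public:
    explicit BucketLockedMap(std::size_t buckets) : buckets_(buckets) {
        assert(buckets > 0);
    }

    // Inserts the key or overwrites its value; only one bucket is locked.
    void put(int key, int value) {
        Bucket& b = bucket_for(key);
        std::lock_guard<std::mutex> guard(b.lock);
        for (auto& kv : b.items) {
            if (kv.first == key) {
                kv.second = value;
                return;
            }
        }
        b.items.emplace_back(key, value);
    }

    // Returns true and writes the value to `out` if the key is present.
    bool get(int key, int& out) {
        Bucket& b = bucket_for(key);
        std::lock_guard<std::mutex> guard(b.lock);
        for (const auto& kv : b.items) {
            if (kv.first == key) {
                out = kv.second;
                return true;
            }
        }
        return false;
    }

private:
    struct Bucket {
        std::mutex lock;
        std::list<std::pair<int, int>> items;
    };

    Bucket& bucket_for(int key) {
        return buckets_[std::hash<int>{}(key) % buckets_.size()];
    }

    std::vector<Bucket> buckets_;  // fixed size, never resized
};
```

Keeping the bucket count fixed avoids any global lock for resizing, so the number of contention points is exactly the number of buckets.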

Observations: Hash table
• Fairness differences in the hash table across architectures
• Lock-free is again not affected
[Figure: 24 threads, high contention]

Observations: Hash table
• In C++, custom memory management and lock-free implementations excel in scalability and performance.

Conclusion
Which is the fastest/most scalable?
• Complex synchronization mechanisms (PMutex, Reentrant lock) pay off in heavily contended hot spots
• Scalability via more complex, inherently parallel designs and implementations
• Tradeoff between throughput and fairness
  • LF Hash table
  • Reentrant lock vs Array Lock vs LF Queue
Is fairness influenced by NUMA?
• Fairness can be heavily influenced by HW
  • Interesting exceptions
