Extended Memory Semantics for Thread Synchronization Sheng Li, Ying Zhou Operating System Progress Report Nov 1st, 2007
Problems • Hardware multithreading is no longer a privilege of supercomputing; it is already part of the major microprocessors. • E.g., Sun Niagara 2 has 64 threads per chip and 256 threads per server. • Concurrency management is one of the biggest challenges in multithreaded systems • Key requirement: low-overhead, scalable thread synchronization • Synchronization mechanisms • Atomic primitives (Test-and-Set, Compare-and-Swap, LL/SC) • Software routines built on them have poor performance and scalability • Empty/Full bits: an extension bit on each memory location denotes its empty/full state • Better performance [1], but still not enough
Our Goal • Solve the synchronization bottleneck by using Extended Memory Semantics (EMS) • Better performance and scalability • Quantify the performance gain of EMS compared with other synchronization mechanisms (e.g., Empty/Full bits)
Extended Memory Semantics • [Figure: memory word = 64 bits of data/metadata plus an extension bit] • Memory instructions are characterized by their synchronization behavior • Load.ff, Load.fe, Store.xf, Store.ef, Store.xe (f = full, e = empty, x = don't care)
EMS handler • There is no free lunch: the EMS handler has overhead • Creating the handler threads • Queuing up memory requests and building the data structures that track them
What we have done so far • Built the EMS model, at both the architecture and OS levels, in the Structural Simulation Toolkit (SST) • SST is a simulation environment for massive lightweight multithreading, developed at Notre Dame and Sandia National Laboratories • Modified glibc to use EMS • Especially the pthread library • Designed benchmarks for different categories • Ran simulations to evaluate EMS performance
Tightly Coupled Parallel • Each thread competes with all the others for a single lock before updating the counter • Very high contention (the worst case)
Loosely Coupled Parallel • Each thread competes with the others for locks before updating the counters • Mild contention
Embarrassingly Parallel • No contention, no locks
Embarrassingly parallel and loosely coupled parallel • Low synchronization overhead, provided by EMS • EMS shows very good scalability [Figure: synchronization distribution]
Tightly Coupled Parallel • EMS performs poorly in this worst case • Most of the threads are used for synchronization, not for real work
The Road Ahead • Build/complete other synchronization mechanisms (e.g., Empty/Full bits) in SST • Modify glibc to support the other synchronization mechanisms • Compare the performance of EMS with the other synchronization mechanisms
Thank you! Questions?
Bibliography [1] Performance and Programming Experience on the Tera MTA, Larry Carter, John Feo, Allan Snavely, PPSC, 1999
Lightweight Threads • A thread context (frame) is 32 double words (256 bytes) • Two double words are reserved for the thread status; the other 30 are general-purpose registers • No other per-thread state, which makes multithreading easy • Frames are stored in memory (no register file) • Registers are aliases for memory locations