An overview of thread-local storage in C++11, the thread-specific storage (TSS) pattern, and TSS emulation for managing data that are logically global to a thread, including implementation options and their costs.
Thread-Specific Storage (TSS)
Chris Gill and Venkita Subramonian
E81 CSE 532S: Advanced Multi-Paradigm Software Development
Thread Local Storage in C++11
• A variable can be declared thread_local as of C++11
  • Its lifetime is the lifetime of the thread
• Useful for data that are logically global to the thread
  • Good for avoiding passing references to the data up and down the call stack
  • E.g., if data are made extern, or static, or put in a namespace, etc.
• Good fences make good neighbors
  • Not visible to other threads (unless a pointer/reference is given away)
• What if there are many different thread-specific data?
  • If all threads use instances of all the same types all the time, can put them in a struct and make instances of the struct thread local
  • Otherwise, the thread-specific storage (TSS) pattern can help
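The points above can be illustrated with a minimal sketch. The names PerThreadState, tls_state, record, and run_workers are hypothetical and chosen only for this example; the sketch assumes a C++11 compiler with thread support.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical per-thread data, grouped in one struct as the slide suggests.
struct PerThreadState {
    int calls = 0;       // how many times this thread called record()
    int last_error = 0;  // an errno-like value private to the thread
};

// One instance per thread: created for each thread that uses it and
// destroyed when that thread exits.
thread_local PerThreadState tls_state;

// Code deep in a call stack reaches the state without any extra parameter.
void record(int error) {
    ++tls_state.calls;
    tls_state.last_error = error;
}

// Starts n_threads workers, each calling record() calls_per_thread times,
// and returns the call count each worker observed in its own copy.
std::vector<int> run_workers(int n_threads, int calls_per_thread) {
    assert(n_threads >= 0);
    std::vector<int> seen(n_threads, 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([&seen, t, calls_per_thread] {
            for (int i = 0; i < calls_per_thread; ++i) record(t);
            seen[t] = tls_state.calls;  // each worker writes only its own slot
        });
    }
    for (auto& w : workers) w.join();
    return seen;
}
```

Each worker sees only its own count, and the copy belonging to the calling thread stays untouched, since no pointer or reference to any copy is handed to another thread.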
A More Complete and General Solution: Thread-Specific Storage (TSS) Pattern
• Logically thread-global access point
• Maps index to object
  • Index is a 2-tuple
  • E.g., an STL pair of <key, std::thread::id>
• Avoids lock overhead
  • Separate copy per <key, std::thread::id>
• Logically an m×n table
  • Sparse/dense, small/large
  • Implement accordingly
[Figure: a TSS table with rows key1 to key3 and columns tid1 to tid4 points to different kinds of thread-specific objects, such as connections and errno values]
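As a rough sketch of the table described above, the following maps a <key, std::thread::id> pair to a separately allocated object. TssTable, local, and copies_are_separate are hypothetical names. One assumption is specific to this sketch: because all threads share a single std::map, a mutex guards the map structure itself during lookup and insertion, while the per-thread objects are then read and written without any lock.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

class TssTable {
public:
    using Key = int;
    using Index = std::pair<Key, std::thread::id>;  // the 2-tuple index

    // Returns the calling thread's own object for `key`, creating it on
    // first use. The lock covers only the map lookup/insert.
    std::string& local(Key key) {
        const Index idx(key, std::this_thread::get_id());
        std::lock_guard<std::mutex> guard(mutex_);
        std::unique_ptr<std::string>& slot = table_[idx];
        if (!slot) slot.reset(new std::string());
        return *slot;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return table_.size();
    }

private:
    mutable std::mutex mutex_;
    std::map<Index, std::unique_ptr<std::string>> table_;
};

// Two threads store under the same key; each reads back only its own value.
bool copies_are_separate() {
    TssTable table;
    const TssTable::Key kName = 1;
    std::atomic<int> arrived(0);
    std::string seen_a, seen_b;
    auto body = [&](const char* value, std::string& seen) {
        table.local(kName) = value;  // unlocked write to this thread's object
        ++arrived;
        // Wait until both threads are alive so their ids are distinct.
        while (arrived.load() < 2) std::this_thread::yield();
        seen = table.local(kName);
    };
    std::thread a([&] { body("alpha", seen_a); });
    std::thread b([&] { body("beta", seen_b); });
    a.join();
    b.join();
    assert(table.size() == 2);  // one entry per <key, thread id>
    return seen_a == "alpha" && seen_b == "beta";
}
```

A production implementation would likely avoid even the structural lock, for example by preallocating the table or by building on native thread-local storage.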
Alternative Table Implementations
• 2-D array is good for many use-cases
  • Small #s of threads, keys
  • And/or densely populated
  • May avoid data races
• Hash map, skip-list, etc. may be better for others
  • Large row/column sizes
  • Sparsely populated
  • But, adds some overhead
  • Data races may occur
[Figure: hash map holding only the populated entries: <key1, tid2>, <key1, tid4>, <key3, tid1>, <key3, tid3>, <key3, tid4>]
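The 2-D array option might look like the sketch below. DenseTss, kMaxKeys, kMaxThreads, and fill_and_sum are hypothetical, and the sketch assumes each thread is handed a small integer slot number when it is started, rather than being looked up by std::thread::id. Because the array is fully allocated up front and each thread touches only its own column, no structural modification happens at run time and no lock is needed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

constexpr std::size_t kMaxKeys = 3;     // rows (assumed small and fixed)
constexpr std::size_t kMaxThreads = 4;  // columns (assumed small and fixed)

template <typename T>
class DenseTss {
public:
    // Cell for (key, thread slot); bounds are checked in debug builds.
    T& at(std::size_t key, std::size_t slot) {
        assert(key < kMaxKeys && slot < kMaxThreads);
        return cells_[key][slot];
    }

private:
    std::array<std::array<T, kMaxThreads>, kMaxKeys> cells_{};
};

// Each worker fills its own column without locking; the caller sums all
// cells after joining the workers.
long fill_and_sum() {
    DenseTss<long> table;
    std::vector<std::thread> workers;
    for (std::size_t slot = 0; slot < kMaxThreads; ++slot) {
        workers.emplace_back([&table, slot] {
            for (std::size_t key = 0; key < kMaxKeys; ++key)
                table.at(key, slot) = static_cast<long>(10 * key + slot);
        });
    }
    for (auto& w : workers) w.join();
    long sum = 0;
    for (std::size_t key = 0; key < kMaxKeys; ++key)
        for (std::size_t slot = 0; slot < kMaxThreads; ++slot)
            sum += table.at(key, slot);
    return sum;
}
```

Cells in neighboring columns can share a cache line, so a layout like this trades lock overhead for possible false sharing when threads write frequently.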
TSS and Resource Indexing
• Multiple object lookup keys
  • Each key in a thread is for a different object
• Explicit tid indexing
  • Used when a thread needs to cross-reference another’s TSS
  • Watch out for race conditions
  • Avoid locking if at all possible
• Benefit of thread id indexing
  • Threads remain mostly unaware of each other’s TSS resources
  • As if each were the only thread in the process that uses TSS
  • Unless a thread compares the thread id it is given with its own via std::this_thread::get_id()
[Figure: thread-specific objects, distinguished by keys and distinguished by thread ids]
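The comparison mentioned in the last bullet can be sketched in a few lines. The helper is_own_entry and the function owner_check_demo are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <thread>

// True when `owner` names the calling thread, i.e. the TSS entry indexed
// by `owner` belongs to the caller rather than to some other thread.
bool is_own_entry(std::thread::id owner) {
    return owner == std::this_thread::get_id();
}

// The calling thread recognizes its own id; a worker given the same id
// can tell that the entry belongs to a different thread.
bool owner_check_demo() {
    const std::thread::id caller_id = std::this_thread::get_id();
    bool worker_thinks_own = true;
    std::thread worker([&] { worker_thinks_own = is_own_entry(caller_id); });
    worker.join();
    return is_own_entry(caller_id) && !worker_thinks_own;
}
```

A check like this is one way for code to decide whether it is about to touch another thread's entry, which is where the race conditions noted above become a concern.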
Distributable Thread (DT) TSS Variant
• Key issues
  • Identity of the distributable thread abstraction (GUID)
  • Mapping and remapping DT to different local threads
    • E.g., when DT makes a remote call, release local thread to reactor
    • E.g., when DT makes a nested call back onto the same host
• Remote call carries DT’s parameters with it
• Binding of a single DT to different local OS threads
[Figure: Host 1 and Host 2, each with an RTCORBA 2.0 Scheduler, exchange remote calls and returns; bindings shown: <GUID2, TID2>, <GUID1, TID1>, <GUID1, TID1>, <GUID1, TID2>]
Distributable Thread (DT) TSS Variant
• A distributable thread can use thread-specific storage
  • Avoids locking of global data
• Context: OS-provided TSS is efficient, uses OS thread id
• Problem: a distributable thread may span OS threads
  • Difficult to access prior storage
• Solution: TSS emulation
  • Based on <GUID, tid> pair
  • Also a useful idea on platforms that don’t provide native TSS
• Key question to answer
  • What is the cost of TSS emulation compared to the OS-provided version of TSS?
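One possible reading of the emulation idea is sketched below; it is not the implementation measured in the slides. The sketch assumes that each local OS thread is bound to the GUID of the distributable thread it currently serves, and that values are stored per <GUID, key>, so storage written under one OS thread can be reached after the DT is rebound to another. EmulatedTss, Guid, Key, and migrate_and_read are hypothetical names, and an unsigned integer stands in for a real GUID.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

using Guid = unsigned long;  // hypothetical stand-in for a real DT GUID
using Key = int;

class EmulatedTss {
public:
    // Bind the calling OS thread to distributable thread `dt`.
    void bind(Guid dt) {
        std::lock_guard<std::mutex> guard(mutex_);
        binding_[std::this_thread::get_id()] = dt;
    }
    // Release the calling OS thread, e.g. when the DT leaves the host.
    void unbind() {
        std::lock_guard<std::mutex> guard(mutex_);
        binding_.erase(std::this_thread::get_id());
    }
    // Store a value for the DT bound to the calling OS thread.
    void write(Key key, const std::string& value) {
        std::lock_guard<std::mutex> guard(mutex_);
        values_[std::make_pair(current_dt(), key)] = value;
    }
    // Read the value for the DT bound to the calling OS thread
    // (an empty string if nothing was stored yet).
    std::string read(Key key) {
        std::lock_guard<std::mutex> guard(mutex_);
        return values_[std::make_pair(current_dt(), key)];
    }

private:
    // Caller must hold mutex_ and must have called bind() on this thread.
    Guid current_dt() const {
        auto it = binding_.find(std::this_thread::get_id());
        assert(it != binding_.end());
        return it->second;
    }

    std::mutex mutex_;
    std::map<std::thread::id, Guid> binding_;            // tid -> GUID
    std::map<std::pair<Guid, Key>, std::string> values_; // <GUID, key> -> value
};

// The same DT runs first on one OS thread, then on another, and still
// finds the storage it wrote earlier.
std::string migrate_and_read() {
    EmulatedTss tss;
    const Guid dt = 42;
    const Key kContext = 0;
    std::thread first([&] {
        tss.bind(dt);
        tss.write(kContext, "request-context");
        tss.unbind();
    });
    first.join();
    std::string seen;
    std::thread second([&] {
        tss.bind(dt);
        seen = tss.read(kContext);
        tss.unbind();
    });
    second.join();
    return seen;
}
```

The extra lookup and the lock in this sketch are examples of the kind of overhead that emulation can add relative to OS-provided TSS, which is the cost question the slide raises.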
TSS Emulation Costs (Mgeta, RTAS04)
• Pentium tick timestamps
  • Nanosecond resolution on 2.8 GHz P4, 512 KB cache, 512 MB memory
  • RedHat 7.3, real-time class
• Called create repeatedly
  • Then, called write/read repeatedly on one key
• Upper graph shows scalability of key creation
  • Cost scales linearly with number of keys in OS, ACE TSS
  • Emulation costs ~2 µs more per key creation
• Lower graph shows that emulated write costs ~1.5 µs more and emulated read ~0.5 µs more
Conclusions
• Benefits of using the Thread-Specific Storage pattern
  • Efficiency of access (no locking)
  • Reusability (via Wrapper Façade)
  • Ease of use (hides complexity)
• Liabilities of the pattern
  • Potential cluttering of the TSS map
    • Objects not used by multiple threads don’t belong in the map
    • Putting them there wastes space, adds program complexity
  • “Yet another” factor obscuring system structure/behavior
    • E.g., have to understand map during multi-threaded debugging
  • Language-specific implementation options
    • May reduce portability
    • E.g., templates and operator overloading
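To make the Wrapper Façade, template, and operator overloading points concrete, here is a minimal sketch of a smart-pointer-like wrapper. It is built on C++11 thread_local rather than on any particular TSS library, and Tss, Connection, g_conn, and set_in_worker are hypothetical names. The Tag parameter is an assumption of this sketch: it lets two wrappers over the same type refer to distinct per-thread objects.

```cpp
#include <cassert>
#include <thread>

// Wrapper that hides where the per-thread object lives. Callers use -> and *
// as if they held a pointer to an ordinary object.
template <typename T, typename Tag = void>
class Tss {
public:
    T* operator->() { return &instance(); }
    T& operator*() { return instance(); }

private:
    // One T per thread for each distinct <T, Tag> combination, created the
    // first time that thread uses the wrapper.
    static T& instance() {
        thread_local T object;
        return object;
    }
};

// Hypothetical thread-specific object.
struct Connection {
    int id = -1;  // -1 means "not connected"
};

Tss<Connection> g_conn;  // logically global, physically per thread

// A worker sets its own connection id and reads it back; the copy that
// belongs to the calling thread is not affected.
int set_in_worker(int id) {
    int seen = 0;
    std::thread worker([&] {
        g_conn->id = id;
        seen = g_conn->id;
    });
    worker.join();
    return seen;
}
```

The call sites stay simple (g_conn->id), which reflects the ease-of-use benefit, while the reliance on templates and overloaded operators is the language-specific liability listed above.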