Lowering the Overhead of Software Transactional Memory. Virendra J. Marathe, Michael F. Spear, Christopher Heriot, Athul Acharya, David Eisenstat, William N. Scherer III, Michael L. Scott. Featuring: RSTM, a low-overhead STM library for C++. Presenting: Yosef Etigin.
What is this paper about? • Design and implementation of RSTM. • RSTM is meant to be a fast STM library for multi-threaded C++ programs. • Main features of RSTM: • Cache-optimized metadata organization. • No memory allocation at run time, except for cloning objects. • A contention manager to tune performance. • A choice of strategies: eager/lazy acquire, visible/invisible readers.
Where does RSTM fit in? • [Diagram: layer stack. User application (beginTx { openRO, openRW } endTx), on top of the RSTM library, on top of hardware (atomic load and store, CAS).] • Requires atomic load/store and CAS in hardware. • Provides a C++ “smart pointer” API that can be used to safely access shared data within transactions.
Overview • RSTM Theory • Transaction Semantics • Readers • Writers • RSTM Design • Descriptor • Data Object • Shared Object Handle • RSTM Implementation • Resolving the data object • Open for read-only • Acquire • Open for read-write • Commit • Abort • Performance results • Conclusion
Transaction Semantics • Data is managed at object granularity. • Objects are shadowed rather than changed in place. • Inside a transaction, objects may be opened read-only or read-write. • Objects opened read-write are cloned; objects opened read-only are not. • “Commit” tries to install the clone as the current object. • “Abort” tries to restore the original as the current object. • Transactions may abort each other, but they consult the contention manager (CM) before doing so.
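The shadowing rule above can be illustrated with a minimal single-threaded sketch. It is not RSTM code: the names Account and ShadowedAccount are hypothetical, there is no conflict detection, and heap allocation through std::unique_ptr is used only for brevity. It shows only that a read-write open works on a clone, that commit installs the clone, and that abort discards it.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical payload type, used only for this illustration.
struct Account {
    int balance;
};

// Single-threaded illustration of shadowing: writes go to a clone,
// and the original stays visible until commit.
class ShadowedAccount {
public:
    explicit ShadowedAccount(int balance)
        : current_(std::make_unique<Account>(Account{balance})) {}

    // Read-only open: no clone is made.
    const Account& openRO() const { return *current_; }

    // Read-write open: clone the current object on first use.
    Account& openRW() {
        if (!clone_) clone_ = std::make_unique<Account>(*current_);
        return *clone_;
    }

    // Commit: the clone becomes the current object.
    void commit() {
        if (clone_) current_ = std::move(clone_);
    }

    // Abort: the clone is dropped and the original remains current.
    void abort() { clone_.reset(); }

private:
    std::unique_ptr<Account> current_;
    std::unique_ptr<Account> clone_;
};
```

In this sketch a reader calling openRO() between openRW() and commit() still sees the old balance, which is the property the slide describes.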
Readers • A thread that opens an object for reading may become a “visible” or an “invisible” reader. • “Visible” = visible to writers. • A reader must have a consistent view of its opened objects. • “Consistent” = no writer has made a change that the reader sees in only some of its opened objects. • Inconsistency might cause hardware exceptions and infinite loops, thus: • An invisible reader must, on every “open”, validate all previously opened objects (O(n²) total cost for n objects). • A visible reader must be explicitly aborted by any writer that acquires an object it has open.
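To see where the quadratic cost of invisible reads comes from, here is a small single-threaded sketch. It assumes a hypothetical Handle with a plain version field standing in for the header word, and a hypothetical InvisibleReader class that counts its checks. Real code would read the header atomically and would avoid std::vector, since RSTM avoids run-time allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a shared object handle; `version` plays the
// role of the header word that changes whenever a writer commits.
struct Handle {
    long version = 0;
};

class InvisibleReader {
public:
    // Record the handle, then re-validate every object opened so far.
    // Returns false if any earlier read is stale (the transaction must abort).
    bool open(const Handle& h) {
        reads_.push_back({&h, h.version});
        return validate();
    }

    // One pass over the read list: O(n) per call, so n opens cost O(n^2).
    bool validate() {
        for (const Entry& e : reads_) {
            ++checks_;
            if (e.handle->version != e.seen) return false;
        }
        return true;
    }

    // Number of individual version comparisons performed so far.
    std::size_t checks() const { return checks_; }

private:
    struct Entry {
        const Handle* handle;
        long seen;
    };
    std::vector<Entry> reads_;
    std::size_t checks_ = 0;
};
```

Opening four unchanged objects performs 1 + 2 + 3 + 4 = 10 comparisons in this sketch, which is the n(n+1)/2 growth behind the O(n²) figure.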
Writers • Opening an object for writing involves “acquiring” it. • Acquiring means getting exclusive access to the object. • Writers conflict with other writers and with visible readers. • Visible readers can coexist with each other. • Acquiring can be eager or lazy: • Eager: acquire an object as soon as it is opened. • Lazy: acquire it just before committing the transaction. • Eager acquire aborts doomed transactions immediately, but causes more conflicts. • Lazy acquire lets readers run alongside a writer that is not yet committing, but it has the same consistency issue as invisible reads.
Contention Management • The CM is a thread-local object. • It is notified of transaction events. • It decides what to do on a conflict: • Abort a transaction or spin-wait. • Which transaction to abort, if any. • Example: the “Polka” CM, which prefers writers over readers.
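The shape of such a decision procedure can be sketched as follows. This is a deliberately simple, hypothetical policy (wait a bounded number of times, then abort the other transaction). It is not the Polka manager, and the names Decision, PatientManager, onBegin and onConflict are assumptions made for this illustration rather than RSTM's interface.

```cpp
#include <cassert>

// Possible outcomes of a conflict, as listed on the slide:
// spin-wait, or abort one of the two transactions.
enum class Decision { Wait, AbortOther, AbortSelf };

// Hypothetical policy: tolerate a fixed number of conflicts per
// transaction by waiting, then abort the competing transaction.
// This simple policy never chooses AbortSelf.
class PatientManager {
public:
    explicit PatientManager(unsigned maxWaits) : maxWaits_(maxWaits) {}

    // Event notification: a new transaction starts on this thread.
    void onBegin() { waits_ = 0; }

    // Conflict notification: decide how to proceed.
    Decision onConflict() {
        if (waits_ < maxWaits_) {
            ++waits_;
            return Decision::Wait;
        }
        return Decision::AbortOther;
    }

private:
    unsigned maxWaits_;
    unsigned waits_ = 0;
};
```

Because the manager is thread-local, a sketch like this needs no synchronization of its own; only the resulting abort has to be performed atomically on the other transaction's status.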
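RSTM Design • [Figure: a Shared Object Handle with “header” and “visible readers” words. The header refers to Data Object (New), whose “owner” field refers to the writer's Descriptor and whose “next” field refers to Data Object (Old). The visible readers word refers to two reader Descriptors. The three descriptors belong to Threads 1, 2 and 3.]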
Descriptor • Each thread has a static descriptor that is reused for all of that thread's transactions. • Nested transactions are not supported. • A descriptor holds: • Status: ACTIVE / COMMITTED / ABORTED. • Lists of opened objects: • Visible and invisible reads. • Eager and lazy writes.
Data Object • In addition to its data fields, each shared object holds “owner” and “next” fields. • “owner” is the descriptor of the current writer thread, if any. • “next” is the original object, if this object is a writer-made clone.
Shared Object Handle (1) • Encapsulates a reference to a shared object. • Global variables are handles rather than pointers. • Direct pointers are obtained within a transaction, via “open”. • Holds: • “header” word - identifies the current version of the object. • “visible readers” word – bitmap of the visible readers.
Shared Object Handle (2) • The header is a single word that holds a pointer and a dirty bit. • This takes advantage of address alignment: the low bit of an aligned pointer is always zero, so it is free to hold the flag. • The pointer refers to some data object, “pObj”. • The dirty bit tells whether “pObj” is a clean object or a writer-made clone. • This saves one dereference in the common case of non-conflicting access.
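Packing a pointer and a dirty bit into one word can be sketched with std::atomic<std::uintptr_t>. The Object and Header types and the tryAcquire name below are hypothetical and chosen for illustration; the sketch assumes objects are at least 2-byte aligned, so bit 0 of their address is always zero.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an RSTM data object.
struct Object {
    int value = 0;
};

// One-word header: object address in the upper bits, dirty flag in bit 0.
class Header {
public:
    explicit Header(Object* clean) : word_(pack(clean, false)) {}

    // Read pointer and dirty flag from a single atomic snapshot.
    Object* load(bool& dirty) const {
        std::uintptr_t w = word_.load(std::memory_order_acquire);
        dirty = (w & 1u) != 0;
        return reinterpret_cast<Object*>(w & ~std::uintptr_t{1});
    }

    // Install `clone` as a dirty reference, but only if the header still
    // holds a clean reference to `expected`.
    bool tryAcquire(Object* expected, Object* clone) {
        std::uintptr_t want = pack(expected, false);
        return word_.compare_exchange_strong(want, pack(clone, true),
                                             std::memory_order_acq_rel);
    }

private:
    static std::uintptr_t pack(Object* p, bool dirty) {
        auto bits = reinterpret_cast<std::uintptr_t>(p);
        assert((bits & 1u) == 0 && "object must be at least 2-byte aligned");
        return bits | (dirty ? 1u : 0u);
    }

    std::atomic<std::uintptr_t> word_;
};
```

Because both fields live in one word, a reader learns which object is current and whether it is a clone from a single load, and a writer can swap both with a single CAS.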
Shared Object Handle (3) • “Visible readers” is a bitmap of the visible readers. • Bit i of the bitmap is set if thread i is a visible reader of the object. • This allows getting all readers, or adding a reader, in a single atomic operation. • The bitmap width limits the number of visible readers; all other readers must be invisible.
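A reader bitmap of this kind might look like the sketch below. The ReaderBitmap class and its 64-bit width are assumptions for illustration; the slides do not state the word size. The sketch uses std::atomic's fetch_or and fetch_and instead of the CAS loops shown in the slides' pseudo-code, which is a substitution and not the authors' method.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical visible-reader bitmap: bit i is set while thread i
// is a visible reader of the object.
class ReaderBitmap {
public:
    // Assumed width; more threads than this would have to read invisibly.
    static constexpr unsigned kMaxReaders = 64;

    // Register thread `id` as a visible reader in one atomic step.
    void add(unsigned id) {
        assert(id < kMaxReaders);
        bits_.fetch_or(std::uint64_t{1} << id, std::memory_order_acq_rel);
    }

    // Remove thread `id` from the bitmap in one atomic step.
    void remove(unsigned id) {
        assert(id < kMaxReaders);
        bits_.fetch_and(~(std::uint64_t{1} << id), std::memory_order_acq_rel);
    }

    // A writer obtains the full reader set with a single load.
    std::uint64_t snapshot() const {
        return bits_.load(std::memory_order_acquire);
    }

private:
    std::atomic<std::uint64_t> bits_{0};
};
```

A writer that has acquired the object can iterate over the set bits of snapshot() to find the descriptors it must abort.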
RSTM Implementation • This section provides pseudo-code for the most important STM operations: • Open object for read-only • Open object for read-write • Commit • Abort • The pseudo-code shows methods of the Descriptor class, which is the object that implements RSTM functionality.
Resolving the Data Object
    // Returns the up-to-date data object associated with a handle.
    // If the object has an active owner, calls the CM and returns NULL.
    Object *Descriptor::resolve(Handle *shared) {
        long snapshot = shared->header;
        Object *ptr = (Object *)(snapshot & ~1L);   // mask out the dirty bit (LSB)
        if (snapshot & 1) {                         // dirty: ptr is a writer-made clone
            switch (ptr->owner->m_status) {
            case ACTIVE:                            // owner still running: ask the CM
                m_cm.handleConflict(this, ptr->owner);
                return NULL;
            case COMMITTED:                         // the clone is the current version
                return ptr;
            case ABORTED:                           // the original is the current version
                return ptr->next;
            }
        } else {                                    // clean
            return ptr;
        }
    }
Open for Read-Only
    // Open an object for read-only access
    Object *Descriptor::openRO(Handle *shared) {
        long headerSnapshot = shared->header;
        // find the data object
        Object *ptr;
        do {
            ptr = resolve(shared);
        } while (!ptr);
        if (m_isVisible) {
            m_visibleReads.add(shared);
            // install this tx as a visible reader of the object;
            // read the bitmap once per attempt so the CAS compares against that value
            long readers;
            do {
                readers = shared->readers;
            } while (!CAS(&shared->readers, readers, readers | (1L << m_id)));
            // make sure no writer acquired this object before it could see the CAS above
            if (headerSnapshot != shared->header)
                abort();
        } else {
            m_invisibleReads.add(shared);
        }
        validate();
        return ptr;
    }
Open for Read-Write
    // Open an object for read-write access
    Object *Descriptor::openRW(Handle *shared) {
        // find the data object
        Object *ptr;
        do {
            ptr = resolve(shared);
        } while (!ptr);
        // make a writeable clone
        Object *clone = ptr->clone();
        clone->owner = this;
        clone->next = ptr;
        // eager acquires now; lazy acquires later
        if (m_isEager) {
            acquire(shared, clone);
            m_eagerWrites.add(shared, clone);
        } else {
            m_lazyWrites.add(shared, clone);
        }
        validate();
        return clone;
    }
Acquire
    // Acquire the object
    void Descriptor::acquire(Handle *shared, Object *clone) {
        // replace the header with a dirty reference to the clone
        if (!CAS(&shared->header, shared->header, (long)clone | 1))
            abort();
        // abort all visible readers
        for (int i = 0; i < sizeof(shared->readers) * 8; ++i) {
            if (shared->readers & (1L << i))
                allDescriptors[i]->abort();
        }
        // record this object for cleanup
        m_acquiredObjects.add(<shared, clone>);
    }
Commit
    // Commit a transaction
    void Descriptor::onCommit() {
        validate();
        // acquire now the objects that were lazily opened for read-write
        acquireLazyWrites();
        // Linearization point: if this CAS succeeds,
        // our clones (if any) become the active objects
        CAS(&m_status, ACTIVE, COMMITTED);
        if (m_status == COMMITTED) {
            // replace a dirty reference to our clone
            // with a clean reference to our clone
            for (<shared, clone> in m_acquiredObjects) {
                CAS(&shared->header, clone | 1, clone);
            }
            // remove this thread from the readers bitmap of all visibly opened objects
            for (Handle *shared in m_visibleReads) {
                long readers;
                do {
                    readers = shared->readers;
                } while (!CAS(&shared->readers, readers, readers & ~(1L << m_id)));
            }
        } else {
            abort();
        }
    }
Abort
    // Called when the “Aborted” exception is caught
    void Descriptor::onAbort() {
        // after this CAS, our clones (if any) are discarded
        CAS(&m_status, ACTIVE, ABORTED);
        // clean up the written objects:
        // replace a dirty reference to our clone
        // with a clean reference to the original object
        for (<shared, clone> in m_acquiredObjects) {
            CAS(&shared->header, clone | 1, clone->next);
        }
        // remove this thread from the readers bitmap of all
        // visibly opened objects
        for (Handle *shared in m_visibleReads) {
            long readers;
            do {
                readers = shared->readers;
            } while (!CAS(&shared->readers, readers, readers & ~(1L << m_id)));
        }
    }
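Both the commit and the abort routines hinge on a single CAS on the descriptor's status word: whichever transition out of ACTIVE happens first wins, and the other fails. A minimal sketch of that race, using std::atomic and hypothetical names (TxStatus, tryCommit, tryAbort) that are not part of RSTM:

```cpp
#include <atomic>
#include <cassert>

enum class Status { Active, Committed, Aborted };

// Hypothetical wrapper around a descriptor's status word.
class TxStatus {
public:
    // Attempt Active -> Committed; returns false if the transaction was
    // already aborted (for example by a competing writer).
    bool tryCommit() { return transition(Status::Committed); }

    // Attempt Active -> Aborted; returns false if the transaction
    // has already committed.
    bool tryAbort() { return transition(Status::Aborted); }

    Status get() const { return status_.load(std::memory_order_acquire); }

private:
    // Only a transition out of Active can succeed, so exactly one of
    // commit and abort takes effect.
    bool transition(Status to) {
        Status expected = Status::Active;
        return status_.compare_exchange_strong(expected, to,
                                               std::memory_order_acq_rel);
    }

    std::atomic<Status> status_{Status::Active};
};
```

Since every clone points to its owner's descriptor, flipping this one word changes which version of every acquired object is current at once; the header cleanup loops that follow are only housekeeping.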
Performance Results (1) • Compare ASTM and RSTM (previous work showed that ASTM outperforms DSTM and OSTM). • Platform: 16-processor SunFire 6800 at 1.2GHz. • Use several benchmarks with different configurations: visible/invisible readers, eager/lazy writers. • Each benchmark was run for 10 seconds with 1 to 28 threads. • Contention manager: “Polka”. • Count successful transactions.
Performance Results (2) • RSTM with invisible readers achieves about twice the throughput of ASTM. • Visible readers are expensive because each access opens the root node as a visible reader, which causes cache invalidation. • The only difference between C++ ASTM and RSTM is metadata organization.
Performance Results (3) • In LinkedList, fine-grained locking (FGL) performs badly if #threads > #CPUs, due to preemption. • In LinkedList, ASTM outperforms RSTM, since each writer invalidates objects for many readers. • HashTable allows great concurrency, so RSTM works well (~3 times faster than ASTM).
Performance Results (4) • In RandomGraph and LFUCache, all STMs perform worse than coarse-grained locking (CGL), because these data structures do not allow much concurrency. • Nevertheless, RSTM beats ASTM.
Conclusion • RSTM has a novel metadata organization that reduces overhead through: • One level of indirection instead of the usual two. • Static instead of dynamic data structures. • RSTM provides a variety of policies for conflict detection, so it can be customized for a given workload. • Compared to ASTM, RSTM gives better performance due to better metadata organization.