Hybrid Transactional Memory
Reza Sherafat
Prof. Cristiana Amza
University of Toronto
Dec 4, 2006
Quick Background Review
• A transaction is a sequence of operations that is performed atomically “as a whole”.
• Life cycle of a transaction:
  • Initialization: start the transaction by storing the current state.
  • Execution:
    • Open objects for read/write;
    • Data modifications are hidden from other threads;
    • Watch for conflicts.
  • Termination: end the transaction with either
    • Successful completion (commit): the modifications take effect and become visible to other threads; or
    • Unsuccessful completion (abort): the modifications are discarded.
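The life cycle above can be illustrated with a minimal, single-threaded sketch. The `Transaction` class and its method names are hypothetical and not part of any TM system discussed in these slides. Conflict detection is deliberately left out; the sketch only shows that speculative writes stay hidden until commit and vanish on abort.

```python
class Transaction:
    """Toy transaction over a shared dict (all names are hypothetical)."""

    def __init__(self, store):
        self.store = store        # shared state, here a plain dict
        self.writes = {}          # speculative modifications, hidden from others
        self.status = "ACTIVE"    # initialization: the transaction has started

    def read(self, key):
        # A transaction sees its own pending writes before the shared state.
        return self.writes.get(key, self.store.get(key))

    def write(self, key, value):
        self.writes[key] = value  # buffered; the shared store is untouched

    def commit(self):
        self.store.update(self.writes)  # modifications take effect
        self.status = "COMMITTED"

    def abort(self):
        self.writes.clear()       # modifications are discarded
        self.status = "ABORTED"


store = {"x": 1}

t1 = Transaction(store)
t1.write("x", 2)
hidden = store["x"]       # the speculative write is not visible yet
t1.commit()
visible = store["x"]      # the committed value is now visible

t2 = Transaction(store)
t2.write("x", 99)
t2.abort()
after_abort = store["x"]  # unchanged by the aborted transaction
```

A real TM would also track which objects were opened so that conflicts can be detected during execution.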
Outline
• Motivations
• Hybrid Transactional Memory
• Implementation
• Evaluations
• Conclusions
Motivations
• In parallel programs we must protect concurrent access to shared data.
• Locks are widely used, but they have several problems:
  • Performance (speedup)
    • Overhead of locking (wait time, acquire, release)
    • Granularity (hard to balance wait time against overhead)
    • Over-serialization
  • Programming
    • Hard for programmers to write and debug
    • Deadlocks are hard to avoid
  • Other problems
    • Priority inversion
    • A process may crash while holding a lock
Transactional Memory (TM)
• Main idea: non-blocking execution
  • Execute each concurrent transaction speculatively;
  • Apply changes when the transaction completes successfully.
• Non-conflicting access to shared objects within transactions is allowed:
  • Once a conflict is detected, the transaction rolls back and its state is restored (abort).
• TM support is provided through an API:
  • Start a transaction
  • Abort/commit a transaction
  • Wrap objects in TM objects
• Properties of transactions:
  • Atomic: a transaction is like a single unit (all-or-nothing)
  • Serializable: concurrent transactions are performed in some serial order
  • Obstruction-free: guarantees progress of a process in the absence of contention
  • No deadlock
Conflicting Access to Shared Data
• Conflicts in accessing shared data may result in data inconsistencies.
• A conflict happens when an object that other transactions have accessed (for read or write) is updated before those transactions commit.
  • Multiple readers are allowed
  • Only one writer is allowed at a time
• The system ensures that transactions that access the same data do not conflict. If no conflicts occur, the transactions are serializable.
• Conflict resolution: once a conflict is detected, we can get a serializable execution by aborting all but one of the conflicting transactions.
• Speculative modifications of aborted transactions are discarded, and the values from before the transaction started become valid again.
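The reader/writer rule above can be written as a small predicate. This is a sketch under stated assumptions: the `(object, mode)` tuple representation and the "earlier transaction wins" resolution policy are illustrative choices, not something the slides specify. The slides only require that all but one of the conflicting transactions abort.

```python
def conflicts(access_a, access_b):
    """Two accesses conflict if they touch the same object and at least one writes."""
    obj_a, mode_a = access_a
    obj_b, mode_b = access_b
    return obj_a == obj_b and "write" in (mode_a, mode_b)


def resolve(transactions):
    """Split transactions into survivors and aborted ones.

    transactions maps a name to a list of (object, mode) accesses.
    Assumed policy: a transaction is aborted if it conflicts with any
    transaction that was already accepted (earlier in dict order).
    """
    survivors, aborted = [], []
    for name, accesses in transactions.items():
        clash = any(
            conflicts(a, b)
            for kept in survivors
            for a in accesses
            for b in transactions[kept]
        )
        (aborted if clash else survivors).append(name)
    return survivors, aborted


txs = {
    "T1": [("A", "read")],
    "T2": [("A", "read")],   # a second reader of A is fine
    "T3": [("A", "write")],  # a writer conflicts with the readers
}
survivors, aborted = resolve(txs)
```

Real contention managers use other policies (for example priority or age based); the predicate itself is the part taken from the slide.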
Hybrid TM
Each approach should implement TM semantics: start transaction, open object, detect conflicts, abort, commit.
• Hardware-based approaches:
  • Bounded number of locations
  • Maintain versions in cache → low overhead
• Software-based approaches:
  • Unbounded number of locations can be accessed within a transaction
  • Slow due to the overhead of maintaining multiple copies
    • Potentially by orders of magnitude
• Hybrid: combines the benefits of both approaches
  • High performance (unless the transaction exceeds HW limits)
  • Support for unlimited transactional objects
  • Handles simultaneous data access from HW/SW modes
Implementations
• Two modes for executing transactions: HW vs. SW.
• In general, HW mode is preferred (it is faster), unless we run out of resources.
• Naïve approach: the system has a universal mode of operation.
• A better approach: transactions have two modes to choose from.
  • Each transaction separately chooses its mode of operation when it starts.
  • Better performance and utilization of system resources
  • Other policies may also be applied to choose the mode: if the transaction fails a number of times (e.g., 3), then start it in SW mode.
• Pure HW/SW implementations must be tailored so that they can coexist.
  • Objects may be accessed simultaneously by transactions in HW and SW modes.
  • Interoperability is a must.
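The retry-based policy mentioned above (fall back to SW mode after a number of failures, e.g. 3) might look like the following sketch. The function `run_hybrid`, the two exception types, and the idea of passing separate HW and SW bodies are all hypothetical; the sketch also assumes the SW body handles its own retries.

```python
HW_RETRY_LIMIT = 3  # the example threshold from the slide


class Conflict(Exception):
    """Hypothetical signal: the HW transaction was aborted by a conflict."""


class HardwareLimitExceeded(Exception):
    """Hypothetical signal: the transaction does not fit in HW resources."""


def run_hybrid(body_hw, body_sw, retry_limit=HW_RETRY_LIMIT):
    """Try HW mode up to retry_limit times, then fall back to SW mode."""
    for _ in range(retry_limit):
        try:
            return "HW", body_hw()
        except HardwareLimitExceeded:
            break      # retrying in HW cannot help; go to SW immediately
        except Conflict:
            continue   # aborted by a conflict; try HW mode again
    return "SW", body_sw()


hw_calls = []

def always_conflicts():
    hw_calls.append("hw")
    raise Conflict()

mode, value = run_hybrid(always_conflicts, lambda: "done")

big_calls = []

def too_big():
    big_calls.append("hw")
    raise HardwareLimitExceeded()

big_mode, big_value = run_hybrid(too_big, lambda: "done")
```

Distinguishing a resource overflow from a conflict lets the fallback happen after a single HW attempt when retrying could never succeed.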
Hardware TM
A HW-TM scheme that can be used for the Hybrid implementation; it relies on the standard cache coherence protocol and some additional components.
• The cache coherence protocol handles data consistency across multiple processors:
  • Only one processor has permission to write to a cache line;
  • No processor can read a line that another processor has permission to write to.
• Additional components on each processor store speculative data and check for conflicts:
  • ISA extensions
    • Instructions for: transactional begin, commit, abort, load/store, etc.
  • Additional components on the processor chip (in parallel with the L1 cache)
    • Transactional buffer: old,
    • Transactional state table: state of the contexts (threads) running on the processor
• All memory accesses within a transaction are done transactionally.
HW-TM
• The Old field keeps speculative values.
• Transactional semantics:
  • Start transaction: the transactional state for that context is set to SELECT, ALL.
  • Abort: the exception flag is set, the corresponding read/write bits are cleared, and speculatively written data is invalidated.
  • Commit: update the transactional state.
  • Detect conflicts: read/write bit vector.
• If the exception flag is set, any attempt by the transaction to commit or load/store results in a trap that is handled by the exception handler.
• Question: How is abort implemented across multiple processors? Through the cache coherence protocol (CCP)!
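Conflict detection with read/write bit vectors can be sketched as follows. This is a software model under assumptions, not the hardware design: it assumes one read bit and one write bit per hardware context on each buffer line, and a "requester wins" policy in which the conflicting contexts have their bits cleared (standing in for an abort). The names `BufferLine` and `access` are hypothetical.

```python
class BufferLine:
    """One transactional buffer line with per-context read/write bits (assumed layout)."""

    def __init__(self):
        self.read_bits = 0   # bit i set: context i has read this line
        self.write_bits = 0  # bit i set: context i has written this line


def conflicting_contexts(line, ctx, is_store):
    """Return a bitmask of the other contexts that conflict with this access."""
    others = ~(1 << ctx)
    mask = line.write_bits & others          # any other writer always conflicts
    if is_store:
        mask |= line.read_bits & others      # a store also conflicts with readers
    return mask


def access(line, ctx, is_store):
    """Perform a transactional load/store; return the bitmask of aborted contexts."""
    victims = conflicting_contexts(line, ctx, is_store)
    # Assumed policy: the requester wins and the victims' bits are cleared.
    line.read_bits &= ~victims
    line.write_bits &= ~victims
    if is_store:
        line.write_bits |= 1 << ctx
    else:
        line.read_bits |= 1 << ctx
    return victims


line = BufferLine()
r0 = access(line, 0, is_store=False)  # first reader: no conflict
r1 = access(line, 1, is_store=False)  # second reader: still no conflict
w2 = access(line, 2, is_store=True)   # writer conflicts with both readers
```

In the scheme described on the slides, an abort also sets the exception flag of the victim context; the sketch models only the bit-vector bookkeeping.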
Quick Review of DSTM
[Figure: before accessing an object within a transaction. Labels: Object Pointer; State Pointer, Old, New; State; Object Contents; Valid, Copy, Modify.]
Software TM
• Uses a locator similar to DSTM:
  • Redirection and object copying.
• The locator also keeps track of the readers.
  • As opposed to local hash tables to store the last data value in each read transaction.
  • This helps early abort, and avoids validation when committing.
• A locator consists of:
  • Valid field
  • Write state (one)
  • Read state (multiple)
  • Old/new objects
  • Object size
[Figure: a locator object in Hybrid-TM]
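The locator fields listed above can be modelled roughly as a small data structure. This is a single-threaded sketch with hypothetical names (`TxState`, `Locator`, `TMObject`); it omits the object-size field, and a real implementation would install a new locator with an atomic compare-and-swap. The "current version" rule follows the DSTM convention: the new copy is valid only if the writer committed.

```python
import copy
from dataclasses import dataclass, field

ACTIVE, COMMITTED, ABORTED = "ACTIVE", "COMMITTED", "ABORTED"


@dataclass
class TxState:
    status: str = ACTIVE


@dataclass
class Locator:
    writer: TxState                               # write state (one)
    old: object                                   # last committed version
    new: object                                   # writer's speculative copy
    readers: list = field(default_factory=list)   # read states (multiple)
    valid: bool = True

    def current(self):
        # The new copy counts only once the writer has committed.
        return self.new if self.writer.status == COMMITTED else self.old


class TMObject:
    def __init__(self, value):
        self.locator = Locator(writer=TxState(COMMITTED), old=None, new=value)

    def open_read(self, tx):
        loc = self.locator
        if loc.writer is tx:
            return loc.new
        if loc.writer.status == ACTIVE:
            loc.writer.status = ABORTED           # abort the conflicting writer
        loc.readers.append(tx)                    # the locator tracks its readers
        return loc.current()

    def open_write(self, tx):
        loc = self.locator
        if loc.writer is tx:
            return loc.new
        for other in [loc.writer, *loc.readers]:
            if other is not tx and other.status == ACTIVE:
                other.status = ABORTED            # software aborts software
        base = loc.current()
        self.locator = Locator(writer=tx, old=base, new=copy.deepcopy(base))
        return self.locator.new


obj = TMObject({"n": 0})
t1, t2 = TxState(), TxState()
v1 = obj.open_write(t1)
v1["n"] = 1                  # speculative change by t1
v2 = obj.open_write(t2)      # t2 aborts t1 and starts from the committed value
v2["n"] = 2
t2.status = COMMITTED        # commit is a single status update
```

Because readers are recorded in the locator, a writer can abort them early instead of each reader validating its read set at commit time.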
Putting Things Together
• Transactions in HW may conflict with those in SW, and vice versa.
• Opening an object in HW:
  • [read the TMObject pointer transactionally]
  • Abort all conflicting HW/SW transactions
• Opening an object in SW:
  • Create a state object, and load it transactionally
  • Abort conflicting HW/SW transactions
• Hardware aborts Hardware
  • A load/store (transactional by default) causes an abort.
• Software aborts Hardware
  • When SW opens a TMObject, it assigns it a new locator. Since the object has been read transactionally by the HW transaction, that transaction is aborted.
• Hardware aborts Software
  • When HW opens a TMObject, it writes ABORTED to the state of the transaction that has this object open.
• Software aborts Software
  • Write ABORTED to the states reached through the reader/writer pointers.
Software aborts Hardware
[Figure: Thread 1 and Thread 2 run in the hardware mode and modify the object contents in place; Thread 3 runs in the software mode and copies, then modifies. The conflict is detected by the threads in the hardware mode. Labels: Object Pointer; State Pointer, Old, New; State; Object Contents.]
Evaluations
• Three microbenchmarks
  • VR: small critical section (overhead of starting/committing transactions)
  • HT: simultaneous lookup operations (per-object overhead of transactions)
  • GU: coarse-grained locking vs. transactional memory
• For each case, two scenarios: low and high contention
• Compare four synchronization implementations
  • Lock
  • Pure Hardware Transactional Memory
  • Pure Software Transactional Memory
  • Hybrid Transactional Memory
Evaluations (Hybrid Execution)
• In all cases of hybrid execution, the ratio of SW-mode to HW-mode transactions is very small.
• This is because the transactional buffer is large relative to the size of the transactional objects. (Is this realistic?)
• Since most transactions run in HW mode, the results do not give a good view of the overhead of the slow SW mode.
Evaluations (VR)
• When the number of processors grows, contention does not grow significantly.
• This is because the transactions are too small (conflicts rarely happen).
Evaluations (HT)
• It is true that several lookup operations can be performed simultaneously; however, those operations are all rolled back together once a conflict with a writer occurs.
  • This seems to be significant for slightly longer transactions.
  • The lock performance is better.
• The paper claims that reader-writer locks would achieve similar behavior;
  • I expect they would perform much better, since concurrent operations that are already underway will not be undone.
Evaluations (GU)
• Why does the execution time decrease in the lock implementation from GU-low to GU-high?
  • It is usually the opposite!
  • Do the locks have back-offs?
Conclusions
• Transactional memory outperforms lock-based synchronization in most cases.
• The Hybrid Transactional Memory approach gives a good balance between the scalability of SW and the performance of HW.
  • Requires only modest hardware support (transactional buffer, state table)
  • Within system limits: good performance for most transactions
  • Exceeding system limits: falls back to software mode when a transaction cannot complete within the hardware bounds
• More needs to be done to ensure progress.
• Nested transactions?
• Additional limits for the HW:
  • Contexts
• Hybrid has limitations:
  • Uses the transactional buffer
• I am not sure how the non-blocking mechanism is implemented across multiple processors.