Transactional Memory • James Larus and Christos Kozyrakis
MOTIVATION • Transition from sequential computing to parallel computing • Achieving optimal performance from multicore computers depends on improving the parallelism of programs. • Find better abstractions for expressing parallel computation and for writing parallel programs • Current programming constructs: threads, locks, semaphores, etc.
TRANSACTIONAL MEMORY • A transaction is a form of program execution. • In parallel programming, TM offers a mechanism that allows portions of a program to execute in isolation, without regard to other, concurrently executing tasks. • TM provides lightweight transactions for threads running in a shared address space. • TM ensures the atomicity and isolation of concurrently executing tasks. • TM provides a basis on which to build parallel abstractions.
TRANSACTIONAL MEMORY • Atomicity • Atomicity ensures that program state changes effected by code executing in a transaction are indivisible from the perspective of other, concurrently executing tasks. • Isolation • Isolation ensures that concurrently executing tasks cannot affect the result of a transaction, so a transaction produces the same answer as when no other task was executing.
PROGRAM MODEL • General TM systems • Provide simple atomic statements that execute a block of code (and the routines it invokes) as a transaction. • Not a replacement for general synchronization such as semaphores or condition variables. • AME (Automatic Mutual Exclusion) • Executes most of a program in transactions • Supports asynchronous programming
ADVANTAGES • TM offers a simpler alternative to mutual exclusion by shifting the burden of correct synchronization from the programmer to the TM system. • A program's author only needs to identify a sequence of operations on shared data that should appear to execute atomically to other, concurrent threads. • Transactions make synchronization composable, which enables the construction of concurrency programming abstractions.
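Composability can be illustrated with a minimal Python sketch. The `atomic()` helper and the account functions are hypothetical names; `atomic()` is implemented here with one global re-entrant lock, which reproduces the observable semantics of atomic blocks ("single global lock atomicity") but none of the concurrency of a real TM system, and it does not roll back partial updates on an exception.

```python
import threading
from contextlib import contextmanager

# Hypothetical stand-in for a TM runtime: one global re-entrant lock gives
# atomic blocks the same observable semantics ("single global lock
# atomicity"), but none of the concurrency a real TM system would allow.
_global_lock = threading.RLock()

@contextmanager
def atomic():
    with _global_lock:
        yield

accounts = {"a": 100, "b": 0}

def withdraw(acct, amount):
    with atomic():
        if accounts[acct] < amount:
            raise ValueError("insufficient funds")
        accounts[acct] -= amount

def deposit(acct, amount):
    with atomic():
        accounts[acct] += amount

def transfer(src, dst, amount):
    # Composition: both atomic operations nest inside one outer atomic block,
    # so no thread can observe money that has left src but not reached dst.
    # With per-account locks, the caller would instead need to know and order
    # both locks to get the same guarantee.
    with atomic():
        withdraw(src, amount)
        deposit(dst, amount)
```

The point of the design is that `transfer` is written without knowing how `withdraw` and `deposit` synchronize internally; nesting atomic blocks is enough.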
LIMITATIONS • Transactions by themselves cannot replace all synchronization in a parallel program • Synchronization is often used to coordinate independent tasks • Consider a producer-consumer relationship: • Transactions can ensure the tasks' shared accesses do not interfere • But if the consumer transaction finds the value is not available, it can only abort and check for the value later. • TM systems provide a guard that prevents a transaction from starting execution until a predicate becomes true. • The retry and orElse constructs of Haskell STM • The trade-offs and programming pragmatics of the TM programming model are still not well understood. • The performance of TM is not yet good enough for widespread use. • Software TM (STM) systems impose considerable overhead on code running in a transaction • HTM systems fall back on software for large transactions
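The guard idea can be approximated in Python in the spirit of Haskell's `retry`. This is a sketch under stated assumptions: `Retry` and `atomically` are hypothetical names, one global condition variable stands in for the TM runtime, and unlike Haskell STM the toy does not discard writes made before `Retry` is raised, so a transaction must test its guard before writing. `orElse` is not modeled.

```python
import threading
from collections import deque

class Retry(Exception):
    """Raised inside a transaction to block until shared state changes (hypothetical)."""

_cond = threading.Condition()  # one global lock + condition stands in for the TM runtime

def atomically(txn):
    # Run txn() in isolation; if it raises Retry, block until some other
    # transaction commits, then re-execute it from the start.
    with _cond:
        while True:
            try:
                result = txn()
            except Retry:
                _cond.wait()
                continue
            _cond.notify_all()  # a commit may have made a waiting guard true
            return result

buffer = deque()

def produce(item):
    atomically(lambda: buffer.append(item))

def consume():
    def txn():
        if not buffer:      # guard: nothing to consume yet
            raise Retry()
        return buffer.popleft()
    return atomically(txn)
```

A consumer that finds the buffer empty no longer has to abort and poll; it sleeps until a producer's transaction commits.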
TRANSACTIONAL MEMORY IMPLEMENTATION • STM (Software Transactional Memory) • HTM (Hardware Transactional Memory) • Most TM systems of both types implement optimistic concurrency control. • The alternative, pessimistic concurrency control, requires a transaction to establish exclusive access to a location before accessing it.
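Optimistic concurrency control can be sketched as a small word-style STM in Python. `TVar`, `Txn`, and `atomically` are hypothetical names: a transaction runs without locking, records the version of each variable it reads, buffers its writes, and at commit validates that nothing it read has changed, re-executing otherwise. A pessimistic system would instead lock each `TVar` before touching it. The sketch validates only at commit (plus a per-variable check on re-reads), so a running transaction may briefly see an inconsistent mix of different variables; only consistent transactions commit.

```python
import threading

class TVar:
    """A transactional variable: a value plus a version bumped on each commit."""
    def __init__(self, value):
        self.value = value
        self.version = 0

_commit_lock = threading.Lock()  # serializes commits, not transaction bodies

class Conflict(Exception):
    """Raised when a transaction detects it has read inconsistent state."""

class Txn:
    def __init__(self):
        self.reads = {}   # TVar -> version first observed (the read set)
        self.writes = {}  # TVar -> new value, buffered until commit (write set)

    def read(self, tv):
        if tv in self.writes:
            return self.writes[tv]  # read-your-own-writes
        with _commit_lock:          # snapshot value and version together
            value, version = tv.value, tv.version
        if self.reads.setdefault(tv, version) != version:
            raise Conflict()        # tv was committed to since our first read
        return value

    def write(self, tv, value):
        self.writes[tv] = value

    def commit(self):
        with _commit_lock:
            # Validation: every TVar we read must still have the version we saw.
            if any(tv.version != v for tv, v in self.reads.items()):
                return False
            for tv, value in self.writes.items():
                tv.value = value
                tv.version += 1
            return True

def atomically(body):
    """Run body(txn) optimistically, re-executing it until it commits."""
    while True:
        txn = Txn()
        try:
            result = body(txn)
        except Conflict:
            continue
        if txn.commit():
            return result
```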
STM • The original STM system • Implemented lock-free, atomic, multi-location operations entirely in software • Required a program to declare in advance the memory locations to be accessed by a transaction
STM • DSTM • Object-granularity, deferred-update STM system • Conflict detection • Early detection • Late detection • Read-write conflicts • Only clones objects that are modified • Read-object list • Conditions for commit • No concurrently executing transaction modified an object read by T • Transaction T is not modifying an object that another transaction is also modifying • Performance of DSTM depends on the workload
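The two commit conditions can be modeled in a short, single-threaded Python sketch. `TObject`, `DSTMTxn`, and the `written_by_others` parameter are hypothetical; real DSTM installs ownership with atomic compare-and-swap on per-object locators, which is not modeled here. Reads only record a version in the read-object list, and only objects opened for writing are cloned.

```python
import copy

class TObject:
    def __init__(self, value):
        self.value = value
        self.version = 0  # bumped each time a transaction commits a new value

class DSTMTxn:
    """Toy DSTM-style transaction (hypothetical names; single-threaded model)."""
    def __init__(self):
        self.read_list = {}  # TObject -> version seen (the read-object list)
        self.clones = {}     # TObject -> private clone, only for modified objects

    def open_read(self, obj):
        if obj in self.clones:
            return self.clones[obj]
        self.read_list.setdefault(obj, obj.version)
        return obj.value  # shared committed value: treat as read-only

    def open_write(self, obj):
        if obj not in self.clones:
            self.read_list.setdefault(obj, obj.version)
            self.clones[obj] = copy.deepcopy(obj.value)  # clone on first write only
        return self.clones[obj]

    def can_commit(self, written_by_others):
        # Condition 1: no concurrent transaction modified an object T read.
        reads_ok = all(obj.version == v for obj, v in self.read_list.items())
        # Condition 2: T is not modifying an object another transaction modifies.
        writes_ok = not any(obj in written_by_others for obj in self.clones)
        return reads_ok and writes_ok

    def commit(self, written_by_others=frozenset()):
        if not self.can_commit(written_by_others):
            return False
        for obj, clone in self.clones.items():
            obj.value = clone  # install the modified clone
            obj.version += 1
        return True
```

A transaction that read an object before another transaction committed a change to it fails condition 1 and must abort.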
STM • Deferred Update Systems • WSTM detects conflicts at word, not object, granularity • Avoids unnecessary conflicts if two transactions access different fields in an object • Extended Java with an atomic statement that executes its block in a transaction • A policy selects which transaction to abort in case of conflict • "Polka" policy: each transaction tracks the number of objects it has opened and uses that count as its priority
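A conflict-resolution policy of this kind can be sketched as a pure Python function. `polka_resolve` and its parameters are hypothetical, and the sketch is a simplification: it uses the count of opened objects as priority, as the slide describes, and adds randomized exponential back-off, which is how Polka is commonly described; it is not a reference implementation.

```python
import random

def polka_resolve(attacker_opened, victim_opened, attempts, base_backoff=1e-6):
    """Simplified Polka-style contention manager (hypothetical signature).

    attacker_opened / victim_opened: objects each transaction has opened so far,
    used as its priority. attempts: how many times the attacker has already
    backed off on this conflict. Returns ("abort_victim", 0.0) or
    ("backoff", seconds).
    """
    # Each back-off counts toward the attacker's priority, so a low-priority
    # transaction backs off at most (victim_opened - attacker_opened) times
    # before it is allowed to abort the victim.
    if attacker_opened + attempts >= victim_opened:
        return ("abort_victim", 0.0)
    # Randomized exponential back-off: the wait window doubles each attempt.
    return ("backoff", random.uniform(0, base_backoff * 2 ** attempts))
```

Favoring the transaction that has opened more objects avoids discarding a large amount of completed work, while the growing back-off count keeps a small transaction from starving.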
STM • Direct Update Systems • Transactions directly modify an object, rather than a copy. • Must record the original value of each modified memory location. • Must prevent a transaction from reading the locations modified by other, uncommitted transactions, thereby reducing the potential for concurrent execution • Require a lock to prevent multiple transactions from updating an object concurrently. • Direct-update STM systems provide forward progress guarantees to an application by detecting and aborting failed or blocked threads.
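The bookkeeping of a direct-update system can be sketched in Python with an undo log and a per-object lock. `TObject` and `DirectUpdateTxn` are hypothetical names; a conflict is simply reported with an exception, and the forward-progress mechanism (detecting and aborting blocked threads) is not modeled.

```python
import threading

class TObject:
    """Shared object updated in place; its lock admits one updating transaction."""
    def __init__(self, **fields):
        self.fields = dict(fields)
        self.lock = threading.Lock()

class DirectUpdateTxn:
    def __init__(self):
        self.undo_log = []  # (obj, field, original value), in write order
        self.locked = []    # objects whose locks this transaction holds

    def read(self, obj, field):
        # Do not let this transaction see another's uncommitted in-place writes.
        if obj not in self.locked and obj.lock.locked():
            raise RuntimeError("conflict: object has uncommitted updates")
        return obj.fields[field]

    def write(self, obj, field, value):
        if obj not in self.locked:
            if not obj.lock.acquire(blocking=False):
                raise RuntimeError("conflict: object locked by another transaction")
            self.locked.append(obj)
        self.undo_log.append((obj, field, obj.fields[field]))  # save original value
        obj.fields[field] = value                               # then update in place

    def commit(self):
        self.undo_log.clear()  # new values are already in memory; just drop the log
        self._release()

    def abort(self):
        for obj, field, old in reversed(self.undo_log):  # undo newest write first
            obj.fields[field] = old
        self.undo_log.clear()
        self._release()

    def _release(self):
        for obj in self.locked:
            obj.lock.release()
        self.locked.clear()
```

Commit is cheap because memory already holds the new values; the cost moves to abort, which must replay the undo log in reverse.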
HTM • Hardware Acceleration for STM • The primary source of overhead for an STM is the maintenance and validation of read sets • Each transactional read invokes an instrumentation routine • HASTM, first proposed by Saha et al. • Provides the STM with two capabilities through per-thread mark bits at the granularity of cache blocks • Software can check if a mark bit was previously set for a given block of memory and that no other thread wrote to the block since it was marked. • Software can query if there were potentially writes by other threads to any of the memory blocks that the thread marked.
HTM • HASTM • Implements mark bits using additional metadata for each block in the per-processor cache of a multicore chip • The read instrumentation call checks and sets the mark bit for the memory block that contains an object's header • If the mark bit was set, indicating that the transaction previously accessed the object, it is not added to the read set again • Validation • Falls back on software validation of the read set only if the hardware query reports that a marked block may have been written by another thread • In HASTM, the mark bits may be lost if a processor is used to run other tasks
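The read barrier and validation fast path can be modeled in software. `MarkBits`, `Txn`, and `BLOCK_SIZE` are hypothetical, the 64-byte block size is an assumption, and the sketch assumes each object header occupies its own cache block; real HASTM keeps the mark bits in cache metadata rather than a Python set.

```python
BLOCK_SIZE = 64  # assumed cache-block size in bytes

class MarkBits:
    """Software model of one thread's HASTM mark bits (hypothetical API)."""
    def __init__(self):
        self.marked = set()     # cache blocks this thread has marked
        self.disturbed = False  # a marked block was written remotely or evicted

    def test_and_set(self, addr):
        block = addr // BLOCK_SIZE
        was_set = block in self.marked
        self.marked.add(block)
        return was_set

    def on_remote_write_or_eviction(self, addr):
        if addr // BLOCK_SIZE in self.marked:
            self.disturbed = True

class Txn:
    def __init__(self, marks):
        self.marks = marks
        self.read_set = []  # (object header address, version seen)

    def read_barrier(self, header_addr, version):
        # Assumes each object header sits in its own cache block, so a set
        # mark bit means this object is already in the read set.
        if not self.marks.test_and_set(header_addr):
            self.read_set.append((header_addr, version))

    def validate(self, current_version):
        if not self.marks.disturbed:
            return True  # hardware reports no possible remote write: skip the walk
        # Otherwise fall back on full software validation of the read set.
        return all(current_version(addr) == v for addr, v in self.read_set)
```

Repeated reads of the same object skip the read-set append, and validation costs nothing unless the hardware signals a possible conflict or lost marks.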
HTM • SigTM • Uses hardware signatures to encode the read set and write set for software transactions • A hardware Bloom filter outside of the caches computes the signatures • Software instrumentation provides the filters with the addresses of the objects • Hardware in the computer monitors coherence traffic for requests for exclusive access to a cache block, which indicates a memory update • The hardware tests if the address in a request is potentially in a transaction's read or write set by examining the transaction's signatures • On a potential conflict, the transaction either aborts or falls back on software validation • Capacity and conflict misses do not cause software validation • May produce false conflicts due to address aliasing in a Bloom filter • SigTM signatures track physical addresses
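A signature is a Bloom filter over addresses, which a short Python model makes concrete. `Signature` and `on_exclusive_request` are hypothetical names, the 64-bit size and two hash functions are assumptions, and SHA-256 is only a software stand-in for the simple address-bit hashing hardware would use. A negative answer is exact; a positive answer may be a false conflict caused by aliasing.

```python
import hashlib

class Signature:
    """Bloom-filter signature over addresses (software model; sizes are assumptions)."""
    def __init__(self, bits=64, hashes=2):
        self.bits = bits
        self.hashes = hashes
        self.word = 0

    def _positions(self, addr):
        digest = hashlib.sha256(addr.to_bytes(8, "little")).digest()
        for i in range(self.hashes):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "little") % self.bits

    def insert(self, addr):
        for p in self._positions(addr):
            self.word |= 1 << p

    def may_contain(self, addr):
        # False means definitely absent; True may be a false conflict (aliasing).
        return all(self.word >> p & 1 for p in self._positions(addr))

def on_exclusive_request(addr, read_sig, write_sig):
    """Check run when another core requests exclusive access to addr."""
    if read_sig.may_contain(addr) or write_sig.may_contain(addr):
        return "conflict"  # abort, or fall back on software validation
    return "ignore"
```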
HTM • HTM systems require no software instrumentation of memory references within transaction code. • Manages data versions and tracks conflicts transparently as software performs ordinary read and write accesses • Rely on a computer’s cache hierarchy and the cache coherence protocol to implement versioning and conflict detection
HTM • Transactional Coherence and Consistency (TCC) • Deferred update HTM that performs conflict detection when a transaction attempts to commit. • Each cache block is annotated with R and W tracking bits • Cache blocks in the write set act as a write buffer and do not propagate the memory updates until the transaction commits. • Commits using a two-phase protocol.
HTM • Validation: hardware acquires exclusive access to all cache blocks in the write set using coherence messages • Commit: the hardware instantaneously resets all W bits in the cache, which atomically commits the updates by this transaction • If validation fails, hardware reverts to a software handler • Conflict Detection
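The two-phase commit over R/W-tracked cache blocks can be modeled in Python. `TCCCache` and `tcc_commit` are hypothetical names; the model treats each block as a single value, runs single-threaded (real TCC arbitrates commits in hardware), and represents "resetting the W bits" as copying buffered data into a `memory` dictionary.

```python
class TCCCache:
    """Per-core cache model with R/W tracking bits (hypothetical names)."""
    def __init__(self, core_id):
        self.core_id = core_id
        self.lines = {}       # block -> {"data": ..., "R": bool, "W": bool}
        self.aborted = False

    def tx_read(self, memory, block):
        line = self.lines.setdefault(block, {"data": memory.get(block), "R": False, "W": False})
        line["R"] = True
        return line["data"]

    def tx_write(self, block, value):
        line = self.lines.setdefault(block, {"data": None, "R": False, "W": False})
        line["data"], line["W"] = value, True  # buffered: memory not updated yet

    def snoop_invalidate(self, block):
        line = self.lines.get(block)
        if line is None:
            return
        if line["R"]:
            self.aborted = True  # we read data another transaction just committed over
        if not line["W"]:
            del self.lines[block]  # drop the stale copy; keep our own buffered write

def tcc_commit(committer, others, memory):
    if committer.aborted:
        return False
    write_set = [b for b, line in committer.lines.items() if line["W"]]
    # Phase 1 (validation): gain exclusive access to every write-set block,
    # invalidating other caches' copies; transactions that read them abort.
    for block in write_set:
        for other in others:
            other.snoop_invalidate(block)
    # Phase 2 (commit): clearing the W bits makes the buffered data visible.
    for block in write_set:
        line = committer.lines[block]
        memory[block] = line["data"]
        line["W"] = False
    for line in committer.lines.values():
        line["R"] = False
    return True
```

Conflicts surface only at commit time: the reader of a block learns it is stale when the writer's validation phase invalidates its copy.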
HTM • Advantages & Limitations • An HTM system can outperform a lock-based STM by a factor of four and the corresponding hardware-accelerated STM by a factor of two • The caches used to track the read set, write set, and data versions have finite capacity and may overflow on a long transaction • The transactional state in caches is large and is difficult to save and restore • Placing implementation-dependent limits on transaction sizes is unacceptable from a programmer's perspective.
SOLUTIONS • The offending (overflowing) transaction executes to completion on its own • The HTM system can update memory directly without tracking the read set, write set, or old data • However, no other transactions can execute concurrently • Virtualized TM (VTM) • Maps the key bookkeeping data structures for transactional execution (read set, write set, write buffer or undo log) to virtual memory • Hardware caches hold the working set of these data structures • Hybrid HTM–STM system • A transaction starts in HTM mode • It is restarted in STM mode with additional instrumentation if hardware resources are exceeded • Provides good performance for short transactions.
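The hybrid HTM–STM control flow can be sketched in Python. All names are hypothetical, the four-location capacity is an arbitrary assumption, and both "modes" are plain software here: the HTM mode buffers writes and raises an overflow once it tracks too many locations, after which the transaction is re-executed from the start in an unbounded STM mode. Concurrency between transactions is not modeled.

```python
class CapacityOverflow(Exception):
    """Simulated HTM abort: the transaction outgrew the hardware's tracking state."""

HTM_CAPACITY = 4  # assumed number of distinct locations the "cache" can track

class HTMMode:
    def __init__(self, memory):
        self.memory, self.buffer, self.touched = memory, {}, set()

    def _track(self, key):
        self.touched.add(key)
        if len(self.touched) > HTM_CAPACITY:
            raise CapacityOverflow()

    def read(self, key):
        self._track(key)
        return self.buffer[key] if key in self.buffer else self.memory[key]

    def write(self, key, value):
        self._track(key)
        self.buffer[key] = value  # speculative; discarded if we overflow

    def commit(self):
        self.memory.update(self.buffer)

class STMMode(HTMMode):
    """Stand-in for the instrumented software path: no capacity limit."""
    def _track(self, key):
        self.touched.add(key)  # a real STM would log reads/writes here

def hybrid_atomic(memory, body):
    """Start in HTM mode; restart in STM mode if hardware resources are exceeded."""
    for mode_cls, name in ((HTMMode, "htm"), (STMMode, "stm")):
        txn = mode_cls(memory)
        try:
            body(txn)
        except CapacityOverflow:
            continue  # hardware state lost: re-execute from the start in software
        txn.commit()
        return name
```

Short transactions never leave the fast path; only those that overflow pay for instrumentation.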
HARDWARE/SOFTWARE INTERFACE FOR TRANSACTIONAL MEMORY • Four interface mechanisms for HTM systems • The first mechanism is a two-phase commit protocol that architecturally separates transaction validation from committing its updates to memory • The second mechanism is transactional handlers that allow software to respond to significant events • The third mechanism is support for closed- and open-nested transactions • The fourth is multiple types of load and store instructions that allow compilers to distinguish accesses to thread-private, immutable, or idempotent data from accesses to truly shared data
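Transactional handlers can be illustrated with a toy Python object. The `Transaction` class and its `on_commit`/`on_abort` methods are a hypothetical API, not an HTM interface. One common use is deferring irrevocable actions such as I/O to a commit handler, so that a rollback never has to undo them, and registering compensation code as abort handlers.

```python
class Transaction:
    """Toy transaction exposing commit/abort handlers (hypothetical API)."""
    def __init__(self):
        self.commit_handlers = []
        self.abort_handlers = []

    def on_commit(self, fn):
        self.commit_handlers.append(fn)

    def on_abort(self, fn):
        self.abort_handlers.append(fn)

    def commit(self):
        # Run deferred actions, e.g. I/O, only once the commit is certain.
        for fn in self.commit_handlers:
            fn()
        self._reset()

    def abort(self):
        # Compensation actions run newest-first, mirroring an undo log.
        for fn in reversed(self.abort_handlers):
            fn()
        self._reset()

    def _reset(self):
        self.commit_handlers.clear()
        self.abort_handlers.clear()
```

On abort, deferred output is simply dropped; only compensation handlers run.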
Open Issues • A transaction that executed an I/O operation may roll back at a conflict, but the I/O itself generally cannot be undone. • Strong and weak atomicity • STM systems generally implement weak atomicity, in which non-transactional code is not isolated from code in transactions • HTM systems, on the other hand, implement strong atomicity • TM must coexist and interoperate with existing programs and libraries
CONCLUSION • TM provides a time-tested model for isolating concurrent computations from each other • Raises the level of abstraction for reasoning about concurrent tasks