Hybrid Transactional Memory
Sanjeev Kumar (Intel Labs), Michael Chu (University of Michigan), Christopher Hughes (Intel Labs), Partha Kundu (Intel Labs), Anthony Nguyen (Intel Labs)

Promise of Transactional Memory (TM): Simplify Parallel Programming
• Maintain consistency in the presence of errors
• Easier to program
• Compose naturally
• Easier to get parallel performance
• No deadlocks
• Avoid priority inversion and convoying
• Supports fault tolerance
With locks:
    lock(l1); lock(l2);
    A = A - 10;
    B = B + 10;
    unlock(l1); unlock(l2);
    ...
    if ( error ) recovery_code();
With a transaction:
    transaction {
        A = A - 10;
        B = B + 10;
        ...
        if ( error ) abort_transaction;
    }

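The contrast between hand-written recovery code and `abort_transaction` can be sketched in a few lines of Python. This is only a model of the semantics, not of any TM implementation: the names `transaction` and `AbortTransaction` are hypothetical, and a single global lock stands in for real conflict detection.

```python
import threading
from contextlib import contextmanager


class AbortTransaction(Exception):
    """Hypothetical signal used to abort the enclosing transaction."""


# Assumption: one global lock replaces real TM conflict detection.
_global_lock = threading.Lock()


@contextmanager
def transaction(store):
    """Run the body atomically; undo every write to `store` on abort."""
    with _global_lock:
        snapshot = dict(store)      # old version kept for rollback
        try:
            yield store
        except AbortTransaction:
            store.clear()
            store.update(snapshot)  # abort: none of the writes survive


def transfer(accounts, amount):
    with transaction(accounts) as acc:
        acc["A"] -= amount
        acc["B"] += amount
        if acc["A"] < 0:            # plays the role of "if ( error )"
            raise AbortTransaction()


accounts = {"A": 25, "B": 0}
transfer(accounts, 10)    # commits
transfer(accounts, 100)   # aborts, both updates are rolled back
print(accounts)
```

No recovery code is written by hand: the abort path restores both `A` and `B` together.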
Flavors of Transactional Memory
• Easier to program
• Compose naturally
• Easier to get parallel performance
• No deadlocks
• Maintain consistency in the presence of errors
• Avoid priority inversion and convoying
• Supports fault tolerance
Flavors: Basic, Support programmer abort, Support nonblocking
Our Work: Efficient support for a TM that supports all these features

TM Implementations
Requires versioning support and conflict detection
• Hardware approach [ Herlihy’93 ]
  • Bounded number of locations
  • Maintain versions in cache → Low overhead
• Pure-software approach [ Herlihy’03, Harris’03 ]
  • Unbounded number of locations can be accessed within a transaction
  • Slow due to overhead of maintaining multiple copies
    • Potentially orders of magnitude
• Unbounded hardware approach [ Hammond’04, Ananian’05, Rajwar’05, Moore’06 ]
  • Require significant hardware support
  • Discussed in more detail in the paper

Hardware vs. Software TM
Hardware Approach
• Low overhead
  • Buffers transactional state in cache
• More concurrency
  • Cache-line granularity
• Bounded resource
• Assembly
• Within a module
• Useful BUT limited to library writers
Software Approach
• High overhead
  • Uses object copying to keep transactional state
• Less concurrency
  • Object granularity
• No resource limits
• High-level languages
• Across modules
• Useful BUT limited to special data structures
Neither is satisfactory for broader use

This Work
A Hybrid Transactional Memory Scheme
• Requires modest hardware support
  • Changes are localized
• Supports unbounded number of locations
• Performance of hardware when within hardware resource limits (low overhead of pure hardware TM)
• Gracefully fall back to software if the hardware resource limits are exceeded (unbounded resources of pure software TM)
Experimentally demonstrate effectiveness of our approach

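The fallback behavior described on this slide can be illustrated with a small single-threaded model. Everything here is an assumption made for illustration: `TxContext`, `HardwareLimitExceeded`, and `run_transaction` are hypothetical names, the "hardware" mode is just a write buffer with a location limit, and conflicts between threads are not modeled.

```python
class HardwareLimitExceeded(Exception):
    """Hypothetical signal: the transaction overflowed the hardware buffer."""


class TxContext:
    """Buffers writes; `limit` bounds the number of distinct locations."""

    def __init__(self, memory, limit=None):
        self.memory = memory
        self.limit = limit          # None models the unbounded software mode
        self.writes = {}
        self.touched = set()

    def _touch(self, addr):
        self.touched.add(addr)
        if self.limit is not None and len(self.touched) > self.limit:
            raise HardwareLimitExceeded(addr)

    def read(self, addr):
        self._touch(addr)
        return self.writes.get(addr, self.memory[addr])

    def write(self, addr, value):
        self._touch(addr)
        self.writes[addr] = value

    def commit(self):
        self.memory.update(self.writes)


def run_transaction(memory, body, hw_limit):
    """Try the body in hardware mode first, then fall back to software mode."""
    hw = TxContext(memory, limit=hw_limit)
    try:
        body(hw)
        hw.commit()
        return "hardware"
    except HardwareLimitExceeded:
        pass                        # buffered writes are dropped, memory is untouched
    sw = TxContext(memory)
    body(sw)
    sw.commit()
    return "software"


def small(tx):
    tx.write(0, tx.read(0) + 1)


def large(tx):
    for addr in range(8):
        tx.write(addr, tx.read(addr) + 1)


memory = {addr: 0 for addr in range(8)}
mode_small = run_transaction(memory, small, hw_limit=4)
mode_large = run_transaction(memory, large, hw_limit=4)
print(mode_small, mode_large, memory)
```

The small transaction fits within the modeled limit and completes in hardware mode; the large one overflows, leaves memory unchanged, and is re-executed in software mode.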
Outline
• Motivation
• Proposed Architectural Support
• Hybrid Transactional Memory
• Performance Evaluation
• Conclusions

ISA Extensions
• Start of a Transaction
  • Begin Transaction All ( XBA ) or Select ( XBS )
  • Save Register State ( SSTATE )
  • Specify handler on abort due to conflict ( XHAND )
• During a Transaction
  • Perform memory loads and stores
  • Override defaults ( LDX, STX, LDR, STR )
• On Transaction Abort
  • Explicit Abort Transaction ( XA )
  • Restore Register State ( RSTATE )
• On Transaction Commit
  • Commit Transaction ( XC )

Baseline CMP Architecture
[Figure: several cores, each with its own L1 $, connected through an Interconnect to the L2 $]
Our proposed changes: Modest and Localized
• Modifications to: Core, L1 $
• No changes to: Interconnect, Coherence Protocol, L2 $, Memory

Hardware Support for TM
Three requirements:
• Maintain two versions
• Detect conflict
  • Same core: Tag
  • Another core: Cache coherence
• Atomic commit and abort
Bounded
• Capacity of TM $
• Associativity of TM $ and L2
[Figure: the Core sends Regular Accesses to the L1 $ (Tag, Data) and Transactional Accesses to the Transactional $ (Tag, Addl. Tag, Old Data, New Data); both connect To Interconnect]

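A toy software model may help make the "two versions" and "bounded" points concrete. It is a sketch under stated assumptions, not the proposed hardware: `TransactionalCache` and `CapacityExceeded` are hypothetical names, only capacity is modeled (associativity and coherence-based conflict detection are left out), and memory is a plain dictionary.

```python
class CapacityExceeded(Exception):
    """Hypothetical signal: the transaction needs more lines than the TM $ holds."""


class TransactionalCache:
    """Keeps an (old, new) pair per address touched by the running transaction."""

    def __init__(self, memory, capacity):
        self.memory = memory
        self.capacity = capacity
        self.lines = {}             # addr -> [old data, new data]

    def _entry(self, addr):
        if addr not in self.lines:
            if len(self.lines) >= self.capacity:
                raise CapacityExceeded(addr)
            value = self.memory[addr]
            self.lines[addr] = [value, value]
        return self.lines[addr]

    def load(self, addr):
        return self._entry(addr)[1]

    def store(self, addr, value):
        self._entry(addr)[1] = value

    def commit(self):
        # Publish every new version, then forget the transactional state.
        for addr, (_old, new) in self.lines.items():
            self.memory[addr] = new
        self.lines.clear()

    def abort(self):
        # Drop the new versions; memory still holds the old data.
        self.lines.clear()


memory = {0: 100, 1: 0, 2: 7}
cache = TransactionalCache(memory, capacity=2)

cache.store(0, cache.load(0) - 10)
cache.store(1, cache.load(1) + 10)
before_commit = dict(memory)        # other observers still see the old data
cache.commit()

cache.store(0, 0)
cache.abort()                       # the store to address 0 is discarded

overflowed = False
try:
    for addr in (0, 1, 2):          # three locations, capacity is two
        cache.load(addr)
except CapacityExceeded:
    cache.abort()
    overflowed = True
print(before_commit, memory, overflowed)
```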
Outline
• Motivation
• Proposed Architectural Support
• Hybrid Transactional Memory
  • Existing pure software scheme
  • Our hybrid scheme
• Performance Evaluation
• Conclusions

Pure Software TM [ Herlihy’03 ]
• We use this Pure Software TM as a starting point
• Implemented without any special architectural support using two techniques
  • Use copies of objects to keep transactional state
    • Make modifications on the copy during a transaction
  • Add a level of indirection
    • Switch the versions when a transaction is committed
[Figure: the Object Pointer refers to a record holding a State Pointer and Old and New pointers; the State Pointer refers to the State, and Old and New each refer to Object Contents]

Pure Software TM Scheme Cont’d
Before accessing an object within a transaction
[Figure: the Object Pointer is switched (X) from the existing record (State Pointer, Old, New) to a new record; the Valid Object Contents are copied (Copy), and the transaction modifies the copy (Modify)]

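The object-copying and indirection steps on these two slides can be sketched as follows. This is a simplified illustration in the spirit of the cited scheme, not its actual code: the class names (`Transaction`, `Locator`, `TMObject`) are hypothetical, compare-and-swap is emulated with a Python lock, and the contention policy (always abort the current owner) is an assumption chosen for brevity.

```python
import copy
import threading

ACTIVE, COMMITTED, ABORTED = "active", "committed", "aborted"


class Transaction:
    """Holds the State word that decides which object version is valid."""

    def __init__(self):
        self.state = ACTIVE
        self._lock = threading.Lock()   # emulates an atomic compare-and-swap

    def _cas(self, expected, new):
        with self._lock:
            if self.state != expected:
                return False
            self.state = new
            return True

    def commit(self):
        return self._cas(ACTIVE, COMMITTED)

    def abort(self):
        return self._cas(ACTIVE, ABORTED)


class Locator:
    """The record behind the Object Pointer: State Pointer, Old, New."""

    def __init__(self, owner, old, new):
        self.owner, self.old, self.new = owner, old, new

    def valid(self):
        # New is valid only once its owner has committed.
        return self.new if self.owner.state == COMMITTED else self.old


class TMObject:
    def __init__(self, contents):
        creator = Transaction()
        creator.commit()
        self.locator = Locator(creator, None, contents)
        self._lock = threading.Lock()   # emulates CAS on the Object Pointer

    def _cas_locator(self, expected, new):
        with self._lock:
            if self.locator is not expected:
                return False
            self.locator = new
            return True

    def read_valid(self):
        return self.locator.valid()

    def open_for_write(self, tx):
        while True:
            if tx.state != ACTIVE:
                raise RuntimeError("transaction is no longer active")
            current = self.locator
            if current.owner is tx:
                return current.new      # already opened by this transaction
            if current.owner.state == ACTIVE:
                current.owner.abort()   # assumed policy: abort the owner
            valid = current.valid()
            fresh = Locator(tx, valid, copy.deepcopy(valid))
            if self._cas_locator(current, fresh):
                return fresh.new        # the transaction modifies this copy


a = TMObject({"balance": 100})
b = TMObject({"balance": 0})

t1 = Transaction()
a.open_for_write(t1)["balance"] -= 10
b.open_for_write(t1)["balance"] += 10
seen_before_commit = a.read_valid()["balance"]
t1.commit()                             # one state change switches both objects

t2 = Transaction()
a.open_for_write(t2)["balance"] = 0
t2.abort()                              # the copy made by t2 is never valid
print(seen_before_commit, a.read_valid(), b.read_valid())
```

Committing changes a single State word, which is why both objects switch versions at the same instant.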
Our Hybrid Transactional Memory
• Two modes: Hardware and Software mode
• The two modes need to coexist
  • Non-solution: Make all threads transition modes in lockstep
• Avoid versioning overheads (allocation and copying) in the hardware mode
  • Still incur the indirection overheads
• Tricky because it needs to bridge the hardware and software schemes
  • Hardware mode needs to modify data in-place
    • Pure Software TM assumes data is never modified in-place
  • Different sharing granularity
    • Cache-line (Hardware) vs. Object (Software)
  • Different conflict detection scheme
    • Data (Hardware) vs. State (Software)

Hybrid Scheme Example
• In the Hardware Mode: Modify in place (Thread 1: HW mode, Thread 2: HW mode)
• In the Software Mode: Copy and Modify (Thread 3: SW mode)
• Conflict detected by the threads in the hardware mode
[Figure: Object Pointer, records (State Pointer, Old, New), State, and Object Contents as accessed by the three threads; one link is marked X]

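The difference between the two modes can be shown with a sequential toy model. It rests on several assumptions that the slides do not spell out: the function names are hypothetical, the hardware-mode access simply gives up when it finds an active software-mode owner (the real conflict policy may differ), and neither the hardware buffering nor the concurrency between threads is modeled.

```python
import copy

ACTIVE, COMMITTED, ABORTED = "active", "committed", "aborted"


class Conflict(Exception):
    """Hypothetical signal: a hardware-mode access met an active software owner."""


class TxState:
    def __init__(self, state=ACTIVE):
        self.state = state


class Locator:
    def __init__(self, owner, old, new):
        self.owner, self.old, self.new = owner, old, new


class TMObject:
    def __init__(self, contents):
        self.locator = Locator(TxState(COMMITTED), None, contents)


def valid_contents(obj):
    loc = obj.locator
    return loc.new if loc.owner.state == COMMITTED else loc.old


def hw_write(obj, field, value):
    """Hardware mode: follow the indirection, check State, modify in place."""
    if obj.locator.owner.state == ACTIVE:
        raise Conflict()                # assumed policy: the hardware access aborts
    valid_contents(obj)[field] = value  # no allocation, no copy


def sw_open_for_write(obj, tx):
    """Software mode: copy the valid contents and hand out the copy."""
    valid = valid_contents(obj)
    obj.locator = Locator(tx, valid, copy.deepcopy(valid))
    return obj.locator.new


obj = TMObject({"x": 1})
original = valid_contents(obj)

hw_write(obj, "x", 2)
modified_in_place = valid_contents(obj) is original

tx = TxState()
private = sw_open_for_write(obj, tx)
private["x"] = 3
worked_on_copy = private is not original

try:
    hw_write(obj, "x", 4)
    hw_conflict = False
except Conflict:
    hw_conflict = True

tx.state = COMMITTED                    # the software transaction commits
print(modified_in_place, worked_on_copy, hw_conflict, valid_contents(obj))
```

The hardware-mode path still pays for the indirection through the record, but it never allocates or copies object contents.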
Hybrid Scheme Summary
[Figure: Object Pointer, record (State Pointer, Old, New), State, and Object Contents]

Experimental Framework
• Infrastructure
  • Cycle-accurate execution-driven multi-core simulator
  • Modified GCC
• Three microbenchmarks
  • Two scenarios: Low and High Contention
• Compare four synchronization implementations
  • Lock
  • Pure Hardware Transactional Memory
  • Pure Software Transactional Memory
  • Hybrid Transactional Memory

Performance
[Figure: Normalized Execution Time vs. Number of Cores. Benchmark: Vector-Reduce. Contention: Low]

Conclusions
• Transactional Memory is a promising approach
  • Makes parallel programming an easier task
  • Easier to achieve parallel speedup
• Hybrid Transactional Memory approach works
  • Requires only modest hardware support
  • Common case: Good performance for most transactions
  • Uncommon case: Graceful fallback to software mode when a transaction cannot complete within the hardware bounds

Transactions
A synchronization mechanism to coordinate accesses to shared data by concurrent threads (an alternative to locks)
Transaction: A group of operations on shared data
    transaction {
        A = A - 10;
        B = B + 10;
        ...
        if (error) abort_transaction;
    }
An API Enhancement:
1. Abort in the middle of a transaction
   • On encountering an error

Transactional Memory (TM)
• A transaction satisfies the following properties
  • Atomicity: All-or-nothing
    • On Commit: all operations become visible
    • On Abort: none of the operations are performed
  • Isolation (Serializable)
    • The transactions committed appear to have been performed in some serial order
• Additional Properties
  • Optimistic concurrency control
    • Necessary for achieving good parallel speedup
  • Non-blocking (Optional)
    • Avoid Priority Inversion
    • Avoid Convoying

Advantage 1: Performance
Locks
• Serialized on locks
• Finer-granularity locks help
  • Burden on programmer
Transactions
• Optimistically execute concurrently
• Abort and restart on data conflict
  • Automatically done by runtime
[Figure: execution of operations on A, B, C, and D with Locks versus Transactions; a Data Conflict is marked]

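The "abort and restart on data conflict" loop can be sketched with a versioned value. This is a generic optimistic-concurrency illustration rather than the runtime described in the talk: `VersionedCell` and `atomically` are hypothetical names, and a short internal lock stands in for an atomic commit step.

```python
import threading


class VersionedCell:
    """A shared value whose version number changes on every commit."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        self._commit_lock = threading.Lock()  # emulates an atomic commit step

    def snapshot(self):
        with self._commit_lock:
            return self.value, self.version

    def try_commit(self, seen_version, new_value):
        with self._commit_lock:
            if self.version != seen_version:
                return False                  # data conflict: someone committed first
            self.value = new_value
            self.version += 1
            return True


def atomically(cell, update):
    """Run `update` optimistically and restart it whenever a conflict is detected."""
    restarts = 0
    while True:
        value, version = cell.snapshot()
        if cell.try_commit(version, update(value)):
            return restarts
        restarts += 1


cell = VersionedCell(0)


def worker():
    for _ in range(1000):
        atomically(cell, lambda v: v + 1)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(cell.value, cell.version)
```

The caller writes only the update function; detecting the conflict and restarting are handled by `atomically`, which mirrors the "automatically done by runtime" point.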
Advantage 2: Reduces Bugs
• With locks, programmers need to
  • Remember mapping between shared data and locks that guard them
  • Make sure the appropriate locks are held while accessing shared data
  • Make lock granularity as small as possible
  • Avoid deadlocks due to locks
• All of these can cause subtle bugs
• With TM, programmer does not have to deal with these problems

Other Advantages
• Allows new programming paradigms
  • Simplifies error handling
  • A new style of programming: Speculate and Verify
  Programmer can abort offending transactions
• Avoids other problems that locks suffer from
  • Priority Inversion: A low-priority thread can grab a lock and block a higher-priority thread
  • Convoying: If a thread holding a lock blocks on a high-latency event (like a context switch or I/O), it can cause other threads to wait for long periods
  • Fault Tolerance: If a process holding a lock dies, other processes will hang forever
  Runtime system can abort offending transactions
