Transactional Memory: An Overview of Hardware Alternatives
David A. Wood, University of Wisconsin
Transactional Memory Workshop, April 8th, 2005

What’s database got to do with it?
• Atomicity
  • All updates, or none
• Consistency
  • Correct at begin and end
• Isolation
  • Partial work not visible
  • Inputs stay stable
• Durability
  • Survive “system” failures
All (or some) memory ops, not just database objects
Despite increasing awareness of failures
Thread-Level Transactional Memory

801 Database Storage
• Lock bits on virtual memory
  • 128-byte granularity
• Added to page table and TLB
  • Caches user’s lock state
• Trap on lock conflict
• No h/w for logging, abort, etc.
• Only uniprocessors
  • 801 and RS/6000
[Diagram: CPU, TLB (with Tid), Memory]
Was this transactional memory?

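The lock-bit check on this slide can be pictured in software. What follows is only a minimal sketch: `LockTable` and `LockConflict` are hypothetical names, the reader/writer rule is an assumption, and the exception merely stands in for the hardware trap. On the 801 the lock state lived in the page table and TLB, and software handled everything after the trap.

```python
LINE_BYTES = 128  # lock granularity stated on the slide


class LockConflict(Exception):
    """Stands in for the trap on a lock conflict (hypothetical name)."""


class LockTable:
    """Toy model: one lock entry per 128-byte line of virtual memory."""

    def __init__(self):
        self.readers = {}  # line number -> set of transaction ids with read locks
        self.writer = {}   # line number -> transaction id holding the write lock

    def access(self, tid, addr, is_write):
        line = addr // LINE_BYTES
        owner = self.writer.get(line)
        if owner is not None and owner != tid:
            raise LockConflict(f"line {line} is write-locked by {owner}")
        if is_write:
            # Assumed rule: a write also conflicts with other readers.
            others = self.readers.get(line, set()) - {tid}
            if others:
                raise LockConflict(f"line {line} is read-locked by {sorted(others)}")
            self.writer[line] = tid
        else:
            self.readers.setdefault(line, set()).add(tid)
        return line


table = LockTable()
table.access(tid=1, addr=0, is_write=True)     # transaction 1 locks line 0
table.access(tid=2, addr=128, is_write=True)   # address 128 is line 1: no conflict
try:
    table.access(tid=2, addr=127, is_write=False)  # byte 127 is still line 0
    trapped = False
except LockConflict:
    trapped = True
```

The last access shows the cost of coarse granularity: two transactions touching different bytes still conflict when both bytes fall in the same 128-byte line.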
SQL/801
• “The development of SQL/801 was greatly simplified because, with minor exceptions, it considers only a single user. It achieves multiuser concurrency [on a uniprocessor] by running in multiple processes using the shared database storage….” Chang and Mergen, ’88
• Largest transactional memory application
• Only real hardware transactional memory implementation
• No one seems to be looking at what they learned

Basic Transactional Mechanisms
• Isolation
  • Detect when transactions conflict
  • Track read and write sets
• Version management
  • Record new and old values
• Atomicity
  • Commit new values
  • Abort back to old values

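The three mechanisms above fit in a few lines of software. This is a rough sketch, not a description of any particular system: the `Transaction` class and its method names are made up for illustration, memory is a plain dictionary, and new values are buffered so that the old values stay in memory until commit.

```python
class Transaction:
    """Toy transaction: tracks read/write sets and buffers new values."""

    def __init__(self, memory):
        self.memory = memory
        self.read_set = set()
        self.writes = {}  # addr -> new value; the old value stays in memory

    def load(self, addr):
        self.read_set.add(addr)
        # A transaction sees its own pending writes first.
        return self.writes.get(addr, self.memory[addr])

    def store(self, addr, value):
        self.writes[addr] = value

    def conflicts_with(self, other):
        # Isolation: write/read, read/write, or write/write overlap.
        mine, theirs = set(self.writes), set(other.writes)
        return bool(mine & (other.read_set | theirs) or theirs & self.read_set)

    def commit(self):
        # Atomicity: publish all new values.
        self.memory.update(self.writes)

    def abort(self):
        # Atomicity: discard new values, so memory still holds the old ones.
        self.writes.clear()
        self.read_set.clear()


memory = {"x": 1, "y": 2}
t1, t2 = Transaction(memory), Transaction(memory)
t1.store("x", t1.load("x") + 10)
seen_by_t2 = t2.load("x")          # t2 still reads the old value
conflict = t1.conflicts_with(t2)   # t1 writes x, t2 read x
t2.abort()
t1.commit()
```

Buffering new values makes abort trivial and commit the expensive step. Logging old values and updating in place makes the opposite trade.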
H/W Transactional Memory Systems
• Knight’s Lisp Work
• Transactional Memory
• Oklahoma Update
• SLE/TLR
• Transactional Coherence and Consistency
• Unbounded TM
• Virtual TM
• Thread-level TM

Knight’s Lisp Work [’86]
• Parallel execution of sequential code
• Break program into “transaction blocks”
  • Multiple loads in a transaction
  • Exactly one store ends the transaction
  • No register state passed between transactions
• Execute transactions in parallel
  • Track dependences (i.e., read set)
  • Abort and restart on conflicting write
• Transactions commit in sequential order
  • Broadcast writes on commit

Knight’s Hardware
• Two caches
  • Dependency cache
    • Tracks read set
    • Bus monitor detects conflicts
  • Confirm cache
    • Holds write set
    • Supports multiple writes
• Commits
  • Check dep. cache
  • Broadcast writes
• Fast aborts
  • Invalidate Confirm cache
  • Use old values in Dep. Cache
  • Immediately restart execution
[Diagram: CPU, Dependency Cache, Confirm Cache, Memory]
Spawned two threads: TLS & TM

H&M’s (Herlihy and Moss) Transactional Memory [’93]
• Targets explicitly parallel (non-functional) codes
  • Motivated by lock-free data structures
• Transactions:
  • Read and write multiple locations
  • Commit in arbitrary order
  • Implicit begin, explicit commit operations
  • Abort affects memory, not registers
• Software manages restarting execution
  • Validate instruction detects pending abort
• Implementation extends cache coherence
  • Read/Write locks correspond to MOESI states
  • Add orthogonal transaction states

H&M’s Transactional Memory
• Adds Transaction Cache
  • Stores all data accessed by transactions
  • 2 copies of each line
    • Before and after image
    • Even for read-only data
  • Small, fully associative
• Abort on all conflicts
  • NACK conflicting requests
  • Abort NACKed transaction
• Fast commit and abort
  • Change trans. cache state
[Diagram: CPU, Cache, Transaction Cache, Memory]

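The two-copies-per-line idea can be illustrated with a small model. This is a sketch under assumptions: `TransactionCache` and its methods are invented names, memory is a dictionary, and commit copies after images back to memory, whereas the slide describes hardware that only changes cache state.

```python
class TransactionCache:
    """Toy transaction cache: every accessed line keeps a before and an after image."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}  # addr -> [before_image, after_image]

    def read(self, addr):
        if addr not in self.lines:
            value = self.memory[addr]
            # Two copies are allocated even for read-only data.
            self.lines[addr] = [value, value]
        return self.lines[addr][1]

    def write(self, addr, value):
        self.read(addr)              # make sure both images exist
        self.lines[addr][1] = value  # only the after image changes

    def commit(self):
        # Keep the after images (modelled here as a copy to memory).
        for addr, (_, after) in self.lines.items():
            self.memory[addr] = after
        self.lines.clear()

    def abort(self):
        # Drop the after images; memory still holds the before images.
        self.lines.clear()


memory = {"a": 1, "b": 2}
tc = TransactionCache(memory)
tc.write("a", 5)
tc.read("b")
snapshot = {addr: tuple(images) for addr, images in tc.lines.items()}
tc.abort()
after_abort = dict(memory)
tc.write("a", 5)
tc.commit()
```

The snapshot shows the cost named on the slide: the read-only line `b` occupies two entries' worth of space even though both images are identical.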
SLE/TLR
• Hardware exploits speculative processors
  • Read sets tracked by coherence protocol
  • Write set maintained in store queue
  • Abort restarts execution, including register state
• Speculative Lock Elision (SLE)
  • Elide locks from the dynamic execution stream
  • Convert critical sections to optimistic transactions
  • Concurrently execute non-conflicting transactions
  • Fall back on explicit locks if conflicts
• Transactional Lock Removal (TLR)
  • Resolve conflicts using priority ordering (timestamps)
  • Delay lower priority transactions
  • Deadlock and starvation free

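The timestamp rule for TLR is small enough to write down. This is a sketch only: `Txn`, `resolve`, and the global counter are illustrative names, and the assumption that a transaction keeps its timestamp across retries is stated in the code rather than taken from the slide.

```python
import itertools

_clock = itertools.count()  # hypothetical global timestamp source


class Txn:
    def __init__(self, name):
        self.name = name
        # Assumed: the timestamp is kept across retries, so a transaction
        # only gains priority as it ages.
        self.timestamp = next(_clock)


def resolve(requester, holder):
    """Return (winner, delayed): the earlier timestamp has priority."""
    if requester.timestamp < holder.timestamp:
        return requester, holder
    return holder, requester


a = Txn("A")  # older, so higher priority
b = Txn("B")
winner, delayed = resolve(requester=b, holder=a)
```

Because timestamps are unique and totally ordered, waiting relationships cannot form a cycle, and the oldest transaction always wins its conflicts, which is the intuition behind the deadlock and starvation freedom claimed on the slide.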
Transactional Coherence and Consistency [’04]
• TCC unifies coherence, memory consistency, and transaction support
  • All transactions, all the time
• Transaction ordering
  • Ordered, Unordered, Partially Ordered
  • Supports thread-level speculation
• Optimistic concurrency model
  • Unordered transactions serialize at commit
  • Conflicts detected at commit

TCC
[Diagram: CPU with shadow register file (SRF), L1 D cache, write buffer, on-chip interconnect, L2 cache]
• On-chip interconnect: broadcast updates at commit
• Write buffer: ~4 kB, holds new values until commit
• Shadow register file (SRF): checkpoints architectural registers
• L1 cache: tracks read set, bit per line
• L2 cache: logically shared

TCC
• Commits are sequential
  • Broadcasts addresses of all updates
• Supports large transactions
  • Serialize all other transactions
  • Grabs and holds the commit bus
• Cannot abort large transactions
  • Updates affect L2/Mem; no undo
• Extensions forthcoming
  • talk to Kunle and Christos

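Commit-time conflict detection in the TCC style can be sketched as follows. The `Core` class, the `commit` helper, and the dictionary used as shared memory are all assumptions made for illustration; the set of read addresses stands in for the per-line read bits in L1, and a second dictionary stands in for the write buffer.

```python
class Core:
    """Toy TCC core: read bits plus a write buffer, checked only at commit time."""

    def __init__(self, name, shared):
        self.name = name
        self.shared = shared
        self.read_bits = set()  # models the per-line read bit in L1
        self.write_buffer = {}  # new values held until commit
        self.aborts = 0

    def load(self, addr):
        self.read_bits.add(addr)
        return self.write_buffer.get(addr, self.shared[addr])

    def store(self, addr, value):
        self.write_buffer[addr] = value

    def snoop(self, committed_addrs):
        # A committing core broadcast these addresses; abort if we read any.
        if self.read_bits & committed_addrs:
            self.read_bits.clear()
            self.write_buffer.clear()
            self.aborts += 1


def commit(core, all_cores):
    """Commits are sequential: one core at a time publishes and broadcasts."""
    updates = dict(core.write_buffer)
    core.shared.update(updates)
    for other in all_cores:
        if other is not core:
            other.snoop(set(updates))
    core.read_bits.clear()
    core.write_buffer.clear()


shared = {"x": 0, "y": 0}
a, b = Core("A", shared), Core("B", shared)
a.store("x", a.load("x") + 1)
b.store("y", b.load("x") + 1)  # B read x before A committed
commit(a, [a, b])              # A's broadcast hits B's read set
```

Nothing is checked while the transactions run. B learns of the conflict only when A's commit broadcast arrives, which is the optimistic model described on the earlier TCC slide.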
Unbounded Transactional Memory (UTM)
• Unbounded transactions
  • Arbitrary size
    • Not limited by write buffer, cache, or memory
  • Arbitrary duration
    • Not limited by interrupts, context switch, etc.
• Complex implementation
  • Not justified by performance
• Settle for “nearly” unbounded transactions
  • Much simpler hardware

Transactional Linux
• Almost all of the transactions require < 100 cache lines
  • 99.9% need fewer than 54 cache lines
• There are, however, some very large transactions!
  • >500k-byte fully-associative cache required
[Plot: log-log scale]

Large Transaction Memory (LTM)
• Register checkpoints
  • Snapshot of rename maps
• Cache tracks read and write sets
  • T-bits mark transactional blocks
  • Cache holds new data values “in place”
  • O-bit indicates overflow to in-memory hashtable
• Memory holds committed state
  • Abort invalidates all modified blocks
    • Miss on re-execution
  • Transactional writes force memory updates
    • Repeated writes (e.g., to local data) are written through

Virtual Transactional Memory (VTM)
• Only an overflow mechanism
  • No overhead on common in-cache case
  • Check shared overflow counter on cache miss
• Low overhead when no conflict
  • Shared Bloom Filter rules out conflicts
  • Filter resides in virtual memory
• Higher overhead on possible conflict
  • Hardware table walk to detect actual conflict
  • Table resides in virtual memory
  • Only incurred by large transactions with likely conflict
• Supports context switches and paging

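The three-tier check on a cache miss (overflow counter, then Bloom filter, then table walk) can be sketched as below. Everything here is an illustrative assumption: the `OverflowState` class, the 64-bit filter, the two SHA-256-derived hash positions, and the dictionary standing in for the in-memory table. Removing entries on commit is left out.

```python
import hashlib


class OverflowState:
    """Toy model of the shared overflow structures consulted on a cache miss."""

    def __init__(self, filter_bits=64, hashes=2):
        self.filter_bits = filter_bits
        self.hashes = hashes
        self.bloom = 0           # bit vector stored as an int
        self.table = {}          # addr -> owning transaction id
        self.overflow_count = 0  # shared overflow counter
        self.table_walks = 0     # how often the slow path ran

    def _positions(self, addr):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{addr}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.filter_bits

    def record_overflow(self, tid, addr):
        for pos in self._positions(addr):
            self.bloom |= 1 << pos
        self.table[addr] = tid
        self.overflow_count += 1

    def conflicts(self, tid, addr):
        if self.overflow_count == 0:
            return False  # common case: nothing has overflowed
        if any(not (self.bloom >> pos) & 1 for pos in self._positions(addr)):
            return False  # filter rules the conflict out
        self.table_walks += 1  # possible conflict: walk the table
        owner = self.table.get(addr)
        return owner is not None and owner != tid


state = OverflowState()
before_overflow = state.conflicts(tid=2, addr=0x1000)
walks_before = state.table_walks
state.record_overflow(tid=1, addr=0x1000)
real_conflict = state.conflicts(tid=2, addr=0x1000)
own_access = state.conflicts(tid=1, addr=0x1000)
```

A Bloom filter never gives false negatives, so skipping the table walk on a filter miss is safe. False positives only cost an unnecessary walk.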
801 revisited
• Why didn’t 801 database storage succeed?
  • Lock bits helped performance and simplified software
• Answer #1:
  • Changing lock bits requires TLB shootdown
  • Too complicated for the benefits?
  • Not a current problem: transaction h/w is easy
• Answer #2:
  • Not universally available
  • DB2 was (is) multiplatform
  • Can’t rely on feature only available in one architecture
  • Still a relevant concern

Need Standard Transaction Interface
• Abstract away resource requirements
  • Support large, long transactions
  • Virtualize transactional memory
• Transaction semantics between threads
  • NOT a hardware property
• Permit range of implementations
  • Hardware, software, and combinations

Thread-level Transactional Memory
• Abstract mechanisms
  • Version management
    • Update memory “in place”
    • Log “before images” to thread-level VM
  • Isolation
    • Logically extend memory words with read and write bits
    • Implementations can be conservative (e.g., blocks)
  • Atomicity
    • Commits easy due to in-place updates
    • Aborts trap to user-level software
• Hardware can accelerate common case

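In-place updates with a before-image log can be sketched in a few lines. `LoggedTransaction` is a made-up name, the Python list stands in for the per-thread log in virtual memory, and the `abort` method stands in for the user-level software handler mentioned on the slide.

```python
class LoggedTransaction:
    """Toy version management: update memory in place, log the old values."""

    def __init__(self, memory):
        self.memory = memory
        self.undo_log = []  # (addr, before_image) in program order

    def store(self, addr, value):
        self.undo_log.append((addr, self.memory[addr]))  # log the before image
        self.memory[addr] = value                        # update in place

    def commit(self):
        # New values are already in memory, so commit just discards the log.
        self.undo_log.clear()

    def abort(self):
        # Walk the log backwards so the oldest before image is restored last.
        for addr, before in reversed(self.undo_log):
            self.memory[addr] = before
        self.undo_log.clear()


memory = {"a": 1, "b": 2}
t = LoggedTransaction(memory)
t.store("a", 10)
t.store("a", 20)  # second write to the same word logs 10 as its before image
t.abort()
after_abort = dict(memory)

t.store("b", 7)
t.commit()
```

Reverse order matters when a word is written more than once: replaying the log forwards would leave `a` at 10 rather than at its original value of 1.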
Conclusions
• Make the common case fast
  • 99+% of transactions fit in hardware
  • Lots of alternatives
  • Make both commits and aborts fast
• Handle the uncommon case
  • Large transactions will occur, deal with ’em
  • Shouldn’t be limited by hardware
• Agree on a common abstraction
  • Success requires multi-platform support
  • Let vendors compete on price-performance