Log-based Transactional Memory. Mark D. Hill Multifacet Project, Univ. of Wisconsin—Madison. Multicore here: “Intel has 10 projects in the works that contain four or more computing cores per chip” —Intel CEO, Fall ’05
Log-based Transactional Memory Mark D. Hill Multifacet Project, Univ. of Wisconsin—Madison • Multicore here: “Intel has 10 projects in the works that contain four or more computing cores per chip” —Intel CEO, Fall ’05 • How program? “Blocking on a mutex is a surprisingly delicate dance” —OpenSolaris, mutex.c
LogTM Contributors • Faculty • Mark Hill, Ben Liblit, Mike Swift, David Wood • Students • Jayaram Bobba, Derek Hower, Kevin Moore, Haris Volos, Luke Yen • Alumna • Michelle Moravan • Funding • Grants from U.S. National Science Foundation • Donations from Intel and Sun Wisconsin Multifacet Project
Summary • Our Transactional Memory (TM) goals • Unlimited TM model: even large/long transactions • Facilitate SW composition: unlimited nesting • Accelerate with some HW support • Log-based TM (Signature Edition) • Supports unlimited TM w/ nesting • Accelerates commit by writing new values in place (after saving old values in a per-thread log) • Signatures summarize read/write sets • HW mechanisms: simple, policy-free, SW accessible
Outline • TM Motivation & Background • Why TM?, Terminology, & Taxonomy • LogTM Hardware Preview • LogTM Version Management • LogTM Conflict Detection • LogTM Evaluation • LogTM Operating System Interactions (optional) • Summary & Future Directions
Locks are Hard // WITH LOCKS void move(T s, T d, Obj key){ LOCK(s); LOCK(d); tmp = s.remove(key); d.insert(key, tmp); UNLOCK(d); UNLOCK(s); } • Thread 0: move(a, b, key1); • Thread 1: move(b, a, key2); • DEADLOCK! • Moreover • Coarse-grain locking limits concurrency • Fine-grain locking difficult
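The deadlock on this slide comes from the two threads taking the same pair of locks in opposite orders. A minimal Python sketch of that reasoning follows; the helper names `lock_order_edges` and `has_cycle` are hypothetical and not part of the talk. It records a "locks s before d" edge for each `move` call and reports whether the resulting lock-order graph has a cycle.

```python
def lock_order_edges(calls):
    """Each move(s, d) locks s and then d: record the edge s -> d."""
    return {(s, d) for s, d in calls}


def has_cycle(edges):
    """Depth-first search for a cycle in the lock-order graph."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # back edge: two lock orders contradict each other
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(graph))


# Thread 0: move(a, b, key1); Thread 1: move(b, a, key2)
opposite = lock_order_edges([("a", "b"), ("b", "a")])
same = lock_order_edges([("a", "b"), ("a", "b")])
print(has_cycle(opposite))  # True: deadlock is possible
print(has_cycle(same))      # False: consistent lock order
```

A cycle only shows that deadlock is possible; whether it happens depends on how the threads interleave.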
Transactional Memory (TM) void move(T s, T d, Obj key){ atomic { tmp = s.remove(key); d.insert(key, tmp); } } • Programmer says • “I want this atomic” • TM system • “Makes it so” • Software TM (STM) Implementations • Currently slower than locks • Always slower than hardware? • Hardware TM (HTM) Implementations • Leverage cache coherence & speculation • Fast • But hardware finite & should be policy-free
Some Transaction Terminology • Transaction: State transformation that is: • Atomic (all or nothing) • Consistent • Isolated (serializable) • Durable (permanent) • Commit: Transaction successfully completes • Abort: Transaction fails & must restore initial state • Read (Write) Set: Items read (written) by a transaction • Conflict: Two concurrent transactions conflict if either’s write set overlaps with the other’s read or write set
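The conflict definition above translates directly into a set-overlap test. The sketch below is illustrative only: it uses exact Python sets, and the dictionary layout with `read` and `write` keys is an assumption made for the example.

```python
def conflicts(tx_a, tx_b):
    """True if either transaction's write set overlaps the other's
    read set or write set."""
    a_hits_b = tx_a["write"] & (tx_b["read"] | tx_b["write"])
    b_hits_a = tx_b["write"] & (tx_a["read"] | tx_a["write"])
    return bool(a_hits_b or b_hits_a)


t0 = {"read": {"A", "C"}, "write": {"B"}}
t1 = {"read": {"B"}, "write": {"D"}}   # reads what t0 writes
t2 = {"read": {"A"}, "write": {"E"}}   # only shares a read with t0

print(conflicts(t0, t1))  # True: write-read overlap on B
print(conflicts(t0, t2))  # False: read-read sharing is not a conflict
```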
Nested Transactions for Software Composition • Modules expose interfaces, NOT implementations • Example • insert() calls getID() from within a transaction • The getID() transaction is nested inside the insert() transaction void insert(object o){ // parent TX begin_transaction(); t.insert(getID(), o); commit_transaction(); } int getID() { // child TX begin_transaction(); id = global_id++; commit_transaction(); return id; }
Closed Nesting • On commit, child transaction is merged with its parent • Flat • Nested transactions “flattened” into a single transaction • Only outermost begins/commits are meaningful • Any conflict aborts to outermost transaction • Partial rollback • Child transaction can be aborted independently • Can avoid costly re-execution of parent transaction • Child transactions remain isolated until parent commits
Implementing TM • Version Management • new values for commit • old values for abort • Must keep both • Large state (must be precise) • Conflict Detection • Find read-write, write-read or write-write conflicts among concurrent transactions • Allows multiple readers OR one writer • Checked often (must be fast)
How Do Hardware TM Systems Differ? • Conflict Detection • Lazy: check on commit • Eager: check before read/write • Like Databases with Optimistic Conc. Ctrl. • No HTMs (yet) • Stanford TCC • Illinois Bulk • Like Databases with Conservative C. Ctrl. • Herlihy/Moss TM • MIT LTM • Intel/Brown VTM • MIT UTM • Wisconsin LogTM
Transactional Memory Goals/Challenges • Unlimited TM Model • Large transactions: cache victimization & even paging • Long transactions: thread switching/migration • OS traps/calls? • Facilitate SW composition • Unlimited closed nesting (open nesting?) • Accelerate with at most modest HW support • Make the common case fast • Make HW simple, policy-free, & SW exposed
Outline • TM Motivation & Background • LogTM Hardware Preview • LogTM Version Management • LogTM Conflict Detection • LogTM Evaluation • LogTM Operating System Interactions (optional) • Summary & Future Directions
Single-CMP/Multicore System • [Figure: cores Core0 … Core15, each with a private L1 cache, connected through an interconnect to a shared L2 cache and DRAM]
LogTM Per-Core Hardware • Processor (SMT Context) • Registers • Register Checkpoint • Conflict Detection: Signatures • Read, Write • SummaryRead, SummaryWrite • Version Mgmt: Pointers to Segmented Log • TMCount • LogFrame • LogPtr • Data Caches (Tag, Data) • No Explicit TM State
Outline • Motivation & Background • LogTM Hardware Preview • LogTM Version Management • Basic Logging & Segmented Logs for Nesting • LogTM Conflict Detection • LogTM Evaluation • LogTM Operating System Interactions (optional) • Summary & Future Directions
LogTM’s Eager Version Management • New values stored in place • Old values stored in transaction log • Allocated per-thread in virtual memory (like per-thread stacks) • Filled by hardware (during transactions) • Read by software (on abort) • [Figure: memory blocks at virtual addresses 00, 40, and C0 with R/W bits; blocks 40 and C0 hold new values (24 and 56) while their old values (23 and 34) are recorded in the transaction log starting at Log Base 1000; Log Ptr advances as records are appended; TM count = 1]
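The mechanism above can be modeled in a few lines of software. This is a sketch only: LogTM fills the log in hardware, and the class name `EagerTx`, the dictionary used as memory, and the rule of logging each block once per transaction are assumptions made for the example. The values loosely follow the slide's figure.

```python
class EagerTx:
    """Software model of eager version management with an undo log."""

    def __init__(self, memory):
        self.memory = memory      # block address -> current value
        self.log = []             # undo records: (address, old value)
        self.write_set = set()    # blocks already logged by this transaction

    def store(self, addr, value):
        if addr not in self.write_set:            # save the old value once
            self.log.append((addr, self.memory[addr]))
            self.write_set.add(addr)
        self.memory[addr] = value                 # new value goes in place

    def commit(self):
        self.log.clear()                          # no data movement needed
        self.write_set.clear()

    def abort(self):
        while self.log:                           # walk the log newest-first
            addr, old = self.log.pop()
            self.memory[addr] = old
        self.write_set.clear()


memory = {0x00: 12, 0x40: 23, 0xC0: 34}
tx = EagerTx(memory)
tx.store(0xC0, 56)
tx.store(0x40, 24)
print(memory[0xC0], memory[0x40])  # 56 24: new values are visible in place
tx.abort()
print(memory[0xC0], memory[0x40])  # 34 23: old values restored from the log
```

Commit only discards the log, which is why commits are cheap and aborts carry the cost.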
Segmented Transaction Log for Nesting • LogTM’s log is a stack of frames (like activation records) • A frame contains: • Header (including saved registers and pointer to parent’s frame) • Undo records (block address, old value pairs) • Garbage headers (headers of committed closed transactions) • Commit action records • Compensating action records • [Figure: log holding two frames, each a header followed by undo records; LogFrame points at the innermost header, LogPtr at the end of the log; TM count = 2]
Closed Nested Commit • Merge child’s log frame with parent’s • Mark child’s header as “dummy header” • Copy pointer from child’s header to LogFrame • [Figure: the same two-frame log after the child commits; LogFrame now points at the parent’s header and TM count drops from 2 to 1]
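The segmented log and the closed nested commit can be sketched as a list of entries plus two registers. This is a hypothetical software model: the class `SegmentedLog`, its method names, and the entry layout are assumptions, and saved registers, commit actions, and compensating actions are left out.

```python
class SegmentedLog:
    """Stack of log frames: a header followed by undo records."""

    def __init__(self):
        self.entries = []       # ["header", parent_index] or ["undo", addr, old]
        self.log_frame = None   # index of the innermost frame's header
        self.tm_count = 0       # current nesting depth

    def begin(self):
        self.entries.append(["header", self.log_frame])
        self.log_frame = len(self.entries) - 1
        self.tm_count += 1

    def log_undo(self, addr, old):
        self.entries.append(["undo", addr, old])

    def commit_closed(self):
        header = self.entries[self.log_frame]
        parent = header[1]
        if parent is None:
            self.entries.clear()        # outermost commit: discard the log
        else:
            header[0] = "garbage"       # child's records merge into the parent
        self.log_frame = parent         # copy parent pointer to LogFrame
        self.tm_count -= 1

    def abort_innermost(self, memory):
        # Undo everything logged after the innermost header, newest first.
        while len(self.entries) - 1 > self.log_frame:
            kind, *rest = self.entries.pop()
            if kind == "undo":
                addr, old = rest
                memory[addr] = old
        self.log_frame = self.entries.pop()[1]   # pop header, return to parent
        self.tm_count -= 1


memory = {"A": 1, "B": 2}
log = SegmentedLog()
log.begin()
log.log_undo("A", memory["A"])
memory["A"] = 10
log.begin()                      # nested child
log.log_undo("B", memory["B"])
memory["B"] = 20
log.abort_innermost(memory)      # partial rollback of the child only
print(memory, log.tm_count)      # {'A': 10, 'B': 2} 1
```

Because a committed child leaves only a garbage header behind, a later abort of the parent walks straight through the child's undo records as if they were its own.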
LogTM Version Management Discussion • Eager Version Management via Segmented Log • Advantages: • Transactions read new values normally (w/o bypassing) • No data movement at commit • Both old & new data in (virtual) memory • Both old & new data can be cached or victimized • Supports unbounded nesting • No extra indirection (unlike STM) • Disadvantages • Aborts slower & handled by software • Adds HW to write log • Requires eager conflict detection?
Outline • TM Motivation & Background • LogTM Hardware Preview • LogTM Version Management • LogTM Conflict Detection • Signatures, Nesting, & Detection via Coherence • LogTM Evaluation • LogTM Operating System Interactions (optional) • Summary & Future Directions
LogTM-SE Read/Write Set Summary • Use Per-Thread Signatures (adapted from Bulk) • (Original LogTM used in-cache read/write bits) • Program: xbegin; LD A; ST B; LD C; LD D; ST C; … • [Figure: hash function(s) map each accessed address to a bit in the 8-bit R and W signatures; external stores ST E and ST F are hashed and checked against the signatures: one finds no set bit (NO CONFLICT), the other aliases with a set bit (FALSE POSITIVE: CONFLICT!)]
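A signature behaves like a small hashed bit vector: inserts set a bit, lookups test a bit, and two blocks that hash to the same bit are indistinguishable. The sketch below assumes an 8-bit signature, 64-byte blocks, and a plain modulo hash; all three are illustrative choices rather than the hash functions LogTM-SE uses.

```python
class Signature:
    """Fixed-size bit vector summarizing a set of block addresses."""

    def __init__(self, bits=8, block_bytes=64):
        self.bits = bits
        self.block_bytes = block_bytes
        self.vector = 0

    def _index(self, addr):
        # Assumed hash: block number modulo signature size.
        return (addr // self.block_bytes) % self.bits

    def insert(self, addr):
        self.vector |= 1 << self._index(addr)

    def may_contain(self, addr):
        # Never a false negative; aliasing can give a false positive.
        return bool((self.vector >> self._index(addr)) & 1)

    def clear(self):
        self.vector = 0


write_sig = Signature()
write_sig.insert(0x000)                 # transactional store to block 0
print(write_sig.may_contain(0x000))     # True: real conflict
print(write_sig.may_contain(0x040))     # False: block 1 maps to another bit
print(write_sig.may_contain(0x200))     # True: block 8 aliases with block 0
```

The last lookup is the false positive case: the address was never written, but the signature cannot tell it apart from one that was.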
Conflict Detection for Unbounded Nesting • Nesting affects signatures (not coherence, which comes next) • Nested Begin: Save R/W Signatures on Log • Partial Abort: Restore R/W Signatures • (Closed) Nested Commit: Discard Saved Signatures • Open Nesting also handled • Recall LogTM’s Segmented Log already supports version management for unbounded nesting • Add saved signature space to frame header
Nested Begin • Program: xbegin; LD …; ST …; xbegin • [Animation: before the nested xbegin, R = 01001000, W = 01010010, and TMCount = 1; the nested xbegin appends a new Xact header holding copies of R and W, moves Log Frame and Log Ptr to the new frame, and sets TMCount = 2]
Partial Abort • Program: xbegin; LD …; ST …; xbegin; LD …; ST …; ABORT! • [Animation: the child has grown the signatures to R = 01001001, W = 01110110; the abort processes the child’s undo entries, restores R = 01001000 and W = 01010010 from the child’s Xact header, and sets TMCount from 2 to 1]
Nested Commit • Program: xbegin; LD …; ST …; xbegin; LD …; ST …; xend • [Animation: the signatures keep the child’s additions (R = 01001001, W = 01110110); the child’s Xact header becomes a garbage header, its undo entries stay in the log, and TMCount goes from 2 to 1]
Unbounded Nesting Support Summary • Closed nesting • Begin: save signatures • Abort: restore signatures • Commit: no signature action • Open nesting • Begin: save signatures • Abort: restore signatures • Commit: restore signatures
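The signature actions in this summary amount to a save/restore stack. The sketch below models them with plain integers as signatures and a Python list standing in for the signature space in the log frame headers; the class and method names are hypothetical. The bit patterns are the ones shown in the nesting animation.

```python
class NestedSignatures:
    """Read/write signatures plus the copies saved at each nested begin."""

    def __init__(self):
        self.read_sig = 0
        self.write_sig = 0
        self.saved = []     # stands in for the saved copies in frame headers

    def begin(self):
        self.saved.append((self.read_sig, self.write_sig))

    def abort(self):
        # Drop the child's bits by restoring the copies saved at begin.
        self.read_sig, self.write_sig = self.saved.pop()

    def commit_closed(self):
        # No signature action: the child's bits stay merged in the parent.
        self.saved.pop()

    def commit_open(self):
        # Restore the parent's signatures, releasing the child's isolation.
        self.read_sig, self.write_sig = self.saved.pop()


sigs = NestedSignatures()
sigs.begin()                                   # outer transaction
sigs.read_sig, sigs.write_sig = 0b01001000, 0b01010010
sigs.begin()                                   # nested transaction
sigs.read_sig |= 0b00000001
sigs.write_sig |= 0b00100100
sigs.abort()                                   # partial abort
print(f"{sigs.read_sig:08b} {sigs.write_sig:08b}")  # 01001000 01010010
```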
LogTM’s Eager Conflict Detection (before access) • LogTM detects conflicts using coherence • (1) Requesting core issues coherence request • (2) L2 directory forwards to other core(s) • (3) Responding core detects conflict using local signatures and informs requesting processor of conflict • (4) Requesting core resolves conflict
Protocol Animation: Transactional Write • Core C0 store • C0 sends get exclusive (GETX) request • L2 Directory responds with data (old) • C0 executes store • [Animation: the L2 directory entry goes from I [old] to M@C0 [old]; C0’s cache line goes from I [none] to M [old] to M [new]; C0 is in TM mode with signature (W-); C1 stays at I [none] with signature (--)]
Protocol Animation: Transactional Conflict • In-cache transaction conflict • C1 sends get shared (GETS) request • L2 Directory forwards to C0 • C0 detects conflict and sends NACK • [Animation: the L2 directory holds M@C0 [old]; C0 (TM mode, signature (W-)) holds M [new]; C1 (signature (--), I [none]) sends GETS, the directory forwards it as Fwd_GETS, and the reply is a NACK: Conflict!]
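The request, forward, check, and NACK sequence from the two protocol animations can be sketched as follows. This is a simplified model, not the LogTM protocol: the `Core` and `Directory` classes and the `tx_load` and `tx_store` helpers are hypothetical, exact sets stand in for signatures, and cache states, data transfer, and conflict resolution are omitted.

```python
class Core:
    def __init__(self, name):
        self.name = name
        self.read_set = set()    # stand-in for the read signature
        self.write_set = set()   # stand-in for the write signature

    def check(self, kind, addr):
        """Answer a forwarded request: NACK on conflict, else ACK."""
        if kind == "GETX":       # remote write conflicts with reads and writes
            conflict = addr in self.read_set or addr in self.write_set
        else:                    # GETS: remote read conflicts with writes only
            conflict = addr in self.write_set
        return "NACK" if conflict else "ACK"


class Directory:
    def __init__(self):
        self.holders = {}        # addr -> cores that have obtained the block

    def request(self, requester, kind, addr):
        for core in self.holders.get(addr, []):
            if core is not requester and core.check(kind, addr) == "NACK":
                return "NACK"    # requester must now resolve the conflict
        holders = self.holders.setdefault(addr, [])
        if requester not in holders:
            holders.append(requester)
        return "DATA"


def tx_store(core, directory, addr):
    reply = directory.request(core, "GETX", addr)
    if reply == "DATA":
        core.write_set.add(addr)
    return reply


def tx_load(core, directory, addr):
    reply = directory.request(core, "GETS", addr)
    if reply == "DATA":
        core.read_set.add(addr)
    return reply


c0, c1, l2 = Core("C0"), Core("C1"), Directory()
print(tx_store(c0, l2, 0x40))   # DATA: C0 writes the block transactionally
print(tx_load(c1, l2, 0x40))    # NACK: C0's write set covers the block
```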
Cache Victimization Gracefully Handled! • Consider eviction of transactional data from Core C0 • No Effect on R/W Set Summary via Signatures • For Conflict Detection, Forward Coherence Requests After Victimization • Trivial with broadcast coherence • Silent S replacements w/ directory: S @ C0 → S @ C0 • Writeback to directory sticky: M @ C0 → Sticky-M @ C0 • Recall Eager Version Management via Log • On commit: no need to re-fetch victimized block • On abort: SW log walk naturally re-fetches victimized block
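The sticky writeback rule can be illustrated with a toy directory that keeps pointing at the evicting core. This is only a sketch under simplifying assumptions: the class `StickyDirectory` and its methods are hypothetical, only the M state is modeled, and the slide does not say when a sticky entry is cleaned up, so the sketch leaves that out.

```python
class StickyDirectory:
    """Directory entries that still name a core after a transactional eviction."""

    def __init__(self):
        self.state = {}   # addr -> (state, core name)

    def grant_exclusive(self, addr, core):
        self.state[addr] = ("M", core)

    def writeback(self, addr, core, in_transaction):
        if in_transaction:
            # Keep a pointer so later requests are still forwarded to the
            # core, whose signatures can detect the conflict.
            self.state[addr] = ("Sticky-M", core)
        else:
            self.state[addr] = ("I", None)

    def forward_target(self, addr):
        state, core = self.state.get(addr, ("I", None))
        return core if state in ("M", "Sticky-M") else None


l2 = StickyDirectory()
l2.grant_exclusive(0x40, "C0")
l2.writeback(0x40, "C0", in_transaction=True)
print(l2.state[0x40])            # ('Sticky-M', 'C0')
print(l2.forward_target(0x40))   # C0: requests still reach the evicting core
```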
Sticky States: No New Bits in L1 Cache or L2 Directory • [Table: Shared (L2) Directory State versus Private (L1) Cache State]
Conflict Resolution • Can wait, risking deadlock, or abort, risking livelock • Wait/abort transaction at requesting or responding processor? • LogTM resolves conflicts at requesting processor • Original LogTM included HW timestamps • Requesting processor can wait (using nacks/retries) • or abort if the other processor is waiting (deadlock possible) and the requester is logically younger • Current LogTM has requesting processor trap to a software contention manager that decides who waits/aborts
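The timestamp rule attributed to the original LogTM can be written as a small decision function. This is a sketch of the rule as stated on the slide, not of the hardware: the function name `resolve`, its arguments, and the convention that a smaller timestamp means an older transaction are assumptions.

```python
def resolve(requester_ts, responder_ts, responder_is_waiting):
    """Decide what a NACKed requester does: 'abort' or 'retry'.

    Assumes a smaller timestamp means a logically older transaction.
    """
    requester_is_younger = requester_ts > responder_ts
    if responder_is_waiting and requester_is_younger:
        return "abort"    # possible deadlock cycle: the younger side yields
    return "retry"        # otherwise wait and reissue the request


print(resolve(requester_ts=7, responder_ts=3, responder_is_waiting=True))   # abort
print(resolve(requester_ts=7, responder_ts=3, responder_is_waiting=False))  # retry
print(resolve(requester_ts=2, responder_ts=3, responder_is_waiting=True))   # retry
```

Letting only the younger transaction abort means the oldest transaction in the system always makes progress.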
LogTM Conflict Detection Discussion • Eager Conflict Detection via Signatures & Coherence • Advantages: • Supports unbounded nesting • Signatures are compact HW • Signatures software-accessible: save/restore for nesting • Coherence provides efficient conflict detection • Disadvantages • Signatures have false positives • Requires modest coherence protocol changes • Does not (yet) handle thread migration & paging (but coming later)
Outline • TM Motivation & Background • LogTM Hardware Preview • LogTM Version Management • LogTM Conflict Detection • LogTM Evaluation • Methods, vs. Lock, & vs. Perfect Signatures • LogTM Operating System Interactions (optional) • Summary & Future Directions
Single-CMP LogTM System • [Figure: cores Core0 … Core15, each 2-way SMT with a private L1 cache, connected through an interconnect to a shared L2 cache and DRAM; each SMT context holds registers, a register checkpoint, Read/Write and SummaryRead/SummaryWrite signatures, TMCount, LogFrame, and LogPtr]
Experimental Methodology • Infrastructure • Virtutech Simics full-system simulation • Wisconsin GEMS timing modules • System • 32 transactional threads (16 cores x 2 SMT threads/core) • 32 kB 4-way L1 I and D, 64-byte blocks, 1-cycle latency • 8 MB 8-way unified L2, 34-cycle latency • L2 directory for coherence, maintains full sharer bit vector • Workloads • Radiosity, Raytrace, Mp3d, Cholesky • Berkeley DB
Lock Results
Perfect Signature Results • Perfect signatures perform similar to or better than locks
Realistic Signature Results • Realistic signatures perform similar to perfect signatures and locks • For our workloads, false positives are not a problem
What about scalability? • Bigger system • Bigger transactions • False positives are a function of: • Transaction size • Transactional duty cycle • Number of concurrent transactional threads • Filtering due to on-chip directory protocol • Signatures gracefully degrade to serialization
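The dependence on transaction size can be made concrete with a textbook estimate. The sketch below is not a result from the talk: it assumes a single hash function that spreads blocks uniformly over the signature bits, and the function name `false_positive_rate` is hypothetical. It gives the chance that an unrelated address lands on a set bit after a transaction has inserted a given number of distinct blocks.

```python
def false_positive_rate(sig_bits, blocks_inserted):
    """Probability that a fresh address hits a set bit, assuming one
    uniform hash function over sig_bits bits."""
    return 1.0 - (1.0 - 1.0 / sig_bits) ** blocks_inserted


# Larger transactions fill more of the signature (hypothetical sizes).
for blocks in (8, 64, 512):
    rate = false_positive_rate(2048, blocks)
    print(f"{blocks:4d} blocks in a 2048-bit signature: {rate:.3f}")
```

As the rate approaches 1, nearly every remote request is treated as a conflict, which is the sense in which signatures degrade toward serialization rather than toward incorrect execution.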
LogTM Evaluation Discussion • LogTM Running Splash & BerkeleyDB • Good News: • Works! • Performs similarly to locks • Signature false positives not (yet) an issue • Bad News • Baby workloads • Baby workloads • Baby workloads
Outline • TM Motivation & Background • LogTM Hardware Preview • LogTM Version Management • LogTM Conflict Detection • LogTM Evaluation • LogTM Operating System Interactions (optional) • Escape Actions, Thread Switching, (& Paging) • Summary & Future Directions
Escape Actions • Allow non-transactional escapes from a transaction • (e.g., system calls, I/O) • Similar to Zilles’s pause/unpause • Escape actions never: • Abort • Stall • Cause other transactions to abort • Cause other transactions to stall • Commit and compensating actions • similar to open nests • Not recommended for the average programmer!
Thread Switching Support • Why? Support long-running transactions • What? Conflict detection for descheduled transactions • How? Summary Read/Write signatures with invariant: if thread t of process P is scheduled to use an active signature, the corresponding summary signature holds the union of the saved signatures from all descheduled threads from process P • Updated using TLB-shootdown-like mechanism
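The summary-signature invariant can be sketched as a union over the saved signatures of descheduled threads. This is a hypothetical OS-side model: the class `ProcessSignatures` and its methods are assumptions, signatures are plain integers, and pushing the updated summary to the other processors is not modeled.

```python
class ProcessSignatures:
    """Saved signatures of one process's descheduled transactional threads."""

    def __init__(self):
        self.saved = {}    # thread id -> (read_sig, write_sig)

    def deschedule(self, tid, read_sig, write_sig):
        self.saved[tid] = (read_sig, write_sig)

    def reschedule(self, tid):
        # The thread gets its signatures back; it leaves the summary.
        return self.saved.pop(tid)

    def summary(self):
        """Union of the saved signatures of all descheduled threads."""
        read_sum = write_sum = 0
        for read_sig, write_sig in self.saved.values():
            read_sum |= read_sig
            write_sum |= write_sig
        return read_sum, write_sum


proc = ProcessSignatures()
proc.deschedule("T2", read_sig=0b01010010, write_sig=0b01001000)
read_sum, write_sum = proc.summary()
print(f"R {read_sum:08b}  W {write_sum:08b}")   # R 01010010  W 01001000
```

Recomputing the union on every schedule change keeps a rescheduled thread's own bits out of the summary it is checked against.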
Handling Thread Switching • [Animation, four frames: processors P1 to P4 each hold active R/W signatures and summary R/W signatures, all summaries initially 00000000; the OS deschedules a thread whose signatures are W = 01001000, R = 01010010, saves them, and merges them into the process’s summary; the updated summary (W = 01001000, R = 01010010) is then installed on the processors still running threads of that process, and the freed processor’s active signatures return to 00000000]