A Dynamic Binary-Rewriting Approach to Software Transactional Memory • Marek Olszewski, Jeremy Cutler, Greg Steffan • University of Toronto • Appeared in PACT 2007, Brasov, Romania
The Parallel Programming Challenge • Coarse-grained locking • Easy to program • Scales poorly • Fine-grained locking • Scales well • Hard to get right • e.g., deadlock, priority inversion, etc. • The promise of Transactional Memory • As easy to program as coarse-grained locking • Performance/scalability of fine-grained locking
Transactional Memory (TM) • Source code (shown three times in the figure): ... atomic { ... access_shared_data(); ... } ... • Programmer: specifies threads/transactions in source code • TM system: executes transactions optimistically in parallel • 1) Checkpoints execution • 2) Detects conflicts • 3) Commits or aborts and re-executes
TM Implementations • Flavors of TM: • Hardware (HTM), Software (STM), Hybrid (HyTM) • STM is especially compelling • Exploit current commodity hardware (multicores) • Learn about real TM systems and apps • Current STM Systems: • Java: DSTM, ASTM • C or C++: McRT icc, TL2, RSTM, OSTM • object-based or programmer intensive (or both) Our focus: arbitrary C/C++, realistic environment
Programming with STM • Source code: #include <glib.h> GTree *tree; ... atomic { g_tree_insert(tree, &key, &val); } ... • The STM compiler produces the executable my_app; the loader starts it as a running application • Not handled by current compiler/library-based STMs: • Shared library (glib): pre-compiled binary • “Legacy locks” • System calls (kernel)
JudoSTM: An Overview • Key design choices: • Dynamic Binary Rewriting (DBR) • insert instrumentation to implement STM • Value-based conflict detection • Resulting key features: • Privileged transactions (support system calls) • Legacy lock elision • Efficient invisible readers
JudoSTM Design Choice 1 • Dynamic Binary Rewriting (DBR) • Judo DBR Framework (user-space version of JIFL†) • † JIT Instrumentation - A Novel Approach To Dynamically Instrument Operating Systems, SIGOPS EuroSys 2007
Dynamic Binary Rewriting • [Figure, three animation frames: the original code consists of basic blocks bb1, bb2, bb3, bb4; Judo copies blocks into the code cache as they are reached (bb1, then bb2, then bb4)]
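The code-cache idea in these frames can be sketched with a toy model. This is not Judo's implementation: basic blocks are modelled as Python functions that return the name of the next block, and the `instrument` and `run` helpers are hypothetical names introduced only for illustration. The point it shows is that each block is rewritten once, the first time control reaches it, and blocks that never execute (bb3 here) are never rewritten.

```python
# Toy model of dynamic binary rewriting with a code cache.
# Block names follow the slide (bb1..bb4); everything else is hypothetical.

def bb1(state):
    state["x"] = 1
    return "bb2"

def bb2(state):
    state["x"] += 1
    return "bb4"          # bb3 is never reached in this run

def bb3(state):
    state["x"] = -1
    return "bb4"

def bb4(state):
    return None           # end of the program

ORIGINAL_CODE = {"bb1": bb1, "bb2": bb2, "bb3": bb3, "bb4": bb4}

def instrument(name, block, log):
    """Return a rewritten copy of `block` that also records its execution."""
    def rewritten(state):
        log.append(name)  # stands in for inserted instrumentation
        return block(state)
    return rewritten

def run(entry, state):
    code_cache = {}       # block name -> rewritten block
    log = []
    translations = 0
    name = entry
    while name is not None:
        if name not in code_cache:        # miss: rewrite the block once
            code_cache[name] = instrument(name, ORIGINAL_CODE[name], log)
            translations += 1
        name = code_cache[name](state)    # run the cached, rewritten copy
    return code_cache, log, translations

state = {}
cache, log, translations = run("bb1", state)
```

In a program with loops the same cached copy would be reused on every iteration, which is where the amortization mentioned later in the deck comes from.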
Judo - Performance • [Chart: normalized runtime overhead] • Overhead low enough to implement STM?
DBR-Based STM Goal: Perform These Efficiently • For all non-stack write instructions • Track write addresses and values (write-set) • Write-buffer the values from regular memory • For all non-stack read instructions • Redirect to the write-buffer • If miss: track read addresses and values (read-set) • When a transaction completes: • Acquire commit lock(s) • Validate read-set (value-based conflict detection) • Commit write-set to memory • Release commit lock(s)
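The steps on this slide can be sketched as a small Python model. It is a sketch under stated assumptions, not JudoSTM's code: memory is a dict from address to value, a single `threading.Lock` stands in for the commit lock(s), and the names `Transaction`, `Abort` and `atomic` are hypothetical.

```python
import threading

class Abort(Exception):
    """Raised when read-set validation fails."""

class Transaction:
    """Toy transaction: buffered writes, value-logged reads."""

    def __init__(self, memory, commit_lock):
        self.memory = memory            # dict: address -> value
        self.commit_lock = commit_lock
        self.read_set = {}              # address -> value first read
        self.write_set = {}             # address -> buffered value

    def read(self, addr):
        if addr in self.write_set:      # redirect to the write-buffer
            return self.write_set[addr]
        value = self.memory[addr]
        self.read_set.setdefault(addr, value)   # miss: log address and value
        return value

    def write(self, addr, value):
        self.write_set[addr] = value    # memory is untouched until commit

    def commit(self):
        with self.commit_lock:          # acquire / release commit lock
            for addr, seen in self.read_set.items():
                if self.memory[addr] != seen:   # value-based validation
                    raise Abort(addr)
            self.memory.update(self.write_set)  # publish the write-set

def atomic(memory, commit_lock, body):
    """Re-execute `body` until its transaction commits."""
    while True:
        tx = Transaction(memory, commit_lock)
        try:
            result = body(tx)
            tx.commit()
            return result
        except Abort:
            continue

memory = {0x10: 6, 0x14: 2}
commit_lock = threading.Lock()

def swap(tx):
    a, b = tx.read(0x10), tx.read(0x14)
    tx.write(0x10, b)
    tx.write(0x14, a)

atomic(memory, commit_lock, swap)

# A conflicting commit between a read and the reader's commit forces an abort.
t1 = Transaction(memory, commit_lock)
t1.read(0x10)
t2 = Transaction(memory, commit_lock)
t2.write(0x10, 9)
t2.commit()
try:
    t1.commit()
    t1_aborted = False
except Abort:
    t1_aborted = True
```

The real system applies these steps to every non-stack load and store through inserted instrumentation; here the transaction body calls `read` and `write` explicitly.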
DBR: Attractive Properties for STM • Performance: overheads are amortized • code cache • Can handle arbitrary code and shared libraries • any/all code is transactionalized as it executes • Sandboxed Transactions • Typical STM: • inconsistent values could stray execution • i.e., stray to non-transactionalized code (very bad!) • solution: frequent & costly read-set validation • DBR-based STM: • any/all code is transactionalized as it executes Tough problems for conventional STMs addressed by DBR
JudoSTM Design Choice 2 • Value-Based Conflict Detection • (as opposed to location-based)
Location-Based Conflict Detection (animation, five frames; Legend: Read, Written) • Main Memory: 6 2 3 5, divided into strips; all strip versions start at 0 • Transaction 1 reads 2 3 5 and records the strip versions it saw (0) • Transaction 2 reads 6 and buffers a write of 9 over the 2 • Transaction 2 commits: step 1) Validate Read Set, step 2) Publish Writes (and inc version #s); the written strip's version becomes 1 • Transaction 1 commits: step 1) Validate Read Set: Abort! • Note: all transactions must maintain strip version #s
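A minimal single-threaded sketch of the location-based scheme in these frames, for contrast with the value-based one. The class names and the `STRIP_SIZE` parameter are hypothetical, and locking during commit is omitted to keep the example short.

```python
STRIP_SIZE = 1   # hypothetical: number of addresses covered by one strip

class Abort(Exception):
    """Raised when a strip read by the transaction has a newer version."""

class VersionedMemory:
    def __init__(self, values):
        self.values = list(values)
        n_strips = (len(self.values) + STRIP_SIZE - 1) // STRIP_SIZE
        self.versions = [0] * n_strips   # one version number per strip

    def strip_of(self, addr):
        return addr // STRIP_SIZE

class LocTransaction:
    def __init__(self, mem):
        self.mem = mem
        self.read_versions = {}   # strip -> version seen at first read
        self.writes = {}          # address -> buffered value

    def read(self, addr):
        if addr in self.writes:
            return self.writes[addr]
        strip = self.mem.strip_of(addr)
        self.read_versions.setdefault(strip, self.mem.versions[strip])
        return self.mem.values[addr]

    def write(self, addr, value):
        self.writes[addr] = value

    def commit(self):
        # Commit step 1: validate the read set by comparing versions.
        for strip, seen in self.read_versions.items():
            if self.mem.versions[strip] != seen:
                raise Abort(strip)
        # Commit step 2: publish writes and increment version numbers.
        for addr, value in self.writes.items():
            self.mem.values[addr] = value
        for strip in {self.mem.strip_of(a) for a in self.writes}:
            self.mem.versions[strip] += 1

mem = VersionedMemory([6, 2, 3, 5])
t1 = LocTransaction(mem)
t1_values = [t1.read(1), t1.read(2), t1.read(3)]   # reads 2 3 5

t2 = LocTransaction(mem)
t2.read(0)        # reads 6
t2.write(1, 9)    # buffers 9 over the 2
t2.commit()

try:
    t1.commit()
    t1_aborted = False
except Abort:
    t1_aborted = True
```

Every committing writer has to update the version array, which is the bookkeeping the slide's note refers to.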
Value-Based Conflict Detection (animation, four frames; Legend: Read, Written) • Main Memory: 6 2 3 5 • Transaction 1 reads 2 3 5 and logs the values • Transaction 2 reads 6 and buffers a write of 9 over the 2 • Transaction 2 commits: step 1) Validate Read Set, step 2) Publish Writes; memory now holds 9 where Transaction 1 read 2 • Transaction 1 commits: step 1) Validate Read Set: Abort! • Note: no version information to maintain
JudoSTM Feature 1: • Privileged transactions • Can execute (but not roll back) system calls • Grab commit lock(s) when about to make a syscall • Release when transaction completes • Only one privileged transaction exists at a time
Privileged Transactions (animation, three frames; Legend: Read, Written) • Main Memory: 6 2 3 5 • Transaction 1 reads 2 3 5 • Transaction 2 (privileged, syscalls) writes 9 over the 2 • Privileged: can write directly to memory, may be uninstrumented • Transaction 1 commits: step 1) Validate Read Set: Abort! • Value-based conflict detection facilitates system calls within transactions!
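A toy model of the privileged-transaction idea, under assumptions that go beyond the slides: the `Tx` class and its `syscall` method are hypothetical, and validating and publishing the buffered state at the moment the transaction becomes privileged is this sketch's own choice, not something the slides state. What it shows is that a privileged transaction holds the commit lock and writes memory directly, and that an ordinary transaction still detects the change because it validates by value.

```python
import os
import threading

class Abort(Exception):
    pass

class Tx:
    def __init__(self, memory, commit_lock):
        self.memory, self.commit_lock = memory, commit_lock
        self.read_set, self.write_set = {}, {}
        self.privileged = False

    def read(self, addr):
        if self.privileged:
            return self.memory[addr]
        if addr in self.write_set:
            return self.write_set[addr]
        value = self.memory[addr]
        self.read_set.setdefault(addr, value)
        return value

    def write(self, addr, value):
        if self.privileged:
            self.memory[addr] = value        # direct, unbuffered write
        else:
            self.write_set[addr] = value

    def _validate(self):
        for addr, seen in self.read_set.items():
            if self.memory[addr] != seen:
                raise Abort(addr)

    def syscall(self, fn, *args):
        """Become privileged (if not already), then make the call."""
        if not self.privileged:
            self.commit_lock.acquire()       # grab the commit lock
            try:
                self._validate()
            except Abort:
                self.commit_lock.release()
                raise
            # Assumption of this sketch: publish buffered writes now.
            self.memory.update(self.write_set)
            self.privileged = True
        return fn(*args)

    def commit(self):
        if self.privileged:
            self.privileged = False
            self.commit_lock.release()       # release at completion
            return
        with self.commit_lock:
            self._validate()
            self.memory.update(self.write_set)

memory = {0: 6, 1: 2, 2: 3, 3: 5}
commit_lock = threading.Lock()

t1 = Tx(memory, commit_lock)
t1_values = [t1.read(1), t1.read(2), t1.read(3)]

t2 = Tx(memory, commit_lock)
pid = t2.syscall(os.getpid)   # t2 is now privileged
t2.write(1, 9)                # lands in memory immediately
t2.commit()

try:
    t1.commit()
    t1_aborted = False
except Abort:
    t1_aborted = True
```

Because only one thread can hold the commit lock, at most one privileged transaction exists at a time, matching the bullet on the feature slide.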
JudoSTM Feature 2: • Legacy Lock Elision • Safely ignore locks within legacy code
Legacy Lock Elision (animation, seven frames; Legend: Read/Write, Read, Written) • Main memory holds the data and a lock word (Lock: 0) • Transaction 1: lock acquire (reads the lock as 0, buffers a write of 1) • Transaction 2: lock acquire (memory still shows the lock as 0) • Transaction 2: updates its data (6 and 9 in the figure), then lock release (buffers a write of 0) • Transaction 2 commits: step 1) Validate Read Set, step 2) Publish Writes; the write to the lock is a silent store • Transaction 1: updates its data (5 and 7 in the figure), then lock release • Transaction 1 commits: step 1) Validate Read Set, step 2) Publish Writes • Value-based conflict detection facilitates the elision of legacy locks!
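The lock-elision frames can be replayed with a small value-based model. This is a sketch with hypothetical names (`Tx`, `legacy_update`) and a simplified legacy routine that aborts instead of spinning when the lock looks taken. Both transactions run the same lock-protected legacy code on different data; each buffers lock = 1 and then lock = 0, so the only lock value ever published is 0, which matches what the other transaction read.

```python
class Abort(Exception):
    pass

class Tx:
    """Minimal value-based transaction (single-threaded toy, no commit lock)."""

    def __init__(self, memory):
        self.memory, self.reads, self.writes = memory, {}, {}

    def read(self, addr):
        if addr in self.writes:
            return self.writes[addr]
        value = self.memory[addr]
        self.reads.setdefault(addr, value)
        return value

    def write(self, addr, value):
        self.writes[addr] = value

    def commit(self):
        for addr, seen in self.reads.items():
            if self.memory[addr] != seen:
                raise Abort(addr)
        self.memory.update(self.writes)

def legacy_update(tx, data_addr, new_value):
    """Stand-in for pre-compiled code that guards its data with a lock."""
    if tx.read("lock") != 0:        # a real lock would spin here
        raise Abort("lock")
    tx.write("lock", 1)             # lock acquire (buffered)
    tx.read(data_addr)
    tx.write(data_addr, new_value)
    tx.write("lock", 0)             # lock release (buffered)

memory = {"lock": 0, "x": 5, "y": 6}

t1 = Tx(memory)
t2 = Tx(memory)

# Interleaved execution: both acquire the elided lock "at the same time".
legacy_update(t1, "x", 7)
legacy_update(t2, "y", 9)

t2.commit()   # publishes y = 9 and a silent store lock = 0
t1.commit()   # lock still reads 0 and x is unchanged, so validation passes
```

A scheme that compared version numbers would see the lock word as modified by Transaction 2's commit, whereas comparing values sees 0 both times.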
JudoSTM Feature 3: • Efficient Invisible Readers
Supporting Invisible Readers • Invisible Readers: don’t report reads to others • good performance • but can lead to inconsistent read data: errors! • Data errors: segfault, divide by zero • Cheap solution: catch with trap/signal handlers • Control errors: jump to non-instrumented code • Typical solution: verify read-set after every load • Expensive! O(N²) • DBR solution: prevented by sandboxing • DBR instruments all code as it executes
JudoSTM Details • Implementation
(reminder) Goal: Perform These Efficiently • For all non-stack write instructions • Track write addresses and values (write-set) • Buffer the values from regular memory • For all non-stack read instructions • Redirect to the write-buffer • If miss: track read addresses and values (read-set) • When a transaction completes: • Acquire commit lock(s) • Validate read-set (value-based conflict detection) • Commit write-set to memory • Release commit lock(s)
Read/Write Buffer Implementation • Linear-probed, open-addressed hashtables • Read hashtable: address → entry in the read buffer • Write hashtable: address → entry in the write buffer • Efficient lookup: 5 insts for a hit (+ state-saving?) • Efficient validate and commit?
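For readers unfamiliar with the data structure named on this slide, here is a minimal linear-probed, open-addressed table that maps an address to a slot index in a separate buffer. The class name, the power-of-two capacity and the hash function (dropping the two word-alignment bits) are assumptions made for the sketch; the slides do not specify them.

```python
class AddressTable:
    """Open-addressed hashtable with linear probing: address -> buffer slot."""

    def __init__(self, capacity=8):
        if capacity < 2 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two >= 2")
        self.mask = capacity - 1
        self.keys = [None] * capacity    # stored addresses
        self.slots = [None] * capacity   # index into the read/write buffer
        self.count = 0

    def _hash(self, addr):
        return (addr >> 2) & self.mask   # hypothetical hash function

    def lookup(self, addr):
        i = self._hash(addr)
        # One slot is always kept empty, so the probe terminates.
        while self.keys[i] is not None:
            if self.keys[i] == addr:
                return self.slots[i]     # hit
            i = (i + 1) & self.mask      # probe the next slot
        return None                      # miss

    def insert(self, addr, slot):
        i = self._hash(addr)
        while self.keys[i] is not None and self.keys[i] != addr:
            i = (i + 1) & self.mask
        if self.keys[i] is None:
            if self.count >= self.mask:  # keep one slot empty
                raise RuntimeError("table full")
            self.count += 1
            self.keys[i] = addr
        self.slots[i] = slot

table = AddressTable(capacity=8)
table.insert(0x80B10BB8, 0)
table.insert(0x80B10BCC, 1)
table.insert(0x80B10BD8, 2)   # same hash bucket as 0x80B10BB8 in this sketch
```

A hit on the first probe touches one key comparison and one slot load, which is the kind of short path the slide's instruction count is about.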
Efficient Commit: Executable Write-Buffer (animation, five frames) • The write hashtable points into the write buffer; a top pointer (Top ptr) marks the most recent entry • Pre-allocated buffer of move instructions: eight slots of movl $0x00000000,0x00000000 followed by ret • Emit value-address pairs as transaction executes; slots fill upward from the ret • Final frame, live entries: movl $0x80B10CFC,0x80B10CA4 movl $0x0000ab42,0x80B10BCC movl $0x00000025,0x80B10BB8 ret • Execute the write-buffer to commit!
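The executable write-buffer can be imitated in Python by storing instruction tuples in a pre-allocated list and running them with a tiny interpreter. This is only an analogy for the x86 buffer on the slides: the class name is hypothetical, a dict stands in for the write hashtable, and starting execution at the top pointer is an assumption of the sketch.

```python
RET = ("ret",)

class ExecutableWriteBuffer:
    def __init__(self, capacity):
        # Pre-allocated buffer: placeholder stores followed by ret.
        self.code = [("movl", 0, 0)] * capacity + [RET]
        self.top = capacity      # index of the first live instruction
        self.slot_of = {}        # address -> instruction index (write hashtable)

    def write(self, addr, value):
        slot = self.slot_of.get(addr)
        if slot is None:                     # first write to this address
            if self.top == 0:
                raise RuntimeError("write-buffer full")
            self.top -= 1                    # claim the next slot above
            slot = self.top
            self.slot_of[addr] = slot
        self.code[slot] = ("movl", value, addr)   # patch value-address pair

    def lookup(self, addr):
        slot = self.slot_of.get(addr)
        return None if slot is None else self.code[slot][1]

    def execute(self, memory):
        """Commit by running the buffer from the top pointer to ret."""
        pc = self.top
        while self.code[pc] != RET:
            _, value, addr = self.code[pc]
            memory[addr] = value
            pc += 1

buf = ExecutableWriteBuffer(capacity=8)
buf.write(0x80B10BB8, 0x00000025)
buf.write(0x80B10BCC, 0x0000AB42)
buf.write(0x80B10CA4, 0x80B10CFC)

memory = {0x80B10BB8: 5, 0x80B10BCC: 0x100, 0x80B10CA4: 0xA34}
buf.execute(memory)
```

Since each address owns exactly one slot, the order in which the stores run does not affect the committed state.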
Efficient Validation: Executable Read-Buffer (animation, five frames) • The read hashtable points into the read buffer; a top pointer (Top ptr) marks the most recent entry • Pre-allocated buffer of compare & jump instructions: four pairs of cmp $0x00000000, 0x00000000 / jne,pn judostm_trans_abort followed by ret • Emit value-address pairs as transaction executes; slots fill upward from the ret • Final frame, live entries: cmp $0x00000100, 0x80B10BCC jne,pn judostm_trans_abort cmp $0x00000005, 0x80B10BB8 jne,pn judostm_trans_abort cmp $0x00000a34, 0x80B10CA4 jne,pn judostm_trans_abort ret • Execute the read-buffer to validate the read-set!
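The read-side buffer can be imitated the same way, again as an analogy and not the actual x86 mechanism. The `ExecutableReadBuffer` class and `TransactionAbort` exception are hypothetical names; the `"judostm_trans_abort"` string is only a label copied from the slides, and a Python set stands in for the read hashtable.

```python
RET = ("ret",)
ABORT_TARGET = "judostm_trans_abort"   # label taken from the slides

class TransactionAbort(Exception):
    pass

class ExecutableReadBuffer:
    def __init__(self, capacity):
        # Pre-allocated compare-and-jump pairs followed by ret.
        self.code = [("cmp", 0, 0), ("jne", ABORT_TARGET)] * capacity + [RET]
        self.top = 2 * capacity      # index of the first live instruction
        self.logged = set()          # addresses already in the read-set

    def log_read(self, addr, value):
        if addr in self.logged:
            return
        if self.top == 0:
            raise RuntimeError("read-buffer full")
        self.top -= 2                # claim the next cmp/jne pair above
        self.code[self.top] = ("cmp", value, addr)
        self.logged.add(addr)

    def execute(self, memory):
        """Validate by running the buffer from the top pointer to ret."""
        pc, equal, last_addr = self.top, True, None
        while self.code[pc] != RET:
            op = self.code[pc]
            if op[0] == "cmp":
                last_addr = op[2]
                equal = memory[op[2]] == op[1]
            elif op[0] == "jne" and not equal:
                raise TransactionAbort(hex(last_addr))
            pc += 1

rbuf = ExecutableReadBuffer(capacity=4)
rbuf.log_read(0x80B10CA4, 0x00000A34)
rbuf.log_read(0x80B10BB8, 0x00000005)
rbuf.log_read(0x80B10BCC, 0x00000100)

memory = {0x80B10CA4: 0xA34, 0x80B10BB8: 0x5, 0x80B10BCC: 0x100}
rbuf.execute(memory)                 # all values match: validation passes

memory[0x80B10BB8] = 0x25            # another transaction commits a new value
try:
    rbuf.execute(memory)
    aborted = False
except TransactionAbort:
    aborted = True
```

Validation costs one compare and one untaken branch per logged read in the common case, with no table walk at commit time.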