DTHREADS: Efficient Deterministic Multithreading
Tongping Liu, Charlie Curtsinger, and Emery D. Berger
Dept. of Computer Science, University of Massachusetts, Amherst
Presented by: Lokesh Gidra
Concurrent Programming is hard!
• Prone to deadlocks and race conditions.
• Thread interleavings are non-deterministic, which makes bugs hard to reproduce and debug.
• A Deterministic Multithreading (DMT) system eliminates this non-determinism:
  • The same program with the same input always produces the same result.
  • Simplifies debugging.
  • Simplifies record and replay (eliminates the need to track memory operations).
  • Enables multiple replicated executions for fault tolerance.
Contributions
• DTHREADS guarantees deterministic execution.
• Straightforward deployment: it replaces libpthread, with no recompilation required.
• Eliminates cache-line false sharing (as a side effect).
• Makes printf debugging practical!
Basic Idea
• Isolate memory accesses between different threads.
• Replace threads with processes.
• Replace pthread_create() with the clone system call.
• Use memory-mapped files to share memory (globals and the heap).
[Figure: Thread 1 and Thread 2 sharing the heap]
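The isolation idea above can be tried out with two mappings of one file. This is a minimal sketch, assuming a Unix-like system; the temporary file stands in for the shared heap, and the function name `isolation_demo` is hypothetical. It is not the DTHREADS implementation, which works across processes created with clone.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def isolation_demo():
    """Return (private byte, shared byte before commit, shared byte after commit)."""
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, PAGE)  # zero-filled backing file plays the shared heap
        shared = mmap.mmap(fd, PAGE, flags=mmap.MAP_SHARED)
        private = mmap.mmap(fd, PAGE, flags=mmap.MAP_PRIVATE)

        private[0] = 42  # copy-on-write: only the private mapping changes
        before = shared[0]  # the shared view has not seen the write

        shared[0] = private[0]  # an explicit "commit" publishes the change
        after = shared[0]

        result = (private[0], before, after)
        private.close()
        shared.close()
        return result
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(isolation_demo())
```

A write through the private mapping stays invisible to the shared mapping until it is copied over explicitly, which is the property that lets each thread-as-process run in isolation between synchronization points.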
Deterministic Synchronization (the global token is the key!)
• Locks
  • If the lock is held by another thread, pass the token.
  • Release the token only when the lock count is 0.
• Condition Variables
  • pthread_cond_wait: remove the caller from the token queue and add it to the variable's queue.
  • pthread_cond_signal: remove the first thread from the variable's queue and add it to the token queue.
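The queue movements for condition variables can be modelled in a few lines. This is a toy model, not DTHREADS code: the class `TokenModel` and its method names are hypothetical, the head of the token queue is assumed to hold the token, and a signalled thread is assumed to rejoin at the tail (the slide does not say where it is inserted).

```python
from collections import deque


class TokenModel:
    """Toy model of the token queue and per-condition-variable wait queues."""

    def __init__(self, threads):
        self.token_q = deque(threads)  # assumption: head of the queue holds the token
        self.cond_q = {}  # condition variable name -> deque of waiting threads

    def holder(self):
        return self.token_q[0] if self.token_q else None

    def pass_token(self):
        # Current holder moves to the tail; the next thread becomes the holder.
        self.token_q.rotate(-1)

    def cond_wait(self, cond):
        # The token holder leaves the token queue and waits on `cond`.
        waiter = self.token_q.popleft()
        self.cond_q.setdefault(cond, deque()).append(waiter)

    def cond_signal(self, cond):
        # The first waiter on `cond` rejoins the token queue (tail is an assumption).
        waiters = self.cond_q.get(cond)
        if waiters:
            self.token_q.append(waiters.popleft())


if __name__ == "__main__":
    m = TokenModel(["T1", "T2", "T3"])
    m.cond_wait("cv")    # T1 waits; T2 now holds the token
    m.pass_token()       # T3 now holds the token
    m.cond_signal("cv")  # T1 rejoins the token queue
    print(list(m.token_q))
```

Because every move depends only on queue order, the same sequence of calls always wakes threads in the same order.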
Deterministic Synchronization (contd.)
• Barriers (similar to condition variables)
  • If not the last to enter: move self from the token queue to the barrier queue.
  • Otherwise: move all threads from the barrier queue to the token queue.
• Thread Creation
  • Child: placed on the token queue; waits for the parallel phase.
• Thread Exit/Cancellation
  • Remove the thread from the queue, then call pthread_exit()/kill().
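The barrier rule can be sketched the same way. This is a toy model with a hypothetical `barrier_wait` helper; it assumes the arriving thread is at the head of the token queue and that released threads rejoin at the tail in arrival order, neither of which the slide specifies.

```python
from collections import deque


def barrier_wait(token_q, barrier_q, parties):
    """The head of token_q arrives at a barrier shared by `parties` threads.

    Returns True if this arrival was the last one and released the others.
    """
    if len(barrier_q) + 1 < parties:
        # Not the last to enter: move self from the token queue to the barrier queue.
        barrier_q.append(token_q.popleft())
        return False
    # Last to enter: move every waiter back to the token queue (order is an assumption).
    token_q.extend(barrier_q)
    barrier_q.clear()
    return True


if __name__ == "__main__":
    token_q = deque(["A", "B", "C"])
    barrier_q = deque()
    while not barrier_wait(token_q, barrier_q, parties=3):
        pass
    print(list(token_q))
```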
Memory Allocation and OS Support
• Assigns a sub-heap to each thread using a deterministic thread index.
• Superblocks are allocated under locks, which makes their allocation deterministic.
• Intercepts system calls that affect program execution (such as sigwait).
• Intercepts read/write system calls: touches the pages first to trigger copy-on-write (COW) and avoid a segfault.
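To see why per-thread sub-heaps give deterministic addresses, consider a toy bump allocator. Everything here is hypothetical (the class `SubHeapAllocator`, the base address, the sub-heap size); it only illustrates that an address depends on the thread index and that thread's own allocation history, not on how the threads interleave.

```python
class SubHeapAllocator:
    """Toy bump allocator: one fixed-size sub-heap per thread index."""

    def __init__(self, base=0x10000, subheap_size=0x1000):
        self.base = base  # hypothetical start address of the heap
        self.subheap_size = subheap_size  # hypothetical size of each sub-heap
        self.offsets = {}  # thread index -> bytes already used in its sub-heap

    def malloc(self, thread_index, size):
        used = self.offsets.get(thread_index, 0)
        if used + size > self.subheap_size:
            raise MemoryError("sub-heap exhausted")
        self.offsets[thread_index] = used + size
        return self.base + thread_index * self.subheap_size + used


def run(schedule):
    """Replay (thread_index, size) requests; return the addresses each thread got."""
    heap = SubHeapAllocator()
    result = {}
    for thread_index, size in schedule:
        result.setdefault(thread_index, []).append(heap.malloc(thread_index, size))
    return result


if __name__ == "__main__":
    interleaved = run([(0, 16), (1, 32), (0, 8), (1, 8)])
    sequential = run([(1, 32), (1, 8), (0, 16), (0, 8)])
    print(interleaved == sequential)
```

Two different interleavings of the same per-thread requests yield the same addresses, which is the property a deterministic allocator needs.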
Performance
• Evaluated on an 8-core machine with 16 GB RAM and a 4 MB L2 cache.
• Benchmarks from the PARSEC and Phoenix suites.
• For 9 of 14 benchmarks, DTHREADS runs nearly as fast as or faster than pthreads, while providing determinism.
Scalability
• Scales nearly as well as or better than pthreads.
• Almost always scales as well as or better than CoreDet.
Limitations
• Incurs substantial overhead for applications with a large number of:
  • short-lived transactions.
  • modified pages per transaction.
• No control over external non-determinism.
• Applications using ad hoc synchronization are not supported.
• Sharing of stack variables is not supported.
• Increases the program's memory footprint.
• Performs poorly if #threads > #cores.
Personal Observations (side effects on NUMA systems)
• Substantially reduces TLB miss cost. For 64-bit applications, one TLB miss costs:
  • Pthreads: ~1500 cycles
  • Dthreads: ~500 cycles
• Diff-ing will be too expensive: 4 KB, as compared to just a few cache lines.
Take Away
• Deterministic multithreading systems are good.
• Dthreads: an easy-to-deploy DMT system.
• Supports all pthread APIs.
• Replaces threads with processes for memory isolation.
• Uses twin pages and diff-ing to commit changes.
• Avoids cache-line false sharing.
• Good for applications with fewer transactions.
  • Or can we say, for scalable applications?
• Doesn't support ad hoc synchronization.
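The twin-page and diff-ing commit mentioned above can be illustrated on byte arrays. This is a minimal sketch under stated assumptions: the function `commit_page` is hypothetical, a "page" is 8 bytes for readability, and the twin is taken to be a copy of the page as it looked before the thread modified it. The real system works on memory pages, not Python objects.

```python
def commit_page(shared, twin, working):
    """Copy into `shared` only the bytes that differ between `twin` and `working`.

    twin    -- snapshot of the page before this thread modified it
    working -- this thread's private, modified copy of the page
    """
    for i, (old, new) in enumerate(zip(twin, working)):
        if old != new:
            shared[i] = new
    return shared


if __name__ == "__main__":
    shared = bytearray(8)
    twin = bytes(shared)  # both threads start from the same page contents

    t1 = bytearray(twin)
    t1[0] = 1  # thread 1 writes byte 0
    t2 = bytearray(twin)
    t2[5] = 9  # thread 2 writes byte 5 of the same page

    commit_page(shared, twin, t1)
    commit_page(shared, twin, t2)
    print(list(shared))
```

Because each commit touches only the bytes its own thread changed, two threads that write different parts of the same page do not overwrite each other, which is also why sharing a cache line no longer causes contention between them.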
Optimizations
• Lazy commit
• Lazy twin creation and diff elimination
• Single-threaded execution
• Lock ownership
• Parallelization