
DTHREADS: Efficient Deterministic Multithreading

Tongping Liu, Charlie Curtsinger, and Emery D. Berger. Dept. of Computer Science, University of Massachusetts, Amherst. Presented by: Lokesh Gidra.

Presentation Transcript


  1. DTHREADS: Efficient Deterministic Multithreading. Tongping Liu, Charlie Curtsinger, and Emery D. Berger, Dept. of Computer Science, University of Massachusetts, Amherst. Presented by: Lokesh Gidra

  2. Concurrent Programming is hard! • Prone to deadlocks and race conditions. • Thread interleavings are non-deterministic → hard to debug! • A deterministic multithreading (DMT) system eliminates this non-determinism. • Same program with the same input → same result. • Simplifies debugging. • Simplifies record and replay (eliminates the need to track memory operations). • Enables multiple replicated executions for fault tolerance.

  3. Contributions • DTHREADS guarantees deterministic execution. • Straightforward deployment: replaces libpthread. No recompilation required. • Eliminates cache-line false sharing (as a side effect). • Makes printf debugging practical!

  4. Basic Idea • Isolate memory accesses between different threads. • Replace threads with processes. • Replace pthread_create() with the clone system call. • Memory-mapped files are used to share memory (globals and the heap). [Figure labels: Heap, Thread 1, Thread 2]
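To make the isolation idea concrete, here is a minimal sketch of how processes can share one mapping while keeping writes to another private. It is not the DTHREADS implementation: it uses fork() and anonymous mappings instead of clone and memory-mapped files, and the function name isolation_demo is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a child's write to a MAP_SHARED page is visible to the
 * parent while its write to a MAP_PRIVATE (copy-on-write) page is not,
 * 0 if that does not hold, and -1 on error.
 * Assumption: fork() and anonymous mappings stand in for the clone call
 * and memory-mapped files named on the slide. */
int isolation_demo(void)
{
    int *shared = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    int *priv = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED || priv == MAP_FAILED)
        return -1;
    *shared = 0;
    *priv = 0;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child plays the role of a "thread" running as a process. */
        *shared = 42;   /* lands in the page both processes map */
        *priv = 42;     /* triggers copy-on-write; stays in the child */
        _exit(0);
    }

    assert(pid > 0);    /* only the parent reaches this point */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    int ok = (*shared == 42) && (*priv == 0);
    munmap(shared, sizeof(int));
    munmap(priv, sizeof(int));
    return ok;
}
```

Calling isolation_demo() on a Linux system should return 1: the private write never leaks, which is the property that lets each process work in isolation until it chooses to publish its changes.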

  5. Fence and Global Token

  6. Commit Protocol
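The figure for this slide is not part of the transcript. As a rough illustration of a twin-page, diff-based commit (the mechanism the talk attributes to DTHREADS), the sketch below copies into the shared page only the bytes a thread actually changed relative to its twin, the snapshot taken before the first write. The names commit_page and DEMO_PAGE_SIZE are hypothetical, and the byte-by-byte loop is an assumption made for clarity rather than the library's actual code.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096  /* assumed page size for the sketch */

/* Commit one page: compare the thread's working copy against its twin
 * and write only the differing bytes into the shared page.
 * Returns the number of bytes written. */
size_t commit_page(unsigned char *shared,
                   const unsigned char *working,
                   const unsigned char *twin)
{
    assert(shared && working && twin);
    size_t changed = 0;
    for (size_t i = 0; i < DEMO_PAGE_SIZE; i++) {
        if (working[i] != twin[i]) {
            shared[i] = working[i];  /* bytes the thread never touched are left alone */
            changed++;
        }
    }
    return changed;
}
```

Because untouched bytes are never written back, two threads that modify different bytes of the same page can commit one after the other without overwriting each other's updates.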

  7. Deterministic Synchronization (Global token is the key!) • Locks • If the lock is held by someone else, pass the token. • Release the token only when the lock count is 0. • Condition Variables • pthread_cond_wait: remove self from the token’s Q and add to the variable’s Q. • pthread_cond_signal: remove the first thread in the variable’s Q and add it to the token’s Q.

  8. Contd… • Barriers (similar to condition variables) • If not last to enter: move self from the token Q to the barrier Q. • Otherwise, move all threads from the barrier Q to the token Q. • Thread Creation • Child: place on the token Q; wait for the parallel phase. • Thread Exit/Cancellation • Remove from the Q, call pthread_exit()/kill()
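The queue moves listed for condition variables and barriers can be modelled with plain FIFO queues of thread ids. The sketch below is a single-threaded model, not DTHREADS code: the type tq and the model_* functions are hypothetical names, and appending a woken thread at the tail of the token queue is an assumption (the slides only say it is added to the token's Q).

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_THREADS 16

typedef struct {
    int ids[DEMO_MAX_THREADS];  /* ids[0] is the head of the queue */
    size_t len;
} tq;

static void tq_push(tq *q, int tid)
{
    assert(q->len < DEMO_MAX_THREADS);
    q->ids[q->len++] = tid;
}

/* Remove and return the head, or -1 if the queue is empty. */
static int tq_pop(tq *q)
{
    if (q->len == 0)
        return -1;
    int head = q->ids[0];
    for (size_t i = 1; i < q->len; i++)
        q->ids[i - 1] = q->ids[i];
    q->len--;
    return head;
}

/* Remove a specific thread id; returns 1 if it was present. */
static int tq_remove(tq *q, int tid)
{
    for (size_t i = 0; i < q->len; i++) {
        if (q->ids[i] == tid) {
            for (size_t j = i + 1; j < q->len; j++)
                q->ids[j - 1] = q->ids[j];
            q->len--;
            return 1;
        }
    }
    return 0;
}

/* cond_wait: the caller leaves the token queue and joins the variable's queue. */
void model_cond_wait(tq *token_q, tq *cond_q, int tid)
{
    if (tq_remove(token_q, tid))
        tq_push(cond_q, tid);
}

/* cond_signal: the first waiter rejoins the token queue.
 * Returns the woken thread id, or -1 if nobody was waiting. */
int model_cond_signal(tq *token_q, tq *cond_q)
{
    int tid = tq_pop(cond_q);
    if (tid >= 0)
        tq_push(token_q, tid);
    return tid;
}

/* barrier_wait: every arrival but the last parks on the barrier queue;
 * the last arrival moves all parked threads back to the token queue.
 * Returns 1 for the last arrival, 0 otherwise. */
int model_barrier_wait(tq *token_q, tq *barrier_q, int tid, size_t parties)
{
    if (barrier_q->len + 1 < parties) {
        if (tq_remove(token_q, tid))
            tq_push(barrier_q, tid);
        return 0;
    }
    int waiter;
    while ((waiter = tq_pop(barrier_q)) >= 0)
        tq_push(token_q, waiter);
    return 1;
}
```

Since every move depends only on queue contents and the order in which the token visits threads, replaying the same sequence of calls always yields the same wake-up order, which is the point of routing synchronization through the token.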

  9. Memory Allocation and OS Support • Assigns a sub-heap to each thread using a deterministic thread index. • Superblocks are allocated under locks → deterministic. • Intercepts system calls which affect program execution (like sigwait). • Intercepts read/write system calls: touches the buffer pages first to trigger copy-on-write (COW) and avoid a segfault.
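A small sketch of why indexing sub-heaps by a deterministic thread index gives repeatable addresses: each thread bump-allocates from its own fixed slice, so the address returned depends only on the thread's index and its own allocation history. The sizes, the 16-byte rounding, and the name demo_alloc are assumptions for illustration; the real allocator's superblock scheme is not modelled here.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SUBHEAPS 4
#define DEMO_SUBHEAP_BYTES 1024

static unsigned char demo_heap[DEMO_SUBHEAPS * DEMO_SUBHEAP_BYTES];
static size_t demo_used[DEMO_SUBHEAPS];  /* bytes handed out per sub-heap */

/* Bump-allocate `size` bytes from the sub-heap owned by `thread_index`.
 * Returns NULL if the index is out of range or the sub-heap is full. */
void *demo_alloc(unsigned thread_index, size_t size)
{
    if (thread_index >= DEMO_SUBHEAPS || size > DEMO_SUBHEAP_BYTES)
        return NULL;
    size = (size + 15u) & ~(size_t)15u;  /* round the request up to 16 bytes */
    if (size > DEMO_SUBHEAP_BYTES - demo_used[thread_index])
        return NULL;
    void *p = demo_heap + (size_t)thread_index * DEMO_SUBHEAP_BYTES
                        + demo_used[thread_index];
    demo_used[thread_index] += size;
    assert(demo_used[thread_index] <= DEMO_SUBHEAP_BYTES);
    return p;
}
```

An allocation by thread 2 never shifts the addresses thread 1 will receive, so the heap layout does not depend on how the threads happen to interleave.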

  10. Performance • On an 8-core machine with 16 GB RAM and a 4 MB L2 cache. • Benchmarks from the PARSEC and Phoenix suites. • For 9 of 14 benchmarks, DTHREADS runs nearly as fast as or faster than pthreads, while providing determinism.

  11. Scalability • Scales nearly as well as or better than pthreads. • Almost always scales as well as or better than CoreDet.

  12. Limitations • Incurs substantial overhead for apps with a large number of: • short-lived transactions. • modified pages per transaction. • No control over external non-determinism. • Apps using ad hoc synchronization are not supported. • Sharing of stack variables is not supported. • Increases the program’s memory footprint. • Will perform poorly if #threads > #cores.

  13. Personal Observations (side effects on NUMA systems) • Substantially reduces TLB miss cost: • For 64-bit apps, one TLB miss: • Pthreads: ~1500 cycles • Dthreads: ~500 cycles • Diff-ing will be too expensive: • a 4 KB page as compared to just a few cache lines.

  14. Take Away • Deterministic multithreading systems are good. • Dthreads: an easy-to-deploy DMT system. • Supports all pthread APIs. • Replaces threads with processes for memory isolation. • Uses twin pages and diff-ing to commit changes. • Avoids cache-line false sharing. • Good for apps with fewer transactions. • Or, can we say for scalable apps? • Doesn’t support ad hoc synchronization.

  15. Optimizations • Lazy Commit • Lazy twin creation and diff elimination • Single threaded execution • Lock ownership • Parallelization
