Accurate and Efficient Filtering for the Intel Thread Checker Race Detector
By Paul Sack, Brian E. Bliss, Zhiqiang Ma, Paul Petersen, Josep Torrellas
2006 ACM
OS Lab, Ok-Kyoon Ha, 2014-10-23
Motivation
• Debugging data races is a difficult task
• Race detectors commonly use one of two types of algorithm
  - lockset-based or vector-clock-based
• Other data-race detection tools
  - have reasonable overheads (around 2x slowdown)
  - but provide little useful information or support only limited usage models
• Intel Thread Checker
  - provides an abundance of useful information and has few usage constraints
  - but has high performance costs (233x slowdown)
SBMP06
Overheads of Intel Thread Checker
• Instrumentation alone: 22x slowdown
• Full algorithm: 233x slowdown
• Memory overhead: 20x
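As a rough illustration, assuming the average figures compose and that filtering can remove only algorithm work (never the instrumentation cost itself), the 22x instrumentation-only slowdown caps the speedup that any filter could deliver:

```latex
S_{\max} \approx \frac{\text{full-algorithm slowdown}}{\text{instrumentation-only slowdown}} = \frac{233}{22} \approx 10.6
```

The filters also add work of their own, so the practical ceiling is somewhat lower than this estimate.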
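Approach
• Objective
  - reduce the amount of work done by the race-detection algorithm
• Method
  - filter out useless memory references (references unlikely to expose new data races) before they reach the algorithm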
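The approach can be pictured as a chain of filters placed between the instrumentation and the detection algorithm. The sketch below is illustrative only: `MemRef`, `run_filters`, and the detector callback are hypothetical names, not Thread Checker's interface, and it assumes each filter is a predicate that returns True when a reference may be dropped.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class MemRef:
    """One instrumented memory reference (hypothetical record layout)."""
    thread: int
    addr: int
    size: int
    is_write: bool


# A filter returns True when the reference can be dropped before detection.
Filter = Callable[[MemRef], bool]


def run_filters(refs: Iterable[MemRef], filters: List[Filter],
                detector: Callable[[MemRef], None]) -> float:
    """Send every reference that survives all filters to the detector.

    Returns the fraction of references that were filtered out.
    """
    total = dropped = 0
    for ref in refs:
        total += 1
        # any() short-circuits: later filters never see a reference an earlier one dropped.
        if any(f(ref) for f in filters):
            dropped += 1
        else:
            detector(ref)
    return dropped / total if total else 0.0


if __name__ == "__main__":
    kept: List[MemRef] = []
    refs = [MemRef(0, 0x1000, 4, False), MemRef(0, 0x2000, 4, True)]
    rate = run_filters(refs, [lambda r: r.addr < 0x1800], kept.append)
    print(rate, kept)
```

Ordering the filters from cheapest to most expensive means a reference dropped by a cheap filter never pays for the costlier, stateful ones.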
Three Filters (1/3)
• Stack Filter
  - filters references that a thread makes to its own stack
  - cannot cause data races to be lost, and is very efficient
• Implementation Issues of the Stack Filter
  - the simplest filter, with the lowest overhead
  - compares each memory reference address with the thread's stack base and limit addresses
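The base/limit comparison can be sketched as a range check. This is a minimal sketch assuming the filter drops references that fall entirely inside the accessing thread's own stack (so cross-thread stack accesses still reach the detector) and that stacks grow downward; `StackFilter`, `register_thread`, and `should_filter` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class StackBounds:
    """Address range of one thread's stack; stacks grow down, so limit < base."""
    limit: int  # lowest valid stack address
    base: int   # one past the highest stack address


class StackFilter:
    """Hypothetical stack filter: drops a thread's references to its own stack."""

    def __init__(self) -> None:
        self.stacks: Dict[int, StackBounds] = {}

    def register_thread(self, tid: int, limit: int, base: int) -> None:
        self.stacks[tid] = StackBounds(limit, base)

    def should_filter(self, tid: int, addr: int, size: int) -> bool:
        """True if [addr, addr + size) lies entirely inside thread tid's own stack."""
        bounds = self.stacks.get(tid)
        if bounds is None:
            return False  # unknown thread: keep the reference to stay conservative
        return bounds.limit <= addr and addr + size <= bounds.base
```

Two comparisons per reference, with no shared state to update, is consistent with the slide's point that this is the cheapest of the three filters.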
Three Filters (2/3)
• Duplicate Filter
  - keeps the first load and the first store to each variable in each segment (a segment is the sequence of a thread's accesses between synchronization operations)
  - filters duplicate references within a segment
  - can only cause Thread Checker to lose duplicate data races
• Implementation Issues of the Duplicate Filter
  - slower than the stack filter
  - maintains filter tables (shown for threads T1 and T2) whose entries have four fields: address, size, type, and ID
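The filter tables can be sketched as one dictionary per thread. This sketch assumes the ID field holds the segment ID in which the entry was recorded and that type distinguishes loads from stores; `DuplicateFilter`, `new_segment`, and `should_filter` are hypothetical names.

```python
from typing import Dict, Tuple

Key = Tuple[int, int, str]  # (address, size, type), with type "load" or "store"


class DuplicateFilter:
    """Hypothetical duplicate filter: per-thread tables mapping (address, size, type) -> segment ID."""

    def __init__(self) -> None:
        self.tables: Dict[int, Dict[Key, int]] = {}
        self.segment: Dict[int, int] = {}

    def new_segment(self, tid: int) -> None:
        """Call at each synchronization operation: the thread starts a new segment."""
        self.segment[tid] = self.segment.get(tid, 0) + 1

    def should_filter(self, tid: int, addr: int, size: int, kind: str) -> bool:
        seg = self.segment.setdefault(tid, 0)
        table = self.tables.setdefault(tid, {})
        key = (addr, size, kind)
        if table.get(key) == seg:
            return True  # same reference already seen in this segment: a duplicate
        # First load/store of this location in this segment: record the ID and keep it.
        # Entries from older segments fail the ID check and are simply overwritten.
        table[key] = seg
        return False
```

Storing the segment ID in each entry avoids clearing the whole table at every synchronization operation: a stale entry is detected by comparing IDs.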
Three Filters (3/3)
• FSM Filter
  - based on the Eraser state machine
  - filters references to variables in the Private state and in the Shared Read-Only state
  - also filters the initial references that cause the transitions UNINIT → PRIVATE and PRIVATE → SHR RO
[Figure: Eraser state machine (states UNINIT, PRIVATE, SHR RO, SHR RW; edge labels R1, W1, R', W', R, W)]
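The state machine can be sketched per address. This is a simplified sketch that follows the slide's description: it tracks only the four states (Eraser's lockset refinement is omitted), works at whole-address granularity, and treats R1/W1 as accesses by the owning thread and R'/W' as accesses by a different thread. `FSMFilter` and `State` are hypothetical names.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class State(Enum):
    UNINIT = "UNINIT"
    PRIVATE = "PRIVATE"
    SHR_RO = "SHR RO"
    SHR_RW = "SHR RW"


class FSMFilter:
    """Hypothetical FSM filter built on Eraser-style states (no locksets)."""

    def __init__(self) -> None:
        # Per-address state plus the owning thread recorded on the first access.
        self.vars: Dict[int, Tuple[State, Optional[int]]] = {}

    def should_filter(self, tid: int, addr: int, is_write: bool) -> bool:
        state, owner = self.vars.get(addr, (State.UNINIT, None))
        if state is State.UNINIT:
            new, owner = State.PRIVATE, tid  # R1, W1: first access
        elif state is State.PRIVATE:
            if tid == owner:
                new = State.PRIVATE  # R1, W1 by the owner
            else:
                new = State.SHR_RW if is_write else State.SHR_RO  # W' or R'
        elif state is State.SHR_RO:
            new = State.SHR_RW if is_write else State.SHR_RO  # W or R
        else:
            new = State.SHR_RW  # R, W
        self.vars[addr] = (new, owner)
        # Drop references that leave the variable PRIVATE or SHR RO, including
        # the ones causing UNINIT -> PRIVATE and PRIVATE -> SHR RO.
        return new in (State.PRIVATE, State.SHR_RO)
```

Only references that move a variable into, or keep it in, SHR RW reach the detector in this sketch.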
Experimental Setup
• Environment
  - 4-way 2.5 GHz Pentium 4 workstation
  - SPLASH-2 applications
  - each run with 4 threads on 4 processors
• Measurements
  - filtering statistics are collected by running each application three times
  - performance results are collected by running each application nine times
  - each application is run in Thread Checker with and without the three filters
  - the number of data-race bugs reported with and without the filters is compared
Filtering Effectiveness
[Figure: Incremental filtering effectiveness]
[Figure: Different filter combinations]
Performance
[Figure: Speedups obtained with filtering]
Data-race Detection
[Figure: Characterizing the impact of the three filters combined]
Conclusions and Future Work
• Conclusions
  - Intel Thread Checker has a slowdown of 233x on average
  - approach: filter out the vast majority of memory references
  - three filters were developed that together filter 98% of all memory references
  - filtering yields speedups of 3.3x on average
• Future Work
  - improve the FSM filter
  - reduce the other sources of overhead in Thread Checker