MUVI: Automatically Inferring Multi-Variable Access Correlations and Detecting Related Semantic and Concurrency Bugs. Shan Lu (shanlu@cs.uiuc.edu). Shan Lu, Soyeon Park, Chongfeng Hu, Xiao Ma, Weihang Jiang, Zhenmin Li, Raluca A. Popa, and Yuanyuan Zhou, University of Illinois
Bugs are bad! • Software bugs are costly! • Account for 40% of system failures [Marcus2000] • Cost US economy $59.5 billion annually [NIST] • Techniques to improve program correctness are desired
Software bug categories • Memory bugs • Improper memory accesses and usage • Well studied, with effective detection tools • Semantic bugs • Violations of design requirements or programmer intentions • Largest share (~80%*) of software bugs • No silver bullet • Concurrency bugs • Incorrect synchronization in concurrent execution • Increasingly important as concurrent programs become pervasive • Hard to detect *Have Things Changed Now? An Empirical Study of Bug Characteristics in Modern Open Source Software [ASID’06]
An important type of semantic information • Software programs contain many variables • Variables are NOT isolated • Semantic bonds exist among variables • Correct programs access correlated variables consistently [Figure: variable access correlation among variables t, y, x, z, u, s, v, w]
Variable correlation in programs
• Semantic correlation widely exists among variables
• Constraint specification (MySQL): class THD { … char* db; int db_length; }
• Different representation (Linux): struct net_device_stats { … long rv_packets; long rv_bytes; }
• Different aspects (Linux): struct fb_var_screeninfo { … int red_msb; int blue_msb; int green_msb; int transp_msb; }
• Implementation-demand (MySQL): struct st_test_file * cur_file; struct st_test_file * file_stack;
Variable access correlation (constraint)
• Maintaining correlation usually needs consistent access
• write(db) ⇒ access*(db_length)
• write(rv_packets) ⇒ write(rv_bytes)
• access(red/…/transp) ⇒ access(red/…/transp)
• write(file_stack) ⇒ write(cur_file)
• General form: A1(x) ⇒ A2(y), where A1 and A2 are each one of access, read, or write
*access: read or write
Violating the correlations leads to bugs: inconsistent-update bugs
• Programmers may forget to access correlated variables
• A type of semantic bug not handled by previous tools
• Mostly consistent access to correlated variables: correct
• Inconsistent access: BUG!
• [Figure: an inconsistent-update bug, confirmed by Linux developers]
• More examples of inconsistent-update bugs are in our paper
Violating the correlations leads to bugs (ii): multi-variable concurrency bugs
• Programmers may forget to synchronize concurrent accesses to correlated variables
• This is NOT a traditional data race bug
• The bug occurs even if accesses to each single variable are well synchronized
Example (Mozilla):
struct JSCache { … JSEntry table[SIZE]; bool empty; }
Thread 1: js_FlushPropertyCache ( … ) { memset ( cachetable, 0, SIZE ); … cacheempty = TRUE; }
Thread 2: js_PropertyCacheFill ( … ) { cachetable[indx] = obj; … cacheempty = FALSE; }
• In both threads, lock ( T ) / unlock ( T ) surround the table access and lock ( E ) / unlock ( E ) surround the empty-flag access
• BUG: the two threads can still interleave between the table update and the flag update, leaving the table and the flag inconsistent
Our contribution • A technique to automatically infer variable access correlation • Bug detection based on variable access correlation • Inconsistent-update semantic bugs • Multi-variable concurrency bugs • Uncovered correlations and new bugs in real-world applications (Linux device drivers, Mozilla, MySQL, PostgreSQL) • > 6000 variable correlations • 39 new inconsistent-update semantic bugs • 4 new multi-variable concurrency bugs from Mozilla
Outline • Motivation • What is variable access correlation • MUVI variable access correlation inference • MUVI bug detection • Inconsistent-update semantic bug detection • Multi-variable concurrency bug detection • Evaluation • Conclusions
Basic idea of correlation inference
• Our target: access correlation A1(x) ⇒ A2(y)
• Our inference method: statistically infer access correlation based on variable access patterns in source code
  • Assumption: mature program, mostly correct
  • x and y appear together many times
  • x and y seldom appear separately
• How to judge "together"?
  • Our metric: static code distance within a function scope
  • Our paper discusses other potential metrics
• How to do this efficiently?
Frequent itemset mining
• A common data mining technique
• Itemset: a set of items (no order)
  • E.g. (v, w, x, y, z)
• Sub-itemset:
  • E.g. (w, y)
• Goal: find frequent sub-itemsets in an itemset database
• Support: number of appearances
  • E.g. support of (w, y) is 3
• Frequent: support > threshold
• Example itemset database: (v, w, x, y, z), (v, w, y, z, s), (v, w, y, t), (v, x, m, n)
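To make the support definition concrete, here is a minimal Python sketch that counts support for fixed-size sub-itemsets over the example database from this slide. It is a brute-force enumeration for illustration only, not the mining algorithm MUVI uses; the function name `frequent_subsets` and its parameters are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_subsets(database, size, threshold):
    """Return {sub-itemset: support} for sub-itemsets with support > threshold.

    Brute force: enumerates every sub-itemset of the given size in every
    itemset. Real miners avoid this enumeration; this is only a sketch.
    """
    support = Counter()
    for itemset in database:
        # Sort so that each unordered sub-itemset has one canonical tuple.
        for sub in combinations(sorted(set(itemset)), size):
            support[sub] += 1
    return {sub: n for sub, n in support.items() if n > threshold}


# The itemset database shown on the slide.
database = [
    ("v", "w", "x", "y", "z"),
    ("v", "w", "y", "z", "s"),
    ("v", "w", "y", "t"),
    ("v", "x", "m", "n"),
]

frequent_pairs = frequent_subsets(database, size=2, threshold=2)
print(frequent_pairs[("w", "y")])  # support of (w, y)
```

With a threshold of 2, only pairs that appear in at least three itemsets survive, which includes (w, y) from the slide's example.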
Flowchart of variable correlation inference • Source files → Pre-processing (how?) → Itemset database → Mining → Frequent variable sets → Post-processing (how?) → Variable access correlation
MUVI inference algorithm (pre-process) • Program source code → ? → itemset database • What is an item? • A variable • What is an itemset? • A function • What to put into an itemset? • Accessed variables • Access type (read/write)
MUVI inference algorithm (pre-process)
• Input: program
• Output: an itemset database
• Flow-insensitive, inter-procedural analysis
• Consider global variables and structure-typed variables
• Also consider variables accessed in callee functions
Example program: int x; f1 ( ) { read x; } f2 ( ) { S t; write t.y; } int z; f3 ( ) { read z; f1 ( ); f2 ( ); }
Resulting database:
• f1: {read, x}
• f2: {write, S::y}
• f3: {read, z}, {read, x} (from f1), {write, S::y} (from f2)
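The pre-processing step can be sketched in Python as follows, assuming the per-function direct accesses and the call graph have already been extracted from the source. The names `reachable` and `build_itemset_database` are hypothetical, and following callees transitively is an assumption; the slide only says that callee accesses are considered.

```python
def reachable(func, calls):
    """Return func plus every function reachable from it in the call graph."""
    seen, stack = set(), [func]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(calls.get(current, ()))
    return seen


def build_itemset_database(direct_accesses, calls):
    """Map each function to its own accesses plus those of its callees.

    direct_accesses: {function: set of (access_type, variable)}
    calls: {function: list of callee names}
    Assumption: callees are followed transitively.
    """
    database = {}
    for func in direct_accesses:
        items = set()
        for callee in reachable(func, calls):
            items |= direct_accesses.get(callee, set())
        database[func] = items
    return database


# The example program from the slide.
direct_accesses = {
    "f1": {("read", "x")},
    "f2": {("write", "S::y")},
    "f3": {("read", "z")},
}
calls = {"f3": ["f1", "f2"]}

database = build_itemset_database(direct_accesses, calls)
print(sorted(database["f3"]))
```

Because the analysis is flow-insensitive, each itemset is an unordered set; the order of accesses inside a function is deliberately discarded.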
MUVI inference algorithm (post-process) • Input: frequent variable sets (x, y), which appear together in many functions • Pruning • What if x and y appear separately many times? • Prune out low-confidence (conditional probability) pairs • What if x is too popular, e.g. stderr, stdout? • Categorize based on access type • write(x) ⇒ write(y)? Or write(x) ⇒ read(y)? etc. • Output: variable correlation A1(x) ⇒ A2(y)
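The confidence-based pruning above can be sketched in Python as follows. Confidence of A1(x) ⇒ A2(y) is taken here as the fraction of functions containing A1(x) that also contain A2(y); treating each (access type, variable) pair as one item is a simplification that folds the access-type categorization into the counting. The function name `infer_correlations` and both thresholds are hypothetical, and pruning of overly popular variables is not modeled.

```python
from collections import Counter
from itertools import permutations


def infer_correlations(database, min_support, min_confidence):
    """Return {(item_a, item_b): (support, confidence)} for kept correlations.

    database: {function: set of (access_type, variable)}
    A correlation item_a => item_b is kept when the two items co-occur in at
    least min_support functions and confidence is at least min_confidence.
    """
    item_support = Counter()
    pair_support = Counter()
    for items in database.values():
        item_support.update(items)
        for a, b in permutations(items, 2):
            if a[1] != b[1]:  # only correlate accesses to different variables
                pair_support[(a, b)] += 1

    kept = {}
    for (a, b), together in pair_support.items():
        confidence = together / item_support[a]
        if together >= min_support and confidence >= min_confidence:
            kept[(a, b)] = (together, confidence)
    return kept


# Hypothetical access sets: x and y are written together in three functions.
database = {
    "g1": {("write", "x"), ("write", "y")},
    "g2": {("write", "x"), ("write", "y")},
    "g3": {("write", "x"), ("write", "y")},
    "g4": {("write", "x")},
    "g5": {("write", "y"), ("read", "z")},
}

correlations = infer_correlations(database, min_support=3, min_confidence=0.7)
for (a, b), (support, confidence) in sorted(correlations.items()):
    print(a, "=>", b, support, confidence)
```

Confidence is directional, which is why the sketch iterates over ordered pairs: write(x) ⇒ write(y) and write(y) ⇒ write(x) can pass or fail independently.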
Outline • Motivation • MUVI variable access correlation inference • MUVI bug detection • Inconsistent-update semantic bug detection • Multi-variable concurrency bug detection • Evaluation • Conclusions
Inconsistent-update bug detection • Step 1: get all write(x) ⇒ acc(y) correlations • Step 2: get all violations of the above correlations • Step 3: prune out unlikely bugs • Code analysis to check caller and callee functions • Example: write(fb_var_screeninfo::blue_msb) ⇒ access(fb_var_screeninfo::transp_msb), #support = 11, #violation = 1 (function neofb_check_var): an inconsistent-update bug
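Steps 1 and 2 can be sketched in Python for a single write(x) ⇒ acc(y) correlation, assuming an itemset database that maps each function to its (access type, variable) pairs. The function name `check_write_correlation` and the eleven supporting function names are hypothetical; only neofb_check_var and the support and violation counts come from the slide. Step 3 (pruning through caller and callee analysis) is not modeled.

```python
def check_write_correlation(database, x, y):
    """Return (support, violations) for the correlation write(x) => acc(y).

    support: number of functions that write x and also access y
    violations: sorted names of functions that write x without accessing y
    """
    support = 0
    violations = []
    for func, items in database.items():
        if ("write", x) not in items:
            continue
        if any(variable == y for _, variable in items):
            support += 1
        else:
            violations.append(func)
    return support, sorted(violations)


BLUE = "fb_var_screeninfo::blue_msb"
TRANSP = "fb_var_screeninfo::transp_msb"

# Eleven hypothetical functions that update both fields consistently.
database = {
    f"hypothetical_fb_func_{i}": {("write", BLUE), ("write", TRANSP)}
    for i in range(11)
}
# The violating function named on the slide; its access set is illustrative.
database["neofb_check_var"] = {("write", BLUE)}

support, violations = check_write_correlation(database, BLUE, TRANSP)
print(support, violations)
```

A high support count combined with very few violations is what makes a violation suspicious; a correlation violated in many functions would more likely be a wrong correlation than a bug.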
Multi-variable concurrency bug detection: MUVI lock-set algorithm • Original lock-set algorithm • Look for common locks among conflicting accesses to each shared variable • MV lock-set algorithm • Look for common locks among conflicting accesses to each shared variable and their correlated accesses
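The difference between the two checks can be sketched in Python on a simplified access trace. This is a heavily reduced illustration, assuming each access is recorded with the set of locks held at that moment; it ignores the read/write distinction and the state machine of a full lock-set detector. The function name `lockset_violations`, the trace format, and the lock names are hypothetical, modeled on the Mozilla cache example.

```python
def lockset_violations(accesses, groups):
    """Report variable groups with no lock common to all of their accesses.

    accesses: list of (thread, variable, frozenset of locks held)
    groups: list of variable sets; a one-variable set gives the original
            per-variable check, a correlated set gives the multi-variable one
    """
    reports = []
    for group in groups:
        relevant = [a for a in accesses if a[1] in group]
        threads = {thread for thread, _, _ in relevant}
        if len(threads) < 2:
            continue  # not shared between threads in this trace
        common = frozenset.intersection(*(locks for _, _, locks in relevant))
        if not common:
            reports.append(frozenset(group))
    return reports


T, E = frozenset({"T"}), frozenset({"E"})
accesses = [
    (1, "cachetable", T), (1, "cacheempty", E),  # flush thread
    (2, "cachetable", T), (2, "cacheempty", E),  # fill thread
]

per_variable = lockset_violations(accesses, [{"cachetable"}, {"cacheempty"}])
multi_variable = lockset_violations(accesses, [{"cachetable", "cacheempty"}])
print(per_variable, multi_variable)
```

Each variable on its own is consistently protected, so the per-variable check reports nothing, while the correlated pair has no common lock and is flagged.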
Multi-variable concurrency bug detection: other MUVI extension algorithms • MUVI happens-before algorithm • Original: check the happens-before relation among conflicting accesses to each single variable • MUVI extension: check the happens-before relation among conflicting accesses to each single variable and its correlated accesses • Other extensions • Extending hybrid race detection • Extending atomicity violation bug detection
Outline • Motivation • MUVI variable access correlation inference • MUVI bug detection • Inconsistent-update semantic bug detection • Multi-variable concurrency bug detection • Evaluation • Conclusions
Methodology • For variable correlation and inconsistent-update bug detection: • Linux (device drivers) • Mozilla • MySQL • PostgreSQL • All latest versions • For multi-variable concurrency bug detection: • Five existing real bugs from Mozilla and MySQL • Found four new multi-variable concurrency bugs during the detection process
Results on correlation inference • Macro, inline functions • coincidence
Inconsistent-update bug detection results • Semantic exceptions • Wrong correlations • No future read access
Multi-variable concurrency bug detection results • MV-Happens-Before has similar results • Variables are conditionally correlated • The correlation is missed by MUVI
Multi-variable concurrency bug detection results • 4 new multi-variable concurrency bugs detected! Wrong result!
Conclusion • Variable access correlations can be inferred • Variable access correlation is important • Helps detect two types of bugs • Other uses • Provide specifications to ease programming • Provide hints for assigning locks or TMs • E.g. AtomicSet, AutoLocker, Colorama
Related work • Program specification inference • [ErnstICSE00], [EnglerSOSP01], [KremenekOSDI06], [LiblitPLDI03], [WhaleyISSTA02], [YangICSE06], etc. • Code pattern mining • [LiOSDI04], [LiFSE05], [LivshitsFSE05], etc. • Concurrency bug detection • [ChoiPLDI02], [EnglerSOSP03], [FlanaganPOPL04], [SavageTOCS97], [Praun01], [XuPLDI05], [YuSOSP05], etc. • Techniques for easing concurrent programming • [Harris03], [HerlihyISCA93], [McCloskeyPOPL06], [Rajwar02], [Hammond04], [Moore06], [Rossbach07], etc.
Acknowledgement • Prof. Stefan Savage (shepherd) • Anonymous reviewers • Prof. Liviu Iftode • GOOGLE student travel grant • NSF, DOE, Intel research grants
Thanks! http://opera.cs.uiuc.edu