
Scalable Dynamic Analysis for Automated Fault Location and Avoidance

Scalable Dynamic Analysis for Automated Fault Location and Avoidance. Rajiv Gupta. Funded by NSF grants from CPA , CSR , & CRI programs and grants from Microsoft Research. Motivation. Software bugs cost the U.S. economy about $59.5 billion each year [ NIST 02 ] . Embedded Systems


Presentation Transcript


  1. Scalable Dynamic Analysis for Automated Fault Location and Avoidance Rajiv Gupta Funded by NSF grants from CPA, CSR, & CRI programs and grants from Microsoft Research

  2. Motivation • Software bugs cost the U.S. economy about $59.5 billion each year [NIST 02]. • Embedded Systems • Mission Critical / Safety Critical Tasks • A failure can lead to Loss of Mission/Life. • (Ariane 5) arithmetic overflow led to shutdown of guidance computer. • (Mars Climate Orbiter) missed unit conversion led to faulty navigation data. • (Mariner I) missing superscripted bar in the specification for the guidance program led to its destruction 293 seconds after launch. • (Mars Pathfinder) priority inversion error causing system reset. • (Boeing 747-400) loss of engine & flight displays while in flight. • (Toyota hybrid Prius) VSC, gasoline-powered engine shut off. • (Therac-25) wrong dosage during radiation therapy. • …….

  3. Overview • Program Execution: Long-running, Multi-threaded • Scalability: Tracing + Logging • Fault • Fault Location: Dynamic Slicing, Offline • Fault Avoidance: Environment Faults, Online

  4. Fault Location Goal: Assist the programmer in debugging by automatically narrowing the fault to a small section of the code. • Dynamic Information • Data dependences • Control dependences • Values • Execution Runs • One failed execution & • Its perturbations

  5. Dynamic Information (figure: Program …… Execution; Dynamic Dependence Graph with Data and Control dependences)

  6. Approach • Detect execution of statement s such that • Faulty code Affects the value computed by s; or • Faulty code is Affected-by the value computed by s through a chain of dependences. • Estimate the set of potentially faulty statements from s: • Affects: statements from which s is reachable in the dynamic dependence graph. (Backward Slice) • Affected-by: statements that are reachable from s in the dynamic dependence graph. (Forward Slice) • Intersect slices to obtain a smaller fault candidate set.
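The slicing and intersection steps above can be sketched in a few lines. This is a minimal illustration, not the tool's implementation: it assumes the dynamic dependence graph is given as a dictionary mapping each statement instance to the instances that depend on it, and the names `reachable`, `invert`, `backward_slice`, `forward_slice`, and `candidate_set` are hypothetical.

```python
from collections import deque


def reachable(graph, start):
    """Return every node reachable from start by following graph edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def invert(graph):
    """Reverse every edge of the graph."""
    inverted = {}
    for src, dsts in graph.items():
        for dst in dsts:
            inverted.setdefault(dst, set()).add(src)
    return inverted


def forward_slice(dep_graph, s):
    # Affected-by: everything reachable from s along dependence edges.
    return reachable(dep_graph, s)


def backward_slice(dep_graph, s):
    # Affects: everything from which s is reachable.
    return reachable(invert(dep_graph), s)


def candidate_set(dep_graph, forward_from, backward_from):
    """Intersect a forward slice and a backward slice."""
    return forward_slice(dep_graph, forward_from) & backward_slice(dep_graph, backward_from)
```

For example, with `{"in": {"a", "b"}, "a": {"out"}, "b": {"c"}, "k": {"out"}}`, intersecting the forward slice of `"in"` with the backward slice of `"out"` leaves only `"in"`, `"a"`, and `"out"`; the nodes `"b"`, `"c"`, and `"k"` are pruned.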

  7. Backward & Forward Slices • Backward Slice [Korel&Laski,1988]: from the Erroneous Output • Forward Slice [ASE-05]: from the Failure-inducing Input

  8. Backward & Forward Slices [ASE-05] • Failure-inducing Input, Erroneous Output • For memory bugs the number of statements is very small (< 5).

  9. Bidirectional Slices [ICSE-06] • Critical Predicate (CP): An execution instance of a predicate such that changing its outcome “repairs” the program state. • Bidirectional Slice: Backward Slice of CP + Forward Slice of CP • Combined Slice • Found critical predicates in 12 out of 15 bugs • Search for critical predicate: • Brute force: 32 predicates to 155K predicates; • After Filtering and Ordering: 1 to 7K predicates.
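The brute-force search for a critical predicate can be illustrated on a toy program. This is a sketch under stated assumptions: the program `keep_non_negative`, its bug, and the helper `find_critical_predicate` are hypothetical, and "repairs the program state" is approximated by "produces the expected output". The filtering and ordering mentioned on the slide are not modeled.

```python
def keep_non_negative(values, flip_at=None):
    """Toy faulty program. flip_at inverts the outcome of the flip_at-th
    executed predicate instance (1-based); None means a normal run."""
    executed = 0

    def pred(cond):
        nonlocal executed
        executed += 1
        return (not cond) if executed == flip_at else cond

    kept = []
    for v in values:
        if pred(v > 0):  # hypothetical bug: v >= 0 was intended
            kept.append(v)
    return kept, executed


def find_critical_predicate(program, inp, expected):
    """Switch each predicate instance in turn and re-execute; return the
    first instance whose switched outcome yields the expected output."""
    _, instances = program(inp)
    for k in range(1, instances + 1):
        out, _ = program(inp, flip_at=k)
        if out == expected:
            return k
    return None
```

On input `[3, 0, -1]` with expected output `[3, 0]`, only switching the second predicate instance (the test on `0`) produces the expected output, so that instance is reported. Each candidate costs one re-execution, which is why the number of candidates matters.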

  10. Pruning Slices [PLDI-06] • Confidence in v, C(v): [0,1] • 1 - any change in v will change • 0 - all values of v produce same • How? Value profiles.

  11. Test Programs • Real Reported Bugs • Nine logical bugs (incorrect output) • Unix utilities • grep 2.5, grep 2.5.1, flex 2.5.31, make 3.80. • Six memory bugs (program crashes) • Unix utilities • gzip, ncompress, polymorph, tar, bc, tidy. • Injected Bugs • Siemens Suite (numerous versions) • schedule, schedule2, replace, print_tokens.. • Unix utilities • gzip, flex

  12. Dynamic Slice Sizes

  13. Combined Slices

  14. Evaluation of Pruning • Siemens Suite • A single error is injected in each version. • Not all versions are included. Excluded are versions where: • there is no output, or the very first output is wrong; • the root cause is not contained in the BS (code missing error).

  15. Evaluation of Pruning

  16. Effectiveness • Inputs: Failure inducing input, Erroneous output, Critical predicate, Confidence Analysis • Backward Slice [AADEBUG-05]: ≈ 31% of Executed Statements • Combined Slice [ASE-05, ICSE-06], Pruned Slice [PLDI-06]: ≈ 36% of Backward Slice (≈ 11% of Exec.), ≈ 41% of Backward Slice (≈ 13% of Exec.)

  17. Effectiveness • Slicing is effective in locating faults. • No more than 10 static statements had to be inspected.

  18. Execution Omission Errors [PLDI-07] • Inspect pruned slice. • Dynamically detect an implicit dependence. • Incrementally expand the pruned slice. (Figure: statements X=, A=, predicate A<0, and use =X, with an implicit dependence.)

  19. Scalability of Tracing • Dynamic Information Needed • Dynamic Dependences: for all slicing • Values for Confidence Analysis: for pruning slices • Whole Execution Trace (WET): annotates the static program representation • Trace Size: ≈ 15 Bytes / Instruction

  20. Trace Sizes & Collection Overheads • Trace sizes are very large even for 10 s of execution.

  21. Compacting Whole Execution Traces • Explicitly remember dynamic control flow trace. • Infer as many dynamic dependences as possible from control flow (≈ 94%), remember the remaining dependences explicitly (≈ 6%). • Specialized graph representation to enable inference. • Explicitly remember value trace. • Use context-based method to compress dynamic control flow, value, and address trace. • Bidirectional traversal with equal ease [MICRO-04, TACO-05]

  22. Dependence Graph Representation • Program:
         1: z=0
         2: a=0
         3: b=2
         4: p=&b
         5: for i = 1 to N do
         6:   if (i%2 == 0) then
         7:     p=&a
              endif
         8:   a=a+1
         9:   z=2*(*p)
            endfor
        10: print(z)
      • Execution on input N=2 (statement^instance): 1^1: z=0, 2^1: a=0, 3^1: b=2, 4^1: p=&b, 5^1: for i = 1 to N do, 6^1: if (i%2 == 0) then, 8^1: a=a+1, 9^1: z=2*(*p), 5^2: for i = 1 to N do, 6^2: if (i%2 == 0) then, 7^1: p=&a, 8^2: a=a+1, 9^2: z=2*(*p), 10^1: print(z)

  23. Dependence Graph Representation (figure: static statements 1 to 10 as nodes, executed at timestamps 1 to 14 on input N=2 in the order 1^1, 2^1, 3^1, 4^1, 5^1, 6^1, 8^1, 9^1, 5^2, 6^2, 7^1, 8^2, 9^2, 10^1; edges carry timestamp-pair labels <2,7>, <3,8>, <4,8>, <5,6><9,10>, <10,11>, <5,7><9,12>, <7,12>, <11,13>, <5,8><9,13>, <12,13>, <13,14>; T and F mark branch outcomes)
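The data dependence labels in this example can be reproduced by replaying the ten-statement program on N=2 and pairing each use with the timestamp of the most recent definition of the same location. The sketch below is illustrative only: `trace_example` and `data_dependences` are hypothetical names, the loop variable `i` is deliberately not tracked, and control dependences are not computed.

```python
def trace_example(n=2):
    """Run the example program and record (timestamp, stmt, defs, uses)."""
    events = []
    mem = {}

    def step(stmt, defs=(), uses=()):
        events.append((len(events) + 1, stmt, tuple(defs), tuple(uses)))

    mem["z"] = 0
    step(1, defs=["z"])
    mem["a"] = 0
    step(2, defs=["a"])
    mem["b"] = 2
    step(3, defs=["b"])
    p = "b"  # p = &b, modeled as the name of the pointed-to variable
    step(4, defs=["p"])
    for i in range(1, n + 1):
        step(5)
        step(6)
        if i % 2 == 0:
            p = "a"  # p = &a
            step(7, defs=["p"])
        mem["a"] += 1
        step(8, defs=["a"], uses=["a"])
        mem["z"] = 2 * mem[p]  # uses p and the location p points to
        step(9, defs=["z"], uses=["p", p])
    step(10, uses=["z"])
    return events, mem["z"]


def data_dependences(events):
    """Return sorted (def_timestamp, use_timestamp) pairs."""
    last_def = {}
    edges = []
    for ts, _stmt, defs, uses in events:
        for loc in uses:  # uses are resolved before this step's own defs
            if loc in last_def:
                edges.append((last_def[loc], ts))
        for loc in defs:
            last_def[loc] = ts
    return sorted(edges)
```

The pairs this produces, (2,7), (3,8), (4,8), (7,12), (11,13), (12,13), and (13,14), match the data dependence labels listed for the figure. The dereference `*p` is what makes the second execution of statement 9 depend on `a` rather than `b`.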

  24. Transform: Traces of Blocks (figure: timestamp traces regrouped per basic block)

  25. Infer: Local Dependence Labels (figure: X =, Y= X, =Y with edge labels (10,10), (20,20), (30,30), (20,21), (...,20); block timestamps 10,20,30 and 21)

  26. Transform: Local Dep. Labels (figure: X =, *P =, Y= X with edge labels (10,10), (20,20); block timestamps 10,20)

  27. Transform: Local Dep. Labels (figure: X =, *P =, Y= X, =Y with edge labels (10,10), (20,20), (10,11), (20,21); block timestamps 10, 20, 10,20 and 11,21)

  28. Group: Non-Local Dep. Edges (figure: X =, Y = and = Y, = X with edge labels (10,21), (20,11); block timestamps 10, 20 and 11,21)

  29. Compacted WET Sizes ≈ 4 Bits / Instruction

  30. Slicing Times [PLDI-04] vs. [ICSE-03]

  31. Dep. Graph Generation Times • Offline post-processing after collecting address and control flow traces • ≈ 35x of execution time • Online techniques [ICSM 2007] • Information Flow: 9x to 18x slowdown • Basic block Opt.: 6x to 10x slowdown • Trace level Opt.: 5.5x to 7.5x slowdown • Dual Core: ≈ 1.5x slowdown • Online Filtering techniques • Forward slice of all inputs • User-guided bypassing of functions

  32. Reducing Online Overhead • Record non-deterministic events online • Less than 2x overhead • Deterministic replay of executions • Trace faulty executions off-line • Replay the execution • Switch on tracing • Collect and inspect traces • Trace analysis is still a problem • The traces correspond to huge executions • Off-line overhead of trace collection is still significant

  33. Reducing Trace Sizes • Checkpointing Schemes • Trace from the most recent checkpoint • Checkpoints are of the order of minutes. • Better, but the trace sizes are still very large. • Exploiting Program Characteristics • Multithreaded and server-like [ISSTA-07, FSE-06] • Examples: mysql, apache. • Each request spawns a new thread. • Do not trace irrelevant threads.

  34. Beyond Tracing • Checkpoint: capture memory image. • Execute and Record (log) Events. [ISSTA-07] • Upon Crash, Rollback to checkpoint. • Reduce log and Replay execution using reduced log. • Turn on tracing during replay. • Applicable to Multithreaded Programs (Figure: timeline from Checkpoint to crash x, showing log, Reduced log, and Trace.)
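The checkpoint, log, rollback, and replay cycle can be sketched with a toy event handler. Everything here is an assumption made for illustration: the `handle` function, the state layout, and the crash condition are hypothetical, a deep copy stands in for the memory image, and the log reduction step is left out.

```python
import copy


def handle(state, event, trace=None):
    """Toy server step; crashes on 'load' when no database is selected."""
    kind, arg = event
    if trace is not None:  # tracing is only switched on during replay
        trace.append((kind, arg, state["db"]))
    if kind == "use":
        state["db"] = arg
    elif kind == "load":
        if state["db"] is None:
            raise RuntimeError("load with no database selected")
        state["loads"] += 1


def run_with_recovery(state, events):
    """Log events while running untraced; on a crash, roll back to the
    checkpoint and replay the log with tracing turned on."""
    checkpoint = copy.deepcopy(state)  # stand-in for a memory image
    log = []
    try:
        for event in events:
            log.append(event)  # record before handling
            handle(state, event)
    except RuntimeError:
        replay_state = copy.deepcopy(checkpoint)  # rollback
        trace = []
        try:
            for event in log:
                handle(replay_state, event, trace)
        except RuntimeError:
            pass  # the crash is reproduced during replay
        return log, trace
    return log, None
```

The normal run pays only for appending to the log. The per-event trace is collected solely during the replay of the failing execution, which is the point of deferring tracing.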

  35. An Example • A mysql bug • “load …” command will crash the server if database is not specified. • Without typing “use database_name”, thd->db is Null.

  36. Example – Execution and log file (timeline figure; Blue – T0, Red – T1, Green – T2, Gray – Scheduler) • Run mysql server; open path=/etc/my.cnf …; Wait for connection • User 1 connects to the server: Create Thread 1; Wait for command • User 2 connects to the server: Create Thread 2; Wait for command • User 1: “show databases”: Recv “show databases”; Handle command • User 2: “use test”, “select * from b”: Recv “use test; select * from b”; Handle command • User 1: “load data into table1”: Recv “load data …”; Handle -- (server crashes)

  37. Execution Replay using Reduced log (timeline figure) • Server side: Run mysql server; open path=/etc/my.cnf …; Wait for connection; User 1 connects to the server; Create Thread 1; Recv “load data …”; Handle -- (server crashes) • User side: User 2 connects to the server; User 1: “show databases”; User 2: “show databases”, “select * from b”; User 1: “load data into table1”

  38. Execution Reduction • Effects of Reduction • Irrelevant Threads • Replay-only vs. Replay & Trace • How? By identifying Inter-thread Dependences • Event Dependences - found using the log • File Dependences - found using the log • Shared-Memory Dependences - found using replay • Naïve approach requires thread id of last writer of each address • Space and time efficient detection • Memory Regions: Non-shared vs shared • Locality of References to Regions • Space requirement reduced by 4x • Time requirement reduced by 2x
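The idea of dropping irrelevant threads can be sketched with the naive last-writer scheme mentioned above. This is a simplified illustration and not the space- and time-efficient detection the slide refers to: the log format, the thread-level granularity, and the name `reduce_log` are assumptions.

```python
def reduce_log(log, failing_thread):
    """log: list of (thread, kind, resource), kind being "read" or "write".
    Keep only events of threads the failing thread transitively depends on."""
    last_writer = {}  # resource -> thread id of its last writer (naive scheme)
    depends_on = {}   # thread -> threads whose writes it has read
    for thread, kind, resource in log:
        if kind == "read":
            writer = last_writer.get(resource)
            if writer is not None and writer != thread:
                depends_on.setdefault(thread, set()).add(writer)
        else:
            last_writer[resource] = thread

    relevant = {failing_thread}
    worklist = [failing_thread]
    while worklist:
        thread = worklist.pop()
        for dep in depends_on.get(thread, ()):
            if dep not in relevant:
                relevant.add(dep)
                worklist.append(dep)
    return [event for event in log if event[0] in relevant]
```

If thread T0 writes a configuration value that T1 reads, and T1 later crashes, the reduced log keeps T0 and T1. A thread T2 that only reads shared data and writes its own private data is dropped.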

  39. Experimental Results

  40. Experimental Results (table: Trace Sizes and Num. of dependences per Program-bug, Original vs. Optimized)

  41. Experimental Results (table: Execution Times in seconds per Program-bug: Orig., OPT., Logging)

  42. Debugging System (figure: components Static Binary Analyzer (Diablo), Execution Engine (Valgrind), Record / Replay (Jockey), Slicing Module; other labels: Application binary, Control Dependence, Instrument code, Checkpoint + log, Reduced Log, Compressed Trace, WET, Traces, Slices, Input, Output)

  43. Fault Avoidance • A large number of faults in server programs are caused by the environment. • 56% of faults in Apache server. • Types of Faults Handled • Atomicity Violation Faults. • Try alternate scheduling decisions. • Heap Buffer Overflow Faults. • Pad memory requests. • Bad User Request Faults. • Drop bad requests. • Avoidance Strategy • Recover first time, Prevent later. • Record the change that avoided the fault.
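The "recover first time, prevent later" strategy can be illustrated for the padding case. This is a toy model under stated assumptions: a Python `IndexError` stands in for a detected heap buffer overflow, the allocation site is an arbitrary key, and `fill_buffer`, `serve`, and the `patches` dictionary are hypothetical names.

```python
def fill_buffer(size, data, pad=0):
    """Toy allocation: writing past the end raises IndexError, which
    stands in for a detected heap buffer overflow."""
    buf = [0] * (size + pad)
    for i, value in enumerate(data):
        buf[i] = value
    return buf[:len(data)]


def serve(site, size, data, patches, max_pad=64):
    """Retry with growing padding on failure and record the padding that
    avoided the fault, so later requests at this site start with it."""
    pad = patches.get(site, 0)
    while True:
        try:
            result = fill_buffer(size, data, pad)
        except IndexError:
            pad = 1 if pad == 0 else pad * 2  # alter the environment
            if pad > max_pad:
                raise  # give up: padding did not avoid the fault
            continue
        if pad:
            patches[site] = pad  # record the change that worked
        return result
```

The first faulty request pays for the retries. Once the working padding is stored in `patches`, the same request succeeds on the first attempt.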

  44. Experiments

  45. Summary • Program Execution: Long-running, Multi-threaded • Scalability: Tracing + Logging • Fault • Fault Location: Dynamic Slicing, Offline • Fault Avoidance: Environment Faults, Online

  46. Dissertations Xiangyu Zhang, Purdue University • Fault Location Via Precise Dynamic Slicing, 2006. SIGPLAN Outstanding Doctoral Dissertation Award Sriraman Tallam, Google • Fault Location and Avoidance in Long-Running Multithreaded Programs, 2007.

  47. Fault Location via State Alteration CS 206 Fall 2011

  48. Value Replacement: Overview • Aggressive state alteration to locate faulty program statements [Jeffrey et al., ISSTA 2008] • INPUT: Faulty program and test suite (1+ failing runs) • TASK: (1) Perform value replacements in failing runs (2) Rank program statements according to collected information • OUTPUT: Ranked list of program statements

  49. Alter State by Replacing Values (figure: Passing Execution yields Correct Output; Failing Execution reaches an ERROR and yields Incorrect Output; Failing Execution: Altered State applies REPLACE VALUES at the ERROR and the output is checked: Correct? / Incorrect?)

  50. Example of a Value Replacement • PASSING EXECUTION: (output: ?)
        1: read (x, y);
        2: a := x - y;
        3: if (x < y)
        4:   write (a);
           else
        5:   write (a + 1);
      (Figure values: 1 1 1 0 1 1 1 1 1 (F) 2 3 0 5)
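Value replacement on the five-statement program above can be sketched as follows. Several things are assumptions made only for this illustration: the fault (statement 2 was meant to compute `x + y`), the test suite and its expected outputs, and the ranking rule, which simply counts the failing runs in which some replacement at a statement yields the expected output. Replacement values are drawn from values observed at the same statement in other runs.

```python
def program(x, y, override=None):
    """The example program; override maps a statement number to a value
    that replaces whatever that statement computes."""
    override = override or {}
    seen = {}

    def val(stmt, value):
        value = override.get(stmt, value)
        seen[stmt] = value
        return value

    a = val(2, x - y)  # hypothetical fault: x + y was intended
    if val(3, x < y):
        out = val(4, a)
    else:
        out = val(5, a + 1)
    return out, seen


def rank_statements(tests):
    """tests: list of ((x, y), expected_output). Returns statements ranked
    by the number of failing runs a replacement there turns into passes."""
    profile = {}  # statement -> values observed across the whole suite
    for (x, y), _ in tests:
        _, seen = program(x, y)
        for stmt, value in seen.items():
            profile.setdefault(stmt, set()).add(value)

    score = {stmt: 0 for stmt in profile}
    for (x, y), expected in tests:
        out, seen = program(x, y)
        if out == expected:
            continue  # only failing runs are altered
        for stmt, original in seen.items():
            alternates = profile[stmt] - {original}
            if any(program(x, y, {stmt: alt})[0] == expected for alt in alternates):
                score[stmt] += 1
    return sorted(score, key=lambda s: (-score[s], s))
```

With the suite `[((1, 1), 3), ((3, 0), 4), ((0, 2), 2), ((2, 0), 3)]`, two runs fail. A replacement at statement 2 repairs both, a replacement at statement 5 repairs one, and the ranking places statement 2 first.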
