Mining Function Usage Patterns to Find Bugs



Presentation Transcript


  1. Mining Function Usage Patterns to Find Bugs
     Chadd Williams

  2. Thesis
     We can discover important properties by looking at source code changes.
     • Source code is full of interesting properties
       • a property describes how the source code is written
       • a rule that one must adhere to for the code to work correctly
         • what to do with values from a function
         • how to use an API
     • Can we find the properties?
       • every change is committed
       • changes highlight misunderstood code
     • Can we use these rules to help the developer find bugs?

     Example:
         open(f)
         tmp = cnt = 0
         while(cnt < sz & tmp != -1)
             tmp = read(f,sz)
             if(tmp != -1) cnt += tmp
         close(f)

  3. Why?
     • We wrote the code, we know the rules!
     • Implicit rules build up over time
       • little or no documentation
       • failure to understand implicit rules causes bugs
       • 32% of bugs detected during maintenance [1]
     • How much do you know about your 10 year old code base?
       • Didn’t someone rewrite the matrix objects?
       • What about that third party library?

     [1] Matsumura, T., Monden, A., Matsumoto, K., The Detection of Faulty Code Violating Implicit Coding Rules, IWPSE ’02

  4. Static Analysis
     • Analysis of code without execution
       • examine the source code only
     • Many successful static analysis tools check for violations of system specific rules
       • how to use an internal API
       • specialized lock/unlock functionality
       • data validation requirements
     • Often produces many false warnings
       • can historical information improve this?

  5. General Technique
     • Inspect each commit to each file
     • Identify properties in each version
     • Compare sets of properties to determine new instances of properties
     • Identify commonly added properties

     Before the commit:
         …
         value = foo();
         newPosition += value;
         …

     After the commit:
         …
         value = foo();
         if( value != error_code)
         {
             newPosition += value;
         }
         …
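The comparison step on this slide can be sketched as a set difference between consecutive revisions. This is only an illustration: the property labels (`call:foo`, `checked:foo`) and the `history` list are hypothetical, and the slides do not say how the actual tool represents properties.

```python
from collections import Counter

def newly_added(before, after):
    """Properties present after a commit but absent in the prior revision."""
    return after - before

def commonly_added(history):
    """Count how often each property shows up as newly added.

    history is a list of property sets, one per revision, oldest first.
    """
    counts = Counter()
    for before, after in zip(history, history[1:]):
        counts.update(newly_added(before, after))
    return counts

# Hypothetical history of one file: the return-value check arrives
# one commit after the call itself.
history = [
    set(),
    {"call:foo"},
    {"call:foo", "checked:foo"},
]
counts = commonly_added(history)
```

Properties with high counts across many files would be the "commonly added" ones the slide refers to.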

  6. Evaluation
     • Does historical information help?
       • can we get the same value by only looking at the latest version of the source code?
     • Metric
       • are the likely bugs near the top?
       • cumulative precision
       • Precision: number of likely bugs divided by the number of warnings inspected
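As a rough illustration of the metric, cumulative precision after inspecting the first k warnings is the number of likely bugs among them divided by k. The `labels` list below is made-up data, not a result from the talk.

```python
def cumulative_precision(labels):
    """Return precision after each inspected warning.

    labels[i] is True if the i-th warning in the ranked list
    was judged a likely bug.
    """
    hits = 0
    precisions = []
    for k, is_bug in enumerate(labels, start=1):
        hits += bool(is_bug)
        precisions.append(hits / k)
    return precisions

# Hypothetical inspection of the top four warnings.
labels = [True, True, False, True]
curve = cumulative_precision(labels)
```

A ranking that puts likely bugs near the top keeps this curve high for small k.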

  7. Return Value Check Bug
     • Identify functions whose return value induces a code change
     • Provide developers a list of sorted warnings
       • use historical information for sorting

     Bug:
         …
         value = foo();
         newPosition += value; // ???
         …

     Tool Inferred Bug Fix:
         value = foo();
         if( value != error_code)
         { // Check
             newPosition += value;
         }
         …

     Apache Results: Chi-square = 6.15, p is less than or equal to 0.025

  8. Discovering Function Usage Patterns
     • Function Usage Pattern
       • describe function invocations with respect to each other
     • static analysis
       • intraprocedural
     • describe relationships between functions
       • implicit rules

     Examples:
         HDC hdc = BeginPaint( hwnd, &ps );
         if( hdc )
             DrawIcon( hdc, x, y, hIcon );
         EndPaint( hwnd, &ps );

         mdi = HeapAlloc(GetProcessHeap());
         if (!mdi)
             HeapFree(GetProcessHeap(), 0, cs);

  9. Goals
     • Discover valid patterns
       • use data mining techniques to identify patterns
     • Identify buggy patterns
       • which patterns commonly cause a code change
     • Find violations of these patterns
       • static analysis
       • use history to rank violations

  10. Mining Changes in Function Usage Patterns
     • Find new instances of patterns
       • where that instance was not found in the revision immediately prior
     • This finds a large number of patterns
       • need context to strengthen the ties between the pair of functions
       • Data Flow

     Before the commit:
         int foo(){ open(); }

     After the commit:
         int foo(){ open(); read(); }

     The commit creates a new instance of the pattern open() -> read()
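A minimal sketch of finding new pattern instances, assuming each function body has already been reduced to its ordered list of call names (the slides do not describe the extraction step, and `call_pairs` and `new_instances` are illustrative names). A pattern a -> b is taken here to mean "a is called before b within the same function".

```python
from itertools import combinations

def call_pairs(calls):
    """Ordered pairs (a, b) where a is called before b in one function body."""
    return {(a, b) for a, b in combinations(calls, 2) if a != b}

def new_instances(before_calls, after_calls):
    """Pattern instances present after the commit but not in the prior revision."""
    return call_pairs(after_calls) - call_pairs(before_calls)

# The foo() example from this slide.
added = new_instances(["open"], ["open", "read"])
```

On a real code base this produces many pairs, which is why the slide calls for extra context such as data flow.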

  11. Data Flow
     • Identify data flow relationships between function pairs
       • produce/consume
       • use same data
       • update same data
     • Data flow confidence
       • what percent of new instances of foo() -> bar() have a data flow relationship?

     Examples:
         HDC hdc = BeginPaint( hwnd, &ps );
         if( hdc )
             DrawIcon( hdc, x, y, hIcon );
         EndPaint( hwnd, &ps );

         HDC hdc = BeginPaint( hwnd, &ps );
         if( hdc )
             hdc.x = genX();
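The data flow confidence can be sketched as a simple ratio. The `Call` record and the two checks in `has_data_flow` are simplifying assumptions for illustration: produce/consume is approximated as "the result variable of the first call appears among the arguments of the second", and the "use same data" and "update same data" cases are collapsed into "the calls share an argument". A real analysis would track this through the program's data flow rather than by matching names.

```python
from typing import NamedTuple, Optional

class Call(NamedTuple):
    name: str
    result: Optional[str]  # variable receiving the return value, if any
    args: tuple            # argument expressions as written in the source

def has_data_flow(a, b):
    """Approximate check for a data flow relationship between two calls."""
    produce_consume = a.result is not None and a.result in b.args
    same_data = bool(set(a.args) & set(b.args))
    return produce_consume or same_data

def data_flow_confidence(instances):
    """Fraction of new instances (pairs of calls) that have a data flow link."""
    instances = list(instances)
    if not instances:
        return 0.0
    linked = sum(has_data_flow(a, b) for a, b in instances)
    return linked / len(instances)

begin = Call("BeginPaint", "hdc", ("hwnd", "&ps"))
draw = Call("DrawIcon", None, ("hdc", "x", "y", "hIcon"))
end = Call("EndPaint", None, ("hwnd", "&ps"))
# Hypothetical instance where EndPaint is called on unrelated data.
stray_end = Call("EndPaint", None, ("other", "&ps2"))

conf = data_flow_confidence([(begin, end), (begin, end), (begin, stray_end)])
```

In the example, BeginPaint -> DrawIcon is linked through `hdc`, and BeginPaint -> EndPaint through the shared `hwnd` and `&ps` arguments.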

  12. Bug-Prone Patterns
     • How does a new instance enter the source code?
       • both of the function calls were added
       • one function call was added
         • the added function completed the pairing
         • bug fix? refactoring?
     • Bug confidence
       • what percent of new instances of foo()->bar() are created by adding one function call? And which function call is most likely to be added?

     Three successive revisions, each separated by a commit:
         int foo(){ }
         int foo(){ open(); read(); }
         int foo(){ open(); read(); close(); }
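The bug confidence question can be sketched as follows, assuming each new instance of a pattern is recorded as the set of calls the commit added (`"first"`, `"second"`, or both). This encoding and the sample data are hypothetical.

```python
from collections import Counter

def bug_confidence(instances):
    """Return (fraction created by adding exactly one call, most often added call).

    instances is a list of sets; each set names which call(s) of the
    pattern the commit added: {"first"}, {"second"}, or {"first", "second"}.
    """
    if not instances:
        return 0.0, None
    single = [next(iter(added)) for added in instances if len(added) == 1]
    most_common = Counter(single).most_common(1)
    return len(single) / len(instances), (most_common[0][0] if most_common else None)

# Hypothetical new instances of foo() -> bar().
instances = [
    {"first", "second"},  # both calls added together
    {"second"},           # bar() added to complete the pairing
    {"second"},
    {"first"},
]
confidence, usually_added = bug_confidence(instances)
```

A high value suggests the pattern is often completed after the fact, which is the signature of a bug fix or refactoring described on this slide.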

  13. Valid, Bug Prone Patterns
     • Patterns added completely could indicate valid patterns
     • Patterns added by adding one function call indicate:
       • refactoring/very misunderstood pattern
       • random noise
     • Which are likely to be buggy?

     [Figure labels: One Function Call Added, Two Function Calls Added]

  14. Ranking of Violations
     • Number of violations for each pattern
       • experience from the current code base
     • Data Flow Confidence
       • which are valid patterns
     • Bug Confidence
       • which have caused code changes in the past
     • Confidence
       • how often, when foo() is added, is foo()->bar() created
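The slides list the ranking inputs but not how they are combined. The sketch below assumes, purely for illustration, that the three confidence values are multiplied into a single score. The number of violations per pattern is left out because the slides do not say in which direction it should influence the rank. All names and values here are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Violation:
    location: str          # where the pattern is violated
    pattern: tuple         # (first call, second call)
    data_flow_conf: float  # is this a valid pattern?
    bug_conf: float        # has it caused code changes in the past?
    confidence: float      # how often adding foo() creates foo()->bar()

def score(v):
    """Assumed combination: product of the three confidence values."""
    return v.data_flow_conf * v.bug_conf * v.confidence

def rank(violations):
    """Sort violations so the most suspicious come first."""
    return sorted(violations, key=score, reverse=True)

# Hypothetical warnings with made-up confidence values.
warnings = [
    Violation("a.c:10", ("open", "close"), 0.5, 0.9, 0.9),
    Violation("b.c:42", ("lock", "unlock"), 0.9, 0.8, 0.7),
    Violation("c.c:7", ("init", "log"), 0.2, 0.2, 0.5),
]
ranked = rank(warnings)
```

Any monotone combination would serve equally well as a starting point; the product simply penalizes a violation that is weak on any one measure.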

  15. Preliminary Results
     • Student Projects – CS 3
       • Introduction to C
       • CVS history for each student for each project
       • CVS commit to see automated test results
       • 50% precision on final submission
     • Apache web server
       • 50% precision rate in the top 10 warnings
       • identified a refactoring
     • Wine
         TREEVIEW_ValidItem(tree,item);
         TREEVIEW_SendTreeviewNotify(tree,command,item);

  16. Apache Case Study
     • 1,129 C source files
       • includes modules
       • Apache Portable Runtime
     • 41,000 CVS commits
       • 6,000 compilable CVS transactions that change source files for the Linux version
     • Studied httpd-2.0 branch
       • July 1996 through Oct 2003
       • some files have history back through 1.0 branch

  17. Apache Refactoring
     • Found many patterns of this form:
     • Change debug logging
       • previously printf
       • now ap_log_error or ap_log_rerror

     [Figure labels: How often is this pattern created by adding exactly one function call; How often, when one function call is added to create this pattern, is it the second function call]

     Commit log excerpt: Thu Nov 18 23:07:53 1999 UTC (6 years, 3 months ago) … I then changed all the fprintf(stderr calls to ap_log_error …

  18. Can we find bugs?
     • Static analysis to identify violations of ap_log_error patterns
     • 16 of first 20 warnings are likely bugs
       • first 20 warnings involving ap_log_error
       • ranking based on
         • violations per pattern
         • bug confidence
         • data flow confidence
     • Why do these bugs exist?
       • missed refactorings
       • bugs caused by not knowing implicit rules

     This refactoring started in 1999

  19. Conclusions
     • Interesting properties can be mined from change history
       • function usage patterns
     • Using historical information has improved static analysis tools
       • provide a list of ranked warnings to user
       • reduced false positive rate
