
Segmented Symbolic Analysis

Segmented Symbolic Analysis. Wei Le, Rochester Institute of Technology. Motivation: Symbolic analysis has many important applications in software tools [Sen, Marinov, Agha ’05] [Godefroid, Klarlund, Sen ’05] [Le, Soffa ’08] [Chipounov, Kuznetsov, Candea ’12]


Presentation Transcript


  1. Segmented Symbolic Analysis Wei Le Rochester Institute of Technology

  2. Motivation • Symbolic analysis has many important applications in software tools [Sen, Marinov, Agha ’05] [Godefroid, Klarlund, Sen ’05] [Le, Soffa ’08] [Chipounov, Kuznetsov, Candea ’12] • Compared to testing with concrete input: better coverage • Compared to other static techniques: more precise • Will continue being a powerful tool due to improved scalability [Chipounov, Kuznetsov, Candea ’12]

  3. Challenges of Symbolic Analysis • Loops: can have a statically unknown bound • Library calls: the source code of a library is typically not available at compile time

  4. Previous Solutions • Loops: only a very small state space is covered • Iterate once [Cadar, Dunbar, Engler ’08] [Chipounov, Kuznetsov, Candea ’12] • Report unknown [Xie, Chou, Engler ’03] • Pattern matching [Saxena, Poosankam, McCamant, Song ‘00] • Library calls: imprecise, or require manual effort • A concrete value [Sen, Marinov, Agha ’05] [Godefroid, Klarlund, Sen ’05] • Manually constructed models (e.g., simplified C implementation) [Bush, Pincus, Sielaff ’00] [Chipounov, Kuznetsov, Candea ’12]

  5. Segmented Symbolic Analysis - Insights • Code is not uniformly easy to analyze • We should leverage the structural and semantic relations between statements to partition a program and apply different analyses accordingly • The capabilities of static analysis are limited; we should introduce dynamic analysis to supply information that a pure static symbolic analyzer is slow or unable to produce

  6. Overall Approach • Perform symbolic analysis • When an unknown occurs, identify the code segments that cause it • Construct unit tests for those segments and automatically generate inputs • Run the tests and perform dynamic inference to generate symbolic rules and symbolic values (transfer functions) • Resume symbolic analysis using the inferred rules

  7. Novelty of the Work • Weave static and dynamic analyses on demand on a concurrent framework • Dynamic analysis is fully automatic (it runs code segments, not the entire program) • Aggregated information from multiple runs: regression analysis • Programs mostly consist of linear operations [Knuth ’71] [Halbwachs, Proy, Roumanoff ’97] • Determining program properties often only requires linear constraints [Halbwachs, Proy, Roumanoff ’97] [Xie, Chou, Engler ’03] • We assume that linear relations can characterize the relevant behavior of small code segments

  8. Overview using an Example

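  9. Traditional SA / Traditional SA with Library Models / Segmented SA [Figure: control-flow graph of the example program, annotated with the buffer overflow query as each analysis propagates it backward] • Program nodes: (1) struct stat s (2) char filename[32] (3) char* temp = argv[1] (4) int i = 0 (5) loop test *temp != '\0' (6) filename[i] = *temp++ (7) i++ (8) t = _stat64i32(filename, &s) (9) branch t == 0 (10) strcat(filename, ", ") • Buffer overflow query raised at node 10: 32 > Len(filename)+1 • Library call (node 8): without a rule the query ends in "Library Unknown"; with the rule Len(filename') = Len(filename) it stays 32 > Len(filename)+1 • Loop (nodes 5-7): without a rule the query ends in "Loop Unknown"; with the rule Len(filename') = Len(temp) it becomes 32 > Len(temp)+1 • At node 3 the query resolves to 32 > Len(argv[1])+1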

  10. Unit Test to Infer the Loop
      // initialize with test inputs
      char* temp = _GenChars(test_buf);
      char* filename = _GenChars(test_buf);
      // code segment for the loop
      int i = 0;
      while (*temp != '\0') {
          filename[i] = *temp++;
          i++;
      }
      // output Len(filename)
      char* _result = _GenChars(g_buf);
      int _rint = strlen(filename);
      itoa(_rint, _result, 10);
      fputs(_result, fp);
      // cleanup …
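A self-contained variant of this unit test can make the input/output pairing explicit. The sketch below is only an illustration, not the harness from the slides: `SEG_BUF`, `loop_segment`, and `observe_len_after_loop` are hypothetical names, and the destination buffer is zero-filled (an assumption) so that `strlen` is well defined even though the loop itself never writes a terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SEG_BUF 64 /* hypothetical scratch size, larger than any test input */

/* The code segment under test: the copy loop from the example program. */
static void loop_segment(char *filename, const char *temp)
{
    int i = 0;
    while (*temp != '\0') {
        filename[i] = *temp++;
        i++;
    }
}

/* Run the segment on one test input and return the observed Len(filename).
 * Assumption: the destination starts zero-filled, because the loop does
 * not copy the terminating '\0'. */
size_t observe_len_after_loop(const char *input)
{
    char filename[SEG_BUF];

    assert(strlen(input) < SEG_BUF); /* keep the harness itself in bounds */
    memset(filename, 0, sizeof filename);
    loop_segment(filename, input);
    return strlen(filename);
}
```

Calling `observe_len_after_loop` on inputs of different lengths yields (Len(temp), Len(filename)) pairs, which is the kind of data the inference step consumes.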

  11. Reduce to Regression Analysis

  12. Internal Design and Components

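  13. The Helium framework [Figure: architecture diagram] • Symbolic Analysis & Partition Program for Unknown: works through queries (q); when solving a query hits an Unknown, it sends a Request • Dynamic Inference On Demand: the request is looked up in the Inference Repository; if Not Found, the Test Synthesizer and Inference Engine produce New Rules • Respond: the rules are returned and the query continues to Solved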
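The lookup-then-infer flow in this diagram can be pictured as a small cache in front of an inference routine. The following is a minimal sketch under that reading, not Helium's actual interface: `repository`, `resolve_unknown`, `stub_infer`, and the string segment keys are all hypothetical, and rules are stored as plain text for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_RULES 16 /* hypothetical repository capacity */

typedef struct {
    const char *segment_id; /* hypothetical key naming a code segment */
    const char *rule;       /* inferred symbolic rule, kept as text */
} rule_entry;

typedef struct {
    rule_entry entries[MAX_RULES];
    size_t count;
    size_t inferences_run; /* how often dynamic inference was invoked */
} repository;

typedef const char *(*infer_fn)(const char *segment_id);

/* Stand-in for test synthesis plus inference: knows one segment only. */
const char *stub_infer(const char *segment_id)
{
    if (strcmp(segment_id, "copy_loop") == 0)
        return "Len(filename') = Len(temp)";
    return NULL; /* nothing could be inferred */
}

/* Answer a request: reuse a stored rule if present, otherwise run
 * inference and store any new rule. The caller must keep segment_id
 * alive for as long as the repository is in use. */
const char *resolve_unknown(repository *repo, const char *segment_id,
                            infer_fn infer)
{
    for (size_t i = 0; i < repo->count; i++) {
        if (strcmp(repo->entries[i].segment_id, segment_id) == 0)
            return repo->entries[i].rule; /* found in the repository */
    }

    const char *rule = infer(segment_id); /* not found: infer on demand */
    repo->inferences_run++;
    if (rule != NULL && repo->count < MAX_RULES) {
        repo->entries[repo->count].segment_id = segment_id;
        repo->entries[repo->count].rule = rule;
        repo->count++;
    }
    return rule;
}
```

A second request for the same segment is answered from the repository without rerunning any tests, which is the point of keeping inferred rules around.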

  14. Components on the Helium Framework • Static component: • Perform demand-driven, path-sensitive symbolic analysis • Isolate the code segment that causes the unknown • Determine the environment for the code segment

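  15. Interaction Protocol [Figure: message flow between Symbolic Analysis and Dynamic Inference] • Request (Symbolic Analysis to Dynamic Inference): V: Inquiry, C: Code, E: Env • Dynamic Inference stages: Code Unit, Test Input, Test Output, Inference • Respond (Dynamic Inference to Symbolic Analysis): Transfer Func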

  16. Test Synthesizer: Construct a Unit Test from a Program Segment [Figure: pipeline] • Select Code Segment • Determine Test Input Variables • Determine Test Output Variables • Construct Runnable Code

  17. Dynamic Inference as Regression Analysis • Model: Y = a0 + a1 X1 + a2 X2 + … + an Xn • Inputs: Data for Explanatory Variables, Data for Response Variables • Steps: Input Transformation, Model Selection, Inference via Regression • Models: Simple, Multiple, and Polynomial Linear; Piecewise Linear • Output: Linear Symbolic Rules
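For the simplest case, one explanatory variable, the fit reduces to ordinary least squares with a closed-form solution. The sketch below shows that case only and is an assumption about how such a step could look, not the implementation behind the slides; `linear_rule` and `fit_simple_linear` are hypothetical names, and the intercept and slope are called `a0` and `a1` here.

```c
#include <assert.h>
#include <stddef.h>

/* A rule of the form Y = a0 + a1 * X. */
typedef struct {
    double a0; /* intercept */
    double a1; /* slope */
} linear_rule;

/* Ordinary least squares for one explanatory variable.
 * x[i] is an observed input (e.g. Len(temp) before the segment runs),
 * y[i] the matching output (e.g. Len(filename) afterwards). */
linear_rule fit_simple_linear(const double *x, const double *y, size_t n)
{
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    linear_rule r;

    assert(n >= 2); /* a line needs at least two observations */
    for (size_t i = 0; i < n; i++) {
        sx += x[i];
        sy += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }

    double denom = (double)n * sxx - sx * sx;
    assert(denom != 0.0); /* inputs must not all be identical */

    r.a1 = ((double)n * sxy - sx * sy) / denom;
    r.a0 = (sy - r.a1 * sx) / (double)n;
    return r;
}
```

If test runs of the copy loop report an output length equal to the input length each time, the fit gives a slope of 1 and an intercept of 0, which reads back as the symbolic rule Len(filename') = Len(temp).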

  18. Explanatory Models for Representing Code Semantics (SUPPOSE a: OUTPUT VAR, b, c, d: INPUT VARS)

  19. Experimental Setup • Implementation: Phoenix and Disolver, analyzing C/C++/C# • A traditional symbolic analysis that gives up in loops and library calls • Segmented symbolic analysis • Applications of both symbolic analyses to detect infeasible paths and buffer overflows • Research Questions: • Can we find useful symbolic rules and values? • Are we improving the detection capabilities for infeasible paths and buffer overflows? • What are the capabilities of segmented symbolic analysis? • Is the technique still scalable and practical?

  20. Experimental Results: Compare the two

  21. Dynamic Inference for Buffer Overflow

  22. Performance

  23. Experimental Summary • Improved the detection capabilities: 5 times more buffer overflows • Inferred 1135 models • 2/3 of the loops are eligible for size, 29.3% yield runnable unit tests, and models are inferred for 23.8% of the loops • Unit tests for 81.4% of the library calls are runnable, and models are inferred for 70.4% of the library calls • Scalability is still practical • We can handle loops that traditional symbolic analysis cannot

  24. Capabilities of Segmented Symbolic Analysis

  25. Loops We can Handle
      // loop handled by segmented symbolic analysis
      for (p = name; *p != '\0'; p++) {
          if (isascii((int)*p) && isupper((int)*p)) {
              *p = tolower(*p);
              tryagain = TRUE;
          }
      }
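One relation a unit test can observe for this loop is that it rewrites characters in place, so the string length is unchanged. The wrapper below is a hedged sketch of such a test, not the generated harness: `observe_len_after_lowercase` and `is_ascii` are hypothetical names, and `is_ascii` stands in for the POSIX `isascii` so the block compiles as standard C.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for POSIX isascii(): true for values 0..127. */
static int is_ascii(int c)
{
    return c >= 0 && c <= 127;
}

/* Run the lowercase loop on a writable string and return Len(name)
 * afterwards; *tryagain is set to 1 if any character was changed. */
size_t observe_len_after_lowercase(char *name, int *tryagain)
{
    char *p;

    assert(name != NULL && tryagain != NULL);
    for (p = name; *p != '\0'; p++) {
        if (is_ascii((int)*p) && isupper((int)*p)) {
            *p = (char)tolower(*p);
            *tryagain = 1;
        }
    }
    return strlen(name);
}
```

Across inputs of different lengths, the observed output length always equals the input length, so a linear fit would report Len(name') = Len(name).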

  26. Loops We cannot Handle Yet
      for (n = 7; n >= 8 - pfburh->r.w % 8; n--) {
          rcSource[i++] = rcolors[m_netbuf[y * bytesPerRow + x] >> n & 1];
      }

  27. Related Work • Various symbolic analyses for bug finding, debugging [Sen, Marinov, Agha ’05] [Godefroid, Klarlund, Sen ’05] [Le, Soffa ’08] [Chipounov, Kuznetsov, Candea ’12] • Hybrid symbolic analysis [Sen, Marinov, Agha ’05] [Godefroid, Klarlund, Sen ’05] [Chipounov, Kuznetsov, Candea ’12] • Dynamic invariants discovery [Ernst, Czeisler, Griswold, Notkin ’00]

  28. Conclusions • A novel hybrid technique that flexibly weaves static and dynamic analyses on demand for their maximum capabilities of discovering program semantic information • Addressed the two key challenges: 1) partitioning a program to construct valid unit tests, and 2) mapping the problems of discovering symbolic relations between program variables to regression analysis • Fully automatic and can be generally applied for determining different program properties and for different programs

  29. Thank you and Questions?
