Segmented Symbolic Analysis Wei Le Rochester Institute of Technology
Motivation • Symbolic analysis has many important applications in software tools [Sen, Marinov, Agha ’05] [Godefroid, Klarlund, Sen ’05] [Le, Soffa ’08] [Chipounov, Kuznetsov, Candea ’12] • Compared to testing with concrete inputs: better coverage • Compared to other static techniques: more precise • Will remain a powerful tool as its scalability continues to improve [Chipounov, Kuznetsov, Candea ’12]
Challenges of Symbolic Analysis • Loops: may have a statically unknown bound • Library calls: the library's source code is typically not available to the analysis
Previous Solutions • Loops: only a very small portion of the state space is covered • Iterate once [Cadar, Dunbar, Engler ’08] [Chipounov, Kuznetsov, Candea ’12] • Report unknown [Xie, Chou, Engler ’03] • Pattern matching [Saxena, Poosankam, McCamant, Song ’09] • Library calls: imprecise or require manual effort • Use a concrete value [Sen, Marinov, Agha ’05] [Godefroid, Klarlund, Sen ’05] • Manually construct models (e.g., a simplified C implementation) [Bush, Pincus, Sielaff ’00] [Chipounov, Kuznetsov, Candea ’12]
Segmented Symbolic Analysis: Insights • Code is not uniformly easy to analyze • We should leverage the structural and semantic relations between statements to partition a program and apply different analyses to different parts • Static analysis has limited capabilities; dynamic analysis can supply information that a purely static symbolic analyzer produces too slowly or cannot produce at all
Overall Approach • Perform symbolic analysis • When an unknown occurs, identify the code segment that causes it • Construct unit tests for the segment and automatically generate test inputs • Run the tests and perform dynamic inference to generate symbolic rules and symbolic values (transfer functions) • Resume symbolic analysis using the inferred rules
Novelty of the Work • Weaves static and dynamic analyses on demand in a concurrent framework • Dynamic analysis is fully automatic (it runs code segments, not the entire program) • Aggregates information from multiple runs via regression analysis • Programs mostly consist of linear operations [Knuth ’71] [Halbwachs, Proy, Roumanoff ’97] • Determining program properties often requires only linear constraints [Halbwachs, Proy, Roumanoff ’97] [Xie, Chou, Engler ’03] • We assume that linear relations can characterize the relevant behavior of small code segments
Example: Traditional SA, Traditional SA with Library Models, and Segmented SA [Figure: control-flow graph of a running example: struct stat s; char filename[32]; char* temp = argv[1]; int i = 0; a loop copies *temp into filename until '\0'; t = _stat64i32(filename, &s); if t == 0, strcat(filename, ", "). The buffer-overflow query 32 > Len(filename)+1 is propagated backward. Traditional SA stops at the library call (Library Unknown); with a library model it passes the call but stops at the loop (Loop Unknown). Segmented SA infers Len(filename') = Len(filename) for the call and Len(filename') = Len(temp) for the loop, reaches 32 > Len(argv[1])+1, and reports a buffer overflow.]
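The backward propagation in this example can be sketched in C. This is a minimal illustration under stated assumptions, not the Helium implementation: every segment's effect on a string length is assumed to be a linear transfer function Len(out) = k·Len(in) + c, and the names `TransferFn`, `Segment`, `LenQuery`, `infer_on_demand`, and `example_path` are hypothetical. A traditional analysis gives up at the first segment without a known transfer function; the segmented analysis asks a (here stubbed) dynamic inference step for one and keeps going.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical linear transfer function on string lengths:
   Len(out) = k * Len(in) + c. */
typedef struct {
    int k;
    int c;
} TransferFn;

/* Hypothetical path segment. in_var names the string whose length the
   segment's output length depends on; fn is meaningful only if known. */
typedef struct {
    const char *name;
    const char *in_var;
    bool known;
    TransferFn fn;
} Segment;

/* Query "bound > k * Len(var) + c", e.g. 32 > Len(filename) + 1. */
typedef struct {
    int bound;
    int k;
    int c;
    const char *var;
} LenQuery;

/* Hypothetical stand-in for on-demand dynamic inference. It returns
   Len(out) = Len(in), the relation the example reports for both the copy
   loop and _stat64i32; a real system would synthesize and run a unit test. */
static TransferFn infer_on_demand(const Segment *s) {
    (void)s;
    TransferFn identity = {1, 0};
    return identity;
}

/* Propagates q backward through segs[n-1], ..., segs[0]. Substituting
   Len(var) = k' * Len(in) + c' into bound > k * Len(var) + c gives
   bound > (k * k') * Len(in) + (k * c' + c).
   Returns -1 on success, or the index of the segment where a traditional
   analysis (segmented == false) gives up with an unknown. */
static int propagate(LenQuery *q, const Segment *segs, int n, bool segmented) {
    assert(q && segs);
    for (int i = n - 1; i >= 0; i--) {
        TransferFn fn;
        if (segs[i].known)
            fn = segs[i].fn;
        else if (segmented)
            fn = infer_on_demand(&segs[i]);
        else
            return i; /* traditional SA: report unknown */
        q->c = q->k * fn.c + q->c; /* uses the old k */
        q->k = q->k * fn.k;
        q->var = segs[i].in_var;
    }
    return -1;
}

/* The example path in forward order (hypothetical encoding). */
static const Segment example_path[] = {
    {"char* temp = argv[1]", "argv[1]", true, {1, 0}},
    {"copy loop", "temp", false, {0, 0}},
    {"_stat64i32(filename, &s)", "filename", false, {0, 0}},
};
```

Starting from the query {32, 1, 1, "filename"}, a call with `segmented` set to false returns 2 (it stops at the library call), while a call with `segmented` set to true returns -1 and leaves the query as 32 > 1·Len(argv[1]) + 1.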
Unit Test to Infer the Loop
//initialize with test inputs
char* temp = _GenChars(test_buf);
char* filename = _GenChars(test_buf);
//code segment for the loop
int i = 0;
while (*temp != '\0') {
    filename[i] = *temp++;
    i++;
}
//output Len(filename)
char* _result = _GenChars(g_buf);
int _rint = strlen(filename);
itoa(_rint, _result, 10);
fputs(_result, fp);
//cleanup …
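A self-contained approximation of this unit test is sketched below. `_GenChars`, `test_buf`, `g_buf`, and `fp` come from the framework and are not defined on the slide, so this version substitutes a hypothetical `gen_chars` that writes a random lowercase string of a chosen length, and it returns the observed Len(filename) instead of writing it to a file. Zero-initializing `filename` is also an assumption: the loop segment does not copy the terminator, so a zeroed buffer keeps `strlen` well-defined.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE 64

/* Hypothetical test-input generator: writes len random lowercase
   characters followed by '\0' into buf. */
static void gen_chars(char *buf, size_t len) {
    assert(len < BUF_SIZE);
    for (size_t j = 0; j < len; j++)
        buf[j] = (char)('a' + rand() % 26);
    buf[len] = '\0';
}

/* Runs the loop segment once on an input of length input_len and returns
   the test output Len(filename). */
static size_t run_segment(size_t input_len) {
    char test_buf[BUF_SIZE];
    char filename[BUF_SIZE] = {0}; /* zeroed so strlen is defined */
    gen_chars(test_buf, input_len);
    char *temp = test_buf;

    /* code segment for the loop, copied unchanged */
    int i = 0;
    while (*temp != '\0') {
        filename[i] = *temp++;
        i++;
    }

    return strlen(filename);
}

/* Collects (Len(temp), Len(filename)) observations for n_runs inputs of
   varying length; these pairs are the data handed to regression. */
static void collect(size_t n_runs, size_t *xs, size_t *ys) {
    for (size_t r = 0; r < n_runs; r++) {
        xs[r] = (r * 7) % (BUF_SIZE - 1);
        ys[r] = run_segment(xs[r]);
    }
}
```

Every collected pair satisfies ys[r] == xs[r], which is the evidence a regression step can turn into Len(filename') = Len(temp).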
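The Helium Framework [Figure: symbolic analysis partitions the program at each unknown and queues inference requests to dynamic inference on demand; each request is first looked up in the repository of inferred rules; if it is not found, the test synthesizer and inference engine solve the unknown and store the new rules in the repository; the solved result is sent back as a response]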
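The repository lookup can be read as memoizing inference results per request. The sketch below assumes a request is keyed by a code-segment identifier and an inquiry variable, that rules are linear (coefficient and offset), and that `infer_rule` is a hypothetical stand-in for the test synthesizer plus inference engine; none of these names come from Helium. A counter makes it visible that a repeated request is answered from the repository without running inference again.

```c
#include <assert.h>
#include <string.h>

#define MAX_RULES 128

/* Hypothetical inferred rule: out = k * in + c. */
typedef struct {
    int k;
    int c;
} Rule;

/* Keys are stored by pointer, so callers must pass strings that outlive
   the repository (string literals here). */
typedef struct {
    const char *segment_id; /* identifies the code segment */
    const char *variable;   /* inquiry variable */
    Rule rule;
} Entry;

static Entry repository[MAX_RULES];
static int n_entries = 0;
static int n_inferences = 0; /* how often the slow path ran */

/* Hypothetical stand-in for test synthesis plus dynamic inference. */
static Rule infer_rule(const char *segment_id, const char *variable) {
    (void)segment_id;
    (void)variable;
    n_inferences++;
    Rule r = {1, 0};
    return r;
}

/* Answers a request: look up the repository first; on a miss, solve the
   unknown via inference and store the new rule. */
static Rule request(const char *segment_id, const char *variable) {
    for (int i = 0; i < n_entries; i++) {
        if (strcmp(repository[i].segment_id, segment_id) == 0 &&
            strcmp(repository[i].variable, variable) == 0)
            return repository[i].rule;
    }
    Rule r = infer_rule(segment_id, variable);
    assert(n_entries < MAX_RULES);
    repository[n_entries].segment_id = segment_id;
    repository[n_entries].variable = variable;
    repository[n_entries].rule = r;
    n_entries++;
    return r;
}
```

A linear scan keeps the sketch short; a hash table would be the natural choice once the repository holds many rules.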
Components of the Helium Framework • Static component: • Performs demand-driven, path-sensitive symbolic analysis • Isolates the code segment that causes the unknown • Determines the environment for the code segment
Interaction Protocol [Figure: symbolic analysis sends dynamic inference a request (V, C, E), where V is the inquiry (test output), C is the code unit, and E is its environment (test input); dynamic inference responds with an inferred transfer function]
Test Synthesizer: Constructing a Unit Test from a Program Segment • Select the code segment • Determine the test output variables • Determine the test input variables • Construct runnable code
Dynamic Inference as Regression Analysis • Model: $Y = a_0 + a_1 X_1 + a_2 X_2 + \cdots + a_n X_n$ [Figure: data for the explanatory variables (after input transformation) and data for the response variables feed inference via regression; model selection chooses among simple linear, multiple linear, polynomial, and piecewise linear models; the output is linear symbolic rules]
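For a single explanatory variable, the regression reduces to ordinary least squares. The C sketch below fits y = a1·x + a0, rounds both coefficients to integers, and accepts the rule only if it reproduces every observation exactly. The rounding and the exact-fit acceptance test are illustrative assumptions, not the paper's model-selection procedure, and `LinearRule` and `fit_simple` are hypothetical names. Fed (Len(temp), Len(filename)) pairs from the loop's unit test, it recovers Len(filename') = 1·Len(temp) + 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Integer-coefficient rule y = slope * x + intercept. */
typedef struct {
    long slope;
    long intercept;
} LinearRule;

static double abs_d(double v) { return v < 0 ? -v : v; }
static long round_l(double v) { return (long)(v < 0 ? v - 0.5 : v + 0.5); }

/* Fits y = a1 * x + a0 by ordinary least squares, rounds a1 and a0
   (assumption: program relations have integer coefficients), and accepts
   the rule only if it reproduces every observation exactly. */
static bool fit_simple(const double *x, const double *y, size_t n,
                       LinearRule *out) {
    if (n < 2)
        return false;
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (size_t i = 0; i < n; i++) {
        sx += x[i];
        sy += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }
    double denom = (double)n * sxx - sx * sx;
    if (abs_d(denom) < 1e-12)
        return false; /* x does not vary: slope is undetermined */
    double a1 = ((double)n * sxy - sx * sy) / denom;
    double a0 = (sy - a1 * sx) / (double)n;
    out->slope = round_l(a1);
    out->intercept = round_l(a0);
    for (size_t i = 0; i < n; i++) {
        double predicted = (double)out->slope * x[i] + (double)out->intercept;
        if (abs_d(predicted - y[i]) > 1e-9)
            return false; /* not linear with integer coefficients */
    }
    return true;
}
```

Rejecting inexact fits is what lets model selection fall through to a richer model (e.g., polynomial or piecewise linear) when a straight line does not explain the data.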
Explanatory Models for Representing Code Semantics (suppose a is the output variable and b, c, d are input variables)
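For the multiple linear case with output a and inputs b, c, d, the coefficients of a = k0 + k1·b + k2·c + k3·d can be obtained from the normal equations (XᵀX)k = Xᵀa. The C sketch below builds those equations and solves them by Gauss-Jordan elimination with partial pivoting; the coefficient names k0 to k3 and the function `fit_multiple` are hypothetical, chosen only to illustrate one model of this kind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define P 4 /* number of coefficients: k0 (intercept), k1, k2, k3 */

static double abs_d(double v) { return v < 0 ? -v : v; }

/* Fits a = k0 + k1*b + k2*c + k3*d by least squares. Builds the augmented
   normal equations [X^T X | X^T a] and solves them by Gauss-Jordan
   elimination with partial pivoting. Returns false if the system is
   singular (the inputs do not vary independently). */
static bool fit_multiple(const double *a, const double *b, const double *c,
                         const double *d, size_t n, double k[P]) {
    double m[P][P + 1] = {{0}};
    for (size_t i = 0; i < n; i++) {
        double row[P] = {1.0, b[i], c[i], d[i]};
        for (int r = 0; r < P; r++) {
            for (int s = 0; s < P; s++)
                m[r][s] += row[r] * row[s];
            m[r][P] += row[r] * a[i];
        }
    }
    for (int col = 0; col < P; col++) {
        int pivot = col; /* pick the largest remaining entry in this column */
        for (int r = col + 1; r < P; r++)
            if (abs_d(m[r][col]) > abs_d(m[pivot][col]))
                pivot = r;
        if (abs_d(m[pivot][col]) < 1e-9)
            return false;
        for (int s = 0; s <= P; s++) { /* swap pivot row into place */
            double t = m[col][s];
            m[col][s] = m[pivot][s];
            m[pivot][s] = t;
        }
        for (int r = 0; r < P; r++) { /* clear this column in other rows */
            if (r == col)
                continue;
            double f = m[r][col] / m[col][col];
            for (int s = col; s <= P; s++)
                m[r][s] -= f * m[col][s];
        }
    }
    for (int r = 0; r < P; r++)
        k[r] = m[r][P] / m[r][r];
    return true;
}
```

On rows generated by a = 3 + 2b - c, the solver returns k ≈ (3, 2, -1, 0); if b, c, and d never vary, it reports a singular system instead of guessing.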
Experimental Setup • Implementation: Phoenix and Disolver, analyzing C/C++/C# • A traditional symbolic analysis that gives up at loops and library calls • Segmented symbolic analysis • Both symbolic analyses applied to detect infeasible paths and buffer overflows • Research questions: • Can we find useful symbolic rules and values? • Do we improve the detection of infeasible paths and buffer overflows? • What are the capabilities of segmented symbolic analysis? • Is the technique still scalable and practical?
Experimental Summary • Improved detection capabilities: 5 times more buffer overflows found • Inferred 1135 models • 2/3 of the loops are eligible based on size; 29.3% yield runnable unit tests; models are inferred for 23.8% of the loops • Unit tests for 81.4% of library calls are runnable, and models are inferred for 70.4% of library calls • Scalability remains practical • We can handle loops that traditional symbolic analysis cannot
Loops We Can Handle
//loop handled by segmented symbolic analysis
for (p = name; *p != '\0'; p++) {
    if (isascii((int)*p) && isupper((int)*p)) {
        *p = tolower(*p);
        tryagain = TRUE;
    }
}
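To see why this loop is within reach, it can be wrapped as a code segment and observed on sample inputs. In the sketch below, `Observation` and `lower_segment` are hypothetical names; the function records Len(name) before and after the segment and the final value of `tryagain`. Because `isascii` is a POSIX/XSI function rather than ISO C, the sketch substitutes an equivalent range check, and it passes `unsigned char` values to `isupper` and `tolower` as the C standard requires. Every run satisfies the linear relation Len(name') = Len(name), the kind of rule regression can recover.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Observation recorded for one run of the segment. */
typedef struct {
    size_t len_before;
    size_t len_after;
    bool tryagain;
} Observation;

/* The loop from the slide, wrapped as a unit; isascii() is replaced by an
   equivalent check that the character value is below 128. */
static Observation lower_segment(char *name) {
    Observation obs = {strlen(name), 0, false};
    bool tryagain = false;
    char *p;
    for (p = name; *p != '\0'; p++) {
        if ((unsigned char)*p < 128 && isupper((unsigned char)*p)) {
            *p = (char)tolower((unsigned char)*p);
            tryagain = true;
        }
    }
    obs.len_after = strlen(name);
    obs.tryagain = tryagain;
    return obs;
}
```

For "ReadMe.TXT" the segment produces "readme.txt" with tryagain set, and for an already lowercase string it leaves tryagain false; in both cases the length is unchanged.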
Loops We Cannot Handle Yet
for (n = 7; n >= 8 - pfburh->r.w % 8; n--) {
    rcSource[i++] = rcolors[m_netbuf[y * bytesPerRow + x] >> n & 1];
}
Related Work • Various symbolic analyses for bug finding and debugging [Sen, Marinov, Agha ’05] [Godefroid, Klarlund, Sen ’05] [Le, Soffa ’08] [Chipounov, Kuznetsov, Candea ’12] • Hybrid symbolic analysis [Sen, Marinov, Agha ’05] [Godefroid, Klarlund, Sen ’05] [Chipounov, Kuznetsov, Candea ’12] • Dynamic invariant discovery [Ernst, Czeisler, Griswold, Notkin ’00]
Conclusions • A novel hybrid technique that flexibly weaves static and dynamic analyses on demand to maximize their ability to discover program semantics • Addresses two key challenges: 1) partitioning a program to construct valid unit tests, and 2) mapping the problem of discovering symbolic relations between program variables to regression analysis • Fully automatic, and generally applicable to different program properties and different programs