Static bug detection is a minor approach to software quality assurance compared with testing. It works only for specific kinds of bugs, is sometimes not scalable, and generates false positives. On the other hand, it is easy to start (no build, setup, or installation), needs no debugging, and can sometimes guarantee that the software is free of certain kinds of bugs.
CS4723 Software Validation and Quality Assurance
Lecture 15: Static Bug Detection and Verification

Static bug detection
Static bug detection is a minor approach for software quality assurance, compared with testing.
Compared to testing:
  Works for specific kinds of bugs
  Sometimes not scalable
  Generates false positives
  Easy to start (no build, no setup, no install …)
  Can sometimes guarantee that the software is free of certain kinds of bugs
  No need for debugging

State of the art: static bug detection
Type-specific detection (fixed specification and improvement is provided)
  Major or important types of bugs: null pointer, memory leak, unsafe cast, injection, buffer overflow, dynamic SQL error, race condition, deadlock, infinite loop, HTML error, UI inconsistency, i18n bugs, …
  A large number of techniques exist for each kind of bug
  Most of them have severe limitations that prevent practical use
Specification-based detection
  Model checking, symbolic execution, theorem proving

Specification
A description of the correct behavior of software
We must have a formal specification to do static bug detection
Three main types of specifications:
  Value
  Temporal
  Data flow

Value Specification
The value(s) of one or several variable(s) must satisfy a certain constraint
Examples:
  Final Exam Score <= 100
  sortedlist(0) >= sortedlist(1)
  http_url.startsWith("http")
  Sql_query belongs to Language_SQL

Temporal Specification
Two events (or a series of events) must happen in a certain order
Examples:
  lock() -> unlock()
  file.open() -> file.close() and file.open() -> file.read()
  They are different, right?
Temporal logic:
  lock() -> F(unlock())
  (!read()) U (open())
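
The two file rules differ in kind: close() must eventually follow open() (the F operator), while read() must not happen until open() has happened (the U operator). As a rough sketch, both can be checked on one recorded event trace with a two-state automaton. The class name FileProtocolChecker and the string event names are assumptions made for illustration. A model checker would explore every path of the program rather than a single trace.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical trace checker for the file protocol from the slide:
//   (!read()) U (open())       read() must not occur before open()
//   open() -> F(close())       every open() is eventually followed by close()
class FileProtocolChecker {
    enum State { CLOSED, OPEN }

    /** Returns null if the trace satisfies both rules, otherwise a description of the violation. */
    static String check(List<String> trace) {
        State state = State.CLOSED;
        for (int i = 0; i < trace.size(); i++) {
            String event = trace.get(i);
            if (event.equals("open")) {
                state = State.OPEN;
            } else if (event.equals("read")) {
                if (state != State.OPEN) {
                    return "event " + i + ": read() before open()";
                }
            } else if (event.equals("close")) {
                state = State.CLOSED;
            }
            // Events outside the protocol are ignored.
        }
        return state == State.OPEN ? "end of trace: open() never followed by close()" : null;
    }

    public static void main(String[] args) {
        System.out.println(check(Arrays.asList("open", "read", "close")));
        System.out.println(check(Arrays.asList("read", "open", "close")));
        System.out.println(check(Arrays.asList("open", "read")));
    }
}
```

The first trace is accepted (the method returns null); the other two each violate one of the rules.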

Data Flow Specification
Data from a certain source must / must not flow to a certain sink
Examples:
  ! Contact Info -> Internet
  Password -> encryption -> Internet
Data flow specifications are mainly used for security
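
A data flow specification can be checked by reachability over a graph of data dependences. The sketch below is illustrative only: the class DataFlowChecker, its method names, and the node labels are assumptions, and a real tool would first have to build the data dependence graph from the program.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical data dependence graph: an edge a -> b means data flows from a into b.
class DataFlowChecker {
    private final Map<String, List<String>> edges = new HashMap<>();

    void addFlow(String from, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    /** True if data can flow from source to sink on some path that avoids the sanitizer node. */
    boolean reaches(String source, String sink, String sanitizer) {
        Deque<String> work = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        work.add(source);
        seen.add(source);
        while (!work.isEmpty()) {
            String node = work.remove();
            if (node.equals(sink)) {
                return true;
            }
            if (node.equals(sanitizer)) {
                continue; // data leaving the sanitizer is considered safe, so do not expand it
            }
            for (String next : edges.getOrDefault(node, Collections.<String>emptyList())) {
                if (seen.add(next)) {
                    work.add(next);
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        DataFlowChecker g = new DataFlowChecker();
        g.addFlow("password", "encryption");
        g.addFlow("encryption", "internet");
        System.out.println(g.reaches("password", "internet", "encryption")); // false: only an encrypted path exists
        g.addFlow("password", "log");
        g.addFlow("log", "internet");
        System.out.println(g.reaches("password", "internet", "encryption")); // true: the path through "log" skips encryption
    }
}
```

For a "must not flow" rule such as ! Contact Info -> Internet, the same traversal can be run with no sanitizer (pass null), and any reachable sink is a violation.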

General Specifications
Common behaviors of all software

  Specification                                   Bug type
  a/b -> b != 0                                   Divide by 0
  a.field -> a != null                            Null pointer reference
  a[x] -> x < a.length()                          Buffer overflow
  p.malloc() -> p.free()                          Memory leak
  lock(s) -> unlock(s)                            Deadlock
  while(Condition) -> F(!Condition)               Infinite loop
  <script> xxx </script> -> ! User_input -> xxx   XSS
  ! Hard-coded string -> User Interface           I18n error

Checking Specifications: Basic ways
Value specifications
  Symbolic execution
  Abstract interpretation
Temporal specifications
  Model checking
Data flow specifications
  Graph traversal (data dependence graph)

Static symbolic execution: Basic example
Each statement is annotated with T (y = …). Here T is the condition for the statement to be executed, and (y = …) is the relationship of all variables to the inputs after the statement is executed. s is a symbolic variable for the input.

  y = read();          T (y = s)
  y = 2 * y;           T (y = 2*s)
  if (y <= 12)         T (y = 2*s)
    y = 3;             T ^ y<=12 (y = 3)
  else
    y = y + 1;         T ^ !(y<=12) (y = 2*s + 1)
  print("OK");         T ^ 2*s<=12 (y = 3) | T ^ !(2*s<=12) (y = 2*s + 1)

Prove y > 0? Check that y <= 0 cannot hold on either path:
  (2*s <= 12 & y = 3) & y <= 0                Not satisfiable
  !(2*s <= 12) & (y = 2*s + 1) & y <= 0       Not satisfiable

Static symbolic execution: Complex example
s is a symbolic variable for the input.

  y = read();          T (y = s)
  p = 1;               T (p = 1, y = s)
  while (y < 10) {     T (p = 1, y = s)
    y = y + 1;         T ^ s<10 (y = s + 1, p = 1)
    if (y > 2)
      p = p + 1;
    else
      p = p + 2;
  }
  print(p);

Further states recorded on the slide as the loop unrolls:
  T ^ !(2 < s + 1 < 10) (y = s + 1, p = 2)
  T ^ s + 1 <= 2 (y = s + 1, p = 3)
  T ^ 2 < s + 1 < 10 (y = s + 2, p = 2) | s + 1 <= 2 (y = s + 2, p = 3)
  …

Prove p > 0?

Abstract Interpretation
Symbolic execution tries to record all changes and relations in memory with symbolic values
  Too many things to record, not scalable
  Usually only a small part of the data is useful
Abstract interpretation
  Works in a similar way to symbolic execution
  Uses abstract values instead of symbolic values …

Abstract Interpretation
Abstract domains
  A map from concrete values to abstract values
  Examples:
    Integer -> +, -, 0
    String -> [0…9]*, other
    Pointer -> null / not null
Abstract operations
  +, -, *, /, concatenation …
  Join: when two branches merge, or a statement is executed for the second time
  OP: Dom * Dom -> Dom

Abstract Operations
An example on integers: Integer -> +, -, 0
  + (+) + = +
  - (+) - = -
  + (+) - = ?
Two special abstract values in abstract domains:
  ⊤ (top): means all possible values
  ⊥ (bottom): means no value
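
The sign domain can be written down directly as a small enum with an abstract addition and a join. This is a minimal sketch: the names SignDomain, Sign, of, add, and join are chosen here for illustration. It also answers the open question on the slide: a positive plus a negative value can have any sign, so the result is TOP.

```java
// Sketch of the sign abstract domain for integers, extended with TOP (all values) and BOTTOM (no value).
class SignDomain {
    enum Sign { BOTTOM, NEG, ZERO, POS, TOP }

    /** Abstraction of one concrete integer. */
    static Sign of(int n) {
        return n > 0 ? Sign.POS : (n < 0 ? Sign.NEG : Sign.ZERO);
    }

    /** Abstract addition: the sign of a + b, given only the signs of a and b. */
    static Sign add(Sign a, Sign b) {
        if (a == Sign.BOTTOM || b == Sign.BOTTOM) return Sign.BOTTOM; // no value in, no value out
        if (a == Sign.TOP || b == Sign.TOP) return Sign.TOP;
        if (a == Sign.ZERO) return b;
        if (b == Sign.ZERO) return a;
        return a == b ? a : Sign.TOP; // POS + NEG may be positive, negative, or zero
    }

    /** Join: the least abstract value that covers both inputs, used where branches merge. */
    static Sign join(Sign a, Sign b) {
        if (a == Sign.BOTTOM) return b;
        if (b == Sign.BOTTOM) return a;
        return a == b ? a : Sign.TOP;
    }

    public static void main(String[] args) {
        System.out.println(add(Sign.POS, Sign.POS));  // POS
        System.out.println(add(Sign.NEG, Sign.NEG));  // NEG
        System.out.println(add(Sign.POS, Sign.NEG));  // TOP
        System.out.println(join(of(5), of(-3)));      // TOP
    }
}
```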

Abstract Interpretation: Complex example

  y = read();
  p = 1;                 p > 0
  while (y < 10) {       p > 0 (join) p > 0 -> p > 0
    y = y + 1;
    if (y > 2)
      p = p + 1;         p > 0 (+) 1 -> p > 0
    else
      p = p + 2;         p > 0 (+) 2 -> p > 0
  }                      p > 0 (join) p > 0 -> p > 0
  print(p);              p > 0

Another pass over the loop no longer changes the abstract state. It is called a fixed point!
Prove p > 0?
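
The fixed-point computation for p can be sketched as a loop that re-evaluates the loop body until the abstract state at the loop head stops changing. This is an illustrative sketch, not a general analyzer: the class LoopFixedPoint and its methods are assumptions, only the sign of p is tracked, and y is ignored because its value comes from the input.

```java
// Sketch: sign analysis of p for the loop on the slide.
//   p = 1; while (y < 10) { y = y + 1; if (y > 2) p = p + 1; else p = p + 2; } print(p);
class LoopFixedPoint {
    enum Sign { BOTTOM, NEG, ZERO, POS, TOP }

    static Sign add(Sign a, Sign b) {
        if (a == Sign.BOTTOM || b == Sign.BOTTOM) return Sign.BOTTOM;
        if (a == Sign.TOP || b == Sign.TOP) return Sign.TOP;
        if (a == Sign.ZERO) return b;
        if (b == Sign.ZERO) return a;
        return a == b ? a : Sign.TOP;
    }

    static Sign join(Sign a, Sign b) {
        if (a == Sign.BOTTOM) return b;
        if (b == Sign.BOTTOM) return a;
        return a == b ? a : Sign.TOP;
    }

    /** Returns the abstract value of p at the loop head, which is also its value at print(p). */
    static Sign analyze() {
        Sign entry = Sign.POS;        // after p = 1
        Sign head = Sign.BOTTOM;      // nothing is known at the loop head yet
        Sign backEdge = Sign.BOTTOM;  // state flowing back from the end of the loop body
        while (true) {
            Sign newHead = join(entry, backEdge);
            if (newHead == head) {
                return head;          // fixed point: another pass changes nothing
            }
            head = newHead;
            Sign thenBranch = add(head, Sign.POS); // p = p + 1
            Sign elseBranch = add(head, Sign.POS); // p = p + 2 (2 is abstracted to POS)
            backEdge = join(thenBranch, elseBranch);
        }
    }

    public static void main(String[] args) {
        System.out.println(analyze()); // POS, so p > 0 holds at print(p)
    }
}
```

The loop stabilizes after two passes, regardless of how many times the concrete program iterates. This is what makes the approach scale where symbolic execution has to unroll the loop.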

State of practice: static bug detection
Findbugs
  A tool developed by researchers from UMD
  Widely used in industry for code checking before commit
  The idea actually comes from Lint
Lint
  A code style enforcing tool for the C language
  Finds bad coding styles and raises warnings
    Bad naming
    Hard-coded strings
    …

Idea: do it in reverse
Most static bug detection tools
  Set up a specification (either from users or a well-defined one)
    E.g., the divisor should not be 0, a null pointer should not be dereferenced, the salary of a person cannot be negative
  Check all possible cases to guarantee that the specification holds
  Otherwise provide counterexamples
Findbugs
  Detects code patterns for bugs
    E.g., a = null, b = a.field; str.replace(" ", "");

Characteristics of Findbugs
Based on existing concrete code patterns
Checks code patterns locally: only does intra-procedural analysis
  What are the advantages and disadvantages of doing so?
Performs bug ranking according to the probability and potential severity of bugs
  Probability: the bug is likely to be true
  Severity: the bug may cause severe consequences if not fixed

Application of Findbugs-like tools
Findbugs is adopted by a number of large companies such as Google
Usually only the issues with the highest confidence/severity are reported as issues
Statistics from Google, 2009: more than 4000 issues were identified, of which 1700 were confirmed as bugs and 1100 were fixed
The software department of USAA is using PMD, an alternative to Findbugs

Patterns to be checked
404 bug patterns in 6 major categories:
  Bad Practice / Dodgy code
  Correctness
  Internationalization
  Vulnerability / Security
  Multithread correctness
  Performance

Bad Practice / Dodgy code
Hackish code that is not stable and may harm future maintenance
Examples:
  Equals method should not assume the type of its object argument
    boolean equals(Object o) {
      Myclass my = (Myclass) o;
      return my.id == this.id;
    }
  Abstract class defines covariant compareTo() method
    int compareTo(Myclass obj) { … }
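
One common way to avoid the first pattern is to test the argument's type before casting. The sketch below reuses the slide's Myclass name; the id field and constructor are assumptions added so that the example is complete.

```java
// Sketch: an equals method that does not assume the type of its argument.
class Myclass {
    private final int id; // hypothetical field, mirroring my.id on the slide

    Myclass(int id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Myclass)) return false; // also false for null, so no ClassCastException or NullPointerException
        Myclass other = (Myclass) o;
        return id == other.id;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(id); // keep hashCode consistent with equals
    }

    public static void main(String[] args) {
        Myclass a = new Myclass(1);
        System.out.println(a.equals(new Myclass(1))); // true
        System.out.println(a.equals("not a Myclass")); // false
        System.out.println(a.equals(null));            // false
    }
}
```

The covariant compareTo(Myclass) pattern is usually avoided in a similar spirit by implementing Comparable<Myclass>, so that the method actually overrides the interface method.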

Correctness
The code pattern may result in incorrect behavior of the software
Examples:
  DMI: Collections should not contain themselves
    List s = new …;
    …
    if (s.contains(s)) { … }
  DMI: Invocation of hashCode on an array
    int[] x = new int[10];
    …
    x.hashCode();
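
Calling hashCode() on an array is flagged because arrays inherit Object.hashCode(), which is based on object identity rather than on the elements. A short sketch of the usual alternative, java.util.Arrays.hashCode (the class name ArrayHashDemo is made up for the example):

```java
import java.util.Arrays;

// Sketch: identity-based versus content-based hash codes for arrays.
class ArrayHashDemo {
    /** True if both arrays have the same content-based hash code. */
    static boolean sameContentHash(int[] a, int[] b) {
        return Arrays.hashCode(a) == Arrays.hashCode(b);
    }

    public static void main(String[] args) {
        int[] x = {1, 2, 3};
        int[] y = {1, 2, 3};
        // Identity-based: two distinct arrays usually hash differently, even with equal contents.
        System.out.println(x.hashCode() == y.hashCode());
        // Content-based: equal contents always give equal hash codes.
        System.out.println(sameContentHash(x, y)); // true
    }
}
```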

Internationalization
A code pattern that will hinder future i18n of the software
Examples:
  Use of toUpperCase, toLowerCase on localized strings
    String s = getLocale(key);
    s.toUpperCase();
  Use of getBytes() on localized strings
    String s = getLocale(key);
    s.getBytes();
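
Both calls depend on the platform's default settings: toUpperCase() uses the default locale, and getBytes() uses the default charset. A minimal sketch of the explicit variants follows. The class and method names are illustrative, and which locale or charset is appropriate depends on what the string is used for.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Sketch: make locale and charset explicit instead of relying on platform defaults.
class LocaleSafeStrings {
    /** Locale-independent upper-casing, e.g. for protocol keywords or identifiers. */
    static String upperForProtocol(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    /** Encoding with a fixed charset, so that the bytes are the same on every machine. */
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        // Under a Turkish locale, 'i' upper-cases to the dotted capital I (U+0130), not to 'I'.
        System.out.println("title".toUpperCase(turkish));
        System.out.println(upperForProtocol("title")); // TITLE
        System.out.println(encode("\u00e9").length);   // 2 bytes in UTF-8
    }
}
```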

Multi-thread correctness
A code pattern that may cause incorrectness in multi-thread execution
Examples:
  Synchronization on boxed primitive
    private static Boolean inited = Boolean.FALSE;
    ...
    synchronized (inited) {
      if (!inited) {
        init();
        inited = Boolean.TRUE;
      }
    }
    ...
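
The pattern is a bug for two reasons: Boolean.FALSE is a single object shared across the whole JVM, so unrelated code may lock on it, and assigning Boolean.TRUE changes which object later threads lock on. A common fix, sketched below with made-up names, is a private lock object that never changes.

```java
// Sketch: lazy initialization guarded by a dedicated lock object.
class LazyInit {
    private static final Object LOCK = new Object(); // never reassigned, never shared
    private static boolean inited = false;           // guarded by LOCK
    private static int initCalls = 0;                // guarded by LOCK; counts how often init() ran

    private static void init() {
        initCalls++;
    }

    static void ensureInit() {
        synchronized (LOCK) {
            if (!inited) {
                init();
                inited = true;
            }
        }
    }

    static int getInitCalls() {
        synchronized (LOCK) {
            return initCalls;
        }
    }

    /** Calls ensureInit() from n threads and returns how many times init() ran in total. */
    static int runThreads(int n) {
        Thread[] threads = new Thread[n];
        for (int i = 0; i < n; i++) {
            threads[i] = new Thread(LazyInit::ensureInit);
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return getInitCalls();
    }

    public static void main(String[] args) {
        System.out.println("init() calls: " + runThreads(8)); // init() calls: 1
    }
}
```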

Vulnerability / Security
The code pattern may result in vulnerability or security issues
Examples:
  SQL: A SQL query is generated from a non-constant String
    String str = "select" + bb + " ddd" + …
    server.execute(str);
  This code directly writes an HTTP parameter to JSP output, which allows for a cross-site scripting vulnerability
    Para = request.getParameter(key);
    out.print(Para);
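
For the cross-site scripting example, the usual remedy is to escape the parameter before writing it to the page. The escaper below is a simplified, hypothetical helper written for illustration, and real projects typically rely on a vetted encoding library instead. For the SQL example, the analogous remedy is a java.sql.PreparedStatement with ? placeholders, so that user input never becomes part of the query text.

```java
// Sketch: escape HTML special characters before printing user input into a page.
class HtmlEscaper {
    static String escapeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A script tag supplied as a parameter is rendered as text instead of being executed.
        System.out.println(escapeHtml("<script>alert(1)</script>"));
        // prints &lt;script&gt;alert(1)&lt;/script&gt;
    }
}
```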

Performance
The code pattern may harm the performance of the software
Examples:
  SBSC: Method concatenates strings using + in a loop
    String s = "";
    for (int i = 0; i < field.length; ++i) {
      s = s + field[i];
    }
  Instead, use a StringBuffer:
    StringBuffer buf = new StringBuffer();
    for (int i = 0; i < field.length; ++i) {
      buf.append(field[i]);
    }
    String s = buf.toString();

Major problem: False positives
Overall precision is 5% to 10% on open source and industry projects
Developers want to make sure they do not waste effort on a false positive
Usually there are more bugs than developers can fix

Solution: Bug ranking
Ranking bug categories
  Some categories are more likely to be bugs than others
How to give a score to each category?
  Check a large number of issues in the history of the software
  How large a proportion was fixed?
Raises precision to about 30% in the top-ranked 25% of bugs
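
One way to read the scoring idea is to give each category the fraction of its past warnings that developers actually fixed, and then to sort by that fraction. The sketch below makes this concrete. The class names and all numbers are invented for illustration and are not taken from Findbugs or from any real project history.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: rank bug categories by the proportion of past warnings that were fixed.
class BugRanking {
    static final class CategoryStats {
        final String name;
        final int reported; // warnings of this category found in the project history
        final int fixed;    // how many of them developers actually fixed

        CategoryStats(String name, int reported, int fixed) {
            this.name = name;
            this.reported = reported;
            this.fixed = fixed;
        }

        double fixRatio() {
            return reported == 0 ? 0.0 : (double) fixed / reported;
        }
    }

    /** Returns a new list sorted from the highest to the lowest fix ratio. */
    static List<CategoryStats> rank(List<CategoryStats> stats) {
        List<CategoryStats> sorted = new ArrayList<>(stats);
        sorted.sort((a, b) -> Double.compare(b.fixRatio(), a.fixRatio()));
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical history data.
        List<CategoryStats> history = Arrays.asList(
                new CategoryStats("Performance", 200, 20),
                new CategoryStats("Correctness", 100, 60),
                new CategoryStats("Bad Practice", 400, 80));
        for (CategoryStats c : rank(history)) {
            System.out.println(c.name + " " + c.fixRatio());
        }
    }
}
```

With these made-up numbers the order is Correctness (0.6), Bad Practice (0.2), Performance (0.1), so warnings from the Correctness category would be shown first.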

Findbugs
Disadvantages
  Cannot guarantee that the software is free of certain bugs
  Still involves many false positives
Advantages
  Easy to start
  Scalable
  Relatively fewer false positives
  Somewhat like testing
Has become the most popular and practical static bug detection technique

Review of Static Bug Detection
Specification-based static bug detection
  Value specifications
  Temporal specifications
  Data flow specifications
Pattern-based static bug detection
  Findbugs
  Bug ranking