Building a Model for Static Analysis: Lexical, Parsing, Abstract Syntax, Semantic Analysis

Static Analysis Chapter 4

Summary (1) • Building a model of the program: • Lexical analysis • Parsing • Abstract syntax • Semantic analysis • Tracking control flow • Tracking dataflow • Taint propagation • Pointer aliasing

Summary (2) • Analysis algorithms: • Checking assertions • Naive local analysis • Approaches to local analysis • Global analysis • Research tools • Rules: • Rule formats • Rules for taint propagation • Rules in print • Reporting: • Reporting results • Eliminating unwanted results • Explaining the significance of the results

Introduction

Building a model: Lexical Analysis • Decompose the input into a sequence of “tokens”. • Ignore comments and whitespace. • Often implemented with regular expressions.
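
The lexical analysis step above can be sketched as a small regular-expression tokenizer. This is only an illustration: the token classes in TOKEN_SPEC and the C-like input are assumptions made for the example, not the lexer of any particular tool.

```python
import re

# Hypothetical token classes for a tiny C-like language (illustrative only).
TOKEN_SPEC = [
    ("COMMENT", r"/\*.*?\*/|//[^\n]*"),
    ("WS",      r"\s+"),
    ("NUMBER",  r"\d+"),
    ("ID",      r"[A-Za-z_]\w*"),
    ("OP",      r"[-+*/=<>!]=?|[(){};,]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC), re.DOTALL)

def tokenize(source):
    """Split source into (kind, text) tokens, dropping comments and whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup not in ("COMMENT", "WS"):
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

For example, tokenize("x = 42; /* note */") yields the tokens for x, =, 42 and ; and silently discards the comment.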

Building a model: Parsing • Uses a “context-free grammar” to match the token stream. • A grammar consists of a set of productions that describe the symbols in the language. • The parser performs a derivation by matching the production rules of the grammar against the token stream to produce a “parse tree”. • There are many tools for building scanners and parsers; lex and yacc are just one pair.
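
As a rough sketch of how productions drive a derivation, the hand-written recursive-descent parser below handles a hypothetical two-production grammar (expr -> term ('+' term)*, term -> name-or-number | '(' expr ')'). The grammar and the tuple-based tree layout are assumptions for illustration; generated parsers such as those produced by yacc work differently internally.

```python
def parse_expr(tokens, i=0):
    """expr -> term ('+' term)*"""
    node, i = parse_term(tokens, i)
    children = [node]
    while i < len(tokens) and tokens[i] == "+":
        rhs, i = parse_term(tokens, i + 1)
        children += ["+", rhs]
    return ("expr", *children), i

def parse_term(tokens, i):
    """term -> name-or-number | '(' expr ')'"""
    if i >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[i] == "(":
        inner, i = parse_expr(tokens, i + 1)
        if i >= len(tokens) or tokens[i] != ")":
            raise SyntaxError("expected ')'")
        return ("term", "(", inner, ")"), i + 1
    if tokens[i].isalnum():
        return ("term", tokens[i]), i + 1
    raise SyntaxError(f"unexpected token {tokens[i]!r}")

def parse(tokens):
    """Return the parse tree for a complete token list."""
    tree, i = parse_expr(tokens)
    if i != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[i]!r}")
    return tree
```

Each node of the result is labelled with the production that matched, and the parentheses still appear as leaves, which is what distinguishes a parse tree from an abstract syntax tree.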

Building a model: Abstract syntax • Abstract syntax tree = parse tree minus “garbage nonterminals” (symbols that exist only to make the language easier to parse). • Sometimes, the AST may simplify the code: for example, all loops could be converted to a single kind of loop. • If a tool is multilingual, the ASTs for different languages may be similar.
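
To see what an AST leaves out, Python's standard ast module can serve as a stand-in for an analyzer's front end (an assumption for illustration; the slides do not name a specific parser). The helper same_ast is a hypothetical name.

```python
import ast

def same_ast(source_a, source_b):
    """Compare two expressions by their ASTs, ignoring layout and redundant parentheses."""
    tree_a = ast.parse(source_a, mode="eval")
    tree_b = ast.parse(source_b, mode="eval")
    # ast.dump omits line/column attributes by default, so only structure is compared.
    return ast.dump(tree_a) == ast.dump(tree_b)
```

Here same_ast("(a + (b))", "a+b") is true because parentheses and spacing leave no trace in the AST, while same_ast("a + b", "b + a") is false because the operand order is part of the structure.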

Building a model: Semantic Analysis • During parsing, a symbol table is also being built. • For each identifier, it points to the definition • and contains its type. • This information is needed to make type conversions explicit, and it also allows some checking of method calls. • Often, the AST is now converted to a form better suited to the analysis; there may be more than one such form.

Building a model: Tracking control flow • Construct a control flow graph from the intermediate representation: • nodes are basic blocks of code, • edges are potential control flows between the nodes. • Back edges represent possible loops. • A call graph traces potential calls between functions.
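
A minimal sketch of spotting back edges, assuming the control flow graph is already available as a dictionary from block name to successor blocks. The block names and the sample graph are hypothetical; a depth-first search reports every edge that points back to a block on the current search path.

```python
def find_back_edges(cfg, entry):
    """Return edges (u, v) where v is still on the current DFS path when u is visited."""
    back_edges = []
    on_path = set()
    visited = set()

    def dfs(node):
        visited.add(node)
        on_path.add(node)
        for succ in cfg.get(node, []):
            if succ in on_path:
                back_edges.append((node, succ))  # edge closes a cycle: possible loop
            elif succ not in visited:
                dfs(succ)
        on_path.discard(node)

    dfs(entry)
    return back_edges

# Hypothetical CFG for:  i = 0; while (i < n) { i++; } return i;
cfg = {
    "entry": ["cond"],
    "cond": ["body", "exit"],
    "body": ["cond"],
    "exit": [],
}
```

On this graph the only back edge is the one from the loop body to the loop condition.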

Building a model: Tracking dataflow • Using the control flow analysis, we can now see how variables get set: this leads to converting a program to Static Single Assignment (SSA) form. • SSA = each variable is assigned a value only once (so each assignment gets its own subscript). • Where control flows merge, use a Φ-function to “resolve” the issue: Φ(v1,v2)
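
The renaming idea behind SSA can be sketched for straight-line code as follows. The statement encoding (a target plus a list of operand names or constants) and both function names are assumptions for this example; real SSA construction also has to decide where Φ-functions are needed, which this sketch leaves to the caller. The string "phi" stands for Φ.

```python
def to_ssa(statements):
    """Rename straight-line assignments so every variable is assigned exactly once."""
    version = {}   # variable name -> latest version number
    out = []
    for target, operands in statements:
        # Operands refer to the most recent version; constants are left unchanged.
        renamed = [f"{op}{version[op]}" if op in version else op for op in operands]
        version[target] = version.get(target, 0) + 1
        out.append((f"{target}{version[target]}", renamed))
    return out

def phi_at_merge(var, incoming_versions, new_version):
    """Build the phi assignment that merges the versions arriving at a join point."""
    args = ", ".join(f"{var}{v}" for v in incoming_versions)
    return f"{var}{new_version} = phi({args})"
```

For x = 1; x = x + 2; y = x the renamed form is x1 = 1; x2 = x1 + 2; y1 = x2, and after an if-then-else that assigns x2 in one branch and x3 in the other, phi_at_merge("x", [2, 3], 4) gives "x4 = phi(x2, x3)".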

Building a model: Taint propagation • Taint propagation = tracing where user-controlled variables/values end up in the program.

Building a model: Pointer aliasing • Detecting pointer aliasing is important in static analysis for taint propagation. • Many possible relationships, such as: • “must alias” • “may alias” • “cannot alias” • etc

Analysis Algorithms: Introduction • The reason for analyzing a program is to determine context. For example: • cin >> is dangerous only when used with character arrays. • The danger of strcpy depends on the sizes of its parameters. • Two parts to analysis: • Analysis within procedures, aka intraprocedural analysis. • Analysis of procedure interactions, aka interprocedural analysis. • Because the two terms are so similar, we will use the terms “local” and “global” instead.

Analysis Algorithms: Checking Assertions • The easiest way to check a situation is to check an assertion. For example, strcpy(dest, src) is safe if (and only if) assert(alloc_size(dest) > strlen(src)); holds. • We will use this approach to check for problems. • Three kinds of problems: • Trusting input that may be badly behaved • Buffer overflows • Variable/type state.
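
The strcpy assertion above can be turned into a check over whatever the analyzer knows statically. In this sketch the function name check_strcpy and the three verdict strings are assumptions; dest_alloc_size is the known allocation size of dest and src_max_len is a known upper bound on strlen(src), with None meaning the model has no information.

```python
def check_strcpy(dest_alloc_size, src_max_len):
    """Evaluate assert(alloc_size(dest) > strlen(src)) against static knowledge."""
    if dest_alloc_size is None or src_max_len is None:
        return "unknown"        # the model cannot decide the assertion
    if dest_alloc_size > src_max_len:
        return "safe"           # the assertion holds for every possible src
    return "may overflow"       # some src allowed by the bound violates it
```

A 16-byte destination is safe for strings of at most 15 characters (the extra byte holds the terminating NUL), may overflow for strings of up to 16 characters, and is undecided when the source length is unbounded.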

Analysis Algorithms: Naive local analysis • Need to have an idea of local variable values to check assertions. • Straight-line code is relatively simple. • It gets complicated when the program logic gets complicated. • Loops are worse than if-then-else.

Analysis Algorithms: Approaches to local analysis • Abstract interpretation: abstract away irrelevant properties. • For loops, do a flow-insensitive analysis. • Predicate transformers: • Weakest precondition necessary to satisfy a postcondition (Dijkstra). • Model checking (for example, with a finite-state automaton, FSA).
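
One common way to "abstract away irrelevant properties" is to track only a range for each integer variable. The interval domain below is a minimal sketch chosen for illustration (the slides do not prescribe a particular abstract domain), and it covers only addition and the join at the end of an if-then-else; loops would additionally need a fixed-point computation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: int
    hi: int

    def __add__(self, other):
        # Abstract addition: add the bounds pairwise.
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def join(self, other):
        # Merge the facts from two branches: the smallest interval covering both.
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

# if (c) x = 1; else x = 10;   y = x + 5;
x = Interval(1, 1).join(Interval(10, 10))
y = x + Interval(5, 5)
```

After the branch x is known to lie in [1, 10] and y in [6, 15], which is enough to show that y is a valid index into a 16-element array without knowing which branch ran.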

Analysis Algorithms: Global Analysis • Cannot be safely ignored. (For example, an environment variable is passed to a procedure as a character array, which is then copied, opening a vulnerability.) • Naive approach: inlining... • More flexible approach: function summaries: • A short statement specifying the pre-conditions and post-conditions of the function; it often replaces the function during analysis. • Program analysis then becomes similar to a graph traversal.
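
A sketch of how function summaries might stand in for function bodies, using the environment-variable example. Everything here is an assumption for illustration: the summary format, the helper copy_name (imagined to strcpy its argument into a 64-byte buffer), and the choice to track only an upper bound on string length.

```python
SUMMARIES = {
    # getenv: no precondition; postcondition: result length has no known bound.
    "getenv": {"requires_max_len": {}, "result_max_len": None},
    # copy_name(src): hypothetical helper that copies src into a 64-byte buffer,
    # so its precondition is strlen(src) <= 63.
    "copy_name": {"requires_max_len": {0: 63}, "result_max_len": 0},
}

def check_calls(calls):
    """calls: list of (result_var or None, function, [argument vars]) in program order."""
    max_len = {}    # variable -> known upper bound on its length (None = unbounded)
    findings = []
    for result, func, args in calls:
        summary = SUMMARIES[func]
        # Check each precondition against what the caller knows about the argument.
        for index, limit in summary["requires_max_len"].items():
            bound = max_len.get(args[index])
            if bound is None or bound > limit:
                findings.append(
                    f"{func}() argument {index} ('{args[index]}') may exceed {limit} characters"
                )
        # Apply the postcondition to the result instead of analyzing the body.
        if result is not None:
            max_len[result] = summary["result_max_len"]
    return findings
```

With the call sequence home = getenv(name); copy_name(home); the checker reports that argument 0 of copy_name may exceed 63 characters, without ever looking inside either function.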

Analysis Algorithms: Research Tools (1) • ARCHER = ARray CHEckeR • BOON: checks array indices, imprecise • Cqual uses type analysis to do taint analysis; requires taint declarations • Eau Claire uses a theorem prover • LAPSE: Lightweight Analysis for Program Security in Eclipse (for J2EE apps) • MOPS: MOdel checking Programs for Security properties • SATURN uses Boolean satisfiability for temporal safety problems

Analysis Algorithms: Research Tools (2) • Splint, by adding annotations, can be used to find abstraction violations, such as global variable modifications, use before initialization, and array-bounds errors. • Pixy detects XSS vulnerabilities in PHP programs. • xg++ uses templates to find uses of untrusted data. • Some others: ESP, SLAM, BLAST

Rules • The need for rules. • Automatic? • Need for security, define library behavior, etc. • Built-in or can be added?

Rule formats • Best to write the rules independently of the analyzer. But need formats: • (page 98 for two variants) • Annotations in the code (Java, Microsoft SAL) • There were many ad-hoc systems for writing rules which were unsuccessful. • PQL (page 101)

Rules for taint propagation • Source rules • Sink rules • Pass-through rules • Cleanse rules • Entry points (invoked by the attacker) • Taint flags: taint types.
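
The four rule kinds can be sketched as plain data driving a tiny propagation pass over straight-line code. The function names in RULES (read_request, run_query, concat, escape_sql) are hypothetical, and treating every function without a rule as producing untainted output is a simplifying assumption of this sketch.

```python
RULES = {
    "source":       {"read_request"},   # results are attacker-controlled
    "sink":         {"run_query"},      # must not receive tainted arguments
    "pass_through": {"concat"},         # result is tainted if any argument is
    "cleanse":      {"escape_sql"},     # result is considered safe again
}

def propagate(program):
    """program: list of (result_var or None, function, [argument vars]) in order."""
    tainted, findings = set(), []
    for result, func, args in program:
        any_tainted = any(arg in tainted for arg in args)
        if func in RULES["sink"] and any_tainted:
            findings.append(func)
        if func in RULES["source"]:
            tainted.add(result)
        elif func in RULES["pass_through"] and any_tainted:
            tainted.add(result)
        else:
            # Cleanse rules, and (by assumption) all other functions, yield clean results.
            tainted.discard(result)
    return findings

unsafe = [("q", "read_request", []), ("s", "concat", ["prefix", "q"]), (None, "run_query", ["s"])]
safe = [("q", "read_request", []), ("c", "escape_sql", ["q"]),
        ("s", "concat", ["prefix", "c"]), (None, "run_query", ["s"])]
```

The first program is flagged at run_query because the request data reaches it through concat; the second is not, because the value passes through the cleanse rule first.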

Reports • Be careful with output! • False positives • Try to be part of a unified SDE. • Try to make result formatting and ordering an option. • Sometimes assumptions may simplify the output. • Rank by severity, most severe first (a buffer overflow is more severe than a null pointer dereference). • Confidence is important (p. 107 for a graph).
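
Ordering results by severity and then by confidence can be sketched with a sort key. The numeric severity scores and the result fields are assumptions for this example; only the relative order of the two categories comes from the slide.

```python
# Hypothetical severity scores; higher means more severe.
SEVERITY = {"buffer overflow": 3, "null pointer dereference": 1}

def rank(results):
    """Most severe first; within one severity level, most confident first."""
    return sorted(
        results,
        key=lambda r: (-SEVERITY.get(r["category"], 0), -r["confidence"]),
    )

results = [
    {"category": "null pointer dereference", "confidence": 0.9},
    {"category": "buffer overflow", "confidence": 0.4},
    {"category": "buffer overflow", "confidence": 0.8},
]
```

Here rank(results) puts both buffer overflows ahead of the null pointer dereference, with the 0.8-confidence overflow first.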

Reports (2) • Eliminating unwanted results: use pragmas or annotations, or eliminate whole categories. If code modification is not possible, a combination of line numbering and pattern matching is necessary. • Result significance, see examples in book.
