Elsa/Oink/Cqual++: Open-Source Static Analysis for C++
Scott McPeak, Daniel Wilkerson; work with Rob Johnson
CodeCon 2006
Goals
• Build extensible infrastructure to
  • Find certain categories of bugs
  • Exhaustively, within some constraints
  • At compile time
  • In real-world C and C++ programs
  • Using composable analyses
Components
• Elkhound: Generalized LR Parser Generator
• Elsa: C++ Parser
• Oink: Whole-program dataflow
• Cqual++: Type qualifier analysis
Elkhound: GLR Parser Generator
• GLR eliminates the pain of LALR(1)
• Unbounded lookahead
• Allows ambiguous grammars!
• 10x faster than other GLR implementations
• Novel combination of GLR and LALR(1)
• User-defined disambiguation
  • Early: during parsing
  • Late: after generating AST with ambiguities
Example: ‘>’ ambiguity
    new C < 3 > + 4 > + 5 ;
• Correct: the first unparenthesized ‘>’ symbol closes the template argument list, so the Type is C < 3 > and the rest, + 4 > + 5, belongs to the enclosing Expr
• Incorrect: reading the first ‘>’ as greater-than, which would make the Type C < 3 > + 4 > and leave + 5 as the Expr
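The rule behind the ‘>’ example can be checked at compile time. The following is a minimal sketch, not taken from the talk: the class template C and the alias names are hypothetical stand-ins, chosen only to show that an unparenthesized ‘>’ ends the template argument list while a parenthesized one is an ordinary operator.

```cpp
#include <type_traits>

// Hypothetical class template standing in for C on the slide.
template <int N>
struct C {
    static constexpr int value = N;
};

// The first unparenthesized '>' closes the template argument list,
// so this names C<3>.
using FirstCloses = C<3>;

// Parentheses are needed to use '>' as an operator inside the arguments.
// (3 > 2) is true, which converts to the int value 1.
using Parenthesized = C<(3 > 2)>;

static_assert(std::is_same<FirstCloses, C<3>>::value,
              "unparenthesized '>' ends the argument list");
static_assert(std::is_same<Parenthesized, C<1>>::value,
              "(3 > 2) is evaluated as an expression");
```

A parser without this rule has to carry both readings forward, which is the situation GLR is designed to handle.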
Example: Type vs. Variable
• In C & C++, it is sometimes hard to tell whether a name refers to a type or a variable
• (a) & (b) parses either as Expr & Expr (bitwise AND of two variables) or as (Type) Expr (the address of b cast to type a)

    int a;                    // hidden
    class C {
      int f(int b) { return (a) & (b); }
      typedef int a;          // visible
    };
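To make the two readings of (a) & (b) observable, here is a small sketch that compiles both. It departs from the slide in two labeled ways: the typedef is std::intptr_t rather than int, so that the pointer-to-integer cast is accepted on 64-bit targets, and f takes its argument by reference so the resulting address can be checked. The struct names are hypothetical.

```cpp
#include <cstdint>

int a = 6;  // global variable named a

struct SeesVariable {
    // No member named a exists, so a is the global variable:
    // the expression is a bitwise AND.
    int f(int& b) { return (a) & (b); }
};

struct SeesType {
    // Member function bodies are looked up in the completed class,
    // so a is the typedef declared below: the expression is a cast
    // of the address &b to that type.
    std::intptr_t f(int& b) { return (a) & (b); }
    typedef std::intptr_t a;  // hides the global variable
};
```

The token sequence in both bodies is identical; only the declarations in scope decide which parse is meaningful, and in SeesType the deciding declaration comes after the use.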
Elsa: Extensible C++ Front-end
• Parses ANSI C++ with GNU extensions
• Uses GLR to handle the ambiguities
• Extensible components:
  • flex lexer
  • Elkhound parser
  • AST defined with custom tool
  • Type checker
The Elsa Block Diagram
preproc’d source → Lexer → token stream → Parser → possibly ambiguous AST → Type Checker → annotated unambiguous AST → Post Process → final AST
• No lexer feedback hack!
Extending the Syntax
• ANSI or GNU? Both!
• Declarative language
• Extend simply by concatenating

ANSI Base:
    nonterm ConditionalExp {
      -> Exp {...}
      -> Exp "?" Exp ":" Exp {...}
    }

GNU Extension:
    nonterm ConditionalExp {
      -> Exp "?" ":" Exp {...}
    }
Declarative Abstract Syntax
    class Statement (SourceLoc loc) {
      -> S_compound(ASTList<Statement> stmts);
      -> S_if(Condition cond, Statement thenBranch, Statement elseBranch);
      -> S_while(Condition cond, Statement body);
      // ...
    }
Diagram callouts:
• superclass name: Statement
• superclass ctor parameter: SourceLoc loc
• subclass names: S_compound, S_if, S_while
• subclass ctor list parameter: ASTList<Statement> stmts
• subclass ctor parameter: for example, Condition cond
Extending the Abstract Syntax
• ANSI or GNU? Both!
• Declarative language
• Extend simply by concatenating

ANSI Base:
    class Statement {
      -> S_decl(Declaration decl);
      -> S_expr(Expression expr);
      -> S_if(...);
      -> S_for(...);
      // ...
    }

GNU Extension (GNU nested functions):
    class Statement {
      -> S_function(Function f);
    }
Semantic Analysis
• Disambiguate
• Compute types
• Resolve overloading
• Insert implicit conversions
• Instantiate templates
Disambiguation
Ambiguous syntax example: return (x)(y);
S_return
  expr: two alternatives joined by an ambiguity link
    E_cast: type = TypeId x, expr = E_variable y
    E_funCall: func = E_variable x, arg = E_variable y
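The choice between the two alternatives for (x)(y) can be sketched as a lookup on what x names. This is only an illustration under a simplifying assumption (a flat scope map decides the outcome); it is not Elsa's type checker, and every name in it is hypothetical.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// What a name is bound to in the current scope.
enum class NameKind { Type, Variable };

// The two alternatives on the ambiguity link for "(x)(y)".
enum class Alternative { Cast, FunCall };

// Keep the alternative that is consistent with what x names:
// a type makes "(x)(y)" a cast, a variable makes it a function call.
Alternative disambiguate(const std::map<std::string, NameKind>& scope,
                         const std::string& x) {
    auto it = scope.find(x);
    if (it == scope.end()) {
        throw std::runtime_error("undeclared name: " + x);
    }
    return it->second == NameKind::Type ? Alternative::Cast
                                        : Alternative::FunCall;
}
```

Because the parser keeps both subtrees, this decision can wait until declarations have been processed, which is what removes the need for the lexer to know about types.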
Lowered Output: Simplified C++
• Original or Lowered output can be printed
• Lowering always done:
  • Templates are instantiated
  • Implicit type conversions inserted
• Lowering optionally done:
  • Implicit member functions created
  • Implicit ctor/dtor calls inserted
C++ or XML, In and Out
[Diagram: Elsa reads C++ or XML and writes C++ or XML]
• First pass renders to a canonical form.
• Serialization commutes with lowering.
Cqual++: Dataflow
• Dataflow Analysis on Type Qualifiers
• Successor to Cqual: Jeff Foster, Alex Aiken

    char $tainted *getenv();
    void printf(char $untainted *fmt, ...);
    int main() {
      char *x = getenv("foo");
      printf(x);
    }
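One way to picture the analysis on the getenv/printf program is as reachability in a graph of flow edges: a warning is due when a $tainted source reaches an $untainted sink. The sketch below assumes that simplified model; FlowGraph and the node labels are hypothetical and do not reflect Cqual++'s actual data structures.

```cpp
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Directed flow graph: an edge u -> v means values of u may flow into v.
struct FlowGraph {
    std::map<std::string, std::vector<std::string>> edges;

    void addFlow(const std::string& from, const std::string& to) {
        edges[from].push_back(to);
    }

    // Breadth-first search: true if some path leads from source to sink.
    bool reaches(const std::string& source, const std::string& sink) const {
        std::set<std::string> seen{source};
        std::queue<std::string> work;
        work.push(source);
        while (!work.empty()) {
            std::string node = work.front();
            work.pop();
            if (node == sink) return true;
            auto it = edges.find(node);
            if (it == edges.end()) continue;
            for (const std::string& next : it->second) {
                if (seen.insert(next).second) work.push(next);
            }
        }
        return false;
    }
};

// Flow edges for the slide's program (node names are hypothetical labels).
FlowGraph slideExample() {
    FlowGraph g;
    g.addFlow("getenv.return", "x");  // char *x = getenv("foo");
    g.addFlow("x", "printf.fmt");     // printf(x);
    return g;
}
```

Here slideExample().reaches("getenv.return", "printf.fmt") is true, which corresponds to the tainted-to-untainted flow the tool would report. The edges carry no statement order, so this model is flow-insensitive.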
Feature: Polymorphic Dataflow
    int f(int x) { return x; }
    int main() {
      int $tainted t = ...;
      int a = f(t);
      int $untainted u = f(3);
    }
• Each call to f is analyzed in its own context, so the taint on t does not reach u
Feature: “Funky Qualifiers”: Fake Function Bodies
    char $_1_2 *strcat(char $_1_2 *dest, const char $_1 *src);
    int main() {
      char $tainted *x;
      char $untainted *y;
      strcat(y, x);
    }
• {1} ⊆ {1,2}: the qualifier $_1 on src flows to the qualifier $_1_2 on dest and on the return value
Feature: Separate Compilation for Scalability
• “Compile” each file to a dataflow graph
  • only flow behavior between external symbols matters
  • compress by finding smaller graph with same flow behavior; typically saves factor of 12
• “Link” each graph
  • AST is gone at linking so we save even more space
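The idea that only flow between external symbols matters can be sketched as follows: for each external symbol, find the other externals it can reach, and keep just those edges. This is a naive illustration, not the compression Cqual++ performs. It preserves the external-to-external flow behavior but makes no claim to find the smallest such graph, and all names are hypothetical.

```cpp
#include <map>
#include <set>
#include <string>

using Graph = std::map<std::string, std::set<std::string>>;

// Depth-first search collecting every node reachable from `node`.
static void collect(const Graph& g, const std::string& node,
                    std::set<std::string>& seen) {
    auto it = g.find(node);
    if (it == g.end()) return;
    for (const std::string& next : it->second) {
        if (seen.insert(next).second) collect(g, next, seen);
    }
}

// Summarize a per-file graph: keep one edge for each ordered pair of
// distinct external symbols connected by some path, and drop all
// file-internal nodes.
Graph summarize(const Graph& g, const std::set<std::string>& externals) {
    Graph out;
    for (const std::string& src : externals) {
        std::set<std::string> seen;
        collect(g, src, seen);
        for (const std::string& dst : seen) {
            if (dst != src && externals.count(dst)) out[src].insert(dst);
        }
    }
    return out;
}
```

Linking then amounts to taking the union of the per-file summaries, which needs no AST at all.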
Non-Feature: Cqual++ Is Not Flow-Sensitive
    q = p;
    // ... time passes ...
    p->s = read_from_network();
    use_in_untrusting_way(p->s);
    // does p == q still??
    q->s = "innocuous";
    use_in_trusting_way(p->s);    // $tainted??
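What Exactly Is ‘Data-Flow’?
    char *launderString(char *in) {
      int len = strlen(in);
      char *out = malloc(len+1);
      for (int i=0; i<len; ++i) {
        out[i] = 0;
        for (int j=0; j<8; ++j)
          if (in[i] & (1<<j)) out[i] |= (1<<j);
      }
      out[len] = '\0';
      return out;
    }
• Every bit of in ends up in out, but only through the if condition (a control dependence); no value is ever assigned from in to out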
Application: Finding Format-String Vulnerabilities
• printf() is an interpreter
  • the format string is a program
  • %n writes the number of bytes written so far to the memory pointed to by the corresponding argument
  • ex: printf("stuff%n", p) means *p = 5
  • if there is no argument p, printf() writes through some pointer on the stack
• do not allow untrusted data in the first argument to printf
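The usual manual fix for the pattern this analysis flags is to pass untrusted text as an argument to a fixed format rather than as the format itself. A minimal sketch, assuming a hypothetical helper name formatAsData and an arbitrary 256-byte buffer (longer input is truncated):

```cpp
#include <cstdio>
#include <string>

// Formats untrusted text as data. The format string is the fixed literal
// "%s", so any directives inside `untrusted` (such as %n) are copied
// verbatim instead of being interpreted.
std::string formatAsData(const std::string& untrusted) {
    char buf[256];  // arbitrary size chosen for the sketch
    std::snprintf(buf, sizeof buf, "%s", untrusted.c_str());
    return std::string(buf);
}
```

In the taint model, the literal "%s" is the only value that reaches the format parameter, so nothing tainted flows into it.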
Application: Finding User-Kernel Vulnerabilities
• Kernel must check that user pointers are valid
  • they must point to memory mapped into the user process’s address space
  • otherwise a user process could manipulate kernel data
• This is also a dataflow/taint analysis
Rob’s Cqual Linux User-Kernel Results
• 2.4.20, full config, 7 bugs, 275 false positives
• 2.4.23, full config, 6 bugs, 264 false positives
• including other trials on same kernels:
  • found 17 different security vulnerabilities
  • found bugs missed by other tools and by manual auditing
  • all but one bug confirmed exploitable
  • significant “bug churn” across kernel versions
Linus’s “Sparse” Tool for User-Kernel Vulnerabilities
• Linus also has a tool using type qualifiers
  • it requires manual annotation of every variable
• In contrast, Cqual++ infers the qualifiers
  • only sources and sinks need be annotated
  • and any “sanitizer” functions
• Linus says this “is not the C way”
  • ok, he can write all the annotations
Future Application: Finding Character-Set Confusions
• Microsoft confusing ASCII and UCS2
• Mozilla has 20-ish different character sets
  • they should only flow together through conversion functions
  • if array sizes differ, confusions can be a security hole too
Oink Vision: Composable Analysis Tools
• Compilers refuse to compile bugs
  • well, some classes of bugs
  • and you may have to wait until tomorrow morning to find out
• Correctness analysis is expected as part of any compiler toolchain
• The analyses are composable and extensible