
Elsa/Oink/Cqual++: Open-Source Static Analysis for C++

Elsa, Oink, and Cqual++ are open-source static analysis tools for C++ that aim to find certain categories of bugs at compile time. They use composable analysis components to analyze real-world C and C++ programs.


Presentation Transcript


  1. Elsa/Oink/Cqual++: Open-Source Static Analysis for C++ • Scott McPeak, Daniel Wilkerson • work with Rob Johnson • CodeCon 2006

  2. Goals • Build extensible infrastructure to • Find certain categories of bugs • Exhaustively, within some constraints • At compile time • In real-world C and C++ programs • Using composable analyses

  3. Components • Elkhound: Generalized LR Parser Generator • Elsa: C++ Parser • Oink: Whole-program dataflow • Cqual++: Type qualifier analysis

  4. Elkhound: GLR Parser Generator • GLR eliminates the pain of LALR(1) • Unbounded lookahead • Allows ambiguous grammars! • 10x faster than other GLR implementations • Novel combination of GLR and LALR(1) • User-defined disambiguation • Early: during parsing • Late: after generating AST w/ambiguities

  5. Example: ‘>’ ambiguity • new C < 3 > + 4 > + 5 ; • (diagram: two candidate ways to split the same tokens into a Type and an Expr)

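  6. Example: ‘>’ ambiguity • new C < 3 > + 4 > + 5 ; • Correct: the Type ends at the first ‘>’ (C < 3 >) • Incorrect: the Type extends to the second ‘>’ (C < 3 > + 4 >), which would put an unparenthesized ‘>’ symbol inside the template argument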

  7. Example: Type vs. Variable • In C & C++, it is sometimes hard to tell whether a name refers to a type or a variable • (a) & (b) parses either as Expr & Expr (bitwise AND of two variables) or as (Type) Expr (a cast applied to &(b))

  8. Example: Type vs. Variable • In C & C++, sometimes hard to tell whether a name refers to a type or a variable int a; // hidden class C { int f(int b) { return (a) & (b); } typedef int a; // visible };

  9. Elsa: Extensible C++ Front-end • Parses ANSI C++ with GNU extensions • Uses GLR to handle the ambiguities • Extensible components: • flex lexer • Elkhound parser • AST defined with custom tool • Type checker

  10. The Elsa Block Diagram • preproc’d source → Lexer → token stream → Parser → possibly ambiguous AST → Type Checker → annotated unambiguous AST → Post Process → final AST • No lexer feedback hack!

  11. Extending the Syntax • ANSI or GNU? Both! • Declarative language • Extend simply by concatenating • ANSI Base: nonterm ConditionalExp { -> Exp {...} -> Exp "?" Exp ":" Exp {...} } • GNU Extension: nonterm ConditionalExp { -> Exp "?" ":" Exp {...} }

  12. Declarative Abstract Syntax • class Statement (SourceLoc loc){ -> S_compound(ASTList<Statement> stmts); -> S_if(Condition cond, Statement thenBranch, Statement elseBranch); -> S_while(Condition cond, Statement body); // ... } • Diagram labels: superclass name (Statement), superclass ctor parameter (SourceLoc loc), subclass names (S_compound, S_if, S_while), subclass ctor list parameter (ASTList<Statement> stmts), subclass ctor parameter (e.g. Condition cond)

  13. Extending the Abstract Syntax • ANSI or GNU? Both! • Declarative language • Extend simply by concatenating • ANSI Base: class Statement { -> S_decl(Declaration decl); -> S_expr(Expression expr); -> S_if(...); -> S_for(...); // ... } • GNU Extension (GNU nested functions): class Statement { -> S_function(Function f); }

  14. Semantic Analysis • Disambiguate • Compute types • Resolve overloading • Insert implicit conversions • Instantiate templates

  15. Disambiguation • Ambiguous syntax example: return (x)(y); • S_return's expr points to an E_cast (type: TypeId x; expr: E_variable y) that is joined by an ambiguity link to an E_funCall (func: E_variable x; arg: E_variable y)
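Both readings of `return (x)(y);` are valid C++ depending on what `x` names, which is why the parser has to keep both alternatives until type checking. The sketch below shows the two cases side by side; the namespace names are made up for the example.

```cpp
// Reading 1: x names a type, so (x)(y) is a cast (the E_cast alternative).
namespace cast_reading {
    typedef int x;
    int f(double y) { return (x)(y); }   // converts y to int
}

// Reading 2: x names a function, so (x)(y) is a call (the E_funCall alternative).
namespace call_reading {
    int x(int v) { return v + 1; }
    int f(int y) { return (x)(y); }      // calls x with argument y
}
```

The two function bodies are token-for-token identical; only the declaration of `x` that is in scope decides which alternative survives.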

  16. Lowered Output: Simplified C++ • Original or Lowered output can be printed • Lowering always done: • Templates are instantiated • Implicit type conversions inserted • Lowering optionally done: • Implicit member functions created • Implicit ctor/dtor calls inserted

  17. C++ or XML, In and Out • (diagram: C++ or XML in → Elsa → C++ or XML out) • First pass renders to a canonical form • Serialization commutes with lowering

  18. Cqual++: Dataflow • Dataflow Analysis on Type Qualifiers • Successor to Cqual: Jeff Foster, Alex Aiken • char $tainted *getenv(); void printf(char $untainted *fmt, ...); int main() { char *x = getenv("foo"); printf(x); }
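One way to picture the check on this slide is as reachability in a flow graph: an error is reported when a `$tainted` source can reach an `$untainted` sink. The sketch below is a toy model under that assumption, not Cqual++'s implementation; `FlowGraph`, `violations()`, `slideExample()`, and the node names are all hypothetical.

```cpp
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Toy qualifier flow graph: an edge a -> b means data flows from a to b.
struct FlowGraph {
    std::map<std::string, std::vector<std::string>> edges;
    std::set<std::string> tainted;    // nodes annotated $tainted
    std::set<std::string> untainted;  // nodes annotated $untainted

    void flow(const std::string& from, const std::string& to) {
        edges[from].push_back(to);
    }

    // Returns every $untainted node reachable from some $tainted node.
    std::set<std::string> violations() const {
        std::set<std::string> seen(tainted.begin(), tainted.end());
        std::queue<std::string> work;
        for (const auto& s : tainted) work.push(s);
        while (!work.empty()) {
            std::string n = work.front();
            work.pop();
            auto it = edges.find(n);
            if (it == edges.end()) continue;
            for (const auto& next : it->second)
                if (seen.insert(next).second) work.push(next);
        }
        std::set<std::string> bad;
        for (const auto& sink : untainted)
            if (seen.count(sink)) bad.insert(sink);
        return bad;
    }
};

// Models the slide's program: getenv's result flows to x, and x flows
// into printf's format parameter.
inline FlowGraph slideExample() {
    FlowGraph g;
    g.tainted.insert("getenv.return");
    g.untainted.insert("printf.fmt");
    g.flow("getenv.return", "x");
    g.flow("x", "printf.fmt");
    return g;
}
```

In this model `slideExample().violations()` contains `"printf.fmt"`, which corresponds to the format-string bug in the slide's `main()`.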

  19. Feature: Polymorphic Dataflow int f(int x) {return x;} int main() { int $tainted t = ...; int a = f(t); int $untainted u = f(3); }

  20. Feature: “Funky Qualifiers”: Fake Function Bodies • char $_1_2 *strcat(char $_1_2 *dest, const char $_1 *src); int main() { char $tainted *x; char $untainted *y; strcat(y, x); } • {1} ⊆ {1,2}

  21. Feature: Separate Compilation for Scalability • “Compile” each file to a dataflow graph • only flow behavior between external symbols matters • compress by finding smaller graph with same flow behavior; typically saves factor of 12 • “Link” each graph • AST is gone at linking so we save even more space
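The "only flow between external symbols matters" idea can be sketched as a per-file summary that records which external symbols can reach which others and forgets the internal nodes. The code below is a naive version under that assumption; it is not the compression algorithm Oink uses, and `Graph`, `reachable()`, and `summarize()` are hypothetical names.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Graph = std::map<std::string, std::vector<std::string>>;

// Collects every node reachable from `start` via one or more edges.
inline std::set<std::string> reachable(const Graph& g, const std::string& start) {
    std::set<std::string> seen;
    std::vector<std::string> stack{start};
    while (!stack.empty()) {
        std::string n = stack.back();
        stack.pop_back();
        auto it = g.find(n);
        if (it == g.end()) continue;
        for (const auto& next : it->second)
            if (seen.insert(next).second) stack.push_back(next);
    }
    return seen;
}

// Summary of one file: pairs (a, b) of external symbols such that data
// can flow from a to b. Internal nodes do not appear in the result.
inline std::set<std::pair<std::string, std::string>>
summarize(const Graph& g, const std::set<std::string>& externals) {
    std::set<std::pair<std::string, std::string>> summary;
    for (const auto& a : externals)
        for (const auto& b : reachable(g, a))
            if (externals.count(b)) summary.emplace(a, b);
    return summary;
}
```

A chain such as `in -> t1 -> t2 -> out` with external symbols `in` and `out` collapses to the single pair `(in, out)`. A real implementation would need a representation that stays small when many externals are connected, since the set of pairs can grow quadratically.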

  22. Non-Feature: Cqual++ Is Not Flow-Sensitive q = p; ... time passes ... p->s = read_from_network(); use_in_untrusting_way(p->s); // does p == q still?? q->s = "innocuous"; use_in_trusting_way(p->s); $tainted??

  23. What Exactly Is ‘Data-Flow’? char *launderString(char *in) { int len = strlen(in); char *out = malloc(len+1); for (int i=0; i<len; ++i) { out[i] = 0; for (int j=0; j<8; ++j) if (in[i] & (1<<j)) out[i] |= (1<<j); } out[len] = '\0'; return out; }

  24. Application: Finding Format-String Vulnerabilities • printf() is an interpreter • the format string is a program • %n writes the number of bytes written so far to the memory pointed to by the arg • ex: printf("stuff%n", p) means *p = 5 • if there is no argument p, printf() writes through some pointer on the stack • do not allow untrusted data in the first arg to printf
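The `%n` behavior described on this slide can be observed directly. The sketch below uses `std::snprintf` with a literal format string so that nothing is printed; the helper name `bytesBeforePercentN` is made up for the example. It assumes a C library that honors `%n`, which is not universal: some platforms disable or reject it for security reasons.

```cpp
#include <cstdio>

// Returns the value that %n stores: the number of characters produced
// by the format string before the %n directive was reached.
inline int bytesBeforePercentN() {
    int count = -1;   // overwritten by %n if the library supports it
    char buf[16];
    std::snprintf(buf, sizeof buf, "stuff%n", &count);
    return count;
}
```

On a library that supports `%n`, the function returns 5, the length of `"stuff"`. When the format string comes from an attacker and no matching argument exists, the same write goes through whatever value happens to occupy the argument slot, which is the vulnerability the taint analysis is meant to catch.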

  25. Application: Finding User-Kernel Vulnerabilities • Kernel must check user pointers are valid • must point to memory mapped into user process’s address space • otherwise could manipulate the kernel data • This is also a dataflow/taint analysis

  26. Rob’s Cqual Linux User-Kernel Results • 2.4.20, full config, 7 bugs, 275 false pos. • 2.4.23, full config, 6 bugs, 264 false pos. • including other trials on same kernels: • found 17 different security vulnerabilities • found bugs missed by other tools and manually • all but one bug confirmed exploitable • significant “bug churn” across kernel versions

  27. Linus’s “Sparse” Tool for User-Kernel Vulnerabilities • Linus also has a tool using type qualifiers • it requires manual annotation of every var • In contrast, Cqual++ infers the qualifiers • only sources and sinks need be annotated • and any “sanitizer” functions • Linus says this “is not the C way” • ok, he can write all the annotations

  28. Future Application: Finding Character-Set Confusions • Microsoft confusing ASCII and UCS2 • Mozilla has 20-ish different character sets • they should only flow together through conversion functions • if array sizes differ, confusions can be a security hole too

  29. Oink Vision: Composable Analysis Tools • Compilers refuse to compile bugs • well, some classes of bugs • and you may have to wait until tomorrow morning to find out • Correctness analysis is expected as part of any compiler toolchain • The analyses are composable and extensible
