Automated Whitebox Fuzz Testing Patrice Godefroid (Microsoft Research) Michael Y. Levin (Microsoft Center for Software Excellence) David Molnar (UC-Berkeley, visiting MSR) Presented by Yuyan Bao
Acknowledgement • This presentation is adapted and extended from: • "Automated Whitebox Fuzz Testing", a presentation by David Molnar • "Symbolic Execution for Model Checking and Testing", a presentation by Corina Păsăreanu (Kestrel)
Agenda • Fuzz Testing • Symbolic Execution • Dynamic Testing Generation • Generational Search • SAGE Architecture • Conclusion • Contribution • Weakness • Improvement
Fuzz Testing • An effective technique for finding security vulnerabilities in software • Applies invalid, unexpected, or random data to the inputs of a program • If the program fails (e.g., crashes or raises an access-violation exception), a defect has been found
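To make the idea concrete, here is a minimal blackbox-fuzzing sketch (my illustration, not a real tool; target_crashes() is a hypothetical stub standing in for running the actual program under test in a child process and detecting crashes externally):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-in for the program under test. */
static int target_crashes(const unsigned char *buf, size_t len) {
    (void)buf; (void)len;   /* stub: replace with a real target invocation */
    return 0;
}

int main(void) {
    const unsigned char seed[] = "RIFF....ACON....";   /* a well-formed input */
    unsigned char buf[sizeof seed];
    srand((unsigned)time(NULL));
    for (int trial = 0; trial < 100000; trial++) {
        memcpy(buf, seed, sizeof seed);
        for (int k = 0; k < 3; k++)                    /* flip 3 random bytes */
            buf[rand() % (sizeof seed - 1)] = (unsigned char)rand();
        if (target_crashes(buf, sizeof seed - 1))
            printf("defect found at trial %d\n", trial);
    }
    return 0;
}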
Whitebox Fuzzing • This paper presents an alternative whitebox fuzz testing approach, inspired by recent advances in symbolic execution and dynamic test generation: • Run the code with some initial well-formed input • Collect constraints on the input with symbolic execution • Generate new constraints (by negating the collected constraints one by one) • Solve the constraints with a constraint solver • Synthesize new inputs and repeat (one iteration is sketched below)
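A sketch of one such iteration on the paper's toy example (my illustration, not SAGE code; for byte-equality constraints, "solving" a negated constraint is just assigning the magic byte):

#include <stdio.h>

static const char magic[4] = {'b', 'a', 'd', '!'};

int main(void) {
    const char seed[4] = {'g', 'o', 'o', 'd'};
    /* Path condition collected from running top(seed):
       in[0] != 'b', in[1] != 'a', in[2] != 'd', in[3] != '!'.
       Negate each constraint in turn and solve for a new input. */
    for (int i = 0; i < 4; i++) {
        char next[5] = {seed[0], seed[1], seed[2], seed[3], '\0'};
        next[i] = magic[i];               /* satisfy the negated constraint */
        printf("new input: %s\n", next);  /* bood, gaod, godd, goo! */
    }
    return 0;
}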
Symbolic Execution • A method for deriving test cases that satisfy a given path • Performed as a simulation of the computation along the path • Initially: path function = identity function, path condition = true • Each vertex on the path composes onto the path function and conjoins a constraint onto the path condition: • if an assignment x := e was made: the path function is updated by composing the assignment onto it (x maps to e evaluated over the current path function) • if a conditional decision was made: path condition := path condition ∧ branch condition • Output: the path function and path condition for the given path. Slide from "White Box Testing and Symbolic Execution" by Michael Beder
Symbolic Execution • "Simulate" the code using symbolic values instead of concrete numeric data, accumulating a path condition (PC) at each branch:

Code:
int x, y;
if (x > y) {
    x = x + y;
    y = x - y;
    x = x - y;
    if (x - y > 0)
        assert (false);
}

Symbolic execution tree (x:X, y:Y):
• Start: x:X, y:Y, PC: true
• x > y false: x:X, y:Y, PC: X <= Y (path ends)
• x > y true: x:X, y:Y, PC: X > Y
• after x = x + y: x:X+Y, y:Y, PC: X > Y
• after y = x - y: x:X+Y, y:X, PC: X > Y
• after x = x - y: x:Y, y:X, PC: X > Y (the values are swapped)
• x - y > 0 true: x:Y, y:X, PC: X > Y ∧ Y - X > 0 — FALSE, unsatisfiable: not reachable, so assert(false) can never fire
• x - y > 0 false: x:Y, y:X, PC: X > Y ∧ Y - X <= 0 (path ends)
Dynamic Test Generation • Execute the program starting with some initial input • Perform a dynamic symbolic execution to collect constraints on the input • Use a constraint solver to infer variants of the previous inputs, steering the next executions
Dynamic Test Generation

void top(char input[4]) {
    int cnt = 0;
    if (input[0] == 'b') cnt++;
    if (input[1] == 'a') cnt++;
    if (input[2] == 'd') cnt++;
    if (input[3] == '!') cnt++;
    if (cnt >= 3) crash();
}

input = "good"

• Running the program with random values for the 4 input bytes is unlikely to discover the error (the probability is worked out below) • Traditional fuzz testing has difficulty generating input values that drive the program through all of its possible execution paths
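For the record, the crash probability under uniformly random input bytes (a computation, not spelled out on the original slide): crash() fires only when at least three of the four bytes match the magic values, so

\[
P(\text{crash}) \;=\; \frac{\binom{4}{3}\cdot 255 + 1}{2^{32}} \;=\; \frac{1021}{2^{32}} \;\approx\; 2^{-22}
\]

(1020 inputs match exactly three of the magic bytes; one input, "bad!", matches all four.)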
Depth-First Search

(same top() as above, run on input "good")

Path condition collected: I0 != 'b' ∧ I1 != 'a' ∧ I2 != 'd' ∧ I3 != '!'

DFS walks this path tree by negating one branch constraint at a time, deepest first.
Depth-First Search

(same top() as above)

Negating the deepest constraint: I0 != 'b' ∧ I1 != 'a' ∧ I2 != 'd' ∧ I3 == '!' → new input "goo!" (from "good")
Practical limitation

(same top() as above)

Backtracking one level and negating the next constraint: I0 != 'b' ∧ I1 != 'a' ∧ I2 == 'd' ∧ I3 != '!' → new input "godd" (from "good")

Each symbolic execution expands (negates) only one constraint, yielding one new test per run: low efficiency! (sketched in code below)
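A compact sketch of this depth-first negation on the toy program (my illustration, not the paper's code): each recursive step first follows the branch the current input takes, then backtracks and flips that single branch constraint, so covering the whole tree costs sixteen runs and adjacent tests differ by one negated constraint — exactly the inefficiency noted above.

#include <stdio.h>

static const char magic[4] = {'b', 'a', 'd', '!'};

/* Returns 1 if the input would reach crash() in top(). */
static int crashes(const char in[4]) {
    int cnt = 0;
    for (int i = 0; i < 4; i++)
        if (in[i] == magic[i]) cnt++;
    return cnt >= 3;
}

static void dfs(char in[4], int depth) {
    if (depth == 4) {
        printf("test \"%.4s\"%s\n", in, crashes(in) ? "  <-- crash" : "");
        return;
    }
    char saved = in[depth];
    dfs(in, depth + 1);        /* first take the branch the seed takes */
    in[depth] = magic[depth];  /* then negate this one branch constraint */
    dfs(in, depth + 1);
    in[depth] = saved;         /* backtrack */
}

int main(void) {
    char seed[4] = {'g', 'o', 'o', 'd'};
    dfs(seed, 0);              /* enumerates all 16 inputs, one flip at a time */
    return 0;
}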
Generational search • BFS with a heuristic to maximize block coverage • Score returns the number of new blocks covered. Slide from "Symbolic Execution" by Kevin Wallace, CSE504 2010-04-28
Generational Search — Key Idea: One Trace, Many Tests

(same top() as above, run on input "good")

Negate each of the four constraints from the one trace independently:
I0 == 'b' → "bood"
I1 == 'a' → "gaod"
I2 == 'd' → "godd"
I3 == '!' → "goo!"
These are the "Generation 1" test cases.
The Generational Search Space

(same top() as above)

Generation 0: good
Generation 1: bood, gaod, godd, goo!
Generation 2: boo!, bodd, baod, gadd, gao!, god!
Generation 3: bao!, badd, bod!, gad!
Generation 4: bad!
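Below is a sketch of the generational search itself — my simplification of the paper's algorithm, hard-wired to the toy top(): expanding a test negates every constraint at or after its bound (so one trace yields many children), and the test covering the most new blocks is expanded next. The real SAGE works on x86 traces with a constraint solver; here scoring and solving are trivialized.

#include <stdio.h>

static const char magic[4] = {'b', 'a', 'd', '!'};
static int covered[5];        /* global block coverage: 4 if-bodies + crash block */

typedef struct { char in[4]; int bound; int score; } TestCase;

/* Run the toy program, record which blocks execute, return #new blocks (Score). */
static int run_and_score(const char in[4]) {
    int cnt = 0, new_blocks = 0, hit[5] = {0};
    for (int i = 0; i < 4; i++)
        if (in[i] == magic[i]) { cnt++; hit[i] = 1; }
    if (cnt >= 3) hit[4] = 1;                 /* the crash() block */
    for (int b = 0; b < 5; b++)
        if (hit[b] && !covered[b]) { covered[b] = 1; new_blocks++; }
    if (hit[4]) printf("CRASH on \"%.4s\"\n", in);
    return new_blocks;
}

int main(void) {
    TestCase work[64];
    int n = 0;
    TestCase seed = {{'g', 'o', 'o', 'd'}, 0, 0};
    seed.score = run_and_score(seed.in);
    work[n++] = seed;
    while (n > 0) {
        int best = 0;                          /* expand the highest-scoring test */
        for (int i = 1; i < n; i++)
            if (work[i].score > work[best].score) best = i;
        TestCase p = work[best];
        work[best] = work[--n];
        /* Expand: negate every constraint at or after the parent's bound. */
        for (int j = p.bound; j < 4; j++) {
            TestCase c = p;
            c.in[j] = (c.in[j] == magic[j]) ? 'x' : magic[j];  /* negate branch j */
            c.bound = j + 1;                   /* children never re-flip this branch */
            c.score = run_and_score(c.in);
            if (n < 64) work[n++] = c;
        }
    }
    return 0;
}

Running it walks exactly the 16-node search space above and prints each crashing variant as its trace is executed.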
SAGE Architecture (Scalable Automated Guided Execution)

• Task Tester: check for crashes (AppVerifier)
• Task Tracer: trace program (Nirvana)
• Task CoverageCollector: gather constraints (TruScan)
• Task SymbolicExecutor: solve constraints (Disolver)

Pipeline: Input0 → Trace File → Constraints → Input1, Input2, …, InputN, then repeat
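As a data-flow illustration only (every function below is a hypothetical stub standing in for the task that wraps AppVerifier, Nirvana, TruScan, or Disolver), the pipeline loop can be pictured as:

#include <stdio.h>

typedef struct { const char *path; } Input;
typedef struct { const char *path; } Trace;
typedef struct { int count; } Constraints;

static void tester(Input in) { printf("check %s for crashes\n", in.path); }
static Trace tracer(Input in) { printf("trace %s\n", in.path); return (Trace){"trace.bin"}; }
static Constraints collect(Trace t) { printf("constraints from %s\n", t.path); return (Constraints){4}; }
static int solve(Constraints c, Input out[], int max) {
    int n = c.count < max ? c.count : max;   /* one new input per negated constraint */
    for (int i = 0; i < n; i++) out[i] = (Input){"child.bin"};
    return n;
}

int main(void) {
    Input work[64] = {{"input0.bin"}};
    int n = 1;
    for (int i = 0; i < n && i < 4; i++) {   /* bounded demo of the loop */
        tester(work[i]);                      /* Task: Tester           */
        Trace t = tracer(work[i]);            /* Task: Tracer           */
        Constraints c = collect(t);           /* Task: CoverageCollector */
        n += solve(c, work + n, 64 - n);      /* Task: SymbolicExecutor */
    }
    return 0;
}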
Initial Experiences with SAGE • Since the 1st MS-internal release in April '07: dozens of new security bugs found (most missed by blackbox fuzzers and static analysis) • Apps: image processors, media players, file decoders, … • Many bugs found were rated "security critical, severity 1, priority 1" • Now used regularly by several teams as part of the QA process
Experiments: ANI Parsing — MS07-017

Initial input:
RIFF...ACONLIST B...INFOINAM.... 3D Blue Alternat e v1.1..IART.... ................ 1996..anih$...$. ................ ................ ..rate.......... ..........seq .. ................ ..LIST....framic on......... ..

Crashing test case:
RIFF...ACONB B...INFOINAM.... 3D Blue Alternat e v1.1..IART.... ................ 1996..anih$...$. ................ ................ ..rate.......... ..........seq .. ................ ..anih....framic on......... ..

Only a 1 in 2^32 chance at random!

Initial input: 341 branch constraints, 1,279,939 total instructions. Crashing test case: found after 7,706 test cases, in 7 hrs 36 mins.
Different Crashes From Different Initial Input Files

Media 1: 60 machine-hours, 44,598 total tests, 357 crashes, 12 bugs. Seven seed files (seed1–seed7) were used; one seed was just 100 zero bytes. Each bug was found from only some of the seeds:

Bug ID 1867196225: found from 5 of the 7 seeds
Bug ID 2031962117: 5 of 7
Bug ID 612334691: 2 of 7
Bug ID 1061959981: 2 of 7
Bug ID 1212954973: 2 of 7
Bug ID 1011628381: 3 of 7
Bug ID 842674295: 1 of 7
Bug ID 1246509355: 3 of 7
Bug ID 1527393075: 1 of 7
Bug ID 1277839407: 1 of 7
Bug ID 1951025690: 1 of 7
Conclusion: Blackbox vs. Whitebox Fuzzing • Cost/precision tradeoffs • Blackbox is lightweight, easy, and fast, but gives poor coverage • Whitebox is smarter, but complex and slower • Recent "semi-whitebox" approaches • Less smart but more lightweight: Flayer (taint-flow analysis, may generate false alarms), Bunny-the-fuzzer (taint-flow, source-based, heuristics to fuzz based on input usage), autodafe, etc. • Which is more effective at finding bugs? It depends… • Many apps are so buggy that any form of fuzzing finds bugs! • Once the low-hanging bugs are gone, fuzzing must become smarter: use whitebox techniques and/or user-provided guidance (grammars, etc.) • Bottom line: in practice, use both!
Contribution • A novel search algorithm with a coverage-maximizing heuristic • Symbolic execution of program traces at the x86 binary level • Dynamic test generation applied to larger applications than previously done
Weakness • Gathering initial inputs is left open: the class of inputs characterized by each symbolic execution is determined by the dependence of the program's control flow on its inputs, so results hinge on the quality of the seed files • Not a general solution: • Only x86 Windows applications • Only file-reading applications • Nirvana simulates a given processor's architecture, tying the tool to that platform
Improvement • Portability: • Translate collected traces into an intermediate language • Reproduce bugs on different platforms without re-simulating the program • A more efficient way to find good initial inputs
Thank you! Questions?