Julio Auto [julio {funny a} julioauto com] Triaging Bugs with Dynamic Dataflow Analysis
Agenda • The Problem • The Solution • Demo • Solution Details • What’s Next? • Greetings & References
Preface • We will be talking about analyzing closed-source software here • Absolutely no debugging information needed • However... • Depending on the complexity of the bug, even people with the source might opt for this analysis too • E.g. Vendors receiving crash reports
The Problem • Sometimes people just have to analyze bugs in closed-source software • These bugs may come from: • A fuzzing session • Contributor-sent Proof-of-Concept code • In-the-wild exploit code • Etc... • The reasons for analyzing them vary as much as their sources, but that is irrelevant here. The fact is...
The Problem (2) • ANALYZING BUGS CAN BE HARD! • A seasoned reverse engineer may take weeks to get somewhere • If the target software is too big • If the data consumed is in a very complex and/or undisclosed format • If bugs in this target are so rare that your reversing team has no previous experience with them • But which bugs do we mostly care about?
The Problem (3) • “Analyzing bugs” is very broad • No ./write-me-a-very-detailed-advisory • We will concentrate on answering one question: what exact part of my data made the program crash? • Understanding that, and how such data is transformed, is essential
The Solution • Dynamic Dataflow Analysis • Watching data and its ramifications as the doomed program executes • What we really do is Taint Analysis • We start with a subset of the program’s data: the attacker’s input (assume it’s evil) • Its ‘ramifications’ are tainted memory and tainted registers • ... but we do it backwards.
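For contrast with the backwards direction, here is a minimal sketch of conventional forward taint propagation. The trace format (one destination plus a list of sources per step, with locations as plain strings such as `eax` or `mem:0xABCD`) is a hypothetical simplification, not VDT's. Every location derived from the evil input joins the tainted set, which is why forward tracking tends to watch far more data than a single crash question needs.

```python
def forward_taint(trace, evil_locations):
    """Return the set of locations tainted after running the toy trace.

    Hypothetical format: each step is (dst, [srcs]).
    """
    tainted = set(evil_locations)
    for dst, srcs in trace:
        if any(src in tainted for src in srcs):
            tainted.add(dst)      # dst now derives from evil data
        else:
            tainted.discard(dst)  # dst overwritten with clean data
    return tainted

trace = [
    ("eax", ["mem:0xABCD"]),  # load from the evil input
    ("ebx", ["eax", "ecx"]),  # ebx derived from eax
    ("ecx", ["0x10"]),        # ecx overwritten by a constant
]
print(sorted(forward_taint(trace, {"mem:0xABCD"})))
```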
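The Solution (2) • [Figure: Taint analysis vs. backwards taint analysis. Taint analysis starts from “This is the Evil Input” and asks “Is any of these of interest?”; backwards taint analysis starts from “This is of interest” and asks “Is any of these from the Evil Input?”]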
The Solution (3) • So we really don’t care about every tainted piece of data in the process space • Most of it is legitimate, anyway • Thus, we avoid the explosion of watched data • Plus we can do stuff like: • Bug: mov eax, [esi] (where esi = DEADBEEFh) • Analysis runs... • ... and reports: esi = user[4] + var_unk * 8
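A minimal sketch of how a backwards walk could produce a report like the one above. Each hypothetical trace entry carries an expression template over its sources; starting from the last definition of `esi` and substituting recursively yields the symbolic formula. The templates, the `user[...]` naming for evil-input bytes and `var_unk` for values of unknown origin are assumptions for illustration, not VDT's internals.

```python
# Hypothetical trace entries: (dst, template, srcs), in execution order.
# The template refers to the already-explained sources as {0}, {1}, ...
trace = [
    ("ecx", "var_unk", []),                    # value of unknown origin
    ("eax", "user[4]", []),                    # load from offset 4 of the evil input
    ("esi", "{0} + {1} * 8", ["eax", "ecx"]),  # lea esi, [eax+ecx*8]
]

def explain(trace, loc, end=None):
    """Symbolic expression for `loc` as seen just before index `end`."""
    end = len(trace) if end is None else end
    # Walk backwards to the most recent definition of loc.
    for i in range(end - 1, -1, -1):
        dst, template, srcs = trace[i]
        if dst == loc:
            # Explain each source as it was when entry i executed.
            parts = [explain(trace, src, i) for src in srcs]
            return template.format(*parts)
    return loc  # not defined inside the trace: leave it symbolic

print("esi =", explain(trace, "esi"))  # esi = user[4] + var_unk * 8
```

Recursion is fine for a toy trace; a trace with hundreds of thousands of entries would call for an iterative walk and an index of definitions.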
The Solution (4) • This is all done in two steps: tracing and analysis • First we trace the program from a “good” point until it crashes • The trace is incrementally dumped to a file • Not just the disassembly, but also some extra info • E.g.: In the previous slide’s example, effective address ([esi]) == DEADBEEFh • Then the trace file is analyzed
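A minimal sketch of incremental trace dumping, assuming one JSON record per executed instruction. The field names (`mnemonic`, `dst`, `src`, `eff_addr`) and both functions are hypothetical, not VDT's actual trace format; the point is that each record carries runtime facts such as the effective address, and is flushed right away so the file survives the crash it is meant to explain.

```python
import io
import json

def dump_entry(out, mnemonic, dst, src, eff_addr=None):
    """Append one trace record as a JSON line (hypothetical format)."""
    record = {"mnemonic": mnemonic, "dst": dst, "src": src}
    if eff_addr is not None:
        # Runtime fact that the disassembly alone cannot provide.
        record["eff_addr"] = hex(eff_addr)
    out.write(json.dumps(record) + "\n")
    out.flush()  # keep the file current in case the target dies next

def load_trace(text):
    """Read the records back for the analysis step."""
    return [json.loads(line) for line in text.splitlines() if line]

out = io.StringIO()  # stands in for a real file opened with open(path, "w")
dump_entry(out, "mov", "eax", "[esi]", eff_addr=0xDEADBEEF)
print(load_trace(out.getvalue()))
```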
The Solution (5) • The “good” starting point
The Solution (6) • So we feed the trace file to the analyzer and tell it: • “Address ranges from ABCDh to ACCDh and from DCBAh to DCCAh held Evil Input” • “I wanna know if ‘esi’ was tainted by Evil Input” • And magic happens!
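A minimal sketch of how an analyzer might store the first part of that query, assuming inclusive, non-overlapping ranges as stated on the slide. The class name is hypothetical; `bisect` keeps each lookup logarithmic, which would matter when every memory operand in a long trace is checked against the ranges.

```python
import bisect

class EvilRanges:
    """Inclusive, non-overlapping address ranges that held Evil Input."""

    def __init__(self, ranges):
        self.ranges = sorted(ranges)
        self.starts = [lo for lo, _ in self.ranges]

    def contains(self, addr):
        # Find the last range starting at or before addr, then check its end.
        i = bisect.bisect_right(self.starts, addr) - 1
        return i >= 0 and addr <= self.ranges[i][1]

evil = EvilRanges([(0xABCD, 0xACCD), (0xDCBA, 0xDCCA)])
for addr in (0xABCD, 0xACCE, 0xDCC0, 0x1234):
    print(hex(addr), evil.contains(addr))
```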
The Solution (7) • Considerations • Tracing is very time-consuming • For the bug I’ll analyze as an example, it takes about 2 hours to dump the 650,000+ instructions it executes • The analysis... not so much • 1 to 2 minutes • May sound like a lot, but how long would it take to do it manually? • Plus, you can always use this time to do something else while the computer works for you
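Back-of-the-envelope figures derived only from the numbers on this slide (treating 650,000+ as 650,000 and “about 2 hours” as 7,200 s): the tracer processes roughly 90 instructions per second, and tracing takes about 60 to 120 times longer than the analysis.

```latex
\[
r_{\text{trace}} \approx \frac{650\,000\ \text{instructions}}{7\,200\ \text{s}} \approx 90\ \text{instructions/s}
\]
\[
\frac{t_{\text{trace}}}{t_{\text{analysis}}} \approx \frac{7\,200\ \text{s}}{60\text{--}120\ \text{s}} \approx 60\text{--}120
\]
```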
Demo • Introducing... Visual Data Tracer!
Solution Details • The VDT Tracer is implemented as a WinDbg extension • Because WinDbg is free and it’s a great debugger • The VDT Analyzer is a stand-alone C++ app • The tracer needs to understand some simple instruction “semantics” • E.g.: The source and destination operands • Currently only the basic x86 subset is implemented (no x87, MMX, etc.)
Solution Details (2) • The semantic rules are simplified to avoid dumping useless info to the trace file • E.g.: a ‘push’ does not meaningfully change ‘esp’ (same for ‘inc’, ‘dec’, and their destination ops) • They are also written to fit the very simplistic format of the trace file entries • All of this makes the analysis easier, thus faster, and yet useful
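A minimal sketch of such simplified rules, assuming operands arrive as already-decoded strings. The function name and its handful of rules are hypothetical; they only mirror the examples on this slide: `push` records the stored value but not the implicit `esp` update, and `inc`/`dec` produce no entry because they do not change whether their operand is tainted.

```python
def simplify(mnemonic, operands):
    """Map a decoded instruction to a (dst, src) pair worth dumping, or None.

    Hypothetical rules covering only a few mnemonics for illustration.
    """
    if mnemonic == "push":
        # The pushed value lands at the new top of stack; the esp change
        # itself is deliberately not recorded.
        return ("[esp]", operands[0])
    if mnemonic in ("inc", "dec"):
        # A tainted operand stays tainted and a clean one stays clean,
        # so an entry would add nothing to the analysis.
        return None
    if mnemonic == "mov":
        return (operands[0], operands[1])
    raise NotImplementedError(f"no semantic rule for {mnemonic!r}")

for insn in [("push", ["ebx"]), ("inc", ["ecx"]), ("mov", ["eax", "[esi]"])]:
    print(insn[0], "->", simplify(*insn))
```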
Solution Details (3) • Trace file entry: • Mnemonic • Destination operand • Source operand • Up to three source operand “dependences” • Dependences are, for example, the elements of an indirectly addressed memory operand • This effectively exposes the dataflow relations as a tree (rooted at the crash instruction) • Performing the backwards taint analysis then becomes a matter of searching the tree, which VDT does with a BFS algorithm
Solution Details (4) • Putting it together so far
mov edi, 0x1234        ; dst=edi, src=0x1234
mov eax, [0xABCD]      ; dst=eax, src=ptr 0xABCD ; Note: 0xABCD is an evil addr
lea ebx, [eax+ecx*8]   ; dst=ebx, src=eax, srcdep1=ecx
mov [edi], ebx         ; dst=ptr 0x1234, src=ebx
mov esi, [edi]         ; dst=esi, src=ptr 0x1234, srcdep1=edi
mov edx, [esi]         ; Crash!!!
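A minimal sketch of a backwards BFS over the example trace above. The `Entry` fields follow the slide's dst/src/srcdep layout, but the class, the location strings (`ptr 0xABCD` for memory) and the function names are hypothetical. The source and its dependences are treated as alternatives: reaching the evil input through any of them marks the queried location as tainted. A location with no definition inside the trace counts as evil only if it was declared as Evil Input.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Entry:
    mnemonic: str
    dst: str
    src: str
    deps: list = field(default_factory=list)  # srcdep1..srcdep3

def last_def(trace, loc, before):
    """Index of the latest entry before `before` that writes `loc`, or None."""
    for i in range(before - 1, -1, -1):
        if trace[i].dst == loc:
            return i
    return None

def tainted_by(trace, loc, evil):
    """Breadth-first search backwards from `loc` at the end of the trace."""
    queue = deque([(loc, len(trace))])
    seen = set()
    while queue:
        cur, pos = queue.popleft()
        if (cur, pos) in seen:
            continue
        seen.add((cur, pos))
        i = last_def(trace, cur, pos)
        if i is None:
            # Value predates the trace: tainted only if declared as evil input.
            if cur in evil:
                return True
            continue
        entry = trace[i]
        for nxt in [entry.src] + entry.deps:
            queue.append((nxt, i))  # follow each operand as it was at entry i
    return False

# The trace from the slide, up to (not including) the crashing mov edx, [esi].
trace = [
    Entry("mov", "edi", "0x1234"),
    Entry("mov", "eax", "ptr 0xABCD"),
    Entry("lea", "ebx", "eax", ["ecx"]),
    Entry("mov", "ptr 0x1234", "ebx"),
    Entry("mov", "esi", "ptr 0x1234", ["edi"]),
]
evil = {"ptr 0xABCD"}
print(tainted_by(trace, "esi", evil))  # True
print(tainted_by(trace, "edi", evil))  # False
```

The linear `last_def` scan keeps the sketch short; over a trace of hundreds of thousands of entries, an index from each location to its definition sites would be the obvious refinement.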
Solution Details (5) • Simplifying semantic rules to fit that format is not always easy • CMPXCHG r/m32, r32 • “Compare EAX with r/m32. If equal, ZF is set and r32 is loaded into r/m32. Else, clear ZF and load r/m32 into EAX.” • The aftermath: the need for “conditional taints” • i.e., one way of controlling ‘r/m32’ is controlling ‘r32’ AND ‘eax’ • Note that “alternative taints” also exist, implemented in the form of srcdep{1,2,3}
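One possible encoding of conditional taints, assuming a rule is written as a list of alternatives (OR), each a set of locations that must all be controlled (AND). The function names are hypothetical, and the `eax` rule is derived from the quoted description (in both outcomes EAX ends up equal to the old r/m32) rather than taken from VDT.

```python
def cmpxchg_rules(rm, r):
    """Taint rules for CMPXCHG rm, r as alternatives of required locations.

    Each destination maps to a list of alternatives (OR); every location in
    an alternative must be attacker-controlled (AND): a "conditional taint".
    Flags (ZF) are ignored in this sketch.
    """
    return {
        # Equal case: r is stored into rm, which requires eax == rm.
        # Not-equal case: rm keeps its old value.
        rm: [{r, "eax"}, {rm}],
        # Either way eax ends up holding the old value of rm.
        "eax": [{rm}],
    }

def controlled(alternatives, attacker):
    """True if some alternative is fully attacker-controlled."""
    return any(clause <= attacker for clause in alternatives)

rules = cmpxchg_rules("ptr 0x1000", "ecx")
print(controlled(rules["ptr 0x1000"], {"ecx"}))         # False: eax not controlled
print(controlled(rules["ptr 0x1000"], {"ecx", "eax"}))  # True
```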
Solution Details (6) • Other subtleties to watch for • AH defines EAX • EAX defines AL • AL does not define AH • Similar problem for 1-byte and 2-byte memory accesses
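A minimal sketch of one way to get these rules right: model each register name as a byte range inside EAX and treat a write as relevant to a query when the ranges overlap. The table and function names are assumptions; the same overlap test covers 1-byte and 2-byte memory accesses when locations are (address, size) pairs.

```python
# Byte ranges [lo, hi) occupied inside the 32-bit EAX register.
SUBREGS = {"eax": (0, 4), "ax": (0, 2), "al": (0, 1), "ah": (1, 2)}

def overlaps(a, b):
    """Half-open byte ranges a and b share at least one byte."""
    return a[0] < b[1] and b[0] < a[1]

def defines(written, queried):
    """True if writing `written` can change the value read through `queried`."""
    return overlaps(SUBREGS[written], SUBREGS[queried])

def mem_defines(w_addr, w_size, q_addr, q_size):
    """The same test for memory accesses of different widths."""
    return overlaps((w_addr, w_addr + w_size), (q_addr, q_addr + q_size))

print(defines("ah", "eax"), defines("eax", "al"), defines("al", "ah"))  # True True False
```

A partial write does not end the backwards search: a write to AH only accounts for byte 1 of EAX, so earlier writes to the remaining bytes still have to be followed.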
What’s Next? • Extending the coverage of x86 • Enhancing speed • God knows how... • Heuristically detecting user input • E.g., by making the tracer understand CreateFile() • Automatic exploit generation • What else? • Any ideas, let me know...
References • SpiderPig Project - http://piotrbania.com/all/spiderpig/ • Very similar ideas, different approach • !exploitable - http://www.codeplex.com/msecdbg • A more superficial (but much faster) tool for bug triaging • If you have many bugs to triage, you can first run !exploitable on them and, then, use VDT on those that seem really interesting
Greetings • Julien Vanegue • For all the lecturing, motivating and supporting • Piotr Bania • For discussing DDF analysis and much more • People from PSV (http://www.unprotectedhex.com/psv) • For letting me idle on IRC, leeching their knowledge • Everyone else who talks to me about security and similarly cool stuff