Practical (Introduction to) Reverse Engineering Julio Auto <julio . auto *a* gmail>
Agenda • Part I - 101 • Why this presentation? (I mean... WHY?!?!) • A few concepts (Mumble jumble++) • Demo (Show me the goods) • Part II - 1337 • Advancing RE (Do your own!) • Something extra (Finish pretty) • Linkz, lulz, refz, and shoutz • Q & (maybe) A
Why? • Initially suggested by the H2HC crew • Based on my article ‘Cracking CrackMes’, published earlier this year while working for my previous employer, Scanit ME • RE is getting lots of attention, and many people seem interested in learning it • Still, it remains largely a black art
Why? (2) • It seems, then, that moving up from ground zero is the most problematic step • This presentation tries to help fix it • It aims to expose instantly useful knowledge • And pointers to where to go digging deeper • Instead of advanced research _results_, basic _techniques_ and _processes_ • Note: We’ll be targeting the Windows platform most of the time in this talk
Concepts • Reverse Engineering is a very self-explanatory term • You take something and, from there, try to learn how (some aspect of) it was engineered • It’s also obviously broad • For example, it’s often used to describe the process through which you generate a higher-level, architectural view of a piece of software given its source code
My Own Concept • Think of the times you asked yourself “why” and “how” and let it go without an answer... • ... • ... • ... RE is not letting go
A Few Applications • Malware Analysis • Vulnerability Analysis • Security Assessment of 3rd-party COTS • Evaluation/Breaking of copy-protection schemes • Assorted how’s and why’s
Why Still a Black Art? • Perhaps because people think it’s only good for SW cracking • Perhaps because DRM has become a nightmare no one is happy with and related laws everywhere bash reversers too hard every now and then (does anybody remember Dmitry Sklyarov, the DMCA and all that madness?) • Perhaps because many people still think it should be illegal (wtf?!)
How To Learn • The Crack-Me approach • The one I illustrate in the paper I mentioned • Small and targeted challenges with different levels and obstacles to choose from • The real life approach • Choose a real-world problem and attack it • Tough but rewarding • We’ll demo a bit of both
Tools of The Trade • Probably millions of tools that can give you some useful piece of info about your target • I’ll try to restrict myself to the most relevant/common, then • Unfortunately, many of the best tools are commercial • On the other hand, many of them have free/student/evaluation versions • For the rest... Well, remember “the real life approach”? ;)
Debuggers • Obvious importance • Fairly good variety • It’s nice to play and know your way with all of them • But mastering them all is quite hard, so you’ll most likely elect your debugger of choice in little time • Choose your debugger well!
Debuggers (2) • WinDbg • My personal choice of debugger • Developed by MSFT • Comes for free in the “Debugging Tools for Windows” package • Amazingly rich in features • Extensible with some C++ programming • Not the easiest or simplest dev environment • Very rich API, though • Poor interface
Debuggers (3) • Visual Studio Debugger • It’s crap, not suited for reversing • But it’s pretty and nice for developers :> • Seriously, don’t try to go very far reversing with it • It may use up the rest of your sanity
Debuggers (4) • OllyDbg • Enjoys quite a lot of popularity in the reversing community • Nice interface • In particular, a nice disassembly view • Comes in a few “tuned” versions, one of the most popular being...
Debuggers (5) • Immunity Debugger • Developed by Immunity Inc. (one of uCon’s proud sponsors) • Extends OllyDbg with a python interpreter and exposes a couple of debugging modules for the user to interact with • Very neat plugin support • Embeds a command-line with windbg-aliased commands • Maintains a forum to support developers/users of ImmDbg plugins
Debuggers (6) • gdb • The standard debugger on *NIX systems • Quite complete debugger • Not the best thing in the RE world, but overall a good debugger
Disassemblers • Reading assembly is not the sweetest thing for most people • The way the code is represented is extremely important and makes an increasingly great difference in big RCE tasks • Therefore, being comfortable with your disassembler is essential
Disassemblers (2) • Pretty much every debugger is capable of disassembling • Apart from that, there are lots of other tools that can do it too • On Linux, objdump is pretty much a standard tool • However, one particular tool is especially known for its disassembly features
Disassemblers (3) • IDA Pro • Supports many binary formats and architectures • Displays the code in graphs, which greatly enhances visualization • Block-level CFGs • Many things can be customized/adjusted • Graph layout, data types, annotations... • Quite frankly, it’s in every reverser’s toolkit • IDA Pro is a commercial tool currently in version 5.4 • But version 4.9 is available in a free edition
System Monitoring Tools • All of those from the SysInternals Suite • Process Explorer • RegMon • FileMon • TCPView • Etc...
Advanced Tools • Binary Diff’ers • BinDiff • Decompilers • Hex-Rays • RE Frameworks • ERESI ;) • PaiMei and all the PyThings
Demo • We’ll try and beat a crack-me challenge • This crack-me was taken from a real competition • HITB Dubai 2007 CTF • Perhaps it can serve as a tip for uCon’s CTF as well
RE – Advanced Topics • Cutting to the chase, advancing RE basically means automating stuff • Many of the RE tools are scriptable/programmable/extensible • Developing smart ways to deal with repetitive tasks is the way to more effective analyses
RE – Advanced Topics (2) • Less often, you might see opportunities to advance RE in ways not based on automation • Defeating a new anti-debug trick • Developing new environments for RE • Virtualization, Sandboxing... • Or even radically changing paradigms • E.g. The graph-based approach to binary navigation
RE – Advanced Topics (3) • Perhaps the most important lesson here is not to reinvent the wheel • Re-use the tools you have! • You’ll be amazed at how much stuff you can do by “gluing” pieces together • Having said that... • Perhaps the tools you have are not perfect • Or you might wanna re-do something just for learning • But be sure to have the right goals in mind!
Teaching By Example • I will demonstrate how you can use advanced RE to solve real life problems • The main idea behind the “re-use” thing I mentioned in the previous slide is to keep your solution simple, by focusing on the logic itself rather than on the engineering • Unfortunately, what I’m about to show is actually a bad example in this aspect (more on this later)
Problem • Suppose you have ways to reproduce a high-profile, possibly exploitable bug – Yay! • BUT.... • The target is closed-source software • The target is as large and complex as an operating system – and way less documented • The input is huge and has a complex, possibly undisclosed format • The source of the bug can be anywhere in the input • From user-input to actual bug/crash, about 3 million instructions are executed
Introducing LEP • LEP tries to answer a big question in this problem: • What exact part of this input is causing the bug? • If you can answer this question and somehow correlate it with the input format, you may gain a great deal of understanding of the bug • For this, I have invented a new technique: “Staged Partial Tracing-Based Backwards Taint Analysis” • Because not sounding like a Ph.D. is so 2001 :> • And also because we all just love new terms we can go media-cuckoo about
Introducing LEP (2) • One-liner idea: If we know when our input is brought to memory and know where it’s mapped, we can trace the program from this point to the crash and then go backwards analyzing the dataflow to find out where the faulting data came from • We do it in two stages, with a component for each: the tracer and the analyzer • Simple, huh?
Fundamental Concepts • When we trace the program, it becomes “linear”, i.e. control-flow is irrelevant • Dataflow becomes concretely deterministic • Aliasing is not an issue (no need to theorize on side-effects) • All the info we need is available at runtime • In particular, effective addresses • If the input is as big as the problem states, it should be no problem to find it in memory • We get most of the info we need from the disassembly text (ASCII)! It’s like hacking with grep again!
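To make the “disassembly text” point concrete, here is a minimal Python sketch of pulling mnemonic, destination and source out of one disassembly line. The line layout (address, opcode bytes, mnemonic, comma-separated operands) and the name `parse_line` are assumptions made for illustration; this is not LEP’s actual parser, and prefixed instructions such as rep’ed ones are not handled.

```python
import re

# Assumed line shape (not taken from the slides):
#   00401000 8b0451          mov     eax,dword ptr [ecx+edx*2]
LINE_RE = re.compile(r"^([0-9a-fA-F`]+)\s+([0-9a-fA-F]+)\s+([a-z]\w*)\s*(.*)$")


def parse_line(line):
    """Split one disassembly line into address, mnemonic, dst and src."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # not an instruction line in the assumed layout
    addr, _opcode_bytes, mnemonic, rest = match.groups()
    operands = [op.strip() for op in rest.split(",")] if rest else []
    return {
        "addr": addr,
        "mnemonic": mnemonic,
        "dst": operands[0] if operands else None,
        "src": operands[1] if len(operands) > 1 else None,
    }


if __name__ == "__main__":
    print(parse_line("00401000 8b0451          mov     eax,dword ptr [ecx+edx*2]"))
```

Plain text matching keeps the tracer simple, at the price of trusting whatever the debugger prints.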
LEP Tracer • A WinDbg extension • Traces every instruction until the program raises an exception • Dumps the following instruction info to a file: • Mnemonic • Destination operand • Source operand • Dependencies of the source operand, e.g. ecx and edx in mov eax,[ecx+edx*2]
LEP Tracer (2) • Discards control-flow changing instructions • Discards in/out instructions (all relevant input should be in memory already?) • Discards other groups of instructions that will be supported as we go • FPU, MMX, SSE{2,3}, etc... • Tries to parse the right info even when the debugger is too stupid to work as expected • Why not compute effective addresses in rep’ed instructions?
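The “dependencies of the source operand” idea can be sketched in a few lines of Python: for a memory operand, the registers used to form the address are what the loaded value depends on. The function name and the register set (32-bit general-purpose registers only) are assumptions for illustration, not LEP’s code.

```python
import re

# Assumption: only the 32-bit general-purpose registers are recognized.
REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}


def source_dependencies(operand):
    """Return the registers used inside the [...] part of a memory operand."""
    match = re.search(r"\[([^\]]+)\]", operand)
    if match is None:
        return []  # plain register or immediate: no address computation
    tokens = re.split(r"[+\-*]", match.group(1))
    return [tok.strip() for tok in tokens if tok.strip() in REGISTERS]


if __name__ == "__main__":
    print(source_dependencies("dword ptr [ecx+edx*2]"))
```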
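A rough Python sketch of the discard step follows. The mnemonic sets are assumptions chosen for illustration and cover only control flow and port I/O; the FPU/MMX/SSE groups mentioned on the slide are left out, and the real tracer is a C++ WinDbg extension.

```python
# Assumed, non-exhaustive mnemonic sets.
CONTROL_FLOW = {"call", "ret", "retn", "retf", "iret", "iretd",
                "loop", "loope", "loopne"}
PORT_IO = {"in", "out", "ins", "insb", "insw", "insd",
           "outs", "outsb", "outsw", "outsd"}


def is_discarded(mnemonic):
    """True if the tracer sketch would skip this instruction."""
    m = mnemonic.lower()
    # x86 mnemonics starting with "j" are jumps (jmp, jz, jecxz, ...).
    return m.startswith("j") or m in CONTROL_FLOW or m in PORT_IO


if __name__ == "__main__":
    trace = ["mov", "jnz", "add", "call", "in", "inc"]
    print([m for m in trace if not is_discarded(m)])
```

Dropping these instructions is safe for this purpose because a recorded trace is already linear: the branch outcomes are baked into the order of the remaining instructions.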
LEP Analyzer • Reads the file generated by the tracer and goes bottom-up investigating the dataflow • You have to specify the piece of data that causes the last instruction to fail – usually (always?) a register • And the memory range(s) where your input was mapped into, at the time the trace was taken • Ignores register “slices” for simplicity • (al || ah) == ax == eax == rax
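Ignoring register “slices” amounts to mapping every sub-register name to one canonical name before comparing operands. A minimal Python sketch, assuming the widest (64-bit) name as the canonical form and covering only the eight legacy registers; the helper names are hypothetical:

```python
def _build_alias_table():
    """Map every slice of a legacy general-purpose register to its 64-bit name."""
    table = {}
    for letter in "abcd":
        full = "r" + letter + "x"
        for name in (letter + "l", letter + "h", letter + "x",
                     "e" + letter + "x", full):
            table[name] = full
    for base in ("si", "di", "bp", "sp"):
        full = "r" + base
        for name in (base + "l", base, "e" + base, full):
            table[name] = full
    return table


ALIASES = _build_alias_table()


def normalize_register(name):
    """Return the canonical register name, or the lowercased input if unknown."""
    lowered = name.lower()
    return ALIASES.get(lowered, lowered)


if __name__ == "__main__":
    print({r: normalize_register(r) for r in ("al", "ah", "ax", "eax", "rax")})
```

The simplification over-approximates: a write to al is treated as a definition of the whole register, which can cut a branch short or follow data that never reached the faulting bytes.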
LEP Analyzer (2) • When the source operand of a given instruction is an immediate/constant, LEP tries its best to evaluate whether it _transforms_ or _overwrites_ the destination • If it overwrites, we finish the analysis for this branch • mov eax, deadf0f0h • Else if it transforms, we keep looking for another def of the same destination operand • inc eax • This gives a very special meaning for LEP’s existence • Otherwise, searching for occurrences of the faulting data inside the input could be just as effective • LEP also tries to identify non-obvious constant overwrites • xor eax, eax
Engineering Tech-Talk • LEP was intended to be written entirely in Python • Didn’t work for performance reasons • LEP Tracer is written in C++, since it’s a WinDbg extension • It makes use of a reference of the x86 instruction set written in XML by MazeGen • The XML is mapped to C++ using CodeSynthesis’ XSD XML Data Binding • LEP Analyzer was first written in Python • Then I also re-wrote it in C++ • LEP Analyzer’s search algorithm was initially a DFS • Then I implemented it as a BFS
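The overwrite/transform decision can be sketched as a small Python classifier. The rules below are assumptions covering only the three examples on the slide plus a few obvious relatives (sub reg,reg; inc/dec/neg/not); the real analyzer presumably knows many more cases, and the label names are hypothetical.

```python
import re

# Hex literals as the slide writes them (deadf0f0h), 0x-prefixed, or decimal.
IMMEDIATE_RE = re.compile(r"^(?:0x[0-9a-f]+|[0-9a-f]+h|[0-9]+)$")
# ah/bh/ch/dh look like hex literals but are 8-bit registers.
HIGH_BYTE_REGISTERS = {"ah", "bh", "ch", "dh"}
UNARY_TRANSFORMS = {"inc", "dec", "neg", "not"}


def is_immediate(operand):
    op = operand.lower()
    return op not in HIGH_BYTE_REGISTERS and IMMEDIATE_RE.match(op) is not None


def classify(mnemonic, dst, src=None):
    """Label an instruction's effect on dst for the backward analysis."""
    if src is None:
        return "transform" if mnemonic in UNARY_TRANSFORMS else "unknown"
    if mnemonic == "mov" and is_immediate(src):
        return "overwrite"   # constant kills the dataflow: stop this branch
    if mnemonic in ("xor", "sub") and src == dst:
        return "overwrite"   # non-obvious constant: result is always zero
    if is_immediate(src):
        return "transform"   # dst changes but still derives from its old value
    return "propagate"       # value (also) comes from src: follow src


if __name__ == "__main__":
    for insn in (("mov", "eax", "deadf0f0h"), ("inc", "eax"), ("xor", "eax", "eax")):
        print(insn, "->", classify(*insn))
```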
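For illustration, here is a hedged Python sketch of a breadth-first, bottom-up search over a recorded trace. The trace record layout (`dst` plus a list of `srcs`, with memory locations as integer addresses and an empty `srcs` list meaning a constant overwrite), the function name, and the assumption that the trace ends just before the faulting instruction are all invented for this example; it is not LEP’s implementation.

```python
from collections import deque


def find_input_origins(trace, faulting_loc, input_ranges):
    """Return input addresses that the faulting location's value derives from."""
    def in_input(loc):
        return isinstance(loc, int) and any(lo <= loc < hi for lo, hi in input_ranges)

    origins = []
    seen = set()
    queue = deque([(faulting_loc, len(trace) - 1)])
    while queue:
        loc, idx = queue.popleft()          # FIFO order makes this a BFS
        if (loc, idx) in seen:
            continue
        seen.add((loc, idx))
        # Find the most recent definition of loc at or before idx.
        while idx >= 0 and trace[idx]["dst"] != loc:
            idx -= 1
        if idx < 0:
            # Never written during the trace: the value predates it.
            if in_input(loc) and loc not in origins:
                origins.append(loc)
            continue
        # An empty srcs list (constant overwrite) ends this branch here.
        for src in trace[idx]["srcs"]:
            queue.append((src, idx - 1))
    return origins


if __name__ == "__main__":
    trace = [
        {"dst": "rbx", "srcs": [0x10000010]},    # mov ebx,[input+10h]
        {"dst": "rcx", "srcs": []},              # mov ecx,4
        {"dst": "rbx", "srcs": ["rbx", "rcx"]},  # add ebx,ecx
        {"dst": "rax", "srcs": ["rbx"]},         # mov eax,ebx
    ]
    hits = find_input_origins(trace, "rax", [(0x10000000, 0x10001000)])
    print([hex(h) for h in hits])
```

Swapping `popleft()` for `pop()` turns the same loop into a depth-first search, which is the kind of change the DFS-to-BFS bullet describes.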
Demo II • Placeholder slide :>
Linkz & Refz • Cracking CrackMes • http://www.scanit.net/rd/wp/wp04 • X86 Opcode and Instruction Reference, by MazeGen • http://ref.x86asm.net/ • CodeSynthesis XSD – XML Data Binding for C++ • http://www.codesynthesis.com/products/xsd/ • Thousands of elite RE projects • http://www.google.com • Seriously though, contact me if you can’t find anything
Greetz & Shoutz • Filipe Balestra for lending me the bug used in the 2nd demo • H2HC crew for inspiring me to do this work • uCon Crew for having the elitest con ever • Everybody in the room for coming • The ERESI team, with whom I have most of my discussions about RE, program analysis, etc • All of the great people that I know from the security scene • It’s simply impossible to mention each and every one of you, but you know who you are!