
Practical Introduction to Reverse Engineering



Presentation Transcript


    1. Julio Auto <julio . auto *a* gmail> Practical (Introduction to) Reverse Engineering

    2. Agenda Part I - 101 Why this presentation? (I mean... WHY?!?!) A few concepts (Mumble jumble++) Demo (Show me the goods) Part II - 1337 Advancing RE (Do your own!) Something extra (Finish pretty) Linkz, lulz, refz, and shoutz Q & (maybe) A

    3. Why? Initially suggested by the H2HC crew Based on my article Cracking CrackMes, published earlier this year while working for my previous employer, Scanit ME RE is getting lots of attention, and many people seem interested in learning it Still, it remains largely a black art

    4. Why? (2) It seems, then, that moving up from ground zero is the most problematic step This presentation tries to help fix it It aims to expose instantly useful knowledge And pointers to where to go digging deeper Instead of advanced research _results_, basic _techniques_ and _processes_ Note: We'll be targeting the Windows platform most of the time in this talk

    5. Concepts Reverse Engineering is a very self-explanatory term You take something and, from there, try to learn how (some aspect of) it was engineered It's also obviously broad For example, it's often used to describe the process through which you generate a higher-level, architectural view of a piece of software given its source code

    6. My Own Concept Think of the times you asked yourself why and how and let it go without an answer... ... ... ... RE is not letting go

    7. A Few Applications Malware Analysis Vulnerability Analysis Security Assessment of 3rd-party COTS Evaluation/Breaking of copy-protection schemes Assorted hows and whys

    8. Why Still a Black Art? Perhaps because people think it's only good for SW cracking? Perhaps because DRM has become a nightmare no one is happy with and related laws everywhere bash reversers too hard every now and then? (does anybody remember Dmitry Sklyarov, the DMCA and all that madness?) Perhaps because many people still think it should be illegal? (wtf?!)

    9. How To Learn The Crack-Me approach The one I illustrate in the paper I mentioned Small and targeted challenges with different levels and obstacles to choose from The real life approach Choose a real-world problem and attack it Tough but rewarding We'll demo a bit of both

    10. Tools of The Trade Probably millions of tools that can give you some useful piece of info about your target I'll try to restrict myself to the most relevant/common, then Unfortunately, many of the best tools are commercial On the other hand, many of them have free/student/evaluation versions For the rest... Well, remember the real life approach? ;)

    11. Debuggers Obvious importance Fairly good variety It's nice to play and know your way with all of them But mastering them all is quite hard, so you'll most likely elect your debugger of choice in little time Choose your debugger well!

    12. Debuggers (2) WinDbg My personal choice of debugger Developed by MSFT Comes for free in the Debugging Tools for Windows package Amazingly rich in features Extensible with some C++ programming Not the easiest or simplest dev environment Very rich API, though Poor interface

    13. Debuggers (3) Visual Studio Debugger It's crap, not suited for reversing But it's pretty and nice for developers :> Seriously, don't try to go very far reversing with it It may use up the rest of your sanity

    14. Debuggers (4) OllyDbg Enjoys quite a lot of popularity in the reversing community Nice interface In particular, a nice disassembly view Comes in a few tuned versions, one of the most popular being...

    15. Debuggers (5) Immunity Debugger Developed by Immunity Inc. (one of uCon's proud sponsors) Extends OllyDbg with a Python interpreter and exposes a couple of debugging modules for the user to interact with Very neat plugin support Embeds a command-line with WinDbg-aliased commands Maintains a forum to support developers/users of ImmDbg plugins

    16. Debuggers (6) gdb The standard debugger on *NIX systems Quite complete debugger Not the best thing in the RE world, but overall a good debugger

    17. Disassemblers Reading assembly is not the sweetest thing for most people The way the code is represented is extremely important and makes an increasingly great difference in big RCE tasks Therefore, being comfortable with your disassembler is essential

    18. Disassemblers (2) Pretty much every debugger is capable of disassembling Apart from that, there are lots of other tools that can do it too On Linux, objdump is pretty much a standard tool However, one particular tool is especially known for its disassembly features

    19. Disassemblers (3) IDA Pro Supports many binary formats and architectures Displays the code in graphs, which greatly enhance the visualization Block-level CFGs Many things can be customized/adjusted Graph layout, data types, annotations... Quite frankly, it's in every reverser's toolkit IDA Pro is a commercial tool currently in version 5.4 But version 4.9 is available in a free edition

    20. System Monitoring Tools All of those from the SysInternals Suite Process Explorer RegMon FileMon TCPView Etc...

    21. Advanced Tools Binary Differs BinDiff Decompilers Hex-Rays RE Frameworks ERESI ;) PaiMei and all the PyThings

    22. Demo We'll try and beat a crack-me challenge This crack-me was taken from a real competition HITB Dubai 2007 CTF Perhaps it can serve as a tip for uCon's CTF as well?

    23. RE Advanced Topics Cutting to the chase, advancing RE basically means automating stuff Many of the RE tools are scriptable/programmable/extensible Developing smart ways to deal with repetitive tasks is the way for more effective analyses

    24. RE Advanced Topics (2) Less often, you might see opportunities to advance RE in ways not based on automation Defeating a new anti-debug trick Developing new environments for RE Virtualization, Sandboxing... Or even radically changing paradigms E.g. The graph-based approach to binary navigation

    25. RE Advanced Topics (3) Perhaps the most important lesson here is not to reinvent the wheel Re-use the tools you have! You'll be amazed at how much stuff you can do by gluing pieces together Having said that... Perhaps the tools you have are not perfect Or you might wanna re-do something just for learning But be sure to have the right goals in mind!

    26. Teaching By Example I will demonstrate how you can use advanced RE to solve real life problems The main idea behind the re-use thing I mentioned in the previous slide is to keep your solution simple, by focusing on the logic itself rather than on the engineering Unfortunately, what I'm about to show is actually a bad example in this aspect (more on this later)

    27. Problem Suppose you have ways to reproduce a high-profile, possibly exploitable bug Yay! BUT.... The target is closed-source software The target is as large and complex as an operating system and way less documented The input is huge and has a complex, possibly undisclosed format The source of the bug can be anywhere in the input From user-input to actual bug/crash, about 3 million instructions happen

    28. WHAT DO YOU DO????

    29. Introducing LEP LEP tries to answer a big question in this problem: What exact part of this input is causing the bug? If you can answer this question and somehow correlate this with the input format, you may gain a great deal of understanding of the bug For this, I have invented a new technique: Staged Partial Tracing-Based Backwards Taint Analysis Because not sounding like a Ph.D. is so 2001 :> And also because we all just love new terms we can go media-cuckoo about

    30. Introducing LEP (2) One-liner idea: If we know when our input is brought to memory and know where it's mapped, we can trace the program from this point to the crash and then go backwards analyzing the dataflow to find out where the faulting data came from We do it in two stages, with a component for each: the tracer and the analyzer Simple, huh?

    31. Fundamental Concepts When we trace the program, it becomes linear, i.e. control-flow is irrelevant Dataflow becomes concretely deterministic Aliasing is not an issue (no need to theorize on side-effects) All info we need is available at runtime In particular, effective addresses If the input is as big as the problem states, it should be no problem to find it in memory We get most of the info we need from the disassembly text (ASCII)! It's like hacking with grep again!

    32. LEP Tracer A WinDbg extension Traces every instruction until the program raises an exception Dumps the following instruction info to a file: Mnemonic Destination operand Source operand Dependences of the source op e.g. mov eax,[ecx+edx*2]
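
The record dumped per instruction can be sketched from the disassembly text alone. The following is only an illustration, not LEP's actual code (the tracer is a C++ WinDbg extension): the function name `parse_instruction`, the record layout, and the small register list are all assumptions made for this example.

```python
import re

# Hypothetical, deliberately small register list (32-bit general-purpose only).
REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}


def parse_instruction(text):
    """Split 'mnemonic dest,src' into a record with the source's dependences.

    Assumes Intel syntax with at most two operands, as in the slide's example.
    """
    mnemonic, _, operands = text.strip().partition(" ")
    parts = [p.strip() for p in operands.split(",")] if operands.strip() else []
    dest = parts[0] if parts else None
    src = parts[1] if len(parts) > 1 else None
    deps = []
    if src is not None:
        # Every register named inside the source operand is a dependence,
        # e.g. ecx and edx in [ecx+edx*2].
        deps = [tok for tok in re.findall(r"[a-z]+", src.lower())
                if tok in REGISTERS]
    return {"mnemonic": mnemonic.lower(), "dest": dest, "src": src, "deps": deps}


record = parse_instruction("mov eax,[ecx+edx*2]")
print(record)
```

For the slide's example the record holds the mnemonic `mov`, the destination `eax`, the source `[ecx+edx*2]`, and the dependences `ecx` and `edx`. A real tracer would additionally store the effective address the debugger computed for the memory operand.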

    33. LEP Tracer (2) Discards control-flow changing instructions Discards in/out instructions (all relevant input should be in memory already?) Discards other groups of instructions that will be supported as we go FPU, MMX, SSE{2,3}, etc... Tries to parse the right info even when the debugger is too stupid to work as expected Why not compute effective addresses in rep-prefixed instructions?
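
The discarding step amounts to a filter on the mnemonic. A minimal sketch, assuming the trace is handled as lines of disassembly text; the mnemonic sets below are partial and hypothetical, and the FPU check is a crude heuristic rather than anything taken from LEP.

```python
# Hypothetical, partial mnemonic groups; a real tool would derive these from
# an instruction-set reference instead of hard-coding them.
CONTROL_FLOW = {"jmp", "call", "ret", "je", "jne", "jz", "jnz", "ja", "jb", "loop"}
PORT_IO = {"in", "out", "ins", "outs"}


def keep_for_trace(line):
    """Return True if the disassembled instruction should be recorded."""
    fields = line.split(None, 1)
    if not fields:
        return False
    mnemonic = fields[0].lower()
    if mnemonic in CONTROL_FLOW or mnemonic in PORT_IO:
        return False
    # Crude heuristic: x87 FPU mnemonics start with 'f' (fld, fadd, ...).
    if mnemonic.startswith("f"):
        return False
    return True


lines = ["mov eax,[ecx+edx*2]", "jne 00401000", "fld st(0)", "in al,dx", "inc eax"]
print([l for l in lines if keep_for_trace(l)])
```

Dropping control-flow instructions is safe here because the trace is already a linear record of what executed; the branches only decided which instructions appear in it.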

    34. LEP Analyzer Reads the file generated by the tracer and goes bottom-up investigating the dataflow You have to specify the piece of data that causes the last instruction to fail, usually (always?) a register And the memory range(s) where your input was mapped into, at the time the trace was taken Ignores register slices for simplicity (al || ah) == ax == eax == rax
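
The bottom-up walk can be sketched as a single backward pass over the trace. This is a simplified illustration under stated assumptions, not LEP's implementation: the trace is modeled as `(dest, sources)` pairs with memory operands already resolved to integer effective addresses, read-modify-write instructions list their destination among the sources, and the names `canon` and `backward_taint` are invented for the example.

```python
# Collapse register slices into one family, as the slide suggests:
# (al || ah) == ax == eax == rax. Only four families are listed here.
REG_FAMILY = {}
for family, names in {
    "rax": ("al", "ah", "ax", "eax", "rax"),
    "rbx": ("bl", "bh", "bx", "ebx", "rbx"),
    "rcx": ("cl", "ch", "cx", "ecx", "rcx"),
    "rdx": ("dl", "dh", "dx", "edx", "rdx"),
}.items():
    for name in names:
        REG_FAMILY[name] = family


def canon(loc):
    """Registers map to their family; integer addresses pass through."""
    return REG_FAMILY.get(loc, loc) if isinstance(loc, str) else loc


def backward_taint(trace, faulting, input_ranges):
    """Walk the trace bottom-up and return the input addresses that the
    faulting location depends on.

    trace        -- list of (dest, sources) in execution order
    faulting     -- register (or address) blamed for the crash
    input_ranges -- list of (start, end) half-open ranges holding the input
    """
    wanted = {canon(faulting)}
    hits = set()
    for dest, sources in reversed(trace):
        d = canon(dest)
        if d not in wanted:
            continue
        wanted.discard(d)  # this instruction defines d; follow its sources
        for s in map(canon, sources):
            if isinstance(s, int) and any(lo <= s < hi for lo, hi in input_ranges):
                hits.add(s)    # reached a byte of the mapped input
            else:
                wanted.add(s)  # keep looking for an earlier definition
    return hits


trace = [
    ("ecx", [0x1000]),        # mov ecx,[00001000]  (inside the input)
    (0x2000, ["cx"]),         # mov [00002000],cx
    ("eax", [0x2000]),        # mov eax,[00002000]
    ("eax", ["eax", "edx"]),  # add eax,edx
]
print(backward_taint(trace, "eax", [(0x1000, 0x1100)]))
```

Because the trace is linear, a location's most recent definition is simply the nearest earlier line that writes it, which is why no alias analysis is needed.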

    35. LEP Analyzer (2) When the source operand of a given instruction is an immediate/constant, LEP tries its best to evaluate whether it _transforms_ or _overwrites_ the destination If it overwrites, we finish the analysis for this branch mov eax, deadf0f0h Else if it transforms, we keep looking for another def of the same destination operand inc eax This gives a very special meaning for LEP's existence Otherwise, searching for occurrences of the faulting data inside the input could be just as effective LEP also tries to identify non-obvious constant overwrites xor eax, eax
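
The overwrite/transform decision can be illustrated with a small classifier over the three examples on this slide. It is a sketch under assumptions: the function names, the returned labels, and the short mnemonic lists are hypothetical, and LEP's real heuristics are not described in enough detail here to reproduce.

```python
import re

# ah/bh/ch/dh would otherwise look like hex immediates with an 'h' suffix.
HIGH_BYTE_REGS = {"ah", "bh", "ch", "dh"}


def is_immediate(operand):
    """True for constants such as deadf0f0h, 0x10 or 42."""
    op = operand.lower()
    if op in HIGH_BYTE_REGS:
        return False
    return re.fullmatch(r"[0-9a-f]+h|0x[0-9a-f]+|[0-9]+", op) is not None


def classify(mnemonic, dest, src=None):
    """Label an instruction as 'overwrite', 'transform' or 'dataflow'."""
    m = mnemonic.lower()
    # Non-obvious constant overwrite: xor/sub of a register with itself
    # always yields zero, whatever the register held before.
    if m in ("xor", "sub") and src is not None and src.lower() == dest.lower():
        return "overwrite"
    # A constant moved in: the old value is gone, this branch ends here.
    if m == "mov" and src is not None and is_immediate(src):
        return "overwrite"
    # The old value survives in modified form: keep looking for its def.
    if m in ("inc", "dec", "neg", "not") or (src is not None and is_immediate(src)):
        return "transform"
    # Anything else moves data between locations and is followed normally.
    return "dataflow"


for insn in [("mov", "eax", "deadf0f0h"), ("inc", "eax"), ("xor", "eax", "eax")]:
    print(insn, "->", classify(*insn))
```

Other zeroing idioms (for example `and eax, 0`) are left out to keep the sketch short; they would be added as further overwrite cases.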

    36. Engineering Tech-Talk LEP was intended to be written entirely in Python Didn't work for performance reasons LEP Tracer is written in C++, since it's a WinDbg extension It makes use of a reference of the x86 instruction set written in XML by MazeGen The XML is mapped to C++ using CodeSynthesis XSD XML Data Binding LEP Analyzer was first written in Python Then I also re-wrote it in C++ LEP Analyzer's search algorithm was initially a DFS Then I implemented it as a BFS

    37. Demo II Placeholder slide :>

    38. Linkz & Refz Cracking CrackMes http://www.scanit.net/rd/wp/wp04 X86 Opcode and Instruction Reference, by MazeGen http://ref.x86asm.net/ CodeSynthesis XSD XML Data Binding for C++ http://www.codesynthesis.com/products/xsd/ Thousands of elite RE projects http://www.google.com Seriously though, contact me if you can't find anything

    39. Greetz & Shoutz Filipe Balestra for lending me the bug used in the 2nd demo H2HC crew for inspiring me to do this work uCon Crew for having the elitest con ever Everybody in the room for coming The ERESI team, with whom I have most of my discussions about RE, program analysis, etc All of the great people that I know from the security scene It's simply impossible to mention each and every one of you, but you know who you are!

    40. Questions?

    41. Julio Auto <julio . auto *a* gmail> Practical (Introduction to) Reverse Engineering
