1. Julio Auto
<julio . auto *a* gmail> Practical (Introduction to) Reverse Engineering
2. Agenda Part I - 101
Why this presentation? (I mean... WHY?!?!)
A few concepts (Mumble jumble++)
Demo (Show me the goods)
Part II - 1337
Advancing RE (Do your own!)
Something extra (Finish pretty)
Linkz, lulz, refz, and shoutz
Q & (maybe) A
3. Why? Initially suggested by the H2HC crew
Based on my article Cracking CrackMes, published earlier this year while working for my previous employer, Scanit ME
RE is getting lots of attention, and many people seem interested in learning it
Still, it remains largely a black art
4. Why? (2) It seems, then, that moving up from ground zero is the most problematic step
This presentation tries to help fix it
It aims to expose instant useful knowledge
And pointers to where to go digging deeper
Instead of advanced research _results_, basic _techniques_ and _processes_
Note: We'll be targeting the Windows platform most of the time in this talk
5. Concepts Reverse Engineering is a very self-explanatory term
You take something and, from there, try to learn how (some aspect of) it was engineered
It's also obviously broad
For example, it's often used to describe the process through which you generate a higher-level, architectural view of a piece of software given its source code
6. My Own Concept Think of the times you asked yourself why and how and let it go without an answer...
...
...
...
RE is not letting go
7. A Few Applications Malware Analysis
Vulnerability Analysis
Security Assessment of 3rd-party COTS
Evaluation/Breaking of copy-protection schemes
Assorted hows and whys
8. Why Still a Black Art? Perhaps because people think it's only good for SW cracking?
Perhaps because DRM has become a nightmare no one is happy with, and related laws everywhere bash reversers too hard every now and then? (Does anybody remember Dmitry Sklyarov, the DMCA and all that madness?)
Perhaps because many people still think it should be illegal? (wtf?!)
9. How To Learn The Crack-Me approach
The one I illustrate in the paper I mentioned
Small and targeted challenges with different levels and obstacles to choose from
The real life approach
Choose a real-world problem and attack it
Tough but rewarding
We'll demo a bit of both
10. Tools of The Trade Probably millions of tools that can give you some useful piece of info about your target
I'll try to restrict myself to the most relevant/common ones, then
Unfortunately, many of the best tools are commercial
On the other hand, many of them have free/student/evaluation versions
For the rest... Well, remember the real life approach? ;)
11. Debuggers Obvious importance
Fairly good variety
It's nice to play with all of them and know your way around each
But mastering them all is quite hard, so you'll most likely settle on your debugger of choice in little time
Choose your debugger well!
12. Debuggers (2) WinDbg
My personal choice of debugger
Developed by MSFT
Comes for free in the Debugging Tools for Windows package
Amazingly rich in features
Extensible with some C++ programming
Not the easiest or simplest dev environment
Very rich API, though
Poor interface
13. Debuggers (3) Visual Studio Debugger
It's crap, not suited for reversing
But it's pretty and nice for developers :>
Seriously, don't try to go very far reversing with it
It may use up the rest of your sanity
14. Debuggers (4) OllyDbg
Enjoys quite a lot of popularity in the reversing community
Nice interface
In particular, a nice disassembly view
Comes in a few tuned versions, one of the most popular being...
15. Debuggers (5) Immunity Debugger
Developed by Immunity Inc. (one of uCon's proud sponsors)
Extends OllyDbg with a python interpreter and exposes a couple of debugging modules for the user to interact with
Very neat plugin support
Embeds a command-line with windbg-aliased commands
Maintains a forum to support developers/users of ImmDbg plugins
16. Debuggers (6) gdb
The standard debugger on *NIX systems
Quite complete debugger
Not the best thing in the RE world, but overall a good debugger
17. Disassemblers Reading assembly is not the sweetest thing for most people
The way the code is represented is extremely important and makes an increasingly great difference in big RCE tasks
Therefore, being comfortable with your disassembler is essential
18. Disassemblers (2) Pretty much every debugger is capable of disassembling
Apart from that, there are lots of other tools that can do it too
On Linux, objdump is pretty much a standard tool
However, one particular tool is especially known for its disassembly features
19. Disassemblers (3) IDA Pro
Supports many binary formats and architectures
Displays the code in graphs, which greatly enhances the visualization
Block-level CFGs
Many things can be customized/adjusted
Graph layout, data types, annotations...
Quite frankly, it's in every reverser's toolkit
IDA Pro is a commercial tool currently in version 5.4
But version 4.9 is available in a free edition
20. System Monitoring Tools All of those from the SysInternals Suite
Process Explorer
RegMon
FileMon
TCPView
Etc...
21. Advanced Tools Binary Differs
BinDiff
Decompilers
Hex-Rays
RE Frameworks
ERESI ;)
PaiMei and all the PyThings
22. Demo We'll try and beat a crack-me challenge
This crack-me was taken from a real competition
HITB Dubai 2007 CTF
Perhaps it can serve as a tip for uCon's CTF as well
23. RE Advanced Topics Cutting to the chase, advancing RE basically means automating stuff
Many of the RE tools are scriptable/programmable/extensible
Developing smart ways to deal with repetitive tasks is the way for more effective analyses
24. RE Advanced Topics (2) Less often, you might see opportunities to advance RE in ways not based on automation
Defeating a new anti-debug trick
Developing new environments for RE
Virtualization, Sandboxing...
Or even radically changing paradigms
E.g. The graph-based approach to binary navigation
25. RE Advanced Topics (3) Perhaps the most important lesson here is not to reinvent the wheel
Re-use the tools you have!
You'll be amazed at how much stuff you can do by gluing pieces together
That said...
Perhaps the tools you have are not perfect
Or you might wanna re-do something just for learning
But be sure to have the right goals in mind!
26. Teaching By Example I will demonstrate how you can use advanced RE to solve real life problems
The main idea behind the re-use thing I mentioned in the previous slide is to keep your solution simple, by focusing on the logic itself rather than on the engineering
Unfortunately, what I'm about to show is actually a bad example in this respect (more on this later)
27. Problem Suppose you have ways to reproduce a high-profile, possibly exploitable bug. Yay!
BUT....
The target is closed-source software
The target is as large and complex as an operating system and way less documented
The input is huge and has a complex, possibly undisclosed format
The source of the bug can be anywhere in the input
From user input to the actual bug/crash, about 3 million instructions are executed
28. WHAT DO YOU DO????
29. Introducing LEP LEP tries to answer a big question in this problem:
What exact part of this input is causing the bug?
If you can answer this question and somehow correlate the answer with the input format, you may gain a great deal of understanding of the bug
For this, I have invented a new technique: Staged Partial Tracing-Based Backwards Taint Analysis
Because not sounding like a Ph.D. is so 2001 :>
And also because we all just love new terms we can go media-cuckoo about
30. Introducing LEP (2) One-liner idea: If we know when our input is brought to memory and know where it's mapped, we can trace the program from this point to the crash and then go backwards analyzing the dataflow to find out where the faulting data came from
We do it in two stages, with a component for each: the tracer and the analyzer
Simple, huh?
31. Fundamental Concepts When we trace the program, it becomes linear, i.e. control-flow is irrelevant
Dataflow becomes concretely deterministic
Aliasing is not an issue (no need to theorize on side-effects)
All the info we need is available at runtime
In particular, effective addresses
If the input is as big as the problem states, it should be no problem to find it in memory
We get most of the info we need from the disassembly text (ASCII)! It's like hacking with grep again!
32. LEP Tracer A WinDbg extension
Traces every instruction until the program raises an exception
Dumps the following instruction info to a file:
Mnemonic
Destination operand
Source operand
Dependencies of the source operand (e.g. the registers used to compute the address in mov eax,[ecx+edx*2])
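The four fields listed above can be pulled out of a line of disassembly text with plain string handling. What follows is only a minimal Python sketch of that idea, not the actual LEP Tracer (which is a C++ WinDbg extension). The `TraceRecord` type, the `parse_instruction` helper, the partial `REGISTERS` table and the simplified `mnemonic dest,src` input format are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, partial set of 32-bit register names; a real tool needs more.
REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}


@dataclass
class TraceRecord:
    mnemonic: str
    dest: str
    src: str
    deps: list = field(default_factory=list)


def parse_instruction(text):
    """Split 'mnemonic dest,src' and list the registers inside a memory source."""
    mnemonic, _, operands = text.strip().partition(" ")
    dest, _, src = operands.partition(",")
    dest, src = dest.strip(), src.strip()
    deps = []
    match = re.search(r"\[(.*?)\]", src)
    if match:
        # Tokens inside [...] may be registers, scales or displacements;
        # keep only the register names, in order of appearance.
        for token in re.findall(r"[a-z]+", match.group(1)):
            if token in REGISTERS and token not in deps:
                deps.append(token)
    return TraceRecord(mnemonic, dest, src, deps)


record = parse_instruction("mov eax,[ecx+edx*2]")
print(record.mnemonic, record.dest, record.src, record.deps)
```

Real debugger output carries extra decoration (addresses, opcode bytes, size qualifiers such as `dword ptr`), so a real parser would have to strip those first.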
33. LEP Tracer (2) Discards control-flow changing instructions
Discards in/out instructions (all relevant input should be in memory already?)
Discards other groups of instructions that will be supported as we go
FPU, MMX, SSE{2,3}, etc...
Tries to parse the right info even when the debugger is too stupid to work as expected
Why not compute effective addresses in rep-prefixed instructions?
34. LEP Analyzer Reads the file generated by the tracer and goes bottom-up investigating the dataflow
You have to specify the piece of data that causes the last instruction to fail; usually (always?) a register
And the memory range(s) where your input was mapped into, at the time the trace was taken
Ignores register slices for simplicity
(al || ah) == ax == eax == rax
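Treating every slice of a register as the same location can be done with a simple lookup table. The Python sketch below is an illustration under stated assumptions: the `_FAMILIES` table is partial and `canonical_register` is a hypothetical helper name, neither taken from LEP itself.

```python
# Hypothetical, partial table: widest register name -> all of its slices.
_FAMILIES = {
    "rax": ("al", "ah", "ax", "eax", "rax"),
    "rbx": ("bl", "bh", "bx", "ebx", "rbx"),
    "rcx": ("cl", "ch", "cx", "ecx", "rcx"),
    "rdx": ("dl", "dh", "dx", "edx", "rdx"),
    "rsi": ("sil", "si", "esi", "rsi"),
    "rdi": ("dil", "di", "edi", "rdi"),
}

# Flatten the table into alias -> widest name.
CANONICAL = {alias: wide for wide, aliases in _FAMILIES.items() for alias in aliases}


def canonical_register(name):
    """Return the widest register containing `name`; unknown names pass through."""
    lowered = name.lower()
    return CANONICAL.get(lowered, lowered)


print(canonical_register("al"), canonical_register("AH"), canonical_register("ecx"))
```

The price of this simplification is over-approximation: a write to `al` is treated as redefining all of `rax`, so the analysis may follow or cut a dataflow branch that a slice-aware analysis would handle differently.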
35. LEP Analyzer (2) When the source operand of a given instruction is an immediate/constant, LEP tries its best to evaluate whether it _transforms_ or _overwrites_ the destination
If it overwrites, we finish the analysis for this branch
mov eax, deadf0f0h
Else if it transforms, we keep looking for another def of the same destination operand
inc eax
This gives a very special meaning to LEP's existence
Otherwise, searching for occurrences of the faulting data inside the input could be just as effective
LEP also tries to identify non-obvious constant overwrites
xor eax, eax
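The overwrite/transform rules above can be sketched as a bottom-up walk over the trace. This is a simplified Python illustration, not the real LEP Analyzer: the tuple-based trace format, the `OVERWRITING` set and the `find_input_sources` name are assumptions, and memory operands are assumed to be already resolved to effective addresses. It uses a breadth-first worklist.

```python
from collections import deque

OVERWRITING = {"mov"}  # mnemonics assumed to replace the destination entirely


def find_input_sources(trace, faulting, input_ranges):
    """Walk the trace bottom-up; return input addresses that reach `faulting`.

    trace: list of (mnemonic, dest, src) tuples, oldest first. An operand is a
    register name (str), a resolved memory address (int) or None. Immediates
    are wrapped as ("imm", value).
    input_ranges: list of (low, high) half-open address ranges holding input.
    """
    def in_input(addr):
        return any(low <= addr < high for low, high in input_ranges)

    found, seen = [], set()
    queue = deque([(len(trace), faulting)])
    while queue:
        end, loc = queue.popleft()
        if (end, loc) in seen:
            continue
        seen.add((end, loc))
        # Look for the most recent definition of `loc` before index `end`.
        for i in range(end - 1, -1, -1):
            mnemonic, dest, src = trace[i]
            if dest != loc:
                continue
            is_imm = isinstance(src, tuple)
            if mnemonic == "xor" and src == dest:
                break  # non-obvious constant overwrite: this branch ends
            if mnemonic in OVERWRITING:
                if not is_imm:
                    queue.append((i, src))  # follow where the value came from
                break  # an immediate overwrite ends this branch
            # Transform (inc, add, ...): keep looking for an older definition.
            queue.append((i, dest))
            if src is not None and not is_imm:
                queue.append((i, src))
            break
        else:
            # Never redefined inside the trace: input data if it lies in range.
            if isinstance(loc, int) and in_input(loc) and loc not in found:
                found.append(loc)
    return found


trace = [
    ("mov", "ecx", 0x1000),               # load from the mapped input
    ("mov", "eax", ("imm", 0xDEADF0F0)),  # constant overwrite, later redefined
    ("mov", "eax", "ecx"),
    ("inc", "eax", None),                 # transform
]
print(find_input_sources(trace, "eax", [(0x1000, 0x2000)]))
```

The sketch leaves out the registers that feed a memory operand's address, which the tracer records as dependencies. Those would be followed the same way.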
36. Engineering Tech-Talk LEP was intended to be written entirely in Python
Didn't work, for performance reasons
LEP Tracer is written in C++, since it's a WinDbg extension
It makes use of a reference of the x86 instruction set written in XML by MazeGen
The XML is mapped to C++ using CodeSynthesis XSD XML Data Binding
LEP Analyzer was first written in Python
Then I also re-wrote it in C++
LEP Analyzer's search algorithm was initially a DFS
Then I implemented it as a BFS
37. Demo II
Placeholder slide :>
38. Linkz & Refz Cracking CrackMes
http://www.scanit.net/rd/wp/wp04
X86 Opcode and Instruction Reference, by MazeGen
http://ref.x86asm.net/
CodeSynthesis XSD XML Data Binding for C++
http://www.codesynthesis.com/products/xsd/
Thousands of elite RE projects
http://www.google.com
Seriously though, contact me if you can't find anything
39. Greetz & Shoutz Filipe Balestra for lending me the bug used in the 2nd demo
H2HC crew for inspiring me to do this work
uCon Crew for having the elitest con ever
Everybody in the room for coming
The ERESI team, with whom I have most of my discussions about RE, program analysis, etc.
All of the great people that I know from the security scene
It's simply impossible to mention each and every one of you, but you know who you are!
40. Questions?
41. Julio Auto
<julio . auto *a* gmail> Practical (Introduction to) Reverse Engineering