Partial Automation of an Integrated Reverse Engineering Environment of Binary Code • Author: Cristina Cifuentes • Proceedings of the Third Working Conference on Reverse Engineering, 1996, pages 50 - 56 • 8-10 Nov. 1996, Monterey, CA, USA
Introduction • What’s the problem? • Protecting the investment made in existing software when a newer machine becomes available. • Two points of view on software migration: • From a commercial point of view: • Software needs to be available on the new machine at the same time as the machine itself. • From a software developer’s point of view: • Software developed in-house is an investment and an asset to the organization. • Software migration is not a trivial problem.
Four approaches to solving this problem • Use a native compiler to recompile the source code for the new platform. • Emulate the old machine’s instructions using microcoded hardware in the new machine. • Emulate the old machine’s instructions in software on the new machine. • Binary translation.
Problems • Using a native compiler to recompile the source code: • Compilation requires access to all of the source code, which may not be feasible. • Emulating the old machine’s instructions using microcoded hardware: • It requires special micro-programmable hardware, which is not included in today’s RISC machines. • Emulating the old machine’s instructions in software: • Software emulation is easy to implement but slow.
Structure of a Binary Translator and a Decompiler • Front-end: • A source machine-dependent module that loads the source binary program, disassembles it, and translates it into an intermediate representation. • Middle-end: • Performs the code analysis needed for the translation and optimizes the code. • Back-end: • A target machine-dependent module that generates code for the target machine.
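The three-stage structure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-byte toy encoding, the `IRInstr` type, and the function names are invented for the example and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class IRInstr:
    """Machine-independent intermediate instruction (hypothetical form)."""
    op: str
    args: Tuple[int, ...]


# Hypothetical source machine: every instruction is an (opcode, operand) byte pair.
SOURCE_OPCODES = {0x01: "load", 0x02: "add", 0x03: "store"}


def front_end(binary: bytes) -> List[IRInstr]:
    """Source-machine dependent: decode raw bytes into the IR."""
    return [IRInstr(SOURCE_OPCODES[binary[i]], (binary[i + 1],))
            for i in range(0, len(binary), 2)]


def middle_end(ir: List[IRInstr]) -> List[IRInstr]:
    """Machine independent: analysis and optimization (here: drop 'add 0')."""
    return [ins for ins in ir if not (ins.op == "add" and ins.args == (0,))]


def back_end(ir: List[IRInstr]) -> List[str]:
    """Target-machine dependent: emit text for an imaginary target."""
    return [f"{ins.op.upper()} #{ins.args[0]}" for ins in ir]


def translate(binary: bytes) -> List[str]:
    """Run the three stages in order."""
    return back_end(middle_end(front_end(binary)))
```

Only `front_end` knows the source encoding and only `back_end` knows the target syntax, so retargeting the sketch means replacing one of those two functions while the middle stage stays unchanged.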
An Integrated Reverse Engineering Environment for Binary Code • Loader • Disassembler • Signature generator • Prototype generator • New Jersey machine-code toolkit (NJMC) • Idiom analyzer • Control flow graph generator • UBM/UDM
Loader • Works just like the operating system loader. • Reads the binary file by decoding the binary-file format used to store the program, and determines the file’s structure (instructions, tables, symbol tables).
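As a rough illustration of what decoding a binary-file format involves, the sketch below reads the fixed-size file header of a 64-bit little-endian ELF file. ELF is used here only as a familiar example and is an assumption; the slides do not say which file formats the environment handles. The `ElfHeader` type and `read_elf64_header` name are hypothetical.

```python
import struct
from typing import NamedTuple

# Layout of the ELF64 file header (64 bytes, little-endian).
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")


class ElfHeader(NamedTuple):
    machine: int               # target architecture code (e_machine)
    entry: int                 # virtual address of the entry point (e_entry)
    section_table_offset: int  # file offset of the section headers (e_shoff)
    section_count: int         # number of section headers (e_shnum)


def read_elf64_header(image: bytes) -> ElfHeader:
    """Decode the file header and report where the rest of the structure lives."""
    if len(image) < ELF64_HEADER.size:
        raise ValueError("file too short for an ELF64 header")
    (ident, _type, machine, _version, entry, _phoff, shoff, _flags,
     _ehsize, _phentsize, _phnum, _shentsize, shnum,
     _shstrndx) = ELF64_HEADER.unpack_from(image)
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if ident[4] != 2 or ident[5] != 1:
        raise ValueError("this sketch handles only 64-bit little-endian ELF")
    return ElfHeader(machine, entry, shoff, shnum)
```

A full loader would continue from here to the section and symbol tables; the entry point returned in the header is where a disassembler would begin parsing.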
Disassembler • Parses the binary image of the program and translates it to assembler or some equivalent representation. • Parsing starts at the entry point and follows all paths from that point. • Analyzes the target addresses of indexed and indirect jumps and calls.
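Following all paths from the entry point can be sketched with a worklist. The instruction set below is a made-up two-byte encoding (opcode, operand) chosen to keep the example short; it is an assumption, not the machine the paper targets. Indirect jumps are only recorded as unresolved, since finding their targets needs the further analysis mentioned above.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical 2-byte encoding: (opcode, operand). Not a real instruction set.
NOP, JMP, JCC, CALL, RET, JMP_IND = range(6)
NAMES = ["nop", "jmp", "jcc", "call", "ret", "jmp_ind"]


def disassemble(image: bytes, entry: int) -> Tuple[Dict[int, str], Set[int]]:
    """Decode every instruction reachable from `entry`.

    Returns the listing keyed by address, plus the addresses of indirect
    jumps whose targets could not be determined statically.
    """
    listing: Dict[int, str] = {}
    unresolved: Set[int] = set()
    worklist: List[int] = [entry]
    while worklist:
        addr = worklist.pop()
        if addr in listing or addr + 1 >= len(image):
            continue  # already decoded, or outside the image
        opcode, operand = image[addr], image[addr + 1]
        if opcode >= len(NAMES):
            continue  # not a valid instruction in the toy encoding
        listing[addr] = f"{NAMES[opcode]} {operand}"
        fall_through = addr + 2
        if opcode == JMP:
            worklist.append(operand)
        elif opcode in (JCC, CALL):
            worklist.extend([operand, fall_through])
        elif opcode == JMP_IND:
            unresolved.add(addr)
        elif opcode != RET:
            worklist.append(fall_through)
    return listing, unresolved
```

Bytes that no path reaches are never decoded, which is how this style of traversal avoids mistaking data embedded in the code section for instructions.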
Idiom analyzer • Detects idioms and translates each such sequence of instructions into intermediate instructions. • An idiom is a sequence of instructions that has a special meaning that can't be derived from the semantics of the individual instructions alone. • Examples: • ARM: • bl foo • x86: • sub ax, immedLo • sbb dx, immedHi • = sub dx:ax, immedHi:immedLo
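The x86 example can be expressed as a small peephole pass. The sketch assumes instructions are already decoded into `(mnemonic, register, immediate)` tuples and that the high word lives in `dx`, as the `dx:ax` result implies; the tuple layout and the name `fold_long_sub` are hypothetical.

```python
from typing import List, Tuple

Instr = Tuple[str, str, int]  # (mnemonic, destination register, immediate)


def fold_long_sub(code: List[Instr]) -> List[Instr]:
    """Replace `sub ax, lo` followed by `sbb dx, hi` with one 32-bit subtract."""
    out: List[Instr] = []
    i = 0
    while i < len(code):
        if (i + 1 < len(code)
                and code[i][:2] == ("sub", "ax")
                and code[i + 1][:2] == ("sbb", "dx")):
            lo, hi = code[i][2], code[i + 1][2]
            # The borrow links the two 16-bit halves into one 32-bit operand.
            out.append(("sub", "dx:ax", (hi << 16) | lo))
            i += 2
        else:
            out.append(code[i])
            i += 1
    return out
```

A lone `sub ax, n` is left untouched, because the pair only has the long-subtract meaning when the `sbb` consumes the borrow produced by the `sub` immediately before it.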
Control flow graph generator • Constructs a control flow graph for each subroutine of the program. • The control flow graph is part of the intermediate representation of any reverse engineering tool that deals with binary code.
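A minimal sketch of control flow graph construction for one subroutine, using the textbook leader-based partition into basic blocks. The decoded form `(mnemonic, target)` with instruction indices as targets is an assumption made for brevity, and calls are treated as ordinary fall-through instructions.

```python
from typing import Dict, List, Optional, Tuple

Instr = Tuple[str, Optional[int]]  # (mnemonic, branch target index or None)


def build_cfg(code: List[Instr]) -> Dict[int, List[int]]:
    """Map each basic block's first instruction index to its successor blocks."""
    if not code:
        return {}
    # Leaders: the first instruction, every branch target, and every
    # instruction that follows a branch or a return.
    leaders = {0}
    for i, (op, target) in enumerate(code):
        if op in ("jmp", "jcc", "ret"):
            if target is not None:
                leaders.add(target)
            if i + 1 < len(code):
                leaders.add(i + 1)
    starts = sorted(leaders)
    cfg: Dict[int, List[int]] = {}
    for n, start in enumerate(starts):
        end = starts[n + 1] if n + 1 < len(starts) else len(code)
        op, target = code[end - 1]  # last instruction of the block
        succs: List[int] = []
        if op in ("jmp", "jcc") and target is not None:
            succs.append(target)
        if op not in ("jmp", "ret") and end < len(code):
            succs.append(end)  # fall through into the next block
        cfg[start] = succs
    return cfg
```

A conditional branch yields two successors (taken and fall-through), an unconditional jump yields one, and a return yields none, which marks the exit blocks of the subroutine.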
Second Generation Tools • Signature generator • Automatically determines library signatures. • Prototype generator • Automatically determines the types of the formal arguments of library subroutines, and the type of the return value for functions. • New Jersey machine-code toolkit (NJMC) • Facilitates the decoding of machine instructions by providing a specification language to define machine instructions.
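One plausible way to derive and use library signatures is sketched below. The slides do not describe the actual scheme, so everything here is an assumption: a signature is taken to be the first bytes of a routine, with bytes that differ between sample copies (for example relocated addresses) turned into wildcards. The names `make_signature` and `identify` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

Signature = List[Optional[int]]  # None marks a byte that varies between copies


def make_signature(variants: Sequence[bytes], length: int) -> Signature:
    """Keep bytes identical in all sample copies of a routine; wildcard the rest.

    Every sample in `variants` must be at least `length` bytes long.
    """
    sig: Signature = []
    for i in range(length):
        column = {v[i] for v in variants}
        sig.append(column.pop() if len(column) == 1 else None)
    return sig


def identify(code: bytes, library: Dict[str, Signature]) -> Optional[str]:
    """Return the name of the first library routine whose signature matches."""
    for name, sig in library.items():
        if len(code) >= len(sig) and all(
                s is None or s == b for s, b in zip(sig, code)):
            return name
    return None
```

Once a subroutine is identified as a library routine, it does not need to be analyzed further and can be referred to by name, which is also where the prototype information for its arguments and return value becomes useful.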
UBM/UDM • Universal binary-translation machine (UBM) • Generates binary programs for the target machine. • Universal decompilation machine (UDM) • Generates high-level language code (such as C).
Conclusions • This paper presents an integrated environment for the reverse engineering of binary programs. • Such an environment is suitable for the development of disassemblers, binary translators and decompilers. • Retargetable techniques are essential in order to develop such tools for a variety of machines rather than for one specific machine.