iBinHunt: Binary Hunting with Inter-Procedural Control Flow
Jiang Ming (1), Meng Pan (2), and Debin Gao (3)
(1) College of Information Sciences and Technology, Penn State University
(2) D’Crypt Pte Ltd
(3) School of Information Systems, Singapore Management University
Introduction
Binary hunting: automatically finding semantic differences in binary programs
• Need to capture semantic differences
  • Differences in functionality (input-output behavior)
• Syntactic differences cause false positives
  • Differences in instructions
  • Register allocation
  • Basic-block reordering
  • Variable renaming
  • ...
An example: gzip
• Different instructions in two versions, but with the same semantics
• A patch with 5 lines of code
• All 75 non-empty functions are changed
Reference: Gzip Long File Name Buffer Overflow Vulnerability, http://www.securityfocus.com/bid/3712
Importance of Binary Hunting
Security applications of binary hunting:
• Finding security vulnerabilities with patched binaries
  “BinHunt: Automatically finding semantic differences in binary programs”, ICICS 2008
• Automatic patch-based exploit (1-day exploit) generation
  “Automatic Patch-Based Exploit Generation is Possible”, IEEE S&P 2008
• Software plagiarism detection
  “GPLAG: detection of software plagiarism by program dependence graph analysis”, KDD 2006
• Adapting trained anomaly detectors to software patches
  “Automatically adapting a trained anomaly detector to software patches”, RAID 2009
• Malware analysis
  “Polymorphic worm detection using structural information of executables”, RAID 2005
  “Large-scale malware indexing using function-call graphs”, CCS 2009
• ...
Challenge
• Source code of the binary files is not available
• Function names extracted from these binary files are unreliable
• A variety of obfuscation techniques
• ...
Latest solutions: find similarities and differences in control flow structure rather than in binary instructions
• Resistant to “superficial” changes
• Examples: BinDiff, BinHunt, DarunGrim, SMIT
Intra-procedural control flow vs. inter-procedural control flow
Intra-procedural control flow
• Most previous work focuses on intra-procedural control flow.
• The sub-graph isomorphism problem is NP-complete.
• Example: 96% of the non-empty functions of thttpd have fewer than 30 basic blocks.
• Graph isomorphism is practical when analyzing intra-procedural control flow.
Inter-procedural control flow
• No function boundaries
• A huge graph with a large number of nodes, where graph isomorphism is impractical
• Example: thttpd-2.25 has more than 4,300 basic blocks in total, so there are more than 4,000 candidate matchings for a single basic block.
Function Transformation Obfuscation [1]
Function transformation obfuscation is well studied:
• Inlining functions
• Outlining functions
• Cloning functions
• Interleaving functions
Performing such obfuscation is simple and does not require intensive analysis of the binaries.
[Figure: inlining and outlining transformations]
[1] C. Collberg, C. Thomborson, and D. Low. A taxonomy of obfuscating transformations. Technical Report 148, Department of Computer Sciences, The University of Auckland, July 1997.
Advanced control flow obfuscation
• Redirecting control flow with exceptions
  “Binary Obfuscation Using Signals”, USENIX Security 2007
  “binOb+: a framework for potent and stealthy binary obfuscation”, AsiaCCS 2010
• Control flow flattening
  “Protection of software-based survivability mechanisms”, DSN 2001
  “An Approach to the Obfuscation of Control-Flow of Sequential Computer Programs”, ISC 2001
• Function boundary information (intra-procedural control flow) is not reliable!
Overview of iBinHunt
iBinHunt: Binary Diffing with Inter-Procedural Control Flow Graphs
iBinHunt provides practical solutions to the large number of basic block matchings:
• Dynamic tainting: monitor the execution of the two binary programs under a common input and use taint analysis to record all basic blocks involved in the processing of the input.
• Deep taint: assign different taint tags to various parts of the input; only basic blocks from the two binary programs that are marked with the same taint tags are considered matching candidates (a reduction factor of up to 74%).
• Basic block comparison: symbolic execution is first used to represent the outputs of the basic blocks in terms of their input symbols, and a theorem prover is then used to check whether the outputs from the two basic blocks are semantically equivalent.
• Automatic input generation: increases the coverage of tainted basic blocks by automatically generating inputs that result in different execution traces.
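As a rough illustration of how deep taint narrows the candidate set, the sketch below pairs basic blocks from two execution traces only when their taint-tag sets are identical. The block IDs, the tag names, the dictionary representation, and the `candidate_pairs` helper are all hypothetical and are not taken from iBinHunt's implementation.

```python
from collections import defaultdict

def candidate_pairs(trace_a, trace_b):
    """Pair blocks from two traces only if they carry the same set of taint tags.

    trace_a, trace_b: dict mapping a block id to the set of taint tags
    recorded for that block (a hypothetical representation).
    """
    # Index the second trace by its tag set (frozenset so it can be a dict key).
    by_tags = defaultdict(list)
    for block, tags in trace_b.items():
        by_tags[frozenset(tags)].append(block)

    pairs = []
    for block, tags in trace_a.items():
        for other in by_tags.get(frozenset(tags), []):
            pairs.append((block, other))
    return pairs

# Hypothetical traces: each tag names the part of the input that reached the block.
old = {"A1": {"method"}, "A2": {"url"}, "A3": {"url", "version"}}
new = {"B1": {"method"}, "B2": {"url"}, "B3": {"host"}, "B4": {"url", "version"}}

pairs = candidate_pairs(old, new)
all_pairs = len(old) * len(new)
print(pairs)
print(f"{len(pairs)} of {all_pairs} possible pairs remain")
```

In this toy case only 3 of the 12 possible block pairs survive the tag filter; the expensive semantic comparison would then run on those pairs only.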
Deep taint for basic block comparison
[Figure labels: inter-procedural control flow graphs; deep taint; deep taint execution trace; basic block comparison]
An example: thttpd
[Figure panels: input and its taint tag colors; dynamic execution traces with deep taint]
Basic block comparison
Symbolic execution and theorem proving
• Use symbolic execution to represent the final values of the outputs (registers and variables).
• Use a theorem prover to test whether the outputs of two basic blocks are always the same given the same inputs.
Context aware
• The permutation of outputs of the equivalent basic blocks is the permutation of inputs of the successor blocks.
Obtain the matching strength based on the result from the theorem prover.
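The equivalence check can be illustrated with a toy stand-in. The sketch below does not use symbolic execution or a theorem prover: it assumes 8-bit registers and simply enumerates every input, which is feasible only at that size. The two block functions and the `equivalent_up_to_permutation` helper are hypothetical examples. What it does show is the idea of searching for an output permutation under which two blocks always agree.

```python
from itertools import permutations, product

MASK = 0xFF  # toy 8-bit registers so that exhaustive checking is feasible

def block_old(a, b):
    # out0 = a + b, out1 = 2 * a
    return ((a + b) & MASK, (a * 2) & MASK)

def block_new(a, b):
    # Same computation with different instructions and swapped output registers.
    return ((a << 1) & MASK, (b + a) & MASK)

def equivalent_up_to_permutation(f, g, n_inputs=2):
    """Return a permutation p with f(x)[i] == g(x)[p[i]] for every input x, or None.

    Exhaustive enumeration stands in for the theorem prover here.
    """
    inputs = list(product(range(MASK + 1), repeat=n_inputs))
    n_outputs = len(f(*inputs[0]))

    def holds(perm):
        for x in inputs:
            out_f, out_g = f(*x), g(*x)
            if any(out_f[i] != out_g[perm[i]] for i in range(n_outputs)):
                return False
        return True

    for perm in permutations(range(n_outputs)):
        if holds(perm):
            return perm
    return None

print(equivalent_up_to_permutation(block_old, block_new))  # (1, 0)
```

The returned permutation `(1, 0)` records that output 0 of the old block corresponds to output 1 of the new block. A mapping of this kind is what would be carried forward as the input permutation when comparing successor blocks.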
Basic block matching
We need to consider two other groups of blocks when finding matched blocks:
• Blocks that are not semantically equivalent but have the same taint tags
  • They could very likely be the differences between the two programs that iBinHunt is trying to locate. E.g., BB_13232 and BB_16184 are the location of a binary difference.
• Blocks that are not tainted but are on the dynamic execution trace
  • Due to various reasons, including limitations of taint analysis and code that does not directly process program inputs (e.g., signal processing).
Matching Strength
Basic blocks B1 and B2 are considered matched to one another if B1 and B2 have the same taint tags (possibly non-tainted) and B1 and B2 are semantically equivalent (evaluated by symbolic execution and theorem proving); or a predecessor of B1 and a predecessor of B2 match; or a successor of B1 and a successor of B2 match.
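The rule above can be sketched as a predicate. The slide wording leaves the grouping of "and" and "or" open; this sketch assumes the reading in which identical taint tags are required in every case, and semantic equivalence, a matched predecessor, or a matched successor then completes the match. That reading, the graph representation, and the names `is_matched` and `neighbours_match` are assumptions made for illustration only.

```python
def neighbours_match(ns1, ns2, matched):
    """True if some neighbour of B1 is already matched with some neighbour of B2."""
    return any((p, q) in matched for p in ns1 for q in ns2)

def is_matched(b1, b2, g1, g2, equivalent, matched):
    """Decide whether blocks b1 (program 1) and b2 (program 2) match.

    g1, g2:     dict block -> {"tags": frozenset, "preds": list, "succs": list}
    equivalent: set of (b1, b2) pairs judged semantically equivalent
    matched:    set of (b1, b2) pairs already accepted as matches
    Assumed reading: same taint tags AND (equivalent OR pred match OR succ match).
    """
    n1, n2 = g1[b1], g2[b2]
    if n1["tags"] != n2["tags"]:  # both empty (non-tainted) also counts as equal
        return False
    return (
        (b1, b2) in equivalent
        or neighbours_match(n1["preds"], n2["preds"], matched)
        or neighbours_match(n1["succs"], n2["succs"], matched)
    )

# Hypothetical three-block chains A1 -> A2 -> A3 and B1 -> B2 -> B3.
url, none = frozenset({"url"}), frozenset()
g1 = {
    "A1": {"tags": url, "preds": [], "succs": ["A2"]},
    "A2": {"tags": url, "preds": ["A1"], "succs": ["A3"]},
    "A3": {"tags": none, "preds": ["A2"], "succs": []},
}
g2 = {
    "B1": {"tags": url, "preds": [], "succs": ["B2"]},
    "B2": {"tags": url, "preds": ["B1"], "succs": ["B3"]},
    "B3": {"tags": none, "preds": ["B2"], "succs": []},
}
equivalent = {("A1", "B1"), ("A3", "B3")}
matched = set(equivalent)

print(is_matched("A2", "B2", g1, g2, equivalent, matched))  # True: neighbours match
print(is_matched("A2", "B3", g1, g2, equivalent, matched))  # False: tags differ
```

Here A2 and B2 are not semantically equivalent, yet they match because their neighbours are already matched. Under the assumed reading this is the kind of pair that would mark a candidate difference between the two versions.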
Automatic Input Generation
Initial input: GET index.html HTTP/1.1 Host: .
[Figure labels: concrete execution; symbolic execution; symbolic formula; constraint solver (STP); new input]
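The labels above suggest a loop: run the program on a concrete input, record the branch decisions, negate one of them, and ask a solver for an input that takes the new path. The sketch below imitates that loop on a toy target. It is not symbolic execution and it does not call STP: a brute-force search over a tiny alphabet stands in for the symbolic formula and the constraint solver. The target program, the alphabet, and all function names are hypothetical.

```python
from itertools import product

def run(data):
    """Toy target program: return the branch outcomes observed for `data`."""
    path = [("is_G", data[0] == ord("G"))]
    if path[-1][1]:
        path.append(("second_upper", ord("A") <= data[1] <= ord("Z")))
    return tuple(path)

def solve(target_prefix, alphabet, length):
    """Stand-in for the constraint solver: search for an input whose path
    starts with `target_prefix`. Returns None if the search finds nothing."""
    for candidate in product(alphabet, repeat=length):
        data = bytes(candidate)
        if run(data)[:len(target_prefix)] == target_prefix:
            return data
    return None

def generate_inputs(seed, alphabet=b"GX1", max_inputs=10):
    """Explore new execution paths by flipping one branch outcome at a time."""
    seen_paths, inputs, worklist = set(), [], [seed]
    while worklist and len(inputs) < max_inputs:
        data = worklist.pop(0)
        path = run(data)
        if path in seen_paths:
            continue  # this input adds no new trace
        seen_paths.add(path)
        inputs.append(data)
        for i, (branch, taken) in enumerate(path):
            # Keep the path up to branch i, then negate branch i.
            flipped = path[:i] + ((branch, not taken),)
            new_input = solve(flipped, alphabet, len(seed))
            if new_input is not None:
                worklist.append(new_input)
    return inputs

print(generate_inputs(b"GX"))  # [b'GX', b'XG', b'G1']
```

Starting from one seed, the loop ends with three inputs, one for each distinct path through the toy program. This mirrors the stated goal of generating inputs that result in different execution traces.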
Evaluation
We applied iBinHunt to find semantic differences in several versions of thttpd and gzip. We evaluate two main aspects:
• Efficiency: how many basic blocks can be matched under our definition of matching strength, how many matchings are identified by deep taint, and how long it takes to find these matchings.
• Accuracy: we confirm the differences found by comparing them to the ground truth (program source code).
[Table: different versions of thttpd and gzip (number of lines changed / total number of lines)]
Matching basic blocks
We evaluate:
• Matched basic blocks that are semantically the same;
• Matched basic blocks that are not semantically equivalent but have both a predecessor and a successor matched;
• Basic blocks that are not semantically equivalent but have either a predecessor or a successor matched;
• The time taken by input generation and deep taint.
Effectiveness of deep taint
Results show that more than 34% and 67% of the matched basic blocks in thttpd and gzip, respectively, contain the same taint tags.
• A large number of these matchings do contain the same taint tags;
• Even though many basic blocks are not tainted by our limited number of program inputs, their neighbors are tainted in most cases, and the tainted neighbors help matchings to be identified.
[Caption: percentage of matched basic blocks with the same taint representation]
Accuracy
BB_1371 from thttpd-2.19 should match BB_1689 in thttpd-2.25, both of which deal with the “-i” argument. However, BB_1687 in thttpd-2.25 also contains the same (type of) instructions, which confuses the binary diffing tool during matching.
Discussions
Limitations
• The power of iBinHunt is limited by imperfect basic block coverage. In our experiments with thttpd and gzip, some basic blocks are not covered even if we continue to generate new program inputs.
• Performance
Future work
• More optimization of the code to improve efficiency.
• Parallelizing dynamic taint tracking.
• More in-depth binary difference analysis, in which (parts of) the programs are semantically equivalent only on a certain subset of the inputs.
Conclusion
• Introduce function obfuscation attacks on existing binary diffing tools that analyze the intra-procedural control flow of programs.
• Propose a novel binary diffing tool called iBinHunt, which analyzes inter-procedural control flow.
• iBinHunt makes use of a novel technique called deep taint.