Explore how Conditional Code Obfuscation can thwart powerful input-oblivious analyzers in malware analysis. Learn about its implementation, implications, and more. Presented at NDSS 2008.
Impeding Malware Analysis Using Conditional Code Obfuscation Paper by: Monirul Sharif, Andrea Lanzi, Jonathon Giffin, and Wenke Lee Conference: Network and Distributed System Security Symposium (NDSS), 2008 Presented by: LIU Limin
Outline • Introduction • Conditional Code Obfuscation • Implications • Implementation and Evaluation • Discussion
Introduction • Hundreds of new malware samples appear every day. • Trojans, Rootkits, Worms, Viruses, Backdoors … • Automated malware analysis becomes increasingly important. • Static analysis • Dynamic analysis • State-of-the-art analyzer
Malware Analysis • Defense • Static analysis • Dynamic analysis • Input-oblivious analyzers (dynamic multiple-path exploration, forced execution) • Offense • Polymorphism, metamorphism, and opaque predicates • Trigger-based behavior (time bombs, logic bombs, bot commands, etc.)
Obfuscation • Obfuscations that can be easily applied to existing code are a threat. • Conditional Code Obfuscation: a simple, automated, and transparent obfuscation against powerful input-oblivious analyzers.
Conditional Code Snippets
• E.g. 1:
    cmd = get_command(sock);
    if (strcmp(cmd, "startkeylogger") == 0) {
        log_keys();
    }
• E.g. 2:
    n = get_day_of_month();
    if ((n > 10) && (n < 20)) {
        attack();
    }
Obfuscated example snippet
• Original code:
    cmd = get_command(sock);
    if (strcmp(cmd, "startkeylogger") == 0) {
        log_keys();
    }
• Obfuscated code (hash is a one-way function):
    cmd = get_command(sock);
    if (hash(cmd) == H) {   /* here, H = hash("startkeylogger") */
        decrypt_function(encr_log_keys, cmd);
        encr_log_keys();    /* log_keys, stored in encrypted form */
    }
General Obfuscation Mechanism • Hash properties • Pre-image resistance: infeasible to find c given Hc = Hash(c). • Second pre-image resistance: hard to find another c' for which Hash(c') = Hc. • Candidate conditions • Equality operators: '==', strcmp, strncmp, memcmp, … • Unsupported operators: '>', '<', … • Conditional code • Code that gets executed when a condition is satisfied.
Automation using Static Analysis • Finding Conditional Code • Identify candidate conditions • Construct a CFG for each function • Identify basic blocks that have conditional branches • Select the candidate conditions that contain equality operators • Find the corresponding conditional code • Intra-procedural: basic blocks that are control dependent on the condition with a true outcome • Inter-procedural: the set of functions that are reachable only when a certain condition is satisfied
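The selection step can be sketched as a filter over the conditions collected from a CFG. This is a minimal illustration only: the `condition` struct and the functions `is_candidate` and `count_candidates` are hypothetical names, and a real analysis would work on compiler intermediate code, not on operator strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of a branch condition found in a CFG. */
typedef struct {
    const char *op;   /* operator or comparison routine, e.g. "==" or "strcmp" */
} condition;

/* Equality-style operators and routines listed on the slides. */
static const char *const EQUALITY_OPS[] = { "==", "strcmp", "strncmp", "memcmp" };

/* A condition is a candidate only if it tests equality. */
bool is_candidate(const condition *c)
{
    assert(c != NULL && c->op != NULL);
    size_t n = sizeof EQUALITY_OPS / sizeof EQUALITY_OPS[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(c->op, EQUALITY_OPS[i]) == 0)
            return true;
    }
    return false;
}

/* Count the candidates among the conditions collected for one function. */
size_t count_candidates(const condition *conds, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if (is_candidate(&conds[i]))
            count++;
    }
    return count;
}
```

Conditions built on '>' or '<' fall through the filter, which matches the slides' list of unsupported operators.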
Automation using Static Analysis • Handling Common Conditional Code • Duplicate the code and encrypt it separately for each candidate condition.
Simplifying Compound Constructs • Operators (&& or ||…) combine more than one simple condition • Break the compound conditions into semantically equivalent but simplified conditions
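One possible reading of this simplification, as a sketch: a condition joined by `||` becomes a chain of simple equality checks, each guarding its own copy of the body, and a condition joined by `&&` becomes nested simple checks. The constants, the `body` stand-in, and all function names below are assumptions made for illustration; the slides do not give the exact rewriting rules.

```c
#include <assert.h>

static int runs;                        /* how many times the body executed */
static void body(void) { runs++; }      /* stand-in for the conditional code */

/* Original compound conditions; the constants 1 and 2 are placeholders. */
void or_original(int a, int b)  { if (a == 1 || b == 2) body(); }
void and_original(int a, int b) { if (a == 1 && b == 2) body(); }

/* "||": a chain of simple equality checks, each guarding its own copy of
   the body. The else keeps the body from running twice when both hold. */
void or_simplified(int a, int b)
{
    if (a == 1)
        body();
    else if (b == 2)
        body();
}

/* "&&": nested simple equality checks around a single copy of the body. */
void and_simplified(int a, int b)
{
    if (a == 1) {
        if (b == 2)
            body();
    }
}

/* Run one variant and report how many times the body executed. */
int count_runs(void (*variant)(int, int), int a, int b)
{
    runs = 0;
    variant(a, b);
    return runs;
}

/* Compare original and simplified forms over a small grid of inputs. */
void check_equivalence(void)
{
    for (int a = 0; a <= 2; a++) {
        for (int b = 0; b <= 3; b++) {
            assert(count_runs(or_original, a, b) == count_runs(or_simplified, a, b));
            assert(count_runs(and_original, a, b) == count_runs(and_simplified, a, b));
        }
    }
}
```

After the rewrite every branch tests a single equality, so each one can be treated as a separate candidate condition.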
Consequences to Existing Analyzers • Path exploration and input discovery • Construct constraints for each path (e.g., X == c). • Input discovery (EXE) • Discover inputs from constraints by using symbolic execution. • The obfuscated constraint is "Hash(X) == Hc"; solving it requires reversing the hash function, which is infeasible.
Consequences to Existing Analyzers • Forcing execution • Force execution along a specific path without solving the constraints. • Without the key, the conditional code cannot be decrypted, so the program crashes. • Static analysis • The behavior is concealed in the encrypted block.
Attacks • Brute Force and Dictionary Attacks • Constraint: Hash(X) == Hc • Find a possible X that satisfies the above equation. • Domain(X): the set of all possible values that X may take during execution. • t: the time taken to test a single value of X, i.e., the hash computation time. • Brute-force attempt: time = |Domain(X)| × t. • If X is n bits in length, the attack requires 2^n × t time.
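The cost estimate above is simple enough to compute directly. The following sketch evaluates 2^n × t; the function name `brute_force_seconds` and the sample values in the note below are assumptions chosen for illustration, not figures from the paper.

```c
#include <assert.h>

/* Worst-case brute-force time in seconds: |Domain(X)| * t, where
   |Domain(X)| = 2^n for an n-bit input and t is the time per test. */
double brute_force_seconds(int n_bits, double t_seconds)
{
    assert(n_bits >= 0 && t_seconds >= 0.0);
    double total = t_seconds;
    for (int i = 0; i < n_bits; i++)
        total *= 2.0;               /* doubling a double is exact until overflow */
    return total;
}
```

With a hypothetical t of one microsecond per hash, a 32-bit input gives about 4,295 seconds (a little over an hour), while a 64-bit input gives about 1.8 × 10^13 seconds, which is far beyond practical reach.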
Implementation • Platform: Linux • Input: C/C++ Source; Output: ELF Binary • Four phases: • Front-end Code Parsing Phase • Analysis/Transformation Phase • Code Generation Phase • Encryption Phase • Two Levels: • Binary level: decrypted code is executable • Intermediate code level: data type information
Analysis phase • Candidate Condition Replacement • Identify candidate conditions and their conditional code • Hash function: SHA-256 • Decipher Routine • Encryption algorithm: AES with 256-bit keys • Decryption Key and Markers • Key(X) = Hash(X|N), where N is a nonce. • Markers: identify the exact location of the corresponding code in the resulting binary file.
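The relationship between the stored hash Hc = Hash(X) and the key Key(X) = Hash(X|N) can be illustrated with a small sketch. It assumes that "X|N" means X followed by the nonce bytes, and it uses 64-bit FNV-1a purely as a short, non-cryptographic stand-in for the SHA-256 named on the slide. The names `condition_hash` and `derive_key` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FNV_OFFSET 0xcbf29ce484222325ULL
#define FNV_PRIME  0x100000001b3ULL

/* FNV-1a, 64-bit, continuing from state h. NOT cryptographic: it only
   stands in for SHA-256 to keep this sketch short and self-contained. */
static uint64_t fnv1a(const void *data, size_t len, uint64_t h)
{
    const unsigned char *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= FNV_PRIME;
    }
    return h;
}

/* Hc = Hash(X): the value compared in the rewritten condition. */
uint64_t condition_hash(const char *x)
{
    assert(x != NULL);
    return fnv1a(x, strlen(x), FNV_OFFSET);
}

/* Key(X) = Hash(X|N): hash of X followed by the nonce bytes N. */
uint64_t derive_key(const char *x, const unsigned char *nonce, size_t nonce_len)
{
    assert(x != NULL && nonce != NULL);
    uint64_t h = fnv1a(x, strlen(x), FNV_OFFSET);
    return fnv1a(nonce, nonce_len, h);   /* continue hashing over N */
}
```

Because Hc sits in the binary in plain view, a key equal to Hc would be readable by any analyst. Mixing in the nonce presumably serves to make the key differ from Hc while still being computable only from the correct X.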
Encryption phase • Identify the code blocks that need encryption. • Extract the encryption key Kc. • Replace Kc and End_marker() with NOP instructions. • Calculate the size of the block to be encrypted. • Place the size as an argument to the call to Decipher. • Encrypt the block with the key Kc.
Experimental Evaluation • Evaluate the system by determining how many manually identified trigger-based malicious behaviors were automatically and completely obfuscated. • Three levels of obfuscation strength: • Strong: strings • Medium: integers • Weak: boolean flags
Strengths • A malware author can modify programs to strengthen the obfuscation. • Introducing more candidate conditions. • Query for resources and compare with the names. • Replace operators such as <, > or != with ==. • Increasing the size of the concealed code. • Incorporate triggers that encapsulate more execution behavior. • Increasing the input domains. • Use variables with larger domains (e.g., strings) or use larger integers.
Weaknesses • Limited types of conditions • Equality checks only. • The input domain may be very small in some cases. • 32-bit or 64-bit integers.
Possible ways to defeat • Equip analyzers with decryptors that reduce the key search space by taking the input domain into account. • E.g., the input is the result of, or an argument receiving data from, a system call such as gettimeofday. • Input-aware analysis. • Collection mechanisms capture the interaction of the binary with its environment.
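From the analyst's side, a domain-aware search can be sketched as follows. The example assumes the hashed input is known to be a day of the month, so only the values 1 to 31 need testing. The hash is again 64-bit FNV-1a as a non-cryptographic stand-in for SHA-256, and hashing the decimal text of the integer is an arbitrary encoding chosen for this sketch. `hash_int` and `search_domain` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* FNV-1a, 64-bit, over a C string. NOT cryptographic: a stand-in only. */
static uint64_t toy_hash(const char *s)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Hash of an integer input, taken over its decimal text form. */
uint64_t hash_int(int x)
{
    char buf[16];
    snprintf(buf, sizeof buf, "%d", x);
    return toy_hash(buf);
}

/* Test every value in [lo, hi] against the hash found in the binary.
   Returns the matching input, or -1 if no value in the domain matches. */
int search_domain(uint64_t target, int lo, int hi)
{
    assert(lo >= 0 && lo <= hi);
    for (int x = lo; x <= hi; x++) {
        if (hash_int(x) == target)
            return x;
    }
    return -1;
}
```

A call such as `search_domain(hc, 1, 31)` needs at most 31 hash computations, whereas a blind search over a 32-bit integer needs up to 2^32. This is why small input domains are listed as a weakness of the scheme.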
Conclusion • An obfuscation scheme that can be automatically applied to malware programs. • The obfuscation conceals trigger-based malicious behavior from state-of-the-art malware analyzers. • Experiments show that the obfuscation scheme is capable of concealing a large fraction of malicious triggers.