Binary Packing/Unpacking Steve Matsumoto, Timber Deng, Donna Tang, Nadim Taha, JD Nir Carnegie Mellon University
Outline • Introduction • Unpacking Frameworks • Static • Dynamic • Anti-Unpacking Strategies • Passive • Active
Packers “Programs that transform an input binary’s appearance without affecting its execution semantics” Guo, Fanglu, Peter Ferrie, and Tzi-Cker Chiueh. "A study of the packer problem and its solutions." Recent Advances in Intrusion Detection. Springer Berlin/Heidelberg, 2008.
Uses • Evade signature-based AV detection • Make reverse engineering more difficult • Sometimes used to mitigate piracy: • Themida / WinLicense • SoftwarePassport • ASPack/ASProtect • VMProtect • MoleBox • The Enigma Protector • Obsidium • …
Prevalence & Motivation • “over 80% of malware is packed” • “more than 50% of malware samples are simply repacked versions of existing malware” • “at least 75% of all malware binaries use code-packing techniques to protect their code from static analysis and modification” • “about 65% of [packed] executable files are known malware”
Simplified View Kang, Min Gyung, Pongsin Poosankam, and Heng Yin. "Renovo: A hidden code extractor for packed executables." Proceedings of the 2007 ACM workshop on Recurring malcode. ACM, 2007.
Unpacking • Static • Mostly relies on identifying the packer used • Search for possible transition points, target addresses, etc. • Dynamic • Used in most unpacking research • Attempt to find unpacked code through execution of the binary
Dynamic Unpacking • Debugging (PolyUnpack) • Virtualization (Ether) • “Sandboxing” (OmniUnpack, Justin) • Process Emulation (Commercial AVs) • Whole System Emulation (Renovo, Pandora’s Bochs) • Dynamic Binary Instrumentation (MmmBop, TarteTatin, Saffron)
PolyUnpack • Automates extraction of packed code • Static analysis: generate code model • Dynamic analysis: query code model [Figure: the malware P is statically disassembled into a static code view I; during execution, each dynamically disassembled instruction sequence i_cur is tested with "Is i_cur in I?", and execution continues while the answer is yes]
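The static/dynamic check on this slide can be sketched in a few lines of Python. This is only an illustration: instructions are modeled as plain strings, the static code view is a set, and the names `build_static_view` and `first_unpacked_instruction` are hypothetical, not part of PolyUnpack.

```python
from typing import Iterable, Optional, Set


def build_static_view(static_disassembly: Iterable[str]) -> Set[str]:
    """Static phase: collect every instruction visible before execution."""
    return set(static_disassembly)


def first_unpacked_instruction(static_view: Set[str],
                               executed: Iterable[str]) -> Optional[str]:
    """Dynamic phase: return the first executed instruction that is absent
    from the static view, or None if execution never leaves the model."""
    for i_cur in executed:
        if i_cur not in static_view:
            return i_cur  # code that was not present statically
    return None


# Toy example (made-up instructions): the stub is visible statically,
# the payload only appears at run time.
stub = ["mov esi, packed", "xor byte [esi], 0x5a", "jmp unpacked"]
trace = stub + ["push ebp", "mov ebp, esp"]
view = build_static_view(stub)
hit = first_unpacked_instruction(view, trace)
```

Here `hit` is the first payload instruction, which is the point at which a PolyUnpack-style tool would conclude that unpacked code is running.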
Ether • Transparency by hardware virtualization • Shadow flags, page tables to avoid detection
OmniUnpack • Page-level monitoring • W^X is emulated in software (by hooking the page-fault handler and desynchronizing the TLB) • Scanning is performed on the first “dangerous” system call if dirty pages were executed since the last scan • The dirty page set is cleared after the scan • Relatively low overhead (~11% on packed programs) • Does not capture the OEP
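The scan-triggering rule above can be modeled as a small state machine. The following is a sketch under stated assumptions, not OmniUnpack's implementation: pages are 4 KiB, the class `PageMonitor` and its callbacks are hypothetical, and `"SYS_DANGEROUS"` is a placeholder for whatever calls the tool treats as dangerous.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class PageMonitor:
    """Toy model of write-then-execute tracking at page granularity."""

    def __init__(self, dangerous_syscalls):
        self.dangerous = set(dangerous_syscalls)
        self.written = set()            # pages written since the last scan
        self.dirty_page_executed = False
        self.scans = 0

    def on_write(self, addr):
        self.written.add(addr // PAGE_SIZE)

    def on_execute(self, addr):
        if addr // PAGE_SIZE in self.written:
            self.dirty_page_executed = True

    def on_syscall(self, name):
        """Return True if this call triggered a scan."""
        if name in self.dangerous and self.dirty_page_executed:
            self.scans += 1
            self.written.clear()        # dirty set is cleared after the scan
            self.dirty_page_executed = False
            return True
        return False


mon = PageMonitor({"SYS_DANGEROUS"})
mon.on_write(0x401000)
first = mon.on_syscall("SYS_DANGEROUS")   # page written but not yet executed
mon.on_execute(0x401010)
second = mon.on_syscall("SYS_DANGEROUS")  # written page was executed: scan
third = mon.on_syscall("SYS_DANGEROUS")   # nothing dirty since the last scan
```

Only the second call triggers a scan, which is how deferring the scan to dangerous system calls keeps the overhead low.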
Justin • Page-level monitoring • W^X is enforced by the hardware (NX bit) • Claims to preserve entry-point information • Uses a set of heuristics to detect the end of unpacking and holistically scan the memory image • Assumptions • The entire binary is unpacked at load-time • A dirty page is executed on completion • The stack pointer is restored before execution • Command line arguments are copied to the stack • The process address space layout is preserved
Comparing OmniUnpack and Justin • Both assume the entire binary is unpacked before any of the unpacked pages are executed • OmniUnpack is meant for continuous monitoring while Justin attempts to detect the end of unpacking and triggers a scan on the entire memory image • Justin relies on a fragile set of heuristics to trigger the scan • Justin visibly modifies user-space state (vectored exception handlers are maintained in a linked list residing on the heap) • Justin can sometimes recover the OEP
Renovo • Emulation-based approach (TEMU) • Byte-level memory tracking • Makes use of a kernel module to reason about OS-level semantics • Every write instruction is instrumented • Execution of dirty instructions is only checked at the exit point of every basic block • Dirty pages are dumped every time a dirty memory location is executed • The default timeout value is set to 4 minutes • 8 times slower than native execution
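Byte-level tracking with a check at basic-block exit can be sketched as follows. This is an illustrative model only: `ShadowMemory` and its methods are hypothetical names, a "dump" is reduced to a list of dirty addresses, and resetting the dirty set after each dump is an assumption made here so that later unpacking layers show up separately.

```python
class ShadowMemory:
    """Toy byte-granular model of write tracking and dirty-code detection."""

    def __init__(self):
        self.dirty = set()   # byte addresses written since the last dump
        self.dumps = []      # one sorted list of dirty addresses per dump

    def on_write(self, addr, size=1):
        # Every write instruction is instrumented.
        self.dirty.update(range(addr, addr + size))

    def on_block_exit(self, block_start, block_len):
        """Check a finished basic block; dump if any of its bytes was dirty."""
        block = range(block_start, block_start + block_len)
        if any(a in self.dirty for a in block):
            self.dumps.append(sorted(self.dirty))
            self.dirty.clear()  # assumption: reset to catch later layers
            return True
        return False


shadow = ShadowMemory()
shadow.on_write(0x1000, size=4)            # unpacker writes 4 bytes
clean = shadow.on_block_exit(0x2000, 16)   # block in untouched code
hidden = shadow.on_block_exit(0x0FFE, 4)   # block overlaps 0x1000..0x1001
```

Unlike the page-level schemes, a write elsewhere on the same page as `0x2000` would not mark that block as dirty, at the cost of instrumenting every write.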
MmmBop • Custom DBI framework • Does not support multithreading (unlike PIN) • Dynamically instruments every basic block to jump back to the DBI engine at the exit point before executing it • Instrumented basic blocks are executed out-of-place • Hooks the KiUserExceptionDispatcher to sanitize the EXCEPTION_RECORD and CONTEXT structures • Instruments call instructions to push the original return value on the stack • Instruments memory writes to handle self-modifying code • Keeps track of and emulates hardware breakpoints • Relies on a user-specified address range to identify the OEP • Defeatable by code introspection
Dynamic Unpacking Comparison • Emulation & DBI provide finer-grained control • Emulation does not require OS/hardware support and is therefore the most portable • Single-step debugging is the least efficient strategy • HW-assisted virtualization & emulation are the safest • All vulnerable to time-based detection
Anti-unpacking Strategies • Passive: • Anti-disassembly • Active: • Anti-dumping • Anti-debugging • Anti-emulation • Anti-virtualization
ClamAV – Signature Statistics (03/19/10) • NDB literal signature count: 55394 - Shortest signature is 4 bytes long - Longest signature is 392 bytes long - Average signature length is 120.2 bytes - 51245 of these target Portable Executables (PEs) - 46877 unique 3B prefixes & 48536 unique 4B prefixes • DB literal signature count: 28819 - Shortest signature is 10 bytes long - Longest signature is 210 bytes long - Average signature length is 67.2 bytes - 19898 unique 3B prefixes & 23116 unique 4B prefixes P(page boundary) ~=3%
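The slide does not say how the ~3% figure was obtained. One plausible reading, assuming 4 KiB pages and a signature start offset that is uniformly distributed within a page (both assumptions), is that a signature of length L straddles a page boundary when it starts in the last L - 1 bytes of a page:

```python
PAGE_SIZE = 4096  # assumed x86 page size in bytes


def p_straddles_page(sig_len: float, page_size: int = PAGE_SIZE) -> float:
    """Probability that a signature of sig_len bytes crosses a page boundary,
    assuming its start offset is uniform over the page."""
    return min(1.0, max(0.0, (sig_len - 1) / page_size))


ndb = p_straddles_page(120.2)  # average NDB literal signature length
db = p_straddles_page(67.2)    # average DB literal signature length
```

With the average NDB length this gives roughly 2.9%, in line with the slide's estimate; the shorter DB signatures come out near 1.6%. This matters for page-level scanners because a signature split across two pages can be missed when pages are scanned independently.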
OmniUnpack “To guarantee that the malware is blocked in time, the signature should describe code that is present in memory when a dangerous system call is about to damage the system”
OmniUnpack What about system("rm -Rf /*")?
Anti-debugging • “sti; hlt” and “mov/pop ss; pop esp” are atomic • pushf; pop eax; and eax, 100h; jnz exit
Anti-debugging • “sti; hlt” and “mov/pop ss; pop esp” are atomic • push ss; pop ss; pushf; pop eax; and eax, 100h; jnz exit
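The test performed by the assembly above is a single bit check: mask 100h selects bit 8 of EFLAGS, the trap flag (TF) that a debugger sets to single-step. A minimal Python rendering of that check, with the hypothetical helper name `single_step_detected` and made-up flag values:

```python
TRAP_FLAG = 0x100  # bit 8 of EFLAGS, set while single-stepping


def single_step_detected(eflags: int) -> bool:
    """Mirror of `and eax, 100h; jnz exit` applied to the flags saved by pushf."""
    return (eflags & TRAP_FLAG) != 0


normal = single_step_detected(0x246)   # illustrative flags value with TF clear
stepped = single_step_detected(0x346)  # same value with TF set
```

Because `pop ss` delays interrupts until after the next instruction, a debugger stepping through the sequence cannot stop before `pushf`, so the saved flags still carry TF and the check fires.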
Anti-emulation • Time-based detection • Lower time bound on a serializing instruction • Prefetchers/Branch predictors • Time-lock puzzles • Make the emulator time out • Improper emulation of the execution environment • SEH-invoked unpacking stubs • srand()/rand() • Incomplete ISA support • Floating-point instructions
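The slide does not name a specific time-lock construction. As an illustration, the sketch below uses the classic repeated-squaring puzzle (Rivest, Shamir, Wagner): whoever runs the packed binary must perform t sequential modular squarings to obtain a value that could serve as the unpacking key, while the packer's author, who knows the factors of n, can compute it directly. The function names and the toy primes are assumptions chosen for the example.

```python
def solve_puzzle(n: int, t: int, base: int = 2) -> int:
    """Slow path: t sequential modular squarings, i.e. base**(2**t) mod n."""
    x = base % n
    for _ in range(t):
        x = (x * x) % n
    return x


def create_puzzle_key(p: int, q: int, t: int, base: int = 2) -> int:
    """Fast path for the puzzle author, who knows the prime factors p and q.
    Relies on Euler's theorem, so base must be coprime to p * q."""
    n, phi = p * q, (p - 1) * (q - 1)
    return pow(base, pow(2, t, phi), n)


# Toy parameters; a real puzzle would use large primes and a much larger t.
p, q, t = 10007, 10009, 1000
slow = solve_puzzle(p * q, t)
fast = create_puzzle_key(p, q, t)
```

Both paths yield the same value, but only the slow one is available at run time. Since an emulator is several times slower than native execution, a t tuned to take seconds natively can push the analysis past its timeout.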
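Anti-virtualization • Red Pill
/* 32-bit x86 only; the byte array must be executable for the call to work */
int swallow_redpill() {
    /* rpill holds: sidt [addr]; ret. The 4-byte address operand at rpill[3] is patched below. */
    unsigned char m[2+4], rpill[] = "\x0f\x01\x0d\x00\x00\x00\x00\xc3";
    *((unsigned*)&rpill[3]) = (unsigned)m;   /* make SIDT store the IDTR into m */
    ((void(*)())&rpill)();                   /* run the patched bytes */
    return (m[5] > 0xd0) ? 1 : 0;            /* m[5] is the top byte of the IDT base */
}
• Hypervisor present bit (CPUID leaf 0x1, ECX bit 31)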
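Both checks on this slide reduce to inspecting a few bits. The Python sketch below decodes the 6-byte value that 32-bit SIDT stores (a 16-bit limit followed by a 32-bit base) and applies the same threshold as the C code, then tests the hypervisor-present bit in a CPUID leaf 0x1 ECX value. The helper names and the sample byte strings are hypothetical; actually reading the IDTR or executing CPUID needs native code.

```python
import struct


def idt_base_from_sidt(m: bytes) -> int:
    """Decode 32-bit SIDT output: little-endian 16-bit limit, 32-bit base."""
    _limit, base = struct.unpack("<HI", m)
    return base


def redpill_says_vm(m: bytes) -> bool:
    """Same test as the C code: top byte of the IDT base above 0xd0."""
    return m[5] > 0xD0


def hypervisor_bit_set(cpuid_leaf1_ecx: int) -> bool:
    """CPUID leaf 0x1, ECX bit 31: hypervisor present."""
    return bool((cpuid_leaf1_ecx >> 31) & 1)


# Made-up sample values for illustration only.
relocated = struct.pack("<HI", 0x07FF, 0xFFC18000)  # IDT moved high in memory
native = struct.pack("<HI", 0x07FF, 0x80B95400)
```

With these samples, `redpill_says_vm(relocated)` is true and `redpill_says_vm(native)` is false. The threshold is a heuristic about where a virtual machine monitor relocates the guest's IDT, so it is tied to specific platforms.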
Review • Unpacking Frameworks • Static • Dynamic • Anti-Unpacking Strategies • Passive • Active Packing/Unpacking is a perpetual arms race