Malware Analysis and Instrumentation. Andrew Bernat and Kevin Roundy. Paradyn Project. Center for Computing Science June 14, 2011. Forensic analysts need help. 90% of malware resists analysis [1] Malware attacks cost billions of dollars annually [2]
Malware Analysis and Instrumentation
Andrew Bernat and Kevin Roundy
Paradyn Project
Center for Computing Science, June 14, 2011
Forensic analysts need help
• 90% of malware resists analysis [1]
• Malware attacks cost billions of dollars annually [2]
• 65% of users feel the effects of cybercrime [3]
• 69% of cybercrimes are resolved [3]
• 28 days on average to resolve a cybercrime [3]
[Figure: hex dump of a malware binary]
[1] McAfee, 2008. [2] Computer Economics, 2007. [3] Norton, 2010.
Forensic analysts need help: the needed toolbox
• Binary code identification
• Control- and data-flow analysis
• Instrumentation
• Effectiveness on malware
[Figure: hex dump of a malware binary]
Dyninst is a toolbox for analysts
Input: a program binary. Components: control flow analyzer, data flow analyzer, instrumenter.
Capabilities:
• machine language parsing
• CFG
• loop analysis
• forward & backward slices
• symbol table reading and writing
• loop, block, function, and instruction instrumentation
• function replacement
• library injection
• binary rewriting
• process control
• call stack walking
The analysis tool (the mutator) is built on top of Dyninst and its CFG. It:
• Specifies instrumentation
• Gets callbacks for runtime events
• Builds high-level analysis
Code snippets: printf(…), getTarget(insn), counter++, if (pred) callback(…)
Example analyses:
• Code visualizations
• Analysis of network communications
• Time bomb detection and analysis
• Identification of stolen data
• Reports on anti-analysis techniques
Dyninst on malware
• Problem: malware defeats static analysis and is sensitive to instrumentation
• SR-Dyninst: static-dynamic analysis (control flow analyzer, data flow analyzer) and a sensitivity-resistant instrumenter
The mutator (code snippets, callbacks, CFG) and the example analyses are those of the previous slide, now applied to a malware binary:
• Code visualizations
• Analysis of network communications
• Time bomb detection and analysis
• Identification of stolen data
• Reports on anti-analysis techniques
Outline
• Anti-analysis tricks (Anti)
• Hybrid static-dynamic analysis (H.A.)
• Sensitivity resistance (S.R.)
• Results (Res.)
Anti-analysis tricks (Anti)
Anti-analysis:
• Obfuscated control flow: indirect control flow, stack tampering, overlapping code, signal-based control flow
• Unpacked code: all-at-once, block-, loop-, or function-at-a-time, to empty or allocated space
• Overwritten code: single operand or opcode, whole instruction, function, code section, buffer
Anti-instrumentation:
• PC-sensitive code: call-pop pairs, return-address manipulation, call-stack tampering & probing
• Anti-patching: checksum whole regions, probe for patches, use code as data, move stack pointer
• Address-space probing: scans & probes of locations that should be unallocated
Obfuscated control flow (Anti)
[Figure: Storm worm control flow from the entry point; one node is labeled 40d002]
Unpacked code (Anti)
[Figure: Storm worm, entry point, and two copies of a hex dump illustrating unpacked bytes]
Overwritten code (Anti)
[Figure: Upack packer, entry point, and a hex dump in which a run of bytes is overwritten]
PC-sensitive code (Anti)
Local data access, e.g., ASProtect:
    call                 ; use call to get current PC
    data                 ; local data stored after the call
    pop esi              ; pop PC into register
    add esi, eax         ; construct pointer
    mov ebx, ptr[esi]    ; and dereference
Anti-patching (Anti)
Checksumming detects instrumentation [Aucsmith 96], e.g., PECompact.
Checksum routine over the protected code:
    xor eax, eax           ; calculate checksum of protected region
  .loop:
    add eax, ptr[ebx]
    add ebx, 4
    cmp ebx, 0x41000
    jne .loop
    cmp eax, .chksum       ; compare to expected value
    jne .fail              ; fail: instrumentation is detected
    jmp                    ; pass
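The checksum idea can be modeled in a few lines of Python. This is a minimal sketch, not PECompact's actual routine: the 16-byte region, its contents, and the 5-byte jmp patch are made-up values chosen only to show why a patched region no longer sums to the expected value.

```python
import struct

def checksum(memory: bytes, start: int, end: int) -> int:
    """Sum the 32-bit little-endian words in [start, end), wrapping at 2**32."""
    total = 0
    for offset in range(start, end, 4):
        (word,) = struct.unpack_from("<I", memory, offset)
        total = (total + word) & 0xFFFFFFFF
    return total

# Hypothetical 16-byte protected region; the byte values are arbitrary.
original = bytes(range(16))
expected = checksum(original, 0, 16)

# Model an instrumenter overwriting the first 5 bytes with a jmp rel32.
patched = b"\xe9\x10\x00\x00\x00" + original[5:]

def is_patched(memory: bytes) -> bool:
    """Mirror of the 'compare to expected value' step."""
    return checksum(memory, 0, 16) != expected
```

Here `is_patched(original)` is false and `is_patched(patched)` is true, which is the signal the malware uses to take its fail branch.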
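Address-space probing (Anti)
Memory scan: the program reads one address per page. Reads of its own code and data succeed, every other read should raise SIGSEGV, and a read that succeeds elsewhere reveals instrumentation.
    #include <setjmp.h>
    #include <signal.h>
    #include <unistd.h>
    static sigjmp_buf restart;
    static volatile char *ptr = 0;
    static void segv_handler(int sig) {
        (void)sig;
        siglongjmp(restart, 1);      /* unmapped page: return to the scan loop */
    }
    int main(void) {
        long pagesize = sysconf(_SC_PAGESIZE);
        signal(SIGSEGV, segv_handler);
        for (;;) {
            if (sigsetjmp(restart, 1) == 0)
                (void)*ptr;          /* probe: faults unless the page is readable */
            ptr += pagesize;         /* move to the next page either way */
        }
    }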
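What the scan reveals can be shown with a small model in which the address space is just a set of mapped page addresses. This is an illustrative sketch: the page addresses and the 16 MB scan limit are invented, and a failed membership test stands in for the SIGSEGV path.

```python
PAGE_SIZE = 0x1000

def scan(mapped_pages, limit):
    """Return the page addresses below `limit` where a read would succeed."""
    found = []
    addr = 0
    while addr < limit:
        if addr in mapped_pages:   # the read succeeds
            found.append(addr)
        # otherwise the read faults and the handler skips to the next page
        addr += PAGE_SIZE
    return found

program_pages = {0x400000, 0x401000}   # hypothetical code and data pages
instrumentation_pages = {0x900000}     # hypothetical location of injected code

before = scan(program_pages, 0x1000000)
after = scan(program_pages | instrumentation_pages, 0x1000000)
unexpected = sorted(set(after) - set(before))
```

`unexpected` holds exactly the pages that should have been unallocated, which is how a scanning program notices an instrumenter's footprint.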
Code discovery algorithm (H.A.)
Hybrid algorithm:
• Parse from known entry points
• Instrument control flow that may lead to new code
• Resume execution
[Figure: CFG with unresolved (?) targets; new code is reached through instrumented transfers (CALL ptr[eax]), code overwrites, and exceptions (DIV eax, 0)]
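The parse, instrument, resume cycle can be sketched on a toy program. Everything here is hypothetical: `PROGRAM`, its instruction kinds, and the `rounds` counter are illustration devices, and the real analysis works on machine code rather than a dictionary.

```python
# Hypothetical program: address -> (kind, operand)
#   "fall": continue at the next address     "jmp":  direct jump to operand
#   "ijmp": indirect jump; operand is the target it takes at run time
#   "halt": stop
PROGRAM = {
    0: ("fall", None),
    1: ("ijmp", 10),   # target is invisible to static analysis
    10: ("fall", None),
    11: ("ijmp", 20),
    20: ("halt", None),
}

def parse(entry, known):
    """Static pass: follow direct control flow; return unresolved indirect sites."""
    unresolved = set()
    work = [entry]
    while work:
        addr = work.pop()
        if addr in known or addr not in PROGRAM:
            continue
        known.add(addr)
        kind, operand = PROGRAM[addr]
        if kind == "fall":
            work.append(addr + 1)
        elif kind == "jmp":
            work.append(operand)
        elif kind == "ijmp":
            unresolved.add(addr)   # this site gets instrumented
    return unresolved

def hybrid_discover(entry):
    known = set()
    instrumented = parse(entry, known)
    rounds = 1                     # number of parsing passes so far
    pc = entry
    while True:
        kind, operand = PROGRAM[pc]
        if kind == "halt":
            return known, rounds
        if kind == "fall":
            pc += 1
            continue
        # Instrumentation fires at an unresolved site with a new target.
        if pc in instrumented and operand not in known:
            instrumented |= parse(operand, known)
            rounds += 1
        pc = operand               # resume execution

known, rounds = hybrid_discover(0)
```

Static parsing alone finds addresses 0 and 1; the two instrumented indirect jumps each trigger one more parsing pass at run time.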
Accurate parsing (H.A.)
• Standard control-flow traversal
  • start from known entry points
  • follow control flow to find code
• New conservative assumption
  • unresolved calls may not return
  • so we don't parse garbage code
• New stack tamper detection
  • backward slice at ret instruction
  • so we detect modified return addresses
Examples: call ptr[eax] followed by garbage bytes; pop ebp / inc ebp / push ebp / ret
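The effect of the conservative assumption can be shown with a toy traversal. This is a sketch under stated assumptions: `CODE` is a made-up instruction stream with one-byte instructions, and `assume_calls_return` is a hypothetical switch used only to contrast the two policies.

```python
# Hypothetical instruction stream: address -> mnemonic (one byte each).
CODE = {
    0x10: "push",
    0x11: "call_indirect",   # e.g. call ptr[eax]; target unknown statically
    0x12: "garbage",         # bytes after the call that never execute
}

def parse(entry, assume_calls_return):
    """Control-flow traversal from `entry`; returns the parsed addresses."""
    parsed = set()
    work = [entry]
    while work:
        addr = work.pop()
        if addr in parsed or addr not in CODE:
            continue
        parsed.add(addr)
        mnemonic = CODE[addr]
        if mnemonic == "ret":
            continue
        if mnemonic == "call_indirect" and not assume_calls_return:
            continue             # unresolved call may not return: stop here
        work.append(addr + 1)    # fall through to the next instruction
    return sorted(parsed)

conservative = parse(0x10, assume_calls_return=False)
optimistic = parse(0x10, assume_calls_return=True)
```

The optimistic traversal parses the garbage byte at 0x12; the conservative one stops at the unresolved call and leaves the fall-through to be discovered at run time if it is ever reached.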
Instrumentation-based discovery (H.A.)
• Invalid control transfers: call 401000, where the target lies in an invalid region
• Indirect control transfers: jmp eax; call ptr[eax]; push eax / ret
• Exception-based control transfers: xor eax, eax / mov ebx, ptr[eax], which transfers control to an exception handler
Instrumentation-based discovery (H.A.)
In the process, Dyninst replaces the unresolved call ptr[eax] with a jump to instrumentation:
    jmp 823456
Instrumentation:
    save state
    call findTarget(ptr[eax])
    restore state
    findTarget(targ) {
        if ( !cacheLookup(targ) )
            RPC_updateAnalysis(targ);
    }
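The point of the cache lookup in findTarget is that only previously unseen targets pay for a round trip to the analyzer. A minimal model follows; the assumption that the analysis update also adds the target to the cache is mine, and the target addresses are invented.

```python
analyzed_targets = set()   # stands in for the cache inside the instrumented process
analysis_updates = []      # records each (expensive) call out to the analyzer

def rpc_update_analysis(target):
    """Stand-in for RPC_updateAnalysis: parse at `target`, then cache it (assumed)."""
    analysis_updates.append(target)
    analyzed_targets.add(target)

def find_target(target):
    """Mirror of findTarget: only unseen targets trigger an analysis update."""
    if target not in analyzed_targets:
        rpc_update_analysis(target)

# A hypothetical sequence of run-time targets of one indirect call.
for t in [0x401000, 0x401000, 0x402000, 0x401000]:
    find_target(t)
```

Four executions of the call produce only two analysis updates, one per distinct target.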
Overwritten code discovery (H.A.)
When to update the CFG?
• Challenges
  • large incremental overwrites
  • writes to data
  • writes to own page
• Approach
  • delay the update until the write routine terminates
Update after overwrite:
• Handle overwrite signal
  • instrument write loop exits
  • copy overwritten page
  • restore write permissions
  • resume execution
• Update CFG when writes end
  • remove overwritten and unreachable blocks
  • parse at entry points to overwritten regions
  • remove write permissions
  • resume execution
[Figure: Dyninst's code write handler and CFG update routine receive callbacks (cb) from the process; code pages are R-X, and the page being written is RWX]
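The two-phase handling (react to the first write, update once the write loop exits) can be modeled as a small state machine. The `CodePage` class and its method names are hypothetical; a real implementation relies on page protection faults and instrumented loop exits rather than method calls.

```python
class CodePage:
    """Toy model of one code page whose write permission has been removed."""

    def __init__(self, data):
        self.data = bytearray(data)
        self.writable = False   # code pages start as R-X
        self.snapshot = None
        self.log = []

    def write(self, offset, value):
        if not self.writable:
            # Overwrite signal: copy the page, then restore write permission
            # so the rest of the write loop runs without further faults.
            self.snapshot = bytes(self.data)
            self.writable = True
            self.log.append("overwrite signal")
        self.data[offset] = value

    def writes_ended(self):
        """Called when the instrumented write loop exits."""
        changed = [i for i, (old, new) in enumerate(zip(self.snapshot, self.data))
                   if old != new]
        self.writable = False   # remove write permission again
        self.log.append("cfg update")
        return changed          # offsets whose blocks must be re-parsed

page = CodePage(b"\x90" * 8)
for i in range(2, 5):           # a three-byte incremental overwrite
    page.write(i, 0xCC)
changed = page.writes_ended()
```

Three writes cause a single fault and a single CFG update, which is the benefit of delaying the update until the write routine terminates.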
Behavior Changes S.R. Program modification affects local behavior These changes propagate Malware detects changes (or crashes) Malware Analysis and Instrumentation
Sensitivity Resistant Approach S.R. • Identify instructions sensitive to modification • Moved instructions that access the program counter • Memory operations that may access patched code • Memory operations that may scan the address space • Project effects on program behavior • Are output (or control flow) affected? • Use a forward slice and symbolic evaluation • Determine how to compensate for modification • E.g. by emulating the original instruction Malware Analysis and Instrumentation
PC-sensitivity analysis (S.R.)
Original code:
    main:
        call foo
        ...
        call next
        <data>
    next:
        pop %esi
        add %eax, %esi
        mov (%esi), %ebx
        jmp %ebx
    foo:
        ...
        ret
Relocated code (reloc_main):
        call foo
        ...
        push $next
        pop %esi
        add %eax, %esi
        mov (%esi), %ebx
        jmp %ebx
First sensitive instruction: call foo
• Slice: call foo; ret
• Symbolic expansion: pc = $retAddr + $delta
Second sensitive instruction: call next
• Slice: call next; pop %esi; add %eax, %esi; mov (%esi), %ebx; jmp %ebx
• Symbolic expansion: pc = [$next + %eax + $delta]
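Why a relocated call-pop pair misbehaves, and why replacing the call with a push of the original address compensates, can be seen with plain arithmetic. The addresses, the 5-byte call length, and the relocation distance below are assumptions made for illustration.

```python
CALL_LEN = 5   # assumed length of a near call with a 32-bit displacement

def call_pop(call_addr):
    """call next; pop reg  ->  reg holds the return address pushed by the call."""
    return call_addr + CALL_LEN

def push_pop(original_return_addr):
    """push $addr; pop reg  ->  reg holds `addr` regardless of where the code runs."""
    return original_return_addr

ORIG_CALL = 0x401000   # hypothetical original address of the call
DELTA = 0x50000        # hypothetical relocation distance ($delta)

expected = call_pop(ORIG_CALL)              # value the program was written for
naive = call_pop(ORIG_CALL + DELTA)         # relocated call, uncompensated
compensated = push_pop(ORIG_CALL + CALL_LEN)
```

The uncompensated value is off by exactly `DELTA`, so any pointer built from it (as in the second slice above) reads the wrong data; the push-based replacement reproduces the original value.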
Sensitivity Classes S.R. • PC (program counter) sensitive • Moved instruction that accesses the PC • CF (control flow) sensitive • Instruction whose control flow successor was moved • CAD (code as data) sensitive • Instruction that reads from overwritten memory • AVU (allocated vs. unallocated) sensitive • Instruction that accesses newly allocated memory Malware Analysis and Instrumentation
Visible Compatibility S.R. • What behavior do we need to preserve? • Allow localized changes that aren’t visible from outside the program • Preserve: • Output • Approximation: control flow Malware Analysis and Instrumentation
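Handling CAD sensitivity (S.R.)
Checksum routine (original code, now containing patch data):
    xor eax, eax
  .loop:
    add eax, ptr[ebx]
    add ebx, 4
    cmp ebx, 0x41000
    jne .loop
    cmp eax, .chksum
    jne .fail
Patched code: the memory read is replaced by a jump to instrumentation that emulates it against shadow memory:
    jmp 863828
    add ebx, 4
    cmp ebx, 0x41000
    jne .loop
Instrumentation:
    save state
    emulate (add eax, ptr[ebx])
    restore state
Result: the checksum is computed over shadow memory (the original bytes), so the routine takes the pass branch instead of the fail branch.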
Emulating memory (simplified) (S.R.)
• Save state
    push %eax
    push %ecx
    push %edx
    lahf
    push %eax
• Determine effective address
    lea <original>, %ebx
• Translate effective address
    call translate
• Restore state
    pop %eax
    sahf
    pop %edx
    pop %ecx
    pop %eax
• Emulate original memory instruction
    mov (%ebx), %ebx
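The translate step can be modeled as a pure function from program addresses to the addresses that actually hold the original bytes. In this sketch the patched range, the shadow base, and the memory contents are all invented; only the shape of the mapping is the point.

```python
CODE_START, CODE_END = 0x401000, 0x402000   # hypothetical patched code region
SHADOW_BASE = 0x7000000                     # hypothetical location of the saved copy

def translate(addr):
    """Redirect accesses to patched code into shadow memory; leave the rest alone."""
    if CODE_START <= addr < CODE_END:
        return SHADOW_BASE + (addr - CODE_START)
    return addr

# Toy memory: a patched byte, its original value in shadow memory, and plain data.
memory = {0x401000: 0xE9, 0x7000000: 0x55, 0x500000: 0x42}

def emulated_load(addr):
    """What the emulated memory instruction reads."""
    return memory[translate(addr)]
```

A load from 0x401000 returns the original byte 0x55 rather than the patch byte 0xE9, while a load from ordinary data is unaffected.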
The Devil in the Details S.R. • IA-32 is a rich instruction set • Most instructions can access memory • And malware uses a wide variety of them • Instruction classes: • Most common: MOD/RM byte • Less common: “string” operations • Least common: absolute address Malware Analysis and Instrumentation
String operations (S.R.)
• "String" instructions implicitly use ESI/EDI
  • scas/lods/stos/movs/cmps/ins/outs
• Some update ESI/EDI, making emulation tricky
• Malware loves these for copying blocks of memory
Original instruction:
    movs
Emulation:
    <save>
    mov %edi, %edx
    mov %esi, %ecx
    call TranslateShift
    add %edx, %edi
    add %ecx, %esi
    movs
    sub %edx, %edi
    sub %ecx, %esi
    <restore>
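The shift-then-unshift pattern can be modeled directly: compute how far each pointer must move to reach its translated address, run the string instruction on the shifted pointers, then remove the shift so the program still sees the register values it expects. This sketch assumes a byte-sized movs with the direction flag clear; `translate` and all addresses are hypothetical.

```python
CODE_START, CODE_END = 0x401000, 0x402000   # hypothetical patched code region
SHADOW_BASE = 0x7000000                     # hypothetical shadow copy location

def translate(addr):
    if CODE_START <= addr < CODE_END:
        return SHADOW_BASE + (addr - CODE_START)
    return addr

def emulate_movs(regs, memory):
    # TranslateShift: distance from each pointer to its translated address.
    src_shift = translate(regs["esi"]) - regs["esi"]
    dst_shift = translate(regs["edi"]) - regs["edi"]
    regs["esi"] += src_shift
    regs["edi"] += dst_shift
    # movs: copy one byte and advance both pointers.
    memory[regs["edi"]] = memory[regs["esi"]]
    regs["esi"] += 1
    regs["edi"] += 1
    # Undo the shift so the visible register values match an unpatched run.
    regs["esi"] -= src_shift
    regs["edi"] -= dst_shift

regs = {"esi": 0x401000, "edi": 0x500000}
memory = {0x401000: 0xE9, 0x7000000: 0x55}   # patched byte and its original value
emulate_movs(regs, memory)
```

The copy picks up the original byte from shadow memory, and ESI/EDI end up advanced by one from their starting values, exactly as an unpatched movs would leave them.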
Address-space scanning (S.R.)
Scan routine (original code, now containing patch data):
    xor eax, eax
  .loop:
    mov ptr[eax], ebx
    add eax, 4
    cmp eax, 0
    jne .loop
Patched code: the memory access is replaced by a jump to instrumentation:
    jmp 863828
    add eax, 4
    cmp eax, 0
    jne .loop
Instrumentation:
    save state
    emulate (mov ptr[eax], ebx)
    restore state
    call chk_mem
Exceptions are delivered to dyn_segv_handler, which is interposed ahead of the program's own segv_handler.
Exception handler interposition (S.R.)
Emulation code in which the fault occurs:
    push %eax
    push %ecx
    push %edx
    lahf
    push %eax
    lea <original>, %eax
    call translate
    pop %eax
    sahf
    pop %edx
    pop %ecx
    pop %eax
    mov (%eax), %eax
Exception record delivered by the Windows libraries to dyn_segv_handler:
• Faulting insn: <reloc_addr>
• Faulting addr: 0
• Registers
Exception record passed on to the program's segv_handler:
• Faulting insn: <orig_addr>
• Faulting addr: <eff_addr>
• Registers
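The record rewriting can be sketched as a wrapper around the program's handler. This is a model only: the dictionary-based record, the `RELOC_TO_ORIG` table, and both addresses are hypothetical stand-ins for the real exception structures.

```python
# Hypothetical map from relocated instruction addresses back to the originals.
RELOC_TO_ORIG = {0x901234: 0x401234}

def fix_exception_record(record, effective_addr):
    """Rewrite a record so it describes the original, unrelocated instruction."""
    fixed = dict(record)
    fixed["faulting_insn"] = RELOC_TO_ORIG.get(record["faulting_insn"],
                                               record["faulting_insn"])
    fixed["faulting_addr"] = effective_addr
    return fixed

def dyn_segv_handler(record, effective_addr, program_handler):
    """Interposed handler: repair the record, then call the program's handler."""
    return program_handler(fix_exception_record(record, effective_addr))

seen = []   # what the program's own handler observes
dyn_segv_handler({"faulting_insn": 0x901234, "faulting_addr": 0},
                 effective_addr=0x10000,
                 program_handler=seen.append)
```

The program's handler only ever sees the original instruction address and the effective address it tried to access, so it cannot tell that the faulting code had been moved.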
The packers we're studying (Res.)
Table columns: Packer, Malware market share [1], Obfuscated, Self-modifying, Anti-instrumentation, Dyninst, SR-Dyninst
    Packer              Malware market share [1]
    UPX                 9.45%
    PolyEnE             6.21%
    EXECryptor          4.06%
    Themida             2.95%
    PECompact           2.59%
    Upack               2.08%
    nPack               1.74%
    Aspack              1.29%
    FSG                 1.26%
    Nspack              0.89%
    Asprotect           0.43%
    Armadillo           0.37%
    Yoda's Protector    0.33%
    WinUPack            0.17%
    MEW                 0.13%
[The per-packer marks (yes, √, x) for the remaining columns, and the placement of the "anti-debugging techniques" annotation, are not recoverable from this transcript.]
[1] Packer (r)evolution. Panda Research, 2008. Two-month average, Feb-March 2008.
Sample malware analysis factory (Res.)
Input: 200 malware binaries, processed by SD-Dyninst
Instrumentation:
• comprehensive instrumentation
• network call instrumentation
Outputs:
• Control flow graph showing executed blocks
• Stack trace at 1st network communication
• Trace of Win API calls
• Defensive tactics report
  • unpacked code
  • overwritten code
  • control flow obfuscations
Factory results for Conficker A (Res.)
[Figure: Conficker A control flow graph showing the initial bootstrap code and the packed payload. Legend: unpacked block, static block, API func, non-executed block]
Factory results for Conficker A (Res.)
Instrument network calls and perform a stack-walk.
Stack-walk of Conficker's communications thread:
    Frame pc=0x100016f7 func: DYNstopThread at 0x100001670 [Dyninst]
    Frame pc=0x71ab2dc0 func: select at 0x71ab2dc0 [Win DLL]
    Frame pc=0x401f34 func: nosym1f058 at 0x41f058 [Conficker]
(We can also print stack-walks of Conficker's other threads.)
Improved Dyninst overhead Res. • Reduced relocation overhead despite emulation • Better handling of program features • Exceptions • Indirect control flow Malware Analysis and Instrumentation
Conclusion • SR-Dyninst gives you • All the benefits of Dyninst on malware • Safer instrumentation on normal binaries • Ongoing work • Anti-debugger techniques • More descriptive CFGs • Automated defensive-mode activation • SR-Dyninst in next Dyninst release Malware Analysis and Instrumentation