Eureka: A Framework for Enabling Static Malware Analysis
The 13th European Symposium on Research in Computer Security (ESORICS), 2008
WANG Zhi
Outline
1. Overview of Generic Unpacker
2. System Call Level Heuristic
3. Statistics-Based Unpacking
4. Evaluation Metrics
Overview of Unpackers
• Static analysis: decompile the binary and analyze the logical structure, control flow, and data stored within the binary itself.
• Dynamic analysis: monitor the behavior of the malware binary at runtime.
  • Fine-grained monitoring (instruction level)
  • Coarse-grained monitoring (page level)
Generic Automatic Unpackers
Unpacker     Granularity          Trigger                      Speed
PolyUnpack   Instruction level    Model-based                  Slow
Renovo       Instruction level    Heuristic                    Slow
OmniUnpack   Page level           Heuristic                    Fast
Eureka       System-call level    Heuristic and statistical    Fast
• The variability in unpacking strategies comes from the granularity at which unpacking behavior is tracked.
Eureka
• Coarse-grained execution tracing: triggered by NtTerminateProcess and NtCreateProcess
• Statistical bigram analysis
Coarse-grained Execution Tracing
• Eureka uses the program's exit event as an unpacking trigger.
• By the time the process calls NtTerminateProcess, the malicious payload has most likely been unpacked (decrypted) successfully.
• A large fraction of current malware spawns a new process (via NtCreateProcess) to execute the unpacked malicious payload, so process creation serves as a second trigger.
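The trigger logic above can be sketched as a filter over a stream of system-call events. This is a minimal illustration, not Eureka's implementation: the `SyscallEvent` record and the `dump_memory` callback are hypothetical stand-ins for whatever the system-call tracer actually provides.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


# Hypothetical system-call event record; a real tracer would supply more fields.
@dataclass
class SyscallEvent:
    pid: int
    name: str


# The two trigger system calls named in the heuristic above.
TRIGGER_SYSCALLS = {"NtTerminateProcess", "NtCreateProcess"}


def trace_and_dump(events: Iterable[SyscallEvent],
                   dump_memory: Callable[[int], bytes]) -> List[bytes]:
    """Return one memory snapshot per event that matches a trigger.

    dump_memory is a hypothetical callback that snapshots the address space
    of the given process before the system call is allowed to proceed.
    """
    snapshots = []
    for event in events:
        if event.name in TRIGGER_SYSCALLS:
            snapshots.append(dump_memory(event.pid))
    return snapshots
```

Because only two system calls are intercepted, the per-event cost is a set lookup, which is what keeps this approach fast compared with instruction-level tracing.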
Problems
• Not all malware exits; some keeps an executing version resident in memory.
• Packers can generate spurious process-creation events.
• Malware authors can simply avoid exiting the malware process.
• Although these two simple heuristics may work for a large fraction of today's malware (as much as 80%), they may not work for future malware.
Statistical bigram analysis
• Mine statistical patterns in x86 code using simple n-gram analysis.
• Use IDA Pro to extract the regions of executables that were marked as functions.
• Look for the most common bigrams (opcode pairs or 2-byte opcodes) and spaced bigrams (byte pairs separated by one or more bytes).
• Finding: FF 15 (call), FF 75 (push), E8---00, and E8---FF are prevalent in x86 code.
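A minimal sketch of counting the four patterns listed above in a raw byte buffer. It assumes that "E8---00" and "E8---FF" mean an E8 byte followed, three bytes later, by 00 or FF (the high byte of a near call's 32-bit little-endian displacement); the function name `count_bigrams` and the labels are illustrative only.

```python
from collections import Counter

# Adjacent-byte bigrams reported as prevalent in x86 code.
ADJACENT = {(0xFF, 0x15): "FF 15 (call)", (0xFF, 0x75): "FF 75 (push)"}

# Spaced bigrams: E8 followed, after three intervening bytes, by 00 or FF
# (assumed to be the high byte of a call's 32-bit displacement).
SPACED_GAP = 3
SPACED = {(0xE8, 0x00): "E8---00", (0xE8, 0xFF): "E8---FF"}


def count_bigrams(data: bytes) -> Counter:
    """Count adjacent and spaced bigram occurrences in a byte buffer."""
    counts = Counter()
    # Adjacent pairs: data[i], data[i + 1].
    for i in range(len(data) - 1):
        label = ADJACENT.get((data[i], data[i + 1]))
        if label:
            counts[label] += 1
    # Spaced pairs: data[i], data[i + SPACED_GAP + 1].
    step = SPACED_GAP + 1
    for i in range(len(data) - step):
        label = SPACED.get((data[i], data[i + step]))
        if label:
            counts[label] += 1
    return counts
```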
Bigram Counts
• Bigram counts during execution of a goat file (a benign test executable) packed with ASPack
Bigram Counts
• Bigram counts during execution of a goat file packed with MoleBox
Bigram Counts
• Bigram counts during execution of a goat file packed with Armadillo
Bigram Counts
• The bigram counts shift consistently and significantly as the payload is unpacked.
• The simple bigram-counting approach had a success rate of over 95% in distinguishing between packed and unpacked malware instances.
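One way to turn these count shifts into an unpacking decision is to score each memory snapshot taken during execution and keep the highest-scoring one. The sketch below assumes that selection rule and an unnormalized score; both are illustrative choices, not necessarily the exact rule Eureka uses, and `bigram_score` / `pick_unpacked_snapshot` are hypothetical names.

```python
from typing import Sequence


def bigram_score(data: bytes) -> int:
    """Total occurrences of FF 15, FF 75, E8---00 and E8---FF in data."""
    score = 0
    # Adjacent bigrams FF 15 and FF 75.
    for i in range(len(data) - 1):
        if data[i] == 0xFF and data[i + 1] in (0x15, 0x75):
            score += 1
    # Spaced bigrams: E8, three bytes, then 00 or FF.
    for i in range(len(data) - 4):
        if data[i] == 0xE8 and data[i + 4] in (0x00, 0xFF):
            score += 1
    return score


def pick_unpacked_snapshot(snapshots: Sequence[bytes]) -> int:
    """Index of the snapshot with the highest bigram score (assumed rule).

    Ties resolve to the earliest snapshot.
    """
    if not snapshots:
        raise ValueError("no snapshots to choose from")
    scores = [bigram_score(s) for s in snapshots]
    return max(range(len(scores)), key=scores.__getitem__)
```

A real tool might normalize the score by snapshot size so that larger images are not favored simply for containing more bytes.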
Evaluation Metrics
• Code-to-data ratio
  • An observable difference between packed and unpacked code is the amount of identifiable code and data found in the binary.
  • Use IDA Pro to identify valid code sequences.
  • In IDA Pro, data are represented by db, dw, or dd.
  • In packed executables, the ratio is below 3%.
  • In unpacked executables, the ratio is above 50%.
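A minimal sketch of the metric, assuming the disassembly has been reduced to (mnemonic or directive, size in bytes) pairs and that the ratio is computed as code bytes over all classified bytes; both the input format and that exact definition are assumptions. The 3% and 50% thresholds come from the slide above.

```python
from typing import Iterable, Tuple

# IDA Pro marks data items with these directives (byte, word, dword).
DATA_DIRECTIVES = {"db", "dw", "dd"}


def code_ratio(items: Iterable[Tuple[str, int]]) -> float:
    """Fraction of classified bytes that belong to code items.

    items: (mnemonic_or_directive, size_in_bytes) pairs; this input format
    is a hypothetical simplification of a disassembly listing.
    """
    code = data = 0
    for mnemonic, size in items:
        if mnemonic.lower() in DATA_DIRECTIVES:
            data += size
        else:
            code += size
    total = code + data
    return code / total if total else 0.0


def classify(ratio: float) -> str:
    # Thresholds from the slide: packed below 3%, unpacked above 50%.
    if ratio < 0.03:
        return "likely packed"
    if ratio > 0.50:
        return "likely unpacked"
    return "inconclusive"
```

Ratios between the two thresholds are reported as inconclusive rather than forced into either class.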
Code-to-data ratio
[Figure: packed vs. unpacked]
Code-to-data ratio
[Figure: memory space of the original notepad.exe and of the packed notepad.exe; grey areas represent data, blue areas represent code]