Mimimorphism: A New Approach to Binary Code Obfuscation • Zhenyu Wu, Steven Gianvecchio, Mengjun Xie • Advisor: Dr. Haining Wang
Malware Propagation & Detection • Internet & Ubiquitous Computing • Billions of networked computers • Playground for malware • Suppression Techniques • Static analysis • Low latency, high throughput • Widely used, IDS deployable • Dynamic analysis
The Game of Hide and Seek • Unique substring • Segments of the binary • Algorithmic detection • Built-in transformations • Statistical analysis • Anomalies in code body • Advanced pattern matching • N-gram signatures • Semantic analysis • Persistent high-level fingerprints • Un-obfuscated • Binary in plain • Oligomorphism • Simple transformation (XOR) • Polymorphism • Compression and encryption • Metamorphism • Meta transformation (P-code) • State of the Art • Control-flow encryption • Byte frequency manipulation
Fugitive On The Run WANTED $5,000,000
Fugitive On The Run • Polymorphism • Compression & Encryption • Nobody looks like a small dark box!
Fugitive On The Run • Metamorphism • Reordering Components • Cannot evade feature detections
Fugitive On The Run • Control Flow Encryption • Prevent feature analysis • Increases suspicion
Fugitive On The Run • The Real Player • Assume other people’s identity (Mimicry)
Fugitive On The Run • Lessons Learned: • Evasion without obfuscating features • Evasion by refusing inspection • Evasion by mimicking • Obfuscating original features • Open to inspection, but disguises detection
Binary Executable Mimicry • Mimimorphism: • Reversible transformation of an executable that produces output statically resembling benign programs • Characteristics: • Completely erases features of the original binary • High-order statistics match those of benign executables • Transformed payload consists of “meaningful” control flows that closely resemble those of benign executables
Mimic Functions • Text Steganography Technique • Transforms the input data into mimicry output that assumes the statistical and grammatical (structural) properties of another type of data • Originally proposed by Peter Wayner as a means to transport sensitive data under harsh surveillance • Novel use of Huffman coding
Mimic Functions • Huffman Coding • Digesting • Builds a Huffman tree according to symbol frequency • Encoding • Removes redundancy from the input data using a given Huffman tree • Decoding • Recovers the original data from the “condensed” data by emitting symbols according to the original Huffman tree • [Figure: Huffman tree for “mass” with codes s = 1, m = 00, a = 01; “mass” (32 bits) encodes to 000111 (6 bits)]
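The three Huffman steps on this slide (digest, encode, decode) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `build_codes`, `encode` and `decode` are hypothetical, and the tie-breaking rule is an arbitrary choice, so the exact bit patterns may differ from the slide's figure even though the code lengths agree.

```python
import heapq
from collections import Counter
from itertools import count


def build_codes(sample):
    """Digest: derive a prefix-free code table from symbol frequencies."""
    tiebreak = count()  # unique counter so the heap never compares nodes
    heap = [(weight, next(tiebreak), symbol)
            for symbol, weight in Counter(sample).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w0 + w1, next(tiebreak), (left, right)))
    codes = {}

    def walk(node, path):
        if isinstance(node, tuple):      # internal node: (left, right)
            walk(node[0], path + "0")
            walk(node[1], path + "1")
        else:                            # leaf: a single symbol
            codes[node] = path or "0"

    walk(heap[0][2], "")
    return codes


def encode(text, codes):
    """Encode: replace each symbol by its variable-length code."""
    return "".join(codes[symbol] for symbol in text)


def decode(bits, codes):
    """Decode: consume bits until they match a code, then emit its symbol."""
    by_code = {code: symbol for symbol, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in by_code:
            out.append(by_code[current])
            current = ""
    return "".join(out)


codes = build_codes("mass")
bits = encode("mass", codes)
print(len(bits), decode(bits, codes))
```

The most frequent symbol, `s`, receives a one-bit code and the other two symbols receive two-bit codes, which is how four 8-bit characters shrink to 6 bits.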
Mimic Functions • What if we decode a piece of random data? • Produces “meaningless” data, but • The output exhibits symbol frequencies similar to the digest, and • The input data can be recovered by Huffman encoding • Regular Mimic Function • Learn: Build a Huffman tree from sample text • Mimicry: Huffman decode the (randomized) input • Recover: Huffman encode
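A regular mimic function can be sketched by running the Huffman decoder on arbitrary bits. The example below is a toy under stated assumptions: the code table `CODES` is hard-coded (chosen so that “mass” maps to 000111, as in the earlier example), and `mimic` / `recover` are hypothetical names. Bits left over at the end that do not complete a code are returned separately so that the round trip is exact.

```python
import random

# Assumed code table, consistent with "mass" -> 000111.
CODES = {"s": "1", "m": "00", "a": "01"}
SYMBOLS = {code: symbol for symbol, code in CODES.items()}


def mimic(bits):
    """Huffman-decode arbitrary bits; return (text, leftover bits)."""
    text, current = [], ""
    for bit in bits:
        current += bit
        if current in SYMBOLS:
            text.append(SYMBOLS[current])
            current = ""
    return "".join(text), current


def recover(text, leftover=""):
    """Huffman-encode the mimicry text to get the original bits back."""
    return "".join(CODES[symbol] for symbol in text) + leftover


rng = random.Random(0)                       # stand-in for randomized input
payload = "".join(rng.choice("01") for _ in range(4000))
text, leftover = mimic(payload)

share_s = text.count("s") / len(text)
print(text[:40], round(share_s, 2))
```

Because `s` sits one level below the root, roughly half of the emitted symbols are `s`, while `m` and `a` each take about a quarter: the output inherits the frequencies implied by the tree, and encoding it returns the payload unchanged.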
Mimic Functions • Insufficiencies • Produces illegible, garbled text • Frequency distributions follow a 2^-n distribution (each symbol's frequency is a negative power of two) • High-order Mimic Function • Captures interdependencies • Builds multiple Huffman trees (a Huffman “forest”) • One for each unique symbol prefix • Produces “sensible” text with much more “natural” symbol frequency distributions • [Figure: Huffman “forest” with one tree per prefix, e.g. “chi” → c, l, n; “rou” → t, n, g; “ins” → p, t]
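The high-order idea, one Huffman tree per symbol prefix, can be sketched as follows. This is an illustrative toy on text, not the deck's engine: `huffman`, `digest`, `mimic`, `recover`, `SAMPLE` and `ORDER` are hypothetical names, the sample is treated as circular so every prefix has a successor, a prefix with a single successor costs zero bits, and the final code is zero-padded (the payload length is passed to `recover` to strip the padding). `ORDER` must be at least 1 and shorter than the sample.

```python
import heapq
import random
from collections import Counter, defaultdict
from itertools import count


def huffman(freqs):
    """Return {symbol: code}; a lone symbol gets the empty code."""
    tiebreak = count()
    heap = [(w, next(tiebreak), s) for s, w in sorted(freqs.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, a = heapq.heappop(heap)
        w1, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (w0 + w1, next(tiebreak), (a, b)))
    codes, stack = {}, [(heap[0][2], "")]
    while stack:
        node, path = stack.pop()
        if isinstance(node, tuple):
            stack.append((node[0], path + "0"))
            stack.append((node[1], path + "1"))
        else:
            codes[node] = path
    return codes


def digest(sample, order):
    """Build the forest: one code table per `order`-symbol prefix."""
    followers = defaultdict(Counter)
    wrapped = sample + sample[:order]          # circular sample
    for i in range(len(sample)):
        followers[wrapped[i:i + order]][wrapped[i + order]] += 1
    forest = {prefix: huffman(c) for prefix, c in followers.items()}
    if all(len(codes) == 1 for codes in forest.values()):
        raise ValueError("sample has no branching prefix at this order")
    return forest


def mimic(bits, forest, seed):
    """Decode bits; the last len(seed) symbols select the tree to walk."""
    out, pos, order = seed, 0, len(seed)
    while pos < len(bits):
        table = {code: s for s, code in forest[out[-order:]].items()}
        current = ""
        while current not in table:
            current += bits[pos] if pos < len(bits) else "0"  # pad the tail
            pos += 1
        out += table[current]
    return out


def recover(text, forest, order, n_bits):
    """Re-encode each symbol with the tree of the prefix that precedes it."""
    bits = "".join(forest[text[i - order:i]][text[i]]
                   for i in range(order, len(text)))
    return bits[:n_bits]


SAMPLE = ("the mimic function learns the sample text and then the decoder "
          "emits text that resembles the sample text ")
ORDER = 3
forest = digest(SAMPLE, ORDER)
rng = random.Random(7)
payload = "".join(rng.choice("01") for _ in range(256))
text = mimic(payload, forest, SAMPLE[:ORDER])
print(text)
```

Every run of `ORDER + 1` characters in the output also occurs in the sample, which is why higher orders read as more “sensible” than the single-tree version.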
Mimicry Text Sample • Mimicry of Peter Wayner’s paper • Produced by 6th order mimic function Each of these historical reason, I don’t recommend using gA(t) to choose the safe. These one-to-one encoded with n leaves and punctuation. The starting every intended to find the same order mimic files. A Method is to break the trees by constructing the mimics the path down the most even though, offer no way that is, in this paper. Figure will not overflow memory. These produced by truncating letter. This need to handle n-th ordered compartment of nonsense words cannot bear any resemblance to B because this task is a Huffman showed in [1], [2], [3] among others.
Mimimorphism • The Challenge: Machine Language Mimicking • Consists of instructions and control flows • Each instruction has a strict format to follow • Machines never make a “typo” or use the wrong “tense”! • Mimic function has no knowledge of instructions • Often makes mistakes when generating instructions • Has a low success rate of creating mimicry control flows • Our Solution • Integrate a custom assembler / disassembler • Helps the mimic function understand the language
Mimimorphism: Digesting • Pipeline: executable binaries (mimicry target) → disassemble → high-order instruction mimic function → mimicry digest (instruction Huffman forest, control flows)
Mimimorphism: Digesting • Each disassembled instruction is parsed into a COMMON_INST structure • Inst. Prefixes (Atomic op., repeat, operand size, etc.) • ModR/M (Mod / Reg. / R/M) • SIB (Scale / Idx. / Base) • Displacement • The instruction prefix (the preceding instructions, e.g. XOR, PUSH, DEC) selects an instruction Huffman tree that records which instruction follows (e.g. MOV, PUSH, INC, DEC) • Each instruction has an instruction encoding template: Huffman trees over its field values (e.g. Inst. Prefix: 16bit, REP; ModR/M: EAX, ECX, EDX; SIB: 2x8+16, 3x4+0; Displacement) • The instruction prefix then advances by one instruction and the process repeats
Mimimorphism: Digesting • [Figure sequence: the mimimorphic digest, with one instruction Huffman tree per instruction prefix (leaves such as JMP, CALL, CMP, INC, MOV, XCHG, PUSH)]
Mimimorphism: Encoding • Pipeline: binary data → PRNG → high-order instruction mimic function (using the mimicry digest) → assemble → mimicry binaries
Mimimorphism: Encoding • The input bits (e.g. 01001001100101010001010010001001) drive a walk down the instruction Huffman tree selected by the current instruction prefix, choosing the next instruction (e.g. MOV) • Further bits walk that instruction's encoding template to choose its field values (e.g. Inst. Prefix: 16bit; ModR/M: ECX; SIB: 3x4+0) • The chosen values fill a COMMON_INST structure, which is assembled into machine code • The emitted instruction joins the instruction prefix, and encoding continues with the next tree
Mimimorphism: Encoding • [Figure sequence: input bits selecting an instruction and its fields, one Huffman tree at a time]
Mimimorphism: Decoding • Pipeline: mimicry binaries → disassemble → high-order instruction mimic function (using the mimicry digest) → PRNG → binary data
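The PRNG stage matters because Huffman decoding only reproduces the target frequencies when its input bits are close to uniformly random, and real payloads rarely are. The slides do not say how the randomization is done; the sketch below assumes a simple XOR with a seeded keystream, which is its own inverse, so the same step can sit at the front of encoding and at the end of decoding. The name `whiten` and the seed value are hypothetical, and Python's `random` module is used for illustration only (it is not a cryptographic generator).

```python
import random


def whiten(data: bytes, seed: int) -> bytes:
    """XOR data with a seeded pseudo-random keystream (self-inverse)."""
    rng = random.Random(seed)
    return bytes(byte ^ rng.getrandbits(8) for byte in data)


payload = b"\x00" * 64                 # highly non-random input
randomized = whiten(payload, seed=1234)
restored = whiten(randomized, seed=1234)
print(randomized.hex()[:32], restored == payload)
```

Applying `whiten` twice with the same seed returns the original bytes, so only the seed needs to be shared between the two directions of the transformation.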
Experimental Setup • Training • Select 100 Windows XP system files as mimicry target • They represent typical legitimate binaries • Trained using 7th and 8th order mimimorphic engines • Most control flow basic blocks have 7-8 instructions • Evaluations • Statistical Anomaly Tests • Kolmogorov-Smirnov Test & Entropy Test • Semantic Detection Test • Control Flow Fingerprinting
Evaluation Results • Statistical Tests • Kolmogorov-Smirnov Test • Maximum byte frequency distribution differences • Legitimate: 0.074±0.045; Mimimorphic: 0.093±0.006 • Entropy Test • Measurement of predictability (or randomness) of data • Legitimate: 6.353±0.258; Mimimorphic: 6.528±0.021
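Both statistics can be computed directly from byte frequencies. The sketch below is one plausible reading of the two tests, not necessarily the exact procedure behind the numbers above: `entropy` is the Shannon entropy in bits per byte (0 to 8), and `ks_statistic` is the two-sample Kolmogorov-Smirnov statistic, i.e. the largest gap between the cumulative byte-value distributions of two files. All function names are hypothetical.

```python
import math
from collections import Counter


def byte_distribution(data: bytes):
    """Relative frequency of each byte value 0..255."""
    counts = Counter(data)
    return [counts.get(value, 0) / len(data) for value in range(256)]


def entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte."""
    return -sum(p * math.log2(p) for p in byte_distribution(data) if p > 0)


def ks_statistic(a: bytes, b: bytes) -> float:
    """Largest absolute gap between the two cumulative distributions."""
    gap = cum_a = cum_b = 0.0
    for pa, pb in zip(byte_distribution(a), byte_distribution(b)):
        cum_a += pa
        cum_b += pb
        gap = max(gap, abs(cum_a - cum_b))
    return gap


uniform = bytes(range(256))
constant = b"a" * 256
print(entropy(uniform), entropy(constant), ks_statistic(uniform, constant))
```

A file whose entropy and KS distance fall inside the range observed for legitimate binaries would not stand out under this kind of statistical anomaly test.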
Evaluation Results • Semantic Tests • Control Flow Fingerprinting • Statically analyzes executables (with a special disassembler) and extracts control flow patterns • Detects malware by matching characteristic control flow patterns (i.e., shared fingerprints) • Between original binary and mimimorphic instances • Shared fingerprints: the lower the better • Only 1 out of 100 instances shares a single fingerprint (out of hundreds of thousands of fingerprints)
Evaluation Results • Semantic Tests • Between mimimorphic and legitimate binaries • Shared fingerprints: the higher the better • 7th order mimimorphic instances: • Average 1856.46±372.5 (72.93 benign files) • Minimum 1057 (44 files); Maximum 3321 (92 files) • 8th order mimimorphic instances: • Average 11407.99±912.42 (81.37 benign files) • Minimum 9606 (70 files); Maximum 14216 (91 files)
Evaluation Results • Semantic Tests • A sample mimicry control flow pattern • Reproduced by a 7th order mimimorphic instance
Limitations & Discussions • Application Constraint • Memory consumption: 600MB for 7th order and 1.2GB for 8th order mimimorphic transformation • Disk-based on-demand digest storage • Size increase: 20x inflation for 7th order and 30x for 8th order mimimorphic transformation • Typical malware is smaller than 100KB • Mimimorphism results in 2~3MB files
Conclusion • We propose mimimorphism as a novel binary obfuscation technique • Enhanced high-order mimic functions with a custom assembler / disassembler • Achieves evasion by disguise rather than by refusing inspection • Effective against both statistical anomaly detection and semantic fingerprinting tests
Limitations & Discussions • Robustness against other approaches • Automatic n-gram detections • Typical x86 instruction length: 2.1~2.8 bytes • 8th order mimimorphism can approach 16-gram mimicry • Existing n-gram detection algorithms can hardly scale up to • Static semantic analysis • Mimimorphism does not target specific detection techniques • Focuses on reproducing features from benign programs • Immune to lower order signature detections
Limitations & Discussions • Robustness against other approaches • Deep syntactic analysis • Fails to exactly reproduce high-level syntactic features: • 45% of “functions” do not have a matching prologue and epilogue • Many jump instructions cross function boundaries • Detectable program-level anomalies • Not all programs follow conventions • Could lead to false positives
Limitations & Discussions • The Problem of the Unpacker • Mimimorphic transformation does not provide a solution for hiding the unpacker • However, we believe unpackers do benefit from using mimimorphism • The unpacker is the weakness of polymorphism because it is easy to spot: the rest of the payload is not executable! • All mimimorphic payload is “executable”, so separating unpacker code from the payload becomes non-trivial
Mimimorphism: Decoding • The mimicry binary is disassembled, one instruction at a time, into COMMON_INST structures • Inst. Prefixes (Atomic op., repeat, operand size, etc.) • ModR/M (Mod / Reg. / R/M) • SIB (Scale / Idx. / Base) • Displacement • The instruction Huffman tree selected by the current instruction prefix yields the bits that encode each instruction (e.g. MOV) • The instruction's encoding template yields the bits for its field values (e.g. 16bit, ECX, 3x4+0) • The decoded bits are concatenated to rebuild the binary data, and the instruction prefix advances by one instruction
Mimimorphism: Decoding • [Figure sequence: decoded bits accumulating as each instruction and its fields are looked up in the Huffman trees]