Alphapack Jenna Kallaher Costas Akrivoulis Raul Gonzalez
Agenda • Overview • Architecture • Demo • Security • Performance • Future Work
Overview • What is code virtualization? • VMProtect, Themida • New language, compiler & emulator • What is our take on code virtualization? • Randomly reassign opcodes within X86 • Based on QEMU & LLVM
Architecture
[Architecture diagram. Generator stages: Language Generator, Compiler Generator, Raw Bytecode Generator, Emulator Generator, Packed Binary Generator. Intermediate artifacts: Mapping, Compiler, Raw Bytecode, Emulator. Input: Input.c. Output: Packed Binary.]
Language Generation
Opcode Translation • Why can't we randomly reassign all opcodes? • Many opcodes are related • DIV: 0xF6, 0xF7 • Opcodes have ranges (register offsets) • INC: 0x40...0x47 • Classify opcodes into buckets • Can reassign opcodes within buckets
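The bucket idea above can be sketched roughly as follows. This is only an illustration, not Alphapack's actual generator: the bucket contents, the `make_mapping` name, and the choice to move whole register ranges together are all assumptions. The INC range comes from the slide; the DEC, PUSH, POP and flag opcodes are the standard 32-bit X86 one-byte encodings.

```python
import random

# Hypothetical buckets. Each bucket lists opcode *base* values assumed to be
# interchangeable, plus the width of the range that each base covers.
BUCKETS = [
    # Register-range opcodes: base + register offset (0..7), 32-bit mode.
    {"width": 8, "bases": [0x40, 0x48, 0x50, 0x58]},  # INC, DEC, PUSH, POP r32
    # Single-byte opcodes without operands (illustrative choice).
    {"width": 1, "bases": [0xF8, 0xF9, 0xFC, 0xFD]},  # CLC, STC, CLD, STD
]


def make_mapping(buckets, seed):
    """Return a dict mapping each original opcode to its reassigned opcode."""
    rng = random.Random(seed)
    mapping = {}
    for bucket in buckets:
        shuffled = bucket["bases"][:]
        rng.shuffle(shuffled)  # permute only within this bucket
        for old_base, new_base in zip(bucket["bases"], shuffled):
            # Move the whole range so the register offset is preserved.
            for offset in range(bucket["width"]):
                mapping[old_base + offset] = new_base + offset
    return mapping


mapping = make_mapping(BUCKETS, seed=1234)
```

Because every bucket is shuffled onto itself, the result is a permutation of the listed opcodes, and an opcode such as INC with register offset 3 always lands on its new base plus 3.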
Opcode Translation Cont. • Challenge: Opcode dependencies • LLVM’s JIT compiler emits hardcoded bytes • 0xE8, 0x66, 0x0F, ... • Many instructions share the same opcode • 0x90: PFCMPGE, SETO, NOOP, XCHG16ar, XCHG32ar, XCHG32ar64, XCHG64ar, PAUSE, VPGATHERDQ, VPGATHERDD, VPROTB • Manual resolution of errors is required
Compiler Generation
LLVM Modification • Opcode translations added to TableGen files • LLVM recompiled • Recompiling LLVM is the most time-consuming operation in our process
Bytecode Generation
Source Code Compilation • Source code compiled w/ modified Clang • Linker inserts pre-compiled code • Not compiled with our Clang • Cannot translate all opcodes in the .text region • User code address range is recorded • Constrains when translation is turned on/off • Blacklist addresses based on function name
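One way to picture the on/off decision described above is a small policy object that checks an address against the recorded user-code range and a blacklist derived from function names. This is a sketch under stated assumptions: `TranslationPolicy`, `build_policy`, the symbol table, and every address below are hypothetical and not taken from Alphapack.

```python
from dataclasses import dataclass, field


@dataclass
class TranslationPolicy:
    user_start: int                                 # first address of user code
    user_end: int                                   # one past the last address
    blacklist: list = field(default_factory=list)   # (start, end) ranges to skip

    def should_translate(self, addr):
        """True if the opcode at addr was emitted with reassigned opcodes."""
        if not (self.user_start <= addr < self.user_end):
            return False  # linker-inserted, pre-compiled code
        return not any(lo <= addr < hi for lo, hi in self.blacklist)


def build_policy(user_range, symbols, blacklisted_names):
    """Turn blacklisted function names into address ranges via a symbol table."""
    start, end = user_range
    skipped = [symbols[name] for name in blacklisted_names if name in symbols]
    return TranslationPolicy(start, end, skipped)


# Hypothetical symbol table: function name -> (start, end) address range.
symbols = {"main": (0x1000, 0x1080), "helper": (0x1080, 0x10C0)}
policy = build_policy((0x1000, 0x2000), symbols, ["helper"])
```

With this setup, `policy.should_translate(0x1000)` is true (inside `main`), while an address inside `helper` or outside the user range is left untranslated.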
Emulator Generation
QEMU Modification • 3,500-line case statement in translate.c • Case statements reordered • “Undo” LLVM compile-time translations • Challenges • QEMU doesn't support all syscalls (fork, futex) • Nested cases/opcodes • Floating point
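The effect of reordering the case labels can be illustrated with a toy dispatch table. This is not QEMU code: the two-opcode machine, the `FORWARD` mapping, and the `relabel` and `run` helpers are assumptions chosen to show how relabeling the decoder undoes the compile-time translation.

```python
# Hypothetical compile-time mapping (original opcode -> reassigned opcode).
FORWARD = {0x40: 0x48, 0x48: 0x40}  # swap INC EAX and DEC EAX


def inc_eax(state):
    state["eax"] += 1


def dec_eax(state):
    state["eax"] -= 1


# Handlers keyed by the *original* opcode, like the stock case labels.
HANDLERS = {0x40: inc_eax, 0x48: dec_eax}


def relabel(handlers, forward):
    """Re-key each handler under the opcode the modified compiler emits."""
    return {forward.get(op, op): fn for op, fn in handlers.items()}


def run(code, handlers):
    """Execute a byte sequence on a one-register toy machine."""
    state = {"eax": 0}
    for byte in code:
        handlers[byte](state)
    return state["eax"]


original = [0x40, 0x40, 0x48]                 # inc, inc, dec
packed = [FORWARD.get(b, b) for b in original]  # what the packed binary holds
result = run(packed, relabel(HANDLERS, FORWARD))
```

Running the packed bytes through the relabeled table gives the same result as running the original bytes through the stock table, which is the property the modified emulator needs.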
Demo
Security
Anti-RE • We have it • Not what we are here to discuss today
Brute-Force • How many unique “languages” can we create? • How long would it take a brute-forcer to find the “right” opcode translation? • Assuming an oracle responds in 1ns
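The two questions above reduce to simple arithmetic once bucket sizes are known. If opcodes are only permuted within buckets, a bucket of size n contributes n! orderings and the buckets multiply. The sketch below assumes that model; the bucket sizes are hypothetical placeholders, not the figures from the talk, and the function names are illustrative.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def count_languages(bucket_sizes):
    """Number of distinct mappings when each bucket is permuted independently."""
    return math.prod(math.factorial(n) for n in bucket_sizes)


def brute_force_years(bucket_sizes, seconds_per_guess=1e-9):
    """Worst-case time to try every mapping against an oracle, in years."""
    seconds = count_languages(bucket_sizes) * seconds_per_guess
    return seconds / SECONDS_PER_YEAR


# Tiny sanity check: buckets of size 2 and 3 give 2! * 3! = 12 languages.
small = count_languages([2, 3])

# Hypothetical bucket sizes, chosen only to show how fast the count grows.
years = brute_force_years([4, 8, 16, 32])
```

The count is exact because Python integers are unbounded; the conversion to years uses floating point, so extremely large bucket sets would need integer division instead.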
Statistical Analysis • What if the reverse engineer is clever? • Transformed X86’ maintains the same statistical properties as normal X86 • Instruction Frequency • Extensions • Arguments
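A short sketch of why instruction frequency leaks the mapping: a one-to-one opcode substitution changes the labels but not the counts, so ranking opcodes by frequency in the packed code and in ordinary X86 code lines them up. The byte sequences, the `forward` mapping, and the helper names below are hypothetical, and using the same program as the reference is a deliberately idealized best case for the attacker.

```python
from collections import Counter


def frequency_profile(opcodes):
    """Sorted opcode counts; unchanged by any one-to-one substitution."""
    return sorted(Counter(opcodes).values(), reverse=True)


def guess_mapping(packed, reference):
    """Pair opcodes by frequency rank: packed opcode -> guessed original."""
    packed_ranked = [op for op, _ in Counter(packed).most_common()]
    ref_ranked = [op for op, _ in Counter(reference).most_common()]
    return dict(zip(packed_ranked, ref_ranked))


# Hypothetical opcode stream with distinct frequencies (5, 3, 1).
reference = [0x50] * 5 + [0x58] * 3 + [0x40]
forward = {0x50: 0x58, 0x58: 0x40, 0x40: 0x50}  # hypothetical reassignment
packed = [forward[b] for b in reference]

same_profile = frequency_profile(packed) == frequency_profile(reference)
recovered = guess_mapping(packed, reference)
inverse = {new: old for old, new in forward.items()}
```

Here `recovered` equals `inverse`, so the substitution is fully undone. Against a real corpus the ranks would match only approximately, and ties between equally frequent opcodes would leave some ambiguity.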
Performance
Run Time *Performance is bounded by that of the original QEMU
Future Work • Remove the need to recompile LLVM/QEMU • Multiple Emitters • Encryption • Randomize registers • Randomize syscall numbers • Create a new ISA instead of modifying X86 • Defeat statistical analysis
Multi-Emitter + Rand. Registers • MOV $10, %EAX; PUSH %EAX • LEA $10, %ESP; POP %EBP *Prevents trivial statistical analysis