
Alphapack



Presentation Transcript


  1. Alphapack Jenna Kallaher Costas Akrivoulis Raul Gonzalez

  2. Agenda • Overview • Architecture • Demo • Security • Performance • Future Work

  3. Overview • What is code virtualization? • VMProtect, Themida • New language, compiler & emulator • What is our take on code virtualization? • Randomly reassign opcodes within X86 • Based on QEMU & LLVM

  4. Architecture

  5-18. [Architecture diagram, built up step by step across slides 5 to 18. Components: Language Generator, Compiler Generator, Raw Bytecode Generator, Emulator Generator, Packed Binary Generator. Input: Input.c. Output: Packed Binary. Intermediate artifacts highlighted during the build-up: Mapping, Compiler, Raw Bytecode, Emulator.]

  19. Language Generation

  20. Opcode Translation • Why can't we randomly reassign all opcodes? • Many opcodes are related • DIV: 0xF6, 0xF7 • Opcodes have ranges (register offsets) • INC: 0x40...0x47 • Classify opcodes into buckets • Can reassign opcodes within buckets
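
The bucket idea on slide 20 can be sketched roughly as follows. This is an illustrative sketch only, not the authors' Language Generator: the bucket names, the opcode bytes placed in each bucket, and the function name `make_mapping` are all assumptions, since the slides do not list the real buckets.

```python
import random

# Hypothetical buckets for illustration only; the slides do not list the real ones.
# Each bucket holds opcode bytes assumed to be interchangeable with each other.
BUCKETS = {
    "bucket_a": [0x27, 0x2F, 0x37, 0x3F],
    "bucket_b": [0x98, 0x99, 0x9E, 0x9F],
}

def make_mapping(buckets, seed):
    """Return {original_opcode: reassigned_opcode}, shuffling only within each bucket."""
    rng = random.Random(seed)  # seeded so the same "language" can be regenerated
    mapping = {}
    for name in sorted(buckets):  # fixed iteration order keeps the result reproducible
        original = list(buckets[name])
        shuffled = original[:]
        rng.shuffle(shuffled)
        mapping.update(zip(original, shuffled))
    return mapping

mapping = make_mapping(BUCKETS, seed=1234)
for old, new in sorted(mapping.items()):
    print(f"{old:#04x} -> {new:#04x}")
```

Because each bucket is permuted independently, an opcode never leaves its bucket, which is the constraint the slide describes.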

  21. Opcode Translation Cont.

  22. Opcode Translation Cont. • Challenge: Opcode dependencies • LLVM’s JIT Compiler emits hardcoded bytes • 0xE8, 0x66, 0x0F, ... • Instructions share same opcode • 0x90: PFCMPGE, SETO, NOOP, XCHG16ar, XCHG32ar, XCHG32ar64, XCHG64ar, PAUSE, VPGATHERDQ, VPGATHERDD, VPROTB • Manual resolution of errors is required

  23. Compiler Generation

  24. LLVM Modification • Opcode translations added to TableGen files • LLVM recompiled • Recompiling LLVM is the most time-consuming operation in our process

  25. Bytecode Generation

  26. Source Code Compilation • Source code compiled with modified Clang • Linker inserts pre-compiled code • Not compiled with our Clang • Cannot translate all opcodes in .text region • User code address range is recorded • Constrains when translation is turned on/off • Blacklist addresses based on function name
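
A minimal sketch of the on/off decision described on slide 26, assuming the recorded user code range and the blacklisted functions are available as address intervals. The class name `TranslationPolicy`, its fields, and the example addresses are hypothetical; the slides do not show how Alphapack stores or checks them.

```python
from dataclasses import dataclass, field

@dataclass
class TranslationPolicy:
    user_start: int  # first address of user code (inclusive), assumed recorded at build time
    user_end: int    # end of user code (exclusive)
    # (start, end) address ranges of blacklisted functions, assumed resolved from their names
    blacklist: list = field(default_factory=list)

    def should_translate(self, addr):
        """True only for addresses inside user code and outside every blacklisted range."""
        if not (self.user_start <= addr < self.user_end):
            return False  # e.g. pre-compiled code inserted by the linker
        return not any(lo <= addr < hi for lo, hi in self.blacklist)

# Example with made-up addresses.
policy = TranslationPolicy(0x1000, 0x2000, blacklist=[(0x1800, 0x1900)])
print(policy.should_translate(0x1234))  # inside user code
print(policy.should_translate(0x1850))  # inside a blacklisted function
print(policy.should_translate(0x3000))  # outside the recorded range
```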

  27. Emulator Generation

  28. QEMU Modification • 3,500-line case statement in translate.c • Case statements reordered • “Undo” LLVM compile-time translations • Challenges • QEMU doesn't support all syscalls (fork, futex) • Nested cases/opcodes • Floating point
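
Conceptually, "undoing" the compile-time translation means applying the inverse of the opcode mapping before dispatch. The sketch below shows only that idea on a list of opcode bytes; it is not how translate.c is structured (the slides say the case statements themselves are reordered), and the mapping `FORWARD` and both function names are assumptions for illustration.

```python
def invert(mapping):
    """Build the reverse lookup; fails if two opcodes were mapped to the same byte."""
    inverse = {new: old for old, new in mapping.items()}
    if len(inverse) != len(mapping):
        raise ValueError("mapping is not one-to-one")
    return inverse

def decode_opcodes(opcodes, inverse):
    """Map each translated opcode back to its original; untranslated bytes pass through."""
    return [inverse.get(op, op) for op in opcodes]

# Hypothetical forward mapping (a permutation of three opcode bytes).
FORWARD = {0x98: 0x9F, 0x99: 0x98, 0x9F: 0x99}

original = [0x98, 0x90, 0x9F]                      # 0x90 is not in the mapping
translated = [FORWARD.get(op, op) for op in original]
restored = decode_opcodes(translated, invert(FORWARD))
print(restored == original)
```

The sketch ignores operands, prefixes, and multi-byte opcodes, which is where the nested cases mentioned on the slide would come in.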

  29. Demo

  30. bzip2 objdump (60 opcodes changed)

  31. Security

  32. Anti-RE • We have it • Not what we are here to discuss today

  33. Brute-Force • How many unique “languages” can we create? • How long would it take a brute-forcer to find the “right” opcode translation? • Assuming an oracle responds in 1 ns
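
One way to frame the two questions on slide 33: if opcodes may only be permuted within buckets, the number of distinct mappings is the product of the factorials of the bucket sizes, and an exhaustive search at 1 ns per guess takes at most that many nanoseconds. The sketch below works this out for made-up bucket sizes; the actual sizes (and the figures on the following slide) are not in the transcript, so the numbers here are assumptions.

```python
import math

def language_count(bucket_sizes):
    """Number of distinct within-bucket permutations: product of n! over all buckets."""
    return math.prod(math.factorial(n) for n in bucket_sizes)

def worst_case_years(count, seconds_per_guess=1e-9):
    """Time to try every mapping once, in years, at the given oracle latency."""
    seconds = count * seconds_per_guess
    return seconds / (365.25 * 24 * 3600)

# Small check by hand: buckets of 3 and 4 opcodes give 3! * 4! = 6 * 24 = 144 languages.
print(language_count([3, 4]))

# Hypothetical bucket sizes, chosen only to show how quickly the count grows.
sizes = [8, 8, 16, 28]
count = language_count(sizes)
print(f"{count:.3e} languages, about {worst_case_years(count):.3e} years at 1 ns per guess")
```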

  34. Brute-Force Cont.

  35. Statistical Analysis • What if the reverse engineer is clever? • Transformed X86’ maintains the same statistical properties as normal X86 • Instruction Frequency • Extensions • Arguments
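
The instruction-frequency point can be illustrated with a toy frequency attack: a one-to-one opcode substitution only relabels the histogram, so pairing opcodes by frequency rank can recover the mapping. This is a simplified sketch under stated assumptions, not the analysis from the talk: the opcode stream, the secret mapping, and the function `rank_match` are invented, and a real attacker would compare against statistics from other X86 binaries rather than the same program.

```python
from collections import Counter

def rank_match(reference, observed):
    """Guess an observed->reference opcode mapping by pairing equal frequency ranks."""
    ref_ranked = [op for op, _ in Counter(reference).most_common()]
    obs_ranked = [op for op, _ in Counter(observed).most_common()]
    return dict(zip(obs_ranked, ref_ranked))

# Toy opcode stream with distinct frequencies (5, 3, and 1 occurrences).
reference = [0x89] * 5 + [0x8B] * 3 + [0x01] * 1

# Hypothetical secret substitution applied by the packer.
secret = {0x89: 0x01, 0x8B: 0x89, 0x01: 0x8B}
observed = [secret[op] for op in reference]

# The sorted frequency profile is unchanged by the substitution.
print(sorted(Counter(reference).values()) == sorted(Counter(observed).values()))

# Rank matching recovers the inverse of the secret mapping in this toy case.
guess = rank_match(reference, observed)
print(guess == {new: old for old, new in secret.items()})
```

When several opcodes have similar frequencies the ranking becomes ambiguous, which is why the slide also lists extensions and arguments as further signals.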

  36. Statistical Analysis Cont.

  37. Performance

  38. Run Time • *Performance bounded by original QEMU

  39. Future Work • Remove need to recompile LLVM/QEMU • Multiple Emitters • Encryption • Randomize registers • Randomize syscall numbers • Create a new ISA instead of modifying X86 • Defeat statistical analysis

  40. Multi-Emitter + Rand. Registers • MOV $10, %EAX • PUSH %EAX • LEA $10, %ESP • POP %EBP • *Prevents trivial statistical analysis

  41. Questions?
