Saumya Debray The University of Arizona Tucson, AZ 85721

Understanding software that doesn’t want to be understood: Reverse engineering obfuscated binaries

The Problem
• Rapid analysis and understanding of malware code is essential for a swift response to new threats
• Malicious software is usually heavily obfuscated against analysis
• Existing approaches to reverse engineering such code are primitive
  • not a lot of high-level tool support
  • require a lot of manual intervention
  • slow, cumbersome, potentially error-prone
• This delays the development of countermeasures

Goals
Develop automated techniques for analysis and reverse engineering of obfuscated binaries
• semantics-based
  • output is functionally equivalent to, but simpler than, the input program
• generality
  • should work on any obfuscation, even ones we haven’t thought of yet!
  • should minimize assumptions about obfuscations

Challenges
• can’t make assumptions about obfuscations
  • what do we leverage for deobfuscation?
• distinguishing code we care about from code we don’t
  • how do we know which instructions we care about?
• scale
  • “needle in a haystack”
  • the number of instructions executed increases by 270x (VMProtect) to 4300x (Themida) [Lau 2008]
• anti-analysis defenses
  • runtime unpacking
  • anti-emulation, anti-debug checks

Our Approach
• no obfuscation-specific assumptions
• treat programs as input-to-output transformations
• use semantics-preserving transformations to simplify execution traces
• dynamic analysis to handle runtime unpacking
[Figure: input program → taint analysis (bit-level): map flow of values from input to output → semantics-preserving transformations: simplify logic of input-to-output transformation → control flow reconstruction: reconstruct logic of simplified computation → control flow graph]
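The taint-analysis step can be illustrated with a minimal sketch of bit-level taint propagation over a toy trace. Everything here is an assumption for illustration: 32-bit locations, a made-up `(op, dst, src)` instruction encoding, and textbook conservative propagation rules. The slides describe the actual analysis only as "(enhanced) bit-level" propagation, so its rules may differ. Each location carries a bit mask of the bits that may depend on the input; rerunning the same trace at byte granularity shows how coarser tracking over-taints.

```python
MASK = 0xFFFFFFFF  # assumption: 32-bit locations

def coarsen_to_bytes(t):
    """Widen a taint mask so that one tainted bit taints its whole byte."""
    out = 0
    for i in range(4):
        if (t >> (8 * i)) & 0xFF:
            out |= 0xFF << (8 * i)
    return out

def propagate(trace, input_taint, byte_level=False):
    """Forward taint propagation: bit i of taint[loc] is set if bit i of loc
    may depend on the program input."""
    taint = dict(input_taint)
    for op, dst, src in trace:
        t_dst = taint.get(dst, 0)
        if op == "mov_imm":            # dst = constant
            t = 0
        elif op == "mov":              # dst = src
            t = taint.get(src, 0)
        elif op == "and_imm":          # bits ANDed with 0 become constant 0
            t = t_dst & src
        elif op == "or_imm":           # bits ORed with 1 become constant 1
            t = t_dst & ~src & MASK
        elif op == "shl_imm":
            t = (t_dst << src) & MASK
        elif op == "shr_imm":
            t = t_dst >> src
        elif op == "xor":              # bitwise; "xor r, r" clears r
            t = 0 if src == dst else t_dst | taint.get(src, 0)
        elif op == "add":              # carries only move taint upward, so taint
            u = t_dst | taint.get(src, 0)  # every bit from the lowest tainted one
            t = 0 if u == 0 else MASK & ~((u & -u) - 1)
        else:
            raise ValueError(f"unsupported op: {op}")
        taint[dst] = coarsen_to_bytes(t) if byte_level else t
    return taint

# eax = ((input & 0x0F) << 4) & 0x0F, which is always 0
trace = [
    ("mov", "eax", "input"),
    ("and_imm", "eax", 0x0F),
    ("shl_imm", "eax", 4),
    ("and_imm", "eax", 0x0F),
]
bit = propagate(trace, {"input": MASK})
byte = propagate(trace, {"input": MASK}, byte_level=True)
print(hex(bit["eax"]), hex(byte["eax"]))  # 0x0 0xff
```

Bit-level tracking correctly reports that the final `eax` does not depend on the input, while the byte-level variant still marks its low byte as tainted, which would pull irrelevant instructions into the input-to-output computation.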

Ex 1: Emulation-based Obfuscation
• examination of the code reveals only the emulator’s logic
• actual program logic embedded in byte code
• lots of “chaff” during execution
• separating emulator logic from payload logic is tricky
• emulators can be nested
[Figure: input program → Obfuscator (mutation engine, random seed) → emulator (code) + bytecode logic (data)]
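To make the emulator/bytecode split concrete, here is a toy sketch of emulation-based obfuscation. It is hypothetical and far simpler than real obfuscators such as Themida: a "mutation engine" uses a random seed to pick opcode encodings, the payload (factorial) becomes a bytecode list, i.e. data, and the only executable logic is a generic fetch-decode-dispatch loop. All names (`mutate`, `assemble`, `emulate`, the opcode set) are invented for illustration.

```python
import random

# Abstract instruction set of a toy stack VM (hypothetical).
OPS = ["PUSH", "LOAD", "STORE", "MUL", "SUB", "JZ", "JMP", "HALT"]

def mutate(seed):
    """Mutation engine: choose a random, distinct opcode byte for each op."""
    codes = random.Random(seed).sample(range(256), len(OPS))
    return dict(zip(OPS, codes))

def assemble(program, opmap):
    """Encode (op, arg) pairs as flat bytecode; the payload becomes data."""
    bytecode = []
    for op, arg in program:
        bytecode += [opmap[op], arg]
    return bytecode

def emulate(bytecode, opmap, mem):
    """The only real code: a generic fetch-decode-dispatch loop."""
    decode = {code: op for op, code in opmap.items()}
    stack, pc = [], 0
    while True:
        op, arg = decode[bytecode[pc]], bytecode[pc + 1]
        pc += 2
        if op == "PUSH":
            stack.append(arg)
        elif op == "LOAD":
            stack.append(mem[arg])
        elif op == "STORE":
            mem[arg] = stack.pop()
        elif op in ("MUL", "SUB"):
            b, a = stack.pop(), stack.pop()
            stack.append(a * b if op == "MUL" else a - b)
        elif op == "JZ":
            if stack.pop() == 0:
                pc = arg
        elif op == "JMP":
            pc = arg
        elif op == "HALT":
            return mem

# Payload: mem[1] = factorial(mem[0]). Jump targets are bytecode offsets
# (each instruction occupies two slots).
FACTORIAL = [
    ("PUSH", 1), ("STORE", 1),                            # result = 1
    ("LOAD", 0), ("JZ", 26),                              # offset 4: exit if n == 0
    ("LOAD", 1), ("LOAD", 0), ("MUL", 0), ("STORE", 1),   # result *= n
    ("LOAD", 0), ("PUSH", 1), ("SUB", 0), ("STORE", 0),   # n -= 1
    ("JMP", 4),
    ("HALT", 0),                                          # offset 26
]

for seed in (1, 2):
    opmap = mutate(seed)
    bytecode = assemble(FACTORIAL, opmap)
    print(bytecode[:6], emulate(bytecode, opmap, {0: 5, 1: 0})[1])
```

Different seeds give different bytecode for the same program, while the interpreter's code is identical no matter which program it runs; reading that code reveals only the emulator's logic, not the factorial.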

Ex 2: Return-Oriented Programs (ROP)
• Originally designed to bypass anti-code-injection defenses
• Stitches together existing code fragments (“gadgets”), e.g., in system libraries
• Logic can be difficult to discern
  • gadgets are typically scattered across many different functions and/or libraries
  • gadgets can overlap in memory in weird ways
  • control flow structures (if-else, loops, function calls) are typically implemented using non-standard idioms
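The following toy simulation illustrates how a ROP chain can encode a loop. It is a sketch under simplifying assumptions: the gadget addresses are made up, each gadget is modeled by its effect rather than by real x86 bytes, and the simulated stack pointer is an index into a list of slots rather than a byte address. The chain computes n! for n >= 1. Its loop back-edge is not a jump but a conditional adjustment of the stack pointer, using a mask built with the `neg`/`sbb` idiom (in x86, `neg` sets the carry flag exactly when its operand is nonzero, so `sbb rbx, rbx` then yields -1 or 0).

```python
# Hypothetical gadget addresses, scattered as if in different libraries.
POP_RAX, POP_RCX, POP_RDX = 0x7F10_0A31, 0x7F22_5C08, 0x7F31_1B4E
IMUL_RAX_RCX, DEC_RCX = 0x7F10_77D2, 0x7F45_0096
MASK_RBX = 0x7F22_3E10       # models: mov rbx, rcx; neg rbx; sbb rbx, rbx; ret
AND_RDX_RBX, ADD_RSP_RDX = 0x7F31_4A07, 0x7F45_2C5B
HALT = 0                     # sentinel: end of chain

def run(chain):
    """Simulate ret-driven execution; sp indexes the slots of the chain."""
    r = {"rax": 0, "rbx": 0, "rcx": 0, "rdx": 0}
    sp = 0

    def pop():
        nonlocal sp
        value = chain[sp]
        sp += 1
        return value

    while True:
        gadget = pop()               # the `ret` ending the previous gadget
        if gadget == POP_RAX:
            r["rax"] = pop()
        elif gadget == POP_RCX:
            r["rcx"] = pop()
        elif gadget == POP_RDX:
            r["rdx"] = pop()
        elif gadget == IMUL_RAX_RCX:
            r["rax"] *= r["rcx"]
        elif gadget == DEC_RCX:
            r["rcx"] -= 1
        elif gadget == MASK_RBX:
            r["rbx"] = -1 if r["rcx"] != 0 else 0
        elif gadget == AND_RDX_RBX:
            r["rdx"] &= r["rbx"]
        elif gadget == ADD_RSP_RDX:
            sp += r["rdx"]           # the "branch": move sp within the chain
        elif gadget == HALT:
            return r["rax"]
        else:
            raise ValueError(f"not a gadget: {gadget:#x}")

def factorial_chain(n):
    """ROP chain computing n! for n >= 1; slot 4 is the loop head."""
    return [
        POP_RAX, 1,          # rax = 1
        POP_RCX, n,          # rcx = n
        IMUL_RAX_RCX,        # slot 4: rax *= rcx
        DEC_RCX,             # rcx -= 1
        POP_RDX, -7,         # distance from slot 11 back to slot 4
        MASK_RBX,            # rbx = -1 if rcx != 0 else 0
        AND_RDX_RBX,         # rdx = -7 while looping, else 0
        ADD_RSP_RDX,         # conditional "jump" back to slot 4
        HALT,
    ]

print(run(factorial_chain(5)))  # 120
```

Nothing in the chain looks like a loop instruction: the control flow lives in data (the `-7` and the computed mask), which is the kind of non-standard idiom that makes ROP logic hard to discern.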

Example 1 (emulation-obfuscation) factorial (Themida)

Example 2 (ROP): factorial
[Figure panels: original, ROP]

Example: Unpacking + Emulation (interactions between obfuscations)
[Figure: execution trace (input, unpack, unpack, ..., output); instructions “tainted” as propagating values from input to output are used to construct the control flow graph of the input-to-output computation (further simplified)]
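Control flow reconstruction from a trace can be sketched, in its simplest form, as follows. This is a plain address-based construction, not necessarily the method used in this work; going from a simplified trace to a CFG is NP-hard in general, and semantic considerations may matter. The sketch assumes the trace is a list of executed instruction addresses and that an address always holds the same instruction, an assumption runtime unpacking can violate, so a practical tool would need to distinguish different code written to the same address.

```python
from collections import defaultdict

def trace_to_cfg(trace):
    """Build basic blocks and block-level edges from executed addresses."""
    succs, preds = defaultdict(set), defaultdict(set)
    for a, b in zip(trace, trace[1:]):
        succs[a].add(b)
        preds[b].add(a)
    nodes = list(dict.fromkeys(trace))  # distinct addresses, first-seen order

    def is_leader(a):
        # A block starts at the entry, at a merge point, or after a branch.
        if a == trace[0] or len(preds[a]) != 1:
            return True
        (p,) = preds[a]
        return len(succs[p]) != 1

    blocks = {}
    for leader in filter(is_leader, nodes):
        block, a = [leader], leader
        while len(succs[a]) == 1:       # extend while control flow is linear
            (nxt,) = succs[a]
            if is_leader(nxt):
                break
            block.append(nxt)
            a = nxt
        blocks[leader] = block
    edges = {(lead, s) for lead, blk in blocks.items() for s in succs[blk[-1]]}
    return blocks, edges

# A loop: 0x10 entry, 0x14/0x18 test, 0x1c body, 0x20 exit.
trace = [0x10, 0x14, 0x18, 0x1C, 0x14, 0x18, 0x1C, 0x14, 0x18, 0x20]
blocks, edges = trace_to_cfg(trace)
for leader, block in blocks.items():
    print(hex(leader), [hex(a) for a in block])
```

The loop test (`0x14`, `0x18`) becomes one block with two successors, and the back-edge from `0x1c` to `0x14` is recovered purely from the order of execution.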

Results
• Ex. 1. binary search: Themida
[Figure panels: original, obfuscated (cropped), deobfuscated]

Results
• Ex. 2. Hunatcha (drive infection code): ExeCryptor
[Figure panels: original, obfuscated (cropped), deobfuscated]

Results
• Ex. 3. fibonacci: ROP
[Figure panels: original, obfuscated, deobfuscated]

Results
• Ex. 4. Win32/Kryptik.OHY: Code Virtualizer
[Figure panels: obfuscated, deobfuscated. Annotations: multiple layers of runtime code generation; unpacking code; the CFG shown materializes incrementally; the initial unpacker is emulation-obfuscated]

Results: CFG Similarity

Lessons and Issues
• Static vs. dynamic analysis
  • multiple layers of runtime code generation/unpacking limit the utility of static analysis
  • dynamic analysis can run into problems of scale
    • O(n²) algorithms are impractical; even O(n log n) can be problematic
    • trade memory space for execution time/complexity
  • code coverage: multi-path exploration?
• Taint propagation
  • byte/word-level analyses may not be precise enough
  • we use (enhanced) bit-level taint propagation
• Simplified trace → CFG: NP-hard
  • semantic considerations?

Conclusions
• Rapid analysis and understanding of malware code is essential for a swift response to new threats
  • need to deal with advanced code obfuscations
  • obfuscation-specific solutions tend to be fragile
• We describe a semantics-based framework for automatic code deobfuscation
  • no assumptions about the obfuscation(s) used
  • promising results on obfuscators (e.g., Themida) not handled by prior research

Additional material

Semantics-based simplification
• Quasi-invariant locations: locations that have the same value at each use
• Our transformations (currently):
  • Arithmetic simplification
    • adaptation of constant folding to execution traces
    • consider quasi-invariant locations as constants
    • controlled to avoid over-simplification
  • Data movement simplification
    • use pattern-driven rules to identify and simplify data movement
  • Dead code elimination
    • need to consider implicit destinations, e.g., condition code flags
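The arithmetic simplification and dead code elimination steps above can be sketched on a tiny straight-line trace. This is an illustrative approximation, not the authors' implementation. Instructions are a hypothetical `Instr` record that stores the value observed for each source location, and `m400` is a hypothetical memory location (think of one byte of emulator bytecode). Quasi-invariant locations are those read with a single value throughout the trace and not tainted by input. Constant folding propagates known values forward along the trace. Dead code elimination walks backward, treating the condition flags as an implicit destination of `add`/`sub`/`xor` and an implicit source of `jnz`. Excluding tainted locations is one plausible way to avoid over-simplification: an input read once trivially has "the same value at each use" in a single trace, and folding it would erase the dependence on the input.

```python
from dataclasses import dataclass, field, replace
from typing import Optional

MASK32 = 0xFFFFFFFF

@dataclass
class Instr:
    """One trace entry (hypothetical format)."""
    dst: Optional[str]            # destination location; None for branches
    op: str                       # "mov", "add", "sub", "xor" or "jnz"
    srcs: tuple                   # source locations (str) or immediates (int)
    reads: dict = field(default_factory=dict)  # value observed per source loc
    sets_flags: bool = False      # implicit destination: condition flags

FOLDABLE = {"mov": lambda a: a, "add": lambda a, b: a + b,
            "sub": lambda a, b: a - b, "xor": lambda a, b: a ^ b}

def quasi_invariants(trace, tainted):
    """Untainted locations that are read with one and the same value."""
    seen = {}
    for ins in trace:
        for loc, val in ins.reads.items():
            seen.setdefault(loc, set()).add(val)
    return {loc: next(iter(vals)) for loc, vals in seen.items()
            if len(vals) == 1 and loc not in tainted}

def fold_constants(trace, qi):
    """Forward constant folding along the trace, seeded with quasi-invariants."""
    known, out = dict(qi), []
    for ins in trace:
        srcs = tuple(known.get(s, s) if isinstance(s, str) else s
                     for s in ins.srcs)
        if ins.op in FOLDABLE and all(isinstance(s, int) for s in srcs):
            value = FOLDABLE[ins.op](*srcs) & MASK32
            known[ins.dst] = value
            # still counts as defining the flags, keeping DCE conservative
            out.append(replace(ins, op="mov", srcs=(value,)))
        else:
            if ins.dst is not None:
                known.pop(ins.dst, None)  # dst no longer holds a known constant
            out.append(replace(ins, srcs=srcs))
    return out

def eliminate_dead_code(trace, live_out):
    """Backward liveness over the trace, including the implicit flags."""
    live, kept = set(live_out), []
    for ins in reversed(trace):
        defs = ({ins.dst} if ins.dst else set()) | (
            {"flags"} if ins.sets_flags else set())
        is_branch = ins.op == "jnz"
        if not is_branch and not (defs & live):
            continue                      # nothing it defines is used later
        live -= defs
        live |= {s for s in ins.srcs if isinstance(s, str)}
        if is_branch:
            live.add("flags")             # implicit source of jnz
        kept.append(ins)
    return kept[::-1]

trace = [
    Instr("ecx", "mov", ("m400",), {"m400": 7}),
    Instr("ecx", "add", ("ecx", "m400"), {"ecx": 7, "m400": 7}, True),
    Instr("eax", "xor", ("in0", "ecx"), {"in0": 65, "ecx": 14}, True),
    Instr("edx", "sub", ("eax", 79), {"eax": 79}, True),
    Instr(None, "jnz", ()),
    Instr("esi", "mov", (0xDEAD,)),
]
qi = quasi_invariants(trace, tainted={"in0", "eax"})
simplified = eliminate_dead_code(fold_constants(trace, qi), live_out={"eax"})
for ins in simplified:
    print(ins.dst, ins.op, ins.srcs)
# eax xor ('in0', 14)
# edx sub ('eax', 79)
# None jnz ()
```

The `ecx` computation folds to the constant 14 and disappears, as does the unused write to `esi`. The `sub` whose result `edx` is never read survives because `jnz` reads the flags it sets.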
