Binary Translation and Applications. R. Sekar, Stony Brook University.
Binary Translation for Protecting Applications • Basic approach: Instrument OS+application to enforce policies that protect the application from a hostile OS • Why binary translation? • Versatile: enforce a wide range of properties • Low-level: memory pages, instructions/operands,… • Higher-level: fine-grained (data-structure level) memory isolation, policies on callable functions and parameters,… • Global: information flow, control-flow integrity,… • Wide applicability: • COTS and legacy applications available only in binary form • Components in hand-written assembly code • Parts of the OS, performance-critical components of some applications (e.g., Firefox, GIMP, some media codecs, …) • Flexibility: • apply different policies on different OS components (allows some components to be more trusted than others)
Binary Translation Today … • State of the art uses dynamic binary translation • Instrument each instruction just before first execution • Side-steps one of the key challenges with COTS binaries: accurate disassembly (and handling of dynamically generated code) • Is dynamic translation really practical? • Yes! It is already in wide use in a number of tools • Valgrind, VMware, QEMU, … • One-time translation overhead can be • Relatively low for coarse-grained instrumentation • i.e., where only a small fraction of instructions are instrumented • Easily amortized for long-running applications
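The "translate just before first execution" idea can be illustrated with a toy interpreter that keeps a translation cache. This is only a sketch: the three-instruction program, the opcode names, and the `run` function are hypothetical and not taken from any of the tools named in the slide. It shows why the translation cost is paid once per instruction while the instrumentation cost is paid on every execution.

```python
from typing import Callable, Dict, Tuple

# Hypothetical toy "binary": address -> (opcode, operand).
PROGRAM: Dict[int, Tuple[str, int]] = {
    0: ("add", 1),      # acc += 1
    1: ("loop", 1000),  # jump back to 0 until acc reaches 1000
    2: ("halt", 0),
}


def run(program: Dict[int, Tuple[str, int]]):
    cache: Dict[int, Callable[[dict], None]] = {}  # translation cache
    stats = {"translated": 0, "executed": 0}
    state = {"acc": 0, "pc": 0, "halted": False}

    def translate(addr: int) -> Callable[[dict], None]:
        """Translate (and instrument) one instruction; called once per address."""
        op, arg = program[addr]
        stats["translated"] += 1
        if op == "add":
            def handler(s: dict) -> None:
                s["acc"] += arg
                s["pc"] = addr + 1
        elif op == "loop":
            def handler(s: dict) -> None:
                s["pc"] = 0 if s["acc"] < arg else addr + 1
        else:  # "halt"
            def handler(s: dict) -> None:
                s["halted"] = True

        def instrumented(s: dict) -> None:
            stats["executed"] += 1  # stand-in for the added instrumentation
            handler(s)

        return instrumented

    while not state["halted"]:
        pc = state["pc"]
        if pc not in cache:          # first execution: translate now
            cache[pc] = translate(pc)
        cache[pc](state)             # later executions reuse the translation
    return state, stats


state, stats = run(PROGRAM)
print(stats)
```

In this toy run each of the three instructions is translated once, however many times the loop body executes, which is the amortization argument made for long-running applications.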
So what is the problem? • Very high overheads for fine-grained instrumentation • Many security applications instrument most instructions, e.g., taint-tracking and fine-grained memory isolation • Dynamic approaches save/restore registers and flags so that they can be used by the added instrumentation • Need many memory reads and writes for each instruction • Purely dynamic approach precludes most optimizations (which rely on static analysis for soundness) • 400% to 4000% (e.g., Valgrind memcheck) overhead! • Large applications can take several minutes to start up! • Difficulty in reasoning about higher level properties • Another limitation of pure dynamic translation
Our Approach • Use (mostly) static binary rewriting to reduce overheads • Eliminate most runtime overhead for disassembly or translation • Can use static analysis to reduce instrumentation overheads • But COTS binaries pose many challenges • Binaries lack information available to compilers • Variable kind (local/global), size or type • Function boundaries or number of parameters • Position-independent code (PIC), non-standard use of stack, functions with side-effects, and aliasing are common • More complications: Hand-written assembly, exceptions, multi-threading, unrestricted pointer arithmetic and pointer forging … • Solution • Develop a static-analysis based approach that systematically overcomes these challenges • Can also form the basis for reasoning about higher level properties
Previous Results: Binary Taint-Tracking • Metadata for each word of data • Metadata for M: TAGMAP[M/4] • Tag values: 1 = UNTRUSTED, 0 = TRUSTED • [Figure: TAGMAP shown alongside the ADDRESS SPACE]
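The TAGMAP[M/4] lookup can be sketched as follows. The assumptions here are mine, not the slide's: words are 4 bytes, each tag occupies one byte of a `bytearray`, the address space is a small 64 KiB toy, and `set_taint` / `get_taint` are hypothetical helper names.

```python
WORD_SIZE = 4                       # bytes per word, matching the M/4 index
ADDRESS_SPACE_BYTES = 1 << 16       # hypothetical toy address space
TAGMAP = bytearray(ADDRESS_SPACE_BYTES // WORD_SIZE)  # one tag per word, all 0 (trusted)


def set_taint(addr: int, untrusted: bool) -> None:
    """Mark the word containing byte address `addr` as untrusted (1) or trusted (0)."""
    TAGMAP[addr // WORD_SIZE] = 1 if untrusted else 0


def get_taint(addr: int) -> int:
    """Return the tag of the word containing byte address `addr`."""
    return TAGMAP[addr // WORD_SIZE]


set_taint(0x1002, True)
# Every byte of the word 0x1000..0x1003 shares one tag; the next word does not.
print(get_taint(0x1000), get_taint(0x1003), get_taint(0x1004))
```

Because the index is the address divided by the word size, the tag granularity is one word: tainting any byte of a word taints the whole word.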
Taint-Tracking: Problem with Performance • Original instruction: R = R + M • Instrumentation added for it: • Save R1, R2, R3 in memory • R1 = &M • R2 = TAGMAP[R1 >> 2] • R3 = RegTaint[R] • Taint = R2 || R3 • RegTaint[R] = Taint • Restore R1, R2, R3
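The instrumentation steps listed for R = R + M can be mirrored in a small simulation. This is a sketch under assumptions: the dictionaries standing in for memory, registers, and the tag map, the register name `eax`, and the function `add_reg_mem` are all hypothetical. The save/restore of R1 to R3 is not modeled, because Python locals play the role of scratch registers; on real hardware those saves and restores are additional memory traffic.

```python
memory = {}                 # word-aligned byte address -> 32-bit value
tagmap = {}                 # word index (address >> 2) -> taint bit, default 0
regs = {"eax": 0}           # register file (hypothetical register name)
reg_taint = {"eax": 0}      # RegTaint: taint bit per register


def add_reg_mem(r: str, addr: int) -> None:
    """Simulate the instrumented form of R = R + M."""
    # --- added instrumentation ---
    r1 = addr                      # R1 = &M
    r2 = tagmap.get(r1 >> 2, 0)    # R2 = TAGMAP[R1 >> 2]
    r3 = reg_taint[r]              # R3 = RegTaint[R]
    taint = r2 | r3                # Taint = R2 || R3
    reg_taint[r] = taint           # RegTaint[R] = Taint
    # --- original instruction ---
    regs[r] = (regs[r] + memory.get(addr, 0)) & 0xFFFFFFFF


memory[0x100] = 5
tagmap[0x100 >> 2] = 1             # the word at 0x100 is untrusted
regs["eax"] = 7
add_reg_mem("eax", 0x100)
print(regs["eax"], reg_taint["eax"])
```

Even in this simplified form, one arithmetic instruction turns into two table reads and one table write, which is the per-instruction cost the slide points to.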
Binary Taint-Tracking: Key Results • A new modular, scalable static analysis for binaries that recovers information about • local variables and function parameters • PC-relative addressing (PIC code) • limited information about aliasing • Effective new optimizations supported by static analysis • Register-caching of metadata (taint) • Metadata-sharing among locations with equal taint labels • “Fast path” code specialization • Good performance • Our 30% to 160% overhead is much lower than that of previous work (4x to 40x slowdown) • Fast enough for online operation with no perceptible slowdown for many CPU-intensive applications (e.g., media players) • No perceptible difference to application startup times
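The slides name "fast path" code specialization without describing how it is done. One common reading, shown here purely as an assumption, is to keep two versions of a code region: an uninstrumented one used when no input is tainted, and a fully instrumented one otherwise. The function `sum_words` and its list-based inputs are hypothetical.

```python
from typing import List, Tuple


def sum_words(values: List[int], taints: List[int]) -> Tuple[int, int]:
    """Return (sum of values, taint of the sum)."""
    if not any(taints):
        # Fast path: no input is tainted, so the result cannot be tainted
        # and no per-element taint propagation is needed.
        return sum(values), 0
    # Slow path: propagate taint alongside every addition.
    total, taint = 0, 0
    for value, tag in zip(values, taints):
        total += value
        taint |= tag
    return total, taint


print(sum_words([1, 2, 3], [0, 0, 0]))
print(sum_words([1, 2, 3], [0, 1, 0]))
```

In this sketch the check itself still scans the tags; a real implementation would presumably make that check cheap, for example by consulting cached or shared metadata, but the slides do not say.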
Key Research Problems • Static analysis of binary code • Cope with challenges posed by low-level code (hand-written assembly, pointer fabrication, violation of stack/calling conventions …) • Reconstruct higher level views needed to support required security properties • Robust disassembly • Avoid optimistic assumptions or assumptions regarding compilers used for code generation • Rely on static analysis instead • If assumptions are unavoidable, then verify them at runtime • Fall back to dynamic translation when all else fails • Indirect calls that cannot be analyzed, dynamically generated code, …
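The combination of "verify assumptions at runtime" and "fall back to dynamic translation" can be pictured as a dispatcher for indirect calls. This is a hypothetical sketch, not the authors' design: `make_dispatcher`, the address map from original to rewritten code, and the `translate_dynamically` callback are all invented for illustration.

```python
from typing import Callable, Dict, Tuple


def make_dispatcher(
    static_targets: Dict[int, int],
    translate_dynamically: Callable[[int], int],
) -> Tuple[Callable[[int], int], Dict[str, int]]:
    """Build a lookup for indirect-call targets.

    static_targets maps original addresses that static rewriting handled
    to their rewritten addresses. Anything else goes to the dynamic fallback.
    """
    dynamic_cache: Dict[int, int] = {}
    stats = {"static": 0, "dynamic": 0}

    def dispatch(target: int) -> int:
        if target in static_targets:           # runtime check of the static result
            stats["static"] += 1
            return static_targets[target]
        if target not in dynamic_cache:        # unanalyzed target: translate once
            stats["dynamic"] += 1
            dynamic_cache[target] = translate_dynamically(target)
        return dynamic_cache[target]

    return dispatch, stats


# Hypothetical addresses; the lambda stands in for a dynamic translator.
dispatch, stats = make_dispatcher({0x1000: 0x9000}, lambda t: t + 0x100000)
print(hex(dispatch(0x1000)), hex(dispatch(0x2000)), stats)
```

The point of the design is that the common, statically analyzed case costs one table lookup, while targets the analysis could not resolve still execute correctly through the slower path.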
Key Research Problems (Continued) • Threat analysis and defenses • Analyze the full range of threats (especially low-level threats) posed by hostile OSes and develop defense mechanisms • Leveraging compiler support (when available) • Utilizing type or other information provided by the compiler to enhance property enforcement • Tie-in to SVM and certifying translation • Exploit hardware features • Features for enhanced isolation, multi-core, …