LIFT: A Low-Overhead Practical Information Flow Tracking System for Detecting Security Attacks

Feng Qin, Cheng Wang, Zhenmin Li, Ho-seop Kim, Yuanyuan Zhou, Youfeng Wu; University of Illinois at Urbana-Champaign, Intel Corporation, The Ohio State University

Information Flow Tracking • Taint analysis • Goal: detect / prevent security attacks • Targets attacks that corrupt control data • General: not limited to specific types of software vulnerabilities • Works even for unknown attacks

Approach • 1. Tag (label) input data from unsafe channels, e.g., the network • 2. Propagate the data tags through the computation • Any data derived from unsafe data is also tagged as unsafe • 3. Detect unexpected uses of unsafe data • E.g., transferring program control to a target taken from unsafe data

A Simple Example • a is unsafe • Information flows from a to b: b is unsafe • If c is unsafe, a jump to the location pointed to by c is flagged as an attack
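A minimal sketch of the three steps (tag, propagate, detect) applied to the example above. It models each variable as a `(value, tag)` pair at the source level rather than instrumenting binary code as LIFT does; the helpers `read_from_network`, `add`, `const`, `indirect_jump`, and `TaintError` are hypothetical names for illustration.

```python
class TaintError(Exception):
    """Raised when unsafe data is used as a control-transfer target."""


def read_from_network(value):
    # Step 1: data read from an unsafe channel is tagged unsafe (True).
    return (value, True)


def const(value):
    # Constants that are part of the program are safe.
    return (value, False)


def add(x, y):
    # Step 2: the result is unsafe if either operand is unsafe.
    return (x[0] + y[0], x[1] or y[1])


def indirect_jump(target):
    # Step 3: refuse to transfer control to an unsafe target.
    value, tainted = target
    if tainted:
        raise TaintError(f"indirect jump to unsafe target {value:#x}")
    return value


a = read_from_network(0x41414141)  # a is unsafe
b = add(a, const(4))               # information flows from a to b: b is unsafe
c = b                              # c inherits b's tag
try:
    indirect_jump(c)
except TaintError as e:
    print("alert:", e)
```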

Three Ways <1> • Language-based • For programs written in special type-safe programming languages • Tracks information flow at compile time • Good: no runtime overhead • Bad: only for specific programming languages • Not practical

Three Ways <2> • Instrumentation • Tracks information flow and detects exploits at runtime • Source code instrumentation • Lower overhead • Cannot track information flow inside third-party library code • Requires a specification of library calls • Complex, error-prone, possible side effects • Binary code instrumentation • Runtime overhead: 37 times

Three Ways <3> • Hardware-based • E.g., RIFLE • Good: low overhead • Bad: requires non-trivial hardware extensions

Overview of LIFT • Dynamically instruments the binary code to • (1) track information flow • (2) detect security exploits • Advantages: • Low overhead, software-only, no source code required • Built on top of StarDBT • A dynamic binary translator by Intel

Design of LIFT • Basic design • Tag management • Information flow tracking • Exploit detection • Protection of the tag space • Optimizations

Tag Management: Design • Associates a one-bit tag with each byte of data in memory and in general-purpose data registers • 0: safe; 1: unsafe • At the beginning, all tags are cleared to zero • Data may be tagged with 1 when • it is read from the network or standard input • information flows to it from other unsafe data • Unsafe data becomes safe again when it is reassigned from safe data

Tag Management: Storage • For memory data • Storage: a special memory region (tag space) • Look-up: one-to-one mapping between a tag bit and a memory byte in the virtual address space • Overhead: 12.5% (one tag bit per 8-bit byte) • Compression: • nearby memory data usually have similar tag values • For general registers • Store tags in a dedicated extra register (64-bit) • Reduces overhead • If no spare register is available: a special memory area • No significant overhead, since this area stays in the L1 cache • Hardware ??
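The sketch below models the memory tag space as a bitmap with one tag bit per byte, which is where the 12.5% figure comes from (1 bit / 8 bits = 1/8). It covers setting tags on unsafe input and clearing them when bytes are overwritten with safe data. It is an uncompressed toy model: the class `TagSpace`, its methods, and the 4 KiB memory size are hypothetical, and register tags are not modeled.

```python
class TagSpace:
    """Hypothetical bitmap tag space: one tag bit per byte of memory."""

    def __init__(self, memory_size):
        # All tags start cleared (safe); 8 bytes of memory share one tag byte.
        self.bits = bytearray((memory_size + 7) // 8)

    @staticmethod
    def _locate(addr):
        # One-to-one mapping: byte address -> (tag byte index, bit index).
        return addr >> 3, addr & 7

    def get(self, addr):
        index, bit = self._locate(addr)
        return (self.bits[index] >> bit) & 1

    def set_range(self, addr, length, tag):
        # Set (tag=1) or clear (tag=0) the tags of `length` bytes at `addr`.
        for a in range(addr, addr + length):
            index, bit = self._locate(a)
            if tag:
                self.bits[index] |= 1 << bit
            else:
                self.bits[index] &= ~(1 << bit) & 0xFF

    def any_unsafe(self, addr, length):
        return any(self.get(a) for a in range(addr, addr + length))


tags = TagSpace(memory_size=4096)
tags.set_range(0x100, 16, 1)  # 16 bytes received from the network: unsafe
tags.set_range(0x104, 4, 0)   # 4 of them overwritten with safe data: safe again
print(len(tags.bits) / 4096)  # space overhead of the tag space
print(tags.any_unsafe(0x104, 4), tags.any_unsafe(0x100, 16))
```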

Information Flow Tracking <1> • Dynamically instrument instructions • Each instruction is instrumented once at runtime, and the instrumented code is executed many times • The instrumentation code is placed before the corresponding instruction of the original program • Tracks information flow based on data dependencies, not control dependencies
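A consequence of tracking only data dependencies is that a value reproduced through a branch (an implicit flow) is not tagged. This minimal sketch, using hypothetical helpers on `(value, tag)` pairs, contrasts a direct copy with a branch-based copy of the same unsafe value.

```python
def copy(src):
    # Data dependency: the destination inherits the source tag.
    return src


def copy_via_branch(src):
    # Control dependency: the value is reproduced by a branch, but no
    # instruction moves the unsafe data itself, so the result stays safe.
    value, _ = src
    result = (0, False)
    if value == 1:
        result = (1, False)
    return result


secret = (1, True)
print(copy(secret))             # tag propagated
print(copy_via_branch(secret))  # same value, but the implicit flow is untracked
```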

Information Flow Tracking <2> • For data-movement instructions • E.g., MOV, PUSH, POP • Tag propagation: source operand → destination • For arithmetic instructions • E.g., ADD, OR • Tag propagation: both source operands → destination (the destination is unsafe if either source is unsafe) • For instructions that involve only one operand • E.g., INC • The tag does not change
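The propagation rules above can be sketched as a function that updates a register-tag map for one instruction. This is a simplification, not LIFT's code: it keeps one tag per register (LIFT tags each byte), handles only register operands, and the function name `propagate` is hypothetical.

```python
def propagate(tags, opcode, dst, src=None):
    """Update the register-tag map `tags` for one instruction."""
    if opcode == "MOV":
        # Data movement: the destination takes the source tag.
        tags[dst] = tags[src]
    elif opcode in ("ADD", "OR"):
        # Arithmetic (x86 two-operand form, dst = dst op src): the
        # destination is unsafe if either source operand is unsafe.
        tags[dst] = tags[dst] | tags[src]
    elif opcode == "INC":
        # Single operand: the tag does not change.
        pass
    else:
        raise ValueError(f"unsupported opcode {opcode}")
    return tags


tags = {"eax": 1, "ebx": 0, "ecx": 0}
propagate(tags, "MOV", "ecx", "ebx")  # ecx <- ebx: safe
propagate(tags, "ADD", "ebx", "eax")  # ebx <- ebx + eax: unsafe
propagate(tags, "INC", "ecx")         # unchanged
print(tags)
```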

Information Flow Tracking <3> • Special cases • XOR reg, reg and SUB reg, reg: both reset reg to zero regardless of its old value • Clear the corresponding tag (the result is safe)
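A minimal sketch of the special case: for the zeroing idioms `XOR reg, reg` and `SUB reg, reg` the tag is cleared instead of OR-ing the source tags. It assumes one tag per register, and `propagate_binary` is a hypothetical name.

```python
def propagate_binary(tags, opcode, dst, src):
    """Tag rule for a two-operand register instruction `opcode dst, src`."""
    if opcode in ("XOR", "SUB") and dst == src:
        # The result is always 0, independent of the old (possibly
        # unsafe) value, so it is safe.
        tags[dst] = 0
    else:
        # General arithmetic rule: unsafe if either operand is unsafe.
        tags[dst] = tags[dst] | tags[src]
    return tags


tags = {"eax": 1, "ebx": 0}
propagate_binary(tags, "XOR", "eax", "eax")  # eax := 0, tag cleared
propagate_binary(tags, "ADD", "ebx", "eax")  # ebx stays safe: eax is now safe
print(tags)
```

Without the special case, a compiler-generated `xor eax, eax` would keep an unsafe register tagged even though its value no longer depends on any input.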

Exploit Detection • LIFT also instruments instructions to detect exploits • Unsafe data must not be used as a return address or as the destination of an indirect jump instruction; such a use raises an alert
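The following sketch shows the check inserted before a `RET`: if any byte of the return address on the stack carries an unsafe tag, an alert is raised. It assumes 8-byte return addresses (x86-64) and a toy stack frame modeled as a list of per-byte tags; `check_ret`, `recv_into_buffer`, and `SecurityAlert` are hypothetical names.

```python
RET_ADDR_SIZE = 8  # assumed x86-64 return-address width in bytes


class SecurityAlert(Exception):
    pass


def check_ret(stack_tags, sp):
    """Before RET: the bytes at sp hold the return address; any unsafe byte is an attack."""
    if any(stack_tags[sp:sp + RET_ADDR_SIZE]):
        raise SecurityAlert(f"unsafe return address at stack offset {sp}")


# Toy frame: a 16-byte buffer at offset 0, the return address at offset 16.
stack_tags = [0] * 24


def recv_into_buffer(n):
    # Network data is unsafe; a copy longer than 16 bytes overflows
    # the buffer into the return address.
    for i in range(n):
        stack_tags[i] = 1


recv_into_buffer(16)
check_ret(stack_tags, 16)  # passes: only the buffer is unsafe
recv_into_buffer(20)       # overflow taints part of the return address
try:
    check_ret(stack_tags, 16)
except SecurityAlert as e:
    print("alert:", e)
```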

Protection of Tag Space and Code • Both must be protected from the (possibly hijacked) program • To protect the LIFT code • Make the memory pages that store the LIFT code read-only • To protect the tag space • Turn off access permission for the pages that store the tags of the tag space itself • Any access by the original program or by hijacked code to the tag space makes the instrumentation access the corresponding tag, which lies in those protected pages, and so triggers a fault
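Why protecting only the "tags of the tag space" suffices can be checked with a little address arithmetic. This sketch assumes the simple mapping `tag_addr(a) = T + (a >> 3)` with the tag space at a hypothetical base `T` in a toy 64 KiB address space; a real system would also round the protected range to page boundaries. Every program access to an address inside the tag space lands its tag lookup in `[lo, hi)`, while ordinary data addresses never do.

```python
def tag_addr(addr, tag_base):
    """Address of the byte holding addr's tag bit (one tag bit per data byte)."""
    return tag_base + (addr >> 3)


def protected_range(tag_base, addr_space_size):
    """[lo, hi): the tags of the tag space itself, which are made inaccessible."""
    tag_space_end = tag_base + addr_space_size // 8
    return tag_addr(tag_base, tag_base), tag_addr(tag_space_end, tag_base)


# Toy 64 KiB address space with a hypothetical tag space at 0x8000..0xA000.
N, T = 1 << 16, 0x8000
lo, hi = protected_range(T, N)
print(hex(lo), hex(hi))

# A program access to tag-space address 0x8100 makes the instrumentation
# touch its tag, which lies in the protected range and therefore faults.
print(lo <= tag_addr(0x8100, T) < hi)
```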

Optimizations • Basic design: 47 times runtime overhead • Three binary optimizations: Fast Path, Merged Check, Fast Switch

Fast Path (FP): Motivation • Observation: for most server applications, the majority of tag propagations are zero-to-zero • i.e., from safe data sources to a safe destination

FP: Approach <1> • Before a code segment, insert a check • Check whether all of the segment's live-in and live-out registers and memory data are safe • If so, no tracking is needed inside the code segment • Run the fast binary version (check version) • If not, run the slow version (track version)

FP: Approach <2> • Live-in: source operands • Live-out: must also be safe, because an unsafe location may become safe during execution (when overwritten with safe data), and only the track version would clear its tag • Others need no check: • (a) not used in the code segment • (b) dead at the beginning or end of the code segment
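The fast-path decision can be sketched as a single predicate over the live-in and live-out locations of a segment. Locations are register names or memory addresses in one tag map; `choose_version` and the example segment are hypothetical, and real LIFT performs this check in generated binary code.

```python
def choose_version(live_in, live_out, tags):
    """Fast-path check inserted before a code segment.

    Returns "check" (original code, no tracking) when every live-in and
    live-out location is safe, otherwise "track" (fully instrumented code).
    """
    if all(tags.get(loc, 0) == 0 for loc in live_in | live_out):
        return "check"
    return "track"


# Hypothetical segment: reads eax and the memory word at 0x1000, writes ebx.
live_in, live_out = {"eax", 0x1000}, {"ebx"}
print(choose_version(live_in, live_out, {"eax": 0, 0x1000: 0, "ebx": 0}))
print(choose_version(live_in, live_out, {"eax": 0, 0x1000: 1, "ebx": 0}))
# ebx is unsafe: if the segment overwrites it with safe data, only the
# track version would clear its tag, so the fast path cannot be taken.
print(choose_version(live_in, live_out, {"eax": 0, 0x1000: 0, "ebx": 1}))
```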

FP: More Technical Details • It is difficult to know the addresses of all memory units at the beginning of a segment • Run the check version first • Postpone each check until the memory location is known • Jump to the track version when a check fails • Granularity of code segments • Basic blocks • Hot traces • Remove unnecessary checks • Network processing component
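A simplified model of the postponed checks: register tags are checked at segment entry, each memory operand is checked just before its instruction runs in the check version, and a failed check resumes execution at the same instruction in the track version. Resuming mid-segment is sound in this model because every earlier instruction touched only safe data, so no tag updates were missed. The instruction format `(dst, srcs)` and the name `run_segment` are hypothetical.

```python
def run_segment(instrs, tags):
    """instrs: list of (dst, srcs); registers are str, memory addresses int.
    tags: dict location -> 0 (safe) / 1 (unsafe); missing entries are safe.
    Returns a log of (version, instruction index) pairs."""
    log = []
    regs = {loc for dst, srcs in instrs for loc in (dst, *srcs) if isinstance(loc, str)}
    # Register tags are known up front and checked at segment entry.
    switch_at = 0 if any(tags.get(r, 0) for r in regs) else None
    if switch_at is None:
        for i, (dst, srcs) in enumerate(instrs):
            # A memory address is only known once execution reaches the
            # instruction, so its check is postponed until this point.
            if any(tags.get(loc, 0) for loc in (dst, *srcs) if isinstance(loc, int)):
                switch_at = i
                break
            # Check version: every operand is safe, so no tag updates are needed.
            log.append(("check", i))
    if switch_at is not None:
        # Track version, resumed at the instruction whose check failed.
        for i in range(switch_at, len(instrs)):
            dst, srcs = instrs[i]
            tags[dst] = int(any(tags.get(s, 0) for s in srcs))
            log.append(("track", i))
    return log


# Hypothetical segment: eax <- ebx; ecx <- [0x2000]; edx <- ecx
segment = [("eax", ["ebx"]), ("ecx", [0x2000]), ("edx", ["ecx"])]
tags = {0x2000: 1}  # only the memory word at 0x2000 is unsafe
print(run_segment(segment, tags))
print(tags["edx"])
```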

Merged Check (MC): Motivation • Temporal / spatial locality • Recently accessed data is likely to be accessed again in the near future • After an access to a location, nearby memory locations are also likely to be accessed in the near future • Idea: combine multiple checks into one • Combine temporally and spatially nearby checks

Merged Check: Approach • Cluster the memory references into groups • Scan all the instructions and build a data dependency graph for each memory reference • Introduce version numbers to represent the timing attribute • Cluster based on spatial / temporal distance
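One way to picture the clustering step is a greedy grouping of memory references that share a base register and version number, merging those whose combined byte range stays small. This is a hypothetical sketch, not LIFT's algorithm: it assumes a base register's version increments whenever the register is redefined (so references with the same version use the same base value), and `MAX_SPAN` is an invented limit on the range one merged check may cover.

```python
from collections import defaultdict

MAX_SPAN = 64  # hypothetical limit on the byte range of one merged check


def merge_checks(refs):
    """refs: list of (base_reg, version, offset, size) memory references.

    Returns (base_reg, version, lo, hi) byte ranges, one merged check each.
    """
    groups = defaultdict(list)
    for base, version, offset, size in refs:
        # Same base register and version: relative distance is known statically.
        groups[(base, version)].append((offset, offset + size))
    checks = []
    for (base, version), spans in groups.items():
        spans.sort()
        lo, hi = spans[0]
        for start, end in spans[1:]:
            if max(hi, end) - lo <= MAX_SPAN:
                hi = max(hi, end)  # spatially close: extend the current check
            else:
                checks.append((base, version, lo, hi))
                lo, hi = start, end  # too far apart: start a new check
        checks.append((base, version, lo, hi))
    return checks


refs = [("rbp", 0, -8, 8), ("rbp", 0, -16, 8), ("rbp", 0, 200, 4), ("rbp", 1, -8, 8)]
print(merge_checks(refs))  # four references, three checks
```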

Fast Switch (FS) • Switching execution between the original binary code and the instrumentation code requires saving and restoring the context • These saves/restores introduce large runtime overhead because they are inserted at many locations • Use cheaper instructions and remove unnecessary saves/restores
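A save/restore is unnecessary when the instrumentation does not overwrite a piece of state, or when the original code does not use that state afterwards. The sketch below expresses that liveness-based rule; it is an assumed interpretation of "remove unnecessary saves / restores", and the function name and example state sets are hypothetical.

```python
def context_to_save(clobbered, live_after):
    """State to save/restore around an instrumentation snippet: only what
    the snippet overwrites and the original code still reads afterwards."""
    return clobbered & live_after


# Hypothetical snippet that uses a scratch register and modifies the flags.
clobbered = {"rflags", "r11"}
print(sorted(context_to_save(clobbered, {"rflags", "rax"})))  # flags still needed
print(sorted(context_to_save(clobbered, {"rax"})))            # nothing to save
```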

Evaluation • Effectiveness • Performance

Evaluation: Effectiveness

Evaluation: Performance <1> • Throughput and response time of Apache • Throughput overhead: 6.2% (StarDBT alone: 3.4%) • Response time: 90.9%

Evaluation: Performance <2> • SPEC2000: 3.6 times slowdown on average

Conclusion • A “practical” information flow tracking system • Low overhead • No hardware extensions required • No source code required

Discussions • Source-code instrumentation • 81% on average for CPU-intensive C programs • 5% on average for IO-intensive (server) programs • If similar optimization techniques could be applied to source-code instrumentation, its performance could also be “practical” • Binary-code instrumentation • CPU-bound: 24 times • Apache server: worst case 25 times, most cases 5 to 10 times

More Discussions • Focuses on the basic design and the three optimizations • Few details about the taint analysis itself • Evaluation • Effectiveness: false positives / false negatives • Performance • IO-intensive vs. CPU-intensive • More benchmarks • Formal model to analyze taint analysis
