
LIFT: A Low-Overhead Practical Information Flow Tracking System for Detecting Security Attacks

LIFT: A Low-Overhead Practical Information Flow Tracking System for Detecting Security Attacks. Feng Qin, Cheng Wang, Zhenmin Li, Ho-seop Kim, Yuanyuan Zhou, Youfeng Wu. University of Illinois at Urbana-Champaign, Intel Corporation, The Ohio State University.


Presentation Transcript


  1. LIFT: A Low-Overhead Practical Information Flow Tracking System for Detecting Security Attacks Feng Qin, Cheng Wang, Zhenmin Li, Ho-seop Kim, Yuanyuan Zhou, Youfeng Wu University of Illinois at Urbana-Champaign, Intel Corporation, The Ohio State University

  2. Information Flow Tracking • Taint analysis • To detect / prevent security attacks • For attacks that corrupt control data • General: not tied to specific types of software vulnerabilities • Works even for unknown attacks

  3. Approach • 1. Tag (label) the input data from unsafe channels, e.g., the network • 2. Propagate the tags through the computation • Any data derived from unsafe data is also tagged as unsafe • 3. Detect unexpected uses of the unsafe data • E.g., switching program control to a location given by unsafe data

  4. A Simple Example • a is unsafe • Information flows from a to b: b is unsafe • If c is unsafe, jumping to the location pointed to by c is blocked
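The three steps above (tag, propagate, detect) can be sketched as a toy Python model. All names here are illustrative stand-ins, not LIFT's actual interface; real LIFT operates on x86 instructions and a shadow tag space, not on named variables.

```python
tags = {}  # variable name -> 1 (unsafe) or 0 (safe)

def taint_input(var):
    tags[var] = 1                   # step 1: data from an unsafe channel

def assign(dst, src):
    tags[dst] = tags.get(src, 0)    # step 2: tag flows with the data

def indirect_jump(target_var):
    if tags.get(target_var, 0):     # step 3: unexpected use of unsafe data
        raise RuntimeError("exploit detected: tainted jump target")

taint_input("a")
assign("b", "a")        # b becomes unsafe, as in the slide's example
assign("c", "b")
try:
    indirect_jump("c")  # jumping through tainted c is caught
except RuntimeError as e:
    print(e)
```

Note that safe-to-safe assignments leave tags at zero, which is the common case the Fast Path optimization later exploits.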

  5. Three Ways <1> • Language-based • For programs written in special type-safe programming languages • Tracks information flow at compile time • Good: no runtime overhead • Bad: only for specific programming languages • Not practical

  6. Three Ways <2> • Instrumentation • Tracks the information flow and detects exploits at runtime • Source code instrumentation • Lower overhead • Cannot track into third-party library code • Requires a specification of library calls • Complex, error-prone, side effects • Binary code instrumentation • Runtime overhead: 37x

  7. Three Ways <3> • Hardware-based • E.g., RIFLE • Good: low overhead • Bad: non-trivial hardware extensions

  8. Overview of LIFT • Dynamically instruments the binary code to • (1) track information flow • (2) detect security exploits • Advantages: low overhead, software-only, no source code required • Built on top of StarDBT • A dynamic binary translator by Intel

  9. Design of LIFT • Basic design • Tag management • Information flow tracking • Exploit detection • Protection of the tag space • Optimizations

  10. Tag Management: Design • Associate a one-bit tag with each byte of data in memory and each general-purpose data register • 0: safe; 1: unsafe • At the beginning, all tags are cleared to zero • Data is tagged with 1 when • It is read from the network or standard input • Information flows to it from other unsafe data • Unsafe data can become safe again if it is overwritten with safe data

  11. Tag Management: Storage • For memory data • Storage: a special memory region (the tag space) • Look-up: one-to-one mapping between a tag bit and a memory byte in the virtual address space • Space overhead: 12.5% • Compression is possible: • memory data near each other usually have similar tag values • For general registers • Store tags in a dedicated extra register (64-bit) • Reduces overhead • If no spare register is available: a special memory area • No significant overhead, since the tags stay in the L1 cache • Hardware support?
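The one-bit-per-byte mapping can be sketched as follows. This is a hypothetical user-level model (LIFT itself reserves a region inside the process's address space); the address arithmetic shows where the 12.5% (1/8) space overhead comes from: a virtual address maps to bit `addr & 7` of `tag_space[addr >> 3]`.

```python
MEM_SIZE = 1 << 16                     # toy 64 KB address range
tag_space = bytearray(MEM_SIZE >> 3)   # one tag bit per memory byte

def set_tag(addr, unsafe):
    byte, bit = addr >> 3, addr & 7
    if unsafe:
        tag_space[byte] |= (1 << bit)
    else:
        tag_space[byte] &= ~(1 << bit)

def get_tag(addr):
    return (tag_space[addr >> 3] >> (addr & 7)) & 1
```

A run of safe bytes compresses naturally here: a whole zero byte of `tag_space` covers eight safe memory bytes, which is why nearby data with similar tag values makes compression attractive.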

  12. Information Flow Tracking <1> • Dynamically instrument instructions • Instrumented once at runtime, and executed multiple times • The instrumentation is done before the instruction in the original program • Tracks information flow based on data dependencies but not control dependencies

  13. Information Flow Tracking <2> • For data movement-based instructions • E.g., MOV, PUSH, POP • Tag propagation: source operand → destination • For arithmetic instructions • E.g., ADD, OR • Tag propagation: both source operands → destination • For instructions that involve only one operand • E.g., INC • The tag does not change

  14. Information Flow Tracking <3> • Special cases • XOR reg, reg / SUB reg, reg: idioms that always reset reg to zero • Clear the corresponding tag instead of propagating it
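The propagation rules from the last two slides, including the zero-idiom special case, can be sketched as a small dispatch function. Register names and the `propagate` signature are illustrative; the real system emits instrumentation per x86 instruction rather than interpreting opcodes.

```python
reg_tags = {"eax": 1, "ebx": 0}   # toy state: eax tainted, ebx clean

def propagate(opcode, dst, src=None):
    if opcode in ("xor", "sub") and src == dst:
        reg_tags[dst] = 0                 # zero idiom: result is always 0
    elif opcode in ("mov", "push", "pop"):
        reg_tags[dst] = reg_tags[src]     # movement: copy the source tag
    elif opcode in ("add", "or", "sub", "xor"):
        reg_tags[dst] |= reg_tags[src]    # arithmetic: union of both tags
    elif opcode == "inc":
        pass                              # single operand: tag unchanged

propagate("mov", "ebx", "eax")   # ebx inherits eax's taint
propagate("xor", "eax", "eax")   # eax zeroed, so its tag is cleared
```

Without the zero-idiom rule, `xor eax, eax` on a tainted register would be treated as ordinary arithmetic and leave the tag set, causing spurious taint to spread.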

  15. Exploit Detection • Also instruments instructions to detect exploits • Unsafe data cannot be used as a return address or as the destination of an indirect jump instruction
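The detection rule amounts to a tag check inserted before each RET or indirect jump. A minimal sketch, assuming a `tag_of` lookup that stands in for the tag-space query (addresses and tags below are toy data):

```python
tag_of = {0x8048000: 0, 0xBFFF0000: 1}   # addr -> tag (illustrative)

def check_control_transfer(target_addr):
    """Inserted before RET / indirect JMP: refuse tainted targets."""
    if tag_of.get(target_addr, 0):
        raise RuntimeError("tainted control-transfer target")
    return target_addr

check_control_transfer(0x8048000)        # safe target: allowed through
```

Checking only control transfers (rather than every use of tainted data) is what keeps the policy general across vulnerability types while still catching control-data corruption.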

  16. Protection of Tag Space and Code • Both must be protected against tampering • To protect the LIFT code • Make the memory pages that store the LIFT code read-only • To protect the tag space • Turn off the access permission of the pages that store the tags of the tag space itself • Any access by the original program or hijacked code to the tag space also touches the corresponding (protected) tag and thus triggers a fault

  17. Optimizations • Without optimization: 47x runtime overhead • Three binary optimizations reduce this cost

  18. Fast Path (FP): Motivation • Observation: for most server applications, the majority of tag propagations are zero-to-zero • From safe data sources to a safe destination

  19. FP: Approach <1> • Before a code segment, insert a check • Check whether all its live-in and live-out registers and memory data are safe or not • If so, no need to do tracking inside the code segment • Run the fast binary version (check version) • If not, run the slow version (track version)

  20. FP: Approach <2> • Live-in: source operands read inside the segment • Live-out: data whose tags may change from unsafe to safe during execution; must be checked because the fast version does not update tags • Others need no check: • (a) not used in the code segment • (b) dead at the beginning or end of the code segment

  21. FP: More Technique Details • Difficult to know the addresses of all memory units at the beginning of a segment • Run the check version first • Postpone each check until the memory location is known • Jump to the track version when a check fails • Granularity of code segments • Basic blocks • Hot traces • Remove unnecessary checks • E.g., in the network processing component
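The Fast Path dispatch described on the last three slides can be sketched as follows. `segment_fast` and `segment_tracked` are hypothetical stand-ins for the check version and the track version of a translated code segment; in LIFT both are generated machine code, not Python callables.

```python
def run_segment(live_tags, segment_fast, segment_tracked):
    """Dispatch one code segment based on its live-in/live-out tags."""
    if all(t == 0 for t in live_tags):
        return segment_fast()      # common case: all safe, skip tracking
    return segment_tracked()       # slow path: full tag propagation

# Zero-to-zero case (the server-workload common case) takes the fast path.
result = run_segment([0, 0, 0], lambda: "fast", lambda: "tracked")
```

The optimization pays off exactly because the check is much cheaper than per-instruction tag propagation and, per the motivation slide, it succeeds for the majority of segments in server workloads.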

  22. Merged Check (MC): Motivation • Temporal / spatial locality • Recently accessed data is likely to be accessed again in the near future • After an access, nearby memory locations are also likely to be accessed in the near future • To combine multiple checks into one • Combine temporally and spatially nearby checks

  23. Merged Check: Approach • Cluster the memory references into groups • Scan all the instructions and build a data dependency graph for each memory reference • Introduce a version number to represent the timing attribute • Cluster based on spatial / temporal distance
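The spatial half of the clustering can be sketched as grouping sorted reference addresses whose gaps stay under a window, so one merged check covers each group. The `max_gap` threshold is illustrative, and this omits the version-number (temporal) dimension the slide mentions.

```python
def cluster_refs(addrs, max_gap=64):
    """Group memory-reference addresses so nearby refs share one check."""
    groups, current = [], []
    for a in sorted(addrs):
        if current and a - current[-1] > max_gap:
            groups.append(current)   # gap too large: start a new group
            current = []
        current.append(a)
    if current:
        groups.append(current)
    return groups

# Refs at 0x100..0x108 merge into one check; 0x4000 stays separate.
groups = cluster_refs([0x100, 0x104, 0x108, 0x4000])
```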

  24. Fast Switch (FS) • When execution switches between the original binary code and the instrumented code, the context must be saved and restored • These saves / restores introduce large runtime overhead because they are inserted at many locations • Use cheaper instructions and remove unnecessary saves / restores

  25. Evaluation • Effectiveness • Performance

  26. Evaluation: Effectiveness

  27. Evaluation: Performance <1> • Throughput and response time of Apache • Throughput overhead: 6.2% (StarDBT alone: 3.4%) • Response time overhead: 90.9%

  28. Evaluation: Performance <2> • SPEC2000: 3.6x slowdown on average

  29. Conclusion • A "practical" information flow tracking system • Low overhead • No hardware extensions required • No source code required

  30. Discussions • Source-code instrumentation • 81% overhead on average for CPU-intensive C programs • 5% on average for IO-intensive (server) programs • If similar optimization techniques could be applied to source-code instrumentation, its performance could also be "practical" • Binary-code instrumentation • CPU-bound: 24x • Apache server: worst case 25x, most cases 5~10x

  31. More Discussions • Focuses on the basic design and three optimizations • Not much detail about the taint analysis itself • Evaluation • Effectiveness: false positives / false negatives • Performance • IO-intensive vs. CPU-intensive workloads • More benchmarks • A formal model to analyze taint analysis would help
