
FlexiTaint: A Programmable Accelerator for Dynamic Taint Propagation

G. Venkataramani, I. Doudalis, Y. Solihin, M. Prvulovic. HPCA ’08. Reading Group Presentation, 02/14/2008.


Presentation Transcript


  1. FlexiTaint: A Programmable Accelerator for Dynamic Taint Propagation G. Venkataramani, I. Doudalis, Y. Solihin, M. Prvulovic HPCA ’08 Reading Group Presentation 02/14/2008

  2. Motivation • Tainting schemes are extremely useful for security and debugging purposes • E.g. TaintCheck, PointerCheck • Implemented in software • Usually some kind of DBI (dynamic binary instrumentation) • (+) Extremely versatile • (-) Really slow • (-) Problems with multithreaded apps, JIT compilation, and self-modifying code

  3. Motivation • So, make hardware for it • Multiple examples: Raksha, Minos, etc. • (+) Fast • (+) Can deal with the unusual code that troubles S/W • (-) Extensive modifications required in the OoO core, caches, buses, and memories • (-) Limits the state that can be manipulated, usually to a few bits that are easily managed by H/W • So, who is going to implement it? • Solution: FlexiTaint • Use H/W to accelerate what the S/W is doing • Common-case propagation and metadata manipulation

  4. FlexiTaint Overview RISC ISA

  5. Metadata Management • Taint state: 1 to 16 bits per word • 1-level table in the application address space • Protected from the application • No need to widen buses, caches, etc. • L1-T cache for taint bits: 4 KB for 2-bit states • No changes to the L1-D, no port contention • Taint state shares the L2

  6. Metadata Management cont. • 2 registers for that • MTBR: Memory Taint Base Register: start of the table • FTCR: FlexiTaint Configuration Register: bits/word • Both must be saved on a context switch by the O/S • All loads/stores prefetch taint state to the L1-T • State 0..0 is assumed to be a safe one • State can be manipulated directly by special instructions • Must be set explicitly after special events • Read a file, malloc, input purging, etc.
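
The table lookup described on slides 5 and 6 can be sketched as simple address arithmetic. This is only an illustration: the slides do not give the word size or the packing of states within a byte, so the 32-bit word (`WORD_BYTES`), the dense bit packing, and the function names are assumptions.

```python
WORD_BYTES = 4  # assumption: 32-bit data words; the slides do not state the word size


def taint_location(data_addr, mtbr, bits_per_word):
    """Return (byte address, bit offset) of the taint state for data_addr.

    mtbr plays the role of the MTBR register (start of the 1-level table),
    bits_per_word the role of the FTCR setting (1 to 16 bits per word).
    States are assumed to be packed densely, one after another.
    """
    word_index = data_addr // WORD_BYTES
    bit_index = word_index * bits_per_word
    return mtbr + bit_index // 8, bit_index % 8


def data_covered(l1t_bytes, bits_per_word):
    """Bytes of application data whose taint state fits in an L1-T of l1t_bytes."""
    return (l1t_bytes * 8 // bits_per_word) * WORD_BYTES
```

Under these assumptions, a 4 KB L1-T with 2-bit states holds the taint for 64 KB of data, which is twice the capacity of the 32 KB L1-D used in the simulated configuration.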

  7. Taint Propagation • Takes place after the OoO core • Can be turned off and completely bypassed if unnecessary • The normal Commit becomes Pre-CoMmiT • A software handler receives 4 arguments: • OpCode, Reg1 State, Reg2 State, Mem State • And returns the output state and whether an exception should be raised • Handler address stored in TPCHR • Restricted access register
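
To make the handler interface concrete, here is a minimal sketch of a software handler with the four inputs and two outputs listed on this slide. The opcode numbers and the policy itself (1-bit taint, an exception on a jump through a tainted register) are hypothetical and are not taken from the presentation.

```python
# Hypothetical opcode numbers, chosen only for this illustration.
OP_ADD = 0x01
OP_JUMP_REG = 0x02


def taint_handler(opcode, reg1_state, reg2_state, mem_state):
    """Return (output_state, raise_exception) for one committed instruction.

    Same interface as on the slide: the opcode plus the taint states of the
    two source registers and of the memory operand.
    """
    if opcode == OP_JUMP_REG:
        # Hypothetical policy: jumping through a tainted register is an error.
        return 0, reg1_state != 0
    # Default rule for this sketch: the result is tainted if any input is.
    tainted = 1 if (reg1_state or reg2_state or mem_state) else 0
    return tainted, False
```

Because the handler sees only the opcode and the input states, its answer is a pure function of those four values.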

  8. TPCache • The answer of the S/W handler is always the same for the same inputs • So cache it • 128-entry direct-mapped response cache • Indexed by opcode, Reg1 state, Reg2 state, Mem state (folded into 7 bits) • Stores the output state and exception bit • Cleared every time the TPCHR (software handler address register) is changed • Usually on a context switch
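
The caching idea on this slide can be modeled in a few lines. This is a behavioral sketch, not the hardware design: the XOR folding, the use of the full input tuple as a tag, the 8-bit opcode and 2-bit state widths, and all names are assumptions.

```python
class TPCache:
    """Toy model of a 128-entry direct-mapped cache of handler responses."""

    ENTRIES = 128

    def __init__(self, handler):
        self.handler = handler  # software handler invoked on a miss
        self.entries = [None] * self.ENTRIES
        self.misses = 0

    @staticmethod
    def index(key):
        """Fold (opcode, reg1, reg2, mem) into 7 bits by XORing 7-bit chunks."""
        opcode, reg1, reg2, mem = key
        # Assumes an 8-bit opcode and 2-bit states for the concatenation.
        bits = (opcode << 6) | (reg1 << 4) | (reg2 << 2) | mem
        folded = 0
        while bits:
            folded ^= bits & 0x7F
            bits >>= 7
        return folded

    def lookup(self, opcode, reg1, reg2, mem):
        """Return (output_state, exception), calling the handler only on a miss."""
        key = (opcode, reg1, reg2, mem)
        slot = self.index(key)
        entry = self.entries[slot]
        if entry is None or entry[0] != key:
            self.misses += 1
            entry = (key, self.handler(*key))
            self.entries[slot] = entry  # direct mapped: evict whatever was there
        return entry[1]

    def clear(self):
        """Model of the flush that happens when TPCHR is written."""
        self.entries = [None] * self.ENTRIES
```

Repeated lookups with the same inputs reach the handler only once, until `clear()` is called.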

  9. Taint Propagation cont. • Example: For instructions that do not touch memory • Remember: RISC ISA • [Figure callouts: "Reserved for instructions that touch memory"; "After the OoO core has ended"; "Size of the Architectural Register File, NOT the physical one"; "State of Reg0 hardwired to 0"; "128-entry Direct Mapped"; "Cleared when TPCHR changes"; "ALARM!"]

  10. Taint Propagation cont. • Example: Stores • Suppresses silent stores

  11. Filter Taint Propagation Table • Still, TPCache lookups take 1 cycle • If dependent instructions are retired in the same cycle, the in-order taint propagation will stall • This puts pressure on the physical register file and the ROB • Usually 0..00 is the safe state, and when zeroes are combined, the result is 0..00 • Also, if only one input is non-zero, you usually have unary propagation • Create a table to store that

  12. Filter Taint Propagation Table • Stores a 2-bit value for each of the 256 opcodes • 512 bits total, must be saved on a context switch • Really fast lookups, allows for same-cycle propagation
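
A sketch of how the 2-bit filter entries might be interpreted follows. The encoding is an assumption inferred from slides 11 and 16 (01 for the all-zero rule only, 10 for the all-zero rule plus unary propagation, 00 for no shortcut); the constant and function names are hypothetical.

```python
# Assumed encoding of a 2-bit Filter TPT entry.
USE_TPCACHE = 0b00     # no shortcut: always consult the TPCache
ZERO_ONLY = 0b01       # all-zero inputs produce a zero output
ZERO_AND_UNARY = 0b10  # as above, plus a single non-zero input is copied through

# 256 opcodes at 2 bits each is the 512 bits (64 bytes) quoted on the slides.
FILTER_TPT_BITS = 256 * 2


def filter_propagate(mode, source_states):
    """Return the output taint state, or None if the TPCache must be consulted."""
    nonzero = [s for s in source_states if s != 0]
    if mode in (ZERO_ONLY, ZERO_AND_UNARY) and not nonzero:
        return 0
    if mode == ZERO_AND_UNARY and len(nonzero) == 1:
        return nonzero[0]
    return None
```

A `None` result corresponds to the slow path, where the instruction has to wait for the one-cycle TPCache lookup.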

  13. FlexiTaint Implementation • 4 stage in order pipeline • Receives non-speculative instructions • First 2 stages: Look up • Filter TPT • L1-T • 3rd stage Taint Propagation • TPC Lookup • Or trivial propagation through Filter TPT • 4th stage commit

  14. O/S interaction • Summary of what the O/S needs to save on context switches • TPCHR (handler address) • FTCR (state size) • MTBR (shadow state address) • Filter TPT content (64 bytes) • The TPCache can simply be discarded • All state is in the address space of the application • So swapping, virtualization, etc. work normally

  15. Multiprocessor Consistency • Data and metadata are accessed in 2 different cycles • Potential consistency issues • Solution for loads: • Prefetch state when the data address is resolved • If the state does not hit in the L1-T a few cycles later, replay the load • Solution for stores: • Prefetch state (same as for loads) • Write only when data and metadata both hit in the L1 • The L1-T access is usually a hit due to the prefetch

  16. Lifeguards • 1st: TaintCheck, 1-bit state per word • Allows for maximum optimization: 10 in the Filter TPT (unary propagation and zero optimization) • TPCache and S/W will consider XOR R1,R1,R1 cases • 2nd: 1-bit PointerCheck • Stores which words are valid heap pointers • Good for leak detection • And something that Raksha cannot handle • Filter TPT: 01 (non-pointers produce non-pointers) • 3rd: A combination with 2-bit states • Filter TPT: 01 (untainted non-pointers produce untainted non-pointers)

  17. Lifeguard Rules • [Tables: TaintCheck rules; 1-bit Heap PointerCheck rules]

  18. Simulation • SESC simulator • 8-core system • 4-issue OoO superscalar cores @ 2.93 GHz • L1-D: 32 KB, 8-way set associative, dual ported, 64-byte blocks • L2: 4 MB, 16-way set associative, single ported, 64-byte blocks • Small for an 8-core system • L1-T: 4 KB, 4-way set associative, dual ported, 64-byte blocks • Bus: 64 bits wide @ 1333 MHz

  19. Performance overhead • ~1% for SPEC 2K and 4% for Splash2 • Splash2 is worse due to false sharing of metadata

  20. FlexiTaint Optimization Performance

  21. Non-Silent Writes

  22. L1-T Line Size Sensitivity Analysis • Smaller cache line → less false sharing of metadata

  23. L1-T Size Sensitivity Analysis • For 4 KB ~1% overhead for SPEC 2k • 8 KB minimal gains • 2 KB 2.8% overhead • Conclusion: 4 KB is fine for 1 and 2 bit states

  24. Raksha Simulation • Use FlexiTaint to simulate previously proposed hardware • And implement the lifeguard that they couldn’t handle (1-bit Heap PointerCheck) • Obviously FlexiTaint proves better

  25. Conclusion • Versatile scheme to handle most lifeguards with low overhead • Nice idea to cache the answer of the software handler • In general, a good idea • Though with its limitations (LockSet) • Questions?
