FlexiTaint: A Programmable Accelerator for Dynamic Taint Propagation
G. Venkataramani, I. Doudalis, Y. Solihin, M. Prvulovic (HPCA ’08)
Reading Group Presentation, 02/14/2008
Motivation
• Tainting schemes are extremely useful for security and debugging
• E.g. TaintCheck, PointerCheck
• Implemented in software
• Usually via some kind of dynamic binary instrumentation (DBI)
• Extremely versatile
• Really slow
• Problems with multithreaded apps, JIT compilation, and self-modifying code
Motivation cont.
• So, build hardware for it
• Multiple examples: Raksha, Minos, etc.
• Fast
• Can deal with the unusual code that troubles S/W schemes
• But requires extensive modifications to the OoO core, caches, buses, and memories
• And limits the state that can be manipulated, usually to a few bits that H/W can manage easily
• So, who is going to implement it?
• Solution: FlexiTaint
• Use H/W to accelerate what the S/W is doing
• Common-case propagation and metadata manipulation
FlexiTaint Overview
[Figure: FlexiTaint overview diagram. Callout: RISC ISA]
Metadata Management
• Taint state: 1 to 16 bits per word
• One-level table in the application's address space
• Protected from the application
• No need to widen buses, caches, etc.
• L1-T cache for taint bits: 4 KB for 2-bit states
• No changes to L1-D, no port contention
• Taint state shares the L2 with data
Metadata Management cont.
• Two registers support this
• MTBR (Memory Taint Base Register): start of the table
• FTCR (FlexiTaint Configuration Register): bits per word
• Both must be saved on a context switch by the O/S
• All loads/stores prefetch taint state into L1-T
• State 0..0 is assumed to be the safe one
• State can be manipulated directly by special instructions
• State must be set explicitly after special events
• Reading a file, malloc, input purging, etc.
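The table lookup described on the two Metadata Management slides can be sketched roughly as follows. This is only an illustration under stated assumptions: 32-bit (4-byte) data words, taint states packed back to back starting at the MTBR address, and hypothetical function names. The slides do not give the exact packing.

```python
WORD_BYTES = 4  # assumption: 32-bit data words


def taint_location(data_addr, mtbr, bits_per_word):
    """Return (byte address, bit offset) of the taint state for data_addr.

    mtbr plays the role of the Memory Taint Base Register and
    bits_per_word the role of the FTCR setting (1 to 16 bits).
    """
    if not 1 <= bits_per_word <= 16:
        raise ValueError("taint state must be 1 to 16 bits per word")
    word_index = data_addr // WORD_BYTES      # which data word is accessed
    bit_index = word_index * bits_per_word    # first taint bit of that word
    return mtbr + bit_index // 8, bit_index % 8


def data_covered(taint_cache_bytes, bits_per_word):
    """Bytes of data whose taint state fits in a taint cache of this size."""
    return (taint_cache_bytes * 8 // bits_per_word) * WORD_BYTES
```

Under these assumptions, `data_covered(4096, 2)` gives 65536, so a 4 KB L1-T with 2-bit states would hold the taint for 64 KB of data.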
Taint Propagation
• Takes place after the OoO core
• Can be turned off and completely bypassed if unnecessary
• The normal Commit stage becomes Pre-Commit
• A software handler receives 4 arguments:
• Opcode, Reg1 state, Reg2 state, Mem state
• And returns the output state and whether an exception should be raised
• Handler address stored in TPCHR
• Restricted-access register
TPCache
• The S/W handler's answer for the same inputs is always the same
• So cache it
• 128-entry direct-mapped response cache
• Indexed by opcode, Reg1 state, Reg2 state, Mem state (folded into 7 bits)
• Stores the output state and exception bit
• Cleared every time the TPCHR (software handler address register) changes
• Usually on a context switch
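A small software model may help make the TPCache behaviour concrete. It is a sketch, not the hardware design: the XOR fold used to reach 7 index bits, the full-key tag check, and the names `TPCache`, `lookup`, and `set_handler` are assumptions made for illustration. The slides only say the inputs are "folded" into 7 bits.

```python
class TPCache:
    """Toy model of a 128-entry direct-mapped handler-response cache."""

    ENTRIES = 128  # 7 index bits

    def __init__(self, handler):
        self.handler = handler            # stands in for the code TPCHR points at
        self.lines = [None] * self.ENTRIES
        self.misses = 0

    @staticmethod
    def _index(opcode, reg1, reg2, mem):
        # Assumed fold: pack the inputs, then XOR 7-bit chunks together.
        packed = opcode | (reg1 << 8) | (reg2 << 24) | (mem << 40)
        index = 0
        while packed:
            index ^= packed & 0x7F
            packed >>= 7
        return index

    def lookup(self, opcode, reg1, reg2, mem):
        key = (opcode, reg1, reg2, mem)
        index = self._index(*key)
        line = self.lines[index]
        if line is not None and line[0] == key:
            return line[1]                # hit: (output state, exception bit)
        self.misses += 1                  # miss: fall back to the S/W handler
        result = self.handler(*key)
        self.lines[index] = (key, result)
        return result

    def set_handler(self, handler):
        # Changing the handler invalidates every cached response.
        self.handler = handler
        self.lines = [None] * self.ENTRIES


def or_handler(opcode, reg1, reg2, mem):
    """Example handler: OR the input states, never raise an exception."""
    return reg1 | reg2 | mem, False
```

Repeating a lookup with the same four inputs is served from the cache without calling the handler again, while `set_handler` forces the next lookup to miss.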
Taint Propagation cont.
• Example: instructions that do not touch memory
• Remember: RISC ISA
[Figure: taint propagation datapath for a non-memory instruction. Callouts:]
• Reserved for instructions that touch memory
• After the OoO core has ended
• Size of the architectural register file, NOT the physical one
• State of Reg0 hardwired to 0
• 128-entry, direct-mapped; cleared when TPCHR changes
• ALARM!
Taint Propagation cont.
• Example: stores
[Figure: taint propagation datapath for a store. Callout: suppresses silent stores]
Filter Taint Propagation Table
• Still, TPCache lookups take 1 cycle
• If dependent instructions retire in the same cycle, the in-order taint propagation stalls
• This puts pressure on the physical register file and ROB
• But usually 0..00 is the common state, and when zeroes are combined the result is 0..00
• Also, if only one input is non-zero, propagation is usually unary
• Create a table to store that
Filter Taint Propagation Table cont.
• Stores a 2-bit value for each opcode (256 opcodes)
• 512 bits total, must be saved on a context switch
• Really fast lookups, allowing same-cycle propagation
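The Filter TPT can be modelled as a 64-byte bit array with a fast-path check in front of the TPCache. The sketch below assumes an encoding inferred from the lifeguard examples in this deck (01 for the zero optimization only, 10 for zero optimization plus unary propagation, 00 for always asking the TPCache) and assumes that unary propagation copies the single non-zero input state. The class and method names are hypothetical.

```python
class FilterTPT:
    """Toy model: 256 opcodes x 2 bits = 512 bits = 64 bytes."""

    ALWAYS_LOOKUP = 0b00    # assumed: no shortcut, go to the TPCache
    ZERO_ONLY = 0b01        # assumed: all-zero inputs give a zero output
    ZERO_AND_UNARY = 0b10   # assumed: also copy a single non-zero input

    def __init__(self):
        self.bits = bytearray(64)

    def set(self, opcode, value):
        if not 0 <= opcode < 256 or not 0 <= value < 4:
            raise ValueError("opcode must be 0..255 and value 0..3")
        byte, shift = divmod(opcode * 2, 8)
        cleared = self.bits[byte] & ~(0b11 << shift) & 0xFF
        self.bits[byte] = cleared | (value << shift)

    def get(self, opcode):
        byte, shift = divmod(opcode * 2, 8)
        return (self.bits[byte] >> shift) & 0b11

    def fast_path(self, opcode, *input_states):
        """Return the output state, or None if the TPCache must be consulted."""
        mode = self.get(opcode)
        nonzero = [s for s in input_states if s != 0]
        if mode != self.ALWAYS_LOOKUP and not nonzero:
            return 0
        if mode == self.ZERO_AND_UNARY and len(nonzero) == 1:
            return nonzero[0]
        return None
```

With an opcode set to `ZERO_AND_UNARY`, only the case of two or more non-zero inputs falls through to the slower TPCache lookup.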
FlexiTaint Implementation
• 4-stage in-order pipeline
• Receives non-speculative instructions
• First 2 stages: lookup
• Filter TPT
• L1-T
• 3rd stage: taint propagation
• TPCache lookup
• Or trivial propagation through the Filter TPT
• 4th stage: commit
O/S Interaction
• Summary of what the O/S needs to save on context switches
• TPCHR (handler address)
• FTCR (state size)
• MTBR (shadow state address)
• Filter TPT content (64 bytes)
• The TPCache can simply be discarded
• All state lives in the application's address space
• So swapping, virtualization, etc. work normally
Multiprocessor Consistency
• Data and metadata are accessed in 2 different cycles
• Potential consistency issues
• Solution for loads:
• Prefetch state when the data address is resolved
• If the state does not hit in the L1-T a few cycles later, replay the load
• Solution for stores:
• Prefetch state (same as for loads)
• Write only when data and metadata both hit in the L1
• The L1-T almost always hits thanks to the prefetch
Lifeguards
• 1st: TaintCheck, 1-bit state per word
• Allows for maximum optimization: 10 in the Filter TPT (unary propagation and zero optimization)
• TPCache and S/W handle cases such as XOR R1,R1,R1
• 2nd: 1-bit PointerCheck
• Stores which words are valid heap pointers
• Good for leak detection
• And something that Raksha cannot handle
• Filter TPT: 01 (non-pointers produce non-pointers)
• 3rd: a combination with 2-bit states
• Filter TPT: 01 (untainted non-pointers produce untainted non-pointers)
Lifeguard Rules
[Tables: TaintCheck rules; 1-bit Heap PointerCheck rules]
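The rule tables from this slide did not survive in the transcript, so the following is only a sketch of what a TaintCheck-style software handler could look like. It assumes the classic rule that the output is tainted if any input is tainted, and that using a tainted register as an indirect jump target raises an exception. The opcode numbers are hypothetical, and the XOR R1,R1,R1 special case is not modelled because the four handler arguments listed earlier do not identify the registers involved.

```python
# Hypothetical opcode numbers, for illustration only.
OP_ADD = 0x01
OP_LOAD = 0x02
OP_JUMP_REG = 0x03


def taintcheck_handler(opcode, reg1_state, reg2_state, mem_state):
    """Return (output state, exception flag) for 1-bit taint states."""
    if opcode == OP_JUMP_REG:
        # Assumed policy: jumping through a tainted register is an attack.
        return 0, reg1_state != 0
    # Assumed rule: the result is tainted if any input is tainted.
    return reg1_state | reg2_state | mem_state, False
```

A handler of this shape has the same four-argument signature as the software handler described on the Taint Propagation slide, so its answers are what the TPCache would store.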
Simulation
• SESC simulator
• 8-core system
• 4-issue OoO superscalar cores @ 2.93 GHz
• L1-D: 32 KB, 8-way set associative, dual-ported, 64-byte blocks
• L2: 4 MB, 16-way set associative, single-ported, 64-byte blocks
• Small for an 8-core system
• L1-T: 4 KB, 4-way set associative, dual-ported, 64-byte blocks
• Bus: 64 bits wide @ 1333 MHz
Performance Overhead
• ~1% for SPEC 2K and 4% for Splash-2
• Splash-2 is worse due to false sharing of metadata
L1-T Line Size Sensitivity Analysis
• Smaller cache line → less false sharing of metadata
L1-T Size Sensitivity Analysis
• 4 KB: ~1% overhead for SPEC 2K
• 8 KB: minimal gains
• 2 KB: 2.8% overhead
• Conclusion: 4 KB is fine for 1- and 2-bit states
Raksha Simulation
• Use FlexiTaint to simulate previously proposed hardware
• And implement the lifeguard that it couldn't handle (1-bit Heap PointerCheck)
• Unsurprisingly, FlexiTaint comes out better
Conclusion
• Versatile scheme that handles most lifeguards with low overhead
• Nice idea to cache the answers of the software handler
• In general, a good idea
• With its limitations, though (LockSet)
• Questions?