Complete Information Flow Tracking from the Gates Up
Tiwari, Wassel, Mazloom, Mysore, Chong, Sherwood, UCSB, ASPLOS 2009
Shimin Chen, LBA Reading Group
Introduction
• In a traditional microprocessor, information is leaked practically everywhere and by everything
• This can be a serious problem for exceptionally sensitive financial, military, and personal data
  • Cryptography, authentication
• Developers in these domains are willing to go to remarkable lengths to minimize the amount of leaked information:
  • flushing the cache before and after executing a piece of critical code (Osvik et al. 2006)
  • attempting to scrub the branch predictor state (Aciicmez et al. 2007)
  • normalizing the execution time of loops by hand (Kocher 1996)
  • randomizing or prioritizing the placement of data into the cache (Lee et al. 2005)
• Previous work on dynamic information-flow tracking (DIFT) is not adequate for this purpose
GLIFT: Gate-Level Information-Flow Tracking
• This paper presents a processor architecture and implementation that can track all information flows
• A novel logic discipline: GLIFT logic
  • Augment arbitrary logic blocks with tracking logic
  • Compose the augmented blocks into larger structures
• Synthesizable processor implementation with a restricted ISA
  • Provably-sound information-flow tracking
  • Allows tasks such as public-key cryptography and message authentication
Theoretical Understanding
• In a Turing-complete machine, the general problem of determining whether information flows in a program from variable x to variable y is undecidable:
  • "any procedure purported to decide it could be applied to the statement if f(x) halts then y := 0 and thus provide a solution to the halting problem for arbitrary recursive function" (Denning and Denning 1977)
• The paper instead builds a machine that:
  • by construction, will not allow unbounded execution
  • makes all hidden flows of information explicit
Outline
• Introduction
• Gate Level Information Flow Tracking
• Architecture
• Evaluation
• Conclusions
Idea
• Understand how information flows through primitive logic gates
• Compose these gates together into more complex structures
• Treat the whole processor as a logical function that:
  • operates on a set of inputs
  • produces a set of outputs
• The trust of the outputs should be determined by the trust of the inputs
• Assumption:
  • Binary trust state: trusted (0) or untrusted (1)
GLIFT for an AND Gate
[Figure: an AND gate and its truth table, a partial truth table for the shadow logic, and the shadow logic for the AND gate; the shadow logic is sketched below]
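A minimal C sketch of the shadow (tracking) logic for a 2-input AND gate, standing in for the figure above. The data inputs are a and b; a_t and b_t are their taint bits (1 = untrusted). The output is untrusted only if a tainted input can actually influence it; for example, if a is tainted but b == 0, the output is 0 regardless of a, so no information about a reaches the output.

    /* Shadow logic for o = a & b: taint of the output bit. */
    static inline unsigned and_shadow(unsigned a, unsigned a_t,
                                      unsigned b, unsigned b_t)
    {
        /* Tainted if: a is tainted and b lets it through, or
         * b is tainted and a lets it through, or both are tainted. */
        return (a_t & b) | (b_t & a) | (a_t & b_t);
    }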
Composing Larger Functions
• Use a MUX as a simple example
• The shadow logic for the MUX can be composed from the shadow logics of its individual gates
  • Not minimal, but always sound; for example, the two inputs to the final OR gate can never both be 1, so the composed shadow logic is conservative
• Precise shadow logic for the MUX (sketched below):
  • If the select S is trusted and the selected input is trusted, the output o is trusted
  • If S is untrusted, o is untrusted unless both a and b are trusted and equal
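A sketch in C of the precise MUX shadow logic described on this slide, assuming a 2-input MUX computing o = s ? b : a (the gate-by-gate composed version would be more conservative but still sound).

    /* Taint of the MUX output o = s ? b : a. */
    static inline unsigned mux_shadow(unsigned s, unsigned s_t,
                                      unsigned a, unsigned a_t,
                                      unsigned b, unsigned b_t)
    {
        if (!s_t)
            return s ? b_t : a_t;                /* trusted select: inherit selected input's taint */
        /* Untrusted select: output is untrusted unless a and b are
         * both trusted and equal, in which case s cannot influence o. */
        return (a_t | b_t | (a != b)) ? 1u : 0u;
    }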
Outline • Introduction • Gate Level Information Flow Tracking • Architecture • Evaluation • Conclusions
Step 1: Handling Conditionals
• Problem with a conventional architecture, when a branch depends on an untrusted value X:
  • If X is untrusted, the PC becomes untrusted
  • The selected instruction becomes untrusted
  • The bits that select the target register become untrusted
  • All of the registers may be marked as untrusted
• Conclusion: the PC must be kept trusted
Solution: Predication
• All instructions are executed
• If the predicate is 0, the instruction has no effect: the target register is not overwritten
• The PC stays trusted
• Predicates themselves can become untrusted
  • Suppose P0 is untrusted (see the example below)
Example
• The control line selecting the target register R2 is untrusted; the other control lines are trusted
• R2 will be marked untrusted whether P0 is 0 or 1
• End result: whether the untrusted predicate is true or false, the destination is marked untrusted (modeled in the sketch below)
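A hypothetical C sketch of why this happens: a predicated register write is, per bit, a MUX between the register's old value and the newly computed value, with the predicate as the select. Names and signature are illustrative, not the actual Verilog.

    /* One bit of a predicated register write. p is the predicate,
     * old/new_ are the current and computed destination bits,
     * *_t are taint bits (1 = untrusted). */
    void predicated_write_bit(unsigned p,    unsigned p_t,
                              unsigned old,  unsigned old_t,
                              unsigned new_, unsigned new_t,
                              unsigned *out, unsigned *out_t)
    {
        *out = p ? new_ : old;
        if (!p_t)
            *out_t = p ? new_t : old_t;              /* trusted predicate   */
        else
            *out_t = old_t | new_t | (old != new_);  /* untrusted predicate */
    }

With an untrusted predicate, the destination bit becomes untrusted whenever the old and new values could differ (or are themselves untrusted), matching the slide's conclusion about R2.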
Step 2: Handling Loops
• Loops are hard:
  • for (i=0; i<=X; i++) A[i]=1;
  • Information flows from X to A[X+1]: observing A[X+1]==0 tells us about X
  • In fact, information flows from X to A[X+n] for all n
  • There is also an implicit timing channel: the loop's running time depends on X
Solution: Statically Specify the Number of Iterations
• countjump instruction specifies:
  • the number of loop iterations
  • the jump target address
• Example (my understanding from the description; sketched in C below):
  • loop start address: ………… countjump #iterations, loop start address
  • The first time countjump is encountered, #iterations is loaded into an internal loop counter register
  • The loop counter register is decremented every time countjump is encountered, and PC ← loop start address
  • When the register becomes 0, PC ← PC + 1
  • countjump cannot be predicated
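A C sketch of this reading of countjump's semantics. Variable names (loop_count, loop_active) are hypothetical; the real implementation is gate-level hardware, and the iteration count is an immediate, so it can never be tainted by untrusted data.

    unsigned pc;            /* program counter                  */
    unsigned loop_count;    /* internal loop counter register   */
    int      loop_active;   /* has the counter been loaded yet? */

    void countjump(unsigned iterations, unsigned loop_start)
    {
        if (!loop_active) {            /* first encounter: load the count */
            loop_count  = iterations;
            loop_active = 1;
        }
        loop_count--;                  /* decremented on every encounter  */
        if (loop_count > 0) {
            pc = loop_start;           /* jump back to the loop body      */
        } else {
            loop_active = 0;           /* counter exhausted               */
            pc = pc + 1;               /* fall through                    */
        }
    }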
Early Termination
• In C, the break statement can terminate a loop early
• Here, the paper proposes:
  • Predicate all the instructions in the loop body on the termination condition
  • When the termination condition becomes true, the loop body no longer has any effect (see the sketch below)
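A sketch in C (for illustration only) of replacing a data-dependent break with predication: the loop always runs its statically specified number of iterations, but every update in the body is predicated on "not yet terminated", so iterations after the exit condition becomes true have no effect.

    /* Find the first index where a[i] == key, without an early exit. */
    int find_first(const int *a, int n, int key)
    {
        int pos   = -1;
        int found = 0;                           /* termination predicate */
        for (int i = 0; i < n; i++) {            /* fixed trip count      */
            int match = (!found) && (a[i] == key);
            pos   = match ? i : pos;             /* predicated update     */
            found = found || match;              /* predicate only rises  */
        }
        return pos;
    }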
Step 3: Constraining Loads and Stores
• Indirect loads and stores are bad
  • e.g., M[reg] ← value
  • If reg is untrusted, then essentially all memory locations become untrusted
  • "Intuitively, the problem is that accessing one untrusted address causes every other address to become implicitly untrusted by virtue of them not being accessed or modified."
• Limit the ISA to only allow:
  • Direct load/store: addresses are immediate constants
  • Loop-relative addressing: load-looprel, store-looprel (modeled in the sketch below)
    • e.g., load-looprel R0, 0x100, C0 loads M[0x100 + C0] into R0
    • C0..C7 are counters: explicitly initialized by init-counter, and incremented by a fixed value with increment-counter
    • Counter operations cannot be predicated
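A C model (names and sizes illustrative, not the actual ISA encoding) of these restricted addressing modes. Because the base address is an immediate and the offset comes from an explicit counter that is only ever initialized or bumped by immediates, no general-purpose register ever supplies an address, so untrusted data can never choose which memory word is touched.

    static unsigned char mem[1 << 16];   /* 64KB data memory          */
    static unsigned      ctr[8];         /* index counters C0..C7     */

    void init_counter(int k, unsigned imm)      { ctr[k]  = imm; }
    void increment_counter(int k, unsigned imm) { ctr[k] += imm; }  /* fixed step */

    /* load-looprel: read  M[imm + Ck]   (address wrapped to 64KB) */
    unsigned char load_looprel(unsigned imm, int k)
    {
        return mem[(imm + ctr[k]) & 0xFFFF];
    }

    /* store-looprel: write M[imm + Ck] */
    void store_looprel(unsigned char v, unsigned imm, int k)
    {
        mem[(imm + ctr[k]) & 0xFFFF] = v;
    }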
Proof-of-Concept Implementation
• Written in Verilog
• Synthesized onto a Stratix II FPGA using Altera's Quartus II software
• 32-bit machine
• 64KB instruction memory, 64KB data memory
• Registers:
  • A program counter
  • 8 general-purpose registers
  • 2 predicate registers
  • 8 registers to store loop counters (that count down the number of iterations)
  • 8 registers to store explicit array indices (used as offsets for load-looprel and store-looprel instructions)
• No pipelining
Augment the Processor with GLIFT Logic
• Each bit of processor state is explicitly shadowed:
  • every register gets a shadow register
  • every memory gets a shadow RAM
• The logic and signals are shadowed by generating the proper trust-propagation logic (see the sketch below)
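A minimal sketch of what the shadowed state looks like, assuming the register counts from the previous slide (struct and field names are illustrative, not the actual Verilog): every architectural bit has a parallel taint bit, and the generated tracking logic reads both the data bits and the shadow bits to produce the shadow bits of each block's outputs.

    struct glift_state {
        unsigned      pc,        pc_t;               /* PC and its shadow          */
        unsigned      reg[8],    reg_t[8];           /* GPRs + shadow registers    */
        unsigned      pred[2],   pred_t[2];          /* predicate regs + shadows   */
        unsigned char mem[1<<16], mem_t[1<<16];      /* data memory + shadow RAM   */
    };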
A code snippet from the SubBytes function in the AES encryption algorithm
• Basically this is the following in C (see below for how the indirect table lookup is expressed on this ISA):

    for (i = 0; i < 16; i++) {
        state[i] = SBox[state[i]];
    }
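A sketch in C (not the actual restricted assembly) of how the indirect lookup SBox[state[i]] can be written without an indirect load: scan all 256 table entries with loop-relative addressing and keep the entry whose index matches the (untrusted) byte via a predicated select. The same addresses are touched regardless of the secret value, which is why the evaluation reports AES's table lookups turning into full table iterations.

    unsigned char sbox_lookup(const unsigned char SBox[256], unsigned char x)
    {
        unsigned char result = 0;
        for (int j = 0; j < 256; j++) {           /* fixed trip count             */
            int match = (j == x);                 /* predicate (may be untrusted) */
            result = match ? SBox[j] : result;    /* predicated select            */
        }
        return result;
    }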
Outline
• Introduction
• Gate Level Information Flow Tracking
• Architecture
• Evaluation
• Conclusions
Hardware Impact
• Altera's Nios is a commercial product: RISC instruction set, reasonably optimized
  • Nios econ: unpipelined 6-stage core, without caches, branch predictors, etc.
  • Nios std: pipelined, 4KB instruction cache
• GLIFT base: unpipelined, no tracking
• GLIFT full: GLIFT base + tracking logic
Hardware Impact
• About 70% area increase compared to GLIFT base
• Small frequency degradation: adding the GLIFT tracking logic does not have a big impact on latency
Application Kernels
• Dynamic instruction counts vary substantially
• FSM and AES have a lot of table lookups, which become full table iterations
Conclusions
• The GLIFT processor is bigger, slower, harder to program, and computationally less powerful
• But, for the first time, it provides the ability to account for all information flows through the chip
• My takeaways:
  • A deeper understanding of information leaks
  • The efforts required to prevent leaks are very significant
  • Programmability is sacrificed: restrictions on loops and loads/stores
  • The proof-of-concept does not even address issues such as caches